
  • Clowder can write messages directly to RabbitMQ
  • A lightweight Flask API, run in a Python container, that also connects to RabbitMQ
    • Other code can post datapoints to this API, which are then forwarded to RabbitMQ

API design notes:

  • Seems lightweight enough that a single generic endpoint for enqueuing an item may suffice
    • Can expand to an endpoint per event type as the API evolves or needs grow
    • Use Swagger from the start as a best practice; it should make it easier to alter, scale, and keep the API server and clients in sync when necessary
  • Requires authentication via an API key or some other mechanism to prevent spam or potentially malicious fake messages
    • Cannot fetch / authenticate using Clowder, since this service must keep working when Clowder is down
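As a rough illustration of the design above, here is a minimal sketch of such a Flask endpoint. The endpoint path, queue name, header name, and environment variables are all hypothetical placeholders, not a settled API; a real deployment would also use a pooled RabbitMQ connection rather than opening one per request.

```python
import json
import os

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical names; the real endpoint, queue, and key handling are TBD.
API_KEY = os.environ.get("SINK_API_KEY", "secret")
QUEUE = os.environ.get("SINK_QUEUE", "clowder.events")


@app.route("/events", methods=["POST"])
def enqueue_event():
    # Reject requests without the shared API key (spam / fake-message guard).
    if request.headers.get("X-API-Key") != API_KEY:
        abort(401)
    datapoint = request.get_json(force=True)
    # pika is imported lazily so the app can start without a broker present.
    import pika
    conn = pika.BlockingConnection(
        pika.ConnectionParameters(host=os.environ.get("RABBITMQ_HOST", "localhost")))
    channel = conn.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_publish(exchange="", routing_key=QUEUE, body=json.dumps(datapoint))
    conn.close()
    return jsonify({"status": "queued"}), 202
```

Because the key check happens before any broker interaction, unauthenticated callers are rejected cheaply even if RabbitMQ is unreachable.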

Internal Clowder events service

For user activity (Max's reporting work and Mike's clickstream work, basically), we can call an internal RabbitMQ service to generate datapoints for the events we want to capture.

Current (frontend) tracking:

  • Allows for configuration of Amplitude API key
  • If configured, a tracking snippet is added to every view (via index.html)
  • If configured, events are tracked in the JavaScript via amplitude.logEvent()
  • Events tracked:
    • Resource views (files, datasets, collections)
    • Submit file/dataset to extractor
    • File uploads

Proposed changes:

  • Allow for configuration of Amplitude API key (no change)
  • If configured, tracking snippet added to every view (no change)
  • For the tracked events above, call the new backend SinkService, which will check for configured integrations with Amplitude/Google Analytics/etc. and delegate appropriately
  • Bonus points: add a backend action that automatically tracks API calls and sends to the SinkService
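The delegation idea behind the proposed SinkService can be sketched as follows. This is a hypothetical illustration only (class and method names are invented here, and Clowder's actual backend is not Python); the point is simply that one service fans each event out to whichever sinks happen to be configured.

```python
class SinkService:
    """Hypothetical sketch: fan an event out to every configured sink."""

    def __init__(self, sinks):
        # sinks: objects with a track(event_type, properties) method,
        # e.g. Amplitude or Google Analytics adapters (names assumed here).
        self.sinks = sinks

    def track(self, event_type, properties):
        # Deliver the event to each configured integration in turn.
        delivered = 0
        for sink in self.sinks:
            sink.track(event_type, properties)
            delivered += 1
        return delivered
```

Adding a new integration then means writing one adapter and registering it, without touching the event-producing code.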

Clowder health monitor(s)


Number of connections: it would be good to see how many connections the Clowder website receives. We can measure the number of connections within a period of time by analyzing the NGINX access log.
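As a minimal sketch of that analysis, the snippet below counts access-log entries per minute from NGINX's default "combined" log format. The log location and any aggregation window beyond one minute are assumptions.

```python
import re
from collections import Counter

# Matches the timestamp in NGINX combined-format lines, truncated to the minute,
# e.g. "[10/Oct/2023:13:55:36 +0000]" -> "10/Oct/2023:13:55".
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def connections_per_minute(lines):
    """Count access-log entries per minute from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = TIMESTAMP_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

In practice this would be fed the live access log (e.g. `connections_per_minute(open("/var/log/nginx/access.log"))`, path assumed) or wired into whatever log aggregation is already in place.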


| component   | event type         | data captured                       | notes                                                                                                        |
|-------------|--------------------|-------------------------------------|--------------------------------------------------------------------------------------------------------------|
|             | file uploaded      | fileid, datasetid, spaceid, bytes   |                                                                                                              |
|             | file deleted       | fileid, datasetid, spaceid, bytes   |                                                                                                              |
| extractions | extraction event   | message, type (queued or working)   | do we care about data traffic downloaded to the extractor containers?                                        |
|             | page views         | url, resourceid                     | do we care about every page view? this is currently tracking which resources are being viewed but without the full url |
|             | resource downloads | url, resourceid                     |                                                                                                              |
| health      | ping update        | response time, queue length, other? |                                                                                                              |