There are several parallel efforts to capture information about Clowder metrics.

The goal is to minimize the number of moving parts needed to capture and store this data. Below is a summary of our discussion from 12/7.

Recon / Planning Phase

RabbitMQ Queue & Flask API


  • Clowder can write messages directly to RabbitMQ
  • A lightweight Flask API, running in a Python container, also connects to RabbitMQ
    • Other code can post datapoints to this API, which are then forwarded to RabbitMQ

Flask API design notes (ideally these endpoints will also match the calls on the new backend SinkService):

  • Seems lightweight enough that a single generic endpoint for enqueuing an item may suffice
    • Can expand to an endpoint per event type as the API evolves or needs grow
    • Use Swagger from the start, as a best practice; this should make it easier to alter, scale, and keep the API server and clients in sync when necessary
  • Requires authentication via an API key or some other mechanism to prevent spam or potentially malicious fake messages
    • Cannot authenticate against Clowder itself, since this API exists to handle the case when Clowder is down
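A minimal sketch of the single generic endpoint described above. The route name, header name, and key are placeholders, and the RabbitMQ publish step is stubbed out with an in-memory list:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
API_KEY = "change-me"   # hypothetical; the real auth mechanism is still TBD
EVENT_BUFFER = []       # stand-in for publishing to the event-sink exchange

@app.route("/event", methods=["POST"])
def enqueue_event():
    # Reject unauthenticated callers before touching the queue.
    if request.headers.get("X-API-Key") != API_KEY:
        return jsonify(error="unauthorized"), 401
    payload = request.get_json(force=True)
    # The real service would publish `payload` to RabbitMQ here.
    EVENT_BUFFER.append(payload)
    return jsonify(status="queued"), 202
```

Keeping the endpoint generic like this means the Swagger spec starts with a single operation, which can later be split into per-event-type operations without changing the transport.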


Let's consider some different types of events. Assume user and timestamp are captured for every event in addition to the fields below.

| component | event type | data captured | notes |
|---|---|---|---|
| | file uploaded / file deleted | fileid, datasetid, spaceid, bytes | |
| extractions | extraction event | message, type (queued or working) | do we care about data traffic downloaded to the extractor containers? |
| | page views / resource downloads | url, resourceid | do we care about every page view? this currently tracks which resources are being viewed, but without the full url |
| health | ping update | response time, queue length, other? | |
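Assuming a simple JSON envelope (all field names and values below are hypothetical), a "file uploaded" event from the table above might be serialized as:

```python
import json
import time

# Hypothetical envelope: user and timestamp are common to every event
# type; the remaining fields come from the "data captured" column.
event = {
    "user": "alice",
    "timestamp": int(time.time()),
    "event_type": "file uploaded",
    "fileid": "abc123",
    "datasetid": "def456",
    "spaceid": "ghi789",
    "bytes": 1048576,
}
payload = json.dumps(event)
```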

The Full Picture

[Architecture diagram]


  • Orange: existing technology
  • Blue: likely a new piece that needs to be written

Refactoring RabbitMQ

Clowder currently has RabbitMqPlugin.scala which contains logic for submitting files and datasets to the appropriate queues in RabbitMQ for scheduling extractions. A client (e.g. pyClowder) can then subscribe to the queue(s) to systematically work through related extraction jobs in a scalable fashion.

We want to preserve the above relationship, while making the underlying code slightly more generalized so that we can use it to send an arbitrary message to an arbitrary queue in the configured RabbitMQ instance.

This new generalized service (name TBD, but for now I will call it RabbitMqService.scala) can be used by the existing RabbitMqPlugin.scala (or optionally a simplified rewrite) to submit jobs to extractors in the same way. The new RabbitMqService.scala can also be used by new code to send events to the event sink system.
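A rough sketch of the generalized interface, in Python for illustration (the real service would be Scala and hold an AMQP channel; an in-memory dict stands in for the broker, and the exchange and routing-key names are made up):

```python
import json
from collections import defaultdict

class RabbitMqService:
    """Generalized publisher: send an arbitrary message to an arbitrary
    exchange/routing key. A dict of lists stands in for RabbitMQ here."""
    def __init__(self):
        self.broker = defaultdict(list)

    def publish(self, exchange, routing_key, message):
        self.broker[(exchange, routing_key)].append(json.dumps(message))

svc = RabbitMqService()
# Existing use: submit an extraction job, as RabbitMqPlugin does today.
svc.publish("clowder", "extractors.wordcount", {"fileid": "abc123"})
# New use: send an event to the event-sink exchange.
svc.publish("event.sink", "", {"event_type": "file uploaded", "bytes": 1024})
```

The point of the design is that both call sites share one publish path, so extraction submission and metrics events differ only in the exchange and routing key they target.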

The Event Sink System

As described above, the "event sink" is simply a special exchange in RabbitMQ that is configured to fanout to multiple queues. This way we can siphon different types of events into specific queues based on the intended target.

The plan is to create a set of very thin AMQP event worker clients, which will start up and subscribe to the queue(s) to systematically work through delivering the metrics/event payloads to interested parties.
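The fanout behavior can be illustrated with a toy model (queue names are hypothetical; the real exchange and bindings would be declared in RabbitMQ itself):

```python
class FanoutExchange:
    """Toy model of the event-sink exchange: every bound queue
    receives a copy of each published message."""
    def __init__(self):
        self.queues = {}

    def bind(self, queue_name):
        self.queues.setdefault(queue_name, [])

    def publish(self, message):
        for q in self.queues.values():
            q.append(message)

sink = FanoutExchange()
sink.bind("events.mongo")    # hypothetical queue names, one per worker
sink.bind("events.influx")
sink.publish({"event_type": "file uploaded"})
# Both queues now hold a copy; each worker drains only its own queue.
```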

Emergency Flask API

In order to track events surrounding Clowder downtime, we should offer an alternative/emergency API (via Flask, or similar). This way if Clowder is down or has an outage, we can still continue to collect metrics data by using the exposed Flask API endpoint(s) to submit to the event sink exchange in RabbitMQ, as described above.

The Flask API only needs to be instrumented to collect/submit events/metrics related to Clowder uptime/downtime, maintenance, and outages.

The Flask API only needs to submit directly to RabbitMQ, and does not need to integrate directly with Clowder itself.

Event Workers

MongoDB Worker

A debug worker that will simply echo the given message into a new collection in MongoDB.

This can be used for debugging purposes and to provide a simple pattern/template for creating future event workers. 
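A sketch of that worker pattern (the fake collection mimics pymongo's `insert_one` method name so a real collection could be dropped in; the AMQP subscription loop that would call `handle` is omitted):

```python
import json

class MongoEchoWorker:
    """Debug worker: echoes each event-sink message into a collection."""
    def __init__(self, collection):
        self.collection = collection

    def handle(self, body):
        # Called once per AMQP delivery; body is the raw JSON message.
        self.collection.insert_one(json.loads(body))

class FakeCollection:
    """In-memory stand-in for a MongoDB collection."""
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)
```

Future workers can follow the same shape: construct with a destination client, implement `handle` for one message at a time.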

InfluxDB Worker

Our centralized receiver that will receive all events of all types, parse them, and write them into InfluxDB.

The ultimate goal is to use Grafana/InfluxDB as a centralized place from which to view/store all metrics/event data. Grafana then allows us to build graphs, dashboards, and alerts based on these metrics.
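One way this worker could translate an event into InfluxDB's line protocol; the measurement name and the tag/field split below are illustrative choices, not a fixed schema:

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one point as InfluxDB line protocol:
    measurement,tag=val field=val timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        # Integers carry an 'i' suffix; strings are double-quoted.
        f"{k}={v}i" if isinstance(v, int) else f'{k}="{v}"'
        for k, v in fields.items()
    )
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "file_uploaded", {"user": "alice"}, {"bytes": 1024},
    1670000000000000000,  # nanosecond timestamp
)
```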

Experimental Workers

The tricky part here is that these types of analytics/clickstream services typically integrate directly with the user's browser via JavaScript, not via a backend service. We have no guarantee that these proposed workers will function as described, but we are hopeful that the resources offered by the associated SDKs will allow them to fit into our solution.

Amplitude Worker

Amplitude supposedly offers a REST API for submitting events directly from the server side.

If possible, this will send UI-driven or event-related metrics to Amplitude to track user analytics:

  • Demographic (e.g. Age/Location/Language)
  • Device Info (e.g. Hardware/OS/Browser)
  • User Interactions (e.g. Page Views/Events)
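Assuming Amplitude's server-side HTTP API accepts a JSON body containing an `api_key` plus a list of events (the endpoint URL and payload shape below should be verified against Amplitude's current documentation), the worker could build requests like this:

```python
import json
import urllib.request

# Assumed endpoint; confirm against Amplitude's docs before use.
AMPLITUDE_URL = "https://api2.amplitude.com/2/httpapi"

def build_amplitude_request(api_key, user_id, event_type, properties):
    body = json.dumps({
        "api_key": api_key,
        "events": [{
            "user_id": user_id,
            "event_type": event_type,
            "event_properties": properties,
        }],
    }).encode("utf-8")
    return urllib.request.Request(
        AMPLITUDE_URL, data=body,
        headers={"Content-Type": "application/json"},
    )

# Request is built but deliberately not sent here.
req = build_amplitude_request("KEY", "alice", "page_view", {"url": "/datasets"})
```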

Google Analytics Worker

Google Analytics supposedly once offered an API for submitting events directly from the server side.

Unfortunately, briefly attempting to follow the links above yielded some 404 errors, so some exploratory work will be necessary here.

If possible, this will send UI-driven or event-related metrics to Google Analytics to track user analytics:

  • Demographic (e.g. Age/Location/Language)
  • Device Info (e.g. Hardware/OS/Browser)
  • User Interactions (e.g. Page Views/Events)