Page History

Overall goal is to add a new group of Mongo collections to track usage/user activity on a per-resource basis:

views & downloads
- files
- datasets
  - datasets can be downloaded as a zip file which include complete contents of the dataset. it is important to decide whether dataset metrics correspond to the dataset itself (e.g. that zip file downloaded N times) vs. a metric indicating the sum of downloads for all files in the dataset (e.g. files in this dataset were downloaded N times individually). the database model below suggests the former option, but the latter statistic could be generated on-the-fly with a slightly more complex query if it is desired.
- collections
  - see above - database would track views of the collection page itself, not act as a sum of all views of files within the collection.
distinction between agents
- logged in users via GUI and API
- user API keys
- extractors
- essentially want to differentiate intentional "human" access (even via API, e.g. from external processes) vs. automated access (e.g. from an internal extractor such as preview generator) for accurate usage tracking & "last used" information
- one option would be to implement a separate download endpoint intended for extractors that does not increment metrics and encourage developers to use that endpoint for any extractors that are not relevant to behavior tracking
ability to export simple report
- per-file statistics
  - bytes
  - path on disk
  - access count (different from simple views)
  - last accessed
- potentially filter report by "only objects that have been accessed > X months ago" to avoid massive reports
- start with a downloadable CSV output
- longer-term, could adapt aspects of search results page to show list of resources including files, datasets, etc. could introduce a more compact view than current list view for administrator viewing

With those goals in mind, one initial implementation:

implement 2 new collections in Mongo
- StatisticTotals - total views & downloads for each resource, including timestamp for last viewed and last downloaded. for driving use case, downloads/"access" is more important than page views which don't necessarily represent engagement.
- StatisticUser - views & downloads on a per-user basis. this would include via the GUI and API calls using the user's API key, with ability to exclude automated extractors from these statistics even if they use a user API key to fetch data.
Each collection will track views & downloads for files and datasets, and views for collections.

Space shortcuts

Page tree

Versions Compared

Old Version 3

New Version 4

Key