Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.

Overall goal is to add a new group of Mongo collections to track usage/user activity on a per-resource basis:

  • views & downloads
    • files
    • datasets
      • datasets can be downloaded as a zip file which include complete contents of the dataset. it is important to decide whether dataset metrics correspond to the dataset itself (e.g. that zip file downloaded N times) vs. a metric indicating the sum of downloads for all files in the dataset (e.g. files in this dataset were downloaded N times individually). the database model below suggests the former option, but the latter statistic could be generated on-the-fly with a slightly more complex query if it is desired.
    • collections
      • see above - database would track views of the collection page itself, not act as a sum of all views of files within the collection. 

  • distinction between agents
    • logged in users via GUI and API
    • user API keys
    • extractors
    • essentially want to differentiate intentional "human" access (even via API, e.g. from external processes) vs. automated access (e.g. from an internal extractor such as preview generator) for accurate usage tracking & "last used" information
    • one option would be to implement a separate download endpoint intended for extractors that does not increment metrics and encourage developers to use that endpoint for any extractors that are not relevant to behavior tracking

  • ability to export simple report
    • per-file statistics
      • bytes
      • path on disk
      • access count (different from simple views)
      • last accessed
    • potentially filter report by "only objects that have been accessed > X months ago" to avoid massive reports
    • start with a downloadable CSV output
    • longer-term, could adapt aspects of search results page to show list of resources including files, datasets, etc. could introduce a more compact view than current list view for administrator viewing

With those goals in mind, one initial implementation:

  • implement 2 new collections in Mongo
    • StatisticTotals - total views & downloads for each resource, including timestamp for last viewed and last downloaded. for driving use case, downloads/"access" is more important than page views which don't necessarily represent engagement.
    • StatisticUser - views & downloads on a per-user basis. this would include via the GUI and API calls using the user's API key, with ability to exclude automated extractors from these statistics even if they use a user API key to fetch data.
  • Each collection will track views & downloads for files and datasets, and views for collections.