...

There are two possible architectures:

Current "Standard" Approach: Create a Box client application (the BoxClient) that handles the functionality Box Skills does not provide on its own, such as polling for results and mapping metadata, when interacting with Fence, likely using bd.py:

Diagram (draw.io): BoxSkillsWithBD.py
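The BoxClient's role can be sketched as follows. This is a minimal sketch, and it assumes hypothetical callables for submitting to Brown Dog, fetching results, and writing cards back to Box. None of these names come from bd.py or the Box SDK. The invocation handler only submits the job and records it. A separate polling step checks pending jobs, so no thread is held per request.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PendingJob:
    """An extraction submitted to Brown Dog whose results are not yet back."""
    box_file_id: str
    write_token: str
    job_id: str


@dataclass
class BoxClient:
    # Hypothetical hook: submit a file URL to Brown Dog (e.g. through bd.py), return a job id.
    submit: Callable[[str], str]
    # Hypothetical hook: return metadata for a finished job, or None while it is still running.
    fetch_result: Callable[[str], Optional[dict]]
    # Hypothetical hook: write translated metadata back to the Box file as skill cards.
    write_cards: Callable[[str, str, dict], None]
    pending: List[PendingJob] = field(default_factory=list)

    def on_invocation(self, box_file_id: str, file_url: str, write_token: str) -> str:
        """Handle one Box Skills invocation: submit the file and record it, without blocking."""
        job_id = self.submit(file_url)
        self.pending.append(PendingJob(box_file_id, write_token, job_id))
        return job_id

    def poll_once(self) -> int:
        """Check each pending job once, write back finished results, return how many finished."""
        still_pending: List[PendingJob] = []
        finished = 0
        for job in self.pending:
            result = self.fetch_result(job.job_id)
            if result is None:
                still_pending.append(job)
            else:
                self.write_cards(job.box_file_id, job.write_token, result)
                finished += 1
        self.pending = still_pending
        return finished
```

Splitting submission from polling keeps the request path short. It also leaves the polling strategy open.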


Box Tuned/Potentially Streamlined Approach: Enhance pyclowder so that it can download files from, and upload metadata to, Box in addition to Clowder, eliminating data movement that isn't required. Support for Google Drive, Dataverse, etc. would eventually be added.
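One possible shape for this enhancement is a small storage interface that pyclowder calls instead of talking to Clowder directly. Each supported service would get its own implementation. `FileStore`, `InMemoryStore`, `run_extractor` and their methods are hypothetical names, not part of pyclowder. A real Box backend would wrap the Box SDK, and a Clowder backend would wrap pyclowder's existing calls.

```python
from typing import Callable, Dict, List, Protocol


class FileStore(Protocol):
    """Hypothetical interface pyclowder could use for any supported service."""

    def download(self, file_id: str) -> bytes: ...

    def upload_metadata(self, file_id: str, metadata: dict) -> None: ...


class InMemoryStore:
    """Illustrative stand-in for a Clowder or Box backend."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files
        self.metadata: Dict[str, List[dict]] = {}

    def download(self, file_id: str) -> bytes:
        return self.files[file_id]

    def upload_metadata(self, file_id: str, metadata: dict) -> None:
        self.metadata.setdefault(file_id, []).append(metadata)


def run_extractor(store: FileStore, file_id: str,
                  extract: Callable[[bytes], dict]) -> dict:
    """Download from the service the job came from, extract, and write metadata back there."""
    data = store.download(file_id)
    metadata = extract(data)
    store.upload_metadata(file_id, metadata)
    return metadata
```

With this shape, adding Google Drive or Dataverse would mean adding another `FileStore` implementation. The extractors themselves would not change. That burden would sit in pyclowder, which is the trade-off raised in the comparison.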

...

When a Skill is registered with a Box account, the invocation URL is provided. This URL will resolve to an endpoint in Fence.
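As we understand it, Box sends each invocation to that URL as a JSON POST. The sketch below shows what the Fence endpoint would need to pull out of that payload. It assumes the field names of Box's skill_invocation event (`type`, `id`, `source`, `token.read`, `token.write`), so these should be checked against Box's documentation. `SkillInvocation` and `parse_invocation` are hypothetical names. The download URL is Box's standard file-content endpoint.

```python
import json
from dataclasses import dataclass

# Box's file-download endpoint; the file id comes from the invocation payload.
BOX_CONTENT_URL = "https://api.box.com/2.0/files/{file_id}/content"


@dataclass
class SkillInvocation:
    """Fields the Fence endpoint would need from one invocation (hypothetical structure)."""
    invocation_id: str
    file_id: str
    file_name: str
    read_token: str   # scoped token for downloading the file
    write_token: str  # scoped token for writing skill cards back

    @property
    def content_url(self) -> str:
        return BOX_CONTENT_URL.format(file_id=self.file_id)


def parse_invocation(body: str) -> SkillInvocation:
    """Extract file and token details from a skill_invocation event body.

    Field names are assumed from Box's skill_invocation payload; verify against Box's docs.
    """
    event = json.loads(body)
    if event.get("type") != "skill_invocation":
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    source = event["source"]
    tokens = event["token"]
    return SkillInvocation(
        invocation_id=event["id"],
        file_id=source["id"],
        file_name=source.get("name", ""),
        read_token=tokens["read"]["access_token"],
        write_token=tokens["write"]["access_token"],
    )
```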





Comparison of Approaches

Tuned/Potentially Streamlined Approach (Box in pyclowder) vs. Current "Standard" Approach (bd.py)

Tuned approach: The file is downloaded once, from Box to the extractor container.

The file lives in the extractor and is deleted by pyclowder at the end of process_message (is this correct?).
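The lifecycle described here is: download once, process, then delete. It corresponds to a temporary file that is removed when processing finishes, even if the extraction raises an error. Below is a minimal sketch that assumes a hypothetical `download` callable, which writes the Box file to a given path. It does not confirm what pyclowder actually does at the end of process_message. That remains the open question above.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def downloaded_file(download: Callable[[str], None], suffix: str = "") -> Iterator[str]:
    """Download into a temp file, yield its path, and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the (hypothetical) download callable opens the path itself
    try:
        download(path)
        yield path
    finally:
        # Runs whether the extraction succeeded or raised.
        if os.path.exists(path):
            os.remove(path)
```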

Standard approach: If using the /extractions/file endpoint, the file is transferred three times:

  1. From Box to BoxClient
  2. From BoxClient to Clowder (via fence)
  3. From Clowder to extractor container

If using the /extractions/url endpoint, the file is transferred ?? times:

  1. ??

The file lives in Clowder until the cleanup script is run.
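Total network traffic grows with the number of times each file is transferred. The transfer counts below come from the lists above: once for the tuned approach and three times for /extractions/file. The /extractions/url count is still unknown, so it is left out. The file size and daily volume in the example are arbitrary illustrative numbers, not measurements.

```python
# Network transfers per extraction, taken from the comparison above
# (the /extractions/url count is still "??", so it is omitted).
TRANSFERS = {
    "tuned": 1,          # Box -> extractor container
    "standard_file": 3,  # Box -> BoxClient -> Clowder (via Fence) -> extractor
}


def bytes_moved(file_size_bytes: int, approach: str) -> int:
    """Total bytes sent over the network for one extraction."""
    return file_size_bytes * TRANSFERS[approach]


def extra_daily_traffic(file_size_bytes: int, files_per_day: int) -> int:
    """Additional daily traffic of /extractions/file compared with the tuned approach."""
    per_file = (bytes_moved(file_size_bytes, "standard_file")
                - bytes_moved(file_size_bytes, "tuned"))
    return per_file * files_per_day
```

For example, 10,000 files of 100 MB each per day gives `extra_daily_traffic(100 * 10**6, 10_000)`, which is 2 × 10^12 bytes, about 2 TB, of additional transfer per day.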

Tuned approach: The Box SDK has to be introduced into the pyclowder library, and the SDK for any other repository we want to support would also have to be included.

The burden of maintaining and adding newly supported external services falls on our end rather than on the client's end.

Is pyclowder the right place for this? Pyclowder is just a convenient wrapper for creating "some" extractors, namely those that happen to be written or wrapped in Python.

Standard approach: The Box SDK lives only in the BoxClient service. No changes are required in pyclowder or elsewhere.
Tuned approach: A custom metadata structure for Box would be implemented in the extractor.

Standard approach: An automated translation of Clowder metadata to Box Skills cards would have to be developed.
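Here is a sketch of one possible automated translation, with one Box keyword card per top-level metadata field. It assumes the extractor's Clowder metadata reaches the translator as a flat `content` dictionary. The card fields (`type`, `skill_card_type`, `skill_card_title`, `skill`, `invocation`, `entries`) follow Box's keyword skill card format as we understand it and should be checked against Box's documentation. The function name and the title formatting are hypothetical choices.

```python
from typing import Any, Dict, List


def clowder_to_keyword_cards(content: Dict[str, Any], skill_id: str,
                             invocation_id: str) -> List[dict]:
    """Turn each top-level field of Clowder metadata content into one Box keyword card."""
    cards = []
    for key, value in content.items():
        # Lists become multiple entries; scalars become a single entry.
        values = value if isinstance(value, list) else [value]
        cards.append({
            "type": "skill_card",
            "skill_card_type": "keyword",
            "skill_card_title": {"code": key, "message": key.replace("_", " ").title()},
            "skill": {"type": "service", "id": skill_id},
            "invocation": {"type": "skill_invocation", "id": invocation_id},
            "entries": [{"text": str(v)} for v in values],
        })
    return cards
```

Nested or typed metadata, such as timelines or transcripts, would need richer card types. That is a large part of why this translation is listed as work to be developed.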

Tuned approach: Potential bottlenecks for massive scaling:

  1. Fence
  2. RabbitMQ
  3. Extractors

Notes:

Everything apart from RabbitMQ is stateless and can be scaled horizontally.

Standard approach: Potential bottlenecks for massive scaling:

  1. Fence
  2. RabbitMQ
  3. Extractors
  4. BoxClient
  5. Clowder
  6. MongoDB

Notes:

We can't rely on threading in the BoxClient to do the polling since we would risk running out of threads.
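One way to poll many extractions without a thread per job is cooperative scheduling on a single event loop. The sketch below uses Python's asyncio with exponential backoff. `check_status` is a hypothetical coroutine standing in for whatever status call bd.py or Fence exposes. The default intervals are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical status call: returns the job's metadata when done, None while running.
StatusCheck = Callable[[str], Awaitable[Optional[dict]]]


async def wait_for_result(job_id: str, check_status: StatusCheck,
                          interval: float, max_interval: float) -> dict:
    """Poll one job with exponential backoff; sleeping yields to other jobs instead of holding a thread."""
    delay = interval
    while True:
        result = await check_status(job_id)
        if result is not None:
            return result
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_interval)


async def poll_all(job_ids: List[str], check_status: StatusCheck,
                   interval: float = 1.0, max_interval: float = 60.0) -> Dict[str, dict]:
    """Poll every job concurrently on one event loop, i.e. a single OS thread."""
    results = await asyncio.gather(
        *(wait_for_result(j, check_status, interval, max_interval) for j in job_ids)
    )
    return dict(zip(job_ids, results))
```

In a long-running BoxClient, new invocations would arrive while the loop is running. They would be started as they come in (e.g. with `asyncio.create_task`) rather than gathered up front as here.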

Would need to add some endpoints to Fence

Would need to deploy a new service and proxy it behind Apache.

BoxClient would need to be allocated a service account and handle Brown Dog tokens
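If Brown Dog tokens have a limited lifetime, the BoxClient could cache its service-account token and refresh it shortly before it expires. This is a minimal sketch. `fetch_token` is a hypothetical callable that authenticates the service account and returns the token together with its lifetime in seconds. The 60-second margin is an arbitrary choice.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a service-account token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], Tuple[str, float]],
                 margin: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_token  # hypothetical: returns (token, lifetime_seconds)
        self._margin = margin      # refresh this many seconds before expiry
        self._clock = clock        # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one if none is cached or it is about to expire."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```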

Tuned approach: Limited error logging and reporting

Unsure how retries would work if an extraction fails

Standard approach: Eventually we could create an app where users log in with their Box credentials to see the history of their extractions

Errors can be reported in Clowder (not visible to the Box user)

The BoxClient could potentially retry
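If the BoxClient does retry, a bounded retry with exponential backoff is one simple option. This is a sketch only. `action` would be a hypothetical resubmission of the file to Brown Dog. Catching every exception is a simplification, because in practice only failures known to be transient should be retried.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying with exponential backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x ... base_delay
    raise AssertionError("unreachable")  # the loop always returns or raises
```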

...