...

Finally, the file is marked as PROCESSED; the user should once again be given the option to Archive the file, and requests to download the file bytes should succeed.

Automatic File Archival

If configured (see below), Clowder can automatically archive files of sufficient size after a predetermined period of inactivity. 

By default, files that are over 1MB and have not been downloaded in the last 90 days will be automatically archived.

Both the file size and the inactivity period can be configured according to your preferences.


Configuration Options / Defaults for Clowder

...

With the RabbitMQ plugin enabled, the following defaults are configured in application.conf, but can be overridden by using a custom.conf file:

Configuration Path | Default | Description
archiveEnabled | false | If true, Clowder should perform a lookup once per day to see if any files are candidates for archival.
archiveDebug | false | If true, Clowder should temporarily use "5 minutes" as the archive check interval (instead of once per day). In addition, it only considers candidate files that were uploaded in the past hour.
archiveExtractorId | "ncsa.archival.disk" | The id of the Extractor to use for archival:
  • Use ncsa.archival.disk for DiskByteStorageDriver
  • Use ncsa.archival.s3 for S3ByteStorageDriver
archiveAllowUnarchive | false | If true, the UI should offer a way to Unarchive a file that is ARCHIVED.
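For example, a minimal custom.conf that enables archival for the disk driver and allows unarchiving from the UI might contain only these overrides (a sketch using the options above; all other defaults are left in place):

Code Block
archiveEnabled=true
archiveExtractorId="ncsa.archival.disk"
archiveAllowUnarchive=true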

Automatic File Archival

By default, this behavior is disabled. Once enabled, the default values cause files that are over 1MB and have not been downloaded in the last 90 days to be automatically archived. Both the file size threshold and the inactivity period can be configured via the options below.

Configuration Path | Default | Description
archiveAutoInterval | 0 | If == 0, disable automatic archiving. If > 0, check every interval seconds for candidates for automatic archival.
archiveAutoDelay | 120 | Number of seconds to wait before starting the first iteration of the automatic archival loop.
archiveAutoAfterInactiveCount | 90 | Number of units a file can go un-downloaded before it is considered "inactive".
archiveAutoAfterInactiveUnits | days | The units for the inactivity timeout above (e.g. "90 days").
archiveAutoAboveMinimumStorageSize | 1000000 | The minimum number of bytes for a file to be considered as a candidate for automatic archival.
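Taken together, these settings amount to a simple candidacy predicate. As a rough sketch (the function name and signature are illustrative, not Clowder's actual code), a file becomes a candidate when it exceeds the minimum size and its last download predates the inactivity cutoff:

```python
from datetime import datetime, timedelta

def is_auto_archive_candidate(size_bytes, last_download, now,
                              inactive_count=90, inactive_units="days",
                              min_size=1_000_000):
    # Mirrors the archiveAutoAfterInactiveCount / archiveAutoAfterInactiveUnits /
    # archiveAutoAboveMinimumStorageSize defaults described in the table above.
    cutoff = now - timedelta(**{inactive_units: inactive_count})
    return size_bytes > min_size and last_download < cutoff
```

The check itself runs on the schedule set by archiveAutoDelay (first run) and archiveAutoInterval (every run thereafter).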


ncsa.archival.disk

This image has been pre-built as clowder/extractors-archival-disk.

(Optional) Building the Image

To build the Disk archival extractor's Docker image, execute the following commands:

...

The following configuration options must match your configuration of the DiskByteStorageDriver:

Environment Variable | Command-Line Flag | Default Value | Description
ARCHIVE_SOURCE_DIRECTORY | --archive-source | $HOME/clowder/data/uploads/ | The current directory where Clowder stores its uploaded files.
ARCHIVE_TARGET_DIRECTORY | --archive-target | $HOME/clowder/data/archive/ | The target directory where the archival extractor should store the files that it archives. Note that this path can be on a network mount or other persistent storage.
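For example (a sketch; both paths are illustrative and must match your Clowder deployment), the pre-built image could be started with these variables overridden:

Code Block
docker run --net=host -itd -e ARCHIVE_SOURCE_DIRECTORY='/home/clowder/clowder/data/uploads/' -e ARCHIVE_TARGET_DIRECTORY='/mnt/archive/' clowder/extractors-archival-disk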

Example Configuration: Archive to another folder

...

Code Block
# storage driver
service.byteStorage=services.filesystem.DiskByteStorageService

# disk storage path
#clowder.diskStorage.path="/Users/lambert8/clowder/data"    # MacOSX
clowder.diskStorage.path="/home/clowder/clowder/data"      # Linux

# disk archival settings
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.disk"
archiveAllowUnarchive=true

archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000

To run the Disk archival extractor with this configuration:

...

NOTE 2: on MacOSX, you may need to run the extractor with the --net=host option to connect to RabbitMQ.

ncsa.archival.s3

This image has been pre-built as clowder/extractors-archival-s3.

(Optional) Building the Image

To build the S3 archival extractor's Docker image, execute the following commands:

...

The following configuration options must match your configuration of the S3ByteStorageDriver:

Environment Variable | Command-Line Flag | Default Value | Description
AWS_S3_SERVICE_ENDPOINT | --service-endpoint <value> | https://s3.amazonaws.com | Which AWS Service Endpoint to use to connect to S3. Note that this may depend on the region used, but can also be used to point at a running MinIO instance.
AWS_ACCESS_KEY | --access-key <value> | "" | The AccessKey that should be used to authorize with AWS or MinIO.
AWS_SECRET_KEY | --secret-key <value> | "" | The SecretKey that should be used to authorize with AWS or MinIO.
AWS_BUCKET_NAME | --bucket-name <value> | clowder-archive | The name of the bucket where Clowder stores the files.
AWS_REGION | --region <value> | us-east-1 | AWS only: the region where the S3 bucket exists.
AWS_ARCHIVED_STORAGE_CLASS | --archived-storage-class <value> | INTELLIGENT_TIERING | The S3 StorageClass to set for objects that are ARCHIVED.
AWS_UNARCHIVED_STORAGE_CLASS | --unarchived-storage-class <value> | STANDARD | The S3 StorageClass to set for objects that are not archived (aka PROCESSED).
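Since archival here amounts to rewriting an object's storage class, a rough sketch of the S3 request such an extractor might issue (the helper name is illustrative, not the extractor's actual code) is an in-place copy, which boto3 supports via copy_object:

```python
def archive_copy_args(bucket, key, storage_class="INTELLIGENT_TIERING"):
    # Build kwargs for an in-place S3 copy that rewrites the object's
    # storage class while keeping its existing metadata, e.g.:
    #   s3.copy_object(**archive_copy_args("clowder-archive", "uploads/abc123"))
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "StorageClass": storage_class,
        "MetadataDirective": "COPY",
    }
```

Unarchiving would be the same call with the AWS_UNARCHIVED_STORAGE_CLASS value (STANDARD by default).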

Example Configuration: S3 on AWS in us-east-2 Region

...

Code Block
# storage driver
service.byteStorage=services.s3.S3ByteStorageService

# AWS S3
clowder.s3.serviceEndpoint="https://s3-us-east-2.amazonaws.com"
clowder.s3.accessKey="AWSACCESSKEYKASOKD"
clowder.s3.secretKey="aWSseCretKey+asAfasf90asdASDADAOaisdoas"
clowder.s3.bucketName="bucket-on-aws"
clowder.s3.region="us-east-2"

# S3 archival settings
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.s3"
archiveAllowUnarchive=true

archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000

NOTE: Changing the Region typically requires changing the S3 Service Endpoint.

...

Code Block
# storage driver
service.byteStorage=services.s3.S3ByteStorageService

# Minio S3
clowder.s3.serviceEndpoint="http://localhost:8000"
clowder.s3.accessKey="AMINIOACCESSKEYKASOKD"
clowder.s3.secretKey="aMinIOseCretKey+asAfasf90asdASDADAOaisdoas"
clowder.s3.bucketName="bucket-on-minio"

# S3 archival settings  
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.s3"
archiveAllowUnarchive=true

archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000

NOTE: MinIO ignores the value for "Region", if one is specified.

...

Code Block
docker run --net=host -itd -e AWS_S3_SERVICE_ENDPOINT='http://localhost:8000' -e AWS_ACCESS_KEY='AMINIOACCESSKEYKASOKD' -e AWS_SECRET_KEY='aMinIOseCretKey+asAfasf90asdASDADAOaisdoas' -e AWS_BUCKET_NAME='bucket-on-minio' -e AWS_ARCHIVED_STORAGE_CLASS='REDUCED_REDUNDANCY' clowder/extractors-archival-s3

...

Code Block
clowder.rabbitmq.uri="amqp://guest:guest@<PRIVATE IP>:5672/%2F"
clowder.rabbitmq.exchange="clowder"
clowder.rabbitmq.clowderurl="http://<PRIVATE IP>:9000"

Gotcha: extractor complains about Python's built-in Thread.isAlive(), and dies quickly after starting

pyclowder has an open issue regarding a minor incompatibility with Python 3.9, which removed the deprecated Thread.isAlive() alias in favor of Thread.is_alive().
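Until that issue is resolved, a common workaround (a suggestion here, not an official pyclowder fix) is to restore the removed alias at startup, before the extractor begins using threads:

```python
import threading

# Python 3.9 removed the deprecated Thread.isAlive() alias; libraries that
# still call it can be unblocked by restoring it as an alias of is_alive().
if not hasattr(threading.Thread, "isAlive"):
    threading.Thread.isAlive = threading.Thread.is_alive
```

On Python 3.8 and earlier the hasattr check is true, so the shim is a no-op.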

...