...
Finally, the file is marked as PROCESSED; the user should once again be given the option to Archive the file, and requests to download the file bytes should succeed.
Automatic File Archival
If configured (see below), Clowder can automatically archive files of sufficient size after a predetermined period of inactivity.
By default, files that are over 1MB and have not been downloaded in the last 90 days will be automatically archived.
Both the file size and the inactivity period can be configured according to your preferences.
Configuration Options / Defaults for Clowder
...
With the RabbitMQ plugin enabled, the following defaults are configured in application.conf, but can be overridden by using a custom.conf file:
Configuration Path | Default | Description |
---|---|---|
archiveEnabled | false | If true, Clowder performs a lookup once per day to see if any files uploaded in the past hour are candidates for archival |
archiveDebug | false | |
archiveExtractorId | "ncsa.archival.disk" | The id of the Extractor to use for archival |
archiveAllowUnarchive | false | If true, the UI should offer a way to Unarchive a file that is ARCHIVED |
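For example, a custom.conf that overrides these defaults to enable archival and permit unarchiving from the UI might contain (illustrative values, built from the options above):

```
# custom.conf -- example overrides (illustrative values)
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.disk"
archiveAllowUnarchive=true
```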
Automatic File Archival
If configured (see below), Clowder can automatically archive files of sufficient size after a predetermined period of inactivity.
By default, this behavior is disabled.
With the feature enabled, the default values cause files that are over 1MB and have not been downloaded in the last 90 days to be automatically archived.
Both the file size and the inactivity period can be configured according to your preferences.
Configuration Path | Default | Description |
---|---|---|
archiveAutoInterval | 0 | If 0, automatic archiving is disabled. If > 0, the interval, in seconds, between checks for archival candidates. |
archiveAutoDelay | 120 | Number of seconds to wait before starting the first iteration of the automatic archival loop. |
archiveAutoAfterInactiveCount | 90 | Number of units a file can go without being downloaded before it is considered "inactive". |
archiveAutoAfterInactiveUnits | days | The units for the inactivity timeout above (e.g. "90 days" old) |
archiveAutoAfterDaysInactive | 90 | The number of days that an item can go without being downloaded before it is automatically archived. |
archiveAutoAboveMinimumStorageSize | 1000000 | The minimum number of bytes for a file to be considered as a candidate for automatic archival. |
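Putting the auto-archival settings together, candidate selection works roughly as follows. This is a standalone sketch using the defaults from the table above, not Clowder's actual code; the function name is hypothetical:

```python
from datetime import datetime, timedelta

# Defaults mirroring the table above (size in bytes, inactivity in days)
MIN_SIZE_BYTES = 1_000_000          # archiveAutoAboveMinimumStorageSize
INACTIVE_COUNT = 90                 # archiveAutoAfterInactiveCount
INACTIVE_UNITS = timedelta(days=1)  # archiveAutoAfterInactiveUnits = "days"

def is_archival_candidate(size_bytes: int, last_download: datetime,
                          now: datetime) -> bool:
    """True if a file is both large enough and inactive long enough to auto-archive."""
    inactive_for = now - last_download
    return (size_bytes > MIN_SIZE_BYTES
            and inactive_for > INACTIVE_COUNT * INACTIVE_UNITS)

now = datetime(2024, 1, 1)
# A 2MB file untouched for ~6 months is a candidate
print(is_archival_candidate(2_000_000, now - timedelta(days=180), now))  # True
# A small file is never auto-archived, regardless of age
print(is_archival_candidate(500_000, now - timedelta(days=180), now))    # False
```

Note that both conditions must hold: a large file that is still being downloaded regularly stays in place.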
ncsa.archival.disk
This image has been pre-built as clowder/extractors-archival-disk.
(Optional) Building the Image
To build the Disk archival extractor's Docker image, execute the following commands:
...
The following configuration options must match your configuration of the DiskByteStorageDriver:
Environment Variable | Command-Line Flag | Default Value | Description |
---|---|---|---|
ARCHIVE_SOURCE_DIRECTORY | --archive-source | $HOME/clowder/data/uploads/ | The current directory where Clowder stores its uploaded files |
ARCHIVE_TARGET_DIRECTORY | --archive-target | $HOME/clowder/data/archive/ | The target directory where the archival extractor should store the files that it archives. Note that this path can be on a network or other persistent storage. |
Example Configuration: Archive to another folder
...
Code Block
# storage driver
service.byteStorage=services.filesystem.DiskByteStorageService

# disk storage path
#clowder.diskStorage.path="/Users/lambert8/clowder/data"  # MacOSX
clowder.diskStorage.path="/home/clowder/clowder/data"     # Linux

# disk archival settings
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.disk"
archiveAllowUnarchive=true
archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000
To run the Disk archival extractor with this configuration:
...
NOTE 2: on MacOSX, you may need to run the extractor with the --net=host option to connect to RabbitMQ.
ncsa.archival.s3
This image has been pre-built as clowder/extractors-archival-s3.
(Optional) Building the Image
To build the S3 archival extractor's Docker image, execute the following commands:
...
The following configuration options must match your configuration of the S3ByteStorageDriver:
Environment Variable | Command-Line Flag | Default Value | Description |
---|---|---|---|
AWS_S3_SERVICE_ENDPOINT | --service-endpoint <value> | https://s3.amazonaws.com | Which AWS Service Endpoint to use to connect to S3. Note that this may depend on the region used, but can also be used to point at a running MinIO instance. |
AWS_ACCESS_KEY | --access-key <value> | "" | The AccessKey that should be used to authorize with AWS or MinIO |
AWS_SECRET_KEY | --secret-key <value> | "" | The SecretKey that should be used to authorize with AWS or MinIO |
AWS_BUCKET_NAME | --bucket-name <value> | clowder-archive | The name of the bucket where the files are stored in Clowder. |
AWS_REGION | --region <value> | us-east-1 | AWS only: the region where the S3 bucket exists |
AWS_ARCHIVED_STORAGE_CLASS | --archived-storage-class <value> | INTELLIGENT_TIERING | The S3 StorageClass to set for objects that are ARCHIVED. |
AWS_UNARCHIVED_STORAGE_CLASS | --unarchived-storage-class <value> | STANDARD | The S3 StorageClass to set for objects that are not archived (aka PROCESSED). |
Example Configuration: S3 on AWS in us-east-2 Region
...
Code Block
# storage driver
service.byteStorage=services.s3.S3ByteStorageService

# AWS S3
clowder.s3.serviceEndpoint="https://s3-us-east-2.amazonaws.com"
clowder.s3.accessKey="AWSACCESSKEYKASOKD"
clowder.s3.secretKey="aWSseCretKey+asAfasf90asdASDADAOaisdoas"
clowder.s3.bucketName="bucket-on-aws"
clowder.s3.region="us-east-2"

# S3 archival settings
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.s3"
archiveAllowUnarchive=true
archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000
NOTE: Changing the Region typically requires changing the S3 Service Endpoint.
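The regional endpoint embeds the region name, so the two settings must agree. A quick illustration, assuming the endpoint pattern used in the example above:

```python
def s3_endpoint(region: str) -> str:
    """Build the regional S3 service endpoint matching the example configuration."""
    return f"https://s3-{region}.amazonaws.com"

print(s3_endpoint("us-east-2"))  # https://s3-us-east-2.amazonaws.com
```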
...
Code Block
# storage driver
service.byteStorage=services.s3.S3ByteStorageService

# MinIO S3
clowder.s3.serviceEndpoint="http://localhost:8000"
clowder.s3.accessKey="AMINIOACCESSKEYKASOKD"
clowder.s3.secretKey="aMinIOseCretKey+asAfasf90asdASDADAOaisdoas"
clowder.s3.bucketName="bucket-on-minio"

# S3 archival settings
archiveEnabled=true
archiveDebug=false
archiveExtractorId="ncsa.archival.s3"
archiveAllowUnarchive=true
archiveAutoInterval=86400
archiveAutoDelay=300
archiveAutoAfterInactiveCount=90
archiveAutoAfterInactiveUnits=days
archiveAutoAboveMinimumStorageSize=1000000
NOTE: MinIO ignores the value for "Region", if one is specified.
...
Code Block
docker run --net=host -itd \
  -e AWS_S3_SERVICE_ENDPOINT='http://localhost:8000' \
  -e AWS_ACCESS_KEY='AMINIOACCESSKEYKASOKD' \
  -e AWS_SECRET_KEY='aMinIOseCretKey+asAfasf90asdASDADAOaisdoas' \
  -e AWS_BUCKET_NAME='bucket-on-minio' \
  -e AWS_ARCHIVED_STORAGE_CLASS='REDUCED_REDUNDANCY' \
  clowder/extractors-archival-s3
...
Code Block
clowder.rabbitmq.uri="amqp://guest:guest@<PRIVATE IP>:5672/%2F"
clowder.rabbitmq.exchange="clowder"
clowder.rabbitmq.clowderurl="http://<PRIVATE IP>:9000"
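The %2F at the end of the URI is the URL-encoded default RabbitMQ vhost "/". If you use a different vhost, it must be percent-encoded the same way, for example:

```python
from urllib.parse import quote

# RabbitMQ's default vhost is "/", which must be percent-encoded in an AMQP URI
vhost = quote("/", safe="")
print(vhost)  # %2F

host = "<PRIVATE IP>"  # placeholder, as in the snippet above
uri = f"amqp://guest:guest@{host}:5672/{vhost}"
print(uri)  # amqp://guest:guest@<PRIVATE IP>:5672/%2F
```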
Gotcha: extractor complains about Python's built-in Thread.isAlive(), and dies quickly after starting
pyclowder has an open issue regarding a minor incompatibility with Python 3.9.
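The underlying cause is that Python 3.9 removed the long-deprecated camelCase alias Thread.isAlive() in favor of Thread.is_alive(), so any code still calling the old name crashes with an AttributeError. A minimal check:

```python
import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()

# is_alive() exists on every supported Python version
print(t.is_alive())  # False (the thread has finished)

# The camelCase alias was removed in Python 3.9, so this prints False there
print(hasattr(t, "isAlive"))
```

Until the fix lands in a pyclowder release, running the extractor under Python 3.8 avoids the crash.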
...