These design notes describe how to expose the National Bridge Inventory (NBI) data to Workbench users.
See also Shared data directories
To download and convert the data:
git clone https://github.com/kaleoyster/ProjectNBI
python ./nbiCsvJsonConverter-2/Downloadv1.py
This creates the directory NBIDATA, which contains the raw data.
python3 ./nbiCsvJsonConverter-2/ProcessMain.py
This converts the raw data to JSON.
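For repeatability, the three steps can be replayed from a small driver that stops at the first failing command. This is a sketch, not part of ProjectNBI: it assumes both scripts are run from the root of the clone (`git clone` creates a `ProjectNBI` directory by default) and that `Downloadv1.py` creates `NBIDATA` in that working directory. Neither assumption is confirmed by these notes.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/kaleoyster/ProjectNBI"


def build_steps() -> list[list[str]]:
    """Return the documented commands, in order, as argv lists."""
    return [
        ["git", "clone", REPO_URL],
        ["python", "./nbiCsvJsonConverter-2/Downloadv1.py"],   # creates NBIDATA
        ["python3", "./nbiCsvJsonConverter-2/ProcessMain.py"],  # converts to JSON
    ]


def run_pipeline(workdir: Path) -> Path:
    """Clone, download, and convert; check=True raises on any non-zero exit."""
    clone_cmd, *script_cmds = build_steps()
    subprocess.run(clone_cmd, cwd=workdir, check=True)
    # Assumption: the scripts expect to run from the repository root.
    repo_dir = workdir / "ProjectNBI"
    for argv in script_cmds:
        subprocess.run(argv, cwd=repo_dir, check=True)
    # Assumption: Downloadv1.py writes NBIDATA into its working directory.
    data_dir = repo_dir / "NBIDATA"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"expected raw data in {data_dir}")
    return data_dir


if __name__ == "__main__":
    print(run_pipeline(Path.cwd()))
```

Keeping the interpreter names exactly as documented (`python`, then `python3`) avoids silently running a script under a different Python version than the one it was written for.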
Initially, we can add the shared directory to Gluster and either transfer the data into it or download the data there directly.
For this project, the raw data is probably less important than the MongoDB data, which raises an interesting question about data sharing.
Should we set up a Globus endpoint or use Santiago?
We will host the raw data under /shared/NBIDATA/ as a read-only volume.
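A minimal sketch of the read-only mount, assuming Workbench runs on Kubernetes and the Gluster-backed shared directory is visible on every node at /shared/NBIDATA, so it can be exposed to pods through a `hostPath` volume. The volume name `nbidata` and the container names are hypothetical. The helpers build pod-spec fragments as plain dicts; setting `readOnly: true` on the container's `volumeMounts` entry is what makes the data read-only inside the container.

```python
import json

# Assumption: the shared Gluster directory is mounted on each node at this path.
NBIDATA_PATH = "/shared/NBIDATA"


def nbidata_volume() -> dict:
    """Pod-level volume pointing at the node's copy of the shared directory."""
    return {
        "name": "nbidata",  # hypothetical volume name
        "hostPath": {"path": NBIDATA_PATH, "type": "Directory"},
    }


def nbidata_mount() -> dict:
    """Container-level mount; readOnly prevents users from modifying the raw data."""
    return {"name": "nbidata", "mountPath": NBIDATA_PATH, "readOnly": True}


def add_nbidata(pod_spec: dict) -> dict:
    """Attach the volume to a pod spec and mount it read-only in every container."""
    pod_spec.setdefault("volumes", []).append(nbidata_volume())
    for container in pod_spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(nbidata_mount())
    return pod_spec


if __name__ == "__main__":
    # Hypothetical user container, shown only to illustrate the resulting spec.
    spec = add_nbidata({"containers": [{"name": "notebook", "image": "example/notebook"}]})
    print(json.dumps(spec, indent=2))
```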
We will also host a MongoDB instance containing the official database in a shared namespace (tentatively "public").
Users can reach it at nbidata.public.
This will require running a process to ingest the converted JSON data into MongoDB.
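The ingest process could look like the following sketch. It assumes the converter writes newline-delimited JSON (one record per line) into `*.json` files, that `nbidata.public` is a Kubernetes Service exposing MongoDB on its default port 27017, and it uses hypothetical database and collection names (`nbi`, `bridges`). `pymongo` is imported only inside `ingest()`, so the file-reading helpers work without it.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, Iterator, List

# Assumption: "nbidata" Service in the "public" namespace, MongoDB default port.
DEFAULT_URI = "mongodb://nbidata.public:27017/"


def iter_records(paths: Iterable[Path]) -> Iterator[dict]:
    """Yield one record per non-blank line (assumes newline-delimited JSON)."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)


def batched(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group records into lists of at most `size` items for insert_many."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def ingest(data_dir: str, uri: str = DEFAULT_URI, db_name: str = "nbi",
           collection: str = "bridges", pattern: str = "*.json",
           batch_size: int = 1000) -> int:
    """Load every matching file under data_dir into MongoDB; return the record count."""
    from pymongo import MongoClient  # deferred so the helpers above need no pymongo

    files = sorted(Path(data_dir).glob(pattern))
    client = MongoClient(uri)
    try:
        coll = client[db_name][collection]
        total = 0
        for batch in batched(iter_records(files), batch_size):
            coll.insert_many(batch)
            total += len(batch)
        return total
    finally:
        client.close()


if __name__ == "__main__":
    # Usage: python ingest_nbi.py <directory-of-converted-json>
    print(ingest(sys.argv[1]))
```

Rerunning `ingest()` appends a second copy of every record, so a production job would need to drop the collection first or upsert on a stable record key.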
Services in the "public" namespace should not be subject to timeouts.
MongoDB must remain accessible from all namespaces, even after we apply network security policies.
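If the cluster enforces Kubernetes NetworkPolicy, a policy along these lines would keep MongoDB reachable from every namespace. It is a sketch: the policy name and the pod label `app: nbidata` are hypothetical, and 27017 is MongoDB's default port. The manifest is built as a Python dict and printed as JSON, which `kubectl apply` accepts.

```python
import json


def nbidata_network_policy(namespace: str = "public", app_label: str = "nbidata",
                           port: int = 27017) -> dict:
    """NetworkPolicy admitting MongoDB traffic from pods in every namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-nbidata-from-all-namespaces", "namespace": namespace},
        "spec": {
            # Hypothetical label selecting the MongoDB pod(s) behind nbidata.public.
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # An empty namespaceSelector matches pods in all namespaces.
                "from": [{"namespaceSelector": {}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(nbidata_network_policy(), indent=2))
```

Usage: `python nbidata_policy.py | kubectl apply -f -` (the file name is hypothetical). NetworkPolicies are additive allow-rules, so this policy still admits Mongo traffic when a default-deny policy is later added to the same namespace; policies only take effect if the cluster's network plugin enforces them.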
Should we host a record describing this dataset?
How do we handle data citation and identifiers?
Is this a different version of the dataset?
How do we control access? This dataset raises no access concerns, but future datasets may.
Will someone still be able to use this in 5-10 years?
How do we upgrade MongoDB?
Is this active data or a live database?
Metadata via Globus?
We need to point to FHWA and the GitHub repo.
The GitHub repo needs to be tagged and versioned.
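One possible shape for a dataset record, touching the citation, provenance, and versioning questions above, is sketched below as a Python dataclass that round-trips through JSON. The field names are hypothetical and do not follow any metadata standard; a real record might instead adopt an established schema such as DataCite. The database URI assumes MongoDB's default port, and the tag and date are deliberately left as placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical descriptive record for a hosted copy of a dataset."""
    title: str
    publisher: str            # original data provider
    code_repository: str      # code used to download and convert the data
    code_version: str         # git tag of code_repository used for this copy
    retrieved: str            # ISO 8601 date the raw data was downloaded
    raw_data_path: str
    database_uri: str
    identifier: Optional[str] = None  # e.g. a DOI, once one is assigned

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DatasetRecord":
        return cls(**json.loads(text))


if __name__ == "__main__":
    record = DatasetRecord(
        title="National Bridge Inventory (NBI)",
        publisher="Federal Highway Administration (FHWA)",
        code_repository="https://github.com/kaleoyster/ProjectNBI",
        code_version="<tag>",      # placeholder: the repo is not yet tagged
        retrieved="<YYYY-MM-DD>",  # placeholder
        raw_data_path="/shared/NBIDATA/",
        database_uri="mongodb://nbidata.public:27017/",  # assumes default port
    )
    print(record.to_json())
```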