Overview
- https://www.datadrivenag.org/
- 26-27 February 2018
- Hackathons will run 6:00 - 7:30 pm
Tentative plan
- Day 1:
- Roman:
- Timeseries of plants in AZ
- Kinship data (1400 time series from AZ) and matrix (300 genotypes in fields), 85 measurements
- Need data from BETYdb (height histogram)
- Octave and Python scripts to get started
- David:
- SWIR prediction problem
- NREL training data
- As many spectra as we can get (we have 5)
- Daily measurements over the season – we may only have some made by Solmaz on one day
- 2500 solar spectra, 10nm resolution
- Day 2:
- Jack
- Commercial small satellite data (Planet Labs), licensed data
- Good temporal resolution coincides with sampling on field
- ~270 images
- Python notebooks for exploratory analysis
- QGIS for satellite data (ideally using geoserver or similar for desktop access)
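To make the Day 1 exercise concrete, here is a minimal Python sketch of the "height histogram" idea, in the spirit of the starter scripts mentioned above. Synthetic data stands in for the actual BETYdb export, and the column names are assumptions, not the real schema:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a BETYdb height export (the real data is the AZ
# field time series); column names here are assumptions.
rng = np.random.default_rng(0)
heights = pd.DataFrame({
    "genotype": rng.integers(1, 301, size=1400),   # ~300 genotypes, 1400 series
    "height_cm": rng.normal(120.0, 25.0, size=1400),
})

# The "height histogram": bin plant heights across all genotypes.
counts, edges = np.histogram(heights["height_cm"], bins=20)
print(len(counts), int(counts.sum()))  # 20 1400
```

The same binning works unchanged on a real per-plot height column once the BETYdb data is exported.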
Workbench requirements
- SDSC Workbench supporting up to 50 users (2 core x GB RAM)
- www.workshop1.nationaldataservice.org (TLS/DNS)
- Data mounted via NFS /data/ and symlinked to ~/data for all containers
- Images should have tutorial materials checked out in advance
- Shared data mounted via usual Gluster
- Disable approval
- No timeouts?
- Cloud9 upload size limits?
- Pre-load accounts
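The mount-and-symlink convention above (NFS-backed /data, symlinked to ~/data in every container) can be sketched in Python; a temp directory stands in for the real mount point and home directory:

```python
import os
import tempfile

# Stand-ins for the real paths: /data is the NFS mount, ~/data the symlink
# each container sees. A temp directory substitutes for both here.
root = tempfile.mkdtemp()
data_mount = os.path.join(root, "data")      # plays the role of /data
home = os.path.join(root, "home", "user")    # plays the role of ~
os.makedirs(os.path.join(data_mount, "shared"))
os.makedirs(home)

# The convention from the requirements: symlink ~/data -> /data.
link = os.path.join(home, "data")
os.symlink(data_mount, link)

# Files under the mount are reachable through the symlink.
open(os.path.join(data_mount, "shared", "README"), "w").close()
print(os.path.exists(os.path.join(link, "shared", "README")))  # True
```

The symlink keeps user-facing paths stable (~/data) even if the underlying mount point changes.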
Deliverables
Deliverable | Status | Notes
---|---|---
Workbench instance | Deployed master+gfs+1 at SDSC as https://www.datadrivenag.ndslabs.org; scaled up to 4 nodes (48 cores, 128G) |
Move DNS/TLS for workshop1.nationaldataservice.org | Done |
In-cluster BETY-db instance | Done |
Day 1/Roman: Timeseries data | Done |
Day 1/Roman: Kinship matrix | TBD |
Day 1/Roman: NREL data | Available under /data/shared/roman/nrel |
Day 1/Roman: Day 1 Octave script | In github |
Day 1/Roman: Day 1 Python notebook | In github |
Day 1/Roman: Octave environment | Done |
Day 1/Roman: Python environment | Done |
Day 1/David: Spectra | |
TERRA-REF subset | | Week of 6/18 Level_1: vnir_netcdf, laser3d_mergedlas, fullfield, rgb_geotiff
UAV data | Done |
Planet Labs data | Won't do |
Day 2 notebooks | Won't do |
Geoserver | Done |
Pre-load user accounts | Done |
What we should have working (2/26/2018)
- Main deliverables
- https://www.workshop1.nationaldataservice.org/
- Main workbench instance
- https://github.com/datadrivenag
- https://www.workshop1.nationaldataservice.org/
- DNS pointing to 132.249.238.220 with valid TLS for www.workshop1.nationaldataservice.org
- Nagios monitoring http://132.249.238.214/nagios/
- workshop1-node1 - node10 + www.workshop1.nationaldataservice.org
- User accounts
- Scaled cluster
- 1 master + etcd
- 1 NFS server w/ 3TB volume
- 2 GLFS servers
- 10 compute nodes m1.2xlarge w/ 100GB docker volume each
- Customized UI
- Included NIFA/USDA logos
- Link to user guide
- Link to tutorial repo
- Description makes sense to DataDrivenAg users
- Link to contact us if they want to run their own hackathon
- Data available via Geoserver
- See https://github.com/datadrivenag/tutorials
- BETY db plot boundaries
- Sample of UAV data
- Sample of RGB fullfield data
- Example RGB pyramid
- Custom environments
- See https://github.com/datadrivenag/tutorials
- Cloud9 with Octave/GCC, etc
- Jupyter with Octave kernel
- RStudio geospatial
- PostgreSQL Studio
- XPRA
- Sample data
- Mounted readonly in all containers under /data/
- UAV:
- sites/ua-mac/Level_1/uav
- Some available via geoserver (above)
- Available via direct HTTP, in case needed
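For the Geoserver-backed data above, layers are fetched with standard OGC WMS requests. A sketch of building a GetMap URL follows; the layer name and bounding box are placeholders, not the real published names (those live in the tutorials repo):

```python
from urllib.parse import urlencode

# Sketch of a WMS GetMap request against the workshop Geoserver. The layer
# name and bounding box below are hypothetical placeholders.
base = "https://www.workshop1.nationaldataservice.org/geoserver/wms"
params = {
    "service": "WMS",
    "version": "1.1.1",
    "request": "GetMap",
    "layers": "datadrivenag:rgb_fullfield",   # hypothetical layer name
    "bbox": "-111.98,33.07,-111.97,33.08",    # placeholder lon/lat extent
    "srs": "EPSG:4326",
    "width": "512",
    "height": "512",
    "format": "image/png",
}
url = base + "?" + urlencode(params)
print("request=GetMap" in url)  # True
```

The same URL pattern is what QGIS constructs under the hood when a WMS layer is added, so it doubles as a connectivity check.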
Setup and support log
- 2/6/2018
- Initial conference call with organizers
- 2/9/2018
- Sent new estimates reflecting the increased scope from the organizers' meeting
- Confirmed resource availability at SDSC
- 2/12/2018
- Decided to go forward with an instance at SDSC
- Began provisioning instance. This workshop is special since we'll be setting up an external NFS server and transferring ~2+TB of sample data for users
- Created NFS instance manually (in hindsight, should've just made it a labeled node) and basic master+GFS (2) + node (1) Kubernetes cluster
- Set up a local Globus personal endpoint on the NFS server to handle transfers from the TERRA-REF endpoint
- 2/13/2018
- Set up the datadrivenag.slack.com and github.com/datadrivenag organizations
- 2/15/2018
- Extended cloud9all image to include gcc and octave
- After some coordination with organizers, began transfer of sample data (~1 week duration). Babysat transfers, troubleshot a performance problem, and resolved permission and other issues
- 2/16/2018
- Basic Geoserver configuration to run under Kubernetes
- Requested DNS change
- Began investigating how to set up and scale Geoserver (Geoserver on Steroids – https://www.slideshare.net/geosolutions/geoserver-on-steroids)
- Updated Jupyter image to include octave kernel, provided sample notebook for Roman's plant height estimation
- Had to scale up NFS volume size from 2TB to 3TB to support RGB, LAS and VNIR data
- 2/21 - 2/22/2018
- Began tracking down UAV data. Loaded UAV data into endpoint and Geoserver
- Had to transfer via Drive due to lack of access to Globus endpoint
- 2/23/2018
- Scaled up cluster to 4 nodes (48 cores) with monitoring
- Confirmed Geoserver can indeed scale horizontally (scaled the RC), but this appears to cause problems with UI sessions?
- Created simple NGINX server to allow users to browse data for download
- 2/24/2018
- Had to troubleshoot networking issue (flannel subnet bug)
- Fixed Jupyter image dependency problems for terrautils
- Tried to create example code to get sensor files by plot (sensorquery API)
- Updated github repo, documentation, created videos for Workbench, QGIS and Xpra
- etcd2 error on master suggests possible problems with slow filesystem at SDSC.