...

For hyperspectral extractor development, the hyperspectral data is too large to move, so the researcher or developer working on related algorithms needs to work on the data in place. This is done via a NetCDF/NCO-specific development environment, but could just as easily be Jupyter or RStudio. If the researcher's needs exceed container constraints, they can apply for dedicated cloud resources via OpenStack. They also have access to run jobs on ROGER via PBS. This is similar to CyVerse.
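As a rough illustration of the PBS route, the sketch below generates a batch script that runs an NCO command against data in place. The job name, resource requests, file paths, and the `reflectance` variable are hypothetical placeholders, not values from the actual ROGER setup; only the `#PBS` directives and the `ncks -v` variable-subsetting form are standard.

```python
def pbs_script(job_name, command, nodes=1, ppn=4, walltime="01:00:00"):
    """Return the text of a PBS batch script that runs `command`.

    The resource values are illustrative defaults, not ROGER policy.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time (HH:MM:SS)
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: extract one variable from a large NetCDF file
# without copying the full dataset off the shared filesystem.
script = pbs_script(
    "hyperspectral-subset",
    "ncks -v reflectance /path/to/input.nc subset.nc",
)
print(script)
```

The resulting text would be saved to a file and submitted with `qsub`.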

...

Fruend Case (CSE UCSD, Kevin)

  • Data analysis using Jupyter Notebooks and Spark for the student Capstone Projects in the MAS Data Science and Engineering program.
  • Launches Spark clusters using Elastic MapReduce (EMR) on AWS.
    • When the EMR cluster is created, a bootstrap script is passed to the cluster to install and configure Jupyter.
  • Uses the Flask Python framework to launch a locally running web server to make it easy to configure AWS credentials and launch the EMR Cluster.
  • Students do work in the Jupyter notebooks by connecting to the EMR SparkContext using pyspark.
  • Python libraries are included that make it easy to copy data to/from S3, HDFS and the local filesystem on the Spark master.
  • https://github.com/mas-dse/spark-notebook
  • Does work in Jupyter notebooks
  • Launches Spark clusters with Jupyter, all the Python libraries, and access to the data
  • Runs on AWS
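
The cluster launch described in this list can be sketched roughly as follows. This is not the code from the spark-notebook repository; it is a minimal illustration, assuming boto3 is used to call EMR. The cluster name, bucket path, instance type, release label, and region are hypothetical, and the role names are the AWS default EMR roles.

```python
def build_emr_request(name, bootstrap_script_s3_path, instance_count=3,
                      instance_type="m4.large", release_label="emr-5.36.0"):
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    All default values here are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running so notebooks can stay connected.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # The bootstrap script runs on each node while the cluster is created;
        # here it stands in for the script that installs and configures Jupyter.
        "BootstrapActions": [{
            "Name": "Install and configure Jupyter",
            "ScriptBootstrapAction": {
                "Path": bootstrap_script_s3_path,
                "Args": [],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-west-2"):
    """Submit the request to EMR and return the new cluster's ID.

    Requires boto3 and configured AWS credentials; imported lazily so the
    request builder above can be used without either.
    """
    import boto3
    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


# Hypothetical bucket and script name.
request = build_emr_request(
    "capstone-spark", "s3://example-bucket/bootstrap-jupyter.sh"
)
```

Separating the request builder from the API call keeps the configuration inspectable before any AWS resources are created, which fits a small web front end that collects credentials and settings first.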