This page aggregates project ideas for Google Summer of Code 2021. Each of the projects listed below has one or more associated mentors. Please contact the mentors for more information about these projects. If you have general questions about NCSA and GSoC, please contact:

Galaxy images recommendation system

This project will create a recommendation system that suggests similar galaxies to the user while they view galaxy images. The main idea is to use a Convolutional Neural Network (CNN) and/or autoencoders to rank galaxies from a large database by their similarity to a query image. Some work on this project has already been done, and the mentor will provide all the necessary images and tools. Ideally, the result will be a standalone web application or an addition to DESaccess.
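Once a CNN or autoencoder maps each galaxy image to an embedding vector, the ranking step reduces to a nearest-neighbor search. The sketch below assumes those embeddings already exist (the encoder itself is not shown) and uses cosine similarity as the metric; both the metric and the function names are illustrative choices, not part of the existing project code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k database galaxies closest to the query."""
    scored = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(database)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

This brute-force scan is linear in the database size; for a large survey catalog, an index such as scikit-learn's `NearestNeighbors(metric="cosine")` or an approximate nearest-neighbor library would be the natural next step.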

Requirements

front-end - Polymer, HTML, JavaScript, Python, Tornado

back-end - Python, TensorFlow, scikit-learn, image manipulation, machine learning

Deliverable

A working plugin and recommendation system

Mentors

Links

Accelerating NEAT: multiprocessing next-gen sequence simulator

NEAT is a next-generation sequencing simulator widely used by the genomics community, but it currently runs on a single thread. Speeding up the program with multiprocessing will shorten run times and make larger simulations accessible to more users.
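One common way to parallelize a simulator like this is to split the genome into independent regions, simulate each region in a worker process, and merge the results in a fixed order. The sketch below shows that pattern with Python's `multiprocessing.Pool`; the region tuples and the toy `simulate_region` function are hypothetical stand-ins, not NEAT's actual data model or code.

```python
import random
from multiprocessing import Pool
from typing import List, Sequence, Tuple

# (chromosome, start, end): a hypothetical unit of work, not NEAT's data model.
Region = Tuple[str, int, int]
Read = Tuple[str, int, str]


def simulate_region(task: Tuple[str, int, int, int, int]) -> List[Read]:
    """Toy stand-in for per-region simulation: random bases at fixed-length steps."""
    chrom, start, end, seed, read_len = task
    rng = random.Random(seed)  # per-region seed: output does not depend on scheduling
    reads = []
    for pos in range(start, end - read_len + 1, read_len):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        reads.append((chrom, pos, seq))
    return reads


def simulate_parallel(regions: Sequence[Region], read_len: int = 10,
                      base_seed: int = 0, workers: int = 4) -> List[Read]:
    """Simulate every region in a process pool and concatenate the results."""
    tasks = [(c, s, e, base_seed + i, read_len) for i, (c, s, e) in enumerate(regions)]
    with Pool(processes=workers) as pool:
        # map() returns results in task order, so the merged output is deterministic.
        per_region = pool.map(simulate_region, tasks)
    return [read for reads in per_region for read in reads]


if __name__ == "__main__":
    out = simulate_parallel([("chr1", 0, 100), ("chr2", 0, 50)], workers=2)
    print(len(out), "reads")
```

Seeding each region independently keeps the parallel output identical to a sequential run, which makes it easy to verify that multiprocessing has not changed the simulator's results.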

Requirements

Python

Deliverable

Multiprocessing support for the next-gen sequence simulator

Mentors

Links

Continuous Integration for NEAT

The NEAT project continues to gain momentum with its recently released Python 3 version. Incorporating continuous integration will let us keep improving NEAT while maintaining its consistency.

Requirements

Knowledge of a CI platform such as Travis CI, Jenkins, or similar; Python.

Deliverable

Continuous integration in place for NEAT's future development

Mentors

Links

Comparison of Genomic Data Sets

A key utility for processing genetic datasets in VCF format is the VCF_compare tool, part of the Next-Gen Analysis Toolkit (NEAT). While useful, this tool is outdated and needs to be updated to run smoothly and efficiently on modern high-speed processors.
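A large part of a faster comparison usually comes from replacing repeated scans with hashed lookups. The sketch below keys each variant by `(CHROM, POS, REF, ALT)`, splits multi-allelic ALT fields into one key per allele, and compares the two files with set operations. This is an illustrative approach under simplifying assumptions (it ignores genotypes and FILTER, and does no indel normalization); it is not the existing VCF_compare algorithm.

```python
from typing import Dict, Iterable, Set, Tuple

# (CHROM, POS, REF, ALT) for a single allele.
VariantKey = Tuple[str, int, str, str]


def load_variants(lines: Iterable[str]) -> Set[VariantKey]:
    """Collect variant keys from VCF text lines, skipping header lines."""
    keys: Set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # "##" meta lines and the "#CHROM" column header
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        for allele in alt.split(","):  # one key per alternate allele
            keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def compare_vcfs(truth_lines: Iterable[str], query_lines: Iterable[str]) -> Dict[str, int]:
    """Count matches and mismatches with O(1) average set lookups."""
    truth = load_variants(truth_lines)
    query = load_variants(query_lines)
    return {
        "true_positives": len(truth & query),
        "false_negatives": len(truth - query),
        "false_positives": len(query - truth),
    }
```

In practice the lines would come from `open(path)` (or `gzip.open(path, "rt")` for compressed files), so the whole comparison stays a single pass over each file.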

Requirements

Python

Deliverable

A functioning Python 3 VCF comparison tool with a faster wall-clock time than the original

Mentors

Links

Modernize the DataWolf scientific workflow web frontend with React

The DataWolf scientific workflow system currently has a web frontend, written in Backbone.js, for creating and executing workflows. The goal is to begin modernizing the frontend with React, focusing on setting up existing workflows for execution through the web interface.

Requirements

Knowledge of React

Deliverable

React web app that can communicate with a DataWolf server and execute existing workflows.

Mentors

Links

Update the CoverCrop Analyzer web service to use OpenAPI and document its API

The CoverCrop Analyzer web service is currently written using Python Flask-RESTful. It needs to be improved by writing an OpenAPI 3.0 specification that includes authentication and by using packages such as Python Connexion for automatic validation. The main goals of this project are to generate API documentation that can be rendered with Swagger UI, to simplify the service code in the process, and to update the Dockerfile accordingly.
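To make the goal concrete, the sketch below builds a minimal OpenAPI 3.0 document as a Python dict: a bearer-token security scheme, a request schema whose `minimum`/`maximum` bounds a validator can enforce automatically, and an `operationId` that Connexion resolves to a Python function. The endpoint, the fields, and the dotted module paths (`covercrop.api.create_analysis`, `covercrop.auth.decode_token`) are hypothetical placeholders, not the real CoverCrop Analyzer API.

```python
import json
from typing import Any, Dict


def build_spec() -> Dict[str, Any]:
    """Return a minimal, hypothetical OpenAPI 3.0 spec with bearer authentication."""
    return {
        "openapi": "3.0.3",
        "info": {"title": "CoverCrop Analyzer API (sketch)", "version": "0.1.0"},
        "components": {
            "securitySchemes": {
                "bearerAuth": {
                    "type": "http",
                    "scheme": "bearer",
                    "bearerFormat": "JWT",
                    # Connexion extension naming the token-check function (hypothetical path).
                    "x-bearerInfoFunc": "covercrop.auth.decode_token",
                }
            },
            "schemas": {
                "AnalysisRequest": {
                    "type": "object",
                    "required": ["latitude", "longitude"],
                    "properties": {
                        # Bounds like these are checked before the handler runs.
                        "latitude": {"type": "number", "minimum": -90, "maximum": 90},
                        "longitude": {"type": "number", "minimum": -180, "maximum": 180},
                    },
                }
            },
        },
        "security": [{"bearerAuth": []}],  # apply bearer auth to every operation
        "paths": {
            "/analyses": {
                "post": {
                    "operationId": "covercrop.api.create_analysis",  # hypothetical handler
                    "requestBody": {
                        "required": True,
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/AnalysisRequest"}
                            }
                        },
                    },
                    "responses": {
                        "201": {"description": "Analysis created"},
                        "400": {"description": "Request failed schema validation"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_spec(), indent=2))
```

Written out to a spec file, a document like this would be loaded with Connexion's `App.add_api`, which can also serve the Swagger UI documentation when the UI extra is installed; check the Connexion security documentation for the exact extension it expects in the version you use.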

Requirements

Knowledge of Python, OpenAPI, and Connexion (optional).

Deliverable

Improved CoverCrop Analyzer web service code that uses OpenAPI and Python Connexion, plus API documentation.

Mentors

Links

Parsl site checking

One of the most significant challenges when using high performance computing (HPC) systems is that of customizing software to match heterogeneous computing environments. In this project you will develop a user-focused tool for evaluating the configuration of the Parsl parallel programming library (parsl-project.org) for a specific target system. We will provide access to large scale HPC clusters and work with the student to explore troublesome configuration scenarios.
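A site-checking tool could start with cheap, local preflight checks before submitting any real work. The sketch below checks that the batch scheduler's submit command is on `PATH`, that the Python version is recent enough, and that a run directory is writable. The scheduler-to-command table, the check names, and the `check_site` function are hypothetical illustrations; a fuller tool would go on to load a Parsl configuration and run a trivial app on the target nodes.

```python
import os
import shutil
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from scheduler name to the command a batch provider would call.
SUBMIT_COMMANDS: Dict[str, str] = {"slurm": "sbatch", "pbs": "qsub", "lsf": "bsub"}


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_site(
    scheduler: str,
    run_dir: str,
    min_python: Tuple[int, int] = (3, 6),
    which: Callable[[str], Optional[str]] = shutil.which,  # injectable for testing
) -> List[CheckResult]:
    """Run local preflight checks for a target HPC site."""
    results: List[CheckResult] = []

    cmd = SUBMIT_COMMANDS.get(scheduler)
    if cmd is None:
        results.append(CheckResult("scheduler", False, f"unknown scheduler {scheduler!r}"))
    else:
        path = which(cmd)
        results.append(CheckResult("scheduler", path is not None,
                                   path or f"{cmd} not found on PATH"))

    ok = sys.version_info[:2] >= min_python
    results.append(CheckResult("python", ok, sys.version.split()[0]))

    writable = os.path.isdir(run_dir) and os.access(run_dir, os.W_OK)
    results.append(CheckResult("run_dir", writable, run_dir))
    return results


if __name__ == "__main__":
    for r in check_site("slurm", os.getcwd()):
        print(f"[{'PASS' if r.passed else 'FAIL'}] {r.name}: {r.detail}")
```

Returning structured results rather than printing directly keeps the checks reusable, for example to render a report or to fail a CI job on a misconfigured site.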

Requirements

Python

Deliverable

A Python tool for evaluating Parsl configuration on an HPC cluster

Mentors

Dan Katz

Kyle Chard (U Chicago)

Yadu Babuji (U Chicago)

Links

Training Deep Learning Models using Clowder

In this project, we are looking for students to extend Clowder's capabilities by simplifying the training of deep learning models stored within the system, as well as the execution of existing models developed with TensorFlow, Keras, and PyTorch.

Requirements

Scala, Python, or JavaScript

Deliverable

Modifications to Clowder’s core and supporting libraries (pyclowder) to simplify training and running deep learning models with TensorFlow, Keras, and PyTorch.

Mentors

Links

Clowder and GeoSpatial Data

Extend Clowder's built-in geospatial capabilities by adding support for geolocating datasets and by extending metadata search to geospatial queries.
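With MongoDB as the backing store, geospatial search is typically expressed with GeoJSON geometries and the `$near` and `$geoWithin` operators. The sketch below builds those query documents in plain Python; the field name (`geometry`, or something like `metadata.location`) is a hypothetical placeholder, since how Clowder would store dataset locations is part of the project design.

```python
from typing import Any, Dict


def _validate(lon: float, lat: float) -> None:
    # GeoJSON order is [longitude, latitude]; swapping them is a common bug.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"invalid coordinate lon={lon}, lat={lat}")


def near_query(lon: float, lat: float, max_distance_m: float,
               field: str = "geometry") -> Dict[str, Any]:
    """Datasets within max_distance_m meters of a point, nearest first."""
    _validate(lon, lat)
    return {field: {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_distance_m,  # meters for GeoJSON points
    }}}


def within_bbox_query(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                      field: str = "geometry") -> Dict[str, Any]:
    """Datasets whose location falls inside a longitude/latitude bounding box."""
    _validate(min_lon, min_lat)
    _validate(max_lon, max_lat)
    ring = [
        [min_lon, min_lat], [max_lon, min_lat],
        [max_lon, max_lat], [min_lon, max_lat],
        [min_lon, min_lat],  # GeoJSON rings must be closed
    ]
    return {field: {"$geoWithin": {"$geometry": {"type": "Polygon", "coordinates": [ring]}}}}
```

These dicts can be passed to a PyMongo `collection.find(...)` call; `$near` requires a `2dsphere` index on the queried field. With PostGIS, an equivalent radius search would typically use `ST_DWithin` on a geography column.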

Requirements

Scala, Python, or JavaScript

Deliverable

Modifications to Clowder’s core to support geospatial queries and visualizations using MongoDB or PostGIS databases.

Mentors

Links