National Data Service / NDS-842

Experiment with running Kubernetes jobs in a multi-node environment


    • Task
    • Resolution: Fixed
    • Normal
    • NDS Sprint 25, NDS Sprint 26, NDS Sprint 27, NDS Sprint 28

      For BioCADDIE, it would be nice to have a more efficient way of scheduling baseline runs across multiple compute resources. Kubernetes allows us to schedule run-to-completion jobs. Using the baseline scripts already in the nds-org/biocaddie repo, attempt to apply our current sed strategy to schedule a particular run with a particular parameter combination as a Kubernetes job.
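
A minimal sketch of what the sed-style approach could look like, assuming the "sed strategy" means substituting placeholders in a Job template once per parameter combination. The template, the image name, the `run-baseline.sh` script, and the `mu` / `k` parameters are all hypothetical and are not taken from the nds-org/biocaddie repo; Python's `string.Template` stands in for sed here.

```python
from itertools import product
from string import Template

# Hypothetical Job template. The placeholder names, the script path, and the
# parameters are illustrative only (not from the nds-org/biocaddie repo).
JOB_TEMPLATE = Template("""\
apiVersion: batch/v1
kind: Job
metadata:
  name: baseline-$run_id
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: baseline
        image: $image
        command: ["/bin/sh", "-c", "./run-baseline.sh $mu $k"]
""")


def render_jobs(image, param_grid):
    """Return one rendered Job manifest (as text) per parameter combination."""
    names = sorted(param_grid)
    manifests = []
    # product() enumerates every combination of the listed parameter values.
    for index, values in enumerate(product(*(param_grid[n] for n in names))):
        params = dict(zip(names, values))
        manifests.append(
            JOB_TEMPLATE.substitute(run_id=index, image=image, **params)
        )
    return manifests


if __name__ == "__main__":
    grid = {"mu": [500, 1000], "k": [10]}
    for manifest in render_jobs("example/biocaddie-baseline:latest", grid):
        print(manifest)
        print("---")
```

Each rendered manifest could then be submitted individually, for example by piping it to `kubectl create -f -`.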

      For KnowEnG, it would be nice to see whether Kubernetes could replace the current method of running their genomics pipelines, which would be somewhat more scalable than their existing Mesos / Chronos setup. These jobs are often complex DAGs (directed acyclic graphs) that are already Dockerized in some form. More information may be needed, as these may need to be handled on a case-by-case basis. Worst case, we may need to look into something slightly more complex, using tools like Celery + RabbitMQ. Ideally, we wouldn't need to change the pipelines themselves, but without doing so any new solution may have the same shortcomings as the old one.
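
Kubernetes Jobs on their own do not express dependencies between one another, so something outside the Job objects would have to decide the submission order for a DAG. The sketch below shows one way to derive that order, grouping steps into "waves" that could run in parallel. It uses `graphlib` from the Python standard library (3.9+); the pipeline step names are hypothetical and do not describe an actual KnowEnG pipeline.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group DAG steps into waves; steps within a wave have no unmet dependencies.

    `dependencies` maps each step name to the set of steps it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        # In a real scheduler, each step would be marked done only after
        # its Job had completed successfully.
        sorter.done(*ready)
    return waves


# Hypothetical pipeline: step -> steps it depends on.
PIPELINE = {
    "align": {"fetch"},
    "qc": {"fetch"},
    "call_variants": {"align"},
    "report": {"call_variants", "qc"},
}

if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(number, wave)
```

Whether this kind of ordering is enough, or whether a queue-based setup such as Celery + RabbitMQ is needed, would depend on the case-by-case details mentioned above.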

      For Labs Workbench, it would be nice to find a way to schedule such finite-time jobs, via the Labs Workbench API, on a Kubernetes cluster that is also running Workbench. This could be as simple as a pass-through where we pass in an image and a command, or something more complex. More information may be needed on what use cases we would like to support with this feature, if any. NOTE: there may be security risks in exposing this functionality publicly, even if it is (weakly) protected by JWT auth (granted, the same could be said of the services currently run by the platform).
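
The simple pass-through case could look roughly like the sketch below: take a name, an image, and a command, validate them, and build a `batch/v1` Job manifest. The function name `build_job_manifest` and the validation rules are assumptions for illustration; this is not an existing Labs Workbench API call, and it does not address the security concerns noted above.

```python
import json
import re

# Conservative name check (an assumption for this sketch): lowercase
# alphanumerics and '-', starting and ending with an alphanumeric.
_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def build_job_manifest(name, image, command, namespace="default"):
    """Build a run-to-completion Job manifest from an image and a command."""
    if len(name) > 63 or not _NAME_RE.match(name):
        raise ValueError("invalid job name: %r" % (name,))
    if not isinstance(image, str) or not image:
        raise ValueError("image must be a non-empty string")
    if (
        not isinstance(command, (list, tuple))
        or not command
        or not all(isinstance(part, str) for part in command)
    ):
        raise ValueError("command must be a non-empty list of strings")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    # Jobs require restartPolicy Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "job", "image": image, "command": list(command)}
                    ],
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_job_manifest("hello", "busybox", ["echo", "hello"])
    print(json.dumps(manifest, indent=2))
```

The resulting dictionary could be serialized to JSON and submitted to the cluster by whatever client the API server already uses.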

      For DataDNS: we will need some backend system to handle the "repository crawler" aspect of the proposal. This framework might fit this need quite nicely, as well as allowing it to run in Labs Workbench. This is more "concept" than "proof-of-concept".

              lambert8 Sara Lambert


                  Original Estimate - 6 hours
                  Remaining Estimate - 6 hours
                  Time Spent - Not Specified