Attendees:

  • Mike L
  • David R
  • Kenton M

Subject:    WBS Planning re: NDS infrastructure

Questions and Answers

Note:  some Q/A are paraphrased

  • What things do we need to have in our work plan for after NDSC?
    • Not sure
    • Data integration for tool developers
    • Inter-cluster services
      • search
      • NDS-global (multi-cluster) integration
    • NOTES:  
      • Data integration means supporting infrastructure for data access in services: access to metadata about collections, data-sets, and data-access methods outside of the "normal" application path - i.e. a "back door" to data in the cluster. It does not require modifying the tool/service itself, but it does require the tool/service provider to supply data-access "plug-ins" that expose schema and data-access methods to the system services, which in turn expose them outward to clients (see the sketch below)
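      • Sketch (illustrative only - the endpoints and names below are assumptions, not an existing NDSL API): a tool provider ships a data-access plug-in alongside its container, and the system services expose the collections through a generic path, e.g.:
          # list the collections a tool's plug-in exposes, without touching the tool itself
          curl http://<apiserver>/data/<tool>/collections
          # fetch the items of one data-set through the same back door
          curl http://<apiserver>/data/<tool>/collections/<id>/items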

  • What plan?
    • 2 year detailed work plan
  • Who is the plan for?
    • Ed (Seidel) - But the TAC will approve it
    • If we don't have a plan, we could get budget cut
  • How long does it take to add more services?
    • Highly variable - it depends on the tool/service and on the implementation. If it's simple to containerize it should be fairly easy; if not, it could be an involved effort
    • NOTES: We do not yet have enough experience to know how long any particular tool will take to integrate. Simple tools (example: owncloud) should be very quick; complex aggregates (example: dataverse) could be very involved
  • How are new services added?
    • There are service catalogs published on the web (GitHub), and the project admin can point their project to include them.  NDSL will have a "standard" service catalog
  • Where is the catalog manager?
    • Not done. Currently the dev environment lets a developer "prime" their project with service descriptions, so they don't need to perform the full workflow to publish and consume something they're working on
  • Difference between the PADM and the CCD
    • CCD is the project manager's service-deployment interface - the GUI for NDSC

    • PADM is larger, CCD is one part of it.

    • PADM includes all project-manager tools - resource provisioning for volumes and storage - and has project monitoring tools

  • Where is PADM?
    • Not done yet, just the CCD portion is done

  • But we have ELK done, right?

    • We have ELK implemented, and Kubernetes provides other monitoring tools, but we have not worked with them from a project manager's perspective yet. It was the first thing Craig did, and it needs to be gone back over and adjusted.

  • Is the API server part of CCD or PADM?
    • The API server is the NDSL cluster API server.   It implements all NDS interfaces on top of Kubernetes, including PADM, CCD, catalog support, and pretty much everything that NDSL layers over Kubernetes.   It abstracts the entire system and insulates the Kubernetes and etcd interfaces.
  • Where does the API server run?
    • Currently it runs standalone, but it should run on the cluster as a service.
    • NOTE:   A bootup sequence is needed
      1. Boot Kubernetes
      2. Deploy cluster monitoring tools
      3. Deploy API server
      4. Deploy other NDSL services
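    • Rough sketch of steps 2-4 with kubectl once Kubernetes is booted (the manifest names are assumptions, not actual repo files):
        kubectl create -f monitoring/              # step 2: cluster monitoring tools (e.g. ELK)
        kubectl create -f ndslabs-apiserver.yaml   # step 3: the NDSL API server running as a cluster service
        kubectl create -f ndslabs-gui.yaml         # step 4: the GUI server and other NDSL services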
  • The GUI supports adding services?
    • No, the GUI is just the CCD part of the PADM
  • How do services get added?
    • The API server has a CLI.   Currently the catalog integration is not done, so a developer can add a service via the CLI
  • So the CLI and GUI are in the API server?
    • The API server is a REST server that provides API services at the NDSL layer of abstraction.   This is the primary interface to the system; we hide all the other Kubernetes and low-level things behind it.
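    • For illustration, the CLI and GUI are just clients of that REST interface; a direct call might look like the following (the paths and payload are assumptions, not the documented API):
        # list the service specs a project can deploy, at the NDSL level rather than via kubectl
        curl http://<apiserver>:<port>/services
        # add/run a service in a project namespace from a service description
        curl -X POST http://<apiserver>:<port>/projects/<project>/services -d @owncloud.json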
  • Where is the GUI?
    • The GUI is a bootstrap/angular application that runs in a web server.   It currently runs standalone but will run in kubernetes.
  • The cluster tools, heat scripts, etc. - where do they come in?
    • The system comes up in layers, and those tools are meant for a seasoned sysadmin who would set up a production cluster.
    • Production system setup
      1. Deploy a cluster with heat on 6 nodes (base config) using the heat script.
      2. Deploy etcd in HA configuration on 3 nodes using ansible scripts
      3. Deploy flannel or weave for cluster-wide networking with ansible scripts
      4. Deploy the cluster monitor tools on base OS of cluster nodes
      5. Deploy Kubernetes on the cluster
        1. Deploy the Kubernetes master on the 4th node with ansible
        2. Deploy Kubernetes kubelet workers on the 5th/6th worker nodes
          1. Additional workers can be added up to approx 2000 nodes if needed
      6. Deploy the API server pod on Kubernetes
      7. Deploy the GUI server pod on Kubernetes
      8. Ready for creating projects (namespaces)
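    • Condensed sketch of steps 1-5 (the template, inventory, and playbook names are illustrative assumptions, not the actual scripts):
        openstack stack create -t ndslabs-cluster.yaml ndslabs        # step 1: 6-node base config from the heat template
        ansible-playbook -i hosts etcd-ha.yml                         # step 2: etcd in HA configuration on 3 nodes
        ansible-playbook -i hosts flannel.yml                         # step 3: cluster-wide networking (flannel or weave)
        ansible-playbook -i hosts monitoring.yml                      # step 4: monitoring tools on the base OS of the nodes
        ansible-playbook -i hosts kube-master.yml kube-workers.yml    # step 5: Kubernetes master and kubelet workers
        # steps 6-8: deploy the API server and GUI pods with kubectl (as in the bootup sketch above), then create projects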
  • The PADM GUI is not done, how do you deploy projects?
    • The CLI has support for deploying projects
  • We will provide a project on our cluster for the tutorial participants?
    • We will provide a nebula VM where they will run
  • They will log in, download and install the NDSL runtime, and run the deploy process?
    • We provide a containerized developer environment (ndslabs/ndsdev), so they simply run the container and bring up their micro-cluster:
      • docker run ndslabs/ndsdev
      • nds up
  • If they're not on the cluster, how do they get the API server and GUI and all that?
    • They access it on their VM. Each ndsdev instance has a full NDSLabs stack running; it just runs everything on a single machine rather than on a large cluster
  • Isn't that very different than on a real cluster?
    • The ndsdev single-machine environment runs the exact same set of services, in fact it runs the exact same containers which are bit-for-bit identical to running on a big cluster, so it is functionally identical to the real production cluster environment.
  • (They can run on their own laptop)   Developers need to run on a real cluster, right?
    • The model is that developers start on a micro-cluster, where they can play around, break things, and do all sorts of odd things, and it won't impact anyone else or require us to provide resources.   When they have something that works, they publish their containers to a container hub (docker) and their service descriptions to a catalog (ndslabs or their own), and then their services can run in the large cluster environment (see the sketch after the note below).
    • NOTE:   Although anyone can pull our source and our containers and run them on their laptop or local machine, most Linux distributions don't support our requirements - docker version 10 and our customized Kubernetes.    We do not wish to have people trying this only to be confused, and we wish to keep them close in case of problems, so we will have them run in a VM on nebula.
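    • Sketch of that publish step (the image, repo, and file names are placeholders, and the catalog layout is an assumption):
        docker build -t <hubuser>/<mytool>:0.1 .            # containerize the tool
        docker push <hubuser>/<mytool>:0.1                  # publish the container to Docker Hub
        git clone https://github.com/<org>/<catalog>        # the service catalog repo (ndslabs standard catalog or their own)
        cp <mytool>.json <catalog>/services/                # add the service description
        cd <catalog> && git commit -am "add <mytool>" && git push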
  • How does this tie in to the NDSLabs website, can they request a pilot and get on our cluster?
    • The API CLI supports this, but we are not providing it at NDSC.   We are only providing an early-access alpha preview without support.
  • We are not planning to provide cluster access are we?
    • Yes
    • But I thought we are a provider of last-resort - with only small-scale limited time support for hand-selected pilots, right?
      • We are a service provider
  • Is all this stuff documented? - how to start up, run, and use the CLI and GUI, etc.
    • Documentation for the tutorial will start after development is done, but the documentation will only be enough to help them through the early-access tutorial and will not be production/release quality.
    • We are providing early-access pre-release beta access only; they should have a community support site where they help each other, otherwise we will be overwhelmed with support at the expense of moving forward
  • What will we do for the tutorial?
    • Tour the architecture
    • Tour the codebases
    • Boot up their microcluster
    • Tour the runtime system at a high level
    • Open the hood and look at the inner workings
    • Run a demonstration service
    • Walk through adding a service to the ndsdev environment - owncloud or similar (see the sketch after this list)
      • containerize it
      • publish the container
      • implement the service description
      • inject it into the local catalog
      • run it
      • Interact with the service
      • use the system log/monitoring tools
    • Show capabilities of ndsdev
      • Kubernetes keeping services running through an outage
      • Saving/restoring fully configured environments for dev purposes
    • Show same published services on production cluster
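    • NOTE: a possible shape of the add-a-service walkthrough in ndsdev (the nds subcommands beyond "nds up" are assumptions for illustration, not a finalized CLI):
        docker build -t <hubuser>/owncloud .    # containerize it
        docker push <hubuser>/owncloud          # publish the container
        vi owncloud.json                        # implement the service description
        nds catalog add owncloud.json           # inject it into the local catalog (hypothetical subcommand)
        nds start owncloud                      # run it (hypothetical subcommand)
        kubectl logs <owncloud-pod>             # interact with it and use the system log/monitoring tools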
  • Where in the source repo are all these things?
    • All over now - the repo is getting a reorg
      • Container build support separated
      • NDSLabs service descriptions published separately
      • Cluster setup supportive tools separated
      • Custom Kubernetes published
      • NDSdev potentially separated
      • NDS Production Services Separated
        • Dataverse
        • Clowder
      •  NDS cluster services separated
        • API server
        • GUI server

