This tutorial introduces how Kurator-Akka workflows are specified in YAML files.
What is YAML?
YAML is a plain text format for representing data organized as lists of values and sets of key-value pairs (mappings). The values in these lists and in the key-value pairs can themselves be lists or mappings. YAML is a superset of JSON (every JSON document is a valid YAML document) that uses white space (rather than braces and quotes) to organize data, and is thus easy to read.
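As a small illustration of these ideas, the sketch below shows a mapping whose values are a plain value, a list, and a nested mapping, followed by the same kind of data in JSON syntax. All keys and values are made up for illustration and have nothing to do with Kurator-Akka.

```yaml
# Illustrative data only; every name below is hypothetical.
title: Shopping trip        # key-value pair with a plain (scalar) value
items:                      # the value of this key is a list
  - apples                  # each list item is preceded by a dash
  - bread
store:                      # the value of this key is a nested mapping,
  name: Corner Market       # expressed through indentation
  city: Springfield

# JSON syntax is accepted as well: braces, brackets, and quotes
# instead of indentation and dashes.
as_json: {"title": "Shopping trip", "items": ["apples", "bread"]}
```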
...
See http://yaml.org/ for more information about YAML and a list of libraries in C/C++, Ruby, Python, Java, Perl, C#, and other languages for composing and parsing YAML.
Example workflow
We will use the hello.yaml workflow from the Kurator-Akka distribution to illustrate how workflows are specified in YAML. You can extract this YAML file from the kurator-akka jar file using the unzip command (the -j option prevents any directories from being created during extraction):
...
- id: GreetingPrinter
  type: PrinterActor
  properties:
    listensTo:
      - !ref GreetingSource

- id: HelloWorldWorkflow
  type: Workflow
  properties:
    actors:
      - !ref GreetingSource
      - !ref GreetingPrinter
    parameters:
      greeting:
        actor: !ref GreetingSource
        parameter: value
Structure of a YAML workflow definition file
Inspect the contents of hello.yaml above. Kurator-Akka expects each YAML workflow definition file to have a mapping (set of key-value pairs) as the top-level data structure. A colon in YAML indicates that the preceding string is a key, and that the following value or block of text is the value assigned to that key. The valid keys of this top-level mapping are imports, types, and components, so Kurator-Akka workflow definition files have up to three top-level sections: an imports section, a types section, and a components section. The hello.yaml file does not have a types section.
The imports section (the block of text following the imports line) is for providing a list of other YAML files to be included in the current workflow definition. List items are preceded by a dash. The components section provides a list of the workflow components comprising the workflow. We will focus on the workflow components for the remainder of this tutorial page.
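The overall layout described above can be sketched as a skeleton. This is only an outline under stated assumptions: the imported file name is a hypothetical placeholder (it is not the import used by hello.yaml), and the types section is left as a comment because type declarations are not described on this page.

```yaml
# Skeleton of a workflow definition file: a top-level mapping
# with up to three keys.

imports:                          # list of other YAML files to include
  - other-definitions.yaml        # hypothetical placeholder file name

# types:                          # optional section; hello.yaml omits it

components:                       # list of workflow components
  - id: GreetingPrinter           # component taken from hello.yaml
    type: PrinterActor
    properties:
      listensTo:
        - !ref GreetingSource
```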
Workflow components
The components section of hello.yaml contains declarations for two actors and for the workflow as a whole. Actors are the active, data-processing components of a workflow. The workflow itself is considered a component as well, with the workflow component declaration containing references to the actor components in it.
...
The type of a component refers to a declaration either in the types section of the current YAML file or in a YAML file included (directly or indirectly) via the imports section. Type declarations associate each component type with a Java class, and will be covered in a later tutorial.
Finally, each component has a set of properties that represents its configuration in the current workflow.
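To make the parts of a component declaration concrete, here is the GreetingPrinter entry from hello.yaml with a comment on each line. The comments are a reading of this page rather than authoritative documentation. In particular, the note on listensTo is an interpretation of the property name.

```yaml
- id: GreetingPrinter          # identifier that other components use to refer to this one
  type: PrinterActor           # refers to a type declaration; hello.yaml has no types
                               # section, so this one must come from an imported file
  properties:                  # configuration of this component in this workflow
    listensTo:                 # presumably the components whose output this actor receives
      - !ref GreetingSource    # !ref marks a reference to the component with id GreetingSource
```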
Examining the components in hello.yaml
The first component in hello.yaml is the actor with id GreetingSource:
...
The final component in hello.yaml is identified as HelloWorldWorkflow:
- id: HelloWorldWorkflow
  type: Workflow
  properties:
    actors:
      - !ref GreetingSource
      - !ref GreetingPrinter
    parameters:
      greeting:
        actor: !ref GreetingSource
        parameter: value
...
Here, HelloWorldWorkflow is configured to comprise two actors, GreetingSource and GreetingPrinter. The value parameter of the GreetingSource actor is exposed as the greeting parameter of the workflow.
See the final section of Tutorial 2 - Running workflows for an example run of hello.yaml with an assignment to the greeting parameter of the workflow.