Add YAML parsing capability #10995
jieguangzhou committed Sep 3, 2022
1 parent d0d481d commit 4e4e580
Showing 39 changed files with 1,606 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -31,3 +31,10 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.condition

YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Condition.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -31,3 +31,16 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.datax

YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/DataX.yaml
:start-after: # under the License.
:language: yaml


example_datax.json:

.. literalinclude:: ../../../examples/yaml_define/example_datax.json
:language: json
Expand Up @@ -31,3 +31,17 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.dependent


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Dependent.yaml
:start-after: # under the License.
:language: yaml

Dependent_External.yaml:

.. literalinclude:: ../../../examples/yaml_define/Dependent_External.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -31,3 +31,10 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.flink

YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Flink.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -30,4 +30,4 @@ Example
Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.func_wrap
.. automodule:: pydolphinscheduler.tasks.func_wrap
Expand Up @@ -19,3 +19,11 @@ HTTP
====

.. automodule:: pydolphinscheduler.tasks.http


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Http.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -32,3 +32,11 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.map_reduce


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/MapReduce.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -19,3 +19,11 @@ Procedure
=========

.. automodule:: pydolphinscheduler.tasks.procedure


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Procedure.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -19,3 +19,11 @@ Python
======

.. automodule:: pydolphinscheduler.tasks.python


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Python.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -31,3 +31,11 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.shell


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Shell.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -31,3 +31,11 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.spark


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Spark.yaml
:start-after: # under the License.
:language: yaml
Expand Up @@ -19,3 +19,17 @@ SQL
===

.. automodule:: pydolphinscheduler.tasks.sql


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Sql.yaml
:start-after: # under the License.
:language: yaml

example_sql.sql:

.. literalinclude:: ../../../examples/yaml_define/example_sql.sql
:start-after: */
:language: sql
Expand Up @@ -19,3 +19,20 @@ Sub Process
===========

.. automodule:: pydolphinscheduler.tasks.sub_process


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/SubProcess.yaml
:start-after: # under the License.
:language: yaml



example_sub_workflow.yaml:

.. literalinclude:: ../../../examples/yaml_define/example_sub_workflow.yaml
:start-after: # under the License.
:language: yaml

Expand Up @@ -31,3 +31,12 @@ Dive Into
---------

.. automodule:: pydolphinscheduler.tasks.switch


YAML file example
-----------------

.. literalinclude:: ../../../examples/yaml_define/Switch.yaml
:start-after: # under the License.
:language: yaml

Expand Up @@ -37,6 +37,8 @@ There are two types of tutorials: traditional and task decorator.
versatility to the traditional way because it only supports Python functions and does not support built-in
tasks. But it is helpful if your workflow is all built with Python or if you already have some Python
workflow code and want to migrate it to pydolphinscheduler.
- **YAML File**: We can use the pydolphinscheduler CLI to create a process from a YAML file: :code:`pydolphinscheduler yaml -f tutorial.yaml`.
We can find more YAML file examples in `examples/yaml_define <https://github.com/apache/dolphinscheduler/tree/py-yaml/dolphinscheduler-python/pydolphinscheduler/examples/yaml_define>`_.

.. tab:: Tradition

Expand All @@ -52,6 +54,12 @@ There are two types of tutorials: traditional and task decorator.
:start-after: [start tutorial]
:end-before: [end tutorial]

.. tab:: YAML File

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # under the License.
:language: yaml

Import Necessary Module
-----------------------

Expand Down Expand Up @@ -104,6 +112,13 @@ will be running this task in the DolphinScheduler worker. See :ref:`section tena
:start-after: [start workflow_declare]
:end-before: [end workflow_declare]

.. tab:: YAML File

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # under the License.
:end-before: # Define the tasks under the workflow
:language: yaml

We can find more detail about :code:`ProcessDefinition` in :ref:`concept about process definition <concept:process definition>`
if you are interested in it. All arguments of the process definition object are listed in the
:class:`pydolphinscheduler.core.process_definition` API documentation.
Expand Down Expand Up @@ -139,6 +154,12 @@ Task Declaration
It makes our workflow more Pythonic, but be careful: in task decorator mode we can only use Python
functions as tasks and, in most cases, cannot use the :doc:`built-in tasks <tasks/index>`.

.. tab:: YAML File

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # Define the tasks under the workflow
:language: yaml

Setting Task Dependence
-----------------------

Expand Down Expand Up @@ -167,6 +188,14 @@ and task `task_child_two` was done, because both two task is `task_union`'s upst
:start-after: [start task_relation_declare]
:end-before: [end task_relation_declare]

.. tab:: YAML File

We can use :code:`deps: []` to set task dependence.

.. literalinclude:: ../../examples/yaml_define/tutorial.yaml
:start-after: # Define the tasks under the workflow
:language: yaml
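
As a rough illustration of what a :code:`deps` list expresses, the sketch below turns task entries into (upstream, downstream) pairs. It is not pydolphinscheduler's implementation: the function name :code:`dependency_edges` and the sample task entries are assumptions made for this example only.

```python
from typing import Dict, List, Tuple


def dependency_edges(tasks: List[Dict]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs described by each task's ``deps`` list."""
    known = {task["name"] for task in tasks}
    edges = []
    for task in tasks:
        # A task without ``deps`` has no upstream tasks.
        for upstream in task.get("deps", []):
            if upstream not in known:
                raise ValueError(f"unknown upstream task: {upstream}")
            edges.append((upstream, task["name"]))
    return edges


# Hypothetical task entries, shaped like items under ``tasks:`` in a YAML file.
tasks = [
    {"name": "task_parent"},
    {"name": "task_child_one", "deps": ["task_parent"]},
    {"name": "task_child_two", "deps": ["task_parent"]},
    {"name": "task_union", "deps": ["task_child_one", "task_child_two"]},
]

edges = dependency_edges(tasks)
print(edges)
```

Each pair reads as "the first task must finish before the second starts", which is the same relation the Python API expresses when declaring task dependence.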

.. note::

We could set task dependence in batch mode if they have the same downstream or upstream by declaring those
Expand Down Expand Up @@ -198,6 +227,17 @@ will create workflow definition as well as workflow schedule.
:start-after: [start submit_or_run]
:end-before: [end submit_or_run]

.. tab:: YAML File

The pydolphinscheduler YAML CLI always submits the workflow. We can also run the workflow if we set :code:`run: true`:

.. code-block:: yaml

   # Define the workflow
   workflow:
     name: "tutorial"
     run: true

Finally, we can execute this workflow code in the terminal like any other Python script, running
:code:`python tutorial.py` to trigger and execute it.

Expand All @@ -219,5 +259,61 @@ named "tutorial" or "tutorial_decorator". The task graph of workflow like below:
:language: text
:lines: 24-28

Create Process Using YAML File
------------------------------

We can use the pydolphinscheduler CLI to create a process from a YAML file:

.. code-block:: bash

   pydolphinscheduler yaml -f Shell.yaml

We can use the following four special grammars to define workflows more flexibly.

- :code:`$FILE{"file_name"}`: Read the contents of the file (:code:`file_name`) and insert them at that location.
- :code:`$WORKFLOW{"other_workflow.yaml"}`: Refer to another process defined in a YAML file (:code:`other_workflow.yaml`) and insert its process name at that location.
- :code:`$ENV{env_name}`: Read the environment variable (:code:`env_name`) and insert its value at that location.
- :code:`${CONFIG.key_name}`: Read the configuration value of the key (:code:`key_name`) and insert it at that location.
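
To make the substitution idea concrete, here is a minimal sketch of how two of these placeholders, :code:`$FILE{"file_name"}` and :code:`$ENV{env_name}`, could be expanded in a piece of text. It is an illustration under assumptions, not the parser shipped with pydolphinscheduler: the regular expressions, the function :code:`expand_tokens`, and the variable :code:`DEMO_DB_USER` are all hypothetical.

```python
import os
import re
import tempfile
from pathlib import Path

# Hypothetical patterns for two of the placeholders described above.
FILE_TOKEN = re.compile(r'\$FILE\{"([^"]+)"\}')
ENV_TOKEN = re.compile(r"\$ENV\{([^}]+)\}")


def expand_tokens(text: str, base_dir: Path) -> str:
    """Replace $ENV{...} and $FILE{"..."} placeholders in ``text``."""
    # Environment variables: a missing variable raises KeyError.
    text = ENV_TOKEN.sub(lambda m: os.environ[m.group(1)], text)
    # File contents: paths are read relative to ``base_dir``.
    text = FILE_TOKEN.sub(lambda m: (base_dir / m.group(1)).read_text(), text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "example_sql.sql").write_text("select 1;")
    os.environ["DEMO_DB_USER"] = "demo"  # assumed variable, set only for the demo
    raw = 'sql: $FILE{"example_sql.sql"}\nuser: $ENV{DEMO_DB_USER}'
    expanded = expand_tokens(raw, base)

print(expanded)
```

The sketch expands environment variables before file contents, so placeholders inside an included file are left untouched; the real tool may behave differently.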


In addition, when a file path given in :code:`$FILE{"file_name"}` or :code:`$WORKFLOW{"other_workflow.yaml"}` does not exist, pydolphinscheduler searches for it in the directory of the YAML file.

For example, our file directory structure is as follows:

.. code-block:: bash

   .
   └── yaml_define
       ├── Condition.yaml
       ├── DataX.yaml
       ├── Dependent_External.yaml
       ├── Dependent.yaml
       ├── example_datax.json
       ├── example_sql.sql
       ├── example_sub_workflow.yaml
       ├── Flink.yaml
       ├── Http.yaml
       ├── MapReduce.yaml
       ├── MoreConfiguration.yaml
       ├── Procedure.yaml
       ├── Python.yaml
       ├── Shell.yaml
       ├── Spark.yaml
       ├── Sql.yaml
       ├── SubProcess.yaml
       └── Switch.yaml

After we run

.. code-block:: bash

   pydolphinscheduler yaml -f yaml_define/SubProcess.yaml

the :code:`$WORKFLOW{"example_sub_workflow.yaml"}` will be resolved to :code:`$WORKFLOW{"yaml_define/example_sub_workflow.yaml"}`, because :code:`./example_sub_workflow.yaml` does not exist and :code:`yaml_define/example_sub_workflow.yaml` does.

Furthermore, this feature supports recursion all the way down.
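
The lookup order described above can be sketched as follows. This is a simplified illustration, not the library's code: the helper :code:`resolve_path` and its :code:`cwd` parameter are assumptions, and the demo builds a temporary directory that mirrors the :code:`yaml_define` layout.

```python
import tempfile
from pathlib import Path


def resolve_path(name: str, yaml_file: Path, cwd: Path) -> Path:
    """Try ``name`` relative to ``cwd`` first, then next to ``yaml_file``."""
    direct = cwd / name
    if direct.exists():
        return direct
    # Fall back to the directory that contains the YAML file being loaded.
    beside_yaml = yaml_file.parent / name
    if beside_yaml.exists():
        return beside_yaml
    raise FileNotFoundError(name)


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    yaml_dir = cwd / "yaml_define"
    yaml_dir.mkdir()
    (yaml_dir / "SubProcess.yaml").write_text("")
    (yaml_dir / "example_sub_workflow.yaml").write_text("")

    found = resolve_path("example_sub_workflow.yaml", yaml_dir / "SubProcess.yaml", cwd)
    relative = found.relative_to(cwd).as_posix()

print(relative)  # yaml_define/example_sub_workflow.yaml
```

Passing :code:`cwd` explicitly keeps the sketch deterministic; a real command-line tool would more likely use the process working directory.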


.. _`DolphinScheduler project page`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project.html
.. _`Python context manager`: https://docs.python.org/3/library/stdtypes.html#context-manager-types
@@ -0,0 +1,43 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Define the workflow
workflow:
  name: "Condition"

# Define the tasks under the workflow
tasks:
  - { "task_type": "Shell", "name": "pre_task_1", "command": "echo pre_task_1" }
  - { "task_type": "Shell", "name": "pre_task_2", "command": "echo pre_task_2" }
  - { "task_type": "Shell", "name": "pre_task_3", "command": "echo pre_task_3" }
  - { "task_type": "Shell", "name": "success_branch", "command": "echo success_branch" }
  - { "task_type": "Shell", "name": "fail_branch", "command": "echo fail_branch" }

  - task_type: Condition
    name: condition
    success_task: success_branch
    failed_task: fail_branch
    op: AND
    groups:
      - op: AND
        groups:
          - task: pre_task_1
            flag: true
          - task: pre_task_2
            flag: true
          - task: pre_task_3
            flag: false