diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 5f87d820a94a9..8553158a3922d 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -86,9 +86,9 @@ Before you submit a pull request from your forked repo, check that it meets
 these guidelines:

 1. The pull request should include tests, either as doctests, unit tests, or both.
-1. If the pull request adds functionality, the docs should be updated as part of the same PR. Doc string are often sufficient, make sure to follow the sphinx compatible standards.
+1. If the pull request adds functionality, the docs should be updated as part of the same PR. Doc strings are often sufficient. Make sure to follow the Sphinx-compatible standards.
 1. The pull request should work for Python 2.6, 2.7, and 3.3. If you need help writing code that works in both Python 2 and 3, see the documentation at the [Python-Future project](http://python-future.org) (the future package is an Airflow requirement and should be used where possible).
-1. Code will be reviewed by re running the unittests, flake8 and syntax should be as rigorous as the core Python project.
+1. Code will be reviewed by re-running the unit tests and flake8. Syntax should be as rigorous as the core Python project.
 1. Please rebase and resolve all conflicts before submitting.

 ## Running unit tests
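To illustrate the docstring guideline in the CONTRIBUTING.md hunk above, here is a minimal sketch of a Sphinx-style (reST field list) docstring that doubles as a doctest; the function and its behaviour are purely illustrative and not part of the patch.

.. code:: python

    def days_between(start_date, end_date):
        """
        Return the number of whole days between two dates.

        :param start_date: the earlier date
        :type start_date: datetime.date
        :param end_date: the later date
        :type end_date: datetime.date
        :return: the number of days from ``start_date`` to ``end_date``
        :rtype: int

        >>> from datetime import date
        >>> days_between(date(2015, 1, 1), date(2015, 1, 31))
        30
        """
        return (end_date - start_date).days

A docstring like this satisfies both requirements at once: Sphinx can render the field list, and a doctest runner exercises the example.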
diff --git a/README.md b/README.md
index 0da594a93ff12..bb93888d6e53f 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ one to the other (though tasks can exchange metadata!). Airflow is not
 in the [Spark Streaming](http://spark.apache.org/streaming/)
 or [Storm](https://storm.apache.org/) space, it is more comparable to
 [Oozie](http://oozie.apache.org/) or
-[Azkaban](http://data.linkedin.com/opensource/azkaban).
+[Azkaban](https://azkaban.github.io/).

 Workflows are expected to be mostly static or slowly changing. You can think
 of the structure of the tasks in your workflow as slightly more dynamic
diff --git a/docs/installation.rst b/docs/installation.rst
index f2c97a4fe1a83..d2ad7f3f6b14f 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -177,48 +177,46 @@ In addition, users can supply an S3 location for storing log backups. If logs ar
 Scaling Out on Mesos (community contributed)
 ''''''''''''''''''''''''''''''''''''''''''''
 MesosExecutor allows you to schedule airflow tasks on a Mesos cluster.
-For this to work, you need a running mesos cluster and perform following
+For this to work, you need a running mesos cluster and you must perform the following
 steps -

 1. Install airflow on a machine where webserver and scheduler will run,
-   let's refer this as Airflow server.
-2. On Airflow server, install mesos python eggs from `mesos downloads `_.
-3. On Airflow server, use a database which can be accessed from mesos
-   slave machines, for example mysql, and configure in ``airflow.cfg``.
+   let's refer to this as the "Airflow server".
+2. On the Airflow server, install mesos python eggs from `mesos downloads `_.
+3. On the Airflow server, use a database (such as mysql) which can be accessed from mesos
+   slave machines and add configuration in ``airflow.cfg``.
 4. Change your ``airflow.cfg`` to point executor parameter to
-   MesosExecutor and provide related Mesos settings.
+   ``MesosExecutor`` and provide related Mesos settings.
 5. On all mesos slaves, install airflow. Copy the ``airflow.cfg`` from
    Airflow server (so that it uses same sql alchemy connection).
-6. On all mesos slaves, run
+6. On all mesos slaves, run the following for serving logs:

 .. code-block:: bash

     airflow serve_logs

-for serving logs.
-
-7. On Airflow server, run
+7. On Airflow server, to start processing/scheduling DAGs on mesos, run:

 .. code-block:: bash

     airflow scheduler -p

-to start processing DAGs and scheduling them on mesos. We need -p parameter to pickle the DAGs.
+Note: We need the ``-p`` parameter to pickle the DAGs.

 You can now see the airflow framework and corresponding tasks in mesos UI.
 The logs for airflow tasks can be seen in airflow UI as usual.

-For more information about mesos, refer `mesos documentation `_.
-For any queries/bugs on MesosExecutor, please contact `@kapil-malik `_.
+For more information about mesos, refer to `mesos documentation `_.
+For any queries/bugs on ``MesosExecutor``, please contact `@kapil-malik `_.

 Integration with systemd
 ''''''''''''''''''''''''
 Airflow can integrate with systemd based systems. This makes watching your daemons easy as systemd
-can take care restarting a daemon on failure. In the ``scripts/systemd`` directory you can find unit files that
-have been tested on Redhat based systems. You can copy those ``/usr/lib/systemd/system``. It is assumed that
+can take care of restarting a daemon on failure. In the ``scripts/systemd`` directory you can find unit files that
+have been tested on Redhat based systems. You can copy those to ``/usr/lib/systemd/system``. It is assumed that
 Airflow will run under ``airflow:airflow``. If not (or if you are running on a non Redhat based system) you
-probably need adjust the unit files.
+probably need to adjust the unit files.

-Environment configuration is picked up from ``/etc/sysconfig/airflow``. An example file is supplied
- . Make sure to specify the ``SCHEDULER_RUNS`` variable in this file when you run the schduler. You
- can also define here, for example, ``AIRFLOW_HOME`` or ``AIRFLOW_CONFIG``.
\ No newline at end of file
+Environment configuration is picked up from ``/etc/sysconfig/airflow``. An example file is supplied.
+Make sure to specify the ``SCHEDULER_RUNS`` variable in this file when you run the scheduler. You
+can also define here, for example, ``AIRFLOW_HOME`` or ``AIRFLOW_CONFIG``.
diff --git a/docs/plugins.rst b/docs/plugins.rst
index 75f80b2e0b883..418a16647dfbd 100644
--- a/docs/plugins.rst
+++ b/docs/plugins.rst
@@ -79,7 +79,7 @@ looks like:
 Example
 -------

-The code bellow defines a plugin that injects a set of dummy object
+The code below defines a plugin that injects a set of dummy object
 definitions in Airflow.

 .. code:: python
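As context for the plugins.rst hunk above, here is a minimal sketch of the kind of plugin that document goes on to define, assuming the ``AirflowPlugin`` interface from ``airflow.plugins_manager``; the class names and attribute values below are illustrative placeholders, not the dummy objects used in the actual docs.

.. code:: python

    # Illustrative sketch only; the real example in docs/plugins.rst defines
    # its own dummy operators, hooks, executors, macros, views and links.
    from airflow.plugins_manager import AirflowPlugin


    class PlaceholderObject(object):
        """Stand-in for the dummy operators/hooks a plugin would expose."""


    class MyDummyPlugin(AirflowPlugin):
        # Airflow discovers the plugin by this name when it scans the
        # plugins folder at startup.
        name = "my_dummy_plugin"
        operators = [PlaceholderObject]
        hooks = []
        executors = []
        macros = []
        admin_views = []
        flask_blueprints = []
        menu_links = []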
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
index c03ad935bbceb..403ac2a9ae62b 100644
--- a/docs/tutorial.rst
+++ b/docs/tutorial.rst
@@ -77,12 +77,12 @@ at first) is that this Airflow Python script is really just a configuration
 file specifying the DAG's structure as code.
 The actual tasks defined here will run in a different context from
 the context of this script. Different tasks run on different workers
-at different point it time, which means this script cannot be directly
-to cross communicate between tasks for instance. Note that for this
+at different points in time, which means that this script cannot be used
+to cross-communicate between tasks. Note that for this
 purpose we have a more advanced feature called ``XCom``.

 People sometimes think of the DAG definition file as a place where they
-can do some actual data processing, that is not the case at all!
+can do some actual data processing - that is not the case at all!
 The script's purpose is to define a DAG object. It needs to evaluate
 quickly (seconds, not minutes) since the scheduler will execute it
 periodically to reflect the changes if any.
@@ -420,7 +420,7 @@ running against it should get it to get triggered and run every day.

 Here's a few things you might want to do next:

-* Take an in-depth tour of the UI, click all the things!
+* Take an in-depth tour of the UI - click all the things!
 * Keep reading the docs! Especially the sections on:

   * Command line interface
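To make the point of the tutorial.rst hunk concrete (the DAG file is configuration, not a place to process data), here is a minimal sketch of a definition-only DAG file; the dag_id, dates and commands are made up, and the import paths assume an Airflow 1.x-era layout.

.. code:: python

    # A DAG definition file only declares structure; it must evaluate quickly
    # because the scheduler re-parses it periodically, and no data is
    # processed while this file is being parsed.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="example_definition_only",
        start_date=datetime(2015, 6, 1),
        schedule_interval="@daily",
    )

    extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
    load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

    # Only the ordering is declared here; anything the tasks need to share at
    # run time goes through ``XCom``, not through this script.
    extract.set_downstream(load)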