Merge branch 'upstream-master' into feature/email-attachment
* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visibility level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...
dlstadther committed Aug 14, 2018
2 parents 70336bc + c696f40 commit 328c6bf
Showing 90 changed files with 4,611 additions and 935 deletions.
12 changes: 12 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,12 @@
# The following patterns are used to auto-assign review requests
# to specific individuals. Order is important; the last matching
# pattern takes the most precedence.

# These owners will be the default owners for everything in
# the repo, unless a later match takes precedence.
* @dlstadther @Tarrasch @ulzha

# Specific files, directories, paths, or file types can be
# assigned more specifically.
contrib/redshift*.py @dlstadther

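To make the "last matching pattern wins" rule concrete, here is a small Python sketch. It is a simplification for illustration only: `owners_for` is a hypothetical helper, and `fnmatch` merely approximates GitHub's gitignore-style pattern semantics.

```python
from fnmatch import fnmatchcase

# Rules copied from the CODEOWNERS file above, in file order.
RULES = [
    ("*", ["@dlstadther", "@Tarrasch", "@ulzha"]),
    ("contrib/redshift*.py", ["@dlstadther"]),
]


def owners_for(path, rules=RULES):
    """Return the owners of `path` (hypothetical helper): the LAST matching rule wins."""
    owners = []
    for pattern, rule_owners in rules:
        # fnmatchcase is only an approximation of GitHub's matching rules
        if fnmatchcase(path, pattern):
            owners = rule_owners  # a later match overrides any earlier one
    return owners


default_owners = owners_for("luigi/worker.py")
redshift_owners = owners_for("contrib/redshift.py")
```

Because the catch-all `*` comes first, it only applies when no later, more specific rule matches.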
File renamed without changes.
File renamed without changes.
20 changes: 20 additions & 0 deletions .github/stale.yml
@@ -0,0 +1,20 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 120
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 14
# Issues with these labels will never be considered stale
exemptLabels:
  - pinned
  - security
# Label to use when marking an issue as stale
staleLabel: wontfix
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
  This issue has been automatically marked as stale because it has not had
  recent activity. It will be closed if no further activity occurs.
  If closed, you may revisit when your time allows and reopen!
  Thank you for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false
# Limit to only `issues` or `pulls`
# only: issues
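
As a rough model of what these settings imply, the sketch below classifies an issue by its days of inactivity: 120 days to become stale plus 14 more to be closed, i.e. 134 days in total. This is an assumed simplification (any new activity resets the clock, and the bot's exact boundary handling may differ); `issue_state` is a hypothetical helper.

```python
DAYS_UNTIL_STALE = 120  # daysUntilStale
DAYS_UNTIL_CLOSE = 14   # daysUntilClose
EXEMPT_LABELS = {"pinned", "security"}  # exemptLabels


def issue_state(days_inactive, labels=()):
    """Hypothetical model: 'open', 'stale' or 'closed' after `days_inactive` idle days."""
    if EXEMPT_LABELS & set(labels):
        return "open"  # exempt issues are never marked stale
    if days_inactive >= DAYS_UNTIL_STALE + DAYS_UNTIL_CLOSE:
        return "closed"
    if days_inactive >= DAYS_UNTIL_STALE:
        return "stale"
    return "open"
```

For example, `issue_state(130)` gives `"stale"`, while `issue_state(500, ["pinned"])` stays `"open"`.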
4 changes: 3 additions & 1 deletion .gitignore
@@ -15,15 +15,17 @@ pig_property_file

packages.tar

# Ignore the data files
data
test/data
examples/data

Vagrantfile

*.pickle
*.rej
*.orig


# Created by https://www.gitignore.io

### Python ###
3 changes: 3 additions & 0 deletions .travis.yml
@@ -15,6 +15,9 @@ env:
- BQ_TEST_PROJECT_ID=luigi-travistestenvironment
- BQ_TEST_INPUT_BUCKET=luigi-bigquery-test
- GOOGLE_APPLICATION_CREDENTIALS=test/gcloud-credentials.json
- AWS_DEFAULT_REGION=us-east-1
- AWS_ACCESS_KEY_ID=accesskey
- AWS_SECRET_ACCESS_KEY=secretkey
matrix:
- TOXENV=flake8
- TOXENV=docs
3 changes: 3 additions & 0 deletions README.rst
@@ -148,6 +148,8 @@ or held presentations about Luigi:
* `Open Targets <https://www.opentargets.org/>`_ `(blog, 2017) <https://blog.opentargets.org/using-containers-with-luigi>`__
* `Leipzig University Library <https://ub.uni-leipzig.de>`_ `(presentation, 2016) <https://de.slideshare.net/MartinCzygan/build-your-own-discovery-index-of-scholary-eresources>`__ / `(project) <https://finc.info/de/datenquellen>`__
* `Synetiq <https://synetiq.net/>`_ `(presentation, 2017) <https://www.youtube.com/watch?v=M4xUQXogSfo>`__
* `Glossier <https://www.glossier.com/>`_ `(blog, 2018) <https://medium.com/glossier/how-to-build-a-data-warehouse-what-weve-learned-so-far-at-glossier-6ff1e1783e31>`__
* `Data Revenue <https://www.datarevenue.com/>`_ `(blog, 2018) <https://www.datarevenue.com/en/blog/how-to-scale-your-machine-learning-pipeline>`__

Some more companies are using Luigi but haven't had a chance yet to write about it:

@@ -163,6 +165,7 @@ Some more companies are using Luigi but haven't had a chance yet to write about
* `Deloitte <https://www.Deloitte.co.uk/>`_
* `Stacktome <https://stacktome.com/>`_
* `LINX+Neemu+Chaordic <https://www.chaordic.com.br/>`_
* `Foxberry <https://www.foxberry.com/>`_

We're more than happy to have your company added here. Just send a PR on GitHub.

35 changes: 23 additions & 12 deletions codecov.yml
@@ -1,21 +1,32 @@
# First just blindly copy paste what is default values from the docs page
# https://github.com/codecov/support/wiki/codecov.yml
coverage:
precision: 2
round: down
range: "70...100"
precision: 2 # Just copied from default
round: down # Just copied from default
range: "70...100" # Just copied from default

status:
project:
default: false # disable the default status that measures entire project
core:
target: 92%
paths: "luigi/*.py"
patch: # Just copied from default
default:
target: auto
if_no_uploads: error

patch:
default:
if_no_uploads: error

changes: true
changes: true # Just copied from default

ignore:
- "examples/"
- "luigi/tools" # These are tested as actual run commands without coverage
# List modules whose tests are not run by Travis or
# are run in subprocesses (like on a cluster).
- "luigi/contrib/gcs.py"
- "luigi/contrib/bigquery.py"
- "luigi/contrib/bigquery_avro.py"
- "luigi/contrib/hdfs/"
- "luigi/contrib/hadoop.py"
- "luigi/contrib/mrrunner.py"
- "luigi/contrib/kubernetes.py"

# But for luigi we do not want any comments
# For luigi we do not want any comments
comment: false
38 changes: 0 additions & 38 deletions doc/command_line.rst

This file was deleted.

56 changes: 50 additions & 6 deletions doc/configuration.rst
@@ -1,18 +1,35 @@
Configuration
=============

All configuration can be done by adding configuration files. They are looked for in:
All configuration can be done by adding configuration files.

* ``/etc/luigi/client.cfg``
* ``luigi.cfg`` (or its legacy name ``client.cfg``) in your current working directory
* ``LUIGI_CONFIG_PATH`` environment variable
Supported config parsers:

* ``cfg`` (default)
* ``toml``

in increasing order of preference. The order only matters in case of key conflicts (see docs for ConfigParser.read_). These files are meant for both the client and ``luigid``. If you decide to specify your own configuration you should make sure that both the client and ``luigid`` load it properly.
You can choose the parser via the ``LUIGI_CONFIG_PARSER`` environment variable, for example ``LUIGI_CONFIG_PARSER=toml``.

Config files for the default (cfg) parser are looked for in:

* ``/etc/luigi/client.cfg`` (deprecated)
* ``/etc/luigi/luigi.cfg``
* ``client.cfg`` (deprecated)
* ``luigi.cfg``
* ``LUIGI_CONFIG_PATH`` environment variable

Config files for the `TOML <https://github.com/toml-lang/toml>`_ parser are looked for in:

* ``/etc/luigi/luigi.toml``
* ``luigi.toml``
* ``LUIGI_CONFIG_PATH`` environment variable

Both lists are in increasing order of priority, so files later in a list take precedence. The order only matters in case of key conflicts (see docs for ConfigParser.read_). These files are meant for both the client and ``luigid``. If you decide to specify your own configuration, make sure that both the client and ``luigid`` load it properly.

.. _ConfigParser.read: https://docs.python.org/3.6/library/configparser.html#configparser.ConfigParser.read
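
The lookup order and the "later file wins" rule can be sketched with the standard library alone. This is not luigi's own loader: `candidate_paths` and `load_cfg` are hypothetical helpers, the path lists are copied from above, and ``LUIGI_CONFIG_PATH`` is assumed to hold a single path.

```python
import configparser
import os
import tempfile

# Candidate files per parser, in increasing priority (copied from the lists above).
_CANDIDATES = {
    "cfg": ["/etc/luigi/client.cfg", "/etc/luigi/luigi.cfg", "client.cfg", "luigi.cfg"],
    "toml": ["/etc/luigi/luigi.toml", "luigi.toml"],
}


def candidate_paths(environ=os.environ):
    """Hypothetical helper: files to read for the parser selected by LUIGI_CONFIG_PARSER."""
    parser = environ.get("LUIGI_CONFIG_PARSER", "cfg")
    paths = list(_CANDIDATES[parser])
    extra = environ.get("LUIGI_CONFIG_PATH")
    if extra:
        paths.append(extra)  # highest priority, read last
    return paths


def load_cfg(paths):
    """Hypothetical helper: read cfg files in order of increasing priority."""
    config = configparser.ConfigParser()
    # ConfigParser.read skips files that do not exist, and values from later
    # files override earlier ones on key conflicts.
    config.read(paths)
    return config


# Usage: two files define [core]; only the conflicting key is overridden.
with tempfile.TemporaryDirectory() as tmp:
    low = os.path.join(tmp, "luigi.cfg")
    high = os.path.join(tmp, "override.cfg")
    with open(low, "w") as f:
        f.write("[core]\nscheduler_host=low-priority-host\nscheduler_port=8082\n")
    with open(high, "w") as f:
        f.write("[core]\nscheduler_host=high-priority-host\n")
    cfg = load_cfg([low, os.path.join(tmp, "missing.cfg"), high])
    host = cfg.get("core", "scheduler_host")
    port = cfg.get("core", "scheduler_port")  # only set in the lower-priority file
```

Here `host` comes from the higher-priority file, while `port` is still inherited from the lower-priority one, which is why the order only matters for conflicting keys.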

The config file is broken into sections, each controlling a different part of the config. Example configuration file:
The config file is broken into sections, each controlling a different part of the config.

Example cfg config:

.. code:: ini
@@ -23,6 +40,17 @@ The config file is broken into sections, each controlling a different part of th
    [core]
    scheduler_host=luigi-host.mycompany.foo

Example toml config:

.. code:: toml

    [hadoop]
    version = "cdh4"
    streaming-jar = "/usr/lib/hadoop-xyz/hadoop-streaming-xyz-123.jar"

    [core]
    scheduler_host = "luigi-host.mycompany.foo"

.. _ParamConfigIngestion:

@@ -154,6 +182,7 @@ parallel_scheduling
  If true, the scheduler will compute the ``complete()`` methods of tasks in
  parallel using multiprocessing. This can significantly speed up
  scheduling, but requires that all tasks can be pickled.
  Defaults to false.

parallel-scheduling-processes
  The number of processes to use for parallel scheduling. If not specified
@@ -270,6 +299,12 @@ check_unfulfilled_deps
  resource-intensive.
  Defaults to true.

force_multiprocessing
  By default, luigi uses multiprocessing when *more than one* worker process is
  requested. When set to true, multiprocessing is used regardless of the
  number of workers.
  Defaults to false.

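The rule described for ``force_multiprocessing`` reduces to a single boolean expression. The function below is a hypothetical sketch of that rule as documented here, not luigi's worker code.

```python
def uses_multiprocessing(workers, force_multiprocessing=False):
    """Hypothetical sketch: multiprocessing is used for >1 worker, or when forced."""
    return force_multiprocessing or workers > 1
```

With one worker and the default setting, tasks run without multiprocessing; setting the option to true changes only that single-worker case.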

[elasticsearch]
---------------
@@ -716,6 +751,15 @@ worker_disconnect_delay
  scheduler before removing it and marking all of its running tasks as
  failed. Defaults to 60.

pause_enabled
  If false, disables pause/unpause operations and hides the pause toggle from
  the visualiser.

send_messages
  When true, the scheduler is allowed to send messages to running tasks, and
  the scheduler UI provides a simple prompt per task for sending them.
  Defaults to true.


[sendgrid]
----------
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -15,7 +15,7 @@ Table of Contents
workflows.rst
tasks.rst
parameters.rst
command_line.rst
running_luigi.rst
central_scheduler.rst
execution_model.rst
luigi_patterns.rst
63 changes: 63 additions & 0 deletions doc/luigi_patterns.rst
@@ -226,6 +226,33 @@ the task parameters or other dynamic attributes:
Since, by default, resources have a usage limit of 1, no two instances of Task A
will now run if they have the same `important_file_name` property.

Decreasing resources of running tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At scheduling time, the luigi scheduler needs to know the maximum resource
consumption a task might have once it runs. For some tasks, however, it can be
beneficial to decrease the amount of consumed resources between two steps
within their run method (e.g. after some heavy computation). The released
amount becomes available again, so a different task waiting for that
particular resource can be scheduled earlier.

.. code-block:: python

    import luigi


    class A(luigi.Task):

        # set maximum resources a priori
        resources = {"some_resource": 3}

        def run(self):
            # do something
            ...

            # decrease consumption of "some_resource" by one
            self.decrease_running_resources({"some_resource": 1})

            # continue with reduced resources
            ...

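
Why decreasing helps can be seen in a toy model of scheduler-side resource accounting. This is an illustrative assumption, not luigi's scheduler: `ResourcePool` is hypothetical, and a limit of 3 units for ``some_resource`` is assumed.

```python
class ResourcePool:
    """Toy model of per-resource usage limits (hypothetical, not luigi's implementation)."""

    def __init__(self, limits):
        self.limits = dict(limits)
        self.used = {name: 0 for name in limits}

    def can_schedule(self, resources):
        # a task fits only if every requested resource stays within its limit
        return all(self.used[r] + n <= self.limits[r] for r, n in resources.items())

    def acquire(self, resources):
        if not self.can_schedule(resources):
            raise RuntimeError("insufficient resources")
        for r, n in resources.items():
            self.used[r] += n

    def decrease(self, resources):
        # what a running task releases when it lowers its consumption
        for r, n in resources.items():
            self.used[r] = max(0, self.used[r] - n)


pool = ResourcePool({"some_resource": 3})       # assumed limit
pool.acquire({"some_resource": 3})              # task A starts with its maximum
blocked = pool.can_schedule({"some_resource": 1})
pool.decrease({"some_resource": 1})             # A releases one unit mid-run
unblocked = pool.can_schedule({"some_resource": 1})
```

A second task needing one unit is blocked while A holds all three, and becomes schedulable as soon as A releases one.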
Monitoring task pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -290,3 +317,39 @@ built-in solutions. In the case of you're dealing with a file system
:meth:`~luigi.target.FileSystemTarget.temporary_path`. For other targets, you
should ensure that the way you're writing your final output directory is
atomic.

Sending messages to tasks
~~~~~~~~~~~~~~~~~~~~~~~~~

The central scheduler is able to send messages to particular tasks. When a running task accepts
messages, it can access a `multiprocessing.Queue <https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues>`__
object storing incoming messages. You can implement custom behavior to react and respond to
messages:

.. code-block:: python

    import luigi


    class Example(luigi.Task):

        # common task setup
        ...

        # configure the task to accept all incoming messages
        accepts_messages = True

        def run(self):
            # this example runs some loop and listens for the
            # "terminate" message, and responds to all other messages
            for _ in some_loop():  # some_loop() is a placeholder for your own work loop
                # check incoming messages
                if not self.scheduler_messages.empty():
                    msg = self.scheduler_messages.get()
                    if msg.content == "terminate":
                        break
                    else:
                        msg.respond("unknown message")

            # finalize
            ...

Messages can be sent directly from the scheduler UI, which also displays any responses. Note that
this feature is only available when the scheduler is configured to send messages (see the ``send_messages`` option in the :ref:`scheduler-config` section) and the task is configured to accept them.
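
The message-handling loop can be tried outside luigi with a stand-in queue. In this sketch `FakeMessage` and `run_loop` are hypothetical; it uses `queue.Queue` so it runs in one process (luigi hands the task a `multiprocessing.Queue`), and it uses `get_nowait()` instead of `empty()` followed by `get()`.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class FakeMessage:
    """Hypothetical stand-in for the message objects a task receives."""
    content: str
    responses: list = field(default_factory=list)

    def respond(self, response):
        self.responses.append(response)


def run_loop(messages, steps):
    """Run up to `steps` iterations, stopping early on a "terminate" message."""
    completed = 0
    for _ in range(steps):
        try:
            msg = messages.get_nowait()
        except queue.Empty:
            msg = None  # no pending message in this iteration
        if msg is not None:
            if msg.content == "terminate":
                break
            msg.respond("unknown message")
        completed += 1  # stand-in for one unit of real work
    return completed


q = queue.Queue()
hello = FakeMessage("hello")
q.put(hello)
q.put(FakeMessage("terminate"))
done = run_loop(q, 10)
```

The first iteration answers `"hello"` with `"unknown message"`; the second sees `"terminate"` and stops, so only one iteration of work completes.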
21 changes: 20 additions & 1 deletion doc/parameters.rst
@@ -25,7 +25,7 @@ i.e.
.. code:: python
d = DailyReport(datetime.date(2012, 5, 10))
print d.date
print(d.date)
will return the same date that the object was constructed with.
Same goes if you invoke Luigi on the command line.
@@ -88,6 +88,25 @@ are not the same instance:
>>> hash(c) == hash(d)
True
Parameter visibility
^^^^^^^^^^^^^^^^^^^^

Using :class:`~luigi.parameter.ParameterVisibility` you can configure parameter visibility. By default, all
parameters are public, but you can also make them hidden or private.

.. code:: python

    >>> import luigi
    >>> from luigi.parameter import ParameterVisibility
    >>> luigi.Parameter(visibility=ParameterVisibility.PRIVATE)

* ``ParameterVisibility.PUBLIC`` (default): visible everywhere.
* ``ParameterVisibility.HIDDEN``: hidden in the web UI, but still saved into the database if task history saving is enabled.
* ``ParameterVisibility.PRIVATE``: visible only inside the task.

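The three levels can be read as two filters: what the web UI shows and what the history database stores. This is a hypothetical sketch; `Visibility` only mirrors the names above and is not luigi's `ParameterVisibility`, and the filter functions are illustrative.

```python
from enum import Enum


class Visibility(Enum):
    """Hypothetical mirror of the three visibility levels."""
    PUBLIC = "public"
    HIDDEN = "hidden"
    PRIVATE = "private"


def shown_in_web_ui(params):
    # only public parameters are displayed
    return {k: v for k, (v, vis) in params.items() if vis is Visibility.PUBLIC}


def stored_in_history(params):
    # hidden parameters are still saved; private ones never leave the task
    return {k: v for k, (v, vis) in params.items() if vis is not Visibility.PRIVATE}


params = {
    "date": ("2018-08-14", Visibility.PUBLIC),
    "batch_size": (500, Visibility.HIDDEN),
    "password": ("s3cr3t", Visibility.PRIVATE),
}
ui_view = shown_in_web_ui(params)
history_view = stored_in_history(params)
```

In this model, ``batch_size`` is absent from the UI view but present in the stored history, while ``password`` appears in neither.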
Parameter types
^^^^^^^^^^^^^^^
