Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow use of external pipelines with Control.py #390

Merged
merged 26 commits into from
Jan 29, 2018

Conversation

antoniojbt
Copy link
Member

@antoniojbt antoniojbt commented Jan 24, 2018

Hi @sebastian-luna-valero, @Acribbs, @AndreasHeger,
I've made a few changes to Control.py in order to:

  • Call an external pipeline
  • Copy configuration files from that pipeline
  • Copy any additional config files (if using sphinx but not CGATReport for instance)

Changes are potentially breaking.
Could you please check carefully when possible?

Currently, we run pipelines with:

cgatflow readqc CMD
python ../some/path/pipe.py CMD

And possibly as:

pipeline_external CMD

This last one can be done easily if it is a python package with the correct entry point in its setup.py

I've tested changes with readqc and an external pipeline I'm working on:

pipeline_QTL

in a CentOS HPC and locally in a Mac.

The external pipeline can be installed with pip and ran as e.g.:

pipeline_QTL --help
pipeline_QTL config

Most of its dependencies are already included in CGATPipelines. You might be missing r-docopt only. See:

https://raw.githubusercontent.com/EpiCompBio/pipeline_QTL/master/installation/Dockerfile

for detailed instructions. The Dockerfile itself is untested though.

Additionally, there are some very minor changes to make configuration files slightly more generic. These can be discarded. It may be a thought to have generic conf.py and pipeline.ini files if later transferring to CGAT core for instance.

Any thoughts, changes, errors, etc. please let me know.

Thanks!
Antonio

f = sys._getframe(1)
#caller = f.f_globals["__file__"] # cgatflow config
# globals will get Control.py
caller = f.f_locals["__file__"] # TO DO: cgatflow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreasHeger
Do we need globals here instead of locals?
The module's namespace will get Control.py and breaks the rest of the changes.
locals seems to work OK but needs more tests.
If globals is needed let me know if you can think of a better way of calling external code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. There might have been python2/python3 issues here, but can't remember. Now with python3 only, not a problem any more.

for path in config_paths:
if os.path.exists(path) and os.path.isdir(path):
for f in os.listdir(os.path.abspath(path)):
if fnmatch.fnmatch(f, 'pipeline*ini'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will pick up two pipeline.ini files e.g.:

https://github.com/CGATOxford/CGATPipelines/blob/master/CGATPipelines/pipeline_readqc/pipeline.ini
https://github.com/CGATOxford/CGATPipelines/blob/master/CGATPipelines/configuration/pipeline.ini

The specific pipeline ini file is picked up first though, might cause unintended issues elsewhere.

@antoniojbt
Copy link
Member Author

PS @sebastian-luna-valero
Several commits passed tests but would have easily broken CGATPipelines. I have no suggestions at the moment as I didn't look in detail. Is any one pipeline executed as part of the tests? I'll check tests when possible though to get a better understanding.

@antoniojbt antoniojbt requested a review from IanSudbery January 24, 2018 18:18
@sebastian-luna-valero
Copy link
Member

Many thanks @antoniojbt

The tests on travis do not run pipelines since the resources over there are quite limited (only a tiny virtual machine). However, we do so internally in our cluster with Jenkins but the testing results are not publicly accessible yet (only people with CGAT accounts can see it if they want to). I am planning to open it up at some point anyway.

I have rerun the Jenkins tests for this branch. To make it work internally I had to create a $HOME/.cgat with:

[cluster]
queue=all.q

and all the tests passed.

Having said that, rather than merging these changes into the master now, I would wait for the CGAT core code to come out, since all pipelines would need refactoring anyway after we move in that direction.

@AndreasHeger
Copy link
Member

@sebastian-luna-valero suggestions is sensible.
Great stuff, @antoniojbt . Having pipelines pip-installable sounds useful.

@antoniojbt
Copy link
Member Author

Hi @sebastian-luna-valero and @AndreasHeger,
Thanks for testing, happy to hear changes pass (!).
This is only a suggestion of course but I had coded it already and decided to submit a PR.

What about merging only changes to Control.py and Cluster.py?

That should mean that .cgat and .ini file defaults should run without the need to change anything.

More generally, what's are the thoughts/plans at the moment for refactoring? Would switching this PR to CGATCore be better for instance?

We could hold a second meeting if easier. I'd be keen on being able to continue working on CGAT code and external pipelines without too much worry. Happy to wait otherwise and run off a branch though.

Antonio

@sebastian-luna-valero
Copy link
Member

I think we could merge right now the changes for:

CGATPipelines/Pipeline/__init__.py 
CGATPipelines/configuration/pipeline.ini 
CGATPipelines/Pipeline/Cluster.py 

and leave out the changes for Control.py due to the incoming changes?

Are you happy with this @antoniojbt ?

@AndreasHeger will announce it in due course -- pretty soon by the looks of it! ;)

@antoniojbt
Copy link
Member Author

@sebastian-luna-valero
That's fine with me. Reverting for now then in a new commit and merging recent changes from master...
Looking forward to @AndreasHeger announcement 👍

@sebastian-luna-valero
Copy link
Member

Great, thanks @antoniojbt

Before merging into the master branch, I just thought that adding to-set to all parameters that need to be configured makes it easier to find where to change them with: grep to-set -r <clone-of-cgat>

Is that helpful?

@antoniojbt
Copy link
Member Author

Hi @sebastian-luna-valero
Yes, that is probably an easy way to quickly check what needs setting. We'll probably need a more thorough check in the future to catch all of them though
Thanks!
Antonio

@sebastian-luna-valero sebastian-luna-valero merged commit 69603cf into master Jan 29, 2018
@sebastian-luna-valero sebastian-luna-valero deleted the AJBT-pipeline-control branch January 29, 2018 16:30
@AndreasHeger
Copy link
Member

Hi @sebastian-luna-valero and @antoniojbt , in the past we used !? to mark parameters that need to be set manually.

@sebastian-luna-valero
Copy link
Member

Thanks @AndreasHeger

I have replaced to-set with !? in 50430d2 to be consistent with the previous convention

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants