4364 install prov flask #4461

Closed
wants to merge 18 commits into from

@pdurbin pdurbin commented Feb 8, 2018

Connects to #4364: As a Dataverse Installation Administrator, I want to set up the Provenance system to run alongside Dataverse so that researchers using my installation can track and view provenance.

@coveralls

Coverage increased (+0.0009%) to 12.856% when pulling 01bfa41 on 4364-install-prov-flask into 2baa606 on develop.

@pdurbin pdurbin requested a review from donsizemore February 8, 2018 20:05
Contributor

@pameyer pameyer left a comment

Short version:

  • A cleaner separation between building the CPL RPM and installing it in production might be helpful (or the existing separation could be clarified). Possibly one script to set up the build VM/container, one to build the RPM, and one to install and configure it?
  • It would be preferable to use Apache or nginx rather than the Flask development server in production.


If you're feeling adventurous, you can attempt to install CPL directly on your Mac but this is not recommended. Rather, below CPL runs as a REST service within Vagrant.

First, install Vagrant and VirtualBox as described in the :doc:`tools` section.
Contributor

It seems like it would be less complex to move these two files (possibly renamed if they're too DV specific) to the prov-cpl repo.

If not, it would probably be more consistent if these were moved to conf/ (rather than doc/); or we should move the docker-related stuff to a different directory with the idea of having all dev VM/containerization/etc living in one place.

Member Author

I put files in doc to make them easily downloadable from Sphinx. If anything I would move more conf files under Sphinx when they are used as examples. "Download this file, edit it, etc."

We can certainly see if @jacksonokuhn would like to have Vagrant files in the prov-cpl repo. I didn't want to inhibit progress by making that a dependency. My goal was to unblock Dataverse developers who wanted to get started on prov related work and needed a prov system running on their laptops. Incremental progress. We can make improvements in the future.


curl -X PUT -d 'http://localhost:7777' http://localhost:8080/api/admin/settings/:ProvServiceUrl

Installing CPL on Ubuntu
Contributor

It would be good to mention which versions of Ubuntu are expected to work. One version is mentioned below, but it might be clearer to move it to earlier in this section.

Member Author

As someone who is relatively familiar with Vagrant, I would disagree with documenting the version of Ubuntu because what matters is the "base box" in the Vagrantfile. This is currently config.vm.box = "ubuntu/xenial64" which means 16.04 (the latest LTS as of this writing). Basically a comment here has the potential to get out of date. That said, I'm fine adding a comment if it's truly helpful.


Build the RPM:

``rpmbuild -ba dataverse-provenance.spec``
Contributor

For consistency, it might be better to put these steps into a script that can be run from within the Vagrantfile. Manually editing the existing Vagrantfile might be simpler, but would also slow down automating things.

Member Author

Maybe, but I think @landreev and I are the only ones on the team who have ever built an RPM. I purposefully documented the steps in prose. I could see adding more automation once we've gotten more familiar with the process. Unfortunately, we can't use the HMDC Jenkins server to automate the RPM building because it's running el6 and there appear to be incompatibilities between el6 and el7. I left a comment about this at #4364 (comment). For now the plan is to build this RPM in Vagrant, and I don't want to be the only one who knows how. Note also that the version of the RPM matters. At this stage in the game, a human needs to make decisions about when to bump the version and when to get in touch with the upstream author about cutting a release, etc. It's complicated.

Contributor

Ok - I was reminded of the discussions we had in #4369 about documentation / scripts for "surgery" on dependencies, which was why I mentioned it.


Download :download:`dataverse-provenance.service <../_static/installation/files/etc/systemd/system/dataverse-provenance.service>` and place it at ``/etc/systemd/system/dataverse-provenance.service``. Here are the default contents of that file, but see below for adjustments you might want to make:

.. literalinclude:: ../_static/installation/files/etc/systemd/system/dataverse-provenance.service
Contributor

Using the flask development server in production is probably something we should avoid.

Member Author

Huh. Sure enough http://flask.pocoo.org/docs/0.12/deploying/#deployment says, "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time." Do you have any suggestions on how to deploy a Flask app in a better way? Heads up to @jacksonokuhn about this.

Contributor

Apache / wsgi would be the way that I'd go; but there are other approaches.
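
To make the Apache/WSGI suggestion concrete, here is a minimal sketch of the interface a production WSGI server calls. It is an illustration only and not code from this PR: the file name `cpl_rest_wsgi.py`, the helper `call_app`, and the placeholder response body are all hypothetical. By default mod_wsgi looks for a module-level callable named `application`; for the real service that would presumably be whatever Flask application object cpl-rest.py defines (an assumption), not the hand-written stand-in below.

```python
# cpl_rest_wsgi.py (hypothetical file name)
# A stand-in WSGI application, written with the standard library only, to show
# the callable that a production WSGI server (e.g. Apache with mod_wsgi) invokes.
import json


def application(environ, start_response):
    """Answer requests for /provapi/version; return 404 for anything else."""
    if environ.get("PATH_INFO") == "/provapi/version":
        status = "200 OK"
        payload = {"version": "placeholder"}  # not the real CPL response
    else:
        status = "404 Not Found"
        payload = {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the WSGI callable directly, as a server would; return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = application({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response)
    return captured["status"], b"".join(chunks)
```

The point of the design is that the server process (Apache, or another WSGI server) owns concurrency and process management, while the application only exposes a callable.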

@@ -0,0 +1,79 @@
#!/bin/bash
Contributor

I'd suggest splitting these into build-provenance.sh and install-provenance.sh?

Member Author

If anything, I might delete this script. It doesn't quite work, especially the part around postgresql-setup-conf.sql.

Contributor

That sounds like a way to reduce the confusion.

sudo pip install flask
cd /prov-cpl/bindings/python/RestAPI
REST_SERVICE_USER=postgres # FIXME: create a "cplrest" user?
su $REST_SERVICE_USER -s /bin/sh -c "python cpl-rest.py --host=0.0.0.0"
Contributor

It was a little surprising to me that vagrant up did not return. I'd agree about creating a cplrest user, even though it shouldn't matter inside a dev VM.

I also ran into the "vagrant halt doesn't halt" problem with this file, but that may be more related to my system than to this Vagrantfile.

Member Author

Interesting. I haven't run into problems with vagrant halt (or vagrant destroy). I know it's weird to run the service as the postgres user but like you said, this is a dev VM and doesn't really matter. For production I'm using the name @jacksonokuhn and I talked about (cplservice) but I'm open to changing it.

Contributor

cplservice for production sounds reasonable to me; but I'll defer to @jacksonokuhn . postgres for dev works (even if it's non-standard).

Install ``dataverse-provenance`` RPM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Download this :download:`dataverse-provenance RPM <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/dataverse-provenance-0.1-1.x86_64.rpm>` and install it with:
Contributor

guides.dataverse.org isn't currently available over HTTPS; distributing RPMs over plain HTTP is probably worth avoiding.

Member Author

That's a fair point but no one has complained (that I'm aware of) about downloading the rapache RPM over plain HTTP: http://guides.dataverse.org/en/4.8.5/installation/r-rapache-tworavens.html#c-rapache

We did have a very security-minded individual take a look at our dev guide a couple years ago. You might enjoy reading through this comment, @pameyer: #2863 (comment)

Contributor

If I installed rapache, it would be an issue I'd raise.
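
One common mitigation when a package has to be fetched over plain HTTP is to compare it against a checksum obtained through a trusted channel. Nothing in this thread says the project publishes such checksums, so the sketch below assumes one is available; `sha256_of_file` and `verify_sha256` are hypothetical helper names, not part of the PR.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's digest matches expected_hex (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A caller would pass the path of the downloaded RPM and the published digest, and refuse to install if `verify_sha256` returns False. This only helps if the expected digest itself arrives over a channel the administrator trusts.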

echo "Starting CPL REST service..."
sudo bash -c "su -l $CPL_SERVICE_USER -c 'LD_LIBRARY_PATH=/usr/local/lib python cpl-rest.py --host=0.0.0.0 &'"
echo "Checking version of CPL REST service..."
curl http://localhost:5000/provapi/version
Contributor

These checks failed when I ran vagrant up: cpl-rest.py wasn't running. My initial guess was a cwd mismatch, but that was incorrect; the CPL module is not installed in the default Python of the cplsystem user.

Member Author

As I mentioned above, the line about postgresql-setup-conf.sql isn't quite working. That's probably why the checks fail. The database user probably hasn't been created.
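
Separately from the root cause, the version check could be made less sensitive to startup timing by polling until the service answers, rather than calling curl once right after backgrounding the process. The sketch below is a hypothetical helper, not part of this PR; the retry count and delay are arbitrary assumptions. It would not fix a service that never starts (for example because of the missing CPL module or database user discussed here), but it would report that failure explicitly.

```python
import time
import urllib.request


def fetch(url, timeout=2.0):
    """Return the response body as text; raises OSError (e.g. URLError) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def wait_for_service(url, attempts=10, delay=1.0, fetcher=fetch, sleep=time.sleep):
    """Poll url until fetcher succeeds; return the body, or None if it never does."""
    for attempt in range(attempts):
        try:
            return fetcher(url)
        except OSError:  # urllib's URLError and socket timeouts are OSError subclasses
            if attempt < attempts - 1:
                sleep(delay)
    return None
```

For the script quoted above, the call would be `wait_for_service("http://localhost:5000/provapi/version")`, with a `None` result treated as "the CPL REST service did not come up".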

echo "Install dependencies for CPL REST service..."
sudo yum install -y python-flask
echo "Downloading CPL REST service script..."
sudo bash -c "su -l $CPL_SERVICE_USER -c 'wget https://raw.githubusercontent.com/ProvTools/prov-cpl/8150ee315abc21712b49da2bf4cfdbf308eef1d7/bindings/python/RestAPI/cpl-rest.py'"
Contributor

I'm confused why this script would download a file from a repository that has already been cloned.

Member Author

The postgres user doesn't have access to /home/vagrant which is where the git repo has been cloned.

Member Author

pdurbin commented Apr 10, 2018

Closing. As explained at #4364 (comment) the RPM will be built using a Vagrantfile in the prov-cpl repo. The RPM will likely be hosted as a binary uploaded to a release of that repo. Installation instructions will reside in the prov-cpl repo.

@pdurbin pdurbin closed this Apr 10, 2018
@matthew-a-dunlap matthew-a-dunlap deleted the 4364-install-prov-flask branch January 16, 2019 21:13
@matthew-a-dunlap matthew-a-dunlap restored the 4364-install-prov-flask branch January 16, 2019 21:13

3 participants