Improve automation of data publishing #5

Closed
ColmMassey opened this issue Feb 4, 2020 · 37 comments

@ColmMassey
Collaborator

Explore what we can do to reduce the number of manual steps required to publish datasets to the Virtuoso server and the web server.

@ColmMassey
Collaborator Author

Include here exploring options for triggering the publication of some new data, e.g. after some new Limesurvey entries have been approved.

@wu-lee
Contributor

wu-lee commented Feb 4, 2020

If the sausage factory is going to remain implemented in Ruby, it makes sense to either use Rake instead of Make, or just provide a script to invoke.

The Makefiles I've seen tend to:

  1. Require the user to edit variables in some Makefile;
  2. Which are often committed to the repo, and so:
    1. local changes and committed changes fight;
    2. local changes risk being re-committed and shooting someone else's foot (or merely creating churn).
  3. Mix concerns, in that config and code (or implementation) are intertwined.

My suggestions:

  1. Put any config in a separate place to the implementation.
  2. Do not require changes to committed files for local configuration (perhaps allow a local file, which is .gitignored, to override selected defaults, or use an equivalent of NPM's project settings)
  3. Minimise the amount of configuration required - or at least make easy cases easy and keep the complicated cases out of the typical usage process.
  4. Avoid imposing long commands and recipes on the user.

For example:

config.rb set 'user' 'joe'
config.rb set 'host' 'sea-map'
build.rb --dataset $dataset # if there are choices like  $dataset, they could be configured or go here
deploy.rb # uses configured destination above
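
A minimal sketch of suggestions 1 and 2 in Ruby, assuming a YAML file holds the local overrides. The file name `config.local.yml`, the default values, and the helper names `load_config` and `set_config` are hypothetical and not taken from the repository.

```ruby
require 'yaml'

# Defaults that would be committed to the repository (placeholder values).
DEFAULTS = {
  'user'    => 'deploy',
  'host'    => 'localhost',
  'dataset' => 'example'
}.freeze

# Read the git-ignored local file, or return an empty hash if it is absent.
def read_overrides(local_path)
  return {} unless File.exist?(local_path)

  YAML.safe_load(File.read(local_path)) || {}
end

# Merge local overrides on top of the committed defaults.
def load_config(local_path = 'config.local.yml')
  DEFAULTS.merge(read_overrides(local_path))
end

# Persist one key locally, roughly what `config.rb set 'user' 'joe'` might do.
def set_config(key, value, local_path = 'config.local.yml')
  updated = read_overrides(local_path).merge(key => value)
  File.write(local_path, updated.to_yaml)
  updated
end
```

With `config.local.yml` listed in `.gitignore`, committed files never need editing for local settings, and `build.rb` or `deploy.rb` would only call `load_config`.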

@ColmMassey ColmMassey transferred this issue from DigitalCommons/open-data-and-maps Mar 4, 2020
@ColmMassey
Collaborator Author

ColmMassey commented Mar 4, 2020

How many manual steps are there currently in rebuilding and deploying a triplestore graph? (Assuming the original.csv file is present in the correct directory and the structure is unchanged from the previous build.) @dtmakm27

@ColmMassey
Collaborator Author

  • 1 to build standard csv
  • 1 to generate rdf
  • 1 to deploy rdf
  • 1 to deploy the virtuoso
  • 1 generates one to get the data on virtuoso
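
As a rough illustration of how the steps above might collapse into one command under Rake, here is a sketch. The task names are hypothetical, one per listed step, and each body only records that it ran; a real task would invoke the existing build or deploy command instead.

```ruby
require 'rake'

# Records which steps ran, in order. A real task would shell out instead.
STEPS_RUN = []

# Hypothetical task names, one per manual step in the list above.
task :standard_csv do
  STEPS_RUN << :standard_csv
end

task generate_rdf: :standard_csv do
  STEPS_RUN << :generate_rdf
end

task deploy_rdf: :generate_rdf do
  STEPS_RUN << :deploy_rdf
end

task deploy_virtuoso: :deploy_rdf do
  STEPS_RUN << :deploy_virtuoso
end

task load_virtuoso: :deploy_virtuoso do
  STEPS_RUN << :load_virtuoso
end

# One entry point; Rake runs each prerequisite once, in dependency order.
task publish: :load_virtuoso

Rake::Task[:publish].invoke
puts STEPS_RUN.join(' -> ')
```

In an actual Rakefile the `require` and the explicit `invoke` would be dropped, and the whole chain would be run with `rake publish`.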

@ColmMassey
Collaborator Author

There are then project-specific steps:

  • Limesurvey - Oxford/Newbridge
  • DotCoop
  • Co-ops UK - Has two source csv files, initiatives.csv and organisations.csv

@ColmMassey
Collaborator Author

ColmMassey commented Mar 5, 2020

> Include here exploring options for triggering the publication of some new data, e.g. after some new Limesurvey entries have been approved.

Let's give this its own Issue.

@wu-lee
Contributor

wu-lee commented Mar 5, 2020

I have a feeling there may be other steps too - for instance

  • when there is more than one graph being queried the data generation needs to be repeated for each
  • some datasets need SAMEAS mappings generated too
  • I'm currently trying to find out where the data for URIs like this come from: https://w3id.solidarityeconomy.coop/essglobal/V2a/

@wu-lee
Contributor

wu-lee commented Mar 5, 2020

I've added a Makefile here on a side-branch dotcoop-demo-nick which works on the dotcoop build and makes it simpler / fewer steps (run make all, but you still need to upload to Virtuoso manually as a final step)

(The branch is where I'm trying to work out how to deploy a build entirely on dev-0 with no reference to sea-0, which contains various historical data sets, and I wonder if it might unwittingly depend on some.)

@ColmMassey
Collaborator Author

@wu-lee does it make sense to get the simpler Newbridge database working first?

@wu-lee
Contributor

wu-lee commented Mar 5, 2020

> does it make sense to get the simpler Newbridge database working first?

Possibly? I'm just starting with what I know / want to get to... and won't that also need ESSGlobal URIs?

I think this code generates the ESSGlobal data:

https://github.com/SolidarityEconomyAssociation/open-data-and-maps/tree/master/vocab

And that needs to be deployed too, currently it's on https://vocabs.solidarityeconomy.coop/

The w3id redirector on dev-0 currently redirects self-resolving URIs like this:

https://dev.w3id.solidarityeconomy.coop/essglobal/V2a/vocab/SSEInitiative

...to the sea-0 pages like this:

https://vocabs.solidarityeconomy.coop/essglobal/V2a/html-content/essglobal.html#SSEInitiative/

And the sausage factory needs these URLs to resolve, or it explodes. This means a self-contained deployment on dev-0 would need to modify the w3id redirector on it to resolve these URLs to dev.vocabs.solidarityeconomy.coop, and there'd need to be the ESSGlobal vocab data deployed there too. Possibly we don't have to do that if the vocab is essentially unchanging, but it is useful to note that this is an essential thread in the weaving process.
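
Since the build depends on these URIs resolving, a pre-flight check could catch a broken redirect before the sausage factory runs. The sketch below is an assumption about how such a check might look, not something in the repository: the name `resolve_uri`, the hop limit, and the injectable `fetch` parameter are all hypothetical.

```ruby
require 'net/http'
require 'uri'

# Default fetcher performs a real HTTP GET; a stub can be swapped in for tests.
DEFAULT_FETCH = ->(url) { Net::HTTP.get_response(URI(url)) }

# Follow redirects from `url` and return the final URL that answers with a
# 2xx status, or nil if the chain fails or exceeds `max_hops` requests.
def resolve_uri(url, max_hops: 5, fetch: DEFAULT_FETCH)
  max_hops.times do
    response = fetch.call(url)
    case response
    when Net::HTTPSuccess
      return url
    when Net::HTTPRedirection
      location = response['location']
      return nil if location.nil?

      # Location may be relative, so resolve it against the current URL.
      url = URI.join(url, location).to_s
    else
      return nil
    end
  end
  nil
end
```

For example, `resolve_uri('https://dev.w3id.solidarityeconomy.coop/essglobal/V2a/vocab/SSEInitiative')` would return the page the redirector finally lands on, or `nil` if the chain is broken, which a build script could treat as a reason to stop early.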

@wu-lee
Contributor

wu-lee commented Mar 5, 2020

Furthermore, digging around on the vocabs site, I see it says here the data is generated by a script:

https://vocabs.solidarityeconomy.coop/essglobal/V2a/DO_NOT_EDIT_HERE.txt

Specifically:

https://github.com/essglobal-linked-open-data/map-sse/blob/develop/generators/Makefile

I see Matt Wallis is the main author in that repo (although there is one commit by marianamalta). Do we have access to it? And do we know if or how that is used by the open-data-and-maps repo?

@ColmMassey
Collaborator Author

> Do we have access to it?

I think I have just invited you to the repository.

@ColmMassey
Collaborator Author

> Possibly we don't have to do that if the vocab is essentially unchanging

It is not changed often but does change. I 'think' the last change we made was the addition of Co-operative to the Organisation Structure, as we sometimes knew something was a Co-op, but not what type.

https://w3id.solidarityeconomy.coop/essglobal/V2a/standard/organisational-structure/OS115
OS115
Co-operative.

@ColmMassey
Collaborator Author

> I 'think' the last change we made was the addition of Co-operative to the Organisation Structure, as we sometimes knew something was a Co-op, but not what type.

Yes, here it is.

@wu-lee
Contributor

wu-lee commented Mar 6, 2020

I notice that the ESSGlobal vocab page's title (seen in browser tabs) is "Raptor Graph Serialisation" - is that correct?

https://vocabs.solidarityeconomy.coop/essglobal/V2a/vocab-content/essglobal-vocab.html

@ColmMassey
Collaborator Author

ColmMassey commented Mar 11, 2020

> I notice that the ESSGlobal vocab page's title (seen in browser tabs) is "Raptor Graph Serialisation" - is that correct?

Raptor is an RDF serialiser, so it might be the default title generated if not specified by the user. I'm sure we could choose a more useful title. What would you suggest, @wu-lee?

@sunnydean
Contributor

Hi all, a quick heads-up: I will start working on transforming the Makefiles into Ruby.

@ColmMassey
Collaborator Author

I assume Rake is a Ruby version of Make. Does it make sense to break this into a few smaller Issues? @wu-lee, @dtmakm27

@sunnydean
Contributor

Better to do it in one task, as it is just translating the Makefiles to Ruby and it has to be done in one go.

@ColmMassey
Collaborator Author

And the Limesurvey API?

@sunnydean
Contributor

If we were to split it into tasks we would have:

  • Translate the Makefiles to Rakefiles for ease of use and automation
  • Update triple store data instead of removing and re-uploading all the data each time
  • Use the Limesurvey API to trigger updates to data
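
To make the second task above more concrete: updating in place amounts to working out which triples to delete and which to insert. A minimal sketch, assuming both the previous and the new dataset are available as N-Triples text with one statement per line. The helper names are hypothetical, and blank nodes are not handled, since their labels are not stable between runs.

```ruby
require 'set'

# Reduce an N-Triples dump to a set of statements, skipping blanks and comments.
def triple_set(ntriples)
  ntriples.each_line
          .map(&:strip)
          .reject { |line| line.empty? || line.start_with?('#') }
          .to_set
end

# Compare two dumps and report which statements to remove and which to add.
def triple_delta(old_ntriples, new_ntriples)
  old_set = triple_set(old_ntriples)
  new_set = triple_set(new_ntriples)
  {
    delete: (old_set - new_set).sort,
    insert: (new_set - old_set).sort
  }
end
```

The two lists could then be turned into SPARQL `DELETE DATA` and `INSERT DATA` updates against the graph, so unchanged triples are never touched.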

@ColmMassey
Collaborator Author

> Use limesurvey api to trigger updates to data

Which is covered by #6

@ColmMassey
Collaborator Author

> Update triple store data instead of removing and reuploading all the data each time

It deserves another ticket and we don't need it for this sprint. I'll create one, so this Issue is now just the Rake port and tidy.

@sunnydean
Contributor

This issue (#9) is dependent on how we go about automating/translating the code.
We should do this first and then follow up with the atomic updates, then fix the caching.

@sunnydean sunnydean self-assigned this Apr 6, 2020
@ColmMassey
Collaborator Author

> This issue (#9) is dependent on how we go about automating/translating the code.
> We should do this first and then follow up with the atomic updates, then fix the caching.

This is not that clear to me. I'd like to close this ticket as it is too big.

Is there any more work you need to do on the Rake port? Is that what you mean when you say "automating/translating the code"? If so, create a new Issue for it and we can close this one, #5.

@sunnydean
Contributor

@ColmMassey @wu-lee
In the latest commit to open data I changed a whole bunch of the Makefiles into Ruby scripts and added some functionality to make the whole thing easier to generate/deploy.

What's missing is some documentation on how to configure it and run all the pipeline processes. Before I write that up I would like to test it out and make sure it can be configured easily to deploy to a new server (it is currently only tested on the production one). @wu-lee could you get the dev virtuoso server up and running?

In a nutshell: configurability is much easier now; some of the Ruby files will be reused for the caching and atomic changes (which saves a bit of time); you no longer have to delete graphs to load data (the scripts do that for you); and the htaccess file is generated for you when you deploy.
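
For readers who have not seen the scripts, here is a guess at the kind of thing "delete the graph, then load" might generate. This is an illustrative sketch, not the code from the commit: `virtuoso_reload_script` is a hypothetical helper, and the statements follow my understanding of Virtuoso's bulk loader (`ld_dir`, `rdf_loader_run`, the `DB.DBA.load_list` table), which should be checked against the Virtuoso documentation before use.

```ruby
# Build the text of an isql script that replaces one named graph with the
# RDF files found in `data_dir`. Inputs are interpolated without escaping,
# so this assumes trusted values that contain no quote characters.
def virtuoso_reload_script(graph_uri:, data_dir:, file_mask: '*.rdf')
  <<~SQL
    SPARQL CLEAR GRAPH <#{graph_uri}>;
    DELETE FROM DB.DBA.load_list WHERE ll_graph = '#{graph_uri}';
    ld_dir('#{data_dir}', '#{file_mask}', '#{graph_uri}');
    rdf_loader_run();
    checkpoint;
  SQL
end

# Example with placeholder values (not real server paths or graph names).
puts virtuoso_reload_script(
  graph_uri: 'https://example.org/graph/demo',
  data_dir:  '/tmp/rdf-upload'
)
```

The output would be piped into `isql` on the server. The directory also has to be one that Virtuoso is allowed to read from, which is a server configuration matter outside this sketch.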

There are two things left: writing the documentation (after testing it on the dev server), and implementing it in the other datasets.

@ColmMassey
Collaborator Author

It would be great to get a walk-through/demonstration of these changes.

@wu-lee
Contributor

wu-lee commented Apr 15, 2020

Virtuoso dev server should be up and running already. See here for some documentation on what's where:

https://github.com/SolidarityEconomyAssociation/technology-and-infrastructure/blob/master/ansible/REBUILDING-DEV-0.md

I find the virtuoso service seems to be a little flaky; as you know, it shut down for some reason. If that happens again, check the reason with:

service virtuoso-opensource-6.1 status

And start it with

service virtuoso-opensource-6.1 start

@ColmMassey
Collaborator Author

> as you know it shut down for some reason.

Do we need some sort of watch on the production version that can email us if it goes down?

@wu-lee
Contributor

wu-lee commented Apr 15, 2020

> Do we need some sort of watch on the production version that can email us if it goes down?

Probably, yes. I'll add a ticket to check with Web Architects.

https://github.com/SolidarityEconomyAssociation/technology-and-infrastructure/issues/45
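
Until that is sorted out, a cron job could do a crude version of the watch. A minimal sketch, assuming the service name from the commands earlier in this thread; `service_running?` and `watch` are hypothetical names, and the notification is left as a pluggable callable because how to send mail depends on the server setup.

```ruby
require 'open3'

# True when the status command exits with code 0, false when it exits
# non-zero or the command itself cannot be found.
def service_running?(command = ['service', 'virtuoso-opensource-6.1', 'status'])
  _output, status = Open3.capture2e(*command)
  status.success?
rescue Errno::ENOENT
  false
end

# Run one check; call `notify` with a message if the service looks down.
# Returns true when the service is up and false otherwise.
def watch(check: -> { service_running? }, notify: ->(msg) { warn msg })
  return true if check.call

  notify.call("virtuoso appears to be down at #{Time.now.utc}")
  false
end
```

Calling `watch` from cron every few minutes, with `notify` replaced by something that sends an email, would cover the basic case. It would not detect a Virtuoso process that is running but unresponsive.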

@sunnydean
Contributor

Hi @wu-lee, it stopped again for whatever reason.
I restarted it using the virtuoso command directly and debugged a bit to see if there's a problem.
We should check it in a couple of days to see if it's still working.

@wu-lee
Contributor

wu-lee commented Jun 26, 2020

> I see Matt Wallis is the main author in that repo (although there is one commit by marianamalta). Do we have access to it? And do we know if or how that is used by the open-data-and-maps repo?

Regarding #20, came back here to check the URLs. We also have a fork of that original ESSGLOBAL repository here:

https://github.com/SolidarityEconomyAssociation/map-sse

I assume we should work on that one rather than the original?

@ColmMassey
Collaborator Author

> I assume we should work on that one rather than the original?

I would assume so. What other options are there? What were the last changes made to that repository?

@wu-lee
Contributor

wu-lee commented Jun 26, 2020

If you look at the "network" graph on the insights tab of map-sse, you can see that the original repo has some slightly newer commits, but they're all by @joebillings, back in 2019 - so without more information from Joe, I would assume the repos are in the state they're meant to be in.

https://github.com/SolidarityEconomyAssociation/map-sse/network

@ColmMassey
Collaborator Author

Can you share what you recall about the repository history @joebillings ?

@sunnydean
Contributor

@ColmMassey
Collaborator Author

I'm closing this Issue as it's a sprawler and has been essentially done. Moving conversation about

https://github.com/SolidarityEconomyAssociation/map-sse

here.

If you disagree please create a new Issue to cover what you don't want lost.
