Improve automation of data publishing #5

ColmMassey · 2020-02-04T10:09:17Z

Explore what we can do to reduce the number of manual steps required in the publication of datasets to the Virtuoso server and the web server.

ColmMassey · 2020-02-04T10:27:57Z

Include here exploring options for triggering the publication of some new data, e.g. after some new Limesurvey entries have been approved.

wu-lee · 2020-02-04T12:01:59Z

If the sausage factory is going to remain implemented in Ruby, it makes sense to either use Rake instead of Make, or just provide a script to invoke.

The Makefiles I've seen tend to:

Require the user to edit variables in some Makefile;
Which are often committed to the repo, and so:
1. local changes and committed changes fight;
2. local changes risk being re-committed and shooting someone else's foot (or merely creating churn).
Mix concerns, in that config and code (or implementation) are intertwined.

My suggestions:

Put any config in a separate place to the implementation.
Do not require changes to committed files for local configuration (perhaps allow a local file, which is .gitignored, to override selected defaults, or use an equivalent of NPM's project settings)
Minimise the amount of configuration required - or at least make easy cases easy and keep the complicated cases out of the typical usage process.
Avoid imposing long commands and recipes on the user

For example:

config.rb set 'user' 'joe'
config.rb set 'host' 'sea-map'
build.rb --dataset $dataset # if there are choices like  $dataset, they could be configured or go here
deploy.rb # uses configured destination above
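
A minimal sketch of how the layered configuration suggested above might be loaded, assuming built-in defaults plus an optional local override file that is listed in .gitignore. The file name `config/local.yml`, the helper names, and the default values are all hypothetical; nothing here is taken from the repository.

```ruby
require "yaml"

# Committed defaults. The keys mirror the 'user' and 'host' settings in the
# example above; the values are illustrative assumptions.
def default_config
  { "user" => "deploy", "host" => "localhost" }
end

# Merge an optional local override file over the defaults.
# The local file is expected to be .gitignored, so changing it never
# touches a committed file.
def load_config(local_path = "config/local.yml")
  overrides =
    if File.exist?(local_path)
      YAML.safe_load(File.read(local_path)) || {} # an empty file yields no overrides
    else
      {}
    end
  default_config.merge(overrides)
end
```

With a `config/local.yml` containing `user: joe` and `host: sea-map`, `load_config` would return those values; without the file it falls back to the defaults.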

ColmMassey · 2020-03-04T17:16:10Z

How many manual steps are there currently to rebuild and deploy a triplestore graph? (Assuming the original.csv file is present in the correct directory and the structure is unchanged from the previous build.) @dtmakm27

ColmMassey · 2020-03-05T15:30:52Z

1 to build standard csv
1 to generate rdf
1 to deploy rdf
1 to deploy the virtuoso
1 generates one to get the data on virtuoso
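
As a rough illustration only, the five steps above could be chained as Rake tasks so that a single command runs them in order. The task names are hypothetical, the wording of the last two steps is paraphrased loosely because the list is ambiguous, and each task body just records that it ran; a real Rakefile would call the existing scripts instead.

```ruby
require "rake"

# Records which steps ran. A real task body would shell out to the
# existing build and deploy scripts (for example with Rake's `sh`).
steps_run = []

task(:standard_csv)                    { steps_run << "build standard csv" }
task(generate_rdf: :standard_csv)      { steps_run << "generate rdf" }
task(deploy_rdf: :generate_rdf)        { steps_run << "deploy rdf" }
task(deploy_virtuoso: :deploy_rdf)     { steps_run << "deploy virtuoso" }
task(load_virtuoso: :deploy_virtuoso)  { steps_run << "load data into virtuoso" }

desc "Run the whole pipeline with one command"
task publish: :load_virtuoso
```

Saved as a Rakefile, this would let `rake publish` run every prerequisite in dependency order, and each intermediate task could still be run on its own.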

ColmMassey · 2020-03-05T15:33:38Z

There are then project specific steps,

Limesurvey - Oxford/Newbridge
DotCoop
Co-ops UK - Has two source csv files, initiatives.csv and organisations.csv

ColmMassey · 2020-03-05T15:40:14Z

Include here exploring options for triggering the publication of some new data, e.g. after some new Limesurvey entries have been approved.

Let's give this its own Issue.

wu-lee · 2020-03-05T15:52:37Z

I have a feeling there may be other steps too - for instance

when there is more than one graph being queried the data generation needs to be repeated for each
some datasets need SAMEAS mappings generated too
I'm currently trying to find out where the data for URIs like this come from: https://w3id.solidarityeconomy.coop/essglobal/V2a/

wu-lee · 2020-03-05T15:57:34Z

I've added a Makefile here on a side-branch dotcoop-demo-nick which works on the dotcoop build and makes it simpler / fewer steps (run make all, but you still need to upload to Virtuoso manually as a final step)

(The branch is where I'm trying to work out how to deploy a build entirely on dev-0 with no reference to sea-0, which contains various historical data sets, and I wonder if it might unwittingly depend on some)

ColmMassey · 2020-03-05T16:00:14Z

@wu-lee does it make sense to get the simpler Newbridge database working first?

wu-lee · 2020-03-05T16:22:19Z

does it make sense to get the simpler Newbridge database working first?

Possibly? I'm just starting with what I know / want to get to... and won't that also need ESSGlobal URIs?

I think this code generates the ESSGlobal data:

https://github.com/SolidarityEconomyAssociation/open-data-and-maps/tree/master/vocab

And that needs to be deployed too, currently it's on https://vocabs.solidarityeconomy.coop/

The w3id redirector on dev-0 currently redirects self-resolving URIs like this:

https://dev.w3id.solidarityeconomy.coop/essglobal/V2a/vocab/SSEInitiative

...to the sea-0 pages like this:

https://vocabs.solidarityeconomy.coop/essglobal/V2a/html-content/essglobal.html#SSEInitiative/

And the sausage factory needs these URLs to resolve, or it explodes. This means a self-contained deployment on dev-0 would need to modify the w3id redirector on it to resolve these URLs to dev.vocabs.solidarityeconomy.coop, and there'd need to be the ESSGlobal vocab data deployed there too. Possibly we don't have to do that if the vocab is essentially unchanging, but it is useful to note that this is an essential thread in the weaving process.
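
Since the build depends on these URLs resolving, a pre-flight check could fail early with a clear message rather than exploding mid-build. The sketch below is an assumption about how such a check might look, not something in the repository: `http_status` and `unresolved_urls` are hypothetical helpers, and the fetcher is injectable so the logic can be exercised without network access.

```ruby
require "net/http"
require "uri"

# Default fetcher: returns the HTTP status code for a URL as an Integer.
# Net::HTTP.get_response does not follow redirects, so a 3xx from the
# redirector is reported as-is.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code.to_i
end

# Returns the URLs that do not answer with a 2xx or 3xx status.
def unresolved_urls(urls, fetch = method(:http_status))
  urls.reject do |url|
    begin
      (200..399).cover?(fetch.call(url))
    rescue StandardError
      false # connection failures count as unresolved
    end
  end
end
```

A build script could call `unresolved_urls` with the vocabulary URIs it needs and abort if the returned list is not empty.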

wu-lee · 2020-03-05T16:30:25Z

Furthermore, digging around on the vocabs site, I see it says here the data is generated by a script:

https://vocabs.solidarityeconomy.coop/essglobal/V2a/DO_NOT_EDIT_HERE.txt

Specifically:

https://github.com/essglobal-linked-open-data/map-sse/blob/develop/generators/Makefile

I see Matt Wallis is the main author in that repo (although there is one commit by marianamalta). Do we have access to it? And do we know if or how that is used by the open-data-and-maps repo?

ColmMassey · 2020-03-05T16:38:50Z

Do we have access to it?

I think I have just invited you to the repository.

ColmMassey · 2020-03-05T17:00:54Z

Possibly we don't have to do that if the vocab is essentially unchanging

It is not changed often but does change. I 'think' the last change we made was the addition of Co-operative to the Organisation Structure, as we sometimes knew something was a Co-op, but not what type.

https://w3id.solidarityeconomy.coop/essglobal/V2a/standard/organisational-structure/OS115
OS115
Co-operative.

ColmMassey · 2020-03-05T17:10:58Z

I 'think' the last change we made was the addition of Co-operative to the Organisation Structure, as we sometimes knew something was a Co-op, but not what type.

Yes, here it is.

wu-lee · 2020-03-06T10:15:27Z

I notice that the ESSGlobal vocab page's title (seen in browser tabs) is "Raptor Graph Serialisation" - is that correct?

https://vocabs.solidarityeconomy.coop/essglobal/V2a/vocab-content/essglobal-vocab.html

ColmMassey · 2020-03-11T16:58:48Z

I notice that the ESSGlobal vocab page's title (seen in browser tabs) is "Raptor Graph Serialisation" - is that correct?

Raptor is an rdf serialiser so it might be the default title generated if not specified by the user. I'm sure we could choose a more useful title. What would you suggest? @wu-lee ?

sunnydean · 2020-03-14T08:37:39Z

Hi all, a quick heads up: I will start working on transforming the make files into Ruby.

ColmMassey · 2020-03-14T17:01:34Z

I assume Rake is a ruby version of make. Does it make sense to break this into a few smaller Issues? @wu-lee, @dtmakm27

sunnydean · 2020-03-14T21:40:45Z

Better do it in one task as it is just translating the make files to ruby and it has to be done in one go

ColmMassey · 2020-03-15T11:20:23Z

And the limesurvey api?

sunnydean · 2020-03-15T11:44:48Z

If we were to split it in tasks we would have

Translate the make files to rake files for ease of use and automation

Update triple store data instead of removing and reuploading all the data each time

Use limesurvey api to trigger updates to data
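
To make the second task more concrete: updating rather than re-uploading amounts to working out which triples were added and which were removed between two builds. The sketch below assumes both builds are available as N-Triples text with one triple per line and no blank nodes (blank node labels cannot be compared line by line). `triple_set` and `triple_diff` are hypothetical helpers, not existing code.

```ruby
require "set"

# Parse N-Triples text into a set of lines, ignoring blank lines and comments.
def triple_set(ntriples)
  ntriples.each_line
          .map(&:strip)
          .reject { |line| line.empty? || line.start_with?("#") }
          .to_set
end

# Compare two dumps and return the triples to delete and to insert.
def triple_diff(old_nt, new_nt)
  old_set = triple_set(old_nt)
  new_set = triple_set(new_nt)
  {
    delete: (old_set - new_set).to_a.sort,
    insert: (new_set - old_set).to_a.sort
  }
end
```

The two lists could then be sent to the triple store as SPARQL 1.1 `DELETE DATA` and `INSERT DATA` requests, leaving unchanged triples untouched.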

ColmMassey · 2020-03-15T12:07:51Z

Use limesurvey api to trigger updates to data

Which is covered by #6

ColmMassey · 2020-03-15T12:16:08Z

Update triple store data instead of removing and reuploading all the data each time

It deserves another ticket and we don't need it for this sprint. I'll create one, so this Issue is now just the rmake port and tidy.

sunnydean · 2020-04-06T19:57:31Z

This issue (#9) is dependent on how we go about automating/translating the code
We should do this first and then follow up with the atomic updates, then fix the caching

ColmMassey · 2020-04-06T20:58:35Z

This issue (#9) is dependent on how we go about automating/translating the code
We should do this first and then follow up with the atomic updates, then fix the caching

This is not that clear to me. I'd like to close this ticket as it is too big.

Is there any more work you need to do on the rmake port? Is that what you mean when you say "automating/translating the code". If so, create a new Issue for it and we can close this one, #5.

sunnydean · 2020-04-14T22:13:10Z

@ColmMassey @wu-lee
in the latest commit to open data I changed a whole bunch of the make files into ruby scripts and added some functionality to make the whole thing easier to generate/deploy.

What's missing is some documentation on how to configure it and run all the pipeline processes. Before I write that up I would like to test it out and make sure it can be configured easily to deploy to a new server (it is currently tested only on the production one). @wu-lee could you get the dev virtuoso server up and running?

In a nutshell, configurability is much easier now, some of the ruby files will be reused for the caching and atomic changes (which is a bit time-saving), you no longer have to delete graphs to load data (scripts do that for you), htaccess file is generated for you when you deploy
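
For readers unfamiliar with the "delete the graph, then load" step, here is a hedged sketch of what a script could generate. It is not the code from the commit: `reload_graph_script` is a hypothetical helper that only builds text for Virtuoso's isql client, using `SPARQL CLEAR GRAPH` and the bulk loader procedures `ld_dir` and `rdf_loader_run`. It assumes the data directory is listed in `DirsAllowed` in virtuoso.ini and that the arguments are trusted, since no quoting or escaping is done.

```ruby
# Build the isql commands that replace one named graph with freshly
# generated RDF files. Nothing is executed here; the returned text is
# meant to be piped into the isql client.
def reload_graph_script(graph_uri, data_dir, file_mask = "*.rdf")
  [
    "SPARQL CLEAR GRAPH <#{graph_uri}>;",                      # drop the old triples
    "ld_dir('#{data_dir}', '#{file_mask}', '#{graph_uri}');",  # register files to load
    "rdf_loader_run();",                                       # run the bulk loader
    "checkpoint;"                                              # persist the result
  ].join("\n") + "\n"
end
```

Where more than one graph is published, the helper would be called once per graph.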

There's two things left: writing the documentation (after testing it on the dev server), and implementing it in the other datasets

ColmMassey · 2020-04-15T07:51:19Z

It would be great to get a walk-through/demonstration of these changes.

wu-lee · 2020-04-15T20:27:42Z

Virtuoso dev server should be up and running already. See here for some documentation on what's where:

https://github.com/SolidarityEconomyAssociation/technology-and-infrastructure/blob/master/ansible/REBUILDING-DEV-0.md

The virtuoso service I find seems to be a little flaky, as you know it shut down for some reason. In which case check the reason with:

service virtuoso-opensource-6.1 status

And start it with

service virtuoso-opensource-6.1 start

ColmMassey · 2020-04-15T20:31:26Z

as you know it shut down for some reason.
Do we need some sort of watch on the production version that can email us if it goes down?

wu-lee · 2020-04-15T20:41:29Z

Do we need some sort of watch on the production version that can email us if it goes down?

Probably, yes. I'll add a ticket to check with Web Architects.

https://github.com/SolidarityEconomyAssociation/technology-and-infrastructure/issues/45

sunnydean · 2020-04-19T20:13:20Z

Hi @wu-lee it stopped again for whatever reason
I restarted it using the virtuoso command directly and debugged a bit to see if there's a problem.
We should check it in a couple of days to see if it's still working

wu-lee · 2020-06-26T08:52:47Z

I see Matt Wallis is the main author in that repo (although there is one commit by marianamalta). Do we have access to it? And do we know if or how that is used by the open-data-and-maps repo?

Regarding #20, came back here to check the URLs. We also have a fork of that original ESSGLOBAL repository here:

https://github.com/SolidarityEconomyAssociation/map-sse

I assume we should work on that one rather than the original?

ColmMassey · 2020-06-26T09:38:41Z

I assume we should work on that one rather than the original?

I would assume so. What other options are there? What were the last changes made to that repository?

wu-lee · 2020-06-26T10:23:05Z

If you look at the "network" graph on the insights tab of map-sse, you can see that the original repo has some slightly newer commits, but they're all by @joebillings, back in 2019 - so without more information from Joe, I would assume the repos are in the state they're meant to be in.

https://github.com/SolidarityEconomyAssociation/map-sse/network

ColmMassey · 2020-06-26T10:28:02Z

Can you share what you recall about the repository history @joebillings ?

sunnydean · 2020-07-03T19:31:07Z

https://github.com/essglobal-linked-open-data/map-sse

This is the repo that the https://vocabs.solidarityeconomy.coop/essglobal/V2a/vocab-content/essglobal-vocab.html is based on and contains the most recent code

ColmMassey · 2020-07-21T13:49:42Z

I'm closing this Issue as it's a sprawler and has been essentially done. Moving conversation about

https://github.com/SolidarityEconomyAssociation/map-sse

here.

If you disagree please create a new Issue to cover what you don't want lost.

ColmMassey transferred this issue from DigitalCommons/open-data-and-maps Mar 4, 2020

sunnydean self-assigned this Apr 6, 2020

ColmMassey closed this as completed Jul 21, 2020

ColmMassey mentioned this issue Jul 21, 2020

Review how we are using the ESSGlobal repository fork #40

Open
