Create project from WCQS SPARQL query (special case: requires Wikimedia authentication) #7

Open
trnstlntk opened this issue Apr 27, 2022 · 11 comments

Comments

@trnstlntk

This is a feature request for a specific subcase of OpenRefine/OpenRefine#1212. This will be helpful for users who want to edit structured data of existing Wikimedia Commons files with the help of OpenRefine.

Through the SDC project for OpenRefine, users will be able to edit and upload files with structured data on Wikimedia Commons. See more info about this project on meta.wikimedia.org.

In some cases, it may be very handy for users to start an OpenRefine project with a SPARQL query from the Wikimedia Commons Query Service (WCQS). However, this specific SPARQL endpoint requires Wikimedia OAuth authentication.
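To make the authentication hurdle concrete, here is a minimal Python sketch of how a client might prepare an authenticated WCQS request. The endpoint URL, the cookie name `wcqsOauth`, the User-Agent string, and the helper `build_wcqs_request` are assumptions for illustration, not something confirmed in this thread. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: endpoint URL and cookie name should be checked against
# the current WCQS documentation before relying on them.
WCQS_ENDPOINT = "https://commons-query.wikimedia.org/sparql"
AUTH_COOKIE_NAME = "wcqsOauth"


def build_wcqs_request(query: str, auth_token: str) -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query
    and a session cookie obtained through the OAuth login."""
    url = WCQS_ENDPOINT + "?" + urlencode({"query": query})
    headers = {
        # Standard media type for SPARQL JSON results.
        "Accept": "application/sparql-results+json",
        "Cookie": f"{AUTH_COOKIE_NAME}={auth_token}",
        # Hypothetical client identifier.
        "User-Agent": "openrefine-wcqs-sketch/0.1 (illustrative)",
    }
    return Request(url, headers=headers)


request = build_wcqs_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", "TOKEN")
```

How OpenRefine would obtain and store the token is exactly the open question of this issue, so it is left as a plain parameter here.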

It would be great if the work done on OpenRefine/OpenRefine#1212 also includes this use case, or alternatively we add support for WCQS after that general task has been completed.

Proposed solution

I have no idea at all about the technical difficulties re: this request. Curious to hear considerations around this!

Additional context

@trnstlntk trnstlntk added enhancement New feature or request wikicommons labels Apr 27, 2022
@trnstlntk
Author

@antoine2711 tagging you here, since it touches upon the Outreachy project you will hopefully be mentoring soon :-)

@lozanaross

lozanaross commented May 1, 2022

One thing worth adding here is that in the SDC survey (https://commons.wikimedia.org/wiki/File:Analysis_of_the_first_OpenRefine_SDC_open_survey.pdf) users expressed interest in being able to query WCQS, as well as WDQS, directly via OpenRefine as one route towards project creation. The WDQS case is also relevant to SDC because many Commons files are also linked from Wikidata via image (P18), so in theory, if we can support at least WDQS queries (perhaps via the Outreachy project), that will already meet some of the user needs, if not all of course.
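As a rough illustration of how WDQS results could lead back to Commons files, the Python sketch below converts an image (P18) value into a Commons file title. It assumes that WDQS returns P18 values as percent-encoded `Special:FilePath` URIs; the helper name `commons_file_title` is hypothetical.

```python
from urllib.parse import unquote

# Assumption: P18 values come back as URIs containing this path segment.
FILEPATH_MARKER = "/wiki/Special:FilePath/"


def commons_file_title(image_uri: str) -> str:
    """Turn a Special:FilePath URI into a 'File:...' page title."""
    if FILEPATH_MARKER not in image_uri:
        raise ValueError(f"not a Special:FilePath URI: {image_uri}")
    # Keep everything after the marker and decode percent-escapes.
    encoded_name = image_uri.split(FILEPATH_MARKER, 1)[1]
    return "File:" + unquote(encoded_name)


title = commons_file_title(
    "http://commons.wikimedia.org/wiki/Special:FilePath/Mona%20Lisa.jpg"
)
```

A column of such titles would give an OpenRefine project a handle on the Commons files without querying WCQS at all.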

@antoine2711
Member

(…) being able to query WCQS, as well as WDQS, directly via OpenRefine as one route towards project creation.

@lozanaross: the way I see it, this SHOULD be done. And let's say I'm in a good position to promote it. ;-)
That being said, I will also try to do federated queries, that is, querying 2 or more SPARQL endpoints at the same time. This is probably more complex than choosing a different endpoint, but still, technologically, it should work.
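For readers unfamiliar with federation, a minimal Python sketch of composing such a query with the SPARQL 1.1 `SERVICE` keyword follows. The helper `federated_query` is hypothetical, and the example assumes that the endpoint receiving the query predefines the `wdt:` and `rdfs:` prefixes and is allowed to call the remote endpoint.

```python
def federated_query(variables, local_pattern, remote_endpoint,
                    remote_pattern, limit=None):
    """Compose a SELECT query whose second graph pattern is delegated
    to another endpoint through a SERVICE clause."""
    lines = [
        "SELECT " + " ".join("?" + v for v in variables) + " WHERE {",
        "  " + local_pattern,
        "  SERVICE <" + remote_endpoint + "> {",
        "    " + remote_pattern,
        "  }",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


# Illustrative only: files with a depicts (P180) statement, with the
# English label of the depicted item fetched from WDQS.
query = federated_query(
    variables=["file", "item", "itemLabel"],
    local_pattern="?file wdt:P180 ?item .",
    remote_endpoint="https://query.wikidata.org/sparql",
    remote_pattern="?item rdfs:label ?itemLabel . "
                   "FILTER(LANG(?itemLabel) = 'en')",
    limit=10,
)
print(query)
```

The sketch only builds the query text; which endpoint executes it, and whether that endpoint can reach the other one, is the hard part.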

Let's see how far we can go.

Regards, Antoine

@lozanaross

I will also try to do federated queries, that is, querying 2 or more SPARQL endpoints at the same time

@antoine2711: that sounds really good. If any UI help is needed, I'm happy to advise.

@thadguidry
Member

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

@antoine2711
Member

antoine2711 commented May 2, 2022

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

@thadguidry: I think we can say that it has many scopes.
I do believe it can be achieved through the Outreachy project « Implement a SPARQL Importer », which is very vague and which could fully and legitimately cover a generic SPARQL endpoint, or the specific WDQS and WCQS.

It does have the https://github.com/OpenRefine/OpenRefine/labels/gsoc%2Foutreachy … ;-) And through the logic of being core to modifying SDC, it can only be in scope there as well, in my opinion.

Regards, Antoine

@trnstlntk
Author

trnstlntk commented May 2, 2022

@lozanaross @trnstlntk Isn't this issue out of scope for the SDC grant? Looks like it's tagged against the Project via GitHub and maybe should not? I might be wrong and it's in scope? :-)

This is totally in scope. See Lozana's comment above. Starting a project from a Wikimedia Commons SPARQL query is going to be very helpful for Wikimedia Commons users (batch SDC editors) in OpenRefine.

We have not promised this feature as part of the current Wikimedia Foundation grant, but I do want to investigate whether we can implement it, and I want to keep this issue on our (Wikimedia Commons focused) radar generally.

@trnstlntk
Author

That being said, I will also try to do federated queries, that is, querying 2 or more SPARQL endpoints at the same time. This is probably more complex than choosing a different endpoint, but still, technologically, it should work.

Pertaining to this specific task (the Wikimedia Commons SPARQL endpoint): I am hearing that federated querying (e.g. involving both WDQS and WCQS) is not straightforward there, because of the authentication at WCQS.

You would make me very happy (and help the Wikimedia ecosystem) if we can at least research from our side if (federated) querying with WCQS is possible for project creation in OpenRefine. If it is very hard or impossible to do for us, then I can take this as an additional argument to Wikimedia Foundation search/query teams that more investment in Commons' SPARQL endpoint is needed, so that the authentication layer there can be removed.

@antoine2711
Member

Pertaining to this specific task (the Wikimedia Commons SPARQL endpoint): I am hearing that federated querying (e.g. involving both WDQS and WCQS) is not straightforward there, because of the authentication at WCQS.

You would make me very happy (and help the Wikimedia ecosystem) if we can at least research from our side if (federated) querying with WCQS is possible for project creation in OpenRefine.

Well, @trnstlntk, for WDQS, I did a federated query with another end-point, and I think that I also did it from an external query service that used WDQS in a federated query.

Now, if there is authentication with the WCQS, that, in itself, will be a challenge. But once it's settled, I don't see why, FROM WCQS, we couldn't do a federated query. But it's probably very hard to do it from another query service and use the WCQS endpoint in a federated query.

Regards, Antoine

@lozanaross

for WDQS, I did a federated query with another end-point, and I think that I also did it from an external query service that used WDQS in a federated query.

@antoine2711 @trnstlntk from my point of view federated queries would be super useful even outside SDC scope (i.e. outside WCQS) and just in general for the Wikidata/Wikibase extension purposes. With my NFDI hat on, I would find querying e.g. my own Wikibase + Wikidata pretty useful. WCQS would be an added bonus, but only if the balance between added value vs extra dev effort is worth it in the end.

@trnstlntk trnstlntk transferred this issue from OpenRefine/OpenRefine Jun 18, 2022
@trnstlntk
Author

We discussed that it may be useful if I'd collect some typical example queries. Here are a few that will be useful for people who want to batch edit SDC with OpenRefine, and who would like to start from a SPARQL query:

  • An advanced query which is very useful for SDC batch editing; it uses the Commons API to retrieve files from a specific Wikimedia Commons category. https://w.wiki/5JiK
  • From the example queries, and very useful for SDC editing as well (cleanup tasks): a query that shows files that depict both a specific church and the generic 'church building' (which would be good to remove). It would be great if OpenRefine could load more than 500 results (i.e. not use the LIMIT if the query doesn't time out) https://w.wiki/5JiR
  • This one shows files by a specific username, with Depicts statements: https://w.wiki/5Jiu
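Whichever of these queries is used, the response would need to be flattened into rows and columns before it can become an OpenRefine project. A minimal Python sketch of that step, based on the standard SPARQL JSON results format, is shown below. The function name `bindings_to_rows` and the sample values are made up for illustration, and this is not how OpenRefine's importer is actually implemented.

```python
import json


def bindings_to_rows(sparql_json: str):
    """Flatten a SPARQL JSON results document into column names and rows.
    Variables left unbound in a solution become empty strings."""
    data = json.loads(sparql_json)
    columns = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append([binding.get(col, {}).get("value", "") for col in columns])
    return columns, rows


# Hypothetical response with two solutions; the second has no 'depicts'.
sample = json.dumps({
    "head": {"vars": ["file", "depicts"]},
    "results": {"bindings": [
        {"file": {"type": "uri", "value": "https://example.org/file/1"},
         "depicts": {"type": "uri", "value": "https://example.org/item/A"}},
        {"file": {"type": "uri", "value": "https://example.org/file/2"}},
    ]},
})

columns, rows = bindings_to_rows(sample)
for row in rows:
    print(dict(zip(columns, row)))
```

Because each solution becomes one row, a query without a LIMIT simply yields a longer table, as long as the endpoint answers before timing out.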

@trnstlntk trnstlntk moved this from 🛣 Out of WMF grant scope but on our radar to Larger feature requests (future grants?) in Structured Data on Commons Sep 27, 2022
Projects
Status: 🛣 SDC support: Larger feature requests (future grants?)
Development

No branches or pull requests

4 participants