8372 gdcc xoai library #8753
…t from a local fork in 2016, with a new and improved io.gdcc version. Yay! Checking in the results of the first swipe of this I took last week. Very much work in progress; it (builds but) does not work properly yet. (ref. #8372)
…dFilters; fixes a couple of other small things. (#8372)
…rkaround (to be revisited; #8372)
@poikilotherm Hi, Harvard just gave us all an extra mini-vacation, but I'm back at work. I made this draft PR last week that finally gets rid of the old custom jars in local_lib. Once again, I cannot fully express my gratitude to you for leading this effort. There is one thing I could not implement 1:1 as it was: the handling of the proprietary "Dataverse JSON harvesting". I wanted to consult with you about this. The way the JSON API line has been encoded in the original implementation:
This doesn't seem possible to implement using the gdcc.xoai framework without adding ugly hacks on the library side again (?). And it was a somewhat unfortunate choice in the first place. I feel like if we want to keep this "json harvesting" going forward, it should be something like
instead. This I was able to easily implement in the item repository, but it obviously introduces a backward incompatibility. Do you have any thoughts on this? I'm also going to talk to the core dev. team about whether we want to continue supporting this functionality, and/or whether we want to change it.
src/main/java/edu/harvard/iq/dataverse/harvest/client/HarvesterServiceBean.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/OAIRecordServiceBean.java
src/main/java/edu/harvard/iq/dataverse/harvest/server/web/servlet/OAIServlet.java
Issues so far: OK, editing to better spell out the client and server versions used. Using v5.12 to refer to this branch. Testing against both prod and demo for v5.11, will identify which host, too.
As for the first 2 formats under 1., we really don't support them (at least for now), so we should probably hide them from the pulldown and/or not allow people to select any formats except the supported ones.
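One way the suggestion above could look in code is an allow-list applied to whatever formats the remote server advertises before they reach the pulldown. This is only a sketch: the class name, the method name, and the contents of the `SUPPORTED` set are hypothetical and are not taken from the Dataverse code base. `marc21` stands in for any advertised format the client does not support.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SupportedFormatsSketch {

    // Hypothetical allow-list; the real set of supported formats
    // would come from the harvesting client, not from this sketch.
    static final Set<String> SUPPORTED = Set.of("oai_dc", "dataverse_json");

    // Keep only the advertised metadata prefixes that the client can import,
    // preserving the order in which the server listed them.
    static List<String> selectable(List<String> advertised) {
        return advertised.stream()
                .filter(SUPPORTED::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "marc21" is a placeholder for an unsupported format.
        System.out.println(selectable(List.of("oai_dc", "marc21", "dataverse_json")));
    }
}
```

Filtering on the client side keeps the server's ListMetadataFormats response untouched while preventing users from picking a format that would fail at import time.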
As for
dataverse_json does require an additional HTTP call (directly to the export API, to obtain the JSON), outside of the OAI library. So it does not surprise me that the code that makes this call may be more sensitive to whether the cert is valid or not. But if the client is 5.11 in this context, then this is the existing behavior.
@landreev I wasn't saying that, but since you asked, I have now tested JSON using v5.11 as both client (demo) and server (prod) and get the same license error.
@kcondon sounds like this reveals some internal network problems?! If there is anything I can do, please reach out (here/Matrix/Slack/Issue in XOAI/...)
I don't think it's network problems. It looks like it literally is what it says - a problem trying to validate the cert of the remote server. Which in turn could have happened for various reasons... I'm working with Kevin on taking these cert issues out of the equation, so that we can focus on testing the XOAI functionality proper.
OK, good to know. I'll figure this out, even if it's not specific to my branch. Again, to be able to take this error out of the equation/be able to test before vs. after.
So, about those dataverse_json import errors: I'm guessing this means that the only way to test harvesting dataverse_json now is to have the new On the [somewhat] plus side, I still think that the very fact that these dataverse_json harvests are failing with the same import errors in my branch does mean that the harvesting framework is working as it should. By the nature of the changes there, if it were broken, it would be failing to obtain the JSON, and never getting to the import stage.
(Just put my name on the PR to update the status, as requested in standup)
Since the json export PR has been merged, this issue is ready to be re-tested.
OK, everything has been re-tested very carefully.

This branch simplifies the harvesting of our proprietary JSON records: it skips the step where the client gets a hacked GetRecord entry from the server and extracts the URL for obtaining the JSON record. Instead, it calls the export API directly, since the URL for it can be derived from the server address.

However, any remaining Dataverse clients not yet upgraded to this version will continue using the old model. So for backward compatibility, this new version of xoai-server still knows how to produce the old-style, custom version of GetRecord with the URL embedded. (This is unfortunate, because it's really hacky, but we will get rid of it in a future version, once it is safe to assume that few if any old-style clients harvesting this format are still out there.)

So all three possible combinations of harvesting dataverse_json had to be tested: new-to-new, old-to-new, and new-to-old. Testing was done using my own dev. build as the client, and either dataverse-internal or the perf. cluster as the server.
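To illustrate what "derived from the server address" can mean in practice, here is a minimal sketch of building the export URL on the client side. It is not the code from this branch: the class and method names are hypothetical, the host and DOI are placeholders, and the `/api/datasets/export?exporter=...&persistentId=...` path is assumed to match the Dataverse native export API.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ExportUrlSketch {

    // Build the export API URL from the server's base address, the exporter
    // name, and the persistent identifier taken from the OAI record header.
    // The path below is an assumption about the export API's shape.
    static URI exportUri(String serverBaseUrl, String exporter, String persistentId) {
        // Drop a trailing slash so the path is not doubled up.
        String base = serverBaseUrl.endsWith("/")
                ? serverBaseUrl.substring(0, serverBaseUrl.length() - 1)
                : serverBaseUrl;
        String query = "exporter=" + URLEncoder.encode(exporter, StandardCharsets.UTF_8)
                + "&persistentId=" + URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        return URI.create(base + "/api/datasets/export?" + query);
    }

    public static void main(String[] args) {
        // Placeholder host and identifier, for illustration only.
        System.out.println(exportUri("https://dataverse.example.edu",
                "dataverse_json", "doi:10.5072/FK2/EXAMPLE"));
    }
}
```

Because nothing here depends on the GetRecord response body, a new-style client needs only the record identifier from the OAI header, which is why the old embedded-URL form matters only to clients that have not been upgraded.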
What this PR does / why we need it:
Big technical debt PR; eliminates the custom jars of the old lyncode XOAI library; replaces them with dependencies for the new io.gdcc.xoai packages; fixes some known issues in the process.
Still a draft; basic functionality is there; still working on more tests, although I'm considering splitting that out as a separate issue.
Which issue(s) this PR closes:
Special notes for your reviewer:
The scope of this PR has ballooned A LOT. It was initially started under the assumption that it was simply to accommodate the move of the XOAI libraries from local_lib into a gdcc repo as they were, with only some minimal changes, leaving any further improvements and cleanup of the modifications in the original fork for later, as separate issues/PRs. Instead a very thorough refactoring of the entire xoai package was done before a stable 5.0 RC release was made on gdcc/xoai, reimplementing and/or getting rid of the early dataverse hacks and solving many issues in the process. Huge thanks to @poikilotherm for all the work there.
This is all a "good thing, not a bad thing"; it was all very real technical debt that we needed to address sooner rather than later. It has allowed us to greatly streamline the dataverse-side OAI implementation and get rid of most of the code that was maintained there. There's more that still needs to be done on the dataverse side for sure. But since this has already been a much larger effort than was originally scheduled, I really want to move the PR along and then handle the remaining improvements as separate tasks.
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: