-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
739 html codebook #6081
739 html codebook #6081
Conversation
Thanks @lubitchv! I've added this to Code Review and we'll take a look soon. Can you add a few notes here about how we'd test this? I think much of this is self explanatory, but I want to make sure we're not missing any details. Tagging @jggautier for his thoughts as well. |
@lubitchv, I added some docs about the new export format, feel free to check if these are in the correct spot and feel free to expand on them if necessary. Thanks again for this contribution!! |
@djbrooke This can be tested by comparing corresponding XML file from ExportMetadata->DDI with html file from ExportMetadata->DDI html codebook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very cool feature and might make it easier to apply stylesheets other XML output. I emailed @jggautier about #5960 (comment) but I'll probably add some screenshots and thoughts on that issue, even though it's a little off topic. In short https://easy.dans.knaw.nl/oai/?verb=ListMetadataFormats is much prettier than https://demo.dataverse.org/oai?verb=ListMetadataFormats
The code seems fine (some tests would be nice) so I guess I'll go ahead and click "Approve". If we care to, one of us can change "html" to "HTML" in the bundle.
@@ -0,0 +1,2538 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to remind everyone that there's a related codebook file at src/test/java/edu/harvard/iq/dataverse/export/ddi/codebook.xsd
@@ -1209,6 +1209,7 @@ dataset.exportBtn.itemLabel.datacite=DataCite | |||
dataset.exportBtn.itemLabel.json=JSON | |||
dataset.exportBtn.itemLabel.oai_ore=OAI_ORE | |||
dataset.exportBtn.itemLabel.dataciteOpenAIRE=OpenAIRE | |||
dataset.exportBtn.itemLabel.html=DDI html codebook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest capitalizing this to "HTML".
@@ -1610,4 +1632,47 @@ private static boolean checkParentElement(XMLStreamWriter xmlw, String elementNa | |||
return true; | |||
} | |||
|
|||
public static void datasetHtmlDDI(JsonObject datasetDtoAsJson, DatasetVersion version, OutputStream outputStream) throws XMLStreamException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much effort would it be to write a test for this "datasetHtmlDDI" method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To write a test to see that conversion from xml to html is successful would not be difficult. It can be added to DdiExportUtilTest. But to parse the resulting html file to see that the fields of xml converted properly to html would be quite cumbersome and time consuming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lubitchv ok, if you'd like to write the easier test, please let us know if you'd like us to pull this out of QA until you're done. Or we can leave it in QA and tests can be added in a future pull request. It's up to you. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will try to write the easier test on Monday. I think this PR can wait till I finish that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lubitchv sounds good. Please ping us when you're ready for us to move it back to code review. Have a nice weekend!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @lubitchv - thanks for the commit adding the test. Is this ready for another round of code review and then QA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djbrooke Not yet, I need to check some things out. I will let you know when its ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@djbrooke It is now ready for code review and QA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lubitchv! We'll take another look.
I was curious how "DDI HTML codebook" looks and works. Here are some screenshots as of 2163dc6 showing how a new tab is opened: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1209,6 +1209,7 @@ dataset.exportBtn.itemLabel.datacite=DataCite | |||
dataset.exportBtn.itemLabel.json=JSON | |||
dataset.exportBtn.itemLabel.oai_ore=OAI_ORE | |||
dataset.exportBtn.itemLabel.dataciteOpenAIRE=OpenAIRE | |||
dataset.exportBtn.itemLabel.html=DDI HTML codebook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Title case is used for buttons, like this:
dataset.exportBtn.itemLabel.html=DDI HTML codebook | |
dataset.exportBtn.itemLabel.html=DDI HTML Codebook |
@Override | ||
public void exportDataset(DatasetVersion version, JsonObject json, OutputStream outputStream) throws ExportException { | ||
try { | ||
Path cachedMetadataFilePath = Paths.get(version.getDataset().getFileSystemDirectory().toString(), "export_ddi" + ".cached"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getFileSystemDirectory makes me suspect that this might not work on S3 or Swift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdurbin You are probably right. I did not test it for S3 or Swift, it probably will not work with it. That is why I just replaced getFileSystemDirectory with getExport from ExportService. As I understand it should work for S3 and Swift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lubitchv perfect, thanks! I just moved this to QA.
I feel like this issue is close (screenshots above show it working, even) so I'm leaving it in code review but here's what I'm thinking.
|
@djbrooke I think just talking about the new feature in the release note is enough. The first time someone tries to export metadata as an HTML codebook for a given dataset, it will be created on the fly and cached. That's my understanding anyway. In addition, according to this, missing exports will be created automatically nightly by a timer: http://guides.dataverse.org/en/4.15.1/admin/metadataexport.html#automatic-exports . Apologies if I'm getting any of this wrong. |
@pdurbin perfect, thanks. |
It creates an html codebook during publishing. Html codebook can be accessed from ExportMetadata->DDI html codebook
Related Issues
Pull Request Checklist