Dataset versions, cards, and Solr indexing #380
Comments
Original Redmine Comment No indexing changes have been made, but I laid out the logic that's been talked about, with up to two Solr documents (a draft version and a released version) per dataset. Significant refactoring will be required next in the indexDatasetAddOrUpdate() method in the code block below. In the curl command tests I introduce the notion of a toggle (released=true vs. released=false), but no code has been written to support this yet. That's also next. This commit: 42aeafb no change but add logic and tests for indexing dataset versions #3795
Original Redmine Comment Philip Durbin wrote:
Prior to this commit, we only ever had one Solr document at a time per dataset, so the public could see when a released version 1.0 dataset became a draft. Now we keep the old Solr document around for the released 1.0 dataset until 1.1 (or 2.0) is published: aad5c01 start hiding drafts after version 1.0 from public #3795 There's still a lot of work to do, but this groundwork was important for the next steps (the toggle facet, etc.). Also, I haven't thought a lot about the implications for files. As with dataverses, right now there's still only one Solr document per file. Also, there's something funny going on with the citation.
Original Redmine Comment Philip Durbin wrote:
There are bugs around the citation in general, it seems, as described in #3875. From my perspective, getting the citation for a dataset version should be a black box. Gustavo and I agreed today that we'll stop indexing the citation, which we've only been indexing for #3737 for a week or so anyway (since e9de92a). We should probably remove the custom non-highlighting CSS Mike and I added in b27c341.
Original Redmine Comment First we tried an "Unpublished/Published" toggle and didn't like it. That was commit 58dfcf9 (still deployed to dvn-alpha). Then we tried merging Solr documents together with a "group by" feature of Solr. That code is here: https://github.com/IQSS/dataverse_temp/tree/solr-groupby Then we decided to try showing multiple cards (screenshot with comments at https://docs.google.com/a/harvard.edu/drawings/d/1PRJSP2EG31RPTG2SyTkjqldKe5lkNnV5FxQg2wFyql8/edit?usp=sharing). That's this commit I just made:
See also some of the reasoning and meeting notes at https://docs.google.com/a/harvard.edu/document/d/1clGJKOmrH8zhQyG_8vQHui5L4fszdqRjM4t3U6NFJXg/edit?usp=sharing
Original Redmine Comment Philip Durbin wrote:
I finally started taking a look at files. We still have only one card per file (should we?), but by indexing based on the dataset version rather than the dataset itself, a bug has been fixed where new files uploaded to published studies were discoverable: index files based on version, not dataset itself #3795 · 522642b · IQSS/dataverse - 522642b There's still more work to be done in the area of files, however. Is the expectation that a single file can have multiple cards based on changes made to the description (for example) after the file was initially published? I raise this question in the integration test at https://github.com/IQSS/dataverse/blob/master/scripts/search/tests/dataset-versioning05 as well as in the screenshot with comments at https://docs.google.com/a/harvard.edu/drawings/d/1PRJSP2EG31RPTG2SyTkjqldKe5lkNnV5FxQg2wFyql8/edit?usp=sharing
Original Redmine Comment Philip Durbin wrote:
I stopped indexing the citation and am now looking it up properly on the dataset version rather than the dataset itself: stop indexing citation, get from dataset version #3795 · caea2f1 · IQSS/dataverse - caea2f1 I did not yet look at removing the custom CSS Mike added to avoid highlighting on the citation. It shouldn't be necessary anymore since the citation isn't being indexed at all now (so it will never show highlights).
Original Redmine Comment Philip Durbin wrote:
At our Monday meeting it was decided that we do want multiple cards for files (just like datasets). This functionality is available as of this commit: file card draft/publish lifecycle, including delete #3795 · d10add4 · IQSS/dataverse - d10add4 Please note that the minute you edit a published dataset, the creator will be able to see two cards for the dataset (published and draft) and, for each file in the published study, a draft file card and a published file card. The rules on when draft file cards are created might change in #3943, but for beta this ticket is ready for testing. #3943 has lots of screenshots of a publishing workflow example that might be useful in understanding what to expect visually. If you're interested in trying the integration tests, they start at I'm moving this ticket to QA.
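The card lifecycle described in this comment can be sketched roughly as follows. This is only an illustration, not the Dataverse implementation: the function names and the `datafile_<id>` identifier pattern are hypothetical, and the `_draft` suffix is borrowed from the `dataset_42_draft` example in the ticket description.

```python
from typing import Iterable, List


def cards_after_edit(dataset_id: int, file_ids: Iterable[int]) -> List[str]:
    """Cards the creator sees right after editing a published dataset.

    Hypothetical ID scheme: a published and a draft card for the dataset,
    plus a published and a draft card for every file in it.
    """
    cards = [f"dataset_{dataset_id}", f"dataset_{dataset_id}_draft"]
    for file_id in file_ids:
        cards.append(f"datafile_{file_id}")
        cards.append(f"datafile_{file_id}_draft")
    return cards


def cards_after_publish(cards: Iterable[str]) -> List[str]:
    """Publishing removes every draft card, leaving one card per object."""
    return [card for card in cards if not card.endswith("_draft")]


if __name__ == "__main__":
    edited = cards_after_edit(42, range(1, 12))  # a dataset with 11 files
    print(len(edited))                       # 2 dataset cards + 22 file cards
    print(len(cards_after_publish(edited)))  # 1 dataset card + 11 file cards
```

With 11 files, the sketch yields 24 cards while the draft exists and 12 once it is published.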
Original Redmine Comment I realized that by default Solr returns only 10 documents per query, which was imposing a low limit on the number of draft file cards that could be deleted. This commit fixes this, allowing for 2147483647 of these Solr docs to be deleted at once: don't limit deletion to 10 file solr docs #3795 · ebc5163 · IQSS/dataverse - ebc5163 Should be plenty. Also we may switch how we do this in #3960.
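To illustrate the fix, here is a minimal sketch of raising Solr's `rows` parameter (which defaults to 10) when looking up the documents to delete. It only builds the query string; the helper name and the `parentId` field in the usage line are assumptions made for illustration and are not taken from the commit.

```python
from urllib.parse import urlencode

# Largest value of a signed 32-bit Java int, the figure quoted in the comment.
JAVA_INT_MAX = 2**31 - 1  # 2147483647


def build_select_params(query: str, rows: int = JAVA_INT_MAX) -> str:
    """Build the query string for a Solr select request.

    Without an explicit `rows`, Solr returns at most 10 documents, so only
    10 draft file documents would be found (and then deleted) per lookup.
    """
    return urlencode({"q": query, "rows": rows, "fl": "id"})


if __name__ == "__main__":
    # "parentId" is a hypothetical field name used only for this example.
    print(build_select_params("parentId:42"))
```

Requesting only the `id` field (`fl=id`) keeps the response small even when the row limit is effectively unbounded.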
Original Redmine Comment Tested on 5/14. Basic version logic works: at most 2 versions are viewable (draft/published), results have proper tags, and clicking on them goes to the correct version. Also tested with 11 files; draft entries for all 11 files are removed when published. Closing ticket.
Author Name: Philip Durbin (@pdurbin)
Original Redmine Issue: 3795, https://redmine.hmdc.harvard.edu/issues/3795
Original Date: 2014-03-31
Original Assignee: Kevin Condon
So far, every dataset has had a single corresponding Solr document. Now that we have versioning of datasets, Solr documents of published studies are being overwritten by draft versions when they are saved. This needs to change.
From a discussion between Gustavo and Merce on 2014-04-02:
The idea is that you'd never see two cards for the same dataset at once. By default, you would see the published version. If you click the toggle, you will see only draft versions of datasets. Under the covers there will be two Solr documents such as id:dataset_42 and id:dataset_42_draft. Once the draft study is released, there will again be only one Solr document (and one card) for the dataset.
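A rough sketch of the two-document scheme and the toggle described above, assuming the `dataset_<id>` and `dataset_<id>_draft` pattern from the example IDs. The function names are hypothetical, and in the real system the filtering would presumably happen in the Solr query rather than in application code.

```python
from typing import Iterable, List

DRAFT_SUFFIX = "_draft"


def solr_doc_ids(dataset_id: int, has_released: bool, has_draft: bool) -> List[str]:
    """Solr document IDs that should exist for one dataset (at most two)."""
    ids = []
    if has_released:
        ids.append(f"dataset_{dataset_id}")
    if has_draft:
        ids.append(f"dataset_{dataset_id}{DRAFT_SUFFIX}")
    return ids


def visible_ids(ids: Iterable[str], show_drafts: bool) -> List[str]:
    """Apply the toggle: either only drafts or only released documents."""
    return [doc_id for doc_id in ids if doc_id.endswith(DRAFT_SUFFIX) == show_drafts]


if __name__ == "__main__":
    ids = solr_doc_ids(42, has_released=True, has_draft=True)
    print(visible_ids(ids, show_drafts=False))  # default view: released only
    print(visible_ids(ids, show_drafts=True))   # toggle on: drafts only
```

Because the toggle selects exactly one of the two documents, a user never sees two cards for the same dataset at once under this scheme.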
Related issue(s): #214, #394, #472, #483
Redmine related issue(s): 3628, 3809, 3887, 3898