Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing PR prep #159

Merged
merged 12 commits into from
Dec 22, 2017
Merged

Indexing PR prep #159

merged 12 commits into from
Dec 22, 2017

Conversation

emetsger
Copy link
Contributor

@emetsger emetsger commented Nov 3, 2017

PR to get Travis' feedback on changes. :)

@emetsger emetsger force-pushed the indexing-pr-prep branch 5 times, most recently from 6afb7f8 to 012aaec Compare December 21, 2017 20:47
…aluating the dependency tree.

The 'maven.antrun.skip' property by itself is not sufficient.  Wrapping the Derby ant task in a profile will hopefully prevent errors with Travis by preventing the plugin dependencies from being evaluated.
@karenhanson karenhanson changed the title [DO NOT MERGE] Indexing PR prep Indexing PR prep Dec 22, 2017
Copy link
Contributor

@karenhanson karenhanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK except for the comments around how status changes are handled. I think as it stands it might be possible to delete active discos, if I'm understanding correctly. If so, we can either put in an issue, or fix before merging.

src/test/docker/cores/resources/data/*
src/docker/cores/discos/data/*
src/docker/cores/versions/data/*
src/docker/cores/resources/data/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of these no longer exist


case UPDATE:
case INACTIVATION:
// FIXME: INACTIVATE every ACTIVE document in the newly indexed document's lineage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why fix? there can only ever be one active disco in a lineage, and if you are able to inactivate then you are inactivating that document. So, if this is done it wouldn't harm anything, but also wouldn't do anything.


case TOMBSTONE:
// FIXME: TOMBSTONE all (ACTIVE, INACTIVE) documents in the newly indexed document's lineage
discoOperations.updateStatus(doc.getDiscoUri(), RMapStatus.TOMBSTONED, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not something to fix - you can tombstone or delete one version of a DiSCO without it affecting other versions. Say you create a DiSCO and share that DiSCO's PID. Then you notice you accidentally put a private email address in the disco, You can create a new version and tombstone the previous version to hide the private information. The PID would allow people to find the latest version without seeing the deleted data.

Copy link
Contributor Author

@emetsger emetsger Dec 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so just making sure I understand:

  1. Create Disco A (implies a lineage with progenitor A is created), with the "bad" data
  2. Create Disco A' with "clean" data, a DERIVATION of A, and implies a lineage with progenitor A' is created
  3. Then, TOMBSTONE Disco A, which implies that it (or any previous versions, if they existed), should not be viewable.

Is my interpretation above correct?

If so, what if step 3 read:
3. Then, TOMBSTONE Disco A, which implies that its lineage should not be viewable.

Copy link
Contributor

@karenhanson karenhanson Dec 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this clarifies things. These would all be on the same lineage progenitor (discoV1). A DERIVATION is a whole new lineage, lineages do not affect each other.

//create discoV1
discoV1 (active)
//update discoV1
discoV1 (inactive)
discoV2 (active)
//update discoV2
discoV1 (inactive)
discoV2 (inactive)
discoV3 (active)
//update discoV2 
Cant update, the most recent version is discoV3
//tombstone discoV2
discoV1 (inactive)
discoV2 (tombstoned - but content in triplestore)
discoV3 (active)
//hard delete discoV3
discoV1 (inactive)
discoV2 (tombstoned - but content in triplestore)
discoV3 (deleted - content not in triplestore)

*
* @param lineageUri the lineage to delete from the index
*/
public void deleteDocumentsForLineage(String lineageUri) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above. you can delete a version without deleting the entire lineage. status is per disco version not per lineage

@emetsger
Copy link
Contributor Author

@karenhanson all decisions made in DiscosIndexer except for DELETION event types were made prior to lineage being available, so they definitely deserve to be revisited, and can be re-cast in terms of operations on the lineage.

Because there is no concept of a Solr transaction at the moment, it is possible to add a Solr document representing an ACTIVE DiSCO, and have the system crash before the previously ACTIVE DiSCO is inactivated. I think we would need some kind of transaction/two-phase commit implementation to insure this doesn't happen. Since we don't have that, the post processing doesn't necessarily make assumptions about how many ACTIVE documents to expect when deactivating or tombstoning.

If/when these operations are recast in terms of lineage (e.g. "tombstone this lineage"), probably not much will change, but at least the mental model ("operations on lineages") will be a little cleaner.

@karenhanson
Copy link
Contributor

After a conversation, the reason for going through the lineage and setting ACTIVE->INACTIVE on an update makes good sense. I've logged issue #176 for work on the other comment.

@karenhanson karenhanson merged commit c370f45 into rmap-project:master Dec 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants