Can't push datastreams to Citation objects #55

Open
bondjimbond opened this issue Apr 18, 2018 · 26 comments

Comments

@bondjimbond
Contributor

I just performed a push_pids operation on 2,111 MODS records. All of them went through except for a narrow range of objects with two-digit PIDs.

The result for all of these PIDs:

MODS datastream could not be pushed to object (object:pid); details below [error]
Not Found Error [error]

These objects all exist, and their MODS records are fine. Example:

Object: https://arcabc.ca/islandora/object/unbc%3A51
MODS record to push: attached.
unbc_51_MODS.xml.zip

There were a lot of errors, but the only commonality between them is that they were in a certain two-digit range. The errors go from unbc:32 through unbc:60 inclusive - the entire range errored out. Everything before and after those numbers was pushed with no problems.

@bondjimbond
Contributor Author

Attempted to push just those files, and once again, got errors. Is there something odd about those objects, or those PIDs?
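
For anyone retrying only the failing range, here is a minimal sketch of staging those files in their own directory. It assumes the files follow the `unbc_51_MODS.xml` naming of the attachment above; the function name and both directory paths are hypothetical.

```shell
#!/bin/sh
# Copy the MODS files for a numeric PID range into a separate directory so
# that only those files are pushed again.
# Assumptions: files are named unbc_<number>_MODS.xml (as in the attachment
# above); stage_retry_files and the directories passed to it are hypothetical.
stage_retry_files() {
  src_dir=$1
  retry_dir=$2
  first=$3
  last=$4
  mkdir -p "$retry_dir"
  i=$first
  while [ "$i" -le "$last" ]; do
    file="$src_dir/unbc_${i}_MODS.xml"
    if [ -f "$file" ]; then
      cp "$file" "$retry_dir/"
    else
      # Report gaps in the range instead of failing silently.
      echo "missing: $file" >&2
    fi
    i=$((i + 1))
  done
}
```

Calling `stage_retry_files /tmp/unbc_mods /tmp/unbc_retry 32 60` (placeholder paths) would fill the retry directory, which can then be passed to `drush islandora_datastream_crud_push_datastreams` through `--datastreams_source_directory`.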

@mjordan
Collaborator

mjordan commented Apr 18, 2018

Shouldn't be, I'll try to replicate it and report back.

@mjordan
Collaborator

mjordan commented Apr 20, 2018

I can push MODS files to objects with PIDs ending in two digits:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/crud55 
You are about to push datastreams to objects in your repository. This will create new versions of the datastreams, or create new datastreams if none exist. Do you want to want to continue? (y/n): y
Do you want to update each object's DC datastream using the new MODS? (y/n): y
MODS datastream pushed to object crud55:10                                                                                      [ok]
DC datastream for object crud55:10 regenerated from MODS                                                                        [ok]
MODS datastream pushed to object crud55:20                                                                                      [ok]
DC datastream for object crud55:20 regenerated from MODS    

So there must be something else going on. What were some of the errors?

@bondjimbond
Contributor Author

The errors were simply "not found" each time.

One thing I notice about the PIDs in question: they are all Citation objects from the same collection (https://arcabc.ca/islandora/object/unbc%3Afacultypubs), and they are all objects without a PDF datastream. Other objects within that collection that have a PDF datastream pushed just fine.

You can see the results if you browse the collection: objects that pushed successfully have an "In Copyright" badge displayed on them, and objects that failed do not.

Is islandora_datastream_crud requiring an OBJ or PDF, for some reason, before it can push other datastreams?

@mjordan
Collaborator

mjordan commented Apr 20, 2018

> Is islandora_datastream_crud requiring an OBJ or PDF, for some reason, before it can push other datastreams?

No, the two objects I tested with last night were collection objects, so didn't have either an OBJ or a PDF.

It does check to see if the object exists or is accessible by the user specified in the drush command:

  if (!islandora_object_load($pid)) {
    drush_set_error('OBJECT_DOES_NOT_EXIST',
      dt('The specified object (!pid) does not exist or is not accessible.',
      array('!pid' => $pid)));
    return FALSE;
  }

Can you confirm that the user you are specifying has permission to view that object? If so, maybe the code needs to be more explicit about permissions as a result of https://jira.duraspace.org/browse/ISLANDORA-2064.

@bondjimbond
Contributor Author

> Can you confirm that the user you are specifying has permission to view that object?

Yes, I'm running as user 1, which has all permissions. And these objects are all publicly viewable, with no restrictions on access. They have only three datastreams: RELS_EXT, MODS, DC. No XACML policies or other ways to block access.

@mjordan
Collaborator

mjordan commented Apr 20, 2018

There's another way to eliminate the possibility that the absence of a particular datastream is the problem. Dump the object and then reload it into your vagrant using the same PID using https://github.com/mjordan/islandora_batch_with_derivs. I haven't tested this module hugely but it does work. Or you could clone the object using https://github.com/mjordan/islandora_object_clone to see what the results are. These wouldn't be perfect experiments but they might be fruitful. Using the clone module might be the easiest.

@bondjimbond
Contributor Author

Tried more tests.

Took one of the XML files I was failing to push, uploaded to my Vagrant machine, changed the filename, tried to push it to a different Citation object (no document).

Result: Error.

Added a document to the Citation object. Push again.

Result: Error

Rename file, try pushing it to a Large Image object.

Result: Success.

Conclusion: islandora_datastream_crud_push_datastreams doesn't like Citation objects?

@bondjimbond
Contributor Author

Another test: renamed same file to match a Thesis object, push datastreams.

Result: Success!

Conclusion: islandora_datastream_crud_push_datastreams almost certainly does not like Citation objects.

@bondjimbond bondjimbond changed the title Can't push datastreams to certain two-digit PIDs? Can't push datastreams to Citation objects Apr 20, 2018
@mjordan
Collaborator

mjordan commented Apr 20, 2018

@bondjimbond thanks for the extra testing. I am sure you are seeing a pattern by now in modules that I have developed - they seem to have problems with citations. Reality is that we are only now, two years after migrating to Islandora, starting to use them so they are not well understood or well tested within our custom modules.

It's not like I don't like citations, it's just that they are different enough from more common content models that they seem to surface less-than-thorough testing easily. Now that you've identified this as a fruitful area for investigation, let me confirm your results locally and then move on to addressing the problem.

@bondjimbond
Contributor Author

@mjordan I think I've fully confirmed that Citations don't work with this command. I've just fetched, modified, and pushed all the objects from the UFV repository now, and get an error for every citation as well.

> It's not like I don't like citations, it's just that they are different enough from more common content models that they seem to surface less-than-thorough testing easily.

Curiously, theses work just fine, even though they're almost exactly the same. Very curious.

Thanks for looking into it. I think I'll have to put my current work on hold until this is resolved.

@mjordan
Collaborator

mjordan commented Apr 20, 2018

I'll take a look over the weekend. Thanks for finding this glitch.

@mjordan
Collaborator

mjordan commented Apr 21, 2018

No luck reproducing the error. On my vagrant, I created two new Citations and downloaded their MODS. Then edited the MODS XML files and pushed them back up:

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/vagrant/crudissue55
You are about to push datastreams to objects in your repository. This will create new versions of the datastreams, or create new datastreams if none exist. Do you want to want to continue? (y/n): y
Do you want to update each object's DC datastream using the new MODS? (y/n): y
MODS datastream pushed to object ir:12                                                                                          [ok]
DC datastream for object ir:12 regenerated from MODS                                                                            [ok]
MODS datastream pushed to object ir:13                                                                                          [ok]
DC datastream for object ir:13 regenerated from MODS                                                                            [ok]
vagrant@islandora:/var/www/drupal/sites/all/modules/islandora_datastream_crud$ 

I confirmed that the MODS and DC were updated.

Can you replicate those exact steps and see what you get?

@mjordan
Collaborator

mjordan commented Apr 21, 2018

Reran CRUD but answered n to the DC question, with same results.

@bondjimbond
Contributor Author

Interesting... I just tried re-exporting my PIDs and their MODS records to see whether updating Scholar affected things. And the MODS records all contained the updated text!

So despite the error message, pushing the datastreams actually did work. What failed on the Citation objects, it seems, is the DC transform afterward.
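
That check can be scripted. The sketch below compares each pushed file against a fresh re-export of the same name; the function name and both directories are hypothetical. The comparison is byte-for-byte, so if the repository re-serializes XML on storage, "differs" only means the file is worth opening, not that the push failed.

```shell
#!/bin/sh
# Compare the files that were pushed with a fresh export of the same
# datastreams, to check whether a push took effect despite an error message.
# Assumptions: both directories use the same <namespace>_<number>_MODS.xml
# file names; compare_exports and its arguments are hypothetical.
compare_exports() {
  pushed_dir=$1
  exported_dir=$2
  status=0
  for pushed in "$pushed_dir"/*_MODS.xml; do
    # Skip the unexpanded pattern when the directory has no matching files.
    [ -e "$pushed" ] || continue
    name=$(basename "$pushed")
    exported="$exported_dir/$name"
    if [ ! -f "$exported" ]; then
      echo "not exported: $name"
      status=1
    elif cmp -s "$pushed" "$exported"; then
      echo "match: $name"
    else
      echo "differs: $name"
      status=1
    fi
  done
  # Non-zero when any file is missing from the export or differs.
  return $status
}
```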

@mjordan
Collaborator

mjordan commented Apr 23, 2018

Hmm... still not sure what's what here. Would you like me to do any additional testing for now?

@bondjimbond
Contributor Author

I'll try it again in my Vagrant and see if anything is different following your exact approach. Did you test using the XML file I attached at the top, to see if it reacts differently than the test objects you created?

@mjordan
Collaborator

mjordan commented Apr 23, 2018

No, I didn't try with that file, will do so later today.

@bondjimbond
Contributor Author

@mjordan Did you get a chance to try out my datastream?

@mjordan
Collaborator

mjordan commented May 1, 2018

Nope. I'll try to get to it today.

@mjordan
Collaborator

mjordan commented May 1, 2018

@bondjimbond just pushed your MODS.xml datastream up with no errors or problems. I'm sorry, I don't know what else to try. Is it possible to paste in some of the errors that you are seeing?

@mjordan
Collaborator

mjordan commented May 1, 2018

drush islandora_datastream_crud_push_datastreams --user=admin --datastreams_source_directory=/tmp/crud55
You are about to push datastreams to objects in your repository. This will create new versions of the datastreams, or create new datastreams if none exist. Do you want to want to continue? (y/n): y
Do you want to update each object's DC datastream using the new MODS? (y/n): y
MODS datastream pushed to object ir:12                                                                                                              [ok]
DC datastream for object ir:12 regenerated from MODS                                                                                                [ok]
vagrant@islandora:/var/www/drupal/sites/all$ 


@bondjimbond
Contributor Author

That's very weird. It failed for me in Vagrant and in production. The only error I got was the one I pasted at the start of this issue:

MODS datastream could not be pushed to object (object:pid); details below [error]
Not Found Error [error]

@mjordan
Collaborator

mjordan commented May 2, 2018

What is "(object:pid)"? That looks like a PID. Is there anything in the directory --datastreams_source_directory points to that would fool CRUD into thinking it should push to an object with the PID "object:pid"? I'm grasping at straws here but I have no idea what's causing the problems you're seeing.

@bondjimbond
Contributor Author

> What is "(object:pid)"? That looks like a PID. Is there anything in the directory --datastreams_source_directory points to that would fool CRUD into thinking it should push to an object with the PID "object:pid"?

In that case it was:

MODS datastream could not be pushed to object unbc:51; details below [error]
Not Found Error [error]

I just used (object:pid) as a placeholder to represent all the PIDs that errored out.
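
To share the real PIDs rather than a placeholder, the failing ones can be pulled out of captured drush output. This is a minimal sketch that assumes the output (including error messages) was saved to a file and that each failure line has the "could not be pushed to object PID; details below" shape shown above; the function name and the log file name are hypothetical.

```shell
#!/bin/sh
# Print the unique PIDs named in "could not be pushed" error lines.
# Reads the files given as arguments, or standard input when none are given.
# Assumption: failure lines look like
#   MODS datastream could not be pushed to object unbc:51; details below [error]
# failed_pids is a hypothetical helper, not part of the CRUD module.
failed_pids() {
  sed -n 's/^.* datastream could not be pushed to object \([^;]*\); details below.*$/\1/p' "$@" |
    sort -u
}
```

For example, `failed_pids push.log` lists one failing PID per line, where `push.log` is whatever file the drush output was redirected to.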

@mjordan
Collaborator

mjordan commented Apr 22, 2019

@bondjimbond is this still an issue?
