Start planning on how to structure the CSV to account for complex objects #56

mjordan · 2019-08-19T14:22:14Z

The paged content sprint for Islandora 8 is in the first half of September. Workbench should be ready to ingest paged and compound objects soon after the sprint is done. The sprint will determine how the relationship between the parent and its children (and the ordering of children) will be instantiated. Given the discussion so far, it will likely be via one or more node fields on each child pointing to the parent.

Assuming that the parent<->child relationship will be expressed in node fields, Workbench can handle populating those fields if it can translate that relationship from the structure of the metadata CSV file. For example, a structure like this for non-complex images:

id,file,title,description
001,image1.jpg,Title 1,First image
002,image2.jpg,Title 2,Second image

could be expanded to something like this for compound objects, where the parent_id field contains the id of the object's parent (and is empty for the parents themselves):

id,parent_id,file,title,description
001,,,Postcard 1,The first postcard
003,001,front.jpg,Front of postcard 1,The first postcard's front
004,001,back.jpg,Back of postcard 1,The first postcard's back
002,,,Postcard 2,The second postcard
006,002,front2.jpg,Front of postcard 2,The second postcard's front
007,002,back2.jpg,Back of postcard 2,The second postcard's back

An important ability will be to include parent and child-level metadata in the same CSV file, so they can be ingested during the same task.
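To make the proposed layout concrete, here is a minimal sketch of how rows in such a CSV could be split into parents and children. It is not Workbench's implementation; the function name `group_by_parent` is hypothetical, and the sketch assumes that a row with an empty parent_id is a parent and that child order follows row order in the file.

```python
import csv
import io
from collections import defaultdict

# The compound-object CSV from the example above.
SAMPLE_CSV = """id,parent_id,file,title,description
001,,,Postcard 1,The first postcard
003,001,front.jpg,Front of postcard 1,The first postcard's front
004,001,back.jpg,Back of postcard 1,The first postcard's back
002,,,Postcard 2,The second postcard
006,002,front2.jpg,Front of postcard 2,The second postcard's front
007,002,back2.jpg,Back of postcard 2,The second postcard's back
"""


def group_by_parent(csv_text):
    """Split CSV rows into parents and children (hypothetical helper).

    Returns (parents, children): parents maps id -> row, and children maps
    a parent id -> list of child rows in the order they appear in the CSV.
    """
    parents = {}
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["parent_id"]:
            # Assumption: a non-empty parent_id marks a child row.
            children[row["parent_id"]].append(row)
        else:
            parents[row["id"]] = row
    return parents, dict(children)


parents, children = group_by_parent(SAMPLE_CSV)
for parent_id, parent in parents.items():
    print(parent_id, parent["title"])
    for child in children.get(parent_id, []):
        print("   ", child["id"], child["file"])
```

Because parent and child rows live in the same file, a single pass like this is enough to create the parents first and then populate each child's parent field.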

mjordan · 2019-08-19T14:26:52Z

It would be useful to not have to have a row for each page in a newspaper issue, for example. In that case, all page files could be grouped into a directory and their order expressed in their filenames. However, in that case, we'd need some sort of way to define a minimal set of field values such as title and identifier; in other words, if these values are not in the CSV file, how do we derive them when populating the child nodes?

seth-shaw-unlv · 2019-08-19T15:39:26Z

The method you first describe is what I had in mind (and what I was thinking of when I posted issue #18).

As to the "use a directory" method: that is one of the ways CONTENTdm deals with compound objects. Each compound object has a directory named with the parent record's identifier, and each file in that directory corresponds to a child object whose only metadata is the file name (sans extension). E.g.

root/
  | - 001/ (Parent record has the identifier "001")
  |     | - 001_001.tif (Child record of "001" with identifier "001_001")
  |     | - 001_002.tif (Child record of "001" with identifier "001_002")  
  |     | - 001_003.tif (Child record of "001" with identifier "001_003")  
  |
  | - 002/ (Parent record has the identifier "002")
  |     | - 002_recto.tif (Child record of "002" with identifier "002_recto")
  |     | - 002_verso.tif (Child record of "002" with identifier "002_verso")
  |
  | - 003.tif (Simple object with the identifier "003")
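As an illustration of the layout above, the following sketch walks a root directory and derives one record per parent, child, and simple object. It is only a sketch under assumptions, not how CONTENTdm or Workbench does it: the function name `records_from_directory` and the record keys are hypothetical, and child order is assumed to be the sorted order of the file names.

```python
import tempfile
from pathlib import Path


def records_from_directory(root):
    """Derive records from a directory layout (hypothetical helper).

    Each subdirectory is a parent, each file inside it is a child, and each
    top-level file is a simple object. The identifier is the directory name,
    or the file name without its extension.
    """
    root = Path(root)
    records = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            records.append({"id": entry.name, "parent_id": "", "file": ""})
            # Assumption: sorted file names express the order of the children.
            for child in sorted(p for p in entry.iterdir() if p.is_file()):
                records.append({
                    "id": child.stem,
                    "parent_id": entry.name,
                    "file": child.relative_to(root).as_posix(),
                })
        elif entry.is_file():
            records.append({"id": entry.stem, "parent_id": "", "file": entry.name})
    return records


# Recreate the example tree in a temporary directory and list its records.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["001/001_001.tif", "001/001_002.tif", "001/001_003.tif",
                 "002/002_recto.tif", "002/002_verso.tif", "003.tif"]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    records = records_from_directory(root)

for record in records:
    print(record)
```

The resulting records have the same id, parent_id, and file columns as the explicit CSV, which suggests the two methods could share the same downstream ingest step.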

mjordan · 2019-08-19T18:02:31Z

@seth-shaw-unlv yes, I think we can support both "with metadata" and "directory" methods.

mjordan · 2020-03-29T18:49:46Z

@seth-shaw-unlv I've got a working implementation of the first method described above in the "issue-56" branch. How it works is described in the "Creating paged content" section of that branch's README. Can you take a look to see if it's clear?

seth-shaw-unlv · 2020-03-30T16:01:30Z

@mjordan, your description looks good to me, although I may not be the best test subject.

mjordan · 2020-03-30T16:28:08Z

Thanks for looking. The required fields in the CSV are pretty standard, so they shouldn't pose any problems for the spreadsheet editor.

No sweat on the testing; I included an integration test that passes. I think I'll merge so that the branch doesn't get stale.

mjordan · 2020-04-05T21:57:15Z

As of 2e94036, Workbench supports both of the methods for creating paged content described above. Closing for now; we can reopen this one or open new issues as needed.

mjordan added the enhancement and question labels Aug 19, 2019

mjordan added a commit that referenced this issue Mar 29, 2020

WIP on #56.

962f7b9

mjordan added a commit that referenced this issue Mar 29, 2020

WIP on #56.

2526d31

mjordan added a commit that referenced this issue Mar 29, 2020

Initial implementation of creating paged content using the explicit CSV option described in #56.

6dc46e9

This was referenced Mar 29, 2020

Dynamically generate Islandora model values for parent (paged content) and child (page) nodes #85

Open

Dynamically generate weight values for child (page) nodes #84

Open

mjordan added a commit that referenced this issue Mar 29, 2020

Added integration test for paged content (part of #56).

0e3ded5

This was referenced Apr 1, 2020

Write a CSV file that contains the IDs for newly created nodes, which could then be used to update those nodes later #92

Closed

Require a unique ID field #52

Closed

mjordan added a commit that referenced this issue Apr 5, 2020

WIP on 'directory' method described in #56.

cb16472

mjordan added a commit that referenced this issue Apr 5, 2020

WIP on 'directory' method described in #56.

2e512da

mjordan added a commit that referenced this issue Apr 5, 2020

WIP on 'directory' method described in #56.

1ea213f

mjordan closed this as completed Apr 5, 2020
