Incompatible StreamReceiver output by marc modules due to inconsistent leader handling #454

Closed
TobiasNx opened this issue Jun 10, 2022 · 11 comments · Fixed by #526

TobiasNx commented Jun 10, 2022

While the documentation of encode-marc21 states that it is compatible with the output of handle-marc-xml and decode-marc21, this is not the case, because decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml handle the leader inconsistently.

For example, we cannot transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not straightforward (see here): it fails with the same error as when processing MARC-XML input.

Functional review: @TobiasNx
Code review: @blackwinter


Behaviour of Flux-Modules:

decode-marc21
splits the leader into named subfields according to the function of each position:
See here

---
leader:
  status: "p"
  type: "a"
  bibliographicLevel: "m"
  typeOfControl: " "
  characterCodingScheme: "a"
  encodingLevel: " "
  catalogingForm: "c"
  multipartLevel: " "
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
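
For reference, the named subfields in this sample line up with fixed positions of the 24-character MARC 21 leader (05 record status, 06 type of record, 07 bibliographic level, 08 type of control, 09 character coding scheme, 17 encoding level, 18 descriptive cataloging form, 19 multipart resource record level). The following is only a minimal sketch of that mapping. `LeaderSplitter` is a hypothetical helper, not Metafacture code, and the field names are simply copied from the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Metafacture. */
final class LeaderSplitter {

    private static final int LEADER_LENGTH = 24;

    private LeaderSplitter() {
    }

    /** Splits a 24-character leader into the named positions seen in the sample output. */
    static Map<String, String> split(final String leader) {
        if (leader == null || leader.length() != LEADER_LENGTH) {
            throw new IllegalArgumentException("leader must have exactly 24 characters");
        }
        final Map<String, String> fields = new LinkedHashMap<>();
        fields.put("status", at(leader, 5));
        fields.put("type", at(leader, 6));
        fields.put("bibliographicLevel", at(leader, 7));
        fields.put("typeOfControl", at(leader, 8));
        fields.put("characterCodingScheme", at(leader, 9));
        fields.put("encodingLevel", at(leader, 17));
        fields.put("catalogingForm", at(leader, 18));
        fields.put("multipartLevel", at(leader, 19));
        return fields;
    }

    /** Returns the character at the given leader position as a one-character String. */
    private static String at(final String leader, final int position) {
        return String.valueOf(leader.charAt(position));
    }
}
```

Calling `LeaderSplitter.split("02602pam a2200529 c 4500")` yields the eight values shown in the sample (status "p", type "a", bibliographicLevel "m", and so on). The length and address positions (00-04, 12-16) and the constant positions (10, 11, 20-23) are not represented by named subfields here.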

With the option emitleaderaswhole="true", the leader is emitted as a top-level entity that contains a subfield which is also named leader:
See here

---
leader:
  leader: "02602pam a2200529 c 4500"
"001": "946638705"
"003": "DE-101"
"005": "20070429135622.0"
"007": "tu"
"008": "960123s2004    gw |||||r|||| 00||||eng  "
"015  ":
  a: "05,A03,2104"
  z: "96,N47,0454"
  "2": "dnb"
"0167 ":

handle-marc-xml keeps the leader as a single top-level field:
See here:

---
type: "Bibliographic"
leader: "00000naa a2200000uc 4500"
"001": "1106253078"
"003": "DE-101"
"005": "20171202230117.0"
"007": "cr||||||||||||"
"008": "160712s2016    gw |||||o|||| 00||||eng  "
"0167 ":
  "2": "DE-101"
  a: "1106253078"
"022  ":

encode-marcxml can handle the result of decode-marc21(emitleaderaswhole="true"), but not the default output, in which the leader is emitted as multiple fields: that produces a record with multiple leader elements.

The result then looks like this:

	<marc:record>
		<marc:leader>p</marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader>m</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>a</marc:leader>
		<marc:leader> </marc:leader>
		<marc:leader>c</marc:leader>
		<marc:leader> </marc:leader>

It seems that there is no check that ensures a record has only one leader.


encode-marc21 cannot handle data from handle-marc-xml: see

Error is:

org.metafacture.framework.FormatException: invalid tag format for reference field
    at org.metafacture.biblio.iso2709.RecordBuilder.checkValidReferenceFieldTag (RecordBuilder.java:260)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:244)
        org.metafacture.biblio.iso2709.RecordBuilder.appendReferenceField (RecordBuilder.java:224)
        org.metafacture.biblio.marc21.Marc21Encoder.processTopLevelLiteral (Marc21Encoder.java:254)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:186)
        org.metafacture.biblio.marc21.MarcXmlHandler.endElement (MarcXmlHandler.java:135)

Nor can it handle data from decode-marc21(emitleaderaswhole="true"): see

The error is:

org.metafacture.framework.FormatException: literal must only contain a single character:leader
    at org.metafacture.biblio.marc21.Marc21Encoder.processLiteralInLeader (Marc21Encoder.java:195)
        org.metafacture.biblio.marc21.Marc21Encoder.literal (Marc21Encoder.java:183)
        org.metafacture.biblio.marc21.Marc21Decoder.emitLeader (Marc21Decoder.java:254)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:221)
        org.metafacture.biblio.marc21.Marc21Decoder.process (Marc21Decoder.java:136)

So besides the inconsistencies, it is difficult to transform marc21 -> marcxml or the other way around. Even marc21 -> marc21 is not straightforward (see here): it fails with the same error as when processing MARC-XML input.

@TobiasNx TobiasNx changed the title from "Inconsistent leader handling by decode-marc21 and handle-marc-xml" to "Inconsistent leader handling by decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml" Jun 10, 2022

TobiasNx commented Jun 10, 2022

I would suggest the following changes:

  • remove the duplication of leader in decode-marc21(emitleaderaswhole="true"), so that leader is not an entity with a subfield but a single top-level element leader
  • add the option emitleaderasentity="true" to handle-marc-xml so that it outputs the leader the way decode-marc21 does by default. @blackwinter suggested a better way: an option emitleaderaswhole (default true), which is more consistent; if set to false, the leader would be output as decode-marc21 does by default
  • enable encode-marcxml and encode-marc21 to handle the leader both as an entity with subfields and as a simple field


@github-project-automation github-project-automation bot moved this to Backlog in Metafacture Mar 27, 2023
@dr0i dr0i moved this from Backlog to Ready in Metafacture Apr 24, 2023
@dr0i dr0i self-assigned this Apr 24, 2023

@TobiasNx

Also, the documentation wrongly states:

* The stream expected by the encoder is compatible to the streams emitted by
* the {@link Marc21Decoder} and the {@link MarcXmlHandler}.

@TobiasNx TobiasNx added the Bug label Oct 13, 2023
@dr0i dr0i moved this from Ready to Selected in Metafacture Oct 30, 2023
@dr0i dr0i moved this from Selected to Ready in Metafacture Oct 30, 2023
@TobiasNx TobiasNx changed the title from "Inconsistent leader handling by decode-marc21, handle-marc-xml, encode-marc21 and encode-marcxml" to "Incompatible StreamReceiver output by marc modules due to inconsistent leader handling" Nov 7, 2023
@TobiasNx TobiasNx assigned dr0i and unassigned dr0i Apr 9, 2024

TobiasNx commented Apr 9, 2024

@dr0i as we discussed with I.W., the transformation marc21 -> marcxml is needed.

@TobiasNx

Found two workarounds:
decode-marc21(emitLeaderAsWhole="true") -> encode-marc21: See here.

handle-marc-xml -> encode-marc21: See here.


dr0i commented Apr 18, 2024

I will try to condense the issues.
I will label the scenarios (a, b, c ...) so we can easily refer to them:

a) marc21 -> marc21 works ( just do | decode-marc21(emitLeaderAsWhole="false"))
b) marc21-> marcxml works (just do | decode-marc21(emitleaderaswhole="true"))
c) handle-marcxml -> encode-marc21 doesn't work

For c) we have to think about a solution: the Marc21Encoder expects (in method processLiteralInLeader) that a leader consists of individual literals, each holding a single character (a leader entity with many values), i.e. a leader cannot be one String. See 6d04d69, which introduced this and also gives the motivation for doing so (which I don't understand; we can see that removing the parsing/producing of the leader as one String causes problems).

We could solve c) by:
ca) "would be nice if the handle-marc-xml -module would support the emitleaderaswhole= option soon". We would allow emitleaderaswhole=false, which would emit the leader as individual single-Byte values, or
cb) encode-marc21 would (again) be able to cope with a single leader String.

I think cb) would be best, because as a side effect we would no longer need to specify emitleaderaswhole=false in a): the encoder would also cope with emitleaderaswhole=true.
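
To make option cb) concrete, here is a minimal, self-contained sketch of a receiver that accepts the leader either as one 24-character String or as single-character named literals. It is an illustration under assumptions, not the actual Marc21Encoder change: `LeaderAssembler` and its methods are hypothetical, the field names are taken from the decode-marc21 sample output earlier in this issue, and the length and address positions are left as zero placeholders on the assumption that an encoder would compute them itself.

```java
import java.util.Map;

/** Hypothetical sketch of option cb); not the actual Marc21Encoder code. */
final class LeaderAssembler {

    private static final int LEADER_LENGTH = 24;

    /** Leader positions of the named single-character literals. */
    private static final Map<String, Integer> POSITIONS = Map.of(
            "status", 5,
            "type", 6,
            "bibliographicLevel", 7,
            "typeOfControl", 8,
            "characterCodingScheme", 9,
            "encodingLevel", 17,
            "catalogingForm", 18,
            "multipartLevel", 19);

    /* Template: record length (00-04) and base address (12-16) are zero
     * placeholders; positions 10-11 ("22") and 20-23 ("4500") are the
     * fixed MARC 21 values. */
    private final char[] leader =
            ("00000" + "     " + "22" + "00000" + "   " + "4500").toCharArray();

    /** Accepts either a whole leader ("leader") or one named position. */
    void literal(final String name, final String value) {
        if ("leader".equals(name)) {
            if (value.length() != LEADER_LENGTH) {
                throw new IllegalArgumentException("leader must have exactly 24 characters");
            }
            // Whole leader: copy only the positions that carry descriptive values.
            for (final int position : POSITIONS.values()) {
                leader[position] = value.charAt(position);
            }
            return;
        }
        final Integer position = POSITIONS.get(name);
        if (position == null) {
            throw new IllegalArgumentException("unknown leader field: " + name);
        }
        if (value.length() != 1) {
            throw new IllegalArgumentException("literal must only contain a single character: " + name);
        }
        leader[position] = value.charAt(0);
    }

    /** Returns the assembled 24-character leader. */
    String build() {
        return new String(leader);
    }
}
```

With this shape, feeding `literal("leader", "02602pam a2200529 c 4500")` and feeding the eight single-character literals from the default decode-marc21 output produce the same assembled leader, which is the compatibility that scenarios a) and c) need.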

@dr0i dr0i assigned blackwinter and TobiasNx and unassigned dr0i Apr 18, 2024
@TobiasNx

I think we touch on the reasons for the change in leader handling here: #524. Changing records when transforming marc21 -> marc21 (XML and binary) also requires changes to the leader, since parts of the leader are generated from the number of characters, indicators, elements and subfields. Otherwise the leader and the record are not valid.
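
As a small illustration of why the leader cannot simply be passed through unchanged: in MARC 21, leader positions 00-04 hold the record length and positions 12-16 the base address of data, both as zero-padded five-digit numbers. The sketch below only patches these two values into an existing leader. `LeaderLengths` is a hypothetical helper, and how Metafacture actually computes the two numbers is not shown here.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of Metafacture. */
final class LeaderLengths {

    private static final int LEADER_LENGTH = 24;
    private static final int MAX_VALUE = 99999;

    private LeaderLengths() {
    }

    /** Returns a copy of the leader with record length (00-04) and base address (12-16) replaced. */
    static String withLengths(final String leader, final int recordLength, final int baseAddress) {
        if (leader == null || leader.length() != LEADER_LENGTH) {
            throw new IllegalArgumentException("leader must have exactly 24 characters");
        }
        checkRange(recordLength);
        checkRange(baseAddress);
        return fiveDigits(recordLength)
                + leader.substring(5, 12)   // positions 05-11 stay untouched
                + fiveDigits(baseAddress)
                + leader.substring(17);     // positions 17-23 stay untouched
    }

    private static void checkRange(final int value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("value does not fit into five digits: " + value);
        }
    }

    private static String fiveDigits(final int value) {
        return String.format(Locale.ROOT, "%05d", value);
    }
}
```

For example, `LeaderLengths.withLengths("00000pam a2200000 c 4500", 2602, 529)` returns `"02602pam a2200529 c 4500"`, the leader of the sample record earlier in this issue.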

@dr0i dr0i self-assigned this Apr 18, 2024
@dr0i dr0i moved this from Ready to Working in Metafacture Apr 18, 2024
dr0i added a commit that referenced this issue Apr 19, 2024
This makes the claim "The stream expected by the encoder is compatible to the
streams emitted by the {@link Marc21Decoder} and the {@link MarcXmlHandler}."
true again.
dr0i added a commit that referenced this issue Apr 19, 2024
"leader" can be given as a top-level literal or as a literal in an entity.
This makes the claim "The stream expected by the encoder is compatible to the
streams emitted by the {@link Marc21Decoder} and the {@link MarcXmlHandler}."
true again.
@dr0i dr0i moved this from Working to Review in Metafacture Apr 19, 2024
dr0i added a commit that referenced this issue Apr 22, 2024
- fix typo
- remove superfluous endEntity
- add method to avoid unnecessary char-string conversion
@blackwinter blackwinter removed their assignment Apr 22, 2024
@dr0i dr0i closed this as completed in #526 Apr 22, 2024
@github-project-automation github-project-automation bot moved this from Review to Done in Metafacture Apr 22, 2024

dr0i commented Apr 22, 2024

Note: went with cb) as fix.
