Pass data through Metamorph #107
The old PicaDecoder used regular expressions to parse PICA+ records. This led to two problems:
* Errors in the data resulted in exceptions which did not refer to the portion of the data that caused the problem (e.g. a character index)
* Due to the use of String.substring() for extracting data from the record, the full record was kept in memory (see issue metafacture#51)

The new PicaDecoder was written to solve these problems. The first one was addressed by constructing the parser so that it only fails in two clearly defined situations (missing id field and unexpected end of record). The second one was solved by copying the parsed data portions into new strings.

In addition to the problems listed above, the following issues were addressed:
* metafacture#109 -- removed support for static usages of the encoder
* metafacture#112 -- removed support for appendControlSubField. If Metamorph is extended to pass data through (issue metafacture#107), this functionality can easily be implemented in a script. It is also not clear how widely it is used at all.

While support for control subfields has been removed, the new decoder introduces a range of new options:
* ignore missing id -- do not fail on missing ids but use an empty string as record id
* skip empty fields -- do not output fields without subfields or with empty subfields only (i.e. subfields without name and value)
* fix unexpected end of record -- if a record does not end with a field delimiter, one will be added automatically
* normalize UTF8 -- automatically performs UTF8 normalization of values

The unit tests have been rewritten to match the new options and to be more useful for debugging.
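As a rough illustration of what a "normalize UTF8" option could do to a value, here is a minimal sketch using the JDK's java.text.Normalizer. The commit message does not say which normalization form the decoder applies; NFC is an assumption here, and the class and method names are hypothetical, not part of PicaDecoder.

```java
import java.text.Normalizer;

// Hypothetical helper, not Metafacture code.
final class Utf8NormalizationSketch {

    // Assumption: values are normalized to NFC (canonical composition).
    static String normalize(final String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFC);
    }

    public static void main(final String[] args) {
        // 'u' followed by a combining diaeresis: two chars, one visible glyph.
        final String decomposed = "u\u0308";
        final String composed = normalize(decomposed);
        // After NFC both code points are combined into a single one.
        System.out.println(decomposed.length() + " -> " + composed.length());
    }
}
```

Normalizing on input means that later string comparisons and lookups see one consistent representation of accented characters.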
There is metamorph's |
Yes, I meant something that could also pass whole entities and not only literals. Currently, it is not possible to easily write a script that simply changes some literals or entities in a record while passing the remainder of the record untouched. This makes it quite difficult to write a script that filters some data in a record without completely rewriting it. |
Seems related to an email from February 2015, which also provides more details. |
With this version of Metamorph, entities (well, entity events) can be passed through. It is slightly incompatible with the default Metamorph, where two tests would fail:
- org.metafacture.metamorph.collectors.EntityTest > shouldEmitEntityOnEachFlushEvent
- org.metafacture.metamorph.functions.UniqueTest > shouldAllowSelectingTheUniqueScope

So this introduces a "<version>" element under the "<meta>" element in morph. The data is flattened, as with Metamorph 1, but the entity's "start" and "end" events are passed through so that the receiver can handle the flattened data structure, unflatten it etc. By preserving the entity events it is now also possible, without any workarounds, to handle reiterations of entities having the same name.
- improve MarcXmlEncoder to work with both Metamorph versions
- add "version" element to metamorph.xsd

See #107. See also https://github.com/hagbeck/metafacture-sandbox/tree/master/enrich_marcxml.

With this version of Metamorph, entities (well, entity events) can be passed through. It is slightly incompatible with the default Metamorph, where two tests would fail:
- org.metafacture.metamorph.collectors.EntityTest > shouldEmitEntityOnEachFlushEvent
- org.metafacture.metamorph.functions.UniqueTest > shouldAllowSelectingTheUniqueScope

So this introduces a "<version>" element under the "<meta>" element in morph. The data is flattened, as with Metamorph 1, but the entity's "start" and "end" events are passed through so that the receiver can handle the flattened data structure, unflatten it etc. By preserving the entity events it is now also possible, without any workarounds, to handle reiterations of entities having the same name.
- add "version" element to metamorph.xsd

See #107. See also https://github.com/hagbeck/metafacture-sandbox/tree/master/enrich_marcxml.
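To make the point about repeated entities concrete, here is a small self-contained toy model. It is not Metafacture code: the event encoding, class name, and method names are invented for illustration. It contrasts purely flattened output with flattened output that keeps the entity start/end events, which is how the commit message above describes the new behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model only; the {type, name, value} event encoding is hypothetical.
final class EntityEventSketch {

    // Two entities with the same name, as in the tests quoted in this thread.
    static List<String[]> sampleRecord() {
        return Arrays.asList(
                new String[] {"startEntity", "clone", null},
                new String[] {"literal", "id", "0"},
                new String[] {"endEntity", null, null},
                new String[] {"startEntity", "clone", null},
                new String[] {"literal", "id", "1"},
                new String[] {"endEntity", null, null});
    }

    // Literals always get their full entity path (flattening). Entity
    // boundaries are only emitted when passEntityEvents is true.
    static List<String> emit(final List<String[]> events, final boolean passEntityEvents) {
        final List<String> out = new ArrayList<>();
        final Deque<String> path = new ArrayDeque<>();
        for (final String[] e : events) {
            switch (e[0]) {
                case "startEntity":
                    path.addLast(e[1]);
                    if (passEntityEvents) {
                        out.add("startEntity(" + e[1] + ")");
                    }
                    break;
                case "endEntity":
                    path.removeLast();
                    if (passEntityEvents) {
                        out.add("endEntity()");
                    }
                    break;
                case "literal":
                    final String prefix = path.isEmpty() ? "" : String.join(".", path) + ".";
                    out.add("literal(" + prefix + e[1] + ", " + e[2] + ")");
                    break;
                default:
                    throw new IllegalArgumentException("unknown event: " + e[0]);
            }
        }
        return out;
    }

    public static void main(final String[] args) {
        System.out.println(emit(sampleRecord(), false));
        System.out.println(emit(sampleRecord(), true));
    }
}
```

In the purely flattened output, both literals are named clone.id and nothing marks where one entity ends and the next begins. With the boundary events kept, a receiver can regroup the literals per entity.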
Just an FYI in reference to PR #328 (not really sure where to discuss): You could achieve (almost) the same result with a filter (

    @Test
    public void metamorph1_passthrough() {
        final Filter metamorph = new Filter(InlineMorph.in(this) //
                .with("<rules>")//
                .with(" <data source='_else'/>")//
                .with("</rules>")//
                .create());
        metamorph.setReceiver(receiver);

        metamorph.startRecord("1");
        metamorph.startEntity("clone");
        metamorph.literal("id", "0");
        metamorph.endEntity();
        metamorph.startEntity("clone");
        metamorph.literal("id", "1");
        metamorph.endEntity();
        metamorph.endRecord();

        final InOrder ordered = inOrder(receiver);
        ordered.verify(receiver).startRecord("1");
        ordered.verify(receiver).startEntity("clone");
        ordered.verify(receiver).literal("id", "0");
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).startEntity("clone");
        ordered.verify(receiver).literal("id", "1");
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).endRecord();
    }

The only difference is that literals are emitted with their base name only, not with the full entity path ( |
Sorry, but this isn't working anymore! I've pulled the branch and the resulting dist produces
<marc:controlfield tag="leader">01339nmm a2200024 c 4500</marc:controlfield>
<marc:controlfield tag="001">1635091</marc:controlfield>
<marc:controlfield tag="005">20180914</marc:controlfield>
<marc:controlfield tag="007">cr |||||||||||</marc:controlfield>
<marc:controlfield tag="008">171117s2017uuuu|||||| |o|||||||||||ger||</marc:controlfield>
<marc:controlfield tag="020 .a">9783662532607</marc:controlfield>
<marc:controlfield tag="0247.2">doi</marc:controlfield>
<marc:controlfield tag="0247.a">10.1007/978-3-662-53260-7</marc:controlfield>
<marc:controlfield tag="035 .a">(UNION_SEAL)HT019079275</marc:controlfield>
<marc:controlfield tag="035 .a">(DE-599)HBZHT019079275</marc:controlfield>
<marc:controlfield tag="040 .e">rda</marc:controlfield>
<marc:controlfield tag="24500a">Weißbuch Gelenkersatz</marc:controlfield>
<marc:controlfield tag="24500b">Versorgungssituation bei endoprothetischen Hüft- und Knieoperationen in Deutschland</marc:controlfield>
<marc:controlfield tag="24500c">herausgegeben von H.-H. Bleß, M. Kip</marc:controlfield>
<marc:controlfield tag="260 .a">Berlin, Heidelberg</marc:controlfield>
<marc:controlfield tag="260 .b">Springer Berlin Heidelberg</marc:controlfield>
<marc:controlfield tag="260 .c">2017</marc:controlfield>
<marc:controlfield tag="260 .b">Imprint: Springer</marc:controlfield>
<marc:controlfield tag="2641.a">Berlin, Heidelberg</marc:controlfield>
<marc:controlfield tag="2641.b">Springer Berlin Heidelberg</marc:controlfield>
<marc:controlfield tag="2641.c">2017</marc:controlfield>

The morph I've used:

<?xml version="1.0" encoding="UTF-8"?>
<metamorph xmlns="http://www.culturegraph.org/metamorph"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.culturegraph.org/metamorph metamorph.xsd"
    version="1">
    <!-- Metadata -->
    <meta />
    <!-- Macro definitions -->
    <macros />
    <!-- Transformation rules-->
    <rules>
        <data source="_else"/>
    </rules>
    <maps />
</metamorph>

And the flux:
|
@hagbeck: It's caused by this particular change in the branch:

- writeRaw(String.format(CONTROLFIELD_OPEN_TEMPLATE, name));
+ writeRaw(String.format(CONTROLFIELD_OPEN_TEMPLATE, name.replaceFirst("\\W","")));
|
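For readers unfamiliar with the added call, the following sketch shows only what String.replaceFirst("\\W", "") does to a name: it removes the first character that is not a letter, digit, or underscore. The sample names are taken from the tag values in the output quoted earlier. The class and method names are hypothetical, and the sketch does not reproduce the rest of MarcXmlEncoder.

```java
// Hypothetical demo class; it isolates the string operation from the diff above.
final class ControlfieldTagSketch {

    // Removes the first non-word character (anything outside [a-zA-Z_0-9]), if any.
    static String stripFirstNonWordChar(final String name) {
        return name.replaceFirst("\\W", "");
    }

    public static void main(final String[] args) {
        for (final String name : new String[] {"001", "020 .a", "0247.2", "24500a"}) {
            System.out.println("'" + name + "' -> '" + stripFirstNonWordChar(name) + "'");
        }
    }
}
```

Only the first match is removed: a name with a space and a dot loses the space but keeps the dot, while a name that consists solely of word characters is returned unchanged.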
@hagbeck for 100% backward compatibility a To use the |
@blackwinter oh, thx, wasn't aware of this "Filter" ... Wonder if this is the better solution and wonder why this was not discussed here before. Will try it and see how it behaves, but it looks very promising! |
The difference is |
Ok, I see, the comment "this isn't working anymore" refers to metafacture-sandbox, not the original behaviour. I still think it's an incompatibility, though. This change is not guarded by the version parameter. |
@blackwinter yes, it's indeed an issue. I wish I had put 7ca5615 in another branch so that it could be discussed there. |
I guess that's because it's not selective and doesn't allow "enriching" the output. It only allows for converting from one format into another and/or filtering out unwanted records. The e-mail thread you mentioned above sounds like pass-through should be applicable to selected elements, not necessarily the stream as a whole.
Well, why did you make those changes in the first place? I assume they were intended to address a shortcoming of this particular pass-through implementation: namely, that it passes the full entity path instead of just the individual entity/literal names (see the difference between

Both these points seem to indicate that we should maybe approach this issue from a different angle: API-wise, isn't this rather a property of the

To illustrate (untested):

    @Test
    public void metamorph1_unflatten() {
        metamorph = InlineMorph.in(this) //
                .with("<rules>")//
                .with(" <entity name='flattened' flushWith='record'>")//
                .with(" <data source='_else'/>")//
                .with(" </entity>")//
                .with(" <entity name='unflattened' flushWith='record'>")//
                .with(" <data source='_else' unflatten='true'/>")//
                .with(" </entity>")//
                .with("</rules>")//
                .createConnectedTo(receiver);

        metamorph.startRecord("1");
        metamorph.startEntity("clone");
        metamorph.literal("id", "0");
        metamorph.endEntity();
        metamorph.startEntity("clone");
        metamorph.literal("id", "1");
        metamorph.endEntity();
        metamorph.endRecord();

        final InOrder ordered = inOrder(receiver);
        ordered.verify(receiver).startRecord("1");
        ordered.verify(receiver).startEntity("flattened");
        ordered.verify(receiver).literal("clone.id", "0");
        ordered.verify(receiver).literal("clone.id", "1");
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).startEntity("unflattened");
        ordered.verify(receiver).startEntity("clone");
        ordered.verify(receiver).literal("id", "0");
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).startEntity("clone");
        ordered.verify(receiver).literal("id", "1");
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).endEntity();
        ordered.verify(receiver).endRecord();
    }

WDYT? |
With the new keyword "_elseAndPassEntityEvents" (set with <data source="_elseAndPassEntityEvents" /> ) the known "_else" is triggered AND entity events for these "_else" sources are fired. With this, data can be passed through metamorph. This "_else" data is handled in receivers like all the other data handled by morph rules. Data which is handled by metamorph rules will NOT be passed through (hence the aptly named "_else"). If you want to use data in the morph AND pass it through, you have to add an explicit rule for this, as usual. See #107.
With the new keyword "_elseAndPassEntityEvents" (set with <data source="_elseAndPassEntityEvents" /> ) the known "_else" is triggered AND entity events for these "_else" sources are fired. With this, data can be passed through metamorph. All "_else" data are handled in receivers like all the other data handled by morph rules. Data which is handled by metamorph rules will NOT be passed through (hence the aptly named "_else"). If you want to use data in the morph AND pass it through, you have to add an explicit rule for this, as usual. See #107.
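The "handled data is not passed through" rule can be pictured with a toy model. This is not the Metamorph implementation: rules are reduced to hypothetical {source, outputName} pairs, literals to {name, value} pairs, and entity events are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of "_else" semantics; all names here are hypothetical.
final class ElsePassThroughSketch {

    // rules: {source, outputName}; literals: {name, value}.
    static List<String> apply(final List<String[]> rules, final List<String[]> literals) {
        final List<String> out = new ArrayList<>();
        for (final String[] literal : literals) {
            boolean handled = false;
            for (final String[] rule : rules) {
                if (rule[0].equals(literal[0])) {
                    out.add(rule[1] + "=" + literal[1]);
                    handled = true;
                }
            }
            // Only literals that no rule picked up fall through to "_else".
            if (!handled) {
                out.add(literal[0] + "=" + literal[1]);
            }
        }
        return out;
    }

    public static void main(final String[] args) {
        final List<String[]> literals = Arrays.asList(
                new String[] {"title", "T"}, new String[] {"year", "2017"});
        final List<String[]> renameOnly = Arrays.asList(
                new String[] {"title", "dc:title"});
        final List<String[]> renameAndPass = Arrays.asList(
                new String[] {"title", "dc:title"}, new String[] {"title", "title"});
        System.out.println(apply(renameOnly, literals));
        System.out.println(apply(renameAndPass, literals));
    }
}
```

With only the rename rule, the original title literal disappears from the output because a rule consumed it. Adding a second, explicit rule for the same source is what keeps it alongside the renamed copy.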
Added to the wiki. |
Metamorph should be able to pass entities and literals through even if they are not processed.