PicaDecoder uses static members for instance-related information #109

cboehme · 2013-07-11T08:57:37Z

This is not obvious and prone to produce unexpected behaviour.
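
To illustrate the kind of surprise meant here, a minimal sketch follows. The class names are hypothetical and this is not the actual PicaDecoder code. The flag name is borrowed from #112 purely for illustration; whether the old decoder kept exactly this flag in a static field is an assumption. The point is only that a setter which looks like per-instance configuration changes every instance once the backing field is static.

```java
// Hypothetical classes for illustration; not the actual PicaDecoder code.
class StaticStateDecoder {
    // Static: one value shared by every instance, although the setter
    // below looks like per-instance configuration.
    private static boolean appendControlSubField = true;

    void setAppendControlSubField(final boolean value) {
        appendControlSubField = value;
    }

    boolean getAppendControlSubField() {
        return appendControlSubField;
    }
}

class InstanceStateDecoder {
    // Instance field: each decoder keeps its own setting.
    private boolean appendControlSubField = true;

    void setAppendControlSubField(final boolean value) {
        appendControlSubField = value;
    }

    boolean getAppendControlSubField() {
        return appendControlSubField;
    }
}

class StaticMemberDemo {
    public static void main(final String[] args) {
        final StaticStateDecoder a = new StaticStateDecoder();
        final StaticStateDecoder b = new StaticStateDecoder();
        a.setAppendControlSubField(false);
        // b was never configured, yet it sees the change made through a.
        System.out.println(b.getAppendControlSubField());

        final InstanceStateDecoder c = new InstanceStateDecoder();
        final InstanceStateDecoder d = new InstanceStateDecoder();
        c.setAppendControlSubField(false);
        // d keeps its own default.
        System.out.println(d.getAppendControlSubField());
    }
}
```

With two decoders in the same JVM (for example in two flows running side by side), the static variant makes the second one depend on how the first was configured.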

The old PicaDecoder used regular expressions to parse PICA+ records. This led to two problems:

* Errors in the data resulted in exceptions which did not refer to the portion of the data that caused the problem (e.g. a character index)
* Due to the use of String.substring() for extracting data from the record, the full record was kept in memory (see issue metafacture#51)

The new PicaDecoder was written to solve these problems. The first one was addressed by constructing the parser so that it only fails in two clearly defined situations (missing id field and unexpected end of record). The second one was solved by copying the parsed data portions into new strings.

In addition to the problems listed above, the following issues were addressed:

* metafacture#109 -- removed support for static usages of the decoder
* metafacture#112 -- removed support for appendControlSubField. If Metamorph is extended to pass data through (issue metafacture#107), this functionality can easily be implemented in a script. It is also not clear how widely it is used at all.

While support for control subfields has been removed, the new decoder introduces a range of new options:

* ignore missing id -- do not fail on missing ids but use an empty string as record id
* skip empty fields -- do not output fields without subfields or with empty subfields only (i.e. subfields without name and value)
* fix unexpected end of record -- if a record does not end with a field delimiter, one will be added automatically
* normalize UTF8 -- automatically performs UTF8 normalization of values

The unit tests have been rewritten to match the new options and to be more useful for debugging.
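
A rough sketch of the approach described above, for readers who have not looked at the pull request. It is not the code from #113. The class `TinyPicaParser`, its output format, and the delimiter characters (U+001E as field end, U+001F as subfield start) are assumptions made for this example. It covers only "fix unexpected end of record" and an always-on "skip empty fields"; id handling and UTF8 normalization are left out. Values are collected in a `StringBuilder`, so every emitted string is a fresh copy and keeps no reference to the record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Metafacture implementation.
final class TinyPicaParser {
    // Delimiters assumed for this sketch.
    static final char FIELD_END = '\u001e';
    static final char SUBFIELD_START = '\u001f';

    private enum State { FIELD_NAME, SUBFIELD_NAME, SUBFIELD_VALUE }

    private TinyPicaParser() { }

    // Returns one entry per subfield in the form "field.subfield=value".
    static List<String> parse(final String record, final boolean fixUnexpectedEndOfRecord) {
        String input = record;
        if (!record.isEmpty() && record.charAt(record.length() - 1) != FIELD_END) {
            if (!fixUnexpectedEndOfRecord) {
                // One of the few defined failure cases; reports the position.
                throw new IllegalArgumentException(
                        "unexpected end of record at index " + record.length());
            }
            input = record + FIELD_END;
        }
        final List<String> out = new ArrayList<>();
        final StringBuilder fieldName = new StringBuilder();
        final StringBuilder value = new StringBuilder();
        char subfieldName = 0;
        State state = State.FIELD_NAME;
        for (int i = 0; i < input.length(); i++) {
            final char ch = input.charAt(i);
            switch (state) {
            case FIELD_NAME:
                if (ch == SUBFIELD_START) {
                    state = State.SUBFIELD_NAME;
                } else if (ch == FIELD_END) {
                    fieldName.setLength(0); // field without subfields: skipped
                } else {
                    fieldName.append(ch);
                }
                break;
            case SUBFIELD_NAME:
                if (ch == FIELD_END) {
                    fieldName.setLength(0); // subfield without name and value
                    state = State.FIELD_NAME;
                } else if (ch != SUBFIELD_START) {
                    subfieldName = ch;
                    state = State.SUBFIELD_VALUE;
                }
                break;
            case SUBFIELD_VALUE:
                if (ch == SUBFIELD_START || ch == FIELD_END) {
                    // toString() copies the characters into a new String.
                    out.add(fieldName.toString().trim() + "." + subfieldName
                            + "=" + value.toString());
                    value.setLength(0);
                    if (ch == FIELD_END) {
                        fieldName.setLength(0);
                        state = State.FIELD_NAME;
                    } else {
                        state = State.SUBFIELD_NAME;
                    }
                } else {
                    value.append(ch);
                }
                break;
            }
        }
        return out;
    }
}
```

Because every character is handled by exactly one state, there is no regular expression that can fail somewhere in the middle of a record without saying where.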

cboehme · 2013-07-15T12:56:38Z

The new PicaDecoder (pull request #113) solves this issue by removing support for static use of the decoder completely.

cboehme · 2013-11-02T18:03:41Z

Fixed in pull request #113.

cboehme mentioned this issue Jul 12, 2013

default value of appendControlSubField should be false. #112

Closed

ghost assigned cboehme Jul 14, 2013

cboehme mentioned this issue Jul 14, 2013

Re-implemented PicaDecoder based on a state machine. #113

Merged

cboehme closed this as completed Nov 2, 2013

blackwinter added a commit that referenced this issue Dec 13, 2024

Produce Value.Array for marked array entities. (#109)

a42943d

blackwinter added a commit that referenced this issue Dec 13, 2024

Require explicit array initialization. (#109)

7d9b2a4
