The old PicaDecoder used regular expressions to parse PICA+ records.
This led to two problems:
* Errors in the data resulted in exceptions which did not identify the
portion of the data that caused the problem (e.g. by a character index)
* Because String.substring() was used for extracting data from the
record, the full record was kept in memory (see issue metafacture#51)
The new PicaDecoder was written to solve these problems. The first one
was addressed by constructing the parser so that it only fails in two
clearly defined situations (missing id field and unexpected end of
record). The second one was solved by copying the parsed data portions
into new strings.
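
As a minimal sketch of the copying approach (not the actual PicaDecoder
code; the class and method names are hypothetical), data can be
extracted from a char buffer so that the result does not share storage
with the record. On JVMs where String.substring() shares the backing
array (before Java 7u6), this is what allows the full record to be
garbage collected:

```java
// Hypothetical helper for illustration only, not part of Metafacture.
final class CopyingExtractor {

    private CopyingExtractor() {
        // Utility class, no instances.
    }

    /**
     * Returns the characters in [start, end) as a string with its own
     * storage, independent of the buffer holding the full record.
     */
    static String extract(final char[] record, final int start, final int end) {
        // String(char[], int, int) always copies the requested range.
        return new String(record, start, end - start);
    }
}
```

For example, CopyingExtractor.extract("abcdef".toCharArray(), 2, 5)
yields "cde", and later changes to the buffer do not affect that value.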
In addition to the problems listed above, the following issues were
addressed:
* metafacture#109 -- removed support for static usages of the encoder
* metafacture#112 -- removed support for appendControlSubField. If Metamorph is
extended to pass data through (issue metafacture#107), this functionality can
easily be implemented in a script. It is also not clear how widely it
is used at all.
Although support for control subfields has been removed, the new
decoder introduces a range of new options:
* ignore missing id -- do not fail on missing ids but use an empty
string as record id
* skip empty fields -- do not output fields without subfields or empty
subfields only (i.e. subfields without name and value)
* fix unexpected end of record -- if a record does not end with a field
delimiter, one is added automatically
* normalize UTF8 -- automatically performs Unicode normalization of
values
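
The intended effect of three of these options can be sketched as
follows. This is an illustration under stated assumptions, not the
decoder's API: the class and method names are hypothetical, the field
delimiter is assumed to be U+001E, and NFC is assumed as the
normalization form.

```java
import java.text.Normalizer;

// Hypothetical sketch for illustration only, not part of Metafacture.
final class DecoderOptionsSketch {

    // Assumption: PICA+ fields are terminated by U+001E.
    static final char FIELD_DELIMITER = '\u001e';

    private DecoderOptionsSketch() {
        // Utility class, no instances.
    }

    /** "ignore missing id": fall back to an empty id instead of failing. */
    static String recordId(final String id, final boolean ignoreMissingId) {
        if (id != null) {
            return id;
        }
        if (ignoreMissingId) {
            return "";
        }
        throw new IllegalArgumentException("record has no id field");
    }

    /** "fix unexpected end of record": append a missing field delimiter. */
    static String fixEndOfRecord(final String record) {
        if (record.endsWith(String.valueOf(FIELD_DELIMITER))) {
            return record;
        }
        return record + FIELD_DELIMITER;
    }

    /** "normalize UTF8": bring a value into a composed form (assumed NFC). */
    static String normalize(final String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFC);
    }
}
```

With these helpers, a record that already ends with the delimiter is
returned unchanged, and a value such as "e" followed by a combining
acute accent (U+0301) is normalized to the single character U+00E9.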
The unit tests have been rewritten to match the new options and to be
more useful for debugging.
String.substring() keeps a reference to the original char[], which leads to high memory usage in sorting.