Decoders (MARC21, Pica etc.) should not use String.substring() #51

Closed
mgeipel opened this issue Mar 27, 2013 · 0 comments

Comments


mgeipel commented Mar 27, 2013

String.substring() keeps a reference to the original char[], which leads to high memory usage in sorting.
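
A minimal sketch of one way to avoid this, assuming a JDK older than 7u6, where `substring()` returned a view onto the parent string's `char[]` (later JDKs copy the characters themselves). The class `SubstringCopy` and the method `copyOf` are hypothetical names for illustration and are not taken from the project.

```java
// Hypothetical helper, not part of the project's code base.
final class SubstringCopy {

    private SubstringCopy() {
        // static helper only
    }

    // Returns record[begin, end) as a string with its own backing storage.
    static String copyOf(final String record, final int begin, final int end) {
        // On old JDKs substring() alone shares the char[] of `record`, so
        // the full record stays reachable as long as the result is alive.
        // Wrapping it in new String(...) copies just the requested range.
        return new String(record.substring(begin, end));
    }
}
```

A call such as `SubstringCopy.copyOf(record, 6, 16)` yields the same text as `record.substring(6, 16)`. The difference is only in what the result keeps alive on the heap.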

@ghost ghost assigned mgeipel and cboehme Jul 14, 2013
cboehme added a commit to cboehme/metafacture-core that referenced this issue Jul 14, 2013
The old PicaDecoder used regular expressions to parse PICA+ records.
This led to two problems:

 * Errors in the data resulted in exceptions which did not refer to the    
   portion of the data that caused the problem (e.g. a character index)
 * Due to the use of String.substring() for extracting data from the
   record, the full record was kept in memory (see issue metafacture#51)

The new PicaDecoder was written to solve these problems. The first one
was addressed by constructing the parser so that it only fails in two
clearly defined situations (missing id field and unexpected end of
record). The second one was solved by copying the parsed data portions
into new strings. 
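
The two ideas in this paragraph can be sketched roughly as follows. This is not the actual PicaDecoder: the class `FieldScanner`, the method `fields` and the delimiter value `\u001e` are assumptions made for illustration. The sketch copies each parsed portion into a new string and, when the record ends unexpectedly, fails with a message that names a character index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual PicaDecoder.
final class FieldScanner {

    // Assumed field delimiter; the real serialization may use another one.
    static final char FIELD_DELIMITER = '\u001e';

    private FieldScanner() {
        // static helper only
    }

    // Splits a record into fields, copying each field into a new string.
    static List<String> fields(final String record) {
        final char[] buffer = record.toCharArray();
        final List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buffer.length; ++i) {
            if (buffer[i] == FIELD_DELIMITER) {
                // new String(char[], int, int) copies the given range, so
                // the field does not keep the whole buffer reachable.
                fields.add(new String(buffer, start, i - start));
                start = i + 1;
            }
        }
        if (start < buffer.length) {
            // Name the position so the error points at the broken data.
            throw new IllegalArgumentException(
                    "Unexpected end of record at character index "
                    + buffer.length);
        }
        return fields;
    }
}
```

With these assumptions, `FieldScanner.fields("a\u001ebc\u001e")` returns the two fields `a` and `bc`, while a record without a trailing delimiter raises the exception.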

In addition to the problems listed above, the following issues were
addressed:
 
 * metafacture#109 -- removed support for static usages of the encoder
 * metafacture#112 -- removed support for appendControlSubField. If Metamorph is  
   extended to pass data through (issue metafacture#107), this functionality can 
   easily be implemented in a script. It is also not clear how widely it 
   is used at all.
 
Although support for control subfields has been removed, the new
decoder introduces a range of new options:

 * ignore missing id -- do not fail on missing ids but use an empty
   string as record id
 * skip empty fields -- do not output fields that have no subfields or
   only empty subfields (i.e. subfields without name and value)
 * fix unexpected end of record -- if a record does not end with a field
   delimiter, one will be added automatically
 * normalize UTF8 -- automatically performs UTF8 normalization of values
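
As a rough illustration of the last two options, the following sketch shows how such behaviour could look. It is not the decoder's real code: the class and method names are hypothetical, the delimiter `\u001e` is an assumption, and the choice of the NFC normalization form is a guess, since the text above does not say which form is used.

```java
import java.text.Normalizer;

// Hypothetical sketch of two of the options, not the decoder's real code.
final class DecoderOptionsSketch {

    // Assumed field delimiter; the real serialization may use another one.
    static final String FIELD_DELIMITER = "\u001e";

    private DecoderOptionsSketch() {
        // static helper only
    }

    // "fix unexpected end of record": append a missing final delimiter.
    static String fixUnexpectedEndOfRecord(final String record) {
        if (record.endsWith(FIELD_DELIMITER)) {
            return record;
        }
        return record + FIELD_DELIMITER;
    }

    // "normalize UTF8": NFC is assumed here for illustration only.
    static String normalize(final String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFC);
    }
}
```

Under NFC, a value written as `e` followed by the combining acute accent (U+0301) is turned into the single precomposed character U+00E9.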
 
The unit tests have been rewritten to match the new options and to be
more useful for debugging.
@cboehme cboehme removed this from the Version 2 milestone Feb 19, 2014
@cboehme cboehme added this to the metafacture-4.0.0 milestone Jan 8, 2017
@cboehme cboehme closed this as completed Jan 8, 2017
blackwinter pushed a commit that referenced this issue Dec 13, 2024
Basic if statements and record mode draft