Fix and refactor ISO 2709 record processing
Fixes a bug when writing ISO 2709 records containing characters which are encoded with more than one byte (such as German umlauts in UTF-8). Various length and position values were computed incorrectly in such cases.

Additionally, the interface and the internals of the `iso2709` package are heavily refactored. This also affects the MARC 21 metafacture modules.

`RecordBuilder` is refactored to expect char arrays instead of strings for all values that represent fixed-length values or sets of characters. Validation of input values is improved. This includes checks for allowed characters. Additionally, checks are introduced to ensure the correct order of record id, reference and data fields. The `RecordBuilder#toString()` method now returns a descriptive string. For retrieving the actual record data, the new method `RecordBuilder#build()` is introduced, which returns a byte array. The internals of `RecordBuilder` and its associated builder classes are refactored to make them easier to read.

`Record` is refactored to no longer allow reading the full record label but only the parts containing application-specific data. The internal structure of `Record` and its associated classes is refactored to improve maintainability.

As the record label is now considered an implementation detail of ISO 2709 records, the `Label` class is no longer part of the public interface of the `iso2709` package.

The `RecordFormat` class is made immutable. To simplify creating new instances, a builder is provided.

The constants defined in `Iso2709Format` and `Iso646Characters` and the classes themselves are now package-private and no longer publicly accessible.

The `Marc21Encoder` is updated to reflect the changes in the `RecordBuilder`. However, the module still generates a string representation of the record instead of a byte array. Support for setting the full record leader has been removed. Application-specific values in the leader can be set through an entity containing a literal for each value to set. The names of the entity and the literals are defined in the constant-holding class `Marc21EventNames`. A new parameter `Marc21Encoder#setGenerateRecordId` is introduced which controls whether the record id field in the MARC record will be created from the record id in the start-record event.

The `Marc21Decoder` is updated to reflect the changes in `Record`. It is also capable of creating the events describing the application-specific parts of the leader expected by `Marc21Encoder`. The ability to emit the full record leader has been removed (this was controllable with `Marc21Decoder#splitLeader(boolean)`).

The reason for removing the support for emitting and receiving the full record leader in `Marc21Decoder` and `Marc21Encoder` is that the record label (which is the record leader in ISO 2709 lingo) is considered to be an implementation detail of ISO 2709 record processing. Most of the information in the label is not relevant for processing MARC 21 records.
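
The multi-byte problem named at the start of this message can be illustrated with a minimal sketch. It is not code from this commit: the class `ByteLengthDemo` and its two helper methods are hypothetical, and the sketch assumes that the length and position values in a record are counted in bytes of the encoded data. Under that assumption, a value derived from `String#length()` is too small as soon as a character needs more than one byte in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the iso2709 package.
public class ByteLengthDemo {

    // Number of UTF-16 code units, which is what String#length() reports.
    static int charLength(String value) {
        return value.length();
    }

    // Number of bytes the value occupies once it is encoded as UTF-8.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Mueller" written with a u-umlaut (U+00FC), which takes two bytes in UTF-8.
        String value = "M\u00fcller";
        System.out.println("chars: " + charLength(value));
        System.out.println("bytes: " + utf8Length(value));
    }
}
```

For pure ASCII values both methods return the same number, which would explain why such a bug only shows up with characters like umlauts.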
Showing 27 changed files with 2,107 additions and 1,668 deletions.