Fix and refactor ISO 2709 record processing
Fixes a bug in writing ISO 2709 records that contain characters encoded
with more than one byte (such as German umlauts in UTF-8). Various
length and position values were computed incorrectly in such cases.
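A minimal Java sketch of the mismatch behind this kind of bug: `String#length()` counts UTF-16 `char`s, while a UTF-8 encoded record is measured in bytes. The class `ByteLengthDemo` is hypothetical and not part of Metafacture; it only assumes, as the fix implies, that length and position values have to be computed on the encoded bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Metafacture.
final class ByteLengthDemo {

    // Length as Java counts it: UTF-16 code units in the string.
    static int charLength(final String value) {
        return value.length();
    }

    // Length as it ends up in a UTF-8 encoded record: number of bytes.
    static int utf8Length(final String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(final String[] args) {
        // U+00F6 (o with umlaut) is one char, but two bytes (0xC3 0xB6) in UTF-8.
        final String field = "B\u00f6hme";
        System.out.println(charLength(field)); // prints 5
        System.out.println(utf8Length(field)); // prints 6
    }
}
```

Any length or start position derived from the first number is one byte short for every umlaut that precedes it in the record.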

Additionally, the interface and the internals of the `iso2709` package
are heavily refactored. This also affects the MARC 21 Metafacture
modules.

`RecordBuilder` is refactored to expect char arrays instead of strings
for all values that represent fixed-length values or sets of
characters. Validation of input values is improved, including checks
for allowed characters. Additionally, new checks ensure that the record
id field, the reference fields and the data fields are added in the
correct order.
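The ordering rule can be pictured as a small state machine: the record id field first, then reference fields, then data fields. The sketch below is hypothetical (`FieldOrderChecker` and its methods are not the `RecordBuilder` API); it takes tags as `char[]`, as the new interface does, and assumes the 3-character ISO 2709 tag length (`TAG_LENGTH` in the diff).

```java
// Hypothetical sketch of the field-order rule; not the real RecordBuilder API.
final class FieldOrderChecker {

    // Stages in the order in which fields may be appended.
    private enum Stage { START, RECORD_ID, REFERENCE, DATA }

    private Stage stage = Stage.START;

    void appendIdField() {
        requireAtMost(Stage.START, "record id field must come first");
        stage = Stage.RECORD_ID;
    }

    void appendReferenceField(final char[] tag) {
        requireTag(tag);
        requireAtMost(Stage.REFERENCE, "reference fields must precede data fields");
        stage = Stage.REFERENCE;
    }

    void appendDataField(final char[] tag) {
        requireTag(tag);
        stage = Stage.DATA;
    }

    // Rejects the call if the builder has already moved past the allowed stage.
    private void requireAtMost(final Stage allowed, final String message) {
        if (stage.compareTo(allowed) > 0) {
            throw new IllegalStateException(message);
        }
    }

    // Assumes the 3-character ISO 2709 tag length.
    private static void requireTag(final char[] tag) {
        if (tag.length != 3) {
            throw new IllegalArgumentException("tag must have exactly three characters");
        }
    }
}
```

Checking at append time rejects a misordered record before anything is written; the allowed-character checks mentioned above are not modelled here.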

The `RecordBuilder#toString()` method now returns a descriptive string.
To retrieve the actual record data, use the new method
`RecordBuilder#build()`, which returns a byte array.

The internals of `RecordBuilder` and its associated builder classes are
refactored to make them easier to read.

`Record` is refactored so that it no longer allows reading the full
record label, only the parts containing application-specific data. The
internal structure of `Record` and its associated classes is refactored
to improve maintainability.
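For orientation, the 24-character ISO 2709 record label mixes structural values, which a writer can compute, with a few application-specific positions. This commit does not show exactly which positions `Record` exposes; the hypothetical `LabelParts` sketch below assumes the record status (position 5), the implementation codes (positions 6-9) and the positions reserved for user systems (17-19), which MARC 21 calls leader/05, 06-09 and 17-19.

```java
// Hypothetical helper; the position table is an assumption based on the
// ISO 2709 label layout, not code taken from Metafacture.
final class LabelParts {

    static final int LABEL_LENGTH = 24;

    final char recordStatus; // position 5
    final char[] implCodes;  // positions 6-9
    final char[] systemChars; // positions 17-19

    LabelParts(final String label) {
        if (label.length() != LABEL_LENGTH) {
            throw new IllegalArgumentException("label must be 24 characters long");
        }
        recordStatus = label.charAt(5);
        implCodes = label.substring(6, 10).toCharArray();
        systemChars = label.substring(17, 20).toCharArray();
    }
}
```

In a label such as `00714cam a2200205 a 4500`, the remaining positions (record length, indicator and identifier lengths, base address and the directory map `4500`) are values a writer can derive from the record contents and its `RecordFormat`.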

As the record label is now considered an implementation detail of ISO
2709 records, the `Label` class is no longer part of the public
interface of the `iso2709` package.

The `RecordFormat` class is made immutable. To simplify creating new
instances, a builder is provided.
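A common shape for an immutable class with a builder is sketched below. The getters `getFieldLengthLength()`, `getFieldStartLength()` and `getImplDefinedPartLength()` appear in the diff; everything else (the class name `SketchRecordFormat`, the builder methods, the defaults of 4, 5 and 0, and the single-digit range check) is an assumption for illustration.

```java
// Hypothetical immutable format class with a builder; illustrative only.
final class SketchRecordFormat {

    private final int fieldLengthLength;
    private final int fieldStartLength;
    private final int implDefinedPartLength;

    private SketchRecordFormat(final Builder builder) {
        fieldLengthLength = builder.fieldLengthLength;
        fieldStartLength = builder.fieldStartLength;
        implDefinedPartLength = builder.implDefinedPartLength;
    }

    int getFieldLengthLength() {
        return fieldLengthLength;
    }

    int getFieldStartLength() {
        return fieldStartLength;
    }

    int getImplDefinedPartLength() {
        return implDefinedPartLength;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {

        // Assumed defaults, matching the common MARC 21 directory map "450".
        private int fieldLengthLength = 4;
        private int fieldStartLength = 5;
        private int implDefinedPartLength = 0;

        Builder fieldLengthLength(final int value) {
            fieldLengthLength = requireDigit(value, 1);
            return this;
        }

        Builder fieldStartLength(final int value) {
            fieldStartLength = requireDigit(value, 1);
            return this;
        }

        Builder implDefinedPartLength(final int value) {
            implDefinedPartLength = requireDigit(value, 0);
            return this;
        }

        SketchRecordFormat build() {
            return new SketchRecordFormat(this);
        }

        // Each length is written as a single digit in the record label.
        private static int requireDigit(final int value, final int min) {
            if (value < min || value > 9) {
                throw new IllegalArgumentException(
                        "value must be between " + min + " and 9: " + value);
            }
            return value;
        }
    }
}
```

Because every field is final and only the builder can reach the private constructor, an instance can be shared freely once built.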

The constants defined in `Iso2709Format` and `Iso646Characters` and the
classes themselves are now package-private and no longer publicly
accessible.

The `Marc21Encoder` is updated to reflect the changes in the
`RecordBuilder`. However, the module still generates a string
representation of the record instead of a byte array. Support for
setting the full record leader has been removed. Application-specific
values in the leader can be set through an entity containing a literal
for each value to set. The names of the entity and the literals are
defined in the constants class `Marc21EventNames`.
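In event terms, the encoder receives a leader entity with one literal per value. Metafacture's `StreamReceiver` does have `startRecord`, `endRecord`, `startEntity`, `endEntity` and `literal` methods, but to stay self-contained the sketch declares a minimal stand-in interface. The entity name `leader` and the literal names `status` and `type` are placeholders, since the real names live in `Marc21EventNames` and are not shown in this commit; `EventSink`, `EventLog` and `LeaderEvents` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal stand-in with the same method names as Metafacture's StreamReceiver.
interface EventSink {
    void startRecord(String identifier);
    void endRecord();
    void startEntity(String name);
    void endEntity();
    void literal(String name, String value);
}

// Records each event as a string so the sequence can be inspected.
final class EventLog implements EventSink {

    final List<String> events = new ArrayList<>();

    @Override
    public void startRecord(final String identifier) {
        events.add("startRecord " + identifier);
    }

    @Override
    public void endRecord() {
        events.add("endRecord");
    }

    @Override
    public void startEntity(final String name) {
        events.add("startEntity " + name);
    }

    @Override
    public void endEntity() {
        events.add("endEntity");
    }

    @Override
    public void literal(final String name, final String value) {
        events.add("literal " + name + "=" + value);
    }
}

final class LeaderEvents {

    // Placeholder: the real entity name is a constant in Marc21EventNames.
    static final String LEADER_ENTITY = "leader";

    // Emits one record whose only content is the leader entity.
    static void emitRecord(final EventSink sink, final String identifier,
            final Map<String, String> leaderValues) {
        sink.startRecord(identifier);
        sink.startEntity(LEADER_ENTITY);
        for (final Map.Entry<String, String> value : leaderValues.entrySet()) {
            sink.literal(value.getKey(), value.getValue());
        }
        sink.endEntity();
        sink.endRecord();
    }
}
```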

A new parameter `Marc21Encoder#setGenerateRecordId` is introduced. It
controls whether the record id field in the MARC record is created from
the record id in the start-record event.
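A hypothetical sketch of what such a flag controls: when enabled, the identifier passed to `startRecord` is written as the record id field, tag `001` (the tag that `DirectoryEntry#isRecordIdField()` tests for in the diff). Storing fields as plain strings, the class name `RecordIdOption` and the `false` default are assumptions; the commit does not state the real default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the record-id option; names are illustrative only.
final class RecordIdOption {

    // Defaults to false here; the real default is not stated in this commit.
    private boolean generateRecordId;
    private final List<String> fields = new ArrayList<>();

    void setGenerateRecordId(final boolean generateRecordId) {
        this.generateRecordId = generateRecordId;
    }

    void startRecord(final String identifier) {
        fields.clear();
        if (generateRecordId) {
            // 001 is the record id field in ISO 2709 / MARC 21.
            fields.add("001 " + identifier);
        }
    }

    void dataField(final String tag, final String value) {
        fields.add(tag + " " + value);
    }

    List<String> fields() {
        return fields;
    }
}
```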

The `Marc21Decoder` is updated to reflect the changes in `Record`. It
can also emit the events describing the application-specific parts of
the leader that `Marc21Encoder` expects. The ability to emit the full
record leader has been removed (this was controllable with
`Marc21Decoder#splitLeader(boolean)`).

The reason for removing the support for emitting and receiving the full
record leader in `Marc21Decoder` and `Marc21Encoder` is that the record
label (which is the record leader in ISO 2709 lingo) is considered to
be an implementation detail of ISO 2709 record processing. Most of the
information in the label is not relevant for processing MARC 21 records.
cboehme committed Oct 10, 2016
1 parent 4f94377 commit 6d04d69
Showing 27 changed files with 2,107 additions and 1,668 deletions.
120 changes: 66 additions & 54 deletions src/main/java/org/culturegraph/mf/iso2709/DirectoryBuilder.java
@@ -1,5 +1,5 @@
/*
* Copyright 2014 Christoph Böhme
* Copyright 2016 Christoph Böhme
*
* Licensed under the Apache License, Version 2.0 the "License";
* you may not use this file except in compliance with the License.
@@ -15,8 +15,9 @@
*/
package org.culturegraph.mf.iso2709;

import static org.culturegraph.mf.iso2709.Util.calculateMaxValue;
import static org.culturegraph.mf.iso2709.Util.padWithZeros;
import static org.culturegraph.mf.iso2709.Iso2709Constants.FIELD_SEPARATOR;
import static org.culturegraph.mf.iso2709.Iso2709Constants.MAX_PAYLOAD_LENGTH;
import static org.culturegraph.mf.iso2709.Iso2709Constants.TAG_LENGTH;

import org.culturegraph.mf.exceptions.FormatException;

@@ -30,95 +31,106 @@
*/
final class DirectoryBuilder {

private final StringBuilder directory = new StringBuilder();
private final Iso646ByteBuffer buffer;

private final int fieldStartLength;
private final int fieldLengthLength;

private final int implDefinedPartLength;
private final int entryLength;
private final int maxFieldStart;
private final int maxFieldLength;

private String tag;
private String implDefinedPart;
private int fieldStart;
private int fieldEnd;

public DirectoryBuilder(final RecordFormat format) {
DirectoryBuilder(final RecordFormat format) {
buffer = new Iso646ByteBuffer(MAX_PAYLOAD_LENGTH);
fieldStartLength = format.getFieldStartLength();
fieldLengthLength = format.getFieldLengthLength();

implDefinedPartLength = format.getImplDefinedPartLength();
entryLength = TAG_LENGTH + fieldStartLength + fieldLengthLength +
implDefinedPartLength;
maxFieldStart = calculateMaxValue(fieldStartLength);
maxFieldLength = calculateMaxValue(fieldLengthLength);

reset();
}

public void setTag(final String tag) {
this.tag = tag;
}

public void setImplDefinedPart(final String implDefinedPart) {
this.implDefinedPart = implDefinedPart;
private int calculateMaxValue(final int digits) {
assert digits >= 0;
int maxValue = 1;
for (int i = 0; i < digits; i++) {
maxValue *= 10;
}
return maxValue - 1;
}

public void setFieldStart(final int fieldStart) {
void addEntries(final char[] tag, final char[] implDefinedPart,
final int fieldStart, final int fieldEnd) {
assert tag.length == TAG_LENGTH;
assert implDefinedPart.length == implDefinedPartLength;
assert fieldStart >= 0;
this.fieldStart = fieldStart;
assert fieldEnd >= fieldStart;
checkDirectoryCapacity(fieldStart, fieldEnd);
checkFieldFitsInAddressSpace(fieldStart, fieldEnd);
writeEntries(tag, implDefinedPart, fieldStart, fieldEnd);
}

public void setFieldEnd(final int fieldEnd) {
assert fieldEnd >= 0;
this.fieldEnd = fieldEnd;
private void checkDirectoryCapacity(final int fieldStart,
final int fieldEnd) {
final int fieldLength = fieldEnd - fieldStart;
final int numberOfEntries = fieldLength / maxFieldLength +
(fieldLength % maxFieldLength == 0 ? 0 : 1);
if (numberOfEntries * entryLength > buffer.getFreeSpace()) {
throw new FormatException(
"directory does not have enough free space for directory entry");
}
}

public void write() {
assert tag != null;
assert implDefinedPart != null;
assert fieldEnd >= fieldStart;

checkAllPartsStartInAddressRange();
private void checkFieldFitsInAddressSpace(final int fieldStart,
final int fieldEnd) {
final int fieldLength = fieldEnd - fieldStart;
final int lastPartLength = fieldLength % maxFieldLength;
final int lastPartStart = fieldEnd - lastPartLength;
if (lastPartStart > maxFieldStart) {
throw new FormatException("field is too long");
}
}

private void writeEntries(final char[] tag, final char[] implDefinedPart,
final int fieldStart, final int fieldEnd) {
int remainingLength = fieldEnd - fieldStart;
int partStart = fieldStart;
while (remainingLength > maxFieldLength) {
writeDirectoryEntry(partStart, 0);
writeEntry(tag, implDefinedPart, partStart, 0);
remainingLength -= maxFieldLength;
partStart += maxFieldLength;
}
writeDirectoryEntry(partStart, remainingLength);
writeEntry(tag, implDefinedPart, partStart, remainingLength);
}

private void checkAllPartsStartInAddressRange() {
final int fieldLength = fieldEnd - fieldStart;
final int lastPartLength = fieldLength % maxFieldLength;
final int lastPartStart = fieldEnd - lastPartLength;
if (lastPartStart > maxFieldStart) {
throw new FormatException("the field is too long");
}
private void writeEntry(final char[] tag, final char[] implDefinedPart,
final int partStart, final int partLength) {
buffer.writeChars(tag);
buffer.writeInt(partLength, fieldLengthLength);
buffer.writeInt(partStart, fieldStartLength);
buffer.writeChars(implDefinedPart);
}

private void writeDirectoryEntry(final int partStart, final int partLength) {
directory.append(tag);
directory.append(padWithZeros(partLength, fieldLengthLength));
directory.append(padWithZeros(partStart, fieldStartLength));
directory.append(implDefinedPart);
void reset() {
buffer.setWritePosition(0);
}

public void reset() {
directory.setLength(0);
tag = null;
implDefinedPart = null;
fieldStart = 0;
fieldEnd = 0;
int length() {
return buffer.getWritePosition() + Byte.BYTES;
}

public int length() {
return directory.length() + 1;
void copyToBuffer(final byte[] destBuffer, final int fromIndex) {
final int directoryLength = buffer.getWritePosition();
System.arraycopy(buffer.getByteArray(), 0, destBuffer, fromIndex,
directoryLength);
final int directoryEnd = fromIndex + directoryLength;
destBuffer[directoryEnd] = FIELD_SEPARATOR;
}

@Override
public String toString() {
return directory.toString() + Iso646Characters.IS2;
return buffer.stringAt(0, buffer.getWritePosition(), Iso646Constants.CHARSET);
}

}
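The splitting logic of the new `DirectoryBuilder` can be exercised in isolation. The sketch below mirrors `calculateMaxValue` and the loop in `writeEntries`: a field longer than the largest value that fits into the field-length digits is written as several entries, every entry except the last has length 0 (the continuation marker that `DirectoryEntry#isContinuedField()` checks for), and the last carries the remainder. Instead of writing to an `Iso646ByteBuffer`, it returns `{start, length}` pairs; the class name `EntrySplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone re-creation of the splitting loop; not the committed class.
final class EntrySplitter {

    // Largest number representable with the given number of decimal digits.
    static int calculateMaxValue(final int digits) {
        int maxValue = 1;
        for (int i = 0; i < digits; i++) {
            maxValue *= 10;
        }
        return maxValue - 1;
    }

    // Returns {partStart, partLength} pairs for one field.
    static List<int[]> split(final int fieldStart, final int fieldEnd,
            final int fieldLengthLength) {
        final int maxFieldLength = calculateMaxValue(fieldLengthLength);
        final List<int[]> entries = new ArrayList<>();
        int remainingLength = fieldEnd - fieldStart;
        int partStart = fieldStart;
        while (remainingLength > maxFieldLength) {
            entries.add(new int[] {partStart, 0}); // continued field
            remainingLength -= maxFieldLength;
            partStart += maxFieldLength;
        }
        entries.add(new int[] {partStart, remainingLength});
        return entries;
    }
}
```

With a one-digit field length (maximum 9), a 20-byte field starting at 0 becomes the entries `{0, 0}`, `{9, 0}` and `{18, 2}`.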
97 changes: 53 additions & 44 deletions src/main/java/org/culturegraph/mf/iso2709/DirectoryEntry.java
@@ -15,97 +15,106 @@
*/
package org.culturegraph.mf.iso2709;

import org.culturegraph.mf.exceptions.FormatException;
import static org.culturegraph.mf.iso2709.Iso2709Constants.MAX_BASE_ADDRESS;
import static org.culturegraph.mf.iso2709.Iso2709Constants.MIN_BASE_ADDRESS;
import static org.culturegraph.mf.iso2709.Iso2709Constants.RECORD_LABEL_LENGTH;
import static org.culturegraph.mf.iso2709.Iso2709Constants.TAG_LENGTH;

/**
* Provides access to a directory entry. A {@code DirectoryEntry} works like
* an iterator or cursor. Use {@link #gotoNext()} to advance to the next
* directory entry. Use {@link #rewind()} to go back to the first directory
* entry.
*
* @author Christoph Böhme
*/
class DirectoryEntry {

private final Iso646ByteBuffer buffer;

private final int directoryEnd;
private final int fieldLengthLength;
private final int fieldStartLength;
private final int implDefinedPartLength;
private final int baseAddress;
private final int entryLength;

private int currentPosition;

DirectoryEntry(final Iso646ByteBuffer buffer, final Label label) {
this.buffer = buffer;
this.fieldLengthLength = label.getFieldLengthLength();
this.fieldStartLength = label.getFieldStartLength();
this.implDefinedPartLength = label.getImplDefinedPartLength();
this.baseAddress = label.getBaseAddress();
this.entryLength = Iso2709Format.TAG_LENGTH + fieldLengthLength +
fieldStartLength + implDefinedPartLength;
verifyDirectoryLength();
reset();
}
DirectoryEntry(final Iso646ByteBuffer buffer, final RecordFormat recordFormat,
final int baseAddress) {
assert buffer != null;
assert baseAddress >= MIN_BASE_ADDRESS;
assert baseAddress <= MAX_BASE_ADDRESS;

private void verifyDirectoryLength() {
if (buffer.getLength() < Iso2709Format.MIN_RECORD_LENGTH) {
throw new FormatException("Record is too short");
}
if (buffer.charAt(baseAddress - 1) != Iso2709Format.FIELD_SEPARATOR) {
throw new FormatException("Expecting field separator at index " +
(baseAddress - 1));
}
final int dirLength = baseAddress - Iso2709Format.RECORD_LABEL_LENGTH - 1;
if (dirLength % entryLength != 0) {
throw new FormatException("Directory length must be a multiple of the " +
"directory entry length");
}
this.buffer = buffer;
directoryEnd = baseAddress - Byte.BYTES;
fieldLengthLength = recordFormat.getFieldLengthLength();
fieldStartLength = recordFormat.getFieldStartLength();
implDefinedPartLength = recordFormat.getImplDefinedPartLength();
entryLength = TAG_LENGTH + fieldLengthLength + fieldStartLength +
implDefinedPartLength;
rewind();
}

void reset() {
currentPosition = Iso2709Format.RECORD_LABEL_LENGTH;
void rewind() {
currentPosition = RECORD_LABEL_LENGTH;
}

void gotoNext() {
assert !endOfDirectoryReached();
assert currentPosition < directoryEnd;
currentPosition += entryLength;
}

boolean endOfDirectoryReached() {
return currentPosition >= baseAddress - 1;
return currentPosition >= directoryEnd;
}

char[] getTag() {
assert !endOfDirectoryReached();
return buffer.charsAt(currentPosition, Iso2709Format.TAG_LENGTH);
assert currentPosition < directoryEnd;
return buffer.charsAt(currentPosition, TAG_LENGTH);
}

int getFieldLength() {
assert !endOfDirectoryReached();
final int fieldLengthStart = currentPosition + Iso2709Format.TAG_LENGTH;
assert currentPosition < directoryEnd;
final int fieldLengthStart = currentPosition + TAG_LENGTH;
return buffer.parseIntAt(fieldLengthStart, fieldLengthLength);
}

int getFieldStart() {
assert !endOfDirectoryReached();
final int fieldStartStart = currentPosition + Iso2709Format.TAG_LENGTH +
assert currentPosition < directoryEnd;
final int fieldStartStart = currentPosition + TAG_LENGTH +
fieldLengthLength;
return buffer.parseIntAt(fieldStartStart, fieldStartLength);
}

char[] getImplDefinedPart() {
assert !endOfDirectoryReached();
final int implDefinedPartStart = currentPosition +
Iso2709Format.TAG_LENGTH + fieldLengthLength + fieldStartLength;
assert currentPosition < directoryEnd;
final int implDefinedPartStart = currentPosition + TAG_LENGTH +
fieldLengthLength + fieldStartLength;
return buffer.charsAt(implDefinedPartStart, implDefinedPartLength);
}

boolean isRecordIdField() {
final char[] tag = getTag();
return tag[0] == '0' && tag[1] == '0' && tag[2] == '1';
}

boolean isReferenceField() {
final char[] tag = getTag();
return tag[0] == '0' && tag[1] == '0';
}

boolean isContinuedField() {
return getFieldLength() == 0;
}

@Override
public String toString() {
if (endOfDirectoryReached()) {
return "END-OF_DIRECTORY";
return "@END-OF-DIRECTORY";
}
return String.valueOf(getTag()) +
String.valueOf(getFieldLength()) +
String.valueOf(getFieldStart()) +
String.valueOf(getImplDefinedPart());
return String.valueOf(getTag()) + String.valueOf(getFieldLength()) +
String.valueOf(getFieldStart()) + String.valueOf(getImplDefinedPart());
}

}
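To see how the cursor methods map onto the directory layout, here is a self-contained sketch that walks a directory held in a `String` rather than an `Iso646ByteBuffer`. It assumes the MARC 21 entry layout of a 3-character tag, a 4-digit field length and a 5-digit start position with no implementation-defined part; the class `DirectoryCursor` is hypothetical, though its method names follow `DirectoryEntry`.

```java
// Hypothetical string-based cursor modelled on DirectoryEntry.
final class DirectoryCursor {

    private static final int TAG_LENGTH = 3;

    private final String directory;
    private final int fieldLengthLength;
    private final int fieldStartLength;
    private final int entryLength;
    private int position;

    DirectoryCursor(final String directory, final int fieldLengthLength,
            final int fieldStartLength) {
        this.directory = directory;
        this.fieldLengthLength = fieldLengthLength;
        this.fieldStartLength = fieldStartLength;
        entryLength = TAG_LENGTH + fieldLengthLength + fieldStartLength;
        rewind();
    }

    void rewind() {
        position = 0;
    }

    void gotoNext() {
        position += entryLength;
    }

    boolean endOfDirectoryReached() {
        return position >= directory.length();
    }

    String getTag() {
        return directory.substring(position, position + TAG_LENGTH);
    }

    // Field length digits follow the tag.
    int getFieldLength() {
        final int from = position + TAG_LENGTH;
        return Integer.parseInt(directory.substring(from, from + fieldLengthLength));
    }

    // Start position digits follow the field length.
    int getFieldStart() {
        final int from = position + TAG_LENGTH + fieldLengthLength;
        return Integer.parseInt(directory.substring(from, from + fieldStartLength));
    }
}
```

In a real record the directory starts right after the label and ends before the field separator at `baseAddress - 1`, which is what the `RECORD_LABEL_LENGTH` start position and `directoryEnd = baseAddress - Byte.BYTES` encode in `DirectoryEntry`.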
3 changes: 3 additions & 0 deletions src/main/java/org/culturegraph/mf/iso2709/FieldHandler.java
@@ -16,6 +16,9 @@
package org.culturegraph.mf.iso2709;

/**
* Callback interface defining the events emitted by
* {@link Record#processFields(FieldHandler)}.
*
* @author Christoph Böhme
*/
public interface FieldHandler {