-
-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introducing repetition_delimiter
to EDI schema.
#215
Conversation
Issue: #212 `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example, if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element without `repetition_delimiter` present has essentially one piece of data; similarly, if `^` is the repetition delimiter for a segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and `12:B:2`, each of which is further delimited by a `component_delimiter` `:`. Note, since `repetition_delimiter` creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in `transform_declarations` is `array`. Currently we read in all the elements and their components in serial in `NonValidatingReader` into a slice: `[]RawSegElem`, each of which contains the element value, the element index, and component index if there are more than 1 component. When `repetition_delimiter` is added, we continue down the same pattern: `NonValidatingReader` still reads everything into the slice, except now, there potentially can be multiple `RawSegElem` share the same `ElemIndex` and `CompIndex`. Using the example above: `^` is the rep delim and seg is `CLM*A37YH556*500***11:B:1^12:B:2~`. After `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified): ``` { {'CLM', ElemIndex: 0, CompIndex: 1}, {'A37YH556', ElemIndex: 1, CompIndex: 1}, {'500', ElemIndex: 2, CompIndex: 1}, {'', ElemIndex: 3, CompIndex: 1}, {'', ElemIndex: 4, CompIndex: 1}, {'', ElemIndex: 4, CompIndex: 1}, {'11', ElemIndex: 5, CompIndex: 1}, {'B', ElemIndex: 5, CompIndex: 2}, {'1', ElemIndex: 5, CompIndex: 3}, {'12', ElemIndex: 5, CompIndex: 1}, {'B', ElemIndex: 5, CompIndex: 2}, {'2', ElemIndex: 5, CompIndex: 3}, } ``` Note the last 3 elements have the same `ElemIndex` and `CompIndex` as the previous 3 elements. This behavior is new and introduced in this PR. Now on the EDI reader side (reader.go), previously when we match element decl against the raw element slice, we only do one way scan, because `ElemIndex` and `CompIndex` are always increase, thus we never need to back-scan. With introduction of potentially duplicate `ElemIndex` and `CompIndex`, now for each of the element decl, we simply do a full `[]RawSegElem` scan. Yes, it is a bit more expensive but given usually the number of total elements and components in a seg is really really small (around 20), we feel this trade-off is acceptable without making the already-complex code even more so. With this reader change, the IDR produced will potentially contain child element nodes with the same element name. Thus in schema writing, it's practically required that the user of the `repetition_delimiter` feature needs to use `array` type in the `transform_declarations`.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #215 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 53 53
Lines 3027 3041 +14
=========================================
+ Hits 3027 3041 +14
☔ View full report in Codecov by Sentry. |
@@ -87,6 +87,7 @@ A full EDI schema `file_declaration` is as follows: | |||
"segment_delimiter": "<segment delimiter>", <== required | |||
"element_delimiter": "<element delimiter>", <== required | |||
"component_delimiter": "<component delimiter>", <== optional | |||
"repetition_delimiter": "<repetition delimiter>", <== optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[OUT OF SCOPE]
These 4 delimiters may be gleaned from the ISA
header as the whole segment is fixed length and the separator characters are in deterministic positions:
ISA*00* *00* *ZZ*CMSFFM *ZZ*987654321 *230614*1605*^*00501*000001003*1*T*:~
segment_delimiter - index position 105
element_delimiter - index position 4
component_delimiter. - index position 104
repetition_delimiter - index position 82
This is clearly beyond the scope of this change, and the proposed configuration option matches the current paradigm as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. Yes we were aware of the fixed positions of these separators/delimiters in the header segment. However, our design goal / philosophy, as you might have figured out by now, was to make this library as flexible as possible and not bound to a given EDI standard. Different standards (X12/EDIFact/etc) have different headers, different delimiters and their positions. We felt the current approach allows flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great to know. Thanks for sharing your insight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually tested various permutations of an element with the repetition operator, and both outputs and errors were as expected.
Issue: #212
repetition_delimiter
: delimiter to separate multiple data instances for an element. For example, if^
is the repetition delimiter for a segmentDMG*D8*19690815*M**A^B^C^D~
, then the last element has 4 pieces of data:A
,B
,C
, andD
. Any element withoutrepetition_delimiter
present has essentially one piece of data; similarly, if^
is the repetition delimiter for a segmentCLM*A37YH556*500***11:B:1^12:B:2~
, the last element has 2 pieces of data:11:B:1
and12:B:2
, each of which is further delimited by acomponent_delimiter
:
. Note, sincerepetition_delimiter
creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type intransform_declarations
isarray
.Currently we read in all the elements and their components in serial in
NonValidatingReader
into a slice:[]RawSegElem
, each of which contains the element value, the element index, and component index if there are more than 1 component. Whenrepetition_delimiter
is added, we continue down the same pattern:NonValidatingReader
still reads everything into the slice, except now, there potentially can be multipleRawSegElem
share the sameElemIndex
andCompIndex
.Using the example above:
^
is the rep delim and seg isCLM*A37YH556*500***11:B:1^12:B:2~
. AfterNonValidatingReader.Read()
is done, we'll have the following[]RawSegElem
(simplified):Note the last 3 elements have the same
ElemIndex
andCompIndex
as the previous 3 elements. This behavior is new and introduced in this PR.Now on the EDI reader side (reader.go), previously when we match element decl against the raw element slice, we only do one way scan, because
ElemIndex
andCompIndex
are always increase, thus we never need to back-scan. With introduction of potentially duplicateElemIndex
andCompIndex
, now for each of the element decl, we simply do a full[]RawSegElem
scan. Yes, it is a bit more expensive but given usually the number of total elements and components in a seg is really really small (around 20), we feel this trade-off is acceptable without making the already-complex code even more so.With this reader change, the IDR produced will potentially contain child element nodes with the same element name. Thus in schema writing, it's practically required that the user of the
repetition_delimiter
feature needs to usearray
type in thetransform_declarations
.