Introducing `repetition_delimiter` to EDI schema. #215

jf-tech · 2023-07-25T08:35:23Z

Issue: #212

repetition_delimiter: delimiter to separate multiple data instances for an element. For example, if ^ is the repetition delimiter for a segment DMG*D8*19690815*M**A^B^C^D~, then the last element has 4 pieces of data: A, B, C, and D. Any element without repetition_delimiter present has essentially one piece of data; similarly, if ^ is the repetition delimiter for a segment CLM*A37YH556*500***11:B:1^12:B:2~, the last element has 2 pieces of data: 11:B:1 and 12:B:2, each of which is further delimited by a component_delimiter :. Note, since repetition_delimiter creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in transform_declarations is array.

Currently we read in all the elements and their components in serial in NonValidatingReader into a slice: []RawSegElem, each of which contains the element value, the element index, and component index if there are more than 1 component. When repetition_delimiter is added, we continue down the same pattern: NonValidatingReader still reads everything into the slice, except now, there potentially can be multiple RawSegElem share the same ElemIndex and CompIndex.

Using the example above: ^ is the rep delim and seg is CLM*A37YH556*500***11:B:1^12:B:2~. After NonValidatingReader.Read() is done, we'll have the following []RawSegElem (simplified):

{
   {'CLM', ElemIndex: 0, CompIndex: 1},
   {'A37YH556', ElemIndex: 1, CompIndex: 1},
   {'500', ElemIndex: 2, CompIndex: 1},
   {'', ElemIndex: 3, CompIndex: 1},
   {'', ElemIndex: 4, CompIndex: 1},
   {'', ElemIndex: 4, CompIndex: 1},
   {'11', ElemIndex: 5, CompIndex: 1},
   {'B', ElemIndex: 5, CompIndex: 2},
   {'1', ElemIndex: 5, CompIndex: 3},
   {'12', ElemIndex: 5, CompIndex: 1},
   {'B', ElemIndex: 5, CompIndex: 2},
   {'2', ElemIndex: 5, CompIndex: 3},
}

Note the last 3 elements have the same ElemIndex and CompIndex as the previous 3 elements. This behavior is new and introduced in this PR.

Now on the EDI reader side (reader.go), previously when we match element decl against the raw element slice, we only do one way scan, because ElemIndex and CompIndex are always increase, thus we never need to back-scan. With introduction of potentially duplicate ElemIndex and CompIndex, now for each of the element decl, we simply do a full []RawSegElem scan. Yes, it is a bit more expensive but given usually the number of total elements and components in a seg is really really small (around 20), we feel this trade-off is acceptable without making the already-complex code even more so.

With this reader change, the IDR produced will potentially contain child element nodes with the same element name. Thus in schema writing, it's practically required that the user of the repetition_delimiter feature needs to use array type in the transform_declarations.

Issue: #212 `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example, if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element without `repetition_delimiter` present has essentially one piece of data; similarly, if `^` is the repetition delimiter for a segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and `12:B:2`, each of which is further delimited by a `component_delimiter` `:`. Note, since `repetition_delimiter` creates multiple pieces of data under the same element name in the schema, in most cases the suitable construct type in `transform_declarations` is `array`. Currently we read in all the elements and their components in serial in `NonValidatingReader` into a slice: `[]RawSegElem`, each of which contains the element value, the element index, and component index if there are more than 1 component. When `repetition_delimiter` is added, we continue down the same pattern: `NonValidatingReader` still reads everything into the slice, except now, there potentially can be multiple `RawSegElem` share the same `ElemIndex` and `CompIndex`. Using the example above: `^` is the rep delim and seg is `CLM*A37YH556*500***11:B:1^12:B:2~`. After `NonValidatingReader.Read()` is done, we'll have the following `[]RawSegElem` (simplified): ``` { {'CLM', ElemIndex: 0, CompIndex: 1}, {'A37YH556', ElemIndex: 1, CompIndex: 1}, {'500', ElemIndex: 2, CompIndex: 1}, {'', ElemIndex: 3, CompIndex: 1}, {'', ElemIndex: 4, CompIndex: 1}, {'', ElemIndex: 4, CompIndex: 1}, {'11', ElemIndex: 5, CompIndex: 1}, {'B', ElemIndex: 5, CompIndex: 2}, {'1', ElemIndex: 5, CompIndex: 3}, {'12', ElemIndex: 5, CompIndex: 1}, {'B', ElemIndex: 5, CompIndex: 2}, {'2', ElemIndex: 5, CompIndex: 3}, } ``` Note the last 3 elements have the same `ElemIndex` and `CompIndex` as the previous 3 elements. This behavior is new and introduced in this PR. Now on the EDI reader side (reader.go), previously when we match element decl against the raw element slice, we only do one way scan, because `ElemIndex` and `CompIndex` are always increase, thus we never need to back-scan. With introduction of potentially duplicate `ElemIndex` and `CompIndex`, now for each of the element decl, we simply do a full `[]RawSegElem` scan. Yes, it is a bit more expensive but given usually the number of total elements and components in a seg is really really small (around 20), we feel this trade-off is acceptable without making the already-complex code even more so. With this reader change, the IDR produced will potentially contain child element nodes with the same element name. Thus in schema writing, it's practically required that the user of the `repetition_delimiter` feature needs to use `array` type in the `transform_declarations`.

codecov · 2023-07-25T08:37:32Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (9e0c8da) 100.00% compared to head (80b2ff2) 100.00%.

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #215   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           53        53           
  Lines         3027      3041   +14     
=========================================
+ Hits          3027      3041   +14

Files Changed	Coverage Δ
idr/marshal2.go	`100.00% <ø> (ø)`
extensions/omniv21/fileformat/edi/reader.go	`100.00% <100.00%> (ø)`
extensions/omniv21/fileformat/edi/reader2.go	`100.00% <100.00%> (ø)`
idr/marshal1.go	`100.00% <100.00%> (ø)`

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

manuel-neuhauser-hs · 2023-07-25T16:39:30Z

doc/edi_in_depth.md

@@ -87,6 +87,7 @@ A full EDI schema `file_declaration` is as follows:
    "segment_delimiter": "<segment delimiter>",                     <== required
    "element_delimiter": "<element delimiter>",                     <== required
    "component_delimiter": "<component delimiter>",                 <== optional
+    "repetition_delimiter": "<repetition delimiter>",               <== optional


[OUT OF SCOPE]

These 4 delimiters may be gleaned from the ISA header as the whole segment is fixed length and the separator characters are in deterministic positions:

ISA*00* *00* *ZZ*CMSFFM *ZZ*987654321 *230614*1605*^*00501*000001003*1*T*:~

segment_delimiter - index position 105 element_delimiter - index position 4 component_delimiter. - index position 104 repetition_delimiter - index position 82

This is clearly beyond the scope of this change, and the proposed configuration option matches the current paradigm as expected.

Thanks for the feedback. Yes we were aware of the fixed positions of these separators/delimiters in the header segment. However, our design goal / philosophy, as you might have figured out by now, was to make this library as flexible as possible and not bound to a given EDI standard. Different standards (X12/EDIFact/etc) have different headers, different delimiters and their positions. We felt the current approach allows flexibility.

That's great to know. Thanks for sharing your insight.

manuel-neuhauser-hs

Manually tested various permutations of an element with the repetition operator, and both outputs and errors were as expected.

jf-tech self-assigned this Jul 25, 2023

jf-tech linked an issue Jul 25, 2023 that may be closed by this pull request

[EDI] Support repetition separator #212

Closed

jf-tech mentioned this pull request Jul 25, 2023

[EDI] Support repetition separator #212

Closed

manuel-neuhauser-hs reviewed Jul 25, 2023

View reviewed changes

manuel-neuhauser-hs approved these changes Jul 25, 2023

View reviewed changes

jf-tech merged commit 79a540b into master Jul 25, 2023

jf-tech deleted the rep2 branch July 25, 2023 19:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Introducing `repetition_delimiter` to EDI schema. #215

Introducing `repetition_delimiter` to EDI schema. #215

Uh oh!

jf-tech commented Jul 25, 2023

Uh oh!

codecov bot commented Jul 25, 2023

Uh oh!

manuel-neuhauser-hs Jul 25, 2023

Uh oh!

jf-tech Jul 25, 2023

Uh oh!

manuel-neuhauser-hs Jul 25, 2023

Uh oh!

manuel-neuhauser-hs left a comment

Uh oh!

Uh oh!

Uh oh!

Introducing repetition_delimiter to EDI schema. #215

Introducing repetition_delimiter to EDI schema. #215

Uh oh!

Conversation

jf-tech commented Jul 25, 2023

Uh oh!

codecov bot commented Jul 25, 2023

Codecov Report

Uh oh!

manuel-neuhauser-hs Jul 25, 2023

Choose a reason for hiding this comment

Uh oh!

jf-tech Jul 25, 2023

Choose a reason for hiding this comment

Uh oh!

manuel-neuhauser-hs Jul 25, 2023

Choose a reason for hiding this comment

Uh oh!

manuel-neuhauser-hs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Introducing `repetition_delimiter` to EDI schema. #215

Introducing `repetition_delimiter` to EDI schema. #215