
Introducing repetition_delimiter to EDI schema. #215


Merged 1 commit on Jul 25, 2023
12 changes: 11 additions & 1 deletion doc/edi_in_depth.md
@@ -87,6 +87,7 @@ A full EDI schema `file_declaration` is as follows:
"segment_delimiter": "<segment delimiter>", <== required
"element_delimiter": "<element delimiter>", <== required
"component_delimiter": "<component delimiter>", <== optional
"repetition_delimiter": "<repetition delimiter>", <== optional

[OUT OF SCOPE]

These 4 delimiters may be gleaned from the ISA header as the whole segment is fixed length and the separator characters are in deterministic positions:

ISA*00*          *00*          *ZZ*CMSFFM         *ZZ*987654321      *230614*1605*^*00501*000001003*1*T*:~
segment_delimiter      - index position 105
element_delimiter      - index position 3
component_delimiter    - index position 104
repetition_delimiter   - index position 82

This is clearly beyond the scope of this change, and the proposed configuration option matches the current paradigm as expected.
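
As an illustration of the reviewer's point only (this is not part of the change and not omniparser's API), here is a minimal Python sketch that reads the four delimiters from a fixed-length X12 ISA segment. The function names `build_sample_isa` and `sniff_x12_delimiters` are hypothetical, and the offsets are 0-based: 3, 82, 104, and 105 in a 106-character segment. The sample segment is rebuilt from its padded fields so the fixed widths are explicit.

```python
def build_sample_isa() -> str:
    """Rebuild the 106-character ISA segment quoted above from its fixed-width fields."""
    fields = [
        "ISA",
        "00", "".ljust(10),          # ISA01, ISA02 (10 chars, blank)
        "00", "".ljust(10),          # ISA03, ISA04 (10 chars, blank)
        "ZZ", "CMSFFM".ljust(15),    # ISA05, ISA06 (15 chars, space padded)
        "ZZ", "987654321".ljust(15), # ISA07, ISA08 (15 chars, space padded)
        "230614", "1605",            # ISA09 date, ISA10 time
        "^",                         # ISA11: repetition separator (version 00501)
        "00501", "000001003", "1", "T",
        ":",                         # ISA16: component separator
    ]
    return "*".join(fields) + "~"


def sniff_x12_delimiters(isa: str) -> dict:
    """Read the delimiters from their fixed 0-based offsets in an ISA segment.

    Assumption: the input starts with a complete ISA segment and ISA11 carries
    the repetition separator, as it does for version 00501 in this example.
    """
    if len(isa) < 106 or not isa.startswith("ISA"):
        raise ValueError("not a complete X12 ISA segment")
    return {
        "element_delimiter": isa[3],
        "repetition_delimiter": isa[82],
        "component_delimiter": isa[104],
        "segment_delimiter": isa[105],
    }


if __name__ == "__main__":
    print(sniff_x12_delimiters(build_sample_isa()))
```

A sketch like this is tied to X12; other standards place their separators differently, so it does not generalize.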

Owner Author

Thanks for the feedback. Yes, we were aware of the fixed positions of these separators/delimiters in the header segment. However, our design goal and philosophy, as you might have figured out by now, is to make this library as flexible as possible and not bound to any one EDI standard. Different standards (X12, EDIFACT, etc.) have different headers and different delimiters at different positions. We felt the current approach allows that flexibility.

That's great to know. Thanks for sharing your insight.

"release_character": "<release character>", <== optional
"ignore_crlf": true/false, <== optional
"segment_declarations": [
@@ -126,6 +127,15 @@ standards call for a single ASCII character as `element_delimiter`, omniparser a
`component_delimiter` in omniparser allows a UTF-8 string. This is optional; if it is not specified,
you can treat each element as having a single component.

- `repetition_delimiter`: delimiter to separate multiple data instances for an element. For example,
if `^` is the repetition delimiter for a segment `DMG*D8*19690815*M**A^B^C^D~`, then the last
element has 4 pieces of data: `A`, `B`, `C`, and `D`. Any element in which `repetition_delimiter`
does not appear has exactly one piece of data. Similarly, if `^` is the repetition delimiter for a
segment `CLM*A37YH556*500***11:B:1^12:B:2~`, the last element has 2 pieces of data: `11:B:1` and
`12:B:2`, each of which is further split by the `component_delimiter` `:`. Note that since
`repetition_delimiter` creates multiple pieces of data under the same element name in the schema,
in most cases the suitable construct type in `transform_declarations` is `array`.

- `release_character`: an optional escape character for delimiters. Imagine a piece of element data
contains a `*`, which happens to be the `element_delimiter`. Without escaping, the parser would treat
that `*` as a real delimiter. Any character preceded by `release_character` is treated literally.
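
To make the splitting order described in these bullets concrete, here is a minimal Python sketch, not omniparser's implementation. The helper names `split_unescaped`, `unescape`, and `parse_segment` are hypothetical. It assumes single-character delimiters and a segment whose trailing `segment_delimiter` has already been removed, whereas omniparser itself allows UTF-8 strings as delimiters. Each element is split into repetitions first, then each repetition into components, and a character preceded by the release character is kept literally.

```python
from typing import List, Tuple


def split_unescaped(text: str, delimiter: str, release: str = "") -> List[str]:
    """Split on `delimiter`, ignoring any character that follows `release`.

    The release character stays in the pieces so that later, finer-grained
    splits can still honor it; `unescape` removes it at the end.
    """
    pieces, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if release and ch == release and i + 1 < len(text):
            current.append(ch)
            current.append(text[i + 1])
            i += 2
            continue
        if ch == delimiter:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    pieces.append("".join(current))
    return pieces


def unescape(text: str, release: str = "") -> str:
    """Drop each release character and keep the character after it as-is."""
    if not release:
        return text
    out, i = [], 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_segment(segment: str, element: str = "*", repetition: str = "^",
                  component: str = ":", release: str = "") -> Tuple[str, list]:
    """Return (segment name, elements).

    Each element is a list of repetitions; each repetition is a list of
    component strings.
    """
    name, *elements = split_unescaped(segment, element, release)
    parsed = []
    for elem in elements:
        repetitions = split_unescaped(elem, repetition, release)
        parsed.append([
            [unescape(c, release) for c in split_unescaped(rep, component, release)]
            for rep in repetitions
        ])
    return name, parsed


if __name__ == "__main__":
    print(parse_segment("DMG*D8*19690815*M**A^B^C^D"))
    print(parse_segment("CLM*A37YH556*500***11:B:1^12:B:2"))
```

With the `CLM` example, the last element comes back as two repetitions of three components each, which is the shape that maps naturally onto an `array` in the output.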
@@ -550,7 +560,7 @@ And we can add the transform reference into the `FINAL_OUTPUT` directly:
```
Running the cli, we have:
```
-$ cli.sh transform -i 2_ups_edi_210.input.txt -s test.schema.json
+$ cli.sh transform -i 2_ups_edi_210.input.txt -s test.schema.json
[
{
"invoice_number": "0000001808WW308"
@@ -0,0 +1,155 @@
{
"Records": [
{
"Children": [
{
"Children": [
{
"Children": null,
"Data": "D8",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d8)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d8",
"FirstChild": "(TextNode 'D8')",
"FormatSpecific": null,
"LastChild": "(TextNode 'D8')",
"NextSibling": "(ElementNode d_date)",
"Parent": "(ElementNode DMG)",
"PrevSibling": null,
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "19910512",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d_date)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d_date",
"FirstChild": "(TextNode '19910512')",
"FormatSpecific": null,
"LastChild": "(TextNode '19910512')",
"NextSibling": "(ElementNode d_cat)",
"Parent": "(ElementNode DMG)",
"PrevSibling": "(ElementNode d8)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "RET",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d_cat)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d_cat",
"FirstChild": "(TextNode 'RET')",
"FormatSpecific": null,
"LastChild": "(TextNode 'RET')",
"NextSibling": "(ElementNode d_cat)",
"Parent": "(ElementNode DMG)",
"PrevSibling": "(ElementNode d_date)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "RET",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d_cat)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d_cat",
"FirstChild": "(TextNode 'RET')",
"FormatSpecific": null,
"LastChild": "(TextNode 'RET')",
"NextSibling": "(ElementNode d_code)",
"Parent": "(ElementNode DMG)",
"PrevSibling": "(ElementNode d_cat)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "2135-2",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d_code)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d_code",
"FirstChild": "(TextNode '2135-2')",
"FormatSpecific": null,
"LastChild": "(TextNode '2135-2')",
"NextSibling": "(ElementNode d_code)",
"Parent": "(ElementNode DMG)",
"PrevSibling": "(ElementNode d_cat)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "2106-3",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode d_code)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "d_code",
"FirstChild": "(TextNode '2106-3')",
"FormatSpecific": null,
"LastChild": "(TextNode '2106-3')",
"NextSibling": null,
"Parent": "(ElementNode DMG)",
"PrevSibling": "(ElementNode d_code)",
"Type": "ElementNode"
}
],
"Data": "DMG",
"FirstChild": "(ElementNode d8)",
"FormatSpecific": null,
"LastChild": "(ElementNode d_code)",
"NextSibling": null,
"Parent": "(DocumentNode)",
"PrevSibling": null,
"Type": "ElementNode"
}
],
"FinalErr": "EOF"
}
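
In the snapshot above, the repeated values appear as sibling element nodes that share a name (`d_cat` twice, `d_code` twice). The following Python sketch shows how such siblings could be gathered into per-name lists, which is why `array` tends to be the natural output construct. The helper `collect_values` is hypothetical, and the `record` literal is a trimmed copy of the snapshot that keeps only the `Data` and `Children` fields.

```python
from collections import defaultdict

# Trimmed copy of the DMG record from the snapshot: only "Data" and "Children".
record = {
    "Data": "DMG",
    "Children": [
        {"Data": "d8", "Children": [{"Data": "D8", "Children": None}]},
        {"Data": "d_date", "Children": [{"Data": "19910512", "Children": None}]},
        {"Data": "d_cat", "Children": [{"Data": "RET", "Children": None}]},
        {"Data": "d_cat", "Children": [{"Data": "RET", "Children": None}]},
        {"Data": "d_code", "Children": [{"Data": "2135-2", "Children": None}]},
        {"Data": "d_code", "Children": [{"Data": "2106-3", "Children": None}]},
    ],
}


def collect_values(segment_node: dict) -> dict:
    """Group the text values of a segment's element nodes by element name.

    Same-named sibling elements end up in one list, in document order.
    """
    grouped = defaultdict(list)
    for element in segment_node["Children"] or []:
        for text in element["Children"] or []:
            grouped[element["Data"]].append(text["Data"])
    return dict(grouped)


if __name__ == "__main__":
    print(collect_values(record))
```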
@@ -1,7 +1,167 @@
{
"Records": [
-"{'e1':'0','e2':'1','e3':'2'}",
-"{'e1':'3','e2':'4','e3':'5'}"
{
"Children": [
{
"Children": [
{
"Children": null,
"Data": "0",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e1)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e1",
"FirstChild": "(TextNode '0')",
"FormatSpecific": null,
"LastChild": "(TextNode '0')",
"NextSibling": "(ElementNode e2)",
"Parent": "(ElementNode ISA)",
"PrevSibling": null,
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "1",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e2)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e2",
"FirstChild": "(TextNode '1')",
"FormatSpecific": null,
"LastChild": "(TextNode '1')",
"NextSibling": "(ElementNode e3)",
"Parent": "(ElementNode ISA)",
"PrevSibling": "(ElementNode e1)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "2",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e3)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e3",
"FirstChild": "(TextNode '2')",
"FormatSpecific": null,
"LastChild": "(TextNode '2')",
"NextSibling": null,
"Parent": "(ElementNode ISA)",
"PrevSibling": "(ElementNode e2)",
"Type": "ElementNode"
}
],
"Data": "ISA",
"FirstChild": "(ElementNode e1)",
"FormatSpecific": null,
"LastChild": "(ElementNode e3)",
"NextSibling": null,
"Parent": "(DocumentNode)",
"PrevSibling": null,
"Type": "ElementNode"
},
{
"Children": [
{
"Children": [
{
"Children": null,
"Data": "3",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e1)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e1",
"FirstChild": "(TextNode '3')",
"FormatSpecific": null,
"LastChild": "(TextNode '3')",
"NextSibling": "(ElementNode e2)",
"Parent": "(ElementNode ISA)",
"PrevSibling": null,
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "4",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e2)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e2",
"FirstChild": "(TextNode '4')",
"FormatSpecific": null,
"LastChild": "(TextNode '4')",
"NextSibling": "(ElementNode e3)",
"Parent": "(ElementNode ISA)",
"PrevSibling": "(ElementNode e1)",
"Type": "ElementNode"
},
{
"Children": [
{
"Children": null,
"Data": "5",
"FirstChild": null,
"FormatSpecific": null,
"LastChild": null,
"NextSibling": null,
"Parent": "(ElementNode e3)",
"PrevSibling": null,
"Type": "TextNode"
}
],
"Data": "e3",
"FirstChild": "(TextNode '5')",
"FormatSpecific": null,
"LastChild": "(TextNode '5')",
"NextSibling": null,
"Parent": "(ElementNode ISA)",
"PrevSibling": "(ElementNode e2)",
"Type": "ElementNode"
}
],
"Data": "ISA",
"FirstChild": "(ElementNode e1)",
"FormatSpecific": null,
"LastChild": "(ElementNode e3)",
"NextSibling": null,
"Parent": "(DocumentNode)",
"PrevSibling": null,
"Type": "ElementNode"
}
],
"FinalErr": "EOF"
}