
Pipe separators are interfering with Markdown-based feedback loops #209

Open
mjy opened this issue May 10, 2024 · 5 comments

Comments

@mjy

mjy commented May 10, 2024

An observation.

We're starting to work with aggregated reports on data submitted to GBIF.

  • TaxonWorks uses pipes (|) to delimit multiple values, as shown in many of the examples in the term standard.
  • Reports are coming to us that also use pipes (e.g., copy-pastes of SQL dumps).
  • GitHub supports tables in Markdown, which also use pipes.

If we want to clean up report formatting so that feedback round-trips better, then Markdown might be useful as an intermediate format for exchanging issues. However, when we include data values in those reports, and those values contain pipes, we get rendering issues. Obviously we can escape pipes, but this requires another layer of handling.
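For reference, that extra layer of escaping could look roughly like the minimal sketch below, assuming GitHub Flavored Markdown tables, where an unescaped `|` splits a cell and `\|` renders as a literal pipe. The function names are hypothetical, and other Markdown characters are deliberately left alone.

```python
def escape_cell(value: str) -> str:
    """Make a value safe for a single GFM table cell (sketch)."""
    # An unescaped "|" would be read as a column separator, so escape it.
    value = value.replace("|", "\\|")
    # A table row must fit on one line, so collapse line breaks to spaces.
    return value.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def markdown_table(header, rows) -> str:
    """Build a GFM table, escaping every header and data cell."""
    lines = [
        "| " + " | ".join(escape_cell(h) for h in header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(escape_cell(v) for v in row) + " |")
    return "\n".join(lines)


print(markdown_table(["term", "value"], [["recordedBy", "A. Smith | B. Jones"]]))
# | term | value |
# | --- | --- |
# | recordedBy | A. Smith \| B. Jones |
```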

I'm wondering about two things:

  1. Should we move away from suggesting pipes as delimiters?
  2. Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term? Surely this character-based standard would greatly increase data interoperability.
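To make option 2 concrete, here is a minimal sketch of what a single mandated delimiter would give consumers: one splitting rule that works everywhere, at the cost of producers having to reject (or escape) values that already contain it. The delimiter choice, the `" | "` spacing on output and the function names are all hypothetical.

```python
DELIMITER = "|"  # hypothetical mandated delimiter


def split_values(field: str, delimiter: str = DELIMITER) -> list[str]:
    """Split a multi-valued field, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in field.split(delimiter) if part.strip()]


def join_values(values, delimiter: str = DELIMITER) -> str:
    """Join values with the delimiter, refusing values that would be ambiguous."""
    for v in values:
        if delimiter in v:
            raise ValueError(f"value {v!r} contains the delimiter {delimiter!r}")
    return f" {delimiter} ".join(values)


print(split_values("A. Smith | B. Jones"))      # ['A. Smith', 'B. Jones']
print(join_values(["A. Smith", "B. Jones"]))     # A. Smith | B. Jones
```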
@mjy mjy changed the title Pipe separators are interfering with Markdown based feedback loops Pipe separators are interfering with Markdown-based feedback loops May 10, 2024
@ben-norton
Member

@mjy
I cross-posted this issue in the TAG repo for the next meeting.
tdwg/tag#47

@cboelling
Member

> 2. Why doesn't TDWG simply require a specific (non-pipe) delimiter when defining multiple values per term? Surely this character-based standard would _greatly_ increase data interoperability.

I'm struggling with pipe characters too. Option (2) would be my preferred solution.

@ben-norton
Member

@mjy @tucotuco @timrobertson100
Tim or John, please correct me if I'm wrong. It is my understanding that Option 2 was the original directive. Many delimiters can be exceedingly problematic, commas especially. If you compare all of the common candidate delimiters, pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.
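One way to check the "least commonly used" argument against a particular dataset is to count how many existing values already contain each candidate delimiter. A minimal sketch; `delimiter_collisions`, the candidate list and the sample values are hypothetical, and the toy output says nothing about real GBIF data.

```python
# Candidate delimiters to compare; extend as needed.
CANDIDATES = [",", ";", "|"]


def delimiter_collisions(values, candidates=CANDIDATES):
    """For each candidate, count how many values already contain it.

    A delimiter that already occurs inside values cannot be split on
    safely, so lower counts are better.
    """
    return {d: sum(d in v for v in values) for d in candidates}


# Hypothetical sample values, for illustration only.
SAMPLE_VALUES = ["Smith, J.", "Quercus alba L.", "Jones; Brown", "pine forest, north slope"]

counts = delimiter_collisions(SAMPLE_VALUES)
print(counts)                        # {',': 2, ';': 1, '|': 0}
print(min(counts, key=counts.get))   # |
```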

@tucotuco
Copy link
Member

> pipes are arguably the least commonly used characters in string values. Hence, the current suggestion.

That is exactly right. A change in that recommendation would have immense repercussions that I would be loath to face without a proven better alternative.

@MattBlissett
Copy link
Member

I think Markdown is an inappropriate format for sharing data, so I suggest escaping the characters or using HTML (<td>value | value</td>), which is also valid Markdown, though you'll then need to escape < and &.
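A minimal sketch of the HTML route, assuming the table is emitted as a raw HTML block inside the Markdown: Python's standard `html.escape` with `quote=False` escapes `&`, `<` and `>` and leaves `|` untouched. The helper names are hypothetical, and values containing blank lines would still need extra handling because a blank line ends an HTML block in Markdown.

```python
import html


def html_cell(tag: str, value: str) -> str:
    # quote=False: in element content only &, < and > need escaping;
    # "|" passes through unchanged.
    return f"<{tag}>{html.escape(value, quote=False)}</{tag}>"


def html_table(header, rows) -> str:
    """Build a plain HTML table from a header list and rows of strings."""
    lines = ["<table>", "<tr>" + "".join(html_cell("th", h) for h in header) + "</tr>"]
    for row in rows:
        lines.append("<tr>" + "".join(html_cell("td", v) for v in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(html_table(["term", "value"], [["recordedBy", "A. Smith | B. Jones & Co. <1990>"]]))
# <table>
# <tr><th>term</th><th>value</th></tr>
# <tr><td>recordedBy</td><td>A. Smith | B. Jones &amp; Co. &lt;1990&gt;</td></tr>
# </table>
```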


5 participants