Pipe (bar) character (“|”) in MISC values #569
Comments
I believe the guidelines are unambiguous -- all bar characters are split points. In UDPipe, we perform an internal escape mechanism to allow bars as values when needed (if you want details, we use c-string-like formatting, where |
Then I'm afraid some kind of escaping is necessary. In case of |
I guess I’ll omit the translit for “|” for the time being, leaving it as a surprise for downstream applications. |
for token 1mnf, see UniversalDependencies/docs#569
Another option is to allow XML-style Unicode escapes, i.e. |
The problem isn’t solved. No intention to introduce escaping? |
@msklvsk : Do you mean to standardize escaping in the specification of the CoNLL-U format? |
Yes, I mean to change this part:
|
I think that the intended meaning of "without special escaping" was not to ban escaping of the vertical bar inside individual items of the list. It was meant to say that the escaping must not involve the "|" character itself (e.g. with backslash, "\|") because then a simple |
Understood. Then we should agree on an escaping convention that doesn’t involve the pipe, like what foxik or amir-zeldes suggested. I like |
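To make the point about pipe-free escaping concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the escape sequence `&#x7C;` is one possible XML-style form chosen for the example, not a convention agreed in this thread or defined by the CoNLL-U specification, and the helper names (`escape_value`, `unescape_value`, `format_misc`, `parse_misc`) are hypothetical. Because the escaped form never contains "|", a consumer can still split the MISC field on "|" first and unescape afterwards.

```python
# Hypothetical escaping convention for MISC values (an assumption for this
# sketch, not part of the CoNLL-U specification).
PIPE_ESCAPE = "&#x7C;"  # stands in for a literal "|" inside a value
AMP_ESCAPE = "&amp;"    # stands in for a literal "&", so escapes stay unambiguous


def escape_value(value: str) -> str:
    """Escape "&" first, then "|", so the result contains no bar character."""
    return value.replace("&", AMP_ESCAPE).replace("|", PIPE_ESCAPE)


def unescape_value(value: str) -> str:
    """Reverse escape_value: restore "|" first, then "&"."""
    return value.replace(PIPE_ESCAPE, "|").replace(AMP_ESCAPE, "&")


def format_misc(attrs: dict) -> str:
    """Serialize attributes as a "|"-separated list; "_" means empty."""
    if not attrs:
        return "_"
    return "|".join(f"{key}={escape_value(val)}" for key, val in attrs.items())


def parse_misc(field: str) -> dict:
    """Split on "|" as usual, then unescape each value."""
    if field == "_":
        return {}
    attrs = {}
    for item in field.split("|"):
        key, _, value = item.partition("=")
        attrs[key] = unescape_value(value)
    return attrs


# Without escaping, the example from the issue cannot be split correctly:
naive = "LTranslit=||Translit=|".split("|")
# naive == ["LTranslit=", "", "Translit=", ""]

# With the hypothetical escape, a plain split on "|" still works:
field = format_misc({"LTranslit": "|", "Translit": "|"})
# field == "LTranslit=&#x7C;|Translit=&#x7C;"
restored = parse_misc(field)
# restored == {"LTranslit": "|", "Translit": "|"}
```

A backslash form such as "\|" would not have this property, since the escaped text would still contain the bar and a plain split would cut the value in two.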
The guidelines say [...] MISC field.
What if a value in MISC contains the bar character? Like LTranslit=||Translit=|.