Apparently, the SQL standard is weird for string constants.

`'foo'` is valid. `'foo''bar'` is valid.

```
'foo'
'bar'
```

is valid. Therefore:

```
'foo'
\t
'bar'
```

is also valid. `'foo' a 'bar'` is invalid. `'foo' 'bar'` is also invalid: the whitespace has no newlines.

I don't even know how to think about composing parsers. How would I break this problem down?
So, I think there are two steps here.

1. Recognising a valid 'string chain' (or whatever terminology SQL uses)
2. Turning the string chain into the final string

The first step is usually to start off with the simplest case and then build up from there. Assuming your lexer/parser takes strings as inputs, the simplest case - matching just strings like `'foo'` - might look something like this:

```rust
use chumsky::prelude::*;

// Matches any character that is *not* the end of the string
let str_char = any().and_is(just('\'').not());
// Matches many string characters, delimited on both ends by `'`
// (`to_slice` is used to extract the inner characters only as the parser's output)
let str = str_char.repeated().to_slice().delimited_by(just('\''), just('\''));
```

To allow both `'` and `"` as delimiters:

```rust
let str_parser = |c| {
    let str_char = any().and_is(just(c).not());
    str_char.repeated().to_slice().delimited_by(just(c), just(c))
};
let str = str_parser('\'').or(str_parser('"'));
```

Building up further, you'll have to settle on a more precise rule for how concatenation works. I've not read the SQL spec, but based on your explanation it seems like "any string literals separated by (inline whitespace and at least one newline) OR nothing can be concatenated". So, let's encode that in a parser:

```rust
// Matches any number of space or tab characters (including none!)
let inline_ws = one_of([' ', '\t']).repeated();
// Matches a newline, padded on either side by any amount of inline whitespace, at least once
let newline_sep = text::newline().padded_by(inline_ws).repeated().at_least(1);
// Matches a newline separator or nothing (allows 'foo''bar' to be concatenated)
let sep = newline_sep.or_not();
// Matches one or more strings, separated by our separators, collecting them into a vector
let str_chain = str.separated_by(sep).at_least(1).collect::<Vec<_>>();
```

As far as I can tell, this should solve problem (1). Problem (2) can be solved by just placing a `.map(...)` on the end of the chain parser:

```rust
let str_chain = str_chain.map(|strs| {
    // Concatenate the string chain elements into one long string
    // (`concat` works on a slice of `&str`; `iter().collect::<String>()` would yield `&&str` items, which `String` cannot collect)
    strs.concat()
});
```

I've not tested this code so no doubt there will be some minor things to fiddle around with to get it working, but hopefully this gives you a sense of the general shape of a solution.
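If it helps to see the rule itself without any combinators, here is a minimal sketch of the same two steps in plain Rust with no dependencies. It is only an illustration under some assumptions: the names `parse_literal`, `skip_separator` and `parse_str_chain` are made up for this example, only `'` is handled as a delimiter, and only `\n` (optionally alongside `\r`) counts as a newline, which is narrower than what a library newline parser may accept.

```rust
/// Hypothetical helper: parses one `'...'` literal at the start of `input`.
/// Returns the inner text and whatever follows the closing quote.
fn parse_literal(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('\'')?;
    let end = rest.find('\'')?;
    Some((&rest[..end], &rest[end + 1..]))
}

/// Hypothetical helper: skips a separator if there is one.
/// A run of whitespace only counts as a separator when it contains a newline;
/// otherwise the input is returned untouched (the "nothing" separator).
fn skip_separator(input: &str) -> &str {
    let trimmed = input.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\n' | '\r'));
    let skipped = &input[..input.len() - trimmed.len()];
    if skipped.contains('\n') {
        trimmed
    } else {
        input
    }
}

/// Step 1 and step 2 together: recognise the whole chain and concatenate it.
/// Returns `None` if the input is not exactly one valid chain.
fn parse_str_chain(input: &str) -> Option<String> {
    let (first, mut rest) = parse_literal(input)?;
    let mut out = String::from(first);
    while !rest.is_empty() {
        // Anything other than a valid separator followed by a literal is an error
        let (next, after) = parse_literal(skip_separator(rest))?;
        out.push_str(next);
        rest = after;
    }
    Some(out)
}
```

Under this rule, `parse_str_chain("'foo'\n'bar'")` gives `Some("foobar")`, while `parse_str_chain("'foo' 'bar'")` and `parse_str_chain("'foo' a 'bar'")` both give `None`. Like the rule above, it treats `'foo''bar'` as two adjacent literals; standard SQL reads `''` inside a literal as an escaped quote, so that case would need its own handling if you want to match the spec exactly.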