Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dialect support #190

Closed
julianhyde opened this issue Nov 5, 2018 · 7 comments
Closed

Dialect support #190

julianhyde opened this issue Nov 5, 2018 · 7 comments

Comments

@julianhyde
Copy link
Owner

julianhyde commented Nov 5, 2018

There are various behaviors in SQLLine that depend on the particular database. (Keywords, comment syntax, identifier quote string, and others.) This change would gather these into a single class, Dialect.

(The original text of this case dealt with comment syntax in MySQL. We have kept it intact in the following paragraphs. This case is now a more general refactoring, but will also fix the original issue with MySQL comments.)

In MySQL, unlike other databases, "--" is not sufficient to start a comment. It has to be followed by whitespace. Thus:

# Valid in MySQL, invalid in Oracle and PostgreSQL
select * from emp where 5 = --5;

# Valid in both MySQL, Oracle and PostgreSQL (note space in '- -')
select - -5 from emp; -- an end-of-line comment

Does this have an impact on line continuations and/or highlighting?

@snuyanzin
Copy link
Collaborator

snuyanzin commented Nov 6, 2018

Does this have an impact on line continuations and/or highlighting?

Yes it does.

The same issue is with MariaDB[1] as a fork of MySQL. Besides -- both of these databases also use # for one-line comment.
Another example Cassandra[2] which could use // for one-line comments.

I see two points here

  1. It looks like there are a number of discrepancies between sql standard and different databases and from time to time there will be appear new cases which are valid for specific db but which are not valid from sql standard's or sqlline's point of view. Unlike highlighting the line continuation could not be turned off and it is more critical issue. Thus, it would be great to have property useLineContinuation with true value by default but with a chance to turn it off.
  2. We could use a rule (currently it is called SqlLineHighlighter.HighlightRule), there was mentioned an idea (Sql syntax highlight support #164 (comment) and Sql syntax highlight support #164 (comment)) to move it from SqlLineHighlighter to DatabaseConnection. There could be one more field to describe a set of lines defining one-line comments for particular connection. Unfortunately I didn't find any method from DatabaseMetaData telling about one-line comment symbols, however it is possible to have such a map between database product name ( java.sql.DatabaseMetaData#getDatabaseProductName) and set of strings used for one-line comments like
default: "--"
MySQL: "-- ", "--\n", "#"
Cassandra: "--", "//"

[1] https://mariadb.com/kb/en/library/comment-syntax/
[2] https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlRefComment.html

@julianhyde
Copy link
Owner Author

Agree on both of your points.

@julianhyde
Copy link
Owner Author

Making a quick pass over PR #192 - how about renaming BuiltInDBSpecifics to Dialect? (Yeah, I know some of the things are not quite dialect, but it's near enough, and a lot shorter.)

@snuyanzin
Copy link
Collaborator

Well, I'm ok with such naming.
Renamed BuiltInDBSpecifics to Dialect, and also did rename DBSpecificsRule => DialectRule

@julianhyde julianhyde changed the title In MySQL, "--" comments are different Dialect support Nov 9, 2018
@julianhyde
Copy link
Owner Author

I propose a further refactoring:

  • Add a new interface Dialect, which has most of the behavior of your DialectRule.
  • Change your enum Dialect to enum BuiltInDialect implements Dialect.
  • Move implementation code into utility class Dialects, and also have an abstract class DialectImpl implements Dialect for people who want to write their own custom dialect.
  • Make sure that dialect constructors are stupid simple (only field assignment); we can add factory methods, in future, if complex construction logic is required.

We arrived at a similar pattern for dialects in Calcite and Mondrian (after many inferior systems over the years), and it worked well.

Do those steps sound OK? Maybe I do the refactoring?

@snuyanzin
Copy link
Collaborator

snuyanzin commented Nov 11, 2018

I saw you have already did the great job related to refactoring you mentioned! Thank you.
I like the changes you suggested.

While testing I found several issues but the most critical from my point of view is that any exception while highlight/line continuation check could lead to infinite loop (like it was mentioned in #164 (comment)). It happens because of while(true) loop and no exception handling on highlight/line continuation side. So I introduced this handling with switching to default behavior for the specific line, and there is no infinite loop with trace printing.
Also there are fixes for 2 more issues and tests for them:

  1. one-symbol length string
  2. line ending with / symbol

The PR where I introduced it is #195

@julianhyde
Copy link
Owner Author

julianhyde commented Nov 12, 2018

Fixed in f9d4444 and e32ec72.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants