Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support comma in column value for service test csv data #1899

Merged
merged 3 commits into from
Jun 29, 2023

Conversation

YannanGao-gs
Copy link
Contributor

@YannanGao-gs YannanGao-gs commented Jun 7, 2023

What type of PR is this?

Bug Fix

This PR is not backwards compaitble.
Some user tests in prod need to be updated.

What does this PR do / why is it needed ?

Before:

Screenshot 2023-06-06 at 10 05 38 AM Screenshot 2023-06-07 at 9 40 28 AM

After:

Screenshot 2023-06-07 at 9 41 13 AM
Screen.Recording.2023-06-09.at.6.47.58.PM.mov

Which issue(s) this PR fixes:

Fixes #

Other notes for reviewers:

Does this PR introduce a user-facing change?

@YannanGao-gs YannanGao-gs requested a review from a team as a code owner June 7, 2023 13:41
@github-actions
Copy link

github-actions bot commented Jun 7, 2023

Test Results

     474 files  ±0       474 suites  ±0   1h 2m 49s ⏱️ - 1m 27s
  9 793 tests +2    9 462 ✔️ +2  331 💤 ±0  0 ±0 
11 655 runs  +2  11 258 ✔️ +2  397 💤 ±0  0 ±0 

Results for commit bff3d66. ± Comparison against base commit fbfdd87.

♻️ This comment has been updated with latest results.

@YannanGao-gs YannanGao-gs self-assigned this Jun 7, 2023
@YannanGao-gs YannanGao-gs force-pushed the setupSql branch 4 times, most recently from db9fb45 to 030e84c Compare June 8, 2023 04:15
'personTable\n'+
'id, firstName, lastName, age, addressId, firmId, managerId\n'+
'1,Peter, "I\'m Smith, Jr",23,1,1,2';
Copy link

@ghost ghost Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is issue with the following test case:
'1,Peter's, "I\'m Smith, Jr" ,23,1,1,2'
We need to add logic to handle the case when quote is not the first character

let charArray = chunk($s, 1);
let finalSplitterState = $charArray->fold({currentChar, splitterState|
if ($currentChar == '\'' || $currentChar == '\"',
|if (!$splitterState.inQuotes,
Copy link

@ghost ghost Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!$splitterState.inQuotes && !$splitterState.currentTokenChars->anySatisfy(x| $x != ' '),

@@ -129,7 +129,7 @@ function <<paramTest.BeforePackage>> meta::relational::tests::dbSpecificTests::s
'---\n'+
'default\n'+
'testTable3\n'+
'ABCDEFGHIJKLMNOPQRSTUVWXYZ,"0123456789","!\"#$%&()*+-./:;<=>?@[\]^_`{|}",abcdefghijklmnopqrstuvwxzy\n'+
'ABCDEFGHIJKLMNOPQRSTUVWXYZ,"0123456789","!\"\"#$%&()*+-./:;<=>?@[\]^_`{|}",abcdefghijklmnopqrstuvwxzy\n'+
Copy link

@ghost ghost Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since double-quote is part of data in this case, it should be enclosed in single quote
'!\"#$%&()*+-./:;<=>?@[\]^_`{|}'

@YannanGao-gs YannanGao-gs force-pushed the setupSql branch 5 times, most recently from 860a697 to a004bda Compare June 9, 2023 22:50
@YannanGao-gs
Copy link
Contributor Author

YannanGao-gs commented Jun 10, 2023

Hello @vikask-gs and @MauricioUyaguari ,

Could we create a clear rule of database name and column name convention? Currently, we have a very loose standard about how users can define a database name or column name, which is making correctly parsing by comma impossible.

In prod

Since we parse the values from connectionTestData by calling split(',') to separate db names and column values ,
Screenshot 2023-06-09 at 11 34 24 PM

we support

  1. database name can be enclosed by double quotes, which enables to have many '\'' and "\"" inside it

There are existing tests (naming.pure) in the codebase

   Table testTable3
    (
      ABCDEFGHIJKLMNOPQRSTUVWXYZ INT PRIMARY KEY,
      "0123456789" INT,
      "!\"#$%&()*+-./:;<=>?@[\]^_`{|}" VARCHAR(20), 
      abcdefghijklmnopqrstuvwxzy INT
    )

  1. Each column value in the CSV row is supported to have only one (or many) single quote or double quote
Screenshot 2023-06-09 at 10 48 57 PM Screenshot 2023-06-09 at 10 50 02 PM

(this example exists in legend-testable-relational-service-embeddedData.pure)
Screenshot 2023-06-09 at 10 54 38 PM

Now

Since we already allow the cases above, there would be many way of parsing the csv data below

Screenshot 2023-06-09 at 11 14 25 PM
  1. id = 1 firm_id = 2 firstName = "helloworld, " lastName = again"

  2. id = 1 firm_id = 2 firstName = "helloworld lastName = ", again"

No matter how I improve the algorithm, I can't guarantee it could be backward compatible so all existing use cases won't be broken.

Either

A. Could we take a step back to think if it really makes sense to have a delimiter inside the column value?
Currently, the input is the whole string e.g.

Screenshot 2023-06-09 at 11 34 24 PM

There would be tons of corner cases if we allow the cases above

B. Could we make sure that we add the rules below if we have to include a delimiter inside the column value?
Not sure if this will break Prod or not

1.   As long as there is a ' or ", there must be another ' or " to close it. Otherwise use \\' or \\"
 so the current existing test like the below should not be allowed and should be updated 

(this example exists in legend-testable-relational-service-embeddedData.pure)
Screenshot 2023-06-09 at 10 54 38 PM

'1,Peter's, "I\'m Smith, Jr" ,23,1,1,2' should be '1,Peter\'s, "I\'m Smith, Jr" ,23,1,1,2'

2.  if there is a delimiter inside the column value, users have to enclose it with quotes 
 like "test, comma"

@YannanGao-gs
Copy link
Contributor Author

Spoke to Mauricio offline

  1. comma is only wrapped by double quotes
  2. if there is an open double quote, it needs to be closed by another double quote

@YannanGao-gs YannanGao-gs force-pushed the setupSql branch 5 times, most recently from 7aa02da to 54707b2 Compare June 23, 2023 16:00
@YannanGao-gs YannanGao-gs changed the title fix a bug where fail to generate correct testDataSetupSql when there are delimiters in column values support comma in column value for service test csv data Jun 23, 2023
@YannanGao-gs YannanGao-gs added the not backwards compatible This PR is not backwards compatible label Jun 28, 2023
@MauricioUyaguari MauricioUyaguari enabled auto-merge (squash) June 29, 2023 14:56
@MauricioUyaguari MauricioUyaguari dismissed ghost ’s stale review June 29, 2023 14:57

spoke offline with Vikas and Yannan on this PR

@MauricioUyaguari MauricioUyaguari merged commit f0c587d into finos:master Jun 29, 2023
YannanGao-gs added a commit to YannanGao-gs/legend-engine that referenced this pull request Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not backwards compatible This PR is not backwards compatible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants