Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/sql/logictest: Test data containing multi-line values causes test to fail #127785

Closed
kyle-a-wong opened this issue Jul 26, 2024 · 1 comment · Fixed by #127773
Closed

pkg/sql/logictest: Test data containing multi-line values causes test to fail #127785

kyle-a-wong opened this issue Jul 26, 2024 · 1 comment · Fixed by #127773
Assignees
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-observability

Comments

@kyle-a-wong
Copy link
Contributor

kyle-a-wong commented Jul 26, 2024

Describe the problem

In #91702, logic was added to formatValues to fix an issue where tab and new line characters were interfering with the tab writer output. This fix introduced an edge case where tabular data outputs containing new lines would fail equality checks erroneously.

As an example the following test data file,

statement ok
CREATE TABLE test_table (col1 string, col2 string, col3 string)

statement ok
INSERT INTO test_table (col1, col2, col3) VALUES ('r1c1', 'r1c2', 'r2c3'), ('r1c1', 'r1
nc2', 'r2c3'), ('r1c1', 'r1c2', 'r2c3');

query TTT
SELECT * FROM test_table
ORDER BY col1
----
r1c1  r1c2  r2c3
r1c1  r1    r2c3
      nc2
r1c1  r1c2  r2c3

Would fail with the following error:

    logic.go:2968: 
         
        ../../sql/logictest/testdata/testdata:8: SELECT * FROM test_table
        ORDER BY col1
        expected:
            r1c1  r1c2  r2c3
            r1c1  r1    r2c3
                  nc2
            r1c1  r1c2  r2c3
        but found (query options: "") :
            r1c1  r1c2  r2c3
            r1c1  r1    r2c3
                  nc2
            r1c1  r1c2  r2c3

The reason the test fails is because the data that is generated and written to the test data file is being formatted to neatly present multi-line data in a column. When this data is read into the test as the expected data, it doesn't know that it is a part of a multi-line column and generates an incorrect expected value.

This bug is manifesting itself in #127290 where the new system table is causing TestLogic_gen_test_objects/templates/fewer_tables_generated_than_templates to fail with the following:

/pkg/sql/logictest/testdata/logic_test/gen_test_objects:197: SELECT quote_ident(table_name), quote_ident(column_name), data_type FROM "".information_schema.columns
        WHERE table_catalog = 'newdb2' AND table_schema = 'public'
        ORDER BY table_name, column_name
        LIMIT 20
        expected:
            database_role_settings  database_id             oid
            database_role_settings  "ro'le_id"              oid
            database_role_settings  role_name               text
            database_role_settings  rowid                   bigint
            database_role_settings  settings                ARRAY
            "eventLog"              "\\xa7t̻%pimest̏am  p"  timestamp without time zone
            "eventLog"              "eventType"             text
            "eventLog"              info                    text
            "eventLog"              "reportingID"           bigint
            "eventLog"              rowid                   bigint
            "eventLog"              "targ%ve}tID"           bigint
            "eventLog"              "uniqueID"              bytea
            "j ob\\x97s"            "claim_instAnce_id"     bigint
            "j ob\\x97s"            "claim_session?_ id"    bytea
            "j ob\\x97s"            "create                 timestamp without time zone
                                    d"
            "j ob\\x97s"            created_by_id           bigint
            "j ob\\x97s"            created_by_type         text
            "j ob\\x97s"            "id.\\U000FA67C"        bigint
            "j ob\\x97s"            "j%qob_type "           text
            "j ob\\x97s"            l😲ast_ru😻n              timestamp without time zone
            
        but found (query options: "") :
            database_role_settings  database_id             oid
            database_role_settings  "ro'le_id"              oid
            database_role_settings  role_name               text
            database_role_settings  rowid                   bigint
            database_role_settings  settings                ARRAY
            "eventLog"              "\\xa7t̻%pimest̏am  p"  timestamp without time zone
            "eventLog"              "eventType"             text
            "eventLog"              info                    text
            "eventLog"              "reportingID"           bigint
            "eventLog"              rowid                   bigint
            "eventLog"              "targ%ve}tID"           bigint
            "eventLog"              "uniqueID"              bytea
            "j ob\\x97s"            "claim_instAnce_id"     bigint
            "j ob\\x97s"            "claim_session?_ id"    bytea
            "j ob\\x97s"            "create                 timestamp without time zone
                                    d"
            "j ob\\x97s"            created_by_id           bigint
            "j ob\\x97s"            created_by_type         text
            "j ob\\x97s"            "id.\\U000FA67C"        bigint
            "j ob\\x97s"            "j%qob_type "           text
            "j ob\\x97s"            l😲ast_ru😻n              timestamp without time zone

See the image below of some relevant expressions when running through a debugger:

image

Jira issue: CRDB-40596

@kyle-a-wong kyle-a-wong added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 26, 2024
Copy link

blathers-crl bot commented Jul 26, 2024

Hi @kyle-a-wong, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

kyle-a-wong added a commit to kyle-a-wong/cockroach that referenced this issue Jul 30, 2024
logictest generated test data formats query results
in a in a tabular format using tabwriter. when
a query result contains `\n` characters, it formats
it such that stays within the same tab cell on
output. This works well when there is a single
row result, but when there are multiple rows
with data that contains `\n` characters, the
intput parser doesn't know, nor can it tell,
if a cell is multi-lined or not, erroneously
reporting that data expected data and actual
data don't match.

To Fix, multi-row query responses will now
replace `\n` characters with `\\n` to
ensure tests pass while being well formatted.

Fixes: cockroachdb#127785
Release note: None
@kyle-a-wong kyle-a-wong added the branch-master Failures and bugs on the master branch. label Jul 30, 2024
kyle-a-wong added a commit to kyle-a-wong/cockroach that referenced this issue Jul 31, 2024
logictest generated test data formats query results
in a in a tabular format using tabwriter. when
a query result contains `\n` characters, it formats
it such that stays within the same tab cell on
output. This works well when there is a single
row result, but when there are multiple rows
with data that contains `\n` characters, the
intput parser doesn't know, nor can it tell,
if a cell is multi-lined or not, erroneously
reporting that data expected data and actual
data don't match.

To Fix, multi-row query responses will now
replace `\n` characters with `\\n` to
ensure tests pass while being well formatted.

Fixes: cockroachdb#127785
Release note: None
craig bot pushed a commit that referenced this issue Jul 31, 2024
127773: sql/logictest: Fixes logic test tabular data output r=kyle-a-wong a=kyle-a-wong

logictest generated test data formats query results in a in a tabular format using tabwriter. when a query result contains `\n` characters, it formats it such that stays within the same tab cell on output. This works well when there is a single row result, but when there are multiple rows with data that contains `\n` characters, the input parser doesn't know, nor can it tell, if a cell is multi-lined or not, erroneously reporting that data expected data and actual data don't match.

To Fix, multi-row query responses will now replace `\n` characters with `\\n` to ensure tests pass while being well formatted.

Fixes: #127785
Release note: None

Co-authored-by: Kyle Wong <37189875+kyle-a-wong@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-observability
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant