Migrate datafusion/sql tests to insta, part7 #15621

qstommyshu · 2025-04-07T15:32:23Z

Which issue does this PR close?

Closes Migrate datafusion/sql tests to insta #15397
Related Migrate datafusion/sql tests to insta,sql integrations #15484, Migrate datafusion/sql tests to insta, part1 #15497, Migrate datafusion/sql tests to insta, part2 #15499, Migrate datafusion/sql tests to insta, part3 #15533, Migrate datafusion/sql tests to insta, part4 #15548 , Migrate datafusion/sql tests to insta, part5 #15567, #Migrate datafusion/sql tests to insta, part6 #15578
this is a part of Migrate datafusion/sql tests to insta,sql integrations #15484 breaking down.
Checkout things to note of the whole migration in comments section of Migrate datafusion/sql tests to insta,sql integrations #15484.

Rationale for this change

What changes are included in this PR?

This is the part 5 of #15484 breakdown, as the code changes in #15484 is too large. Part1: #15497, Part2: #15499, Part3: #15533, Part4: 15548, Part5: #15567, Part6: #15578

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No

qstommyshu · 2025-04-07T15:36:30Z

The last part of the puzzle is finally done! 🚀

I wrote a script to migrate the test, so the condition where the query is different before/after in large migration should not happen anymore.

qstommyshu · 2025-04-07T19:10:09Z

Hi @alamb and @blaginin ,

This PR is ready to review, please take a look once you have time.

blaginin · 2025-04-07T19:33:29Z

This is very cool! Will review now

datafusion/sql/tests/cases/plan_to_sql.rs

blaginin

thank you for working on that 🙏

blaginin · 2025-04-07T19:52:14Z

I wrote a script to migrate the test

Maybe you could attach your script here? So we can use it in the future if we have a similar task again?

qstommyshu · 2025-04-07T20:12:41Z

I wrote a script to migrate the test

Maybe you could attach your script here? So we can use it in the future if we have a similar task again?

Not sure if it will be helpful. I wrote this script specifically for migrating this test case because it has a very unique structure. Probably need some modification if we want to use it for other tests.

Here is the script:

import re


def process_block(block_lines, test_index):
    fields = {
        "sql": [],
        "expected": [],
        "parser_dialect": [],
        "unparser_dialect": []
    }
    current_field = None
    field_pattern = re.compile(r'^\s*(sql|expected|parser_dialect|unparser_dialect)\s*:\s*(.*)')

    for line in block_lines:
        m = field_pattern.match(line)
        if m:
            current_field = m.group(1)
            content = m.group(2)
            fields[current_field].append(content)
        else:
            if current_field is not None:
                # skip if } is found
                if re.match(r'^\s*\},\s*$', line):
                    continue
                fields[current_field].append(line.rstrip("\n"))

    # concat fields into a single line
    for key in fields:
        combined = "\n".join(fields[key]).strip()
        if combined.endswith(','):
            combined = combined[:-1].rstrip()
        fields[key] = combined

    # keep the sql field as it is
    sql_value = fields["sql"]

    # process expected field
    expected_lines = fields["expected"].splitlines()
    expected_comments = []
    expected_string_lines = []

    in_string = False
    for line in expected_lines:
        stripped = line.strip()
        if not in_string and stripped.startswith("//"):
            expected_comments.append(line.rstrip())
        else:
            # in_string once we met a non commented line
            in_string = True
            expected_string_lines.append(line.rstrip())
    expected_string = "\n".join(expected_string_lines).strip()
    # remove extra comma
    if expected_string.endswith(','):
        expected_string = expected_string[:-1].rstrip()

    # remove Box::new()
    def process_dialect(value):
        value = value.strip()
        m = re.match(r'Box::new\((.*)\)', value)
        if m:
            return m.group(1).strip()
        else:
            return value

    parser_dialect_value = process_dialect(fields["parser_dialect"])
    unparser_dialect_value = process_dialect(fields["unparser_dialect"])

    # construction new tests
    new_test = []
    new_test.append(f"#[test]")
    new_test.append(f"fn roundtrip_statement_with_dialect_{test_index}() -> Result<(), DataFusionError> {{")
    new_test.append(f"    roundtrip_statement_with_dialect_helper!(")
    new_test.append(f'        sql: {sql_value},')
    new_test.append(f'        parser_dialect: {parser_dialect_value},')
    new_test.append(f'        unparser_dialect: {unparser_dialect_value},')
    # put back comments
    for comment in expected_comments:
        new_test.append(f'        {comment}')
    new_test.append(f'        expected: @{expected_string},')
    new_test.append(f"    );")
    new_test.append(f"    Ok(())")
    new_test.append(f"}}")

    return "\n".join(new_test)


def transform_file(filename):
    with open(filename, "r", encoding="utf-8") as f:
        lines = f.readlines()

    output_lines = []
    in_block = False
    block_lines = []
    test_index = 1

    for line in lines:
        if not in_block:
            # new block
            if "TestStatementWithDialect {" in line:
                in_block = True
                block_lines = [line]
            else:
                output_lines.append(line)
        else:
            block_lines.append(line)
            if re.match(r'^\s*\},\s*$', line):
                transformed_block = process_block(block_lines, test_index)
                test_index += 1
                output_lines.append(transformed_block + "\n")
                in_block = False
                block_lines = []

    # write back to the original file
    with open(filename, "w", encoding="utf-8") as f:
        f.write("".join(output_lines))


if __name__ == "__main__":
    transform_file("macro.txt")

alamb

😍 looks great to me -- thank you @qstommyshu and @blaginin

alamb · 2025-04-07T21:05:33Z

part7

I think this PR series is a textbook example of how to make large scale refactorings in a way that is reviewable and minimally disruptive. Great work @qstommyshu

alamb · 2025-04-08T10:33:09Z

🚀

* Migrate `roundtrip_statement_with_dialect()` to `insta` with macro * remove duplicate test case

Migrate roundtrip_statement_with_dialect() to insta with macro

0ab4c12

github-actions bot added the sql SQL Planner label Apr 7, 2025

qstommyshu changed the title ~~Migrate roundtrip_statement_with_dialect() to insta with macro~~ Migrate datafusion/sql tests to insta, part7 Apr 7, 2025

blaginin reviewed Apr 7, 2025

View reviewed changes

datafusion/sql/tests/cases/plan_to_sql.rs Show resolved Hide resolved

blaginin approved these changes Apr 7, 2025

View reviewed changes

remove duplicate test case

f8e79d1

alamb approved these changes Apr 7, 2025

View reviewed changes

alamb merged commit e35fe6a into apache:main Apr 8, 2025
27 checks passed

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025

Migrate datafusion/sql tests to insta, part7 (apache#15621)

fe21347

* Migrate `roundtrip_statement_with_dialect()` to `insta` with macro * remove duplicate test case

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Migrate datafusion/sql tests to insta, part7 #15621

Migrate datafusion/sql tests to insta, part7 #15621

Uh oh!

qstommyshu commented Apr 7, 2025

Uh oh!

qstommyshu commented Apr 7, 2025

Uh oh!

qstommyshu commented Apr 7, 2025

Uh oh!

blaginin commented Apr 7, 2025

Uh oh!

Uh oh!

blaginin left a comment

Uh oh!

blaginin commented Apr 7, 2025

Uh oh!

qstommyshu commented Apr 7, 2025 •

edited

Loading

Uh oh!

alamb left a comment

Uh oh!

alamb commented Apr 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

alamb commented Apr 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Migrate datafusion/sql tests to insta, part7 #15621

Migrate datafusion/sql tests to insta, part7 #15621

Uh oh!

Conversation

qstommyshu commented Apr 7, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

qstommyshu commented Apr 7, 2025

Uh oh!

qstommyshu commented Apr 7, 2025

Uh oh!

blaginin commented Apr 7, 2025

Uh oh!

Uh oh!

blaginin left a comment

Choose a reason for hiding this comment

Uh oh!

blaginin commented Apr 7, 2025

Uh oh!

qstommyshu commented Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb commented Apr 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

alamb commented Apr 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qstommyshu commented Apr 7, 2025 •

edited

Loading

alamb commented Apr 7, 2025 •

edited

Loading