Skip to content

Conversation

qstommyshu
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This is the part 5 of #15484 breakdown, as the code changes in #15484 is too large. Part1: #15497, Part2: #15499, Part3: #15533, Part4: 15548, Part5: #15567, Part6: #15578

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No

@github-actions github-actions bot added the sql SQL Planner label Apr 7, 2025
@qstommyshu
Copy link
Contributor Author

The last part of the puzzle is finally done! 🚀

I wrote a script to migrate the test, so the condition where the query is different before/after in large migration should not happen anymore.

@qstommyshu qstommyshu changed the title Migrate roundtrip_statement_with_dialect() to insta with macro Migrate datafusion/sql tests to insta, part7 Apr 7, 2025
@qstommyshu
Copy link
Contributor Author

Hi @alamb and @blaginin ,

This PR is ready to review, please take a look once you have time.

@blaginin
Copy link
Contributor

blaginin commented Apr 7, 2025

This is very cool! Will review now

Copy link
Contributor

@blaginin blaginin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for working on that 🙏

@blaginin
Copy link
Contributor

blaginin commented Apr 7, 2025

I wrote a script to migrate the test

Maybe you could attach your script here? So we can use it in the future if we have a similar task again?

@qstommyshu
Copy link
Contributor Author

qstommyshu commented Apr 7, 2025

I wrote a script to migrate the test

Maybe you could attach your script here? So we can use it in the future if we have a similar task again?

Not sure if it will be helpful. I wrote this script specifically for migrating this test case because it has a very unique structure. Probably need some modification if we want to use it for other tests.

Here is the script:

import re


def process_block(block_lines, test_index):
    fields = {
        "sql": [],
        "expected": [],
        "parser_dialect": [],
        "unparser_dialect": []
    }
    current_field = None
    field_pattern = re.compile(r'^\s*(sql|expected|parser_dialect|unparser_dialect)\s*:\s*(.*)')

    for line in block_lines:
        m = field_pattern.match(line)
        if m:
            current_field = m.group(1)
            content = m.group(2)
            fields[current_field].append(content)
        else:
            if current_field is not None:
                # skip if } is found
                if re.match(r'^\s*\},\s*$', line):
                    continue
                fields[current_field].append(line.rstrip("\n"))

    # concat fields into a single line
    for key in fields:
        combined = "\n".join(fields[key]).strip()
        if combined.endswith(','):
            combined = combined[:-1].rstrip()
        fields[key] = combined

    # keep the sql field as it is
    sql_value = fields["sql"]

    # process expected field
    expected_lines = fields["expected"].splitlines()
    expected_comments = []
    expected_string_lines = []

    in_string = False
    for line in expected_lines:
        stripped = line.strip()
        if not in_string and stripped.startswith("//"):
            expected_comments.append(line.rstrip())
        else:
            # in_string once we met a non commented line
            in_string = True
            expected_string_lines.append(line.rstrip())
    expected_string = "\n".join(expected_string_lines).strip()
    # remove extra comma
    if expected_string.endswith(','):
        expected_string = expected_string[:-1].rstrip()

    # remove Box::new()
    def process_dialect(value):
        value = value.strip()
        m = re.match(r'Box::new\((.*)\)', value)
        if m:
            return m.group(1).strip()
        else:
            return value

    parser_dialect_value = process_dialect(fields["parser_dialect"])
    unparser_dialect_value = process_dialect(fields["unparser_dialect"])

    # construction new tests
    new_test = []
    new_test.append(f"#[test]")
    new_test.append(f"fn roundtrip_statement_with_dialect_{test_index}() -> Result<(), DataFusionError> {{")
    new_test.append(f"    roundtrip_statement_with_dialect_helper!(")
    new_test.append(f'        sql: {sql_value},')
    new_test.append(f'        parser_dialect: {parser_dialect_value},')
    new_test.append(f'        unparser_dialect: {unparser_dialect_value},')
    # put back comments
    for comment in expected_comments:
        new_test.append(f'        {comment}')
    new_test.append(f'        expected: @{expected_string},')
    new_test.append(f"    );")
    new_test.append(f"    Ok(())")
    new_test.append(f"}}")

    return "\n".join(new_test)


def transform_file(filename):
    with open(filename, "r", encoding="utf-8") as f:
        lines = f.readlines()

    output_lines = []
    in_block = False
    block_lines = []
    test_index = 1

    for line in lines:
        if not in_block:
            # new block
            if "TestStatementWithDialect {" in line:
                in_block = True
                block_lines = [line]
            else:
                output_lines.append(line)
        else:
            block_lines.append(line)
            if re.match(r'^\s*\},\s*$', line):
                transformed_block = process_block(block_lines, test_index)
                test_index += 1
                output_lines.append(transformed_block + "\n")
                in_block = False
                block_lines = []

    # write back to the original file
    with open(filename, "w", encoding="utf-8") as f:
        f.write("".join(output_lines))


if __name__ == "__main__":
    transform_file("macro.txt")

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 looks great to me -- thank you @qstommyshu and @blaginin

@alamb
Copy link
Contributor

alamb commented Apr 7, 2025

part7

I think this PR series is a textbook example of how to make large scale refactorings in a way that is reviewable and minimally disruptive. Great work @qstommyshu

@alamb alamb merged commit e35fe6a into apache:main Apr 8, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Apr 8, 2025

🚀

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* Migrate `roundtrip_statement_with_dialect()` to `insta` with macro

* remove duplicate test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate datafusion/sql tests to insta

3 participants