[flytekit][2][untyped dict] Binary IDL With MessagePack #2757

Open
wants to merge 10 commits into
base: master

Future-Outlier
Member

@Future-Outlier Future-Outlier commented Sep 18, 2024

Tracking issue

flyteorg/flyte#5318

Why are the changes needed?

We want untyped dicts to round-trip with their types fully preserved. Serializing them to MessagePack (msgpack) bytes keeps those types exactly.
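As an illustration of the motivation above, here is a minimal sketch of a MessagePack round trip using the `msgpack` Python package directly. It is not flytekit's implementation; the helper name `roundtrip` and the sample dict are assumptions made for this example. `strict_map_key=False` is passed because recent `msgpack` releases reject non-string map keys on unpack by default.

```python
import msgpack


def roundtrip(value):
    """Pack a Python object to MessagePack bytes and unpack it again."""
    packed = msgpack.packb(value)
    # strict_map_key=False allows int and float keys when unpacking.
    return msgpack.unpackb(packed, strict_map_key=False)


# Hypothetical sample with int, str, and float keys and mixed values.
original = {1: "a", "key": 2.5, 3.14: [1, "x", False]}
restored = roundtrip(original)

# Key and value types survive the round trip unchanged.
key_types = [type(k) for k in restored]
```

In this sketch the int and float keys come back as `int` and `float` rather than strings, which is the property the PR description relies on.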

What changes were proposed in this pull request?

  1. Replace the method dict_to_generic_literal with dict_to_binary_literal.
  2. Call self.from_binary_idl from to_python_val.
  3. Fix the related tests.
  4. Add test_untyped_dict, which covers 8 untyped dict cases.

How was this patch tested?

Unit tests, local execution, and remote execution.

Setup process

from flytekit import task, workflow, ImageSpec

flytekit_hash = "c05a905b241fa825b1a8d78d57aa3e4b9d27becd"

flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"

image = ImageSpec(
    packages=[flytekit],
    apt_packages=["git"],
    registry="localhost:30000",
)

@task(container_image=image)
def dict_task(input: dict) -> dict:
    return input


# Dict inputs of increasing complexity: mixed-type keys, with lists and dicts as values
dict_inputs = [
    # Basic key-value combinations with int, str, bool, float
    {1: "a", "key": 2.5, True: False, 3.14: 100},
    {"a": 1, 2: "b", 3.5: True, False: 3.1415},

    # Lists as values, mixed types
    {
        1: [1, "a", 2.5, False],
        "key_list": ["str", 3.14, True, 42],
        True: [False, 2.718, "test"],
    },

    # Nested dicts with basic types
    {
        "nested_dict": {1: 2, "key": "value", True: 3.14, False: "string"},
        3.14: {"pi": 3.14, "e": 2.718, 42: True},
    },

    # Nested lists and dicts as values
    {
        "list_in_dict": [
            {"inner_dict_1": [1, 2.5, "a"], "inner_dict_2": [True, False, 3.14]},
            [1, 2, 3, {"nested_list_dict": [False, "test"]}],
        ]
    },

    # More complex nested structures
    {
        "complex_nested": {
            1: {"nested_dict": {True: [1, "a", 2.5]}},  # Nested dict with list as value
            "string_key": {False: {3.14: {"deep": [1, "deep_value"]}}},  # Deep nesting
        }
    },

    # Lists of dicts as values (lists are unhashable in Python, so they cannot be dict keys)
    {
        "list_of_dicts": [{"a": 1, "b": 2}, {"key1": "value1", "key2": "value2"}],
        10: [{"nested_list": [1, "value", 3.14]}, {"another_list": [True, False]}],
    },

    # More nested combinations of list and dict
    {
        "outer_list": [
            [1, 2, 3],
            {"inner_dict": {"key1": [True, "string", 3.14], "key2": [1, 2.5]}},  # Dict inside list
        ],
        "another_dict": {"key1": {"subkey": [1, 2, "str"]}, "key2": [False, 3.14, "test"]},
    },
]



if __name__ == "__main__":
    import os
    import json
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner

    runner = CliRunner()
    path = os.path.realpath(__file__)
    for i, input_dict in enumerate(dict_inputs):
        print(f"\n=== Running test {i + 1} ===")
        json_input = json.dumps(input_dict)  # Convert input to JSON string
        result = runner.invoke(pyflyte.main,
                               ["run",
                                path,
                                "dict_task",
                                "--input",
                                json_input])
        print(f"Local Execution {i}: ", result.output)

    for i, input_dict in enumerate(dict_inputs):
        print(f"\n=== Running test {i + 1} ===")
        json_input = json.dumps(input_dict)  # Convert input to JSON string
        result = runner.invoke(pyflyte.main,
                               ["run",
                                "--remote",
                                path,
                                "dict_task",
                                "--input",
                                json_input])
        print("Remote Execution: ", result.output)

Screenshots

There are 8 test cases.

/Users/future-outlier/miniconda3/envs/dev/bin/python /Users/future-outlier/code/dev/flytekit/build/PR/JSON/stacked_PRs/untyped_dict_list_nested.py 

=== Running test 1 ===
Local Execution 0:  Running Execution on local.
{'1': False, 'key': 2.5, '3.14': 100}


=== Running test 2 ===
Local Execution 1:  Running Execution on local.
{'a': 1, '2': 'b', '3.5': True, 'false': 3.1415}


=== Running test 3 ===
Local Execution 2:  Running Execution on local.
{'1': [False, 2.718, 'test'], 'key_list': ['str', 3.14, True, 42]}


=== Running test 4 ===
Local Execution 3:  Running Execution on local.
{'nested_dict': {'1': 3.14, 'key': 'value', 'false': 'string'}, '3.14': {'pi': 3.14, 'e': 2.718, '42': True}}


=== Running test 5 ===
Local Execution 4:  Running Execution on local.
{'list_in_dict': [{'inner_dict_1': [1, 2.5, 'a'], 'inner_dict_2': [True, False, 3.14]}, [1, 2, 3, {'nested_list_dict': [False, 'test']}]]}


=== Running test 6 ===
Local Execution 5:  Running Execution on local.
{'complex_nested': {'1': {'nested_dict': {'true': [1, 'a', 2.5]}}, 'string_key': {'false': {'3.14': {'deep': [1, 'deep_value']}}}}}


=== Running test 7 ===
Local Execution 6:  Running Execution on local.
{'list_of_dicts': [{'a': 1, 'b': 2}, {'key1': 'value1', 'key2': 'value2'}], '10': [{'nested_list': [1, 'value', 3.14]}, {'another_list': [True, False]}]}


=== Running test 8 ===
Local Execution 7:  Running Execution on local.
{'outer_list': [[1, 2, 3], {'inner_dict': {'key1': [True, 'string', 3.14], 'key2': [1, 2.5]}}], 'another_dict': {'key1': {'subkey': [1, 2, 'str']}, 'key2': [False, 3.14, 'test']}}


=== Running test 1 ===
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1726674651.422437 1735447 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
I0000 00:00:1726674651.476439 1735447 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
Remote Execution:  Running Execution on Remote.
Image localhost:30000/flytekit:3R9iB9xupVwhApb5oEhS1g found. Skip building.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/a444fpwpcj2h5sz879wc to see execution in the console.


=== Running test 2 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/actnrqkjgnv5xr52c4qn to see execution in the console.


=== Running test 3 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/ad6hdw2czhj9r5x28pqc to see execution in the console.


=== Running test 4 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/a7mdpbghr8k9xl7hd52b to see execution in the console.


=== Running test 5 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/ajmwfrclxtfmsx9lrz95 to see execution in the console.


=== Running test 6 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/azqfxtjxs9rv4n8r4cc9 to see execution in the console.


=== Running test 7 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/af2jrclxrfnx8xhpzqrp to see execution in the console.


=== Running test 8 ===
Remote Execution:  Running Execution on Remote.

[✔] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/aqp6dzw54ncfdsdmmswn to see execution in the console.


Process finished with exit code 0
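A note on reading the local-execution output above: the keys print as strings (for example '1' and '3.14'), and test 1 shows three entries rather than four. Both effects can be reproduced with the standard library alone, which suggests they come from the test script's json.dumps call and from Python's dict semantics rather than from the MessagePack path. The sketch below assumes nothing beyond the first two inputs of the script; the helper name `json_roundtrip` is hypothetical.

```python
import json


def json_roundtrip(value):
    """Serialize to a JSON string and parse it back, as the script's --input path does."""
    return json.loads(json.dumps(value))


# First two inputs from the test script.
case_1 = {1: "a", "key": 2.5, True: False, 3.14: 100}
case_2 = {"a": 1, 2: "b", 3.5: True, False: 3.1415}

# In Python, True == 1 and they hash alike, so case_1 holds only three entries:
# the key stays 1 and its value is overwritten with False.
collapsed_len = len(case_1)

# JSON object keys are always strings, so non-string keys are stringified.
after_json_1 = json_roundtrip(case_1)
after_json_2 = json_roundtrip(case_2)
```

Under this reading, passing the dicts to the task directly from Python (instead of through a JSON string on the command line) would be the way to exercise non-string keys end to end.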

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Future-Outlier <eric901201@gmail.com>

codecov bot commented Sep 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.07%. Comparing base (11c3a18) to head (dd5e1c9).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2757       +/-   ##
===========================================
+ Coverage   66.44%   89.07%   +22.62%     
===========================================
  Files           9        8        -1     
  Lines         453      357       -96     
===========================================
+ Hits          301      318       +17     
+ Misses        152       39      -113     


Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
…ith-message-pack-bytes-2

Signed-off-by: Future-Outlier <eric901201@gmail.com>
@eapolinario
Collaborator

@Future-Outlier , why did you close this PR?
