Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sub module conflict error #295

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

StellarisW
Copy link

@StellarisW StellarisW commented Nov 4, 2024

Currently the logic of register sub module to sys.modules may cause conflict,

That's because of the PR #249 which fix #215

def load(path,
         module_name=None,
         include_dirs=None,
         include_dir=None,
         encoding='utf-8'):

    real_module = bool(module_name)
    thrift = parse(path, module_name, include_dirs=include_dirs,
                   include_dir=include_dir, encoding=encoding)
    if threadlocal.incomplete_type:
        fill_incomplete_ttype(thrift, thrift)

    if real_module:
        sys.modules[module_name] = thrift
        sub_modules = thrift.__thrift_meta__["includes"][:]
        while sub_modules:
            module = sub_modules.pop()
            if module not in sys.modules:
                sys.modules[module.__name__] = module
                sub_modules.extend(module.__thrift_meta__["includes"])
    return thrift

For example:

We have thrift files as follows:

thrift files
├── idl
│   ├── main.thrift
│   ├── included1.thrift
│   ├── included2.thrift

If we call load('idl/main.thrift', module_name='idl.main_thrift'), then the following modules will be registered:

  • idl.main_thrift
  • included1_thrift
  • included2_thrift

So, if included1.thrift is a common module name like platform, which is platform.thrift, then module platform will register to sys.modules, so it will cause conflict.

Now I come up with 2 solutions:

  1. Add an option to load(), which will be def load(path, module_name=None, include_dirs=None, include_dir=None, encoding='utf-8'), and let the users to register sub modules manually. (not recommended)

  2. Optimize sub module name. (recommended)

    use absolute path as its module name instead of the base name of the thrift file.

    let's take the example before, then the following modules will be registered:

    • idl.main_thrift
    • idl.included1_thrift
    • idl.included2_thrift

@StellarisW StellarisW closed this Nov 4, 2024
@StellarisW StellarisW reopened this Nov 4, 2024
@StellarisW
Copy link
Author

Please take a review💌
@aisk @ethe

@aisk
Copy link
Member

aisk commented Nov 7, 2024

Sorry, I can't access my development computer these days. I will review it a few days later.

But the test case should be passed before merging.

@StellarisW
Copy link
Author

Sorry, I can't access my development computer these days. I will review it a few days later.

But the test case should be passed before merging.

That's OK, I'll fix the test cases.

@@ -949,3 +960,8 @@ def _get_ttype(inst, default_ttype=None):
if hasattr(inst, '__dict__') and '_ttype' in inst.__dict__:
return inst.__dict__['_ttype']
return default_ttype

def remove_suffix(s, suffix):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use s.replace(suffix, "") is better.

import itertools

from faker import Faker 

SUFFIX = "__SOME_SUFFIX_"

class MyFaker(Faker):
    def suffix_name(self):
        return f"{self.name()}{SUFFIX}"

fake = MyFaker()
data_num = 10000
escape_ratio = 0.5
escape_num = int(data_num * escape_ratio)

suffix_texts = [fake.suffix_name() for _ in range(escape_num)]
normal_texts = [fake.name() for _ in range(data_num - escape_num)]

def test_and_replace():
    for text in itertools.chain(suffix_texts, normal_texts):
        if text.endswith(SUFFIX):
            res = text.replace(SUFFIX, "")
    return res

def test_and_replace_with_slie():
    for text in itertools.chain(suffix_texts, normal_texts):
        if text.endswith(SUFFIX):
            res = text[:-len(SUFFIX)]
    return res

def just_replace():
    for text in itertools.chain(suffix_texts, normal_texts):
        res = text.replace(SUFFIX, "")
    return res


__benchmarks__ = [
    (test_and_replace, just_replace, "just_replace"),
    (test_and_replace, test_and_replace_with_slie, "test_and_replace_with_slie")
]
                                         Benchmarks, repeat=5, number=20                                          
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃                  Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│               just_replace │ 0.043   │ 0.043   │ 0.043   │ 0.021 (2.0x)    │ 0.022 (1.9x)    │ 0.022 (2.0x)    │
│ test_and_replace_with_slie │ 0.041   │ 0.042   │ 0.042   │ 0.040 (1.0x)    │ 0.041 (1.0x)    │ 0.040 (1.0x)    │
└────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice, it seems better indeed.

Comment on lines 47 to 50
sub_modules = thrift.__thrift_meta__["sub_modules"][:]
for module in sub_modules:
if module not in sys.modules:
sys.modules[module.__name__] = include_thrift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sub_modules = thrift.__thrift_meta__["sub_modules"][:]
for module in sub_modules:
if module not in sys.modules:
sys.modules[module.__name__] = include_thrift
lost_modules = [
m for m in thrift.__thrift_meta__["sub_modules"] if m not in sys.modules
]
for module in lost_modules:
sys.modules[module.__name__] = include_thrift

Comment on lines 70 to 73
child_module_name = str(
child_path).replace(os.sep,
".").replace(
".thrift", "_thrift")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code's indents are a little bit strange, I think it's better to using something like:

aaa = bbb().ccc()\
    .ddd()

@aisk
Copy link
Member

aisk commented Nov 26, 2024

Thank you for your work, but I still have a question.

Although the included Thrift modules wouldn't overwrite each other when they have the same name but are in different subdirectories after merge this PR, we support loading a Thrift file with a dot in its name, such as foo.bar.thrift. This means that if there is both a foo/bar.thrift and a foo.bar.thrift, they might still overwrite each other.

I think the expected behavior should be to raise an exception to inform users about the conflict. I believe the current code already has this issue, but the new code might have the potential to address it.

@StellarisW
Copy link
Author

Thank you for your work, but I still have a question.

Although the included Thrift modules wouldn't overwrite each other when they have the same name but are in different subdirectories after merge this PR, we support loading a Thrift file with a dot in its name, such as foo.bar.thrift. This means that if there is both a foo/bar.thrift and a foo.bar.thrift, they might still overwrite each other.

I think the expected behavior should be to raise an exception to inform users about the conflict. I believe the current code already has this issue, but the new code might have the potential to address it.

Okay, I'll raise an exception in this case.

@StellarisW
Copy link
Author

@aisk All things have done, but there is still a test fail in def test_load_fp(), it because of the load_fp() uses threadlocal.thrift_cache which has been set in other test, don't know what is the best way to fix it.
CleanShot 2024-11-29 at 13 34 30@2x

@cocolato
Copy link
Contributor

It would be good if you could squash all the commits down to one, just to tidy up the history.

@StellarisW
Copy link
Author

It would be good if you could squash all the commits down to one, just to tidy up the history.

done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cannot pickle dump object from child idl file class
3 participants