Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-1635741 unicodedata #21375

Closed
wants to merge 4 commits into from
Closed

Conversation

koubaa
Copy link
Contributor

@koubaa koubaa commented Jul 7, 2020

@koubaa koubaa changed the title Bpo 1635461 unicodedata bpo-1635461 unicodedata Jul 7, 2020
@koubaa koubaa changed the title bpo-1635461 unicodedata bpo-1635741 unicodedata Jul 7, 2020
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is wrong. You must retrieve the module state from "self". See my comments.


/* Previous versions */
v = new_previous_version("3.2.0", get_change_3_2_0, normalization_3_2_0);
v = new_previous_version(m, "3.2.0", get_change_3_2_0, normalization_3_2_0);
if (v != NULL)
PyModule_AddObject(m, "ucd_3_2_0", v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add error checking. For this call and other calls. Be careful, PyModule_AddObject() only requires DECREF on error!

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@vstinner
Copy link
Member

vstinner commented Jul 8, 2020

The purpose of UCD_Check() is to ensure that the method is called on a unicodedata.UCD instance, not to check how the method is called.

For "methods exported using a capsule (codecs.c uses it)", I have no idea how to retrieve the module state.

Moreover, I'm not sure that the Capsule API is compatible with the ability of having multiple instances of the module. So it sounds like it's incompatible with the PEP 489.

We had a similar discussion when someone tried to convert the _datetime module to PEP 489 and heap types.

For now, I suggest to close the PR.

@koubaa
Copy link
Contributor Author

koubaa commented Jul 9, 2020

@vstinner thank you for taking the time to review. Please reconsider. You are correct that:

For "methods exported using a capsule (codecs.c uses it)", I have no idea how to retrieve the module state.

However I found that in my changes, since the only module state I added is ucd_type - the methods used by the capsule do not need to access ucd_type. I will show some evidence here:

The only capsule methods which are exported are _getucname and _getcode.

static const _PyUnicode_Name_CAPI hashAPI =
{
    sizeof(_PyUnicode_Name_CAPI),
    _getucname,
    _getcode
};

_getcode is used only in unicodeobject.c (see around line 6510) and the first argument which is passed in is null.

Similarly _getucname is only used in codecs.c (see around lin 989) and the first argument which is passed in is null.

Now, here in unicodedata.c, _getcode only uses this argument to call _cmpname() with it, which calls _getucname using this argument again. _getucname does indeed call UCD_Check, but it does so like this:

if (self && UCD_Check(self)) {
    /* in 3.2.0 there are no aliases and named sequences */
    const change_record *old;
    if (IS_ALIAS(code) || IS_NAMED_SEQ(code))
        return 0;
    old = get_old_record(self, code);
    if (old->category_changed == 0) {
        /* unassigned */
        return 0;
    }
}

In fact, every call to UCD_Check is preceded by a null check on self, which prevents any of the calls from capsules from actually depending on module state.

Now that I I have shown why I state that capsules do not depend on module state, let me try to explain better why my change to UCD_Check is correct.

The methods which contain UCD_Check are both members of the type or members of the module. So this module was written to reduce duplication. However, not all of these methods are duplicate. There is some code which is only run if it the method was called on an object using the ucd_type. Like the name suggests, UCD_Check allows these methods to check if this is the case.

Given that method calls coming from capsules will never reach UCD_Check - effectively UCD_Check is returning 1 if it called on the ucd_type and 0 if it is called on the module type. There are two approaches to implement this check, one by checking for the ucd_type and the other by checking the module type. I changed the approach without changing the behavior.

Thanks for your consideration

@vstinner
Copy link
Member

vstinner commented Jul 9, 2020

If you only want to detect when self is a module, you should remove UCD_Check(o) and replace it with !PyModule_Check(o).

@vstinner
Copy link
Member

vstinner commented Jul 9, 2020

I would be simpler to review if you could write a first PR just to rework UCD_Check() checks, rather than changing too many things in the same PR.

@koubaa
Copy link
Contributor Author

koubaa commented Jul 9, 2020

@vstinner understood. I will do that.

@koubaa
Copy link
Contributor Author

koubaa commented Jul 9, 2020

@vstinner see the other PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants