Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

small cleanups in Unicode normalization code #82224

Closed
gnprice opened this issue Sep 6, 2019 · 4 comments
Closed

small cleanups in Unicode normalization code #82224

gnprice opened this issue Sep 6, 2019 · 4 comments
Labels
3.9 only security fixes topic-unicode

Comments

@gnprice
Copy link
Contributor

gnprice commented Sep 6, 2019

BPO 38043
Nosy @rhettinger, @benjaminp, @ezio-melotti, @gnprice
PRs
  • bpo-38043: Use bool for boolean flags on is_normalized_quickcheck. #15711
  • bpo-38043: Move unicodedata.normalize tests into test_unicodedata. #15712
  • bpo-37966: Fully implement the UAX #15 quick-check algorithm. #15558
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-09-06.04:46:54.216>
    labels = ['3.9', 'expert-unicode']
    title = 'small cleanups in Unicode normalization code'
    updated_at = <Date 2019-09-10.19:31:45.964>
    user = 'https://github.com/gnprice'

    bugs.python.org fields:

    activity = <Date 2019-09-10.19:31:45.964>
    actor = 'rhettinger'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Unicode']
    creation = <Date 2019-09-06.04:46:54.216>
    creator = 'Greg Price'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38043
    keywords = ['patch']
    message_count = 4.0
    messages = ['351229', '351373', '351600', '351740']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'benjamin.peterson', 'ezio.melotti', 'Greg Price']
    pr_nums = ['15711', '15712', '15558']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38043'
    versions = ['Python 3.9']

    @gnprice
    Copy link
    Contributor Author

    gnprice commented Sep 6, 2019

    Benjamin noticed in reviewing #59763 (for bpo-37966) several points where the existing code around Unicode normalization can be improved:

    • on the QuickcheckResult enum:

      Maybe makeunicodedata.py should output this enum (with better name namespacing)

    • merging test_normalization into this file [i.e. test_unicodedata.py] for clarity

    • These "boolean int" parameters could be actual bools. [sc. the nfc and k parameters to is_normalized_quickcheck]

    None of these are super hard, so good to knock them out while we're thinking of them.

    @gnprice gnprice added 3.9 only security fixes topic-unicode labels Sep 6, 2019
    @benjaminp
    Copy link
    Contributor

    New changeset 7669cb8 by Benjamin Peterson (Greg Price) in branch 'master':
    bpo-38043: Use bool for boolean flags on is_normalized_quickcheck. (GH-15711)
    7669cb8

    @benjaminp
    Copy link
    Contributor

    New changeset 1ad0c77 by Benjamin Peterson (Greg Price) in branch 'master':
    bpo-38043: Move unicodedata.normalize tests into test_unicodedata. (GH-15712)
    1ad0c77

    @rhettinger
    Copy link
    Contributor

    This is mostly harmless but I'm concerned that we're encouraging a new Python developer to:

    • churn code in mostly minor ways, irrelevant to users

    • altering code long known to be stable, increasing
      the risk of introducing new bugs or performance changes

    • altering code in ways that are atypical for our
      code base (i.e. the bool type isn't a norm in our
      code, we mostly use int for that)

    • altering code without communicating with the developer
      who originally wrote that code (if they are still active)

    • consuming the time of reviewers when they could be working
      on known bugs, legitimate feature requests, or documentation

    • one-off or drive-by code alterations rather that what
      Guido calls "holistic refactoring" where we do clean-ups
      while understanding and thinking about the module as a
      whole and focusing on the user experience.

    • unfortunately, making lots of random, minor changes to
      a code base in a major project is an addictive experience
      and IMO it would be best to re-channel it early, particularly
      if the changes are motivated by "I like my style of coding
      more than that of the original contributor". Style changes
      are highly subjective and usually we defer to the original
      contributor who was closest to the problem being solved.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes topic-unicode
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants