irc: implement CASEMAPPING parameter for ISUPPORT #2231
I made it through. 😩 Expected to break and finish tomorrow, but nope.
While I'm (understandably?) annoyed by all the extra constructor parameters that had to happen to make this work, I get that it's impossible—or at least, very code-smell-y—to do it another way.
5705f24 to 02d7044
I did a rebase and squashed fixup commits. There were some conflicts that were a bit complicated to sort out properly; I'm glad I made atomic commits, so it was easier to test every single one with py.test and flake8 (except for one commit, but heh, it's squashed now). I added one commit to update What's left is:
I won't write a migration script in that PR, that's too big of a job, and I'm not even sure how to properly do it at the moment. I know it's doable, because a few years ago I had to write a database migration system for one of my applications that was using SQLAlchemy and Alembic.
This is ready for a last round of review.
Hm, I found a surprising number of things left to say, despite the many previous comments and despite only re-reviewing about half the files (mostly the ones that "Changed since your last review"). Nobody would ever accuse me of being a code-review Yes Man, huh?
Alright this is good, the last review is all about a better presentation and attention to details, I really like it. 👍
That should be good now, @dgw you can read my replies and see if I need to take action or if I can rebase & squash away.
Just one conversation from before left unresolved—but I started some new ones. Hopefully they won't make you 🙁 too much, since it's about fixing details & consistency in the new changes.
And once again! But this time, no new docstrings. 😁
I looked, all good. Squash what you're going to squash when ready.
39541fb to 151a9bf
I'm going to do what I should have done earlier and actually approve this.
You can squash the fixup into the commit SHA I gave and trust me (and CI) that everything will work out post-merge, or rebase on master if you like.
And fix the default casemapping to follow the RFC it is supposed to implement (RFC 1459): only A-Z letters should be lowercased.
The lowercase function is now part of a set of public functions, each implementing a different "lower" rule:
* ascii: implements the "ascii" CASEMAPPING, i.e. A-Z only
* rfc1459: implements the "rfc1459" CASEMAPPING, i.e. A-Z + []\~
* rfc1459-strict: implements the same RFC but correctly, i.e. no replacement for ~ (it shouldn't be)
The `Identifier` class now uses the rfc1459 function instead of str.lower. In order to propagate this change correctly, I updated `sopel.db` as well, because str.lower() is not equivalent to Identifier.lower() anymore.
Note that this doesn't support unicode Identifier, which would use str.lower directly; the "PRECIS" version of the IRC v3 spec is not ready yet, so it is not implemented for the moment. Also, it would require managing not only CASEMAPPING but also UTF8MAPPING, and that's for another day/feature!
See also https://modern.ircdocs.horse/index.html#casemapping-parameter and any links from that document.
Co-authored-by: dgw <dgw@technobabbl.es>
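As a rough illustration of the three rules listed in this commit message, the sketch below builds one translation table per rule. The function names (`sketch_ascii_lower` and friends) are hypothetical and not necessarily the names used in Sopel; the lowercase targets `{}|^` for `[]\~` follow the casemapping document linked above.

```python
import string

_UPPER = string.ascii_uppercase
_LOWER = string.ascii_lowercase

# One translation table per CASEMAPPING rule (hypothetical names).
ASCII_TABLE = str.maketrans(_UPPER, _LOWER)
RFC1459_TABLE = str.maketrans(_UPPER + '[]\\~', _LOWER + '{}|^')
RFC1459_STRICT_TABLE = str.maketrans(_UPPER + '[]\\', _LOWER + '{}|')


def sketch_ascii_lower(text: str) -> str:
    """Lowercase A-Z only ("ascii" rule)."""
    return text.translate(ASCII_TABLE)


def sketch_rfc1459_lower(text: str) -> str:
    """Lowercase A-Z and map []\\~ to {}|^ ("rfc1459" rule)."""
    return text.translate(RFC1459_TABLE)


def sketch_rfc1459_strict_lower(text: str) -> str:
    """Same as rfc1459, but leave ~ untouched ("rfc1459-strict" rule)."""
    return text.translate(RFC1459_STRICT_TABLE)
```

None of these functions touch non-ASCII letters, which is the difference from `str.lower()` that the commit message points out.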
Instead of instantiating `tools.Identifier` directly, built-in plugins now use `bot.make_identifier(nick)` to get an Identifier for nicks and channels. This method comes from `sopel.irc.AbstractBot`, and can be used as an Identifier factory: take a string as input, return an Identifier. In the future, this factory will be able to take advantage of the configuration and the ISUPPORT to select the appropriate CASEMAPPING function.
The next steps are:
* to use that factory outside of plugins (i.e. triggers and targets)
* to implement a different CASEMAPPING depending on context
* to add documentation about "how to get an Identifier"
Instead of using `tools.Identifier(nick)`, the PreTrigger class now uses a factory to instantiate an Identifier. As a result, the bot (irc.AbstractBot) provides such a factory. The Test Factories have been updated to also use the bot's factory when creating test Trigger & PreTrigger. While doing so, type hints have been added to said factories.
As per PreTrigger, Channel no longer instantiates Identifier itself: it relies on an Identifier factory to do so. When creating a Channel in coretasks, it provides bot.make_identifier as a factory. Docstrings have been updated, and type hints have been added for the occasion.
Co-authored-by: dgw <dgw@technobabbl.es>
As per PreTrigger and Channel, SopelIdentifierMemory also uses an identifier factory. Sopel and the built-in plugins have been updated accordingly.
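The factory-injection pattern described in these commits could look roughly like the sketch below. Everything here is a simplified stand-in: `SketchIdentifierMemory` and `sketch_make_identifier` are hypothetical names, and the factory returns a casemapped plain `str` rather than a real `Identifier`.

```python
from string import ascii_lowercase, ascii_uppercase
from typing import Any, Callable, Dict

# rfc1459-style table, used only by the stand-in factory below.
_RFC1459 = str.maketrans(ascii_uppercase + '[]\\~', ascii_lowercase + '{}|^')


def sketch_make_identifier(name: str) -> str:
    """Stand-in factory: returns a casemapped str, not a real Identifier."""
    return name.translate(_RFC1459)


class SketchIdentifierMemory:
    """Dict-like store whose keys always go through an injected factory."""

    def __init__(self, identifier_factory: Callable[[str], str]) -> None:
        # The class never builds identifiers itself; it asks the factory.
        self._make_identifier = identifier_factory
        self._data: Dict[str, Any] = {}

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[self._make_identifier(key)] = value

    def __getitem__(self, key: str) -> Any:
        return self._data[self._make_identifier(key)]

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and self._make_identifier(key) in self._data


memory = SketchIdentifierMemory(sketch_make_identifier)
memory['Nick[m]'] = 'seen'
```

Because the casemapping lives in the factory, swapping the rule later only requires passing a different callable; the storage class stays unchanged.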
The core configuration "nick" option was an Identifier: it was convenient to always have an Identifier for the configured nick. However, it doesn't make a lot of sense: the nick is used in various places as it is (no lowercase), and where it needs to be an Identifier it is always to compare it to another one.
As a result:
* type(core.nick) is now str
* when comparing the config value to an identifier, use bot.make_identifier
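For plugin authors, the practical consequence might look like the following sketch: a plain `str` nick no longer compares case-insensitively on its own, so both sides go through the factory first. `SketchIdentifier`, `sketch_make_identifier` and `is_configured_nick` are hypothetical stand-ins, not Sopel's API.

```python
from string import ascii_lowercase, ascii_uppercase

_RFC1459 = str.maketrans(ascii_uppercase + '[]\\~', ascii_lowercase + '{}|^')


class SketchIdentifier:
    """Simplified stand-in: compares by its rfc1459-lowered form."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._key = text.translate(_RFC1459)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SketchIdentifier):
            return self._key == other._key
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self._key)


def sketch_make_identifier(name: str) -> SketchIdentifier:
    """Stand-in for a bot-provided identifier factory."""
    return SketchIdentifier(name)


def is_configured_nick(configured_nick: str, other: str) -> bool:
    """Compare a plain-str config nick with another name via the factory."""
    return sketch_make_identifier(configured_nick) == sketch_make_identifier(other)


configured_nick = 'Sopel[Bot]'  # a plain str, as the config option now is
```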
And Sopel provides its `make_identifier` method as a factory.
ISUPPORT provides a "CASEMAPPING" parameter that tells which casemapping function to use:
* ascii: lower A-Z only
* rfc1459: ascii + map some special chars
* rfc1459-strict: ascii + map some special chars strictly (no mapping for ~)
Whenever the bot receives the ISUPPORT parameter, it automatically rebuilds its nick according to that parameter.
Note: this doesn't implement the UTF8MAPPING parameter; that is left for a future PR.
Co-authored-by: dgw <dgw@technobabbl.es>
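A minimal sketch of that selection logic, under stated assumptions: `SketchBot`, `SketchIdentifier` and `on_isupport` are hypothetical names, ISUPPORT parameters are assumed to arrive as a plain dict, and falling back to rfc1459 for an unknown value is a choice made for this example only.

```python
from string import ascii_lowercase, ascii_uppercase
from typing import Callable, Dict


def _build_lower(upper_extra: str, lower_extra: str) -> Callable[[str], str]:
    """Build a lowercase function from A-Z plus extra character pairs."""
    table = str.maketrans(ascii_uppercase + upper_extra,
                          ascii_lowercase + lower_extra)

    def lower(text: str) -> str:
        return text.translate(table)

    return lower


CASEMAPPINGS: Dict[str, Callable[[str], str]] = {
    'ascii': _build_lower('', ''),
    'rfc1459': _build_lower('[]\\~', '{}|^'),
    'rfc1459-strict': _build_lower('[]\\', '{}|'),
}
DEFAULT_CASEMAPPING = 'rfc1459'


class SketchIdentifier:
    """Simplified stand-in: compares by its casemapped form."""

    def __init__(self, text: str, casemapping: Callable[[str], str]) -> None:
        self.text = text
        self._key = casemapping(text)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, SketchIdentifier):
            return self._key == other._key
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self._key)


class SketchBot:
    def __init__(self, nick: str) -> None:
        self._casemapping = CASEMAPPINGS[DEFAULT_CASEMAPPING]
        self.nick = self.make_identifier(nick)

    def make_identifier(self, name: str) -> SketchIdentifier:
        """Identifier factory bound to the current casemapping."""
        return SketchIdentifier(name, self._casemapping)

    def on_isupport(self, params: Dict[str, str]) -> None:
        """Switch casemapping when the server advertises one."""
        name = params.get('CASEMAPPING')
        if name is None:
            return
        # Assumption: unknown values fall back to the default rule.
        self._casemapping = CASEMAPPINGS.get(
            name, CASEMAPPINGS[DEFAULT_CASEMAPPING])
        # Rebuild the bot's own nick under the new rule.
        self.nick = self.make_identifier(self.nick.text)
```

With a nick such as `Sopel~`, the default rule treats `sopel^` as the same name, while after an `ascii` CASEMAPPING it no longer does.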
For type hints, I followed the same type hints provided by SQLAlchemy for connection and session (i.e. return type is an implicit "Any"). Then I switched back the get nick/channel value to accepting a str instead of only an Identifier, since we can always use the Identifier factory now.
Co-authored-by: dgw <dgw@technobabbl.es>
Co-authored-by: dgw <dgw@technobabbl.es>
4ce843d to 5f605ea
Description
This PR contains various changes with the end goal of using the server's casemapping instead of the hardcoded one. Sadly, this includes breaking changes.
Non-breaking changes
* `make_identifier`: it uses the `CASEMAPPING` parameter from isupport to select the appropriate casemapping function (from `sopel.tools.identifiers`)
* `Identifier` class has several changes:
  * `Identifier` now lives its best life in `sopel.tools.identifiers`, mostly to prevent cyclic import errors. This comes with some more documentation.
  * `Identifier` doesn't override `str.__new__` anymore, I figured out a way to use `__init__` instead, and it works fine.
  * `Identifier` has a new argument: `casemapping`. It's a callable that must take one parameter (a string) and return a string.
  * `casemapping` defaults to `rfc1459_lower`, making this argument optional (at the moment).
  * `self.casemapping` is now used in place of `Identifier._lower` to get a lowercase version of the identifier string
* classes that rely on `Identifier` have a new argument, an identifier factory. It's a callable that must take one parameter (a string) and return an instance of `Identifier` with properly configured casemapping.
  * `User` and `Channel` (both from `sopel.tools.target`)
  * `PreTrigger` (from `sopel.trigger`)
  * `SopelDB` (from `sopel.db`)
  * `SopelIdentifierMemory` (from `sopel.tools`)
* these classes use that factory to create an `Identifier` when necessary
* `coretasks` and other modules are now using `bot.make_identifier` when they need an `Identifier`
* `SopelDB` methods now accept a `str` (some used to accept `Identifier` only) since it can always convert them into an `Identifier` when necessary
Breaking changes
* `Identifier._lower` doesn't lower all characters as it used to; it now uses the `rfc1459_lower` function, which lowercases only [A-Z] characters (plus the special characters `[]\~` covered by the rfc1459 mapping). In theory, this is not a breaking change because IRC identifiers shouldn't accept non-ASCII characters anyway. However, in the case of an IRC server supporting Unicode identifiers (for nicknames and/or channels), this is a breaking change. A solution to that would be to implement rfc7613 (PRECIS) as suggested by https://modern.ircdocs.horse/index.html#casemapping-parameter
* `core.nick` is now a `str` and not an `Identifier` anymore, meaning that any plugin using that config option will need to transform it into an `Identifier`; this is considered a breaking change, even though it shouldn't be a problem considering that most Identifier-related operations go through objects that already convert `str` into `Identifier` when necessary
Going further
The next steps would be:
sopel/tools/__init__.py
Checklist
* `make qa` (runs `make quality` and `make test`)