Crash on some inputs [Python 3.5.2] #16

Open
HallowedPoint opened this issue Nov 2, 2017 · 12 comments
Comments

@HallowedPoint

Hello,

I've been working on a game using miniboa as the base, and have discovered it gets crashy when it encounters inputs that (assumedly?) cp1252 does not know how to deal with:

Traceback (most recent call last):
  File "urmud.py", line 86, in <module>
    game.mainLoop(gameServer)
  File "/home/x/Documents/Development/python/urmud/working/game.py", line 341, in mainLoop
    pollServer(gameServer)
  File "/home/x/Documents/Development/python/urmud/working/game.py", line 319, in pollServer
    server.poll()
  File "/home/x/Documents/Development/python/urmud/working/miniboa/async.py", line 188, in poll
    self.clients[sock_fileno].socket_recv()
  File "/home/x/Documents/Development/python/urmud/working/miniboa/telnet.py", line 292, in socket_recv
    data = str(self.sock.recv(2048), "cp1252")
  File "/usr/lib/python3.5/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1: character maps to <undefined>

So far I've reproduced this with ⁒ (U+2052) and ⁄ (U+2044), but considering the sheer size of Unicode I'm assuming there are more.
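
For anyone trying to reproduce this, here is a minimal sketch of one likely explanation. It assumes the telnet client sends its input as UTF-8, which the traceback itself does not confirm. Under that assumption both characters encode to three bytes whose second byte is 0x81, a value cp1252 leaves undefined, which would match the "byte 0x81 in position 1" in the error. `first_bad_byte` is a helper name made up for this example.

```python
def first_bad_byte(raw, encoding="cp1252"):
    """Return (position, byte) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the byte the codec rejected
        return exc.start, raw[exc.start]
    return None


# Assumption: the client sent these characters encoded as UTF-8.
for ch in ("\u2052", "\u2044"):
    raw = ch.encode("utf-8")
    print("U+%04X" % ord(ch), raw.hex(), first_bad_byte(raw))
```

If that is what is happening, any character whose UTF-8 form contains one of cp1252's undefined byte values would trigger the same crash.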

I'm actually using this to learn Python and I'm not really sure what the best way to handle this would be, so I figured I'd point it in the direction of the professionals.

And I did test it with the example chat_demo.py to be sure it wasn't something I broke. I was quite surprised to discover that it was not. :)

@HallowedPoint HallowedPoint changed the title Crash on some inputs Crash on some inputs [Python 3.5.2] Nov 2, 2017
@shmup
Owner

shmup commented Nov 9, 2017

Ha, thanks for pointing this out. I'm going to try looking into this and thinking about it.

@shmup shmup added bug and removed bug labels Feb 3, 2018
@shmup
Owner

shmup commented Feb 3, 2018

@gergelypolonkai was curious if you had any input here.

It seems some input bytes have no mapping in cp1252. I'm unsure of the best way to go about this. Do I just catch the exception, and then what? Just ignore the input?

Some info here on which bytes will error: https://stackoverflow.com/a/26330256

Like, I can switch to utf-8 and it fixes it, but I'm unsure if there was a reason it was using cp1252, like, for purity's sake? I probably don't grok encoding. :)

09e90ff

shmup referenced this issue Feb 3, 2018
This addresses a UnicodeDecodeError regarding unmapped code points for
cp1252. I really don't know if this is the right answer.
@shmup
Owner

shmup commented Feb 3, 2018

No that is bad, I'm not doing the above, heh.

@gergelypolonkai
Contributor

Well, that is a strange problem and I don’t think there is a good solution to it.

The linked SO answer lists the character codes that have no actual characters assigned (like the 0x81 mentioned by @HallowedPoint). When assuming the input is CP1252, you might filter such bytes out before processing it. However, that solves such problems for this specific code page; I don’t know if there are others with similar “missing” code points.
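
A rough sketch of that filtering idea, assuming the input really is meant to be CP1252. Rather than hard-coding the list, it asks Python's own codec which single-byte values it refuses to decode. The names `undefined_bytes` and `decode_filtered` are invented for this example, and the probing trick only makes sense for single-byte code pages.

```python
def undefined_bytes(encoding):
    """Collect the single-byte values that the given codec refuses to decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bytes(bad)


# For Python's cp1252 codec this is 0x81, 0x8D, 0x8F, 0x90 and 0x9D.
UNDEFINED_CP1252 = undefined_bytes("cp1252")


def decode_filtered(raw):
    """Drop the undefined bytes, then decode what is left as cp1252."""
    return raw.translate(None, UNDEFINED_CP1252).decode("cp1252")


print([hex(b) for b in UNDEFINED_CP1252])
print(decode_filtered(b"say \x81hello"))
```

The same probe can be pointed at another single-byte code page to check whether it has similar gaps.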

<rant>Also, it’s strange to see CP[anything] nowadays. Is there really someone out there who is not using UTF-8? </rant>

@shmup
Owner

shmup commented Feb 18, 2018

Fixed with 09e90ff but we'll see if this gets backlash

@gergelypolonkai yeah heh re: your rant, I just said screw it and will see if anyone rants back at my choice. I doubt it'll happen

@shmup shmup closed this as completed Feb 18, 2018
@shmup
Owner

shmup commented Feb 20, 2018

I was just thinking, maybe I should restore the previous encoding, cp1252, (I still don't know why), and let the user override the encoding when instantiating a TelnetServer instance?
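
A minimal sketch of what such an override could look like. `EncodingAwareClient` is a hypothetical stand-in written for this example, not miniboa's actual class, and the `errors` parameter is an extra assumption on top of the idea above.

```python
class EncodingAwareClient:
    """Hypothetical stand-in for a telnet client wrapper; not miniboa's real class."""

    def __init__(self, encoding="cp1252", errors="strict"):
        # Keep the old cp1252 default, but let the caller pick something else.
        self.encoding = encoding
        self.errors = errors

    def decode(self, raw):
        # Same shape as the str(self.sock.recv(2048), "cp1252") call in the
        # traceback, with the codec and error policy taken from the constructor.
        return str(raw, self.encoding, self.errors)


legacy = EncodingAwareClient()                  # behaves like the old code
modern = EncodingAwareClient(encoding="utf-8")  # opt in to UTF-8

payload = "wiz \u2052".encode("utf-8")
print(ascii(modern.decode(payload)))
```

With the default, `legacy.decode(payload)` would still raise `UnicodeDecodeError`, so existing users see no change unless they opt in.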

@shmup shmup reopened this Feb 20, 2018
@gergelypolonkai
Contributor

Whether you restore it or not, I really like the idea of allowing charset selection in the constructor. It makes the whole thing much more flexible.

By the way, is there a reason to use cp1252? Or is it because itʼs “legacy”?

@shmup
Owner

shmup commented Feb 28, 2018

@gergelypolonkai yeah that's exactly what I was thinking, based on a change you did in one of your PRs. That's exactly what I should do. I'll uh, default to utf-8 and let you set it to cp1252 in the constructor. And yes, I believe the idea was in the spirit of some romantic masturbatory old unix thing.

Really that's just a guess though 🤷‍♂️

@shmup
Owner

shmup commented Mar 7, 2018

Well I think my change exposed a problem, re: #22

I'm being a bad software maintainer and rushing things. I'll certainly end up reverting, I think.

@HallowedPoint
Author

Hi,

I have started poking at my project again recently, and while doing so it occurred to me that I never shared the solution I came up with a couple of weeks after my initial report.

I simply added errors='ignore' to the str() call. :P

data = str(self.sock.recv(2048), "cp1252", errors='ignore')

This seems to just straight up drop anything it doesn't like, but considering this is user input for a MUD, I am not in the least concerned about that:

wiz ⁒ (u2052) and ⁄ (u2044)
[wiz/Testdude]: (u2052) and (u2044)

Now, I am only like 1% more experienced at Python than I was when I wrote the first message, so I have absolutely zero idea if this has any potential for mayhem, and I am sure there is a more proper solution... but it ain't crashing anymore, so it seems like progress to me. :P
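
For comparison, a small sketch of how Python's standard error handlers treat the same bytes, again assuming the client sent UTF-8. With `errors='ignore'` only the undefined byte is dropped; the other two bytes of the character still decode as cp1252, so stray characters can remain in the string. What the client ends up displaying also depends on how the reply is encoded on the way back out, which is not shown here.

```python
raw = "wiz \u2052".encode("utf-8")  # assumption: the client sent UTF-8

ignored = raw.decode("cp1252", errors="ignore")    # drops only the bad byte
replaced = raw.decode("cp1252", errors="replace")  # marks it with U+FFFD
as_utf8 = raw.decode("utf-8")                      # recovers the character

for label, text in (("ignore", ignored), ("replace", replaced), ("utf-8", as_utf8)):
    print(label, ascii(text))
```

For plain chat input either handler avoids the crash; `replace` just makes the lost byte visible instead of silent.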

@shmup
Owner

shmup commented Apr 30, 2019

I'm sorry I haven't really paid attention to this much more. I should. Maybe someone else will, heh

@shmup
Owner

shmup commented Sep 19, 2022

Ok, I went ahead and took @gergelypolonkai's suggestion of specifying the encoding in the TelnetServer constructor.

You can now use TelnetServer(encoding='utf-8') to resolve this issue. I have a test capturing it, and hope this is a good resolution!

Give it a shot if you're still messing with this library, @HallowedPoint


3 participants