Submit license fails with UTF decode error #462

Closed
goneall opened this issue Apr 3, 2023 · 22 comments · Fixed by #482
Labels: Submit New License (issues related to the submit new license feature)

Comments

@goneall
Member

goneall commented Apr 3, 2023

When running the latest main branch code on my local machine, I'm getting the following exception:

Unexpected error, please email the SPDX technical workgroup that the following error has occurred: 
Traceback (most recent call last):
  File "/spdxonlinetools/src/app/views.py", line 125, in submitNewLicense
    matchingIds, matchingType, _ = utils.check_spdx_license(licenseText)
  File "/spdxonlinetools/src/app/utils.py", line 518, in check_spdx_license
    spdxLicenseTexts = list(map(lambda x: x.decode('utf-8'), r.mget(spdxLicenseIds)))
  File "/spdxonlinetools/src/app/utils.py", line 518, in <lambda>
    spdxLicenseTexts = list(map(lambda x: x.decode('utf-8'), r.mget(spdxLicenseIds)))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

@goneall
Member Author

goneall commented Apr 3, 2023

Note that the error is occurring on this line:

spdxLicenseTexts = list(map(lambda x: x.decode('utf-8'), r.mget(spdxLicenseIds)))

which was modified by PR #444.

If I change lines 517 and 518 back, the problem goes away.

@BanulaKumarage - from your PR it looks like if I change these lines back it will re-introduce the unit test failure. Let me know what you think.

@BanulaKumarage
Contributor

BanulaKumarage commented Apr 4, 2023

> Note that the error is occurring on this line:
>
> spdxLicenseTexts = list(map(lambda x: x.decode('utf-8'), r.mget(spdxLicenseIds)))
>
> which was modified by PR #444
>
> If I change lines 517 and 518 back, the problem goes away.
>
> @BanulaKumarage - from your PR it looks like if I change these lines back it will re-introduce the unit test failure. Let me know what you think.

Yes @goneall, it was changed to fix the unit test failure. As I mentioned in PR #444, byte-type patterns were not being converted to UTF-8 strings, which is why I changed it that way. Can you tell me how to reproduce this error? I will work on this issue once I can reproduce it.

@goneall
Member Author

goneall commented Apr 4, 2023

> Can you tell me how to reproduce this error? I will work on this issue once I can reproduce it.

Completely understand - if you can duplicate it, you can fix it :) I just created a docker image per the README, started it up and whenever I created an issue, it caused the error.

I tried clearing the Redis DB and recreating the license data (a couple of lines up from the changed code in utils.py) and it still occurred.

I can reproduce it at will, so if you want me to collect any data, let me know.

@BanulaKumarage
Contributor

@goneall can you debug the code by putting a breakpoint at line 519 and send me the values of r.keys() and spdxLicenseIds?

@goneall
Member Author

goneall commented Apr 4, 2023

debug.txt

debug2.txt

@BanulaKumarage Attached are the output of r.keys() from the debugger and the license IDs (debug.txt and debug2.txt, respectively).

@BanulaKumarage
Contributor

Thank you @goneall. I will look into this.

@BanulaKumarage
Contributor

@goneall Is this the place where the error is raised? I ran the sample Python script below.
script.txt
I got the following output. The list was decoded as UTF-8.
output.txt

@goneall
Member Author

goneall commented Apr 4, 2023

The error is the next line:

spdxLicenseTexts = list(map(lambda x: x.decode('utf-8'), r.mget(spdxLicenseIds)))

@goneall
Member Author

goneall commented Apr 4, 2023

I can't print out the full r.mget result - it is too large.
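
When the full result is too large to print, one option is to summarize each value instead of dumping it. The following is only a sketch: `summarize_values` is a hypothetical helper, not part of the project, and the sample keys and bytes are stand-ins for `spdxLicenseIds` and `r.mget(spdxLicenseIds)`. It relies on the fact that redis-py's `mget` returns `None` for keys that do not exist.

```python
from typing import Iterable, List, Optional


def summarize_values(keys: Iterable[str], values: Iterable[Optional[bytes]]) -> List[str]:
    """Return one short line per key instead of printing whole values."""
    report = []
    for key, value in zip(keys, values):
        if value is None:
            # mget yields None for keys that are missing from Redis.
            report.append(f"{key}: missing")
            continue
        try:
            value.decode("utf-8")
            status = "utf-8 ok"
        except UnicodeDecodeError as err:
            # err.start is the index of the first byte that failed to decode.
            status = f"not utf-8 (byte 0x{value[err.start]:02x} at {err.start})"
        report.append(f"{key}: {len(value)} bytes, starts {value[:4].hex()}, {status}")
    return report


# Stand-in data; in the app these would come from spdxLicenseIds and r.mget(...).
sample_keys = ["MIT", "0BSD", "Unknown"]
sample_values = [b"\x1f\x8b\x08\x00rest", b"Permission to use", None]
for line in summarize_values(sample_keys, sample_values):
    print(line)
```

The leading bytes and the failing offset are usually enough to tell whether a value is plain text or some other encoding, without pasting the whole payload.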

@goneall
Member Author

goneall commented Apr 4, 2023

Here's the output of print(r.mget(spdxLicenseIds[0]))

[b'\x1f\x8b\x08\x00\xff,+d\x02\xff\xadV\xed\x8e\xdb8\x0c|\x15\xc2\x7fn\x17\xc8\xfa\xb0\xed\x13\xc9\x16c\xab+K>Iv6}\xfa\x1bJ\xf1W\xd2l\x8b\xc3\xfd\x08\x10;$5\x9c\x19RQ\x03;\x8dO\x8a\x94z&\xc7)\xb6jd\x1a\xa7\xc6\x9a\x96\xf0a\x17\x99f\x0e\xd1xG\xef\xf5;\xbdTn\xb4\xd5+\xb5\xdeE\x13\x91\xe8\xcf9w\xf0?\x8d\xb5\xea\xab\xd4\x8bI}\x8e={k\xfd\xc5\xb8\x8e\xd4\n\xe0D\xc6\xb5v\xd2\xf2\x96?{\xd3\x98D\xea\xed\t\xa2\x9a\xce\xc6r$\xa3\x91j\xce\x86u)^\xfd6\xb3"\x15\x98:\x0f\\\x0eY\xcd\xf5O\x1b\xafIim\x12\x9e\x94\xa5\xc4a\x88\xa4\xc6\x11\x81\xaa\xb1L\xc9\x7fU\xa7&>\x9f\xb9M\xb5\x04\xa1\xecC%\xcd\xb1\r\xa6\x01 \xe3\x10c\xe2SDoo;\xce(\xf6\xca\xda\x8c\xe3\xba@Xth\xfd0L\x0e\xf0\x92\x0f\xd4Z\x83x\xbc\xd38\xdci\x89\x95\xc4Vh\xc0\xa1\xf9\xfd\xe44\x87r\xf8\n\xdb\x98\x9a\xaa\x05\xca_\x91\x9a\x80\xec[BE\x03+\x17\x8fER\xaf\xd2\x86]\xc3 hkJ\x90\n\x89\x7f\x03\xc9\x88~\x8d\xb8\x06XC\x14 [\xd0\r\x82wL\x88\x1c<\x84JAi\x1eT\xf8x\x89\xaft\xe9M\xdbg\xfd`\xbe\x14`\xa2\xa2\xe0z\x1e\x8a\xec\x82\x9cOK\'\x1a\x96\x0b4\xc5g]\xa2\xcd\xb5\x88\xf0c}\xe7\xebC\x14i\x8f&\xa4d\x07\x0e`0w\xa5`\xba>\xe5\x1e\xa4\xb2\x90\xbf\xc2\x8d\x1bm\xd5)\xff\xb4>\x93\xcbG\xf4>\x98\x9f\xdeU\xf9,\xe9\xf7\x18d\xa5v\xef\'q\xacD\x9c\xf2o\x80\x02\xc3\xa0b\xd5q\xfb\xe1\xe5\xcb\x0f5\xabJ\xf2\xf37\xb1\xd1\x98\xe4}\x04\x8c\x04\xc1\xfc%b\xa4*\xe2\x99\x1d\x993\xc5\t\xf4\x14\x88BR\x19\xba\xc5xB\xbc\xe9\x8c\x183\xcb\x99e\xd0\x180\xd8\x08~\x8d\xe0j\xc6\xc7\xa9\xc6X\x93\xb2\xe7\xe03q\x9f\x9e\xb8<A\x18\xd5\xa6\t\x15<\x8c\xdb\xe5\xbc\x9a\xc6`\xa4E\x7f\xa3S\x86\xfc\xf1\xb4GmN\x9b,\xbd\x8a\x9b\x9a\x88\t\x9aFtx-\xa9\x8b\xbc\xe8\xe2\x89]\xebe@\xf83\xc9,\x1c\x9d\x9a\x8f\x841s\xf0\xda\x02\xfa:\x07?\x80\xae\x0fA\x9c\xa9{8\x1a\xac\x1b\x9b\x97\xc0\x97\xf8\x07\x85\xf8\xde\xfb\x98y\nl\xa0$\xac\x04\xd7\xe7\xb2\xb9\x14\xde\xf9\xe3<\xc9V\xf3pu\xc3\x02 
\xf0?\x93\x91\xdf\x8ec\xb3/@\xd1O\xa1\xe5\x95\x95\xe1tT\xbeT\xf5\x93\xd5e\x04/\x06\x80\x1a.\xeb\\\xe7\x83\xab\x83\xe4\xd5/GF\xb8\xc6\xe2?Df[\x1f\xd0c6\x8d\xc3\xa6\x83\xc0\x1a(\xac\xc7\xf4\xd74\xcb2\x05\xd2\x8e\x1d\x07e\xeb\xe2\x84\xd5,\xf9F\x89\xd8\x96\xb2x\xbfo\xcbma\x12\r}\xb6<\xa6;=\xe3\xc8m\xb9\t\x96\x15\xba\xee\xc9\xd3Zn\xae\xbfe\x94s\xfd]p|\xab\x0b\t\xb0\xa7\xd7S\x9bb}\xd4\xeb6\x1c\xc7\xa6P~\t\xbfe\xc3J\xee\xb0\xfe\xef\xac\xb7\xdbI\x81-\xabx\xb7\xb6\xf4\x14\x96\x81H\x17OWV\xd8\x8c\xdb\x1d)\xefoy\xa4\xc50\xb7\x1b\xf70<\xa7\xd5*Y\xe6\xdd\r\xb3\x82m\x18\xb3Z|\xdc\xfc\x00!\x0b\x81\xe5\n\xcaE\xf7\xce\x15\x9e\x84\x84e\x01>\xad\x0b^\xe1\x823\x18\x92\xb1\xca\xc5\xf2\xd4\x00N,\xabZ\x19\xb7\xd3e5QVAY\xa48(?\xf3\xb6\x1c\xeetX \x08\xda\xbd\xbf\x81\xf9\t\xe5\xfb\x7f\x13G\x93\xe2\xbd\x0f\xa3\x97\xc9\x93=\xc22\x89w\xe4m\xf7\xc9^\xc4\xff\x83\xbe_W\xfe\xef\x04\xe2\xde\xc2\x1e^\xb7,\xae\'i$&\x14_\x9c\x93\xd7\xda\xe3`\xbd\xbf\x93j`\xebr9A\xcb\xd9\xc4\x1c\x13\xb8S!go\x95\xf3^Y\x13_\xd4ki\xef\xe6\xc3\xb5\xc3\xe3\x9f\x11y\xc0\x92\x1a\xf3\xed\x0f\xff\xa2\x92@\xf2\xc7\x1e\xfe\x05\xa4\x11\xd5\xb9\x82\n\x00\x00']
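
The value above starts with `\x1f\x8b\x08`, which is the gzip magic number followed by the deflate method byte, so the cached value looks gzip-compressed rather than plain UTF-8 text. Assuming that is the case (it is inferred only from the leading bytes, not confirmed in this thread), a tolerant decoder could look roughly like this. `decode_license_text` and `GZIP_MAGIC` are hypothetical names, and this is not the change that was applied to the project.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_license_text(raw: bytes) -> str:
    """Decode a cached value, decompressing it first if it looks like gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Stand-in values: one compressed, one stored as plain UTF-8 bytes.
compressed = gzip.compress("Permission is hereby granted".encode("utf-8"))
print(decode_license_text(compressed))
print(decode_license_text(b"plain license text"))
```

Checking the magic bytes keeps the helper usable whether the stored values are compressed or plain, which may matter if different environments populate Redis differently.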

@BanulaKumarage
Contributor

BanulaKumarage commented Apr 4, 2023

@goneall if it's possible can you send me the value of r.mget(spdxLicenseIds) and r.mget(r.keys())?

@goneall
Member Author

goneall commented Apr 4, 2023

@BanulaKumarage The value of r.mget(spdxLicenseIds) is too big to capture on the screen. Above, I sent you the output of r.mget(spdxLicenseIds[0]). Can you use that to see if it decodes?

@goneall
Member Author

goneall commented Apr 4, 2023

BTW - you should be able to duplicate this if you use Docker and containers to run the app.

@BanulaKumarage
Contributor

> @BanulaKumarage The value of r.mget(spdxLicenseIds) is too big to capture on the screen. Above, I sent you the output of r.mget(spdxLicenseIds[0]). Can you use that to see if it decodes?

Yeah @goneall I can use that.

@BanulaKumarage
Contributor

> BTW - you should be able to duplicate this if you use Docker and containers to run the app.

Okay @goneall, I will try it that way. One small clarification: why does the issue occur specifically when running from Docker, and not when running the app without Docker?

@goneall
Member Author

goneall commented Apr 4, 2023

> Why does the issue occur specifically when running from Docker, and not when running the app without Docker?

Docker has a more reproducible environment - when you're running it from your local machine, the behavior is affected by your operating system. The production environment is also run from a Docker image. I also know that this occurs in a Docker image on my machine.

@BanulaKumarage
Contributor

Thank you @goneall for the clarification. I got the idea. I will look into it.

@rtgdk
Collaborator

rtgdk commented Apr 16, 2023

@BanulaKumarage Let's prioritize fixing this. The decoder is not able to decode this byte: https://bytetool.web.app/en/ascii/code/0x8b/. This seems to be a common issue with UTF-8 for this byte.
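
For context on the "invalid start byte" message: in UTF-8, a byte with the bit pattern `10xxxxxx` can only continue a multi-byte sequence, and `0x8b` (`0b10001011`) has that pattern, so the decoder rejects it wherever it expects a new character to begin. The sketch below illustrates this. `utf8_byte_role` is a hypothetical helper that classifies by bit pattern only; it does not check for overlong or out-of-range sequences.

```python
def utf8_byte_role(byte: int) -> str:
    """Classify a single byte by its leading bits, as a UTF-8 decoder would."""
    if byte < 0x80:
        return "ascii"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"   # 10xxxxxx, never valid as a first byte
    if byte < 0xE0:
        return "lead-2"         # 110xxxxx
    if byte < 0xF0:
        return "lead-3"         # 1110xxxx
    if byte < 0xF8:
        return "lead-4"         # 11110xxx
    return "invalid"


# The first two bytes of the value reported in this issue.
print(utf8_byte_role(0x1F), utf8_byte_role(0x8B))

try:
    b"\x1f\x8b".decode("utf-8")
except UnicodeDecodeError as err:
    # Same failure as in the traceback: the second byte cannot start a character.
    print(err.reason, "at position", err.start)
```

This suggests the input is probably not UTF-8 text at all, rather than UTF-8 text containing one problematic character.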

@BanulaKumarage
Contributor

@rtgdk Is this character coming as b'\x02' in the list to decode?

@jlovejoy
Member

@rtgdk @goneall - what is the status of this issue?

jlovejoy added the Submit New License label (issues related to the submit new license feature) Jun 21, 2023

@goneall
Member Author

goneall commented Jun 22, 2023

@jlovejoy I believe the issue is still open based on the comment history.

@goneall
Member Author

goneall commented Jul 2, 2023

Note - this issue is preventing me from deploying the latest code base.

goneall added a commit that referenced this issue Jul 2, 2023
Fixes #462 but may re-introduce unit test failures issue #202
by partially reverting PR #444

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
4 participants