Mappings JSON to CSV converter does not work #3

Open
philipphu opened this issue Mar 29, 2021 · 1 comment

@philipphu

Running the last command of the "Convert_Files_Into_JSON_And_CSV" pipeline produces the following error:
python Mapping_JsonToCsvConverter.py --mapping_path data/mappings

Traceback (most recent call last):
  File "Mapping_JsonToCsvConverter.py", line 76, in <module>
    sys.exit(main())
  File "Mapping_JsonToCsvConverter.py", line 68, in main
    writeCsvHeader(delimiter, output, str(behid), str(hierpath.get("path", "")), str(hierpath.get("id", -1)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 86-87: ordinal not in range(128)

The cause seems to be non-breaking space characters in the mappings.json file. In my file, at least, they often occur in the following three hierarchies:
{"id":1111742,"path":"Lotame Category Hierarchy^Automobiles^Automobile Brands^Asian Made^Mazda^Mazda CX-9 "}
{"id":1111771,"path":"Lotame Category Hierarchy^Automobiles^Automobile Brands^Asian Made^Nissan^Nissan Rogue "}
{"id":1111687,"path":"Lotame Category Hierarchy^Automobiles^Automobile Brands^European Made^BMW^BMW M-Series "}

These come from the mapping file for feed IDs 72 and 73 (which share the same mapping file, according to the API response).
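To find the affected nodes in a mapping file, a small diagnostic sketch like the following could help. It assumes the file has the `hierarchy_nodes` list of `{"id", "path"}` objects that the converter reads. The sample data is hypothetical: it assumes the Nissan Rogue path ends in U+00C2 U+00A0 ("Â" followed by a no-break space), which is the text whose UTF-8 bytes (b'\xc3\x82\xc2\xa0') the quick fix below strips. The visible Nissan Rogue path is 86 characters long, so two such trailing characters would sit at positions 86-87, as in the traceback. The second node and the helper names are made up for illustration.

```python
import json
import unicodedata

# Hypothetical sample in the shape the converter reads (js['hierarchy_nodes']).
# The Nissan Rogue path is assumed to end in U+00C2 U+00A0 ("Â" + no-break space),
# the text whose UTF-8 bytes are b'\xc3\x82\xc2\xa0'. The second node is made up.
SAMPLE_JSON = r"""
{"hierarchy_nodes": [
  {"id": 1111771,
   "path": "Lotame Category Hierarchy^Automobiles^Automobile Brands^Asian Made^Nissan^Nissan Rogue\u00c2\u00a0"},
  {"id": 0, "path": "Example^ASCII^Only"}
]}
"""


def non_ascii_nodes(nodes):
    """Yield (id, character names) for every node whose path contains non-ASCII text."""
    for node in nodes:
        path = node.get("path", "")
        bad = [ch for ch in path if ord(ch) > 127]
        if bad:
            names = [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in bad]
            yield node.get("id", -1), names


def ascii_error(path):
    """Return the UnicodeEncodeError the ASCII codec raises for path, or None if it is pure ASCII."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    nodes = json.loads(SAMPLE_JSON)["hierarchy_nodes"]
    for node_id, names in non_ascii_nodes(nodes):
        print(node_id, names)
    # Same error class as in the traceback, raised directly by the ASCII codec.
    print(ascii_error(nodes[0]["path"]))
```

Running this against the real mappings.json (with json.load on the file instead of SAMPLE_JSON) would list every node that trips the ASCII codec, not only the three above.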

@philipphu (author)

As a quick fix, I changed these lines in Mapping_JsonToCsvConverter.py
from

for hierpath in js.get('hierarchy_nodes', []):
    writeCsvHeader(delimiter, output, str(behid), str(hierpath.get("path", "")), str(hierpath.get("id", -1)))

to

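for hierpath in js.get('hierarchy_nodes', []):
    hier = str(hierpath.get("path", ""))
    hier = hier.encode('utf-8')
    # Strip the mis-encoded sequences (e.g. "Â" + no-break space) at the byte level.
    for replacement in [b'\xc3\x82\xc2\xa0', b'\xc3\x83\xc2\xa2a']:
        hier = hier.replace(replacement, b'')
    # Decode back to text; str(hier) on bytes would write "b'...'" into the CSV in Python 3.
    hier = hier.decode('utf-8')
    writeCsvHeader(delimiter, output, str(behid), hier, str(hierpath.get("id", -1)))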
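A text-level variant of the same idea, as a sketch: it keeps the path as str throughout (so there is no bytes-to-str round trip), strips the same two sequences the quick fix removes, turns any remaining no-break space into a regular space, and writes rows with Python's csv module. The names clean_path, write_rows and DEBRIS, the (behid, path, id) column order, and the behid value are all hypothetical; the real writeCsvHeader may lay out its output differently.

```python
import csv
import io

# The same two sequences the quick fix strips, written as text instead of
# UTF-8 bytes: "Â" + no-break space, and "Ã¢a". Treating them as debris is an
# assumption carried over from that fix.
DEBRIS = ("\u00c2\u00a0", "\u00c3\u00a2a")


def clean_path(path: str) -> str:
    """Remove known debris, turn any remaining no-break space into a space, trim the ends."""
    for seq in DEBRIS:
        path = path.replace(seq, "")
    return path.replace("\u00a0", " ").strip()


def write_rows(nodes, behid, out, delimiter=","):
    """Write one row per hierarchy node.

    The (behid, path, id) column order is a hypothetical stand-in for whatever
    writeCsvHeader actually writes.
    """
    writer = csv.writer(out, delimiter=delimiter)
    for node in nodes:
        writer.writerow([str(behid), clean_path(node.get("path", "")), str(node.get("id", -1))])


if __name__ == "__main__":
    nodes = [{"id": 1111771,
              "path": "Lotame Category Hierarchy^Automobiles^Automobile Brands^Asian Made^Nissan^Nissan Rogue\u00c2\u00a0"}]
    buf = io.StringIO()
    write_rows(nodes, behid=123, out=buf)  # behid=123 is a made-up value
    print(buf.getvalue(), end="")
    # For a real file, pass an explicit encoding so the locale default cannot fall back to ASCII:
    # with open("mappings.csv", "w", encoding="utf-8", newline="") as fh:
    #     write_rows(nodes, behid=123, out=fh)
```

Opening the output with encoding="utf-8" may be worth doing even with the cleanup in place: in Python 3 the default file encoding comes from the locale, which can be ASCII in minimal environments, so any other legitimate non-ASCII text in a path would otherwise hit the same 'ascii' codec error.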
