Check for encoding on resources from KN to allow for more dynamic snippets #65

sarahrichmond · 2018-10-23T23:39:35Z

Knowing the encoding of each resource will let us write more dynamic snippets that visualise the data better.

jyucsiro · 2018-10-24T01:35:11Z

Found this Python library, which can detect the character encoding of a CSV file:
https://github.com/chardet/chardet

The CSV code in a Jupyter environment gets a bit messy, though, when it has to figure out which encoding to use... there could be 30+ candidate encodings.
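One way to keep that mess out of the notebook is to hide the guessing behind a single helper. The following is only a rough sketch: `guess_encoding`, `read_csv_rows`, the 0.5 confidence threshold, and the fallback list are all made up for illustration. It uses `chardet.detect()` when chardet is installed and otherwise tries a short list of common encodings.

```python
import csv
import io

try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable without chardet
    chardet = None

# Hypothetical fallback order, used only when chardet is missing or unsure.
FALLBACK_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def guess_encoding(raw: bytes) -> str:
    """Return a best-guess encoding name for the given bytes."""
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        # The 0.5 threshold is an arbitrary assumption, not a recommendation.
        if guess["encoding"] and guess["confidence"] >= 0.5:
            return guess["encoding"]
    for encoding in FALLBACK_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 can decode any byte sequence


def read_csv_rows(raw: bytes) -> list:
    """Decode the bytes with the guessed encoding and parse them as CSV."""
    text = raw.decode(guess_encoding(raw), errors="replace")
    return list(csv.reader(io.StringIO(text)))
```

A snippet would then only need `rows = read_csv_rows(downloaded_bytes)`, where `downloaded_bytes` stands for whatever the download step returns. Detection is a statistical guess, so short or mostly-ASCII files can still be misidentified.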

hoylen · 2018-10-24T07:50:26Z

And it is not just the character encoding that might be different. I've seen many variations of CSV around (e.g. how they treat commas, new lines and escape characters in values). There is no real standard... and even if there were, not everyone would implement it properly.
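For the simpler variations (delimiter and quoting), Python's standard library can make a guess with `csv.Sniffer`. This is a sketch only: `parse_with_sniffed_dialect` and the candidate delimiter string are assumptions for illustration, and the sniffer is a heuristic that can guess wrong on small or irregular samples.

```python
import csv
import io


def parse_with_sniffed_dialect(text: str, candidates: str = ",;\t|") -> list:
    """Parse CSV text, guessing the delimiter and quoting from a sample."""
    sample = text[:4096]  # sniff only the start of the file
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Could not determine a dialect: fall back to plain comma-separated.
        dialect = csv.excel
    return list(csv.reader(io.StringIO(text), dialect))
```

For example, `parse_with_sniffed_dialect("a;b;c\n1;2;3\n4;5;6\n")` splits on semicolons rather than commas. Variants such as backslash-escaped commas are not something the sniffer is designed to detect, so those would still need an explicit `escapechar` setting chosen some other way.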

It feels like we need a general framework where different snippets can be assigned to different data sets, based on an expandable set of rules and metadata.

Currently, we (want to) have two snippets: download CSV and download anything. But the further we go, we'll have to deal with more variants (e.g. download UTF-8 CSV, download CSV that puts values with commas in double quotes, download CSV that uses backslashes to escape commas).

At one extreme, the rules need only find one snippet for a type of file. At the other extreme, there might need to be a custom snippet that is only used for one particular dataset. In between, a single snippet is used with all CSV from a particular publisher, but a different snippet is used for other publishers. That is, the metadata for the rules might already be available, or in the worst case there needs to be a "use this particular snippet" metadata property.
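To make the idea concrete, here is a minimal sketch of such a rule lookup. Everything in it is hypothetical: the `Resource` and `Rule` classes, the metadata field names, the snippet names, and the publisher name are invented for illustration and do not describe the existing code. An explicit per-dataset override wins, then the most specific matching rule, then "download anything".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    """Hypothetical metadata for one downloadable resource."""
    dataset_id: str
    publisher: str
    file_format: str
    snippet_override: Optional[str] = None  # "use this particular snippet"


@dataclass(frozen=True)
class Rule:
    """Maps metadata conditions to a snippet; None means 'any value'."""
    snippet: str
    file_format: Optional[str] = None
    publisher: Optional[str] = None

    def matches(self, resource: Resource) -> bool:
        pairs = ((self.file_format, resource.file_format),
                 (self.publisher, resource.publisher))
        return all(want is None or want == have for want, have in pairs)

    def specificity(self) -> int:
        # More conditions set means a more specific rule.
        return sum(v is not None for v in (self.file_format, self.publisher))


FALLBACK_SNIPPET = "download-anything"

# Example rule set; all names are made up.
RULES = [
    Rule(snippet="download-csv", file_format="csv"),
    Rule(snippet="download-csv-backslash-escapes",
         file_format="csv", publisher="example-publisher"),
]


def choose_snippet(resource: Resource, rules: Sequence[Rule] = RULES) -> str:
    """Pick the override, else the most specific matching rule, else fallback."""
    if resource.snippet_override:
        return resource.snippet_override
    candidates = [rule for rule in rules if rule.matches(resource)]
    if not candidates:
        return FALLBACK_SNIPPET
    return max(candidates, key=Rule.specificity).snippet
```

New variants are then handled by appending a `Rule` rather than changing the lookup code, which is the "expandable set of rules" part of the idea.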

Maintaining this will be a lot of work, so maybe we should let users contribute. Or at least let them tell us when a snippet no longer works for a particular dataset and/or to vote it down. Maybe they can be given a pop-up menu of possible snippets they can use, with a default already chosen, but with other options that might work -- with the "download anything" snippet as the option of last resort. Sounds like a code sharing project/feature in its own right!

sarahrichmond assigned jyucsiro Oct 23, 2018
