This repository has been archived by the owner on Jan 8, 2025. It is now read-only.
{
  "id": "extraction-97066793-d536-4dd1-92bc-600a11415aa7",
  "status": "failed",
  "result": {
    "created_at": "2023-08-21T13:35:52.195837",
    "enqueued_at": "2023-08-21T13:35:52.195896",
    "started_at": "2023-08-21T13:35:52.216881",
    "job_result": null,
    "job_error": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.10/site-packages/rq/worker.py\", line 1428, in perform_job\n rv = job.perform()\n File \"/usr/local/lib/python3.10/site-packages/rq/job.py\", line 1278, in perform\n self._result = self._execute()\n File \"/usr/local/lib/python3.10/site-packages/rq/job.py\", line 1315, in _execute\n result = self.func(*self.args, **self.kwargs)\n File \"/workers/./operations.py\", line 249, in data_card\n dataset_response, dataset_dataframe, dataset_csv_string = get_dataset_from_tds(\n File \"/workers/./utils.py\", line 148, in get_dataset_from_tds\n dataframe = pandas.read_csv(dataset_file)\n File \"/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py\", line 912, in read_csv\n return _read(filepath_or_buffer, kwds)\n File \"/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py\", line 577, in _read\n parser = TextFileReader(filepath_or_buffer, **kwds)\n File \"/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py\", line 1407, in __init__\n self._engine = self._make_engine(f, self.engine)\n File \"/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py\", line 1679, in _make_engine\n return mapping[engine](f, **self.options)\n File \"/usr/local/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py\", line 93, in __init__\n self._reader = parsers.TextReader(src, **kwds)\n File \"pandas/_libs/parsers.pyx\", line 550, in pandas._libs.parsers.TextReader.__cinit__\n File \"pandas/_libs/parsers.pyx\", line 639, in pandas._libs.parsers.TextReader._get_header\n File \"pandas/_libs/parsers.pyx\", line 850, in pandas._libs.parsers.TextReader._tokenize_rows\n File \"pandas/_libs/parsers.pyx\", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status\n File \"pandas/_libs/parsers.pyx\", line 2021, in pandas._libs.parsers.raise_parser_error\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 87107: invalid continuation byte\n"
  }
}
This can be addressed by dynamically detecting the encoding prior to reading the CSV in pandas. See this notebook for reference on how to do this with chardet.
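For anyone picking this up, here is a minimal sketch of what "detect the encoding first" could look like. It is not the code from the linked notebook and not the actual contents of `utils.py`; the function names `detect_encoding` and `read_csv_detecting_encoding` are hypothetical, and it assumes the CSV payload is already available as bytes. It tries UTF-8 first, then asks chardet for a guess, and falls back to Latin-1 (which can decode any byte sequence) if chardet is not installed or returns no guess.

```python
import io


def detect_encoding(raw: bytes, fallback: str = "latin-1") -> str:
    """Guess the text encoding of `raw` (hypothetical helper, not from the repo)."""
    # Fast path: most files are valid UTF-8, and strict decoding proves it.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # chardet is a third-party package; treat it as optional in this sketch.
    try:
        import chardet
    except ImportError:
        return fallback

    # chardet.detect returns a dict whose "encoding" value may be None
    # when it cannot make a guess.
    guess = chardet.detect(raw).get("encoding")
    return guess or fallback


def read_csv_detecting_encoding(raw: bytes):
    """Parse CSV bytes with pandas using the detected encoding."""
    import pandas  # imported lazily so detect_encoding works without pandas

    return pandas.read_csv(io.BytesIO(raw), encoding=detect_encoding(raw))
```

A detector guess is a heuristic, not a guarantee, so the decoded text can still be wrong for unusual encodings. Running chardet over a whole large file can also be slow; detecting on a leading sample of the bytes is a common trade-off, at the cost of missing non-ASCII bytes that only appear later in the file (here the offending byte is at position 87107).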
Yes, it would be helpful to get this information as part of the metadata of a Dataset on TDS instead of detecting it on ta1-service.
This is our perennial issue: since TDS doesn't ever "touch" the data, it has no way to pull the encoding and store it. It would have to be the TA1 service or the HMI server.
@YohannParis reports an issue with this dataset (us-counties-2023.csv) when trying to profile it, since it's not UTF-8. The service errors with the UnicodeDecodeError traceback shown in the job result above.