Malformed response #43

Open
kipliklotrika opened this issue Jul 7, 2020 · 2 comments

Comments

@kipliklotrika

When querying for records like the following (shown as a CSV export):

"content","created","record_url"
"ابراهيم الذيفاني
هذا من أروع الشباب جدآ ونشاطآ وعملآ
منذ العام ٢٠١١م وهو بنفس النشاط والهمه في مجاله الإعلامي والإنساني والحقوقي.
هذا هو الصخره التي تتحطم عليها عيون الحاسدين
كم تعرض لحملات تشويه وتشهير حتى على المستوى الشخصي من ناس أمراض وحاسدين وحاقدين.
نفتخر بتألقك يا إبراهيم ومزيدآ من التقدم والإزدهار","1593891864000","https://some.url"

the result gets split on the newline character, and the result object looks like the following:

{
    "Items": [
        {
            "content": "هذا من أروع الشباب جدآ ونشاطآ وعملآ"
        },
        {
            "content": "منذ العام ٢٠١١م وهو بنفس النشاط والهمه في مجاله الإعلامي والإنساني والحقوقي."
        },
        {
            "content": "هذا هو الصخره التي تتحطم عليها عيون الحاسدين"
        },
        {
            "content": "كم تعرض لحملات تشويه وتشهير حتى على المستوى الشخصي من ناس أمراض وحاسدين وحاقدين."
        },
        {
            "content": "نفتخر بتألقك يا إبراهيم ومزيدآ من التقدم والإزدهار\"",
            "created": "1593891864000",
            "record_url": "https://some.url"
        }
    ],
    "EngineExecutionTimeInMillis": 4155,
    "DataScannedInBytes": 801028163,
    "TotalExecutionTimeInMillis": 4320,
    "QueryQueueTimeInMillis": 120,
    "ServiceProcessingTimeInMillis": 45,
    "DataScannedInMB": 764,
    "QueryCostInUSD": 0.003642752,
    "Count": 5,
    "QueryExecutionId": "90ed2b0a-b1c9-4c37-a27c-c42aa7ab9034",
    "S3Location": "s3://some-bucket/90ed2b0a-b1c9-4c37-a27c-c42aa7ab9034.csv"
}

Seen in athena-express v6.0.1.

@hi019

hi019 commented Jul 23, 2020

This is Athena's 'fault'. The default table creation settings use \n (newline) as the line termination character. You can set LINES TERMINATED BY when creating a table (example, docs)

@kipliklotrika

Regardless of the line termination character, that same character can appear inside any of the values. In the case above, a newline character is part of the content column, so readLine() cannot be used to parse such a file. The parser should handle quoted values that span multiple lines. For example, https://www.npmjs.com/package/csv-reader handles this well.
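
To illustrate the point about quoted values, here is a minimal sketch of a quote-aware parser that walks the text character by character instead of splitting on newlines. It is not how athena-express or csv-reader is implemented; `parseCsv` and `toObjects` are hypothetical helper names, and the sketch assumes comma-separated input with double-quoted fields and `""` as the escaped quote (RFC 4180 style).

```javascript
// Hypothetical helper: parse CSV text into an array of rows (arrays of strings).
// Newlines inside double-quoted fields are kept as part of the field value.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let pending = false; // true once the current record has consumed any character

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // doubled quote is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // any other character, including \n, belongs to the field
      }
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one terminator
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      pending = false;
      continue;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else {
      field += ch;
    }
    pending = true;
  }

  // Flush the last record when the input does not end with a line terminator.
  if (pending) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Hypothetical helper: use the first row as column names for the remaining rows.
function toObjects(rows) {
  const [header, ...records] = rows;
  return records.map((r) =>
    Object.fromEntries(header.map((name, i) => [name, r[i]]))
  );
}
```

With a parser along these lines, a record whose content column contains line breaks comes back as one item with all three columns, rather than one item per line.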


2 participants