S3 csv source lost data #7063

Closed
Tracked by #6640
huangjw806 opened this issue Dec 26, 2022 · 6 comments
Labels
type/bug Something isn't working

Comments

@huangjw806
Contributor

Describe the bug

The S3 CSV source loses data.

To Reproduce

S3 CSV file:

id,name,idNum
11,xx1,10341
12,xx2,20231
1. Data is lost; only one of the two rows is returned:
dev=> create materialized source s1(
    id int,
    name varchar,
    idNum varchar,
    primary key(id)
) with (
    connector = 's3',
    s3.region_name = 'ap-southeast-1',
    s3.bucket_name = 'jianwei-test',
    s3.credentials.access = 'xxxx',
    s3.credentials.secret = 'xxxx'
) row format csv delimited by ',';

dev=> select * from s1;
 id | name | idnum
----+------+-------
 11 | xx1  | 10341
(1 row)
huangjw806 added the type/bug label on Dec 26, 2022
github-actions bot added this to the release-0.1.16 milestone on Dec 26, 2022
@waruto210
Contributor

waruto210 commented Dec 27, 2022

I guess it's because there is no line break at the end of the last line.

> A record ends at a line terminator.

Please add one and retry. If the error still occurs, please attach the compute node log.
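To illustrate this hypothesis, here is a minimal Python sketch of a hypothetical splitter, `split_records_naive` (not RisingWave's actual parser, which is written in Rust), that emits a record only when it sees a line terminator. It ignores quoting and only shows the failure mode: fed the sample file from the report without a trailing newline, it drops the last row, just like the `select` output above.

```python
def split_records_naive(data: str) -> list[str]:
    """Hypothetical splitter: a record is emitted only when a line terminator is seen."""
    records = []
    current = []
    for ch in data:
        if ch == "\n":
            # Line terminator: close the current record (tolerating CRLF).
            records.append("".join(current).rstrip("\r"))
            current = []
        else:
            current.append(ch)
    # The bug being illustrated: whatever is left in `current` at EOF is silently dropped.
    return records


# The sample file from the report, without a newline after the last row.
sample = "id,name,idNum\n11,xx1,10341\n12,xx2,20231"

print(split_records_naive(sample))
# → ['id,name,idNum', '11,xx1,10341']
print(split_records_naive(sample + "\n"))
# → ['id,name,idNum', '11,xx1,10341', '12,xx2,20231']
```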

@fuyufjh
Member

fuyufjh commented Dec 28, 2022

> A record ends at a line terminator

I don't think that is necessary 🤔

@waruto210
Contributor

The CSV definition I found requires this, and so does the CSV library used by our CSV parser.

@huangjw806
Contributor Author

It does work fine after adding a line break at the end of the last line.

If this can't be avoided, maybe we should state in the documentation that files in S3 must follow the CSV rules.
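Until the parser handles this, the workaround confirmed above can be applied to a file before uploading it. A minimal Python sketch with a hypothetical helper, `ensure_trailing_newline` (not part of RisingWave or any S3 tooling), assuming LF line endings:

```python
def ensure_trailing_newline(data: bytes) -> bytes:
    """Append a newline if `data` is non-empty and does not already end with one.

    Assumes LF line endings; empty input is returned unchanged.
    """
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


# The sample file from the report, without a final newline.
sample = b"id,name,idNum\n11,xx1,10341\n12,xx2,20231"
fixed = ensure_trailing_newline(sample)

print(fixed.endswith(b"\n"))  # → True
print(ensure_trailing_newline(fixed) == fixed)  # → True (applying it twice changes nothing)
```

This is only a stopgap on the producer side; it does not change how the source itself parses files.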

@fuyufjh
Member

fuyufjh commented Dec 28, 2022

@waruto210 I just tried the official example provided by rust-csv (which is based on csv_core, developed by the same author), and it turns out the last row without a trailing newline is parsed correctly.

@waruto210
Contributor

Thank you, I'll fix that.
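A minimal sketch of what such a fix looks like, written in Python for brevity rather than Rust: the hypothetical `split_records` below treats a newline outside double quotes as a record terminator and, matching the rust-csv behavior observed above, flushes any pending record at end of input. It only splits raw records and does not unquote or split fields.

```python
def split_records(data: str) -> list[str]:
    """Split CSV text into raw record strings.

    A newline ends a record only outside double quotes, and a non-empty
    pending record is flushed at EOF even without a final line terminator.
    """
    records = []
    current = []
    in_quotes = False
    for ch in data:
        if ch == '"':
            # A doubled quote ("") inside a quoted field toggles twice,
            # so the quoting state is unchanged afterwards.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "\n" and not in_quotes:
            records.append("".join(current).rstrip("\r"))
            current = []
        else:
            current.append(ch)
    # The fix: emit the last record even if the input did not end with a newline.
    if current:
        records.append("".join(current).rstrip("\r"))
    return records


sample = "id,name,idNum\n11,xx1,10341\n12,xx2,20231"
print(split_records(sample))
# → ['id,name,idNum', '11,xx1,10341', '12,xx2,20231']
```

Checking `if current` before flushing keeps a file that does end with a newline from producing an extra empty record.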

Projects
None yet
Development

No branches or pull requests

3 participants