Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the external file cannot be joined? #104

Closed
l1t1 opened this issue Dec 5, 2023 · 3 comments
Closed

the external file cannot be joined? #104

l1t1 opened this issue Dec 5, 2023 · 3 comments

Comments

@l1t1
Copy link

l1t1 commented Dec 5, 2023

I use this python script from #78

query the single file is ok.

> select count(*) from external('./t17.csv',COLUMNS => DESCRIPTOR(id int, owner int));
"count"
1499001

> select count(*) from external('./t17_2.csv',COLUMNS => DESCRIPTOR(id int, owner int));
"count"
5001

query the two files failed

> select a.owner, count(*) from external('./t17.csv',COLUMNS => DESCRIPTOR(id int, owner int)) as a,external('./t17_2.csv',COLUMNS => DESCRIPTOR(id int, owner i
nt)) as b where a.id=b.id group by a.owner ;
Error executing SQL: invalid input syntax for integer: file: './t17_2.csv' line:1 column:1
@l1t1
Copy link
Author

l1t1 commented Dec 5, 2023

btw, the csv file has the header.

I add the HEADER clause, it works now, I wonder why the 1st sql select count(*) from external('./t17.csv',COLUMNS => DESCRIPTOR(id int, owner int)) didn't raise an error?

select a.owner, count(*) from external('./t17.csv',COLUMNS => DESCRIPTOR(id int, owner int),HEADER => true ) as a,external('./t17_2.csv',COLUMNS => DESCRIPTOR(id int, owner int),HEADER => true) as b where a.id=b.id group by a.owner ;

@wolfroediger
Copy link
Contributor

You are correct, the header is the problem. The single file queries did not fail because count(*) will just count the number of lines in the CSV file. It does not type-check the values for all the columns. So, in your example Hyper does not check that the values for id and owner are valid integers. Since you did not specify header the count should also be off by one.

The following query will also fail for a single file since now Hyper checks that the values for owner are valid integers:

select owner, count(*) from external('./t17.csv',COLUMNS => DESCRIPTOR(id int, owner int)) group by owner;

@l1t1
Copy link
Author

l1t1 commented Dec 5, 2023

thanks @wolfroediger

@l1t1 l1t1 closed this as completed Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants