Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Float64 for Clickhouse (SaaS) #12

Closed
ilongin opened this issue Jul 11, 2024 · 13 comments
Closed

Add support for Float64 for Clickhouse (SaaS) #12

ilongin opened this issue Jul 11, 2024 · 13 comments
Labels
enhancement New feature or request priority-p1

Comments

@ilongin
Copy link
Contributor

ilongin commented Jul 11, 2024

The problem is that if number that can only be represented with double precision, e.g 0.3813476562 is inserted using CH database (in SaaS), it's actually inserted as single precision (Float32) as python float is saved as Float32 column type in database by default. This way precision of that value is lost for user.

One solution is to support datachain.sql.types when defining output schema (currently only python basic values are supported together with Feature type).

Example that works for SQLite but fails for Clickhouse:

import pandas as pd
from datachain.lib.dc import DataChain

num = 0.3813476562   # number that requires 64 bits to not loose precision
chain = DataChain.from_pandas(pd.DataFrame([[num]], columns=['number']))
assert chain.collect_one("number")[0] == num
@ilongin ilongin added enhancement New feature or request priority-p1 labels Jul 11, 2024
@skshetry
Copy link
Member

@ilongin, would it be possible to change the test assumptions to reduce precision (eg: with pytest.approx)?

@ilongin
Copy link
Contributor Author

ilongin commented Jul 11, 2024

I will fix the tests for now (using 0.5 value everywhere for float) but I also added TODO with link on this issue to change values to something more realistic when this is fixed
#13

@skshetry
Copy link
Member

pinging @dmpetrov for changing default size of float.

@dmpetrov
Copy link
Member

It looks like we have to move to float64 by default since it's python default type. Related to iterative/dvcx#1697

@dmpetrov
Copy link
Member

assert chain.collect_one("number")[0] == num

That's not a usualy way of comparing floats. Please use pytest.approx(x) or asb(val - target) < 1e-6

Otherwise, moving to float64 makes sense. We should do that everywhere.

@ilongin
Copy link
Contributor Author

ilongin commented Jul 11, 2024

assert chain.collect_one("number")[0] == num

That's not a usualy way of comparing floats. Please use pytest.approx(x) or asb(val - target) < 1e-6

Otherwise, moving to float64 makes sense. We should do that everywhere.

That was just an example to show that values are different between Clickhouse and SQLite so this is not a real test or something.

@dmpetrov
Copy link
Member

so this is not a real test or something.

oh, now got it!

@dmpetrov
Copy link
Member

@ilongin is it still relevant or we can close?
Based on my understanding we are working with float64 only rn. Other types are not supported yet.

@ilongin
Copy link
Contributor Author

ilongin commented Aug 12, 2024

@dmpetrov yes, looks like example in description works now so I'm closing this issue.

@ilongin ilongin closed this as completed Aug 12, 2024
@skshetry
Copy link
Member

I don't think we have made any changes to this. @ilongin, can you point me the commit that fixed this please?

@ilongin
Copy link
Contributor Author

ilongin commented Aug 13, 2024

@skshetry I made a mistake testing this on SQLite DB instead of Clickhouse. Reopening it agan.

@dreadatour
Copy link
Contributor

Let's discuss this approach, please? https://github.com/iterative/studio/pull/10528 🙏👀

@dreadatour
Copy link
Contributor

Should be fixed now by related Studio issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority-p1
Projects
None yet
Development

No branches or pull requests

4 participants