Dataset pull fixes #560
Deploying datachain-documentation with Cloudflare Pages
I am getting the error below when using pull with:
The datasets were created using:
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #560 +/- ##
==========================================
- Coverage 87.89% 87.87% -0.02%
==========================================
Files 96 96
Lines 9907 9908 +1
Branches 1350 1350
==========================================
- Hits 8708 8707 -1
- Misses 859 860 +1
- Partials 340 341 +1
Flags with carried forward coverage won't be shown.
@amritghimire Thanks for finding the issue. It is more related to dataset instantiation, for which I will create a separate issue, as it's not trivial to fix at the moment. I'd merge the current PR so that at least dataset pull without instantiation works correctly (which is enough for the majority of client use cases). Created a follow-up issue related to instantiation: #566
I've fixed the test. In the end I decided to use a 64-bit random number but limit it to 2^63 - 1, which is the largest value that can be stored in SQLite, as it only supports signed integers. ClickHouse supports unsigned ones, so its largest value is 2^64 - 1, but as mentioned, 2^63 - 1 is the largest value supported by both DBs. This gives us a much larger number "pool" than switching to a 32-bit int.
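A quick arithmetic check of the bound described above, assuming SQLite's INTEGER is a signed 64-bit value and ClickHouse's UInt64 is unsigned 64-bit. `MAX_RANDOM` here only mirrors the value discussed in the thread; it is not the actual `DataTable` attribute.

```python
# Integer ranges of the storage types discussed in the thread.
SQLITE_INT_MAX = 2**63 - 1         # SQLite INTEGER: signed 64-bit
CLICKHOUSE_UINT64_MAX = 2**64 - 1  # ClickHouse UInt64: unsigned 64-bit
UINT32_MAX = 2**32 - 1             # the 32-bit alternative that was considered

# Shared upper bound: the largest value both backends can store.
MAX_RANDOM = min(SQLITE_INT_MAX, CLICKHOUSE_UINT64_MAX)

# Size of the value "pool" [0, MAX] for each choice.
pool_64 = MAX_RANDOM + 1  # 2**63 values
pool_32 = UINT32_MAX + 1  # 2**32 values

print(MAX_RANDOM == 2**63 - 1)  # True
print(pool_64 // pool_32)       # 2147483648, i.e. 2**31 times larger
```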
-    RAND_MAX = 2**63  # noqa: N806
-else:
-    RAND_MAX = 2**64  # noqa: N806
+RAND_MAX = DataTable.MAX_RANDOM  # noqa: N806
[Q] Won't this test fail when the number drawn by ClickHouse is > 2^63?
And if my math is correct, the chance of that happening is:
(2^64 - 2^63) / 2^64 = 0.5, or 50%
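The reviewer's estimate can be reproduced exactly, assuming the old ClickHouse draw was uniform over [0, 2^64) and that the test failed for any value at or above 2^63:

```python
from fractions import Fraction

# Old ClickHouse range: 2**64 equally likely values in [0, 2**64).
total = 2**64
# Values the test could handle: [0, 2**63).
accepted = 2**63

# Probability that a single draw lands outside the accepted range.
p_fail = Fraction(total - accepted, total)
print(p_fail)         # 1/2
print(float(p_fail))  # 0.5
```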
In ClickHouse we have an explicit limit of 2^63 - 1: https://github.com/iterative/studio/pull/10860/files#diff-1136c5032c82c12345c3103e18093d3256798daa2d4b13e5f5f2d5e8670ac32bR328
In general, we now know the range of that random number regardless of the DB implementation.
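With a single known range, test code no longer needs a per-database RAND_MAX branch. A minimal sketch, assuming values lie in [0, 2^63 - 1]; `MAX_RANDOM` and `normalize` are hypothetical stand-ins, not datachain APIs, and Python's `random` module only simulates the database-generated random column.

```python
import random

MAX_RANDOM = 2**63 - 1  # hypothetical mirror of the shared limit


def normalize(value: int) -> float:
    """Map a random-column value onto [0.0, 1.0] (hypothetical helper)."""
    if not 0 <= value <= MAX_RANDOM:
        raise ValueError(f"{value} is outside [0, {MAX_RANDOM}]")
    return value / MAX_RANDOM


# Simulate the random column; the same bound applies to both backends.
rng = random.Random(0)
values = [rng.randint(0, MAX_RANDOM) for _ in range(1000)]
normalized = [normalize(v) for v in values]
assert all(0.0 <= f <= 1.0 for f in normalized)
```

Because the bound is shared, a value such as 2^63 is now rejected on every backend instead of being valid only on ClickHouse.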
Fixing various issues regarding dataset pull after the recent feature-based schema changes:
- Using the UInt32 type, as it's needed for the ClickHouse random column, since UInt64 is too big for SQLite
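The SQLite limitation behind this change can be checked with Python's built-in `sqlite3` module, which binds Python ints as signed 64-bit SQLite INTEGERs: the UInt32 maximum round-trips, while the UInt64 maximum cannot be bound. This only demonstrates the storage limit; it is not datachain code.

```python
import sqlite3

UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (rand INTEGER)")

# A UInt32 value round-trips unchanged.
conn.execute("INSERT INTO t VALUES (?)", (UINT32_MAX,))
stored = conn.execute("SELECT rand FROM t").fetchone()[0]
print(stored == UINT32_MAX)  # True

# A UInt64 value above 2**63 - 1 cannot be bound as an SQLite INTEGER
# (CPython raises OverflowError; sqlite3.Error is caught defensively).
try:
    conn.execute("INSERT INTO t VALUES (?)", (UINT64_MAX,))
    uint64_fits = True
except (OverflowError, sqlite3.Error):
    uint64_fits = False
print(uint64_fits)  # False
conn.close()
```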