What went wrong?
Since version 0.3.x we are able to index Timestamp and Date columns. But when we try to filter by those Timestamps, the query is not parsed correctly and returns an empty DataFrame.
After a few prints and tests, I discovered that the d.getTime we use in LinearTransformer to initialize the min and max of LinearTransformation returns a Long X, while the filter value coming from the user is parsed to a different Long Y.
This mismatch prevents QuerySpecBuilder from initializing the correct space: QuerySpaceFromTo returns an EmptySpace, which contains 0 files to read.
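To make the two Longs concrete, here is a minimal sketch (the names x and y are mine, for illustration; DateTimeUtils is Spark's internal Catalyst utility for the microsecond conversion):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.catalyst.util.DateTimeUtils

val ts = Timestamp.valueOf("2017-01-03 12:02:00")

// Long X: what the indexer stores via d.getTime (milliseconds since epoch)
val x: Long = ts.getTime

// Long Y: how Spark represents a TimestampType value internally
// (microseconds since epoch)
val y: Long = DateTimeUtils.fromJavaTimestamp(ts)

assert(y == x * 1000L) // the two scales differ by a factor of 1000
```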
We are indexing in milliseconds, while Spark is filtering in microseconds (which gives a difference of 1000x).
Spark has two functions that convert a date to a Timestamp: unix_timestamp and to_timestamp. The first one processes the value as seconds, while the second one converts it to milliseconds. This is at write time.
At read time, we do not treat the Timestamp type in any special way, so the filter gets parsed to microseconds and QuerySpecBuilder is unable to find any areas of the index that match the space.
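A hedged illustration of the write-time difference, using only standard Spark SQL functions in a spark-shell session (the column name d is made up):

```scala
import org.apache.spark.sql.functions.{col, to_timestamp, unix_timestamp}

val df = spark.sql("SELECT '2017-01-03 12:02:00' AS d")

df.select(
  unix_timestamp(col("d")), // Long: seconds since the epoch
  to_timestamp(col("d"))    // TimestampType: getTime on the value yields millis
).show(truncate = false)
```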
Possible solutions:
1. Ensure that all writes are done in microseconds, or explicitly add an option for the preferred time unit.
2. Parse the Timestamp correctly when reading: detect the Timestamp data type in the filters coming from Spark and apply the transformation needed to match the index area (see the sketch below).
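A minimal sketch of the second option, normalizing filter values to the millisecond scale the index uses; toIndexMillis is a hypothetical helper, not an existing Qbeast function:

```scala
import java.sql.Timestamp

// Hypothetical helper: bring a timestamp value from a Spark filter onto the
// millisecond scale that the LinearTransformation min/max were built with.
def toIndexMillis(value: Any): Long = value match {
  // Data source filters usually carry java.sql.Timestamp, already in millis
  case t: Timestamp => t.getTime
  // Catalyst TimestampType literals are Longs in microseconds since the epoch
  case micros: Long => micros / 1000L
  case other =>
    throw new IllegalArgumentException(s"Unsupported timestamp value: $other")
}
```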
I am working on defining a document with a proper review of how to store time series. Thanks for your patience.
How to reproduce?
1. Code that triggered the bug, or steps to reproduce:
Using some code in the TransformerIndexingTest, we index a Timestamp column and then filter by it, but the returned DataFrame contains 0 rows.
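A minimal sketch of this kind of reproduction (not the original test code), assuming the documented qbeast write options (format("qbeast") with columnsToIndex) and a made-up temporary path:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.functions.col
import spark.implicits._

val path = "/tmp/qbeast-timestamp-test" // made-up location

val df = Seq(
  (1, Timestamp.valueOf("2017-01-03 12:02:00")),
  (2, Timestamp.valueOf("2017-01-04 12:02:00"))
).toDF("id", "date")

// Index the Timestamp column with Qbeast
df.write.format("qbeast").option("columnsToIndex", "date").save(path)

// Filtering by the indexed Timestamp should return 1 row, but returns 0
spark.read.format("qbeast")
  .load(path)
  .filter(col("date") > Timestamp.valueOf("2017-01-04 00:00:00"))
  .show()
```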
2. Branch and commit id:
I'm using the photon branch: photon-datasource-standalone
3. Spark version:
On the spark shell run spark.version.
Spark version 3.3.0 and Delta version 2.1.0
4. Hadoop version:
On the spark shell run org.apache.hadoop.util.VersionInfo.getVersion().
3.3.4
5. How are you running Spark?
Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local computer?
Local