Transformation into data class corrupts TimeZones of Instant #1047

Open
martinitus opened this issue Jan 31, 2025 · 1 comment

Comments

@martinitus

Reproduction:

@Test
fun foo() {

    data class TimeZonesTest(val with_timezone_offset: Instant, val without_timezone_offset: LocalDateTime)

    val csvContent =
        """
        with_timezone_offset,without_timezone_offset
        2024-12-12T13:00:00+01:00,2024-12-12T13:00:00
        """.trimIndent()

    val df = DataFrame.readCsv(
        csvContent.byteInputStream(),
//            colTypes = mapOf("with_timezone_offset" to ColType.Instant)   // *1
//            parserOptions = ParserOptions(dateTimeFormatter = ISO_OFFSET_DATE_TIME), // *2
    )

    println(df)
    println(df.schema())
    val parsed = df.toListOf<TimeZonesTest>().first()
    assertEquals(Instant.parse("2024-12-12T13:00:00+01:00"), parsed.with_timezone_offset)
}

This outputs:

  with_timezone_offset without_timezone_offset
0     2024-12-12T12:00        2024-12-12T13:00

with_timezone_offset: kotlinx.datetime.LocalDateTime
without_timezone_offset: kotlinx.datetime.LocalDateTime

org.opentest4j.AssertionFailedError: 
Expected :2024-12-12T12:00:00Z
Actual   :2024-12-12T11:00:00Z

Changing the dateTimeFormatter (*2) has no effect on the test outcome.
Explicitly telling the parser to parse as Instant (*1) fixes the issue.
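
The one-hour gap between the expected and actual values is consistent with a two-step conversion: the offset is dropped after normalizing to UTC, and the resulting LocalDateTime is later turned into an Instant using the machine's local zone. The sketch below reproduces that arithmetic with `java.time` as a stand-in for `kotlinx.datetime`. Both steps, the helper names `dropOffset` and `reinterpret`, and the UTC+1 system zone are assumptions made for illustration, not confirmed library behavior.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.OffsetDateTime
import java.time.ZoneOffset

// Assumed step 1: normalize to UTC, then keep only the local date-time part.
// Hypothetical helper, not part of the DataFrame API.
fun dropOffset(text: String): LocalDateTime =
    OffsetDateTime.parse(text)
        .withOffsetSameInstant(ZoneOffset.UTC)
        .toLocalDateTime()

// Assumed step 2: read the local date-time as if it were wall-clock time
// in the machine's zone. Hypothetical helper, not part of the DataFrame API.
fun reinterpret(local: LocalDateTime, systemOffset: ZoneOffset): Instant =
    local.toInstant(systemOffset)

fun main() {
    val text = "2024-12-12T13:00:00+01:00"
    val local = dropOffset(text)

    println(local)                                      // 2024-12-12T12:00
    println(OffsetDateTime.parse(text).toInstant())     // 2024-12-12T12:00:00Z
    println(reinterpret(local, ZoneOffset.ofHours(1)))  // 2024-12-12T11:00:00Z
}
```

If this reading is right, the result of the test would also depend on the time zone of the machine running it, which would make the silent conversion even harder to spot.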

However, following the principle of least surprise, IMHO it would be a lot better to:

  • either have the conversion fail, because the column was processed as LocalDateTime, hence has no timezone information, and hence cannot be converted into an Instant,
  • or automagically detect that there is a timezone offset in the data and directly parse the column as Instant.
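
A rough sketch of what the second option could look like, written against plain `java.time` rather than the DataFrame parser internals: try the offset-aware format first, and only fall back to LocalDateTime when no value in the column carries an offset. `DetectedType` and `detectColumnType` are hypothetical names for this illustration and do not exist in the library.

```kotlin
import java.time.LocalDateTime
import java.time.OffsetDateTime
import java.time.format.DateTimeParseException

// Hypothetical result type for this sketch; not part of the DataFrame API.
enum class DetectedType { INSTANT, LOCAL_DATE_TIME, STRING }

// Returns true if `parse` accepts the text without throwing.
fun parses(text: String, parse: (String) -> Any): Boolean =
    try {
        parse(text)
        true
    } catch (e: DateTimeParseException) {
        false
    }

// Hypothetical helper: pick a column type, preferring the offset-aware format.
fun detectColumnType(values: List<String>): DetectedType = when {
    values.isEmpty() -> DetectedType.STRING
    // Every value has an offset, so each one identifies a unique instant.
    values.all { v -> parses(v) { OffsetDateTime.parse(it) } } -> DetectedType.INSTANT
    // No offsets anywhere: a local date-time is all that can be claimed.
    values.all { v -> parses(v) { LocalDateTime.parse(it) } } -> DetectedType.LOCAL_DATE_TIME
    // Mixed or unparseable columns are left as strings instead of guessing.
    else -> DetectedType.STRING
}

fun main() {
    println(detectColumnType(listOf("2024-12-12T13:00:00+01:00"))) // INSTANT
    println(detectColumnType(listOf("2024-12-12T13:00:00")))       // LOCAL_DATE_TIME
}
```

Leaving mixed columns as strings is a deliberate choice in this sketch: it avoids inventing an offset for values that do not have one, which is the same surprise the first option tries to prevent.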
@martinitus
Author

I guess I could figure out how to change the implementation to fail (it probably boils down to removing the respective converter somewhere). I am not sure I could figure out how to implement the "automagical instant detection" approach.
