From 0c1bb7916e49fe5b38e6e98c2193651af8e47da7 Mon Sep 17 00:00:00 2001
From: Dumitru
Date: Mon, 13 Jan 2025 00:37:57 +0100
Subject: [PATCH 1/3] Documentation for date-time pattern

This commit adds new documentation detailing how to parse date-time columns from CSV files using a specific format pattern. It explains two approaches: providing the pattern as a raw string (e.g., "dd/MMM/yy h:mm a") and supplying a DateTimeFormatter instance (e.g., DateTimeFormatter.ofPattern("dd/MMM/yy h:mm a")). These options ensure that columns are correctly recognized and parsed as date-time rather than strings.
---
 docs/StardustDocs/topics/read.md | 39 ++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/docs/StardustDocs/topics/read.md b/docs/StardustDocs/topics/read.md
index e7bf2ef46a..e6a5be5cc1 100644
--- a/docs/StardustDocs/topics/read.md
+++ b/docs/StardustDocs/topics/read.md
@@ -170,6 +170,45 @@ val df = DataFrame.readCSV(
+### Work with specific date-time formats
+
+Sometimes date and date-time columns in your CSV can appear in different formats.
+
+
+
+
+
date
13/Jan/23 11:49 AM
14/Mar/23 5:35 PM
+
+Here, the date is represented by the format "dd/MMM/yy h:mm a". However, by default, the ISO_LOCAL_DATE_TIME format is used, so the column is not recognized as date-time but instead as a simple String.
+
+You can fix this in two ways:
+
+1) By providing the date-time pattern as raw string to the parser option:
+
+
+
+```kotlin
+val df = DataFrame.readCSV(
+    file,
+    parserOptions = ParserOptions(dateTimePattern = "dd/MMM/yy h:mm a")
+)
+```
+
+
+2) By providing a DateTimeFormatter to the parser option:
+
+
+
+```kotlin
+val df = DataFrame.readCSV(
+    file,
+    parserOptions = ParserOptions(dateTimeFormatter = DateTimeFormatter.ofPattern("dd/MMM/yy h:mm a"))
+)
+```
+
+
+These two approaches are essentially the same, just specified in different ways.
 ## Read from JSON

From cd56517aa256b918e69af3fb10c27f3c63b3286b Mon Sep 17 00:00:00 2001
From: Dumitru
Date: Tue, 14 Jan 2025 01:35:49 +0100
Subject: [PATCH 2/3] Update read.md

---
 docs/StardustDocs/topics/read.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/StardustDocs/topics/read.md b/docs/StardustDocs/topics/read.md
index e6a5be5cc1..fe782db0a9 100644
--- a/docs/StardustDocs/topics/read.md
+++ b/docs/StardustDocs/topics/read.md
@@ -172,7 +172,7 @@ val df = DataFrame.readCSV(
 ### Work with specific date-time formats
-Sometimes date and date-time columns in your CSV can appear in different formats.
+When parsing date or date-time columns, you might encounter formats different from the default ISO_LOCAL_DATE_TIME.
@@ -180,11 +180,11 @@ Sometimes date and date-time columns in your CSV can appear in different formats
date
14/Mar/23 5:35 PM
-Here, the date is represented by the format "dd/MMM/yy h:mm a". However, by default, the ISO_LOCAL_DATE_TIME format is used, so the column is not recognized as date-time but instead as a simple String.
+Because the format here "dd/MMM/yy h:mm a" differs from the default (ISO_LOCAL_DATE_TIME), columns like this may be recognized as simple String values rather than actual date-time columns.
-You can fix this in two ways:
+You can fix this whenever you parse a string-based column (e.g., using readCsv, readTsv, or StringCol.convertTo<>()) by providing a custom date-time pattern. There are two ways to do this:
-1) By providing the date-time pattern as raw string to the parser option:
+1) By providing the date-time pattern as raw string to the ParserOptions:
@@ -196,7 +196,7 @@ val df = DataFrame.readCSV(
 ```
-2) By providing a DateTimeFormatter to the parser option:
+2) By providing a DateTimeFormatter to the ParserOptions:
@@ -210,6 +210,8 @@ val df = DataFrame.readCSV(
 These two approaches are essentially the same, just specified in different ways.
+> Note: Although these examples focus on reading CSV files, the parse operation can handle any String columns (for instance, readCsv, readTsv, StringCol.convertTo<>(), etc.) and accept a ParserOptions argument to configure locale, null-strings, date-time patterns, and more. For more details on the parse operation, see [`Parse Operation`](parse.md).
+
 ## Read from JSON
 To read a JSON file, use the `.readJSON()` function. JSON files can be read from a file or a URL.

From 79665b46d8afcf24c46e94f70dc35fbca809c413 Mon Sep 17 00:00:00 2001
From: Jolan Rensen
Date: Tue, 14 Jan 2025 12:33:37 +0100
Subject: [PATCH 3/3] small reformatting and rephrasing regarding read docs date-time patterns

---
 docs/StardustDocs/topics/read.md | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/docs/StardustDocs/topics/read.md b/docs/StardustDocs/topics/read.md
index fe782db0a9..e664f90587 100644
--- a/docs/StardustDocs/topics/read.md
+++ b/docs/StardustDocs/topics/read.md
@@ -172,7 +172,8 @@ val df = DataFrame.readCSV(
 ### Work with specific date-time formats
-When parsing date or date-time columns, you might encounter formats different from the default ISO_LOCAL_DATE_TIME.
+When parsing date or date-time columns, you might encounter formats different from the default `ISO_LOCAL_DATE_TIME`.
+For example:
@@ -180,11 +181,14 @@ When parsing date or date-time columns, you might encounter formats different fr
date
14/Mar/23 5:35 PM
-Because the format here "dd/MMM/yy h:mm a" differs from the default (ISO_LOCAL_DATE_TIME), columns like this may be recognized as simple String values rather than actual date-time columns.
+Because the format here `"dd/MMM/yy h:mm a"` differs from the default (`ISO_LOCAL_DATE_TIME`),
+columns like this may be recognized as simple `String` values rather than actual date-time columns.
-You can fix this whenever you parse a string-based column (e.g., using readCsv, readTsv, or StringCol.convertTo<>()) by providing a custom date-time pattern. There are two ways to do this:
+You can fix this whenever you [parse](parse.md) a string-based column (e.g., using [`DataFrame.readCSV()`](read.md#read-from-csv),
+[`DataFrame.readTSV()`](read.md#read-from-csv), or [`DataColumn.convertTo<>()`](convert.md)) by providing
+a custom date-time pattern. There are two ways to do this:
-1) By providing the date-time pattern as raw string to the ParserOptions:
+1) By providing the date-time pattern as a raw string to the `ParserOptions` argument:
@@ -196,7 +200,7 @@ val df = DataFrame.readCSV(
 ```
-2) By providing a DateTimeFormatter to the ParserOptions:
+2) By providing a `DateTimeFormatter` to the `ParserOptions` argument:
@@ -209,8 +213,14 @@ val df = DataFrame.readCSV(
 These two approaches are essentially the same, just specified in different ways.
+The result will be a dataframe with properly parsed `DateTime` columns.
-> Note: Although these examples focus on reading CSV files, the parse operation can handle any String columns (for instance, readCsv, readTsv, StringCol.convertTo<>(), etc.) and accept a ParserOptions argument to configure locale, null-strings, date-time patterns, and more. For more details on the parse operation, see [`Parse Operation`](parse.md).
+> Note: Although these examples focus on reading CSV files,
+> these `ParserOptions` can be supplied to any `String`-column-handling operation
+> (like `readCsv`, `readTsv`, `stringCol.convertTo<>()`, etc.).
+> This allows you to configure the locale, null-strings, date-time patterns, and more.
+>
+> For more details on the parse operation, see the [`parse operation`](parse.md).
 ## Read from JSON
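
A note on the pattern used throughout this series: the sketch below checks `"dd/MMM/yy h:mm a"` against the two sample values from the docs table using only `java.time`, independent of DataFrame. The helper name `parseSample` and the explicit `Locale.ENGLISH` are assumptions added for illustration and are not part of the patch. `DateTimeFormatter.ofPattern(pattern)` without a locale uses the JVM default, under which `Mar` or `PM` may not be recognized.

```kotlin
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException
import java.util.Locale

// Pattern copied from the docs in this patch. Locale.ENGLISH is an assumption made here
// so that month abbreviations and AM/PM markers resolve on any JVM default locale.
val docsFormatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("dd/MMM/yy h:mm a", Locale.ENGLISH)

// Hypothetical helper: parses one cell value with the custom pattern.
fun parseSample(text: String): LocalDateTime = LocalDateTime.parse(text, docsFormatter)

// Returns true when the ISO formatter (the default named in the docs) accepts the text.
fun isIsoLocalDateTime(text: String): Boolean =
    try {
        LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
        true
    } catch (e: DateTimeParseException) {
        false
    }

fun main() {
    for (text in listOf("13/Jan/23 11:49 AM", "14/Mar/23 5:35 PM")) {
        println("$text -> ${parseSample(text)} (ISO accepts: ${isIsoLocalDateTime(text)})")
    }
}
```

If the same concern applies to the DataFrame call, a formatter built with an explicit locale could presumably be passed through the `dateTimeFormatter` option shown in the patch. This has not been verified against the library.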