diff --git a/docs/StardustDocs/topics/read.md b/docs/StardustDocs/topics/read.md
index 8615c70060..d4999bcb4f 100644
--- a/docs/StardustDocs/topics/read.md
+++ b/docs/StardustDocs/topics/read.md
@@ -1,32 +1,41 @@
[//]: # (title: Read)
-The Kotlin DataFrame library supports CSV, TSV, JSON, XLS and XLSX, Apache Arrow input formats.
+The Kotlin DataFrame library supports CSV, TSV, JSON, XLS, XLSX, and Apache Arrow input formats.
-`read` method automatically detects input format based on file extension and content
+The `.read()` function automatically detects the input format based on file extension and content:
```kotlin
DataFrame.read("input.csv")
```
-Input string can be a file path or URL.
+The input string can be a file path or URL.
-## Reading CSV
+## Read from CSV
-All these calls are valid:
+To read a CSV file, use the `.readCSV()` function.
+
+To read from a local file:
```kotlin
import java.io.File
-import java.net.URL
DataFrame.readCSV("input.csv")
+// Alternatively
DataFrame.readCSV(File("input.csv"))
+```
+
+To read a CSV file from a URL:
+
+```kotlin
+import java.net.URL
+
DataFrame.readCSV(URL("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"))
```
-All `readCSV` overloads support different options.
-For example, you can specify custom delimiter if it differs from `,`, charset
-and column names if your CSV is missing them
+### Specify delimiter
+
+By default, CSV files are parsed using `,` as the delimiter. To specify a custom delimiter, use the `delimiter` argument:
@@ -41,7 +50,9 @@ val df = DataFrame.readCSV(
-Column types will be inferred from the actual CSV data. Suppose that CSV from the previous
+### Column type inference from CSV
+
+Column types are inferred from the CSV data. Suppose that the CSV from the previous
example had the following content:
@@ -51,7 +62,7 @@ example had the following content:
89 | abc | 7.1 | false |
-[`DataFrame`](DataFrame.md) schema we get is:
+Then the [`DataFrame`](DataFrame.md) schema we get is:
```text
A: Int
@@ -60,7 +71,7 @@ C: Double
D: Boolean?
```
-[`DataFrame`](DataFrame.md) will try to parse columns as JSON, so when reading following table with JSON object in column D:
+[`DataFrame`](DataFrame.md) tries to parse columns as JSON, so when reading the following table with a JSON object in column D:
A | D |
@@ -77,7 +88,7 @@ D:
C: Int
```
-For column where values are lists of JSON values:
+For a column where values are lists of JSON values:
A | G |
12 | [{"B":1,"C":2,"D":3},{"B":1,"C":3,"D":2}] |
@@ -92,7 +103,7 @@ G: *
D: Int
```
-### Dealing with locale specific numbers
+### Work with locale-specific numbers
Sometimes columns in your CSV can be interpreted differently depending on your system locale.
@@ -102,8 +113,8 @@ Sometimes columns in your CSV can be interpreted differently depending on your s
41,111 |
-Here comma can be decimal or thousands separator, thus different values.
-You can deal with it in two ways
+Here a comma can be either a decimal separator or a thousands separator, producing different values.
+You can deal with it in two ways:
1) Provide locale as a parser option
@@ -132,20 +143,34 @@ val df = DataFrame.readCSV(
-## Reading JSON
+## Read from JSON
+
+To read a JSON file, use the `.readJson()` function. JSON files can be read from a file or a URL.
+
+Note that after reading a JSON with a complex structure, you can get hierarchical
+[`DataFrame`](DataFrame.md): [`DataFrame`](DataFrame.md) with `ColumnGroup`s and [`FrameColumn`](DataColumn.md#framecolumn)s.
+
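+For instance, a nested JSON object is read as a `ColumnGroup` whose inner columns can be accessed by name (a minimal sketch; the JSON content and the column names `address` and `city` are hypothetical):
+
+```kotlin
+// Hypothetical input: [{"name": "Alice", "address": {"city": "Oslo", "zip": "0150"}}]
+val df = DataFrame.readJson(file)
+
+// "address" becomes a ColumnGroup; its nested columns are reachable by name
+val cities = df.getColumnGroup("address")["city"]
+```
+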
+To read from a local file:
+
+```kotlin
+val df = DataFrame.readJson(file)
+```
+
-Basics for reading JSONs are the same: you can read from file or from remote URL.
+To read a JSON file from a URL:
```kotlin
DataFrame.readJson("https://covid.ourworldindata.org/data/owid-covid-data.json")
```
-Note that after reading a JSON with a complex structure, you can get hierarchical
-[`DataFrame`](DataFrame.md): [`DataFrame`](DataFrame.md) with `ColumnGroup`s and [`FrameColumn`](DataColumn.md#framecolumn)s.
+### Column type inference from JSON
-Also note that type inferring process for JSON is much simpler than for CSV.
-JSON string literals are always supposed to have String type, number literals
-take different `Number` kinds, boolean literals are converted to `Boolean`.
+Type inference for JSON is much simpler than for CSV.
+JSON string literals are always read as `String`, number literals
+become the appropriate `Number` kind, and boolean literals are converted to `Boolean`.
Let's take a look at the following JSON:
@@ -178,17 +203,13 @@ Let's take a look at the following JSON:
]
```
-We can read it from file
-
-
+We can read it from file:
```kotlin
val df = DataFrame.readJson(file)
```
-
-
-Corresponding [`DataFrame`](DataFrame.md) schema will be
+The corresponding [`DataFrame`](DataFrame.md) schema is:
```text
A: String
@@ -200,7 +221,9 @@ D: Boolean?
Column A has `String` type because all values are string literals, no implicit conversion is performed. Column C
has `Number` type because it's the least common type for `Int` and `Double`.
-### JSON Reading Options: Type Clash Tactic
+### JSON parsing options
+
+#### Manage type clashes
By default, if a type clash occurs when reading JSON, a new column group is created consisting of: "value", "array", and
any number of object properties:
@@ -251,9 +274,9 @@ For this case, you can set `typeClashTactic = JSON.TypeClashTactic.ANY_COLUMNS`
This option is also possible to set in the Gradle- and KSP plugin by providing `jsonOptions`.
-### JSON Reading Options: Key/Value Paths
+#### Specify key/value paths
-If you have some JSON looking like
+If you have a JSON like:
```json
{
@@ -280,10 +303,10 @@ If you have some JSON looking like
}
```
-you will get a column for each dog, which becomes an issue when you have a lot of dogs.
-This issue is especially noticeable when generating data schemas from the JSON, as you might even run out of memory
-when doing that due to the sheer number of generated interfaces.\
-Instead, you can use `keyValuePaths` to specify paths to the objects that should be read as key value frame columns.
+You will get a column for each dog, which becomes an issue when you have a lot of dogs.
+This issue is especially noticeable when generating data schemas from JSON, as you might run out of memory
+when doing that due to the sheer number of generated interfaces. Instead, you can use `keyValuePaths` to specify paths
+to the objects that should be read as key/value frame columns.
This can be the difference between:
@@ -342,22 +365,35 @@ Only the bracket notation of json path is supported, as well as just double quot
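+The call itself can look like the following sketch (based on the dog example above; `JsonPath` is assumed to be the library's path builder for `keyValuePaths`):
+
+```kotlin
+val df = DataFrame.readJson(
+    file,
+    keyValuePaths = listOf(
+        JsonPath().append("dogs"), // read the "dogs" object as a key/value frame column
+    ),
+)
+```
+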
For more examples, see the "examples/json" module.
-## Reading Excel
+## Read from Excel
-Add dependency:
+Before you can read data from Excel, add the following dependency:
```kotlin
implementation("org.jetbrains.kotlinx:dataframe-excel:$dataframe_version")
```
-Right now [`DataFrame`](DataFrame.md) supports reading Excel spreadsheet formats: xls, xlsx.
+To read an Excel spreadsheet, use the `.readExcel()` function. Excel spreadsheets can be read from a file or a URL.
+The supported formats are XLS and XLSX.
+
+To read an Excel spreadsheet from a file:
+
+```kotlin
+val df = DataFrame.readExcel(file)
+```
-You can read from file or URL.
+To read an Excel spreadsheet from a URL:
+
+```kotlin
+DataFrame.readExcel("https://example.com/data.xlsx")
+```
+
+### Cell type inference from Excel
Cells representing dates will be read as `kotlinx.datetime.LocalDateTime`.
-Cells with number values, including whole numbers such as "100", or calculated formulas will be read as `Double`
+Cells with number values, including whole numbers such as "100", or calculated formulas will be read as `Double`.
-Sometimes cells can have wrong format in Excel file, for example you expect to read column of String:
+Sometimes cells can have the wrong format in an Excel file. For example, you expect to read a column of `String`:
```text
IDS
@@ -367,9 +403,9 @@ B100
C100
```
-You will get column of Serializable instead (common parent for Double & String)
+You will get a column of `Serializable` instead (the common parent of `Double` and `String`).
-You can fix it using convert:
+You can fix it using the `.convert()` function:
@@ -387,21 +423,16 @@ df1["IDS"].type() shouldBe typeOf()
-## Reading Apache Arrow formats
+## Read Apache Arrow formats
-Add dependency:
+Before you can read data in Apache Arrow formats, add the following dependency:
```kotlin
implementation("org.jetbrains.kotlinx:dataframe-arrow:$dataframe_version")
```
-
-Make sure to follow [Apache Arrow Java compatibility](https://arrow.apache.org/docs/java/install.html#java-compatibility) guide when using Java 9+
-
+To read Apache Arrow formats, use the `.readArrowFeather()` function:
-[`DataFrame`](DataFrame.md) supports reading [Arrow interprocess streaming format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format)
-and [Arrow random access format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files)
-from raw Channel (ReadableByteChannel for streaming and SeekableByteChannel for random access), InputStream, File or ByteArray.
```kotlin
@@ -409,3 +440,11 @@ val df = DataFrame.readArrowFeather(file)
```
+
+[`DataFrame`](DataFrame.md) supports reading [Arrow interprocess streaming format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format)
+and [Arrow random access format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files)
+from a raw `Channel` (`ReadableByteChannel` for streaming and `SeekableByteChannel` for random access), `InputStream`, `File`, or `ByteArray`.
+
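+For example, a `ByteArray` holding Feather-encoded data can be read with the same function (a sketch; `input.feather` is a hypothetical file):
+
+```kotlin
+import java.io.File
+
+val bytes: ByteArray = File("input.feather").readBytes()
+val df = DataFrame.readArrowFeather(bytes)
+```
+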
+> If you use Java 9+, follow the [Apache Arrow Java compatibility](https://arrow.apache.org/docs/java/install.html#java-compatibility) guide.
+>
+{style="note"}