diff --git a/README.md b/README.md
index 7537d99a..11ed71a6 100644
--- a/README.md
+++ b/README.md
@@ -19,11 +19,11 @@ V3IO Frames (**"Frames"**) is a multi-model open-source data-access library, dev
 - [User Authentication](#user-authentication)
 - [Client Constructor](#client-constructor)
 - [Common Client Method Parameters](#client-common-method-params)
-- [The create Method](#method-create)
-- [The write Method](#method-write)
-- [The read Method](#method-read)
-- [The delete Method](#method-delete)
-- [The execute Method](#method-execute)
+- [create Method](#method-create)
+- [write Method](#method-write)
+- [read Method](#method-read)
+- [delete Method](#method-delete)
+- [execute Method](#method-execute)

 ### Overview

@@ -41,11 +41,12 @@ You can then use the client methods to perform different data operations on the

 The `Client` class features the following methods for supporting basic data operations:

-- [`create`](#method-create) — create a new NoSQL or TSDB table or a stream ("the backend").
-- [`delete`](#method-delete) — delete the backend.
-- [`read`](#method-read) — read data from the backend (as a pandas DataFrame or DataFrame iterator).
-- [`write`](#method-write) — write one or more DataFrames to the backend.
-- [`execute`](#method-execute) — execute a command on the backend. Each backend may support multiple commands.
+- [`create`](#method-create) — creates a new TSDB table or a stream ("backend data").
+- [`delete`](#method-delete) — deletes a table or stream or specific table items.
+- [`read`](#method-read) — reads data from a table or stream into pandas DataFrames.
+- [`write`](#method-write) — writes data from pandas DataFrames to a table or stream.
+- [`execute`](#method-execute) — executes a custom command on a table or stream.
+  Each backend may support multiple commands.

 #### Backend Types

@@ -99,46 +100,46 @@ All Frames operations are executed via an object of the `Client` class.
 #### Syntax

 ```python
-Client(address, user, password, token, container)
+Client(address='', container='', user='', password='', token='')
 ```

 #### Parameters and Data Members

-- **address** — the address of the Frames service (`framesdb`).
+- **address** — The address of the Frames service (`framesd`).
   When running locally on the platform (for example, from a Jupyter Notebook service), set this parameter to `framesd:8081`.
   When connecting to the platform remotely, set this parameter to the API address of Frames platform service of the parent tenant.
   You can copy this address from the **API** column of the V3IO Frames service on the **Services** platform dashboard page.

-  - **Type:** String
+  - **Type:** `str`
   - **Requirement:** Required

-- **container** — the name of the platform data container that contains the backend data.
+- **container** — The name of the platform data container that contains the backend data.
   For example, `"bigdata"` or `"users"`.

-  - **Type:** String
+  - **Type:** `str`
   - **Requirement:** Required

-- **user** — the username of a platform user with permissions to access the backend data.
+- **user** — The username of a platform user with permissions to access the backend data.

-  - **Type:** String
+  - **Type:** `str`
   - **Requirement:** Required when neither the [`token`](#client-param-token) parameter or the authentication environment variables are set.
     See [User Authentication](#user-authentication).
     When the `user` parameter is set, the [`password`](#client-param-password) parameter must also be set to a matching user password.

-- **password** — a platform password for the user configured in the [`user`](#client-param-user) parameter.
+- **password** — A platform password for the user configured in the [`user`](#client-param-user) parameter.

-  - **Type:** String
+  - **Type:** `str`
   - **Requirement:** Required when the [`user`](#client-param-user) parameter is set.
     See [User Authentication](#user-authentication).

-- **token** — a valid platform access key that allows access to the backend data.
+- **token** — A valid platform access key that allows access to the backend data.
   To get this access key, select the user profile icon on any platform dashboard page, select **Access Tokens**, and copy an existing access key or create a new key.

-  - **Type:** String
+  - **Type:** `str`
   - **Requirement:** Required when neither the [`user`](#client-param-user) or [`password`](#client-param-password) parameters or the authentication environment variables are set.
     See [User Authentication](#user-authentication).
@@ -157,46 +158,46 @@ client = v3f.Client("framesd:8081", user="iguazio", password="mypass", container
 All client methods receive the following common parameters:

-- **backend** — the backend data type for the operation.
+- **backend** — The backend data type for the operation.
   See the backend-types descriptions in the [overview](#backend-types).

-  - **Type:** String
+  - **Type:** `str`
   - **Valid Values:** `"csv"` | `"kv"` | `"stream"` | `"tsdb"`
   - **Requirement:** Required

-- **table** — the relative path to the backend data — a directory in the target platform data container (as configured for the client object) that represents a platform data collection, such as a TSDB or NoSQL table or a stream.
+- **table** — The relative path to the backend data — a directory in the target platform data container (as configured for the client object) that represents a TSDB or NoSQL table or a data stream.
   For example, `"mytable"` or `"examples/tsdb/my_metrics"`.

-  - **Type:** String
-  - **Requirement:** Required
+  - **Type:** `str`
+  - **Requirement:** Required unless otherwise specified in the method-specific documentation

 Additional method-specific parameters are described for each method.

 ### create Method

-Creates a new data collection (table/stream) in a platform data container according to the configured backend.
+Creates a new TSDB table or a stream in a platform data container, according to the specified backend type. The `create` method is supported by the `tsdb` and `stream` backends, but not by the `kv` backend, because NoSQL tables in the platform don't need to be created prior to ingestion; when ingesting data into a table that doesn't exist, the table is automatically created.

 - [Syntax](#method-create-syntax)
+- [Common parameters](#method-create-common-params)
 - [`tsdb` backend `create` parameters](#method-create-params-tsdb)
 - [`stream` backend `create` parameters](#method-create-params-stream)
+- [Return Value](#method-create-return-value)

 #### Syntax

 ```python
-create(backend=, table=, attrs=)
+create(backend, table, attrs=None)
 ```
+
+
+
+#### Common create Parameters
+
+All Frames backends that support the `create` method support the following common parameters:
+
+- **attrs** — A dictionary of key-value pairs for passing additional backend-specific parameters (arguments).
+
+  - **Type:** `dict`
+  - **Requirement:** Optional
+  - **Default Value:** `None`

 #### tsdb Backend create Parameters

-- **rate** (Required) — `string` — the ingestion rate TSDB's metric-samples, as `"[0-9]+/[smh]"` (where `s` = seconds, `m` = minutes, and `h` = hours); for example, `"1/s"` (one sample per minute).
+The following `tsdb` backend parameters are passed via the [`attrs`](#method-create-param-attrs) parameter of the `create` method:
+
+- **rate** — The ingestion rate of the TSDB's metric samples, as `"[0-9]+/[smh]"` (where `s` = seconds, `m` = minutes, and `h` = hours); for example, `"1/s"` (one sample per second).
   The rate should be calculated according to the slowest expected ingestion rate.
-- **aggregates** (Optional)
-- **aggregation-granularity** (Optional)
+
+  - **Type:** `str`
+  - **Requirement:** Required
+
+- **aggregates** — Default aggregates to calculate in real time during sample ingestion, as a comma-separated list of supported aggregation functions.
+
+  - **Type:** `str`
+  - **Requirement:** Optional
+
+- **aggregation-granularity** — Aggregation granularity; i.e., a time interval for applying the aggregation functions, if configured in the [`aggregates`](#method-create-tsdb-param-aggregates) parameter.
+
+  - **Type:** `str`
+  - **Requirement:** Optional

 For detailed information about these parameters, refer to the [V3IO TSDB documentation](https://github.com/v3io/v3io-tsdb#v3io-tsdb).
@@ -208,8 +238,10 @@ client.create("tsdb", "/mytable", attrs={"rate": "1/m"})

 #### stream Backend create Parameters

-- **shards** (Optional) (default: `1`) — `int` — the number of stream shards to create.
-- **retention_hours** (Optional) (default: `24`) — `int` — the stream's retention period, in hours.
+The following `stream` backend parameters are passed via the [`attrs`](#method-create-param-attrs) parameter of the `create` method:
+
+- **shards** (Optional) (default: `1`) — `int` — The number of stream shards to create.
+- **retention_hours** (Optional) (default: `24`) — `int` — The stream's retention period, in hours.

 For detailed information about these parameters, refer to the [platform streams documentation](https://www.iguazio.com/docs/concepts/latest-release/streams).
@@ -218,32 +250,55 @@ Example:
 client.create("stream", "/mystream", attrs={"shards": 6})
 ```
+
+#### Return Value
+
+Returns a new Frames `Client` data object.
+
 ### write Method

-Writes data from a DataFrame to a data collection (table/stream) in a platform data container according to the configured backend.
+Writes data from a DataFrame to a table or stream in a platform data container, according to the specified backend type.

 - [Syntax](#method-write-syntax)
-- [Common parameters](#method-write-backend-common-params)
+- [Common parameters](#method-write-common-params)
+- [`tsdb` backend `write` parameters](#method-write-params-tsdb)
 - [`kv` backend `write` parameters](#method-write-params-kv)

 #### Syntax
+
+
+```python
+write(backend, table, dfs, condition='', labels=None, max_in_message=0,
+      index_cols=None, partition_keys=None)
 ```
-
+- When the value of the [`iterator`](#method-read-param-iterator) parameter is `False` (default) — returns a single DataFrame.
+- When the value of the `iterator` parameter is `True` — returns a
+  DataFrames iterator.
+  The returned DataFrames include a `"labels"` DataFrame attribute with backend-specific data, if applicable; for example, for the `stream` backend, this attribute holds the sequence number of the last stream record that was read.
+
+
 #### Common write Parameters

-All Frames backends that support the `write` method support the following common parameters, which can be set in the `attrs` method parameter:
+All Frames backends that support the `write` method support the following common parameters:

-- **dfs** — list of DataFrames to write.
-- **index_cols** (Optional) (default: none) — specify specific index columns, by default DataFrame's index columns will be used.
-- **labels** (Optional) (default: none)
+- **dfs** (Required) — A single DataFrame, a list of DataFrames, or a DataFrames iterator — One or more DataFrames containing the data to write.
+- **index_cols** (Optional) (default: `None`) — `[]str` — A list of column (attribute) names to be used as index columns for the write operation, regardless of any index-column definitions in the DataFrame.
+  By default, the DataFrame's index columns are used.
+  > **Note:** The significance and supported number of index columns are backend specific.
+  > For example, the `kv` backend supports only a single index column for the primary-key item attribute, while the `tsdb` backend supports additional index columns for metric labels.
+- **labels** (Optional) (default: `None`) — This parameter is currently defined for all backends but is used only by the `tsdb` backend; therefore, it's documented as part of the `write` method's [`tsdb` backend parameters](#method-write-params-tsdb).
 - **max_in_message** (Optional) (default: `0`)
-- **partition_keys** (Optional) (default: none) (**Not yet supported**)
+- **partition_keys** (Optional) (default: `None`) — `[]str` — [**Not supported in this version**]

 Example:
 ```python
@@ -253,22 +308,29 @@ df.set_index("name")
 client.write(backend="kv", table="mytable", dfs=df)
 ```
+
+#### tsdb Backend write Parameters
+
+- **labels** (Optional) (default: `None`) — `dict` — A dictionary of `
, attrs=)
+read(backend='', table='', query='', columns=None, filter='', group_by='',
+     limit=0, data_format='', row_layout=False, max_in_message=0, marker='',
+     iterator=False, **kw)
 ```
-
+
 #### Common read Parameters

-All Frames backends that support the `read` method support the following common parameters, which can be set in the `attrs` method parameter:
+All Frames backends that support the `read` method support the following common parameters:

-- **iterator** — `bool` — return iterator of DataFrames or (if False) just one DataFrame
-- **filter** — `string` — query filter (can't be used with query)
-- **columns** — `[]str` — list of columns to pass (can't be used with query)
-- **data_format** — `string` — data format (**Not yet supported**)
-- **marker** — `string` — query marker (**Not yet supported**)
-- **limit** — `int` — maximal number of rows to return (**Not yet supported**)
-- **row_layout** — `bool` — weather to use row layout (vs the default column layout) (**Not yet supported**)
+- **iterator** (Optional) (default: `False`) — `bool` — `True` to return a DataFrames iterator; `False` to return a single DataFrame.
+- **filter** (Optional) — `str` — A query filter.
+  This parameter can't be used concurrently with the `query` parameter.
+- **columns** — `[]str` — A list of attributes (columns) to return.
+  This parameter can't be used concurrently with the `query` parameter.
+- **data_format** — `str` — The data format. [**Not supported in this version**]
+- **marker** — `str` — A query marker. [**Not supported in this version**]
+- **limit** — `int` — The maximum number of rows to return. [**Not supported in this version**]
+- **row_layout** (Optional) (default: `False`) — `bool` — `True` to use a row layout; `False` (default) to use a column layout. [**Not supported in this version**]

 #### tsdb Backend read Parameters

-- **start** — `string`
-- **end** — `string`
-- **step** — `string`
-- **aggregators** — `string`
-- **aggregationWindow** — `string`
-- **query** — `string` — query in SQL format
-- **group_by** — `string` — query group by (can't be used with query)
-- **multi_index** — `bool` — get the results as a multi index data frame where the labels are used as indexes in addition to the timestamp, or if `False` (default behavior) only the timestamp will function as the index.
+- **start** — `str` — Start (minimum) time for the read operation, as a string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format `"now"` or `"now-[0-9]+[mhd]"` (where `m` = minutes, `h` = hours, and `d` = days), or 0 for the earliest time.
+  For example: `"2016-01-02T15:34:26Z"`; `"1451748866000"`; `"now-90m"`; `"0"`.
+  The default start time is ` - 1h`.
+- **end** — `str` — End (maximum) time for the read operation, as a string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format `"now"` or `"now-[0-9]+[mhd]"` (where `m` = minutes, `h` = hours, and `d` = days), or 0 for the earliest time.
+  For example: `"2018-09-26T14:10:20Z"`; `"1537971006000"`; `"now-3h"`; `"now-7d"`.
+  The default end time is `"now"`.
+- **step** (Optional) — `str` — For an aggregation query, this parameter specifies the aggregation interval for applying the aggregation functions; by default, the aggregation is applied to all sample data within the requested time range.
+  When the query doesn't include aggregates, this parameter specifies an interval for downsampling the raw sample data.
+- **aggregators** (Optional) — `str` — Aggregation information to return, as a comma-separated list of supported aggregation functions.
+- **aggregationWindow** (Optional) — `str` — Aggregation interval for applying the aggregation functions, if set in the [`aggregators`](#method-read-tsdb-param-aggregators) or [`query`](#method-read-tsdb-param-query) parameters.
+- **query** (Optional) — `str` — A query string in SQL format.
+  > **Note:** When the `query` parameter is set, you can either specify the target table within the query string (in its `FROM` clause) or set the `table` parameter of the `read` method to the table path.
+  > When the `query` string specifies the target table, the value of the `table` parameter (if set) is ignored.
+- **group_by** (Optional) — `str` — A group-by query string.
+  This parameter can't be used concurrently with the `query` parameter.
+- **multi_index** (Optional) — `bool` — `True` to receive the read results as multi-index DataFrames where the labels are used as index columns in addition to the metric sample-time primary-key attribute; `False` (default) to use only the sample time as the index.
+
 For detailed information about these parameters, refer to the [V3IO TSDB documentation](https://github.com/v3io/v3io-tsdb#v3io-tsdb).
@@ -336,12 +417,12 @@ df = client.read(backend="tsdb", query="select avg(cpu) as cpu, avg(diskio), avg
 - **reset_index** — `bool` — Reset the index. When set to `false` (default), the DataFrame will have the key column of the v3io kv as the index column. When set to `true`, the index will be reset to a range index.
-- **max_in_message** — `int` — Maximal number of rows per message
-- **sharding_keys** — `[]string` (**Experimental**) — a list of specific sharding keys to query, for range-scan formatted tables only.
-- **segments** — `[]int64` (**Not yet supported**)
-- **total_segments** — `int64` (**Not yet supported**)
-- **sort_key_range_start** — `string` (**Not yet supported**)
-- **sort_key_range_end** — `string` (**Not yet supported**)
+- **max_in_message** — `int` — The maximum number of rows per message.
+- **sharding_keys** — `[]str` (**Experimental**) — A list of specific sharding keys to query, for range-scan formatted tables only.
+- **segments** — `[]int64` [**Not supported in this version**]
+- **total_segments** — `int64` [**Not supported in this version**]
+- **sort_key_range_start** — `str` [**Not supported in this version**]
+- **sort_key_range_end** — `str` [**Not supported in this version**]

 For detailed information about these parameters, refer to the platform's NoSQL documentation.
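The `tsdb` `start` and `end` formats listed above (an RFC 3339 time, Unix milliseconds, `"now"`, `"now-[0-9]+[mhd]"`, or `"0"`) and the rule that `filter`, `columns`, and `group_by` can't be combined with `query` can both be checked on the client side. The following is a minimal Python sketch under those stated rules. The names `to_unix_ms` and `query_conflicts` are hypothetical. The Frames service does its own parsing, so this is only an approximation of the documented formats and doesn't try to cover every RFC 3339 variant.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence

# Hypothetical helpers (not part of the v3io_frames package).
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_RELATIVE_RE = re.compile(r"now(?:-([0-9]+)([mhd]))?")
_UNIT_MS = {"m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_unix_ms(value: str, now_ms: int) -> int:
    """Resolve a documented start/end time string to Unix epoch milliseconds."""
    if value == "0":
        return 0  # "0" denotes the earliest time
    relative = _RELATIVE_RE.fullmatch(value)
    if relative:
        if relative.group(1) is None:
            return now_ms  # plain "now"
        return now_ms - int(relative.group(1)) * _UNIT_MS[relative.group(2)]
    if value.isdigit():
        return int(value)  # already a Unix timestamp in milliseconds
    # RFC 3339 time: fromisoformat() before Python 3.11 rejects a trailing "Z".
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"RFC 3339 time {value!r} must include a UTC offset")
    return (parsed - _EPOCH) // timedelta(milliseconds=1)


def query_conflicts(query: str = "", filter_expr: str = "",
                    columns: Optional[Sequence[str]] = None,
                    group_by: str = "") -> List[str]:
    """Return the names of read parameters that can't be combined with `query`."""
    if not query:
        return []
    supplied = {"filter": filter_expr, "columns": columns, "group_by": group_by}
    return [name for name, val in supplied.items() if val]
```

The caller supplies `now_ms`, for example `int(time.time() * 1000)`. With that value, `to_unix_ms("2016-01-02T15:34:26Z", now_ms)` returns `1451748866000`, which is the same instant expressed in Unix milliseconds.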
@@ -353,12 +434,12 @@ df = client.read(backend="kv", table="mytable", filter="col1>666")

 #### stream Backend read Parameters

-- **seek** — `string` — valid values: `"time" | "seq"/"sequence" | "latest" | "earliest"`.
+- **seek** — `str` — Valid values: `"time" | "seq"/"sequence" | "latest" | "earliest"`.
   If the `"seq"|"sequence"` seek type is set, you need to provide the desired record sequence ID via the [`sequence`](#method-read-stream-param-sequence) parameter.
   If the `time` seek type is set, you need to provide the desired start time via the `start` parameter.
-- **shard_id** — `string`
+- **shard_id** — `str`
 - **sequence** — `int64` (Optional)

 For detailed information about these parameters, refer to the [platform streams documentation](https://www.iguazio.com/docs/concepts/latest-release/streams).
@@ -368,12 +449,24 @@ Example:
 ```python
 df = client.read(backend="stream", table="mytable", seek="latest", shard_id="5")
 ```
+
+#### Return Value
+
+- When the value of the [`iterator`](#method-read-param-iterator) parameter is `False` (default) — returns a single DataFrame.
+- When the value of the `iterator` parameter is `True` — returns a
+  DataFrames iterator.
+
+> **Note:** The returned DataFrames include a `labels` DataFrame attribute with backend-specific data, if applicable.
+> For example, for the `stream` backend, this attribute holds the sequence number of the last stream record that was read.
+
+
 ### delete Method

-Deletes a data collection (table/stream) in a platform data container according to the configured backend.
-The `kb` backend also supports an optional [`filter`](#method-delete-kv-param-filter) parameter that can be used to delete only specific items in a NoSQL tables.
+Deletes a table, a stream, or specific table items from a platform data container, according to the specified backend type.

 - [Syntax](#method-delete-syntax)
 - [`tsdb` backend `delete` parameters](#method-delete-params-tsdb)
@@ -383,14 +476,20 @@ The `kb` backend also supports an optional [`filter`](#method-delete-kv-param-fi

 #### Syntax

 ```python
-delete(backend=, table=
, attrs=)
+delete(backend, table, filter='', start='', end='')
 ```

 #### tsdb Backend delete Parameters

-- **start** — `string` — delete since start
-- **end** — `string` — delete since start
+- **start** — `str` — Start (minimum) time for the delete operation, as a string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format `"now"` or `"now-[0-9]+[mhd]"` (where `m` = minutes, `h` = hours, and `d` = days), or 0 for the earliest time.
+  For example: `"2016-01-02T15:34:26Z"`; `"1451748866000"`; `"now-90m"`; `"0"`.
+  The default start time is ` - 1h`.
+- **end** — `str` — End (maximum) time for the delete operation, as a string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format `"now"` or `"now-[0-9]+[mhd]"` (where `m` = minutes, `h` = hours, and `d` = days), or 0 for the earliest time.
+  For example: `"2018-09-26T14:10:20Z"`; `"1537971006000"`; `"now-3h"`; `"now-7d"`.
+  The default end time is `"now"`.

 > **Note:** When neither the `start` or `end` parameters are set, the entire TSDB table is deleted.
@@ -404,7 +503,7 @@ df = client.delete(backend="tsdb", table="mytable", start="now-1d", end="now-5h"

 #### kv Backend delete Parameters

-- **filter** — `string` — a platform filter expression that identifies specific items to delete.
+- **filter** — `str` — A platform filter expression that identifies specific items to delete.
   For detailed information about platform filter expressions, see the [platform documentation](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/#filter-expression).

 > **Note:** When the `filter` parameter isn't set, the entire table is deleted.
@@ -417,12 +516,32 @@ df = client.delete(backend="kv", table="mytable", filter="age > 40")

 ### execute Method

-Extends the basic CRUD functionality of the other client methods via custom commands:
+Extends the basic CRUD functionality of the other client methods via custom commands.

+- [Syntax](#method-execute-syntax)
+- [Common parameters](#method-execute-common-params)
 - [tsdb backend commands](#method-execute-tsdb-cmds)
 - [kv backend commands](#method-execute-kv-cmds)
 - [stream backend commands](#method-execute-stream-cmds)
+
+#### Syntax
+
+```python
+execute(backend, table, command='', args=None)
+```
+
+
+#### Common execute Parameters
+
+All Frames backends that support the `execute` method support the following common parameters:
+
+- **args** — A dictionary of key-value pairs for passing command-specific parameters (arguments).
+
+  - **Type:** `dict`
+  - **Requirement:** Optional
+  - **Default Value:** `None`
+
 ### tsdb Backend execute Commands

@@ -431,7 +550,7 @@ Currently, no `execute` commands are available for the `tsdb` backend.

 ### kv Backend execute Commands

-- **infer | inferschema** — infers the data schema of a given NoSQL table and creates a schema file for the table.
+- **infer | inferschema** — Infers the data schema of a given NoSQL table and creates a schema file for the table.

 Example:
 ```python
@@ -439,7 +558,8 @@ Currently, no `execute` commands are available for the `tsdb` backend.
````
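Both `delete` notes above warn about whole-table deletes. For `tsdb`, omitting `start` and `end` deletes the entire table. For `kv`, omitting `filter` deletes the entire table. The following is a minimal Python sketch of a guard that builds the keyword arguments for `delete` and refuses such whole-table deletes unless they're explicitly confirmed. The function `delete_kwargs` and its `allow_full_delete` flag are hypothetical and aren't part of the Frames client. The keyword names it emits (`backend`, `table`, `filter`, `start`, `end`) follow the `delete` syntax shown in this README.

```python
from typing import Dict


def delete_kwargs(backend: str, table: str, filter_expr: str = "",
                  start: str = "", end: str = "",
                  allow_full_delete: bool = False) -> Dict[str, str]:
    """Build keyword arguments for Client.delete(), guarding whole-table deletes.

    Hypothetical helper: per the README notes, a tsdb delete without start/end,
    or a kv delete without a filter, removes the entire table.
    """
    if backend == "tsdb":
        scoped = bool(start or end)
    elif backend == "kv":
        scoped = bool(filter_expr)
    else:
        # Assumption: for other backends (for example, "stream"), delete()
        # removes the whole stream, so it is always treated as unscoped.
        scoped = False
    if not scoped and not allow_full_delete:
        raise ValueError(
            f"delete({backend!r}, {table!r}) would remove the entire table or stream; "
            "pass allow_full_delete=True to confirm")
    kwargs = {"backend": backend, "table": table}
    if filter_expr:
        kwargs["filter"] = filter_expr
    if start:
        kwargs["start"] = start
    if end:
        kwargs["end"] = end
    return kwargs
```

Assuming a configured client, usage could look like `client.delete(**delete_kwargs("kv", "mytable", filter_expr="age > 40"))`. A bare `delete_kwargs("kv", "mytable")` raises instead of silently deleting the whole table.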