A schema defines the names, data types, and other information for the columns of a Pinot table.
The Pinot schema is composed of:
Field | Description |
---|---|
schemaName | Defines the name of the schema. This is usually the same as the table name. The offline and the realtime table of a hybrid table should use the same schema. |
dimensionFieldSpecs | A dimensionFieldSpec is defined for each dimension column. For more details, see the dimensionFieldSpec description below. |
metricFieldSpecs | A metricFieldSpec is defined for each metric column. For more details, see the metricFieldSpec description below. |
dateTimeFieldSpecs | A dateTimeFieldSpec is defined for each time column. There can be multiple time columns. For more details, see the dateTimeFieldSpec description below. |
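As a rough illustration, the top-level layout described in the table can be sketched as a Python dictionary serialized to JSON. The schema name `myTable` and all column names are hypothetical. The time-column key is written `dateTimeFieldSpecs` (plural, matching the other two lists); treat that spelling as an assumption to check against your Pinot version.

```python
import json

# Hypothetical schema: the table and column names are made up for illustration.
schema = {
    "schemaName": "myTable",
    "dimensionFieldSpecs": [
        {"name": "country", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "clicks", "dataType": "LONG"},
    ],
    # Assumed key spelling (plural), one entry per time column.
    "dateTimeFieldSpecs": [
        {
            "name": "eventTimeMillis",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "15:MINUTES",
        },
    ],
}

# Serialize to the JSON text that would be stored as the schema definition.
schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Building the schema as a dictionary first makes it easy to check that every column appears in exactly one of the three lists before the JSON is written out.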
Below is a detailed description of each type of field spec.
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec:
Property | Description |
---|---|
name | Name of the dimension column. |
dataType | Data type of the dimension column. Can be INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING, BYTES. |
defaultNullValue | Represents null values in the data, since Pinot doesn't support storing null column values natively (as part of its on-disk storage format). If not specified, an internal default null value is used, as listed in the table below. |
singleValueField | Boolean indicating whether this is a single-valued or a multi-valued column. A multi-valued column is modeled as a list, where the order of the values is preserved and duplicate values are allowed. Individual rows don't necessarily have the same number of values. A typical use case is a column such as skillSet for a person (one row in the table) that can have multiple values, such as Real Estate and Mortgages. The default null value for a multi-valued column is a list containing a single defaultNullValue, e.g. [Integer.MIN_VALUE]. |
Data Type | Internal Default Null Value |
---|---|
INT | Integer.MIN_VALUE |
LONG | Long.MIN_VALUE |
FLOAT | Float.NEGATIVE_INFINITY |
DOUBLE | Double.NEGATIVE_INFINITY |
BOOLEAN | 0 (false) |
TIMESTAMP | 0 (1970-01-01 00:00:00 UTC) |
STRING | "null" |
BYTES | byte array of length 0 |
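The lookup rules above can be sketched in Python as follows. This is only an illustration of the table, not Pinot's own code: the function name `dimension_null_value` is hypothetical, the Java constants are written as their Python equivalents, and `singleValueField` is assumed to default to true when omitted.

```python
# Internal default null values for dimension columns, from the table above.
DIMENSION_DEFAULT_NULLS = {
    "INT": -2**31,            # Integer.MIN_VALUE
    "LONG": -2**63,           # Long.MIN_VALUE
    "FLOAT": float("-inf"),   # Float.NEGATIVE_INFINITY
    "DOUBLE": float("-inf"),  # Double.NEGATIVE_INFINITY
    "BOOLEAN": 0,             # false
    "TIMESTAMP": 0,           # 1970-01-01 00:00:00 UTC
    "STRING": "null",
    "BYTES": b"",             # byte array of length 0
}


def dimension_null_value(field_spec):
    """Return the value that stands in for null in a dimension column."""
    # An explicit defaultNullValue wins over the internal default.
    value = field_spec.get(
        "defaultNullValue", DIMENSION_DEFAULT_NULLS[field_spec["dataType"]]
    )
    # Assumption: a column is single-valued unless singleValueField is false.
    if not field_spec.get("singleValueField", True):
        # Multi-valued columns use a list holding one default null value.
        return [value]
    return value
```

For example, a multi-valued `skillSet` column of type STRING with no explicit `defaultNullValue` resolves to `["null"]`, while a single-valued INT column resolves to `-2147483648`.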
A metricFieldSpec is defined for each metric column. Here's a list of the fields in the metricFieldSpec:
Property | Description |
---|---|
name | Name of the metric column |
dataType | Data type of the column. Can be INT, LONG, FLOAT, DOUBLE, BYTES (for specialized representations such as HLL, TDigest, etc, where the column stores byte serialized version of the value) |
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed in the table below. |
Data Type | Internal Default Null Value |
---|---|
INT | 0 |
LONG | 0 |
FLOAT | 0.0 |
DOUBLE | 0.0 |
BYTES | byte array of length 0 |
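To make the effect of these defaults concrete, here is a minimal sketch of substituting them into a row whose metric values are missing. The helper `fill_missing_metrics` and the column names are hypothetical, and this is not a description of how Pinot's ingestion is implemented.

```python
# Internal default null values for metric columns, from the table above.
METRIC_DEFAULT_NULLS = {
    "INT": 0,
    "LONG": 0,
    "FLOAT": 0.0,
    "DOUBLE": 0.0,
    "BYTES": b"",  # byte array of length 0
}


def fill_missing_metrics(row, metric_field_specs):
    """Return a copy of row with missing or None metrics replaced by defaults."""
    filled = dict(row)
    for spec in metric_field_specs:
        name = spec["name"]
        if filled.get(name) is None:
            # An explicit defaultNullValue wins over the internal default.
            filled[name] = spec.get(
                "defaultNullValue", METRIC_DEFAULT_NULLS[spec["dataType"]]
            )
    return filled


# Hypothetical metric columns for illustration.
specs = [
    {"name": "clicks", "dataType": "LONG"},
    {"name": "revenue", "dataType": "DOUBLE", "defaultNullValue": -1.0},
]
print(fill_missing_metrics({"clicks": None}, specs))
```

Because the metric defaults are zero rather than a sentinel such as Integer.MIN_VALUE, a missing metric does not distort a SUM over the column.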
A dateTimeFieldSpec is used to define the time columns of the table. Here's a list of the fields in a dateTimeFieldSpec:
Property | Description |
---|---|
name | Name of the date time column |
dataType | Data type of the date time column. Can be STRING, INT, LONG |
format | The format of the time column. The syntax of the format is timeSize:timeUnit:timeFormat, where timeFormat can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string is also specified, as timeSize:timeUnit:SIMPLE_DATE_FORMAT:pattern. For example: 1:MILLISECONDS:EPOCH (epoch millis); 1:HOURS:EPOCH (epoch hours); 1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd (date string in the pattern yyyyMMdd); 1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy (date string in the pattern EEE MMM dd HH:mm:ss ZZZ yyyy). |
granularity | The granularity in which the column is bucketed. The syntax of granularity is bucketSize:bucketUnit. For example, the format can be milliseconds (1:MILLISECONDS:EPOCH) but bucketed to 15 minutes, i.e. there is only one value for every 15-minute interval, in which case the granularity can be specified as 15:MINUTES. |
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed here. The values are the same as those used for dimensionFieldSpec. |
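The format and granularity strings can be taken apart with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the format syntax is assumed to be size:unit:timeFormat with an optional trailing pattern (as the examples suggest), and only a subset of time units is covered. The last function shows what "bucketed to 15 minutes" means for an epoch-millis value; it does not imply that Pinot rounds values itself.

```python
# Milliseconds per unit, for the units used in this sketch only.
UNIT_MILLIS = {
    "MILLISECONDS": 1,
    "SECONDS": 1_000,
    "MINUTES": 60_000,
    "HOURS": 3_600_000,
    "DAYS": 86_400_000,
}


def parse_format(fmt):
    """Split a format string into size, unit, timeFormat, and optional pattern."""
    # Split at most 3 times: the pattern may itself contain ':' (e.g. HH:mm:ss).
    parts = fmt.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"expected size:unit:timeFormat, got {fmt!r}")
    time_format = parts[2]
    pattern = parts[3] if len(parts) == 4 else None
    if time_format not in ("EPOCH", "SIMPLE_DATE_FORMAT"):
        raise ValueError(f"unknown timeFormat {time_format!r}")
    if time_format == "SIMPLE_DATE_FORMAT" and not pattern:
        raise ValueError("SIMPLE_DATE_FORMAT requires a pattern string")
    return {
        "size": int(parts[0]),
        "unit": parts[1],
        "timeFormat": time_format,
        "pattern": pattern,
    }


def granularity_millis(granularity):
    """Convert a bucketSize:bucketUnit string to a bucket width in milliseconds."""
    size, unit = granularity.split(":")
    return int(size) * UNIT_MILLIS[unit]


def bucket_epoch_millis(value_millis, granularity):
    """Round an epoch-millis value down to the start of its bucket."""
    width = granularity_millis(granularity)
    return value_millis - (value_millis % width)
```

Limiting the split to three separators matters for patterns such as `EEE MMM dd HH:mm:ss ZZZ yyyy`, which would otherwise be cut at the colons inside `HH:mm:ss`. With a granularity of `15:MINUTES`, the bucket width is 900000 milliseconds, so an epoch-millis value of 1000000 falls in the bucket starting at 900000.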
Apart from these, there are some advanced fields. These are common to all field specs.
Property | Description |
---|---|
maxLength | Max length of this column |
virtualColumnProvider | Column value provider |