Table (Spider)
Spider Data Model: Table
A table is a named set of objects. Table names are identifiers and must be unique within the same application. Example table names are `Message`, `LogSnapshot`, and `Security_4xx_Events`.
A table can include the following components:
- Options: Table-level options (described below).
- Fields: Definitions of scalar, link, and group fields that the table uses.
- Aliases: Alias definitions, which are schema-defined expressions that can then be used in queries.
The general structure of a table definition in XML is shown below:
<tables>
    <table name="Message">
        <fields>
            // fields
        </fields>
        <aliases>
            // aliases
        </aliases>
    </table>
    ...
</tables>
The same structure in JSON is shown below:
"tables": {
    "Message": {
        "fields": {
            // fields
        },
        "aliases": {
            // aliases
        }
    },
    ...
}
Spider applications support table-level options, which engage two different features: automatic data aging and sharding. Below is an example in XML:
<table name="Message">
    <options>
        <option name="aging-field">SendDate</option>
        <option name="retention-age">5 YEARS</option>
        <option name="sharding-field">SendDate</option>
        <option name="sharding-granularity">DAY</option>
        <option name="sharding-start">2010-07-17</option>
    </options>
    ...
</table>
Data aging causes objects within the table to be deleted when a timestamp field reaches a defined age. Aging is performed by a background task whose schedule can be controlled. An object is deleted when the data-aging task executes and finds that the object's age is equal to or greater than the defined age.
Data aging is controlled by the following options:
- `aging-field`: Defines the field to use for data aging. It is required if a non-zero `retention-age` is specified. The aging field must be defined in the table’s schema, and its type must be `timestamp`.
- `retention-age`: Enables data aging and defines the retention age. It must be specified in the format `<value> [<units>]`, where `<value>` is a positive integer and `<units>`, if provided, is `days`, `months`, or `years`; `days` is the default. An object’s age is the difference between "now" (when the aging task executes) and the object’s aging field value. When this age is greater than the `retention-age`, the object is deleted. If `retention-age` is set to 0, aging is disabled.
- `aging-check-frequency`: This option specifies how often a background task should check the table for expired objects. At the table level, this option overrides the default value, if specified, at the application level. The value of this option must be in the form `<value> <units>`, where `<value>` is a positive integer and `<units>` is `MINUTES`, `HOURS`, or `DAYS`. (Singular forms of these mnemonics are also allowed.) If a non-zero `retention-age` is specified but `aging-check-frequency` is not specified, it defaults to `1 DAY`.
When data aging is enabled, each object’s aging field can be modified at any time. An object is deleted only when its aging field has a value, so an object whose aging field is null is never aged out.
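The aging rule above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Doradus code: the function names are hypothetical, months and years are assumed to be calendar units, only the plural unit names listed above are accepted, and the comparison is the strict "greater than" given in the `retention-age` description.

```python
import calendar
from datetime import datetime, timedelta

UNITS = ("days", "months", "years")

def parse_retention_age(text):
    """Parse '<value> [<units>]' into (value, units); units default to days."""
    parts = text.split()
    if len(parts) not in (1, 2):
        raise ValueError(f"expected '<value> [<units>]', got {text!r}")
    value = int(parts[0])
    units = parts[1].lower() if len(parts) == 2 else "days"
    if value < 0 or units not in UNITS:
        raise ValueError(f"invalid retention-age: {text!r}")
    return value, units

def aging_cutoff(now, value, units):
    """Return the oldest aging-field value that is still retained at 'now'."""
    if units == "days":
        return now - timedelta(days=value)
    # Assumption: months and years are calendar units, clamped to month end.
    months = value * 12 if units == "years" else value
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return now.replace(year=year, month=month_index + 1,
                       day=min(now.day, last_day))

def is_expired(aging_value, retention_age, now):
    """True if an object with this aging-field value should be deleted."""
    value, units = parse_retention_age(retention_age)
    if value == 0 or aging_value is None:
        # A retention-age of 0 disables aging; null aging fields never expire.
        return False
    return aging_value < aging_cutoff(now, value, units)
```

With the example options (`retention-age` of `5 YEARS`), a message whose `SendDate` is more than five years before the moment the aging task runs would be reported as expired.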
Table sharding improves the performance of certain queries for tables with large populations (millions of objects or more). To benefit from sharding, a table must meet the following conditions:
- Objects have a timestamp field whose value is stable, meaning it is rarely modified. In the example schema, the `Message` table’s `SendDate` field works well because it is rarely modified once a message is created. This timestamp field is used as the sharding field.
- To benefit from a sharded table, queries must include an equality clause or range clause that uses the sharding field. For example, both of the following queries select objects in specific time frames:
    GET /Msgs/Message/_query?q=SendDate=PERIOD().LASTWEEK AND ...
    GET /Msgs/Message/_query?q=SendDate=[2014-01-01 TO 2014-03-01] AND ...
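As a rough sketch of how a client might assemble such a query, the Python below builds a range clause on the sharding field and percent-encodes the expression. The helper names `range_clause` and `query_uri` are hypothetical, and the assumption that the `q` value should be percent-encoded follows general URI rules rather than anything stated on this page.

```python
from urllib.parse import quote

def range_clause(field, start, end):
    """Build a range clause such as SendDate=[2014-01-01 TO 2014-03-01]."""
    return f"{field}=[{start} TO {end}]"

def query_uri(application, table, clauses):
    """Join clauses with AND and return the percent-encoded _query URI."""
    expression = " AND ".join(clauses)
    return f"/{application}/{table}/_query?q={quote(expression)}"

# The sharding-field clause comes first; other clauses can be appended.
uri = query_uri("Msgs", "Message",
                [range_clause("SendDate", "2014-01-01", "2014-03-01")])
```

Keeping the sharding-field clause as a separate list entry makes it easy to check that every query against a sharded table actually includes one.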
Normally, Doradus Spider creates a single term vector for each field/term combination. For example, the term vector with key "Body/the" holds references to all objects that use the term “the” in the field `Body`. For common terms, the term vector may point to every object in the table, and very large term vectors slow query performance. When sharding is enabled, separate term vectors are created for objects in each shard. Faster searching occurs when the sharding field is then included in queries.
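The effect on term vectors can be illustrated with a toy in-memory index. This is only a conceptual sketch: the `(shard, "field/term")` key layout, the sample messages, and the use of a day ordinal as the shard number are all assumptions made for illustration, not the storage format Doradus uses.

```python
from collections import defaultdict
from datetime import date

def build_term_vectors(objects, field, shard_of=None):
    """Map (shard, 'field/term') to the set of object IDs using that term.

    Without shard_of, every object lands in shard 0, which mimics a single
    term vector per field/term combination.
    """
    vectors = defaultdict(set)
    for obj in objects:
        shard = shard_of(obj) if shard_of is not None else 0
        for term in set(obj[field].lower().split()):
            vectors[(shard, f"{field}/{term}")].add(obj["id"])
    return vectors

# Hypothetical sample data.
messages = [
    {"id": "m1", "SendDate": date(2014, 1, 1), "Body": "the first message"},
    {"id": "m2", "SendDate": date(2014, 1, 2), "Body": "the second message"},
    {"id": "m3", "SendDate": date(2014, 1, 2), "Body": "not the last"},
]

unsharded = build_term_vectors(messages, "Body")
by_day = build_term_vectors(messages, "Body",
                            shard_of=lambda m: m["SendDate"].toordinal())
```

In the unsharded index, the vector for "Body/the" references every message. In the day-sharded index, a query restricted to one day only has to read that day's much smaller vector.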
Sharding is enabled with the following table-level options:
- `sharding-field`: This option enables sharding and identifies the sharding field. Its value must be a `timestamp` field defined in the schema.
- `sharding-granularity`: This option specifies what time period causes objects to be assigned to a new shard. It can be `HOUR`, `DAY`, `WEEK`, or `MONTH`. If not specified, it defaults to `MONTH`. The value should be chosen so that each shard has a reasonable number of objects (< 1 million).
- `sharding-start`: This option specifies the date on which sharding begins for the table. Objects whose sharding-field value is null or less than the `sharding-start` value are considered "un-sharded" and assigned to shard #0. Objects whose sharding field is greater than or equal to the `sharding-start` value are assigned a shard number based on the difference between the two values and the `sharding-granularity`. If not explicitly assigned, `sharding-start` defaults to “now”, meaning the timestamp of the schema change that enables sharding.
Each object’s sharding field value can be modified at any time. If the modified value does not cause the object to be assigned a new shard number, the update is efficient. However, if the sharding field is assigned a value that changes the object’s shard number, the update is slower since the object’s fields are re-indexed.
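The shard assignment described above can be sketched as follows. The exact formula is not given here, so this is a guess at one plausible scheme: it assumes sharded objects are numbered from 1 (shard #0 being reserved for un-sharded objects), that `MONTH` counts calendar months, and that the other granularities count whole elapsed periods since `sharding-start`. The function names are hypothetical.

```python
from datetime import datetime, timedelta

PERIODS = {
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
    "WEEK": timedelta(weeks=1),
}

def shard_number(value, sharding_start, granularity="MONTH"):
    """Return 0 for un-sharded objects, else a 1-based shard number."""
    if value is None or value < sharding_start:
        return 0
    if granularity == "MONTH":
        # Assumption: calendar-month difference, ignoring the day of month.
        elapsed = ((value.year - sharding_start.year) * 12
                   + (value.month - sharding_start.month))
    else:
        # Whole periods elapsed since sharding-start (timedelta // timedelta).
        elapsed = (value - sharding_start) // PERIODS[granularity]
    return elapsed + 1

def update_changes_shard(old_value, new_value, sharding_start, granularity):
    """True if changing the sharding field would move the object to a new shard."""
    return (shard_number(old_value, sharding_start, granularity)
            != shard_number(new_value, sharding_start, granularity))
```

With the example options (`sharding-start` of 2010-07-17 and `DAY` granularity), changing a `SendDate` from the morning to the evening of the same day leaves the shard number unchanged, whereas moving it to the next day would trigger the slower re-indexing path.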
Table sharding can also benefit certain links that have very high fan-outs. See the description later on Sharded Links.