2 changes: 1 addition & 1 deletion docs/en/administrator-guide/config/fe_config.md
@@ -63,7 +63,7 @@ There are two ways to configure FE configuration items:

2. Dynamic configuration

- After the FE starts, you can set the configuration items dynamically through the following commands. This command requires administrator priviledge.
+ After the FE starts, you can set the configuration items dynamically through the following commands. This command requires administrator privilege.

`ADMIN SET FRONTEND CONFIG (" fe_config_name "=" fe_config_value ");`

10 changes: 5 additions & 5 deletions docs/en/administrator-guide/dynamic-partition.md
@@ -26,7 +26,7 @@ under the License.

# Dynamic Partition

- Dynamic partition is a new feature introduced in Doris verion 0.12. It's designed to manage partition's Time-to-Life (TTL), reducing the burden on users.
+ Dynamic partition is a new feature introduced in Doris version 0.12. It's designed to manage partition's Time-to-Life (TTL), reducing the burden on users.

At present, the functions of dynamically adding partitions and dynamically deleting partitions are realized.
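To make the parameter descriptions below concrete, here is a minimal sketch of a dynamic partition table. The table and column names are hypothetical, and the `dynamic_partition.*` properties are the ones described on this page:

```sql
-- Minimal sketch (hypothetical names): partitions are pre-created 3 days
-- ahead, and partitions older than 7 days are dropped automatically.
CREATE TABLE example_db.user_visits
(
    `visit_date` DATE,
    `user_id`    BIGINT,
    `city`       VARCHAR(32)
)
DUPLICATE KEY(`visit_date`, `user_id`)
PARTITION BY RANGE(`visit_date`) ()
DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
PROPERTIES
(
    "dynamic_partition.enable"    = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start"     = "-7",
    "dynamic_partition.end"       = "3",
    "dynamic_partition.prefix"    = "p",
    "dynamic_partition.buckets"   = "10"
);
```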

@@ -302,11 +302,11 @@ mysql> SHOW DYNAMIC PARTITION TABLES;

Whether to enable Doris's dynamic partition feature. The default value is false, which is off. This parameter only affects the partitioning operation of dynamic partition tables, not normal tables. You can modify the parameter in `fe.conf` and restart FE to take effect, or execute the following commands at runtime:

- MySQL protocal
+ MySQL protocol

`ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true")`

- HTTP protocal
+ HTTP protocol

`curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true`

@@ -316,11 +316,11 @@ mysql> SHOW DYNAMIC PARTITION TABLES;

The execution frequency of dynamic partition threads defaults to 3600 seconds (1 hour), that is, scheduling is performed every hour. You can modify the parameter in `fe.conf` and restart FE to take effect, or modify it at runtime with the following commands:

- MySQL protocal
+ MySQL protocol

`ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200")`

- HTTP protocal
+ HTTP protocol

`curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000`

6 changes: 3 additions & 3 deletions docs/en/administrator-guide/export_manual.md
@@ -73,7 +73,7 @@ The overall mode of dispatch is as follows:
1. The user submits an Export job to FE.
2. FE's Export scheduler performs an Export job in two stages:
1. PENDING: FE generates Export Pending Task, sends the snapshot command to BE, takes a snapshot of all Tablets involved, and generates multiple query plans.
- 2. EXPORTING: FE generates Export ExporingTask and starts executing the query plan.
+ 2. EXPORTING: FE generates Export ExportingTask and starts executing the query plan.

### query plan splitting

@@ -122,7 +122,7 @@ WITH BROKER "hdfs"
* `timeout`: job timeout. Default 2 hours. Unit: seconds.
* `tablet_num_per_task`: The maximum number of fragments allocated per query plan. The default is 5.
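As a hedged illustration of where these properties go, a hypothetical Export job might look like the sketch below; the table, partition, path, and credentials are invented:

```sql
-- Hypothetical Export job using both properties listed above.
EXPORT TABLE example_db.example_table
PARTITION (p1, p2)
TO "hdfs://host:port/path/to/export/"
PROPERTIES
(
    "timeout" = "7200",            -- job timeout, in seconds
    "tablet_num_per_task" = "5"    -- max tablets per query plan
)
WITH BROKER "hdfs"
(
    "username" = "user",
    "password" = "passwd"
);
```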

- After submitting a job, the job status can be imported by querying the `SHOW EXPORT'command. The results are as follows:
+ After submitting a job, the job status can be imported by querying the `SHOW EXPORT` command. The results are as follows:

```
JobId: 14008
```
@@ -141,7 +141,7 @@ FinishTime: 2019-06-25 17:08:34
* JobId: The unique ID of the job
* State: Job status:
* PENDING: Jobs to be Scheduled
- * EXPORING: Data Export
+ * EXPORTING: Data Export
* FINISHED: Operation Successful
* CANCELLED: Job Failure
* Progress: Job progress. The progress is measured in query plans. Assuming a total of 10 query plans, of which 3 have been completed, the progress will be 30%.
@@ -71,4 +71,4 @@ To get FE log via HTTP

## Notification

- Need ADMIN priviledge.
+ Need ADMIN privilege.
4 changes: 2 additions & 2 deletions docs/en/administrator-guide/load-data/delete-manual.md
@@ -59,14 +59,14 @@ The following describes the parameters used in the delete statement:

* WHERE

- The conditiona of the delete statement. All delete statements must specify a where condition.
+ The condition of the delete statement. All delete statements must specify a where condition.

Explanation:

1. The type of `OP` in the WHERE condition can only include `=, >, <, >=, <=, !=, in, not in`.
2. The column in the WHERE condition can only be the `key` column.
3. Deletion is not possible when a `key` column in the condition does not exist in some rollup table.
- 4. Each condition in WHERE condition can only be realated by `and`. If you want `or`, you are suggested to write these conditions into two delete statements.
+ 4. Each condition in WHERE condition can only be connected by `and`. If you want `or`, you are suggested to write these conditions into two delete statements.
5. If the specified table is a range partitioned table, `PARTITION` must be specified, unless the table is a single partition table.
6. Unlike the insert into command, the delete statement cannot specify a `label` manually. You can view the concept of `label` in [Insert Into](./insert-into-manual.md).
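A short sketch illustrating the rules above; the table, partition, and column names are invented, with `user_id` and `city` assumed to be key columns:

```sql
-- Delete within one partition; conditions may only be connected by AND.
DELETE FROM example_db.user_visits
PARTITION p201710
WHERE user_id = 10001 AND city != "Beijing";
```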

@@ -221,7 +221,7 @@ Insert Into itself is a SQL command, and the return result is divided into the f
## Best Practices

### Application scenarios
- 1. Users want to import only a few false data to verify the functionality of Doris system. The grammar of INSERT INTO VALUS is suitable at this time.
+ 1. Users want to import only a few false data to verify the functionality of Doris system. The grammar of INSERT INTO VALUES is suitable at this time.
2. Users want to apply ETL to data already in a Doris table and import the result into a new Doris table, which is suitable for the INSERT INTO SELECT grammar.
3. Users can create an external table, such as MySQL external table mapping a table in MySQL system. Or create Broker external tables to map data files on HDFS. Then the data from the external table is imported into the Doris table for storage through the INSERT INTO SELECT grammar.
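Hedged sketches of these scenarios, with invented table and column names:

```sql
-- Scenario 1: a few hand-written rows to verify functionality.
INSERT INTO example_db.tbl_dst VALUES (1, "a", 10), (2, "b", 20);

-- Scenarios 2 and 3: ETL from an existing Doris table or an external table.
INSERT INTO example_db.tbl_dst
SELECT id, name, cnt * 2
FROM example_db.tbl_src
WHERE cnt > 0;
```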

@@ -62,7 +62,7 @@ Usually used to troubleshoot network problems.

### `doris_fe_snmp{name="tcp_in_segs"}`

- Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of receivied TCP packets.
+ Value of the `Tcp: InSegs` field in `/proc/net/snmp`. Represents the number of received TCP packets.

`(NEW_tcp_in_errs - OLD_tcp_in_errs) / (NEW_tcp_in_segs - OLD_tcp_in_segs)` can be used to calculate the error rate of received TCP packets.

2 changes: 1 addition & 1 deletion docs/en/community/how-to-contribute.md
@@ -30,7 +30,7 @@ Thank you very much for your interest in the Doris project. We welcome your sugg

Your suggestions and comments on Doris can be made directly through GitHub's [Issues](https://github.com/apache/incubator-doris/issues/new/selection).

- There are many ways to participate in and contribute to Doris projects: code implementation, test writing, process tool improvement, document improvement, and so on. Any contribution will be welcomed and you will be added to the list of contributors. Further, with sufficient contributions, you will have the opportunity to become a Commiter of Aapche with Apache mailbox and be included in the list of [Apache Commiters] (http://people.apache.org/committer-index.html).
+ There are many ways to participate in and contribute to Doris projects: code implementation, test writing, process tool improvement, document improvement, and so on. Any contribution will be welcomed and you will be added to the list of contributors. Further, with sufficient contributions, you will have the opportunity to become a Committer of Apache with Apache mailbox and be included in the list of [Apache Committers] (http://people.apache.org/committer-index.html).

If you have any questions, you can contact us to get timely answers via WeChat, Gitter (GitHub's instant messaging tool), e-mail, and so on.

4 changes: 2 additions & 2 deletions docs/en/developer-guide/fe-eclipse-dev.md
@@ -102,7 +102,7 @@ Then just run `Run/Debug`.

## Run FE

- You can directly start an FE process in Ecplise to facilitate debugging the code.
+ You can directly start an FE process in Eclipse to facilitate debugging the code.

1. Create a runtime directory

@@ -116,7 +116,7 @@ You can directly start an FE process in Ecplise to facilitate debugging the code

Create the configuration file `fe.conf` in the `conf/` directory created in the first step. You can directly copy `conf/fe.conf` in the source directory and make simple changes.

- 3. Find the `src/main/java/org/apache/doris/PaloFe.java` file in Ecplise, right-click and select `Run As -> Run Configurations...`. Add the following environment variables to the `Environment` tab:
+ 3. Find the `src/main/java/org/apache/doris/PaloFe.java` file in Eclipse, right-click and select `Run As -> Run Configurations...`. Add the following environment variables to the `Environment` tab:

* `DORIS_HOME: /path/to/doris/fe/run/`
* `PID_DIR: /path/to/doris/fe/run/`
2 changes: 1 addition & 1 deletion docs/en/developer-guide/format-code.md
@@ -66,7 +66,7 @@ the version is lower than clang-format-9.0.

`-i` input file

- Note: filter out the files which should not be formatted, when batch clang-formating files.
+ Note: filter out the files which should not be formatted, when batch clang-formatting files.

An example of how to filter \*.h/\*.cpp and exclude some dirs:

2 changes: 1 addition & 1 deletion docs/en/extending-doris/doris-on-es.md
@@ -136,7 +136,7 @@ Parameter | Description
**password** | password for the user

* For clusters before 7.x, please pay attention to choosing the correct type when building the table
- * The authentication method only supports Http Bastic authentication, need to ensure that this user has access to: /\_cluster/state/, \_nodes/http and other paths and index read permissions;The cluster has not turned on security authentication, and the user name and password do not need to be set
+ * The authentication method only supports Http Basic authentication, need to ensure that this user has access to: /\_cluster/state/, \_nodes/http and other paths and index read permissions;The cluster has not turned on security authentication, and the user name and password do not need to be set
* The column names in the Doris table need to exactly match the field names in the ES, and the field types should be as consistent as possible
* **ENGINE** must be: **Elasticsearch**
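A minimal sketch of such a table; the host, index, user, and password are invented, and the `type` property matters only for pre-7.x clusters:

```sql
-- Hypothetical ES-backed external table; column names must exactly
-- match the ES field names, and ENGINE must be Elasticsearch.
CREATE EXTERNAL TABLE example_db.es_table
(
    `id`   BIGINT,
    `city` VARCHAR(32)
)
ENGINE = ELASTICSEARCH
PROPERTIES
(
    "hosts"    = "http://192.168.0.1:9200",
    "index"    = "test_index",
    "type"     = "doc",    -- choose the correct type for pre-7.x clusters
    "user"     = "root",   -- omit user/password if security is off
    "password" = "root"
);
```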

18 changes: 9 additions & 9 deletions docs/en/getting-started/data-model-rollup.md
@@ -86,7 +86,7 @@ AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
As you can see, this is a typical fact table of user information and access behavior.
In a general star model, user information and access behavior are stored in the dimension table and the fact table respectively. Here, in order to explain Doris's data model more conveniently, we store the two parts of information in a single table.

- The columns in the table are divided into Key (dimension column) and Value (indicator column) according to whether `AggregationType`is set or not. No `AggregationType`, such as `user_id`, `date`, `age`, etc., is set as **Key**, while AggregationType'is set as **Value**.
+ The columns in the table are divided into Key (dimension column) and Value (indicator column) according to whether `AggregationType`is set or not. No `AggregationType`, such as `user_id`, `date`, `age`, etc., is set as **Key**, while Aggregation Type is set as **Value**.

When we import data, rows with the same Key columns are aggregated into one row, while the Value columns are aggregated according to the set `AggregationType`. `AggregationType` currently has the following four ways of aggregation:

@@ -162,7 +162,7 @@ Following example 1, we modify the table structure as follows:
| max dwell time | INT | MAX | Maximum user residence time|
| min dwell time | INT | MIN | User minimum residence time|

- That is to say, a column of `timestamp'has been added to record the data filling time accurate to seconds.
+ That is to say, a column of `timestamp` has been added to record the data filling time accurate to seconds.

The imported data are as follows:

@@ -188,7 +188,7 @@ Then when this batch of data is imported into Doris correctly, the final storage
| 10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3|
| 10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6|

- We can see that the stored data, just like the imported data, does not aggregate at all. This is because, in this batch of data, because the `timestamp'column is added, the Keys of all rows are **not exactly the same**. That is, as long as the keys of each row are not identical in the imported data, Doris can save the complete detailed data even in the aggregation model.
+ We can see that the stored data, just like the imported data, does not aggregate at all. This is because, in this batch of data, because the `timestamp` column is added, the Keys of all rows are **not exactly the same**. That is, as long as the keys of each row are not identical in the imported data, Doris can save the complete detailed data even in the aggregation model.

### Example 3: Importing data and aggregating existing data

@@ -222,7 +222,7 @@ Then when this batch of data is imported into Doris correctly, the final storage
| 10004 | 2017-10-03 | Shenzhen | 35 | 0 | 2017-10-03 11:22:00 | 55 | 19 | 6|
| 10005 | 2017-10-03 | Changsha | 29 | 1 | 2017-10-03 18:11:02 | 3 | 1 | 1|

- As you can see, the existing data and the newly imported data of user 10004 have been aggregated. At the same time, 10005 new users'data were added.
+ As you can see, the existing data and the newly imported data of user 10004 have been aggregated. At the same time, 10005 new user's data were added.

Data aggregation occurs in Doris in the following three stages:

@@ -434,7 +434,7 @@ When we do the following queries:

Doris automatically hits the ROLLUP table.

- #### OLLUP in Duplicate Model
+ #### ROLLUP in Duplicate Model

Because the Duplicate model has no aggregation semantics, the ROLLUP in this model has lost the meaning of "roll up". It is used only to adjust the column order to hit the prefix index. In the next section, we will introduce the prefix index in detail, and how to use ROLLUP to change the prefix index in order to achieve better query efficiency.

@@ -513,15 +513,15 @@ The ROLLUP table is preferred because the prefix index of ROLLUP matches better.

### Some Explanations of ROLLUP

- * The fundamental role of ROLLUP is to improve the query efficiency of some queries (whether by aggregating to reduce the amount of data or by modifying column order to match prefix indexes). Therefore, the meaning of ROLLUP has gone beyond the scope of "roll-up". That's why we named it Materized Index in the source code.
+ * The fundamental role of ROLLUP is to improve the query efficiency of some queries (whether by aggregating to reduce the amount of data or by modifying column order to match prefix indexes). Therefore, the meaning of ROLLUP has gone beyond the scope of "roll-up". That's why we named it Materialized Index in the source code.
* ROLLUP is attached to the Base table and can be seen as an auxiliary data structure of the Base table. Users can create or delete ROLLUP based on the Base table, but cannot explicitly specify a query for a ROLLUP in the query. Whether ROLLUP is hit or not is entirely determined by the Doris system.
- * ROLLUP data is stored in separate physical storage. Therefore, the more OLLUP you create, the more disk space you occupy. It also has an impact on the speed of import (the ETL phase of import automatically generates all ROLLUP data), but it does not reduce query efficiency (only better).
+ * ROLLUP data is stored in separate physical storage. Therefore, the more ROLLUP you create, the more disk space you occupy. It also has an impact on the speed of import (the ETL phase of import automatically generates all ROLLUP data), but it does not reduce query efficiency (only better).
* Data updates for ROLLUP are fully synchronized with the Base table. Users need not care about this problem.
* Columns in a ROLLUP are aggregated in exactly the same way as in the Base table; the aggregation need not be specified, and cannot be modified, when creating the ROLLUP.
* A necessary (but insufficient) condition for a query to hit a ROLLUP is that **all columns involved in the query (including the query condition columns in the select list and where clause) exist in the ROLLUP's columns**. Otherwise, the query can only hit the Base table.
* Certain types of queries (such as count (*)) cannot hit ROLLUP under any conditions. See the next section **Limitations of the aggregation model**.
* The query execution plan can be obtained by the `EXPLAIN your_sql;` command, and whether a ROLLUP has been hit can be checked in the execution plan.
- * Base tables and all created ROLLUPs can be displayed by `DESC tbl_name ALL;` statement.
+ * Base tables and all created ROLLUP can be displayed by `DESC tbl_name ALL;` statement.
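For example, assuming a hypothetical table `tbl`, the two statements mentioned in the last points look like this:

```sql
-- Inspect the execution plan to check whether a ROLLUP was hit.
EXPLAIN SELECT city, SUM(cost) FROM tbl GROUP BY city;

-- Display the Base table together with all of its ROLLUPs.
DESC tbl ALL;
```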

In this document, you can see [Query how to hit Rollup](hit-the-rollup)

@@ -622,7 +622,7 @@ Therefore, when there are frequent count (*) queries in the business, we recomme

Add a count column and import data with the column value **equal to 1**. The result of `select count (*) from table;` is then equivalent to `select sum (count) from table;`, and the query efficiency of the latter is much higher than that of the former. However, this method also has a limitation: users need to guarantee that they will not repeatedly import rows with the same AGGREGATE KEY columns. Otherwise, `select sum (count) from table;` can only express the number of rows originally imported, not the semantics of `select count (*) from table;`.
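A sketch of the SUM-based variant just described, with invented names; the key point is that the `count` column always carries the constant 1:

```sql
-- Hypothetical aggregate table: `count` is always imported as 1.
CREATE TABLE example_db.visits
(
    `user_id` BIGINT,
    `date`    DATE,
    `cost`    BIGINT SUM,
    `count`   BIGINT SUM DEFAULT "1"
)
AGGREGATE KEY(`user_id`, `date`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 10;

-- Equivalent to SELECT COUNT(*) only when no rows with identical
-- AGGREGATE KEY columns are imported twice.
SELECT SUM(`count`) FROM example_db.visits;
```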

- Another way is to **change the aggregation type of the count'column above to REPLACE, and still weigh 1**. Then`select sum (count) from table;` and `select count (*) from table;` the results will be consistent. And in this way, there is no restriction on importing duplicate rows.
+ Another way is to **change the aggregation type of the count column above to REPLACE, and still weigh 1**. Then`select sum (count) from table;` and `select count (*) from table;` the results will be consistent. And in this way, there is no restriction on importing duplicate rows.

### Duplicate Model

2 changes: 1 addition & 1 deletion docs/en/getting-started/hit-the-rollup.md
@@ -279,7 +279,7 @@ See the following queries:

`SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);`

- Firstly, it judges whether the query can hit the aggregated Rolup table. After checking the graph above, it is possible. Then the condition contains three conditions: k1, K2 and k3. The first three columns of test_rollup, rollup1 and rollup2 contain all the three conditions. So the prefix index length is the same. Then, it is obvious that the aggregation degree of rollup2 is the highest when comparing the number of rows. Row 2 is selected because of the minimum number of rows.
+ Firstly, it judges whether the query can hit the aggregated Rollup table. After checking the graph above, it is possible. Then the condition contains three conditions: k1, K2 and k3. The first three columns of test_rollup, rollup1 and rollup2 contain all the three conditions. So the prefix index length is the same. Then, it is obvious that the aggregation degree of rollup2 is the highest when comparing the number of rows. Row 2 is selected because of the minimum number of rows.

```
| 0:OlapScanNode |
```