Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COPY from public website seems really slow #6722

Closed
2 tasks
Tracked by #6642
Xuanwo opened this issue Jul 21, 2022 · 8 comments
Closed
2 tasks
Tracked by #6642

COPY from public website seems really slow #6722

Xuanwo opened this issue Jul 21, 2022 · 8 comments

Comments

@Xuanwo
Copy link
Member

Xuanwo commented Jul 21, 2022

Summary

COPY from the public website seems slow

https://repo.databend.rs/dataset/stateful/ontime_2006_100000.csv is 41.8 MiB.

:( curl https://repo.databend.rs/dataset/stateful/ontime_2006_100000.csv -o /tmp/test.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 41.8M  100 41.8M    0     0  5231k      0  0:00:08  0:00:08 --:--:-- 8574k

But this query seems to hang forever:

copy into ontime200 from 'https://repo.databend.rs/dataset/stateful/ontime_2006_100000.csv'     
    FILE_FORMAT = (type = 'CSV' field_delimiter = ','  record_delimiter = '\n' skip_header = 1);

To address this problem:

  • Add more debug log for file splitter
  • Profile COPY
@BohuTANG
Copy link
Member

repo.databend.rs is dns&proxy by Cloudflare. Is it related to this? cc @everpcpc @flaneur2020

@Xuanwo
Copy link
Member Author

Xuanwo commented Jul 21, 2022

repo.databend.rs is dns&proxy by Cloudflare. Is it related to this? cc @everpcpc @flaneur2020

I think it's a databend-query's problem.

@BohuTANG
Copy link
Member

Seems we should add more metrics to do the profile, like the copy progress rasing from the openDAL.

@flaneur2020
Copy link
Member

flaneur2020 commented Jul 21, 2022

I guess we can add some metrics in the COPY progress, e.g. bytes read from source, bytes ingested, active threads count on copy

@Xuanwo
Copy link
Member Author

Xuanwo commented Jul 21, 2022

Yep, let's do this.

I have added mostly the features I wanted in COPY, it's time to improve the performance.

@Xuanwo
Copy link
Member Author

Xuanwo commented Jul 21, 2022

Based on my local test, the bottleneck is the insert part.

We spent less than 1s to read 1MB of data from the source but needed about 2 minutes to start the next reading round.

The first line (2022-07-21T02:32:27) is where we received(filled) the input buf.
The last line (2022-07-21T02:34:40) is the first call to read from the source again.

2022-07-21T02:32:27.850785Z DEBUG databend_query::pipelines::processors::sources::file_splitter: received date from source: 1048576 bytes
2022-07-21T02:33:00.291908Z DEBUG apply_cmd{kv_pairs=None}: common_meta_raft_store::state_machine::sm: apply_cmd: UpsertKV { key: "__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3", seq: GE(1), value: AsIs, value_meta: Some(KVMeta { expire_at: Some(1658370840) }) }
2022-07-21T02:33:00.292245Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370840) })}: common_meta_raft_store::state_machine::sm: kv_cmd key=__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 seq=GE(1) value_op=AsIs value_meta=Some(KVMeta { expire_at: Some(1658370840) })
2022-07-21T02:33:00.292659Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370840) })}: common_meta_raft_store::state_machine::sm: seq_value: SeqV { seq: 137, meta: Some(KVMeta { expire_at: Some(1658370800) }), data: "[binary]" } now: 1658370780
2022-07-21T02:33:00.292895Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370840) })}: common_meta_raft_store::state_machine::sm: applied IncrSeq: generic-kv=138
2022-07-21T02:33:00.293128Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370840) })}: common_meta_raft_store::state_machine::sm: applied upsert: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 138, meta: Some(KVMeta { expire_at: Some(1658370840) }), data: "[binary]" })
2022-07-21T02:33:00.293266Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370840) })}: common_meta_raft_store::state_machine::sm: applied UpsertKV: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 138, meta: Some(KVMeta { expire_at: Some(1658370840) }), data: "[binary]" })
2022-07-21T02:33:00.293611Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 772840 to 772864
2022-07-21T02:33:00.293930Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 772864 to 773318
2022-07-21T02:33:00.307222Z DEBUG sled::pagecache::iobuf: wrote lsns 772840-772863 to disk at offsets 772840-772863, maxed false complete_len 24
2022-07-21T02:33:00.307550Z DEBUG sled::pagecache::iobuf: mark_interval(772840, 24)
2022-07-21T02:33:00.307734Z DEBUG sled::pagecache::iobuf: new highest interval: 772840 - 772863
2022-07-21T02:33:00.316397Z DEBUG sled::pagecache::iobuf: wrote lsns 772864-773317 to disk at offsets 772864-773317, maxed false complete_len 454
2022-07-21T02:33:00.316592Z DEBUG sled::pagecache::iobuf: mark_interval(772864, 454)
2022-07-21T02:33:00.316752Z DEBUG sled::pagecache::iobuf: new highest interval: 772864 - 773317
2022-07-21T02:33:00.316963Z DEBUG sled::pagecache::iobuf: make_stable(773317) returning
2022-07-21T02:33:22.604603Z DEBUG apply_cmd{kv_pairs=None}: common_meta_raft_store::state_machine::sm: apply_cmd: UpsertKV { key: "__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3", seq: GE(1), value: AsIs, value_meta: Some(KVMeta { expire_at: Some(1658370862) }) }
2022-07-21T02:33:22.604917Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370862) })}: common_meta_raft_store::state_machine::sm: kv_cmd key=__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 seq=GE(1) value_op=AsIs value_meta=Some(KVMeta { expire_at: Some(1658370862) })
2022-07-21T02:33:22.605258Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370862) })}: common_meta_raft_store::state_machine::sm: seq_value: SeqV { seq: 138, meta: Some(KVMeta { expire_at: Some(1658370840) }), data: "[binary]" } now: 1658370802
2022-07-21T02:33:22.605503Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370862) })}: common_meta_raft_store::state_machine::sm: applied IncrSeq: generic-kv=139
2022-07-21T02:33:22.605755Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370862) })}: common_meta_raft_store::state_machine::sm: applied upsert: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 139, meta: Some(KVMeta { expire_at: Some(1658370862) }), data: "[binary]" })
2022-07-21T02:33:22.605909Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370862) })}: common_meta_raft_store::state_machine::sm: applied UpsertKV: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 139, meta: Some(KVMeta { expire_at: Some(1658370862) }), data: "[binary]" })
2022-07-21T02:33:22.606281Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 773318 to 773342
2022-07-21T02:33:22.606675Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 773342 to 774244
2022-07-21T02:33:22.624292Z DEBUG sled::pagecache::iobuf: wrote lsns 773318-773341 to disk at offsets 773318-773341, maxed false complete_len 24
2022-07-21T02:33:22.624532Z DEBUG sled::pagecache::iobuf: mark_interval(773318, 24)
2022-07-21T02:33:22.624656Z DEBUG sled::pagecache::iobuf: new highest interval: 773318 - 773341
2022-07-21T02:33:22.633336Z DEBUG sled::pagecache::iobuf: wrote lsns 773342-774243 to disk at offsets 773342-774243, maxed false complete_len 902
2022-07-21T02:33:22.633476Z DEBUG sled::pagecache::iobuf: mark_interval(773342, 902)
2022-07-21T02:33:22.633593Z DEBUG sled::pagecache::iobuf: new highest interval: 773342 - 774243
2022-07-21T02:33:22.633769Z DEBUG sled::pagecache::iobuf: make_stable(774243) returning
2022-07-21T02:33:50.648075Z DEBUG apply_cmd{kv_pairs=None}: common_meta_raft_store::state_machine::sm: apply_cmd: UpsertKV { key: "__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3", seq: GE(1), value: AsIs, value_meta: Some(KVMeta { expire_at: Some(1658370890) }) }
2022-07-21T02:33:50.648505Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370890) })}: common_meta_raft_store::state_machine::sm: kv_cmd key=__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 seq=GE(1) value_op=AsIs value_meta=Some(KVMeta { expire_at: Some(1658370890) })
2022-07-21T02:33:50.648962Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370890) })}: common_meta_raft_store::state_machine::sm: seq_value: SeqV { seq: 139, meta: Some(KVMeta { expire_at: Some(1658370862) }), data: "[binary]" } now: 1658370830
2022-07-21T02:33:50.649279Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370890) })}: common_meta_raft_store::state_machine::sm: applied IncrSeq: generic-kv=140
2022-07-21T02:33:50.649644Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370890) })}: common_meta_raft_store::state_machine::sm: applied upsert: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 140, meta: Some(KVMeta { expire_at: Some(1658370890) }), data: "[binary]" })
2022-07-21T02:33:50.649864Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370890) })}: common_meta_raft_store::state_machine::sm: applied UpsertKV: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 140, meta: Some(KVMeta { expire_at: Some(1658370890) }), data: "[binary]" })
2022-07-21T02:33:50.650420Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 774244 to 774268
2022-07-21T02:33:50.650880Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 774268 to 774722
2022-07-21T02:33:50.668470Z DEBUG sled::pagecache::iobuf: wrote lsns 774244-774267 to disk at offsets 774244-774267, maxed false complete_len 24
2022-07-21T02:33:50.668757Z DEBUG sled::pagecache::iobuf: mark_interval(774244, 24)
2022-07-21T02:33:50.668931Z DEBUG sled::pagecache::iobuf: new highest interval: 774244 - 774267
2022-07-21T02:33:50.677624Z DEBUG sled::pagecache::iobuf: wrote lsns 774268-774721 to disk at offsets 774268-774721, maxed false complete_len 454
2022-07-21T02:33:50.677798Z DEBUG sled::pagecache::iobuf: mark_interval(774268, 454)
2022-07-21T02:33:50.677951Z DEBUG sled::pagecache::iobuf: new highest interval: 774268 - 774721
2022-07-21T02:33:50.678153Z DEBUG sled::pagecache::iobuf: make_stable(774721) returning
2022-07-21T02:34:24.214629Z DEBUG apply_cmd{kv_pairs=None}: common_meta_raft_store::state_machine::sm: apply_cmd: UpsertKV { key: "__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3", seq: GE(1), value: AsIs, value_meta: Some(KVMeta { expire_at: Some(1658370924) }) }
2022-07-21T02:34:24.214996Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370924) })}: common_meta_raft_store::state_machine::sm: kv_cmd key=__fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 seq=GE(1) value_op=AsIs value_meta=Some(KVMeta { expire_at: Some(1658370924) })
2022-07-21T02:34:24.215320Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370924) })}: common_meta_raft_store::state_machine::sm: seq_value: SeqV { seq: 140, meta: Some(KVMeta { expire_at: Some(1658370890) }), data: "[binary]" } now: 1658370864
2022-07-21T02:34:24.215549Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370924) })}: common_meta_raft_store::state_machine::sm: applied IncrSeq: generic-kv=141
2022-07-21T02:34:24.215791Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370924) })}: common_meta_raft_store::state_machine::sm: applied upsert: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 141, meta: Some(KVMeta { expire_at: Some(1658370924) }), data: "[binary]" })
2022-07-21T02:34:24.215932Z DEBUG apply_cmd{kv_pairs=None}:apply_update_kv_cmd{value_meta=Some(KVMeta { expire_at: Some(1658370924) })}: common_meta_raft_store::state_machine::sm: applied UpsertKV: __fd_clusters/admin//databend_query/Ev5yjtF1NFfkLbBMBLHOT3 Some(SeqV { seq: 141, meta: Some(KVMeta { expire_at: Some(1658370924) }), data: "[binary]" })
2022-07-21T02:34:24.216269Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 774722 to 774746
2022-07-21T02:34:24.216627Z DEBUG sled::pagecache::iobuf: advancing offset within the current segment from 774746 to 775476
2022-07-21T02:34:24.228435Z DEBUG sled::pagecache::iobuf: wrote lsns 774722-774745 to disk at offsets 774722-774745, maxed false complete_len 24
2022-07-21T02:34:24.228573Z DEBUG sled::pagecache::iobuf: mark_interval(774722, 24)
2022-07-21T02:34:24.228681Z DEBUG sled::pagecache::iobuf: new highest interval: 774722 - 774745
2022-07-21T02:34:24.237389Z DEBUG sled::pagecache::iobuf: wrote lsns 774746-775475 to disk at offsets 774746-775475, maxed false complete_len 730
2022-07-21T02:34:24.237513Z DEBUG sled::pagecache::iobuf: mark_interval(774746, 730)
2022-07-21T02:34:24.237617Z DEBUG sled::pagecache::iobuf: new highest interval: 774746 - 775475
2022-07-21T02:34:24.237755Z DEBUG sled::pagecache::iobuf: make_stable(775475) returning
2022-07-21T02:34:40.736038Z DEBUG databend_query::pipelines::processors::sources::file_splitter: read from source: 4419 bytes

cc @youngsofun

@Xuanwo
Copy link
Member Author

Xuanwo commented Jul 26, 2022

Maybe related to apache/opendal#474

Please give a try after merged.

Not related.

@Xuanwo
Copy link
Member Author

Xuanwo commented Jul 28, 2022

The performance has been dramatically improved by #6873

Thanks, @youngsofun. Let's close.

@Xuanwo Xuanwo closed this as completed Jul 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants