Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topic data access slow on sort NEWEST #192

Closed
jorgheymans opened this issue Jan 17, 2020 · 20 comments
Closed

Topic data access slow on sort NEWEST #192

jorgheymans opened this issue Jan 17, 2020 · 20 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@jorgheymans
Copy link
Contributor

Hi,

When in the topic list overview, clicking on a topic that has a lot of data takes a long time before the record list shows up. This is a topic with only about 500k records. Checking perf in another browsing tool (kafkatool.com), there the records display in 4-5 seconds for the same topic. Is there anything that could potentially slow down fetching the records ?

@tchiotludo
Copy link
Owner

Hello,

Can you provide me some log after playing this curl

curl -i -X POST -H "Content-Type: application/json" \
       -d '{ "configuredLevel": "TRACE" }' \
       http://localhost:8080/loggers/org.kafkahq

It will help me to understand the response time.

One question, what is your sort preference on this page, and is this topic is compacted ?

@jorgheymans
Copy link
Contributor Author

here is the log, sort options are default so i guess 'Newest'

2020-01-17 12:51:04,195 DEBUG -thread-25 o.k.r.RecordRepository     16282 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=50, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-17 12:51:04,230 DEBUG -thread-25 org.kafkahq.utils.Logger   14 ms -> Describe Topic Config [internal_test]
2020-01-17 12:51:04,358 INFO  Group-1-10 org.kafkahq.log.access     [Date: 2020-01-17T12:50:47.643407+01:00] [Duration: 16712 ms] [Url: GET /kafka-nonprod/topic/internal_test HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 5569] [Port: 8050]

@jorgheymans
Copy link
Contributor Author

jorgheymans commented Jan 17, 2020 via email

@tchiotludo
Copy link
Owner

Ok thanks for information !

It seems that you use the NEWEST sort by default.
Default sort is OLDEST, it's because sort by NEWEST is slower than OLDEST.
Kafka Api only allow to sort by OLDEST.
The NEWEST is a hack I've to do on KafkaHQ in order to support this feature, to do this I use a parallelStream() that will fetch in parallel each partitions (with max of 1 thread per CPU, 8 cpu = 8 parallel thread).

So to resume, the NEWEST will depend

  • number of CPU (for parallelism)
  • number of partition on this topic.
    The more partition you have, the slower it will be.
    The CPU count will divide approximately the response time.

On the OLDEST, I use the standard Kafka Api, 1 Consumer fetch all the partition at the same time with 1 thread, and it's more efficient but I can't be done on NEWEST` sort because the record are on all partitions so we need to fetch all partitions.

If the topic is compacted, It can be worst (still with newest). Because it can lead to multiple query on Kafka to be able to fetch some data.

Here is the situation, but as I known I haven't any solution to speed up the NEWEST part (except have a config to allow more parallel fetch).

Just for information can you send me the log with sort OLDEST ?

@tchiotludo tchiotludo added the wait for reply Need more information from reporter label Jan 19, 2020
@jorgheymans
Copy link
Contributor Author

jorgheymans commented Jan 20, 2020

This is the debug with OLDEST, indeed it's really fast. Not sure that this is the default though, don't remember changing this ever in the UI. Or maybe i did very long time ago and it's stored in a cookie.

2020-01-20 10:02:54,011 DEBUG 1-thread-4 org.kafkahq.utils.Logger   11 ms -> Describe Topics [internal_test]
2020-01-20 10:02:54,183 DEBUG 1-thread-4 org.kafkahq.utils.Logger   167 ms -> Describe Topics Offsets [internal_test]
2020-01-20 10:02:54,191 DEBUG 1-thread-4 org.kafkahq.utils.Logger   6 ms -> Cluster
2020-01-20 10:02:54,202 DEBUG 1-thread-4 org.kafkahq.utils.Logger   17 ms -> List Log dir
2020-01-20 10:02:54,212 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 11] [start: 41689]
2020-01-20 10:02:54,221 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 12] [start: 41398]
2020-01-20 10:02:54,223 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 13] [start: 42547]
2020-01-20 10:02:54,225 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 14] [start: 43844]
2020-01-20 10:02:54,226 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 7] [start: 43009]
2020-01-20 10:02:54,227 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 8] [start: 43444]
2020-01-20 10:02:54,230 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 9] [start: 42060]
2020-01-20 10:02:54,232 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 10] [start: 43374]
2020-01-20 10:02:54,233 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 3] [start: 43086]
2020-01-20 10:02:54,234 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 4] [start: 42912]
2020-01-20 10:02:54,236 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 5] [start: 43441]
2020-01-20 10:02:54,237 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 6] [start: 44143]
2020-01-20 10:02:54,239 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 0] [start: 43800]
2020-01-20 10:02:54,239 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 1] [start: 43118]
2020-01-20 10:02:54,240 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 2] [start: 43382]
2020-01-20 10:02:54,241 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 19] [start: 42981]
2020-01-20 10:02:54,242 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 15] [start: 40706]
2020-01-20 10:02:54,244 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 16] [start: 42001]
2020-01-20 10:02:54,245 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 17] [start: 42685]
2020-01-20 10:02:54,246 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 18] [start: 41601]
2020-01-20 10:02:54,786 DEBUG 1-thread-4 o.k.r.RecordRepository     582 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=50, after={}, sort=OLDEST, partition=null, timestamp=null)]
2020-01-20 10:02:54,797 DEBUG 1-thread-4 org.kafkahq.utils.Logger   9 ms -> Describe Topic Config [internal_test]
2020-01-20 10:02:54,820 INFO  pGroup-1-8 org.kafkahq.log.access     [Date: 2020-01-20T10:02:53.9913347+01:00] [Duration: 828 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=OLDEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 9327] [Port: 8050]

@tchiotludo
Copy link
Owner

Ok just like I think.

The configuration can be changed on the configuration files only.
This can't be changed permanently from the ui (only for current page), and this is not store in session or cookie

@jorgheymans
Copy link
Contributor Author

Thanks, indeed it was set to NEWEST. Browsing OLDEST rarely makes sense in our case.

@tchiotludo
Copy link
Owner

So Like i said and can't change this behavior, since it was a limitation from Kafka Api.

The only things is to raise parallelism.
How many CPU does your KafkaHQ installation have ?

@jorgheymans
Copy link
Contributor Author

This is the CPU spec: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, 3601 Mhz, 4 Core(s), 8 Logical Processor(s)

For comparison, i have created a gif of loading the same topic data in the other tool. This just to exclude connectivity or infrastructure issues with the cluster. It does seem possible to get back some data rather quickly, so wondering where the main difference could be with kafkahq. I hope you don't mind me posting this, if not let me know and i'll remove it.

test-load

@tchiotludo
Copy link
Owner

tchiotludo commented Jan 20, 2020

No problems for screenshot.
To be honest, I've no clue how kafka tools deal about this.
KafkaHQ will use the param kafkahq.topicdata.size (default: 50) to determine the poll per partition.
So must be 3 in your case with a parralel of 8.

I'have added some log on the dev docker image, is it possible to have the log output (will log line like Consume [topic: internal_test] [partition: 9] ) ?

@jorgheymans
Copy link
Contributor Author

@tchiotludo sorry but docker I cannot easily test here. Do you have a snapshot of the jar available somewhere ?

@tchiotludo
Copy link
Owner

Yes look at this :
https://github.com/tchiotludo/kafkahq#early-dev-image

You can grab the jar on your local desktop

@jorgheymans
Copy link
Contributor Author

Using latest dev build i get NPE when wanting to browse topic data. Note I built it myself locally.

Caused by: java.lang.NullPointerException: null
        at org.kafkahq.modules.KafkaModule.getRegistryClient(KafkaModule.java:181)
        at org.kafkahq.repositories.SchemaRegistryRepository.getKafkaAvroDeserializer(SchemaRegistryRepository.java:188)
        at org.kafkahq.repositories.RecordRepository.newRecord(RecordRepository.java:335)
        at org.kafkahq.repositories.RecordRepository.consumeOldest(RecordRepository.java:83)
        at org.kafkahq.repositories.RecordRepository.lambda$consume$0(RecordRepository.java:53)
        at org.kafkahq.utils.Debug.call(Debug.java:68)
        ... 189 common frames omitted

tchiotludo added a commit that referenced this issue Jan 21, 2020
@tchiotludo
Copy link
Owner

Arg, I just push a new commit on dev branch, can you retry please ?

@jorgheymans
Copy link
Contributor Author

Here is the new output. Note that at the end, the page says 'no data available' which i find a bit strange. I have seen this happening before with NEWEST, never with OLDEST. There is definitely data in the topic as you can see from below output. Sometimes after a few refreshes the data reappears.

2020-01-21 15:15:50,485 DEBUG -thread-14 org.kafkahq.utils.Logger   8 ms -> Describe Topics [internal_test]
2020-01-21 15:15:50,667 DEBUG -thread-14 org.kafkahq.utils.Logger   180 ms -> Describe Topics Offsets [internal_test]
2020-01-21 15:15:50,675 DEBUG -thread-14 org.kafkahq.utils.Logger   6 ms -> Cluster
2020-01-21 15:15:50,689 DEBUG -thread-14 org.kafkahq.utils.Logger   20 ms -> List Log dir
2020-01-21 15:15:50,718 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 8] [start: 71690]
2020-01-21 15:15:50,723 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 2] [start: 70525]
2020-01-21 15:15:50,753 TRACE l-worker-1 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 17] [start: 69990]
2020-01-21 15:15:50,760 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 1] [start: 69334]
2020-01-21 15:15:50,756 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 6] [start: 71169]
2020-01-21 15:15:50,760 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 5] [start: 70765]
2020-01-21 15:15:50,760 TRACE -thread-14 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 12] [start: 67653]
2020-01-21 15:15:50,760 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 16] [start: 69550]
2020-01-21 15:15:55,750 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 9] [start: 69593]
2020-01-21 15:15:55,773 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 4] [start: 69306]
2020-01-21 15:15:55,780 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 0] [start: 71112]
2020-01-21 15:15:55,793 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 10] [start: 70450]
2020-01-21 15:15:55,826 TRACE -thread-14 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 14] [start: 72041]
2020-01-21 15:15:55,827 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 11] [start: 69451]
2020-01-21 15:15:55,834 TRACE l-worker-1 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 19] [start: 70523]
2020-01-21 15:15:55,844 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 15] [start: 67242]
2020-01-21 15:16:00,774 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 7] [start: 71021]
2020-01-21 15:16:00,808 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 3] [start: 69968]
2020-01-21 15:16:00,808 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 13] [start: 69930]
2020-01-21 15:16:00,808 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 18] [start: 68719]
2020-01-21 15:16:05,813 DEBUG -thread-14 o.k.r.RecordRepository     15119 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=3, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-21 15:16:05,827 DEBUG -thread-14 org.kafkahq.utils.Logger   10 ms -> Describe Topic Config [internal_test]
2020-01-21 15:16:05,837 INFO  pGroup-1-6 org.kafkahq.log.access     [Date: 2020-01-21T15:15:50.465087+01:00] [Duration: 15372 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=NEWEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 2462] [Port: 8050]

@tchiotludo
Copy link
Owner

The no data available can happen, because Kafka API can't let me know if there is data and I reach the poll timeout (that you can change in config).
In this case, the Kafka API don't let me know if it's timeout or no record.

The latest log output is showing no data available, right ?
Because every query are 5 seconds long.
You seems to reach the timeout.
We can see that you are using 8 parallel thread.
Can you also provide some metrics ? (Log size and offset for this topic)

@jorgheymans
Copy link
Contributor Author

jorgheymans commented Jan 22, 2020

I increased poll-timeout to 10000 but still getting "no data available". There are 550k records in the topic, size is about 110MB. EDIT: increasing poll-timeout to 20000 makes results appear again.

2020-01-22 08:09:59,394 DEBUG 1-thread-4 org.kafkahq.utils.Logger   8 ms -> Describe Topics [internal_test]
2020-01-22 08:09:59,613 DEBUG 1-thread-4 org.kafkahq.utils.Logger   217 ms -> Describe Topics Offsets [internal_test]
2020-01-22 08:09:59,623 DEBUG 1-thread-4 org.kafkahq.utils.Logger   8 ms -> Cluster
2020-01-22 08:09:59,634 DEBUG 1-thread-4 org.kafkahq.utils.Logger   19 ms -> List Log dir
2020-01-22 08:09:59,671 TRACE 1-thread-4 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 12] [start: 67905]
2020-01-22 08:09:59,711 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 17] [start: 70274]
2020-01-22 08:09:59,722 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 1] [start: 69539]
2020-01-22 08:09:59,727 TRACE l-worker-7 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 11] [start: 69633]
2020-01-22 08:09:59,730 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 2] [start: 70765]
2020-01-22 08:09:59,731 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 6] [start: 71418]
2020-01-22 08:09:59,732 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 19] [start: 70666]
2020-01-22 08:09:59,734 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 16] [start: 69737]
2020-01-22 08:10:09,711 TRACE 1-thread-4 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 14] [start: 72243]
2020-01-22 08:10:09,739 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 13] [start: 70158]
2020-01-22 08:10:09,757 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 18] [start: 68923]
2020-01-22 08:10:09,767 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 4] [start: 69510]
2020-01-22 08:10:09,776 TRACE l-worker-7 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 10] [start: 70715]
2020-01-22 08:10:09,777 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 15] [start: 67380]
2020-01-22 08:10:09,784 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 0] [start: 71372]
2020-01-22 08:10:09,791 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 5] [start: 70983]
2020-01-22 08:10:19,798 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 8] [start: 71868]
2020-01-22 08:10:19,806 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 7] [start: 71230]
2020-01-22 08:10:19,810 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 9] [start: 69797]
2020-01-22 08:10:19,811 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 3] [start: 70138]
2020-01-22 08:10:29,827 DEBUG 1-thread-4 o.k.r.RecordRepository     30191 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=3, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-22 08:10:29,859 DEBUG 1-thread-4 org.kafkahq.utils.Logger   9 ms -> Describe Topic Config [internal_test]
2020-01-22 08:10:29,873 INFO  pGroup-1-6 org.kafkahq.log.access     [Date: 2020-01-22T08:09:59.381221+01:00] [Duration: 30491 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=NEWEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 2461] [Port: 8050]

@tchiotludo tchiotludo changed the title topic data access slow Topic data access slow on sort NEWEST Jan 23, 2020
@tchiotludo tchiotludo added bug Something isn't working good first issue Good for newcomers and removed wait for reply Need more information from reporter labels Jan 23, 2020
@jorgheymans
Copy link
Contributor Author

The amount of volume in the topic is not related to this bug it seems. For a topic of 5k records i had to wait 35 seconds to get back only one record, that's a bit surprising.

ghost pushed a commit that referenced this issue Jul 17, 2020
* fixed bug

* erased console log and fixed formatDateTime function

* topic data filters saved in session storage

* schema column responsive
@PavelYadrov
Copy link

PavelYadrov commented Jan 26, 2023

Hello,
I've faced with the same issue, but sort by newest fails even for topics with 200 records.
Capture

In logs i can't see any errors, it loads records from all partitions (5 in my case) and then show execution time:
[0;39m 30120 ms -> Consume with options [RecordRepository.Options(topic=test_topic, size=50, after={}, sort=NEWEST, partition=null, timestamp=null)]

Could you please help me with understanding why AKHQ fails to load even in topics with small amount of records.
Note: with default OLDEST sorting it works just fine

@tchiotludo
Copy link
Owner

Seems that the reason can be found here

tchiotludo pushed a commit that referenced this issue Apr 4, 2023
- Adding stop condition in newest sort to prevent extra polling 
- Fixing wrong number of records with pagination

close #192
@tchiotludo tchiotludo moved this to Done in Backlog Jun 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
Status: Done
Development

No branches or pull requests

3 participants