Text Index lag time for realtime consuming segments #6431

Open

opschronicle opened this issue Jan 12, 2021 · 20 comments

Comments

@opschronicle

I have a text index and an inverted index, and for some data the text index is not able to pull out results, while for the same search the inverted index works every time.

E.g., the text index search for the following is not working:
select * from mylog where text_match(log, '1DB7BAD2-406A-D773-B1DA-291664314C4F')

but it works for:
select * from mylog where text_match(log, '0D82F520-62C8-9914-14B8-4C2331E54075')

It looks like some data is somehow skipped by the text index. However, both searches work fine with regexp_like and the inverted index.

@opschronicle
Author

A partial data set for the non-working search is:

myservicev5applog 10:52:37,468 INFO  filters.AllCheckInitFilter - [1DB7BAD2-406A-D773-B1DA-291664314C4F] Begin Transaction @ 1/12/21 10:52 AM, associated with session [6E4B98FA819713F21BCEB7A772F4E77C].

myservicev5applog 10:52:37,468 INFO  filters.AllCheckInitFilter - [1DB7BAD2-406A-D773-B1DA-291664314C4F] Processing request: /checkiot.action?

myservicev5applog 10:52:37,468 DEBUG matcher.SimplePathRequestMatcher - [1DB7BAD2-406A-D773-B1DA-291664314C4F] Checking match of request : '/checkiot.action'; against '/AppMonPool'

myservicev5applog 10:52:37,468 DEBUG matcher.SimplePathRequestMatcher - [1DB7BAD2-406A-D773-B1DA-291664314C4F] Checking match of request : '/checkiot.action'; against '/iotPub/**'

myservicev5applog 10:52:37,468 DEBUG matcher.SimplePathRequestMatcher - [1DB7BAD2-406A-D773-B1DA-291664314C4F] Checking match of request : '/checkiot.action'; against '/services/**'

@opschronicle
Author

@kishoreg @siddharthteotia Could you point me in the right direction to fix this issue?

@opschronicle
Author

opschronicle commented Jan 12, 2021

@siddharthteotia I just tried now and can see the query working. So does that mean there is a huge delay before the text index indexes realtime data, or before the results appear in search? Are there any settings that control the text index timing?

@opschronicle
Author

@siddharthteotia The behavior I am seeing is that the text index is not pulling results from a segment that is in CONSUMING / IN_PROGRESS status. I am posting the consuming segment metadata here:

{
  "segment.realtime.endOffset": "9223372036854775807",
  "segment.start.time": "-1",
  "segment.time.unit": null,
  "segment.flush.threshold.size": "100000",
  "segment.realtime.startOffset": "0",
  "segment.end.time": "-1",
  "segment.total.docs": "-1",
  "segment.table.name": "mylog_REALTIME",
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1610430115929",
  "segment.realtime.download.url": null,
  "segment.name": "mylog__0__0__20210112T0541Z",
  "segment.index.version": null,
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "-1",
  "segment.realtime.status": "IN_PROGRESS"
}

@opschronicle
Author

Further analysis showed there is roughly a 3-minute delay before text index updates flow in. Following are the refresh times measured for the text index, so it looks like the text index is doing a bulk index update every three minutes rather than an incremental one. Is there any way to reduce this ~3 minutes to near-instant, like what happens for the inverted index?

2021-01-12 18:59:15.357000
2021-01-12 19:02:15.357000
2021-01-12 19:05:15.357000
2021-01-12 19:08:02.804000
2021-01-12 19:11:05.357000

@opschronicle
Author

opschronicle commented Jan 14, 2021

The issue is that we are seeing a delay in getting the realtime segment ingested; in this case it is about 3 minutes.
As per @siddharthteotia's suggestion, I tried to reduce the number of columns with a text index to make the text index processing faster. However, reducing the text index columns has no effect.

@opschronicle
Author

@siddharthteotia @kishoreg Reducing the row count to 1000 from 100000 gives better performance, with a delay of about 20 seconds. The timings are below. However, this causes the segment count to grow 100x. Could you please point me in the right direction to fix this issue? I have to assume this is a bug and might need a code change.

2021-01-14 22:51:06.986000
2021-01-14 22:51:26.987000
2021-01-14 22:51:46.986000
2021-01-14 22:52:06.986000
2021-01-14 22:52:26.987000

@siddharthteotia
Contributor

We need to optimize the text index snapshot refresh code. It uses a single background thread to refresh the index across all consuming segments. We need to make the refresh multi-threaded.
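
(For illustration only: a rough, hypothetical sketch of what a multi-threaded refresh could look like. The class and method names below are made up and this is not the actual Pinot implementation; the idea is simply to fan the per-segment refreshes out to a small pool so one large consuming segment no longer delays the others.)

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each consuming segment's text index is refreshed on
// its own pool thread, so the time for one refresh pass is bounded by the
// slowest segment rather than the sum of all segments.
public class ParallelTextIndexRefresher {

  // Placeholder for a per-segment mutable text index; not a Pinot class.
  interface MutableTextIndex {
    void refresh();
  }

  private final List<MutableTextIndex> consumingSegments = new CopyOnWriteArrayList<>();
  private final ExecutorService refreshPool;

  public ParallelTextIndexRefresher(int numRefreshThreads) {
    refreshPool = Executors.newFixedThreadPool(numRefreshThreads);
  }

  public void register(MutableTextIndex index) {
    consumingSegments.add(index);
  }

  // Called periodically by a scheduler (not shown): submit every segment's
  // refresh to the pool instead of running them one after another.
  public void refreshAll() {
    for (MutableTextIndex index : consumingSegments) {
      refreshPool.submit(index::refresh);
    }
  }

  public void stop() {
    refreshPool.shutdownNow();
  }
}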

@siddharthteotia
Contributor

Let me think a little about the details of the code changes. We can sync up then.

@siddharthteotia
Contributor

siddharthteotia commented Jan 15, 2021

Also, the issue is not due to a delay in getting the realtime segment ingested. This is a lag/consistency issue in the text index. We need to make the refresh faster to reduce the lag. Data is being consumed into the segment correctly and in a timely manner.

@siddharthteotia
Contributor

The issue is that we are seeing a delay in getting the realtime segment ingested; in this case it is about 3 minutes.
As per @siddharthteotia's suggestion, I tried to reduce the number of columns on the realtime table to make the text index processing faster. However, reducing the text index columns has no effect.

I had suggested reducing the number of columns with a text index, not reducing the number of columns in the table. That can potentially make the refresh code faster. How did you test this?

@opschronicle
Author

opschronicle commented Jan 15, 2021

@siddharthteotia, sorry for the misleading comment (I corrected my comment above). I did not modify the table. I tested by reducing the number of columns with a text index: I deleted the text index definitions for all columns except one, i.e. log.

The steps I followed:
a) Deleted the complete table config (which deleted all segments)
b) Restarted the server
c) Applied the new config, which has only a single text index column
d) Pinot started to create segments from the topic
e) Waited for some time for Pinot to catch up with the topic
f) Executed queries to check the lag

However, it did not make any difference. The lag remains the same.

But when I reduced the number of rows per segment, it did reduce the lag time. However, query performance for searches degraded slightly; I think that is because of the many small segments. So it would be better to fix the lag in the code.

@opschronicle
Author

@siddharthteotia If I understand correctly, the issue is because of the way the text index scans the CONSUMING segments: it keeps refreshing the entire CONSUMING segment. If the segment size is large, the refresh takes longer, which causes the lag. Please correct me if I am wrong.
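
(For reference, a minimal, hypothetical sketch of a single scheduled refresh pass over all consuming segments; the names are made up and this is not Pinot's actual code. It only illustrates why, under that assumption, the visibility lag would grow with the number and size of consuming segments.)

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one scheduled thread refreshes every consuming
// segment's text index in sequence, so the worst-case query lag is roughly
// the refresh interval plus the time of a full pass. A large consuming
// segment makes the whole pass, and therefore the lag, longer.
public class SequentialTextIndexRefresher {

  // Placeholder for a per-segment mutable text index; not a Pinot class.
  interface MutableTextIndex {
    void refresh(); // re-open the searcher so newly consumed rows become visible
  }

  private final List<MutableTextIndex> consumingSegments = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

  public void register(MutableTextIndex index) {
    consumingSegments.add(index);
  }

  public void start(long refreshIntervalMs) {
    scheduler.scheduleWithFixedDelay(() -> {
      for (MutableTextIndex index : consumingSegments) {
        index.refresh(); // sequential: pass time is the SUM of all segment refreshes
      }
    }, refreshIntervalMs, refreshIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}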

@opschronicle changed the title from "Text Index not working correctly" to "Text Index lag time for realtime consuming segments" on Jan 16, 2021
@opschronicle
Author

opschronicle commented Jan 25, 2021

@siddharthteotia
I don't know much about the Pinot code, but it seems like
MutableSegmentImpl.addNewRow() uses PhysicalColumnIndexContainer._xxxxxIndex to fetch the index for all types of indices.

PhysicalColumnIndexContainer._invertedIndex is pulled from a PinotDataBuffer (segmentReader.getIndexFor()), which I assume is memory or the consuming segment, and it is the same for _fstIndex, _rangeIndex, _h3Index, _jsonIndex, etc.

But _textIndex is loaded from segmentIndexDir, which is a physical file, or I guess completed segments.

So we might have to create a new constructor for LuceneTextIndexReader, e.g.
new LuceneTextIndexReader(segmentReader.getIndexFor(columnName, ColumnIndexType.TEXT_INDEX));
to read from the buffer?

Do you think this makes sense?
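
(To make the idea concrete, here is a small, self-contained example of plain Lucene near-real-time search: a SearcherManager opened on a live IndexWriter exposes newly added documents after maybeRefresh(), without re-opening the index from the files on disk. This is the general Lucene mechanism such a reader could build on, not a drop-in change to LuceneTextIndexReader; the class and field names below are for illustration only.)

import java.io.IOException;
import java.nio.file.Files;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.FSDirectory;

public class NrtTextIndexExample {
  public static void main(String[] args) throws IOException {
    FSDirectory dir = FSDirectory.open(Files.createTempDirectory("nrt-demo"));
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

    // Searchers acquired from this manager see the writer's buffered docs
    // after maybeRefresh(), without a commit or a full re-open from disk.
    SearcherManager searcherManager = new SearcherManager(writer, null);

    Document doc = new Document();
    doc.add(new TextField("log", "[1DB7BAD2-406A-D773-B1DA-291664314C4F] Begin Transaction", Field.Store.NO));
    writer.addDocument(doc);

    searcherManager.maybeRefresh();          // make the new doc visible to searches
    IndexSearcher searcher = searcherManager.acquire();
    try {
      System.out.println("docs visible: " + searcher.getIndexReader().numDocs());
    } finally {
      searcherManager.release(searcher);
    }

    searcherManager.close();
    writer.close();
    dir.close();
  }
}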

@siddharthteotia
Contributor

I have a proposal to fix this; it is essentially an enhancement. Let's meet and discuss. I am happy to help you make the code changes, otherwise I will find time to make them. In any case, let's meet.

@siddharthteotia
Contributor

Synced up with @pabrahamusa.

There was a bug in the realtime text index. I provided a custom build to Matt with the fix and more logging, and the lag issue was fixed with that build. Matt is doing some more testing.

@mcvsubbu
Contributor

@siddharthteotia Can you briefly describe the bug? I thought we had handled the text index from within the mutable segment?
In any case, we should add a test case for it.

@opschronicle
Author

opschronicle commented Jan 31, 2021

@siddharthteotia I am afraid further testing revealed that the issue is not fully fixed. The lag time is ~20-40 seconds when the segment row count is 100000; however, when I bumped it to 900000 the lag time jumped to ~7-9 minutes. Please note that the inverted index always gives stable results (~10 seconds) no matter the segment row count. It would be great to fix the bug so that the text index behaves the same as the inverted index.

@siddharthteotia
Contributor

siddharthteotia commented Feb 1, 2021

We can go through the logs tomorrow from the new build I gave you with additional logging and debug further. I thought you wanted to reduce the lag from 3 minutes to 30 seconds or so, which the new build has done.

Also, your comparison is not fair. If your functionality can be achieved with an inverted index, then a text index should not be used at all; the inverted index should always be the first choice.

But on the log column in your schema, you can't really use an inverted index for the kind of search you want to perform.

Secondly, arbitrarily increasing the number of rows in the consuming segment isn't a good idea either. If 100k gives the optimal performance, my suggestion would be to stick with it.

@opschronicle
Author

@siddharthteotia, we can go through this and I am sure we can get it fixed. Also, the fix you provided last week is working as expected for ~100K rows; thanks a lot for that.

From my preliminary testing, query performance appears the same for both the text index and the inverted index; it is just the NRT lag that differs.

The queries I tried with the inverted index use a term query, or exact match, i.e.

select * from mytable  where attr_one='value1' and attr_two = 'value2'  order by timemillis desc

Unfortunately I cannot use this inverted index search, as my requirement is to do a proper text search on a column. As I am migrating from Elasticsearch, I can use the Lucene regex patterns supported by the Pinot text index, which makes things easier for me.
