OAK-11277 Tree store: fix memory usage and support concurrent indexing #1873

Open: wants to merge 1 commit into trunk


thomasmueller
Member

The PR fixes a bunch of issues:

  • IndexMeta: when re-indexing multiple indexes, each thread that tries to open a writer first reads all metadata files (including files that are being added concurrently). The thread is only interested in the current index, on which it synchronizes; all other indexes are then ignored. The problem is that another thread might concurrently create another index, whose metadata file could still be empty when this thread reads it. Reading therefore needs to be protected against that case.
  • OakDirectory uses a ConcurrentHashSet, but it doesn't properly synchronize access to the node builder.
  • The MultiplexingIndexWriter doesn't support concurrent access (unlike the default index writer). This was not detected so far because multi-threaded indexing was usually only used for a single index. I tried reindexing all indexes with many threads, which uncovered this issue. (First, it used a regular HashMap instead of a concurrent one; more importantly, it could concurrently create two writers.)
  • The PipelinedTreeStoreStrategy doesn't support filtering yet, unlike the regular pipelined strategy.
  • TreeStore memory usage: the cache size calculation was wrong because it multiplied by the size factor twice. This could result in out-of-memory errors.
  • The FulltextBinaryTextExtractor didn't properly support concurrent initialization, leading to a NullPointerException.
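For the IndexMeta race, one way to protect the reading side is to tolerate metadata files that already exist but have not been written yet. The following is a minimal sketch, assuming one `<index>.meta` file per index in a shared directory; `MetaReader`, `readAll`, and the file naming are hypothetical and not Oak's IndexMeta API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical reader for per-index metadata files (not Oak's IndexMeta API).
final class MetaReader {
    private static final String SUFFIX = ".meta";

    // Returns index name -> metadata content for every non-empty "*.meta" file.
    // Empty files are skipped: another thread may have created the file for a
    // different index but not written its content yet.
    static Map<String, String> readAll(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing
                    .filter(p -> p.getFileName().toString().endsWith(SUFFIX))
                    .collect(Collectors.toList());
        }
        Map<String, String> result = new HashMap<>();
        for (Path file : files) {
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (content.isEmpty()) {
                continue; // concurrently created, not yet written: ignore it
            }
            String name = file.getFileName().toString();
            result.put(name.substring(0, name.length() - SUFFIX.length()), content);
        }
        return result;
    }
}
```

Skipping empty files only covers the case described above. A file that is partially written but not empty would still be read, so having the writer write to a temporary file and then rename it atomically (for example with `Files.move(..., StandardCopyOption.ATOMIC_MOVE)`) is a common complementary safeguard.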
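The OakDirectory item illustrates a general point: a concurrent set only makes operations on the set itself thread-safe, not operations on other objects updated alongside it. A minimal sketch, with a plain `HashMap` standing in for the node builder; `SketchDirectory` and its methods are hypothetical and not OakDirectory's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical directory: a thread-safe set of file names plus a
// non-thread-safe "builder" (a plain HashMap standing in for a node builder).
final class SketchDirectory {
    private final Set<String> fileNames = ConcurrentHashMap.newKeySet();
    private final Map<String, byte[]> builder = new HashMap<>(); // NOT thread-safe
    private final Object builderLock = new Object();

    void createFile(String name, byte[] data) {
        // Every access to the builder happens under the same lock;
        // the concurrent set does not protect it.
        synchronized (builderLock) {
            builder.put(name, data);
        }
        fileNames.add(name);
    }

    int builderSize() {
        synchronized (builderLock) {
            return builder.size();
        }
    }

    int fileCount() {
        return fileNames.size();
    }
}
```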
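For the MultiplexingIndexWriter, both problems (a non-concurrent map and two threads creating two writers) can be addressed with an atomic get-or-create. The following is a minimal sketch, assuming writers are keyed by a mount name; `WriterRegistry` and `Writer` are hypothetical stand-ins, not the MultiplexingIndexWriter API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a per-mount index writer.
final class Writer {
    final String mountName;

    Writer(String mountName) {
        this.mountName = mountName;
    }
}

// Hypothetical registry illustrating race-free writer creation.
final class WriterRegistry {
    // ConcurrentHashMap instead of HashMap: safe for concurrent access.
    private final Map<String, Writer> writers = new ConcurrentHashMap<>();
    // Counts how many writers were actually constructed (useful in tests).
    final AtomicInteger created = new AtomicInteger();

    Writer getOrCreate(String mountName) {
        // computeIfAbsent is atomic and applies the factory at most once per
        // key, so two threads can never each create a writer for one mount.
        return writers.computeIfAbsent(mountName, name -> {
            created.incrementAndGet();
            return new Writer(name);
        });
    }
}
```

A plain "get, and if null then create and put" sequence leaves a window in which two threads both see `null`; `computeIfAbsent` closes that window. The factory should stay short, because other updates to the map may block while it runs.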
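The effect of applying a size factor twice is easiest to see with numbers. The sketch below is hypothetical: it assumes the factor converts a serialized entry size into an estimated heap size, which may differ from how TreeStore actually uses it. In this model, also scaling the memory budget by the same factor cancels the factor out, so the cache admits `sizeFactor` times more entries than the budget allows.

```java
// Hypothetical cache sizing (names are assumptions, not TreeStore's code).
final class CacheSizing {
    // Estimated heap bytes of one cached entry: serialized size times a
    // factor that accounts for object overhead.
    static long entryHeapBytes(long serializedBytes, int sizeFactor) {
        return serializedBytes * sizeFactor;
    }

    // Correct: the factor is applied once, to the entry size.
    static long maxEntries(long budgetBytes, long serializedBytes, int sizeFactor) {
        return budgetBytes / entryHeapBytes(serializedBytes, sizeFactor);
    }

    // Buggy: the budget is multiplied by the factor as well, so the factor
    // cancels out and the cache may use sizeFactor times the budget.
    static long maxEntriesBuggy(long budgetBytes, long serializedBytes, int sizeFactor) {
        return budgetBytes * sizeFactor / entryHeapBytes(serializedBytes, sizeFactor);
    }
}
```

With a 64 MiB budget, 1 KiB entries, and a factor of 4, the correct limit is 16,384 entries (64 MiB of estimated heap), while the buggy limit is 65,536 entries (256 MiB).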
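For the FulltextBinaryTextExtractor, the underlying pattern is thread-safe lazy initialization. A frequent cause of this kind of NullPointerException is a check (for example an `initialized` flag) that becomes visible to another thread before the field it guards has been assigned. A minimal sketch of double-checked locking with a `volatile` field; `Lazy` is a hypothetical helper, not part of Oak.

```java
import java.util.function.Supplier;

// Hypothetical thread-safe lazy holder using double-checked locking.
final class Lazy<T> {
    private final Supplier<T> factory;
    // volatile: a thread that sees a non-null value also sees the fully
    // constructed object it refers to.
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T result = value; // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = value; // re-check under the lock
                if (result == null) {
                    result = factory.get();
                    if (result == null) {
                        throw new IllegalStateException("factory returned null");
                    }
                    value = result; // publish only after construction completes
                }
            }
        }
        return result;
    }
}
```

Because `value` is assigned only after the factory has returned, no thread can observe the field as set while the object is still missing. For a static singleton, the initialization-on-demand holder idiom is a simpler alternative.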

I added tests for the concurrency issues. They are pretty fast because the corruption happens in memory, not on disk.
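Because the corruption happens in memory, such tests can simply start many threads at once and then check an invariant. A minimal sketch of that kind of harness; `ConcurrencyHarness.runConcurrently` is a hypothetical helper, not the test code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stress harness: start all threads together, then join.
final class ConcurrencyHarness {
    static void runConcurrently(int threadCount, Runnable task) throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // block until every thread has been started
                    task.run();
                } catch (Throwable e) {
                    failure.compareAndSet(null, e); // remember the first failure
                }
            });
            threads.add(t);
            t.start();
        }
        start.countDown(); // release all threads at once to maximize contention
        for (Thread t : threads) {
            t.join();
        }
        if (failure.get() != null) {
            throw new AssertionError("concurrent task failed", failure.get());
        }
    }
}
```

Releasing all threads through a `CountDownLatch` creates more contention than starting them one by one, and rethrowing the first exception makes the test fail instead of silently losing an error raised in a worker thread.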
