[BUG] Frequent stats calls causing memory mapped segments to bloat up #19482

@Bukhtawar

Describe the bug

In one of our clusters, the VmSize reported in /proc/<pid>/status grows to as much as 200 TB, possibly exhausting the virtual address space. The process's maps file shows repeated entries (555 each) for the same compound files (approx. 200 MB):

    cat /proc/<pid>/maps | grep "indices" | awk '{print $6}' | sort | uniq -c | sort -nr | head -3

    555 /hdd1/mnt/env/root/data/nodes/0/indices/dFQt4piDT-SHm3_0mzuGxA/4/index/_2u1d.cfs
    555 /hdd1/mnt/env/root/data/nodes/0/indices/sLwKTsOJRpO7ukMmIkmlxA/6/index/_2n86.cfs
    555 /hdd1/mnt/env/root/data/nodes/0/indices/sLwKTsOJRpO7ukMmIkmlxA/6/index/_2n88.cfs
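
For anyone who wants to track this count over time rather than run the pipeline by hand, here is a minimal Java sketch of the same per-file tally. The class and method names (`MapsCounter`, `countMappings`) are made up for illustration and are not part of OpenSearch or Lucene; the sketch only assumes the usual Linux `/proc/<pid>/maps` layout, where the pathname is the sixth whitespace-separated field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapsCounter {
    /**
     * Counts how many mapping entries each file has.
     * Each maps line looks like: "start-end perms offset dev inode pathname".
     * Anonymous mappings have no pathname and are skipped.
     */
    static Map<String, Long> countMappings(List<String> mapsLines, String pathFilter) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : mapsLines) {
            // Limit of 6 keeps a pathname containing spaces in one piece.
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length < 6) {
                continue; // no pathname: anonymous mapping
            }
            String path = fields[5];
            if (path.contains(pathFilter)) {
                counts.merge(path, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the OpenSearch pid, or omit it to inspect this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Path.of("/proc", pid, "maps"));
        countMappings(lines, "indices").entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(3)
            .forEach(e -> System.out.printf("%7d %s%n", e.getValue(), e.getKey()));
    }
}
```

Running it before and after a burst of stats calls should show whether the per-file count keeps climbing.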

On closer inspection, it looks like the stats calls periodically mmap the .cfs files, using the IOContext.DEFAULT specified in Lucene's compound reader implementation (Lucene90CompoundReader in the trace below):

2025-09-30T08:34:08,286][WARN ][o.o.i.e.Engine           ][9716ea11db09a681400a3a17bfa977be]  [logs-2025.09.25][0]Error when opening compound reader for Directory [store(ByteSizeCachingDirectory(OpensearchDirectory(HybridDirectory@/hdd1/mnt/env/root/data/nodes/0/indices/sJzkGGh_RzCloFt7cjK3Xw/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@15a9506)))] and SegmentCommitInfo [_2qs3(10.2.1):c65358:[diagnostics={os=Linux, timestamp=1758844782380, mergeMaxNumSegments=-1, lucene.version=10.2.1, source=merge, os.arch=aarch64, java.runtime.version=21.0.8+9-LTS, mergeFactor=10}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}] :id=ciiiy8pv84l0jeda3ormywnaq]
java.io.IOException: Map failed: MemorySegmentIndexInput(path="/hdd1/mnt/env/root/data/nodes/0/indices/sJzkGGh_RzCloFt7cjK3Xw/0/index/_2qs3.cfs") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 53462188 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'), and 'sysctl vm.max_map_count'. More information: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
        at java.base/sun.nio.ch.FileChannelImpl.mapInternal(FileChannelImpl.java:1319)
        at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1218)
        at org.apache.lucene.store.MemorySegmentIndexInputProvider.map(MemorySegmentIndexInputProvider.java:123)
        at org.apache.lucene.store.MemorySegmentIndexInputProvider.openInput(MemorySegmentIndexInputProvider.java:68)
        at org.apache.lucene.store.MemorySegmentIndexInputProvider.openInput(MemorySegmentIndexInputProvider.java:32)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:268)
        at org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:173)
        at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
        at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
        at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
        at org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.<init>(Lucene90CompoundReader.java:78)
        at org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat.getCompoundReader(Lucene90CompoundFormat.java:86)
        at org.opensearch.index.engine.Engine.getSegmentFileSizes(Engine.java:991)
        at org.opensearch.index.engine.Engine.fillSegmentStats(Engine.java:977)
        at org.opensearch.index.engine.Engine.segmentsStats(Engine.java:933)
        at org.opensearch.index.shard.IndexShard.segmentStats(IndexShard.java:1651)
        at org.opensearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:230)
        at org.opensearch.indices.IndicesService.indexShardStats(IndicesService.java:886)
        at org.opensearch.indices.IndicesService.statsByShard(IndicesService.java:837)
        at org.opensearch.indices.IndicesService.stats(IndicesService.java:825)
        at org.opensearch.node.NodeService.stats(NodeService.java:253)
        at org.opensearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:105)
        at org.opensearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:200)
        at org.opensearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:332)
        at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118)
        at org.opensearch.performanceanalyzer.transport.RTFPerformanceAnalyzerTransportRequestHandler.messageReceived(RTFPerformanceAnalyzerTransportRequestHandler.java:92)
        at org.opensearch.wlm.WorkloadManagementTransportInterceptor$RequestHandler.messageReceived(WorkloadManagementTransportInterceptor.java:63)
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:108)
        at org.opensearch.transport.TransportService$7.doRun(TransportService.java:1048)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
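
Since the exception message points at `vm.max_map_count` and virtual memory limits, a quick way to see how close a process is to those limits is to compare its mapping count with the sysctl value and read VmSize from the status file. The following is a rough sketch under the assumption of a standard Linux `/proc` layout; `VmLimitCheck` and its methods are illustrative names, not an existing tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.OptionalLong;

public class VmLimitCheck {
    /** Extracts VmSize in kB from the lines of /proc/<pid>/status, if present. */
    static OptionalLong parseVmSizeKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmSize:")) {
                // Expected shape: "VmSize:<whitespace><number> kB"
                String[] parts = line.substring("VmSize:".length()).trim().split("\\s+");
                return OptionalLong.of(Long.parseLong(parts[0]));
            }
        }
        return OptionalLong.empty();
    }

    /** Fraction of the vm.max_map_count limit used by the given number of mappings. */
    static double mapCountUsage(long mappings, long maxMapCount) {
        return (double) mappings / maxMapCount;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        // One line per mapping in /proc/<pid>/maps.
        long mappings = Files.readAllLines(Path.of("/proc", pid, "maps")).size();
        long limit = Long.parseLong(
            Files.readString(Path.of("/proc/sys/vm/max_map_count")).trim());
        OptionalLong vmSizeKb =
            parseVmSizeKb(Files.readAllLines(Path.of("/proc", pid, "status")));

        System.out.printf("mappings=%d limit=%d usage=%.1f%%%n",
            mappings, limit, 100 * mapCountUsage(mappings, limit));
        vmSizeKb.ifPresent(kb -> System.out.printf("VmSize=%d kB%n", kb));
    }
}
```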

It's possible we are leaking an IndexInput. However, this code path hasn't been touched in a while, which makes me suspect an issue with how the index input for compound files is handled instead.
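
To make the leak hypothesis concrete, here is a toy sketch of the pattern in question. It is not the OpenSearch or Lucene code: `FakeMappedInput`, `leakyFileSize`, and `safeFileSize` are invented stand-ins, and a counter of live instances replaces real memory mappings. It only illustrates how a per-call open that is never closed would accumulate with every stats request, while a try-with-resources variant stays flat.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class StatsLeakSketch {
    /** Stand-in for a mapped input: counts live instances instead of mapping a file. */
    static final class FakeMappedInput implements Closeable {
        static final AtomicInteger OPEN = new AtomicInteger();
        final long sizeBytes;

        FakeMappedInput(long sizeBytes) {
            this.sizeBytes = sizeBytes;
            OPEN.incrementAndGet(); // models a new mapping being created
        }

        @Override
        public void close() {
            OPEN.decrementAndGet(); // models the mapping being released
        }
    }

    /** Hypothetical leaky variant: opens an input per call and never closes it. */
    static long leakyFileSize(long sizeBytes) {
        FakeMappedInput in = new FakeMappedInput(sizeBytes);
        return in.sizeBytes;
    }

    /** Safe variant: try-with-resources releases the input on every call. */
    static long safeFileSize(long sizeBytes) {
        try (FakeMappedInput in = new FakeMappedInput(sizeBytes)) {
            return in.sizeBytes;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 555; i++) {
            safeFileSize(200L * 1024 * 1024);
        }
        System.out.println("open after safe calls:  " + FakeMappedInput.OPEN.get());
        for (int i = 0; i < 555; i++) {
            leakyFileSize(200L * 1024 * 1024);
        }
        System.out.println("open after leaky calls: " + FakeMappedInput.OPEN.get());
    }
}
```

Whether the real stats path behaves like the leaky variant is exactly what still needs to be confirmed.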

Tagging @ashking94 @uschindler @mch2 @shourya035 for thoughts on the same.

Related component

No response

To Reproduce

No response

Expected behavior

Virtual memory usage (VmSize) and the number of memory-mapped regions should stay flat across repeated stats calls.

Labels: bug, lucene
