[FileStore]: fix: Use caching on demand to reduce memory usage #1850

Open

CLFutureX wants to merge 6 commits into OpenHands:main from CLFutureX:feat_cache

Conversation

@CLFutureX
Contributor

CLFutureX commented Jan 29, 2026

Background:

Some FileStore usage scenarios don't actually need caching. For example, _save_base_state() and _save_processing_history() incur extra memory usage because their writes are cached.

Change:

Add direct write and direct read methods to enable on-demand caching.
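For illustration, the resulting FileStore surface might look like the following (a sketch based on this description and the diff discussed below, assuming read/write keep their existing caching behavior; only directRead/directWrite are new):

from abc import ABC, abstractmethod

class FileStore(ABC):
    @abstractmethod
    def read(self, path: str) -> str:
        """Read a file, serving repeat reads from an in-memory cache."""

    @abstractmethod
    def write(self, path: str, contents: str) -> None:
        """Write a file and keep a cached copy in memory."""

    @abstractmethod
    def directRead(self, path: str) -> str:
        """Read a file without touching the cache (one-shot reads)."""

    @abstractmethod
    def directWrite(self, path: str, contents: str) -> None:
        """Write a file without caching it, e.g. for state snapshots."""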

Signed-off-by: CLFutureX <chenyongqyl@163.com>
Collaborator

@all-hands-bot left a comment

Code Review Summary

Found several issues with the caching implementation that need to be addressed:

🔴 Critical Issues

1. Cache behavior doesn't respect cache=False in LocalFileStore.read() (lines 76-80)

The cache is still consulted even when cache=False. Current flow:

  1. Check cache → return if found (ignores cache param)
  2. Read from disk
  3. Only cache if cache=True

This means if a file was previously cached, it will return stale data even when caching is disabled.

Fix: Move the cache check inside the conditional:

if cache and full_path in self.cache:
    return self.cache[full_path]
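In context, the corrected read() would look roughly like this (a sketch; get_full_path() and the plain-dict cache are assumptions about the surrounding LocalFileStore code):

def read(self, path: str, cache: bool = False) -> str:
    full_path = self.get_full_path(path)  # hypothetical helper
    # Consult the cache only when caching was requested, so cache=False
    # can never serve stale data left over from an earlier cached read.
    if cache and full_path in self.cache:
        return self.cache[full_path]
    with open(full_path) as f:
        contents = f.read()
    if cache:
        self.cache[full_path] = contents
    return contents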

🟠 Important Issues

2. Breaking API change (LocalFileStore line 62)

The previous behavior was to always cache (implicit cache=True). Now the default is cache=False, which could cause performance regressions for existing code that hasn't been updated.

Options:

  • Change default to cache=True to maintain backward compatibility (sketched after this list)
  • Document this as a breaking change and audit all call sites
  • Verify there are no other callers besides EventStore
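If backward compatibility wins out, the fix is a one-line default flip on LocalFileStore (a sketch; only the signature matters here):

def read(self, path: str, cache: bool = True) -> str:
    ...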

3. Missing documentation (base.py lines 17, 26)

The cache parameter needs a docstring (see the sketch after this list) explaining:

  • When to use True vs False
  • That True caches in memory for faster subsequent reads
  • Use cases: True for frequently-read data (events), False for one-time reads (state snapshots)
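For base.py, something along these lines would cover all three points (a sketch; the default shown is illustrative, given the inconsistency flagged in issue 4 below):

@abstractmethod
def read(self, path: str, cache: bool = False) -> str:
    """Read the file at path.

    Args:
        path: Path of the file, relative to the store root.
        cache: If True, keep the contents in an in-memory cache so that
            subsequent reads are served without hitting storage. Use True
            for frequently re-read data such as events, and False for
            one-time reads such as state snapshots.
    """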

🟡 Suggestions

4. Inconsistent default values

  • LocalFileStore: defaults to cache=False
  • MemoryFileStore: defaults to cache=True

This inconsistency could cause confusion. Consider standardizing.

5. MemoryFileStore ignores the cache parameter (lines 26, 31)

The parameter is added but not used since MemoryFileStore stores everything in memory by nature. Consider adding a comment explaining why it's ignored.
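For instance (a sketch; the backing dict name files is an assumption):

def read(self, path: str, cache: bool = True) -> str:
    # `cache` is accepted only for interface compatibility and is
    # intentionally ignored: MemoryFileStore already holds all contents
    # in memory, so there is nothing extra to cache.
    return self.files[path]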

6. Verify the change achieves the stated goal

The PR description mentions reducing memory usage in _save_base_state() and _save_processing_history(). However, EventStore now passes cache=True for all writes. If those methods write events, this change won't help. Verify:

  • Do those methods use EventStore or FileStore directly?
  • Should event writes default to cache=False and only cache on read?

Recommendations

  1. Must fix: Issue #1 (critical cache bug)
  2. Should fix: Issues #2 and #3 (breaking change and docs)
  3. Nice to have: Issues #4, #5, #6 (consistency and verification)

Signed-off-by: CLFutureX <chenyongqyl@163.com>
Signed-off-by: CLFutureX <chenyongqyl@163.com>
This reverts commit b17bb5a.
Signed-off-by: CLFutureX <chenyongqyl@163.com>
history_file = f"{get_usermodeling_dir(self.user_id)}/processed_sessions_timestamps.json"  # noqa: E501

- self.file_store.write(history_file, json.dumps(history, indent=2))
+ self.file_store.directWrite(history_file, json.dumps(history, indent=2))
Collaborator

Sorry I'm dense, if read/write get cache argument, do we need directRead/directWrite, isn't it with cache false?

Contributor Author

> Sorry I'm dense, if read/write get cache argument, do we need directRead/directWrite, isn't it with cache false?

Yes, I initially planned to add the cache parameter. However, I found it would result in redundant parameters in InMemoryFileStore and cause validation failures. Also, to make the interface clearer and easier to use, I decided to add new methods instead.
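Based on that reasoning, the new methods might look roughly like this on LocalFileStore (a sketch only; get_full_path() is a hypothetical helper and the cache layout is assumed):

import os  # module level

def directWrite(self, path: str, contents: str) -> None:
    # Write straight to disk without touching self.cache, so one-shot
    # writes such as state snapshots do not grow memory.
    full_path = self.get_full_path(path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as f:
        f.write(contents)

def directRead(self, path: str) -> str:
    # Read from disk without consulting or populating the cache.
    full_path = self.get_full_path(path)
    with open(full_path) as f:
        return f.read()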

Signed-off-by: CLFutureX <chenyongqyl@163.com>
@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @CLFutureX, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

1 similar comment

@neubig self-requested a review on February 12, 2026, 00:18
Contributor

neubig commented Feb 12, 2026

[automated message] @neubig assigned for review according to git blame

@all-hands-bot
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @neubig, could you please take a look when you have a chance?

Contributor

@neubig left a comment

Hi @CLFutureX , thank you for opening the PR. Before I fully assess the PR, could you please open an issue with a minimal reproducible example of the problem that you're trying to solve? For instance, a command line argument or script that runs the agent where you monitor the agent's memory and find that it is very large. You could then run the same command with this fix implemented and demonstrate that it significantly reduces memory.

That would help me (1) understand the scope of the problem, and (2) be able to confirm the error on my end as well.

Thank you! And once you are done with that, please click the "re-request review" button next to my name on the reviewers panel.
