Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak: IDSelectorBatch + SearchParameters #2996

Open
vaaliferov opened this issue Aug 9, 2023 · 10 comments
Open

Memory leak: IDSelectorBatch + SearchParameters #2996

vaaliferov opened this issue Aug 9, 2023 · 10 comments
Assignees

Comments

@vaaliferov
Copy link

vaaliferov commented Aug 9, 2023

Summary

I'm trying to use IDSelectorBatch + SearchParameters, but it seems like there is a memory leak somewhere.

  • If I use IDSelectorBatch sequentually without calling SearchParameters after that, then everything is okay.
  • If I use IDSelectorBatch just once, and then call SearchParameters a few times, then everything is fine as well.
  • But if I call IDSelectorBatch and then SearchParameters sequentually a few times, then memory usage increases, and I can't find a way to free that extra memory.

Platform

OS: Ubuntu 20.04.1
Faiss version: faiss-cpu 1.7.4

Reproduction instructions

import gc
import faiss
import numpy as np

for _ in range(10):
    subset = np.arange(0, 5000000)
    sel = faiss.IDSelectorBatch(subset)
    params = faiss.SearchParameters(sel=sel)
    mem_usage = faiss.get_mem_usage_kb() / 1024 ** 2
    print(round(mem_usage, 2), end=' ')
    gc.collect()

Output

0.34 0.57 0.79 1.02 1.24 1.47 1.69 1.92 2.14 2.37
@mdouze
Copy link
Contributor

mdouze commented Aug 9, 2023

Thanks for the clean bug report...

@mdouze mdouze added the bug label Aug 16, 2023
@mdouze
Copy link
Contributor

mdouze commented Aug 16, 2023

I can repro

@mdouze
Copy link
Contributor

mdouze commented Aug 16, 2023

This test here is an error, since the values are not classes but instances.
https://github.com/facebookresearch/faiss/blob/main/faiss/python/class_wrappers.py#L1084C28-L1084C28
however, it does not explain the memory leak.

@mdouze
Copy link
Contributor

mdouze commented Aug 16, 2023

subset = np.arange(0, 5000000)
sel = faiss.IDSelectorBatch(subset)
sel.this.own()    # True: correct 
params = faiss.SearchParameters(sel=sel)
sel.this.own()   # False: why???

@mdouze
Copy link
Contributor

mdouze commented Aug 17, 2023

see #3007

@vaaliferov
Copy link
Author

@mdouze, thank you !

@dbalabka
Copy link

@mdouze , thank you a lot! What is a proximate time when you will be able to release this fix?

@dshkliarenko
Copy link

@mdouze I have installed nightly version with the fix, and yes in the scenario from repro instructions it got fixed, however, if I use different order of creation of selector and params I get the same issue:

`import gc
import faiss
import numpy as np

for _ in range(10):
params = faiss.SearchParameters()
subset = np.arange(0, 5000000)
params.sel = faiss.IDSelectorBatch(subset)
mem_usage = faiss.get_mem_usage_kb() / 1024 ** 2
print(round(mem_usage, 2), end=' ')
gc.collect()`

0.37 0.59 0.82 1.04 1.27 1.49 1.72 1.94 2.17 2.39

@algoriddle
Copy link
Contributor

We're looking into this. As a temporary workaround, try calling

params.sel.this.own(True)

after the creation of IDSelectorBatch.

@mdouze
Copy link
Contributor

mdouze commented Nov 15, 2023

Related to this SWIG issue swig/swig#2709

mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Jul 30, 2024
Summary:
Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

It is a confirmed and reproducible memory leak every time. There is a workaround. See the task comments.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--
The below command reproduces it and gets the SWIG typemap to activate.
Asked SWIG folks for help thus far to get the typemap to activate: swig/swig#2709.

`buck test faiss/tests:test_search_params -- test_ownership_2`

But I can't get the ownership to work in the typemap. Further investigation can see how we can get the ownership to work.

Differential Revision: D60151215
mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Aug 29, 2024
…h#3704)

Summary:
Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D60151215
mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Aug 29, 2024
…h#3704)

Summary:
Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Aug 29, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Sep 12, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Why change HNSW test?
--
This fix causes the HNSW test to fail with heap-use-after-free. This is because the index.storage.get_distance_computer() somehow gets freed during clone_index, but only when reassigning to the same variable like `index.storage = clone(index.storage)`. I checked in https://fburl.com/code/qw6fznjt, and it is non-null before returning on the CPP side.

After adding the temp variable, I also had to set `index.own_fields = False`, otherwise we get a heap-use-after-free again due to it being deleted already.

Differential Revision: D61992599
mnorris11 pushed a commit to mnorris11/faiss that referenced this issue Sep 12, 2024
…h#3810)

Summary:
Pull Request resolved: facebookresearch#3810

Pull Request resolved: facebookresearch#3704

Background
--
Issue: facebookresearch#2996

Prior attempted fix: https://github.com/facebookresearch/faiss/pull/3007/files#diff-d704f33c46a4ef2936c8cf2a66b05f1993e25e79ee5c19d4b63c4e0cf46b0a42

Partially copied from facebookresearch#3139 with an additional unit test.

It is a confirmed and reproducible memory leak every time. There is a workaround. See the comments on facebookresearch#2996.

Relevant SWIG docs: https://www.swig.org/Doc4.1/SWIGDocumentation.html#Typemaps_nn2

Current status
--

`buck test faiss/tests:test_search_params -- test_ownership_2`

Test prints:

without SWIG fix:
`[0.49, 0.82, 1.15, 1.39, 1.67, 1.88, 2.16, 2.36, 2.65, 2.85, 3.13, 3.34, 3.62, 3.82, 4.11, 4.31, 4.6, 4.8, 5.08, 5.28]`

with SWIG fix:
`[0.52, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71, 0.71]`

Note: This test is not ideal. Ideally we could enable ASAN to catch these memory leaks. But ASAN does not seem to catch these Python memory leaks. Example:
```
def test_ownership_3(self):
        d = 32
        quantizer = faiss.IndexFlatL2(d)
        quantizer.this.own(False)
```
The above test produces no ASAN error, even though the `quantizer` object leaks.

Why change HNSW test?
--
This fix causes the HNSW test to fail with heap-use-after-free. This is because the index.storage.get_distance_computer() somehow gets freed during clone_index, but only when reassigning to the same variable like `index.storage = clone(index.storage)`. I checked in https://fburl.com/code/qw6fznjt, and it is non-null before returning on the CPP side.

After adding the temp variable, I also had to set `index.own_fields = False`, otherwise we get a heap-use-after-free again due to it being deleted already.

Differential Revision: D61992599
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants