Repeatedly saving object increases program memory #33

Open
menshikh-iv opened this issue Apr 6, 2018 · 1 comment

Description

"I'm using bounter to count the frequency of items in a large set. I was periodically pickling the bounter object. Doing this causes the memory to continually increase." (Based on the thread at https://groups.google.com/forum/#!topic/gensim/LsReiXXOzKY)

Steps/Code/Corpus to Reproduce

import pickle as pkl
from bounter import bounter
import numpy as np
import psutil
import gc


def get_used_memory():
    """
    Return the current amount of used system memory, in GB
    """
    return '{:.3f}'.format(psutil.virtual_memory().used / 1024.0 / 1024.0 / 1024.0)


def log(msg):
    print(msg, ', memory =', get_used_memory())


def main():
    log('Starting with np array')
    a = np.random.randint(0, 512, (8, 33554432), dtype='int32')
    log('Initialized array')
    for i in range(6):
        with open('array.pkl', 'wb') as f:
            pkl.dump(a, f, protocol=pkl.HIGHEST_PROTOCOL)
            log('Finished saving the ' + str(i) + 'th copy of the array')
    del a
    gc.collect()
    log('deleted array and performed gc.collect() ')


    counter = bounter(size_mb=1024, need_iteration=False, log_counting=1024)
    log('Initialized counter')
    for i in range(6):
        with open('counter.pkl', 'wb') as f:
            pkl.dump(counter, f, protocol=pkl.HIGHEST_PROTOCOL)
            log('Finished saving the ' + str(i) + 'th copy of the bounter')

    del counter
    gc.collect()
    log('deleted array and performed gc.collect() ')
    log('Finished')


if __name__ == '__main__':
    main()
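
Note that psutil.virtual_memory().used reports memory used by the whole machine, so other processes can shift the readings. A per-process measurement may be a cleaner signal. The helper below is only a sketch and was not part of the original run: get_process_rss_gb is a name made up here, and the /proc/self/statm fallback assumes Linux.

```python
import os

try:
    import psutil  # third-party; already used by the script above
except ImportError:
    psutil = None


def get_process_rss_gb():
    """Return the resident set size (RSS) of the current process, in GB."""
    if psutil is not None:
        rss_bytes = psutil.Process(os.getpid()).memory_info().rss
    else:
        # Linux-only fallback: the second field of /proc/self/statm
        # is the number of resident pages
        with open('/proc/self/statm') as f:
            resident_pages = int(f.read().split()[1])
        rss_bytes = resident_pages * os.sysconf('SC_PAGE_SIZE')
    return rss_bytes / 1024.0 ** 3


def log(msg):
    # Drop-in replacement for log() above, reporting process RSS instead
    print('{}, process RSS = {:.3f} GB'.format(msg, get_process_rss_gb()))
```

Swapping this log() into the script should show whether the growth belongs to the Python process itself.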

Expected Results

Memory shouldn't increase significantly after each dump

Actual Results

I get the following log output, along with two pkl files of 1.1 GB each:

('Starting with np array', ', memory =', '3.539')
('Initialized array', ', memory =', '4.540')
('Finished saving the 0th copy of the array', ', memory =', '4.540')
('Finished saving the 1th copy of the array', ', memory =', '4.544')
('Finished saving the 2th copy of the array', ', memory =', '4.549')
('Finished saving the 3th copy of the array', ', memory =', '4.549')
('Finished saving the 4th copy of the array', ', memory =', '4.553')
('Finished saving the 5th copy of the array', ', memory =', '4.562')
('deleted array and performed gc.collect() ', ', memory =', '3.561')
('Initialized counter', ', memory =', '3.561')
('Finished saving the 0th copy of the bounter', ', memory =', '4.567')
('Finished saving the 1th copy of the bounter', ', memory =', '5.573')
('Finished saving the 2th copy of the bounter', ', memory =', '6.577')
('Finished saving the 3th copy of the bounter', ', memory =', '7.576')
('Finished saving the 4th copy of the bounter', ', memory =', '8.579')
('Finished saving the 5th copy of the bounter', ', memory =', '9.582')
('deleted array and performed gc.collect() ', ', memory =', '9.580')
('Finished', ', memory =', '9.580')
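
Differencing consecutive readings makes the pattern easier to see. The snippet below only reuses the values printed in the log; the list names and the deltas helper are introduced here for illustration.

```python
# Readings in GB, copied from the log: the value before the first dump,
# followed by the value after each of the six dumps.
numpy_readings = [4.540, 4.540, 4.544, 4.549, 4.549, 4.553, 4.562]
bounter_readings = [3.561, 4.567, 5.573, 6.577, 7.576, 8.579, 9.582]


def deltas(readings):
    """Return the change between each pair of consecutive readings."""
    return [round(after - before, 3)
            for before, after in zip(readings, readings[1:])]


print('numpy growth per dump  :', deltas(numpy_readings))
print('bounter growth per dump:', deltas(bounter_readings))

# Memory returned after `del` + gc.collect()
print('numpy released  :', round(4.562 - 3.561, 3))
print('bounter released:', round(9.582 - 9.580, 3))
```

Each bounter dump adds about 1.0 GB, which is close to the configured size_mb=1024, while the numpy dumps add at most a few MB. That would be consistent with a full copy of the counter's data being retained on every dump, though this is only a guess from the numbers.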

Two things look suspicious here. First, memory grows by roughly 1 GB with every dump of the bounter object:

('Finished saving the 0th copy of the bounter', ', memory =', '4.567')
('Finished saving the 1th copy of the bounter', ', memory =', '5.573')
('Finished saving the 2th copy of the bounter', ', memory =', '6.577')
('Finished saving the 3th copy of the bounter', ', memory =', '7.576')
('Finished saving the 4th copy of the bounter', ', memory =', '8.579')
('Finished saving the 5th copy of the bounter', ', memory =', '9.582')

Second, the memory is not released after the counter is deleted and gc.collect() is called, which looks like a memory leak:

('Finished saving the 5th copy of the bounter', ', memory =', '9.582')
('deleted array and performed gc.collect() ', ', memory =', '9.580')
('Finished', ', memory =', '9.580')
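
Until the cause is found, one possible stopgap is to run each dump in a short-lived child process, so that whatever memory the dump retains is returned to the OS when the child exits. This is a generic sketch that has not been tested against bounter. dump_in_child is a hypothetical helper, and os.fork makes it POSIX-only.

```python
import os
import pickle


def dump_in_child(obj, path):
    """Pickle obj to path inside a forked child process (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: obj is visible through copy-on-write memory, so it does
        # not have to be serialized to get here. Memory allocated during
        # the dump belongs to the child and goes away when it exits.
        exit_code = 1
        try:
            with open(path, 'wb') as f:
                pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
            exit_code = 0
        finally:
            # Exit immediately without running the parent's cleanup handlers
            os._exit(exit_code)
    # Parent: wait for the child and check that the dump succeeded
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError(
            'pickling in child process failed (wait status {})'.format(status))
```

In the script above, the call inside the loop would become dump_in_child(counter, 'counter.pkl'). The child may still need extra memory while the dump runs, and forking from a multi-threaded program needs care.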

Versions

CentOS 7
python = 3.6.1
bounter = 1.0.1
numpy = 1.14.2

menshikh-iv added the bug label Apr 6, 2018

@jonsnowseven

Hello. I am having the same problem...

Any workaround or solution available?

Thank you in advance.
