
Filling np.array very slow because of Carbontracker #41

Closed
nfurnon opened this issue Jun 30, 2021 · 7 comments

Labels
bug Something isn't working


@nfurnon

nfurnon commented Jun 30, 2021

Filling a pre-allocated array is about 70 times slower when CarbonTracker has been instantiated. See the minimal code below.
Am I doing anything wrong? How can we avoid this?

```python
import time

import numpy as np
from carbontracker.tracker import CarbonTracker


def load_data(length, data_shape):
    # Fill a pre-allocated array row by row with random values.
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data


if __name__ == '__main__':
    n_rows = 10000
    shape = (16000, )

    # Baseline: no tracker exists yet.
    tt = time.time()
    data = load_data(n_rows, shape)
    print(f'Without CT : {time.time() - tt} seconds')

    # Tracker instantiated but epoch_start() never called: this run is ~70x slower.
    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    tt = time.time()
    data = load_data(n_rows, shape)
    print(f'With CT : {time.time() - tt} seconds')
```
@lfwa
Owner

lfwa commented Jul 4, 2021

Hi nfurnon,

Thanks for your feedback, we appreciate it!

This looks like a bug that occurs when CarbonTracker is instantiated and tracker.epoch_start() is not called directly afterwards. I have not yet found the cause in the code base. However, there is a workaround: start the tracker immediately after instantiating it, e.g.:

```python
import time

import numpy as np
from carbontracker.tracker import CarbonTracker


def load_data(length, data_shape):
    # Fill a pre-allocated array row by row with random values.
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data


if __name__ == '__main__':
    n_rows = 10000
    shape = (16000, )

    tt = time.time()
    data = load_data(n_rows, shape)
    print(f'Without CT : {time.time() - tt} seconds')

    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    # Workaround: start the tracker immediately after instantiating it.
    tracker.epoch_start()
    tt = time.time()
    data = load_data(n_rows, shape)
    tracker.epoch_end()
    print(f'With CT : {time.time() - tt} seconds')
```

Let me know if this helps.

@lfwa lfwa added the bug label Jul 4, 2021
@nfurnon
Author

nfurnon commented Jul 6, 2021

Thank you for your answer. The time-consuming task was instantiating a PyTorch Dataset class, so I guess I can simply instantiate CarbonTracker after it. However, instantiating it right before calling tracker.epoch_start() does not seem possible, since the `for i_epoch in ...` line comes in between, unless I track the whole training process as a single epoch.

@lfwa
Owner

lfwa commented Jul 6, 2021

It should be fine to have a for-loop after instantiation, as in the example in the README.md. The problem likely occurs only when more compute-intensive operations run after the tracker is instantiated but before it is started.
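The ordering discussed above (expensive setup first, then the tracker, then `epoch_start()` at the top of each epoch) can be sketched as follows. This is a minimal sketch, not the README example itself: `build_dataset`, `train_one_epoch`, `train` and `RecordingTracker` are hypothetical names, and `RecordingTracker` only records the `epoch_start()`/`epoch_end()`/`stop()` calls so the loop can be run without CarbonTracker installed.

```python
from typing import Any, Callable, List


def build_dataset(n: int) -> List[float]:
    """Hypothetical stand-in for expensive setup (e.g. building a PyTorch Dataset)."""
    return [float(i) for i in range(n)]


def train_one_epoch(data: List[float]) -> float:
    """Hypothetical stand-in for one epoch of training."""
    return sum(x * x for x in data)


class RecordingTracker:
    """Hypothetical stand-in with the same calls as CarbonTracker; it only records them."""

    def __init__(self, epochs: int) -> None:
        self.epochs = epochs
        self.calls: List[str] = []

    def epoch_start(self) -> None:
        self.calls.append("start")

    def epoch_end(self) -> None:
        self.calls.append("end")

    def stop(self) -> None:
        self.calls.append("stop")


def train(tracker_cls: Callable[..., Any], max_epochs: int, n: int = 1000) -> Any:
    # 1. Do the expensive setup before any tracker exists.
    data = build_dataset(n)
    # 2. Create the tracker only when the epoch loop is about to begin.
    tracker = tracker_cls(epochs=max_epochs)
    for _ in range(max_epochs):
        tracker.epoch_start()  # start tracking immediately at the top of each epoch
        train_one_epoch(data)
        tracker.epoch_end()
    # 3. Optional stop after the loop; it also covers early termination.
    tracker.stop()
    return tracker


tracker = train(RecordingTracker, max_epochs=2)
print(tracker.calls)  # ['start', 'end', 'start', 'end', 'stop']
```

With CarbonTracker installed, passing the real class instead (`train(CarbonTracker, max_epochs=10)`) should produce the same call order, assuming its other constructor parameters are left at their defaults.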

@nfurnon
Author

nfurnon commented Jul 6, 2021

OK, thank you!

@nfurnon nfurnon closed this as completed Jul 6, 2021
@lfwa
Owner

lfwa commented Jul 6, 2021

Reopening this issue as a reminder that instantiating CarbonTracker and not starting the tracker using tracker.epoch_start() will slow down other code.

The issue will be closed once it is fixed.

@lfwa lfwa reopened this Jul 6, 2021
@rhosch97

rhosch97 commented Jun 9, 2022

Adding a short time.sleep() to the idle branch of CarbonTrackerThread.run() (see below) solved the problem for me. I use 1 ms, which is probably longer than needed; the best value is still to be determined. Without it, the loop probably clogs up the CPU by re-checking the self.measuring attribute in a tight loop, and in CPython the spinning thread also competes with the main thread for the GIL :)
I'm not sure whether this solution is scalable and fault-proof for the whole tool, but it could be a hint!

```python
# Method of CarbonTrackerThread (excerpt, with the proposed sleep added).
def run(self):
    """Thread's activity."""
    try:
        self.begin()
        while self.running:
            if not self.measuring:
                # Proposed fix: pause briefly instead of spinning on self.measuring.
                time.sleep(0.001)
                continue
            self._collect_measurements()
            time.sleep(self.update_interval)

        # Shutdown in thread's activity instead of epoch_end() to ensure
        # that we only shutdown after last measurement.
        self._components_shutdown()
    except Exception as e:
        self._handle_error(e)
```

edit: replaced screen capture with code
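To see why the idle branch matters, the sketch below times a pure-Python loop on the main thread while a background thread polls a flag, once spinning and once sleeping 1 ms per check. It is a self-contained illustration, not CarbonTracker code; `cpu_work` and `timed_with_poller` are hypothetical names. On a standard CPython build the busy-wait run is typically noticeably slower, because the spinning thread keeps competing for the GIL; the exact gap depends on the Python version and machine, and it is not expected to match the ~70x factor reported in this issue.

```python
import threading
import time
from typing import Optional


def cpu_work(n: int = 2_000_000) -> int:
    """Pure-Python CPU-bound loop run on the main thread (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i
    return total


def timed_with_poller(poll_sleep: Optional[float]) -> float:
    """Time cpu_work() while a background thread polls a flag.

    poll_sleep=None: the poller spins without pausing (busy-wait).
    poll_sleep=0.001: the poller sleeps 1 ms per check, like the patch in run().
    """
    # "measuring" is never set, mimicking a tracker that was never started.
    state = {"running": True, "measuring": False}

    def poller() -> None:
        while state["running"]:
            if not state["measuring"]:
                if poll_sleep is not None:
                    time.sleep(poll_sleep)
                continue

    thread = threading.Thread(target=poller, daemon=True)
    thread.start()
    start = time.perf_counter()
    cpu_work()
    elapsed = time.perf_counter() - start
    state["running"] = False  # let the poller exit
    thread.join()
    return elapsed


busy = timed_with_poller(None)
polite = timed_with_poller(0.001)
print(f"busy-wait: {busy:.3f} s, 1 ms sleep: {polite:.3f} s")
```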

@PedramBakh
Copy link
Collaborator

In response to feedback about performance slowdowns caused by busy-waiting in CarbonTrackerThread(), we have made changes in Release v1.2.0: the thread now uses an event-based approach instead of polling, which improves performance. Thank you for drawing our attention to this matter.
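For illustration, an event-based monitor loop might look like the following sketch. It is an assumption about the general technique, not the code shipped in v1.2.0; `MeasurementThread`, `measuring`, `stopping` and `measurements` are hypothetical names, and the timestamp appended on each tick stands in for a real power reading. While no epoch is running, `run()` blocks in `threading.Event.wait()` and uses no CPU, instead of re-checking a flag in a loop.

```python
import threading
import time
from typing import List


class MeasurementThread(threading.Thread):
    """Hypothetical event-driven monitor thread (not CarbonTracker's implementation)."""

    def __init__(self, update_interval: float = 0.01) -> None:
        super().__init__(daemon=True)
        self.update_interval = update_interval
        self.measuring = threading.Event()  # set while an epoch is being tracked
        self.stopping = threading.Event()   # set to shut the thread down
        self.measurements: List[float] = []

    def run(self) -> None:
        while not self.stopping.is_set():
            # Block without using CPU until epoch_start() or stop() sets the event.
            self.measuring.wait()
            if self.stopping.is_set():
                break
            self.measurements.append(time.perf_counter())  # stand-in for a reading
            # Wait until the next reading, but wake early if stop() is called.
            self.stopping.wait(self.update_interval)

    def epoch_start(self) -> None:
        self.measuring.set()

    def epoch_end(self) -> None:
        self.measuring.clear()

    def stop(self) -> None:
        self.stopping.set()
        self.measuring.set()  # release run() if it is blocked in measuring.wait()
        self.join()


monitor = MeasurementThread(update_interval=0.01)
monitor.start()
time.sleep(0.05)  # idle: run() is blocked, so no readings are taken
idle_count = len(monitor.measurements)
monitor.epoch_start()
time.sleep(0.05)  # tracking: roughly one reading per update_interval
monitor.epoch_end()
monitor.stop()
print(idle_count, len(monitor.measurements))
```

Waiting on `stopping` with a timeout, rather than calling `time.sleep()`, lets `stop()` interrupt the wait between readings immediately.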


4 participants