Filling np.array very slow because of Carbontracker #41

nfurnon · 2021-06-30T16:21:44Z

Filling a pre-allocated array is slowed down by a factor of ~70 once a CarbonTracker has been instantiated. See the minimal example below.
Am I doing anything wrong? How can we avoid this?

import time
import numpy as np
from carbontracker.tracker import CarbonTracker


def load_data(length, data_shape):
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data
    
if __name__ == '__main__':
    l = 10000
    shape = (16000, )
    tt = time.time()
    data = load_data(l, shape)
    print(f'Without CT : {time.time() - tt} seconds')


    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    tt = time.time()
    data = load_data(l, shape)
    print(f'With CT : {time.time() - tt} seconds')


lfwa · 2021-07-04T10:54:07Z

Hi nfurnon,

Thanks for your feedback, we appreciate it!

This looks like a bug that happens when CarbonTracker is instantiated and tracker.epoch_start() is not called directly afterwards. I have not yet found the problem within the code base. However, there exists a workaround by starting the tracker immediately after instantiating it, e.g.:

import time
import numpy as np
from carbontracker.tracker import CarbonTracker


def load_data(length, data_shape):
    data = np.zeros((length, *data_shape))
    for i in range(length):
        data[i] = np.random.random(data_shape)
    return data
    
if __name__ == '__main__':
    l = 10000
    shape = (16000, )
    tt = time.time()
    data = load_data(l, shape)
    print(f'Without CT : {time.time() - tt} seconds')


    tracker = CarbonTracker(epochs=1, monitor_epochs=1, log_dir='./')
    tracker.epoch_start()
    tt = time.time()
    data = load_data(l, shape)
    tracker.epoch_end()
    print(f'With CT : {time.time() - tt} seconds')

Let me know if this helps.

nfurnon · 2021-07-06T06:19:34Z

Thank you for your answer. The time-consuming task was instantiating a PyTorch Dataset class, so I guess I can just instantiate CarbonTracker after it. But instantiating it right before calling tracker.epoch_start() does not seem possible, since the for i_epoch in ... line comes in between. Unless I track the whole training process as one single epoch...

lfwa · 2021-07-06T07:53:50Z

It should be fine to have a for-loop after instantiation, as we show in the example in the README.md. The problem most likely occurs when more compute-intensive operations run after instantiation but before the tracker is started.

nfurnon · 2021-07-06T08:21:40Z

OK, thank you!

lfwa · 2021-07-06T18:13:09Z

Reopening this issue as a reminder that instantiating CarbonTracker and not starting the tracker using tracker.epoch_start() will slow down other code.

The issue will be closed once it is fixed.

rhosch97 · 2022-06-09T09:27:48Z

Adding a time.sleep() with a small but sufficient delay in CarbonTrackerThread() (see below) solved the problem for me. The best value is still to be determined: I use a fairly long 1 ms, but I think it could be shorter. It probably avoids clogging up the CPU with billions of accesses to the self.measuring attribute :)
I'm not sure whether this solution is scalable and fault-proof for the whole tool, but it could be a hint!

def run(self):
    """Thread's activity."""
    try:
        self.begin()
        while self.running:
            if not self.measuring:
                # Sleep briefly instead of spinning while idle.
                time.sleep(0.001)
                continue
            self._collect_measurements()
            time.sleep(self.update_interval)

        # Shutdown in thread's activity instead of epoch_end() to ensure
        # that we only shutdown after last measurement.
        self._components_shutdown()
    except Exception as e:
        self._handle_error(e)

edit: replaced screen capture with code
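
Editor's note: the effect described above can be illustrated without carbontracker. The sketch below is a hypothetical stand-in (the function name and the `state` dictionary are assumptions, not code from the library). It counts how many times an idle polling loop runs in a fixed time window, with and without a short sleep. In CPython a spinning thread also competes with the main thread for the interpreter lock, which is a plausible reason the array-filling loop slowed down.

```python
import threading
import time


def count_idle_iterations(idle_sleep, duration=0.2):
    """Run an idle polling loop for `duration` seconds and count its passes.

    idle_sleep=None mimics a loop that just `continue`s while idle
    (busy-waiting); a float makes the loop sleep that long on each pass.
    """
    state = {"running": True, "iterations": 0}

    def run():
        while state["running"]:
            state["iterations"] += 1
            if idle_sleep is not None:
                time.sleep(idle_sleep)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    time.sleep(duration)      # let the idle loop run for a while
    state["running"] = False  # ask the loop to exit
    worker.join()
    return state["iterations"]


if __name__ == "__main__":
    print("busy-wait passes: ", count_idle_iterations(None))
    print("1 ms sleep passes:", count_idle_iterations(0.001))
```

The busy-wait variant typically performs orders of magnitude more passes than the sleeping one; exact counts depend on the machine.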

PedramBakh · 2023-09-11T16:08:50Z

In response to feedback about performance slowdowns due to busy-waiting in the CarbonTrackerThread(), we've implemented changes in Release v1.2.0. We've transitioned to an event-based approach, enhancing performance. Thank you for drawing our attention to this matter.
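
Editor's note: the comment above does not show the v1.2.0 code, so the following is only a minimal sketch of what an event-based idle wait can look like, using `threading.Event` from the standard library. The class `EventDrivenSampler`, its attributes, and the sample counter are hypothetical and are not carbontracker's actual implementation.

```python
import threading
import time


class EventDrivenSampler(threading.Thread):
    """Hypothetical sampler thread that blocks on events instead of polling."""

    def __init__(self, update_interval=0.01):
        super().__init__(daemon=True)
        self.update_interval = update_interval
        self.samples = 0  # stand-in for collected measurements
        self._measure_event = threading.Event()
        self._shutdown_event = threading.Event()

    def epoch_start(self):
        self._measure_event.set()

    def epoch_end(self):
        self._measure_event.clear()

    def stop(self):
        self._shutdown_event.set()
        # Wake the thread if it is blocked waiting for an epoch to start.
        self._measure_event.set()

    def run(self):
        while not self._shutdown_event.is_set():
            # Blocks without spinning until epoch_start() or stop() is called.
            self._measure_event.wait()
            if self._shutdown_event.is_set():
                break
            self.samples += 1
            # Wait out the interval, but return early if stop() is called.
            self._shutdown_event.wait(self.update_interval)


if __name__ == "__main__":
    sampler = EventDrivenSampler()
    sampler.start()
    time.sleep(0.1)  # idle phase: the thread is blocked, not spinning
    print("samples while idle:", sampler.samples)
    sampler.epoch_start()
    time.sleep(0.1)  # measuring phase
    sampler.epoch_end()
    sampler.stop()
    sampler.join()
    print("samples after epoch:", sampler.samples)
```

Compared with a fixed sleep in the idle branch, waiting on an event avoids both the spinning and the need to tune a polling delay.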

lfwa added the bug Something isn't working label Jul 4, 2021

nfurnon closed this as completed Jul 6, 2021

lfwa reopened this Jul 6, 2021

PedramBakh closed this as completed Sep 11, 2023
