A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare
CUMULATOR aims to raise awareness about the carbon footprint of machine learning methods and to encourage further optimization and the rational use of AI-powered tools. This work advocates for sustainable AI and the rational use of IT systems.
- One hour of GPU load is equivalent to 112 gCO2eq
- 1 GB of data traffic through a data center is equivalent to 31 gCO2eq
Free software: MIT license
pip install cumulator           <- installs CUMULATOR
from cumulator import base      <- imports the script
cumulator = base.Cumulator()    <- creates a Cumulator instance
Measure cost of computations.
- Activate or deactivate the chronometer with cumulator.on() and cumulator.off() whenever you perform ML computations (typically within each iteration). Each duration is automatically recorded in cumulator.time_list and summed in cumulator.cumulated_time(). Then return the carbon footprint due to all computations using cumulator.computation_costs().
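The on/off pattern above can be illustrated with a small self-contained sketch. ChronoSketch is a hypothetical stand-in written for this example, not the package's code: only the method and attribute names and the default constants come from this README, and multiplying by n_gpu is an assumption about how parallel GPUs enter the estimate.

```python
import time


class ChronoSketch:
    """Hypothetical stand-in mimicking the timing behaviour described above."""

    def __init__(self, hardware_load=250 / 3.6e6, carbon_intensity=447, n_gpu=1):
        self.hardware_load = hardware_load        # kWh per second of GPU load
        self.carbon_intensity = carbon_intensity  # gCO2eq per kWh
        self.n_gpu = n_gpu                        # GPUs used in parallel
        self.time_list = []                       # one duration (s) per on/off pair
        self._start = None

    def on(self):
        # Start the chronometer.
        self._start = time.perf_counter()

    def off(self):
        # Stop the chronometer and record the elapsed duration.
        if self._start is None:
            raise RuntimeError("off() called before on()")
        self.time_list.append(time.perf_counter() - self._start)
        self._start = None

    def cumulated_time(self):
        return sum(self.time_list)

    def computation_costs(self):
        # seconds * kWh/s * gCO2eq/kWh * number of GPUs -> gCO2eq
        # (scaling by n_gpu is an assumption of this sketch)
        return (self.cumulated_time() * self.hardware_load
                * self.carbon_intensity * self.n_gpu)


chrono = ChronoSketch()
for _ in range(3):
    chrono.on()
    sum(i * i for i in range(10_000))  # placeholder for one training iteration
    chrono.off()

print(len(chrono.time_list), "durations recorded,",
      f"{chrono.computation_costs():.3e} gCO2eq")
```

Wrapping only the compute-heavy part of each iteration keeps data loading and logging out of the estimate, which is one reason to toggle the chronometer per iteration rather than once per run.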
Measure cost of communications.
- Each time your model sends a data file to another node of the network, record the size of the transferred file (in kilobytes) using cumulator.data_transferred(file_size). The amount of data transferred is automatically recorded in cumulator.file_size_list and accumulated in cumulator.cumulated_data_traffic. Then return the carbon footprint due to all communications using cumulator.communication_costs().
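As a rough illustration of the communication estimate, the sketch below converts a list of file sizes in kB into gCO2eq using the default constants listed in this README. The function name communication_footprint and the example file sizes are hypothetical; this is not the package's implementation.

```python
ONE_BYTE_MODEL = 6.894e-8  # kWh per kB of traffic (default from this README)
CARBON_INTENSITY = 447     # gCO2eq per kWh (default from this README)


def communication_footprint(file_sizes_kb):
    """Return the estimated gCO2eq for a list of transferred file sizes in kB."""
    total_kb = sum(file_sizes_kb)
    # kB * kWh/kB * gCO2eq/kWh -> gCO2eq
    return total_kb * ONE_BYTE_MODEL * CARBON_INTENSITY


# Two hypothetical transfers totalling 1 GB (taking 1 GB as 10**6 kB).
sizes_kb = [250_000, 750_000]
print(f"{communication_footprint(sizes_kb):.1f} gCO2eq")
```

With these defaults, 1 GB of traffic comes out at roughly 31 gCO2eq, in line with the figure quoted at the top of this README.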
Display your total carbon footprint
- Display the carbon footprint of your recorded actions with cumulator.display_carbon_footprint():
>>> cumulator.display_carbon_footprint()
########
Overall carbon footprint: 3.14e+02 gCO2eq
########
Carbon footprint due to computations: 2.78e+02 gCO2eq
Carbon footprint due to communications: 3.60e+01 gCO2eq
- You can also return the total carbon footprint as a number using cumulator.total_carbon_footprint().
Default assumptions (can be manually modified for better estimation):
self.hardware_load = 250 / 3.6e6    <- computation costs: power consumption of a typical GPU in Watts, converted to kWh/s
self.one_byte_model = 6.894E-8      <- communication costs: average energy impact of traffic in a typical data center, in kWh/kB
self.carbon_intensity = 447         <- conversion to carbon footprint: average carbon intensity in gCO2eq/kWh in the EU in 2014
self.n_gpu = 1                      <- number of GPUs used in parallel
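As a sanity check, the two headline figures quoted at the top of this README follow directly from these defaults. The worked arithmetic below assumes a single GPU (n_gpu = 1) and takes 1 GB as 10^6 kB:

```latex
\begin{align*}
\text{1 h of GPU load:}\quad
  & 3600\,\text{s} \times \frac{250}{3.6\times 10^{6}}\,\frac{\text{kWh}}{\text{s}}
    \times 447\,\frac{\text{gCO2eq}}{\text{kWh}}
    = 111.75 \approx 112\ \text{gCO2eq} \\
\text{1 GB of traffic:}\quad
  & 10^{6}\,\text{kB} \times 6.894\times 10^{-8}\,\frac{\text{kWh}}{\text{kB}}
    \times 447\,\frac{\text{gCO2eq}}{\text{kWh}}
    \approx 30.8 \approx 31\ \text{gCO2eq}
\end{align*}
```

Both results scale linearly with carbon_intensity, so replacing the 2014 EU average with a value for your own grid rescales every estimate by the same factor.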
src/
├── cumulator
    ├── base.py     <- implementation of the Cumulator class
    └── bonus.py    <- Impact Statement Protocol
@article{cumulator,
  title={A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare},
  author={Tristan Trebaol and Mary-Anne Hartley and Martin Jaggi and Hossein Shokri Ghadikolaei},
  journal={Infoscience EPFL: record 278189},
  year={2020}
}
- 18.06.2020: 0.0.6 update README.rst
- 11.06.2020: 0.0.5 add number of processors (0.0.4 failed)
- 08.06.2020: 0.0.3 added bonus.py carbon impact statement
- 07.06.2020: 0.0.2 added communication costs and cleaned src/
- 21.05.2020: 0.0.1 deployment on PyPI and integration with Alg-E