This repository contains a dataset with CO₂ concentration, ventilation flow rate and occupancy data of 3 office spaces at Windesheim University of Applied Sciences collected for the Brains4Buildings project.
- General info
- Data management
- Subject recruitment
- Inclusion criteria
- Metadata
- Data
- Status
- License
- Credits
For the Brains4Buildings project, we collected data about CO₂ concentration, ventilation flow rate and occupancy for several weeks between October 10 and November 2, 2022. This data was collected to answer two research questions:
A. To what extent can we reliably derive the ventilation flow rate in a room from:
- Measured time series data about the occupancy in that room, and
- Measured data about the CO₂ concentration in a room?
B. To what extent can we reliably derive occupancy in a room based on:
- Measured data about the ventilation flow rate in a room, and
- Measured data about the CO₂ concentration in a room?
Before we started recruitment, we requested and obtained approval for our study from Windesheim Research Ethics Committee, based on a description of the research, the privacy policy and Data Management Policy.
Subjects were recruited via a recruitment e-mail targeted at people known to work in office rooms that satisfied the inclusion criteria for office rooms.
Inclusion criteria for office rooms at Windesheim were:
- PIR-based occupancy data, CO₂ concentration data and ventilation data was available via the existing Building Management System (BMS);
- 75% or more of the occupants of an office room must consent to their presence being tracked.
Inclusion criteria for subjects were:
- subjects provided informed consent by filling out an online recruitment survey (also available in Qualtrics qsf format), which also referred to the privacy policy and verified the inclusion criteria below;
- subjects must work at Windesheim University of Applied Sciences in one of the eligible office rooms, typically for more than one hour per week;
- subjects must have a smartphone running Android or an Apple iPhone;
- subjects must give the static Bluetooth MAC address of their smartphone to the researchers for the purpose of the research;
- subjects must be willing to turn on, or leave on, Bluetooth on their smartphone while they are at Windesheim.
We only installed a Twomes M5Stack CoreInk + SCD41 measurement device in an office room if more than 75% of the known number of regular office room occupants provided informed consent. We never tracked presence via Bluetooth without informed consent; the measurement devices we use are technically incapable of tracking Bluetooth based presence without a static Bluetooth MAC-address.
The file b4b-room-metadata.zip contains metadata that may be needed for analysis for each of the three rooms in the open dataset.
id | #work-places | #occupants tracked via Bluetooth <sup>1</sup> | room__m3 <sup>2</sup> | vent_max__m3_h_1 <sup>3</sup> |
---|---|---|---|---|
999169 | 6 | 5 | 75 | 240 |
917810 | 6 | 2 | 75 | 240 |
925038 | 4 | 3 | 60 | 210 |
1: The coverage of Bluetooth-based presence detection in room 917810 is very low compared to the number of work places, which makes successful analysis based on this occupancy data source unlikely for this room.
2, 3: We rounded room__m3, the room volume in m3, to the nearest 5 m3, and vent_max__m3_h_1, the maximum ventilation flow rate of the room in m3/h, to the nearest 30 m3/h, so that we can guarantee room privacy. In particular, we chose a level of privacy for the room that is equivalent to the level of privacy required for persons participating in medical research in the Netherlands, i.e. the chance of re-identification should be less than 9%.
The sections below describe the data pre-processing and the data formats used in the data files.
We used the following measurement device types and data sources to collect data.
Source/Device type name | Description | Open source repository of device |
---|---|---|
CO2-meter-SCD4x | M5Stack CoreInk SCD41 measurement device for CO₂ and occupancy | twomes-scd41-presence-firmware |
bms | building management system | |
xovis | Xovis PC2SE 3D sensor | |
human_observer | data collected by human observers | |
All timestamps collected by the CO2-meter-SCD4x devices were measured in Unix time format, using device clocks synchronized via NTP with the correct UTC time immediately after the measurement device was installed and connected to the internet, and every 6 hours after that. Uploads of measurement data (which could contain more than one measurement) were timestamped both by the measurement device according to the local device clock and by the server. We have not yet checked for deviations between the last device timestamp of a measurement upload and the upload timestamp at the server.
Timestamps of bms and xovis sources were recorded in the Europe/Amsterdam time zone.
Timestamps collected by human observers were also registered in the Europe/Amsterdam time zone.
Timestamps were converted to a time zone aware pandas.Timestamp value, in the Europe/Amsterdam time zone. In the csv files we use ISO 8601 format with time offset: YYYY-MM-DDThh:mm:ss±hhmm.
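The timestamp handling described above can be sketched as follows. This is a minimal illustration, not the project's own conversion code; the sample values and variable names are made up for the example.

```python
import pandas as pd

# Device timestamps arrive as Unix time (seconds since 1970-01-01 UTC).
# The two sample values below are illustrative, not taken from the dataset.
unix_seconds = pd.Series([1665396000, 1667127600])
device_ts = (
    pd.to_datetime(unix_seconds, unit="s", utc=True)
    .dt.tz_convert("Europe/Amsterdam")
)

# The csv files use ISO 8601 with a time offset: YYYY-MM-DDThh:mm:ss±hhmm.
# Parsing via UTC first handles the +0200 / +0100 change around the end of
# daylight saving time.
csv_strings = pd.Series(["2022-10-10T12:00:00+0200", "2022-10-30T12:00:00+0100"])
csv_ts = (
    pd.to_datetime(csv_strings, format="%Y-%m-%dT%H:%M:%S%z", utc=True)
    .dt.tz_convert("Europe/Amsterdam")
)

# Writing a timestamp back in the csv format.
as_text = device_ts.iloc[0].strftime("%Y-%m-%dT%H:%M:%S%z")
```

Both series end up as time zone aware values in Europe/Amsterdam, so measurements from different sources can be compared directly.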
Raw measurements will be available in the folder /raw-measurements/ in three formats:
- b4b_raw_measurements.parquet: a single parquet file with data for all ids;
- nnnnnn_raw_measurements.parquet: 3 parquet files, one for each id;
- nnnnnn_raw_measurements.zip: 3 zipped csv files, one for each id;
All measurement data is structured according to the table below. By importing the parquet variant using pandas.read_parquet(), you automatically get a DataFrame with the recommended indices and data types.
Alternatively, you can also read the zipped csv files, but this typically takes much longer.
Index/Column | Name | Type | Description |
---|---|---|---|
index | id | category | unique code of the room |
index | device_name | category | unique name of the measurement device |
index | source | category | device type name of the measurement device |
index | timestamp | Timestamp | start of the interval (time zone aware) |
index | property | category | property name of the measurement |
column | value | object | value of the measurement |
column | unit | category | unit of the measurement value |
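If you read a csv variant, you have to set the indices and data types yourself. The sketch below shows one way to do that, assuming the csv columns carry the same names as in the table above (an assumption, not verified against the files). The helper name `load_measurements`, the device names and the sample values are hypothetical.

```python
import io
import pandas as pd

INDEX_COLS = ["id", "device_name", "source", "timestamp", "property"]

def load_measurements(csv_source):
    """Read raw measurements from csv into the index layout of the table above."""
    df = pd.read_csv(
        csv_source,
        dtype={
            "id": "category",           # categories are parsed as strings
            "device_name": "category",
            "source": "category",
            "property": "category",
            "value": object,            # raw values stay untyped at this stage
            "unit": "category",
        },
    )
    # ISO 8601 with offset -> time zone aware timestamps in Europe/Amsterdam.
    df["timestamp"] = pd.to_datetime(
        df["timestamp"], format="%Y-%m-%dT%H:%M:%S%z", utc=True
    ).dt.tz_convert("Europe/Amsterdam")
    return df.set_index(INDEX_COLS).sort_index()

# Illustrative in-memory sample; device names and values are made up.
sample_csv = io.StringIO(
    "id,device_name,source,timestamp,property,value,unit\n"
    "999169,device-a,CO2-meter-SCD4x,2022-10-10T12:00:00+0200,co2__ppm,612,ppm\n"
    "999169,device-a,CO2-meter-SCD4x,2022-10-10T12:00:00+0200,temp_in__degC,21.4,°C\n"
    "999169,device-b,bms,2022-10-10T12:00:00+0200,co2__ppm,598,ppm\n"
)
measurements = load_measurements(sample_csv)
```

`pandas.read_csv` can usually read a zip archive that contains a single csv file directly from its path, so the same helper should also accept a path to one of the zipped files.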
In the folder /raw-properties/ we will make various measured properties available in an 'unstacked' format with each property in its own column and an appropriate datatype. Similar to measurements, we will make data available in three formats:
- b4b_raw_properties.parquet: a single parquet file with data for all ids;
- nnnnnn_raw_properties.parquet: 3 parquet files, one for each id;
- nnnnnn_raw_properties.zip: 3 zipped csv files, one for each id;
All property data is structured according to the table below. By importing the parquet variant using pandas.read_parquet(), you automatically get a DataFrame with the recommended indices and data types.
Alternatively, you can also read the zipped csv files, but this typically takes much longer.
Index/Column | Name | Type | Description |
---|---|---|---|
index | id | category | unique code of the room |
index | source | category | device type name of the measurement device |
index | timestamp | Timestamp | start of the interval (time zone aware) |
column | property_1; see property table below | data_type_1 | measured value of this property |
column | property_2 | data_type_2 | measured value of this property |
... | ... | ... | ... |
column | property_n | data_type_n | measured value of this property |
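The relation between the measurements layout and the 'unstacked' properties layout can be illustrated with a small sketch. It is not the code used to produce the files; the device names and values are invented, and it assumes each (id, source, timestamp, property) combination occurs only once.

```python
import pandas as pd

ts = pd.Timestamp("2022-10-10 12:00", tz="Europe/Amsterdam")
index_cols = ["id", "device_name", "source", "timestamp", "property"]

# Illustrative measurements in the 'stacked' layout (values stored as text).
records = [
    ("999169", "device-a", "CO2-meter-SCD4x", ts, "co2__ppm", "612", "ppm"),
    ("999169", "device-a", "CO2-meter-SCD4x", ts, "occupancy__p", "3", "-"),
    ("999169", "device-b", "bms", ts, "co2__ppm", "598", "ppm"),
]
measurements = pd.DataFrame(
    records, columns=index_cols + ["value", "unit"]
).set_index(index_cols)

# Drop the device name and move each property into its own column.
properties = measurements["value"].droplevel("device_name").unstack("property")

# Give each property column an appropriate data type.
properties["co2__ppm"] = pd.to_numeric(properties["co2__ppm"]).astype("float32")
properties["occupancy__p"] = pd.to_numeric(properties["occupancy__p"]).astype("Int8")
```

The nullable `Int8` type keeps missing values as `<NA>`, which is useful here because not every source measures every property.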
Below is a table that lists all properties that were measured, the data type in the raw-properties DataFrame, the measurement unit, the measurement interval, the source device and sensor that measured it, as well as the property name and value format as retrieved from the Twomes database.
Property | Pandas Type | Unit | Measurement interval [h:mm:ss] | Description | Source | Sensor | Database property | Database format |
---|---|---|---|---|---|---|---|---|
co2__ppm | float32 | ppm | 0:10:00 | CO₂ concentration | CO2-meter-SCD4x | SCD41 | CO2concentration | %u |
temp_in__degC | float32 | °C | 0:10:00 | air temperature | CO2-meter-SCD4x | SCD41 | roomTemp | %.1f |
rel_humidity__0 | float32 | - | 0:10:00 | relative humidity | CO2-meter-SCD4x | SCD41 | relativeHumidity | %.1f |
occupancy__p | Int8 | - | 0:10:00 | number of smartphones responding to Bluetooth name request | CO2-meter-SCD4x | ESP32 | countPresence | %u |
co2__ppm | float32 | ppm | 0:01:00/1:00:00 | CO₂ concentration | bms | | | |
temp_in__degC | float32 | °C | 0:01:00/1:00:00 | air temperature | bms | | | |
rel_humidity__0 | float32 | - | 0:01:00/1:00:00 | relative humidity | bms | | | |
valve_frac__0 <sup>1</sup> | float32 | - | 0:01:00/1:00:00 | valve opening fraction | bms | | | |
occupancy__bool | Int8 | 0/1 | 0:01:00/1:00:00 | at least 1 person in the room? | bms | PIR | | |
occupancy__p | Int8 | 0 | 0:05:00/0:15:00 | number of persons in room | xovis | PC2SE | | |
occupancy__p | Int8 | 0/1 | a few times per week | number of persons in room | human_observer | human observer | | |
window_open__bool | Int8 | 0/1 | a few times per week | is at least 1 window in the room open? | human_observer | human observer | | |
door_open__bool | Int8 | 0/1 | a few times per week | is at least 1 door in the room open? | human_observer | human observer | | |
1: valve_frac__0 contains a fraction, i.e. a float between 0 and 1 that expresses the fraction of the maximum ventilation flow in that room.
- This value was derived from a voltage value registered by the Building Management System (BMS):
  - The ventilation valves are physically constrained such that a valve position voltage of 0 V means the minimum opening of 20% is in effect (i.e. valve_frac__0 = 0.20) and a valve position voltage of 10 V means the valve is 100% open (i.e. valve_frac__0 = 1.00).
  - The ventilation system uses a regulator that maintains a constant pressure, so the ventilation flow rate does not depend on recent valve openings in nearby rooms, or only very briefly: for a few seconds, and in any case for no more than about a minute.
- We measured data in 6 rooms in 2 different buildings, each with a different BMS. For one building, the valve fraction data were only available for export for 24 hours after they were recorded. This initially implied we would not have retrospective valve_frac__0 data for 4 of the 6 rooms we measured. Since the COVID-19 pandemic, however, ventilation systems were set to maximum; at least, that was the intention. This was implemented by setting the ventilation setpoint to 400 ppm, implying that valve_frac__0 = 1.00 at all times would be a reasonable assumption. Unfortunately, for 3 out of the 4 rooms in the building concerned, the room CO₂ sensors connected to the BMS were not calibrated properly: they often registered CO₂ concentration values well below 400 ppm, values that seem highly unlikely when looking at the Keeling Curve of the last 2 years. Therefore, we had to leave out 3 rooms from our analyses. We decided to leave them out of this open dataset as well.
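The two end points given above (0 V for a 20% opening, 10 V for a 100% opening) can be turned into a conversion function. The sketch below assumes a linear relation between those end points, which the description above does not state explicitly; the function names are hypothetical. It also shows how valve_frac__0 combines with vent_max__m3_h_1 from the room metadata.

```python
def valve_frac_from_voltage(voltage_v: float) -> float:
    """Map a BMS valve position voltage (0..10 V) to valve_frac__0.

    Assumption: the mapping is linear between the documented end points
    0 V -> 0.20 and 10 V -> 1.00.
    """
    if not 0.0 <= voltage_v <= 10.0:
        raise ValueError("valve position voltage must be between 0 V and 10 V")
    return 0.20 + 0.80 * voltage_v / 10.0


def ventilation_flow_m3_h(valve_frac: float, vent_max_m3_h: float) -> float:
    """Ventilation flow rate as a fraction of the room's maximum flow rate."""
    return valve_frac * vent_max_m3_h


# Room 925038 has vent_max__m3_h_1 = 210 in the metadata table.
min_flow = ventilation_flow_m3_h(valve_frac_from_voltage(0.0), 210)   # valve at minimum opening
max_flow = ventilation_flow_m3_h(valve_frac_from_voltage(10.0), 210)  # valve fully open
```

Because the maximum flow rate in the metadata is rounded for privacy reasons, flow rates derived this way are approximate.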
Weather data was collected and geospatially interpolated using HourlyHistoricWeather from the Royal Netherlands Meteorological Institute (KNMI), based on average hourly values.
For geospatial interpolation of weather data we used lat, lon = 52.499255, 6.0765167, the location of Windesheim University of Applied Sciences in Zwolle, the Netherlands. Average values were converted from the source units to the units indicated in the table below.
Index/Column | Property | Type | Unit | Measurement interval [h:mm:ss] | Description | Source | Source property | Source value format | Source unit |
---|---|---|---|---|---|---|---|---|---|
index | timestamp | Timestamp | | | start of the measurement interval | KNMI | YYYYMMDD, H | H=1: 0:00:00 - 0:59:59; H=24: 23:00:00 - 23:59:59 | |
column | temp_out__degC | float32 | °C | 1:00:00 | outdoor temperature | KNMI | T | %d | 0.1 °C |
column | wind__m_s_1 | float32 | m/s | 1:00:00 | wind speed | KNMI | FH | %d | 0.1 m/s |
column | ghi__W_m_2 | float32 | W/m2 | 1:00:00 | global horizontal irradiance | KNMI | Q | %d | J/(h·cm2) |
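The unit conversions implied by the table can be written out as a short sketch. This is not the code of HourlyHistoricWeather; the function name is hypothetical and the sample values are made up. It assumes that the KNMI date and hour fields are expressed in UTC, which the table above does not state.

```python
import pandas as pd

# 1 J/(h·cm2) = 10,000 J/(h·m2) = 10,000 / 3,600 W/m2
J_PER_H_CM2_TO_W_PER_M2 = 10_000 / 3_600

def convert_knmi_hourly(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert KNMI hourly source values to the units of the table above."""
    date = pd.to_datetime(raw["YYYYMMDD"].astype(str), format="%Y%m%d")
    # H=1 covers 0:00:00 - 0:59:59, so the interval starts at hour H - 1.
    # Assumption: KNMI hours are in UTC.
    start_utc = (date + pd.to_timedelta(raw["H"] - 1, unit="h")).dt.tz_localize("UTC")
    out = pd.DataFrame(
        {
            "temp_out__degC": (raw["T"] / 10).astype("float32"),   # 0.1 °C -> °C
            "wind__m_s_1": (raw["FH"] / 10).astype("float32"),     # 0.1 m/s -> m/s
            "ghi__W_m_2": (raw["Q"] * J_PER_H_CM2_TO_W_PER_M2).astype("float32"),
        }
    )
    out.index = pd.DatetimeIndex(
        start_utc.dt.tz_convert("Europe/Amsterdam"), name="timestamp"
    )
    return out

# Illustrative source rows; the values are not taken from the dataset.
raw = pd.DataFrame(
    {"YYYYMMDD": [20221010, 20221010], "H": [1, 24],
     "T": [123, 98], "FH": [30, 45], "Q": [0, 36]}
)
weather = convert_knmi_hourly(raw)
```

For example, an hourly sum of 36 J/cm2 corresponds to an average irradiance of 100 W/m2 during that hour.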
Preprocessing of measurements from the measurement database was done using get_preprocessed_b4b_data(). Preprocessing steps include:
- Removal of duplicate measurements;
- Removal of CO₂ measurements that had no variation; this concerned one room that apparently had a faulty CO₂ sensor (the data for this room was already rejected for other reasons);
- Removal of all CO₂ concentration measurements with a value of less than 5 ppm (several CO₂ measurement values from the BMS came in as 0 ppm, which is clearly wrong);
- CO₂ baseline adjustment: per room and per measurement source, the minimum CO₂ measurement value was determined and subsequently all measurement values were raised by the same amount such that the minimum value would be 415 ppm plus a margin (in this case 1 ppm). This preprocessing operation helps to counteract the effect of long-term drift that some CO₂ sensors are subject to. Some CO₂ sensors provide automatic occasional recalibration to a pre-determined CO₂ level. Not all CO₂ sensors used in a study may have this feature, and some may have it turned off (sometimes deliberately, to avoid sudden jumps). Some CO₂ sensors may have been calibrated once, but not all in the same circumstances;
- Interpolation of measurements to intervals of 15 minutes (no interpolation between measurements that were 90 minutes apart or more);
- All column values represent the average during the interval that starts at the timestamp indicated.
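The first preprocessing steps listed above can be sketched as follows. This is a simplified illustration and not the implementation of get_preprocessed_b4b_data(); the function name, constant names and sample values are hypothetical, and the interpolation step is left out. It assumes the baseline shift is applied in either direction, so that the minimum per room and source always lands on the target value.

```python
import pandas as pd

CO2_MIN_VALID_PPM = 5    # values below this are treated as invalid
CO2_BASELINE_PPM = 415   # target minimum before the margin
CO2_MARGIN_PPM = 1

def preprocess_co2(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate, drop implausible values and adjust the CO2 baseline.

    Expects columns: id, source, timestamp, co2__ppm.
    """
    df = df.drop_duplicates()
    df = df[df["co2__ppm"] >= CO2_MIN_VALID_PPM].copy()
    # Shift all values per room and per source by the same amount, so that
    # the minimum equals the baseline plus margin (assumption: the shift may
    # also be negative when a sensor reads too high).
    group_min = df.groupby(["id", "source"])["co2__ppm"].transform("min")
    df["co2__ppm"] = df["co2__ppm"] + (CO2_BASELINE_PPM + CO2_MARGIN_PPM - group_min)
    return df

t0, t1, t2 = pd.date_range("2022-10-10 12:00", periods=3, freq="10min",
                           tz="Europe/Amsterdam")
# Illustrative input; the values are made up.
raw = pd.DataFrame(
    [
        ("999169", "bms", t0, 0.0),                # invalid 0 ppm reading
        ("999169", "bms", t1, 380.0),
        ("999169", "bms", t2, 500.0),
        ("999169", "bms", t2, 500.0),              # duplicate
        ("999169", "CO2-meter-SCD4x", t0, 420.0),
        ("999169", "CO2-meter-SCD4x", t1, 600.0),
    ],
    columns=["id", "source", "timestamp", "co2__ppm"],
)
cleaned = preprocess_co2(raw)
```

After this step, the lowest value of each room and source combination equals 416 ppm, which makes CO₂ series from differently calibrated sensors easier to compare.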
Index/Column | Name | Type | Unit | Description | Calculation | Min | Max | Sigma | Minimum standard deviation |
---|---|---|---|---|---|---|---|---|---|
index | id | Int16 | | unique code of the room | | 900000 | 999999 | | |
index | timestamp | Timestamp | | start of the interpolated interval (time zone aware) | | | | | |
column | co2__ppm | float32 | ppm | 5 | >0 |
Dataset is: collected, published as open data
This data is made available under the CC BY 4.0 license by the Research group Energy Transition, Windesheim University of Applied Sciences.
Data collection was a joint effort of:
- Henri ter Hofte · @henriterhofte · Twitter @HeNRGi
- Nick van Ravenzwaaij · @n-vr
- Engbert Nijboer
Thanks go to those who are the ultimate source of this dataset:
- all anonymous subjects who volunteered to make their measurement data available