Constant Occupancy Detector Status Evaluation #511

kengodleskidot · 2025-01-08T20:10:36Z

To confirm our constant occupancy model int_diagnostics__constant_occupancy.sql is correctly diagnosing detectors we can spot check the Tableau dashboard (https://tableau.dot.ca.gov/t/Construction/views/PeMS_Detector_Status_QC-Detailed/PeMSDetectorDetailsQualityCheck) and see which detectors on which dates show a Constant status in the existing system while the modern system has a different status (typically "Good")

thehanggit · 2025-01-21T22:47:02Z

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

kengodleskidot · 2025-01-22T00:31:34Z

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.

I am concerned that I am seeing a lot of `occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the int_diagnostics__constant_occupancy.sql model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference:

I believe it would be beneficial to look at the five-minute data for detector_id = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only

Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

thehanggit · 2025-01-22T00:45:53Z

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.

I am concerned that I am seeing a lot of `occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the int_diagnostics__constant_occupancy.sql model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference:

I believe it would be beneficial to look at the five-minute data for detector_id = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

In that case, would the logic for the condition be and? Since if the occupancy is zero but not null, 0 can not be excluded from this logic.

After changing that, the status became good, which is consistent with current PeMS.

kengodleskidot · 2025-01-22T17:16:43Z

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.
I am concerned that I am seeing a lot of occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the [int_diagnostics__constant_occupancy.sql](https://github.com/cagov/caldata-mdsa-caltrans-pems/blob/main/transform/models/intermediate/diagnostics/int_diagnostics__constant_occupancy.sql) model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference: ![Image](https://github.com/user-attachments/assets/938ec061-b933-45a2-9748-ce2802852048) I believe it would be beneficial to look at the five-minute data for detector_id` = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

In that case, would the logic for the condition be and? Since if the occupancy is zero but not null, 0 can not be excluded from this logic.

After changing that, the status became good, which is consistent with current PeMS.

Does the 5-minute data from Modern PeMS support a diagnostic test result of "Good"? If so, it would be good to confirm with @JamesSLogan that the change from "or" to "and" makes sense given the requirements of the diagnostic test.

JamesSLogan · 2025-01-22T23:35:41Z

Yes, that makes sense. Values of 0 can slip through, currently, since 0 is not null.

If I'm understanding the requirement correctly, updating the occupancy_avg check to just (occupancy_avg > 0) would clarify/simplify things. This won't match null values, so no OR is needed.

thehanggit · 2025-01-23T00:51:17Z

Thank you @JamesSLogan, I agree with you using occupancy > 0 directly.

@kengodleskidot Another issue I found out is that in some cases, the numerical value of min_occupancy_delta would be extremely small but not equals to zero as attached. Not sure why this is happening, but it will still be treated as Good. I suggest setting up a threshold like 10e-5 to diagnose them as Constant. In that case, it will match the current PeMS status.

ian-r-rose · 2025-01-23T23:44:33Z

I agree that using occupancy_avg > 0 should fix things. Nice detective work, everyone!

kengodleskidot added this to the Data Quality Checks milestone Jan 8, 2025

kengodleskidot assigned thehanggit Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constant Occupancy Detector Status Evaluation #511

Constant Occupancy Detector Status Evaluation #511

kengodleskidot commented Jan 8, 2025

thehanggit commented Jan 21, 2025 •

edited

Loading

kengodleskidot commented Jan 22, 2025

thehanggit commented Jan 22, 2025

kengodleskidot commented Jan 22, 2025

JamesSLogan commented Jan 22, 2025

thehanggit commented Jan 23, 2025 •

edited

Loading

ian-r-rose commented Jan 23, 2025

Constant Occupancy Detector Status Evaluation #511

Constant Occupancy Detector Status Evaluation #511

Comments

kengodleskidot commented Jan 8, 2025

thehanggit commented Jan 21, 2025 • edited Loading

kengodleskidot commented Jan 22, 2025

thehanggit commented Jan 22, 2025

kengodleskidot commented Jan 22, 2025

JamesSLogan commented Jan 22, 2025

thehanggit commented Jan 23, 2025 • edited Loading

ian-r-rose commented Jan 23, 2025

thehanggit commented Jan 21, 2025 •

edited

Loading

thehanggit commented Jan 23, 2025 •

edited

Loading