Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant Occupancy Detector Status Evaluation #511

Open
kengodleskidot opened this issue Jan 8, 2025 · 7 comments
Open

Constant Occupancy Detector Status Evaluation #511

kengodleskidot opened this issue Jan 8, 2025 · 7 comments
Assignees

Comments

@kengodleskidot
Copy link
Contributor

To confirm our constant occupancy model int_diagnostics__constant_occupancy.sql is correctly diagnosing detectors we can spot check the Tableau dashboard (https://tableau.dot.ca.gov/t/Construction/views/PeMS_Detector_Status_QC-Detailed/PeMSDetectorDetailsQualityCheck) and see which detectors on which dates show a Constant status in the existing system while the modern system has a different status (typically "Good")

@thehanggit
Copy link
Contributor

thehanggit commented Jan 21, 2025

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

Image

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Image

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@kengodleskidot
Copy link
Contributor Author

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.

Image

However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.

Image

Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.

I am concerned that I am seeing a lot of `occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the int_diagnostics__constant_occupancy.sql model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference:
Image

I believe it would be beneficial to look at the five-minute data for detector_id = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only
Image
Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

@thehanggit
Copy link
Contributor

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.
Image
However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.
Image
Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.

I am concerned that I am seeing a lot of `occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the int_diagnostics__constant_occupancy.sql model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference: Image

I believe it would be beneficial to look at the five-minute data for detector_id = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only Image Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

In that case, would the logic for the condition be and? Since if the occupancy is zero but not null, 0 can not be excluded from this logic.

Image

After changing that, the status became good, which is consistent with current PeMS.

Image

@kengodleskidot
Copy link
Contributor Author

@kengodleskidot I dived into the diagnostic code to check the difference for "good" and "constant". I noticed that we used min function at line 58 in int_diagnostics__constant_occupancy.sql for occupancy_delta, which means as long as there is a 4-hour window remaining constant within that day, it would be diagnosed as constant. For example, the detector_id = 440253 in modern PeMS is diagnosed as Constant, whereas the current PeMS is diagnosed as Good on 2/13/2023.
Image
However, when I pulled out the sum_occupancy_delta table, the values abs_cal_occupancy_delta_summed column vary starting from 12:25:00, which means the occupancy has variance at that time. Then if we use min function, then min(abs_cal_occupancy_delta_summed) = 0 and we treat the detector status constant. That is one possible issue we can look into.
Image
Another issue might be: The rolling window function calculates a time window every 5 minute, should it be the case that for a time window 5:00 - 21:00, we calculate 4 data points from 5:00-9:00, 9:00-13:00, 13:00-17:00, 17:00-21:00 as metrics for detector status?

@thehanggit in regard to the rolling window function, we are looking to identify any sample window where there is a non-zero repeating occupancy. I do not believe we want to look at four distinct windows as mentioned in the last sentence.
I am concerned that I am seeing a lot of occupancy_avg' values of 0 which could affect the outcome even though there is a qualify statement in the [int_diagnostics__constant_occupancy.sql](https://github.com/cagov/caldata-mdsa-caltrans-pems/blob/main/transform/models/intermediate/diagnostics/int_diagnostics__constant_occupancy.sql) model that should exclude these values from the Constant status diagnostic check. I am including the Constant detector diagnostic summary logic for reference: ![Image](https://github.com/user-attachments/assets/938ec061-b933-45a2-9748-ce2802852048) I believe it would be beneficial to look at the five-minute data for detector_id` = 440253 for 2/13/2023 in Modern PeMS as well as in current PeMS. The difference in status could come down to a fundamental difference in how the 5-minute data is calculated between the two systems. We should verify that the logic in Modern PeMS is producing the detector status result we would expect because the underlying 5-minute data could be different. Our model is using non-imputed data only Image Whereas we are not sure if the existing system is doing the same or possibly using imputed data if there are holes.

In that case, would the logic for the condition be and? Since if the occupancy is zero but not null, 0 can not be excluded from this logic.

Image

After changing that, the status became good, which is consistent with current PeMS.

Image

Does the 5-minute data from Modern PeMS support a diagnostic test result of "Good"? If so, it would be good to confirm with @JamesSLogan that the change from "or" to "and" makes sense given the requirements of the diagnostic test.

@JamesSLogan
Copy link
Contributor

Yes, that makes sense. Values of 0 can slip through, currently, since 0 is not null.

If I'm understanding the requirement correctly, updating the occupancy_avg check to just (occupancy_avg > 0) would clarify/simplify things. This won't match null values, so no OR is needed.

@thehanggit
Copy link
Contributor

thehanggit commented Jan 23, 2025

Thank you @JamesSLogan, I agree with you using occupancy > 0 directly.

@kengodleskidot Another issue I found out is that in some cases, the numerical value of min_occupancy_delta would be extremely small but not equals to zero as attached. Not sure why this is happening, but it will still be treated as Good. I suggest setting up a threshold like 10e-5 to diagnose them as Constant. In that case, it will match the current PeMS status.

Image

@ian-r-rose
Copy link
Member

I agree that using occupancy_avg > 0 should fix things. Nice detective work, everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants