anomalist: Open issues and enhancements #3436

Open · 5 tasks
pabloarosado opened this issue Oct 18, 2024 · 9 comments
Labels: priority 3 - nice to have · wizard (Issues related to wizard tool)

@pabloarosado (Contributor) commented Oct 18, 2024

One-liner

Open issues and possible enhancements of Anomalist

Context & details

See more details in #3340

Open issues

  • (Unlikely to happen) If you filter by a certain indicator and then unselect its dataset, Anomalist fails.

  • (Unlikely to happen) If you have already started Anomalist and want to add a new dataset to the list, pressing "Detect anomalies" does nothing. The only option is to re-scan all datasets (which takes a long time).

  • Add a "hide anomaly" button to each anomaly.

  • Improvements to the AI summary (see conversation):

    • It currently refers to indicators by ID ("indicator 987654", etc.). We should tweak it to always use the indicator title instead (see the prompt sketch after this list).
    • The returned info is not particularly insightful: "Indicator X shows spikes", "indicator Y shows anomalies"... We may need to tweak the prompt a bit to make the output more useful.
    • It would be more useful if the AI summary were shown on the side, so the user can read it while also interacting with the filters to visualize the results.
  • Improve the Anomalist workflow. Mojmir is working on letting Anomalist be triggered automatically for any new dataset in a staging server, but the UI is not yet adapted accordingly. Also, clarify what happens if the user, e.g., adds a new dataset to the list.
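
A minimal sketch of the title-substitution idea, assuming a hypothetical list of anomaly dicts and an ID-to-title lookup (neither is Anomalist's actual API):

```python
# Hypothetical sketch: build the summary prompt using indicator titles
# rather than raw numeric IDs. Names here are made up for illustration.

def build_summary_prompt(anomalies: list[dict], titles: dict[int, str]) -> str:
    """Describe each anomaly by indicator title, falling back to the ID."""
    lines = []
    for anomaly in anomalies:
        indicator_id = anomaly["indicator_id"]
        title = titles.get(indicator_id, f"indicator {indicator_id}")
        lines.append(f"- {title}: {anomaly['type']} in {anomaly['country']} ({anomaly['year']})")
    return (
        "Summarize the following anomalies for a data manager. "
        "Focus on which changes look like plausible data errors and why, "
        "rather than restating that an anomaly exists:\n" + "\n".join(lines)
    )
```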

@lucasrodes (Member)

Other minor things to work on:

  • Documentation: Add instructions in https://docs.owid.io/projects/etl
  • AI/LLM:
    • The current summary can be too lengthy. We should make the output simpler.
    • Summaries are not stored, so they are re-generated on every run. We should store them (see the caching sketch after this list).
    • We could explore whether there is a way to show the summary alongside the anomalies. Currently it lives in a modal, not next to the anomaly list.
  • Help text in app: Some widgets in Anomalist could benefit from a brief help text. E.g., the selection menu for 'Detectors' could explain what each detector looks for, and the selection menu for 'Sort by' could elaborate on what each score measures.
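
A rough sketch of how stored summaries could work, assuming a SQLite cache keyed by a hash of the anomaly payload; `generate_summary` stands in for the actual LLM call:

```python
# Hypothetical caching sketch: reuse a stored summary when the anomaly
# payload is identical, instead of regenerating it every time.
import hashlib
import json
import sqlite3

def summary_cache_key(anomalies: list[dict]) -> str:
    # Hash the (sorted) anomaly payload so identical inputs share one key.
    payload = json.dumps(anomalies, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def get_or_create_summary(conn: sqlite3.Connection, anomalies: list[dict], generate_summary) -> str:
    conn.execute("CREATE TABLE IF NOT EXISTS ai_summaries (key TEXT PRIMARY KEY, summary TEXT)")
    key = summary_cache_key(anomalies)
    row = conn.execute("SELECT summary FROM ai_summaries WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: no LLM call needed
    summary = generate_summary(anomalies)  # e.g. an LLM call
    conn.execute("INSERT INTO ai_summaries VALUES (?, ?)", (key, summary))
    conn.commit()
    return summary
```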

@paarriagadap (Contributor)

Hi! I wanted to share my experience with Anomalist in an edge case, the Multidimensional Poverty Index:
http://staging-site-global-mpi-2024/etl/wizard/anomalist?anomalist_datasets_selected=6125&anomalist_datasets_selected=6779

It's an edge case because the data shows indicators for only one year (current margin estimates) or for at most 2-4 years (harmonized over time). I expected to see a comparison of the old and new indicators in the latter category, but I can't see that. I suppose that's because there are no anomalies there, right?

But what I do see are potential anomalies in the time-change and Gaussian-process checks that should probably not be there. The screenshots below show minimal variations flagged as anomalies (perhaps it's a problem of how small the numbers are?), and only for the indicators of the previous version of the dataset, 6125. I can't see the analysis for the updated dataset, even though I used Indicator Upgrader. Am I doing something wrong? Thanks!

[Screenshots of the flagged anomalies]

@pabloarosado (Contributor, Author)

Hi @paarriagadap, thanks for reporting that.

Why you see anomalies of the old dataset

Normally, you don't need to include the old dataset (6125) to calculate the anomalies: Anomalist automatically detects whether an older version of your new dataset exists.

Why version change anomalies do not appear

In principle, Anomalist will:

  1. Check if indicator upgrader has been used. If so, it will use the mapping there to compare old and new indicators.
  2. If there is no mapping, it will try to map indicators based on the short names (so, if indicators in the old and new datasets have the exact same names, they will be compared).

Have the names changed? If not, maybe you ran Indicator Upgrader after Anomalist (in which case Anomalist didn't know what to map)?
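
For intuition, here is a sketch of that two-step fallback logic; the function and argument names are made-up stand-ins, not Anomalist's actual internals:

```python
# Hypothetical sketch of the mapping fallback described above: an Indicator
# Upgrader mapping takes precedence; otherwise, match identical short names.

def map_indicators(
    old_indicators: dict[str, int],   # short_name -> id in the old dataset
    new_indicators: dict[str, int],   # short_name -> id in the new dataset
    upgrader_mapping: dict[int, int] | None = None,  # old id -> new id
) -> dict[int, int]:
    if upgrader_mapping:
        # Step 1: use the mapping produced by Indicator Upgrader.
        return upgrader_mapping
    # Step 2: pair indicators whose short names match exactly in both datasets.
    return {
        old_id: new_indicators[name]
        for name, old_id in old_indicators.items()
        if name in new_indicators
    }
```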

Why you see anomalies that are not important

This is indeed a tricky case. If I understand correctly, you have an indicator with very little data, and the range of values is very small. So even if an anomaly is small in absolute terms, there is not enough context to realise that. Normally you would have data for many countries, small anomalies would be small relative to the overall scale, and their weighted score would hence be small.
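
A toy illustration of the effect (not Anomalist's actual scoring): if deviations are judged relative to the series' own value range, a numerically tiny jump in a short, nearly flat series can dominate the score.

```python
# Toy example only: largest year-to-year jump, scaled by the series' range.
def relative_anomaly_score(values: list[float]) -> float:
    value_range = max(values) - min(values)
    if value_range == 0:
        return 0.0
    max_jump = max(abs(b - a) for a, b in zip(values, values[1:]))
    return max_jump / value_range

# Short, nearly flat series: the single 0.002 jump IS the whole range -> 1.0.
print(relative_anomaly_score([0.101, 0.101, 0.101, 0.103]))  # 1.0
# Longer series with real spread: the largest jump is modest in context.
print(relative_anomaly_score([0.10, 0.20, 0.35, 0.352, 0.50]))  # 0.375
```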

Please let me know if you need further clarification.

@paarriagadap (Contributor)

Thanks @pabloarosado. Some additional comments here:

  • I am not sure if I changed the datasets selected in Anomalist, but when I go to the page it adds both datasets by default. It actually breaks when I select only the new dataset.
  • I might have opened Anomalist before upgrading indicators some days ago, out of curiosity, but then I migrated, so I would expect to see the mapping in Anomalist now. I have migrated indicators only once.
  • Yes, the names of the indicators in this new version are different (both the title and the slug). This is because this time they have been constructed with dimensions.
  • Yes, each country series is very short and doesn't move much. I also assume that could be tricky to analyze.

@pabloarosado (Contributor, Author)

Hi @paarriagadap, I have reset the anomalies table and recalculated them. Now it's working again.
There are a few small bugs that will need to be fixed. For now, have a look and let me know if the results make more sense. I've noticed that there are some "Missing point" anomalies. No "Version change" anomalies appear at first, but that is because they do not pass the initial threshold; once you lower the thresholds (click on "Advanced options" and bring all thresholds to zero), you can find some potential discrepancies. I hope that makes sense (and we'll have to deal with the bugs soon).

@paarriagadap (Contributor)

Thank you, @pabloarosado! Most of the anomalies detected are expected, but now I can see them. I will take a closer look to check whether any are unexpected.

@Marigold (Collaborator) commented Nov 8, 2024

@paarriagadap Thanks for giving it a try! It was very helpful for fixing all sorts of bugs and improving performance. You should give it a second try next week, once we merge all the fixes.

@pabloarosado (Contributor, Author)

I'll drop here some ideas for future improvements that came up during the last data call:

  • Improve the AI summary, so that it provides a human-readable list of anomalies (one that we could easily send to data providers).
  • Add an "Export" button, so that, after filtering down to just the most important anomalies, you can export them (either as a CSV or as something more visual) to share with data providers (see the sketch after this list).
  • Add the possibility to facet lines by country. This could be useful when showing a comparison of old and new versions for multiple countries. Maybe we could add the Settings button.
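
A minimal sketch of what the export could look like, assuming the filtered anomalies live in a pandas DataFrame; the column names here are invented for illustration:

```python
# Hypothetical export sketch: keep only the highest-scoring anomalies and
# write them to CSV for sharing with data providers.
import pandas as pd

def export_filtered_anomalies(anomalies: pd.DataFrame, min_score: float, path: str) -> None:
    """Filter to the most important anomalies and write them to a CSV file."""
    filtered = anomalies[anomalies["weighted_score"] >= min_score]
    filtered = filtered.sort_values("weighted_score", ascending=False)
    filtered.to_csv(path, index=False)

# Example usage with a toy table:
df = pd.DataFrame({
    "indicator": ["MPI headcount", "MPI intensity"],
    "country": ["Chad", "Niger"],
    "anomaly_type": ["time change", "version change"],
    "weighted_score": [0.9, 0.2],
})
export_filtered_anomalies(df, min_score=0.5, path="anomalies.csv")
```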

@lucasrodes (Member)

Following up on #3436 (comment), I've added an 'export as csv' option and enabled faceting when there are multiple time series added!

thanks @pabloarosado
