anomalist: Open issues and enhancements #3436

Open · 5 tasks
pabloarosado opened this issue Oct 18, 2024 · 9 comments
Labels: priority 3 - nice to have · wizard (Issues related to wizard tool)

@pabloarosado (Contributor) commented Oct 18, 2024

One-liner

Open issues and possible enhancements of Anomalist

Context & details

See more details in #3340

Open issues

  • (Unlikely to happen) If you filter by a certain indicator and then unselect its dataset, Anomalist fails.

  • (Unlikely to happen) If you have already started Anomalist and want to add a new dataset to the list, pressing "Detect anomalies" does nothing. The only option is to re-scan all datasets (which takes a long time).

  • Add a "hide anomaly" button to each anomaly.

  • Improvements to the AI summary (see conversation):

    • It currently refers to indicators by ID ("indicator 987654", etc.). We should tweak it to always use the indicator title instead (see the prompt sketch after this list).
    • The returned info is not particularly insightful: "Indicator X shows spikes", "indicator Y shows anomalies"... We may need to tweak the prompt a bit to make the output more useful.
    • It would be more useful if the AI summary were shown on the side, so the user can read it while also interacting with the filters to visualize the results.
  • Improve the Anomalist workflow. Mojmir is working on letting Anomalist be triggered automatically for any new dataset in a staging server, but the UI is not yet adapted accordingly. Also, clarify what happens if the user, e.g., adds a new dataset to the list.
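
A minimal sketch of the title-substitution idea, assuming a hypothetical list of anomaly dicts and an ID-to-title lookup (neither is Anomalist's actual API):

```python
# Hypothetical sketch: build the summary prompt using indicator titles
# rather than raw numeric IDs. Names here are made up for illustration.

def build_summary_prompt(anomalies: list[dict], titles: dict[int, str]) -> str:
    """Describe each anomaly by indicator title, falling back to the ID."""
    lines = []
    for anomaly in anomalies:
        indicator_id = anomaly["indicator_id"]
        title = titles.get(indicator_id, f"indicator {indicator_id}")
        lines.append(f"- {title}: {anomaly['type']} in {anomaly['country']} ({anomaly['year']})")
    return (
        "Summarize the following anomalies for a data manager. "
        "Focus on which changes look like plausible data errors and why, "
        "rather than restating that an anomaly exists:\n" + "\n".join(lines)
    )
```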

@lucasrodes (Member)

Other minor things to work on:

  • Documentation: Add instructions in https://docs.owid.io/projects/etl
  • AI/LLM:
    • The current summary can be too lengthy. We should make the output simpler.
    • Summaries are not stored, so they are re-generated on every run. We should store them (see the caching sketch after this list).
    • We could explore whether there is a way to show the summary alongside the anomalies. Currently it lives in a modal, not next to the anomaly list.
  • Help text in app: Some widgets in Anomalist could benefit from a brief help text. E.g., the selection menu for 'Detectors' could explain what each detector looks for, and the selection menu for 'Sort by' could elaborate on what each score measures.
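
A rough sketch of how stored summaries could work, assuming a SQLite cache keyed by a hash of the anomaly payload; `generate_summary` stands in for the actual LLM call:

```python
# Hypothetical caching sketch: reuse a stored summary when the anomaly
# payload is identical, instead of regenerating it every time.
import hashlib
import json
import sqlite3

def summary_cache_key(anomalies: list[dict]) -> str:
    # Hash the (sorted) anomaly payload so identical inputs share one key.
    payload = json.dumps(anomalies, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def get_or_create_summary(conn: sqlite3.Connection, anomalies: list[dict], generate_summary) -> str:
    conn.execute("CREATE TABLE IF NOT EXISTS ai_summaries (key TEXT PRIMARY KEY, summary TEXT)")
    key = summary_cache_key(anomalies)
    row = conn.execute("SELECT summary FROM ai_summaries WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: no LLM call needed
    summary = generate_summary(anomalies)  # e.g. an LLM call
    conn.execute("INSERT INTO ai_summaries VALUES (?, ?)", (key, summary))
    conn.commit()
    return summary
```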

@paarriagadap (Contributor)

Hi! I wanted to share my experience with Anomalist in an edge case, the Multidimensional Poverty Index:
http://staging-site-global-mpi-2024/etl/wizard/anomalist?anomalist_datasets_selected=6125&anomalist_datasets_selected=6779

It's an edge case because the data shows indicators for only one year (current margin estimates) or for at most 2-4 years (harmonized over time). I expected to see a comparison of the old and new indicators in the latter category, but I can't see that. I suppose that's because there are no anomalies there, right?

But what I do see are potential anomalies in the time-change and Gaussian-process checks that should probably not be there. The screenshots below show minimal variations flagged as anomalies (perhaps it's a problem of how small the numbers are?), and only for the indicators of the previous version of the dataset, 6125. I can't see the analysis for the updated dataset, even though I used Indicator Upgrader. Am I doing something wrong? Thanks!

[Screenshots of the flagged anomalies]

@pabloarosado (Contributor, Author)

Hi @paarriagadap, thanks for reporting that.

Why you see anomalies of the old dataset

Normally, you don't need to include the old dataset (6125) to calculate the anomalies: Anomalist automatically detects whether an older version of your new dataset exists.

Why version change anomalies do not appear

In principle, Anomalist will:

  1. Check if indicator upgrader has been used. If so, it will use the mapping there to compare old and new indicators.
  2. If there is no mapping, it will try to map indicators based on the short names (so, if indicators in the old and new datasets have the exact same names, they will be compared).

Have the names changed? If not, maybe you ran Indicator Upgrader after Anomalist (in which case Anomalist didn't know what to map)?
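
For intuition, here is a sketch of that two-step fallback logic; the function and argument names are made-up stand-ins, not Anomalist's actual internals:

```python
# Hypothetical sketch of the mapping fallback described above: an Indicator
# Upgrader mapping takes precedence; otherwise, match identical short names.

def map_indicators(
    old_indicators: dict[str, int],   # short_name -> id in the old dataset
    new_indicators: dict[str, int],   # short_name -> id in the new dataset
    upgrader_mapping: dict[int, int] | None = None,  # old id -> new id
) -> dict[int, int]:
    if upgrader_mapping:
        # Step 1: use the mapping produced by Indicator Upgrader.
        return upgrader_mapping
    # Step 2: pair indicators whose short names match exactly in both datasets.
    return {
        old_id: new_indicators[name]
        for name, old_id in old_indicators.items()
        if name in new_indicators
    }
```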

Why you see anomalies that are not important

This is indeed a tricky case. If I understand correctly, you have an indicator with very little data, and the range of values is very small. So even if an anomaly is small in absolute terms, there is not enough context to realise that. Normally you would have data for many countries, small anomalies would be small relative to the overall scale, and their weighted score would hence be small.
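
A toy illustration of the effect (not Anomalist's actual scoring): if deviations are judged relative to the series' own value range, a numerically tiny jump in a short, nearly flat series can dominate the score.

```python
# Toy example only: largest year-to-year jump, scaled by the series' range.
def relative_anomaly_score(values: list[float]) -> float:
    value_range = max(values) - min(values)
    if value_range == 0:
        return 0.0
    max_jump = max(abs(b - a) for a, b in zip(values, values[1:]))
    return max_jump / value_range

# Short, nearly flat series: the single 0.002 jump IS the whole range -> 1.0.
print(relative_anomaly_score([0.101, 0.101, 0.101, 0.103]))  # 1.0
# Longer series with real spread: the largest jump is modest in context.
print(relative_anomaly_score([0.10, 0.20, 0.35, 0.352, 0.50]))  # 0.375
```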

Please let me know if you need further clarification.

@paarriagadap (Contributor)

Thanks @pabloarosado. Some additional comments here:

  • I am not sure if I changed the datasets selected in Anomalist, but when I go to the page it adds both datasets by default. It actually breaks when I select only the new dataset.
  • I might have opened Anomalist before upgrading indicators some days ago, out of curiosity, but then I migrated, so I would expect to see the mapping in Anomalist now. I have migrated indicators only once.
  • Yes, the names of the indicators in this new version are different (both the title and the slug). This is because this time they have been constructed with dimensions.
  • Yes, each country series is very short and doesn't move much. I also assume that could be tricky to analyze.

@pabloarosado (Contributor, Author)

Hi @paarriagadap, I have reset the anomalies table and recalculated them. Now it's working again.
There are a few small bugs that will need to be fixed. For now, have a look and let me know if the results make more sense. I've noticed that there are some "Missing point" anomalies. No "Version change" anomalies appear at first, but that is because they do not pass the initial threshold; once you lower the thresholds (click on "Advanced options" and bring all thresholds to zero), you can find some potential discrepancies. I hope that makes sense (and we'll have to deal with the bugs soon).

@paarriagadap (Contributor)

Thank you, @pabloarosado! Most of the anomalies detected are expected, but now I can see them. I will take a closer look to check whether any are unexpected.

@Marigold (Collaborator) commented Nov 8, 2024

@paarriagadap Thanks for giving it a try! It was very helpful for fixing all sorts of bugs and improving performance. You should give it a second try next week, once we merge all the fixes.

@pabloarosado (Contributor, Author)

I'll drop here some ideas for future improvements that came up during the last data call:

  • Improve the AI summary, so that it provides a human-readable list of anomalies (one that we could easily send to data providers).
  • Add an "Export" button, so that, after filtering down to just the most important anomalies, you can export them (either as a CSV or as something more visual) to share with data providers (see the sketch after this list).
  • Add the possibility to facet lines by country. This could be useful when showing a comparison of old and new versions for multiple countries. Maybe we could add the Settings button.
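
A minimal sketch of what the export could look like, assuming the filtered anomalies live in a pandas DataFrame; the column names here are invented for illustration:

```python
# Hypothetical export sketch: keep only the highest-scoring anomalies and
# write them to CSV for sharing with data providers.
import pandas as pd

def export_filtered_anomalies(anomalies: pd.DataFrame, min_score: float, path: str) -> None:
    """Filter to the most important anomalies and write them to a CSV file."""
    filtered = anomalies[anomalies["weighted_score"] >= min_score]
    filtered = filtered.sort_values("weighted_score", ascending=False)
    filtered.to_csv(path, index=False)

# Example usage with a toy table:
df = pd.DataFrame({
    "indicator": ["MPI headcount", "MPI intensity"],
    "country": ["Chad", "Niger"],
    "anomaly_type": ["time change", "version change"],
    "weighted_score": [0.9, 0.2],
})
export_filtered_anomalies(df, min_score=0.5, path="anomalies.csv")
```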

@lucasrodes (Member)

Following up on #3436 (comment), I've added an 'export as csv' option and enabled faceting when there are multiple time series added!

thanks @pabloarosado
