🎉 wizard: anomalist (v2) #3388

Merged
merged 36 commits into master from wizard-anomalist
Oct 21, 2024

Conversation

lucasrodes
Member

@lucasrodes lucasrodes commented Oct 9, 2024

How to work with this PR:

  • Branch off this branch and work on your changes there.
  • Create a PR against this branch; review and merge.
  • ⚠️ Before merging to master, revert changes in the dag, and remove new temporary steps that were created for testing purposes. Reject all chart diff changes.

@owidbot
Contributor

owidbot commented Oct 9, 2024

Quick links (staging server):

Site Admin Wizard

Login: ssh owid@staging-site-wizard-anomalist

chart-diff: ❌
  • 0/97 reviewed charts
    • Modified: 0/97
    • New: 0/0

Edited: 2024-10-11 10:15:18 UTC
Execution time: 3.90 seconds

lucasrodes and others added 4 commits October 9, 2024 15:03
* 🎉 Add CLI for running anomaly detectors

* merge with wizard-anomalist, pass ci/cd

---------

Co-authored-by: lucasrodes <lucasrodes@users.noreply.github.com>
* Start a new staging server for branch 'variable-mapping'

* add to_sql

* define sqlite db name in variable

* new methods to store variable mapping

* force int if possible

* fix infinite loop

* save variable mapping

* minor ui tweak

* add undo capabilities

* store var mapping
lucasrodes and others added 21 commits October 10, 2024 16:09
* ✨ wizard: anomalist ui

* rename file

* rename + tweak UI

* function to get variable uris from indicator list

* tweak config

* minor fixes

* demo

* org: folder for app

* ci/cd
* 🎉 anomalist: Detect new datasets automatically

* Add temporary duplicates of the energy and electricity mix datasets for testing purposes

* Add another temporary step

* Move common function to detect new datasets to utils cached

* Fix wrong mapping of dataset ids in indicator upgrader

* Edit dag and energy steps to be able to play around with mappings and anomalies

* Improve map_datasets

* Let anomalist detect new datasets and list them

* Cache inputs

* remove redundant code

---------

Co-authored-by: lucasrodes <lucasrodes@users.noreply.github.com>
…cator (#3368)

* ✨ wizard: anomalies

* wip

* bump streamlit

* wip

* wip: chart

* wip

* todo

* plot indicator

* re-structure

* wip: loading indicators

* fix API grapher_chart

* deprecate chart_html

* chart_html -> grapher_chart

* clean

* feature: Detect abrupt changes in consecutive versions of an indicator

* Improve compare_tables

* Add new BARD score and improve compare_tables

* ci/cd

* wip

* wip

* changed module name

* custom components module

* add methods to get uris

* get dataset uris

* update import

* update gpt pricing

* update import

* wip

* provide entity-context for anomaly

* wip: anomalist v2

* Implement detection of different kinds of anomaly types

* Rename script

* Rename script

* Rename script

* Create a class AnomalyDetector, simplify code

* Improve scores dataframe

* Rename score column

* wip

* wip

* Improve detection of abrupt changes in time series

* Add population score

* Create function to get views for a list of variables

* Add analytics score

* Improve anomaly aggregation

* Align with master

* Align with master

* Fix minor bug

* minor cleaning

* map entities only if explicitly asked

* reduce re-implemented functions

* avoid usage of get_connection

* Ignore formatting issues

---------

Co-authored-by: lucasrodes <lucasrodes@users.noreply.github.com>
* 🎉 anomalist: Improve anomalist CLI

* Allow for multiple anomalies, datasets and variable ids

* Fix small issues and let data loading use maximum number of workers
* ✨ anomalist: ui flow

* wip

* wip

* enable multiple indicator plot

* allow full entity mapping load

* bugfix

* polish demo

* ci/cd
* 🎉 anomalist: Improve Anomalist backend

* Improve types of anomaly_detection and cli

* Minor refactor and removing useless todo

* Move anomaly detection to a separate module

* Prevent Anomaly from failing if table already exists

* Big refactor to be able to add version change anomalies

* Rename anomalies

* Move detectors to a separate module

* Use entity_name instead of entity_id

* Convert to long format afterwards

* Pass data explicitly to generate scores df
* ✨ wizard: improve app flow

* add option to drop table when creating

* adapt to new api

* new function to create tables in anomalist

* improve comments

* checkfirst flag when creating table

* re-order code

* bug fixes in app flow

* improve pagination ui

* tweak internal grapher_chart flow

* entity selection

* module for chart configs

* adjust for indicator upgrades

* enable re-scan

* help text, anomaly types, upgrade anomalies
* ✨ Add GP outlier detector

* drop anomalies with zero values
* ✨ anomalist: stop using mock

* style

* ✨ anomalist: stop using mock data

* re-order mock data

* replace mock data with real data

* discard df if all-zero
* 🐛 anomalist: Fix unknown variable ids

* Fix missing variable ids when detecting anomalies in multiple datasets

* Update misleading comment
* ✨ anomalist: nits

* abstract df parsing logic

* add GP outlier

* add dfReduced to table

* reset index

* incorporate GP

* re-arrange functions, add link to indicator

* stop reducing dfScore
pabloarosado and others added 10 commits October 16, 2024 16:44
* ✨ anomalist: stop using mock

* style

* ✨ anomalist: stop using mock data

* re-order mock data

* replace mock data with real data

* 🎉 anomalist: Add population and analytics scores

* Store scores with all years and combine them on app

* Add anomaly and population score, as well as weighted score

* Move get_scores to utils

---------

Co-authored-by: lucasrodes <lucasrodes@users.noreply.github.com>
* ✨ anomalist: nits

* abstract df parsing logic

* add GP outlier

* add dfReduced to table

* reset index

* incorporate GP

* re-arrange functions, add link to indicator

* ✨ anomalist: test llms for summary

* stop reducing dfScore

* wip

* wip

* llm summary button

* add function to get variables from DB

* tag: icon is optional

* AI summary
* ✨ Add max_time and n_jobs to gp_outlier
* 🐛 Fix anomalist bugs
* 🎉 anomalist: Experiment with different anomaly detection methods

* Improve script to visualize anomalies

* Improve visualization of anomalies, and try different methods

* Improve cli

* Some refactoring

* Add useful comment

* ✨ anomalist: Improve automatic detection of new datasets (#3429)

* ✨ anomalist: Improve automatic detection of new datasets

* Create new functions to detect new datasets, and speed up anomalist

* Infer variable mapping

* Use inferred variable mapping in Anomalist

* Move function to get datasets info
* ✨ Add anomalist to owidbot
…3434)

* 🐛 anomalist: Fix bug with unknown indicators and long loading time

* Stop storing dfScore, which takes a long time to load

* Fix GP detecting anomalies on old variables (which is unnecessary)
* 🐛 anomalist: Fix bug with unknown indicators and long loading time

* Stop storing dfScore, which takes a long time to load

* Fix GP detecting anomalies on old variables (which is unnecessary)

* ✨ anomalist: Small improvement in Anomalist filters

* Show instead of hide detectors in filter
@Marigold Marigold mentioned this pull request Oct 21, 2024
@Marigold Marigold merged commit b00bf99 into master Oct 21, 2024
13 of 14 checks passed
@Marigold Marigold deleted the wizard-anomalist branch October 21, 2024 16:12
@Marigold Marigold restored the wizard-anomalist branch October 21, 2024 16:14
4 participants