feat: use sqlite and more #108

Merged
merged 64 commits into from
Dec 3, 2021

revolunet
Member

@revolunet revolunet commented Oct 18, 2021

With the current script, removing the filters pushes RAM usage above 25 GB and the script gets OOMKilled in the cluster. I don't know how to optimize this part.

I tested a CSV -> SQLite -> CSV import/export to see how it behaves. Producing the full CSV takes ~35 minutes on my machine (MBP 2020), but only the CPU is working. It produces a 5 GB CSV with 31.6 million records.

This PR contains:

  • Building the flat CSV file by assembling it through SQLite
  • GitHub Actions that build the images (API, indexing, demo front end) and deploy the API and the demo front end
  • Manual GitHub Actions to trigger a reindexing in dev or prod
  • Swagger documentation for the API, exposed at /
  • Demo front end

[Screenshot, 2021-11-12 at 18:29:14]

Todo:
  • improve the API
  • expose/publish the API types
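
For readers unfamiliar with the CSV -> SQLite -> CSV approach mentioned in the description, here is a minimal Python sketch of the idea, not the code from this PR. The table names (`unites_legales`, `etablissements`), the column names, and the join query are assumptions made for illustration. The point is that rows are streamed into an on-disk SQLite file and streamed back out, so memory use stays small regardless of input size.

```python
import csv
import sqlite3

# Hypothetical join: one output row per establishment, enriched with its legal unit.
FLAT_QUERY = """
    SELECT e.siret AS siret, e.siren AS siren, u.nom AS nom
    FROM etablissements AS e
    JOIN unites_legales AS u ON u.siren = e.siren
    ORDER BY e.siret
"""


def load_csv(conn, table, path):
    """Stream a CSV file into a new SQLite table, using the header as column names."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        # executemany consumes the reader lazily, row by row.
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    conn.commit()


def export_flat_csv(conn, query, out_path):
    """Run the query and write its result to a CSV file; return the row count."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        cursor = conn.execute(query)
        writer.writerow([col[0] for col in cursor.description])
        for row in cursor:  # the cursor yields rows one at a time
            writer.writerow(row)
            count += 1
    return count


def build_flat_csv(db_path, units_csv, establishments_csv, out_path):
    """Assemble the flat CSV through an on-disk SQLite database."""
    conn = sqlite3.connect(db_path)
    try:
        load_csv(conn, "unites_legales", units_csv)
        load_csv(conn, "etablissements", establishments_csv)
        # Index the join key so the JOIN does not scan the whole table per row.
        conn.execute("CREATE INDEX idx_ul_siren ON unites_legales (siren)")
        return export_flat_csv(conn, FLAT_QUERY, out_path)
    finally:
        conn.close()
```

Calling `build_flat_csv("work.db", "unites.csv", "etablissements.csv", "flat.csv")` (file names are placeholders) would write the joined file and return the number of data rows. The trade-off matches what is reported above: the work becomes CPU- and disk-bound instead of memory-bound.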

@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 12:23 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 12:45 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 12:52 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 13:02 Inactive
@github-actions github-actions bot requested a deployment to recherche-entreprises-sqlite October 18, 2021 13:13 In progress
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 13:15 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 13:40 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 13:50 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 13:56 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 14:36 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 14:38 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 19:50 Inactive
@github-actions github-actions bot requested a deployment to recherche-entreprises-sqlite October 18, 2021 20:44 In progress
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 20:47 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 20:56 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 21:01 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite October 18, 2021 21:04 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 1, 2021 22:32 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 1, 2021 23:11 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 1, 2021 23:21 Inactive
@revolunet
Member Author

Hello, to simplify the production release, this deploys to a new dedicated index (search-entreprises) with a new API URL at https://search-recherche-entreprises-sqlite.dev.fabrique.social.gouv.fr/

@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 2, 2021 10:40 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 2, 2021 14:08 Inactive
@github-actions github-actions bot temporarily deployed to recherche-entreprises-sqlite December 2, 2021 14:14 Inactive
@rmelisson
Collaborator

tickets
#112
#114
#50
#90
#89
#73

@github-actions github-actions bot requested a deployment to recherche-entreprises-sqlite December 2, 2021 15:14 In progress
@rmelisson
Collaborator

@bobylito I looked in detail at the score for novaway
https://search-recherche-entreprises-sqlite.dev.fabrique.social.gouv.fr/api/v1/search?query=novaway&limit=2&ranked=false

Because of fuzzy matching, both versions (with or without the s) get the same score, and in the end it is the ordering on siret that puts one ahead of the other...

@bobylito
Contributor

bobylito commented Dec 2, 2021

@rmelisson ok, thanks for looking. I'm still surprised, because from what I understand of the fuzzy docs, novaway should be a match with one edit for the query novaways, whereas novaways is an exact match. Anyway, it doesn't seem too serious for now as long as we still find what we're looking for, even if it isn't the first result. We can always tweak more later.
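
To make the tie discussed in this thread concrete, here is a small Python sketch. It is a deliberately simplified toy model, not the search engine's actual scoring: it assumes any name within one edit of the query gets the same score, and that ties are broken by ascending siret. The siret values are made up for illustration.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def toy_score(query, name, max_edits=1):
    """Toy assumption: every name within max_edits scores 1.0, everything else 0.0."""
    return 1.0 if edit_distance(query, name) <= max_edits else 0.0


def rank(query, companies):
    """Sort by score (highest first), then by siret as the tie-breaker."""
    return sorted(companies,
                  key=lambda c: (-toy_score(query, c["name"]), c["siret"]))


# Hypothetical records: the siret values are invented.
companies = [
    {"name": "novaway", "siret": "20000000000001"},
    {"name": "novaways", "siret": "10000000000001"},
]
ranked = rank("novaway", companies)
```

Under these assumptions both names score 1.0 for the query `novaway`, so `novaways` comes first only because its (invented) siret sorts lower. A scoring function that rewards exact matches over one-edit matches, as the fuzzy documentation is understood to do in the comment above, would break the tie the other way.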

@github-actions

github-actions bot commented Dec 3, 2021

🎉 Deployment for commit 7ae2595 :

Ingresses
Docker images
  • 📦 docker pull ghcr.io/socialgouv/recherche-entreprises/front:sha-7ae25958610ee93906d56a295dcfa2cf40d15cdb
  • 📦 docker pull ghcr.io/socialgouv/recherche-entreprises/search:sha-7ae25958610ee93906d56a295dcfa2cf40d15cdb
  • 📦 docker pull harbor.fabrique.social.gouv.fr/cdtn/recherche-entreprises-api:1.5.8
Debug

@revolunet revolunet enabled auto-merge (squash) December 3, 2021 09:09
@revolunet revolunet merged commit b8168f0 into master Dec 3, 2021
@revolunet revolunet deleted the sqlite branch December 3, 2021 09:09
SocialGroovyBot added a commit that referenced this pull request Dec 3, 2021
# [1.6.0](v1.5.8...v1.6.0) (2021-12-03)

### Bug Fixes

* **k8s:** adjust requests/limits ([#96](#96)) ([9655358](9655358))
* **k8s:** rm useless ([#97](#97)) ([ee0972f](ee0972f))

### Features

* use sqlite and more ([#108](#108)) ([b8168f0](b8168f0)), closes [#73](#73) [#112](#112) [#50](#50)
@SocialGroovyBot
Member

🎉 This PR is included in version 1.6.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

4 participants