Releases: opensanctions/yente
v4.1.0
This release improves error handling in the dataset indexer, and updates many dependencies.
A small breaking change: the results.xxx.total.value response field in the /match query API now contains the number of matching entities, not the number of candidates that have been scored. The new behavior is the meaningful one; the previous one was more of a bug.
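For client code that reads this field, a minimal sketch follows. It assumes the /match response is a JSON object with a results mapping keyed by query name, each entry holding total.value as described above. The helper name match_totals and the sample payload are hypothetical and only illustrate the field path.

```python
from typing import Any, Dict


def match_totals(response: Dict[str, Any]) -> Dict[str, int]:
    """Return results.<query>.total.value for each query in a /match response."""
    totals: Dict[str, int] = {}
    for query_name, result in response.get("results", {}).items():
        # As of v4.1.0 this is the number of matching entities,
        # not the number of scored candidates.
        totals[query_name] = int(result["total"]["value"])
    return totals


# Hypothetical, hand-written payload (not real API output).
sample = {"results": {"q1": {"total": {"value": 3}, "results": []}}}
print(match_totals(sample))
```

Clients that previously treated this value as the size of the scored candidate set may need to adjust any paging or reporting logic built on it.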
What's Changed
- feat: add traceID and pass it to ES by @SimonThordal in #489
- Bump rigour from 0.5.2 to 0.5.3 by @dependabot in #492
- Bump httpx[http2] from 0.27.0 to 0.27.2 by @dependabot in #514
- Bump rigour from 0.5.3 to 0.6.1 by @dependabot in #512
- Bump opensearch-py[async] from 2.6.0 to 2.7.1 by @dependabot in #509
- Bump anyio from 4.3.0 to 4.6.0 by @dependabot in #525
- Bump python-multipart from 0.0.9 to 0.0.10 by @dependabot in #524
- Bump fastapi from 0.114.1 to 0.115.0 by @dependabot in #522
- Bump python-multipart from 0.0.10 to 0.0.12 by @dependabot in #527
- Bump aiohttp[speedups] from 3.10.5 to 3.10.8 by @dependabot in #528
- Bump uvicorn[standard] from 0.30.6 to 0.31.0 by @dependabot in #529
Full Changelog: v4.0.0...v4.1.0
v4.0.0
This is a major release of yente which changes the data indexing and search backend systems. It adds support for incremental data updates (delta updater) and for OpenSearch as a search index provider. Read more in our announcement blog post:
https://www.opensanctions.org/articles/2024-07-24-yente4/
This release does not change the scoring and matching systems.
v3.8.10
What's Changed
- Add jitter to updates by @SimonThordal in #470
- Bump jellyfish from 1.0.3 to 1.0.4 by @dependabot in #452
- Bump uvicorn[standard] from 0.29.0 to 0.30.1 by @dependabot in #456
- Bump cryptography from 42.0.7 to 42.0.8 by @dependabot in #458
- Bump docker/build-push-action from 5 to 6 by @dependabot in #469
- Bump followthemoney from 3.6.0 to 3.6.3 by @dependabot in #460
- Bump structlog from 24.1.0 to 24.2.0 by @dependabot in #451
- Bump elasticsearch[async] from 8.13.1 to 8.14.0 by @dependabot in #461
- Bump orjson from 3.10.3 to 3.10.5 by @dependabot in #465
- Bump email-validator from 2.1.1 to 2.2.0 by @dependabot in #471
- Bump asyncstdlib from 3.12.3 to 3.12.4 by @dependabot in #472
- Bump aiofiles from 23.2.1 to 24.1.0 by @dependabot in #474
- Bump nomenklatura from 3.10.6 to 3.12.5 by @dependabot in #476
- Bump orjson from 3.10.5 to 3.10.6 by @dependabot in #477
- Add dummy support for authentication by @SimonThordal in #478
Full Changelog: v3.8.9...v3.8.10
v3.8.9
What's Changed
- Bump asyncstdlib from 3.12.2 to 3.12.3 by @dependabot in #430
- Bump orjson from 3.10.0 to 3.10.1 by @dependabot in #431
- Bump aiohttp[speedups] from 3.9.3 to 3.9.5 by @dependabot in #432
- Handle validation error by @pudo in #433
- Bump pyicu from 2.12 to 2.13.1 by @dependabot in #438
- Bump fastapi from 0.110.1 to 0.111.0 by @dependabot in #440
- Bump cryptography from 42.0.5 to 42.0.7 by @dependabot in #444
- Bump followthemoney from 3.5.9 to 3.6.0 by @dependabot in #445
- Bump aiocsv from 1.3.1 to 1.3.2 by @dependabot in #436
- Bump orjson from 3.10.1 to 3.10.3 by @dependabot in #442
- Bump elasticsearch[async] from 8.13.0 to 8.13.1 by @dependabot in #443
Full Changelog: v3.8.8...v3.8.9
v3.8.8
New features
- feat: add gzip support by @SimonThordal in #422
- feat: proxy requests if requested by @SimonThordal in #428
- feat: add query parameter for allowing datasets by @SimonThordal in #429
Dependency updates
- Bump aiocsv from 1.3.0 to 1.3.1 by @dependabot in #410
- Bump uvicorn[standard] from 0.27.1 to 0.28.0 by @dependabot in #414
- Bump asyncstdlib from 3.12.0 to 3.12.1 by @dependabot in #413
- Bump nomenklatura from 3.10.4 to 3.10.5 by @dependabot in #411
New Contributors
- @SimonThordal made their first contribution in #422
Full Changelog: v3.8.4...v3.8.8
v3.8.4
This is a maintenance release which addresses a potential vulnerability in orjson. It does not change any scoring behaviour.
What's Changed
- Bump python-multipart from 0.0.7 to 0.0.9 by @dependabot in #398
- Bump uvicorn[standard] from 0.27.0.post1 to 0.27.1 by @dependabot in #399
- Bump fastapi from 0.109.2 to 0.110.0 by @dependabot in #406
- Add ability to pass custom certificate by @RMHogervorst in #407
New Contributors
- @RMHogervorst made their first contribution in #407
Full Changelog: v3.8.3...v3.8.4
v3.8.3
This release includes two changes to the match API:
- Fix a bug where custom datasets that are much smaller than the OpenSanctions data were not scored correctly in search results and therefore were not returned even if they were a good match for the query.
- Fix the phonetics matcher to cut off results where the raw (Levenshtein) edit distance between the proposed match and the query exceeds a threshold.
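The edit-distance cut-off in the second item can be sketched as follows. This is an illustration of the general technique, not yente's actual implementation: the function names and the max_distance value are hypothetical, and the release notes do not state the real threshold.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def cut_off(query: str, candidates: List[str], max_distance: int) -> List[str]:
    """Keep only candidates within max_distance raw edits of the query."""
    return [c for c in candidates if levenshtein(query, c) <= max_distance]


# max_distance=2 is an assumed value chosen for the example.
print(cut_off("smith", ["smyth", "schmidt"], max_distance=2))
```

Two names can share a phonetic key while differing by many characters, so a raw edit-distance check like this discards candidates that only sound alike.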
What's Changed
- Bump uvicorn[standard] from 0.25.0 to 0.26.0 by @dependabot in #383
- Bump orjson from 3.9.10 to 3.9.12 by @dependabot in #384
- Bump elasticsearch[async] from 8.11.1 to 8.12.0 by @dependabot in #385
- Bump aiohttp[speedups] from 3.9.1 to 3.9.3 by @dependabot in #388
- Bump uvicorn[standard] from 0.26.0 to 0.27.0.post1 by @dependabot in #389
Full Changelog: v3.8.2...v3.8.3
v3.8.2
This release makes functional changes in response to user feedback, in particular the following:
- Indexer stability: the indexer process had been struggling with interrupted downloads of source data, in part due to the growth of our database (error: "Payload not completed"). We've now switched to a different HTTP client library and added support for HTTP/2 binary streams in an effort to add more stability to this process. We've also disabled the option to conduct multiple indexing jobs at the same time.
- Phonetic search yields overly broad results: this also results in missed matches due to an abnormally large number of match candidates being generated. We've further limited the way that phonetic search works in an effort to reduce false positives.
- Default data update checks (YENTE_CRONTAB) are now conducted every two hours.
- Improved handling of exceptions from the search index.
- Introduced a new index_stale boolean flag in /catalog for monitoring purposes.
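A monitoring check built on the index_stale flag might look like the sketch below. It assumes the flag is a top-level boolean in the /catalog JSON body; the notes above do not say where in the response it sits. The function name and sample document are hypothetical.

```python
import json
from typing import Any, Dict


def index_is_stale(catalog: Dict[str, Any]) -> bool:
    """Report whether a parsed /catalog response marks the index as stale."""
    # Assumption: index_stale is a top-level key; a missing flag is
    # treated as "not stale" so older servers do not trigger alerts.
    return bool(catalog.get("index_stale", False))


# Hypothetical, hand-written catalog body (not real API output).
sample_catalog = json.loads('{"datasets": [], "index_stale": true}')
print(index_is_stale(sample_catalog))
```

A health probe could call this on the parsed response and raise an alert whenever it returns True.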
v3.8.0
This release brings a number of improvements:
- Updated nomenklatura matching model (logic-v1) which now does SWIFT BIC matching and handles names with different tokenization better ("Jean-Paul Sartre" == "JeanPaul Sartre"). logic-v1 is now the default algorithm for the match API.
- The match API now supports a topics argument that can be used to match only entities with a particular topic tag (e.g. role.pep, sanction).
- The /catalog endpoint now carries freshness data, giving the index_version for each dataset, and listing an array of all current and outdated datasets in the index.
- Various dependency upgrades.
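As an illustration of the topics argument, the sketch below builds a match request URL. It assumes topics is passed as a repeated query parameter on a /match/<dataset> path; the base URL, the dataset name and the helper name match_url are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode


def match_url(base_url: str, dataset: str, topics: Sequence[str]) -> str:
    """Build a match URL that restricts results to the given topic tags."""
    # doseq=True emits one topics=... pair per list element.
    query = urlencode({"topics": list(topics)}, doseq=True)
    return f"{base_url.rstrip('/')}/match/{dataset}?{query}"


# Assumed local deployment and dataset name, for illustration only.
print(match_url("http://localhost:8000", "default", ["role.pep", "sanction"]))
```

The request body with the entities to match would be sent to this URL as usual; only the query string changes.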
v3.7.3
- Improvements to matching of company names
- Disable phonetic matching on names that do not use a Western-style alphabet
- Fix a race condition in the indexer which can delete the active index
Full Changelog: v3.7.2...v3.7.3