
enable token-based lexical search #162

Merged


kdutia
Member

kdutia commented Dec 16, 2024

Description

The lexical parts of our search have always searched for the stemmed, lowercased query as a phrase, i.e. with the words in the same order as they appear in the query (text_block contains query). This PR swaps out contains() for userInput(), as per the Vespa example. userInput() does the heavy lifting of converting a plain-text query into a Vespa query, and it is configurable in ways that could be useful to us later, e.g. prioritising prefixes.
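A minimal sketch of the two query shapes, written as Vespa HTTP query parameters built in Python. The helper names phrase_query and user_input_query are hypothetical, and the select * from sources * wrapper is an assumption; only text_block, contains and userInput() come from this PR. userInput(@query) reads the raw text from the request parameter named query.

```python
import json


def phrase_query(field: str, user_query: str) -> dict:
    """Old approach (hypothetical helper): the multi-word string is matched
    as an ordered phrase, as described in this PR."""
    # Escape backslashes and double quotes so the literal stays valid YQL.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {"yql": f'select * from sources * where {field} contains "{escaped}"'}


def user_input_query(user_query: str) -> dict:
    """New approach (hypothetical helper): userInput() parses the raw text
    itself. With no defaultIndex annotation it searches the schema's default
    fields; the text is passed separately as the `query` parameter."""
    return {
        "yql": "select * from sources * where userInput(@query)",
        "query": user_query,
    }


if __name__ == "__main__":
    q = "adaptation plans and strategies"
    print(json.dumps(phrase_query("text_block", q), indent=2))
    print(json.dumps(user_input_query(q), indent=2))
```

Passing the text as a separate parameter also means the query string never needs to be escaped into the YQL itself.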

Fields to search are now defined using the default fields for each schema.

As a result, more of the search fixes tests now pass; a PR for that is coming soon!

Before & after

For search "adaptation plans and strategies":

Before: results that contain only some of the query words, or contain them in a different order, are matched only through embedding search. The BM25 score is 0 for every result that does not contain all of the words in the same order as the query.

─────────────────────────────────────────────────────────────────────────────────────────────────── Family 1/20: 'Portugal. National Communication (NC). NC 8. Biennial Reports (BR). BR 5.' (PRT). Score: 6.11 ────────────────────────────────────────────────────────────────────────────────────────────────────

            Total hits: 6
            Family: UNFCCC.family.1030.0
            Family slug: portugal-national-communication-nc-nc-8-biennial-reports-br-br-5_2159
            Geography: PRT
            Relevance: 6.1099198330787345
            
Description: Portugal. National Communication (NC). NC 8. Biennial Reports (BR). BR 5., National Communication,Biennial Report from Portugal in 2022

Hits:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Text                                                                                                                                                             ┃ Score ┃ Type       ┃ TB ID ┃ Doc ID              ┃ bm25(text_block) ┃ closeness(text_embedding) ┃ description_score ┃ name_score ┃ text_score ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Subnational adaptation plans and strategies (6.4.4)                                                                                                              │ 6.11  │ Text block │ 12716 │ UNFCCC.party.1031.0 │ 5.158            │ 0.952                     │ -                 │ -          │ 6.11       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┼────────────┼───────┼─────────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ Domestic adaptation policies and strategies (6.4)                                                                                                                │ 0.934 │ Text block │ 136   │ UNFCCC.party.1031.0 │ 0.0              │ 0.934                     │ -                 │ -          │ 0.934      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┼────────────┼───────┼─────────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ Domestic adaptation policies and strategies (6.4)                                                                                                                │ 0.934 │ Text block │ 12393 │ UNFCCC.party.1031.0 │ 0.0              │ 0.934                     │ -                 │ -          │ 0.934      │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┼────────────┼───────┼─────────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ -Climate Change Adaptation Plans                                                                                                                                 │ 0.93  │ Text block │ 12450 │ UNFCCC.party.1031.0 │ 0.0              │ 0.93                      │ -                 │ -          │ 0.93       │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┼────────────┼───────┼─────────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ From the adaptation perspective these plans will be the main instrument to support the role of the local governments. The plans seek to promote proper vertical  │ 0.927 │ Text block │ 12718 │ UNFCCC.party.1031.0 │ 0.0              │ 0.927                     │ -                 │ -          │ 0.927      │
│ integration (e.g. integration of intermunicipal plans at the municipal scale), to define climate adaptation planning, to strengthen the role of land use         │       │            │       │                     │                  │                           │                   │            │            │
│ planning in adaptation, to establish municipal adaptation action programmes to be implemented until 2030, to empower municipal officials and technical staff,    │       │            │       │                     │                  │                           │                   │            │            │
│ and to prepare communities for the challenges of climate change.                                                                                                 │       │            │       │                     │                  │                           │                   │            │            │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────┼────────────┼───────┼─────────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ The National Adaptation Strategy - ENAAC (6.4.3)                                                                                                                 │ 0.924 │ Text block │ 12631 │ UNFCCC.party.1031.0 │ 0.0              │ 0.924                     │ -                 │ -          │ 0.924      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────┴────────────┴───────┴─────────────────────┴──────────────────┴───────────────────────────┴───────────────────┴────────────┴────────────┘
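The Before behaviour can be illustrated with a toy matcher. This is a simplified Python sketch, not Vespa's linguistics: it only lowercases and splits on non-alphanumeric characters (no stemming), and the helpers phrase_match and matched_terms are hypothetical. It compares ordered phrase containment with order-insensitive term overlap, using texts from the table above.

```python
import re


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: lowercase, keep runs of letters/digits. No stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(query: str, text: str) -> bool:
    """True if the query tokens appear contiguously and in order in the text."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


def matched_terms(query: str, text: str) -> set[str]:
    """Query tokens that appear anywhere in the text, in any order."""
    return set(tokenize(query)) & set(tokenize(text))


if __name__ == "__main__":
    query = "adaptation plans and strategies"
    for text in [
        "Subnational adaptation plans and strategies (6.4.4)",
        "Domestic adaptation policies and strategies (6.4)",
    ]:
        print(phrase_match(query, text), sorted(matched_terms(query, text)))
    # True ['adaptation', 'and', 'plans', 'strategies']
    # False ['adaptation', 'and', 'strategies']
```

Under phrase matching, the "Domestic adaptation policies and strategies" rows fail even though three of the four query terms are present, which is consistent with them surfacing only through closeness(text_embedding) with bm25(text_block) = 0.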

After: an example of the BM25 score working for text fields where the terms aren't in the same order as in the query, and where not all of the query terms are present in the text.

────────────────────────────────────────────────────────────────────────────────────────── Family 1/20: 'Saint Lucia’s Resilient Ecosystems Adaptation Strategy and Action Plan (REASAP) 2020–2028' (LCA). Score: 23.077 ───────────────────────────────────────────────────────────────────────────────────────────

            Total hits: 1
            Family: CPR.family.i00002393.n0000
            Family slug: saint-lucias-resilient-ecosystems-adaptation-strategy-and-action-plan-reasap-2020-2028_5143
            Geography: LCA
            Relevance: 23.077193964660033
            
Description: <p>This document is the fourth National Adaptation Plan (NAP) strategy prioritized in 2017.</p>


Hits:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Text                     ┃ Score  ┃ Type     ┃ TB ID ┃ Doc ID                       ┃ bm25(text_block) ┃ closeness(text_embedding) ┃ description_score ┃ name_score ┃ text_score ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ <see family description> │ 23.077 │ Document │ -     │ CPR.document.i00002394.n0000 │ -                │ -                         │ 13.919            │ 9.159      │ -          │
└──────────────────────────┴────────┴──────────┴───────┴──────────────────────────────┴──────────────────┴───────────────────────────┴───────────────────┴────────────┴────────────┘
─────────────────────────────────────────────────────────────────────────────────────────── Family 2/20: 'Accompanying information: Australian National Climate Resilience and Adaptation Strategy' (AUS). Score: 22.819 ───────────────────────────────────────────────────────────────────────────────────────────

            Total hits: 10
            Family: UNFCCC.family.4.0
            Family slug: accompanying-information-australian-national-climate-resilience-and-adaptation-strategy_9046
            Geography: AUS
            Relevance: 22.81935296458243
            
Description: Accompanying information: Australian National Climate Resilience and Adaptation Strategy, National Adaptation Plan from Australia in 2021

Hits:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Text                                                                                                                                                               ┃ Score  ┃ Type       ┃ TB ID ┃ Doc ID           ┃ bm25(text_block) ┃ closeness(text_embedding) ┃ description_score ┃ name_score ┃ text_score ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ <see family description>                                                                                                                                           │ 22.819 │ Document   │ -     │ UNFCCC.party.4.0 │ -                │ -                         │ 14.914            │ 7.905      │ -          │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ National Climate Resilience and Adaptation Strategy 13                                                                                                             │ 13.339 │ Text block │ 268   │ UNFCCC.party.4.0 │ 12.438           │ 0.901                     │ -                 │ -          │ 13.339     │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ National Climate Resilience and Adaptation Strategy 32                                                                                                             │ 13.319 │ Text block │ 584   │ UNFCCC.party.4.0 │ 12.413           │ 0.906                     │ -                 │ -          │ 13.319     │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ National Climate Resilience and Adaptation Strategy 25                                                                                                             │ 13.318 │ Text block │ 500   │ UNFCCC.party.4.0 │ 12.413           │ 0.905                     │ -                 │ -          │ 13.318     │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ National Climate Resilience and Adaptation Strategy 11                                                                                                             │ 13.311 │ Text block │ 227   │ UNFCCC.party.4.0 │ 12.407           │ 0.904                     │ -                 │ -          │ 13.311     │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ strategies including the National Disaster Risk Reduction Framework and supporting Action Plans, the 2021 Intergenerational Report, the 2021 Australian            │ 11.012 │ Text block │ 283   │ UNFCCC.party.4.0 │ 10.095           │ 0.917                     │ -                 │ -          │ 11.012     │
│ Infrastructure Plan, the 2021 Delivering Ag2030 report, Australia's Strategy for Nature, and Australia's International Climate Change Action Strategy that guides  │        │            │       │                  │                  │                           │                   │            │            │
│ adaptation action in our region. Implementation of this Strategy will continue to support the integration of climate information in new plans and strategies as    │        │            │       │                  │                  │                           │                   │            │            │
│ they are developed.                                                                                                                                                │        │            │       │                  │                  │                           │                   │            │            │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ Assessing progress on adaptation requires regular assessments of national progress to inform and prioritise effort and monitoring and evaluation. An integrated    │ 9.72   │ Text block │ 599   │ UNFCCC.party.4.0 │ 8.826            │ 0.893                     │ -                 │ -          │ 9.72       │
│ collaborative national assessment will support planning and investment in adaptation and resilience building across the Australian community. Monitoring and       │        │            │       │                  │                  │                           │                   │            │            │
│ evaluation of the Strategy will complement assessments of climate impacts and adaptation progress, and the effectiveness of investments and actions.               │        │            │       │                  │                  │                           │                   │            │            │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ Since the Government released the first National Climate Resilience and Adaptation Strategy in 2015, individuals, businesses and all levels of government have     │ 8.283  │ Text block │ 57    │ UNFCCC.party.4.0 │ 7.389            │ 0.894                     │ -                 │ -          │ 8.283      │
│ been preparing for our future climate and taking action. I am proud to present the new Strategy, which builds on the 2015 Strategy to enable even greater action.  │        │            │       │                  │                  │                           │                   │            │            │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ Adaptation involves a range of broad, cross- sectoral challenges. An effective national adaptation response requires coordinated action across the natural, built, │ 7.884  │ Text block │ 194   │ UNFCCC.party.4.0 │ 6.979            │ 0.906                     │ -                 │ -          │ 7.884      │
│ social and economic domains. The Strategy uses these four domains as outlined in the diagram below to frame the approach to coordinated adaptation.                │        │            │       │                  │                  │                           │                   │            │            │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼────────────┼───────┼──────────────────┼──────────────────┼───────────────────────────┼───────────────────┼────────────┼────────────┤
│ The Australian Government's new ten-year Threatened Species Strategy (2021-2031), includes a focus on climate change adaptation and resilience. The Threatened     │ 7.809  │ Text block │ 714   │ UNFCCC.party.4.0 │ 6.908            │ 0.901                     │ -                 │ -          │ 7.809      │
│ Species Strategy identifies actions that are needed to assist threatened species adapt to climate change - taking into account interactions with other threats -   │        │            │       │                  │                  │                           │                   │            │            │
│ including risk-based conservation planning and identifying and conserving places that will be refuges for threatened species.                                      │        │            │       │                  │                  │                           │                   │            │            │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────┴────────────┴───────┴──────────────────┴──────────────────┴───────────────────────────┴───────────────────┴────────────┴────────────┘
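The After results show BM25 rewarding partial, out-of-order matches. A toy sketch of why: BM25 sums a per-term contribution, so each matching query term adds to the score regardless of word order. This uses the textbook BM25 formula with k1 = 1.2 and b = 0.75 as assumed parameters, the same no-stemming toy tokenizer as above, and a handful of in-memory strings; it is not Vespa's implementation and does not reproduce the scores in the tables.

```python
import math
import re
from collections import Counter

# Assumed textbook BM25 parameters.
K1, B = 1.2, 0.75


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: lowercase, keep runs of letters/digits. No stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str]) -> list[float]:
    """Score each doc by summing per-term BM25 contributions (order-insensitive)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # Missing terms contribute nothing but do not zero the score.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(toks) / avgdl)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "National Climate Resilience and Adaptation Strategy 13",
        "I am proud to present the new Strategy",
        "Subnational adaptation plans and strategies",
    ]
    for doc, s in zip(docs, bm25_scores("adaptation plans and strategies", docs)):
        print(f"{s:.3f}  {doc}")
```

In this toy, the first document scores above zero from "adaptation" and "and" alone, while the second, which shares no tokens with the query (its "Strategy" would only match "strategies" with stemming), scores 0.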

Proposed version

Please select the option from the list below that is most relevant. This
will be used to generate the next tag version name during auto-tagging.

  • Skip auto-tagging
  • Patch
  • Minor version
  • Major version

Visit the Semver website to understand the
difference between MAJOR, MINOR, and PATCH versions.

Notes:

  • If none of these options are selected, auto-tagging will fail (integrated soon)
  • Where multiple options are selected, the most senior option ticked will be
    used -- e.g. Major > Minor > Patch
  • If you are selecting the version in the list above using the textbox, make
    sure your selected option is marked [x] with no spaces in between the
    brackets and the x

Type of change

Please select the option(s) below that are most relevant:

  • Bug fix
  • New feature
  • Breaking change

How Has This Been Tested?

Please describe the tests that you added to verify your changes.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.


linear bot commented Dec 16, 2024

kdutia marked this pull request as ready for review December 16, 2024 17:05
kdutia requested a review from a team as a code owner December 16, 2024 17:05
kdutia merged commit 4cb546e into main Dec 16, 2024
6 checks passed
kdutia deleted the feature/sci-88-enable-token-order-unaware-search-for-titles branch December 16, 2024 18:00