fix name and description bm25 search #159

@kdutia kdutia commented Dec 16, 2024

Description

The issue was that we were running the BM25 search on fields whose indexing types were attribute and summary, rather than on the index versions of those fields.
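As a rough illustration of why the indexing type matters, the sketch below contrasts whole-value matching with tokenized matching in plain Python. It is a toy model, not Vespa's matching logic: the helper names (`tokenize`, `exact_match`, `token_match`) are hypothetical, and the only assumption carried over from the description above is that text ranking such as BM25 needs the tokenized (index) form of a field.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(query: str, field_value: str) -> bool:
    """Toy stand-in for matching against an untokenized field:
    the query must equal the whole field value."""
    return query.lower() == field_value.lower()


def token_match(query: str, field_value: str) -> bool:
    """Toy stand-in for matching against a tokenized field:
    every query token must appear among the field's tokens."""
    return set(tokenize(query)).issubset(tokenize(field_value))


# Family names taken from the example output in this PR.
names = ["Panama Second NDC", "National Adaptation Framework"]

exact_hits = [n for n in names if exact_match("NDC", n)]
token_hits = [n for n in names if token_match("NDC", n)]
```

With whole-value matching, a query for "NDC" finds neither family name, while tokenized matching finds "Panama Second NDC".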

This merges into the branch that adds relevance scores to the output models, since that is how I verified that the fix works.

Examples of this working

Search for "NDC" matches NDCs first

┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Family Name           ┃ Geography ┃ Score  ┃ Hits ┃ Slug                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Panama Second NDC     │ PAN       │ 23.681 │ 1    │ panama-second-ndc_28c6     │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ India First NDC       │ IND       │ 23.375 │ 1    │ india-first-ndc_b304       │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Philippines First NDC │ PHL       │ 23.366 │ 9    │ philippines-first-ndc_226c │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Kiribati Enhanced NDC │ KIR       │ 23.363 │ 10   │ kiribati-enhanced-ndc_69b4 │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Bhutan Second NDC     │ BTN       │ 23.359 │ 10   │ bhutan-second-ndc_fb4f     │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Gambia Second NDC     │ GMB       │ 23.353 │ 10   │ gambia-second-ndc_8eb3     │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Iraq First NDC        │ IRQ       │ 23.352 │ 1    │ iraq-first-ndc_3fae        │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Guyana First NDC      │ GUY       │ 23.351 │ 1    │ guyana-first-ndc_5955      │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Afghanistan First NDC │ AFG       │ 23.351 │ 1    │ afghanistan-first-ndc_77b7 │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Dominica First NDC    │ DMA       │ 23.345 │ 1    │ dominica-first-ndc_22ab    │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Updated NDC Serbia    │ SRB       │ 23.345 │ 10   │ updated-ndc-serbia_3052    │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Tuvalu First NDC      │ TUV       │ 23.344 │ 1    │ tuvalu-first-ndc_6ab4      │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Grenada Second NDC    │ GRD       │ 23.343 │ 10   │ grenada-second-ndc_d08e    │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Nepal Second NDC      │ NPL       │ 23.342 │ 10   │ nepal-second-ndc_fd24      │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Niue First NDC        │ NIU       │ 23.342 │ 1    │ niue-first-ndc_a8d0        │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Botswana First NDC    │ BWA       │ 23.341 │ 1    │ botswana-first-ndc_adf7    │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Algeria First NDC     │ DZA       │ 23.34  │ 2    │ algeria-first-ndc_5679     │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Kiribati First NDC    │ KIR       │ 23.339 │ 1    │ kiribati-first-ndc_d5f2    │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Senegal First NDC     │ SEN       │ 23.337 │ 10   │ senegal-first-ndc_84ec     │
├───────────────────────┼───────────┼────────┼──────┼────────────────────────────┤
│ Azerbaijan First NDC  │ AZE       │ 23.337 │ 1    │ azerbaijan-first-ndc_55c3  │
└───────────────────────┴───────────┴────────┴──────┴────────────────────────────┘

Search for "citizen's assembly" ranks a document containing those terms in the title first, then a document containing them in the text

─────────────────────────────── Family 1/20: 'Resolution approving the establishment of the Citizens' Assembly' (IRL). Score: 39.341 ───────────────────────────────

            Total hits: 10
            Family: CCLW.family.9414.0
            Family slug: resolution-approving-the-establishment-of-the-citizens-assembly_3b3b
            Geography: IRL
            Relevance: 39.34084838533407
            
Description: This resolution of Dáil Éireann approves the establishment of the Citizens’ Assembly. The Assembly is notably established to consider how the State can
make Ireland a leader in tackling climate change, in order to make such recommendations as it sees fit and report to the Houses of the 
Oireachtas.<br><br><br><strong><br></strong><br>

Hits:
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Text                 ┃ Score  ┃ Type       ┃ TB ID ┃ Doc ID               ┃ bm25(text_block) ┃ closeness(text_emb… ┃ description_score ┃ name_score ┃ text_score ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ <see family          │ 39.341 │ Document   │ -     │ CCLW.legislative.94… │ -                │ -                   │ 17.094            │ 22.247     │ -          │
│ description>         │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.983 │ Text block │ 1688  │ CCLW.legislative.94… │ 11.003           │ 0.981               │ -                 │ -          │ 11.983     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.983 │ Text block │ 1233  │ CCLW.legislative.94… │ 11.003           │ 0.981               │ -                 │ -          │ 11.983     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.982 │ Text block │ 1683  │ CCLW.legislative.94… │ 11.002           │ 0.981               │ -                 │ -          │ 11.982     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.982 │ Text block │ 1225  │ CCLW.legislative.94… │ 11.002           │ 0.981               │ -                 │ -          │ 11.982     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.982 │ Text block │ 9     │ CCLW.legislative.94… │ 11.002           │ 0.981               │ -                 │ -          │ 11.982     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.982 │ Text block │ 1450  │ CCLW.legislative.94… │ 11.002           │ 0.981               │ -                 │ -          │ 11.982     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.98  │ Text block │ 1167  │ CCLW.legislative.94… │ 10.999           │ 0.981               │ -                 │ -          │ 11.98      │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.98  │ Text block │ 1498  │ CCLW.legislative.94… │ 10.999           │ 0.981               │ -                 │ -          │ 11.98      │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Citizens'        │ 11.978 │ Text block │ 1494  │ CCLW.legislative.94… │ 10.998           │ 0.981               │ -                 │ -          │ 11.978     │
│ Assembly             │        │            │       │                      │                  │                     │                   │            │            │
└──────────────────────┴────────┴────────────┴───────┴──────────────────────┴──────────────────┴─────────────────────┴───────────────────┴────────────┴────────────┘
──────────────────────────────────────────────── Family 2/20: 'National Adaptation Framework' (IRL). Score: 12.511 ─────────────────────────────────────────────────

            Total hits: 5
            Family: CCLW.family.8663.0
            Family slug: national-adaptation-framework_3322
            Geography: IRL
            Relevance: 12.510784910102725
            
Description: Ireland's first statutory National Adaptation Framework (NAF) was released on January 19th, 2018. The NAF sets out the national strategy to reduce the 
vulnerability of the country to the negative effects of climate change and to avail of positive impacts. The NAF was developed under the Climate Action and Low 
Carbon Development Act 2015 and succeeds to the NCCAF previously released in 2012.

Hits:
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Text                 ┃ Score  ┃ Type       ┃ TB ID ┃ Doc ID               ┃ bm25(text_block) ┃ closeness(text_emb… ┃ description_score ┃ name_score ┃ text_score ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Citizens' Assembly   │ 12.511 │ Text block │ 849   │ CCLW.executive.8663… │ 11.523           │ 0.987               │ -                 │ -          │ 12.511     │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────┤
│ The Programme for a  │ 4.846  │ Text block │ 850   │ CCLW.executive.8663… │ 3.984            │ 0.861               │ -                 │ -          │ 4.846      │
│ Partnership          │        │            │       │                      │                  │                     │                   │            │            │
│ Government also      │        │            │       │                      │                  │                     │                   │            │            │
│ committed the        │        │            │       │                      │                  │                     │                   │            │            │
│ Government to "the   │        │            │       │                      │                  │                     │                   │            │            │
│ establishment of a   │        │            │       │                      │                  │                     │                   │            │            │
│ Citizens' Assembly,  │        │            │       │                      │                  │                     │                   │            │            │
│ within six months    │        │            │       │                      │                  │                     │                   │            │            │
│ and without          │        │            │       │                      │                  │                     │                   │            │            │
│ participation by     │        │            │       │                      │                  │                     │                   │            │            │
│ politicians, with a  │        │            │       │                      │                  │                     │                   │            │            │
│ mandate to look at a │        │            │       │                      │                  │                     │                   │            │            │
│ limited number of    │        │            │       │                      │                  │                     │                   │            │            │
│ key issues over an   │        │            │       │                      │                  │                     │                   │            │            │
│ extended time        │        │            │       │                      │                  │                     │                   │            │            │
│ period."             │        │            │       │                      │                  │                     │                   │            │            │
├──────────────────────┼────────┼────────────┼───────┼──────────────────────┼──────────────────┼─────────────────────┼───────────────────┼────────────┼────────────
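The printed scores above look consistent with each hit's score being the sum of its component features. The sketch below checks that against the displayed numbers. This is an observation from the rounded output only, not a statement of how the rank profile is defined, and the helper name `combined_score` is hypothetical.

```python
import math


def combined_score(components: dict) -> float:
    """Sum the component scores that are present (None marks a '-' cell)."""
    return sum(v for v in components.values() if v is not None)


# Document-level hit for Family 1: description_score + name_score.
family_1_document = {"description_score": 17.094, "name_score": 22.247, "text_score": None}

# Text-block hits: bm25(text_block) + closeness, values as printed.
family_1_text_block = {"bm25": 11.003, "closeness": 0.981}
family_2_text_block = {"bm25": 11.523, "closeness": 0.987}

# Displayed values are rounded to three decimals, so allow a small tolerance.
TOLERANCE = 2e-3

document_ok = math.isclose(combined_score(family_1_document), 39.341, abs_tol=TOLERANCE)
text_1_ok = math.isclose(combined_score(family_1_text_block), 11.983, abs_tol=TOLERANCE)
text_2_ok = math.isclose(combined_score(family_2_text_block), 12.511, abs_tol=TOLERANCE)
```

If that reading is right, the title match is what lifts the Citizens' Assembly resolution above the National Adaptation Framework: its name_score and description_score together outweigh the best text-block score in either family.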

Proposed version

Please select the most relevant option from the list below. This
will be used to generate the next tag version name during auto-tagging.

  • Skip auto-tagging
  • Patch
  • Minor version
  • Major version

Visit the Semver website to understand the
difference between MAJOR, MINOR, and PATCH versions.

Notes:

  • If none of these options are selected, auto-tagging will fail (integrated soon)
  • Where multiple options are selected, the most senior option ticked will be
    used -- e.g. Major > Minor > Patch
  • If you are selecting the version in the list above using the textbox, make
    sure your selected option is marked [x] with no spaces in between the
    brackets and the x

Type of change

Please select the option(s) below that are most relevant:

  • Bug fix
  • New feature
  • Breaking change

How Has This Been Tested?

No new tests

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@kdutia kdutia requested a review from a team as a code owner December 16, 2024 11:54

linear bot commented Dec 16, 2024

Base automatically changed from feature/sci-146-add-scores-and-summary-features-to-sdk to main December 16, 2024 11:56
@kdutia kdutia closed this Dec 16, 2024

kdutia commented Dec 16, 2024

closed as merge history got messed up
