Feature - compute doc vectors on the fly #1984

Merged: 12 commits, Nov 6, 2022


@AileenLin
Member

AileenLin commented Sep 27, 2022

[RESOLVED] Results are consistent with the stored-vector version.

TODO: the expected results for the stored-vector version need to be updated:

2022-09-27 04:21:59,678 INFO [python] [OK] expected: 0.1926 actual: 0.1926 - metric: AP@1000 model: bm25-default topics: dev
2022-09-27 04:22:06,167 INFO [python] [OK] expected: 0.1840 actual: 0.1840 - metric: RR@10 model: bm25-default topics: dev
2022-09-27 04:22:14,168 INFO [python] [OK] expected: 0.6578 actual: 0.6578 - metric: R@100 model: bm25-default topics: dev
2022-09-27 04:22:22,178 INFO [python] [OK] expected: 0.8526 actual: 0.8526 - metric: R@1000 model: bm25-default topics: dev
2022-09-27 04:22:30,847 ERROR [python] [FAIL] expected: 0.1661 actual: 0.1663 - metric: AP@1000 model: bm25-default+rm3 topics: dev
2022-09-27 04:22:37,335 ERROR [python] [FAIL] expected: 0.1564 actual: 0.1566 - metric: RR@10 model: bm25-default+rm3 topics: dev
2022-09-27 04:22:45,377 ERROR [python] [FAIL] expected: 0.6494 actual: 0.6538 - metric: R@100 model: bm25-default+rm3 topics: dev
2022-09-27 04:22:53,549 INFO [python] [OK] expected: 0.8606 actual: 0.8606 - metric: R@1000 model: bm25-default+rm3 topics: dev
2022-09-27 04:23:02,128 ERROR [python] [FAIL] expected: 0.1692 actual: 0.1690 - metric: AP@1000 model: bm25-default+rocchio topics: dev
2022-09-27 04:23:08,607 ERROR [python] [FAIL] expected: 0.1597 actual: 0.1595 - metric: RR@10 model: bm25-default+rocchio topics: dev
2022-09-27 04:23:16,639 ERROR [python] [FAIL] expected: 0.6552 actual: 0.6553 - metric: R@100 model: bm25-default+rocchio topics: dev
2022-09-27 04:23:24,689 INFO [python] [OK] expected: 0.8620 actual: 0.8620 - metric: R@1000 model: bm25-default+rocchio topics: dev
2022-09-27 04:23:33,252 ERROR [python] [FAIL] expected: 0.1677 actual: 0.1676 - metric: AP@1000 model: bm25-default+rocchio-neg topics: dev
2022-09-27 04:23:39,717 ERROR [python] [FAIL] expected: 0.1578 actual: 0.1576 - metric: RR@10 model: bm25-default+rocchio-neg topics: dev
2022-09-27 04:23:47,744 ERROR [python] [FAIL] expected: 0.6561 actual: 0.6559 - metric: R@100 model: bm25-default+rocchio-neg topics: dev
2022-09-27 04:23:55,774 ERROR [python] [FAIL] expected: 0.8649 actual: 0.8652 - metric: R@1000 model: bm25-default+rocchio-neg topics: dev
2022-09-27 04:24:04,408 INFO [python] [OK] expected: 0.1625 actual: 0.1625 - metric: AP@1000 model: bm25-default+ax topics: dev
2022-09-27 04:24:10,950 INFO [python] [OK] expected: 0.1517 actual: 0.1517 - metric: RR@10 model: bm25-default+ax topics: dev
2022-09-27 04:24:19,058 INFO [python] [OK] expected: 0.6556 actual: 0.6556 - metric: R@100 model: bm25-default+ax topics: dev
2022-09-27 04:24:27,170 INFO [python] [OK] expected: 0.8747 actual: 0.8747 - metric: R@1000 model: bm25-default+ax topics: dev
2022-09-27 04:24:36,275 INFO [python] [OK] expected: 0.1520 actual: 0.1520 - metric: AP@1000 model: bm25-default+prf topics: dev
2022-09-27 04:24:42,829 INFO [python] [OK] expected: 0.1421 actual: 0.1421 - metric: RR@10 model: bm25-default+prf topics: dev
2022-09-27 04:24:50,925 INFO [python] [OK] expected: 0.6535 actual: 0.6535 - metric: R@100 model: bm25-default+prf topics: dev
2022-09-27 04:24:59,037 INFO [python] [OK] expected: 0.8537 actual: 0.8537 - metric: R@1000 model: bm25-default+prf topics: dev
2022-09-27 04:25:07,596 INFO [python] [OK] expected: 0.1958 actual: 0.1958 - metric: AP@1000 model: bm25-tuned topics: dev
2022-09-27 04:25:14,062 INFO [python] [OK] expected: 0.1875 actual: 0.1875 - metric: RR@10 model: bm25-tuned topics: dev
2022-09-27 04:25:22,042 INFO [python] [OK] expected: 0.6701 actual: 0.6701 - metric: R@100 model: bm25-tuned topics: dev
2022-09-27 04:25:30,030 INFO [python] [OK] expected: 0.8573 actual: 0.8573 - metric: R@1000 model: bm25-tuned topics: dev
2022-09-27 04:25:38,586 ERROR [python] [FAIL] expected: 0.1762 actual: 0.1741 - metric: AP@1000 model: bm25-tuned+rm3 topics: dev
2022-09-27 04:25:45,061 ERROR [python] [FAIL] expected: 0.1668 actual: 0.1646 - metric: RR@10 model: bm25-tuned+rm3 topics: dev
2022-09-27 04:25:53,086 ERROR [python] [FAIL] expected: 0.6655 actual: 0.6674 - metric: R@100 model: bm25-tuned+rm3 topics: dev
2022-09-27 04:26:01,120 ERROR [python] [FAIL] expected: 0.8687 actual: 0.8704 - metric: R@1000 model: bm25-tuned+rm3 topics: dev
2022-09-27 04:26:09,692 INFO [python] [OK] expected: 0.1777 actual: 0.1777 - metric: AP@1000 model: bm25-tuned+rocchio topics: dev
2022-09-27 04:26:16,225 ERROR [python] [FAIL] expected: 0.1685 actual: 0.1684 - metric: RR@10 model: bm25-tuned+rocchio topics: dev
2022-09-27 04:26:24,287 ERROR [python] [FAIL] expected: 0.6702 actual: 0.6706 - metric: R@100 model: bm25-tuned+rocchio topics: dev
2022-09-27 04:26:32,352 INFO [python] [OK] expected: 0.8726 actual: 0.8726 - metric: R@1000 model: bm25-tuned+rocchio topics: dev
2022-09-27 04:26:40,849 INFO [python] [OK] expected: 0.1762 actual: 0.1762 - metric: AP@1000 model: bm25-tuned+rocchio-neg topics: dev
2022-09-27 04:26:47,342 INFO [python] [OK] expected: 0.1669 actual: 0.1669 - metric: RR@10 model: bm25-tuned+rocchio-neg topics: dev
2022-09-27 04:26:55,394 ERROR [python] [FAIL] expected: 0.6744 actual: 0.6748 - metric: R@100 model: bm25-tuned+rocchio-neg topics: dev
2022-09-27 04:27:03,458 ERROR [python] [FAIL] expected: 0.8756 actual: 0.8757 - metric: R@1000 model: bm25-tuned+rocchio-neg topics: dev
2022-09-27 04:27:11,984 INFO [python] [OK] expected: 0.1699 actual: 0.1699 - metric: AP@1000 model: bm25-tuned+ax topics: dev
2022-09-27 04:27:18,535 INFO [python] [OK] expected: 0.1594 actual: 0.1594 - metric: RR@10 model: bm25-tuned+ax topics: dev
2022-09-27 04:27:26,648 INFO [python] [OK] expected: 0.6721 actual: 0.6721 - metric: R@100 model: bm25-tuned+ax topics: dev
2022-09-27 04:27:34,766 INFO [python] [OK] expected: 0.8809 actual: 0.8809 - metric: R@1000 model: bm25-tuned+ax topics: dev
2022-09-27 04:27:43,220 INFO [python] [OK] expected: 0.1582 actual: 0.1582 - metric: AP@1000 model: bm25-tuned+prf topics: dev
2022-09-27 04:27:49,714 INFO [python] [OK] expected: 0.1484 actual: 0.1484 - metric: RR@10 model: bm25-tuned+prf topics: dev
2022-09-27 04:27:57,772 INFO [python] [OK] expected: 0.6589 actual: 0.6589 - metric: R@100 model: bm25-tuned+prf topics: dev
2022-09-27 04:28:05,839 INFO [python] [OK] expected: 0.8561 actual: 0.8561 - metric: R@1000 model: bm25-tuned+prf topics: dev

@lintool
Member

lintool commented Sep 27, 2022

hi @AileenLin - if you merge in main trunk, the scores should be updated and match?

Also, can you compare the speed of indexing with -storeDocVectors against the speed of your new implementation?

@@ -109,14 +111,14 @@
private final long seed;
private final String originalIndexPath;
private final String externalIndexPath; // Axiomatic reranking can opt to use
// external sources for searching the expansion
Member

Can you undo these edits? I don't think they're supposed to be part of the PR?

Member

It seems like you ran a linter... which is fine. We can fix these minor issues... but the comment alignment should be reverted.

@codecov-commenter

codecov-commenter commented Oct 5, 2022

Codecov Report

Base: 58.51% // Head: 58.83% // Increases project coverage by +0.32% 🎉

Coverage data is based on head (a050430) compared to base (b5ecc5a).
Patch coverage: 57.62% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1984      +/-   ##
============================================
+ Coverage     58.51%   58.83%   +0.32%     
- Complexity     1092     1129      +37     
============================================
  Files           187      187              
  Lines         10217    10780     +563     
  Branches       1413     1479      +66     
============================================
+ Hits           5978     6342     +364     
- Misses         3760     3948     +188     
- Partials        479      490      +11     
Impacted Files Coverage Δ
...ain/java/io/anserini/collection/CarCollection.java 0.00% <0.00%> (ø)
...ava/io/anserini/collection/DocumentCollection.java 57.33% <ø> (ø)
...in/java/io/anserini/collection/HtmlCollection.java 51.85% <0.00%> (-6.49%) ⬇️
...o/anserini/collection/TrialstreamerCollection.java 0.00% <0.00%> (ø)
.../java/io/anserini/collection/VectorCollection.java 0.00% <0.00%> (ø)
...ain/java/io/anserini/rerank/lib/AxiomReranker.java 0.00% <0.00%> (ø)
...n/java/io/anserini/rerank/lib/BM25PrfReranker.java 0.00% <0.00%> (ø)
...rini/rerank/lib/NewsBackgroundLinkingReranker.java 0.00% <0.00%> (ø)
...main/java/io/anserini/search/SearchCollection.java 43.36% <12.50%> (-0.59%) ⬇️
.../main/java/io/anserini/rerank/lib/Rm3Reranker.java 43.93% <12.90%> (-9.84%) ⬇️
... and 59 more


@lintool lintool self-requested a review November 6, 2022 12:47
@lintool lintool merged commit 1273619 into castorini:master Nov 6, 2022
thongnt99 pushed a commit to thongnt99/anserini-lsr that referenced this pull request Mar 3, 2023
…i#1984)

This means that we can perform pseudo-relevance feedback on an index that does not have docvectors stored.
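
To illustrate the idea, here is a minimal sketch of computing a document vector on the fly. It is not the code from this PR: the class name `OnTheFlyDocVector`, both method names, and the simple lowercase-and-split tokenizer are hypothetical stand-ins, and a real implementation would presumably re-run the index's own analyzer over the stored document text. The sketch uses the stored term-frequency vector when one exists and otherwise rebuilds it from the raw text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not Anserini's actual implementation.
final class OnTheFlyDocVector {

    // Rebuild a term-frequency vector from raw document text.
    // Assumption: lowercasing and splitting on non-alphanumerics stands in
    // for whatever analyzer was used at indexing time.
    static Map<String, Integer> termFrequencies(String rawText) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : rawText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Prefer the stored vector; fall back to computing it from the text
    // when the index was built without stored docvectors.
    static Map<String, Integer> docVector(Map<String, Integer> storedVector, String rawText) {
        return storedVector != null ? storedVector : termFrequencies(rawText);
    }

    public static void main(String[] args) {
        // No stored vector available, so the vector is computed from the text.
        Map<String, Integer> vector =
            docVector(null, "Feedback terms come from feedback documents.");
        System.out.println(vector);
    }
}
```

The fallback trades index size for query-time work: nothing extra is stored, but each feedback document has to be re-tokenized at search time, which is why the speed comparison requested above matters.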