Updated reproduction logs #1758

Merged
merged 2 commits into from
Jan 10, 2024

wu-ming233

Everything worked except for one minor issue in the last line of the Interactive Retrieval section of "BM25 Baseline for MS MARCO Passage Ranking in Pyserini": the call hits[0].raw raises an error:

AttributeError: 'io.anserini.search.ScoredDoc' object has no attribute 'raw'

The output of the built-in dir(hits[0]) is:

['_JavaClass__cls_storage', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__javaclass__', '__javaconstructor__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__pyx_vtable__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_class', 'clone', 'docid', 'equals', 'finalize', 'getClass', 'hashCode', 'lucene_docid', 'lucene_document', 'notify', 'notifyAll', 'registerNatives', 'score', 'toString', 'wait']

which includes attributes like docid and score, but not raw.

However, the command

searcher.doc(hits[0].docid).raw()

does give the desired output:

'{\n  "id" : "7187158",\n  "contents" : "Paula Deen and her brother Earl W. Bubba Hiers are being sued by a former general manager at Uncle Bubba\'sâ\x80¦ Paula Deen and her brother Earl W. Bubba Hiers are being sued by a former general manager at Uncle Bubba\'sâ\x80¦"\n}'

Or, alternatively, the command

hits[0].lucene_document.get('raw')

also gives the desired output.

I apologize in advance if this issue stems from my environment setup or from not following the instructions correctly. However, it appears that the class ScoredDoc (or JScoredDoc), returned by LuceneSearcher.search(), does not have a raw attribute, whereas the class Document, returned by LuceneSearcher.doc(), implements a raw() method that accesses the raw field of its underlying Lucene Document.
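For illustration only, the three access paths described above could be combined into one small helper. This is a sketch, not part of Pyserini: the name raw_text is hypothetical, and it assumes only the attributes and methods reported in this thread (hits[0].raw in the older API, hits[0].lucene_document.get('raw'), and searcher.doc(docid).raw()).

```python
def raw_text(searcher, hit):
    """Return the raw stored document for a search hit.

    Hypothetical helper (not a Pyserini API). It tries, in order, the
    access paths mentioned in this report.
    """
    # Older API: the hit exposes `raw` directly (the tutorial's original call).
    raw = getattr(hit, "raw", None)
    if raw is not None:
        return raw

    # ScoredDoc path: read the stored 'raw' field from the underlying
    # Lucene document, if the hit carries one.
    lucene_doc = getattr(hit, "lucene_document", None)
    if lucene_doc is not None:
        raw = lucene_doc.get("raw")
        if raw is not None:
            return raw

    # Fallback: fetch the document by docid and call Document.raw().
    return searcher.doc(hit.docid).raw()
```

With a searcher and hits obtained as in the tutorial, raw_text(searcher, hits[0]) would then stand in for hits[0].raw regardless of which of these result classes is returned.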


Operating System

WSL 2 running Ubuntu 22.04.3 LTS on a Windows 10 machine

Environment

  • Python 3.10.12
  • Java 11.0.21

Hardware

  • i7-9700 CPU
  • 16 GB RAM
  • NVIDIA GeForce RTX 2070 SUPER

All unit tests passed after I set up my environment. Please let me know if there is any other information about my setup or configuration that would help.

@wu-ming233

Fixed the outdated API call in the onboarding tutorial.

@lintool lintool merged commit a6ed27e into castorini:master Jan 10, 2024