Add an entry in Pyserini's Passage Ranking Experiment Reproduction Log for ugrad onboarding #1599

yilinjz · 2023-08-25T06:41:29Z

OS: macOS Ventura Version 13.5 
Hardware: M2 MacBook Pro (2023), 16 GB RAM 
Conda: 23.7.2 
Python: (Conda Environment) 3.8.17
Java: (Conda Environment) 11.0.13
Maven: (Conda Environment) 3.9.4

Result: I was able to replicate (most of) the Indexing, Retrieval & Evaluation steps with Pyserini on my machine with the above settings. Outputs are the same as listed in the document.  

Additional Comment: I encountered two (and only two) failed unit tests (from running “python -m unittest”) when following the “pyserini/docs/installation.md” file. Namely,

test_remove_undjudged (tests.test_trectools.TestTrecTools)
test_undjudged_keep (tests.test_trectools.TestTrecTools)

This seems to be caused by the TREC evaluation tool. When running the following command:

python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap \
  collections/msmarco-passage/qrels.dev.small.trec \
  runs/run.msmarco-passage.bm25tuned.trec

I got:

Running command: ['java', '-jar', '/Users/jasonzhang/.cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-c', '-mrecall.1000', '-mmap', 'collections/msmarco-passage/qrels.dev.small.trec', 'runs/run.msmarco-passage.bm25tuned.trec']
Exception in thread "main" java.lang.UnsupportedOperationException: Unsupported os/arch: trec_eval-macosx-aarch64
at uk.ac.gla.terrier.jtreceval.trec_eval.getTrecEvalBinary(trec_eval.java:80)
at uk.ac.gla.terrier.jtreceval.trec_eval.<init>(trec_eval.java:130)
at uk.ac.gla.terrier.jtreceval.trec_eval.main(trec_eval.java:262)

which seems to be related to Apple's M1/M2 chips. There is surprisingly little information about this error on Google (it needs a better IR system!). As a side note, judging from previous commits, others have succeeded with M1 MacBooks (but I found no commit from an M2 user).
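The exception names "macosx-aarch64", which suggests the JVM is reporting an ARM architecture. One way to check this is to ask Java for its os.arch property. The sketch below is only a diagnostic aid: the helper names are hypothetical, and it assumes (not confirmed in this thread) that jtreceval picks its binary based on the JVM's reported OS and architecture.

```python
import re
import subprocess
from typing import Optional


def parse_os_arch(settings_text: str) -> Optional[str]:
    """Extract the os.arch value from `java -XshowSettings:properties` output."""
    match = re.search(r"^\s*os\.arch\s*=\s*(\S+)", settings_text, re.MULTILINE)
    return match.group(1) if match else None


def java_os_arch() -> Optional[str]:
    """Query the `java` found on PATH; returns None if Java is not available."""
    try:
        proc = subprocess.run(
            ["java", "-XshowSettings:properties", "-version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return None
    # The settings are usually written to stderr, so search both streams.
    return parse_os_arch(proc.stdout + proc.stderr)
```

Calling java_os_arch() inside the Conda environment should return "aarch64" on a setup that hits this error, if the assumption above holds.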

On the other hand, the official MS MARCO evaluation script worked perfectly fine for me.

As a next step, I'm planning to replicate this experiment on my old laptop (which has an Intel chip), but before that I thought I'd record this issue here as a potential reference for future users.


yilinjz · 2023-08-25T22:01:37Z

Update: I reran this experiment on my Intel-based MacBook. Everything worked smoothly. Outputs are the same as listed in the document.

OS: macOS Ventura Version 13.4.1
Hardware: MacBook Pro (2019), 2.8GHz Quad-Core Intel Core i7, 16 GB RAM

lintool · 2023-08-26T15:04:36Z

Interesting. From what I understand, Rosetta is supposed to handle translation of x86 instructions... but for some reason it's not kicking in? I have an Apple M2 MacBook, and it seems to work fine for me? #shrug

Thanks for noting, and let's keep an eye out on this issue.

yilinjz · 2023-08-30T16:17:25Z

@lintool Found a solution to this issue. Leaving a comment here for future reference.

If you encounter this issue:
It appears to be caused by the ARM64 distribution of Conda 23.7.2. If you are unsure which distribution you have, run "conda info" in your terminal and check the "platform" field.
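The "platform" check can also be scripted. This is a minimal sketch, assuming conda is on your PATH and that "conda info --json" reports the same "platform" field as the plain-text output; the function names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def conda_platform(info_json: str) -> Optional[str]:
    """Return the "platform" field from `conda info --json` output."""
    return json.loads(info_json).get("platform")


def is_arm64_conda(info_json: str) -> bool:
    """True when Conda reports the macOS ARM64 platform string."""
    return conda_platform(info_json) == "osx-arm64"


def read_conda_info() -> str:
    """Run `conda info --json` and return its raw JSON text."""
    proc = subprocess.run(
        ["conda", "info", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout
```

For example, is_arm64_conda(read_conda_info()) returning True would indicate the ARM64 distribution discussed here, while "osx-64" is the platform string for the Intel distribution.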

Steps to fix this issue:

1. Uninstall your current Conda distribution (https://docs.anaconda.com/free/anaconda/install/uninstall/).
2. If you do not have Rosetta installed on your Mac, install it (https://osxdaily.com/2020/12/04/how-install-rosetta-2-apple-silicon-mac/).
3. Install the latest Intel Mac distribution (https://www.anaconda.com/). Rosetta will handle the x86 translation, so yes, you can run the Intel distribution on your M1/M2 Mac.
4. Go through the Pyserini installation again and it should work fine (https://github.com/castorini/pyserini/blob/master/docs/installation.md). Try the Development Installation if you encounter issues with the Pip Installation.
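After reinstalling, a quick sanity check is to confirm which architecture the Python interpreter in the new environment reports. The sketch below is an illustration only (the helper name is hypothetical); it relies on platform.machine() returning "x86_64" for an Intel build of Python, including one running through Rosetta, and "arm64" for a native Apple silicon build.

```python
import platform


def describe_interpreter(system: str, machine: str) -> str:
    """Classify the interpreter's architecture from platform strings."""
    if system != "Darwin":
        return "not macOS"
    if machine == "arm64":
        return "native ARM64"
    if machine == "x86_64":
        return "x86_64 (Intel, or Rosetta on Apple silicon)"
    return "unknown: " + machine


# Report on the interpreter that is running this script.
print(describe_interpreter(platform.system(), platform.machine()))
```

If the steps above worked, running this inside the new Conda environment on an M1/M2 Mac should report x86_64 rather than native ARM64.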

Side note:
It's unclear whether earlier Conda ARM64 distributions such as 23.7.1 or 23.7.0 also have this issue. If you prefer installing an ARM64 distribution, try the earlier Conda versions.

Commit f789db8: Add an entry in Pyserini's Passage Ranking Experiment Reproduction Log for ugrad onboarding

lintool approved these changes Aug 26, 2023

lintool merged commit 01d645e into castorini:master Aug 26, 2023

lintool mentioned this pull request Aug 29, 2023

Adds reproduction logs for onboarding docs #1607

Merged

yilinjz mentioned this pull request Aug 30, 2023

Pyserini Onboarding Path: Add an entry in Reproduction Log #1608

Closed

lintool mentioned this pull request Aug 31, 2023

Refactor installation instructions #1611

Closed

yilinjz mentioned this pull request Sep 1, 2023

Refactored Installation Guide #1614

Merged
