ValueError when running evaluate_bm25.py #64

Open
jordane95 opened this issue Feb 17, 2022 · 5 comments

@jordane95
Contributor

Hi, I tried to run your evaluate_bm25.py baseline, but I got the following error. There may be a problem with Elasticsearch. Could you please help me fix it?

2022-02-17 02:38:34 - Loading Queries...
2022-02-17 02:38:34 - Loaded 300 TEST Queries.
2022-02-17 02:38:34 - Query Example: 0-dimensional biomaterials show inductive properties.
2022-02-17 02:38:34 - Activating Elasticsearch....
2022-02-17 02:38:34 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'scifact', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 1, 'language': 'english'}
Traceback (most recent call last):
  File "evaluate_bm25.py", line 64, in <module>
    model = BM25(index_name=index_name, hostname=hostname, initialize=initialize, number_of_shards=number_of_shards)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/bm25_search.py", line 22, in __init__
    self.es = ElasticSearch(self.config)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/elastic_search.py", line 34, in __init__
    self.es = Elasticsearch(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py", line 312, in __init__
    node_configs = client_node_configs(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 101, in client_node_configs
    node_configs = hosts_to_node_configs(hosts)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 141, in hosts_to_node_configs
    node_configs.append(url_to_node_config(host))
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elastic_transport/client_utils.py", line 198, in url_to_node_config
    raise ValueError(
ValueError: URL must include a 'scheme', 'host', and 'port' component (ie 'https://localhost:9200')
@nreimers
Member

I think you must use http://localhost (or http://localhost:9200) as the hostname, not just localhost.
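
The error message in the traceback asks for a scheme, a host, and a port, so passing the full URL is probably the safest choice. Below is a minimal sketch of normalizing a bare hostname before it is handed to the client. The helper name `normalize_es_host` and its defaults (`http`, port 9200) are assumptions for illustration and are not part of BEIR or the Elasticsearch client.

```python
from urllib.parse import urlsplit


def normalize_es_host(hostname, default_scheme="http", default_port=9200):
    """Return hostname as 'scheme://host:port' (hypothetical helper)."""
    # A bare name such as "localhost" has no scheme, so add the default one.
    if "://" not in hostname:
        hostname = f"{default_scheme}://{hostname}"

    parts = urlsplit(hostname)
    if not parts.hostname:
        raise ValueError(f"Could not find a host in {hostname!r}")

    # Fall back to the assumed default port when the URL does not carry one.
    port = parts.port if parts.port is not None else default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"
```

With this helper, `normalize_es_host("localhost")` returns `http://localhost:9200`, which is the value reported to work later in this thread.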

@jordane95
Contributor Author

Thank you so much! I changed the hostname to http://localhost:9200 and it works. But when I evaluate BM25, I get different scores across runs. For example, the NDCG@10 score ranges from 0.64 to 0.67 on the SciFact dataset. Do you know why? Is there any randomness in the BM25 algorithm?

@nreimers
Member

This was addressed in #58

Not sure if the latest release already includes this. You can either update BEIR to the latest version from Git, or add a sleep after you index the documents in your code.
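
A likely explanation for the varying scores is that Elasticsearch makes newly indexed documents searchable only after an index refresh, so queries sent immediately after indexing may see an incomplete corpus. A minimal sketch of the sleep workaround follows. `index_then_search`, `index_fn`, `search_fn`, and the 2-second default are hypothetical placeholders for whatever indexing and retrieval calls your script makes, not BEIR functions.

```python
import time


def index_then_search(index_fn, search_fn, settle_seconds=2.0):
    """Index, wait for the index to settle, then search (hypothetical helper)."""
    # Step 1: push the documents to the index.
    index_fn()
    # Step 2: pause so the newly indexed documents can become searchable.
    time.sleep(settle_seconds)
    # Step 3: only now run the queries and return their results.
    return search_fn()
```

The pause length is a guess that may need tuning for larger corpora. Updating BEIR, as suggested above, avoids having to maintain the workaround yourself.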

@jordane95
Contributor Author

I see. It's fixed in the BEIR code but not yet included in the examples. I added a sleep and now get a consistent score.

@thakur-nandan
Member

Hi @jordane95,

Yes, with our next pip update this should hopefully no longer be an issue, and Elasticsearch BM25 should give consistent scores. Thanks for notifying me!

Kind regards,
Nandan Thakur
