
Does not support quotes in the query string #98

Closed
janakanuwan opened this issue Nov 25, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@janakanuwan

Bug description

A `JSONDecodeError` is raised when the query string contains double quotes (phrase search) or other search operators such as `+`, `|` and `*`.

Reproducible code example

```python
from semanticscholar import SemanticScholar

sch = SemanticScholar()
sch.search_paper('"Computing Machinery" and Intelligence')
results = sch.search_paper('"Computing Machinery" and Intelligence', bulk=True)
sch.search_paper('Computing Machinery + (Intelligence | Intel*)')
```

Error message

```
    results = sch.search_paper('"Computing Machinery" and Intelligence')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\SemanticScholar.py", line 348, in search_paper
    results = loop.run_until_complete(
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\asyncio\futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\asyncio\tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\AsyncSemanticScholar.py", line 496, in search_paper
    results = await PaginatedResults.create(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\PaginatedResults.py", line 56, in create
    await obj._async_get_next_page()
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\PaginatedResults.py", line 139, in _async_get_next_page
    results = await self._request_data()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\PaginatedResults.py", line 130, in _request_data
    return await self._requester.get_data_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\ApiRequester.py", line 89, in get_data_async
    return await self._get_data_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\asyncio\__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\asyncio\__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
    result = await action(retry_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\_utils.py", line 99, in inner
    return call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
                                     ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\tenacity\asyncio\__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\semanticscholar\ApiRequester.py", line 127, in _get_data_async
    data = r.json()
           ^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\site-packages\httpx\_models.py", line 766, in json
    return jsonlib.loads(self.content, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hello\miniconda3\envs\lit_review\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

Package version

0.8.4

Python version

3.11.10

@danielnsilva
Owner

@janakanuwan I tried reproducing the issue but couldn’t get the same error on my end. Does it still happen for you?

This works for me:

```python
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery + (Intelligence | Intel*)', bulk=True)
print(f'Total results: {results.total}')
print(f'First result: {results[0].title}')
```

Output:

```
Total results: 806
First result: The Teaching of Psychological Medicine
```

Line 127 in `ApiRequester.py` handles responses with a 400 status (Bad Query Parameters), so perhaps the service returned an invalid JSON body for some reason.
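The failure mode described above can be reproduced with the standard library alone: calling `json.loads` on an empty or non-JSON body raises the same `JSONDecodeError: Expecting value: line 1 column 1 (char 0)` seen in the traceback. The sketch below is a hypothetical defensive wrapper (`decode_body` and `ApiBodyError` are illustrative names, not part of `semanticscholar`) that reports the HTTP status and the raw body instead of only the decoder message.

```python
import json


class ApiBodyError(Exception):
    """Hypothetical error type: carries the HTTP status and the raw body text."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f"HTTP {status_code}: response body is not valid JSON: {body!r}")
        self.status_code = status_code
        self.body = body


def decode_body(status_code: int, content: bytes):
    """Parse a response body as JSON, keeping the raw text if parsing fails.

    Hypothetical helper for illustration; not the library's implementation.
    """
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        # Re-raise with context so the caller sees the status and payload,
        # not just "Expecting value: line 1 column 1 (char 0)".
        raise ApiBodyError(status_code, content.decode("utf-8", errors="replace")) from exc


# An empty body produces the same decoder error as in the issue.
try:
    json.loads(b"")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)
```

With a wrapper like this, a 400 response carrying an empty or plain-text body would surface as `HTTP 400: response body is not valid JSON: ...`, which makes it easier to tell a server-side problem from a client-side parsing bug.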

@janakanuwan
Author

janakanuwan commented Dec 1, 2024

@danielnsilva I think the issue is in parsing the JSON after receiving the response: the response includes the original query, which may not be valid JSON when it contains double quotes. Can you try using double quotes to enable phrase search, such as `results = sch.search_paper('"Computing Machinery" + Intelligence', bulk=True)`? By the way, I tried the fix in #99 and it worked, but it may remove the original query from the resulting response object.

@danielnsilva
Copy link
Owner

This also works for me:

```python
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('"Computing Machinery" + Intelligence', bulk=True)
print(f'Total results: {results.total}')
print(f'First result: {results[0].title}')
```

Output:

```
Total results: 311
First result: A Comparative Study of Multilingual Translation Using Generative AI Translators and Human Translators : Alan Turning:‘Computing Machinery and Intelligence'
```

What version of httpx are you using?

@janakanuwan
Author

@danielnsilva I am using a conda environment on Windows with:

```
semanticscholar==0.8.4
requests==2.32.3
requests-toolbelt==1.0.0
httpcore==1.0.7
httplib2==0.22.0
httpx==0.27.2
```
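When reporting an environment like this, the installed versions can also be read programmatically. The sketch below uses the standard library's `importlib.metadata.version` (Python 3.8+); `installed_versions` and `is_httpx_027` are hypothetical helpers, and the 0.27 check reflects only the maintainer's observation in this thread that 0.26 and 0.28 worked while 0.27 failed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def is_httpx_027(httpx_version):
    """True for any 0.27.x release, the series that failed in the maintainer's tests."""
    return httpx_version.split(".")[:2] == ["0", "27"]


if __name__ == "__main__":
    versions = installed_versions(["semanticscholar", "httpx", "httpcore"])
    for name, ver in versions.items():
        print(f"{name}=={ver}")
    if versions["httpx"] and is_httpx_027(versions["httpx"]):
        print("httpx 0.27.x detected: 0.26 and 0.28 worked in the maintainer's tests")
```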

@danielnsilva
Owner

It seems this issue only happens with HTTPX 0.27. I tested 0.26 and 0.28 on both Linux and Windows and they worked fine, but it fails with 0.27. That said, the code currently doesn't handle query parameters separately (they are embedded in the URL), so using the `params` argument, as suggested in PR #99, might be a better approach.
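One general hazard of building a URL by hand is that search operators such as `+` are not escaped, so a server decoding the query string may read them as spaces. The sketch below illustrates this with the standard library's `urllib.parse.urlencode`, which percent-encodes every value; httpx's `params` argument likewise encodes query parameters for you. The endpoint URL and the `query`/`fields` parameter names are assumptions for illustration, and `build_search_url`/`naive_search_url` are hypothetical helpers, not code from PR #99.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint, for illustration only.
BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"


def build_search_url(query, fields="title"):
    """Build a search URL with every parameter percent-encoded."""
    # urlencode applies quote_plus to each value: space -> '+', '"' -> '%22',
    # '+' -> '%2B', '|' -> '%7C', '*' -> '%2A', '(' / ')' -> '%28' / '%29'.
    return f"{BASE_URL}?{urlencode({'query': query, 'fields': fields})}"


def naive_search_url(query, fields="title"):
    """String concatenation: the raw query lands in the URL unescaped."""
    return f"{BASE_URL}?query={query}&fields={fields}"


q = '"Computing Machinery" + Intelligence'
encoded = build_search_url(q)
print(encoded)
# https://api.semanticscholar.org/graph/v1/paper/search/bulk?query=%22Computing+Machinery%22+%2B+Intelligence&fields=title

# Decoding the encoded URL recovers the original query exactly...
print(parse_qs(urlsplit(encoded).query)["query"])
# ...while the naive URL loses the '+' operator, which decodes as a space.
print(parse_qs(urlsplit(naive_search_url(q)).query)["query"])
```

Letting the HTTP client encode a parameter mapping keeps the query text intact regardless of which operators it contains.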

> ... By the way, I used the solution #99, and it worked. But this may remove the original query in resulting response object.

@janakanuwan What do you mean by "this may remove the original query in resulting response object"?

@janakanuwan
Author

@danielnsilva I mean that previously, `PaginatedResults` stored a `url` that could include both the base URL and the query parameters. If the parameters are now passed separately, the resulting `PaginatedResults` may not have that information (I'm not sure). But I realized that `query` still holds those parameters in a different format, so it should be OK. I think merging #99 should be fine.

```python
# Constructor of PaginatedResults (semanticscholar/PaginatedResults.py); body omitted.
class PaginatedResults:
    '''
    This class abstracts paginated results from API search.
    You can just iterate over results regardless of the number of pages.
    '''

    def __init__(
            self,
            requester: ApiRequester,
            data_type: Any,
            url: str,
            query: str = None,
            fields: str = None,
            limit: int = None,
            headers: dict = None,
            max_results: int = 10000
        ) -> None:
        ...
```
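The concern about losing the original query can be addressed by keeping the query as its own attribute and building the request parameters per page. The sketch below is hypothetical (`PageRequest`, its `offset` parameter and the `example.org` URL are illustrative, not the library's `PaginatedResults` or the real API): it stores `url` without a query string, keeps `query` verbatim, and rebuilds the fully encoded URL on demand for logging or debugging.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class PageRequest:
    """Hypothetical stand-in for a paginated request; not the library's class."""

    url: str                      # base endpoint, without a query string
    query: Optional[str] = None   # original search text, kept verbatim
    fields: Optional[str] = None
    limit: Optional[int] = None

    def params(self, offset: int = 0) -> dict:
        """Parameters for one page; entries whose value is None are omitted."""
        raw = {"query": self.query, "fields": self.fields,
               "offset": offset, "limit": self.limit}
        return {k: v for k, v in raw.items() if v is not None}

    def full_url(self, offset: int = 0) -> str:
        """Reconstruct the percent-encoded URL for this page."""
        return f"{self.url}?{urlencode(self.params(offset))}"


req = PageRequest("https://example.org/search",
                  query='"Computing Machinery" + Intelligence', limit=100)
print(req.query)       # the original text, quotes and operators intact
print(req.full_url())  # the encoded form an HTTP client would send
```

Keeping the unencoded `query` alongside the base `url` means callers can still inspect the search text even when the client, not the library, performs the encoding.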

danielnsilva added a commit that referenced this issue Dec 8, 2024
FIX:Does not support quotes in the query string #98