Fix get amazon product data erroring due to whitespace in headers #9009

Merged

CaedenPH
Contributor

Describe your change:

This PR fixes this error:

Traceback (most recent call last):
  File "C:\Users\caeden\Github\python\web_programming\get_amazon_product_data.py", line 100, in <module>
    get_amazon_product_data(product).to_csv(f"Amazon Product Data for {product}.csv")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\caeden\Github\python\web_programming\get_amazon_product_data.py", line 26, in get_amazon_product_data
    soup = BeautifulSoup(requests.get(url, headers=header).text)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 575, in request
    prep = self.prepare_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 486, in prepare_request
    p.prepare(
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 369, in prepare
    self.prepare_headers(headers)
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\models.py", line 491, in prepare_headers
    check_header_validity(header)
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\utils.py", line 1040, in check_header_validity
    _validate_header_part(header, value, 1)
  File "C:\Users\caeden\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\utils.py", line 1056, in _validate_header_part
    raise InvalidHeader(
requests.exceptions.InvalidHeader: Invalid leading whitespace, reserved character(s), or returncharacter(s) in header value: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36\n        (KHTML, like Gecko)Chrome/44.0.2403.157 Safari/537.36'

It also includes some type changes and fixes some warnings (setting the features to lxml).
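
The failure in the traceback can be reproduced without a network call. The sketch below is an illustration only, not the diff from this PR: `is_valid_header_value` is a hypothetical, simplified stand-in for the check that `requests` performs, and the parenthesized string concatenation is one possible way to keep the long User-Agent readable without embedding a newline.

```python
def is_valid_header_value(value: str) -> bool:
    """Hypothetical, simplified stand-in for the header value check.

    Assumption: a value is accepted if it is empty, or if it does not start
    with whitespace and contains no carriage return or line feed.
    """
    if value == "":
        return True
    return not value[0].isspace() and "\r" not in value and "\n" not in value


# A triple-quoted string keeps the line break and the indentation of the
# second line, which is exactly what the error message shows.
broken_user_agent = """Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36
        (KHTML, like Gecko)Chrome/44.0.2403.157 Safari/537.36"""

# Adjacent string literals inside parentheses are joined at compile time,
# so the value stays on one logical line with a single space in between.
fixed_user_agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko)Chrome/44.0.2403.157 Safari/537.36"
)

print(is_valid_header_value(broken_user_agent))
print(is_valid_header_value(fixed_user_agent))
```

With a value like `fixed_user_agent`, a dictionary such as `{"User-Agent": fixed_user_agent}` contains no newline for the header check to reject.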

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

@CaedenPH CaedenPH requested a review from cclauss as a code owner August 22, 2023 12:48
@algorithms-keeper

Multiple Pull Request Detected

@CaedenPH, we are extremely excited that you want to submit multiple algorithms in this repository, but we have a limit on how many pull requests a user can keep open at a time. This is to make sure all maintainers and users focus on a limited number of pull requests at a time to maintain the quality of the code.

This pull request is being closed as the user already has an open pull request. Please focus on your previous pull request before opening another one. Thank you for your cooperation.

User opened pull requests (including this one): #9009, #8966, #8965, #8906

@algorithms-keeper algorithms-keeper bot removed the request for review from cclauss August 22, 2023 12:48
@algorithms-keeper algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Aug 22, 2023
@CaedenPH
Contributor Author

Broken bot 😢

@cclauss cclauss reopened this Aug 22, 2023
@algorithms-keeper algorithms-keeper bot added the enhancement This PR modified some existing files label Aug 22, 2023
@50-Course

ouch, is that bot for real though?

@tianyizheng02
Contributor

> Broken bot 😢

@CaedenPH Last time it stopped you once you exceeded 5 PRs, this time it's 3. What exactly is the PR limit, @dhruvmanila?

@50-Course 50-Course left a comment

This looks good to some extent.

Can you review #L65-L77? Here are a few suggestions:

  • float is a poor way of representing currency. It makes money arithmetic awkward and error-prone. I think it's good that you first multiply by 100 to get an integer, but even then I would have to round, do the arithmetic in integer form, and convert back. Here's a typical example:
I have $2.99 and need to pay a tax of $0.05.

Working with floats, I would compute 2.99*100 = 299 and 0.05*100 = 5,

then money_less_tax = (299 - 5) / 100 = $2.94. In a fintech application with minute charges of 0.005 or so, fractions and implicit conversions could make this worse, so you should check out the decimal module.

Refs:
[How to calculate money in python](https://learnpython.com/blog/count-money-python/)
[Python Doc on why Decimal is better than float](https://docs.python.org/3/library/decimal.html)
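
The suggestion above can be sketched with the standard library `decimal` module. This is a minimal illustration using the figures from the comment ($2.99, $0.05, and a 0.005 charge); the variable names and the choice of `ROUND_HALF_UP` are assumptions made for the example, not something taken from the algorithm under review.

```python
from decimal import ROUND_HALF_UP, Decimal

# Build Decimal values from strings so no binary float rounding sneaks in.
price = Decimal("2.99")
tax = Decimal("0.05")

# Exact decimal subtraction, with no scaling to integer cents and back.
money_less_tax = price - tax
print(money_less_tax)  # 2.94

# The classic float pitfall next to its Decimal counterpart.
print(0.1 + 0.2 == 0.3)  # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# A sub-cent charge needs an explicit rounding rule; ROUND_HALF_UP is an
# assumed policy for this sketch.
small_charge = Decimal("0.005")
rounded = (price - small_charge).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)  # 2.99
```

The point of `quantize` is that the rounding rule is stated explicitly instead of being left to float representation.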

@CaedenPH
Contributor Author

@50-Course That would be a change to the algorithm itself. All this PR does is fix a bug and type errors in the algorithm, not refactor the code.
If you think the algorithm could do with a refactoring of the way money is handled, feel free to open a pull request for it.

@dhruvmanila
Member

> Broken bot 😢
>
> @CaedenPH Last time it stopped you once you exceeded 5 PRs, this time it's 3. What exactly is the PR limit, @dhruvmanila?

It is 3, although the event may have been posted late, there may have been latency issues, or the request may have failed. I would actually just remove the limit, so if anyone's interested they can open a PR and set this variable to 0: https://github.com/TheAlgorithms/algorithms-keeper/blob/6d535abd7e5344606b979498ed68a756508792c5/algorithms_keeper/event/pull_request.py#L42-L43

@tianyizheng02 tianyizheng02 merged commit 72f6000 into TheAlgorithms:master Sep 5, 2023
@algorithms-keeper algorithms-keeper bot removed the awaiting reviews This PR is ready to be reviewed label Sep 5, 2023
@CaedenPH CaedenPH deleted the fix-get-amazon-product-data branch September 5, 2023 06:25
sedatguzelsemme pushed a commit to sedatguzelsemme/Python that referenced this pull request Sep 15, 2024
…eAlgorithms#9009)

* updating DIRECTORY.md

* fix(get-amazon-product-data): Remove whitespace in headers

* refactor(get-amazon-product-data): Don't print to_csv

---------

Co-authored-by: github-actions <${GITHUB_ACTOR}@users.noreply.github.com>
@isidroas isidroas mentioned this pull request Jan 25, 2025
14 tasks
Labels
enhancement This PR modified some existing files
5 participants