Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot download Excel file from globalenergymonitor for build_gas_input_locations.py #1125

Closed
fhg-isi opened this issue Jul 1, 2024 · 1 comment · Fixed by #1235
Closed
Labels

Comments

@fhg-isi
Copy link
Contributor

fhg-isi commented Jul 1, 2024

Describe the Bug

When trying to run build_gas_input_location.py using

 snakemake = mock_snakemake(
            "build_gas_input_locations",
            simpl="",
            clusters="37",
        )

I get the error message below.

I am able to manually download the excel file when entering the url into a browser:

http://globalenergymonitor.org/wp-content/uploads/2023/07/Europe-Gas-Tracker-2023-03-v3.xlsx

Maybe the webpage has some restrictions for automated downloads?

Error Message

(pypsa-eur) projekt-resilient03@ubuntu-22-04-lts-temp:~/pypsa-eur$ python scripts/build_gas_input_locations.py
ERROR:root:Uncaught exception
Traceback (most recent call last):
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 156, in <module>
    gas_input_locations = build_gas_input_locations(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 91, in build_gas_input_locations
    lng = build_gem_lng_data(gem_fn)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/pypsa-eur/scripts/build_gas_input_locations.py", line 28, in build_gem_lng_data
    df = pd.read_excel(fn, sheet_name="LNG terminals - data", engine='openpyxl')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 495, in read_excel
    io = ExcelFile(
         ^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/excel/_base.py", line 563, in __init__
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 728, in get_handle
    ioargs = _get_filepath_or_buffer(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 384, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
         ^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/site-packages/pandas/io/common.py", line 289, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/projekt-resilient03/conda/envs/pypsa-eur/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Related:

#1118

@fhg-isi fhg-isi added the bug label Jul 1, 2024
@fhg-isi
Copy link
Contributor Author

fhg-isi commented Jul 1, 2024

Possible solution: mock browser headers:

def open_remote_excel_file(url, sheet_name):

  temp_xlsx_path = 'temp_dummy_file.xlsx'

  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"}
  req = urllib.request.Request(url, headers=headers)
  with urllib.request.urlopen(req) as response:
      with open(temp_xlsx_path, "wb") as output_file:
          output_file.write(response.read())  
    
  df = pd.read_excel(temp_xlsx_path, sheet_name=sheet_name)
  os.remove(temp_xlsx_path)
  return df

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant