Replies: 13 comments
-
@zfasnacht, have you accepted the EULA? If not, that might be the problem. To accept it, open the URL mentioned in the error message in a browser. You should be redirected to Earthdata Login and then to an EULA (End User License Agreement) page, where you can check the box at the bottom of the page and click the Agree button. Once you do that, you should get past this problem, assuming you haven't already accepted the EULA and that you're using the same credentials via earthaccess as you did when accepting it.
-
Well, it's not happening for the first file, so I'm not sure that's the case. It reads a few files, then the error occurs at random: it might read 2 files fine, or it might read 7.
-
I did go to that link, logged in, and the file downloaded fine, which I think suggests I've already accepted the EULA.
-
Can you share your code? Just enough to show how you're using earthaccess.
-
Of course, thanks for the help!

```python
import earthaccess
import h5py

start_date = '2024-05-22 00:00:00'
end_date = '2024-05-22 23:59:59'

def grab_pace_data(start_date, end_date):
    earthaccess.login(persist=True)
    results = earthaccess.search_data(
        short_name='S5P_L2__NO2____HiR',
        cloud_hosted=True,
        temporal=(start_date, end_date),
        count=20,
        bounding_box=(-180, -90, 180, 90),
    )
    trop_no2_files = earthaccess.open(results)
    for filename in trop_no2_files:
        print(filename.full_name)
        f = h5py.File(filename, 'r')
        data_group = '/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/'
        product_group = '/PRODUCT/'
        no2_scd = f[data_group + 'nitrogendioxide_slant_column_density'][0]
        no2_strat = f[data_group + 'nitrogendioxide_stratospheric_column'][0]
```
-
This might have to do with the underlying async and multi-threading happening under the covers. To see if my hunch is correct, try the following instead, and let me know if it avoids the issue:

```python
import earthaccess
import h5py

start_date = '2024-05-22 00:00:00'
end_date = '2024-05-22 23:59:59'

def grab_pace_data(start_date, end_date):
    data_group = '/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/'
    product_group = '/PRODUCT/'
    earthaccess.login(persist=True)
    results = earthaccess.search_data(
        short_name='S5P_L2__NO2____HiR',
        cloud_hosted=True,
        temporal=(start_date, end_date),
        count=20,
        bounding_box=(-180, -90, 180, 90),
    )
    for result in results:
        with (
            earthaccess.open([result])[0] as trop_no2_file,
            h5py.File(trop_no2_file, 'r') as f,
        ):
            print(trop_no2_file.full_name)
            no2_scd = f[data_group + 'nitrogendioxide_slant_column_density'][0]
            no2_strat = f[data_group + 'nitrogendioxide_stratospheric_column'][0]
```

This will cause each file to be opened and closed in sequence, rather than leaving every file handle open for the life of the program.

Although I wouldn't normally suspect a "Bad Gateway" error to be the result of such potential caching/threading conflicts, I've certainly seen misleading error messages before. Alternatively, it might literally be a flaky server causing intermittent "Bad Gateway" errors. Regardless, I still recommend the "safer" file handling approach above. If it doesn't fix this specific problem, it should at least avoid other potentially gnarly behavior.
-
Oh geez, that's a great point. I'm normally careful about closing files, but it looks like I missed that here, so you might well be right. I'll give that a try. Thanks for the help!
-
So I'm making sure I close the file now, but it still seems like after I read 1-2 files I get the Bad Gateway error. Any other suggestions for improving this?
-
It's a different file every time, right? You mentioned that this is random. Perhaps a retry mechanism that "backs off" a little by waiting an increasing number of seconds (up to a limit) with each retry would help work around this. It's possible this explanation from @chuckwondo is the issue:
GES DISC may appreciate a heads-up about this, or may be better able to help troubleshoot.
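The backoff idea above can be sketched as a small wrapper. This is just a sketch: `with_retries` and `flaky_open` are hypothetical names for illustration, not part of earthaccess; in practice you'd wrap the `earthaccess.open`/`h5py.File` call and catch the specific HTTP error you're seeing rather than bare `Exception`.

```python
import random
import time

def with_retries(fn, *, max_tries=5, base_delay=1.0, max_delay=30.0):
    """Call fn(); on failure, wait with exponential backoff (plus jitter) and retry."""
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of retries: surface the last error
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay / 2))

# Hypothetical flaky operation that fails twice, then succeeds:
calls = {"n": 0}
def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("502 Bad Gateway")
    return "ok"

print(with_retries(flaky_open, base_delay=0.01))  # prints: ok
```

The jitter keeps many clients from retrying in lockstep, which matters if the 502s are load-related.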
-
I'll give the retries a test. The problem is that it's happening so frequently that I'm not sure how much that will help; today I went 30 minutes to an hour without being able to access a single file. I sent a message to the Earthdata contact, and as you suggest, I'll also reach out to the GES DISC folks. Thanks again for all the help!
-
We're happy to help any time! I'm going to close this issue, since we have a new issue to track implementing retries internally, but if you feel there's more to talk about or that the issue should be re-opened, please feel free to continue posting here.
-
@zfasnacht I was also having this issue downloading large amounts of TEMPO data (though that data is probably held on different servers than the TROPOMI data), and I came across this issue. The 502 Bad Gateway error occurred randomly with or without the earthaccess API (e.g. with curl as well), so it seems to be a server issue. I reduced the number of threads in the earthaccess.download() call to avoid overloading the server; the 502 Bad Gateway errors still persisted, but were less frequent.
-
It's clear there's more to discuss here! I'm going to re-open and convert this to a discussion.
-
I'm trying to use the earthaccess tool to read TROPOMI files in the GES DISC cloud, but I'm getting the following error quite frequently.
Any idea how to avoid this, other than a simple try/except?