esgpull not recognizing updated datasets #60

Open
meteorologist15 opened this issue Feb 12, 2025 · 2 comments

@meteorologist15

esgpull sometimes fails to recognize retractions of datasets/files that fall under a particular query, which leads to the scenario I described in #58.

The example here is for the input4MIPs project, with the following query:

<cd5e6e>
└── distrib:        False     
    latest:         True      
    replica:        None      
    retracted:      False     
    institution_id: uoexeter  
    mip_era:        CMIP6Plus 
    project:        input4MIPs

This institution has published several iterations of the data; the latest, as of this writing, is "1-3-1".

Example: input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.day.utsvolcemis.gn

The previous "versions" '1-1-3', '1-2-0', and '1-3-0' have been retracted. When I update the esgpull query (i.e. esgpull update cd5e6e), I receive the following message:

<cd5e6e> is already up-to-date.

For some reason, esgpull is not recognizing these different dataset IDs. Any help would be appreciated.

Thanks

@svenrdz
Collaborator

svenrdz commented Feb 12, 2025

Hi @meteorologist15

I notice you are using distrib=false. Could it be that the index_node you are querying does not have the new versions?
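One way to check this outside esgpull is to compare what the index node returns with and without distributed search. The following is only a sketch: it assumes the index node exposes the classic ESGF search REST endpoint at /esg-search/search, and the host name index-node.example.org and the helper build_search_url are placeholders, not values taken from this issue. It only builds the two URLs; fetching them and comparing the source_id facet counts is left to the reader.

```python
from urllib.parse import urlencode


def build_search_url(index_node: str, distrib: bool) -> str:
    """Build a file-level ESGF search URL for the query in this issue.

    Assumption: the node serves the classic ESGF search REST API
    under /esg-search/search. The host is supplied by the caller.
    """
    params = {
        "project": "input4MIPs",
        "mip_era": "CMIP6Plus",
        "institution_id": "uoexeter",
        "latest": "true",
        "retracted": "false",
        "distrib": "true" if distrib else "false",
        "type": "File",
        "limit": 0,               # only facet counts are needed, no records
        "facets": "source_id",    # count files per source_id
        "format": "application/solr+json",
    }
    return f"https://{index_node}/esg-search/search?{urlencode(params)}"


# Placeholder host: replace with the index node configured in esgpull.
local_url = build_search_url("index-node.example.org", distrib=False)
distributed_url = build_search_url("index-node.example.org", distrib=True)

print(local_url)
print(distributed_url)
```

If the source_id counts differ between the two responses, the local index is probably missing some of the versions.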

I tried reproducing your issue and got updates with new files on my end, using distrib=true. The query does not seem to fetch any particular version; it contains all of the ones you mention:

$ esgpull search --distrib true --latest true institution_id:uoexeter mip_era:CMIP6Plus project:input4MIPs  --all --file --hints source_id
[
  {
    "source_id": {
      "UOEXETER-CMIP-1-1-3": 7,
      "UOEXETER-CMIP-1-2-0": 13,
      "UOEXETER-CMIP-1-3-0": 13,
      "UOEXETER-CMIP-1-3-1": 15
    }
  }
]

That amounts to 48 files, which is also what I get after an esgpull update on the same query. Is that consistent with what you observe?
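The total can be checked directly from the hint counts. A minimal sketch, with the counts copied from the output above and "UOEXETER-CMIP-1-3-1" taken as the latest source version, as stated in the original report:

```python
# File counts per source_id, copied from the --hints output above.
hints = {
    "UOEXETER-CMIP-1-1-3": 7,
    "UOEXETER-CMIP-1-2-0": 13,
    "UOEXETER-CMIP-1-3-0": 13,
    "UOEXETER-CMIP-1-3-1": 15,
}

latest_source = "UOEXETER-CMIP-1-3-1"

total_files = sum(hints.values())
# Files that belong to source versions other than the latest one.
superseded_files = total_files - hints[latest_source]

print(f"total files: {total_files}")
print(f"files outside {latest_source}: {superseded_files}")
```

This gives a total of 48 files, 33 of which belong to the three older source versions.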

@meteorologist15
Author

Using distrib=True, my query displays as follows:

<3882a1>
└── distrib:        True                    
    latest:         True                    
    replica:        None                    
    retracted:      False                   
    institution_id: uoexeter                
    mip_era:        CMIP6Plus               
    project:        input4MIPs              
    files:          55.1 kiB / 7.4 GiB [1/7]

After running esgpull update 3882a1, it recognizes 7 new files (strange, as there should be 15?). When I try running the download, it errors out. The query is only recognizing the 1-1-3 versions.

From the LOG output:

[2025-02-12 12:11:09]  ERROR     root

  + Exception Group Traceback (most recent call last):
  |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/esgpull/tui.py", line 164, in logging
  |     yield
  |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/esgpull/cli/download.py", line 73, in download
  |     esg.ui.raise_maybe_record(exc_group)
  |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/esgpull/tui.py", line 338, in raise_maybe_record
  |     raise exc 
  | ExceptionGroup: Download (6 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/esgpull/processor.py", line 117, in stream
    |     async for ctx in stream:
    |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/esgpull/download.py", line 58, in stream
    |     resp.raise_for_status()
    |   File "/net2/ker/anaconda3/envs/esgpull_temp/lib/python3.12/site-packages/httpx/_models.py", line 761, in raise_for_status
    |     raise HTTPStatusError(message, request=request, response=self)
    | httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://esgf1.dkrz.de/thredds/fileServer/input4mips/CMIP6Plus/CMIP/uoexeter/UOEXETER-CMIP-1-1-3/atmos/mon/reff/gnz/v20240903/reff_input4MIPs_aerosolProperties_CMIP_UOEXETER-CMIP-1-1-3_gnz_175001-202312.nc'
    | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
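The failing URL itself shows which source version the download is requesting. A minimal sketch, assuming the THREDDS path ends in <source_id>/<realm>/<frequency>/<variable>/<grid>/<version>/<filename>, as it does in the URL from the traceback (the helper name describe_download_url is hypothetical):

```python
from urllib.parse import urlparse

# URL copied from the 404 error in the traceback above.
url = (
    "https://esgf1.dkrz.de/thredds/fileServer/input4mips/CMIP6Plus/CMIP/uoexeter/"
    "UOEXETER-CMIP-1-1-3/atmos/mon/reff/gnz/v20240903/"
    "reff_input4MIPs_aerosolProperties_CMIP_UOEXETER-CMIP-1-1-3_gnz_175001-202312.nc"
)


def describe_download_url(url: str) -> dict:
    """Extract host, source_id, version and filename from a download URL.

    Assumption: the last seven path components are
    source_id/realm/frequency/variable/grid/version/filename.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    return {
        "host": parsed.netloc,
        "source_id": parts[-7],
        "version": parts[-2],
        "filename": parts[-1],
    }


info = describe_download_url(url)
print(info["host"], info["source_id"], info["version"])
```

Here the requested file belongs to UOEXETER-CMIP-1-1-3 (v20240903), one of the source versions reported as retracted.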

So it appears that, with distrib=True, the new files are not being recognized at download time. The search feature does recognize the files, however.

esgpull search project:input4MIPs mip_era:CMIP6Plus institution_id:uoexeter

results in

Found 15 datasets.
 id │                                          dataset                                          │ # │   size    
════╪═══════════════════════════════════════════════════════════════════════════════════════════╪═══╪═══════════
  0 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.day.utsvolcemis.gn.v20250210 │ 1 │  47.1 kiB 
  1 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.asy.gnz.v20250210        │ 1 │   2.5 GiB 
  2 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.asy.gnz.v20250210       │ 1 │   9.5 MiB 
  3 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.ext.gnz.v20250210       │ 1 │   9.5 MiB 
  4 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.nd.gnz.v20250210        │ 1 │ 273.5 kiB 
  5 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.reff.gnz.v20250210      │ 1 │ 273.7 kiB 
  6 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.sad.gnz.v20250210       │ 1 │ 273.5 kiB 
  7 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.ssa.gnz.v20250210       │ 1 │   9.5 MiB 
  8 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.monC.vd.gnz.v20250210        │ 1 │ 273.5 kiB 
  9 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.ext.gnz.v20250210        │ 1 │   2.5 GiB 
 10 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.nd.gnz.v20250210         │ 1 │  63.6 MiB 
 11 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.reff.gnz.v20250210       │ 1 │  63.6 MiB 
 12 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.sad.gnz.v20250210        │ 1 │  63.6 MiB 
 13 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.ssa.gnz.v20250210        │ 1 │   2.5 GiB 
 14 │ input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.vd.gnz.v20250210         │ 1 │  63.6 MiB
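To confirm which source versions a search result covers, the dataset IDs can be grouped by their source_id field. A minimal sketch, assuming the dotted ID layout seen in the table (source_id is the fifth field, version the last); only three of the fifteen IDs are copied here for brevity, and split_dataset_id is a hypothetical helper:

```python
from collections import Counter

# A subset of the dataset IDs from the table above (rows 0, 1 and 14).
dataset_ids = [
    "input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.day.utsvolcemis.gn.v20250210",
    "input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.asy.gnz.v20250210",
    "input4MIPs.CMIP6Plus.CMIP.uoexeter.UOEXETER-CMIP-1-3-1.atmos.mon.vd.gnz.v20250210",
]


def split_dataset_id(dataset_id: str) -> dict:
    """Pull source_id and version out of a dotted dataset ID.

    Assumption: fields are project.mip_era.activity.institution.source_id...version.
    """
    fields = dataset_id.split(".")
    return {"source_id": fields[4], "version": fields[-1]}


# Count datasets per source_id and collect the distinct versions.
source_counts = Counter(split_dataset_id(d)["source_id"] for d in dataset_ids)
versions = {split_dataset_id(d)["version"] for d in dataset_ids}

print(dict(source_counts))
print(sorted(versions))
```

Running the same grouping over the full list should show whether anything other than UOEXETER-CMIP-1-3-1 appears in the search results.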
