IllegalBand exception #790

Closed · DeRooBert opened this issue Jun 4, 2024 · 14 comments
@DeRooBert

While running jobs on waw3-1.openeo-vlcc-prod.vgt.vito.be, I sometimes (in about 10% of jobs) get the error that band 9 does not exist (I guess viewAzimuthMean from the S2 tile). Can somebody have a look at where this problem originates? Example: j-2406043c04884110a4bad00e9a79aaf8. The error started popping up after the upgrade of openeo on waw3-1.openeo-vlcc-prod (21/05); before the upgrade this error was not seen.
This is urgent for the VLCC production.


bossie commented Jun 4, 2024

The error started popping up after the upgrade of openeo on waw3-1.openeo-vlcc-prod (21/05); before the upgrade this error was not seen.

Could you also provide the ID of a job that was successful but is failing now (i.e. one from before the upgrade)?

@DeRooBert (Author)

I don't have that information at the moment; this would require me to rerun (random) jobs from before the upgrade and then hope the error shows up. I'll give it a shot.


jdries commented Jun 4, 2024

Note that this likely has something to do with the readperproduct code path.
The thing is, I already fixed something very similar, so my guess is that there's still an uncovered edge case.

bossie added the bug label Jun 5, 2024
@DeRooBert (Author)

Successful in April: j-240426c6ad0e4dc2aa2454a50194a9f8
Failed now: j-240605db8f004d198057de621673dac3

@DeRooBert (Author)

@jdries Can this ticket be raised in priority?
Could the following scenario also happen: band 9 is only partially found and hence no error is thrown? (We suddenly see a decrease in successful products in a subsequent step of the processing.) Here is a job ID where this issue might be happening: j-2405306429094f73aa333edb6d1b0d68


jdries commented Jun 6, 2024

Yes, I'm trying to do that, but had to move some other things out of the way first.
It's pretty specific, which makes it a bit harder to assign to a random person.


jdries commented Jun 6, 2024

It happens in the filter_bands process, before the fapar UDF.
A large number of tasks do succeed, so it is not consistent, and band 9 is in fact only the last band, so the other geometry bands do seem to work fine.
Hence we're looking for a case where load_collection, for whatever reason, decides not to return one of the bands.
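
For context, a minimal sketch (using the openEO Python client) of the pipeline shape being discussed; the backend URL and band list are taken from elsewhere in this thread, while the UDF file name and chunk size are illustrative placeholders rather than the actual VLCC job:

import openeo

# Assumed backend URL and band list (see the process graph further down);
# "fapar_udf.py" is a placeholder, not the actual VLCC fapar UDF.
connection = openeo.connect("waw3-1.openeo-vlcc-prod.vgt.vito.be").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2019-03-01", "2019-04-01"],
    bands=["B02", "viewAzimuthMean", "viewZenithMean",
           "sunAzimuthAngles", "sunZenithAngles"],
)

# filter_bands is where the IllegalBand exception surfaces when load_collection
# silently returned one band too few.
angles = cube.filter_bands(["viewZenithMean", "sunZenithAngles"])

# The fapar UDF runs after filter_bands, per chunk of the cube.
fapar = angles.apply_neighborhood(
    process=openeo.UDF.from_file("fapar_udf.py"),
    size=[{"dimension": "x", "value": 512, "unit": "px"},
          {"dimension": "y", "value": 512, "unit": "px"}],
)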


jdries commented Jun 7, 2024

We need to figure out which band exactly is missing. The band with index 9 is the last one in the list, so it looks like only one band got lost somewhere for specific tiles.
I'm going to remove the filter_bands and then put some logging in the fapar UDF to figure this out.
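
For illustration, a rough sketch of the kind of band-label logging that could go into the UDF, assuming the standard openeo Python UDF signature and a band dimension named "bands"; this is not the actual fapar UDF:

from openeo.udf import XarrayDataCube
from openeo.udf.debug import inspect

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    array = cube.get_array()
    # Log the band labels each chunk actually received, so chunks that lost a
    # band show up in the job logs.
    inspect(data=list(array.coords["bands"].values),
            message="band labels seen by the fapar UDF")
    return cube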


jdries commented Jun 7, 2024

As expected, logging in the UDF shows that most chunks have the correct number of 10 bands.
I've now added an exception to indicate when the number is lower, and to print the cube.
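
A sketch of such a check, under the same assumptions as above (band dimension named "bands", 10 expected bands per the comments here):

from openeo.udf import XarrayDataCube

EXPECTED_BANDS = 10  # chunks normally arrive with 10 bands, per the logging above

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    array = cube.get_array()
    n_bands = array.sizes.get("bands", 0)
    if n_bands < EXPECTED_BANDS:
        # Fail loudly on the bad chunks and include the cube in the error,
        # so the problematic input can be identified from the job logs.
        raise ValueError(
            f"Expected {EXPECTED_BANDS} bands but got {n_bands}:\n{array}"
        )
    return cube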

jdries added a commit that referenced this issue Jun 7, 2024

jdries commented Jun 7, 2024

Printing the bad cube in the UDF didn't work out, because a built-in check before that also throws an error, as something is wrong with the input band labels.
I'm now exporting the cube to NetCDF, hoping to see where the problem is situated.

jdries added a commit to Open-EO/openeo-python-driver that referenced this issue Jun 7, 2024

jdries commented Jun 8, 2024

I was able to narrow it down by downloading and inspecting the full 4.9 GB input data cube.
It happens for March 2019: the viewZenithMean band seems to go missing for specific chunks.

When switching the band order, it turns out that the band right before the last one goes missing.
When saving to file, it is reproducible for NetCDF but apparently not for TIFF... The process graph below reproduces it:

{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B02",
          "viewAzimuthMean",
          "viewZenithMean",
          "sunAzimuthAngles",
          "sunZenithAngles"
        ],
        "featureflags": {
          "indexreduction": 2,
          "temporalresolution": "ByDay",
          "tilesize": 512
        },
        "id": "SENTINEL2_L2A",
        "properties": {
          "eo:cloud_cover": {
            "process_graph": {
              "lte1": {
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": 95
                },
                "process_id": "lte",
                "result": true
              }
            }
          },
          "tileId": {
            "process_graph": {
              "eq1": {
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": "30SX*"
                },
                "process_id": "eq",
                "result": true
              }
            }
          }
        },
        "spatial_extent": {
          "east": -0.862851468722435,
          "north": 37.85256342596736,
          "south": 37.73684527335698,
          "west": -1.0294362480744212
        },
        "temporal_extent": [
          "2019-03-01",
          "2019-04-01"
        ]
      }
    },
    "save1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "format": "NETCDF"
      },
      "result": true
    }
  },
  "parameters": []
}
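
For completeness, a rough sketch of how the exported NetCDF could be inspected offline; it assumes the downloaded file is named openEO.nc, that each band is stored as a separate variable, and that the dimensions are named t/y/x, which may differ per backend:

import xarray as xr

ds = xr.open_dataset("openEO.nc")  # assumed local filename of the downloaded cube

expected = ["B02", "viewAzimuthMean", "viewZenithMean",
            "sunAzimuthAngles", "sunZenithAngles"]

for band in expected:
    if band not in ds.data_vars:
        print(f"{band}: missing from the file entirely")
        continue
    # All-NaN time slices point at dates/chunks where the band went missing.
    empty_slices = int(ds[band].isnull().all(dim=["x", "y"]).sum())
    print(f"{band}: {empty_slices} all-NaN slices out of {ds[band].sizes['t']}")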


jdries commented Jun 8, 2024

I may have found the issue: angle band names were not always unique. In the rare case where the sun and view azimuth angles were similar, this issue could happen.
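
As a toy illustration of that failure mode (not the backend's actual code): when bands are collected into a mapping keyed by a derived name, two angle bands that end up with the same name collapse into one entry, leaving the cube one band short and making the last band index illegal further downstream. The derived names below are made up:

# Hypothetical derived band names; the collision mirrors the described bug.
raw_bands = [
    ("B02", "reflectance grid"),
    ("angle_120.3", "view azimuth grid"),   # derived name
    ("angle_120.3", "sun azimuth grid"),    # same derived name -> collision
    ("viewZenithMean", "angle grid"),
    ("sunZenithAngles", "angle grid"),
]

by_name = dict(raw_bands)  # the duplicate key silently overwrites the earlier band
print(f"{len(raw_bands)} bands read, {len(by_name)} bands kept")
# -> 5 bands read, 4 bands kept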


jdries commented Jun 9, 2024

@DeRooBert it seems to be fixed on staging and can be deployed on the vlcc clusters.

@DeRooBert (Author)

OK, I'll ask Thomas to do a redeploy on the vlcc clusters.
