Alert list timing out #319

Open
turley85 opened this issue Oct 23, 2024 · 16 comments
Labels
Biosecurity (Alerts for biosecurity species)

Comments

turley85 commented Oct 23, 2024

I'm getting a 504 Gateway Time-out when previewing the alert:

BioSecurity alert for NSW_NPWS_Western_Weeds_list

turley85 added the Biosecurity label (Alerts for biosecurity species) on Oct 23, 2024
nickdos (Contributor) commented Oct 23, 2024

That URL is working for me now. Is it still showing 504 for you?

turley85 (Author) commented:

Sorry, that URL was for the list.

It's the alert associated with that list that is failing. Sorry, I can't get a direct URL to the alert itself to work.

nickdos (Contributor) commented Oct 23, 2024

Looking at logs, these errors are resulting:

2024-10-24 08:09:33.812 ERROR --- [.1-8080-exec-26] au.org.ala.alerts.BiosecurityService     : Server returned HTTP response code: 504 for URL: https://biocache.ala.org.au/ws/occurrences/search?q=%28genus%3A%22Alternanthera+philoxeroides%22%29+OR+%28species%3A%22Alternanthera+philoxeroides%22%29+OR+%28subspecies%3A%22Alternanthera+philoxeroides%22%29+OR+%28scientificName%3A%22Alternanthera+philoxeroides%22%29+OR+%28raw_scientificName%3A%22Alternanthera+philoxeroides%22%29&fq=-data_resource_uid%3A%22dr27665%22+AND+spatialObject%3A9433219+OR+spatialObject%3A9433227&fq=eventDate%3A%5B2024-05-23T14%3A00%3A00Z+TO+2024-10-23T21%3A08%3A33Z+%5D&fq=firstLoadedDate%3A%5B2024-10-20T13%3A00%3A00Z+TO+2024-10-23T21%3A08%3A33Z+%5D&pageSize=10000
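Decoding the logged request makes its parameters much easier to read. The sketch below uses only Python's standard library (`urllib.parse`); `LOG_URL` is the URL from the log line above, split across lines for readability, and `decode_query` is a hypothetical helper name, not part of the alerts code.

```python
from urllib.parse import parse_qs, urlsplit

# The biocache request copied from the BiosecurityService log line above.
LOG_URL = (
    "https://biocache.ala.org.au/ws/occurrences/search"
    "?q=%28genus%3A%22Alternanthera+philoxeroides%22%29"
    "+OR+%28species%3A%22Alternanthera+philoxeroides%22%29"
    "+OR+%28subspecies%3A%22Alternanthera+philoxeroides%22%29"
    "+OR+%28scientificName%3A%22Alternanthera+philoxeroides%22%29"
    "+OR+%28raw_scientificName%3A%22Alternanthera+philoxeroides%22%29"
    "&fq=-data_resource_uid%3A%22dr27665%22+AND+spatialObject%3A9433219"
    "+OR+spatialObject%3A9433227"
    "&fq=eventDate%3A%5B2024-05-23T14%3A00%3A00Z+TO+2024-10-23T21%3A08%3A33Z+%5D"
    "&fq=firstLoadedDate%3A%5B2024-10-20T13%3A00%3A00Z+TO+2024-10-23T21%3A08%3A33Z+%5D"
    "&pageSize=10000"
)


def decode_query(url: str) -> dict:
    """Return the URL's query parameters with '+' and %XX escapes decoded.

    Repeated keys (such as the three fq parameters) are kept as lists.
    """
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    params = decode_query(LOG_URL)
    print("q        =", params["q"][0])
    for fq in params["fq"]:
        print("fq       =", fq)
    print("pageSize =", params["pageSize"][0])
```

Printed one per line, the first fq value shows the mix of AND and OR without any parentheses.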

Testing that URL manually resulted in a 504 Gateway Time-out, not the usual SOLR error you see when the spatial object is too long.

I'm guessing the spatial object is still to blame (too complex), causing SOLR to time out or run out of memory.

UPDATE: I think the fq column might also be written incorrectly. For example, in -data_resource_uid:"dr27665" AND spatialObject:9433219 OR spatialObject:9433227, Boolean precedence means the AND is applied before the OR, so it is effectively (-data_resource_uid:"dr27665" AND spatialObject:9433219) OR spatialObject:9433227. It will therefore (effectively) return all the results that match spatialObject:9433227, because of the trailing OR.

I think the intended result should use: -data_resource_uid:"dr27665" AND (spatialObject:9433219 OR spatialObject:9433227).
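A minimal sketch of the grouping fix, assuming the fq is assembled from an excluded data resource UID and a list of spatialObject IDs (`build_fq` is a hypothetical helper, not the alerts app's code). The `matches` function only illustrates the precedence argument with Python booleans, where `and` binds tighter than `or`; Solr's query parsers have their own clause-handling rules, so explicit parentheses are the safe choice either way.

```python
def build_fq(excluded_dr: str, spatial_ids: list) -> str:
    """Hypothetical helper: exclude one data resource and restrict to any of
    the given spatialObject IDs, grouping the ORs explicitly."""
    if not spatial_ids:
        raise ValueError("at least one spatialObject ID is required")
    spatial = " OR ".join(f"spatialObject:{pid}" for pid in spatial_ids)
    if len(spatial_ids) > 1:
        # Parentheses make the intended grouping explicit instead of
        # relying on how the parser treats mixed AND/OR.
        spatial = f"({spatial})"
    return f'-data_resource_uid:"{excluded_dr}" AND {spatial}'


def matches(in_excluded_dr: bool, in_a: bool, in_b: bool, grouped: bool) -> bool:
    """Evaluate the filter for one record using standard boolean precedence."""
    if grouped:
        return (not in_excluded_dr) and (in_a or in_b)
    # Without parentheses this evaluates as ((not dr) and a) or b.
    return (not in_excluded_dr) and in_a or in_b


if __name__ == "__main__":
    print(build_fq("dr27665", ["9433219", "9433227"]))
    # A record from the excluded resource that lies only in 9433227:
    print(matches(True, False, True, grouped=False))  # ungrouped: matches
    print(matches(True, False, True, grouped=True))   # grouped: excluded
```

With a single ID no parentheses are added, which also fits the advice to keep one spatialObject per fq.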

UPDATE 2: Reminder: each spatial object should be tested independently before it is used in an fq.

https://biocache.ala.org.au/ws/occurrences/search?q=spatialObject:9433227

results in

{
message: "Error from server at null: Expected mime type application/octet-stream but got application/json. {  "error":{    "metadata":[      "error-class","org.apache.solr.common.SolrException",      "root-error-class","org.apache.solr.common.SolrException"],    "msg":"application/x-www-form-urlencoded content length (74308530 bytes) exceeds upload limit of 32768 KB",    "code":400}}",
errorType: "Query syntax invalid",
statusCode: 400
}
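Testing a spatial object on its own can be scripted. The sketch below builds the same kind of `q=spatialObject:<pid>` URL used in this thread and sorts the response into rough categories. The category names, the 60-second timeout and the helper names are assumptions, and the "exceeds upload limit" match is based only on the error message in the response above.

```python
import json
import socket
import urllib.error
import urllib.request
from urllib.parse import urlencode

BIOCACHE_SEARCH = "https://biocache.ala.org.au/ws/occurrences/search"


def spatial_object_test_url(pid: str, base: str = BIOCACHE_SEARCH) -> str:
    """Build a search URL that queries a single spatialObject on its own."""
    query = urlencode({"q": f"spatialObject:{pid}"})
    return f"{base}?{query}"


def classify_response(status: int, body: str) -> str:
    """Sort a biocache response into 'ok', 'timeout', 'too_large' or 'error'."""
    if status == 200:
        return "ok"
    if status == 504:
        return "timeout"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "error"
    message = payload.get("message", "") if isinstance(payload, dict) else ""
    # The SOLR error above reports the shape exceeding the upload limit.
    if "exceeds upload limit" in str(message):
        return "too_large"
    return "error"


def check_spatial_object(pid: str, timeout: float = 60.0) -> str:
    """Fetch the test URL and classify the result (needs network access)."""
    url = spatial_object_test_url(pid)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_response(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body worth inspecting.
        return classify_response(err.code, err.read().decode("utf-8", "replace"))
    except (TimeoutError, socket.timeout):
        return "timeout"
    except urllib.error.URLError:
        # DNS/connection failures (and some connect timeouts) end up here.
        return "unreachable"
```

Given the response above, `check_spatial_object("9433227")` would be expected to report `too_large`, but the actual result depends on the live service at the time it is run.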

FYI, we also advise against combining spatialObjects in an fq. Combining two spatialObjects effectively causes the same error shown above (internally it's like using one combined object).
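The advice above (keep one spatialObject per fq, and group mixed AND/OR explicitly) can be turned into a quick pre-flight check for the list's fq column. A rough sketch with a hypothetical `lint_fq` helper, using plain regular expressions rather than a real Solr query parser, so both checks are text heuristics only:

```python
import re

# Matches terms such as "spatialObject:9433219" and captures the ID.
SPATIAL_TERM = re.compile(r"spatialObject:(\d+)")


def lint_fq(fq: str) -> list:
    """Return warnings for an fq value, based on the advice in this thread.

    Hypothetical helper; both checks are simple text heuristics, not a
    full Solr query parse.
    """
    warnings = []
    ids = SPATIAL_TERM.findall(fq)
    if len(ids) > 1:
        warnings.append(
            f"{len(ids)} spatialObject terms ({', '.join(ids)}): "
            "use a single (merged) spatialObject instead"
        )
    if " AND " in fq and " OR " in fq and "(" not in fq:
        warnings.append("mixes AND and OR without parentheses")
    return warnings


if __name__ == "__main__":
    for fq in (
        '-data_resource_uid:"dr27665" AND spatialObject:9433219 OR spatialObject:9433227',
        '-data_resource_uid:"dr27665" AND spatialObject:9478102',
    ):
        print(fq, "->", lint_fq(fq) or "OK")
```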

kylie-m commented Oct 24, 2024

Thanks for investigating, Nick! Some other relevant background: spatialObject:9433227 was one of the original shapefiles that was too complex and needed optimising. There's more info on ticket #246, but essentially I created an optimised version of it, spatialObject:9439588, so that is the one that should be used in alerts.

Can be viewed and tested at:
https://spatial.ala.org.au/?pid=9439588
https://biocache.ala.org.au/ws/occurrences/search?q=spatialObject:9439588

turley85 (Author) commented Oct 24, 2024

@kylie-m @nickdos I just updated that spatial object from spatialObject:9433227 to spatialObject:9439588 (https://spatial.ala.org.au/?pid=9439588). However, the alert still failed.

I note that the list actually uses 3 shapefiles, so is one of the other two causing this issue too? Or is having multiple shapefiles itself causing the issue?

nickdos (Contributor) commented Oct 24, 2024

Hi @turley85 - I saw this in the logs:

2024-10-24 12:56:59.648 ERROR --- [.1-8080-exec-30] au.org.ala.alerts.NotificationService : User or query not found for userId: null, queryId: BioSecurity alert for NSW_NPWS_Western_Weeds_list

userId: null,

So I think you had the page loaded from earlier and then clicked "Preview" or "Notify" after your login had expired. Try reloading the page to see whether you're prompted to log in again, then run it again.

turley85 (Author) commented Oct 24, 2024

@nickdos Hmm, I just tried again. Closed the windows, logged out and then back into ALA and used "Preview" to test the alert again.

It failed again, sorry :(

Let me know if there's something else I should have done to test!

kylie-m commented Oct 24, 2024

I just tested https://biocache.ala.org.au/ws/occurrences/search?q=spatialObject:9433219 as well, so that spatialObject should be ok. I didn't spot a third one on the list though?

nickdos (Contributor) commented Oct 24, 2024

Same error again: 2024-10-24 12:56:59.648 ERROR --- [.1-8080-exec-30] au.org.ala.alerts.NotificationService : User or query not found for userId: null, queryId: BioSecurity alert for NSW_NPWS_Western_Weeds_list.

Will look into it more.

nickdos (Contributor) commented Oct 24, 2024

It seems the timeouts are causing DB errors (as described in another ticket), so the subsequent DB lookup for the query ID fails.

So the fix is to remove all but one spatialObject from the list's fq column and retry.

@turley85, we strongly recommend you copy the list over to lists-test.ala.org.au and do the testing on alerts-test.ala.org.au before making changes on production servers.

kylie-m commented Oct 24, 2024

@nickdos would a good additional workaround here be to combine the 2 spatial layers into one layer in QGIS first? No guarantees, but I can give that a try; I've done so for other work previously.

nickdos (Contributor) commented Oct 24, 2024

> @nickdos would a good additional workaround here be to combine the 2 spatial layers into one layer in QGIS first?

@kylie-m, I think so. Combining spatialObjects adds an extra level of complexity and, depending on how they are combined, could be worse than a single object. So it's simpler and safer to stick with a single spatialObject, as recommended by Adam.

kylie-m commented Oct 25, 2024

Thanks Nick!

@turley85 I have merged the 2 layers in QGIS, then uploaded to Spatial portal.

In ala-test:
resulting object: spatialObject:21643483
test: https://api.test.ala.org.au/occurrences/occurrences/search?q=Acacia%20longifolia&fq=spatialObject%3A21643483
test in UI to view on map (will be quite slow): https://biocache-test.ala.org.au/occurrences/search?q=taxa&fq=spatialObject:21643483

This returns records within the new spatial object above, though the equivalent alert on test is not yet working. I'll keep trying, but @nickdos, if you have any ideas, let me know!
(https://lists-test.ala.org.au/speciesListItem/list/dr22890)

In production:
resulting object: spatialObject:9478102
test: https://biocache.ala.org.au/ws/occurrences/search?q=spatialObject:9478102

Alert is working for this test list: https://lists.ala.org.au/speciesListItem/list/dr28737
(Alert name: "wattle")


turley85 (Author) commented Oct 25, 2024

@kylie-m @nickdos I've updated the NPWS Western list with spatialObject:9478102 in production and am still getting a 504 Gateway Time-out, sorry.

However, I did replicate @kylie-m's result with the wattle list:

76 new records for
wattle, dr28737
since 16 Oct 2024

kylie-m commented Oct 25, 2024

Hmm, I wonder if the spatialObject is too complex when combined with a more complex query, but OK with a simpler one. @nickdos?

nickdos (Contributor) commented Oct 31, 2024

@kylie-m I wondered the same thing. I think the additional terms for the OR'ed names might be pushing us over some threshold. The only way to know is to run the alert and look at the logs, I think.
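To get a feel for how quickly the name part of the query grows, the sketch below rebuilds a q with the same shape as the one in the BiosecurityService log line: five OR'ed field clauses per name. `build_name_query` and `query_size` are hypothetical helpers, and whether the alert ever ORs several names into one request is not confirmed here (the logged request contains a single name), so this only measures the query text, not Solr's cost of evaluating it alongside the spatial filter.

```python
from urllib.parse import urlencode

# Fields OR'ed together for each name, in the order seen in the logged q.
NAME_FIELDS = ("genus", "species", "subspecies", "scientificName", "raw_scientificName")


def build_name_query(names: list) -> str:
    """Hypothetical helper: one parenthesised clause per field for every name."""
    clauses = [f'({field}:"{name}")' for name in names for field in NAME_FIELDS]
    return " OR ".join(clauses)


def query_size(names: list) -> tuple:
    """Return (clause count, URL-encoded length of the q parameter)."""
    q = build_name_query(names)
    return len(names) * len(NAME_FIELDS), len(urlencode({"q": q}))


if __name__ == "__main__":
    for names in (
        ["Alternanthera philoxeroides"],
        ["Alternanthera philoxeroides", "Acacia longifolia"],
    ):
        print(len(names), "name(s) ->", query_size(names))
```

Running the same fq once with a simple q and once with the full name query, then comparing the logs, would be one way to test the threshold idea.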

A 504 Gateway Time-out is usually an indicator that the biocache requests are timing out or erroring.
