[CSV feed] CSV feed flood ingestion with identical data #8588
Comments
I've started a feed to reproduce; I'll let you know about the output.
We compute the hash on the full file.
I reproduced your issue @Lhorus6. So based on your comment @richard-julien, can I consider it a "won't fix"?
Maybe it's not the hash calculation we have to play with, but something should be done in any case, IMO. Here we're blowing up the ingestion queues. Julien said, "To prevent too much work, I currently try to not create any job if there is something already in the queue", so I guess he is testing possibilities for improvement.
Yes. Testing PR opened here #8617 |
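The guard Julien describes can be sketched roughly as follows. This is a minimal illustration, not the actual OpenCTI implementation; the `should_enqueue` function, the `feed_id` key, and the in-memory list standing in for the real queue are all hypothetical.

```python
def should_enqueue(feed_id: str, bundle: dict, queue: list) -> bool:
    """Hypothetical sketch: skip creating a new ingestion job when the
    same feed already has work pending, so identical CSV pulls cannot
    pile millions of bundles into the queue."""
    if any(job["feed_id"] == feed_id for job in queue):
        # Something is already queued for this feed; do not add more work.
        return False
    queue.append({"feed_id": feed_id, "bundle": bundle})
    return True
```

With this guard, a feed that re-delivers the same file every 30 minutes would only ever have one pending job at a time.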
Description
The CSV feed import seems buggy, or at least not optimized.
In my case, I have an import from the Blocklist.de source, which contains around 30K IPs. For example, at this moment the source contains about 27K entries.
However, just for this small source, I currently find myself with 2.36M bundles in the queue and a large backlog of works.
Environment
OCTI 6.3.4
Reproducible Steps
Steps to create the smallest reproducible scenario:
Additional information
It seems to me that the data is only imported when the hash changes. So does this source really update its file every 30 minutes? (I see a new work every 30 minutes.)
This seems unlikely; perhaps there is a bug in the hash generation that takes metadata as input?
Just a guess.
If the file really does change continuously, maybe we shouldn't retrieve it every time, but only twice a day?
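To make the suspicion concrete: if the digest were computed over the response metadata (headers, timestamps) rather than only the file body, an unchanged file would look "new" on every poll. A minimal sketch of content-only change detection, assuming a SHA-256 digest of the raw CSV bytes (the function names here are illustrative, not OpenCTI's actual API):

```python
import hashlib

def content_digest(csv_bytes: bytes) -> str:
    # Hash only the CSV body, never HTTP headers or fetch metadata,
    # so a byte-identical file always yields the same digest.
    return hashlib.sha256(csv_bytes).hexdigest()

def has_changed(csv_bytes: bytes, last_digest: str) -> bool:
    # Ingest only when the body digest differs from the last seen one.
    return content_digest(csv_bytes) != last_digest
```

Under this scheme, re-fetching an identical Blocklist.de file every 30 minutes would trigger no new ingestion work.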