[CSV feed] CSV feed flood ingestion with identical data #8588

Lhorus6 · 2024-10-03T09:22:21Z

Description

The CSV feed import seems buggy or poorly optimized.

In my case, I have an import from the Blocklist.de source, which contains around 30K IPs. For example, at the moment the source has 27K entries:

However, for this small source alone, I currently have 2.36M bundles in the queue and a huge number of works.

Environment

OCTI 6.3.4

Reproducible Steps

Steps to create the smallest reproducible scenario:

Create this CSV Mapper:

Create this CSV feed: https://lists.blocklist.de/lists/all.txt

Let it run for several hours, or even 24 to 48 hours, to see how it behaves.

Additional information

It seems to me that the data is only imported when the hash changes. So does this source really update its file every 30 minutes? (I ask because I get a new work every 30 minutes.)

This seems unlikely. Perhaps we have a bug in the hash generation, and it takes metadata as input? Just a guess.
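To illustrate the guess above, here is a minimal sketch of how a hash over the raw file can change while the data itself does not. It assumes, hypothetically, that the source rewrites a comment/timestamp line or reorders its entries between downloads; neither is confirmed for Blocklist.de. `full_file_hash` and `entries_hash` are illustrative names, not OpenCTI functions.

```python
import hashlib


def full_file_hash(content: bytes) -> str:
    """Hash of the raw bytes: any change (order, whitespace, comments) alters it."""
    return hashlib.sha256(content).hexdigest()


def entries_hash(content: bytes) -> str:
    """Hypothetical alternative: order-insensitive hash over data lines only.

    Blank lines and lines starting with '#' are ignored, and the remaining
    entries are deduplicated and sorted, so the same IP set always yields
    the same digest even if the file is reordered or its header changes.
    """
    lines = content.decode("utf-8", errors="replace").splitlines()
    entries = sorted(
        {ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")}
    )
    return hashlib.sha256("\n".join(entries).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same two IPs, but a different (hypothetical) header and a different order.
    first = b"# generated 09:00\n1.2.3.4\n5.6.7.8\n"
    second = b"# generated 09:30\n5.6.7.8\n1.2.3.4\n"
    print(full_file_hash(first) == full_file_hash(second))  # False
    print(entries_hash(first) == entries_hash(second))      # True
```

The trade-off: a normalized hash avoids re-importing identical data, but it only works if the normalization (here: ignore `#` lines, ignore order) matches what actually varies in the source.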

If the file really does change continuously, maybe we shouldn't retrieve it on every run, but only twice a day?
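A minimal sketch of the "only twice a day" idea as a polling gate. The 12-hour interval and the `should_fetch` / `MIN_FETCH_INTERVAL` names are hypothetical, not an existing OpenCTI feed setting.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical polling policy: fetch at most twice a day (every 12 hours).
MIN_FETCH_INTERVAL = timedelta(hours=12)


def should_fetch(last_fetch: Optional[datetime], now: datetime,
                 min_interval: timedelta = MIN_FETCH_INTERVAL) -> bool:
    """Return True if enough time has passed since the last successful fetch."""
    if last_fetch is None:  # never fetched before: fetch immediately
        return True
    return now - last_fetch >= min_interval


if __name__ == "__main__":
    t0 = datetime(2024, 10, 3, 9, 0)
    print(should_fetch(None, t0))                        # True
    print(should_fetch(t0, t0 + timedelta(minutes=30)))  # False
    print(should_fetch(t0, t0 + timedelta(hours=12)))    # True
```

With a source that changes every 30 minutes, this would cut the number of full imports from up to 48 per day to 2, at the cost of data being up to 12 hours stale.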

nino-filigran · 2024-10-04T07:13:45Z

I've started a feed to reproduce this; I'll let you know the outcome.

richard-julien · 2024-10-05T12:09:08Z

We compute the hash on the full file.
We can't really do much in terms of data control.
To prevent too many works, I'm currently trying not to create any job if there is already something in the queue.
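A minimal sketch of the "don't create a job if something is already in the queue" idea, combined with the full-file hash check. An in-memory `deque` stands in for the real ingestion queue, and `FeedScheduler` / `tick` are hypothetical names; this is not the implementation in the PR.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeedScheduler:
    """Hypothetical per-feed scheduler guarding against queue flooding."""
    queue: deque = field(default_factory=deque)  # stand-in for the ingestion queue
    last_hash: Optional[str] = None               # hash of the last imported file

    def tick(self, content_hash: str, bundles: List[str]) -> bool:
        """Run one polling cycle; return True if a new job was created."""
        if self.queue:
            # Previous bundles not consumed yet: skip this cycle entirely.
            # last_hash is left unchanged so the new file is imported later.
            return False
        if content_hash == self.last_hash:
            # File unchanged since the last import: nothing to do.
            return False
        self.last_hash = content_hash
        self.queue.extend(bundles)
        return True


if __name__ == "__main__":
    s = FeedScheduler()
    print(s.tick("h1", ["b1", "b2"]))  # True: first import
    print(s.tick("h2", ["b3"]))        # False: queue still holds b1, b2
    s.queue.clear()                    # simulate workers draining the queue
    print(s.tick("h2", ["b3"]))        # True: queue empty and hash changed
```

Skipping instead of queueing caps the backlog at roughly one import's worth of bundles per feed, at the cost of delaying updates while the queue drains.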

nino-filigran · 2024-10-07T07:19:48Z

I reproduced your issue, @Lhorus6. Based on your comment, @richard-julien, can I consider this a "won't fix"?

Lhorus6 · 2024-10-07T08:11:00Z

Maybe it's not the hash calculation we need to change, but IMO something has to be done in any case: here we're blowing up the ingestion queues.

Julien said he is trying not to create any job when something is already in the queue, so I guess he is testing possible improvements.

richard-julien · 2024-10-07T08:47:55Z

Yes. A testing PR is open here: #8617

Lhorus6 added the bug and needs triage labels on Oct 3, 2024.

nino-filigran added the needs more info label and removed the needs triage label on Oct 4, 2024.

nino-filigran removed the needs more info label on Oct 7, 2024.

nino-filigran added this to the Bugs backlog milestone on Oct 7, 2024.
