
Signature update services may not expose new signatures for workers immediately #232

Open
kam193 opened this issue Jun 4, 2024 · 0 comments
Labels: assess (We still haven't decided if this will be worked on or not), bug (Something isn't working)

Comments

kam193 commented Jun 4, 2024

Describe the bug
I originally observed this when testing my own service, but I was also able to reproduce it with the YARA service. When updating signatures from a source, the updater service (the base class from the AL library) sends the new data to Elasticsearch (do_source_update), then notifies another thread to fetch a new signature package back from Elasticsearch (do_local_update), which finally serves it to the worker services.
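
To make the handoff concrete, here is a rough schematic of that two-step flow. This is not the actual Assemblyline updater code; the function bodies, the Elasticsearch index name and the serve_to_workers callback are illustrative placeholders:

```python
# Schematic of the updater flow described above (illustrative only).
import threading

source_updated = threading.Event()

def do_source_update(es_client, bulk_actions):
    # Step 1: push the freshly fetched signatures to Elasticsearch, then
    # signal the local-update thread that new data should be available.
    es_client.bulk(body=bulk_actions)
    source_updated.set()

def do_local_update(es_client, serve_to_workers):
    # Step 2: once notified, read the signatures back from Elasticsearch and
    # build the package served to the workers. If the index has not been
    # refreshed yet, this read still returns the previous signatures.
    while source_updated.wait():
        source_updated.clear()
        package = es_client.search(index="signature",
                                   body={"query": {"match_all": {}}})
        serve_to_workers(package)
```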

When uploading data to Elastic, the updater (through the signature client) makes a synchronous call, but does not ask Elasticsearch to wait for the shards to be refreshed. By default, Elastic completes the bulk request independently of the refresh. So if Elastic isn't quick enough, or the updater isn't slow enough, the new signatures are not yet visible when the updater requests a new signature package. As a result, the new updates are not downloaded back to the updater and are not exposed to the workers until the next local-file update (worst case: the next scheduled update, e.g. the next day).
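
A minimal sketch of the underlying Elasticsearch behaviour, using the elasticsearch Python client directly rather than the Assemblyline datastore wrapper (the index name and document below are made up for illustration; the second bulk call shows the refresh option discussed further down):

```python
# Demonstrates why a read right after a default bulk call can miss new data.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

actions = [{
    "_index": "signature",   # hypothetical index name
    "_id": "example_rule",
    "_source": {"name": "example_rule", "last_modified": "2024-06-04T12:00:00Z"},
}]

# Default bulk: returns as soon as the documents are written, before the
# shard refresh, so an immediate search may still return the old data.
helpers.bulk(es, actions)
stale = es.search(index="signature", body={"query": {"match_all": {}}})

# With refresh="wait_for", the bulk call blocks until the documents are
# visible to search, which closes the race window.
helpers.bulk(es, actions, refresh="wait_for")
fresh = es.search(index="signature", body={"query": {"match_all": {}}})
```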

To Reproduce
Steps to reproduce the behavior:

  1. Set up the YARA service with an update source you control, for example with a single signature.
  2. Start the updater and wait until the data is downloaded. Check your data in the signature viewer and the signature in the local file inside the updater container.
  3. Change your source data, e.g. by editing the signature. Trigger an update (with a source hosted on GitHub, I needed to wait a little until their caches refreshed).
  4. See the update happening in the logs (e.g. Imported 1/1 signatures from example into Assemblyline), but also a No signature updates available. log shortly after.
  5. Check the data in the signature viewer (it should be the newest version) and compare it with the signature package in the service (it still contains the older version!).
  6. Trigger the update once more and observe the refreshed signature package.

Expected behavior
After a successful update from a source, the data is available for workers to download immediately or within a short time.

The bulk API from Elastic exposes a refresh parameter that can, among other options, ask Elastic to wait for the refresh. By default, it does not wait. I tested this by hardcoding the wait_for value in the bulk() method in datastore/collection.py from the assemblyline_base repo, and it fixed the problem. However, I don't know whether the parameter was intentionally left unset, or whether setting it has other side effects elsewhere.
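
For reference, this is roughly the kind of change I tested, written as a sketch only; the attribute names (with_retries, datastore.client) are my guesses at the wrapper's internals, not a quote of the real datastore/collection.py:

```python
# Sketch of hardcoding wait_for in the collection's bulk() method
# (attribute and method names are assumptions, not the real code).
def bulk(self, operations):
    return self.with_retries(
        self.datastore.client.bulk,
        body=operations,
        # Make Elasticsearch wait until the written documents are searchable
        # before returning; by default it does not wait for the refresh.
        refresh="wait_for",
    )
```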

Screenshots

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.0.29, Yara 4.5.10
  • Elasticsearch: still v7

Additional context
During debugging, I confirmed that Elasticsearch is simply returning the old last-modified timestamp; the issue is not in the service itself. This is a race condition, and I'm aware it may be harder to spot with more sources or a different Elastic configuration. I believe the behaviour should generally be consistent, and unless it's just that Elasticsearch in my configuration is slow, the impact may sometimes be rather big (e.g. YARA rules only being used a day later than expected).

kam193 added the assess and bug labels on Jun 4, 2024