Introduce tags for strings #1
Comments
@danielplohmann I'd invite you to take a peek at our research called QUANTUMSTRAND:
Notably, we've been collecting databases of "tags" for strings that we find in executable files, including:
Perhaps we could collaborate on these ideas and/or database contents?
Hey @williballenthin, thanks so much for the pointer to your research! I'm currently reprocessing Malpedia with floss-3.0.1 and have also included a selection of benign code for which strings are extracted on the fly. Today, I used my ApiScout DbBuilder to parse all of the DLLs found on Win10 and diffed the result against your API collection, with the following results (qs_api_additions.zip):
Depending on whether you want the "full" or just the "common" set (based on prior research around WinAPI usage frequency), that's already a couple thousand additional WinAPI functions. Further ideas for tagging that I had (a bunch of them possibly identifiable by regex as well) are
Do you also have a list of ideas for entities somewhere that could be merged with these?
Thank you! We'll add these to the databases for even greater coverage.
Nothing more thorough than what you listed here. We imagined that analysts might be able to contribute regular expressions alongside some metadata (comments, tags, etc.) to be rendered nicely. Like "this is the OpenSSL version string" or "if you see this, panic!". I guess it's yet another rule format, but meant for extensibility of tools, not detection. Personally, I think these various databases are pretty useful, but I'm not quite sure when and how they'll see action. Maybe QS takes off. Or maybe we'll do an IDA plugin. Thoughts?
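To make the "regex plus metadata" idea above a bit more concrete, here is a minimal sketch of what such an annotation rule could look like. Everything in it is an assumption for illustration: the `StringRule` class, its field names, and the example pattern are hypothetical and not an existing QUANTUMSTRAND or FLOSS format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StringRule:
    """Hypothetical annotation rule: a regex plus metadata shown next to a match."""
    name: str
    pattern: str
    comment: str
    tags: list = field(default_factory=list)

    def __post_init__(self):
        # Compile once so the rule can be applied to many strings cheaply.
        self._regex = re.compile(self.pattern)

    def matches(self, s: str) -> bool:
        return self._regex.search(s) is not None


# Illustrative rule only; the name and pattern are assumptions.
RULES = [
    StringRule(
        name="openssl-version",
        pattern=r"OpenSSL \d+\.\d+\.\d+[a-z]*",
        comment="this is the OpenSSL version string",
        tags=["library"],
    ),
]


def annotate(s, rules=RULES):
    """Return (name, comment, tags) for every rule that matches the string."""
    return [(r.name, r.comment, r.tags) for r in rules if r.matches(s)]
```

A tool would call `annotate()` on each extracted string and render the returned comments and tags beside it. The rules only add context for display and do not make any detection decision.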
Okay, sounds good! With respect to the tags I listed above, I went ahead and created some simple heuristics to apply them to the strings.
I created string DBs for file-extensions, LOLBAS, language-ids (like
For operationalization, I definitely had similar thoughts, especially for an IDA plugin as a demo use case. The key challenge seems to be filtering out trash strings, on which I will spend a bit more time, I guess...
What if we took a large number of capa runs and joined "string references by function" with "capa matches by function" so that we could say "this string is often associated with... [DNS resolution or whatever]"? (and we could do the reverse: "this capa rule is often associated with the strings...")
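A rough sketch of that join, assuming both data sets have already been reduced to plain dictionaries keyed by function (for example `(sample, address)`). The function names and the input layout are hypothetical; this does not use capa's actual result format.

```python
from collections import Counter, defaultdict


def cooccurrence(strings_by_function, rules_by_function):
    """Count how often each string shares a function with each rule match.

    Both inputs map a function key, e.g. (sample, address), to an iterable
    of strings or rule names. This input layout is an assumption.
    """
    counts = defaultdict(Counter)
    for func, strings in strings_by_function.items():
        rules = set(rules_by_function.get(func, ()))
        for s in set(strings):
            counts[s].update(rules)
    return counts


def top_rules(counts, s, n=3):
    """Rules most often seen in the same function as string s."""
    return counts.get(s, Counter()).most_common(n)


def invert(counts):
    """Reverse view: for each rule, how often each string co-occurs with it."""
    by_rule = defaultdict(Counter)
    for s, rule_counts in counts.items():
        for rule, n in rule_counts.items():
            by_rule[rule][s] += n
    return by_rule
```

Raw counts favor strings and rules that are common everywhere, so a real analysis would probably need some normalization on top of this.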
Add tags to strings to give them semantic context.
A taxonomy could encompass for example:
winapi: strings that are associated with Windows DLL files or WinAPI names
benign/library: strings that are found in benign software and/or libraries, like deflate etc.
compiler: strings that are introduced as metadata by compilers
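As a minimal sketch of how such a taxonomy could be applied, the snippet below looks each string up in per-tag databases. The database contents, the tag keys, and the `tag_string` helper are all hypothetical placeholders; real databases would be much larger and loaded from files.

```python
# Hypothetical, tiny tag databases for illustration only.
TAG_DATABASES = {
    "winapi": {"kernel32.dll", "CreateFileW", "VirtualAlloc"},
    "library": {"deflate 1.2.11 Copyright 1995-2017 Jean-loup Gailly and Mark Adler"},
    "compiler": {"GCC: (GNU) 9.3.0"},
}


def tag_string(s, databases=TAG_DATABASES):
    """Return the sorted list of tags whose database contains the string exactly."""
    needle = s.strip()
    return sorted(tag for tag, entries in databases.items() if needle in entries)


def tag_strings(strings, databases=TAG_DATABASES):
    """Map each string to its tags; untagged strings get an empty list."""
    return {s: tag_string(s, databases) for s in strings}
```

Exact matching keeps the sketch simple. Case-insensitive matching for DLL names or regex-based tags would be natural extensions.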