Improving parser's metadata & output #781

Closed
cccs-jp opened this issue Mar 1, 2022 · 21 comments

Comments

@cccs-jp

cccs-jp commented Mar 1, 2022

Hi CAPE Team!

First, let me thank you for all the hard work put into this project over the years. IoCs extracted by your parsers are helping Canada be a safer place.
I’m reaching out from the Assemblyline team at the Canadian Cyber Centre (Home - Assemblyline 4, cybercentrecanada.github.io).

We’ve noticed that you recently did some cleanup in your parsers, and we are wondering if you would be open to a few extra changes to make it easier to integrate and consume the parsers’ data.
A few ideas:

Finally, one common problem for us is the lack of standardized output between parsers and frameworks.
We think it would be easy to come up with a simple (key:value) ontology for frequently extracted types of information. It would help standardize fields across parsers (IP, domains, mutex, etc.) while still allowing flexibility for custom fields. It could be a simple library imported in the parsers. MWCP did provide a bit of that, but it looks like you are moving away from it.
We would be happy to contribute via PRs.
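
As a rough illustration of the key:value idea above, here is a minimal sketch of such a library. The names `STANDARD_FIELDS` and `normalize`, the chosen keys, and the `other` bucket for custom fields are all assumptions made for this example; nothing here is an existing CAPE or MWCP API.

```python
# Hypothetical sketch: STANDARD_FIELDS and normalize() are illustrative names,
# not part of CAPE, MWCP, or any existing library.
from typing import Any, Dict

# Standardized keys and the Python type each value must have (assumed set).
STANDARD_FIELDS: Dict[str, type] = {
    "family": str,
    "ip": list,
    "domain": list,
    "mutex": list,
}


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate standardized keys and collect everything else under 'other'."""
    result: Dict[str, Any] = {"other": {}}
    for key, value in raw.items():
        expected = STANDARD_FIELDS.get(key)
        if expected is None:
            # Unknown keys stay available as custom fields.
            result["other"][key] = value
        elif isinstance(value, expected):
            result[key] = value
        else:
            raise TypeError(
                f"{key!r} must be {expected.__name__}, got {type(value).__name__}"
            )
    return result
```

A parser would return its usual dictionary and pass it through `normalize()` before handing it to the consumer, so standardized keys are type-checked and custom keys are preserved rather than rejected.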

Cheers from Canada! 👋

@doomedraven
Collaborator

Hello, yes we are open to any improvements/suggestions. So by parts:

  • Adding the family name is fine, but there is a question here: which name do you want to use? Or should we make it a list and include all names for family X, like Malpedia does?

  • What do you mean by TLP? The TLP of the parser, or something else?

  • The problem with separating extractors into their own repos is that you need to write a loader for each repo. We integrate all the frameworks, and we had to write 2 different loaders to cope with how different frameworks handle their plugins. I do understand that it is easy to do a git pull to get updated external modules, but I think it is easier to write a small wrapper that does the git pull and moves the files as an update than to handle each repo and how it stores files. We have also added a note about parsers saying that we prefer pure Python parsers: if you search the issues, you will see that in the past MWCP was breaking different extractors depending on the version you had installed. Most of those repos also have a short life span, and their extractors become outdated pretty fast.

  • About key names: yes, it is a known problem, but that was mostly due to the different frameworks, as each used its own specific fields. If you have a suggestion, we are happy to hear it. I personally don't use any public extractors, even the ones from CAPE.
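
The "small wrapper" from the third point could look roughly like the sketch below. It is only an illustration under assumptions: `update_parsers`, its arguments, and the choice to flatten every `*.py` file into one destination directory are hypothetical, and the `run` parameter exists only so the git call can be replaced in tests.

```python
# Hypothetical sketch: update_parsers() and its arguments are illustrative,
# not an existing CAPE helper.
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def update_parsers(
    repo_dir: Path,
    dest_dir: Path,
    run: Callable[..., object] = subprocess.run,
) -> List[Path]:
    """Fast-forward the external repo, then copy its *.py files into dest_dir."""
    # `git -C <dir> pull --ff-only` updates the checkout without creating merges.
    run(["git", "-C", str(repo_dir), "pull", "--ff-only"], check=True)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Files are flattened into dest_dir; same-named files overwrite each other.
    for src in sorted(repo_dir.rglob("*.py")):
        target = dest_dir / src.name
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```

Each external repo lays out its files differently, so a real wrapper would likely need a per-repo source path instead of the blanket `rglob` used here.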

Cheers from Kyiv.

@cccs-jp
Author

cccs-jp commented Mar 2, 2022

Hi @doomedraven, thanks for your quick reply.

1 & 2: Making DESCRIPTION, AUTHOR, FAMILY (the malware the parser is for), and TLP (the TLP of the parser) mandatory would go a long way for us, e.g.:

FAMILY = "EVILGRAB"
DESCRIPTION = "EvilGrab configuration parser."
AUTHOR = "kevoreilly"
TLP = "WHITE"
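
For illustration, a loader could check that a parser module defines these four fields before registering it. This is a minimal sketch, assuming each field is a non-empty string as in the example above; `REQUIRED_METADATA` and `missing_metadata` are hypothetical names, not part of CAPE.

```python
# Hypothetical sketch: REQUIRED_METADATA and missing_metadata() are
# illustrative names, not part of CAPE.
from types import ModuleType
from typing import List

REQUIRED_METADATA = ("FAMILY", "DESCRIPTION", "AUTHOR", "TLP")


def missing_metadata(parser_module: ModuleType) -> List[str]:
    """Return the required attributes that are absent or not non-empty strings."""
    missing = []
    for name in REQUIRED_METADATA:
        value = getattr(parser_module, name, None)
        if not isinstance(value, str) or not value.strip():
            missing.append(name)
    return missing
```

A loader could skip, or at least log, any module for which `missing_metadata()` returns a non-empty list.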

3: OK, we understand!
4: We will propose a format with a small validation library; you can see if you like it and want to adopt it in CAPE. I will get back to you on that one.

Thanks!

@doomedraven
Collaborator

sounds good, thank you

@kevoreilly
Owner

I'm certainly happy to hear that parsers from CAPE are helping Canada - great to hear. Certainly no reason we can't add those fields and use a validation library.

Out of interest, do you make use of the sandbox as a whole, or just the static configuration parsers? I am curious and note that while there is a Cuckoo service in AssemblyLine, there is no support for CAPE. Perhaps there is some feedback the project could benefit from here? Certainly one of the main reasons for the existence of this project is the automated unpacking of malware, which would seem to me to be a perfect fit for a service in AssemblyLine.

@cccs-rs
Contributor

cccs-rs commented Mar 3, 2022

We're currently working on revamping our ConfigExtractor service which only handles MWCP, Malduck (via MWCFG), and RATDecoders.

We'd like to add CAPE to the roster, but as we were rewriting the backend library, we noticed there were discrepancies between the output from different frameworks. So standardizing the output goes a long way for Assemblyline because it helps us tag the output correctly.

A big drawback of the service is that you have to bake in the parsers at build time to make use of them (so as you can imagine, to keep up with you guys we'd have to build several times per day 😜).

We'd like to change that service to allow the users to dynamically add parsers for the service to use from Assemblyline (hence the ask to separate the parser modules into their own repo but we can get around that).

@doomedraven
Collaborator

In the end, you will need to write a "proxy" for those frameworks to translate field names, as everyone uses whatever names they want. The differing dependencies of each framework are also a pain, which is why I moved everything from those frameworks to pure Python.

@cccs-rs
Contributor

cccs-rs commented Mar 3, 2022

Exactly - handling the runtime for the different frameworks isn't so much the issue, it's the output that comes out.
So at the very least I would need some mapping that says 'XYZ' for CAPE and 'PQR' for MWCP map to 'ABC' in this general schema.
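
Such a mapping could be sketched as a per-framework rename table. The field names below are just the placeholders from this comment ('XYZ', 'PQR', 'ABC'), and `FIELD_MAP` and `to_general_schema` are hypothetical names rather than an existing API.

```python
# Hypothetical sketch: the framework field names below are the placeholders
# from the comment above, not real CAPE or MWCP keys.
from typing import Any, Dict

FIELD_MAP: Dict[str, Dict[str, str]] = {
    "CAPE": {"XYZ": "ABC"},
    "MWCP": {"PQR": "ABC"},
}


def to_general_schema(framework: str, output: Dict[str, Any]) -> Dict[str, Any]:
    """Rename framework-specific keys; keys without a mapping pass through unchanged."""
    mapping = FIELD_MAP.get(framework, {})
    return {mapping.get(key, key): value for key, value in output.items()}
```

Passing unmapped keys through unchanged keeps custom fields intact, at the cost of not flagging fields that nobody has mapped yet.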

But at the same time, I can't really enforce the idea 'oh you must use these fields and they must be this type' and so on (but it would simplify things for the general public + new parser writers). 🤷‍♂️

@doomedraven
Collaborator

so guys any update on this? i don't see any pull request with those changes

@cccs-jp
Author

cccs-jp commented Apr 30, 2022

Yes, please give us a bit more time; we have an ontology to propose. We've been sidetracked by other issues.
@cccs-rs @cccs-ay

@cccs-jp
Author

cccs-jp commented Apr 30, 2022

Hi @kevoreilly, just noticed your reply on this thread! There is an Assemblyline service for CAPE maintained by NVISO (https://github.com/NVISOsecurity/assemblyline-service-cape).

We're in the process of investigating a replacement for Cuckoo, as it's showing its age. So far we have had great success running https://github.com/hasherezade/hollows_hunter inside Cuckoo and using various parsers (including yours) to extract the config from HH's memory dumps. One reason we need to decouple the sandbox from the parsers is that we get memory dumps submitted directly.

@kevoreilly
Owner

kevoreilly commented Apr 30, 2022

Well I bet that's nowhere near as good as cape itself!

@doomedraven
Collaborator

Hello all, as we solved the initial issue, I'm going to close it. If you want to discuss something else related to this subject, just post a new message here.

@cccs-rs
Contributor

cccs-rs commented Jun 23, 2022

@doomedraven We now have an official ontology (and a framework) that we'd like to propose for use:
https://github.com/CybercentreCanada/Maco

Any feedback on what you guys think is much appreciated!

@doomedraven
Collaborator

i will be out for almost 1.5 from tomorrow. but i will try to find some time to get a look

@cccs-rs
Contributor

cccs-rs commented Jul 5, 2022

Any updates on this?

@doomedraven
Collaborator

sorry totally forgot about this. framework looks good

@cccs-jp
Author

cccs-jp commented Jul 5, 2022

Can you let us know if you have some interest in adopting it? It would help standardize extractor output, which would be great for exporting the data to other systems (e.g. storing it in a DB). We are looking at supporting CAPE's extractors in our standalone config extractor service, so MACO would help with that as well.

Quick update too: the Assemblyline team is now fully on board with CAPE (as you've seen with recent PRs), and we just released the official CAPE integration: https://github.com/CybercentreCanada/assemblyline-service-cape

Thanks for your work on this project, more collaboration incoming!

@doomedraven
Collaborator

doomedraven commented Jul 6, 2022 via email

@cccs-rs
Contributor

cccs-rs commented Jul 6, 2022

I'm currently working on the library that will run the different frameworks and doing service testing, but once I get that finalized, I can start on a PR to convert the CAPE parsers' output to the MaCo format.

@doomedraven
Collaborator

that would be amazing, thank you

@cccs-rs
Contributor

cccs-rs commented Aug 2, 2022

Initial PR with the first round of conversions: #1037
