Have CAPE adopt MACO format #1037

Merged
merged 23 commits into from
Aug 28, 2022

cccs-rs
Contributor

@cccs-rs cccs-rs commented Aug 2, 2022

Converts existing CAPE parsers to output according to MACO's output model.

This is just an initial run with limited knowledge of what the parsers are meant to extract using previous results in capesandbox.com.
Some refinement may be needed on both the model's and the parsers' end to make sure the result is beneficial for all!

@doomedraven
Collaborator

Amazing work, thank you @cccs-rs. @kevoreilly it looks good to me; do you want to take a look, or press merge directly?

@kevoreilly
Owner

Thanks - looks like I should test these before merging

@cccs-rs
Contributor Author

cccs-rs commented Aug 3, 2022

For sure! The one thing I noticed is the output is more nested under MACO which I believe wasn't the case for CAPE, so maybe flattening is required?

@doomedraven
Collaborator

i hope to test this this weekend and merge if no problems

@doomedraven doomedraven merged commit bbee903 into kevoreilly:master Aug 28, 2022
@doomedraven
Collaborator

Thank you guys

kevoreilly added a commit that referenced this pull request Sep 9, 2022
@kevoreilly
Owner

I reverted this due to undesirable changes in representation of the parser output, for example QakBot:

image

which should appear instead as:

image

I am happy to work on this myself; I will just need to test each parser individually to ensure the new output is still acceptable in the web UI as well as in exported form.

@cccs-rs
Contributor Author

cccs-rs commented Oct 26, 2022

What do you think about having the data transform back to a flattened state for presentation in the UI? I can amend my PR to include that change specifically for CAPE in cape_utils?

This at least keeps the presentation in its original state while adopting the new format for the parsers.
cccs-rs@8d0e128
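
For illustration, a flattening step of this kind could look roughly like the sketch below. This is not the code from the commit linked above. The function name `flatten_for_display`, the connection keys `tcp`/`udp`, and the fields `server_ip`/`server_port` are assumptions about the structured output, and the flat `address` key mirrors the existing CAPE presentation.

```python
from typing import Any, Dict, List

# Assumed names of the nested connection lists in the structured output.
CONNECTION_KEYS = ("tcp", "udp")


def flatten_for_display(structured: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse nested connection entries into 'ip:port' strings.

    Hypothetical helper: every other key is passed through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in structured.items():
        if key in CONNECTION_KEYS and isinstance(value, list):
            addresses: List[str] = flat.setdefault("address", [])
            for conn in value:
                # Only entries carrying both fields can be rendered as ip:port.
                if "server_ip" in conn and "server_port" in conn:
                    addresses.append(f"{conn['server_ip']}:{conn['server_port']}")
        else:
            flat[key] = value
    return flat
```

With this shape, one `address` list replaces the repeated per-connection labels in the UI, while the structured form stays available for export.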

@cccs-rs
Contributor Author

cccs-rs commented Nov 14, 2022

@kevoreilly @doomedraven thoughts?

@kevoreilly
Owner

Sorry for the delay in my reply - I have been pondering this. I think there is both an aesthetic and a technical point to be made here. QakBot serves as a good example: in the malware the IP address and port are stored together. I think malware reverse engineers and other CAPE users will expect to see the config appear in the CAPE UI in the same form as in the malware, in what I would call 'raw' form. And from a purely aesthetic point of view, multiple repetitions of a label (e.g. "server_port") look bad.

My feeling is that to convert the config from the 'raw' malware into Maco then back into some third form for display in cape does not seem optimal. Ultimately if the config can be represented concisely in the same way that it appears in the malware then perhaps it does not matter, but I can't help but feel it might be better to do things the other way around: parse the data from the malware into 'raw' form which can be represented as it is currently, then transform into Maco for export/API and optional additional display in the web ui?

@cccs-rs
Contributor Author

cccs-rs commented Nov 16, 2022

The problem that I noticed with creating a translation layer from CAPE to MACO, let's say for an export API, is that there isn't a standard for the CAPE output (at least not one that I noticed).

So it would be hard to map from raw to MACO. However, going from MACO to a flatter version of MACO is doable and should play well with the UI as it currently stands.

@kevoreilly
Owner

Well I guess the 'standard' output is defined by the code in the cape processing module. The design is a list of dicts at the top level representing families. Each dict has only one key which is the family name with the value another list of dicts, config item and values. Then more nested dicts/lists as needed basically.

If you can map from this form to MACO that could be a nice way to handle this. For example, here is a QakBot config in this form:

{
    "QakBot": {
        "Loader Build": [
            "404.30"
        ],
        "address": [
            [
                "23.240.47.58:995",
                "12.172.173.82:465",
                "91.169.12.198:32100",
                "94.63.65.146:443",
                "80.13.179.151:2222",
                "64.207.237.118:443",
                "24.206.27.39:443",
                "83.114.60.6:2222",
                "86.171.75.63:443",
                "86.195.32.149:2222",
                "170.253.25.35:443",
                "92.185.204.18:2078",
                "157.231.42.190:995",
                "170.249.59.153:443",
                "174.101.111.4:443",
                "116.74.163.152:443",
                "76.80.180.154:995",
                "180.151.104.143:443",
                "86.130.9.167:2222",
                "86.99.15.243:2222",
                "90.104.22.28:2222",
                "172.117.139.142:995",
                "103.141.50.117:995",
                "176.142.207.63:443",
                "71.183.236.133:443",
                "131.106.168.223:443",
                "190.75.110.239:443",
                "70.66.199.12:443",
                "183.87.31.34:443",
                "83.110.223.247:443",
                "47.34.30.133:443",
                "71.247.10.63:995",
                "92.207.132.174:2222",
                "89.129.109.27:2222",
                "12.172.173.82:21",
                "87.202.101.164:50000",
                "2.99.47.198:2222",
                "154.247.95.119:2078",
                "197.148.17.17:2078",
                "37.14.229.220:2222",
                "78.247.21.20:443",
                "112.141.184.246:995",
                "142.161.27.232:2222",
                "71.247.10.63:50003",
                "108.6.249.139:443",
                "92.239.81.124:443",
                "184.176.154.83:995",
                "184.153.132.82:443",
                "74.66.134.24:443",
                "24.64.114.59:3389",
                "105.184.161.242:443",
                "73.36.196.11:443",
                "82.31.37.241:443",
                "24.116.45.121:443",
                "213.67.255.57:2222",
                "200.93.14.206:2222",
                "91.254.215.167:443",
                "87.220.205.14:2222",
                "92.27.86.48:2222",
                "73.230.28.7:443",
                "176.151.15.101:443",
                "24.64.114.59:2222",
                "86.165.15.180:2222",
                "66.191.69.18:995",
                "175.205.2.54:443",
                "64.121.161.102:443",
                "87.99.116.47:443",
                "180.156.240.239:995",
                "12.172.173.82:22",
                "50.68.204.71:995",
                "213.91.235.146:443",
                "174.77.209.5:443",
                "76.127.192.23:443",
                "50.68.204.71:443",
                "109.11.175.42:2222",
                "199.83.165.233:443",
                "91.68.227.219:443",
                "45.248.169.101:443",
                "85.59.61.52:2222",
                "85.139.176.42:2222",
                "82.34.170.37:443",
                "157.231.42.190:443",
                "76.20.42.45:443",
                "27.110.134.202:995",
                "89.115.196.99:443",
                "83.11.84.105:2222",
                "12.172.173.82:2087",
                "12.172.173.82:443",
                "181.118.183.116:443",
                "174.45.15.123:443",
                "77.126.81.208:443",
                "92.106.70.62:2222",
                "82.121.73.56:2222",
                "173.239.94.212:443",
                "187.199.224.16:32103",
                "183.82.100.110:2222",
                "186.188.2.193:443",
                "41.62.227.225:443",
                "75.99.125.238:2222",
                "2.84.98.228:2222",
                "82.121.237.106:2222",
                "100.6.8.7:443",
                "85.241.180.94:443",
                "79.37.204.67:443",
                "217.128.91.196:2222",
                "58.247.115.126:995",
                "12.172.173.82:993",
                "98.147.155.235:443",
                "102.157.69.217:995",
                "212.251.122.147:995",
                "92.137.74.174:2222",
                "24.228.132.224:2222",
                "69.119.123.159:2222",
                "89.79.229.50:443",
                "47.176.30.75:443",
                "174.104.184.149:443",
                "173.32.181.236:443",
                "74.92.243.113:50000",
                "12.172.173.82:995",
                "58.186.75.42:443"
            ]
        ],
        "Campaign ID": [
            "BB06"
        ],
        "Config timestamp": [
            "11:06:37 17-11-2022"
        ]
    }
}
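
As a rough sketch of the mapping discussed here, the function below takes a config in the form shown above and produces a MACO-style dictionary. The output keys (`family`, `campaign_id`, `tcp`, `server_ip`, `server_port`, `usage`, `other`) are assumptions about the target model, and so is treating every QakBot address as a C2 endpoint. The function name `qakbot_raw_to_maco` is hypothetical.

```python
from typing import Any, Dict


def qakbot_raw_to_maco(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a 'raw' QakBot config to a MACO-style dict (illustrative only)."""
    config = raw["QakBot"]
    maco: Dict[str, Any] = {"family": "QakBot", "tcp": [], "other": {}}

    # 'address' is a list of lists of "ip:port" strings in the raw form.
    for group in config.get("address", []):
        for entry in group:
            ip, _, port = entry.rpartition(":")
            maco["tcp"].append(
                # Assumption: QakBot addresses are C2 servers.
                {"server_ip": ip, "server_port": int(port), "usage": "c2"}
            )

    if config.get("Campaign ID"):
        maco["campaign_id"] = list(config["Campaign ID"])

    # Fields with no obvious structured home are kept verbatim under 'other'.
    for key in ("Loader Build", "Config timestamp"):
        if key in config:
            maco["other"][key] = config[key]
    return maco
```

The raw form stays the source of truth for display, and the conversion is a separate step that can be run only when structured output is wanted.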

@cccs-rs
Contributor Author

cccs-rs commented Nov 23, 2022

Right, which is doable if, let's say, all parsers agreed on using the same set of fields rather than each parser defining fields at will (which is what makes mapping difficult, because it's unpredictable), even though they all conform to the same format that the rest of the system accepts.

An example of what I mean is Azorult using address to refer to C2 domains, Emotet using the field for IP:PORT C2 addresses, and BuerLoader using the field as well (but not sure if those addresses are C2-related or not.)

So for me, in that particular scenario, it becomes a guessing game of: does 'address' refer to C2 URLs, IPs, domains, etc or is there another usage associated to those connections like downloading/uploading/etc.?

So going from a structured output to a less structured one is easier than the reverse, but I do understand your point that CAPE users would like to see the data in its 'raw' form.

As a possible solution, what if each parser had a function to convert from raw to MACO? From your perspective, you would run the parsers as-is and render the output as you do now. In our case, we would run your parser, feed the raw output through the parser-specific conversion, and take the result.

@kevoreilly
Owner

Yes I really like this idea - there are already multiple 'entry points' in some parsers, where cape calls extract_config() but standalone calls __main__() and a test harness may call test_them_all() so we could trivially add another function for MACO which wraps extract_config() and the conversion function.
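
A minimal sketch of that extra entry point, under stated assumptions: `extract_config()` here is only a stand-in for a parser's existing function, and `convert_to_maco()` and `extract_maco()` are hypothetical names, as are the output fields.

```python
from typing import Any, Dict, Optional


def extract_config(data: bytes) -> Optional[Dict[str, Any]]:
    """Stand-in for a parser's existing CAPE entry point ('raw' output)."""
    if not data:
        return None
    # Placeholder result; a real parser would decode this from the sample.
    return {"address": ["192.0.2.10:443"]}


def convert_to_maco(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Parser-specific conversion from raw form to a MACO-style dict."""
    tcp = []
    for entry in raw.get("address", []):
        ip, _, port = entry.rpartition(":")
        tcp.append({"server_ip": ip, "server_port": int(port)})
    return {"tcp": tcp}


def extract_maco(data: bytes) -> Optional[Dict[str, Any]]:
    """Additional entry point: wraps extract_config() and the conversion."""
    raw = extract_config(data)
    if raw is None:
        return None
    return convert_to_maco(raw)
```

CAPE would keep calling `extract_config()` and rendering its output unchanged, while a MACO consumer would call `extract_maco()` on the same parser.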

4 participants