Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix imphash issue with empty import names. #1944

Merged
merged 6 commits into from
Aug 18, 2023

Conversation

wxsBSD
Copy link
Collaborator

@wxsBSD wxsBSD commented Aug 10, 2023

If an import has an empty name skip processing it. This is consistent with the behavior of pefile (https://github.com/erocarrera/pefile/blob/593d094e35198dad92aaf040bef17eb800c8a373/pefile.py#L5871-L5872).

Add a test case which is just the tiny test file with the first import name set to all NULL bytes. I tested that pefile calculates the imphash of it, which matches what YARA now calculates too:

import pefile
pe = pefile.PE('/Users/wxs/src/yara/tests/data/tiny_empty_import_name')
pe.get_imphash()
'0eff3a0eb037af8c1ef0bada984d6af5'

Fixes #1943

If an import has an empty name skip processing it. This is consistent with the
behavior of pefile (https://github.com/erocarrera/pefile/blob/593d094e35198dad92aaf040bef17eb800c8a373/pefile.py#L5871-L5872).

Add a test case which is just the tiny test file with the first import name set
to all NULL bytes. I tested that pefile calculates the imphash of it, which
matches what YARA now calculates too:

>>> import pefile
>>> pe = pefile.PE('/Users/wxs/src/yara/tests/data/tiny_empty_import_name')
>>> pe.get_imphash()
'0eff3a0eb037af8c1ef0bada984d6af5'
>>>

Fixes VirusTotal#1943
@plusvic
Copy link
Member

plusvic commented Aug 10, 2023

The file tests/data/tiny_empty_import_name wasn't added in the PR. Interesting enough, the tests are failing with a memory leak that doesn't seem related to the change done here.

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Aug 10, 2023

Oops, I forgot to add it! Will do so as soon as I'm able.

@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Aug 10, 2023

Please don't merge this yet. There are other problems raised in the linked issue that are being looked into.

wxsBSD added 4 commits August 10, 2023 11:40
If an imported function does not contain ONLY a-zA-Z0-9 and a small subset of
special characters it will now be ignored.  This aligns us better with pefile,
which checks for valid import names and skips them if they are invalid.

I've also updated the test file to check that these special characters are
handled properly.
Turns out that the tiny-idata-5200 file is corrupted to the point that pefile
doesn't parse it. For example, it finds no imports to hash:

```
>>> pe = pefile.PE('tests/data/tiny-idata-5200')
>>> pe.get_imphash()
''
>>>
```

We were parsing imports from this file before these changes that were incorrect,
so fix the tests to reflect the fact that we parse no imports from this file
anymore.

As part of this I've split the checks for number of parsed imports and
successfully parsed imports into two different counters, which now means we are
accurately reflecting when we are able to parse the import table but not the
descriptors in it while still making sure we don't parse too many as we have
seen before.
Move these declaractions so they are with the rest of the constants. This makes
the output of -D easier to read.
If we have an invalid import name we need to free the name and continue to the
next thunk and function. While fixing this I noticed that if we fail to alloc an
IMPORT_FUNCTION* we would end up looping endlessly because we were never
incrementing the thunk pointer or function index. Fix it by ALWAYS incrementing
those at the end of the loop and conditionally populating the newly allocated
node.
@wxsBSD
Copy link
Collaborator Author

wxsBSD commented Aug 11, 2023

The file tests/data/tiny_empty_import_name wasn't added in the PR. Interesting enough, the tests are failing with a memory leak that doesn't seem related to the change done here.

Actually, the memory leak was because of my changes. I was not calling yr_free() if the import had an invalid name.

I also discovered that if we failed to allocate a new IMPORT_FUNCTION we would free the name (correctly) but then continue without incrementing the thunk pointer or function index, which would cause an infinite loop. I've fixed that too.

I think this can be reviewed and merged now.

@plusvic plusvic merged commit e94ec7b into VirusTotal:master Aug 18, 2023
@wxsBSD wxsBSD deleted the imphash_fix branch August 18, 2023 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

yara imphash and pefile imphash are different
2 participants