Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fix allmatch #687

Merged
merged 56 commits into from
Oct 19, 2022
Merged

Feature/fix allmatch #687

merged 56 commits into from
Oct 19, 2022

Conversation

micahsnyder
Copy link
Contributor

Overhaul of the allmatch system. In allmatch mode, CL_VIRUS not returned until the very end. Removes need for allmatch checks scattered throughout the codebase.

@lgtm-com
Copy link

lgtm-com bot commented Sep 7, 2022

This pull request fixes 1 alert when merging 9243b2e into cba4246 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@ragusaa
Copy link
Contributor

ragusaa commented Sep 13, 2022

I put a comment in CLAM-2083

@micahsnyder
Copy link
Contributor Author

Just did a rebase to get things up to date, to resolve the merge conflicts and squash the commits that needed squashing.

@lgtm-com
Copy link

lgtm-com bot commented Sep 19, 2022

This pull request fixes 1 alert when merging a944aeb into 0301808 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 20, 2022

This pull request fixes 1 alert when merging fe86c44 into 0301808 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2022

This pull request fixes 1 alert when merging a2740e2 into bafb26b - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2022

This pull request fixes 1 alert when merging cc001bc into bafb26b - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2022

This pull request fixes 1 alert when merging 60b6837 into bafb26b - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 23, 2022

This pull request fixes 1 alert when merging ca6c589 into f720b00 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 26, 2022

This pull request fixes 1 alert when merging 774700c into f720b00 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@lgtm-com
Copy link

lgtm-com bot commented Sep 26, 2022

This pull request fixes 1 alert when merging a4a0d35 into f720b00 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@micahsnyder
Copy link
Contributor Author

I found that the section offset commit increased ram usage by a bit (50-ish MB?) and also increased load time by a lot (60 seconds give or take).
So I removed the section offset fix commit and moved it to a separate branch for further investigation: https://github.com/micahsnyder/clamav-micah/tree/CLAM-2094-section-offset-acpatt

@lgtm-com
Copy link

lgtm-com bot commented Sep 27, 2022

This pull request fixes 1 alert when merging 95369d3 into 197113c - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@micahsnyder
Copy link
Contributor Author

I just removed the "reduce RAM usage" commit and moved it to a separate branch so we can test and review it separate. I found that it had some false positive issues relating to offset sigs during regression testing: https://github.com/micahsnyder/clamav-micah/tree/CLAM-2128-reduce-ac-patt-mem-usage

Since I found a different way to reduce RAM usage in this PR back to normal levels (moving the section offset bug fix to a separate branch as well), this change not as important to merge at the same time.

@lgtm-com
Copy link

lgtm-com bot commented Sep 27, 2022

This pull request fixes 1 alert when merging f2f4a91 into 197113c - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@micahsnyder micahsnyder marked this pull request as ready for review September 29, 2022 21:05
libclamav/7z_iface.c Outdated Show resolved Hide resolved
And fix issue where critical errors in magic_scan done by pdf_scan_contents() may be ignored.
And fix issue where call to magic_scan may not propagate critical errors.
It appears the result from the _x509_to_pem() function was accidentally hardcoded to a `0` (success)
This does not change the ABI because enums are ints. Though it may add warnings.
Fix issue where critical errors from magic_scan may not be propagated.

Fix issue where file name and descriptor may be leaked if keeptmp is enabled.
And fix issue where magic_scan() critical error may be ignored.
And fix issue where magic_scan() critical error may be ignored.
And fixed a couple of conditions where critical scan errors may be ignored.
Also fixed a number of conditions where magic_scan() critical errors may
be ignored.

To ensure that the scan truly aborts for signature matches (not in
allmatch mode) and for timeouts, the `ctx->abort` option is now set in
these two conditions, and checked in several spots in magic_scan().

Additionally, I've consolidated some of the "scan must halt" type of
checks (mostly large switch statements) into a function so that we can
use the exact same logic in a few places in magic_scan().

I've also fixed a few minor warnings and code format issues.
The header parsing / executable metadata collecting functions for the
PE, ELF, and Mach-O file types were using `int` for the return type.
Mostly they were returning 0 for success and -1, -2, -3, or -4 for
failure. But in some cases they were returning cl_error_t enum values
for failure. Regardless, the function using them was treating 0 as
success and non-zero as failure, which it stored as -1 ... every time.

This commit switches them all to use cl_error_t.  I am continuing to
storeo the final result as 0 / -1 in the `peinfo` struct, but outside of
that everything has been made consistent.

While I was working on that, I got a tad side tracked.  I noticed that
the target type isn't an enum, or even a set of #defines. So I made an
enum and then changed the code that uses target types to use the enum.

I also removed the `target` parameter from a number of functions that
don't actually use it at all. Some recursion was masking the fact that
it was an unused parameter which is why there was no warning about it.
The cli_bc_ctx->outfd struct member was not properly initialized to -1.
Perhaps previous developers figured 0 was invalid-enough. All of the
checks for that file descriptor assumed 0 was the invalid value, going
so far as to explicitly set outfd to 0 if `open()` returned -1.
I didn't know this, so when I cleaned up the error handling in
`cli_unpackelf()` and `cli_unpackmacho()`, I had it `close(outfd)` when
not -1. That of course ended up closing stdin... and then all subsequent
file scans opened the file as fd `0`,... which interestingly caused
`read()` and `stat()` errors, but only after scanning a macho or elf
file, first.

Anyways... this commit fixes the issue by properly initializing outfd to
-1, and by changing any checks from 0 to -1.

I also found that it appears that the bytecode timeout wasn't being
applied to bytecode functions associated with logical signaures (that
is, those run by `cli_bytecode_runlsig()`).
What I see is that `ctx->bytecode_timeout` is only set to a non-zero
value in `cli_bytecode_context_alloc()`.
But for `cli_bytecode_runlsig()`, the bytecode context sits on the stack
and is memset instead. To resolve this, and ensure the bytecode context
is properly initialized, I created a new function that does this and
had it do the memset instead of using a calloc in the allocation
function.

I also removed the `bytecode_context_clear()` function because it simply
called `bytecode_context_reset()` and then did a memset. The memset is
unnecessary, especially since in most cases it's memsetting a stack
structure immediately before a return.
Because of an oversight, the `ctx->abort_scan` flag was propagating
timeout and perhaps other error types to the final verdict.
Fixed this in the `result_should_goto_done()` function.

Some non-fatal errors (notably CL_ETIMEOUT) may appear between the call
to the magic-scan, and the end of scan_common().
We need to filter these status results at this level as well to prevent
this from happening.

I also fixed a warning accidentally introduced earlier caused by trying
to use a string after free'ing it and setting it to NULL.

And I explicitly set the status to CL_SUCCESS if the cache check comes
back clean, rather than trusting that no one had messed with the `ret`
variable since it was let set (to CL_SUCCESS). It's more readable and
less prone to future bugs.
Fix pretty silly double-free, and fix issue where 'extracted_file' was
not being set to true.

Also rename `uncompressed` variable to `uncompressed_block`, because
there is another variable with the same name, above.
The `cli_memcpy` function is a version of memcpy that will catch
exceptions on Windows if the memcpy fails because the memory-mapped file
fails to read (like if a network drive is disconnected or something).
It presently returns an `int`. Since it is only used in one spot, I've
swapped it over to a `cl_error_t`, and created a unique variable so
we're not piggybacking with the `cli_file_t` enum used to check the
result of a filetyping scan.
Error handling for bytecode sig evaluation, as well as logical and yara
sig evaluation only consider CL_VIRUS or CL_SUCCESS and fail to
consider terminal errors such as CL_ETIMEOUT.

This commit fixes the error handling in those functions so we properly
abort instead of continuing to evaluate more sigs.

This commit also adds some a scan time limit checks:
1. shortly after a bytecode time limit exceeded.
2. after evaluating each lsig or yara sig.
3. when processing unzip central directory file headers.
In the logical sig evaluation function, we are calling the run bytecode
function even if the lsig doesn't depend on bytecode. That function
immediately returns CL_ENULLARG, which previously was ignored because we
were only checking for CL_VIRUS. Now that we're checking if NOT success,
the error propagates and scanning of that layer ends early.

The above bug is actually in some duplicate code, so when fixing this I
went ahead and refactored the function to remove this, and added
additional comments and readability improvements.
Some files that take longer to scan may exceed the scan time by a fair
bit during the actual pattern matching part of the scan (vs.
decompressing archives or running bytecode functions, or whatever).

ClamAV does pattern matching and hashing of files in chunks.
This change adds a time check before each chunk is matched.
@micahsnyder
Copy link
Contributor Author

Rebased and fixed massive merge conflict caused by archive inspection PR.

@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2022

This pull request fixes 1 alert when merging 155ca1b into cf81299 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

Some small files that appear to include archive content (zip entries,
RAR archives, etc) later in the file are no longer set to that SFX type
initially.  However, the current way we skip those drops the initial
type detection on the floor, meaning that something like a small image
may end up as CL_TYPE_ANY instead of CL_TYPE_PNG.

This commit fixes that by changing where in the process the SFX types
are ignored.
@lgtm-com
Copy link

lgtm-com bot commented Oct 18, 2022

This pull request fixes 1 alert when merging efa8a01 into cf81299 - view on LGTM.com

fixed alerts:

  • 1 for Empty branch of conditional

@micahsnyder micahsnyder merged commit 9d8af63 into main Oct 19, 2022
@micahsnyder micahsnyder deleted the feature/fix-allmatch branch October 19, 2022 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants