Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with deeply nested directory hierarchies #2088

Closed
georgschoelly opened this issue Jun 17, 2024 · 5 comments
Closed

Crash with deeply nested directory hierarchies #2088

georgschoelly opened this issue Jun 17, 2024 · 5 comments
Labels

Comments

@georgschoelly
Copy link

georgschoelly commented Jun 17, 2024

Describe the bug
Yara ignores deeply nested directories or even crashes due to stack overflow.

To Reproduce

  1. use Ubuntu 22.04.
  2. Download and install Yara 4.5.1: https://yara.readthedocs.io/en/latest/gettingstarted.html
  3. Create a dummy rule:
cat << EOF > matchall.yara
rule MatchAll {
condition:
        true
}
EOF
  1. Set the fd limit to something high. This ensures we have a crash. If it is lower, there is no crash because the fd limit is exhausted. The result then is that the file is not scanned.
ulimit -n 10000
  1. Create a deeply nested file hierarchy b/b/b/b/b/.../file. I chose 600 as the depth because this ensures the path will take more than the 1024 characters defined in MAX_PATH here: https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L42
DEEP_PATH=$(printf 'b/%.0s' {1..600})
mkdir -p $DEEP_PATH; touch $DEEP_PATH/file
  1. Scan this directory:
$ yara -r matchall.yara b
Segmentation fault (core dumped)

Expected behavior

No core dump. Maybe an error indicating that the directory hierarchy is too deep. Something like this:

$ yara -r matchall.yara b
yara -r matchall.yara b/
MatchAll b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/b/.../file

Please complete the following information:

  • OS: Ubuntu 22.04
  • YARA version: 4.5.1

Additional context

The issue is missing error checking in cli/yara.c:

char full_path[MAX_PATH];
snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);

if dir is already the size of full_path, the new path is the same as the old path. This leads to an infinite recursion which ends either in a SIGSEGV (i.e. stack overflow) or once the file descriptor limit has been reached.

In the case of the file descriptor, this means that any sufficiently deeply nested files are ignored.

How to fix

I see these options:

  1. Dynamically allocate full_path. This would make the stack frame much smaller and a stack overflow less likely. This would not solve the fd-limit problem however.
  2. Simply print an error:
int printed;
printed = snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);
if (printed >= sizeof(full_path)) {
    fprintf(
        stderr,
        "skipping %s because the path is longer than %zu bytes\n",
        full_path,
        sizeof(full_path));
    }
    de = readdir(dp);
    continue;
}
  1. Restructure the code to avoid recursion.
@georgschoelly
Copy link
Author

georgschoelly commented Jun 18, 2024

After some more research, I think the best solution is quite simple. The underlining issue is that the code uses MAX_PATH which is a number used only by Windows. On Linux systems we would use PATH_MAX instead. This is not the whole story of PATH_MAX, but it's good enough. 1 2

I therefore propose the following change:

https://github.com/VirusTotal/yara/blob/v4.5.1/cli/yara.c#L676C7-L676C32

- char full_path[MAX_PATH];
+ char full_path[PATH_MAX];

https://github.com/VirusTotal/yara/blob/v4.5.1/libyara/include/yara/limits.h#L41-L43

// Maximum length of file paths. This is the only limit that doesn't have the
// YR_ prefix. The intention is using the default MAX_PATH if defined.
  #ifndef MAX_PATH
- #define MAX_PATH 1024
+ #define MAX_PATH 4096
  #endif

+ #ifndef PATH_MAX
+ #define PATH_MAX 4096
+ #endif

@plusvic
Copy link
Member

plusvic commented Jun 19, 2024

Please try with the code in this branch: https://github.com/VirusTotal/yara/tree/max_path

Let me know if it works for you.

@georgschoelly
Copy link
Author

Thanks for your answer. It works better now, I can scan deeper paths. The stack overflow can still be triggered by using even deeper paths:

DEEP_PATH=$(printf 'b/%.0s' {1..2000})
mkdir -p $DEEP_PATH; touch $DEEP_PATH/file

I can work around this by increasing the stack limit:

ulimit -Ss 16384      // default was 8192

I guess the fundamental problem is that recursion is being used for scanning the file system.

@plusvic
Copy link
Member

plusvic commented Jun 19, 2024

Yes, the right solution would be refactoring the directory walking logic so that it is iterative instead of recursive. A mitigation measure would be imposing a limit to the recursion depth.

@georgschoelly
Copy link
Author

Even with an iterative approach there are problems:

  • The fill descriptor limit is still there.
  • Paths longer than 4096 are possible on Linux and changes in other parts of the Yara code would need to be adapted to support them.

All of this is non-trivial to fix.

However, by dynamically allocating full_path we can get rid of the stack overflow since the stack frame is much smaller. Therefore I would do the following change and then consider the issue solved:

--- yara-max_path/cli/yara.c    2024-06-19 09:22:21.000000000 +0000
+++ yara-max_path2/cli/yara.c   2024-06-19 14:23:42.187843585 +0000
@@ -667,12 +667,14 @@
   {
     struct dirent* de = readdir(dp);

+    char *full_path = calloc(YR_MAX_PATH, sizeof(char));
+    const size_t full_path_size = YR_MAX_PATH * sizeof(char);
+
     while (de && result != ERROR_SCAN_TIMEOUT)
     {
-      char full_path[YR_MAX_PATH];
       struct stat st;

-      snprintf(full_path, sizeof(full_path), "%s/%s", dir, de->d_name);
+      snprintf(full_path, full_path_size, "%s/%s", dir, de->d_name);

       int err = lstat(full_path, &st);

@@ -731,6 +733,7 @@
       de = readdir(dp);
     }

+    free(full_path);
     closedir(dp);
   }

@plusvic plusvic closed this as completed in 2a9f61d Aug 1, 2024
DavidTurland pushed a commit to DavidTurland/yara that referenced this issue Sep 9, 2024
DavidTurland added a commit to DavidTurland/yara that referenced this issue Sep 9, 2024
* Fix crash while parsing PE Rich header

File e77b007c9a964411c5e33afeec18be32c86963b78f3c3e906b28fcf1382f46c3 has a Rich header of only 8 bytes, which is smaller than the RICH_SIGNATURE structure. This was causing a crash when some of the `rich_xxx` functions were used with this file.

* Fix warning

`_rich_version` in PE module should return an `int64_t` instead of `uint64_t`.

* Use YR_MAX_PATH instead of MAX_PATH (VirusTotal#2090)

Replace all instances of `MAX_PATH` with `YR_MAX_PATH`.

* Adding Veeam (VirusTotal#2083)

Adding Veeam to list of companies that use YARA.

* Add Cado to who is using Yara (VirusTotal#2086)

* Mitigate stack overflow when scanning very deep directory trees.

Closes VirusTotal#2088.

* Remove all references to ERROR_TOO_MANY_SCAN_THREADS

This error code is not used anymore. Closes VirusTotal#2068.

* Use latest MacOS in build workflow.

* Use MacOS 13 in build workflow.

For some reason in MacOS 14 the build fails because the `configure` script is unable to find the Jansson library, even thought it is correctly installed by `brew`.

* docs: minor updates to xor (VirusTotal#2098)

* use new module macros in docs (VirusTotal#2100)

Co-authored-by: Tad Keller <logisch@pm.me>

* filemap: define PROC_SUPER_MAGIC, avoid linux/magic.h (VirusTotal#2103)

PR VirusTotal#1848 caused build issues with some "unusual" build configurations
– apparently we can't rely on linux/magic.h being present when
cross-building for musl libc.

Defining PROC_SUPER_MAGIC should not cause a problems since it should
be considered part of the Linux kernel/user API and it is unlikely to
change.

---------

Co-authored-by: Victor M. Alvarez <vmalvarez@virustotal.com>
Co-authored-by: Chris Arceneaux <carcenea@gmail.com>
Co-authored-by: chrisdoman <chris.doman@cantab.net>
Co-authored-by: Wes <5124946+wesinator@users.noreply.github.com>
Co-authored-by: Tad Keller <43346260+GLMONTER@users.noreply.github.com>
Co-authored-by: Tad Keller <logisch@pm.me>
Co-authored-by: Hilko Bengen <bengen@hilluzination.de>
DavidTurland added a commit to DavidTurland/yara that referenced this issue Sep 9, 2024
* Fix crash while parsing PE Rich header

File e77b007c9a964411c5e33afeec18be32c86963b78f3c3e906b28fcf1382f46c3 has a Rich header of only 8 bytes, which is smaller than the RICH_SIGNATURE structure. This was causing a crash when some of the `rich_xxx` functions were used with this file.

* Fix warning

`_rich_version` in PE module should return an `int64_t` instead of `uint64_t`.

* Use YR_MAX_PATH instead of MAX_PATH (VirusTotal#2090)

Replace all instances of `MAX_PATH` with `YR_MAX_PATH`.

* Adding Veeam (VirusTotal#2083)

Adding Veeam to list of companies that use YARA.

* Add Cado to who is using Yara (VirusTotal#2086)

* Mitigate stack overflow when scanning very deep directory trees.

Closes VirusTotal#2088.

* Remove all references to ERROR_TOO_MANY_SCAN_THREADS

This error code is not used anymore. Closes VirusTotal#2068.

* Use latest MacOS in build workflow.

* Use MacOS 13 in build workflow.

For some reason in MacOS 14 the build fails because the `configure` script is unable to find the Jansson library, even thought it is correctly installed by `brew`.

* docs: minor updates to xor (VirusTotal#2098)

* use new module macros in docs (VirusTotal#2100)

Co-authored-by: Tad Keller <logisch@pm.me>

* filemap: define PROC_SUPER_MAGIC, avoid linux/magic.h (VirusTotal#2103)

PR VirusTotal#1848 caused build issues with some "unusual" build configurations
– apparently we can't rely on linux/magic.h being present when
cross-building for musl libc.

Defining PROC_SUPER_MAGIC should not cause a problems since it should
be considered part of the Linux kernel/user API and it is unlikely to
change.

---------

Co-authored-by: Victor M. Alvarez <vmalvarez@virustotal.com>
Co-authored-by: Chris Arceneaux <carcenea@gmail.com>
Co-authored-by: chrisdoman <chris.doman@cantab.net>
Co-authored-by: Wes <5124946+wesinator@users.noreply.github.com>
Co-authored-by: Tad Keller <43346260+GLMONTER@users.noreply.github.com>
Co-authored-by: Tad Keller <logisch@pm.me>
Co-authored-by: Hilko Bengen <bengen@hilluzination.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants