GSoC 2021 idea: Checker creation helper scripts #1063

Closed
terriko opened this issue Jan 27, 2021 · 9 comments
Labels
gsoc Tasks related to our participation in Google Summer of Code

@terriko
Contributor

terriko commented Jan 27, 2021

While writing #1062 I suggested that we could look into providing some scripts to simplify the process of checker creation, specifically because Windows users don't have access to all the same utility scripts we use, and a lot of that functionality is already built into the tool.

But I think we might want to do this on a grander scale: what would a helper tool for new checkers look like?

Input: a set of .rpm, .deb, .tar.gz or other files that represent packaged versions of the software to be detected, including the product name and version you're expecting to find in each (similar to what we put in our long tests). You'd specifically want multiple versions of the same product here, so we could tell if a string was actually common across versions or not.

Output:

  • a list of common binary filenames found in all packages (maybe filtered for "this is likely to just be a man page" and "this is just a really common filename" type stuff)
  • a list of common strings that could maybe be used to indicate the presence of this library (again, heuristically filtered for stuff that's less likely to be a false positive, so probably longer strings with more human-readable words and ones that include the product name)
  • a list of strings that contain version numbers (as candidates for version detection patterns)

Basically, start by automating the process found in the checkers README to get some candidate strings, maybe even formulate the output as a full checker to minimize cut/paste errors.
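
As a rough illustration of that first step, the sketch below pulls printable strings out of several versions of a binary, keeps the ones common to all of them, and lists the ones that embed each known version number. It is a minimal sketch under stated assumptions, not how cve-bin-tool does it: `extract_strings`, `common_strings`, `version_strings` and `rank_candidates` are hypothetical names, the input is assumed to be already-extracted file contents keyed by the expected version, and the 4-character minimum simply mirrors the default of the `strings` utility.

```python
import re
from typing import Dict, Iterable, List, Set

# Runs of 4 or more printable ASCII bytes, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data: bytes) -> Set[str]:
    """Return the set of printable ASCII strings found in a binary blob."""
    return {m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)}


def common_strings(binaries: Dict[str, bytes]) -> Set[str]:
    """Strings present in every supplied version (candidate 'contains' patterns)."""
    sets = [extract_strings(data) for data in binaries.values()]
    return set.intersection(*sets) if sets else set()


def version_strings(binaries: Dict[str, bytes]) -> Dict[str, List[str]]:
    """For each expected version, the strings that embed that version number."""
    return {
        version: sorted(s for s in extract_strings(data) if version in s)
        for version, data in binaries.items()
    }


def rank_candidates(strings: Iterable[str], product: str) -> List[str]:
    """Heuristic ordering (an assumption): product name first, then longer strings."""
    name = product.lower()
    return sorted(strings, key=lambda s: (name not in s.lower(), -len(s), s))
```

Keys of `binaries` are the version you expect to find in each file, so `version_strings` only reports strings that actually contain that version, and `rank_candidates` puts longer strings that mention the product name at the top of the list for a human to review.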

Once you get that working, you could probably iterate to make it better:

  • make a utility to check a candidate checker against existing packages (maybe even all the test files we have?) and filter out any suggestion that might generate false positives
  • use the utility to figure out a list of common false positive filenames/patterns that occur in many files
  • try to guess actual simple regexes for version number detection (based on the numbers found in the NVD database)
  • once you start to know what "really common filename" looks like, add tests in cve-bin-tool to warn if the checker detects them
  • look at some of the "signature needs work" checkers where a signature couldn't be found and see if you can develop something based on strings that vary from version to version.
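
The false-positive check in the first bullet could be sketched roughly as follows. This is only an illustration under assumptions: `filter_false_positives` and `safe_candidates` are hypothetical helpers, and the "existing packages" are represented as a plain mapping from filename to file contents rather than anything read through cve-bin-tool's own extraction code.

```python
from typing import Dict, Iterable, List


def filter_false_positives(
    candidates: Iterable[str], unrelated: Dict[str, bytes]
) -> Dict[str, List[str]]:
    """Map each candidate string to the unrelated files that also contain it."""
    report: Dict[str, List[str]] = {}
    for candidate in candidates:
        needle = candidate.encode("ascii", errors="ignore")
        # An empty needle matches every file, so it is reported as a
        # false positive everywhere, which is the conservative outcome.
        report[candidate] = sorted(
            name for name, data in unrelated.items() if needle in data
        )
    return report


def safe_candidates(report: Dict[str, List[str]]) -> List[str]:
    """Candidates that did not show up in any unrelated file."""
    return sorted(c for c, hits in report.items() if not hits)
```

Counting how often each string appears in the report across many packages would also give a first cut at the list of common false positive patterns mentioned in the second bullet.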

In the course of writing this tool, I expect you'd also be able to add new checkers to cve-bin-tool as you built and tested it.

@terriko terriko added the gsoc Tasks related to our participation in Google Summer of Code label Jan 27, 2021
@terriko
Contributor Author

terriko commented Jan 27, 2021

Note that this script wouldn't take very long to write -- I'd guess a week of work for an inexperienced student, a day or two for an experienced one, so if applying for this idea you'll want to spend time working out related tasks and improvements to make a full 175 hours' worth of work. You could probably pad it out with "and then I spend the rest of my summer trying it on every package in CentOS" if you had to.

@Alabhya268

@terriko, I'm new to this project and I want to contribute. Could you please tell me what tech stack, besides Python, is necessary to get started with this project?

Thanks in advance

@terriko
Contributor Author

terriko commented Feb 3, 2021

@Alabhya268 it depends on what you mean by "tech stack", but for this particular project idea you'd need to know Python and become comfortable with a number of file utilities commonly used on Linux (e.g. file, grep, various extraction tools, strings, and so on). As you might expect, if you want to write checker helper scripts you'd need to understand how to write a checker, so the checker readme gives some more details. The ideal candidate for this particular project would have successfully written at least one checker.

If you want to know more generally for all projects across cve-bin-tool, our general gsoc questions thread is the right place to post questions.

@terriko
Contributor Author

terriko commented Mar 10, 2021

Adding a link for clarity: this script would likely start by reproducing what we describe in detail in the checkers README. This is the part that I think should only take a week (or potentially less), since it's already pretty well-documented. From there, finessing it to improve the quality of checkers you build from it, testing it extensively against real packages, documenting and writing unittests for it will likely take up the rest of the summer.

@CabTheProgrammer
Contributor

Hey everyone, is this issue still available? I'd like to attempt to tackle it, so I have been following the instructions on the GSoC page.

@terriko
Contributor Author

terriko commented Mar 24, 2021

@CabTheProgrammer Yes, it's still available. Since this is a GSoC project idea and not a regular issue, it's "available" up until we choose a candidate for the project and we likely won't merge contributions related to it before the project announcements go out on May 17th.

If you're looking for bugs to fix as part of a GSoC application, this is not good for that, and you'd likely want to take a look at the things marked "good first issue" instead.

@imsahil007
Contributor

Steps you can follow, @peb-peb:

  1. Recursively extract the package (.deb, .rpm, .tar.gz, and so on) using the existing extraction code.
  2. Optionally prompt the user for a filename if they want to search only a small subset of the files within the package.
  3. The files most worth extracting strings from often have the checker name in their filename. For example, the avahi package contains usr/bin/avahi-daemon, which holds the related strings.
  4. If the user is sure which version they are scanning and a filename contains that version string, then that file can be considered a common filename as well.
  5. Next, scan the strings within the files identified above. You can look at existing checkers in cve-bin-tool to see the most common patterns. As far as I know, most checkers match on the binary name, its version and sometimes the compiler used. For example, gcc or glibc may contain gcc/ldd, its version string and "GNU".
  6. Create a set of regexes that search for version strings within the binary.

Some things to note:

  • Avoid false positives (these may happen with GNU- or Oracle-related binaries).
  • Avoid really common names that may exist in many packages.
  • Never consider the version string alone as a positive filename string.
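
Step 6 and the last note can be illustrated with a small sketch that turns one string containing a known version into a regex with a capture group. `guess_version_pattern` is a hypothetical helper, not an existing cve-bin-tool function, and it assumes purely numeric dotted versions (something like 1.0.2k would need extra handling). It refuses a bare version string, since that is too generic to serve as a signature.

```python
import re

# Assumption for this sketch: versions are dotted numbers such as 1.0.2.
NUMERIC_VERSION = re.compile(r"[0-9]+(?:\.[0-9]+)+")


def guess_version_pattern(sample: str, version: str) -> str:
    """Build a regex from `sample` in which the known version becomes a group."""
    if not NUMERIC_VERSION.fullmatch(version):
        raise ValueError("only dotted numeric versions are handled in this sketch")
    prefix, found, suffix = sample.partition(version)
    if not found:
        raise ValueError("version does not appear in the sample string")
    # Require some surrounding text so the pattern is not just a version number.
    if not re.search(r"[A-Za-z]", prefix + suffix):
        raise ValueError("a bare version string is too generic to use")
    components = version.count(".") + 1
    group = r"\.".join([r"[0-9]+"] * components)
    return f"{re.escape(prefix)}({group}){re.escape(suffix)}"
```

For instance, a pattern built from a string found in one version can then be tried against strings taken from other versions of the same product, to confirm that it captures each of their version numbers.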

@peb-peb
Contributor

peb-peb commented Jun 8, 2021

@imsahil007 Thank you for the help! I'll remember all these points while making the script :)

@terriko terriko added this to the 3.0 milestone Jun 23, 2021
@terriko
Contributor Author

terriko commented Aug 18, 2021

Finished as part of GSoC 2021

@terriko terriko closed this as completed Aug 18, 2021