Dockerize Every Analyzer #607

Closed
wants to merge 32 commits into from

@milesflo milesflo commented Feb 8, 2020

Addresses: #606

Changes:

  1. Convert analyzer JSON file versions from a loose versioning scheme to the semver standard

    • This will pay dividends w/ specific analyzer version management & Docker deployment
  2. Creation of 2 helper scripts for Dockerfile mgmt

    1. utils/dockerfile_builder.py
      • Utilizing the relevant config file(s) in each Analyzer, create a Dockerfile w/ proper
        • Base image (either hardcoded w/ the key baseImage in config or derived from source code shebang)
        • Entrypoint (derived from config command key)
        • Dependency installation
        • Metadata (description, author, name)
    2. utils/build_analyzers.py
      • Iterates through the Analyzers directory and builds Images for later Cortex execution
  3. Inclusion of built Dockerfiles

  4. Miscellaneous housekeeping

    • Some shebangs were incorrectly defined
    • git-based pip sources should be a last resort; prefer PyPI packages where they exist
    • Updated catalogue via update_catalogue.sh
    • AutoFocus was incorrectly stylized throughout the project
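For reviewers who want the shape of the builder before opening the diff, here is a minimal sketch of the Dockerfile assembly described in item 2. The names `base_image`, `render_dockerfile` and `SHEBANG_IMAGES`, the `/worker` path, and the exact instruction layout are assumptions made for illustration, not the contents of utils/dockerfile_builder.py. Only the config keys (baseImage, command, name, description, author) come from the description above.

```python
import json

# Assumed mapping from the interpreter named in the shebang to a base image.
SHEBANG_IMAGES = {
    "python3": "python:3-alpine",
    "python2": "python:2-alpine",
}


def base_image(config, first_source_line):
    """Prefer the explicit baseImage key, otherwise derive it from the shebang."""
    if "baseImage" in config:
        return config["baseImage"]
    if not first_source_line.startswith("#!"):
        raise ValueError("no baseImage key and no shebang to derive one from")
    # '#!/usr/bin/env python3' and '#!/usr/bin/python3' both yield 'python3'.
    interpreter = first_source_line[2:].split()[-1].rsplit("/", 1)[-1]
    return SHEBANG_IMAGES[interpreter]


def render_dockerfile(config, first_source_line, has_requirements=True):
    """Return Dockerfile text for one analyzer (hypothetical layout)."""
    lines = ["FROM {}".format(base_image(config, first_source_line))]
    for key in ("name", "description", "author"):
        if key in config:
            # json.dumps gives a safely quoted LABEL value.
            lines.append("LABEL {}={}".format(key, json.dumps(config[key])))
    lines.append("WORKDIR /worker")
    lines.append("COPY . /worker")
    if has_requirements:
        lines.append("RUN pip install --no-cache-dir -r requirements.txt")
    lines.append("ENTRYPOINT [{}]".format(json.dumps(config["command"])))
    return "\n".join(lines) + "\n"
```

An explicit baseImage always wins over the shebang in this sketch, which matches the "hardcoded or derived" order given above.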

Next Steps:

  1. Include these scripts in the pre-commit hook to ensure that Dockerfiles are always up to date
  2. Deploy these using the docker image push command on each of these built images at every major release, utilizing the catalog system already in place to act as a dockerImage name directory.

Known Issues:

Currently, some of the Analyzers have niche dependencies that break the automated Dockerfile generation logic... We can either change the builder script, or redesign the analyzers to use more native APIs.

Broken Analyzers:

  • Yara
    • gcc does not ship with python:2-alpine
    • Resolved w/ dependency analysis
  • Malpedia
    • gcc does not ship with python:3-alpine
    • Resolved w/ dependency analysis
  • EmlParser
    • gcc does not ship with python:3-alpine
    • Resolved w/ dependency analysis
  • FileInfo
    • This one has a bunch of errors... May want to write a whole edge case just for this one analyzer. It has some apt-level dependencies that probably can't be imported from another base image
    • Reverted to manual state. Could not find a way to use python:3-alpine with the libfuzzy-dev library.

@milesflo milesflo requested review from nadouani, jeromeleonard and To-om and removed request for nadouani February 8, 2020 00:43
@milesflo milesflo changed the title Adding Dockerfiles to analyzers Dockerize Every Analyzer Feb 8, 2020
@milesflo
Author

Note: libfuzzy-dev is not available through apk, the Alpine package manager. I've reached out to the team requesting they add it, as it's valuable to what we're doing here.

@nadouani
Contributor

nadouani commented Feb 11, 2020

Hello @milesflo. Your changes look great, but I don't know how you expect us to accept a PR that changes almost 300 files. How are we supposed to review that? How much time are we supposed to put into reviewing it? Anyways

@milesflo
Author

milesflo commented Feb 11, 2020

@nadouani Hello! I included a script at tests/analyzer_docker.test, which should help: it verifies that the Dockerfiles function properly w/o needing to test them individually...

I also fixed a lot of things that were broken in the recent commit that forced the jump from python 2 => 3... Between that, and the changes to JSON schemas that just tweak the version numbers, it's not as much as it looks.
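To give a sense of how small the version tweaks are, here is a minimal sketch of a loose-to-semver normalization. The function name `to_semver` and the padding rule are assumptions; the PR text does not show the exact rule that was applied to the JSON files.

```python
import re


def to_semver(version):
    """Pad a loose dotted version such as '1.0' or '2' out to MAJOR.MINOR.PATCH."""
    parts = str(version).strip().split(".")
    if not 1 <= len(parts) <= 3 or not all(re.fullmatch(r"\d+", p) for p in parts):
        raise ValueError("cannot normalize version: {!r}".format(version))
    parts += ["0"] * (3 - len(parts))
    # Semver forbids leading zeros in numeric identifiers, so '02' becomes '2'.
    return ".".join(str(int(p)) for p in parts)
```

Under this rule a config that declared "1.0" would become "1.0.0", and anything that is not one to three numeric components is rejected rather than guessed at.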

@nadouani
Contributor

The PR here is supposed to just deal with things related to Dockerize Every Analyzer, which is not exactly what I see in the changes you are making.

Some of your changes have already been made by 2.4.0 for example.

And in general, the rule is here: https://github.com/TheHive-Project/CortexDocs/blob/master/api/how-to-create-an-analyzer.md#create-a-pull-request

@milesflo
Author

RE that last commit message, that was a bit of frustration on my part trying to debug the Abuse_Finder analyzer... turns out the dependency tree is broken! 🎉

joepie91/python-whois#151 (comment)

Fixed by specifying an older version of the dep

@nadouani
Contributor

Fixed by specifying an older version of the dep

So within a PR that creates a tool to automate Dockerfile creation for analyzers, you fix something else (abuse_finder)! Not sure this is the right way.

@milesflo
Author

milesflo commented Feb 11, 2020

@nadouani Can't disagree with you that the scope of this one crept up... Thank the test script & packaging for that... I didn't feel comfortable leaving anything broken once I knew it was there. 😄

I could always abstract these out into different PRs if you think that'll be less work for the team... I started to do that with Issues created & referenced back to this.

@@ -14,39 +14,45 @@
# See the License for the specific language governing permissions and
# limitations under the License.

# Python3 compatability by: https://github.com/guyddr/dnsdb-query
Author

This is a Python3 version of this vendor script we imported from elsewhere. Link listed here.

@@ -0,0 +1,185 @@
#!/usr/bin/env python3
Author

@nadouani Hey it might help if you start from here... This is the script that I used to build those dockerfiles.

It uses the relevant requirements.txt to create a list of Alpine dependencies, then updates the file if any new changes are detected.

Author

@jeromeleonard Psst over here ☝️

if requirements_path.exists():
    requirements = requirements_path.open().read()

if 'yara-python' in requirements:
Author

This is the real meat here... 'dependency analysis' might be a bit generous considering how iterative the approach here is. Thankfully a lot of these use the same, common reversing libraries.
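A minimal sketch of the same idea in table form, in case it helps while reading the chain of checks. The table `APK_FOR_PIP` and the helper names are hypothetical, and the package lists are illustrative assumptions. Only the yara-python check and the `RUN apk add --no-cache` line format appear in the diff.

```python
import re

# Hypothetical table: pip requirement name -> Alpine packages needed to build it.
APK_FOR_PIP = {
    "yara-python": {"gcc", "musl-dev"},
    "lxml": {"gcc", "musl-dev", "libxml2-dev", "libxslt-dev"},
}


def requirement_names(requirements_text):
    """Extract lowercase package names from requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Cut at the first version specifier, marker, extra, or space.
        names.add(re.split(r"[<>=!~;\[ ]", line, maxsplit=1)[0].lower())
    return names


def alpine_dependencies(requirements_text):
    """Union of the apk packages required by every known pip requirement."""
    deps = set()
    for name in requirement_names(requirements_text):
        deps |= APK_FOR_PIP.get(name, set())
    return deps


def apk_line(deps):
    """Render the install line in the same format the builder emits."""
    return "RUN apk add --no-cache {}\n".format(" ".join(sorted(deps)))
```

Sorting the package names keeps the generated line stable between runs, so an unchanged requirements.txt does not produce a spurious Dockerfile diff.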

dockerfile = dockerfile.read()

# Dockerfiles with this string will be frozen; skip.
if '### MANUAL ###' in dockerfile:
Author

If you wanna freeze a Dockerfile in time, add this string anywhere (preferably the first line for clarity).

Used this as a compromise for when an analyzer is just too niche to build programmatically.
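Roughly, the freeze check combined with the update-only-on-change behavior could be sketched like this. The function name `write_if_unfrozen` and its return values are assumptions for illustration. Only the `### MANUAL ###` marker comes from the diff.

```python
from pathlib import Path

MANUAL_MARKER = "### MANUAL ###"


def write_if_unfrozen(path, new_contents):
    """Write new_contents unless the file is frozen or already up to date.

    Returns 'frozen', 'unchanged', or 'written'.
    """
    path = Path(path)
    if path.exists():
        current = path.read_text()
        if MANUAL_MARKER in current:
            return "frozen"  # hand-maintained Dockerfile, leave it alone
        if current == new_contents:
            return "unchanged"
    path.write_text(new_contents)
    return "written"
```

Checking the marker before comparing contents means a frozen file is never rewritten, even when the generator would produce different output.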

@@ -0,0 +1,64 @@
#!/usr/bin/env python3

# Build analyzers from Dockerfiles locally
Author

This script is made a bit redundant by the build_analyzers test suite... The idea was that you could use this script to, as quickly as possible (via concurrency), build all the docker images locally.
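A minimal sketch of the concurrent build idea, assuming one directory per analyzer with a Dockerfile at its top level. The function names, the tag scheme, and the worker count are assumptions, not the contents of utils/build_analyzers.py. The `build` parameter is injectable so the traversal can be exercised without Docker installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def docker_build(analyzer_dir):
    """Build one image and return (analyzer name, success flag)."""
    # Image tags must be lowercase; this naming scheme is an assumption.
    tag = "cortex-analyzer-{}".format(analyzer_dir.name.lower())
    result = subprocess.run(
        ["docker", "build", "-t", tag, str(analyzer_dir)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return analyzer_dir.name, result.returncode == 0


def build_all(analyzers_root, build=docker_build, max_workers=4):
    """Build every directory that contains a Dockerfile, several at a time."""
    dirs = sorted(p.parent for p in Path(analyzers_root).glob("*/Dockerfile"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(build, dirs))
```

Threads are enough here because each worker spends its time waiting on the `docker build` subprocess rather than running Python code.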


success_msg = '{"success": false, "input": {}, "errorMessage": "Input file doesnt exist"}'

config_required_msg=""" File "/usr/local/lib/python3.8/site-packages/cortexutils/worker.py", line 31, in __init__
Author

This error message is produced by executing an Analyzer without piping in a relevant job config. Despite it being a stack trace, its presence indicates that the Python interpreter did not detect any syntax errors or missing libraries, which counts as a success in terms of packaging for deployment.
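The check can be sketched as a small classifier over the container output. The function name `packaging_ok` and the list of failure markers are assumptions; the two expected markers are substrings of the messages quoted in the diff above.

```python
# Substrings of the two expected messages quoted in the diff.
EXPECTED_MARKERS = (
    '"errorMessage": "Input file doesnt exist"',
    "cortexutils/worker.py",
)

# Assumed signs of a genuine packaging failure.
FAILURE_MARKERS = ("SyntaxError", "ModuleNotFoundError", "ImportError")


def packaging_ok(output):
    """True if the analyzer started far enough to complain about its input."""
    if any(marker in output for marker in FAILURE_MARKERS):
        return False
    return any(marker in output for marker in EXPECTED_MARKERS)
```

Failure markers are checked first, so a trace that passes through cortexutils/worker.py but ends in an import error is still reported as a failure.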

dockerfile_contents.append('RUN apk add --no-cache {}\n'.format(' '.join(sorted(alpine_dependencies))))


labels = dict()
Author

From here down are the LABEL metadata tags. Analyzers with multiple authors get an authors label holding a CSV of the contributors' names in alphabetical order.
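A minimal sketch of that author handling, assuming the names arrive as a list of strings. The function name `author_label`, the case-insensitive sort, and the bare-comma separator are assumptions; the diff below this point is the authoritative version.

```python
import json


def author_label(authors):
    """Return a LABEL line: 'author' for one name, 'authors' with a sorted CSV otherwise."""
    names = sorted({a.strip() for a in authors if a.strip()}, key=str.lower)
    if not names:
        return ""
    key = "author" if len(names) == 1 else "authors"
    # json.dumps quotes the value so spaces in names are safe in a LABEL.
    return "LABEL {}={}\n".format(key, json.dumps(",".join(names)))
```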

@milesflo
Author

@nadouani Logging off as I'm on the west coast. Your timing is good: that last push was the final change needed to make sure that all of the analyzer tests work.

Assume this is the final state unless you need edits.

@milesflo milesflo changed the base branch from master to develop February 12, 2020 08:19
@milesflo
Author

milesflo commented Feb 12, 2020

I'm opening a PR for each of these Analyzers as your contribution guide requested @nadouani

@nadouani
Contributor

I'm opening a PR for each of these Analyzers as your contribution guide requested @nadouani

Hey man I didn't ask for that. When did I say "create a PR per Dockerfile"? I've just said that you have fixed python3 related issues on a PR dealing with the Docker stuff. That's all.

And note that this stuff has a lower priority for what we have to implement for TheHive/Cortex etc... We won't review this immediately.

@milesflo
Author

Breakout:

This commit contains Dockerfiles for all analyzers. This PR is a superset of the following PRs:
#621 #622 #623 #624 #625 #626 #627 #628 #629 #630 #631 #632 #633 #634 #635 #636 #637 #638 #639 #640 #641 #642 #643 #644 #645 #646 #647 #648 #649 #650 #651 #652 #653 #654 #655 #656 #657 #658 #659 #660 #661 #662 #663 #664 #665 #666 #667 #668 #669 #670 #671 #672 #673 #674 #675 #676 #677 #678 #679 #680 #681 #682 #683 #684 #685 #686 #687 #688 #689 #690 #691 #692 #693 #694

@milesflo
Author

I'm opening a PR for each of these Analyzers as your contribution guide requested @nadouani

Hey man I didn't ask for that. When did I say "create a PR per Dockerfile"? I've just said that you have fixed python3 related issues on a PR dealing with the Docker stuff. That's all.

And note that this stuff has a lower priority for what we have to implement for TheHive/Cortex etc... We won't review this immediately.

You said the scale of this PR was too large. You linked the docs which say:

Create one Pull Request per analyzer against the develop branch of the Cortex-Analyzers repository. Reference the issue you've created in your PR.

I'm following your lead here... I don't know what else you could have been addressing. Anyways

@dadokkio
Contributor

Hello @milesflo, can I ask whether the current Docker situation for the analyzers is OK, or whether there are still issues?
Can we close all the Docker-related pull requests?

@milesflo milesflo closed this Sep 23, 2020
@milesflo
Author

@dadokkio Got you, I've automated closing them all
