Logic improvements for tasking #1020

Merged: 12 commits, Jul 30, 2022

@cccs-kevin (Collaborator) commented Jul 21, 2022

The goal of this PR started out as being able to handle tasks that require machines with specific tags, but then it got WAY bigger!

Things that I have improved:

  • conf/az.conf, modules/machinery/az.py
    • Added more configurations and abilities to the Azure machinery
      • Added ability to use different resource groups for the virtual network and the resources associated with sandboxing (virtual machine scale sets, disks, network interface cards, etc)
      • Added the ability to manually enter the number of cores for an instance type rather than programmatically finding this value
      • Added the ability to not reset the pool size of each virtual machine scale set when CAPE restarts
      • Added the ability to wait or not wait for victim agents when CAPE restarts
  • conf/web.conf, lib/cuckoo/common/web_utils.py, lib/cuckoo/core/database.py
    • Added configurations to web.conf that will toggle the dynamic assignments of required machine platform and architecture for a task
  • lib/cuckoo/common/abstracts.py, lib/cuckoo/core/database.py
    • Added ability to acquire/lock/list machines based on the architecture
  • lib/cuckoo/common/web_utils.py
    • If a value representing a boolean is passed to the REST API, it can be true/false, yes/no, on/off, or 0/1, in any letter case.
  • utils/db_migration/versions/2_3_3_expand_error_message.py, lib/cuckoo/core/database.py
    • Changed the size of the Error table message field because it could not contain the CuckooGuestCriticalTimeout error
      • Created a migration script with alembic
      • Updated the SCHEMA_VERSION of the database.py file
  • lib/cuckoo/core/scheduler.py, lib/cuckoo/core/database.py
    • Refactored the way that tasks are assigned to machines. The original approach iterated through available machines and then determined whether a task matched each machine. This is inefficient: we should be iterating through pending tasks rather than available machines. My solution:
      • Use an is_relevant_machine_available method that determines whether a machine is relevant to a task and available to be assigned
      • Reworked the fetch method into fetch_task for tasks that do not require a VM. This really simplifies the logic for this method.
  • lib/cuckoo/core/database.py
    • In the guest_stop method, we should check if the guest exists before trying to shut it down. A missing guest can occur in dynamic machineries such as az or aws.
    • Modified the logic of list_machines so that you can also filter by locked as True/False, machine label and the machine architecture
    • Improved the logging in lock_machine to include the selection criteria that were attempted
    • Just like in Cuckoo, get_available_machines now joins on tags
    • Added options_not_like in list_tasks since this is a way that we can replicate the use of not_("node=") in the original fetch method https://github.com/kevoreilly/CAPEv2/blob/master/lib/cuckoo/core/database.py#L815
    • Added the ability to pass a tuple as the order_by keyword argument in list_tasks, so that tasks can be ordered by multiple items
  • lib/cuckoo/core/plugins.py
    • Improving exception logging
  • lib/cuckoo/core/resultserver.py
    • Nitpick here for consistency across logs
  • lib/cuckoo/core/scheduler.py
    • In the check_file method, we should check if sample exists before trying to access its sha256 attribute.
    • Before calling acquire, determine which task tags are the architecture and which are just tags, and then pass these values to acquire
    • Improve logging by detailing what the task requires if it has been selected off of the stack and there is no machine available yet
    • Just like in Cuckoo, add exception clause for the CuckooGuestCriticalTimeout error that will unlock the machine
    • Improved the periodic log so that it includes details about tasks and machines that are useful in determining why tasks are not being pulled off of the stack
  • modules/machinery/az.py
    • Check if the number_of_new_cpus_available according to the quota is less than 0, and if so, set the minimum number_of_relevant_machines_required. This was causing errors when the 5 VM buffer was too close to the limit.
    • Improving the logging of exceptions
  • tests/test_scheduler.py
    • Fixing test for acquire and mock_tags
  • utils/cleaners.py
    • If both mongodb and elasticsearch were disabled, pending tasks could not be deleted using this utility, so check if web reporting is enabled first.
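
To illustrate the scheduler refactor listed above, here is a minimal sketch of iterating over pending tasks rather than available machines. The `Machine` and `Task` dataclasses and the `find_relevant_machine` and `assign_tasks` helpers are hypothetical stand-ins, not CAPE's actual classes. Only the name `is_relevant_machine_available` comes from this PR, and its real signature may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Machine:
    # Hypothetical stand-in for a machine record; not CAPE's real model.
    label: str
    platform: str
    arch: str
    tags: Set[str] = field(default_factory=set)
    locked: bool = False


@dataclass
class Task:
    # Hypothetical stand-in for a pending task and its requirements.
    id: int
    platform: Optional[str] = None
    arch: Optional[str] = None
    tags: Set[str] = field(default_factory=set)


def find_relevant_machine(task: Task, machines: List[Machine]) -> Optional[Machine]:
    """Return the first unlocked machine that satisfies every task requirement."""
    for machine in machines:
        if machine.locked:
            continue
        if task.platform and machine.platform != task.platform:
            continue
        if task.arch and machine.arch != task.arch:
            continue
        if not task.tags <= machine.tags:  # every task tag must be on the machine
            continue
        return machine
    return None


def is_relevant_machine_available(task: Task, machines: List[Machine]) -> bool:
    """True if some machine is both relevant to the task and free to be assigned."""
    return find_relevant_machine(task, machines) is not None


def assign_tasks(pending: List[Task], machines: List[Machine]) -> Dict[int, str]:
    """Walk the pending tasks in order and lock a matching machine for each one."""
    assignments: Dict[int, str] = {}
    for task in pending:
        machine = find_relevant_machine(task, machines)
        if machine is None:
            continue  # no suitable machine yet; the task stays pending
        machine.locked = True
        assignments[task.id] = machine.label
    return assignments
```

Driving the loop from the task queue means a task with unusual requirements is only skipped when no suitable machine is free, instead of depending on which machine happens to be examined first.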

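The REST API boolean handling listed above can be sketched as follows. `parse_bool` is a hypothetical name, not necessarily the helper in web_utils.py, and raising `ValueError` for unrecognized input is an assumption rather than documented behaviour.

```python
# Accepted spellings, taken from the PR description: true/false, yes/no, on/off, 0/1.
_TRUE_VALUES = {"true", "yes", "on", "1"}
_FALSE_VALUES = {"false", "no", "off", "0"}


def parse_bool(value) -> bool:
    """Convert a REST API parameter to a bool, ignoring letter case."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in _TRUE_VALUES:
        return True
    if text in _FALSE_VALUES:
        return False
    # Assumption: unknown values are rejected rather than silently treated as False.
    raise ValueError(f"Not a recognised boolean value: {value!r}")
```

For example, `parse_bool("YES")` and `parse_bool(1)` both evaluate to `True`, while `parse_bool("Off")` evaluates to `False`.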
@cccs-kevin changed the title from "Misc logic improvements to database.py" to "Logic improvements for tasking" Jul 29, 2022
@cccs-kevin marked this pull request as ready for review July 29, 2022 17:39
@cccs-kevin
Collaborator Author

Let me know if you would prefer if this PR was split into smaller ones.

@doomedraven
Collaborator

That's fine. Amazing work on this. Special thanks for the tags, which remove some stuff that I had pending but never had time to review properly.

@doomedraven merged commit 380e45f into kevoreilly:master Jul 30, 2022
@doomedraven
Collaborator

btw amazing summary, forgot to say

@kevoreilly
Owner

Thanks a lot

tbeadle pushed a commit to tbeadle/CAPEv2 that referenced this pull request Aug 1, 2022
kevoreilly#1020 introduced a regression
in selection of VM's for x86 tasks where it would no longer allow
selection of an x64 VM. This reintroduces that ability so that x86 tasks
will select x86 OR x64 machines, preferring x86 over x64.
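
The selection rule described in this commit message can be sketched as below. `ARCH_PREFERENCE` and `pick_machine` are hypothetical names for illustration, not the code in the referenced commit, and restricting x64 tasks to x64 machines is an assumption.

```python
from typing import List, Optional, Tuple

# x86 tasks may use x86 or x64 machines, trying x86 first (from the commit message).
# Assumption: x64 tasks can only use x64 machines.
ARCH_PREFERENCE = {"x86": ["x86", "x64"], "x64": ["x64"]}


def pick_machine(task_arch: str, available: List[Tuple[str, str]]) -> Optional[str]:
    """Pick a machine label from (label, arch) pairs of unlocked machines."""
    for arch in ARCH_PREFERENCE.get(task_arch, [task_arch]):
        for label, machine_arch in available:
            if machine_arch == arch:
                return label
    return None  # nothing suitable is free
```

With both an x86 and an x64 machine free, an x86 task gets the x86 machine. With only an x64 machine free, it falls back to that one instead of waiting.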
doomedraven pushed a commit that referenced this pull request Aug 2, 2022
* Fix regression in VM selection.

#1020 introduced a regression
in selection of VM's for x86 tasks where it would no longer allow
selection of an x64 VM. This reintroduces that ability so that x86 tasks
will select x86 OR x64 machines, preferring x86 over x64.

* Fix formatting.

Co-authored-by: Tommy Beadle
@cccs-kevin deleted the update/db branch August 2, 2022 12:10
3 participants