Logic improvements for tasking #1020

cccs-kevin · 2022-07-21T20:17:44Z

The goal of this PR started out as being able to handle tasks that require machines with specific tags, but then it got WAY bigger!

Things that I have improved:

conf/az.conf, modules/machinery/az.py
- Added more configurations and abilities to the Azure machinery
  - Added ability to use different resource groups for the virtual network and the resources associated with sandboxing (virtual machine scale sets, disks, network interface cards, etc)
  - Added the ability to manually enter the number of cores for an instance type rather than programmatically finding this value
  - Added the ability to not reset the pool size of each virtual machine scale set when CAPE restarts
  - Added the ability to wait or not wait for victim agents when CAPE restarts
conf/web.conf, lib/cuckoo/common/web_utils.py, lib/cuckoo/core/database.py
- Added configurations to web.conf that will toggle the dynamic assignments of required machine platform and architecture for a task
lib/cuckoo/common/abstracts.py, lib/cuckoo/core/database.py
- Added ability to acquire/lock/list machines based on the architecture
lib/cuckoo/common/web_utils.py
- If a value representing a boolean is passed to the REST API, it can be true/false, yes/no, on/off, 0/1 and can be any case.
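A minimal sketch of the case-insensitive boolean parsing described for the REST API. The helper name `parse_bool` is hypothetical and is not taken from web_utils.py; raising `ValueError` for unrecognized values is also an assumption.

```python
# Accepted spellings, compared after lowercasing (assumed sets, based on the list above).
TRUE_VALUES = {"true", "yes", "on", "1"}
FALSE_VALUES = {"false", "no", "off", "0"}


def parse_bool(value):
    """Convert a REST API value to a bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    # Normalize so that "YES", " On " and 1 are all handled the same way.
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"Not a recognized boolean value: {value!r}")
```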
utils/db_migration/versions/2_3_3_expand_error_message.py, lib/cuckoo/core/database.py
- Changed the size of the Error table message field because it could not contain the CuckooGuestCriticalTimeout error
  - Created a migration script with alembic
  - Updated the SCHEMA_VERSION of the database.py file
lib/cuckoo/core/scheduler.py, lib/cuckoo/core/database.py
- Refactored the way that tasks are assigned to machines. The original approach iterated through available machines and then determined whether there was a task that matched up with each machine. That logic is inefficient: we should be iterating through pending tasks rather than available machines. My solution:
  - Use an is_relevant_machine_available method that determines if a machine is relevant to a task and available to be assigned
  - Reworked the fetch method into fetch_task for tasks that do not require a VM. This really simplifies the logic for this method.
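A rough sketch of the task-first approach, assuming simplified `Task` and `Machine` records. Only the name `is_relevant_machine_available` comes from this PR; its signature here, the dataclasses, and `next_runnable_task` are illustrative assumptions rather than the actual scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    label: str
    platform: str
    arch: str
    tags: set = field(default_factory=set)
    locked: bool = False


@dataclass
class Task:
    id: int
    platform: str
    arch: str
    tags: set = field(default_factory=set)


def is_relevant_machine_available(task, machines):
    """Return True if any unlocked machine satisfies the task's requirements."""
    return any(
        not m.locked
        and m.platform == task.platform
        and m.arch == task.arch
        and task.tags <= m.tags  # every tag the task requires is on the machine
        for m in machines
    )


def next_runnable_task(pending_tasks, machines):
    """Iterate over pending tasks (not machines) and return the first runnable one."""
    for task in pending_tasks:
        if is_relevant_machine_available(task, machines):
            return task
    return None
```

Iterating over tasks means a task whose machine is busy is skipped rather than blocking the tasks behind it.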
lib/cuckoo/core/database.py
- In the guest_stop method, we should check if the guest exists before trying to shut it down. A missing guest can occur in dynamic machineries such as az or aws.
- Modified the logic of list_machines so that you can also filter by locked as True/False, machine label and the machine architecture
- Improved the logging in lock_machine to include the selection criteria that were attempted
- Just like in Cuckoo, get_available_machines now joins on tags
- Added options_not_like in list_tasks since this is a way that we can replicate the use of not_("node=") in the original fetch method https://github.com/kevoreilly/CAPEv2/blob/master/lib/cuckoo/core/database.py#L815
- Allowed the order_by keyword argument of list_tasks to be a tuple, so that we can pass multiple items to order tasks by
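To illustrate the idea behind options_not_like and a tuple-valued order_by, here is a small sketch using the standard library's `sqlite3` instead of the project's SQLAlchemy models. The table layout, the `list_task_ids` function, and the `ALLOWED_ORDER` keys are all assumptions made for the example.

```python
import sqlite3

# Whitelist of ordering keys so that caller input never reaches the SQL text directly.
ALLOWED_ORDER = {"priority_desc": "priority DESC", "added_on_asc": "added_on ASC"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, priority INTEGER, added_on TEXT, options TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (id, priority, added_on, options) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2022-07-01", "node=worker1"),
        (2, 1, "2022-07-02", ""),
        (3, 5, "2022-07-03", "free=yes"),
        (4, 5, "2022-07-01", ""),
    ],
)


def list_task_ids(conn, options_not_like=None, order_by=()):
    """Return task ids, optionally excluding an options substring and ordering by several keys."""
    query = "SELECT id FROM tasks"
    params = []
    if options_not_like is not None:
        query += " WHERE options NOT LIKE ?"
        params.append(f"%{options_not_like}%")
    if order_by:
        query += " ORDER BY " + ", ".join(ALLOWED_ORDER[key] for key in order_by)
    return [row[0] for row in conn.execute(query, params)]


# Skip tasks pinned to a node, then order by priority and by submission date.
ids = list_task_ids(conn, options_not_like="node=", order_by=("priority_desc", "added_on_asc"))
```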
lib/cuckoo/core/plugins.py
- Improving exception logging
lib/cuckoo/core/resultserver.py
- Nitpick here for consistency across logs
lib/cuckoo/core/scheduler.py
- In the check_file method, we should check if the sample exists before trying to access its sha256 attribute.
- Before calling acquire, determine which task tags are the architecture and which are just tags, and then pass these values to acquire
- Improve logging by detailing what the task requires if it has been selected off of the stack and there is no machine available yet
- Just like in Cuckoo, add exception clause for the CuckooGuestCriticalTimeout error that will unlock the machine
- Improved the periodic log so that it includes details about tasks and machines that are useful in determining why tasks are not being pulled off of the stack
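The tag-splitting step before acquire could look roughly like this. It assumes that architecture tags are exactly "x86" and "x64"; the function name `split_arch_and_tags` and the list-based return value are hypothetical and not taken from scheduler.py.

```python
# Assumption: these are the only tag values that denote an architecture.
KNOWN_ARCHES = ("x86", "x64")


def split_arch_and_tags(task_tags):
    """Separate architecture tags from ordinary tags, preserving their order."""
    arches = [tag for tag in task_tags if tag in KNOWN_ARCHES]
    tags = [tag for tag in task_tags if tag not in KNOWN_ARCHES]
    return arches, tags


# The two lists can then be passed to acquire as separate arguments.
arches, tags = split_arch_and_tags(["x64", "office", "win10"])
```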
modules/machinery/az.py
- Check if the number_of_new_cpus_available according to the quota is less than 0, and if so, set the minimum number_of_relevant_machines_required. This was causing errors when the 5 VM buffer was too close to the limit.
- Improving the logging of exceptions
tests/test_scheduler.py
- Fixing test for acquire and mock_tags
utils/cleaners.py
- If both mongodb and elasticsearch are disabled, pending tasks could not be deleted using this utility, so check if web reporting is enabled first.

cccs-kevin · 2022-07-29T18:37:15Z

Let me know if you would prefer if this PR was split into smaller ones.

doomedraven · 2022-07-30T02:35:58Z

That's fine. Amazing work on this. Special thanks for the tags, which remove some stuff that I had pending but never had time to review properly.

doomedraven · 2022-07-30T12:15:00Z

btw amazing summary, forgot to say

kevoreilly · 2022-07-30T12:18:42Z

Thanks a lot

kevoreilly#1020 introduced a regression in the selection of VMs for x86 tasks, where an x64 VM could no longer be selected. This reintroduces that ability so that x86 tasks will select x86 OR x64 machines, preferring x86 over x64.
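The selection rule can be sketched as follows. This is an illustration only: `acceptable_arches`, `pick_machine`, and the `(label, arch)` tuples are assumptions, not the code from the fix.

```python
def acceptable_arches(task_arch):
    """Architectures a task may run on, in order of preference."""
    if task_arch == "x86":
        # x86 tasks may fall back to an x64 machine, but x86 is tried first.
        return ["x86", "x64"]
    return [task_arch]


def pick_machine(task_arch, available):
    """Pick a machine label from (label, arch) tuples of unlocked machines."""
    for arch in acceptable_arches(task_arch):
        for label, machine_arch in available:
            if machine_arch == arch:
                return label
    return None
```

With this ordering, an x86 task takes an x64 machine only when no x86 machine is available, while an x64 task never falls back to x86.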

* Fix regression in VM selection. #1020 introduced a regression in the selection of VMs for x86 tasks, where an x64 VM could no longer be selected. This reintroduces that ability so that x86 tasks will select x86 OR x64 machines, preferring x86 over x64.
* Fix formatting.

Co-authored-by: Tommy Beadle

cccs-kevin added 2 commits July 26, 2022 19:43

Misc logic improvements to database.py

90af111

Further updates to scheduler, db and az machinery

bc4c40b

cccs-kevin force-pushed the update/db branch from f5510f1 to bc4c40b Compare July 27, 2022 19:37

cccs-kevin added 10 commits July 28, 2022 18:50

Improving periodic log, fixing logic in scheduler, adding handling of…

d33ec8f

… bigger error messages in db

Updating the schema version

ad8c17f

Expanding length of messages field, adding pound to log

13c971e

Updating the message field limit in database.py

a2760af

Removing whitespace

24e82e8

Adding doc string for arch in acquire override

05144dc

Use pool_tag naming

6f6bd2f

Improving periodic log, adding label for machine-specific tasks

3f8ba9c

Further periodic log improvements, check if web_reporting is enabled …

4f2537b

…in cleaner

Fixing scheduler tests

bae5d94

cccs-kevin changed the title ~~Misc logic improvements to database.py~~ Logic improvements for tasking Jul 29, 2022

cccs-kevin marked this pull request as ready for review July 29, 2022 17:39

doomedraven merged commit 380e45f into kevoreilly:master Jul 30, 2022

tbeadle mentioned this pull request Aug 1, 2022

Fix regression in VM selection. #1035

Merged

cccs-kevin deleted the update/db branch August 2, 2022 12:10
