anshuman-goel · anshuman-goel · Oct 6, 2020 · Sep 29, 2020 · Sep 30, 2020 · Sep 30, 2020
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,45 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## 1.1.0
+### Added
+* Agent/Service: Added the ability to automatically re-image nodes that are out-of-date [#35](https://github.com/microsoft/onefuzz/pull/35)
+* Deployment: Added data-migration scripts for pre-release installs [#12](https://github.com/microsoft/onefuzz/pull/12)
+* SDK/CLI: Added more `onefuzz debug` sub-commands to support debugging tasks [#95](https://github.com/microsoft/onefuzz/pull/95)
+* Agent: Added machine_id and version to log messages [#94](https://github.com/microsoft/onefuzz/pull/94)
+* Service: Errors in creating Azure DevOps work items from reports now mark the task as failed [#77](https://github.com/microsoft/onefuzz/pull/77)
+* Service: The nodes executing a task are now included when fetching details for a task (such as `onefuzz tasks get $TASKID`) [#54](https://github.com/microsoft/onefuzz/pull/54)
+* SDK: Added example [Azure Functions](https://azure.microsoft.com/en-us/services/functions/) that uses the SDK [#56](https://github.com/microsoft/onefuzz/pull/56)
+* SDK/CLI: Added the ability to execute debugger commands automatically during `repro` [#39](https://github.com/microsoft/onefuzz/pull/39)
+* CLI: Added documentation of CLI sub-command arguments (used to describe `afl_container` in AFL templates) [#10](https://github.com/microsoft/onefuzz/pull/10)
+* Agent: Added `ONEFUZZ_TARGET_SETUP_PATH` environment variable that indicates the path to the task-specific setup container on the fuzzing nodes [#15](https://github.com/microsoft/onefuzz/pull/15)
+* CICD: Use [sccache](https://github.com/mozilla/sccache) to speed up build times [#47](https://github.com/microsoft/onefuzz/pull/47)
+* SDK: Added end-to-end [integration test script](src/cli/examples/integration-test.py) to verify full fuzzing pipelines [#46](https://github.com/microsoft/onefuzz/pull/46)
+* Documentation: Added definitions for [pool](docs/terminology.md#pool), [node](docs/terminology.md#node), and [scaleset](docs/terminology.md#scaleset) [#17](https://github.com/microsoft/onefuzz/pull/17)
+
+### Changed
+* Agent/Service: Refactored state management for on-vm supervisors [#96](https://github.com/microsoft/onefuzz/pull/96)
+* Agent: Added a 'done' semaphore to prevent the agent from fetching additional work once the node should be reset. [#86](https://github.com/microsoft/onefuzz/pull/86)
+* Agent: Nodes now sleep longer between checking for new work.  [#78](https://github.com/microsoft/onefuzz/pull/78)
+* Agent: The task execution clock is now started once the task is in the 'setting up' state [#82](https://github.com/microsoft/onefuzz/pull/82)
+* Service: Drastically reduced logs sent to App Insights from third-party libraries [#63](https://github.com/microsoft/onefuzz/pull/63)
+* Agent/Service: Added the ability to upgrade out-of-date VMs upon requesting new tasking [#35](https://github.com/microsoft/onefuzz/pull/35)
+* CICD: Non-release builds now include the Git hash in the version, and `localchanges` if built locally with uncommitted code. [#58](https://github.com/microsoft/onefuzz/pull/58)
+* Agent: [Command replacements](docs/command-replacements.md) now use absolute rather than relative paths.  [#22](https://github.com/microsoft/onefuzz/pull/22)
+
+### Fixed
+* CLI: Fixed issue using `onefuzz template stop` which would improperly stop jobs that had the same 'name' but different 'project' values.  [#97](https://github.com/microsoft/onefuzz/pull/97)
+* Agent: Fixed input marker expansion (used in AFL templates related to handling `@@`). [#87](https://github.com/microsoft/onefuzz/pull/87)
+* Service: Errors generated after the task shutdown has started are ignored.  [#83](https://github.com/microsoft/onefuzz/pull/83)
+* Agent: Instance-specific tools now download and run on Windows nodes as expected [#81](https://github.com/microsoft/onefuzz/pull/81)
+* CLI: Using `--wait_for_running` in `onefuzz template` jobs now properly waits for tasks to launch before exiting [#84](https://github.com/microsoft/onefuzz/pull/84)
+* Service: Handled more Azure DevOps notification errors [#80](https://github.com/microsoft/onefuzz/pull/80)
+* Agent: WSearch service is now properly disabled by default on Windows VMs [#67](https://github.com/microsoft/onefuzz/pull/67)
+* Service: Properly deletes `repro` VMs [#36](https://github.com/microsoft/onefuzz/pull/36)
+* Agent: Supervisor now flushes logs to App Insights upon exit [#21](https://github.com/microsoft/onefuzz/pull/21)
+* Agent: Task-specific setup script failures are now properly recorded as a failed task and trigger the node to be re-imaged [#24](https://github.com/microsoft/onefuzz/pull/24)
+
+
 ## 1.0.0
 ### Added
 * Initial public release
diff --git a/CURRENT_VERSION b/CURRENT_VERSION
@@ -1 +1 @@
-1.0.0
+1.1.0
diff --git a/src/api-service/__app__/.gitignore b/src/api-service/__app__/.gitignore
@@ -1 +1,4 @@
-.direnv
+.direnv
+.python_packages
+__pycache__
+.venv
diff --git a/src/api-service/__app__/agent_commands/__init__.py b/src/api-service/__app__/agent_commands/__init__.py
@@ -15,6 +15,7 @@
 
 def get(req: func.HttpRequest) -> func.HttpResponse:
     request = parse_request(NodeCommandGet, req)
+
     if isinstance(request, Error):
         return not_ok(request, context="NodeCommandGet")
 

diff --git a/src/api-service/__app__/agent_events/__init__.py b/src/api-service/__app__/agent_events/__init__.py
@@ -107,6 +107,7 @@ def on_state_update(
                             state=NodeTaskState.setting_up,
                         )
                         node_task.save()
+
             elif state == NodeState.done:
                 # if tasks are running on the node when it reports as Done
                 # those are stopped early
@@ -125,6 +126,8 @@ def on_state_update(
                             machine_id,
                             done_data,
                         )
+        else:
+            logging.info("No change in Node state")
     else:
         logging.info("ignoring state updates from the node: %s: %s", machine_id, state)
 

diff --git a/src/api-service/__app__/agent_registration/__init__.py b/src/api-service/__app__/agent_registration/__init__.py
@@ -3,6 +3,7 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
 
+import logging
 from uuid import UUID
 
 import azure.functions as func
@@ -76,6 +77,7 @@ def get(req: func.HttpRequest) -> func.HttpResponse:
 
 def post(req: func.HttpRequest) -> func.HttpResponse:
     registration_request = parse_uri(AgentRegistrationPost, req)
+    logging.info(f"request: {registration_request}")
     if isinstance(registration_request, Error):
         return not_ok(registration_request, context="agent registration")
 

diff --git a/src/api-service/__app__/onefuzzlib/pools.py b/src/api-service/__app__/onefuzzlib/pools.py
@@ -327,6 +327,11 @@ def create(
         arch: Architecture,
         managed: bool,
         client_id: Optional[UUID],
+        max_size: int,  # scaleset max size
+        vm_sku: str,
+        image: str,
+        spot_instances: bool,
+        region: Region,
     ) -> "Pool":
         return cls(
             name=name,
@@ -335,6 +340,11 @@ def create(
             managed=managed,
             client_id=client_id,
             config=None,
+            max_size=max_size,
+            vm_sku=vm_sku,
+            image=image,
+            spot_instances=spot_instances,
+            region=region,
         )
 
     def save_exclude(self) -> Optional[MappingIntStrAny]:
@@ -854,14 +864,18 @@ def halt(self) -> None:
             self.state = ScalesetState.halt
             self.delete()
 
-    def max_size(self) -> int:
+    @classmethod
+    def scaleset_max_size(cls, image: str) -> int:
         # https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/
         #   virtual-machine-scale-sets-placement-groups#checklist-for-using-large-scale-sets
-        if self.image.startswith("/"):
+        if image.startswith("/"):
             return 600
         else:
             return 1000
 
+    def max_size(self) -> int:
+        return Scaleset.scaleset_max_size(self.image)
+
     @classmethod
     def search_states(
         cls, *, states: Optional[List[ScalesetState]] = None

diff --git a/src/api-service/__app__/onefuzzlib/tasks/main.py b/src/api-service/__app__/onefuzzlib/tasks/main.py
@@ -153,6 +153,27 @@ def get_by_task_id(cls, task_id: UUID) -> Union[Error, "Task"]:
         task = tasks[0]
         return task
 
+    @classmethod
+    def get_tasks_by_pool_name(
+        cls, pool_name: str
+    ) -> Optional[Union[Error, List["Task"]]]:
+        tasks = cls.search()
+        if not tasks:
+            return Error(code=ErrorCode.INVALID_REQUEST, errors=["unable to find task"])
+
+        pool_tasks = []
+
+        for task in tasks:
+            if not task.config.pool:
+                continue
+            if pool_name == task.config.pool.pool_name and task.state not in [
+                TaskState.stopped,
+                TaskState.stopping,
+            ]:
+                pool_tasks.append(task)
+
+        return pool_tasks
+
     def mark_stopping(self) -> None:
         if self.state not in [TaskState.stopped, TaskState.stopping]:
             self.state = TaskState.stopping
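
The filter added in `get_tasks_by_pool_name` keeps only the tasks that target the named pool and are neither stopped nor stopping. A minimal standalone sketch of that selection follows; `SketchTask`, the reduced `TaskState` enum, and `active_tasks_for_pool` are hypothetical stand-ins for illustration, not the service's actual models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class TaskState(Enum):
    # Hypothetical subset of task states, for illustration only.
    running = "running"
    stopping = "stopping"
    stopped = "stopped"


@dataclass
class SketchTask:
    # Hypothetical stand-in holding only the fields the filter reads.
    name: str
    pool_name: Optional[str]
    state: TaskState


def active_tasks_for_pool(tasks: List[SketchTask], pool_name: str) -> List[SketchTask]:
    """Return tasks assigned to pool_name that are not stopped or stopping."""
    inactive = {TaskState.stopped, TaskState.stopping}
    return [
        task
        for task in tasks
        # Tasks without a pool never match, mirroring the `continue` in the diff.
        if task.pool_name == pool_name and task.state not in inactive
    ]


tasks = [
    SketchTask("fuzz", "linux-pool", TaskState.running),
    SketchTask("report", "linux-pool", TaskState.stopped),
    SketchTask("coverage", "windows-pool", TaskState.running),
    SketchTask("unassigned", None, TaskState.running),
]
selected = [task.name for task in active_tasks_for_pool(tasks, "linux-pool")]
```

Unlike the method in the diff, this sketch returns an empty list rather than an `Error` when no tasks exist at all.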

diff --git a/src/api-service/__app__/onefuzzlib/versions.py b/src/api-service/__app__/onefuzzlib/versions.py
@@ -16,8 +16,8 @@
 def read_local_file(filename: str) -> str:
     path = os.path.join(os.path.dirname(os.path.realpath(__file__)), filename)
     if os.path.exists(path):
-        with open(path, "r") as handle:
-            return handle.read().strip()
+        with open(path, "rb") as handle:
+            return handle.read().decode("utf-16").strip()
     else:
         return "UNKNOWN"
 

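When a file is opened in binary mode and decoded as UTF-16, the order of `strip()` and `decode()` matters: `bytes.strip()` only removes single ASCII whitespace bytes, and in UTF-16 a trailing newline is a two-byte unit. The sketch below illustrates this with an in-memory byte string; the little-endian BOM and the sample version string are assumptions chosen to keep the example deterministic.

```python
import codecs

# A UTF-16 (little-endian, with BOM) encoding of a version string ending in a newline.
raw = codecs.BOM_UTF16_LE + "1.1.0\n".encode("utf-16-le")

# Stripping the bytes first does nothing here: the data ends in b"\n\x00",
# and the final b"\x00" is not ASCII whitespace.
strip_first = raw.strip().decode("utf-16")

# Decoding first yields a str, so strip() sees the newline as one character.
decode_first = raw.decode("utf-16").strip()
```

With big-endian data the bytes-level strip would remove only half of the trailing newline and leave an odd number of bytes, so decoding before stripping is the safer order in either case.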
diff --git a/src/api-service/__app__/pool/__init__.py b/src/api-service/__app__/pool/__init__.py
@@ -3,14 +3,15 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT License.
 
+import logging
 import os
 
 import azure.functions as func
 from onefuzztypes.enums import ErrorCode, PoolState
 from onefuzztypes.models import AgentConfig, Error
 from onefuzztypes.requests import PoolCreate, PoolSearch, PoolStop
 
-from ..onefuzzlib.azure.creds import get_instance_name
+from ..onefuzzlib.azure.creds import get_base_region, get_instance_name, get_regions
 from ..onefuzzlib.pools import Pool
 from ..onefuzzlib.request import not_ok, ok, parse_request
 
@@ -65,12 +66,30 @@ def post(req: func.HttpRequest) -> func.HttpResponse:
             context=repr(request),
         )
 
+    logging.info(request)
+
+    if request.region is None:
+        region = get_base_region()
+    else:
+        if request.region not in get_regions():
+            return not_ok(
+                Error(code=ErrorCode.UNABLE_TO_CREATE, errors=["invalid region"]),
+                context="poolcreate",
+            )
+
+        region = request.region
+
     pool = Pool.create(
         name=request.name,
         os=request.os,
         arch=request.arch,
         managed=request.managed,
         client_id=request.client_id,
+        max_size=request.max_size,
+        vm_sku=request.vm_sku,
+        image=request.image,
+        spot_instances=request.spot_instances,
+        region=region,
     )
     pool.save()
     return ok(set_config(pool))
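
The region handling added to `post` falls back to the base region when the request omits one and rejects any region the subscription does not offer. The same decision, sketched as a pure function under the assumption that the base region and the list of allowed regions are passed in (`resolve_region` is a hypothetical helper, and it raises `ValueError` where the handler returns a `not_ok` response):

```python
from typing import List, Optional


def resolve_region(
    requested: Optional[str], base_region: str, allowed: List[str]
) -> str:
    """Pick the region for a new pool, defaulting to base_region."""
    if requested is None:
        # No region in the request: use the instance's base region.
        return base_region
    if requested not in allowed:
        # The handler in the diff reports this case as UNABLE_TO_CREATE.
        raise ValueError("invalid region")
    return requested


allowed_regions = ["eastus", "westus2"]
default_choice = resolve_region(None, "eastus", allowed_regions)
explicit_choice = resolve_region("westus2", "eastus", allowed_regions)
```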

diff --git a/src/api-service/__app__/pool_resize/__init__.py b/src/api-service/__app__/pool_resize/__init__.py
@@ -0,0 +1,123 @@
+#!/usr/bin/env python
+#
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+
+import logging
+import math
+from typing import List
+
+import azure.functions as func
+from onefuzztypes.enums import NodeState, PoolState, ScalesetState
+from onefuzztypes.models import Error
+
+from ..onefuzzlib.pools import Node, Pool, Scaleset
+from ..onefuzzlib.tasks.main import Task
+
+
+def scale_up(pool: Pool, scalesets: List[Scaleset], nodes_needed: int) -> None:
+    logging.info(f"Nodes needed: {nodes_needed}")
+
+    for scaleset in scalesets:
+        if scaleset.state == ScalesetState.running:
+
+            max_size = min(scaleset.max_size(), pool.max_size)
+            logging.info(f"Scaleset size: {scaleset.size}, max_size: {max_size}")
+            if scaleset.size < max_size:
+                current_size = scaleset.size
+                if nodes_needed <= max_size - current_size:
+                    scaleset.size = current_size + nodes_needed
+                    nodes_needed = 0
+                else:
+                    scaleset.size = max_size
+                    nodes_needed = nodes_needed - (max_size - current_size)
+                scaleset.state = ScalesetState.resize
+                scaleset.save()
+
+            else:
+                continue
+
+            if nodes_needed == 0:
+                return
+
+    for _ in range(
+        math.ceil(
+            nodes_needed / min(Scaleset.scaleset_max_size(pool.image), pool.max_size)
+        )
+    ):
+        logging.info(f"Creating Scaleset for Pool {pool.name}")
+        max_nodes_scaleset = min(
+            Scaleset.scaleset_max_size(pool.image), pool.max_size, nodes_needed
+        )
+        scaleset = Scaleset.create(
+            pool_name=pool.name,
+            vm_sku=pool.vm_sku,
+            image=pool.image,
+            region=pool.region,
+            size=max_nodes_scaleset,
+            spot_instances=pool.spot_instances,
+            tags={"pool": pool.name},
+        )
+        scaleset.save()
+        # don't return auths during create, only 'get' with include_auth
+        scaleset.auth = None
+        nodes_needed -= max_nodes_scaleset
+
+
+def scale_down(scalesets: List[Scaleset], nodes_to_remove: int) -> None:
+    for scaleset in scalesets:
+        nodes = Node.search_states(
+            scaleset_id=scaleset.scaleset_id, states=[NodeState.free]
+        )
+        if nodes and nodes_to_remove > 0:
+            max_nodes_remove = min(len(nodes), nodes_to_remove)
+            if max_nodes_remove >= scaleset.size and len(nodes) == scaleset.size:
+                scaleset.state = ScalesetState.halt
+                nodes_to_remove = nodes_to_remove - scaleset.size
+                scaleset.save()
+                continue
+
+            scaleset.size = scaleset.size - max_nodes_remove
+            nodes_to_remove = nodes_to_remove - max_nodes_remove
+            scaleset.state = ScalesetState.resize
+            scaleset.save()
+
+
+def get_vm_count(tasks: List[Task]) -> int:
+    count = 0
+    for task in tasks:
+        if not task.config.pool:
+            continue
+        count += task.config.pool.count
+    return count
+
+
+def main(mytimer: func.TimerRequest) -> None:  # noqa: F841
+    pools = Pool.search_states(states=[PoolState.init, PoolState.running])
+    for pool in pools:
+        tasks = Task.get_tasks_by_pool_name(pool.name)
+        num_of_tasks = 0
+        # skip pools that have no active (not stopped or stopping) tasks
+        if not tasks or isinstance(tasks, Error):
+            continue
+
+        num_of_tasks = get_vm_count(tasks)
+        logging.info(f"#Tasks: {num_of_tasks}")
+        # compare the VMs the tasks need against the pool's current scalesets
+        # get all the scalesets for the pool
+        scalesets = Scaleset.search_by_pool(pool.name)
+        pool_resize = False
+        for scaleset in scalesets:
+            if scaleset.state in ScalesetState.is_resizing():
+                pool_resize = True
+                break
+            num_of_tasks = num_of_tasks - scaleset.size
+
+        if pool_resize:
+            continue
+
+        if num_of_tasks > 0:
+            # resizing scaleset or creating new scaleset.
+            scale_up(pool, scalesets, num_of_tasks)
+        elif num_of_tasks < 0:
+            scale_down(scalesets, abs(num_of_tasks))
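
The arithmetic in `scale_up` can be easier to review as a pure function: first fill the headroom of existing scalesets up to `min(scaleset limit, pool.max_size)`, then create new scalesets of at most that size until the remaining demand is covered. The sketch below is a simplified model under stated assumptions: `plan_scale_up` is a hypothetical helper, scalesets are represented only by their sizes, and every existing scaleset is assumed to be in the `running` state.

```python
from typing import List, Tuple


def plan_scale_up(
    current_sizes: List[int],
    scaleset_limit: int,
    pool_max_size: int,
    nodes_needed: int,
) -> Tuple[List[int], List[int]]:
    """Return (resized existing scaleset sizes, sizes of new scalesets)."""
    cap = min(scaleset_limit, pool_max_size)
    if cap <= 0:
        # Guard added for the sketch: a zero cap can never satisfy demand.
        raise ValueError("per-scaleset cap must be positive")

    resized = []
    for size in current_sizes:
        # Grow each existing scaleset by whatever headroom it has left.
        grow = min(max(cap - size, 0), nodes_needed)
        resized.append(size + grow)
        nodes_needed -= grow

    new_scalesets = []
    while nodes_needed > 0:
        # Each new scaleset takes as many of the remaining nodes as the cap allows.
        size = min(cap, nodes_needed)
        new_scalesets.append(size)
        nodes_needed -= size

    return resized, new_scalesets


# Two existing scalesets (900 and 1000 nodes), a limit of 1000, 1350 more nodes needed.
resized, created = plan_scale_up([900, 1000], 1000, 1000, 1350)
```

In this example the first scaleset absorbs 100 nodes, the second is already full, and the remaining 1250 nodes are split across two new scalesets of 1000 and 250.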
diff --git a/src/api-service/__app__/pool_resize/function.json b/src/api-service/__app__/pool_resize/function.json
@@ -0,0 +1,11 @@
+{
+  "scriptFile": "__init__.py",
+  "bindings": [
+    {
+      "name": "mytimer",
+      "type": "timerTrigger",
+      "direction": "in",
+      "schedule": "00:01:00"
+    }
+  ]
+}
diff --git a/src/api-service/__app__/requirements.txt b/src/api-service/__app__/requirements.txt