-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* extracted resource watching logic into a separate class; implemented logic for observer in a RecourceWatcher class; added method to stop a thread gracefully * finish the observer logic extraction; merge changes from main; add resource watcher instance to enable_joblogs to subscribe to the event watcher if the log feature is configured; delete logic about event watcher from main; pass container for list objects function instead of container name; remove start methon from log handler class; modify joblogs init to subscribe to event watcher * add number of retry attempts and exponential backoff time to the reconnect loop for event watcher; make number of reconnect attempts, backoff time and a coefficient for exponential growth configurable via config; add backoff_time, reconnection_attempts and backoff_coefficient as attributes to the resource watcher init; add resource_version as a param to w.stream so a failed stream can read from the last resource it was able to catch; add urllib3.exceptions.ProtocolError and handle reconnection after some exponential backoff time to avoid api flooding; add config as a param for init for resource watcher; modify config in kubernetes.yaml and k8s config to contain add backoff_time, reconnection_attempts and backoff_coefficient * add logic to reset number of reconnect attempts and backoff time when connection to the k8s was achieved so only sequential failures detected; add exception handling to watch_pods to handle failure in urllib3, when source version is old and not available anymore, and when stream is ended; remove k8s resource watcher initialization from run function in api.py and move it to k8s.py launcher as _init_resource_watcher; refactor existing logic from joblogs/__init__.py to keep it in _init_resource_watcher and enable_joblogs in k8s launcher * added a CONFIG.md file with detailed explanations about parameters used to re-establish connection to the Kubernetes wather * move section about config file from README.md to CONFIG.md; add a link to the CONFIG.md in the README.md; remove variables for reconnection_attempts, backoff_time and backoff_coefficient fron the sample config since default values are provided in the code.
- Loading branch information
Showing
11 changed files
with
251 additions
and
96 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
## About | ||
This file provides you with the detailed description of parameters listed in the config file, and explaining why they are used | ||
and when you are expected to provide or change them. | ||
|
||
## Configuration file | ||
|
||
* `http_port` - defaults to `6800` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#http-port)) | ||
* `bind_address` - defaults to `127.0.0.1` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#bind-address)) | ||
* `max_proc` - _(implementation pending)_, if unset or `0` it will use the number of nodes in the cluster, defaults to `0` ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#max-proc)) | ||
* `repository` - Python class for accessing the image repository, defaults to `scrapyd_k8s.repository.Remote` | ||
* `launcher` - Python class for managing jobs on the cluster, defaults to `scrapyd_k8s.launcher.K8s` | ||
* `username` - Set this and `password` to enable basic authentication ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#username)) | ||
* `password` - Set this and `username` to enable basic authentication ([➽](https://scrapyd.readthedocs.io/en/latest/config.html#password)) | ||
|
||
The Docker and Kubernetes launchers have their own additional options. | ||
|
||
## [scrapyd] section, reconnection_attempts, backoff_time, backoff_coefficient | ||
|
||
### Context | ||
The Kubernetes event watcher is used in the code as part of the joblogs feature and is also utilized for limiting the | ||
number of jobs running in parallel on the cluster. Both features are not enabled by default and can be activated if you | ||
choose to use them. | ||
|
||
The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However, the | ||
nature of this long-lived connection is unstable; it can be interrupted by network issues, proxies configured to terminate | ||
long-lived connections, and other factors. For this reason, a mechanism was implemented to re-establish the long-lived | ||
connection to the Kubernetes API. To achieve this, three parameters were introduced: `reconnection_attempts`, | ||
`backoff_time` and `backoff_coefficient`. | ||
|
||
### What are these parameters about? | ||
- `reconnection_attempts` - defines how many consecutive attempts will be made to reconnect if the connection fails; | ||
- `backoff_time` and `backoff_coefficient` - are used to gradually slow down each subsequent attempt to establish a | ||
connection with the Kubernetes API, preventing the API from becoming overloaded with requests. The `backoff_time` increases | ||
exponentially and is calculated as `backoff_time *= self.backoff_coefficient`. | ||
|
||
### When do I need to change it in the config file? | ||
Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network | ||
requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1 @@ | ||
import logging | ||
from scrapyd_k8s.joblogs.log_handler_k8s import KubernetesJobLogHandler | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
def joblogs_init(config): | ||
""" | ||
Initializes job logs handling by starting the Kubernetes job log handler. | ||
Parameters | ||
---------- | ||
config : Config | ||
Configuration object containing settings for job logs and storage. | ||
Returns | ||
------- | ||
None | ||
""" | ||
joblogs_config = config.joblogs() | ||
if joblogs_config and joblogs_config.get('storage_provider') is not None: | ||
log_handler = KubernetesJobLogHandler(config) | ||
log_handler.start() | ||
logger.info("Job logs handler started.") | ||
else: | ||
logger.warning("No storage provider configured; job logs will not be uploaded.") | ||
from scrapyd_k8s.joblogs.log_handler_k8s import KubernetesJobLogHandler |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
import threading | ||
import logging | ||
import time | ||
from kubernetes import client, watch | ||
from typing import Callable, List | ||
import urllib3 | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
class ResourceWatcher: | ||
""" | ||
Watches Kubernetes pod events and notifies subscribers about relevant events. | ||
Attributes | ||
---------- | ||
namespace : str | ||
Kubernetes namespace to watch pods in. | ||
subscribers : List[Callable] | ||
List of subscriber callback functions to notify on events. | ||
""" | ||
|
||
def __init__(self, namespace, config): | ||
""" | ||
Initializes the ResourceWatcher. | ||
Parameters | ||
---------- | ||
namespace : str | ||
Kubernetes namespace to watch pods in. | ||
""" | ||
self.namespace = namespace | ||
self.reconnection_attempts = int(config.scrapyd().get('reconnection_attempts', 5)) | ||
self.backoff_time = int(config.scrapyd().get('backoff_time', 5)) | ||
self.backoff_coefficient = int(config.scrapyd().get('backoff_coefficient', 2)) | ||
self.subscribers: List[Callable] = [] | ||
self._stop_event = threading.Event() | ||
self.watcher_thread = threading.Thread(target=self.watch_pods, daemon=True) | ||
self.watcher_thread.start() | ||
logger.info(f"ResourceWatcher thread started for namespace '{self.namespace}'.") | ||
|
||
def subscribe(self, callback: Callable): | ||
""" | ||
Adds a subscriber callback to be notified on events. | ||
Parameters | ||
---------- | ||
callback : Callable | ||
A function to call when an event is received. | ||
""" | ||
if callback not in self.subscribers: | ||
self.subscribers.append(callback) | ||
logger.debug(f"Subscriber {callback.__name__} added.") | ||
|
||
def unsubscribe(self, callback: Callable): | ||
""" | ||
Removes a subscriber callback. | ||
Parameters | ||
---------- | ||
callback : Callable | ||
The subscriber function to remove. | ||
""" | ||
if callback in self.subscribers: | ||
self.subscribers.remove(callback) | ||
logger.debug(f"Subscriber {callback.__name__} removed.") | ||
|
||
def notify_subscribers(self, event: dict): | ||
""" | ||
Notifies all subscribers about an event. | ||
Parameters | ||
---------- | ||
event : dict | ||
The Kubernetes event data. | ||
""" | ||
for subscriber in self.subscribers: | ||
try: | ||
subscriber(event) | ||
except Exception as e: | ||
logger.exception(f"Error notifying subscriber {subscriber.__name__}: {e}") | ||
|
||
def watch_pods(self): | ||
""" | ||
Watches Kubernetes pod events and notifies subscribers. | ||
Runs in a separate thread. | ||
""" | ||
v1 = client.CoreV1Api() | ||
w = watch.Watch() | ||
resource_version = None | ||
|
||
logger.info(f"Started watching pods in namespace '{self.namespace}'.") | ||
backoff_time = self.backoff_time | ||
reconnection_attempts = self.reconnection_attempts | ||
while not self._stop_event.is_set() and reconnection_attempts > 0: | ||
try: | ||
kwargs = { | ||
'namespace': self.namespace, | ||
'timeout_seconds': 0, | ||
} | ||
if resource_version: | ||
kwargs['resource_version'] = resource_version | ||
first_event = True | ||
for event in w.stream(v1.list_namespaced_pod, **kwargs): | ||
if first_event: | ||
# Reset reconnection attempts and backoff time upon successful reconnection | ||
reconnection_attempts = self.reconnection_attempts | ||
backoff_time = self.backoff_time | ||
first_event = False # Ensure this only happens once per connection | ||
pod_name = event['object'].metadata.name | ||
resource_version = event['object'].metadata.resource_version | ||
event_type = event['type'] | ||
logger.debug(f"Received event: {event_type} for pod: {pod_name}") | ||
self.notify_subscribers(event) | ||
except (urllib3.exceptions.ProtocolError, | ||
urllib3.exceptions.ReadTimeoutError, | ||
urllib3.exceptions.ConnectionError) as e: | ||
reconnection_attempts -= 1 | ||
logger.exception(f"Encountered network error: {e}") | ||
logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
time.sleep(backoff_time) | ||
backoff_time *= self.backoff_coefficient | ||
except client.ApiException as e: | ||
# Resource version is too old and cannot be accessed anymore | ||
if e.status == 410: | ||
logger.error("Received 410 Gone error, resetting resource_version and restarting watch.") | ||
resource_version = None | ||
continue | ||
else: | ||
reconnection_attempts -= 1 | ||
logger.exception(f"Encountered ApiException: {e}") | ||
logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
time.sleep(backoff_time) | ||
backoff_time *= self.backoff_coefficient | ||
except StopIteration: | ||
logger.info("Watch stream ended, restarting watch.") | ||
continue | ||
except Exception as e: | ||
reconnection_attempts -= 1 | ||
logger.exception(f"Watcher encountered exception: {e}") | ||
logger.info(f"Retrying to watch pods after {backoff_time} seconds...") | ||
time.sleep(backoff_time) | ||
backoff_time *= self.backoff_coefficient | ||
|
||
|
||
def stop(self): | ||
""" | ||
Stops the watcher thread gracefully. | ||
""" | ||
self._stop_event.set() | ||
self.watcher_thread.join() | ||
logger.info(f"ResourceWatcher thread stopped for namespace '{self.namespace}'.") |
Oops, something went wrong.