add timeout to requests towards ETL API #690

Closed
bossie opened this issue Feb 26, 2024 · 3 comments

Collaborator

bossie commented Feb 26, 2024

JobTracker was hanging on Terrascope and CDSE and had to be killed. Last line in the logs was:

{
  "message": "logging resource usage {'jobId': 'j-2402222929774711bcc7b5414431dae3', 'jobName': 'CH4', 'executionId': 'a-c9e77e8cd4f74ecf85ab9db984d9f24f', 'userId': '6c19184e-dc90-48bf-8eb4-0a9a74e992e0', 'sourceId': 'cdse', 'orchestrator': 'openeo', 'jobStart': 1708620563000.0, 'jobFinish': 1708622583000.0, 'idempotencyKey': 'a-c9e77e8cd4f74ecf85ab9db984d9f24f', 'state': 'FINISHED', 'status': 'UNDEFINED', 'metrics': {'cpu': {'value': 3600, 'unit': 'cpu-seconds'}, 'memory': {'value': 7372800.0, 'unit': 'mb-seconds'}, 'time': {'value': 2020000.0, 'unit': 'milliseconds'}, 'processing': {'value': 307.1666758209467, 'unit': 'shpu'}}} at https://marketplace-cost-api-prod-warsaw.dataspace.copernicus.eu",
  "levelname": "DEBUG",
  "name": "openeogeotrellis.integrations.etl_api",
  "created": 1708779120.3757954,
  "filename": "etl_api.py",
  "lineno": 127,
  "process": 1,
  "job_id": "j-2402222929774711bcc7b5414431dae3",
  "user_id": "6c19184e-dc90-48bf-8eb4-0a9a74e992e0"
}

Adding a timeout to the requests towards the ETL API should unblock JobTracker.

Note: this does not solve the underlying problem; when the timeout is reached, the batch job succeeds but the user might not be charged.
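
A minimal sketch of one way to do this, assuming the ETL API client goes through a `requests` session: `requests` applies no timeout by default, so a stalled connection can block indefinitely. The class name `TimeoutSession` and the 60 second default are illustrative assumptions, not the actual fix.

```python
import requests


class TimeoutSession(requests.Session):
    """Hypothetical session that applies a default timeout to every request."""

    def __init__(self, timeout: float = 60.0):
        super().__init__()
        self.default_timeout = timeout  # seconds; assumed value, tune as needed

    def request(self, method, url, **kwargs):
        # Only fill in the timeout when the caller did not pass one explicitly.
        kwargs.setdefault("timeout", self.default_timeout)
        return super().request(method, url, **kwargs)
```

Such a session could be handed to whatever code builds the ETL API client, so that individual call sites do not have to remember to pass `timeout=`.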

@bossie bossie self-assigned this Feb 26, 2024
@bossie bossie added the bug label Feb 26, 2024
Collaborator Author

bossie commented Feb 26, 2024

Suggestion by @soxofaan: retry ETL API requests.

https://github.com/eu-cdse/openeo-cdse-infra/issues/41 made it possible to retry ETL API requests without the risk of charging the user multiple times. The underlying problem there was a large process graph that couldn't fit in the job's ZNode. This prevented the job from being marked as completed, so it was picked up again in subsequent JobTracker runs and the user was charged again.

So the suggestion still makes sense: it is about retries within a particular JobTracker run rather than across JobTracker runs.
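
To illustrate what "retries within a particular run" means, here is a generic sketch of a bounded retry with exponential backoff. It is not the project's `requests_with_retry`; the name `call_with_retry` and its parameters are hypothetical, and it assumes the wrapped call is safe to repeat.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` up to `attempts` times, sleeping between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Exponential backoff: backoff_factor * 1, 2, 4, ... seconds.
            sleep(backoff_factor * (2 ** attempt))
    raise AssertionError("unreachable")
```

Because the number of attempts is bounded, a persistent failure still surfaces within the same run instead of being deferred silently to the next one.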

Collaborator Author

bossie commented Feb 26, 2024

ETL API requests should already be retried in both sync requests and batch jobs, because of the following two pieces of code respectively:

requests_session = requests_with_retry(total=3, backoff_factor=2)
cpu_seconds = backend_config.default_usage_cpu_seconds
mb_seconds = backend_config.default_usage_byte_seconds / 1024 / 1024
etl_api = get_etl_api(
    user=user,
    allow_dynamic_etl_api=bool(
        # TODO #531 this is temporary feature flag, to removed when done
        backend_config.etl_dynamic_api_flag
        and flask.request.args.get(backend_config.etl_dynamic_api_flag)
    ),
    requests_session=requests_session,
    # TODO #531 provide a TtlCache here
    etl_api_cache=None,
)

and

    self._request_session = requests_with_retry(total=3, backoff_factor=2)
    # Cache of `EtlApi` instances, used in `get_etl_api()`
    self._etl_cache: Optional[TtlCache] = TtlCache(default_ttl=cache_ttl) if cache_ttl > 0 else None
def calculate_costs(self, details: CostsDetails) -> float:
    job_options = details.job_options or {}
    etl_api = get_etl_api(
        job_options=job_options,
        allow_dynamic_etl_api=True,
        requests_session=self._request_session,
        etl_api_cache=self._etl_cache,
    )

Collaborator Author

bossie commented Feb 26, 2024

Could use a test.
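
A rough sketch of what such a test could check, assuming plain `requests`: a local server that accepts connections but never answers stands in for a hanging ETL API, and the request is expected to fail with a timeout instead of blocking. The helper names, the `/resources` path and the 0.5 second timeout are hypothetical and not taken from the code base.

```python
import socket
import threading

import requests


def start_silent_server():
    """Start a local TCP server that accepts connections but never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(5)
    accepted = []  # keep references so accepted connections stay open

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # server socket was closed
            accepted.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]


def request_times_out(timeout: float = 0.5) -> bool:
    """Return True if a request to the silent server fails with a timeout."""
    server, port = start_silent_server()
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        session.post(f"http://127.0.0.1:{port}/resources", json={}, timeout=timeout)
    except requests.exceptions.Timeout:
        return True
    finally:
        server.close()
    return False
```

In a real test suite, `request_times_out()` would be asserted to be `True`, ideally exercising the actual ETL API client rather than a bare session.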
