Update API closest_destination function to handle destination attributes rather than a selected objectstore #4
Conversation
Thanks @sanjaysrikakulam! This looks good. Just a note to discuss: we should probably have a small test suite that uses both TPV and the API within this repo. Until now I have been updating a test that uses the API here: galaxyproject/total-perspective-vortex#108, but since we're continually updating the API, this type of test should probably not be included in the TPV codebase. Deploying the API with a Galaxy instance is probably too much for the purpose of testing as well. What do you think?
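One lightweight option would be to exercise the distance logic directly with repo-local fixtures, with no Galaxy or TPV deployment at all. A minimal sketch of what such a test could look like (the `closest_destination` signature and the haversine scoring below are assumptions for illustration, not the repo's actual code):

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))


def closest_destination(destinations, objectstores, dataset_attributes):
    # Hypothetical stand-in for the API function: pick the destination
    # nearest to the object store holding the (single) input dataset.
    store_id = next(iter(dataset_attributes.values()))["object_store_id"]
    store = objectstores[store_id]
    return min(
        destinations,
        key=lambda d: haversine_km(
            d["context"]["latitude"], d["context"]["longitude"],
            store["latitude"], store["longitude"],
        ),
    )


def test_picks_nearest_destination():
    destinations = [
        {"id": "pulsar_italy", "context": {"latitude": 50.07, "longitude": 19.91}},
        {"id": "slurm_germany", "context": {"latitude": 51.17, "longitude": 10.45}},
    ]
    objectstores = {"object_store_germany": {"latitude": 51.0, "longitude": 10.5}}
    dataset_attributes = {"d1": {"object_store_id": "object_store_germany", "size": 789}}
    assert closest_destination(destinations, objectstores, dataset_attributes)["id"] == "slurm_germany"
```

A test like this runs in milliseconds under pytest and would catch regressions in the ranking logic without any service deployment.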
Review comment on `example_tpv_config_locations_api.yml` (marked outdated):
```python
objectstore_loc_path = "tests/fixtures/object_store_locations.yml"
selected_object_store = "object_store_australia"
dataset_attributes = helpers.get_dataset_attributes(job.get_input_datasets())
```
Correct me if I'm wrong, but this seems to be used like this:
```diff
- dataset_attributes = helpers.get_dataset_attributes(job.get_input_datasets())
+ dataset_attributes = helpers.get_dataset_attributes(job.input_datasets)
```
Both should work. However, we can switch it to whatever is in the TPV.
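A toy illustration of why both spellings can work (this `Job` class is a mock for illustration, not Galaxy's actual model): if the object exposes `input_datasets` as a plain attribute, a `get_input_datasets()` getter can simply return it, so the two accessors stay interchangeable.

```python
class Job:
    """Toy stand-in for Galaxy's job object (mock, not the real model)."""

    def __init__(self, input_datasets):
        # Attribute access: job.input_datasets
        self.input_datasets = input_datasets

    def get_input_datasets(self):
        # Getter access: job.get_input_datasets()
        return self.input_datasets


job = Job(["dataset_italy", "dataset_poland"])
assert job.get_input_datasets() == job.input_datasets
```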
I agree that we need some testing in this repo. However, I am unsure at what level; as you have pointed out, having a Galaxy instance + TPV to test the API is too much to add to this repo. Let's discuss this further.
My feeling is that we should get the service/API done first. The TPV integration can come later. That was the whole idea of creating the API: so that we don't need to deal with TPV integration and changes much and can iterate faster on the API level.
The API itself is "ready". It is already deployed on the ESG instance. Sebastian has requested a complete integration demo for the deliverable due 29th Feb 2024. The API can be queried, for example, as follows:

```python
import requests

destinations = [
    {
        "id": "pulsar_italy",
        "abstract": False,
        "runner": "general_pulsar_1",
        "destination_name_override": "pulsar_italy",
        "cores": None,
        "mem": None,
        "gpus": None,
        "min_cores": None,
        "min_mem": None,
        "min_gpus": None,
        "max_cores": None,
        "max_mem": None,
        "max_gpus": None,
        "min_accepted_cores": None,
        "min_accepted_mem": None,
        "min_accepted_gpus": None,
        "max_accepted_cores": 8,
        "max_accepted_mem": 32,
        "max_accepted_gpus": None,
        "env": None,
        "params": None,
        "resubmit": None,
        "scheduling": {
            "require": ["pulsar"],
            "prefer": [],
            "accept": ["general"],
            "reject": []
        },
        "inherits": None,
        "context": {
            "latitude": 50.0689816,
            "longitude": 19.9070188
        },
        "rules": {},
        "tags": None
    },
    {
        "id": "slurm_poland",
        "abstract": False,
        "runner": "slurm",
        "destination_name_override": "slurm_poland",
        "cores": None,
        "mem": None,
        "gpus": None,
        "min_cores": None,
        "min_mem": None,
        "min_gpus": None,
        "max_cores": None,
        "max_mem": None,
        "max_gpus": None,
        "min_accepted_cores": None,
        "min_accepted_mem": None,
        "min_accepted_gpus": None,
        "max_accepted_cores": 16,
        "max_accepted_mem": 64,
        "max_accepted_gpus": None,
        "env": None,
        "params": None,
        "resubmit": None,
        "scheduling": {
            "require": [],
            "prefer": [],
            "accept": ["slurm"],
            "reject": []
        },
        "inherits": None,
        "context": {
            "latitude": 51.9189046,
            "longitude": 19.1343786
        },
        "rules": {},
        "tags": None
    },
    {
        "id": "slurm_germany",
        "abstract": False,
        "runner": "slurm",
        "destination_name_override": "slurm_germany",
        "cores": None,
        "mem": None,
        "gpus": None,
        "min_cores": None,
        "min_mem": None,
        "min_gpus": None,
        "max_cores": None,
        "max_mem": None,
        "max_gpus": None,
        "min_accepted_cores": None,
        "min_accepted_mem": None,
        "min_accepted_gpus": None,
        "max_accepted_cores": 16,
        "max_accepted_mem": 64,
        "max_accepted_gpus": None,
        "env": None,
        "params": None,
        "resubmit": None,
        "scheduling": {
            "require": [],
            "prefer": [],
            "accept": ["slurm"],
            "reject": []
        },
        "inherits": None,
        "context": {
            "latitude": 51.165691,
            "longitude": 10.451526
        },
        "rules": {},
        "tags": None
    }
]

objectstores = {
    "object_store_italy_S3_01": {
        "latitude": 57.0689816,
        "longitude": 9.9070188,
        "other_stuff_that_we_find_useful": "foobar"
    },
    "object_store_poland": {
        "latitude": 41.9189046,
        "longitude": 19.1343786,
        "other_stuff_that_we_find_useful": "foobar"
    },
    "object_store_germany": {
        "latitude": 31.165691,
        "longitude": 14.451526,
        "other_stuff_that_we_find_useful": "foobar"
    },
}

dataset_attributes = {
    "dataset_italy": {
        "object_store_id": "object_store_italy_S3_01",
        "size": 123
    },
    "dataset_poland": {
        "object_store_id": "object_store_poland",
        "size": 456
    },
    "dataset_germany": {
        "object_store_id": "object_store_germany",
        "size": 789
    }
}

api_url = <API_URL>
input_data = {
    "destinations": destinations,
    "objectstores": objectstores,
    "dataset_attributes": dataset_attributes
}
response = requests.post(api_url, json=input_data)
print(response.json())
```

However, more algorithms beyond geolocation need to be added by Abdulrahman, which will take the API to the next step.
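Before posting, it can help to sanity-check the payload shape client-side. A hedged sketch (the field names follow the example payload above; `validate_payload` is a made-up helper, not part of the API):

```python
import json


def validate_payload(payload):
    # Every destination needs coordinates in its context, and every
    # dataset must reference a known object store, otherwise the
    # distance computation on the API side has nothing to work with.
    stores = payload["objectstores"]
    for dest in payload["destinations"]:
        ctx = dest.get("context") or {}
        assert "latitude" in ctx and "longitude" in ctx, dest["id"]
    for name, attrs in payload["dataset_attributes"].items():
        assert attrs["object_store_id"] in stores, name
    json.dumps(payload)  # the request body must be JSON-serialisable
    return True
```

Calling `validate_payload(input_data)` just before `requests.post` would turn a silent server-side mismatch into an immediate, named assertion failure.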
This PR updates the API function `closest_destinations` to handle and work with destination attributes instead of the selected objectstore. The idea is to choose a destination closest to the object store where the dataset is located, i.e., to move the compute closest to the data (edge computing, in other terms).
I have also updated the example and the README to reflect this.
A PR to add a helper function that extracts the object store ID and the size of a dataset from a job was recently submitted and merged into TPV. This PR uses that helper to get the object store ID and size of each dataset and send them to the API; the `closest_destination` function in the API then handles the most basic scenarios. Please double-check the changes in the PR.
Notes:
@bgruening suggested in the TPV channel that we could also use the size of the dataset in the decision-making. I have not implemented that in this PR.
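For the record, the size-based idea could look something like the sketch below (purely illustrative, not what this PR implements; `weighted_transfer` and `approx_km` are made-up names): weight each dataset's distance to a candidate destination by its size, so the destination near the bulk of the data wins.

```python
import math


def approx_km(lat1, lon1, lat2, lon2):
    # Crude equirectangular distance; good enough to illustrate ranking.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371 * math.hypot(x, y)


def weighted_transfer(dest, objectstores, dataset_attributes):
    # Sum of (distance to each dataset's object store) * (dataset size):
    # a proxy for total bytes-times-kilometres moved to run at `dest`.
    return sum(
        attrs["size"] * approx_km(
            dest["context"]["latitude"], dest["context"]["longitude"],
            objectstores[attrs["object_store_id"]]["latitude"],
            objectstores[attrs["object_store_id"]]["longitude"],
        )
        for attrs in dataset_attributes.values()
    )
```

The preferred destination would then be `min(destinations, key=lambda d: weighted_transfer(d, objectstores, dataset_attributes))`, which reduces to the plain nearest-destination choice when there is a single input dataset.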