
Update API closest_destination function to handle destination attributes rather than a selected objectstore #4

Merged

Conversation

sanjaysrikakulam
Member

This PR updates the API function closest_destination to work with destination attributes instead of the selected object store.

The idea is to choose the destination closest to the object store where the dataset is located, i.e., to bring the compute to the data (edge computing, in other terms).

I have also updated the example and the README to reflect this change.

A PR to add a helper function to find/extract the object store ID and the size of the dataset from a job was recently submitted and merged into TPV.

In this PR, we use this helper function to get the object store ID and size of each dataset and send them to the API, whose closest_destination function handles the most basic scenarios (a sketch of this selection logic follows the list):

  1. When there is no object store: return all destination IDs, so we can fall back to TPV's random weighted sampling method
  2. When there is only one dataset
  3. When there is more than one dataset and all of them reside in the same object store
  4. When there is more than one dataset and they reside in different object stores
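
A minimal sketch of how this selection logic might look. This is not the actual API implementation: the haversine distance and the "most common object store" tie-break for scenario 4 are assumptions for illustration.

from collections import Counter
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (latitude, longitude) points, in km.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def closest_destination(destinations, objectstores, dataset_attributes):
    # Scenario 1: no object store information -> return all destination IDs
    # so TPV can fall back to its random weighted sampling.
    store_ids = [d["object_store_id"] for d in dataset_attributes.values()]
    if not store_ids:
        return [dest["id"] for dest in destinations]
    # Scenarios 2/3: one dataset, or all datasets in the same store.
    # Scenario 4 (assumed policy): pick the store holding the most datasets.
    store = objectstores[Counter(store_ids).most_common(1)[0][0]]
    # Choose the destination geographically closest to that object store.
    closest = min(
        destinations,
        key=lambda dest: haversine_km(
            store["latitude"], store["longitude"],
            dest["context"]["latitude"], dest["context"]["longitude"],
        ),
    )
    return [closest["id"]]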

Please double-check the changes in the PR.

Notes:
@bgruening suggested in the TPV channel that we could also use the size of the dataset in the decision-making. I have not implemented that in this PR.
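
For illustration only (not implemented in this PR; the function name below is hypothetical), one way to fold size into the decision would be to prefer the object store holding the largest total volume of input data, so the bulk of the data does not have to move:

def heaviest_object_store(dataset_attributes):
    # Sum dataset sizes per object store and return the store holding the
    # most bytes; the distance-based choice could then start from that store.
    totals = {}
    for attrs in dataset_attributes.values():
        store = attrs["object_store_id"]
        totals[store] = totals.get(store, 0) + attrs["size"]
    return max(totals, key=totals.get)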

@pauldg
Collaborator

pauldg commented Feb 5, 2024

Thanks @sanjaysrikakulam ! This looks good.

Just a note to discuss: we should probably have a small test suite that uses both TPV and the API within this repo. Until now I have been updating a test that uses the API here: galaxyproject/total-perspective-vortex#108, but since we're continually updating the API, this type of test should probably not be included in the TPV codebase. Deploying the API with a Galaxy instance is probably also too much for testing purposes.

What do you think?

objectstore_loc_path = "tests/fixtures/object_store_locations.yml"
selected_object_store = "object_store_australia"
dataset_attributes = helpers.get_dataset_attributes(job.get_input_datasets())
Collaborator

Correct me if I'm wrong, but this seems to be used like this:

Suggested change
dataset_attributes = helpers.get_dataset_attributes(job.get_input_datasets())
dataset_attributes = helpers.get_dataset_attributes(job.input_datasets)

Member Author

Both should work. However, we can switch to whatever is used in TPV.

@pauldg pauldg merged commit 217cf94 into usegalaxy-eu:main Feb 5, 2024
@sanjaysrikakulam
Member Author


I agree that we need some testing in this repo. However, I am unsure at what level; as you have pointed out, having a Galaxy instance + TPV to test the API is too much to add to this repo. Let's discuss this further.

@bgruening
Member


My feeling is that we should get the service/API done first; the TPV integration can come later. That was the whole idea of creating the API: so that we don't need to deal much with TPV integration and changes, and can iterate faster at the API level.

@sanjaysrikakulam
Member Author

The API itself is "ready". It is already deployed on the ESG instance. Sebastian has requested a complete integration demo for the deliverable due on 29 Feb 2024.

The API can be queried, for example, as follows:

import requests

destinations = [
    {
        "id": "pulsar_italy",
        "abstract": False,
        "runner": "general_pulsar_1",
        "destination_name_override": "pulsar_italy",
        "cores": None, "mem": None, "gpus": None,
        "min_cores": None, "min_mem": None, "min_gpus": None,
        "max_cores": None, "max_mem": None, "max_gpus": None,
        "min_accepted_cores": None, "min_accepted_mem": None, "min_accepted_gpus": None,
        "max_accepted_cores": 8, "max_accepted_mem": 32, "max_accepted_gpus": None,
        "env": None, "params": None, "resubmit": None,
        "scheduling": {
            "require": ["pulsar"],
            "prefer": [],
            "accept": ["general"],
            "reject": []
        },
        "inherits": None,
        "context": {"latitude": 50.0689816, "longitude": 19.9070188},
        "rules": {},
        "tags": None
    },
    {
        "id": "slurm_poland",
        "abstract": False,
        "runner": "slurm",
        "destination_name_override": "slurm_poland",
        "cores": None, "mem": None, "gpus": None,
        "min_cores": None, "min_mem": None, "min_gpus": None,
        "max_cores": None, "max_mem": None, "max_gpus": None,
        "min_accepted_cores": None, "min_accepted_mem": None, "min_accepted_gpus": None,
        "max_accepted_cores": 16, "max_accepted_mem": 64, "max_accepted_gpus": None,
        "env": None, "params": None, "resubmit": None,
        "scheduling": {
            "require": [],
            "prefer": [],
            "accept": ["slurm"],
            "reject": []
        },
        "inherits": None,
        "context": {"latitude": 51.9189046, "longitude": 19.1343786},
        "rules": {},
        "tags": None
    },
    {
        "id": "slurm_germany",
        "abstract": False,
        "runner": "slurm",
        "destination_name_override": "slurm_germany",
        "cores": None, "mem": None, "gpus": None,
        "min_cores": None, "min_mem": None, "min_gpus": None,
        "max_cores": None, "max_mem": None, "max_gpus": None,
        "min_accepted_cores": None, "min_accepted_mem": None, "min_accepted_gpus": None,
        "max_accepted_cores": 16, "max_accepted_mem": 64, "max_accepted_gpus": None,
        "env": None, "params": None, "resubmit": None,
        "scheduling": {
            "require": [],
            "prefer": [],
            "accept": ["slurm"],
            "reject": []
        },
        "inherits": None,
        "context": {"latitude": 51.165691, "longitude": 10.451526},
        "rules": {},
        "tags": None
    }
]

objectstores = {
    "object_store_italy_S3_01": {
        "latitude": 57.0689816,
        "longitude": 9.9070188,
        "other_stuff_that_we_find_useful": "foobar"
    },
    "object_store_poland": {
        "latitude": 41.9189046,
        "longitude": 19.1343786,
        "other_stuff_that_we_find_useful": "foobar"
    },
    "object_store_germany": {
        "latitude": 31.165691,
        "longitude": 14.451526,
        "other_stuff_that_we_find_useful": "foobar"
    }
}

dataset_attributes = {
    "dataset_italy": {
        "object_store_id": "object_store_italy_S3_01",
        "size": 123
    },
    "dataset_poland": {
        "object_store_id": "object_store_poland",
        "size": 456
    },
    "dataset_germany": {
        "object_store_id": "object_store_germany",
        "size": 789
    }
}

api_url = "<API_URL>"  # placeholder for the deployed API endpoint

input_data = {
    "destinations": destinations,
    "objectstores": objectstores,
    "dataset_attributes": dataset_attributes
}

response = requests.post(api_url, json=input_data)
print(response.json())
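
The response is expected to contain the destination ID(s) the API considers closest to the data, or all destination IDs when no object store information is available, which TPV can then feed into its random weighted sampling.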

However, more algorithms beyond geolocation still need to be added by Abdulrahman, which will take the API to the next step.
