ECS Executor - Overriding Additional ECS Task Properties #35490
I started working on this one and there is a bit of nuance that I didn't expect. Many of the config options are lists of dicts. Take tags for example, the type is:

```python
tags=[
    {
        'key': 'string',
        'value': 'string'
    },
],
```

So if you have a base set of run_task_kwargs configured in your airflow config, such as:

```
AIRFLOW__AWS_ECS_EXECUTOR__RUN_TASK_KWARGS='{
    "tags": [{"key": "fish", "value": "banana"}, {"key": "peanut", "value": "sauce"}],
    "startedBy": "Niko",
    "overrides": {"containerOverrides": [{"memory": 500, "environment": [{"name": "FOO", "value": "BAR"}]}]}
}'
```

and in your task:

```python
@task(
    executor_config={
        "tags": [{"key": "FOO", "value": "BAR"}],
        "overrides": {"containerOverrides": [{"memory": 740}]},
    }
)
def hello_world():
    print("hello_world")
```

should the final kwargs sent to ECS for tags be the sum of the two? There are many other configs that are lists of dicts in the ECS run_task API, and I'm not sure one answer fits all of them. containerOverrides, for example, cannot be additive; we must update each dict with what is provided in the executor_config.

I'd very much appreciate some input from @ferruzzi @shubham22 @syedahsn, as well as from @mshober, who has an implementation of this; I'd be interested to hear what approach they took and how it's working out for them.
Thanks for starting this @o-nikolas! It is important to note that you can use I'm also personally not a huge fan of the
My implementation is very specific and abstracted to my use case:

```python
def _run_task_kwargs(
    self,
    task_id: TaskInstanceKeyType,
    cmd: CommandType,
    queue: str,
    exec_config: ExecutorConfigType,
) -> dict:
    capacity_provider = exec_config.get("capacity_provider")
    memory_reservation_mib: int | None = exec_config.get("memory_reservation_mib")
    memory_limit_mib: int | None = exec_config.get("memory_limit_mib")
    cpu_reservation: int | None = exec_config.get("cpu_reservation")

    run_task_api = deepcopy(self.run_task_kwargs)
    container_override = self.get_container(run_task_api["overrides"])
    container_override["command"] = cmd

    if cpu_reservation is not None:
        container_override["cpu"] = int(cpu_reservation)
    if memory_reservation_mib is not None:
        container_override["memoryReservation"] = int(memory_reservation_mib)
    if memory_limit_mib is not None:
        container_override["memory"] = int(memory_limit_mib)
    if capacity_provider:
        run_task_api["capacityProviderStrategy"] = [{"capacityProvider": capacity_provider}]

    return run_task_api
```

so my code is likely not too helpful. I'll spend some time thinking about how it can be improved to fit all use cases.
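For illustration, a task targeting that implementation would pass the custom keys the method reads from exec_config; the values below are placeholders, not recommended settings:

```python
from airflow.decorators import task


# Illustrative only: these keys match what the _run_task_kwargs override above
# reads from exec_config; the values are made up.
@task(
    executor_config={
        "capacity_provider": "memory-optimized",
        "memory_reservation_mib": 2048,
        "memory_limit_mib": 4096,
        "cpu_reservation": 1024,
    }
)
def heavy_task():
    ...
```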
It's not really so much that tags are the specific case that needs a workaround; it's just the example I chose, and there are many others.

I'm not in favour of this approach. There are many config options for ECS RunTask, and creating a first-class Airflow config option for each one is untenable, especially because that couples us very tightly to the ECS API: every change it makes we must reflect in our config options. The few important configs we have now plus the catch-all of the
Mmm, yeah, that is very specific to your use case it seems. Thanks for sharing it either way!
I think tags should be additive.... AFAIK it will always be a list of len(1) and we should have an order of precedence; just tags[0].update() it so they get appended or overridden as expected seems reasonable to me. I think that's a pretty straightforward solution and aligns with how we handle tags elsewhere.

I'm pretty sure the container_overrides is the same and there can only be one container (for now?) so it would only ever be a len(1) list? In which case I like your solution. If I'm mistaken, then the issue becomes how we can identify the container. In your example above, if it is possible to have multiple containers, there is no indication of which container is being referenced, so I don't think we can make that call. We can't assume that list[0] is always the same container if there is no container id stored to verify it. If there can only be one container then I don't see the issue with how we handle it everywhere else; set an order of precedence and update() up the chain to override or append as necessary.

I definitely agree that I wouldn't want an explicit config option for every possible field. Since we don't have any control over the boto API and don't get any kind of prior warning about changes they might make, we'd be stuck chasing it any time they decide to make a change. I suppose the flip side of that is that we're just passing the buck to the users instead, but as Niko mentioned it's how we handle it elsewhere, so there's precedent.
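A sketch of that idea, generalized from tags[0].update() to a merge keyed on the tag name, with the task-level value winning on collision (a hypothetical helper, not provider code):

```python
def merge_tags(base: list[dict], override: list[dict]) -> list[dict]:
    """Additively merge ECS tag lists; tags from `override` win when a key collides."""
    merged = {t["key"]: t["value"] for t in base}
    merged.update({t["key"]: t["value"] for t in override})
    return [{"key": k, "value": v} for k, v in merged.items()]


# e.g. merging the base tags with the task-level tags from the earlier example
# keeps "fish" and "peanut" and adds "FOO".
```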
Thanks for taking the time to read and consider this issue @ferruzzi, it's a sticky one and I appreciate the brain cycles.
If we go the purely additive approach, it means there's no way for a task to use executor_config to override the tags provided by the run_task_kwargs config template, which seems like a sad compromise; but if everyone else agrees then that's fair.

Perhaps we can assert that the number of items in the overrides from the executor_config matches what we already have from the airflow config options. Also, the override config itself contains sub-configs that are themselves lists, so you run into the same problem for each of those as well (should we do additive, override, zip, etc.), sigh. Plus there are other config options that are lists of dicts as well:
Overwrite, yes. Remove, no. So maybe the solution is to make tags additive with values getting overridden if the name already exists, and some way to explicitly remove a tag? "If tag name starts with
It's entirely possible I haven't thought this through down to the bone, but it seems to me that all of the ones you listed should be override-with-hierarchy. It's possible I'm missing a use case, of course, but I don't think it's unreasonable to expect the finer-grained placementStrategy to override the more general one, for example. It seems like the tags are the only one that jumps out at me as really needing add/merge, where you might want to add a tag for which environment launched the task, and another tag at the DAG level stating something, and keep building up that list.
Mmm, ya that's an interesting idea. I'm a little concerned about coming up with something too bespoke for each config type; that's a lot of code/complexity to maintain. Interested to hear what others think.

Interesting, I'm not sure of all the edge cases there either. I don't use those configs enough to say for sure; maybe I'll do some more testing with those to see what's up.
I'd vote for simplicity and keeping things consistent irrespective of the type of the field, which means
This approach is definitely enticing. Thoughts @syedahsn @ferruzzi? @shubham22, regarding your approach. In my original example above I have
@o-nikolas - it would be 2. the entire |
Interesting, that means if you want to update just one field for a particular task or DAG (like in this example, add some tags or update the memory for a particularly heavy task), you may wipe out tens or even ~100 lines of config settings, and to avoid that you'd need to copy all those settings into the executor_config of the task or DAG just to change one or two of them. That feels like a bit of a crummy user experience?
Yeah, that's a fair critique. To be on the same page, I was suggesting replacing at the first-level field. However, guessing by how this might be used, a user may want to change just the memory to accommodate a given task while keeping everything else the same. So, instead of replacing a field at the first level, we can replace a field at the lowest level. In that case as well, we should follow the same behavior for arrays and strings.
Final config for task:
Yes, you'd still need to provide the entire containerOverrides set.
Yeah, this I'm more on board with; the final sticking point is exactly what you call out, though. containerOverrides can be very large, so having to specify that whole thing to change just one piece would be very frustrating. Wild thought to consider: I wonder if we just keep the current behaviour and let people subclass this executor to change it if they see fit? Any direction we consider here feels very opinionated and has very clear downsides. I'm not convinced this is the right approach either, but I'm curious to hear what others think (CC @shubham22).
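A rough sketch of that subclassing escape hatch; the import path, class name, and hook name here are assumptions based on the snippet shared earlier, so check them against your installed provider version:

```python
# Assumed import path/class name; verify against the amazon provider you run.
from airflow.providers.amazon.aws.executors.ecs.ecs_executor import AwsEcsExecutor


class MyEcsExecutor(AwsEcsExecutor):
    """Executor subclass that applies team-specific merge semantics."""

    def _run_task_kwargs(self, task_id, cmd, queue, exec_config) -> dict:
        # Start from whatever the base executor builds, then apply custom
        # merge rules (e.g. additive tags) before the RunTask call.
        run_task_kwargs = super()._run_task_kwargs(task_id, cmd, queue, exec_config)
        return run_task_kwargs
```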
Sorry, I've been dealing with laptop issues. I like where this is heading; in general I feel like the discussion you two had is leading the right way. I guess the next logical step from there would be to do the same within containerOverrides. In the end, maybe that might just end up looking like a recursive merge of the two configs, with override precedence going to the "most local" explicit value?
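A sketch of what that recursive merge could look like (a hypothetical helper; note that it still replaces lists wholesale, which is exactly the open question for fields like containerOverrides):

```python
def recursive_merge(base: dict, override: dict) -> dict:
    """Merge dicts recursively; the 'most local' (override) value wins.

    Nested dicts are merged key by key; lists and scalars are replaced outright.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```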
Coming a bit late to the conversation, but I just want to make sure I'm following correctly. Is it true that, from @shubham22's example, the "best" final config would look something like this?

If so, I think this approach gets the best of both worlds in that the user doesn't have to repeat existing config, and can add or overwrite specific keys to customize individual tasks.
@syedahsn - confused about which approach you're calling "best" here. The main difference between your config and my suggestion is the merge and replace of list configs that you're doing. My suggestion is that we should only merge configurations when they're not defined in |
The difference that I was pointing out between our suggestions was that if we don't specify a key in
So in your example, even though
So in the example, there is
The difference between our approaches is what happens in
As for the issue of removing a tag (or any key) from a dictionary, maybe we can use an empty value (i.e.
The final config will be:
I ended up going with a very opinionated approach which very closely approximates a standard Python update, essentially what @shubham22 proposed here. It indeed has the drawbacks I replied with after that comment, but I think these are the easiest to anticipate, since it is how a dict update would normally behave in Python. So this approach has the fewest hidden gotchas or bespoke behaviours, making it easy to maintain and transfer to other use cases. PR is here: #37137
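For illustration, "standard Python update" semantics at the top level behave like the sketch below, which reuses the example values from earlier in the thread and is not the PR's code:

```python
# Sketch of dict.update()-style semantics: first-level keys from executor_config
# replace the corresponding keys from run_task_kwargs wholesale.
base = {
    "startedBy": "Niko",
    "tags": [{"key": "fish", "value": "banana"}, {"key": "peanut", "value": "sauce"}],
    "overrides": {"containerOverrides": [{"memory": 500, "environment": [{"name": "FOO", "value": "BAR"}]}]},
}
exec_config = {"overrides": {"containerOverrides": [{"memory": 740}]}}

final = {**base, **exec_config}
# "startedBy" and "tags" survive, but "overrides" is taken from exec_config
# wholesale, so the base environment variables are dropped -- the trade-off
# acknowledged above.
```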
Description
From feedback here:
The executor config is scoped to overrides.containerOverrides. However, there are relevant properties outside of overrides.containerOverrides that users may want to change.

For example, our ECS Cluster is actually composed of 3 capacity providers: a General-Purpose provider (which is our cluster's default and runs on M7g instances), a Memory-Optimized provider (R7g instances), and a Compute-Optimized provider (C7g instances). My version of the ECS Executor allows users to set the appropriate Capacity Provider via the operator's executor_config param so that we can run our jobs in the most cost-efficient environment.
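For illustration, the kind of per-task configuration being asked for might look like the sketch below. Field names mirror the ECS RunTask API and the values are placeholders; whether such keys pass through depends on the merge behaviour discussed in the comments above, so treat this as a sketch of the requested capability rather than supported usage:

```python
from airflow.decorators import task


# Sketch of the requested capability: route a memory-hungry task to a
# memory-optimized capacity provider and bump its container memory.
@task(
    executor_config={
        "capacityProviderStrategy": [{"capacityProvider": "memory-optimized-r7g"}],
        "overrides": {"containerOverrides": [{"memory": 8192}]},
    }
)
def memory_hungry_task():
    ...
```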
There are several other properties which Airflow users may want to set at the task level, such as:
Use case/motivation
No response
Related issues
#34381
Are you willing to submit a PR?
Code of Conduct