We've had an external feature request on whether it's possible to remove the navigation/embodiment (e.g. object manipulation) aspects of tasks through setting a flag (e.g. embodiment = false) to try to distill the scientific discovery aspects of the tasks from other skills (even more so than the unit tests do).
Adding this as a feature request so we can start a thread on how this might be accomplished, since there are a number of implementation routes/challenges:
Navigation: Currently we have 'teleport_to_location' and 'teleport_to_object' actions that greatly reduce the navigation needs of an agent. What might it look like to have zero navigation needs, i.e. not needing to move, search for items, etc? Would this look like, e.g., being able to see and manipulate all objects in the environment, regardless of the agent's current location (i.e. an omniscient agent)? If so, this changes the model (e.g. it's no longer a partially-observable environment).
Object manipulation: It's not immediately clear to me how to remove these requirements -- e.g. picking up objects that you need, etc. -- unless we also make the agent able to interact with every object regardless of its location (i.e. no checks for whether the objects that an action is performed on are accessible), which might be a simple change.
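To make the "simple change" above a bit more concrete, here is a rough sketch of what skipping the accessibility check could look like. This is a toy model, not the actual code base: `Obj`, `is_accessible`, `use_object`, and the `reach` parameter are all hypothetical names made up for illustration, and the `embodiment` flag just follows the name suggested in the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical stand-in for an environment object with a grid position."""
    name: str
    x: int
    y: int


def is_accessible(agent_xy, obj, embodiment=True, reach=1):
    """Return True if the agent may act on obj.

    With embodiment=True the object must be within `reach` grid cells
    (Chebyshev distance). With embodiment=False the check is skipped,
    so every object counts as accessible regardless of location.
    """
    if not embodiment:
        return True
    ax, ay = agent_xy
    return abs(obj.x - ax) <= reach and abs(obj.y - ay) <= reach


def use_object(agent_xy, obj, embodiment=True):
    """Hypothetical action handler that only differs in the access check."""
    if not is_accessible(agent_xy, obj, embodiment):
        return {"success": False, "message": f"{obj.name} is not within reach"}
    return {"success": True, "message": f"used {obj.name}"}


if __name__ == "__main__":
    flask = Obj("flask", 5, 5)
    print(use_object((0, 0), flask))                    # embodied: too far away
    print(use_object((0, 0), flask, embodiment=False))  # check skipped
```

The idea is that the action logic stays the same and only the gate in front of it changes, which is why this route might be cheap to implement. It does not address whether the resulting task is still a faithful test.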
Some of the challenges here are:
Would these be faithful tests of removing navigation/object manipulation requirements, while maintaining discovery task difficulty? If not, what might alternate modifications be?
Observation: If the agent is omniscient and able to view many more(/all) objects in the environment, suddenly the size of the observation from the environment might become very large if it has to enumerate ~1000 objects, instead of only the objects within a short distance from the agent. That would definitely increase the load on an LLM/other agent model.
There are tasks that have steps that are spatial in nature (e.g. rocket science), and it's not clear how these modifications would translate.
How do we maintain a faithful representation of the visual output, if the agent isn't moving/can manipulate everything? Possibly teleport it to the last object that it interacted with, and use that as the visual observation?
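As one possible way to think about the observation-size and viewpoint questions above, here is a minimal sketch. It assumes objects are plain `(name, x, y)` tuples and that the non-embodied observation is centred on the last object interacted with, then capped. `build_observation`, `radius`, `max_objects`, and `last_interacted_xy` are hypothetical names, not part of the existing API.

```python
def _distance(obj, center):
    """Chebyshev distance from an (name, x, y) tuple to a grid position."""
    _, x, y = obj
    return max(abs(x - center[0]), abs(y - center[1]))


def build_observation(objects, agent_xy, embodiment=True, radius=3,
                      last_interacted_xy=None, max_objects=50):
    """Build a toy observation dict from a list of (name, x, y) tuples.

    Embodied: list only objects within `radius` of the agent.
    Non-embodied: every object is in scope, so order by distance from the
    last interacted object (falling back to the agent position) and keep
    at most `max_objects`, flagging whether anything was cut off.
    """
    if embodiment:
        visible = [o for o in objects if _distance(o, agent_xy) <= radius]
        return {"viewpoint": agent_xy,
                "objects": [name for name, _, _ in visible],
                "truncated": False}

    center = last_interacted_xy if last_interacted_xy is not None else agent_xy
    # Sort by distance, then by name so ties are deterministic.
    ordered = sorted(objects, key=lambda o: (_distance(o, center), o[0]))
    shown = ordered[:max_objects]
    return {"viewpoint": center,
            "objects": [name for name, _, _ in shown],
            "truncated": len(ordered) > max_objects}
```

Note that a cap like this quietly brings back a form of partial observability, so it is a trade-off rather than a fix. It also does nothing for tasks whose steps are inherently spatial.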
Really appreciate this excellent work! Here is some feedback that I hope is useful.
The benchmark gives the agent many actions to take. Some of them, such as moving left and right, seem meaningless to me (or at least I don't think it is a good setting).
An LLM with a long prompt and too much prompt engineering generally fails to finish the game.
It is hard to plug in a novel algorithm without clear API documentation and examples.
The provided random agent is useful, but it is not easy to adapt to other cases, especially together with the prompting needed for location transitions (as I see in the issue above).
I hope the authors can release a code example and modify the code base to remove the location transitions. That would be helpful.
@Dandelionym thanks for sharing additional feedback. Do you have any ideas on how best to remove the navigation actions while still making sense in a multi-modal environment?
Even for pure text-based games (see ScienceWorld and TextWorld), a minimum of spatial navigation is needed. If we completely abstract it away, then all objects can be interacted with at all times, i.e. we remove a big chunk of the partial observability.