Releases: ServiceNow/BrowserGym
v0.11.3: minor fixes
Bugfixes
- Fix duplicate depends_on in webarena metadata #228
Improvements
- Easier webarena / visualwebarena setup (runs nltk.download() at import time) #227
- More robust full_reset() for webarena / visualwebarena #230
- Removed ARIA extraction warnings #233
- New benchmark configuration webarena_tiny #232
Full Changelog: v0.11.2...v0.11.3
v0.11.2
version bump 0.11.2
v0.11.1: Benchmark update
New features
- Set max steps to 30 in webarena / visualwebarena benchmarks #214
- Benchmark dependency graph utilities #220
- Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224
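These notes do not describe the dependency graph utilities from #220 in detail. As a rough illustration of the general idea only, the sketch below orders tasks so that each one runs after the tasks it depends on. The task names and the depends_on mapping are invented for the example, and the ordering relies on Python's standard graphlib module, not on BrowserGym code.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: task name -> names of the tasks it depends on.
depends_on = {
    "task.0": [],
    "task.1": ["task.0"],
    "task.2": ["task.0"],
    "task.3": ["task.1", "task.2"],
}


def execution_order(graph):
    """Return task names ordered so that every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(depends_on)
print(order)
```

A cycle in the mapping would make static_order() raise graphlib.CycleError, which is one way such a utility could flag inconsistent metadata.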
Bugfixes
- Rename benchmark after subset_from_split() #221
- ExpArgs.exp_dir sanitization #222
- get_step_info() bugfix #223
Full Changelog: v0.11.0...v0.11.1
v0.11.0: WebLINX 🎉
New features
browsergym-experiments
browsergym-core
- New hide_all_bids option in flatten_dom_to_str() and flatten_axtree_to_str() #212 (thanks @imenelydiaker)
- Leaner Unicode() gym space #218
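To give a sense of what hiding bids means for a flattened accessibility tree, here is a minimal standalone sketch. It assumes each line looks like "[bid] role 'name'" and strips the bracketed marker after the fact. The strip_bids() helper and the sample text are hypothetical, and this is not how the hide_all_bids option is implemented in browsergym-core.

```python
import re

# Assumed line format of a flattened accessibility tree: "[bid] role 'name'".
# Leading indentation is captured so that it can be preserved.
BID_PREFIX = re.compile(r"^([ \t]*)\[[^\]]+\][ \t]*", flags=re.MULTILINE)


def strip_bids(flat_axtree: str) -> str:
    """Remove the leading "[bid]" marker from each line, keeping indentation."""
    return BID_PREFIX.sub(r"\1", flat_axtree)


example = "[a1] button 'Submit'\n\t[a2] StaticText 'Submit'"
print(strip_bids(example))
```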
Bugfixes
- Benchmark.prepare_backends() fixes #209
Full Changelog: v0.10.2...v0.11.0
v0.10.2: Benchmark update
New features
- New Benchmark.prepare_backend() method #204
Bugfixes
- save_step_info() bugfix when obs==None (truncated episode due to None action) #207
Full Changelog: v0.10.1...v0.10.2
v0.10.1: Benchmark updates
Minor changes
- train / test splits for WorkArena L2 and L3 tasks #203
- More fine-grained per-benchmark action sets #202
Full Changelog: v0.10.0...v0.10.1
v0.10.0: AssistantBench! 🎉
New features
- New BrowserGym benchmark AssistantBench, packaged as browsergym-assistantbench. Thanks @oriyor! #186

      import gymnasium as gym
      import browsergym.assistantbench  # registers the assistantbench environments

      env = gym.make("browsergym/assistantbench.validation.12")
      env = gym.make("browsergym/assistantbench.test.42")
- Default train/test splits for all benchmarks

      miniwob = DEFAULT_BENCHMARKS["miniwob"]  # 125 tasks x 5 seeds
      miniwob_train = miniwob.subset_from_split("train")  # 62 tasks x 5 seeds
      miniwob_test = miniwob.subset_from_split("test")  # 63 tasks x 5 seeds
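The counts quoted above (125 tasks split into 62 train and 63 test, each with 5 seeds) can be checked with a small standalone sketch. The TaskRow class, the rows_in_split() helper, and the alternating train/test assignment are all assumptions made for illustration. The notes do not say how BrowserGym actually assigns tasks to splits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRow:
    task_name: str
    split: str  # "train" or "test"


def rows_in_split(rows, split):
    """Keep only the rows whose split column matches."""
    return [row for row in rows if row.split == split]


SEEDS = range(5)

# Hypothetical metadata: 125 tasks, alternately assigned to test and train.
rows = [
    TaskRow(f"hypothetical-task-{i}", "test" if i % 2 == 0 else "train")
    for i in range(125)
]

train_rows = rows_in_split(rows, "train")
test_rows = rows_in_split(rows, "test")

# One run per (task, seed) pair.
train_runs = [(row.task_name, seed) for row in train_rows for seed in SEEDS]
test_runs = [(row.task_name, seed) for row in test_rows for seed in SEEDS]

print(len(train_rows), len(test_rows), len(train_runs), len(test_runs))
# 62 63 310 315
```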
Breaking Changes
Fixes
- Improved experiment logging #182
Full Changelog: v0.9.0...v0.10.0
v0.9.0: Benchmarks! 🎉
New features
- Benchmarks with default config (tasks x seeds) and metadata #173 #191

      from browsergym.experiments import BENCHMARKS, Benchmark

      # make a custom benchmark
      benchmark = Benchmark(
          name="miniwob_click_test",
          high_level_action_set_args=HighLevelActionSetArgs(
              subsets=["bid"],
              multiaction=False,
              strict=False,
              retry_with_force=False,
              demo_mode="off",
          ),
          env_args_list=[
              EnvArgs(
                  task_name="miniwob.click-test",
                  task_seed=42,
                  max_steps=5,
              )
          ],
      )

      # use a pre-existing benchmark
      miniwob = BENCHMARKS["miniwob_all"]()

      # use only a task subset
      miniwob_original = miniwob.subset_from_glob(
          column="miniwob_category", glob="original"
      )
- New playwright key modifier "ControlOrMeta" #187
- Global demo_mode flag #177

      import browsergym.core.action

      browsergym.core.action.set_global_demo_mode(True)  # boolean
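The subset_from_glob() call shown earlier in this release selects tasks whose metadata column matches a glob pattern. The sketch below illustrates that kind of filtering with Python's standard fnmatch module. The metadata rows, the "extra" category, and the select_by_glob() helper are hypothetical, and the real method may behave differently.

```python
from fnmatch import fnmatchcase

# Hypothetical task metadata. Only "miniwob.click-test" and the "original"
# category value appear in the release notes, the rest is invented.
metadata = [
    {"task_name": "miniwob.click-test", "miniwob_category": "original"},
    {"task_name": "miniwob.hypothetical-a", "miniwob_category": "original"},
    {"task_name": "miniwob.hypothetical-b", "miniwob_category": "extra"},
]


def select_by_glob(rows, column, glob):
    """Return the names of tasks whose `column` value matches the glob."""
    return [row["task_name"] for row in rows if fnmatchcase(row[column], glob)]


original_tasks = select_by_glob(metadata, "miniwob_category", "original")
print(original_tasks)
```

A pattern without wildcards, as in the release example, acts as an exact match, while something like "orig*" would also match any longer category name with that prefix.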
Fixes
- Multi-tab actions fix #188
Full Changelog: v0.8.1...v0.9.0
v0.8.1 - SoM bugfix
v0.8.0: goal_object
browsergym-core
- Breaking changes
  - goal refactor #110
    - obs["goal_object"] now replaces the old obs["goal_image_urls"]
    - obs["goal"] is now deprecated
    - the new goal_object now contains a list of openai-style messages, which can include an arbitrary mix of text and / or images.
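As an illustration of what a list of openai-style messages mixing text and images could look like, here is a minimal sketch. It assumes the key layout of OpenAI chat content parts ("type", "text", "image_url" with a nested "url"). The helper functions and the placeholder image bytes are hypothetical, and the exact structure BrowserGym emits may differ.

```python
import base64


def text_message(text):
    """Build a text part in the assumed openai-style layout."""
    return {"type": "text", "text": text}


def image_message(image_bytes, mime="image/png"):
    """Build an image part carrying the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }


def goal_as_text(goal_object):
    """Join the text parts of a goal, ignoring any images."""
    return "\n".join(m["text"] for m in goal_object if m["type"] == "text")


# Placeholder bytes stand in for a real image file.
goal_object = [
    text_message("Buy the item shown in the image."),
    image_message(b"placeholder image bytes"),
]

print(goal_as_text(goal_object))
```

An agent that only handles text could fall back on something like goal_as_text() now that obs["goal"] is deprecated, at the cost of dropping the goal images.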
browsergym-visualwebarena
- Breaking changes
  - goal refactor #110, the goal is now a list of openai-style messages with goal images as base64 image_url messages.
- Fixes
browsergym-experiments
- Improvements
  - leaner trace files #169
other