Releases · ServiceNow/BrowserGym

v0.13.2: experiments updates

What's Changed

Experiment traces can now be exported into the TapeAgents format #238
Installs weblinx_browsergym as a dependency #261
WA/VWA full instance reset now only issues a warning instead of crashing if not properly set up #272
New debug benchmark visualwebarena_tiny #271

Full Changelog: v0.13.1...v0.13.2

15 Nov 18:26

github-actions

v0.13.1

a691656

v0.13.1: Many small fixes

What's Changed

browsergym-experiments

webarena / visualwebarena instance massage after reset #248 #250 #254 #259

browsergym-core

Fixed gym warnings "obs not within observation space" #251
Trace downgrades from INFO to DEBUG #252
More robust env.close(), can now be used in a finally block even after reset failure #253
Optional AbstractBrowserTask.teardown() method #255
BrowserGym's register_task() now supports both frozen, non-overrideable task_kwargs and overrideable default_task_kwargs arguments #255
More robust frame marking #256 #258
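
The distinction between frozen task_kwargs and overrideable default_task_kwargs can be illustrated with a small stand-alone sketch. This is not BrowserGym's implementation: make_task_factory and DummyTask are hypothetical names introduced here, and the error behaviour on an attempted override is an assumption.

```python
from typing import Any, Callable, Dict, Optional


def make_task_factory(
    task_class: Callable[..., Any],
    task_kwargs: Optional[Dict[str, Any]] = None,
    default_task_kwargs: Optional[Dict[str, Any]] = None,
) -> Callable[..., Any]:
    """Hypothetical helper: combine frozen and overrideable task arguments."""
    frozen = dict(task_kwargs or {})
    defaults = dict(default_task_kwargs or {})

    # An argument cannot be both frozen and overrideable.
    overlap = frozen.keys() & defaults.keys()
    if overlap:
        raise ValueError(f"both frozen and default: {sorted(overlap)}")

    def factory(**overrides: Any) -> Any:
        # Callers may replace defaults, but never frozen arguments (assumed behaviour).
        clash = frozen.keys() & overrides.keys()
        if clash:
            raise ValueError(f"cannot override frozen task_kwargs: {sorted(clash)}")
        return task_class(**{**defaults, **overrides, **frozen})

    return factory


class DummyTask:
    """Stand-in task class used only for this illustration."""

    def __init__(self, seed: int = 0, start_url: str = "about:blank"):
        self.seed = seed
        self.start_url = start_url


make_dummy = make_task_factory(
    DummyTask,
    task_kwargs={"seed": 42},  # frozen
    default_task_kwargs={"start_url": "https://example.com"},  # overrideable
)
task = make_dummy(start_url="https://example.org")  # overriding a default is allowed
```

In this sketch, calling make_dummy(seed=1) raises a ValueError because seed is frozen, while start_url can be changed freely per call.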

browsergym-assistantbench

Refactored AssistantBench mechanism for saving test predictions to JSON files #242

browsergym-webarena

Relaxed playwright<1.40 restriction #257

browsergym-visualwebarena

Relaxed playwright<1.40 restriction #257

Full Changelog: v0.13.0...v0.13.1

07 Nov 20:35

github-actions

v0.13.0

d53fd5e

v0.13.0: Minor updates

What's Changed

browsergym-core

More robust frame marking with lenient last try #245
Tasks can now choose their own locale and timezone_id #244

browsergym-experiments

Pre-download WebLINX data in prepare_backend() #226
Increase AssistantBench max_steps to 30 #244
Add select_option to webarena / visualwebarena default action set #247

browsergym-visualwebarena

Hide huggingface progress bar when downloading the visual evaluation model #241

browsergym-assistantbench

Set locale="en-US" and timezone_id="America/New_York"
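
As a rough illustration of per-task locale and timezone settings, the sketch below collects the values a task chose into a dictionary of browser context options. TaskLocaleSettings and context_options are hypothetical names, not part of BrowserGym; the only assumption about external libraries is that Playwright's browser.new_context() accepts locale and timezone_id keyword arguments.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskLocaleSettings:
    """Hypothetical container for a task's preferred browser settings."""

    locale: Optional[str] = None
    timezone_id: Optional[str] = None


def context_options(settings: TaskLocaleSettings) -> Dict[str, str]:
    """Keep only the settings the task actually chose.

    Unset values are dropped so the browser defaults apply.
    """
    candidates = {"locale": settings.locale, "timezone_id": settings.timezone_id}
    return {key: value for key, value in candidates.items() if value is not None}


# Values taken from the AssistantBench entry above.
assistantbench_like = TaskLocaleSettings(locale="en-US", timezone_id="America/New_York")
options = context_options(assistantbench_like)

# With Playwright installed, these could be passed on as:
#     context = browser.new_context(**options)
```

A task that leaves both fields unset yields an empty dictionary, so the browser context would be created with its defaults.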

Full Changelog: v0.12.0...v0.13.0

04 Nov 19:44

github-actions

v0.12.0

4f3c633

v0.12.0: VisualWebarena / WebLINX bugfixes

Bugfixes

browsergym-experiments

Fixes WebLINX task list #235
Refactors experiment ID generation #236
Adds VisualWebArena task dependencies #237 #239

browsergym-visualwebarena

Fixes VisualWebArena tasks with visual validation (missing captioning_fn in evaluator) #240
Adds a torch dependency (to run the captioning model) #240

Full Changelog: v0.11.3...v0.12.0

01 Nov 15:30

github-actions

v0.11.3

a9cfb46

v0.11.3: Minor fixes

Bugfixes

Fix duplicate depends_on in webarena metadata #228

Improvements

Easier webarena / visualwebarena setup (running nltk.download() at import time) #227
More robust full_reset() for webarena / visualwebarena #230
Removed ARIA extraction warnings #233
New benchmark configuration webarena_tiny #232

Full Changelog: v0.11.2...v0.11.3

30 Oct 20:25

github-actions

v0.11.2

62839c3

v0.11.2: Minor fix

Bugfixes

Add incomplete ExpResult.status #225

Full Changelog: v0.11.1...v0.11.2

30 Oct 19:29

github-actions

v0.11.1

7b8abd3

v0.11.1: Benchmark update

New features

Set max steps to 30 in webarena / visualwebarena benchmarks #214
Benchmark dependency graph utilities #220
Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224
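
To give an idea of what a benchmark dependency graph is useful for, here is a minimal sketch that orders tasks so each one runs after the tasks it depends on. It uses Python's standard graphlib module rather than BrowserGym's own utilities, and the depends_on mapping and task names are made up for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical metadata: task name -> names of the tasks it depends on.
depends_on: Dict[str, List[str]] = {
    "task.0": [],
    "task.1": ["task.0"],
    "task.2": ["task.0"],
    "task.3": ["task.1", "task.2"],
}


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return the tasks in an order where every dependency comes first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(depends_on)
```

Here task.0 always comes first and task.3 last; the relative order of task.1 and task.2 is not constrained by the graph.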

Bugfixes

Rename benchmark after subset_from_split() #221
ExpArgs.exp_dir sanitization #222
get_step_info() bugfix #223

Full Changelog: v0.11.0...v0.11.1

30 Oct 15:38

github-actions

v0.11.0

d33edf0

v0.11.0: WebLINX 🎉

New features

browsergym-experiments

New weblinx benchmark 🎉 #208 (thanks @xhluca)
New ExpResults.status() #219 (thanks @recursix)

browsergym-core

New hide_all_bids option in flatten_dom_to_str() and flatten_axtree_to_str() #212 (thanks @imenelydiaker)
Leaner Unicode() gym space #218

Bugfixes

Benchmark.prepare_backends() fixes #209

Full Changelog: v0.10.2...v0.11.0

Contributors

recursix, xhluca, and imenelydiaker

24 Oct 15:38

github-actions

v0.10.2

a9e44a8

v0.10.2: Benchmark update

New features

New Benchmark.prepare_backend() method #204

Bugfixes

save_step_info() bugfix when obs==None (truncated episode due to None action) #207

Full Changelog: v0.10.1...v0.10.2
