Releases: ServiceNow/BrowserGym

v0.11.3: minor fixes

01 Nov 15:30

Bugfixes

  • Fix duplicate depends_on in webarena metadata #228

Improvements

  • Easier webarena / visualwebarena setup (nltk.download() now runs at import time) #227
  • More robust full_reset() for webarena / visualwebarena #230
  • Removed ARIA extraction warnings #233
  • New benchmark configuration webarena_tiny #232

Full Changelog: v0.11.2...v0.11.3

v0.11.2

30 Oct 20:25
version bump 0.11.2

v0.11.1: Benchmark update

30 Oct 19:29

New features

  • Set max steps to 30 in webarena / visualwebarena benchmarks #214
  • Benchmark dependency graph utilities #220
  • Include nltk.download() in prepare_backend() for webarena / visualwebarena benchmarks #224
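
The notes above do not show what the dependency graph utilities (#220) look like. As a rough illustration of the idea only, and not the BrowserGym API, the sketch below orders tasks so that each one runs after the tasks it depends on. The `order_tasks` helper, the task names and the shape of the `depends_on` mapping are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_tasks(depends_on: dict[str, list[str]]) -> list[str]:
    """Return task names ordered so that dependencies come first.

    `depends_on` maps a task name to the tasks it depends on
    (a hypothetical layout, not BrowserGym's actual metadata format).
    """
    # dict.fromkeys drops duplicate dependency entries while keeping their order
    graph = {task: list(dict.fromkeys(deps)) for task, deps in depends_on.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical metadata: task.2 lists task.1 twice by mistake
metadata = {
    "task.2": ["task.1", "task.1"],
    "task.1": [],
    "task.3": ["task.2"],
}
order = order_tasks(metadata)
print(order)
```

A cycle in the mapping makes `static_order()` raise `graphlib.CycleError`, which is a cheap way to validate hand-written metadata.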

Bugfixes

  • Rename benchmark after subset_from_split() #221
  • ExpArgs.exp_dir sanitization #222
  • get_step_info() bugfix #223

Full Changelog: v0.11.0...v0.11.1

v0.11.0: WebLINX 🎉

30 Oct 15:38

New features

browsergym-experiments

browsergym-core

  • New hide_all_bids option in flatten_dom_to_str() and flatten_axtree_to_str() #212 (thanks @imenelydiaker)
  • Leaner Unicode() gym space #218

Bugfixes

  • Benchmark.prepare_backends() fixes #209

Full Changelog: v0.10.2...v0.11.0

v0.10.2: Benchmark update

24 Oct 15:38

New features

  • New Benchmark.prepare_backend() method #204

Bugfixes

  • save_step_info() bugfix when obs==None (truncated episode due to None action) #207

Full Changelog: v0.10.1...v0.10.2

v0.10.1: Benchmark updates

23 Oct 19:06

Minor changes

  • train / test splits for WorkArena L2 and L3 tasks #203
  • More fine-grained per-benchmark action sets #202

Full Changelog: v0.10.0...v0.10.1

v0.10.0: AssistantBench! 🎉

23 Oct 14:50

New features

  • New BrowserGym benchmark AssistantBench, packaged as browsergym-assistantbench. Thanks @oriyor! #186
    import gymnasium as gym
    import browsergym.assistantbench  # importing registers the assistantbench tasks
    
    env = gym.make("browsergym/assistantbench.validation.12")
    env = gym.make("browsergym/assistantbench.test.42")
  • Default train/test splits for all benchmarks
    miniwob = DEFAULT_BENCHMARKS["miniwob"]  # 125 tasks x 5 seeds
    miniwob_train = miniwob.subset_from_split("train")  # 62 tasks x 5 seeds
    miniwob_test = miniwob.subset_from_split("test")  # 63 tasks x 5 seeds

Breaking Changes

  • Various updates and refactors to the new Benchmark class #197 #198 #199

Fixes

  • Improved experiment logging #182

Full Changelog: v0.9.0...v0.10.0

v0.9.0: Benchmarks! 🎉

19 Oct 01:16

New features

  • Benchmarks with default config (tasks x seeds) and metadata #173 #191
    from browsergym.experiments import BENCHMARKS, Benchmark
    
    # make a custom benchmark
    benchmark = Benchmark(
      name="miniwob_click_test",
      high_level_action_set_args=HighLevelActionSetArgs(
        subsets=["bid"],
        multiaction=False,
        strict=False,
        retry_with_force=False,
        demo_mode="off",
      ),
      env_args_list=[
        EnvArgs(
          task_name="miniwob.click-test",
          task_seed=42,
          max_steps=5,
        )
      ],
    )
    
    # use a pre-existing benchmark
    miniwob = BENCHMARKS["miniwob_all"]()
    
    # use only a task subset
    miniwob_original = miniwob.subset_from_glob(
      column="miniwob_category", glob="original"
    )
  • New playwright key modifier "ControlOrMeta" #187
  • Global demo_mode flag #177
    import browsergym.core.action
    
    browsergym.core.action.set_global_demo_mode(True)  # boolean

Fixes

  • Multi-tab actions fix #188

Full Changelog: v0.8.1...v0.9.0

v0.8.1 - SoM bugfix

15 Oct 20:04

Fixes

browsergym-core

  • fixed a bug with set-of-marks line drawing #184 #185

v0.8.0: goal_object

08 Oct 21:40

browsergym-core

  • Breaking changes
    • goal refactor #110
      obs["goal_object"] now replaces the old obs["goal_image_urls"]
      obs["goal"] is now deprecated
      the new goal_object now contains a list of openai-style messages, which can include an arbitrary mix of text and / or images.
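
For agents that still expect a plain-text goal, the new format can be flattened in a few lines. This is a minimal sketch, assuming each entry of goal_object is an openai-style content part with a "type" key ("text" or "image_url"). The `goal_to_text` helper and the example goal are hypothetical and are not part of browsergym-core.

```python
from typing import Any


def goal_to_text(goal_object: list[dict[str, Any]]) -> str:
    """Join the text parts of a goal, marking where each image appears.

    Assumes openai-style content parts; this helper is illustrative only.
    """
    parts = []
    for message in goal_object:
        if message.get("type") == "text":
            parts.append(message["text"])
        elif message.get("type") == "image_url":
            # Keep a marker so the position of the image is not lost
            parts.append("[image]")
    return "\n".join(parts)


# Hypothetical goal, shaped like obs["goal_object"] as described above
example_goal = [
    {"type": "text", "text": "Find this product:"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]
print(goal_to_text(example_goal))
```

A multimodal agent would instead pass the list through to the model unchanged, since it is already in message-content form.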

browsergym-visualwebarena

  • Breaking changes

    • goal refactor #110, the goal is now a list of openai-style messages with goal images as base64 image_url messages.
  • Fixes

    • goal images are now self-hosted as part of the homepage #171 #165

browsergym-experiments

  • Improvements
    • leaner trace files #169

other

  • the legacy demo agent has been removed
  • the basic demo agent has been leaned out and upgraded to support the new goal_object format #110
  • other minor changes #166 #164