
Add customizable evaluation dimensions #256

Merged
merged 10 commits into from
Dec 8, 2024

Conversation

@bugsz (Contributor) commented Nov 28, 2024

This PR provides a way for people to customize the evaluation dimensions they want.
Currently, CustomEvaluationDimension specifies a single dimension, and CustomEvaluationDimensionList groups dimensions together.
To create a dimension, one can either use a dictionary directly or compose existing metrics by specifying their names.
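A minimal sketch of the two creation paths described above — a dimension defined from a plain dictionary, and a list that groups metrics by name. Only `range_low` appears in this PR's diff; the other field names (`name`, `description`, `range_high`) and the `dimension_names` key are assumptions for illustration:

```python
# Hypothetical sketch (not the sotopia API): field names other than
# range_low, which appears in this PR's diff, are assumptions.
transparency = {
    "name": "transparency",
    "description": "Rate how openly the agent shares its intentions.",
    "range_low": 1,
    "range_high": 10,
}

# Grouping: a list model could reference existing metrics purely by name.
custom_list = {
    "name": "my_dimensions",
    "dimension_names": [transparency["name"], "goal"],
}

print(custom_list["dimension_names"])
```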

TODOs:

  • Move the file to evaluator.py
  • Figure out the best practice for storing and using these dimensions (is specifying them by name the best way?)

Closes #

📑 Description

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed
  • Branch name follows type/descript (e.g. feature/add-llm-agents)
  • Ready for code review

ℹ Additional Information

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 38.09524% with 39 lines in your changes missing coverage. Please review.

Project coverage is 35.94%. Comparing base (dbd8294) to head (6155292).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sotopia/database/evaluation_dimensions.py | 37.09% | 39 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (dbd8294) and HEAD (6155292): HEAD has 1 upload less than BASE.

| Flag | BASE (dbd8294) | HEAD (6155292) |
| --- | --- | --- |
|  | 3 | 2 |
@@             Coverage Diff             @@
##             main     #256       +/-   ##
===========================================
- Coverage   74.47%   35.94%   -38.54%     
===========================================
  Files          61       50       -11     
  Lines        3162     2557      -605     
===========================================
- Hits         2355      919     -1436     
- Misses        807     1638      +831     
| Files with missing lines | Coverage Δ |
| --- | --- |
| sotopia/database/__init__.py | 95.45% <100.00%> (-4.55%) ⬇️ |
| sotopia/database/evaluation_dimensions.py | 37.09% <37.09%> (ø) |

... and 32 files with indirect coverage changes

@XuhuiZhou (Member) left a comment:

Great job, @bugsz can you also add relevant docs?

examples/experiment_eval.py (outdated; resolved)
return validator

@staticmethod
def generate_dimension_model(dimension_ids: list[str]) -> Type[BaseModel]:
Member:
what is this function used for?

Contributor (author):
create a validator for the evaluation metric?

Member:
why name it generate, then? Also, consider adding a docstring explanation here?
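Per the reply in this thread, the function creates a validator; the diff shows it returns a pydantic model type (`-> Type[BaseModel]`). A minimal stand-alone sketch of the range-checking idea — the dimension names and ranges are hypothetical, and this stand-in returns a plain function rather than a pydantic model:

```python
# Hypothetical sketch, not the sotopia implementation: build a validator from
# a set of dimensions, where each dimension maps a name to its
# (range_low, range_high) bounds.
def generate_dimension_model(dimensions):
    def validate(scores):
        for name, (low, high) in dimensions.items():
            value = scores[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return scores
    return validate

validate = generate_dimension_model({"goal": (1, 10)})
validate({"goal": 7})  # within range, passes
```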

)

@staticmethod
def generate_dimension_model_from_name(
Member:
Rename this? Also, can we get rid of the printing here? And add the existing names to the doc?

sotopia/database/evaluation_dimensions.py (two outdated threads; resolved)
range_low: int = Field(index=True)


class CustomEvaluationDimensionList(JsonModel):
Member:
Can we use this as a model to save a set of dimensions? E.g., given the name sotopia, it would automatically retrieve all the original sotopia dimensions, ready to use.

Contributor (author):
Yes that is what I am thinking. Do we want to allow different evaluation metrics to have the same name?

Member:
What do you mean? E.g., ?

Contributor (author):
For example, there could be an original sotopia dimension and a refined one with the same name, say:
[old] goal: provide a goal score of 1-10, where a higher score indicates higher completion
[new] goal: provide a goal score of 1-10, where 1-3: xxx, 4-6: yyy

I think this is something we do not want to see, but sometimes we might need to have these two at the same time?
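One way the duplicate-name question could be resolved is to enforce uniqueness at registration time; a hypothetical sketch (not sotopia code — allowing both the [old] and [new] "goal" variants would instead require versioning or namespacing the name):

```python
# Hypothetical registry illustrating the trade-off discussed above:
# reject duplicate dimension names outright.
def register_dimension(registry, dimension):
    name = dimension["name"]
    if name in registry:
        raise ValueError(f"dimension {name!r} already exists")
    registry[name] = dimension
    return registry

registry = {}
register_dimension(registry, {"name": "goal", "description": "old goal prompt"})
```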

@ProKil (Member) commented Dec 4, 2024

@bugsz Could you fix the mypy tests first?

@bugsz (Contributor, author) commented Dec 4, 2024

> @bugsz Could you fix the mypy tests first?

Fixed. Will update the doc

@XuhuiZhou (Member) left a comment:

  • Add pytests, and be sure to only merge into /demo instead of /main

  • Make sure to update the API doc?

  • This change would break a lot of places!! Please update them accordingly (basically, check anywhere mentioning ReachGoalLLMEvaluator?)
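Per the last bullet, one way to audit the breakage is a repository-wide search for the evaluator name; a minimal shell sketch (the temporary directory and sample file are illustrative stand-ins for a real checkout):

```shell
# Create an illustrative file, then search it the way one would audit a
# real checkout for every mention of ReachGoalLLMEvaluator.
mkdir -p /tmp/sotopia_audit
cat > /tmp/sotopia_audit/sample.py <<'EOF'
evaluator = ReachGoalLLMEvaluator(model_name="gpt-4o")
EOF
grep -rln "ReachGoalLLMEvaluator" /tmp/sotopia_audit
```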


return validator

@staticmethod
def generate_dimension_model(dimension_ids: list[str]) -> Type[BaseModel]:

Member:
remove this then?

@@ -108,6 +108,20 @@ def _iterate_env_agent_combo_not_in_db(
env_ids: list[str] = [],
tag: str | None = None,
) -> Generator[EnvAgentCombo[Observation, AgentAction], None, None]:
# method 1 for loading evaluation metric
evaluation_dimensions = (
Member:
It is not a good idea to write comments like this (remove "method 1"?). Move it as an example to the doc?

@XuhuiZhou XuhuiZhou changed the base branch from main to demo December 7, 2024 22:14
@@ -0,0 +1,92 @@
## Overview

Evaluation dimensions are used to evaluate the quality of social interactions.
Member:

Make sure to mention that they can use SotopiaDimension here as well, and that people don't need to initialize the database when using it.

Member:
Either move the content from docs/pages/concepts/evaluation_dimension.md to here, or remove this file?

Member:
And don't forget to update the API doc? You can use ChatGPT to do that for you, but it's good that we have it.

)
custom_dimension.save()
pk = custom_dimension.pk
dimension = CustomEvaluationDimension(uuid_str=pk)
Member:

CustomEvaluationDimension(uuid_str=pk) is not how you fetch data from the Redis database.
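The later fix commit in this PR switches the test to `CustomEvaluationDimension.get(pk)`; a minimal in-memory stand-in of that save/get contract (`FakeDimension` is hypothetical and only mimics the round trip, not the sotopia class or redis_om):

```python
# redis_om JsonModel subclasses are fetched with Model.get(pk), not by
# passing the pk back into the constructor. FakeDimension is a hypothetical
# in-memory stand-in that mimics only the save()/get(pk) round trip.
class FakeDimension:
    _store = {}

    def __init__(self, **fields):
        self.__dict__.update(fields)
        self.pk = f"pk-{len(FakeDimension._store)}"

    def save(self):
        FakeDimension._store[self.pk] = self
        return self

    @classmethod
    def get(cls, pk):
        return cls._store[pk]  # raises KeyError if the pk was never saved

custom_dimension = FakeDimension(name="transparency", range_low=1, range_high=10)
custom_dimension.save()
dimension = FakeDimension.get(custom_dimension.pk)  # correct fetch pattern
```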

XuhuiZhou and others added 3 commits December 8, 2024 15:02
…(pk) (#262)

Co-authored-by: openhands <openhands@all-hands.dev>
* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk)

* Update documentation for SotopiaDimension and EvaluationDimensionBuilder

* [autofix.ci] apply automated fixes

* Add API documentation for evaluation dimensions

* Refine API documentation for evaluation_dimensions.py to match style

* [autofix.ci] apply automated fixes

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@XuhuiZhou XuhuiZhou changed the base branch from demo to main December 8, 2024 20:43
@XuhuiZhou XuhuiZhou changed the base branch from main to demo December 8, 2024 20:49
@XuhuiZhou XuhuiZhou merged commit 5a9f4b7 into demo Dec 8, 2024
1 check passed