fix: pricing integration tests -> trying more runs for cache and retries #3546
michaelneale merged 3 commits into main
Personally I think some of these tests don't add a lot of value, and we should delete all except for 'model_in_open_router', 'model_not_in_open_router' and maybe the concurrent one.
@jsibbison-square agree - yeah I don't think they add a lot, could drop some of them?
Hi @michaelneale and @jsibbison-square, I had a look into the reason for the build failure here. I think the following is why the test is not stable. In our tests, we use … With the above code, each test created the same directory as its cache dir, because they share the same process id and the same environment variable. Each test case sets the cache_dir first and then removes the dir. However, since they all use the same dir name, the test cases end up sharing files, which causes failures when they run in parallel. I guess the options to fix the instability and ensure test isolation are:
I also went through the test cases (as integration tests)
I feel that using external services in the test cases also brings instability. Maybe some of them can be converted to unit tests, and we can keep a single integration test.
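To illustrate the collision described above, here is a minimal sketch of a cache directory name that is unique per call rather than per process. It is not the code from this PR: `unique_cache_dir`, `NEXT_DIR_ID` and the `label` prefix are hypothetical names, and only the standard library is used. A name built from the process id alone is identical for every test in the same test binary, so a per-process counter is added to the name.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter: hands out a distinct suffix to each caller in this process.
static NEXT_DIR_ID: AtomicUsize = AtomicUsize::new(0);

/// Creates a cache directory that is unique per call, not just per process.
/// `label` is an illustrative prefix, not a name taken from the real tests.
fn unique_cache_dir(label: &str) -> io::Result<PathBuf> {
    // The process id alone is shared by all tests running in parallel threads;
    // the counter is what separates one test's directory from another's.
    let id = NEXT_DIR_ID.fetch_add(1, Ordering::SeqCst);
    let dir = std::env::temp_dir().join(format!("{}_{}_{}", label, process::id(), id));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}
```

Each test would call `unique_cache_dir` once and remove only its own directory afterwards. The `tempfile` crate offers a temporary directory type that also cleans up on drop, which is likely the simpler choice in a real test suite.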
Even some retries on external calls may be OK. Testing the cache as a one-shot is not right; it needs to run for some iterations (although a comparison at the microsecond level does seem a bit meaningless for that test).
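As a rough sketch of what "some retries" on an external call could look like, the helper below reruns a fallible operation a bounded number of times. The name `retry` and its parameters are assumptions made for illustration, not the helper used in this PR.

```rust
use std::thread;
use std::time::Duration;

/// Runs `op` until it succeeds or `max_attempts` attempts have been made,
/// sleeping `delay` between failed attempts. `op` always runs at least once.
/// Returns the first success, or the last error once attempts are exhausted.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

A test could wrap only the network-dependent step in `retry`, so that a genuine assertion failure is still reported on the first run instead of being retried away.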
Thanks @lifeizhou-ap, switched to using a proper tempfile, which should help with some of the failures. It still has the retries in there and gives the cache a good number of runs to ensure it gets faster, to stick to the intent of the test (others are in there).
These tests seem to fail fairly often, for two reasons: they depend on the availability of an external API, and the cache test compares timings at the microsecond level. This change adds retries and runs the cached path for many iterations to check that caching is faster (it is a bit of an odd test to run in CI in the first place).
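The idea of timing many cached iterations instead of a single one can be sketched as follows. Everything here is a hypothetical stand-in: `CachedLookup` simulates a slow fetch with a short sleep, and `time_runs` measures total wall-clock time; neither is taken from the actual pricing code.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Total wall-clock time taken by `iterations` calls of `op`.
fn time_runs(iterations: u32, mut op: impl FnMut()) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed()
}

/// Hypothetical stand-in for a cached pricing lookup.
struct CachedLookup {
    cache: HashMap<String, f64>,
    misses: u32,
}

impl CachedLookup {
    fn new() -> Self {
        CachedLookup { cache: HashMap::new(), misses: 0 }
    }

    /// Returns the cached value if present; otherwise simulates a slow fetch.
    fn get(&mut self, model: &str) -> f64 {
        if let Some(price) = self.cache.get(model) {
            return *price;
        }
        self.misses += 1;
        thread::sleep(Duration::from_millis(2)); // simulated slow external call
        let price = 1.0; // placeholder value, not real pricing data
        self.cache.insert(model.to_string(), price);
        price
    }
}
```

A test built on this would time one cold call, then time, say, 100 warm calls and compare the warm average (`warm / 100`) with the cold time. Averaging over many runs makes the comparison less sensitive to microsecond-level noise, and asserting on `misses` gives a check that does not depend on timing at all.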