Skip to content

Limit what files and folders FileSurfer can access.#6024

Merged
afourney merged 7 commits intomainfrom
filesurfer_root
Mar 20, 2025
Merged

Limit what files and folders FileSurfer can access.#6024
afourney merged 7 commits intomainfrom
filesurfer_root

Conversation

@afourney
Copy link
Member

Optionally limit what files and folders FileSurfer can access (constraining it to a subtree of the FS).

This is not a replacement for Docker sandboxing, but can be used in conjunction with sandboxing to help prevent FileSurfer from accessing sensitive files.

@codecov
Copy link

codecov bot commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 59.52381% with 17 lines in your changes missing coverage. Please review.

Project coverage is 76.71%. Comparing base (262c74f) to head (b15d886).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...n_ext/agents/file_surfer/_markdown_file_browser.py 59.52% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6024      +/-   ##
==========================================
- Coverage   76.73%   76.71%   -0.03%     
==========================================
  Files         191      191              
  Lines       13170    13191      +21     
==========================================
+ Hits        10106    10119      +13     
- Misses       3064     3072       +8     
Flag Coverage Δ
unittests 76.71% <59.52%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@husseinmozannar husseinmozannar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is working, is there a change for how filesurfer is instantiated?

    file_surfer = FileSurfer(
        name="file_surfer",
        model_client=model_client_file_surfer,
        base_path="home/path1"
    )

I was able to get file_surfer to read files in path "home"

@afourney
Copy link
Member Author

afourney commented Mar 20, 2025

Not sure this is working, is there a change for how filesurfer is instantiated?

    file_surfer = FileSurfer(
        name="file_surfer",
        model_client=model_client_file_surfer,
        base_path="home/path1"
    )

I was able to get file_surfer to read files in path "home"

This is the correct signature for creating the FileSurfer.
I am not able to replicate this behavior.

My minimal test code is:

import asyncio
import sys
from autogen_agentchat.agents import UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.file_surfer import FileSurfer

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    file_surfer = FileSurfer(
        name="FileSurfer",
        model_client = model_client,
        base_path=None #"/home/afourney/repos/autogen/python"
    )
    user_proxy = UserProxyAgent("user_proxy")
    termination = TextMentionTermination("exit", sources=["user_proxy"])
    team = RoundRobinGroupChat([file_surfer, user_proxy], termination_condition=termination)
    try:
        # Start the team and wait for it to terminate.
        await Console(team.run_stream(task=sys.argv[1]))
    finally:
        await file_surfer.close()
        await model_client.close()

asyncio.run(main())

@husseinmozannar
Copy link
Contributor

Not sure this is working, is there a change for how filesurfer is instantiated?

    file_surfer = FileSurfer(
        name="file_surfer",
        model_client=model_client_file_surfer,
        base_path="home/path1"
    )

I was able to get file_surfer to read files in path "home"

This is the correct signature for creating the FileSurfer. I am not able to replicate this behavior.

My minimal test code is:

import asyncio
import sys
from autogen_agentchat.agents import UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.file_surfer import FileSurfer

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    file_surfer = FileSurfer(
        name="FileSurfer",
        model_client = model_client,
        base_path=None #"/home/afourney/repos/autogen/python"
    )
    user_proxy = UserProxyAgent("user_proxy")
    termination = TextMentionTermination("exit", sources=["user_proxy"])
    team = RoundRobinGroupChat([file_surfer, user_proxy], termination_condition=termination)
    try:
        # Start the team and wait for it to terminate.
        await Console(team.run_stream(task=sys.argv[1]))
    finally:
        await file_surfer.close()
        await model_client.close()

asyncio.run(main())

I tested again with a few different paths and it's now giving file not found errors which means it is working correctly right? I think in my earlier testing I messed up the paths.

Another thing which might be worth possibly doing is making base_path look like path "." in the filesurfer prompt.

For instance, I start filesurfer with base_path="/home/USER/autogen/python"

if I want it to open "/home/USER/autogen/python/autogen/README.md" it has to do:

[FunctionCall(id='CALL ID', arguments='{"path":"/home/USER/autogen/python/autogen/README.md"}', name='open_path')]

ideally it would be: arguments='{"path":"./autogen/README.md"}'

This also avoids having the LLM know your exact file structure if you want to hide that.

@afourney
Copy link
Member Author

Not sure this is working, is there a change for how filesurfer is instantiated?

    file_surfer = FileSurfer(
        name="file_surfer",
        model_client=model_client_file_surfer,
        base_path="home/path1"
    )

I was able to get file_surfer to read files in path "home"

This is the correct signature for creating the FileSurfer. I am not able to replicate this behavior.
My minimal test code is:

import asyncio
import sys
from autogen_agentchat.agents import UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.file_surfer import FileSurfer

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    file_surfer = FileSurfer(
        name="FileSurfer",
        model_client = model_client,
        base_path=None #"/home/afourney/repos/autogen/python"
    )
    user_proxy = UserProxyAgent("user_proxy")
    termination = TextMentionTermination("exit", sources=["user_proxy"])
    team = RoundRobinGroupChat([file_surfer, user_proxy], termination_condition=termination)
    try:
        # Start the team and wait for it to terminate.
        await Console(team.run_stream(task=sys.argv[1]))
    finally:
        await file_surfer.close()
        await model_client.close()

asyncio.run(main())

I tested again with a few different paths and it's now giving file not found errors which means it is working correctly right? I think in my earlier testing I messed up the paths.

Another thing which might be worth possibly doing is making base_path look like path "." in the filesurfer prompt.

For instance, I start filesurfer with base_path="/home/USER/autogen/python"

if I want it to open "/home/USER/autogen/python/autogen/README.md" it has to do:

[FunctionCall(id='CALL ID', arguments='{"path":"/home/USER/autogen/python/autogen/README.md"}', name='open_path')]

ideally it would be: arguments='{"path":"./autogen/README.md"}'

This also avoids having the LLM know your exact file structure if you want to hide that.

Yeah, I did FileNotFound errors to not leak information -- but I'm not sure that's strictly necessary. An out of bounds error could trigger for any path outside the base, regardless of if it exists.

As for truncating the path... that's a good idea. We leak information otherwise.

@afourney
Copy link
Member Author

Not sure this is working, is there a change for how filesurfer is instantiated?

    file_surfer = FileSurfer(
        name="file_surfer",
        model_client=model_client_file_surfer,
        base_path="home/path1"
    )

I was able to get file_surfer to read files in path "home"

This is the correct signature for creating the FileSurfer. I am not able to replicate this behavior.
My minimal test code is:

import asyncio
import sys
from autogen_agentchat.agents import UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.file_surfer import FileSurfer

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    file_surfer = FileSurfer(
        name="FileSurfer",
        model_client = model_client,
        base_path=None #"/home/afourney/repos/autogen/python"
    )
    user_proxy = UserProxyAgent("user_proxy")
    termination = TextMentionTermination("exit", sources=["user_proxy"])
    team = RoundRobinGroupChat([file_surfer, user_proxy], termination_condition=termination)
    try:
        # Start the team and wait for it to terminate.
        await Console(team.run_stream(task=sys.argv[1]))
    finally:
        await file_surfer.close()
        await model_client.close()

asyncio.run(main())

I tested again with a few different paths and it's now giving file not found errors which means it is working correctly right? I think in my earlier testing I messed up the paths.
Another thing which might be worth possibly doing is making base_path look like path "." in the filesurfer prompt.
For instance, I start filesurfer with base_path="/home/USER/autogen/python"
if I want it to open "/home/USER/autogen/python/autogen/README.md" it has to do:
[FunctionCall(id='CALL ID', arguments='{"path":"/home/USER/autogen/python/autogen/README.md"}', name='open_path')]
ideally it would be: arguments='{"path":"./autogen/README.md"}'
This also avoids having the LLM know your exact file structure if you want to hide that.

Yeah, I did FileNotFound errors to not leak information -- but I'm not sure that's strictly necessary. An out of bounds error could trigger for any path outside the base, regardless of if it exists.

As for truncating the path... that's a good idea. We leak information otherwise.

@husseinmozannar I'm going to do the relative path stuff in another PR. For various reasons, it's a little more complex/deeper integrated, and I want to take care to do that right. At least for now, it's self-consistent and closes one avenue of risk.

@afourney afourney merged commit ecdb74b into main Mar 20, 2025
56 of 57 checks passed
@afourney afourney deleted the filesurfer_root branch March 20, 2025 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants