Skip to content

Add SWE-Playground Trajectories Dataset#161

Merged
yueqis merged 7 commits intoneulab:mainfrom
zhu-yiqi:main
Dec 30, 2025
Merged

Add SWE-Playground Trajectories Dataset#161
yueqis merged 7 commits intoneulab:mainfrom
zhu-yiqi:main

Conversation

@zhu-yiqi
Copy link
Contributor

Description

This PR adds the SWE-Play-trajectories dataset to the Agent Data Protocol repository. The dataset contains 704 high-quality software engineering task trajectories generated by AI agents based on OpenHands, providing valuable training data for agent fine-tuning.

Type of Change

  • New dataset
  • New agent
  • Bug fix
  • Documentation update
  • Other

Dataset Information

  • Name: SWE-Play-trajectories
  • Source: StephenZhu/SWE-Play-trajectories on HuggingFace
  • Size: 704 trajectories
  • Domain: Software Engineering
  • Format: Conversation-based trajectories with system/user/assistant messages
  • Task Types: Programming tasks including code editing, debugging, and file operations

Implementation Details

Files Added

The implementation follows the standard ADP dataset structure with all required files in datasets/swe-play-trajectories/:

  • README.md - Dataset documentation
  • schema_raw.py - Raw data schema definition
  • api.py - API function definitions (str_replace_editor, think, finish)
  • extract_raw.py - HuggingFace dataset extraction script
  • raw_to_standardized.py - Conversion to ADP standardized format
  • std_to_sft.py - Dataset-specific SFT conversion
  • sample_raw.json - 5 raw data samples (~1MB)
  • sample_std.json - 5 standardized format samples (~1.8MB)
  • sample_sft/sample_sft_openhands.json - OpenHands SFT format samples (~1.1MB)
  • sample_sft/sample_sft_sweagent.json - SWE-agent SFT format samples (~1MB)
  • sample_sft/sample_sft_agentlab.json - AgentLab placeholder

Schema Conversion

The dataset utilizes the following ADP schema components:

Actions:

  • ApiAction: Structured tool calls (str_replace_editor, think, finish)
  • CodeAction: Code execution (execute_bash, execute_ipython_cell)
  • MessageAction: Plain text responses without tool calls

Observations:

  • TextObservation (source="user"): Initial task descriptions
  • TextObservation (source="environment"): Execution results from tool calls

Key Features

  1. Comprehensive Parsing: Implements custom function call parser to extract structured actions from XML-like format (<function=name>...</function>)

  2. Multi-Agent Support: Includes SFT format samples for:

    • OpenHands
    • SWE-agent
    • AgentLab (placeholder)
  3. Proper Schema Validation: Uses Pydantic models for both raw and standardized formats

  4. API Definitions: Defines three core API functions used in the dataset:

    • str_replace_editor: File viewing, creation, and editing
    • think: Step-by-step reasoning
    • finish: Task completion

Testing

  • All required files are present and properly structured
  • Raw data extraction script works correctly
  • Standardized format conversion passes validation
  • SFT format conversion produces valid output for multiple agents
  • Sample files generated and validated
  • Schema validation passes with Pydantic models

Checklist

  • Code follows PEP 8 style guidelines
  • Type hints included where appropriate
  • Docstrings added for all functions
  • README documentation is comprehensive
  • All required sample files present and validated
  • No sensitive data included
  • Pre-commit hooks pass (ruff, mypy)
  • Integration with existing agent conversion scripts verified

Pre-commit File Size Note

⚠️ Important: The sample files in this dataset exceed the 500KB pre-commit file size limit:

  • sample_raw.json: ~1MB
  • sample_std.json: ~1.8MB
  • sample_sft_openhands.json: ~1.1MB
  • sample_sft_sweagent.json: ~1MB

The file size check was bypassed during commit because:

  1. These are legitimate sample files (5 samples as required by ADP guidelines)
  2. The trajectories are long and detailed, containing extensive tool usage and multi-step reasoning
  3. Software engineering tasks naturally produce verbose trajectories with code content
  4. The file sizes are necessary to demonstrate the full conversion pipeline

Reviewers may want to consider either:

  • Allowing this exception for datasets with naturally large trajectories
  • Updating the pre-commit file size limit for sample files
  • Reducing the number of samples (though 5 is the recommended minimum)

@neubig neubig requested a review from yueqis December 28, 2025 21:53
@yueqis
Copy link
Contributor

yueqis commented Dec 29, 2025

Could you explain a bit on where you need the std_to_sft.py file in the dataset's directory? If this is not needed, could you delete it?

@zhu-yiqi
Copy link
Contributor Author

Could you explain a bit on where you need the std_to_sft.py file in the dataset's directory? If this is not needed, could you delete it?

This is by mistake. I have deleted it.

@yueqis
Copy link
Contributor

yueqis commented Dec 30, 2025

Could you fix the checks? Thanks!

@zhu-yiqi
Copy link
Contributor Author

Could you fix the checks? Thanks!

Done!

@yueqis yueqis merged commit 307f3cb into neulab:main Dec 30, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants