Description
I'm trying to use Seed-Coder for Fill-in-the-Middle (FIM) inference and want to understand how to format prompts that include multiple imported files alongside the current file.
According to the paper, you handled multi-file context in several ways across training and evaluation:
- Repository-level data with topological concatenation based on file dependencies
- Commits data with BM25-retrieved top-5 relevant files
- CrossCode evaluation with cross-file context
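As a rough illustration of what dependency-ordered concatenation could look like, here is a minimal sketch. It is an assumption about the general idea, not the paper's actual pipeline: the helper name `concat_in_dependency_order`, the `# path` header comments, and the hand-written dependency map are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def concat_in_dependency_order(files, deps):
    """Concatenate files so that every file appears after the files it imports.

    files: dict mapping path -> source text
    deps:  dict mapping path -> set of paths that file imports
    Both the "# path" header and the blank-line separator are illustrative
    choices, not a documented Seed-Coder format.
    """
    graph = {path: deps.get(path, set()) for path in files}
    # static_order() yields each node only after all of its predecessors.
    order = [p for p in TopologicalSorter(graph).static_order() if p in files]
    text = "\n\n".join(f"# {path}\n{files[path]}" for path in order)
    return order, text


files = {
    "main.py": "from utils import helper_function\n",
    "utils.py": "def helper_function():\n    return 'helper'\n",
    "config.py": "DEBUG = True\n",
}
deps = {"main.py": {"utils.py", "config.py"}}
order, text = concat_in_dependency_order(files, deps)
```

With this input, `main.py` comes last because both of its dependencies must precede it; the relative order of `utils.py` and `config.py` is not constrained.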
However, I couldn't find clear documentation on how to format the inference prompt when I have:
- A current file with imports
- Multiple imported files that I want to include as context
- A specific location where I want the model to fill in code
Example

# utils.py
def helper_function():
    return "helper"

class DataProcessor:
    def process(self, data):
        return data.upper()

# config.py
DATABASE_URL = "sqlite:///app.db"
DEBUG = True

# main.py (current file)
from utils import helper_function, DataProcessor
from config import DATABASE_URL

def main():
    # <-- WANT TO FILL HERE
    return result

Request
Could you provide:
- Official guidance on multi-file FIM prompt formatting
- Examples of best practices
- Any specific format that was used during training
- Token limit recommendations when including multiple files
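To make the request concrete, here is one possible layout, offered purely as a guess and not taken from the paper or the repository: imported files are prepended under a path comment, and the current file is split at the fill location into a prefix and a suffix. The sentinel strings are placeholders rather than Seed-Coder's real special tokens, the prefix-suffix-middle ordering is an assumption, and `build_fim_prompt` is a hypothetical helper.

```python
# Placeholder sentinels: substitute the model's actual FIM special tokens.
FIM_PREFIX = "<FIM_PREFIX>"
FIM_SUFFIX = "<FIM_SUFFIX>"
FIM_MIDDLE = "<FIM_MIDDLE>"
CURSOR = "# <-- WANT TO FILL HERE"


def build_fim_prompt(context_files, current_path, current_source, cursor=CURSOR):
    """Build a prefix-suffix-middle prompt with cross-file context in the prefix.

    context_files: list of (path, source) pairs, in the order they should appear.
    The "# path" headers and the PSM ordering are assumptions for illustration.
    """
    if cursor not in current_source:
        raise ValueError("cursor marker not found in current file")
    before, after = current_source.split(cursor, 1)
    context = "".join(f"# {path}\n{src.rstrip()}\n\n" for path, src in context_files)
    prefix = f"{context}# {current_path}\n{before}"
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{after}{FIM_MIDDLE}"


utils_src = (
    'def helper_function():\n    return "helper"\n\n'
    "class DataProcessor:\n    def process(self, data):\n        return data.upper()\n"
)
config_src = 'DATABASE_URL = "sqlite:///app.db"\nDEBUG = True\n'
main_src = (
    "from utils import helper_function, DataProcessor\n"
    "from config import DATABASE_URL\n\n"
    "def main():\n    # <-- WANT TO FILL HERE\n    return result\n"
)
prompt = build_fim_prompt(
    [("utils.py", utils_src), ("config.py", config_src)], "main.py", main_src
)
```

Whether the real format uses file-path headers, a dedicated file-separator token, or a different sentinel order is exactly what this issue asks to have confirmed.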
Thank you!