
How to format FIM prompts with multiple imported files for context? #12

@klakovsky

Description


I'm trying to use Seed-Coder for Fill-in-the-Middle inference and want to understand the proper way to format prompts when including multiple files from imports along with the current file.

According to the paper, multi-file context was handled in several ways across training and evaluation:

  • Repository-level data with topological concatenation based on file dependencies
  • Commits data with BM25-retrieved top-5 relevant files
  • CrossCodeEval evaluation with cross-file context
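For the commits setup, my reading of "BM25-retrieved top-5 relevant files" is roughly the Okapi BM25 ranking below. This is my own minimal sketch, not code from the paper: the regex tokenizer, the choice of query text and the parameters `k1=1.5`, `b=0.75` are all assumptions, and `bm25_top_k` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive identifier tokenizer, lowercased (assumption, not the paper's).
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def bm25_top_k(query: str, docs: dict[str, str], k: int = 5,
               k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return the names of the k docs with the highest Okapi BM25 score."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / max(n_docs, 1)
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = tokenize(query)

    scores = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # term absent from this doc contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[name] = score
    # Highest score first; ties broken by name so the result is deterministic.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]
```

Calling `bm25_top_k(current_file_text, repo_files)` returns up to five file names, best match first. What text was actually used as the query during training is something I could not find.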

However, I couldn't find clear documentation on how to format the inference prompt when I have:

  1. A current file with imports
  2. Multiple imported files that I want to include as context
  3. A specific location where I want the model to fill in code
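To make the three points concrete, this is roughly how I am assembling the prompt today. It is a sketch under my own assumptions: the sentinel strings, the suffix-prefix-middle order and the `# path` header before each file are guesses that should be checked against the model's tokenizer and README, and `build_fim_prompt` is a hypothetical helper.

```python
# Hypothetical sentinel strings: verify against the model's tokenizer/README.
FIM_PREFIX = "<[fim-prefix]>"
FIM_SUFFIX = "<[fim-suffix]>"
FIM_MIDDLE = "<[fim-middle]>"


def build_fim_prompt(context_files: list[tuple[str, str]],
                     current_path: str, current_code: str, cursor: int) -> str:
    """Concatenate context files, then the current file split at `cursor`.

    context_files: (path, source) pairs, already in the desired order.
    cursor: character offset in current_code where the model should fill.
    """
    # Each context file is introduced by a "# path" comment (my own
    # convention, not a documented one) and separated by a blank line.
    context = "".join(f"# {path}\n{source.rstrip()}\n\n"
                      for path, source in context_files)
    prefix = context + f"# {current_path}\n" + current_code[:cursor]
    suffix = current_code[cursor:]
    # Suffix-prefix-middle (SPM) order; the model generates the middle.
    return f"{FIM_SUFFIX}{suffix}{FIM_PREFIX}{prefix}{FIM_MIDDLE}"
```

Putting the imported files at the front of the prefix means the code right before the cursor sits immediately before the middle token, where generation starts. Whether this matches how repository-level data was laid out during training is what I would like confirmed.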

Example

```python
# utils.py
def helper_function():
    return "helper"

class DataProcessor:
    def process(self, data):
        return data.upper()

# config.py
DATABASE_URL = "sqlite:///app.db"
DEBUG = True

# main.py (current file)
from utils import helper_function, DataProcessor
from config import DATABASE_URL

def main():
    # <-- WANT TO FILL HERE
    return result
```
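For the example above, a dependency-ordered ("topological") concatenation would put `utils.py` and `config.py` before `main.py`. Below is a minimal sketch of how I derive that order with the standard-library `ast` and `graphlib` modules (Python 3.9+). It assumes a flat layout where `utils.py` is imported as `utils`; packages, relative imports and whatever dependency analysis the paper actually used are not covered, and `local_imports`/`dependency_order` are hypothetical helpers.

```python
import ast
from graphlib import TopologicalSorter


def local_imports(source: str, modules: set[str]) -> set[str]:
    """Top-level names from `modules` that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue  # not an import (or a bare relative "from . import x")
        for name in names:
            top = name.split(".")[0]
            if top in modules:
                found.add(top)
    return found


def dependency_order(files: dict[str, str]) -> list[str]:
    """Order file paths so each file comes after the local files it imports."""
    modules = {path.removesuffix(".py"): path for path in files}
    # Map each file to the set of files it depends on (its predecessors).
    graph = {
        path: {modules[m] for m in local_imports(src, set(modules))}
        for path, src in files.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

For the three files here, `dependency_order` returns `main.py` last, with `utils.py` and `config.py` before it in no guaranteed relative order. Import cycles raise `graphlib.CycleError`, and I don't know how the training pipeline handled such cycles.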

Request

Could you provide:

  • Official guidance on multi-file FIM prompt formatting
  • Examples of best practices
  • Any specific format that was used during training
  • Token limit recommendations when including multiple files
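On the token-limit question, my current workaround is to rank the imported files by relevance and keep as many as fit in a budget. The helper below is a hypothetical sketch: the whitespace-based count is a crude stand-in (real counts should come from the model's tokenizer), and the budget would be whatever remains of the context window after the current file's prefix/suffix and the tokens reserved for generation.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption).
    return len(text.split())


def fit_context(context_files: list[tuple[str, str]], budget: int,
                count_tokens: Callable[[str], int] = whitespace_token_count,
                ) -> list[tuple[str, str]]:
    """Greedily keep (path, source) pairs, most relevant first, within budget."""
    kept, used = [], 0
    for path, source in context_files:
        cost = count_tokens(source)
        if used + cost > budget:
            continue  # skip this file; a smaller later one may still fit
        kept.append((path, source))
        used += cost
    return kept
```

Skipping a file that does not fit, rather than stopping, lets a smaller, lower-ranked file still be included; truncating large files instead of dropping them would be another option.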

Thank you!
