Repo Map Accuracy #45

@dwash96

The repository map works as follows:

  • Tree-sitter extracts a list of definition and reference identifiers per file.
  • These identifiers are collected into larger maps of identifier -> definer files and identifier -> reference files.
  • A graph is computed from these maps (which introduced duplication, as seen here).
  • The weight multipliers used in this graph have fairly naive static values (number of references, number of definers).
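
To make the steps above concrete, here is a minimal sketch of the map-building and graph stages. It is not the project's actual code: the `(file, identifier, kind)` tag shape and the function names `build_identifier_maps` and `build_edges` are assumptions made for illustration, and the edge weight is simply the raw reference count, standing in for the "naive static" weighting described above.

```python
from collections import Counter, defaultdict


def build_identifier_maps(tags):
    """Group tags into identifier -> definer files and identifier -> reference files.

    `tags` is a hypothetical iterable of (file, identifier, kind) tuples,
    where kind is "def" or "ref".
    """
    definers = defaultdict(set)     # identifier -> files that define it
    references = defaultdict(list)  # identifier -> referencing files, repeats kept
    for path, ident, kind in tags:
        if kind == "def":
            definers[ident].add(path)
        elif kind == "ref":
            references[ident].append(path)
    return definers, references


def build_edges(definers, references):
    """Return {(referencing_file, defining_file): weight}.

    Accumulating into a Counter keyed by the file pair merges what would
    otherwise be duplicate edges between the same two files.
    """
    edges = Counter()
    for ident, ref_files in references.items():
        ref_counts = Counter(ref_files)
        for definer in definers.get(ident, ()):
            for referencer, count in ref_counts.items():
                edges[(referencer, definer)] += count
    return edges


tags = [
    ("a.py", "foo", "def"),
    ("b.py", "foo", "ref"),
    ("b.py", "foo", "ref"),
    ("c.py", "foo", "ref"),
    ("b.py", "bar", "def"),
    ("a.py", "bar", "ref"),
]
definers, references = build_identifier_maps(tags)
edges = build_edges(definers, references)
```

Keying the edges by file pair is one possible way to avoid the duplication mentioned above; whether that matches the intended fix is an open question.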

The gist is that replacing the static multipliers with equations should give a more dynamic proxy for "relevancy" when constructing the repo map, which should give the LLM at least marginally better context to work from. My hypothesis is that logarithmic equations based on the overall number of definers (to detect rote/boilerplate code) and on the number of references (both the number of unique referencing files and the absolute reference count) will act as a better check and balance on how relevant/central identifiers and files are within a codebase.
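
As a rough illustration of what such an equation could look like, here is one possible form. The specific formula, the function name `relevance_multiplier`, and its parameters are all assumptions for discussion, not a tested or agreed-upon design: the multiplier shrinks logarithmically as more files define the same identifier, and grows logarithmically with both the number of unique referencing files and the absolute reference count.

```python
import math


def relevance_multiplier(num_definers, num_ref_files, num_refs):
    """Hypothetical logarithmic multiplier for one identifier.

    num_definers:  files that define the identifier
    num_ref_files: unique files that reference it
    num_refs:      total references across all files
    """
    # Many definers suggests rote/boilerplate code, so damp the weight.
    definer_penalty = 1.0 / (1.0 + math.log(max(num_definers, 1)))
    # Reward breadth (unique files) and volume (absolute count), both log-scaled
    # so that very heavily referenced identifiers do not dominate.
    breadth = 1.0 + math.log1p(num_ref_files)
    volume = 1.0 + math.log1p(num_refs)
    # Geometric mean keeps the two reference signals balanced.
    return definer_penalty * math.sqrt(breadth * volume)


single_definer = relevance_multiplier(1, 5, 20)
many_definers = relevance_multiplier(10, 5, 20)
```

With this shape, an identifier defined in one file and referenced widely scores higher than the same identifier defined in ten files, which is the boilerplate-detection behavior the hypothesis is after. The constants and the choice of geometric mean would need to be tuned against real repositories.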
