Add filesystem nodes to complement code graph nodes #4

@rysweet

Feature: Filesystem Nodes for Code Graph

Summary

Add a new type of node to the graph that represents the filesystem structure of the codebase. These filesystem nodes will be created after the AST and symbol binding nodes, providing a complementary view of the code organization from a filesystem perspective.

Motivation

While the current graph captures code structure (classes, functions, etc.), it doesn't explicitly represent the filesystem layout. Adding filesystem nodes will:

  • Enable queries about file dependencies and relationships
  • Provide insights into project organization
  • Allow analysis of which files implement specific functionality
  • Support file-based refactoring and reorganization decisions

Technical Design

1. New Node Type

Create a new node type FILESYSTEM with subtypes:

  • FILESYSTEM_FILE: Represents a file in the filesystem
  • FILESYSTEM_DIRECTORY: Represents a directory

Properties for filesystem nodes:

  • path: Absolute path to the file/directory
  • relative_path: Path relative to project root
  • name: File/directory name
  • type: "file" or "directory"
  • size: File size in bytes (files only)
  • extension: File extension (files only)
  • last_modified: Timestamp of last modification
  • permissions: File permissions (optional)
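
As a minimal sketch of how these properties might be carried in code, the dataclass below mirrors the list above. The names `FilesystemNodeSketch` and `describe_path` are hypothetical and are not part of the existing codebase; the real classes would presumably extend the project's own node base class.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class FilesystemNodeSketch:
    """Hypothetical container for the properties listed in this issue."""

    path: str                 # absolute path
    relative_path: str        # path relative to the project root
    name: str                 # file or directory name
    type: str                 # "file" or "directory"
    last_modified: float      # POSIX timestamp taken from stat()
    size: Optional[int] = None         # bytes, files only
    extension: Optional[str] = None    # e.g. ".py", files only
    permissions: Optional[str] = None  # optional, octal string such as "0o644"


def describe_path(target: Path, project_root: Path) -> FilesystemNodeSketch:
    """Build a node from a real path; only files get size and extension."""
    resolved = target.resolve()
    stat = resolved.stat()
    is_file = resolved.is_file()
    return FilesystemNodeSketch(
        path=str(resolved),
        relative_path=resolved.relative_to(project_root.resolve()).as_posix(),
        name=resolved.name,
        type="file" if is_file else "directory",
        last_modified=stat.st_mtime,
        size=stat.st_size if is_file else None,
        extension=resolved.suffix if is_file else None,
        permissions=oct(stat.st_mode & 0o777),
    )
```

Leaving `size` and `extension` as `None` for directories keeps the "files only" rule visible in the data rather than relying on callers to ignore those fields.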

2. New Relationship Types

Add new relationship types to connect filesystem nodes:

  • IMPLEMENTS: Connects a filesystem file node to code nodes it contains
  • DEPENDS_ON: Connects a filesystem node to another filesystem node it depends on
  • FILESYSTEM_CONTAINS: Parent directory -> child file/directory
  • REFERENCED_BY_DESCRIPTION: Links an LLM description to the filesystem node whose path it mentions
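
The four relationship types could be sketched as a standalone enum, shown below. This is only an illustration: in the actual change they would be added to the existing RelationshipType enum, and `FilesystemRelationship` and `Edge` are hypothetical names used here to keep the example self-contained.

```python
from enum import Enum
from typing import NamedTuple


class FilesystemRelationship(str, Enum):
    """Hypothetical stand-in for the new RelationshipType members."""

    IMPLEMENTS = "IMPLEMENTS"
    DEPENDS_ON = "DEPENDS_ON"
    FILESYSTEM_CONTAINS = "FILESYSTEM_CONTAINS"
    REFERENCED_BY_DESCRIPTION = "REFERENCED_BY_DESCRIPTION"


class Edge(NamedTuple):
    """A directed edge; source and target are node identifiers."""

    source: str
    rel: FilesystemRelationship
    target: str


# Parent directory -> child file, as described for FILESYSTEM_CONTAINS.
contains = Edge("src", FilesystemRelationship.FILESYSTEM_CONTAINS, "src/main.py")
```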

3. Implementation Plan

Phase 1: Infrastructure (Test-Driven)
  1. Write failing test for filesystem node creation
  2. Add FILESYSTEM to NodeLabels enum
  3. Create FilesystemNode base class
  4. Create FilesystemFileNode and FilesystemDirectoryNode classes
  5. Add new relationship types to RelationshipType enum
  6. Make tests pass
Phase 2: Filesystem Graph Generation
  1. Write failing test for filesystem graph generation
  2. Create FilesystemGraphGenerator class
  3. Implement filesystem traversal logic
  4. Create filesystem nodes for each file/directory
  5. Create FILESYSTEM_CONTAINS relationships
  6. Make tests pass
Phase 3: Integration with Code Graph
  1. Write failing test for IMPLEMENTS relationships
  2. Modify ProjectGraphCreator to call filesystem generation after AST creation
  3. Create IMPLEMENTS relationships between filesystem nodes and code nodes
  4. Implement logic to detect file dependencies and create DEPENDS_ON relationships
  5. Make tests pass
Phase 4: LLM Description Integration
  1. Write failing test for description->filesystem references
  2. Enhance description generator to detect file path mentions
  3. Create REFERENCED_BY_DESCRIPTION relationships
  4. Make tests pass
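
The traversal step in Phase 2 might look roughly like the sketch below. It is not the proposed FilesystemGraphGenerator: `walk_filesystem` is a hypothetical helper that returns plain dicts and tuples instead of graph objects, and it assumes the root directory has depth 0 and that entries deeper than `max_depth` are skipped.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def walk_filesystem(
    root: Path, max_depth: int = 10
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str, str]]]:
    """Return (nodes, edges) for everything under root, up to max_depth levels."""
    root = root.resolve()
    nodes: List[Dict[str, str]] = [{"relative_path": ".", "type": "directory"}]
    edges: List[Tuple[str, str, str]] = []

    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        depth = len(rel_dir.parts)  # 0 for the root itself
        if depth >= max_depth:
            dirnames[:] = []  # prune: os.walk will not descend any further
            continue
        dirnames.sort()  # in-place sort also fixes the order os.walk visits them
        parent = rel_dir.as_posix()
        for name in dirnames:
            child = (rel_dir / name).as_posix()
            nodes.append({"relative_path": child, "type": "directory"})
            edges.append((parent, "FILESYSTEM_CONTAINS", child))
        for name in sorted(filenames):
            child = (rel_dir / name).as_posix()
            nodes.append({"relative_path": child, "type": "file"})
            edges.append((parent, "FILESYSTEM_CONTAINS", child))
    return nodes, edges
```

Because `os.walk` does not follow symlinked directories by default, this sketch cannot loop on cyclic links; whether symlinks deserve their own node type is left open here.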

4. Testing Strategy

All tests will run against a Docker-based Neo4j instance so that every test starts from a clean database and stays repeatable:

# tests/test_filesystem_nodes.py
import pytest
from testcontainers.neo4j import Neo4jContainer

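@pytest.fixture(scope="function")
def neo4j_container():
    # The enterprise image refuses to start unless the license is accepted.
    image = Neo4jContainer("neo4j:5-enterprise")
    with image.with_env("NEO4J_ACCEPT_LICENSE_AGREEMENT", "yes") as neo4j:
        yield neo4j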

def test_filesystem_node_creation(neo4j_container):
    # Test that filesystem nodes are created correctly
    pass

def test_implements_relationships(neo4j_container):
    # Test that IMPLEMENTS relationships connect files to code
    pass

def test_filesystem_hierarchy(neo4j_container):
    # Test that directory structure is preserved
    pass

5. Example Graph Structure

FileSystem Structure:
/project
  /src
    main.py (FILESYSTEM_FILE)
    utils.py (FILESYSTEM_FILE)
  /tests
    test_main.py (FILESYSTEM_FILE)

Relationships:
- /project --[FILESYSTEM_CONTAINS]--> /src
- /src --[FILESYSTEM_CONTAINS]--> main.py
- main.py --[IMPLEMENTS]--> MainClass
- main.py --[IMPLEMENTS]--> main_function
- test_main.py --[DEPENDS_ON]--> main.py
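
To make the kinds of questions this structure answers more concrete, the sketch below encodes the relationships listed above as plain triples and queries them in memory. The helpers `targets` and `sources` are hypothetical; against Neo4j the same questions would be asked with graph queries instead.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# The relationships from the example, as (source, type, target) triples.
EXAMPLE_EDGES: List[Triple] = [
    ("/project", "FILESYSTEM_CONTAINS", "/src"),
    ("/src", "FILESYSTEM_CONTAINS", "main.py"),
    ("main.py", "IMPLEMENTS", "MainClass"),
    ("main.py", "IMPLEMENTS", "main_function"),
    ("test_main.py", "DEPENDS_ON", "main.py"),
]


def targets(edges: List[Triple], source: str, rel: str) -> List[str]:
    """Follow edges of type rel outward from source."""
    return [t for s, r, t in edges if s == source and r == rel]


def sources(edges: List[Triple], rel: str, target: str) -> List[str]:
    """Follow edges of type rel backward from target."""
    return [s for s, r, t in edges if r == rel and t == target]


# Which code does main.py implement, and which files depend on it?
implemented = targets(EXAMPLE_EDGES, "main.py", "IMPLEMENTS")
dependents = sources(EXAMPLE_EDGES, "DEPENDS_ON", "main.py")
```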

6. Configuration

Add configuration options:

# Enable filesystem node generation
ENABLE_FILESYSTEM_NODES=true

# Include file metadata (size, permissions, etc.)
FILESYSTEM_INCLUDE_METADATA=true

# Maximum depth for filesystem traversal
FILESYSTEM_MAX_DEPTH=10
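
One way these options might be read, assuming they arrive as environment variables, is sketched below. `FilesystemConfig` and `load_filesystem_config` are hypothetical names, and using the values shown above as defaults is an assumption rather than something this issue specifies.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FilesystemConfig:
    enabled: bool
    include_metadata: bool
    max_depth: int


def _as_bool(value: str) -> bool:
    """Treat common truthy spellings as True and everything else as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_filesystem_config(env: Mapping[str, str] = os.environ) -> FilesystemConfig:
    """Read the three options; the fallback values are assumed defaults."""
    return FilesystemConfig(
        enabled=_as_bool(env.get("ENABLE_FILESYSTEM_NODES", "true")),
        include_metadata=_as_bool(env.get("FILESYSTEM_INCLUDE_METADATA", "true")),
        max_depth=int(env.get("FILESYSTEM_MAX_DEPTH", "10")),
    )
```

Passing the mapping explicitly makes the loader easy to exercise in tests without touching the real process environment.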

Implementation Steps

  1. Create failing tests for each component
  2. Implement filesystem node classes with proper inheritance
  3. Add filesystem graph generation as a new step in ProjectGraphCreator
  4. Create relationships between filesystem and code nodes
  5. Integrate with LLM descriptions to detect file references
  6. Add configuration options for feature control
  7. Update documentation with examples and usage
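
This issue does not say how file dependencies are detected. For Python sources, one possible approach is to read import statements with the standard `ast` module, as in the sketch below. The function names and the `module_to_file` mapping are hypothetical, relative imports are ignored, and other languages would need their own detection.

```python
import ast
from typing import Dict, List, Set, Tuple


def imported_modules(source: str) -> Set[str]:
    """Collect module names from absolute import statements in Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
    return found


def file_dependencies(
    sources: Dict[str, str], module_to_file: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Emit (file, "DEPENDS_ON", file) edges for imports that map to project files."""
    edges: List[Tuple[str, str, str]] = []
    for path in sorted(sources):
        for module in sorted(imported_modules(sources[path])):
            target = module_to_file.get(module)
            if target and target != path:  # skip external modules and self-imports
                edges.append((path, "DEPENDS_ON", target))
    return edges
```

Imports that do not resolve to a project file, such as standard library modules, simply produce no edge.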

Acceptance Criteria

  • Filesystem nodes are created for all files and directories
  • IMPLEMENTS relationships connect files to their code contents
  • DEPENDS_ON relationships identify file dependencies
  • LLM descriptions that mention files create appropriate relationships
  • All tests pass with >90% coverage
  • Feature can be enabled/disabled via configuration
  • Documentation includes usage examples
  • Performance impact is minimal (<10% increase in graph generation time)
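
For the criterion about LLM descriptions that mention files, a simple matcher might check each known relative path against the description text, as sketched below. `mentioned_paths` is a hypothetical helper, and matching only whole, exact relative paths is an assumption; bare file names and absolute paths would need additional rules.

```python
import re
from typing import Iterable, List


def mentioned_paths(description: str, known_relative_paths: Iterable[str]) -> List[str]:
    """Return the known relative paths that appear as whole paths in the text."""
    hits: List[str] = []
    for rel in sorted(set(known_relative_paths)):
        pattern = (
            r"(?<![\w./-])"        # not the tail of a longer path or word
            + re.escape(rel)
            + r"(?![\w/-]|\.\w)"   # not the head of a longer path or extension
        )
        if re.search(pattern, description):
            hits.append(rel)
    return hits
```

The trailing lookahead still accepts a path followed by sentence punctuation, so "see src/main.py." counts as a mention while "src/main.pyc" does not.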

Benefits

  1. Enhanced Navigation: Navigate from code to file location and vice versa
  2. Dependency Analysis: Understand file-level dependencies
  3. Refactoring Support: Identify which files need to move together
  4. Project Structure Insights: Analyze how code is organized in the filesystem
  5. Cross-Reference Support: Connect LLM descriptions to actual files
