Mention how many files when calculating total tokens #219

stephanj · 2024-08-18T08:08:26Z

Also mention how many files are in the scanned directory.

Suggestion from Claude 3.5 Sonnet on how to implement this feature:

To report the number of files alongside the token count, we need to modify the ProjectContentService class, which builds the project content and the token/cost notification, and the ProjectScannerService class, which walks the project files. Both will be updated to carry the file count through.

First, let's create a new class to hold the content, the token count, and the file count:

public class ProjectScanResult {
    private final String content;
    private final int tokenCount;
    private final int fileCount;

    public ProjectScanResult(String content, int tokenCount, int fileCount) {
        this.content = content;
        this.tokenCount = tokenCount;
        this.fileCount = fileCount;
    }

    public String getContent() { return content; }

    public int getTokenCount() { return tokenCount; }

    public int getFileCount() { return fileCount; }
}

Now, let's modify the ProjectContentService class:

package com.devoxx.genie.service;

import com.devoxx.genie.model.enumarations.ModelProvider;
import com.devoxx.genie.ui.util.NotificationUtil;
import com.devoxx.genie.ui.util.WindowContextFormatterUtil;
import com.intellij.openapi.application.ApplicationManager;
import com.intellij.openapi.project.Project;
import com.intellij.openapi.vfs.VirtualFile;
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingType;
import org.jetbrains.annotations.NotNull;

import java.awt.*;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.StringSelection;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class ProjectContentService {

    private static final Encoding ENCODING = Encodings.newDefaultEncodingRegistry().getEncoding(EncodingType.CL100K_BASE);

    public static ProjectContentService getInstance() {
        return ApplicationManager.getApplication().getService(ProjectContentService.class);
    }

    public CompletableFuture<ProjectScanResult> getProjectContent(Project project,
                                                                  int windowContext,
                                                                  boolean isTokenCalculation) {
        return ProjectScannerService.getInstance()
            .scanProject(project, null, windowContext, isTokenCalculation)
            .thenApply(scanResult -> {
                if (!isTokenCalculation) {
                    copyToClipboard(scanResult.getContent());
                }
                return scanResult;
            });
    }

    public void calculateTokensAndCost(Project project,
                                       int windowContext,
                                       ModelProvider provider,
                                       LanguageModel languageModel) {
        if (!DefaultLLMSettingsUtil.isApiBasedProvider(provider)) {
            getProjectContent(project, windowContext, true)
                .thenCompose(scanResult -> CompletableFuture.runAsync(() -> {
                    String message = String.format("Project contains %s tokens in %d files. " +
                            "Cost calculation is not applicable for local providers. " +
                            "Make sure you select a model with a big enough window context.",
                        WindowContextFormatterUtil.format(scanResult.getTokenCount(), "tokens"),
                        scanResult.getFileCount());
                    NotificationUtil.sendNotification(project, message);
                }));
            return;
        }

        DevoxxGenieSettingsService settings = DevoxxGenieSettingsServiceProvider.getInstance();
        AtomicDouble inputCost = new AtomicDouble(settings.getModelInputCost(provider, languageModel.getModelName()));

        getProjectContent(project, windowContext, true)
            .thenCompose(scanResult -> CompletableFuture.runAsync(() -> {
                double estimatedInputCost = calculateCost(scanResult.getTokenCount(), inputCost.get());
                String message = String.format("Project contains %s tokens in %d files. Estimated min. cost using %s %s is $%.5f",
                    WindowContextFormatterUtil.format(scanResult.getTokenCount(), "tokens"),
                    scanResult.getFileCount(),
                    provider.getName(),
                    languageModel.getDisplayName(),
                    estimatedInputCost);

                if (scanResult.getTokenCount() > languageModel.getContextWindow()) {
                    message += String.format(". Total project size exceeds model's max context of %s tokens.",
                        WindowContextFormatterUtil.format(languageModel.getContextWindow()));
                }

                NotificationUtil.sendNotification(project, message);
            }));
    }

    // Other methods remain the same
}
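The notification text built above can be tried outside the plugin. The following is only a minimal sketch: the class name ScanSummaryExample, the describe method, and the "price per million tokens" unit are assumptions made for illustration, and plain String.format stands in for WindowContextFormatterUtil and calculateCost, whose actual behaviour is not shown in this issue.

```java
import java.util.Locale;

final class ScanSummaryExample {

    /**
     * Builds a summary line with the token count, the file count and an estimated cost.
     * Assumption: the price is given per one million input tokens.
     */
    public static String describe(int tokenCount, int fileCount,
                                  String modelName, double pricePerMillionTokens) {
        double estimatedCost = tokenCount / 1_000_000.0 * pricePerMillionTokens;
        // Locale.ROOT keeps the number formatting independent of the machine's locale.
        return String.format(Locale.ROOT,
            "Project contains %,d tokens in %d files. Estimated min. cost using %s is $%.5f",
            tokenCount, fileCount, modelName, estimatedCost);
    }

    public static void main(String[] args) {
        // Hypothetical numbers, only to show the shape of the message.
        System.out.println(describe(120_000, 42, "example-model", 3.0));
    }
}
```

Keeping the formatting in one small method like this makes it easy to check the wording of the message without running a project scan.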

We also need to update the ProjectScannerService to keep track of the file count:

package com.devoxx.genie.service;

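import com.intellij.openapi.application.ApplicationManager;
import com.intellij.openapi.application.ModalityState;
import com.intellij.openapi.application.ReadAction;
import com.intellij.openapi.project.Project;
import com.intellij.openapi.roots.ProjectFileIndex;
import com.intellij.openapi.vfs.VirtualFile;
import com.intellij.openapi.vfs.VfsUtilCore;
import com.intellij.openapi.vfs.VirtualFileVisitor;
import com.intellij.util.concurrency.AppExecutorUtil;
import org.jetbrains.annotations.NotNull;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;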

public class ProjectScannerService {

    public static ProjectScannerService getInstance() {
        return ApplicationManager.getApplication().getService(ProjectScannerService.class);
    }

    public CompletableFuture<ProjectScanResult> scanProject(Project project,
                                                            VirtualFile startDirectory,
                                                            int windowContext,
                                                            boolean isTokenCalculation) {
        CompletableFuture<ProjectScanResult> future = new CompletableFuture<>();

        ReadAction.nonBlocking(() -> {
                StringBuilder result = new StringBuilder();
                result.append("Directory Structure:\n");
                StringBuilder fullContent;
                AtomicInteger fileCount = new AtomicInteger(0);

                if (startDirectory == null) {
                    fullContent = getContentFromModules(project, windowContext, result, fileCount);
                } else {
                    fullContent = processDirectory(project, startDirectory, windowContext, result, fileCount);
                }

                String content = isTokenCalculation ? fullContent.toString() :
                    truncateToTokens(project, fullContent.toString(), windowContext, isTokenCalculation);
                
                int tokenCount = ENCODING.countTokens(content);
                return new ProjectScanResult(content, tokenCount, fileCount.get());
            }).inSmartMode(project)
            .finishOnUiThread(ModalityState.defaultModalityState(), future::complete)
            .submit(AppExecutorUtil.getAppExecutorService());

        return future;
    }

    private StringBuilder processDirectory(Project project,
                                           VirtualFile startDirectory,
                                           int windowContext,
                                           StringBuilder result,
                                           AtomicInteger fileCount) {
        // ... (existing code)

        walkThroughDirectory(startDirectory, fileIndex, fullContent, currentTokens, maxTokens, fileCount);
        return fullContent;
    }

    private void walkThroughDirectory(VirtualFile directory,
                                      ProjectFileIndex fileIndex,
                                      StringBuilder fullContent,
                                      AtomicInteger currentTokens,
                                      int maxTokens,
                                      AtomicInteger fileCount) {
        VfsUtilCore.visitChildrenRecursively(directory, new VirtualFileVisitor<Void>() {
            @Override
            public boolean visitFile(@NotNull VirtualFile file) {
                if (shouldExcludeDirectory(file)) {
                    return false;
                }
                if (fileIndex.isInContent(file) && shouldIncludeFile(file)) {
                    fileCount.incrementAndGet();
                    String header = "\n--- " + file.getPath() + " ---\n";
                    fullContent.append(header);

                    try {
                        String content = new String(file.contentsToByteArray(), StandardCharsets.UTF_8);
                        content = processFileContent(content);
                        fullContent.append(content).append("\n");

                        int tokens = ENCODING.countTokens(content);
                        currentTokens.addAndGet(tokens);

                        if (currentTokens.get() >= maxTokens) {
                            // Note: returning false only skips this entry's children;
                            // the visitor still moves on to the next sibling.
                            return false;
                        }
                    } catch (Exception e) {
                        String errorMsg = "Error reading file: " + e.getMessage() + "\n";
                        fullContent.append(errorMsg);
                    }
                }
                return true;
            }
        });
    }

    // Other methods remain the same
}
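The counting pattern used in walkThroughDirectory can also be sketched without the IntelliJ Platform. The version below is an illustration under stated assumptions, not the plugin's code: it uses java.nio.file.Files.walkFileTree instead of VfsUtilCore, a byte budget as a stand-in for the token budget, and the hypothetical names FileCountWalkExample and countFiles. In this API, FileVisitResult.TERMINATE ends the whole walk once the budget is reached.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

final class FileCountWalkExample {

    /**
     * Counts regular files under root. The walk ends as soon as the accumulated
     * file size reaches maxBytes; the file that crosses the budget is still counted.
     */
    public static int countFiles(Path root, long maxBytes) {
        AtomicInteger fileCount = new AtomicInteger();
        AtomicLong totalBytes = new AtomicLong();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (!attrs.isRegularFile()) {
                        return FileVisitResult.CONTINUE;
                    }
                    fileCount.incrementAndGet();
                    // Byte size is only a stand-in for a token count in this sketch.
                    if (totalBytes.addAndGet(attrs.size()) >= maxBytes) {
                        return FileVisitResult.TERMINATE; // ends the entire walk
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fileCount.get();
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(countFiles(root, Long.MAX_VALUE) + " files under " + root);
    }
}
```

As in the suggested change, the counter is incremented before the budget check, so the reported number includes the file on which the scan stopped.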

With these changes, the scan tracks both the token count and the file count, and the calculateTokensAndCost method in ProjectContentService includes both in the notification message. Users see how many tokens the project contains and how many files those tokens came from, which gives a better sense of the project's size.

stephanj · 2024-08-19T11:33:25Z

stephanj added a commit that referenced this issue Aug 19, 2024

Feat #219 Mention how many files when calculating total tokens

92ddc32

stephanj closed this as completed Aug 19, 2024

stephanj added a commit that referenced this issue Dec 9, 2024

Feat #219 Mention how many files when calculating total tokens

eca1baa
