Mention how many files when calculating total tokens #219

Closed
stephanj opened this issue Aug 18, 2024 · 1 comment

Comments

@stephanj
Contributor

Also mention how many files are in the scanned directory.

Claude Sonnet 3.5's suggestion for how to implement this feature:

To mention the number of files along with the token count when calculating the project content, we'll need to modify the ProjectContentService class. This class is responsible for scanning the project and calculating tokens. We'll update it to keep track of the file count as well.

Here's how we can modify the ProjectContentService class to include the file count:

  1. First, let's create a new class to hold both the content and file count:
public class ProjectScanResult {
    private final String content;
    private final int tokenCount;
    private final int fileCount;

    public ProjectScanResult(String content, int tokenCount, int fileCount) {
        this.content = content;
        this.tokenCount = tokenCount;
        this.fileCount = fileCount;
    }

    public String getContent() {
        return content;
    }

    public int getTokenCount() {
        return tokenCount;
    }

    public int getFileCount() {
        return fileCount;
    }
}
  2. Now, let's modify the ProjectContentService class:
package com.devoxx.genie.service;

import com.devoxx.genie.model.enumarations.ModelProvider;
import com.devoxx.genie.ui.util.NotificationUtil;
import com.devoxx.genie.ui.util.WindowContextFormatterUtil;
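import com.google.common.util.concurrent.AtomicDouble; // assumed: Guava's AtomicDouble, used below
import com.intellij.openapi.application.ApplicationManager;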
import com.intellij.openapi.project.Project;
import com.intellij.openapi.vfs.VirtualFile;
import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingType;
import org.jetbrains.annotations.NotNull;

import java.awt.*;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.StringSelection;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class ProjectContentService {

    private static final Encoding ENCODING = Encodings.newDefaultEncodingRegistry().getEncoding(EncodingType.CL100K_BASE);

    public static ProjectContentService getInstance() {
        return ApplicationManager.getApplication().getService(ProjectContentService.class);
    }

    public CompletableFuture<ProjectScanResult> getProjectContent(Project project,
                                                                  int windowContext,
                                                                  boolean isTokenCalculation) {
        return ProjectScannerService.getInstance()
            .scanProject(project, null, windowContext, isTokenCalculation)
            .thenApply(scanResult -> {
                if (!isTokenCalculation) {
                    copyToClipboard(scanResult.getContent());
                }
                return scanResult;
            });
    }

    public void calculateTokensAndCost(Project project,
                                       int windowContext,
                                       ModelProvider provider,
                                       LanguageModel languageModel) {
        if (!DefaultLLMSettingsUtil.isApiBasedProvider(provider)) {
            getProjectContent(project, windowContext, true)
                .thenCompose(scanResult -> CompletableFuture.runAsync(() -> {
                    String message = String.format("Project contains %s tokens in %d files. " +
                            "Cost calculation is not applicable for local providers. " +
                            "Make sure you select a model with a big enough window context.",
                        WindowContextFormatterUtil.format(scanResult.getTokenCount(), "tokens"),
                        scanResult.getFileCount());
                    NotificationUtil.sendNotification(project, message);
                }));
            return;
        }

        DevoxxGenieSettingsService settings = DevoxxGenieSettingsServiceProvider.getInstance();
        AtomicDouble inputCost = new AtomicDouble(settings.getModelInputCost(provider, languageModel.getModelName()));

        getProjectContent(project, windowContext, true)
            .thenCompose(scanResult -> CompletableFuture.runAsync(() -> {
                double estimatedInputCost = calculateCost(scanResult.getTokenCount(), inputCost.get());
                String message = String.format("Project contains %s tokens in %d files. Estimated min. cost using %s %s is $%.5f",
                    WindowContextFormatterUtil.format(scanResult.getTokenCount(), "tokens"),
                    scanResult.getFileCount(),
                    provider.getName(),
                    languageModel.getDisplayName(),
                    estimatedInputCost);

                if (scanResult.getTokenCount() > languageModel.getContextWindow()) {
                    message += String.format(". Total project size exceeds model's max context of %s tokens.",
                        WindowContextFormatterUtil.format(languageModel.getContextWindow()));
                }

                NotificationUtil.sendNotification(project, message);
            }));
    }

    // Other methods remain the same
}
  3. We also need to update the ProjectScannerService to keep track of the file count:
package com.devoxx.genie.service;

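import com.intellij.openapi.application.ApplicationManager;
import com.intellij.openapi.application.ModalityState;
import com.intellij.openapi.application.ReadAction;
import com.intellij.openapi.project.Project;
import com.intellij.openapi.roots.ProjectFileIndex;
import com.intellij.openapi.vfs.VfsUtilCore;
import com.intellij.openapi.vfs.VirtualFile;
import com.intellij.openapi.vfs.VirtualFileVisitor;
import com.intellij.util.concurrency.AppExecutorUtil;
import org.jetbrains.annotations.NotNull;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;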

public class ProjectScannerService {

    public static ProjectScannerService getInstance() {
        return ApplicationManager.getApplication().getService(ProjectScannerService.class);
    }

    public CompletableFuture<ProjectScanResult> scanProject(Project project,
                                                            VirtualFile startDirectory,
                                                            int windowContext,
                                                            boolean isTokenCalculation) {
        CompletableFuture<ProjectScanResult> future = new CompletableFuture<>();

        ReadAction.nonBlocking(() -> {
                StringBuilder result = new StringBuilder();
                result.append("Directory Structure:\n");
                StringBuilder fullContent;
                AtomicInteger fileCount = new AtomicInteger(0);

                if (startDirectory == null) {
                    fullContent = getContentFromModules(project, windowContext, result, fileCount);
                } else {
                    fullContent = processDirectory(project, startDirectory, windowContext, result, fileCount);
                }

                String content = isTokenCalculation ? fullContent.toString() :
                    truncateToTokens(project, fullContent.toString(), windowContext, isTokenCalculation);
                
                int tokenCount = ENCODING.countTokens(content);
                return new ProjectScanResult(content, tokenCount, fileCount.get());
            }).inSmartMode(project)
            .finishOnUiThread(ModalityState.defaultModalityState(), future::complete)
            .submit(AppExecutorUtil.getAppExecutorService());

        return future;
    }

    private StringBuilder processDirectory(Project project,
                                           VirtualFile startDirectory,
                                           int windowContext,
                                           StringBuilder result,
                                           AtomicInteger fileCount) {
        // ... (existing code)

        walkThroughDirectory(startDirectory, fileIndex, fullContent, currentTokens, maxTokens, fileCount);
        return fullContent;
    }

    private void walkThroughDirectory(VirtualFile directory,
                                      ProjectFileIndex fileIndex,
                                      StringBuilder fullContent,
                                      AtomicInteger currentTokens,
                                      int maxTokens,
                                      AtomicInteger fileCount) {
        VfsUtilCore.visitChildrenRecursively(directory, new VirtualFileVisitor<Void>() {
            @Override
            public boolean visitFile(@NotNull VirtualFile file) {
                if (shouldExcludeDirectory(file)) {
                    return false;
                }
                if (fileIndex.isInContent(file) && shouldIncludeFile(file)) {
                    fileCount.incrementAndGet();
                    String header = "\n--- " + file.getPath() + " ---\n";
                    fullContent.append(header);

                    try {
                        String content = new String(file.contentsToByteArray(), StandardCharsets.UTF_8);
                        content = processFileContent(content);
                        fullContent.append(content).append("\n");

                        int tokens = ENCODING.countTokens(content);
                        currentTokens.addAndGet(tokens);

                        if (currentTokens.get() >= maxTokens) {
                            // Token limit reached. Note: returning false only skips this
                            // file's children; it does not abort the rest of the traversal.
                            return false;
                        }
                    } catch (Exception e) {
                        String errorMsg = "Error reading file: " + e.getMessage() + "\n";
                        fullContent.append(errorMsg);
                    }
                }
                return true;
            }
        });
    }

    // Other methods remain the same
}

With these changes, we're now tracking both the token count and the file count when scanning the project. The calculateTokensAndCost method in ProjectContentService has been updated to include the file count in the notification message.
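
The counting idea can be tried outside the IDE. The sketch below is a minimal, standalone illustration that swaps the IntelliJ VFS for `java.nio.file.Files.walkFileTree` and replaces the jtokkit encoding with a crude whitespace word count. `FileCountingScan`, `Result`, and `estimateTokens` are hypothetical names, not part of the plugin, and a record requires Java 16 or newer. It also shows one way to actually stop a traversal once a token budget is spent, using `FileVisitResult.TERMINATE`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical stand-in for the plugin's scanner; all names are illustrative.
final class FileCountingScan {

    // Immutable result: tokens seen and number of files read.
    record Result(int tokenCount, int fileCount) {}

    // Crude estimate (whitespace-separated words); NOT the jtokkit encoding.
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Walks root recursively, counting files and tokens until maxTokens is reached.
    static Result scan(Path root, int maxTokens) throws IOException {
        int[] tokens = {0};
        int[] files = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                String content = Files.readString(file, StandardCharsets.UTF_8);
                files[0]++;
                tokens[0] += estimateTokens(content);
                // TERMINATE aborts the whole walk, not just the current directory.
                return tokens[0] >= maxTokens
                        ? FileVisitResult.TERMINATE
                        : FileVisitResult.CONTINUE;
            }
        });
        return new Result(tokens[0], files[0]);
    }
}
```

Calling `FileCountingScan.scan(root, maxTokens)` returns both counts in one value, which mirrors the role `ProjectScanResult` plays in the suggestion above.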

The notification then reports both figures, which gives users a better sense of the project's size.
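
As a small illustration of the message itself, the sketch below formats a token count and a file count into one sentence and handles the singular "1 file" case. `ScanMessages`, `ScanSummary`, and `describe` are hypothetical names, and the wording is only an assumption modeled on the message strings above; the plugin's `WindowContextFormatterUtil` is not used here.

```java
import java.util.Locale;

// Hypothetical helper; not part of the plugin.
final class ScanMessages {

    // Carries the two figures the notification reports.
    record ScanSummary(int tokenCount, int fileCount) {}

    // Builds the sentence, choosing "file" or "files" based on the count.
    static String describe(ScanSummary summary) {
        String fileWord = summary.fileCount() == 1 ? "file" : "files";
        // Locale.US keeps the thousands separator stable across machines.
        return String.format(Locale.US, "Project contains %,d tokens in %d %s.",
                summary.tokenCount(), summary.fileCount(), fileWord);
    }
}
```

For example, `ScanMessages.describe(new ScanMessages.ScanSummary(12345, 42))` yields `Project contains 12,345 tokens in 42 files.`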

