17 changes: 17 additions & 0 deletions docs/src/content/docs/reference/cli/commands.md
@@ -588,6 +588,8 @@ Commands:
code <file> [query] Parse code using tree sitter and executes a
query
tokens [options] <files...> Count tokens in a set of files
tokenize [options] <file> Tokenizes a piece of text and displays the
tokens (in hex format)
jsonl2json Converts JSONL files to a JSON file
prompty [options] <file...> Converts .prompty files to genaiscript
jinja2 [options] <file> Renders Jinja2 or prompty template
@@ -684,6 +686,21 @@ Options:
-h, --help display help for command
```

### `parse tokenize`

```
Usage: genaiscript parse tokenize [options] <file>

Tokenizes a piece of text and displays the tokens (in hex format)

Arguments:
file file to tokenize

Options:
-m, --model <string> encoding model
-h, --help display help for command
```

### `parse jsonl2json`

```
15 changes: 12 additions & 3 deletions genaisrc/docs.genai.mts
@@ -162,7 +162,7 @@ async function generateDocs(file: WorkspaceFile, fileStats: any) {
_.def("FILE", missingDoc.getRoot().root().text())
_.def("FUNCTION", missingDoc.text())
// this needs more eval-ing
_.$`Generate a function documentation for <FUNCTION>.
_.$`Generate a TypeScript function documentation for <FUNCTION>.
- Make sure parameters are documented.
- Be concise. Use technical tone.
- do NOT include types, this is for TypeScript.
@@ -257,8 +257,8 @@ rule:
_.def("DOCSTRING", comment.text(), { flex: 10 })
_.def("FUNCTION", match.text(), { flex: 10 })
// this needs more eval-ing
_.$`Update the docstring <DOCSTRING> to match the code in function <FUNCTION>.
- If the docstring is up to date, return /NOP/.
_.$`Update the TypeScript docstring <DOCSTRING> to match the code in function <FUNCTION>.
- If the docstring is up to date, return /NOP/. It's ok to leave it as is.
- do not rephrase an existing sentence if it is correct.
- Make sure parameters are documented.
- do NOT include types, this is for TypeScript.
@@ -268,6 +268,15 @@
The full source of the file is in <FILE> for reference.
The source of the function is in <FUNCTION>.
The current docstring is <DOCSTRING>.

docstring:

/**
* description
* @param param1 - description
* @param param2 - description
* @returns description
*/
`
},
{
69 changes: 58 additions & 11 deletions packages/cli/src/cli.ts
@@ -22,6 +22,7 @@ import {
parseMarkdown,
parsePDF,
parseSecrets,
parseTokenize,
parseTokens,
prompty2genaiscript,
} from "./parse" // Parsing functions
@@ -77,17 +78,55 @@ import { DEBUG_CATEGORIES } from "../../core/src/dbg"
/**
* Main function to initialize and run the CLI.
*
* Sets up global error handling for uncaught exceptions.
* Verifies Node.js version compatibility.
* Configures CLI options and commands, including:
* - `configure`: Interactive help to configure providers.
* - `run`: Executes a GenAIScript against files with various options for output, retries, and caching.
* - `runs`: Commands to manage and list previous runs.
* - `test`: Group of commands for running and managing tests, including listing and viewing tests.
* - `convert`: Converts files through a GenAIScript with options for output, concurrency, and file-specific settings.
* Handles environment setup and NodeHost installation.
* Adds support for various CLI options such as working directory, environment files, color output, verbosity, and performance logging.
* Includes error handling for request errors and runtime compatibility issues.
* @param script - The script to execute.
* @param files - Optional list of files to process.
* @param cwd - Working directory for the CLI.
* @param env - Paths to environment files.
* @param noColors - Disable color output.
* @param quiet - Disable verbose output.
* @param debug - Debug categories to enable.
* @param perf - Enable performance logging.
* @param provider - Preferred LLM provider aliases.
* @param accept - Comma-separated list of accepted file extensions.
* @param excludedFiles - List of files to exclude.
* @param ignoreGitIgnore - Disable exclusion of files ignored by .gitignore.
* @param fallbackTools - Enable prompt-based tools instead of built-in LLM tool calls.
* @param out - Output folder for results.
* @param removeOut - Remove output folder if it exists.
* @param outTrace - Output file for trace.
* @param outOutput - Output file for output.
* @param outData - Output file for data, including JSON schema validation.
* @param outAnnotations - Output file for annotations.
* @param outChangelog - Output file for changelogs.
* @param pullRequest - Pull request identifier.
* @param pullRequestComment - Create a comment on a pull request with a unique ID.
* @param pullRequestDescription - Create a comment on a pull request description with a unique ID.
* @param pullRequestReviews - Create pull request reviews from annotations.
* @param teamsMessage - Post a message to the Teams channel.
* @param json - Emit full JSON response to output.
* @param yaml - Emit full YAML response to output.
* @param failOnErrors - Fail on detected annotation errors.
* @param retry - Number of retries for the run.
* @param retryDelay - Minimum delay between retries.
* @param maxDelay - Maximum delay between retries.
* @param label - Label for the run.
* @param temperature - Temperature for the run.
* @param topP - Top-p for the run.
* @param maxTokens - Maximum completion tokens for the run.
* @param maxDataRepairs - Maximum data repairs.
* @param maxToolCalls - Maximum tool calls for the run.
* @param toolChoice - Tool choice for the run.
* @param seed - Seed for the run.
* @param cache - Enable LLM result cache.
* @param cacheName - Custom cache file name.
* @param csvSeparator - CSV separator.
* @param fenceFormat - Fence format for output.
* @param applyEdits - Apply file edits.
* @param vars - Variables as name=value pairs.
* @param runRetry - Number of retries for the entire run.
* @param noRunTrace - Disable automatic trace generation.
* @param noOutputTrace - Disable automatic output generation.
* @returns Exit code indicating success or failure.
*/
export async function cli() {
let nodeHost: NodeHost // Variable to hold NodeHost instance
@@ -611,6 +650,14 @@ export async function cli() {
.arguments("<files...>")
.option("-ef, --excluded-files <string...>", "excluded files")
.action(parseTokens) // Action to count tokens in files
parser
.command("tokenize")
.argument("<file>", "file to tokenize")
.description(
"Tokenizes a piece of text and displays the tokens (in hex format)"
)
.option("-m, --model <string>", "encoding model")
.action(parseTokenize)
parser
.command("jsonl2json", "Converts JSONL files to a JSON file")
.argument("<file...>", "input JSONL files")
42 changes: 40 additions & 2 deletions packages/cli/src/parse.ts
@@ -14,7 +14,11 @@ import { parsePdf } from "../../core/src/pdf"
import { estimateTokens } from "../../core/src/tokens"
import { YAMLStringify } from "../../core/src/yaml"
import { resolveTokenEncoder } from "../../core/src/encoders"
import { MD_REGEX, PROMPTY_REGEX } from "../../core/src/constants"
import {
CONSOLE_TOKEN_COLORS,
MD_REGEX,
PROMPTY_REGEX,
} from "../../core/src/constants"
import { promptyParse, promptyToGenAIScript } from "../../core/src/prompty"
import { basename, join } from "node:path"
import { CSVStringify, dataToMarkdownTable } from "../../core/src/csv"
@@ -31,6 +35,10 @@ import { chunkMarkdown } from "../../core/src/mdchunk"
import { normalizeInt } from "../../core/src/cleaners"
import { prettyBytes } from "../../core/src/pretty"
import { terminalSize } from "../../core/src/terminal"
import { consoleColors, wrapColor } from "../../core/src/consolecolor"
import { genaiscriptDebug } from "../../core/src/debug"
import { stderr, stdout } from "../../core/src/stdio"
const dbg = genaiscriptDebug("cli:parse")

/**
* This module provides various parsing utilities for different file types such
@@ -230,7 +238,7 @@ export async function jsonl2json(files: string[]) {
/**
* Estimates the number of tokens in the content of files and logs the results.
* @param filesGlobs - An array of files or glob patterns to process.
* @param options - Options for excluding files, specifying the model, and ignoring .gitignore.
* @param options - Options for processing files.
* - excludedFiles - A list of files to exclude from processing.
* - model - The name of the model used for token encoding.
* - ignoreGitIgnore - Whether to ignore .gitignore rules when expanding files.
@@ -261,6 +269,36 @@ export async function parseTokens(
console.log(text)
}

/**
* Tokenizes the content of a specified file using a provided model and logs the tokens.
*
* @param file - Path to the file to tokenize.
* @param options - Object containing the following properties:
* - model - The name of the model used for token encoding.
*
* The function reads the content of the file, tokenizes it using the given model,
* and logs each token along with its hexadecimal representation.
* Debug information about the process is also logged.
*/
export async function parseTokenize(file: string, options: { model: string }) {
const text = await readText(file)
dbg(`text: %s`, text)
const { model } = options || {}
const {
model: tokenModel,
encode: encoder,
decode: decoder,
} = await resolveTokenEncoder(model)

console.debug(`model: %s`, tokenModel)
const tokens = encoder(text)
for (const token of tokens) {
stdout.write(
`(${wrapColor(CONSOLE_TOKEN_COLORS[0], decoder([token]))}, x${wrapColor(CONSOLE_TOKEN_COLORS[1], token.toString(16))})`
)
}
}
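The display logic in `parseTokenize` above can be sketched without the CLI plumbing. Note the `encode`/`decode` functions below are hypothetical stand-ins for illustration only; a real run would use the pair returned by `resolveTokenEncoder(model)`, and the color wrapping via `wrapColor` is omitted:

```typescript
// Stand-in encoder/decoder: here each token is just a code point.
// A real tokenizer (e.g. the one resolveTokenEncoder returns) would
// produce model-specific token ids instead.
const encode = (text: string): number[] =>
    Array.from(text, (ch) => ch.codePointAt(0) ?? 0)
const decode = (tokens: number[]): string =>
    tokens.map((t) => String.fromCodePoint(t)).join("")

// Mirrors the output format of the loop in parseTokenize:
// each token is printed as (decoded text, xHEX).
function formatTokens(text: string): string {
    return encode(text)
        .map((token) => `(${decode([token])}, x${token.toString(16)})`)
        .join("")
}

console.log(formatTokens("Hi")) // → (H, x48)(i, x69)
```

The hex form makes otherwise-invisible tokens (whitespace, control characters) easy to compare across encodings.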

/**
* Converts "prompty" format files to GenAI script files.
*
46 changes: 0 additions & 46 deletions packages/core/src/runpromptcontext.ts
@@ -108,25 +108,6 @@ import { dotGenaiscriptPath } from "./workdir"
import { prettyBytes } from "./pretty"
import { createCache } from "./cache"

/**
* Creates a context for generating chat turn prompts.
*
* @param options - Contains generation options such as model, lineNumbers, and fenceFormat.
* @param trace - Trace object used to log the process and record outputs.
*
* @returns A context object used to generate prompt nodes and manage output-related functionalities.
*
* The returned context includes methods:
* - `writeText`: Appends a text node to the prompt for a specified role, priority, and max token limit.
* - `assistant`: Shortcut for writing text with an "assistant" role.
* - `$`: Creates and appends a template string node, returning a chainable interface for modifiers (e.g., priority, transforms).
* - `def`: Creates and appends a definition node for body content or external files, supporting error handling for empty definitions.
* - `defData`: Adds structured data as a definition.
* - `defDiff`: Adds a diff comparison between two data sets.
* - `fence`: Shortcut for creating definition nodes with code fences.
* - `importTemplate`: Imports a pre-defined template with associated data and appends it to the node tree.
* - `console`: Diagnostic methods (`log`, `debug`, `warn`, `error`) for capturing and printing logs or errors during the generation process.
*/
export function createChatTurnGenerationContext(
options: GenerationOptions,
trace: MarkdownTrace,
@@ -427,33 +408,6 @@ export interface RunPromptContextNode extends ChatGenerationContext {
node: PromptNode
}

/**
* Creates a chat generation context for handling prompts and related tasks in a conversational context.
*
* @param options - Configuration options for generation, including cancellation token, info callback, and user state.
* @param trace - Tracing utilities for logging and debugging execution.
* @param projectOptions - Project-specific parameters, including the project instance and environment variables.
* @returns A context object with various utility functions and properties for managing prompts and AI interactions.
*
* Utility Functions:
* - `defAgent(name, description, fn, options)`: Defines an agent with tools, memory, and task-solving capabilities.
* - `defTool(name, description, parameters, fn, defOptions)`: Registers a tool for use in the chat session. Supports multiple formats for tool definitions, including callbacks and MCP server configurations.
* - `defSchema(name, schema, defOptions)`: Defines a JSON schema for validation or metadata.
* - `defImages(files, defOptions)`: Processes and encodes image files for their integration into prompts. Supports tiling and slicing of images.
* - `defChatParticipant(generator, options)`: Adds chat participant logic (e.g., other agents or external systems).
* - `defFileOutput(pattern, description, options)`: Specifies output file patterns for tracking in the session.
* - `defOutputProcessor(fn)`: Adds output post-processing logic.
* - `defFileMerge(fn)`: Declares logic for merging file changes in the session output.
* - `prompt(strings, ...args)`: Runs a prompt with given input and additional options.
* - `runPrompt(generator, runOptions)`: Executes prompt logic and generates results, supporting inner execution.
* - `transcribe(audio, options)`: Transcribes audio input into text, with optional caching and language options.
* - `speak(input, options)`: Converts text to speech and saves the audio file.
* - `generateImage(prompt, imageOptions)`: Generates an image based on a textual description and custom options.
*
* Context Properties:
* - `node`: Root node capturing children elements in the prompt structure.
* - `env`: Environment variables passed to the project context for dynamic adjustments.
*/
export function createChatGenerationContext(
options: GenerationOptions,
trace: MarkdownTrace,