docs: add brief introduction for Data Analysis Workflow #22

Merged: 1 commit, Nov 4, 2024
19 changes: 15 additions & 4 deletions README.md
@@ -70,10 +70,21 @@ async for event in agent.astream_events(

<!-- API reference -->

## Components

### Data Analysis workflow

The Data Analysis workflow is the core functionality of the `tablegpt-agent`. It processes user input and generates appropriate responses. This workflow is similar to those found in most single-agent systems and consists of an agent and various tools. Specifically, the data analysis workflow includes:

- **An Agent Powered by TableGPT2**: This agent performs data analysis tasks.
- **An IPython tool**: This tool executes the generated code within a sandbox environment.

Additionally, the data analysis workflow offers several optional plugins that extend the agent's functionality:

- [VLM](#vlm): A Visual Language Model that can be used to enhance summarization for data visualization tasks.
- [Dataset Retriever](#dataset-retriever): A retriever that fetches information about the dataset, improving the quality and relevance of the generated code.
- [Safety Guard](#safety-guard): A safety mechanism that protects the system from toxic inputs.
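The agent-plus-tool loop described above can be sketched in a few lines. This is a conceptual illustration only, not the library's real API: `fake_agent` stands in for the TableGPT2-powered agent (which would call the LLM), and `ipython_tool` stands in for the sandboxed IPython tool; the dict-of-lists `df` is a toy substitute for a real DataFrame.

```python
import io
import contextlib


def fake_agent(question: str) -> str:
    """Stand-in for the TableGPT2-powered agent: maps a question to Python code."""
    # A real agent would call the LLM here; we hard-code one plausible response.
    return "print(sum(df['sales']) / len(df['sales']))"


def ipython_tool(code: str, namespace: dict) -> str:
    """Stand-in for the IPython tool: runs code in a namespace, captures stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)
    return buf.getvalue().strip()


# Toy "dataset" in place of a pandas DataFrame.
namespace = {"df": {"sales": [10, 20, 30]}}
answer = ipython_tool(fake_agent("What is the average of sales?"), namespace)
print(answer)  # 20.0
```

The real workflow differs mainly in that code runs in an isolated sandbox rather than the host process, and the loop may iterate (generate, execute, observe, regenerate) before answering.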

### File Reading workflow

We separate the file reading workflow from the data analysis workflow to maintain greater control over how the LLM inspects the dataset files. Typically, if you let the LLM inspect the dataset itself, it uses the `df.head()` function to preview the data. While this is sufficient for basic cases, we have implemented a more structured approach by hard-coding the file reading workflow into several steps:
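The concrete steps are collapsed in this diff, but the idea of deterministic, structured inspection (instead of letting the LLM call `df.head()`) can be sketched as below. The function name `inspect_csv` and the exact fields it extracts are hypothetical; they only illustrate the "hard-coded steps" approach using the standard library.

```python
import csv
import io


def inspect_csv(text: str, n_preview: int = 3) -> dict:
    """Deterministically extract schema facts to hand to the agent."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)          # step 1: column names
    rows = list(reader)            # step 2: row count
    return {
        "columns": header,
        "n_rows": len(rows),
        "preview": rows[:n_preview],  # step 3: small, bounded sample
    }


sample = "name,score\nalice,1\nbob,2\ncarol,3\ndave,4\n"
info = inspect_csv(sample)
print(info["columns"], info["n_rows"])  # ['name', 'score'] 4
```

Because each step is fixed code rather than model-chosen, the agent always receives the same kind of dataset summary, which keeps downstream prompts predictable.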
@@ -92,11 +103,11 @@ The `tablegpt-agent` directs `tablegpt` to generate Python code for data analysi
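Executing model-generated code safely is the point of the sandbox. The project runs code in an isolated environment; as a purely conceptual sketch (not the library's mechanism), one ingredient of sandboxing is exposing only an allow-list of builtins to the generated code:

```python
def run_generated(code: str) -> dict:
    """Run generated code with a restricted set of builtins (illustrative only)."""
    safe_builtins = {"sum": sum, "len": len, "range": range, "min": min, "max": max}
    namespace = {"__builtins__": safe_builtins}
    exec(code, namespace)
    return namespace


ns = run_generated("total = sum(range(5))")
print(ns["total"])  # 10

try:
    run_generated("open('secrets.txt')")  # 'open' is not in the allow-list
except NameError as e:
    print("blocked:", e)
```

A production sandbox adds process isolation, resource limits, and filesystem/network restrictions; builtin filtering alone is not a security boundary.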

#### VLM

#### Dataset Retriever

#### Safety Guard

#### Dataset Normalizer

## License
