Add ReAct agent concept guide, search tweaks (langchain-ai#5912)
jacoblee93 authored Jun 27, 2024
1 parent dfa4c6d commit 7ef4fdb
Showing 3 changed files with 47 additions and 8 deletions.
37 changes: 31 additions & 6 deletions docs/core_docs/docs/concepts.mdx
@@ -156,6 +156,18 @@ Chat Models also accept other parameters that are specific to that integration.

For specifics on how to use chat models, see the [relevant how-to guides here](/docs/how_to/#chat-models).

### Multimodality

Some chat models are multimodal, accepting images, audio and even video as inputs.
These are still less common, meaning model providers haven't standardized on the "best" way to define the API.
Multimodal outputs are even less common. As such, we've kept our multimodal abstractions fairly lightweight
and plan to further solidify the multimodal APIs and interaction patterns as the field matures.

In LangChain, most chat models that support multimodal inputs also accept those values in OpenAI's content blocks format.
So far this is restricted to image inputs. For models like Gemini, which support video and other byte inputs, the APIs also support native, model-specific representations.
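
For example, an image input in OpenAI's content blocks format might look like the following sketch (the base64 data is a placeholder, and exact model support varies by provider):

```typescript
import { HumanMessage } from "@langchain/core/messages";

// Placeholder image data; in practice, read and base64-encode a real file.
const imageBase64 = "...";

// A single message mixing text and image content blocks.
const message = new HumanMessage({
  content: [
    { type: "text", text: "What do you see in this image?" },
    {
      type: "image_url",
      image_url: { url: `data:image/jpeg;base64,${imageBase64}` },
    },
  ],
});
```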

For specifics on how to use multimodal models, see the [relevant how-to guides here](/docs/how_to/#multimodal).

### LLMs

<span data-heading-keywords="llm,llms"></span>
@@ -579,15 +591,28 @@ If you are still using AgentExecutor, do not fear: we still have a guide on [how
It is recommended, however, that you start to transition to [LangGraph](https://github.com/langchain-ai/langgraphjs).
In order to assist in this we have put together a [transition guide on how to do so](/docs/how_to/migrate_agent).

#### ReAct agents

<span data-heading-keywords="react,react agent"></span>

One popular architecture for building agents is [**ReAct**](https://arxiv.org/abs/2210.03629).
ReAct combines reasoning and acting in an iterative process; in fact, the name "ReAct" stands for "Reason" and "Act".

The general flow looks like this:

- The model will "think" about what step to take in response to an input and any previous observations.
- The model will then choose an action from available tools (or choose to respond to the user).
- The model will generate arguments to that tool.
- The agent runtime (executor) will parse out the chosen tool and call it with the generated arguments.
- The executor will return the results of the tool call back to the model as an observation.
- This process repeats until the agent chooses to respond.

There are general prompting-based implementations that do not require any model-specific features, but the most
robust implementations use features like [tool calling](/docs/how_to/tool_calling/) to reliably format outputs
and reduce variance.
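
To make the loop concrete, here is a minimal, framework-agnostic sketch in TypeScript. The `Message`, `ToolCall`, and `Model` shapes are assumptions for illustration only, not LangChain APIs:

```typescript
// A minimal sketch of the ReAct loop described above.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
// The model either answers the user or requests a tool call.
type ModelStep = { answer?: string; toolCall?: ToolCall };
type Model = (messages: Message[]) => Promise<ModelStep>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runReActLoop(
  model: Model,
  tools: Record<string, Tool>,
  input: string
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: input }];
  while (true) {
    // The model "thinks" about the input and any previous observations...
    const step = await model(messages);
    // ...and either responds to the user, ending the loop...
    if (step.answer !== undefined) return step.answer;
    // ...or chooses a tool and generates arguments for it.
    const { name, args } = step.toolCall!;
    // The executor parses out the chosen tool and calls it,
    const observation = await tools[name](args);
    // then returns the result to the model as an observation.
    messages.push({ role: "tool", content: `${name}: ${observation}` });
  }
}
```

A tool-calling model makes the `toolCall` step dependable by returning structured arguments rather than free-form text that must be parsed.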

Please see the [LangGraph documentation](https://langchain-ai.github.io/langgraph/) for more information,
or [this how-to guide](/docs/how_to/migrate_agent/) for specific information on migrating to LangGraph.
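
For a taste of the prebuilt helper, here is a brief sketch using `createReactAgent` from LangGraph.js (the specific model and tool are illustrative choices, not requirements):

```typescript
import { HumanMessage } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";

// Illustrative choices: any tool-calling chat model and any tools work here.
const llm = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
const tools = [new TavilySearchResults({ maxResults: 2 })];

// The prebuilt helper wires the model and tools into the ReAct loop above.
const agent = createReactAgent({ llm, tools });

const result = await agent.invoke({
  messages: [new HumanMessage("What is the weather in San Francisco?")],
});
console.log(result.messages.at(-1)?.content);
```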

### Callbacks

16 changes: 15 additions & 1 deletion docs/core_docs/docs/how_to/migrate_agent.ipynb
@@ -1,5 +1,19 @@
{
 "cells": [
  {
   "cell_type": "raw",
   "id": "8f21bf6b",
   "metadata": {
    "vscode": {
     "languageId": "raw"
    }
   },
   "source": [
    "---\n",
    "keywords: [create_react_agent, create_react_agent()]\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "579c24a2",
@@ -12,7 +26,7 @@
"[`AgentExecutor`](https://api.js.langchain.com/classes/langchain_agents.AgentExecutor.html)\n",
"in particular) have multiple configuration parameters. In this notebook we will\n",
"show how those parameters map to the LangGraph\n",
"[react agent executor](https://langchain-ai.github.io/langgraphjs/reference/functions/prebuilt.createReactAgent.html).\n",
"react agent executor using the [create_react_agent](https://langchain-ai.github.io/langgraphjs/reference/functions/prebuilt.createReactAgent.html) prebuilt helper method.\n",
"\n",
"For more information on how to build agentic workflows in LangGraph, check out\n",
"the [docs here](https://langchain-ai.github.io/langgraphjs/how-tos/).\n",
2 changes: 1 addition & 1 deletion docs/core_docs/docs/how_to/tool_calling.ipynb
@@ -9,7 +9,7 @@
   },
   "source": [
    "---\n",
-   "keywords: [function, function calling, tool, tool calling]\n",
+   "keywords: [function, function calling, tool, tool call, tool calling]\n",
    "---"
   ]
  },
