
Commit

Merge pull request #528 from alexrudall/7.3.0
7.3.0
alexrudall authored Oct 11, 2024
2 parents 4627c94 + f3d4121 commit 16f00c3
Showing 13 changed files with 456 additions and 378 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [7.3.0] - 2024-10-11

### Added

- Add ability to (with the right incantations) retrieve the chunks used by an Assistant file search - thanks to [@agamble](https://github.com/agamble) for the addition!

## [7.2.0] - 2024-10-10

### Added
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
ruby-openai (7.2.0)
ruby-openai (7.3.0)
event_stream_parser (>= 0.3.0, < 2.0.0)
faraday (>= 1)
faraday-multipart (>= 1)
110 changes: 110 additions & 0 deletions README.md
@@ -1111,6 +1111,116 @@ end

Note that you have 10 minutes to submit your tool output before the run expires.
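
Within that 10-minute window, you build one output per requested tool call and submit them back. A minimal sketch of assembling that payload — the tool call ID and weather result here are hypothetical placeholders, and the final API call is shown commented out since it needs a live run:

```ruby
# Tool calls as they might appear in a run's required_action (hypothetical ID).
required_tool_calls = [
  { "id" => "call_abc123", "function" => { "name" => "get_weather" } }
]

# Build one output per requested tool call.
tool_outputs = required_tool_calls.map do |tool_call|
  {
    tool_call_id: tool_call["id"],
    output: "22 degrees and sunny" # your real result for this call
  }
end

# Then submit within the 10-minute window:
# client.runs.submit_tool_outputs(
#   thread_id: thread_id,
#   run_id: run_id,
#   parameters: { tool_outputs: tool_outputs }
# )
```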

#### Exploring chunks used in File Search

Take a deep breath. You might need a drink for this one.

It's possible to get OpenAI to share the chunks it used in its internal RAG pipeline to produce a File Search result.

An example spec that does this can be found [here](https://github.com/alexrudall/ruby-openai/blob/main/spec/openai/client/assistant_file_search_spec.rb), just so you know it's possible.

Here's how to get the chunks used in a file search. In this example, I'm using [this file](https://css4.pub/2015/textbook/somatosensory.pdf):

```ruby
require "openai"

# Make a client
client = OpenAI::Client.new(
  access_token: "access_token_goes_here",
  log_errors: true # Don't do this in production.
)

# Upload your file(s)
file_id = client.files.upload(
  parameters: {
    file: "path/to/somatosensory.pdf",
    purpose: "assistants"
  }
)["id"]

# Create a vector store to store the vectorised file(s)
vector_store_id = client.vector_stores.create(parameters: {})["id"]

# Vectorise the file(s)
vector_store_file_id = client.vector_store_files.create(
  vector_store_id: vector_store_id,
  parameters: { file_id: file_id }
)["id"]

# Check that the file is vectorised (wait for status to be "completed")
client.vector_store_files.retrieve(vector_store_id: vector_store_id, id: vector_store_file_id)["status"]

# Create an assistant, referencing the vector store
assistant_id = client.assistants.create(
  parameters: {
    model: "gpt-4o",
    name: "Answer finder",
    instructions: "You are a file search tool. Find the answer in the given files, please.",
    tools: [
      { type: "file_search" }
    ],
    tool_resources: {
      file_search: {
        vector_store_ids: [vector_store_id]
      }
    }
  }
)["id"]

# Create a thread with your question
thread_id = client.threads.create(
  parameters: {
    messages: [
      { role: "user",
        content: "Find the description of a nociceptor." }
    ]
  }
)["id"]

# Run the thread to generate the response. Include the "GIVE ME THE CHUNKS" incantation.
run_id = client.runs.create(
  thread_id: thread_id,
  parameters: {
    assistant_id: assistant_id
  },
  query_parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
)["id"]

# Get the steps that happened in the run
steps = client.run_steps.list(
  thread_id: thread_id,
  run_id: run_id,
  parameters: { order: "asc" }
)

# Get the first step ID (or whichever one you want to look at)
step_id = steps["data"].first["id"]

# Retrieve each step in full. Include the "GIVE ME THE CHUNKS" incantation again.
steps = steps["data"].map do |step|
  client.run_steps.retrieve(
    thread_id: thread_id,
    run_id: run_id,
    id: step["id"],
    parameters: { include: ["step_details.tool_calls[*].file_search.results[*].content"] } # incantation
  )
end

# Now we've got the chunk info, buried deep. Loop through the steps and find chunks if included:
chunks = steps.flat_map do |step|
  included_results = step.dig("step_details", "tool_calls", 0, "file_search", "results")
  next if included_results.nil? || included_results.empty?

  included_results.flat_map do |result|
    result["content"].map do |content|
      content["text"]
    end
  end
end.compact

# The first chunk will be the closest match to the prompt.
# Finally, if you want to view the completed message(s):
client.messages.list(thread_id: thread_id)
```
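The chunk-extraction at the end is plain Ruby, so you can check it offline against a stubbed run-steps payload. The structure below mirrors the `step_details` shape used above; the text snippets themselves are made up:

```ruby
# Stubbed run steps: one file_search step with two result chunks,
# and one step with no file_search results at all.
stubbed_steps = [
  {
    "step_details" => {
      "tool_calls" => [
        {
          "file_search" => {
            "results" => [
              { "content" => [{ "text" => "A nociceptor is a sensory receptor for pain." }] },
              { "content" => [{ "text" => "Pain signals travel along afferent fibres." }] }
            ]
          }
        }
      ]
    }
  },
  { "step_details" => { "type" => "message_creation" } } # dig returns nil here
]

# Same extraction logic as above: dig for results, skip steps without them.
chunks = stubbed_steps.flat_map do |step|
  results = step.dig("step_details", "tool_calls", 0, "file_search", "results")
  next if results.nil? || results.empty?

  results.flat_map { |result| result["content"].map { |content| content["text"] } }
end.compact

chunks.first # => "A nociceptor is a sensory receptor for pain."
```

The `next` inside `flat_map` yields `nil` for steps without results, which is why the trailing `.compact` is needed.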

### Image Generation

Generate images using DALL·E 2 or DALL·E 3!
2 changes: 1 addition & 1 deletion lib/openai/version.rb
@@ -1,3 +1,3 @@
module OpenAI
VERSION = "7.2.0".freeze
VERSION = "7.3.0".freeze
end
