diff --git a/docs/concepts/multimodal.md b/docs/concepts/multimodal.md
index 608655dd9..228b0dec3 100644
--- a/docs/concepts/multimodal.md
+++ b/docs/concepts/multimodal.md
@@ -13,10 +13,6 @@ The core of multimodal support in Instructor is the `Image` class. This class re
 
 It's important to note that Anthropic and OpenAI have different formats for handling images in their API requests. The `Image` class in Instructor abstracts away these differences, allowing you to work with a unified interface.
 
-## `Audio`
-
-The `Audio` class represents an audio file that can be loaded from a URL or file path. It provides methods to create `Audio` instances but currently only OpenAI supports it.
-
 ### Usage
 
 You can create an `Image` instance from a URL or file path using the `from_url` or `from_path` methods. The `Image` class will automatically convert the image to a base64-encoded string and include it in the API request.
@@ -40,3 +36,56 @@ response = client.chat.completions.create(
 ```
 
 The `Image` class takes care of the necessary conversions and formatting, ensuring that your code remains clean and provider-agnostic. This flexibility is particularly valuable when you're experimenting with different models or when you need to switch providers based on specific project requirements.
+
+## `Audio`
+
+The `Audio` class represents an audio file that can be loaded from a URL or file path. You can create an instance with the `from_url` or `from_path` methods; currently only OpenAI supports audio inputs. The `Audio` class automatically converts the file to a base64-encoded string and includes it in the API request.
+
+### Usage
+
+```python
+from openai import OpenAI
+from pydantic import BaseModel
+import instructor
+from instructor.multimodal import Audio
+
+client = instructor.from_openai(OpenAI())
+
+
+class User(BaseModel):
+    name: str
+    age: int
+
+
+resp = client.chat.completions.create(
+    model="gpt-4o-audio-preview",
+    response_model=User,
+    modalities=["text"],  # we only want text back, not generated audio
+    audio={"voice": "alloy", "format": "wav"},
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                "Extract the following information from the audio:",
+                # Audio.from_path reads the file and base64-encodes it for the request
+                Audio.from_path("./output.wav"),
+            ],
+        },
+    ],
+)  # type: ignore
+
+print(resp)
+# > name='Jason' age=20
+```
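+
+If your audio lives at a URL rather than on disk, `Audio.from_url` builds the same kind of content part. A minimal sketch (the URL below is a placeholder; substitute a real, reachable `.wav` file):
+
+```python
+from instructor.multimodal import Audio
+
+# Placeholder URL: replace with a real, reachable .wav file.
+audio_part = Audio.from_url("https://example.com/interview.wav")
+
+# audio_part drops into the same message content list as Audio.from_path above:
+# "content": ["Extract the following information from the audio:", audio_part]
+```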