docs: add tooluse examples (#68)

Bugfix in anthropic code, also includes some examples of how to use tool_use.
timescale · Dec 5, 2024 · 2cb2fe9 · 2cb2fe9
1 parent 1c2ff20
commit 2cb2fe9
Show file tree

Hide file tree

Showing 5 changed files with 372 additions and 3 deletions.
diff --git a/docs/anthropic.md b/docs/anthropic.md
@@ -185,3 +185,173 @@ Here are five famous people from Birmingham, Alabama:
 
 These individuals have made significant contributions in various fields, including politics, entertainment, sports, and athletics, and have helped put Birmingham, Alabama on the map in their respective areas.
 ```
+
+## Tools Use
+
+Here is a [tool_use](https://github.com/anthropics/anthropic-cookbook/tree/main/tool_use)
+example which you can also delegate specific tasks on your data to the AI.
+
+```sql
+\getenv anthropic_api_key ANTHROPIC_API_KEY
+
+SELECT jsonb_pretty(ai.anthropic_generate
+( 'claude-3-5-sonnet-20240620'
+, jsonb_build_array(
+    jsonb_build_object(
+      'role', 'user',
+      'content', 'John works at Google in New York. He met with Sarah, the CEO of Acme Inc., last week in San Francisco.'
+    )
+  )
+, _max_tokens => 4096
+, _api_key => $1
+, _tools => jsonb_build_array(
+    jsonb_build_object(
+      'name', 'print_entities',
+      'description', 'Prints extract named entities.',
+      'input_schema', jsonb_build_object(
+        'type', 'object',
+        'properties', jsonb_build_object(
+          'entities', jsonb_build_object(
+            'type', 'array',
+            'items', jsonb_build_object(
+              'type', 'object',
+              'properties', jsonb_build_object(
+                'name', jsonb_build_object('type', 'string', 'description', 'The extracted entity name.'),
+                'type', jsonb_build_object('type', 'string', 'description', 'The entity type (e.g., PERSON, ORGANIZATION, LOCATION).'),
+                'context', jsonb_build_object('type', 'string', 'description', 'The context in which the entity appears in the text.')
+              ),
+              'required', jsonb_build_array('name', 'type', 'context')
+            )
+          )
+        ),
+        'required', jsonb_build_array('entities')
+      )
+    )
+  )
+)::jsonb) AS result
+\bind :anthropic_api_key
+\g
+```
+
+Outputs:
+
+```json
+{
+    "id": "msg_013VZ3M65KQy8pnh2664YLPA",
+    "role": "assistant",
+    "type": "message",
+    "model": "claude-3-5-sonnet-20240620",
+    "usage": {
+        "input_tokens": 498,
+        "output_tokens": 346
+    },
+    "content": [
+        {
+            "text": "Certainly! I'll use the print_entities tool to extract the named entities from the given document. Let's proceed with the function call:",
+            "type": "text"
+        },
+        {
+            "id": "toolu_0114pK4oBxD53xdfgEBrgq73",
+            "name": "print_entities",
+            "type": "tool_use",
+            "input": {
+                "entities": [
+                    {
+                        "name": "John",
+                        "type": "PERSON",
+                        "context": "John works at Google in New York."
+                    },
+                    {
+                        "name": "Google",
+                        "type": "ORGANIZATION",
+                        "context": "John works at Google in New York."
+                    },
+                    {
+                        "name": "New York",
+                        "type": "LOCATION",
+                        "context": "John works at Google in New York."
+                    },
+                    {
+                        "name": "Sarah",
+                        "type": "PERSON",
+                        "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
+                    },
+                    {
+                        "name": "Acme Inc.",
+                        "type": "ORGANIZATION",
+                        "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
+                    },
+                    {
+                        "name": "San Francisco",
+                        "type": "LOCATION",
+                        "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
+                    }
+                ]
+            }
+        }
+    ],
+    "stop_reason": "tool_use",
+    "stop_sequence": null
+}
+```
+
+You can run this example of **tool_use** in your database to extract named
+entities from the given text input.
+The [extract_entities.sql](../examples/extract_entities.sql) is wrapping the
+function.
+
+```sql
+ SELECT * FROM detect_entities('John works at Timescale in New York.');
+ entity_name | entity_type  |            entity_context
+-------------+--------------+--------------------------------------
+ John        | PERSON       | John works at Timescale in New York.
+ Timescale   | ORGANIZATION | John works at Timescale in New York.
+ New York    | LOCATION     | John works at Timescale in New York.
+```
+
+With the `anonymize_text` function, you can anonymize the text by replacing
+the named entities with their types.
+
+```sql
+ SELECT * FROM anonymize_text('John works at Timescale in New York.');
+-[ RECORD 1 ]--+------------------------------------------------
+anonymize_text | :PERSON: works at :ORGANIZATION: in :LOCATION:.
+```
+Through the same mechanism, the [summarize_article.sql](../examples/summarize_article.sql)
+example shows how to extract structured json with the `summarize_article` tool.
+
+```sql
+select * from summarize_article($$
+  From URL: https://docs.timescale.com/use-timescale/latest/compression
+  #  Compression
+
+  Time-series data can be compressed to reduce the amount of storage required,
+  and increase the speed of some queries. This is a cornerstone feature of Timescale.
+  When new data is added to your database, it is in the form of uncompressed rows.
+  Timescale uses a built-in job scheduler to convert this data to the form of
+  compressed columns. This occurs across chunks of Timescale hypertables.
+
+   Timescale charges are based on how much storage you use. You don't pay for a 
+   fixed storage size, and you don't need to worry about scaling disk size as your
+   data grows; We handle it all for you. To reduce your data costs further, use
+   compression, a data retention policy, and tiered storage.
+
+$$);
+-[ RECORD 1 ]------------------------------------------------------------------
+author     | Timescale Documentation
+topics     | {"database management","data compression","time-series data",
+           |  "data storage optimization"}
+summary    | The article discusses Timescale's compression feature for time-series data.
+           | It explains that compression is a key feature of Timescale, designed
+           | to reduce storage requirements and improve query performance. The
+           | process involves converting newly added uncompressed row data into 
+           | compressed columns using a built-in job scheduler. This compression
+           | occurs across chunks of Timescale hypertables. The article also 
+           | mentions that Timescale's pricing model is based on actual storage
+           | used, with automatic scaling. To further reduce data costs, users 
+           | are advised to employ compression, implement data retention policies,
+           | and utilize tiered storage.
+coherence  | 95
+persuasion | 0.8
+```
+
diff --git a/examples/anonymize_entities.sql b/examples/anonymize_entities.sql
@@ -0,0 +1,29 @@
+-- Extract entities example: https://github.com/anthropics/anthropic-cookbook/tree/main/tool_use
+\getenv anthropic_api_key ANTHROPIC_API_KEY
+
+SELECT ai.anthropic_generate( 'claude-3-5-sonnet-20240620'
+, jsonb_build_array(
+    jsonb_build_object(
+      'role', 'user',
+      'content', 'John works at Google in New York. He met with Sarah, the CEO of Acme Inc., last week in San Francisco.'
+    )
+  )
+, _max_tokens => 4096
+, _api_key => $1
+, _tools => jsonb_build_array(
+    jsonb_build_object(
+      'name', 'anonymize_recognized_entities',
+      'description', 'Anonymize recognized entities like people names, locations, companies. The output should be the original text with entities replaced by the entities recognized in the input text. Example input: John works at Google in New York. Example output: :PERSON works at :COMPANY in :CITY.',
+      'input_schema', jsonb_build_object(
+        'type', 'object',
+        'anonymized', jsonb_build_object(
+            'type', 'text',
+            'description', 'The original text anonymized with entities replaced by placeholders with the type of entity recognized.'
+         ),
+        'required', jsonb_build_array('anonimized_text')
+        )
+      )
+    )
+  ) AS result
+\bind :anthropic_api_key
+\g
diff --git a/examples/extract_entities.sql b/examples/extract_entities.sql
@@ -0,0 +1,88 @@
+-- Extract entities example: https://github.com/anthropics/anthropic-cookbook/tree/main/tool_use
+\getenv anthropic_api_key ANTHROPIC_API_KEY
+
+CREATE OR REPLACE FUNCTION public.detect_entities(input_text text)
+RETURNS TABLE(entity_name text, entity_type text, entity_context text)
+AS $$
+DECLARE
+    api_response jsonb;
+    entities_json jsonb;
+BEGIN
+    SELECT ai.anthropic_generate(
+        'claude-3-5-sonnet-20240620',
+        jsonb_build_array(
+            jsonb_build_object(
+                'role', 'user',
+                'content', input_text
+            )
+        ),
+        _max_tokens => 4096,
+        _tools => jsonb_build_array(
+            jsonb_build_object(
+                'name', 'print_entities',
+                'description', 'Prints extract named entities.',
+                'input_schema', jsonb_build_object(
+                    'type', 'object',
+                    'properties', jsonb_build_object(
+                        'entities', jsonb_build_object(
+                            'type', 'array',
+                            'items', jsonb_build_object(
+                                'type', 'object',
+                                'properties', jsonb_build_object(
+                                    'name', jsonb_build_object('type', 'string', 'description', 'The extracted entity name.'),
+                                    'type', jsonb_build_object('type', 'string', 'description', 'The entity type (e.g., PERSON, ORGANIZATION, LOCATION).'),
+                                    'context', jsonb_build_object('type', 'string', 'description', 'The context in which the entity appears in the text.')
+                                ),
+                                'required', jsonb_build_array('name', 'type', 'context')
+                            )
+                        )
+                    ),
+                    'required', jsonb_build_array('entities')
+                )
+            )
+        )
+    ) INTO api_response;
+
+    entities_json := jsonb_extract_path_text(api_response::jsonb, 'content', '1', 'input', 'entities')::jsonb;
+
+    RETURN QUERY
+    SELECT 
+        e->>'name' AS entity_name,
+        e->>'type' AS entity_type,
+        e->>'context' AS entity_context
+    FROM jsonb_array_elements(entities_json) AS e;
+
+EXCEPTION
+    WHEN OTHERS THEN
+        RAISE NOTICE 'An error occurred: %', SQLERRM;
+        RAISE NOTICE 'API Response: %', api_response;
+        RETURN;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE OR REPLACE FUNCTION public.anonymize_text(input_text text)
+RETURNS text
+AS $$
+DECLARE
+    entity record;
+    anonymized text := input_text;
+BEGIN
+    -- Replace entities with their types, starting with the longest entities
+    FOR entity IN (
+        SELECT entity_name, entity_type
+        FROM public.detect_entities(input_text)
+        ORDER BY length(entity_name) DESC
+    )
+    LOOP
+        anonymized := regexp_replace(
+            anonymized, 
+            '\m' || regexp_replace(entity.entity_name, '([().\\*+?])', '\\\1', 'g') || '\M', 
+            ':' || entity.entity_type || ':', 
+            'gi'
+        );
+    END LOOP;
+
+    RETURN anonymized;
+END;
+$$ LANGUAGE plpgsql;
+
diff --git a/examples/summarize_article.sql b/examples/summarize_article.sql
@@ -0,0 +1,82 @@
+CREATE OR REPLACE FUNCTION public.summarize_article(article_text text)
+RETURNS TABLE(
+    author text,
+    topics text[],
+    summary text,
+    coherence integer,
+    persuasion numeric
+)
+AS $$
+DECLARE
+    api_response jsonb;
+    summary_json jsonb;
+BEGIN
+    -- Call the Anthropic API using the ai.anthropic_generate function with print_summary tool
+    SELECT ai.anthropic_generate(
+        'claude-3-5-sonnet-20240620',
+        jsonb_build_array(
+            jsonb_build_object(
+                'role', 'user',
+                'content', format('Please summarize the following article using the print_summary tool: %s', article_text)
+            )
+        ),
+        _max_tokens => 4096,
+        _tools => jsonb_build_array(
+            jsonb_build_object(
+                'name', 'print_summary',
+                'description', 'Prints a summary of the article.',
+                'input_schema', jsonb_build_object(
+                    'type', 'object',
+                    'properties', jsonb_build_object(
+                        'author', jsonb_build_object('type', 'string', 'description', 'Name of the article author'),
+                        'topics', jsonb_build_object(
+                            'type', 'array',
+                            'items', jsonb_build_object('type', 'string'),
+                            'description', 'Array of topics, e.g. ["tech", "politics"]. Should be as specific as possible, and can overlap.'
+                        ),
+                        'summary', jsonb_build_object('type', 'string', 'description', 'Summary of the article. One or two paragraphs max.'),
+                        'coherence', jsonb_build_object('type', 'integer', 'description', 'Coherence of the article''s key points, 0-100 (inclusive)'),
+                        'persuasion', jsonb_build_object('type', 'number', 'description', 'Article''s persuasion score, 0.0-1.0 (inclusive)')
+                    ),
+                    'required', jsonb_build_array('author', 'topics', 'summary', 'coherence', 'persuasion')
+                )
+            )
+        )
+    ) INTO api_response;
+
+    -- Extract the summary from the tool use response
+    summary_json := jsonb_path_query(api_response, '$.content[*] ? (@.type == "tool_calls").tool_calls[*].function.arguments')::jsonb;
+
+    -- Return the extracted summary information
+    RETURN QUERY
+    SELECT
+        summary_json->>'author',
+        array(SELECT jsonb_array_elements_text(summary_json->'topics')),
+        summary_json->>'summary',
+        (summary_json->>'coherence')::integer,
+        (summary_json->>'persuasion')::numeric;
+
+EXCEPTION
+    WHEN OTHERS THEN
+        RAISE NOTICE 'An error occurred: %', SQLERRM;
+        RAISE NOTICE 'API Response: %', api_response;
+        RETURN;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Test the function
+-- Content From URL: https://docs.timescale.com/use-timescale/latest/compression
+select * from summarize_article($$
+  #  Compression
+
+  Time-series data can be compressed to reduce the amount of storage required, and increase the speed of some queries. This is a cornerstone feature of Timescale. When new data is added to your database, it is in the form of uncompressed rows. Timescale uses a built-in job scheduler to convert this data to the form of compressed columns. This occurs across chunks of Timescale hypertables.
+
+   Timescale charges are based on how much storage you use. You don't pay for a fixed storage size, and you don't need to worry about scaling disk size as your data grows; We handle it all for you. To reduce your data costs further, use compression, a data retention policy, and tiered storage.
+
+$$);
+-- -[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+-- author     | Timescale Documentation
+-- topics     | {"database management","data compression","time-series data","data storage optimization"}
+-- summary    | The article discusses Timescale's compression feature for time-series data. It explains that compression is a key feature of Timescale, designed to reduce storage requirements and improve query performance. The process involves converting newly added uncompressed row data into compressed columns using a built-in job scheduler. This compression occurs across chunks of Timescale hypertables. The article also mentions that Timescale's pricing model is based on actual storage used, with automatic scaling. To further reduce data costs, users are advised to employ compression, implement data retention policies, and utilize tiered storage.
+-- coherence  | 95
+-- persuasion | 0.8
diff --git a/projects/extension/sql/idempotent/003-anthropic.sql b/projects/extension/sql/idempotent/003-anthropic.sql
@@ -39,9 +39,9 @@ as $python$
     if temperature is not None:
         args["temperature"] = temperature
     if tool_choice is not None:
-        args["tool_choice"] = json.dumps(tool_choice)
+        args["tool_choice"] = json.loads(tool_choice)
     if tools is not None:
-        args["tools"] = json.dumps(tools)
+        args["tools"] = json.loads(tools)
     if top_k is not None:
         args["top_k"] = top_k
     if top_p is not None:
@@ -52,4 +52,4 @@ as $python$
 $python$
 language plpython3u volatile parallel safe security invoker
 set search_path to pg_catalog, pg_temp
-;
+;