Merge pull request #16 from amoffat/dev

Release 1.0.1
amoffat · Sep 12, 2023 · 3b2154d
2 parents 8bd523a + 31a5580
Showing 37 changed files with 552 additions and 180 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,11 @@
 # Changelog
 
+## 1.0.1 - 9/12/23
+
+- Simplify docs
+- Add missing Postgres API docs
+- Generalize `validation_only` to the `Bifrost` superclass
+
 ## 1.0.0 - 9/11/23
 
 - Subquery support

diff --git a/README.md b/README.md
@@ -1,7 +1,11 @@
 # HeimdaLLM
 
-> Heimdall, the watchman of the gods, dwelt at its entrance, where he guarded Bifrost,
-> the shimmering path connecting the realms.
+Pronounced `[ˈhaɪm.dɔl.əm]` or _HEIM-dall-EM_
+
+HeimdaLLM is a robust static analysis framework for validating that LLM-generated
+structured output is safe. It currently supports SQL.
+
+In simple terms, it helps make sure that AI won't wreck your systems.
 
 [![Heimdall](https://raw.githubusercontent.com/amoffat/HeimdaLLM/main/docs/source/images/heimdall.png)](https://heimdallm.ai)
 [![Build status](https://github.com/amoffat/HeimdaLLM/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/amoffat/HeimdaLLM/actions)
@@ -12,77 +16,63 @@
 [![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)
 [![Coverage Status](https://coveralls.io/repos/github/amoffat/HeimdaLLM/badge.svg?branch=dev)](https://coveralls.io/github/amoffat/HeimdaLLM?branch=dev)
 
-HeimdaLLM safely bridges the gap between untrusted human input and trusted
-machine-readable output by augmenting LLMs with a robust validation framework. This
-enables you externalize LLM technology to your users, so that you can do things like
-execute trusted SQL queries from their untrusted input.
+Consider the following natural-language database query:
 
-To accomplish this, HeimdaLLM introduces a new technology, the 🌈✨
-[Bifrost](https://docs.heimdallm.ai/en/latest/bifrost.html), composed of 4 parts: an LLM
-prompt envelope, an LLM integration, a grammar, and a constraint validator. These 4
-components operate as a single unit—a Bifrost—which is capable of translating untrusted
-human input into trusted machine output.
+```
+how much have i spent renting movies, broken down by month?
+```
 
-✨ **This allows you to perform magic** ✨
+From this query (and a little bit of context), an LLM can produce the following SQL
+query:
+
+```sql
+SELECT
+   strftime('%Y-%m', payment.payment_date) AS month,
+   SUM(payment.amount) AS total_amount
+FROM payment
+JOIN rental ON payment.rental_id=rental.rental_id
+JOIN customer ON payment.customer_id=customer.customer_id
+WHERE customer.customer_id=:customer_id
+GROUP BY month
+LIMIT 10;
+```
 
-Imagine giving your users natural language access to their data in your database,
-without having to worry about dangerous queries. This is an actual query on the [Sakila
-Sample
-Database](https://www.kaggle.com/datasets/atanaskanev/sqlite-sakila-sample-database):
+But how can you ensure the LLM-generated query is safe and that it only accesses
+authorized data?
 
-```python
-traverse("Show me the movies I rented the longest, and the number of days I had them for.")
-```
+HeimdaLLM performs static analysis on the generated SQL to ensure that only allowlisted
+columns, tables, and functions are used. It also automatically edits the query to add or
+lower the `LIMIT` and to remove forbidden columns. Lastly, it ensures that the query has
+a column constraint that restricts the results to the requesting user's data.
+
+It does all of this locally, without AI, using good ol'-fashioned grammars and parsers:
 
 ```
 ✅ Ensuring SELECT statement...
 ✅ Resolving column and table aliases...
 ✅ Allowlisting selectable columns...
-   ✅ Removing 4 forbidden columns...
+   ✅ Removing 2 forbidden columns...
 ✅ Ensuring correct row LIMIT exists...
-   ✅ Lowering row LIMIT to 5...
+   ✅ Lowering row LIMIT to 10...
 ✅ Checking JOINed tables and conditions...
 ✅ Checking required WHERE conditions...
 ✅ Ensuring query is constrained to requester's identity...
 ✅ Allowlisting SQL functions...
+   ✅ strftime
+   ✅ SUM
 ```
 
-| Title           | Rental Date             | Return Date             | Rental Days |
-| --------------- | ----------------------- | ----------------------- | ----------- |
-| OUTLAW HANKY    | 2005-08-19 05:48:12.000 | 2005-08-28 10:10:12.000 | 9.181944    |
-| BOULEVARD MOB   | 2005-08-19 07:06:51.000 | 2005-08-28 10:35:51.000 | 9.145139    |
-| MINDS TRUMAN    | 2005-08-02 17:42:49.000 | 2005-08-11 18:14:49.000 | 9.022222    |
-| AMERICAN CIRCUS | 2005-07-12 16:37:55.000 | 2005-07-21 16:04:55.000 | 8.977083    |
-| LADY STAGE      | 2005-07-28 10:07:04.000 | 2005-08-06 08:16:04.000 | 8.922917    |
-
-You can safely run this example here:
-
-[![Open in GitHub Codespaces](https://img.shields.io/badge/Open%20in-Codespaces-purple.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=656570421)
-
-or [view the read-only notebook](./notebooks/demo.ipynb)
-
-# 📋 Explanation
-
-So, what is actually happening above?
-
-1. Unsafe free-form input is provided, presumably from some front end user interface.
-1. That unsafe input is wrapped in a prompt envelope, producing a prompt with additional
-   context to help an LLM produce a correct query.
-1. The unsafe prompt is sent to an LLM of your choice, which then produces an unsafe
-   SQL query.
-1. The LLM response is parsed by a strict grammar which defines only the SQL features
-   that are allowed.
-1. If parsing succeeds, we know at the very least we're dealing with a valid SQL query
-   albeit an untrusted one.
-1. Different features of the parsed query are extracted for validation.
-1. A soft validation pass is performed on the extracted features, and we potentially
-   modify the query to be compliant, for example, to add a `LIMIT` clause, or to remove
-   disallowed columns.
-1. A hard validation pass is performed with your custom constraints to ensure that the
-   query is only accessing allowed tables, columns, and functions, while containing
-   required conditions.
-1. If validation succeeds, the resulting SQL query can then be sent to the database.
-1. If validation fails, you'll see a helpful exception explaining exactly why.
+The validated query can then be executed:
+
+| month   | total_amount |
+| ------- | ------------ |
+| 2005-05 | 4.99         |
+| 2005-06 | 22.95        |
+| 2005-07 | 100.78       |
+| 2005-08 | 87.82        |
+
+Want to get started quickly? See the
+[quickstart](https://docs.heimdallm.ai/en/latest/quickstart/index.html).
 
 # 🥽 Safety
 
@@ -94,7 +84,7 @@ me](https://github.com/sponsors/amoffat) or [inquire about interest in a commerc
 license](https://forms.gle/frEPeeJx81Cmwva78).
 
 To understand some of the potential vulnerabilities, take a look at the [attack
-surface](https://docs.heimdallm.ai/en/latest/attack_surface.html) to see the risks and
+surface](https://docs.heimdallm.ai/en/latest/attack-surface.html) to see the risks and
 the mitigations.
 
 # 📚 Database support

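A note on the README example above: the validated query keeps `:customer_id` as a named placeholder, so the requester's id is bound by the application at execution time rather than written by the LLM. Below is a minimal sketch of that last step using Python's standard `sqlite3` module (the README's example data comes from the SQLite Sakila sample). The three tables are a drastically simplified, hypothetical stand-in for Sakila, the rows are made up for illustration, and `build_demo_db` and `spending_by_month` are hypothetical helper names.

```python
import sqlite3

# The validated query from the README, verbatim. ``:customer_id`` is a named
# placeholder that the application binds; the LLM never chooses the value.
VALIDATED_SQL = """
SELECT
   strftime('%Y-%m', payment.payment_date) AS month,
   SUM(payment.amount) AS total_amount
FROM payment
JOIN rental ON payment.rental_id=rental.rental_id
JOIN customer ON payment.customer_id=customer.customer_id
WHERE customer.customer_id=:customer_id
GROUP BY month
LIMIT 10;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical stand-in for the Sakila tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE rental (rental_id INTEGER PRIMARY KEY);
        CREATE TABLE payment (
            payment_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            rental_id INTEGER REFERENCES rental(rental_id),
            amount REAL,
            payment_date TEXT
        );
        INSERT INTO customer VALUES (1), (2);
        INSERT INTO rental VALUES (1), (2), (3), (4);
        INSERT INTO payment VALUES
            (1, 1, 1, 2.99, '2005-05-25 11:30:37'),
            (2, 1, 2, 0.99, '2005-05-28 10:35:23'),
            (3, 1, 3, 5.99, '2005-06-15 00:54:12'),
            (4, 2, 4, 9.99, '2005-06-16 15:18:57');
        """
    )
    return conn


def spending_by_month(conn: sqlite3.Connection, customer_id: int) -> dict:
    """Run the validated query for one requester; map month -> rounded total."""
    # Bind the id from a trusted source (e.g. the authenticated session),
    # never from the LLM output or the user's free-form request.
    rows = conn.execute(VALIDATED_SQL, {"customer_id": customer_id}).fetchall()
    return {month: round(total, 2) for month, total in rows}


if __name__ == "__main__":
    print(spending_by_month(build_demo_db(), 1))
```

Customer 2's payment never appears in customer 1's totals because the `WHERE` clause that the validator insisted on is bound to whatever id the caller passes.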
diff --git a/docs/source/api/abc/index.rst b/docs/source/api/abc/index.rst
@@ -9,7 +9,7 @@ intended for direct use.
     bifrost
     envelope
     validator
-    llm_integration
+    llm-integration
     context
 
     sql/index
diff --git a/docs/source/api/abc/llm_integration.rst b/docs/source/api/abc/llm-integration.rst
rename from docs/source/api/abc/llm_integration.rst
rename to docs/source/api/abc/llm-integration.rst
diff --git a/docs/source/api/bifrosts/index.rst b/docs/source/api/bifrosts/index.rst
@@ -1,8 +1,8 @@
 Bifrosts
 ========
 
-:doc:`Bifrosts </bifrost>` are the fundamental unit of translating untrusted input into
-trusted output. This document will expand as we add more Bifrosts.
+:doc:`Bifrosts </architecture/bifrost>` are the fundamental unit of translating
+untrusted input into trusted output. This document will expand as we add more Bifrosts.
 
 .. toctree::
 

diff --git a/docs/source/api/bifrosts/sql/index.rst b/docs/source/api/bifrosts/sql/index.rst
@@ -6,9 +6,11 @@ database, please participate in `this poll.
 <https://github.com/amoffat/HeimdaLLM/discussions/2>`_
 
 .. toctree::
+    :maxdepth: 3
 
     sqlite/index
     mysql/index
+    postgres/index
     exceptions
     common
 
diff --git a/docs/source/api/bifrosts/sql/postgres/index.rst b/docs/source/api/bifrosts/sql/postgres/index.rst
@@ -0,0 +1,6 @@
+Postgres
+========
+
+.. toctree::
+
+    select/index
diff --git a/docs/source/api/bifrosts/sql/postgres/select/bifrost.rst b/docs/source/api/bifrosts/sql/postgres/select/bifrost.rst
@@ -0,0 +1,13 @@
+SQL Select Bifrost
+==================
+
+The SQL Select Bifrost produces a trusted SQL Select statement. It uses the following
+components:
+
+* :class:`SQLPromptEnvelope <heimdallm.bifrosts.sql.postgres.select.envelope.PromptEnvelope>`
+* :class:`SQLConstraintValidator <heimdallm.bifrosts.sql.postgres.select.validator.ConstraintValidator>`
+* `Grammar <https://github.com/amoffat/HeimdaLLM/blob/dev/heimdallm/bifrosts/sql/postgres/select/grammar.lark>`_
+
+.. autoclass:: heimdallm.bifrosts.sql.postgres.select.bifrost.Bifrost
+    :members:
+    :inherited-members:
diff --git a/docs/source/api/bifrosts/sql/postgres/select/envelope.rst b/docs/source/api/bifrosts/sql/postgres/select/envelope.rst
@@ -0,0 +1,12 @@
+SQL Select Envelope
+===================
+
+.. CAUTION::
+
+    The ``db_schema`` argument of the constructor is passed to the LLM. This is how the
+    LLM knows how to construct the query. If this concerns you, limit the information
+    that you include in the schema.
+
+.. autoclass:: heimdallm.bifrosts.sql.postgres.select.envelope.PromptEnvelope
+    :members:
+    :inherited-members:
diff --git a/docs/source/api/bifrosts/sql/postgres/select/index.rst b/docs/source/api/bifrosts/sql/postgres/select/index.rst
@@ -0,0 +1,8 @@
+Select
+======
+
+.. toctree::
+
+    bifrost
+    envelope
+    validator
diff --git a/docs/source/api/bifrosts/sql/postgres/select/validator.rst b/docs/source/api/bifrosts/sql/postgres/select/validator.rst
@@ -0,0 +1,8 @@
+SQL Select Validator
+====================
+
+.. autoclass:: heimdallm.bifrosts.sql.postgres.select.validator.ConstraintValidator
+    :members:
+    :inherited-members:
+
+
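The Postgres pages above document a `ConstraintValidator`. To illustrate two of the hard checks listed in the README (constraining the query to the requester's identity, and allowlisting SQL functions), here is a minimal sketch that operates on features assumed to have already been extracted by a parser. `QueryFeatures`, `ValidationError`, `validate`, and the allowlist contents are hypothetical and are not HeimdaLLM's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFeatures:
    """Hypothetical features a parser might extract from a SELECT."""

    # Lower-cased names of every SQL function used anywhere in the query.
    functions: frozenset[str]
    # (column, placeholder) equalities ANDed at the top level of WHERE.
    # Equalities inside an OR branch must not be counted here, because
    # "... OR 1=1" would otherwise defeat the constraint.
    where_equalities: frozenset[tuple[str, str]]


class ValidationError(Exception):
    """Raised when a query violates a hard constraint."""


# Hypothetical policy values, modeled on the README example.
ALLOWED_FUNCTIONS = frozenset({"strftime", "sum"})
IDENTITY = ("customer.customer_id", "customer_id")


def validate(features: QueryFeatures) -> None:
    """Raise ValidationError unless every hard check passes."""
    if IDENTITY not in features.where_equalities:
        raise ValidationError("query is not constrained to the requester's identity")
    forbidden = features.functions - ALLOWED_FUNCTIONS
    if forbidden:
        raise ValidationError(f"forbidden functions: {sorted(forbidden)}")
```

Unlike the column and `LIMIT` edits, these checks cannot be fixed up automatically: a missing identity constraint or a forbidden function means the query is rejected.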
diff --git a/docs/source/api/index.rst b/docs/source/api/index.rst
@@ -7,5 +7,5 @@
     :maxdepth: 4
 
     bifrosts/index
-    llm_providers/index
+    llm-providers/index
     abc/index
diff --git a/docs/source/api/llm_providers/index.rst b/docs/source/api/llm-providers/index.rst
rename from docs/source/api/llm_providers/index.rst
rename to docs/source/api/llm-providers/index.rst
diff --git a/...ce/api/llm_providers/providers/openai.rst b/...ce/api/llm-providers/providers/openai.rst
rename from ...ce/api/llm_providers/providers/openai.rst
rename to ...ce/api/llm-providers/providers/openai.rst
diff --git a/docs/source/bifrost.rst b/docs/source/architecture/bifrost.rst
rename from docs/source/bifrost.rst
rename to docs/source/architecture/bifrost.rst
@@ -1,6 +1,11 @@
 🌈 Bifrost
 ==========
 
+.. DANGER::
+
+    Constructing Bifrosts manually is an advanced topic. Most of the time, you want to
+    use :meth:`Bifrost.validation_only <heimdallm.bifrost.Bifrost.validation_only>`.
+
 The Bifrost is the key technology that enables the translation of untrusted human input
 into trusted machine-readable input. It is composed of 4 parts:
 
@@ -42,7 +47,7 @@ out the structured data from the delimiters that you instructed the LLM to use.
 **********************
 
 The :term:`LLM <LLM>` itself is the brains of the Bifrost. We view it as a black box
-with a :doc:`well-defined interface </api/abc/llm_integration>`. Because of this,
+with a :doc:`well-defined interface </api/abc/llm-integration>`. Because of this,
 HeimdaLLM aims to make it easy to swap out LLMs in your Bifrost, so that as LLM
 capabilities and prices change, your system can adapt to use them with minimal effort.
 
@@ -51,7 +56,7 @@ Current LLM integrations:
 .. toctree::
     :maxdepth: 2
 
-    api/llm_providers/index
+    /api/llm-providers/index
 
 📜 The grammar
 **************

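The `bifrost.rst` changes above describe the LLM as a black box behind a well-defined interface, so that LLMs can be swapped with minimal effort, and the envelope as the piece that pulls the structured data out of the delimiters the LLM was told to use. A minimal sketch of that split, assuming a hypothetical `LLMIntegration` protocol with a single `complete` method, hypothetical `wrap`/`unwrap` helpers, and made-up delimiters. HeimdaLLM's real abstract classes are in its `llm-integration` and envelope API docs and may look different.

```python
import re
from typing import Protocol

# Hypothetical delimiters; a real envelope would choose its own.
START, END = "<<<SQL", "SQL>>>"


class LLMIntegration(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


class CannedLLM:
    """A stand-in "LLM" for offline testing that returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def wrap(untrusted_input: str, db_schema: str) -> str:
    """Add context (the schema) and instruct the model to use delimiters."""
    return (
        f"Schema:\n{db_schema}\n\n"
        f"Write one SQLite SELECT statement for this request, "
        f"enclosed between {START} and {END}:\n{untrusted_input}"
    )


def unwrap(llm_output: str) -> str:
    """Extract only the delimited structured data, ignoring any chatter."""
    pattern = re.escape(START) + r"(.*?)" + re.escape(END)
    match = re.search(pattern, llm_output, re.DOTALL)
    if match is None:
        raise ValueError("LLM output did not contain delimited SQL")
    return match.group(1).strip()


def untrusted_sql(llm: LLMIntegration, user_input: str, db_schema: str) -> str:
    """Envelope -> LLM -> envelope. The result is still untrusted."""
    return unwrap(llm.complete(wrap(user_input, db_schema)))
```

Swapping providers means passing any other object with a `complete` method. The extracted SQL is still untrusted; in a Bifrost it would next be parsed by the grammar and checked by the constraint validator.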
diff --git a/docs/source/architecture/index.rst b/docs/source/architecture/index.rst
@@ -0,0 +1,7 @@
+📐 Architecture
+===============
+
+.. toctree::
+    :maxdepth: 4
+
+    bifrost
diff --git a/docs/source/attack-surface/index.rst b/docs/source/attack-surface/index.rst
@@ -0,0 +1,14 @@
+🛡️ Attack Surface
+=================
+
+:term:`LLMs <LLM>` are vulnerable to :term:`prompt injection` attacks, which can be used
+to construct responses that are dangerous to the system. This is the primary reason that
+LLMs have not seen widespread adoption as :term:`externalized <externalizing>` products.
+
+Prompt injection can have different consequences for different types of structured
+outputs.
+
+.. toctree::
+    :glob:
+
+    sql
diff --git a/docs/source/attack_surface/sql.rst b/docs/source/attack-surface/sql.rst
rename from docs/source/attack_surface/sql.rst
rename to docs/source/attack-surface/sql.rst
diff --git a/docs/source/attack_surface.rst b/docs/source/attack_surface.rst
diff --git a/docs/source/blog/posts/safe-sql-execution.rst b/docs/source/blog/posts/safe-sql-execution.rst
@@ -359,23 +359,23 @@ authoritative id.
 
 
 
-🧠 Constraint validation
-------------------------
+🧠 Static analysis
+------------------
 
 .. figure:: /images/smiley.jpg
 
-    "Where did it come from? What's the access?"
+    "What's the access?"
 
 
-Constraint validation uses a real grammar to parse SQL queries into an AST. Static
-analysis can then be performed on this parse tree to determine which tables and columns
-are being used, how they're being used, if required conditions are present, and a range
-of other features.
+Static analysis uses a real grammar to parse SQL queries into an AST, which can then be
+analyzed to determine which tables and columns are being used, how they're being used,
+if required conditions are present, and a range of other features.
 
-Additionally, these frameworks may automatically add nodes or replace nodes on the AST
-to help ensure the SQL query conforms to constraint validation. In other words, the
-query may be automatically edited to be compliant. Examples of this are to ensure a
-correct ``LIMIT`` on the query, or remove a forbidden column from the ``SELECT``.
+Additionally, these static analysis frameworks may automatically add or replace nodes
+on the AST to help ensure that the SQL query conforms to constraints. In other words,
+the query may be automatically edited to be compliant. Examples include ensuring a
+correct ``LIMIT`` on the query and removing a forbidden column from the
+``SELECT``.
 
 These frameworks can be treated as denylists or allowlists. You can list which tables,
 columns, joins, and functions are allowed, or which are denied. This allows for a higher
@@ -423,10 +423,10 @@ are playing an increasing role in the future of UI and UX, and relational databa
 not going away any time soon. For them to work together effectively, tooling needs to
 bridge the gap to make them safer.
 
-The most promising solutions are cloned databases and constraint validators, because
-they are theoretically complete solutions that can offer the highest levels of security.
-They vary primarily in their complexity and flexibility: cloned databases views are a
-high-complexity allowlist, while constraint validators are a low-complexity allowlist or
+The most promising solutions are cloned databases and static analysis, because they are
+theoretically complete solutions that can offer the highest levels of security. They
+vary primarily in their complexity and flexibility: cloned database views are a
+high-complexity allowlist, while static analysis is a low-complexity allowlist or
 denylist.
 
 Other, non-complete solutions should not be considered if you value the safety of your

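The blog post's static-analysis section describes parsing SQL into an AST, then adding or replacing nodes so the query complies, with rules expressed as either an allowlist or a denylist. A toy sketch of that editing step, assuming the query has already been parsed into a hypothetical, heavily simplified `Select` node (a real framework builds its tree from a full SQL grammar). `Column`, `Select`, `ColumnPolicy`, `enforce`, and `to_sql` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    table: str
    name: str


@dataclass
class Select:
    """A hypothetical, heavily simplified SELECT node."""

    columns: list[Column]
    tables: list[str]
    limit: Optional[int] = None


@dataclass
class ColumnPolicy:
    """Allowlist mode permits only listed columns; denylist mode permits all others."""

    columns: set[tuple[str, str]]
    mode: str = "allow"  # "allow" or "deny"

    def permits(self, col: Column) -> bool:
        listed = (col.table, col.name) in self.columns
        return listed if self.mode == "allow" else not listed


def enforce(query: Select, policy: ColumnPolicy, max_limit: int) -> list[str]:
    """Edit the AST in place so it complies; return a log of the edits."""
    log = []
    kept = [c for c in query.columns if policy.permits(c)]
    if len(kept) < len(query.columns):
        log.append(f"removed {len(query.columns) - len(kept)} forbidden column(s)")
    if not kept:
        raise ValueError("no permitted columns remain")
    query.columns = kept  # replace the column-list node
    if query.limit is None or query.limit > max_limit:
        log.append(f"set LIMIT to {max_limit}")
        query.limit = max_limit  # add or lower the LIMIT node
    return log


def to_sql(query: Select) -> str:
    """Render the toy AST back to SQL text."""
    cols = ", ".join(f"{c.table}.{c.name}" for c in query.columns)
    return f"SELECT {cols} FROM {', '.join(query.tables)} LIMIT {query.limit}"
```

The two modes fail differently: an allowlist drops any column nobody thought to list, while a denylist lets it through. That is why an allowlist is generally the more conservative choice.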
diff --git a/docs/source/faq.rst b/docs/source/faq.rst
@@ -25,11 +25,19 @@ What databases are supported?
 
 * Sqlite
 * MySQL
+* Postgres
 
 There is rapid development for the other top relational SQL databases. To help us
 prioritize, please `vote here <https://github.com/amoffat/HeimdaLLM/discussions/2>`_ on
 which database you would like to see supported:
 
+Does HeimdaLLM use an LLM?
+**************************
+
+For static analysis, no, it does not. It uses good old-fashioned grammars and parsers.
+However, we do include a lightweight framework to build a complete
+natural-language-to-safe-SQL workflow. See :doc:`this quickstart </quickstart/llm>`.
+
 Do I need to purchase a commercial license?
 *******************************************