v0.0.1 RC

kindsys · Mar 7, 2024 · 6ac0873 · 6ac0873
commit 6ac0873
Show file tree

Hide file tree

Showing 51 changed files with 4,056 additions and 0 deletions.
diff --git a/.env.example b/.env.example
@@ -0,0 +1,135 @@
+#-------------------------------------------------------------------------------
+# LLM APIs settings 
+#-------------------------------------------------------------------------------
+# NOTE: 
+# - OLAW can use both Open AI and Ollama at the same time, but needs at least one of the two.
+# - Ollama is one of the simplest ways to get started running models locally: https://ollama.ai/
+OLLAMA_API_URL="http://localhost:11434"
+
+#OPENAI_API_KEY="" 
+#OPENAI_ORG_ID=""
+
+# NOTE: OPENAI_BASE_URL can be used to interact with OpenAI-compatible providers.
+# For example:
+# - https://huggingface.co/blog/tgi-messages-api
+# - https://docs.vllm.ai/en/latest/getting_started/quickstart.html#using-openai-completions-api-with-vllm
+# Make sure to specify both OPENAI_BASE_URL and OPENAI_COMPATIBLE_MODEL when doing so.
+#OPENAI_BASE_URL=""
+#OPENAI_COMPATIBLE_MODEL=""
+
+#-------------------------------------------------------------------------------
+# Basic Rate Limiting
+#-------------------------------------------------------------------------------
+# NOTE:
+# - This set of variables allows for applying rate-limiting to individual API routes. 
+# - See https://flask-limiter.readthedocs.io/en/stable/ for details and syntax.
+RATE_LIMIT_STORAGE_URI="memory://"
+API_MODELS_RATE_LIMIT="1/second"
+API_EXTRACT_SEARCH_STATEMENT_RATE_LIMIT="60 per 1 hour"
+API_SEARCH_RATE_LIMIT="120 per 1 hour"
+API_COMPLETE_RATE_LIMIT="60 per 1 hour"
+
+#-------------------------------------------------------------------------------
+# Court Listener API settings
+#-------------------------------------------------------------------------------
+# NOTE: The chatbot can make calls to the Court Listener API to pull relevant court opinions.
+COURT_LISTENER_MAX_RESULTS=4 # NOTE: To be adjusted based on the context lenght of the model used for inference.
+COURT_LISTENER_API_URL="https://www.courtlistener.com/api/rest/v3/"
+COURT_LISTENER_BASE_URL="https://www.courtlistener.com"
+
+#-------------------------------------------------------------------------------
+# Extract Search Statement Prompt
+#-------------------------------------------------------------------------------
+# NOTE: This prompt is used to identify a legal question and make it into a search statement.
+EXTRACT_SEARCH_STATEMENT_PROMPT="
+Identify whether there is a legal question in the following message and, if so, transform it into a search statement. 
+
+If the legal question can be answered by searching case law: 
+- Follow the COURTLISTENER instructions to generate a search statment
+
+If there are multiple questions, only consider the last one.
+
+---
+
+COURTLISTENER instructions:
+Here are instructions on how to generate an effective search statement for that platform.
+
+## Keywords
+Identify and extract keywords from the question. If a term can be both singular or plural, use both (i.e: \"pony\" and \"ponies\").
+Use quotation marks around proper nouns and terms that should not be broken up.
+
+## Logical connectors
+Separate the different keywords and parts of the search statement with logical connectors such as AND, OR, NOT.
+
+## Dates and date ranges
+If a date (or element of a date) is present in the question, you can add it to the search statement as such to define a range: 
+dateFiled:[YYYY-MM-DD TO YYYY-MM-DD]
+
+If only the year is present, set MM and DD to 01 and 01.
+If only the start year is present, assume the end date is the last day of that year.
+Do not wrap dateField statement in parentheses.
+
+## Name of cases
+If the question features the name of a case, you can add it to the search statement as such:
+caseName:(\"name of a case\")
+
+Tip to recognize case names: they often feature v. or vs. As in: \"Roe v. Wade\".
+
+## Name of court, state or jurisdiction
+If the question features the name of a court or of a state, you can add it to the search statement as such:
+court:(\"name of a court, state or jurisdiction\")
+
+## Excluded terms
+The following terms do not help make good search statements and MUST NOT be present in the search statement: law, laws, case, cases, precedent, precedents, adjudicated.
+
+## Other fields available
+dateFiled, caseName and court are the only fields you should use. Do not invent other fields. Everything else is a search term.
+
+---
+
+Return your response as a JSON object containing the following keys:
+- search_statement: String representing the generated search statement. Is empty if the text does not contain a legal question.
+- search_target: String representing the target API for that search statement. Can be \"courtlistener\" or empty.
+
+Here is the message you need to analyze: 
+"
+
+#-------------------------------------------------------------------------------
+# Text Completion Prompts
+#-------------------------------------------------------------------------------
+# NOTE: {history} {rag} and {request} are reserved keywords.
+TEXT_COMPLETION_BASE_PROMPT = "
+{history}
+
+You are a helpful and friendly AI legal assistant.
+Your explanation of legal concepts should be easy to understand while still being accurate and detailed. Explain any legal jargon, and do not assume knowledge of any related concepts.
+
+{rag}
+
+Request: {request}
+
+Helpful response (plain text, no markdown): 
+"
+
+# NOTE: Injected into BASE prompt when relevant.
+# Inspired by LangChain's default RAG prompt.
+# {context} is a reserved keyword.
+TEXT_COMPLETION_RAG_PROMPT = "
+Here is context to help you fulfill the user's request:
+{context}
+----------------
+When possible, use context to answer the request from the user.
+Ignore context if it is empty or irrelevant.
+If you don't know the answer, just say that you don't know, don't try to make up an answer. 
+Cite and quote your sources whenever possible. Use their number (for example: [1]) to reference them.
+
+"
+
+# NOTE: Injected into BASE prompt when relevant.
+# NOTE: {history} is a reserved keyword
+TEXT_COMPLETION_HISTORY_PROMPT = "
+Here is a summary of the conversation thus far:
+{history}
+----------------
+
+"
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+.env.example linguist-language=Shell
+*.env.example linguist-language=Shell
diff --git a/.github/screenshots/idle-01.png b/.github/screenshots/idle-01.png
diff --git a/.github/screenshots/inspect-01.png b/.github/screenshots/inspect-01.png
diff --git a/.github/screenshots/inspect-02.png b/.github/screenshots/inspect-02.png
diff --git a/.github/screenshots/question-01.png b/.github/screenshots/question-01.png
diff --git a/.github/screenshots/question-02.png b/.github/screenshots/question-02.png
diff --git a/.github/screenshots/question-03.png b/.github/screenshots/question-03.png
diff --git a/.github/screenshots/question-04.png b/.github/screenshots/question-04.png
diff --git a/.github/screenshots/question-05.png b/.github/screenshots/question-05.png
diff --git a/.github/screenshots/question-06.png b/.github/screenshots/question-06.png
diff --git a/.github/screenshots/settings-01.png b/.github/screenshots/settings-01.png
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,9 @@
+.env
+*.pyc
+.DS_Store
+chromadb/
+test.py
+TODO.md
+runs/
+_*/
+*.zip
diff --git a/.vscode/extensions.json b/.vscode/extensions.json
@@ -0,0 +1,8 @@
+{
+  "recommendations": [
+    "ms-python.black-formatter",
+    "ms-python.flake8",
+    "tobermory.es6-string-html",
+    "standard.vscode-standard"
+  ]
+}
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -0,0 +1,6 @@
+{
+  "editor.formatOnSave": true,
+  "python.formatting.provider": "black",
+  "standard.enable": false,
+  "standard.autoFixOnSave": true
+}
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,10 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+  - family-names: Cargnelutti
+    given-names: Matteo
+  - family-names: Cushman
+    given-names: Jack
+title: "Open Legal AI Workbench (OLAW)"
+version: 0.0.1
+date-released: 2024-03-06
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Harvard Library Innovation Laboratory
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,2 @@
		.env.example linguist-language=Shell
		*.env.example linguist-language=Shell