An index to use with Large Language Models (LLMs) to answer questions about Metaflow documentation.
First, a warning: PyTorch is required :) You can change the code to use OpenAI embeddings instead, but that will incur costs.
Create a new virtual environment, then install the requirements:
pip install -r requirements.txt
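Because PyTorch is required, it can help to confirm that the new environment actually has it before building an index. The sketch below is not part of this repository. It uses only the standard library's `importlib.util.find_spec` to report which top-level packages cannot be found. The package list is an assumption; extend it with anything else requirements.txt pulls in.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "torch" is the import name of PyTorch; the list is an assumption, extend as needed.
    missing = missing_packages(["torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```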
An index file and a vector collection must exist before you can ask questions. You can run query_metaflow.py to create an index, or you can download an index file here and a vector collection here.
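Before querying, you can check that both artifacts are where you expect them. This is a minimal sketch, not code from the repository: the paths `index.json` and `vector_collection` are hypothetical placeholders, so substitute the locations query_metaflow.py actually reads from.

```python
from pathlib import Path

# Hypothetical locations; the real names depend on how query_metaflow.py is configured.
INDEX_FILE = Path("index.json")
VECTOR_COLLECTION = Path("vector_collection")


def missing_artifacts(paths):
    """Return the paths that do not exist yet, i.e. what still has to be created or downloaded."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    missing = missing_artifacts([INDEX_FILE, VECTOR_COLLECTION])
    if missing:
        print("Run query_metaflow.py or download:", ", ".join(str(p) for p in missing))
    else:
        print("Index file and vector collection found.")
```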
Note that the preloaded index file above contains only a subset of the Metaflow repository. You can edit query_metaflow.py so that it does not exclude any folders. It may also be interesting to add the Metaflow implementation repository to the index.
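How query_metaflow.py excludes folders is not shown here, so the following is only an illustration of the idea. It assumes a hypothetical `EXCLUDED_FOLDERS` tuple and a hypothetical `should_index` helper; the folder names are placeholders. Emptying the exclusion list means every file is indexed.

```python
from pathlib import PurePosixPath

# Hypothetical exclusion list; the real one (if any) lives in query_metaflow.py.
EXCLUDED_FOLDERS = ("docs/internal", "static")


def should_index(path, excluded=EXCLUDED_FOLDERS):
    """Return True unless `path` lies inside one of the excluded folders."""
    parts = PurePosixPath(path).parts
    for folder in excluded:
        folder_parts = PurePosixPath(folder).parts
        # Compare whole path components, so "static" does not match "staticfiles".
        if parts[: len(folder_parts)] == folder_parts:
            return False
    return True
```

Calling `should_index(path, excluded=())` indexes everything, which corresponds to not excluding any folders.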
Usage requires at least an OpenAI API key in your environment variables. If you want to recreate the index, a GitHub API key is needed as well (with at least read access to public repositories). With the current configuration, creating an index uses free Hugging Face models, but asking questions consumes OpenAI quota.
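A quick preflight check can tell you which keys are still missing for the task you want to run. This is a sketch under stated assumptions: `OPENAI_API_KEY` is the variable the OpenAI Python library reads by default, but `GITHUB_TOKEN` is a hypothetical name, so check query_metaflow.py for the variable it actually expects.

```python
import os

# OPENAI_API_KEY is the default variable read by the OpenAI Python library.
# GITHUB_TOKEN is an assumption; the script may use a different name.
QUERY_VARS = ["OPENAI_API_KEY"]
REINDEX_VARS = ["GITHUB_TOKEN"]


def missing_env_vars(rebuild_index=False, environ=os.environ):
    """List required environment variables that are unset or empty for the chosen task."""
    required = QUERY_VARS + (REINDEX_VARS if rebuild_index else [])
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    print("Missing for queries:", missing_env_vars() or "none")
    print("Missing for re-indexing:", missing_env_vars(rebuild_index=True) or "none")
```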
To get more relevant answers, increase the n_sources parameter (the default is 2), for example:
python query_metaflow.py -n 3
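In a retrieval setup like this one, n_sources most likely sets how many of the most similar chunks are pulled from the vector collection and handed to the LLM as context. The sketch below illustrates that top-k idea generically with toy vectors and cosine similarity; it is not the project's code, and `top_sources` is a hypothetical helper. Real embeddings would come from the embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_sources(query_vec, chunks, n_sources=2):
    """Return the n_sources chunk texts most similar to the query, best first.

    `chunks` is a list of (text, vector) pairs; in the real tool these would
    come from the vector collection (assumption).
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:n_sources]]


# Toy data for illustration only.
chunks = [
    ("FlowSpec basics", [1.0, 0.0, 0.0]),
    ("The @step decorator", [0.9, 0.1, 0.0]),
    ("Deploying to AWS", [0.0, 1.0, 0.0]),
    ("Parameters", [0.5, 0.0, 0.5]),
]
print(top_sources([1.0, 0.0, 0.0], chunks, n_sources=3))
```

Raising n_sources gives the model more context to draw on, but it will generally also increase the LLM token usage per query.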
(Using Davinci)
Enter query: Write an example of a simple flow in Python
INFO:root:> [query] Total LLM token usage: 2233 tokens
INFO:root:> [query] Total embedding token usage: 9 tokens
The following is an example of a simple flow in Python using the FlowSpec class:
from metaflow import FlowSpec, step


class SimpleFlow(FlowSpec):
    # Define parameters and data triggers here

    @step
    def start(self):
        # perform some action
        self.next(self.middle)

    @step
    def middle(self):
        # perform some action
        self.next(self.end)

    @step
    def end(self):
        # perform some action
        pass


if __name__ == '__main__':
    SimpleFlow()
The FlowSpec class is the base class for all Metaflow flows, and it exposes a few methods and attributes that you can use to construct a flow. This includes the @step decorator, which is used to define the starting and ending points of the flow, and the next method, which is used to move from one step to the next. Parameter objects and data triggers can also be used to define the flow. Metaflow infers a directed (typically acy