Prompting
practices
build
base LLMs and instruction-tuned LLMs
Unicorn-> magical forest
the capital of France->France's largest city
the capital of France->Paris
RLHF reinforcement learning from human feedback
Example:Something about Alan Turing
- Principle 1: Write clear and specific instructions
- Principle 2: Give the model time to “think”
- Limit the number of words/sentences/characters.
- Ask it to focus on the aspects that are relevant to the intended audience.
- Ask it to extract information and organize it in a table.
- Summaries include topics that are not related to the topic of focus.
multiple tasks
topic
问题要:公开公平公正
Prompt
supervised learning
input-output :context-completion
next word
prompting lollipop 拆分
system, user, and assistant
System
User
Assistant
temperature
tokens use calculate - API key
traditional supervised machine learning workflow:six months or a year to build
ChatGPT :build in minutes or hours
delimiter :hashtags #
是否违反策略
OpenAI Moderation API
Avoid prompt injection:
1.delimiters and clear instructions
2.use additional prompt
string replace function
可以算数
complex tasks : simpler subtasks
chaining multiple prompts
Workflow
helper functions
string format - JSON - list
text embeddings
moderation API
latency ,additional call , additional tokens
多个步骤 如果有合适的答案应及早终结
Few-shot
one-shot
extra junk
summarize articles
good answer or not
different dimensions
BLEU score:bilingual evaluation understudy,即:双语互译质量评估辅助工具。专业的就是好。s
deal (expert) human written
evals framework
building LLM applications
composition and modularity
- Direct API calls to OpenAI
- API calls through LangChain:
- Prompts
- Models
- Output parsers
Type - text
Prompt templates
Output Parsers : JSON JSON方便结构化、解析数据
History
ConversationBufferMemory
ConversationBufferWindowMemory
Context
verbose=False
ConversationTokenBufferMemory
ConversationSummaryMemory:从简
Memory & database
- LLMChain
- Sequential Chains
- SimpleSequentialChain
- SequentialChain
- Router Chain
Embeddings vector
Vector database
处理document,先返回最合适的部分,再用gpt返回答案。
- Example generation
- Manual evaluation (and debuging)
- LLM-assisted evaluation
- LangChain evaluation platform
- Using built in LangChain tools: DuckDuckGo search and Wikipedia
- Defining your own tools
Do some complex task
80+
RecursiveCharacterTextSplitter
TokenTextSplitter
MarkdownHeaderTextSplitter
retrieval augmented generation (RAG)
Vectorstores
Similarity Search
Failure modes
Maximum marginal relevance
working with metadata
compression:LLMChainExtractor
RetrievalQA Chain
LangChain plus platform
- Map_reduce chain
- Refine chain
- Map_rerank chain
Memory
ConversationalRetrievalChain
- LLM's don't always produce the same results. The results you see in this notebook may differ from the results you see in the video.
- Notebooks results are temporary. Download the notebooks to your local machine if you wish to save your results.
Bind
Fallbacks
Pydantic - JSON validation
Linux Expression Syntax
标签 - 抽取
Extraction is similar to tagging, but used for extracting multiple pieces of information.
目的都是JSON parser
Tools : search、math、call api
Routing就是Multi Tools的ifelse
OpenAIFunctionsAgentOutputParser
AgentExecutor
ConversationBufferMemory
chatbot
meta-llama BasicModelRunner
Pytorch
Huggingface
meta-Llama VS ChatGPT
Tokenizing data
transformers - AutoTokenizer
Tokenize
Padding and truncation
Prepare instruction dataset
Tokenize the instruction dataset
Prepare test/train splits
- Choose the base model.
- Load data.
- Train it. Returns a model ID, dashboard, and playground interface.
Setup a really basic evaluation function
Evaluate all the data
Try the ARC benchmark
Word Embeddings
Sentence Embeddings
Article Embeddings
Vector Database for semantic Search
Dense Retrieval
Improving Keyword Search with ReRank
Here we are using an Inference Endpoint for the shleifer/distilbart-cnn-12-6
, a 306M parameter distilled model from facebook/bart-large-cnn
.
summarization
demo.launch(share=True)
Title - description
We are using this Inference Endpoint for dslim/bert-base-NER
, a 108M parameter fine-tuned BART model on the NER task.
Named entity
gr.close_all() 、 demo.close()
Here we'll be using an Inference Endpoint for Salesforce/blip-image-captioning-base
a 14M parameter captioning model.
Image-to-text
图像转文字描述
Here we are going to run runwayml/stable-diffusion-v1-5
using the 🧨 diffusers
library.
text-to-image
生成图像
with gr.Blocks() as demo:
gr.Markdown("# Image Generation with Stable Diffusion")
with gr.Row():
with gr.Column(scale=4):
prompt = gr.Textbox(label="Your prompt") #Give prompt some real estate
with gr.Column(scale=1, min_width=50):
btn = gr.Button("Submit") #Submit button side by side!
with gr.Accordion("Advanced options", open=False): #Let's hide the advanced options!
negative_prompt = gr.Textbox(label="Negative prompt")
with gr.Row():
with gr.Column():
steps = gr.Slider(label="Inference Steps", minimum=1, maximum=100, value=25,
info="In many steps will the denoiser denoise the image?")
guidance = gr.Slider(label="Guidance Scale", minimum=1, maximum=20, value=7,
info="Controls how much the text prompt influences the result")
with gr.Column():
width = gr.Slider(label="Width", minimum=64, maximum=512, step=64, value=512)
height = gr.Slider(label="Height", minimum=64, maximum=512, step=64, value=512)
output = gr.Image(label="Result") #Move the output up too
btn.click(fn=generate, inputs=[prompt,negative_prompt,steps,guidance,width,height], outputs=[output])
gr.close_all()
demo.launch(share=True, server_port=int(os.environ['PORT4']))
Image-text-image
Here we'll be using an Inference Endpoint for falcon-40b-instruct
, one of best ranking open source LLM on the 🤗 Open LLM Leaderboard.
To run it locally, one can use the Transformers library or the text-generation-inference
Adding other advanced features
wandb
- Logging of the training loss and metrics
- Sampling from the model during training and uploading the samples to W&B
- Saving the model checkpoints to W&B
Setup DDPM noise scheduler and sampler (same as in the Diffusion course).****¶
- perturb_input: Adds noise to the input image at the corresponding timestep on the schedule
- sample_ddpm_context: Generate images using the DDPM sampler, we will use this function during training to sample from the model regularly and see how our training is progressing
- We are going to compare the samples from DDPM and DDIM samplers
- Visualize mixing samples with conditional diffusion models
wandb.init
Let's see how to finetune a language model to generate character backstories using HuggingFace Trainer with wandb integration. We'll use a tiny language model (TinyStories-33M
) due to resource constraints, but the lessons you learn here should be applicable to large models too!
Preparing data
Training
LLM's don't always generate the same results. Your generated characters and backstories may differ from the video.
直觉
Noise
UNet
Epoch
Sprites by ElvGames, FrootsnVeggies and kyrise This code is modified from, https://github.com/cloneofsimo/minDiffusion Diffusion model is based on Denoising Diffusion Probabilistic Models and Denoising Diffusion Implicit Models
DDPM
DDIM
DDIM faster than DDPM
结对编程是一种软件开发方法,两个程序员坐在一起共同完成一个任务。其中一个程序员负责编写代码,另一个程序员则负责审查和提供反馈。这种方法可以提高代码质量、减少错误、增强团队协作能力等。
在结对编程中,两个程序员通常会使用一个共享的计算机,并使用一些工具来同步他们的工作。例如,他们可以使用版本控制系统来管理代码,并使用实时通信工具来交流想法和解决问题。
使用的模型是PaLM
- The
@retry
decorator helps you to retry the API call if it fails. - We set the temperature to 0.0 so that the model returns the same output (completion) if given the same input (the prompt).
Set the MakerSuite API key with the provided helper function.
- priming: getting the LLM ready for the type of task you'll ask it to do.
- question: the specific task.
- decorator: how to provide or format the output.
Observe how the decorator affects the output**¶**
- In other non-coding prompt engineering tasks, it's common to use "chain-of-thought prompting" by asking the model to work through the task "step by step".
- For certain tasks like generating code, you may want to experiment with other wording that would make sense if you were asking a developer the same question.
prompt_template = """
{priming}
{question}
{decorator}
Your solution:
"""
结对编程方案
- An LLM can help you rewrite your code in the way that's recommended for that particular language.
- You can ask an LLM to rewrite your Python code in a way that is more 'Pythonic".
Ask for multiple ways of rewriting your code
Paste markdown into a markdown cell
Ask the model to recommend one of the methods as most 'Pythonic'
- Ask the LLM to perform a code review.
- Note that adding/removing newline characters may affect the LLM completion that gets output by the LLM.
- It may help to specify that you want the LLM to output "in code" to encourage it to write unit tests instead of just returning test cases in English.
- Improve runtime by potentially avoiding inefficient methods (such as ones that use recursion when not needed).
技术债务
"Technical Debt"这个词是软件开发中的一个术语,最早由软件工程师比尔·麦康奈尔(Bill McConnell)在1992年提出。它用来描述在软件开发过程中,由于时间、成本或其他资源的限制,而做出的妥协或牺牲,这些妥协或牺牲可能会在未来导致额外的工作或问题。
Vertex AI
- The returned object is a list with a single
TextEmbedding
object. - The
TextEmbedding.values
field stores the embeddings in a Python list.
- Calculate the similarity between two sentences as a number between 0 and 1.
- Try out your own sentences and check if the similarity calculations match your intuition.
cosine_similarity
- One possible way to calculate sentence embeddings from word embeddings is to take the average of the word embeddings.
- This ignores word order and context, so two sentences with different meanings, but the same set of words will end up with the same sentence embedding.
- These sentence embeddings account for word order and context.
- Verify that the sentence embeddings are not the same.
Mix embedding : text +picture
- We'll use principal component analysis (PCA).
- You can learn more about PCA in this video from the Machine Learning Specialization.
- The
cosine_similarity
function expects a 2D array, which is why we'll wrap each embedding list inside another list. - You can verify that sentence 1 and 2 have a higher similarity compared to sentence 1 and 4, even though sentence 1 and 4 both have the words "desert" and "plant".
- BigQuery is Google Cloud's serverless data warehouse.
- We'll get the first 500 posts (questions and answers) for each programming language: Python, HTML, R, and CSS.
- You can reuse the above code to run your own queries if you are using Google Cloud's BigQuery service.
- In this classroom, if you run into any issues, you can load the same data from a csv file.
Generate text embeddings**¶**
- To generate embeddings for a dataset of texts, we'll need to group the sentences together in batches and send batches of texts to the model.
- The API currently can take batches of up to 5 pieces of text per API call.
Get embeddings on a batch of data**¶**
- This helper function calls
model.get_embeddings()
on the batch of data, and returns a list containing the embeddings for each text in that batch.
Load the data from file**¶**
- We'll load the stack overflow questions, answers, and category labels (Python, HTML, R, CSS) from a .csv file.
- We'll load the embeddings of the questions (which we've precomputed with batched calls to
model.get_embeddings()
), from a pickle file.
Anomaly / Outlier detection**¶**
- We can add an anomalous piece of text and check if the outlier (anomaly) detection algorithm (Isolation Forest) can identify it as an outlier (anomaly), based on its embedding.
- Train a random forest model to classify the category of a Stack Overflow question (as either Python, R, HTML or CSS).
- For more predictability of the language model's response, you can also ask the language model to choose among a list of answers and then elaborate on its answer.
Adjusting Creativity/Randomness
- You can control the behavior of the language model's decoding strategy by adjusting the temperature, top-k, and top-n parameters.
- For tasks for which you want the model to consistently output the same result for the same input, (such as classification or information extraction), set temperature to zero.
- For tasks where you desire more creativity, such as brainstorming, summarization, choose a higher temperature (up to 1).
Softmax =one/all
temperature
top_k
top_p
Microsoft
SWOTs可以被用于分析其内部和外部环境的优势(Strengths)、弱点(Weaknesses)、机会(Opportunities)和威胁(Threats)。
- Grow the existing business
- Save money and time
- Add completely new business
- Prepare for the unknown
Inventory:
- Kernel
- Semantic (and Native) functions -- you can do a lot with these
- BusinessThinking plugin --> SWOTs in ways you could never imagine
- DesignThinking plugin ... Here you are
Inventory:
- Kernel
- Semantic (and Native) functions -- you can do a lot with these
- BusinessThinking plugin --> SWOTs in ways you could never imagine
- DesignThinking plugin --> you did that. Congrats
... next up ... you did all that COMPLETION
You have now accessed both the COMPLETION and SIMILARITY engines.
Inventory:
- Kernel
- Semantic (and Native) functions -- you can do a lot with these
- BusinessThinking plugin --> SWOTs in ways you could never imagine
- DesignThinking plugin --> you did that. Congrats
- Use the similarity engine to your heart's content 🧲
- THE BIG ONE!!!!!
The next two cells will sometimes return an error. The LLM response is variable and at times can't be successfully parsed by the planner or the LLM will make up new functions. If this happens, try resetting the jupyter notebook kernel and running it again.
There are a variety of limitations to using the planner in August of 2023 in terms of number of tokens required and model preference that we can expect to slowly vanish over time. For simple tasks, this Planner-based approach is unusually powerful. It takes full advantage of both COMPLETION and SIMILARITY in a truly magical way.
https://github.com/microsoft/chat-copilot
The backend server demonstrates how to connect to a variety of resources like auth, vector dbs, telemetry, content safety, PDF import, and even OCR.