AI Toolkit for VS Code (Windows)

AI Toolkit for VS Code simplifies generative AI app development by bringing together cutting-edge AI development tools and models from the Azure AI Studio Catalog and other catalogs such as Hugging Face. You will be able to browse the AI model catalog powered by Azure ML and Hugging Face, download models locally, fine-tune, test, and use them in your application.

AI Toolkit Preview runs locally. Depending on the model you select, some tasks are supported only on Windows and Linux.

For local inference or fine-tuning, depending on the model you select, you may need a GPU such as an NVIDIA CUDA GPU.

If you run remotely, the cloud resource needs to have a GPU, so please make sure to check your environment. For local runs on Windows + WSL, the WSL Ubuntu distro 18.04 or greater should be installed and set as the default prior to using AI Toolkit.

Getting Started

Learn more about how to install Windows Subsystem for Linux and changing the default distribution.

AI Toolkit GitHub Repo

Install AI Toolkit

AI Toolkit is shipped as a Visual Studio Code extension, so you need to install VS Code first and then download AI Toolkit from the Visual Studio Marketplace, where it can be installed like any other VS Code extension.

If you're unfamiliar with installing VS Code extensions, follow these steps:

  1. In the Activity Bar in VS Code select Extensions
  2. In the Extensions Search bar type "AI Toolkit"
  3. Select "AI Toolkit for Visual Studio Code"
  4. Select Install

Now, you are ready to use the extension!

Sign In

You will be prompted to sign in to GitHub, so please click "Allow" to continue. You will be redirected to the GitHub sign-in page.

Please sign in and follow the steps. After successful completion, you will be redirected to VS Code.

Once the extension has been installed you'll see the AI Toolkit icon appear in your Activity Bar.

Let's explore the available actions!

Available Actions

The primary sidebar of the AI Toolkit is organized into:

  • Models
  • Resources
  • Playground
  • Fine-tuning

The Playground and Fine-tuning actions are available in the Resources section. To get started, select Model Catalog.

Download a model from the catalog

Upon launching AI Toolkit from the VS Code sidebar, you can select from the following options:

AI toolkit model catalog

  • Find a supported model in the Model Catalog and download it locally
  • Test model inference in the Model Playground
  • Fine-tune the model locally or remotely in Model Fine-tuning
  • Deploy fine-tuned models to the cloud via the command palette for AI Toolkit

Note

GPU vs. CPU

You'll notice that the model cards show the model size, the platform and accelerator type (CPU, GPU). For optimized performance on Windows devices that have at least one GPU, select model versions that only target Windows.

This ensures you have a model optimized for the DirectML accelerator.

The model names are in the format of {model_name}-{accelerator}-{quantization}-{format}.
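For example, in Phi-3-mini-4k-directml-int4-awq-block-128-onnx (a model used later in this guide), directml is the accelerator, int4-awq-block-128 is the quantization, and onnx is the format.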

To check whether you have a GPU on your Windows device, open Task Manager and then select the Performance tab. If you have GPU(s), they will be listed under names like "GPU 0" or "GPU 1".

Run the model in the playground

After all the parameters are set, click Generate Project. This will:

  • Initiate the model download
  • Install all prerequisites and dependencies
  • Create a VS Code workspace

Once your model has downloaded, select Load in Playground on the model card in the catalog:

Load model in playground

When the model is downloaded, you can launch the project from AI Toolkit.

Note: If you want to try the preview feature for remote inference or fine-tuning, please follow this guide.

Windows Optimized Models

AI Toolkit offers a collection of publicly available AI models already optimized for Windows. The models are stored in different locations, including Hugging Face, GitHub, and others, but you can browse the models and find all of them in one place, ready for downloading and using in your Windows application.

You should see the model response streamed back to you:

Generation stream

Model Selections

If you do not have a GPU available on your Windows device but you selected the Phi-3-mini-4k-directml-int4-awq-block-128-onnx model, the model response will be very slow.

You should instead download the CPU-optimized version: Phi-3-mini-4k-cpu-int4-rtn-block-32-acc-level-4-onnx.

It is also possible to change:

Context Instructions: Help the model understand the bigger picture of your request. This could be background information, examples/demonstrations of what you want, or an explanation of the purpose of your task.

Inference parameters:

  • Maximum response length: The maximum number of tokens the model will return.
  • Temperature: Model temperature is a parameter that controls how random a language model's output is. A higher temperature means the model takes more risks, giving you a diverse mix of words. On the other hand, a lower temperature makes the model play it safe, sticking to more focused and predictable responses.
  • Top P: Also known as nucleus sampling, this setting controls how many possible words or phrases the language model considers when predicting the next word.
  • Frequency penalty: This parameter influences how often the model repeats words or phrases in its output. A higher value (closer to 1.0) encourages the model to avoid repeating words or phrases.
  • Presence penalty: This parameter is used in generative AI models to encourage diversity and specificity in the generated text. A higher value (closer to 1.0) encourages the model to include more novel and diverse tokens, while a lower value makes the model more likely to generate common or clichéd phrases.
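These settings correspond to standard fields in the OpenAI chat completions format (temperature, top_p, max_tokens, frequency_penalty, presence_penalty), so they should map directly onto the request body used with the local REST API described in the next section.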

Use the REST API in your application

The AI Toolkit comes with a local REST API web server on port 5272 that uses the OpenAI chat completions format.

This enables you to test your application locally without having to rely on a cloud AI model service. For example, the following JSON file shows how to configure the body of the request:

{
    "model": "Phi-3-mini-4k-directml-int4-awq-block-128-onnx",
    "messages": [
        {
            "role": "user",
            "content": "what is the golden ratio?"
        }
    ],
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 10,
    "max_tokens": 100,
    "stream": true
}

You can test the REST API using, for example, Postman or the cURL (client URL) utility:

curl -vX POST http://127.0.0.1:5272/v1/chat/completions -H 'Content-Type: application/json' -d @body.json
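Because the request body sets "stream": true, the response arrives as a sequence of chunks rather than a single JSON object. Assuming the local server follows the standard OpenAI streaming convention, each chunk is delivered as a data: line (server-sent events), with a final data: [DONE] marker.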

Using the OpenAI client library for Python

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/", 
    api_key="x" # required for the API but not used
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "what is the golden ratio?",
        }
    ],
    model="Phi-3-mini-4k-cuda-int4-onnx",
)

print(chat_completion.choices[0].message.content)
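The same client can also stream the response and pass the inference parameters described earlier. The following is a minimal sketch, assuming the local server honors these standard OpenAI chat completion parameters:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="x" # required for the API but not used
)

# Request a streamed completion with playground-style inference parameters.
stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "what is the golden ratio?",
        }
    ],
    model="Phi-3-mini-4k-cuda-int4-onnx",
    temperature=0.7,   # controls randomness of the output
    top_p=1,           # nucleus sampling
    max_tokens=100,    # maximum response length
    stream=True,       # receive the answer incrementally
)

# Print each chunk of the answer as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()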

Using Azure OpenAI client library for .NET

Add the Azure OpenAI client library for .NET to your project using NuGet:

dotnet add {project_name} package Azure.AI.OpenAI --version 1.0.0-beta.17

Add a C# file called OverridePolicy.cs to your project and paste the following code:

// OverridePolicy.cs
using Azure.Core.Pipeline;
using Azure.Core;

internal partial class OverrideRequestUriPolicy(Uri overrideUri)
    : HttpPipelineSynchronousPolicy
{
    private readonly Uri _overrideUri = overrideUri;

    public override void OnSendingRequest(HttpMessage message)
    {
        message.Request.Uri.Reset(_overrideUri);
    }
}

Next, paste the following code into your Program.cs file:

// Program.cs
using Azure.AI.OpenAI;

Uri localhostUri = new("http://localhost:5272/v1/chat/completions");

OpenAIClientOptions clientOptions = new();
clientOptions.AddPolicy(
    new OverrideRequestUriPolicy(localhostUri),
    Azure.Core.HttpPipelinePosition.BeforeTransport);
OpenAIClient client = new(openAIApiKey: "unused", clientOptions);

ChatCompletionsOptions options = new()
{
    DeploymentName = "Phi-3-mini-4k-directml-int4-awq-block-128-onnx",
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant. Be brief and succinct."),
        new ChatRequestUserMessage("What is the golden ratio?"),
    }
};

StreamingResponse<StreamingChatCompletionsUpdate> streamingChatResponse
    = await client.GetChatCompletionsStreamingAsync(options);

await foreach (StreamingChatCompletionsUpdate chatChunk in streamingChatResponse)
{
    Console.Write(chatChunk.ContentUpdate);
}
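If you do not need streaming, the same ChatCompletionsOptions instance should also work with the client's non-streaming GetChatCompletionsAsync method, which returns the complete response in a single call.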

Fine Tuning with AI Toolkit

  • Get started with model discovery and the playground.
  • Model fine-tuning and inference using local computing resources.
  • Remote fine-tuning and inference using Azure resources.


AI Toolkit Q&A Resources

Please refer to our Q&A page for the most common issues and resolutions.