Use Microsoft.Extensions.DataIngestion in AI Chat Web template
#7023
This reverts commit e1d066034962c9686bf8150984b6adf0e25846c8.
This reverts commit a369be9.
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
...rc/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Components/Pages/Chat/ChatCitation.razor
...es/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DocumentReader.cs
....AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/IngestedChunk.cs
...AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/SemanticSearch.cs
src/ProjectTemplates/Microsoft.Extensions.AI.Templates/THIRD-PARTY-NOTICES.TXT
Marking as ready for review to get some eyes on this. Note that there are still pending improvements:
Pull Request Overview
This PR modernizes the AI chat web templates by replacing the custom PDF ingestion pipeline (using PdfPig) with the new Microsoft.Extensions.DataIngestion library suite. The changes enable support for both Markdown and PDF document formats while simplifying the ingestion architecture.
Key Changes
- Replaced custom `PDFDirectorySource` and `IIngestionSource` with the standardized `Microsoft.Extensions.DataIngestion` APIs
- Removed `IngestedDocument` tracking class as document versioning is now handled by the ingestion pipeline
- Added Markdown viewer support (viewer.html and viewer.mjs) for rendering `.md` files
- Updated citation format to remove page numbers, now supporting document-level citations
- Changed ingestion trigger from startup to lazy initialization on first search request
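
The last item, lazy initialization on the first search request, can be sketched roughly as follows. This is a minimal illustration and not the template's actual `SemanticSearch` code: `LazySearch`, the `ingest` callback, and `IngestionRuns` are hypothetical names, and only the `_initialized` flag comes from the overview. It uses double-checked initialization behind a `SemaphoreSlim` so that concurrent first requests trigger ingestion only once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a search service that ingests documents lazily.
public sealed class LazySearch
{
    private readonly Func<CancellationToken, Task> _ingest; // hypothetical ingestion callback
    private readonly SemaphoreSlim _gate = new(1, 1);       // serializes the first-time ingestion
    private volatile bool _initialized;

    // Exposed only so the behavior can be observed in this sketch.
    public int IngestionRuns { get; private set; }

    public LazySearch(Func<CancellationToken, Task> ingest) => _ingest = ingest;

    public async Task<string> SearchAsync(string query, CancellationToken ct = default)
    {
        await EnsureInitializedAsync(ct);
        return $"results for '{query}'"; // placeholder for the real vector search
    }

    private async Task EnsureInitializedAsync(CancellationToken ct)
    {
        if (_initialized)
        {
            return; // fast path once ingestion has completed
        }

        await _gate.WaitAsync(ct);
        try
        {
            // Re-check inside the gate: another caller may have finished ingestion.
            if (!_initialized)
            {
                await _ingest(ct);
                IngestionRuns++;
                _initialized = true;
            }
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

With this shape, application startup is not blocked by ingestion, at the cost of a slower first search. If the callback throws, the flag stays unset and the next request retries.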
Reviewed Changes
Copilot reviewed 94 out of 100 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| VectorStoreWriter.cs | Added workaround for QdrantVectorStore key type incompatibility using string name check |
| SemanticSearch.cs (all variants) | Added lazy ingestion on first search with _initialized flag |
| DocumentReader.cs | New custom reader supporting both Markdown and PDF via MarkdownReader and MarkItDownReader |
| DataIngestor.cs (all variants) | Simplified to use IngestionPipeline with SemanticSimilarityChunker |
| IngestedChunk.cs (all variants) | Changed Key type to Guid, made constants public, added JSON serialization attributes |
| ChatCitation.razor | Added Markdown viewer support alongside existing PDF viewer |
| ChatMessageItem.razor | Removed page number from citation regex and data structure |
| Program.cs variants | Removed startup ingestion, added vector store registrations, changed DataIngestor to singleton |
| *.csproj.in | Replaced PdfPig with DataIngestion packages and ML.Tokenizers |
| THIRD-PARTY-NOTICES.TXT | Removed PdfPig license notice |
| GeneratedContent.targets | Updated package version variables |
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
...AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/SemanticSearch.cs
...tensions.AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Program.Aspire.cs
jeffhandley
left a comment
Looks great, @MackinnonBuck!
src/ProjectTemplates/Microsoft.Extensions.AI.Templates/THIRD-PARTY-NOTICES.TXT
...enAI_Qdrant_Aspire.verified/aichatweb/aichatweb.Web/Components/Pages/Chat/ChatCitation.razor
Big thanks for a great contribution and detailed testing, @MackinnonBuck!
Please enable the tracing. This could be done by modifying line 79 in c3e0c73:
`.AddSource("Experimental.Microsoft.Extensions.AI");`
with:
`.AddSource("Experimental.Microsoft.Extensions.DataIngestion");`
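
As background on why the source name has to be registered: .NET activities are only created for an `ActivitySource` that some listener has subscribed to by name, and `AddSource` is how an OpenTelemetry tracer provider subscribes. The sketch below shows the same mechanism using only `System.Diagnostics`, with no OpenTelemetry packages. The source name is taken from the comment above; the operation names and the second source are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var observed = new List<string>();

// Subscribe to one source by name, which is roughly what AddSource(name) arranges.
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == "Experimental.Microsoft.Extensions.DataIngestion",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
    ActivityStopped = activity => observed.Add(activity.OperationName),
};
ActivitySource.AddActivityListener(listener);

using var ingestion = new ActivitySource("Experimental.Microsoft.Extensions.DataIngestion");
using var other = new ActivitySource("Some.Unregistered.Source"); // hypothetical name

// Recorded: the listener subscribed to this source. "ProcessDocument" is a made-up operation name.
using (ingestion.StartActivity("ProcessDocument")) { }

// Not recorded: nobody listens to this source, so StartActivity returns null.
using (other.StartActivity("Ignored")) { }

Console.WriteLine(string.Join(", ", observed));
```

Only activities from the subscribed source reach the listener, which is why traces from the ingestion library would stay invisible until its source name is added.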
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
....AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/IngestedChunk.cs
src/ProjectTemplates/Microsoft.Extensions.AI.Templates/THIRD-PARTY-NOTICES.TXT
...ts/aichatweb.AzureOpenAI_Qdrant_Aspire.verified/aichatweb/aichatweb.Web/aichatweb.Web.csproj
...ates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DataIngestor.cs
...ates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DataIngestor.cs
...es/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DocumentReader.cs
@MackinnonBuck FYI - I added some commits, including one that shows a message about documents being loaded. All tests are passing and I did a lot of end-to-end functional validation too. I've marked it to auto-merge when CI is green after my latest push.
adamsitnik
left a comment
We are almost there, we just need to disable the incremental ingestion and remove the SK dependency from the PDF reader.
...ates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DataIngestor.cs
...ates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/PdfPigReader.cs
...ates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/PdfPigReader.cs
...src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/ChatWithCustomData-CSharp.Web.csproj.in
...zureOpenAI_Qdrant_Aspire.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs
.../aichatweb.Ollama_Qdrant.verified/aichatweb/aichatweb.Web/Services/Ingestion/DataIngestor.cs
...ests/Snapshots/aichatweb.Ollama_Qdrant.verified/aichatweb/aichatweb.Web/aichatweb.Web.csproj
...apshots/aichatweb.OpenAI_AzureAISearch.verified/aichatweb/Services/Ingestion/DataIngestor.cs
...apshots/aichatweb.OpenAI_AzureAISearch.verified/aichatweb/Services/Ingestion/PdfPigReader.cs
...ntegrationTests/Snapshots/aichatweb.OpenAI_AzureAISearch.verified/aichatweb/aichatweb.csproj
adamsitnik
left a comment
@MackinnonBuck To save you time, I've addressed my feedback by pushing to your branch. Please perform the manual verification before merging (I don't know how to do it).
...plates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Components/Pages/Chat/Chat.razor
…net#7023)

* Add Markdown support
* Remove PDF support
* Revert "Remove PDF support"
  This reverts commit e1d066034962c9686bf8150984b6adf0e25846c8.
* Add 'Example_GPS_Watch.md'
* Add MEDI dependencies
* Revert "[MEDI] Remove collection key type workaround (dotnet#7010)"
  This reverts commit a369be9.
* MEDI integration into chat template
* Remove PdfPig dependency
* Fix citation + normalize identifier path
* Undo changes to `M.E.DI.csproj`
* Update snapshots
* Update DataIngestion unit tests to handle keys as either strings or guids
* Update SK and fix MEDI version
* Remove SK workaround
* Fix sandbox paths to allow running tests multiple times
* Reliable data ingestion
* Enable MEDI tracing
* Simplify log message
* Add `PdfPigReader` for non-Aspire template
* Invert PdfPigReader exclusion condition
* Use Markitdown MCP
* Update snapshots
* Undo changes to `IngestionPipelineTests.cs`
* Update src/ProjectTemplates/Microsoft.Extensions.AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DocumentReader.cs
  Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
* Update snapshots
* Improve template execution test failure output
* Support .NET 10 in aichatweb, using it by default
* Show a message when loading documents by loading docs as a separate tool
* disable the incremental ingestion
* map every PDF page to a single section
* drop SK dependency
* Add system prompt instructions for calling the LoadDocuments tool. Fix code formatting.

---------

Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
- Use `Microsoft.Extensions.DataIngestion` in AI Chat Web template (dotnet#7023)
- Add a new Microsoft.Agents.AI.Templates package with an aiagents-webapi project template (dotnet#7014)
- Add Agent Framework DevUI into the aiagent-webapi template (dotnet#7026)
* Merged PR 54952: Getting ready for the 10.0 stable release. Flowing .NET Servicing

  #### AI description (iteration 1)
  #### PR Classification
  This PR updates dependency versions and build pipeline configurations to prepare for the .NET 10.0 stable release.
  #### PR Summary
  The changes update dependency and LTS versions (upgrading many from 9.0.10 to 9.0.11), enable release-specific flags, and streamline the build pipelines for servicing.
  - **`eng/Version.Details.xml` and `eng/Versions.props`**: Upgraded various dependency versions and LTS numbers and set stabilization flags (e.g., `StabilizePackageVersion` to true, `DotNetFinalVersionKind` to release).
  - **`azure-pipelines.yml`**: Removed the code coverage stage to simplify the CI pipeline.
  - **`eng/pipelines/templates/BuildAndTest.yml`**: Added tasks to set up private feed credentials and commented out integration tests that require authentication.
  - **`NuGet.config`**: Revised package source configuration by removing package source mapping and adding new internal feed URLs.
  - **`Directory.Build.props`**: Suppressed NU1507 warnings to accommodate internal branch configuration.

* [MEDI] start producing NuGet packages (#7016)
  * remove IsPackable=false, provide all mandatory properties for each package we want to ship
  * add basic READMEs

* Update version numbers in AI changelogs (#7008)

* [MEDI] Don't stop document processing on enricher error (#7005)
  * introduce EnricherOptions option bag
  * implement batching
  * don't validate results returned by IChatClient
  * don't expose FileInfo as source via IngestionResult, as it could be Stream in the future. Just expose the document id
  * Enricher failures should not fail the whole ingestion pipeline, as they are best-effort enhancements

* [MEDI] add PackageTags (#7022)

* Add MarkItDownMcpReader for MCP server support (#7025)

  Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
  Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>

* Image generation tool (#6749)
  * Prototype of using ImageGenerationTool
  * Handle DataContent returned from ImageGen
  * React to rename and improve metadata
  * Handle image_generation tool content from streaming
  * Add handling for combining updates with images
  * Add tests for new ChatResponseUpdateExtensions
  * Rename ImageGenerationTool to HostedImageGenerationTool
  * Remove ChatResponseUpdateCoalescingOptions
  * Add ImageGeneratingChatClient
  * Fix namespace of tool
  * Replace traces of function calling
  * More namepsace fix
  * Enable editing
  * Update to preview OpenAI with image tool support
  * Temporary OpenAI feed
  * Fix tests
  * Add integration tests for ImageGeneratingChatClient
  * Remove ChatRole.Tool -> Assistant workaround
  * Remove use of private reflection for Image results
  * Add ChatResponseUpdate.Clone
  * Move all mutable state into RequestState object
  * Adjust prompt to improve integration test reliability
  * Refactor tool initialization
    I verified that the tool creation is cached by ReflectionAIFunctionDescriptor
    This change includes a small optimization to avoid additional allocation around inserting tools into the options.
  * Add integration tests for streaming
    Fixes the removal of tool content - this was broken for streaming when I changed removal to be based on callId. We don't have the CallId yet in the streaming case so we have to remove by name.
  * React to changes and fix tests
  * Address feedback
  * Fix SkipTestException from ConditionalTheory
  * Fix formatting
  * Add back image replacement coalescing (removed in merge)
  * Fix template tests and use new OpenAI
  * Remove use of temporary staging nuget feed
  * Address feedback
  * Make ImageGeneratingChatClient use ImageGenerationTool*Content
  * Remove ApplyUpdates and Coalesce ImageResults instead of DataContent.
  * Workaround OpenAI issue where image data is not read for partial images.
    openai/openai-dotnet#809
  * Improved workaround
  * Return ImageGenerationToolCallContent from OpenAI
  * Add OpenAI image tool tests with representation of real traffic
  * Correct the event sequence for streaming single image
  * Fix some docs and refactor for clarity

* Make MEAI packages use 10.0 runtime packages (#7028)
  * Make MEAI packages use 10.0 runtime packages
  * Add back MEAI.Abstractions JsonSchemaExporter tests
  * Address feedback
  * Remove unneeded trimming suppression

* When using latest .NET packages, force System.Numerics.Tensors to 10.0 (for MEAI) (#7031)

* Add a new Microsoft.Agents.AI.Templates package with an aiagents-webapi project template (#7014)
  * Initial Microsoft.Agents.AI.Templates structure
  * Refine Microsoft.Agents.AI.Templates infrastructure
  * Move project template infrastructure utilities into a shared folder
  * Add the webapi-agents project template content with GitHub models
  * Support parameterized AI Service Provider
  * Rename to aiagents-webapi
  * Support parameterized chatmodel and update docs with renames
  * Add Snapshot tests
  * Add aiagents-webapi snapshot tests
  * Add aiagents-webapi execution tests (and component governance)
  * Improve aiagents-webapi template parameters
  * Apply suggestions from copilot code review
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  * Move shared ProjectTemplate infrastructure to not get included in Shared.Tests
  * Fix the template sandbox / execution tests after moving infrastructure
  * Ignore CA1716 warning about 'Shared' namespace in template tests
  * Clean up template sandbox source/output
  * Rename to "aiagent-webapi" and favor singular "Agent". Docs cleanup.
  * Update templates dev doc to cover Microsoft.Agents.AI.Templates too
  * Fix remaining template sandbox references with new paths
  * Add a tool call in aiagent-webapi. Update workflow API usage for upcoming change. Fix snapshots
  * Exclude csproj.in file from template package
  * Add a survey link to the aiagent-webapi template's generated readme

  ---------

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Use `Microsoft.Extensions.DataIngestion` in AI Chat Web template (#7023)
  * Add Markdown support
  * Remove PDF support
  * Revert "Remove PDF support"
    This reverts commit e1d066034962c9686bf8150984b6adf0e25846c8.
  * Add 'Example_GPS_Watch.md'
  * Add MEDI dependencies
  * Revert "[MEDI] Remove collection key type workaround (#7010)"
    This reverts commit a369be9.
  * MEDI integration into chat template
  * Remove PdfPig dependency
  * Fix citation + normalize identifier path
  * Undo changes to `M.E.DI.csproj`
  * Update snapshots
  * Update DataIngestion unit tests to handle keys as either strings or guids
  * Update SK and fix MEDI version
  * Remove SK workaround
  * Fix sandbox paths to allow running tests multiple times
  * Reliable data ingestion
  * Enable MEDI tracing
  * Simplify log message
  * Add `PdfPigReader` for non-Aspire template
  * Invert PdfPigReader exclusion condition
  * Use Markitdown MCP
  * Update snapshots
  * Undo changes to `IngestionPipelineTests.cs`
  * Update src/ProjectTemplates/Microsoft.Extensions.AI.Templates/src/ChatWithCustomData/ChatWithCustomData-CSharp.Web/Services/Ingestion/DocumentReader.cs
    Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
  * Update snapshots
  * Improve template execution test failure output
  * Support .NET 10 in aichatweb, using it by default
  * Show a message when loading documents by loading docs as a separate tool
  * disable the incremental ingestion
  * map every PDF page to a single section
  * drop SK dependency
  * Add system prompt instructions for calling the LoadDocuments tool. Fix code formatting.

  ---------

  Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
  Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>

* Add Agent Framework DevUI into the aiagent-webapi template (#7026)
  * Integrate DevUI into the aiagent-webapi project template
  * Improve aiagent-webapi Program.cs per feedback.
  * Remove --no-devui. Fix OpenAI clients. Augment execution test sandbox ignores.
  * Rename to Microsoft.Agents.AI.ProjectTemplates
  * Set Microsoft.Agents.AI package versions
  * Simplify the GitHub and OpenAI key config vars for aiagent-webapi
  * Sort package references
  * Fix troubleshooting section in READMEs
  * Revert MEAI.Templates change. Make launchSettings .gitignore more specific.

  ---------

  Co-authored-by: Mackinnon Buck <mackinnon.buck@gmail.com>

* Fix display of target frameworks in agents template. Hide the chat model textbox from the IDE template UI.

---------

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Eric StJohn <ericstj@microsoft.com>
Co-authored-by: Jeff Handley <Jeff.Handley@microsoft.com>
Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Mackinnon Buck <mackinnon.buck@gmail.com>
This PR makes the following changes to the chat template:
- `Example_GPS_Watch.pdf` with its markdown equivalent
- `PdfPig`, `markitdown`