33 changes: 33 additions & 0 deletions _includes/ai-blueprints.html
@@ -0,0 +1,33 @@
<div class="full-width-bg component">
<div class="grid-wrapper">
<div class="width-3-12 width-12-12-m">
<img class="light-only" src="{{site.baseurl}}/assets/images/icons/icon-ai-blueprints.svg" alt="AI blueprints icon">
<img class="dark-only" src="{{site.baseurl}}/assets/images/icons/icon-ai-blueprints-dark.svg" alt="AI blueprints icon">
</div>
<div class="width-9-12 width-12-12-m">
<h2>Enterprise AI Blueprints for Java with Quarkus &amp; LangChain4j</h2>
<p>The following three blueprints are conceptual, infrastructure-agnostic reference architectures. Each stands on its own and shows how to structure a Java solution with Quarkus (runtime, APIs, orchestration) and LangChain4j (LLM access, embeddings, tools, chains).</p>
<p>Quarkus provides the foundation for building secure, cloud-native, and AI-infused applications. Quarkus applications integrate with external model runtimes through LangChain4j, which offers rich abstractions for connecting to LLM providers, managing embeddings, defining tools, and orchestrating agentic workflows. This keeps AI where it belongs, as a capability embedded in enterprise applications, while Quarkus ensures performance, scalability, and operational reliability.</p>
<p>These blueprints demonstrate practical patterns and best practices for developing enterprise-grade AI solutions with this combination of technologies. They aim to simplify the use of AI in Java applications and to guide software architects along the way. Whether you're building intelligent chatbots, recommendation engines, or sophisticated data analysis tools, these blueprints provide a solid starting point for your next AI project. Explore each blueprint to discover how Quarkus and LangChain4j can enrich your Java applications with advanced AI capabilities.</p>
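<p>As a small illustration of how these pieces typically fit together, the sketch below declares a LangChain4j-backed AI service in Quarkus. The interface name and prompt text are illustrative assumptions rather than part of the blueprints; the model provider and credentials would come from your application configuration.</p>
<pre><code class="language-java">
package org.acme.ai;

import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;

// Illustrative AI service: Quarkus generates the implementation and routes
// calls to the configured LLM provider through LangChain4j.
@RegisterAiService
public interface SupportAssistant {

    @SystemMessage("You are a concise assistant for enterprise support questions.")
    String answer(String question); // the single parameter becomes the user message
}
</code></pre>
<p>In any CDI bean, such an interface can simply be injected and called like a regular Java method, which is what lets the blueprints treat the LLM as one more enterprise service.</p>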
</div>

<div class="width-4-12 width-12-12-m">
<h3>Frozen RAG (Retrieval-Augmented Generation)</h3>
<p>Improve LLM accuracy with RAG, leveraging enterprise data. Quarkus handles RAG's entire process, including data ingestion, query execution, embedding, context retrieval, and LLM communication.</p>
<p class="textCTA"><i class="fa fa-chevron-right"></i><a href="{{site.baseurl}}/ai-frozen-rag">Learn the basics of Frozen RAG</a></p>
</div>

<div class="width-4-12 width-12-12-m">
<h3>Contextual RAG (Multi-Sources, Rerank, Injection)</h3>
<p>Advanced Contextual RAG improves frozen RAG by adding multi-source retrieval, reranking, and content injection. This makes it ideal for complex enterprise scenarios, ensuring accuracy, relevance, and explainability across distributed information. It enables dynamic information handling, complex queries, and clear lineage for auditable, high-stakes decisions.</p>
<p class="textCTA"><i class="fa fa-chevron-right"></i><a href="{{site.baseurl}}/ai-contextual-rag">Learn about Contextual RAG </a></p>
</div>

<div class="width-4-12 width-12-12-m">
<h3>Chain-of-Thought (CoT) Reasoning</h3>
<p>Chain-of-Thought (CoT) guides LLMs through explicit intermediate steps to solve complex problems. This systematic approach breaks tasks into manageable sub-problems for sequential processing and solution building. CoT enhances LLM accuracy, enabling understanding and debugging, especially for multi-step reasoning in mathematical problem-solving, code generation, and logical inference.</p>
<p class="textCTA"><i class="fa fa-chevron-right"></i><a href="{{site.baseurl}}/ai-chain-of-thought">Learn about Chain-of-Thought Reasoning</a></p>
</div>

</div>
</div>
7 changes: 7 additions & 0 deletions _includes/ai-breadcrumb.html
@@ -0,0 +1,7 @@
<section class="full-width-version-bg flexfilterbar">
<div class="guideflexcontainer">
<div class="docslink">
<a class="returnlink" href="{{site.baseurl}}/ai-blueprints"> Back to AI Blueprints</a>
</div>
</div>
</section>
54 changes: 54 additions & 0 deletions _includes/ai-chainofthought.html
@@ -0,0 +1,54 @@
<div class="full-width-bg component">
<div class="grid-wrapper">
<div class="width-12-12 width-12-12-m">
<h1>Chain-of-Thought (CoT) Reasoning</h1>
<p>The architecture of the Chain-of-Thought (CoT) blueprint focuses on guiding a Large Language Model (LLM) through explicit intermediate steps to solve complex problems, improve reasoning, and provide transparency in its decision-making.</p>
<h2>Main Use-Cases</h2>
<ul>
<li><strong>Improved Reasoning:</strong> Decompose complex problems to reduce logical errors.</li>
<li><strong>Transparency:</strong> Provide optional explanations for decisions.</li>
<li><strong>Training & Enablement:</strong> Illustrate the "why" behind concepts, not just the "what."</li>
<li><strong>Decision Support:</strong> Aid in investments, vendor selection, and risk assessments.</li>
<li><strong>Troubleshooting:</strong> Facilitate structured diagnostics in operations and engineering.</li>
<li><strong>Policy Application:</strong> Apply multi-clause rules with traceable steps.</li>
</ul>
<h2>Architecture Overview</h2>
<p>The CoT architecture starts with a "User Query" that initiates the process. This query is received by the "Quarkus CoT Service," which serves as the orchestrator for the entire reasoning flow. Within the Quarkus service, the core Chain-of-Thought logic, powered by LangChain4j, is executed.</p>
<img class="light-only" src="{{site.baseurl}}/assets/images/ai/ai-cot.svg" alt="CoT architecture image">
<img class="dark-only" src="{{site.baseurl}}/assets/images/ai/ai-cot-dark.svg" alt="CoT architecture image">
<p>The "LangChain4j" package encapsulates the sequential steps of the CoT process:</p>
</div>
<div class="width-4-12 width-12-12-m">
<dl>
<dt>Step 1: Analyze Factors:</dt>
<dd>This initial step involves the LLM breaking down the complex user query into its constituent parts, identifying key factors, and performing an initial analysis. This could involve understanding the problem, identifying relevant data points, or defining the scope of the task.</dd>
</dl>
</div>
<div class="width-4-12 width-12-12-m">
<dl>
<dt>Step 2: Synthesize Options:</dt>
<dd>Building on the analysis from Step 1, the LLM then synthesizes various options, potential solutions, or different perspectives related to the query. This step demonstrates the model's ability to explore different avenues of thought before arriving at a conclusion.</dd>
</dl>
</div>
<div class="width-4-12 width-12-12-m">
<dl>
<dt>Step 3: Recommendation:</dt>
<dd>In the final step, the LLM formulates a "Recommendation" or a definitive answer based on the analysis and synthesis performed in the preceding steps. This recommendation is the ultimate output of the CoT process.</dd>
</dl>
</div>
<div class="width-12-12 width-12-12-m">
<p>Finally, the Response is returned to the user, with the option to include the intermediate reasoning steps when transparency is required. Quarkus orchestrates the execution of single- or multi-prompt chains, while LangChain4j supplies the abstractions for building prompts and capturing reasoning outputs at each step. This structured flow improves the LLM’s performance on complex tasks and, when needed, provides an auditable record of how the answer was derived.</p>
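<p>A minimal sketch of this orchestration with Quarkus and LangChain4j is shown below; the interface, prompts, and formatting of the combined response are assumptions for illustration, not a prescribed implementation.</p>
<pre><code class="language-java">
package org.acme.cot;

import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

// Illustrative AI service: one method per CoT step, each a separate prompt.
@RegisterAiService
interface CotSteps {

    @SystemMessage("Step 1: Break the request into its key factors and analyze each of them.")
    String analyzeFactors(String question);

    @SystemMessage("Step 2: From the given factor analysis, synthesize possible options or solution paths.")
    String synthesizeOptions(String analysis);

    @SystemMessage("Step 3: From the given options, produce a recommendation with a short justification.")
    String recommend(String options);
}

// The Quarkus CoT service orchestrates the chain: each step's output feeds the next.
@ApplicationScoped
public class CotService {

    @Inject
    CotSteps steps;

    public String reason(String question, boolean includeReasoning) {
        String analysis = steps.analyzeFactors(question);
        String options = steps.synthesizeOptions(analysis);
        String answer = steps.recommend(options);
        // Optionally return the intermediate steps for transparency.
        if (includeReasoning) {
            return "Analysis:\n" + analysis + "\n\nOptions:\n" + options + "\n\nRecommendation:\n" + answer;
        }
        return answer;
    }
}
</code></pre>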
<h2>Further Patterns</h2>
<p>Further patterns in Chain-of-Thought reasoning extend beyond basic single-prompt approaches to offer more sophisticated control and integration. "Single-prompt CoT" provides a concise way to elicit reasoning, where a single instruction like "think step by step" guides the LLM to return both its thought process and the final answer.</p>
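<p>A sketch of this single-prompt variant is shown below; the prompt wording and the answer marker are illustrative assumptions.</p>
<pre><code class="language-java">
import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;

// Illustrative single-prompt CoT service: one call returns both the
// intermediate reasoning and the final answer.
@RegisterAiService
public interface SinglePromptCot {

    @SystemMessage("""
            Think step by step. List your intermediate reasoning steps,
            then give the final answer on a line starting with 'Answer:'.""")
    String solve(String problem);
}
</code></pre>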
<p>More advanced scenarios benefit from "Program-of-Thought," which involves multiple chained prompts, where the output of one step feeds into the next, often including optional verification steps for enhanced accuracy.</p>
<p>Lastly, a "Hybrid" approach combines CoT with Retrieval-Augmented Generation (RAG) to ground the reasoning process in factual information, ensuring that the LLM's logical steps are supported by relevant data. These patterns provide flexibility in how CoT is applied, allowing architects to choose the level of control and factual grounding necessary for their specific enterprise AI applications.</p>
<h2>Guardrails & Privacy</h2>
<p>Architecting Chain-of-Thought (CoT) solutions for enterprise environments necessitates careful consideration of guardrails and privacy. The following points are an initial set of critical aspects that software architects must account for to ensure responsible and secure AI deployment. These considerations are vital for managing the transparency of reasoning, maintaining answer consistency, and controlling data exposure within the CoT process.</p>
<ul>
<li><strong>Reasoning Exposure:</strong> Decide whether to reveal the Chain of Thought (CoT) or keep it internal.</li>
<li><strong>Consistency Checks:</strong> Implement a final verifier prompt or apply deterministic post-rules (see the sketch after this list).</li>
<li><strong>Token Budgeting:</strong> Limit intermediate verbosity and summarize between steps.</li>
</ul>
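<p>As one way to realize the consistency check above, a final verifier prompt can be modeled as one more AI-service call before the response leaves the service; the names and prompt text below are assumptions.</p>
<pre><code class="language-java">
import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

// Illustrative verifier: a final, cheap call that checks the recommendation
// against the original question before the answer is returned.
@RegisterAiService
public interface AnswerVerifier {

    @SystemMessage("""
            You verify answers. Reply 'OK' if the answer is consistent with the
            question and contains no unsupported claims; otherwise reply 'REVISE'
            followed by a one-sentence reason.""")
    @UserMessage("Question: {question}\nAnswer: {answer}")
    String verify(String question, String answer);
}
</code></pre>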
</div>
</div>
</div>
45 changes: 45 additions & 0 deletions _includes/ai-contextualrag.html
@@ -0,0 +1,45 @@
<div class="full-width-bg component">
<div class="grid-wrapper">
<div class="width-12-12 width-12-12-m">
<h1>Contextual RAG (Multi-Sources, Rerank, Injection)</h1>
<p>Advanced Contextual RAG extends the core frozen RAG pattern by incorporating multi-source retrieval, reranking, and content injection techniques. This is designed for more complex enterprise scenarios where information might be spread across various systems, requiring more sophisticated methods to ensure accuracy, relevance, and explainability. It allows for dynamic information handling, complex query processing, and provides clearer lineage for auditable decisions, making it ideal for high-stakes applications.</p>
<h2>Main Use-Cases</h2>
<ul>
<li><strong>Complex Queries:</strong> Addresses intricate questions requiring synthesis from multiple sources.</li>
<li><strong>Dynamic Information:</strong> Handles rapidly changing data environments by incorporating real-time updates.</li>
<li><strong>High-Accuracy Needs:</strong> Reranking and injection ensure more precise and relevant answers.</li>
<li><strong>Auditable Decisions:</strong> Provides clear lineage and context for generated responses, crucial for compliance and debugging.</li>
</ul>
<h2>Architecture Overview</h2>
<p>The process begins with a User Query, which is first processed by a Query Transformer to refine or enhance it for more effective retrieval. The transformed query is then passed to a Query Router that decides which knowledge sources to target. For unstructured data, the ingestion pipeline remains the same as in the foundational RAG architecture (documents split, embedded, and stored in a vector store), but contextual RAG extends retrieval to multiple sources such as structured databases, APIs, and search indexes.</p>
<p>The <strong>Query Router</strong> is responsible for directing the query to multiple retrieval sources simultaneously. These sources include:</p>
<ul>
<li><strong>Vector Retriever:</strong> Retrieves information based on semantic similarity from a vector store.</li>
<li><strong>Web/Search Retriever:</strong> Gathers information from the web or external search engines.</li>
<li><strong>Database Retriever:</strong> Extracts relevant data from structured databases.</li>
<li><strong>Full-Text Retriever:</strong> Performs keyword-based searches across a corpus of documents.</li>
</ul>
<p>All the information retrieved from these diverse sources is then fed into an <strong>Aggregator/Reranker</strong>. This component combines and prioritizes the retrieved content based on relevance to the original query.</p>
<p>The aggregated and reranked content is passed to a <strong>Content Injector (Prompt Builder)</strong>. This component constructs an Enhanced Prompt for the Large Language Model (LLM) by incorporating the retrieved context alongside the original user query.</p>
<p>Finally, the LLM processes the <strong>Augmented Prompt</strong>, using the provided context to generate an answer. Alongside the answer, the system can return the retrieved source segments for transparency and verification, though these should be considered supporting context rather than strict citations.</p>
<img class="light-only" src="{{site.baseurl}}/assets/images/ai/contextualrag-query.png" alt="Contextual RAG query image">
<img class="dark-only" src="{{site.baseurl}}/assets/images/ai/contextualrag-query-dark.png" alt="Contextual RAG query image">
<h2>Scalability & Performance</h2>
<p>Efficiently scaling and optimizing the performance of your AI solutions are crucial for enterprise adoption and operational success. While this blueprint only gives high-level guidance, we strongly recommend also looking into the non-functional aspects of your solution and ways to address these concerns:</p>
<ul>
<li><strong>Domain/Tenant Sharding:</strong> Partition vector stores and indexes by domain or tenant so queries run against smaller, isolated datasets.</li>
<li><strong>Caching:</strong> Cache query vectors and top-K hits for improved performance.</li>
<li><strong>Asynchronous Ingestion:</strong> Utilize asynchronous ingestion to batch embeddings and stream deltas.</li>
<li><strong>Lean Prompts:</strong> Prioritize token budget for context, keeping prompts concise.</li>
</ul>
<h2>Security</h2>
<p>Architecting secure enterprise AI solutions demands a proactive approach to safeguard sensitive data and preserve organizational integrity. Below are initial security considerations and architectural patterns you should investigate further when building your solution.</p>
<ul>
<li><strong>Authorization at retrieval:</strong> Before injecting context, filter by user/tenant claims (see the sketch after this list).</li>
<li><strong>Audit lineage:</strong> Store the chunk→document→source linkage with timestamps.</li>
<li><strong>PII controls:</strong> Redact or mask sensitive spans before embedding and prompting.</li>
<li><strong>Guard responses:</strong> Post-filter for data leakage and policy violations.</li>
</ul>
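<p>As one possible realization of retrieval-time authorization, LangChain4j's embedding-store retriever can apply a per-request metadata filter so that out-of-tenant chunks never reach the prompt. The tenant metadata key and the way the tenant id is obtained are assumptions for illustration.</p>
<pre><code class="language-java">
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

public class TenantScopedRetrieval {

    // Illustrative helper: build a retriever that only returns chunks whose
    // metadata matches the calling user's tenant claim.
    static ContentRetriever tenantScopedRetriever(EmbeddingStore&lt;TextSegment&gt; embeddingStore,
                                                  EmbeddingModel embeddingModel,
                                                  String tenantId) {
        return EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(5)
                // Evaluated on every query, before any content reaches the prompt.
                .dynamicFilter(query -> metadataKey("tenant").isEqualTo(tenantId))
                .build();
    }
}
</code></pre>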
</div>
</div>
</div>
40 changes: 40 additions & 0 deletions _includes/ai-frozenrag.html
@@ -0,0 +1,40 @@
<div class="full-width-bg component">
<div class="grid-wrapper">

<div class="width-12-12 width-12-12-m">
<h1>Frozen RAG (Retrieval-Augmented Generation)</h1>
<p>Integrate RAG to anchor Large Language Model (LLM) responses in your enterprise data, with Quarkus handling ingestion pipelines, query execution, embedding generation, context retrieval, and seamless LLM interaction. This blueprint focuses on the foundational RAG pattern (also called frozen RAG); more advanced contextual RAG variants, including multi-source routing and reranking, are covered separately.</p>
<h2>Main Use-Cases</h2>
<ul>
<li><strong>Reduced Hallucinations:</strong> RAG ensures that LLM answers are explicitly tied to enterprise-specific sources such as policies, manuals, or knowledge bases. This grounding reduces the risk of fabricated or misleading responses and increases trust in AI-assisted decision-making.</li>
<li><strong>Up-to-Date Information:</strong> Because the retrieval step pulls directly from current document repositories and databases, responses adapt automatically as content evolves. There is no need to retrain or fine-tune the underlying model whenever business data changes.</li>
<li><strong>Cost Efficiency:</strong> By retrieving only the most relevant context chunks, prompts stay concise. This reduces token usage in LLM calls, which directly lowers cost while preserving accuracy and completeness.</li>
<li><strong>Java-Native Enterprise Integration:</strong> Quarkus provides a first-class runtime for embedding RAG workflows into existing enterprise systems. Developers can secure RAG services with OIDC or LDAP, expose them through familiar REST or Kafka APIs, and monitor them with Prometheus and OpenTelemetry. Because RAG runs inside the same application fabric as other Java services, it fits naturally into existing authentication, authorization, deployment, and observability workflows. This ensures AI augmentation is not just added, but part of the enterprise architecture.</li>
</ul>
<h2>Architecture Overview</h2>
<p>Frozen RAG focuses on integrating Retrieval-Augmented Generation (RAG) to ground Large Language Model (LLM) responses in organizational data.</p>
<p>The architecture is divided into two main phases:</p>
<p><strong>Ingestion:</strong> This phase prepares enterprise knowledge for retrieval. In a frozen RAG setup, data typically originates from unstructured document sources such as manuals, PDFs, or reports.</p>
<ul>
<li>Documents are processed by a "Text Splitter" to break them into smaller chunks.</li>
<li>These chunks are then converted into numerical representations (embeddings) using an "Embedding Model".</li>
<li>The embeddings are stored in a "Vector Store" for semantic similarity searches.</li>
<li>Metadata about the documents, such as lineage and other relevant information, is stored in a "Metadata Store".</li>
</ul>
<img class="light-only" src="{{site.baseurl}}/assets/images/ai/frozenrag-ingestion.png" alt="Frozen RAG ingestion image">
<img class="dark-only" src="{{site.baseurl}}/assets/images/ai/frozenrag-ingestion-dark.png" alt="Frozen RAG ingestion image">
<p><strong>Query:</strong> This phase handles user queries and generates grounded answers.</p>
<ul>
<li>A "User Query" is received and processed by a "Query Embedding" component to create an embedding of the query.</li>
<li>The query embedding is used in a "Similarity Search" against the "Vector Store" to retrieve the most relevant document chunks.</li>
<li>The retrieved chunks, along with metadata from the "Metadata Store" (which acts as the "source of truth"), are assembled into a "Context Pack".</li>
<li>The "Context Pack" is used by a "Prompt Assembly" component to construct an "Enhanced Prompt" that includes the relevant context.</li>
<li>The "Enhanced Prompt" is fed into an "LLM (LangChain4j)".</li>
<li>The LLM generates a "Grounded Answer" based on the provided context.</li>
</ul>
<img class="light-only" src="{{site.baseurl}}/assets/images/ai/frozenrag-query.png" alt="Frozen RAG query image">
<img class="dark-only" src="{{site.baseurl}}/assets/images/ai/frozenrag-query-dark.png" alt="Frozen RAG query image">
<p>This two-phase approach allows for reduced hallucinations in LLM responses, up-to-date information without retraining, cost efficiency by retrieving only relevant information, and seamless integration with existing enterprise Java services and workflows.</p>
</div>
</div>
</div>