main-english.tex

% !TeX spellcheck = en-US
% !TeX encoding = utf8
% !TeX program = pdflatex
% !BIB program = biber
% -*- coding:utf-8 mod:LaTeX -*-
% !TEX root = ./main-english.tex

% vv  scroll down to line 200 for content  vv


\let\ifdeutsch\iffalse
\let\ifenglisch\iftrue
\input{pre-documentclass}
\documentclass[
  % fontsize=11pt is the standard
  a4paper,  % Standard format - only KOMAScript uses paper=a4 - https://tex.stackexchange.com/a/61044/9075
  twoside,  % we are optimizing for both screen and two-side printing. So the page numbers will jump, but the content is configured to stay in the middle (by using the geometry package)
  bibliography=totoc,
  %               idxtotoc,   %Index ins Inhaltsverzeichnis
  %               liststotoc, %List of X ins Inhaltsverzeichnis, mit liststotocnumbered werden die Abbildungsverzeichnisse nummeriert
  headsepline,
  cleardoublepage=empty,
  parskip=half,
  %               draft    % um zu sehen, wo noch nachgebessert werden muss - wichtig, da Bindungskorrektur mit drin
  draft=false
]{scrbook}
\input{config}


\usepackage[
  title={Implementing Variational Quantum Algorithms as Compositions of Reusable Microservice-based Plugins},
  author={Matthias Weilinger},
  type=master,
  institute=iaas, % or other institute names - or just a plain string using {Demo\\Demo...}
  course={Informatik},
  examiner={Prof.\ Dr.\ Dr.\ h.\ c.\ Frank Leymann},
  supervisor={M.Sc.\ Philipp Wundrack,\\M.Sc.\ Fabian Bühler},
  startdate={April 19, 2023},
  enddate={October 19, 2023}
]{scientific-thesis-cover}
\usepackage{ifoddpage}

\input{acronyms}

\makeindex

\begin{document}

%tex4ht-Konvertierung verschönern
\iftex4ht
  \Configure{$}{\PicMath}{\EndPicMath}{}
  \Css{body {text-align:justify;}}

  %conversion of .pdf to .png
  \Configure{graphics*}
  {pdf}
  {\Needs{"convert \csname Gin@base\endcsname.pdf
      \csname Gin@base\endcsname.png"}%
    \Picture[pict]{\csname Gin@base\endcsname.png}%
  }
\fi

%\VerbatimFootnotes %verbatim text in Fußnoten erlauben. Geht normalerweise nicht.

\input{commands}
\pagenumbering{arabic}
\Titelblatt

%Eigener Seitenstil fuer die Kurzfassung und das Inhaltsverzeichnis
\deftriplepagestyle{preamble}{}{}{}{}{}{\pagemark}
%Doku zu deftriplepagestyle: scrguide.pdf
\pagestyle{preamble}
\renewcommand*{\chapterpagestyle}{preamble}


%Kurzfassung / abstract
%auch im Stil vom Inhaltsverzeichnis

\section*{Abstract}
With its transformative processing capabilities, Quantum computing has ushered in a new computational era, presenting unparalleled opportunities and intricate challenges.
One potential beneficiary of this quantum revolution is the Digital Humanities. With quantum computing, the field has the potential to enhance its quantitative analysis dramatically.
QHAna, the Quantum Humanities Analysis tool explicitly designed for Quantum Digital Humanities, emerges as a pivotal system.
This thesis focuses on enhancing QHAna by integrating variational algorithms and paving the way for Variational Quantum Algorithms in a modular manner.
The objective is to encapsulate components of variational algorithms as distinct, interchangeable plugins, ensuring adaptability and enabling end users to adapt the algorithms.
Addressing challenges like robust plugin communication and intuitive user experience, the research delves into this modular framework's design, implementation, and evaluation.
Beyond the immediate application to Variational Quantum Algorithms, the insights and methodologies derived here lay the foundational groundwork for future modular system designs in the quantum computing domain.

% \section*{Kurzfassung}
% Quantum computing, mit ihren transformativen Verarbeitungsfähigkeiten, hat eine neue Ära der Berechnung eingeläutet und bietet sowohl noch nie dagewesene Chancen als auch komplexe Herausforderungen.
% Ein bedeutender Profiteur dieser Quantenrevolution ist die \gls{dh}, die durch digitale Werkzeuge ihrer traditionell qualitativen Domäne eine quantitative Dimension hinzugefügt hat.
% \gls{qhana}, speziell für \gls{qdh} entwickelt, tritt in diesem Kontext als ein zentrales System hervor.
% Diese Arbeit konzentriert sich auf die Verbesserung von \gls{qhana} durch die Integration von variationalen Algorithmen, insbesondere \glspl{vqa}, auf modulare Weise.
% Das Ziel ist es, Komponenten von variationalen Algorithmen als eigenständige, austauschbare Plugins zu kapseln, um Anpassungsfähigkeit und Benutzerzentriertheit zu gewährleisten.
% Die Forschung geht auf Herausforderungen wie robuste Plugin-Kommunikation und intuitive Benutzererfahrung ein und vertieft das Design, die Implementierung und die Bewertung dieses modularen Frameworks.
% Jenseits von \gls{vqa} legen die Ergebnisse eine Grundlage für das modulare Systemdesign in \gls{qdh}.

\cleardoublepage


% BEGIN: Verzeichnisse

\iftex4ht
\else
  \microtypesetup{protrusion=false}
\fi

%%%
% Literaturverzeichnis ins TOC mit aufnehmen, aber nur wenn nichts anderes mehr hilft!
% \addcontentsline{toc}{chapter}{Literaturverzeichnis}
%
% oder zB
%\addcontentsline{toc}{section}{Abkürzungsverzeichnis}
%
%%%

%Produce table of contents
%
%In case you have trouble with headings reaching into the page numbers, enable the following three lines.
%Hint by http://golatex.de/inhaltsverzeichnis-schreibt-ueber-rand-t3106.html
%
%\makeatletter
%\renewcommand{\@pnumwidth}{2em}
%\makeatother
%
\tableofcontents

% Bei einem ungünstigen Seitenumbruch im Inhaltsverzeichnis, kann dieser mit
% \addtocontents{toc}{\protect\newpage}
% an der passenden Stelle im Fließtext erzwungen werden.

\listoffigures
\listoftables

%Wird nur bei Verwendung von der lstlisting-Umgebung mit dem "caption"-Parameter benoetigt
\lstlistoflistings
%ansonsten:
  % \listof{Listing}{List of Listings}

%mittels \newfloat wurde die Algorithmus-Gleitumgebung definiert.
%Mit folgendem Befehl werden alle floats dieses Typs ausgegeben
% \listof{Algorithmus}{List of Algorithms}
%\listofalgorithms %Ist nur für Algorithmen, die mittels \begin{algorithm} umschlossen werden, nötig

% Abkürzungsverzeichnis
\printnoidxglossaries

\iftex4ht
\else
  %Optischen Randausgleich und Grauwertkorrektur wieder aktivieren
  \microtypesetup{protrusion=true}
\fi

% END: Verzeichnisse


% Headline and footline
\renewcommand*{\chapterpagestyle}{scrplain}
\pagestyle{scrheadings}
\pagestyle{scrheadings}
\ihead[]{}
\chead[]{}
\ohead[]{\headmark}
\cfoot[]{}
\ofoot[\usekomafont{pagenumber}\thepage]{\usekomafont{pagenumber}\thepage}
\ifoot[]{}


%% vv  scroll down for content  vv %%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Main content starts here
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Introduction}
\label{chap:introduction}

The dawn of quantum computing has revolutionized the landscape of computational research.
With its unparalleled processing capabilities and the potential to solve problems deemed impractical for classical computers, quantum computing heralds a new era of possibilities \cite{Shor}.
However, with these possibilities come challenges that demand innovative solutions, particularly in harnessing the power of quantum algorithms for diverse applications.

One domain that stands to benefit immensely from quantum computing is the \gls{dh}.
Traditionally, the humanities have been viewed through a lens of qualitative analysis.
However, with the advent of digital tools and methodologies, a quantitative dimension has emerged, enabling researchers to analyze vast datasets, uncover patterns, and derive insights that were previously unavailable \cite{Barzen2019}.
With its inherent strengths, Quantum computing offers the potential to elevate the \glspl{dh} to new heights, enabling analyses of unprecedented complexity and depth \cite{Barzen2022}.

Enter \gls{qhana} \cite{Buehler2022}.
Initially conceived and meticulously crafted for \gls{qdh}, its applicability has evolved, proving valuable not just for \glspl{dh} but for broader quantum computing applications.
The foundational architecture of the system is both robust and adaptable.
A logical extension of this framework is the integration of variational algorithms, particularly \glspl{vqa}, in a modular fashion.
The objective is to encapsulate components of variational algorithms as modular and interchangeable plugins within \gls{qhana}.
Such a modular approach promises enhanced adaptability towards a user-centric platform where researchers can precisely tailor their quantum optimization strategies to their needs.

However, the path to realizing this vision is full of challenges.
As plugins are essentially microservices, \gls{qhana} is a distributed system that brings its own challenges.
Integrating optimization as multiple plugins demands a robust communication mechanism, ensuring seamless interaction and data sharing.
Furthermore, the user experience must remain intuitive, allowing both quantum novices and experts to harness the full potential of the modular quantum optimization framework.

This work embarks on a journey to navigate those challenges. It aims to design and implement a modular framework within \gls{qhana}, delving into the intricacies of the unique plugin-based architecture of \gls{qhana}, the nuances of quantum optimization algorithms, and the challenges of ensuring seamless interaction between distinct components.
Moreover, the implications of this research extend beyond the limits of \gls{vqa} in \gls{qhana}.
By pioneering a methodology for \gls{vqa} plugin interactions within \gls{qhana}, this work lays the foundation for a broader paradigm of modular system design in the \gls{qdh}.

The thesis undertakes a comprehensive journey through modular \glspl{vqa} within \gls{qhana}.
Chapter \ref{chap:background} lays the groundwork by delving into the foundational principles and concepts that underpin the entire thesis.
Chapter \ref{chap:problem} sharpens the focus, presenting a clear and concise problem statement and outlining the objectives aimed to be achieved.
With a clear understanding of the problem at hand, Chapter \ref{chap:methodology} dives deep into the methodology.
The chapter explores the strategies and approaches employed to develop and evaluate the modular framework within \gls{qhana}.
Chapter \ref{chap:architecture} unveils the architectural blueprint of the modular framework.
Moving from design to implementation, Chapter \ref{chap:implementation} delves into the details of the framework's implementation.
In Chapter \ref{chap:results}, the design and implementation are put to the test. The architectural design's efficacy and real-world implementation are measured through rigorous evaluations, drawing insights from the results.
Chapter \ref{chap:discussion} delves into a thorough analysis of the findings and checks if set problems are adequately solved.
To place the work in a broader context, Chapter \ref{chap:relatedWork} explores the landscape of related research.
Finally, Chapter \ref{chap:conclusion} brings the journey to an end, summarizing the resulting framework and looking ahead to future possibilities and roads for further extension.


\chapter{Background}
\label{chap:background}

The study of \gls{qhana} and its advanced plugin interactions is rooted in foundational principles underpinning its functionality.
This chapter aims to offer a comprehensive introduction to these principles, encompassing key areas such as optimization algorithms, quantum computing, \glspl{vqa}, \gls{rest}, and the core architecture of \gls{qhana}.
By delving into these foundational topics, readers will gain the necessary context to understand the innovative approaches adopted in this thesis.

\section{QHana}
\label{sec:qhana}

\gls{qhana} was conceived as a specialized application in the domain of \gls{dh} and has since evolved into a versatile platform for quantum computing applications.
Its primary design allows users to explore various machine-learning algorithms on designated datasets.
While the primary vision of \gls{qhana} is to assess the potential advantages of quantum algorithms within the \gls{dh} community, the rise and cloud availability of quantum computers \cite{Castelvecchi2017} further underscore its relevance and timeliness.

\gls{qhana} is designed to be extensible, allowing the integration of new data sources and quantum algorithms as plugins.
Usually, plugins are built for specific applications, limiting their reusability in other contexts.
Moreover, an application's plugins must be developed in the same programming phrasing as the application.
Even if another application can reuse a plugin, a developer has to adapt the plugin's \gls{ui} to the new application.
Otherwise, users may need help understanding the plugin's functionality.
To address this limitation, \gls{qhana} is built on a novel concept of \glspl{ramp} \cite{Buehler2022}.
This concept allows microservices with an integrated \gls{ui} to be used as plugins by multiple applications, enhancing the reusability of the plugins across different applications.

\subsection{QHAna's Architecture}
\label{subsec:qhanaArchitecture}

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/qhanaarch.png}
  \caption{\gls{qhana}'s Architecture as depicted in \cite{Buehler2022}}
  \label{fig:qhanaarch}
\end{figure}

QHAna's architecture, as depicted in Figure \ref{fig:qhanaarch}, is primarily microservice-oriented, emphasizing modularity and extensibility, which is evident from its use of \glspl{ramp} and the concept of \emph{microfrontends}.
Such an architectural choice is geared towards ensuring scalability and adaptability in the rapidly evolving field of quantum computing.
Key components of \gls{qhana}'s architecture are:
\begin{description}
  \item[QHAna UI] Developed using Angular and TypeScript, the \gls{ui} serves as the primary interface for users, facilitating interaction with the system's functionalities.
  \item[QHAna Backend] The backend provides essential services, including a \gls{rest} API for the \gls{ui}, data management, and functions as a \emph{\glspl{ramp} Registry} to discover and load \glspl{ramp}.
  \item[RAMPs] These plugins encapsulate specific classical and quantum algorithms. Their development is streamlined by the Plugin Runner, a Python framework designed to handle generic \gls{ramp} tasks.
  \item[Database] This component is responsible for the persistent storage of data related to the experiments conducted within \gls{qhana}.
\end{description}

The \emph{\gls{qhana} \gls{ui}} offers users a selection of \glspl{ramp}.
When a \gls{ramp} is selected, its microfrontend is embedded within the \gls{ui} through an iframe.
The \emph{\gls{qhana} Backend}, central to the system, manages data storage, acts as a \gls{ramp} registry, and provides necessary services to the \gls{ui}.
\glspl{ramp}, designed for modularity, can be developed independently and integrated into \gls{qhana}.
The \emph{Plugin Runner} demands the adherence of \glspl{ramp} to system standards.
Data from external sources is processed by \glspl{ramp} and stored in the database.

\subsection{RAMPs: Bridging UI and Data Processing}
\label{subsec:ramps}

\glspl{ramp} are designed to bridge the gap between traditional plugins and microservices.
They offer a comprehensive approach to user interaction and data processing, distinguishing themselves from conventional plugins.
One of the primary advantages of \glspl{ramp} is their ability to integrate seamlessly with \glspl{ui}.
Unlike traditional plugins, which might require modifications to fit an application's \gls{ui}, \glspl{ramp} come with context-sensitive microfrontends.
This ensures that the plugin's \gls{ui} aligns effortlessly with the primary application's interface, enhancing user experience.
Furthermore, \glspl{ramp} introduce the possibility of multi-step \gls{ui} interactions, where users can be presented with sequential interfaces based on prior inputs or processing results, enhancing the depth and interactivity of the user experience.

In summary, \glspl{ramp} blend the best of microservices and plugins.
They combine the reusability and accessibility of microservices with the \gls{ui} adaptability of plugins, making them a valuable asset in diverse application settings.


\section{Quantum Computing}
\label{sec:quantumComputing}
Quantum computing is a cutting-edge field that exploits the principles of quantum mechanics to process information.
Unlike classical computers that use bits (0s and 1s) to store and process information, quantum computers use quantum bits, or \emph{qubits}.
Through superposition and entanglement, qubits can exist in multiple states simultaneously and correlate in ways that classical bits cannot \cite{Nielsen2010}.

Superposition allows a qubit to be in a state of a combination of 0 and 1, with a certain probability for each.
This property enables quantum computers to perform many calculations simultaneously, vastly increasing their potential computational power.
Entanglement allows qubits to be intimately linked regardless of the distance separating them.
A change in the state of one will instantaneously affect the state of the other, a phenomenon that Einstein famously called spooky action at a distance \cite{Einstein1935}.
This property is essential for many quantum algorithms, quantum error correction, and quantum teleportation, making it a fundamental resource in quantum information processing \cite{Nielsen2010, Preskill1998}.


\section{Variational Quantum Algorithms}
\label{sec:variationalQuantumAlgorithms}

\glspl{vqa} combine the principles of quantum computing and optimization uniquely and powerfully.
They are a class of hybrid quantum-classical algorithms that leverage the strengths of both quantum and classical computing to solve complex problems \cite{McClean2016}.

The central concept of \glspl{vqa} is to use a sequence of quantum operations (a \emph{quantum circuit}) controlled by specific parameters.
These parameters are adjusted using classical optimization techniques to solve a specific problem.
This problem, in many cases, involves finding the lowest energy state, or \emph{ground state}, of a quantum system.
This problem maps to finding the minimum of a particular function \cite{Peruzzo2013}.

By leveraging classical optimization algorithms, \glspl{vqa} become more resistant to quantum errors, as classical computers perform most of the computation.
This combination of quantum and classical resources makes \glspl{vqa} a promising algorithm for near-term quantum devices \cite{Moll2017}.


\section{Optimization Algorithms}
\label{sec:optimizationAlgorithms}
Optimization is a powerful tool ubiquitous in various scientific and technological domains.
At its core, optimization is about finding the best solution from a set of possible choices.
This section provides a snapshot of optimization's fundamental principles, paving the way for deeper exploration in the context of \glspl{vqa} and plugin-based \glspl{vqa}.

\subsection{Objective Functions}
\label{subsec:objectiveFunctions}
\glspl{of} are fundamental to optimization problems, underpinning many computational algorithms and models.
Depending on specific requirements, the aim might be to minimize or maximize these functions.
Notably, within the realm of \glspl{vqa}, the focus is primarily on function minimization \cite{Weinan2017}.

The core inputs to a \gls{of} typically encompass data points (denoted as $x$), corresponding labels or outcomes (represented by $y$), and a set of parameters or weights (often symbolized by $\theta$ or $w$).
These parameters dictate how the model responds to the input data and makes predictions.
Additionally, certain \glspl{of} may also include hyperparameters as input, which control the behavior and complexity of the model.
In the context of optimization problems, the role of an \gls{of} is to capture both the problem we are attempting to solve and the strategy by which we are trying to solve it.
It provides a measure of the 'goodness' or 'fitness' of our current solution or parameters, and the aim is to adjust these parameters to improve this measure.

One example of an \gls{of} is the Lasso (Least Absolute Shrinkage and Selection Operator) Loss function.
The Lasso loss function has the form:
\[
L(Y, X, W, \lambda) = ||Y - XW||^2_2 + \lambda ||W||_1
\]
In this equation:

\begin{itemize}
  \item \(Y\) is the vector of observed values.
  \item \(X\) is the matrix of input data points.
  \item \(W\) is the vector of weights, the model's parameters.
  \item \(\lambda\) is the regularization parameter, a non-negative hyperparameter.
\end{itemize}

This function consists of two terms:
\begin{enumerate}
  \item The first term \(||Y - XW||^2_2\) is the mean squared error between the predicted and actual outcomes.
  It measures the discrepancy between the model's predictions and the true values.
  \item The second term \(\lambda ||W||_1\) is a regularization term, where \(||W||_1\) represents the L1 norm (sum of absolute values) of the weight vector.
  This term penalizes the absolute size of the coefficients, encouraging them to be small.
\end{enumerate}
The hyperparameter \(\lambda\) governs the trade-off between these two terms.
When \(\lambda = 0\), the \gls{of} reduces to ordinary least squares regression, and the weights are chosen to minimize the mean squared error alone.
As \(\lambda\) increases, more weight is given to the regularization term, and the solution becomes more sparse (i.e., more weights are driven to zero).
Increasing the regularization term can help to prevent overfitting by effectively reducing the complexity of the model \cite{ShalevShwartz2014}

\subsection{Minimization Functions}
\label{subsec:minimizationFunctions}
Minimization functions, generally called optimization algorithms, are pivotal in many computational models and algorithms.
In essence, they serve to iteratively enhance the parameters of a model to reduce the value of the \gls{of}.
These minimization functions aim to find the optimal set of parameters that yield the lowest possible value of the \gls{of} within the constraints of the problem \cite{Nocedal2006}.

The process of optimization involves a search through the parameter space.
This search can be visualized as navigating a landscape of hills and valleys, with each point in the landscape corresponding to a different set of parameters and the height at each point representing the value of the \gls{of}.
The goal of the minimization function is to find the lowest point in this landscape, corresponding to the minimum value of the \gls{of} \cite{Goodfellow2017}.

The core inputs to a minimization function are the initial parameters of the model or weights (denoted as \(\theta\) or \(w\)),
the \gls{of} that needs to be minimized, and optionally, the gradient of the \gls{of}.
Additionally, certain minimization functions may include hyperparameters as input, which control the behavior and complexity of the optimization process \cite{Virtanen2020}.
For instance, the learning rate is a typical hyperparameter that determines the step size in each iteration of the optimization process.

One of the most fundamental and widely used minimization functions is the Gradient Descent.
To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the function's gradient (or approximate gradient) at the current point.

The update rule of gradient descent has the form:

\[
\theta_{t+1} = \theta_t - \alpha \nabla F(\theta_t)
\]

In this formula:

\begin{itemize}
  \item \(\theta_{t+1}\) represents the parameters at the next time step.
  \item \(\theta_t\) represents the current parameters.
  \item \(\alpha\) is the learning rate, a positive scalar determining the step size.
  \item \(\nabla F(\theta_t)\) is the gradient of the \gls{of}.
\end{itemize}

Here, the \gls{of} \(F\) is assumed to be a differentiable function.
The gradient \(\nabla F(\theta_t)\) provides the direction of the steepest ascent at the point \(\theta_t\), and \(-\nabla F(\theta_t)\) provides the direction of the steepest descent.
We move towards the minimum of the function by taking a step in this direction.

The choice of minimization function can significantly influence the efficiency and success of the optimization process.
While some minimization functions may perform well on certain problems, they may yield different results on others.
Therefore, it is crucial to understand the underlying mechanisms of these functions and their suitability to the specific problem at hand.


\chapter{Problem Statement and Objectives}
\label{chap:problem}

Optimization algorithms, with their ability to find the best possible solution from a set of feasible solutions, play a pivotal role in numerous computational domains.
\glspl{vqa} are a subset of these algorithms that leverage quantum computing principles, particularly in the realm of \glspl{of} and minimization techniques.
However, the true potential of optimization is often hindered by rigid platforms where the components of these algorithms are tightly integrated, limiting adaptability and innovation.

\gls{qhana}, with its unique environment tailored for experimenting with plenty of machine learning and quantum algorithms, presents an opportunity to redefine this paradigm.
However, its current architecture does not fully exploit the modular benefits that can be achieved by decoupling the components of optimization algorithms.
Furthermore, while developing this modular framework, allowing plugins to interact with each other is essential.
This interaction-centric concept, once established, can be universally applied across \gls{qhana}, not just for optimization but for any scenario where plugin interaction is required.

\textbf{Problem Statement:}
How can we design and implement a modular framework within \gls{qhana} that allows components of optimization algorithms, specifically \glspl{of} and minimization functions, to be implemented as distinct, interchangeable plugins?
Furthermore, how can these plugins be structured to communicate and collaborate seamlessly, especially in the context of \glspl{vqa}?

This problem encompasses several challenges:

\begin{itemize}
    \item \textbf{Communication:} Establishing a robust communication mechanism that enables interaction, data sharing, and collaboration among these plugins.
    \item \textbf{Interchangeability:} Designing a system where different \gls{of} and minimization plugins can be effortlessly swapped, ensuring adaptability in optimization and \glspl{vqa}.
    Implementing a consistent interface for these plugins, ensuring uniformity and compatibility across various \gls{of} and minimization plugins.
    The design should ensure that interchangeability accommodates future use cases beyond the optimization context.
    \item \textbf{User Experience:} Providing an intuitive \gls{ui} for users to easily choose and experiment with different optimization components tailored to their needs.
    \item \textbf{Developer Experience:} Providing developers with an extensible framework to build new minimization and \gls{of} plugins.
  \end{itemize}

Addressing this problem is essential to enhance the capabilities of \gls{qhana}, transforming it into a dynamic, adaptable, and user-centric platform for experimenting with optimization and \glspl{vqa}.
When solving this problem, technical requirements and constraints must be considered.
\gls{qhana}, with its design centered on the Digital Humanities, provides an extensible platform for experimenting with various algorithms.
\gls{qhana}'s architecture predominantly revolves around the concept of \glspl{ramp} \cite{Buehler2022}.
The objective is to leverage \gls{qhana}'s inherent modularity by enabling components of optimization algorithms to function as distinct and interchangeable plugins.
This brings us to the significance of modularity in optimization.
When \glspl{vqa} are designed with modular components, it allows for increased flexibility.
Specifically, having distinct \glspl{of} and minimization functions means parts of the algorithm can be adjusted or replaced seamlessly.
This reflects the goals of modularity and flexibility intrinsic to \gls{qhana}'s design.

Interactivity in \gls{qhana} should encompass various facets.
No plugin should be aware of specific user input requirements of another plugin to enable interchangeability and loose coupling between plugins.
This means that an invoking plugin should not have input fields that are specific to the invoked plugin.
Therefore, a plugin should be able to invoke the microfrontend of another plugin where the user can provide the required input.
Control should revert to the invoking plugin once the invoked plugin completes its tasks.
Interactions between plugins can have varying degrees of complexity, so the interaction mechanism should be flexible enough to accommodate different scenarios.
Both short-running tasks and long-running tasks should be facilitated.
For the latter, a callback mechanism is proposed, wherein the invoking plugin is notified upon completion of a long-running task by the invoked plugin.

The need for such interactivity stems from the inherent nature of optimization.
The \glspl{of} and minimization must closely interact to produce meaningful optimization results.
Moreover, since an invoking plugin is unaware of the parameters or requirements of the invoked plugin, direct interaction becomes imperative for dynamic data exchange and collaborative processing.

Drawing inspiration from existing systems, the interaction between the minimization and \glspl{of} in implementations like SciPy's \emph{optimizer.minimize} \cite{Virtanen2020} function serves as a precedent.

This thesis introduces the concept of \emph{interaction endpoints} to \gls{qhana} to elevate the interactivity between plugins.
While \gls{qhana} already employs a metadata field for plugins known as \emph{entry points} -- endpoints invoked by the \gls{qhana} \gls{ui} to render the \gls{ui} of a plugin -- interaction endpoints extend this paradigm by marking specific endpoints as callable by other plugins.

A defining characteristic of an interaction endpoint is its \emph{type}.
Interaction endpoints sharing the same type uphold uniformity in their signature and return type.
This typing ensures that they are invoked consistently by other plugins, irrespective of their specific implementation.
Inspiration for this typing mechanism is drawn from the OpenAPI specification \cite{Miller}, which defines a standard, language-agnostic interface for \gls{rest} APIs.

The proposed approach complements \gls{qhana}'s existing principles, resonating with its emphasis on extensibility, adaptability, and user-centric design.
By enhancing the interactivity and modularity of plugins within \gls{qhana}, we strive to elevate its capabilities, making it a more dynamic platform for optimization and \glspl{vqa}.

The subsequent sections of this thesis will delve into the methods, implementations, and evaluations related to this problem.


\chapter{Methods}
\label{chap:methodology}
This section outlines the method developed in this work for enabling modular \glspl{vqa} consisting of multiple plugins.
The chapter begins by defining the architectural design strategy and explaining the process of decomposing the optimization process into distinct plugins that interact.
Next, the implementation strategy is defined, outlining the tools and technologies employed in the implementation process.
Finally, the evaluation strategy is delineated, explaining the approach to evaluate the efficacy of the proposed design and implementation.

\section{Architectural Design Strategy}
\label{sec:architecturalDesignStrategy}

The architectural design process is a critical phase that dictates how the solution's components will function and interact with each other.
The final architectural design will be chosen by iteratively refining the existing architecture over multiple steps, considering different aspects like clarity, modularity, and extensibility.
Before diving into the details of the design process, it was essential to understand the existing architecture of \gls{qhana}.

\paragraph{Decomposition into Plugins:}
The first step involves decomposing the optimization process into distinct plugins.
Decomposition is achieved by reviewing relevant literature to understand standard practices and methods in optimization \cite{Virtanen2020, Nocedal2006, ShalevShwartz2014, Weinan2017}.
The main challenge is to find a decomposition so that, on the one hand, any functionality that should be interchangeable is encapsulated in an extra plugin,
on the other hand, splitting the process into too many plugins would be inefficient due to the overhead of the required communication between plugins.
The main question is how to encapsulate the \gls{of} and the minimization function and their coordination.
In order to represent the decomposition, a UML component diagram is created.

\paragraph{Defining Plugin Responsibilities:}
Once the plugins are identified, the next step is to define the responsibilities of each plugin precisely.
This ensures that every plugin has a well-defined purpose, preventing overlaps in functionality and ensuring clarity in their roles.
Several crucial decisions delineate these responsibilities:

\begin{itemize}
  \item Which one prompts the user for the \gls{of} hyperparameters?
  \item Which one gathers the minimization function hyperparameters from the user?
  \item Which one inquires about the user's preference for the \gls{of} and minimization algorithm?
  \item Which one solicits the input data from the user?
  \item Which one requests the target variable?
\end{itemize}

\paragraph{Universal Plugin Interface Design:}
Recognizing the vast possibilities and variations that interchangeable plugins might encompass, it is crucial to formulate a universally adhered-to interface.
This interface acts as a \emph{contract}, ensuring that a consistent mode of interaction exists irrespective of the specific implementation details of a plugin.
The notion of interface contracts in microservice architectures is not new and is inspired by the OpenAPI specification \cite{Miller}.
Core attributes and functionalities, like querying the number of initial weights for the minimization process required, are defined as part of this universal interface.
This approach fosters interchangeability and adaptability, as the defined interface could accommodate a multitude of interchangeable plugins, each with its unique implementations.
In terms of optimization, this could mean that in the case of an \gls{of} plugin responsible for calculating the loss function, interfaces must be generalized to allow for any thinkable type of loss function.
A UML component diagram visualizes the interfaces of each plugin and how they are connected.

\paragraph{Plugin Interaction Design:}
The next step is to design the interaction between plugins.
This design process involved defining the various ways in which plugins could interact with each other.
It is crucial to build a robust and flexible interaction design to accommodate many scenarios since the interaction between plugins in \gls{qhana} should apply not only to optimization.
This interaction design should lay the foundation for all future multi-plugin interactions within \gls{qhana}.

Scenarios include, for instance, a plugin invoking the microfrontend of another plugin, or it could call a specific endpoint of another plugin.
Additionally, the interaction design also encompasses the flow of control between plugins.
A plugin should be able to invoke another plugin and then stop execution for as long as the invoked plugin is running.
Alternatively, a plugin should be able to invoke another plugin and then continue with its tasks without waiting for the invoked plugin to complete its tasks.
Intrinsic to this step is the design of a coordination mechanism that facilitates the interaction between plugins.
A UML sequence diagram shows the complete process of a generalized optimization run to represent the interactions between plugins.

\paragraph{Feedback and Refinement:}
Throughout the design process, continuous feedback loops are integrated. This involves:
\begin{itemize}
\item Revisiting each stage, evaluating its alignment with overarching goals.
\item Making necessary refinements to ensure the architecture is robust and flexible.
\end{itemize}

With the architectural design strategy in place, the next step is to define the implementation strategy.

\section{Implementation Strategy}
\label{sec:implementationStrategy}

Before diving into the implementation, an in-depth study of the \gls{qhana} documentation \cite{FabianBuehler} and a related paper on \gls{qhana}'s architecture \cite{Buehler2022} was undertaken to gain a comprehensive understanding of the system.
With component and sequence diagrams already in place, the next step is to define the implementation strategy.

\subsection{Development Environment and Toolset}

The choice of development tools and environment plays a pivotal role in the successful execution and maintenance of a project.
The right toolset facilitates smooth development and enhances the system's efficiency, productivity, and scalability.
The following list delineates the tools and technologies employed in the implementation process, each chosen for its specific capabilities tailored to the requirements of the project:

\begin{itemize}
    \item \textbf{Visual Studio Code}: A versatile integrated development environment employed for its adaptability and support for Python development using extensions, facilitating comprehensive tools for coding, debugging, and testing.

    \item \textbf{Docker}: Utilized to run all components of \gls{qhana}, ensuring consistent behavior across different computing environments and simplifying the deployment process.

    \item \textbf{Postman}: An API testing tool employed to validate and debug various endpoints, ensuring consistent and expected behavior of the plugin interactions.

    \item \textbf{Python}: As \gls{qhana} is implemented in Python, which offers versatility and a vast library ecosystem.

    \item \textbf{Flask}: A lightweight Python web framework employed to develop the web services and \gls{rest}ful APIs for the plugins.

    \item \textbf{Marshmallow}: A Python library pivotal for object serialization/deserialization, ensuring structured data transfer between the plugins.

    \item \textbf{Flask-Smorest}: An extension of Flask, offering tools for building \gls{rest}ful APIs with Flask and Marshmallow, ensuring structured and accurate data transfer between plugins.

    \item \textbf{Celery}: A task queue to manage long-running tasks, particularly for minimizer plugins, allowing for asynchronous task execution.

    \item \textbf{Requests}: A Python library for interacting with HTTP endpoints. It invokes endpoints of plugins.
\end{itemize}

\subsection{Key Principles of QHana Plugins}

In the \gls{qhana} ecosystem, plugins play a pivotal role in extending functionality.
A set of principles governs the creation and integration of these plugins to ensure their seamless operation and interaction \cite{FabianBuehler}.
All plugins must adhere to these principles to ensure compatibility and consistency across the system.
To understand the decisions made in this work, consider at least the following principles:

\begin{itemize}
    \item \textbf{Plugin Definition:}
    A plugin is a Python module or package.
    Conventionally, it inherits from the \texttt{QHAnaPluginBase} class and resides in a directory specified by the \texttt{PLUGIN\_FOLDERS} configuration variable.
    The plugin imports its implementation class and all associated Celery tasks.

    \item \textbf{Plugin Metadata and Endpoints:}
    A plugin should contain metadata and links to all its endpoints. It is reachable via the "./" path.
    The metadata includes crucial information like entry points.

    \item \textbf{UI Interaction:}
    Plugins define both \texttt{href} and \texttt{hrefUi} for each plugin with an \gls{ui}.
    The `hrefUi` serves the microfrontend where users input data, and `href` is the underlying endpoint providing the functionality.

    \item \textbf{Data Handling in Multi-step Plugins:}
    Data is stored in a key-value store for plugins requiring multiple user input and processing steps.
    A plugin task is associated with a unique database ID, and subsequent endpoint URLs of a plugin typically include the ID in the URL path \texttt{http(s)://.../<ID>/<endpointName>}.
    That way, the plugin can retrieve the data from the database using the ID.

    \item \textbf{Handling Long Running Tasks:}
    For a plugin to be able to handle long-running tasks, \gls{qhana} offers the Celery framework.
    Celery is a task queue that allows for asynchronous task execution.

    \item \textbf{File Loading from URLs:}
    Plugins have to be able to load files from URLs.

    \item \textbf{Data Format Specification:}
    Data formats, especially those shared across plugins, should be defined per the guidelines of \gls{qhana}.
    For instance, for the \texttt{text/csv} format pertaining to entities:
    \begin{itemize}
        \item The first column must be the ID column (named ID). Subsequent columns represent entity attributes.
        \item The CSV file should contain a header row specifying all attribute names.
    \end{itemize}
\end{itemize}

It is imperative to note that while the outlined principles are crucial to this thesis, \gls{qhana}'s overarching documentation provides a more exhaustive list of best practices and guidelines for plugin creation \cite{FabianBuehler}.


\subsection{Data Handling and Transfer}
Flask-Smorest is instrumental in ensuring structured and accurate data transfer between plugins.
It provides functions for validating the correctness of passed data, returning appropriate error codes, and managing errors efficiently based on schemas.
This ensures that data exchanged between plugins is smooth and error-free.

\subsection{Testing and Debugging}

A multifaceted testing strategy addresses the paramount importance of the quality and reliability of plugins and their interactions:

\begin{enumerate}
    \item \textbf{Static Code Analysis}:
    \begin{itemize}
        \item \textbf{Purpose}: To ensure code quality and maintainability and to identify potential vulnerabilities or deviations from coding standards.
        \item \textbf{Tools \& Implementation}: The tool \emph{flake8}, in combination with type hints, is utilized to conduct static code analysis on the Python codebase.
        \emph{Flake8} generates a report detailing any code inconsistencies, potential errors, or areas for improvement and provides valuable feedback for refinement.
    \end{itemize}

    \item \textbf{Logging and Monitoring}:
    \begin{itemize}
        \item \textbf{Purpose}: To capture, store, and analyze real-time information about the system's operations, aiding in troubleshooting and understanding system behavior.
        \item \textbf{Tools \& Implementation}: Python's in-built \texttt{logging} package is leveraged to track and record various events during the execution of plugins.
        By strategically placing logging statements within the code, it is possible to gain insights into the flow of operations, detect anomalies, and pinpoint areas that might require attention.
    \end{itemize}

    \item \textbf{Interactive Debugging}:
    \begin{itemize}
        \item \textbf{Purpose}: To step through the code in real-time, inspect variables, and analyze the program's flow to identify and rectify issues.
        \item \textbf{Tools \& Implementation}: The integrated Python debugger in Visual Studio Code is employed.
        This debugger allowed for setting breakpoints, stepping through code, inspecting variable states, and examining the call stack, providing a granular view of the system's operations and aiding in issue identification and resolution.
    \end{itemize}

    \item \textbf{Manual Testing}:
    \begin{itemize}
        \item \textbf{Purpose}: To capture nuances and potential issues that might be overlooked.
        \item \textbf{Procedure}: A hands-on approach is adopted where the plugins are interactively used.
        This involved navigating through the \gls{ui}, experimenting with different inputs, and observing the system's reactions to ensure it behaved as expected and met user requirements.
    \end{itemize}
\end{enumerate}

Employing a combination of static code analysis, detailed logging, interactive debugging, and manual testing ensures functional correctness and that plugins adhere to coding standards and best practices.


\subsection{Performance Optimization}
The system employs several strategies to optimize performance in the microservice-based plugin architecture:

\begin{enumerate}
    \item Reducing the number of interactions between plugins.
    \item For each interaction, the amount of data transmitted across the network should be kept to a minimum to ensure efficient communication and faster response times.
    \item For tasks that require extended processing, the Celery framework should be utilized, allowing these tasks to operate asynchronously and optimizing resource usage.
\end{enumerate}

These strategies are critical in ensuring swift and seamless interactions between plugins.


\subsection{Documentation and Extensibility}
The code includes comprehensive documentation that details how to add new \gls{of} and minimization plugins, facilitating the development of new plugins.
This documentation serves as a guideline for developers aiming to extend the capabilities of \gls{qhana} with plugin interaction.


\section{Evaluation Strategy}
\label{sec:evaluationStrategy}

To validate the effectiveness of the proposed solutions and to assess whether the goals set out in the problem statement have been achieved, the following evaluation methods are adopted:

\paragraph{Performance Benchmarking:}
\label{subsec:performanceBenchmarking}
Performance is paramount, especially in a plugin-based architecture.
A direct comparison shows the differences between the two plugin-based and non-service-based approaches.
Critical metrics for this comparison include:

\begin{itemize}
\item \textbf{Objective Function Calculation Time}: This measures the time to retrieve the loss, directly impacting the overall optimization time.
For the plugin-based approach, this is the time, as observed by the client calling the calculation endpoint, meaning it includes the network latency.
As all system components run on the same machine, network, in this case, means the \emph{localhost}.

\item \textbf{Minimization Time}: This refers to the time needed to minimize the \gls{of}.
The service-based approach includes the time taken for the minimization endpoint to return a result via the network.

\item \textbf{Network Latency}: This exclusively quantifies the time required for a request to reach the server and for the corresponding response to be received.
This metric does not include any computation time that occurs on the server.
This metric is exclusive to the service-based approach.

\item \textbf{Database Access Times}: It is essential to gauge the time required to retrieve data from the database since endpoints access context data during each invocation.
\end{itemize}

This evaluation hinges on quantitative metrics, with results graphically depicted for enhanced clarity.
The \emph{time.perf\_count} function from Python measures the time taken for a function to execute, providing a reliable and accurate measure of performance.

\paragraph{Interchangeability:}
The accurate measure of interchangeability is the ability to swap components without causing disruptions.
Accordingly, all allowed combinations of plugins were tested to validate seamless interchangeability.

\paragraph{Standardization Adherence:}
Standardization is checked to ensure compatibility and uniformity across diverse plugins.
An evaluation determines whether all implemented plugins conform to the prescribed standards.
This guarantees that all possible optimization plugins can be implemented uniformly, facilitating consistent and compatible integrations in the future.

\paragraph{Developer Experience:}
The developer experience has to be maintained.
To measure the experience, the steps required to implement a new plugin are counted, and each step is evaluated for its complexity.
The evaluation acknowledges any documentation that guides developers through the process.
An assessment determines the ease and efficiency with which a developer can introduce a new plugin in plugin-based optimization in \gls{qhana}.
The goal is to ensure that the plugin-based approach is as easy to implement as the non-service-based approach.

This thesis thoroughly appraises the solutions concerning the challenges outlined in the problem statement by meticulously evaluating these parameters.

\section{Test Data Generation for Evaluation}
To robustly evaluate the solutions proposed in this work, testing them on diverse datasets with different sizes and complexity is essential.
The objective is to mimic real-world scenarios where optimization problems can range from simple tasks with a few data points to complex challenges with many features and data points.
The code used to generate the test data can be found in the \hyperref[chap:appendix]{Appendix}.
The following criteria are employed to generate the test data:

Datasets of different \emph{sizes} are generated, spanning from a modest 200 data points to a substantial 1400 data points.
This variation ensures that the optimization performance is assessed across different scales, from quick-to-process small datasets to computationally demanding large datasets.
The \emph{number of features} in each dataset is dynamically determined based on the dataset's size, calculated explicitly as \(\lfloor \sqrt{\text{size}} \times 1.5 \rfloor\).
This approach ensures that with the growth of the dataset, its complexity also increases, mirroring real-world scenarios where larger datasets often present more features or dimensions.

The \texttt{make\_regression} function from the Scikit-learn library generates data.
An added \emph{noise} parameter introduces an element of randomness, making the optimization task more intricate and resembling real-world challenges.

To maintain the consistency and reliability of the generated datasets across multiple runs or evaluations, a fixed random seed (\texttt{numpy.random.seed(42)}) has been set.
This ensures that the data, though synthetic and noisy, remains consistent across evaluations, enabling accurate comparisons, assessments, and \emph{reproducibility}.

Two criteria are employed to adhere to \gls{qhana}'s data standards:
Each data point in the dataset is allocated a unique ID in the format \emph{entityX}, where X represents the entity number.
This aligns with \gls{qhana}'s data standard that mandates every data point to have an identifiable ID.
All datasets are stored in the CSV (Comma Separated Values) format, adhering to \gls{qhana}'s accepted data formats.

% Listing \ref{lst:data_csv} shows an excerpt of a generated CSV data set with 10 data points 4 features and a noise of 10.
% For better readability, the number of decimal places has been reduced to 4.
% The first column contains the entity IDs, the last column the target variable, and the columns in between the features.

% \begin{center}
% \lstinputlisting[language={},caption={Excerpt of a test data file with 10 data points},label={lst:data_csv}]{data/data.csv}
% \end{center}


By employing such datasets, the methodology objectively evaluates the optimization solutions, assessing their performance, interchangeability, and user experience across various scenarios.


\chapter{Resulting System Architecture}
\label{chap:architecture}

This chapter delves into the intricate architectural blueprint of the proposed plugin-based optimization framework.
It explores the decomposition into distinct plugin types, each with clearly defined roles and responsibilities.
The chapter also elucidates the universal plugin interfaces designed to ensure seamless interaction between different plugins, fostering modularity and extensibility.
By the end of this chapter, readers will gain a comprehensive understanding of the system's interaction flow, its various components, and their interdependencies, all illustrated through detailed component and sequence diagrams.

\section{Resulting Decomposition into Plugins and their Responsibilities}
\label{sec:resdecomposition}

Following the decomposition strategy, the plugin-based optimization framework is divided into three primary plugin types:
\gls{of} plugin, the minimizer plugin, and the coordinator plugin.
These plugins have specific roles and responsibilities, ensuring a modular and efficient optimization process.
This split is visualized in the component diagram in figure \ref{fig:component_diagram}.
It is important to note that this decomposition and the associated responsibilities remain consistent for both proposed plugin-based approaches.

\paragraph{Objective Function Plugin:}

The \gls{of} plugin is central to the optimization framework, encapsulating the mathematical function that defines the problem at hand.
Its primary roles include:

\begin{itemize}
\item \textbf{Metadata Provision}: The plugin offers metadata about itself.
\item \textbf{Hyperparameters Acquisition}: It prompts the user to provide the hyperparameters necessary for the loss function calculation.
\item \textbf{Loss Calculation}: The plugin computes the loss based on the provided input data, representing the discrepancy between predicted and target values.
\item \textbf{Gradient Calculation (Optional)}: For optimization algorithms that leverage gradient-based methods, the plugin can optionally compute the gradient of the loss function.
The gradient aids in guiding the optimization process toward the desired minimum.
\end{itemize}

\paragraph{Minimizer Plugin:}

The minimizer plugin is responsible for iteratively adjusting parameters to minimize the loss provided by the \gls{of} plugin.
Its functions include:

\begin{itemize}
\item \textbf{Metadata Provision}: Similar to the \gls{of} plugin, it provides essential metadata.
\item \textbf{Hyperparameters Acquisition}: The plugin acquires the hyperparameters crucial for the employed minimization algorithm from the user.
\item \textbf{Minimization Process}: Using the loss (and optionally the gradient) from the \gls{of} plugin, the minimizer plugin endeavors to find the parameter values that minimize this loss.
\end{itemize}

\paragraph{Coordinator Plugin:}

The Coordinator plugin acts as the orchestrator, ensuring seamless interaction between the \gls{of} and minimizer plugins and the user.
Its primary responsibilities are:

\begin{itemize}
\item \textbf{Plugin Selection}: It prompts the user to select the desired \gls{of} and minimizer plugins for optimization.
\item \textbf{Data Acquisition}: The plugin gathers the necessary input data and the target variable from the user, which the optimization process will use.
\item \textbf{Endpoint Acquisition}: It obtains the necessary endpoints from the selected plugins.
\item \textbf{Coordination Role}: It manages the interaction between the \gls{of} and minimizer plugins, ensuring that the loss (and optionally gradient) calculation function is provided to the minimizer plugin for the optimization process.
Additionally, it coordinates the interaction between the user and the selected plugins, ensuring the user sees the necessary microfrontends.
\item \textbf{Results Presentation}: Post-optimization, the coordinator plugin presents the optimization results to the user.
\end{itemize}


\section{Universal Plugin Interface Design}

The process of optimization is inherently complex, with a multitude of variations and nuances.
It is imperative to establish universal plugin interfaces for each type of plugin to ensure a streamlined interaction between different plugins.
This interface acts as a standard that every plugin of a specific type has to adhere to.
The interfaces for each plugin are detailed below and are visualized in the component diagram in figure \ref{fig:component_diagram}.

\begin{figure}[ht]
    \centering
    \includegraphics[width=\textwidth]{graphics/plugin_decomposition.svg}
    \caption{Component Diagram of decomposed optimization process}
    \label{fig:component_diagram}
\end{figure}

\paragraph{Objective Function Plugin Interfaces:}

The \gls{of} plugin interface accommodates a wide range of loss functions, including those that offer gradient computation.
Its interfaces are:

\begin{itemize}
\item \textbf{Metadata}: The plugin provides metadata about itself as specified in the \gls{qhana} documentation \cite{FabianBuehler}.
\item \textbf{UIRef}: This endpoint returns the microfrontend for the \gls{of} plugin, where the user inputs the hyperparameters for the calculation.
This interface also follows the \gls{qhana} documentation \cite{FabianBuehler}.
\item \textbf{HRef}: This endpoint is used to process the input from the \gls{of} microfrontend and is therefore usually called the \textbf{processing} endpoint.
It is usually triggered by the user clicking the \emph{submit} button on the microfrontend.
This interface also follows the \gls{qhana} documentation \cite{FabianBuehler}.
\item \textbf{PassData}: Via this endpoint, the coordinator passes the input and target data to the \gls{of} plugin.
It returns the number of weights required for the optimization process.
It returns the number of weights required by the objective function.
\item \textbf{CalculateLoss}: This endpoint calculates the loss.
\item \textbf{CalculateGradient (Optional)}: This endpoint calculates the gradient of the loss. It is optional since the gradient cannot be computed efficiently for all \glspl{of}.
\item \textbf{CalculateLossAndGradient (Optional)}: This endpoint calculates the loss and the gradient of the loss function. Similar to the previous endpoint, it is optional.
\end{itemize}

\paragraph{Minimizer Plugin Interfaces:}

The minimizer plugin, responsible for optimizing the loss function provided by the \gls{of} plugin, offers these interfaces:

\begin{itemize}
  \item \textbf{Metadata}: same as for the \gls{of} plugin
  \item \textbf{UIRef}: same as for the \gls{of} plugin
  \item \textbf{HRef}: same as for the \gls{of} plugin
  \item \textbf{Minimize}: This endpoint minimizes the loss function.
\end{itemize}

\paragraph{Coordinator Plugin Interfaces:}

The coordinator plugin, orchestrating the interaction between the \gls{of} and minimizer plugins, is equipped with the standard \gls{qhana} interfaces \textbf{Metadata}, \textbf{UIRef}, and \textbf{HRef}.

A systematic, consistent, and efficient optimization process is ensured by establishing these interfaces and ensuring that each plugin conforms to them.
This structured approach facilitates seamless interactions and fosters interchangeability, modularity, and extensibility, making it easy to add new \gls{of} and minimizer plugins in the future.

\section{Plugin Interaction Design}

The design of plugin interactions is pivotal to ensuring efficient and seamless coordination between different system components.
Given the extensive possibilities of interactions within the \gls{qhana} environment, the design phase is meticulous, considering various scenarios and ensuring adaptability.
The two primary modes of interaction are short-running and long-running, each catering to specific requirements.

\paragraph{Short-Running Interaction:}

In short-running interactions, a plugin invokes another plugin's endpoint, typically via a `GET` or a `POST` request, and immediately receives a response.
This mode of interaction is synchronous, wherein the invoking plugin waits for the response before proceeding.
Instances of such interactions in the optimization context are:
\begin{itemize}
    \item The coordinator plugin retrieves metadata from the \gls{of} and minimizer plugins.
    \item The coordinator plugin passes data to the \gls{of} plugin.
    \item The minimizer plugin calls the \texttt{CalculateLoss} endpoint of the \gls{of} plugin.
\end{itemize}

\paragraph{Long-Running Interaction:}

Long-running interactions come into play for processes requiring extensive computation time or involving multiple steps.
In such a case, the calling plugin invokes the endpoint of the invocable plugin.
The invoked endpoint then schedules a long-running task and returns immediately.
The calling plugin can then continue with its execution asynchronously without waiting for the long-running task to finish.
The calling plugin gets an additional endpoint that is called by the long-running task once it finishes.
This endpoint is called a callback endpoint.
Instances of such interactions are:
\begin{itemize}
    \item The coordinator schedules the microfrontend of the \gls{of} plugin.
    The coordinator plugin provides a callback endpoint to the \gls{of} plugin that is called once the user enters the input and the input is processed.
    \item The coordinator schedules the microfrontend of the minimizer plugin.
    This mechanism works identically to the previous item.
    \item The coordinator plugin calls the minimization endpoint of the minimizer plugin.
    The minimizer plugin schedules the minimization process and returns immediately.
    Once the minimization process finishes, the minimizer plugin calls a callback endpoint of the coordinator plugin.
\end{itemize}
The design of these interactions ensures that plugins can interact seamlessly and efficiently regardless of the complexity or duration of tasks.

\section{Introduction of Interaction Endpoints}
\label{sec:introie}

Building upon the idea of plugins interacting with each other, as detailed in the previous section, a crucial question arises:
How does one plugin discover the available endpoints of another?
This thesis introduces a novel concept called \textit{interaction endpoints} to \gls{qhana} to address this very challenge.

While \gls{qhana} already has a metadata field named \emph{entry points}, which are endpoints invoked by the \gls{qhana} \gls{ui} to render a plugin's \gls{ui}, interaction endpoints extend this idea further.
They specifically define endpoints in the metadata that other plugins can invoke, facilitating seamless integration and interaction.

The core of interaction endpoints is their \emph{type}.
All interaction endpoints with the same type must adhere to the same signature and return type.
This uniformity ensures that other plugins can invoke them interchangeably.
The \gls{of} plugin provides interaction endpoints of types \emph{calc\_loss}, \emph{calc\_grad}, \emph{calc\_loss\_and\_grad}, and \emph{of\_pass\_data}.
The minimizer plugin offers the \emph{minimization} type.
These interaction endpoints correspond to the endpoints defined in the previous section.
The introduction of interaction endpoints significantly enhances the modularity and interchangeability within \gls{qhana}, paving the way for a more dynamic and adaptable plugin ecosystem.
More on how these interaction endpoints are implemented can be found in the implementation chapter \ref{chap:implementation}.

\section{Final Interaction Flow}
The architecture's final interaction flow is split into three main parts, each ending in a new \gls{ui} displayed to the user.
The sequence diagrams in figures \ref{fig:interaction_flow_part1}, \ref{fig:interaction_flow_part2}, and \ref{fig:interaction_flow_part3} visualize the interaction flow.

The first part begins after selecting the optimization plugin and concludes when the \gls{of} plugin's \gls{ui} is set as the next step.
Here, the coordinator plugin retrieves the user-selected \gls{of} and minimization plugin metadata, including their interaction endpoints.
This flow is represented in Figure \ref{fig:interaction_flow_part1}.

The second part starts with retrieving the \gls{of} plugin's microfrontend and ends when the minimizer plugin \gls{ui} is set as the next step.
After the user inputs hyperparameters and submits, the \gls{of} processes the input and sends a callback to the coordinator plugin,
The callback endpoint then passes the input and target data to the PassData endpoint.
This sequence is depicted in Figure \ref{fig:interaction_flow_part2}.

The last segment starts with the minimizer plugin's microfrontend retrieval and ends when the optimization process concludes.
After users input the minimization hyperparameters, the minimizer processes the data and sends a callback to the coordinator.
The coordinator then triggers the minimization endpoint, initiating a long-running minimization task that continuously calls the \gls{of} calculation endpoint.
Once this task is completed, the minimizer makes a final call to the coordinator with the minimization results.
Figure \ref{fig:interaction_flow_part3} shows this flow.

\begin{figure}[p]
  \centering
  \checkoddpage
  \ifoddpage
      \rotatebox{-90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_1.svg}}
  \else
      \rotatebox{90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_1.svg}}
  \fi
  \caption{Final interaction flow of the optimization process (part 1/3).}
  \label{fig:interaction_flow_part1}
\end{figure}

\begin{figure}[p]
  \centering
  \checkoddpage
  \ifoddpage
      \rotatebox{90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_2.svg}}
  \else
      \rotatebox{-90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_2.svg}}
  \fi
  \caption{Final interaction flow of the optimization process (part 2/3).}
  \label{fig:interaction_flow_part2}
\end{figure}

\begin{figure}[p]
  \centering
  \checkoddpage
  \ifoddpage
      \rotatebox{-90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_3.svg}}
  \else
      \rotatebox{90}{\includegraphics[width=0.9\textheight]{graphics/interaction_flow_3.svg}}
  \fi
  \caption{Final interaction flow of the optimization process (part 3/3).}
  \label{fig:interaction_flow_part3}
\end{figure}


\chapter{Implementation}
\label{chap:implementation}

The reader can find the resulting code that realized two different approaches to a plugin-based optimization framework in \gls{qhana} 's GitHub repository\footnote{\url{https://github.com/UST-QuAntiL/qhana-plugin-runner}}.
As examples, the code implements two types of minimizer plugins, the \emph{scipy-minimizer} and \emph{scipy-minimizer-grad} plugins, and three types of \gls{of} plugins, the \emph{ridge-loss}, \emph{hinge-loss}, and \emph{neural-network} plugins.
More on their implementation can be found in the following sections.
This chapter does not aim to provide a comprehensive overview of the code but to highlight the critical implementation details and differences between the two approaches.

\section{Directory Structure and Plugin Loading}
\label{sec:directoryStructure}

\gls{qhana} maintains two primary directories for plugin management: the \textit{plugins} and the \textit{stable plugins} directories.
The former contains plugins undergoing development, while the latter contains deployment-ready plugins.
Within these directories, individual plugins are neatly organized into dedicated subfolders.
Upon initiation, \gls{qhana} scans and loads plugins from these designated subfolders.

The focus of this thesis, the optimization plugin, resides in the \textit{plugins/optimization} directory.
Further divisions are made to promote clarity and a structured layout.

\begin{itemize}
  \item \textit{plugins/optimization/coordinator} – For the coordinator plugin.
  \item \textit{plugins/optimization/objective\_functions} – Where each \gls{of} plugin occupies its respective subfolder.
  \item \textit{plugins/optimization/minimizer} – Where each minimizer plugin occupies its respective subfolder.
\end{itemize}

Given that \gls{qhana}'s native architecture does not support the direct loading of plugins from nested subdirectories, a recursive plugin loader is developed for this purpose.
This loader traverses through the \textit{plugins} directory and its subdirectories.
The presence of an \textit{\_\_init\_\_.py} file within a folder confirms the plugin's legitimacy.
To exclude a plugin from the loading process, one places a \textit{.ignore} file in its respective folder.
The maximum recursion depth is limited to four to maintain system performance and a clear, structured layout.
This threshold sufficiently accommodates the current plugin structure but can be adjusted upwards if future needs arise.

The \textit{interaction\_utils} directory is dedicated to housing utility functions for generalized plugin interactions.
Additionally, there is a \textit{shared} directory, which stores data structures and schemas utilized across the plugins related to the optimization plugin.
A visual representation of the final folder structure, with all plugins implemented for this thesis, is shown in \cref{fig:folderStructure}.


\begin{figure}[h!]
  \dirtree{%
      .1 plugins.
      .2 optimizer.
      .3 coordinator.
      .3 interaction\_utils.
      .3 minimizer.
      .4 scipy\_minimizer.
      .4 scipy\_minimizer\_grad.
      .3 objective\_functions.
      .4 hinge\_loss.
      .4 neural\_network.
      .4 ridge\_loss.
      .3 shared.
  }
  \caption{QHana Plugin Folder Structure.}
  \label{fig:folderStructure}
\end{figure}

\section{Maintaining Context Across Plugin Endpoints}
\label{sec:contextBetweenPluginEndpoints}

The implementation leverages the multi-step plugin concept inherent to \gls{qhana} to preserve the continuity of information across different plugin endpoints.
In the \emph{processing} endpoint of a plugin, a database task is established, identifiable by a unique database ID.
This task stores all relevant context data, such as the hyperparameters designated for the minimization process.
Subsequent endpoints, like the \emph{minimization} endpoint, can access this stored context data by referencing the database task.
Notably, the endpoints' URL incorporates the database task's ID.


\section{UI Creation for Plugin Interaction}
\label{sec:uiCreationForPluginInteraction}
The QHAna platform provides a streamlined mechanism to craft simple \glspl{ui} with input fields generated from marshmallow schemas.
The primary step is to inherit from QHAna's \texttt{FrontendFormBaseSchema} class.
For the \gls{of} plugin, the input fields are the hyperparameters the loss function needs.
Listing \ref{lst:ridge_loss_schema} shows the \emph{ridge\-loss} plugin's hyperparameter, which is the regularization parameter \emph{alpha}.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption={Ridge Loss Plugin Hyperparameter Schema.}, label=lst:ridge_loss_schema]{code/hyperparameter_schema_ridge.py}
\end{minipage}

The minimizer and \gls{of} plugin selection field is a critical input field for the coordinator plugin.
The \texttt{PluginUrl} class,  a custom class that extends marshmallow's \texttt{fields.Url} class generates that field.
It ensures that a user can only select plugins of the correct type by checking a plugin's metadata for the correct label.
Imperative for this to work is that all minimizer plugins have the \emph{minimization} and all \gls{of} plugins have the \emph{objective-function} label.
Listing \ref{lst:coordinator_schema} shows the minimizer plugin selection field in the coordinator plugin's \gls{ui}.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Coordinator Plugin UI Schema., label=lst:coordinator_schema]{code/ui_schema_coordinator.py}
\end{minipage}

As the \glspl{ui} of the \gls{of} and minimizer plugins are designed to be invoked by the coordinator plugin, the \emph{UIHref} which serves the \gls{ui} needs to accept a callback URL.
Therefore, all \emph{UIHref} endpoints of invocable plugins must include the \emph{CallbackUrlSchema} marshmallow schema as a query parameter.


\section{Invocation of Plugin Microfrontends by the Coordinator Plugin}
\label{sec:invocationOfPluginMicrofrontendsByTheCoordinatorPlugin}

As highlighted in the previous chapter, the coordinator plugin invokes the microfrontends of both the \gls{of} and minimizer plugins.
An adaptation of \gls{qhana}'s \emph{add\_step} function allows the invocation of a plugin's microfrontend by another plugin.
The \emph{add\_step} function's original purpose is to allow a multi-step plugin to show another of its microfrontends to the user.
The function is enhanced and renamed to \emph{invoke\_task}.
With this modification, the coordinator plugin can invoke another plugin's microfrontend by supplying the appropriate \emph{UIHref} and \emph{Href}.
Furthermore, the coordinator plugin can provide a callback URL to the invoked plugin.
This callback URL is a query parameter within the \emph{Href} and \emph{UIHref} URLs.

\section{Callback to the Coordinator Plugin}
\label{sec:implementationOfCallbacksToTheCoordinatorPlugin}

The coordinator plugin requires callbacks from invocable plugins in two distinct scenarios:
\begin{enumerate}
    \item After the user submits the microfrontend of either the minimization or the \gls{of} plugin.
    \item Once the asynchronous minimization process concludes.
\end{enumerate}

For the first scenario, the process unfolds as follows:
\begin{itemize}
    \item The coordinator invokes the microfrontend of the invocable plugin using the \emph{invoke\_task} function, as elaborated in Section \ref{sec:invocationOfPluginMicrofrontendsByTheCoordinatorPlugin}.
    \item On user submission of the microfrontend, the callback URL is passed to the processing endpoint of the invocable plugin via a query parameter.
    \item Subsequently, the invoked plugin makes a post request to the callback URL.
\end{itemize}

In contrast, the second scenario operates through the following mechanism:
\begin{itemize}
    \item The coordinator plugin initiates a post request to the minimizer plugin's minimization endpoint, embedding a callback URL within the body.
    \item The minimizer plugin schedules an asynchronous celery task for loss function minimization.
    It registers the callback URL in the database under the designation \emph{status\_changed\_callback\_urls}.
    This label is pivotal for subsequent retrieval of the callback URL.
    \item The endpoint returns, and the celery task commences the minimization process.
    \item Following the task's completion, its status updates through the already implemented \emph{save\_task\_result} function.
    This action now triggers a signal when it finishes or runs into an error, courtesy of Python's \emph{blinker} library, indicating the status change and passing the database ID of the task as an argument.
    \item A dedicated signal handler retrieves the callback and task view URL from the database.
    The handler then orchestrates a post request to the callback URL, embedding the task view URL within the body.
\end{itemize}
Listing \ref{lst:signal_handler} presents the implementation of the signal handler, while Listing \ref{lst:singal_emitter} illustrates the juncture where the signal is emitted.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Signal Handler for Callbacks., label=lst:signal_handler]{code/signal_handler.py}
\end{minipage}

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Signal Emitter for Callbacks., label=lst:singal_emitter]{code/signal_emitter.py}
\end{minipage}

\section{Implementation of Interaction Endpoints}
\label{sec:implementationOfInteractionEndpoints}

The essence of the plugin-based optimization framework lies in its interaction endpoints.
These endpoints standardize the interchangeability of inputs and outputs.
Marshmallow schemas are the backbone of the implementation to manage this standardized data exchange.

\paragraph{Interaction Endpoint Schemas:}
The repository's shared folder contains the defined schemas.
Each interaction endpoint type has an associated input and output schema.
If a plugin wishes to invoke another plugin's interaction endpoint, it can populate the input schema and parse the response using the output schema.
In the implementation, the \gls{of} plugin defines schemas for the following interaction endpoints:
\begin{itemize}
  \item \textbf{pass\_data}: Uses \emph{ObjectiveFunctionPassDataSchema} for input and \emph{ObjectiveFunctionPassDataResponseSchema} for output.
  \item \textbf{calc\_loss}: Uses \emph{CalcLossOrGradInputSchema} for input and \emph{LossResponseSchema} for output.
\end{itemize}
For the minimizers' \emph{minimization} endpoint, the input schema is \emph{MinimizerInputSchema}.
The output, meanwhile, is a simple 200 status code, signaling the initiation of the minimization process.
Another implementation challenge revolves around ensuring that plugins can discover the endpoints of other plugins, which is solved in section \ref{sec:metadataEndpoint}, but first section
\ref{sec:customMarshmallowFieldForData} introduces a custom marshmallow field for input and target data.

\paragraph{Custom Marshmallow Field for Data:}
\label{sec:customMarshmallowFieldForData}
To streamline the input and target data exchange between plugins, a custom marshmallow field, \emph{NumpyArray}, is introduced (see \ref{lst:numpy_array_schema}).
Specifically crafted to handle NumPy arrays, this versatile field can accommodate arrays of any dimensionality with JSON-serializable data types.
It aids in data transfer between endpoints and ensures efficient data storage in the database.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Custom Marshmallow Field for NumPy Arrays., label=lst:numpy_array_schema]{code/numpy_array_schema.py}
\end{minipage}

\paragraph{Metadata Endpoint and Interaction Endpoint Discovery:}
\label{sec:metadataEndpoint}
The metadata of plugins houses the definitions for interaction endpoints.
Traditionally, the \emph{EntryPoint} field in the metadata guides the \gls{qhana} \gls{ui} to a plugin's microfrontend.
However, this thesis expands the \emph{EntryPoint} field to encompass interaction endpoints.
The \emph{interactionEndpoints} field is essentially a list, potentially containing multiple interaction endpoints.
Each list entry is a dictionary containing a \textbf{type} and an \textbf{href} field.
The \textbf{type} field, a string, denotes the endpoint type.
It is made user-friendly by providing the \emph{InteractionEndpointType} enum in the shared folder, which lists all possible endpoint types for the optimization process.
The \textbf{href} field houses the endpoint's URL.
Listing \ref{lst:metadata} illustrates this using the \emph{ridge-loss} plugin's metadata.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Metadata of the Ridge Loss Plugin., label=lst:metadata]{code/metadata.py}
\end{minipage}

For multi-step plugins, the interaction endpoint's definition in metadata slightly deviates.
As discussed in \ref{sec:contextBetweenPluginEndpoints}, multi-step plugins incorporate a runtime-specific database ID into the URL.
The \emph{href} field functions as a format string that allows the substitution of a placeholder with the actual database ID.
Listing \ref{lst:metadata_multistep} offers a glimpse into this, showcasing the \emph{ridge-loss} plugin implemented as a multi-step entity.
The invoking plugin replaces the placeholder with the actual database ID when known.
This distinction is foundational to the differences between the two approaches, explored further in the subsequent section.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=Metadata of the Ridge Loss Plugin as a Multi-Step Plugin., label=lst:metadata_multistep]{code/metadata_multistep.py}
\end{minipage}

\section{Exemplary Minimizers and Objective Functions}
\label{sec:exampleMinimizersAndObjectiveFunctions}

The implementation encompasses two distinct minimizer plugins and three \gls{of} plugins to illustrate the capabilities of the plugin-based optimization framework.
Specifically, the minimizer plugins \emph{scipy-minimizer} and \emph{scipy-minimizer-grad} and the \gls{of} plugins \emph{ridge-loss}, \emph{hinge-loss}, and \emph{neural-network}.

\paragraph{Minimizer Plugins:}
\label{sec:minimizerPlugins}

At the core, both plugins use \emph{scipy.optimize.minimize} as their minimization algorithm.
The \emph{scipy-minimizer} plugin leverages this method to minimize the loss function without needing a gradient.
Its sole input hyperparameter is the minimization method, restricted to methods that operate without a gradient.

On the other hand, the \emph{scipy-minimizer-grad} plugin employs the same method but incorporates the gradient for loss minimization.
It accepts input hyperparameters that exclusively pertain to methods necessitating the gradient.
Regardless of their approach, both plugins yield the weights that optimize the loss function's minimization.

\paragraph{Objective Function Plugins:}
\label{sec:objectiveFunctionPlugins}

The \emph{ridge-loss} plugin encapsulates the ridge loss function, as depicted in \ref{alg:ridge_loss}.
It operates with the regularization parameter \emph{alpha} as its input hyperparameter.
In a similar vein, the \emph{hinge-loss} plugin embodies the hinge loss function, showcased in \ref{alg:hinge_loss}, and uses the regularization parameter \emph{C} as its input.

The \emph{neural-network} plugin employs a neural network featuring a single hidden layer.
Its hyperparameters include the count of neurons present in this hidden layer.
Notably, this plugin facilitates the gradient calculation for the loss function, making it compatible with the \emph{scipy-minimizer-grad} plugin.
The user is encouraged to refer to the code repository for a more in-depth understanding of how a more sophisticated method is implemented for loss calculation.


\begin{lstlisting}[caption={Ridge Loss Calculation}, label=alg:ridge_loss]
  def ridge_loss(X: numpy.ndarray, w: numpy.ndarray, y: numpy.ndarray, alpha: float) -> float:
      # Calculate the predicted values using the weight vector
      y_pred = X.dot(w)

      # Compute the mean squared error between actual and predicted values
      mse = mean((y - y_pred) ** 2)

      # Calculate the ridge penalty using the weight vector
      ridge_penalty = alpha * sum(w**2)

      # Combine the mean squared error and ridge penalty to get the total ridge loss
      total_loss = mse + ridge_penalty

      return total_loss
\end{lstlisting}

\begin{lstlisting}[caption={Hinge Loss Calculation}, label=alg:hinge_loss]
  def hinge_loss(X: numpy.ndarray, w: numpy.ndarray, y: numpy.ndarray, C: float) -> float:
      # Get the number of samples from the input matrix
      n_samples, _ = X.shape

      # Initialize the loss to zero
      loss = 0.0

      # Iterate over each sample in the dataset
      for i in range(n_samples):
          # Calculate the score for the current sample using the weight vector
          score = np.dot(w, X[i])

          # Compute the hinge loss for the current sample and accumulate
          loss += max(0, 1 - y[i] * score)

      # Apply the regularization term to the accumulated loss
      loss = C * loss / n_samples

      # Add the l2 regularization term to the loss
      loss += 0.5 * np.dot(w, w)

      return loss
\end{lstlisting}

\section{Comparative Analysis of Plugin-Based Implementation Strategies}
\label{sec:differencesBetweenTheTwoPluginBasedImplementationApproaches}

The implementation resulted in two distinct methods for constructing a plugin-based optimization framework.
This section outlines the intrinsic differences that characterize each strategy.

\subsection{Decoupled Plugin Approach}
\label{sec:firstApproach}

The foundational idea behind the first approach is promoting maximal decoupling between plugins, which particularly applies to the \gls{of} plugin.
This idea entails minimizing data transfer between plugins and emphasizing data retrieval directly from the database whenever feasible.

Specifically, the \gls{of} plugin archives its hyperparameters in the database.
The coordinator invokes the \emph{pass\_data} endpoint, which stores input and target data in the database.
Thus, for the \emph{pass\_data} endpoint, which determines the number of input weights, the coordinator plugin only passes the input and target data since the \gls{of} plugin fetches the hyperparameters from the database.
The loss calculation endpoints \emph{calc\_loss}, \emph{calc\_grad}, and \emph{calc\_loss\_and\_grad} solely need the weights as input, leveraging already stored hyperparameters and data.
Listing \ref{lst:calc_loss_schema} illustrates an exemplary schema for the \emph{calc\_loss} endpoint.

Consequently, for the minimizer plugin, there is no occasion where knowledge of the \gls{of} 's hyperparameters or its data is essential, leading to a lean input schema for the \emph{minimization} endpoint, as depicted in Listing \ref{lst:minimization_schema}.
However, it mandates that the task's database ID be communicated to the coordinator plugin through the microfrontend callback process, essential for managing interaction endpoint URLs, as discussed in \ref{sec:metadataEndpoint}.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=\emph{CalcLossOrGradInputSchema} Schema for multistep-plugin approach., label=lst:calc_loss_schema]{code/calc_loss_schema.py}
\end{minipage}

\lstinputlisting[language=Python, caption=\emph{MinimizerInputSchema} Schema for multistep-plugin approach., label=lst:minimization_schema]{code/minimization_schema.py}

\subsection{Integrated Plugin Approach}
\label{sec:secondApproach}

The second strategy embodies a tighter integration.
This approach does not implement the \gls{of} plugin as a multi-step plugin.
Such an implementation implies that no shared context exists among the plugin's endpoints, necessitating the passing of all required data during each endpoint invocation.

This design requires the microfrontend callback process to relay the hyperparameters to the coordinator plugin.
The coordinator then transmits the hyperparameters and data to the \gls{of} plugin's \emph{pass\_data} endpoint.
In this setup, the minimizer plugin must also store the hyperparameters and data since it must relay this to the \gls{of} plugin's loss calculation endpoint.
Listing \ref{lst:calc_loss_schema_multistep} exhibits the input schema for the \emph{calc\_loss} endpoint, while Listing \ref{lst:minimization_schema_multistep} demonstrates the \emph{minimization} endpoint's schema.
Passing the hyperparameters of a \gls{of} plugin ensures that neither the coordinator nor the minimizer plugin requires in-depth knowledge of the hyperparameters irrespective of the \gls{of} 's internal mechanics.
As a trade-off, there is no need to relay a task's database ID to the coordinator.

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=\emph{CalcLossOrGradInputSchema} Schema for non-multistep-plugin approach., label=lst:calc_loss_schema_multistep]{code/calc_loss_schema_2.py}
\end{minipage}

\noindent\begin{minipage}{\linewidth}
  \lstinputlisting[language=Python, caption=\emph{MinimizerInputSchema} Schema for non-multistep-plugin approach., label=lst:minimization_schema_multistep]{code/minimization_schema_2.py}
\end{minipage}


\chapter{Result Validation}
\label{chap:results}

This chapter explains the outcomes achieved with the current implementation, emphasizing system performance, plugin interchangeability, and developer usability.

Performance benchmarks are conducted on a MacBook Pro. The system's detailed specifications include:

\begin{itemize}
    \item \textbf{Model:} MacBookPro18,1
    \item \textbf{CPU:} Apple M1 Pro
    \item \textbf{RAM:} 32 GB
    \item \textbf{GPU:} Apple M1 Pro (16 Cores, Metal 3 support)
    \item \textbf{Operating System:} macOS Version 13.4.1
\end{itemize}


\section{Benchmark Results}
\label{subsec:benchmarkingResults}

Given its recurrent invocation, the loss function call is the most resource-intensive phase of the minimization process.
The benchmarks employ the \emph{ridge-loss} plugin as the \gls{of}, configuring \emph{alpha} at 0.5.
The \emph{scipy-minimizer} plugin acts as the minimizer, with the method set to \emph{L-BFGS-B}.

Table \ref{table:of_calls} displays the frequency of loss evaluations during the minimization process for varying datasets, underscoring the significance of efficient loss function computation.

\begin{table}[h!]
  \centering
  \begin{tabular}{cc}
  \toprule
  \textbf{Number of Data Points} & \textbf{Number of Evaluations} \\
  \midrule
  200 & 264 \\
  400 & 341 \\
  600 & 370 \\
  800 & 430 \\
  1000 & 480 \\
  1200 & 520 \\
  1400 & 570 \\
  \bottomrule
  \end{tabular}
  \caption{Number of times the loss function gets evaluated during minimization across different dataset sizes.}
  \label{table:of_calls}
\end{table}

Figure \ref{fig:of_call_time_version1} shows the time it takes for a single call of the loss function for different dataset sizes for the decoupled plugin-based implementation approach as described in \ref{sec:firstApproach}.
The \emph{Calculation Time} captures only the actual time it takes to calculate ridge loss.
The \emph{Database Time} measures the time it takes to retrieve the data from the database.
The \emph{Network Time} measures the time it takes to send the data to the \gls{of} plugin and receive the result.
The total time it takes from the point at which the minimizer plugin calls the calculation endpoint until it receives the result is the sum of the \emph{Calculation Time}, \emph{Database Time}, and \emph{Network Time}.
The \emph{Database Time} is the most significant contributor to the total time as the data is retrieved from the database for every call. With increasing dataset size, the time it takes to retrieve the data also increases.
The \emph{Network Time} is comparably shorter but still significant. It increases only slightly with an increasing dataset size since the data sent over the network does not change significantly.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/of_call_times_version1.png}
  \caption[Call time for \emph{calc\_loss} endpoint decoupled approach]{Time it takes for a single call to the \emph{calc\_loss} endpoint of the \emph{ridge-loss} plugin for different dataset sizes for the decoupled plugin-based implementation approach.}
  \label{fig:of_call_time_version1}
\end{figure}

Figure \ref{fig:of_call_time_version2} captures the same measurements as \ref{fig:of_call_time_version1} but for the integrated plugin-based implementation approach as described in \ref{sec:secondApproach}.
The diagram does not show the \emph{Database Time} since this version does not make calls to the database.
The \emph{Network Time} significantly increases for this approach as the vast amounts of data previously retrieved from the database are now sent over the network.
Additionally, the \emph{Network Time} increases with increasing dataset size since the amount of data sent over the network also increases.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/of_call_times_version2.png}
  \caption[Call time for \emph{calc\_loss} endpoint integrated approach]{Time it takes for a single call to the \emph{calc\_loss} endpoint of the \emph{ridge-loss} plugin for different dataset sizes for the integrated plugin-based implementation approach.}
  \label{fig:of_call_time_version2}
\end{figure}

Figure \ref{fig:coparison_of_of} compares the total times it takes to get the loss value across different implementations, including the Jupyter Notebook implementation.
The Jupyter Notebook implementation is the fastest as it does not have to deal with network or database latency.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/comparison_of_of.png}
  \caption[Total times to get loss value compared]{Comparison of the total time it takes to get the loss value across different implementations, including the Jupyter Notebook implementation.}
  \label{fig:coparison_of_of}
\end{figure}

Figure \ref{fig:time_for_of_calc} measures the same metrics as Figure \ref{fig:coparison_of_of}, but this time it ignores the \emph{Database Time} and the \emph{Network Time} only measuring the actual \emph{Calculation Time}.
The result shows that a plugin-based implementation approach does not significantly impact the time it takes to calculate the loss function.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/time_for_of_calc.png}
  \caption[Times to calculate the loss function compared]{Comparison of the time the actual loss calculation takes across different implementations, including the Jupyter Notebook implementation.}
  \label{fig:time_for_of_calc}
\end{figure}

One can see that the Jupyter Notebook implementation is the fastest across the board as it does not have to deal with network or database latency.
Especially for the decoupled plugin-based implementation, the database latency is a bottleneck that significantly decreases the performance, but that can be mitigated by caching the data, as described in the next section.

\section{Caching of Data}
\label{subsec:cachingOfData}

A caching mechanism is implemented for the decoupled plugin-based approach to avoid database latency.
Since the input and target data and the hyperparameters do not change during the minimization process, they can be cached.
That means the data is retrieved from the database during the first call of the \emph{calc\_loss} endpoint and cached for subsequent calls.
The implementation uses the \emph{flask-caching} library and keeps the cached data in memory.
Figure \ref{fig:of_call_time_version1_cached} shows the average time for a single call of the \emph{calc\_loss} endpoint for different dataset sizes when using caching.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/of_call_times_version1_cached.png}
  \caption[Call time for \emph{calc\_loss} endpoint decoupled approach with caching]{Time it takes for a single call to the \emph{calc\_loss} endpoint of the \emph{ridge-loss} plugin for different dataset sizes with caching.}
  \label{fig:of_call_time_version1_cached}
\end{figure}

Figure \ref{fig:coparison_of_of_cached} compares the total times it takes to get the loss across different implementations, including a cached version of the decoupled plugin-based implementation approach.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/comparison_of_of_cached.png}
  \caption[Total times to get loss value compared including cached approach]{Comparison of the total time it takes to get the loss value across different implementations, including a cached version of the first plugin-based implementation approach.}
  \label{fig:coparison_of_of_cached}
\end{figure}

Finally, Figure \ref{fig:time_for_minimization} shows the total time it takes to complete the minimization process for the different implementations.
The measurements for the plugin-based implementations do not include the time it takes a user to input data but instead start when the coordinator plugin calls the minimization endpoint of the minimizer plugin.

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{graphics/time_for_minimization.png}
  \caption[Times to minimize a loss function compared]{Comparison of the total time it takes to complete the minimization process across different implementations.}
  \label{fig:time_for_minimization}
\end{figure}

\section{Interchangeability and Standardization}
\label{sec:interchangeabilityOfPlugins}

This segment evaluates the plugins' adherence to predefined standards and their interchangeability.
The governing standards, as enumerated in Chapters \ref{chap:architecture} and \ref{chap:implementation}, encompass:

\begin{itemize}
  \item All invocable plugins should list interaction endpoints in their metadata.
  \item Such plugins must also offer interaction endpoints compliant with respective argument and response schemas.
  \item Both invoking and invoked plugins should uphold the callback mechanisms delineated in Section \ref{sec:implementationOfCallbacksToTheCoordinatorPlugin}.
  \item Each plugin should retain its intended functionality as specified in Section \ref{sec:resdecomposition}.
\end{itemize}

Successful compliance ensures plugin interchangeability.
All example plugins in the implementation segment conform to these standards, ensuring mutual compatibility.
Users can combine any minimizer plugin with any \gls{of} without changing any code.
It is important to note that to leverage the gradient calculation provided by the \emph{neural-network} plugin, one has to use the \emph{scipy-minimizer-grad} plugin.

The combination of all minimizer plugins with all \gls{of} plugins is extensively tested manually, and the results have been documented in Table \ref{table:interchangeability} to ensure that the plugins are fully interchangeable.


\begin{table}[h!]
  \centering
  \begin{tabular}{|c|c|c|c|}
    \hline
    & \textbf{ridge-loss} & \textbf{hinge-loss} & \textbf{neural-network} \\
    \hline
    \textbf{scipy-minimizer} & \checkmark & \checkmark & \checkmark \\
    \hline
    \textbf{scipy-minimizer-grad} & \checkmark & \checkmark & \checkmark \\
    \hline
  \end{tabular}
  \caption{Compatibility between example implementation of plugins.}
  \label{table:interchangeability}
\end{table}


\section{Developer Usability Assessment}
\label{sec:usabilityForDevelopers}

This section assesses the system's developer-friendliness.
The steps required to implement new minimizer and \gls{of} plugins are documented to underpin the evaluation.
For every new plugin, the first step is to copy an existing plugin into a new directory as a template.

\paragraph{Minimizer Plugin Development:}
\label{subsec:implementingAMinimizerPlugin}

For crafting a novel minimizer plugin:

\begin{enumerate}
  \item Replicate the \emph{scipy-minimizer} plugin into a distinct directory.
  \item Update the plugin's name in the \emph{\_\_init\_\_.py} file.
  \item Integrate requisite hyperparameters into the \emph{MinimizerSetupTaskInputSchema} schema.
  \item Implement the minimization algorithm within the \emph{minimize\_task} function.
\end{enumerate}

\paragraph{Objective Function Plugin Development:}
\label{subsec:implementingAnObjectiveFunctionPlugin}

For devising a new \gls{of} plugin:

\begin{enumerate}
  \item Duplicate the \emph{ridge-loss} plugin into a fresh directory.
  \item Modify the plugin's name in the \emph{\_\_init\_\_.py} file.
  \item Incorporate necessary hyperparameters into the \emph{HyperparamterInputSchema} schema.
  \item Implement the input weight calculation within the \emph{of\_pass\_data} endpoint.
  \item Construct the loss function's computation in the \emph{calc\_loss} endpoint.
\end{enumerate}

The outlined steps demonstrate the approach to developing new plugins within the system.
By starting with existing templates, developers are spared the repetitive tasks of setting up basic functionalities and can focus on the unique aspects of their specific plugins.


\chapter{Discussion}
\label{chap:discussion}

This chapter delves into the critical discussions surrounding architectural choices, the model's practical execution, and the consequential outcomes.
It starts by evaluating the plugin-based optimization framework's architectural decisions and how it is implemented.
Subsequently, it analyzes the performance of the plugin-based optimization framework and compares it to the Jupyter Notebook implementation.
Finally, this concept addresses how the results influence the developer experience when implementing new minimization and \gls{of} plugins.

\section{Architecture and Implementation}

Decomposing the minimization process into separate minimizer and \gls{of} plugins aligns perfectly with the plugin-based optimization framework's ethos.
This delineation ensures clarity and specificity and establishes a transparent interface between the two plugins.
The success of libraries like SciPy and how it handles the minimization process \cite{Virtanen2020} reinforces the effectiveness of this decomposition.
Entrusting the coordinator plugin with the responsibility of overseeing the optimization eliminates the need for the minimizer and \gls{of} plugins to manage interactions, enabling developers to concentrate solely on enhancing plugin functionalities.

Interaction endpoints serve as the core of the plugin-based optimization framework, offering a unified mode of interaction across plugins.
This standardization supports developers by delineating available endpoints and their operational functionalities.
By presenting these endpoints even before passing through the \gls{ui} process, developers can now bypass the \gls{ui} and directly trigger the requisite endpoint in a standardized and, thus, reliable way.
Such a setup is invaluable for those seeking automation.
By precisely defining input-output expectations, this approach also ensures seamless plugin interchangeability.

The callback mechanism is an essential facet of the plugin-based optimization framework.
Its design ensures a distinct separation of responsibilities.
The coordinator plugin, devoid of internal insights into the minimizer or \gls{of} plugin, solely relies on available endpoints.
This mechanism prevents the need for database or API polling, offering updates upon minimization completion.
The distinctive implementation of the callback mechanism for the minimization endpoint further underlines the modularity, as it fully decouples the procedure from the minimizer plugin.
The event-driven model \cite{Hohpe2011} drives adaptability for all kinds of asynchronous processes in \gls{qhana}

The rationale behind the \emph{pass\_data} endpoint merits discussion.
While it introduces an additional layer, the trade-offs justify its existence.
Directly passing vast datasets as query parameters to the \gls{ui} endpoint is not feasible.
Transferring data as files to the \gls{ui} endpoint burdens the \gls{of} plugin with the data extraction process, which should not be its responsibility.
Receiving data as a NumPy array allows the \gls{of} plugin to focus on its primary role: loss value computation.

The two plugin-based implementation strategies show the trade-off between the complexity of handling interaction endpoints and the amount of data that has to be passed between plugins.
The decoupled approach, where the \gls{of} plugin is implemented as a multi-step plugin, is more complex since the coordinator plugin has to format the interaction endpoint URL to reflect the database ID.
Nevertheless, it also has the advantage that the minimizer plugin does not have to pass the hyperparameters and data to the \gls{of} plugin since the \gls{of} plugin retrieves them from the database.
In general, that means by passing along a single parameter, the database ID, any volume of data can be loaded and used for context.
The integrated approach, where the \gls{of} plugin is not implemented as a multi-step plugin, is less complex since the coordinator plugin does not have to handle formatting the interaction endpoint URL.
However, it also has the disadvantage that the minimizer plugin has to pass the hyperparameters and data to the \gls{of} plugin since the \gls{of} plugin cannot retrieve them from the database.
Both approaches have advantages and disadvantages and should show that the plugin-based optimization framework is flexible enough to support both approaches.
As the benchmarking results show, by caching the data, the first approach is arguably the better one of the two.

Standardization is paramount in the plugin-based optimization framework.
The metadata, interaction endpoints, callback protocols, and plugin functionalities are meticulously defined to foster uniformity.
This organized structure creates seamless plugin interchangeability.
The modular nature of this framework not only simplifies the optimization process but also encourages a modular development course, paving the path for universal plugin interactions within \gls{qhana}.

\section{Performance Insights}
\label{sec:performanceAnalysis}

Performance metrics underscore the plugin-based optimization framework's viability when compared against the Jupyter Notebook model.
Absent caching, both plugin methods suffer from network and database latencies.
However, Figure \ref{fig:time_for_of_calc} stands out, highlighting that the intrinsic time for loss function computation remains unaffected across methods.

Caching within the first plugin-based model increases performance significantly, as evidenced by Figure \ref{fig:of_call_time_version1_cached}.
However, network latencies persist and cannot be minimized further as data sent and received is minimal.
Additionally, some database overhead is still present, as the data has to be retrieved from the database once and stored in the cache.
As dataset sizes increase, these latencies diminish in relative significance.
Additionally, the primary use case of this optimization framework is \gls{vqa}, where the actual loss function calculation takes significantly longer than the network and database latencies.
When fortified with caching, the decoupled plugin-based approach emerges as a compelling alternative to the Jupyter Notebook implementation.

\section{Developer Experience}
\label{sec:developerUsabilityAssessment}

The plugin-based optimization framework manifests developer-centricity.
Shielded from the difficulties of plugin interactions, developers can emphasize plugin functionalities.
The regimented structure of the interaction endpoints liberates developers from concerns about other plugins.
This contrasts with the Jupyter Notebook implementation, where developers must know the entire optimization process when implementing a new minimizer or \gls{of}.
As the results in \ref{sec:usabilityForDevelopers} show, implementing new minimizer and \gls{of} plugins is straightforward and can be done in a few steps.
What simplifies the process even more is that all plugins of the same type have the same structure, meaning that the first step is only to copy an existing plugin.
By copying an existing plugin, everything concerning the plugin's metadata, interaction endpoints, and callback protocols is already implemented.
The standardized architecture means developers can focus on hyperparameter customization and core calculations during implementation.
With several existing example plugins, developers can leverage these to understand how the core implementations are realized.
Detailed documentation of the plugin-based optimization framework that explains the architecture and implementation further enhances the understanding.


\chapter{Related Work}
\label{chap:relatedWork}

Beisel et al. \cite{Beisel2023} detail the implementation of \glspl{vqa} within a workflow-based system.
They delve into the intricacies of incorporating the quantum aspect of the algorithm into such a system while emphasizing the interaction between its various components less.
In their approach, the conventional computing segment of the algorithm is split into two services: the \emph{Objective Evaluation Service} and the \emph{Optimization Service}.
Similar to this thesis, the \emph{Optimization Service} utilizes SciPy for optimization tasks.

In a subsequent paper \cite{Beisel2023a}, Beisel et al. elaborate on the architecture of the workflow-based system, shedding light on the interplay between its components.
They explain that the system's components are realized as microservices, each offering a \gls{rest}ful API for interaction.
Central to this architecture is a \emph{Gateway}, which handles the internal communication with the various microservices and presents a unified \gls{rest}ful interface to the outside.

Mayer et al. \cite{Mayer_2003} advocate simplifying the intricacies of expansive software systems through a \emph{plugin-based architecture}.
They argue that the system gains flexibility and becomes more maintainable by minimizing the core system and encapsulating most functionalities as plugins.
They introduce the concept of \emph{Plugin-in Component Architecture} (PICA), where each plugin must provide a list of interfaces the core can call and a description of how to use them.

Wolfinger et al. \cite{Wolfinger2006} present a plugin architecture tailored for the .NET platform, drawing parallels with the Eclipse platform.
The authors emphasize the structured interaction between plugins through the concepts of \emph{slots} and \emph{extensions}.
A plugin host defines slots that indicate how to extend the plugin, while plugin contributors provide extensions to fill these slots.
This structured interaction ensures seamless integration of plugins.
The architecture also leverages .NET features to embed relevant plugin information directly within the application's source code, arguing for improved readability and maintainability.

Birsan et al. \cite{Birsan2005} delve into the {plugin architectures of the Eclipse platform.
Birsan underscores the importance of well-defined \emph{extension points}, allowing plugins to interact seamlessly.
The work highlights the intricate interplay between plugins, emphasizing the need for structured interfaces for effective interaction.

Thullier et al. \cite{Thullier2021} introduce a \emph{machine learning workbench}, emphasizing a modular approach to machine learning processes for smart homes.
The workbench divides the machine learning workflow into distinct services: the \emph{Windowing Module}, the \emph{Feature Extraction Module}, and the \emph{Machine Learning Module}.
This modular design ensures efficient parallel processing and scalability.
The paper does not aim to split the machine learning service into different components.

A similar optimization loop, wherein a \gls{of} is evaluated iteratively to minimize the loss value, is adopted by popular machine learning frameworks like TensorFlow \cite{Abadi2016} and PyTorch \cite{Paszke2019}.
These frameworks process input data, compute the discrepancy or loss between predictions and actual values, and refine the model's parameters based on this computed loss.
The iterative process continues until a specified stopping criterion, such as a predetermined number of epochs or a desired accuracy threshold, is met.
While there are architectural and functional distinctions between TensorFlow and PyTorch, at a high level, their optimization loops embody this core principle.
However, these frameworks do not offer a distributed optimization loop, a crucial feature of the plugin-based optimization framework.

In a comprehensive survey by Verbreaken et al. \cite{Verbraeken2020}, the authors delve deep into the paradigms and challenges of \emph{Distributed Machine Learning}.
The authors categorize \emph{Distributed Machine Learning} techniques based on their key characteristics, such as data distribution, model updates, and communication strategies.
The study sheds light on various challenges, including communication overhead.
They further highlight the significance of an efficient communication strategy, emphasizing the need for a robust communication protocol.
As discussed in the result section of this work, the decoupled plugin-based implementation already minimizes communication overhead by caching data and reducing the amount of data sent over the network.

\chapter{Conclusion and Outlook}
\label{chap:conclusion}

This thesis embarked on the challenging journey of conceptualizing and realizing the plugin-based optimization framework within the \gls{qhana} environment.
What emerged is an innovative approach to optimization and a pioneering framework that establishes protocols for plugin interactions within \gls{qhana}.
This framework's versatility hints at its potential applicability beyond just optimization tasks.

The example implementations of both the minimizer and the \gls{of} plugins are a testament to the system's functionality and adaptability.
Performance benchmarks highlight the framework's robustness and efficiency, especially with caching, trying to minimize network and database latencies from a distributed system.
A particularly commendable aspect is the strict adherence to standardization, which ensures modularity and paves the way for seamless plugin interchangeability.
This design philosophy also creates a developer-friendly system, simplifying plugin development while ensuring versatility.

The current ecosystem provides a substantial opportunity to diversify by introducing a broader array of minimizer and \gls{of} plugins.
Expanding this repertoire will not only enrich the optimization solutions available but also enhance the robustness and adaptability of the system.

As the thesis lays out the complete foundation to implement \glspl{vqa}, a new \gls{of} plugin that handles the quantum part would be a valuable and easily implementable addition.
The energy state of a quantum system is then perceived as the loss value, while an existing minimizer plugin is used to minimize this loss.

Building upon the inherent interchangeability and modularity of the current design, a novel concept draws attention: viewing the \gls{of} or the minimizer as hyperparameters themselves.
In a traditional setting, hyperparameters are parameters set before the learning process begins.
However, given the modularity, one could envision running parallel optimization processes with different \gls{of} plugins but the same minimizer.
This would be similar to a hyperparameter search, where each \gls{of} is a candidate and aims to find the best fitting \gls{of} for a given dataset or problem.
Similarly, different minimizers can be benchmarked against a fixed \gls{of}.

In conclusion, with its modularity, standardization, and user-friendliness, the plugin-based optimization framework offers many opportunities.
From diversifying its offerings to delving deep into quantum algorithms, the journey ahead is teeming with potential.

\printbibliography

\chapter*{Appendix}
\label{chap:appendix}
\lstinputlisting[language=Python, caption={Source code for generating a sample test dataset for benchmarking with 1000 samples, 47 features, 10 noise and 1 target}., label=lst:generate_test_dataset]{code/ridge_loss_data.py}.

\pagestyle{empty}
\renewcommand*{\chapterpagestyle}{empty}
\Versicherung
\end{document}