russa edited this page Feb 8, 2016 · 2 revisions

What is MMIR?

MMIR (Multimodal Mobile Interaction and Rendering) is a web-based development framework designed to simplify the development of multimodal (mobile) interaction systems. This section explains the background of the MMIR framework in detail.

Model-View-Controller (MVC) Architecture

It is common to think of an interactive application as having three main layers: presentation (UI), application logic, and resource management. Any given application is likely to change its interface over time, or to have several interfaces at once, while the underlying application stays fairly constant. For example, a banking application that used to work behind character-based menu systems or command-line interfaces is likely to be the same application that today works behind a graphical user interface (GUI). As the example illustrates, it makes sense to keep the essence of an application separate from any and all of its interfaces. For this reason, the MVC architecture is at the core of MMIR.

In the 1970s, Smalltalk defined an architecture to cope with this, called the Model-View-Controller architecture, usually just called MVC. The MVC paradigm is a way of splitting up your application so that it is easier to change parts of it without affecting other parts. In MVC, the presentation layer is split into controller and view. The most important separation is between presentation and application logic; the split between view and controller is less important.

Models

A model represents the information (data) of the application and manages the behavior of data in the application domain. In the case of MMIR, each model is a JavaScript file intended to store specific data and the rules for manipulating that data (for example, a user model which in the simplest case contains the username and password of the application's user).
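
A minimal sketch of such a user model, assuming a plain JavaScript module pattern. The name `UserModel` and its methods are hypothetical and are not taken from the MMIR API:

```javascript
// Hypothetical user model: stores data plus the rules for manipulating it.
// None of these names come from the MMIR API.
const UserModel = (function () {
  let username = null;
  let password = null;

  return {
    // Rule: both fields must be non-empty strings before they are stored.
    setCredentials(name, pass) {
      if (typeof name !== "string" || name.length === 0) {
        throw new Error("username must be a non-empty string");
      }
      if (typeof pass !== "string" || pass.length === 0) {
        throw new Error("password must be a non-empty string");
      }
      username = name;
      password = pass;
    },
    getUsername() {
      return username;
    },
    // Rule: the password is never handed out, only compared.
    checkPassword(candidate) {
      return password !== null && candidate === password;
    },
  };
})();
```

Keeping the validation inside the model means that views and controllers cannot store inconsistent user data by accident.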

Views

Views represent the user interface of your application. In MMIR, views are often HTML files with embedded MMIR and JavaScript code that perform tasks related solely to the presentation of the data.

Controllers

Controllers provide the "glue" between models and views. In MMIR, controllers are responsible for processing the incoming requests from input devices, interrogating the models for data, and passing that data on to the presentation manager for presentation.
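
The flow described above can be sketched as follows. All objects here (`movieModel`, `presentationManager`, `movieController`) are hypothetical stand-ins written for illustration and do not reflect the actual MMIR API:

```javascript
// Hypothetical stand-ins; none of these names are taken from the MMIR API.
const movieModel = {
  movies: { m1: { title: "Metropolis", year: 1927 } },
  findById(id) {
    return this.movies[id] || null;
  },
};

// Stand-in for a presentation manager: records what it was asked to render.
const presentationManager = {
  rendered: [],
  render(viewName, data) {
    this.rendered.push({ viewName, data });
  },
};

// The controller is the glue: it handles a request, asks the model for
// data, and hands the result to the presentation manager.
const movieController = {
  showDetails(request) {
    const movie = movieModel.findById(request.movieId);
    if (movie === null) {
      presentationManager.render("notFound", { movieId: request.movieId });
      return;
    }
    presentationManager.render("movieDetails", movie);
  },
};

movieController.showDetails({ movieId: "m1" });
```

Note that the controller contains neither data nor markup; it only decides which model to query and which view to request.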

Basic Components

Figure 1 illustrates the basic components of the MMIR framework. The framework is built upon current Web technologies such as JavaScript, HTML5, CSS3 and a range of existing W3C markup languages.


Figure 1: Basic Components of MMIR Framework

Human user

A user who enters input into the system and in turn observes and hears information presented by the system. In the following, we will use the term "user" to refer to a human user.

Input

An interactive multimodal framework should provide multiple input modes such as touch, audio/speech, and gesture.

Output

An interactive multimodal framework should use one or more modes of output, such as speech, text, and graphics.

Interaction manager (IM)

The interaction manager is the central logical component of the MMIR framework. It coordinates the data and manages the execution flow from various input and output modality component interface objects. The IM maintains the interaction state and context of the application and responds to inputs from component interface objects and to changes in the system and environment.

External components (services)

TBD

MMIR Architecture


Figure 2: MMIR architecture

Input Manager (Multimodal Fusion)

In contrast to speech-only dialog systems, multimodal dialog systems encompass a number of input modalities that the user can employ in isolation or in combination in order to interact with the system. A classic example of a multimodal interaction is a spoken command like "show me information about this movie" that is accompanied by a pointing gesture for selecting a specific movie that is currently displayed on the screen. In order to trigger an appropriate reaction to such an utterance, a multimodal dialog system needs to integrate the two unimodal actions of the user into a coherent multimodal interpretation of their intention. This task is usually carried out by a component called the input manager or multimodal fusion component.

In general, the task of a modality fusion component is to combine and integrate all incoming unimodal events into a single representation of the intention most likely expressed by the user. A fusion component has to ensure that every unimodal event that could potentially contribute to the integrated meaning of a multimodal utterance is considered. Thus, a fusion component needs to synchronize the recognition and analysis components so that all unimodal components of an utterance will be taken into account.
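
One simple way to illustrate this is time-window fusion, where a spoken command is paired with the pointing gesture closest to it in time. This is only a sketch of the idea: the event shapes, the `fuse` function, and the 1500 ms window are assumptions made for the example, not MMIR's fusion algorithm:

```javascript
// Hypothetical event shapes and names; this is not MMIR's fusion algorithm.
const FUSION_WINDOW_MS = 1500; // assumed maximum gap between related events

// Combine a speech event with the gesture closest in time to it.
// Returns a single interpretation, or null if the speech event needs a
// referent ("this movie") and no gesture falls inside the window.
function fuse(speechEvent, gestureEvents) {
  if (!speechEvent.needsReferent) {
    return { intent: speechEvent.intent, target: null };
  }
  let best = null;
  for (const gesture of gestureEvents) {
    const gap = Math.abs(gesture.timestamp - speechEvent.timestamp);
    if (gap <= FUSION_WINDOW_MS && (best === null || gap < best.gap)) {
      best = { gap, gesture };
    }
  }
  if (best === null) {
    return null;
  }
  return { intent: speechEvent.intent, target: best.gesture.targetId };
}

// "show me information about this movie" plus two pointing gestures.
const speech = { intent: "showMovieInfo", needsReferent: true, timestamp: 10000 };
const gestures = [
  { targetId: "movie-7", timestamp: 4000 },
  { targetId: "movie-3", timestamp: 10400 },
];
const interpretation = fuse(speech, gestures);
```

Here the gesture on `movie-7` lies outside the window and is ignored, so the interpretation targets `movie-3`. A real fusion component would also have to wait until all recognizers have reported before deciding, which this sketch leaves out.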

Presentation Manager (Multimodal Fission)

The opposite of multimodal fusion is modality fission, in which an abstract representation of the content that is about to be communicated to the user is distributed over the available modalities for optimal presentation. The presentation manager should generate an appropriate system response by planning the actual content, distributing it over the available modalities, and finally coordinating and synchronizing the output.
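
As a rough sketch of the distribution step, assume an abstract message with a short `summary` and longer `details`, and a list of modalities that are currently available. The function `planOutput` and the content shape are hypothetical and are not part of the MMIR API:

```javascript
// Hypothetical names; this is not MMIR's presentation manager.
// Distribute one abstract message over the modalities that are available.
function planOutput(content, available) {
  const plan = [];
  const canShow = available.includes("graphics");
  const canSpeak = available.includes("speech");
  if (canShow) {
    plan.push({ modality: "graphics", payload: content.details });
  }
  if (canSpeak) {
    // With a screen, speak only the summary; without one, speak everything.
    const spoken = canShow
      ? content.summary
      : content.summary + " " + content.details;
    plan.push({ modality: "speech", payload: spoken });
  }
  return plan;
}

const message = {
  summary: "Here is the movie you selected.",
  details: "Metropolis, directed by Fritz Lang.",
};
const plan = planOutput(message, ["graphics", "speech"]);
```

The sketch covers only planning and distribution; coordinating and synchronizing the outputs (for example, starting speech when the view becomes visible) would be a separate step.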

Language Manager

The language manager handles language-specific resources. This is often referred to as localization or internationalization. In general, the language manager handles texts for the user interface, for instance translations into different languages that allow the application to be localized at run time.

The language manager also handles other language-specific resources, e.g. speech grammars (for processing speech input) and speech synthesis (TTS: text to speech).
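
Run-time localization of UI texts can be sketched as a dictionary lookup with a fallback language. The dictionary layout and the functions `setLanguage` and `getText` are assumptions made for this example and do not describe the MMIR language manager's actual interface:

```javascript
// Hypothetical dictionary layout and function names; not the MMIR API.
const dictionaries = {
  en: { greeting: "Welcome", logout: "Log out" },
  de: { greeting: "Willkommen" },
};
const DEFAULT_LANGUAGE = "en";
let currentLanguage = DEFAULT_LANGUAGE;

function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Switch the language at run time; unknown codes are rejected.
function setLanguage(code) {
  if (!has(dictionaries, code)) {
    throw new Error("unsupported language: " + code);
  }
  currentLanguage = code;
}

// Look up a UI text, falling back to the default language and finally
// to the key itself so that a missing translation stays visible.
function getText(key) {
  const current = dictionaries[currentLanguage];
  if (has(current, key)) return current[key];
  const fallback = dictionaries[DEFAULT_LANGUAGE];
  if (has(fallback, key)) return fallback[key];
  return key;
}
```

After `setLanguage("de")`, `getText("greeting")` yields the German text, while `getText("logout")` falls back to the English entry because no German translation exists in this sample data.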

Interaction Manager (IM)

The interaction manager is a logical component that controls the flow of the dialog. In MMIR, the interaction manager communicates directly with the input manager, the presentation manager, and the controllers of the MVC architecture.

The Interaction Manager (IM) holds the dialog description. Processing in the MMIR IM is based on this dialog description, which is written in State Chart XML (SCXML). SCXML is a standard for describing the flow of an application and for managing interactions among components, and is a natural choice for the IM in a multimodal architecture implementation. SCXML is an XML syntax for describing state machines with the semantics of Harel State Charts. In SCXML, the flow of an application is represented as a set of states and transitions. By sending life cycle events, an SCXML interpreter can start and stop modalities and receive back user input. The SCXML interpreter can transition to other states through application events based on user or device inputs. After an event is triggered by the user, a device, or a third-party application, the action planner selects the possible transition and executes the appropriate actions and/or renders the appropriate view.
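
The idea of states and event-triggered transitions can be sketched in plain JavaScript. This is a deliberately simplified stand-in for an SCXML interpreter (no nested or parallel states, no guards); the dialog description, its state and event names, and `createInterpreter` are all hypothetical:

```javascript
// Hypothetical dialog description; state and event names are made up.
// It mirrors the SCXML idea of states with event-triggered transitions.
const dialog = {
  initial: "login",
  states: {
    login: { on: { loginSucceeded: "mainMenu", loginFailed: "login" } },
    mainMenu: { on: { showMovie: "movieDetails", logout: "login" } },
    movieDetails: { on: { back: "mainMenu" } },
  },
};

// Create a tiny interpreter; onEnter is called whenever a state is entered
// (this is where actions would run or a view would be rendered).
function createInterpreter(description, onEnter) {
  let current = description.initial;
  onEnter(current);
  return {
    getState() {
      return current;
    },
    // Deliver an event; returns true if a transition was taken.
    raise(eventName) {
      const transitions = description.states[current].on;
      if (!Object.prototype.hasOwnProperty.call(transitions, eventName)) {
        return false; // event not handled in the current state
      }
      current = transitions[eventName];
      onEnter(current);
      return true;
    },
  };
}

const entered = [];
const im = createInterpreter(dialog, (state) => entered.push(state));
im.raise("loginSucceeded");
im.raise("showMovie");
```

After these two events the interpreter is in `movieDetails`, and events that are not defined for the current state (such as `logout` here) are simply ignored.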


Figure 3: Interaction manager


