The Controls Framework for ASK SDK is offered as a beta and may change as we receive feedback and iterate on the feature. If you have questions or feedback please create issues directly in the GitHub repository.
This guide assumes familiarity with building Custom Skills for Alexa using the ASK SDK v2 for Node.js. For background information on building skills and getting started with the ASK SDK for Node.js please see Understanding Custom Skills and Using the ASK SDKs to create Alexa Skills on the Amazon developer portal.
For per-class and per-method documentation, please see the Technical Documentation.
- 1. Overview
- 2. Getting started
- 2.1. Prerequisites for developing a skill with Controls
- 2.2. JavaScript or TypeScript?
- 2.3. Creating “Hello, World” in Controls
- 2.4. Getting diagnostics and tracking the runtime call flow
- 2.5. Creating a launch configuration for vscode launch.json
- 2.6. Run the regression tests
- 2.7. Build the interaction model for the skill
- 3. Exploring the Hello World Controls skill
- 4. Exploring the Fruit Shop demo skill
- 5. Developing with Controls
- 5.1. What is a Control?
- 5.2. What inputs does a Control respond to?
- 5.3. Runtime flow
- 5.4. System Acts
- 5.5. State management
- 5.6. Interfacing with state serialization during Shutdown phase
- 5.7. Interaction Model
- 6. Additional topics
- 7. Contributing
- 8. Reference
In this user guide, we’ll show you the fundamental concepts of the Controls Framework by taking you through a practical, step-by-step tutorial on how to create a simple skill using Controls from the existing templates (Hello World and Fruit Shop). In the first half of this guide (sections 1-4) we’ll detail the code in each template, so you have a solid understanding of a Controls skill's architecture and components. In the second half of this user guide (sections 5-6), you will find in-depth explanations of how to build skills using the Controls Framework from the ground up.
The Controls Framework allows developers to implement Custom Skills that offer rich mixed-initiative and multi-modal experiences.
One specific goal of the framework is to encourage the development of re-usable components, called Controls, that can be shared between projects and between developers. The framework includes general-purpose built-in Controls for some of the most common tasks that developers face. Developers can easily create new Controls and build new high-level Controls from low-level ones.
Another goal is to help developers to follow the best practices of conversational design and to provide a natural way to implement the concepts of situational design.
Finally, the Controls Framework is designed to integrate seamlessly with existing skills written using the ASK-SDK v2 for Node.js.
The workflow when using the Controls Framework is largely the same as for any skill that uses the ASK SDK v2 for Node.js. The following tools are required or highly recommended:
Required
- Node.js v10.16 or higher
- ASK SDK for Node.js v2
- ASK CLI v2
Highly Recommended
- Visual Studio Code
- TypeScript v3.9 or higher
- ASK Toolkit for Visual Studio Code (enables local debugging and skill simulation including APL rendering)
- ESLint (which supersedes TSLint)
- A regression test framework such as mocha + chai.
Additionally, this guide assumes you have previous experience with building skills using the ASK SDK. To learn more about the basics of skill development, including interaction models and intent handlers, please start with our guide on Building an Engaging Alexa Skill.
Skills that use the Controls Framework can be written in JavaScript or TypeScript; the choice makes no functional or performance difference. In the documentation and sample code we provide, we have chosen to use JavaScript as it is currently used by the majority of the Alexa skill developer community.
For those who prefer TypeScript or want to try it out, we encourage you to use it with the Controls Framework. The ASK SDK dev team uses TypeScript to build the SDK and the Controls Framework as we find it improves our personal productivity through superior auto-completion support in IDEs, stronger linting rules, and the ability to find bugs at edit & build time rather than at runtime.
JavaScript users may be interested to know that some of TypeScript’s benefits can be brought to a pure JavaScript codebase. See Migrating from JavaScript and Type Checking JavaScript Files at www.typescriptlang.org for more information.
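For example, a `// @ts-check` comment (optionally combined with JSDoc annotations) opts a plain JavaScript file into TypeScript's checker. A minimal sketch:

```js
// @ts-check

/**
 * @param {string} name
 * @returns {string}
 */
function greet(name) {
    return `Hello, ${name}!`;
}

greet(42); // Flagged at edit time: 'number' is not assignable to 'string'.
```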
Regardless of which language you choose, we would love to hear about your experience! You can always find us here on GitHub or on our Slack channel.
Install the ASK CLI v2. If you already have the ASK CLI v1, please see the CLI migration guide.
npm i -g ask-cli
Create a new skill based on the Hello, World! (with Controls) demo skill. When choosing
a hosting method, note that with the new local debugging support you do not need to
deploy your skill to test it end-to-end with a device — the code can simply run on
your local machine, attached to the VS Code debugger. If you are unsure which hosting
method to choose, please see Creating a new skill with the ASK CLI for more information.
If you want an Alexa-hosted skill, you must create the skill through the developer portal.
ask new
> select NodeJS
> choose a skill hosting method (not "Alexa-hosted skills" option)
> select "Hello world (Controls Framework)"
Install the Node.js dependencies for the skill, which includes the Controls Framework for ASK SDK.
cd lambda
npm install
Either deploy your skill with `ask deploy` or set up local debugging (see the section
"Live testing with local debugging"). Once your skill is ready for testing, use the
testing page of the developer portal
(https://developer.amazon.com/alexa/console/ask/test/{SkillID}/development/en_US/) or `ask dialog`
to test it out.
U: Alexa, open Hello Controls
A: Hello! Welcome to your first Controls skill.
See the section "Exploring the Hello World (Controls) skill" for a discussion of how the skill works.
Once you have taken a look around the Hello World skill, we recommend repeating the process with the Fruit Shop demonstration skill to start exploring a more interesting skill. Section "Exploring the Fruit Shop demo skill" provides a discussion.
The Controls Framework produces diagnostic information via the `debug` package (npm). To get maximum detail, set the following environment variable:
export DEBUG="error:*, warn:*, info:*, debug:*"
When running in Visual Studio Code, it is necessary to create a `launch.json` to provide
targets for running the skill, running regression tests, and running the interaction-model
builder. See the `ide/` folder in the sample skills for example launch commands.
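As a sketch, a minimal `launch.json` entry for running the regression tests might look like the following (it assumes mocha is installed under `lambda/` and tests live in `lambda/test`; the launch targets in the samples' `ide/` folder are the authoritative versions):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "node",
            "request": "launch",
            "name": "Launch Tests",
            "cwd": "${workspaceFolder}/lambda",
            "program": "${workspaceFolder}/lambda/node_modules/mocha/bin/_mocha",
            "args": ["test"],
            "env": { "DEBUG": "error:*, warn:*, info:*, debug:*" }
        }
    ]
}
```

Note the `env` block: a launch configuration is a convenient place to set the `DEBUG` variable described above.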
You can run the regression tests for the sample skills at the command line:
cd lambda
npm run test
Alternatively, run the unit tests in VS Code via the `Launch Tests` launch target described
above. In either case, set the `DEBUG` environment variable if you wish to see detailed
diagnostic output.
The Controls Framework provides APIs to create interaction-model builder scripts that
extract data from Controls and merge it with existing `model.json` files, along with custom
programmatic modifications. Each sample skill includes a `build_interaction_model.js`
script that is used to create the associated interaction model files.

The `build_interaction_model.js` script should be run before deploying, and whenever changes
are made to skill code that would affect the interaction model. The output of the script
is a `model.json` file that can be diffed against the existing interaction model and then
copied across. To run the interaction-model builder, invoke the script with Node.js:
cd lambda
node src/build_interaction_model.js
If you wish to launch the builder script in VS Code, see the `ide/` folder for an example
`launch.json` file with suitable targets.
The Hello World (Controls) skill is a minimal skill that uses the Controls Framework,
written in JavaScript. In response to a `LaunchRequest` it says hello and then closes the
session. See the "Getting Started" section for details on how to create a local copy of
the Hello World skill.
index.js:
1. const Alexa = require('ask-sdk-core');
2. const { InputUtil, ControlHandler, ControlManager, Control, LiteralContentAct } = require('ask-sdk-controls');
3.
4. class HelloControl extends Control {
5. canHandle(input) {
6. return InputUtil.isLaunchRequest(input);
7. }
8.
9. handle(input, resultBuilder) {
10. resultBuilder.addAct(new LiteralContentAct(this, {promptFragment: 'Hello, world.'}));
11. resultBuilder.endSession();
12. }
13. }
14.
15. class HelloManager extends ControlManager {
16. createControlTree() { return new HelloControl('rootControl'); }
17. }
18.
19. exports.handler = Alexa.SkillBuilders.custom()
20. .addRequestHandlers(new ControlHandler(new HelloManager()))
21. .lambda();
22. exports.HelloManager = HelloManager;
Points of interest:
- Line 1: The `ask-sdk-core` package is imported as usual.
- Line 2: New types are imported from the Controls Framework package.
- Line 4: Every Control implements the `IControl` interface and typically extends the `Control` abstract base class to obtain default functionality. In this skill there is one control: `HelloControl`.
- Lines 5-7: This block indicates that this control can handle a LaunchRequest. The Controls Framework introduces patterns to dynamically choose between controls if more than one indicates that it can handle an input.
- Lines 9-12: The `handle()` method is executed for a control that can handle an input AND which has been chosen as the winner in cases where multiple controls can handle an input. Much more will be said about this!
- Line 10: The framework separates response logic from rendering. A system act is a simple object representing "something to communicate to the user." In this case the `LiteralContentAct` is used to communicate some arbitrary literal content. The framework defines various system acts that have precise purposes and developers can create new ones too. System acts are an intermediate representation that are converted to a complete `Response` during the rendering phase; in this skill the default rendering logic is sufficient and causes the `LiteralContentAct` to add its literal content to the `Response.outputSpeech`.
- Line 15: A `ControlManager` is a high-level organizer authored by the skill developer. It interfaces directly with the framework's runtime (`ControlHandler`) and takes care of high-level behaviors. One key task is to create the tree of controls at the start of each turn. Here, the control tree comprises exactly one control. Note that every control instance must have an ID. Our control has `id = rootControl`.
- Line 20: The framework provides the `ControlHandler`, which is a `RequestHandler`. The ControlHandler can be added just like any other `RequestHandler`. Skills can freely mix-and-match regular request handlers with control handlers.
- Lines 19-21: In total, these lines create a function that can be used as an entry point by AWS Lambda.
- Line 22: This line exports the HelloManager for use in regression tests.
The Hello World skill needs an interaction model. Skills that use the Controls Framework
will generally build their interaction model using code, so we explore that here. The code
is in `./build_interaction_model.js`.
1. const { ControlInteractionModelGenerator, Logger } = require('ask-sdk-controls');
2.
3. new ControlInteractionModelGenerator()
4. .withInvocationName('hello world')
5. .addIntent({ name: 'AMAZON.StopIntent' })
6. .addIntent({ name: 'AMAZON.NavigateHomeIntent' })
7. .addIntent({ name: 'AMAZON.HelpIntent' })
8. .addIntent({ name: 'AMAZON.CancelIntent' })
9.
10. // Add a custom intent
11. .addIntent({ name: 'HelloIntent', samples: [
12. "Say hello",
13. "Say hi"
14. ]})
15.
16. // Build and write
17. .buildAndWrite('en-US-generated.json');
18.
19. console.log('Done. Wrote ./en-US-generated.json');
The main class is `ControlInteractionModelGenerator`, which provides the methods
necessary to build up an interaction model in memory and then write it to disk. In a skill
with more interesting controls we will make use of
`ControlInteractionModelGenerator.buildCoreModelForControls()` to extract interaction
model content directly from the control tree, but for Hello World that is not necessary. If
you already have a `model.json` file, it can be loaded into the generator with
`ControlInteractionModelGenerator.loadFromFile()`.
On line 17, the output is written. By convention, the output is directed to a temporary file where it can be viewed and compared against the existing interaction model. If everything looks good, the generated file can be copied over as the new interaction model.
cp en-US-generated.json skill-package/interactionModels/custom/en-US.json
The Hello World (Controls) skill can be deployed into AWS Lambda or used directly via local debugging.
The following will create a new skill in your Alexa Developer Portal account and also deploy the code to your AWS account.
ask deploy
After this completes you can test the skill in the Alexa Developer Portal.
If you prefer to use Alexa-hosted skills, please create a hosted skill directly through the Alexa Developer Portal, then update it with the Hello World code and interaction model.
The next section describes running the live skill with no code deployment at all.
To test this skill with the code running locally in Visual Studio Code, please follow the steps outlined in Test your local Alexa skill.
We recommend using local-debugging when doing initial live testing of your skills as it provides fast and easy access to log messages, and the ability to hit breakpoints can prove invaluable.
The Hello World skill, simple as it is, has two(!) regression tests. The tests are in
`lambda/test/test1.js` and a launch configuration is in `/ide/vscode/launch.json`. Select
the "Run tests" target and press `F5` to run the tests. The test code is:
1. const { ControlHandler, SkillTester, waitForDebugger, TestInput } = require("ask-sdk-controls");
2. const { expect } = require("chai");
3. const { suite, test } = require("mocha");
4. const sinon = require("sinon");
5. const { HelloManager } = require("..");
6.
7. waitForDebugger();
8.
9. suite("all", () => {
10. test("launch", async () => {
11. const tester = new SkillTester(new ControlHandler(new HelloManager()));
12. const testResponseObj = await tester.testTurn("U: __", TestInput.launchRequest(), "A: Hello, world.");
13. expect(testResponseObj.response.shouldEndSession).equals(true);
14. });
15.
16. test("launch", async () => {
17. const tester = new SkillTester(new ControlHandler(new HelloManager()));
18. const testResponseObj = await tester.testTurn("U: __", TestInput.of('HelloIntent'), "A: Hello, world.");
19. expect(testResponseObj.response.shouldEndSession).equals(true);
20. });
21.});
Points of interest:
- Lines 1-5: These tests use `mocha`, `chai` and `sinon` for the test harness.
- Line 7: The `waitForDebugger` utility adds a pause to give the Visual Studio Code debugger time to attach. This is not necessary for VS Code version 1.41.1 and above.
- Line 11: The Controls Framework provides testing support. Here a `SkillTester` is initialized with a `ControlHandler` that wraps a `HelloManager`.
- Line 12: The `SkillTester` can run a turn through the skill code and validate that a particular prompt is produced. `SkillTester.testTurn()` takes three parameters: the first is simply for documentation purposes, to help reader comprehension of multi-turn tests. The second parameter is a `ControlInput` that will be the input for the turn; the `TestInput` class has utilities to create the most common kinds of input such as an `IntentRequest`, `LaunchRequest` and APL `UserEvent`. The third parameter is the prompt that the skill is expected to produce.
- Line 13: After testing a turn, further validations can be made by inspecting various objects. Here we ensure that the session is being ended.
- Lines 16-20: This test mimics the situation when a user invokes the skill with "U: Tell Hello World hi!" Line 18 in particular demonstrates creating a simple `ControlInput` that wraps an `IntentRequest`. The expected skill behavior is identical to the first test.
Regression tests of this kind are extremely useful for validating a multi-turn skill and preventing regressions. We recommend a workflow that uses regression tests to validate key scenarios and then live testing to verify integrations and explore new dialog patterns.
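As a sketch of the multi-turn pattern, one `SkillTester` can be reused across turns so that control state serialized at the end of one turn is restored at the start of the next. The manager name, intent and prompts below are hypothetical placeholders for your own skill:

```js
const { ControlHandler, SkillTester, TestInput } = require('ask-sdk-controls');
const { suite, test } = require('mocha');
const { MyShopManager } = require('..'); // hypothetical manager exported by index.js

suite('multi-turn', () => {
    test('launch, then answer the first question', async () => {
        // One SkillTester instance carries session state across both turns.
        const tester = new SkillTester(new ControlHandler(new MyShopManager()));
        await tester.testTurn('U: open my shop', TestInput.launchRequest(), 'A: Welcome. What would you like?');
        await tester.testTurn('U: add bananas', TestInput.of('AddProductIntent'), 'A: OK, bananas. How many?');
    });
});
```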
The Fruit Shop skill uses all the same techniques as Hello World and dials them all up considerably.
The Fruit Shop skill uses the Controls Framework to build a shopping cart experience that demonstrates:
- Reuse and customization of built-in controls, specifically the `ListControl`, `NumberControl` and `DateControl`.
- Building custom controls that provide novel UX and business functionality.
- Using container controls to organize and orchestrate child controls.
- Using APL and voice for input with the `ListControl` and the custom `ShoppingCartControl`.
See the "Getting Started" section for details on how to create a local copy of the Fruit Shop skill. Also, consider skimming this section on a first read, then return to it after reading Section 5. Developing with Controls.
The `index.js` for Fruit Shop introduces some new ideas:
1. class FruitShopControlManager extends ControlManager {
2.
3. createControlTree() {
4. return new FruitShopControl({ id: 'root' });
5. }
6.
7. render(result, input, responseBuilder) {
8. renderActsInSequence(result.acts, input, responseBuilder);
9. if (!responseBuilder.isDisplayUsed()) {
10. if (getSupportedInterfaces(input)['Alexa.Presentation.APL']) {
11. addDefaultScreenAPLDirective();
12. }
13. }
14. }
15. }
16.
17. const controlHandler = new ControlHandler(new FruitShopControlManager());
18. const LocalizationRequestInterceptor = {
19. process(handlerInput) {
20. const locale = getLocale(handlerInput.requestEnvelope);
21. updateI18n(locale);
22. }
23. };
24.
25. exports.handler = SkillBuilders.custom()
26. .addRequestHandlers(controlHandler)
27. .withCustomUserAgent(`FruitShopDemo(JS) ${controlHandler.userAgentInfo()}`)
28. .addRequestInterceptors(LocalizationRequestInterceptor)
29. .lambda();
- Line 4: `FruitShopControlManager.createControlTree()` instantiates a `FruitShopControl`, which internally instantiates child controls and thus creates an entire tree. We could define the entire tree right here in `createControlTree()` but that would break encapsulation; it's better to instantiate only the root control and have it keep the details of any child controls private.
- Line 8: `render()` uses the default logic by calling `renderActsInSequence()` and adds some additional code to ensure that there is always some APL produced for screen-enabled devices. This avoids leaving stale APL on the screen if a response doesn't include any new content.
- Lines 18-23, 28: A `RequestInterceptor` is used to initialize an internationalization module with the user's locale. The `i18n` module is used throughout the code to translate strings.
In total, `FruitShopControlManager.createControlTree()` builds up the following tree:
The four leaf controls are all instances of built-in controls that are customized for their usage in this skill, while the containers are custom controls that are unique to this skill. This is a common pattern as built-in controls are generic and encapsulate domain-agnostic functionality such as "obtain a date from the user" while custom controls provide the higher-level and application-specific behaviors.
Before digging further into the details, let's review the purpose of each control:
- FruitShopControl: The root control takes care of application-wide events, such as `LaunchRequest`. It delegates most inputs to its children.
- ShoppingCartControl: The workhorse of this skill. It manages a shopping cart of items and orchestrates its child controls to produce a coherent experience. It delegates simple inputs to its children.
- CategoryControl: An instance of the reusable `ListControl`, which "obtains from the user an item from a list of options" where the options are defined by a slot type in the interaction model. The `ListControl` natively knows how to present the list to the user with voice and APL, it understands correction intents (e.g. "U: change the category") and it deals with invalid inputs. This particular instance is customized to ask the user to select one item from the `Category` slot type.
- ProductControl: Another `ListControl`, in this case customized to obtain a specific product from the user. The props of this control include dynamic code to adjust the list that is presented depending on the category of products that the user selects.
- ItemCount: An instance of the reusable `NumberControl` to obtain a number from the user while taking care of corrections, validations, and even resolving some common misunderstandings such as Alexa hearing "50" when the user says "15".
- CheckoutContainer: A simple container that doesn't have any business logic of its own. This is included for demonstration purposes.
- DeliveryDateControl: An instance of the reusable `DateControl` to obtain a specific date from the user.
A good way to explore this skill further is to dive into the code and trace through the regression tests with the framework logging turned on. The code has many comments to help explain nuances and patterns.
You can also deploy the skill or hook up local-debugging to test out the user experience with the Skill simulator in the Alexa Developer Portal or on a device.
The first leaf control has the job of obtaining a "Product Category" from the user, i.e.
either "Fruit" or "Vegetables". It is an instance of the built-in `ListControl` and its
definition can be found in `shoppingCartControl.js`.

Almost all the properties on the built-in controls are optional and have sane defaults.
So, the `CategoryControl` is functional with only the following configuration:
Minimal configuration:
1. this.categoryControl = new ListControl(
2. {
3. id: 'category',
4. slotType: 'Category', // The ID of a slot type in the interaction model
5. listItemIDs: ['fruit', 'vegetable'],
6. }
7. );
- Line 1: This control is a customized instance of the `ListControl`, which "obtains a value from the user by presenting a list with voice and APL." The rest of the props for `ListControl` are optional and have defaults that allow the control to be used "out of the box" and then be customized as needed.
- Line 3: Every control has an ID.
- Line 4: A `ListControl` is configured with the name of a slot type that defines the values and synonyms that may be used with the control. By declaring the slot type, this control is able to recognize voice inputs that should be handled.
- Line 5: A `ListControl` needs to know the list of slot value IDs.
As development of the skill proceeded, additional props on the control were customized in order to make the skill more robust and to improve the user experience:
Final configuration:
1. this.categoryControl = new ListControl(
2. {
3. id: 'category',
4. slotType: 'Category',
5. listItemIDs: Constants.categoryIds,
6. validation: (state) => (Constants.categoryIds.includes(state.value) ? true : { reasonCode: 'CategoryIdNotMatch' }),
7. interactionModel: {
8. targets: ['category'],
9. actions: {
10. set: [Strings.Action.Set, Strings.Action.Select, 'add']
11. }
12. },
13. prompts: {
14. requestValue: i18n.t('CATEGORY_CONTROL_REQUEST_VALUE'),
15. valueSet: '',
16. invalidValue: (act) => {
17. if (act.payload.reasonCode === 'CategoryIdNotMatch') {
18. return i18n.t('CATEGORY_CONTROL_CATEGORY_IDS_VALIDATION_FAIL');
19. }
20. return i18n.t('CATEGORY_CONTROL_GENERAL_VALIDATION_FAIL');
21. }
22. },
23. apl: {
24. requestValue: {
25. document: generateProductListDocument()
26. }
27. }
28. }
29. );
- Line 6: Alexa's NLU system does not guarantee that a slot value will be filled with one of the defined values (which can help in situations where the list is only partially known). For this skill we only know how to handle 'fruit' and 'vegetables' and so we add a validation rule to test this.

  Note that the validation function takes the `ListControl` state object as a parameter. This function is only called once `state.value` has been populated. A general rule of all built-in controls is that `state.value` contains the "working value" even if that "working value" doesn't pass validation or other tests. Parent controls can test if the working value is ready for use by calling `control.isReady()`.

  Validation functions either return `true`, meaning valid, or an object describing the failure. The object can contain a code, a literal prompt fragment, or both; we recommend always including a code as it can be identified programmatically in `render` methods if an override is required.

  Validation functions can also be written compactly with the logical-OR operator:

  ```js
  (state) => Constants.categoryIds.includes(state.value) || { reasonCode: '..' }
  ```

- Lines 7-12: Every control has a relationship to the interaction model that regulates which inputs will be handled. Section 5.7. Interaction Model has more information.

  The framework defines various standard `actions` and `targets` with IDs and standard synonyms, and these are used as the defaults. The defaults for the `ListControl` are:

  ```js
  interactionModel: {
      actions: {
          set: [Strings.Action.Set, Strings.Action.Select],
          change: [Strings.Action.Change],
      },
      targets: [Strings.Target.Choice, Strings.Target.It]
  },
  ```

  The CategoryControl adjusts the slot values that will be used in order to customize precisely which utterances will be handled.

- Line 8: Here we see that this control defines `target = "category"` to make the control listen for inputs that have a target-slot with `value = "category"`. This is how this control can tell that it should handle inputs such as "U: Change the category".

- Line 10: The `ListControl` has two capabilities, called "set" and "change". There can be different verbs associated with these capabilities, and so line 10 declares the list of verbs associated with the "set" capability to be "set", "add", "select" and their synonyms. So, for example, if the user says "U: Select fruit please" this control will respond `canHandle = true` and set `this.state.value = "fruit"` during `handle()`. Each item in the array is the ID of a slot value for the Action slot type. New action slot value IDs should be added for semantically different verbs, and synonyms can be added when there are no major semantic differences. So, for example, if we assume the Action slot value with `id="select"` has the synonym "choose", then an utterance "U: I choose fruit" will arrive as `action = "select"` and be recognized by this control as a request to use the "set" capability. All controls test the `action` and `target` of an input to determine whether the input is relevant. If there is any mismatch the control will return `canHandle = false`.

  Ultimately, the aim of the `interactionModel` prop is to ensure the control responds `canHandle = true` for inputs that are relevant and does not attempt to handle inputs that are not relevant.

- Lines 13-22: Every control knows how to communicate with the user. "What" to say is encoded as `SystemActs` and "how" to say it is determined during the render phase. Each control knows about a set of system acts and defines a default prompt/reprompt/APL for each, but skill developers can override the defaults by setting the prompts prop. There are other opportunities to override the prompts too: the `Control.renderAct()` and `ControlManager.render()` methods can be overridden to provide even greater flexibility.

- Line 14: When the control wants to ask the user for a value, it produces a `RequestValueAct`. This line provides a new phrasing and uses the `i18next` library to allow easy localization.

- Line 15: When a `ListControl` obtains a value from the user, it produces a `ValueSetAct` to communicate back to the user that the value was received successfully. By default this system act is rendered as a prompt like "A: OK, {value}", but it isn't always necessary to be this verbose, particularly if subsequent dialog will convey this information.

- Line 16: Override of the prompt to produce when validation fails.

- Line 24: The `ListControl` has a default APL template that it uses to present the list options visually. This control uses a slight variation of the default. See `productListAPL.js` for the custom template definition. The `ListControl` natively listens for APL `UserEvent` inputs that include the appropriate controlID as a parameter. Thus even a custom APL template can be wired up easily to be an interactive input mechanism.
The other leaf controls are configured in a similar manner.
The simplest container control in Fruit Shop is the checkout control, which demonstrates
that creating a container control can be very simple. The default logic on the
`ContainerControl` base class will take care of automatically calling `canHandle` on the
children and delegating to them, and likewise for `canTakeInitiative`.
1. class CheckoutControl extends ContainerControl {
2. constructor(props) {
3. super(props);
4.
5. this.addChild(
6. new DateControl({
7. id: 'deliveryDate',
8. interactionModel: {
9. targets: ['deliveryDate'],
10. },
11. validation: DateControlValidations.FUTURE_DATE_ONLY,
12. prompts: {
13. requestValue: 'When would you like these items delivered?',
14. valueSet: (act) => `Delivery will be on ${act.payload.value}.`
15. }
16. })
17. );
18. }
19. }
- Line 1: All container controls must implement `IContainerControl` and most will do this by extending the `ContainerControl` abstract base class, which has useful default functionality.
- Line 5: Container controls usually instantiate child controls in the constructor because most container controls are much more than just a grouping of controls; they provide higher-level functionality by orchestrating the child controls and adding custom logic. Thus the specific set of child controls is usually an intrinsic feature of a container control.
- Lines 6-16: In this particular case, the `CheckoutControl` only has one child: a `DateControl` that will obtain and validate a date from the user. A useful extension would be to also obtain a payment method from the user, in which case the `CheckoutControl` would get additional child controls and perhaps some orchestration logic.
The most interesting and complex control in the Fruit Shop skill is the
`ShoppingCartControl`, which manages three child controls and the actual shopping cart
items array. It performs various orchestration tasks to keep everything consistent and
ensure a smooth user experience. The `canHandle()` for `ShoppingCartControl` is:
1. class ShoppingCartControl extends ContainerControl {
2. ...
3. async canHandle(input) {
4. return this.isAddItemIntent(input)
5. || this.isAddProductIntent(input)
6. || this.isCheckCartIntent(input)
7. || this.isResponseToWantingAnotherItemQuestion(input)
8. || this.isClickOnCartItem(input)
9. || await this.canHandleByChild(input)
10. || this.isFallback(input);
11. }
12. ...
13. }
- Line 4: The `ShoppingCartControl` doesn't only delegate inputs to its children. It also directly handles higher-level inputs such as `AddItemIntent`, which carries both a product and an item count. This is the general pattern for dealing with intents that comprise many pieces of information: a `ContainerControl` explicitly handles complex inputs and distributes the information as necessary, but delegates simple inputs to the child controls whenever possible.
- Lines 4-9: It is usual to check all the specific things that the container can handle directly before asking the children.
- Line 9: The function `canHandleByChild` is defined on the `ContainerControl` base class. It calls `canHandle` on each child and, if there is more than one that wants to handle an input, it uses `this.decideHandlingChild()` to pick a winner. Note that `this.canHandleByChild()` is an `async` function and so the `await` is mandatory.
- Line 10: Testing for a fallback intent happens last of all because we would prefer that a child handle a `FallbackIntent` if it wants to.
The start of the `ShoppingCartControl.handle()` function is:
1. class ShoppingCartControl extends ContainerControl {
2. ...
3. async handle(input, resultBuilder) {
4. try {
5. // the "core" input handling
6. if (this.handleFunc) {
7. await this.handleFunc(input, resultBuilder);
8. } else {
9. await this.handleByChild(input, resultBuilder);
10. }
11. ...
12. // "post-handling" code.
- Lines 4-10: This is a standard pattern to handle an input in a `ContainerControl`. It assumes that during `canHandle()` we took a memo of either a `handleFunc` or that `canHandleByChild` returned `true`. Thus we can either call the `handleFunc` directly or delegate to the selected child via `this.handleByChild`.
- Lines 12+: The rest of the handle function comprises "things to do after core input handling is complete." One task of this post-handling code in `ShoppingCartControl` is to ensure that the category and product are in sync: e.g. if handling caused the product to change from 'carrots' to 'apples' then the `categoryControl.state.value` must be coerced to 'fruit'. The post-handling code also commits a new record to the shopping cart once both the product control and the number control have a value, AND neither of them has any follow-up questions to ask.
The last container control is the root control, `FruitShopControl`. Its `canHandle` starts
with:
1. async canHandle(input) {
2. if (await this.canHandleByChild(input)) { // NOTE: The await is important (and easy to forget)
3. this.handleFunc = this.handleByChild;
4. return true;
5. }
6. else if (InputUtil.isLaunchRequest(input)) {
7. this.handleFunc = this.handleLaunchRequest;
8. return true;
9. }
10. else if (InputUtil.isSessionEndedRequest(input)) {
11. this.handleFunc = this.handleSessionEndedRequest;
12. return true;
13. }
14. ...
15. }
- Lines 1+: This particular `canHandle` function is implemented with if-else logic as an alternative to the ladder of calls to predicate functions seen in the other containers. As the logic grows more complex we recommend refactoring to the predicate-ladder style for better readability and maintainability.
- Lines 2-5: All container controls should ask if their children can handle the input. In this case we happen to ask the children first; the ordering only matters if there are inputs that can be recognized by both the container and its children.
- Lines 6-13: The root control will often take care of application lifecycle inputs such as a `LaunchRequest` and `SessionEndedRequest`.
- Lines 14+: The root control handles various other intents — see the code for details.
The Fruit Shop skill needs an interaction model, and it is generated programmatically by
`build_interaction_model.js`.
1. new ControlInteractionModelGenerator()
2. .buildCoreModelForControls(new FruitShopControlManager())
3. .withInvocationName('fruit shop')
4.
5. // Mandatory intents
6. .addIntent({ name: 'AMAZON.StopIntent' })
7. .addIntent({ name: 'AMAZON.NavigateHomeIntent' })
8. .addIntent({ name: 'AMAZON.HelpIntent' })
9. .addIntent({ name: 'AMAZON.CancelIntent' })
10. .addIntent({ name: 'AMAZON.YesIntent' })
11. .addIntent({ name: 'AMAZON.NoIntent' })
12. ... more standard intents ...
13.
14. // Domain-specific intents
15. .addIntent({
16. name: 'CheckCartIntent',
17. slots: [{ name: 'cart', type: 'cart'}],
18. samples: ['what items do I have', 'what is in {cart}', ...]
19. })
20. ... AddItemIntent ...
21. ... AddProductIntent ...
22.
23. // Domain-specific slot types
24. .addOrMergeSlotTypes({
25. name: 'Category',
26. values:
27. [
28. { id: 'fruit', ... },
29. { id: 'vegetable', ...}
30. ]
31. })
32. ... Product slot type ...
33. ... cart slot type ...
34.
35. // Add domain-specific targets
36. .addValuesToSlotType('target', {id: 'category', ...})
37. .addValuesToSlotType('target', {id: 'product', ...})
38. .addValuesToSlotType('target', {id: 'count', ...})
39. .addValuesToSlotType('target', {id: 'deliveryDate', ...})
40.
41. // Domain-specific action slot-value.
42. .addOrMergeSlotTypes(
43. {
44. name: 'action',
45. values: [
46. {
47. id: 'add',
48. name: {
49. value: 'add',
50. synonyms: [
51. 'add',
52. 'add more',
53. 'add some',
54. ]
55. }
56. }
57. ]
58. }
59. )
60. .buildAndWrite('en-US-generated.json');
61.
62. log.info('done');
- Line 1: `ControlInteractionModelGenerator` is used to programmatically build an interaction model for a skill that uses controls.
- Line 2: `.buildCoreModelForControls()` walks the control tree and obtains all the interaction model goodies that are pre-defined by the controls. There is quite a lot!
- Lines 6-12: Simple intents are added.
- Lines 15-19: `CheckCartIntent`, `AddItemIntent`, and `AddProductIntent` are domain-specific and will be handled by the `ShoppingCartControl`. The definitions follow the usual JSON encoding for an Intent.
- Lines 24-31: The `Category`, `Product` and `Cart` slot types are domain-specific. `Category` and `Product` are used in auto-generated intents that are created by `ListControl`. `Cart` is used in the `CheckCartIntent`.
- Lines 36-39: The built-in controls use intents that include a `target` slot. This allows them to determine if certain inputs should be handled. For example, the intent object for "U: Change the delivery date" will include `target = deliveryDate` and only the `DeliveryDateControl` will attempt to handle it.
- Lines 42-59: The `CategoryControl` and `ProductControl` both listen for the verb "add", such as in "U: add some fruit" or "U: add bananas". To achieve this, a new value for the Action slot type is created here. To finish the wiring, the interactionModel props for each control mention that "action = add" should be linked to the "set value" capability.
- Lines 60-62: The interaction model is written to disk and a message confirms completion.
The Fruit Shop skill includes regression tests that demonstrate functionality and which
can be used for stepping through the code and observing how things operate. To use them
for learning purposes, mark one at a time with `test.only()` and launch in your IDE. The
file `ide/vscode/launch.json` provides launch targets that can be copied to `.vscode`.
Leaf controls are not generally responsible for complex cross-validations or business functions involving many pieces of information, as that is typically a job best performed by their parent. Furthermore, controls that are intended to be reused in many scenarios should not take a direct dependency on their environment. However, the end developer can make trade-offs as they wish between purity and convenience. For example, it is quite possible to do cross-validation in a NumberControl by peeking at the state of a sibling control. This may be significantly simpler than having a parent perform cross-validations and pushing state updates around. The Fruit Shop skill demonstrates such peeking: specifically, the ProductControl peeks at the CategoryControl to choose which products to recommend.

In general, it is good to think of leaf controls as pure data gatherers that blindly obtain data for their parent to use in business functions. Leaf controls can be pure in the sense that they are truly unaware of their environment, but in the interests of convenience the developer may relax this purity and allow them to peek at external data. The risk of impurity is that controls will become over-coupled and the code will be less comprehensible and maintainable as a result. We recommend being disciplined when relaxing data purity. For example, a rule that any peeking be restricted to sibling controls is a good place to start.
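As a sketch of such disciplined peeking (the product lists and the dynamic-prop shape are illustrative, following the Fruit Shop pattern), a parent can wire the sibling dependency in through a prop's closure so the coupling stays visible in one place:

```js
const { ContainerControl, ListControl } = require('ask-sdk-controls');

class CartControlSketch extends ContainerControl {
    constructor(props) {
        super(props);
        this.categoryControl = new ListControl({
            id: 'category',
            slotType: 'Category',
            listItemIDs: ['fruit', 'vegetable'],
        });
        this.productControl = new ListControl({
            id: 'product',
            slotType: 'Product',
            // Peek at the sibling: offer only products from the chosen category.
            // The closure makes the sibling coupling explicit at construction time.
            listItemIDs: () =>
                this.categoryControl.state.value === 'fruit'
                    ? ['apple', 'banana']
                    : ['carrot', 'onion'],
        });
        this.addChild(this.categoryControl);
        this.addChild(this.productControl);
    }
}
```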
The central concept of Controls is that a skill can be built from controls that encapsulate portions of the skill's state and dialog logic such that controls can be developed independently and then composed together to build an entire skill. The framework provides patterns for building and grouping controls and it defines the rules regarding how controls should interact.
Building a skill using Controls can take a bit of a mental adjustment for developers who
are used to implementing skills with a list of RequestHandlers
. The payoff for this leap
is that dialog-rich skills can be built using structured and extensible patterns. A
further payoff is that general-purpose and customizable controls can be built and shared with
other developers and projects.
Controls collect information, make inferences, resolve confusion/ambiguity, and ask questions. Low-level controls obtain reliable information that higher-level controls can use, and the higher-level controls do interesting things with the information and talk to the user about the results.
Simple controls might only obtain a single item of data but they nonetheless have to do
more than simply capture a slot value. For example, a skill may want to obtain a number
representing the quantity of something. It would make sense to have a `QuantityControl`
that obtains this information and it would have the responsibility to ask for, record,
validate, confirm, update and respond to questions about the quantity. The rest of the
skill can thus be freed from this responsibility and can simply query the
`QuantityControl` to see if the obtained value is ready for use in business functions.
A `QuantityControl` will often run simple dialogs with the user, such as:
A: How many items do you require?
U: Ten
A: Got it, ten items.
However, a robust control will have the capacity to run much more intricate dialogs when
the need arises. For example, a complete `QuantityControl` should be able to run the
following dialog fragment on its own:
A: How many items do you require?
U: What is the maximum?
A: I can order up to 100 items.
U: hmm. ok, I want nine items.
A: Got it, nine items.
U: No, ninety.
A: Sorry! Ninety items - is that correct?
U: Yes.
A: Great.
...
The fact `items = 90` is now available to other controls via `quantityControl.value`.
Other controls know this fact is ready for use because `quantityControl.isReady() = true`.
...
Later on the user asks about 'items ordered' and the input is recognized by the QuantityControl
...
U: How many items did I order?
A: You ordered ninety items.
U: Change it to eighty.
A: I've updated it from ninety to eighty items.
... The fact `items = 80` is now available to other controls ...
Thus controls have a much more specific job than a generic `RequestHandler` — controls
manage information gathering and responses to that information, and by abstracting these
problems away from the rest of the application we can scale skill complexity more reliably.
In full, each control will have some or all of the following responsibilities:
- Ask the user for some information.
- Track the information and any pertinent metadata as dynamic state.
- Validate the information against business rules.
- Provide feedback to the user about what was captured and any violations.
- Explicitly confirm the information if appropriate.
- Run business functions using the obtained information and communicate results.
- React to additional commands regarding the information: change, inspect, delete, and so on.
- Make the information available to other parts of the application, particularly the parent control.
After a control has completed its work, its parent may clear out its state and repeat the process. So, for example, a game loop could be built with a set of controls that continuously obtain and refine a complete "game input" and are then reset to gather the next "game input", as sketched below.
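An illustrative sketch of that loop (the slot type and game logic are hypothetical; `isReady()`, `state.value` and `handleByChild()` are the interfaces described in this guide, and clearing `state.value` is one simple way to reset a child):

```js
const { ContainerControl, ListControl } = require('ask-sdk-controls');

class GameRoundControl extends ContainerControl {
    constructor(props) {
        super(props);
        // Child control that gathers the player's move for the current round.
        this.moveControl = new ListControl({
            id: 'move',
            slotType: 'GameMove', // hypothetical slot type
            listItemIDs: ['rock', 'paper', 'scissors'],
        });
        this.addChild(this.moveControl);
    }

    async handle(input, resultBuilder) {
        await this.handleByChild(input, resultBuilder);
        // Post-handling: once the child holds a validated value, consume it,
        // then clear the child so the loop can gather the next move.
        if (this.moveControl.isReady()) {
            const move = this.moveControl.state.value;
            // ... run the game round with `move` and add a custom system act ...
            this.moveControl.state.value = undefined;
        }
    }
}
```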
Examples of the information that a control may manage:
- A single integer representing a quantity of items.
- The user's gamer ID.
- A complete address.
- A shopping cart of items.
- The dates of a vacation.
- Delivery order details that include shipping method and special instructions.
- The action that a user wishes to take in a game.
To summarize, when building with controls, think first about what business information must be gathered and how that can be decomposed into well-defined chunks. This will motivate the right set of controls and their hierarchy. Only then start to think about what specific inputs they will need to react to.
A control will generally need to handle many different inputs including various intents.
For example, many controls ask the user yes/no questions and so many controls are
listening for `AMAZON.YesIntent` and `AMAZON.NoIntent`. In a skill built with controls,
there is no single "YesIntent handler": every control can be listening for that event if
it wishes. However, controls live in a collaborative environment and they must not be
over-eager to handle an input that isn't appropriate for their current state. So,
for example, if a control has just asked the user a yes/no question it is appropriate for
that control to reply `canHandle = true` if the next input is a `YesIntent`. Other
controls that have not just asked a question should instead respond with `canHandle = false`.
There may be situations where it is reasonable for two or more controls to respond
with `canHandle = true`: in these cases a parent control must pick a winner.
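A sketch of this state-gating (the question-tracking field is our own convention; check the Technical Documentation for `ControlInput`'s exact accessors and the `InputUtil` helpers):

```js
const { Control } = require('ask-sdk-controls');

class ConfirmPurchaseControl extends Control {
    constructor(id) {
        super(id);
        this.state = { activeQuestion: undefined };
    }

    canHandle(input) {
        const request = input.request; // assumption: the underlying Request object
        const isYes = request.type === 'IntentRequest'
            && request.intent.name === 'AMAZON.YesIntent';
        // Only claim the Yes if this control asked the question last turn.
        return isYes && this.state.activeQuestion === 'confirmPurchase';
    }

    handle(input, resultBuilder) {
        this.state.activeQuestion = undefined;
        // ... the purchase is confirmed; run business logic and add system acts ...
    }
}
```

The matching `takeInitiative()` (not shown) would set `this.state.activeQuestion = 'confirmPurchase'` when it asks the question.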
Regarding "what information to collect," low-level controls collect simple things such as
a single date representing a birth date or perhaps the date of an upcoming event. The
built-in DateControl
, for example, is a control that collects a single date from the
user. It normally receives a date via a single AMAZON.DATE
slot but it can be extended
to implement other strategies such as asking for the year, month, and day in sequence. The
DateControl
can validate whether an obtained date is suitable for use and uses inference
to deal with ambiguous dates. For example, if a user answers "A: what date?" with "U:
2019" it may be reasonable to infer they mean Jan-1-2019 whereas in other cases it might
be more appropriate to infer they mean today's date, but for 2019. The DateControl
abstracts and packages logic of this kind so that the parent control and the rest of the
application can be freed from these details.
"Higher-level controls" are typically container controls that orchestrate low-level
controls and implement higher-level logic to manage a more complex piece of information
such as a complete address. The built-in DateRangeControl
is an example of a
higher-level control that is configurable and reusable. The DateRangeControl
delegates
any inputs to do with just the 'startDate' or the 'endDate' to child controls, each of
which is a regular DateControl
. Any input or situation that affect both the start and
end date are managed by the DateRangeControl
itself. The higher-level control provides
the same interface to its parent as the low-level controls. That is, it abstracts away the
fact that child controls are participating. Every parent control should interact with its
children as though they are leaf controls.
An ASK SDK `RequestHandler` receives the incoming `Request`. The framework provides a
runtime (`ControlHandler`) that can be used as a standalone `RequestHandler` or
incorporated into an existing one. The `ControlHandler` drives the processing of each turn
as a sequence of phases:
- Initialization phase: Constructs the tree of controls and deserializes their state.
- CanHandle phase: Determines a specific chain of controls in the tree that will handle the input.
- Handle phase: Walks the handling chain to make state updates, call business functions, and produce results for the user.
- Initiative phase (if needed): Determines a chain of controls to take the initiative and continue the dialog. This phase runs if the handle phase did not take or continue the initiative.
- Render phase: Transforms the results into a complete user-facing response that includes speech, APL or both.
- Shutdown phase: Collects and serializes the state of each control and returns the complete response.
The following diagram shows the high-level runtime flow as a `Request` is consumed and a
`Response` generated.
In this opening phase the `ControlHandler` calls `ControlManager.createControlTree()` and
then reestablishes the state of each control from data in the session attributes. Some
additional preparations are made, such as collating information into a single
`ControlInput` object to reduce the number of objects passed around.
The canHandle phase analyzes the input and skill state to determine the chain of controls
that will handle the input. To begin, the `ControlHandler` calls
`rootControl.canHandle(input)`. The root control will generally check for a delegation
opportunity by querying its children with `this.canHandleByChild()`. If exactly one child
replies `canHandle = true` then the rootControl can simply return `true`, but if more than
one child replies `canHandle = true` then a decision must be made. The `ContainerControl`
abstract base class provides a general pattern to deal with this by defining
`ContainerControl.decideHandlingChild()`, which is called with the list of all candidate
children. The `decideHandlingChild()` function implements the decision logic to deal with
the ambiguity and can be as simple as choosing the first in the list or something more
detailed and context-specific. The default implementation has some general-purpose
behavior which is described in the tech docs for ContainerControl.
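For example, a container could override the decision with a domain-specific tie-break (a sketch; we assume the candidate list and input are passed as shown, and we fall back to the base class for everything else):

```js
const { ContainerControl } = require('ask-sdk-controls');

class MyContainerControl extends ContainerControl {
    decideHandlingChild(candidates, input) {
        // Domain-specific tie-break: prefer the checkout child if it is a
        // candidate; otherwise defer to the default decision logic.
        const checkout = candidates.find((control) => control.id === 'checkout');
        return checkout !== undefined
            ? checkout
            : super.decideHandlingChild(candidates, input);
    }
}
```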
Eventually a result is returned to the `ControlHandler` and passed back to the
`CustomSkill` driver. Assuming the root control returns `canHandle = true`, a
handling chain can be identified by walking the tree and observing the planned
delegations. The handling chain describes which controls will participate during handle
and the delegations they will make. The chain can terminate on an internal container node
if the container is not planning to perform a simple delegation.
The following diagram shows an example that may occur in the Fruit Shop skill for the dialog:
(previous prompt) A: What type of fruit would you like?
(current input) U: Apples.
Assume that on the previous turn the `ProductControl` took the initiative and issued a
`RequestValueAct` that was rendered as "What type of fruit would you like?" When the user
replies with "Apples", the `ProductControl` determines that the user is answering its
question and returns `canHandle = true` to its parent. Before returning, the
`ProductControl` takes a memo that it will use `.setWithValue()` if it does in fact get
chosen to handle the input. The category & item count controls return `canHandle = false`
as they don't recognize intents with the product slot type (and do not have any questions
pending an answer). Given that one child responded positively, the Shopping Cart returns
`canHandle = true` and takes a memo that it will forward to the Product control. In
diagram form:
With diagnostics turned on, this is shown in the log messages as a text diagram with 'H' to indicate the controls in the handling chain:
FruitShop
|= H = cart
| |----- category
| |= H = product
| |----- itemCount
|----- delivery
| |----- deliveryDate
Assuming the control handler returns `canHandle = true`, the `CustomSkill` driver will
call `ControlHandler.handle()` to actually do the work.
During the handle phase the input is passed down to the tail of the handling chain and work is performed as the stack unwinds back to the root. As the call stack unwinds each control in the chain has the opportunity to observe the state changes in its subtree and make any further updates. In this way the work of actually handling an input starts low in the tree and works its way back up until the root control has the final opportunity to make any necessary changes. This pattern allows for small changes to be handled locally while providing the power to perform broad changes when necessary.
After making state changes a control can run business functions such as querying a
database with the updated state. Finally, decisions are made about what to communicate to
the user. "What" to communicate is encoded as `SystemActs` that are added to the
`ControlResultBuilder` object (see `ISystemAct` and the `SystemAct` / `ContentAct` /
`InitiativeAct` abstract base classes). Briefly, system acts are objects that represent
what to communicate to the user but not how to present it; the presentation occurs during
the render phase. See Section 5.4. System Acts for more detail.
Overall, each control in the handling chain will perform the following:
- Delegate to the next control in the chain by calling `child.handle()`. Then, when `child.handle()` returns:
- Update state variables in `this.state`. Containers may also manipulate the state of their children to make everything consistent. This is how sibling controls can be brought/forced into sync after one of them changes state.
- Decide what feedback to give to the user in the form of system acts.
- Run a business function to do something productive for the user. Any application content arising from the business function should be communicated to the user via a custom system act. For example, a weather skill would define and issue a custom `TellWeatherAct` that carries the pertinent information (see the sketch after this list).
- Ask the user a question or otherwise push the dialog forward, by means of a system act.
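A custom system act is typically a small class extending one of the act base classes. A sketch for the hypothetical `TellWeatherAct` (the payload fields are illustrative, and the render signature should be checked against the Technical Documentation):

```js
const { ContentAct } = require('ask-sdk-controls');

// Carries *what* to say (the forecast data), not *how* to say it.
class TellWeatherAct extends ContentAct {
    constructor(control, payload) {
        super(control, payload); // payload e.g. { city: 'Seattle', highTemp: 74 }
    }

    render(input, responseBuilder) {
        // Generic rendering; a control or manager can override this with
        // control-specific or localized phrasing.
        responseBuilder.addPromptFragment(
            `The high in ${this.payload.city} will be ${this.payload.highTemp} degrees.`,
        );
    }
}

// During handle():
//   resultBuilder.addAct(new TellWeatherAct(this, { city: 'Seattle', highTemp: 74 }));
```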
The handling controls are not required to produce a question or other initiative act. However, if any of the handling controls has a useful follow-up, it should add it directly during `handle()` to ensure the dialog stays on topic. If, however, there are no further questions or follow-ups on the active topic, the handling controls should not produce an initiative act. A common pattern is to add the following to the end of each handle method:

```js
async handle(input, resultBuilder) {
    // ... core handling ...
    if (await this.canTakeInitiative(input)) {
        await this.takeInitiative(input, resultBuilder);
    }
}
```
It is a best practice, and a certification requirement, that Alexa drives the conversation
forward on every turn to help the user achieve their goals. However, as noted above, the
controls in the handling chain may not have 'any further questions' during `handle()`,
which brings us to the need for an initiative phase.
The initiative phase runs if the handle phase did not generate an initiative act and its purpose is to identify a control that can take the initiative. Choosing a control to take the initiative follows the same basic shape as the canHandle phase:
- The `ControlHandler` calls `rootControl.canTakeInitiative()`, which either directly replies or queries its children via `this.canTakeInitiativeByChild()`. If multiple child controls want the initiative, a decision is made by `this.decideInitiativeChild(candidates)`. During `canTakeInitiative`, no durable state changes should be made.
- Assuming `rootControl.canTakeInitiative = true`, then `rootControl.takeInitiative()` is called to walk the chain and actually generate an initiative act and make any necessary state changes to track this fact.
When diagnostics are turned up the log will contain diagrams to assist tracing and debugging. The diagrams use "H" to indicate controls in the handling chain, "I" for controls in the initiative chain, and "B" for controls in both.
Consider the following hypothetical dialog:
U: Add some bananas
A: Great! We have lots of them. How many would you like?
[ from acts in Handle phase ] [ from Initiative phase ]
The input for this turn, "U: Add some bananas", is handled by the chain
`FruitShopControl -> ShoppingCartControl -> ProductControl`. In this case the value is
accepted and a `ValueSetAct` is produced. Validation passes and the Product control has no
further questions for the user. Thus, handling the input finishes without an initiative
act and so the initiative phase runs. During the initiative phase the initiative chain is
determined to be `FruitShopControl -> ShoppingCartControl -> ItemCountControl`.
The associated diagnostic diagram shows the cart control being involved in both handling and initiative, the product control being part of handling, and the item count control being part of initiative generation:
FruitShop
|= B = cart
| |----- category
| |= H = product
| |= I = itemCount
|----- delivery
| |----- deliveryDate
At the conclusion of the initiative phase, the `ControlResult` is complete and contains
system acts describing information to be communicated to the user. Unless the session is
ending, the `ControlResult` is expected to contain an initiative act.
It is interesting to note that the initiative control may be very distant in the tree from the handling control. This type of situation can arise if the user takes a wide detour such as in the following:
A: When would you like to travel to Paris?
U: Is the Louvre in Paris?
A: Yes it is! Would you like to hear more about it?
U: No thanks.
A: No problem. When would you like to travel to Paris?
In the last turn, it is reasonable to expect that some `CityInformationControl` handles
the user's input of "no thanks", but since it has no further questions for the user the
initiative phase kicks in. In this case we assume that some `BookingDate` control wins the
right to take the initiative, likely via the default `ContainerControl` logic that biases
toward the most recently active control. The `CityInformationControl` and the
`BookingDate` control might have no common ancestors other than the root control, but that
is no impediment.
The separation of the handling phase from the initiative phase is one of the patterns that allows controls-based skills to be built in a modular fashion while supporting natural flowing dialog as the topic changes.
The input to the render phase is a ControlResult comprising system acts that describe "what" to tell the user. The render phase converts the system acts into user-facing content in the form of a prompt/reprompt and, optionally, APL.
The render phase begins with the ControlHandler
calling ControlManager.render()
. The
render method can use various strategies depending on the situation such as rendering
directly or performing incremental generation. The most common is to perform incremental
generation by rendering each act one-by-one and concatenating their output. The
ControlManager
abstract base class provides this behavior by default. If you are
overriding the ControlManager.render()
function it is common to write logic for special
cases and then process any remaining acts with
ControlManager.renderActsInSequence(remainingActs)
.
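For example, an override might special-case the LaunchAct and hand everything else to the default incremental generation. This is a minimal sketch; the exact render()/renderActsInSequence() signatures and the result.acts accessor are illustrative:

class MyControlManager extends ControlManager {
    async render(result: ControlResult, input: ControlInput, builder: ControlResponseBuilder) {
        const remaining = [];
        for (const act of result.acts) {
            if (act instanceof LaunchAct) {
                // Special case: render the launch act with a custom greeting.
                builder.addPromptFragment('Welcome back to Fruit Shop!');
            } else {
                remaining.push(act);
            }
        }
        // Render all remaining acts one-by-one, concatenating their output.
        await this.renderActsInSequence(remaining, input, builder);
    }
}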
Every system act is associated with a specific control and the renderActsInSequence()
function calls control.renderAct()
on the associated control. This pattern allows
different controls to have different rendering logic for common system acts. So, for
example, a NumberControl
instance may render RequestValueAct
as "How many would you
like?" whereas a ListControl
instance may render the same act as "Would you like apples,
oranges, or bananas?". In cases where the rendering of an act is not control-specific,
control.renderAct()
may delegate to act.render()
and have the act object do the
rendering itself.
Thus there are three levels at which rendering can occur, from most general to most localized:
- controlManager.render() processes the entire list of acts. It chooses whether to process acts as a group, render them one-by-one, or some mixture of the two.
- control.renderAct(act) renders a single act in a control-specific way.
- act.render() renders a single act in a generic way.
See the demonstrations in /test/rendering.spec.ts
for examples and further information
regarding the three different styles of rendering acts.
Regardless of where the transformation occurs, the render process uses a
ControlResponseBuilder
to build up the complete response. ControlResponseBuilder
extends the usual ResponseBuilder
and offers new features such as
responseBuilder.addPromptFragment()
for incremental generation.
The shutdown phase collects the control state and serializes it to session attributes.
This is accomplished by calling control.getSerializableState()
on each control and
transforming the result with JSON.stringify()
. The default implementation of
Control.getSerializableState()
is to simply return the control.state
object which is
assumed to be serializable. If control.state
is not serializable,
getSerializableState()
should be overridden to produce an object that is.
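For instance, a control whose state holds a Map (which JSON.stringify() does not round-trip) might flatten it on the way out. This is a minimal sketch; re-inflating the state on the next turn is handled by the corresponding deserialization hook, which is omitted here:

class TagControl extends Control {
    state = {
        tagsByUser: new Map<string, string[]>(), // not JSON-serializable as-is
    };

    getSerializableState() {
        // Produce a plain object that survives JSON.stringify().
        return {
            tagsByUser: Object.fromEntries(this.state.tagsByUser),
        };
    }
}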
When it is time to close the session with the user a control should call
ControlResultBuilder.endSession()
during handle
or takeInitiative
. To close the
microphone but otherwise keep the session open, use
ControlResultBuilder.enterIdleState()
instead. Either way, the ControlHandler
will
observe this and call Response.withShouldEndSession()
with the appropriate parameter.
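For example, a control wrapping up a completed task might do the following in its handle method (a sketch; the LiteralContentAct payload shape is illustrative):

async handle(input: ControlInput, resultBuilder: ControlResultBuilder) {
    resultBuilder.addAct(
        new LiteralContentAct(this, { promptFragment: 'Thanks, your order is complete. Goodbye!' }),
    );
    resultBuilder.endSession(); // ControlHandler will call withShouldEndSession(true)
}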
If an exception is unhandled during processing it will be captured by the ControlHandler
and forwarded to ControlManager.handleInternalError()
for custom logging or similar; the
default behavior is to log the error details to stderr
at LogLevel = 'error'
.
Regardless of what ControlManager.handleInternalError() does, the error is re-thrown to be caught by the CustomSkill class, which runs any regular CustomSkillErrorHandlers and causes the session to be terminated.
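A custom override might forward the details to a monitoring service before the framework re-throws. This is a sketch: the parameter list shown is illustrative and myMetricsClient is a hypothetical logging client:

class MyControlManager extends ControlManager {
    handleInternalError(input: ControlInput, error: any, responseBuilder: ControlResponseBuilder) {
        // Replace the default stderr logging with structured logging.
        myMetricsClient.recordError('ControlsTurnFailure', {
            message: error?.message,
            stack: error?.stack,
        });
        // No need to re-throw here; the ControlHandler re-throws after this returns.
    }
}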
A system act represents "what" the skill wants to communicate to the user. As such, the system acts are the intermediate representation between the handling phases and rendering.
Every system act comprises:
- A name that describes its general meaning.
- A payload of data that further specifies its precise meaning.
- A reference to a specific control.
These three components, (name, payload, control)
, should together fully describe the
meaning of a SystemAct
instance.
The phrase "system act" comes from the literature on building voice interfaces and there are many different approaches taken to categorizing and specifying the things that a digital assistant can say. In the Controls Framework, we distinguish only two categories of system acts: Content acts and Initiative acts. The Content acts are "plain information" whereas the Initiative acts ask or tell the the user to do something. As discussed in the sections on the runtime phases, the skill must produce initiative on every non-terminal turn and so the initiative acts have special significance. Another difference is that, in a single turn, the skill can produce many content acts but may only produce one initiative act.
A guiding principle when defining system acts for reusable controls is that they should have rigorous and unambiguous semantics. However, this is easier said than done, as new nuances will creep in as the language a control 'speaks' becomes more complex. The best insurance is to give acts very clear names and semantics so that they hopefully stand the test of time.
The name should clearly and succinctly represent the purpose so that other developers can write rendering code and reuse them in other contexts. Always look to see if an existing system act meets a need before defining a new one, but don't use an act that doesn't precisely represent what needs to be communicated — it is better to create a new act in such situations.
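For example, a skill that needs to announce that an order has shipped might define a new content act with a precise name rather than stretching an existing one. This is a sketch: it assumes the framework's ContentAct base class takes the associated control, and the payload shape is illustrative:

class OrderShippedAct extends ContentAct {
    payload: { orderId: string; eta: string };

    constructor(control: Control, payload: { orderId: string; eta: string }) {
        super(control);
        this.payload = payload;
    }

    render(input: ControlInput, responseBuilder: ControlResponseBuilder) {
        // Generic rendering; a control can still specialize via control.renderAct().
        responseBuilder.addPromptFragment(`Your order has shipped and will arrive ${this.payload.eta}.`);
    }
}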
Content acts describe "something to tell the user." A common content act is ValueSetAct
which means "tell the user that a value was received and recorded." The default rendering
is "OK, {value}" but it could equally be rendered as nothing at all. ValueSetAct
is used
to provide feedback to the user so that they are aware things are going well and can also
provide "implicit confirmation" if the rendered prompt includes the value itself. Note the
that act is not called ImplicitFeedbackAct because there are many different ways to
perform that dialog function and the rendering of an act has a bearing on what precisely
it is accomplishing. The name "ValueSetAct" is intended to convey clearly the simple fact
that a value was obtained and recorded.
The LiteralContentAct
is a special catch-all act that is free of semantics. It simply
transfers some literal content which is helpful in situations where creating a new
well-defined act is too much.
The content acts defined by the Framework controls include:
- ValueSetAct: A value was obtained from the user and recorded.
- ValueChangedAct: A value was obtained from the user and it changed a previously recorded value.
- ValueAddedAct: A value was obtained from the user from a list of choices and recorded.
- ValueRemovedAct: A value was obtained from the user and removed from the recorded list.
- ValueClearedAct: The skill cleared all its recorded values.
- ValueConfirmedAct: The skill received a 'yes' in response to a confirmation question or otherwise learned that a value being tracked meets the user's expectations.
- ValueDisconfirmedAct: The skill received a 'no' in response to a confirmation question or otherwise has learned that a value being tracked is not what the user wants.
- AcknowledgeInputAct: The input was heard and made sense.
- InvalidValueAct: A value the system is tracking failed validation.
- InvalidRemoveValueAct: The input value provided to remove from recorded list is invalid.
- LaunchAct: The skill has just launched.
- LiteralContentAct: A catch-all. This can be used to transfer literal content without defining a new act for every unique need.
- NonUnderstandingAct: The skill does not understand what the user said. The input cannot be used.
Initiative acts "ask the user a question" or "suggest what the user can/should do next." These acts propel the dialog forward and give the user a strong sense of engagement.
As with content acts, the initiative acts have factual semantics and do not attempt to provide an ontology for dialog functions or otherwise provide coverage of all possible needs. The built-in initiative acts are sufficient for the built-in controls to function and can be reused in other contexts. Developers should create new initiative acts if none of the built-ins are a good match for their need.
The special LiteralInitiativeAct
is a catch-all with no semantics that can be used when
creating a new act is not warranted.
The initiative acts defined by the framework include:
- RequestValueAct: Ask the user for a value with an open question, e.g. "When is your event?"
- RequestChangedValueAct: Ask the user for a new value with an open question.
- RequestValueByListAct: Ask the user for a value by presenting them a list.
- RequestRemovedValueByListAct: Ask the user for a value to be removed by presenting them a list.
- RequestChangedValueByListAct: Ask the user for a new value by presenting them a list.
- ConfirmValueAct: Ask the user if a value has been captured correctly, e.g. "Was that five?"
- SuggestValueAct: Suggest a specific value by means of a yes/no question, e.g. "Is the table for five?"
- LiteralInitiativeAct: A catch-all with no specific semantics that simply represents some literal content.
- SuggestActionAct: Suggest specific actions supported by the control.
Controls need to track state regarding the data the user has provided and the context that is necessary to interpret subsequent rounds of input. Controls should be prepared for things to change outside of their direct ownership. A general design to help with this is for controls to track only 'cold hard facts' in their state, facts that are not subject to arbitrary change.
Good facts to track in control.state (a state sketch combining these conventions appears after the lists below):
- The best available value for the information the control is managing. By convention this should be stored in Control.state.value. Also by convention, the information in state.value is only considered ready for use if Control.isReady() = true. In the case of higher-level controls, the control should collate the complete information in state.value or offer a .getValue() function so that other controls can remain fully unaware whether any child controls are involved in the implementation.
- The last question posed to the user. Often stored as Control.state.activeInitiativeAct or similar.
- Data about the user's attention, such as the turn number on which the control last communicated to the user and/or asked the user a question. Information of this kind can be used to restate a question to refresh the user's memory. The current turn number is made available via ControlInput.turnNumber.
- How .state.value was obtained: from the user, from a personalization store, or inferred. This may have a bearing on what type of confirmations are appropriate.
Things that should not be tracked in control.state:
- Anything that can be recomputed. For example:
  - Whether information in .state.value is valid. This should be recomputed on demand as it may change if the validation functions refer to non-local data.
  - Whether information is ready for use in business functions. This may change if the user takes initiative and interacts with other controls.
- Notes about what to do on the next turn. This should be re-computable from simple facts, and there is no guarantee that a user will follow Alexa's lead.
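Putting the 'good facts' conventions together, a state type for a date-collecting control might look like the following sketch. Every property name other than value is a convention rather than a framework requirement:

interface BookingDateControlState {
    value?: string;                    // best available date; use only when isReady() is true
    activeInitiativeAct?: string;      // name of the last question posed, e.g. 'ConfirmValueAct'
    lastInitiativeTurnNumber?: number; // from ControlInput.turnNumber; used to detect stale questions
    valueSource?: 'user' | 'profile' | 'inferred'; // may determine which confirmations are appropriate
}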
Some controls obtain and clarify information from the user but do not actually do anything with that information. Controls of this kind make the information available to their parent controls and might publish the information to a central data store.
The Fruit Shop skill demonstrates patterns of data sharing from child to parent. The
ShoppingCartControl constructor creates three child controls and stashes them as
properties for easy access in the rest of the code. The child controls are added to the
this.children
array via this.addChild()
.
class ShoppingCartControl extends ContainerControl {
    constructor(props) {
        super(props);
        this.state = new ShoppingCartControlState();
        this.categoryControl = new ListControl(...props...);
        this.productControl = new ListControl(...props...);
        this.itemCountControl = new NumberControl(...props...);
        this.addChild(this.categoryControl)
            .addChild(this.productControl)
            .addChild(this.itemCountControl);
    }
}
In ShoppingCartControl.handle()
the state updates are processed and then the state of
the children is observed. The following block observes whether a value has been obtained
for both the product and the count -- if they are both available then we can add the item
to the cart, tell the user that this has occurred with an ItemAddedAct
, and clear out
the input controls.
async handle(input, resultBuilder) {
    ...
    // Delegate input to the appropriate child,
    // then see if questions are being asked and whether we have all the necessary data.
    if (resultBuilder.hasInitiativeAct()) {
        return;
    }
    if (this.productControl.state.value !== undefined
        && this.itemCountControl.state.value !== undefined) {
        // Add a new (product, count) pair to the cart.
    }
}
Note that we first check whether an initiative act was produced during the delegation to the child. If so, the child's data is not yet ready (it is still clarifying something with the user).
The Controls Framework does not provide a centralized data store but you can find the
state for every control via ControlInput.controls
. A centralized store built on
Redux or similar may prove useful for sharing information in large
skills.
The state information is made available to regular RequestHandlers
via
sessionAttributes[ControlHandler.attributeNameState].{controlId}
. State should only be
looked up this way before or after ControlManager.handle()
runs as during execution the
states may be changing.
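For example, a regular intent handler might peek at a control's state like this. This is a sketch: 'dinerCount' is a hypothetical control ID, and depending on how the attributes were serialized a JSON.parse() step may be required:

const SummaryIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getIntentName(handlerInput.requestEnvelope) === 'SummaryIntent';
    },
    handle(handlerInput) {
        const sessionAttributes = handlerInput.attributesManager.getSessionAttributes();
        const controlStates = sessionAttributes[ControlHandler.attributeNameState];
        const dinerCount = controlStates['dinerCount'].value;
        return handlerInput.responseBuilder
            .speak(`So far, the party size is ${dinerCount}.`)
            .getResponse();
    },
};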
During the canHandle phase, each control must determine if an input 'makes sense' to the
control given its purpose and current state. Container controls may want to handle the
input directly but otherwise they will usually ask their children via
this.canHandleByChild()
.
Determining if an input makes sense to a control can involve a good amount of logic: does it recognize the intent or APL event? Do the details of the input match the control's expectations (slot values, APL token, etc.)? Does the input seem relevant given the control's state? Thus it is important to have coding patterns to manage this complexity.
Ultimately, canHandle is a function (input, state) -> (boolean, handleFunc) and while there are many ways to encode such a function, it quickly becomes apparent that there can be both many inputs and many states and that the function isn't particularly smooth. As a result, encoding the function in a developer-friendly manner can be challenging. Advanced cases may encode the canHandle function using a neural net or similar ML technique, but a lot can be done with explicit logic.
We recommend starting by implementing an explicit hierarchical decision tree out of predicate functions. The top level of functions define situations that are relevant to the control. A situation may have sub-situations, which are encoded in additional layers of predicate functions that drill down. At the lowest level, the predicates are usually simple statements about the input details or the control state.
The decision-tree approach employed by the built-in controls provides a clean structure and keeps the code maintainable.
By way of example, the ListControl
has the following:
canHandle(input: ControlInput): boolean {
    return this.isSetWithValue(input)
        || this.isChangeWithValue(input)
        || this.isOrdinalScreenEvent(input)
        || this.isOrdinalSelection(input);
    // ... (additional predicates elided)
}
Each of the predicates tests whether a certain situation is present and if so the
predicate returns true. The ListControl.isSetWithValue()
predicate code is an example of
a bottom-layer predicate function.
isSetWithValue(input: ControlInput): boolean {
    try {
        okIf(isExpectedIntent(...));
        const { feedback, action, target, values, valueType } = unpackValueControlIntent(
            (input.request as IntentRequest).intent,
        );
        const valueStr = values[0].slotValue;
        okIf(InputUtil.targetIsMatchOrUndefined(target, this.props.interactionModel.targets));
        okIf(InputUtil.valueTypeMatch(valueType, this.getSlotTypes()));
        okIf(InputUtil.valueStrDefined(valueStr));
        okIf(InputUtil.feedbackIsMatchOrUndefined(feedback, FEEDBACK_TYPES));
        okIf(InputUtil.actionIsMatch(action, this.props.interactionModel.actions.set));
        this.handleFunc = this.handleSetWithValue;
        return true;
    } catch (e) {
        return falseIfGuardFailed(e);
    }
}
The exact predicate functions being called are not critical at this time; what matters is that isSetWithValue() encodes a linear sequence of low-level predicates while allowing for arbitrary code between the predicates. The okIf utility does nothing if the guard expression is true, but otherwise it throws a GuardFailed exception. Using exceptions in this way allows for clean linear code, and the cost of using exceptions is trivial.
Also note that the second-to-last line of the success path records a memo of the handle function that will be called if this control is selected to be the handling control.
We have found this to be a maintainable approach to encoding explicit canHandle logic. It is also amenable to gradual adoption of ML techniques: fixed logic can be swapped out for machine-learned functions internally, and subjective decisions, such as how to pick a dialog strategy in a given situation, can be exposed as functions to be customized by the user of the control or hooked up to a policy-learning solution.
Note again that canHandle functions must not make any durable state changes because there are often situations where multiple controls reply canHandle = true and the parent decides a winner. Updating this.handleFunc is not considered a durable state change because it is not part of this.state and will not be kept after the turn completes.
At the end of the canHandle phase, a chain of controls have been identified to participate in the handling phase. Each control in the chain delegates to the next in the chain, and subsequently observes the result and performs additional work if necessary.
A common pattern is demonstrated by the DateRangeControl
:
async handle(input, resultBuilder) {
    // If a handleFunc memo exists, it will do all the work.
    if (this.handleFunc) {
        return this.handleFunc(input, resultBuilder);
    }
    // Otherwise, delegate handling to the child...
    await this.handleByChild(input, resultBuilder);
    // ... then do some post-delegation work.
    const newStartDate = this.getStartDateFromChild();
    const newEndDate = this.getEndDateFromChild();
    this.setStartDate(newStartDate);
    this.setEndDate(newEndDate);
    this.state.onFocus = false;
    // After the child handles the request, if the child asked no question,
    // check whether DateRangeControl itself wants to take the initiative.
    if (!resultBuilder.hasInitiativeAct() && await this.canTakeInitiative(input)) {
        if (this.takeInitiativeFunc !== undefined) {
            this.takeInitiativeFunc(input, resultBuilder);
        }
    }
}
Recall that the initiative phase only runs if the handle phase did not produce any initiative, and its purpose is to select some control to take over. The control that ultimately wins will often be a sibling of the handling control, but it could legally be any control in the tree. The search for a control to take the initiative is a major mechanism by which the skill can "move to the next topic" and/or "return to previous topics" without rigid coupling in the code.
During canTakeInitiative
, each control must determine if it would like to take over the
conversation; a control should do this if it has not yet fulfilled its mission and it
should refrain if its mission is complete.
The NumberControl, for example, has the mission of obtaining a number from the user and ensuring it passes validation and confirmation. If the props for the NumberControl include required: false, or a function that evaluates to false, then the NumberControl does not actually have to obtain a number at all. So the NumberControl offers to take the initiative if:
- props.required = true and a value has not yet been obtained, or
- the obtained value fails validation, or
- props.confirmationRequired = true and the obtained value has not been positively confirmed with the user.
This is encoded as:
class NumberControl {
canTakeInitiative(input: ControlInput): boolean {
return (
this.wantsToElicitValue(input) ||
this.wantsToRequestReplacementForInvalidValue(input) ||
this.wantsToConfirmValue(input)
);
}
}
Container controls typically consult their children via .canTakeInitiativeByChild()
and
determine a winner amongst any candidates with .decideInitiativeChild(candidates)
. The
default logic in ContainerControl.decideInitiativeChild
will bias to the child that most
recently had the initiative, but otherwise it just picks the first candidate. This will be
sufficient for many situations and can be overridden as needed. When applied recursively
by all containers, this default logic can be summarized as "choose the next control in the
natural tree ordering, or the most recently active control, or an arbitrary control."
The actual coding of canTakeInitiative is essentially identical to canHandle and typically simpler as there is no new input to consider. The canTakeInitiative function is a mapping of (input, state) -> (boolean, takeInitiativeFunc) and can be implemented in many ways; a good place to start is to encode a decision tree as described in the section on canHandle. Later, if necessary, some predicates might be replaced with machine-learned functions, and subjective choices about what strategy to take (i.e. which initiative function to run) could defer to a learned model.
The output of calling rootControl.canTakeInitiative is a boolean; if true, we can walk the container controls and observe their .selectedInitiativeChild to extract the initiative chain that will be traversed during the call to rootControl.takeInitiative.
The task of control.takeInitiative()
is to generate an initiative act and add it to the
ControlResult
. The process is very similar to handle()
and typically simpler as there
is no input to consume and hence far fewer state changes to make. It is often sufficient
to add an initiative act to the result and simply record that this was done in
state.activeInitiativeAct
or similar. Tracking the last initiative act provides
sufficient information to react on subsequent turns. For example, if a yes/no confirmation
question is asked via result.addAct(new ConfirmValueAct(...)), the control should record this and only report canHandle = true for an AMAZON.YesIntent / AMAZON.NoIntent in subsequent turns if the control has actually asked a yes/no question.
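A minimal takeInitiative following this pattern might look like the following sketch (the state fields and the act payload are conventions/illustrative):

takeInitiative(input: ControlInput, resultBuilder: ControlResultBuilder) {
    // Ask the user to confirm the current value.
    resultBuilder.addAct(new ConfirmValueAct(this, { value: this.state.value }));

    // Record what was asked so that canHandle will accept a bare
    // AMAZON.YesIntent / AMAZON.NoIntent on the next turn.
    this.state.activeInitiativeAct = 'ConfirmValueAct';
    this.state.lastInitiativeTurnNumber = input.turnNumber;
}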
In large skills and advanced scenarios, the value of the turn counter can be stored and
used to determine if a previously asked question was so far in the history that it should
be treated as stale and discarded. The built-in ContainerControl
tracks information of
this kind but does not use it in the default logic.
Control.renderAct(act, responseBuilder) is optionally called by ControlManager.render(result) with the task of appending information to the responseBuilder to help build up a complete response. For example, an address-obtaining control might implement renderAct as follows:
class CustomAddressControl {
    renderAct(act, input, responseBuilder) {
        if (act instanceof RequestValueAct) {
            responseBuilder.addPromptFragment('What is your address?');
            responseBuilder.addRepromptFragment('Could you please tell me your address?');
        }
        ...
    }
}
If there is no need to specialize the response for a system act by control, it is simplest to let the act do its own rendering. This is often a useful pattern when a custom system act is used in only one control of an application. To achieve this, simply call act.render(input, responseBuilder). The Control abstract base class implements this as the default for its renderAct(), and so a class that extends Control gets this 'delegate to the act' behavior for free.
A container control has child controls and is responsible for them. This responsibility is absolute and forms a strict hierarchy of command from the root to the leaves. Under this pattern every container control must ensure that after every call to handle and takeInitiative the control and its children are in a consistent state. To an outside observer, a parent control with many children should be indistinguishable from a single control: the contract and expectations are identical. This strictness provides a clean abstraction and allows container controls to perform any manipulations they wish on their children.
When delegating to a child control, there is sometimes a decision to be made between:
- delegating, then patching up the Result, and
- directly manipulating the child and preparing the Result directly.
We recommend the second style as it is more direct and more comprehensible to a reader and
because it is simpler to follow during debugging. Note that when using the second style it
is perfectly legal for a container control to add a SystemAct
to the ControlResult
that refers to a child control. The container may do whatever it wants given that it takes
full responsibility.
Controls should only serialize their business state. Any ephemeral data, such as memos taken during canHandle and canTakeInitiative, should not be serialized; this is easily implemented by storing the memos in properties outside of control.state.
Controls should serialize the minimal information required to re-establish themselves on
the next turn. For example, if a system act is recorded in the state it should record only
the ID of the control it is associated with and not the complete Control
object. Logic
of this kind is implemented as the default in ContainerControl
.
The interaction model is a critical part of any Alexa skill as it defines the language that customers can use to interact with your skill. The Controls Framework uses the regular Alexa interaction model and introduces new patterns to standardize the Intents and Slots used. The primary reason to do this is to support independently developed components to work together. A secondary reason is simply to share best practices around interaction model design.
The built-in Controls can generate the interaction model that they need to function. All the built-ins follow the patterns and thus avoid any utterance conflicts. Custom controls should take these patterns into account, or else they may introduce conflicts and break the behavior of built-in controls.
The most important concept is that Controls must agree on the intents that will be used to represent common utterances. So, for example, if two independent controls need to understand the phrase "U: yes, the second one" then they must agree on the intent that will match it, or else there will be utterance conflicts and the controls will not see the inputs they expect.
As we will see, the phrase "U: yes, the second one" will be recognized by the common
AMAZON_ORDINAL_ValueControlIntent
and all Controls that react to that phrase should
expect this intent. The built-in ListControl not only listens for this intent but it can
generate it and add it to the interaction model automatically!
To see the problem more clearly, imagine some custom control that wants to react to the
phrase "U: yes, the second one" and does so by listening for a CustomOrdinalIntent
and
the developer manually creates that intent in the interaction model. If both
CustomOrdinalIntent
and AMAZON_ORDINAL_ValueControlIntent
are in the model together
then there will be utterance conflicts and NLU will randomly select one or the other when
the user gives the magic phrase. This can break one or both controls as they will not
observe the inputs they need to function. There are two solutions:
- The developer can change their custom control to expect the AMAZON_ORDINAL_ValueControlIntent and ensure it is added to the interaction model. This solution ensures only one intent in the model will match the phrase.
- The developer can make sure that both their custom control and all instances of built-in controls will react appropriately to both intents. This is more work than the first solution and so it is less interesting. The tricky bit is teaching a built-in control about an intent it knows nothing about: see the inputHandling.customHandlingFuncs prop on the built-in controls for the standard solution.
The first convention is that the AMAZON.* intents are part of the standard schema except where they cause unacceptable utterance conflicts. At this time, only the AMAZON.NextIntent cannot be used, as it captures many utterances such as "change that"; this is overly broad and thus limits its usability.
- All AMAZON.* intents except AMAZON.NextIntent are part of the standard schema. Any utterance that can be captured by an AMAZON.* intent should be.
The existing set of AMAZON.* intents is not sufficient for building skills as they do not carry enough information and they are not generic enough. To build a robust voice interface we need to support utterances like the following:
U: Yes, two apples.
U: No, I said three.
U: Can you change the time please?
U: I need to update that.
U: Put it on my visa.
U: Put apples into the bag.
U: I'm not sure but I'm thinking Tuesday
- etc.
The second convention is to introduce new intent shapes that capture these types of utterances. These new intents are called "Control Intents", which simply indicates that they follow these rules:
- Phrases such as "yes", "no", "I'm not sure" that are the major component of an answer to a question are captured by a slot called "feedback".
- Phrases such as "change", "select", "print" that indicate what to do are captured by a slot called "action".
- Phrases such as "the time" and "my visa" that indicate what to act upon are captured by a slot called "target".
- Phrases such as 'two', 'apples', and 'tomorrow' that indicate a value are captured by slots that are named with the SlotType that they capture. If there is more than one slot for a given type, they are named with suffix ".a", ".b" and so on.
- Additional slots are defined to help with fine-grained understanding. For example, a slot called "preposition" is defined to help distinguish between "into", "onto", etc. This fine-grained information may not be used, but it is preferable to capture slots rather than to lose the information.
The Feedback slot type represents things the user says to Alexa about "her recent statements". That is, the user is giving feedback. For example "no, that's not right" is a feedback utterance, as is "awesome, ...". The built-in feedback slot values are:
- builtin_affirm: "Agree". Synonyms: "yes", "great", etc.
- builtin_disaffirm: "Disagree". Synonyms: "no", "oh no", "ugh", etc.
More standard feedback values may be added in time, and developers can and should add additional feedback slot values.
The Action slot type captures the primary verb of an input. This allows the user to mention what they are asking Alexa 'to do'. For example, "delete" and "deploy" should be captured as actions. Controls in the skill know the actions that they can perform and they will not attempt to handle utterances containing unknown actions.
The built-in action slot values are:
- builtin_set: "Provide/Set a value". Synonyms: "set", "assign", "make it", "should be", etc.
- builtin_change: "Change a value". Synonyms: "change", "update", "make it", etc.
- builtin_select: "Select a value". Synonyms: "select", "pick", "take", etc.
More standard action values may be added in time, and developers can and should add additional action slot values.
The Target
slot captures the primary noun of an input. The user will not always mention
a target but when they do it is extremely useful information. Controls know what targets
they are associated with and they will not attempt to handle inputs that mention an
unknown target.
The built-in target slot values are:
- builtin_it: "The anaphoric noun of the conversation". Synonyms: "it", "that", etc.
- builtin_date: "The date of something". Synonyms: "date", "the date", "day", etc.
- builtin_startDate: "The start date of something". Synonyms: "start date", "the start date", etc.
- builtin_number: "The number of something". Synonyms: "the number", etc.
- and various others. See
Strings.Target
for the full list of built-in targets.
More standard target values may be added in time, and developers can and should add additional target slot values. It is very common for Control instances to need new target values to indicate "what the control is capturing". For example, if a NumberControl will be used to collect the number of tickets for an event, it should have at least one custom target slot value so that the user can be understood when they say things like "U: I need to change the number of tickets".
Example:
{
"interactionModel": {
"languageModel": {
"types": [
{
"name": "target",
"values": [
...,
{
"id": "eventTicketCount",
"name": {
"value": "eventTicketCount",
"synonyms": [
"number of tickets",
"the number of tickets",
"tickets",
"tickets for the game"
]
}
}
]
},
...
Robust skills must deal with target ambiguity. For example, a skill might want to capture the user's birth date and also the date of their birthday celebration. This can be solved by having one DateControl for each piece of data and assigning them both generic and specific targets:
- DateControl for birth date:
Targets: builtin_it, builtin_date, birthDate
- DateControl for birthday event:
Targets: builtin_it, builtin_date, eventDate, birthdayEventDate, celebrationDate
Note that some targets are the same whereas others differ. Thus if the user says "U: I need you to change the date" we can expect both controls to report canHandle = true as they are both managing some data to do with builtin_date = "the date". This is a situation with ambiguity and it will be the responsibility of their parent controls to decide a winner. A common solution will be to choose the control that most recently 'spoke' to the user, but perhaps different logic will apply. (Note that the built-in ContainerControl has default logic to track its most recently active child.)
However, if the utterance is "I need you to change the date of my celebration" then only
the second Control will report canHandle = true
and the parents will not have to resolve
any ambiguity. Targets thus give the user the power to precisely describe what they are
talking about, if and when they like.
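The configuration for the two controls above might look like the following sketch (the prop shapes follow the built-in controls' interactionModel.targets prop; the IDs are illustrative):

const birthDateControl = new DateControl({
    id: 'birthDate',
    interactionModel: {
        targets: ['builtin_it', 'builtin_date', 'birthDate'],
    },
});

const birthdayEventDateControl = new DateControl({
    id: 'birthdayEventDate',
    interactionModel: {
        targets: ['builtin_it', 'builtin_date', 'eventDate', 'birthdayEventDate', 'celebrationDate'],
    },
});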
These three slot types provide a compact way to include many common "carrier phrases" in the sample utterances. For example the phrase "Could you please ..." is a standard value for the Head slot along with many others. As such a sample utterance like "{head} {action} {target}" will be able to match the utterance "U: Could you please change the time".
The head, tail and preposition slots are "mostly content free" in the sense that the specific value that a user matches is not used in dialog logic. However, if it proves necessary to distinguish the values, that can be accommodated by virtue of capturing the phrase as a slot value.
The GeneralControlIntent
defines the most general purpose common utterance shapes (See
systemResource.en.translation.GENERAL_CONTROL_INTENT_SAMPLES
). A selection of the sample
utterances are:
Utterance 'shape' definition Sample user utterance w/ slot-matching
---------------------------- ---------------------------------------
"{feedback} {action}", "{Yes}, {delete}"
"{feedback} {action} {target}", "{No}, {change} {the delivery date}"
"{feedback} {tail}", "{Yes}, {thanks}"
"{feedback} {action} {tail}", "{Yes} {review} {would be great}"
"{feedback} {action} {target} {tail}", "{Yes} {change} {the delivery date} {please}"
"{action} {target}", "{change} {start date}"
"{head} {action}", "{just} {pause}"
"{head} {action} {target}", "{just} {delete} {it}"
"{action} {tail}", "{delete} {is correct}"
"{action} {target} {tail}", "{update} {my address} {please}"
"{head} {action} {tail}", "{go ahead and} {delete} {thanks}"
"{head} {action} {target} {tail}", "{go ahead and} {delete} {the item} {thanks}"
Note that this short set of sample utterance definitions can capture a wide variety of user utterances by means of the feedback, action, target, head, and tail slot types. This means that individual controls can simply define new targets, etc., and listen for the GeneralControlIntent. The built-in controls, such as ListControl, allow the developer to customize the targets associated with the control so that each ListControl instance can be directly targeted by user utterances.
The built-in controls allow the developer to define the actions and targets that should be
recognized by means of props. For example, a NumberControl
that collects the number of
diners for a reservation would be configured as:
1. new NumberControl(
2. {
3. id: 'dinerCount',
4. interactionModel: {
5. targets: ['builtin_it', 'count', 'dinerCount', 'attendees', 'partySize']
6. }
7. }
8. );
- Line 5: Each of the target entries is the ID of a value item in the Target slot type. This line declares that any ControlIntent that has its target slot filled with one of those IDs should be considered relevant to this control. Likewise, if the target slot is filled with anything else, this control will assume it is not relevant.
Actions represent the verbs that are relevant to a control. By default a NumberControl only knows about the set and change actions, and they are associated with the set and change capabilities of the skill. Let's imagine a restaurant skill wants to understand user utterances like "U: Increase the party size". By default, the word "increase" is not understood by the dinerCount control. To fix this, we decide that the verb "increase" should be associated with the change capability of the NumberControl (as it doesn't actually have an "increase capability"). We then need to decide if the word "increase" should be treated as an exact synonym of "change" or whether it should have its own identity. Our general guidance is to define new actions/targets when the semantics seem meaningfully different. So, we update the definition to:
1. new NumberControl(
2. {
3. id: 'dinerCount',
4. interactionModel: {
5. targets: ['builtin_it', 'count', 'partySize'],
6. actions: {
7. change: ['builtin_change', 'increase']
8. }
9. }
10. }
11. );
- Line 7: Note that when overriding a prop we must specify all the action IDs, including the built-ins that we wish to keep. Line 7 specifies that an Action slot value with id = builtin_change | increase should be considered a match for the "change capability" of this Control. As a result, the utterance "U: Increase the party size" will be interpreted as (action=increase, target=partySize) and recognized by this control as an attempt to change the control's value. The control will thus report canHandle = true unless its internal state makes that behavior inadmissible.
To complete the skill, the novel targets and actions must be added to the interaction model. If the decision is made to treat new words as exact synonyms for existing built-ins, that information must also be added to the interaction model.
For a skill that uses built-in controls the bulk of the interaction model can be produced by the following script:
new ControlInteractionModelGenerator()
.buildCoreModelForControls(new MyControlManager())
.withInvocationName('my invocation phrase')
.buildAndWrite('en-US-generated.json');
console.log('done.');
The output file will include all the intents, slot types, and utterance shapes that are needed to run straightforward interactions. The actual prompt fragments will use their default localized strings for the en-US locale.
The auto-generated interaction model will not have any of the custom data that is necessary for custom actions, targets, and so on. This additional data can be added by extending the interaction-model building script.
Example:
1. new ControlInteractionModelGenerator()
2. .withInvocationName('my invocation phrase')
3.
4. .addValuesToSlotType('target', {
5. id: 'partySize',
6. name: {
7. value: 'party size',
8. synonyms: ['number of folks attending', 'people joining us', 'attendees' ]
9. }
10. })
11.
12.
13. .addValuesToSlotType('action', {
14. id: 'increase',
15. name: {
16. value: 'increase',
17. synonyms: ['more', 'be more', 'additional' ]
18. }
19. })
20. .buildCoreModelForControls(new MyControlManager())
21. .buildAndWrite('en-US-generated.json');
22. console.log('done.');
- Lines 4-10: Adds a new target slot value with id='partySize'. The associated words can now be understood in any sample utterance that uses the {target} slot.
- Lines 13-19: Adds a new action slot value with id='increase'. The associated words can now be understood in any sample utterance that uses the {action} slot.
As user-testing uncovers utterance coverage gaps the auto-generated interaction model can also be extended and amended via code.
The standard ControlIntents are general-purpose and hence do not cover all the interesting utterances that a skill may need. For example, if a skill must react to an utterance like "Add two apples to the shopping cart" then a new custom intent should be created. The built-in controls will not recognize such intents, but we can still arrange to update them in response to such inputs. The main idea is to have a parent control handle these inputs and manipulate its child controls as necessary. See the FruitShop skill, which demonstrates this.
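In outline, the pattern looks like the following sketch (AddItemIntent, InputUtil.isIntent, and the setValue helpers are illustrative; see the Fruit Shop source for the real implementation):

class ShoppingCartControl extends ContainerControl {
    canHandle(input: ControlInput): boolean {
        // The parent intercepts the custom intent; all other inputs go to the children.
        if (InputUtil.isIntent(input, 'AddItemIntent')) {
            this.handleFunc = this.handleAddItem;
            return true;
        }
        return this.canHandleByChild(input);
    }

    handleAddItem(input: ControlInput, resultBuilder: ControlResultBuilder) {
        const intent = (input.request as IntentRequest).intent;
        // Directly manipulate the children with the slot data ...
        this.productControl.setValue(intent.slots!.product.value);
        this.itemCountControl.setValue(intent.slots!.count.value);
        // ... and report what happened.
        resultBuilder.addAct(new ItemAddedAct(this, { product: intent.slots!.product.value }));
    }
}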
This section is a quick walk-through of the steps to add Controls to an existing skill.
The Controls Framework provides various built-in controls that offer solutions for common interactions scenarios such as presenting a list of options to a user. These controls include their own interaction models which can be merged with your existing skill's interaction model.
As an example, we'll add a ListControl that asks users to select their favorite color to an existing skill:
- Instantiate the ListControl within a custom ControlManager:

// index.js
const Alexa = require('ask-sdk-core');
const { ListControl, ControlManager, ControlHandler } = require('ask-sdk-controls');
...
class SkillControlManager extends ControlManager {
    createControlTree() {
        const colorsListControl = new ListControl({
            prompts: {
                requestValue: "What is your favorite color?",
                valueSet: (act) => {
                    return `I'll remember your favorite color is ${act.state.value}`;
                }
            },
            listItemIDs: () => ["red", "green", "blue", "yellow", "purple"]
        });
        return colorsListControl;
    }
}

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(new ControlHandler(new SkillControlManager()))
    .lambda();

- Create a build_interaction_model.js script to generate and merge the ListControl's interaction model with your skill's own interaction model:

const { ControlInteractionModelGenerator } = require('ask-sdk-controls');

new ControlInteractionModelGenerator()
    .loadFromFile('./my-en-US.json') // load hand-written model, if needed
    .buildCoreModelForControls(new SkillControlManager())
    .buildAndWrite('./en-US-generated.json');

- Run the build_interaction_model.js script to build a new interaction model that includes auto-generated intents and slots for the built-in controls:

node ./build_interaction_model.js

- Replace your skill package's interaction model with the generated version:

mv ./en-US-generated.json ./skill-package/interactionModels/custom/en-US.json
Many developers use the Dialog Management features and the Dialog Interface to define dialog behavior. These dialog models are rules about intents and slots that are encoded as JSON and evaluated by an Alexa service rather than the skill code.
Transitioning from the Dialog Model to the Controls Framework requires a change in the mental model regarding the purpose of intents. Specifically, intents and slots are just an input mechanism that is comparable to APL events or even to a keyboard. Intents and slots bring data in, but they are not things to confirm and validate, nor a place to store data turn-over-turn. Rather, it is the job of a control to collate the raw information from intents, and other sources, and then to validate, disambiguate and confirm the information, regardless of how it was obtained. This approach generalizes all the input sources and moves the logic into the skill code where it can be customized and extended as needed. Even in the simplest situations, such as obtaining a single number, this approach allows validations and so on to be performed with arbitrary code.
To convert an existing skill to use the Control Framework, keep the following conversion strategies in mind:
- A dialog model involving a single slot-data input converts to a ValueControl, which is the simplest and most general control. ListControl and other specific controls offer more functionality "for free" and should be used when applicable. Custom controls can be used for advanced scenarios.
- A dialog model for a multi-slot intent converts to a ContainerControl with a child control for each slot.
- "Required slots", "slot validation" and "slot confirmation" all convert to configuration on the ValueControl (or other control types). In particular, the props .required, .validation and .confirmationRequired control the equivalent behaviors; see the sketch after this list.
- The order in which to request slots converts from a static ordering to arbitrary logic in ContainerControl.decideInitiativeChild().
- "Intent confirmation" converts to custom logic in ContainerControl where it can be customized to provide context-sensitive behavior.
- Prompt content can be configured on a ValueControl either literally or symbolically.
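For example, a dialog model with a single required, validated, and confirmed number slot might convert to the following sketch (the validation function's parameter and return shapes are illustrative):

const dinerCountControl = new NumberControl({
    id: 'dinerCount',
    required: true,             // was: "required slot"
    confirmationRequired: true, // was: "slot confirmation"
    validation: (state) =>      // was: "slot validation", now arbitrary code
        state.value >= 1 && state.value <= 12
            ? true
            : { renderedReason: 'We can seat parties of one to twelve.' },
});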
The Dialog Model also defines dialog directives that can be manually added to a Response. These dialog directives continue to be available via ControlResponseBuilder, and the Dialog.ElicitSlot directive is of particular use. Controls should use Dialog.ElicitSlot whenever they directly ask the user for a value as it can dramatically increase the speech recognition quality of the user's response. Dialog directives can set an updatedIntent to sequence stateful intents together. See the built-in control implementations for examples of using Dialog.ElicitSlot.
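For instance, a number-collecting control could attach the directive while rendering its request. This is a sketch: addElicitSlotDirective is inherited from the standard ResponseBuilder, and the intent/slot names follow the ControlIntent naming conventions but are illustrative here:

renderAct(act, input, responseBuilder) {
    if (act instanceof RequestValueAct) {
        responseBuilder.addPromptFragment('How many tickets do you need?');
        // Bias speech recognition toward a number on the next turn.
        responseBuilder.addElicitSlotDirective('AMAZON.NUMBER', {
            name: 'AMAZON_NUMBER_ValueControlIntent',
            confirmationStatus: 'NONE',
            slots: {},
        });
    }
}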
The Controls Framework is built to be used internationally. The framework codebase currently contains localization data (strings, etc) for 'en-US' and new locales can be supported by providing the appropriate translation information.
To start, take a look at systemResources
defined in LanguageStrings.ts
. New
translations can be added for additional locales (and can be shared with other developers
by submitting a pull-request!).
The new translations can be defined on a new object that follows the same schema. At
runtime the new translations will be merged with the built-in translations. If you wish to
update the en-US
defaults, you can make targeted updates without having to take a copy
of all the existing defaults.
export const myResources: Resource = {
en: {
translation: {
DATE_CONTROL_DEFAULT_PROMPT_VALUE_SET: 'Got it.', // a single en-US override
},
},
'es-ES': {
translation: {
// complete content for es-ES
// ...
},
},
};
Use new translations at build time by updating the interaction model generator script:
1. new ControlInteractionModelGenerator()
2. .withInvocationName('my invocation phrase')
3. .buildCoreModelForControls(
4. new MyControlManager( {locale: 'es-ES', i18nResources: myResources }))
5. .buildAndWrite('es-ES-generated.json');
6. console.log('done.');
- Line 3: The locale of the interaction model to generate is specified, as is the bag of additional localization data.
At runtime there are two i18n instances: one is used by the framework and is configured during construction of the ControlManager. The other is for the custom skill and can be used to localize skill-specific strings that are passed as props, etc.
- To adjust the built-in localization data for the built-in controls, provide your overrides during construction of the ControlManager, as done for interaction model building.
- To localize skill-specific strings, stand up your own i18n manager and use it when providing strings as props and so on. This is demonstrated below:
1. const LocalizationRequestInterceptor = {
2. process(handlerInput) {
3. const locale = getLocale(handlerInput.requestEnvelope);
4. updateI18n(locale);
5. }
6. };
7.
8. function updateI18n(locale) {
9. i18n.init({
10. lng: locale,
11. fallbackLng: 'en',
12. resources: myResources,
13. });
14. }
15. exports.handler = SkillBuilders.custom()
16. .addRequestInterceptors(LocalizationRequestInterceptor)
17. .addRequestHandlers(controlHandler)
18. .lambda();
- Line 1: Creates a new interceptor that will run at the start of every turn
- Line 3: Get the user's locale from the request
- Line 4: Call to update/initialize the skill-specific i18n system
- Line 9-13: Initialization of the skill-specific
i18next
instance - Line 16: Register the request interceptor.
The rest of the skill can use this custom i18n instance when configuring controls. The Fruit Shop skill demonstrates this. For example, the shopping cart control initializes a ListControl with a custom and localized prop:
this.categoryControl = new ListControl(
{
...
prompts: {
requestValue: i18n.t('CATEGORY_CONTROL_REQUEST_VALUE'),
...
To get the Controls Framework source code for perusal or for development, clone the ASK-SDK Controls Github repo.
git clone https://github.com/alexa/ask-sdk-controls
The regression tests included in the Controls Framework are a good resource for learning. One way to use them is to step through individual tests inside Visual Studio Code:
- Clone the framework code.
- Copy the launch.json from ./ide/vscode/launch.json to ./.vscode.
- Find an interesting test, mark it as test.only(), then step into it with F5 (Debug: Start Debugging).
For per-class and per-method documentation, please see the Technical documentation.