Hacking Amazon Alexa with Java

For the recent AT&T IoT Hackathon in Dallas, we decided to try something new and make an Amazon Echo Dot a central part of our project. Our project used a Raspberry Pi with a camera to detect when the lever on a coffee airpot is pushed down, and capture a picture. We then fed the picture through IBM Watson for facial recognition, and wrote the name and the image to an S3 bucket.
This is where Alexa took over. I wrote an Amazon Lambda function in Java which read the S3 bucket and exposed two intents. The first was to ask “who took the last cup?” The function would respond with the name, which came from a text file in the S3 bucket. The second intent was more fun. You could then tell Alexa to “shame them”. This posted a Tweet with the image of the person and a caption saying they took the last cup of coffee.

We actually got this all working in a day. I handled the Alexa side of the project, while my teammate handled the Pi and Watson. The biggest challenge was figuring out how to actually get Lambda and Alexa playing together nicely using Java.

Amazon produces a lot of documentation about Alexa, and about Lambda, but very little of it deals with using the two together with Java. Most of the examples are for NodeJS, and the many NodeJS tutorials out there are of mixed quality. In the interest of improving the situation for us Java developers, I’ll share my lessons learned and walk through how to get this set up.

For the TL;DR crowd, you can grab the project source from my GitHub project. Be sure to also look at the examples in the Alexa Skills Kit Java SDK.

Creating your Project

First off, ignore all the “Using Lambda with the Eclipse SDK” tutorials. You do not want to do this as you’ll just be wasting your time. You need to be using the Java Alexa Skills Kit SDK. The jar is available in Maven Central, and all the source is in the GitHub repository. More importantly, the SDK includes numerous examples for how to use the SDK. For working with Alexa and Java, reading the source is the only reliable option.

Ultimately, Alexa cares about JSON payloads. The Skills Kit SDK is essentially a bunch of wrapper classes around the JSON exchange between Lambda and Alexa. This is the reason the other tutorials you’ll find don’t work with Alexa. You can’t have a Lambda that simply takes a String and returns a String. You need to implement a Speechlet, which takes a SpeechletRequestEnvelope and returns a SpeechletResponse.

For the initial project structure, I used Gradle. Since I’m talking to S3 and Twitter, I also have dependencies for those. You can trim them out if you’re not using them for your own project.

group 'org.sporcic'
version '1.0'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile 'com.amazon.alexa:alexa-skills-kit:1.2'
    compile 'com.amazonaws:aws-lambda-java-core:1.1.0'
    compile 'com.amazonaws:aws-lambda-java-events:1.3.0'
    compile 'com.amazonaws:aws-lambda-java-log4j:1.0.0'

    compile 'com.amazonaws:aws-java-sdk-s3:1.11.56'
    compile 'org.twitter4j:twitter4j-core:4.0.5'

    compile 'log4j:log4j:1.2.17'
    compile 'org.slf4j:slf4j-api:1.7.0'
    compile 'org.slf4j:slf4j-log4j12:1.7.0'
}

task buildZip(type: Zip) {
    baseName = 'coffeeStatus'
    from compileJava
    from processResources
    into('lib') {
        from configurations.runtime
    }
}

build.dependsOn buildZip

line 6 : make sure you set your sourceCompatibility to 1.8, as Amazon Lambda uses Java 8
lines 13-16 : these are the core Amazon Lambda and Alexa SDK libraries. You need them.
lines 18-19 : I need these since I’m talking to S3 and Twitter. Remove them if you aren’t
lines 21-23 : the logging libraries you’ll need for S3
lines 26-35 : to deploy Java to Amazon Lambda, it has to be packaged as a zip file, with all the dependencies inside a directory called lib inside the zip file. This Gradle task takes care of that for you, and hooks itself into the normal build task.

This is all the Gradle configuration you need to write a function for Amazon Lambda. You can add additional dependencies depending on what you’re trying to do. You will upload the resulting zip via the Amazon Lambda management console.
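Since the zip layout matters, it can be handy to sanity-check the archive before uploading it. The following is only a sketch under assumptions: LambdaZipCheck and misplacedJars are hypothetical names, not part of any SDK, and the only rule it checks is the one described above, that dependency jars sit under lib/ inside the zip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not part of the Lambda or Alexa SDKs). */
final class LambdaZipCheck {

    /** Returns the names of all jar entries that are NOT under lib/. */
    static List<String> misplacedJars(InputStream zipBytes) throws IOException {
        List<String> misplaced = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                boolean isJar = !entry.isDirectory() && name.endsWith(".jar");
                if (isJar && !name.startsWith("lib/")) {
                    misplaced.add(name);
                }
            }
        }
        return misplaced;
    }
}
```

To use it, pass a stream opened on the zip from build/distributions; an empty list means every jar in the archive landed under lib/.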

Now you need to create your SpeechletRequestStreamHandler implementation. This is a pretty simple class:

package org.sporcic;

import java.util.HashSet;
import java.util.Set;
import com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler;

public class CoffeeStatusSpeechletRequestStreamHandler extends SpeechletRequestStreamHandler {

    private static final Set<String> supportedApplicationIds = new HashSet<String>();

    static {
        String appId = System.getenv("APP_ID");
        supportedApplicationIds.add(appId);
    }

    public CoffeeStatusSpeechletRequestStreamHandler() {
        super(new CoffeeStatusSpeechlet(), supportedApplicationIds);
    }
}

line 7 : name the class what you want, but you’ll use the fully qualified name of this class in the name of the handler in the Lambda configuration
lines 12-13 : the Skills SDK has logic to verify the application ID of the caller to the Lambda function. Rather than hard coding the application ID of the Alexa Skill in code, I read it from an environment variable configured in the Lambda Management console.
line 17 : you need to implement a no-arg constructor which calls super() with an instance of your Speechlet and the Set of your authorized application IDs
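One thing to watch in the static initializer: System.getenv() returns null when the variable is not set, and that null would be added to the Set as-is. A slightly more defensive version could look like the sketch below. AppIds and parse are hypothetical names, and accepting a comma-separated list of IDs is an assumption of this sketch, not something the SDK requires.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for reading application IDs from an environment variable. */
final class AppIds {

    /** Parses a comma-separated list of IDs, skipping null input and blank entries. */
    static Set<String> parse(String raw) {
        Set<String> ids = new LinkedHashSet<>();
        if (raw != null) {
            for (String part : raw.split(",")) {
                String id = part.trim();
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return Collections.unmodifiableSet(ids);
    }
}
```

In the handler, the static field could then be initialized with AppIds.parse(System.getenv("APP_ID")). How the SDK treats an empty Set is worth checking in its source before relying on this.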

One final piece of setup is to create a log4j.properties file in the src/main/resources of your project. This is necessary to use logging inside of your Lambda function. The file needs to contain this configuration:

log = .
log4j.rootLogger = DEBUG, LAMBDA

#Define the LAMBDA appender
log4j.appender.LAMBDA=com.amazonaws.services.lambda.runtime.log4j.LambdaAppender
log4j.appender.LAMBDA.layout=org.apache.log4j.PatternLayout
log4j.appender.LAMBDA.layout.conversionPattern=%d{yyyy-MM-dd HH:mm:ss} <%X{AWSRequestId}> %-5p %c{1}:%L - %m%n

NOTE: Be sure to change the level of the rootLogger before you go to production!

Now comes the fun of implementing your Speechlet. Like a Servlet, the Speechlet interface defines the lifecycle methods for handling requests from Alexa. I based my code on the HelloWorld Speechlet in the Skills SDK. The primary difference is that I used the new SpeechletV2 interface.

The SpeechletV2 interface defines four lifecycle methods Alexa will use to interact with your Lambda function:

public interface SpeechletV2 {

    void onSessionStarted(SpeechletRequestEnvelope<SessionStartedRequest> requestEnvelope);

    SpeechletResponse onLaunch(SpeechletRequestEnvelope<LaunchRequest> requestEnvelope);

    SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope);

    void onSessionEnded(SpeechletRequestEnvelope<SessionEndedRequest> requestEnvelope);
}

The primary method you’ll interact with is the onIntent() method. Here’s my implementation for a Skill with two intents:

    @Override
    public SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope) {
        log.info("onIntent requestId={}, sessionId={}",
                requestEnvelope.getRequest().getRequestId(),
                requestEnvelope.getSession().getSessionId());

        Intent intent = requestEnvelope.getRequest().getIntent();
        String intentName = (intent != null) ? intent.getName() : null;

        if ("CoffeeStatusIntent".equals(intentName)) {
            return getCoffeeStatusResponse();
        } else if("ShameUserIntent".equals(intentName)) {
            return tweetTheShame();
        } else {
            return getUnknownCommandResponse();
        }
    }

lines 3-5 : these just show that logging works the same as in just about every other application, along with how to get the request and session IDs
lines 7-8 : you get the Intent off the request, and can get the actual name by calling getName() to decide what you’re going to do. These are the same intent names defined in the interaction model in the Alexa Skill Kit configuration.
lines 10-16 : I evaluate the String value for the Intent and call another function for each intent. I also have a fall through function which returns a generic unknown command response.
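With only two intents the if/else chain is perfectly readable. If the list grows, one alternative is a lookup table from intent name to handler. The sketch below is not from the project: IntentDispatcher is a hypothetical class, and it is generic so it can run without the SDK on the classpath. In a real Speechlet the type parameter R would be SpeechletResponse.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical table-driven replacement for the if/else chain. */
final class IntentDispatcher<R> {

    private final Map<String, Supplier<R>> handlers = new HashMap<>();
    private final Supplier<R> fallback;

    /** The fallback plays the role of the "unknown command" branch. */
    IntentDispatcher(Supplier<R> fallback) {
        this.fallback = fallback;
    }

    /** Registers a handler for one intent name; returns this for chaining. */
    IntentDispatcher<R> on(String intentName, Supplier<R> handler) {
        handlers.put(intentName, handler);
        return this;
    }

    /** Runs the matching handler, or the fallback for unknown or null names. */
    R dispatch(String intentName) {
        // HashMap tolerates a null key lookup, so a missing intent falls through.
        return handlers.getOrDefault(intentName, fallback).get();
    }
}
```

Inside the Speechlet this could be wired up once, for example with on("CoffeeStatusIntent", this::getCoffeeStatusResponse), and onIntent() would then simply return dispatch(intentName).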

Now let’s walk through one of the functions that builds the SpeechletResponse:

    private SpeechletResponse getWelcomeResponse() {
        String speechText = "Welcome to Coffee Status";

        SimpleCard card = new SimpleCard();
        card.setTitle("Coffee Pot");
        card.setContent(speechText);

        PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
        speech.setText(speechText);

        return SpeechletResponse.newTellResponse(speech, card);
    }

lines 4-6 : while the Echos are voice devices, Alexa also has a mobile application. The cards (SimpleCard and StandardCard) define what shows up in the Alexa application as a result of the voice interaction. The SimpleCard only displays text, while the StandardCard provides the ability to include an Image.
lines 8-9 : this is where we define what gets said back to the user via Alexa
line 11 : now that we have the Card and the OutputSpeech, we use a static factory method on the SpeechletResponse to build the response. The response can either be a “Tell” response, which simply states the OutputSpeech text, or an “Ask” response, which says the OutputSpeech and then prompts the user to provide additional information which can continue the user’s session.
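Since the SDK is a wrapper around JSON, it can help to see roughly what the two kinds of response boil down to on the wire. The sketch below builds the JSON by hand purely for illustration; in a real skill the SDK does the serialization. AlexaResponseJson, tell and ask are hypothetical names, the card is left out to keep it short, and the field names follow the Alexa JSON response format as I understand it.

```java
/** Hypothetical illustration of the JSON behind "Tell" and "Ask" responses. */
final class AlexaResponseJson {

    /** Escapes the characters that would break a JSON string literal. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    private static String plainText(String text) {
        return "{\"type\":\"PlainText\",\"text\":\"" + escape(text) + "\"}";
    }

    /** "Tell": say the text and end the session. */
    static String tell(String text) {
        return "{\"version\":\"1.0\",\"response\":{"
                + "\"outputSpeech\":" + plainText(text) + ","
                + "\"shouldEndSession\":true}}";
    }

    /** "Ask": say the text, keep the session open, and reprompt on silence. */
    static String ask(String text, String repromptText) {
        return "{\"version\":\"1.0\",\"response\":{"
                + "\"outputSpeech\":" + plainText(text) + ","
                + "\"reprompt\":{\"outputSpeech\":" + plainText(repromptText) + "},"
                + "\"shouldEndSession\":false}}";
    }
}
```

The practical difference is the shouldEndSession flag plus the reprompt: a “Tell” closes the session, while an “Ask” leaves it open so the user’s next utterance comes back to the same skill.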

The Intent provides access to the Slots data, which were defined in the Alexa Skill interaction model. The History Buff example in the Alexa Skills SDK is an excellent example of how to get data from the slots and have an interaction with the user.

Once all the code is ready, do a standard ./gradlew build to generate the zip file for upload to the Lambda Management console. The zip is placed in the build/distributions directory of your Java project.

One final note: the SDK lays down a pattern for adding the configuration of your Intents and Sample Utterances to the code repository. The pattern is to create a speechAssets folder under the directory your Speechlet is in. The two files you’ll create are IntentSchema.json and SampleUtterances.txt. Here are examples of mine:

{
  "intents": [
    {
      "intent": "CoffeeStatusIntent"
    },
    {
      "intent" : "ShameUserIntent"
    }
  ]
}

CoffeeStatusIntent who took the last cup of coffee
CoffeeStatusIntent who took the last cup
CoffeeStatusIntent who was the last person to get coffee
CoffeeStatusIntent what jerk took the last cup
CoffeeStatusIntent what jerk took the last cup of coffee
ShameUserIntent to shame them
ShameUserIntent shame them

Having these in your source code makes them easier to edit, since you can just copy/paste them into the correct fields in the Alexa Skill configuration. And having them close at hand also helps as a reference for developing your intents.
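Keeping both files in the repository also makes it possible to check them against each other, since every line in the sample utterances starts with an intent name. As a rough sketch (UtteranceCheck is a hypothetical helper, and the set of declared intent names is passed in by hand rather than parsed from IntentSchema.json):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical consistency check between utterances and declared intents. */
final class UtteranceCheck {

    /** Returns the utterance lines whose first token is not a declared intent. */
    static List<String> unknownIntentLines(List<String> utteranceLines,
                                           Set<String> declaredIntents) {
        List<String> bad = new ArrayList<>();
        for (String line : utteranceLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // The intent name is everything before the first run of whitespace.
            String intentName = trimmed.split("\\s+", 2)[0];
            if (!declaredIntents.contains(intentName)) {
                bad.add(trimmed);
            }
        }
        return bad;
    }
}
```

Feeding it the lines of SampleUtterances.txt (for example via Files.readAllLines) and the two intent names from the schema would catch a typo such as ShameUsrIntent before it reaches the Alexa Skill configuration.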

This takes care of the code. In my next post, I’ll cover how to deploy this to Amazon Lambda, and how to configure and test the Alexa skill.
