
Hacking Amazon Alexa with Java

For the recent AT&T IoT Hackathon in Dallas, we decided to try something new and make an Amazon Echo Dot a central part of our project. Our project used a Raspberry Pi with a camera to detect when the lever on a coffee airpot is pushed down, and capture a picture. We then fed the picture through IBM Watson for facial recognition, and wrote the name and the image to an S3 bucket.
This is where Alexa took over. I wrote an Amazon Lambda function in Java which read the S3 bucket and exposed two intents. The first was to ask “who took the last cup?” The function would respond with the name, which came from a text file in the S3 bucket. The second intent was more fun. You could then tell Alexa to “shame them”. This posted a Tweet with the image of the person and a caption saying they took the last cup of coffee.

We actually got this all working in a day. I handled the Alexa side of the project, while my teammate handled the Pi and Watson. The biggest challenge was figuring out how to actually get Lambda and Alexa playing together nicely using Java.

Amazon produces a lot of documentation about Alexa, and about Lambda, but very little deals with using the two of them together with Java. Most of the examples are for NodeJS. There were a lot of tutorials out there using NodeJS, most of mixed quality. In the interest of improving the situation for us Java developers, I’ll share my lessons learned and walk through how to get this set up.

For the TL;DR crowd, you can grab the project source off my GitHub project and be sure to look at the examples in the Alexa Skills Kit Java SDK.

Creating your Project

First off, ignore all the “Using Lambda with the Eclipse SDK” tutorials. You do not want to do this as you’ll just be wasting your time. You need to be using the Java Alexa Skills Kit SDK. The jar is available in Maven Central, and all the source is in the GitHub repository. More importantly, the SDK includes numerous examples for how to use the SDK. For working with Alexa and Java, reading the source is the only reliable option.

Ultimately, Alexa cares about JSON payloads. The Skills Kit SDK is essentially a bunch of wrapper classes around the JSON exchange between Lambda and Alexa. This is the reason the other tutorials you’ll find don’t work with Alexa. You can’t have a Lambda that simply takes a String and returns a String. You need to implement a Speechlet, which takes a SpeechletRequestEnvelope and returns a SpeechletResponse.

For the initial project structure, I used Gradle. Since I’m talking to S3 and Twitter, I also have dependencies for those. You can trim them out if you’re not using them for your own project.

group 'org.sporcic'
version '1.0'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile 'com.amazon.alexa:alexa-skills-kit:1.2'
    compile 'com.amazonaws:aws-lambda-java-core:1.1.0'
    compile 'com.amazonaws:aws-lambda-java-events:1.3.0'
    compile 'com.amazonaws:aws-lambda-java-log4j:1.0.0'

    compile 'com.amazonaws:aws-java-sdk-s3:1.11.56'
    compile 'org.twitter4j:twitter4j-core:4.0.5'

    compile 'log4j:log4j:1.2.17'
    compile 'org.slf4j:slf4j-api:1.7.0'
    compile 'org.slf4j:slf4j-log4j12:1.7.0'
}

task buildZip(type: Zip) {
    baseName = 'coffeeStatus'
    from compileJava
    from processResources
    into('lib') {
        from configurations.runtime
    }
}

build.dependsOn buildZip

line 6 : make sure you set your sourceCompatibility to 1.8, as Amazon Lambda uses Java 8
lines 13-16 : these are the core Amazon Lambda and Alexa SDK libraries. You need them.
lines 18-19 : I need these since I’m talking to S3 and Twitter. Remove them if you aren’t
lines 21-23 : the logging libraries you’ll need for S3
lines 26-35 : to deploy Java to Amazon Lambda, it has to be packaged as a zip file, with all the dependencies inside a directory called lib inside the zip file. This Gradle task takes care of that for you, and adds the task onto the normal build task.

This is all the Gradle file you need to write a function for Amazon Lambda. You can add additional dependencies depending on what you’re trying to do. You will upload the resulting zip file via the Amazon Lambda management console.

Now you need to create your SpeechletRequestStreamHandler implementation. This is a pretty simple class:

package org.sporcic;

import java.util.HashSet;
import java.util.Set;
import com.amazon.speech.speechlet.lambda.SpeechletRequestStreamHandler;

public class CoffeeStatusSpeechletRequestStreamHandler extends SpeechletRequestStreamHandler {

    private static final Set<String> supportedApplicationIds = new HashSet<String>();

    static {
        String appId = System.getenv("APP_ID");
        supportedApplicationIds.add(appId);
    }

    public CoffeeStatusSpeechletRequestStreamHandler() {
        super(new CoffeeStatusSpeechlet(), supportedApplicationIds);
    }
}

line 7 : name the class what you want, but you’ll use the fully qualified name of this class in the name of the handler in the Lambda configuration
lines 12-13 : the Skills SDK has logic to verify the application ID of the caller to the Lambda function. Rather than hard coding the application ID of the Alexa Skill in code, I read it from an environment variable configured in the Lambda Management console.
line 17 : you need to implement a no-arg constructor which calls super() with an instance of your Speechlet and the Set of your authorized application IDs
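
One thing to watch: if the APP_ID variable is missing, System.getenv() returns null and the static block above adds null to the set without complaint. Here is a minimal sketch of a more defensive way to read it. The AppIdConfig class, its parse() method and the comma-separated format are my own assumptions for illustration and are not part of the Skills Kit SDK.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the Alexa Skills Kit SDK.
final class AppIdConfig {

    private AppIdConfig() {
    }

    // Splits a comma-separated value into a set of trimmed, non-empty IDs.
    // Throws if the value is null, blank, or contains no usable ID.
    static Set<String> parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalStateException("APP_ID environment variable is not set");
        }
        Set<String> ids = new LinkedHashSet<>();
        for (String part : raw.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        if (ids.isEmpty()) {
            throw new IllegalStateException("APP_ID contains no application IDs");
        }
        return Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        // The IDs below are made-up placeholders, not real skill IDs.
        System.out.println(parse("example-skill-id-one, example-skill-id-two"));
    }
}
```

In the handler, the static block could then call supportedApplicationIds.addAll(AppIdConfig.parse(System.getenv("APP_ID"))), so a misconfigured function fails when the class loads instead of carrying a null ID around.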

One final piece of setup is to create a log4j.properties file in the src/main/resources of your project. This is necessary to use logging inside of your Lambda function. The file needs to contain this configuration:

log = .
log4j.rootLogger = DEBUG, LAMBDA

#Define the LAMBDA appender
log4j.appender.LAMBDA=com.amazonaws.services.lambda.runtime.log4j.LambdaAppender
log4j.appender.LAMBDA.layout=org.apache.log4j.PatternLayout
log4j.appender.LAMBDA.layout.conversionPattern=%d{yyyy-MM-dd HH:mm:ss} <%X{AWSRequestId}> %-5p %c{1}:%L - %m%n

NOTE: Be sure to change the level of the rootLogger before you go to production!

Now comes the fun of implementing your Speechlet. Like a Servlet, the Speechlet interface defines the lifecycle methods for handling requests from Alexa. I based my code on the HelloWorld Speechlet in the Skills SDK. The primary difference is I used the new SpeechletV2 interface.

The SpeechletV2 interface defines four lifecycle methods Alexa will use to interact with your Lambda function:

public interface SpeechletV2 {

    void onSessionStarted(SpeechletRequestEnvelope<SessionStartedRequest> requestEnvelope);

    SpeechletResponse onLaunch(SpeechletRequestEnvelope<LaunchRequest> requestEnvelope);

    SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope);

    void onSessionEnded(SpeechletRequestEnvelope<SessionEndedRequest> requestEnvelope);
}

The primary method you’ll interact with is the onIntent() method. Here’s my implementation for a Skill with two intents:

    @Override
    public SpeechletResponse onIntent(SpeechletRequestEnvelope<IntentRequest> requestEnvelope) {
        log.info("onIntent requestId={}, sessionId={}",
                requestEnvelope.getRequest().getRequestId(),
                requestEnvelope.getSession().getSessionId());

        Intent intent = requestEnvelope.getRequest().getIntent();
        String intentName = (intent != null) ? intent.getName() : null;

        if ("CoffeeStatusIntent".equals(intentName)) {
            return getCoffeeStatusResponse();
        } else if("ShameUserIntent".equals(intentName)) {
            return tweetTheShame();
        } else {
            return getUnknownCommandResponse();
        }
    }

lines 3-5 : just shows logging is handled the same as about every other application, along with how to get the request and session IDs
lines 7-8 : you get the Intent off the request, and can get the actual name by calling getName() to decide what you’re going to do. These are the same intent names defined in the interaction model in the Alexa Skills Kit configuration.
lines 10-16 : I evaluate the String value for the Intent and call another function for each intent. I also have a fall through function which returns a generic unknown command response.
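
The if/else chain is fine for two intents. If your Skill grows, a lookup table is one alternative. Below is a rough sketch of that idea, not what my project does. The IntentDispatchSketch class is hypothetical, and plain Strings stand in for SpeechletResponse so the example runs without the SDK on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: Strings stand in for SpeechletResponse objects.
final class IntentDispatchSketch {

    // Maps each intent name from the interaction model to the code that answers it.
    private final Map<String, Supplier<String>> handlers = new HashMap<>();

    IntentDispatchSketch() {
        handlers.put("CoffeeStatusIntent", () -> "status response");
        handlers.put("ShameUserIntent", () -> "shame response");
    }

    String dispatch(String intentName) {
        // HashMap tolerates a null key on lookup, so a missing intent name
        // falls through to the default just like an unknown one.
        Supplier<String> handler =
                handlers.getOrDefault(intentName, () -> "unknown command response");
        return handler.get();
    }

    public static void main(String[] args) {
        IntentDispatchSketch sketch = new IntentDispatchSketch();
        System.out.println(sketch.dispatch("CoffeeStatusIntent"));
        System.out.println(sketch.dispatch("SomethingElse"));
    }
}
```

Adding a third intent then means one more put() call rather than another else-if branch.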

Now let’s walk through one of the functions that builds the SpeechletResponse:

    private SpeechletResponse getWelcomeResponse() {
        String speechText = "Welcome to Coffee Status";

        SimpleCard card = new SimpleCard();
        card.setTitle("Coffee Pot");
        card.setContent(speechText);

        PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
        speech.setText(speechText);

        return SpeechletResponse.newTellResponse(speech, card);
    }

lines 4-6 : while the Echos are voice devices, Alexa also has a mobile application. The cards (SimpleCard and StandardCard) define what shows up in the Alexa application as a result of the voice interaction. The SimpleCard only displays text, while the StandardCard provides the ability to include an image.
lines 8-9 : this is where we define what gets said back to the user via Alexa
line 11 : now that we have the Card and the OutputSpeech, we use a static factory method on the SpeechletResponse to build the response. The response can either be a “Tell” response, which simply states the OutputSpeech text, or an “Ask” response, which says the OutputSpeech and then prompts the user to provide additional information which can continue the user’s session.
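
Since Alexa ultimately cares about the JSON, it can help to see roughly what these two kinds of response look like on the wire. As far as I understand the response format, the key difference is the shouldEndSession flag: true for a “Tell”, false for an “Ask”. The sketch below builds a simplified version of that JSON by hand. The ResponseJsonSketch class and its methods are hypothetical, the SDK does the real serialization for you, and I’ve left out the reprompt an actual “Ask” response would carry.

```java
// Hypothetical sketch of the response JSON; the SDK normally generates this.
final class ResponseJsonSketch {

    private ResponseJsonSketch() {
    }

    // Escapes backslashes and double quotes only. Enough for this sketch,
    // but not a complete JSON string encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a simplified response with plain-text speech and a simple card.
    static String build(String speechText, String cardTitle, boolean endSession) {
        return "{\"version\":\"1.0\",\"response\":{"
                + "\"outputSpeech\":{\"type\":\"PlainText\",\"text\":\""
                + escape(speechText) + "\"},"
                + "\"card\":{\"type\":\"Simple\",\"title\":\"" + escape(cardTitle)
                + "\",\"content\":\"" + escape(speechText) + "\"},"
                + "\"shouldEndSession\":" + endSession
                + "}}";
    }

    // "Tell": say the text and end the session.
    static String tell(String speechText, String cardTitle) {
        return build(speechText, cardTitle, true);
    }

    // "Ask": say the text and keep the session open for a reply.
    static String ask(String speechText, String cardTitle) {
        return build(speechText, cardTitle, false);
    }

    public static void main(String[] args) {
        System.out.println(tell("Welcome to Coffee Status", "Coffee Pot"));
        System.out.println(ask("Do you want to shame them?", "Coffee Pot"));
    }
}
```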

The Intent provides access to the Slots data, which were defined in the Alexa Skill interaction model. The History Buff example in the Alexa Skills SDK is an excellent example of how to get data from the slots and have an interaction with the user.

Once all the code is ready, do a standard ./gradlew build to generate the zip file for upload to the Lambda Management console. The zip is placed in the build/distributions directory of your Java project.

One final note: the SDK lays down a pattern for adding the configuration of your Intents and Sample Utterances to the code repository. The pattern is to create a speechAssets folder under the directory your Speechlet is in. The two files you’ll create are IntentSchema.json and SampleUtterances.txt. Here are examples of mine:

{
  "intents": [
    {
      "intent": "CoffeeStatusIntent"
    },
    {
      "intent" : "ShameUserIntent"
    }
  ]
}

CoffeeStatusIntent who took the last cup of coffee
CoffeeStatusIntent who took the last cup
CoffeeStatusIntent who was the last person to get coffee
CoffeeStatusIntent what jerk took the last cup
CoffeeStatusIntent what jerk took the last cup of coffee
ShameUserIntent to shame them
ShameUserIntent shame them

Having these in your source code makes them easier to edit, since you can just copy/paste them into the correct fields in the Alexa Skill configuration. And having them close at hand also helps as a reference for developing your intents.

This takes care of the code. In my next post, I’ll cover how to deploy this to Amazon Lambda, and how to configure and test the Alexa skill.

Custom Error Pages with Spring Boot

I’ve been a big fan of the Spring Framework. Yes, it is now even more bloated than the JEE world it set out to replace, but for enterprise software development it provides a consistent solution to common problems, including ones you might not have realized you are going to have.

My biggest gripe with Spring is how painfully slow and complicated it has been to get a Spring Framework project started. Getting a basic MVC application set up with JPA and a good view technology is a royal pain in the butt. The new Spring Boot project was created to change that.

Spring Boot has turned setting up a Spring Framework project into a breeze. It’s not perfect, but after using it on a small project, I definitely plan on using it as a baseline going forward.

One of the issues with Spring Boot is that while it does a tremendous job with 90% of the problems, there is still 10% you need to dig in and figure out. Custom error pages were one of those problems for me.

Spring Boot uses embedded Tomcat by default, which means your 404 (and other) error page is the lovely, standard Tomcat page. I don’t want my error pages showing internal application state, especially for 500 errors, so I wanted to configure custom error pages.

It turns out this is a pretty simple task with the org.springframework.boot.context.embedded.EmbeddedServletContainerCustomizer interface.

Add the following Bean definition to whichever class you’re using for your main method to start up Spring Boot:

@Bean
public EmbeddedServletContainerCustomizer containerCustomizer() {

   return (container -> {
        ErrorPage error401Page = new ErrorPage(HttpStatus.UNAUTHORIZED, "/401.html");
        ErrorPage error404Page = new ErrorPage(HttpStatus.NOT_FOUND, "/404.html");
        ErrorPage error500Page = new ErrorPage(HttpStatus.INTERNAL_SERVER_ERROR, "/500.html");

        container.addErrorPages(error401Page, error404Page, error500Page);
   });
}

This is the Java 8 version using a lambda expression to simplify things. It creates three ErrorPage instances for three common HTTP Status Codes and then adds them to the container. The ErrorPage class is an abstraction for setting up error pages which will work with both Jetty and Tomcat.

The equivalent code for Java 7 using an inner class would be this:

@Bean
public EmbeddedServletContainerCustomizer containerCustomizer() {

    return new EmbeddedServletContainerCustomizer() {
        @Override
        public void customize(ConfigurableEmbeddedServletContainer container) {

            ErrorPage error401Page = new ErrorPage(HttpStatus.UNAUTHORIZED, "/401.html");
            ErrorPage error404Page = new ErrorPage(HttpStatus.NOT_FOUND, "/404.html");
            ErrorPage error500Page = new ErrorPage(HttpStatus.INTERNAL_SERVER_ERROR, "/500.html");

            container.addErrorPages(error401Page, error404Page, error500Page);
        }
    };
}

The actual error pages need to be placed in the static content directory of the Spring Boot web application. The default location is src/main/resources/static.

For the actual files, this archive contains versions inspired by the error page included in the awesome HTML5 Boilerplate.

With files in place, you will now see a simplified version of the core error pages which don’t expose the internal state of your application. For development, you would typically want to keep your regular 500 page so you can see what blew up without chasing the log files.

Dear New Microsoft CEO

Congratulations on assuming the reins of one of the best-known technology brands in the world. Once you get past your new-hire honeymoon, you have a lot of work ahead. You see, Microsoft is dying. Not in the monetary sense, but from an innovation standpoint.

You could continue to license Office and Windows to large enterprises for another decade and make your shareholders happy. But Microsoft’s future viability isn’t about the Office/Windows cash cow, it is about successfully returning Microsoft to a company built on innovation and wonder. As your predecessor so gleefully proclaimed, it’s about developers, developers, developers! And you’re losing that battle.

I’m not a billion-dollar technology executive, unlike what you became the minute you signed your offer letter. But I’ve been in the technology trenches for a while, and the guys in the trenches have a lot better instinct for how the battle is going than the REMFs at the top.

Your challenge is that Microsoft has lost its street cred. When someone says “.NET Developer”, they’re thinking of a minimally-skilled, cube dweller writing Sharepoint widgets. And that’s a shame, because you should be a whole lot more.

I’ve been a Java developer since the JDK 1.1 days, but I’ve tracked the .NET scene since its inception. I used Visual J++, read the language specs for Cool, attended C# training on the Microsoft campus, and deployed more .NET code than I’m willing to admit for fear of being kicked out of my tribe.

You have a great thing in .NET and C#, and you’re pissing it all away. I’ve written code in a lot of different languages, and I still think the C# language is about the most powerful, elegant and best-designed language available to developers today. But you’re allowing your stubbornness and internal politics to kill it by relegating it to a best-supporting-actor role for your boring server products rather than driving it as a thought leader for innovation.

How many startups or smart kids in dorm rooms would even give C# more than a passing thought while building the next Facebook? The answer is near zero. Go take a walk around a non-Microsoft technology conference and count laptops. Apple owns. Even those ugly Dells are probably running Linux and not Windows. At SenchaCon this year, I probably saw more Chromebooks than Windows laptops, which must really be rubbing salt in your wounds.

If you want to follow IBM down the path to irrelevance, more power to you. But I always looked at Microsoft as the hometown hero of the northwest, so I hope you aspire to do better.  Here’s a few suggestions to help you out of the gate and to find Microsoft’s mojo.

Step 1: Fire the ignorant fool responsible for stack ranking and fix your culture. You can’t be successful when your internal culture is the equivalent of corporate Hunger Games. Teams play to win. Microsoft is an army of individuals right now. You might be able to hire mercenaries with stack ranking, but you’ll never have cohesion across the company when everyone is in mortal combat with their cube-mates for their very job survival.

Step 2: Give a free Visual Studio Professional and Windows 7 Developer Edition (see below) license to any developer who registers to be a Microsoft developer. Every other ecosystem has world-class tools available for essentially free. You can still make money on your “Enterprise Editions” suckering Fortune 500 clients into paying enormous fees, but the grassroots developers you need to attract won’t pay for it. And Express edition is too gimp. You have nothing to lose and everything to gain by getting your tools into the hands of as many people as possible.

Step 3: I use an Apple laptop probably 90% of the time for development, even when it is for tasks I could also do on Windows. The primary reason is the workflow is better. Easy virtual desktops, a full-power terminal, and a window manager that stays out of your face are the primary reasons. You should push out a version of Windows 7 tuned for developers (Windows 7 Developer Edition). Strip it of all the crap for making grandma’s life easier. Include as close as you can get to a real terminal/console (don’t get me started on the suck that is PowerShell). And it should scream when running Visual Studio. Get feedback from developers and churn on it. This is a version of Windows for developers, not Fred in accounting.

Step 4: Make Internet Explorer rock, or get out of the browser game. IE 10 is barely useable, and you all should be embarrassed to even admit authoring any of the prior versions. Everyone I know uses Chrome or Firefox. Swallow your pride and go learn what people want and like from these other browsers. Make Internet Explorer the most standards-compliant browser on the planet. You should own on HTML5 Test and Acid 3. Your JavaScript engine should blow up V8. And start churning! There should be an update to IE every two months, not every two years.

Step 5: Beat Apple at their own game. You picked the wrong battle to get into the hardware market with. An upside-down laptop isn’t revolutionary. Similar to #3, you should go build a developer-focused laptop. With all your R&D power, you should be able to come up with something that can trump a Retina MacBook Pro. Sell it direct to developers. Earn mindshare. Your stock vesting plan should correspond to the percentage of Microsoft laptops being used at conferences in three years. If you walk into OSCON or RailsConf and over 50% of the attendees are using your laptops, you’ve won.

Step 6: Become the new MySQL. SQL Server is a tremendous product. You have a database that is very easy to use, yet powerful and reliable. But once again, you’re caught up in the enterprise world trying to be Oracle instead of being yourselves. SQL Server Web edition should be filling the role you’ve been pushing SQL Server Express edition for. And it should be free. Yes, you’ll eat some short term revenue loss, but you’re in it for the long game. There should be no reason someone picks PostgreSQL or MySQL over SQL Server for a startup. And no, Bizspark doesn’t count.

I know all this sounds like a lot of developer whining, but the people who write code really are The New Kingmakers. Microsoft has a lot of cool stuff going for it, but it feels very fragmented. Microsoft is losing the battle for the hearts and minds of today’s developers, and pretty much screwed for the  future generation. You must change that. Microsoft can’t afford another lost decade.

 

Programming Choices

The past few months have pretty much been both a blur and a grind. Most of my work time has been spent on a classic big, dumb enterprise Java application. They’re a drain to write, and I’ve found it also saps my creativity for trying new things. Java, powerful as it may be, really has become the new COBOL and I definitely don’t want to be one of the grey-beards left maintaining crap code in another decade.

The tough problem is there are too many choices today, and Java ends up being the safe bet by default. There are several other technologies out there that I find more interesting. The challenge is finding the time and project to use them on.

WordPress – Yes, don’t laugh, I said WordPress. We finished up a custom WordPress site for a local home builder and it was a blast. WordPress has grown into quite the little powerhouse and you can almost think of it as a mini web framework. PHP is a bit kludgy, but no worse than many alternatives. I’d like to get a chance to do another WordPress site and push it a bit farther.

NodeJS – I really enjoy working with NodeJS. As I’ve mentioned in a prior post, it reminds me of Java in the JDK 1.1 days. NodeJS is fast to work with, and allows you to build an application about any way you want due to an emphasis on tightly-focused “micro modules”. Think JavaScript Legos. The downside of NodeJS is it tends to be a lone genius technology. I couldn’t see a large team working on a NodeJS application.

Scala – I’ve never had such a love/hate relationship with a language as I do with Scala. It feels like a modern language but it is still, at its heart, Java. This can be both a good thing and a bad thing. In some ways, it feels like it tries to be too different from Java, making for an abusive learning curve. And if I wanted to stay on the JVM, I could just keep doing Java and cut through the complex abstraction. Scala has moved to my “watch, but don’t play” list.

There are always other little things that pop up too, competing for my attention. Now we’re firmly into spring so I can bust out of the winter doldrums and try to get creative with my coding again. I suspect though I’ll be splitting brain cells between WordPress and NodeJS.

Ready, Set, GoLang!

My newfound infatuation with the Go Language survived a weekend of reading The Way to Go and even some time with Grails. The book itself is poorly edited, but the information is good. And in spite of my love of Grails, I kept finding myself coming back to Go to learn more.

A lot of what I’m liking about Go is similar to what I like about NodeJS: they both have a clean API, similar to what I call the “Java One” era, aka JDK 1.1.x. In those days, the API provided the basics, and you had to build the rest. Both Go and NodeJS provide a solid API without being overwhelming.

The most interesting part though is that Go builds native applications. You know, those things we used to build before VM-based languages started owning the world. Modern development has been completely dominated recently by languages running on virtual machines, including Java, C# and Ruby. About the only holdout has been Objective-C, but I don’t like the idea of being locked in to a single platform.

Go allows a developer to build a real live executable that doesn’t depend on several MB of infrastructure software being in place. Combine that with a rich, web-centric API, and I can see some interesting possibilities.

Finally, I’ve always had a closet weakness for the C language. It is a marvel of simplicity and efficiency, but can go from fine scalpel to chainsaw with one misplaced character. Go shakes the ugly out of C, turning it into a modern language while still holding true to its ideals.

I’ll keep tinkering with Go for now. Maybe I’ll get bored with it, but maybe not. I see a lot of promise in a fast, light, native-compiling language with a solid baseline API. I dare say Google might be laying down the future with Go.