Binge streaming Web APIs, with Ratpack, Cloud Endpoints, App Engine Flex and Streamdata.io

At Devoxx last week, I had the chance to present a joint Tools-in-Action session with my talented friend Audrey Neveu, titled "Binge streaming your Web API".

For the impatient, scroll down to the end of the article to access the slides and watch the video recording!

We split the presentation into two parts: 1) first of all, we obviously need to create and deploy a Web API, and then 2) configure and use Streamdata.io to stream the updates live, rather than polling the API endlessly.

For the purpose of our demo, Audrey and I decided to play on the theme of the conference by publishing our own API of the conference content, listing all the talks and speakers.

As the conference content is pretty much static, we needed some data that would evolve in real time as well, so we added a voting capability: users can click on a little smiley to say whether they're enjoying a talk or not. With the streaming in place, as soon as votes come in, we update the UI with the new vote results.

Implementing my API with Ratpack

To build the API, I decided to go with the Ratpack framework. As the Ratpack website states:

Ratpack is a set of Java libraries for building modern HTTP applications.
It provides just enough for writing practical, high performance, apps.
It is built on Java 8, Netty and reactive principles.

You can use Ratpack with Java 8, but there's also a nice Groovy wrapper. So with my Apache Groovy hat on, I naturally chose to go with Groovy!

In the Groovy community, there's a tool called Lazybones which allows you to create template projects easily. And we're also using SDKman for installing various SDKs and tools, including Lazybones.

If you don't have lazybones installed (but have SDKman), it's fairly easy to install:
sdk install lazybones
With both installed already on my machine, I just needed to create a new project with the Ratpack template:
lazybones create ratpack
And I had my template project ready!

Gradle to the rescue to build the app

My Gradle script is pretty straightforward, using the Ratpack Groovy plugin, the Shadow plugin, etc. Nothing really fancy:
buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath "io.ratpack:ratpack-gradle:1.4.4"
        classpath "com.github.jengelman.gradle.plugins:shadow:1.2.3"
    }
}
apply plugin: "io.ratpack.ratpack-groovy"
apply plugin: "com.github.johnrengelman.shadow"
apply plugin: "idea"
apply plugin: "eclipse"
repositories {
    jcenter()
}
dependencies {
    runtime 'org.slf4j:slf4j-simple:1.7.21'
    testCompile "org.spockframework:spock-core:1.0-groovy-2.4"
}
To build this app, I'll use the tar distribution target, which generates startup scripts to be launched from the command-line:
./gradlew distTar
To run the app locally, you can use:
./gradlew run
By default, it runs on port 5050.
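A quick way to check that everything is wired up correctly is to hit the API locally; since no data has been imported yet, the list of talks simply comes back empty:
curl http://localhost:5050/api
# returns [] until the talks dump is imported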

The URL paths and methods

For developing my API, I created the following paths and methods:
  • GET /api : to list all the talks available
  • GET /api/{day} : to restrict the list of talks to just one particular day
  • GET /api/talk/{id} : to view the details of a particular talk
  • POST /api/talk/{id}/vote/{vote} : to vote on a particular talk (negative, neutral, positive)
  • POST /import : to import a JSON dump of all the talks, but it's just used by me for uploading the initial content, so it's not really part of my API.
Implementing the API

My Ratpack app implementing this API (plus a few other URLs) spans a hundred lines of Groovy code or so (including blank lines, imports, and curly braces). The implementation is a bit naive, as I'm storing the talks / speakers data in memory; I should really have used a backend store like Cloud Datastore or Cloud SQL, potentially with Memcache in front. As a result, the app won't scale well and data won't be synchronized across multiple instances running in parallel. For the sake of my demo, though, that was sufficient!
import ratpack.handling.RequestLogger
import org.slf4j.LoggerFactory
import static ratpack.groovy.Groovy.ratpack
import static ratpack.jackson.Jackson.json as toJson
import static ratpack.jackson.Jackson.fromJson
def log = LoggerFactory.getLogger('Devoxx')
def allTalks = []
ratpack {
    handlers {
        all(RequestLogger.ncsa())
        post('import') {
            log.info "Importing talks dump"
            byContent {
                json {
                    parse(fromJson(List)).onError { e ->
                        String msg = "Import failed: $e"
                        log.error msg
                        response.status(400)
                        render toJson([status: "Import failed: $e"])
                    }.then { talks ->
                        allTalks = talks
                        log.info "Loaded ${allTalks.size()} talks"
                        render toJson([status: 'Import successful'])
                    }
                }
            }
        }
        prefix('api') {
            prefix('talk') {
                post(':id/vote/:vote') {
                    def aTalk = allTalks.find { it.id == pathTokens.id }
                    if (aTalk) {
                        def msg = "Voted $pathTokens.vote on talk $pathTokens.id".toString()
                        switch (pathTokens.vote) {
                            case "negative":
                                aTalk.reactions.negative += 1
                                log.info msg
                                render toJson([status: msg])
                                break
                            case "neutral":
                                aTalk.reactions.neutral += 1
                                log.info msg
                                render toJson([status: msg])
                                break
                            case "positive":
                                aTalk.reactions.positive += 1
                                log.info msg
                                render toJson([status: msg])
                                break
                            default:
                                response.status(400)
                                msg = "'${pathTokens.vote}' is not a valid vote".toString()
                                log.info msg
                                render toJson([status: msg])
                        }
                    } else {
                        response.status(404)
                        render toJson([status: "Talk $pathTokens.id not found".toString()])
                    }
                }
                get(':id') {
                    def aTalk = allTalks.find { it.id == pathTokens.id }
                    if (aTalk) {
                        log.info "Found talk: $pathTokens.id"
                        render toJson(aTalk)
                    } else {
                        String msg = "Talk $pathTokens.id not found"
                        log.info msg
                        response.status(404)
                        render toJson([status: msg])
                    }
                }
            }
            get(':day') {
                def talksPerDay = allTalks.findAll { it.day == pathTokens.day }.collect {
                    it.subMap(it.keySet() - 'summary')
                }
                if (talksPerDay) {
                    render toJson(talksPerDay)
                } else {
                    response.status(404)
                    render toJson([status: "Invalid day, or no talks found for: $pathTokens.day".toString()])
                }
            }
            get {
                render toJson(request.queryParams.full ? allTalks : allTalks.collect {
                    it.subMap(it.keySet() - 'summary')
                })
            }
        }
    }
}

Some interesting points about this code:
  • the use of an NCSA-compliant request logger to log all API calls, plus a dedicated logger for important events; both kinds of logs can be watched in Stackdriver Logging in the Cloud Console
  • the use of prefix('...') to factor out common path parts
  • Jackson is used both for parsing (the input dump containing the initial list of talks) and for rendering the JSON payloads in the responses
  • see how we use byContent / json to handle requests coming in with a content type of application/json (see the example right after this list)
  • the rest is essentially error handling, collection filtering, etc.
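As a side note on that byContent / json combo: the import endpoint only accepts JSON, so a POST without the right content type is rejected (Ratpack answers with a 415 Unsupported Media Type by default when no registered content type matches). With the app running locally on port 5050, you can see both cases with something like:
curl -i -d @src/ratpack/public/data/talks.json http://localhost:5050/import
# 415 Unsupported Media Type (curl sends application/x-www-form-urlencoded by default)
curl -i -d @src/ratpack/public/data/talks.json -H 'Content-Type: application/json' http://localhost:5050/import
# 200 OK, with a {"status":"Import successful"} payload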
Containerizing the app

As Ratpack requires JDK 8 to run and isn't based on servlets, I went with Google App Engine Flex, which allows me to run Java 8 apps built on networking toolkits like Netty.

I mentioned we're using the distTar target to build a distribution of the application, and that's what we'll point our Dockerfile at, as App Engine Flex allows you to customize Docker images for bundling and running your app:
# Dedicated OpenJDK 8 base image for App Engine, from the Google Container Registry
FROM gcr.io/google_appengine/openjdk8
VOLUME /tmp
# The Cloud Endpoints service definition (service.json), generated from the OpenAPI spec
RUN mkdir -p /app/endpoints
ADD service.json /app/endpoints
# The tar distribution built by ./gradlew distTar (ADD extracts it at the image root)
ADD build/distributions/devoxx-reactions.tar /
# Listen on port 8080, as expected by App Engine Flex
ENV JAVA_OPTS='-Dratpack.port=8080 -Djava.security.egd=file:/dev/./urandom'
# Launch the startup script generated by the distribution
ENTRYPOINT ["/devoxx-reactions/bin/devoxx-reactions"]
I'm using the dedicated OpenJDK 8 image for App Engine, from the Google Container Registry. I'm specifying the port for my app, and the entry point pointing at the generated startup script. You'll notice the lines about "endpoints" and "service.json"; I'll come to them in a minute: it's because I'm using Google Cloud Endpoints to expose and manage my API!

At this point, to feel safer, you can double-check that your app runs fine under Docker with something like:
./gradlew distTar
docker build -t devoxx-reactions-image .
docker run -p 127.0.0.1:8080:8080 -it devoxx-reactions-image
The app is running on port 8080 of your localhost. We'll see later on how to test it with curl, and how to load the sample data that we've prepared in public/data/talks.json.
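If you're impatient, the curl calls we'll use later against the deployed API already work against this local container, for instance to load the sample data and list the talks:
curl -d @src/ratpack/public/data/talks.json -H 'Content-Type: application/json' http://localhost:8080/import
curl http://localhost:8080/api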

Gcloud, to set up our projects

As we're going to use Cloud Endpoints on App Engine Flex, it's time that I start setting things up with the gcloud SDK:
gcloud init                                  # authenticate and create a default configuration
gcloud components update                     # make sure all components are up to date
gcloud components install beta               # the beta commands are needed for the Endpoints / Flex steps below
gcloud config set project [YOUR_PROJECT_ID]  # point gcloud at your own project
About Google Cloud Endpoints

Google Cloud Platform offers a dedicated service for managing Web APIs, called Endpoints. With Endpoints, you're able to monitor your API (see which endpoints and methods are called, with some nice graphs and stats), to secure it (with different kinds of authentication, like Firebase authentication, JSON Web Tokens, API keys, etc.), and to scale it.

Speaking of scaling, I'm using App Engine Flex here as my deployment target, but it's possible to use Compute Engine or Container Engine as well. Interestingly, you can also use Endpoints API management with an API hosted in a third-party cloud, or even on premises!

The management aspect of the API is handled by a proxy, the Extensible Service Proxy (ESP), which is implemented on top of NGINX (and which will be open sourced). To continue on the scaling aspect of the story, it's interesting to note that this proxy lives alongside your app, in its own container. If your API needs to scale across several machines, instead of going through one giant central proxy somewhere, the ESP is replicated the same way. So there's no single point of failure in the form of a gigantic proxy, and it also means that latency is very low (usually below 1 ms), because the proxy sits as close as possible to your API, without any additional costly network hop.

The last interesting aspect of Cloud Endpoints that I'd like to mention is that your API contract is defined using the OpenAPI Specification (formerly known as Swagger). So it doesn't matter which language, framework, or tech stack you're using to implement your API: as long as you're able to describe it with an OpenAPI Specification, you're good to go!

Specifying the contract of our API with OpenAPI Spec

I mentioned before the various resources and methods we're using for our API, and we'll encode these in the form of a contract, using the OpenAPI Spec definition format. In addition to the resources and methods, we should also define the payloads that will be exchanged (basically a Talk, a Result, and a list of talks), as well as the different status codes for each kind of response. Here's what my specification looks like:
---
swagger: "2.0"
info:
  description: "Consult the Devoxx schedule and vote on your favorite talks."
  version: "1.0.0"
  title: "Devoxx Reactions"
  contact: {}
host: "devoxx-reactions.appspot.com"
schemes:
- "http"
paths:
  /import:
    post:
      summary: "Import a dump of the lists of talks"
      operationId: "ImportTalks"
      consumes:
      - "application/json"
      produces:
      - "application/json"
      parameters:
      - name: "talkList"
        in: "body"
        required: true
        schema:
          $ref: "#/definitions/TalkList"
      responses:
        200:
          description: "Status 200"
          schema:
            $ref: "#/definitions/Result"
        400:
          description: "Status 400"
          schema:
            $ref: "#/definitions/Result"
  /api:
    get:
      summary: "Get the list of talks"
      operationId: "GetAllTalks"
      produces:
      - "application/json"
      parameters: []
      responses:
        200:
          description: "Status 200"
          schema:
            type: "array"
            items:
              $ref: "#/definitions/Talk"
  /api/talk/{talk}:
    get:
      summary: "Get a particular talk"
      operationId: "GetOneTalk"
      produces:
      - "application/json"
      parameters:
      - name: "talk"
        in: "path"
        required: true
        type: "string"
      responses:
        200:
          description: "Status 200"
          schema:
            $ref: "#/definitions/Talk"
        404:
          description: "Status 404"
          schema:
            $ref: "#/definitions/Result"
  /api/talk/{talk}/vote/{vote}:
    post:
      summary: "Vote for a talk"
      operationId: "VoteOnTalk"
      produces:
      - "application/json"
      parameters:
      - name: "talk"
        in: "path"
        required: true
        type: "string"
      - name: "vote"
        in: "path"
        description: "The vote can be \"negative\", \"neutral\" or \"positive\""
        required: true
        type: "string"
      responses:
        200:
          description: "Status 200"
          schema:
            $ref: "#/definitions/Result"
        400:
          description: "Status 400"
          schema:
            $ref: "#/definitions/Result"
        404:
          description: "Status 404"
          schema:
            $ref: "#/definitions/Result"
  /api/{day}:
    get:
      summary: "Get the talks for a particular day"
      operationId: "GetTalksPerDay"
      produces:
      - "application/json"
      parameters:
      - name: "day"
        in: "path"
        required: true
        type: "string"
      responses:
        200:
          description: "Status 200"
          schema:
            type: "array"
            items:
              $ref: "#/definitions/Talk"
        404:
          description: "Status 404"
          schema:
            $ref: "#/definitions/Result"
definitions:
  Result:
    type: "object"
    required:
    - "status"
    properties:
      status:
        type: "string"
    description: "Voting results"
  Talk:
    type: "object"
    required:
    - "day"
    - "fromTime"
    - "fromTimeMillis"
    - "id"
    - "reactions"
    - "speakers"
    - "talkType"
    - "title"
    - "toTime"
    - "toTimeMillis"
    - "track"
    properties:
      day:
        type: "string"
      fromTime:
        type: "string"
      fromTimeMillis:
        type: "integer"
        format: "int64"
      id:
        type: "string"
        description: ""
      reactions:
        type: "object"
        properties:
          negative:
            type: "integer"
            format: "int32"
          neutral:
            type: "integer"
            format: "int32"
          positive:
            type: "integer"
            format: "int32"
        required:
        - "negative"
        - "neutral"
        - "positive"
      room:
        type: "string"
      speakers:
        type: "array"
        items:
          type: "string"
      summary:
        type: "string"
      talkType:
        type: "string"
      title:
        type: "string"
      toTime:
        type: "string"
      toTimeMillis:
        type: "integer"
        format: "int64"
      track:
        type: "object"
        properties:
          title:
            type: "string"
          trackId:
            type: "string"
        required:
        - "title"
        - "trackId"
    description: "A talk representation"
  TalkList:
    type: "array"
    items:
      $ref: "#/definitions/Talk"
    description: "A list of talks"
You can write your API specification using either JSON or YAML. I chose YAML because it's a bit easier to read for the human eye, while remaining just as readable for the computer.

There's one extra step for instructing Cloud Endpoints about our API definition: I needed to convert my OpenAPI Spec into the service definition format used internally by Cloud Endpoints, thanks to the following command:
gcloud beta service-management convert-config swagger20.yaml service.json
Deploying on App Engine Flex

To deploy on App Engine and use Cloud Endpoints, we're going to use the gcloud command-line tool again:
gcloud beta app deploy
After a few minutes, your app / API should be available and ready for serving.
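Once the deployment is done, depending on your gcloud SDK version, you can quickly double-check the result by opening the deployed app straight from the command line (this simply opens the service URL in your browser):
gcloud app browse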

Testing our API

I have a JSON file with the bulk of the talks and their details, so I uploaded it with:
curl -d @src/ratpack/public/data/talks.json -H 'Content-Type: application/json' http://devoxx-reactions.appspot.com/import
And then I was able to call my API with:
curl https://devoxx-reactions.appspot.com/api
curl https://devoxx-reactions.appspot.com/api/monday
curl https://devoxx-reactions.appspot.com/api/talk/HFW-0944
And to vote for a given talk with:
curl -X POST https://devoxx-reactions.appspot.com/api/talk/XMX-6190/vote/positive
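Based on the Ratpack handlers shown earlier, a successful vote returns a small JSON status payload, while an invalid vote value should be rejected with a 400, along these lines (the 'awesome' vote below is deliberately bogus):
curl -i -X POST https://devoxx-reactions.appspot.com/api/talk/XMX-6190/vote/awesome
# HTTP/1.1 400 Bad Request
# {"status":"'awesome' is not a valid vote"}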
Managing your API

When you visit the Cloud Console, you'll be able to see interesting statistics and graphs about the usage of your API:



Streaming the API

For the streaming part of the story, I'll let Audrey cover it! I focused on the backend, while she focused on the Polymer frontend, with a custom Streamdata component that used the Streamdata proxy to get the patches representing the difference between consecutive calls to the backend. So when the votes change (but the rest of the talk details are left unchanged), Streamdata sends back to the frontend only the diff. In addition to keeping the ongoing data exchanges small, the proxy is also able to take care of caching, so it helps avoid hitting the backend too often, potentially improving the scalability of the backend by keeping the cache at the proxy level.
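To give an idea of what those incremental updates look like, here's a sketch (the proxy URL and token parameter below are placeholders for whatever Streamdata.io's documentation prescribes, not the exact wire format): the proxy serves the API over Server-Sent Events, pushing an initial snapshot of the JSON payload, and then JSON Patch (RFC 6902) documents describing only what changed, such as a vote counter being bumped:
# Subscribe to the streamed API through the proxy (placeholder URL and app token)
curl -N "https://<streamdata-proxy>/https://devoxx-reactions.appspot.com/api?X-Sd-Token=<your-app-token>"
# The first event carries the full JSON array of talks; subsequent events carry only a diff, roughly:
# [{"op": "replace", "path": "/12/reactions/positive", "value": 42}]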

Slides and video available!

You can watch the video online on Devoxx's YouTube channel:


And you can have a closer look at the slides as well:

Becoming Twitter verified!

Probably for vanity's sake, or perhaps even out of jealousy seeing friends becoming "Twitter verified", I was curious to see whether I, too, could get that little tick beside my name on my Twitter profile.

Generally speaking, verified accounts are accounts of "public interest". They can range from your usual movie stars to politicians, from well-known artists to company CEOs, but also include people who are somewhat well known in the twittersphere: tech luminaries, representatives of particular tech communities, etc. So, seeing my tech friends getting the little tick, I thought I should try it out.

And on my first try... I failed...

I didn't completely fail, as I didn't get the usual message saying I had failed verification, but rather that some tweaks could be made to my profile to add some more information. They asked whether the account name reflected the real person's name (in my case 'glaforge' maps well to my name!), and whether my government-issued photo ID was legible (which it was, too). So I was puzzled, and didn't really get actionable items to pursue.

For my first attempt, I initially scanned my national ID card, but I guess it's not a very well-known format, so on my second try, I scanned my passport instead. Even though I don't feel entirely safe sending a scan of my ID... but that's a story for another day.

I didn't change anything, but two months later, I tried again, and this time it worked!


After bragging about being verified on Twitter, I got several DM conversations asking for advice. So I'll just repeat some of the advice given by Twitter themselves, and add my own tips into the mix.

First, be sure to read the following articles from Twitter:
The last one is the most important of the three.

You should really have:
  • a verified phone number: It'll be checked with a text message and a code, pretty easy step.
  • a confirmed email address: Same principle, be sure to use a real email address, and a similar verification step takes place.
  • a bio: I think the bio is pretty important. It describes the "public" person that you are. So if you're a singer, mention that you're a singer and give a link that proves that you are one, for example the website of your band. During the verification, you'll be asked for up to 5 links confirming who you are, so don't hesitate to reuse one of those links in your bio, or related twitter account, etc.
  • a header photo: I don't think the content of the header photo really matters, but verified accounts need to have a header picture. Even if it's just a nice landscape, it should be there, instead of just the colored background.
  • a birthday: I'm not sure how important this really is. But if you indicate your birthday, it better be the same date as the one provided on your ID scan obviously!
  • a website: For the website, I used the URL of my blog. And this is also one of the URLs I've given in the form to request verification. This URL should point at a place that can prove who you are: on my blog, there's a section about me, describing what I'm doing in life.
  • public tweets: your tweets should be publicly visible, otherwise, no point in requesting being verified!
The documentation says it's better if the account name reflects the real name of the person, so it might help. But if you're known by a particular Twitter handle, I don't think you really need to change it for the sake of becoming verified. That said, it might be harder with an odd Twitter handle than with one that resembles your name.

When you fill in the form to request verification, you will be asked for up to five links. In my case, I gave the URL of my blog (which, as I said, has a section explaining who I am, showing my real name and a picture of myself, so they can check it against the Twitter avatar and the photo ID). I also gave my LinkedIn profile and my Google+ profile, and I think that's all. Anything that proves you're who you claim to be can be useful.

Last but not least, there's a form field where you have to justify why you're requesting verification. In my case, I wanted to cover the two "aspects" of my life: by day, I'm a Developer Advocate for Google, and by night, I'm working on the Apache Groovy open source project. For reference, here's the blurb that I used:

"Leading the Apache Groovy project, I'm a spokesperson for the successful open source project, and for the ecosystem & community around it. In my day life, I'm also a Developer Advocate, at Google, on their Cloud Platform. Having my Twitter account verified would be an additional stamp of approval for my involvement in the Groovy community and with the company's product I advocate for."

After my first attempt two months ago, I tried again over the weekend, and the next day, I got the email confirming my account was verified! Yay!

I hope this article is helpful. Following the advice above, at least, I was able to get verified. I have about 11K followers, but I've seen people with fewer than half as many followers also become verified, so the number of followers is not everything. I think paying attention to the quality of your profile is what ultimately pays off. Good luck with your verification process, and don't hesitate to share your own tips in the comments below!

Latest features of Google Cloud Platform

When you're following a project, a company, or a platform, you're looking for the latest news and feature announcements, to take advantage of what's coming up.

Last time, I blogged about the gcloud command line tool, which nicely shows you the latest updates since the last time you updated its components. 

If you go to the Google Cloud Platform website, you'll see dedicated release notes pages for pretty much all products. But I've just discovered a new way to stay updated about what's new: if you go to the Cloud Console, click on the little vertical dots, and then Preferences:

Then, in the main panel, you'll see a new "communication" section, and if you click on "updates & offers", you'll get a chance to select the "feature announcements" option:


You'll receive monthly emails about the feature announcements. 

Quick intro to Google Cloud Platform for the Paris Ansible meetup

Tonight, Google France hosted the Paris Ansible meetup, and I had the chance to play master of ceremonies, introducing the speakers for the evening, giving a brief introduction to the Google Cloud Platform, and outlining what Ansible users and DevOps engineers might be interested in learning more about.

Here's my quick overview of the Google Cloud Platform:

Scaling a Swagger-based Web API on Google Cloud Endpoints

I had the pleasure of presenting at the Nordic APIs Platform Summit 2016 in Stockholm this week. I enjoyed the conference a lot, with great speakers and content, flawless organization, and nice interactions with the audience.

For the last keynote of the conference, I had the chance to present Google Cloud Endpoints, Google's take on API management. I worked on a little "pancake"-powered demo, deploying a Ratpack application, in a Docker container, on Google Container Engine. I created an OpenAPI Specification describing my pancake-serving Web API, and used the Extensible Service Proxy to receive the API calls, for securing (with an API key), monitoring (through the Cloud Console), and scaling my Web API (thanks to the scaling capabilities of Container Engine). This demo will be the topic of some upcoming blog posts.

In the meantime, here is the abstract of my talk:

Scale a Swagger-based Web API with Google Cloud Endpoints

Web APIs are more and more often specified with API definition languages like Swagger (now named the OpenAPI Spec), which can help you generate nice interactive documentation, server skeletons, client SDKs, mocks, and more, making it simpler to get started both producing and consuming an API.

In this session, Guillaume will demonstrate how to define a Web API with Swagger / Open API Spec, and scale it using Cloud Endpoints, on the Google Cloud Platform.

And here are the slides I presented:



 
© 2012 Guillaume Laforge | The views and opinions expressed here are mine and don't reflect the ones from my employer.