gcloud informative update message

I was playing with the new IntelliJ IDEA plugin for Google App Engine yesterday. The plugin depends on the gcloud SDK to do its work. And I started exploring gcloud a little bit more. 

I was experiencing an odd bug which prevented me from running my application locally with App Engine's local dev server. The bug was present in an old version of gcloud and its App Engine component, so I had to update the SDK and its App Engine Java component to fix it. No big deal, but I wanted to highlight a little detail about that upgrade process.

I love when SDKs give me information about what needs updating and what's new in the new versions I'm using!

I've been using SDKMan to deal with various SDK installations, like those for Groovy, Grails, Gradle, etc., and I've always liked how it tells me which SDK updates are available and what's new in SDKMan itself. I'm glad to see that gcloud behaves the same and gives informative details about what's new. So let's see that in action.

First of all, while debugging my problem with a colleague, he asked me which versions of the SDK and the App Engine component I had. So I ran the following command:

$ gcloud version
Google Cloud SDK 119.0.0
alpha 2016.01.12
app-engine-java 1.9.38
app-engine-python 1.9.38
beta 2016.01.12
bq 2.0.24
bq-nix 2.0.24
core 2016.07.21
core-nix 2016.06.06
gcloud 
gsutil 4.19
gsutil-nix 4.19

At the time of this writing, the latest version of gcloud was actually 127.0.0, but I had 119.0.0. And for the app-engine-java component, I had version 1.9.38 although 1.9.42 was available. So it was time to update!

$ gcloud components update

Your current Cloud SDK version is: 119.0.0
You will be upgraded to version: 127.0.0
┌──────────────────────────────────────────────────────────┐
│            These components will be updated.             │
├─────────────────────────────────┬────────────┬───────────┤
│               Name              │  Version   │    Size   │
├─────────────────────────────────┼────────────┼───────────┤
│ BigQuery Command Line Tool      │     2.0.24 │   < 1 MiB │
│ Cloud SDK Core Libraries        │ 2016.09.20 │   4.9 MiB │
│ Cloud Storage Command Line Tool │       4.21 │   2.8 MiB │
│ gcloud app Java Extensions      │     1.9.42 │ 135.6 MiB │
│ gcloud app Python Extensions    │     1.9.40 │   7.2 MiB │
└─────────────────────────────────┴────────────┴───────────┘
The following release notes are new in this upgrade.
Please read carefully for information about new features, breaking changes,
and bugs fixed.  The latest full release notes can be viewed at:
  https://cloud.google.com/sdk/release_notes
127.0.0 (2016/09/21)
  Google BigQuery
      ▪ New load/query option in BigQuery client to support schema update
        within a load/query job.
      ▪ New query option in BigQuery client to specify query parameters in
        Standard SQL.
  Google Cloud Dataproc
      ▪ gcloud dataproc clusters create flag
        --preemptible-worker-boot-disk-size can be used to specify future
        preemptible VM boot disk size.
  Google Container Engine
      ▪ Update kubectl to version 1.3.7.
  Google Cloud ML
      ▪ New gcloud beta ml predict command to do online prediction.
      ▪ New gcloud beta ml jobs submit prediction command to submit batch
        prediction job.
  Google Cloud SQL
      ▪ New arguments to beta sql instances create/patch commands for Cloud
        SQL Second Generation instances:
        ◆ --storage-size Sets storage size in GB.
        ◆ --maintenance-release-channel Sets production or preview channel
          for maintenance window.
        ◆ --maintenance-window-day Sets day of week for maintenance window.
        ◆ --maintenance-window-hour Sets hour of day for maintenance window.
        ◆ --maintenance-window-any (patch only) Clears maintenance window
          setting.
[...]

I snipped the output to just show the details of the changes for the latest version of gcloud, but it showed me the actual changelog all the way back to the version I had... and as I hadn't updated in a while, there were lots of improvements and fixes! It's really nice to see what has changed, and sometimes you can discover some gems you weren't even aware of!

So if you're working on some kind of SDK with auto-update capabilities, be sure to provide a built-in changelog facility to help your users know what's new and improved!
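As an illustration, such a changelog facility could be sketched along these lines. This is only a hypothetical sketch: the version-comparison logic and the releaseNotes map are my own assumptions for the example, not gcloud's actual implementation.

```javascript
// Compare two dotted version strings numerically (e.g. '127.0.0' vs '119.0.0').
function compareVersions(a, b) {
  var pa = a.split('.').map(Number);
  var pb = b.split('.').map(Number);
  for (var i = 0; i < Math.max(pa.length, pb.length); i++) {
    var d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

// Given the user's installed version and a map of release notes keyed by
// version number, return every entry newer than the installed one,
// newest first (like gcloud displays them).
function notesSince(installed, releaseNotes) {
  return Object.keys(releaseNotes)
    .filter(function (v) { return compareVersions(v, installed) > 0; })
    .sort(compareVersions)
    .reverse() // newest first
    .map(function (v) { return v + '\n' + releaseNotes[v]; })
    .join('\n\n');
}
```

With an installed version of 119.0.0, `notesSince` would return only the notes for versions 120.0.0 through 127.0.0, which is exactly the slice of changelog the user has missed.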


JavaOne 2016 sessions

Next week is that time of the year when tons of Java developers gather and meet in San Francisco for JavaOne. It'll be my 10th edition or so, time flies!

This year, I'll participate in a couple of sessions:

  • Java and the Commoditization of Machine Intelligence [CON2291]
    It's a panel discussion with representatives from IBM, Microsoft, and Google to talk about Machine Learning APIs. I'll be covering the ML APIs from Google Cloud Platform: Vision, Speech, Natural Language.
  • A Groovy Journey in Open Source [CON5932]
    In this session, I'll cover the history of the Apache Groovy project, and talk about the latest developments and new features.
Google colleagues will also be present to speak about:

  • gRPC 101 for Java Developers [CON5750] by Ray Tsang
  • Managing and Deploying Java-Based Applications and Services at Scale [CON5730] by Ray Tsang
  • Hacking Hiring [BOF1459] by Elliotte Harold
  • The Ultimate Build Tools Face-off [CON2270] with Dmitry Churbanau and Baruch Sadogursky
  • RIA Technologies and Frameworks Panel [CON4675] with Kevin Nilson
There are quite a few interesting Groovy ecosystem related talks on the agenda:

  • Improving Your Groovy Kung-Fu [CON1293] by Dierk König
  • Groovy and Java 8: Making Java Better [CON3277] by Ken Kousen
  • Spock: Test Well and Prosper [CON3273] by Ken Kousen
  • Writing Groovy AST Transformations: Getting Practical in an Hour [CON1238] by Baruch Sadogursky
  • Juggling Multiple Java Platforms and Jigsaw with Gradle [CON4832] by Cédric Champeau
  • Maven Versus Gradle: Ready...Steady...Go! [CON2951] by Mert Caliskan & Murat Yener
  • Meet the Gradle Team [BOF6209] with Sterling Greene & Cédric Champeau
  • Faster Java EE Builds with Gradle [CON4921] by Ryan Cuprak
  • Lightweight Developer Provisioning with Gradle [BOF5154] by Mario-Leander Reimer
  • Making the Most of Your Gradle Build [CON6468] by Andrés Almiray
  • Gradle Support in NetBeans: A State of the Union [CON6253] with Sven Reimers & Martin Klähn
  • A Practical RxJava Example with Ratpack [CON4044] by Laurent Doguin
Lots of interesting content! I'm really looking forward to meeting you there, in the hallways, to chat about Google Cloud Platform and Apache Groovy!

Natural Language API and JavaScript promises to bind them all

A bit of web scraping with Jsoup and REST API calls with groovy-wslite helped me build my latest demo with Glide / Gaelyk on App Engine, but now it's time to look a bit deeper into the analysis of the White House speeches.


I wanted to get a feel for how positive and negative sentences flow together in speeches. Looking at the rhetoric of those texts, you'd find a generally neutral introduction, then the posing of the problem with some negative connotations, then a climax trying to unfold the problems with positive solutions. Other topics might play out totally differently, but I was curious to see how this worked across the corpus of texts from the speeches and remarks published by the White House press office.

The Cloud Natural Language API

For that purpose, I used the Cloud Natural Language API:
  • Split the text into sentences thanks to the text annotation capability. The API can split sentences even further, of course, by word, to figure out verbs, nouns, and all components of sentences (POS: Part Of Speech tagging).
  • Define the sentiment of sentences, with a polarity (negative to positive), and a magnitude (for the intensity of the sentiment expressed).
  • Extract entities, i.e. find people, organization, or company names, place locations, etc.
Text annotation is important for better understanding text, for example to create more accurate language translations. Sentiment analysis can help brands track how their customers appreciate their products. And entity extraction can help figure out the topics of articles, who's mentioned, and the places where the action takes place, which is useful for further contextual search, like finding all the articles about Obama, all the speeches about Europe, etc. Those various services are widely applicable for providing more metadata and a better understanding of a given piece of text.
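To make the three capabilities above a bit more concrete, here's a small sketch of how the request body for the annotation call could be assembled on the client side. The field names mirror the v1beta1 REST API used later in this post series; the endpoint URL and API key in the trailing comment are placeholders, not working values.

```javascript
// Sketch: assembling an annotateText request body covering the three features.
function buildAnnotateRequest(text) {
  return {
    document: { type: 'PLAIN_TEXT', content: text },
    features: {
      extractSyntax: true,            // sentence & token splitting (POS tagging)
      extractEntities: true,          // people, organizations, locations...
      extractDocumentSentiment: true  // polarity + magnitude
    }
  };
}

// The body would then be POSTed to the REST endpoint, for example:
// fetch('https://language.googleapis.com/v1beta1/documents:annotateText?key=YOUR_API_KEY',
//       { method: 'POST', body: JSON.stringify(buildAnnotateRequest(someText)) })
```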

Asynchronously calling the service and gathering results

Let's look back at my experiment. When I scrape the speeches, I actually get a list of paragraphs (initially enclosed in <p> tags basically). But I want to analyze the text sentence by sentence, so I need to use the text annotation capability to split all those paragraphs into sentences that I analyze individually.

Currently, sentiment analysis works on one piece of text at a time, so you have to make one call per sentence! Hopefully an option will come along to send several pieces of text in a batch, or to get the sentiment per sentence for a big chunk of text. For now, it means I'll have to make p calls for my p paragraphs, and then n calls for all the sentences. Those p + n calls might be expensive in terms of network traffic, but on the other hand, I can make the sentence coloring appear progressively and asynchronously, by using JavaScript promises and the Fetch API, as I'm making those calls from the client side. It also seems possible to batch requests with the Google API Client, but I haven't tried that yet.

First of all, to simplify the code a bit, I've created a helper function to call my backend services (which in turn call the NL API). It wraps the usage of the Fetch API and the promise handling that gathers the JSON response:
        var callService = function (url, key, value) {
            var query = new URLSearchParams();
            query.append(key, value);
            return fetch(url, {
                method: 'POST',
                body: query
            }).then(function (resp) {
                return resp.json();
            })
        };
I use the URLSearchParams object to pass my query parameter. The handy json() method on the response gives me the data structure resulting from the call. I'm going to reuse that callService function in the following snippets:
            callService('/content', 'url', e.value).then(function (paragraphs) {
                paragraphs.forEach(function (para, paraIdx) {
                    z('#output').append('<p id="para' + paraIdx + '">' + para + '</p>');
                    callService('/sentences', 'content', para).then(function (data) {
                        var sentences = data.sentences.map(function (sentence) {
                            return sentence.text.content;
                        });
                        return Promise.all(sentences.map(function (sentence) {
                            return callService('/sentence', 'content', sentence).then(function (sentenceSentiment) {
                                var polarity = sentenceSentiment.documentSentiment.polarity;
                                var magnitude = sentenceSentiment.documentSentiment.magnitude;
                                return {
                                    sentence: sentence,
                                    polarity: polarity,
                                    magnitude: magnitude
                                }
                            });
                        }));
                    }).then(function (allSentiments) {
                        var coloredSentences = allSentiments.map(function (sentiment) {
                            var hsl = 'hsl(' +
                                Math.floor((sentiment.polarity + 1) * 60) + ', ' +
                                Math.min(Math.floor(sentiment.magnitude * 100), 100) + '%, ' +
                                '90%) !important';
                            return '<span style="background-color: ' + hsl + '">' + sentiment.sentence + '</span>';
                        }).join('&nbsp;&nbsp;');
                        z('#para' + paraIdx).html(coloredSentences);
                    });
                });
            });
The first call will fetch the paragraphs from the web scraping service. I display each paragraph right away, uncolored, with an id so that I can then later update each paragraph with colored sentences with their sentiment.

Now for each paragraph, I call the sentences service, which calls the NL API to get the individual sentences of each paragraph. With all the sentences in one go, I use the Promise.all(iterable) method, which returns a promise that resolves once all the per-sentence sentiment analysis promises have resolved. This also helps me keep track of the order of sentences: the individual calls can complete in an unpredictable order, but Promise.all yields the results in the same order as its input.

I also keep track of the paragraph index so that, once all the sentence promises of a paragraph are resolved, I can update that paragraph with its colored sentences, joined together.





Web scraping and REST API calls on App Engine with Jsoup and groovy-wslite

After my Twitter sentiment article, these past couple of days I've been playing again with the Cloud Natural Language API. This time, I wanted to make a little demo analyzing the text of speeches and remarks published by the press office of the White House. It's interesting to see how speeches alternate negative and positive sequences, to reinforce the argument being exposed.


As usual, for my cloud demos, my weapons of choice for rapid development are Apache Groovy, with Glide & Gaelyk on Google App Engine! But for this demo, I needed two things:
  • web scraping, to fetch and parse the speeches published on the White House website,
  • REST API calls, to invoke the Cloud Natural Language API.
In both cases, we need to issue calls through the internet, and there are some limitations on App Engine with regard to such outbound networking. But if you use the plain Java HTTP / URL networking classes, you are fine: under the hood, they use App Engine's own URL Fetch service.

I used Jsoup for web scraping, which takes care of connecting to the website itself.

For interacting with the REST API, groovy-wslite came to my rescue, although I could have used the Java SDK as in my previous article.

Let's look at Jsoup and scraping first. In my controller fetching the content, I did something along those lines (you can run this script in the Groovy console):
@Grab('org.jsoup:jsoup:1.9.2')
import org.jsoup.*
def url = 'https://www.whitehouse.gov/the-press-office/2016/07/17/statement-president-shootings-baton-rouge-louisiana'
def doc = Jsoup.connect(url)
               .userAgent('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0')
               .get()
println doc.select('.forall-body .field-item p').collect { it.text() }.join('\n\n')
Now I'm going to make a call to the NL API with groovy-wslite:
@Grab('com.github.groovy-wslite:groovy-wslite:1.1.3')
import wslite.rest.*
def apiKey = 'MY_TOP_SECRET_API_KEY'
def text = '...' // the speech text scraped with Jsoup above
def client = new RESTClient('https://language.googleapis.com/v1beta1/')
def result = client.post(path: 'documents:annotateText', query: [key: apiKey]) {
    type ContentType.JSON
    json document: [
            type   : 'PLAIN_TEXT',
            content: text
    ], features: [
            extractSyntax           : true,
            extractEntities         : true,
            extractDocumentSentiment: true
    ]
}
// returns a list of parsed sentences
println result.json.sentences.text.content
// prints the overall sentiment of the speech
println result.json.documentSentiment.polarity

groovy-wslite nicely handles XML and JSON payloads: you can use Groovy maps for the input value, which are marshalled to JSON transparently, and the GPath notation to easily access the resulting JSON object returned by the API.

It was very quick and straightforward to use Jsoup and groovy-wslite for my web scraping and REST handling needs, and it was a breeze to integrate them into my App Engine application. In a follow-up article, I'll tell you a bit more about the sentiment analysis of the sentences of the speeches, so please stay tuned for the next installment!

Sentiment analysis on tweets

What’s the mood on Twitter today? Looking at my little Twitter demo from a few weeks ago (using Glide & Gaelyk on Google App Engine), I thought I could enrich the visualization with some sentiment analysis to give more color to those tweets. Fortunately, there’s a new API in Google-town, the Cloud Natural Language API (some more info in the announcement, and a great post showing textual analysis of Harry Potter and the New York Times)!


The brand-new Cloud Natural Language API provides three key services:

  • Sentiment analysis: “inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral”.

  • Entity recognition: “inspects the given text for known entities (proper nouns such as public figures, landmarks, etc.) and returns information about those entities”.

  • Syntax analysis: “extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), providing further analysis on those tokens”.


I’m going to focus only on the sentiment analysis in this article. When analyzing some text, the API tells you whether the content is negative, neutral or positive, returning “polarity” values ranging from -1 for negative to +1 for positive. And you also get a “magnitude”, from 0 to +Infinity to say how strong the emotions expressed are. You can read more about what polarity and magnitude mean for a more thorough understanding.


Let’s get started!


With the code base of my first article, I will add the sentiment analysis associated with the tweets I’m fetching. The idea is to come up with a colorful wall of tweets like this, with a range of colors from red for negative, to green for positive, through yellow for neutral:


I’ll create a new controller (mood.groovy) that will call the Cloud NL service, passing the text as input. I’ll take advantage of App Engine’s Memcache support to cache the calls to the service, as tweets are immutable, their sentiment won’t change. The controller will return a JSON structure to hold the result of the sentiment analysis. From the index.gtpl view template, I’ll add a bit of JavaScript and AJAX to call my newly created controller.


Setting up the dependencies


You can either use the Cloud NL REST API or the Java SDK. I decided to use the latter, essentially just to benefit from code completion in my IDE. You can have a look at the Java samples provided. I’m updating the glide.gradle file to define my dependencies, including the google-api-services-language artifact which contains the Cloud NL service. I also needed to depend on the Google API client JARs, and Guava. Here’s what my Gradle dependencies ended up looking like:


dependencies {
   compile "com.google.api-client:google-api-client:1.21.0"
   compile "com.google.api-client:google-api-client-appengine:1.21.0"
   compile "com.google.api-client:google-api-client-servlet:1.21.0"
   compile "com.google.guava:guava:19.0"
   compile "com.google.apis:google-api-services-language:v1beta1-rev1-1.22.0"
   compile "org.twitter4j:twitter4j-appengine:4.0.4"
}

Creating a new route for the mood controller


First, let’s create a new route in _routes.groovy to point at the new controller:


post "/mood",       forward:  "/mood.groovy"

Coding the mood controller


Now let’s code the mood.groovy controller!


We’ll need quite a few imports for the Google API client classes, and a couple more for the Cloud Natural Language API:


import com.google.api.client.googleapis.json.GoogleJsonResponseException
import com.google.api.client.http.*
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport
import com.google.api.client.json.jackson2.JacksonFactory
import com.google.api.services.language.v1beta1.*
import com.google.api.services.language.v1beta1.model.*

We’re retrieving the text as a parameter, with the params map:


def text = params.txt

We’ve set up a few local variables that we’ll use for storing and returning the result of the sentiment analysis invocation:


def successOutcome = true
def reason = ""
def polarity = 0
def magnitude = 0


Let’s check if we have already got the sentiment analysis for the text parameter in Memcache:


def cachedResult = memcache[text]

If it’s in the cache, we’ll be able to return it, otherwise, it’s time to compute it:


if (!cachedResult) {
   try {
       // the sentiment analysis calling will be here
   } catch (Throwable t) {
       successOutcome = false
       reason = t.message
   }

We’re going to wrap our service call with a bit of exception handling: in case something goes wrong, we want to alert the user of what’s going on. In lieu of the comment, we’ll now add the logic to analyze the sentiment.


We must define the Google credentials allowing us to access the API. Rather than explaining the whole process, please follow the authentication process explained in the documentation to create an API key and a service account:


        def credential = GoogleCredential.applicationDefault.createScoped(CloudNaturalLanguageAPIScopes.all())

Now we can create our Cloud Natural Language API caller:


        def api = new CloudNaturalLanguageAPI.Builder(
               GoogleNetHttpTransport.newTrustedTransport(),
               JacksonFactory.defaultInstance,
               new HttpRequestInitializer() {
                   void initialize(HttpRequest httpRequest) throws IOException {
                       credential.initialize(httpRequest)
                   }
               })
               .setApplicationName('TweetMood').build()

The caller requires some parameters like an HTTP transport, a JSON factory, and a request initializer that double checks that we’re allowed to make those API calls. Now that the API is set up, we can call it:


        def sentimentResponse = api.documents().analyzeSentiment(
               new AnalyzeSentimentRequest(document: new Document(content: text, type: "PLAIN_TEXT"))
       ).execute()

We created an AnalyzeSentimentRequest, passing a Document to analyze with the text of our tweets. Finally, we execute that request. With the values from the response, we’re going to assign our polarity and magnitude variables:


        polarity = sentimentResponse.documentSentiment.polarity
       magnitude = sentimentResponse.documentSentiment.magnitude

Then, we’re going to store the result (successful or not) in Memcache:


    cachedResult = [
           success: successOutcome,
           message: reason,
           polarity: polarity,
           magnitude: magnitude
   ]
   memcache[text] = cachedResult
}

Now, we set up the JSON content type for the answer, and we can render the cachedResult map as a JSON object with the Groovy JSON builder available inside all controllers:


response.contentType = 'application/json'
json.result cachedResult

Calling our controller from the view


A bit of JavaScript & AJAX to the rescue to call the mood controller! I wanted something a bit lighter than jQuery, so I went with Zepto.js for fun. It’s pretty much the same API as jQuery anyway. Just before the end of the body, you can install Zepto from a CDN with:


<script src="https://cdnjs.cloudflare.com/ajax/libs/zepto/1.1.6/zepto.min.js"></script>

Then, we’ll open up our script tag for some coding:


<script language="javascript">
   Zepto(function(z) {
       // some magic here!
   });
</script>

As the sentiment analysis API call doesn’t support batch requests, we’ll have to call the API for each and every tweet. So let’s iterate over each tweet:


        z('.tweet').forEach(function(e, idx) {
           var txt = z(e).data('text');
           // ....
        });

Compared to the previous article, I’ve added a data-text attribute to contain the text of the tweet, stripped of hashtags, Twitter handles, and links (I’ll let you use some regex magic to scrap those bits of text!).


Next, I call my mood controller, passing the trimmed text as input, and check if the response is successful:


            z.post('/mood', { txt: txt }, function(resp) {
               if (resp.result.success) {
                   // …
               }
            });

I retrieve the polarity and magnitude from the JSON payload returned by my mood controller:


                    var polarity = resp.result.polarity;
                   var magnitude = resp.result.magnitude;

Then I update the background color of my tweets with the following approach. I’m using the HSL color space: Hue, Saturation, Lightness.


The hue ranges from 0 to 360°, and for my tweets, I’m using the first third, from red / 0°, through yellow / 60°, up to green / 120° to represent the polarity, respectively with negative / -1, neutral / 0 and positive / +1.


The saturation (in percent) corresponds to the magnitude. For texts as small as tweets, the magnitude rarely goes beyond 1, so I simply multiply the magnitude by 100 to get a percentage, and cap the result at 100% if it goes beyond.


For the lightness, I’ve got a fixed value of 80%, as 100% would always be full white!


Here’s a more explicit visualization of this color encoding with the following graph:


So what does the code look like, with the DOM updates done with Zepto?


                    var hsl = 'hsl(' +
                           Math.floor((polarity + 1) * 60) + ', ' +
                           Math.min(Math.floor(magnitude * 100), 100) + '%, ' +
                           '80%) !important';
                   
                   z(e)
                           .css('background-color', hsl)
                           .data('polarity', polarity)
                           .data('magnitude', magnitude);

For fun, I’ve also added some smileys representing five buckets of positivity / negativity (very negative, negative, neutral, positive, very positive), and from 0 to 3 exclamation marks for 4 buckets of magnitude. That’s what you see at the bottom of the tweet cards in the final screenshot:
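Such bucketing could be sketched like this; only the bucket counts come from the paragraph above, while the exact threshold values are my own assumptions for illustration:

```javascript
// Map a polarity value (-1 to +1) to one of five buckets,
// each of which would get its own smiley in the UI.
// The cut-off values are illustrative guesses, not the article's exact ones.
function polarityBucket(polarity) {
  if (polarity < -0.6) return 'very negative';
  if (polarity < -0.2) return 'negative';
  if (polarity <= 0.2) return 'neutral';
  if (polarity <= 0.6) return 'positive';
  return 'very positive';
}

// Map a magnitude (0 to +Infinity) to 0–3 exclamation marks (four buckets).
function magnitudeMarks(magnitude) {
  if (magnitude > 2) return '!!!';
  if (magnitude > 1) return '!!';
  if (magnitude > 0.5) return '!';
  return '';
}
```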



Summary


And we’re actually done! We have our controller fetching the tweets and forwarding to the view template from the last article, and we’ve added a bit of JavaScript & AJAX to call our new mood controller, displaying some fancy colors to represent the mood of our tweets, using the brand-new Cloud Natural Language API.


When playing with sentiment analysis, I generally shared the API’s opinion regarding the sentiment of the tweets, but I was sometimes surprised by the outcome. For short bursts of text like tweets, it’s hard to decipher things like irony or sarcasm, and a particular tweet might appear positive when in reality it isn’t, and vice versa. Sentiment analysis is probably not an exact science, and you need more context to decide what’s really positive or negative.

Without even speaking of sarcasm or irony, certain tweets were sometimes deemed negative simply because some usually negative words appeared: a “no” or “not” is not necessarily negative when it’s negating something already negative, turning it into something more positive (“it’s not uncool”). For longer texts, the general sentiment seems more accurate, so it’s perhaps more appropriate to use sentiment analysis on such texts than on short snippets.
 
© 2012 Guillaume Laforge | The views and opinions expressed here are mine and don't reflect the ones from my employer.