Automating Chrome headless mode on App Engine with Node.JS 8

On the Google Cloud front today, the big news is the release of the new Node.JS 8 runtime for Google App Engine Standard. It’s been a while since a completely new runtime was added to the list of supported platforms (Python, Java, PHP, Go). You could already run anything in custom containers on App Engine Flex, including your own containerized Node app, but now you can have all the nice developer experience on the Standard environment, with fast deployment times, and 0 to 1 to n instance automatic scaling (you can see the difference between those two environments here).

To play with this new runtime, I decided to follow the steps in this guide about using Chrome headless with Puppeteer.

As my readers know, I’m not really a Node person, and usually dabble more with Apache Groovy and Java, but this runtime was interesting to me as there’s a nice integration with native packages. Let me explain.

The App Engine Node runtime includes tons of native package out of the box, without requiring you to install anything (except the Node modules that take advantage of those packages, of course.) For instance, if you need to do any audio / video manipulation, there’s an ffmpeg package. If you want to deal with Git repositories, there’s a git package. Need to manipulate images, there’s ImageMagick, etc. And there are usually nice Node wrapper modules around those native components.

Among those system pre-installed packages, there’s all the necessary ones to run Headless Chrome, ie. running the Chrome browser but without displaying its window basically.

Furthermore, there’s the Puppeteer Node module, which is a library to control Chrome. With those two, you can completely automate the usage of Chrome on the server-side.

What can you do with that? Well, you can:

  • look at / introspect / manipulate the DOM,

  • pre-render content for your single page apps,

  • take screenshots of web pages,

  • watch a particular page and compute diffs between different versions, etc.

Let’s get started!

Without blindly recopying all the steps explained in the tutorial for running Chrome headless, I’ll simply highlight some of key points. The goal is to let puppeteer take screenshots of webpages.

In your package.json, you need the reference the puppeteer module, and potentially express for handling your web requests:

  "dependencies": {
"express": "^4.16.3",
"puppeteer": "^1.2.0"

Taking advantage of Node 8’s async capabilities, in your app.js file, you can instantiate puppeteer:

const browser = await puppeteer.launch({
args: ['--no-sandbox', '--disable-setuid-sandbox']

Then navigate to the desired page:

  const page = await browser.newPage();
await page.goto(url);

Take a screenshot and render it back to the browser:

  const imageBuffer = await page.screenshot();
res.set('Content-Type', 'image/png');

To deploy to the Node runtime, you also need the app.yaml deployment descriptor:

runtime: nodejs8
instance_class: F4_1G

We specify that we want to use the new node runtime, but also that we want a slightly bigger instance to run our Node app, as Chrome is pretty hungry with RAM!

Then deploy your app with the gcloud CLI.

Be sure to check the whole code on Github for all the details.

One quick remark: although it’s not mentioned in the tutorial, when you’ll first try to deploy the application, it’ll tell you that you need to enable the Container Builder API. The error message will be something like “Container Builder has not been used in project xyz before or it is disabled. Enable it by visiting…” You just need to follow the indicated URL to enable Container Builder. Container Builder is responsible for containerizing your application to be run on App Engine.

Then I was able to navigate to my app, pass it a URL, and get back the screenshot of the web page at that URL. It’s pretty handy if you want to integrate thumbnails of websites you reference in your blog posts, for example, or if you want to see if there are differences between different versions of a web page (for integration testing purposes).


The Java ecosystem has a wealth of libraries for various tasks, but often, there are native libraries which are more fully-featured, and Node generally provides nice wrappers for them. Chrome headless with Puppeteer is one example, but ImageMagick for image manipulation is another great one, where I could not find a good equivalent library in the Java ecosystem. So as they say, use the best tool for the job! In the age of microservices, feel free to use another tech stack that best fit the task at hand. And it’s really exciting to see this new Node 8 runtime for App Engine now being available so that you can take advantage of it in your projects.

Best of Web Paris: Chatbots, et si on passait la seconde ?

Avec mon fidèle comparse Wassim Chegham, nous avons eu le plaisir d'animer un nouvel atelier sur le thème des chatbots, à la conférence Best of Web Paris 2018.

Lors de cet atelier, nous avons parlé de l'Assistant Google, comment créer des chatbots pour cette plateforme en utilisant l'outil Dialogflow, pour concevoir son interface conversationnelle.

Pour la partie "mains dans le cambouis", nos participants ont pu s'exercer sur les deux "codelabs" suivants :

Si vous n'avez pas eu l'occasion de participer à l'atelier, vous pouvez très bien suivre ces deux codelabs didactiques de par vous même, sans soucis.

Voici également la présentation que nous avons montré aux participants :

Vision recognition with a Groovy twist

Last week at GR8Conf Europe, I spoke about the machine learning APIs provided by Google Cloud Platform: Vision, Natural Language, Speech recognition and synthesis, etc. Since it’s GR8Conf, that means showing samples and demos using a pretty Groovy language, and I promised to share my code afterwards. So here’s a series of blog posts covering the demos I’ve presented. We’ll start with the Vision API.

The Vision API allows you to:

  • Get labels of what appears in your pictures,
  • Detect faces, with precise location of face features,
  • Tell you if the picture is a particular landmark,
  • Check for inappropriate content,
  • Give you some image attributes information,
  • Find if the picture is already available on the net,
  • Detects brand logos,
  • Or extract text that appears in your images (OCR).

You can try out those features online directly from the Cloud Vision API product page:

Here’s a selfish example output:

In this first installment, I’ll focus on two aspects: labels and OCR. But before diving, I wanted to illustrate some of the use cases that those features enable, when you want to boost your apps by integrating the Vision API.

Label detection

What’s in the picture? That’s what label detection is all about. So for example, if you’re showing a picture from your vacations with your family and dog on the beach, you might be getting labels like “People, Beach, Photograph, Vacation, Fun, Sea, Sky, Sand, Tourism, Summer, Shore, Ocean, Leisure, Coast, Girl, Happiness, Travel, Recreation, Friendship, Family”. Those labels are also accompanied by percentage confidence.

Example use cases:

  • If your website is a photography website, you might want your users to be able to search for particular pictures on specific topics. Rather than having someone manually label each and every picture uploaded, you can store those labels as keywords for your search engine.
  • You’re building the next Instagram for animals, perhaps you want to check that the picture indeed contains a dog in it. Again labels will help to tell you if that’s the case or not.
  • Those labels can help with automatic categorization as well, so you can more easily find pictures along certain themes.

Face detection

Vision API can spot faces in your pictures with great precision, as it gives you detailed information about the location of the face (with a bounding box), as well as the position of each eye, eyebrows, nose, lips / mouth, ear, or chin. It also tells you how your face is tilted, even with which angle!

You can also learn about the sentiment of the person: joy, sorrow, anger, surprise. In addition, you’re told if the face is exposed, blurred, or has some headwear.

Example use cases:

  • Snapchat-style, you want to add a mustache or silly hat to pictures uploaded by your users.
  • Another handy aspect is that you can also count the number of faces in a picture. For example, if you want to count the number of attendees in a meeting or presentation at a conference, you’d have a good estimate of people with the number of faces recognized.

Landmark detection

If the picture is of a famous landmark, like say, the Eiffel Tower, Buckingham Palace, the Statue of Liberty, the Golden Gate, the API will tell you which landmark it recognized, and will even give you the GPS coordinates of the place.

Example use cases:

  • Users of your app or site should only upload pictures of a particular location, you could use those coordinates to automatically check that the place photographed is within the right geo-fenced bounding box.
  • You want to automatically show pictures on a map on your app, you could take advantage of the GPS coordinates, or automatically enrich a tourism websites with pictures from the visitors of that place.

Inappropriate content detection

Vision API will give you a percentage confidence about whether the picture contains adult content, is a spoofed picture (with user added text and marks), bears some medical character, displays violence or racy materials.

Example use case:

  • The main use case here is indeed to be able to filter images automatically, to avoid any bad surprise of user-generated content showing up in your site or app when it shouldn’t.

Image attributes and web annotations

Pictures all have dominant colors, and you can get a sense of which colors are indeed representing the best your image, in which proportion. Vision API gives you a palette of colors corresponding to your picture.

In addition to color information, it also suggests possible crop hints, so you could crop the picture to different aspect ratios.

You get information about whether this picture can be seen elsewhere on the net, with a list of URL with matched images, full matches, or partial matches.

Beyond the label detection, the API identifies “entities”, returning to you IDs of those entities from the Google Knowledge Graph.

Example use cases:

  • To make your app or website responsive, before loading the full images, you’d like to show colored placeholders for the picture. You can get that information with the palette information returned.
  • You’d like to automatically crop pictures to keep the essential aspects of it.
  • Photographers upload their pictures on your website, but you want to ensure that no one steals those pictures and put them on the net without proper attribution. You can find out whether this picture can be seen elsewhere on the web.
  • For the picture of me above, Vision API recognized entities like “Guillaume Laforge” (me!), Groovy (the programming language I’ve been working on since 2003), JavaOne (a conference I’ve often spoken at), “Groovy in Action” (the book I’ve co-authored), “Java Champions” (I’ve recently been nominated!), or “Software Developer” (yes, I still code a bit)

Brand / logo detection

In addition to text recognition, that we’ll mention right below, Vision API tells you if it recognized some logos or brands.

Example use case:

  • If you’re a brand, have some particular products, and you want your brand or products to be displayed on supermarket shelves, you might have people take pictures of those shelves and confirm automatically if your logo is displayed or not.

OCR / text recognition

You can find the text that is displayed on your pictures. Not only does it gives you the raw text, but you also get all the bounding boxes for the recognized words, as well as offers a kind of document format, showing the various blocks of texts that appear.

Example use case:

  • You want to automatically scan expense receipts, enter text rapidly from pictures, etc. The usual use cases for OCR!

Time to get Groovy!

People often wonder where and when they can use the Vision API, I think I’ve given enough use cases for now, with detailed explanations. But it’s time to show some code, right? So as I said, I’ll highlight just two features: label detection and text detection.

Using the Apache Groovy programming language, I wanted to illustrate two approaches: the first one using the a REST client like groovy-wslite, and the second one just using the Java SDK provided for the API.


In order to use the Vision API, you’ll need to have created an account on Google Cloud Platform (GCP for short). You can benefit from the $300 of credits of the free trial, as well as free quotas. For instance, for the Vision API, without even using your credits, you can make 1000 calls for free every month. You can also have a look at the pricing of the API, once you go above the quota or your credits.

Briefly, if you’re not already using GCP or have an account on it, please register and create your account on GCP, then create a cloud project. Once the project is created, enable access to Vision API, and you’re ready to follow the steps detailed hereafter.

Although we’ve enabled the API, we still need somehow to be authenticated to use that service. There are different possibilities here. In the first sample with the OCR example, I’m calling the REST API and will be using an API key passed as query parameter, whereas for my label detection sample, I’m using a service account with applicaction default credentials. You can learn more about those approaches in the documentation on authentication.

Okay, let’s get started!

OCR calling the Vision REST endpoint

With spring and summer, allergetic people might be interested in which pollens are going to cause their allergies to bother them. Where I live, in France, there’s a website showing a map of the country, and you can hover the region where you are, and see a picture of the allergens and their levels. However, this is really just a picture with the names of said allergens. So I decided to get their list with the Vision API.

In Apache Groovy, when calling REST APIs, I often use the groovy-wslite library. But there are other similar great libraries like HTTPBuilder-NG, which offer similar capabilities with nice syntax too.

Let’s grab the REST client library and instantiate it:

def client = new RESTClient('')

Here’s the URL of the picture whose text I’m interested in:

def imgUrl = ""

Then I’m going to send a post request to the /images:annotate path with the API key as query parameter. My request is in JSON, and I’m using Groovy’s nice list & maps syntax to represent that JSON request, providing the image URL and the feature I’m interested in (ie. text detection):

def response = '/images:annotate', query: [key: API_KEY]) { 
    type ContentType.JSON 
    json "requests": [ 
            "image": [ 
                "source": [ 
                    "imageUri": imgUrl 
            "features": [ 
                    "type": "TEXT_DETECTION" 

This corresponds to the following JSON payload:

    "requests": [ 
            "image": { 
                "source": { 
                    "imageUri": imgUrl 
            "features": [ 
                    "type": "TEXT_DETECTION" 

Thanks to Groovy’s flexible nature, it’s then easy to go through the returned JSON payload to get the list and println all the text annotations and their description (which corresponds to the recognized text):

println response.json.responses[0].textAnnotations.description

Label detection with the Java client library

For my second sample, as I visited Copenhagen for GR8Conf Europe, I decided to see what labels the API would return for a typical picture of the lovely colorful facades of the Nyhavn harbor.

Let’s grab the Java client library for the vision API:


Here’s the URL of the picture:

def imgUrl = "" .toURL()

Let’s instantiate the ImageAnnotatorClient class. It’s a closeable object, so we can use Groovy’s withCloseable{} method:

ImageAnnotatorClient.create().withCloseable { vision ->

We need the bytes of the picture, that we obtain with the .bytes shortcut, and we create a ByteString object from the protobuf library used by the Vision API:

def imgBytes = ByteString.copyFrom(imgUrl.bytes)

Now it’s time to create our request, with the AnnotateImageRequest builder, using Groovy’s tap{} method to make the use of the builder pattern easier:

def request = AnnotateImageRequest.newBuilder().tap { 
    addFeatures Feature.newBuilder().setType(Feature.Type.LABEL_DETECTION).build() 
    setImage    Image.newBuilder().setContent(imgBytes).build() 

In that request, we ask for the label detection feature, and pass the image bytes. Then it’s time to call the API with our request, and iterate over the resulting label annotations and their confidence score:

      .responsesList.each { res -> 
    if (res.hasError()) 
        println "Error: ${res.error.message}" 

    res.labelAnnotationsList.each { annotation -> 
        println "${annotation.description.padLeft(20)} (${annotation.score})" 

The labels found (and their confidence) are the following:

            waterway (0.97506875) 
water transportation (0.9240114)
town (0.9142202)
canal (0.8753313)
city (0.86910504)
water (0.82833123)
harbor (0.82821053)
channel (0.73568773)
sky (0.73083687)
building (0.6117833)

Pretty good job!


First of all, there are tons of situations where you can benefit from image recognition thanks to Google Cloud Vision. Secondly, it can get even groovier when using the Apache Groovy language!

In the upcoming installment, we’ll speak about natural language understanding, text translation, speech recognition and voice synthesis. So stay tuned!

Machine Learning APIs and AI panel discussion at QCon

Last March, I had the chance to attend and speak at QCon London. I spoke at the event for its first edition, many moons prior, so it was fun coming back and seeing how the conference evolved. 
This year, Eric Horesnyi of Streamdata was leading the Artificial Intelligence track, and invited me to speak about Machine Learning.

First, I gave an overview of the Machine Learning offering, from the off-the-shelf ready-made APIs like Vision, Speech, Natural Language, Video Intelligence. I also mentioned AutoML, to further train existing models like the Vision model in order to recognize your own specific details in pictures. For chatbots, I also covered Dialogflow. And I said a few words about Tensorflow and Cloud Machine Learning Engine for training & running your Tensorflow models in the cloud. You can watch the video by clicking on the picture below:

Eric hosted a panel discussion with all the speakers in the AI track, where we discussed many interesting topics, to demystify AI and answer questions from the audience. Click on the picture below to watch the panel discussion:

Getting started with G* tech on Google Cloud Platform, at GR8Conf Europe

Back to GR8Conf Europe in Denmark, for the yearly Groovy community reunion! I had the chance to present two talks. 

The first one on Google's Machine Learning APIs, with samples in Groovy using vision recognition, speech recognition & generation, natural language analysis. I'll come back on ML in Groovy in forthcoming articles. 

And the second talk was an overview of Google Cloud Platform, focusing on the compute and storage options, with demos using Groovy frameworks (Ratpack, Gaelyk, and the newly released Micronaut) and how to deploy apps on Compute Engine, Kubernetes Engine, App Engine. I'll also come back in further articles on those demos, but in the meantime, I wanted to share my slide deck with you all! Without further ado, here's what I presented:

© 2012 Guillaume Laforge | The views and opinions expressed here are mine and don't reflect the ones from my employer.