Day #12 with Cloud Workflows: loops and iterations

In previous episodes of this Cloud Workflows series, we’ve learned about variable assignment, data structures like arrays, jumps and switch conditions to move between steps, and expressions to do some computations, including potentially some built-in functions. 


With all these previous learnings, we are now equipped with all the tools to create loops and iterations, for example to iterate over the elements of an array, perhaps to call an API several times but with different arguments. So let’s see how to create such an iteration!




First of all, let’s prepare some variable assignments:


- define:
    assign:
        - array: ['Google', 'Cloud', 'Workflows']
        - result: ""
        - i: 0


  • The array variable will hold the values we’ll be iterating over.

  • The result variable contains a string to which we’ll append each value from the array.

  • And the i variable is an index, to know our position in the array.


Next, like in a for loop in other programming languages, we need to prepare a condition for the loop to finish. We’ll do that in a dedicated step:


- checkCondition:
    switch:
        - condition: ${i < len(array)}
          next: iterate
    next: returnResult


We define a switch, with a condition expression that compares the current index position with the length of the array, using the built-in len() function. If the condition is true, we’ll go to an iterate step. If it’s false, we’ll go to the ending step (called returnResult here).


Let’s tackle the iteration body itself. Here, it’s quite simple, as we’re just assigning new values to the variables: we append the i-th element of the array to the result variable, and we increment the index by one. Then we go back to the checkCondition step.


- iterate:
    assign:
        - result: ${result + array[i] + " "}
        - i: ${i+1}
    next: checkCondition


Note that if we were doing something more convoluted, for example calling an HTTP endpoint with an element of the array as argument, we would need two steps: one for the actual HTTP endpoint call, and one for incrementing the index value. However, in the example above, we’re only assigning variables, so we did the whole body of the iteration in this single assignment step.
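

For illustration, here’s a minimal sketch of what such a two-step loop body could look like, assuming a hypothetical processItem endpoint that takes the array element as a query parameter (the URL and parameter name are made up for this example):


- callEndpoint:
    call: http.get
    args:
        url: https://us-central1-my-project.cloudfunctions.net/processItem  # hypothetical endpoint
        query:
            item: ${array[i]}  # pass the i-th element as the argument
    result: callResult
- incrementIndex:
    assign:
        - i: ${i+1}
    next: checkCondition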


When going through the checkCondition step, if the condition is not met (i.e. we’ve reached the end of the array), we’re redirected to the returnResult step:


- returnResult:
    return: ${result}


This final step simply returns the value of the result variable.


Day #11 with Cloud Workflows: sleeping in a workflow

Workflows are not necessarily instantaneous, and executions can span a long period of time. Some steps may launch asynchronous operations, which might take seconds or minutes to finish, but you are not notified when the process is over. So when you need to wait for something to finish, for example before polling again to check the status of the async operation, you can introduce a sleep operation in your workflow.



To introduce a sleep operation, add a step in the workflow with a call to the built-in sleep operation:


- someSleep:
    call: sys.sleep
    args:
        seconds: 10
- returnOutput:
    return: We waited for 10 seconds!


A sleep operation takes a seconds argument, where you can specify the number of seconds to wait.


By combining conditional jumps and sleep operations, you can easily poll a resource or an API at a regular interval, to check whether an operation has completed.
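

As an illustration, here’s a minimal polling sketch, assuming a hypothetical status endpoint whose JSON response contains a boolean done field (both the URL and the field name are made up for this example):


- checkStatus:
    call: http.get
    args:
        url: https://example.com/operation/status  # hypothetical status endpoint
    result: statusResult
- isDone:
    switch:
        - condition: ${statusResult.body.done}  # assumed boolean field in the response
          next: operationFinished
- wait:
    call: sys.sleep
    args:
        seconds: 10
    next: checkStatus
- operationFinished:
    return: The operation completed!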


Day #10 with Cloud Workflows: accessing built-in environment variables

Google Cloud Workflows offers a few built-in environment variables that are accessible from your workflow executions.




There are currently 5 environment variables that are defined:

  • GOOGLE_CLOUD_PROJECT_NUMBER: The workflow project's number.

  • GOOGLE_CLOUD_PROJECT_ID: The workflow project's identifier.

  • GOOGLE_CLOUD_LOCATION: The workflow's location.

  • GOOGLE_CLOUD_WORKFLOW_ID: The workflow's identifier.

  • GOOGLE_CLOUD_WORKFLOW_REVISION_ID: The workflow's revision identifier.


Let’s see how to access them from our workflow definition:


- envVars:
    assign:
      - projectID: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
      - projectNum: ${sys.get_env("GOOGLE_CLOUD_PROJECT_NUMBER")}
      - projectLocation: ${sys.get_env("GOOGLE_CLOUD_LOCATION")}
      - workflowID: ${sys.get_env("GOOGLE_CLOUD_WORKFLOW_ID")}
      - workflowRev: ${sys.get_env("GOOGLE_CLOUD_WORKFLOW_REVISION_ID")}
- output:
    return: ${projectID + " " + projectNum + " " + projectLocation + " " + workflowID + " " + workflowRev}


We use the built-in sys.get_env() function to access those variables. We’ll revisit the various existing built-in functions in later episodes.


Then when you execute this workflow, you’ll get an output like this:


"workflows-days 783331365595 europe-west4 w10-builtin-env-vars 000001-3af"


There’s one variable I’d like to see added to this list: the current execution ID. It could be useful for identifying a particular execution when looking through the logs, to reason about potential failures, or for auditing purposes.


Day #9 with Cloud Workflows: deploying and executing workflows from the command-line

So far, in this series on Cloud Workflows, we’ve only used the Google Cloud Console UI to manage our workflow definitions and their executions. But it’s also possible to deploy new definitions and update existing ones from the command-line, using the gcloud tool from the Cloud SDK. Let’s see how to do that!



If you don’t already have an existing service account, you should create one following these instructions. I’m going to use the workflow-sa service account I created for the purpose of this demonstration.                                                                                                                     
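

If you prefer the command line, a service account can also be created with gcloud; here’s a minimal sketch (the account name is just an example, and you may still need to grant it the appropriate roles):


$ gcloud iam service-accounts create workflow-sa \
    --display-name="Service account for Cloud Workflows"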


Our workflow definition is a simple “hello world” like the one we created for day #1 of our exploration of Google Cloud Workflows:


- hello:
    return: Hello from gcloud!


To deploy this workflow definition, we’ll launch the following gcloud command, specifying the name of our workflow, passing the local source definition, and the service account:


$ gcloud beta workflows deploy w09-new-workflow-from-cli \
    --source=w09-hello-from-gcloud.yaml \
    --service-account=workflow-sa@workflows-days.iam.gserviceaccount.com


You can also add labels with the --labels flag, and a description with the --description flag, just like in the Google Cloud Console UI. 
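

For instance, a deployment with those flags could look like this (the label and description values are just examples):


$ gcloud beta workflows deploy w09-new-workflow-from-cli \
    --source=w09-hello-from-gcloud.yaml \
    --service-account=workflow-sa@workflows-days.iam.gserviceaccount.com \
    --labels=env=demo \
    --description="Hello world workflow deployed from the CLI"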


If you want to update the workflow definition, you invoke the same command, passing the new version of your definition file.


Time to create an execution of our workflow!


$ gcloud beta workflows run w09-new-workflow-from-cli


You will see an output similar to this:


Waiting for execution [d4a3f4d4-db45-48dc-9c02-d25a05b0e0ed] to complete...done.
argument: 'null'
endTime: '2020-12-16T11:32:25.663937037Z'
name: projects/783331365595/locations/us-central1/workflows/w09-new-workflow-from-cli/executions/d4a3f4d4-db45-48dc-9c02-d25a05b0e0ed
result: '"Hello from gcloud!"'
startTime: '2020-12-16T11:32:25.526194298Z'
state: SUCCEEDED
workflowRevisionId: 000001-47f

Our workflow being very simple, it executed and completed right away, which is why you see the result string (our Hello from gcloud! message), as well as the state SUCCEEDED. However, workflows consisting of many steps often take longer to execute. If a workflow hasn’t yet completed, you’ll see its state as ACTIVE instead, or potentially FAILED if something went wrong.


When the workflow takes a long time to complete, you can check the status of the last execution from your shell session with:


$ gcloud beta workflows executions describe-last


If you want to know about the ongoing workflow executions:


$ gcloud beta workflows executions list your-workflow-name


It’ll give you a list of operation IDs for those ongoing executions. You can then inspect a particular one with:


$ gcloud beta workflows executions describe the-operation-id


There are other operations on executions, to wait for an execution to finish, or even cancel an ongoing, unfinished execution. 
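

For example, assuming an execution ID obtained from the list command above, waiting for that execution to finish or cancelling it could look like this (the ID is a placeholder):


$ gcloud beta workflows executions wait the-operation-id \
    --workflow=w09-new-workflow-from-cli

$ gcloud beta workflows executions cancel the-operation-id \
    --workflow=w09-new-workflow-from-cli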


You can learn more about workflow execution in the documentation. And in some upcoming episodes, we’ll also have a look at how to create workflow executions from client libraries, and from the Cloud Workflows REST API.


Day #8 with Cloud Workflows: calling an HTTP endpoint

Time to do something pretty handy: calling an HTTP endpoint from your Google Cloud Workflows definitions. Whether you’re calling GCP-specific APIs such as the ML APIs, REST APIs of other products like Cloud Firestore, your own services, or third-party external APIs, this capability lets you plug your business processes into the external world!


Let’s see calling HTTP endpoints in action in the following video, before diving into the details below:




By default, when creating a new workflow definition, a default snippet / example is provided for your inspiration. We’ll take a look at it for this article. There are actually two HTTP endpoint calls, the latter depending on the former: the first step (getCurrentTime) calls a cloud function returning the day of the week, whereas the second step (readWikipedia) searches Wikipedia for articles about that day of the week.


- getCurrentTime:
    call: http.get
    args:
        url: https://us-central1-workflowsample.cloudfunctions.net/datetime
    result: CurrentDateTime


The getCurrentTime step contains a call attribute with the value http.get, to make HTTP GET requests to an API endpoint. You can use either call: http.get or call: http.post. For other methods, you’ll have to use call: http.request, and add another key/value pair under args, with method: GET, POST, PATCH or DELETE. Under args, for now, we just put the URL of our HTTP endpoint. The last key is result, which gives the name of a new variable that will contain the response of our HTTP request.
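

For example, a call using http.request with an explicit method could look like this (the URL is made up for this example):


- deleteItem:
    call: http.request
    args:
        url: https://example.com/api/items/42  # hypothetical endpoint
        method: DELETE
    result: deleteResult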


Let’s call Wikipedia with our day of the week search query:


- readWikipedia:
    call: http.get
    args:
        url: https://en.wikipedia.org/w/api.php
        query:
            action: opensearch
            search: ${CurrentDateTime.body.dayOfTheWeek}
    result: WikiResult


Same thing here with call and args.url; however, we also have a query key under which you can define the query parameters for the Wikipedia API. Also note how we pass data from the previous step’s invocation: CurrentDateTime.body.dayOfTheWeek. We retrieve the body of the response of the previous call, and from there, we get the dayOfTheWeek key in the resulting JSON document. The response of this new API endpoint call is stored in the WikiResult variable.


- returnOutput:
    return: ${WikiResult.body[1]}


Then, the last step returns the result of our search. We retrieve the body of the response. The response’s body is an array whose first item is the search query, and whose second item is the array of document names below, which is what our workflow execution will return:


[
  "Monday",
  "Monday Night Football",
  "Monday Night Wars",
  "Monday Night Countdown",
  "Monday Morning (newsletter)",
  "Monday Night Golf",
  "Monday Mornings",
  "Monday (The X-Files)",
  "Monday's Child",
  "Monday.com"
]


So our whole workflow was able to orchestrate two independent API endpoints, one after the other. Instead of having two APIs that are coupled via some message passing mechanism, or worse, via explicit calls to one another, Cloud Workflows is here to organize those two calls. It’s the orchestration approach, instead of a choreography of services (see my previous article on orchestration vs choreography, and my colleague’s article on better service orchestration with Cloud Workflows).


To come back to the details of API endpoint calls, here’s their structure:


- STEP_NAME:
    call: {http.get|http.post|http.request}
    args:
        url: URL_VALUE
        [method: REQUEST_METHOD]
        [headers:
            KEY:VALUE ...]
        [body:
            KEY:VALUE ...]
        [query:
            KEY:VALUE ...]
        [auth:
            type:{OIDC|OAuth2}]
        [timeout: VALUE_IN_SECONDS]
    [result: RESPONSE_VALUE]


In addition to the URL, the method, and the query, note that you can pass headers and a body. There is also a built-in mechanism for authentication, which works with GCP APIs: the authentication is done transparently. You can also specify a timeout in seconds, if you want to fail fast and not wait forever for a response that never comes. But we’ll come back to error handling in upcoming articles.
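

As an illustration, here’s a minimal sketch of a call combining some of these optional fields, assuming a hypothetical private Cloud Function that requires authentication (the URL and body are made up for this example):


- callPrivateEndpoint:
    call: http.post
    args:
        url: https://us-central1-my-project.cloudfunctions.net/my-private-function  # hypothetical URL
        auth:
            type: OIDC  # an identity token is attached to the request transparently
        body:
            message: Hello from Workflows
        timeout: 30  # fail if no response is received after 30 seconds
    result: privateCallResult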


 