Covid learning: Machine Learning applied to music generation with Magenta

I missed this talk from Alexandre Dubreuil when attending Devoxx Belgium 2019, but I had the chance to watch it while doing my elliptical bike run, confined at home. It's about applying Machine Learning to music generation, thanks to the Magenta project, which is based on TensorFlow.

I like playing music (a bit of piano & guitar) once in a while, so as a geek, I've also always been interested in computer-generated music. And it's hard to generate music that actually sounds pleasant to the ear! Alexandre explains that it's hard to encode the rules a computer could follow to play music, but that machine learning is pretty interesting here, as it's able to learn complex functions, and thus to understand what sounds good.

He then covers the various types of music representations, like MIDI scores, which are quite light in terms of data, and audio waves, which are on the heavy end, as there are thousands of data points representing the position on the wave along the time axis. While MIDI represents notes of music, audio waves really represent the sound physically, as a wave (of data points).

Note that in the following part of the article, I'm not an ML / AI expert, so I'm just trying to explain what I actually understood :-)

For MIDI, Recurrent Neural Networks (RNN) make sense, as they work on sequences for both input and output, and also have the ability to remember past information. And that's great, as you find recurring patterns in music (series of chords, main song lines, etc.)

RNNs tend to progressively forget those past events, so these networks often use Long Short-Term Memory (LSTM) cells to keep some of their memory fresh.
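To give a rough idea of what "working on sequences while remembering past information" means, here's a toy sketch of my own (plain NumPy, not Magenta code): a single vanilla RNN cell processes a melody one note at a time, carrying a hidden state that accumulates context from the previous notes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy vocabulary: 4 MIDI-like pitches, one-hot encoded.
vocab_size, hidden_size = 4, 8

# Randomly initialised weights for a single vanilla RNN cell.
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x, h):
    """One time step: combine the current note x with the previous hidden state h."""
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

# A short "melody": indices into the pitch vocabulary.
melody = [0, 2, 1, 2]
h = np.zeros(hidden_size)
for pitch in melody:
    x = np.eye(vocab_size)[pitch]   # one-hot encode the note
    h = rnn_step(x, h)              # the hidden state accumulates context

print(h.shape)  # the final state summarises the whole sequence: (8,)
```

A real model would add a trained output layer to predict the next note from `h`; an LSTM replaces `rnn_step` with a gated cell so that context survives over longer sequences.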

Variational Auto-Encoders (VAE) are a pair of networks: one reduces the dimensions of the input down to a compact representation, and the other re-expands it back to the original size. So VAEs try to generate back something that's close to what was initially given as input, and in doing so they learn to reproduce similar patterns.

For audio waves, Magenta comes with a Convolutional Neural Network (CNN) called WaveNet, which is used for example for voice generation on devices like Google Home. There are also WaveNet Auto-Encoders that generate audio waves: they can learn to generate the actual sound of instruments, create totally new instruments, or mix sounds. Alexandre shows some cool demos of weird instruments made of cat sounds and musical instruments.

Magenta comes with various RNNs for drums, melody, polyphony, and performance, with auto-encoders for WaveNet and MIDI too. There's also a Generative Adversarial Network (GAN) for audio waves. GANs are often used for generating things like pictures, for example.

The demos in this presentation are quite cool, whether creating new instruments (cat + musical instrument) or generating sequences of notes (drum scores, melody scores).

Alexandre ends the presentation with pointers to things like music data sets, as neural networks need plenty of time to learn style and performance from existing music and instrument sounds, so as to create something nice to hear! He also briefly shows some other cool demos using TensorFlow.js, which runs in the browser, so you can experiment more easily with music generation.

Also, Alexandre wrote the book "Hands-On Music Generation with Magenta", so if you want to dive deeper, there's much to read and experiment with!

Covid learning: HTML semantic tags

We all know about HTML 5, right? Well, I knew about some of the new semantic tags, like header / nav / main / article / aside / footer, but I still fall back to using tons of divs and spans instead. So as I want to refresh this blog at some point, it was time I revised those semantic tags. Let's take the little time we have during confinement to learn something!

There are likely plenty of videos on the topic, but this one was in my top results, so I watched HTML & CSS Crash Course Tutorial #6 - HTML 5 Semantics. It's part of a series of videos on HTML & CSS by the Net Ninja, and this particular episode covers the semantic tags.


So you have a main tag that wraps the meaty content of your page (ie. not stuff like header / footer / navigation). Inside, you would put articles, which wrap each piece of content (a blog post, a news article, etc). Sections tend to be for grouping other information, like a list of resources or some contact info. Asides can be related content, like similar articles, or something somewhat related to your current article (perhaps a short bio of a character you're mentioning?) In the header, you'd put the title of your site and the navigation. The footer will contain your contact info.

Here's a basic structure of how those tags are organised:
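A minimal skeleton could look like this (my own illustrative sketch, not the exact markup from the video):

```html
<body>
  <header>
    <h1>Site title</h1>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/about">About</a></li>
      </ul>
    </nav>
  </header>
  <main>
    <article>
      <h2>A blog post</h2>
      <section>
        <!-- a group of related content inside the article -->
      </section>
    </article>
    <aside>
      <!-- similar articles, a short bio, other related content -->
    </aside>
  </main>
  <footer>
    <!-- contact info -->
  </footer>
</body>
```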

After explaining those tags, the author does a live demo, building up a web page with all of them. So it was a good refresher for me on how to use those tags, rather than nesting div after div!

Covid learning: Modern Web Game Development

Next in my series of videos while doing sports at home, I watched this video from my colleague Tom Greenaway! It's about modern web game development, and was recorded last year at Google I/O.

There are big gaming platforms, like Sony's PlayStation, Microsoft's Xbox, or the Nintendo Switch, as well as plenty of mobile games on Android and iOS. But the Web itself, within your browser, is also a great platform for developing and publishing games! It offers all that's needed for good games!

Tom explains that you need a functioning game (one that runs well on device, looks good, sounds good). And today, most of the game engines you can use for developing games actually provide an HTML5 target. You also need users, and a good monetisation strategy. The web already provides all the right APIs for nice graphics, sound mixing, etc., and it's a very open platform for spreading virally.

It was pretty interesting to hear about one of the key advantages of the web: its URLs! You can be pretty creative with URLs. A game can create a URL for a given game session, for a particular state in a game, or for inviting others to join.

In addition to game engines with a web target, Tom also mentions that it's possible to port games from C/C++, for example, to JavaScript in the browser, with a tool like Emscripten. Even things like OpenGL 3D rendering can be translated into WebGL. But he also advises looking at WebAssembly, as it's really become the new approach to native performance in the browser. He mentions Construct, basically the Box2D game engine, but optimised for WebAssembly.

For 3D graphics on the web, the future lies in WebGPU, which is a more modern take on WebGL and OpenGL. For audio, there are the Web Audio APIs and worklets, which even allow you to create effects in JavaScript or WebAssembly. And there are other useful APIs for game development, like the Gamepad API, the Gyroscope API, etc.

For getting users, ensure that your game is fun of course, but also make it fast, in particular make it load fast, to avoid losing users even before you've actually got them into the game! You also need to think about the user acquisition loop: make the game load and start fast so players enter the action right away and are really pulled into the game, which then gives them a good reason to share this cool new game with others. Of course, being featured on game sites & libraries helps, as it gives a big boost, but it's not necessarily what will make you earn the most in the long run. Tom also shares various examples of games that were successful and worked well.

Covid learning: decoding a QR code by hand!

Doing sport at home on a treadmill or an elliptical bike is pretty boring when you're confined, so to make things more interesting, I'm watching some videos to learn something new while exercising. This time, I found this old video about how to decode QR codes... by hand! Have you ever wondered how these are encoded?

This video comes from the Robomatics YouTube channel. You easily recognise QR codes thanks to the 3 big squares with the inner white lines. I knew about the dotted lines (drawn in purple in the video) that are fixed in those patterns. What I didn't know, however, was that there's a mask that is applied to the data, to avoid QR codes potentially looking all white or all black. It was also interesting to see how the bytes are encoded: how they follow a path throughout the matrix.
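To make the masking idea concrete, here's a small sketch of my own (not from the video) using one of the mask patterns defined in the QR spec, which flips every data module where (row + column) is even. Since it's an XOR, applying the same mask again undoes it, which is exactly what a decoder does:

```python
def apply_mask0(matrix):
    """Apply QR mask pattern 0: flip each data module where (row + col) % 2 == 0.
    The same XOR is used for masking and unmasking."""
    return [
        [bit ^ 1 if (r + c) % 2 == 0 else bit for c, bit in enumerate(row)]
        for r, row in enumerate(matrix)
    ]

# An all-white 3x3 block of data modules (0 = white, 1 = black).
data = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
masked = apply_mask0(data)
print(masked)  # a checkerboard: [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

# Applying the mask a second time restores the original data.
assert apply_mask0(masked) == data
```

Notice how a boring all-white block becomes a balanced checkerboard, which is precisely why masks exist.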

However, what this video doesn't cover, for example, is how error correction works. You might have some holes or a bad picture of a QR code, but still be able to decode it despite some level of data loss. So I'll have to learn how that part works some day!

Covid learning: Defence Against the Docker Arts by Joe Kutner

Confined at home because of the coronavirus pandemic, I'm also doing sport at home. I have a small treadmill for light walks (mostly during conf calls!) and also an elliptical bike. I'd much rather run outside, but I have to use what I have, even if I hate that stationary elliptical bike in my basement. It's so boring! So to avoid feeling like I'm wasting my time, I decided to watch videos during my sessions! Not necessarily series on Netflix. No! Interesting technical videos. So today, I'd like to share with you a series of posts on those interesting videos I'm watching while exercising.

Today, thanks to the wonderful Joe Kutner, from Heroku, I learned about the Defence Against the Docker Arts! It was recorded at Devoxx Belgium 2019.

Joe starts by clearly differentiating Docker and Dockerfiles: Docker is an ecosystem, while Dockerfiles describe Docker container images. An important distinction. The first part of the video shows best practices on how to write proper Dockerfiles, and references an article from last year on the Docker blog on that topic:
  • use official base images, rather than reinventing the wheel, as they are usually more up-to-date and secure
  • remember that images are built in layers, so to speed up your builds, ensure that the base layers are the ones that change less, and keep your source file changes in the last layer as they change the most
  • join several RUN commands into one by chaining them with && (ampersands)
  • be explicit about the version of base images you use
  • try to choose minimal flavours of base images, as they can otherwise be pretty big
  • build from source in a consistent environment, so that developers are on the same page, with the same version of their environment (build tool, runtime versions)
  • fetch dependencies in a separate step, so dependencies are cached in their own layer
  • use multi-stage builds to remove build dependencies not needed at runtime
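Several of those practices can be combined in a single Dockerfile; here's an illustrative sketch for a Java / Maven app (the image tags, paths and jar name are my own assumptions, not from the talk):

```dockerfile
# Build stage: a pinned, official base image gives a consistent build environment.
FROM maven:3.8-eclipse-temurin-17 AS build
WORKDIR /app

# Fetch dependencies in a separate step, so they're cached in their own layer.
COPY pom.xml .
RUN mvn dependency:go-offline

# Source files change the most, so they come in the last layers.
COPY src ./src
RUN mvn package -DskipTests

# Runtime stage: a minimal base image, without any of the build dependencies.
FROM eclipse-temurin:17-jre-alpine
COPY --from=build /app/target/app.jar /app.jar

# Several commands joined with && end up in a single layer.
RUN addgroup -S app && adduser -S app -G app
USER app
CMD ["java", "-jar", "/app.jar"]
```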
That's a lot of things to know! Joe then moves on to higher-level approaches, starting with the BuildKit Go library. It's more for platform builders than application developers, but it gives you lots of advanced control over how Docker images are built.

Joe then introduces Jib, a build plugin (for both Maven and Gradle) that lets developers focus on writing and building their apps, while the plugin creates properly layered Docker images for them, using minimal base images. You can even build without a local Docker daemon.
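To give an idea of how little configuration that takes, here's a hypothetical Gradle setup (the plugin version, base image and target image name are my own placeholders):

```groovy
plugins {
    // The Jib Gradle plugin; pick the latest published version.
    id 'com.google.cloud.tools.jib' version '3.4.0'
}

jib {
    from.image = 'eclipse-temurin:17-jre'   // minimal base image
    to.image = 'gcr.io/my-project/my-app'   // where to push the built image
}
```

Running `./gradlew jib` then builds and pushes the image without a local Docker daemon, while `./gradlew jibDockerBuild` targets the local daemon instead.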

After BuildKit and Jib, Joe talks about the new Cloud Native Buildpacks, a tool that cleverly builds OCI images from source. There are buildpacks for plenty of platforms and runtimes, not just Java. These new cloud native buildpacks build upon years of experience from Heroku and Cloud Foundry with the first version of the buildpack concept. Joe says that buildpacks are reusable, fast, modular and safe, and goes on to show the power of this approach, which allowed Heroku, for instance, to safely and rapidly upgrade Heartbleed-affected images by replacing the underlying OS with a patched / upgraded version, thanks to image rebasing.
© 2012 Guillaume Laforge | The views and opinions expressed here are mine and don't reflect the ones from my employer.