Dua Lipa, Colour palettes and Machine Learning
5 minute read · 08 Jun 2020
I’m not sure how large the intersection of “Dua Lipa fan” and “Data Scientist” is, but we’re about to make it bigger.
For those living under a pop culture rock (like me), Dua Lipa is a pop artist who has seen a meteoric rise in prominence in the last few years. When I was first introduced to her, I wasn’t a huge fan of the first couple of tracks that charted on the radio. Her sound was slightly more refreshing than the predictable drone of pop music at the time, but ultimately I considered it rather “safe”.
Fast forward to 2020, one album later, and my goodness is her new album something special. As a fan of disco and Daft Punk, I think Dua Lipa has managed to perfectly balance ear-worm pop and the resurgence of 80s nostalgia. It’s great.
Have a listen:
Notice anything interesting? The colours!
Besides sounding great, the video blew me away with its colour palette. The use of bold reds, blues and purples in conjunction with their complementary colours (no doubt meant to evoke nostalgic memories of the neon-soaked 80s) just looks fantastic.
This gave me an idea – could I “map” the dominant colours of the “Break My Heart” music video to some kind of timeline?
With a little bit of transformation and machine learning, it turns out you can. It happens to produce some striking results:
From the colour stream above, you can even identify the various scenes in the video. Here are a few more interesting examples (chosen for both their visual – and audible – qualities):
half•alive - still feel
This is a particularly great example, as “still feel” is almost perfectly colour-coordinated scene-to-scene. I’m a particular sucker for the (thanks, Bronwyn) in the scene about halfway through.
And my personal favourite:
Gunship - Fly For Your Life
I particularly like the “Fly For Your Life” colour stream. There’s a really strong message told through the visuals, and a large portion of that is communicated through colour. If you squint slightly you can even imagine the underlying message embedded in the video’s colour-scape alone. It’s a wonderful piece of art, and I highly recommend you give it a watch.
Hopefully I’ve done enough to grab your attention. If you’re curious how I extract the colours from these videos, and how a little sprinkle of ML does the job, read on! Don’t worry if you’re not an ML expert; we’ll be keeping things accessible.
So how does this all work?
At a high level, this technique works as follows:
- Split the video into a sequence of images.
- Extract the dominant colour from each image.
- Append each dominant colour together to create a colour-sequence representing the video.
Step 1 is conceptually quite easy to understand, so I’m not going to cover it deeply here.
For those interested in the technical details: I used
youtube-dl to download
the video, and then used
ffmpeg with the following command to split the video into images:
ffmpeg -i input.mp4 -crf 0 -vf fps=15 out_%05d.jpg
The interesting bit, and where I want to spend most of my time, is step 2. This is the bit where we sprinkle in some ML to extract the dominant colours. But first, some brief colour theory.
Generally, a digital image is encoded using the RGB colour model. Essentially, this means each pixel is represented by an additive blend of different amounts of red, green and blue:
This allows us to represent a fairly large spectrum of colours. From a data-perspective, however, we can also choose to see each pixel as a datapoint that has three dimensions or “features”.
To illustrate this, consider the following screen capture from Dua Lipa’s music video:
If we take each pixel in this image and treat it like a three-dimensional data point (where each dimension represents the amount of red, green and blue), we can create a plot that shows “where” each pixel exists in three-dimensional space:
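In code, this “image to data points” transformation is just a reshape. Here’s a minimal sketch using NumPy, with a made-up 4×4 “image” standing in for a real video frame:

```python
import numpy as np

# A hypothetical 4x4 "image": height x width x 3 (R, G, B) channels.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Flatten the spatial dimensions so each row is one pixel's (R, G, B)
# triple -- i.e. one three-dimensional data point.
pixels = image.reshape(-1, 3)

print(pixels.shape)  # (16, 3) -- 16 pixels, 3 features each
```

The same reshape works on any frame, whatever its resolution: a `(height, width, 3)` array becomes a `(height * width, 3)` list of data points.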
While conceptually simple, notice how similar colours sit physically “close” to each other? That’s important when it comes to “clustering” similar colours together.
In machine learning, clustering is the task of grouping similar data points together into “clusters”, based on their similarity. There is a plethora of clustering algorithms out there (it’s an entire field). We’ll be using by far the most commonly encountered one: K-means.
I’m going to skip over the technical details of exactly how the K-means algorithm works, since it’s been done a million times over by people smarter than myself. The important thing to understand is that K-means will try its best to sort \(n\) data points into \(k\) clusters. In other words, given our data, we ask the algorithm to group the data points into \(k\) “clusters”.
As an example, let’s again look at the pixels we looked at earlier. (To make things easier to understand, I’ve just projected the pixels down to a 2D plane):
If we feed these data points into K-Means, and ask it to find \(k=5\) clusters, we get the following result:
Notice how the cluster centers or centroids are located within the center of the naturally-occurring groups of colours? If we take a look at the pixel colours again, along with the centroids, we see that each “center” falls remarkably close to the dominant colours within the image:
You can see a centroid near:
- The whites / greys, from Dua’s skirt
- Dark blues, from the darker portions of the background wall
- Lighter blues, from the lighter portions of the background wall and cityscape
- Reds, from the shelf
- Yellows / purples, from the cushion and Dua’s skin and hair.
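The clustering step itself is only a few lines with scikit-learn. Here’s a minimal sketch; the three synthetic colour “blobs” below are stand-ins for real frame pixels (with a real frame, you’d use `pixels = frame.reshape(-1, 3)` instead):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def blob(centre, n=200):
    """n pixels normally scattered around an (R, G, B) centre."""
    return rng.normal(centre, 5.0, size=(n, 3))

# Synthetic stand-in for a frame's pixels: three tight colour groups.
pixels = np.vstack([
    blob([10, 30, 60]),     # dark blue
    blob([200, 40, 40]),    # red
    blob([230, 230, 230]),  # white / grey
])

# Ask K-means for k=3 clusters; each centroid should land near the
# centre of one of the colour groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
print(kmeans.cluster_centers_.round())
```

`kmeans.cluster_centers_` holds one (R, G, B) triple per cluster, and `kmeans.labels_` records which cluster each pixel was assigned to.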
If we retrieve the values of the closest pixel to each centroid, we essentially extract the dominant colours of the image.
It’s useful to stop here if you only wish to extract the colour palette from a still image, but we’re after the most dominant colour in each frame of the video. Finding it is simple: we take the pixel closest to the centroid of the largest cluster (i.e. the one with the most pixels assigned to it) as the dominant colour:
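Put together, the dominant-colour extraction looks something like this sketch (again using toy pixel data in place of a real frame, with a deliberately lopsided split so one cluster dominates):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy pixel set: a large dark-blue cluster and a small red one.
# (With a real frame you'd use pixels = frame.reshape(-1, 3) instead.)
rng = np.random.default_rng(1)
pixels = np.vstack([
    rng.normal([3, 32, 64], 3.0, size=(300, 3)),   # majority: dark blue
    rng.normal([200, 30, 30], 3.0, size=(50, 3)),  # minority: red
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# The dominant cluster is simply the one with the most pixels assigned.
dominant = np.argmax(np.bincount(kmeans.labels_))

# Take the real pixel closest to that cluster's centroid, so the
# "dominant colour" is a colour that actually appears in the image.
centroid = kmeans.cluster_centers_[dominant]
dominant_colour = pixels[np.argmin(np.linalg.norm(pixels - centroid, axis=1))]
print(dominant_colour.round())
```

Using the closest real pixel (rather than the centroid itself) guards against the centroid averaging out to a muddy colour that never actually appears on screen.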
In this case, the dominant colour comes from Cluster 0, which is #032040, and has apparently been named “Bottom of the Unknown” by the internet.
To produce our final colour sequences (Step 3), we just rinse-and-repeat this process for each image frame of the video, and stitch together the dominant colours, one pixel at a time. Nice!
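The stitching itself is another small reshape. A sketch with three hypothetical per-frame dominant colours (in the real pipeline there would be one triple per extracted frame, in order):

```python
import numpy as np

# Hypothetical per-frame dominant colours: one (R, G, B) triple per frame.
dominant_colours = [(3, 32, 64), (200, 30, 30), (230, 230, 230)]

# One pixel-wide column per frame, repeated vertically so the
# resulting strip is tall enough to actually see.
row = np.array(dominant_colours, dtype=np.uint8).reshape(1, -1, 3)
strip = np.repeat(row, 50, axis=0)  # 50 px tall, one column per frame

print(strip.shape)  # (50, 3, 3): height, frames, RGB
```

The `strip` array can then be saved or displayed as an image with any imaging library, giving the colour streams shown earlier.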
Today we covered the resurgence of the disco audioscape, some brief colour theory and how to extract dominant colours from both images and videos using the K-Means algorithm.
Thanks for reading along!
Till next time, Michael.
Update: Code is available in a Jupyter Notebook, here.