Visualising colour themes in popular movies using Clojure

Was it just me, or did the Harry Potter films get darker in tone as the kids got older? Let's find out. By compressing each frame of each movie down to 5x5 pixels and then averaging all of the frames (trimming the start and end to avoid credits), you can produce a single block of colour:

Harry Potter and the Philosopher's Stone (2001) Harry Potter and the Philosopher's Stone converted to a single, average colour
Harry Potter and the Chamber of Secrets (2002) Harry Potter and the Chamber of Secrets converted to a single, average colour
Harry Potter and the Prisoner of Azkaban (2004) Harry Potter and the Prisoner of Azkaban converted to a single, average colour
Harry Potter and the Goblet of Fire (2005) Harry Potter and the Goblet of Fire converted to a single, average colour
Harry Potter and the Order of the Phoenix (2007) Harry Potter and the Order of the Phoenix converted to a single, average colour
Harry Potter and the Half-Blood Prince (2009) Harry Potter and the Half-Blood Prince converted to a single, average colour
Harry Potter and the Deathly Hallows Part 1 (2010) Harry Potter and the Deathly Hallows Part 1 converted to a single, average colour
Harry Potter and the Deathly Hallows Part 2 (2011) Harry Potter and the Deathly Hallows Part 2 converted to a single, average colour

Below is the series of images as an animated gif, with each film's position in the series shown in the bottom right-hand corner:

An animation showing each Harry Potter film as a single, averaged block of colour

If you generate a composited image from the shrunken, averaged frames of each movie and analyse the colours used in it, you can show the overall darkening in tone, as below:

Graph showing proportion of an averaged image representing each film in the Harry Potter series which was pure black

Graph showing the proportion of an averaged image representing each film in the Harry Potter series which was ‘dark’, where ‘dark’ is defined as all RGB components being less than 20

So, in summary, the films did get noticeably darker - there are certainly more browns and greens in the early movies, whereas the last two are over 50% ‘dark’. A more impressive series of images highlighting the transition is included later.
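
For reference, the ‘dark’ measurement in the second graph boils down to something like the sketch below, assuming pixels is a sequence of [r g b] vectors of the kind produced by the buffered-image->int-array function later in this post:

(defn proportion-dark
  [pixels]
  ;; a pixel counts as 'dark' when all of its RGB components are below 20
  (/ (count (filter (fn [[r g b]] (every? #(< % 20) [r g b])) pixels))
     (double (count pixels))))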

It turns out I’m not the first person to ask this question and, after a couple of weeks of work, I stumbled across a number of resources, including this Reddit thread basically achieving the same result. I’ll walk you through my approach.

Given a local copy of each movie converted to an MP4 format, we can extract each frame and visualise the changing colour palette each director chose to use as the films progress. The choice of colour palette tends to have a large impact on the 'mood' of the film as perceived by the audience. As such, it is often a good proxy for the tone of the content.

Capturing a single frame

When planning how to achieve these results I anticipated a fair amount of matrix manipulation, or conversion of sequences between formats, and so I chose to use Clojure. It helps that it's my language of choice anyway, and is used in my day job at uSwitch. The Java interop is an added bonus, particularly as there are very useful libraries out there, such as JavaCV, which provides a suite of tools for image capture and manipulation.

Below is an example which captures a single frame of video and returns it as a BufferedImage:

(import [org.bytedeco.javacv FFmpegFrameGrabber Java2DFrameConverter])

(defn buffered-image
  [path]
  (let [g (FFmpegFrameGrabber. path)  ; decodes frames from the video at path
        c (Java2DFrameConverter.)]    ; converts JavaCV frames into BufferedImages
    (.start g)
    (.getBufferedImage c (.grab g)))) ; note: the grabber is never stopped here - see below
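
As a quick sanity check, we can grab the first frame of a film and write it straight out to disk with ImageIO (the film path here is just a placeholder):

(import [javax.imageio ImageIO]
        [java.io File])

;; grab the first frame of a (hypothetical) local copy of a film and save it as a PNG
(ImageIO/write (buffered-image "/films/skyfall.mp4") "png" (File. "first-frame.png"))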

The FFmpegFrameGrabber is stateful: we have to start it before grabbing frames and stop it when we’re done, both to avoid holding on to memory and to ensure resources are appropriately released. To abstract these steps away, we can write a macro to make working with image streams a little easier:

(defmacro with-image-grabber
  [grabber-binding & body]
  `(let ~(subvec grabber-binding 0 2)   ; bind the grabber, e.g. [g (FFmpegFrameGrabber. path)]
     (.start ~(grabber-binding 0))
     (let [result# (do ~@body)]         ; evaluate the body with the grabber started
       (.stop ~(grabber-binding 0))     ; stop the grabber before returning the result
       result#)))

This allows us to invoke the image grabber as in the following example, which returns the length of a video in frames:

(defn get-film-length
  [film-path]
  (image/with-image-grabber [g (FFmpegFrameGrabber. film-path)]
    (.getLengthInFrames g)))

Turning a frame into a pixel

My original plan was to iterate through each pixel in each frame, summing the colour components and then dividing by the size of the frame to calculate the average colour. It’s relatively easy to get a byte array of pixel data from a BufferedImage:

(byte-array (.. buffered-image getRaster getDataBuffer getData))

Each byte represents the value of a single colour channel of the image. Most images have 4 channels: red, green, blue and alpha. To determine whether or not an image includes an alpha channel, we can simply ask it:

(not (nil? (.getAlphaRaster buffered-image)))

If there is an alpha channel, each pixel will be represented by 4 bytes, allowing us to partition the byte array by pixel and to convert each pixel to a vector of integer colour values:

(defn byte-array->rgb
  [colours]
  (reverse (map #(bit-and % 0xff) colours)))

(defn bytes->rgb
  [image-has-alpha? image-data-as-byte-array]
  (let [partitioned-bytes   (partition (if image-has-alpha? 4 3) image-data-as-byte-array)]
    (map byte-array->rgb partitioned-bytes)))

(defn buffered-image->int-array
  [buffered-image]
  (let [image-data-as-byte-array (byte-array (.. buffered-image getRaster getDataBuffer getData))
        image-has-alpha?         (not (nil? (.getAlphaRaster buffered-image)))]
    (bytes->rgb image-has-alpha? image-data-as-byte-array)))

This turns a colour, represented by Java as a 32-bit value, into a vector of channel values such as [128 128 128 255], which represents this lovely pink colour:

A solid pink colour represented by RGB 128 128 128 255
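
As an end-to-end illustration of the conversion, here’s a hypothetical REPL session, assuming a byte-backed image type such as TYPE_3BYTE_BGR (the layout this code expects):

(import [java.awt.image BufferedImage]
        [java.awt Color])

;; build a 1x1 orange image and run it through the conversion above
(def tiny (BufferedImage. 1 1 BufferedImage/TYPE_3BYTE_BGR))
(.setRGB tiny 0 0 (.getRGB (Color. 255 128 0)))

(buffered-image->int-array tiny)
;; => ((255 128 0))  ; one pixel, with its bytes reversed from BGR into RGB order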

To get an idea of the average (mean) colour of a frame, we can take the pixel values produced above, sum each channel across every pixel, and then divide by the pixel count:

(defn average-pixels
  [pixel-values]
  (let [pixel-count (double (count pixel-values))]
    (map #(int (/ % pixel-count)) (apply (partial map +) pixel-values))))
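
For example, averaging a pure red pixel with a pure blue pixel gives a dark purple:

(average-pixels [[255 0 0] [0 0 255]])
;; => (127 0 127)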

Being able to view the pixel values as ints in the range 0-255 made development easier, but it was clearly not an efficient way to calculate the average colour: the process took several seconds per frame. With roughly 24 frames for every second of film, even averaging one frame per second would only get through about an hour of material in every 24 hours of processing.

I was about to investigate a faster algorithm when it occurred to me that what I was actually doing was resizing the image down to a single pixel using averaging. After some exploration of the tooling already available, it turned out this is already implemented efficiently via the Graphics class, which exposes an editable view of a BufferedImage. The pixels are still averaged, but in a much more efficient manner.
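
Here’s a minimal sketch of that idea (a hypothetical helper, not code from the project): draw a frame into a 1x1 BufferedImage via its Graphics and read the resulting colour back. How faithful the ‘average’ is depends on the rendering hints in play, which are covered in the scaling section below.

(import [java.awt.image BufferedImage])

(defn frame->single-pixel-colour
  [^BufferedImage frame]
  ;; draw the frame scaled down to a single pixel, then unpack that pixel's RGB value
  (let [tiny (BufferedImage. 1 1 BufferedImage/TYPE_INT_RGB)
        g    (.createGraphics tiny)]
    (.drawImage g frame 0 0 1 1 nil)
    (.dispose g)
    (let [rgb (.getRGB tiny 0 0)]
      [(bit-and (bit-shift-right rgb 16) 0xff)
       (bit-and (bit-shift-right rgb 8) 0xff)
       (bit-and rgb 0xff)])))

Rather than shrinking each frame in isolation, though, the actual implementation draws every shrunken frame directly onto one large composite image: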

(defn- calculate-final-height
  [desired-width scaled-width scaled-height frames-to-capture]
  (let [total-length     (* scaled-width frames-to-capture)
        number-of-lines  (inc (int (/ total-length desired-width)))]
    (* scaled-height number-of-lines)))

(defn- scale-image
  [graphics image-to-scale scaled-width scaled-height x-position y-position]
  (.drawImage graphics image-to-scale x-position y-position scaled-width scaled-height nil))

(defn- write-tiled-images
  [frames-to-capture image-grabber scaled-width scaled-height desired-width new-image-graphics]
  (doseq [i (range frames-to-capture)]
    (when-let [frame (film/get-next-frame-as-buffered-image image-grabber)]
      (let  [[x-offset y-offset] (image/calculate-offset i scaled-width scaled-height desired-width)]
        (scale-image new-image-graphics frame scaled-width scaled-height x-offset y-offset)))))

(defn create-tiled-image
  [film-title film-path frames-to-capture scaled-width desired-width]
  (image/with-image-grabber [g (FFmpegFrameGrabber. film-path)]
    (let [[image-width image-height]   (film/frame-dimensions g)
          scaled-height                (image/scaled-height-preserving-aspect-ratio image-width image-height scaled-width)
          final-height                 (calculate-final-height desired-width scaled-width scaled-height frames-to-capture)
          new-image                    (image/new-image desired-width final-height)
          new-image-graphics           (.createGraphics new-image)]
      (write-tiled-images frames-to-capture g scaled-width scaled-height desired-width new-image-graphics)
      (.dispose new-image-graphics)
      (image/write-image new-image film-title scaled-width))))

There’s quite a lot going on here, but the key part is the scale-image function which, given the graphics component of a BufferedImage, draws another image at the given x and y coordinates, scaled to the given width and height:

(.drawImage new-image-graphics frame x-offset y-offset scaled-width scaled-height nil)

By resizing each frame to a single pixel and tiling the results, we can produce an image like the one below, which shows Skyfall (2012):

Skyfall compressed so that each frame of the film is averaged to a single pixel

Scaling Mechanism

Java provides a number of mechanisms for scaling an image based on your priorities - most of the time, the key decision is performance vs ‘quality’ for some definition of quality, which is usually centred on how much of the image is discarded when scaling down or how interpolation works when scaling up.

Fast image scaling is usually achieved using a nearest neighbour algorithm: when scaling up, additional blank pixels are added between the existing pixels, and each one is filled with the colour of the nearest original pixel.

Below is an example of an image scaled up using the nearest neighbour algorithm:

The uSwitch logo
Original Image

The uSwitch logo scaled to double size
Scaled x 2

The uSwitch logo scaled to 4 times size
Scaled x 4

This leads to relatively decent results, although jagged artefacts are obvious, particularly on curved edges. An image with a wider range of colour would also end up looking ‘blockier’.

What’s more interesting for our purposes is seeing what happens when the image is scaled down:

The uSwitch logo
Original Image

Scaled to 1/2 size
Scaled x 0.5

Scaled to 1/4 size
Scaled x 0.25

There are a number of alternative image-scaling techniques which interpolate the pixel data in a variety of ways. The most common, both available through Java’s RenderingHints, are bilinear and bicubic interpolation. Below is a comparison of the results of scaling the original image down using each method:

Original Image The uSwitch logo
Scaled x 0.25 using nearest neighbour The uSwitch logo scaled to 1/4 size using nearest neighbour scaling
Scaled x 0.25 using bilinear interpolation The uSwitch logo scaled to 1/4 size using bilinear scaling
Scaled x 0.25 using bicubic interpolation The uSwitch logo scaled to 1/4 size using bicubic scaling

This excellent article provides detail on the different options available for scaling. The scaling method can be selected using the .setRenderingHint method on the Graphics2D class, meaning we could write something like the following to scale images up or down in Clojure using bilinear interpolation:

(import [java.awt RenderingHints])

(defn scale-image
  [graphics image-to-scale scaled-width scaled-height]
  (.drawImage graphics image-to-scale 0 0 scaled-width scaled-height nil))

(defn resize-image
  [path scale-factor]
  (let [i                  (read-image path)
        new-width          (int (* scale-factor (.getWidth i)))
        new-height         (int (* scale-factor (.getHeight i)))
        new-image          (image/new-image new-width new-height)
        new-image-graphics (.createGraphics new-image)]
    (.setRenderingHint new-image-graphics RenderingHints/KEY_INTERPOLATION RenderingHints/VALUE_INTERPOLATION_BILINEAR)
    (.setRenderingHint new-image-graphics RenderingHints/KEY_RENDERING RenderingHints/VALUE_RENDER_QUALITY)
    (scale-image new-image-graphics i new-width new-height)
    (.dispose new-image-graphics)
    (image/write-image new-image)))
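
Assuming the uSwitch logo is sitting on disk (the filename here is just a placeholder), producing the quarter-size bilinear version shown above is then a one-liner:

(resize-image "uswitch-logo.png" 0.25)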

Increasing the size of each pixel

My next thought was to see if I could generate more impressive images by resizing each frame to a small 5x5 size, retaining a small amount of the detail of the original frame. After experimenting with the different scaling hints and doing some research, I came across a number of users pointing me to the Thumbnailator library, which abstracts the choice of scaling algorithm away from the user based on the size of the desired final image.
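
For reference, the simplest way into the library is its high-level Thumbnails builder - a minimal sketch, assuming frame is a BufferedImage; the code in this post uses the lower-level FixedSizeThumbnailMaker instead, as shown below:

(import [net.coobird.thumbnailator Thumbnails]
        [java.awt.image BufferedImage])

;; a sketch using Thumbnailator's builder API: force a frame down to 5x5 pixels
(defn shrink-frame
  [^BufferedImage frame]
  (-> (Thumbnails/of (into-array BufferedImage [frame]))
      (.forceSize 5 5)
      (.asBufferedImage)))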

Integration was relatively easy, and our create-tiled-image function doesn’t actually need to change that much - we just need to create an instance of a thumbnail maker, which we’ll come back to below:

(defn create-tiled-image
  [film-title film-path frames-to-capture scaled-width desired-width]
  (image/with-image-grabber [g (FFmpegFrameGrabber. film-path)]
    (let [[image-width image-height]   (film/frame-dimensions g)
          scaled-height                (image/scaled-height-preserving-aspect-ratio image-width image-height scaled-width)
          final-height                 (calculate-final-height desired-width scaled-width scaled-height frames-to-capture)
          thumbnail-maker              (image/get-thumbnail-maker image-width image-height scaled-width scaled-height)
          new-image                    (image/new-image desired-width final-height)
          new-image-graphics           (.createGraphics new-image)]
      (write-tiled-images frames-to-capture g scaled-width scaled-height desired-width new-image-graphics thumbnail-maker)
      (.dispose new-image-graphics)
      (image/write-image new-image film-title scaled-width))))

The key change is to the scale-image function:

(defn- scale-image
  [graphics image-to-scale scaled-width scaled-height x-position y-position thumbnail-maker]
  (let [resized-image       (image/scale thumbnail-maker image-to-scale)]
    (.drawImage graphics resized-image x-position y-position nil)))

And here are the appropriate parts of the image namespace:

(import [java.awt Dimension]
        [net.coobird.thumbnailator.makers FixedSizeThumbnailMaker]
        [net.coobird.thumbnailator.resizers DefaultResizerFactory])

(defn calculate-offset
  [i scaled-width scaled-height desired-width]
  [(mod (* i scaled-width) desired-width)
   (* scaled-height (int (/ (* i scaled-width) desired-width)))])

(defn dimension
  [width height]
  (Dimension. width height))

(defn- get-resizer
  [image-width image-height intended-width intended-height]
  (.getResizer (DefaultResizerFactory/getInstance)
               (dimension image-width image-height) 
               (dimension intended-width intended-height)))

(defn scaled-height-preserving-aspect-ratio
  [image-width image-height scaled-width]
  (inc (int (* (/ scaled-width image-width) image-height))))

(defn get-thumbnail-maker
  [image-width image-height intended-width intended-height]
  (.resizer (FixedSizeThumbnailMaker. intended-width intended-height false true)
            (get-resizer image-width image-height intended-width intended-height)))

(defn scale
  [thumbnail-maker buffered-image-to-scale]
  (.make thumbnail-maker buffered-image-to-scale))
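
As a quick sanity check of the tiling arithmetic, here is a hypothetical REPL session laying 5x3 tiles out across a 1000-pixel-wide composite:

(calculate-offset 0   5 3 1000) ;; => [0 0]   first tile, top-left corner
(calculate-offset 1   5 3 1000) ;; => [5 0]   second tile, same row
(calculate-offset 200 5 3 1000) ;; => [0 3]   frame 200 wraps onto the next row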

This approach allows us to produce results such as this:

2001: A Space Odyssey compressed so that each frame is a single 5x5 pixel This image, composed of frames from 2001: A Space Odyssey (1968), highlights Kubrick’s use of long static shots, indicated by the large stretches of regular colour.

On the other hand, The Matrix (1999) is a great study in colour delineation, with scenes in the Matrix tinged with green (or white in the case of scenes set in The Construct), whereas scenes in the ‘real world’ are coloured blue.

Here are some other favourites, the interpretation of which I will leave as an exercise for the reader:

Tron Legacy (2010)

Tron Legacy compressed so that each frame is a single 5x5 pixel

Vertigo (1958)

Vertigo compressed so that each frame is a single 5x5 pixel

Ghost in the Shell (1995)

Ghost in the Shell compressed so that each frame is a single 5x5 pixel

Next Steps

Colour in films is rarely accidental. Filmmaking is an art form and, barring indie movements such as Dogme 95, very little in a shot is there by chance. A director will choose the colour palette which they feel best conveys the intended mood for a scene. Sometimes colours are used to provide a clear delineation between ideas - a technique dating back to the ‘black hat’ and ‘white hat’ Westerns, where hat colour differentiated the good guys from the bad guys.

Modern filmmaking has the advantage of being able to change the colours in a scene in post-production, allowing even more subtle manipulations of the audience’s perceptions.

As such, mining colour information for sentiment, tone or even genre is an interesting possibility. I hope to post a follow up exploring some or all of these areas.

Results

All of the code is available on GitHub.

Harry Potter and the Philosopher's Stone (2001)

Harry Potter and the Philosopher's Stone compressed so that each frame is a single 5x5 pixel

Harry Potter and the Chamber of Secrets (2002)

Harry Potter and the Chamber of Secrets compressed so that each frame is a single 5x5 pixel

Harry Potter and the Prisoner of Azkaban (2004)

Harry Potter and the Prisoner of Azkaban compressed so that each frame is a single 5x5 pixel

Harry Potter and the Goblet of Fire (2005)

Harry Potter and the Goblet of Fire compressed so that each frame is a single 5x5 pixel

Harry Potter and the Order of the Phoenix (2007)

Harry Potter and the Order of the Phoenix compressed so that each frame is a single 5x5 pixel

Harry Potter and the Half-Blood Prince (2009)

Harry Potter and the Half-Blood Prince compressed so that each frame is a single 5x5 pixel

Harry Potter and the Deathly Hallows Part 1 (2010)

Harry Potter and the Deathly Hallows Part 1 compressed so that each frame is a single 5x5 pixel

Harry Potter and the Deathly Hallows Part 2 (2011)

Harry Potter and the Deathly Hallows Part 2 compressed so that each frame is a single 5x5 pixel