Waveform Visualization with Ruby and Stimulus

By Exequiel Rozas

A waveform visualization feature is an interesting way to showcase audio in an application.

There are multiple ways to go about it, and we will explore a couple of them by building the feature with Stimulus, the Canvas API with JavaScript classes, and Ruby to perform the waveform generation on the server.

This article complements the one where we built an audio player with Stimulus; feel free to check it out if you are also looking to build an audio player.

Let's start by exploring the result of this article:

What we will build

For this application, we will generate waveform visualizations using a couple of approaches to show the different ways to tackle the problem.

We will use the Canvas API to draw the waveform by drawing a limited number of samples in a <canvas> element.

The first and simplest approach will be to generate a convincing-looking waveform that doesn't actually represent the track. This is nice because it can improve the perceived experience for users who are not too focused on what the actual audio looks like.

The second approach will be to generate the graph from waveform data that actually represents the audio using Ruby, offloading that part to a background job to make the process seamless.

The final result should look like this:

Application setup

We will start by creating a new Rails application:

$ rails new waves --css=tailwind

Next, we will add Avo to easily create new resources with audio uploads. Let's start by installing it:

bundle add avo && bundle install

Now let's run the Avo installer to generate the avo.rb initializer and mount the engine in the routes file:

bin/rails generate avo:install

Then, we will create an Episode model, which we will later use to showcase how to generate the waveform data in a background job when creating a track:

bin/rails generate model Episode name data:text duration:integer

Because we've installed Avo, the previous command will add an Episode resource to our admin panel.

Then, we generate the Active Storage migrations:

bin/rails generate active_storage:install

Then, we migrate our database:

bin/rails db:migrate

We add the proper configuration to our model:

# app/models/episode.rb
class Episode < ApplicationRecord
  has_one_attached :audio

  validates :name, presence: true
end

Now, we make sure that our Avo resource is correctly configured:

# app/avo/resources/episode.rb
class Avo::Resources::Episode < Avo::BaseResource
  def fields
    field :id, as: :id
    field :name, as: :text
    field :duration, as: :number, hide_on: [:new, :edit]
    field :audio, as: :file
  end
end

Next, if we navigate to /avo/resources/episodes we should see an empty list for the episodes:

View of the episodes list in Avo

We can create episodes with audio uploads using Avo and Active Storage. Check our article on S3 uploads with Active Storage if you want to upload files to the cloud.

With the installation out of the way, let's learn a bit about digital audio before jumping into the actual waveform generation process.

Of course, skip to the actual generation of waveforms if you already know about the subject.

A bit about digital audio

As a physical phenomenon, sound is the propagation of vibration as an acoustic wave through a medium like air, water, or even solids like walls or metal.

Generally, when we think of sound, we think of the movement of air that's picked up by our ears and interpreted as such in our brains.

Audio, on the other hand, is the electrical representation of sound, which can be analog or digital. While sound is the actual movement of air particles, audio is recorded sound that is stored in different types of media which can be used to reproduce the sound anytime.

An analog representation of audio results in an alternating current that is analogous to the original sound, hence the name.

A digital representation of audio results in a set of samples which are then used to recreate the original waveform. Because they are mostly a set of numbers, they can be easily stored, understood, and manipulated with computers.

Let's dig a bit into sampling first:

Technically speaking, frequency and perception play a role in what we define as audio: inaudible frequencies, like those above 20kHz, are not considered audio because humans cannot hear them, while other animals might.

Sampling

In digital audio, sampling is the act of capturing instantaneous sound amplitudes at a given interval of time.

How often we take those samples is called the sample rate, or sampling frequency: if it's high enough, the original analog signal can be re-created without significant loss.

The following visualization, which shows an original analog signal consisting of a sine wave, then the samples taken from it, and finally the reconstructed signal, might help you understand how it works:

As we decrease the sample rate, we get a lower-resolution reconstructed signal, and the opposite happens as we increase it.
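To make the idea concrete, here is a small illustrative Ruby sketch, not part of the app we're building, that samples a 440 Hz sine wave at a hypothetical 8 kHz sample rate:

sample_rate = 8_000  # samples per second (the sampling frequency)
frequency   = 440.0  # Hz, the pitch of the sine wave
duration    = 0.01   # seconds of audio to sample

samples = (0...(sample_rate * duration).to_i).map do |n|
  t = n.to_f / sample_rate               # the moment in time of the n-th sample
  Math.sin(2 * Math::PI * frequency * t) # instantaneous amplitude between -1 and 1
end

samples.size # => 80 samples for 10 ms of audio

Each entry in the array is one instantaneous amplitude, which is exactly the kind of data we will later extract from real audio files.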

But, what does this have to do with generating a waveform? Well, we will be extracting information about samples in our audio file to produce a set of points that we will use to draw the waveform.

Waveform generation

The first thing we need to learn is how to draw a waveform in the browser.

One approach could be to place a <div> with a certain height and spacing for each sample we want to represent in our waveform, but this approach puts unnecessary weight on the DOM and is not suitable if we need to make dynamic visualizations.

Instead, we will use the Canvas API to produce the desired results. Let's start by learning to draw a set of lines in a rectangle:

Drawing bars with Canvas

The first step is to define the <canvas> element in the document:

<div class="max-w-screen-lg mx-auto px-6 py-16" id="container">
  <div class="w-full h-48">
    <canvas id="canvas" class="w-full h-full"></canvas>
  </div>
</div>

This element allows us to draw shapes, paths, text, etc. within itself programmatically. To start working with it, we just need to access the canvas element, which has the #canvas ID, and then get its 2D context, which we will actually use to draw:

const container = document.getElementById("container")
const canvas = document.getElementById("canvas")
const ctx = canvas.getContext("2d")

Then, let's set the canvas width and height to match its container:

const {height, width} = container.getBoundingClientRect()
canvas.height = height
canvas.width = width

Next, let's add a fill color using the Canvas context we stored in the ctx variable:

ctx.fillStyle = '#f1f5f9';
ctx.fillRect(0, 0, width, height);

This should produce the following result:

Empty canvas with solid fill

Let's now draw one line that starts at the bottom of the canvas and goes all the way to the top, minus a margin:

const padding = 24

ctx.beginPath()
ctx.moveTo(padding, height)
ctx.lineWidth = 6;
ctx.lineCap = "round";
ctx.strokeStyle = "teal";
ctx.lineTo(padding, padding)
ctx.stroke()
ctx.closePath()

Which produces the following result:

Canvas with a single line drawn

Yes, it's not the most exciting result, but we're getting closer. Let's draw a set of lines that span the width of the canvas, with some spacing between them.

We need to do basically the same thing we already did, but within a loop:

const barCount = 75
const spacing = 6
const barWidth = 8
const padding = 24

ctx.lineWidth = barWidth;
ctx.lineCap = "round";
ctx.strokeStyle = "teal";

for (let i = 0; i < barCount; i++) {
  const barHeight = Math.random() * (height - padding);
  const startX = (i * (spacing + barWidth));
  const topY = height - padding - barHeight;

  ctx.beginPath();
  ctx.moveTo(startX, height);
  ctx.lineTo(startX, topY);
  ctx.stroke();
}

Here, we're predefining the number of bars we want to draw, a fixed spacing between them and a padding which we're using to avoid the bars reaching the top of the canvas.

Then, inside the loop, which is executed 75 times, we pick a random number for the barHeight, set the startX point by multiplying the current loop index by the sum of the spacing and the barWidth so the spacing between bars stays constant, and, finally, we draw each bar just like we did before.

Executing the code results in the following:

Basic bar graph with canvas

Even though it resembles a waveform, we're not there yet: the random barHeight introduces some weird improbable jumps between high and low amplitudes, so let's correct that and add a couple of features to improve the looks of our waveform.

Improving the results

Now, let's fix some of the things that make our graph not as realistic as it can be.

The first improvement is to calculate the amount of space so we can fit the exact number of bars that are set in the barCount variable.

The second change is to generate the amplitudes in a separate loop and store them in the amplitudes array.

Firstly, let's smooth the changes by introducing three variables: isPeak, a boolean that is true around 5% of the time; change, a negative or positive number added to the previous amplitude; and prevAmplitude, which stores that previous amplitude value.

Lastly, we make sure that the amplitude stays within a lower limit of 0.05 and an upper limit of 0.9, which means that our bars will never exceed the height of the canvas:

const barCount = 120
const barWidth = 4
const padding = 8
const centerY = height / 2
const availableWidth = width - (padding * 2)
const barSpacing = availableWidth / (barCount - 1) // There are (n - 1) spaces in an n number of bars.

const amplitudes = Array(barCount).fill(0)
let prevAmplitude = Math.random() * 0.4 // Randomly assign a first amplitude

for (let i = 0; i < barCount; i++) {
  const isPeak = Math.random() < 0.05
  const change = isPeak ? Math.random() * 0.3 : (Math.random() - 0.5) * 0.25
  prevAmplitude = Math.max(0.05, Math.min(0.9, prevAmplitude + change))
  amplitudes[i] = prevAmplitude
}

ctx.lineWidth = barWidth;
ctx.lineCap = "round";
ctx.strokeStyle = "teal";

for (let i = 0; i < barCount; i++) {
  const barHeight = amplitudes[i] * (height - padding * 2);
  const startX = (i * barSpacing) + padding
  const endY = height - barHeight;

  ctx.beginPath();
  ctx.moveTo(startX, height);
  ctx.lineTo(startX, endY);
  ctx.stroke();
}

This produces the following result:

Smoothed waveform

Let's improve the result by mirroring the graph, drawing each bar symmetrically around the vertical center.

To achieve this, we define the centerY variable, set to height / 2, change the moveTo y-axis argument so the line starts at the vertical center minus barHeight / 2, and set the finishing point of the line at the vertical center plus barHeight / 2:

const centerY = height / 2

// Rest of the code
ctx.beginPath();
ctx.moveTo(startX, centerY - (barHeight / 2))
ctx.lineTo(startX, centerY + (barHeight / 2))
ctx.stroke();

This produces the following result:

Mirrored waveform

Currently, we have something that's usable for an audio-related project.

But we can improve it by picking the amplitudes from actual audio samples instead of generating them ourselves like we're doing here.

So, let's start by learning how to generate the waveform data using Ruby:

Users might expect the waveform to communicate things like the actual amplitude of the sound or the location of peaks and silences, and even to match the real shape of the waveform. As long as that information is not relevant, we can use this approach.

Waveform data using Ruby

To extract waveform data using Ruby, we will cheat a little by using a CLI tool that generates the waveform data for us, and we will call it from a Ruby class.

For this, we will use audiowaveform developed by the BBC, which is a C++ program that can generate waveform data and render images from audio files.

It's more convenient than alternatives because it can handle MP3, WAV, FLAC, Ogg Vorbis, and Opus files.

Follow their installation guidelines and make sure it works on your operating system.
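For example, assuming you have an MP3 file in your working directory, you should be able to generate waveform data directly from the terminal, using the same flags we'll pass from Ruby below:

$ audiowaveform -i audio-file.mp3 -o audio-file.json --pixels-per-second 10

If that command produces a JSON file, the tool is installed correctly.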

If you plan to deploy this code to production, you should make sure the audiowaveform library is installed and that your application has access to run it. You can add the installation command to your Dockerfile if you're deploying using Kamal or any Docker-related service.
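As a rough sketch, assuming an Ubuntu-based image (the maintainers publish a PPA; other base images need a different installation method, so adapt this to your setup):

# Hypothetical Dockerfile snippet for an Ubuntu-based image
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:chris-needham/audiowaveform && \
    apt-get update && \
    apt-get install -y audiowaveform && \
    rm -rf /var/lib/apt/lists/*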

We will encapsulate the extraction into a WaveformData class and use the Open3 module, which gives us access to stdin, stdout, and stderr so we can raise an error if the waveform generation fails.

Our class will receive a file and define a generate method that returns the data as a Ruby hash, plus a to_json method that returns it as a JSON string.

It works by executing the audiowaveform command, storing the result in a Tempfile, and returning the output. If the file is an ActiveStorage::Blob or an ActiveStorage::Attachment, it downloads the file contents into a @temp_file instance variable, which gets cleaned up at the end of the process.

require "open3"

class WaveformData
  attr_reader :file, :pixels_per_second
  attr_accessor :temp_file

  def initialize(file, pixels_per_second = 10)
    @file = file
    @pixels_per_second = pixels_per_second
    @temp_file = nil
  end

  def generate
    Tempfile.create(['waveform', '.json']) do |temp_output|
      stdout, stderr, status = Open3.capture3(*command(temp_output.path))

      unless status.success?
        raise "Failed to generate JSON: #{stderr.strip}"
      end

      temp_output.rewind
      JSON.parse(temp_output.read)
    ensure
      cleanup_temp_file
    end
  end

  def to_json
    JSON.generate(generate)
  end

  private

  def file_path
    if file.respond_to?(:path)
      file.path
    elsif file.respond_to?(:tempfile)
      file.tempfile.path
    elsif defined?(ActiveStorage) && (file.respond_to?(:blob) || file.is_a?(ActiveStorage::Blob))
      # Download Active Storage attachments/blobs to a local temp file first
      download_blob_to_tempfile(file.respond_to?(:blob) ? file.blob : file)
    else
      raise ArgumentError, "Unsupported file type: #{file.class}"
    end
  end

  def command(temp_output_path)
    ['audiowaveform',
      '-i', file_path,
      '-o', temp_output_path,
      '--pixels-per-second', pixels_per_second.to_s]
  end

  def download_blob_to_tempfile(blob)
    @temp_file = Tempfile.new(['active-storage', File.extname(blob.filename.to_s)])
    @temp_file.binmode
    blob.download { |chunk| @temp_file.write(chunk) }
    @temp_file.flush
    @temp_file.rewind
    @temp_file.path
  end

  def cleanup_temp_file
    @temp_file&.close
    @temp_file&.unlink
    @temp_file = nil
  end
end

Now, to use our WaveformData class, we just need a File instance to pass to the class and then generate the JSON:

file = File.open(Rails.root.join("audio-file.mp3"))
data = WaveformData.new(file).to_json

This will output a JSON string containing the following keys:

  • version: represents the version number of the waveform data format.
  • channels: the number of waveform channels present in the data.
  • sample_rate: the sample rate for the original sound file.
  • samples_per_pixel: the number of samples that go into a pixel when generating an image. This also determines the length of the amplitudes array.
  • bits: the resolution of the waveform data. It may be 8 or 16 and represents the number of possible amplitude values. For 16 bits, values range from -32768 to +32767.
  • length: the number of minimum and maximum value pairs per channel. If we're working with a single channel and with the --pixels-per-second set to 1, this value should match the track's duration.
  • data: the array of interleaved minimum and maximum waveform data points.

To access the actual array, we read the data key of the parsed result, for example WaveformData.new(file).generate["data"], which gives us an array of amplitudes:

# The actual array holds many more elements
[-1911, 1944, -8490, 9999, -7025, 9206, -4108, 5937, -9505, 13513]

Consider that, in our case, the amplitudes represent one channel of the audio and are interleaved, meaning that each pair represents the minimum and maximum values for a given sample window.
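Since the values are interleaved, a quick way to inspect them as [min, max] pairs from a Ruby console might look like this (the output below just reuses the illustrative values from the snippet above):

waveform = WaveformData.new(file).generate
waveform["data"].each_slice(2).first(3)
# => [[-1911, 1944], [-8490, 9999], [-7025, 9206]]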

With this array, we can generate the waveform just like we did before using the Canvas API:

Canvas painted waveform with server generated data

The main difference is that we're not generating the data semi-randomly anymore: we're using the data we get from the server in JSON format, normalizing it by dividing every value by 32768, and also including an amplitudeScale, which we set to height * 0.5 so the waveform extends up to half of the canvas height above and below the center.

// `data` is the parsed waveform JSON: normalize the raw values to the -1..1 range first
const normalizedData = data.data.map((v) => v / 32768);

const availableWidth = width - (padding * 2);
const totalBars = Math.floor(normalizedData.length / 2);
const barSpacing = totalBars <= 1 ? 0 : availableWidth / (totalBars - 1);
const amplitudeScale = height * 0.5;
const centerY = height / 2;

for (let i = 0; i < normalizedData.length; i += 2) {
  if (i + 1 < normalizedData.length) {
    const min = normalizedData[i];
    const max = normalizedData[i + 1];

    const x = padding + (i / 2) * barSpacing;

    ctx.beginPath();
    ctx.moveTo(x, centerY + min * amplitudeScale);
    ctx.lineTo(x, centerY + max * amplitudeScale);
    ctx.stroke();
  }
}

Now, we will extract the canvas part of the waveform generation into a class to make things easier.

The reason we're dividing every value by 32768 is that the audio we used has a resolution of 16 bits, which means there are 2¹⁶ possible amplitude values at any given time, ranging from -32768 to 32767. By dividing by 32768 we make sure that our values fall between -1 and 1 (for example, 13513 / 32768 ≈ 0.41). This also means that we need an amplifying factor, represented by the amplitudeScale variable, so the waves we're drawing are visible and stay within reasonable limits.

Extracting behavior into a class

Until now, we've been defining behavior by writing isolated JavaScript code that you could run in a <script> tag inside any view.

Let's create a WaveformVisualizer class that receives a <canvas> element and can draw the waveform just like we did before.

The class will receive two arguments: a canvas instance and an options object which will let us configure many things like the data we will use to draw, the backgroundColor for the canvas, the strokeColor for the bars and the progressColor to showcase progress.

We will also let the user customize the barWidth, the padding for the canvas element and, lastly, the amplifyingFactor that we use to adjust the appearance of the waveform:

// app/javascript/waveform_visualizer.js
export default class WaveformVisualizer {
  constructor(canvas, options = {}) {
    this.canvas = canvas;
    this.ctx = this.canvas.getContext('2d');
    this.waveformData = options.waveformData || {data: []};
    this.backgroundColor = options.backgroundColor || "#FFFFFF";
    this.strokeColor = options.strokeColor || "teal";
    this.progressColor = options.progressColor || "#FF5500";
    this.barWidth = options.barWidth || 2;
    this.padding = options.padding || 8;
    this.amplifyingFactor = options.amplifyingFactor || 1;
  }
}

Next, let's define an init method that will be called before actually using the class. It throws an error if the canvas element is not valid, then sets the context, the canvas size, and the initial styles using two “private” methods:

init() {
  if (!this.canvas || !(this.canvas instanceof HTMLCanvasElement)) {
    throw new Error("A valid canvas element is required")
  }

  this.ctx = this.canvas.getContext("2d")

  this._setCanvasSize()
  this._setCanvasStyles()
}

_setCanvasSize() {
  const { width, height } = this.canvas.parentElement.getBoundingClientRect()
  this.canvas.width = width
  this.canvas.height = height
}

_setCanvasStyles() {
  this.ctx.fillStyle = this.backgroundColor
  this.ctx.strokeStyle = this.strokeColor
  this.ctx.lineWidth = this.barWidth
  this.ctx.lineCap = 'round'
  this.ctx.fillRect(0, 0, this.canvas.width, this.canvas.height)
}

Now, let's add a draw method that will do something very similar to what we were doing previously:

draw() {
  const normalizedData = this.waveformData.data.map((v) => v / 32768)
  const amplitudeScale = this.canvas.height * this.amplifyingFactor
  const centerY = this.canvas.height / 2

  const availableWidth = this.canvas.width - (this.padding * 2)
  const totalBars = Math.floor(normalizedData.length / 2)
  const barSpacing = totalBars <= 1 ? 0 : availableWidth / (totalBars - 1)

  for (let i = 0; i < normalizedData.length; i += 2) {
    if (i + 1 < normalizedData.length) {
      const min = normalizedData[i]
      const max = normalizedData[i + 1]
      const x = this.padding + (i / 2) * barSpacing

      this._drawLine(x, centerY + (min * amplitudeScale), centerY + (max * amplitudeScale))
    }
  }
}

_drawLine(x, y1, y2) {
  this.ctx.beginPath();
  this.ctx.moveTo(x, y1);
  this.ctx.lineTo(x, y2);
  this.ctx.stroke();
}

Now, we can use the class to draw waveforms just like before, but with more extensible behavior:

// app/javascript/application.js
import WaveformVisualizer from "./waveform_visualizer"

const canvases = document.querySelectorAll("canvas")
canvases.forEach(canvas => {
  const data = JSON.parse(canvas.dataset.waveform)
  const options = {
    waveformData: data
  }
  const visualizer = new WaveformVisualizer(canvas, options)
  visualizer.init()
  visualizer.draw()
})

Note that we're passing the waveform data through the data-waveform attribute on the canvas element.
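For reference, here's a minimal ERB sketch of how that attribute could be populated, assuming the waveform JSON has already been stored in each episode's data column (we set that up in the Usage in Rails section below):

<%# Hypothetical view snippet: one canvas per episode, with the waveform JSON inlined %>
<% @episodes.each do |episode| %>
  <div class="w-full h-48">
    <canvas data-waveform="<%= episode.data %>" class="w-full h-full"></canvas>
  </div>
<% end %>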

We get the following result:

Waveforms generated with the WaveformVisualizer class

As you can see, we're displaying two different waveforms that we extracted from audio tracks we uploaded to our application.

Stimulus Controller

Now that we've defined the WaveformVisualizer class, let's define a Stimulus controller to better encapsulate the behavior:

// app/javascript/controllers/waveform_controller.js
import { Controller } from "@hotwired/stimulus"
import WaveformVisualizer from "../waveform_visualizer"

export default class extends Controller {
  static targets = ["canvas"]
  static values = {
    duration: Number,
    waveformData: {type: Object, default: {}},
    backgroundColor: {type: String, default: "#FFFFFF"},
    progressColor: {type: String, default: "#9c59ff"},
    strokeColor: {type: String, default: "#9c59ff"},
    barWidth: {type: Number, default: 2},
    padding: {type: Number, default: 8},
    amplifyingFactor: {type: Number, default: 1}
  }

  connect() {
    if (!this.hasCanvasTarget) throw new Error("Canvas target is required")
    if (!this.hasWaveformDataValue) throw new Error("Waveform data is required.")

    this.visualizer = new WaveformVisualizer(this.canvasTarget, this.visualizerOptions);
    this.visualizer.init();
    this.visualizer.draw();
  }

  get visualizerOptions() {
    return {
      waveformData: this.waveformDataValue,
      backgroundColor: this.backgroundColorValue,
      strokeColor: this.strokeColorValue,
      progressColor: this.progressColorValue,
      barWidth: this.barWidthValue,
      padding: this.paddingValue,
      amplifyingFactor: this.amplifyingFactorValue
    }
  }
}

The controller is a light wrapper over the WaveformVisualizer class, but we can extend it further to achieve behavior like progress animation or seeking playback, just like we did in the audio player article, but using a waveform instead of a rectangular progress bar.
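As an illustration of one possible direction (not part of the article's implementation), the visualizer could gain a hypothetical drawProgress method that repaints the bars up to a given 0..1 ratio using progressColor and a canvas clipping region:

// Hypothetical WaveformVisualizer method, sketched here as one way to show progress
drawProgress(ratio) {
  // Repaint the background and the full waveform in the base stroke color
  this._setCanvasStyles()
  this.draw()

  // Clip to the played portion and redraw it in the progress color
  const playedWidth = this.canvas.width * Math.min(Math.max(ratio, 0), 1)
  this.ctx.save()
  this.ctx.beginPath()
  this.ctx.rect(0, 0, playedWidth, this.canvas.height)
  this.ctx.clip()
  this.ctx.strokeStyle = this.progressColor
  this.draw()
  this.ctx.restore()
}

A Stimulus action bound to an <audio> element's timeupdate event could then call something like this.visualizer.drawProgress(audio.currentTime / audio.duration).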

To facilitate the usage of the controller, we're providing default values for most of the config options and using a visualizerOptions getter to keep them all in the same place.

To use the controller, we just need the following HTML:

<div class="max-w-screen-lg mx-auto px-6 py-16" id="container">
  <div class="space-y-8">
    <% @episodes.each do |episode| %>
      <div class="border border-slate-300 rounded-lg w-full px-2">
        <div class="w-full h-40"
             data-controller="waveform"
             data-waveform-waveform-data-value="<%= episode.data %>"
             data-waveform-bar-width-value="3"
             data-waveform-progress-color-value="#9c59ff"
             data-waveform-stroke-color-value="#9c59ff"
             data-waveform-amplifying-factor-value="1">
          <canvas data-waveform-target="canvas"></canvas>
        </div>
      </div>
    <% end %>
  </div>
</div>

Beyond the canvas target and the waveformData value, we can rely on the default values and the controller should work.

Usage in Rails

In the setup section, we created an Episode model with name, data and duration attributes. Let's create two background jobs: one to extract the waveform data and another to set the duration attribute using an Active Storage analyzer.

Let's start with a SetEpisodeDurationJob class:

# app/jobs/set_episode_duration_job.rb
class SetEpisodeDurationJob < ApplicationJob
  queue_as :default

  def perform(episode)
    return unless episode.audio.attached?

    blob = episode.audio.blob
    analyzer = ActiveStorage::Analyzer::AudioAnalyzer.new(blob)
    metadata = analyzer.metadata

    if metadata && metadata[:duration].present?
      # update_column skips callbacks so the after_save hooks don't re-enqueue the jobs
      episode.update_column(:duration, metadata[:duration].to_i)
    else
      Rails.logger.warn "Couldn't update duration for episode #{episode.name}"
    end
  end
end

Now, let's create a job to extract the waveform using our WaveformData class:

# app/jobs/set_audio_waveform_job.rb
class SetAudioWaveformJob < ApplicationJob
  queue_as :default

  def perform(episode, pixels_per_second = 5)
    return unless episode.audio.attached?

    waveform = WaveformData.new(episode.audio, pixels_per_second)
    waveform_json = JSON.generate(waveform.generate)
    # update_column skips callbacks so the after_save hooks don't re-enqueue the jobs
    episode.update_column(:data, waveform_json)
  end
end

Then, we add them as callbacks to our Episode model:

class Episode < ApplicationRecord
  has_one_attached :audio

  validates :name, presence: true

  after_save :set_duration
  after_save :set_audio_waveform

  private

  def set_duration
    SetEpisodeDurationJob.perform_later(self)
  end

  def set_audio_waveform
    SetAudioWaveformJob.perform_later(self)
  end
end

Now, if we create or edit an Episode, we will set the duration and the waveform data, which we can use to display the waveform as we see fit.
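If you want to try it from the Rails console, something along these lines should work (the file name is just an example, and the jobs need a running job backend to be processed):

episode = Episode.create!(name: "Pilot episode")
episode.audio.attach(
  io: File.open(Rails.root.join("audio-file.mp3")),
  filename: "audio-file.mp3",
  content_type: "audio/mpeg"
)

# Once both background jobs have run:
episode.reload
episode.duration                          # => the track length in seconds
JSON.parse(episode.data)["data"].first(4) # => the first interleaved min/max values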

Summary

In this article, we learned how to generate waveform visualizations for audio files using two different approaches: generating semi-random waveforms that look like audio and using the audiowaveform command to generate them using Ruby.

The first step was to learn a bit about digital audio to better understand the process.

Then we created the visualizations using the Canvas API, first by making everything work with vanilla JavaScript and then by extracting the behavior into a class and encapsulating it with Stimulus.

We also learned how to generate the waveform amplitudes using Ruby and how to integrate it with Rails by adding background jobs to perform the waveform generation after an audio upload.

I hope this article, together with the previous one about building an audio player with Stimulus, can help you implement the feature or improve it with your approach.

Don't hesitate to share with us what you built.

Have a good one and happy coding!
