Using a CDN for Active Storage uploads

By Exequiel Rozas

- October 30, 2024

In our previous article about S3 uploads with Active Storage we learned how to upload images to S3, covering everything from the AWS configuration to handling direct uploads using Active Storage.

In this article, we will explore how to integrate a CDN—specifically CloudFront—to speed up our application.

We will look at two ways to integrate a CDN with Active Storage: the recommended proxy mode, where our application domain is the origin, and a simpler approach where we expose the URLs and fetch the resources directly from the CDN using the S3 bucket as our origin.

Let's start by understanding what a CDN is and why you might benefit from serving your Active Storage uploads with it:

What's a CDN, and why do we need one?

A CDN, short for Content Delivery Network, is a network of servers distributed around the world, designed to cache content from an origin and serve it to users from a location close to them.

The goal of a CDN is to reduce network latency caused by fetching resources from distant servers, improving performance and user experience.

In the context of file uploads, CDNs help us reduce the time spent serving files to our users by caching those files closer to them and optimizing them with techniques like compression.

For example, if we upload a file to an S3 bucket located in the us-east-1 AWS region (Northern Virginia), users who live in Pennsylvania will have a much better experience than users living in Spain or Brazil.

Diagram showing how downloading resources from S3 works

The diagram above shows Alice, who lives in Brazil and wants to see the product page at products/4. She makes a request to our application server, which returns the page with an image that points to our S3 bucket in Virginia.

Alice's browser now requests that image from S3 and displays it on Alice's screen.

The same situation but using a CDN would look like this:

Diagram that shows how resources are fetched from a CDN

Here, Alice requests the same page from the application server, but the returned page includes a link to the product image using a CDN.

The browser would then request the image from the CDN, and if it has the image cached it will return it. Otherwise, it will fetch it from the S3 bucket, cache it and then return it to Alice.
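The hit and miss flow described above can be sketched in a few lines of plain Ruby. This is only a toy model: EdgeCache and the origin hash are hypothetical names standing in for a CDN edge location and the S3 bucket, and a real CDN also deals with expiry, invalidation and much more.

```ruby
# Toy model of a pull-through cache. EdgeCache and origin are
# hypothetical stand-ins for a CDN edge location and an S3 bucket.
class EdgeCache
  attr_reader :origin_requests

  def initialize(origin)
    @origin = origin          # Hash of path => file contents
    @store = {}               # Files already cached at the edge
    @origin_requests = 0      # How many times we had to go to the origin
  end

  # Returns [body, :hit] when the file is cached, [body, :miss] otherwise.
  def fetch(path)
    return [@store[path], :hit] if @store.key?(path)

    @origin_requests += 1
    body = @origin.fetch(path) # Raises KeyError if the origin lacks the file
    @store[path] = body
    [body, :miss]
  end
end

origin = { "/products/4/cover.jpg" => "image-bytes" }
edge = EdgeCache.new(origin)

first  = edge.fetch("/products/4/cover.jpg") # goes to the origin
second = edge.fetch("/products/4/cover.jpg") # served from the edge
```

Only the first request reaches the origin; every later request for the same path is answered by the edge.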

This results in a better experience for Alice because the resource would be located closer to her location and optimized for its delivery.

Using a CDN is also convenient because traffic costs are usually lower than when the assets are served directly from an S3 bucket or an AWS server.

All things considered, there are very few scenarios where using a CDN is not a good option, so you should probably use one when deploying your apps to production.

Active Storage: Redirect and Proxy Mode

By default, Active Storage doesn't return the actual resource address when we call a method like url_for(resource.attachment).

Instead, it returns a URL that redirects to the actual resource URL. This is what's referred to as “Redirect mode”.

The reason for this indirection is to avoid directly exposing the resource link in the response. It also means we can introduce features like mirroring to increase the availability of our resources without our users noticing any changes.

This mode works by presenting a URL to the client (browser) that follows a redirect to the actual resource (our S3 bucket in this case).

However, the redirect links expire, so they're not cacheable by design and therefore not a good fit for a CDN integration.
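To get a feeling for why expiring links and caches don't mix, here is a deliberately simplified sketch. It is not the signing algorithm that S3 or Active Storage use (real S3 presigned URLs rely on AWS Signature Version 4); signed_url, SECRET and the host are made up for the illustration.

```ruby
require "openssl"

# Simplified illustration only: real S3 presigned URLs use AWS Signature
# Version 4. signed_url, SECRET and the host below are hypothetical.
SECRET = "not-a-real-secret"

def signed_url(key, expires_at)
  payload = "#{key}:#{expires_at.to_i}"
  signature = OpenSSL::HMAC.hexdigest("SHA256", SECRET, payload)
  "https://bucket.example.com/#{key}?expires=#{expires_at.to_i}&signature=#{signature}"
end

now = Time.at(1_730_000_000)
url_a = signed_url("abc123", now + 300) # link generated now, valid for 5 minutes
url_b = signed_url("abc123", now + 360) # same file, link generated a minute later

same_file_same_url = (url_a == url_b)
```

Both links point to the same file, yet the URLs differ. A cache that keys on the full URL would treat each freshly signed link as a new resource, so the cached copy would rarely be reused.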

Active Storage redirect mode diagram

The other mode of serving files with Active Storage is the proxy mode. In this mode, our application server acts as a proxy between the client and the storage service.

When using the proxy mode, the client makes a request to the CDN with a path representing the attachment that the CDN would request from our server.

The first time the path gets requested from the CDN, it results in a cache miss, which in turn has our server download the file from the bucket and serve it to the CDN.

Every subsequent request to that path would be answered from the CDN's cache, so the file would be served by a server close to the user.

One advantage of the proxy method over the redirect method is that it hides the actual resource URL from the user.

This means that we can have better control over file access and that we can be sure that the file key can't be accessed by the user.

Keep in mind that the downside of this method is that one of our application server workers stays busy until the file streaming ends, which is not ideal if users are on slow connections.

Active Storage proxy mode diagram

Integrating CloudFront with Active Storage

We assume that you already have a Rails app that can upload files to an S3 bucket using Active Storage. If you don't have one yet, you can follow our tutorial on direct uploads to S3 using Active Storage.

For this tutorial we will be using a public bucket in order to simplify the configuration and better explain the concepts exclusively related to making the CDN integration work.

In order to have public bucket access, you should disable the default Block all public access setting and edit the ACL config.

Please note that if you need more granular access control, or you need some sort of asset hosting multi-tenancy (user A should only access its own resources) you should never set public access for your bucket.

Let's start by creating a distribution:

CloudFront distribution setup

Before anything, we need to configure a CloudFront distribution which is the set of servers that will be caching and delivering our content to end-users.

As we will show two ways of integrating a CDN with Active Storage, we will show how to configure a distribution using an S3 bucket as an origin and then a distribution that uses our application domain as an origin.

Let's start by configuring a distribution with S3 as the origin:

Create a CloudFront Distribution with S3 as an origin

In order to serve our CDN assets using the “direct URL exposure” method, we need to create a CloudFront distribution that sets an S3 bucket as an origin.

View of the CloudFront distribution provisioning

We need to pick the S3 bucket we previously created as the origin domain. For the origin access we can leave it public, but it's better to use Origin Access Control to prevent the origin (our S3 bucket) from being accessed by clients other than the CDN.

Then we need to configure our distribution's behavior:

CloudFront distribution behavior configuration

Then, the caching configuration. We want to make sure that we select the CachingOptimized policy that supports Brotli and gzip compression.

CloudFront caching configuration

After this, we can also choose to enable AWS WAF, which protects against threats like DDoS attacks and other malicious traffic.

CloudFront AWS WAF configuration section

Finally, we can leave the settings with the default values. But, consider that you can set up a custom domain for your CDN or pick the edge locations you will actually use in the 'Price Class' section of the settings.

Please consider that the traffic and request pricing is highly dependent on the edge locations you pick. Delivering traffic to North America and Europe is cheaper than delivering it to South America, Asia, Africa, or Oceania.

CloudFront distribution settings section

After this, our distribution is created and ready to be used. But, before we can jump back to our Rails app, we need to edit our bucket's policy to allow our recently created CloudFront distribution to access it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudfront.amazonaws.com"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::[BUCKET_NAME]/*",
      "Condition": {
        "StringEquals": {
          "AWS:SourceARN": "[CLOUDFRONT_DISTRIBUTION_ARN]"
        }
      }
    }
  ]
}

You can find your CloudFront Distribution ARN (Amazon Resource Name) in the details page under the ARN section of the distribution. It should look like this: arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE
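If you prefer not to edit the JSON by hand, the same policy can be generated from the two values that change. The following is a minimal sketch in plain Ruby; cloudfront_bucket_policy is a hypothetical helper name and the bucket name is a placeholder.

```ruby
require "json"

# Hypothetical helper: builds the bucket policy shown above as a Ruby hash
# so the bucket name and distribution ARN are filled in consistently.
def cloudfront_bucket_policy(bucket_name:, distribution_arn:)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Effect" => "Allow",
        "Principal" => { "Service" => "cloudfront.amazonaws.com" },
        "Action" => "s3:GetObject",
        "Resource" => "arn:aws:s3:::#{bucket_name}/*",
        "Condition" => {
          "StringEquals" => { "AWS:SourceARN" => distribution_arn }
        }
      }
    ]
  }
end

policy = cloudfront_bucket_policy(
  bucket_name: "my-example-bucket", # placeholder bucket name
  distribution_arn: "arn:aws:cloudfront::123456789012:distribution/EDFDVBD6EXAMPLE"
)

# Prints JSON that can be pasted into the bucket policy editor.
puts JSON.pretty_generate(policy)
```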

Create a CloudFront distribution with our domain as an origin

In order to integrate the Active Storage proxy file serving mode, we need to create a CloudFront distribution with our application's domain as an origin.

The first step is to add the origin configuration:

CloudFront application server as origin configuration for Active Storage

We just need to add our origin domain and select Match Viewer in the protocol section.

Replace origindomain.com with the domain for your application.

After this, we need to set custom headers that will be included in every CloudFront request to our application server:

Customizing CloudFront headers for our origin

The headers that we're adding here are:

  • Access-Control-Allow-Origin:

Then, we configure the 'Cache key and origin requests' section with the following information:

Configuring cache key and origin requests in CloudFront

For the settings part of the configuration, we can repeat the same values from the S3 origin configuration: pick our preferred edge locations, set HTTP/2 as the supported HTTP version, no “Standard Logging” and IPv6 set to “On”.

In order to allow the CloudFront distribution to reach our servers, we need to add its host to the list of hosts our application accepts:

# config/application.rb
config.hosts << ENV["CDN_HOST"]

Where ENV["CDN_HOST"] would be our distribution's host name, something like: d123456abcdefg.cloudfront.net
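If CDN_HOST happens to be missing in some environment, the line above would append nil to the list of allowed hosts. A small guard avoids that. This is only a sketch: allow_cdn_host is a hypothetical helper, and the hosts argument stands in for config.hosts so the example runs outside Rails.

```ruby
# Hypothetical helper: hosts stands in for Rails' config.hosts.
# Only adds the CDN host when the variable is actually set.
def allow_cdn_host(hosts, env = ENV)
  cdn_host = env["CDN_HOST"].to_s.strip
  hosts << cdn_host unless cdn_host.empty?
  hosts
end

with_cdn    = allow_cdn_host([], { "CDN_HOST" => "d123456abcdefg.cloudfront.net" })
without_cdn = allow_cdn_host([], {})
```

Inside config/application.rb the same guard can be written inline against config.hosts.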

If we configured everything correctly, we should be able to serve our assets from our origin using the “Proxy mode” from Active Storage.

In case you don't know how that works, let's learn how the two file serving modes work:

Serving uploaded assets with the CDN

There's more than one method to serve the stored assets using a CDN.

Let's start with the simplest:

Serving assets directly

With this method, we construct the URL by appending the key for our file to our CDN's base URL (CDN_URL in the code below), and that's what we return to our users.

It's a quick and “dirty” way to achieve the goal, but there are some cases where it makes sense to do it like this.

We can do that by defining a direct route:

# config/routes.rb
direct :cdn_image do |model|
  if model.respond_to?(:signed_id) # We're dealing with a Blob or an Attachment
    File.join(ENV["CDN_URL"], model.key)
  else # Otherwise (for example, a variant) we use the key of the underlying blob
    File.join(ENV["CDN_URL"], model.blob.key)
  end
end

We could also use a Rails helper or even return the URL directly from a model via a method or a concern in case you want to control which models generate this URL.
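As a rough illustration of that alternative, the URL building can live in a small module instead of a route. CdnUrl, FakeBlob, FakeVariant and the base URL are hypothetical names used so the sketch runs outside Rails; in an application the objects would be real Active Storage records.

```ruby
# Hypothetical module that builds a CDN URL from a blob key.
module CdnUrl
  # base_url is assumed to include the scheme,
  # e.g. "https://d123456abcdefg.cloudfront.net"
  def self.for(record, base_url: ENV.fetch("CDN_URL"))
    # Mirrors the direct route: objects that respond to signed_id expose
    # the key themselves, anything else is asked for its blob first.
    key = record.respond_to?(:signed_id) ? record.key : record.blob.key
    File.join(base_url, key)
  end
end

# Stand-ins for Active Storage objects, only for this example.
FakeBlob = Struct.new(:key, :signed_id)
FakeVariant = Struct.new(:blob)

blob = FakeBlob.new("abc123", "signed-id")
variant = FakeVariant.new(blob)

blob_url = CdnUrl.for(blob, base_url: "https://d123456abcdefg.cloudfront.net/")
variant_url = CdnUrl.for(variant, base_url: "https://d123456abcdefg.cloudfront.net")
```

File.join takes care of a trailing slash in the base URL, so both calls produce the same address.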

Please note that this way of serving assets has a couple of disadvantages for some use cases:

  1. The URL is public and permanent: unless we remove the assets from our bucket anyone who has access to this URL can access it and there's not much we can do to avoid that. In some cases, even removing the asset from the bucket won't work because it will still be available in the CDN PoPs (points of presence) where users already requested the asset.
  2. There's no access control: this method is not adequate for scenarios where you would like to authorize access to certain resources.

Please consider that this method won't work with the way we configured the CloudFront distribution. In order to make it work, you would need to edit the Origin configuration and allow public access. Your bucket's configuration would need to be set to public too. AWS officially discourages this configuration, so proceed with caution.

Using Active Storage proxy mode

This method is the recommended way of integrating a CDN with Rails and Active Storage according to the official guides.

When implemented, every request goes through the CDN which returns the desired asset. The first request to a given asset would be routed from the CDN to our application server that downloads the file from the storage service and streams it to the CDN.

Every subsequent request would use the same URL as the first, but if the resource was cached by the CDN it would be retrieved from it.

To use the proxy mode we can set our application to serve every file using this mode by adding the following to an initializer or the application configuration:

# config/initializers/active_storage.rb
Rails.application.config.active_storage.resolve_model_to_route = :rails_storage_proxy

# Or at config/environments/development.rb or config/environments/production.rb
config.active_storage.resolve_model_to_route = :rails_storage_proxy

After configuring the proxy mode we would need to call the rails_storage_proxy_path in our HTML or API responses:

<%= image_tag rails_storage_proxy_path(image) %>

Or, we can alternatively proxy the files we need with a route helper using the direct method:

# Note that the conditional is introduced because we can pass instances that respond to 'signed_id' (Attachment, Blob)
direct :cdn_image do |model, options|
  expires_in = options.delete(:expires_in) { ActiveStorage.urls_expire_in }

  if model.respond_to?(:signed_id)
    route_for(
      :rails_service_blob_proxy,
      model.signed_id(expires_in: expires_in),
      model.filename,
      options.merge(host: ENV['CDN_HOST'])
    )
  else
    signed_blob_id = model.blob.signed_id(expires_in: expires_in)
    variation_key  = model.variation.key
    filename       = model.blob.filename

    route_for(
      :rails_blob_representation_proxy,
      signed_blob_id,
      variation_key,
      filename,
      options.merge(host: ENV['CDN_HOST'])
    )
  end
end

When we then call cdn_image_url(image) or cdn_image_url(image.variant(:thumb)) it would generate a URL with the following structure:

https://d123456abcdefg.cloudfront.net/rails/active_storage/blobs/proxy/#{signed_id}/#{filename}

The browser would then request the resource from the CDN which would produce a cache miss on the first try, so the request would get routed to our application which would download the file from S3 and return it for the CDN to cache it.

Future requests to the URL would result in a cache hit.

Conclusions

Putting a CDN in front of our application's attachments is almost always a good idea, considering it improves user experience and reduces bandwidth costs.

In this article we explored two ways to integrate CloudFront:

  • A direct mode by generating a CloudFront URL to our assets via their Blob key and setting our S3 bucket as the distribution origin.
  • Using Active Storage's proxy mode where our application would act as a proxy between the client and the storage service. To implement this mode we need to set our domain as the distribution's origin and configure it appropriately.

Both methods have pros and cons and it's up to you to decide which one is better for your situation.
