Rails: How to Send and Receive Webhooks

There are two pieces to webhooks in Flipper Cloud – sending and receiving. Typically, a webhook system is only about the sending, and receiving is left as an exercise for the customer.

But our situation is different. All feature flag data is stored locally in your application to keep reads blazing fast. In order to do this, we need to update your application data when things change in Cloud. This means we need a mechanism to sync.

Cloud supports two ways of syncing – polling (in development) and webhooks. But webhooks are the recommended way for syncing changes in Cloud to your application – especially in production. They ensure your application requests are never tied up polling for new data.
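
If you're wondering what that looks like on the application side, here's a rough sketch of wiring up Cloud with a sync secret for verifying webhooks. The block form and option names are assumptions on my part – check the flipper-cloud docs for your version:

# config/initializers/flipper.rb
require "flipper/cloud"

Flipper.configure do |config|
  config.default do
    Flipper::Cloud.new(ENV["FLIPPER_CLOUD_TOKEN"]) do |cloud|
      # the secret Cloud signs webhook requests with and your
      # app verifies them with (more on receiving later)
      cloud.sync_secret = ENV["FLIPPER_CLOUD_SYNC_SECRET"]
    end
  end
end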

I didn't want every customer building their own webhook receiver and sync, so I knew I'd need to provide something automatic for them.

Before you can receive, you must send.

Sending Webhooks

I knew that webhooks would be tied to an environment. For those who are new here (👋) or need a refresher, Flipper Cloud is multi-environment in a few different ways.

The main relationships you need to know are (sketched as models just after this list):

  • an Organization is the root.
  • Organization has_many :projects
  • Project has_many :environments (e.g. QA, staging, production, etc.).
  • Environment has_many :webhooks
  • And, finally, Webhook has_many :webhook_responses
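
In model form, that hierarchy looks something like the sketch below. The belongs_to side and everything beyond the associations is assumed for illustration:

class Organization < ApplicationRecord
  has_many :projects
end

class Project < ApplicationRecord
  belongs_to :organization
  has_many :environments
end

class Environment < ApplicationRecord
  belongs_to :project
  has_many :webhooks
end

class Webhook < ApplicationRecord
  belongs_to :environment
  has_many :webhook_responses
end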

Managing the hooks

The boring, and perhaps more obvious, part of all this is the managing of webhooks. This looks like a fairly typical Rails CRUD section.

As of the time of writing, our webhooks table looks like this:

create_table "webhooks", force: :cascade do |t|
  t.string "url"
  t.integer "state", default: 0, null: false
  t.integer "creator_id", null: false
  t.integer "environment_id", null: false
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.string "secret", limit: 50, null: false
  t.text "description"
  t.index ["environment_id"], name: "index_webhooks_on_environment_id"
end

We have a pretty standard Rails model wrapped around the table. It looks mostly like this:

require "securerandom"

class Webhook < ApplicationRecord
  # Regex that webhook url must match.
  URL_REGEX = URI.regexp(['http', 'https'])

  # Hosts that we don't want to make requests to.
  DENIED_HOSTS = %w(localhost 127.0.0.1).freeze

  has_secure_token :secret

  belongs_to :creator, class_name: "User"
  belongs_to :environment

  has_many :responses, class_name: "WebhookResponse", dependent: :destroy

  delegate :project, to: :environment, allow_nil: true
  delegate :organization, to: :project, allow_nil: true

  enum state: {enabled: 0, deleted: 1, disabled: 2, disabled_by_admin: 3}

  validates :url,
    presence: true,
    format: { with: URL_REGEX, allow_blank: true }
  validates :creator_id, presence: true
  validates :environment_id, presence: true
  validate :url_host

  scope :created, -> { where.not(state: :deleted) }
  
  private

  def url_host
    return if url.blank?
    uri = URI(url)
    if DENIED_HOSTS.include?(uri.host)
      errors.add(:url, "is invalid")
    end
  end
end
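
To make the denied hosts validation concrete, here is a hypothetical console session:

webhook = Webhook.new(url: "http://localhost/hooks")
webhook.valid?        # => false
webhook.errors[:url]  # => ["is invalid"]

webhook.url = "https://example.com/flipper_webhook"
webhook.valid?        # url passes now (creator and environment still required)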

In the pics below, I show the UI for managing webhooks. In each of the screenshots, the UI is scoped to:

  • Fewer & Faster (organization)
  • Speaker Deck (project)
  • Production (environment)

[Screenshot: Viewing the list of webhooks]
[Screenshot: Adding a new webhook]
[Screenshot: Viewing an individual webhook and its responses]

The controller for this area is, as you'd expect, a very stock Rails controller. If you recently read my post on reducing friction at the authorization layer, you'll notice some of the policy enforcement below.

class WebhooksController < ApplicationController
  layout "environment"
  before_action :find_organization_project_and_environment
  before_action :check_for_enabled_webhooks

  def index
    authorize @environment, :view_webhooks?
    @webhooks = @environment.webhooks.created.order(url: :asc).paginate(page: params[:page])
    Webhook.preload_latest_responses(@webhooks)
  end

  def show
    @webhook = @environment.webhooks.find(params[:id])
    @webhook_responses = FindWebhookResponses.call(@webhook, page: params[:page])
  end

  def new
    authorize @environment, :create_webhook?
    @webhook = @environment.webhooks.build(creator: current_user)
  end

  def create
    @webhook = CreateWebhook.call(
      actor: current_user,
      environment: @environment,
      url: webhook_params[:url],
      description: webhook_params[:description]
    )

    if @webhook.persisted?
      redirect_to [@organization, @project, @environment, @webhook], notice: 'Webhook was successfully created.'
    else
      render :new
    end
  end

  def edit
    @webhook = @environment.webhooks.find(params[:id])
    authorize @webhook, :update?
  end

  def update
    @webhook = @environment.webhooks.find(params[:id])
    authorize @webhook, :update?

    if @webhook.update(webhook_params)
      redirect_to [@organization, @project, @environment, @webhook], notice: 'Webhook was successfully updated.'
    else
      render :edit
    end
  end

  def destroy
    @webhook = @environment.webhooks.find(params[:id])
    authorize @webhook, :destroy?

    @webhook.deleted!
    redirect_to [@organization, @project, @environment, :webhooks], notice: 'Webhook was successfully deleted.'
  end

  def send_test
    @webhook = @environment.webhooks.find(params[:id])
    webhook_policy = @webhook.policy(current_user)

    if webhook_policy.send_test?
      @webhook.sync
      flash[:notice] = "Webhook test enqueued."
    else
      flash[:error] = webhook_policy.send_test.message
    end

    redirect_to [@organization, @project, @environment, @webhook]
  end

  def regenerate_secret
    @webhook = @environment.webhooks.find(params[:id])
    authorize @webhook, :regenerate_secret?
    @webhook.regenerate_secret
    redirect_to [@organization, @project, @environment, @webhook], notice: "Webhook secret regenerated. Be sure to update any place you were using the old secret with the new one."
  end

  private

  def webhook_params
    webhook_params = params.require(:webhook).permit(:url, :description, :state)

    # If trying to use a state you don't have access to then remove the state.
    if state = webhook_params[:state]
      attempting_value_without_permission = !Webhook.state_select_valid_value?(current_user, state)
      non_admin_attempting_to_enable_admin_disabled = @webhook && @webhook.disabled_by_admin? && !current_user.admin?
      if attempting_value_without_permission || non_admin_attempting_to_enable_admin_disabled
        webhook_params.delete(:state)
      end
    end

    webhook_params
  end

  def find_organization_project_and_environment
    @organization = current_organization
    @project = @organization.projects.friendly.find(params[:project_id])
    @environment = @project.environments.friendly.find(params[:environment_id])
    authorize @environment, :show?
  end

  def check_for_enabled_webhooks
    unauthorized unless Flipper.enabled?(:webhooks, @organization)
  end
end

There is a bit of gnarly code around determining if the user can edit the state that I'd like to clean up with a policy method, but I left it because I believe in showing you the code, warts and all.

If you noticed Webhook.preload_latest_responses and wondered about it, you can read my post on never querying the same thing more than once.

Protection with a feature flag

You may also notice that I check if the webhooks feature flag is enabled in a before_action. This allowed me to work on webhooks in production without customers using them before they were fully baked.

Could I remove that feature now that webhooks are fully released? Sure, I could. But it isn't hurting anything and does allow me to turn them off if I ever need to. For that reason, I've left the feature check in.
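
Flipping it on for a single organization is one line. enable_actor is standard Flipper API – the organization just needs to respond to flipper_id:

# roll webhooks out to one organization at a time
Flipper.enable_actor(:webhooks, organization)

# and back off instantly if something goes sideways
Flipper.disable_actor(:webhooks, organization)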

Firing the hooks

Now that people could manage webhooks, I needed to choose where and when a webhook request should actually be fired.

Thankfully, Flipper Cloud already has audit history for feature flags. When features are created, enabled, or disabled in any way, we store a record of that. This audit history is built into the custom Flipper adapter that Cloud uses to manage feature flag data for an environment.

I already had a change method in the adapter that was creating the audit history. That seemed like the perfect place to also send webhooks. I created a new WebhookJob so I could enqueue a background job in the change method.
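
The adapter itself is private to Cloud, but the shape of that change method is roughly this (a sketch – the helper name is an assumption):

def change(attributes)
  record_audit_history(attributes) # already existed
  WebhookJob.perform_later(environment) # the new bit
end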

This is where it starts to get 🧃-y (aka juicy).

What could go wrong

My blessing and curse in life is fear of failure. It extends from my personal life (how should I invest this money) to my professional (how should I write this code).

I can't write a single line of code without mentally creating a matrix in my head of:

  • how can this fail
  • how likely is this to fail
  • how bad would it be if it failed
  • how easy is it to handle failure
  • how can I remedy the failure
  • and the list goes on...

Because of this curse, I immediately struggled with this job. We allow multiple webhooks per environment. I didn't want to enqueue N + 1 jobs where N is the number of webhooks for the environment. What if one of the jobs failed to enqueue?

Also, I like constants. No, not Ruby constants like User or BLAH_BLAH, but bits of code that do a constant number of network calls – aka anti-decay programming.

I wanted a single job enqueue at this code point, not a variable number of enqueues. So I enqueued the job with the environment and looped through the webhooks in the job like so:

class WebhookJob < ApplicationJob
  queue_as :webhooks

  def perform(environment)
    environment.webhooks.enabled.find_each do |webhook|
      SyncWebhook.call(webhook)
    end
  end
end

What if the job performed, one webhook got a successful response, and another got a failed one? I would need to retry the failed one. I'd like to make a best effort to eventually get a hook request to that endpoint.

If I let the job system handle the retry, then each webhook that succeeded would receive another webhook. I could change this job and make it enqueue one job per webhook. At least this keeps the N+1 to the job system and out of any web request.

But what if there are 4 webhooks for this environment, 2 get enqueued and 2 don't because the job gets restarted for any number of reasons. When the job retries, it will enqueue 2 more jobs for the webhooks that already had jobs enqueued.

At least once

I don't need exactly-once processing, as it simply isn't worth the effort for this use case. I'm going for at-least-once, with a duplicate here and there being ok. It's just a sync request. If we have a few extra syncs, no biggie.

But even if I created a webhook response in the first two jobs, they'd have no way of knowing whether this is a new sync request or the previous one that was enqueued as a duplicate.

Shower driven development

In the 🚿 later that day, it hit me. I need an identifier. If I enqueue this job with a UUID and store records of webhook requests with that UUID, I can have good enough certainty to discard duplicates.

Enqueueing with the UUID is straightforward:

require "securerandom"
WebhookJob.perform_later(environment, SecureRandom.uuid)

The changes to the job are minor:

class WebhookJob < ApplicationJob
  queue_as :webhooks

  def perform(environment, uuid)
    environment.webhooks.enabled.find_each do |webhook|
      SyncWebhook.call(webhook, uuid: uuid)
    end
  end
end

This tiny little unique identifier makes it far easier to make background jobs idempotent.

I added a delivered? method on Webhook.

class Webhook
  has_many :responses, class_name: "WebhookResponse", dependent: :destroy
  
  def delivered?(uuid)
    responses.where(delivery_id: uuid).exists?
  end
end

Note: I also have a unique index on webhook_id and delivery_id. This prevents the duplicate from being stored.
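
The migration for that index is tiny. Something like this (table and column names assumed from the delivered? query above, and your migration version will vary):

class AddDeliveryIndexToWebhookResponses < ActiveRecord::Migration[6.1]
  def change
    add_index :webhook_responses, [:webhook_id, :delivery_id], unique: true
  end
end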

And I can then use that method in SyncWebhook to determine if we really need to sync or if we already have a response. Again, this isn't perfect as it has a read-then-write race, but it is good enough.

require "securerandom"

class SyncWebhook
  def self.call(webhook, uuid: SecureRandom.uuid)
    new(webhook, uuid: uuid).call
  end
  
  NOOP = GitHub::Result.new

  def initialize(webhook, uuid: SecureRandom.uuid)
    @webhook = webhook
    @uuid = uuid
  end

  def call
    return NOOP if webhook.delivered?(@uuid)

    # do the sync
  end
end

NOOP is just a sentinel value so I can tell if a hook was actually fired or not. I'd ignore it for now if I were you.
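
If you ever did care, a sentinel compares by identity, so the check looks like this:

result = SyncWebhook.call(webhook)
if result.equal?(SyncWebhook::NOOP)
  # nothing was sent – the policy said no or this uuid was already delivered
end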

I can think of so many places this simple idea of enqueueing a job with an identifier would have been helpful. Not sure why it didn't occur to me before. Really any job that does more than one thing could benefit from this.
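
The general shape is always the same: pass an identifier in, key the bookkeeping on it, and skip what is already done. A generic sketch (completed? and process are hypothetical):

class MultiStepJob < ApplicationJob
  def perform(parent, uuid)
    parent.items.find_each do |item|
      next if item.completed?(uuid) # bookkeeping keyed on the uuid
      item.process(uuid: uuid)
    end
  end
end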

Time to party

I'd be cheating you if that was all I showed for the sending, so I'm going to leave the whole SyncWebhook class here for the curious. I'll add some comments to hopefully explain any parts that could cause confusion.

require "json" # to encode the body
require "httparty" # because I like to party
require "securerandom"
# something similar to ActiveSupport::MessageVerifier 
# and Stripe's webhook verification. I added it 
# separately to avoid needing to pull either of those
# as dependencies.
require "flipper/cloud/message_verifier"

class SyncWebhook
  # convenience method at the class level
  def self.call(webhook, uuid: SecureRandom.uuid)
    new(webhook, uuid: uuid).call
  end

  # convenience reader methods
  attr_reader :webhook
  attr_reader :uuid
  attr_reader :timestamp

  # sentinel value for "nothing happened"
  NOOP = GitHub::Result.new

  # pretty stock initialize
  def initialize(webhook, uuid: SecureRandom.uuid)
    @webhook = webhook
    @uuid = uuid
    @timestamp = Time.now
  end

  # actually try to sync the hook
  def call
    # only sync if the policy is allowing it (aka they paying)
    return NOOP unless webhook.policy.sync?
    # no need to deliver if already delivered
    return NOOP if webhook.delivered?(uuid)

    # for tracking the duration
    start = now
    
    # it's ok if this fails for now, we just want to wrap it up,
    # but now that I look at it, this should rescue with another
    # enqueue and a new uuid, maybe the old uuid as a parent uuid
    GitHub::Result.new {
      HTTParty.post(webhook.url, request_options)
    }.map { |response|
      # if the response succeeds, set the db attributes for code and headers
      webhook_response_attributes[:status] = response.code
      webhook_response_attributes[:headers] = response.headers
      response
    }.map { |response|
      # we return the Flipper registered groups in the hook 
      # endpoint so this ensures they exist for the environment
      create_groups(response.body) if response.code == 200
    }.rescue { |error|
      # if response fails then track the error in db, 
      # send to error tracking and return error
      webhook_response_attributes.update({
        error: {
          class: error.class.name,
          message: error.message[0..100],
          backtrace: error.backtrace.first(5)
        }
      })
      Raven.capture_exception(error)
      GitHub::Result.error(error)
    }
    # store duration so we know slow hooks
    webhook_response_attributes[:duration] = now - start

    # try to create the webhook response in the database
    GitHub::Result.new { webhook.responses.create!(webhook_response_attributes) }
  end

  private

  def webhook_response_attributes
    @webhook_response_attributes ||= {
      request_url: webhook.url,
      request_headers: request_headers,
      delivery_id: uuid,
    }
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  end

  # Heavily rescued method with best effort of creating the groups in the
  # response body.
  #
  # This is very defensive as people can setup webhooks to anywhere and return
  # any response body they want.
  #
  # Returns GitHub::Result.
  def create_groups(response_body)
    GitHub::Result.new {
      JSON.parse(response_body)
    }.map { |data|
      if groups = data["groups"]
        if groups.is_a?(Array)
          registered_groups = groups.map { |row| row.is_a?(Hash) ? row["name"] : nil }.compact.uniq
          RegisterAppGroups.call(webhook.environment, registered_groups)
        end
      end
    }.rescue { |exception|
      Raven.capture_exception(exception)
      GitHub::Result.error(exception)
    }
  end

  # this and the next few methods just set some 
  # reasonable timeouts, we don't want to sit here
  # forever waiting on endpoints
  def read_timeout
    Integer(ENV["WEBHOOK_READ_TIMEOUT"].presence || 10)
  end

  def open_timeout
    Integer(ENV["WEBHOOK_OPEN_TIMEOUT"].presence || 10)
  end

  def connect_timeout
    Integer(ENV["WEBHOOK_CONNECT_TIMEOUT"].presence || 10)
  end

  def message_verifier
    @message_verifier ||= Flipper::Cloud::MessageVerifier.new(secret: webhook.secret)
  end

  def signature
    @signature ||= message_verifier.generate(request_body, timestamp)
  end

  def header
    @header ||= message_verifier.header(signature, timestamp)
  end

  def request_headers
    @request_headers ||= {
      "Content-Type" => "application/json",
      "Flipper-Cloud-Signature" => header,
    }
  end

  # sync! 
  def request_body
    @request_body ||= JSON.generate({
      "environment_id" => webhook.environment_id,
      "webhook_id" => webhook.id,
      "delivery_id" => uuid,
      "action" => "sync",
    })
  end

  # add all the timeouts and details
  def request_options
    @request_options ||= {
      body: request_body,
      headers: request_headers,
      read_timeout: read_timeout,
      open_timeout: open_timeout,
      connect_timeout: connect_timeout,
    }
  end
end

I think a case could be made to break this up into a few smaller classes (like all the http stuff). But my new life theme is good enough for now.

For more on why I'm using GitHub::Result all over the place, check out my resilience for ruby post. Also, I've previously written about the RegisterAppGroups class in my post covering rails insert_all and upsert_all.

Now that sending hooks was working, it was time to add receiving.

Receiving Webhooks

The receiving end of webhooks is all public in OSS Flipper. But I'll cover it here for thoroughness' sake. Many people these days just create Rails engines. I'd love to only do that. But doing only that means you leave out Sinatra, Hanami, Roda and friends.

I'm not one to leave anyone out and we have a standard for stuff like this – Rack. My typical route is instead to create a middleware that is used by Rack apps that get mounted in Rails routes or URL mapped in rackup files for the rest.

This makes it so everything built on Rack can Just Work™. And you can still add a Rails engine to mount the Rack app for automatic Rails support.

The middleware

The middleware for Flipper Cloud receives a webhook request (from Cloud), validates the signature, and performs a sync. It also returns the groups as I mentioned above, so we can keep those in sync for you. That's it.

require "flipper/cloud/message_verifier"

module Flipper
  module Cloud
    class Middleware
      # Internal: The path to match for webhook requests.
      WEBHOOK_PATH = %r{\A/webhooks\/?\Z}
      # Internal: The root path to match for requests.
      ROOT_PATH = %r{\A/\Z}

      def initialize(app, options = {})
        @app = app
        @env_key = options.fetch(:env_key, 'flipper')
      end

      def call(env)
        dup.call!(env)
      end

      def call!(env)
        request = Rack::Request.new(env)
        
        # backwards compatible stuff to make / and /webhooks work.
        if request.post? && (request.path_info.match(ROOT_PATH) || request.path_info.match(WEBHOOK_PATH))
          status = 200
          headers = {
            "Content-Type" => "application/json",
          }
          body = "{}"
          payload = request.body.read
          signature = request.env["HTTP_FLIPPER_CLOUD_SIGNATURE"]
          flipper = env.fetch(@env_key)

          begin
            message_verifier = MessageVerifier.new(secret: flipper.sync_secret)
            if message_verifier.verify(payload, signature)
              # message is valid so let's do a sync
              begin
                # sync
                flipper.sync
                
                # return the groups in the response so cloud can pull them
                body = JSON.generate({
                  groups: Flipper.group_names.map { |name| {name: name}}
                })
              rescue Flipper::Adapters::Http::Error => error
                # if we received an error, check if payment is 
                # issue or something else
                status = error.response.code.to_i == 402 ? 402 : 500
                headers["Flipper-Cloud-Response-Error-Class"] = error.class.name
                headers["Flipper-Cloud-Response-Error-Message"] = error.message
              rescue => error
                status = 500
                headers["Flipper-Cloud-Response-Error-Class"] = error.class.name
                headers["Flipper-Cloud-Response-Error-Message"] = error.message
              end
            end
          rescue MessageVerifier::InvalidSignature
            status = 400
          end

          [status, headers, [body]]
        else
          @app.call(env)
        end
      end
    end
  end
end

Not the prettiest of code. But as I age, I'd like to think I'm a little better at letting working code that isn't the cleanest sneak into production. The reality is the world runs on dirty code, and that middleware is probably in the top 1% of beautiful code despite its length.

The Rack app

Cloud has an app method for building the Rack app that you can then easily mount in your routes (for Rails) or url map in your rackup (for alternative frameworks).

module Flipper
  module Cloud
    def self.app(flipper = nil, options = {})
      env_key = options.fetch(:env_key, 'flipper')
      memoizer_options = options.fetch(:memoizer_options, {})

      app = ->(_) { [404, { 'Content-Type'.freeze => 'application/json'.freeze }, ['{}'.freeze]] }
      builder = Rack::Builder.new
      yield builder if block_given?
      builder.use Flipper::Middleware::SetupEnv, flipper, env_key: env_key
      builder.use Flipper::Middleware::Memoizer, memoizer_options.merge(env_key: env_key)
      builder.use Flipper::Cloud::Middleware, env_key: env_key
      builder.run app
      klass = self
      builder.define_singleton_method(:inspect) { klass.inspect } # pretty rake routes output
      builder
    end
  end
end

The app method is mostly boring – just building up a Rack::Builder instance. But you can see the builder.use Flipper::Cloud::Middleware line where it wires in the middleware from above.

You can then use it by mounting it in your Rails routes.

Rails.application.routes.draw do
  mount Flipper::Cloud.app, at: "_flipper"
end

This adds an endpoint for your app at yourapp.com/_flipper (or wherever you mounted it).
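
For the non-Rails frameworks, the equivalent is a url map in your rackup file. A sketch, where MyApp stands in for whatever your application is:

# config.ru
require "flipper/cloud"

map "/_flipper" do
  run Flipper::Cloud.app
end

run MyApp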

Wrap Up

So there you have it. This post covered:

  • managing webhooks
  • finding the point in your code where you'll need to send the webhook
  • how to party when actually sending the webhook
  • how to receive the webhook on the other side

This is definitely a first iteration and will harden over time as all good production code does. But so far it has been working great. Nearly all of our customers are using hooks and we haven't had any problems to date.

Welp. That was another long one – a common theme here of late. But I hope you enjoyed it and maybe learned a little something. It took me around 3+ hours to write.

If you did enjoy or learn anything, you should probably check out Flipper Cloud. Sign up, kick the tires and let me know what you think. I crave feedback.

P.S. It is worth noting that I don't yet have backoff in retries or automatic disabling of failing hooks or all the other things that would be good. As I need and create these systems, I'll do my best to write them up as well.

P.P.S. If you'd like to see the functioning of this system in production from end to end, check out this one minute video.