Rails: Postgres Native Partitioning

At Box Out and Flipper Cloud we've got some big tables. Not huge, but big (tens to hundreds of millions of rows). I've been meaning to try pg_partman, pgslice, or native postgres partitioning for a while. I finally did and, as usual, thought I should write it up for posterity.

pgslice

pgslice is pure ruby. That's cool (to me). But the gem's community is tiny compared to pg_partman and native postgres partitioning.

When deciding on a dependency, always factor community in as a large part of why you choose what you do.

You likely don't want to maintain your choice long term or you wouldn't be looking for something that does it for you.

pg_partman

pg_partman is old and trusty. We used it for some GitHub analytics stuff at GitHub years ago (might still be in use). So I have some history and comfort with it.

It works with both trigger-based (old) and native declarative (new) partitioning.

It will create new partitions and prune old ones. Pretty much does it all.

To use pg_partman you need an extension. It's popular enough that most postgres services come with it (Heroku, AWS, etc.).
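For flavor, here's roughly what setup looks like with pg_partman 4.x (a sketch only; the table name is illustrative and the function signature varies by version, so check the docs for yours):

-- pg_partman 4.x sketch: register a parent table, get daily partitions premade.
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

SELECT partman.create_parent(
  p_parent_table => 'public.adapter_request_logs',
  p_control      => 'created_at',
  p_type         => 'native',
  p_interval     => 'daily',
  p_premake      => 4
);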

Local Dev Pain

Installing the extension on your laptop isn't too hard, but perhaps not as easy or automatic as many teammates (designers, front end devs, etc.) would appreciate. This makes it a bit annoying for local dev.

One option to get past local dev pain is to not partition locally. Local tables aren't large anyway, so just skip it. But now you have a huge fork in your app.

Production now works differently than local dev and, trust me, that difference will bring your site down at some point as your team grows. 😂

Another option is docker. Now you have to use docker on your laptop!

I kid. I kid. Docker is cool. But I'm avoiding it as long as I can for as much as I can. So far so good.

Native Postgres Partitioning

After researching for a while, most everything I read recommended using native declarative postgres partitioning for performance reasons.

Faster Writes

The old way of partitioning involved triggers to put data in the right table. Native partitioning doesn't do this. No triggers means faster writes. Every little bit adds up.
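For context, the old trigger-based setup looked something like this (a sketch with made-up table names, not my actual schema):

-- Old way (sketch): a trigger function routes every INSERT to the right child.
CREATE OR REPLACE FUNCTION logs_insert_trigger() RETURNS trigger AS $$
BEGIN
  IF NEW.created_at >= '2023-01-15' AND NEW.created_at < '2023-01-16' THEN
    INSERT INTO logs_2023_01_15 VALUES (NEW.*);
  -- ELSIF ... one branch per partition ...
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_insert BEFORE INSERT ON logs
  FOR EACH ROW EXECUTE FUNCTION logs_insert_trigger();

-- New way: declare the scheme once and postgres routes rows itself.
CREATE TABLE logs (created_at timestamp NOT NULL) PARTITION BY RANGE (created_at);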

Faster Reads

You don't just get faster writes though. Reads are also faster (sometimes). Native knowledge of partitioning means faster reads because postgres can use query conditions to prune partitions that wouldn't have matching data.
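You can see the pruning with EXPLAIN. Constrain a query on the partition key and only matching partitions appear in the plan (dates illustrative, plan output abbreviated):

EXPLAIN SELECT count(*)
FROM partitioned_adapter_request_logs
WHERE created_at >= '2023-01-15' AND created_at < '2023-01-16';
-- Only the one matching partition shows up in the plan, e.g.:
--   Aggregate
--     ->  Seq Scan on partitioned_adapter_request_logs_2023_01_15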

For both reads and writes, the fastest query is the one you don't make.

pg_party

I looked at pg_party for a while – a long while. I really couldn't quite get it to do what I wanted.

And honestly, I wasn't even totally sure exactly what I wanted because I hadn't used native partitioning before in any capacity.

Droppin' Low

Sometimes you gotta bang with the raw, low level stuff and feel the pain before you use high level stuff (that supposedly takes pain away).

So I whipped out trusty old GitHub::SQL. And what did I do?

I wrote some SQL, my friends. By hand! #wild #night

The result? It turned out pretty cool.

The Migration

For starters, I partitioned API request logs by day.

class CreatePartitionedAdapterRequestLogs < ActiveRecord::Migration[6.1]
  def change
    execute <<~SQL
      CREATE TABLE partitioned_adapter_request_logs (
        status smallint NOT NULL,
        method character varying(20) NOT NULL,
        path character varying(2000) NOT NULL,
        ip inet NOT NULL,
        token_id integer,
        created_at timestamp without time zone NOT NULL,
        headers jsonb
      ) PARTITION BY RANGE (created_at);
    SQL

    add_index :partitioned_adapter_request_logs, :token_id
    add_index :partitioned_adapter_request_logs, :created_at, order: {created_at: :desc}
  end
end
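One thing worth knowing: the parent table never holds rows itself. Data only lands in per-day child partitions, which the PartitionByDay class below creates by running SQL like this (dates illustrative):

CREATE TABLE IF NOT EXISTS partitioned_adapter_request_logs_2023_01_15
PARTITION OF partitioned_adapter_request_logs
FOR VALUES FROM ('2023-01-15 00:00:00') TO ('2023-01-16 00:00:00');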

Partitioning By Day

I then whipped together a PartitionByDay class to manage partition creation and retention. You could tweak this to be by week, month, year or whatever you need.

I needed by day because the data I was partitioning loses value really quickly. I also create aggregates from this raw data that don't get deleted, so losing the raw data doesn't really matter.

Note: if you try to use the code below, you'll also need these patches to GitHub::SQL so it doesn't get confused between binds and postgres casts (::). You'll also need the virtus gem, or to tweak the parts that use virtus.

class PartitionByDay
  class Row
    include Virtus.model

    attribute :name, String
    attribute :expression, String
  end

  def self.validate_table(table:)
    raise ArgumentError, "table cannot be blank" if table.blank?
    raise ArgumentError, "table must be a String" unless table.is_a?(String)

    table
  end

  def self.validate_name(table:, name:)
    validate_table(table: table)

    raise ArgumentError, "name must be a String" unless name.is_a?(String)
    unless name.starts_with?(table)
      raise ArgumentError, "name (#{name}) must start with table (#{table})"
    end
    unless name =~ /_\d{4}_\d{2}_\d{2}$/
      raise ArgumentError, "name must end with yyyy_mm_dd but does not (#{name})"
    end

    name
  end

  def self.validate_from(from:)
    raise ArgumentError, "from must not be nil" if from.nil?

    from
  end

  def self.validate_to(to:)
    raise ArgumentError, "to must not be nil" if to.nil?

    to
  end

  def self.validate_number(number:)
    raise ArgumentError, "number must not be nil" if number.nil?
    unless number >= 2
      raise ArgumentError, "number should be at least 2 or whats the point"
    end

    number
  end

  # Fetch all partitions for a given table.
  def self.all(table:)
    validate_table(table: table)

    rows = SQL.hash_results <<~SQL.squish, table: table
      SELECT pg_class.relname AS name,
        pg_get_expr(pg_class.relpartbound, pg_class.oid, true) AS expression
      FROM pg_class base_tb
        JOIN pg_inherits ON pg_inherits.inhparent = base_tb.oid
        JOIN pg_class ON pg_class.oid = pg_inherits.inhrelid
      WHERE base_tb.oid = :table::regclass;
    SQL

    rows.map { |row| Row.new(row) }
  end

  # Generate a partition name based on table and from time.
  #
  # table - The String name of the source table.
  # from - The Time of the new partition.
  #
  # Returns String partition name.
  def self.name(table:, from:)
    validate_table(table: table)
    validate_from(from: from)

    "#{table}_%d_%02d_%02d" % [from.year, from.month, from.day]
  end

  # Create new partition for provided table.
  #
  # table - The String name of the source table.
  # name - The String name of the new partition.
  # from - The Time to start the range of the partition.
  # to - The Time to end the range of the partition.
  #
  # Returns nothing.
  # Raises if anything goes wrong.
  def self.create(table:, name:, from:, to:)
    validate_name(table: table, name: name)
    validate_from(from: from)
    validate_to(to: to)

    binds = {
      table: SQL::LITERAL(table),
      name: SQL::LITERAL(name),
      from: from,
      to: to,
    }

    SQL.run <<~SQL.squish, binds
      CREATE TABLE IF NOT EXISTS :name
      PARTITION OF :table FOR VALUES FROM (:from) TO (:to)
    SQL

    nil
  end

  # Premake several partitions starting from a given time. It also creates a
  # partition for the from time itself, so asking for 3 partitions can yield 4
  # if one does not already exist for the provided time.
  #
  # table - The String name of the source table.
  # from - The Time to start premaking partitions from.
  # number - The Integer number of partitions to create.
  #
  # Returns nothing.
  # Raises if anything goes wrong.
  def self.premake(table:, from: Time.now.utc, number: 3)
    validate_table(table: table)
    validate_from(from: from)
    validate_number(number: number)

    start = from.to_date
    stop = start + number

    (start..stop).each do |date|
      new(table, date).create
    end

    nil
  end

  # Retain a given number of partitions and detach + drop the rest.
  #
  # table - The String name of the source table.
  # from - The Time to determine retention from.
  # number - The Integer number of days of partitions to retain; partitions
  #          number or more days older than from get detached and dropped.
  #
  # Returns nothing.
  # Raises if anything goes wrong.
  def self.retain(table:, from: Time.now.utc, number: 14)
    validate_table(table: table)
    validate_from(from: from)
    validate_number(number: number)

    date = from.to_date - number
    binds = {
      relname_pattern: "#{table}_%",
      max_relname: name(table: table, from: date),
    }
    prunable = SQL.values <<~SQL.squish, binds
        SELECT relname
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
         WHERE nspname = 'public' AND
               relname LIKE :relname_pattern AND
               relkind = 'r' AND
               relname <= :max_relname
      ORDER BY relname
    SQL

    prunable.each { |name|
      detach(table: table, name: name)
      drop(table: table, name: name)
    }

    nil
  end

  # Drops a partition table.
  #
  # table - The String name of the source table.
  # name - The String name of the partition.
  #
  # Returns nothing.
  # Raises if anything goes wrong.
  def self.drop(table:, name:)
    validate_name(table: table, name: name)

    SQL.run <<~SQL.squish, name: SQL::LITERAL(name)
      DROP TABLE IF EXISTS :name
    SQL

    nil
  end

  # Detaches a partition from a table. Once detached you can do whatever with it
  # and it won't show up in query results.
  #
  # table - The String name of the source table.
  # name - The String name of the partition.
  #
  # Returns nothing.
  # Raises if anything goes wrong.
  def self.detach(table:, name:)
    validate_name(table: table, name: name)

    SQL.run <<~SQL.squish, table: SQL::LITERAL(table), name: SQL::LITERAL(name)
      ALTER TABLE IF EXISTS :table DETACH PARTITION :name;
    SQL

    nil
  end

  def self.exists?(name)
    raise ArgumentError, "name can't be blank" if name.blank?

    ActiveRecord::Base.connection.table_exists?(name)
  end
  class << self; alias exist? exists?; end

  attr_reader :from, :to, :table, :name

  def initialize(table, from)
    self.class.validate_table(table: table)
    self.class.validate_from(from: from)

    @from = from.to_time.utc.beginning_of_day
    @to = @from + 1.day
    @table = table
    @name = self.class.name(table: @table, from: @from)
  end

  def create
    self.class.create(table: @table, name: @name, from: @from, to: @to)
  end

  def premake(number)
    self.class.premake(table: @table, from: @from, number: number)
  end

  def retain(number)
    self.class.retain(table: @table, from: @from, number: number)
  end

  def detach
    self.class.detach(table: @table, name: @name)
  end

  def drop
    self.class.drop(table: @table, name: @name)
  end

  def exists?
    self.class.exists?(@name)
  end
  alias :exist? :exists?

  def all
    self.class.all(table: @table)
  end
end

The First Use of PartitionByDay

I added a few bits to a PartitionedAdapterRequestLog class that uses the PartitionByDay stuff.

class PartitionedAdapterRequestLog
  # Name of the parent table that the partitions inherit from.
  def self.table_name
    "partitioned_adapter_request_logs".freeze
  end

  # Build a partition instance for a given time.
  def self.partition(time = Time.now.utc)
    PartitionByDay.new(table_name, time)
  end
end
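From a console, that reads pretty nicely (a usage sketch; assumes the migration above has run):

# Create today's partition (a no-op if it already exists).
PartitionedAdapterRequestLog.partition.create

# Premake partitions for the next few days.
PartitionedAdapterRequestLog.partition.premake(3)

# Check that today's partition exists.
PartitionedAdapterRequestLog.partition.exists? # => true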

The Maintenance Background Job

This code is invoked from a background job to perform maintenance on the partitioned tables.

class AdapterRequestLogMaintenanceJob < ApplicationJob
  queue_as :request_logs

  def perform
    ErrorReporter.rescue_and_report do
      PartitionedAdapterRequestLog.partition.premake(3)
    end

    ErrorReporter.rescue_and_report do
      PartitionedAdapterRequestLog.partition.retain(14)
    end
  end
end

The Cron

The last bit of machinery is that the aforementioned job is invoked via good_job's cron functionality.

module FlipperCloud
  class Application < Rails::Application
    config.good_job.enable_cron = true
    config.good_job.cron = {
      adapter_request_log_maintenance: {
        cron: "every 12 hours",
        class: "AdapterRequestLogMaintenanceJob",
      },
    }
  end
end

Conclusion

With a few hundred lines of code, I have adapter request logs partitioned in production. It's been working great for a month or so.

I no longer have to manually delete older data with carefully crafted queries. Pruning the data is easy (detach and drop table) and automatic (see the job above) as it becomes less valuable (say, after a few weeks).

I build aggregates on top of this raw log data. Those aggregates live on long after the raw data is pruned (orders of magnitude less data in these aggregate tables).

The pages that use this data are snappier because they query against less data.

For local dev, I'll likely use on-demand partition maintenance or a default partition (sketched below) since local data will stay small. This is nice as it means no fork in the road between local and production.
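A default partition catches any rows that don't match an existing partition, so local inserts keep working with zero maintenance. Assuming postgres 11+, it's a one-liner:

CREATE TABLE partitioned_adapter_request_logs_default
PARTITION OF partitioned_adapter_request_logs DEFAULT;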

Overall, writing this partitioning stuff was fun and incremental.

  • I understand the guts.
  • The dependency (native postgres partitioning) has a large active community.
  • I learned something new and felt smart.

A++. Would partition again.