Shrinking a Postgres Table
Ok folks, this is kind of a weird one. I'm going to put it in the "you won't ever need this, but if you do, you are going to be glad I wrote this up for ya" category.
As you may or may not know, I recently acquired fireside.fm, an awesome podcast host. While investigating a slow query, I noticed that the database servers were at 87% disk space use.
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       158G  129G   21G  87% /
I typically use a managed Postgres, which makes it real obvious when you are starting to use a lot of disk space and easy to fix. But this is self-managed Postgres on Linode.
Servers don't like to run out of disk space (or even get close to running out).
What is taking up the space?
So I started up a casual chat with GitHub's Copilot. You know, the typical, "how are you?", "oh I'm fine, you?", "yeah, same", "do you like long walks on the beach?", "yeah, my database is about to fall over too".
I asked Copilot how I could quickly get the size of the database:
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;
-- 114 GB
Unfortunately, my database was the problem, not some rogue log file or backup retention policy.
My next question was which table is taking up all the space:
SELECT
  table_name,
  pg_size_pretty(pg_total_relation_size(table_name)) AS total_size
FROM
  information_schema.tables
WHERE
  table_schema = 'public'
ORDER BY
  pg_total_relation_size(table_name) DESC;
The usual suspects filled up the top 10 (analytics/metrics):
 table_name            | total_size
-----------------------+------------
 downloads             | 57 GB
 downloads_archive     | 30 GB
 day_downloads         | 12 GB
 location_downloads    | 6235 MB
 metrics               | 3537 MB
 user_agent_downloads  | 2145 MB
 spotify_day_downloads | 2072 MB
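An equivalent query against pg_class (my own variation, not the one I actually ran) gets you the same list and also breaks out how much of each table is indexes:
SELECT
  c.relname AS table_name,
  pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
  pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;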
downloads seemed the largest, so I started there. I wasn't entirely sure how the table was used (again, I'm new to this app), so I poked around for the min and max created_at times. To my surprise, there were rows going back to 2021.
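Something along these lines (a sketch of the kind of query, not the exact one I ran):
SELECT min(created_at), max(created_at)
FROM downloads;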
Why is the downloads table large?
The downloads table seemed to be a record of all the raw downloads before they are aggregated into tables more efficient for querying in a web request. As downloads are processed, they are deleted from the table.
The problem was that many records (tens of thousands an hour) were not being processed and thus not being deleted. Those downloads seemed to be for episodes that no longer exist for various reasons.
Over the course of many years, this added up to tens of millions of rows and 35% of the disk space. I sliced and diced several queries and determined that every download older than a few hours was effectively bad data and not processable.
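One of those queries looked roughly like this (a sketch, not the exact SQL): count the old downloads whose episode no longer exists.
SELECT count(*)
FROM downloads d
LEFT JOIN episodes e ON e.token = d.episode_token
WHERE e.token IS NULL
  AND d.created_at < now() - interval '3 hours';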
Now I knew what my issue was, but I wasn't sure about the best fix.
Brainstorming with Copilot
If you don't know this already, Copilot is great for brainstorming stuff like this. Talk human to it and it'll throw ideas back to you.
Copilot really wanted me to delete the rows in batches of 1-10k and then VACUUM at the end. But not VACUUM FULL, because that takes an exclusive lock on the table and would bring down the house.
I ran the first query to delete 1k records:
WITH rows_to_delete AS (
  SELECT id
  FROM downloads
  WHERE created_at < '2022-01-01'
  LIMIT 1000
)
DELETE FROM downloads
WHERE id IN (SELECT id FROM rows_to_delete);
But it took like 20 seconds.
I wasn't going to sit around and run this 30-50 thousand more times.
I also didn't really feel like making a background job to handle this for me.
Additionally, Copilot wasn't sure how much space would actually be freed up by a plain VACUUM (as compared to a VACUUM FULL).
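For reference, the trade-off between the two looks like this (illustration only, since I ended up going a different route):
-- Plain VACUUM marks dead rows as reusable and doesn't block normal reads or
-- writes, but it rarely gives space back to the operating system.
VACUUM downloads;

-- VACUUM FULL rewrites the table compactly and does return the space,
-- but it holds an ACCESS EXCLUSIVE lock on the table the whole time.
VACUUM FULL downloads;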
The easiest, fastest, and surest way to free up space for a table is to drop it. But I did want some data out of it. That's when I remembered table swapping.
I asked Copilot for the best way to clone the downloads table:
CREATE TABLE downloads_new (LIKE downloads INCLUDING ALL);
🆒
And how do I insert only the rows I need, those that are downloads for episodes that exist:
INSERT INTO downloads_new
SELECT d.* FROM downloads d
INNER JOIN episodes e ON e.token = d.episode_token
WHERE date_trunc('day', d.created_at) >= '2024-11-27';
😎
And lastly, how do I swap the tables:
ALTER TABLE downloads RENAME TO downloads_old;
ALTER TABLE downloads_new RENAME TO downloads;
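If the split second between the two renames makes you nervous, they can go in a single transaction so there's never a moment where downloads doesn't exist (a belt-and-suspenders sketch, not something I strictly needed):
BEGIN;
ALTER TABLE downloads RENAME TO downloads_old;
ALTER TABLE downloads_new RENAME TO downloads;
COMMIT;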
Sure, I could have looked these up in the Postgres docs or on Stack Overflow, but the ability to paste in your schema for your table and say in English what you want to do and get exact commands to copy is just amazing.
The only bit left was to drop the original table with all the junk data:
DROP TABLE downloads_old;
At this point, I had a brand spanking new downloads table with only 30-40k records (the past hour or so of downloads) and a whole lot more disk space:
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       158G   80G   71G  53% /
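If you'd rather double check from inside Postgres, the size functions from earlier work on the new table too:
SELECT pg_size_pretty(pg_total_relation_size('downloads')) AS downloads_size;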
The best part is that it took about 10 seconds to run the queries, and I lost, at most (if anything), a few seconds of data during the copy and the table rename.
A worthy trade off compared to hours of coding and babysitting.
Brainstorming with Real People
Now, it's worth mentioning that I wasn't entirely sure if this was going to get me the result I hoped for (no downtime, low effort, and freed up space).
So first, I dropped into my handy dandy, top secret Slack of battle-scarred programmers who are smarter than me.
As with Copilot, I posed the same question and got a variety of other options. I felt good about all of them, but none could compare (IMO) with the simplicity of the table swap.
No one seemed overly scared that the table swap wouldn't work. And I got a yes from two people who thought it would be easiest. That was enough to give me the confidence to try it first.
Cheers to top secret slacks with smart people and AI chat bots! 🍻
So that's how I spent my Thanksgiving morning!
And, honestly, I'm good with that. Feeling thankful even.
I learned something new. I saved my database.
And now I'm going to eat some great food and hang out with family sans a low undercurrent of stress wondering if my poor little database has crept from 87% to 88% disk usage or even higher!