"I'm sorry to those who come after me. I tried my best. Let me explain what this code does, because it's basically unreadable."
This is a comment I wrote about 5 years ago on one of the biggest hacks I put into production ever.
I had a large ETL job fail. It was Friday. I was at a company party. I went back to my room and worked until 1am to just get the fix into production, just so we could get it unblocked.
What was the nature of this hack? To understand that, you had to understand the problem. The system was one where we had to store hierarchal information in a relational db so that you could at runtime you could quickly look up a path to the entity from parent to child. The problem was that we would need to make large scale adjustments to the tree structure. I’m talking about moving 1 million records from on one node on a tree to another and doing potentially 1 million of those kinds of operations. Oh we had to do it all in one transaction.
The amount of data we had to bring into memory to compute all of this stuff was enormous. And the performance bottleneck was that originally it was thought to be a bright idea to do the computation of the path for this key on the db.
Once we got our first real bit of high ingest load, the db’s CPU got thrashed to death.
So what was my solution? The hack that I threw together at 1am? I pulled out that computation from the db and wrote some really messy code to compute the path key myself so that the db could just insert the records and not have to figure out where in the tree to put the records too.
I asked several other engineers for review and alternatives after the fact. No one had a better idea of what to do. A week or so later I added that comment and never looked at it again. The code worked flawlessly until the system was put out commission.
I’ve thought about that system many times over the years. I’ve often wondered if we could have foreseen that problem and fixed it. When you’re writing a thing, you don’t really know what performance looks like until you really get load, but you have to make choices in order to get there.
If you can, figure out how to make a thing adaptable to change as much as you can, but that itself is a tradeoff.
Working and correct software in production that looks a little ugly is preferable to waiting days or weeks to find the perfect solution.