What AI Actually Does to Legacy Software
I steward a twenty-year-old software company. We host a platform where art galleries collect applications from artists for juried exhibitions. We've processed millions of dollars in fees over the past two decades. Yet, we are still trudging through latent issues.
A few weeks ago we set out to migrate the way galleries collect those fees from a legacy system to a modern one. The legacy system was running fine in the way legacy systems "run fine": every few weeks a customer wrote in to say their gallery's event page advertised one fee, but artists were charged another. Someone on our team had to dig into a twenty-year-old admin panel to figure out why. Digging into a single gallery could mean an hour of work. Multiplied across the hundreds of galleries we serve, doing this every time an issue came up was unsustainable.
This piece is about how we did the migration, what AI did during it, and what AI did not do. Most of what gets written about AI is written by people who have not used it for anything load-bearing. We have deployed it against legacy, load-bearing systems (with some success!). Here's what actually happens:
§The artifact
For twenty years, our platform let art galleries paste raw HTML into their event page. Whatever they pasted became the payment button, and that button contained the fee calculation. We would help generate it, but some users wanted to modify the HTML, and sometimes the JavaScript, themselves. The previous owner would sometimes create custom buttons for galleries, which let them range from the norm ($35 base fee) to the obscure ($10 base fee + $5 per art piece for the first 3 and then $10 per piece after, 2x each of those for non-members).
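To make the obscure end of that range concrete, here is a minimal Python sketch of the tiered example. The function name and parameters are hypothetical, and reading "2x each of those" as doubling the base fee and both per-piece rates for non-members is an assumption; the original button logic may have worked differently.

```python
from decimal import Decimal


def tiered_fee(pieces: int, member: bool) -> Decimal:
    """Hypothetical reading of the custom button: $10 base fee,
    $5 per piece for the first 3 pieces, $10 per piece after that,
    and every component doubled for non-members (an assumption)."""
    if pieces < 0:
        raise ValueError("pieces must be non-negative")
    base = Decimal("10")
    first_three = Decimal("5") * min(pieces, 3)
    beyond_three = Decimal("10") * max(pieces - 3, 0)
    total = base + first_three + beyond_three
    # Doubling each component is the same as doubling the total.
    return total if member else total * 2


print(tiered_fee(5, member=True))   # 45
print(tiered_fee(5, member=False))  # 90
```

Even this small rule has three branches and a multiplier, which is a lot to keep in sync by hand with a sentence of prose in a neighboring textbox.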
Right next to that textbox was another textbox where the gallery wrote prose for the artist, like "Members $10, Non-members $45." These two boxes had no explicit relationship to each other. The artist read the prose. The button charged according to the HTML. The gallery was in charge of verifying that the two matched. There were no checkers, no validation. Just raw code. Although we had a "wizard" to generate the HTML, users were free to write their own (and they did!). Justin wrote about how LLMs have affected Buttondown. We saw increasing support requests in the same vein.
This is the kind of system you build in 2010 and never quite get around to replacing. It "worked". The gallery was responsible for what they pasted. We were responsible for what we charged. We could claim innocence when there were issues. But should we?
When there were mistakes, our support inbox would be flooded by artists and galleries alike. But support by a thousand emails felt easy compared to the alternative of ripping out a foundational system. Sure, it caused a brief reputational hit, but what is the value of your reputation compared to a weekend of grinding out a new payment system?
OK, we needed to make a change.
So we built the new system. Application fees are now stored in defined fields, and the button is generated from them. The prose and the charge amount can no longer drift apart. Computing fees was easy.
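The idea of one structured source feeding both the artist-facing text and the button can be sketched roughly as follows. This is an illustration in Python, not our actual schema: `FeeOption`, `render_prose`, `render_button`, and the form field names are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from html import escape


@dataclass(frozen=True)
class FeeOption:
    label: str       # e.g. "Members"
    amount: Decimal  # e.g. Decimal("10")


def render_prose(options: list[FeeOption]) -> str:
    """Text shown to the artist, derived from the stored fields."""
    return ", ".join(f"{o.label} ${o.amount:.2f}" for o in options)


def render_button(options: list[FeeOption], account: str) -> str:
    """Payment form derived from the same stored fields.
    The input names here are placeholders, not a real processor's API."""
    choices = "".join(
        f'<option value="{o.amount:.2f}">{escape(o.label)} ${o.amount:.2f}</option>'
        for o in options
    )
    return (
        '<form method="post">'
        f'<input type="hidden" name="account" value="{escape(account)}">'
        f'<select name="fee">{choices}</select>'
        '<button type="submit">Pay entry fee</button>'
        "</form>"
    )


options = [FeeOption("Members", Decimal("10")), FeeOption("Non-members", Decimal("45"))]
print(render_prose(options))  # Members $10.00, Non-members $45.00
```

Because both renderers read the same `options` list, changing a fee changes what the artist reads and what the artist pays in one step.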
But building the new system was the easy part. Now we had to migrate 282 active fee-collecting events.
§What was inside the textboxes
Before we could migrate anything, we had to look at what was actually in the textbox. Twenty years of paste-whatever-you-have produces a long tail of weirdness. AI made it possible to look at all 282 of them in one sitting.
A few examples:
A gallery in Hawaii had two different deposit accounts wired into the same event. The gallery's HTML textbox routed payment to one account; a backup button they had also pasted into the prose textbox routed to a different one. Twenty artists had submitted. Depending on which button each artist clicked, the money went to one of two bank accounts. That made it hard to track payments down, issue refunds, and apply fixes where we needed to! The event needed to be cut down to the intended account.
A gallery in Connecticut had pasted their entire payment button into the prose field, leaving the HTML textbox empty. The form rendered the button anyway, since browsers are forgiving. Thirty-one real artists submitted through it. The gallery's accountant could not have told from looking at the page that anything was wrong. We needed to move their payment button to the correct textbox.
A gallery in New York had two active events with no deposit account configured at all. Broken from the moment of creation. But the gallery had eighteen prior events stored in our system. The account used in each of those past events was visible in the old HTML, and we could trace how it had changed over time: an Earthlink (no, I hadn't heard of this internet provider either) address used in 2018, a switch to Gmail in 2020, the same Gmail through five subsequent events. We recovered the right account from the gallery's own history.
A gallery in California had marked an event as "paid" in our system but configured no payment button at all. We had quietly assumed this kind of state meant the gallery's setup was broken. It wasn't. They were collecting fees through their own website, which had processed fees from forty-seven artists. They had simply indicated that we were collecting the fees, when in reality they were collecting them themselves. The configuration looked broken. The fee collection was humming along just fine.
These four examples are the texture. Multiply by 282. Lots of bespoke buttons and situations, all of which needed to be collapsed into a consistent system. Talk about a big band-aid to rip. This might hurt a bit!
§What AI did
Harrison built a tool, call it The Analyzer, which read the HTML in each of those 282 textboxes and reported what fee it would actually charge, what PayPal account it would route to, and what kind of pricing structure it had: flat fee, dropdown of options, tiered pricing, or something custom.
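As an illustration of that kind of read, here is a minimal Python sketch. It is not The Analyzer itself. It assumes the pasted buttons are plain HTML forms that carry the payee in a hidden `business` input and the price in a hidden `amount` input, as classic PayPal button markup does. The class and function names are hypothetical, and it only distinguishes flat, dropdown, and custom; recognizing tiered pricing would need more logic.

```python
from html.parser import HTMLParser


class ButtonReader(HTMLParser):
    """Collects named <input> values and notes <select>/<script> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.has_select = False
        self.has_script = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""
        elif tag == "select":
            self.has_select = True
        elif tag == "script":
            self.has_script = True


def analyze(html: str) -> dict:
    reader = ButtonReader()
    reader.feed(html)
    reader.close()
    if reader.has_script:
        kind = "custom"    # JavaScript may compute the fee; needs a human
    elif reader.has_select:
        kind = "dropdown"  # artist picks one of several prices
    elif "amount" in reader.fields:
        kind = "flat"      # a single fixed price
    else:
        kind = "custom"    # nothing recognizable; needs a human
    return {
        "account": reader.fields.get("business"),
        "amount": reader.fields.get("amount"),
        "kind": kind,
    }


sample = (
    '<form method="post">'
    '<input type="hidden" name="business" value="gallery@example.com">'
    '<input type="hidden" name="amount" value="35.00">'
    "</form>"
)
print(analyze(sample))
```

Anything the sketch cannot classify falls into "custom" rather than being guessed at, which mirrors the skip-with-a-reason approach described here.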
For each event, The Analyzer compared what the legacy hand-pasted HTML would charge against what the modern system would charge if we migrated it. If those numbers matched within a penny, the migration was safe. No artist would see a different price. If they didn't match, the migration was skipped, with a note explaining why. This would allow for a manual review later.
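The decision rule above can be sketched in a few lines of Python. The function name and return shape are hypothetical, and treating "within a penny" as an absolute difference of less than one cent is an assumption.

```python
from decimal import Decimal, InvalidOperation

PENNY = Decimal("0.01")


def decide(legacy_amount, modern_amount) -> dict:
    """Return a flip/skip decision for one event, with a reason on skip."""
    try:
        legacy = Decimal(legacy_amount)
        modern = Decimal(modern_amount)
    except (InvalidOperation, TypeError, ValueError):
        return {"action": "skip", "reason": "could not parse one of the amounts"}
    if not (legacy.is_finite() and modern.is_finite()):
        return {"action": "skip", "reason": "amount is not a finite number"}
    if abs(legacy - modern) < PENNY:
        return {"action": "flip", "reason": None}
    return {
        "action": "skip",
        "reason": f"legacy charges {legacy}, modern would charge {modern}",
    }


print(decide("35", "35.00"))     # flips: same price, different formatting
print(decide("35.00", "45.00"))  # skips, with both numbers in the reason
```

Using `Decimal` rather than floats keeps the comparison exact for currency, and every skip carries a human-readable reason for the later manual review.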
This is the kind of work AI is good at. It is repetitive. It requires careful reading of code that nobody alive understands well. It rewards patience. A junior engineer could have done it in three months if they wrote all the code. It would have been more efficient for them to burn a few days manually checking everything. AI did it in a single session.
The pattern matters. AI was not generating the migration. AI was reading twenty years of customer-supplied configuration and producing a structured decision per event: safe to flip, or skip with a reason. Every flip was reviewed by a human (me!). Every database change was previewed before it was applied. Nothing modified production directly.
By the end of the verifiable batch:
- 101 of 101 simple flat-fee events passed verification and migrated cleanly.
- 64 of 66 events with multiple price options migrated cleanly.
- 39 of 40 events with tiered pricing migrated cleanly.
Where The Analyzer flagged a mismatch, the mismatch was usually real. In one case the difference was a single character — "Non member" versus "Non- member". The prices and structure were identical. The labels had drifted because somebody had once edited the event in the modern system, and the modernized label had been saved alongside the original. Trivial. But the strict matching rule caught it. Lenient matching, "the prices match, the labels are close enough", would have flipped it silently and pushed a small drift downstream forever.
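The difference between the two rules is easy to show. In this Python sketch the function names and the sample data are illustrative only, and which side held which label is arbitrary; options are modeled as (label, price) pairs.

```python
import re


def strict_match(legacy, modern) -> bool:
    """Labels and prices must be identical, character for character."""
    return legacy == modern


def lenient_match(legacy, modern) -> bool:
    """Prices must match; labels are compared after dropping case,
    spaces, and punctuation ("close enough")."""
    def normalize(options):
        return [
            (re.sub(r"[^a-z0-9]", "", label.lower()), price)
            for label, price in options
        ]
    return normalize(legacy) == normalize(modern)


legacy = [("Members", "10.00"), ("Non member", "45.00")]
modern = [("Members", "10.00"), ("Non- member", "45.00")]

print(strict_match(legacy, modern))   # False
print(lenient_match(legacy, modern))  # True
```

The lenient rule answers "yes" and hides the drift; the strict rule answers "no" and hands the question to a person.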
§What AI did not do
AI does not have judgment about what a customer actually meant. It has pattern-matching on what their data looks like. The two are often close. Sometimes they are not.
Late in the project I made a quiet decision that turned out to be wrong. There were eighty-four events in our system that had been marked "paid", but had no payment button configured at all, no HTML in the textbox. I assumed most of these were galleries who had clicked the wrong setting. For seven of them, I reclassified the setting from "paid" to "free."
After a brief pause, I asked myself: did I actually check what each gallery wrote in its prospectus? What did it say?
I checked. Every one of the seven prospectuses was empty. There was no signal at all. The gallery had clicked "paid" and walked away. I had reclassified them based on what looked like the most likely explanation, not on any actual evidence.
I reverted all seven. They went back into the "paid but unconfigured" bucket — which is what they actually are. That bucket is not a data problem. It is a real reflection of where galleries are in their own setup process. Some of them will finish later. Some of them never will. The fact that the bucket exists is information about our customers, not noise to be cleaned up.
This is the part of AI work that does not show up in the demos. The model will happily make whatever change you tell it to make. It has no sense of whether you are working from evidence or from a hunch. The discipline the human brings is: do I actually know this, or am I assuming? AI accelerates whatever discipline the human has. If the human is sloppy, AI is sloppy, but faster.
§What it produced
We started the session with 51 events on the modern system and 282 on the legacy hand-pasted one, about 15% on the modern path. We finished at 293 on the modern system and 39 on the legacy one, about 88%. The jump included some past shows, which we migrated for consistency: many galleries copy a previous show as a template for their next one, and I wanted everybody to be working on the modern system even if they duplicated an old show.
Across all of it, no artist saw a different fee than they would have paid the day before. The verification rule held.
The remaining 39 are not migrations we couldn't do. They are events where doing the migration is the wrong move: galleries running custom-built payment buttons that need translation, galleries with edge-case pricing the modern system doesn't represent yet, and past events whose original configuration is the historical record of what they charged and shouldn't be rewritten. Each of those is a one-by-one decision. There is no batch tool for them. This took a few hours to get across the finish line.
Underneath the count is the bigger result. The class of support ticket that drove this project (gallery says one fee, button charges another) should disappear for the migrated events. The two fields can no longer drift, because there are no longer two fields.
§What this work says about AI
A few takeaways for anyone trying to make sense of where AI fits in a real business.
AI is good at high-volume, low-judgment reading. The 282-textbox tour was the kind of work nobody wants to do, that takes weeks if done manually, and that is mostly mechanical pattern recognition. The cost of doing it dropped by an order of magnitude. The work itself didn't change.
AI compounds the operator's discipline. Every safe move I made (reversible changes, human-reviewed previews, strict matching rules, one wave at a time) was a move I could have made without AI. AI made the moves cheaper to make in volume. The seven-event reversion was a move I would have had to make either way; the difference is that AI helped me make the wrong call faster.
AI does not replace the part of the job that requires knowing your business. Knowing that "paid event with no button" is a real customer state, not a data error, takes years of looking at the same kind of data. Knowing that the school in the Midwest receives some of the routing emails for one program but not the others takes context AI cannot have. Knowing when to leave a twenty-year-old record alone, because it is the only audit trail of what an artist was actually charged in 2018, is a judgment call AI will make wrong by default.
The interesting work is the cleanup surfaced by the migration. Five different galleries had configured "member discount" pricing in a way the modern system does not represent correctly. A blind migration would have flipped them anyway. Instead, each one became a small product conversation: how should the modern system represent this pattern? That is the real engineering question. AI surfaced it. It cannot answer it.
We will keep doing work like this, using AI for the volume tour, using human discipline to decide what is safe to change, and using both to find the next product question worth answering.