Predicting Industrial Failures

Big machines give off more signals than anyone can read by hand. The work was turning those signals into warnings people trusted.

Freight locomotives are enormous, expensive, and unforgiving. When one breaks down mid-route, the cost isn't just the repair. It's the freight that didn't get delivered. The line that got blocked. The crews waiting on the next available engine. The downstream contracts that slip because the inventory didn't show up on time. One unexpected failure can ripple across a supply chain for days.

A major industrial company wanted to solve that problem. They signed a large contract with a new industrial AI startup. The idea was to take the massive amounts of telemetry data these machines were already producing and use it to predict failures before they happened. Instead of reacting to breakdowns, you'd see them coming weeks out and fix the problem during a scheduled stop. The whole category was called "predictive maintenance," and nobody had figured out how to make it work at industrial scale yet.

I joined early, before the product really existed. My job was to figure out what it should be and how someone inside a large industrial company would use it, then work with the data and engineering teams to build it from scratch. A lot of it came down to understanding how an engineer at that kind of company makes a decision. What signals they trust. What they ignore. How much warning they need before a fix becomes useful. The technical model was one piece of the puzzle. The product around it was a much bigger one.

The company grew fast, and I spent a few years on its ground floor, learning what it takes to build a product from nothing inside a company that's still building itself. The product worked. The predictions worked. A bunch of locomotives didn't break down that otherwise would have. And somebody on the operations side got to sleep a little better at night.
