I Saw Stripe's Codebase. It Changed How I Pick Tech Stocks
Why the best tech moats don't show up on a 10-K. Understanding these moats will change how you invest.
When I interned as a software engineer at Stripe, my first thought after seeing their monolithic codebase was along the lines of: “Maybe being a consultant and fixing PowerPoints isn’t a bad gig.”
I mean, there were coding design patterns I had never heard of, system architecture that would take days to understand, and so much internal tooling and documentation that I could quite literally feel myself drowning.
But this was a necessity for Stripe.
See, they prided themselves on their near-perfect uptime, and as a payment processor, availability of their services is key. Stripe reported roughly $1.4 trillion in payment volume for 2024, so every hour of downtime puts on the order of $160 million of merchant sales at risk. A 10-K will never tell the story of the engineers driving such fault tolerance.
This brings me to my topic: the missing link to tie your tech investments together.
So what is this link? It’s the engineers designing the system architecture.
How do you evaluate this? We will tackle that question in this post.
Fault Tolerance - Skipping This Is The Fastest Way To Lose Trust
Fault tolerance starts with the management team. At Stripe, Patrick and John Collison were obsessed with uptime. This obsession spread to the engineers, who made sure every feature was well tested through:
A/B testing
Test cases (unit, integration, mocking, etc.)
Feature flags and staged rollouts (canary deployments)
Immediate rollback if an error was logged
Heavy monitoring via Grafana, Splunk, Sentry, etc.
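To make the rollout and rollback items above concrete, here is a minimal sketch in Python. It is not Stripe's tooling; the function names, the stage percentages, and the 1% error budget are all assumptions chosen for illustration. The idea is that a feature flag exposes new code to a small, stable slice of users, and exposure only widens while monitoring stays healthy.

```python
import hashlib
from typing import Callable, Sequence


def in_rollout(user_id: str, percent: int) -> bool:
    """Feature flag check: put each user in a stable bucket from 0 to 99.

    Hashing keeps a user in the same bucket on every request, so anyone
    included at 5% is still included at 25%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def staged_rollout(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,  # assumed error budget, purely illustrative
) -> tuple[str, int]:
    """Widen exposure stage by stage; roll back to 0% if errors exceed the budget.

    `error_rate_at` stands in for a query against a monitoring system.
    """
    live_percent = 0
    for percent in stages:
        live_percent = percent
        if error_rate_at(percent) > max_error_rate:
            return ("rolled_back", 0)
    return ("fully_released", live_percent)
```

A healthy release walks through every stage, while one that starts throwing errors at 25% of traffic is pulled before the other 75% of users ever see it.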
However, fault tolerance encompasses a larger idea. To explain this I will use Bob and his new e-commerce business of selling cookies.
Bob was happily selling his cookies using Shopify; however, a new startup, Zeroify, came along promising a lower take rate, so Bob switched.
Thursday afternoon is prime time - this is when the bulk orders come. Suddenly, Zeroify stops accepting orders from Spain. What happened?
Zeroify could be facing:
A crashed server
A network partition (one server cannot communicate with another because of an outage)
A database server that stopped working
Responses that are slow to arrive because of increased latency
Poor architecture
Shopify was protected from this. They had many years of experience and understood dynamic demand. Shopify replicates data to improve fault tolerance. It runs redundant servers, so if one fails another can take over (a consensus algorithm can be used to agree on which one). Each server keeps a well-maintained event log so it can easily recover from failures. Writes are consistent, so an order is recorded everywhere or nowhere (this is known as atomicity).
Shopify built the system assuming faults would happen.
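The "everywhere or nowhere" idea can be sketched in a few lines of Python. This is a heavily simplified version of a two-phase commit, not a description of how Shopify actually does it; the `Replica` class and `record_order` function are hypothetical names. Every replica first stages the order, and the order only becomes real once all of them have said yes.

```python
class Replica:
    """Hypothetical in-memory stand-in for one copy of the orders database."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.staged = {}   # orders prepared but not yet committed
        self.orders = {}   # committed orders

    def prepare(self, order_id: str, order: dict) -> bool:
        if not self.healthy:
            return False   # a crashed or unreachable replica votes "no"
        self.staged[order_id] = order
        return True

    def commit(self, order_id: str) -> None:
        self.orders[order_id] = self.staged.pop(order_id)

    def abort(self, order_id: str) -> None:
        self.staged.pop(order_id, None)


def record_order(replicas: list, order_id: str, order: dict) -> bool:
    """Record the order on every replica, or on none of them."""
    prepared = []
    for replica in replicas:
        if replica.prepare(order_id, order):
            prepared.append(replica)
        else:
            # One "no" vote cancels the write everywhere it was staged.
            for staged_replica in prepared:
                staged_replica.abort(order_id)
            return False
    for replica in replicas:
        replica.commit(order_id)
    return True
```

A production system also has to survive a crash between the two phases, which is where durable logs and consensus protocols come in. The sketch only shows the all-or-nothing contract.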
Zeroify’s engineers were just as smart as Shopify’s, but there were fewer of them. They were trying to play catch-up with Shopify’s feature set, while Shopify perfected its system.
If I had to estimate, Shopify is spending close to $1 billion on cloud bills. They are not picking providers based on cost. They are choosing based on capabilities (GCP and Cloudflare).
Zeroify decides to go with a smaller cloud provider, not nearly as reliable as Google. They skip the content delivery network (CDN) to keep operating costs down. Their static content is getting served from halfway across the world. This is a design issue that cannot be undone with the snap of a finger - it will take a massive effort to change this.
Zeroify is a house built on sand.
Zeroify didn’t just lose Bob’s orders. They lost his trust. Bob is moving back to Shopify, and learned a very expensive lesson:
Cheap can be very costly.
User Latency - Don’t Bore Your Customers
A customer in Madrid wants to buy a pack of cookies, and clicks the “Buy Now” button. Nothing happens - that’s weird. He clicks the button again. Suddenly, he gets charged twice.
This is what bad latency looks like, and it creates a lot of issues.
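One common guard against the double charge in this story is an idempotency key: the checkout page generates a key once per purchase and sends it with every retry, and the server refuses to charge twice for the same key. Stripe's public API supports idempotency keys on requests, but the sketch below is not their implementation; `PaymentProcessor` and its fields are hypothetical, and a real system would keep the keys in a durable store rather than in memory.

```python
import uuid


class PaymentProcessor:
    """Hypothetical processor that remembers which requests it has already handled."""

    def __init__(self):
        self._receipts = {}      # idempotency key -> receipt already issued
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key means a retry or a double click: return the first result.
        if idempotency_key in self._receipts:
            return self._receipts[idempotency_key]
        self.charges_made += 1
        receipt = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._receipts[idempotency_key] = receipt
        return receipt


processor = PaymentProcessor()
key = str(uuid.uuid4())                # generated once per checkout, reused on retries
first = processor.charge(key, 1200)
second = processor.charge(key, 1200)   # the impatient second click
```

Here `second` is the same receipt as `first`, and `processor.charges_made` stays at 1, so the customer in Madrid pays once no matter how many times he clicks.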
Stripe is dead set on eliminating all friction. In fact, Patrick pushed for the Link product - a digital wallet for storing payment methods and accelerating the checkout process. Each friction point results in fewer sales for the merchant and, therefore, less revenue for Stripe.
Google did interesting research on this topic. They found that users exposed to a delay of 200 ms or more ran fewer searches. Amazon found that every 100 ms of delay can result in 1% fewer sales.
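To get a feel for what the Amazon figure means for a merchant, here is a back-of-the-envelope calculation in Python. It assumes the 1% per 100 ms relationship scales linearly, which is a simplification, and the sales number is made up for illustration.

```python
def estimated_sales_loss(
    extra_latency_ms: float,
    monthly_sales: float,
    loss_per_100ms: float = 0.01,  # the "1% per 100 ms" figure, assumed to scale linearly
) -> float:
    """Rough estimate of monthly sales lost to added latency."""
    return monthly_sales * loss_per_100ms * (extra_latency_ms / 100)


# Hypothetical merchant: $50,000 a month in sales, pages arriving 300 ms late.
loss = estimated_sales_loss(extra_latency_ms=300, monthly_sales=50_000)
```

Under those assumptions the merchant gives up about $1,500 a month (3% of sales), which can easily be more than the saving from a lower take rate.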
Let’s go back to Bob and his cookies.
Why did this happen? With modern cloud services, it is easy to spin up a server in any location within hours. Zeroify could have had a server in Frankfurt; however, they decided this wasn’t a high priority. Replicating the database across regions, sharding for fast lookups, and building a caching layer to speed up data access were all skipped.
Unfortunately, Bob is the one who suffers.
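Of the three pieces Zeroify skipped, the caching layer is the easiest to picture. Below is a minimal read-through cache in Python, offered as a sketch rather than anything a real platform runs; `ReadThroughCache`, the 60-second lifetime, and the `load` callback are assumptions. The first request for a product pays for the slow trip to the database, and later requests within the lifetime are answered from memory.

```python
import time
from typing import Callable


class ReadThroughCache:
    """Hypothetical cache that falls back to a slow loader on a miss."""

    def __init__(
        self,
        load: Callable[[str], object],
        ttl_seconds: float = 60.0,                    # assumed lifetime of an entry
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (value, expiry time)

    def get(self, key: str):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # fresh hit: no database round trip
        value = self._load(key)          # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value
```

The trade-off is staleness: a price change can take up to the entry lifetime to show up, which is part of why a caching layer needs real design work instead of being bolted on later.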
Feature Set - A Cut From 100 Friction Points
Bob wants to know where his biggest customers came from in the last week. The smallest time range he can pick is one month.
Bob wants to write a blog post on how to bake cookies. He notices he doesn’t have Sidekick, Shopify’s AI agent.
Bob wants his Instagram store to update automatically with his new cookie flavour. There is no native Instagram integration.
A customer messages Bob asking if his store accepts Klarna. Bob notices Klarna is not integrated.
There is no abandoned-cart automation, so Bob decides to bolt on Mailchimp instead.
Alone, these problems are solvable, but all of them together cause Bob a lot of issues. He just can’t catch a break.
On the other hand, Bob’s competitor, Sarah, continued using Shopify. She doesn’t mind the higher take rate as the system works smoothly. This smoothness didn’t come solely from sheer engineering throughput; it came from a decade-plus of data, integrations, partnerships, and failures that Zeroify doesn’t have.
Shopify is quite literally the operating system for e-commerce.
While at Stripe, it felt like a new idea was being shipped every week! Only the paranoid survive, and Stripe wasn’t just paranoid - they were out of their minds terrified.
Stripe has perfected their ecosystem, where they are the default for nearly any payments-related service. Acquiring Bridge for $1.1 billion shows their future-proofing for a world where stablecoins are used.
Stripe keeps expanding its ecosystem, but never compromises on the quality of the output. This is a green flag for investors: look at whether or not the company has had to roll back products.
Engineering Culture - The Engineers Should Want To Work There
Zeroify’s engineers have no identity. They clock in, do the work and leave. Conversations among them are limited to whether a feature has been shipped, and the occasional: “how are the kids?”
On the other hand, Shopify’s engineers would show up early. They knew each other well, asking about hobbies, families, and the trip John, the staff engineer, took last week. The senior engineers would freely help juniors improve their craft, often taking hours out of their schedule to do so. Conversations were geared more towards whether features were stable and tested, and whether logging was set up correctly.
At Stripe, success was not just ‘more money for the management’, but rather ‘look at the impact we made!’ In fact, equity was part of every engineer’s contract so they all had skin in the game. People would show up early, stay late, and fly between SF and Dublin to engage with their teams - they all banked on the success of Stripe.
How To Analyse Culture?
To see the compensation engineers are getting, it is useful to look at levels.fyi. Generally, the higher the compensation, the better the talent - I believe you can engineer your way out of anything if the team is right.
Look at the Compensation By Level section and compare against other firms.
This indicates a lot about the company and the importance they place on hiring the smartest people. This will undoubtedly result in more innovative products.
Additionally, read the engineering blogs companies have. It is always interesting to see the depth of knowledge and the problem complexity.
Recently, I read about Datadog’s Monocle database, a database built from scratch and written in Rust. This tells me a lot about the company: they are highly focussed on making a performant metrics platform; otherwise, making a database would be a very expensive hobby project. A green flag is seeing the results quantified: Datadog mentions a 60x ingestion and 5x query speedup. I don’t see any of Datadog’s competitors doing this, and I believe it puts them in a good position for the future.
Conclusion
The complexity of Stripe’s codebase wasn’t an accident - it was the moat. Each line was carefully constructed, adhering to the highest quality. This results in a slick product that, essentially, always works. The engineers were not just shipping features, but ensuring that when something goes wrong, the system holds.
That is the moat. Not the product on the page (anyone can make a flashy product) nor the insane growth rate (that is a symptom of a well-architected system that people want).
Before taking a look at a company’s filings, take a look at: the system you are buying a piece of, the culture, engineering blogs (what are they doing to improve performance), and most importantly how reliable the software is. This tells you about the company’s survival chances more than any growth rate will.






And yet here we are with the SaaSpocalypse. The market is a funny beast.