Why Your PostgreSQL Database Won't Scale and What to Do About It

Database scaling is often the biggest headache once an application starts growing. Take PostgreSQL, one of the most reliable, best-understood, and fastest-growing SQL databases. Everything works perfectly at first: your queries are fast, your application is responsive, and life is good. As more users arrive, however, cracks begin to appear. It usually starts with intermittent slowdowns during busy hours. Then error messages become more frequent when many users try to do things at once. Adding more CPU or RAM to the database server helps for a while, but eventually you hit a wall where throwing more hardware at the problem no longer helps.

PostgreSQL is fundamentally designed to run on a single machine. It is robust and reliable, but this single-machine design creates bottlenecks that become obvious as you grow. Robinhood’s tech blog describes the pattern: “Robinhood’s original brokerage trading system was built on a single database server with a dynamically scalable application server tier. This single database was not scalable and exposed us to gradual performance degradation during periods of high trading volume and market volatility.”

Companies have tried various ways to work around these limitations. The simplest is adding read replicas, which spreads read operations across multiple servers while all writes still go to one primary server. This helps read-heavy workloads but does nothing for write throughput. Another common approach is manual sharding, where you split your data across multiple PostgreSQL servers by some key, such as user ID ranges or geographic regions. In principle you can keep adding capacity by adding shards, but everything becomes more complicated. Schema changes need careful coordination across all shards, queries that span shards become complex, and keeping data consistent across shards requires a lot of careful application code.
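To make that complexity concrete, here is a minimal Python sketch of range-based manual sharding combined with read-replica routing. Everything in it is hypothetical and exists only for illustration: the host names, the shard boundaries, and the helpers `shard_for_user`, `pick_host`, and `hosts_for_cross_shard_query`. It opens no database connections and ignores replication lag, connection pooling, and resharding, all of which a real deployment would have to handle.

```python
import bisect
import itertools
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One PostgreSQL primary plus its read replicas (host names are made up)."""
    primary: str
    replicas: list
    _next_replica: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self):
        # Round-robin over replicas; fall back to the primary if there are none.
        self._next_replica = itertools.cycle(self.replicas or [self.primary])

    def host_for(self, is_write: bool) -> str:
        # All writes go to the single primary; reads are spread across replicas.
        return self.primary if is_write else next(self._next_replica)


# Hypothetical layout: shard i owns user IDs from the previous bound
# (inclusive) up to UPPER_BOUNDS[i] (exclusive).
UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARDS = [
    Shard("pg-shard0-primary", ["pg-shard0-replica1", "pg-shard0-replica2"]),
    Shard("pg-shard1-primary", ["pg-shard1-replica1"]),
    Shard("pg-shard2-primary", []),
]


def shard_for_user(user_id: int) -> Shard:
    """Map a user ID to the shard whose range contains it."""
    index = bisect.bisect_right(UPPER_BOUNDS, user_id)
    if user_id < 0 or index >= len(SHARDS):
        raise ValueError(f"user_id {user_id} is outside every shard range")
    return SHARDS[index]


def pick_host(user_id: int, is_write: bool) -> str:
    """Choose the server that should run a single-user query."""
    return shard_for_user(user_id).host_for(is_write)


def hosts_for_cross_shard_query() -> list:
    """A query that is not keyed by user ID must visit every shard."""
    return [shard.host_for(is_write=False) for shard in SHARDS]
```

Even in this toy version, the application owns the routing table. A query such as "top ten users by balance" has to be sent to every host returned by `hosts_for_cross_shard_query()` and merged in application code, and changing `UPPER_BOUNDS` means physically moving rows between servers.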

Modern distributed SQL databases offer a fresh approach to these scaling challenges. Unlike traditional sharding, where application developers have to manage how data is split up, these systems handle all of that automatically while keeping data consistent.

The numbers these distributed SQL databases can handle are staggering. In real-world use, they regularly manage data volumes from hundreds of terabytes to petabytes, scale horizontally across thousands of servers globally, and accept writes on every one of those servers. That lets them support a massive number of simultaneous connections while keeping response times consistently fast. For example, Temenos processes over 400,000 operations per second during peak hours on their YugabyteDB system, with automatic failover and zero downtime when servers need maintenance. Google Spanner, CockroachDB, and TiDB are other well-known, massively scalable OLTP distributed SQL databases. What is really impressive is not just the scale but how these systems keep data consistent at this size, using techniques such as hybrid logical clocks and consensus protocols like Raft and Paxos.
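As a rough illustration of the hybrid logical clock idea mentioned above, the following Python sketch follows the generally published algorithm: each timestamp pairs a wall-clock component with a counter, so that events stay ordered even when server clocks disagree. This is a generic sketch, not the implementation used by any of the databases named here. The class and method names (`HybridLogicalClock`, `tick`, `receive`) and the injectable `now` function are assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

# (wall-clock component in ms, logical counter); tuples compare lexicographically.
Timestamp = Tuple[int, int]


def wall_clock_ms() -> int:
    """Physical time source; replaceable so the clock can be tested."""
    return int(time.time() * 1000)


@dataclass
class HybridLogicalClock:
    now: Callable[[], int] = wall_clock_ms
    l: int = 0  # highest wall-clock value seen so far
    c: int = 0  # counter that breaks ties within the same l

    def tick(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        previous = self.l
        self.l = max(previous, self.now())
        # If physical time did not advance past l, bump the counter instead.
        self.c = self.c + 1 if self.l == previous else 0
        return (self.l, self.c)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp carried by an incoming message."""
        remote_l, remote_c = remote
        previous = self.l
        self.l = max(previous, remote_l, self.now())
        if self.l == previous and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == previous:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            # Local physical time is ahead of both; the counter restarts.
            self.c = 0
        return (self.l, self.c)
```

The useful property is that a node whose physical clock lags behind still produces a timestamp greater than any timestamp it has received, so causally related events never appear out of order.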

However, all this power comes with trade-offs. Distributed SQL databases need more resources than a single PostgreSQL server for the same workload. You typically need at least three servers for basic fault tolerance, and each server should be about as powerful as the one you would use for PostgreSQL. This means higher infrastructure costs, especially when you are just starting out. These systems are also more complex. They handle server failures and network problems gracefully, but when things do go wrong, figuring out what happened requires a deeper understanding of distributed systems. Many teams find they need to invest in training to use these databases effectively.
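The three-server minimum follows from simple arithmetic, assuming the database replicates data with a majority-quorum consensus protocol such as Raft: a write needs acknowledgement from more than half of the replicas, so surviving f failures takes 2f + 1 replicas. The short Python sketch below works this out; the function names are illustrative only.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - majority(nodes)


def nodes_needed(failures: int) -> int:
    """Minimum cluster size that survives the given number of failures."""
    return 2 * failures + 1


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} nodes: quorum {majority(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Under this assumption, two servers are no more fault tolerant than one, because losing either leaves no majority. Three is the smallest cluster that survives a single failure, and five is the smallest that survives two.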

The decision to switch to distributed SQL should be based on clear business needs. If you are handling fewer than 5,000 operations per second, your data is under 500 GB, and you don't need global distribution, stick with PostgreSQL. It is simpler to manage and cheaper to run. But if you are hitting these limits, it is time to consider distributed SQL seriously. The transition takes significant effort and money, but the long-term benefits of simpler scaling operations and horizontal scalability usually make it worthwhile.
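The rule of thumb above can be written down as a small check. This Python sketch simply encodes the three thresholds from the paragraph (5,000 operations per second, 500 GB, global distribution). The `Workload` type and function names are hypothetical, and the thresholds are guidelines rather than hard limits.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the rule of thumb in the text.
OPS_PER_SECOND_LIMIT = 5_000
DATA_SIZE_GB_LIMIT = 500


@dataclass(frozen=True)
class Workload:
    peak_ops_per_second: float
    data_size_gb: float
    needs_global_distribution: bool


def limits_hit(workload: Workload) -> List[str]:
    """Return which of the three limits this workload reaches."""
    reasons = []
    if workload.peak_ops_per_second >= OPS_PER_SECOND_LIMIT:
        reasons.append("throughput")
    if workload.data_size_gb >= DATA_SIZE_GB_LIMIT:
        reasons.append("data size")
    if workload.needs_global_distribution:
        reasons.append("global distribution")
    return reasons


def recommend(workload: Workload) -> str:
    """Suggest a direction; any limit reached is a reason to evaluate."""
    reasons = limits_hit(workload)
    if not reasons:
        return "stay on PostgreSQL"
    return "evaluate distributed SQL (" + ", ".join(reasons) + ")"
```

For example, `recommend(Workload(1200, 80, False))` returns `"stay on PostgreSQL"`, while a workload that needs global distribution is flagged for evaluation regardless of its size.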

Conclusion

Looking ahead, distributed SQL databases are becoming an increasingly important part of modern application infrastructure. While PostgreSQL remains excellent for many applications, understanding when and how to use distributed SQL is becoming a crucial skill for engineering teams. Many organizations are finding success with a hybrid approach: PostgreSQL where it makes sense, and distributed SQL for components that need global scale or high availability. The key is understanding your specific needs and choosing the right tool for each job while keeping things as simple and reliable as possible.

Author

Srinivas Pothuraju

Srinivas Pothuraju is a seasoned software engineer with a deep passion for distributed systems, Kubernetes, and database technologies. Srinivas specializes in designing scalable architectures and optimizing cloud-native applications. His expertise lies in integrating cutting-edge technologies like Kubernetes with distributed SQL databases, providing innovative solutions for modern application needs.

