Web Performance Tuning Pattern: Database Sharding

Posted on 13. Aug, 2013 by ger in Performance Tuning Patterns

I was recently talking to an engineer at another company and he was asking about my opinions on database sharding.

I’ve worked on a number of web-applications including Goshido and a customer’s application that I’ve recently helped to scale. We successfully used a number of scaling techniques but so far we haven’t had to go as far as database sharding.

In the past, I wrote a pattern language for performance tuning embedded applications [1]. Patterns are a simple literary mechanism to share experience and impart solutions to commonly occurring problems.

So out of curiosity, I decided to do some research on sharding and write it up as a pattern for web application tuning.

web-performance-tuning-database-sharding

Context

  • Scaling a web application with a backend database
  • Database write performance or capacity is a confirmed bottleneck
  • Database configuration tuning has already been implemented
  • It is no longer feasible to move the database to a larger server

Problem

Eventually you can arrive at a situation where database write performance or database capacity is the limiting factor for your system scalability.

Solution

Distribute some of your application data across a number of database shards. The application or database implements a translation from an object id to shard database. Essentially your application now has one logical database layered above a number of physical databases.

For example, imagine a micro-blogging application where users are frequently submitting posts. You could use a modulus of the “post id” to distribute each post to one of many database shards. Later when someone wants to read a post you would translate the “post id” to a shard id before retrieving the post from the appropriate database.

This example demonstrates a key-based partitioning strategy. Other strategies include:

  • Vertical partitioning: where different types of data are stored on different databases (profiles, posts, users)
  • Directory-based partitioning: a lookup table that translates object id to shard

For detailed description of other strategies (like date-time and master indexes) see Max Indelicato’s blog post below [2]. Eric Ries outlines a URL based sharding strategy that enables you to incrementally shard a system over time [3]. He recommends quickly building non-scalable solutions then when the product gets traction, start sharding by identifying “highly-trafficked tables that are rarely joined with any others”.

Forces

  • Some NoSQL (NoJoin) databases (i.e. MongoDB) have integrated support for sharding and shard balancing.
  • If you are using a traditional relational database (i.e. MySQL) you probably need to implement application level sharding (although some databases support forms of sharding like federations in Windows Azure SQL Database).
  • Application level sharding adds complexity.
  • Changing sharding strategy or number of shards can be complicated and may involve moving data from one shard to another increasing deployment complexity/time.
  • Sharding can impact other system workloads. In the example above, if we wanted to show multiple posts for a specific user we would need to issue read requests to multiple database shards.
  • Sharding limits your capability to join across tables. To get around this you may need to modify your application to decompose joins, denormalize the data or copy some data to the shards [6]. This will increase complexity.
  • Directory based partitioning has a single point of failure.

Further reading

  1. “Tricks and techniques for performance tuning your embedded system using patterns”: a series of articles I co-authored with Peter Barry for Embedded Magazine a few years ago part 1, part 2, part 3.
  2. Max Indelicato’s Primer on Database Sharding has more details on implementing partitioning strategies
  3. Sharding for Startups by Eric Ries
  4. Book – Cloud Architecture Patterns by Bill Wilder. While the sub-title mentions Azure, this book is general enough to be applied to other cloud platforms.
  5. Book – Scalability Rules: 50 Principles for Scaling Web Sites by Martin L. Abbott; Michael T. Fisher.
  6. Book – Professional Website Performance: Optimizing the Front-End and Back-End by Peter Smith

Photo Credit: Winter Ice Mosaic by Ctd 2005.

Tags: , , , ,

Comments are closed.