Numbers Everyone Should Know

Google AppEngine Numbers
This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine.
Writes are expensive!
* The datastore is transactional: writes require disk access
* Disk access means disk seeks
* Rule of thumb: 10ms for a disk seek
* Simple math: 1s / 10ms = 100 seeks/sec maximum
* Depends on:
  * The size and shape of your data
  * Doing work in batches (batch puts and gets)
Reads are cheap!
* Reads do not need to be transactional, just consistent
* Data is in memory. Scale with memcache
* Rule of thumb: 250usec for 1MB of data from memory
* Simple math: 1s / 250usec = 4GB/sec maximum
  * For a 1MB entity, that's 4000 fetches/sec
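The fetches-per-second figure follows from simple division. A quick check in plain Python, using the common rule-of-thumb figures of roughly 10ms per disk seek and 250usec to read 1MB from memory:

```python
# Back-of-the-envelope check, using the usual rules of thumb:
# ~10 ms per disk seek, ~250 usec to read 1MB from memory.
DISK_SEEK_US = 10_000          # one transactional write ~ one disk seek
MEM_READ_1MB_US = 250

max_writes_per_sec = 1_000_000 // DISK_SEEK_US       # write ceiling
max_fetches_per_sec = 1_000_000 // MEM_READ_1MB_US   # 1MB entities from memory

print(max_writes_per_sec)   # → 100
print(max_fetches_per_sec)  # → 4000
```

The 40x gap between those two numbers is the whole reason the techniques below exist.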
Numbers Miscellaneous
This group of numbers is from a presentation Jeff Dean gave at an Engineering All-Hands Meeting at Google.

* L1 cache reference: 0.5 ns
* Branch mispredict: 5 ns
* L2 cache reference: 7 ns
* Mutex lock/unlock: 100 ns
* Main memory reference: 100 ns
* Compress 1K bytes with Zippy: 10,000 ns
* Send 2K bytes over 1 Gbps network: 20,000 ns
* Read 1 MB sequentially from memory: 250,000 ns
* Round trip within same datacenter: 500,000 ns
* Disk seek: 10,000,000 ns
* Read 1 MB sequentially from disk: 20,000,000 ns
* Send packet CA -> Netherlands -> CA: 150,000,000 ns

The Lessons
* Writes are 40 times more expensive than reads.
* Global shared data is expensive.
* Architect for scaling writes.
* Optimize for low write contention.
* Optimize wide. Make writes as parallel as you can.
The Techniques
Keep in mind these are from a Google AppEngine perspective, but the ideas are generally applicable.

Sharded Counters
We always seem to want to keep count of things. But BigTable doesn't keep a count of entities because it's a key-value store. It's very good at getting data by keys; it's not interested in how many entities you have. So the job of keeping counts is shifted to you.

The naive counter implementation is to lock-read-increment-write. This is fine if the number of writes is low. But if there are frequent updates there's high contention. Given that the number of writes that can be made per second is so limited, a high write load serializes and slows down the whole process.
The solution is to shard counters. This means:
* Creating N counters and writing to them in parallel, picking a shard at random for each update.
* To get the real current count, summing up all the sharded counters on read.
This approach seems counter-intuitive because we are used to a counter being a single incrementable variable. Reads are cheap so we replace having a single easily read counter with having to make multiple reads to recover the actual count. Frequently updated shared variables are expensive so we shard and parallelize those writes.
With a centralized database letting the database be the source of sequence numbers is doable. But to scale writes you need to partition and once you partition it becomes difficult to keep any shared state like counters. You might argue that so common a feature should be provided by GAE and I would agree 100 percent, but it's the ideas that count (pun intended).
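The sharding idea can be sketched in plain Python. This is a toy in-memory model, not the App Engine API; the class and method names are illustrative:

```python
import random

class ShardedCounter:
    """Toy sharded counter: N independent shard slots stand in for
    N datastore entities (names are illustrative, not GAE APIs)."""

    def __init__(self, num_shards=10):
        self.shards = [0] * num_shards   # one "entity" per shard

    def increment(self):
        # Each writer picks a random shard, so concurrent increments
        # rarely contend on the same slot -- this removes write contention.
        self.shards[random.randrange(len(self.shards))] += 1

    def count(self):
        # Reads are cheap, so recovering the total by summing every
        # shard is an acceptable trade.
        return sum(self.shards)

counter = ShardedCounter(num_shards=10)
for _ in range(1000):
    counter.increment()
print(counter.count())  # → 1000
```

In a real datastore each shard would be its own entity and `increment` would run in a small transaction against just that entity.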
Paging Through Comments
How can comments be stored such that they can be paged through in roughly the order they were entered? Under a high write load this is a surprisingly hard question to answer. Obviously what you want is just a counter: as a comment is made you get a sequence number, and that's the order comments are displayed. But as we saw in the last section, shared state like a single counter won't scale in high write environments.
A sharded counter won't work in this situation either, because summing the sharded counters isn't transactional. There's no way to guarantee each comment will get back the sequence number it was allocated, so we could have duplicates.
Searches in BigTable return data in alphabetical order. So what is needed for a key is something unique and alphabetical, so that when searching through comments you can go forward and backward using only keys.
A lot of paging algorithms use counts: give me records 1-20, 21-40, etc. SQL makes this easy, but it doesn't work for BigTable. BigTable knows how to get things by keys, so you must make keys that return data in the proper order.
In the grand old tradition of making unique keys, we just keep appending stuff until it becomes unique. The suggested key for GAE is: time stamp + user ID + user comment ID.
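A sketch of how such a key might be assembled (the field widths and the separator are my own assumptions; the point is that zero-padding the numeric parts makes lexicographic order match numeric order):

```python
def comment_key(timestamp, user_id, user_comment_id):
    # Zero-pad the numeric fields so string order (BigTable key order)
    # matches numeric order. Widths and the "|" separator are
    # illustrative choices; a real scheme would also need fixed-width
    # or escaped user IDs so variable-length names can't reorder keys.
    return "%013d|%s|%06d" % (timestamp, user_id, user_comment_id)

k1 = comment_key(1000, "alice", 1)
k2 = comment_key(1000, "bob", 7)    # same timestamp, different user
k3 = comment_key(2000, "alice", 2)  # later timestamp
assert k1 < k2 < k3                 # keys sort in comment order
```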
Ordering by date is obvious. The good thing is that getting a time stamp is a local decision; it doesn't rely on writes and is scalable. The problem is that timestamps are not unique, especially with a lot of users.
So we can add the user ID to the key to distinguish the comment from all other comments made at the same time. We already have the user ID, so this too is a cheap call.
Theoretically even time stamps for a single user aren't sufficient. What we need, then, is a sequence number for each user's comments.
And this is where the GAE solution turns into something totally unexpected. Our goal is to remove write contention, so we want to parallelize writes. And we have a lot of available storage, so we don't have to worry about that.
With these forces in mind, the idea is to create a counter per user. When a user adds a comment, it's added to that user's comment list and a sequence number is allocated. Comments are added in a transactional context on a per-user basis using Entity Groups. So each comment add is guaranteed to be unique, because updates in an Entity Group are serialized.
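A toy model of the per-user allocation, where a `threading.Lock` per user stands in for the Entity Group transaction (none of these names are the actual GAE API):

```python
import threading

class UserCommentLog:
    """Toy per-user sequence allocator. A per-user lock mimics how an
    Entity Group serializes updates for one user without creating a
    global bottleneck across users."""

    def __init__(self):
        self.next_seq = {}   # user_id -> next sequence number
        self.locks = {}      # user_id -> lock ("entity group")
        self.comments = []   # (user_id, seq, text)

    def _lock_for(self, user_id):
        return self.locks.setdefault(user_id, threading.Lock())

    def add_comment(self, user_id, text):
        # Serialized per user only: alice's writes never block bob's.
        with self._lock_for(user_id):
            seq = self.next_seq.get(user_id, 0)
            self.next_seq[user_id] = seq + 1
            self.comments.append((user_id, seq, text))
            return seq

log = UserCommentLog()
a0 = log.add_comment("alice", "first")
a1 = log.add_comment("alice", "second")
b0 = log.add_comment("bob", "hi")
print(a0, a1, b0)  # → 0 1 0
```

Each user gets their own gap-free sequence, while writes for different users proceed in parallel.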
The resulting key is guaranteed unique and sorts properly in alphabetical order. When paging, a query is made across entity groups using the ID index, and the results will be in the correct order. Paging is a matter of getting the previous and next keys in the query for the current page. These keys can then be used to move through the index.
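Key-based paging itself can be sketched with a sorted list standing in for the index; this is illustrative only, with `bisect_right` playing the role of the datastore's key-range scan:

```python
import bisect

def fetch_page(sorted_keys, after_key=None, page_size=3):
    # Cursor-style paging: start just past the last key of the
    # previous page instead of using a numeric offset.
    start = 0 if after_key is None else bisect.bisect_right(sorted_keys, after_key)
    page = sorted_keys[start:start + page_size]
    next_cursor = page[-1] if page else None   # key to resume from
    return page, next_cursor

keys = sorted("key%02d" % i for i in range(8))
page1, cursor = fetch_page(keys)
page2, _ = fetch_page(keys, after_key=cursor)
print(page1)  # → ['key00', 'key01', 'key02']
print(page2)  # → ['key03', 'key04', 'key05']
```

Because the cursor is a key rather than an offset, the query stays cheap no matter how deep into the comments the reader pages.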
I certainly would never have thought of this approach. The idea of keeping per-user comment indexes is out there. But it cleverly follows the rules of scaling in a distributed system: writes and reads are done in parallel, and that's the goal. Write contention is removed.