Design Twitter - 1: Requirement and Storage Selection

11/09/2019 posted in  System Design

1. Requirement

1.1 Use Case

As a user

I can create my profile
I can post a tweet
I can list my tweets posted before
I can follow / unfollow other users
I can list my followers and my followees
I can see others tweets in my homepage timeline
I can see my own tweets in my profile timeline
I can read a tweet (unlike facebook, you don't  need to click into the message.)
I can delete my tweets, so that no one can see it any more. 

NOTE: If this is an interview, you'd better list some common use cases, but pick one or two key use cases to go in depth. Time slips away quickly when you explain a design. You want to show the breadth and depth of your knowledge.

1.2 Volume

Ask for TPS from your interviewer, unless he asks you to estimate it.

A useful way is to start from MAU or DAU based on US population and the relative size of each use case.

It took me 20 minutes to think about the TPS of each use case below. This is absolutely not feasible in an interview. So in reality, we should go with just the read and write of one use case or two.

Your interviewer would like to see how you estimates, not the accurate number. So I intend not to go too high for the DAU. On one hand, You may trap yourself into an over-challenging problem. On the other hand, we can start with US market first, and if we still have time, we can think about scaling it to EU and FE by replication.

  • MAU: 2% US people - 60 Million
  • DAU: 1/3 MAU - 20 Million 1

  • TPS

API Peak TPS Reason
Show timeline (read tweets) 10k Assume every DAU access it twice with one refresh.
20e6 * 4 / (24 * 60 * 60) ~ 1000.
Consider peak hours and peak events, we give it 10 times buffer.
Post a tweet 100 1% of read.
Comment a tweet 1k 10 times of post.
Delete a tweet 10 Rare
List my tweets 10 Rare
Follow a user 100 1% of read
Unfollow a user 10 Rare
List my followers 100 Same as follow
List my followees 100 Same as list followers
  • Storage
    600 Million entries in user table (Assume 10% of all users are MAU)
    315 Million tweets per year (10% of Peak post TPS is 10. 10 * 365 * 24 * 3600 = 315 M)

2. API

Each use case can be a RESTFUL API.

3. Storage Choice


Both user table and tweets table are too big for RDBMS without sharding. Usually one MySQL table is good for < 1M rows.

We can do sharding like this:

  • partition user table based on user id.
  • partition tweets table based on tweet id.

Sharding RDBMS is painful and error prone:

  • Need a proxy layer to route requests.
  • Cannot access data based on other columns instead of the partition key.
  • Rescaling is difficult, may need to turn off the whole system.
  • You cannot do join tables or select columns using flexible where clause.

Therefore RDBMS is not considered as scalable.

3.2 NoSQL

No SQL is good for this use case - No complex join, No transaction, Eventual consistency is enough.

Hbase/DynamoDB/MongoDb and even Redis will all work.

I'll talk about my schema design using HBase, DynamoDB and Redis respectively in the next few articles.

  1. World wide DAU in Q4 2018: Facebook 1,520M; Snap 186M; Twitter 126M. Reference: