The Interview Where I Had to Build Instagram From Scratch

I had a system design interview where the prompt was basically:

“Design Instagram from scratch. Handle feeds, media uploads, caching, high-traffic accounts, and ensure that when someone unfollows another user, their photos immediately become non-viewable.”

Here’s exactly how I approached it as a developer and engineering manager — clean, technical, and to the point.


1. Requirements I Clarified

Functional

  • Users can create accounts, follow/unfollow, post photos, view feeds.
  • Photos and profiles must reflect privacy controls.
  • When user A unfollows user B, A instantly loses access to B’s posts.

Non-functional

  • Heavy read volume.
  • Very heavy write bursts (from celebrities).
  • Low-latency feed reads (<300 ms).
  • Strong access control for private posts.
  • High availability, horizontal scale.

2. High-Level Architecture

                     +--------------------+
         Client  --> |    API Gateway     |
                     +---------+----------+
                               |
    +------------+-------------+------------+-----------+
    |            |             |            |           |
+---v---+   +----v----+   +----v----+   +---v---+   +---v---+
| Auth  |   | Profile |   | Follow  |   | Feed  |   | Media |
| Svc   |   | Svc     |   | Graph   |   | Svc   |   | Svc   |
+-------+   +----+----+   +----+----+   +---+---+   +---+---+
                 |             |            |           |
            +----v----+   +----v----+  +----v----+      |
            | User DB |   | Follow  |  | Feed DB |      |
            +---------+   | EdgesDB |  +---------+      |
                          +---------+                   |
                                                        |
                      +---------------------------------v----+
                      |     Object Storage + CDN (Photos)    |
                      +--------------------------------------+

3. Caching Strategy (Core to Instagram-Scale Systems)

Instagram lives on caching. I made this explicit in every layer.

Cache Layers I Defined

  1. Feed Cache (critical)
    • user_feed:{user_id}
    • TTL ~ 30–60 seconds
    • Stored in Redis / Memcached
    • Invalidated when:
      • new post arrives
      • user follows/unfollows someone
      • privacy settings change
  2. User Profile Cache
    • username, follower count, bio
    • Heavy read, low write
    • Cached aggressively with TTL 3–10 minutes
  3. Follow Graph Cache
    • following:{user_id} = list of users they follow
    • followers:{user_id} = list of users following them
    • Stored in memory (Redis) for fast fan-out and auth checks
    • On follow/unfollow:
      • update DB
      • update cache
      • push invalidation event
  4. Post Metadata Cache
    • Post captions, timestamps, media references
    • Small objects, ideal for Redis
    • TTL ~ 24h (safe to keep for performance)
  5. Media Cache (CDN)
    • Real images cached at the CDN edge
    • Origin fetch is rare
    • Signed URLs protect access

Caching is not optional — Instagram cannot function without it.
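
To make the layers concrete, here is a minimal Python sketch of the key patterns and TTLs listed above. The names `CacheLayer`, `CACHE_LAYERS`, and `cache_key` are hypothetical, as are the `profile:{user_id}` and `post:{post_id}` key patterns. The TTLs take the upper end of each stated range, and treating the follow graph as "no TTL, updated on write" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheLayer:
    key_pattern: str             # format string used to build the cache key
    ttl_seconds: Optional[int]   # None = no expiry, kept fresh by explicit updates


CACHE_LAYERS = {
    "feed": CacheLayer("user_feed:{user_id}", 60),
    "profile": CacheLayer("profile:{user_id}", 10 * 60),     # assumed key name
    "following": CacheLayer("following:{user_id}", None),    # assumed: no TTL
    "followers": CacheLayer("followers:{user_id}", None),    # assumed: no TTL
    "post": CacheLayer("post:{post_id}", 24 * 60 * 60),      # assumed key name
}


def cache_key(layer: str, **ids) -> str:
    """Build the concrete key for one cache layer, e.g. user_feed:42."""
    return CACHE_LAYERS[layer].key_pattern.format(**ids)
```

For example, `cache_key("feed", user_id=42)` yields the key that the feed flows below read and delete.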


4. Posting a Photo (Flow + Cache Interaction)

Client -> API Gateway -> Media Service -> Object Storage/CDN
                                 |
                                 v
                         Post Service -> DB -> Feed Service
                                               |
                                               v
                                      Feed Write + Cache Invalidations

Steps

  1. Media Service gives the client a pre-signed upload URL.
  2. Client uploads image directly to object storage (S3-like).
  3. Client sends metadata (POST /posts) to Post Service.
  4. Post stored in DB → event emitted.
  5. Fan-out worker updates followers’ feeds.
  6. Invalidate feed cache for affected users:
    • Delete user_feed:{id} for each follower.

This ensures followers see the new post on their next feed load instead of waiting for the cache TTL to expire.
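
Steps 3 to 6 can be sketched in a few lines of Python. This is an in-memory illustration, not a production design: plain dicts stand in for the posts DB, the feeds table, and Redis. The function names `create_post` and `handle_post_created` and the event shape are assumptions made for the example.

```python
from collections import defaultdict

posts = {}                     # post_id -> metadata (stand-in for the posts DB)
followers = defaultdict(set)   # author_id -> follower ids (follow graph)
feeds = defaultdict(list)      # user_id -> post ids, newest first (feeds table)
feed_cache = {}                # "user_feed:{id}" -> cached feed (Redis stand-in)


def create_post(post_id, author_id, media_key, caption):
    """Steps 3-4: store the metadata and return the event to emit."""
    posts[post_id] = {
        "author_id": author_id,
        "media_key": media_key,   # object-storage key of the uploaded image
        "caption": caption,
    }
    return {"type": "post_created", "post_id": post_id, "author_id": author_id}


def handle_post_created(event):
    """Steps 5-6: fan out to each follower and drop their cached feed."""
    for follower_id in followers[event["author_id"]]:
        feeds[follower_id].insert(0, event["post_id"])
        feed_cache.pop(f"user_feed:{follower_id}", None)
```

In a real deployment the event would travel through a queue and `handle_post_created` would run in a separate worker; here the two are simply called back to back.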


5. Feed Generation — With Caching

Home Feed Read

Flow:

Client -> API Gateway -> Feed Service -> (Cache first)

Algorithm:

  1. Check feed cache:
    • If hit → return cached feed (fast).
  2. If miss → rebuild feed:
    • Read from feeds table
    • Merge with “celebrity accounts” real-time posts
  3. Store result:
    • SET user_feed:{id} <feed> TTL=60s
  4. Return feed.
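
The algorithm above is the classic cache-aside pattern. Below is a minimal Python sketch of it. `TTLCache` is a tiny in-process stand-in for Redis, and `load_feed_rows` and `load_celebrity_posts` are hypothetical callables that return lists of `(timestamp, post_id)` tuples.

```python
import time

FEED_TTL_SECONDS = 60


class TTLCache:
    """In-process stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)


def get_home_feed(user_id, cache, load_feed_rows, load_celebrity_posts):
    key = f"user_feed:{user_id}"
    cached = cache.get(key)
    if cached is not None:          # step 1: cache hit
        return cached
    # Step 2: rebuild from the feeds table and merge celebrity posts.
    merged = sorted(
        load_feed_rows(user_id) + load_celebrity_posts(user_id),
        reverse=True,               # newest timestamp first
    )
    feed = [post_id for _, post_id in merged]
    cache.set(key, feed, FEED_TTL_SECONDS)   # step 3: store with TTL
    return feed                               # step 4
```

A second call within the TTL returns the cached list without touching either loader.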

6. Fan-Out Strategy (Normal vs High Traffic Accounts)

Normal users

Use fan-out on write:

New Post -> push post_id into each follower’s feed list

Fast reads, moderate writes.

High-traffic / Celebrity accounts

Use fan-out on read:

  • Don’t push posts to millions of followers.
  • Store posts only in posts table.
  • On feed read:
    • Merge cached feed with celebrity posts.

Diagram:

Normal:     Fan-out on write -> Feed table
Celebrity:  Fan-out on read  -> Query on read + merge
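
Here is a minimal Python sketch of the hybrid strategy, using in-memory dicts in place of the real tables. The follower-count cutoff `CELEBRITY_THRESHOLD` and the function names are assumptions; the right threshold depends on the workload.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 100_000   # assumed cutoff, not a value from the interview

followers = defaultdict(set)          # author_id -> follower ids
following = defaultdict(set)          # user_id -> ids they follow
posts_by_author = defaultdict(list)   # author_id -> [(timestamp, post_id)]
feeds = defaultdict(list)             # user_id -> [(timestamp, post_id)]


def is_celebrity(author_id, threshold=CELEBRITY_THRESHOLD):
    return len(followers[author_id]) >= threshold


def publish(author_id, post_id, timestamp, threshold=CELEBRITY_THRESHOLD):
    posts_by_author[author_id].append((timestamp, post_id))
    if is_celebrity(author_id, threshold):
        return   # fan-out on read: nothing is pushed to followers
    for follower_id in followers[author_id]:
        feeds[follower_id].append((timestamp, post_id))   # fan-out on write


def read_feed(user_id, threshold=CELEBRITY_THRESHOLD, limit=20):
    items = set(feeds[user_id])   # precomputed part of the feed
    for author_id in following[user_id]:
        if is_celebrity(author_id, threshold):
            items.update(posts_by_author[author_id])   # merged at read time
    newest_first = sorted(items, reverse=True)
    return [post_id for _, post_id in newest_first[:limit]]
```

Using a set for the merge avoids duplicates if an author crosses the threshold after some of their posts were already pushed.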

7. Unfollow Logic + Security: Instantly Blocking Content

This was one of the key interview points:
What happens when A unfollows B?

The system must make B’s images non-viewable to A.

The steps I defined:


7.1 Follow Graph Update

When A unfollows B:

DELETE FROM follows WHERE follower_id=A AND followee_id=B

Then:

  • Update the follow graph cache:
    • remove B from following:A
    • remove A from followers:B

Emit unfollow event:

unfollow(A, B)

7.2 Feed Cleanup

Background worker removes B’s posts from A’s feed:

DELETE FROM feeds WHERE user_id=A AND post_owner=B

Then:

DEL user_feed:{A}       // Clear cached feed

Next time A loads their feed → cache is rebuilt → B is gone.
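
Steps 7.1 and 7.2 together look roughly like this in Python. It is an in-memory sketch: sets and dicts stand in for the follows table, the feeds table, and Redis, and the function names and event shape are assumptions.

```python
from collections import defaultdict

follows = set()                      # (follower_id, followee_id) rows
following_cache = defaultdict(set)   # following:{user_id}
followers_cache = defaultdict(set)   # followers:{user_id}
feeds = defaultdict(list)            # user_id -> [(post_id, post_owner)]
feed_cache = {}                      # "user_feed:{id}" -> cached feed


def unfollow(a, b):
    """7.1: delete the edge, update the graph cache, emit the event."""
    follows.discard((a, b))
    following_cache[a].discard(b)
    followers_cache[b].discard(a)
    return {"type": "unfollow", "follower_id": a, "followee_id": b}


def handle_unfollow(event):
    """7.2: background cleanup of A's stored and cached feed."""
    a, b = event["follower_id"], event["followee_id"]
    feeds[a] = [(pid, owner) for pid, owner in feeds[a] if owner != b]
    feed_cache.pop(f"user_feed:{a}", None)
```

The graph update happens synchronously in `unfollow`, so authorization checks see the change even before the background cleanup runs.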


7.3 Media Authorization — The Final Gate

Even if A somehow still has old URLs to B’s photos:

  • URLs are signed with a short TTL, so a URL issued before the unfollow expires soon after it.
  • Client requests /media/{post_id}.
  • Media Service checks:
      if not user_is_allowed(viewer=A, post_owner=B):
          deny access

Why is this critical?

Because images are cached globally at the CDN edge, removing a post from A’s feed does not remove the image itself.

Authorization must occur on every media request, not just feed load.

No follow = no signed URL granted = no image visible.
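
A minimal Python sketch of "check first, then sign" is below. The HMAC scheme, the query parameter names, `SECRET_KEY`, and the 60-second TTL are all assumptions for illustration; real CDNs define their own signed-URL formats. `user_is_allowed` is passed in as a callable.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"demo-secret"   # assumption: loaded from a secret store in practice
URL_TTL_SECONDS = 60          # assumption: one possible "short TTL"


def _signature(post_id, viewer_id, expires):
    payload = f"{post_id}:{viewer_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def sign_media_url(post_id, viewer_id, now=None):
    now = time.time() if now is None else now
    expires = int(now) + URL_TTL_SECONDS
    sig = _signature(post_id, viewer_id, expires)
    return f"/media/{post_id}?viewer={viewer_id}&expires={expires}&sig={sig}"


def verify_media_url(post_id, viewer_id, expires, sig, now=None):
    """Edge-side check: reject expired or tampered URLs."""
    now = time.time() if now is None else now
    if now >= expires:
        return False
    return hmac.compare_digest(_signature(post_id, viewer_id, expires), sig)


def request_media_url(viewer_id, post_id, post_owner, user_is_allowed, now=None):
    """Authorize against the follow graph before issuing any URL."""
    if not user_is_allowed(viewer_id, post_owner):
        raise PermissionError("viewer may not access this post")
    return sign_media_url(post_id, viewer_id, now)
```

With this shape, the TTL bounds how long a URL issued just before an unfollow keeps working, which is why it should stay short.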


8. Protecting Private Photos

If B has a private account:

  • Only followers can request signed URLs.
  • Feed Service enforces read-time checks.
  • Media Service enforces access control before signing URLs.
  • Follow Graph cache ensures checks are fast.

If A unfollows B → A is treated like any other non-follower, so B’s private posts are no longer accessible to A.
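
One possible body for the `user_is_allowed` check is sketched below in Python. Two rules in it are assumptions not spelled out above: owners can always see their own posts, and public accounts are visible to everyone. The sets stand in for the profile data and the follow graph cache.

```python
private_accounts = set()   # owner ids whose accounts are private
follows = set()            # (follower_id, followee_id) edges


def user_is_allowed(viewer_id, post_owner):
    if viewer_id == post_owner:
        return True   # assumption: owners always see their own posts
    if post_owner not in private_accounts:
        return True   # assumption: public accounts are visible to everyone
    # Private account: only current followers pass.
    return (viewer_id, post_owner) in follows
```

Because the check reads the current follow edges, an unfollow takes effect on the very next authorization call.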


9. High-Traffic Scenarios & Caching Stabilizers

To handle celebrities or viral moments, I emphasized:

Anti-Thundering-Herd Techniques

  • Staggered cache expiry (jitter).
  • Lock-based cache rebuilds:
    • Prevent multiple servers rebuilding the same feed at once.
  • Write-through caching for profiles.
  • CDN for all images/video.
  • Sharded DBs for:
    • Posts
    • Feeds
    • Follows
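
The first two techniques can be sketched in Python as follows. `jittered_ttl` and `SingleFlightCache` are hypothetical names, and the 10% spread is an assumed value. The lock here only coordinates threads inside one process; across servers the same idea is usually done with a shared lock instead (for example Redis `SET` with the `NX` option), which is not shown.

```python
import random
import threading


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Vary the TTL by +/- spread so keys do not all expire at once."""
    return base_seconds * (1 + rng.uniform(-spread, spread))


class SingleFlightCache:
    """Only one thread rebuilds a missing key; the others wait and reuse it."""

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the per-key lock table

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_rebuild(self, key, rebuild):
        value = self._values.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = self._values.get(key)   # re-check after acquiring the lock
            if value is None:
                value = rebuild()
                self._values[key] = value
            return value
```

The re-check inside the lock is what stops late arrivals from rebuilding a feed that the first thread has just stored.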

Hot User Protection

If Beyoncé posts:

  1. Post Service writes metadata.
  2. Celebrity posts don’t fan out.
  3. Feed Service merges them on read.
  4. Cache per user remains stable.

This prevents millions of writes per post.


10. Full System Diagram with Caching

                 +--------------------+
                 |    API Gateway     |
                 +---------+----------+
                           |
                 [Auth + Rate Limits]
                           |
             +-------------+-------------+
             |             |             |
      +------v-----+ +-----v------+ +----v------+
      | Profile Svc| | Follow Svc | |  Feed Svc |
      +------+-----+ +------+-----+ +-----+-----+
             |              |             |
             |              |             |
     +-------v---+   +------v----+   +----v-------+
     | Profile DB|   | Follow DB |   | Feed DB    |
     +-----------+   +-----------+   +------------+
              |           |               |
              |      [Redis Cache]        |
              |                           |
                              +-----------v----------+
                              |   Media Service      |
                              | (Signed URLs)        |
                              +-----------+----------+
                                          |
                                  +-------v-------+
                                  |  CDN Layer    |
                                  +-------+-------+
                                          |
                                  +-------v-------+
                                  | Object Storage|
                                  +---------------+

Caching sits at:

  • Feed Service
  • Follow Service
  • Profile Service
  • CDN
  • Media Service access layer

11. Summary of My Interview Approach

What I communicated, step-by-step:

  1. Decompose the system into services.
  2. Define data models that scale.
  3. Describe hybrid fan-out strategy.
  4. Introduce caching everywhere:
    • Feed cache
    • Profile cache
    • Follow graph cache
    • Post cache
    • CDN media cache
  5. Explain authorization logic:
    • Follow graph is source of truth
    • Signed URLs
    • No “zombie” access
  6. Describe unfollow security:
    • Remove feed items
    • Invalidate cache
    • Deny media access
  7. Handle high-traffic accounts separately.
  8. Cover security, rate limiting, and resilience strategies.
