System Design Interview: TikTok architecture with @sudoCODE

About this video

### Final Summary of the Mock System Design Interview with Yogita Sharma

This session featured **Yogita Sharma**, a software engineer at Careem and creator of the YouTube channel "SudoCode," in a live mock system design interview. The purpose of the session was to simulate a real-life system design interview, providing insights into how candidates should approach such scenarios while demonstrating best practices in designing scalable and resilient systems.

---

### **Interview Question**

The task was to design a system akin to TikTok or Instagram Reels, focusing on three core functionalities:

1. **Video Uploads**: Handle uploads from creators worldwide.
2. **Storage**: Efficiently store videos and associated metadata.
3. **Distribution**: Deliver videos to users globally with low latency.

---

### **Functional and Non-Functional Requirements**

#### Functional Requirements:

- Support video uploads from a global user base.
- Store videos securely and efficiently.
- Distribute videos with minimal latency to ensure smooth playback.

#### Non-Functional Requirements:

- **High Availability and Fault Tolerance**: The system must remain operational even during regional server failures.
- **Low Latency**: Prioritize fast uploads and streaming for a seamless user experience.
- **Eventual Consistency**: Consistency is secondary to availability and performance.

---

### **User Assumptions**

- **Daily Active Users (DAUs)**: ~10 million viewers.
- **Creator-to-Consumer Ratio**: 1:100, translating to ~100k creators.
- **Daily Uploads**: ~500k videos, assuming each creator uploads five videos daily.

---

### **System Design Considerations**

#### Data Storage:

- **User Data**: Stored in MySQL due to its suitability for structured, relational data like user profiles.
- **Video Metadata**: Managed using MongoDB (or another NoSQL database) for its flexible schema and faster read access.
- **Video Files**: Stored in Amazon S3, which provides reliable large-file storage and integrates seamlessly with Content Delivery Networks (CDNs).

#### Upload Process:

- An **upload API** queues incoming requests to handle high upload volumes.
- HTTP 202 status codes acknowledge uploads, ensuring non-blocking acknowledgment for creators.

#### Video Distribution:

- Videos are distributed via CDNs, leveraging geographically distributed edge servers to minimize latency and enhance reliability.

---

### **Scalability and Storage Choices**

- **Storage Needs**: Each uploaded video generates ~2,400 files due to multiple formats and resolutions, leading to an estimated daily storage requirement of ~1.2 TB.
- **Scalability Strategy**: Incremental assessment of storage demands as the platform grows, ensuring the system remains performant under increasing loads.

#### Justification for Storage Choices:

- **Amazon S3**: Ideal for storing large video files and enabling replication across regions for fault tolerance.
- **MySQL for User Data**: Best suited for structured data with frequent read/write operations.
- **MongoDB for Metadata**: Offers flexibility for evolving metadata schemas and supports high-speed queries.

---

### **Ingest Engine Workflow**

The video processing pipeline includes:

1. **Validation**: Ensures uploaded files meet format and size requirements.
2. **Categorization**: Tags videos for easier retrieval and organization.
3. **Format Conversion**: Converts videos into multiple formats and resolutions to support diverse devices.
4. **Parallel Processing**: Divides tasks into chunks to improve performance and reduce latency.

---

### **CDN and Caching Strategies**

- A **hybrid CDN approach** is proposed, combining third-party solutions like Akamai with potential custom CDN development for greater control.
- **Caching** optimizes video delivery by storing frequently accessed content at edge locations, reducing load on origin servers.
- Metadata caching enhances response times for trending content queries.

---

### **Protocols for Upload and Delivery**

- **Uploads**: Use HTTPS or SFTP for secure and reliable transfers.
- **Delivery**: Rely on HTTP-based protocols, leveraging TCP for ordered and reliable transport.

---

### **Request Flow**

When a user requests videos (e.g., from a profile):

1. The system queries metadata databases to retrieve video lists.
2. Selected videos are streamed via CDN, leveraging cached metadata and preprocessed video files.

---

### **Trade-offs and Future Considerations**

- **Cost Estimation**: While the candidate demonstrated strong architectural decisions, cost estimation was not addressed early enough, leading to hesitancy in assessing feasibility.
- **Network Protocols**: The candidate showed reluctance in discussing network protocols for uploads/downloads, highlighting a gap in practical experience.
- **CDN Limitations**: Potential challenges include API constraints and cost considerations, emphasizing the need for engineering best practices.

---

### **Feedback and Learning Outcomes**

The interviewer provided constructive feedback, noting that the candidate effectively applied common engineering practices, such as:

- Decoupling ingestion and processing rates.
- Implementing high availability through replication and CDNs.

However, areas for improvement
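
The user assumptions in the summary (about 10 million DAUs, a 1:100 creator-to-consumer ratio, five uploads per creator per day) can be checked with a short back-of-envelope calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and the average upload rate is a derived figure that assumes uploads are spread evenly across the day, which the interview does not claim.

```python
# Back-of-envelope estimate built from the figures stated in the summary.
DAILY_ACTIVE_USERS = 10_000_000     # ~10 million viewers (from the summary)
CONSUMERS_PER_CREATOR = 100         # 1:100 creator-to-consumer ratio (from the summary)
UPLOADS_PER_CREATOR_PER_DAY = 5     # five videos per creator per day (from the summary)
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_uploads(dau, consumers_per_creator, uploads_per_creator):
    """Return (creators, daily uploads, average uploads per second)."""
    creators = dau // consumers_per_creator
    daily_uploads = creators * uploads_per_creator
    # Assumption: uploads are evenly distributed; real traffic will have peaks.
    avg_uploads_per_second = daily_uploads / SECONDS_PER_DAY
    return creators, daily_uploads, avg_uploads_per_second


creators, daily_uploads, avg_rate = estimate_uploads(
    DAILY_ACTIVE_USERS, CONSUMERS_PER_CREATOR, UPLOADS_PER_CREATOR_PER_DAY
)
print(f"creators: {creators:,}")
print(f"daily uploads: {daily_uploads:,}")
print(f"average uploads per second: {avg_rate:.1f}")
```

The creator and upload counts reproduce the ~100k and ~500k figures quoted in the summary. The per-second rate is only an average, so the upload queue would need to be sized for peak traffic rather than this number.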

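The upload process described in the summary (queue incoming requests and acknowledge them with HTTP 202) can be sketched with an in-process queue and a background worker. This is a minimal illustration under stated assumptions, not the design from the interview: `handle_upload`, `ingest_worker`, and the job fields are hypothetical names, and a production system would use a durable message queue rather than Python's in-memory `queue.Queue`.

```python
import queue
import threading
import uuid

HTTP_ACCEPTED = 202  # "Accepted": request received, processing not yet complete

upload_queue = queue.Queue()   # stand-in for a durable message queue (assumption)
processed_ids = []             # records which uploads the worker has handled


def handle_upload(filename, payload):
    """Hypothetical upload handler: enqueue the job and return immediately."""
    upload_id = str(uuid.uuid4())
    upload_queue.put({"upload_id": upload_id, "filename": filename, "payload": payload})
    # The creator gets an acknowledgment without waiting for processing.
    return HTTP_ACCEPTED, {"upload_id": upload_id, "status": "queued"}


def ingest_worker():
    """Drain the queue; a None job is the shutdown signal."""
    while True:
        job = upload_queue.get()
        if job is None:
            upload_queue.task_done()
            break
        # Placeholder for validation, categorization, and format conversion.
        processed_ids.append(job["upload_id"])
        upload_queue.task_done()


worker = threading.Thread(target=ingest_worker, daemon=True)
worker.start()

status, body = handle_upload("clip.mp4", b"\x00\x01")
print(status, body["status"])

upload_queue.join()      # demo only: wait until the worker has drained the queue
upload_queue.put(None)   # ask the worker to stop
worker.join()
```

Because the handler only enqueues work, the rate at which uploads arrive is decoupled from the rate at which the ingest engine processes them, which is the practice the interviewer's feedback highlights.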

Course: System Design Playlist

**Course Description: System Design Playlist**

This comprehensive course, titled "System Design Playlist," is designed to provide students with a deep understanding of system design principles and practices through real-world analogies and technical explanations. The course begins by using the analogy of running a pizza restaurant to illustrate fundamental concepts in system design, such as optimizing processes, scaling resources, and ensuring resilience. Students will learn about vertical scaling (enhancing the capabilities of existing resources) and horizontal scaling (adding more resources to distribute the workload). Through this engaging example, participants will grasp essential strategies for improving throughput, eliminating single points of failure, and implementing backup systems to maintain operational continuity.

As the course progresses, students will delve into advanced topics like microservice architecture, where responsibilities within a system are clearly defined and divided among specialized teams or services. This approach allows for efficient scaling and management of different components based on their specific needs. Additionally, the course covers distributed systems, highlighting the importance of fault tolerance and quick response times by strategically placing servers closer to users. Concepts such as load balancing, which intelligently routes requests to optimize performance, and decoupling systems to enhance flexibility and adaptability, are thoroughly explored. Participants will also learn about logging and metrics to monitor system health and make informed decisions.

The course wraps up by contrasting high-level system design, which focuses on overarching architectural decisions, with low-level system design, which deals with the actual coding and implementation details. By mapping business scenarios to technical solutions, students will gain insights into designing scalable, reliable, and extensible systems. Whether you're new to system design or looking to deepen your expertise, this course equips you with the knowledge and tools needed to tackle complex design challenges and develop robust systems capable of meeting diverse user demands.
