Capacity Planning and Estimation: How much data does YouTube store daily?
About this video
### Comprehensive Summary:

1. **Estimation Question Context**:
   - Estimation questions are crucial in design interviews and general problem-solving.
   - Example: Estimating the total storage YouTube requires per day for uploaded videos.
2. **Assumptions and Calculations**:
   - **User Base**: Assume 1 billion YouTube users globally.
   - **Uploaders**: Approximately 1 in 1,000 users uploads a video each day (~1 million uploaders).
   - **Video Length**: Each uploaded video averages 10 minutes.
   - **Video Size**:
     - A 2-hour movie is ~4 GB, so 10 minutes at that quality would be ~333 MB (~33 MB per minute).
     - Simplified assumption for lower-quality uploads: ~3 MB per minute of video footage.
   - **Daily Storage Requirement**:
     - Total minutes uploaded daily: \(10^6 \times 10 = 10^7\) minutes.
     - Raw footage storage: \(10^7 \times 3 \text{ MB} = 3 \times 10^7 \text{ MB} = 30\) TB.
   - **Redundancy**:
     - For fault tolerance and performance, maintain 3 copies.
     - Total raw footage with redundancy: \(30 \times 3 = 90\) TB.
3. **Multiple Resolutions**:
   - Videos are stored in multiple resolutions (e.g., 720p, 480p, 360p, 240p, 144p).
   - Assume each resolution is half the size of the previous one, so the sizes form a halving series whose sum approaches twice the largest size.
   - Total storage requirement doubles: \(90 \times 2 = 180\) TB (~0.2 petabytes).
4. **Metadata Caching**:
   - **Thumbnail Size**: Thumbnails (~10 KB each) are cached for popular videos.
   - **Popular Videos**: Last 90 days' uploads (~1 million videos per day, so ~90 million videos).
   - **Cache Size**: \(90 \text{ million} \times 10 \text{ KB} = 900 \text{ GB}\), approximated to 1 TB of RAM.
   - **Hardware**: Use 16 GB nodes, requiring ~64 nodes for a 1 TB cache.
   - **Redundancy**: To prevent cascading failures, operate nodes at 50% capacity and provision ~500 nodes in total.
5. **Processing Power**:
   - **Data Processing Requirement**:
     - Process \(10^7\) minutes/day → ~40 MB/sec.
     - Considering multiple formats and data centers: ~400 MB/sec worldwide.
   - **Processing Time**:
     - Read: 10 ms/MB, Process: 20 ms/MB, Write: 20 ms/MB → Total: 50 ms/MB.
     - Effective work: \(400 \times 50 = 20,000\) ms (20 seconds) of work per second.
   - **Parallel Processing**: Need ~20 processors to handle the load.
6. **Key Insights**:
   - The value lies in the methodical breakdown rather than in exact numbers.
   - Being off by a factor of 10-100 is acceptable; a larger error indicates flawed assumptions.
   - Clarify assumptions during interviews and validate them if possible.
7. **Final Notes**:
   - Understanding basic computing metrics (read/write times, processing speeds) is essential.
   - Assumptions should be reasonable and justifiable.
   - Encourage feedback and continuous learning through comments and further resources.
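
The arithmetic in the summary can be reproduced with a short script. The following is a minimal sketch, not a definitive model: every default value is one of the summary's assumptions (1 billion users, 1 in 1,000 uploading, 3 MB per minute, and so on), and the function names are hypothetical ones chosen for illustration. It uses decimal units for storage (1 TB = 10^6 MB) and takes the 1 TB cache and 400 MB/sec throughput figures as given inputs rather than deriving them.

```python
import math

MB_PER_TB = 10**6  # decimal units are fine for order-of-magnitude estimates


def daily_storage_tb(users=10**9, one_in=1000, minutes_per_video=10,
                     mb_per_minute=3, replicas=3, resolution_factor=2):
    """Return (raw, replicated, total) daily storage in TB."""
    uploaders = users // one_in                      # ~1 million uploaders
    minutes = uploaders * minutes_per_video          # 10^7 minutes per day
    raw_tb = minutes * mb_per_minute / MB_PER_TB     # raw footage
    replicated_tb = raw_tb * replicas                # 3 copies for fault tolerance
    total_tb = replicated_tb * resolution_factor     # extra resolutions
    return raw_tb, replicated_tb, total_tb


def resolution_overhead(levels):
    """Sum of a halving series: 1 + 1/2 + 1/4 + ... over `levels` terms."""
    return sum(0.5 ** k for k in range(levels))


def cache_nodes(cache_gb=1024, node_gb=16):
    """Nodes needed to hold the cache entirely in RAM, before any headroom."""
    return math.ceil(cache_gb / node_gb)


def processors_needed(throughput_mb_per_s=400, read_ms=10,
                      process_ms=20, write_ms=20):
    """Processors needed if each MB costs read + process + write time."""
    ms_per_mb = read_ms + process_ms + write_ms
    work_seconds_per_second = throughput_mb_per_s * ms_per_mb / 1000
    return math.ceil(work_seconds_per_second)


if __name__ == "__main__":
    raw, replicated, total = daily_storage_tb()
    print(f"raw={raw} TB, replicated={replicated} TB, total={total} TB")
    print(f"halving series over 5 resolutions: {resolution_overhead(5)}")
    print(f"cache nodes: {cache_nodes()}")
    print(f"processors: {processors_needed()}")
```

Keeping each assumption as a named parameter makes it easy to see how sensitive the result is: changing `mb_per_minute` from 3 to 30, for example, scales every storage figure by ten, which is the kind of order-of-magnitude swing the summary treats as acceptable.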
Course: System Design Playlist
**Course Description: System Design Playlist**

This comprehensive course, titled "System Design Playlist," is designed to provide students with a deep understanding of system design principles and practices through real-world analogies and technical explanations. The course begins by using the analogy of running a pizza restaurant to illustrate fundamental concepts in system design, such as optimizing processes, scaling resources, and ensuring resilience. Students will learn about vertical scaling (enhancing the capabilities of existing resources) and horizontal scaling (adding more resources to distribute the workload). Through this engaging example, participants will grasp essential strategies for improving throughput, eliminating single points of failure, and implementing backup systems to maintain operational continuity.

As the course progresses, students will delve into advanced topics like microservice architecture, where responsibilities within a system are clearly defined and divided among specialized teams or services. This approach allows for efficient scaling and management of different components based on their specific needs. Additionally, the course covers distributed systems, highlighting the importance of fault tolerance and quick response times by strategically placing servers closer to users. Concepts such as load balancing, which intelligently routes requests to optimize performance, and decoupling systems to enhance flexibility and adaptability, are thoroughly explored. Participants will also learn about logging and metrics to monitor system health and make informed decisions.

The course wraps up by contrasting high-level system design, which focuses on overarching architectural decisions, with low-level system design, which deals with the actual coding and implementation details. By mapping business scenarios to technical solutions, students will gain insights into designing scalable, reliable, and extensible systems.

Whether you're new to system design or looking to deepen your expertise, this course equips you with the knowledge and tools needed to tackle complex design challenges and develop robust systems capable of meeting diverse user demands.