Learning Objectives
Course-Level Outcomes
This assignment directly supports the following course learning objectives for CS 4459:
1. Understand Distributed Systems Communication
- Explain how RPC enables remote function calls across network boundaries
- Compare RPC with alternative communication patterns (REST, message queues)
- Analyze trade-offs between different communication approaches
2. Handle Failures in Distributed Systems
- Identify failure modes in distributed systems (network partitions, server crashes, timeouts)
- Implement retry strategies and timeout handling
- Apply at-least-once and at-most-once semantics appropriately
3. Design for Reliability
- Implement idempotent operations to enable safe retries
- Apply the circuit breaker pattern to prevent cascading failures
- Design fault-tolerant distributed services
4. Apply Industry-Standard Tools
- Use gRPC and Protocol Buffers for efficient RPC communication
- Write service definitions using protobuf IDL
- Generate client and server code from service definitions
Technical Skills
By completing this assignment, you will gain hands-on experience with:
gRPC Framework
- Define service contracts using .proto files
- Generate Python code from protobuf definitions
- Implement both synchronous and asynchronous RPC calls
- Configure client and server options (timeouts, interceptors)
Distributed Systems Patterns
- Retry Logic: Exponential backoff, jitter, max attempts
- Idempotency: Request IDs, deduplication, state management
- Circuit Breaker: Failure tracking, state transitions, recovery
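The retry ingredients listed above (exponential backoff, jitter, max attempts) can be sketched in plain Python. This is a minimal illustration, not the assignment's reference solution: the names `backoff_delays` and `call_with_retry`, the default delays, and the choice of "full jitter" (a random delay between 0 and the exponential cap) are all assumptions made for the example.

```python
import random
import time


def backoff_delays(max_attempts, base_delay=0.1, max_delay=2.0, rng=random.random):
    """Yield the sleep time before each retry (hypothetical helper).

    The cap doubles on every retry (exponential backoff) up to max_delay,
    and the actual delay is a random fraction of that cap (full jitter).
    """
    for attempt in range(max_attempts - 1):
        cap = min(max_delay, base_delay * (2 ** attempt))
        yield rng() * cap


def call_with_retry(fn, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn(); retry on transient errors and re-raise after max_attempts."""
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(next(delays))
```

Note that a wrapper like this is only safe around operations that are idempotent, since the server may have executed a request whose response was lost.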
Testing Distributed Systems
- Simulate network failures and timeouts
- Test retry behavior and idempotency guarantees
- Measure latency and throughput under different conditions
Conceptual Understanding
What Makes Operations Idempotent?
You'll understand why these operations are idempotent:
set_value(key="x", value=10) # Always sets x to 10
get_value(key="x") # Read-only, no side effects
delete(key="x") # Deleting twice leaves the same final state
And why these are NOT idempotent:
increment(key="x") # Calling twice increments twice
append(key="x", value=5) # Calling twice appends twice
withdraw(amount=100) # Calling twice withdraws $200
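One common way to make an operation like withdraw safe to retry is to attach a request ID and deduplicate on the server. The sketch below assumes a hypothetical in-memory `Account` class and string request IDs; a real service would also need to persist and eventually expire the stored results.

```python
class Account:
    """Toy in-memory account (hypothetical); withdraw() is deduplicated by request ID."""

    def __init__(self, balance):
        self.balance = balance
        self._results = {}  # request_id -> result of the first execution

    def withdraw(self, request_id, amount):
        # A retried request carries the same ID, so replay the stored result
        # instead of debiting the account a second time.
        if request_id in self._results:
            return self._results[request_id]
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self._results[request_id] = self.balance
        return self.balance


account = Account(balance=500)
first = account.withdraw("req-1", 100)  # executes: balance becomes 400
retry = account.withdraw("req-1", 100)  # duplicate delivery: replays 400
other = account.withdraw("req-2", 100)  # new request: balance becomes 300
```

With the request ID in place, the client can retry a timed-out withdrawal without risking a double debit.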
When to Use Circuit Breakers
You'll learn to identify scenarios where circuit breakers prevent cascading failures:
Scenario: Payment Service Down
- Payment service becomes unavailable
- Without circuit breaker: Every request waits for the full timeout (5 s × 1000 requests = 5000 s of cumulative waiting)
- With circuit breaker: Requests fail fast once repeated failures are detected (saves resources, better UX)
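The fail-fast behavior in this scenario can be sketched as a small state machine with closed, open, and half-open states. This is an illustrative sketch under assumptions: the class names, the `failure_threshold` and `reset_timeout` parameters, and the single-trial half-open policy are choices made for the example, not requirements of the assignment.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the service."""


class CircuitBreaker:
    """Minimal single-threaded circuit breaker (hypothetical sketch)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")  # no timeout wait
            self.state = "half_open"  # cool-down elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

Once the circuit is open, callers receive `CircuitOpenError` immediately instead of each waiting out the 5 s timeout, and the service gets a chance to recover before the next trial call.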
Assessment Criteria
Your understanding will be assessed through:
1. Implementation Quality (60%)
- Correct gRPC service implementation
- Proper error handling and retry logic
- Working idempotency mechanism
- Functional circuit breaker
2. Testing & Validation (20%)
- Comprehensive test scenarios
- Clear demonstration of failure handling
- Performance measurements
3. Analysis & Reflection (20%)
- Written report answering key questions
- Trade-off analysis (RPC vs REST)
- Design decisions and justifications
Success Indicators
You've mastered the material when you can:
✅ Explain why retrying a withdrawal operation is dangerous
✅ Design an API that's safe to retry automatically
✅ Identify when circuit breakers improve system resilience
✅ Choose between RPC and REST for a given use case
✅ Implement timeout handling without blocking other requests
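As a rough illustration of the last indicator, the sketch below uses Python's `asyncio` rather than gRPC to show a per-request timeout that does not hold up other in-flight requests. The function names `fetch` and `fetch_with_timeout` and the delay values are hypothetical, and `asyncio.sleep` stands in for a remote call.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for a remote call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def fetch_with_timeout(name, delay, timeout):
    """Give up on a single request after `timeout` seconds."""
    try:
        return await asyncio.wait_for(fetch(name, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return f"{name}: timed out"


async def main():
    # Both requests run concurrently; the slow one times out on its own
    # without delaying the fast one.
    return await asyncio.gather(
        fetch_with_timeout("fast", 0.01, timeout=0.1),
        fetch_with_timeout("slow", 1.0, timeout=0.1),
    )


results = asyncio.run(main())
```

Here `results` holds the fast response alongside a timeout marker for the slow request, and the whole batch finishes in roughly the timeout duration rather than the slow call's full delay.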
Self-Assessment Questions
Before starting, reflect on these questions:
- What happens if a client sends the same request twice?
- How long should a client wait before timing out?
- When should a system stop retrying and fail fast?
- What information do you need to make operations idempotent?
You'll answer these through hands-on implementation!