# Architecture Choices

This document explains the key architectural decisions made in the Hash of Wisdom project and the reasoning behind them.

## Overall Architecture

### Clean Architecture
We follow Clean Architecture principles with clear layer separation:

```
┌─────────────────────────────────────┐
│ Infrastructure Layer                │ ← cmd/, internal/server, internal/protocol
├─────────────────────────────────────┤
│ Application Layer                   │ ← internal/application (message handling)
├─────────────────────────────────────┤
│ Domain Layer                        │ ← internal/service, internal/pow (business logic)
├─────────────────────────────────────┤
│ External Layer                      │ ← internal/quotes (external APIs)
└─────────────────────────────────────┘
```

**Benefits**:
- **Testability**: Each layer can be unit tested independently
- **Maintainability**: Changes in one layer don't cascade
- **Flexibility**: Easy to swap implementations (e.g., different quote sources)
- **Domain Focus**: Core business rules are isolated and protected

## Protocol Design

### Binary Protocol with JSON Payloads
Choice: Custom binary protocol with JSON-encoded message bodies

**Why Binary Protocol**:
- **Performance**: Efficient framing and length prefixes
- **Reliability**: Clear message boundaries prevent parsing issues
- **Extensibility**: Easy to add message types and versions

**Why JSON Payloads**:
- **Simplicity**: Standard library support, easy debugging
- **Flexibility**: Schema evolution without breaking compatibility
- **Tooling**: Excellent tooling and human readability

**Alternative Considered**: Pure binary (Protocol Buffers)
- **Rejected Because**: Added complexity without significant benefit for our use case
- **Trade-off**: Slightly larger payload size for much simpler implementation

### Stateless Challenge Design
Choice: HMAC-signed challenges with all state embedded

```go
type Challenge struct {
	Target     string `json:"target"`     // "quotes"
	Timestamp  int64  `json:"timestamp"`  // Unix timestamp
	Difficulty int    `json:"difficulty"` // Leading zero bits
	Random     string `json:"random"`     // Entropy
	Signature  string `json:"signature"`  // HMAC-SHA256
}
```

**Benefits**:
- **Scalability**: No server-side session storage required
- **Reliability**: Challenges survive server restarts
- **Security**: The HMAC prevents tampering, and the signed timestamp limits replay to the challenge TTL
- **Simplicity**: No cache management or cleanup needed

**Alternative Considered**: Session-based challenges
- **Rejected Because**: Requires distributed session management for horizontal scaling

## Proof-of-Work Algorithm

### SHA-256 with Leading Zero Bits
Choice: SHA-256 hashing with difficulty measured as leading zero bits

**Why SHA-256**:
- **Security**: Cryptographically secure, extensively tested
- **Performance**: Hardware-optimized on most platforms
- **Standardization**: Well-known algorithm with predictable properties

**Why Leading Zero Bits**:
- **Exponential Scaling**: Each additional bit doubles the expected work (about 2^n attempts for n bits)
- **Simplicity**: Easy to verify and understand
- **Flexibility**: Fine-grained difficulty adjustment

**Alternative Considered**: Scrypt/Argon2 (memory-hard functions)
- **Rejected Because**: Excessive complexity for DDoS protection use case
- **Trade-off**: ASIC resistance not needed for temporary challenges

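
A minimal sketch of the leading-zero-bits check and a brute-force solver. The hash input layout (challenge bytes followed by an 8-byte big-endian nonce) and the function names are assumptions for illustration; the document does not specify how the project encodes the solution.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the zero bits at the start of a SHA-256 digest.
func leadingZeroBits(digest [32]byte) int {
	n := 0
	for _, b := range digest {
		if b == 0 {
			n += 8
			continue
		}
		return n + bits.LeadingZeros8(b)
	}
	return n
}

// hashAttempt hashes challenge || nonce (assumed input layout).
func hashAttempt(challenge []byte, nonce uint64) [32]byte {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
	return sha256.Sum256(buf)
}

// solve tries nonces in order until one meets the difficulty.
func solve(challenge []byte, difficulty int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		if leadingZeroBits(hashAttempt(challenge, nonce)) >= difficulty {
			return nonce
		}
	}
}

// verify costs a single hash, regardless of how long solving took.
func verify(challenge []byte, nonce uint64, difficulty int) bool {
	return leadingZeroBits(hashAttempt(challenge, nonce)) >= difficulty
}

func main() {
	challenge := []byte("example-challenge")
	nonce := solve(challenge, 12)
	fmt.Println("nonce:", nonce, "valid:", verify(challenge, nonce, 12))
}
```

The asymmetry is the point: the client performs on the order of 2^n hashes, while the server verifies with one.
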
### Difficulty Range: 4-30 Bits
Choice: Configurable difficulty with reasonable bounds

- **Minimum (4 bits)**: ~16 attempts on average (2^4), sub-second solve time
- **Maximum (30 bits)**: ~1 billion attempts on average (2^30); solve time depends heavily on hardware and is likely tens of seconds to minutes on a single modern CPU core
- **Default (4 bits)**: Balance between protection and user experience

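
The attempt counts above follow directly from the bit count. A small sketch, assuming hypothetical names `MinDifficulty`, `MaxDifficulty`, `expectedAttempts`, and `validateDifficulty`:

```go
package main

import (
	"fmt"
)

// Bounds taken from the range described above; the constant names are illustrative.
const (
	MinDifficulty = 4
	MaxDifficulty = 30
)

// expectedAttempts returns 2^difficulty, the average number of hashes
// needed to find a digest with that many leading zero bits.
func expectedAttempts(difficulty int) uint64 {
	return 1 << uint(difficulty)
}

// validateDifficulty rejects values outside the configured range.
func validateDifficulty(d int) error {
	if d < MinDifficulty || d > MaxDifficulty {
		return fmt.Errorf("difficulty %d outside [%d, %d]", d, MinDifficulty, MaxDifficulty)
	}
	return nil
}

func main() {
	for _, d := range []int{4, 16, 30} {
		fmt.Printf("%2d bits -> ~%d attempts\n", d, expectedAttempts(d))
	}
	fmt.Println(validateDifficulty(31))
}
```
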
## Server Architecture

### TCP Server with Per-Connection Goroutines
Choice: Custom TCP server with one goroutine per connection

```go
func (s *TCPServer) Start(ctx context.Context) error {
	// Bind synchronously so that bind errors reach the caller
	listener, err := net.Listen("tcp", s.config.Address)
	if err != nil {
		return err
	}
	s.listener = listener // acceptLoop reads the listener from the server

	// Start accept loop in goroutine
	go s.acceptLoop(ctx)
	return nil // Returns immediately
}

func (s *TCPServer) acceptLoop(ctx context.Context) {
	for {
		conn, err := s.listener.Accept()
		if err != nil {
			// Accept fails once the listener is closed (for example during shutdown)
			return
		}
		if ctx.Err() != nil {
			// Context already cancelled: drop the connection and stop accepting
			conn.Close()
			return
		}

		// Launch handler in goroutine with WaitGroup tracking
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			s.handleConnection(ctx, conn)
		}()
	}
}
```

**Benefits**:
- **Concurrency**: Each connection handled in separate goroutine
- **Non-blocking Start**: Server starts in background, returns immediately
- **Graceful Shutdown**: WaitGroup ensures all connections finish before stop
- **Context Cancellation**: Proper cleanup when context is cancelled
- **Resource Control**: Connection timeouts prevent resource exhaustion

**Alternative Considered**: HTTP/REST API
- **Rejected Because**: The test task calls for a TCP server

### Connection Security: Multi-Level Timeouts
Choice: Layered timeout protection against various attacks

1. **Connection Timeout (15s)**: Maximum total connection lifetime
2. **Read Timeout (5s)**: Maximum time between incoming bytes
3. **Write Timeout (5s)**: Maximum time to send response

**Protects Against**:
- **Slowloris**: The read timeout drops clients that trickle bytes to hold a connection open
- **Slow POST**: The connection timeout caps total request time, however slowly the data arrives
- **Resource Exhaustion**: Automatic cleanup of stale connections

## Configuration Management

### cleanenv with YAML + Environment Variables
Choice: File-based configuration with environment variable overrides

```yaml
# config.yaml
server:
  address: ":8080"

pow:
  difficulty: 4
```

```bash
# Environment override
export POW_DIFFICULTY=8
```

**Benefits**:
- **Development**: Easy configuration files for local development
- **Production**: Environment variables for containerized deployments
- **Validation**: Built-in validation and type conversion
- **Documentation**: Self-documenting with struct tags

**Alternative Considered**: Pure environment variables
- **Rejected Because**: Harder to manage complex configurations

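
The override order (file values first, environment variables on top) can be illustrated with the standard library alone. This sketch deliberately does not use cleanenv, which handles this through struct tags; `Config`, `applyEnv`, and the `SERVER_ADDRESS` variable name are assumptions, while `POW_DIFFICULTY` comes from the example above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the two settings shown in config.yaml (illustrative shape).
type Config struct {
	Address    string
	Difficulty int
}

// applyEnv overrides file-derived values with environment variables when set.
// The lookup function is injected so the logic can be exercised without
// touching the real process environment.
func applyEnv(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if v, ok := lookup("SERVER_ADDRESS"); ok { // hypothetical variable name
		cfg.Address = v
	}
	if v, ok := lookup("POW_DIFFICULTY"); ok {
		d, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("POW_DIFFICULTY: %w", err)
		}
		cfg.Difficulty = d
	}
	return cfg, nil
}

func main() {
	fromFile := Config{Address: ":8080", Difficulty: 4} // values as in config.yaml
	cfg, err := applyEnv(fromFile, os.LookupEnv)
	fmt.Println(cfg, err)
}
```
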
## Observability Architecture

### Prometheus Metrics
Choice: Prometheus format metrics with essential measurements

**Application Metrics**:
- `wisdom_requests_total` - All incoming requests
- `wisdom_request_errors_total{error_type}` - Errors by type
- `wisdom_request_duration_seconds` - Request processing time
- `wisdom_quotes_served_total` - Successfully served quotes

**Go Runtime Metrics** (automatically exported):
- `go_memstats_*` - Memory allocation and GC statistics
- `go_goroutines` - Current number of goroutines
- `go_gc_duration_seconds` - Garbage collection duration
- `process_*` - Process-level CPU, memory, and file descriptor stats

**Design Principle**: Simple metrics that provide actionable insights
- **Avoided**: Complex multi-dimensional metrics
- **Focus**: Essential health and performance indicators
- **Runtime Visibility**: Go collector provides deep runtime observability

### Metrics at Infrastructure Layer
Choice: Collect metrics in TCP server, not business logic

```go
// In TCP server (infrastructure)
metrics.RequestsTotal.Inc()
start := time.Now()
response, err := s.wisdomApplication.HandleMessage(ctx, msg)
metrics.RequestDuration.Observe(time.Since(start).Seconds())
```

**Benefits**:
- **Separation of Concerns**: Business logic stays pure
- **Consistency**: All requests measured the same way
- **Performance**: Minimal overhead in critical path

## Design Patterns

### Dependency Injection
All major components use constructor injection:
```go
server := server.NewTCPServer(wisdomApplication, config, options...)
service := service.NewWisdomService(generator, verifier, quoteService)
```

**Benefits**:
- **Testing**: Easy to inject mocks and stubs
- **Configuration**: Runtime assembly of components
- **Decoupling**: Components don't know about concrete implementations

### Interface Segregation
Small, focused interfaces for easy testing:
```go
type ChallengeGenerator interface {
	GenerateChallenge(ctx context.Context) (*Challenge, error)
}

type QuoteService interface {
	GetQuote(ctx context.Context) (string, error)
}
```

### Functional Options
Flexible configuration with sensible defaults:
```go
server := NewTCPServer(application, config,
	WithLogger(logger),
)
```

### Clean Architecture Implementation
See the layer diagram in the Overall Architecture section above for package organization.

## Testing Architecture

### Layered Testing Strategy
1. **Unit Tests**: Each package tested independently with mocks
2. **Integration Tests**: End-to-end tests with real TCP connections
3. **Benchmark Tests**: Performance validation for PoW algorithms

```go
// Unit test with mocks
func TestWisdomService_HandleMessage(t *testing.T) {
	mockGenerator := &MockGenerator{}
	mockVerifier := &MockVerifier{}
	mockQuotes := &MockQuoteService{}

	service := NewWisdomService(mockGenerator, mockVerifier, mockQuotes)
	// Test business logic in isolation
}

// Integration test with real components
func TestTCPServer_SlowlorisProtection(t *testing.T) {
	// Start real server, make slow connection
	// Verify server doesn't hang
}
```

## Security Architecture

### Defense in Depth
Multiple security layers working together:

1. **HMAC Authentication**: Prevents challenge tampering
2. **Timestamp Validation**: Rejects challenges older than the 5-minute TTL, which bounds replay
3. **Connection Timeouts**: Prevents resource exhaustion
4. **Proof-of-Work**: Rate limiting through computational cost
5. **Input Validation**: All protocol messages validated

### Threat Model
**Primary Threats Addressed**:
- **DDoS Attacks**: PoW makes attacks expensive
- **Resource Exhaustion**: Connection timeouts and limits
- **Protocol Attacks**: Length-prefixed binary framing prevents message-boundary confusion
- **Replay Attacks**: Timestamp validation limits challenge reuse to the TTL window

**Threats NOT Addressed** (by design):
- **Authentication**: Public service, no user accounts
- **Authorization**: All valid solutions get quotes
- **Data Confidentiality**: Quotes are public information

## Trade-offs Made

### Simplicity vs Performance
- **Chose**: Simple JSON payloads over binary serialization
- **Trade-off**: ~30% larger messages for easier debugging and maintenance

### Memory vs CPU
- **Chose**: Stateless challenges requiring CPU verification
- **Trade-off**: More CPU per request for better scalability

### Flexibility vs Optimization
- **Chose**: Interface-based design with dependency injection
- **Trade-off**: Small runtime overhead for much better testability

### Features vs Complexity
- **Chose**: Essential features only (no per-IP rate limiting, user accounts, etc.)
- **Benefit**: Clean, focused implementation that does one thing well

## Future Architecture Considerations

For production scaling, consider:
1. **Quote Service Enhancement**: Caching, fallback quotes, multiple API sources
2. **Load Balancing**: Multiple server instances behind load balancer
3. **Rate Limiting**: Per-IP request limiting for additional protection
4. **Monitoring**: Full observability stack (Prometheus, Grafana, alerting)
5. **Security**: TLS encryption for sensitive deployments

The current architecture provides a solid foundation for these enhancements while maintaining simplicity and focus.