# Architecture Choices

This document explains the key architectural decisions made in the Hash of Wisdom project and the reasoning behind them.

## Overall Architecture

### Clean Architecture

We follow Clean Architecture principles with clear layer separation:

```
┌────────────────────────────────────┐
│        Infrastructure Layer        │ ← cmd/, internal/server, internal/protocol
├────────────────────────────────────┤
│         Application Layer          │ ← internal/application (message handling)
├────────────────────────────────────┤
│            Domain Layer            │ ← internal/service, internal/pow (business logic)
├────────────────────────────────────┤
│           External Layer           │ ← internal/quotes (external APIs)
└────────────────────────────────────┘
```

**Benefits**:
- **Testability**: Each layer can be unit tested independently
- **Maintainability**: Changes in one layer don't cascade into others
- **Flexibility**: Easy to swap implementations (e.g., different quote sources)
- **Domain Focus**: Core business rules are isolated and protected

## Protocol Design

### Binary Protocol with JSON Payloads

**Choice**: Custom binary protocol with JSON-encoded message bodies

**Why Binary Protocol**:
- **Performance**: Efficient framing with length prefixes
- **Reliability**: Clear message boundaries prevent parsing issues
- **Extensibility**: Easy to add message types and versions

**Why JSON Payloads**:
- **Simplicity**: Standard library support, easy debugging
- **Flexibility**: Schema evolution without breaking compatibility
- **Tooling**: Excellent tooling and human readability

**Alternative Considered**: Pure binary (Protocol Buffers)
- **Rejected Because**: Added complexity without significant benefit for our use case
- **Trade-off**: Slightly larger payload size for a much simpler implementation

### Stateless Challenge Design

**Choice**: HMAC-signed challenges with all state embedded

```go
type Challenge struct {
	Target     string `json:"target"`     // "quotes"
	Timestamp  int64  `json:"timestamp"`  // Unix timestamp
	Difficulty int    `json:"difficulty"` // Leading zero bits
	Random     string `json:"random"`     // Entropy
	Signature  string `json:"signature"`  // HMAC-SHA256
}
```

**Benefits**:
- **Scalability**: No server-side session storage required
- **Reliability**: Challenges survive server restarts
- **Security**: HMAC prevents tampering and replay attacks
- **Simplicity**: No cache management or cleanup needed

**Alternative Considered**: Session-based challenges
- **Rejected Because**: Requires distributed session management for horizontal scaling

## Proof-of-Work Algorithm

### SHA-256 with Leading Zero Bits

**Choice**: SHA-256 hashing with difficulty measured as leading zero bits

**Why SHA-256**:
- **Security**: Cryptographically secure, extensively tested
- **Performance**: Hardware-optimized on most platforms
- **Standardization**: Well-known algorithm with predictable properties

**Why Leading Zero Bits**:
- **Exponential Scaling**: Each additional bit doubles the expected work (~2^n attempts for n bits)
- **Simplicity**: Easy to verify and understand
- **Flexibility**: Fine-grained difficulty adjustment

**Alternative Considered**: Scrypt/Argon2 (memory-hard functions)
- **Rejected Because**: Excessive complexity for a DDoS-protection use case
- **Trade-off**: ASIC resistance is not needed for temporary challenges

### Difficulty Range: 4-30 Bits

**Choice**: Configurable difficulty with reasonable bounds (see the solver sketch after this list)

- **Minimum (4 bits)**: ~16 attempts on average, sub-second solve time
- **Maximum (30 bits)**: ~1 billion attempts, several seconds on a modern CPU
- **Default (4 bits)**: Balance between protection and user experience
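To make these numbers concrete, here is a minimal, self-contained sketch of brute-forcing a solution at the default 4-bit difficulty. The `hasLeadingZeroBits` helper and the candidate layout are illustrative assumptions, not the actual `internal/pow` API:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// hasLeadingZeroBits reports whether the SHA-256 digest of data starts with
// at least `difficulty` zero bits. Hypothetical helper for illustration only.
func hasLeadingZeroBits(data []byte, difficulty int) bool {
	sum := sha256.Sum256(data)
	remaining := difficulty
	for _, b := range sum {
		if remaining >= 8 {
			if b != 0 {
				return false
			}
			remaining -= 8
			continue
		}
		if remaining <= 0 {
			return true
		}
		return bits.LeadingZeros8(b) >= remaining
	}
	return true
}

func main() {
	const difficulty = 4 // default difficulty: ~16 attempts on average
	for nonce := 0; ; nonce++ {
		candidate := []byte(fmt.Sprintf("challenge-payload:%d", nonce))
		if hasLeadingZeroBits(candidate, difficulty) {
			fmt.Printf("solved with nonce=%d\n", nonce)
			return
		}
	}
}
```

Each additional bit of difficulty doubles the expected number of loop iterations, which is why the 4-30 bit range spans sub-second to multi-second solve times.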
## Server Architecture

### TCP Server with Per-Connection Goroutines

**Choice**: Custom TCP server with one goroutine per connection

```go
func (s *TCPServer) Start(ctx context.Context) error {
	// Start listener
	listener, err := net.Listen("tcp", s.config.Address)
	if err != nil {
		return err
	}
	s.listener = listener

	// Start accept loop in goroutine
	go s.acceptLoop(ctx)

	return nil // Returns immediately
}

func (s *TCPServer) acceptLoop(ctx context.Context) {
	for {
		conn, err := s.listener.Accept()
		if err != nil {
			return // listener closed during shutdown or fatal accept error
		}

		// Stop accepting new work once the context is cancelled
		select {
		case <-ctx.Done():
			conn.Close()
			return
		default:
		}

		// Launch handler in goroutine with WaitGroup tracking
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			s.handleConnection(ctx, conn)
		}()
	}
}
```

**Benefits**:
- **Concurrency**: Each connection is handled in a separate goroutine
- **Non-blocking Start**: The server starts in the background and returns immediately
- **Graceful Shutdown**: The WaitGroup ensures all connections finish before stopping
- **Context Cancellation**: Proper cleanup when the context is cancelled
- **Resource Control**: Connection timeouts prevent resource exhaustion

**Alternative Considered**: HTTP/REST API
- **Rejected Because**: Test task requirements

### Connection Security: Multi-Level Timeouts

**Choice**: Layered timeout protection against various attacks

1. **Connection Timeout (15s)**: Maximum total connection lifetime
2. **Read Timeout (5s)**: Maximum time between incoming bytes
3. **Write Timeout (5s)**: Maximum time to send a response

**Protects Against**:
- **Slowloris**: The read timeout cuts off clients that trickle in bytes
- **Slow POST**: The connection timeout limits total request time
- **Resource Exhaustion**: Automatic cleanup of stale connections

## Configuration Management

### cleanenv with YAML + Environment Variables

**Choice**: File-based configuration with environment variable overrides

```yaml
# config.yaml
server:
  address: ":8080"
pow:
  difficulty: 4
```

```bash
# Environment override
export POW_DIFFICULTY=8
```

**Benefits**:
- **Development**: Easy configuration files for local development
- **Production**: Environment variables for containerized deployments
- **Validation**: Built-in validation and type conversion
- **Documentation**: Self-documenting with struct tags

**Alternative Considered**: Pure environment variables
- **Rejected Because**: Harder to manage complex configurations

## Observability Architecture

### Prometheus Metrics

**Choice**: Prometheus-format metrics with essential measurements

**Application Metrics**:
- `wisdom_requests_total` - All incoming requests
- `wisdom_request_errors_total{error_type}` - Errors by type
- `wisdom_request_duration_seconds` - Request processing time
- `wisdom_quotes_served_total` - Successfully served quotes

**Go Runtime Metrics** (automatically exported):
- `go_memstats_*` - Memory allocation and GC statistics
- `go_goroutines` - Current number of goroutines
- `go_gc_duration_seconds` - Garbage collection duration
- `process_*` - Process-level CPU, memory, and file descriptor stats

**Design Principle**: Simple metrics that provide actionable insights
- **Avoided**: Complex multi-dimensional metrics
- **Focus**: Essential health and performance indicators
- **Runtime Visibility**: The Go collector provides deep runtime observability

### Metrics at Infrastructure Layer

**Choice**: Collect metrics in the TCP server, not in business logic (a declaration sketch follows this section)

```go
// In TCP server (infrastructure)
metrics.RequestsTotal.Inc()
start := time.Now()

response, err := s.wisdomApplication.HandleMessage(ctx, msg)

metrics.RequestDuration.Observe(time.Since(start).Seconds())
```

**Benefits**:
- **Separation of Concerns**: Business logic stays pure
- **Consistency**: All requests are measured the same way
- **Performance**: Minimal overhead in the critical path
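The snippet above refers to counters and a histogram by name. As a rough illustration, here is how they might be declared with the Prometheus Go client (`github.com/prometheus/client_golang`); the package layout, variable names, and bucket choice are assumptions, not the project's actual `metrics` package:

```go
// Package metrics: illustrative declarations only; the real package may differ.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// RequestsTotal counts every incoming request.
	RequestsTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "wisdom_requests_total",
		Help: "All incoming requests.",
	})

	// RequestErrorsTotal counts errors, labelled by error type.
	RequestErrorsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "wisdom_request_errors_total",
		Help: "Errors by type.",
	}, []string{"error_type"})

	// RequestDuration tracks request processing time in seconds.
	RequestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "wisdom_request_duration_seconds",
		Help:    "Request processing time.",
		Buckets: prometheus.DefBuckets,
	})

	// QuotesServedTotal counts successfully served quotes.
	QuotesServedTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "wisdom_quotes_served_total",
		Help: "Successfully served quotes.",
	})
)
```

Registering through `promauto` uses the default registry, which by default also exposes the Go runtime and process collectors behind the `go_*` and `process_*` metrics listed above.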
## Design Patterns

### Dependency Injection

All major components use constructor injection:

```go
server := server.NewTCPServer(wisdomApplication, config, options...)
service := service.NewWisdomService(generator, verifier, quoteService)
```

**Benefits**:
- **Testing**: Easy to inject mocks and stubs
- **Configuration**: Runtime assembly of components
- **Decoupling**: Components don't know about concrete implementations

### Interface Segregation

Small, focused interfaces for easy testing:

```go
type ChallengeGenerator interface {
	GenerateChallenge(ctx context.Context) (*Challenge, error)
}

type QuoteService interface {
	GetQuote(ctx context.Context) (string, error)
}
```

### Functional Options

Flexible configuration with sensible defaults:

```go
server := NewTCPServer(application, config,
	WithLogger(logger),
)
```

### Clean Architecture Implementation

See the layer diagram in the Overall Architecture section above for package organization.

## Testing Architecture

### Layered Testing Strategy

1. **Unit Tests**: Each package tested independently with mocks
2. **Integration Tests**: End-to-end tests with real TCP connections
3. **Benchmark Tests**: Performance validation for PoW algorithms

```go
// Unit test with mocks
func TestWisdomService_HandleMessage(t *testing.T) {
	mockGenerator := &MockGenerator{}
	mockVerifier := &MockVerifier{}
	mockQuotes := &MockQuoteService{}

	service := NewWisdomService(mockGenerator, mockVerifier, mockQuotes)
	// Test business logic in isolation
}

// Integration test with real components
func TestTCPServer_SlowlorisProtection(t *testing.T) {
	// Start real server, make slow connection
	// Verify server doesn't hang
}
```

## Security Architecture

### Defense in Depth

Multiple security layers working together (a signing and verification sketch follows this section):

1. **HMAC Authentication**: Prevents challenge tampering
2. **Timestamp Validation**: Prevents replay attacks (5-minute TTL)
3. **Connection Timeouts**: Prevents resource exhaustion
4. **Proof-of-Work**: Rate limiting through computational cost
5. **Input Validation**: All protocol messages are validated

### Threat Model

**Primary Threats Addressed**:
- **DDoS Attacks**: PoW makes attacks expensive
- **Resource Exhaustion**: Connection timeouts and limits
- **Protocol Attacks**: Binary framing prevents message confusion
- **Replay Attacks**: Timestamp validation in challenges

**Threats NOT Addressed** (by design):
- **Authentication**: Public service, no user accounts
- **Authorization**: All valid solutions get quotes
- **Data Confidentiality**: Quotes are public information
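As an illustration of the first two layers (HMAC authentication and timestamp validation), here is a minimal sketch of signing and verifying the challenge fields shown earlier. The field serialization, helper names, and secret handling are assumptions rather than the project's actual implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

const challengeTTL = 5 * time.Minute // matches the 5-minute replay window

// signChallenge computes an HMAC-SHA256 over the challenge fields.
// The "|"-separated layout is an illustrative choice, not the real wire format.
func signChallenge(secret []byte, target string, ts int64, difficulty int, random string) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s|%d|%d|%s", target, ts, difficulty, random)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyChallenge rejects expired challenges and checks the signature in
// constant time, so tampered or replayed challenges are refused.
func verifyChallenge(secret []byte, target string, ts int64, difficulty int, random, signature string) error {
	if time.Since(time.Unix(ts, 0)) > challengeTTL {
		return errors.New("challenge expired")
	}
	expected := signChallenge(secret, target, ts, difficulty, random)
	if !hmac.Equal([]byte(expected), []byte(signature)) {
		return errors.New("invalid signature")
	}
	return nil
}

func main() {
	secret := []byte("server-side-secret") // illustrative only
	ts := time.Now().Unix()
	sig := signChallenge(secret, "quotes", ts, 4, "random-entropy")
	fmt.Println(verifyChallenge(secret, "quotes", ts, 4, "random-entropy", sig)) // <nil>
}
```

Because the signature covers every field, a client cannot lower the difficulty or extend the timestamp without invalidating the challenge, and the server needs no per-challenge state to check either property.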
## Trade-offs Made

### Simplicity vs Performance
- **Chose**: Simple JSON payloads over binary serialization
- **Trade-off**: ~30% larger messages for easier debugging and maintenance

### Memory vs CPU
- **Chose**: Stateless challenges requiring CPU verification
- **Trade-off**: More CPU per request for better scalability

### Flexibility vs Optimization
- **Chose**: Interface-based design with dependency injection
- **Trade-off**: Small runtime overhead for much better testability

### Features vs Complexity
- **Chose**: Essential features only (no rate limiting, user accounts, etc.)
- **Benefit**: Clean, focused implementation that does one thing well

## Future Architecture Considerations

For production scaling, consider:

1. **Quote Service Enhancement**: Caching, fallback quotes, multiple API sources
2. **Load Balancing**: Multiple server instances behind a load balancer
3. **Rate Limiting**: Per-IP request limiting for additional protection
4. **Monitoring**: Full observability stack (Prometheus, Grafana, alerting)
5. **Security**: TLS encryption for sensitive deployments

The current architecture provides a solid foundation for these enhancements while maintaining simplicity and focus.