# Architecture Choices
This document explains the key architectural decisions made in the Hash of Wisdom project and the reasoning behind them.
## Overall Architecture
### Clean Architecture
We follow Clean Architecture principles with clear layer separation:
```
┌─────────────────────────┐
│ Infrastructure Layer    │ ← cmd/, internal/server, internal/protocol
├─────────────────────────┤
│ Application Layer       │ ← internal/application (message handling)
├─────────────────────────┤
│ Domain Layer            │ ← internal/service, internal/pow (business logic)
├─────────────────────────┤
│ External Layer          │ ← internal/quotes (external APIs)
└─────────────────────────┘
```
**Benefits**:
- **Testability**: Each layer can be unit tested independently
- **Maintainability**: Changes in one layer don't cascade
- **Flexibility**: Easy to swap implementations (e.g., different quote sources)
- **Domain Focus**: Core business rules are isolated and protected
## Protocol Design
### Binary Protocol with JSON Payloads
Choice: Custom binary protocol with JSON-encoded message bodies (a framing sketch closes this subsection)
**Why Binary Protocol**:
- **Performance**: Efficient framing and length prefixes
- **Reliability**: Clear message boundaries prevent parsing issues
- **Extensibility**: Easy to add message types and versions
**Why JSON Payloads**:
- **Simplicity**: Standard library support, easy debugging
- **Flexibility**: Schema evolution without breaking compatibility
- **Tooling**: Excellent tooling and human readability
**Alternative Considered**: Pure binary (Protocol Buffers)
- **Rejected Because**: Added complexity without significant benefit for our use case
- **Trade-off**: Slightly larger payload size for much simpler implementation
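To make the framing concrete, here is a minimal sketch of a length-prefixed JSON frame. The 4-byte big-endian length prefix and the `writeFrame` helper are illustrative assumptions; the actual wire format is defined in `internal/protocol`.
```go
import (
	"encoding/binary"
	"encoding/json"
	"io"
)

// writeFrame marshals the payload to JSON and prefixes it with its length,
// giving the reader an unambiguous message boundary.
func writeFrame(w io.Writer, payload any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	var prefix [4]byte
	binary.BigEndian.PutUint32(prefix[:], uint32(len(body)))
	if _, err := w.Write(prefix[:]); err != nil {
		return err
	}
	_, err = w.Write(body)
	return err
}
```
The reader reverses the steps: read the prefix, read exactly that many bytes, then `json.Unmarshal` the body.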
### Stateless Challenge Design
Choice: HMAC-signed challenges with all state embedded
```go
type Challenge struct {
	Target     string `json:"target"`     // "quotes"
	Timestamp  int64  `json:"timestamp"`  // Unix timestamp
	Difficulty int    `json:"difficulty"` // Leading zero bits
	Random     string `json:"random"`     // Entropy
	Signature  string `json:"signature"`  // HMAC-SHA256
}
```
**Benefits**:
- **Scalability**: No server-side session storage required
- **Reliability**: Challenges survive server restarts
- **Security**: HMAC prevents tampering; the embedded timestamp bounds replay (see the signing sketch below)
- **Simplicity**: No cache management or cleanup needed
**Alternative Considered**: Session-based challenges
- **Rejected Because**: Requires distributed session management for horizontal scaling
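For illustration, a minimal signing sketch, assuming the secret comes from server configuration and the signature covers every challenge field. The helper name and field encoding are hypothetical; the real code lives in `internal/pow`.
```go
import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signChallenge computes an HMAC-SHA256 over all challenge fields, so any
// change to target, timestamp, difficulty, or random invalidates the signature.
func signChallenge(secret []byte, c *Challenge) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s|%d|%d|%s", c.Target, c.Timestamp, c.Difficulty, c.Random)
	return hex.EncodeToString(mac.Sum(nil))
}
```
Because the server can recompute this value from the challenge alone, no per-client state needs to be stored.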
## Proof-of-Work Algorithm
### SHA-256 with Leading Zero Bits
Choice: SHA-256 hashing with difficulty measured as leading zero bits
**Why SHA-256**:
- **Security**: Cryptographically secure, extensively tested
- **Performance**: Hardware-optimized on most platforms
- **Standardization**: Well-known algorithm with predictable properties
**Why Leading Zero Bits**:
- **Predictable Scaling**: Each additional bit doubles the expected work (~2^n attempts for n bits)
- **Simplicity**: Easy to verify and understand (see the check sketched below)
- **Flexibility**: Fine-grained difficulty adjustment
**Alternative Considered**: Scrypt/Argon2 (memory-hard functions)
- **Rejected Because**: Excessive complexity for DDoS protection use case
- **Trade-off**: ASIC resistance not needed for temporary challenges
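A minimal version of the verification rule, assuming the digest is a raw SHA-256 sum (the function name is illustrative):
```go
import "math/bits"

// hasLeadingZeroBits reports whether the digest begins with at least
// `difficulty` zero bits, scanning from the most significant byte.
func hasLeadingZeroBits(digest [32]byte, difficulty int) bool {
	zeros := 0
	for _, b := range digest {
		if b == 0 {
			zeros += 8
			continue
		}
		zeros += bits.LeadingZeros8(b)
		break
	}
	return zeros >= difficulty
}
```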
### Difficulty Range: 4-30 Bits
Choice: Configurable difficulty with reasonable bounds (a client solve loop is sketched after this list)
- **Minimum (4 bits)**: ~16 attempts average, sub-second solve time
- **Maximum (30 bits)**: ~1 billion attempts, several seconds on modern CPU
- **Default (4 bits)**: Balance between protection and user experience
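To show what these bounds mean for a client, here is a hypothetical solve loop; the nonce encoding is an assumption, and `hasLeadingZeroBits` is the check sketched above. With difficulty d the loop runs about 2^d iterations on average: ~16 hashes at the minimum, ~2^30 ≈ 10^9 at the maximum.
```go
import (
	"crypto/sha256"
	"strconv"
)

// solve brute-forces a nonce whose SHA-256(challenge || nonce) digest has the
// required number of leading zero bits.
func solve(challenge []byte, difficulty int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		input := append([]byte{}, challenge...)
		input = append(input, strconv.FormatUint(nonce, 10)...)
		if hasLeadingZeroBits(sha256.Sum256(input), difficulty) {
			return nonce
		}
	}
}
```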
## Server Architecture
### TCP Server with Per-Connection Goroutines
Choice: Custom TCP server with one goroutine per connection
```go
func (s *TCPServer) Start(ctx context.Context) error {
	// Open the listener and keep it on the server for the accept loop and shutdown
	listener, err := net.Listen("tcp", s.config.Address)
	if err != nil {
		return err
	}
	s.listener = listener
	// Start accept loop in goroutine
	go s.acceptLoop(ctx)
	return nil // Returns immediately
}

func (s *TCPServer) acceptLoop(ctx context.Context) {
	for {
		conn, err := s.listener.Accept()
		if err != nil {
			return // listener closed or fatal accept error
		}
		if ctx.Err() != nil {
			conn.Close() // shutting down: drop the just-accepted connection
			return
		}
		// Launch handler in goroutine with WaitGroup tracking
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			s.handleConnection(ctx, conn)
		}()
	}
}
```
**Benefits**:
- **Concurrency**: Each connection handled in separate goroutine
- **Non-blocking Start**: Server starts in background, returns immediately
- **Graceful Shutdown**: WaitGroup ensures all connections finish before stop
- **Context Cancellation**: Proper cleanup when context is cancelled
- **Resource Control**: Connection timeouts prevent resource exhaustion
**Alternative Considered**: HTTP/REST API
- **Rejected Because**: The test task requirements call for a raw TCP protocol
### Connection Security: Multi-Level Timeouts
Choice: Layered timeout protection against various attacks (deadline wiring is sketched after this list)
1. **Connection Timeout (15s)**: Maximum total connection lifetime
2. **Read Timeout (5s)**: Maximum time between incoming bytes
3. **Write Timeout (5s)**: Maximum time to send response
**Protects Against**:
- **Slowloris**: Slow read timeout prevents slow header attacks
- **Slow POST**: Connection timeout limits total request time
- **Resource Exhaustion**: Automatic cleanup of stale connections
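A sketch of how these limits can be applied to a `net.Conn`; the constants mirror the values above, while the helper names are assumptions (the actual wiring lives in `internal/server`).
```go
import (
	"net"
	"time"
)

const (
	connTimeout  = 15 * time.Second // hard ceiling on total connection lifetime
	readTimeout  = 5 * time.Second  // max wait for the next incoming bytes
	writeTimeout = 5 * time.Second  // max time to flush a response
)

// deadlines returns helpers that refresh per-operation deadlines while never
// extending them past the absolute connection deadline.
func deadlines(conn net.Conn) (beforeRead, beforeWrite func()) {
	hardStop := time.Now().Add(connTimeout)
	clamp := func(d time.Time) time.Time {
		if d.After(hardStop) {
			return hardStop
		}
		return d
	}
	beforeRead = func() { conn.SetReadDeadline(clamp(time.Now().Add(readTimeout))) }
	beforeWrite = func() { conn.SetWriteDeadline(clamp(time.Now().Add(writeTimeout))) }
	return beforeRead, beforeWrite
}
```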
## Configuration Management
### cleanenv with YAML + Environment Variables
Choice: File-based configuration with environment variable overrides
```yaml
# config.yaml
server:
  address: ":8080"
pow:
  difficulty: 4
```
```bash
# Environment override
export POW_DIFFICULTY=8
```
**Benefits**:
- **Development**: Easy configuration files for local development
- **Production**: Environment variables for containerized deployments
- **Validation**: Built-in validation and type conversion
- **Documentation**: Self-documenting with struct tags (see the struct sketch below)
**Alternative Considered**: Pure environment variables
- **Rejected Because**: Harder to manage complex configurations
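An illustrative cleanenv-style struct matching the YAML and override shown above; field names and the `SERVER_ADDRESS` key are assumptions, not the project's actual definitions.
```go
import "github.com/ilyakaznacheev/cleanenv"

// Config maps config.yaml onto typed fields; env tags let variables such as
// POW_DIFFICULTY override the file values in containerized deployments.
type Config struct {
	Server struct {
		Address string `yaml:"address" env:"SERVER_ADDRESS" env-default:":8080"`
	} `yaml:"server"`
	PoW struct {
		Difficulty int `yaml:"difficulty" env:"POW_DIFFICULTY" env-default:"4"`
	} `yaml:"pow"`
}

// loadConfig reads the file first, then applies environment overrides.
func loadConfig(path string) (*Config, error) {
	var cfg Config
	if err := cleanenv.ReadConfig(path, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```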
## Observability Architecture
### Prometheus Metrics
Choice: Prometheus format metrics with essential measurements (declarations are sketched at the end of this subsection)
**Application Metrics**:
- `wisdom_requests_total` - All incoming requests
- `wisdom_request_errors_total{error_type}` - Errors by type
- `wisdom_request_duration_seconds` - Request processing time
- `wisdom_quotes_served_total` - Successfully served quotes
**Go Runtime Metrics** (automatically exported):
- `go_memstats_*` - Memory allocation and GC statistics
- `go_goroutines` - Current number of goroutines
- `go_gc_duration_seconds` - Garbage collection duration
- `process_*` - Process-level CPU, memory, and file descriptor stats
**Design Principle**: Simple metrics that provide actionable insights
- **Avoided**: Complex multi-dimensional metrics
- **Focus**: Essential health and performance indicators
- **Runtime Visibility**: Go collector provides deep runtime observability
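How these metrics could be declared with the Prometheus Go client (`prometheus`/`promauto`); the metric names match the list above, while the variable names and label set are assumptions.
```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	RequestsTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "wisdom_requests_total",
		Help: "All incoming requests.",
	})
	RequestErrorsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "wisdom_request_errors_total",
		Help: "Errors by type.",
	}, []string{"error_type"})
	RequestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name: "wisdom_request_duration_seconds",
		Help: "Request processing time.",
	})
	QuotesServedTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "wisdom_quotes_served_total",
		Help: "Successfully served quotes.",
	})
)
```
The Go runtime and process metrics listed above come for free from the collectors registered by the client library's default registry.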
### Metrics at Infrastructure Layer
Choice: Collect metrics in TCP server, not business logic
```go
// In TCP server (infrastructure)
metrics.RequestsTotal.Inc()
start := time.Now()
response, err := s.wisdomApplication.HandleMessage(ctx, msg)
metrics.RequestDuration.Observe(time.Since(start).Seconds())
```
**Benefits**:
- **Separation of Concerns**: Business logic stays pure
- **Consistency**: All requests measured the same way
- **Performance**: Minimal overhead in critical path
## Design Patterns
### Dependency Injection
All major components use constructor injection:
```go
server := server.NewTCPServer(wisdomApplication, config, options...)
service := service.NewWisdomService(generator, verifier, quoteService)
```
**Benefits**:
- **Testing**: Easy to inject mocks and stubs
- **Configuration**: Runtime assembly of components
- **Decoupling**: Components don't know about concrete implementations
### Interface Segregation
Small, focused interfaces for easy testing:
```go
type ChallengeGenerator interface {
	GenerateChallenge(ctx context.Context) (*Challenge, error)
}

type QuoteService interface {
	GetQuote(ctx context.Context) (string, error)
}
```
### Functional Options
Flexible configuration with sensible defaults:
```go
server := NewTCPServer(application, config,
	WithLogger(logger),
)
```
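The pattern behind that call, sketched under the assumption of a `*slog.Logger` dependency; the `Option` type, field names, and default value are illustrative.
```go
import "log/slog"

// Option mutates the server after defaults have been applied.
type Option func(*TCPServer)

func WithLogger(logger *slog.Logger) Option {
	return func(s *TCPServer) { s.logger = logger }
}

func NewTCPServer(app Application, cfg Config, opts ...Option) *TCPServer {
	s := &TCPServer{app: app, config: cfg, logger: slog.Default()} // sensible defaults first
	for _, opt := range opts {
		opt(s) // each option overrides one default
	}
	return s
}
```
Callers only pass the options they care about, so new knobs can be added without breaking existing constructor calls.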
### Clean Architecture Implementation
See the layer diagram in the Overall Architecture section above for package organization.
## Testing Architecture
### Layered Testing Strategy
1. **Unit Tests**: Each package tested independently with mocks
2. **Integration Tests**: End-to-end tests with real TCP connections
3. **Benchmark Tests**: Performance validation for PoW algorithms
```go
// Unit test with mocks
func TestWisdomService_HandleMessage(t *testing.T) {
	mockGenerator := &MockGenerator{}
	mockVerifier := &MockVerifier{}
	mockQuotes := &MockQuoteService{}
	service := NewWisdomService(mockGenerator, mockVerifier, mockQuotes)
	// Test business logic in isolation
	_ = service
}

// Integration test with real components
func TestTCPServer_SlowlorisProtection(t *testing.T) {
	// Start real server, make slow connection
	// Verify server doesn't hang
}
```
## Security Architecture
### Defense in Depth
Multiple security layers working together:
1. **HMAC Authentication**: Prevents challenge tampering
2. **Timestamp Validation**: Prevents replay attacks (5-minute TTL; verification sketched after this list)
3. **Connection Timeouts**: Prevents resource exhaustion
4. **Proof-of-Work**: Rate limiting through computational cost
5. **Input Validation**: All protocol messages validated
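Layers 1 and 2 can be seen together in a verification sketch; the constant and error messages are illustrative, and `signChallenge` is the helper sketched earlier.
```go
import (
	"crypto/hmac"
	"errors"
	"time"
)

const challengeTTL = 5 * time.Minute

// verifyChallenge recomputes the HMAC (compared in constant time) and rejects
// challenges issued more than challengeTTL ago, bounding the replay window.
func verifyChallenge(secret []byte, c *Challenge) error {
	expected := signChallenge(secret, c)
	if !hmac.Equal([]byte(expected), []byte(c.Signature)) {
		return errors.New("challenge signature mismatch")
	}
	if time.Since(time.Unix(c.Timestamp, 0)) > challengeTTL {
		return errors.New("challenge expired")
	}
	return nil
}
```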
### Threat Model
**Primary Threats Addressed**:
- **DDoS Attacks**: PoW makes attacks expensive
- **Resource Exhaustion**: Connection timeouts and limits
- **Protocol Attacks**: Binary framing prevents confusion
- **Replay Attacks**: Timestamp validation in challenges
**Threats NOT Addressed** (by design):
- **Authentication**: Public service, no user accounts
- **Authorization**: All valid solutions get quotes
- **Data Confidentiality**: Quotes are public information
## Trade-offs Made
### Simplicity vs Performance
- **Chose**: Simple JSON payloads over binary serialization
- **Trade-off**: ~30% larger messages for easier debugging and maintenance
### Memory vs CPU
- **Chose**: Stateless challenges requiring CPU verification
- **Trade-off**: More CPU per request for better scalability
### Flexibility vs Optimization
- **Chose**: Interface-based design with dependency injection
- **Trade-off**: Small runtime overhead for much better testability
### Features vs Complexity
- **Chose**: Essential features only (no rate limiting, user accounts, etc.)
- **Benefit**: Clean, focused implementation that does one thing well
## Future Architecture Considerations
For production scaling, consider:
1. **Quote Service Enhancement**: Caching, fallback quotes, multiple API sources
2. **Load Balancing**: Multiple server instances behind load balancer
3. **Rate Limiting**: Per-IP request limiting for additional protection
4. **Monitoring**: Full observability stack (Prometheus, Grafana, alerting)
5. **Security**: TLS encryption for sensitive deployments
The current architecture provides a solid foundation for these enhancements while maintaining simplicity and focus.