Architecture Choices

This document explains the key architectural decisions made in the Hash of Wisdom project and the reasoning behind them.

Overall Architecture

Clean Architecture

We follow Clean Architecture principles with clear layer separation:

┌─────────────────────────────────────┐
│         Infrastructure Layer        │  ← cmd/, internal/server, internal/protocol
├─────────────────────────────────────┤
│         Application Layer           │  ← internal/application (message handling)
├─────────────────────────────────────┤
│           Domain Layer              │  ← internal/service, internal/pow (business logic)
├─────────────────────────────────────┤
│          External Layer             │  ← internal/quotes (external APIs)
└─────────────────────────────────────┘

Benefits:

  • Testability: Each layer can be unit tested independently
  • Maintainability: Changes in one layer don't cascade
  • Flexibility: Easy to swap implementations (e.g., different quote sources)
  • Domain Focus: Core business rules are isolated and protected

Protocol Design

Binary Protocol with JSON Payloads

Choice: Custom binary protocol with JSON-encoded message bodies

Why Binary Protocol:

  • Performance: Efficient framing and length prefixes
  • Reliability: Clear message boundaries prevent parsing issues
  • Extensibility: Easy to add message types and versions

Why JSON Payloads:

  • Simplicity: Standard library support, easy debugging
  • Flexibility: Schema evolution without breaking compatibility
  • Tooling: Excellent tooling and human readability

Alternative Considered: Pure binary (Protocol Buffers)

  • Rejected Because: Added complexity without significant benefit for our use case
  • Trade-off: Slightly larger payload size for much simpler implementation
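
For illustration only, a frame could be written roughly as in the sketch below: a length prefix, a one-byte message type, and the JSON body. The field widths and the writeFrame helper are assumptions, not the project's actual wire layout.

// Hypothetical frame layout: [4-byte big-endian length][1-byte type][JSON body].
// Field widths here are illustrative only.
func writeFrame(w io.Writer, msgType byte, payload any) error {
    body, err := json.Marshal(payload)
    if err != nil {
        return err
    }

    frame := make([]byte, 4+1+len(body))
    binary.BigEndian.PutUint32(frame[:4], uint32(1+len(body))) // length covers type + body
    frame[4] = msgType
    copy(frame[5:], body)

    _, err = w.Write(frame)
    return err
}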

Stateless Challenge Design

Choice: HMAC-signed challenges with all state embedded

type Challenge struct {
    Target     string `json:"target"`      // "quotes"
    Timestamp  int64  `json:"timestamp"`   // Unix timestamp
    Difficulty int    `json:"difficulty"`  // Leading zero bits
    Random     string `json:"random"`      // Entropy
    Signature  string `json:"signature"`   // HMAC-SHA256
}

Benefits:

  • Scalability: No server-side session storage required
  • Reliability: Challenges survive server restarts
  • Security: HMAC prevents tampering; the embedded timestamp limits replay
  • Simplicity: No cache management or cleanup needed

Alternative Considered: Session-based challenges

  • Rejected Because: Requires distributed session management for horizontal scaling
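
A minimal sketch of how such a challenge could be signed, assuming the server holds a single secret key; the field serialization and the signChallenge name are illustrative.

// signChallenge binds all challenge fields to an HMAC-SHA256 tag so the
// server can later verify a returned challenge without storing any state.
// The "target|timestamp|difficulty|random" serialization is illustrative.
func signChallenge(secret []byte, c *Challenge) string {
    mac := hmac.New(sha256.New, secret)
    fmt.Fprintf(mac, "%s|%d|%d|%s", c.Target, c.Timestamp, c.Difficulty, c.Random)
    return hex.EncodeToString(mac.Sum(nil))
}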

Proof-of-Work Algorithm

SHA-256 with Leading Zero Bits

Choice: SHA-256 hashing with difficulty measured as leading zero bits

Why SHA-256:

  • Security: Cryptographically secure, extensively tested
  • Performance: Hardware-optimized on most platforms
  • Standardization: Well-known algorithm with predictable properties

Why Leading Zero Bits:

  • Exponential Scaling: Each additional bit doubles the expected work (~2^n attempts on average)
  • Simplicity: Easy to verify and understand
  • Flexibility: Fine-grained difficulty adjustment

Alternative Considered: Scrypt/Argon2 (memory-hard functions)

  • Rejected Because: Excessive complexity for DDoS protection use case
  • Trade-off: ASIC resistance not needed for temporary challenges
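
A sketch of leading-zero-bit checking and a brute-force solve loop; the challenge-plus-nonce input format is an assumption for illustration.

// hasLeadingZeroBits reports whether SHA-256(data) starts with at least
// `difficulty` zero bits.
func hasLeadingZeroBits(data []byte, difficulty int) bool {
    sum := sha256.Sum256(data)
    zeros := 0
    for _, b := range sum {
        if b == 0 {
            zeros += 8
            continue
        }
        zeros += bits.LeadingZeros8(b) // math/bits
        break
    }
    return zeros >= difficulty
}

// The client brute-forces a nonce until the digest meets the difficulty.
// The "challenge:nonce" concatenation is illustrative.
func solve(challenge string, difficulty int) uint64 {
    for nonce := uint64(0); ; nonce++ {
        if hasLeadingZeroBits([]byte(fmt.Sprintf("%s:%d", challenge, nonce)), difficulty) {
            return nonce
        }
    }
}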

Difficulty Range: 4-30 Bits

Choice: Configurable difficulty with reasonable bounds

  • Minimum (4 bits): ~16 attempts average, sub-second solve time
  • Maximum (30 bits): ~1 billion attempts, several seconds on modern CPU
  • Default (4 bits): Balance between protection and user experience

Server Architecture

TCP Server with Per-Connection Goroutines

Choice: Custom TCP server with one goroutine per connection

func (s *TCPServer) Start(ctx context.Context) error {
    // Start listener
    listener, err := net.Listen("tcp", s.config.Address)
    if err != nil {
        return err
    }
    s.listener = listener

    // Start accept loop in a background goroutine
    go s.acceptLoop(ctx)
    return nil // Returns immediately
}

func (s *TCPServer) acceptLoop(ctx context.Context) {
    for {
        conn, err := s.listener.Accept()
        if err != nil {
            return // listener closed or fatal accept error
        }

        // Stop handling new connections once the context is cancelled
        select {
        case <-ctx.Done():
            conn.Close()
            return
        default:
        }

        // Launch handler in goroutine with WaitGroup tracking
        s.wg.Add(1)
        go func() {
            defer s.wg.Done()
            s.handleConnection(ctx, conn)
        }()
    }
}

Benefits:

  • Concurrency: Each connection handled in separate goroutine
  • Non-blocking Start: Server starts in background, returns immediately
  • Graceful Shutdown: WaitGroup ensures all connections finish before stop
  • Context Cancellation: Proper cleanup when context is cancelled
  • Resource Control: Connection timeouts prevent resource exhaustion
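
A graceful Stop can then be as small as the sketch below, reusing the listener and WaitGroup from the sketch above; the actual implementation may differ.

// Stop closes the listener so acceptLoop's Accept returns an error and the
// loop exits, then waits for all tracked connection handlers to finish.
func (s *TCPServer) Stop() error {
    err := s.listener.Close()
    s.wg.Wait()
    return err
}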

Alternative Considered: HTTP/REST API

  • Rejected Because: The test task requirements specify a custom TCP protocol

Connection Security: Multi-Level Timeouts

Choice: Layered timeout protection against various attacks

  1. Connection Timeout (15s): Maximum total connection lifetime
  2. Read Timeout (5s): Maximum time between incoming bytes
  3. Write Timeout (5s): Maximum time to send response

Protects Against:

  • Slowloris: Read timeout cuts off clients that trickle bytes to hold connections open
  • Slow POST-style attacks: Connection timeout caps total request lifetime
  • Resource Exhaustion: Automatic cleanup of stale connections
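
These layers map onto net.Conn deadlines roughly as in the sketch below; the config field names and the readMessage/writeMessage helpers are assumptions.

// Sketch of how a connection handler can apply the three timeout levels.
func (s *TCPServer) handleConnection(ctx context.Context, conn net.Conn) {
    defer conn.Close()

    // 1. Connection timeout: absolute lifetime of the whole connection.
    conn.SetDeadline(time.Now().Add(s.config.ConnectionTimeout))

    for {
        // 2. Read timeout: bound the wait for the next incoming bytes.
        conn.SetReadDeadline(time.Now().Add(s.config.ReadTimeout))
        msg, err := readMessage(conn) // hypothetical framing helper
        if err != nil {
            return // timeout, EOF, or malformed frame
        }

        resp, err := s.wisdomApplication.HandleMessage(ctx, msg)
        if err != nil {
            return
        }

        // 3. Write timeout: bound the time spent sending the response.
        conn.SetWriteDeadline(time.Now().Add(s.config.WriteTimeout))
        if err := writeMessage(conn, resp); err != nil { // hypothetical helper
            return
        }
    }
}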

Configuration Management

cleanenv with YAML + Environment Variables

Choice: File-based configuration with environment variable overrides

# config.yaml
server:
  address: ":8080"

pow:
  difficulty: 4

# Environment override
export POW_DIFFICULTY=8
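
The matching Go structs could look roughly like this with cleanenv struct tags; apart from POW_DIFFICULTY, the field, tag, and environment variable names are assumptions.

// Illustrative config structs; field and tag names are assumptions.
type Config struct {
    Server ServerConfig `yaml:"server"`
    PoW    PoWConfig    `yaml:"pow"`
}

type ServerConfig struct {
    Address string `yaml:"address" env:"SERVER_ADDRESS" env-default:":8080"`
}

type PoWConfig struct {
    Difficulty int `yaml:"difficulty" env:"POW_DIFFICULTY" env-default:"4"`
}

func LoadConfig(path string) (*Config, error) {
    var cfg Config
    // cleanenv reads the YAML file, then applies environment overrides.
    if err := cleanenv.ReadConfig(path, &cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}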

Benefits:

  • Development: Easy configuration files for local development
  • Production: Environment variables for containerized deployments
  • Validation: Built-in validation and type conversion
  • Documentation: Self-documenting with struct tags

Alternative Considered: Pure environment variables

  • Rejected Because: Harder to manage complex configurations

Observability Architecture

Prometheus Metrics

Choice: Prometheus format metrics with essential measurements

Application Metrics:

  • wisdom_requests_total - All incoming requests
  • wisdom_request_errors_total{error_type} - Errors by type
  • wisdom_request_duration_seconds - Request processing time
  • wisdom_quotes_served_total - Successfully served quotes
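
These can be declared with the standard client_golang promauto helpers, roughly as below; the metric names match the list above, while the Go identifiers and help strings are illustrative.

// Metric definitions (sketch); names mirror the list above.
var (
    RequestsTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "wisdom_requests_total",
        Help: "All incoming requests.",
    })

    RequestErrorsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "wisdom_request_errors_total",
        Help: "Request errors by type.",
    }, []string{"error_type"})

    RequestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "wisdom_request_duration_seconds",
        Help:    "Request processing time.",
        Buckets: prometheus.DefBuckets,
    })

    QuotesServedTotal = promauto.NewCounter(prometheus.CounterOpts{
        Name: "wisdom_quotes_served_total",
        Help: "Successfully served quotes.",
    })
)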

Go Runtime Metrics (automatically exported):

  • go_memstats_* - Memory allocation and GC statistics
  • go_goroutines - Current number of goroutines
  • go_gc_duration_seconds - Garbage collection duration
  • process_* - Process-level CPU, memory, and file descriptor stats

Design Principle: Simple metrics that provide actionable insights

  • Avoided: Complex multi-dimensional metrics
  • Focus: Essential health and performance indicators
  • Runtime Visibility: Go collector provides deep runtime observability

Metrics at Infrastructure Layer

Choice: Collect metrics in TCP server, not business logic

// In TCP server (infrastructure)
metrics.RequestsTotal.Inc()
start := time.Now()
response, err := s.wisdomApplication.HandleMessage(ctx, msg)
metrics.RequestDuration.Observe(time.Since(start).Seconds())

Benefits:

  • Separation of Concerns: Business logic stays pure
  • Consistency: All requests measured the same way
  • Performance: Minimal overhead in critical path

Design Patterns

Dependency Injection

All major components use constructor injection:

server := server.NewTCPServer(wisdomApplication, config, options...)
service := service.NewWisdomService(generator, verifier, quoteService)

Benefits:

  • Testing: Easy to inject mocks and stubs
  • Configuration: Runtime assembly of components
  • Decoupling: Components don't know about concrete implementations

Interface Segregation

Small, focused interfaces for easy testing:

type ChallengeGenerator interface {
    GenerateChallenge(ctx context.Context) (*Challenge, error)
}

type QuoteService interface {
    GetQuote(ctx context.Context) (string, error)
}

Functional Options

Flexible configuration with sensible defaults:

server := NewTCPServer(application, config,
    WithLogger(logger),
)
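
Under the hood this is the usual functional options pattern; a minimal sketch, assuming the server keeps a logger field, uses log/slog, and a WisdomApplication interface (names are assumptions):

// Option mutates the server at construction time.
type Option func(*TCPServer)

// WithLogger overrides the default logger (field name is illustrative).
func WithLogger(logger *slog.Logger) Option {
    return func(s *TCPServer) {
        s.logger = logger
    }
}

func NewTCPServer(app WisdomApplication, config Config, opts ...Option) *TCPServer {
    s := &TCPServer{
        wisdomApplication: app,
        config:            config,
        logger:            slog.Default(), // sensible default
    }
    for _, opt := range opts {
        opt(s)
    }
    return s
}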

Clean Architecture Implementation

See the layer diagram in the Overall Architecture section above for package organization.

Testing Architecture

Layered Testing Strategy

  1. Unit Tests: Each package tested independently with mocks
  2. Integration Tests: End-to-end tests with real TCP connections
  3. Benchmark Tests: Performance validation for PoW algorithms

// Unit test with mocks
func TestWisdomService_HandleMessage(t *testing.T) {
    mockGenerator := &MockGenerator{}
    mockVerifier := &MockVerifier{}
    mockQuotes := &MockQuoteService{}

    service := NewWisdomService(mockGenerator, mockVerifier, mockQuotes)
    // Test business logic in isolation
}

// Integration test with real components
func TestTCPServer_SlowlorisProtection(t *testing.T) {
    // Start real server, make slow connection
    // Verify server doesn't hang
}

Security Architecture

Defense in Depth

Multiple security layers working together:

  1. HMAC Authentication: Prevents challenge tampering
  2. Timestamp Validation: Prevents replay attacks (5-minute TTL)
  3. Connection Timeouts: Prevents resource exhaustion
  4. Proof-of-Work: Rate limiting through computational cost
  5. Input Validation: All protocol messages validated
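
Layers 1 and 2 combine at verification time, roughly as in the sketch below; signChallenge refers to the earlier HMAC sketch, the verifyChallenge name is illustrative, and the 5-minute TTL matches the list above.

// verifyChallenge checks the HMAC signature (layer 1) and the timestamp
// TTL (layer 2) before any proof-of-work solution is evaluated.
func verifyChallenge(secret []byte, c *Challenge) error {
    expected := signChallenge(secret, c)
    if !hmac.Equal([]byte(expected), []byte(c.Signature)) {
        return errors.New("invalid challenge signature")
    }
    if time.Since(time.Unix(c.Timestamp, 0)) > 5*time.Minute {
        return errors.New("challenge expired")
    }
    return nil
}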

Threat Model

Primary Threats Addressed:

  • DDoS Attacks: PoW makes attacks expensive
  • Resource Exhaustion: Connection timeouts and limits
  • Protocol Attacks: Length-prefixed framing removes ambiguity about message boundaries
  • Replay Attacks: Timestamp validation in challenges

Threats NOT Addressed (by design):

  • Authentication: Public service, no user accounts
  • Authorization: All valid solutions get quotes
  • Data Confidentiality: Quotes are public information

Trade-offs Made

Simplicity vs Performance

  • Chose: Simple JSON payloads over binary serialization
  • Trade-off: ~30% larger messages for easier debugging and maintenance

Memory vs CPU

  • Chose: Stateless challenges requiring CPU verification
  • Trade-off: More CPU per request for better scalability

Flexibility vs Optimization

  • Chose: Interface-based design with dependency injection
  • Trade-off: Small runtime overhead for much better testability

Features vs Complexity

  • Chose: Essential features only (no rate limiting, user accounts, etc.)
  • Benefit: Clean, focused implementation that does one thing well

Future Architecture Considerations

For production scaling, consider:

  1. Quote Service Enhancement: Caching, fallback quotes, multiple API sources
  2. Load Balancing: Multiple server instances behind load balancer
  3. Rate Limiting: Per-IP request limiting for additional protection
  4. Monitoring: Full observability stack (Prometheus, Grafana, alerting)
  5. Security: TLS encryption for sensitive deployments

The current architecture provides a solid foundation for these enhancements while maintaining simplicity and focus.