diff --git a/README.md b/README.md new file mode 100644 index 0000000..18fffc3 --- /dev/null +++ b/README.md @@ -0,0 +1,87 @@ +# Hash of Wisdom + +A TCP server implementing the "Word of Wisdom" concept with proof-of-work challenges to protect against DDoS attacks. + +## Overview + +The Hash of Wisdom server requires clients to solve computational puzzles (proof-of-work) before receiving wise quotes. This approach prevents spam and DDoS attacks by requiring clients to invest CPU time for each request. + +## Quick Start + +### Prerequisites +- Go 1.24.3+ +- Docker (optional) + +### Building +```bash +# Build server +go build -o hash-of-wisdom ./cmd/server + +# Build client +go build -o client ./cmd/client +``` + +### Running +```bash +# Start server (uses config.yaml by default) +./hash-of-wisdom + +# Or with custom config +./hash-of-wisdom -config /path/to/config.yaml + +# Connect with client +./client -addr localhost:8080 +``` + +### Using Docker +```bash +# Build image +docker build -t hash-of-wisdom . 
+ +# Run container +docker run -p 8080:8080 -p 8081:8081 hash-of-wisdom +``` + +### Monitoring +- Metrics: http://localhost:8081/metrics (Prometheus format with Go runtime stats) +- Profiling: http://localhost:8081/debug/pprof/ + +## Documentation + +### Protocol & Implementation +- [Protocol Specification](docs/PROTOCOL.md) - Binary protocol definition +- [Implementation Plan](docs/IMPLEMENTATION.md) - Development phases and progress +- [Package Structure](docs/PACKAGES.md) - Code organization and package responsibilities +- [Architecture Choices](docs/ARCHITECTURE.md) - Design decisions and patterns + +### Production Readiness +- [Production Readiness Guide](docs/PRODUCTION_READINESS.md) - Requirements for production deployment + +## Algorithm Choice + +The server uses **SHA-256 based proof-of-work** with leading zero bits difficulty: +- **Why SHA-256**: Cryptographically secure, well-tested, hardware-optimized +- **Leading Zero Bits**: Simple difficulty scaling, easy verification +- **HMAC Authentication**: Prevents challenge tampering and replay attacks +- **Configurable Difficulty**: Adaptive to different threat levels (4-30 bits) + +This approach provides strong DDoS protection while remaining computationally reasonable for legitimate clients. + +## Current Status + +✅ **Complete**: Core functionality, TCP server, client, metrics, containerization +🔄 **In Progress**: Documentation (Phase 9) +📋 **Planned**: See [Production Readiness Guide](docs/PRODUCTION_READINESS.md) for production deployment requirements + +## Testing + +```bash +# Run all tests +go test ./... + +# Run integration tests +go test ./test/integration/... + +# Benchmarks +go test -bench=. ./internal/pow/... 
+``` diff --git a/cmd/server/main.go b/cmd/server/main.go index 136970a..332497c 100644 --- a/cmd/server/main.go +++ b/cmd/server/main.go @@ -8,7 +8,6 @@ import ( "os" "os/signal" "syscall" - "time" "hash-of-wisdom/internal/config" "hash-of-wisdom/internal/lib/sl" @@ -64,6 +63,8 @@ func main() { }, } + // Go runtime metrics are automatically registered by default registry + // Start metrics and pprof HTTP server go func() { http.Handle("/metrics", promhttp.Handler()) @@ -73,31 +74,31 @@ func main() { } }() + // Create context that cancels on interrupt signals + ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM) + defer cancel() + // Create server srv := server.NewTCPServer(wisdomService, serverConfig, server.WithLogger(logger)) // Start server - ctx := context.Background() if err := srv.Start(ctx); err != nil { logger.Error("failed to start server", sl.Err(err)) os.Exit(1) } - // Wait for interrupt - sigChan := make(chan os.Signal, 1) - signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM) - logger.Info("server ready - press ctrl+c to stop") - <-sigChan + + // Wait for context cancellation (signal received) + <-ctx.Done() // Graceful shutdown logger.Info("shutting down server") if err := srv.Stop(); err != nil { logger.Error("error during shutdown", sl.Err(err)) + } else { + logger.Info("server stopped gracefully") } - // Give connections time to close - time.Sleep(100 * time.Millisecond) - logger.Info("server stopped") } diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 0000000..97b4a61 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,330 @@ +# Architecture Choices + +This document explains the key architectural decisions made in the Hash of Wisdom project and the reasoning behind them. 
+ +## Overall Architecture + +### Clean Architecture +We follow Clean Architecture principles with clear layer separation: + +``` +┌─────────────────────────────────────┐ +│ Infrastructure Layer │ ← cmd/, internal/server, internal/protocol +├─────────────────────────────────────┤ +│ Application Layer │ ← internal/application (message handling) +├─────────────────────────────────────┤ +│ Domain Layer │ ← internal/service, internal/pow (business logic) +├─────────────────────────────────────┤ +│ External Layer │ ← internal/quotes (external APIs) +└─────────────────────────────────────┘ +``` + +**Benefits**: +- **Testability**: Each layer can be unit tested independently +- **Maintainability**: Changes in one layer don't cascade +- **Flexibility**: Easy to swap implementations (e.g., different quote sources) +- **Domain Focus**: Core business rules are isolated and protected + +## Protocol Design + +### Binary Protocol with JSON Payloads +Choice: Custom binary protocol with JSON-encoded message bodies + +**Why Binary Protocol**: +- **Performance**: Efficient framing and length prefixes +- **Reliability**: Clear message boundaries prevent parsing issues +- **Extensibility**: Easy to add message types and versions + +**Why JSON Payloads**: +- **Simplicity**: Standard library support, easy debugging +- **Flexibility**: Schema evolution without breaking compatibility +- **Tooling**: Excellent tooling and human readability + +**Alternative Considered**: Pure binary (Protocol Buffers) +- **Rejected Because**: Added complexity without significant benefit for our use case +- **Trade-off**: Slightly larger payload size for much simpler implementation + +### Stateless Challenge Design +Choice: HMAC-signed challenges with all state embedded + +```go +type Challenge struct { + Target string `json:"target"` // "quotes" + Timestamp int64 `json:"timestamp"` // Unix timestamp + Difficulty int `json:"difficulty"` // Leading zero bits + Random string `json:"random"` // Entropy + 
 Signature string `json:"signature"` // HMAC-SHA256 +} +``` + +**Benefits**: +- **Scalability**: No server-side session storage required +- **Reliability**: Challenges survive server restarts +- **Security**: HMAC prevents tampering; the signed timestamp limits the replay window +- **Simplicity**: No cache management or cleanup needed + +**Alternative Considered**: Session-based challenges +- **Rejected Because**: Requires distributed session management for horizontal scaling + +## Proof-of-Work Algorithm + +### SHA-256 with Leading Zero Bits +Choice: SHA-256 hashing with difficulty measured as leading zero bits + +**Why SHA-256**: +- **Security**: Cryptographically secure, extensively tested +- **Performance**: Hardware-optimized on most platforms +- **Standardization**: Well-known algorithm with predictable properties + +**Why Leading Zero Bits**: +- **Exponential Scaling**: Each additional bit doubles the expected work (about 2^n hashes for n bits) +- **Simplicity**: Easy to verify and understand +- **Flexibility**: Fine-grained difficulty adjustment + +**Alternative Considered**: Scrypt/Argon2 (memory-hard functions) +- **Rejected Because**: Excessive complexity for DDoS protection use case +- **Trade-off**: ASIC resistance not needed for temporary challenges + +### Difficulty Range: 4-30 Bits +Choice: Configurable difficulty with reasonable bounds + +- **Minimum (4 bits)**: ~16 attempts on average, sub-second solve time +- **Maximum (30 bits)**: ~1 billion attempts (2^30) on average; solve time ranges from several seconds to minutes depending on CPU and parallelism +- **Default (4 bits)**: Balance between protection and user experience + +## Server Architecture + +### TCP Server with Per-Connection Goroutines +Choice: Custom TCP server with one goroutine per connection + +```go +func (s *TCPServer) Start(ctx context.Context) error { + // Start listener + listener, err := net.Listen("tcp", s.config.Address) + if err != nil { + return err + } + s.listener = listener // stored so acceptLoop can use it + // Start accept loop in goroutine + go s.acceptLoop(ctx) + return nil // Returns immediately +} + +func (s *TCPServer) acceptLoop(ctx context.Context) 
{ + for { + conn, err := s.listener.Accept() + if err != nil || ctx.Err() != nil { + return + } + + // Launch handler in goroutine with WaitGroup tracking + s.wg.Add(1) + go func() { + defer s.wg.Done() + s.handleConnection(ctx, conn) + }() + } +} +``` + +**Benefits**: +- **Concurrency**: Each connection handled in separate goroutine +- **Non-blocking Start**: Server starts in background, returns immediately +- **Graceful Shutdown**: WaitGroup ensures all connections finish before stop +- **Context Cancellation**: Proper cleanup when context is cancelled +- **Resource Control**: Connection timeouts prevent resource exhaustion + +**Alternative Considered**: HTTP/REST API +- **Rejected Because**: Test task requirements + +### Connection Security: Multi-Level Timeouts +Choice: Layered timeout protection against various attacks + +1. **Connection Timeout (15s)**: Maximum total connection lifetime +2. **Read Timeout (5s)**: Maximum time between incoming bytes +3. **Write Timeout (5s)**: Maximum time to send response + +**Protects Against**: +- **Slowloris**: Read timeout drops clients that trickle bytes too slowly +- **Slow POST**: Connection timeout limits total request time +- **Resource Exhaustion**: Automatic cleanup of stale connections + +## Configuration Management + +### cleanenv with YAML + Environment Variables +Choice: File-based configuration with environment variable overrides + +```yaml +# config.yaml +server: + address: ":8080" + +pow: + difficulty: 4 +``` + +```bash +# Environment override +export POW_DIFFICULTY=8 +``` + +**Benefits**: +- **Development**: Easy configuration files for local development +- **Production**: Environment variables for containerized deployments +- **Validation**: Built-in validation and type conversion +- **Documentation**: Self-documenting with struct tags + +**Alternative Considered**: Pure environment variables +- **Rejected Because**: Harder to manage complex configurations + +## Observability Architecture + +### Prometheus Metrics 
+Choice: Prometheus format metrics with essential measurements + +**Application Metrics**: +- `wisdom_requests_total` - All incoming requests +- `wisdom_request_errors_total{error_type}` - Errors by type +- `wisdom_request_duration_seconds` - Request processing time +- `wisdom_quotes_served_total` - Successfully served quotes + +**Go Runtime Metrics** (automatically exported): +- `go_memstats_*` - Memory allocation and GC statistics +- `go_goroutines` - Current number of goroutines +- `go_gc_duration_seconds` - Garbage collection duration +- `process_*` - Process-level CPU, memory, and file descriptor stats + +**Design Principle**: Simple metrics that provide actionable insights +- **Avoided**: Complex multi-dimensional metrics +- **Focus**: Essential health and performance indicators +- **Runtime Visibility**: Go collector provides deep runtime observability + +### Metrics at Infrastructure Layer +Choice: Collect metrics in TCP server, not business logic + +```go +// In TCP server (infrastructure) +metrics.RequestsTotal.Inc() +start := time.Now() +response, err := s.wisdomApplication.HandleMessage(ctx, msg) +metrics.RequestDuration.Observe(time.Since(start).Seconds()) +``` + +**Benefits**: +- **Separation of Concerns**: Business logic stays pure +- **Consistency**: All requests measured the same way +- **Performance**: Minimal overhead in critical path + +## Design Patterns + +### Dependency Injection +All major components use constructor injection: +```go +server := server.NewTCPServer(wisdomApplication, config, options...) 
+service := service.NewWisdomService(generator, verifier, quoteService) +``` + +**Benefits**: +- **Testing**: Easy to inject mocks and stubs +- **Configuration**: Runtime assembly of components +- **Decoupling**: Components don't know about concrete implementations + +### Interface Segregation +Small, focused interfaces for easy testing: +```go +type ChallengeGenerator interface { + GenerateChallenge(ctx context.Context) (*Challenge, error) +} + +type QuoteService interface { + GetQuote(ctx context.Context) (string, error) +} +``` + +### Functional Options +Flexible configuration with sensible defaults: +```go +server := NewTCPServer(application, config, + WithLogger(logger), +) +``` + +### Clean Architecture Implementation +See the layer diagram in the Overall Architecture section above for package organization. + +## Testing Architecture + +### Layered Testing Strategy +1. **Unit Tests**: Each package tested independently with mocks +2. **Integration Tests**: End-to-end tests with real TCP connections +3. **Benchmark Tests**: Performance validation for PoW algorithms + +```go +// Unit test with mocks +func TestWisdomService_HandleMessage(t *testing.T) { + mockGenerator := &MockGenerator{} + mockVerifier := &MockVerifier{} + mockQuotes := &MockQuoteService{} + + service := NewWisdomService(mockGenerator, mockVerifier, mockQuotes) + // Test business logic in isolation +} + +// Integration test with real components +func TestTCPServer_SlowlorisProtection(t *testing.T) { + // Start real server, make slow connection + // Verify server doesn't hang +} +``` + +## Security Architecture + +### Defense in Depth +Multiple security layers working together: + +1. **HMAC Authentication**: Prevents challenge tampering +2. **Timestamp Validation**: Prevents replay attacks (5-minute TTL) +3. **Connection Timeouts**: Prevents resource exhaustion +4. **Proof-of-Work**: Rate limiting through computational cost +5. 
**Input Validation**: All protocol messages validated + +### Threat Model +**Primary Threats Addressed**: +- **DDoS Attacks**: PoW makes attacks expensive +- **Resource Exhaustion**: Connection timeouts and limits +- **Protocol Attacks**: Binary framing prevents confusion +- **Replay Attacks**: Timestamp validation in challenges + +**Threats NOT Addressed** (by design): +- **Authentication**: Public service, no user accounts +- **Authorization**: All valid solutions get quotes +- **Data Confidentiality**: Quotes are public information + +## Trade-offs Made + +### Simplicity vs Performance +- **Chose**: Simple JSON payloads over binary serialization +- **Trade-off**: ~30% larger messages for easier debugging and maintenance + +### Memory vs CPU +- **Chose**: Stateless challenges requiring CPU verification +- **Trade-off**: More CPU per request for better scalability + +### Flexibility vs Optimization +- **Chose**: Interface-based design with dependency injection +- **Trade-off**: Small runtime overhead for much better testability + +### Features vs Complexity +- **Chose**: Essential features only (no rate limiting, user accounts, etc.) +- **Benefit**: Clean, focused implementation that does one thing well + +## Future Architecture Considerations + +For production scaling, consider: +1. **Quote Service Enhancement**: Caching, fallback quotes, multiple API sources +2. **Load Balancing**: Multiple server instances behind load balancer +3. **Rate Limiting**: Per-IP request limiting for additional protection +4. **Monitoring**: Full observability stack (Prometheus, Grafana, alerting) +5. **Security**: TLS encryption for sensitive deployments + +The current architecture provides a solid foundation for these enhancements while maintaining simplicity and focus. 
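The leading-zero-bit check described under "Proof-of-Work Algorithm" can be sketched in Go as follows. This is a minimal illustration, not the project's `internal/pow` code: the function names (`leadingZeroBits`, `hashAttempt`, `verify`, `solve`) and the hashed input layout (challenge bytes followed by an 8-byte big-endian nonce) are assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the zero bits at the start of a digest.
func leadingZeroBits(digest []byte) int {
	count := 0
	for _, b := range digest {
		if b == 0 {
			count += 8
			continue
		}
		return count + bits.LeadingZeros8(b)
	}
	return count
}

// hashAttempt hashes the challenge bytes followed by an 8-byte big-endian
// nonce. This input layout is hypothetical and chosen only for the sketch.
func hashAttempt(challenge []byte, nonce uint64) [32]byte {
	buf := make([]byte, len(challenge)+8)
	copy(buf, challenge)
	binary.BigEndian.PutUint64(buf[len(challenge):], nonce)
	return sha256.Sum256(buf)
}

// verify reports whether the nonce meets the difficulty for the challenge.
// It costs a single hash, no matter how hard the puzzle was to solve.
func verify(challenge []byte, nonce uint64, difficulty int) bool {
	sum := hashAttempt(challenge, nonce)
	return leadingZeroBits(sum[:]) >= difficulty
}

// solve tries nonces starting from zero until one satisfies the difficulty.
// The expected work is about 2^difficulty hashes.
func solve(challenge []byte, difficulty int) uint64 {
	for nonce := uint64(0); ; nonce++ {
		if verify(challenge, nonce, difficulty) {
			return nonce
		}
	}
}

func main() {
	// Placeholder challenge bytes; a real challenge would carry the signed fields.
	challenge := []byte("quotes:1700000000:example-random")
	nonce := solve(challenge, 12)
	fmt.Println("nonce:", nonce, "valid:", verify(challenge, nonce, 12))
}
```

The asymmetry is the point of the design: `solve` performs roughly 2^n hashes for an n-bit difficulty, while `verify` performs exactly one, so the server's cost per request stays flat as the difficulty rises.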
diff --git a/docs/IMPLEMENTATION.md b/docs/IMPLEMENTATION.md index c720499..e4068bb 100644 --- a/docs/IMPLEMENTATION.md +++ b/docs/IMPLEMENTATION.md @@ -109,6 +109,12 @@ - [X] Implement configuration management using cleanenv library - [X] Read configuration from file with environment variable support +## Phase 9: Documentation +- [X] Create comprehensive README.md with project overview and quick start +- [X] Document package structure and responsibilities +- [X] Document architecture choices and design decisions +- [X] Update production readiness assessment + ## Directory Structure ``` diff --git a/docs/PACKAGES.md b/docs/PACKAGES.md new file mode 100644 index 0000000..8c30427 --- /dev/null +++ b/docs/PACKAGES.md @@ -0,0 +1,147 @@ +# Package Structure + +This document explains the organization and responsibilities of all packages in the Hash of Wisdom project. + +## Directory Structure + +``` +/ +├── cmd/ # Application entry points +│ ├── server/ # Server application +│ └── client/ # Client application +├── internal/ # Private application packages +│ ├── application/ # Application layer (message handling) +│ ├── config/ # Configuration management +│ ├── lib/ # Shared utilities +│ ├── metrics/ # Prometheus metrics +│ ├── pow/ # Proof-of-Work implementation +│ ├── protocol/ # Binary protocol codec +│ ├── quotes/ # Quote service +│ ├── server/ # TCP server implementation +│ └── service/ # Business logic layer +├── test/ # Integration tests +└── docs/ # Documentation +``` + +## Package Responsibilities + +### `cmd/server` +**Entry point for the TCP server application** +- Parses command-line flags and configuration +- Initializes all components with dependency injection +- Starts TCP server and metrics endpoints +- Handles graceful shutdown signals + +### `cmd/client` +**Entry point for the client application** +- Command-line interface for connecting to server +- Handles proof-of-work solving on client side +- Manages TCP connection lifecycle + +### `internal/config` 
+**Configuration management with cleanenv** +- Defines configuration structures with YAML/env tags +- Loads configuration from files and environment variables +- Provides sensible defaults for all settings +- Supports both development and production configurations + +### `internal/lib/sl` +**Shared logging utilities** +- Structured logging helpers for consistent log formatting +- Error attribute helpers for slog integration + +### `internal/metrics` +**Prometheus metrics collection** +- Defines application-specific metrics (requests, errors, duration) +- Provides simple counters and histograms for monitoring +- Integrated at the infrastructure layer (TCP server) + +### `internal/pow` +**Proof-of-Work implementation** + +#### `internal/pow/challenge` +- **Challenge Generation**: Creates HMAC-signed stateless challenges +- **Verification**: Validates solutions against original challenges +- **Security**: HMAC authentication prevents tampering +- **Configuration**: Difficulty scaling, TTL management, secrets + +#### `internal/pow/solver` +- **Solution Finding**: Brute-force nonce search with SHA-256 +- **Optimization**: Efficient bit counting for difficulty verification +- **Client-side**: Used by client to solve server challenges + +### `internal/protocol` +**Binary protocol codec** +- **Message Types**: Challenge requests/responses, solution requests/responses, errors +- **Encoding/Decoding**: JSON-based message serialization +- **Streaming**: MessageDecoder for reading from TCP connections +- **Validation**: Message structure and field validation +- See [Protocol Specification](PROTOCOL.md) for detailed message flow and format + +### `internal/quotes` +**Quote service implementation** +- **HTTP Client**: Fetches quotes from external APIs using resty +- **Interface**: Clean abstraction for quote retrieval +- **Error Handling**: Graceful degradation for network issues +- **Timeout Management**: Configurable request timeouts + +### `internal/server` +**TCP server 
implementation** + +#### `internal/server/tcp.go` +- **Connection Management**: Accept, handle, cleanup TCP connections +- **Protocol Integration**: Uses protocol package for message handling +- **Security**: Connection timeouts, slowloris protection +- **Metrics**: Request tracking at infrastructure layer +- **Lifecycle**: Graceful startup/shutdown with context + +#### `internal/server/config.go` +- **Server Configuration**: Network settings, timeouts +- **Functional Options**: Builder pattern for server customization + +### `internal/application` +**Application layer (message handling and coordination)** +- **WisdomApplication**: Protocol message handler and coordinator +- **Message Processing**: Handles challenge and solution requests from protocol layer +- **Response Generation**: Creates appropriate protocol responses +- **Service Coordination**: Orchestrates calls to business logic layer +- **Error Handling**: Converts service errors to protocol error responses + +### `internal/service` +**Business logic layer (core domain services)** +- **WisdomService**: Main business logic coordinator +- **Challenge Workflow**: Manages challenge generation and validation +- **Solution Workflow**: Handles solution verification and quote retrieval +- **Clean Architecture**: Pure business logic, no I/O dependencies +- **Testing**: Easily mockable interfaces for unit testing + +**Service Dependencies**: +- `ChallengeGenerator` - Creates new challenges +- `ChallengeVerifier` - Validates submitted solutions +- `QuoteService` - Retrieves quotes after successful validation + +### `test/integration` +**End-to-end integration tests** +- **Slowloris Protection**: Tests server resilience against slow attacks +- **Connection Timeouts**: Validates timeout configurations +- **Full Workflow**: Tests complete client-server interaction +- **Real Components**: Uses actual TCP connections and protocol + +## Dependency Flow + +``` +cmd/server + ↓ +internal/config → internal/server → 
internal/application → internal/service + ↓ ↓ ↓ + internal/protocol internal/protocol internal/pow + internal/metrics internal/quotes +``` + +## Architecture Benefits + +This package structure provides: +- **Clear Separation**: Each package has a single, well-defined responsibility +- **Testability**: Dependencies are injected, making testing straightforward +- **Maintainability**: Changes are isolated to specific layers +- **Scalability**: Clean interfaces allow for easy implementation swapping diff --git a/docs/PRODUCTION_READINESS.md b/docs/PRODUCTION_READINESS.md new file mode 100644 index 0000000..e9355da --- /dev/null +++ b/docs/PRODUCTION_READINESS.md @@ -0,0 +1,70 @@ +# Production Readiness Assessment + +## Current Implementation Status + +### ✅ Core Functionality (Complete) +- **Proof of Work System**: SHA-256 hashcash with HMAC-signed stateless challenges +- **Binary Protocol**: Custom TCP protocol with JSON payloads and proper framing +- **TCP Server**: Connection handling with timeout protection against slowloris attacks +- **Client Application**: CLI tool with challenge solving and solution submission +- **Service Layer**: Clean architecture with dependency injection +- **Quote System**: External API integration for inspirational quotes +- **Security**: HMAC authentication, replay protection, input validation +- **Testing**: Comprehensive unit tests and slowloris protection integration tests + +### ✅ Observability & Configuration (Complete) +- **Metrics Endpoint**: Prometheus metrics at `/metrics` with application and Go runtime KPIs +- **Application Metrics**: Request tracking, error categorization, duration histograms, quotes served +- **Go Runtime Metrics**: Memory stats, GC metrics, goroutine counts, process stats (auto-registered) +- **Profiler Endpoint**: Go pprof integration at `/debug/pprof/` for performance debugging +- **Structured Logging**: slog integration throughout server components with consistent formatting +- **Configuration**: 
cleanenv-based config management with YAML files and environment variables +- **Containerization**: Production-ready Dockerfile with security best practices +- **Error Handling**: Proper error propagation and categorization +- **Graceful Shutdown**: Context-based shutdown with connection draining + +## Remaining Components for Production + +### Critical for Production +1. **Connection Pooling & Resource Management** (worker pools, connection limits) +2. **Rate Limiting & DDoS Protection** +3. **Secret Management** (HMAC keys, external API credentials) +4. **Advanced Monitoring & Alerting** +5. **Advanced Configuration Management** +6. **Health Checks** (graceful shutdown already implemented) + +### Important for Scale +7. **Security Hardening** +8. **Quote Service Enhancement** (caching, fallback quotes, multiple sources) +9. **Load Testing & Performance** +10. **Documentation & Runbooks** + +### Nice to Have +11. **Advanced Observability** +12. **Chaos Engineering** +13. **Automated Deployment** + +## Risk Assessment + +### High Risk Areas +- **No rate limiting**: Vulnerable to sophisticated DDoS attacks +- **Hardcoded secrets**: HMAC keys in configuration files (not properly secured) +- **Limited monitoring**: Basic metrics but no alerting or attack detection +- **Single point of failure**: No redundancy or failover + +### Medium Risk Areas +- **Memory management**: Potential leaks under high load +- **External dependencies**: Quote API could become bottleneck +- **Configuration drift**: Manual configuration prone to errors + +## Current Architecture Strengths + +The existing implementation provides an excellent foundation: +- **Clean Architecture**: Proper separation of concerns with dependency injection +- **Security-First Design**: HMAC authentication, replay protection, and timeout protection +- **Stateless Operation**: HMAC-signed challenges enable horizontal scaling +- **Graceful Shutdown**: Proper context handling and connection draining +- **Comprehensive 
Testing**: Proven slowloris protection and unit test coverage +- **Observability Ready**: Prometheus metrics, pprof profiling, structured logging +- **Standard Protocols**: Industry-standard approaches (TCP, JSON, SHA-256) +- **Container Ready**: Production Dockerfile with security best practices