Architecture Overview
This document defines the microservice architecture for the GoGreen Flags Feature Flag Management System. It translates the functional requirements into a concrete service topology with strict separation of concerns and well-defined interfaces.
High-Level Topology
GoGreen Flags uses a split-plane architecture with three planes (Control, Data, and Analytics) so that outages in management or analytics do not interrupt flag evaluation.
1. Control Plane
- Responsibility: Management, configuration, governance, billing (CRUD flags, rules, segments, approvals, subscriptions).
- Components: Management API Gateway, Flag Config, Segment Service, User & Access, Audit, Billing, Integrations, PostgreSQL, Keycloak.
- Failure Impact: If the Control Plane is down, flags cannot be changed, but existing flags continue to evaluate correctly from the cached state.
2. Data Plane
- Responsibility: High-volume flag evaluation, real-time streaming, config propagation.
- Components: Evaluation API, Streaming Service, Config Pipeline, Redis.
- Failure Impact: An outage directly affects SDK clients, so this plane is designed for high availability and low latency (p99 < 50ms). If the Data Plane is unreachable, SDKs fall back to the last known good config.
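The last-known-good fallback can be illustrated with a minimal sketch. This is not the actual SDK code: `LastKnownGoodStore`, its `fetch` callback, and the choice of `ConnectionError` as the failure signal are assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

FlagConfig = Dict[str, dict]


class LastKnownGoodStore:
    """Hypothetical SDK helper: remembers the most recent config fetched successfully."""

    def __init__(self, fetch: Callable[[], FlagConfig]) -> None:
        self._fetch = fetch  # assumed to raise ConnectionError when the data plane is unreachable
        self._last_good: Optional[FlagConfig] = None

    def get_config(self) -> FlagConfig:
        try:
            config = self._fetch()
        except ConnectionError:
            if self._last_good is None:
                raise  # nothing has been cached yet, so there is nothing to fall back to
            return self._last_good
        self._last_good = config
        return config
```

A production SDK would likely also persist the cached config across restarts; the sketch keeps it in memory only.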
3. Analytics Plane
- Responsibility: Event ingestion, aggregation, and experimentation analysis.
- Components: Events Ingestion, Experimentation Service, ClickHouse.
- Failure Impact: Experiment data collection pauses, but flag evaluation is unaffected.
Service Catalog
| Service | Port | Role |
|---|---|---|
| Management API Gateway | 8085 | Entry point for Dashboard, CLI, and API clients. Routes requests, enforces auth (OIDC + service tokens), RBAC, rate limiting, idempotency, and usage metering. |
| Flag Config | 8090 | Source of truth for flags, targeting rules, variations, and prerequisites. Publishes change events to Redpanda. Runs background workers for scheduled changes and stale flag detection. |
| Config Pipeline | 8091 | Consumes flag change events from Redpanda and updates Redis cache. Ensures data plane has the latest configuration. |
| Evaluation API | 8092 | Stateless serving layer for SDKs. Reads flag config from Redis, evaluates targeting rules, and returns variation results. Supports all 4 flag types with 17+ operators. |
| User & Access | 8094 | Handles OIDC authentication, RBAC (admin/editor/viewer with tag and environment scoping), service token management, approval workflows, and GDPR data export/deletion. |
| Segment Service | 8096 | Manages user segments and cohorts. Redis-cached, with cache invalidation over Redpanda's Kafka-compatible API. Supports rule-based and list-based segments with bulk import. |
| Audit Service | 8098 | Records immutable audit logs for all platform changes. Supports CSV export and configurable retention enforcement. |
| Streaming Service | 8100 | Pushes real-time flag updates to connected SDKs via Server-Sent Events (SSE). Consumes change events from Redpanda. |
| Events Ingestion | 8102 | Ingests impression and custom events from SDKs. Strips PII, deduplicates events, computes aggregation rollups, and writes to ClickHouse. |
| Experimentation | 8104 | Manages A/B test lifecycle (draft → running → stopped). Queries ClickHouse for aggregated metrics and computes statistical significance (t-test, chi-squared). |
| Integrations | 8106 | Manages outbound integrations with Slack, Microsoft Teams, Jira, Datadog, New Relic, and Dynatrace. Sends HMAC-signed webhooks with retry logic. |
| Billing | 8108 | Manages subscriptions through Stripe. Handles usage metering (6 dimensions), quota enforcement, and idempotent webhook processing. |
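
The Experimentation row mentions chi-squared significance testing. As a rough illustration only, the sketch below runs a Pearson chi-squared test on a 2x2 conversion table (control vs. treatment, converted vs. not converted). The function name and its inputs are hypothetical, and the document does not say whether the service applies a continuity correction; none is applied here.

```python
import math
from typing import Tuple


def chi_squared_2x2(conv_a: int, total_a: int, conv_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-squared test on a 2x2 table, without continuity correction.

    Returns (statistic, p_value). With one degree of freedom, the chi-squared
    survival function equals erfc(sqrt(x / 2)), so no stats library is needed.
    """
    a, b = conv_a, total_a - conv_a  # variation A: converted, not converted
    c, d = conv_b, total_b - conv_b  # variation B: converted, not converted
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, `chi_squared_2x2(100, 1000, 150, 1000)` compares a 10% and a 15% conversion rate over 1,000 users each.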
Data Flow
Write Path
- User updates a flag in the Dashboard (or via CLI/API).
- Management API Gateway authenticates, authorizes (RBAC), checks rate limits, and routes to Flag Config.
- Flag Config persists the change to PostgreSQL and publishes a change event to Redpanda.
- Audit Service records the change. Integrations Service sends notifications to configured channels.
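The authorization step in the write path can be sketched as a role check followed by environment and tag scope checks. This is a minimal sketch under stated assumptions: the document names the roles (admin, editor, viewer) but not their permissions, so the `ROLE_ACTIONS` mapping, the `Grant` type, and `is_allowed` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set

# Assumed role-to-action mapping; the source lists the roles but not what each may do.
ROLE_ACTIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


@dataclass(frozen=True)
class Grant:
    role: str
    environments: Optional[FrozenSet[str]] = None  # None means "not restricted by environment"
    tags: Optional[FrozenSet[str]] = None          # None means "not restricted by tag"


def is_allowed(grant: Grant, action: str, environment: str, flag_tags: Set[str]) -> bool:
    """Return True only if the role permits the action and both scopes match."""
    if action not in ROLE_ACTIONS.get(grant.role, set()):
        return False
    if grant.environments is not None and environment not in grant.environments:
        return False
    # Tag scoping (assumed semantics): the flag must carry at least one granted tag.
    if grant.tags is not None and not (grant.tags & flag_tags):
        return False
    return True
```

An editor scoped to `staging` and the `checkout` tag could then update a checkout flag in staging but not in production.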
Propagation
- Config Pipeline consumes the change event from Redpanda.
- Config Pipeline updates the Redis cache with the new flag configuration.
- Streaming Service pushes an SSE event to all connected SDK clients.
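The propagation steps can be sketched with in-memory stand-ins. For brevity the sketch folds the Config Pipeline and the Streaming Service into a single hypothetical `Propagator` class, whereas the real system runs them as separate consumers. The event fields (`env`, `flag_key`, `version`, `config`), the cache key layout, and the `flag-updated` event name are assumptions; only the SSE frame layout (`event:` and `data:` fields terminated by a blank line) follows the Server-Sent Events format.

```python
import json
from typing import Dict, List


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame: event and data fields, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data, sort_keys=True)}\n\n"


class Propagator:
    """Illustrative stand-in combining the cache update and the SSE fan-out."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}      # stand-in for Redis
        self.clients: List[List[str]] = []   # one outbox per connected SDK (stand-in for SSE connections)

    def connect(self) -> List[str]:
        outbox: List[str] = []
        self.clients.append(outbox)
        return outbox

    def handle_change_event(self, event: dict) -> None:
        # Step 1: write the new configuration to the cache (hypothetical key layout).
        key = f"flag:{event['env']}:{event['flag_key']}"
        self.cache[key] = json.dumps(event["config"], sort_keys=True)
        # Step 2: notify every connected client that the flag changed.
        frame = format_sse("flag-updated", {"flag_key": event["flag_key"], "version": event["version"]})
        for outbox in self.clients:
            outbox.append(frame)
```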
Read Path (Evaluation)
- SDK client sends an evaluation request to the Evaluation API (or evaluates locally from cached config).
- Evaluation API reads flag configuration from Redis.
- Evaluation engine applies targeting rules, prerequisites, and percentage rollouts.
- SDK receives the variation result with evaluation reason (e.g., RULE_MATCH, FALLTHROUGH, PREREQUISITE_FAILED).
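The evaluation order described above (prerequisites, then targeting rules, then a percentage rollout) can be sketched as follows. This is a simplified illustration rather than the engine itself: the flag schema, the single "attribute is one of" operator, and the SHA-256 bucketing scheme are assumptions, and only the three reason codes come from the text.

```python
import hashlib
from typing import Any, Dict, Tuple


def bucket(flag_key: str, user_key: str) -> int:
    """Deterministically map a user to a bucket in [0, 100) (assumed hashing scheme)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def evaluate(flag_key: str, flags: Dict[str, dict], user: Dict[str, Any]) -> Tuple[Any, str]:
    """Return (variation, reason) for one flag and one user."""
    flag = flags[flag_key]
    # 1. Prerequisites: every prerequisite flag must evaluate to its required variation.
    for prereq_key, required in flag.get("prerequisites", {}).items():
        value, _ = evaluate(prereq_key, flags, user)
        if value != required:
            return flag["off_variation"], "PREREQUISITE_FAILED"
    # 2. Targeting rules: the first matching rule wins (only one operator is sketched here).
    for rule in flag.get("rules", []):
        if user.get(rule["attribute"]) in rule["values"]:
            return rule["variation"], "RULE_MATCH"
    # 3. Fallthrough: percentage rollout over (variation, percent) pairs.
    rollout = flag["fallthrough"]
    user_bucket = bucket(flag_key, user["key"])
    cumulative = 0
    for variation, percent in rollout:
        cumulative += percent
        if user_bucket < cumulative:
            return variation, "FALLTHROUGH"
    return rollout[-1][0], "FALLTHROUGH"  # guard for percentages that sum to less than 100
```

Hashing the flag key together with the user key keeps a user's bucket stable across requests while letting different flags bucket the same user independently.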
Analytics Path
- SDKs send impression and custom events to Events Ingestion.
- Events Ingestion deduplicates, strips PII, and writes to ClickHouse.
- Experimentation Service queries ClickHouse aggregations to compute experiment results.
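The deduplication and PII-stripping step can be sketched as below. The `event_id` field, the in-memory `seen_ids` set, and the function names are hypothetical; the three strategies (hash, mask, drop) are the ones named in the Security section. A real pipeline would likely use a keyed hash and a bounded dedup window rather than plain SHA-256 and an unbounded set.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def redact(attributes: dict, strategies: Dict[str, str]) -> dict:
    """Apply a per-attribute redaction strategy; attributes without a strategy pass through."""
    out = {}
    for name, value in attributes.items():
        strategy = strategies.get(name)
        if strategy is None:
            out[name] = value
        elif strategy == "hash":
            out[name] = hashlib.sha256(str(value).encode()).hexdigest()
        elif strategy == "mask":
            out[name] = "***"
        elif strategy == "drop":
            continue
        else:
            raise ValueError(f"unknown redaction strategy: {strategy}")
    return out


def ingest(events: Iterable[dict], strategies: Dict[str, str], seen_ids: Set[str]) -> List[dict]:
    """Drop events whose event_id was already seen, then redact the rest."""
    accepted = []
    for event in events:
        if event["event_id"] in seen_ids:
            continue  # duplicate delivery, e.g. an SDK retry
        seen_ids.add(event["event_id"])
        accepted.append({**event, "attributes": redact(event.get("attributes", {}), strategies)})
    return accepted
```

The returned rows stand in for what would be written to ClickHouse.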
Security
- Authentication: OIDC/Keycloak for human users; service tokens with expiry enforcement for machine-to-machine communication; SDK keys for evaluation clients.
- Authorization: Role-based access control with tag-scoped and environment-scoped restrictions. Org-level isolation ensures multi-tenant data separation.
- Encryption: TLS everywhere; secrets management for keys and tokens.
- PII: Private attributes are hashed or stripped at the SDK edge before events are transmitted. Configurable per-attribute redaction strategies (hash, mask, drop).
- Webhooks: All outbound webhooks are signed with HMAC-SHA256 for payload integrity verification.
- Network Policies: Kubernetes network policies restrict pod-to-pod communication to declared dependencies.
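Webhook signing with HMAC-SHA256 can be sketched with Python's standard `hmac` module. The function names are hypothetical, and the document does not specify how the signature is transported (for example, which header carries it), so the sketch covers only computing and verifying the signature over the raw request body.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw webhook body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time to resist timing attacks."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```

The receiver should verify against the exact bytes it received, before any JSON parsing or re-serialization, because even a whitespace change alters the signature.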