Architecture Overview

This document defines the microservice architecture for the GoGreen Flags Feature Flag Management System. It translates the functional requirements into a concrete service topology with strict separation of concerns and well-defined interfaces.

High-Level Topology

GoGreen Flags employs a Split-Plane Architecture to ensure high availability and performance. The diagram below is the canonical topology; the subsections that follow describe each plane in more detail.

1. Control Plane

Responsibility: Management, configuration, governance, and billing (CRUD for flags, rules, segments, approvals, and subscriptions).
Components: Management API Gateway, Flag Config, Segment Service, User & Access, Audit, Billing, Integrations, PostgreSQL, Keycloak.
Failure Impact: If the Control Plane is down, flags cannot be created or changed, but existing flags continue to evaluate correctly from the configuration already cached in the Data Plane.

2. Data Plane

Responsibility: High-volume flag evaluation, real-time streaming, and config propagation.
Components: Evaluation API, Streaming Service, Config Pipeline, Redis.
Failure Impact: An outage directly affects SDK clients, so this plane is designed for high availability and low latency (p99 < 50 ms). SDKs fall back to the last known good config if the Data Plane is unreachable.
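
The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the actual SDK: `FlagClient`, `DataPlaneUnavailable`, the `fetch_config` callable, and the `{"value": ...}` flag shape are all hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Optional


class DataPlaneUnavailable(Exception):
    """Raised by the (hypothetical) fetcher when the data plane cannot be reached."""


class FlagClient:
    """Keeps the last successfully fetched config and serves from it."""

    def __init__(self, fetch_config: Callable[[], Dict[str, dict]]):
        self._fetch_config = fetch_config
        self._last_known_good: Optional[Dict[str, dict]] = None

    def refresh(self) -> bool:
        """Try to pull fresh config; keep the previous snapshot on failure."""
        try:
            config = self._fetch_config()
        except DataPlaneUnavailable:
            return False
        self._last_known_good = config
        return True

    def variation(self, flag_key: str, default):
        """Return the cached value for a flag, or the caller's default."""
        if self._last_known_good is None:
            return default
        flag = self._last_known_good.get(flag_key)
        return default if flag is None else flag["value"]
```

A failed `refresh()` leaves the previous snapshot untouched, so evaluation keeps returning the values from the last successful fetch. Before any fetch has succeeded, the caller-supplied default is the only safe answer.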

3. Analytics Plane

Responsibility: Event ingestion, aggregation, and experimentation analysis.
Components: Events Ingestion, Experimentation Service, ClickHouse.
Failure Impact: Experiment data collection pauses, but flag evaluation is unaffected.

Service Catalog

Service	Port	Role
Management API Gateway	8085	Entry point for Dashboard, CLI, and API clients. Routes requests, enforces auth (OIDC + service tokens), RBAC, rate limiting, idempotency, and usage metering.
Flag Config	8090	Source of truth for flags, targeting rules, variations, and prerequisites. Publishes change events to Redpanda. Runs background workers for scheduled changes and stale flag detection.
Config Pipeline	8091	Consumes flag change events from Redpanda and updates Redis cache. Ensures data plane has the latest configuration.
Evaluation API	8092	Stateless serving layer for SDKs. Reads flag config from Redis, evaluates targeting rules, and returns variation results. Supports all 4 flag types with 17+ operators.
User & Access	8094	Handles OIDC authentication, RBAC (admin/editor/viewer with tag and environment scoping), service token management, approval workflows, and GDPR data export/deletion.
Segment Service	8096	Manages user segments and cohorts. Redis-cached with Kafka-based cache invalidation. Supports rule-based and list-based segments with bulk import.
Audit Service	8098	Records immutable audit logs for all platform changes. Supports CSV export and configurable retention enforcement.
Streaming Service	8100	Pushes real-time flag updates to connected SDKs via Server-Sent Events (SSE). Consumes change events from Redpanda.
Events Ingestion	8102	Ingests impression and custom events from SDKs. PII stripping, deduplication, and aggregation rollups. Writes to ClickHouse.
Experimentation	8104	Manages A/B test lifecycle (draft → running → stopped). Queries ClickHouse for aggregated metrics and computes statistical significance (t-test, chi-squared).
Integrations	8106	Manages outbound integrations with Slack, Microsoft Teams, Jira, Datadog, New Relic, and Dynatrace. HMAC-signed webhooks with retry logic.
Billing	8108	Subscription management with Stripe integration. Usage metering (6 dimensions), quota enforcement, and webhook idempotency.

Data Flow

Write Path

User updates a flag in the Dashboard (or via CLI/API).
Management API Gateway authenticates, authorizes (RBAC), checks rate limits, and routes to Flag Config.
Flag Config persists the change to PostgreSQL and publishes a change event to Redpanda.
Audit Service records the change. Integrations Service sends notifications to configured channels.
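
The authorize, persist, and publish steps of the write path can be sketched in memory as below. This is an illustration under stated assumptions: the `Principal` shape, the rule that admins bypass scoping, the rule that an empty tag set means "no tag restriction", the per-flag `version` counter, and the `flag.updated` event shape are all hypothetical. A dict stands in for PostgreSQL and a list stands in for the Redpanda topic; auditing and notifications are not modeled.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Principal:
    role: str                                    # "admin", "editor", or "viewer"
    environments: Set[str]                       # environments the principal may touch
    tags: Set[str] = field(default_factory=set)  # empty = no tag restriction (assumption)


def can_write(principal: Principal, environment: str, flag_tags: Iterable[str]) -> bool:
    """RBAC check with environment and tag scoping."""
    if principal.role == "admin":                # assumption: admins are unscoped
        return True
    if principal.role != "editor":               # viewers are read-only
        return False
    if environment not in principal.environments:
        return False
    return not principal.tags or bool(principal.tags & set(flag_tags))


def update_flag(principal: Principal, store: Dict[Tuple[str, str], dict],
                topic: List[str], environment: str, flag: dict) -> dict:
    """Authorize, persist, then publish a change event."""
    if not can_write(principal, environment, flag.get("tags", [])):
        raise PermissionError("RBAC denied")
    key = (environment, flag["key"])
    version = store.get(key, {}).get("version", 0) + 1
    record = {**flag, "version": version}
    store[key] = record                          # stand-in for the PostgreSQL write
    event = {"type": "flag.updated", "environment": environment, "flag": record}
    topic.append(json.dumps(event))              # stand-in for publishing to Redpanda
    return event
```

Persisting before publishing means a consumer never sees an event for a change that was not stored. A real service would also need to handle the case where the write succeeds and the publish fails, which this sketch ignores.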

Propagation

Config Pipeline consumes the change event from Redpanda.
Config Pipeline updates the Redis cache with the new flag configuration.
Streaming Service pushes an SSE event to all connected SDK clients.
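
A rough sketch of the two propagation steps: applying a change event to the cache, and framing the update as a Server-Sent Events message. The event shape, the `environment:key` cache key, and the version check that discards stale events are assumptions made for this example; a dict stands in for Redis.

```python
import json
from typing import Dict


def apply_change(cache: Dict[str, dict], raw_event: str) -> bool:
    """Update the cache from one change event; ignore stale or duplicate versions."""
    event = json.loads(raw_event)
    flag = event["flag"]
    cache_key = f"{event['environment']}:{flag['key']}"
    current = cache.get(cache_key)
    if current is not None and current["version"] >= flag["version"]:
        return False                     # already have this version or a newer one
    cache[cache_key] = flag
    return True


def to_sse(event_name: str, payload: dict) -> str:
    """Frame a payload as one SSE message: an event line, a data line, a blank line."""
    data = json.dumps(payload, separators=(",", ":"))  # compact JSON, no newlines
    return f"event: {event_name}\ndata: {data}\n\n"
```

The version comparison makes the cache update idempotent, so redelivered or out-of-order events do not roll the cache back. The SSE framing relies on the payload being serialized onto a single line, since each `data:` field carries one line.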

Read Path (Evaluation)

SDK client sends an evaluation request to the Evaluation API (or evaluates locally from cached config).
Evaluation API reads flag configuration from Redis.
Evaluation engine applies targeting rules, prerequisites, and percentage rollouts.
SDK receives the variation result with evaluation reason (e.g., RULE_MATCH, FALLTHROUGH, PREREQUISITE_FAILED).
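
The evaluation order above (prerequisites, then targeting rules, then percentage rollout) can be illustrated with a small sketch. The flag schema (`prerequisites`, `rules`, `rollout`, `fallthrough`, `off_variation`), the three operators shown, and the SHA-256 bucketing scheme are assumptions for this example, not the actual engine. The reason codes are the ones named in the text.

```python
import hashlib
from typing import Any, Dict, Tuple

# Three illustrative operators; the real engine supports many more.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "in": lambda actual, expected: actual in expected,
    "starts_with": lambda actual, expected: isinstance(actual, str)
    and actual.startswith(expected),
}


def bucket(flag_key: str, user_key: str) -> float:
    """Map a user deterministically to a value in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return (int(digest[:8], 16) % 10000) / 100.0


def evaluate(flags: Dict[str, dict], flag_key: str, user: Dict[str, Any]) -> Tuple[Any, str]:
    """Return (variation, reason) for one flag and one user context."""
    flag = flags[flag_key]
    # 1. Every prerequisite flag must evaluate to its required variation.
    for prereq in flag.get("prerequisites", []):
        value, _ = evaluate(flags, prereq["key"], user)
        if value != prereq["variation"]:
            return flag["off_variation"], "PREREQUISITE_FAILED"
    # 2. The first matching targeting rule wins.
    for rule in flag.get("rules", []):
        actual = user.get(rule["attribute"])
        if actual is not None and OPERATORS[rule["op"]](actual, rule["value"]):
            return rule["variation"], "RULE_MATCH"
    # 3. Percentage rollout, then the default fallthrough variation.
    rollout = flag.get("rollout")
    if rollout and bucket(flag_key, user["key"]) < rollout["percentage"]:
        return rollout["variation"], "FALLTHROUGH"
    return flag["fallthrough"], "FALLTHROUGH"
```

Hashing the flag key together with the user key keeps a user's bucket stable across requests while decorrelating rollouts between flags. The sketch does not guard against cyclic prerequisites, which a real engine would have to reject.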

Analytics Path

SDKs send impression and custom events to Events Ingestion.
Events Ingestion deduplicates, strips PII, and writes to ClickHouse.
Experimentation Service queries ClickHouse aggregations to compute experiment results.
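
The deduplication and PII-stripping step in Events Ingestion might look roughly like this. The event shape (`id`, `attributes`), the in-memory `seen_ids` set, and the exact behavior of each redaction strategy are assumptions for illustration; the strategy names (hash, mask, drop) are the ones this document lists under Security.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def redact(attributes: Dict[str, str], strategies: Dict[str, str]) -> Dict[str, str]:
    """Apply a per-attribute redaction strategy; unlisted attributes pass through."""
    cleaned = {}
    for name, value in attributes.items():
        strategy = strategies.get(name)
        if strategy == "drop":
            continue                                  # remove the attribute entirely
        if strategy == "hash":
            cleaned[name] = hashlib.sha256(value.encode()).hexdigest()
        elif strategy == "mask":
            cleaned[name] = "*" * len(value)          # keep length, hide content
        else:
            cleaned[name] = value
    return cleaned


def ingest(events: Iterable[dict], strategies: Dict[str, str],
           seen_ids: Set[str]) -> List[dict]:
    """Drop events whose id was already seen, then redact the rest."""
    accepted = []
    for event in events:
        if event["id"] in seen_ids:
            continue
        seen_ids.add(event["id"])
        attributes = redact(event.get("attributes", {}), strategies)
        accepted.append({**event, "attributes": attributes})
    return accepted
```

The rows returned by `ingest` are what would be written to ClickHouse. A production service would bound the deduplication window rather than keep every id in memory.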

Security

Authentication: OIDC/Keycloak for human users; service tokens with expiry enforcement for machine-to-machine communication; SDK keys for evaluation clients.
Authorization: Role-based access control with tag-scoped and environment-scoped restrictions. Org-level isolation ensures multi-tenant data separation.
Encryption: TLS for all traffic in transit; keys and tokens are handled through secrets management.
PII: Private attributes are redacted at the SDK edge before events are transmitted. The redaction strategy is configurable per attribute (hash, mask, or drop).
Webhooks: All outbound webhooks are signed with HMAC-SHA256 for payload integrity verification.
Network Policies: Kubernetes network policies restrict pod-to-pod communication to declared dependencies.
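
As an illustration of the webhook signing described above, the following sketch computes and verifies an HMAC-SHA256 signature using Python's standard library. The hex encoding and the function names `sign` and `verify` are assumptions for the example; how the signature is transported (for instance, which header carries it) is not specified in this document and is left out.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)
```

The receiver should verify against the exact bytes it received, before any JSON parsing or re-serialization, since a change in whitespace or key order produces a different signature.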