Architecture Overview

This document defines the microservice architecture for the GoGreen Flags Feature Flag Management System. It translates the functional requirements into a concrete service topology with strict separation of concerns and well-defined interfaces.

High-Level Topology

GoGreen Flags uses a split-plane architecture: management, evaluation, and analytics run as separate planes, so each can scale and fail independently of the others.

1. Control Plane

  • Responsibility: Management, configuration, governance, billing (CRUD flags, rules, segments, approvals, subscriptions).
  • Components: Management API Gateway, Flag Config, Segment Service, User & Access, Audit, Billing, Integrations, PostgreSQL, Keycloak.
  • Failure Impact: If the Control Plane is down, flags cannot be changed, but existing flags continue to evaluate correctly from cached state.

2. Data Plane

  • Responsibility: High-volume flag evaluation, real-time streaming, config propagation.
  • Components: Evaluation API, Streaming Service, Config Pipeline, Redis.
  • Failure Impact: An outage directly affects SDK clients, so this plane is designed for high availability and low latency (p99 < 50 ms). SDKs fall back to the last known good config if the Data Plane is unreachable.
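
The fallback to the last known good config can be illustrated with a minimal sketch. The class name `FlagConfigCache` and the `fetch` callable are hypothetical and stand in for whatever the SDK uses to reach the Data Plane; this is not the actual SDK implementation.

```python
from typing import Callable, Dict, Optional


class FlagConfigCache:
    """Hypothetical SDK-side helper that remembers the last good config."""

    def __init__(self, fetch: Callable[[], Dict[str, dict]]) -> None:
        self._fetch = fetch  # assumed to raise on network failure
        self._last_good: Optional[Dict[str, dict]] = None

    def get(self) -> Dict[str, dict]:
        try:
            config = self._fetch()
        except (ConnectionError, TimeoutError):
            # Data Plane unreachable: serve the last good config if we have one.
            if self._last_good is None:
                raise
            return self._last_good
        self._last_good = config
        return config
```

If the first fetch ever fails there is nothing to fall back to, so the sketch re-raises; a real SDK would more likely return the defaults supplied in application code.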

3. Analytics Plane

  • Responsibility: Event ingestion, aggregation, and experimentation analysis.
  • Components: Events Ingestion, Experimentation Service, ClickHouse.
  • Failure Impact: Experiment data collection pauses, but flag evaluation is unaffected.

Service Catalog

| Service | Port | Role |
|---|---|---|
| Management API Gateway | 8085 | Entry point for Dashboard, CLI, and API clients. Routes requests, enforces auth (OIDC + service tokens), RBAC, rate limiting, idempotency, and usage metering. |
| Flag Config | 8090 | Source of truth for flags, targeting rules, variations, and prerequisites. Publishes change events to Redpanda. Runs background workers for scheduled changes and stale flag detection. |
| Config Pipeline | 8091 | Consumes flag change events from Redpanda and updates Redis cache. Ensures data plane has the latest configuration. |
| Evaluation API | 8092 | Stateless serving layer for SDKs. Reads flag config from Redis, evaluates targeting rules, and returns variation results. Supports all 4 flag types with 17+ operators. |
| User & Access | 8094 | Handles OIDC authentication, RBAC (admin/editor/viewer with tag and environment scoping), service token management, approval workflows, and GDPR data export/deletion. |
| Segment Service | 8096 | Manages user segments and cohorts. Redis-cached with Kafka-based cache invalidation. Supports rule-based and list-based segments with bulk import. |
| Audit Service | 8098 | Records immutable audit logs for all platform changes. Supports CSV export and configurable retention enforcement. |
| Streaming Service | 8100 | Pushes real-time flag updates to connected SDKs via Server-Sent Events (SSE). Consumes change events from Redpanda. |
| Events Ingestion | 8102 | Ingests impression and custom events from SDKs. PII stripping, deduplication, and aggregation rollups. Writes to ClickHouse. |
| Experimentation | 8104 | Manages A/B test lifecycle (draft → running → stopped). Queries ClickHouse for aggregated metrics and computes statistical significance (t-test, chi-squared). |
| Billing | 8108 | Subscription management with Stripe integration. Usage metering (6 dimensions), quota enforcement, and webhook idempotency. |
| Integrations | 8106 | Manages outbound integrations with Slack, Microsoft Teams, Jira, Datadog, New Relic, and Dynatrace. HMAC-signed webhooks with retry logic. |

Data Flow

Write Path

  1. User updates a flag in the Dashboard (or via CLI/API).
  2. Management API Gateway authenticates, authorizes (RBAC), checks rate limits, and routes to Flag Config.
  3. Flag Config persists the change to PostgreSQL and publishes a change event to Redpanda.
  4. Audit Service records the change. Integrations Service sends notifications to configured channels.
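
Step 3 of the write path (persist, then publish) might look roughly like the sketch below. The event fields, the `flag-changes` topic name, and the `store` and `publish` parameters are all assumptions made for illustration: a dict stands in for PostgreSQL and a callable stands in for a Redpanda producer.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlagChangeEvent:
    # Field names are illustrative, not the platform's actual event schema.
    org_id: str
    environment: str
    flag_key: str
    version: int
    actor: str
    config: dict


def apply_flag_update(
    store: Dict[str, dict],
    publish: Callable[[str, bytes], None],
    org_id: str,
    environment: str,
    flag_key: str,
    config: dict,
    actor: str,
) -> FlagChangeEvent:
    row_key = f"{org_id}:{environment}:{flag_key}"
    # Bump a per-flag version so consumers can order changes.
    version = store.get(row_key, {}).get("version", 0) + 1
    store[row_key] = {"version": version, "config": config}  # stand-in for the PostgreSQL write
    event = FlagChangeEvent(org_id, environment, flag_key, version, actor, config)
    # Publish only after the write has succeeded.
    publish("flag-changes", json.dumps(asdict(event)).encode("utf-8"))
    return event
```

The sketch does not handle a crash between the write and the publish; a production service would need some way to make the two steps atomic or retryable.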

Propagation

  1. Config Pipeline consumes the change event from Redpanda.
  2. Config Pipeline updates the Redis cache with the new flag configuration.
  3. Streaming Service pushes an SSE event to all connected SDK clients.
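
The propagation steps above can be sketched as two small functions: one applies a change event to the cache, the other frames a payload as a Server-Sent Event. The cache key layout, the event fields, and the version check that skips stale or duplicate events are assumptions; a plain dict stands in for Redis.

```python
import json
from typing import Dict


def apply_change_event(cache: Dict[str, dict], event: dict) -> bool:
    """Apply a change event to the cache; return False if it was skipped."""
    key = f"flags:{event['environment']}:{event['flag_key']}"
    cached = cache.get(key)
    # Assumed safeguard: ignore events that are not newer than the cached entry.
    if cached is not None and cached["version"] >= event["version"]:
        return False
    cache[key] = {"version": event["version"], "config": event["config"]}
    return True


def to_sse(event_name: str, payload: dict) -> str:
    """Frame a payload using the SSE wire format (event/data lines, blank line)."""
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event_name}\ndata: {data}\n\n"
```

For example, `to_sse("flag-updated", {"flag_key": "new-checkout", "version": 2})` yields an `event:` line, a `data:` line carrying compact JSON, and the blank line that terminates an SSE message.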

Read Path (Evaluation)

  1. SDK client sends an evaluation request to the Evaluation API (or evaluates locally from cached config).
  2. Evaluation API reads flag configuration from Redis.
  3. Evaluation engine applies targeting rules, prerequisites, and percentage rollouts.
  4. SDK receives the variation result with evaluation reason (e.g., RULE_MATCH, FALLTHROUGH, PREREQUISITE_FAILED).
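
To make the evaluation order concrete, here is a minimal sketch of prerequisites, rule matching, and a percentage rollout. The flag layout (`prerequisites`, `rules`, `fallthrough`, `off_variation`), the single "attribute is one of" operator, and the SHA-256 bucketing scheme are assumptions chosen for illustration; only the reason codes come from the text above.

```python
import hashlib
from typing import Any, Dict, Tuple


def bucket(flag_key: str, user_key: str) -> float:
    """Map a (flag, user) pair to a stable point in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100


def evaluate(flag_key: str, flags: Dict[str, dict], user: Dict[str, Any]) -> Tuple[Any, str]:
    flag = flags[flag_key]
    # 1. Prerequisites: each must evaluate to its required variation.
    #    Assumes the prerequisite graph has no cycles.
    for prereq_key, required in flag.get("prerequisites", {}).items():
        value, _ = evaluate(prereq_key, flags, user)
        if value != required:
            return flag["off_variation"], "PREREQUISITE_FAILED"
    # 2. Targeting rules: first match wins.
    for rule in flag.get("rules", []):
        if user.get(rule["attribute"]) in rule["values"]:
            return rule["variation"], "RULE_MATCH"
    # 3. Fallthrough: walk cumulative weights until the user's bucket is covered.
    rollout = flag["fallthrough"]  # list of (variation, percentage) pairs
    point = bucket(flag_key, user["key"])
    cumulative = 0.0
    for variation, weight in rollout:
        cumulative += weight
        if point < cumulative:
            return variation, "FALLTHROUGH"
    return rollout[-1][0], "FALLTHROUGH"
```

Hashing the flag key together with the user key keeps a given user in the same bucket across requests while letting different flags split users differently.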

Analytics Path

  1. SDKs send impression and custom events to Events Ingestion.
  2. Events Ingestion deduplicates, strips PII, and writes to ClickHouse.
  3. Experimentation Service queries ClickHouse aggregations to compute experiment results.
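
Step 2 of the analytics path could be sketched as follows. The `event_id` field, the `PII_FIELDS` set, and the in-memory `seen_ids` set are hypothetical; a real ingestion service would need a bounded or expiring deduplication store and would write the accepted events to ClickHouse rather than return them.

```python
from typing import Iterable, List, Set

# Hypothetical list of attribute names treated as PII.
PII_FIELDS = {"email", "ip", "name"}


def ingest(events: Iterable[dict], seen_ids: Set[str]) -> List[dict]:
    """Drop duplicate events and strip PII attributes from the rest."""
    accepted = []
    for event in events:
        if event["event_id"] in seen_ids:
            continue  # duplicate delivery, e.g. an SDK retry
        seen_ids.add(event["event_id"])
        attributes = {
            name: value
            for name, value in event.get("attributes", {}).items()
            if name not in PII_FIELDS
        }
        accepted.append({**event, "attributes": attributes})
    return accepted
```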

Security

  • Authentication: OIDC/Keycloak for human users; service tokens with expiry enforcement for machine-to-machine communication; SDK keys for evaluation clients.
  • Authorization: Role-based access control with tag-scoped and environment-scoped restrictions. Org-level isolation ensures multi-tenant data separation.
  • Encryption: TLS everywhere; secrets management for keys and tokens.
  • PII: Private attributes are hashed or stripped at the SDK edge before events are transmitted. Configurable per-attribute redaction strategies (hash, mask, drop).
  • Webhooks: All outbound webhooks are signed with HMAC-SHA256 for payload integrity verification.
  • Network Policies: Kubernetes network policies restrict pod-to-pod communication to declared dependencies.
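
Two of the controls above lend themselves to a short sketch: per-attribute redaction (hash, mask, drop) and HMAC-SHA256 webhook signing. The function names, the shape of the `strategies` mapping, and the hex encoding of the signature are assumptions; the document does not specify how the signature is encoded or which header carries it.

```python
import hashlib
import hmac
from typing import Any, Dict


def redact(attributes: Dict[str, Any], strategies: Dict[str, str]) -> Dict[str, Any]:
    """Apply a per-attribute strategy; attributes without one pass through."""
    result = {}
    for name, value in attributes.items():
        strategy = strategies.get(name)
        if strategy is None:
            result[name] = value
        elif strategy == "drop":
            continue
        elif strategy == "hash":
            result[name] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        elif strategy == "mask":
            result[name] = "*" * len(str(value))
        else:
            raise ValueError(f"unknown redaction strategy: {strategy}")
    return result


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw webhook body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(sign_payload(secret, payload), signature)
```

A receiver should verify the signature against the raw request bytes before parsing the body, since re-serializing JSON can change the bytes and invalidate the signature.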