
Experimentation

GoGreen includes a built-in experimentation platform for running A/B tests, measuring feature impact, and making data-driven decisions.

How It Works

  1. Create an experiment linked to a feature flag with two or more variations.
  2. Define metrics — what you want to measure (e.g., conversion rate, revenue, latency).
  3. Start the experiment — GoGreen begins collecting impression and custom events from SDKs.
  4. View results — statistical analysis tells you which variation performs better with confidence intervals.

Experiment Lifecycle

Draft → Running → Stopped
  • Draft: Configure the experiment, link a flag, and define metrics. No data is collected yet.
  • Running: Data collection is active. SDKs send impression events (which variation a user saw) and custom events (what the user did).
  • Stopped: Data collection freezes. Results are finalized and available for review.
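The lifecycle above is strictly one-way: an experiment can never return to Draft, and a stopped experiment cannot be restarted. A minimal Go sketch of the allowed transitions (type and function names are illustrative, not part of the GoGreen SDK):

```go
package main

import "fmt"

// ExperimentState models the Draft → Running → Stopped lifecycle.
type ExperimentState string

const (
	Draft   ExperimentState = "draft"
	Running ExperimentState = "running"
	Stopped ExperimentState = "stopped"
)

// validTransitions encodes the one-way lifecycle: each state has at most
// one legal successor, and Stopped is terminal.
var validTransitions = map[ExperimentState]ExperimentState{
	Draft:   Running,
	Running: Stopped,
}

// Transition returns nil if the move is allowed, or an error otherwise.
func Transition(from, to ExperimentState) error {
	if validTransitions[from] != to {
		return fmt.Errorf("invalid transition: %s → %s", from, to)
	}
	return nil
}

func main() {
	fmt.Println(Transition(Draft, Running))   // <nil> (allowed)
	fmt.Println(Transition(Stopped, Running)) // error (Stopped is terminal)
}
```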

Creating an Experiment

Via Dashboard

  1. Navigate to Experiments in the sidebar.
  2. Click Create Experiment.
  3. Select the flag and environment to experiment on.
  4. Define one or more metrics (event key + aggregation type).
  5. Click Create (starts in Draft state).
  6. When ready, click Start Experiment.

Via API

```shell
# Create an experiment
curl -X POST https://api.gogreenflags.com/v1/projects/{projectId}/experiments \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Checkout Flow Test",
    "flag_key": "new-checkout",
    "environment_id": "env-prod",
    "description": "Test new checkout flow vs legacy"
  }'
```
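The same call can be made from code. A minimal Go sketch that builds the request shown in the curl example (endpoint and field names are taken from that example; the project ID and token are placeholders, and the request is constructed but not sent):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// createExperiment builds the create-experiment request. The body fields
// mirror the curl example above; swap in your real project ID and token.
func createExperiment(projectID, token string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{
		"name":           "Checkout Flow Test",
		"flag_key":       "new-checkout",
		"environment_id": "env-prod",
		"description":    "Test new checkout flow vs legacy",
	})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("https://api.gogreenflags.com/v1/projects/%s/experiments", projectID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := createExperiment("proj-123", "YOUR_TOKEN")
	fmt.Println(req.Method, req.URL.Path)
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```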

Tracking Events

SDKs automatically send impression events when a flag is evaluated. You can also send custom events to track business metrics:

Go

```go
client.Track("purchase", user, map[string]any{
	"revenue":  49.99,
	"currency": "USD",
	"items":    3,
})
```

TypeScript

```typescript
client.track('purchase', {
  revenue: 49.99,
  currency: 'USD',
  items: 3,
});
```

Statistical Analysis

GoGreen computes statistical significance automatically:

| Metric Type | Test | Output |
| --- | --- | --- |
| Numeric (e.g., revenue) | Welch’s t-test | Mean difference, p-value, confidence interval |
| Categorical (e.g., conversion) | Chi-squared test | Proportion difference, p-value, confidence interval |

Results include:

  • p-value: Probability of observing a difference at least this large if there were no true difference between variations. A p-value < 0.05 indicates statistical significance.
  • Confidence interval: Range within which the true difference lies with 95% confidence.
  • Sample size: Number of users in each variation.
  • Lift: Percentage improvement of the treatment over the control.

Event Pipeline

Events flow through a purpose-built analytics pipeline:

  1. SDKs send events to the Events Ingestion service.
  2. Events Ingestion deduplicates events, strips PII, and writes to ClickHouse.
  3. ClickHouse stores raw events and maintains materialized views for hourly, daily, and monthly aggregation rollups.
  4. Experimentation Service queries ClickHouse aggregations to compute results.
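The deduplication in step 2 can be pictured as a set keyed by event ID. A toy in-memory sketch (field and type names are hypothetical; the real service’s keying and retention window are internal details):

```go
package main

import "fmt"

// Event is a minimal stand-in for an ingested event (hypothetical fields).
type Event struct {
	ID  string
	Key string
}

// Deduper drops events whose ID has already been seen.
type Deduper struct {
	seen map[string]bool
}

func NewDeduper() *Deduper { return &Deduper{seen: make(map[string]bool)} }

// Accept reports whether the event is new (true) or a duplicate (false).
func (d *Deduper) Accept(e Event) bool {
	if d.seen[e.ID] {
		return false
	}
	d.seen[e.ID] = true
	return true
}

func main() {
	d := NewDeduper()
	fmt.Println(d.Accept(Event{ID: "evt-1", Key: "purchase"})) // true
	fmt.Println(d.Accept(Event{ID: "evt-1", Key: "purchase"})) // false (duplicate dropped)
}
```

A production pipeline would bound the set (e.g., by time window) rather than grow it forever; this sketch only shows the idea.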

Data Retention

Event data is retained according to your plan’s retention period. Aggregation rollups (hourly → daily → monthly) ensure long-term trend analysis while managing storage costs.

Best Practices

  • Run experiments for at least 1-2 weeks to account for day-of-week effects.
  • Don’t peek at results and stop early — let the experiment run to the planned sample size for valid statistical conclusions.
  • Use guardrail metrics alongside your primary metric to catch unintended negative effects (e.g., monitor error rate alongside conversion rate).
  • One change at a time — avoid running overlapping experiments on the same flag to prevent interaction effects.
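The planned sample size mentioned above can be estimated before starting. A sketch using the standard two-proportion approximation at 5% significance and 80% power (a rule-of-thumb calculation, not GoGreen’s built-in planner):

```go
package main

import (
	"fmt"
	"math"
)

// sampleSizePerVariation estimates the users needed per variation to detect
// a change in conversion rate from p1 (baseline) to p2 (expected), using the
// standard two-proportion approximation.
func sampleSizePerVariation(p1, p2 float64) int {
	const zAlpha = 1.96 // two-sided significance level α = 0.05
	const zBeta = 0.84  // power = 0.80
	num := math.Pow(zAlpha+zBeta, 2) * (p1*(1-p1) + p2*(1-p2))
	return int(math.Ceil(num / math.Pow(p1-p2, 2)))
}

func main() {
	// Detecting a lift from 10% to 12% conversion: ≈ 3834 users per variation.
	fmt.Println(sampleSizePerVariation(0.10, 0.12))
}
```

Note how sensitive the result is to the effect size: halving the detectable lift roughly quadruples the required sample, which is why small expected effects need long-running experiments.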