Track and fix flaky tests

Flaky tests kill productivity. They erode confidence in the test suite, slow deployments, and waste developer time. Cased detects flaky tests by analyzing CI/CD test results across runs, then opens pull requests to fix or remove them.

Cased combines data analysis with automated remediation:

  1. Test Execution Monitoring: Monitors CI/CD pipeline test results across runs
  2. Pattern Analysis: Analyzes failure patterns to identify flaky behavior
  3. Flakiness Scoring: Scores tests by failure frequency, inconsistency, and impact
  4. Automated Remediation: Creates pull requests to fix or remove flaky tests
  5. Continuous Monitoring: Catches new flaky tests through ongoing analysis
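
The scoring step can be pictured with a small sketch. Cased's actual formula is not given here, so the `RunHistory` type, the `blocked_deploys` impact signal, and the weights below are all illustrative assumptions. The sketch only shows how failure frequency, inconsistency, and impact could be combined into one number.

```python
from dataclasses import dataclass


@dataclass
class RunHistory:
    """Recent outcomes for one test, oldest first (True = passed)."""
    name: str
    outcomes: list[bool]
    blocked_deploys: int = 0  # hypothetical impact signal


def failure_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that failed."""
    return outcomes.count(False) / len(outcomes) if outcomes else 0.0


def flip_rate(outcomes: list[bool]) -> float:
    """Fraction of consecutive run pairs whose outcome changed."""
    if len(outcomes) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    return flips / (len(outcomes) - 1)


def flakiness_score(history: RunHistory, max_blocked: int = 10) -> float:
    """Weighted score in [0, 1]; the weights are illustrative, not Cased's."""
    impact = min(history.blocked_deploys, max_blocked) / max_blocked
    return (0.3 * failure_rate(history.outcomes)
            + 0.5 * flip_rate(history.outcomes)
            + 0.2 * impact)
```

Weighting the flip rate most heavily means a test that alternates between passing and failing scores higher than one that fails every time. The latter is more likely broken than flaky.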

Detection draws on the following signals:

  • Failure Rate Tracking: Tracks test pass/fail rates over time
  • Consistency Scoring: Identifies inconsistent failures across identical conditions
  • Environmental Correlation: Detects environment-specific failures
  • Timing Analysis: Finds timing issues and race conditions
  • Error Message Analysis: Groups similar failures and identifies root causes
  • Dependency Mapping: Maps test dependencies and cascading failures
  • Historical Trends: Tracks test reliability changes over time
  • Impact Assessment: Measures flaky test impact on pipeline reliability
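
As a rough illustration of error message analysis, failures can be grouped by a normalized signature so that messages differing only in ports, ids, or addresses fall into one bucket. The `signature` and `group_failures` helpers are hypothetical names, and the normalization rules are an assumption rather than a description of Cased's implementation.

```python
import re
from collections import defaultdict


def signature(message: str) -> str:
    """Normalize volatile details so similar failures share one key."""
    sig = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # memory addresses
    sig = re.sub(r"\d+(\.\d+)?", "<n>", sig)            # ports, ids, durations
    return re.sub(r"\s+", " ", sig).strip().lower()


def group_failures(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each normalized signature to the tests that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for test_name, message in failures:
        groups[signature(message)].append(test_name)
    return dict(groups)
```

Several tests sharing one signature, such as a connection timeout, hints at a common root cause rather than independent bugs.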

Supported test frameworks:

  • Jest: Full support for Jest test results and reporting
  • Vitest: Native integration with Vitest test runner
  • Mocha: Analysis of Mocha test outputs and failure patterns
  • Cypress: End-to-end test flakiness detection
  • pytest: Comprehensive pytest result analysis
  • unittest: Standard Python unittest framework support
  • nose: Legacy nose framework compatibility
  • JUnit: Java test framework analysis
  • RSpec: Ruby test framework support
  • Go Test: Native Go testing framework
  • PHPUnit: PHP test framework integration
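
How Cased ingests each framework's output is not described here. One common interchange format is JUnit-style XML, which pytest, for example, can write with `--junitxml`. The sketch below uses a hypothetical `parse_junit_xml` helper to show how such a report could be reduced to per-test outcomes for later analysis.

```python
import xml.etree.ElementTree as ET


def parse_junit_xml(xml_text: str) -> list[dict]:
    """Return one record per <testcase> with a passed/failed/skipped outcome."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() finds test cases whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            outcome = "failed"
        elif case.find("skipped") is not None:
            outcome = "skipped"
        else:
            outcome = "passed"
        records.append({
            "test": f"{case.get('classname', '')}::{case.get('name', '')}",
            "outcome": outcome,
            "seconds": float(case.get("time", 0)),
        })
    return records
```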

Cased creates pull requests automatically to address flaky tests:

# Example PR Description Generated by Cased
## Flaky Test Remediation
This PR addresses flaky tests identified in the test suite:
### Tests Modified:
- `test_user_authentication_flow` - Added retry logic for network calls
- `test_database_connection` - Improved connection cleanup
- `test_async_operation` - Fixed race condition with proper awaits
### Tests Removed:
- `test_deprecated_feature` - Consistently failing, feature removed
- `test_flaky_integration` - Unreliable external dependency
### Analysis:
- 3 tests showed >30% failure rate over last 30 days
- 2 tests had inconsistent failures across environments
- Total pipeline reliability improved from 85% to 96%

Fixes that Cased applies:

  • Retry Logic: Adds intelligent retries for network-dependent tests
  • Wait Conditions: Implements proper waits for async operations
  • Mock Improvements: Replaces unreliable external dependencies
  • Resource Cleanup: Ensures proper test resource cleanup

Tests that Cased removes:

  • Deprecated Features: Removes tests for non-existent features
  • Redundant Coverage: Eliminates duplicate test coverage
  • Unmaintainable Tests: Removes overly complex or unreliable tests
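
To make the retry and wait strategies concrete, here is a minimal sketch of what such fixes might look like in a Python test suite. The `retry` decorator and `wait_until` helper are hypothetical, written for illustration; they are not code that Cased is known to generate.

```python
import time
from functools import wraps


def retry(attempts: int = 3, delay: float = 0.1, exceptions=(ConnectionError,)):
    """Re-run a function on transient errors, doubling the delay each time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the real failure
                    time.sleep(wait)
                    wait *= 2
        return wrapper
    return decorator


def wait_until(condition, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it is truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return bool(condition())  # one last check at the deadline
```

Polling with `wait_until` replaces fixed `sleep` calls, which are a frequent source of timing-related flakiness. Retries are limited to the listed exception types so that genuine assertion failures still surface on the first attempt.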

A scheduled GitHub Actions workflow can send test results to the Cased API:

name: Flaky Test Analysis
on:
  schedule:
    - cron: "0 2 * * *" # Daily at 2 AM UTC
  workflow_dispatch:
jobs:
  analyze-flaky-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Test Suite
        run: npm test -- --reporter=json > test-results.json
        continue-on-error: true
      - name: Analyze with Cased
        run: |
          curl -X POST https://app.cased.com/api/v1/test-analysis/ \
            -H "Authorization: Bearer ${{ secrets.CASED_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{
              "repository": "${{ github.repository }}",
              "test_results": "'$(cat test-results.json | base64 -w 0)'",
              "commit_sha": "${{ github.sha }}"
            }'
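
The request body sent by the curl command can also be built in a script, which avoids the nested shell quoting. The sketch below mirrors the same three fields and headers. The `build_payload` and `build_request` helpers are hypothetical, and the shape of the API response is not covered here, so the request is only prepared, not sent.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl command in the workflow.
CASED_URL = "https://app.cased.com/api/v1/test-analysis/"


def build_payload(repository: str, commit_sha: str, results_json: bytes) -> bytes:
    """Build the same JSON body the curl command sends."""
    body = {
        "repository": repository,
        "test_results": base64.b64encode(results_json).decode("ascii"),
        "commit_sha": commit_sha,
    }
    return json.dumps(body).encode("utf-8")


def build_request(api_key: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        CASED_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. In CI, the API key would come from an environment variable populated from the `CASED_API_KEY` secret.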

.github/workflows/test-and-analyze.yml
name: Test and Flaky Analysis
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Tests
        run: |
          # The default shell runs with `bash -e`, which would abort on a test
          # failure before the exit code is recorded, so disable it here.
          set +e
          npm test -- --reporter=json --outputFile=test-results.json
          echo "TEST_EXIT_CODE=$?" >> "$GITHUB_ENV"
        continue-on-error: true
      - name: Report to Cased
        if: always()
        uses: cased/test-analysis-action@v1
        with:
          api_key: ${{ secrets.CASED_API_KEY }}
          test_results_file: test-results.json
          create_pr_on_flaky: true

Reporting views:

  • Flaky Test Overview: Dashboard showing all identified flaky tests
  • Trend Analysis: Historical view of test reliability improvements
  • Impact Metrics: Measure how flaky tests affect deployment frequency
  • Remediation Tracking: Track the status of automated fixes

Key metrics:

  • Overall Test Reliability: Percentage of test runs that pass completely
  • Flaky Test Count: Number of tests identified as flaky
  • Time to Fix: Average time from detection to remediation
  • Pipeline Impact: How flaky tests affect deployment speed
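
Two of these metrics have simple definitions that a short sketch can pin down. The function names and input shapes below are assumptions chosen for illustration; Cased may compute the figures differently.

```python
from datetime import datetime, timedelta


def overall_reliability(runs: list[list[bool]]) -> float:
    """Percentage of runs in which every test passed.

    Each run is a list of per-test outcomes (True = passed).
    """
    if not runs:
        return 0.0
    clean = sum(1 for run in runs if all(run))
    return 100.0 * clean / len(runs)


def mean_time_to_fix(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Average gap between (detected, fixed) timestamp pairs."""
    if not events:
        return timedelta(0)
    total = sum((fixed - detected for detected, fixed in events), timedelta(0))
    return total / len(events)
```

Because a run counts as clean only when every test passes, a single flaky test can lower overall reliability much more than its own failure rate would suggest.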

Writing reliable tests:

  1. Isolation: Ensure tests don’t depend on each other
  2. Determinism: Avoid random data or timing dependencies
  3. Cleanup: Properly clean up resources after each test
  4. Mocking: Mock external dependencies and network calls
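
The four practices above can be sketched in a single `unittest` test case. The `fetch_greeting` function and its `client` argument are hypothetical stand-ins for code that calls an external service.

```python
import random
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def fetch_greeting(client) -> str:
    """Hypothetical code under test: calls an external service via `client`."""
    return client.get("/greeting").upper()


class ReliableTests(unittest.TestCase):
    def setUp(self):
        # Isolation and cleanup: each test gets a fresh directory that is
        # removed afterwards, even if the test fails.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.workdir = Path(self._tmp.name)

    def test_deterministic_sample(self):
        # Determinism: a seeded generator yields the same data on every run.
        rng = random.Random(42)
        first = [rng.randint(0, 9) for _ in range(5)]
        rng = random.Random(42)
        self.assertEqual(first, [rng.randint(0, 9) for _ in range(5)])

    def test_mocked_dependency(self):
        # Mocking: no real network call is made.
        client = mock.Mock()
        client.get.return_value = "hello"
        self.assertEqual(fetch_greeting(client), "HELLO")
        client.get.assert_called_once_with("/greeting")

    def test_writes_stay_in_workdir(self):
        # Files are written only inside the per-test directory.
        target = self.workdir / "out.txt"
        target.write_text("data")
        self.assertTrue(target.exists())
```

Registering cleanup with `addCleanup` rather than relying on `tearDown` means the temporary directory is removed even when `setUp` fails partway through.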

Managing flaky tests:

  1. Regular Analysis: Run flaky test analysis regularly, not just when problems occur
  2. Immediate Action: Address flaky tests as soon as they’re identified
  3. Root Cause Analysis: Understand why tests are flaky, don’t just add retries
  4. Team Communication: Share flaky test reports with the development team

Integrating with CI/CD:

  1. Fail Fast: Don’t let flaky tests slow down your pipeline
  2. Parallel Analysis: Run flaky test analysis in parallel with regular CI
  3. Automated Remediation: Enable automatic PR creation for obvious fixes
  4. Monitoring: Continuously monitor test reliability metrics

Team collaboration features:

  • Slack Notifications: Get notified when flaky tests are detected or fixed
  • Assignment: Automatically assign flaky test fixes to relevant team members
  • Progress Tracking: Track team progress on flaky test remediation
  • Cross-Environment Comparison: Compare test reliability across different environments
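
Cross-environment comparison comes down to computing a failure rate per environment and flagging the outliers. The sketch below assumes each run record carries hypothetical `"environment"` and `"passed"` keys, and the 0.2 gap threshold is an arbitrary illustration.

```python
from collections import defaultdict


def failure_rate_by_environment(runs: list[dict]) -> dict[str, float]:
    """Failure rate per environment for one test.

    Each run is a dict with hypothetical keys "environment" and "passed".
    """
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run["environment"]] += 1
        if not run["passed"]:
            failures[run["environment"]] += 1
    return {env: failures[env] / totals[env] for env in totals}


def environment_specific(rates: dict[str, float], gap: float = 0.2) -> list[str]:
    """Environments whose failure rate exceeds the lowest rate by `gap`."""
    if not rates:
        return []
    baseline = min(rates.values())
    return sorted(env for env, rate in rates.items() if rate - baseline >= gap)
```

A test that fails on only one runner image or operating system usually points to an environment assumption, such as a path, locale, or clock, rather than a logic bug.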

Cased turns flaky test management from a recurring pain point into an automated process, improving development velocity and deployment confidence.