.NET 8+ · CLI Tool · MIT License

Know exactly where to add tests first

Litmus cross-references git churn, code coverage, cyclomatic complexity, and dependency analysis to produce a single ranked list of files — ordered by how urgently they need tests and how practical they are to test today.

$ dotnet tool install -g dotnet-litmus
[Screenshot: Litmus output showing a ranked table of files by risk and testability]

Legacy codebases don’t come with a map

Every team that inherits a legacy .NET codebase arrives at the same question — “where do we actually start adding tests?”

The instinct is one of two extremes: test everything systematically (paralysis by scope) or pick a file at random (paralysis by arbitrariness). Neither survives contact with a real delivery schedule.

Roy Osherove addresses this in The Art of Unit Testing (3rd edition, Chapter 12.1). His answer is to build a priority list — a ranked set of components where testing delivers the most value relative to effort. He frames evaluation across three dimensions: logical complexity (how much can go wrong?), dependency level (how hard is it to isolate?), and priority (how urgent is it?). The problem is that Osherove describes these dimensions conceptually. He doesn’t give you a tool that computes them against your actual codebase.

That gap is what Litmus fills.


Why existing tools fall short

The space of code quality tooling is well-populated. So why build something new?

SonarQube & NDepend

Comprehensive platforms that require server infrastructure or expensive licenses. They produce dozens of metrics designed for ongoing governance — not the specific question of “which files should I write tests for first?”

CodeScene

Pioneered cross-referencing git history with complexity to find “hotspots.” Genuinely excellent. But it’s a SaaS product, language-agnostic, and doesn’t address unit testability. It tells you where code is complex — not whether it’s covered or how hard it would be to test.

Coverlet & dotnet-coverage

Give you coverage data. They tell you which lines are covered. They have no opinion about which uncovered lines matter most or in what order you should approach them.

Seam detection — nobody does it

Michael Feathers’ Working Effectively with Legacy Code made “seams” the standard vocabulary for legacy testability. Yet no publicly available tool automates their detection. Teams still eyeball code to judge whether a file is testable. Litmus is the first tool to do this programmatically — using Roslyn to detect six categories of unseamed dependencies and weight them into a testability score.

The gap is specific: no lightweight, .NET-native CLI tool exists that combines git churn, code coverage, and testability analysis into a single ranked output — framed explicitly around where to add tests first and whether you can start today.

Two commands, one answer

Choose the workflow that fits your environment.

scan — all-in-one
$ dotnet-litmus scan

# Auto-detects your .sln file,
# runs dotnet test with Coverlet,
# discovers & merges all coverage files,
# analyzes git history + source,
# outputs the ranked table.

The recommended starting point. Handles the complete workflow in a single invocation — no separate coverage generation, no file paths to locate, no intermediate output to manage.

analyze — bring your own coverage
$ dotnet-litmus analyze \
    --solution ./MyApp.sln \
    --coverage ./coverage.xml

# For CI pipelines or teams that
# already generate coverage as
# part of their build.

The CI-friendly alternative. Supply your own Cobertura XML and get the same ranked output. Both commands accept any .sln or .slnx solution file.


Risk: What is dangerous to leave untested?

Phase 1 combines three data sources into a single RiskScore per file, answering two of Osherove’s three dimensions: logical complexity and priority.

Git Churn (Priority)

Measures weighted churn: the sum of lines added and deleted across all commits in the analysis window. A commit that changed one formatting line contributes 1. A commit that rewrote 200 lines of business logic contributes 200.

A noise floor then discards a commit’s contribution to a file entirely when that commit changed two or fewer lines in the file — cleanly eliminating auto-fixes and import reordering.
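The weighting and noise floor can be sketched in a few lines of Python (an illustration of the rule described above, not the tool’s implementation — the function name and input shape are hypothetical):

```python
def weighted_churn(commits):
    """Sum lines added + deleted per commit for one file.

    `commits` is a list of (added, deleted) tuples. A commit whose total
    changed lines is two or fewer is discarded entirely (the noise floor).
    """
    total = 0
    for added, deleted in commits:
        changed = added + deleted
        if changed <= 2:  # noise floor: auto-fixes, import reordering
            continue
        total += changed
    return total

# One formatting tweak (1 line), one tiny fix (3 lines), one rewrite (200 lines):
print(weighted_churn([(1, 0), (2, 1), (120, 80)]))  # → 203
```

The formatting-only commit contributes nothing; the small fix and the rewrite contribute their full line counts.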

Code Coverage (The Gap)

Extracts line coverage rate from a Cobertura XML report. Files absent from the report are treated as zero coverage — because absence means either they were never executed or no tests exist at all.

When multiple test projects exist, scan merges coverage automatically: for any source file covered by more than one test project, it takes the highest coverage rate seen.
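The per-file merge rule amounts to a max over test projects. A hedged Python sketch (names and data shape are illustrative; real Cobertura parsing is omitted):

```python
def merge_coverage(reports):
    """Merge per-file line rates from several test projects.

    `reports` is a list of {file_path: line_rate} dicts. For a file seen
    in more than one report, keep the highest rate observed.
    """
    merged = {}
    for report in reports:
        for path, rate in report.items():
            merged[path] = max(rate, merged.get(path, 0.0))
    return merged

unit = {"OrderService.cs": 0.40, "ReportFormatter.cs": 0.31}
integration = {"OrderService.cs": 0.12}
print(merge_coverage([unit, integration]))
# OrderService.cs keeps 0.40, the highest rate across projects
```

Files absent from every report simply never enter the dict, which is why they fall back to zero coverage downstream.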

Cyclomatic Complexity (Logical Complexity)

Computed via Roslyn syntax analysis without requiring a full compilation. Counts branching constructs per method: if, else if, case, for, foreach, while, catch, when, ternary, null-coalescing, &&, and ||.

Each method starts at base complexity 1. File complexity is the sum across all methods, normalized across the solution.
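Litmus does this properly over Roslyn syntax trees; the counting rule itself can be shown with a deliberately naive Python sketch that scans raw source text. This is an oversimplification for illustration only — the regex will, for instance, also match nullable type markers like int? — and is not how the tool works:

```python
import re

# Constructs that each add 1 to a method's complexity (base is 1).
# NOTE: crude token scan; Roslyn distinguishes ternary `?` from `int?`.
BRANCH_PATTERN = re.compile(
    r"\b(if|case|for|foreach|while|catch|when)\b|\?\?|&&|\|\||\?(?!\?)"
)

def approx_complexity(method_body: str) -> int:
    """Base complexity 1, plus one per branching construct found."""
    return 1 + len(BRANCH_PATTERN.findall(method_body))

body = """
if (x > 0 && y > 0) { return a ?? b; }
foreach (var item in items) { total += item; }
"""
print(approx_complexity(body))  # → 5: if, &&, ??, foreach on top of base 1
```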

Risk Formula

RiskScore = ChurnNorm × (1 − CoverageRate) × (1 + ComplexityNorm)

Score range: 0 to 2.0. The maximum occurs when a file has the highest churn, zero coverage, and the highest complexity.

ChurnNorm

Files that never change get score 0, regardless of coverage or complexity. No churn = no risk of regression.

(1 − CoverageRate)

100% covered files get score 0 — if every line is tested, churn is safely caught. 0% coverage = full penalty.

(1 + ComplexityNorm)

A multiplier (1× to 2×), not an independent term. It amplifies risk for complex files without creating risk where none exists.

High ≥ 0.6

Changes often, poorly tested, complex logic. Prioritize tests here immediately.

Medium ≥ 0.2

Moderate combined signal. Worth investigating and planning for the next sprint.

Low < 0.2

Low churn, well-tested, or simple code. Lower priority for test investment.
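Putting the formula and the bands together in a short Python sketch (the inputs are made-up example values; how ChurnNorm and ComplexityNorm are normalized across the solution is not spelled out here):

```python
def risk_score(churn_norm, coverage_rate, complexity_norm):
    """RiskScore = ChurnNorm × (1 − CoverageRate) × (1 + ComplexityNorm).

    All inputs are in [0, 1]; the result is in [0, 2].
    """
    return churn_norm * (1 - coverage_rate) * (1 + complexity_norm)

def risk_level(score):
    if score >= 0.6:
        return "High"
    if score >= 0.2:
        return "Medium"
    return "Low"

# A hot, barely covered, moderately complex file vs. a file that never changes:
hot = risk_score(1.0, 0.12, 0.45)   # ≈ 1.28
cold = risk_score(0.0, 0.0, 1.0)    # 0.0 — no churn, no risk
print(risk_level(hot), risk_level(cold))
```

Note how the zero-churn file scores 0 even with zero coverage and maximum complexity, exactly as the ChurnNorm factor dictates.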


Starting Priority: Where can you actually begin today?

Phase 1 tells you what’s dangerous. It doesn’t tell you what’s practical. This distinction matters enormously in legacy work, and it’s the part of Osherove’s framework that most tools ignore entirely.

A deeply entangled file — one that directly instantiates database connections, calls DateTime.Now in five methods, and has no interface injection points — may be highly risky. But attempting to write the first test for it before introducing any seams is an exercise in frustration. For a team trying to build momentum, starting there is counterproductive.

Phase 2 quantifies this cost using Michael Feathers’ concept of seams (from Working Effectively with Legacy Code) — points in the code where a dependency can be substituted in a test without modifying production code. It measures how many unseamed dependencies a file has.

The Six Dependency Signals

Unseamed Infrastructure Calls

Direct access to things a test can never control: DateTime.Now, File.*, Environment.*, Guid.NewGuid(), directly instantiated HttpClient, SqlConnection, DbContext subclasses. The worst offenders — zero seam available.

weight: 2.0×

Direct Instantiation in Methods

Any new ConcreteType() inside a method body, unless the type is a value object, DTO, exception, or collection. new inside a method seals the dependency — the test has no way to substitute it.

weight: 1.5×

Concrete Constructor Parameters

Constructor parameters whose type doesn’t follow the ITypeName interface convention. The seam exists structurally but isn’t actually substitutable without extra work.

weight: 0.5×

Static Calls on Non-Utility Types

Method calls shaped as TypeName.Method() on PascalCase identifiers that aren’t known safe utilities (Math, Convert, Encoding, etc.). Static methods have no instance to substitute.

weight: 1.0×

Async Seam Calls

Await expressions targeting known async I/O: await _httpClient.GetAsync(), await _db.SaveChangesAsync(). Combines the seam problem with async execution complexity.

weight: 1.5×

Concrete Downcasts

Cast expressions like (ConcreteType)expr and expr as ConcreteType. Defeats interface abstractions — a seam in name only that will fail if the substitute isn’t the exact concrete type.

weight: 1.0×

Dependency Score

DependencyNorm = RawScore / Max(RawScore)

Each signal count is multiplied by its weight and summed. The result is normalized across all files in the solution.

Starting Priority

StartingPriority = RiskScore × (1 − DependencyNorm)

Discounts priority as entanglement increases. A fully seamed file keeps its full RiskScore. A maximally entangled file drops to zero.
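The weights and both formulas can be combined into one illustrative Python sketch (signal names shortened for readability; this is a sketch of the arithmetic above, not the tool’s code):

```python
WEIGHTS = {
    "infrastructure": 2.0,   # DateTime.Now, File.*, new SqlConnection(), ...
    "instantiation":  1.5,   # new ConcreteType() inside a method body
    "concrete_ctor":  0.5,   # constructor params not following ITypeName
    "static_call":    1.0,   # TypeName.Method() on non-utility types
    "async_seam":     1.5,   # await on known async I/O
    "downcast":       1.0,   # (ConcreteType)expr, expr as ConcreteType
}

def raw_dependency_score(counts):
    """Weighted sum of the six signal counts for one file."""
    return sum(WEIGHTS[signal] * n for signal, n in counts.items())

def starting_priority(risk_score, raw, max_raw):
    """StartingPriority = RiskScore × (1 − DependencyNorm)."""
    dep_norm = raw / max_raw if max_raw else 0.0
    return risk_score * (1 - dep_norm)

# A fully seamed file keeps its risk; a maximally entangled one drops to 0:
entangled = raw_dependency_score({"infrastructure": 4, "instantiation": 2})  # 11.0
print(starting_priority(1.89, entangled, max_raw=entangled))  # → 0.0
print(starting_priority(1.42, 0.0, max_raw=entangled))        # → 1.42
```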

The most valuable signal

A file that scores High Risk but Low Starting Priority is one of the most valuable signals the tool produces. It tells you: “This file is dangerous to leave untested, but it has too many unseamed dependencies to test directly today. Introduce seams first — then come back.” That is a concrete, actionable instruction, not a vague warning.


See it in action

Run one command, get a prioritized table showing both what’s dangerous and where to start.

dotnet-litmus scan
$ dotnet-litmus scan

▸ Running tests with coverage collection...
▸ Discovered 3 coverage files, merging...
▸ Analyzing 847 commits since 2025-03-01...
▸ Computing complexity for 42 source files...
▸ Evaluating dependency seams...

Rank │ File                     │ Commits │ Coverage │ Complexity │ Dependency │ Risk   │ Priority │ Level
─────┼──────────────────────────┼─────────┼──────────┼────────────┼────────────┼────────┼──────────┼───────
 1   │ OrderService.cs          │ 847     │ 12%      │ 94         │ Low        │ 1.42   │ 1.42     │ High
 2   │ ReportFormatter.cs       │ 290     │ 31%      │ 67         │ Low        │ 0.71   │ 0.71     │ High
 3   │ PaymentGateway.cs        │ 612     │ 8%       │ 118        │ Very High  │ 1.61   │ 0.32     │ Medium
 4   │ LegacyDbSync.cs          │ 503     │ 0%       │ 201        │ Very High  │ 1.89   │ 0.19     │ Low

42 files analyzed. 2 high-priority, 1 medium-priority, 1 low-priority.

▸ PaymentGateway.cs: High risk (1.61) but Very High dependency — introduce seams first.
▸ LegacyDbSync.cs: Highest risk (1.89) but maximally entangled — backlog until refactored.

PaymentGateway.cs and LegacyDbSync.cs are the most dangerous files in this table. But their high dependency scores push them down the starting priority list — because before you can write a single test, you need to introduce seams. OrderService.cs and ReportFormatter.cs are where the team should begin.


Up and running in under a minute

Install the tool, point it at your solution, and get a ranked list.

1. Install the tool

Requires .NET 8 SDK or later. Installs as a global CLI tool via NuGet.

$ dotnet tool install -g dotnet-litmus

2. Scan your solution (recommended)

Runs tests, collects coverage, and analyzes everything in one command. Auto-detects your .sln file, or pass --solution explicitly.

$ dotnet-litmus scan

3. Or bring your own coverage

If you already have a Cobertura XML from CI, use analyze instead.

$ dotnet-litmus analyze --solution MyApp.sln --coverage coverage.xml

Advanced examples

advanced usage
# Last 6 months, top 10 files
$ dotnet-litmus scan \
    --solution src/MyApp.sln \
    --since 2025-09-01 \
    --top 10

# Exclude generated code, export JSON
$ dotnet-litmus scan \
    --solution MyApp.sln \
    --exclude "*.Generated.cs" \
    --exclude "**/Migrations/*.cs" \
    --output report.json

# CI baseline comparison
$ dotnet-litmus analyze \
    --solution MyApp.sln \
    --coverage coverage.xml \
    --baseline main-baseline.json \
    --format json

CLI options

Shared options apply to both scan and analyze. A few are scan-only, marked below.

Option              │ Default     │ Description
────────────────────┼─────────────┼─────────────────────────────────────────────
--solution          │ auto-detect │ Path to a .sln or .slnx solution file. Auto-detected when exactly one exists in the current directory.
--coverage          │ —           │ Path to a Cobertura XML coverage file. Required for analyze only.
--since             │ 1 year ago  │ Limit git history to commits after this date (ISO 8601 format).
--top               │ 20          │ Number of top risky files to display in the output table.
--exclude           │ —           │ Glob pattern(s) to exclude files. Repeatable for multiple patterns.
--output            │ —           │ Export full results to a .json, .csv, or .html file.
--format            │ table       │ Output format for stdout: table, json, csv, or html.
--baseline          │ —           │ Compare against a previous JSON run to surface regressions.
--tests-dir         │ —           │ scan only. Directory or .csproj to run dotnet test against. Defaults to the solution file.
--coverage-tool     │ coverlet    │ scan only. Coverage collector: coverlet or dotnet-coverage. Use the latter if coverlet hangs.
--no-coverage       │ false       │ scan only. Skip test execution and coverage collection. Ranks by churn, complexity, and testability only.
--timeout           │ 10          │ scan only. Maximum minutes for the test run. Auto-falls back to dotnet-coverage on coverlet timeout.
--verbose           │ false       │ Show all intermediate scores per file (churn norm, complexity norm, all 6 dependency signal counts, raw dependency score).
--quiet             │ false       │ Suppress all output except errors.
--fail-on-threshold │ —           │ Exit with code 1 if any file’s Risk Score or Starting Priority exceeds this value (0.0–2.0).
--no-color          │ false       │ Disable colored output for CI environments and piping.

Default exclusions are always applied: Program.cs, Startup.cs, *.Designer.cs, *.g.cs, *.generated.cs, *.xaml.cs, *Migrations/*.cs, *ModelSnapshot.cs, *AssemblyInfo.cs, **/obj/**, **/bin/**, and more. Use --exclude to add patterns on top of these.


Baseline comparison & CI integration

Save a report, compare it next sprint, and catch regressions automatically.

baseline comparison
# Save a baseline
$ dotnet-litmus scan --output baseline.json

# Later: compare against it
$ dotnet-litmus scan --baseline baseline.json

Rank │ File                  │ … │ Priority │ Delta  │ Level
─────┼───────────────────────┼───┼──────────┼────────┼───────
 1   │ OrderService.cs       │ … │ 1.42     │ +0.15  │ High
 2   │ ReportFormatter.cs    │ … │ 0.71     │ -0.10  │ High
 3   │ NotificationSvc.cs    │ … │ 0.55     │ NEW    │ Medium

vs baseline: 1 improved, 1 degraded, 1 new, 0 removed.

The Delta column appears automatically when --baseline is provided. Red deltas mean a file got riskier. Green means improved. NEW marks files not in the baseline.
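The delta semantics are easy to pin down precisely. A hypothetical Python sketch of comparing two runs keyed by file path (the actual JSON schema is not shown in this document, so the dict shape is an assumption):

```python
def compare(baseline, current):
    """Yield (file, delta_or_NEW) for each file in the current run.

    `baseline` and `current` map file paths to Starting Priority scores.
    Files absent from the baseline are marked NEW; a positive delta
    means the file got riskier since the baseline run.
    """
    for path, score in current.items():
        if path not in baseline:
            yield path, "NEW"
        else:
            yield path, round(score - baseline[path], 2)

baseline = {"OrderService.cs": 1.27, "ReportFormatter.cs": 0.81}
current = {"OrderService.cs": 1.42, "ReportFormatter.cs": 0.71,
           "NotificationSvc.cs": 0.55}
for path, delta in compare(baseline, current):
    print(path, delta)
```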

GitHub Actions example

.github/workflows/litmus.yml
name: Litmus Analysis
on: [push]
jobs:
  litmus:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history

      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: '8.x'  # Litmus requires .NET 8+

      - run: dotnet tool install -g dotnet-litmus

      - run: |
          dotnet-litmus scan \
            --output report.json \
            --format json --quiet

      - uses: actions/upload-artifact@v4
        with:
          name: litmus-report
          path: report.json

Key CI flags: --quiet suppresses console output, --no-color disables ANSI, --format json pipes structured data to stdout, --output report.html generates a shareable HTML report, and --fail-on-threshold 1.0 fails the build if any file exceeds a score.


Why Litmus works where others don’t

It asks a more precise question than generic code quality tools.

It is not asking “is this codebase healthy?” It is asking “given that I have finite time and this codebase has no tests, what is the highest-value, lowest-friction place to start?”

That precision is what allows it to be small, fast, and installable with a single command. There is no server, no dashboard, no license. A developer can run it in under a minute and walk into a sprint planning meeting with a concrete proposal for which files to target.

The scoring model is transparent and formula-driven. Every score is reproducible and explainable. A team can look at a file’s individual signal values and understand exactly why it ranked where it did. That explainability matters for team buy-in — nobody argues with a formula backed by git data and coverage numbers.

Most importantly, the separation of RiskScore from StartingPriority is not a cosmetic choice. It is the difference between a tool that tells you what is dangerous and a tool that tells you where to go today. Legacy codebases are not conquered in one sprint. They are approached incrementally, file by file, wave by wave. Litmus gives you the map for each wave.