Stability and compatibility contract
This document pins what's covered by semantic versioning so downstream users (CI integrations, dashboards, audit pipelines) can build on pipeline-check without re-validating on every minor release.
Anything described as stable below changes only on a major-version bump and gets a deprecation period of at least one minor release beforehand. Anything described as unstable can change on any release; do not depend on it.
TL;DR for CI integrations
If you only have a minute, this is what's safe to build on and what isn't. The rest of the page is the long version.
Safe to depend on:
- `--output json --output-file <path>` writes a parseable JSON file whose `schema_version` you can branch on.
- `--output sarif` writes a SARIF 2.1.0 file uploadable to GitHub Code Scanning.
- Exit codes `0`/`1`/`2`/`3`/`4` keep their meanings (see the canonical table in `usage.md`).
- `check_id` values (`GHA-001`, `JF-033`, `AC-001`, `XPC-008`, …) are stable identifiers across releases.
Don't depend on:
- The terminal report for failure counts or scores — it's rendered for humans. Use JSON.
- Specific `[scan]`/`[warn]`/`[gate]` stderr lines for programmatic decisions. Use JSON + exit codes.
- The exact wording of `description` or `recommendation` strings. Refined every release.
- Severity downgrades / upgrades within the rule's logical scope. Wire the gate to `--fail-on` or `--fail-on-check`, not to a hard severity expectation per rule.
CLI flags and subcommands — stable
Every flag listed by pipeline_check --help is stable. That includes:
- Long names (`--pipeline`, `--output`, `--severity-threshold`, `--fail-on`, `--baseline`, `--diff-base`, `--checks`, `--standard`, …) and their values.
- Short names (`-p`, `-o`, `-f`, `-c`, `-O`, `-v`, `-q`).
- The `init` subcommand and the `--list-*` / `--explain` family.
- Default values shown in `--help`.
- Provider-path flags (`--gha-path`, `--gitlab-path`, `--jenkinsfile-path`, …) and their auto-detect contracts.
Stability promises:
- A new flag may be added in any minor release.
- A flag's behavior may be expanded (new values accepted, new side effects gated behind opt-in) in any minor release.
- A flag's existing accepted values keep working for the rest of the current major.
- Deprecation marks an option with a warning at least one minor release before removal.
Not stable:
- Wording of help text. Flag descriptions are refined freely.
- The exact text of error messages emitted by `click`. Programs that parse stderr should match against the structured signal (exit code, JSON output) instead.
Finding identity — stable
A "finding ID" is one of:
- A rule ID like `GHA-001`, `JF-033`, `K8S-042`, `DF-021`.
- A taint marker like `TAINT-001`.
- A chain ID like `AC-001`, `XPC-008`.
Stable contracts on a finding ID:
- An ID, once published in a release, never gets reused for a different rule.
- An ID's severity may be raised or lowered, but the ID itself stays attached to the same logical security concern.
- A rule may be deprecated in a minor release (still emits findings, but marked deprecated in `--explain` output) and removed in the next major.
Not stable:
- The exact wording of `title`, `description`, `recommendation`. Prose is refined every release. CI scripts that key off finding identity should match on `check_id`, not title text.
- The exact set of `controls` mapped to a finding. Standards mappings (OWASP CICD, NIST CSF, SLSA, etc.) are corrected and extended every release. The set is additive on minor releases unless a mapping was factually wrong.
- The `incident_refs` and `exploit_example` fields.
JSON output — stable per schema_version
The JSON report emitted by --output json carries a top-level
schema_version field. The current version is 1.1 (see
pipeline_check.core.reporter.JSON_SCHEMA_VERSION).
Stable contracts for schema_version="1.x":
- Top-level keys: `schema_version`, `tool_version`, `score`, `findings`, `chains`. `inventory` appears when `--inventory` was passed. Consumers should ignore keys they don't recognize so additive changes (new top-level keys) are non-breaking.
- Each `findings[]` entry has at minimum: `check_id`, `title`, `severity`, `confidence`, `resource`, `description`, `recommendation`, `passed`, `controls`, `cwe`. New optional fields are added on minor releases.
- `severity` is one of `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `INFO`.
- `passed` is a boolean: `false` means the rule fired.
- `score` shape: `{score: int, grade: "A"|"B"|"C"|"D", summary: {<SEVERITY>: {passed: int, failed: int}}}`.
- `tool_version` is the released pipeline-check version (PEP 440 string).
Breaking JSON changes bump schema_version. A new major version
of the schema (2.0) ships alongside an opt-in flag for at least one
minor release before the old format is dropped, so consumers can
migrate without coordinating with their pipeline owners.
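
A consumer that follows the contract above might look like the sketch below. It reads only documented keys, ignores anything else, and refuses a schema major version it does not know. The function name `failed_check_ids` and the sample report are hypothetical; the sample is hand-written for illustration and is not real tool output.

```python
import json

SUPPORTED_MAJOR = "1"  # this sketch only understands schema_version 1.x


def failed_check_ids(report_text: str) -> list[str]:
    """Return the sorted check_ids of findings whose `passed` is false."""
    report = json.loads(report_text)
    major = str(report["schema_version"]).split(".")[0]
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema_version: {report['schema_version']}")
    # Only documented keys are read; unknown keys are simply never touched,
    # so additive changes to the report do not break this consumer.
    return sorted({f["check_id"] for f in report["findings"] if not f["passed"]})


# Hand-written sample (an assumption, not captured from a real scan).
SAMPLE = json.dumps(
    {
        "schema_version": "1.1",
        "some_future_key": {"ignored": True},
        "findings": [
            {"check_id": "GHA-001", "severity": "HIGH", "passed": False},
            {"check_id": "JF-033", "severity": "LOW", "passed": True},
        ],
    }
)

print(failed_check_ids(SAMPLE))
```

Matching on `check_id` rather than `title` keeps the consumer inside the stable part of the contract.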
SARIF output — stable
--output sarif emits SARIF 2.1.0 conforming to the GitHub Advanced
Security upload contract. Key contracts:
- `runs[].tool.driver.version` is the pipeline-check release.
- `runs[].results[].ruleId` matches the finding's `check_id`.
- `runs[].results[].partialFingerprints.pipelineCheckV1` is stable per-finding across runs (same input → same fingerprint), so GitHub's deduplication works correctly. The fingerprint algorithm itself is internal; only the property name is contracted.
- Standard slugs go in `tags`; control IDs go in `properties.controls`. GitHub caps `tags` at 20 entries.
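
As a rough illustration of consuming the contracted SARIF properties, the sketch below collects the `pipelineCheckV1` fingerprints per `ruleId`. The helper name `fingerprints_by_rule` and the fingerprint values in the sample are hypothetical; the fingerprint is treated as an opaque string because its algorithm is internal.

```python
import json


def fingerprints_by_rule(sarif_text: str) -> dict[str, set[str]]:
    """Map each ruleId to the set of pipelineCheckV1 fingerprints seen for it."""
    sarif = json.loads(sarif_text)
    out: dict[str, set[str]] = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            fp = result.get("partialFingerprints", {}).get("pipelineCheckV1")
            if fp is not None:
                # The value is opaque: compare it for equality, never parse it.
                out.setdefault(result["ruleId"], set()).add(fp)
    return out


# Hand-written sample with made-up fingerprint values.
SARIF_SAMPLE = json.dumps(
    {
        "version": "2.1.0",
        "runs": [
            {
                "results": [
                    {"ruleId": "GHA-001", "partialFingerprints": {"pipelineCheckV1": "abc123"}},
                    {"ruleId": "GHA-001", "partialFingerprints": {"pipelineCheckV1": "def456"}},
                    {"ruleId": "JF-033", "partialFingerprints": {}},
                ]
            }
        ],
    }
)

print(fingerprints_by_rule(SARIF_SAMPLE))
```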
Not stable:
- The exact contents of `properties.controls`. They track the finding's control list, which is itself stable only at the standards-slug level (see "Finding identity" above).
JUnit output — stable
--output junit emits JUnit XML readable by every major CI test
viewer (Jenkins, GitHub Actions, GitLab CI). The contract:
- One `<testcase>` per finding, `classname=pipeline-check.<provider>`, `name=<check_id>: <title>`.
- Failures use `<failure>` with `type="<severity>"`.
- `<testsuite>` aggregates the total/failed counts.
Markdown output — unstable
--output markdown is intended for PR-comment rendering. The exact
formatting is refined release-to-release. Consumers should not parse
it; use JSON or SARIF for machine-readable output.
Threat-model output — unstable
--output threatmodel (STRIDE table) is similarly prose-shaped and
subject to release-to-release refinement.
Terminal output — explicitly not stable
The Rich-rendered terminal report exists to be read by a human at the
end of a scan. Its layout, color palette, severity glyphs, and
per-finding panel shape change freely. CI scripts must not parse
terminal output — use --output json (or SARIF / JUnit) instead.
The [auto], [scan], [warn], [gate], [debug], [hint],
[autofix], [ingest] log lines on stderr are also unstable. The
prefix shape is intentional (so `grep -E '^\[(warn|gate)\]'` filters
work), but the message wording is not contracted.
Exit codes — stable
| Code | Meaning |
|---|---|
| `0` | Scan completed; gate passed. |
| `1` | Scan completed; gate failed (`--fail-on` / `--min-grade` / `--max-failures` / `--fail-on-check` / `--fail-on-chain` / `--fail-on-any-chain` tripped). |
| `2` | Bad invocation or unexpected scan exception. Click `UsageError` (bad flag value, missing required path, mutually-exclusive conflict) and uncaught scanner exceptions both surface here. The error and any traceback are on stderr. |
| `3` | Operational failure on a non-scan action: `--list-checks` / `--explain` for an unknown ID, `--apply` without `--fix`, MCP support not installed, malformed `--ignore-file`, unparseable `--baseline`. |
| `4` | `--ai-explain` request failure (missing SDK, missing API key, unknown provider, request error). |
Code 1 is what users gate CI runs on. Codes 2, 3, and 4 mean
the scan didn't complete usefully; treating them as failures is the
safe default but distinct semantically from 1. The full table is
the canonical one in usage.md; the same
contract applies here and is covered by the stability promise.
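
One way a wrapper script could keep "gate failed" separate from "scan didn't complete" is sketched below. The `Outcome` enum and `classify_exit_code` are hypothetical names introduced for this example; treating any unlisted code as a tool error is an assumption of the sketch, not part of the contract.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "gate passed"
    GATE_FAILED = "gate failed"
    TOOL_ERROR = "scan did not complete usefully"


def classify_exit_code(code: int) -> Outcome:
    """Collapse the documented exit codes into three CI-level outcomes."""
    if code == 0:
        return Outcome.PASSED
    if code == 1:
        return Outcome.GATE_FAILED
    # Codes 2, 3, and 4 (and, by assumption, anything unexpected) mean the
    # run itself went wrong, which is distinct from findings tripping the gate.
    return Outcome.TOOL_ERROR


# In a real wrapper the code would come from something like
# subprocess.run([...]).returncode; literals are used here for illustration.
for code in (0, 1, 2, 3, 4):
    print(code, classify_exit_code(code).value)
```

Both `GATE_FAILED` and `TOOL_ERROR` should normally fail the CI job; the split matters mainly for alerting and retry decisions.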
Gate semantics — stable
The default gate fails on any CRITICAL finding. Passing any explicit
gate option (--fail-on, --min-grade, --max-failures,
--fail-on-check, --fail-on-chain, --fail-on-any-chain)
suppresses the default and only the explicit options govern.
Loosen with e.g. --max-failures 999999; tighten with
--fail-on HIGH. Severity ranking is `CRITICAL > HIGH > MEDIUM > LOW > INFO`.
INFO-severity findings never count toward the score.
Degraded-mode findings (<PREFIX>-000, emitted when an AWS API call
fails) are INFO-severity and never trip the gate. A [warn] line on
stderr surfaces them.
Scoring model — stable
The weighted score formula (CRITICAL=20, HIGH=10, MEDIUM=5, LOW=2, INFO=0) and the A/B/C/D grade thresholds (A ≥ 90, B ≥ 75, C ≥ 60, D < 60) are stable. They will not change without a major version bump.
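
The grade thresholds translate directly into a lookup like the one below. Only the score-to-grade step is shown; how the severity weights combine into the numeric score is not spelled out on this page, so it is not reproduced here. The function name `grade_for` is hypothetical.

```python
def grade_for(score: int) -> str:
    """Map a numeric score to a grade using A >= 90, B >= 75, C >= 60, else D."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


# Boundary values on either side of each threshold.
for s in (90, 89, 75, 74, 60, 59):
    print(s, grade_for(s))
```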
Python API — stable for the documented surface
The pipeline_check package surface re-exported from the top-level
module (Scanner, ScanMetadata, Finding, Location, Severity,
Confidence, ControlRef, severity_rank, confidence_rank,
score, ScoreResult, Chain, ChainRule, evaluate_chains,
list_chain_rules, available_providers, available_standards,
load_custom_rules, LoadedCustomRules, CustomRuleError,
__version__) is stable. The authoritative list is __all__ in
pipeline_check/__init__.py. Internal modules
(pipeline_check.core.checks.*, _primitives, provider helpers)
are not part of the public surface — they can change freely between
minor releases.
Configuration file — stable
.pipeline-check.yml keys documented in docs/config.md
are stable. Unknown keys log a warning but don't fail the load, so
adding new options in newer pipeline-check releases doesn't break
older configs.
See also
The short list at the top of this page ("TL;DR for CI integrations") restates the safe-to-depend-on contract for readers who just need the punch list.