
Usage

A quick-reference, task-oriented guide. For deep dives, follow the links at the bottom of each section.

📦 Install

pip install pipeline-check       # package name: hyphenated
pipeline_check --version         # command name: underscored

Python 3.10+ is required. pipx install pipeline-check also works and keeps the CLI out of your project environment.

Container image

Every release also publishes a multi-arch (linux/amd64 + linux/arm64) image to Docker Hub and GHCR, with SLSA build provenance and an SBOM attached to the manifest:

docker run --rm -v "$PWD:/scan" dmartinochoa/pipeline-check
docker run --rm -v "$PWD:/scan" ghcr.io/dmartinochoa/pipeline-check

Both registries publish the same digest; pick whichever your platform already pulls from. Tag flavors are :<version> (e.g. :1.0.4), :sha-<short> for a commit-specific tag (still a mutable reference, resolved through Docker Hub / GHCR), and :latest on master. For truly immutable pinning, append the manifest digest: dmartinochoa/pipeline-check@sha256:<full-digest>; docker buildx imagetools inspect dmartinochoa/pipeline-check:<version> prints the digest. /scan is the image working directory, so a -v "$PWD:/scan" bind mount makes the auto-detect walk Just Work. Append CLI flags after the image reference:

docker run --rm -v "$PWD:/scan" dmartinochoa/pipeline-check \
  --pipeline github --output json

For air-gapped or supply-chain-locked environments, pin the image by digest (@sha256:…) rather than tag. The digest for each release is visible on the Docker Hub tags page and on the GHCR package page.

🚀 First scan (auto-detect)

Run with no flags in any supported repo; the working directory is walked for every supported provider's canonical file:

cd your-repo
pipeline_check

Auto-detect looks for: .github/workflows/, .gitlab-ci.yml, bitbucket-pipelines.yml, azure-pipelines.yml, Jenkinsfile, .circleci/config.yml, cloudbuild.yaml, .buildkite/pipeline.yml, .drone.yml / .drone.yaml, Dockerfile/Containerfile, CloudFormation templates (*.yml, *.yaml, *.json at repo root), a kubernetes/ / k8s/ / manifests/ directory of K8s manifests, and Helm Chart.yaml. When nothing matches, it falls back to aws (a live account scan). OCI manifests (index.json) are not auto-detected because the filename is too generic; pass --pipeline oci or --pipelines github,oci explicitly.

A single match runs through Scanner unchanged. Two or more matches automatically switch to MultiScanner (the same engine --pipelines github,oci activates) so cross-provider attack chains in the XPC-NNN family fire on the union of every sub-scan's findings. The routing decision is announced on stderr so it stays visible in CI logs:

[auto] detected providers: github, dockerfile (running --pipelines github,dockerfile)

When Chart.yaml is present alongside a kubernetes/ / k8s/ / manifests/ directory, the Kubernetes provider is dropped: the Helm provider already renders the templates and feeds them to the K8s rule pack, so scanning both would double-count.

🎯 Scan a specific provider

pipeline_check -p github                        # short flag
pipeline_check --pipeline github

pipeline_check --pipeline gitlab --gitlab-path path/to/.gitlab-ci.yml
pipeline_check --pipeline azure  --azure-path  azure-pipelines.yml
pipeline_check --pipeline jenkins --jenkinsfile-path Jenkinsfile
pipeline_check --pipeline circleci --circleci-path .circleci/config.yml
pipeline_check --pipeline bitbucket --bitbucket-path bitbucket-pipelines.yml
pipeline_check --pipeline cloudbuild --cloudbuild-path cloudbuild.yaml
pipeline_check --pipeline buildkite --buildkite-path .buildkite/pipeline.yml
pipeline_check --pipeline tekton --tekton-path tekton/
pipeline_check --pipeline argo --argo-path workflows/
pipeline_check --pipeline dockerfile --dockerfile-path Dockerfile
pipeline_check --pipeline kubernetes --k8s-path manifests/
pipeline_check --pipeline helm --helm-path charts/myapp/

pipeline_check --pipeline drone --drone-path .drone.yml
pipeline_check --pipeline oci --oci-manifest index.json

pipeline_check --pipeline cloudformation --cfn-template template.yml
pipeline_check --pipeline terraform --tf-plan plan.json
pipeline_check --pipeline aws --region eu-west-1 --profile prod

# SCM posture (GitHub repo governance via the REST API).
# Token comes from --gh-token or $GITHUB_TOKEN. Without admin
# scope on the repo, the security_and_analysis-driven rules
# (SCM-004 / -005 / -015 / -016) cannot tell "disabled" from
# "unknown"; re-run with admin scope to confirm those
# rules' verdicts.
pipeline_check --pipeline scm --scm-platform github \
    --scm-repo octocat/hello-world

# Hermetic mode: read SCM API responses from JSON fixtures
# under DIR. Useful for offline tests and CI runs that don't
# hold a token.
pipeline_check --pipeline scm --scm-platform github \
    --scm-repo octocat/hello-world \
    --scm-fixture-dir ./scm-fixtures/

Full per-provider reference: providers/.

🧩 Scan multiple providers in one run

Cross-provider attack chains (the XPC-NNN family) only fire when the engine sees findings from more than one provider in the same scan. Use --pipelines (plural, comma-separated) to opt in:

# Pull GitHub Actions + OCI manifest into one report; XPC-001 (deploy
# without verifiable provenance) fires when both legs are missing.
pipeline_check --pipelines github,oci

# Per-provider auto-detection still applies; override any single
# provider's path with its companion flag the same way as in
# single-provider mode.
pipeline_check --pipelines dockerfile,kubernetes \
    --dockerfile-path Dockerfile --k8s-path manifests/

--pipelines is mutually exclusive with the single-valued --pipeline.

🛠️ Scaffold a config file

pipeline_check init                 # writes .pipeline-check.yml in cwd
pipeline_check init --path infra/   # redirect output
pipeline_check init --force         # overwrite existing

The init subcommand pre-fills the pipeline: key based on what it finds in the working directory.

Config file reference: config.md.

🚦 Gate a CI build on results

# Fail the build if any HIGH or CRITICAL finding exists
pipeline_check --fail-on HIGH

# Fail if grade drops below B
pipeline_check --min-grade B

# Fail only on new findings vs a committed baseline
pipeline_check --fail-on HIGH --baseline-from-git origin/main:baseline.json

# Snapshot today's findings so future runs gate only on new issues
pipeline_check --write-baseline baseline.json

# Cap total failures
pipeline_check --max-failures 10
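The gate flags drop straight into a CI step. A hypothetical GitHub Actions job, assuming pipeline-check is installed on the runner (the job layout is illustrative, not from the project's docs):

```yaml
jobs:
  pipeline-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # --baseline-from-git needs the base ref in history
      - run: pip install pipeline-check
      - run: pipeline_check --fail-on HIGH --baseline-from-git origin/main:baseline.json
```

A non-zero exit code from the last step fails the build, which is exactly the gate contract described above.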

For multi-lane CI (pre-commit / PR / release-gate), bundle the gate flags into a named policy file under policies/<name>.yml:

# Pre-commit lane uses a HIGH-only profile
pipeline_check --policy pre-commit

# Release lane uses MEDIUM-fail + attestation rules forced
pipeline_check --policy release-gate

# Enumerate every discoverable policy
pipeline_check --list-policies

Gate details: ci_gate.md. Policy schema: config.md.

🔑 AWS live scans: credentials

The AWS provider uses the standard boto3 credential chain. Any of these work:

# Named AWS CLI profile
pipeline_check --pipeline aws --profile prod

# Environment variables
AWS_PROFILE=prod pipeline_check --pipeline aws
AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... pipeline_check --pipeline aws

# SSO / assume-role
aws sso login --profile prod && pipeline_check --pipeline aws --profile prod

# LocalStack (for testing)
AWS_ENDPOINT_URL=http://localhost:4566 pipeline_check --pipeline aws

Required IAM permissions for a full scan, with a copy-paste IAM policy: see providers/aws.md#required-iam-permissions.

📤 Output formats

pipeline_check --output terminal                   # default (rich table)
pipeline_check --output json                       # machine-parseable
pipeline_check --output html -O report.html        # self-contained file
pipeline_check --output sarif -O scan.sarif        # GitHub/GitLab SAST
pipeline_check --output markdown                   # PR comments
pipeline_check --output junit -O junit.xml         # test-runner UIs
pipeline_check --output both                       # terminal→stderr, JSON→stdout

Format schemas: output.md.

🔍 Filter what gets scanned

# Only run specific checks
pipeline_check --checks GHA-001 --checks GHA-003

# Glob patterns
pipeline_check --checks 'GHA-*' --checks '*-008'

# Only files changed in this branch
pipeline_check --diff-base origin/main

# Suppress noisy findings (per-repo .pipelinecheckignore)
echo "GHA-019" > .pipelinecheckignore

🩹 Auto-fix findings

pipeline_check --fix              # print unified-diff patches to stdout
pipeline_check --fix --apply      # write patches in place
pipeline_check --fix | git apply  # review first, then apply

111 fixers cover pinning, secrets, timeouts, TLS bypass, script injection, Docker flags, Kubernetes securityContext, and more. See the individual check pages under providers/ for which checks have autofix support.

📋 Compliance annotations

Every finding carries control IDs from every enabled standard. Filter:

# Annotate with a single standard
pipeline_check --standard owasp_cicd_top_10

# Multiple standards
pipeline_check --standard nist_ssdf --standard soc2

# List all registered standards
pipeline_check --list-standards

# Print the control-to-check matrix for one standard
pipeline_check --standard-report slsa

Standards reference: standards/.

⛓️ Attack chains

The scanner correlates independent findings into MITRE ATT&CK-mapped kill chains (e.g. "unpinned action + overpermissive token + no approval gate = full-pipeline takeover"). Chains are on by default and print after the findings section.

pipeline_check --list-chains              # one line per registered chain
pipeline_check --explain-chain AC-001     # full reference card

pipeline_check --fail-on-chain AC-001     # gate on a named chain
pipeline_check --fail-on-any-chain        # gate on any matched chain
pipeline_check --no-chains                # disable correlation entirely

Chain gates bypass baseline and ignore-file filtering: a correlated attack path is intrinsically a new finding, even when its constituent legs were baselined separately.

Chain reference: attack_chains.md.

🧪 Cross-provider dataflow taint analysis

The TAINT-NNN family is a workflow-wide / pipeline-wide taint engine that follows attacker-controllable input across step, job, template, and reusable-workflow boundaries. Each provider gets its own engine port routed through the host's native cross-step propagation channel:

Rule Provider Channel
TAINT-001 GHA ${{ github.event.* }} flowing through $GITHUB_OUTPUT to a same-job step
TAINT-002 GHA The same flow crossing a jobs.<id>.outputs.* boundary into another job
TAINT-003 GHA Untrusted input forwarded into a reusable-workflow with: input
TAINT-004 GitLab CI $CI_COMMIT_* / $CI_MERGE_REQUEST_* flowing through artifacts.reports.dotenv to a downstream needs: job
TAINT-005 Buildkite $BUILDKITE_* flowing through the per-build buildkite-agent meta-data store to a downstream step
TAINT-006 Tekton $(params.<X>) flowing into $(results.<Y>.path) then read via $(tasks.<producer>.results.<Y>) in a consumer task's script
TAINT-007 Argo Workflows {{inputs.parameters.<X>}} flowing through outputs.parameters then read via {{tasks.<producer>.outputs.parameters.<X>}} in a consumer template
TAINT-008 GitLab CI extends: job-template inheritance carrying tainted variables: into a consumer job's scripts. Quote-state aware; transitive across the extends chain with cycle detection.

Each finding carries the full source-to-sink chain in its description. Single-rule scanners stop at the producer's direct-interpolation finding (GHA-003 / GL-002 / BK-003 / TKN-003 / ARGO-005) and miss the actual injection sink one step (or one job, or one template) later. The TAINT family is what catches the cross-boundary flow.
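As a concrete picture of the first row, here is a minimal, intentionally vulnerable GitHub Actions workflow matching the TAINT-001 channel: untrusted github.event.* data laundered through $GITHUB_OUTPUT into a same-job run step. The workflow content is illustrative, not taken from the rule's fixtures:

```yaml
on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - id: meta
        # Producer: attacker-controlled PR title written to $GITHUB_OUTPUT
        run: echo "title=${{ github.event.pull_request.title }}" >> "$GITHUB_OUTPUT"
      - # Consumer: the tainted output is interpolated straight into a shell command
        run: echo "Building ${{ steps.meta.outputs.title }}"
```

A single-rule scanner flags the producer's interpolation; the TAINT engine is what connects it to the consumer step where the injection actually executes.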

🔐 Dataflow secret detection

--detect-entropy adds a Shannon-entropy pass to the secret detector. It catches custom org tokens with no public prefix (an internal Snowflake token, a custom JWT issuer secret, an opaque session token) that the deterministic prefix-shape catalog can't match:

pipeline_check --detect-entropy

Off by default; turning it on can introduce new findings on previously clean scans. Layered FP suppression (key-context match, length floor, token shape, deterministic-detector overlap, placeholder markers) keeps signal high; hits are labeled entropy:<redacted> so operators can write targeted per-class ignore rules.
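The entropy pass boils down to Shannon entropy over the candidate token's character distribution. A minimal sketch; the function names, the 3.5 bits/char threshold, and the 16-character length floor are illustrative assumptions, not the tool's internals:

```python
import math
from collections import Counter

def shannon_entropy(token: str) -> float:
    """Bits per character over the token's observed character distribution."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_secret(token: str, threshold: float = 3.5, min_len: int = 16) -> bool:
    """Length floor plus entropy threshold; the real detector layers more FP checks."""
    return len(token) >= min_len and shannon_entropy(token) >= threshold
```

A 16-character token with all-distinct characters maxes out at log2(16) = 4.0 bits/char, while a repeated character scores 0.0, which is why natural-language strings rarely trip the threshold.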

🤖 AI-augmented --explain

--ai-explain CHECK_ID prints the deterministic --explain body and appends an AI-generated remediation paragraph, framed by a clear banner and grounded in the project's README plus an optional context file. Three providers are supported, all opt-in:

pip install pipeline-check[ai-anthropic]   # or [ai-openai]
ANTHROPIC_API_KEY=... pipeline_check --ai-explain GHA-016 \
    --ai-context-file docs/security-model.md

Default models: claude-sonnet-4-6 (Anthropic), gpt-4o-mini (OpenAI), llama3.2 (Ollama, stdlib HTTP, no Python dep). The deterministic surfaces (--explain, --list-checks, --list-standards, JSON / SARIF / scoring / gating, attack chains) are unaffected; no AI call fires unless --ai-explain is passed.

📚 Inventory

Emit the list of resources / workflows / templates the scanner discovered, with per-type metadata:

pipeline_check --inventory                       # alongside findings
pipeline_check --inventory-only                   # skip checks entirely
pipeline_check --inventory-type 'AWS::IAM::*'     # glob filter (repeatable)

📥 Multi-scanner SARIF ingest

--ingest <file>.sarif (repeatable) absorbs findings from any SARIF 2.1.0-conformant scanner (Trivy, Checkov, Snyk, KICS, CodeQL, …) into the same scan output as pipeline-check's native findings. External rules become INGEST-<tool>-<rule-id> Finding rows, and the chain engine re-evaluates over the union, so cross-tool chains (e.g. XPC-009: ingested CVE finding + DF-001 mutable runtime image) fire on compositions no individual scanner would surface alone.

# Run pipeline-check natively + ingest a Trivy report
trivy fs --format sarif --output trivy.sarif ./
pipeline_check --pipeline auto --ingest trivy.sarif --output sarif \
    --output-file combined.sarif

# Multiple feeds compose cleanly
pipeline_check --ingest trivy.sarif --ingest checkov.sarif \
    --ingest snyk.sarif

# Ingest-only (pipe one tool's output through pipeline-check's
# correlation engine without running any native rules):
pipeline_check --pipeline auto --checks 'INGEST-*' --ingest trivy.sarif

Severity reads from properties.security-severity (the GitHub-Code-Scanning CVSS-like 0..10 score) when present, falling back to the SARIF level enum (error -> HIGH, warning -> MEDIUM, note -> LOW, otherwise INFO). Failures to parse a feed surface as warnings on stderr; the rest of the scan keeps going. Caps: 25 MiB per file, 5,000 results per file (both configurable via the public Python API in pipeline_check.core.sarif_ingest).
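The severity mapping described above can be sketched as follows. The banding for the 0..10 score uses the common CVSS buckets; the exact thresholds, function name, and input shapes are assumptions, not pipeline-check's API:

```python
def map_severity(result: dict, rule_properties: dict) -> str:
    """Severity for one SARIF result: security-severity first, then the level enum."""
    score = rule_properties.get("security-severity")
    if score is not None:
        s = float(score)  # SARIF stores this as a string property
        # Assumed CVSS-style banding; the doc only says "CVSS-like 0..10".
        if s >= 9.0:
            return "CRITICAL"
        if s >= 7.0:
            return "HIGH"
        if s >= 4.0:
            return "MEDIUM"
        return "LOW"
    # Fallback: the SARIF level enum, defaulting to INFO
    levels = {"error": "HIGH", "warning": "MEDIUM", "note": "LOW"}
    return levels.get(result.get("level"), "INFO")
```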

🎓 Vulnerable-by-design benchmark

bench/ ships intentionally-vulnerable fixture sets (one folder per attack pattern, anchored to a real-world incident) plus a runner that asserts pipeline-check fires on every expected check ID for each case. It serves both as a CI regression gate and as verifiable coverage proof for adopters.

# Run all cases, recall table to stdout
python bench/run.py

# One case
python bench/run.py --case unpinned-supply-chain

# Machine-readable JSON
python bench/run.py --json

# Pre-populate expected.txt for a new case from current scan output
python bench/run.py --case <slug> --suggest

Exit code is zero only when every case hits 100% recall. tests/test_bench.py runs the harness as part of the CI suite. The eventual cross-scanner comparison matrix (vs Zizmor / Poutine / Checkov / KICS / Trivy) is tracked in bench/COMPARISON.md, along with the trade-offs that justify deferring its build.

🌳 Environment variables

Every CLI flag has an env-var equivalent: PIPELINE_CHECK_<FLAG> with dashes converted to underscores. Gate flags nest under GATE:

PIPELINE_CHECK_PIPELINE=github \
PIPELINE_CHECK_GATE_FAIL_ON=HIGH \
pipeline_check

Precedence: CLI > env > config file > defaults.
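The flag-to-env-var translation is mechanical enough to sketch (a hypothetical helper, not part of the tool):

```python
def env_var_for(flag: str, gate: bool = False) -> str:
    """--some-flag -> PIPELINE_CHECK_SOME_FLAG; gate flags nest under GATE."""
    name = flag.lstrip("-").replace("-", "_").upper()
    prefix = "PIPELINE_CHECK_GATE_" if gate else "PIPELINE_CHECK_"
    return prefix + name
```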

🚪 Exit codes

Code Meaning
0 Gate passed
1 Gate failed
2 Scanner error (e.g. AWS API failure, malformed config file)
3 Usage / config error (unknown flag, missing required path, bad YAML)
4 --ai-explain request failure (missing SDK, missing key, unknown provider, request error)

Verbose and quiet modes

pipeline_check -v       # debug logs to stderr (per-check timing, API calls)
pipeline_check -q       # suppress all output, rely on the exit code

Extended manual pages

Topic-specific help without leaving the terminal:

pipeline_check --man              # list topics
pipeline_check --man gate
pipeline_check --man autofix
pipeline_check --man secrets
pipeline_check --man standards

See also