Modelfile provider
Parses model declarations on disk using text-only static analysis: no model
pull and no Ollama daemon are required. Two formats are supported: Ollama
Modelfile recipes (the declarative file that pins a model into the local
registry, so this provider is the "Dockerfile of models") and vendored
Hugging Face config.json model configs. The MODEL-* rules reason over the
FROM base model and ADAPTER references a Modelfile declares, and over the
custom code a model config wires in. It is the static, declaration-side
complement to the CI-script AI rules (GHA-120/121/122, GL-045..049) that
catch model pulls in build scripts.
Producer workflow
# Defaults to scanning the working tree for a Modelfile / config.json.
pipeline_check --pipeline modelfile
# …or pass it explicitly.
pipeline_check --pipeline modelfile --modelfile-path models/chat.Modelfile
# Recursively scan a directory. The loader matches Modelfile,
# *.Modelfile, Modelfile.<suffix>, and HF model config.json by default.
pipeline_check --pipeline modelfile --modelfile-path models/
All other flags (--output, --severity-threshold, --checks,
--standard, …) behave the same as with the other providers.
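The default file matching described in the comments above can be sketched in a few lines of Python. This is an illustration only, not the provider's loader: the function names `looks_like_model_declaration` and `discover` are hypothetical, and case-sensitive matching is an assumption.

```python
from pathlib import Path


def looks_like_model_declaration(path: Path) -> bool:
    """Hypothetical matcher for the default filename patterns listed above."""
    name = path.name
    return (
        name == "Modelfile"              # Modelfile
        or name.endswith(".Modelfile")   # *.Modelfile
        or name.startswith("Modelfile.")  # Modelfile.<suffix>
        # A config.json is only a candidate here; whether it really is an
        # HF model config has to be decided from its contents.
        or name == "config.json"
    )


def discover(root: Path) -> list[Path]:
    """Return the explicit file, or every matching file under a directory."""
    if root.is_file():
        return [root]
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and looks_like_model_declaration(p)
    )
```

Calling `discover(Path("models/"))` would then yield every candidate under that directory, while an explicit file path is returned unchanged.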
Modelfile-specific checks
The MODEL-* pack covers the model supply chain a Modelfile declares:
- MODEL-001: the FROM base model must pin an immutable tag or @sha256: digest rather than a bare name or :latest. The model-registry analogue of GHA-001 / DF-001.
- MODEL-002: a FROM hf.co/... / huggingface.co/... base model is pulled straight from a third-party hub, bypassing the curated Ollama library (the source-trust axis).
- MODEL-003: a FROM ./model.gguf local weights blob has no registry provenance, and a .bin / .pt import is pickle-backed.
- MODEL-004: an ADAPTER LoRA pulled from a remote source can re-steer the model's behavior and deserves the same pin-and-verify treatment as the base model.
- MODEL-005: a vendored HF config.json whose auto_map wires the transformers auto-classes to the model repo's own Python, which runs under trust_remote_code=True. The model-side complement of GHA-120 / GL-045 (which flag the trust_remote_code load in CI scripts).
What it covers
5 checks · 0 have an autofix patch (--fix).
| Check | Title | Severity | Fix |
|---|---|---|---|
| MODEL-001 | Base model pulled without a pinned reference | MEDIUM | |
| MODEL-002 | Base model pulled from a third-party hub | MEDIUM | |
| MODEL-003 | Base model loaded from a local unverified weights blob | LOW | |
| MODEL-004 | LoRA adapter applied from a remote source | MEDIUM | |
| MODEL-005 | Vendored model config declares custom loader code (auto_map) | MEDIUM | |
MODEL-001: Base model pulled without a pinned reference
Fires on a FROM whose reference is a registry / hub model (llama3, library/llama3, hf.co/org/model) carrying no tag or an explicit :latest. Does NOT fire on a specific tag, an @sha256: digest, or a local weights file (covered by MODEL-003). Whether the model comes from a third-party hub is reported separately by MODEL-002.
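A minimal sketch of that decision, assuming the reference has already been extracted from the FROM line. The helper names are hypothetical and the real rule may parse references differently, for example around registry hosts with ports.

```python
LOCAL_PREFIXES = ("./", "../", "/", "~/")
WEIGHT_SUFFIXES = (".gguf", ".safetensors", ".bin", ".pt", ".pth")


def is_local_reference(ref: str) -> bool:
    """Local paths and bare weights filenames are left to MODEL-003."""
    return ref.startswith(LOCAL_PREFIXES) or ref.lower().endswith(WEIGHT_SUFFIXES)


def is_unpinned(ref: str) -> bool:
    """True when a registry / hub reference has no tag or an explicit :latest."""
    if is_local_reference(ref):
        return False
    if "@sha256:" in ref:
        return False  # digest-pinned
    # Only the last path segment can carry a tag; this keeps a colon in a
    # host:port prefix from being mistaken for one.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # bare name such as llama3 or hf.co/org/model
    tag = last_segment.rsplit(":", 1)[1]
    return tag in ("", "latest")
```

Under these assumptions `is_unpinned("llama3:latest")` is true and `is_unpinned("llama3:8b-instruct-q4_0")` is false, matching the examples in the text.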
Recommended action
Pin the base model to an immutable reference. Prefer an @sha256: digest (FROM library/llama3@sha256:...); failing that, pin a specific, stable tag (FROM llama3:8b-instruct-q4_0) rather than a bare name or :latest, both of which the publisher can move. A pinned reference is what makes a swapped-weights or swapped-template attack show up as a diff in your Modelfile instead of landing silently on the next pull.
MODEL-002: Base model pulled from a third-party hub
Fires on a FROM whose reference begins with hf.co/ or huggingface.co/. This is the source-trust axis; whether that same reference is also unpinned is reported separately by MODEL-001.
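As a rough illustration, the prefix test could look like the sketch below. The helper names are hypothetical, and treating instruction keywords case-insensitively is an assumption about the parser rather than something this page states.

```python
THIRD_PARTY_HUB_PREFIXES = ("hf.co/", "huggingface.co/")


def from_references(modelfile_text: str) -> list[str]:
    """Collect the reference of every FROM instruction in a Modelfile."""
    refs = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 1)
        # Assumption: keywords are matched case-insensitively.
        if len(parts) == 2 and parts[0].upper() == "FROM":
            refs.append(parts[1].strip())
    return refs


def is_third_party_hub(ref: str) -> bool:
    """True when the reference is pulled from hf.co or huggingface.co."""
    return ref.lower().startswith(THIRD_PARTY_HUB_PREFIXES)
```

Because the test looks only at the source prefix, a digest-pinned hub model would still match here; pinning is the separate MODEL-001 axis.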
Recommended action
Treat a hf.co / huggingface.co base model as an untrusted dependency: vet the uploader and prefer a first-party or curated Ollama-library model. If the hub model is required, pin it to an @sha256: digest (MODEL-001), prefer GGUF / safetensors over pickle-backed formats, and review the baked-in TEMPLATE / SYSTEM the import carries.
MODEL-003: Base model loaded from a local unverified weights blob
Fires on a FROM whose reference is a local path (./, /, ~/, ../) or a bare weights filename (.gguf / .safetensors / .bin / .pt / .pth). Pickle-backed extensions are called out in the finding because they deserialize arbitrary code at load.
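The classification can be sketched as follows. The function name is hypothetical, and counting .pth as pickle-backed alongside .bin / .pt is an assumption of this sketch.

```python
LOCAL_PREFIXES = ("./", "/", "~/", "../")
WEIGHT_SUFFIXES = (".gguf", ".safetensors", ".bin", ".pt", ".pth")
# Assumption: .pth is treated like .bin / .pt for the pickle call-out.
PICKLE_SUFFIXES = (".bin", ".pt", ".pth")


def classify_local_weights(ref: str) -> tuple[bool, bool]:
    """Return (is_local, is_pickle_backed) for a FROM reference."""
    lowered = ref.lower()
    is_local = lowered.startswith(LOCAL_PREFIXES) or lowered.endswith(WEIGHT_SUFFIXES)
    is_pickle = is_local and lowered.endswith(PICKLE_SUFFIXES)
    return is_local, is_pickle
```

A finding would be raised whenever the first value is true, with the second value deciding whether the pickle warning is added to the message.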
Recommended action
Source the base model from a pinned registry / hub reference (MODEL-001) with a recorded digest rather than a loose local weights file, or, if a local file is required, record and verify its checksum out of band and prefer GGUF / safetensors over pickle-backed .bin / .pt formats. A committed binary blob has no provenance a reviewer can check.
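One way to do the out-of-band check is to keep the expected SHA-256 next to the Modelfile and compare it before the weights are used. The sketch below uses only the standard library; the function names and the idea of storing the digest as a hex string are assumptions, not part of this provider.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weights blobs are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with a recorded hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Running `verify_weights` in CI before the build that consumes the blob turns a silently swapped file into a failed job.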
MODEL-004: LoRA adapter applied from a remote source
Fires on an ADAPTER whose reference is not a local file (a hf.co / huggingface.co pull or a bare registry-style name). A local adapter file does not fire; pin / verify it out of band.
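A sketch of the remote-versus-local split for ADAPTER lines, under stated assumptions: the function name is hypothetical, keywords are matched case-insensitively, and the list of file suffixes that mark a bare filename as local is a guess rather than the rule's actual list.

```python
LOCAL_PREFIXES = ("./", "../", "/", "~/")
# Assumption: these suffixes identify a bare local adapter filename.
ADAPTER_FILE_SUFFIXES = (".gguf", ".safetensors", ".bin")


def remote_adapters(modelfile_text: str) -> list[str]:
    """Return every ADAPTER reference that does not look like a local file."""
    remote = []
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() != "ADAPTER":
            continue
        ref = parts[1].strip()
        is_local = (
            ref.startswith(LOCAL_PREFIXES)
            or ref.lower().endswith(ADAPTER_FILE_SUFFIXES)
        )
        if not is_local:
            remote.append(ref)
    return remote
```

Both a hub pull and a bare registry-style name end up in the returned list, while a path such as ./lora.gguf is skipped.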
Recommended action
Vet and pin the adapter the same way as the base model: prefer a local, checksum-verified adapter file, or pin a remote one to an @sha256: digest and review who controls it. An adapter re-steers the model's behavior, so an untrusted or mutable one is a behavior-injection vector.
MODEL-005: Vendored model config declares custom loader code (auto_map)
Fires on a vendored Hugging Face config.json whose auto_map block is non-empty (the file is recognized as a model config by its auto_map / architectures / model_type keys). auto_map points the transformers auto-classes at the model repo's own Python, which runs under trust_remote_code=True. The model-side complement of GHA-120 / GL-045 (which flag the trust_remote_code load in CI scripts).
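The recognition and the auto_map test can be illustrated with the standard json module. This is a sketch, not the rule's code: the function name is hypothetical and unparsable files are simply treated as non-matching.

```python
import json

# Keys the text above names as identifying a model config.
MODEL_CONFIG_KEYS = ("auto_map", "architectures", "model_type")


def declares_custom_loader(config_text: str) -> bool:
    """True for an HF model config whose auto_map block is non-empty."""
    try:
        data = json.loads(config_text)
    except json.JSONDecodeError:
        return False  # not valid JSON, so not a model config we can inspect
    if not isinstance(data, dict):
        return False
    if not any(key in data for key in MODEL_CONFIG_KEYS):
        return False  # some unrelated config.json
    return bool(data.get("auto_map"))
```

A config that only sets model_type and architectures is recognized as a model config but does not fire, because it relies on the library's built-in classes.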
Recommended action
Review the custom Python the auto_map references (modeling_*.py / configuration_*.py in the model directory) the same way you would any dependency, and pin the model to an exact revision so the code can't change under you. Load the model with trust_remote_code=False (the library default) wherever the model works without its custom classes; if the custom code is required, load it in a job scoped to no production secrets. Prefer models that ship standard architectures and safetensors weights over ones that require remote code.
Adding a new Modelfile check
- Create a new module at pipeline_check/core/checks/modelfile/rules/modelNNN_<name>.py exporting a top-level RULE = Rule(...) and a check(ctx: ModelfileContext) -> list[Finding] function. The orchestrator auto-discovers RULE and calls check with the ModelfileContext.
- Add a mapping for the new ID in pipeline_check/core/standards/data/owasp_cicd_top_10.py (and any other standard that applies).
- Drop unsafe/safe snippets at tests/fixtures/per_check/modelfile/MODEL-NNN.{unsafe,safe}.yml and add a CheckCase entry in tests/test_per_check_real_examples.py::CASES.
- Regenerate this doc: