A model on Hugging Face claims to be trained from scratch. The model card says it is a novel architecture. The metadata looks clean.

How do you know it is not a modified copy of a restricted-weight model with a new config file?

Before April 2026, the answer was basically “trust the model card.” Cisco’s Model Provenance Kit changes that. It is an open-source Python toolkit that examines both metadata and actual learned parameters to determine whether a transformer model derives from a known base model family. Think of it as a paternity test for AI weights.

Repository: github.com/cisco-ai-defense/model-provenance-kit License: Apache 2.0 Language: Python 3.12+ Stars: 92 (fresh, April 2026) Part of: Cisco AI Defense — open-source AI security tools

Cisco Model Provenance Kit GitHub repository page showing the README, file structure, and CLI documentation.


Why this exists

Over 2 million models on Hugging Face, and almost none have verified provenance. A model card claiming a model was trained from scratch may describe a modified copy of another model. Metadata can be falsified. Config files reveal architecture but not origin — models from Meta, Alibaba, DeepSeek, and Mistral all share building blocks like grouped-query attention and rotary embeddings.

Without provenance information, you cannot know whether a model you are deploying:

  • Was derived from a model under export restrictions
  • Carries a restrictive license that conflicts with your use case
  • Inherits vulnerabilities from a poisoned parent model
  • Was fine-tuned from a base you do not have rights to

The Model Provenance Kit addresses this by extracting multi-signal fingerprints from the weights themselves — signals that survive fine-tuning, quantization, LoRA merging, and even vocabulary changes.


Installation

git clone https://github.com/cisco-ai-defense/model-provenance-kit.git
cd model-provenance-kit
uv sync

This installs the provenancekit CLI. Python 3.12+ required.

One-time setup: download weight fingerprints

The deep-signal fingerprints are pre-computed weight features stored as parquet files on Hugging Face. Without them, scans use metadata and tokenizer signals only.

provenancekit download-deepsignals-fingerprint --update

Walkthrough: compare two models

The compare command runs a pairwise analysis between two Hugging Face models:

provenancekit compare z-ai/GLM-5.2 moonshotai/kimi-k2.7-code

The output includes:

Stage 1 — Metadata screening:

MFI Tier: 3 (weak metadata match — different architectures)
Architecture: GLM-5.2 uses DeepSeek-V4-style MoE, K2.7 uses Kimi MoE

Stage 2 — Weight signal extraction:

  EAS (Embedding Anchor Similarity):    0.12   — unrelated
  NLF (Norm Layer Fingerprint):         0.08   — unrelated
  LEP (Layer Energy Profile):           0.21   — unrelated
  END (Embedding Norm Distribution):    0.15   — unrelated
  WVC (Weight-Value Cosine):            0.03   — unrelated

Pipeline verdict:

Pipeline Score: 0.12
Verdict: NOT MATCHED

As expected — two independently trained models from different labs produce near-zero weight correlation. The value here is confirmation: if someone claimed GLM-5.2 was a fine-tune of Kimi K2.7, the signals would prove otherwise.


Walkthrough: scan against the reference database

The scan command matches a single model against the kit’s bundled database of ~150 known base models across 45 families.

provenancekit scan meta-llama/Llama-3.3-70B --json

The database covers models from Meta, Google, DeepSeek, Zhipu AI, Moonshot AI, Mistral, Microsoft, NVIDIA, Cohere, OpenAI, and more — ranging from 135M to 70B+ parameters.

scan returns ranked matches with scores and verdict labels, capped by --top-k (default 3) and --threshold (default 0.50).


How the signals work

The pipeline combines three categories of evidence:

CategorySignalsWhat it detects
MetadataMFI — 3-tier gate from config.jsonArchitecture family (milliseconds)
TokenizerTFV (11-component vector), VOA (Jaccard overlap)Shared tokenizer lineage
WeightsEAS, NLF, LEP, END, WVCActual learned parameter fingerprint

The weight signals are what make this work. They are designed to survive transformations that would disguise a model’s origin:

  • EAS (Embedding Anchor Similarity): Geometric relationships between token embeddings — unique to a training run, preserved through fine-tuning.
  • NLF (Norm Layer Fingerprint): Stability patterns in small normalization layers — resistant to quantization and pruning.
  • LEP (Layer Energy Profile): Energy distribution curves across network depth — distinctive even after LoRA merging.
  • END (Embedding Norm Distribution): Magnitude distributions encoding word frequency patterns from the original corpus.
  • WVC (Weight-Value Cosine): Direct parameter correlation — near zero for independently trained models, high for derivatives.

Signal weights are calibrated via Cohen’s d on a 111-pair benchmark. When a signal returns NaN (e.g., different layer structures), it is excluded and remaining weights are rescaled proportionally.


Score interpretation

Pipeline ScoreVerdictMeaning
S = 1.0 or MFI Tier ≤ 2Confirmed MatchSame base model, verified by architecture + weights
S > 0.75High-Confidence MatchVery likely derived from the matched base
0.65 < S ≤ 0.75Weak MatchPossible relationship, needs manual review
S ≤ 0.65Not MatchedNo evidence of shared provenance

The toolkit’s benchmark on a 111-pair test set:

MetricScore
Accuracy96.4%
Precision98.1%
Recall94.6%
Standard derivatives (fine-tunes, quantizations, LoRA)100% recall

Where it falls short

The benchmark identified 4 misclassifications out of 111 pairs, all at the edges:

  • 12-layer → 4-layer distillation (too much structural change)
  • Vocabulary rebuild for domain pretraining (tokenizer signals lost)
  • Extreme architectural surgery (e.g., swapping attention mechanisms)

The authors document these as fundamental limits. The tool is not a guarantee — it is evidentiary. A “Not Matched” verdict means no signal of shared provenance was found, not that one definitively does not exist.


When to add this to your pipeline

  • Before deployment: Run provenancekit scan on any model you download from Hugging Face to verify its claimed origin.
  • After an incident: Compare a compromised model against its supposed base to confirm whether it was tampered with.
  • For compliance: Generate provenance reports as evidence for regulatory requirements (EU AI Act, NIST AI RMF).
  • In CI: Integrate scanning into your model acquisition pipeline to flag unverified weights before they reach production.

The entire pipeline runs on CPU. Architectural screening takes milliseconds. Feature extraction takes seconds to minutes depending on model size, and results are cached for reuse.



Sources


About the author

Charles Jasthyn De La Cueva is a full-stack developer and the founder of Open TechStack. He writes about AI engineering, developer tools, and practical model evaluation — grounded in real workflows, not press releases.