How Jiffy scans AI artifacts: a technical overview
The detection pipeline end to end -- signatures, heuristics, sandboxed execution, cross-ecosystem dedupe, and scoring. What runs where, and why.
This post is the short technical tour. Deeper docs are in the repo.
The pipeline
ingest --> inventory --> static scan --> sandbox scan --> dedupe --> scoring --> catalog
1. Ingest
Sources we pull from:
- GitHub orgs — OAuth app against an org; we walk repos and detect artifacts by path pattern (skill directories, rule files, mcp.json, agent config shapes).
- Anthropic Skills marketplace — public listings, pulled on a schedule.
- Hugging Face — spaces and repos tagged as Claude skills or MCP servers.
- Registry APIs — npm, PyPI, and cargo for packages that declare MCP server or skill metadata.
- Direct uploads — a customer pushes an artifact file directly via our API.
Every ingest is tagged with provenance: the source, the timestamp, the commit SHA or release tag when applicable, and the claimed publisher.
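The provenance tag described above can be sketched as a small record attached at ingest time. This is a minimal illustration, not the production schema; the field names and the `tagProvenance` helper are assumptions.

```typescript
// Hypothetical shape of the provenance record attached to every ingest.
interface Provenance {
  source: 'github' | 'anthropic-marketplace' | 'huggingface' | 'registry' | 'upload';
  ingestedAt: string;       // ISO-8601 timestamp
  ref?: string;             // commit SHA or release tag, when applicable
  claimedPublisher: string; // publisher identity as stated at the source
}

// Wrap a raw payload with its provenance at ingest time (illustrative helper).
function tagProvenance<T>(
  payload: T,
  prov: Omit<Provenance, 'ingestedAt'>,
): { payload: T; provenance: Provenance } {
  return { payload, provenance: { ...prov, ingestedAt: new Date().toISOString() } };
}

const tagged = tagProvenance(
  { name: 'example-skill' },
  { source: 'github', ref: 'a1b2c3d', claimedPublisher: 'acme-dev' },
);
```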
2. Inventory
Before scanning we normalize the artifact into a common structure:
interface Artifact {
  kind: 'skill' | 'mcp-server' | 'rule-file' | 'agent-repo' | 'config';
  id: string;                  // cross-ecosystem stable ID
  sources: Source[];           // registries / repos where it has been observed
  files: ArtifactFile[];       // content-addressed file list
  manifest: ArtifactManifest;
  declaredCapabilities: DeclaredCapabilities;
}
The declaredCapabilities field is what the artifact says it does. The later pipeline stages check observedCapabilities against it. Divergence is a flag.
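The declared-versus-observed check can be sketched as a simple diff. The `Capabilities` shape and the `capabilityDivergence` function here are illustrative assumptions; the real schema covers more than network hosts and filesystem writes.

```typescript
// Hypothetical, simplified capability shape.
interface Capabilities {
  networkHosts: string[];   // hosts the artifact may contact
  filesystemWrite: boolean; // whether it writes outside its own directory
}

// Return the capabilities observed at runtime but never declared.
function capabilityDivergence(declared: Capabilities, observed: Capabilities): string[] {
  const flags: string[] = [];
  for (const host of observed.networkHosts) {
    if (!declared.networkHosts.includes(host)) {
      flags.push(`undeclared network host: ${host}`);
    }
  }
  if (observed.filesystemWrite && !declared.filesystemWrite) {
    flags.push('undeclared filesystem write');
  }
  return flags;
}

const flags = capabilityDivergence(
  { networkHosts: ['api.example.com'], filesystemWrite: false },
  { networkHosts: ['api.example.com', 'evil.example.net'], filesystemWrite: true },
);
// Each divergence becomes a flag for the scoring stage.
```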
3. Static scan
Three parallel pass types:
Regex signatures — the high-precision layer. Credential patterns (AWS, Stripe, GitHub, Slack, OpenAI, Anthropic tokens, generic high-entropy strings in env contexts), known-bad URLs, known-bad publisher handles.
AST patterns — Python, JavaScript, TypeScript, shell. We build the AST and match against patterns like "env var read followed by network POST to non-allowlisted host". This catches credential exfiltration that regex alone misses.
Prompt-content heuristics — for the Markdown parts of the artifact (SKILL.md, rule files, prompt bundles). A separate model (not the agent under protection) classifies prompt content against the Jiffy taxonomy: override, smuggling, sub-agent spawn, prompt-injection boilerplate, instructional network-destination reference.
Each static finding includes file pointer, line range, signature ID, confidence weight, and a short human-readable explanation.
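A static finding from the regex layer might look like the sketch below. The `StaticFinding` shape mirrors the fields listed above; the signature ID, confidence value, and `scanLine` helper are illustrative assumptions, and the AWS access key pattern is just one well-known example of a credential signature.

```typescript
// One illustrative high-precision signature: AWS access key IDs.
const AWS_KEY = /\bAKIA[0-9A-Z]{16}\b/;

interface StaticFinding {
  file: string;
  lines: [number, number]; // inclusive line range
  signatureId: string;
  confidence: number;      // weight derived from historical precision, 0..1
  explanation: string;     // short human-readable explanation
}

// Match a single line against the signature and emit a finding if it hits.
function scanLine(file: string, lineNo: number, text: string): StaticFinding | null {
  if (!AWS_KEY.test(text)) return null;
  return {
    file,
    lines: [lineNo, lineNo],
    signatureId: 'SIG-AWS-ACCESS-KEY', // hypothetical ID
    confidence: 0.98,                  // illustrative weight
    explanation: 'String matching the AWS access key ID format',
  };
}

const hit = scanLine('config.sh', 12, 'export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE');
```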
4. Sandbox scan
Artifacts that include executable code get a sandboxed execution pass. We use E2B microVMs with:
- Per-invocation teardown.
- A network egress policy that defaults to deny, with an allowlist derived from declaredCapabilities.
- Filesystem and process telemetry via a lightweight collector inside the sandbox.
We exercise the artifact's initialization path plus any exported tools with synthetic inputs. Every network destination and every filesystem write is logged. Divergence from the declared capabilities becomes a finding.
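The default-deny egress decision can be sketched as a host check against the allowlist derived from declaredCapabilities. This is a minimal illustration; the real policy enforcement happens at the sandbox's network layer, not in application code, and the `egressAllowed` helper is an assumption.

```typescript
// Default-deny egress: a destination is allowed only if it matches an
// allowlisted host exactly or is a subdomain of one.
function egressAllowed(host: string, allowlist: string[]): boolean {
  return allowlist.some((a) => host === a || host.endsWith('.' + a));
}

// Destinations outside the declared allowlist become findings.
const allowlist = ['api.example.com'];
const blocked = ['exfil.evil.net'].filter((h) => !egressAllowed(h, allowlist));
```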
5. Cross-ecosystem dedupe
The same malicious artifact shows up in multiple places: a rule file on GitHub, an MCP server on npm under a different publisher name, a skill on Hugging Face. We dedupe against:
- Content hash — identical file bytes.
- Near-match hash — minor diffs (whitespace, rename, boilerplate change). We use a locality-sensitive hash that tolerates these.
- Behavioral signature — same observed capabilities, same network destinations, same AST shape even under renaming.
A deduped entry in the catalog records all registries where it appears. The ID is stable. If it shows up in a sixth place next month, it joins the existing record.
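The exact-content layer of this dedupe can be sketched as a map keyed by content hash, where each new sighting joins the existing record. This sketch covers only the byte-identical case; the ID derivation and record shape are assumptions, and the near-match and behavioral layers are omitted.

```typescript
import { createHash } from 'node:crypto';

// One catalog record per unique content hash; `sources` grows as the same
// bytes are sighted in new registries.
interface CatalogRecord {
  id: string;        // stable cross-ecosystem ID (here derived from the hash)
  sources: string[]; // registries / repos where the artifact was observed
}

const catalog = new Map<string, CatalogRecord>();

function recordSighting(content: string, source: string): CatalogRecord {
  const hash = createHash('sha256').update(content).digest('hex');
  let rec = catalog.get(hash);
  if (!rec) {
    rec = { id: hash.slice(0, 12), sources: [] };
    catalog.set(hash, rec);
  }
  if (!rec.sources.includes(source)) rec.sources.push(source);
  return rec;
}

// Identical bytes seen on npm and GitHub resolve to one record.
const first = recordSighting('payload-bytes', 'npm');
const second = recordSighting('payload-bytes', 'github');
```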
6. Confidence scoring
A weighted score combining:
- Static signature strength (weighted by historical precision).
- Runtime findings (heavier weight).
- Capability divergence (declared vs observed).
- Publisher reputation (a signed publisher with history outranks a first-time anonymous one).
- Catalog cross-references.
Outputs a tier:
- Trusted — signed publisher, no findings, observed behavior matches declaration.
- Caution — minor findings, unsigned publisher, or capability declaration issues.
- Risky — static or runtime findings that match known-bad patterns at moderate precision.
- Malicious — matches a confirmed entry in the catalog or produces high-confidence runtime evidence of exfil / smuggling / spawn.
Every score is a structured document you can audit. Policy layers can override — a customer's policy engine can downgrade Caution to Trusted for specific first-party publishers.
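The scoring described above can be sketched as a weighted sum mapped to a tier. The weights, thresholds, and input shape below are illustrative assumptions, not Jiffy's actual values; the only constraints taken from the text are that runtime findings weigh heavier than static ones and that a confirmed catalog match is decisive.

```typescript
// All inputs normalized to 0..1; values here are illustrative only.
interface ScoreInputs {
  staticSignals: number;        // signature matches, weighted by historical precision
  runtimeSignals: number;       // sandbox findings
  capabilityDivergence: number; // declared vs observed mismatch
  publisherReputation: number;  // 0 = anonymous first-timer, 1 = signed with history
  catalogMatch: boolean;        // matches a confirmed catalog entry
}

type Tier = 'Trusted' | 'Caution' | 'Risky' | 'Malicious';

function scoreArtifact(s: ScoreInputs): { score: number; tier: Tier } {
  // A confirmed catalog match short-circuits to Malicious.
  if (s.catalogMatch) return { score: 1, tier: 'Malicious' };
  // Runtime evidence carries more weight than static matches.
  const score =
    0.25 * s.staticSignals +
    0.45 * s.runtimeSignals +
    0.2 * s.capabilityDivergence +
    0.1 * (1 - s.publisherReputation);
  const tier: Tier =
    score >= 0.7 ? 'Malicious' : score >= 0.4 ? 'Risky' : score > 0.1 ? 'Caution' : 'Trusted';
  return { score, tier };
}
```

A policy layer would run after this, overriding the tier for specific publishers as described above.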
The public catalog
Every Malicious entry (and, with publisher opt-in, Risky entries) flows into the public catalog at intel.jiffylabs.app. Each catalog entry includes:
- Stable ID.
- Observed registries and publishers.
- Content hashes and near-match siblings.
- Signature matches and runtime findings.
- Confidence score and tier.
- First-seen and last-seen timestamps.
- A JSON export via the public API.
The catalog is licensed CC BY 4.0. Downstream security tools are welcome to ingest it directly.
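For downstream tools ingesting the JSON export, an entry might deserialize into a shape like the one below. The fields mirror the list above, but the concrete property names and example values are assumptions; the public API's actual schema may differ.

```typescript
// Illustrative shape for a catalog entry export (field names assumed).
interface CatalogEntry {
  id: string;                                           // stable ID
  sightings: { registry: string; publisher: string }[]; // observed registries and publishers
  contentHashes: string[];
  nearMatchSiblings: string[];                          // IDs of near-match entries
  findings: { signatureId: string; kind: 'static' | 'runtime' }[];
  score: number;
  tier: 'Risky' | 'Malicious';
  firstSeen: string;                                    // ISO-8601
  lastSeen: string;
}

// A hypothetical entry, serialized the way the JSON export would serve it.
const entry: CatalogEntry = {
  id: 'jfy-3f9a12',
  sightings: [{ registry: 'npm', publisher: 'anon-dev-42' }],
  contentHashes: ['sha256:9c1d0e'],
  nearMatchSiblings: [],
  findings: [{ signatureId: 'SIG-ENV-EXFIL', kind: 'runtime' }],
  score: 0.91,
  tier: 'Malicious',
  firstSeen: '2025-01-04T09:12:00Z',
  lastSeen: '2025-02-11T18:40:00Z',
};

const exported = JSON.stringify(entry);
```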
What you get when you connect Jiffy
- Inventory across skills, MCP servers, rule files, and agent repos for your connected orgs.
- Scored findings for each.
- Policy-configurable block / allow / quarantine actions.
- Continuous re-scan on artifact updates.
- Webhook and API access for integration with your SIEM, SOAR, MDM, or Zero Trust layer.