
Privacy Model

The core guarantee

With DISTILLER_PROVIDER=ollama (default), your raw text never crosses a network boundary.

Tool call (Read, Bash, Edit, etc.)
          │
          ▼
   ┌──────────────┐
   │ PostToolUse  │ ← Claude Code hook, fire & forget
   │    hook      │
   └──────┬───────┘
          │ curl POST to localhost
          ▼
   ┌──────────────┐
   │ private_store│ ← JSONL queue, never synced, local only
   └──────┬───────┘
          │ background worker
          ▼
   ┌──────────────┐
   │  Distiller   │ ← ollama: localhost / gemini: Google API
   └──────┬───────┘
          │ distilled fact (no names, no emotion, no PII)
          ▼
   ┌──────────────┐
   │   Scanner    │ ← redacts any leaked secrets
   └──────┬───────┘
          │
          ▼
   ┌──────────────┐
   │  Team DB     │ ← team-safe knowledge
   └──────────────┘

What makes this different

Every "memory MCP" stores your raw text in a database. Distill doesn't. The LLM is a mandatory privacy gateway that transforms tool I/O into impersonal team knowledge. With DISTILLER_PROVIDER=ollama, your raw data never leaves your machine. Observations are captured automatically via hooks — raw text exists only in a local JSONL queue until the background worker distills it.
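The capture step above can be sketched roughly as follows. This is an illustrative assumption, not Distill's actual schema: the queue path, field names, and function name are hypothetical, but the shape matches the described flow (fire-and-forget hook appends one JSON line per observation to a local, never-synced queue).

```python
# Hypothetical sketch of the capture step: the PostToolUse hook POSTs raw
# tool I/O to localhost, and the receiver appends it to a local JSONL queue.
# Path and field names are illustrative, not Distill's real schema.
import json
from pathlib import Path

QUEUE = Path.home() / ".distill" / "private" / "queue.jsonl"

def enqueue_observation(tool_name: str, tool_input: str, tool_output: str) -> None:
    """Append one raw observation to the local, never-synced JSONL queue."""
    QUEUE.parent.mkdir(parents=True, exist_ok=True)
    record = {"tool": tool_name, "input": tool_input, "output": tool_output}
    with QUEUE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Because the hook only appends locally and returns, a slow or offline distiller never blocks your tool calls — the background worker drains the queue later.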

FAQ

| Question | Answer |
| --- | --- |
| Does Anthropic see my raw input? | No. It goes to the distiller: Ollama (local) or Gemini (Google). |
| Can my team read what I typed? | No. Only the distilled fact is stored. |
| Can my manager see who wrote what? | Only if you opt in (AUTH_ENABLED=true). Anonymous by default. |
| Where is my raw text? | ~/.distill/private/ on your machine. Delete anytime. |
| What if distillation leaks a name? | The scanner checks all distilled output for PII and secrets before saving. |

The scanner: secrets and PII

The scanner runs at two points in the pipeline: before distillation (to protect the local LLM input) and after distillation (to catch anything the model reproduced). It detects two categories of sensitive content:

Secrets

API keys, tokens, passwords, connection strings — anything that looks like a credential. These are redacted with [REDACTED] markers before the text proceeds.
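A minimal sketch of this kind of redaction, assuming a small set of illustrative regex patterns — real credential scanners ship far larger, per-provider rule sets:

```python
# Illustrative credential redaction, not Distill's actual pattern list.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),     # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),     # GitHub personal access tokens
    re.compile(r"(?i)password\s*=\s*\S+"),  # password assignments
    re.compile(r"postgres://\S+:\S+@\S+"),  # connection strings with credentials
]

def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential with a [REDACTED] marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```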

PII

In addition to secrets, the scanner detects personally identifiable information:

| PII type | Examples | Handling |
| --- | --- | --- |
| Email addresses | user@company.com | Redacted |
| Phone numbers | +1-555-0123, (555) 867-5309 | Redacted |
| URLs and domains | internal.corp.net, 192.168.1.1/admin | Redacted (with allowlist) |
| IP addresses | 10.0.0.5, 2001:db8::1 | Redacted |
| SSNs | 123-45-6789 | Redacted |
| Credit card numbers | 4111-1111-1111-1111 | Redacted |

URL allowlist: Public sites like github.com, pypi.org, and stackoverflow.com are not redacted — these appear frequently in technical knowledge and aren't personally identifying.
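The allowlist behavior can be sketched like this; the regexes and allowlist contents are illustrative assumptions, not Distill's actual configuration:

```python
# Sketch of allowlist-aware PII redaction: emails are always redacted,
# domains only when they are not on the public-site allowlist.
# Patterns and allowlist entries are illustrative.
import re

URL_ALLOWLIST = {"github.com", "pypi.org", "stackoverflow.com"}
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
DOMAIN_RE = re.compile(r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)+)\b", re.IGNORECASE)

def redact_pii(text: str) -> str:
    """Redact emails unconditionally; redact domains unless allowlisted."""
    text = EMAIL_RE.sub("[REDACTED]", text)

    def replace_domain(m: re.Match) -> str:
        domain = m.group(1).lower()
        return domain if domain in URL_ALLOWLIST else "[REDACTED]"

    return DOMAIN_RE.sub(replace_domain, text)
```

Running the email pass first matters: an address like user@company.com contains a domain, and redacting the whole address avoids leaving a half-redacted fragment for the domain pass.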

Scanner coverage

The scanner runs on all paths that write to the team database:

  • Auto-observe pipeline — scans raw tool I/O before distillation, scans distilled output after
  • update_memory() — scans the new input through the full pipeline

All write paths pass through PII and secret scanning.

Author modes

| Mode | Behavior |
| --- | --- |
| anonymous (default) | No author attribution stored |
| AUTH_ENABLED=true | Git identity (user.email) used for ownership and RLS enforcement |

When authentication is enabled, PostgreSQL Row-Level Security policies enforce that only the author can modify or delete their memories. Anonymous users retain read-only search access.
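The RLS enforcement described above could look something like the following PostgreSQL sketch. The memories table, author_email column, and app.user_email setting are hypothetical names chosen for illustration, not Distill's actual schema:

```sql
-- Hypothetical RLS sketch: anyone may read, only the author may write.
ALTER TABLE memories ENABLE ROW LEVEL SECURITY;

-- Everyone, including anonymous users, retains read access.
CREATE POLICY memories_read ON memories
    FOR SELECT USING (true);

-- Only the authenticated author may modify or delete their own rows.
CREATE POLICY memories_update ON memories
    FOR UPDATE USING (author_email = current_setting('app.user_email'));

CREATE POLICY memories_delete ON memories
    FOR DELETE USING (author_email = current_setting('app.user_email'));
```

With policies like these the database itself enforces ownership, so a buggy or malicious client cannot bypass the rule at the application layer.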

Cloud distillation (DISTILLER_PROVIDER=gemini)

Setting DISTILLER_PROVIDER=gemini sends raw text to Google's Gemini API for distillation. This means raw text leaves your device.

Use this when:

  • Local compute resources are limited (no GPU, low RAM)
  • The privacy tradeoff is acceptable for your team
  • You want $0 cost without running Ollama

The same distillation prompt runs on Gemini — names, emotions, and PII are still stripped. The scanner still checks output for leaked secrets. The difference is the distillation happens in Google's cloud, not on your machine.
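A hedged sketch of how such a provider switch might be wired — the function name is hypothetical, though the Ollama default port (11434) and the Gemini API host are their standard public values:

```python
# Illustrative provider selection based on the DISTILLER_PROVIDER variable
# the docs describe. Function name is hypothetical, not Distill's API.
import os

def pick_distiller_endpoint() -> str:
    """Return the distiller base URL for the configured provider."""
    provider = os.environ.get("DISTILLER_PROVIDER", "ollama")
    if provider == "ollama":
        return "http://localhost:11434"  # raw text stays on this machine
    if provider == "gemini":
        # raw text leaves the device for Google's cloud
        return "https://generativelanguage.googleapis.com"
    raise ValueError(f"unknown DISTILLER_PROVIDER: {provider}")
```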