GDPR + AI in Frontend: Safe LLM Integration Without PII Leaks in 2026

Adding smart AI features to your web app is table stakes in 2026. But the moment user-typed text hits an LLM API, you are in GDPR territory. Here is how to build it right without slowing your team down.

At Symfio we run a multi-tenant SaaS with users across the EU. When we started adding AI-powered suggestions to the dashboard — autocomplete, content generation, smart search — our legal team flagged a question nobody on the engineering side had fully thought through: what exactly are we sending to the LLM provider, and does any of it count as personal data?

It turned out the answer was "yes, sometimes, and it depends." That kicked off a three-month effort to build a privacy-safe AI layer in our React frontend. This article documents what we learned.

Why Frontend Is the Riskiest Layer

Most GDPR + AI discussions focus on the backend: data retention policies, data processing agreements (DPAs) with providers like OpenAI or Anthropic, audit logs. All of that matters. But the frontend is where the data originates, and it is the layer developers touch most casually.

Consider what flows through a typical AI feature:

  • The user's free-text input — which might contain their name, address, or medical information
  • Context injected by the app — often the user's profile, recent activity, or account data
  • System prompts — which might inadvertently include PII from a database query

Each of these is a potential leak vector. And unlike a backend data breach, a frontend prompt leak can happen silently, in production, with no error logged anywhere.

GDPR note: Under GDPR Article 4(1), "personal data" is any information relating to an identified or identifiable person. A user's first name, email, IP address, or even a unique identifier in a prompt can qualify. When in doubt, treat it as PII.

The Three-Layer Defense Model

We settled on a model with three distinct layers of protection. None of them is sufficient alone; together they make a defensible system.

Layer 1: Input Sanitization Before the Prompt

The first line of defense is stripping or masking PII before it enters any prompt. We built a lightweight sanitizeForPrompt utility that runs on the client before any API call:

// utils/sanitize-prompt.ts

const EMAIL_RE = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;
const PHONE_RE = /(\+?\d[\d\s\-().]{7,}\d)/g;
// Simplified — production uses a more complete pattern set

export function sanitizeForPrompt(input: string): string {
  return input
    .replace(EMAIL_RE, '[EMAIL]')
    .replace(PHONE_RE, '[PHONE]');
}

// Usage in a React component
const handleSubmit = async (userInput: string) => {
  const safeInput = sanitizeForPrompt(userInput);
  const response = await callLLM({ prompt: buildPrompt(safeInput) });
};

Note: Regex-based PII detection has false negatives. It is a first pass, not a guarantee. Pair it with server-side validation and a DPA with your LLM provider that prohibits training on your data.
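As a quick sanity check, here is what the patterns above produce on a sample input (a standalone copy of the utility, with a made-up email and phone number):

```typescript
// Standalone copy of the patterns from sanitizeForPrompt above.
const EMAIL_RE = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;
const PHONE_RE = /(\+?\d[\d\s\-().]{7,}\d)/g;

function sanitizeForPrompt(input: string): string {
  return input.replace(EMAIL_RE, '[EMAIL]').replace(PHONE_RE, '[PHONE]');
}

sanitizeForPrompt('Reach me at jane@example.com or +1 555-123-4567');
// → 'Reach me at [EMAIL] or [PHONE]'
```

Note the replacement order: emails are masked first, so the phone pattern never partially matches the digits inside an address.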

Layer 2: Explicit Context Allowlisting

The more dangerous leak is not user input — it is the context your app injects into prompts. Developers naturally reach for "give the AI more context" and end up serializing an entire user object into a system prompt.

We introduced an explicit allowlist pattern. Instead of passing the full user object, every AI feature declares exactly which fields it needs:

// types/ai-context.ts

type AllowedUserContext = {
  accountTier: 'free' | 'pro' | 'enterprise';
  locale: string;
  preferredLanguage: string;
  // Notably absent: name, email, id, address
};

function buildSystemPrompt(ctx: AllowedUserContext): string {
  return `You are a helpful assistant for a ${ctx.accountTier} user.
Respond in ${ctx.preferredLanguage}.`;
}

// TypeScript enforces the boundary — user.email won't compile here
const context: AllowedUserContext = {
  accountTier: user.accountTier,
  locale: user.locale,
  preferredLanguage: user.settings.preferredLanguage,
};

TypeScript does the enforcement here. If someone tries to add user.email to the context object literal, it will not compile, thanks to excess property checking. This is the kind of guard that actually survives team growth — it does not rely on code review catching a subtle mistake. One caveat: excess property checking applies only to object literals, so spreading ...user into the context would bypass it; keep spreads out of these context builders.

Layer 3: Network-Level Audit Logging via Proxy

The third layer is observability. We proxy all LLM API calls through our backend rather than calling the provider directly from the browser. This gives us:

  • A complete audit log of every prompt and response
  • The ability to redact logs before storage using server-side PII detection
  • Rate limiting and cost controls per user
  • A single point to rotate API keys without touching the frontend
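A minimal sketch of such a proxy handler follows; the function names, provider URL, and audit-log stub are illustrative assumptions, not our actual implementation:

```typescript
// Server-side sketch (names and URL are illustrative, not real endpoints).
const LOG_EMAIL_RE = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g;

type AuditEntry = { userId: string; prompt: string; timestamp: number };

// Second-pass redaction on the server: catches anything the client missed.
function redactForAudit(prompt: string): string {
  return prompt.replace(LOG_EMAIL_RE, '[EMAIL]');
}

// Stub: swap in your real audit store.
async function writeAuditLog(entry: AuditEntry): Promise<void> {
  console.log(JSON.stringify(entry));
}

async function proxyLLMCall(
  userId: string,
  prompt: string,
  apiKey: string,
): Promise<string> {
  // Log a redacted copy; the raw prompt never touches storage.
  await writeAuditLog({
    userId,
    prompt: redactForAudit(prompt),
    timestamp: Date.now(),
  });

  // Forward to the provider; the API key lives only on the server.
  const res = await fetch('https://llm.example.com/v1/complete', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Provider error: ${res.status}`);
  return res.text();
}
```

The key property is the order of operations: the audit entry is written from the redacted copy before the provider is called, so the raw prompt exists only in memory for the duration of the request.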

Tip: If you cannot build a proxy immediately, at minimum ensure your LLM provider has signed a DPA and has opted you out of training-data usage. For EU users, confirm the provider can process data within the EU or is covered by an adequacy decision.

Handling Streaming Responses Safely

Streaming is now standard for LLM UIs — users expect to see tokens arrive as they are generated. But streaming adds a wrinkle: the response arrives in chunks, so you cannot validate the full output before rendering it.

We handle this with a hook that appends each token to state as it arrives, keeping an AbortController on hand so the user or the app can cancel a stream mid-flight:

// hooks/use-streaming-llm.ts
import { useRef, useState } from 'react';

export function useStreamingLLM() {
  const [output, setOutput] = useState('');
  const abortRef = useRef<AbortController | null>(null);

  const stream = async (prompt: string) => {
    abortRef.current = new AbortController();
    setOutput('');

    const response = await fetch('/api/ai/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt }),
      signal: abortRef.current.signal,
    });

    if (!response.ok || !response.body) {
      throw new Error(`Stream request failed: ${response.status}`);
    }
    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      setOutput(prev => prev + chunk);
    }
  };

  const cancel = () => abortRef.current?.abort();

  return { output, stream, cancel };
}

Consent and Transparency

Engineering controls are necessary but not sufficient. GDPR also requires that users know their data is being processed by a third-party LLM. Our approach:

  • First-use disclosure — a one-time modal the first time a user activates an AI feature, explaining what data is sent and to whom
  • Persistent indicator — a subtle "AI" badge on any input that sends data to an LLM
  • Opt-out support — enterprise users can disable AI features entirely at the account level, stored server-side, not in localStorage
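The opt-out has to be enforced in code, not just hidden in the UI. A sketch of a fail-closed gate helper (the settings shape and names here are hypothetical):

```typescript
// Sketch: hard gate on a server-stored account setting (shape is hypothetical).
type AccountSettings = { aiFeaturesEnabled: boolean };

async function guardedAICall(
  settings: AccountSettings,
  call: () => Promise<string>,
): Promise<string> {
  if (!settings.aiFeaturesEnabled) {
    // Fail closed: no prompt is ever built, let alone sent.
    throw new Error('AI features are disabled for this account');
  }
  return call();
}
```

Because the gate wraps the call itself rather than toggling UI visibility, a disabled account cannot reach the LLM even through a stale client or a direct API request.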

What We Would Do Differently

Looking back, the biggest mistake was retrofitting these controls onto features that were already in production. It would have been significantly cheaper to establish the proxy architecture and the allowlist pattern before the first AI feature shipped.

If you are starting today: treat any user input that touches an LLM as if it were a form field being sent to a third-party analytics provider — because that is effectively what it is. Design for that from day one.

Privacy-by-design is not a legal checkbox. It is an architecture decision. And like most architecture decisions, it is much easier to build in than to bolt on.

Key Takeaways

  • Sanitize user input client-side before building any prompt
  • Use TypeScript to enforce an explicit allowlist of context fields — never serialize full user objects
  • Proxy all LLM calls through your backend; never call the provider directly from the browser
  • Ensure your provider has signed a DPA; for EU users, verify data residency
  • Build consent disclosure and opt-out before shipping, not after