AI-Powered Face Anonymisation For Clinical Data Management

How we built a production-ready video anonymisation pipeline with four deployment modes — from on-premise to real-time — for a regulated content platform.

Pipeline modes (on-prem, cloud, hybrid, real-time)

Face detection accuracy (hybrid)

Audio frames lost on output

The Challenge

A clinical data management company needed to use real patient-interaction footage for training and analytics — but every frame with an identifiable face was a compliance liability under GDPR, India’s DPDP framework, and tightening US state biometric laws. Manual blurring in After Effects was slow, inconsistent, and couldn’t scale. Off-the-shelf tools kept failing in three ways: missing faces in profile shots, producing flickering output, and silently stripping the audio track.

What We Built

A unified face-anonymisation architecture with four deployment modes, each optimised for a different constraint:

On-premise (MTCNN)

Footage never leaves the network. Zero per-minute cost. Built for medical, defence, and HR content.

Cloud-scale (Rekognition)

Parallel worker threads call AWS Rekognition, so an hour of 4K video is processed in minutes, not hours.

Hybrid (dual-detector)

Two independent models; blur anything flagged by either. Near-zero false negatives for regulated submissions.
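
The "blur anything flagged by either" rule amounts to taking the union of the two detectors' bounding boxes. The sketch below is a minimal illustration, not the production code: the `Box` format, the `iou` and `union_detections` helpers, and the 0.5 overlap threshold are all assumptions. Boxes that overlap heavily are treated as the same face and merged into one covering box; a box seen by only one detector is kept.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def union_detections(primary: List[Box], secondary: List[Box],
                     merge_iou: float = 0.5) -> List[Box]:
    """Keep every box from either detector; merge near-duplicates."""
    merged = list(primary)
    for box in secondary:
        for i, kept in enumerate(merged):
            if iou(box, kept) >= merge_iou:
                # Both detectors saw the same face: cover both boxes.
                merged[i] = (min(kept[0], box[0]), min(kept[1], box[1]),
                             max(kept[2], box[2]), max(kept[3], box[3]))
                break
        else:
            # Only the secondary detector saw this face: blur it anyway.
            merged.append(box)
    return merged
```

Merging to the covering box rather than picking one detector's box errs on the side of blurring slightly more, which fits the goal of minimising false negatives.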

Real-time (YOLO + MediaPipe)

Live preview with on-the-fly blur controls. Runs on modest hardware. Built for broadcast and events.

Technology Used

Backend

Python

Computer Vision

OpenCV
MTCNN
YOLOv8

AI/ML Components

AWS Rekognition
MediaPipe

Video Processing

FFmpeg
MoviePy

Key Engineering Decisions

Detection JSON as a contract — every run produces an auditable, frame-by-frame detection record separate from the rendered video. Compliance teams can review what was detected without re-running the model.
Temporal smoothing — when a face is detected in frames N-1 and N+1 but missed in N, the pipeline interpolates. No flicker. Continuous output.
Audio preservation — original audio track preserved bit-for-bit through every pipeline mode. No silent clips, no re-sync workflow.
Configurable output — blur strength, pixelation density, mask shape, and anonymisation style are all parameters. Brand-safety and editorial teams get different settings from the same engine.
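
The temporal smoothing rule described above (a face found in frames N-1 and N+1 but missed in N) can be sketched as a gap fill over a per-face track. This is a simplified illustration under stated assumptions: it handles a single face track, fills only one-frame gaps, and uses a midpoint between the neighbouring boxes. The function name and box format are hypothetical, and the real pipeline may track multiple faces and bridge longer gaps.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def fill_single_frame_gaps(track: List[Optional[Box]]) -> List[Optional[Box]]:
    """Fill a missed detection when both neighbouring frames have one.

    `track[n]` is the face box in frame n, or None if the detector missed it.
    Neighbours are read from the original track, so filled frames never
    cascade into longer gaps.
    """
    smoothed = list(track)
    for n in range(1, len(track) - 1):
        prev_box, next_box = track[n - 1], track[n + 1]
        if track[n] is None and prev_box is not None and next_box is not None:
            # Midpoint of the neighbouring boxes, coordinate by coordinate.
            smoothed[n] = tuple((p + q) // 2 for p, q in zip(prev_box, next_box))
    return smoothed


track = [(0, 0, 10, 10), None, (4, 4, 14, 14), None, None, (0, 0, 1, 1)]
smoothed = fill_single_frame_gaps(track)
```

In this example frame 1 is filled with the midpoint box, while the two-frame gap at frames 3 and 4 is left untouched because neither frame has a detection on both sides.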

Results

Unlocked use of real patient-interaction footage for training — previously shelved for compliance reasons
Turnaround from raw footage to anonymised, audit-ready output dropped from days to minutes
Deterministic re-runs: same input + same detection JSON = byte-identical output, satisfying regulatory audit requirements
Single architecture serves four deployment contexts without code duplication
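
To illustrate the detection record and the deterministic re-run claim, here is a minimal sketch of how a frame-by-frame record could be serialised canonically and how two outputs could be compared by hash. The field names (`source`, `detector`, `frames`, `index`, `boxes`) and both helper functions are hypothetical; the actual schema is not described in this case study.

```python
import hashlib
import json
from typing import Dict, List


def build_detection_record(source: str, detector: str,
                           frames: Dict[int, List[List[int]]]) -> str:
    """Serialise per-frame boxes to canonical JSON (sorted keys and frames)."""
    record = {
        "source": source,
        "detector": detector,
        # Sort by frame index so the output does not depend on insertion order.
        "frames": [{"index": i, "boxes": frames[i]} for i in sorted(frames)],
    }
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def sha256_hex(data: bytes) -> str:
    """Hex digest used to compare two artefacts byte for byte."""
    return hashlib.sha256(data).hexdigest()


run_a = build_detection_record("clip.mp4", "hybrid", {1: [[1, 2, 3, 4]], 0: []})
run_b = build_detection_record("clip.mp4", "hybrid", {0: [], 1: [[1, 2, 3, 4]]})
same = sha256_hex(run_a.encode("utf-8")) == sha256_hex(run_b.encode("utf-8"))
```

The same `sha256_hex` check applied to the rendered video bytes from two runs is one way an auditor could confirm that the outputs are byte-identical.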

"Bring us a clip; we'll show you what comes back. Faces, gone. Audio, intact. Deadlines, met."

Want to know more? Book a free 30-minute consultation.