Human-Centric Video Stabilization
A three-pass pipeline that takes a video of a person, optionally isolates them from the background, and stabilizes their position on screen across every frame. It outputs the stabilized video and a side-by-side comparison with the original.
How It Works
The pipeline runs in three passes:
Pass 1 (Data Collection): MediaPipe detects the human pose in every frame. The midpoint of the hip and shoulder keypoints serves as a stable anchor point, and its raw (still shaky) pixel coordinates are stored for each frame.
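As a minimal sketch of Pass 1, the helpers below average the four torso landmarks into a pixel-space anchor and reuse the last known anchor when detection fails. The function names (`torso_anchor`, `collect_anchors`, `iter_pose_landmarks`), the reading of "midpoint" as the mean of both shoulders and both hips, and the frame-centre fallback before the first detection are assumptions for illustration, not the project's actual code. The landmark indices follow MediaPipe's Pose model, and `iter_pose_landmarks` relies on the legacy `mp.solutions.pose` API.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# MediaPipe Pose landmark indices: left/right shoulder, left/right hip.
TORSO = (11, 12, 23, 24)

Point = Tuple[float, float]


def torso_anchor(landmarks: Sequence, width: int, height: int) -> Point:
    """Average the four torso landmarks (normalized x, y) into pixel coordinates."""
    x = sum(landmarks[i].x for i in TORSO) / len(TORSO)
    y = sum(landmarks[i].y for i in TORSO) / len(TORSO)
    return (x * width, y * height)


def collect_anchors(per_frame: Iterable[Optional[Sequence]],
                    width: int, height: int) -> List[Point]:
    """Return one anchor per frame, reusing the last known one on failure."""
    anchors: List[Point] = []
    last: Optional[Point] = None
    for landmarks in per_frame:
        if landmarks is not None:
            last = torso_anchor(landmarks, width, height)
        # Assumption: before the first detection, fall back to the frame centre.
        anchors.append(last if last is not None else (width / 2, height / 2))
    return anchors


def iter_pose_landmarks(video_path: str) -> Iterator[Optional[Sequence]]:
    """Yield MediaPipe pose landmarks (or None) for each frame of a video."""
    # Imported lazily so the pure-Python helpers above run without these packages.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(video_path)
    try:
        with mp.solutions.pose.Pose(static_image_mode=False) as pose:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                # MediaPipe expects RGB; OpenCV decodes frames as BGR.
                result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                found = result.pose_landmarks
                yield found.landmark if found else None
    finally:
        cap.release()
```

With these helpers, the raw trajectory for a clip would be `collect_anchors(iter_pose_landmarks(path), width, height)`, where `path`, `width`, and `height` describe the input video.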
Pass 2 (Trajectory Smoothing): The raw anchor trajectory is passed through a Kalman filter, which alternately predicts the person's position and corrects that prediction against each measured anchor. The result is a smooth path with the high-frequency camera shake removed.
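A constant-velocity Kalman filter is one common way to realise Pass 2. The sketch below assumes a state of position plus velocity, one time step per frame, and illustrative noise settings (`process_var`, `measurement_var`); the function name `kalman_smooth` is hypothetical, and the motion model and tuning the pipeline actually uses are not stated here.

```python
import numpy as np


def kalman_smooth(points, process_var=1e-3, measurement_var=1.0):
    """Filter an (N, 2) anchor trajectory with a constant-velocity Kalman filter."""
    z = np.asarray(points, dtype=float)

    # State vector: [x, y, vx, vy]; position advances by velocity each frame.
    F = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # Only the position is measured.
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q = process_var * np.eye(4)      # process noise (assumed value)
    R = measurement_var * np.eye(2)  # measurement noise (assumed value)

    x = np.array([z[0, 0], z[0, 1], 0.0, 0.0])  # start at the first anchor, at rest
    P = np.eye(4)
    smoothed = np.empty_like(z)

    for k, measurement in enumerate(z):
        # Predict the next state from the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the measured anchor.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (measurement - H @ x)
        P = (np.eye(4) - K @ H) @ P
        smoothed[k] = x[:2]

    return smoothed
```

Lowering `process_var` relative to `measurement_var` gives a smoother path that follows genuine movement more slowly, so the two values trade shake removal against lag.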
Pass 3 (Rendering): For each frame, a warp transform applied with cv2.warpAffine moves the person from their original position to the smoothed position. Optionally, DeepLabv3 removes the background before the warp.
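The rendering step can be sketched as a pure translation, which is the simplest warp that moves the anchor from its raw position to its smoothed one. The helper names below are hypothetical, and the choice of a translation-only matrix is an assumption. The masking helper further assumes a per-pixel class map in the Pascal VOC label set, where index 15 is "person", as produced by taking the argmax of the `"out"` tensor from torchvision's pretrained DeepLabv3 models.

```python
import numpy as np

PERSON_CLASS = 15  # "person" in the Pascal VOC label set (assumed label set)


def translation_matrix(raw, smooth):
    """Build the 2x3 affine matrix that shifts `raw` onto `smooth`."""
    dx = smooth[0] - raw[0]
    dy = smooth[1] - raw[1]
    return np.array([[1, 0, dx],
                     [0, 1, dy]], dtype=np.float32)


def apply_person_mask(frame, class_map):
    """Black out every pixel whose predicted class is not 'person'."""
    keep = (class_map == PERSON_CLASS)[..., None]  # broadcast over colour channels
    return np.where(keep, frame, 0).astype(frame.dtype)


def stabilize_frame(frame, raw, smooth, class_map=None):
    """Optionally remove the background, then warp the frame."""
    import cv2  # imported lazily so the helpers above run without OpenCV

    if class_map is not None:
        frame = apply_person_mask(frame, class_map)
    height, width = frame.shape[:2]
    # warpAffine takes the output size as (width, height).
    return cv2.warpAffine(frame, translation_matrix(raw, smooth), (width, height))
```

Areas uncovered by the shift are filled with black by default; `cv2.warpAffine` accepts `borderMode` and `borderValue` arguments if a different fill is preferred.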
Stack
- PyTorch (DeepLabv3) — background removal
- MediaPipe — human pose detection
- Kalman Filter — trajectory smoothing
- OpenCV — video processing and rendering
Results
Benchmarked on a roughly 13 s video at 25 fps (328 frames, 1080×1920):
| Time Taken | Device | Background Removal |
|---|---|---|
| 24s | CPU | No |
| 1h 15m | CPU | Yes |
| 3m 27s | GPU | Yes |
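To compare these runs on a per-frame basis, the short script below converts the wall-clock times from the table into seconds per frame and frames per second. It uses only the figures reported above; the variable and function names are illustrative.

```python
FRAMES = 328  # frame count of the benchmark clip

# Wall-clock times from the table, converted to seconds.
RUNS = {
    "CPU, no background removal": 24,
    "CPU, background removal": 1 * 3600 + 15 * 60,
    "GPU, background removal": 3 * 60 + 27,
}


def seconds_per_frame(total_seconds, frames=FRAMES):
    """Average processing time per frame."""
    return total_seconds / frames


for label, total in RUNS.items():
    print(f"{label}: {seconds_per_frame(total):.2f} s/frame, "
          f"{FRAMES / total:.2f} frames/s")

gpu_speedup = RUNS["CPU, background removal"] / RUNS["GPU, background removal"]
print(f"GPU speedup with background removal: {gpu_speedup:.1f}x")
```

This works out to about 0.07 s per frame on CPU without background removal, about 13.7 s per frame on CPU with it, and about 0.63 s per frame on GPU, so the GPU run is roughly 21.7 times faster than the CPU run when background removal is enabled.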
Limitations
- If MediaPipe fails on a frame, the last known position is reused; prolonged failures cause drift
- Tracks a single person — anchors on the first detected pose