Extract, Replace, and Mix Audio Tracks in FFmpeg

Thu Jul 25 2024

Pull audio from video, swap music tracks, mix commentary, adjust volume, and sync audio perfectly with visual content.

Audio editing is just as important as video processing. FFmpeg excels at extracting audio streams, replacing background music, mixing multiple audio sources, and fixing sync issues. This guide walks you through practical audio workflows that complement your video projects.

Extract Audio from Video

# Extract without re-encoding (stream copy; assumes the source audio is already AAC)
ffmpeg -i video.mp4 -vn -c:a copy audio.aac

# Extract and convert to MP3
ffmpeg -i video.mp4 -vn -c:a libmp3lame -q:a 2 audio.mp3

# Extract multiple audio tracks (if present)
ffmpeg -i video.mkv -map 0:a:0 -c:a copy track1.aac
ffmpeg -i video.mkv -map 0:a:1 -c:a copy track2.aac

# High-quality WAV for editing
ffmpeg -i video.mp4 -vn -c:a pcm_s16le audio.wav

The -vn flag tells FFmpeg to ignore video streams. For archival or editing, WAV or FLAC preserve quality without compression artifacts.
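
Stream copy only works when the output file can hold the source codec, so writing to audio.aac goes wrong if the video actually carries MP3 or Opus audio. Here is a minimal sketch that asks ffprobe for the codec of the first audio stream and picks a matching extension before copying. The function names ext_for_codec and extract_copy, and the extension table itself, are illustrative assumptions rather than anything FFmpeg provides.

```shell
# Map an audio codec name (as reported by ffprobe) to a file extension
# whose container can hold that codec. Illustrative table, not exhaustive.
ext_for_codec() {
  case "$1" in
    aac)       echo m4a ;;
    mp3)       echo mp3 ;;
    opus)      echo opus ;;
    vorbis)    echo ogg ;;
    flac)      echo flac ;;
    ac3)       echo ac3 ;;
    pcm_s16le) echo wav ;;
    *)         echo mka ;;   # Matroska audio accepts most codecs
  esac
}

# Copy the first audio stream of $1 into "<name>.<ext>" without re-encoding.
extract_copy() {
  codec=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$1")
  ffmpeg -i "$1" -map 0:a:0 -c:a copy "${1%.*}.$(ext_for_codec "$codec")"
}

# Usage: extract_copy video.mp4
```

Falling back to .mka for unknown codecs is a design choice of this sketch: Matroska is permissive, so the copy is unlikely to fail even for codecs missing from the table.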

Replace Audio Track

# Replace audio entirely (keep video)
ffmpeg -i video.mp4 -i new_audio.mp3 -c:v copy -c:a aac -b:a 192k -map 0:v:0 -map 1:a:0 output.mp4

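# Replace and trim audio to match video length
# (-map is required: without it FFmpeg's default stream selection can keep the original audio)
ffmpeg -i video.mp4 -i music.mp3 -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest output.mp4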

# Replace with time offset (delay audio by 0.5 seconds)
ffmpeg -i video.mp4 -itsoffset 0.5 -i new_audio.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output.mp4

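The -map 0:v:0 -map 1:a:0 pair takes video from the first input and audio from the second; without it, FFmpeg chooses one audio stream on its own and may keep the original. The -shortest option matters when your audio is longer than the video: it stops writing output when the shorter stream ends.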
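
Whether -shortest is the right call depends on which input is longer: if the music is shorter than the video, -shortest would cut the video instead. The sketch below compares the two durations and only emits the flag when the audio outlasts the video. The helper names length_flag and duration_of are hypothetical.

```shell
# Print "-shortest" when the audio (second argument, seconds) outlasts the
# video (first argument, seconds); print nothing otherwise.
length_flag() {
  awk -v v="$1" -v a="$2" 'BEGIN { if (a + 0 > v + 0) print "-shortest" }'
}

# Read a file's container duration in seconds with ffprobe.
duration_of() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Usage ($flag is left unquoted on purpose so an empty value disappears):
#   flag=$(length_flag "$(duration_of video.mp4)" "$(duration_of music.mp3)")
#   ffmpeg -i video.mp4 -i music.mp3 -map 0:v:0 -map 1:a:0 \
#     -c:v copy -c:a aac $flag output.mp4
```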

Mix Multiple Audio Sources

# Mix two audio tracks (50/50 volume)
ffmpeg -i video.mp4 -i commentary.mp3 -filter_complex "[0:a][1:a]amix=inputs=2:duration=first" -c:v copy -c:a aac output.mp4

# Mix with volume control (video audio 70%, music 30%)
ffmpeg -i video.mp4 -i music.mp3 \
  -filter_complex "[0:a]volume=0.7[a0];[1:a]volume=0.3[a1];[a0][a1]amix=inputs=2:duration=first" \
  -c:v copy -c:a aac output.mp4

# Add commentary over original audio
ffmpeg -i video.mp4 -i voice.mp3 \
  -filter_complex "[0:a]volume=0.4[a0];[1:a]volume=1.0[a1];[a0][a1]amix=inputs=2:duration=first" \
  -c:v copy -c:a aac narrated.mp4
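
The filter strings above all follow one pattern: one volume filter per input, then a single amix. If you mix often, a small helper can generate that string from a list of volume factors. This is a sketch; build_mix_filter is a hypothetical name, and it assumes every input has exactly one audio stream you want to use.

```shell
# Build a filter graph that applies one volume factor per input and mixes
# the results. Arguments are the factors in input order (input 0 first).
build_mix_filter() {
  chain=""
  labels=""
  i=0
  for vol in "$@"; do
    chain="${chain}[${i}:a]volume=${vol}[a${i}];"
    labels="${labels}[a${i}]"
    i=$((i + 1))
  done
  printf '%s%samix=inputs=%d:duration=first\n' "$chain" "$labels" "$i"
}

# Usage:
#   ffmpeg -i video.mp4 -i music.mp3 \
#     -filter_complex "$(build_mix_filter 0.7 0.3)" \
#     -c:v copy -c:a aac output.mp4
```

Calling build_mix_filter 0.7 0.3 reproduces the 70/30 filter string shown above.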

Adjust Volume and Normalize

# Increase volume by 50% (-c:v copy passes the video through untouched)
ffmpeg -i input.mp4 -c:v copy -af "volume=1.5" output.mp4

# Decrease volume to 50%
ffmpeg -i input.mp4 -c:v copy -af "volume=0.5" output.mp4

# Normalize audio with loudnorm (two-pass for best results)
ffmpeg -i input.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -
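# Second pass: feed the first-pass measurements back in
# (input_i -> measured_I, input_tp -> measured_TP, input_lra -> measured_LRA,
#  input_thresh -> measured_thresh, target_offset -> offset).
# The numbers below are placeholders; substitute the values from your own first pass.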
ffmpeg -i input.mp4 -c:v copy \
  -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-18.0:measured_TP=-1.2:measured_LRA=9.5:measured_thresh=-28.0:offset=0.5:linear=true \
  normalized.mp4

Loudness normalization is essential for consistent playback across different platforms. The two-pass approach measures first, then applies precise adjustments.
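
Copying the measured values by hand gets tedious, so here is a rough sketch that captures the first-pass report and builds the second-pass filter from it. It assumes the report uses the key names input_i, input_tp, input_lra, input_thresh and target_offset with quoted values, which is what loudnorm prints with print_format=json. The helper names json_value and second_pass_filter are made up for this example, and the sed-based parsing is deliberately simple rather than a general JSON parser.

```shell
# Pull one quoted value out of loudnorm's JSON report.
# $1 = key name, $2 = the captured report text.
json_value() {
  printf '%s\n' "$2" |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Turn a first-pass report into the second-pass filter string.
# The targets (I=-16, TP=-1.5, LRA=11) match the commands above.
second_pass_filter() {
  printf 'loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:linear=true\n' \
    "$(json_value input_i "$1")" \
    "$(json_value input_tp "$1")" \
    "$(json_value input_lra "$1")" \
    "$(json_value input_thresh "$1")" \
    "$(json_value target_offset "$1")"
}

# Usage (the report is printed on stderr, hence 2>&1):
#   report=$(ffmpeg -hide_banner -i input.mp4 -vn \
#     -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null - 2>&1)
#   ffmpeg -i input.mp4 -c:v copy \
#     -af "$(second_pass_filter "$report")" normalized.mp4
```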

Sync and Delay Audio

# Delay audio by 200ms (fix sync issues)
ffmpeg -i input.mp4 -itsoffset 0.2 -i input.mp4 -map 0:v -map 1:a -c:v copy -c:a aac synced.mp4

# Advance audio by 300ms (start audio earlier)
ffmpeg -i input.mp4 -itsoffset -0.3 -i input.mp4 -map 0:v -map 1:a -c:v copy -c:a aac synced.mp4

# Add silence at the start (1 second; adelay takes one value per channel, so 1000|1000 covers stereo)
ffmpeg -i input.mp4 -af "adelay=1000|1000" -c:v copy output.mp4
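
Sync errors are usually measured in frames, while -itsoffset expects seconds. A small conversion helper avoids mental arithmetic; frames_to_seconds is a hypothetical name, and the sketch assumes a constant frame rate that you already know.

```shell
# Convert an offset measured in frames to seconds for -itsoffset.
# $1 = frame count (negative to advance the audio), $2 = frames per second.
frames_to_seconds() {
  awk -v n="$1" -v fps="$2" 'BEGIN { printf "%.3f\n", n / fps }'
}

# Usage: delay the audio by 6 frames of a 30 fps video (0.200 s)
#   ffmpeg -i input.mp4 -itsoffset "$(frames_to_seconds 6 30)" -i input.mp4 \
#     -map 0:v -map 1:a -c:v copy -c:a aac synced.mp4
```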

Remove or Mute Audio

# Strip all audio from video
ffmpeg -i input.mp4 -an -c:v copy silent.mp4

# Mute specific time range (5s to 10s)
ffmpeg -i input.mp4 \
  -af "volume=enable='between(t,5,10)':volume=0" \
  -c:v copy output.mp4
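
To mute several ranges in one pass, the between() terms can be added together, since any non-zero result enables the filter. The sketch below builds that expression from start-end pairs; mute_filter is a hypothetical helper and it assumes the ranges are given in seconds as START-END.

```shell
# Build a volume filter that silences every START-END range given
# as an argument (times in seconds, e.g. 5-10).
mute_filter() {
  cond=""
  for range in "$@"; do
    start=${range%%-*}
    end=${range##*-}
    # Join terms with "+", which acts as a logical OR in the enable expression
    cond="${cond:+$cond+}between(t,$start,$end)"
  done
  printf "volume=enable='%s':volume=0\n" "$cond"
}

# Usage:
#   ffmpeg -i input.mp4 -af "$(mute_filter 5-10 20-25)" -c:v copy output.mp4
```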

Stereo to Mono and Channel Manipulation

# Convert stereo to mono
ffmpeg -i stereo.mp4 -ac 1 -c:v copy mono.mp4

# Swap left and right channels
ffmpeg -i input.mp4 -af "channelmap=1-0|0-1" -c:v copy swapped.mp4

# Extract only left channel as mono
ffmpeg -i stereo.mp4 -af "pan=mono|c0=c0" -c:v copy left_only.mp4

# Extract only right channel as mono
ffmpeg -i stereo.mp4 -af "pan=mono|c0=c1" -c:v copy right_only.mp4

Fade In and Fade Out

# Fade in audio over first 3 seconds
ffmpeg -i input.mp4 -af "afade=t=in:st=0:d=3" -c:v copy fadedin.mp4

# Fade out over last 4 seconds (start time = duration - 4; st=26 assumes a 30-second clip)
ffmpeg -i input.mp4 -af "afade=t=out:st=26:d=4" -c:v copy fadedout.mp4

# Both fade in and fade out (st=28 again assumes a 30-second clip)
ffmpeg -i input.mp4 -af "afade=t=in:st=0:d=2,afade=t=out:st=28:d=2" -c:v copy faded.mp4

For the fade out, you need to know the video duration first. Use ffprobe to check:

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.mp4
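
The start time can then be computed instead of hard-coded. A minimal sketch, assuming the duration comes from the ffprobe command above; fade_out_start and fade_filter are hypothetical helper names.

```shell
# Start time for a fade-out: duration minus fade length, never below zero.
# $1 = duration in seconds, $2 = fade length in seconds.
fade_out_start() {
  awk -v d="$1" -v f="$2" \
    'BEGIN { s = d - f; if (s < 0) s = 0; printf "%.3f\n", s }'
}

# Build an afade chain with a fade-in and a fade-out of the same length.
fade_filter() {
  printf 'afade=t=in:st=0:d=%s,afade=t=out:st=%s:d=%s\n' \
    "$2" "$(fade_out_start "$1" "$2")" "$2"
}

# Usage:
#   dur=$(ffprobe -v error -show_entries format=duration \
#     -of default=noprint_wrappers=1:nokey=1 input.mp4)
#   ffmpeg -i input.mp4 -af "$(fade_filter "$dur" 2)" -c:v copy faded.mp4
```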

Advanced: Ducking (Lower Music During Speech)

# Duck background music when speech is present
# (the music is the signal being compressed; the video's audio is the sidechain that triggers it)
ffmpeg -i video.mp4 -i music.mp3 \
  -filter_complex "[0:a]asplit=2[sc][mix];[1:a][sc]sidechaincompress=threshold=0.03:ratio=3:attack=200:release=1000[bg];[mix][bg]amix=inputs=2:duration=first" \
  -c:v copy -c:a aac ducked.mp4

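This technique automatically reduces music volume when someone speaks, common in podcasts and tutorials. sidechaincompress takes two inputs: it compresses the first (the music) whenever the second (the speech) rises above the threshold.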

Batch Process Audio Extraction

#!/bin/bash
# Extract audio from all MP4 files
for f in *.mp4; do
  ffmpeg -i "$f" -vn -c:a libmp3lame -q:a 2 "${f%.mp4}.mp3"
done

# Replace audio in all videos with a single music track
# (-map picks video from the file and audio from the music track)
for f in *.mp4; do
  ffmpeg -i "$f" -i background_music.mp3 -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest "music_${f}"
done
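
One thing to watch with the second loop: it writes music_*.mp4 into the same directory it reads from, so a second run would pick up its own output. A sketch of a safer variant writes to a separate directory and skips files that already exist there. The names out_path and replace_all are hypothetical, and background_music.mp3 is the same placeholder file as above.

```shell
# Decide where the processed copy of a file should go.
# $1 = output directory, $2 = input path.
out_path() {
  printf '%s/%s\n' "$1" "$(basename "$2")"
}

# Replace the audio of every MP4 in the current directory, writing results
# to directory $1 and skipping files that were already processed.
replace_all() {
  mkdir -p "$1" || return 1
  for f in *.mp4; do
    [ -e "$f" ] || continue            # the glob matched nothing
    dest=$(out_path "$1" "$f")
    [ -e "$dest" ] && continue         # done on an earlier run
    # -nostdin keeps ffmpeg from reading the terminal inside the loop
    ffmpeg -nostdin -i "$f" -i background_music.mp3 \
      -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest "$dest"
  done
}

# Usage: replace_all with_music
```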

Pitfalls

When replacing audio, always check that the durations match; if the new audio is longer, use -shortest so the output does not run past the end of the video.
Audio delay/advance requires careful measurement; use video editing software to identify exact frame offsets first.
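Mixing too many audio sources can cause clipping; keep the combined volume factors at or below 1.0 or use compression.
The amix filter by default scales its inputs down in proportion to the number of inputs, so the mix comes out quieter than your volume values suggest; use volume filters before mixing for precise control, and check whether your FFmpeg build's amix has a normalize option if you want to turn that scaling off.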
Not all containers support multiple audio tracks; MKV is more flexible than MP4 for multi-track storage.
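
To check a finished mix for clipping, the volumedetect filter reports the peak level in its log. The sketch below parses that log and flags a peak at 0 dB. It assumes the log contains a line of the form "max_volume: -4.2 dB", and the helper names max_volume_db and is_clipping are made up for this example.

```shell
# Extract the max_volume figure (in dB) from volumedetect's log output.
# $1 = captured ffmpeg log text.
max_volume_db() {
  printf '%s\n' "$1" |
    sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p' |
    head -n 1
}

# Succeed (exit status 0) when the peak sits at or above 0 dB,
# which suggests the mix is clipping.
is_clipping() {
  awk -v m="$(max_volume_db "$1")" \
    'BEGIN { exit !(m != "" && m + 0 >= 0) }'
}

# Usage:
#   log=$(ffmpeg -hide_banner -i final_video.mp4 -vn \
#     -af volumedetect -f null - 2>&1)
#   if is_clipping "$log"; then echo "peak at 0 dB: lower the input volumes"; fi
```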

Real-World Example: Video with Commentary

Let's say you have a screen recording and want to add background music plus voice commentary:

# Step 1 (optional backup; step 2 reads the audio straight from screencast.mp4): extract original audio (keyboard/system sounds)
ffmpeg -i screencast.mp4 -vn -c:a copy original_audio.aac

# Step 2: Mix original (low volume) + music (mid) + commentary (high)
ffmpeg -i screencast.mp4 -i music.mp3 -i commentary.mp3 \
  -filter_complex "[0:a]volume=0.3[sys];[1:a]volume=0.2[mus];[2:a]volume=1.0[vox];[sys][mus][vox]amix=inputs=3:duration=first" \
  -c:v copy -c:a aac final_video.mp4

With these audio techniques, you have complete control over your video's sound design, whether you're fixing sync issues, adding music, or creating professional multi-track mixes—all from the command line.