
Using ffmpeg to cut/trim songs together with crossfade

While doing something completely not-tech for a while, I’ve been teaching partner dancing at a local studio. Every Friday, we have a social dance.

At a social dance, we put on a playlist, but you don’t want to use regular Spotify or the like if you can avoid it … people are often dancing with folks they don’t know, and four minutes (most songs run between three and five … and some longer!) is a long time to dance to one song. And depending on the dance, tiring!

So one of the teachers spends time custom-making these playlists, where each song runs only ~2-3 minutes before moving on to the next, with a variety of styles of dance mixed in.

I’ve used ffmpeg often in conjunction with my livecode music to prep field recordings (taken as video) into audio I can use in a set. And lo, of course, if I’ve got an audio [or video] thing to do, ffmpeg is the one to do it!

ffmpeg is so cool, and lots of StackOverflow/StackExchange questions helped me out. I commented my very inefficient, could definitely be much better, bash code below to try to illustrate how this worked.

#!/bin/bash
# Scoop up all the music you put in a folder and ordered
# ahead of this.
# Could you do that with computers? Definitely.
# Probably machine-learning something to ID dances and
# create a mixture so one dance isn't next to another.
# Or you can make an ordered folder.
arr=(music/*)

# Convert that first file (these are all mp3s) to a wav
# and cut it down.
# That way we can operate in lossless wav land.
# atrim trims the audio from the start (0) to 160 seconds in
ffmpeg -i "${arr[0]}" -filter_complex "atrim=0:160.0" next_mix0.wav

# Keep going in the array now that you've got inputs to start with.
# Start i at 1 though since next_mix0 exists
for ((i=1; i<${#arr[@]}; i++)); do
    y=$(($i-1))
    echo "Concatting next_mix$y.wav and ${arr[$i]}"
    ffmpeg -i next_mix$y.wav -i "${arr[$i]}" -filter_complex "[1]atrim=0:160.0[b];
        [0][b]acrossfade=d=10.0" -y next_mix${i}.wav
    # Clean up previous file for tidiness
    rm next_mix$y.wav
done
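Before letting the real thing grind through a whole playlist, a dry run of the same loop (with echo standing in for ffmpeg, and made-up filenames standing in for real songs) shows exactly which pairs get stitched at each step:

```shell
#!/bin/bash
# Dry run of the loop above: print the pairings instead of rendering.
mkdir -p music
touch music/01-salsa.mp3 music/02-bachata.mp3 music/03-swing.mp3  # fakes
arr=(music/*)
echo "trim ${arr[0]} -> next_mix0.wav"
for ((i=1; i<${#arr[@]}; i++)); do
    y=$((i-1))
    echo "crossfade next_mix$y.wav + ${arr[$i]} -> next_mix${i}.wav"
done
# The finished mix ends up in next_mix$(( ${#arr[@]} - 1 )).wav
```

Handy for sanity-checking the folder ordering before committing to a long render.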

Hurdles

Understanding & debugging filter_complex is tough. What are those [bracket-y] things doing? I never quite found great docs on this, so here’s my understanding.

The bracket(s) before a filter name (ex. “atrim”, which trims the audio) reference the input(s) to send to that filter. So the [0] is the first input file. After the filter, there might be another [bracket], and that’s a labelled output. That label can then be used as an input elsewhere, ex. in the [0][b]acrossfade filter.
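To make the label plumbing concrete, here’s the filtergraph from the loop above built up piece by piece (the label name b is arbitrary; any name would work):

```shell
# [1]      take the second input file (-i inputs count from 0)
# atrim    trim it to the first 160 seconds
# [b]      label the trimmed stream "b"
# [0][b]   feed the first input and labelled stream "b" into acrossfade
trim='[1]atrim=0:160.0[b]'
fade='[0][b]acrossfade=d=10.0'
filter="$trim;$fade"
echo "$filter"
# Used as: ffmpeg -i current.wav -i next.mp3 -filter_complex "$filter" out.wav
```

The semicolon separates filter chains, and the label is how the output of one chain becomes an input to the next.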

Not re-encoding. At first, I was operating on all mp3s, and it took a while to run this. When it eventually worked (to my shock) I spot-checked the track and some of it sounded “watery”. Shout out to DuckDuckGo for understanding my question “ffmpeg sounds watery” enough to lead to the word “distorted”. This led back to one of the first Q&A answers I found, “Crossfade many audio files into one”.

Since an mp3 is compressed but a wav is not (*note, there’s probably much more nuance to this), working with wav both removed the distortion, and I could run the script wildly faster. Like, so fast.
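One way to lean on this even harder (not what the script above does, just a sketch; the folder and file names here are made up) is to decode every mp3 to wav once up front, so nothing ever gets re-encoded mid-loop:

```shell
#!/bin/bash
# Sketch: decode each mp3 to wav once so every later pass stays in
# uncompressed wav land. Folder names are made up for this demo.
mkdir -p demo_mp3s demo_wav
touch demo_mp3s/01-salsa.mp3 demo_mp3s/02-bachata.mp3  # stand-in files
for f in demo_mp3s/*.mp3; do
    out="demo_wav/$(basename "${f%.mp3}").wav"
    # Drop the echo to actually run the conversion:
    echo ffmpeg -i "$f" "$out"
done
```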

Things I learned

I learned about atrim to trim audio in the filter_complex arg.

I learned about acrossfade to crossfade two audio streams.

I learned about inputs/outputs in filter_complex, or, at least how I think they work.

I learned a bit about mp3 vs wav!

Some additional shout outs to the Youtube-mp3-cli for no particular reason at all πŸ‘ΌπŸ» and someone’s power hour creator program that I didn’t use, but was like “good, I’m glad someone has done this” [0].

 

[0] A “power hour” is a drinking game popular in U.S. universities where you drink every minute/every time the song changes. It’s not a good idea.
