I know I'm not the only person who felt a sense of loss when humankind finally accrued the resources and developed the algorithms necessary to detect whether or not somebody was illicitly backing their YouTube videos with music from Universal Music Group. In 2006 it appeared that rampant file sharing had effectively merc'd the ability of the RIAA to enforce its will, and anyone could sample "Everyday I'm Hustlin'" at any time, with total impunity. We'd hustled into an age where anybody with Windows Movie Maker and a Linkin Park mp3 could shamelessly create content with said music as the soundtrack without having to pay an annual $4,000 fee for the license to do so.
With Google's continued development of the Content ID algorithm, which scans every uploaded video for fingerprints identifying copyrighted content (music, video, and even video game footage), we lost the anarchic fad of appropriating other people's digital intellectual property for our own use, a fad that had prevailed for about a decade. According to their 2010 explainer video on the algorithm, YouTube has poured over $100,000,000 into the development of Content ID, and that was nine years ago:
While I admit that the Content ID system is a bit of a let-down and often produces false positives (such as flagging white noise, silence, and an average creator's unedited speaking voice), I think its implementation has protected the platform from powerful litigants. If protecting copyrighted material calls for a project worth hundreds of millions of dollars, imagine the cost YouTube would face without Content ID!
Having studied HCI, I know one thing for sure, though: Users will break anything they touch. It's the nature of the beast that we try to exploit and test any system or software we come into contact with, and today, I want to test the limits of Content ID's fingerprinting of popular music.
We need a fat sample size, but not so fat that it takes us forever to perform these tests.
I've struck a happy medium with ten popular songs I can tolerate hearing a ton of times:
- "Billie Jean" - Michael Jackson
- "Agua de Beber" - Astrud Gilberto
- "Love Sosa" - Chief Keef
- "Policy of Truth" - Depeche Mode
- "Chandelier" - Sia
- "Young Lust" - Pink Floyd
- "Gunship Politico" - State Radio
- "The Morning" - The Weeknd
- "What You Know" - T.I.
- "Waiting for the Miracle" - Leonard Cohen
"Billie Jean" is the heavyweight here. It was the most popular song that came to mind, the total opposite of obscure. I figure that if I can warp "Billie Jean" so that Content ID can't identify the track, I can do the same with any other song by applying the same techniques.
I've noticed that you can get certain songs past Content ID unedited. Nobody's protecting them. YouTube doesn't have a public database of the songs it protects, though, so figuring out these 'safe' tracks is a matter of intuition and luck, and we're not looking to exploit that today. We're trying to find the level of distortion at which YouTube decides that I've created a 'unique,' fair-use work.
I'm going to take these ten songs and run them through several sets of 'distortions': filters from a basic sound-editing program (Adobe Audition) that might feasibly 'hide' the identity of the songs from YouTube's Content ID algorithm. In total, there'll be 29 distortion variants per song (many being combinations of two or three distortions), for a total of 290 tested audio files. I'll then package the 290 files into videos, upload the videos to YouTube, and see what gets claimed by Content ID.
We'll be using the Krusher bitcrusher VST by Tritik to downsample our songs at four different levels, testing the algorithm's ability to identify copyrighted content! Here's "Agua de Beber" downsampled by 34%:
Wikipedia has a comprehensive article on downsampling that I can't even pretend to understand, but let your ears speak for themselves: downsampled audio sounds like something emitted from a tinny 16-bit game console. I think of it as the audio equivalent of pixelization. Here's downsampling at 38.4%:
As you can hear, with greater downsampling, the audio loses 'fidelity.' At 38.4% we can still discern the lyrics and melody. Now listen to 44%:
44% is 'crunchy.' It's not particularly pleasant to listen to. At this point we're getting near something that sounds like haunted, circuit-bent Tickle Me Elmos. And, finally, brace yourself for 54.8%:
The rhythm's still there, but we've lost anything remotely resembling vocals or instrumentation. A human familiar with this song could still identify it, but it'd be miraculous if the algorithm could do the same with any song downsampled this hard.
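The pixelization comparison holds up surprisingly well for bitcrusher-style downsampling. A common way to do it is "sample-and-hold": grab a sample, repeat it until it's time to grab the next one, and skip the anti-aliasing filter that proper resampling would use (which is part of why it sounds so crunchy). Below is a minimal sketch of that idea. It assumes the percentage means "fraction of the original sample rate thrown away"; I don't know how Krusher actually maps its percentages, so treat the `reduction` parameter as a hypothetical stand-in.

```python
def sample_and_hold(samples, reduction):
    """Bitcrusher-style downsampling by sample-and-hold.

    ASSUMPTION: `reduction` is the fraction of the original sample rate thrown
    away (0.34 keeps 66% of it). Krusher's own percentage scale may differ.
    No anti-aliasing filter is applied, so high frequencies alias.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    step = 1.0 - reduction  # how far the capture "clock" advances per input sample
    phase = 1.0             # start at 1.0 so the very first sample is captured
    held = 0.0
    out = []
    for s in samples:
        if phase >= 1.0:    # clock ticked: capture a fresh sample
            held = s
            phase -= 1.0
        out.append(held)    # otherwise keep repeating the last captured one
        phase += step
    return out


print(sample_and_hold([1, 2, 3, 4, 5, 6, 7, 8], 0.5))  # [1, 1, 3, 3, 5, 5, 7, 7]
```

Each run of repeated values is one audio "pixel"; the harder you downsample, the longer the runs get and the more detail between them disappears.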
We'll "stretch" or "lengthen" songs as well. Nowadays you can do it while retaining the original pitch of the song. Here's "Agua de Beber" at half its normal speed:
And we'll shift the pitch of our songs. Here's the same song, pitched up:
And here it is pitched down:
In the past, channels would upload copyrighted content only slightly pitched up or sped up; that's how I watched The Boondocks and Daria in high school. Nowadays pitch-shifting isn't the silver bullet it once was. I'm more interested in investigating whether it has any efficacy at all against Content ID.
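The old "slightly sped up" trick and the modern pitch-preserving stretch are different operations, and a toy version of each makes the difference easier to see. This is a minimal sketch, not what Adobe Audition does internally: `varispeed` resamples like a tape running fast (shorter and higher at once), `ola_time_stretch` is a naive windowed overlap-add stretch that keeps pitch by re-spacing unchanged frames, and `pitch_shift` combines the two (stretch by a ratio, then play back that much faster) to change pitch while keeping the length. The function names, frame size, and hop size are my own illustrative choices.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def varispeed(samples, ratio):
    """Play `samples` back `ratio` times faster using linear interpolation.

    Like speeding up a tape: the clip gets shorter AND its pitch rises by
    the same ratio.
    """
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def ola_time_stretch(samples, factor, frame=1024, hop_out=256):
    """Naive overlap-add stretch: output lasts roughly `factor` times as long.

    Windowed frames are copied unchanged and only re-spaced, so the local
    waveform (and therefore the pitch) is kept. Real tools use smarter methods
    (e.g. WSOLA or phase vocoders) to avoid this version's phasing artifacts.
    Clips shorter than one frame are returned unchanged.
    """
    if len(samples) < frame:
        return list(samples)
    hop_in = hop_out / factor
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame) for k in range(frame)]
    n_frames = int((len(samples) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len  # running window sum, used to undo the overlap gain
    for f in range(n_frames):
        start_in = int(round(f * hop_in))
        start_out = f * hop_out
        for k in range(frame):
            out[start_out + k] += samples[start_in + k] * window[k]
            norm[start_out + k] += window[k]
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


def pitch_shift(samples, semitones):
    """Change pitch but keep duration: stretch by r, then play back r times faster."""
    r = semitones_to_ratio(semitones)
    return varispeed(ola_time_stretch(samples, r), r)
```

For example, `ola_time_stretch(clip, 2.0)` gives roughly the half-speed version, `pitch_shift(clip, 2)` raises it two semitones at about the same length, and `varispeed(clip, semitones_to_ratio(1))` is the old one-semitone-faster re-upload in a single step.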
I've stuck together this simple web app (the Sample Player 2000) so you can listen to the samples I tested yourself and peruse the results. Try spotting the patterns!