Waiting For Podcast Transcripts

John Gruber talks about one of podcasting’s unsolved problems:

What’s the biggest problem with podcasts, in general? I have one specific gripe, and it applies to almost every podcast I’m aware of, and it’s the lack of transcripts. For a couple of reasons: one, for me selfishly, wouldn’t it be great to have a searchable archive of all the shows that I’ve done? … and then the other big thing would be, it’d be Google-searchable by everybody else, not just me…

The Talk Show, Dec 13, 2013, @5:57 mark

He’s right: podcasts aren’t text. You can’t search them or scan them.

This daydream of Gruber’s is one shared by many web publishers: to give every recording a text equivalent that can be indexed and searched and infused with Google-juice. Maybe that would be nice; podcasts could become full-fledged citizens of the web again, instead of the distant satellites they are now.

An even more important consideration, though, is durability. Historically, text has a far higher long-term survival rate than recorded audio. Every book or magazine ever printed can still be read; but every radio broadcast ever recorded is trapped inside a tape. The same is true of digital text and audio. Any text you put on the web, no matter how casual or insignificant, is granted eternal life pretty much by default — you don’t even have to give a thought to preserving it. And even text that predates the web has a good chance of getting broomed into an online collection somewhere. The same is not true of recorded audio. Because audio isn’t searchable, the vast archival machines of the web ignore it altogether.

At some point, voice recognition will improve to the point where it can make perfect transcriptions from recordings of casual conversations, even to the point of distinguishing between speakers, and then folks like Gruber can have their cake and eat it too: zero-preparation podcasts and complete transcripts.[1] Almost no one will read any one transcript, because a zero-prep podcast is usually fish-wrap after a few weeks. But as a whole, the “long tail” of artifacts made by podcasters and broadcasters has great future value; textifying them would keep them around in useful form long enough to realize it.

Meanwhile: podcasts that actually have some preparation put into them already have plenty of accompanying text. Because preparation means organizing your thoughts, and that means writing, which means text. Whether you’re concerned about utility or durability, the best way to ensure either is still to put some care and craft into your work.


  1. When we do get to that point, the implications are going to reach far beyond podcasting. Imagine, for example, your phone being able to keep a continuous running transcript of everything said by you or those around you, to which you could refer later. What would life be like in a world where nothing is off the record? (Also, would a text transcript be treated differently than surreptitiously recorded audio under current laws?) (Also, does it say something about the mindset of web publishers that I’ve treated the direction of technology as a given and put all the legal and social implications in a footnote as an afterthought?) 

As the Guardian’s technology editor, Charles Arthur, points out in the Independent back in 2005, “Podcasts take content and put it into a form that can’t be indexed by search engines or be speed-read, and which you can’t hyperlink to (or from). A podcast sits proud of the flat expanse of the Internet like a poppy in a field. Until we get really good automatic speech-to-text converters, such content will remain outside the useful, indexable web.”

Why Audio Never Goes Viral on digg.com, Jan 14, 2014.

The article is very long and the whole thing is worth reading.

Joel Dueck (Author)