Hacker News

Naively I would think that there must be a service out there that auto-generates transcriptions from podcasts. YouTube already does this for (some) videos. The output isn't perfect, but it gets about 99% of the words correct.

So my suggestion to OP wouldn't be "don't do the podcast". Instead I would say "do the podcast, but consider that it might not be searchable without a bit of extra effort".



100% there are services! I believe that Azure ML/AI has a speech-to-text transcription service.

In theory I could try tweeting at Seth Juarez (@Microsoft) to see if I can get a sweetheart deal.

But the more I think about the previous poster's position on the text, the more I convince myself that it might be worth a few bucks an episode for me to pay out-of-pocket if it means improved accessibility for users with disabilities.

Not really because it's good business, but more because I've really come to believe in the value and power of implementing accessibility features.

I'm actually starting to get a little teary-eyed thinking about that blindness activist's viral video where he broke down sobbing the words "thank you" when he saw all of the accessibility features that Naughty Dog added to The Last of Us Part II. (I'll hunt down the video for people who haven't seen it, but be warned that you will probably bawl your eyes out like a schoolchild.)

From a cost perspective, I'll be putting in hundreds of fake dollars worth of time. The people I interview will be effectively donating hundreds of dollars worth of their own time.

The least I can do is put some serious thought into plunking down dozens of dollars of pocket money for a transcription service.

I'm more sold on the idea than I was in my previous post (5 minutes ago).

Edit: the tweet in question. https://twitter.com/stevesaylor/status/1271404306697158659


I haven’t personally used it, but I’ve heard good things about Descript (also their demo vids are hilarious).


IBM Watson, Google, Azure, and AWS all have speech-to-text APIs. IBM's claims to distinguish between different voices, although when I used it a couple of years ago for Japanese it was a little lackluster. It's pretty inexpensive: IBM gives you 500 minutes free per month and charges a few cents per minute thereafter; Google gives you 60 minutes free, and it's 4c/min thereafter. It's an API rather than a service, so you'd have to write a client to use it. IBM's API (and maybe the others) allows you to request time stamps on the output, so you could let people click on your transcription and seek directly to the part of the video they are interested in.
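To sketch what that click-to-seek idea might look like: once the API hands back per-word time offsets, rendering them as links is simple. The input shape below (a list of word/start-second pairs) and the `?t=` URL parameter are assumptions for illustration; each provider returns its own JSON, and the seek parameter depends on your player.

```python
def seekable_transcript(words, base_url):
    """Render (word, start_seconds) pairs as HTML links that seek
    into the recording, YouTube-style (?t=<seconds>)."""
    parts = []
    for word, start in words:
        # Truncate to whole seconds; most players only accept integers.
        parts.append(f'<a href="{base_url}?t={int(start)}">{word}</a>')
    return " ".join(parts)

# Hypothetical output from a speech-to-text API with word-level timestamps:
demo = [("Hello", 0.0), ("and", 0.4), ("welcome", 0.7), ("back", 1.3)]
print(seekable_transcript(demo, "https://example.com/episode1"))
```

The same loop could just as easily emit WebVTT cues or plain timestamped text, depending on where the transcript is published.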

I suppose you could always do it on your own machine with Sphinx, too, although I don't know how it compares to the others in accuracy.




