You finished your novel. Months of early mornings, late nights, and one particularly unhinged weekend where you wrote 8,000 words fueled by cold brew and spite. The manuscript is done. Your editor signed off. Your cover looks great.
And then someone asks: “Is there an audiobook?”
For most indie authors, the honest answer is that audiobook production costs somewhere between $3,000 and $5,000. A skilled narrator who can voice your characters and hold a listener’s attention for twelve hours doesn’t work for free. Nor should they.
AI narration has been chipping away at this problem for a couple of years now. Upload your manuscript, pick a voice, press generate. The technology keeps improving. But most AI audiobook tools give you one voice reading your entire book, which means your grizzled sea captain and your teenage protagonist sound like the same person adjusting their pitch slightly. (Not exactly immersive.)
Phil Marshall ran into this exact frustration. And because his previous career involved building conversational AI companies, his solution went a bit further than most.
A Surgeon, a Sci-Fi Novel, and a Missing Audiobook
Marshall’s career doesn’t follow a straight line. He trained as a surgeon, spent more than 25 years in technology, and led a conversational AI company in healthcare before selling it in 2021. Free from the corporate grind, he did what a lot of people dream of doing. He wrote a novel.
The result was Taming the Perilous Skies, a science fiction thriller with more than 100 characters, many with distinct accents. Marshall, who describes himself as an “audio-only reader,” naturally wanted to hear his own story. So he tried to build a multicast audiobook using the AI tools available at the time (ElevenLabs, Camtasia, a lot of patience).
The process was expensive and painful. Each character needed a separate voice. The workflow was clunky. Costs climbed into the thousands. And at the end of it, he had audio files sitting on his hard drive with no clear path to distribution or monetization. Ouch.
That gap between “I have a book” and “I have an audiobook people can actually find” became the problem he decided to solve. He brought in co-founder Andrew Wallner (they’d worked together at the previous AI company) and a technology co-founder named Brent, and they started building Spoken out of Portland, Oregon. The platform went through a year of beta testing with over 3,000 registered users before launching publicly in August 2025.
What Spoken Actually Does
The pitch is straightforward. Upload your manuscript, and Spoken turns it into a professionally narrated audiobook. But the way it gets there is where it diverges from other AI narration tools.
When you upload a manuscript (DOCX, TXT, or ePUB), Spoken’s AI doesn’t just start reading. It analyzes the entire text for genre, style, tone, and, most importantly, the characters. The system parses dialogue, identifies who’s speaking, and builds what the team calls mathematical representations of each character’s ideal voice profile.
From there, you choose your narration format. Single narrator, dual narration, or full cast. For multi-voice projects, Spoken matches characters to voices from a library of nearly 200 voice actors (who are compensated for their work) or generates custom AI voices designed around your character descriptions. If no existing voice matches your grizzled sea captain, you can describe what he sounds like and Spoken will create a voice exclusively for him, usable only by you. That’s pretty cool.
The narration itself is powered by partnerships with Hume AI and ElevenLabs. The system applies emotional cues at the dialogue level, attempting to understand when a character is angry or frightened or cracking a joke, and adjusting delivery accordingly. This isn’t text-to-speech reading words off a page. It’s closer to a directed performance, just one directed by algorithms instead of a sound engineer.
Once your audiobook is generated, you can review and edit specific passages, adjusting pacing and delivery. Re-narration doesn’t cost extra. You keep refining until you’re satisfied.
The Multi-Voice Difference
Full-cast narration at indie-author prices. That’s the short version.
Traditional multi-voice audiobook production (the kind where different actors voice different characters) is the most expensive format in publishing. We’re talking $25,000 or more for a single title. It’s why most audiobooks, even from major publishers, use a single narrator.
Spoken makes multi-voice narration the default experience. And the numbers suggest authors want it. During beta testing, 80% of projects used the multicast format.
The technical approach matters here. Rather than randomly assigning voices to character names, Spoken analyzes how a character speaks throughout the manuscript, their described physicality, and their emotional arc, then builds a voice profile that reflects those attributes. During early testing, 63% of authors actually preferred the AI-selected voices over their own manual choices. The manuscript analysis is doing real work.
For series writers, character voices persist across books. Your protagonist sounds the same in book three as they did in book one. That kind of consistency across hundreds of thousands of words is something even human narrators find challenging to maintain over multi-year projects.
Getting Your Audiobook Into the World
Creating the audio is only half the equation. Spoken also handles distribution.
Finished audiobooks can be published to Google Play, Spotify, Kobo, Barnes & Noble, TuneIn, OverDrive, and Everand, among others. Distribution partners include BookFunnel and Author Republic. You can also download your files (MP3 or LPF format) and distribute them independently.
One detail that matters to indie authors, and it’s a big one: you retain 100% ownership of everything you create. Spoken doesn’t claim royalties or rights to your content. Your audiobook, your IP.
The platform also offers streaming directly on spoken.press, with a listener discovery network. The revenue model for streaming gives authors a 50/50 split, which is notably more generous than the roughly 33% that Kindle Unlimited typically pays. I like that math a lot.
What It Costs
Spoken uses pay-per-project pricing. No upfront subscription required.
The base rate is $20 per 5,000 words, with the word count rounded up to the next 5,000-word block. A 46,000-word novel would be billed as 50,000 words (ten blocks) and cost $200.
For authors producing multiple audiobooks, a $50/month subscription cuts narration costs in half ($10 per 5,000 words). That same 50,000-word novel drops to $100 in narration fees, not counting the subscription itself.
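If you want to run the numbers for your own word count, here is a minimal Python sketch of the pricing described above. The rates and block size come from this article; the function names are my own, and the break-even figure assumes you pay the subscription for a single month and nothing else is charged.

```python
import math

BLOCK_WORDS = 5_000       # billing block size quoted above
PAYG_RATE = 20            # dollars per block, pay-per-project
SUBSCRIBER_RATE = 10      # dollars per block with the subscription
SUBSCRIPTION_FEE = 50     # dollars per month


def billable_blocks(word_count: int) -> int:
    """Round the word count up to whole 5,000-word blocks."""
    return math.ceil(word_count / BLOCK_WORDS)


def narration_cost(word_count: int, subscriber: bool = False) -> int:
    """Narration cost in dollars, excluding the monthly subscription fee."""
    rate = SUBSCRIBER_RATE if subscriber else PAYG_RATE
    return billable_blocks(word_count) * rate


def break_even_words() -> int:
    """Monthly word count at which one month's fee equals the per-block savings."""
    saving_per_block = PAYG_RATE - SUBSCRIBER_RATE
    return (SUBSCRIPTION_FEE // saving_per_block) * BLOCK_WORDS


print(narration_cost(46_000))                   # 200
print(narration_cost(46_000, subscriber=True))  # 100
print(break_even_words())                       # 25000
```

Under those assumptions, the subscription pays for itself once you narrate more than 25,000 words in a month. A single novella probably doesn’t justify it; a full-length novel does.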
There’s a free 5,000-word trial, so you can test the platform with a short story or a few chapters before spending anything. Definitely take advantage of that before committing.
For context, a traditionally produced audiobook of the same length would run $5,000 to $15,000. At $200, Spoken cuts that by 95% or more. It’s not free, but it fundamentally changes the math for indie authors who assumed audiobooks were out of reach.
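The percentage is easy to check yourself. This short sketch uses only the figures quoted above (the $200 pay-per-project price and the $5,000 to $15,000 traditional range); the function name is just for illustration.

```python
def savings_percent(traditional_cost: float, spoken_cost: float) -> float:
    """Percentage reduction relative to the traditional production cost."""
    return (1 - spoken_cost / traditional_cost) * 100


for traditional in (5_000, 15_000):
    pct = savings_percent(traditional, 200)
    print(f"${traditional:,} traditional: {pct:.1f}% lower")
```

Swap in a quote you have actually received from a narrator to see where your own project lands.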
Who This Is For (and Who It Isn’t)
Spoken is built for indie and self-published authors. If you’re producing your own books and want to add audiobooks to your catalog without a five-figure investment, this is squarely aimed at you. The pricing, the distribution options, the “you own everything” philosophy… all of it points at independent creators who want to control their work.
It’s particularly strong for fiction with multiple characters. The multi-voice system shines when your manuscript has distinct characters who need distinct voices. Romance with dual POVs, fantasy with a full ensemble. These are the projects where Spoken’s analysis and voice-matching pay off.
It’s not a replacement for a great human narrator. If you write literary fiction where the narrator’s vocal performance IS the product, where every pause and inflection carries meaning, AI narration isn’t there yet. Marshall himself acknowledges this openly, noting that “this is the worst it will ever be” as the technology continues to improve. But today, there’s still a gap between AI and a top-tier voice actor.
The platform is web-only. No desktop app, no mobile app. You need a browser and an internet connection.
It’s still young. Spoken launched publicly in August 2025 after a year of beta. The core technology is solid (3,000+ registered beta users helped shape it), but this isn’t a platform with five years of community feedback and battle-tested edge cases. If you prefer tools that have been through many iterations, temper your expectations accordingly.
Text uploads need some prep. For manuscripts longer than 20,000 words uploaded as text files, you’ll need to add chapter-break tags manually. ePUB uploads handle this automatically, which is the easier path for longer works. Save yourself the headache and go with ePUB if you can.
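If you are stuck with a plain text file, you can script the tagging instead of doing it by hand. The sketch below is only an illustration: the tag string is a made-up placeholder (this article doesn’t spell out Spoken’s actual tag syntax, so check their upload instructions), and it assumes each chapter heading sits on its own line in the form “Chapter 12” or “CHAPTER TWELVE”.

```python
import re

# Hypothetical placeholder: replace with the tag Spoken's upload instructions specify.
CHAPTER_TAG = "<<CHAPTER_BREAK>>"

# Assumption: a heading is a line containing only "Chapter" plus one number or word.
HEADING = re.compile(r"^\s*chapter\s+\w+\s*$", re.IGNORECASE)


def tag_chapters(text: str, tag: str = CHAPTER_TAG) -> str:
    """Insert `tag` on its own line before every line that looks like a chapter heading."""
    out = []
    for line in text.splitlines():
        if HEADING.match(line):
            out.append(tag)
        out.append(line)
    return "\n".join(out)


sample = "Chapter 1\nIt was a dark night.\nChapter Two\nMorning came."
print(tag_chapters(sample))
```

Headings with subtitles (“Chapter 1: The Storm”) won’t match this pattern, so adjust the regular expression to fit your manuscript and skim the output before uploading.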
For the thousands of novels that would never get an audiobook because the economics simply didn’t work, Spoken changes the equation. It won’t replace a gifted human narrator for a prestige literary title. But $200 for a full-cast audiobook of your 50,000-word novel? That’s a door that used to be locked shut for most of us, and it’s wide open now.