You finished your novel. Months of early mornings, late nights, and one particularly unhinged weekend where you wrote 8,000 words fueled by cold brew and spite. The manuscript is done. Your editor signed off. Your cover looks great.

And then someone asks: “Is there an audiobook?”

For most indie authors, the honest answer is that audiobook production costs somewhere between $3,000 and $5,000. A skilled narrator who can voice your characters, hold a listener’s attention for twelve hours, and deliver a performance worthy of the story you spent a year writing doesn’t work for free. Nor should they.

AI narration has been chipping away at this problem for a couple of years. Upload your manuscript, pick a voice, press generate. The technology keeps improving. But most AI audiobook tools give you one voice reading your entire book, which means your grizzled sea captain and your teenage protagonist and your sarcastic bartender all sound like the same person adjusting their pitch slightly.

Phil Marshall ran into this exact frustration. And because his previous career involved building conversational AI companies, his solution went a bit further than most.

A Surgeon, a Sci-Fi Novel, and a Missing Audiobook

Marshall’s career doesn’t follow a straight line. He trained as a surgeon, spent more than 25 years in technology, and led a conversational AI company in healthcare before selling it in 2021. Free from the corporate grind, he did what a lot of people dream of doing: he wrote a novel.

The result was Taming the Perilous Skies, a science fiction thriller with more than 100 characters, many with distinct accents. Marshall, who describes himself as an “audio-only reader,” naturally wanted to hear his own story. So he tried to build a multicast audiobook using the AI tools available at the time (ElevenLabs, Camtasia, a lot of patience).

The process was expensive and painful. Each character needed a separate voice. The workflow was clunky. Costs climbed into the thousands. And at the end of it, he had audio files sitting on his hard drive with no clear path to distribution or monetization.

That gap, between “I have a book” and “I have an audiobook people can actually find,” became the problem he decided to solve. He brought in co-founder Andrew Wallner (they’d worked together at the previous AI company) and a technology co-founder named Brent, and they started building Spoken out of Portland, Oregon. The platform went through a year of beta testing with over 3,000 authors before launching publicly in August 2025.

What Spoken Actually Does

The pitch is straightforward: upload your manuscript, and Spoken turns it into a professionally narrated audiobook. But the way it gets there is where it diverges from other AI narration tools.

When you upload a manuscript (DOCX, TXT, or ePUB), Spoken’s AI doesn’t just start reading. It analyzes the entire text for genre, style, tone, and, most importantly, the characters. The system parses dialogue, identifies who’s speaking, and builds what the team calls mathematical representations of each character’s ideal voice profile.

From there, you choose your narration format: single narrator, dual narration, or full cast. For multi-voice projects, Spoken matches characters to voices from a library of nearly 200 voice actors (who are compensated for their work) or generates custom AI voices designed around your character descriptions. If no existing voice matches your grizzled sea captain, you can describe what he sounds like and Spoken will create a voice exclusively for him, usable only by you.

The narration itself is powered by partnerships with Hume AI and ElevenLabs. The system applies emotional cues at the dialogue level, attempting to understand when a character is angry, frightened, or cracking a joke, and adjusting delivery accordingly. This isn’t text-to-speech reading words off a page. It’s closer to a directed performance, just one directed by algorithms instead of a human director.

Once your audiobook is generated, you can review and edit specific passages, adjusting pacing, tone, and delivery. Re-narration doesn’t cost extra. You keep refining until you’re satisfied.

The Multi-Voice Difference

So what does Spoken do that other AI audiobook tools don’t?

Full-cast narration at indie-author prices.

Traditional multi-voice audiobook production (the kind where different actors voice different characters) is the most expensive format in publishing. We’re talking $25,000 or more for a single title. It’s why most audiobooks, even from major publishers, use a single narrator.

Spoken makes multi-voice narration the default experience. And the numbers suggest authors want it: during beta testing, 80% of projects used the multi-cast format.

The technical approach matters. Rather than randomly assigning voices to character names, Spoken analyzes how a character speaks throughout the manuscript, along with their described physicality and emotional arc, and builds a voice profile that reflects those attributes. During early testing, 63% of authors preferred the AI-selected voices over their own manual choices. That manuscript analysis is doing real work.

For series writers, character voices persist across books. Your protagonist sounds the same in book three as they did in book one. That kind of consistency, across hundreds of thousands of words and dozens of characters, is something even human narrators find challenging to maintain over multi-year projects.

Getting Your Audiobook Into the World

Creating the audio is only half the equation. Spoken also handles distribution.

Finished audiobooks can be published to Google Play, Spotify, Kobo, Barnes & Noble, TuneIn, OverDrive, and Everand, among others. Distribution partners include BookFunnel and Author’s Republic. You can also download your files (MP3 or LPF format) and distribute them independently.

One detail that matters to indie authors: you retain 100% ownership of everything you create. Spoken doesn’t claim royalties or rights to your content. Your book, your audiobook, your intellectual property.

The platform also offers streaming directly on spoken.press, with a listener discovery network. The revenue model for streaming gives authors a 50/50 split, which is notably more generous than the roughly 33% that Kindle Unlimited typically pays.

What It Costs

Spoken uses pay-per-project pricing. No upfront subscription required.

The base rate is $20 per 5,000 words, with the word count rounded up to the next 5,000-word block. A 46,000-word novel is billed as 50,000 words (ten blocks) and costs $200.

For authors producing multiple audiobooks, a $50/month subscription cuts narration costs in half ($10 per 5,000 words). That same 50,000-word novel drops to $100 in narration fees, not counting the subscription itself.
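
For anyone who wants to run the numbers on their own manuscript, here is a minimal Python sketch of the pricing described above. The function and constant names are made up for illustration and are not part of any Spoken API. It assumes the subscription fee is billed separately from narration, and it does not model the free trial.

```python
# Hypothetical pricing calculator based on the rates quoted in this article.
# Names are illustrative only; this is not Spoken's own code or API.

BLOCK_WORDS = 5_000          # billing unit: 5,000-word blocks
STANDARD_RATE = 20           # dollars per block, pay-per-project
SUBSCRIBER_RATE = 10         # dollars per block with the $50/month plan


def billable_blocks(word_count: int) -> int:
    """Round the word count up to whole 5,000-word blocks."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-word_count // BLOCK_WORDS)


def narration_cost(word_count: int, subscriber: bool = False) -> int:
    """Narration fee in dollars, excluding any subscription fee."""
    rate = SUBSCRIBER_RATE if subscriber else STANDARD_RATE
    return billable_blocks(word_count) * rate


print(narration_cost(46_000))                   # 200
print(narration_cost(46_000, subscriber=True))  # 100
```

Because the subscription costs $50 a month on top of narration, a single 46,000-word book in one month comes to $150 under this sketch's assumptions, so the plan starts paying off on any project larger than five blocks (25,000 words).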

There’s a free 5,000-word trial, so you can test the platform with a short story or a few chapters before spending anything.

For context: a traditionally produced audiobook of the same length would run $5,000 to $15,000. At $200, Spoken cuts that by roughly 96% to 99%. It’s not free, but it fundamentally changes the math for indie authors who assumed audiobooks were out of reach.
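
The size of that reduction is easy to check. The short Python sketch below uses only the figures quoted in this article (a $200 Spoken project against a $5,000 to $15,000 traditional production). The helper name is hypothetical.

```python
# Compare the $200 pay-per-project price with the traditional range
# quoted above. Figures come from this article; the helper is illustrative.

def savings_percent(traditional_cost: float, spoken_cost: float) -> float:
    """Percentage by which spoken_cost undercuts traditional_cost."""
    return (1 - spoken_cost / traditional_cost) * 100


SPOKEN_COST = 200  # 50,000 billable words at $20 per 5,000

for traditional in (5_000, 15_000):
    pct = savings_percent(traditional, SPOKEN_COST)
    print(f"vs ${traditional:,}: {pct:.1f}% lower")
# vs $5,000: 96.0% lower
# vs $15,000: 98.7% lower
```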

Who This Is For (and Who It Isn’t)

Spoken is built for indie and self-published authors. If you’re producing your own books and want to add audiobooks to your catalog without a five-figure investment, this is squarely aimed at you. The pricing, the distribution options, the “you own everything” philosophy, all of it points at independent creators who want to control their work.

It’s particularly strong for fiction with multiple characters. The multi-voice system shines when your manuscript has distinct characters who need distinct voices. Romance with dual POVs, fantasy with a full ensemble, mystery with a roster of suspects: these are the projects where Spoken’s analysis and voice-matching pay off.

It’s not a replacement for a great human narrator. If you write literary fiction where the narrator’s vocal performance IS the product, where every pause and inflection carries meaning, AI narration isn’t there yet. Marshall himself acknowledges this openly, noting that “this is the worst it will ever be” as the technology continues to improve. But today, there’s still a gap between AI and a top-tier voice actor.

The platform is web-only. No desktop app, no mobile app. You need a browser and an internet connection.

It’s still young. Spoken launched publicly in August 2025 after a year of beta. The core technology is solid (3,000+ beta testers helped shape it), but this isn’t a platform with five years of community feedback and battle-tested edge cases. If you prefer tools that have been through many iterations, temper your expectations.

Text uploads need some prep. For manuscripts longer than 20,000 words uploaded as text files, you’ll need to add chapter-break tags manually. ePUB uploads handle this automatically, which is the easier path for longer works.

The Bottom Line

Spoken solves a real problem that has kept audiobooks out of reach for most indie authors: cost. But it doesn’t just solve it by generating cheap single-narrator AI audio. It makes multi-voice, full-cast narration, the format that used to require a $25,000 budget, accessible for a couple hundred dollars.

The platform was built by someone who had exactly the problem it addresses. Phil Marshall wanted a multi-voice audiobook of his 100-character sci-fi novel, couldn’t make it work with existing tools, and built something that could. A year of beta testing with thousands of authors shows in the details: voice persistence across series, emotional delivery at the dialogue level, a rights model that keeps authors in control of their work.

If you’ve been waiting for the economics of audiobook production to shift in your favor, Spoken is a significant move in that direction. It won’t replace a gifted human narrator for a prestige literary title. But for the thousands of novels that would never get an audiobook otherwise, it gives them a voice. Several voices, actually.

Spoken: What If Every Character in Your Audiobook Had Their Own Voice?