CoolTTS - JavaScript TTS Player with SSML

Summary: Combines TTS and SSML using pure JavaScript

ATTENTION! June 29, 2025: Google has fixed the speechSynthesis bug in the Google Chrome browser. speechSynthesis and CoolTTS now work with Google voices again! (Chrome Version 138.0.7204.50)

Introduction

Summary: CoolTTS combines TTS and SSML using pure JavaScript.

Terms:
TTS = Text-to-speech
SSML = Speech Synthesis Markup Language
speechSynthesis = The JavaScript interface built into many browsers for text-to-speech

Description: speechSynthesis has been part of many browsers for years. The original W3 specification said that speechSynthesis should work with SSML. However, to this day, no browser has built SSML into the speechSynthesis interface. "CoolTTS JavaScript TTS Player" attempts to fix this problem by providing simple JavaScript functions for TTS and SSML, including the SSML <break> tag, which adds pauses to the speech. CoolTTS also supports the SSML <mark> tag. The other problem with speechSynthesis is that each browser (or voice) either fails to implement some of the basic features of the speechSynthesis interface or has annoying bugs. "CoolTTS JavaScript TTS Player" attempts to fix these problems as well. Sadly, there is little that can be done about some of the speechSynthesis limitations of browsers. In addition, browser updates often introduce new bugs.

Features

JavaScript speechSynthesis
Voices: Microsoft Local Voices | Microsoft Online Voices (Edge for Desktop) | Google Voices (Chrome for Desktop) | iOS Voices | Android Voices
Avg Time between utterances: 1100ms | 900ms | 800ms | 500ms | 500ms
Avg Time between sentences: 1100ms | 800ms | 550ms | 500ms | 500ms
Word Boundary Event
Sentence Boundary Event
Pause Event
Resume Event
speechSynthesis.paused
SSML
JavaScript speechSynthesis With CoolTTS
Voices: Microsoft Local Voices | Microsoft Online Voices (Edge for Desktop) | Google Voices (Chrome for Desktop) | iOS Voices | Android Voices
Avg Time between utterances: 1100ms | 900ms | 800ms | 500ms | 500ms
Avg Time between sentences: 1100ms | 800ms | 550ms | 500ms | 500ms
Word Boundary Event
Sentence Boundary Event
Pause Event
Resume Event
cooltts.paused
SSML

Pricing

You may use this website for free to test CoolTTS. Make sure that you test in different browsers to see how speechSynthesis and CoolTTS have different voices and work differently in each browser.

To use CoolTTS on your own website, for a limited time, you can download the JavaScript file for the price of a donation. The suggested donation is $19.99 USD.

Download

Please thoroughly test CoolTTS using this web page in different browsers before downloading cooltts.js. Make sure that you understand the limitations and differences of speechSynthesis in different browsers and with different voices. CoolTTS uses ONLY the free voices that are built-in to the browser. It will not work with the voices that are available with subscription TTS services.

Please make a donation to reveal the download link.

For support for CoolTTS, please leave a comment below.

How To Use

Upload cooltts.js to your server in the same folder as your html file. Paste the following code in your html file to load cooltts.js:

In your html file, for every element that you want a CoolTTS Player to appear above it, you must add a cooltts class to it. Example:

	<div class="cooltts">

The player controls will not appear above the elements until most of the page is loaded. So if the web page has a lot of external scripts, advertisements or images then it may take a while for the player controls to appear. (*Note: On iOS devices speechSynthesis can stop working on web pages with external resources such as Google Ads. It is best not to have speechSynthesis or CoolTTS on websites with external resources like Google Ads.)

You can also send a string or an element directly to cooltts.play(). However, because of browser security policies, speech will probably not start unless a user gesture or interaction (such as a button click) invokes it.
To send a string: cooltts.play("Hello world!");
To send an element: cooltts.play(document.getElementById("speech_div"));

Other CoolTTS controls: cooltts.stop(); cooltts.pause(); cooltts.resume(); cooltts.rewind(); cooltts.fastforward();
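
Since speech usually needs a user gesture, one way to call cooltts.play() is from inside a click handler. The following is only a minimal sketch: wirePlayButton and the "speak_button" id are hypothetical names made up for this illustration, and only cooltts.play() comes from the description above.

```javascript
// Hypothetical helper (not part of cooltts.js): start speech only from
// inside a click handler, so the browser sees a user gesture first.
function wirePlayButton(button, player, target) {
	function onClick() {
		player.play(target); // target may be a string or an element
	}
	button.addEventListener('click', onClick, false);
	return onClick; // returned so the caller can remove the handler later
}

// Browser usage. Assumes cooltts.js is loaded and that the page has a
// button with the hypothetical id "speak_button".
if (typeof document !== 'undefined' && typeof cooltts !== 'undefined') {
	var speakButton = document.getElementById('speak_button');
	if (speakButton) wirePlayButton(speakButton, cooltts, 'Hello world!');
}
```

Passing the player in as an argument keeps the helper easy to try out with a stand-in object before cooltts.js is on the page.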

CoolTTS also dispatches custom cooltts events that your web page can listen for with an EventListener: document.addEventListener('cooltts', function(event) {console.log(event);}, false);
See eventListener for more information about the 'cooltts' event.

<break>

The break tag inserts a pause in the text-to-speech. It can have one of two attributes: strength or time.
strength can be none, x-weak, weak, medium, strong or x-strong.
time can be in seconds or milliseconds. Examples: time="250ms" or time="3s"
W3 Specification
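
To make the time format concrete, here is a small sketch that converts a time value into milliseconds. breakTimeToMs is a hypothetical helper written for this illustration; it is not taken from cooltts.js and does not cover the strength attribute.

```javascript
// Hypothetical helper (not part of cooltts.js): convert an SSML <break>
// time value such as "250ms" or "3s" into a number of milliseconds.
function breakTimeToMs(value) {
	var match = /^\s*(\d*\.?\d+)\s*(ms|s)\s*$/i.exec(String(value));
	if (!match) return null; // not a recognizable time value
	var amount = parseFloat(match[1]);
	return match[2].toLowerCase() === 'ms' ? amount : amount * 1000;
}

console.log(breakTimeToMs('250ms')); // 250
console.log(breakTimeToMs('3s'));    // 3000
```

A value that is neither seconds nor milliseconds returns null, so the caller can decide how to handle it.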

<audio>

The audio tag can be used to play an audio file during text-to-speech. When the audio tag is reached, text-to-speech pauses while the audio file plays. When the audio file ends, text-to-speech should resume. Text between the opening and closing audio tags is spoken if the audio file fails to play for some reason.
W3 Specification

Captions

CoolTTS will display captions of the text-to-speech if the variable cooltts.captions=true
Or to display captions you can add the cooltts_captions class to any element.

<emphasis>

The emphasis element requests that the contained text be spoken with emphasis or stress. The optional level attribute indicates the strength of emphasis to be applied. Defined values are "strong", "moderate", "none" and "reduced". The default level is "moderate".
W3 Specification

eventListener

CoolTTS dispatches custom cooltts events that you can listen for with an EventListener: document.addEventListener('cooltts', function(event) {console.log(event);}, false);

The JavaScript speechSynthesis interface built into most browsers only dispatches events on SpeechSynthesis utterances (except for the "onvoiceschanged" event). A web developer has to add multiple event listeners to each of the utterances that are sent to the speechSynthesis.speak() queue. The interface dispatches an event when each utterance starts and ends, but it doesn't dispatch an event for the beginning and ending of the entire queue of utterances. CoolTTS tries to solve that issue. Also, many of the better quality voices in most browsers do not dispatch many of the utterance events.
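
To show what that per-utterance bookkeeping looks like without CoolTTS, here is a rough sketch that uses only the standard onend and onerror handlers. speakQueue and onQueueEnd are hypothetical names for this illustration, and it assumes the selected voice dispatches "end" and "error" events at all.

```javascript
// Sketch with the plain speechSynthesis API: speak several utterances and
// call onQueueEnd once, after the last one has finished or failed.
// speakQueue and onQueueEnd are hypothetical names.
function speakQueue(synth, utterances, onQueueEnd) {
	var remaining = utterances.length;
	if (remaining === 0) { onQueueEnd(); return; }
	function done() {
		remaining -= 1;
		if (remaining === 0) onQueueEnd();
	}
	utterances.forEach(function(utterance) {
		// An utterance finishes with either "end" or "error", not both.
		utterance.onend = done;
		utterance.onerror = done;
		synth.speak(utterance);
	});
}

// Browser usage (call it from a click handler):
// speakQueue(speechSynthesis,
//	[new SpeechSynthesisUtterance('One.'), new SpeechSynthesisUtterance('Two.')],
//	function() { console.log('whole queue finished'); });
```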

statechange: CoolTTS sends an event with event.detail.type=="statechange" when the TTS player changes state. The value of event.detail.state can be: started, playing, paused, resumed, rewind, fastforward, ended, stopped. The variable cooltts.state can also be checked at any time to see the current state of text-to-speech.
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "statechange") {
			console.log(event);
			console.log("state: "+event.detail.state // started, playing, paused, resumed, rewind, fastforward, ended, stopped.
				+", node: ",event.detail.node // the text node
				,event.target // the player controls
				,event.detail.element); // the element being played
		}
	}, false);
</script>

start or end: Whenever a speechSynthesis utterance starts or ends SpeechSynthesisUtterance sends a SpeechSynthesisEvent. CoolTTS also sends the event along with a "cooltts" event.
W3 Specification
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "SpeechSynthesisEvent") {
			console.log(event.detail.event); // SpeechSynthesisEvent properties
			console.log("SpeechSynthesisEvent:"
				+" type:"+event.detail.event.type // start or end
				+", sentence:"+event.detail.sentence
				+", sentenceIndex:"+event.detail.sentenceIndex
				+", sentenceLength:"+event.detail.sentenceLength
				+", node:",event.detail.node // the text node, logged as an object
				,event.target // the player controls
				,event.detail.element); // the element being played
		}
	}, false);
</script>

boundary: If you use the robotic-sounding local Microsoft voices in Windows 11 in a browser (David, Mark and Zira), then the utterances dispatch "boundary" events for "sentence" and "word" boundaries. In Google Chrome, Google voices do not dispatch a "boundary" event. In Microsoft Edge, Microsoft Online (Natural) voices dispatch "boundary" events for "word" boundaries only. CoolTTS doesn't have a way to make a real "boundary" event for voices that don't support it. However, CoolTTS does make a pseudo sentence "boundary" event for voices like Google voices and Microsoft Online (Natural) voices: it divides speechSynthesis utterances into sentences, and the "start" and "end" events that CoolTTS dispatches for each utterance provide an event.detail.sentenceIndex and an event.detail.sentenceLength variable. For voices that support the word "boundary" event, you can listen for the 'cooltts' event:
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "SpeechSynthesisEvent" && event.detail.event.type == "boundary") {
			// Not all voices dispatch a boundary event
			console.log(event.detail.event); // SpeechSynthesisEvent properties
			console.log("name: "+event.detail.event.name // "word" or "sentence"
				+", charIndex: "+event.detail.event.charIndex 
				+", charLength: "+event.detail.event.charLength
				+", sentence:"+event.detail.sentence
				+", sentenceIndex:"+event.detail.sentenceIndex
				+", sentenceLength:"+event.detail.sentenceLength
				+", node: ",event.detail.node // the text node
				,event.detail.element); // the element being played
			// To calculate the word position in the text node:
			var word_start = event.detail.sentenceIndex + event.detail.event.charIndex;
			var word_end = word_start + event.detail.event.charLength;
		}
	}, false);
</script>
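
As a possible follow-up to the word position calculation, the sketch below pulls the current word out of the text node. currentWord is a hypothetical helper. It assumes that sentenceIndex plus charIndex is an offset into the text node (as the comment in the code above suggests) and falls back to the next run of non-space characters when a voice does not report charLength.

```javascript
// Hypothetical helper: return the word that is currently being spoken,
// given the detail object of a 'cooltts' boundary event.
function currentWord(detail) {
	var text = detail.node.textContent;
	var start = detail.sentenceIndex + detail.event.charIndex;
	var length = detail.event.charLength;
	if (typeof length !== 'number') {
		// Assumption: without charLength, treat the next run of
		// non-space characters as the word.
		var match = /^\S+/.exec(text.slice(start));
		length = match ? match[0].length : 0;
	}
	return text.slice(start, start + length);
}
```

The returned string could then be used to highlight the word or to drive an animation.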

pause: JavaScript speechSynthesis has a "pause" event, however for most of the better quality voices the "pause" event is never dispatched. Also speechSynthesis.paused is often "false" in most browsers for most voices even when speechSynthesis is paused. CoolTTS fixes this problem by dispatching a "paused" event whenever the speechSynthesis is paused. Also you can check the variable cooltts.paused. If it is "true" then speechSynthesis is paused.
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "statechange" && event.detail.state == "paused") {
			console.log(event);
			console.log("node: ",event.detail.node // the text node
				,event.target // the player controls
				,event.detail.element); // the element being played
		}
	}, false);
</script>

resume: JavaScript speechSynthesis has a "resume" event, however for most of the better quality voices the "resume" event is never dispatched. Also speechSynthesis.paused is often "false" in most browsers for most voices even when speechSynthesis is paused. CoolTTS fixes this problem by dispatching a "resumed" event whenever the speechSynthesis is resumed. Also you can check the variable cooltts.paused. If it is "true" then speechSynthesis is paused.
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "statechange" && event.detail.state == "resumed") {
			console.log(event);
			console.log("node: ",event.detail.node // the text node
				,event.target // the player controls
				,event.detail.element); // the element being played
		}
	}, false);
</script>

error: JavaScript speechSynthesis dispatches an "error" event when there is an error. It also dispatches an "error" event when speech is stopped on purpose; in that case the error value is "canceled" or "interrupted". CoolTTS passes the error event along.
W3 Specification
JavaScript Code:

<script>
	document.addEventListener('cooltts', function(event) {
		if (event.detail.type == "SpeechSynthesisErrorEvent") {
			console.log(event.detail.event); // The SpeechSynthesisErrorEvent properties
			console.log("error: "+event.detail.event.error // The type of error
				+", node: ",event.detail.node // the text node
				,event.detail.element); // the element being played
		}
	}, false);
</script>
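
Because "canceled" and "interrupted" arrive as error events even when nothing went wrong, it can help to separate them from real failures. A minimal sketch, where isRealSpeechError is a hypothetical helper and the detail fields are the ones shown in the code above:

```javascript
// Error values that mean speech was stopped on purpose.
var STOP_ERRORS = ['canceled', 'interrupted'];

// Hypothetical helper: true only when a 'cooltts' event carries an error
// other than "canceled" or "interrupted".
function isRealSpeechError(event) {
	var detail = event.detail || {};
	if (detail.type !== 'SpeechSynthesisErrorEvent') return false;
	return STOP_ERRORS.indexOf(detail.event.error) === -1;
}

// Browser usage
if (typeof document !== 'undefined') {
	document.addEventListener('cooltts', function(event) {
		if (isRealSpeechError(event)) {
			console.warn('Speech failed: ' + event.detail.event.error);
		}
	}, false);
}
```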

mark: The W3 Specification for the JavaScript speechSynthesis interface says that a mark event should be fired when a mark tag is reached, and SpeechSynthesisUtterance has an "onmark" event listener. However, the event apparently never fires, presumably because none of the browsers ever integrated SSML into the speechSynthesis interface. CoolTTS attempts to fix that by providing a "mark" event. See <mark> for how to use it.

Hidden elements

CoolTTS will speak elements that are not visible to the user such as elements with CSS display:none or visibility:hidden. If you do not want CoolTTS to speak these elements then you can add class="cooltts_skip" to the element.

Example:
The element for the player below has CSS display:none;

Example:
The element for the player below has CSS visibility:hidden;

xml:lang

xml:lang is a defined attribute for the speak, lang, desc, p, s, token, and w elements. It accepts a 2 letter language code and an optional 2 letter country code. CoolTTS will add the value of the xml:lang attribute to the utterance being spoken by the speechSynthesis interface. This may change the voice that is being used. However, if a Microsoft Online Multilingual voice is selected then the same voice will likely be used.
W3 Specification
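
To see which voices a given language code could select in the current browser, something like the sketch below may help. voicesForLang is a hypothetical helper; it expects objects with name and lang properties, like the ones returned by speechSynthesis.getVoices(). It also accepts an underscore in place of the hyphen, on the assumption that some platforms report codes such as "en_US".

```javascript
// Hypothetical helper: list the voices whose lang matches a code such as
// "en" or "en-US".
function voicesForLang(voices, code) {
	var wanted = code.toLowerCase().replace('_', '-');
	return voices.filter(function(voice) {
		var lang = voice.lang.toLowerCase().replace('_', '-');
		// Exact match ("en-us") or language-only match ("en" matches "en-us").
		return lang === wanted || lang.indexOf(wanted + '-') === 0;
	});
}

// Browser usage:
// console.log(voicesForLang(speechSynthesis.getVoices(), 'en-US'));
```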

<mark>

The mark tag can be used to place a mark for an event that you want to happen at that mark. Add an event listener for the mark. Each mark should have a name attribute, such as: <mark name="my_mark">

According to https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/mark_event browsers are supposed to have a built-in "mark" eventListener for speechSynthesis utterances. However, it appears that the event is never fired in browsers because they do not correctly parse SSML. Therefore, CoolTTS has its own method for parsing mark tags and using an event listener for the "cooltts" event.

Sadly, with JavaScript speechSynthesis there is a slight pause when a mark tag is reached, because one utterance ends and another begins, and the popular browsers (Google Chrome and Microsoft Edge) and their quality voices pause slightly between utterances. So if the mark tag is in the middle of a sentence, it will sound strange.

You can listen for a mark event by adding an event listener for the "cooltts" event: document.addEventListener('cooltts', function(event) {console.log(event);}, false);

Mark event properties:

event.target: The player controls of the event
event.detail.type: mark
event.detail.name: The name given in the name attribute of the mark tag
event.detail.element: The element that is being played with TTS

W3 Specification
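
One way to react to named marks is to route the "cooltts" event through a small lookup of callbacks. This is only a sketch: makeMarkListener is a hypothetical helper, while my_mark and the event properties are the ones listed above.

```javascript
// Hypothetical helper: build a 'cooltts' listener that runs one callback
// per mark name. The callbacks object maps mark names to functions.
function makeMarkListener(callbacks) {
	return function(event) {
		var detail = event.detail;
		if (!detail || detail.type !== 'mark') return; // not a mark event
		if (!Object.prototype.hasOwnProperty.call(callbacks, detail.name)) return;
		callbacks[detail.name](detail.element, event.target);
	};
}

// Browser usage: react when <mark name="my_mark"> is reached.
if (typeof document !== 'undefined') {
	document.addEventListener('cooltts', makeMarkListener({
		my_mark: function(element) { console.log('Reached my_mark in', element); }
	}), false);
}
```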

<p>

A p element represents a paragraph. An s element represents a sentence. Both elements can have a lang attribute.

<s>

An s element represents a sentence. s elements can have a lang attribute. If you use an <s> tag in a standard html document then it will be parsed by the html browser as a strikethrough tag. To stop that from happening you may want to add style="text-decoration: inherit;" to the tag.

<phoneme>

The phoneme element provides a phonemic/phonetic pronunciation for the contained text. Unfortunately, there seems to be no good method for doing phoneme pronunciations with the JavaScript speechSynthesis interface in today's browsers. It would always sound jerky and inaccurate. Therefore, the phoneme tag will probably never be part of CoolTTS.
W3 specification

<prosody>

The prosody element permits control of the pitch, speaking rate and volume of the speech output. All of the attributes are optional. Because of limitations in how browsers implement the speechSynthesis interface it is impossible to follow completely the W3 specification for prosody.

prosody attributes:

pitch: Adjusts the pitch of the voice. The value can be a number from 0 to 2, where 1 is the default value. (Note: Google voices only go as low as 0.1. Also note: Microsoft Natural voices cannot change pitch at all.) Or the value can be: "x-low", "low", "medium", "high", "x-high", or "default".
rate: Adjusts the rate or speed of the speech. The value can range from 0.1 to 2, where 1 is the default value. (Note: Microsoft local voices or non-Natural voices can have a value from 0.1 to 10.) Or rate can be a percentage from 10% (slowest) to 200% (fastest), where 100% is the default rate. Or rate can be: "x-slow", "slow", "medium", "fast", "x-fast", or "default".
volume: Adjusts the volume of the speech. speechSynthesis in browsers only allows for a volume from 0 (lowest) to 1 (highest). The default value is 1. Or the value for volume can be: "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default".
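
As an illustration of the numeric and percentage forms of rate, the sketch below converts either one to a number and clamps it to the 0.1 to 2 range given above. prosodyRateToNumber is a hypothetical helper. It does not show how CoolTTS itself parses the attribute, and it leaves the named values ("slow", "fast" and so on) alone.

```javascript
// Hypothetical helper: turn a numeric or percentage prosody rate into a
// number between 0.1 and 2. Named values are not handled and return null.
function prosodyRateToNumber(value) {
	var match = /^(\d*\.?\d+)(%?)$/.exec(String(value).trim());
	if (!match) return null;
	var rate = parseFloat(match[1]);
	if (match[2] === '%') rate = rate / 100; // "150%" becomes 1.5
	return Math.min(2, Math.max(0.1, rate));
}

console.log(prosodyRateToNumber('150%')); // 1.5
console.log(prosodyRateToNumber('5'));    // 2 (clamped)
```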

<say-as>

The say-as element allows the author to indicate information on the type of text construct contained within the element and to help specify the level of detail for rendering the contained text.
W3 Specification

The interpret-as attribute supports the following values:

  • verbatim or spell-out
  • digits
  • date
  • cardinal
  • ordinal
  • time
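
How CoolTTS handles each interpret-as value internally is not described here. Purely as an illustration of the idea behind spell-out and digits, the sketch below rewrites text so that a speech engine is more likely to read it one character at a time. spellOut and digitsOnly are hypothetical helpers, and the result still depends on the voice.

```javascript
// Hypothetical helpers, not CoolTTS internals.

// "SSML" becomes "S S M L": every non-space character stands alone.
function spellOut(text) {
	return Array.from(String(text)).filter(function(ch) {
		return ch.trim() !== '';
	}).join(' ');
}

// "555-1212" becomes "5 5 5 1 2 1 2": only the digits are kept.
function digitsOnly(text) {
	return String(text).replace(/\D/g, '').split('').join(' ');
}
```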

Skip element

CoolTTS will speak elements that are not visible to the user such as elements with CSS display:none or visibility:hidden. If you do not want CoolTTS to speak these elements then you can add class="cooltts_skip" to the element.

This is a feature of CoolTTS, not SSML. You can do something similar in SSML using the sub tag and alias="".

<sub>

The sub element is employed to indicate that the text in the alias attribute value substitutes the contained text for pronunciation. This allows a document to contain both a spoken and written form. The REQUIRED alias attribute specifies the string to be spoken instead of the enclosed string.
W3 Specification

Note: If you use the <sub> tag in an HTML document, the browser will treat it as the HTML subscript tag, which can have the undesirable effect of making the text smaller and lower than the surrounding text. To prevent that from happening you may want to add style="font-size: inherit; vertical-align: inherit;" to the element.

<voice>

The voice element allows you to attempt to change the voice by any combination of name, gender, age or language attributes.
W3 Specification

  • name

    For the name attribute you can put any name that is available for TTS in the browser that is being used. CoolTTS allows partial matching and the | (OR) operator. For example, if you put name="Brian|Male" and the user is using the Microsoft Edge browser, then it will probably choose the voice "Microsoft Brian Online (Natural) - English (United States)". If the user is using the Google Chrome browser, then it will choose a voice with the word "Male" in it, most likely "Google UK English Male".

  • gender

    Options for the gender attribute are "male", "female", "neutral", or the empty string "".

    Unfortunately, the JavaScript speechSynthesis interface in modern browsers doesn't have a method for gender. Google Chrome's "Google" voices mostly have only one gender for each language, and that gender is female. The two exceptions are Spanish and UK English, which both have male and female voices. If gender="male" is included in the voice tag, CoolTTS will attempt to make the other languages and dialects sound "male" by changing their pitch. Microsoft Online Natural voices cannot change pitch, so a male or female voice has to be chosen instead. But Microsoft does not specify whether a voice is male or female for each language; instead, each voice has a gendered first name. Rather than keeping a long list of female and male names for the Microsoft Edge browser, CoolTTS will choose one of the Multilingual male or female voices.

  • age

    The JavaScript speechSynthesis interface does not have a method for age. In Google Chrome the pitch of the voice can be raised to try to mimic a child. The pitch can also be lowered in Chrome and the rate slowed a bit to try to mimic an older person. Microsoft Online Natural voices cannot change pitch, so in the Microsoft Edge browser the only thing that can be done to mimic an older person is to slow down the rate a little. Microsoft Edge includes at least 3 child voices at the moment: Maisie (en-GB), Ana (en-US), Eloise (fr-FR). CoolTTS will pick one of those voices for an age specification of 12 or under.

  • language

    The language attribute accepts a 2 letter language code and an optional 2 letter country code, such as "en-US". CoolTTS will add the value of the language attribute to the utterance being spoken by the speechSynthesis interface. This may change the voice that is being used. However, if a Microsoft Multilingual voice (Microsoft Edge) is selected then the same voice will likely be used.
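
The partial name matching with the | operator described under the name attribute could be sketched as follows. This is not the code in cooltts.js. findVoiceByName is a hypothetical helper, and it assumes that alternatives are tried from left to right and that matching is case-sensitive (so that "Male" does not also match "Female").

```javascript
// Hypothetical sketch of partial voice name matching with the | operator.
// Returns the first matching voice, or null when nothing matches.
function findVoiceByName(voices, pattern) {
	var alternatives = pattern.split('|');
	for (var i = 0; i < alternatives.length; i++) {
		var wanted = alternatives[i].trim();
		if (!wanted) continue; // skip empty alternatives such as "Brian|"
		for (var j = 0; j < voices.length; j++) {
			// Case-sensitive on purpose: "Male" must not match "Female".
			if (voices[j].name.indexOf(wanted) !== -1) return voices[j];
		}
	}
	return null;
}

// Browser usage:
// var voice = findVoiceByName(speechSynthesis.getVoices(), 'Brian|Male');
```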

Browser Limitations of JavaScript speechSynthesis

Desktop and Laptop Computers

It seems that only two browsers have put any effort into the JavaScript speechSynthesis interface: Google Chrome and Microsoft Edge. Edge has put a little more effort into the programming and has a nice selection of voices. Other Chromium browsers (Opera, Brave, Vivaldi) and Firefox have not put much effort into speechSynthesis. They MIGHT have a few older, robotic-sounding voices available that come with the operating system. In Windows, they might have older Microsoft voices such as Microsoft David, Mark and Zira. Please do not expect CoolTTS to work well with these browsers. If users want a good-sounding text-to-speech interface for free, then they need to use either Google Chrome or Microsoft Edge. Note that JavaScript speechSynthesis in every popular browser can only play one utterance at a time. So if a new utterance is started, even in a different tab and on a different website, the first utterance will stop playing.

To test all the capabilities of JavaScript speechSynthesis without CoolTTS go to: https://seabreezecomputers.com/tts

Events: Microsoft local voices are available in most Windows browsers. Microsoft local voices are usually low quality but they dispatch more events than other voices including Google voices. They dispatch "pause", "resume", and word and sentence "boundary" events.

  • Google Chrome

    Events: Google Chrome has a few high quality Google voices in various languages. But none of the Google voices fire a "boundary" event. Utterances using Google voices also never fire a "pause" or "resume" event. CoolTTS attempts to solve this problem by dispatching a "paused" and "resumed" event as well as many other events. When utterances using Google voices are paused, speechSynthesis.paused still stays false. CoolTTS provides a more accurate cooltts.paused variable.

    Pause Bug: Google voices don't resume if speechSynthesis.pause() is invoked in the middle of an utterance and then resume() is invoked more than 15 seconds later. Google voices also sometimes ignore pause() if the pause is invoked close to the end and start of utterances. CoolTTS attempts to solve these issues by using its own queue instead of the speechSynthesis.speak() queue and by pausing and resuming based on sentences rather than in the middle of words.

    Time limit Bug: Another issue with Google voices is that they will stop and get stuck after about 14 seconds of text to speech. CoolTTS attempts to solve this problem by calling speechSynthesis.pause() and speechSynthesis.resume() every 12 seconds. However, there is a slight stutter when those commands are called. If Chrome gets stuck then calling speechSynthesis.speak() with a new utterance does nothing unless speechSynthesis.cancel() is called first.

    Blank Utterance Bug: If an utterance is blank with a Google voice then Chrome never fires the "start" event on the utterance. Instead it goes right to the "end" event for the utterance.

    Pitch: You are able to change the pitch, rate and volume on Google voices.

    New Lines: A nice feature of Google voices is that they ignore new lines (\n) in the middle of sentences and speak them as the same sentence. New lines in HTML are generally not visible but are treated as a space, so it makes sense that Google voices deal with them this way. However, if the new line starts with a capital letter then Google voices treat them as a different sentence. This also makes sense.

  • Microsoft Edge

    Events: Microsoft Edge browser has dozens of high quality Online Natural voices in many languages. The Microsoft Online Natural voices fire a "boundary" event for "word" boundaries, but not for "sentence" boundaries. Utterances using Microsoft Natural voices also never fire a "pause" or "resume" event. CoolTTS attempts to solve this problem by dispatching a "paused" and "resumed" event as well as many other events. When utterances using Microsoft Online Natural voices are paused, speechSynthesis.paused still stays false. CoolTTS provides a more accurate cooltts.paused variable.

    Pitch: Another issue with Microsoft Online Natural voices is that the pitch never changes even when setting the pitch variable on the utterance. Setting the volume and rate variables does work though.

    New Lines: Both Microsoft Local voices and Microsoft Online voices treat new lines (\n) as a new sentence. HTML usually doesn't display new lines but treats them as spaces or white space. When Microsoft voices treat new lines as new sentences then it can cause unwanted pausing in the middle of a sentence. CoolTTS attempts to solve this potential problem.

  • Mozilla Firefox

    Mozilla Firefox might have some older, robotic-sounding voices that come with the operating system. It will probably be using Microsoft local voices in Windows. English might be the only language available. The JavaScript speechSynthesis interface works for the most part in the Firefox browser. However, if there is more than one utterance in the speechSynthesis.speak() queue, then Firefox ignores speechSynthesis.pause(), so the <break> tag doesn't work. To compensate, CoolTTS changes the cooltts.use_cooltts_queue variable to true, which causes CoolTTS to use its own queue instead of the speechSynthesis.speak() queue, and then the <break> tag works.
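
The pause/resume workaround for the Chrome time limit bug described above can be sketched with the standard API alone. This is an illustration rather than the code inside cooltts.js. startKeepAlive is a hypothetical name, the 12 second default comes from the description above, and the timers parameter exists only so the sketch can be tried out without waiting.

```javascript
// Sketch: call pause() and resume() at a fixed interval while speech is
// playing, so that a long utterance does not get stuck in Chrome.
function startKeepAlive(synth, intervalMs, timers) {
	timers = timers || {
		set: function(fn, ms) { return setInterval(fn, ms); },
		clear: function(id) { clearInterval(id); }
	};
	var id = timers.set(function() {
		if (synth.speaking) {
			synth.pause();
			synth.resume();
		}
	}, intervalMs || 12000);
	// Call the returned function when speech ends or the user pauses,
	// otherwise the timer would resume a pause the user asked for.
	return function stop() { timers.clear(id); };
}

// Browser usage:
// var stopKeepAlive = startKeepAlive(speechSynthesis);
// utterance.onend = stopKeepAlive;
```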

Mobile Devices

Mobile device browsers on iPhones, iPads and Android devices are not very good at speechSynthesis either. The mobile browsers usually don't have the same quality voices as their desktop browser counterparts.

  • Volume:

    Often times users of mobile devices turn their media volume all the way down at some point when browsing the Internet. Then if they press play on a web page with speechSynthesis they don't hear any audio. They usually end up pausing or stopping the speechSynthesis and then they try to press the volume up button on the side of the device. But with the media paused the volume up button is only changing the ringer volume. The user has to figure out how to press the play button and then press the volume up button on the side of the device WHILE the media is playing. So it can be difficult for some users of mobile browsers to figure out how to hear the audio from TTS speechSynthesis. There are no methods in JavaScript for detecting the volume level of a mobile device.

  • iOS:

    Mute: If an Apple mobile device is soft muted (muted with the button on the side of the device) then JavaScript speechSynthesis will be silent. There is no indication to the user that their device is muted. There is also no JavaScript method for detecting if a device is muted. So a user may have a difficult time figuring out how to hear speechSynthesis on an iOS device like an iPhone or iPad.

    Voices: It seems that every browser for iOS is just a skin on top of the Safari engine, so speechSynthesis on iOS will probably run the same in every browser on iPhone/iPad. Starting around iOS 16, Apple installed a lot of voices for JavaScript speechSynthesis. But half of them seem to be some kind of joke. They are robotic-sounding and have strange sound effects or instruments with the voice playback. ("Sandy", "Shelley", "Grandma", "Grandpa", "Eddy", "Reed", "Anna", "Rocko", "Flo", "Bahh", "Albert", "Jester", "Organ", "Cellos", "Zarvox", "Superstar", "Bells", "Trinoids", "Kathy", "Boing", "Whisper", "Good News", "Wobble", "Bad News", "Bubbles", "Junior", "Ralph") CoolTTS filters these voices out of its voice options. What is left are some OK-quality voices for some languages; however, iOS seems to list the same voices many times.

    External resources/Ads: Another issue with iOS devices is that they seem to not fire any events on a SpeechSynthesis utterance (except for an error event if the utterance is canceled) if the web page is large with a lot of information. If that is the case, then iOS will probably only play one sentence with CoolTTS and then pause. Many websites with external resources or advertisements like Google Ads often mess up the JavaScript speechSynthesis so that it does not work properly. So it is best to use JavaScript speechSynthesis on a web page without ads and external resources for iOS devices.

    lang: Changing the "lang" attribute for an utterance in iOS does not change the voice like it does with other browsers. In iOS you have to choose a voice to change the language.

    Events: iOS voices dispatch "pause" and "resume" events. iOS voices also dispatch "boundary" events, but only for word boundaries, not sentence boundaries. speechSynthesis.paused correctly reports "true" when speechSynthesis.pause() is invoked.

    New Lines: iOS voices seem to ignore new lines (\n) in the middle of sentences and speak them as the same sentence. New lines in HTML are generally not visible but are treated as a space, so it makes sense that iOS voices deal with them this way. However, even if the new line starts with a capital letter, iOS voices still treat it as the same sentence. This may make two different sentences sound like one. CoolTTS attempts to solve this issue.

  • Android:

    Events: Android does not dispatch "pause" or "resume" events. speechSynthesis.paused always reports "false" even after speechSynthesis.pause() is invoked. Android voices also do not dispatch "boundary" events.

    Pause Bug: On some Android devices speechSynthesis.resume() does not work after speechSynthesis.pause() and the speechSynthesis utterance remains paused indefinitely. CoolTTS attempts to solve this issue.

    New Lines: Android voices seem to treat new lines (\n) as a new sentence. HTML usually doesn't display new lines but treats them as spaces or white space. When Android voices treat new lines as new sentences then it can cause unwanted pausing in the middle of a sentence. CoolTTS attempts to solve this potential problem.
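
The filtering of novelty voices and duplicate entries mentioned in the iOS section could look roughly like the sketch below. usableVoices is a hypothetical helper, not the CoolTTS implementation. It assumes that a voice's name property equals one of the quoted names exactly and that duplicates share the same name and lang.

```javascript
// Voice names quoted in the iOS section above.
var NOVELTY_VOICES = ['Sandy', 'Shelley', 'Grandma', 'Grandpa', 'Eddy',
	'Reed', 'Anna', 'Rocko', 'Flo', 'Bahh', 'Albert', 'Jester', 'Organ',
	'Cellos', 'Zarvox', 'Superstar', 'Bells', 'Trinoids', 'Kathy', 'Boing',
	'Whisper', 'Good News', 'Wobble', 'Bad News', 'Bubbles', 'Junior', 'Ralph'];

// Hypothetical helper: drop novelty voices and repeated entries.
function usableVoices(voices) {
	var seen = {};
	return voices.filter(function(voice) {
		if (NOVELTY_VOICES.indexOf(voice.name) !== -1) return false;
		var key = voice.name + '|' + voice.lang;
		if (seen[key]) return false; // same voice listed again
		seen[key] = true;
		return true;
	});
}

// Browser usage:
// var voices = usableVoices(speechSynthesis.getVoices());
```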

Future Development

If there is enough interest for this project then I will continue to work on it, fixing bugs and possibly adding new features.

There are no plans to make this script work with subscription TTS services. Those services have their own methods for processing JavaScript voices and SSML using their own APIs. Those subscription services can get expensive. The point of this project is to use the JavaScript speechSynthesis interface built-in to many modern day browsers and to provide a method to use it with SSML.

For support questions, please leave a comment below.

History

6/3/2025 - Version 1.1 - Improved applying settings changes while playing or paused.

5/31/2025 - Version 1.0 - CoolTTS JavaScript TTS Player created.

Last updated on June 30, 2025
Created on January 21, 2025
