Speech Recognition Anywhere - Chrome Extension


With "Speech Recognition Anywhere" you can control the Internet with your voice. Use speech recognition to fill out any input, textarea, form or document on the web! Whatever you say is automatically typed into any form on any web page. "Speech Recognition Anywhere" can also be used as an awesome Virtual Assistant in Chrome. Download the Speech Recognition Anywhere Chrome Extension today.

Get Extension


  • Virtual Assistant Mode
  • Choose between dozens of languages and dialects for speech recognition
  • Dictate emails and online documents
  • Fill in forms with your voice
  • Go to the next or previous field with your voice
  • Go to any web page with your voice
  • Switch tabs and navigate webpages with your voice
  • Scroll page up or down
  • Click on links and buttons with your voice
  • Cut, Copy, Paste, Clear, Highlight
  • Say "Show labels" to see labels for the buttons on a webpage
  • Say "Play (name of artist or song)" to play music instantly
  • Create Custom Voice Commands
  • Text To Speech
  • Scripting

Custom Commands

Look in the comments below for "Custom Commands" that you can add to Speech Recognition Anywhere. If you have created some awesome commands for Speech Recognition Anywhere, please share them in the comments section below. (If the action contains URLs (http: or https:), please surround them with <code> </code> tags so the comment box does not convert them to links.)

You do not need regular expressions to create custom commands, but regular expressions will make your custom commands more powerful. For example, you could create this basic custom command:

Phrase: Display the weather satellite
Description: Display the weather satellite
Action: https://weather.weatherbug.com/maps/

With this command you would have to say the exact phrase "Display the weather satellite" for the satellite image to display. If you use regular expressions, as in the example below, you can say a number of similar sentences to activate the command:

Phrase: (?:Display|Show)(?:.*?)?(?:satellite|clouds)(?:.*?)?(?:for |of |in )?(.*?)?
Description: Display the weather satellite
Action: https://weather.weatherbug.com/maps/$1

With the above phrase you could say "Show me the clouds" or "Display the weather satellite for New York". Here is a breakdown of the phrase:


(?:Display|Show) means to look for either "display" or "show". The | symbol means "or". Putting ?: at the beginning of the match inside the parentheses () means to look for the match but don't remember the match.

(?:.*?)? means to look for any number of optional words like "the" or "the weather" and do not remember the match. The ? at the end outside of the parentheses () means these words are optional. For example, you could say "Show me the weather satellite" or you could just say "Show satellite".

(?:satellite|clouds) means that the user has to say either "satellite" or "clouds" in the phrase for the phrase to be detected.

(?:.*?)? means that we again look for any number of optional words.

(?:for |of |in )? means that we look for "for " or "of " or "in " so that the user can say "Display the satellite for Colorado". Putting the ? at the end means this is optional.

(.*?)? means that we look for one more optional word or group of words at the end. This time we don't put ?: at the beginning inside the parentheses () because we want to remember the match to use it later on. We want to remember the last word of a spoken command like "Show me the clouds in London". The remembered match can then be used in the action: https://weather.weatherbug.com/maps/$1 . The $1 will be replaced with London in the url. $1 is used for the first match, $2 for the second, etc.

If you want to put the whole spoken command in the action, use $0. For example, if you want to let Google decide how to play music for you, you could use this phrase: Play (.*?) . The spoken command could be "Play Lady Gaga", and the action could be http://www.google.com/search?btnI&q=$0 . Because $0 matches the entire phrase "Play Lady Gaga", what is sent to Google is: http://www.google.com/search?btnI&q=Play Lady Gaga. btnI tells Google to use the I'm Feeling Lucky button right away, so Google goes to the first result, which will probably be a YouTube video.
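The matching and the $0 / $1 substitution described above can be sketched in Python. This is only an illustration and not the extension's own code: it assumes that a phrase has to match the whole spoken command and that upper and lower case are ignored, and the helper name expand_action is made up for this example.

```python
import re

def expand_action(phrase, action, spoken):
    """Return the action with $0, $1, ... filled in, or None if the phrase does not match.

    Assumption: the phrase must match the whole spoken command, ignoring case.
    """
    match = re.fullmatch(phrase, spoken, flags=re.IGNORECASE)
    if match is None:
        return None

    def fill(token):
        index = int(token.group(1))
        if index > match.re.groups:
            return token.group(0)          # unknown placeholder: leave it as written
        return match.group(index) or ""    # $0 is the whole command, $1.. are remembered groups

    return re.sub(r"\$(\d+)", fill, action)

WEATHER = r"(?:Display|Show)(?:.*?)?(?:satellite|clouds)(?:.*?)?(?:for |of |in )?(.*?)?"

# Commands that contain "satellite" or "clouds" trigger the phrase, others do not.
for spoken in ["Show me the clouds",
               "Display the weather satellite for New York",
               "Show me the moon"]:
    matched = re.fullmatch(WEATHER, spoken, flags=re.IGNORECASE) is not None
    print(spoken, "->", matched)

# $0 inserts the whole spoken command, $1 only the remembered part.
print(expand_action(r"Play (.*?)", "http://www.google.com/search?btnI&q=$0", "Play Lady Gaga"))
print(expand_action(r"Play (.*?)", "http://www.google.com/search?btnI&q=$1", "Play Lady Gaga"))
```

What exactly ends up in $1 for the longer weather phrase depends on how the matching engine anchors the phrase, so the sketch only checks whether that phrase is triggered.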

Text to Speech

As of 1/30/2017, the Speech Recognition Anywhere Chrome extension now has text to speech capabilities. Here is an example of a custom command for making Wolfram Alpha into a talking virtual assistant with voice recognition.

Phrase: Wolfram\s*Alpha (.*?)
Description: Wolfram Alpha
Action: http://www.wolframalpha.com/input/?i=$1;speak(Result.img[0])

The above phrase includes \s* between Wolfram and Alpha because sometimes Google's Web Speech API detects the phrase as "Wolfram Alpha" and other times as "WolframAlpha". This command will accept both.

The Action is actually a script. Each script command is separated by a ; (semi-colon). The first action in the script goes to the Wolfram Alpha website with the input string that was spoken. For example, say "Wolfram Alpha When is the next moon rise?". The next action in the script tells Speech Recognition Anywhere to read an element on the web page out loud with text-to-speech. The element has an id of Result. Wolfram Alpha puts the result in an image instead of plain text, but that image has an alt attribute with a plain text answer to the question. So speak(Result.img[0]) reads out loud the first (or 0th) image inside of the element with an id of "Result".
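To see why \s* makes the phrase accept both spellings, here is a small Python check. It is a sketch under the assumption that the phrase is matched against the whole spoken command without regard to case; the function name question_from is hypothetical.

```python
import re

PHRASE = r"Wolfram\s*Alpha (.*?)"

def question_from(spoken):
    """Return the text spoken after "Wolfram Alpha", or None if the phrase does not match."""
    match = re.fullmatch(PHRASE, spoken, flags=re.IGNORECASE)
    return match.group(1) if match else None

# \s* means "zero or more whitespace characters", so a space between the two words is optional.
for spoken in ["Wolfram Alpha when is the next moon rise",
               "WolframAlpha when is the next moon rise",
               "Wolfram when is the next moon rise"]:
    print(repr(spoken), "->", repr(question_from(spoken)))
```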

Here is another example of a text-to-speech custom command that creates a Decision Maker:

Phrase: Should I (.*?)
Description: Decision Maker
Action: say(Yes|No|Definitely Yes|Absolutely Not|Probably Not)

The say command will read aloud whatever text you put there. The | or pipe (also called vertical bar) separates each text to read as an OR. The say command will randomly choose one of the answers to read aloud. Now ask any question that begins with "Should I...?"
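The random choice can be pictured with a few lines of Python. This is only a sketch of the idea, assuming each answer is equally likely; say_choice is a made-up name and is not part of the extension.

```python
import random

def say_choice(argument):
    """Pick one of the "|"-separated alternatives at random."""
    options = argument.split("|")
    return random.choice(options)

answers = "Yes|No|Definitely Yes|Absolutely Not|Probably Not"
print(say_choice(answers))   # one of the five answers, different from run to run
print(say_choice("Hello"))   # with no | there is only one possible text
```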


In the Action field of custom commands you can create an action script. Each command is separated with a ; (semi-colon). Here is an example:

Action: http://example.com/;scroll_it(down);click_element(search);speak(answer)

The above action script will first go to example.com. Then it will scroll down the page, then click on an element with an id of search and then speak out loud the text in an element with an id of answer.
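One way to picture how such a script breaks down into steps is the Python sketch below. It assumes that steps are simply split on ; and that no step contains a ; itself. The function name parse_script and the step labels "open" and "other" are invented for this illustration.

```python
import re

def parse_script(action):
    """Split an action script into (kind, value) steps, in the order they would run."""
    steps = []
    for part in action.split(";"):
        part = part.strip()
        call = re.fullmatch(r"(\w+)\((.*)\)", part)
        if re.match(r"https?://", part):
            steps.append(("open", part))                       # a url: go to that page
        elif call:
            steps.append((call.group(1), call.group(2)))       # command(argument)
        else:
            steps.append(("other", part))                      # anything else is kept as written
    return steps

script = "http://example.com/;scroll_it(down);click_element(search);speak(answer)"
for kind, value in parse_script(script):
    print(kind, value)
```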

Scripting Commands
say(text): Speak the text out loud with text-to-speech.
speak(el): Read or speak out loud the contents of an element. el can be the id of an element or, if the element does not have an id, a tag under an element. For example, if el is body.div[1] then the speak command will read out loud the text inside of the 2nd div (the 1st div is 0) on the body of the document.
scroll_it(direction): Scroll the page. direction can be up, down, right or left.
click_keyword(el): el can be the id of an element to click or the name, text, title or alt of an element to click.
click_element(el): el can be the id of an element to click on or, if the element does not have an id, a tag under an element. For example, if el is results.img[0] then the click_element command will click on the first (or 0th) img under the element with id of "results".
clear_text(): Clear all text in the currently selected input or textarea.
select(all): Select all text in the currently selected input or textarea or on the page.
;; : Pause for 1 second. (Each command is separated by half a second, so to pause for 1 second use two semi-colons.)
text: Type text onto the page in the currently selected input or textarea. If none is selected, the first available input on the page is used.
enter_key(): Press the enter key.
undo(): Undo the last command.
keypress_inject(n): For webpages that listen to keypresses. n is the decimal character code of the key to press. For example, keypress_inject(49) will press the 1 key.
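
The el paths, the pause rule and the key codes above can be tried out with the following Python sketch. It is an illustration only: it assumes a path is an id (or body) followed by tag[index] parts with a zero-based index, and the names parse_element_path and pause_seconds are hypothetical.

```python
import re

def parse_element_path(el):
    """Split a path such as "results.img[0]" into (root, [(tag, index), ...])."""
    root, *rest = el.split(".")
    steps = []
    for part in rest:
        m = re.fullmatch(r"(\w+)\[(\d+)\]", part)
        if m is None:
            raise ValueError(f"unsupported path part: {part!r}")
        steps.append((m.group(1), int(m.group(2))))   # index 0 is the first matching tag
    return root, steps

def pause_seconds(semicolons, gap=0.5):
    """Delay produced by a run of semi-colons, at half a second each."""
    return semicolons * gap

print(parse_element_path("body.div[1]"))     # ('body', [('div', 1)]) -> the 2nd div on the body
print(parse_element_path("results.img[0]"))  # ('results', [('img', 0)]) -> the first img under "results"
print(parse_element_path("answer"))          # ('answer', []) -> just an id
print(pause_seconds(2))                      # 1.0
print(ord("1"))                              # 49, the code used in keypress_inject(49)
```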

Last updated on September 13, 2017
Created on December 11, 2016


User Comments

There are 26 comments.


1. Posted By: Jeff - - December 11, 2016, 1:44 pm
Description: Display the weather satellite (for New York)

Phrase: (?:Display|Show)(?:.*?)?(?:satellite|clouds)(?:.*?)?(?:for |of |in )?(.*?)?

Action: https://weather.weatherbug.com/maps/$1?layerId=GlobalSatellite


2. Posted By: Jeff - - December 11, 2016, 2:25 pm
Phrase: (?:Display|Show)(?:.*?)?(?:moon)(?:.*?)?(phase)?

Action: http://api.usno.navy.mil/imagery/moon.png

Description: You can say: "Show me the moon" or "Display the current moon phase"


3. Posted By: Jeff - - December 13, 2016, 2:26 pm
Phrase: (?:Display|Show)(?:.*?)?(?:rain|radar)(?:.*?)?(?:for |of |in )?(.*?)?
Action: https://weather.weatherbug.com/maps/$1?layerId=Radar.US

Description: Display the radar (for New York)

4. Posted By: Jeff - - December 13, 2016, 2:39 pm
Phrase: (?:Display|Show)(?:.*?)?(?:traffic)(?:.*?)?(?:for |of |in )?(.*?)?

Action: https://www.google.com/maps/place/$1/data=!5m1!1e1

Description: Show me the traffic (for Los Angeles)

5. Posted By: emin - - February 18, 2017, 3:38 pm
I need turkish language

6. Posted By: Jeff - - February 19, 2017, 5:28 pm
To change the speech recognition language to Turkish, click on "Settings" in Speech Recognition Anywhere and then select "Turkish" under "Language".


7. Posted By: Jeff - - February 25, 2017, 7:11 pm
Control Philips Hue Lights

1. Go to www.meethue.com/api/nupnp to get the IP address of your Hue Bridge.
2. Go to http://<bridge ip address>/debug/clip.html
3. For url enter: /api and for message body enter: {"devicetype":"jeff"}
where jeff is the username you want to create.
4. Press POST
5. You will get the message "link button not pressed". So press the big round link button on top of your Hue Bridge.
6. Press POST again.
7. This time you get a username hash similar to 1028d66426293e821ecfd9ef1a0731df . Save the username hash for future requests.

Then create a custom command in Speech Recognition Anywhere:

Phrase: Turn( on)?( the)? living room light(s)?( on)?
Action: http://<bridge ip address>/debug/clip.html;clear_text();/api/<username hash>/lights/1/state;click_keyword(messagebody);{"on":true};click_keyword(put)
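
For anyone who wants to check the same two bridge requests outside of the debug page, they can be sketched in Python. This is only a sketch: the bridge address and username hash below are placeholders, the function names are made up, and the requests are built but not sent.

```python
import json
from urllib.request import Request

BRIDGE_IP = "192.0.2.10"                        # placeholder: use the address from step 1
USERNAME = "1028d66426293e821ecfd9ef1a0731df"   # placeholder: use your own hash from step 7

def register_request(bridge_ip, devicetype):
    """Steps 3 to 6: POST {"devicetype": ...} to /api to create a username."""
    body = json.dumps({"devicetype": devicetype}).encode("utf-8")
    return Request(f"http://{bridge_ip}/api", data=body, method="POST")

def light_state_request(bridge_ip, username, light, on):
    """The custom command: PUT {"on": true} to /api/<username hash>/lights/<n>/state."""
    body = json.dumps({"on": on}).encode("utf-8")
    url = f"http://{bridge_ip}/api/{username}/lights/{light}/state"
    return Request(url, data=body, method="PUT")

req = light_state_request(BRIDGE_IP, USERNAME, 1, True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it you would pass req to urllib.request.urlopen().
```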


8. Posted By: Jeff - - March 6, 2017, 7:33 pm
Description: Youtube (any video)

Phrase: Youtube (.*?)

Action: https://www.youtube.com/results?search_query=$1;click_element(results.img[0])


9. Posted By: Jeff - - March 15, 2017, 2:32 pm

Description: "Play (title of song or video)" with youtube

Phrase: ^Play (.*?)$

Action: http://www.google.com/search?q=youtube $0;click_element(res.a[0])


10. Posted By: Jeff - - April 7, 2017, 9:23 pm
Description: What is the UV index in New York

Phrase: (?:What|Display|Show)(?:.*?)(?:UV index)(?:.*?)?(?:for |of |in )?(.*?)?

Action: http://sunburnmap.com/;;;;;clear_text();$1;click_keyword(find)


11. Posted By: Raymond - - June 12, 2017, 12:48 pm
I am not a programmer, so I need some help. I want to be able to switch Speech Recognition from English to Spanish and vice versa. Is there a script I can use for this? Any help will be greatly appreciated

June 20, 2017 - From the Editor:
Version 0.98.8 now has a built-in voice command for this: "Change the language to Spanish" or "Change the language to English".

12. Posted By: Paul LaZar - - July 4, 2017, 10:22 am
I just purchased and installed your speech software. I have installed it on a Windows Tablet running Windows 10 home (full version).

My only interest in your software is to enable OK Google, which I have done, and it is working. The problem is that OK Google responds and takes me away from the web page. An example:

"OK Google navigate to DC", the map page comes up with route shown and Google says "navigating to DC" and the page goes somewhere else.

I have tried this in various ways, asking questions etc., and every time Google speaks, your software interprets it and types new pages.

Please advise.

13. Posted By: Jeff - - July 4, 2017, 2:49 pm
Hi Paul Lazar,

Sorry about the problem you are having. It is because the microphone is picking up the speech from your speakers. Try this, in Speech Recognition Anywhere click on Settings and then check Pause "Speech Recognition Anywhere" if audio is playing in a tab. I believe that should solve the problem.


14. Posted By: Lola - - July 26, 2017, 4:09 am
Is there any way to stop the text from appearing in the upper left hand corner before it is typed in the speech box? Also, is there a way to stop the yellow highlighting and scrolling when text is entered?

15. Posted By: Jeff - - July 26, 2017, 11:36 am
Hello Lola,

There is no way to stop the text from appearing before it is typed into the box. That is how users know that the speech recognition is hearing what they say. I'm not sure what you mean by the yellow highlighting and scrolling. Do you have an example website and box where this is happening?


16. Posted By: Nice software, but not quite working on one site - - August 1, 2017, 11:30 am
Nice software, just what I'm looking for, but unfortunately doesn't quite work for me on ankiweb.net. This site displays a link that I repeatedly click to hear voice prompts. When I try to do this using SRE, the recording starts to play but then I get a 404. I'm having trouble getting the source for the page, but I think what is going on is that the recording is played by a JavaScript snippet which is connected to the link as an onClick attribute. The location specified by the link doesn't really exist, so, e.g., if I right-click on the link and copy the destination to the clipboard and try to load it in another page, I do indeed get a 404. But if I manually click on the link, it works as designed -- i.e., it plays the voice recording and does not try to go to another page.

So I'm not sure of the disconnect here, and unfortunately you won't be able to repro easily without my credentials. If you want to pursue it, I'll give you my creds (and $150 if you make this work on ankiweb).

17. Posted By: Paul LaZar - - August 2, 2017, 9:44 am
Thanks for the info Jeff.

18. Posted By: Jeff - - August 2, 2017, 9:46 am
Hi Nelson,

I think I figured it out. ankiweb.net has another element overlaying the "Play" link with a javascript click event attached to it. So to use your voice and say "Click Play", add this custom command to Speech Recognition Anywhere:

Phrase: Click Play
Action: click_element(jp_container_1)

Or, if you like you can replace "Click Play" in the phrase above with "Press Play".


19. Posted By: Nice software, but not quite working on one site - - August 2, 2017, 9:48 am
Nice work, Jeff -- this works for me.

20. Posted By: Kim V - - August 16, 2017, 1:31 pm
Hi. You should make an option to hide the labels when speaking. They are highly annoying when not using it, and it keeps trying to detect what I'm saying.

Another thing that would be great is to disable speech to text and only have it work with commands.

21. Posted By: Jeff - - August 17, 2017, 11:26 am
Hi Kim,

Thanks for the suggestions! I like the idea of being able to disable speech to text and only have it work with commands. But I'm not so sure about hiding the labels when speaking. I think most people would be confused and think that the Speech Recognition is not working because it would show no sign of it working until after they finish talking. Let me think about it and see what I can do.


22. Posted By: Jeff - - August 21, 2017, 7:00 pm
Hi Kim V,

In version 0.98.9 of Speech Recognition Anywhere I added the following three settings:
*Disable yellow speech bubble (Only final speech will display)
*Disable Speech-To-Text (Only Voice Commands will work)
*Disable Voice Commands (Only Speech-To-Text will work)

I hope that works for you.


23. Posted By: Kim V - - August 22, 2017, 12:35 pm
Hi Jeff.

You are awesome! Nice to see a suggestion getting into the final product :)

24. Posted By: Jeff - - September 4, 2017, 9:32 am
Hi Samuel Cartaxo,

Thank you for notifying me of the error. Some people get the error and others don't. But Google seems to be doing nothing about it. How often do you get the error?


25. Posted By: thomas - - September 28, 2017, 11:18 am
I am curious what you are using internally for ASR and TTS?
Are you using google's speech API, or built-in browser-specific tools from chrome?

26. Posted By: Jeff - - September 28, 2017, 12:39 pm
Hi thomas,

Speech Recognition in Speech Recognition Anywhere is accomplished with the JavaScript Web Speech API, which at this time only works well in Chrome.
See: developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

TTS is accomplished with the Web Speech API SpeechSynthesis. It seems to work with Chrome, Firefox, Edge and Safari but not IE.
See: developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis