With "Speech Recognition Anywhere" you can control the Internet with your voice.
Use Speech Recognition to fill out any input, textarea, form or document on the web!
Whatever you say is automatically typed into any form on any web page. "Speech
Recognition Anywhere" can also be used as an awesome Virtual Assistant in Chrome.
Download the Speech Recognition Anywhere Chrome Extension today.
- Virtual Assistant Mode
- Choose between dozens of languages and dialects for speech recognition
- Dictate emails and online documents
- Fill in forms with your voice
- Go to the next or previous field with your voice
- Go to any web page with your voice
- Switch tabs and navigate webpages with your voice
- Scroll page up or down
- Click on links and buttons with your voice
- Cut, Copy, Paste, Clear, Highlight
- Say "Show labels" to see labels for the buttons on a webpage
- Say "Play (name of artist or song)" to play music instantly
- Create Custom Voice Commands
- Text To Speech
Look in the comments below for "Custom Commands" that you can add to Speech Recognition Anywhere.
If you have created some awesome commands for Speech Recognition Anywhere then please share them
in the comments section below. (If you have URLs (http: or https:) in the action then please surround
them with <code> </code> tags so the comment box does not convert them to links.)
To create custom commands you do not need to use regular expressions,
but regular expressions will make your custom commands more powerful. For example, you could create this
basic custom command:
Phrase: Display the weather satellite
Description: Display the weather satellite
With this command you would have to say the exact phrase, "Display the weather satellite", in order for
the satellite image to display. If you use regular expressions, as in the example below,
then you can say a number of similar sentences to activate the command:
Phrase: (?:Display|Show)(?:.*?)?(?:satellite|clouds)(?:.*?)?(?:for |of |in )?(.*?)?
Description: Display the weather satellite
With the above phrase you could say "Show me the clouds" or "Display the weather satellite for New York".
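The matching behavior described above can be tried outside the extension. The following is a minimal Python sketch, not the extension's own code: it assumes the phrase is matched case-insensitively against the whole spoken sentence, and the helper name `activates` is made up for illustration. It only checks whether a sentence triggers the command. What ends up in the remembered group depends on how the extension anchors and scans the input, which this page does not spell out.

```python
import re

# Phrase copied from the custom command above.
PHRASE = r"(?:Display|Show)(?:.*?)?(?:satellite|clouds)(?:.*?)?(?:for |of |in )?(.*?)?"

# Assumption: the extension matches case-insensitively against the whole sentence.
pattern = re.compile(PHRASE, re.IGNORECASE)


def activates(spoken: str) -> bool:
    """Return True if the spoken sentence would trigger the command (hypothetical helper)."""
    return pattern.fullmatch(spoken) is not None


spoken_examples = [
    "Display the weather satellite",
    "Show me the clouds",
    "Display the weather satellite for New York",
    "Show satellite",
    "Play Lady Gaga",  # no "Display"/"Show" and no "satellite"/"clouds"
]
for sentence in spoken_examples:
    print(f"{sentence!r}: {activates(sentence)}")

# Only the last pair of parentheses lacks ?:, so it is the only remembered group.
print("remembered groups:", pattern.groups)
```

The first four sentences trigger the command and the last one does not, because it contains neither of the required keyword alternatives.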
Here is a breakdown of the phrase:
(?:Display|Show) means to look for either "display" or "show".
The | symbol means "or". Putting ?: at the beginning of the match inside the parentheses ()
means to look for the match but don't remember the match.
(?:.*?)? means to look for any number of optional words like "the" or "the weather" and do not
remember the match. The ? at the end outside of the parentheses () means these words are optional.
For example, you could say "Show me the weather satellite" or you could just say "Show satellite".
(?:satellite|clouds) means that the user has to say either "satellite" or "clouds" in the phrase
for the phrase to be detected.
(?:.*?)? means that we again look for any number of optional words.
(?:for |of |in )? means that we look for "for " or "of " or "in " so that the user can say
"Display the satellite for Colorado". Putting the ? at the end means this is optional.
(.*?)? means that we look for one or more optional words at the end. But this
time we don't put ?: at the beginning inside the parentheses () because we want to remember the
match to use it later on. We want to remember the location at the end of a spoken command like "Show me the weather
satellite in London". Then the remembered match can be used in the action: https://weather.weatherbug.com/maps/$1 .
The $1 will be replaced with London in the URL. $1 is used for the first match, $2 for the second, and so on.
If you wanted to put the whole spoken command in the action then you would use $0. As an example, if you
wanted to let Google decide how to play music for you then you could use this phrase: Play (.*?) . So the
spoken command could be "Play Lady Gaga". And the action could be http://www.google.com/search?btnI&q=$0 because
the $0 would match the entire phrase "Play Lady Gaga", so what would be sent to Google is:
http://www.google.com/search?btnI&q=Play Lady Gaga. btnI tells Google to use the "I'm Feeling Lucky" button, so
Google would go straight to the first result, which would probably be a YouTube video.
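The $0 and $1 placeholders can be illustrated with a short Python sketch. This is an approximation under stated assumptions, not the extension's implementation: `expand_action` is a hypothetical helper, the phrase is assumed to match the whole spoken command case-insensitively, and no URL encoding is applied, mirroring the example above.

```python
import re
from typing import Optional


def expand_action(phrase: str, action: str, spoken: str) -> Optional[str]:
    """Fill $0, $1, ... in an action from a spoken command (hypothetical helper)."""
    match = re.fullmatch(phrase, spoken, re.IGNORECASE)
    if match is None:
        return None  # the spoken command does not trigger this phrase

    def fill(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return placeholder.group(0)  # leave unknown placeholders untouched
        return match.group(index) or ""  # $0 is the whole command, $1 the first group

    return re.sub(r"\$(\d+)", fill, action)


whole = expand_action("Play (.*?)", "http://www.google.com/search?btnI&q=$0", "Play Lady Gaga")
first = expand_action("Play (.*?)", "http://www.google.com/search?btnI&q=$1", "Play Lady Gaga")
print(whole)  # http://www.google.com/search?btnI&q=Play Lady Gaga
print(first)  # http://www.google.com/search?btnI&q=Lady Gaga
```

Using $1 instead of $0 in the same action would send only "Lady Gaga" to the search, without the word "Play".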
Text to Speech
As of 1/30/2017, the Speech Recognition Anywhere Chrome extension now has text to speech
capabilities. Here is an example of a custom command for making Wolfram Alpha into a talking
virtual assistant with voice recognition.
Phrase: Wolfram\s*Alpha (.*?)
Description: Wolfram Alpha
The above phrase includes \s* between Wolfram and Alpha because sometimes Google's Web
Speech API detects the phrase as "Wolfram Alpha" and other times as "WolframAlpha". This command
will accept both. The Action is actually a script. Each script command is separated by ;
(semi-colon). The first action in the script goes to the Wolfram Alpha website with the input string
that was spoken. For example, say "Wolfram Alpha When is the next moon rise?". The next action in
the script tells Speech Recognition Anywhere to read aloud, with text-to-speech, an element
on the web page. The element has an id of Result. Wolfram Alpha puts the result in an
image instead of plain text, but that image has an alt attribute with a plain text answer to
the question. So Result.img reads out loud the first (or 0th) image inside of the element with id
Result.
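To see why \s* lets the phrase accept both spellings, here is a small Python sketch. It assumes whole-sentence, case-insensitive matching, and `question_for` is a hypothetical helper name. It is not the extension's code.

```python
import re

# Phrase copied from the Wolfram Alpha custom command above.
PHRASE = r"Wolfram\s*Alpha (.*?)"


def question_for(spoken):
    """Return the question part of the spoken command, or None if it does not match."""
    match = re.fullmatch(PHRASE, spoken, re.IGNORECASE)
    return match.group(1) if match else None


# \s* means "zero or more whitespace characters", so both transcriptions match.
print(question_for("Wolfram Alpha When is the next moon rise?"))
print(question_for("WolframAlpha When is the next moon rise?"))
```

Both calls yield the same question text, which is the part that would be passed on to the website.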
Here is another example of a text-to-speech custom command that creates a Decision Maker:
Phrase: Should I (.*?)
Description: Decision Maker
Action: say(Yes|No|Definitely Yes|Absolutely Not|Probably Not)
The say command will read aloud whatever text you put there. The | or pipe (also called vertical bar)
separates the alternative texts, like an OR. The say command will randomly choose one
of the answers to read aloud. Now ask any question that begins with "Should I...?"
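The random choice made by say can be sketched in Python as follows. This only mimics the behavior described above. The helper names `say_options` and `pick_answer` are made up for illustration, and the real extension would speak the chosen text rather than print it.

```python
import random
import re


def say_options(action):
    """Split the text inside say(...) on the pipe character (hypothetical helper)."""
    match = re.fullmatch(r"say\((.*)\)", action.strip())
    if match is None:
        raise ValueError("not a say(...) action: " + action)
    return [option.strip() for option in match.group(1).split("|")]


def pick_answer(action):
    """Choose one of the alternatives at random, as the say command is described to do."""
    return random.choice(say_options(action))


ACTION = "say(Yes|No|Definitely Yes|Absolutely Not|Probably Not)"
print(say_options(ACTION))
print(pick_answer(ACTION))  # differs from run to run
```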
In the Action field of custom commands you can create an action script. Each command is
separated with a ; (semi-colon). Here is an example:
The above action script will first go to example.com. Then it will scroll down the page,
then click on an element with an id of search and then speak out loud the text in an element
with an id of answer.
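The way an action script breaks into steps can be sketched in Python. The script string below is a hypothetical reconstruction from the description above, not the page's own example, and the way the first step (the URL) is written is a guess. `split_steps` is a made-up helper.

```python
from typing import List

# Hypothetical script matching the description above: go to example.com, scroll
# down, click the element with id "search", then speak the element with id "answer".
SCRIPT = "https://example.com; scroll_it(down); click_element(search); speak(answer)"


def split_steps(script: str) -> List[str]:
    """Split an action script on semicolons and trim whitespace around each step."""
    return [step.strip() for step in script.split(";")]


for number, step in enumerate(split_steps(SCRIPT), start=1):
    print(number, step)
```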
|say(text)||Speak out loud with text-to-speech text|
|speak(el)||Read or speak out loud the contents of an element. el can be the id of an element or if the element does not have an id then it can be a tag under an element. For example, if el is body.div then the speak command will read out loud the text inside of the 2nd div (the 1st div is 0) on the body of the document.|
|scroll_it(direction)||Scroll the page. direction can be up, down, right or left.|
|click_keyword(el)||el can be the id of an element to click or the name, text, title or alt of an element to click.|
|click_element(el)||el can be the id of an element to click on or if the element does not have an id then it can be a tag under an element. For example, if el is results.img then the click_element command will click on the first (or 0th) img under the element with id of "results".|
|clear_text()||Clear all text in the currently selected input or textarea.|
|select(all)||Select all text in the currently selected input or textarea or on page.|
|;;||Pause for 1 second. (Each command is separated by half a second, so to pause for 1 second use two semi-colons.)|
|text||Type text on to the page in the currently selected input or textarea or it will choose the first available input on the page|
|enter_key()||Press the enter key|
|undo()||Undo the last command|
|keypress_inject(n)||For webpages that listen to keypresses. Where n is the decimal character code of the key to press. For example: keypress_inject(49) will press the 1 key. (See Decimal Character Codes.)|
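The timing rule (half a second between commands, so ;; gives a one-second pause) and the decimal character codes used by keypress_inject can be illustrated with a short Python sketch. It models only what the table states. The helper names `schedule` and `key_for` and the sample script are assumptions, not part of the extension.

```python
import re
from typing import List, Tuple

STEP_DELAY = 0.5  # seconds between commands, as stated in the ";;" row


def schedule(script: str) -> List[Tuple[float, str]]:
    """Return (start time in seconds, command) pairs; empty slots act as pauses."""
    timeline = []
    for slot, step in enumerate(script.split(";")):
        step = step.strip()
        if step:  # an empty slot (as in ";;") only lets time pass
            timeline.append((slot * STEP_DELAY, step))
    return timeline


def key_for(command: str) -> str:
    """Map keypress_inject(n) to the character with decimal code n."""
    match = re.fullmatch(r"keypress_inject\((\d+)\)", command)
    if match is None:
        raise ValueError("not a keypress_inject(n) command: " + command)
    return chr(int(match.group(1)))


# Hypothetical script: clear the field, pause one second, press "1", press enter.
print(schedule("clear_text();;keypress_inject(49);enter_key()"))
print(key_for("keypress_inject(49)"))  # 1
```

In this model the command after ;; starts one second after the previous one, while a command after a single semi-colon starts half a second later.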
Last updated on September 13, 2017
Created on December 11, 2016