
How to Build a Voice Assistant for Your Website

Today you can build a voice assistant for your website without much time or effort, just some gumption and a little know-how. Chatbot functionality and quality are growing rapidly, and adoption is snowballing. The popularity of chatbot widgets shows both their value and the user experience people now expect: we are used to self-service, but still want some personalized attention.

Mobile users are familiar with voice assistants that recognize speech commands, such as Siri, Google Assistant, and Alexa. With Web Speech API support in browsers, it is also possible to speak to browser-based bots directly.

In this article, we describe how to create a chat that works entirely in the web browser, with a bot that listens to your voice commands and speaks back to you.

The browser for this demo has to support speech recognition and speech synthesis (currently, both work only in Chromium-based browsers).
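
If you want to check support up front, a quick probe in the browser console could look like this (a minimal sketch; Chrome exposes recognition under a webkit prefix):

const hasRecognition =
  "SpeechRecognition" in window || "webkitSpeechRecognition" in window
const hasSynthesis = "speechSynthesis" in window
console.log({ hasRecognition, hasSynthesis })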

Part 1 - Installation

To build this demo, you need Node.js installed; see the installation instructions on the Node.js site.

Create a directory for this application ("speakbot"). Then, inside this new directory, initialize the application with default settings:

npm init --yes

and install dependencies:

npm i @nlpjs/core @nlpjs/lang-en-min @nlpjs/nlp

To generate the web bundle, you need two development libraries: browserify and terser.

npm i -D browserify terser

Open your package.json and add this into the scripts section:

    "dist": "browserify ./lib.js | terser --compress --mangle > ./bundle.js"

Part 2 - The AI

The "AI" part of our application will be NLP.js - general natural language utility for Node.js development.

First, we need to build a bundle file, which contains the framework parts we are going to use. Create a file named "lib.js" with the following content:

const core = require("@nlpjs/core")
const nlp = require("@nlpjs/nlp")
const langenmin = require("@nlpjs/lang-en-min")

window.nlpjs = { ...core, ...nlp, ...langenmin }

Second, compile the bundle:

npm run dist
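
This writes "bundle.js" into the project directory; it is the only build artifact the browser page needs besides our own scripts.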

Third, create a file named "index.js" with the following content:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

;(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // Test it
  const response = await nlp.process("en", "I should go now")
  console.log(response)
})()

Here we use the NLP.js example from their documentation; feel free to create your own intents.
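
For instance, a hypothetical extra intent for asking the bot its name could be added the same way (the phrases, intent name, and answer below are our own invention, not from the NLP.js docs):

nlp.addDocument("en", "what is your name", "agent.name")
nlp.addDocument("en", "who are you", "agent.name")
nlp.addAnswer("en", "agent.name", "I'm SpeakBot, a demo assistant.")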

To execute this code in the browser, you need a simple HTML file "index.html" with the following content:

<html>
<head>
  <title>Speak</title>
  <script src="./bundle.js"></script>
  <script src="./index.js"></script>
</head>
<body>
</body>
</html>

Open the "index.html" file in the browser and look at the console (press F12).

Part 3 - The UI

Now you need a simple chat interface to speak to the AI assistant. Add UI elements to the "index.html":

<html>
<head>
  <title>Speak</title>
  <script src="./bundle.js"></script>
  <script src="./index.js"></script>
</head>
<body>
  <div id="history" style="height: 300px; overflow-y: scroll;"></div>
  <form>
    <input id="message" placeholder="Type your message" style="width: 70%;" />
    <button id="send" type="submit">Send</button>
  </form>
</body>
</html>
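
Because the input and the Send button live inside a form, pressing Enter in the text field submits a message just like clicking Send.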

Then modify the "index.js" to use our form:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

// delay initialization until the form is created
setTimeout(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: " + msg
    userElement.style.color = "blue"
    el("history").appendChild(userElement)
    const response = await nlp.process("en", msg)
    const answer = response.answer || "I don't understand."
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage
})

Open the "index.html" in the browser and test our chat interface.

Part 4 - The Voice

To add a voice interface to our bot, we use the browser's SpeechRecognition and speechSynthesis APIs.

Currently, only Google Chrome fully supports both at the same time. Modify "index.js" as follows:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

function capitalize(string) {
  return string.charAt(0).toUpperCase() + string.slice(1)
}

// initialize speech recognition
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = SpeechRecognition ? new SpeechRecognition() : null

// how long to listen before sending the message
const MESSAGE_DELAY = 3000

// timer variable
let timer = null

let recognizing = false

// delay initialization until the form is created
setTimeout(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // initialize speech synthesis
  let synthVoice = null
  if ("speechSynthesis" in window && recognition) {
    // wait until voices are ready
    window.speechSynthesis.onvoiceschanged = () => {
      synthVoice = text => {
        clearTimeout(timer)
        const synth = window.speechSynthesis
        const utterance = new SpeechSynthesisUtterance()
        // select a local English voice
        const voice = synth.getVoices().find(voice => {
          return voice.localService && voice.lang === "en-US"
        })
        if (voice) utterance.voice = voice
        utterance.text = text
        synth.speak(utterance)
        // restart the timer so the bot's own speech is not sent as a message
        timer = setTimeout(onMessage, MESSAGE_DELAY)
      }
    }
  }

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: " + msg
    userElement.style.color = "blue"
    el("history").appendChild(userElement)
    const response = await nlp.process("en", msg)
    const answer = response.answer || "I don't understand."
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
    if (synthVoice && recognizing) synthVoice(answer)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage

  // if speech recognition is supported then add elements for it
  if (recognition) {
    // add speak button
    const speakElement = document.createElement("button")
    speakElement.id = "speak"
    speakElement.innerText = "Speak!"
    speakElement.onclick = e => {
      e.preventDefault()
      recognition.start()
    }
    document.forms[0].appendChild(speakElement)

    // add "interim" element
    const interimElement = document.createElement("div")
    interimElement.id = "interim"
    interimElement.style.color = "grey"
    document.body.appendChild(interimElement)

    // configure continuous speech recognition
    recognition.continuous = true
    recognition.interimResults = true
    recognition.lang = "en-US"

    // switch to listening mode
    recognition.onstart = function () {
      recognizing = true
      el("speak").style.display = "none"
      el("send").style.display = "none"
      el("message").disabled = true
      el("message").placeholder = "Listening..."
    }

    recognition.onerror = function (event) {
      alert(event.error)
    }

    // switch back to type mode
    recognition.onend = function () {
      el("speak").style.display = "inline-block"
      el("send").style.display = "inline-block"
      el("message").disabled = false
      el("message").placeholder = "Type your message"
      el("interim").innerText = ""
      clearTimeout(timer)
      onMessage()
      recognizing = false
    }

    // speech recognition result event;
    // append recognized text to the form input and display interim results
    recognition.onresult = event => {
      clearTimeout(timer)
      timer = setTimeout(onMessage, MESSAGE_DELAY)
      let transcript = ""
      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          let msg = event.results[i][0].transcript
          if (!el("message").value) msg = capitalize(msg.trimStart())
          el("message").value += msg
        } else {
          transcript += event.results[i][0].transcript
        }
      }
      el("interim").innerText = transcript
    }
  }
})

Open the "index.html" in the browser. Press the "Speak" button, allow microphone usage, and talk to our new voice assistant.

Conclusion

Modern browsers already have the features needed to recognize and synthesize speech. Together with JavaScript- or WebAssembly-based NLP frameworks, building custom portable voice assistants that work entirely in the browser, or connect to an API, is completely doable.

Our simple standalone voice assistant is written in about 160 lines of code, thanks to NLP.js's text processing features and Google Chrome's speech recognition and synthesis.

There are also plenty of other NLP libraries and SaaS solutions with their own unique features, so you can choose the one that best fits your needs. You can download our tutorial, bundled into one HTML file, here.

Intersog builds custom solutions in AI, IoT, and web software, and is a perfect partner when embarking on new technologies. Web-based voice assistants are becoming increasingly commonplace and are a great place to begin amplifying your customer engagement and service.

Dmytro

An experienced full-stack software developer with an engineering mindset, Dmytro codes mostly in Node.js, Python, and Rust, and explores and experiments with the latest technologies, such as AI and Deep Learning.
