Building a Speaking Web-Bot in 10 Minutes

Let’s talk about how to develop an NLP voice-based chatbot that listens to your voice and speaks to you in Chromium-based browsers.

Chatbot functionality and quality are improving quickly, and chatbot widgets have become a popular way to support site visitors on the web.

Large, complex models are offered by various vendors and can be called from a web-based chatbot through an API. There are also embedded NLP frameworks and QnA systems that run completely in the browser. They are simpler than their SaaS counterparts, yet they are feature-rich and evolving.

Mobile users are already familiar with voice assistants that recognize speech commands, such as Siri, Google Assistant, and Alexa.

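With Web Speech API support in browsers (Google introduced this API in Chrome in 2013), it is also possible to speak to browser-based bots directly. Speech synthesis can work offline with locally installed voices, while Chrome's speech recognition typically sends audio to a server-based engine and therefore needs a network connection.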

In this article, we will describe the creation of a chat working entirely in the web browser with a bot listening to your voice commands and speaking back to you by voice.

The browser for this demo has to support both speech recognition and speech synthesis (at the time of writing, both work together only in Chromium-based browsers).
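
The support requirement above can be checked at runtime before any voice UI is shown. The following is a minimal sketch; `detectSpeechSupport` is a hypothetical helper name, and it takes a window-like object as a parameter so it can be tried outside a browser as well.

```javascript
// Sketch: report which Web Speech features a window-like object exposes.
// `detectSpeechSupport` is a hypothetical helper, not a browser API.
function detectSpeechSupport(win) {
  // Chromium exposes the recognition constructor under a webkit prefix
  const Recognition = win.SpeechRecognition || win.webkitSpeechRecognition
  return {
    recognition: typeof Recognition === "function",
    synthesis: "speechSynthesis" in win,
  }
}
```

In a page this would be called as `detectSpeechSupport(window)`, and the voice controls would be added only when both flags are true.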

Part 1 - Installation

To build this demo, you need Node.js installed; see the installation instructions on the Node.js site.

Create a directory for this application ("speakbot"). Then inside this new directory initialize the application with default settings:

npm init --yes

and install dependencies:

npm i @nlpjs/core @nlpjs/lang-en-min @nlpjs/nlp

To generate the web bundle, you need two development libraries: browserify and terser.

npm i -D browserify terser

Open your package.json and add this into the scripts section:

    "dist": "browserify ./lib.js | terser --compress --mangle > ./bundle.js"

Part 2 - The AI

The "AI" part of our application will be NLP.js - a general natural language utility for Node.js.

First, we need to build a bundle file, which contains the framework parts we are going to use. Create a file named "lib.js" with the following content:

const core = require("@nlpjs/core")
const nlp = require("@nlpjs/nlp")
const langenmin = require("@nlpjs/lang-en-min")

// expose the NLP.js pieces we need as a single global for the browser
window.nlpjs = { ...core, ...nlp, ...langenmin }

Second, compile the bundle:

npm run dist

Third, create a file named "index.js" with the following content:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

;(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // Test it
  const response = await nlp.process("en", "I should go now")
  console.log(response)
})()

Here we use the example from the NLP.js documentation; feel free to create your own intents.
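
When you add your own intents, it may be easier to keep them as data and register them in a loop. The sketch below is one possible layout, not part of NLP.js: the `intents` object shape and the `registerIntents` helper are hypothetical, and the helper only calls `addDocument` and `addAnswer` with the same argument order as the listing above.

```javascript
// Hypothetical data layout: one entry per intent name
const intents = {
  "greetings.bye": {
    utterances: ["goodbye for now", "i must go"],
    answers: ["Till next time", "see you soon!"],
  },
  "greetings.hello": {
    utterances: ["hello", "hi", "howdy"],
    answers: ["Hey there!", "Greetings!"],
  },
}

// Registers every utterance and answer on an nlp instance
// (hypothetical helper; uses only addDocument and addAnswer)
function registerIntents(nlp, locale, intentMap) {
  for (const [intent, { utterances, answers }] of Object.entries(intentMap)) {
    for (const utterance of utterances) nlp.addDocument(locale, utterance, intent)
    for (const answer of answers) nlp.addAnswer(locale, intent, answer)
  }
}
```

With this in place, the individual `addDocument` and `addAnswer` lines could be replaced by `registerIntents(nlp, "en", intents)` before `await nlp.train()`.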

To run this code in the browser, you need a simple HTML file "index.html" with the following content:

<html>
<head>
  <title>Speak</title>
  <script src='./bundle.js'></script>
  <script src='./index.js'></script>
</head>
<body>
</body>
</html>

Open the "index.html" file in the browser and look at the console (press F12).

Part 3 - The UI

Now you need a simple chat interface to speak to the AI assistant. Add UI elements to the "index.html":

<html>
<head>
  <title>Speak</title>
  <script src="./bundle.js"></script>
  <script src="./index.js"></script>
</head>
<body>
  <div id="history" style="height: 300px; overflow-y: scroll;"></div>
  <form>
    <input id="message" placeholder="Type your message" style="width: 70%;" />
    <button id="send" type="submit">Send</button>
  </form>
</body>
</html>

Then modify the "index.js" to use our form:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

// the scripts load in <head>, so wait until the form exists in the DOM
document.addEventListener("DOMContentLoaded", async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: "
    // append the message as text so user input is never parsed as HTML
    userElement.append(msg)
    userElement.style.color = "blue"
    el("history").appendChild(userElement)
    const response = await nlp.process("en", msg)
    const answer = response.answer || "I don't understand."
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage
})

Open the "index.html" in the browser and test our chat interface.
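
The handler above falls back to "I don't understand." only when `response.answer` is empty. A slightly stricter variant could also reject low-confidence matches. The sketch below is an illustration under assumptions: `pickReply` and the `0.5` threshold are hypothetical, and it assumes the result object carries `intent`, `score`, and `answer` fields, with `"None"` as the intent name for unmatched input.

```javascript
const FALLBACK = "I don't understand."

// Hypothetical helper: choose the reply text from an NLP.js-style result.
// Assumes fields `intent`, `score` and `answer`; "None" is assumed to mark
// an unmatched utterance.
function pickReply(response, minScore = 0.5) {
  if (!response || !response.answer) return FALLBACK
  if (response.intent === "None") return FALLBACK
  // only apply the threshold when a numeric score is present
  if (typeof response.score === "number" && response.score < minScore) return FALLBACK
  return response.answer
}
```

Inside `onMessage`, the line that computes `answer` could then become `const answer = pickReply(response)`.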

Part 4 - The voice

To add a voice interface to our bot, we use the browser's SpeechRecognition and speechSynthesis APIs.

Currently, only Google Chrome has full support for both at the same time. Modify the "index.js" with the following:

const { containerBootstrap, Nlp, LangEn } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

function capitalize(string) {
  return string.charAt(0).toUpperCase() + string.slice(1)
}

// initialize speech recognition
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = SpeechRecognition ? new SpeechRecognition() : null

// how long to listen before sending the message
const MESSAGE_DELAY = 3000

// timer variable
let timer = null

let recognizing = false

// the scripts load in <head>, so wait until the form exists in the DOM
document.addEventListener("DOMContentLoaded", async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // initialize speech generation
  let synthVoice = null
  if ("speechSynthesis" in window && recognition) {
    const synth = window.speechSynthesis
    // request the voice list early; Chrome loads voices asynchronously,
    // so the voice itself is looked up each time the bot speaks
    synth.getVoices()
    synthVoice = text => {
      clearTimeout(timer)
      const utterance = new SpeechSynthesisUtterance(text)
      utterance.lang = "en-US"
      // prefer a local English voice; otherwise keep the browser default
      const voice = synth.getVoices().find(voice => {
        return voice.localService && voice.lang === "en-US"
      })
      if (voice) utterance.voice = voice
      synth.speak(utterance)
      timer = setTimeout(onMessage, MESSAGE_DELAY)
    }
  }

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: "
    // append the message as text so user input is never parsed as HTML
    userElement.append(msg)
    userElement.style.color = "blue"
    el("history").appendChild(userElement)
    const response = await nlp.process("en", msg)
    const answer = response.answer || "I don't understand."
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
    // speak the answer only while in listening mode
    if (synthVoice && recognizing) synthVoice(answer)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage

  // if speech recognition is supported then add elements for it
  if (recognition) {
    // add speak button
    const speakElement = document.createElement("button")
    speakElement.id = "speak"
    speakElement.innerText = "Speak!"
    speakElement.onclick = e => {
      e.preventDefault()
      recognition.start()
    }
    document.forms[0].appendChild(speakElement)

    // add "interim" element
    const interimElement = document.createElement("div")
    interimElement.id = "interim"
    interimElement.style.color = "grey"
    document.body.appendChild(interimElement)

    // configure continuous speech recognition
    recognition.continuous = true
    recognition.interimResults = true
    recognition.lang = "en-US"

    // switch to listening mode
    recognition.onstart = function () {
      recognizing = true
      el("speak").style.display = "none"
      el("send").style.display = "none"
      el("message").disabled = true
      el("message").placeholder = "Listening..."
    }

    recognition.onerror = function (event) {
      alert(event.error)
    }

    // switch back to type mode
    recognition.onend = function () {
      el("speak").style.display = "inline-block"
      el("send").style.display = "inline-block"
      el("message").disabled = false
      el("message").placeholder = "Type your message"
      el("interim").innerText = ""
      clearTimeout(timer)
      onMessage()
      recognizing = false
    }

    // speech recognition result event;
    // append recognized text to the form input and display interim results
    recognition.onresult = event => {
      // restart the silence timer on every new result
      clearTimeout(timer)
      timer = setTimeout(onMessage, MESSAGE_DELAY)
      let transcript = ""
      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          let msg = event.results[i][0].transcript
          if (!el("message").value) msg = capitalize(msg.trimStart())
          el("message").value += msg
        } else {
          transcript += event.results[i][0].transcript
        }
      }
      el("interim").innerText = transcript
    }
  }
})
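
The voice lookup in the listing accepts only a local "en-US" voice. If a looser rule is preferred, the selection can be pulled out into a small function that falls back to any voice for the language. This is a sketch, and `pickVoice` is a hypothetical helper; it relies only on the `lang` and `localService` properties of the voice objects returned by `speechSynthesis.getVoices()`.

```javascript
// Hypothetical helper: prefer a local voice for the language,
// then any voice for the language, then null (browser default).
function pickVoice(voices, lang = "en-US") {
  const matching = voices.filter(voice => voice.lang === lang)
  return matching.find(voice => voice.localService) || matching[0] || null
}
```

In the listing it could be used as `const voice = pickVoice(synth.getVoices())` before assigning `utterance.voice`.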

Open the "index.html" in the browser. Press the "Speak!" button, allow microphone usage, and talk to our new voice assistant.
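
If the recognized text looks odd while testing, it can help to isolate the part of the `onresult` handler that separates final from interim results. The sketch below does this as a pure function; `splitResults` is a hypothetical name, and it reads only `resultIndex`, `results`, `isFinal`, and `transcript`, the same fields the handler uses.

```javascript
// Hypothetical helper: split a recognition result event into the text
// that is final and the text that is still interim.
function splitResults(event) {
  let finalText = ""
  let interimText = ""
  // start at resultIndex, as the handler in the listing does
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i]
    const transcript = result[0].transcript
    if (result.isFinal) finalText += transcript
    else interimText += transcript
  }
  return { finalText, interimText }
}
```

The handler would then append `finalText` to the message input and show `interimText` in the "interim" element.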

Conclusion

Modern browsers already have the features needed to recognize and generate speech.

Together with JavaScript- or WebAssembly-based NLP frameworks, they open up a wide field for building custom, portable voice assistants that work entirely in the browser or connect to an external API.

Our simplest standalone voice assistant is written in just about 160 lines of code, thanks to NLP.js text processing features and Google Chrome speech recognition and synthesis.

There are also many other usable NLP libraries and SaaS solutions with their own unique features, so you can choose the one that fits your needs best.

You can download this tutorial bundle in one HTML file here.

AI Developer
An experienced full-stack software developer with engineer mentality. Dmitriy codes mostly with Node.js, Python and Rust, explores and experiments with the latest technologies, such as AI and Deep Learning.
