How to Program a Voice Assistant in 10 Minutes

Implement a few skills, such as weather forecast and news aggregation, where the bot uses external APIs to provide data.

In the previous article, we built the simplest speaking chatbot entirely in a browser, with the help of the NLP.js library and Web Speech API.

Our demo application could distinguish a few phrases (or intents) and recognize and generate speech. Using this skeleton, we can extend it to recognize more intents and, more interestingly, to perform some programmed actions. Such functionality is usually called a skill.

We are going to implement a few skills, such as weather forecast and news aggregation, where the bot uses external APIs to provide data. Follow the steps from the previous post to prepare the development environment and get the starting source code.

Part 1 – The weather forecast

To get weather data, you need a weather data API.

Let's use openweathermap.org to retrieve weather information. Here are the steps to follow:

1. We have to extract the location from the text, so we need to install the NER module:
npm install @nlpjs/ner

Import it in the "lib.js" file:

const core = require("@nlpjs/core")
const nlp = require("@nlpjs/nlp")
const langenmin = require("@nlpjs/lang-en-min")
const ner = require('@nlpjs/ner')

window.nlpjs = { ...core, ...nlp, ...langenmin, ...ner }

and build:

npm run dist

2. Open "index.js" and add the following function at the bottom of the source:

async function getWeatherText(location) {
  const weatherURL = new URL("https://api.openweathermap.org/data/2.5/weather")

  weatherURL.searchParams.set("q", location)
  weatherURL.searchParams.set("APPID", "YOUR_WEATHER_KEY")
  weatherURL.searchParams.set("units", "metric")

  const resp = await fetch(weatherURL.toString())
  // the request fails for an unknown city or an invalid key
  if (!resp.ok) return `Sorry, I can't get the weather for ${location}.`
  const data = await resp.json()
  return `There will be ${data.weather[0].description} today in ${data.name}. Currently, the temperature is ${data.main.temp} °C. 🌡`
}

3. Go to openweathermap.org and sign up.

They have a free plan to start.

4. Open the account page. You'll find a key there.

Copy and paste it into the function you just created, in place of YOUR_WEATHER_KEY.

Note that in production you should never expose keys or tokens to the public. It is recommended to have a back-end server that requests the API on the client's behalf, so that the key stays secret.

5. Now, we need to prepare the weather intent.

In the utterances section of "index.js", add new phrases:

  // weather intent
  nlp.addDocument("en", "what is the weather in London", "weather.current")
  nlp.addDocument("en", "show me the weather", "weather.current")
  nlp.addDocument("en", "what is the weather today", "weather.current")
  nlp.addDocument("en", "what is today's weather", "weather.current")
  nlp.addDocument("en", "what's the weather forecast for today", "weather.current")
  nlp.addDocument("en", "how's the weather today", "weather.current")

6. Use a simple NER rule that treats the text after the last "in" as the city:

  nlp.addLanguage("en")
  const ner = container.get("ner")
  ...
  // city entity
  ner.addAfterLastCondition('en', 'city', 'in')

7. When the "weather.current" intent is detected, use the weather API to retrieve the current weather data. The city comes from the "city" entity. If the city is omitted, ask for it. See the details in the resulting code.
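
To see what the "city" rule from step 6 is meant to do, here is a plain-JavaScript sketch of the idea behind an "after last" condition: take whatever follows the last occurrence of the word "in". It only illustrates the rule and is not the NLP.js implementation; the function name extractAfterLast is made up for this example.

```javascript
// Illustrative only: mimics the idea of an "after last" rule,
// i.e. return the text that follows the last occurrence of a word.
function extractAfterLast(text, word) {
  const tokens = text.trim().split(/\s+/)
  // search from the end so that the last occurrence wins
  const index = tokens
    .map(token => token.toLowerCase())
    .lastIndexOf(word.toLowerCase())
  // no match, or nothing after the word
  if (index === -1 || index === tokens.length - 1) return ""
  return tokens.slice(index + 1).join(" ")
}

console.log(extractAfterLast("what is the weather in London", "in")) // London
console.log(extractAfterLast("is it raining in New York", "in")) // New York
```

A phrase such as "what is the weather in London?" would yield "London?", which is why the resulting code passes the entity text through removePunctuation before calling the weather API.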

The resulting "index.js" code is:

const { containerBootstrap, Nlp, LangEn, Ner } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

function capitalize(string) {
  return string.charAt(0).toUpperCase() + string.slice(1)
}

// initialize speech recognition
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = SpeechRecognition ? new SpeechRecognition() : null

// how long to listen before sending the message
const MESSAGE_DELAY = 3000

// timer variable
let timer = null

let recognizing = false

let city

// delay initialization until form is created
setTimeout(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(Ner)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")
  const ner = container.get("ner")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // weather intent
  nlp.addDocument("en", "what is the weather in London", "weather.current")
  nlp.addDocument("en", "show me the weather", "weather.current")
  nlp.addDocument("en", "what is the weather today", "weather.current")
  nlp.addDocument("en", "what is today's weather", "weather.current")
  nlp.addDocument(
    "en",
    "what's the weather forecast for today",
    "weather.current"
  )
  nlp.addDocument("en", "how's the weather today", "weather.current")

  // city entity
  ner.addAfterLastCondition("en", "city", "in")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // initialize speech generation
  let synthVoice = null
  if ("speechSynthesis" in window && recognition) {
    // wait until voices are ready
    window.speechSynthesis.onvoiceschanged = () => {
      synthVoice = text => {
        clearTimeout(timer)
        const synth = window.speechSynthesis
        const utterance = new SpeechSynthesisUtterance()
        // select some english voice
        const voice = synth.getVoices().find(voice => {
          return voice.localService && voice.lang === "en-US"
        })
        if (voice) utterance.voice = voice
        utterance.text = text
        synth.speak(utterance)
        timer = setTimeout(onMessage, MESSAGE_DELAY)
      }
    }
  }

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: " + msg
    userElement.style.color = "blue"
    el("history").appendChild(userElement)

    let answer

    if (city === "x") {
      city = removePunctuation(msg)
      answer = await getWeatherText(city)
    } else {
      const response = await nlp.process("en", msg)
      if (response.intent === "weather.current") {
        // extend with NER results
        response.entities = [
          ...response.entities,
          ...(
            await ner.process({
              text: msg,
              locale: "en"
            })
          ).entities
        ]
        city = getCity(response)
        if (!city) {
          city = "x"
          answer = "In which city?"
        } else {
          answer = await getWeatherText(city)
        }
      } else {
        answer = response.answer || "I don't understand."
      }
    }
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
    if (synthVoice && recognizing) synthVoice(answer)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage

  // if speech recognition is supported then add elements for it
  if (recognition) {
    // add speak button
    const speakElement = document.createElement("button")
    speakElement.id = "speak"
    speakElement.innerText = "Speak!"
    speakElement.onclick = e => {
      e.preventDefault()
      recognition.start()
    }
    document.forms[0].appendChild(speakElement)

    // add "interim" element
    const interimElement = document.createElement("div")
    interimElement.id = "interim"
    interimElement.style.color = "grey"
    document.body.appendChild(interimElement)

    // configure continuous speech recognition
    recognition.continuous = true
    recognition.interimResults = true
    recognition.lang = "en-US"

    // switch to listening mode
    recognition.onstart = function () {
      recognizing = true
      el("speak").style.display = "none"
      el("send").style.display = "none"
      el("message").disabled = true
      el("message").placeholder = "Listening..."
    }

    recognition.onerror = function (event) {
      alert(event.error)
    }

    // switch back to type mode
    recognition.onend = function () {
      el("speak").style.display = "inline-block"
      el("send").style.display = "inline-block"
      el("message").disabled = false
      el("message").placeholder = "Type your message"
      el("interim").innerText = ""
      clearTimeout(timer)
      onMessage()
      recognizing = false
    }

    // speech recognition result event;
    // append recognized text to the form input and display interim results
    recognition.onresult = event => {
      clearTimeout(timer)
      timer = setTimeout(onMessage, MESSAGE_DELAY)
      let transcript = ""
      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          let msg = event.results[i][0].transcript
          if (!el("message").value) msg = capitalize(msg.trimStart())
          el("message").value += msg
        } else {
          transcript += event.results[i][0].transcript
        }
      }
      el("interim").innerText = transcript
    }
  }
})

async function getWeatherText(city) {
  const weatherURL = new URL("https://api.openweathermap.org/data/2.5/weather")

  weatherURL.searchParams.set("q", city)
  weatherURL.searchParams.set("APPID", "YOUR_WEATHER_KEY")
  weatherURL.searchParams.set("units", "metric")

  const resp = await fetch(weatherURL.toString())
  // the request fails for an unknown city or an invalid key
  if (!resp.ok) return `Sorry, I can't get the weather for ${city}.`
  const data = await resp.json()
  return `There will be ${data.weather[0].description} today in ${data.name}. Currently, the temperature is ${data.main.temp} \u2103 (feels like ${data.main.feels_like} \u2103).`
}

function getCity(response) {
  if (!response.entities) return ""
  const entity = response.entities.find(x => x.entity === "city")
  if (!entity) return ""
  return removePunctuation(entity.utteranceText)
}

function removePunctuation(x) {
  return x
    .replace(
      /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g,
      " "
    )
    .trim()
}
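
The listing above stores the placeholder value "x" in city to remember that the bot has asked "In which city?" and is waiting for the reply. As an alternative, the same two-turn dialogue can be tracked with an explicit flag. The sketch below is hypothetical: createWeatherDialog, detectCity and lookupWeather are names invented for this example, with the last two standing in for the NER lookup and getWeatherText. It covers only the weather intent.

```javascript
// Hypothetical helper: keeps the "waiting for a city" state in a closure
// instead of a magic value. detectCity and lookupWeather are injected so the
// dialogue logic can be tried without NLP.js or the network.
function createWeatherDialog(detectCity, lookupWeather) {
  let awaitingCity = false

  return async function handle(msg) {
    // if the previous turn asked for a city, treat the whole message as one
    const city = awaitingCity ? msg.trim() : detectCity(msg)
    if (!city) {
      awaitingCity = true
      return "In which city?"
    }
    awaitingCity = false
    return lookupWeather(city)
  }
}

// usage with stand-in functions
const handle = createWeatherDialog(
  msg => (msg.match(/\bin (.+)$/) || [])[1] || "",
  async city => `Weather for ${city}`
)

handle("show me the weather")
  .then(reply => {
    console.log(reply) // In which city?
    return handle("London")
  })
  .then(reply => console.log(reply)) // Weather for London
```

With this shape, a city that really is called "x" no longer collides with the internal state, and the flag can be reset in one place.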

Part 2 - News aggregation

As in the previous part, here we'll add a news skill.

To get news, we'll use the newsapi.org API.

Sign up and get your API key there. Add the corresponding intent, function, and condition to the "index.js" file, the same as before:

const { containerBootstrap, Nlp, LangEn, Ner } = window.nlpjs

// shorthand function
const el = document.getElementById.bind(document)

function capitalize(string) {
  return string.charAt(0).toUpperCase() + string.slice(1)
}

// initialize speech recognition
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = SpeechRecognition ? new SpeechRecognition() : null

// how long to listen before sending the message
const MESSAGE_DELAY = 3000

// timer variable
let timer = null

let recognizing = false

let city

// delay initialization until form is created
setTimeout(async () => {
  const container = await containerBootstrap()
  container.use(Nlp)
  container.use(Ner)
  container.use(LangEn)
  const nlp = container.get("nlp")
  nlp.settings.autoSave = false
  nlp.addLanguage("en")
  const ner = container.get("ner")

  // Adds the utterances and intents for the NLP
  nlp.addDocument("en", "goodbye for now", "greetings.bye")
  nlp.addDocument("en", "bye bye take care", "greetings.bye")
  nlp.addDocument("en", "okay see you later", "greetings.bye")
  nlp.addDocument("en", "bye for now", "greetings.bye")
  nlp.addDocument("en", "i must go", "greetings.bye")
  nlp.addDocument("en", "hello", "greetings.hello")
  nlp.addDocument("en", "hi", "greetings.hello")
  nlp.addDocument("en", "howdy", "greetings.hello")

  // weather intent
  nlp.addDocument("en", "current weather", "weather.current")
  nlp.addDocument("en", "what is the weather in London", "weather.current")
  nlp.addDocument("en", "show me the weather", "weather.current")
  nlp.addDocument("en", "what is the weather today", "weather.current")
  nlp.addDocument("en", "what is today's weather", "weather.current")
  nlp.addDocument(
    "en",
    "what's the weather forecast for today",
    "weather.current"
  )
  nlp.addDocument("en", "how's the weather today", "weather.current")

  // city entity
  ner.addAfterLastCondition("en", "city", "in")

  // news intent
  nlp.addDocument("en", "what are the latest news", "news.top")
  nlp.addDocument("en", "what are the top news", "news.top")
  nlp.addDocument("en", "latest news", "news.top")
  nlp.addDocument("en", "top news", "news.top")
  nlp.addDocument("en", "what are today's news", "news.top")

  // Train also the NLG
  nlp.addAnswer("en", "greetings.bye", "Till next time")
  nlp.addAnswer("en", "greetings.bye", "see you soon!")
  nlp.addAnswer("en", "greetings.hello", "Hey there!")
  nlp.addAnswer("en", "greetings.hello", "Greetings!")

  await nlp.train()

  // initialize speech generation
  let synthVoice = null
  if ("speechSynthesis" in window && recognition) {
    // wait until voices are ready
    window.speechSynthesis.onvoiceschanged = () => {
      synthVoice = text => {
        clearTimeout(timer)
        const synth = window.speechSynthesis
        const utterance = new SpeechSynthesisUtterance()
        // select some english voice
        const voice = synth.getVoices().find(voice => {
          return voice.localService && voice.lang === "en-US"
        })
        if (voice) utterance.voice = voice
        utterance.text = text
        synth.speak(utterance)
        timer = setTimeout(onMessage, MESSAGE_DELAY)
      }
    }
  }

  // form submit event
  async function onMessage(event) {
    if (event) event.preventDefault()
    const msg = el("message").value
    el("message").value = ""
    if (!msg) return
    const userElement = document.createElement("div")
    userElement.innerHTML = "<b>User</b>: " + msg
    userElement.style.color = "blue"
    el("history").appendChild(userElement)

    let answer

    if (city === "x") {
      city = removePunctuation(msg)
      answer = await getWeatherText(city)
    } else {
      const response = await nlp.process("en", msg)
      if (response.intent === "weather.current") {
        // extend with NER results
        response.entities = [
          ...response.entities,
          ...(
            await ner.process({
              text: msg,
              locale: "en"
            })
          ).entities
        ]
        city = getCity(response)
        if (!city) {
          city = "x"
          answer = "In which city?"
        } else {
          answer = await getWeatherText(city)
        }
      } else if (response.intent === "news.top") {
        answer = await getNewsText()
      } else {
        answer = response.answer || "I don't understand."
      }
    }
    const botElement = document.createElement("div")
    botElement.innerHTML = "<b>Bot</b>: " + answer
    botElement.style.color = "green"
    el("history").appendChild(botElement)
    if (synthVoice && recognizing) synthVoice(answer)
  }

  // Add form submit event listener
  document.forms[0].onsubmit = onMessage

  // if speech recognition is supported then add elements for it
  if (recognition) {
    // add speak button
    const speakElement = document.createElement("button")
    speakElement.id = "speak"
    speakElement.innerText = "Speak!"
    speakElement.onclick = e => {
      e.preventDefault()
      recognition.start()
    }
    document.forms[0].appendChild(speakElement)

    // add "interim" element
    const interimElement = document.createElement("div")
    interimElement.id = "interim"
    interimElement.style.color = "grey"
    document.body.appendChild(interimElement)

    // configure continuous speech recognition
    recognition.continuous = true
    recognition.interimResults = true
    recognition.lang = "en-US"

    // switch to listening mode
    recognition.onstart = function () {
      recognizing = true
      el("speak").style.display = "none"
      el("send").style.display = "none"
      el("message").disabled = true
      el("message").placeholder = "Listening..."
    }

    recognition.onerror = function (event) {
      alert(event.error)
    }

    // switch back to type mode
    recognition.onend = function () {
      el("speak").style.display = "inline-block"
      el("send").style.display = "inline-block"
      el("message").disabled = false
      el("message").placeholder = "Type your message"
      el("interim").innerText = ""
      clearTimeout(timer)
      onMessage()
      recognizing = false
    }

    // speech recognition result event;
    // append recognized text to the form input and display interim results
    recognition.onresult = event => {
      clearTimeout(timer)
      timer = setTimeout(onMessage, MESSAGE_DELAY)
      let transcript = ""
      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          let msg = event.results[i][0].transcript
          if (!el("message").value) msg = capitalize(msg.trimStart())
          el("message").value += msg
        } else {
          transcript += event.results[i][0].transcript
        }
      }
      el("interim").innerText = transcript
    }
  }
})

async function getWeatherText(city) {
  const weatherURL = new URL("https://api.openweathermap.org/data/2.5/weather")

  weatherURL.searchParams.set("q", city)
  weatherURL.searchParams.set("APPID", "YOUR_WEATHER_KEY")
  weatherURL.searchParams.set("units", "metric")

  const resp = await fetch(weatherURL.toString())
  // the request fails for an unknown city or an invalid key
  if (!resp.ok) return `Sorry, I can't get the weather for ${city}.`
  const data = await resp.json()
  return `There will be ${data.weather[0].description} today in ${data.name}. Currently, the temperature is ${data.main.temp} \u2103 (feels like ${data.main.feels_like} \u2103).`
}

function getCity(response) {
  if (!response.entities) return ""
  const entity = response.entities.find(x => x.entity === "city")
  if (!entity) return ""
  return removePunctuation(entity.utteranceText)
}

function removePunctuation(x) {
  return x
    .replace(
      /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g,
      " "
    )
    .trim()
}

async function getNewsText() {
  const newsURL = new URL("https://newsapi.org/v2/top-headlines")

  newsURL.searchParams.set("sources", "bbc-news")
  newsURL.searchParams.set("apiKey", "YOUR_NEWS_KEY")

  const resp = await fetch(newsURL.toString())
  const data = await resp.json()
  let text = ""
  for (let i = 0; i < data.articles.length && i < 5; i++) {
    const article = data.articles[i]
    text +=
      '<a href="' +
      article.url +
      '" target="_blank"><b>' +
      article.title +
      ".</b></a><br />" +
      article.description +
      "<br />"
  }
  return text
}
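
Note that getNewsText puts titles and descriptions from the API response straight into innerHTML, and the same markup is then passed to the speech synthesizer. A possible refinement, sketched below, escapes the text before building the HTML and produces a separate plain-text version for speaking. The helper names escapeHtml, newsToHtml and newsToSpeech are assumptions made for this example, and the sketch also assumes that description may be missing for some articles.

```javascript
// Hypothetical helpers for formatting the articles returned by the news API.

// escape characters that have a special meaning in HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
}

// build the chat markup for at most `limit` articles
function newsToHtml(articles, limit = 5) {
  return articles
    .slice(0, limit)
    .map(article => {
      const title = escapeHtml(article.title)
      const description = escapeHtml(article.description || "")
      const url = escapeHtml(article.url)
      return `<a href="${url}" target="_blank"><b>${title}.</b></a><br />${description}<br />`
    })
    .join("")
}

// build a plain-text version (titles only) that reads well when spoken
function newsToSpeech(articles, limit = 5) {
  return articles
    .slice(0, limit)
    .map(article => article.title)
    .join(". ")
}

const sample = [
  { title: "Rain <expected>", description: null, url: "https://example.com/a?x=1&y=2" }
]
console.log(newsToHtml(sample))
console.log(newsToSpeech(sample))
```

In onMessage, the HTML version would go to the history element and the plain version to synthVoice. Escaping does not validate the link target itself, so a stricter version might also check that article.url starts with "https://".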

To make the news API work locally, use an HTTP server (newsapi.org doesn't support requests from local files).

For example, the simple "serve" module:

npm i -g serve
serve .

Open "index.html" in the browser and test it (remember that it should be Chrome if you want the Speech API to work).

Conclusion

We have just implemented a few skills in the simplest way using NLP.js. As you can see, it is not that hard, and for more complex things NLP.js has other powerful features, such as the corpus, actions, slots, and pipelines.

The demo we made can be used to integrate many external APIs, but remember to proxy all sensitive requests through a back-end server to avoid exposing your keys to the public.
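
As a rough idea of what such a proxy could look like, here is a minimal sketch for Node.js 18 or later (which provides a global fetch). The browser would call /weather?q=London on this server instead of OpenWeatherMap, and the server adds the key. The names buildWeatherURL and createProxy, the /weather path, the WEATHER_KEY environment variable, port 8080 and the permissive CORS header are all assumptions made for this example.

```javascript
// Minimal back-end proxy sketch: the API key stays on the server.
const http = require("http")

// assumption: the page is served from another origin (e.g. by "serve"),
// so the proxy allows cross-origin requests
const HEADERS = {
  "Content-Type": "application/json",
  "Access-Control-Allow-Origin": "*"
}

// build the upstream URL; the key is added here, not in the browser
function buildWeatherURL(city, apiKey) {
  const url = new URL("https://api.openweathermap.org/data/2.5/weather")
  url.searchParams.set("q", city)
  url.searchParams.set("APPID", apiKey)
  url.searchParams.set("units", "metric")
  return url
}

function createProxy(apiKey) {
  return http.createServer(async (req, res) => {
    const { pathname, searchParams } = new URL(req.url, "http://localhost")
    const city = searchParams.get("q")
    if (pathname !== "/weather" || !city) {
      res.writeHead(400, HEADERS)
      res.end(JSON.stringify({ message: "use /weather?q=<city>" }))
      return
    }
    try {
      // forward the request and pass the upstream status and body through
      const upstream = await fetch(buildWeatherURL(city, apiKey))
      res.writeHead(upstream.status, HEADERS)
      res.end(await upstream.text())
    } catch (err) {
      res.writeHead(502, HEADERS)
      res.end(JSON.stringify({ message: "upstream request failed" }))
    }
  })
}

// start only when a key is provided, e.g. WEATHER_KEY=... node proxy.js
if (process.env.WEATHER_KEY) {
  createProxy(process.env.WEATHER_KEY).listen(8080)
}
```

In the browser, getWeatherText would then request the proxy URL with only the "q" parameter and no longer contain the key.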

Also, to simplify further extension, it is better to organize our one-file solution as a project. Feel free to try some other APIs or dialogues to extend this speaking bot.

AI Developer
An experienced full-stack software developer with engineer mentality. Dmytro codes mostly with Node.js, Python and Rust, explores and experiments with the latest technologies, such as AI and Deep Learning.