Controlling your TV with voice and Raspberry Pi

What’s cooler than controlling your TV with voice commands?
A billion dollars!

But for this post,
it’s controlling your TV with voice commands through Raspberry Pi!
(and a billion dollars…)

So, down to business.

Full code can be found on GitHub.

Hardware requirements:

  1. TV set with HDMI-CEC enabled (Samsung calls it Anynet+) - Check your TV's settings menu and make sure HDMI-CEC control is switched on.
  2. Raspberry Pi - I recommend the CanaKit Raspberry Pi 2. For this tutorial, the Raspberry Pi should be connected to the TV set via HDMI.
  3. USB microphone - I used a C-Media USB microphone.

Install USB microphone on Raspberry Pi

Just plugging in the USB microphone is not enough on the Raspberry Pi. You also have to enable it in the Raspberry Pi OS before you can use it:

1) Edit /etc/modprobe.d/alsa-base.conf with your favorite editor

1.1) Change the line options snd-usb-audio index=-2 to options snd-usb-audio index=0
1.2) On the next line add: options snd_bcm2835 index=1

2) Reboot your Raspberry Pi

sudo reboot

3) From the Raspberry Pi terminal, run lsusb.
Make sure that you can see your device in the list (in my case “C-Media Electronics, Inc. CM108 Audio Controller”):

pi@raspberrypi ~ $ lsusb
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
Bus 001 Device 005: ID 0781:5575 SanDisk Corp.
Bus 001 Device 006: ID 0d8c:013c C-Media Electronics, Inc. CM108 Audio Controller
Bus 001 Device 007: ID 046d:c31c Logitech, Inc. Keyboard K120 for Business

4) Run “cat /proc/asound/cards” and verify that your USB mic is listed as card 0:

pi@raspberrypi ~ $ cat /proc/asound/cards
 0 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.4, full spe
 1 [ALSA           ]: bcm2835 - bcm2835 ALSA
                      bcm2835 ALSA
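
If you'd rather check this from Python than by eye, here is a minimal sketch that parses the text of /proc/asound/cards and reports whether card 0 uses the USB-Audio driver. The function names and the SAMPLE string (copied from the output above) are my own illustration and not part of the original script. It assumes the file keeps the layout shown here.

```python
import re


def parse_asound_cards(text):
    """Return {card_index: description} parsed from /proc/asound/cards text."""
    cards = {}
    for line in text.splitlines():
        # Card header lines look like:
        # " 0 [Device         ]: USB-Audio - USB PnP Sound Device"
        match = re.match(r"\s*(\d+)\s+\[.*?\]:\s*(.+)", line)
        if match:
            cards[int(match.group(1))] = match.group(2).strip()
    return cards


def usb_mic_is_card_zero(text):
    """True when card 0 is handled by the USB-Audio driver."""
    return parse_asound_cards(text).get(0, "").startswith("USB-Audio")


# Sample text taken from the terminal output above.
SAMPLE = """\
 0 [Device         ]: USB-Audio - USB PnP Sound Device
                      C-Media Electronics Inc. USB PnP Sound Device at usb-3f980000.usb-1.4, full spe
 1 [ALSA           ]: bcm2835 - bcm2835 ALSA
                      bcm2835 ALSA
"""

print(usb_mic_is_card_zero(SAMPLE))
```

On the Pi itself you would read the real file with open("/proc/asound/cards") and pass its contents to usb_mic_is_card_zero instead of SAMPLE.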

Install requirements (on the Raspberry Pi)

Install basics

sudo apt-get install libcec-dev build-essential python-dev

Install pyaudio

1) Clone the pyaudio git repository

git clone http://people.csail.mit.edu/hubert/git/pyaudio.git

2) Go to the pyaudio folder and run the install command

cd pyaudio
sudo python setup.py install

Install flac

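FLAC is a lossless audio compression format.
The speech recognition library needs the flac encoder to prepare your recordings for the Google Speech Recognition API, so the script will not work without it.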

sudo apt-get install flac

Install required python modules

The last step before creating our script is to install a few Python packages. I usually like to create a virtual environment for projects like this.

In your virtual environment install the following packages:

  1. SpeechRecognition - Library for performing speech recognition with the Google Speech Recognition API.
  2. pyaudio - Python bindings for PortAudio, the cross-platform audio I/O library.
  3. cec (python-cec) - Python bindings for libcec. This will be used to control the TV through HDMI.

pip install SpeechRecognition
pip install --allow-external pyaudio --allow-unverified pyaudio pyaudio
pip install cec
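
Before moving on, it can help to confirm that the three packages are actually importable from the environment you just set up. The sketch below is one way to do that. It assumes Python 3 (for importlib.util.find_spec) and that the import names are speech_recognition, pyaudio and cec. The helper name missing_modules is my own.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the three packages installed above.
REQUIRED = ["speech_recognition", "pyaudio", "cec"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are importable")
```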

Creating the script

This is a very simple script, which performs the following steps:

  1. Initialize the required components
  2. Start running an infinite loop until there’s a stop command:
    1. Record audio through the microphone
    2. Translate the audio to text
    3. Check if text contains a command
    4. Perform the required command
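
The outline above can be sketched as a small loop that receives its three moving parts as plain functions. This is only an illustration of the control flow, not the final script: run_loop, listen, transcribe and handle are hypothetical names, and the demo feeds in canned strings instead of real audio.

```python
def run_loop(listen, transcribe, handle):
    """Skeleton of the main loop.

    listen()          -> returns recorded audio
    transcribe(audio) -> returns text, or None when nothing was understood
    handle(text)      -> returns False to stop the loop, anything else to continue
    """
    while True:
        audio = listen()
        text = transcribe(audio)
        if text is None:
            # Nothing recognizable was said, so record again.
            continue
        if handle(text) is False:
            break


# Demo with stand-ins: each "recording" is already a string (or None).
phrases = iter(["turn tv on", None, "close program", "never reached"])
seen = []


def fake_handle(text):
    seen.append(text)
    return text != "close program"


run_loop(lambda: next(phrases), lambda audio: audio, fake_handle)
print(seen)
```

Keeping the loop separate from the microphone and the TV like this also makes it possible to try the logic on a machine that has neither.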

Step 1: Initialize required components

We’ll start by importing the required packages:

import cec
import speech_recognition as sr

We need to define the commands that we want to use. For this example I chose the following commands:

  1. Turn the TV on
  2. Turn the TV off
  3. Close program

TURN_TV_ON = "turn tv on"
TURN_TV_OFF = "turn tv off"
CLOSE_PROGRAM = "close program"
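
Recognized text rarely arrives in exactly the form of these constants: the capitalization varies and the phrase may be surrounded by other words. A small matching helper, sketched below, lowercases the text and collapses whitespace before looking for each phrase. The names COMMANDS and match_command are my own additions for illustration.

```python
TURN_TV_ON = "turn tv on"
TURN_TV_OFF = "turn tv off"
CLOSE_PROGRAM = "close program"

COMMANDS = (TURN_TV_ON, TURN_TV_OFF, CLOSE_PROGRAM)


def match_command(text):
    """Return the first command phrase contained in `text`, or None."""
    # Lowercase and collapse runs of whitespace so "Turn  TV on" still matches.
    normalized = " ".join(text.lower().split())
    for command in COMMANDS:
        if command in normalized:
            return command
    return None


print(match_command("Please  Turn TV Off now"))
print(match_command("what's the weather"))
```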

Initialize CEC control

cec.init()

Create speech recognition object

r = sr.Recognizer()

Step 2: Record and analyze audio

In an infinite loop, record audio through the microphone

with sr.Microphone() as source:
    audio = r.listen(source)
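
If the room is noisy, recognition may improve when the recognizer calibrates itself before listening. The sketch below assumes a recent SpeechRecognition release, where Recognizer offers adjust_for_ambient_noise(source, duration=...) and listen(source, phrase_time_limit=...). The helper name record_phrase, its default values and the stub class are my own, and the stub only stands in for a real recognizer so the snippet runs without a microphone.

```python
def record_phrase(recognizer, source, calibrate_seconds=1.0, max_phrase_seconds=5.0):
    """Calibrate for background noise, then record a single phrase."""
    # Sample the background noise level for a moment before listening.
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # Stop recording after max_phrase_seconds even if the speaker keeps talking.
    return recognizer.listen(source, phrase_time_limit=max_phrase_seconds)


class StubRecognizer:
    """Stand-in that records which calls were made (illustration only)."""

    def __init__(self):
        self.calls = []

    def adjust_for_ambient_noise(self, source, duration=1):
        self.calls.append(("adjust", duration))

    def listen(self, source, phrase_time_limit=None):
        self.calls.append(("listen", phrase_time_limit))
        return "audio-data"


stub = StubRecognizer()
result = record_phrase(stub, source=None)
print(stub.calls)
```

In the real script you would call record_phrase(r, source) inside the with sr.Microphone() as source: block.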

Translate the audio to text

command = r.recognize(audio)
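
A note on versions: r.recognize(audio) is the call from the SpeechRecognition release I used in 2015. As far as I know, newer releases expose the Google recognizer as recognize_google instead and report unintelligible audio with their own exception types rather than LookupError, so check the documentation for your installed version. The sketch below picks whichever method the recognizer object has. The transcribe helper and the two stub classes are hypothetical and exist only to show the idea.

```python
def transcribe(recognizer, audio):
    """Call recognize_google when available, otherwise fall back to recognize."""
    method = getattr(recognizer, "recognize_google", None)
    if method is None:
        # Older releases only have the plain recognize() method.
        method = recognizer.recognize
    return method(audio)


class OldStyleRecognizer:
    """Stand-in for an older release (illustration only)."""

    def recognize(self, audio):
        return "old:" + audio


class NewStyleRecognizer:
    """Stand-in for a newer release (illustration only)."""

    def recognize_google(self, audio):
        return "new:" + audio


print(transcribe(OldStyleRecognizer(), "turn tv on"))
print(transcribe(NewStyleRecognizer(), "turn tv on"))
```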

Check if the command is to turn the TV on

if TURN_TV_ON in command.lower():
    # Get tv device and turn it on
    tv = cec.Device(0)
    tv.power_on()

Check if the command is to turn the TV off

if TURN_TV_OFF in command.lower():
    # Get tv device and turn it off
    tv = cec.Device(0)
    tv.standby()

Check if the command is to stop the script

if CLOSE_PROGRAM in command.lower():
    # Stop program
    break
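
The three checks above follow the same pattern, so as the list of commands grows it may be tidier to keep them in a dictionary that maps each phrase to an action. Here is a minimal sketch of that idea. build_actions, handle_command and FakeTV are names I made up for this example. On the Pi you would pass in the object returned by cec.Device(0), which provides the power_on() and standby() methods used above.

```python
def build_actions(tv):
    """Map each spoken phrase to the TV method that should run."""
    return {
        "turn tv on": tv.power_on,
        "turn tv off": tv.standby,
    }


def handle_command(text, actions):
    """Run the matching action. Return False when asked to close, else True."""
    lowered = text.lower()
    if "close program" in lowered:
        return False
    for phrase, action in actions.items():
        if phrase in lowered:
            action()
            break
    return True


class FakeTV:
    """Stand-in for the CEC device so the sketch runs without a TV."""

    def __init__(self):
        self.log = []

    def power_on(self):
        self.log.append("on")

    def standby(self):
        self.log.append("standby")


tv = FakeTV()
actions = build_actions(tv)
keep_going = handle_command("Turn TV on", actions)
handle_command("turn tv off please", actions)
stop = handle_command("close program", actions)
print(tv.log, keep_going, stop)
```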

This is what the full code looks like:

import cec
import speech_recognition as sr

TURN_TV_ON = "turn tv on"
TURN_TV_OFF = "turn tv off"
CLOSE_PROGRAM = "close program"

def main():
    # Create cec control
    cec.init()

    # Create speech recognizer object
    r = sr.Recognizer()

    # Create infinite loop
    while True:
        # Record sound
        with sr.Microphone() as source:
            print("Recording")
            audio = r.listen(source)

        try:
            # Try to recognize the audio
            command = r.recognize(audio)
            print("Detected speech:{0}".format(command))
            # Check the current command
            if TURN_TV_ON in command.lower():
                # Get tv device and turn it on
                tv = cec.Device(0)
                tv.power_on()
            elif TURN_TV_OFF in command.lower():
                # Get tv device and turn it off
                tv = cec.Device(0)
                tv.standby()
            elif CLOSE_PROGRAM in command.lower():
                # Stop program
                break
        except LookupError:
            # In case of an exception
            print("Could not translate audio")

if __name__ == '__main__':
    main()

When you run the script, say one of the commands and check the output to see whether your speech was transcribed correctly and the matching command ran.

That’s it!

So, what’s cooler than controlling your TV with voice commands?
A billion dollars!

Written on August 14, 2015