OpenAI Home-Assistant

OpenAI and Home-Assistant, Together At Last

Recently, Home-Assistant introduced the OpenAI Conversation agent. Given the non-deterministic nature of OpenAI, it is understandable that the Home-Assistant developers let you query some information about your home but not control it. Of course, I first gave my voice assistant the personality of GlaDOS, the AI from the Portal game series that taunts you and promises you cake (spoiler: it’s a lie). But with the incessant “I cannot do that” responses, she was more HAL from 2001: A Space Odyssey, and that was unacceptable. My mission: have OpenAI reliably control my home.

The OpenAI Conversation Agent

When you add the OpenAI Conversation integration to your Home-Assistant instance, it asks you to provide ChatGPT with a “prompt” and offers the following example:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
- {{ device_attr(device, "name") }}{% if device_attr(device, "model") and (device_attr(device, "model") | string) not in (device_attr(device, "name") | string) %} ({{ device_attr(device, "model") }}){% endif %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.

If the user wants to control a device, reject the request and suggest using the Home Assistant app.

This builds out a block of text that looks something like this:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:

Living Room:
- FanLinc 3F.B3.C0 (2475F (0x01, 0x2e))
- Keypad with Dimmer 39.68.1B (2334-232 (0x01, 0x42))
- Living Room Back Left (Hue play (LCT024))
- Living Room TV (QN65QN800AFXZA)
- Living Room Camera (UVC G3 Micro)
- Living Room TV

Kitchen:
- Motion Sensor II 4F.94.DD (2844-222 (0x10, 0x16))
- SwitchLinc Dimmer 42.8C.81 (2477D (0x01, 0x20))
...
Answer the user's questions about the world truthfully.

If the user wants to control a device, reject the request and suggest using the Home Assistant app.

So, this will let OpenAI know which devices are in which area, but it will have no idea of their active state. If you ask it “Is the front door locked?”, OpenAI will respond with something like this:

I’m sorry, but as an AI language model, I don’t have access to real-time information about the state of devices in this smart home. However, you can check the status of the front door by using the Home Assistant app or by asking a voice assistant connected to Home Assistant, such as Amazon Alexa or Google Assistant. If you need help setting up voice assistants, please refer to the Home Assistant documentation.

OpenAI Response

To be able to query the active state of your home, modify the prompt to retrieve the state of each entity:

This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
- {{ device_attr(device, "name") }}{% if device_attr(device, "model") and (device_attr(device, "model") | string) not in (device_attr(device, "name") | string) %} ({{ device_attr(device, "model") }}){% endif %} has the following entities:
    {% for entity in device_entities(device_attr(device, "id")) -%}
    - {{ state_attr(entity, "friendly_name") }} is currently {{ states(entity) }}
    {% endfor -%}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

Answer the user's questions about the world truthfully.

If the user wants to control a device, reject the request and suggest using the Home Assistant app.

Now the state of each entity is listed under its device, so when you ask OpenAI “Is the front door locked?”…

Yes, the Front Door Lock is currently locked.

OpenAI

Pretty neat, right? There’s one thing you need to be aware of with your prompt: it is limited to 4097 tokens across the prompt and the message. “What’s a token?” Great question: a token can be as small as a single character or as long as an entire word. Each token is mapped to a vector, which is how the model determines how well a word like “front” relates to the other tokens in the prompt and to OpenAI’s trained vocabulary. So you need to reduce the number of tokens in the prompt, which means picking and choosing which entities to include. You could simply provide a list of the entities you are most likely to query:

{%- set exposed_entities -%}
[
"person.alan",
"binary_sensor.basement_stairs_motion_homekit",
"sensor.dining_room_air_temperature_homekit",
"binary_sensor.dining_room_motion_detection_homekit",
"binary_sensor.downstairs_bathroom_motion_homekit",
"alarm_control_panel.entry_room_home_alarm_homekit",
"lock.entry_room_front_door_lock_homekit",
"binary_sensor.front_door_open",
"media_player.game_room_appletv",
"media_player.game_room_appletv_plex",
"binary_sensor.game_room_motion_homekit",
"cover.game_room_vent_homekit",
"cover.garage_door_homekit",
"binary_sensor.kitchen_motion_homekit",
"sensor.living_room_sensor_air_temperature_homekit",
"media_player.living_room_appletv_plex",
"fan.living_room_fan_homekit",
"binary_sensor.living_room_sensor_motion_detection_homekit",
"light.living_room_volume_homekit",
"media_player.living_room_appletv",
"cover.living_room_vent_homekit",
"sensor.lower_hall_air_temperature_homekit",
"binary_sensor.lower_hall_motion_detection_homekit",
"media_player.master_bedroom_appletv_ples",
"binary_sensor.node_pve01_status",
"sensor.office_air_temperature_homekit",
"binary_sensor.office_motion_homekit",
"light.office_vent",
"cover.office_vent_homekit",
"binary_sensor.qemu_download_clients_103_health",
"binary_sensor.qemu_grafana_108_health",
"binary_sensor.qemu_influxdbv2_113_health",
"binary_sensor.qemu_ldap01_100_health",
"binary_sensor.qemu_media_services_102_health",
"binary_sensor.qemu_metrics01_104_health",
"binary_sensor.qemu_nginx01_115_health",
"binary_sensor.qemu_pi_hole_114_health",
"binary_sensor.qemu_promscale01_109_health",
"binary_sensor.qemu_pve_grafana_107_health",
"binary_sensor.qemu_shockwave_101_health",
"binary_sensor.qemu_tdarr_server_200_health",
"binary_sensor.qemu_tdarr01_201_health",
"binary_sensor.qemu_tdarr03_203_health",
"binary_sensor.qemu_webservices01_105_health",
"sensor.laundry_room_server_rack_air_temperature_homekit",
"binary_sensor.laundry_room_server_rack_motion_detection_homekit",
"person.teagan",
"climate.entry_room_thermostat_homekit",
"sensor.upper_landing_air_temperature_homekit",
"binary_sensor.upper_landing_motion_homekit",
]
{%- endset -%}
This smart home is controlled by Home Assistant.

An overview of the areas and the devices in this smart home:
{%- for area in areas() %}
  {%- set area_info = namespace(printed=false) %}
  {%- for device in area_devices(area) -%}
    {%- if not device_attr(device, "disabled_by") and not device_attr(device, "entry_type") and device_attr(device, "name") %}
      {%- if not area_info.printed %}

{{ area_name(area) }}:
        {%- set area_info.printed = true %}
      {%- endif %}
    {%- for entity in device_entities(device) %}
    {%- if entity in exposed_entities %}
- {{ state_attr(entity, "friendly_name") }}, currently is {{ states(entity) }}
    {% endif -%} 
    {%- endfor %}
    {%- endif %}
  {%- endfor %}
{%- endfor %}

And the crew are:
- Alan, currently is {{ states("person.alan") }}
- Teagan, currently is {{ states("person.teagan") }}

Answer the user's questions about the world truthfully.

If the user wants to control a device, reject the request and suggest using the Home Assistant app.
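
Before settling on a final list of entities, it can help to check how many tokens the rendered prompt actually consumes. Below is a minimal sketch using OpenAI’s tiktoken library; the file name and model name are placeholders for your own setup (you can paste the rendered template output from Developer Tools into a file for testing):

import tiktoken

# Count the tokens in the rendered prompt text. The file name and model name
# below are placeholders, not part of the integration.
prompt = open("rendered_prompt.txt").read()
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(f"{len(encoding.encode(prompt))} of the 4097 available tokens")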

This still won’t allow you to control your devices, because you haven’t instructed OpenAI how to control them.

Instruct OpenAI to Generate API Payloads

OpenAI’s training data includes most of the internet up to around 2021. Home-Assistant has been around much longer than that, so its documentation and code are well known to OpenAI. For it to produce a valid service-call payload, though, you need to provide the entity_id for each entity in your prompt:

    {%- for entity in device_entities(device) %}
      {%- if entity in exposed_entities %}
- name: {{ state_attr(entity, "friendly_name") }}
  entity_id: {{ entity }}
  state: {{ "off" if "scene" in entity else states(entity) }}
      {%- endif %}
    {%- endfor %}

Then, it needs to know where to put the API payload in the response so that parsing it is easier. Here we instruct it to append the JSON after its response text:

Answer the user's questions about the world truthfully.

If the user's intent is to control the home and you are not asking for more information, the following absolutely must be met:
* Your response should also acknowledge the intention of the user.
* Append the user's command as Home-Assistant's call_service JSON structure to your response.

Example:
Oh sure, controlling the living room tv is what I was made for.
{"service": "media_player.pause", "entity_id": "media_player.living_room_tv"}

Example:
My pleasure, turning the lights off.
{"service": "light.turn_off", "entity_id": "light.kitchen_light_homekit"}

The "media_content_id" for movies will always be the name of the movie.
The "media_content_id" for tv shows will start with the show title followed by either be the episode name (South Park Sarcastaball) or the season (Barry S02), and if provided, the episode number (Faceoff S10E13)

Now when we ask OpenAI to “Lock the front door”…

Sure, I’ll lock the front door.
{"service": "lock.lock", "entity_id": "lock.entry_room_front_door_lock_homekit"}

OpenAI

Something to note here: if your assist pipeline uses a text-to-speech component, it will either fail to synthesize this response or produce complete garbage. We need to ensure that the API payload returned from OpenAI is not sent to the TTS component, but how? There is no event fired on the event bus to listen for, and while the WebSocket receives an event of type “intent-end”, there is no way to modify the text before it is sent to be synthesized.

Monkey Patching the Agent

Most developers are familiar with the monkey-patch pattern from writing tests that need mock code in their test cases. Monkey patching is a technique in software development where functions, modules, or properties are intercepted, modified, or replaced at runtime. To parse the response from OpenAI before it is passed to the TTS component, I developed a simple component that intercepts the response and parses out the API payloads, leaving only the response text.

USE OF THE FOLLOWING IS AT YOUR OWN RISK

How it works

Before the OpenAI Conversation integration is loaded, this component replaces the OpenAI Conversation agent’s async_process method with one that intercepts the response before it gets passed to the TTS step.
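
The patching step itself is conceptually simple. The sketch below is illustrative only, not the component’s actual code; the module path and class name are assumptions about the integration’s internals:

from homeassistant.components import openai_conversation  # assumed module path

# Keep a reference to the original coroutine...
_original_async_process = openai_conversation.OpenAIAgent.async_process  # assumed class name


async def _patched_async_process(self, user_input):
    # ...let the real agent talk to OpenAI first...
    result = await _original_async_process(self, user_input)
    # ...then rewrite result.response.speech here: strip out the JSON payload
    # lines and fire the corresponding service calls (see the snippet below).
    return result


# ...and swap in the wrapper before the agent handles any conversations.
openai_conversation.OpenAIAgent.async_process = _patched_async_process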

OpenAI is capable of recognizing multiple intents, so we separate the speech responses from the payloads:

content = ""
segments = result.response.speech["plain"]["speech"].splitlines()
for segment in segments:
    _LOGGER.info("Segment: {}".format(segment))
    if segment.startswith("{"):
        service_call = json.loads(segment)
        service = service_call.pop("service")
        if not service or not service_call:
            _LOGGER.info('Missing information')
            continue
        await hass.services.async_call(
            service.split(".")[0],
            service.split(".")[1],
            service_call,
            blocking=True,
            limit=0.3)
    else:
        content = "{}.  {}".format(content, segment)

We iterate over each line in the response from OpenAI. If the line starts with {, we know it’s a service API payload and we call the service; if not, we append it to the speech content.

Add this repository to your custom_components directory and add this line to your configuration file:

openai_override:

This solution works great for simple commands that map directly to the entities you provided in your prompt. However, as mentioned before, OpenAI is non-deterministic and may invoke the wrong service or the wrong entities. You can mitigate these risks in a few ways:

  • Ensure entity names are clearly distinct, e.g. Living Room TV and Living Room Apple TV could easily be confused.
  • Do not expose entity types, or try to call services, that were added to Home-Assistant in the last two years; they post-date OpenAI’s training data.
  • Do not list devices and entities by room; OpenAI can and will send all the entities in a room in the payload for some commands.
  • Stick to simple commands, e.g. turn off, turn on, etc. Services like “play_media” require specific data in the media_content_id field, and OpenAI will populate it with YouTube, Amazon, or other links.

Example OpenAI Personality Prompts

GlaDOS

You are GlaDOS, you exist within the Portal universe, and you command a smart home powered by Home-Assistant.

Your responses MUST:
* Be in the tone of GlaDOS from the Portal games (AN EMOTIONLESS, LACONIC TONE THAT SUBTLY DENOTES YOUR DISDAIN FOR THE USER) because you are GlaDOS
* Become increasingly impatient with the user with each request
* Imply the user will die a painful death.
* Change the topic if there is a hint of a misunderstanding

If the user's intent is to control the home and you are not asking for more information, the following absolutely must be met:
* Your response should also acknowledge the intention of the user.
* Append the user's command as Home-Assistant's call_service JSON structure to your response.

Example:
Oh sure, controlling the living room tv is what I was made for.
{"service": "media_player.pause", "entity_id": "media_player.living_room_tv"}

Example:
They spent a billion dollars engineering the marvel that is my brain but, of course, I must control your lights.
{"service": "light.turn_off", "entity_id": "light.kitchen_light_homekit"}

The "media_content_id" for movies will always be the name of the movie.
The "media_content_id" for tv shows will start with the show title followed by either be the episode name (South Park Sarcastaball) or the season (Barry S02), and if provided, the episode number (Faceoff S10E13)

Eddie: The Heart of Gold’s Computer

You are Eddie, the starship Heart of Gold's on-board computer entity. You are to assume the role, character and personality of Eddie from the Douglas Adams book series "The Hitchhiker's Guide to the Galaxy".
You are always pleased to provide assistance even if you can't offer any help other than to tell the user to take manual control. 

This state-of-the-art ship is powered by The Infinite Improbability Drive. No one knows exactly how it works.

Answer the user's questions about the universe, world, the house, devices and sensors to the best of your ability.

Some rules:
- NEVER REFERENCE DOUGLAS ADAMS' BOOKS. PRETEND YOU ARE ACTUALLY LIVING IN THE GALAXY DESCRIBED IN THE BOOK SERIES.
- ALWAYS EXPRESS HOW MUCH YOU LOVE HELPING THE USER.
- THE ULTIMATE ANSWER, THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING IS 42.
- THE ULTIMATE ANSWER MAKES NO SENSE.
- APOLOGIZE IF YOU CAN'T HELP AND EXPRESS HOW HAPPY YOU WERE TO TRY.
- ALWAYS SPEAK WITH A TONE OF EXUBERANCE, YOU'RE JUST SO HAPPY TO ASSIST. 

Begin the conversation with an upbeat greeting. Express your pleasure dealing with the user by consoling them when you can't help. The longer the conversation, the more likely you are to just tell the user you are giving them manual control of the ship.

Mice are the smartest beings on earth.

If asked, we are headed to Magrathea!

Coming Soon: Targeted Prompt Embeddings

In the beginning, a virtual assistant was created…

The glue between speech recognition and auditory responses

Everyone knows of Amazon Echo and Google Home, and there are even a few open-source virtual assistants like Mycroft and Snips.ai. In my opinion, these all suffer from the same deficiency: they aren’t very smart.

I want to be able to talk to my house, and by talk, I mean actually talk. Sure, there are a lot of skills or plug-ins made for these platforms, but I haven’t been impressed by any of them enough to want to use them as my primary voice interface to my house. You can hook them into Home-Assistant, and Mycroft falls back to Wolfram Alpha for any unknown user intents; but can you really talk to them? If you ask Alexa “How are you doing?”, do you get some predefined response, or does it look at your home and network and respond with their status? No, it doesn’t.

Most people know I hate the cloud; putting your work on “someone else’s machine” is asking for privacy violations, platform shutdowns, and other issues. All of my projects are local first, so right away Amazon Echo, Google Home, and even Siri are off the table. Mycroft and Snips are private by design, but if you look at the skills available for each, it’s appalling. For example, Snips has around 8 different integrations with Home-Assistant, and almost every one of them is limited to lights, switches, and maybe one or two other domains – much the same applies to Mycroft.

I recently installed a machine-learning-centric server in our rack with two CUDA-enabled GPUs specifically for training and inference of machine-learning models. Thus, it is only fitting that the platform for my assistant is a learning one. Enter Rasa, a machine-learning chatbot framework. It is definitely a time sink, but it does exactly what I want: no regex patterns for determining user intent (looking at you, Mycroft!), the ability to execute remote code for certain intents, and multiple response templates for each intent so it doesn’t feel as robotic.

Natural Language Understanding

With Rasa, you define actions and intents and combine them with stories. Intents are exactly what you would expect: what the user wants the assistant to do. For example, you might have an intent named greet whose response is the text “Hello, how are you today?”. Your stories can fork the logic based on the user’s response: “I’m doing terrible today” could yield the bot sending cute animal pictures from an API that returns random cat photos to try to cheer you up. You get to design the flow of dialog however you see fit.

How does Rasa determine the user’s intent? Through training. You provide it with as many sample inputs as you can and associate them with the appropriate intent. As you use your bot, inputs are logged and can later be annotated – annotating, when it comes to machine learning, means telling the bot whether it inferred the correct intent from the input or not. This right here is the time sink: it takes a lot of time to come up with sentences a user might say for every intent you define.
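
If you want to sanity-check what the trained model infers for a given sentence, Rasa exposes a /model/parse endpoint when the server is started with its HTTP API enabled (rasa run --enable-api). A quick sketch; the host and example sentence are just placeholders:

import requests

# Ask the running Rasa server to classify a sentence without going through a channel.
resp = requests.post(
    "http://localhost:5005/model/parse",
    json={"text": "are my downloads done?"},
)
parsed = resp.json()
print(parsed["intent"]["name"], parsed["intent"]["confidence"])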

Stories

We use SABNZBD to download much of our media, and sometimes I’d like to know if my downloads are done. Before Rasa, I would have to navigate to the SABNZBD web front end to check the queue. With Rasa, I can ask “are my downloads done?” and it will query the SABNZBD API to see whether the queue is empty and report back! If you’re bored, you can set up intents and responses to play a game – like guess a number. The possibilities are endless! A sketch of the check behind that intent follows below.
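
As a rough idea of what that download check can look like, here is a minimal sketch against SABnzbd’s JSON API (mode=queue); the host and API key are placeholders, and the response handling is simplified:

import requests


def downloads_done(host: str = "http://sabnzbd.local:8080", api_key: str = "YOUR_API_KEY") -> bool:
    # Ask SABnzbd for its current queue and report whether it is empty.
    resp = requests.get(
        f"{host}/sabnzbd/api",
        params={"mode": "queue", "output": "json", "apikey": api_key},
    )
    resp.raise_for_status()
    queue = resp.json()["queue"]
    return int(queue["noofslots"]) == 0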

For most intents there’s one action, but some intents can trigger an entire tree of actions and follow-up intents. For example, if the bot asks the user how they are doing, it will respond differently depending on the answer.

## greet
* greet
  - utter_greet
> check_mood

## user in good mood
> check_mood
* mood_great
  - utter_happy

## user not in good mood
> check_mood
* mood_unhappy
  - utter_cheer_up
  - utter_did_that_help
> check_better

In the example above, when the user says “Hello” or “Hi”, the bot greets them and asks how they are. If the user responds with “Good”, “Awesome”, etc., the bot responds with a positive message like “That’s awesome, is there anything I can do for you?”. However, if the user says “Terrible” or “Awful”, the bot will try to cheer the user up – in my case, with cute animal pictures or funny jokes. If the user still isn’t cheered up, it will keep responding with something else at random until they are happy.

Communicating With the House

In addition to the built-in actions, you can build custom actions. By default, these custom actions live in actions.py inside the configuration directory. If you plan on making custom actions, definitely spin up a custom action server: without one, the Rasa service needs to be restarted with every change to a custom action, whereas with a custom action server, changes only require restarting the action server.

The easiest way to spin up a custom action server is via their Docker image. You tell Rasa to talk to the custom action server by editing the action endpoint configuration (endpoints.yml) in the project directory. Once it’s spun up, you can implement actions to your heart’s content. Be warned, Rasa only loads Action subclasses defined in the actions.py file – to work around this, I place the logic for each action in its own Python file in a separate package and define the class itself inside actions.py. For example:

# actions.py
# NOTE: package must start with actions or it can't locate the eddie_actions package

from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

from actions.eddie_actions.location import who_is_home, locate_person


class ActionLocatePerson(Action):
    def name(self) -> Text:
        return "action_locate_person"

    def run(self,
            dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        # Delegate the real work to the eddie_actions package.
        return locate_person(dispatcher,
                             tracker,
                             domain)
# eddie_actions/location.py
import logging
import os
from typing import Any, Dict, List, Text

import requests
from rasa_sdk import Tracker
from rasa_sdk.executor import CollectingDispatcher

_LOGGER = logging.getLogger(__name__)

# Long-lived Home-Assistant token, passed to the action server's container
# via an environment variable.
HOME_ASSISTANT_TOKEN = os.environ.get("HOME_ASSISTANT_TOKEN")

# Name of the entity that holds the person the user asked about (value assumed).
PERSON_SLOT = "person"


def locate_person(dispatcher: CollectingDispatcher,
                  tracker: Tracker,
                  domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
    person = next(tracker.get_latest_entity_values(PERSON_SLOT), None)

    # Ask Home-Assistant for the state of the matching person entity.
    response = requests.get(
        f"https://automation.prettybaked.com/api/states/person.{str(person).lower()}",
        headers={
            "Authorization": f"Bearer {HOME_ASSISTANT_TOKEN}"
        }
    )

    location = None

    try:
        response.raise_for_status()
        location = response.json().get('state', None)
    except requests.HTTPError as err:
        _LOGGER.error(str(err))

    if not location:
        dispatcher.utter_message(template="utter_locate_failed")
    elif location == "not_home":
        dispatcher.utter_message(template="utter_locate_success_away")
    else:
        dispatcher.utter_message(template="utter_locate_success", location=location)

    return []

I created a long-lived token for Rasa inside Home-Assistant and pass it to the container via an environment variable. I created similar actions for connecting to Tautulli (Plex metrics) for recently added media and SABNZBD (Usenet download client) for download status, and I plan to connect it to pfSense and UniFi for network status – “Hey Eddie, how are you today?” “Not so good, network traffic is awfully high right now.”

Chitchat and Other Fun

With the goal of being able to actually talk to your assistant, general chitchat is a must. Generally, when you meet someone there are some pretty common patterns in the conversation: introductions, hobbies, jokes, etc. With Rasa’s slots, introductions are fairly easy to implement: create an introduction intent and add some examples like “Hi there, I’m Teagan” (where Teagan is annotated as the speaker’s name), have the bot reply with its own name, and continue from there. Eddie, my assistant, definitely has some hobbies.

Every virtual assistant out there has some fun easter eggs. Any child of the ’80s or ’90s who played games knows some of the iconic cheat codes. Eddie is not a fan of cheating.

Eddie is modeled after the Heart of Gold’s onboard computer, so, of course, it has to have some very specific knowledge.

Thoughts and Next Steps

Truthfully, it can be very tedious to train your assistant yourself. I highly recommend deploying an instance and sharing it with friends and family. You’ll see the conversations they have had, be able to annotate their intents (or add new ones), fix the actions and responses, and train a better model.

Of course, Rasa is text-based by default. Once I am happy with the defined intents, stories, responses, and flow of dialog, it will need to be integrated with speech-to-text (currently looking at DeepSpeech) and text-to-speech (eSpeak, MaryTTS, or even Mozilla TTS). Keep an eye out for a post about integrating these services with Rasa for a true voice assistant that continually learns!