Pushing Object Detection to Home-Assistant with Coral EdgeTPU

Forget the TensorFlow Component

Lots of awesome developers have added image processing components to Home-Assistant’s integration list. In fact, there are now 9 different image processors (a few that do more than just object detection) built right into Home-Assistant. I wrote the first version of the OpenCV image processing component, which was not the best given my lack of experience in Python at the time, but it obviously triggered some ideas in others – which is one of the amazing parts of open source software! However, most computers (and servers) are just not built to perform inference analysis, so the components that are self-hosted, like the OpenCV integration, just aren’t very efficient.

Enter Coral, the EdgeTPU from Google

Plenty of providers offer cloud-based inference engines, but Google has also released solutions for performing inference at the “edge” (i.e. locally): the Google Coral Dev Board and the USB Accelerator (see all products here). The Coral Dev Board is similar to a Raspberry Pi with an onboard TPU, and the USB Accelerator is an external TPU that connects over USB 3. A TPU is a Tensor Processing Unit – hardware specifically designed for processing tensors, or n-dimensional matrices. Tensors are basically mathematical representations of real-world patterns; they are, in basic terms, used for pattern matching. There are some alternatives to Google’s EdgeTPU, like the Intel Neural Compute Stick, but none seem to have the community backing that the EdgeTPU does.

Pushing State to Home-Assistant

My house was built around 1970, and it’s quite obvious that any “repairs” (the term is used loosely) by the previous owners were done by those without the know-how. The doorbell button looks like it’s from the ’70s. I experimented with a Z-Wave doorbell, but it just wasn’t loud enough – my partner works from home and his office is in the basement. We already had a UniFi camera mounted above the door, so why not let the house tell us when someone was at the door? I implemented the OpenCV integration with Home-Assistant, then tried the TensorFlow integration; I had to throw an extra 4 cores at the VM to get even semi-reliable results. When it worked, it took a few seconds to trigger a notification – which frustrated our DoorDash drivers – and by the time a package was detected, the driver had already left the frame, so we had no idea which delivery company had dropped it off.

Why was the reliability of the integrations such an issue? Well, for one, the inference was running on Xeon processors (not exactly top of the line for this workload) and, two, those integrations were polling – only updating when the loop requested their state. I lived with it but hated it.

When I discovered the EdgeTPU, I ordered both a Coral Dev Board and a USB Accelerator; I had plenty of Raspberry Pis lying around and was sure I could put one to use. Of course, the idea got backlogged behind all of my other projects, like implementing my Distributed, Modular State Machine. I finally got around to it this past weekend.

The 1st Pass

I wanted the Raspberry Pi to push the state to Home-Assistant in order to get more immediate results. So the application was designed to consume an RTSP video stream and perform object detection on the frames. Watching the logs, I couldn’t believe how fast it was; each loop (retrieve frame, process, and push the state to Home-Assistant) appeared to take around a second.

The logs, however, were very misleading. While I watched the logs and Home-Assistant, I stepped in front of the camera. It took a couple of seconds to detect that a person was in the frame, and when I left the camera’s view, it kept reporting a person for five or six seconds. It was way better than using the Home-Assistant integrations, but it definitely puzzled me.

OpenCV VideoCapture Implementation

The code was written to run 1:1, one thread to one camera. The camera stream was fed to OpenCV’s VideoCapture class and continually looped over while the connection was open. On some forums I found the answer to why I was experiencing such delay: the VideoCapture::read() function returns the next frame in the buffer, not the most recent frame. This wouldn’t be much of a problem if your processing could keep up with the frame rate of the video stream; if it can’t keep up, you experience lag, as I did.
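As a minimal sketch of that naive loop – the stream URL is a placeholder and run_inference is a hypothetical stand-in for the actual processing:

import time

import cv2

def run_inference(frame):
    time.sleep(0.5)  # stand-in for processing that is slower than the frame interval

video_stream = cv2.VideoCapture("rtsp://camera.example/stream")  # hypothetical URL
while video_stream.isOpened():
    success, frame = video_stream.read()  # next buffered frame, not the newest one
    if not success:
        break
    run_inference(frame)  # each slow iteration falls further behind real time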

Attempting to work around this limitation, I found you could retrieve the number of frames in the buffer and set the current frame index. Unfortunately, this led to around 4-5 seconds per frame – still better than the Home-Assistant integrations, but completely unacceptable for replacing a doorbell! The actual answer came from somewhere deep in Stack Overflow (I’ll link it if I can ever find where I found it).

Fun with Thread Synchronization

Have two threads for each video stream: the first continually pops the oldest frame off of the buffer, and the second processes the, hopefully, current frame. Since there’s a shared resource involved, you can’t have both threads reading from the VideoCapture’s buffer at the same time; no, you need to synchronize access to the shared resource, otherwise you run into concurrency issues. Concurrency issues, depending on the context and implementation, could crash your application, cause a thread to grab expired data, or even grab data that mutates later!

So we have two threads per video stream: the “Grabber” thread and the “Processor” thread. Grabbing a frame from the buffer and discarding it takes essentially no time at all, while processing a frame could take a bit (the term “bit” is used loosely here). So which thread should be the one to tell the other “Hey dude, it’s my turn!”?

Whenever a thread wants to read from the buffer, it must tell the other “Hol’ up, yo!” to prevent the concurrency issues mentioned above. While one thread is chatting away with the buffer, the other is waiting… patiently, or impatiently – kinda depends on how late it is to its next appointment. Imagine the UI thread is the one waiting: all of a sudden the user sees a frozen screen (and most likely bitches loudly to their cube mates). For this reason, threads should quit the chit-chat and let the next thread do what it needs to do!

To accomplish this behavior, we use a shared Lock: a local, domain-specific object that identifies who currently has the right to access a resource shared across separate threads. A Lock, while similar, is different from a Mutex, which usually relates to system processes – though some people (and languages) use the terms interchangeably (they probably mean Semaphore). When a thread wants to access a shared resource, it attempts to acquire the Lock, waiting – sometimes impatiently – until it acquires it; precisely the reason a lock should be released as soon as possible.

Back to the topic at hand: as soon as the Processor thread has received its frame from the buffer, it relinquishes the lock and gets to work processing – while the Grabber thread happily gifts the buffer’s oldest frames to the garbage collector – until the Processor needs its next fix of frames.

What the hell did I just read?

Exactly how to handle a FIFO buffer shared between discrete threads…

The Grabber thread:

while self._video_stream.isOpened():
    self.lock.acquire()  # Blocking call; wait for the lock to be free
    self._video_stream.grab()  # Pull the oldest frame off the buffer and discard it
    self.lock.release()  # Put the lock back up for grabs

The Processor thread:

while self._video_stream.isOpened():
    self.lock.acquire()  # Blocking call; wait for the lock to be free
    frame = self._retrieve_frame()
    self.lock.release()  # Put the lock back up for grabs

    if frame is None:
        time.sleep(FRAME_FAILURE_SLEEP)
        continue  # Stop at the next light

    detection_entity = self._process_frame(frame)

    self._set_state(detection_entity)
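Putting it all together, here is a minimal, self-contained sketch of the pattern – the stream URL and the process_frame stub are hypothetical stand-ins, not my exact application code:

import threading
import time

import cv2

STREAM_URL = "rtsp://camera.example/stream"  # hypothetical URL
FRAME_FAILURE_SLEEP = 0.1  # seconds to back off when no frame is available

lock = threading.Lock()
video_stream = cv2.VideoCapture(STREAM_URL)

def process_frame(frame):
    print(frame.shape)  # stand-in for inference and pushing state to Home-Assistant

def grabber():
    # Continually pull the oldest frame off the buffer and discard it,
    # so the processor always retrieves the freshest frame available.
    while video_stream.isOpened():
        with lock:
            video_stream.grab()

def processor():
    while video_stream.isOpened():
        with lock:
            success, frame = video_stream.retrieve()  # decode the last grabbed frame
        if not success:
            time.sleep(FRAME_FAILURE_SLEEP)
            continue
        process_frame(frame)

threading.Thread(target=grabber, daemon=True).start()
processor()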

Making Home-Assistant Distributed, Modular State Machine Part 3

Minimalist Diagram of Home-Assistant Distributed

Custom Component: remote_instance

Original Component

Lukas Hetzenecker posted this component to the Home-Assistant community forums, but I felt it was lacking in a few places.

Multiple Instances with Z-Wave

I started to notice my Z-Wave network was a little slow, so I decided to add a slave instance of Home-Assistant with a second Z-Wave controller; however, I quickly discovered node-id collisions. Attributes from Z-Wave devices that shared the same node-id merged together, and it caused problems when trying to invoke services on the Z-Wave domain.

Addressing Attribute Merging

The component accepts an entity_prefix configuration value, intended to prevent entity_id collisions between instances. Expanding on this concept, I made the component prefix the node_id attribute (when present) on outgoing state change events, which immediately rectified the attribute merging. When a service is invoked against the Z-Wave domain, instances whose prefix does not match the node_id ignore the call; the appropriate instance recognizes that it is intended to run there, strips the prefix from the node_id attribute, and propagates the service call to its event loop.
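A rough sketch of the idea – this is not the component’s actual internals, and the prefix value and function names are hypothetical:

ENTITY_PREFIX = "hubs2_"  # hypothetical entity_prefix for this instance

def prefix_outgoing_attributes(attributes):
    # Prefix node_id on outgoing state change events so devices with
    # colliding node-ids no longer merge on the master.
    if "node_id" in attributes:
        attributes["node_id"] = f"{ENTITY_PREFIX}{attributes['node_id']}"
    return attributes

def handle_zwave_service(service_data):
    # Only the instance whose prefix matches handles the service call.
    node_id = str(service_data.get("node_id", ""))
    if not node_id.startswith(ENTITY_PREFIX):
        return None  # not ours; another instance will recognize it
    service_data["node_id"] = node_id[len(ENTITY_PREFIX):]  # strip the prefix
    return service_data  # propagate to the event loop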

Home-Assistant Distributed API Routes

A HomeAssistantView allows a platform/component to register an endpoint with the Home-Assistant REST API – used, for example, by components that rely on OAuth, like Spotify or Automatic. OAuth requires a redirect URI for the component to receive the access token from the service, which would mean exposing to the web every distributed Home-Assistant instance that needs this type of configuration. Exposing a single instance reduces the possible attack surface on your network and simplifies DNS configuration if you use it. Unfortunately, when a HomeAssistantView is registered, no event is sent to the event bus – an event that would allow the master instance to register a corresponding route.

Following the ideas behind Kubernetes Ingress and Redis Sentinel, when a remote instance registers its own API route, it notifies the proxy (the master instance in this case). The master registers the endpoint in its own router and, when a request to the endpoint arrives, performs a fan-out search for the instance that can appropriately answer it. Why the fan-out? Well, most endpoints are variable-path, e.g. /api/camera_proxy/{entity_id} or /api/service/{domain}/{entity}, which may apply to multiple instances. If an instance responds to the proxy request with a 200, the master instance registers an exact-match proxy for it. For example, if both the security and appliance instances register routes of /api/camera_proxy/{entity_id} and a request comes in for /api/camera_proxy/camera.front_door, the security instance responds with HTTP 200; from that point forward, any request for /api/camera_proxy/camera.front_door is sent straight to the security instance.
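Here is a condensed sketch of that fan-out behavior using aiohttp (which Home-Assistant’s HTTP stack is built on); the instance URLs are examples and the logic is simplified from the component’s actual code:

import aiohttp
from aiohttp import web

INSTANCES = ["http://security.local:8123", "http://appliances.local:8123"]  # examples
exact_routes = {}  # path -> instance base URL, learned from successful fan-outs

async def proxy(request):
    path = request.path
    # Use the pinned instance if this exact path has been seen before,
    # otherwise fan out to every instance that registered the route.
    candidates = [exact_routes[path]] if path in exact_routes else INSTANCES
    async with aiohttp.ClientSession() as session:
        for base_url in candidates:
            async with session.get(base_url + path) as resp:
                if resp.status == 200:
                    exact_routes[path] = base_url  # exact-match proxy from now on
                    return web.Response(body=await resp.read(), status=200)
    return web.Response(status=404)

app = web.Application()
app.router.add_get("/api/camera_proxy/{entity_id}", proxy)
web.run_app(app, port=8123)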

Check out the appropriate repositories below:

Making Home-Assistant Distributed, Modular State Machine Part 2

Minimalist Diagram of Home-Assistant Distributed

Home-Assistant Distributed: Deployment with Ansible

Why Ansible

Ansible is an open-source state management and application deployment tool; we use it for all of our server configuration management in our home lab. It uses human-readable YAML configuration files that define the tasks, variables, and files to be deployed to a machine, making it fairly easy to pick up, and, when written correctly, playbooks are idempotent (running the same playbook again has no side-effects) – perfect for the deployment of our Home-Assistant Distributed Cluster.

Ansible uses inventory files that define the groups that hosts are members of. For this project, I created a group under our home lab parent group named “automation-cluster”, with child groups for each slave instance, e.g. tracker, hubs, settings, etc. This pattern facilitates scalability: deploying the cluster to a single host or to multiple hosts makes no difference to the playbook.
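A hypothetical YAML inventory sketch of that layout (group nesting as described; host names are examples):

all:
  children:
    homelab:
      children:
        automation-cluster:
          children:
            tracker:
              hosts:
                automation01.lab.example.com:
            hubs:
              hosts:
                automation01.lab.example.com:
            settings:
              hosts:
                automation01.lab.example.com: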

Role: automation-cluster

Ansible operates using roles. A role defines a set of tasks to run, along with the variables, files, and file templates needed to configure the target host.

Tasks

The automation-cluster role is fairly simple (a sketch of the first two tasks follows the list):

  1. Clone the git repository for the given slave instances being deployed
  2. Generate a secrets.yaml file specific to each instance, providing only the necessary secrets for the instance
  3. Modify the file modes of the configuration directory for the service user account
  4. Run instance-specific tasks, e.g. create the plex.conf file for the media instance
  5. Generate the Docker container configurations for the instances being deployed
  6. Pull the Home-Assistant Docker image and run a Docker container for each slave instance
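As a rough sketch, the first two tasks might look something like this – the paths are assumptions, and the item properties mirror the instance_arguments variable described below:

- name: Clone the instance configuration repository
  git:
    repo: "{{ item.configuration_repo }}"
    dest: "/opt/home-assistant/{{ item.name }}"
  loop: "{{ instance_arguments }}"

- name: Generate the per-instance secrets.yaml
  template:
    src: secrets.yaml.j2
    dest: "/opt/home-assistant/{{ item.name }}/secrets.yaml"
  loop: "{{ instance_arguments }}"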

Variables (vars)

To make the automation-cluster role scalable for adding new instances in the future, I created a list variable named instance_arguments (an example entry follows the list). Each element of the list represents one instance with the following properties:

  • Name – The name of the instance which matches the automation-cluster child group name mentioned above
  • Description – The description to be applied to the Docker container
  • Port – The port to be exposed by the Docker container
  • Host – The host name or IP used for connecting the cluster instances together
  • Secrets – The list of Ansible variables to be inserted into the secrets.yaml file
  • Configuration Repo – The URL to the git repository for the instance configuration
  • Devices – Any devices to be mounted in the Docker container
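As a hypothetical example, one element of instance_arguments might look like this (all values are illustrative):

instance_arguments:
  - name: tracker
    description: "LAN-based presence detection, zones, and GPS tracking"
    port: 8124
    host: automation01.lab.example.com
    secrets:
      - owntracks_password
    configuration_repo: "https://github.com/example/hass-tracker-config.git"
    devices: []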

The other variables defined are used for populating template files and the secrets.yaml files.

Defaults

In Ansible, defaults are variables that are meant to be overridden – by group variables, host variables, or variables passed on the command line.

Templates

Templates are files rendered via the Jinja2 templating engine – perfect for files that differ between groups or hosts based on Ansible variables. For secrets.yaml, my template looks like this:

#Core Config Variables
{% for secret in core_secrets %}
{{ secret }}: {{ hostvars[inventory_hostname][secret] }}
{% endfor %}
#Instance Config Variables
{% for secret in item.secrets %}
{{ secret }}: {{ hostvars[inventory_hostname][secret] }}
{% endfor %}

There is an Ansible list variable named core_secrets that contains the names of variables used by all slave instances, e.g. home latitude, home longitude, time zone, etc. The secrets variable, defined in each automation-cluster child group, is a list of the names of the variables that the instance requires. This allows you to keep the secrets.yaml file out of the configuration repository, so an API key or password can’t be accidentally committed and become publicly available.

In the template, you’ll see hostvars[inventory_hostname][secret]. Since the loop variable, secret, is simply the name of an Ansible variable, we need to retrieve the actual value of that variable. The hostvars dictionary is keyed by hostname and contains all of the variables and facts currently known for each host, so hostvars[inventory_hostname] is a dictionary, keyed by variable name, of all of the variables applicable to the current host. In the end, hostvars[inventory_hostname][secret] returns the value of the variable whose name is stored in secret.
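For a hypothetical instance whose secrets list contains owntracks_password, the rendered secrets.yaml would look something like this (values are illustrative):

#Core Config Variables
home_latitude: 40.0000
home_longitude: -105.0000
time_zone: America/Denver
#Instance Config Variables
owntracks_password: example-password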

Handlers (WIP)

Handlers are tasks that run at the end of a play, and only when notified by a task that changed something – perfect for restarting services. For the automation-cluster, a change to an instance’s configuration or secrets.yaml since the previous run will trigger a restart of the instance’s Docker container, loading the changes into the state machine.
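A sketch of how that might be wired up once implemented – the template task notifies a handler, and the handler restarts the container via the Docker CLI (task and container names are hypothetical):

- name: Generate the per-instance secrets.yaml
  template:
    src: secrets.yaml.j2
    dest: "/opt/home-assistant/{{ item.name }}/secrets.yaml"
  loop: "{{ instance_arguments }}"
  notify: restart instance containers

And in handlers/main.yml:

- name: restart instance containers
  command: "docker restart home-assistant-{{ item.name }}"
  loop: "{{ instance_arguments }}"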

Role: Docker

Our Docker role simplifies the deployment of Docker containers on the target host by reducing boilerplate code and is a dependency of the automation-cluster role.

Gotchas Encountered

Home-Assistant deprecated API password authentication back in version 0.77 and now uses a more secure authentication framework that relies on refresh tokens for each user. In order to connect to the API from services – or, in our case, from separate instances – you must create a long-lived bearer token. Because of this, the main (aggregation) instance must be deployed after all other instances are deployed and long-lived bearer tokens have been created for each one. It is possible to deploy all instances in the first play, but the tokens will need to be added to the variables and the play re-run to regenerate the secrets.yaml file for the main instance.
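To verify a long-lived token before wiring it into the main instance’s secrets, a quick request against the REST API does the trick (host and token are placeholders):

import requests

response = requests.get(
    "http://hubs.local:8123/api/",  # hypothetical slave instance
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"},
)
print(response.status_code)  # 200 means the token is valid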

If you are deploying multiple instances on the same host, you will have to deal with intercommunication between the instances. There are a few different solutions here (the last is sketched after the list):

  • Set the host for each instance in the instance-arguments variable to the Docker bridge gateway address
  • Add an entry to the /etc/hosts file in the necessary containers that aliases the Docker bridge gateway address to a resolvable hostname
  • Set the homeassistant.http.port property in the instance configurations to discrete ports and run the Docker container with network mode host
  • Create a Docker network and deploy the instances on that network with a static IP set for each instance
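A sketch of that last option using Ansible’s Docker modules (subnet, names, and addresses are examples; module options may differ slightly between versions):

- name: Create a user-defined network for the cluster
  docker_network:
    name: automation
    ipam_config:
      - subnet: "172.28.0.0/24"

- name: Run the hubs instance with a static IP on that network
  docker_container:
    name: home-assistant-hubs
    image: homeassistant/home-assistant:latest
    networks:
      - name: automation
        ipv4_address: "172.28.0.10"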

Next Steps

  • Unlink all Z-Wave and Zigbee devices from my current setup and link them on the host housing the hubs instance
  • Remove the late-night /etc/hosts file hack and deploy the containers to a Docker network or host network mode
  • Implement handlers to restart or recreate the containers based on task results

Making Home-Assistant Distributed, Modular State Machine Part 1

Minimalistic Example of Home-Assistant Distributed

Why Make Home-Assistant Distributed

Home-Assistant is one of the largest open-source projects on GitHub, integrating with almost every IoT device, mesh protocol, internet platform, and more; it is highly extensible at its core, and easily configured. However, once your system reaches a certain size, many of its limitations become apparent – I believe making Home-Assistant distributed will resolve many of these limitations.

The biggest and most annoying limitation is the inability to change almost any of the configuration without restarting the system. This is compounded when Home-Assistant is your Z-Wave or Zigbee controller, because adding a new component or a new input-slider for automations forces the mesh network to re-initialize. Anecdotally, I have also noticed that integrations that rely on polling slow down as the number of services needing state updates grows, i.e. hundreds of devices across a multitude of platforms.

Separation of concerns is one engineering principle that facilitates maintainability. Home-Assistant has two protocols that facilitate communication between multiple instances: web-sockets and MQTT. MQTT, Message Queuing Telemetry Transport, is a lightweight pub/sub system used by many IoT devices and platforms. Home-Assistant includes an MQTT Auto-Discovery component, but it requires a decent amount of upfront planning and is not the easiest to work with. Home-Assistant’s web-socket API, on the other hand, is well documented and fairly easy to work with, especially given this custom component by Lukas Hetzenecker:

The delineation between modules is kind of up to you; I decided to run the following instances:

  • Hubs – Z-Wave, Zigbee, and Insteon
  • Media – Televisions, set-top boxes, and receivers
  • Tracker – LAN-based presence detection, zones, and GPS tracking
  • Security – Security cameras and alarm control panels
  • Communications – Notifiers, text-to-speech, and voice assistant intents
  • Appliances – Vacuums, litter box, and smart furniture
  • Planning – Shopping lists, calendars, and To-Do lists
  • Settings – Inputs for controlling and configuring rules
  • AI – Image processing, face detection, and license plate recognition
  • Environmental – Weather, air quality, and neighborhood news
  • Main – Primary UI, people, and history recording

The main instance implements the Home-Assistant Remote Instance custom component, configured to connect to the IP and port of each of the slave instances. If you haven’t moved your automations/rules out of Home-Assistant’s YAML-based automation components, this is the time to investigate Node-RED or AppDaemon; Home-Assistant is an amazing aggregation of hundreds of integrations, but it is a state machine, and good engineering principles dictate atomic purpose. Following the philosophy of atomic purpose means everything handles one, and only one, purpose very well. YAML-based automations, while probably decent enough for most end-users, cannot compete with writing real applications around your state machine, and they break the atomic-purpose philosophy by combining the state machine with the rules engine.

You have several options for deploying a distributed Home-Assistant state machine: install the instances side-by-side on the same box using Python virtual environments, each configured on a different port; use Docker Compose, which allows you to describe all of the containers that make up the application (a sketch follows); or use a configuration management system, like Ansible or SaltStack, to deploy however you wish (Docker containers, separate virtual machines, etc.).
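For the Docker Compose route, a minimal two-instance sketch might look like this (image tag, ports, and paths are examples):

version: "3"
services:
  main:
    image: homeassistant/home-assistant:latest
    volumes:
      - ./main:/config
    ports:
      - "8123:8123"
  hubs:
    image: homeassistant/home-assistant:latest
    volumes:
      - ./hubs:/config
    ports:
      - "8124:8123"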

As of this post, I have started splitting up the configurations (see repositories below) and am waiting for a good time to unpair all of my Z-Wave and Zigbee devices from the controller – my family has grown quite used to the automations, so this needs to happen when everyone is out or asleep. We currently use MAAS to provision our virtual machines; my current plan is to spin up an instance for AI, as it is fairly process-intensive, and then either one VM for the remaining slaves and one for main, or the rest on a single VM (I will try both to see if there are any performance issues running so many instances on one box). Each instance will be deployed as a Docker container via Ansible playbooks. All code related to my Home-Assistant Distributed State-Machine will be hosted on GitHub: