With the most recent update, it's actually very simple. You need three things:

1) Add the OpenAI Conversation integration - https://www.home-assistant.io/integrations/openai_conversati... - and configure it with your OpenAI API key. In there, you can control part of the system prompt (HA will add some stuff around it) and configure which model to use. With the newest HA, there's now an option to enable "Assist" mode (under the "Control Home Assistant" header). Enable this.
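
For reference, the prompt I put there is just a short persona blurb - everything entity-related gets appended around it by HA. Something in this spirit (wording is mine, not a default):

    You are a voice assistant for our house.
    Answer in plain text and be terse. If asked to
    control a device you don't know about, say so
    instead of guessing.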

2) Go to "Settings/Voice assistants". Under "Assist", you can add a new assistant. You'll be asked to pick a name and a language, then choose a conversation model - here you pick the one you configured in step 1) - plus Speech-to-Text and Text-to-Speech models. I have a subscription to Home Assistant Cloud, so I can choose "Home Assistant Cloud" models for STT and TTS; it would be great to plug in third-party ones here, but I'm not sure if or how that's possible.
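
Conceptually, the assistant you build here is just a pipeline tying three engines together; roughly like this (field names are my shorthand - the real thing lives as JSON in HA's .storage):

    name: Jarvis
    language: en
    conversation_engine: openai_conversation  # from step 1
    stt_engine: cloud  # Home Assistant Cloud STT
    tts_engine: cloud  # Home Assistant Cloud TTS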

3) Still in "Settings/Voice assistants", look for a line saying "${some number} entities exposed", below the "Add assistant" button. Click that, and curate the list of devices and sensors you want "exposed" to the assistant - "exposed" here means that HA will make a large YAML dump out of the selected entities and paste it into the conversation for you[0]. There's also other stuff (I've seen the docs mention "intents") that you can expose, but I haven't looked into it yet[1].
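
To give an idea of what that dump looks like, each exposed entity turns into a few lines along these lines (format quoted from memory, names made up):

    - names: Living Room Light
      domain: light
      state: 'on'
      attributes:
        brightness: 128
    - names: Hallway Motion
      domain: binary_sensor
      state: 'off'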

That's it. You can press the Assist button and start typing. Or, for a much better experience, install HA's mobile app (and, if you have a smartwatch, the watch companion app), and configure Home Assistant as your voice assistant on the device(s). That's how you get the full experience of randomly talking to your watch, "oh hey, make the home feel more like a Borg cube", and watching the lights turn green and the climate control pump heat.
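
You can also poke the same assistant from Developer Tools or an automation via the conversation.process service - handy for testing without the app. Something like this (the agent_id is whatever your assistant's entity ID ends up being; mine here is a guess):

    service: conversation.process
    data:
      text: "Make the home feel more like a Borg cube"
      agent_id: conversation.openai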

I really recommend trying this if you can. It's a night-and-day difference compared to Siri, Alexa or Google Now. It finally delivers on the promise of voice-activated interfaces.

(I'm seriously considering making a Home Assistant to Tasker bridge via HA app notifications, just to let the assistant do things on my phone - the experience is just that good; I bet it'll work better out of the box than Google's own stuff.)
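
The bridge itself shouldn't be hard: a script the assistant can trigger that fires a notification command at the phone, which Tasker then picks up. A rough sketch, assuming the companion app's command_broadcast_intent notification command and a Tasker profile listening for that intent - device and action names here are made up:

    service: notify.mobile_app_my_phone
    data:
      message: command_broadcast_intent
      data:
        intent_package_name: net.dinglisch.android.taskerm
        intent_action: my.custom.TASKER_ACTION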

--

[0] - That's the inefficient token waster I mentioned in the previous comment. I have some 60 entities exposed, and as best I can tell, it generates a couple thousand tokens' worth of YAML, most of which is noise like entity IDs and YAML structure. This could be cut down significantly if you named your devices and entities cleverly (and concisely), but I think my best bet is to dig into the code and trim it down. And/or create synthetic entities that stand in for multiple entities representing a single device or device group - e.g. one "A/C" entity that combines the sensor entities from all the A/C units.
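
For that last idea, HA's template integration should be enough: one template sensor carrying the interesting bits of every A/C unit as attributes, and only that sensor gets exposed. A sketch, with made-up entity IDs:

    template:
      - sensor:
          - name: "A/C"
            state: "{{ states('climate.ac_living_room') }}"
            attributes:
              living_room_temp: "{{ state_attr('climate.ac_living_room', 'current_temperature') }}"
              bedroom_temp: "{{ state_attr('climate.ac_bedroom', 'current_temperature') }}"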

[1] - Outside the YAML dump that goes with each message (and a preamble with the current date/time), which is how the assistant knows the current state of every exposed entity, there's also an extra schema exposing controls via the "function calling" mechanism of the OpenAI API, which is how the assistant is able to control devices at home. I assume those "intents" go there. I'll be looking into it today, because there's a bunch of interactions I could simplify if I could expose automation scripts to the assistant.
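
From what I've seen so far, the intent_script integration looks like the hook for that: you define an intent name plus actions, and it becomes something the assistant can invoke. A sketch (untested, scene name made up):

    intent_script:
      MovieTime:
        action:
          - service: scene.turn_on
            target:
              entity_id: scene.movie_night
        speech:
          text: "Movie mode on."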


