Gemini multimodal live docs here: https://cloud.google.com/vertex-ai/generative-...

kwindla · on Dec 11, 2024

The Multimodal Live API is free while the model/API is in preview. My guess is that they will be pretty aggressive with pricing when it's in GA, given the 1.5 Flash multimodal pricing.

If you're interested in this stuff, here's a full chat app for the new Gemini 2 APIs with text, audio, image, camera video, and screen video. It shows both how to use the WebSocket API and how to route through WebRTC infrastructure.

https://github.com/pipecat-ai/gemini-multimodal-live-demo

dandiep · on Dec 11, 2024

Thanks, this is great!

spencerchubb · on Dec 12, 2024

I am eager to learn the pricing as well. It works sooo well, but the pricing will make or break whether it's viable for apps.