I asked Gemini 2.0 Flash (with my voice) whether it natively understands audio o... | Hacker News


kthartic on Dec 12, 2024 | on: Gemini 2.0: our new AI model for the agentic era

I asked Gemini 2.0 Flash (with my voice) whether it natively understands audio or is converting my voice to text. It replied:

"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."

Unsure if this is a hallucination, but it's disappointing if true.

Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(

famouswaffles on Dec 12, 2024

Native audio output won't be in general availability until early next year.

If you're using Gemini in AI Studio (not sure about the real-time API, but everything else), then it has native audio input.
