If you mean doing speech-to-text on the device, that was my first thought as well. But DSP isn't cheap, and we are talking about serious battery consumption. Even if they cache audio and only process it while the phone is charging, they would still need the algorithms baked into the binary (where researchers could find them), unless they somehow sidestep the app stores' ban on loading remote code.
They could have a really cheap algorithm that just matches audio fingerprints against short windows of audio. I guess if you have trillions of hours of audio, it's OK not to inspect every minute to the fullest extent.
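To make the "cheap fingerprint" idea concrete, here is a rough sketch of what I have in mind. This is not what any real app does; it is a toy under my own assumptions. The fingerprint (one bit per frame, set when the energy rises), the frame size, the 32-bit length, the Hamming threshold, and the synthetic "jingle" are all made up for illustration.

```python
import random

FRAME = 256                     # samples per frame (assumed value)
N_BITS = 32                     # fingerprint length in bits (assumed value)
WINDOW = (N_BITS + 1) * FRAME   # samples needed to produce one fingerprint


def fingerprint(samples):
    """Pack one bit per frame boundary: 1 if energy rose, else 0."""
    energies = [
        sum(s * s for s in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]
    bits = 0
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def scan(stream, target_fp, stride_frames=1, max_dist=2):
    """Check every `stride_frames`-th window; return matching sample offsets."""
    hits = []
    step = stride_frames * FRAME
    for start in range(0, len(stream) - WINDOW + 1, step):
        window_fp = fingerprint(stream[start:start + WINDOW])
        if hamming(window_fp, target_fp) <= max_dist:
            hits.append(start)
    return hits


rng = random.Random(0)

# Hypothetical "jingle": noise whose loudness changes from frame to frame.
jingle = []
for _ in range(N_BITS + 1):
    amp = rng.uniform(0.2, 1.0)
    jingle.extend(amp * rng.gauss(0, 1) for _ in range(FRAME))

# Background noise with the jingle spliced in at a frame-aligned offset.
background = [0.3 * rng.gauss(0, 1) for _ in range(100 * FRAME)]
offset = 40 * FRAME
stream = background[:offset] + jingle + background[offset + WINDOW:]

target = fingerprint(jingle)
dense_hits = scan(stream, target, stride_frames=1)   # inspect every window
sparse_hits = scan(stream, target, stride_frames=4)  # inspect 1 in 4 windows
```

The sparse scan does a quarter of the work and still finds the jingle here, but only because I spliced it in at an offset the stride happens to land on. That is the trade-off: skipping windows saves battery and misses some matches, which matters less if you have an enormous amount of audio to sample from. A real system would also need a fingerprint that tolerates misalignment and background noise, which this toy does not.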
It's an interesting problem to think about, but as other hackers have mentioned: why would they risk doing it in secret? They could just update the EULA.