Probably possible, but what would be the advantage? I know, for instance, that something like the Google Speech-to-Text API is a lot more accurate than the Web Speech API.
The advantage is reduced server costs and more efficient use of computing resources in general. Personally, I'm always happy to offload processing to the client side wherever possible.