Where can I learn more about these phrases you say to capture as many sounds as possible? They listed two in the article, but now I'm curious how they decided which sentences to use, how efficient they are, and whether they account for the same sounds with different qualities, like pitch going up at the end of a question. It's a rabbit hole I need to find the entrance to.
This doesn’t answer all of your questions, but Apple has an accessibility feature that lets you generate a voice based on your own. It’ll prompt you to say various phrases, and at the end you’ll have a synthetic voice to use. Info on it here: https://support.apple.com/en-us/104993