The "quality" of the sound file as a source of random numbers is not high. Within a short segment, neighbouring integer values change less than we would like. To improve the quality of the "random" numbers, special processing and selection algorithms are needed.
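As an illustration only, one possible selection step and one possible processing step are sketched below. The text does not name specific algorithms, so everything here is an assumption: the samples are taken to be signed 16-bit integers, `select_samples` and `process_samples` are hypothetical names, and SHA-256 is a swapped-in stand-in for whatever processing is actually chosen.

```python
import hashlib
import struct


def select_samples(samples):
    """Selection step (assumed): drop samples equal to their predecessor,
    so that flat runs in the audio contribute only one value."""
    selected = []
    previous = None
    for s in samples:
        if s != previous:
            selected.append(s)
        previous = s
    return selected


def process_samples(samples, block_size=64):
    """Processing step (assumed): hash each block of 16-bit samples
    with SHA-256 and concatenate the digests."""
    out = bytearray()
    for i in range(0, len(samples), block_size):
        block = samples[i:i + block_size]
        # Pack the block as little-endian signed 16-bit integers.
        raw = struct.pack(f"<{len(block)}h", *block)
        out += hashlib.sha256(raw).digest()
    return bytes(out)


# Usage: both steps are deterministic, so the same samples
# always produce the same output bytes.
key_material = process_samples(select_samples([0, 0, 1, 1, 2, -3]))
```

Hashing makes neighbouring output bytes look unrelated, but it does not add unpredictability that the sound file itself lacks.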
Several algorithms can be used at the same time. Since the source of integers (the sound file) is the same on identical smartphones running the same build of the operating system, it is important that the sender and the recipient use the same processing algorithms.
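The need for identical algorithms on both sides can be sketched as a fixed pipeline of steps applied in an agreed order. The three steps and the names `take_low_bytes`, `decimate`, `whiten`, `PIPELINE` and `derive_key` are hypothetical, chosen only to show that the same steps in a different order give a different key.

```python
import hashlib


def take_low_bytes(samples):
    """Keep only the low-order byte of every integer sample."""
    return bytes(s & 0xFF for s in samples)


def decimate(data, step=3):
    """Keep every `step`-th byte."""
    return data[::step]


def whiten(data):
    """Condense the data with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(data).digest()


# The order of the steps is part of the agreement between sender and recipient.
PIPELINE = (take_low_bytes, decimate, whiten)


def derive_key(samples, pipeline=PIPELINE):
    """Apply every step of the pipeline in order."""
    data = samples
    for step in pipeline:
        data = step(data)
    return data


# Usage: identical samples and an identical pipeline give identical keys.
samples = list(range(-50, 50))
sender_key = derive_key(samples)
recipient_key = derive_key(samples)
```

If one side swapped two steps, for example `whiten` before `decimate`, the resulting keys would no longer match and decryption would fail.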
Clearly, such an encryption key cannot be considered ideal for the Vernam cipher. The strength of the cipher will not be absolute, but the effort required to break it can be enormous. This mechanism can be used as an additional layer of protection for already encrypted data.
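For reference, the Vernam cipher is a byte-wise XOR of the data with a key at least as long as the data. A minimal sketch follows, in which the function name `vernam_xor` and the sample key bytes are illustrative assumptions. In the scheme described here the key would come from the processed sound file rather than from a truly random source, which is why the strength is not absolute.

```python
def vernam_xor(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key byte at the same position.

    The same function encrypts and decrypts, because XOR is its own inverse.
    """
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


# Usage: the input may itself be ciphertext produced by another cipher,
# in which case this acts as an additional layer of protection.
already_encrypted = b"\x8f\x12\xa0\x44\x07"
key = bytes([0x3C, 0x91, 0x5E, 0x07, 0xD2, 0x68])  # illustrative key bytes
wrapped = vernam_xor(already_encrypted, key)
unwrapped = vernam_xor(wrapped, key)
```

Each part of the key should be used only once. Reusing the same key bytes for two messages lets an attacker XOR the ciphertexts together and cancel the key.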
A countermeasure against such an algorithm would be to modify the speech synthesis code, adding "random" values to the synthesis parameters. However, you can develop your own codec for encryption purposes.
The use of speech synthesis to generate encryption keys can be greatly improved by using technology with huge programming objects.
The advantage is the ability to generate symmetric keys simultaneously for any number of consumers without using (hidden) information exchange channels.