Speech Generation
Speech synthesis is a relatively old technology. Even home computers in the 1980s could turn text into spoken words; while understandable, the voices sounded strange. They were synthesised, built from artificially generated sounds rather than recordings of human speech.
When discussing speech generation with modern AI technology, we mean synthetic speech, not synthesised speech: instead of building a voice from artificial sounds, AI copies a real one. AI models can now analyse someone's speech and break it down into its components, working out how those components would change under different emotional conditions, even adding breathing sounds for extra believability. The result can turn text, or another person's speech, into an almost undetectable copy of the original speaker.
Speech generation has many positive uses. It unlocks the potential for AI tutors to speak with warmth and emotion, perhaps in the voice of role models (with their permission). It can give people who have lost their own voice through illness or disability a way to speak as themselves.
Sadly, it also has the potential for misuse.
What risks does AI speech generation pose to kids?
The risks of speech generation are more indirect for kids than those of other forms of AI, but they are no less real, and certainly something we need to consider as the technology advances. Until now, it was almost impossible to convincingly impersonate the voice of someone the listener knows well.
Now, the voice you hear on the phone or in a voicemail might not belong to the person it appears to. Cloning a voice once took many hours of recordings; it can now be done with under a minute of audio.
The potential for harmful deception will increase for everyone, but especially for children, who are less equipped to deal with the unexpected. We will all face a greater likelihood of scams and of misinformation designed to shape our thoughts and opinions.
Children will also be attracted to this technology for fun and pranks. While it is largely out of their reach for now, it will become more widely available, and they need to understand the damage a seemingly harmless joke can do, to themselves and to others, when it involves impersonating someone else.
How can we minimise the risk?
Dealing with faked identifying features such as cloned voices is an evolving area. In the US, the Federal Trade Commission has examined protecting consumers as a whole through rules against impersonation.
Families who have been targeted by scams using faked voices of relatives say they now keep a family password for moments when it is crucial to confirm someone's identity. In practice, though, will kids and other family members remember it?
The shift in mindset needed to respond to AI-cloned voices has barely begun. Legislation and education will both play their part, but legislation moves slowly and will never fully replace awareness: knowing the risks and thinking them through before we have to deal with them.