How To Fake Someone’s Voice

Following recent security concerns about Amazon’s plans to enable Alexa to mimic voices, we look at how easy it is to do, what the benefits are, and what risks it poses. 

Alexa The Mimic

Recently, Amazon announced that it was working on technology to enable its Alexa digital assistant to take on the voice of anyone, e.g. a user’s own voice or that of a loved one. Rohit Prasad, an Amazon senior vice president, reportedly said at a Las Vegas conference that the aim was to help users “make the memories last” following the loss of loved ones in the pandemic. A video segment highlighted how Alexa could, in theory, read a story to a child in the voice of their grandmother!

Other Voice Mimicking Options

Many different options are available for creating a fake voice / digitally cloning a user’s voice. Some examples include: 

– Microsoft’s Custom Neural Voice is a text-to-speech feature that allows users to create a one-of-a-kind, customised, synthetic voice for their applications and build a highly natural-sounding voice by providing their own audio samples as the training data. Microsoft says it can “represent brands, personify machines, and allow users to interact with applications conversationally”. It can also be used to restore speech for people with speech impairments.

– Respeecher is a digital voice cloning tool that produces a voice which the company says is “indistinguishable from the original speaker”. It has been designed for filmmakers, game developers, and other content creators.

– Resemble AI offers custom brand voices for smart assistants such as Alexa and Google Assistant, e.g. a user’s own voice for their assistant, and integrates with DialogFlow, IBM Watson, or any other NLU engine.

– Descript is a deepfake voice generator that can be used to create realistic voices based on transcripts or audio clips and make a text-to-speech model of your voice. 

– Scotland-based ‘CereVoice Me’ is a voice cloning system that allows users to produce a text-to-speech (TTS) version of their own voice for Windows. 

– iSpeech is a free voice cloning platform for creating familiar voice interfaces for products, applications, and services.

– ReadSpeaker is proprietary voice cloning software that produces text-to-speech (TTS) voices that are indistinguishable from the source. It offers a range of TTS engines that allow a cloned voice to speak across all of a user’s audio channels: smart speaker apps, interactive marketing campaigns, advertisements, and more.

What Could Possibly Go Wrong?

The recent announcement of Amazon’s plans to allow Alexa to mimic voices triggered long-held concerns that cloned voices could be used to launch deepfake audio attacks on some voice authentication security systems.

One real-life example dates from 2019, when hackers used AI software to mimic a chief executive’s voice and steal £201,000 from a UK-based energy company. The company’s CEO received a phone call from someone he believed to be the German chief executive of the parent company. The caller ordered him to immediately transfer €220,000 (£201,000) into the bank account of a Hungarian supplier. The voice was reported to have been so accurate that the CEO recognised what he thought were the subtleties of his boss’s German accent, and even its “melody”. The call was so convincing that the energy company transferred the funds as requested.

Other concerns about the use of voice cloning include: 

– Issues of consent and disclosure, i.e. obtaining the consent of the person whose voice is used and informing the listener that the voice is fake. For example, Microsoft has now stipulated that its Custom Neural Voice AI model cannot be used to mimic a voice without that person’s consent, and software will have to disclose that voices are fake.

– Concerns that AI (e.g. for faking voices) is advancing too far ahead of regulation, which has led Microsoft to say that existing customers must obtain permission to continue using the Custom Neural Voice tool from June 30, 2023, and new customers will have to apply to use it, with Microsoft deciding whether the intended usage is appropriate. 

– Criticism (by rights activists) that internal company ethics committees deciding what is an appropriate use of voice-mimicking software can’t be truly independent, that competitive pressures limit public transparency, and that external oversight may be necessary.

What Does This Mean For Your Business?

Although there are good arguments for the value of software that can clone a voice, e.g. interfaces for products, applications, and services, and use by filmmakers, game developers, and other content creators, there are concerns that it could also be used to make deepfakes for sinister purposes. For example, deepfake voices could be used to get past voice authentication security systems or to impersonate people to obtain money. There are also ethical concerns about how producers of these tools decide upon appropriate usage and matters of consent. A balance must be struck, and many feel that more regulation and external oversight are needed to limit risk and potential harm.