What we can learn from Google Assistant about sounding natural


Last week Google’s CEO demonstrated a conversation between Google Assistant (a machine) and a real person (a real person).
The conversation was so smooth and natural that the woman on the other side of the phone did not even suspect she was NOT speaking with a real person.

It got me thinking…
What made Google Assistant’s voice so natural and so freakishly real?
And what can we, as non-native speakers, learn from the choices made by those who developed this AI technology?
In other words – if it works for Assistant, it can work for you.

In the video, I’m going to analyze the conversation phrase by phrase, and illustrate how speech elements like intonation, phrasing, connected speech, upspeak and filler words were so effectively used.

Let me know in the comments below the video: what other elements of speech, phrases, or filler words do you hear people use that make them sound natural?


Hey, guys, it’s Hadar and this is the Accent’s Way. A few days ago Google’s CEO, Sundar Pichai, presented an extraordinary demo showing the new capabilities of the Google Assistant.

In the demo, he played a real conversation between the Google Assistant, which is a robot, AI technology, and a real human being.

The stunning thing was that in the conversation the person could not detect that she was not speaking to a real person but to a machine.

And the reason for that is, of course, the algorithm and the ability to use the right sentences according to the nuances in the conversation, but also the way it was executed, the sound of the voice, the intonation.

What makes a voice sound natural?

What did they do, over there at Google, that made the voice sound so natural that the person could not imagine she was speaking to a robot?

So what we’re going to do is analyze the conversation together, and I’m going to pinpoint the places where the Google voice sounds so natural and explain why it makes it sound like a real human being.

Okay, so she starts with this (that’s Google Assistant, right, it’s a machine, it’s not a real person, I know it’s crazy):

‘Hi’

‘Hi’, right.

The way of saying ‘Hi’ this way, is a very welcoming, nice, warm, friendly way of saying it.

There’s always a glide from high to low.

‘Hi’

And listen to the ending.

‘Hi’

I’m going down. It’s not ‘Hi’.

I’m going up

‘Hi’

And then there is a little tail going up at the end.

‘Hi’

That means that something else is coming up, I’m not done. And then she says something like this.

‘Hi’

‘I’m calling to book a woman’s haircut for a client’

Now in English, when you start a new idea, when you start a conversation, when you have a question, you kind of start high in pitch.

‘I’m calling…’

It’s not

‘I’m calling to book a woman’s haircut for a client.’

‘I’m calling to book a….’

It’s like asking for permission or telling you something new.

‘Hi’

And notice it. Start listening now to how people start asking questions, or start sentences or new ideas.

There’s always this wavy thing at the beginning, like a really high-pitched tone that they begin with, regardless of which words they’re choosing to stress.

Now in the sentence

‘Hi, I’m calling to book a woman’s haircut for a client.’

‘I’m calling to book a woman’s haircut for a client.’

So there is this rise in pitch at the beginning.

‘I’m calling…’

And ‘calling’ is a stressed word.

‘…to book…’

That’s a little less stressed, so it goes down.

‘…a woman’s haircut…’

Right, that’s the subject, that’s what I’m calling to book, that goes higher in pitch

‘…for a client…’

And then there is this rising-rising intonation, the up-speak, where I go up.

That means that there is something else coming up, and then she continues

‘I’m looking for something on May 3rd’

So she stresses the word looking, she starts again high in pitch at the beginning of the sentence

‘I’m looking for something on May 3rd.’

And then she goes up in pitch at the end.

Now, look, it’s totally okay, and sometimes even better, to end it like a statement.

‘I’m looking for something on May 3rd’

Right, and then it’s a rising intonation and then you drop it down.

However, this rising-rising intonation at the end of a sentence, even if it’s not a question, is a very common speech pattern in America nowadays, which made it sound even more natural than just a regular ending statement.

‘I’m looking for something on May 3rd’

And that open ending leaves more room for an answer.

It means that I’m waiting for an answer from you, but it’s sort of like a question.

And then there is this part

‘Mm-hmm’

Which is fantastic. What sounds more natural than

‘Mm-hmm’

That’s what we say. Notice that even here there is this glide in intonation.

‘Mm-hmm’

Again going up in pitch, making it sound more natural.

Like someone would actually say it like that.

‘At 12 pm’

Now we can learn a lot just from this one statement.

Notice that every syllable hits a different note. It’s not all on the same note.

‘At 12 pm’

‘At 12 pm’

‘At 12 pm’

Right and even the ‘m’ is kind of like gliding down.

Okay, so it goes up in pitch and then it goes down.

‘At 12 pm’

‘Do you have anything between…’

‘Do you have anything between…’

A question

‘Do you have…’

Reduction at the beginning

‘Do you have anything between…’

Again starting with a higher pitch.

‘Do you have anything between 10 am…’

Pause

Because people pause, they want to think about what they want to say

‘…and 12 pm’

Okay, so it’s not ‘between 10 am and 12 pm’

The system knows what hours it’s going to suggest, but it takes that little pause to make it sound more natural.

So phrasing is crucial when we speak English.

Phrasing, filler words, intonation patterns, stressed words, so the rising-rising intonation.

But then also the falling intonation at the end, to indicate that I’m done.

‘Just a woman’s haircut for now’

So again, this glide at the beginning, this high pitch at the beginning, just a woman’s, and then she goes down

‘…haircut for now.’

The assistant could have answered ‘a woman’s haircut’, but they added the ‘just’ and the ‘for now’.

So in ‘just a woman’s haircut’, the ‘just’ is not an essential word, but it’s a filler word that a lot of people use, which made it sound more natural.

‘Just a woman’s haircut for now’

And ‘for now’ is just another filler phrase that says, well, let’s begin with that and see where we go.

It’s a polite way of saying ‘that’s it’. I don’t need anything else.

‘just a woman’s haircut for now’

So those extra words, extra phrases, extra sounds, make it sound more natural and not like a robot.

And the thing is that these extra sounds and extra words are not usually used by non-native speakers, because we use efficient English. English is usually taught through very concise sentences:

‘this is how you say it’

and then you learn that people use all these extra phrases and sounds

‘hmm’

‘aah’

‘well’

‘for now’

‘just’

Okay, all these extra phrases make it sound more conversational, and that’s a way to communicate and make it sound more friendly and polite.

’10 am is fine’

Again, that rising-rising intonation. She could have said

’10 am is fine.’

’10 am is fine.’

but

’10 am is fine.’

makes it sound a little more friendly, a little less aggressive, a little less determined

’10 am is fine.’

I’m still waiting for an answer. I need you to approve it still.

’10 am is fine.’

And again notice that high pitch at the beginning

’10 am is fine.’

Again, up-speak at the end.

‘The first name is Lisa’

It’s not a question. So why does she go up in pitch?

Because that’s a common speech pattern which makes it sound so natural.

You as a non-native speaker don’t have to use it.

You can definitely go high in pitch and drop down at the end.

‘The first name is Lisa.’

I’m fond of this kind of conversation, where you go up and close it at the end.

But notice that these are the patterns that they chose to use, knowing that it would make it sound more natural.

‘Okay, great!’

‘Okay, great!’

She could have said just

‘Thank you!’

‘Okay, great!’

That’s how people comment on something that they’re happy about.

‘Okay, great!’

‘Thanks!’

And there is a build up here in terms of the intonation.

That shows that one thing is a little more important than the other.

‘Okay, great!’

‘Thanks!’

Rising, falling and then rising intonation at the end.

So to conclude, in order to answer our question, ‘What makes a voice sound more natural?’, we looked at what the people at Google did to make their Google Assistant sound like a real human being.

So when it comes to intonation, it wasn’t monotonous.

‘Hi, I’d like to book a woman’s haircut’

But it had that nice glide

‘Hi, I’d like to book a woman’s haircut’

So every syllable had a different note.

Also, at the beginning of an idea or a sentence, it started high in pitch.

Every important word stuck out.

So it was a little higher in pitch and longer.

And at the end, every sentence ended with rising-rising intonation.

Almost like a question even though it wasn’t always a question.

Why? Because up-speak is a common speech pattern in the U.S. today, whether you like it or not.

Another thing they added is those extra words

‘just’

‘for now’

‘hmm’

Extra sounds.

‘Mm-hmm’, that made it sound more natural and even here intonation played a major role.

Because it wasn’t flat.

‘Mm-hmm’

‘Mm-hmm’

Right, it was really like music.

‘hmm’

And the last thing was phrasing, taking small pauses to indicate that the person is thinking, I mean the machine is thinking, I mean the assistant is thinking.

I don’t even know what to call it anymore. This is how people actually speak. They take small pauses between chunks, parts of the sentence, not between words and not only at the end of the sentence.

As I said, we want to recognize these patterns as we just did today and recognize what makes it sound more natural, more conversational, and then take these elements and add them to our speech in English.

And it’s also great for you as a speaker, because sometimes you need to come up with the right words, so it doesn’t have to be 100% concise.

It’s not concise for American speakers either. Those extra filler words like ‘hmm’ and ‘well’, the phrases and the pauses, and the extra words like ‘just’ and ‘okay’ can buy you some time to come up with the right word in order to convey what you want to say.

And as a side note, to all you non-native speakers out there: when we look at the presentation, we see that Sundar, Google’s CEO, is not a native English speaker.

And he is a phenomenal presenter.

This is to say, that you don’t have to lose your accent to be a great speaker in English.

In fact, the accent is an advantage. It reveals some layers that you have as a speaker.

It shows that you carry your history behind you, that you have an interesting story.

You don’t want to lose your accent. You don’t want to hide your accent.

You do want to use the elements of speech to sound great.

To convey your message, to be a strong speaker, to speak slowly, to be clear, to be understood.

But it doesn’t mean that you need to lose your accent.

So when you work on your accent, and intonation, and rhythm, and stress, your goal should not necessarily be ‘lose your accent, speak like a native speaker’, but ‘be the best speaker that you can’.

With or without a foreign accent, because that doesn’t really matter.

What matters is how you feel about yourself, how you convey your message, and whether you’re clear and communicative.

Now I have a question for you.

What other elements of speech, whether it’s specific words or phrases or intonation patterns, do people use that make them sound more natural? What have you noticed? What are you using?

So let me know in the comments below.

‘So’ is one of them. I use ‘so’ all the time, you’ve probably noticed.

That’s it! Thank you so much for watching.

Please share this video with your friends if you liked it, and don’t forget to subscribe to my YouTube channel and click on the bell to get notifications.

There are a lot more videos coming up about American intonation, so you don’t want to miss out.

Have a wonderful week and I’ll see you next week, in the next video.

Bye.


6 Comments on “What we can learn from Google Assistant about sounding natural”

  1. WOW, that’s wonderful. What can I say? It makes me appreciate the beauty of both English and technology more.

  2. It’s really interesting to hear the characteristics of a native speaker coming from a robot, and I want to tell you that it was a great idea to add that recording to your videos for teaching. I’m a teacher, and when I see something different, new and useful that brings innovation to teaching, I love to recognize and encourage the author.
    God bless you!

  3. There’s one detail I’ve noticed that you don’t point out 🙂 and might be interesting(?): after the 1st sentence, the assistant makes a “thinking sound”, like she has some doubt, “humm” or something like that. In my opinion, this kind of noise adds a lot of “reality” to the thing. What do you think about it?

  4. I understand your analysis of the conversation and all the elements of speech that make it appear natural. But what I really admired was the flow of the words and their rhythm, which has the great ability to attract the listener and make him never suspect the talker is a robot.

  5. Native speakers use filler words such as: I mean, well, fantastic, so on, cool, great, I see, well done…
