CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data


Abstract

Accent normalization (AN) systems often struggle with unnatural outputs and undesired content distortion, stemming from both suboptimal training data and rigid duration modeling. In this paper, we propose a "source-synthesis" methodology for training data construction. By generating source L2 speech and using authentic native speech as the training target, our approach avoids learning from TTS artifacts and, crucially, requires no real L2 data in training. Alongside this data strategy, we introduce CosyAccent, a non-autoregressive model that resolves the trade-off between prosodic naturalness and duration control. CosyAccent implicitly models rhythm for flexibility yet offers explicit control over total output duration. Experiments show that, despite being trained without any real L2 speech, CosyAccent achieves significantly improved content preservation and superior naturalness compared to strong baselines trained on real-world data.
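The "source-synthesis" data construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the names `NativeUtterance`, `TrainingPair`, `build_source_synthesis_pairs`, and `dummy_l2_tts` are hypothetical, the accent-capable synthesizer is replaced by a placeholder, and pairing every native utterance with several accents is an assumption. The only point taken from the abstract is that the source side is generated while the target side is authentic native speech.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Waveform = List[float]  # stand-in for real audio samples


@dataclass
class NativeUtterance:
    text: str
    audio: Waveform  # authentic native recording


@dataclass
class TrainingPair:
    text: str
    accent: str
    source_audio: Waveform  # synthesized L2-accented input
    target_audio: Waveform  # authentic native target


def build_source_synthesis_pairs(
    native_corpus: Sequence[NativeUtterance],
    synthesize_l2: Callable[[str, str], Waveform],
    accents: Sequence[str],
) -> List[TrainingPair]:
    """Pair each native recording with synthesized accented sources.

    The target is always the real native audio, so the model never
    learns to reproduce synthesis artifacts on the output side.
    """
    pairs = []
    for utt in native_corpus:
        for accent in accents:
            pairs.append(
                TrainingPair(
                    text=utt.text,
                    accent=accent,
                    source_audio=synthesize_l2(utt.text, accent),
                    target_audio=utt.audio,
                )
            )
    return pairs


def dummy_l2_tts(text: str, accent: str) -> Waveform:
    # Hypothetical placeholder for an accent-capable TTS system:
    # returns silence with one sample per character of text.
    return [0.0] * len(text)


corpus = [NativeUtterance("Now you understand", [0.1, 0.2, 0.3])]
pairs = build_source_synthesis_pairs(corpus, dummy_l2_tts, ["Hindi", "Korean"])
```

In this sketch no real L2 recording is ever needed: replacing `dummy_l2_tts` with an actual accented synthesizer would be the only change required to produce usable pairs.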

All audio files on this page can be downloaded by cloning this GitHub repository.

Speech Samples for General Evaluation

This panel presents random samples from the test set.

| No. | Speaker | Text |
|---|---|---|
| 1 | BWC (Chinese) | But Martin smiled a superior smile |
| 2 | BWC (Chinese) | Without them he could not run his empire |
| 3 | LXC (Chinese) | In her haste to get away she had forgotten these things |
| 4 | LXC (Chinese) | From my earliest recollection my sleep was a period of terror |
| 5 | NCC (Chinese) | A rising tide of fat had submerged them |
| 6 | NCC (Chinese) | Miss Brodie's smile was slightly sarcastic |
| 7 | TXHC (Chinese) | But already he had composed himself |
| 8 | TXHC (Chinese) | He looked like one who had passed through an uncomfortable hour or two |
| 9 | ASI (Hindi) | He moved away as quietly as he had come |
| 10 | ASI (Hindi) | It occurred to me that there would have to be an accounting |
| 11 | RRBI (Hindi) | But he reconciled himself to it by an act of faith |
| 12 | RRBI (Hindi) | I had been born with no organic chemical predisposition toward alcohol |
| 13 | SVBI (Hindi) | I want to know how all this is possible |
| 14 | SVBI (Hindi) | They likewise are disinclined to being eaten |
| 15 | TNI (Hindi) | He was smooth shaven and his hair and eyes were black |
| 16 | TNI (Hindi) | There is not an iota of truth in it certainly not |
| 17 | ABA (Arabic) | The girl faced him her eyes shining with sudden fear |
| 18 | ABA (Arabic) | Lots of men take women buggy riding |
| 19 | SKA (Arabic) | They are his tongue by which he makes his knowledge articulate |
| 20 | SKA (Arabic) | His immaculate appearance was gone |
| 21 | YBAA (Arabic) | Do you know that you are shaking my confidence in you |
| 22 | YBAA (Arabic) | They will search for us between their camp and Churchill |
| 23 | ZHAA (Arabic) | What an excited whispering and conferring took place |
| 24 | ZHAA (Arabic) | That is the strange part of it |
| 25 | HJK (Korean) | The very thought of the effort to swim over was nauseating |
| 26 | HJK (Korean) | He caught himself with a jerk |
| 27 | HKK (Korean) | When I can't see beauty in woman I want to die |
| 28 | HKK (Korean) | They edged nearer and stood shoulder to shoulder facing their world |
| 29 | YDCK (Korean) | I was brought up the way most girls in Hawaii are brought up |
| 30 | YDCK (Korean) | One by one the boys were captured |
| 31 | YKWK (Korean) | Obviously it was a disease that could be contracted by contact |
| 32 | YKWK (Korean) | Your father's fifth command he nodded |
| 33 | EBVS (Spanish) | He was sure now of but few things |
| 34 | EBVS (Spanish) | Now you understand |
| 35 | ERMS (Spanish) | The land exchanged its austere robes for the garb of a smiling wanton |
| 36 | ERMS (Spanish) | Philip knew that she was not an Indian |
| 37 | MBMPS (Spanish) | All an appearance can know is mirage |
| 38 | MBMPS (Spanish) | Fresh meat they failed to obtain |
| 39 | NJS (Spanish) | All operations have been carried on from Montreal and Toronto |
| 40 | NJS (Spanish) | I just do appreciate it without being able to express my feelings |
| 41 | HQTV (Vietnamese) | Now go ahead and tell me in a straightforward way what has happened |
| 42 | HQTV (Vietnamese) | But how are you going to do it |
| 43 | PNV (Vietnamese) | But there came no promise from the bow of the canoe |
| 44 | PNV (Vietnamese) | Down there the earth was already swelling with life |
| 45 | BDL (American) | Philip bent low over Pierre. |
| 46 | BDL (American) | It lived in perpetual apprehension of that quarter of the compass. |
| 47 | CLB (American) | A bush chief had died a natural death. |
| 48 | CLB (American) | Besides, had he not whipped the big owl in the forest. |
| 49 | RMS (American) | The other felt a sudden wave of irritation rush through him. |
| 50 | RMS (American) | Mab, she said. |
| 51 | SLT (American) | Suddenly his fingers closed tightly over the handkerchief. |
| 52 | SLT (American) | He heard a sound which brought him quickly into consciousness of day. |

Audio provided for each sample: Source, FramAN, TokAN-1, TokAN-2, CosyAccent-1, CosyAccent-2.

Speech Samples with Total Duration Control -- Long Cases

The test samples in this panel are long relative to their text, likely because of disfluencies.

| No. | Speaker | Text |
|---|---|---|
| 1 | BWC (Chinese) | It occurred to me that there would have to be an accounting |
| 2 | LXC (Chinese) | His immaculate appearance was gone |
| 3 | NCC (Chinese) | His immaculate appearance was gone |
| 4 | TXHC (Chinese) | The land exchanged its austere robes for the garb of a smiling wanton |
| 5 | ABA (Arabic) | He was sure now of but few things |
| 6 | YBAA (Arabic) | They will search for us between their camp and Churchill |
| 7 | ZHAA (Arabic) | There is not an iota of truth in it certainly not |
| 8 | YDCK (Korean) | I never allow what can't be changed to annoy me |
| 9 | YKWK (Korean) | He was sure now of but few things |
| 10 | EBVS (Spanish) | In her haste to get away she had forgotten these things |
| 11 | MBMPS (Spanish) | The steward has just tendered me a respectful bit of advice |
| 12 | NJS (Spanish) | He was sure now of but few things |
| 13 | PNV (Vietnamese) | He was sure now of but few things |

Audio provided for each sample: Source, FramAN, TokAN-1, TokAN-2, CosyAccent-1, CosyAccent-2.