OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

Abstract

With the rapid development of large language models, researchers have created increasingly advanced spoken dialogue systems that can converse naturally with humans. However, these systems still struggle to handle the full complexity of real-world conversations, including audio events, musical contexts, and emotional expressions, mainly because current dialogue datasets are constrained in both scale and scenario diversity. In this paper, we propose leveraging synthetic data to enhance dialogue models across diverse scenarios. We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios. Based on this dataset, we introduce OmniChat, a multi-turn dialogue system with a heterogeneous feature fusion module designed to optimize feature selection in different dialogue contexts. In addition, we explore critical aspects of training dialogue systems with synthetic data. Through comprehensive experiments, we determine the ideal balance between synthetic and real data, achieving state-of-the-art results on the real-world dialogue dataset DailyTalk. We also highlight the crucial importance of synthetic data in tackling diverse, complex dialogue scenarios, especially those involving audio and music.

A. Samples of the ShareChatX dataset.

A.1. ShareChatX-Emotion

Case 1

[Normal, Normal, Happy]
I've been really getting into painting lately. It's such a calming activity.
[High, Normal, Neutral]
That sounds lovely! What kind of paintings do you like to create?
[Normal, Normal, Happy]
Mostly landscapes, but I enjoy abstracts sometimes too. They let you express more emotions.
[High, Normal, Neutral]
Abstracts are intriguing! I’ve never tried that. Have you taken any classes?
[Normal, Normal, Neutral]
Not yet, I just watch a lot of online tutorials and experiment on my own.
[High, Slow, Neutral]
That’s a great way to learn. It's interesting to find your own style.
[Normal, Fast, Happy]
I find that creating art gives me a sense of fulfillment.
[High, Fast, Happy]
That’s awesome! It must feel exhilarating to express yourself like that!

Case 2

[Normal, Normal, Happy]
Did you catch the last game between the Lakers and the Warriors? It was intense!
[High, Normal, Happy]
Oh, absolutely! The way Curry shot that three-pointer was unbelievable! What a moment!
[High, Fast, Surprised]
I know, right? I couldn't believe he pulled that off under pressure!
[Normal, Normal, Neutral]
Yeah, I guess both teams always bring their A-game, don't they?
[Low, Slow, Neutral]
I really enjoyed watching that match yesterday.
[Low, Slow, Sad]
I did too, but I wished it had gone differently...it was tough to watch when they lost.

Case 3

[Normal, Normal, Happy]
Did you ever try peanut butter on a burger? It's actually pretty good!
[High, Normal, Surprised]
Really? I've never heard of that combination before!
[Low, Normal, Neutral]
Yeah, it sounds weird, but it has this nice sweetness with the salty flavors.
[Normal, Fast, Happy]
That actually sounds intriguing! What about some pickles with ice cream?
[Normal, Normal, Disgusted]
Ugh, yeah, I'm not sure about that one. It sounds kind of off.
[High, Fast, Happy]
Oh come on! It's like sweet and sour all at once. You should be more adventurous!
[Low, Slow, Sad]
Maybe. I just can't get past the idea of it.
[High, Fast, Neutral]
I guess we all have our limits. But that's what makes it fun, isn't it?
[Low, Slow, Disgusted]
I’ve also heard of chocolate with bacon.
[Low, Slow, Disgusted]
Chocolate with bacon? I don't think I can wrap my head around that...

Case 4

[Normal, Normal, Neutral]
I had a really weird nightmare last night.
[High, Normal, Surprised]
Oh really? What happened in it?
[Low, Normal, Fearful]
I was trapped in a dark room, and I couldn't move at all.
[Normal, Slow, Sad]
That sounds terrifying! Sleep paralysis can feel so real.
[High, Fast, Fearful]
It's such an unsettling experience.
[High, Fast, Angry]
Yeah? That must be so annoying to deal with!

A.2. ShareChatX-Audio

Case 1

[Normal, Normal, Happy]
It's such a beautiful day despite the rain, don't you think?
[Low, Normal, Neutral]
Yeah, I really love this kind of weather.
[Normal, Normal, Happy]
I find the sound of the rain so relaxing.
[Low, Normal, Neutral]
It has a certain charm to it, for sure.
[High, Normal, Surprised]
I heard a rumor that it will flood here soon!
[Low, Normal, Fearful]
That sounds a bit alarming, I hope it won’t be as bad as they say.
[High, Normal, Sad]
Yeah, I guess we have to be careful.
[Low, Normal, Neutral]
Just have to keep an eye on things then.

Case 2

[Normal, Normal, Fearful]
We need to find cover; this storm is only going to make things worse.
[High, Fast, Angry]
It's not the storm I'm worried about; it's the gunfire nearby!
[Normal, Normal, Fearful]
Stay low! They could be targeting us without thinking twice.
[Low, Slow, Sad]
I didn’t expect this could happen. We were just out for a quick run.
[Normal, Normal, Neutral]
I know… but we've got to keep moving to find safety.
[High, Normal, Surprised]
Did you hear that? It sounds like they’re closing in on us.

Case 3

[Normal, Normal, Happy]
It's such a beautiful evening! The horses seem very relaxed tonight.
[High, Normal, Happy]
I know, right? I love how the crickets are singing us a little tune.
[Low, Normal, Neutral]
Even better with the stars starting to come out, don't you think?
[High, Fast, Happy]
Absolutely! This moment feels perfect!
[Normal, Normal, Neutral]
It's nice to take a breath and enjoy this piece of nature.
[High, Slow, Happy]
Yes! And just listening to those sounds, so calming.

A.3. ShareChatX-Music

Case 1

[Normal, Normal, Happy]
I really enjoy the hands of the workspace creating that 'aaah' intertwined sound. It almost evokes nostalgia, don't you think?
[Normal, Normal, Happy]
Totally! It really captures the warmth of old memories blending with that vintage environment, like an old film's score.
[Low, Normal, Neutral]
And that is intensified with those mellow piano chords. They create such a meditative state.
[High, Normal, Neutral]
Right? It sort of takes you somewhere, and the woodwind melody adds an interesting layer.
[High, Normal, Surprised]
I didn’t expect to feel transported by something that sounds so eerie upon first listen!
[High, Fast, Angry]
Eerie might be a stretch! It’s simply haunting, a kind of hypnotic appeal, if you ask me!
[Low, Fast, Neutral]
The combination of chair squeaks and dark undertones makes a unique atmosphere.
[High, Fast, Surprised]
That's true! It creates this tension that's oddly appealing mixed with cheerfulness, right?

Case 2

[Normal, Normal, Happy]
Did you hear that new r&b track with the echoing harmonizing vocals? Completely mesmerizing!
[High, Normal, Happy]
Oh, totally! The way they used that pulsating synth melody really drew me in!
[Normal, Normal, Happy]
Right? And that punchy snare gives the song such a strong rhythm.
[High, Normal, Happy]
Absolutely! Plus, the shimmering shakers add just the right touch of flair.
[Normal, Normal, Happy]
Definitely! It feels so romantic, almost like you can feel the emotion behind every note.
[High, Normal, Happy]
I agree! And I just love that passionate male vocal overlay, it really makes the track stand out!
[High, Slow, Neutral]
The overall mix is great but some parts feel a bit low quality.
[High, Slow, Neutral]
Yes, some segments do sound altered, but I like the raw essence it adds to the track overall.

Case 3

[High, Normal, Happy]
Hey, did you hear that new song with the ukulele? It's really such a cute vibe!
[Normal, Normal, Happy]
Yeah! I loved it, especially that part where a child was singing. Really heartwarming!
[Normal, Normal, Neutral]
The syncopated strumming was really interesting, I felt the rhythm was different yet catchy.
[High, Normal, Neutral]
I agree! And that mix with a backbeat was definitely engaging, kept my feet tapping.
[Normal, Normal, Happy]
Plus, those loud wood block sounds reminded me of playground games! Totally nostalgic!
[Low, Normal, Happy]
Right? It’s like a little party in the back of my head! Made me want to join in and dance.
[Normal, Fast, Neutral]
Funny how low fidelity somehow makes it feel more authentic, isn't it?
[High, Normal, Neutral]
Yes, it really gives it character!
[Normal, Normal, Neutral]
The whole mix really stands out in the song.
[Low, Fast, Happy]
Yeah, the layers in it are so delightful; they really bring an extra element that pulls you in!
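
The samples above share a simple two-line layout: a bracketed style line with three comma-separated fields, followed by the utterance it describes. The following is a minimal Python sketch for reading that layout, not the authors' tooling. The names `Turn`, `parse_dialogue`, and `STYLE_RE` are hypothetical, and the field names `level` and `speed` are assumptions: the text here does not define the first two fields, only their values (High/Normal/Low and Fast/Normal/Slow).

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a style line such as "[High, Fast, Happy]" (three comma-separated fields).
STYLE_RE = re.compile(r"^\[\s*([^,\]]+?)\s*,\s*([^,\]]+?)\s*,\s*([^,\]]+?)\s*\]$")


@dataclass
class Turn:
    level: str    # first field (High/Normal/Low); meaning is an assumption
    speed: str    # second field (Fast/Normal/Slow); assumed to be speaking rate
    emotion: str  # third field, e.g. Happy, Sad, Neutral
    text: str     # the utterance on the line after the style line


def parse_dialogue(block: str) -> List[Turn]:
    """Pair each style line with the first non-empty line that follows it."""
    turns: List[Turn] = []
    pending = None  # style fields waiting for their utterance
    for raw in block.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STYLE_RE.match(line)
        if match:
            pending = match.groups()
        elif pending is not None:
            turns.append(Turn(*pending, text=line))
            pending = None
        # Lines with no preceding style line (e.g. a caption) are skipped.
    return turns


sample = """
[Normal, Normal, Happy]
I find the sound of the rain so relaxing.
[Low, Normal, Neutral]
It has a certain charm to it, for sure.
"""

turns = parse_dialogue(sample)
for turn in turns:
    print(turn.emotion, "|", turn.text)
```

Keeping the style fields as plain strings avoids committing to a fixed label set, since the full inventory of values is not listed in this appendix.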

B. Output samples of OmniChat on the DailyTalk dataset.*

Case 1

[Neutral]
What are you hiding in your coat pocket?
[Neutral]
It's a camera.
[Neutral]
Umm you want to take pictures of the inside of your pocket?
[Neutral]
Not even funny, Billy. This is my new snapshot sticker camera.
[Neutral]
What's that supposed to mean?

[Neutral]
It has stickers. You can stick stickers on the things through the camera.
Well, it means I can take little picture stickers and stick them on things.
It's a sticker that you can stick on your clothes to take pictures of the inside of your pockets.

Case 2

[Neutral]
You look tired.
[Neutral]
Yeah, I’ve been working so much overtime lately.
[Surprised]
Really? How come?
[Neutral]
My boss gave me a big project. I had to have it finished by this morning. It was so difficult!
[Neutral]
You shouldn’t work so hard.
[Neutral]
I know. But hard work pays off, you know.
[Neutral]
What do you mean?

[Happy]
Well, if I can finish this project on time, I’ll get a raise at the end of the month.
It’s a good opportunity to learn new skills.
Well, I got the promotion I wanted after putting in all that extra effort.

Case 3

[Neutral]
Julia, I apologize to you for what I have done. Please forgive me.
[Neutral]
Uh, I've already forgiven you. Actually, I also owe you an apology. I was so rude to you that day.
[Neutral]
It doesn’t matter. I know that people are easy to get mad when they're tired.

[Neutral]
Let's just put it behind us, and move on, shall we?
Yeah, everyone has those days. Just take a break.
Yes, it's true that being tired can make people act irritable.

C. Output samples of OmniChat on the ShareChatX-Emotion dataset.*

Case 1

[Normal, Normal, Happy]
You know, embracing change is really exciting! It can open new doors for us.
[High, Normal, Neutral]
Absolutely, but it can also feel a little overwhelming at times.
[Normal, Fast, Happy]
For sure! But I think that's part of the journey. We grow from it.
[Low, Normal, Sad]
True, but sometimes it's hard to let go of what we’re used to.
[Normal, Normal, Happy]
It can be challenging to get used to it.

[Normal, Normal, Happy]
Yeah, but I think it's worth it in the end, right?
Just remember, every step forward is a step towards something better
Yes, change requires effort and patience.

[Normal, Normal, Happy]
You know, embracing change is really exciting! It can open new doors for us.
[High, Normal, Neutral]
Absolutely, but it can also feel a little overwhelming at times.
[Normal, Fast, Happy]
For sure! But I think that's part of the journey. We grow from it.
[Low, Normal, Sad]
True, but sometimes it's hard to let go of what we’re used to.
[High, Normal, Sad]
It can be challenging to get used to it.

[Low, Slow, Sad]
I know, it can be really tough to adjust sometimes.
Just remember, every step forward is a step towards something better
Yes, change requires effort and patience.

Case 2

[Normal, Normal, Neutral]
It's quite distressing to think about, isn't it?
[Normal, Normal, Sad]
Yeah, it seems like kids are getting such different chances.
[Low, Normal, Neutral]
It's not fair to those who are at a disadvantage.
[Low, Normal, Angry]
Exactly! There's a huge gap that shouldn't exist.
[Normal, Fast, Neutral]
Education is supposed to level the playing field.
[High, Normal, Happy]
True, but right now, it feels more like a lottery.
[Low, Slow, Sad]
Education is supposed to help everyone equally.

[Low, Slow, Sad]
I know, it's really disheartening to think about.
It's meant to open doors for everyone, but sometimes it doesn't work that way.
Education is supposed to help everyone equally. But in reality, it often doesn't.

[Normal, Normal, Neutral]
It's quite distressing to think about, isn't it?
[Normal, Normal, Sad]
Yeah, it seems like kids are getting such different chances.
[Low, Normal, Neutral]
It's not fair to those who are at a disadvantage.
[Low, Normal, Angry]
Exactly! There's a huge gap that shouldn't exist.
[Normal, Fast, Neutral]
Education is supposed to level the playing field.
[High, Normal, Happy]
True, but right now, it feels more like a lottery.
[Normal, Normal, Neutral]
Education is supposed to help everyone equally.

[Normal, Normal, Neutral]
Yeah, it really should be a fair shot for everyone.
It's meant to open doors for everyone, but sometimes it doesn't work that way.
Education is supposed to help everyone equally. But in reality, it often doesn't.

Case 3

[Normal, Normal, Happy]
You know, I really appreciate our open talks about everything.
[Normal, Normal, Happy]
Me too! It feels nice to just be honest with each other.
[Normal, Normal, Neutral]
It's important that we trust each other, especially with personal stuff.
[Normal, Normal, Neutral]
Absolutely, trust is the foundation of any good relationship.
[Low, Normal, Sad]
But sometimes, trust can be broken so easily, which is really tough.
[Normal, Normal, Neutral]
Totally understand that. It can leave you feeling uncertain.
[Low, Slow, Sad]
It can leave you feeling uncertain.

[Low, Slow, Neutral]
Yeah, it really does. But with understanding, we can overcome it together.
Yeah, it's tough when trust gets shaken.
Yes, it can take time and effort to rebuild trust after it's been broken.

[Normal, Normal, Happy]
You know, I really appreciate our open talks about everything.
[Normal, Normal, Happy]
Me too! It feels nice to just be honest with each other.
[Normal, Normal, Neutral]
It's important that we trust each other, especially with personal stuff.
[Normal, Normal, Neutral]
Absolutely, trust is the foundation of any good relationship.
[Low, Normal, Sad]
But sometimes, trust can be broken so easily, which is really tough.
[Normal, Normal, Neutral]
Totally understand that. It can leave you feeling uncertain.
[Normal, Normal, Neutral]
It can leave you feeling uncertain.

[Normal, Normal, Neutral]
Yeah, it really does. It's hard to move on from that feeling.
Yeah, it's tough when trust gets shaken.
Yes, it can take time and effort to rebuild trust after it's been broken.

D. Output samples of OmniChat on the ShareChatX-Audio dataset.*

Case 1

A pigeon coos softly and then there is a lot of pecks and cooing by a lot of birds.
[Normal, Normal, Happy]
Can you believe how peaceful it is here? The birds sound so lovely!
[Low, Normal, Neutral]
Yeah, it definitely has a certain charm.
[High, Normal, Surprised]
I didn’t expect to see so many birds together, it’s like they're having a party!
[Normal, Normal, Happy]
It really feels like a joyful gathering, doesn’t it?
[High, Normal, Neutral]
And they all seem to get along with each other nicely.

[Normal, Normal, Happy]
Yeah, it sounds like they are peacefully eating together, right?
Isn't it amazing how they coexist so peacefully?
Yes, the birds seem to be enjoying each other’s company.

Case 2

A motorcycle engine is running.
[Low, Normal, Happy]
Can you believe how nice the weather is today?
[Normal, Normal, Happy]
I know, right? Perfect day for a ride!
[Normal, Fast, Happy]
Exactly! I just got my bike tuned up.

[High, Normal, Happy]
Cool, the sound of the engine has already got me excited.
Awesome! It's the best time to hit the trails.
That's great! Where did you go riding?

E. Output samples of OmniChat on the ShareChatX-Music dataset.*

Case 1

A male vocalist sings this retro classic melody. The tempo is slow with a melodious electric guitar lead, soft acoustic guitar accompaniment and subtle bass complimenting the soft vocals and backup. The song is mellow, melancholic, pensive, insightful, euphonious, dulcet and soothing. There is a sound of static in the background. This song is classic Rock.
[Normal, Normal, Happy]
Have you heard that new track with the male vocalist? The vibe is so mellow!
[Normal, Normal, Happy]
Yes! I love the way the slow tempo really sets a nostalgic tone.
[Low, Normal, Neutral]
The lyrics are quite insightful, aren’t they? They really stick with you.
[Normal, Normal, Happy]
Totally! It has this blend of soft yellow sad energy but with an electric guitar lead that's exotic.
[Normal, Slow, Neutral]
It’s like classic pop mixed with something modern. Definitely a fresh take.
[High, Fast, Neutral]
Right? I feel a bit pensive and yet relaxed. What do you think of the soft bass line underpinning the vocals?
[High, Fast, Neutral]
The overall mood seems a bit melancholic, doesn't it?

[Normal, Normal, Happy]
Yeah, it definitely has that melancholic vibe, but it's also quite soothing.
Yeah, there's a touch of melancholy that makes you reflect on past experiences.
It feels uplifting in its own unique way.

Case 2

This is a Carnatic classical music piece. There is a female vocalist singing melodically while shifting her voice in a veena-like manner. The sitar is playing the melody and the tabla is playing the rhythm. There is a sentimental and dreamy atmosphere to this piece.
[Low, Normal, Happy]
Wow, have you listened to that new Carnatic classical piece with the veena playing? It’s beautiful!
[Normal, Normal, Happy]
I know, right? The female vocals are so emotional and moving. It almost takes you to another place!
[High, Normal, Happy]
Absolutely! And those tabla beats really add that vibrant touch, don’t you think?
[Normal, Normal, Neutral]
Yes, they definitely keep the rhythm alive! The sitar also blends so well throughout the piece.
[Low, Normal, Happy]
Exactly! Everything comes together so harmoniously it feels like a journey.
[Normal, Fast, Happy]
I love how dreamy it sounds, especially the moments when the singing swells!
[High, Normal, Surprised]
And right when you think it can’t get any better, the veena-like singing kicks back in!
[Low, Normal, Neutral]
It has a unique way of transferring emotions with each phrase, doesn't it?
[Normal, Fast, Neutral]
Do you think the nuances in the vocals add richness to the overall texture?

[Normal, Normal, Neutral]
I think it's the blend of traditional and modern elements that makes it stand out.
Absolutely, the subtle changes in melody and rhythm really make the music more engaging
The nuances paint a rich picture that enhances the depth of the sound.

Case 3

This is the live recording of a didgeridoo concert. The didgeridoo is being played at a wide variety of frequency ranges. There is a raw feel to this piece. The sound has tribal characteristics. The sounds from this recording could be sampled for use in beat-making.
[Normal, Normal, Happy]
Man, I really enjoyed that jam earlier, didn’t you?
[Normal, Normal, Happy]
Absolutely, the vibe of that instrumental was amazing!
[Normal, Normal, Happy]
For sure! It felt like a live performance straight from the heart.
[Normal, Normal, Happy]
Right? And the single instrument made it feel so personal.
[Normal, Normal, Happy]
The didgeridoo has such a unique sound; it takes you to another place.
[Normal, Normal, Happy]
I love how it carried a Native vibe throughout the piece. Super raw!
[Normal, Normal, Happy]
Such a tribal feel too; it connected everything really well.
[Normal, Normal, Happy]
I could listen to that type of music all day, it just flows.
[High, Fast, Neutral]
This piece had a really varying range of emotions too.

[Normal, Normal, Neutral]
Yeah, it was interesting how it shifted between calm and intense moments.
Yeah, it took you on an emotional journey, didn't it?
Definitely! It felt like a rollercoaster of feelings.