Chapter 5

Singing and Background Accompaniment Separation

1. DAMP-VSEP samples illustrating the challenges

Sample 1

  • The provided mixture contains a proprietary studio effect (a non-linear operation).
  • The vocal and background sources are shifted by 2.5 seconds.
Tracks: Mixture, Background, Vocal
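As a rough illustration of the time shift described in Sample 1, the sketch below estimates the offset between two tracks from the peak of their cross-correlation. It assumes NumPy, mono signals at a common sample rate, and a hypothetical helper named `estimate_offset_seconds`. It is not the method used in this chapter. The non-linear studio effect in the real mixtures may also weaken the correlation peak.

```python
import numpy as np


def estimate_offset_seconds(reference, shifted, sample_rate):
    """Hypothetical helper: estimate how many seconds `shifted` lags `reference`."""
    # Full cross-correlation: output index i corresponds to lag i - (len(reference) - 1).
    corr = np.correlate(shifted, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate


# Synthetic example: white noise delayed by 2.5 s at a 1 kHz sample rate.
rng = np.random.default_rng(0)
sr = 1000
ref = rng.standard_normal(4 * sr)
delayed = np.concatenate([np.zeros(int(2.5 * sr)), ref])
print(estimate_offset_seconds(ref, delayed, sr))
```

A positive result means the second track starts later than the first. Direct cross-correlation costs O(N·M), so FFT-based correlation would be preferable for full-length recordings.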

Sample 2

  • The provided mixture contains a proprietary studio effect (a non-linear operation).
  • The vocal and background sources are shifted by 1 second.
Tracks: Mixture, Background, Vocal

Sample 3

  • The provided mixture contains a proprietary studio effect (a non-linear operation).
  • The vocal and background sources are shifted by 1 second.
  • The background source contains vocals from the original artist.
Tracks: Mixture, Background, Vocal

2. Recordings used to produce the figures

Figure 5.2 (page 131)

Tracks: Mixture, Vocal Target, English, English+Duets, English+nonEnglish

Figure 5.3 (page 132)

Tracks: Vocal Target, Original, Remix

Figure 5.4 (page 135)

Tracks: Mixture, Vocal Target, Baseline, Composite Loss

Figure 5.5 (page 136)

Tracks: Mixture, Vocal Target, Composite Loss

3. Separation results using instrumental embeddings

Tracks: Mixture, Target, Baseline, VGGish, VGGish 2pass, X-vectors, X-vectors 2pass