PVOC()

PVOC

Phase vocoder.

in RTcmix/insts/std

quick syntax:

PVOC(outsk, insk, dur, AMP, inputchan, fftsize, windowsize, DECIMATION, interpolation[, PITCHMULT, npoles, OSCTHRESHOLD])

set_filter(“filter_name” OR “filter_DSO_path”)

Param Field	Parameter	Units	Dynamic	Optional	Notes
p0	output start time	seconds	no	no
p1	input start time	seconds	no	no
p2	duration	seconds	no	no
p3	amplitude multiplier	relative multiplier of analyzed input signal	yes	no
p4	input channel	-	no	no
p5	fft size	samples, power of 2	no	no
p6	window size	samples, normally 2 * fft size	no	no
p7	decimation amount	samples, amount to read in	yes	no	should be < p5
p8	interpolation amount	samples, amount to write out	no	no
p9	pitch multiplier	-	yes	yes	default: 0.0 (no pitch change)
p10	npoles (used for LPC data only)	-	no	yes	leave at 0.0
p11	gain threshold for resynthesis	-	yes	yes	default: 0.0

Parameters labled as Dynamic can receive dynamic updates from a table or real-time control source.

Author: Doug Scott (based on earlier work by Christopher Penrose and others).

set_filter

Param Field	Parameter	Units	Dynamic	Optional	Notes
p0	filter	string	no	no	either the name of a PVOC filter or the full path

NOTE: this subcommand is only available in standalone RTcmix configurations.

Phase vocoding is an analysis/resynthesis technique whereby a sound is analyzed through a filter bank with additional computation of phase deviation from each of the channels of the vocoder (sort of an expanded Fourier transform). The data discerned from the analysis allows for realistic time-independent transposition and, by corollary, pitch-independent time-stretching of a soundfile, with much fewer resynthesis artifacts than would normally be possible (from Dodge and Jerse, 1997).

The RTcmix PVOC instrument uses a standard FFT analysis with additional phase computation, allowing the user to specify FFT parameters and time- and pitch-shifting in terms of multiples of the original sound.

Usage Notes

The “fftsize” (p5) determines the resolution in time and frequency of the anaylsis. This has to be a power of 2. The larger the size (2048, 4096, 8192, etc.) the greater the frequency resolution, but time resolution suffers – larger FFT windows ‘smear’ the signal in time. A lower “fftsize” parameter (64, 128, 256) resolves time-events, but will not have as fine a representation of the frequency spectrum. Such is life.

The “windowsize” parameter (p6) sets how much overlap will occur between analysis windows (chunks) on output synthesis. The amount of overlap is p6 compared to p5. Larger values can create a smoother sound, but it may start sounding a bit reverberant. A value of twice the “fftsize” is usually reasonable. It should also be a power of 2.

The ratio between p7 (“decimation”) and p8 (“interpolation”) determines how much time-dilation or time-compression will occur. The time-scaling factor is p8/p7. These don’t have to be a power of 2, and smaller numbers may yield smoother results. Smaller values are more computationally expensive, however. An example of how the time-shifting works: if p7 is 50 and p8 is 100, then the resynthesized sound will be twice as long as the original sound. If p7 is set to 300 and p8 is 100, the resulting sound will be 0.333 times (100/300) as long (three times faster) than the original sound.

If the optional “pitchmult” (p9) is 0, then PVOC will do an inverse-FFT resynthesis (fairly efficient). If it is > 0, however, it will cause an oscillator-bank resynthesis, with individual oscillators tracking the frequency and amplitude from the FFT analysis. p9 is a direct multiplier of all frequency values, so that a value of 2.0 will shift the entire spectrum up one octave, a value of 0.25 will shift it down two octaves. The following score fragment can be used to calculate an oct.pc transposition:

   transposition = 0.05   // shift up 5 semitones
   pitch_multiplier = cpspch(transposition) / cpspch(0.0)

If p9 is > 0, it also allows for the use of LPC-generated amplitude coefficients for the spectral envelope resynthesis. The optional “npoles” parameter (p10) can set how many poles the LPC filter will have. Smaller values create a very approximate spectral resynthesis, and larger values can generate a filter that is too “lumpy”. Usually values between 20 - 40 are good starting points. A value of 0 will turn off this feature.

Also if p9 > 0, the optional “oscthreshold” (p11) parameter is engaged. During oscillator-bank rsynthesis, only parts of the frequency spectrum with amplitudes greater than this value will be resynthesized. Values > 1.0 will generally start having an effect on the output sound. This feature can be useful for eliminating noise from a signal, although it will cause audible artifacts in the resulting sound.

PVOC can read mono or stereo input files; it only writes mono output.

PVOC has the ability to set filter plugins which operate on the frequency and amplitude bins before they are used to resynthesize the audio. The plugins are loaded by name or by DSO path using the set_filter command. This feature is only available in the standalone version, and the details are, for now, left to those who are willing to examine the source code.

Sample Scores

very basic:

   rtsetparams(44100, 1, 512)
   load("PVOC")

   rtinput("mysound.aif")

   PVOC(start=0, inputskip=0, inputread=DUR(0), amp=0.9, inputchan=0, fft=1024, window=2*fft,
      readin=1024, writeout=2*readin)

slightly more advanced:

   rtsetparams(44100, 1)
   load("PVOC")

   rtinput("mysound.aif")

   // Resynthesize with oscillator bank, at 0.5 the orig pitch,
   // and only with oscillators > 1.1 in amplitude
   PVOC(0, 0, DUR(0), 1, 0, 1024, 2048, 100, 100, 0.5, 0, 1.1)

fun stuff!

   rtsetparams(44100, 1)
   load("PVOC")

   rtinput("mysound.aif")

   start = 0
   inskip = 0
   duration = DUR(0)
   gain = 1
   inskip = 0
   fftsize = 2048
   winsize = 2048*2
   pitch = 1
   decim = 512
   interp = 512
   
   PVOC(start, inskip, duration, gain, 0, fftsize, winsize, decim, interp, pitch)
   
   start = start + duration
   pitch = pitch * 0.8
   PVOC(start, inskip, duration, gain, 0, fftsize, winsize, decim, interp, pitch)
   
   start = start + duration
   pitch = pitch * 0.8
   PVOC(start, inskip, duration, gain, 0, fftsize, winsize, decim, interp, pitch)
   
   start = start + duration
   pitch = pitch * 0.8
   PVOC(start, inskip, duration, gain, 0, fftsize, winsize, decim, interp, pitch)

PVOC

quick syntax:

Usage Notes

Sample Scores

See Also