CompAudio [options] AFileA [AFileB]
Compare audio files, printing statistics
This program gathers and prints statistics for one or two input audio files.
The signal-to-noise ratio (SNR) of the second file relative to the first file
is printed. For this calculation, the first audio file is used as the
reference signal. The "noise" is the difference between sample values in
the files. This program can also be invoked with just one file name. In
that case, only the statistics for that file are printed.
Multi-channel audio files are treated as if they were single channel files
with the effective sampling frequency increased by a factor equal to the
number of channels.
For each file, the following statistical quantities are calculated and
printed.
Mean:
  Xm = SUM x(i) / N
Standard deviation:
  sd = sqrt [ (SUM x(i)^2 - N Xm^2) / (N-1) ]
Maximum value:
  Xmax = max (x(i))
Minimum value:
  Xmin = min (x(i))
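The four quantities above can be sketched in a few lines of Python. This is an illustration only, not the program's source: the function name basic_stats is hypothetical, and the standard deviation is computed in the usual unbiased sample form, which is an assumption about the intended normalization.

```python
import math

def basic_stats(x):
    """Return (mean, standard deviation, max, min) for a non-empty sample list."""
    n = len(x)
    mean = sum(x) / n
    sum_sq = sum(v * v for v in x)
    if n > 1:
        # Unbiased sample form; max() guards against a tiny negative
        # value caused by floating-point rounding.
        sd = math.sqrt(max(sum_sq - n * mean * mean, 0.0) / (n - 1))
    else:
        sd = 0.0  # assumption: a single sample has zero spread
    return mean, sd, max(x), min(x)

mean, sd, xmax, xmin = basic_stats([1.0, 2.0, 3.0, 4.0])
```

The sum-of-squares form needs only one pass over the data, but it can lose precision when the mean is large compared with the spread.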
For data which is restricted to the range [-32768,+32767], two additional
counts (if nonzero) are reported.
Number of Overloads:
Count of samples taking on the values -32768 or +32767, along with the number
of runs of such samples. For 16-bit data from a saturating A/D converter, the
presence of such values is an indication of a clipped signal.
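A minimal sketch of the overload count follows. The function name count_overloads is hypothetical, and one detail is an assumption: consecutive overloaded samples are counted as a single run even if they alternate between the two limits.

```python
def count_overloads(x, lo=-32768, hi=32767):
    """Return (number of overloaded samples, number of runs of such samples)."""
    n_over = 0
    n_runs = 0
    in_run = False
    for v in x:
        if v == lo or v == hi:
            n_over += 1
            if not in_run:
                n_runs += 1  # a new run starts at this sample
            in_run = True
        else:
            in_run = False
    return n_over, n_runs

counts = count_overloads([0, 32767, 32767, 5, -32768, 0])
```

A small number of long runs suggests clipping, which is why the run count is reported alongside the sample count.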
Number of Anomalous Transitions:
Dividing the 16-bit data range into 2 positive regions and 2 negative
regions, an anomalous transition is a transition from a sample value in
the most positive region directly to a sample value in the most negative
region or vice-versa. A large number of such transitions is an
indication of wrapped values or byte-swapped data.
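The test for anomalous transitions can be sketched as below. The text does not give the region boundaries, so the split at +/-16384 (half of full scale) is an assumption, as is the function name count_anomalous_transitions.

```python
def count_anomalous_transitions(x, threshold=16384):
    """Count jumps between the most positive and the most negative region."""
    def region(v):
        if v >= threshold:
            return 1    # most positive region (assumed boundary)
        if v < -threshold:
            return -1   # most negative region (assumed boundary)
        return 0        # one of the two inner regions

    count = 0
    for a, b in zip(x, x[1:]):
        # The product is -1 only when the two samples lie in opposite outer regions.
        if region(a) * region(b) == -1:
            count += 1
    return count

n_anom = count_anomalous_transitions([20000, -20000, 100, 30000, -30000, 30000])
```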
An optional delay range can be specified when comparing files. The samples
in file B are delayed relative to those in file A by each of the delay values
in the delay range. For each delay, the SNR with optimized gain factor (see
below) is calculated. For the delay corresponding to the largest SNR,
the full regalia of file comparison values is reported.
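The delay search can be sketched as follows. The sketch makes several assumptions that the text does not settle: for a delay d, sample xa[i] is paired with xb[i - d]; only the overlapping part of the two signals is compared; and ties are resolved in favour of the first delay tried. The function name best_delay is hypothetical.

```python
import math

def best_delay(xa, xb, d_lo, d_hi):
    """Return (delay, gain-optimized SNR as a power ratio) for the best delay."""
    best = None
    for d in range(d_lo, d_hi + 1):
        # Pair xa[i] with xb[i - d], keeping only indices valid in both files.
        pairs = [(xa[i], xb[i - d]) for i in range(len(xa)) if 0 <= i - d < len(xb)]
        saa = sum(a * a for a, _ in pairs)
        sbb = sum(b * b for _, b in pairs)
        sab = sum(a * b for a, b in pairs)
        if saa == 0 or sbb == 0:
            continue  # correlation is undefined for an all-zero overlap
        r2 = sab * sab / (saa * sbb)
        snr = math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
        if best is None or snr > best[1]:
            best = (d, snr)
    return best

xa = [0, 0, 1, 2, 3, 0, 0]
xb = [1, 2, 3, 0, 0, 0, 0]
result = best_delay(xa, xb, -1, 3)
```

In this example file A is a copy of file B shifted by two samples, so the search settles on a delay of 2 with an infinite SNR.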
Conventional SNR:
                     SUM xa(i)^2
  SNR = ---------------------------------------------- .
        SUM xa(i)^2 - 2 SUM xa(i)*xb(i) + SUM xb(i)^2
The corresponding value in dB is printed.
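As a sketch, the conventional SNR can be computed by taking file A as the reference and the sample-by-sample difference as the noise. The function name snr_db and the handling of the degenerate cases (identical files, all-zero reference) are assumptions.

```python
import math

def snr_db(xa, xb):
    """Conventional SNR in dB, with xa as the reference signal."""
    signal = sum(a * a for a in xa)
    noise = sum((a - b) ** 2 for a, b in zip(xa, xb))
    if noise == 0:
        return math.inf    # identical files
    if signal == 0:
        return -math.inf   # all-zero reference
    return 10.0 * math.log10(signal / noise)

value = snr_db([1, 2, 3, 4], [1, 2, 3, 5])
```

Here the noise term is the expanded denominator SUM xa(i)^2 - 2 SUM xa(i)*xb(i) + SUM xb(i)^2 written as a sum of squared differences.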
SNR with optimized gain factor:
SNR = 1 / (1 - r^2) ,
where r is the (normalized) correlation coefficient,
              SUM xa(i)*xb(i)
  r = -------------------------------------- .
      sqrt [ (SUM xa(i)^2) * (SUM xb(i)^2) ]
The SNR value in dB is printed. This SNR calculation corresponds to
using an optimized gain factor Sf for file B,
       SUM xa(i)*xb(i)
  Sf = --------------- .
         SUM xb(i)^2
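The relation between the correlation coefficient and the gain factor can be sketched as below. The gain used here is the least-squares value that minimizes SUM [xa(i) - Sf*xb(i)]^2, namely Sf = SUM xa(i)*xb(i) / SUM xb(i)^2. The function name gain_optimized_snr is hypothetical, and raising an error for an all-zero signal is an assumption.

```python
import math

def gain_optimized_snr(xa, xb):
    """Return (SNR as a power ratio, gain factor Sf applied to file B)."""
    saa = sum(a * a for a in xa)
    sbb = sum(b * b for b in xb)
    sab = sum(a * b for a, b in zip(xa, xb))
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for an all-zero signal")
    r2 = sab * sab / (saa * sbb)   # squared correlation coefficient
    sf = sab / sbb                 # least-squares gain for file B
    snr = math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return snr, sf

snr, sf = gain_optimized_snr([1, 0], [1, 1])
```

For this input Sf is 0.5, the residual xa - Sf*xb is [0.5, -0.5] with energy 0.5, and the reference energy is 1, so the SNR of 2 agrees with 1 / (1 - r^2).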
Segmental SNR:
This is the average of SNR values calculated for segments of data. The
segment length by default corresponds to 16 ms (128 samples at a sampling
rate of 8000 Hz). However, if the sampling rate is such that the segment
length is less than 64 samples or more than 1024 samples, the segment
length is set to 256 samples. For each segment, the SNR is calculated as
                            SUM xa(i)^2
  SS(k) = log10 (1 + -------------------------- ) .
                     0.01 + SUM [xa(i)-xb(i)]^2
The term 0.01 in the denominator prevents a divide by zero. This value
is appropriate for data with values significantly larger than 0.01. The
additive unity term discounts segments with SNR's less than unity. The
final average segmental SNR is calculated as
SSNR = 10 * log10 ( 10^[SUM SS(k) / N] - 1 ) dB.
The subtraction of the unity term tends to compensate for the unity term
in SS(k).
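The segmental calculation can be sketched as follows. Both function names are hypothetical. Two details are assumptions: the 16 ms length is rounded to the nearest sample, and a final partial segment is included in the average.

```python
import math

def default_segment_length(sfreq):
    """Segment length in samples for a 16 ms segment, with the 256-sample fallback."""
    n = round(0.016 * sfreq)   # assumption: round to the nearest sample
    if n < 64 or n > 1024:
        n = 256
    return n

def segmental_snr_db(xa, xb, seg_len):
    """Average segmental SNR in dB, with xa as the reference signal."""
    ss = []
    for start in range(0, len(xa), seg_len):
        a = xa[start:start + seg_len]
        b = xb[start:start + seg_len]
        signal = sum(v * v for v in a)
        noise = sum((p - q) ** 2 for p, q in zip(a, b))
        # The 0.01 term prevents a divide by zero; the unity term
        # discounts segments with an SNR below one.
        ss.append(math.log10(1.0 + signal / (0.01 + noise)))
    if not ss:
        raise ValueError("no data")
    value = 10.0 ** (sum(ss) / len(ss)) - 1.0
    return 10.0 * math.log10(value) if value > 0 else -math.inf

seg_len = default_segment_length(8000.0)
ssnr = segmental_snr_db([10.0] * 4, [9.0] * 4, seg_len=2)
```

When every segment has the same SNR, as in this example, the result reduces to 10 * log10 of the per-segment ratio, which shows how subtracting unity undoes the unity term inside SS(k).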
If any of these SNR values is infinite, only the optimal gain factor is
printed as part of the message (Sf is the optimized gain factor),
"File A = Sf * File B".
The command line specifies options and file names.
-d DL:DU, --delay=DL:DU
Specify a delay range. Each delay in the delay range represents a delay
of file B relative to file A. The default range is 0:0.
-s SAMP, --segment=SAMP
Segment length (in samples) to be used for calculating the segmental
signal-to-noise ratio. The default is a length corresponding to 16 ms.
-P PARMS, --parameters=PARMS
Parameters to be used for headerless input files. This option may be
given more than once. Each invocation applies to the files that follow
the option. See the description of the environment variable RAWAUDIOFILE
below for the format of the parameter specification.
-h, --help
Print a list of options and exit.
-v, --version
Print the version number and exit.
RAWAUDIOFILE:
This environment variable defines the data format for headerless or
non-standard input audio files. The string consists of a list of parameters
separated by commas. The form of the list is
"Format, Start, Sfreq, Swapb, Nchan, ScaleF"
The parameters are described below, followed by the string that corresponds
to the default values for the audio file parameters.
Format: File data format
The lowercase versions of these format specifiers cause a headerless
file to be accepted only after checking for standard file headers; the
uppercase versions cause a headerless file to be accepted without
checking the file header.
"undefined" - Headerless files will be rejected
"mu-law8" or "MU-LAW8" - 8-bit mu-law data
"A-law8" or "A-LAW8" - 8-bit A-law data
"unsigned8" or "UNSIGNED8" - offset-binary 8-bit integer data
"integer8" or "INTEGER8" - two's-complement 8-bit integer data
"integer16" or "INTEGER16" - two's-complement 16-bit integer data
"float32" or "FLOAT32" - 32-bit floating-point data
"text" or "TEXT" - text data
Start: byte offset to the start of data (integer value)
Sfreq: sampling frequency in Hz (floating point number)
Swapb: Data byte swap parameter
"native" - no byte swapping
"little-endian" - file data is in little-endian byte order
"big-endian" - file data is in big-endian byte order
"swap" - swap the data bytes as the data is read
Nchan: number of channels
The data consists of interleaved samples from Nchan channels
ScaleF: Scale factor
Scale factor applied to the data from the file
Default parameter string:
"undefined, 0, 8000., native, 1, 1.0"
AUDIOPATH:
This environment variable specifies a list of directories to be searched when
opening the input audio files. Directories in the list are separated by
colons (semicolons for MS-DOS).
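The two environment variables described above can be illustrated with a short sketch: one function splits a parameter string of the form "Format, Start, Sfreq, Swapb, Nchan, ScaleF", and another searches a directory list for a file. Both function names are hypothetical and this is not the program's own parsing code. The sketch assumes that missing or empty fields keep their default values and that the directory list is searched in order.

```python
import os

NAMES = ["Format", "Start", "Sfreq", "Swapb", "Nchan", "ScaleF"]
DEFAULTS = ["undefined", "0", "8000.", "native", "1", "1.0"]

def parse_raw_params(spec):
    """Parse a comma-separated parameter string into a dictionary."""
    values = list(DEFAULTS)
    fields = [f.strip() for f in spec.split(",")] if spec.strip() else []
    for i, field in enumerate(fields[:len(NAMES)]):
        if field:  # assumption: an empty field keeps the default value
            values[i] = field
    params = dict(zip(NAMES, values))
    params["Start"] = int(params["Start"])      # byte offset to the data
    params["Sfreq"] = float(params["Sfreq"])    # sampling frequency in Hz
    params["Nchan"] = int(params["Nchan"])      # number of interleaved channels
    params["ScaleF"] = float(params["ScaleF"])  # scale factor for the data
    return params

def find_audio_file(name, dir_list, sep=os.pathsep):
    """Return the first existing path for name in the directory list, or None."""
    for directory in dir_list.split(sep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate):
            return candidate
    return None

params = parse_raw_params(os.environ.get("RAWAUDIOFILE", ""))
```

The default separator os.pathsep is a colon on Unix-like systems and a semicolon on Windows, which matches the convention stated above.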
Author / version
P. Kabal / v1r11 1996/08/12