Articulation-
Prerequisite to Performance
by Arthur Noxon • Presented at the 87th AES Convention in
NY, October 1989
Acoustic Sciences Corporation
Eugene, Oregon USA
The Modulation
Transfer Function is the established basis for testing the quality
of speech intelligibility. This paper reviews the current use of MTF
test signals as the performance spec for hi-fi and pro sound playback
rooms. Recordings of the test, made in a listening room under
different conditions of acoustic treatment, will be played while
hard copy is displayed.
Acoustic Articulation is
the ability of an acoustic space to faithfully track signal level
changes. That description alone is sufficient to warrant our attention
to the subject. What would the world be like if we increased audio
signal gain, but did not hear a corresponding sound level rise?
What if we cut the signal power and did not experience a drop
in sound level? Articulation is such a fundamental concept that
it is easily taken for granted. It is the current best indicator
for a communication channel and human perception. That is why
we use articulation measurements as the baseline for evaluating
sound systems.
Introduction
The search to define quality
audio playback has for many years been keyed to electronic performance
specifications. However, the final link in an audio chain is always
the acoustic coupler, the interconnect between the speaker and the
listener. The proverbial chain is still only as strong as its weakest
link and with today’s sophisticated electronics and transducers,
the weakest link in the audio chain is undoubtedly the playback
room. The question inevitably arises as to how to test the room
as the final link in the audio chain and what should be the specification.
The long-standing test procedure
for room acoustics is the RT-60 decay time measurement. In the last
few years, a new acoustic test has been introduced into audio. It
is the speech intelligibility test and it comes from the world of
speech and communication. Intelligibility measurements combine the
consequences of RT-60 with the room’s background noise level
to predict the integrity that remains of a modulated signal that
has been transmitted across a room. This test is applied to the
acoustic link of sound systems ranging from as huge as a dome stadium
to as small as a telephone earpiece. Intelligibility testing is
now beginning to impact pro sound and high-end audio, which is why
it is the topic of this paper.
Over the last few years, B
& K (RASTI) and Crown (Tecron) have each produced a procedure
to measure speech intelligibility. Their data is converted into
a single number, the STI (Speech Transmission Index). This test
equipment only monitors the performance of an existing system and
is not a piece of diagnostic equipment. The STI is a performance
rating number; it does not help the engineer to know what to fix
in order to get a better STI. The next generation of test equipment
in this arena will naturally be of the diagnostic type.
The concern for intelligibility
and how to measure it is not new. It dates back at least to early
radio days with the problem of signal-to-noise ratio (SNR) that
prevents messages from getting through. The development of the telegraph,
telephone and radio, right on into today’s deep space communications,
forms a continuous chain of contributions to the advancement in the
understanding of the perception of signals.
Speech Intelligibility
Within the last few years, Speech Intelligibility has surfaced
as a performance requirement in sound systems. Engineers, designers,
contractors and architects no longer only work towards smooth sound
level distributions and properly shaped octave band equalization
(EQ) contours; now they are being required to meet Speech Transmission
Index (STI) criteria. Speech intelligibility is a special application
of the basic concept of articulation. It is a speech band limited
and “weighted” version of articulation.
We encounter something similar
when doing sound level measurements. The “A-Weighted”
sound level frequency response curve is not a “flat”
response curve; it has been modified to account for the loss of efficiency
of human perception in the low and very high frequency ranges.
It is the weighted response curve that is integrated over the audio
range to achieve the total adjusted sound level in dB,A. This is
directly analogous to the STI which is an integration of the articulation
frequency response curve which has been weighted for the purpose
of speech and communication.
Modulation Transfer Function
The response curve that forms the basis of articulation
measurements is called the MTF, or Modulation Transfer Function. It
ranges from zero to 100%. Zero percent MTF signifies that a modulated
signal is undetectable by a person. Tone bursts, as in a Morse code
transmission, would have absolutely no signal modulation at the
receiving end. There are two ways this can happen.
To
achieve zero signal modulation, the receiver could be a long way
from the transmitter. It would receive nothing but background noise,
“static” on the transmission channel. The tone sequence
may well actually be received but it is not perceived by the listener
if the signal is buried more than 10 dB below the background noise
floor. The MTF is zero if the external noise is too loud compared
to the modulated signal.
Another instance in which
MTF drops to zero would occur when transmitting code across a reverb
chamber. With a typical RT-60 of 10 seconds (sound level drops 60
dB in 10 seconds), the rapid staccato of a Morse code will be totally
obscured by the room’s reverberant noise field. Because the
tone of the reverberation sounds just like the signal, it masks
the signal very easily. The reverberant field type of noise easily
masks signal modulation that is 5 dB below the noise floor.
The
preferred signal perception is 100% MTF. Morse Code could easily
have 40 dB of electronic signal modulation, measured as the tone burst signal
level relative to the circuit noise floor. People have limits to
perceived modulation. Sound over 140 dB is painful and sound under
10 dB is inaudible, so the maximum perceptible modulation is 130 dB. That
is why a 1000 dB signal-to-noise ratio is imperceptibly different
from a 100 dB SNR, assuming the signal strength for both signals
was the same.
We might be able to tolerate
130 dB of signal level modulation but 20 dB has proven to be effectively
full range. A 10 dB modulated SNR has proven to be clearly audible; this
would occur if a 70 dB test tone were placed in a 60 dB background
noise level. The result of many studies in perception is that for effective communication,
a modulated SNR of 18 dB is sufficient to be called 100% modulation.
At the other end is ½ dB modulation, which is essentially
imperceptible. The dynamic range for modulated signals that is significant
to human perception is about 18 dB. With these two end points defined,
all that remains is to fill in the intervening points. Much
research into human perception has gone into developing the
relationship shown in Figure 1.
Signal to Noise Ratio
By now it should be clear that an articulation test measures
both the dynamic and static behavior of sound levels. A third-octave
or other RTA device measures static sound level conditions. The
sound levels of a facility can be measured first without and later
with a signal applied and the MTF can be evaluated with respect
to background noise.
The
background noise spectrum can be loaded into “Memory A”
of an RTA. Then power up the sound system and measure pink noise
levels at the listening position. Load them into “Memory B.”
The difference between these two curves is the SNR vs. frequency
curve. An example of this is shown in Figure 2.
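The Memory A / Memory B subtraction can be sketched in a few lines of Python. This is only an illustration: the band centers and levels below are hypothetical values chosen for the example, not data from Figure 2, and the function name `snr_per_band` is invented here. Note that the "Memory B" reading taken with the system on also contains the background noise, so the difference is strictly a (signal + noise)-to-noise ratio; the distinction is small when the signal is well above the noise.

```python
# Hypothetical band levels in dB; these are NOT measurements from the paper.
band_centers_hz = [125, 250, 500, 1000, 2000, 4000]
memory_a_noise_db = [52.0, 48.0, 45.0, 42.0, 40.0, 38.0]   # background noise, system off
memory_b_signal_db = [70.0, 72.0, 71.0, 70.0, 68.0, 62.0]  # pink noise, system on


def snr_per_band(signal_db, noise_db):
    """Return the signal level minus the noise level (in dB) for each band."""
    if len(signal_db) != len(noise_db):
        raise ValueError("both spectra must cover the same bands")
    return [s - n for s, n in zip(signal_db, noise_db)]


snr_db = snr_per_band(memory_b_signal_db, memory_a_noise_db)
for freq_hz, snr in zip(band_centers_hz, snr_db):
    print(f"{freq_hz:>5} Hz  SNR = {snr:5.1f} dB")
```

Each printed row is one point on the SNR vs. frequency curve.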
The SNR can be converted
to MTF by using Figure 1. The resulting TI (Transmission
Index) vs. frequency curve of Figure 4 is a linear,
unweighted response curve. For speech intelligibility the TI is
multiplied by the weighting curve for speech (Figure 1). The result, shown in
Figure 5, is the band-limited STF (Speech Transfer
Function) curve. The percentage of the area covered under the STF
equals the STI, Speech Transmission Index.
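The chain from SNR to TI to weighted STF to STI can be sketched as follows. Two things here are assumptions and not taken from the paper. First, the chart of Figure 1 is not reproduced in the text, so the sketch substitutes the widely used noise-only expression m = 1 / (1 + 10^(-SNR/10)); it gives about 76% at 5 dB, close to the 75% the paper reads from its chart in the gymnasium example, but it is not necessarily the same curve. Second, the speech weights are hypothetical placeholders, not the paper's weighting curve.

```python
def mtf_from_snr(snr_db):
    """Assumed SNR-to-MTF mapping: m = 1 / (1 + 10^(-SNR/10))."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def sti_from_bands(snr_db_per_band, speech_weights):
    """Weight each band's TI, then return the weighted area as a fraction
    of the largest area the weighting curve allows."""
    if len(snr_db_per_band) != len(speech_weights):
        raise ValueError("one weight is needed per band")
    ti = [mtf_from_snr(snr) for snr in snr_db_per_band]     # unweighted TI curve
    stf = [t * w for t, w in zip(ti, speech_weights)]       # weighted STF curve
    return sum(stf) / sum(speech_weights)


# Hypothetical example: six bands from 125 Hz to 4 kHz.
snr_db_per_band = [18.0, 24.0, 26.0, 28.0, 28.0, 24.0]
speech_weights = [0.5, 0.8, 1.0, 1.0, 0.9, 0.6]             # placeholder weights

print(f"STI = {sti_from_bands(snr_db_per_band, speech_weights):.2f}")
```

Dividing by the sum of the weights expresses the result as the covered fraction of the area under the weighting curve, which is how the text describes the STI.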
This signal to background
noise version of MTF analysis is fairly straightforward. Most of
us in audio could produce the STI today by using an RTA, the MTF-S/N
chart, the STF weighting curve and a lot of data plotting. This
version of MTF has limited application. Conceptually, it measures
the quality of communication for an anechoic chamber filled with
background noise. The announce system in a large, noisy factory or
the PA for a huge, noisy crowd of people might be a reasonable application.
Signal to RT-60 Ratio
The other aspect of MTF includes reverberation, the more
common problem in audio playback. Reverberation is the energy that
lingers after a signal has been transmitted. No matter how reverberant
a space may be, the residual energy will eventually die away leaving
the ambient background noise as the sound in the room. If an alarm
went off every hour in a reverb chamber a valid signal would be
received because the time between signals far exceeds the decay
time of the reverb chamber. Conversely, a high-speed Morse code
transmitting four bursts per second would be converted to a total
blur of noise with completely inaudible signal modulation.
As
a consequence of reverberation, the signal modulation rate or bursts
per second is related to the MTF. Slow burst rates naturally have
good MTF and fast burst rates often have poor MTF. The range of
burst rates that matters to people and communication is
from 2 Hz to 20 Hz; the MTF vs. reverberation relationship is shown in Figure
6. Burst rates above 20 Hz sound like a low frequency note
and therefore are not capable of being a modulated signal.
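The dependence of MTF on burst rate can be illustrated with a short Python sketch. Figure 6 is not reproduced in the text, so the sketch assumes the commonly used expression for an ideal exponential decay, m(F) = 1 / sqrt(1 + (2πF·RT60/13.8)²); this is an assumption and may not match the paper's curve exactly. The 10 second reverb chamber and the two signals (an hourly alarm and four bursts per second) are the examples used in the text.

```python
import math


def mtf_from_reverb(mod_rate_hz, rt60_s):
    """Assumed reverberation MTF: m = 1 / sqrt(1 + (2*pi*F*RT60 / 13.8)^2)."""
    x = 2.0 * math.pi * mod_rate_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


rt60_s = 10.0  # reverb chamber decay time quoted in the text

signals = {
    "alarm once per hour": 1.0 / 3600.0,
    "Morse code, 4 bursts/sec": 4.0,
    "fastest rate of interest, 20 Hz": 20.0,
}
for name, rate_hz in signals.items():
    print(f"{name:<32} MTF = {mtf_from_reverb(rate_hz, rt60_s):.1%}")
```

Under this assumed formula the hourly alarm keeps essentially full modulation while the four-per-second code falls to roughly 5%, consistent with the qualitative description above. For the later gymnasium case (RT-60 of 2.5 seconds at 2 bursts/sec) the same formula gives roughly 40%, somewhat below the "about 50%" the paper reads from its own chart.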
Real World MTF
The two basic versions of signal-to-noise have been presented.
Background noise and reverberation are combined in most real-life
situations. If the MTF for these two independent processes can be
determined and the combined effect is desired, then we multiply
the background noise MTF by the RT-60 MTF. The result gives the
combined effect of substantial background noise in a reverberant
space.
For
example, consider a noisy basketball game in a gymnasium. The crowd
noise level could be 85 dB,A. The PA might be set at 90 dB. The
RT-60 of the occupied gym might be 2.5 seconds. As shown in Figure
7, the SNR of 5 dB gives a 75% partial MTF due to the PA level
and crowd noise. The MTF/RT-60 curve gives a partial MTF of about
50% due to the gym reverberation at 2 bursts/sec. The combined effect
is an MTF of about 35%, which is quite poor. Successful announcers instinctively
understand this and enunciate slowly to utilize the intelligibility
benefits that go with slow modulation rates.
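The gymnasium arithmetic can be written out directly. The sketch below uses only the values quoted in the text (85 dB crowd, 90 dB PA, 75% and about 50% partial MTF); the function name `combined_mtf` is invented for the illustration. The product of the two rounded chart readings is 0.375, in line with the "about 35%" stated above.

```python
crowd_noise_db = 85.0   # crowd noise level from the example
pa_level_db = 90.0      # PA level from the example

snr_db = pa_level_db - crowd_noise_db  # 5 dB signal-to-noise ratio

partial_mtf_noise = 0.75    # paper's chart reading for a 5 dB SNR
partial_mtf_reverb = 0.50   # paper's chart reading for RT-60 = 2.5 s at 2 bursts/sec


def combined_mtf(noise_mtf, reverb_mtf):
    """Combine two independent partial MTFs by multiplying them."""
    return noise_mtf * reverb_mtf


combined = combined_mtf(partial_mtf_noise, partial_mtf_reverb)
print(f"SNR = {snr_db:.0f} dB, combined MTF = {combined:.3f}")
```

Because the two effects multiply, improving only one of them has a limited payoff: even with no crowd noise at all, the combined MTF in this example could not exceed the 50% set by the reverberation.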
3-Dimensional MTF Displays
With MTF, the signal modulation rate is not impacted by
the background noise levels but it is strongly affected by the RT-60.
Low modulation rates are more audible than fast modulations in a
reverberant space. At the lowest modulation rate, the MTF is usually
controlled by the background or external noise. The MTF at the higher
burst rates is controlled by the reverberation of the room.
The full audio frequency
ranges from 20 Hz to 20 kHz. Not only does the background noise
spectrum vary with frequency, but the RT-60 also varies with frequency.
The next step then is to perform the MTF analysis throughout the
full frequency range. The MTF frequency response curve is absolutely
essential for a detailed analysis or diagnostics of the communication
channel.
If
both the modulation and tonal frequency aspects of MTF are combined,
the result appears as a 3-dimensional print-out, or the MTF waterfall.
Figure 8 illustrates this display. Present-day use of
MTF analysis is dedicated to speech intelligibility. It is limited
(Figure 9) to modulation rates between 2 and 8
Hz, and a frequency range between 100 Hz and 4 kHz. This is 1/6
of the total 3-dimensional MTF volume available to human perception.
Depending on the application, different sections of the MTF volume
will be used. For example, as shown in Figure 10, a Morse Code transmission
would need a narrow range, about 1/30 of the total MTF space.
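A rough sketch of how the data behind such a waterfall could be tabulated is given below. It is a sketch under stated assumptions, not the method used to produce Figure 8: the noise and reverberation terms use commonly quoted closed-form expressions in place of the paper's charts, the per-band SNR and RT-60 values are hypothetical, and all function names are invented for the illustration. The two partial MTFs are multiplied, as described in the Real World MTF section.

```python
import math


def mtf_noise(snr_db):
    """Assumed noise term: m = 1 / (1 + 10^(-SNR/10))."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def mtf_reverb(mod_rate_hz, rt60_s):
    """Assumed reverberation term: m = 1 / sqrt(1 + (2*pi*F*RT60 / 13.8)^2)."""
    x = 2.0 * math.pi * mod_rate_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


# Hypothetical room data per tonal band: (SNR in dB, RT-60 in seconds).
room_bands = {
    125: (20.0, 1.2),
    1000: (25.0, 0.6),
    8000: (22.0, 0.3),
}
mod_rates_hz = [2, 4, 8, 12, 16, 20]


def mtf_waterfall(bands, mod_rates):
    """Return {band_hz: [combined MTF at each modulation rate]}."""
    return {
        band_hz: [mtf_noise(snr) * mtf_reverb(rate, rt60) for rate in mod_rates]
        for band_hz, (snr, rt60) in bands.items()
    }


waterfall = mtf_waterfall(room_bands, mod_rates_hz)
print("band Hz | " + " ".join(f"{r:>4}Hz" for r in mod_rates_hz))
for band_hz, row in waterfall.items():
    print(f"{band_hz:>7} | " + " ".join(f"{m:6.0%}" for m in row))
```

Each row is one tonal band and each column one modulation rate, so the table is a coarse numeric version of the 3-dimensional surface: the MTF falls along each row as the modulation rate rises, and falls faster in bands with longer RT-60.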
A
typical recording studio control room and quality hi-fi listening
room are required to handle a wide frequency range and be capable
of fast modulation rates. Figure 10 also shows
how a precision playback room might occupy 50% of the full MTF space.
Dynamic stability might be required up to a 12 Hz modulation rate
for any frequency ranging between 40 Hz and 16 kHz.
A digital sampling studio
could have even higher expectations and be required to track well
into the first 70% of the MTF space. It might have the full frequency
bandwidth of 20 Hz to 20 kHz and handle up to a 15 Hz modulation
rate. The MTF volume for various categories of performance can only
be estimated at this time as they have yet to be properly defined.
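The quoted fractions of the MTF space can be approximated with a simple calculation. The paper does not state how the fractions were computed, so the axis scaling here is an assumption: a linear modulation-rate axis from 2 to 20 Hz, a logarithmic frequency axis from 20 Hz to 20 kHz, and the full 0 to 100% MTF height for every application. The lower modulation limit of 2 Hz for the playback room and sampling studio is also assumed. Under those assumptions the results land close to the figures in the text.

```python
import math

FULL_MOD_RANGE_HZ = (2.0, 20.0)         # modulation rates that matter to people
FULL_FREQ_RANGE_HZ = (20.0, 20000.0)    # full audio frequency range


def mtf_volume_fraction(mod_range_hz, freq_range_hz):
    """Fraction of the MTF space covered, assuming a linear modulation
    axis and a logarithmic frequency axis (both are assumptions)."""
    mod_lo, mod_hi = mod_range_hz
    f_lo, f_hi = freq_range_hz
    mod_fraction = (mod_hi - mod_lo) / (FULL_MOD_RANGE_HZ[1] - FULL_MOD_RANGE_HZ[0])
    freq_fraction = math.log10(f_hi / f_lo) / math.log10(
        FULL_FREQ_RANGE_HZ[1] / FULL_FREQ_RANGE_HZ[0]
    )
    return mod_fraction * freq_fraction


applications = {
    "speech intelligibility": ((2.0, 8.0), (100.0, 4000.0)),
    "precision playback room": ((2.0, 12.0), (40.0, 16000.0)),
    "digital sampling studio": ((2.0, 15.0), (20.0, 20000.0)),
}
fractions = {
    name: mtf_volume_fraction(mod_range, freq_range)
    for name, (mod_range, freq_range) in applications.items()
}
for name, fraction in fractions.items():
    print(f"{name:<24} {fraction:.0%} of the MTF space")
```

With this scaling the speech case comes out near 1/6, the playback room near 50% and the sampling studio near 70%, which suggests the assumed axes are at least a plausible reading of the figures.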
Conclusion
MTF analysis is just beginning to
make its presence felt in audio. For the last two years it has been making
its way into audio by way of commercial sound systems. An advancement
into one specialty area of audio eventually makes its presence felt
in all areas of audio. It is safe to expect that in the next decade
we will be using another rackmount unit; the MTF analyzer will probably be located
just above the RTA and EQ. There can be no doubt that by including
human perception of signals as an audio performance indicator we
will produce even better, more accurate and most importantly, more
relevant audio playback systems.