
And the Oscar goes to...

07/03/2005


As the buzz surrounding the Oscars ceremony dies down, two Cambridge engineers are celebrating an award of their own, won for their mathematical methods for removing noise from sound recordings.

Dave Betts and Christopher Hicks


Christopher Hicks and Dave Betts, from CEDAR Audio, were awarded a Technical Achievement Award - a "technology Oscar" - by the Academy of Motion Picture Arts and Sciences. The award recognises their role in the development of the DNS 1000, a digital noise suppressor that uses mathematical methods to strip background noise out of sound recordings, particularly motion picture and TV dialogue. One Academy member said that it was "probably used on every major movie coming out of Hollywood in the past couple of years", including "Spider-Man 2" and all the outdoor recording in New Zealand for "The Lord of the Rings", as well as on TV programmes such as "The Bill" and "A Touch of Frost".

The sounds that we hear are waveforms made up of vibrations at different frequencies, and the mix of frequencies changes over time as the sounds change. Using Fourier analysis, the frequency structure of even the most complicated sound can be written as a sum of sine waves. The waveforms of tonal sounds, such as musical notes, have a well-defined structure. Dialogue, however, is a mixture of both tonal (the vowels) and atonal (the consonants) signals. The frequency of a speech signal can jump around a lot, and there are obvious differences between male and female speech. (Read more about the mathematics of sound and audio engineering in our interview from Issue 27.)
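To make the "sum of sine waves" idea concrete, here is a minimal sketch in Python. It builds a toy signal from two sine waves and uses a plain discrete Fourier transform to recover which frequencies are present. The sample rate, the two frequencies and the function name `dft_magnitudes` are illustrative assumptions, not anything taken from the DNS 1000.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the normalised magnitude of each DFT bin from 0 up to N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) / n)
    return mags

SAMPLE_RATE = 64  # samples per second (illustrative value)
N = 64            # one second of signal, so bin k corresponds to k Hz

# A toy "tonal" signal: a strong 5 Hz sine plus a weaker 12 Hz sine.
signal = [1.0 * math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)

# The two largest bins are the two sine waves we put in.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(strongest))  # [5, 12]
```

A sine wave of amplitude A shows up as a bin of height A/2 under this normalisation, so the two peaks here have heights 0.5 and 0.25, and every other bin is essentially zero. Real speech would spread its energy over many bins that shift from moment to moment.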

This 10-second audio signal contains an utterance of about four words in the middle of significant background noise.


Noise on location can come from a variety of sources, including traffic, wind, rain, and even the background hum of camera and recording equipment. When filming on location, getting the right pictures is always the first priority, and sound comes a distant second. "Previously people had more time and money to make films, and particularly TV programmes," says Hicks. "They would record the dialogue on location, and then, back in the studio, either make the best of a bad job or get the actors back in to do lengthy ADR (Additional Dialogue Replacement) sessions."

The DNS 1000


The DNS 1000 drastically reduces the need for these expensive and time-consuming sessions, explains Hicks, by suppressing background noise so that sound recorded on location becomes usable. "It relies largely on the fact that the human ear is not perfect [at analysing sound]. The ear has a phenomenon called masking - a strong sound at a certain frequency masks out weak sounds at that and other close-by frequencies." For example, a strong tone at 500 Hz masks a weak tone at 510 Hz; the ear simply doesn't hear it. So, to remove the noise, you only need to remove the parts of the signal that are far from the speech signal either in frequency (noise close to the speech in frequency will be masked anyway) or in time (that is, where there is no dialogue).

Filters are used to remove these parts of the signal, just as the treble and bass adjustments you might find on a stereo allow you to turn the higher and lower ends of the frequency spectrum up or down. However, the DNS 1000 allows far more precise control, dividing the frequency spectrum into 18 bands that can be adjusted individually. It analyses the incoming signal and uses mathematical algorithms to decide which of these bands to turn up or down, so as to remove the parts that are not close, either in frequency or in time, to the dialogue.
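As a rough illustration of per-band control, the sketch below chooses a gain for each of 18 bands by comparing the level in that band with an estimate of the background noise. This is not CEDAR's algorithm, which the article does not describe in detail. The function `band_gains`, the `margin` and `attenuation` parameters and all the numbers are assumptions made up for the example.

```python
NUM_BANDS = 18  # the band count quoted in the article

def band_gains(band_levels, noise_floor, margin=2.0, attenuation=0.1):
    """Pick one gain per band (hypothetical rule, for illustration only).

    band_levels -- level currently measured in each band
    noise_floor -- estimated background-noise level in each band
    margin      -- how far above the noise a band must be to count as dialogue
    attenuation -- gain applied to bands judged to contain only noise
    """
    if len(band_levels) != len(noise_floor):
        raise ValueError("band_levels and noise_floor must have the same length")
    return [1.0 if level > margin * noise else attenuation
            for level, noise in zip(band_levels, noise_floor)]

# Steady background noise in every band...
noise_floor = [1.0] * NUM_BANDS
band_levels = [1.0] * NUM_BANDS
# ...plus dialogue energy concentrated in two neighbouring bands.
band_levels[4] = 8.0
band_levels[5] = 5.0

gains = band_gains(band_levels, noise_floor)
output_levels = [g * level for g, level in zip(gains, band_levels)]
print(gains)
```

The two bands carrying dialogue are passed unchanged, while the other sixteen are turned down to a tenth of their level. A practical suppressor would re-evaluate the gains many times a second and change them smoothly rather than switching between two values.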

Actress Scarlett Johansson hosted the February 12 Science and Technology award ceremony. (Image copyright AMPAS)


The DNS 1000 also exploits the fact that a speech signal varies much faster in time than background noise does - speech varies over a timescale of 10-100 milliseconds, while background noise is more likely to vary over a timescale of about 1 second. So the idea is to keep the fast-changing signal and lose the slow-changing one.
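One simple way to separate fast changes from slow ones is to track a moving average over about a second and look at what is left over. The sketch below does this for the level in a single band, measured every 10 milliseconds. It is only an illustration of the timescale argument, with a made-up function `split_fast_slow` and made-up levels, and is not the method used inside the DNS 1000.

```python
def split_fast_slow(levels, window):
    """Split a track of levels into a slow trend and a fast residual.

    The slow trend is a trailing moving average over `window` frames;
    the fast residual is whatever the average does not explain.
    """
    slow = []
    for i in range(len(levels)):
        start = max(0, i - window + 1)
        chunk = levels[start:i + 1]
        slow.append(sum(chunk) / len(chunk))
    fast = [level - trend for level, trend in zip(levels, slow)]
    return slow, fast

FRAME_MS = 10  # one level measurement every 10 ms (illustrative value)

# Three seconds of steady background noise at level 1.0...
levels = [1.0] * 300
# ...with a 50 ms burst of speech-like energy starting at 1.5 seconds.
for i in range(150, 155):
    levels[i] += 4.0

# A 100-frame window spans 1 second, the noise timescale from the article.
slow, fast = split_fast_slow(levels, window=100)
print(round(fast[100], 3), round(fast[150], 3))
```

Where there is only steady noise the fast residual is zero, and during the short burst it is close to the full height of the burst, because a one-second average barely reacts to a 50 ms event.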

"Film presentation has come on in leaps and bounds in the last 10 to 15 years," says Hicks. "People watch TV at home with much better sound - nearly everyone uses stereo and many now use surround sound - and they treat the sound as an integral part of the experience." These improvements in listening technology for film and TV mean we now have much higher expectations of sound quality.

So the next time you settle into your cinema seat, or even your armchair at home, it will be the things that maths and the DNS 1000 have taken out that have added to your high-quality sound experience.

