Loudness Normalization: The Future of File-Based Playback
- Raviraj Panchal
- 4 days ago
- 21 min read
Today, playback on portable devices dominates how music is enjoyed. A large portion of the songs on the average music player comes from online music services, obtained either by per-song purchase or by direct streaming. The listener often enjoys this music in shuffle play mode, or via playlists.
Playing music this way poses some technical challenges. First, sometimes tremendous differences in loudness between selections require listeners to adjust the volume. Second, the sound quality of music production has declined over the years due to a loudness war; the art of dynamic contrast has almost been lost because of the weaknesses of current digital systems. Third, these loudness differences, combined with a tendency toward higher playback levels in portable listening, especially when using earbuds, can damage hearing.
The Three Challenges
Loudness Differences
In digital audio, the maximum (peak) audio modulation has a hard ceiling that cannot be crossed, and digital audio tracks are routinely peak normalized. This results in tremendous loudness differences from track to track, because the peak level of a signal is not representative of its subjective loudness. Rather, the listener perceives loudness according to the average energy of the signal. Because of the widespread practice of peak normalization, program producers apply severe compression, limiting and clipping to the audio. This removes the original peaks and allows normalization to amplify the signal, increasing its average energy. The result is a loudness war, with large loudness differences between newer and older material and between genres. When older recordings are included in a playlist with new material, the listener experiences noticeable jumps in loudness from track to track, requiring frequent adjustments of the playback level; the differences can be as large as 20 dB. The same problem occurs when different musical genres share a playlist. Portable device listening is therefore not the comfortable experience it could be, and computer playback exhibits some of the same problems.
Restoration of Sound Quality to our Recorded Legacy
In the practice commonly referred to as the “loudness war”, many artists, recording engineers and record labels strive to make their recordings sound louder so they will stand out compared to others. The aggressive dynamic range compression used to produce loud recordings reduces the peak-to-average energy ratio. The effect has been that the important artistic and narrative tool of dynamic contrast has almost totally disappeared from modern music production.
The result of this pressure to be louder is that the steps of the production process (recording, mixing and mastering) produce masters that incorporate several generations of digital processing, which can accumulate clipping and alias products. This distortion is exacerbated when the product is finally encoded to a lossy format like AAC. Cumulative distortion also leads to further significant distortion being added during distribution or playback. This is fatiguing to the ear, which turns off some listeners and may even be a cause of reduced sales of contemporary music. This reduction in signal quality and dynamic range amounts to a removal of the very parts of the sound which make programs sound interesting.
By switching from peak normalization to loudness normalization as a default in playback media, producers who wish to mix and master programs with wide dynamic range and without distortion can do so without fear that their programs will not be heard as loudly as the ‘competition’. Loudness normalization also permits older, more dynamic material to live alongside the newer, which will allow listeners to appreciate the sound qualities of more dynamic recordings and permit them to mix genres and recording styles.
Hearing Damage
High playback levels, whether reached by accident, chosen by personal preference or used to overcome ambient noise, are a potential source of hearing damage. This is especially true for headphones and earbuds which, due to their close proximity to the eardrums, require relatively little power to reach damaging levels. In the past some European countries have attempted to address hearing damage by legislating a maximum peak output level for portable players. The net result is that it is difficult to enjoy old recordings or dynamic genres like classical music at sufficient loudness on these output-limited devices. Unfortunately, this has increased the pressure on mastering engineers to remove dynamic peaks from tracks in order to provide loud enough playback within the restricted peak level. Again, peak output level is not directly connected to perceived loudness, nor is it used as a predictor of hearing damage potential in international law. Instead, the integrated level over a certain period of time should be used.
An Integrated Solution
ITU Loudness Normalization
There is a solution for problems of inconsistent playback, the loudness war, hearing damage and the sound quality issues. This solution is founded in the massive adoption of file-based music consumption in all kinds of formats. All playback devices and music servers are effectively computers that may analyze the average perceptual energy of a file and adjust its playback level accordingly. For international broadcasting, the ITU-R BS.1770-2 standard for loudness measurement has recently been developed. It defines the equivalent loudness of an audio signal as its LUFS level, meaning Loudness Units relative to Full Scale. BS.1770-2 does a very good job in predicting subjective loudness. Loudness normalization based upon BS.1770-2 is being rolled out worldwide for television broadcast. Apple has successfully implemented loudness normalization in its Sound Check algorithm for iTunes and supported portable players. A similar open system known as ReplayGain is available for other players. The adoption of BS.1770-2 by these systems would be advantageous in the sense that music normalization would then be based on one international standard for loudness measurement.
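As an illustration, a player or media server could measure a file's integrated loudness with a few lines of Python. This is a minimal sketch assuming the open-source pyloudnorm package (which implements a BS.1770-style K-weighted meter) and soundfile for decoding; any compliant meter would serve equally well:

```python
import soundfile as sf     # audio decoding
import pyloudnorm as pyln  # BS.1770-style loudness meter

def measure_lufs(path: str) -> float:
    data, rate = sf.read(path)               # float samples, any channel count
    meter = pyln.Meter(rate)                 # K-weighted meter
    return meter.integrated_loudness(data)   # integrated loudness in LUFS

print(measure_lufs("track.wav"))             # e.g. -14.3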
ON by Default
Listener experience will generally improve when a loudness normalization algorithm is turned ON by default. This will also facilitate compliance with regulations to prevent hearing loss, and it would help put an end to the “loudness war” in music production. To prevent playback devices from dropping in loudness compared to what listeners are familiar with, we suggest a different form of system-level control that we call Normalized Level Control.
NORM-L (Normalized Level Control)
Typical loudness normalization solutions normalize playback material to a fixed target loudness level. A separate user-adjusted volume control sets playback level following normalization. This is a compromise: if the target level is too low, the maximum acoustical level will not be sufficient in battery-operated devices; if it is too high, normalization will be compromised or distortion introduced. “NORM-L” (Normalized Level control) is a method of resolving the shortcomings associated with a traditional fixed-target solution. The idea behind NORM-L is that upon playback the listener’s volume control sets the loudness target level to which the files will be adjusted. Loudness normalization and volume control are integrated into one gain step. If this would lead to clipping of the file, the applied gain is restricted appropriately. (See Appendix 1 for a detailed description of NORM-L).
Album Normalization
One important refinement to loudness normalization is album normalization. Although it is common for music nowadays to be bought as separate songs, most artists still release their songs in album format. The loudness of album tracks has been carefully balanced by the mastering engineer to optimize the artistic impact of the recordings. In a classical symphony recording, for example, individual movements have a distinct dynamic relationship to each other. If all tracks were normalized to the same target loudness these important aesthetic properties would be lost. Listeners commonly construct playlists from many different albums. In these cases, the loud and soft songs should be reproduced at the producer’s intended relative level; the soft songs should not be brought up to the same loudness as the loud ones. (See Appendix 2 for further details.)
I propose that album normalization be turned on as a default, in order to satisfy the aesthetics of the artist and album producer and the majority of playback situations.
Hearing Damage Protection
In Europe, new safety requirements for A&V equipment have been published that prescribe that mobile music players must warn users when their hearing is in danger. By integrating these requirements into NORM-L, automatic compliance with European law is obtained with the best possible user experience. (See Appendix 3 for a more detailed description and suggested solutions.)
Appendix 1: NORM-L (Normalized Level Control)
NORM-L analyzes a file’s average loudness and stores this alongside the audio as FileLUFS metadata. The file’s peak level is also stored, as FilePeak metadata. The audio content of the file is not changed. NORM-L can be described algebraically as follows:
Gain = min ( FaderPosition - FileLUFS, -FilePeak )
Where:
Gain is the setting applied to playback hardware in decibels.
FaderPosition is the physical position of the listener’s volume control. The range of this control is from a MaxFaderPosition at the physical top, down to -infinity. In other words, if MaxFaderPosition is -13 dB, when the user’s fader is at its physical maximum, the value applied to the calculation is -13 dB (see Appendix 3 for MaxFaderPosition recommendations).
FileLUFS is the loudness measurement of the file in LUFS units.
FilePeak is the maximum peak level of the file in decibels relative to digital full scale.
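To make the formula concrete, here is a minimal sketch in Python. The numbers mirror the worked examples that follow below; the peak values are illustrative assumptions:

```python
def norml_gain(fader_position: float, file_lufs: float, file_peak: float) -> float:
    """NORM-L: Gain = min(FaderPosition - FileLUFS, -FilePeak), all in dB.

    The first term normalizes the file to the fader's target loudness;
    the second caps the gain so the file's peaks never exceed full scale.
    """
    return min(fader_position - file_lufs, -file_peak)

print(norml_gain(-25, -8, -0.3))  # -17.0: the -8 LUFS pop file is attenuated 17 dB
print(norml_gain(-25, -20, 0.0))  # -5.0:  the -20 LUFS classical file, 5 dB
print(norml_gain(-15, -20, 0.0))  #  0.0:  gain clamped; plays effectively at -20
```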
NORM-L can be described graphically as follows:

The recorded file has an average measured loudness (LUFS), indicated with a horizontal line, a maximum peak level (at the top of the red section), and a loudness range (LRA, the purple segment), which is a measure of the macro-dynamics of a recording: the difference between the average loud and soft parts.
This figure illustrates the loudness distribution which may be found in three different genres:

Because of the differences in the measured average loudness, it is obvious that playback of these three tracks in one sequence would lead to loudness jumps.
Next, an example of how NORM-L solves the problem:

Now the three files play back at the same loudness. The first file's loudness level was -8 LUFS and the NORM-L fader position is at -25, so this file will be attenuated by 17 dB at the moment of playback. Likewise, the -20 LUFS classical file will be attenuated by 5 dB.
Now, we raise the level control to position -20:

Even when turning up NORM-L by 5 dB, the classical material is peaking to the maximum level but does not clip. The other two files still have ample headroom.
However, now the fader position is set to -15:

This setting would cause the classical music to clip, so NORM-L constrains its normalization to prevent this. In this case the file will be played back effectively at -20 although the fader is set to -15. The classical music plays back 5 dB quieter than the other two examples, but is not clipped.
An alternative is to add a limiter to the playback device, as shown below:

This allows the user to increase the loudness of dynamic tracks beyond the normal clipping level, but it compromises sound quality, as transients are removed by the limiter. The vast majority of recorded music has an average measured loudness of -16 LUFS or higher, so only extremely dynamic material, such as late-romantic symphonies and rare pop recordings, will encounter this clipping issue, and only if the listener turns the level control up too far.
For the user, this new type of level control behaves in exactly the same manner as the one he is used to. The only difference is that all songs will sound equal in loudness, regardless of the peak level of the recordings. The main advantage of NORM-L over fixed-target systems, such as Sound Check and ReplayGain, is that normalization improves as the fader is lowered.
Appendix 2: Album Normalization
All tracks from one album should receive the metadata value of the loudest track of the entire album, AlbumLUFS. When available, this AlbumLUFS should be used instead of FileLUFS metadata. When a quieter track from an album is played in sequence with other tracks, it will then still receive the intended lower loudness level. To determine the maximum gain, the FilePeak level is still used. Algebraically, the NORM-L formula becomes:
Gain = min ( FaderPosition - AlbumLUFS, -FilePeak )
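A minimal sketch of this variant follows; the example values are my own, chosen to illustrate the preserved track balance:

```python
def norml_gain_album(fader_position: float, album_lufs: float, file_peak: float) -> float:
    """Album normalization: every track uses the album's loudest-track loudness."""
    return min(fader_position - album_lufs, -file_peak)

# A -20 LUFS ballad on an album whose loudest track is -9 LUFS receives the same
# -16 dB gain as the loud tracks, so it stays 11 dB quieter, as intended:
print(norml_gain_album(-25, -9, -1.0))   # -16.0
```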
Appendix 3: MaxFaderPosition, Hearing Damage
In the context of our NORM-L proposal, we advise limiting the volume control of mobile music players to a certain “MaxFaderPosition”. The same parameter can be used to limit the maximum acoustic level of a player and headphone combination, as demanded by new safety standards in Europe. We differentiate between four situations.
a) Portable devices and other devices with sufficient headphone output level, not sold in the Euro zone.
For devices with sufficient output level, we recommend a MaxFaderPosition of -13. Well-designed players have more than sufficient analog output to allow a -13 MaxFaderPosition. A -13 max value provides effective normalization for the vast majority of music which is encountered today. Furthermore, most listeners will experience a minimal change or no level drop when normalization is introduced. This will help ensure easy adoption and success of normalization. A higher MaxFaderPosition would offer an even smaller level drop, but this potentially leads to a large dead zone at the top of the volume control for files with low loudness. It would also lead to poor normalization when the user sets the player’s volume control to maximum and the headphone output is connected to a line input feeding an external amplified speaker or a car system.
b) Lower cost mp3 players and other devices with lower output level and headroom, not sold in the Euro zone.
In this case we recommend the lowest possible value that still produces sufficient acoustical output through the included earbuds. Values higher than -13 provide inadequate normalization at higher fader settings. As an alternative, manufacturers should consider improving the headphone output capability of their players so as to provide adequate level and peak headroom.
c) Line or digital outputs and wireless connections on mobile players, media systems and personal computers.
When placing a mobile device in a docking station, the audio will often be played via a separate digital or analog line output. This output is connected to an amplifier with its own volume control, which functions as the main volume control for the sound system. NORM-L has no advantage here, and we advise using a fixed target level of, preferably, -23 LUFS (based on EBU Tech Doc 3344). Although this may seem like a low value, the connected amplifiers normally have more than sufficient gain to compensate, and the advantage is that even most classical music will be properly loudness normalized without clipping. Another advantage is that when switching to modern AV systems that operate at the same target level, the user will not experience any loudness jump.
d) Hearing Loss Protection in the Euro zone.
For hearing loss prevention, international laws prescribe the use of A-weighted intensity measurement and equivalent exposure over time (the dose). In Europe a CENELEC working group, in consultation with the European Commission, has published a standard for portable music playback devices with their included earbuds. The standard requires that a safety warning message be given should intensity exceed 85 dB SPL A-weighted (dBA). The listener must actively confirm the message before being allowed to play at higher levels, and under no circumstance is playback above 100 dBA permitted. Conventionally, the 85 dBA limit is enforced by measuring the average energy over a 30-second window in real time. As a result, loud passages in dynamic recordings (classical music, for instance) may unnecessarily trigger the warning. The CENELEC group was aware of this and allows that “if data is available of the average level of the whole song, the message may also be given in case the integrated average level exceeds 85 dBA.”
While loudness normalization is being performed, hearing loss prevention can be accomplished by calculating a per-track WarningFaderPosition and MaxFaderPosition that take into account the file’s A-weighted level. By measuring and storing FileDBA (the integrated average A-weighted level) in addition to the FileLUFS loudness level, the same metadata mechanism used for loudness normalization can also be used to produce hearing damage warnings and operating restrictions that conform to EU law. Using FileDBA instead of a conventional real-time measurement has several benefits: the user can be warned of excessive loudness at the beginning of a track instead of being interrupted in the middle, and the hearing damage potential of dynamic content like classical music is judged on a longer-term basis, in line with hearing loss protection standards and law.
The per file fader position at which the device must show a warning and the level at which the device limits its output can be described algebraically as follows:
WarningFaderPosition = 85 - IECLevel + RefnoiseLUFS - FileDBA + RefnoiseDBA
MaxFaderPosition = 100 - IECLevel + RefnoiseLUFS - FileDBA + RefnoiseDBA
Where:
WarningFaderPosition is the fader position above which the player must display a warning in conformance with EN 60065.
MaxFaderPosition is the physical maximum of the device’s fader used for the duration of the file.
IECLevel is the EN 50332 measured acoustical level of a portable device at its maximum gain (NORM-L in bypass) with its standard headphones in dB(A) SPL.
RefnoiseLUFS is the measured loudness of EN 50332 single channel reference noise. A value of -13 LUFS should be used here.
FileDBA is the A-weighted level of the file’s loudest channel.
RefnoiseDBA is the A-weighted level of the EN 50332 reference noise. A value of -12.6 dBA should be used here.
For example, suppose a device can produce a maximum acoustic output of 104 dBA from factory earbuds when playing the reference noise. While playing a file whose loudest channel measures -14.6 dBA, MaxFaderPosition = 100 - 104 - 13 + 14.6 - 12.6 = -15 would need to be used to prevent output from exceeding the 100 dBA hearing loss protection limit.
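A minimal sketch of the two formulas, reproducing the worked example above; the default reference-noise constants are the values given in the definitions:

```python
def fader_limits(iec_level: float, file_dba: float,
                 refnoise_lufs: float = -13.0, refnoise_dba: float = -12.6):
    """Per-file warning and maximum fader positions per Appendix 3 (dB)."""
    warning = 85.0 - iec_level + refnoise_lufs - file_dba + refnoise_dba
    maximum = 100.0 - iec_level + refnoise_lufs - file_dba + refnoise_dba
    return warning, maximum

# Device reaching 104 dBA on reference noise, playing a -14.6 dBA file:
print(fader_limits(104.0, -14.6))   # (-30.0, -15.0): warn above -30, cap fader at -15
```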
Portable players may feature an equalizer function. If present, EN 50332 requires that this equalizer be adjusted to maximize the sound pressure level and that this setting be used to establish the 100 dBA limit. Because the impact of an equalizer on sound pressure level is content dependent, when the equalizer is engaged it is no longer possible to accurately determine the 85 dBA and 100 dBA thresholds based on FileDBA. To meet EN 50332 requirements, a system must account for the effect of EQ settings. This can be done by conservatively biasing WarningFaderPosition and MaxFaderPosition to ensure that in the presence of EQ the thresholds are never exceeded. Alternatively, manufacturers may choose to design the equalizer such that it will never boost the sound level at any frequency: to effect a boost, everything else is cut. An additional advantage of the latter method of EQ is that the system cannot overload before the volume control.
By following this rule, the portable audio device automatically complies with the maximum acoustic level of 100 dBA as specified in EN 60065, and any additional available headroom is used to improve normalization effectiveness. Note that NORM-L in this case must not be defeatable by the user, or the device would become illegal. In EU countries where device output has so far been limited to meet the law, old recordings and uncompressed genres like classical music can once again be played with adequate loudness.
Appendix 4: Loudness Analysis
When or where should the loudness analysis of the file take place? Ultimately this is a decision made by player manufacturers. There are at least four options:
by the record label or mastering house
at the point of sale (iTunes Store or other web store)
in the media server (iTunes in the context of Apple products for instance)
in the portable player itself.
Metadata from an unknown source cannot be trusted. So unless the source is secure (as with iTunes), we advise letting the portable player perform the analysis itself, as it only has to be performed once. Battery power consumption may be a reason to perform loudness analysis of content outside the player. Again, this is a decision ultimately made by manufacturers.
NOISE REDUCTION TECHNOLOGY
INTRODUCTION
The intelligibility of human speech plays an important part in communication. It is a measure of both comfort and comprehension.
The quality and intelligibility of speech are determined not only by the physical characteristics of the speech itself but also by the communication conditions and by the listener's ability to derive information from context, facial expressions and gestures.
When discussing intelligibility, it is important to understand the difference between real and recorded speech.
During a real conversation, a person can recognize the surrounding sounds and concentrate on the speech of another person, filtering the desired information out of a varied audio environment. This human ability to recognize and filter sounds significantly increases the intelligibility and comprehension of speech even when the conversation takes place in a noisy environment, situation or condition.
Listening to recorded speech is a different situation. Recording equipment does not focus on particular audio streams (unless it is a specialized shotgun microphone) but impartially records everything in the audio spectrum. As a result we receive a “flat picture” of all recorded sounds, which often leaves the speech unintelligible, quiet and buried in noise.
Speech recordings may also be indistinct or distorted due to technical limitations of the recording equipment, poorly placed or defective microphones, and the inherent difficulty of recording high-quality "clean" sound.
As audio recording technologies achieved wider use from the middle of the 20th century, the demand for audio processing and noise reduction increased correspondingly. Even now, when audio equipment has fewer limitations and allows for better quality, the need for noise suppression remains of the utmost importance, especially in the area of security and law enforcement.
Police departments, the military and national security services make extensive use of overt and covert recordings of speech communications, which can be a crucial element in investigations and intelligence operations. Sometimes an audio recording is the only evidence of a security threat or crime, and it may therefore become a key element in case analysis or a subsequent court trial. In these cases it is important for the speech to be clear and easily understandable so that no vital information is lost. Moreover, the intelligibility of audio evidence is a must for court proceedings; otherwise it may be ruled inadmissible.
Improving the intelligibility of a speech signal, reducing noise and compensating for distortions is the main task of noise reduction technology, which is currently available in various software and hardware products.
This research paper discusses the basics of noise reduction technology, its methods and its goals.
CLASSIFICATION OF AUDIO HINDRANCES
To understand the basics of noise reduction technology and successfully use its methods in practice, it is important to know what audio hindrances exist, how they differ and what their specific attributes are.
Generally, all audio hindrances are divided into two main categories: noises and distortions. If we consider the original human speech in a recording to be the useful signal, any additional information that decreases the quality of the useful signal is noise. Anything that changes the original useful signal itself is considered distortion.
Noises are mostly characterized in the time and frequency domains. In the time domain, noises can be:
Continuous: slowly changing noises, like the sound of an office, industrial equipment, wind, traffic, or the hiss of an old record or a bad phone line.
Discontinuous: repeated, usually tonal noises like honks, beeps or bells.
Pulse-like: abrupt, usually unharmonious and sometimes loud noises like clicks, footsteps, gunshots, bangs and thumps.
In the frequency domain, noises can be:
Broadband noises, which are present at many frequencies, like background hiss or fizzing sounds.
Narrowband noises, which occupy a set of specific frequencies as fairly stable tonal sine waves: drones, power-supply hums and equipment interference (drills, chainsaws, machinery engines).
Distortions are modifications of the useful speech signal that decrease its quality. When distortion occurs, parts of the speech signal, or the whole signal, are changed and can sometimes sound unacceptable.
Typical distortions at the acoustic level are reverberation and echo effects.
Distortions also occur when the acoustic signal (speech) transforms into electrical signal and meets various technical limitations like:
Filtration of the audio signal caused by poor frequency response (FR) of the recording equipment or communication channel.
Loss of the useful data caused by narrow dynamic range.
Overflow, which occurs when the amplitude of the acoustic signal is higher than the amplitude the microphone can process.
Total harmonic distortion: additional tones (harmonics) that mask real signal components and make the signal indistinct and incomprehensible.
Recording audio data in a compressed lossy format.
Generally, noise reduction technology helps to deal with these kinds of distortions; however, some types of distortion completely destroy the useful information, which then cannot be restored during further signal processing.
AUDIO HINDRANCES
NOISES
- Time domain
  - Continuous: street noise, industrial noise, traffic noise
  - Discontinuous: honks, beeps, bells
  - Pulse-like: clicks, clacks, gunshots etc.
- Frequency domain
  - Broadband: hiss, swish, fizz
  - Narrowband: drone, power-supply hum, sirens, whistle, equipment hindrances (drills, chainsaws, machinery engine noises)
DISTORTIONS
- Reverberation
- Overflow
- Poor frequency response
- Total harmonic distortions
- Compression
NOISE REDUCTION METHODS
The process of noise reduction touches on questions from different fields of science (digital signal processing, acoustics, psychoacoustics and physiology) and engineering (programming, hardware design etc.).
Its effectiveness depends on the correspondence between the method of processing and the type of audio interference. Each digital filtration method is most effective for a specific kind of noise. This is why it is necessary to know, at least generally, what kind of audio hindrance affected a recording in order to choose an appropriate processing method. An audio hindrance can be identified either by the specific sound of the noisy signal or by analysis of its spectrum and waveform.
However, various noises and distortions may sound similar, so the most popular way to identify an audio hindrance is analysis of the spectrum and waveform. As noise characteristics usually change over time, it is also necessary to use processing methods that adjust automatically to the noise characteristics.
Digital filtration algorithms that may adjust to a certain type of audio hindrance are called adaptive filtration algorithms.
SpeechPro Inc. extensively uses adaptive algorithms of a new generation in its hardware and software products:
Adaptive broadband filtration
Adaptive inverse filtration
Frequency compensation
Impulse filtration
Dynamic processing
Stereo processing
Adaptive broadband filtration is based upon an adaptive frequency algorithm designed to suppress broadband and periodic noises caused by electrical pick-ups or mechanical vibrations, room and street noise, and communication channel or recording equipment interference. You may hear these noises as hum, rumbling, hiss or roar. The broadband filtration method usually consists of two processing procedures: adaptive spectral noise subtraction, which enhances the speech, and adaptive background extraction, which separates the background acoustic environment from the useful signal. It is nearly impossible to remove such noises with other methods, such as one-channel adaptive filtration, spectrum smoothing or an equalizer, because the noises are spread across the whole spectrum and intersect with the speech signal.
Recorded conversation between two people on a noisy street
[Figure: spectrograms before reduction / after reduction]
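For illustration, here is a minimal sketch of spectral noise subtraction in Python with NumPy and SciPy. It is not SpeechPro's algorithm: the noise spectrum is naively estimated from the first half second of the recording, which is assumed to contain noise only:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_secs=0.5, floor=0.05):
    f, t, X = stft(x, fs, nperseg=1024)          # 50% overlap, hop = 512 samples
    n_frames = max(1, int(noise_secs * fs / 512))
    noise_mag = np.abs(X[:, :n_frames]).mean(axis=1, keepdims=True)
    # Subtract the noise magnitude, keeping a small spectral floor to avoid artifacts:
    mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    _, y = istft(mag * np.exp(1j * np.angle(X)), fs, nperseg=1024)
    return y
```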
The adaptive inverse filtration process is based upon the adaptive spectral correction algorithm, sometimes also called adaptive spectral smoothing.
Adaptive inverse filtration effectively suppresses strong periodic noises from electrical pick-ups or mechanical vibrations, recovering speech and equalizing the signal. It amplifies weaker signal components and suppresses stronger ones at the same time. The average spectrum therefore tends to approach a flat spectrum, enhancing the speech signal and improving its intelligibility. Broadband noises, however, usually become stronger, making signal perception less comfortable. This means a compromise must be found between noise reduction and comfortable speech perception.
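The following sketch shows the general idea of spectral smoothing (again, not SpeechPro's implementation); the strength parameter is my own device for expressing the compromise described above:

```python
import numpy as np
from scipy.signal import stft, istft

def inverse_filter(x, fs, strength=0.5, eps=1e-8):
    f, t, X = stft(x, fs, nperseg=1024)
    avg = np.abs(X).mean(axis=1, keepdims=True)    # long-term average spectrum
    gain = (avg.mean() / (avg + eps)) ** strength  # boost weak bins, cut strong ones
    _, y = istft(X * gain, fs, nperseg=1024)       # average spectrum moves toward flat
    return y
```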
The frequency compensation process uses the Widrow-Hoff adaptive filtering algorithm for one-channel adaptive compensation. It is most effective against narrowband stationary interference. The filter adjusts itself smoothly, maintaining good speech quality, and also provides adaptive compensation in the time domain.
Frequency compensation can remove both narrowband stationary interference and regular interference (vibrations, power-line pickups, electrical device noises, steady music, room, traffic and water noises, reverberation etc.). The main advantage of this process is its capability to preserve the speech signal much better than other filters usually do. Since audio interference may in some cases be removed only partially, the frequency compensation method can be applied more than once.
Power-line buzz masking the conversation between two people
[Figure: spectrograms before reduction / after reduction]
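As an illustration of one-channel Widrow-Hoff compensation, here is a minimal adaptive line enhancer: a delayed copy of the signal serves as the reference from which the periodic interference (hum, drone) is predicted and subtracted. The normalized step size and tap counts are illustrative assumptions, not SpeechPro's parameters:

```python
import numpy as np

def lms_hum_canceller(x, taps=64, delay=32, mu=0.05, eps=1e-8):
    w = np.zeros(taps)                              # adaptive FIR weights
    y = np.zeros_like(x)
    for n in range(taps + delay, len(x)):
        ref = x[n - delay - taps:n - delay][::-1]   # delayed reference window
        pred = w @ ref                              # predicted periodic component
        e = x[n] - pred                             # error = signal minus hum
        w += mu * e * ref / (ref @ ref + eps)       # normalized Widrow-Hoff update
        y[n] = e
    return y
```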
The adaptive impulse filter automatically restores speech or musical fragments distorted and masked by various pulse interferences such as clicks, radio noise, knocks, gunshots etc. Adaptive impulse filtering algorithms improve the quality of the signal by suppressing powerful signal impulses, thus unmasking the useful audio signal and increasing its intelligibility. During impulse filtration, detected impulses are substituted with a smoothed, weakened interpolated signal. If the algorithm does not detect an impulse, it leaves the fragment intact; it also does not suppress tonal interference or broadband noise. Impulse detection is based upon the differences between the useful signal and the interference, which the algorithm detects automatically.
Tapped phone conversation interfered with by another line's beeping
[Figure: spectrograms before reduction / after reduction]
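A crude sketch of the detect-and-interpolate idea (the threshold heuristic is mine, not SpeechPro's):

```python
import numpy as np

def declick(x, threshold=6.0):
    med = np.median(np.abs(x))                    # robust estimate of the normal level
    bad = np.abs(x) > threshold * max(med, 1e-8)  # samples flagged as impulses
    good = np.flatnonzero(~bad)
    y = x.copy()
    y[bad] = np.interp(np.flatnonzero(bad), good, x[good])  # bridge over the clicks
    return y
```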
Dynamic signal processing improves the intelligibility of speech when signal fragments differ greatly in level, as in the case of resonant knocks (i.e. long impulses) and room noises. Dynamic processing algorithms improve and unmask the audio signal by suppressing powerful impulses and clicks, reducing listener fatigue during long audio recordings.
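As an illustration of dynamic processing, here is a minimal envelope-following compressor; the threshold, ratio and time constants are arbitrary example values:

```python
import numpy as np

def compress(x, fs, threshold_db=-20.0, ratio=4.0, attack=0.005, release=0.1):
    a_att = np.exp(-1.0 / (attack * fs))      # fast smoothing while level rises
    a_rel = np.exp(-1.0 / (release * fs))     # slow smoothing while level falls
    env, out = 0.0, np.zeros_like(x)
    for n, s in enumerate(x):
        a = a_att if abs(s) > env else a_rel
        env = a * env + (1 - a) * abs(s)      # smoothed amplitude envelope
        over = max(0.0, 20 * np.log10(max(env, 1e-8)) - threshold_db)
        out[n] = s * 10 ** (-over * (1 - 1 / ratio) / 20)  # cut gain above threshold
    return out
```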
Stereo filtration is one of the latest innovations in the field of noise reduction technology. In some cases the problem of removing noise can be resolved with the help of dual-channel audio monitoring and subsequent dual-channel adaptive filtration (stereo filtration). This method, however, is more sensitive to the recording process and its quality because it requires the accurate use of two or more microphones. Two methods of stereo filtration are available: two-channel signal processing and adaptive stereo filtering. In the first case the sound in each channel is processed independently, while in the second case data acquired from one channel (the reference channel) is used to filter the signal in the other (the primary channel). Stereo filtration is the most effective way to handle an audio environment full of various hindrances. This method effectively reduces background music and crowd noise, enhancing the useful speech signal, and is ideal for recordings made in large rooms like halls, restaurants, theaters etc.
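And a minimal sketch of the second stereo method, adaptive filtering by reference channel: the noise-dominated reference channel is adaptively filtered and subtracted from the speech-bearing primary channel. A normalized LMS update is assumed; SpeechPro's actual algorithm is not public:

```python
import numpy as np

def stereo_cancel(primary, reference, taps=128, mu=0.1, eps=1e-8):
    w = np.zeros(taps)                      # adaptive filter on the reference channel
    out = np.zeros_like(primary)
    for n in range(taps, len(primary)):
        r = reference[n - taps:n][::-1]
        e = primary[n] - w @ r              # subtract the predicted noise
        w += mu * e * r / (r @ r + eps)     # normalized LMS update
        out[n] = e                          # residual = enhanced speech
    return out
```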
SpeechPro's product line includes the following solutions:
1. Expert systems. Among the key elements of SpeechPro's expert systems are a unique sound cleaning software application, which received first prize in the audio enhancement contest organized by the AES (Audio Engineering Society) in 2008, and our current professional real-time noise cancellation and speech enhancement software.
SpeechPro's expert systems have been highly rated by world-class experts in forensic audio analysis and have been adopted by law enforcement agencies throughout the USA, Europe and Latin America.
2. Automatic systems are mostly compact real-time noise filtering and speech enhancement devices that can be of great value to police, surveillance teams, private investigators, forensic labs and other law enforcement agencies. They can be used for real-time sound and speech quality improvement while recording or listening in field conditions. SpeechPro's hardware solutions may also interest audio engineers working in mobile processing of recorded audio and in broadcasting, for “live” mastering of interviews and reports.
Being mobile and compact, SpeechPro's hardware is effective against different noise sources: communication channel interference, office equipment, industrial and vehicle engines, street traffic, environmental noise, background music, hiss and rumbling, and reverberation and echo effects. It also provides original methods for stereo processing using reference-channel algorithms.
3. Research and development solutions in the area of noise reduction are mostly offered as cross-platform libraries, with automatic or manual algorithm adjustment and real-time or post-processing implementations for embedded systems and workstations.
SpeechPro's SDK noise reduction features:
Broadband Noise Filter/Canceller
Equalizer (EQ), Graphical EQ, Adaptive EQ, Parametric EQ
Dynamic Range Control, Sound Level Limiter
Automatic Gain Control
Level Control, Speech Level Enhancement
Punch & Crunch Dynamics Processing
Acoustic Shock Protection, Adaptive Shock Attenuator/Limiter [DSP-factory]
Harmonic Reject Filter : Adaptive & Fixed COMB
Hiss Filtration
CONCLUSION
Generally, noise reduction methods were developed to unmask the useful signal hidden within different types of audio hindrances.
The standard approach to noise reduction lies in removing unnecessary extraneous sound components as far as practical and returning distorted parameters to their typical values. The most common noise suppression goal is unmasking the useful signal: suppressing noisy signal components where the hindrances are powerful and the useful signal is weak, and intensifying components where the useful signal is at its maximum.
Thus the basic principles of noise reduction technologies are:
Unmasking the useful speech signal in the time and frequency domains, taking into consideration the psychoacoustic properties of human speech perception.
Removing different kinds of background noise to reduce listening fatigue.
Decreasing the frequency pass band of the signal and removing low-frequency drones and high-frequency hiss.
Smoothing high peaks and decreasing the audio signal amplitude in pauses without speech.
Removing, or decreasing the amplitude of, pulse-like interference and other intense extraneous sounds.
Removing regular, slowly changing hindrances such as music, traffic and industrial noises, and decreasing reverberation (echo effects).
Smoothing the signal spectrum.
Additional subtraction of narrowband interference.
Removing additive broadband noise (tape, radio, phone and microphone hiss).