Tag Archive for: audio

Acustica Audio have steadfastly adhered to their remit of providing genuine hardware topologies in software format, making ‘hardware’ available to all at eye-wateringly competitive prices. Their approach of using dynamic convolution to recreate hardware processors is both novel and potently effective, offering the user a wide range of genuinely comparable software equivalents. They have achieved this level of realism through a detailed and intensive sampling protocol, which entails running sine sweeps through mouth-watering hardware processors at various levels, with snapshots taken at various instances. The resulting responses are then deconvolved into impulses and convolved with the incoming audio to provide an eerily accurate representation of the hardware being multisampled. This form of advanced dynamic convolution has resulted in very accurate modelling of the time-dependent behaviour of non-linear analogue circuit components. The proprietary Volterra Series non-linear convolution technology, as Acustica like to call the ever-evolving protocol, has come on in leaps and bounds in recent years, and the result is the mouth-watering line of Acqua products. The product that best exemplifies Acustica’s ideology is Gold 2.

Billed as ‘the one-stop-shop solution for any lover of the sound of legendary vintage British consoles’, Gold takes the user on a journey through the various incarnations of the much-loved and legendary British classic topology – that of Neve!

Anyone with even the remotest smattering of hardware processor knowledge will recognise the brand name Neve. Synonymous with a big, coloured sound, Neve has been responsible for some of the great classics of our time, and the brand still dominates most genres. No company has captured the Neve sound and essence better than Acustica Audio with the release of their latest Acqua plugin, Gold 2.

GOLD 2 consists of 6 different EQs, 2 compressors, 9 solid-state preamp emulations, 6 tube preamp emulations, and a detailed and flexible routing control section (matrix). This count increases further when you take into account the number of permutations available via the routing matrix across all the modules.

The modules are listed as follows:

  • GOLD2 (channel strip incorporating all modules and the routing matrix):
    • 2 rack EQs with selectable models identical to those in the standalone EQ plug-in
    • Compressor section with the same 8052 and 8054 models and controls
    • 7 different preamp options
  • GOLD2 PRE (standalone preamp module)
  • GOLD2 EQ (standalone equalizer module)
  • GOLD2 COMP (standalone compressor module)

Each plug-in comes in a Standard version and an alternative ZL version which operates at zero latency – a convenience that comes at a cost, notably a higher CPU load.

The simple description above belies what is under the hood of this impressive plugin. Let me lift the hood and share with you the various Neve colours on offer and to do that truthfully I need to list the various modules that were painstakingly and lovingly sampled!

Make a cup of coffee, sit down and prepare yourself:

• 8066 EQ: Neve 1066
• 8112 EQ: Neve 33122
• 8093 EQ: EMI – Neve 1093 (shelf filters only; Neve 1081 with Marinair transformers)
• 8193 EQ: EMI – Neve 1093 (bell/peak mode; Neve 1081 with Marinair transformers)
• 8077 EQ: Rare Neve 1077 – mid-frequency bands are duplicated.
• H073 EQ: Homebrew 1073 clone – a rare and uniquely powerful EQ.
• 8052 Compressor: Neve 2252
• 8054 Compressor: Neve 2254
• A variety of tube preamps, including a Neve 9001
• A variety of Neve EQ/console/microphone preamps.

Gold 2 is an improvement on the original Gold in that it uses the latest Core 13 technology with the improvements listed below:

  • Upgraded SASM™ (Symmetric & Asymmetric Saturation Modeling) high-performance saturation algorithm
  • Introduction of a new post-production sample de-noising technology for cleaner deconvolved impulses called STT™ (Super Transient Technology)
  • Full compatibility with Client/Server architecture integrated by default in Acqua plugins
  • Engine optimization thanks to a new highly efficient algorithm. This innovative technique is applied to all the deconvolved impulses for further de-noising and subsequent elimination of any incorrect low-level behaviour (including the so-called “echo bug”)

Now that we have the specifications out of the way we can enjoy the ‘colour’ of Gold 2.

The manual includes detailed installation instructions, and once the software has been installed using the elegant and simple Aquarius*, you will be confronted with two versions of the plugin: Gold 2 and Gold 2 ZL. ZL denotes zero latency, which is very useful if you are tracking, albeit at the cost of a higher CPU load. In the event the full channel strip is not required, Acustica Audio have cleverly broken the modules down into separate plugins: the pre, EQ, and compressor are provided as standalone modules, along with their respective ZL versions.

The manual is something of a thesis with detailed explanations on how the various modules are activated and how they work in relation to the overall topology of the plugin. The presentation of the manual harks back to the golden days of bound literature and this in itself speaks volumes about how the company perceives their products. But don’t think for one minute that this is all about pretty pictures and italic text.  For the tech purists like me, the manual affords a wealth of useful information in the form of graphs and tables.

Gold 2 is cleverly thought out, with particular attention given to the routing matrix. Traditional hardware channel strips worked in a linear, left-to-right fashion, with the pre residing in the first slot, followed by filtering and equalisation, and topped off with compression and limiting. Acustica, however, give us the flexibility of customising the strip to suit the mix project’s requirements, and they do so with a mouth-watering gold knob that switches between the various routings on offer. I know it sounds silly, but for us producers the visual aspect of a plugin is as important as its function and performance, and Acustica never fail in this department. This level of versatility allows the user to chain modules in a predetermined manner, affording huge scope for both mixing and sound design chores. Each module is offered with various topologies that are available at the flick of a switch, and when you factor in the extensive routing options on offer you can see, quite quickly, why this plugin is more than just a single-function dynamic tool.

For my tests, I used one RnB vocal line, one Hip Hop drum beat and a busy EDM take with staccato synth builds, so as to afford me a wide range of frequencies and responses to test with. These particular choices also allowed me to test short static transients against longer, sustained, evolving transients – how a compressor detects and captures peak transients is as important as its overall behaviour. Each take was run through the various topologies on offer, and the results, on the whole, were as I expected. However, it is always difficult to test against early Neve topologies, in that each release version varied in design and featured components. To take a working example: the Neve 2254 compressor combined an ‘active gain-control’ section with classic Neve modular discrete Class A gain stages and transformer-coupled circuits. This afforded the user the classic Neve sound. However, each revision altered the colour ever so slightly, and to the purists, the ‘correct’ version had to be used to ‘best represent’ the Neve sound. Which version that is remains in debate. With this in mind, you can appreciate how hard it is to match a colour exactly to the original. Acustica have made a bold attempt at sampling the Neve colour across the various topologies, and they have done so admirably. Ultimately, this is all that matters to the end-user: the major Neve colours are represented in a single product that houses all the major topologies, easily and instantly accessible.

Using the two compressor designs – the 2252 and the 2254 – I knew pretty much what to expect when processing the Hip Hop drum beat. The 2252 suffered from its original design: it was Neve’s first diode-bridge compressor and employed germanium transistors in the output amplifier, which resulted in high levels of distortion. Although it contributed greatly to the 1960s rock sound, it was on the whole ignored by the broadcasting industry, which was the market Neve aimed for. The 2254, however, is a different kettle of fish, and the design I used for most of the testing. The smoothness and depth were instantly obvious, with no inharmonic colouration; the distortions, as expected, were both harmonic and manageable. The vocal take also benefited from the smooth and fluid compression that the 2254 and 2252 offer, and the processing added a certain texture that is evident when using these particular Neve topologies. Busy, transient-rich material like the synth staccatos suffered a little from lack of peak detection, but that is expected, as the two-stage dynamic feedback design of the 2254 affords a smoother and more musical result. This was not a showstopper, however, as the plethora of Neve topologies on offer meant I had access to all manner of dynamic tools to shape and hone the takes, which more than compensated for any design compromises.

The Neve EQs are a joy to use, and Acustica have come extremely close to capturing the Neve flavour across the various modules. I didn’t test for time-constant behaviour or phase anomalies, as I would expect those to be in line with Acustica’s stringent sampling protocols; and to be honest, such tests would conjure up wildly variable results, as once again we would have to take into account the various versions of each module that populated the hardware markets of yesteryear. The smoothness of colour across the various EQ modules sounded, to my ears, a very close match to the originals. Neve EQs are very musical, and the differences between module topologies are night and day to the purist. I had a great deal of fun playing with the various EQs on offer, and the whole experience had me harking back to the glory days of expensive, temperamental hardware – albeit affordable and controlled this time.

Gold 2 met all my expectations, and it stood up to all manner of bludgeoning tasks across the variety of audio material I used to test the beast. The real power of this plugin rears its head when you start to combine processes using the routing matrix. A simple switch from one topology to another threw a different colour at me, and that is all you can ask of a product that claims to have captured the Neve sound.

I cannot stress enough how good Gold 2 is. Sure, it might be a little quirky in its routing behaviour, but that is also its strength, and to be honest, the results outweigh any purist reservations about the authenticity of the colours, bearing in mind the sheer number of versions that exist of each topology.

Acustica Audio have captured the essence of the Neve sound and presented it as an elegant and versatile solution in Gold 2.

Buy it!

*Aquarius is a brand new assistant application by Acustica Audio which offers a quick and easy way to download, install, update and authorize your products.

Samplecraze and Acustica Audio have joined forces to bring you a great discount package.

Click here for more info!

This subject has done the rounds for years.

Often it is the cash-strapped home studio owner who has to resort to using headphones, the cheaper and space-saving solution, instead of speakers for mixing projects. There are obvious advantages to using headphones for mixing, but glaring disadvantages too. There are no winners here on either side of the fence. Quite simply, if you want to be fully armed to conduct the best mixes, then a combination of both is essential.

Good-quality headphones can reveal detail that some good speakers/monitors omit. For sound design, a good pair of headphones is imperative, as they are unforgiving in revealing anomalies; for maintaining a clean and noise-free signal path, they are crucial. On the flip side, stereo imaging and panning information is much harder to judge on headphones. The spatial feel of a mix is almost impossible to convey on headphones, but simple with speakers. Pans are pronounced and extreme on headphones and do not translate well when played back on speakers. Even EQ can come across as subdued or extreme.

I find that if I mix on headphones alone, then the mix never travels well when auditioned with monitors. The reverse is also true.

When using monitors, and because the monitors are placed in front of us, our natural hearing perceives the soundstage as directly in front of us. With headphones, because the ‘speakers’ are on either side of us, there is no real front-to-back information. Headphones also provide a very high degree of separation between the left and right channels, which produces an artificially detailed stereo image. Our brains and ears receive and process sound completely differently when using headphones as opposed to monitors: with headphones, each ear hears only the audio signal carried on the relevant channel, whereas with speakers, both ears hear the signals produced by both loudspeakers.

You also need to factor in that different people perceive different amounts of bass – factors such as the distance between the headphone diaphragm and the listener’s ear will change the level of bass. The way in which the headphone cushion seals around the ear also plays a part, which is why pushing the phones closer to your ears produces a noticeable increase in bass. This increase in bass energy alone negates the idea of a correct tonal balance in the mix being auditioned.

With monitors, both ears hear both the left and right channels.

If your room is acoustically problematic and you have poor monitors, then headphones may well be a better and more reliable approach. But it is a lot harder to achieve the same kind of quality and transferability that comes more naturally on good monitors in a good acoustically treated room.

I find that if I record and check all my signals with headphones, then I am in a strong position to hear any anomalies and better placed to judge the clarity and integrity of the recorded signals. This, coupled with speaker monitoring, assures me of the best of both worlds: clarity and integrity married with spatial imaging.

If you want further reading on this subject then I recommend Martin Walker’s seminal article here entitled: Mixing On Headphones.

Noise Gate does exactly what it sounds like.

It acts as a gate, opening when a threshold is reached and then closing at whatever release speed you set – basically acting as an on/off switch.
It reduces gain when the input level falls below the set threshold; that is, when an instrument or audio stops playing, or reaches a gap where the level drops, the noise gate kicks in and reduces the volume of the file.

Generally speaking, noise gates will have the following controls:

Threshold: the gate will ‘open’ once the threshold has been reached. The threshold will have varying ranges (e.g. -60dB to infinity) and is represented in dB (decibels). Once set, the gate opens the instant the signal reaches the threshold.

Attack: this determines the speed at which the gate opens, much like a compressor’s attack, and is usually measured in ms (milliseconds) or subdivisions thereof. This is a useful feature, as the speed of the gate’s attack can completely change the tonal colour of a sound once gated.

Hold: this function keeps the gate open for the specified duration, and is measured in ms or seconds. Very useful, particularly when passages of audio need to be ‘let through’.

Decay or release: this function determines how quickly the gate closes, whether instantly or gradually over time. A crucial feature, as not all sounds have an abrupt end (think pads etc).

Side-chaining (key input): some gates (in fact most) also have a side-chain function that allows an external audio signal to control the gate.

When the side-chain signal exceeds the threshold, a control signal is generated to open the gate at a rate set by the attack control. When the signal falls below the threshold, the gate closes according to the settings of the hold and release controls. Clever uses for the key input (side-chaining) are ducking and the repeat-gate effects used in Dance genres. The repeated-gate effect (or stuttering) is attained by feeding a hi-hat pattern to the key input to trigger the gate open and closed. By running a pad sound through the gate with the hi-hat pattern at the key input, you can achieve the famous stuttering effect used so much in Dance music.

Ducking: some gates include a ‘ducking’ mode, whereby one signal drops in level when another starts or is playing. The controlling signal is sent to the key input (side-chain), and the gate’s attack and release times set the rate at which the level of the main signal changes in response. A popular use for ducking is in broadcasting, where the DJ needs the music to drop so he/she can be heard when speaking (with the voice feeding the key input, the music dips in volume whenever the DJ talks).
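To make the interplay of these controls concrete, here is a minimal sketch of the gate logic described above (Python with NumPy; the parameter names and smoothing scheme are my own illustrative choices, not any particular plugin’s):

```python
# A minimal noise gate: threshold, attack, hold and release.
import numpy as np

def noise_gate(x, sr, threshold_db=-40.0, attack_ms=1.0,
               hold_ms=50.0, release_ms=100.0):
    threshold = 10 ** (threshold_db / 20.0)          # dB -> linear
    att = np.exp(-1.0 / (sr * attack_ms / 1000.0))   # attack smoothing coeff
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))  # release smoothing coeff
    hold_samples = int(sr * hold_ms / 1000.0)
    gain, hold_count = 0.0, 0
    out = np.zeros(len(x))
    for i, s in enumerate(x):
        if abs(s) > threshold:       # above threshold: open the gate
            target, hold_count = 1.0, hold_samples
        elif hold_count > 0:         # keep it open for the hold period
            target = 1.0
            hold_count -= 1
        else:                        # below threshold: close the gate
            target = 0.0
        coeff = att if target > gain else rel
        gain = target + coeff * (gain - target)      # one-pole gain smoothing
        out[i] = s * gain
    return out
```

Feeding a different signal (a hi-hat pattern, say) into the threshold test instead of `x` itself is all it takes to turn this into the key-input/side-chain version described above.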

However, side-chaining (key input) and ducking are not all the gate is good for.

The most common use for a gate, certainly in the old days of analogue consoles and tape machines, was to remove ‘noise’. By selecting a threshold just above the noise level, the gate would open to allow audio through above the threshold and close when required. This meant that certain frequencies and levels of noise were ‘gated’ out, leaving the audio passage cleaner.

BUT it doesn’t end there. There are so many uses for a noise gate: using an EQ unit as the key input for shaping audio lines and curing false triggers, ducking in commentary situations (still used today), creative sonic-mangling tasks (much like the repeat gate), and so on.

With today’s software-based gates we are afforded a ton of new and interesting features that make the gate more than a simple ‘noise’ gate.

Experiment and enjoy chaining effects and dynamics in series and make sure to throw a gate in there somewhere for some manic textures.

If you prefer the visual approach then try this video tutorial:

Noise Gate – What is it and how does it work

In essence, noise is a randomly changing, chaotic signal, containing an endless number of sine waves of all possible frequencies with different amplitudes. However, randomness will always have specific statistical properties. These will give the noise its specific character or timbre.

If the sine waves’ amplitude is uniform, which means every frequency has the same volume, the noise sounds very bright. This type of noise is called white noise.

White noise is a signal with the property of having constant energy per Hz of bandwidth (a flat power spectral density), and so has a flat frequency response; because of these properties, white noise is well suited to testing audio equipment. The human hearing system’s frequency response is not linear but logarithmic. In other words, we judge pitch increases by octaves, not by equal increments of frequency; each successive octave spans twice as many Hertz as the previous one down the scale. This means that when we listen to white noise, it appears to increase in level by 3dB per octave.

If the level of the sine waves decreases by about 3dB per octave as their frequencies rise, the noise sounds much warmer. This is called pink noise.

Pink noise contains equal energy per octave (or per 1/3 octave). Its power spectral density follows the function 1/f, which corresponds to the level falling by 3dB per octave. These attributes lend themselves perfectly to acoustic measurements.

If it decreases by about 6dB per octave, we call it brown noise.

Brown noise, whose name is actually derived from Brownian motion, is similar to pink noise except that its power spectral density follows 1/f². This produces a 6dB-per-octave attenuation.

Blue noise is essentially the inverse of pink noise, with its level increasing by 3dB per octave (power spectral density proportional to f).

Violet noise is the inverse of brown noise, with a rising response of 6dB per octave (power spectral density proportional to f²).
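For the curious, these slopes are easy to impose on white noise by tilting its spectrum. A minimal Python sketch (NumPy assumed; the function name and normalisation are my own illustrative choices):

```python
# Shape white noise into pink, brown, blue or violet noise by tilting
# its spectrum: 0 dB/oct = white, -3 = pink, -6 = brown, +3 = blue, +6 = violet.
import numpy as np

def coloured_noise(n, slope_db_per_octave=0.0, seed=0):
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n, d=1.0)
    freqs[0] = freqs[1]                         # avoid divide-by-zero at DC
    # an amplitude gain of f**(slope/6.02) gives 'slope' dB per octave
    gain = freqs ** (slope_db_per_octave / 6.0206)
    shaped = np.fft.irfft(spectrum * gain, n)
    return shaped / np.max(np.abs(shaped))      # normalise to +/- 1

pink = coloured_noise(44100, -3.0)    # equal energy per octave
brown = coloured_noise(44100, -6.0)   # darker still
```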

So we have all these funky names for noise, and now that you understand their characteristics, what are they used for?

White noise is used in synthesizing hi-hats, crashes, cymbals and the like, and is even used to test certain generators.

Pink noise is great for synthesizing ocean waves and the warmer type of ethereal pads.

Brown noise is cool for synthesizing thunderous sounds and deep, bursting claps. Of course, they can all be used in varying ways to attain different textures and results, but the idea is simply for you to get a sense of what they ‘sound’ like.

At the end of the day, it all boils down to maths and physics.


Here is an article I wrote for Sound On Sound magazine on how to use Pink noise referencing for mixing.

And here is the link to the video I created on master bus mixing with Pink noise.

And here is another video tutorial on how to use ripped profiles and Pink noise to mix.

Jitter is the timing variation in the sample-rate clock of a digital process. It would be wonderful to believe that a sample rate of 44.1 kHz is an exact science, whereby the process samples at exactly 44,100 cycles per second. Unfortunately, this isn’t always the case. The speed at which this process takes place falters and varies, and we get a ‘wobbling’ of the clock as it tries to keep up at these frequencies. This is called jitter. Jitter can cause all sorts of problems, and the simplest way to think of it is: the lower the jitter, the better the audio representation. This is why we sometimes use better clocks and slave our sound cards to them – to eradicate or diminish jitter and the effects it causes.

Jitter is a variation in the timing of the sampling instants when audio is converted to or from the digital domain. If the conversion process suffers from any timing anomaly, the resulting signal amplitude will differ from its true value. The usual side effects are an increase in high-frequency noise and clicks; in the worst case, the audio mutes or stops working altogether. In simple terms, the clicks are caused when one of the digital devices looks for an incoming audio ‘sample’ but fails to find it because it is looking at the wrong time ‘frame’ (instance). Apart from these anomalies, the real-world audible effect is that the stereo imaging is compromised, leading to a flat stereo image as opposed to one with depth and width.
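To put a rough number on that amplitude error: the steepest slope of a sine wave of amplitude A and frequency f is 2πfA, so a timing error of Δt seconds shifts a sampled value by up to 2πfA·Δt. A quick Python sketch with illustrative numbers:

```python
# Worst-case sample-amplitude error caused by clock jitter on a sine wave:
# error ~= 2*pi*f*A*dt (the slope of the sine times the timing error).
import math

f = 10_000        # tone frequency in Hz
A = 1.0           # full-scale amplitude
jitter = 1e-9     # 1 ns of clock jitter

error = 2 * math.pi * f * A * jitter
print(f"worst-case sample error: {error:.2e}")                    # ~6.3e-05
print(f"as dB below full scale: {20 * math.log10(error):.1f} dB") # ~ -84 dB
```

Even a single nanosecond of jitter puts the error floor for a 10kHz tone at roughly -84dB, which is why clock quality matters more as sample rates and bit depths rise.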

Jitter affects the stability of the sample clock: the lower the jitter figure, the more stable the clock and the better the performance.

When using more than one digital device, it is best to interface and synchronize the source and destination devices using clock synchronization.

Most of today’s digital systems have an embedded clock at source that can be used to synchronize the two devices. In more sophisticated systems, like DAWs, digital consoles, higher-end sound cards and so on, there will be some form of control panel where the desired clock source can be selected. The most common selections are digital input, external word clock, and the internal clock. The selection comes down to system configuration and project choice. What is a given, however, is that all digital devices must be synchronized.

Using the internal clock ensures stability, as the clock rate is known, but all devices must then be synchronized to the internal clock’s rate. Alternatively – and a common choice amongst most studios – a dedicated external clock can be used. This affords a universal, global rate to which all devices can be synchronized; more importantly, a dedicated master clock has one function, and that alone can alleviate system-configuration problems. The only problem with this scenario is that most consumer systems do not accommodate slaving to external clocks, so the internal clock will have to be the master clock source.

At the end of the day, it comes down to knowledge and experience, and ignoring the benefits of a good clock source in a digitally configured system is the equivalent of running top-end processors through a budget Radio Shack two-channel DJ mixer.

When dealing with events, as we do with cycles for example, we are concerned with two factors: frequency (f) and time (T). If we look at a single event, then T is defined as the time from the start to the end of that event, and that amount is measured as a period.
When dealing with a waveform cycle, the time it takes for the cycle to return to its starting position is defined as its periodicity. Taking this a step further, frequency is then defined as the number of events that occur over a specified time, and this is illustrated with the following equation:

f = 1/T

We measure the period in seconds (s), and frequency in cycles per second. The SI unit for one cycle per second is the Hertz (Hz). We tend to express anything above 1000 Hz in kHz (kilohertz), and if dealing with cycles of shorter duration than one second we use ms (milliseconds: 1/1000th of a second). This is a huge advantage when it comes to measuring microphone distances from sources and trying to correct alignments.
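As a quick sketch of that arithmetic (Python, with illustrative values):

```python
# The frequency/period relationship f = 1/T, plus the 'about 1 foot per
# millisecond' rule of thumb for working out microphone distances.
frequency = 1000.0                  # Hz (cycles per second)
period = 1.0 / frequency            # seconds per cycle
print(f"{frequency:.0f} Hz -> period of {period * 1000:.2f} ms")  # 1.00 ms

speed_of_sound = 1130.0             # feet per second (approximate)
print(f"one cycle spans {speed_of_sound * period:.2f} ft")        # 1.13 ft
```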

Relevant content:

Total and Partial Phase cancellation

If you consider the dynamic range of varying bit depths, with 1 bit being roughly equivalent to 6dB of dynamic range, then it makes sense that the higher the bit depth, the higher the dynamic range. At a bit depth of 24, the dynamic range is (theoretically) 144dB. Bearing in mind that our hearing does not even come close to a 144dB range, it makes sense to use a dynamic range beyond that of our hearing, for the very simple reason that audio captured at this resolution extends below our hearing’s minimum threshold and above its maximum.

To accommodate internal processing within a digital system, much more headroom is required, for the simple reason that processing requires additional bits. Add two or more 24-bit numbers together and it is obvious that more bits are needed. Dynamic processing, by its very nature, requires higher bit counts, as the process itself generates extra bits (or fractions thereof) that need managing, otherwise there will be sonic compromises.

The 32-bit system handles these processes well and has become a minimum standard. Of course, we now have even higher-bit internal processing.

Fixed-point systems use the 32 bits in the standard way, and the maths is simply a scale that provides a dynamic range of 192dB (32 × 6). The usual procedure is to keep the 24-bit signal working close to the top of the 32-bit processing range. This makes complete sense, as it provides more headroom and a lower noise floor.

Floating point still uses the 32-bit system but arranges the bits differently. The signal is still kept at 24-bit, but the remaining bits are allocated to a scaling factor (the exponent). This means the 24 bits can be used in a far more flexible and dynamic manner, allowing for a massive dynamic range. This equates to a near-endless scale of headroom and a noise floor so low as to be negligible.
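A quick sketch of the ‘roughly 6dB per bit’ rule used above (the 6.02 figure is 20·log10(2)):

```python
# Theoretical dynamic range of a fixed-point system: ~6.02 dB per bit.
import math

def dynamic_range_db(bits):
    return bits * 20 * math.log10(2)    # ~6.02 dB per bit

for bits in (16, 24, 32):
    print(f"{bits}-bit: ~{dynamic_range_db(bits):.0f} dB")
# 16-bit: ~96 dB, 24-bit: ~144 dB, 32-bit fixed point: ~193 dB
```

(The article’s 192dB figure comes from rounding to a flat 6dB per bit; floating point sidesteps the whole calculation by moving the 24-bit window up and down with its scaling factor.)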

Relevant content:

Digital Audio – Understanding and Processing

Jitter in Digital Systems

Dither – What is it and how does it work?

Understanding how sound travels in a given space is critical when setting up speakers in your studio.

Sound Waves  

Let us have a very brief look at how sound travels, and how we measure its effectiveness.  

Sound travels at approximately 1130 feet per second (about 1 foot per ms).
By the way, this figure is a real help when setting up microphones and working out phase values.

Now let us take a frequency travel scenario and try to explain its movement in a room.

For argument’s sake, let’s look at a bass frequency of 60 Hz.

When emitting sound, the speakers will vibrate at a rate of 60 times per second. In each cycle (Hz), the speaker cones extend forward when transmitting the sound, and retract back (creating a rarefaction) when recoiling for the next cycle.

These vibrations create peaks on the forward drive and troughs on the retraction. Each peak-and-trough pair equates to one cycle.

Imagine 60 of these cycles every second.

We can now calculate the wave cycles of this 60 Hz wave. We know that sound travels at approximately 1130 feet per second, so we can calculate how long each wave cycle is for the 60 Hz wave. We divide 1130 by 60, and the result is around 19 feet (18.83 if you want to be anal about it). We can now deduce that each wave cycle is about 19 feet long. To calculate each half-cycle, i.e. the distance between the peak and trough, drive and rarefaction, we simply divide by two. We now have a figure of 9½ feet. What that tells us is that if you sat anywhere up to 9½ feet from your speakers, the sound would fly past you completely flat.
However, this is assuming you have no boundaries of any sort in the room, i.e. no walls or ceiling. As we know that to be utter rubbish, we then need to factor in the boundaries.
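Here is the same arithmetic as a tiny Python sketch (note the half-cycle comes out at about 9.4 ft if you don’t round the 19):

```python
# Wavelength arithmetic: cycle length = speed of sound / frequency.
speed_of_sound = 1130.0              # feet per second (approximate)
frequency = 60.0                     # Hz

wavelength = speed_of_sound / frequency
print(f"full cycle: {wavelength:.2f} ft")      # ~18.83 ft
print(f"half cycle: {wavelength / 2:.2f} ft")  # ~9.42 ft (peak to trough)
```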

These boundaries reflect the sound from the speakers back into the room, where it mixes with the original source sound. And that is not all that happens: the reflected sounds can arrive from different angles and, because of their ‘bouncing’ nature, at different times to other waves. And because the reflected sound gets mixed with the source sound, the actual volume of the combined wave is louder.

In certain parts of the room, the reflected sound will be amplified because a peak might meet another peak (constructive interference), while in other parts of the room, where a peak meets a trough (rarefaction), frequencies are cancelled out (destructive interference).

Calculating what happens where is a nightmare.
This is why it is crucial for our ears to hear the sound from the speakers arrive before the reflective sounds. For argument’s sake, I will call this sound ‘primary’ or ‘leading’, and the reflective sound ‘secondary’ or ‘following’.

Our brains have the uncanny ability, thanks to the Haas effect, to both prioritize and localize the primary sound, but only if the secondary sounds are low in amplitude. So, by eliminating as many of the secondary (reflective) sounds as possible, we leave the brain with the primary sound to deal with. This allows for more accurate localization of the sound, and a better representation of the frequency content.

But is this what we really want?

I ask this because the secondary sound is also important in a ‘real’ space and goes to form the tonality of the sound being heard. Words like rich, tight, full etc. all come from secondary sounds (reflected). So, we don’t want to completely remove them, as this would then give us a clinically dead space. We want to keep certain secondary sounds and only diminish the ones that really interfere with the sound.

Our brains also have the ability to filter or ignore unwanted frequencies. In the event that the brain is bombarded with too many reflections, it will have a problem localizing the sounds, so it decides to ignore, or suppress, them.

The best example of this is when there is a lot of noise about you, like in a room or a bar, and you are trying to have a conversation with someone. The brain can ignore the rest of the noise and focus on ‘hearing’ the conversation you are trying to have. I am sure you have experienced this in public places, parties, clubs, football matches etc. To carry that over to our real-world situation of a home studio, we need to understand that reflective surfaces will create major problems, and the most common of these reflective culprits are walls. However, there is a way of overcoming this, assuming the room is not excessively reflective and is the standard bedroom/living room type of space with carpet and curtains.

We overcome this with clever speaker placement and listening position – and before you go thinking that this is just an idea not based on any scientific foundation, think again. The idea is to have the primary sound arrive at our ears before the secondary sound. Walls are the worst culprits, but because we know that sound travels at a given speed, we can make sure that the primary sound reaches our ears before the secondary sound does. By doing this, and thanks to the Haas effect, our brains will prioritize the primary sound and suppress the secondary sound (if it is low in amplitude), which gives the desired result, albeit not perfectly.

A room affects the sound of a speaker by the reflections it causes. Some frequencies will be reinforced, others suppressed, thus altering the character of the sound. We know that solid surfaces will reflect and porous surfaces will absorb, but this is all highly reliant on the materials being used. Curtains and carpets will absorb certain frequencies, but not all, so it can sometimes be more damaging than productive. For this, we need to understand the surfaces that exist in the room. In our home studio scenario, we are assuming that a carpet and curtains, plus the odd sofa etc, are all that are in the room. We are not dealing with a steel factory floor studio.

In any listening environment, what we hear is a result of a mixture of both the primary and secondary (reflected) sounds. We know this to be true and our sound field will be a combination of both. In general, the primary sound, from the speakers, is responsible for the image, while the secondary sounds contribute to the tonality of the received sound. 

The trick is to place the speakers in a location that takes advantage of the desirable reflections while diminishing the unwanted ones. ‘Planning’ your room is as important as any piece of gear. Get the sound right and you will have a huge advantage. Get it wrong and you’re in the land of lost engineers.

Relevant content:

Sinusoidal Creation and Simple Harmonic Motion

Frequency and Period of Sound

Total and Partial Phase cancellation

The first premise to understand is that simple harmonic motion through time generates sinusoidal motion.

The following diagram will display the amplitude of the harmonic motion and for this we need to use the term A in our formula. We will also be using θ.

I have used the equation y = A sin θ, where θ completes one cycle (in degrees).
The y-axis displays values based on a unit circle, with y interpreted as amplitude.
The x-axis denotes degrees (θ).

It then follows that:

When the angle θ is 0° or 180° then y = 0
sin 0° and sin 180° = y/A = 0

When the angle θ is 90° then y = 1
sin 90° = y/A = 1

When the angle θ is 270° then y = −1
sin 270° = y/A = −1

When constructing and working with sinusoids we need to plot our graph and define the axis.

I have chosen the y-axis for amplitude and the x-axis for time, with phase expressed in degrees. I will define the formulae behind these variables later, when we come to expressing the processes.

For now, a simple sine waveform, using each axis and defining them, will be enough.

I will create the y-axis as amplitude with a range that is set from -1 to +1.
y: amplitude

Now to create the x-axis and define its variables to display across the axis.

The range will be from -90 deg to 360 deg
x: time/phase/deg

The following diagram displays the axes plus the waveform. The simplest formula to create a sinusoid is y = sin(x).

The diagram shows one cycle of the waveform starting at 0, peaking at +1 (positive), dropping to the 0 axis and then down to -1 (negative).

The phase values are expressed in degrees and lie on the x-axis. A cycle, sometimes referred to as a period, of a sine wave is a total motion across all the phase values.

I will now copy the same sine wave and apply a phase offset (phase shift, or phase angle) so you can see the phase values. To do this we need another simple formula:
y = sin(x − t), where t (the time/phase value), being a constant, will for now have a value of 0. This allows me to shift by any number of degrees to display the phase relationships between the two sine waves.

The shift value is then set to 90, which denotes a phase shift of 90 degrees. In essence, the two waveforms are now 90 degrees out of phase.

The next step is to phase shift by 180 degrees, and this results in total phase cancellation. The two waveforms, when played and summed together, produce silence, as each peak cancels out each trough.
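A minimal Python sketch (NumPy assumed) of the two phase shifts described above, confirming that the 180-degree case sums to silence:

```python
# y = sin(x), plus copies shifted by 90 and 180 degrees.
import numpy as np

x = np.linspace(0, 360, 361)          # phase axis in degrees
y = np.sin(np.radians(x))             # y = sin(x)
y90 = np.sin(np.radians(x - 90))      # 90 degrees out of phase
y180 = np.sin(np.radians(x - 180))    # 180 degrees out of phase

summed = y + y180
print(np.max(np.abs(summed)))         # ~0: total phase cancellation
```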

Relevant content:

Frequency and Period of Sound

Total and Partial Phase cancellation

Digital Audio – Understanding and Processing