26. TDOA¶

Time Difference of Arrival (TDOA) is a technique that can find the position of a transmitter (a.k.a. emitter) using multiple synchronized receivers (a.k.a. sensors), by comparing differences in signal arrival time. This chapter covers the full TDOA pipeline: geometry, GCC-PHAT time-delay estimation, closed-form and maximum-likelihood localization, accuracy bounds (CRLB and GDOP), and challenges like synchronization and multipath. TDOA is commonly used in both RF and acoustic/sonar applications.

Before diving in, try playing with the interactive demo below to get a quick feel for how TDOA works, which involves the intersection of hyperbolas.

Introduction¶

A common problem across RF and acoustics/sonar is the desire to find the position of an emitter, also known as the process of geolocation. The emitter may be cooperative (a cell phone trying to be found) or non-cooperative (a radar emitter that would rather not be), stationary or moving, and the medium may be air, water, or free space. TDOA-based localization appears in cellular emergency-caller location, acoustics with microphone arrays (e.g., gunshot-detection systems mounted on city streetlights), passive sonar, passive (non-emitting) radar, electronic warfare and signals intelligence, and even wildlife tracking. In each case the engineering details differ, but the mathematical skeleton is the same.

The key behind TDOA is that when the same wavefront hits two sensors, the difference in arrival times depends only on geometry, not on when the emitter transmitted. To see why, consider the propagation time from an emitter to sensor \(i\): \(t_i = t_0 + r_i / c\), where \(t_0\) is the (unknown) start of transmission, \(r_i\) is the emitter-to-sensor distance, and \(c\) is the propagation speed. If we subtract the arrival times at two sensors,

\[\tau_{ij} = t_i - t_j = \frac{r_i - r_j}{c},\]

the unknown \(t_0\) vanishes, which is good because we will likely never know \(t_0\). The TDOA depends only on the difference of ranges, which depends only on emitter and sensor geometry. This single fact is why TDOA dominates for non-cooperative emitters; we never need to know when the emitter transmitted, only that the same wavefront reached our synchronized receivers at measurable relative delays. You still need to isolate the signal so that you’re only observing one emitter, so signal detection and classification may be necessary, plus filtering.

Each pair of sensors yields one TDOA, and each TDOA traces out one hyperbola, so the number of hyperbolas we can draw is just the number of sensor pairs. With \(N\) sensors that is

\[\binom{N}{2} = \frac{N(N-1)}{2},\]

i.e. 3 sensors give 3 hyperbolas, 4 give 6, 5 give 10, and so on. Not all of these are independent, as we will see below, only \(N-1\) carry new geometric information, but the full set is still useful for averaging out noise.

The price we pay is that the receivers must share a precise common time reference, a requirement that, as discussed later, is itself a demanding engineering problem because a timing error of one nanosecond corresponds to about 1 foot or 0.3 meters of range error, at least in RF applications.

TDOA Geometry¶

From Time Difference to Range Difference¶

Multiplying a measured time difference by the propagation speed converts it into a range difference:

\[\Delta r_{ij} = c\,\tau_{ij} = r_i - r_j .\]

For acoustic problems \(c \approx 343\) m/s in air; for radio problems \(c \approx 2.998\times10^8\) m/s. Note immediately the consequence for accuracy: in air, a \(0.1\) ms timing error is only \(\sim\)3 cm, whereas in free space the same timing error is 30 km. Radio TDOA therefore demands extraordinarily precise timing, a theme we return to repeatedly.

The diagram below shows an example of an emitter and three sensors, with a time domain plot of the signal being received by each sensor at different times.

An emitter and three sensors, with a time domain plot of the signal being received by each sensor at different times.

The Hyperbola¶

Now think about what a single range difference actually tells us. Suppose we have two sensors, and we have measured that the emitter is, say, 100 meters closer to one sensor than the other. Where could the emitter be? Not at a single spot, it turns out, but anywhere along a curved line. As you slide along that line, the two distances to the sensors both change, but their difference stays fixed at 100 meters the whole way.

That curve has a name: it is a hyperbola, with the two sensors sitting at its two focal points. (In 3D the same idea sweeps out a curved surface called a hyperboloid, but the 2D picture is easier to reason about and everything carries over). If we write the emitter position as \(\mathbf{u}\) and the two sensor positions as \(\mathbf{s}_i\) and \(\mathbf{s}_j\), the hyperbola is just the set of points obeying

\[|\mathbf{u}-\mathbf{s}_i| - |\mathbf{u}-\mathbf{s}_j| = \Delta r_{ij} = \text{constant},\]

which reads “distance to one sensor minus distance to the other equals our measured range difference”. A few practical consequences fall right out of this:

A range difference can’t exceed the spacing between the two sensors. Intuitively, the difference of two distances is largest when the emitter lies directly out beyond one sensor along the line connecting them, and even then it can only equal that sensor-to-sensor spacing (often called the baseline). So if you ever measure a range difference bigger than the baseline, something is wrong, most likely noise, multipath, or a timing/synchronization error.
The sign tells you which side you’re on. A hyperbola actually has two mirror-image branches, one curving toward each sensor. Whether your range difference came out positive or negative picks the branch nearer the closer sensor, so you don’t get confused between the two halves.
The shape depends on the measurement. When the range difference is close to the full baseline, the hyperbola hugs the line between the sensors. When the range difference is near zero (the emitter is roughly equidistant), the curve straightens out into the line that perpendicularly bisects the baseline. Near both of these extremes the geometry becomes “ill-conditioned,” meaning small measurement errors push the estimated position around a lot, so positioning there is less reliable.

A single TDOA thus constrains the emitter’s position to a curve, not a point. To fix a position we intersect several such curves. Below we plot two sensors, and several hyperbola branches drawn for \(\Delta r < 0\), \(\Delta r = 0\) (the perpendicular bisector), and \(\Delta r > 0\). On each hyperbola, the TDOA between the two sensors is constant. If you calculated the TDOA with just two sensors, you would know it is somewhere on that line, but you would need a third sensor to find where on that line it is (performing geolocation).

Two sensors, and several hyperbola branches drawn

Multilateration¶

With \(N\) sensors we can form pairs and intersect their hyperbolas; the emitter lies at (or near) their common intersection. This process is hyperbolic multilateration. Counting degrees of freedom tells us how many sensors we need:

In 2D the emitter position has 2 unknowns \((x,y)\). Each independent TDOA gives one equation, so we need at least 2 independent TDOAs, which requires 3 sensors. For example, if we know the emitter is on land and we’re not having to take into account curvature of the Earth, this would work.
In 3D the emitter position has 3 unknowns \((x,y,z)\), requiring 3 independent TDOAs and therefore 4 sensors.

In the noiseless case the hyperbolas meet at a single point (with an occasional geometric ambiguity resolved by branch signs or an extra sensor). With more sensors than the minimum the system is overdetermined: noisy hyperbolas no longer share an exact common point, and we must solve a least-squares or maximum-likelihood problem, as described below.

Reference Sensor and Independent Pairs¶

From \(N\) sensors one can form \(\binom{N}{2}\) pairwise TDOAs, but they are not all independent. Choosing one sensor as a reference (say sensor 1) and forming \(\tau_{i1}\) for \(i = 2,\dots,N\) yields \(N-1\) TDOAs from which every other pairwise difference can be reconstructed, since \(\tau_{ij} = \tau_{i1} - \tau_{j1}\). These \(N-1\) are the independent measurements that carry all the geometric information.

The redundant pairs are not worthless, however. Because each measured TDOA carries independent noise, using all \(\binom{N}{2}\) pairs (with a correctly modeled, correlated noise covariance, where the reference sensor’s noise is common to every \(\tau_{i1}\)) can improve the estimate.

Example: A Three-Sensor 2D Fix¶

Place three sensors at

\[\mathbf{s}_1=(0,0),\quad \mathbf{s}_2=(100,0),\quad \mathbf{s}_3=(0,100)\ \text{(meters)},\]

and suppose the true emitter is at \(\mathbf{u}=(40,30)\). The emitter-to-sensor distances are

\[r_1=\sqrt{40^2+30^2}=50,\quad r_2=\sqrt{60^2+30^2}=\sqrt{4500}\approx 67.08,\quad r_3=\sqrt{40^2+70^2}=\sqrt{6500}\approx 80.62 .\]

Taking sensor 1 as reference, the range-difference measurements are

\[\Delta r_{21}=r_2-r_1\approx 17.08\ \text{m},\qquad \Delta r_{31}=r_3-r_1\approx 30.62\ \text{m}.\]

Each defines a hyperbola with foci \(\{\mathbf{s}_2,\mathbf{s}_1\}\) and \(\{\mathbf{s}_3,\mathbf{s}_1\}\) respectively; their intersection is the emitter’s position. Solving the two hyperbola equations by hand is awkward, which is precisely the motivation for the algebraic linearization developed below, where we will recover \((40,30)\) from exactly these numbers in closed form.

The Signal and Measurement Model¶

Received-Signal Model¶

Each sensor receives a time-delayed, scaled, noisy copy of whatever the emitter is transmitting. Specifically, let \(s(t)\) be the transmit waveform. Sensor \(i\) receives:

\[x_i(t) = a_i \, s(t - t_i) + n_i(t), \qquad i = 1,\dots,N,\]

where \(a_i\) is a real (or complex, for passband signals) gain capturing propagation loss and antenna response, \(t_i = t_0 + r_i/c\) is the absolute arrival time, and \(n_i(t)\) is additive noise. This model assumes a single dominant line-of-sight path; multipath and non-line-of-sight effects are deferred to a later section.

Defining the TDOA¶

The pairwise TDOA is the difference of arrival times,

\[\tau_{ij} = t_i - t_j = \frac{r_i - r_j}{c} = \frac{|\mathbf{u}-\mathbf{s}_i| - |\mathbf{u}-\mathbf{s}_j|}{c}.\]

The right-hand side makes explicit that the TDOA is a nonlinear function of the emitter coordinates \(\mathbf{u}\). The measurement problem is to estimate \(\tau_{ij}\) from the waveforms \(x_i, x_j\); the localization problem is to invert the nonlinear map from \(\mathbf{u}\) to the collection of TDOAs.

Noise Assumptions¶

We assume each \(n_i(t)\) is zero-mean, wide-sense stationary, Gaussian, and independent of the transmitted signal and of the noise at other sensors. The per-sensor signal-to-noise ratio is

\[\mathrm{SNR}_i = \frac{a_i^2 \sigma_s^2}{\sigma_{n_i}^2},\]

with \(\sigma_s^2\) and \(\sigma_{n_i}^2\) the signal and noise powers. These are idealizations; real noise is often colored and partially correlated across sensors, but they lead to estimators and bounds that perform well in practice, and the framework extends to a general noise covariance when needed.

The Nonlinear Measurement Equations¶

Collecting the \(N-1\) reference-based range differences into a vector \(\mathbf{m}\) with entries \(m_i = c\,\tau_{i1} = r_i - r_1\), the noiseless model is

\[\mathbf{m} = \mathbf{h}(\mathbf{u}), \qquad h_i(\mathbf{u}) = |\mathbf{u}-\mathbf{s}_i| - |\mathbf{u}-\mathbf{s}_1|,\]

and the noisy measurement is \(\tilde{\mathbf{m}} = \mathbf{h}(\mathbf{u}) + \boldsymbol{\varepsilon}\), where \(\boldsymbol{\varepsilon}\) is the range-difference error induced by time-delay estimation errors. The function \(\mathbf{h}\) is nonlinear because of the Euclidean norms, and this nonlinearity is the source of every algorithmic complication that follows. Two broad strategies address it: algebraically linearize by introducing an auxiliary variable (described in the next section), or iteratively linearize about a current estimate (described further below).

Time-Delay Estimation (the Measurement Front End)¶

Before any geometry can be exploited we must extract the delays \(\tau_{ij}\) from the raw waveforms. This is the time-delay estimation (TDE) problem, and its accuracy ultimately caps the accuracy of the entire system.

Cross-Correlation¶

The natural estimator exploits the fact that \(x_i\) and \(x_j\) are noisy, shifted copies of the same waveform. Their cross-correlation,

\[R_{x_i x_j}(\tau) = \mathbb{E}\!\left[ x_i(t)\, x_j(t+\tau) \right],\]

is maximized when the shift \(\tau\) aligns the two copies, i.e. at \(\tau = \tau_{ij}\). The estimator is therefore

\[\hat{\tau}_{ij} = \arg\max_{\tau} \, \hat{R}_{x_i x_j}(\tau).\]

In practice the correlation is usually computed efficiently in the frequency domain via the FFT (just like how large convolutions typically use an FFT), using the cross-power spectral density \(G_{x_i x_j}(f) = \mathcal{F}\{R_{x_i x_j}(\tau)\}\) and an inverse transform.

Python Simulation¶

Enough math, let’s see how this all looks with a simple Python example. First we’ll set up the simulation with some high level parameters such as emitter and sensor position, and the simulated sample rate, which is essentially how much spectrum the receivers will “see”.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from itertools import combinations
from scipy.signal import firwin, lfilter

sample_rate = 50e6
c = 3e8  # speed of light [m/s]
snr_db = 10  # SNR of the received signal at each receiver [dB]
tx_len_samples = 1000 # samples to transmit
rx_positions = np.array([
   [65,  229],   # Rx0
   [676, 123],  # Rx1
   [153, 543],  # Rx2
])
num_rx = rx_positions.shape[0]
tx_position = np.array([153, 355])
pairs = list(combinations(range(num_rx), 2)) # For 3 receivers it's (Rx0,Rx1), (Rx0,Rx2), (Rx1,Rx2) -> 3 pairs

TDOA is not very dependent on the specific signal the transmitter emits, although the bandwidth of the signal does matter, so to keep this simple we’ll have it transmit random noise that is band-limited to a specified bandwidth. If we were to use something like QPSK of the same bandwidth instead, nothing would really change.

bandwidth = 20e6
taps = firwin(numtaps=129, cutoff=bandwidth / 2, fs=sample_rate)
tx_signal = lfilter(taps, 1.0, np.random.randn(tx_len_samples) + 1j * np.random.randn(tx_len_samples))

Next we will simulate the receivers receiving the signal at a delay based on their position. We will use a fractional delay filter like we learned about in the Synchronization Chapter. The rest of the code should look relatively straightforward. We make sure to apply unique AWGN per receiver.

# Simulate what each receiver records
true_distances = np.linalg.norm(rx_positions - tx_position, axis=1)
true_delays = true_distances / c
unknown_tx_time = 1.234e-5   # seconds. arbitrary, unknown to receivers and we won't use it in any TDOA calcs

# Calc the actual TDOAs to act as ground truth
for k, (a, b) in enumerate(pairs):
   true_rd = true_distances[b] - true_distances[a]

# Figure out how many samples we have to simulate
total_delay_samples = (unknown_tx_time + true_delays.max()) * sample_rate
buffer_len = tx_len_samples + int(np.ceil(total_delay_samples)) + 10

# Taken from Synchronization chapter
def frac_delay_filter(delay): # delay is in samples, but it can (and will be) not an integer
   N = 21 # number of taps, keep this odd
   n = np.arange(-(N-1)//2, N//2+1) # -10,-9,...,0,...,9,10
   h = np.sinc(n - delay) # calc filter taps
   h *= np.hamming(N) # window the filter to make sure it decays to 0 on both sides
   h /= np.sum(h) # normalize to get unity gain, we don't want to change the amplitude/power
   return h

# Simulate the delayed signal being received by each sensor
rx_signals = np.zeros((num_rx, buffer_len), dtype=complex)
for i in range(num_rx):
   tau = unknown_tx_time + true_delays[i] # absolute delay at this Rx, in seconds
   tau_samples = tau * sample_rate
   tau_integer_samps = int(np.round(tau_samples))
   tau_frac_samps = tau_samples - tau_integer_samps
   rx = np.zeros(buffer_len, dtype=complex)
   rx[tau_integer_samps:tau_integer_samps+tx_len_samples] = tx_signal
   frac_delay_i = frac_delay_filter(tau_frac_samps)
   rx = np.convolve(rx, frac_delay_i, "same")

   # Each receiver adds its own thermal noise, scaled to hit the SNR set at the top
   signal_power = np.mean(np.abs(tx_signal)**2)
   noise_power = signal_power / 10**(snr_db / 10)
   noise = np.sqrt(noise_power / 2) * (np.random.randn(buffer_len) + 1j * np.random.randn(buffer_len))
   rx_signals[i] = rx + noise

Everything so far was purely for simulation, the rest represents what you would actually do to calculate the TDOA, typically at a central location or one of the sensors, but it needs access to the samples received at all three sensors. It’s not a lot of code, we simply loop through each pair of sensors, calculate the cross-correlation between their received samples, and pull out the peak. Later we will see how to do the subsample version of this for more granularity.

# Estimate the TDOAs using a normal cross-correlation
range_diff = np.zeros(len(pairs)) # meters
for k, (a, b) in enumerate(pairs):
   xcorr = np.correlate(rx_signals[b], rx_signals[a], mode='full')
   peak_lag = np.argmax(np.abs(xcorr)) - (buffer_len - 1) # 'full' puts zero lag at index buffer_len-1
   range_diff[k] = (peak_lag / sample_rate) * c # meters

Not much to it! This gives us the following results:

Python simulation output when doing integer correlation

Note that this code doesn’t fully “solve” the problem, even though it might seem like it does at first glance because the lines intersect exactly at the position of the emitter, but it’s really your brain doing the final “solving” of the position, by looking at the intersection of the hyperbolas. Also, if there was more noise, the hyperbolas would not all intersect at one point. We will dive into automated solutions later in this chapter.

The full Python code (including the plotting portion) can be found here.

Resolution and Sub-Sample Estimation¶

With sampling rate \(f_s\), the correlation is computed on a lag grid spaced \(1/f_s\) apart, so the naive peak resolution is one sample, i.e. \(c/f_s\) in range. This is usually far too coarse, especially if the sensors (and emitter) are close together, e.g., less than 100 meters. There are two options for sub-sample refinement, one way is to interpolate the signals as part of the cross-correlation, and another is to fit a model to the samples around the discrete peak. For the latter, parabolic interpolation through the peak and its two neighbors is the simplest, while sinc-based interpolation is more accurate because the true correlation of a band-limited signal is a sinc-like function. Good interpolation routinely yields delay estimates one to two orders of magnitude finer than the sample period. Below we show an example of doing the interpolated cross-correlation.

U = 16 # correlation upsampling factor
half = (buffer_len + 1) // 2 # number of DC + positive-frequency bins
range_diff = np.zeros(len(pairs)) # meters
for k, (a, b) in enumerate(pairs):
   # Cross-correlation in the frequency domain
   X = np.conj(np.fft.fft(rx_signals[a])) * np.fft.fft(rx_signals[b])

   # Insert zeros in the high-frequency MIDDLE: DC + positive freqs at the front, negative freqs at the back, so it stays a valid FFT layout.
   X_padded = np.zeros(U * buffer_len, dtype=complex)
   X_padded[:half] = X[:half]
   X_padded[U * buffer_len - (buffer_len - half):] = X[half:]

   # Now IFFT to finish the crosscorrelation
   xcorr = np.abs(np.fft.ifft(X_padded)) * U

   # Peak index -> signed lag; indices past the midpoint are negative lags
   peak_idx = np.argmax(xcorr)
   if peak_idx > U * buffer_len // 2:
      peak_idx -= U * buffer_len
   peak_lag = peak_idx / U # sub-sample lag, +ve => Rx_b farther
   range_diff[k] = (peak_lag / sample_rate) * c # meters

When doing the same Python simulation as before, but with subsampling, we get the following results. You would have to zoom in on the left-hand plot to see the accuracy difference.

Python simulation output when doing subsample correlation

Looking at the right-hand plot, we can see how the original integer-only method was off by a decent margin.

The Generalized Cross-Correlation Framework¶

Plain cross-correlation is fragile: if the transmitted signal is narrow in bandwidth or the channel contains multipath, the correlation peak is broad and easily shifted by noise. Knapp and Carter’s Generalized Cross-Correlation (GCC) addresses this by inserting a frequency weighting \(\Psi(f)\) before transforming back to the lag domain:

\[R^{\mathrm{GCC}}_{x_i x_j}(\tau) = \int_{-\infty}^{\infty} \Psi(f)\, G_{x_i x_j}(f)\, e^{j 2\pi f \tau}\, df .\]

The weighting reshapes the spectrum to sharpen and stabilize the peak. Different choices of \(\Psi(f)\) correspond to different classical estimators, and selecting it well is the heart of robust TDE. Common weightings include:

Cross-correlation (\(\Psi = 1\)): the maximum-likelihood choice only in the high-SNR, broadband-flat limit; otherwise suboptimal.
Roth (\(\Psi = 1/G_{x_i x_i}(f)\)): suppresses frequencies where one sensor is noisy.
SCOT (Smoothed Coherence Transform, \(\Psi = 1/\sqrt{G_{x_i x_i}G_{x_j x_j}}\)): symmetric whitening of both channels.
PHAT (Phase Transform, \(\Psi = 1/|G_{x_i x_j}(f)|\)): the most widely used choice in acoustics.

Diving deeper into the GCC-PHAT estimator- by dividing out the magnitude of the cross-spectrum it retains only the phase:

\[R^{\mathrm{PHAT}}_{x_i x_j}(\tau) = \int \frac{G_{x_i x_j}(f)}{\bigl|G_{x_i x_j}(f)\bigr|} e^{j2\pi f \tau} df .\]

Because the delay between two copies of a signal is encoded entirely in the linear phase term \(e^{-j2\pi f \tau_{ij}}\), while the magnitude carries the (often unhelpful) spectral shape and reverberant coloring, whitening to unit magnitude weights every frequency equally and produces a sharp, near-impulsive peak at the true delay. This makes PHAT strikingly robust to multipath. Its weakness is that it also whitens noise-dominated frequencies, so at low SNR the equal weighting amplifies noise; SNR-aware variants reintroduce a coherence-based weighting to compensate. In real systems, because most signals are not always on, you typically have to determine the time-frequency bounding box of the target signal before performing TDOA, so unless there is a lot of interference you can estimate the SNR pretty easily.

Practical Considerations¶

Several effects govern the accuracy of the TDOA results:

The integration window \(T\) trades estimator variance (longer is better, since variance falls roughly as \(1/T\)) against the stationarity assumption and, for moving emitters, against blurring of the delay over the window. Many times this value is determined by the signal itself, e.g. you might perform TDOA on a per-packet basis.
Coherence bandwidth limits which frequencies actually carry usable phase.
Signal bandwidth is decisive: as the Cramér-Rao analysis below shows, delay variance falls as the square of the bandwidth, so wideband signals localize far better than narrowband ones, unlike DOA where we didn’t care about bandwidth (in fact, many of the DOA concepts used a narrowband assumption). That being said, you don’t need to capture the entire signal (from a frequency domain perspective) to perform TDOA, if you are limited by your SDR’s maximum sample rate, and you can only receive a portion of the signal bandwidth, you can still do TDOA!

From a compute perspective, the TDOA computation is dominated by FFTs and is \(O(M\log M)\) per sensor pair for records of \(M\) samples, which is what makes large sensor networks tractable.

GCC-PHAT Python Example¶

The nice thing about PHAT is that we can fold it into the simulation we already built with almost no new code. Recall that the sub-sample estimator above already worked in the frequency domain: it formed the cross-spectrum \(X_a^*(f)\,X_b(f)\), zero-padded it to interpolate, and inverse-FFT to recover the lag-domain correlation. PHAT is just one extra line: before transforming back to the lag domain, we divide the cross-spectrum by its own magnitude, so that every frequency bin contributes with unit weight and only the phase, which is where the delay lives, survives.

U = 16 # correlation upsampling factor
half = (buffer_len + 1) // 2 # number of DC + positive-frequency bins
range_diff = np.zeros(len(pairs)) # meters
for k, (a, b) in enumerate(pairs):
   # Cross-spectrum, same as the sub-sample example
   X = np.conj(np.fft.fft(rx_signals[a])) * np.fft.fft(rx_signals[b])

   # PHAT weighting: divide out the magnitude so only the phase remains
   X = X / (np.abs(X) + 1e-12) # small epsilon avoids divide-by-zero

   # Zero-pad in the high-frequency middle to interpolate, then IFFT
   X_padded = np.zeros(U * buffer_len, dtype=complex)
   X_padded[:half] = X[:half]
   X_padded[U * buffer_len - (buffer_len - half):] = X[half:]
   xcorr = np.abs(np.fft.ifft(X_padded)) * U

   # Peak index -> signed lag; indices past the midpoint are negative lags
   peak_idx = np.argmax(xcorr)
   if peak_idx > U * buffer_len // 2:
      peak_idx -= U * buffer_len
   peak_lag = peak_idx / U # sub-sample lag, +ve => Rx_b farther
   range_diff[k] = (peak_lag / sample_rate) * c # meters

The only change from the sub-sample code is the single X = X / (np.abs(X) + 1e-12) line; the small epsilon in the denominator keeps frequency bins that hold almost no energy from blowing up when we divide. With our wideband, high-SNR simulated signal the result barely differs from plain cross-correlation, because PHAT and cross-correlation coincide in exactly that broadband, high-SNR limit. The payoff shows up in harder conditions: when the spectrum is colored or multipath smears the peak, whitening to unit magnitude collapses the correlation back to a sharp, near-impulsive spike at the true delay, which is why PHAT is the default in acoustics.

Closed-Form Localization Algorithms¶

The measurement equations above are nonlinear and, taken directly, require iterative solution with a good starting point. Closed-form (non-iterative) estimators sidestep this by an algebraic trick: introduce an auxiliary variable that absorbs the nonlinearity and renders the system linear. They are fast, need no initial guess, and cannot get stuck in local minima, making them invaluable both on their own and as initializers for the iterative methods described below.

The Linearization Strategy¶

The trick is to square the range equations and subtract pairs, which cancels the nonlinear \(x^2+y^2\) term and introduces \(r_1\), the range to the reference sensor, as a single auxiliary unknown. Starting with the squared range from the emitter \(\mathbf{u}=(x,y)\) to sensor \(i\) at \(\mathbf{s}_i=(x_i,y_i)\):

\[r_i^2 = (x-x_i)^2 + (y-y_i)^2 = K_i - 2x_i x - 2y_i y + (x^2+y^2), \qquad K_i \equiv x_i^2 + y_i^2 .\]

The troublesome term is \(x^2+y^2\), common to every sensor. Take sensor 1 as reference and subtract its equation from sensor \(i\)’s:

\[r_i^2 - r_1^2 = (K_i - K_1) - 2(x_i-x_1)x - 2(y_i-y_1)y .\]

Now use the measured range difference \(r_{i1}\equiv r_i - r_1 = c\,\tau_{i1}\). Since \(r_i = r_{i1}+r_1\), we have \(r_i^2 = r_{i1}^2 + 2r_{i1}r_1 + r_1^2\), so \(r_i^2 - r_1^2 = r_{i1}^2 + 2 r_{i1} r_1\). Substituting and rearranging,

\[\boxed{2(x_i-x_1)\,x + 2(y_i-y_1)\,y + 2 r_{i1} r_1 = K_i - K_1 - r_{i1}^2}\]

This equation is linear in the unknowns \((x, y, r_1)\), where the range to the reference \(r_1\) is treated as an auxiliary variable. Stacking it for \(i=2,\dots,N\) gives a linear system \(\mathbf{A}\boldsymbol{\theta} = \mathbf{b}\) with \(\boldsymbol{\theta}=[x,y,r_1]^\top\), solvable by ordinary or weighted least squares. The nonlinearity has been quarantined into the single extra unknown \(r_1\).

Spherical Interpolation and Spherical Intersection¶

The earliest closed-form estimators, Spherical Interpolation (SI) and Spherical Intersection (SX) of Schau and Robinson, exploit exactly this structure. They first solve the linear system for \((x,y)\) as a function of \(r_1\), then impose the constraint that ties them together, namely \(r_1^2 = (x-x_1)^2+(y-y_1)^2\), to pin down \(r_1\). SI obtains \(r_1\) by a least-squares projection; SX substitutes the linear solution into the quadratic constraint and solves the resulting scalar quadratic. They are simple and fast but treat the auxiliary variable somewhat crudely, leaving accuracy on the table at higher noise.

Fang’s Method¶

Fang’s algorithm provides an exact algebraic solution for the minimum configuration (3 sensors in 2D, 4 in 3D), giving a determined system rather than an overdetermined one. It is elegant and computationally trivial but does not use redundant sensors, so it cannot average down measurement noise and is sensitive to geometry. It is best viewed as the exact-determined special case that the least-squares methods generalize.

To see how it works, look again at the boxed linear equation above. With three sensors there are exactly two such equations (\(i=2,3\)) but three unknowns \((x,y,r_1)\). That looks underdetermined, but \(r_1\) is not free: it is glued to the position by \(r_1^2=(x-x_1)^2+(y-y_1)^2\). Fang’s trick is to defer that constraint, treat \(r_1\) as a known constant, and solve the two linear equations for \(x\) and \(y\). Because \(r_1\) enters linearly, inverting the \(2\times2\) matrix (well-conditioned as long as the sensors are not collinear) gives both coordinates as straight-line functions of the still-unknown range,

\[x = g_x + h_x\,r_1, \qquad y = g_y + h_y\,r_1 ,\]

with constants \(g_x,h_x,g_y,h_y\) from the matrix inverse. Now cash in the deferred constraint: substituting these into \(r_1^2=(x-x_1)^2+(y-y_1)^2\) collapses everything to a single scalar quadratic \(a\,r_1^2 + b\,r_1 + c = 0\). Solve it, keep the physical root (a range must be positive; the other root typically lands on the wrong hyperbola branch), and back-substitute to read off \((x,y)\). That is the whole method: one \(2\times2\) solve and one quadratic, no iteration and no initial guess. The worked example in the next subsection is precisely this procedure carried out with numbers.

Chan’s Method (Two-Step Weighted Least Squares)¶

The estimator that became the practical standard is Chan and Ho’s two-step weighted least squares (WLS). It is built on the linear system above but treats the statistics correctly and refines the auxiliary variable, achieving accuracy close to the Cramér-Rao bound at small-to-moderate noise.

First step. Treat \(\boldsymbol{\theta}=[x,y,r_1]^\top\) as if its three components were independent and solve the linear system by weighted least squares,

\[\hat{\boldsymbol{\theta}} = (\mathbf{A}^\top \mathbf{W}\mathbf{A})^{-1}\mathbf{A}^\top \mathbf{W}\,\mathbf{b},\]

with the weight \(\mathbf{W}\) chosen as the inverse covariance of the equation errors. Because that covariance itself depends on the unknown ranges, in practice one first solves with \(\mathbf{W}=\mathbf{I}\) (or the raw TDOA noise covariance), then recomputes \(\mathbf{W}\) from the resulting range estimates and re-solves, a one- or two-pass refinement.

Second step. The first step ignored the known relationship \(r_1^2 = (x-x_1)^2+(y-y_1)^2\) that couples the auxiliary variable to the position. The second step restores it: form a new small least-squares problem in the squared quantities \([(x-x_1)^2,(y-y_1)^2,r_1^2]\), using the first-step covariance to weight it, and solve for a corrected position. This second WLS removes much of the bias of the naive linear solution and is what brings Chan’s estimator close to optimal.

The method returns a position directly, with computational cost dominated by inverting small \(3\times3\) matrices, negligible compared with the FFTs of the front end. Its limitations appear at high noise or unfavorable geometry, where the squared-range manipulation amplifies errors and the second step can pick the wrong root; there, the iterative refinement described below, seeded by Chan’s output, is the standard remedy.

Example, Continued: Solving the Three-Sensor Fix in Closed Form¶

Let’s pick up right where the Python simulation left off. At that point we had the range_diff array holding one measured range difference per sensor pair, and earlier we let our brain do the final step by eyeballing where the hyperbolas crossed. Now we’ll replace that eyeballing with the closed-form algebra developed above, recovering the emitter position directly from range_diff and rx_positions. Because we have exactly three sensors in 2D, this is Fang’s minimum-configuration case: two boxed linear equations and one quadratic, no iteration and no initial guess.

We take Rx0 as the reference sensor. The pairs were built as (0,1), (0,2), (1,2), and recall that range_diff[k] for pair (a,b) is \(r_b - r_a\), so the two pairs that include the reference, (0,1) and (0,2), hand us exactly the reference-based range differences \(r_{i0}=r_i-r_0\) that the boxed equation needs.

# Solve for the emitter position in closed form (Fang's method, 3 sensors in 2D)
ref = 0 # use Rx0 as the reference sensor
s = rx_positions.astype(float)
K = np.sum(s**2, axis=1) # K_i = x_i^2 + y_i^2 for each sensor

# Reference-based range differences r_i0 = r_i - r_ref for the two non-reference sensors
others = [i for i in range(num_rx) if i != ref]
r_i0 = np.array([range_diff[pairs.index((ref, i))] for i in others]) # pair (ref,i) holds r_i - r_ref

# Build the 2x2 linear system that gives (x, y) as a function of the unknown range r_ref
M = 2 * (s[others] - s[ref])     # rows: [2(x_i - x_ref), 2(y_i - y_ref)]
d = K[others] - K[ref] - r_i0**2 # right-hand side constants
Minv = np.linalg.inv(M)          # well-conditioned as long as the sensors aren't collinear
g = Minv @ d                     # part of (x, y) that doesn't depend on r_ref
h = -2 * (Minv @ r_i0)           # how (x, y) slide with r_ref:  [x, y] = g + h * r_ref

# Cash in the deferred constraint r_ref^2 = (x - x_ref)^2 + (y - y_ref)^2 -> scalar quadratic in r_ref
p = g - s[ref] # constant part of (x - x_ref, y - y_ref)
a_q = h[0]**2 + h[1]**2 - 1
b_q = 2 * (p[0]*h[0] + p[1]*h[1])
c_q = p[0]**2 + p[1]**2
roots = np.roots([a_q, b_q, c_q])

# Keep the physical root (a range must be positive and real), then back-substitute
r_ref = roots[(roots.real > 0) & (np.abs(roots.imag) < 1e-6)].real.max()
emitter_est = g + h * r_ref

print("Estimated emitter position:", emitter_est) # ~[153, 355]
print("True emitter position:     ", tx_position)

The structure mirrors the math exactly: M and d are the two boxed linear equations, g and h express \(x\) and \(y\) as straight-line functions of the still-unknown reference range \(r_1\) (called r_ref here), and substituting those into \(r_1^2=(x-x_1)^2+(y-y_1)^2\) collapses everything to the scalar quadratic that np.roots solves. We discard the non-physical (negative or complex) root, keep the positive real one, and back-substitute to read off the position. With our high-SNR, wideband simulation the estimate lands right on top of the true emitter at \((153, 355)\), with no human in the loop reading off a hyperbola intersection.

With noisier measurements the two linear equations would no longer be perfectly consistent, the quadratic root would be perturbed, and, because three sensors give us no redundancy to average over, the error would pass straight through. That is exactly where the redundant pairs and the weighting and second step of Chan’s method earn their keep, governing how gracefully the estimate degrades.

Iterative and Statistical Estimation¶

Closed-form methods are fast but make algebraic approximations that cost accuracy at high noise or poor geometry. When the best possible estimate is required, we solve the nonlinear estimation problem directly, typically initialized by a closed-form result.

Nonlinear Least Squares¶

Define the residual between measured and predicted range differences and minimize its weighted squared norm:

\[\hat{\mathbf{u}} = \arg\min_{\mathbf{u}} \bigl[\tilde{\mathbf{m}} - \mathbf{h}(\mathbf{u})\bigr]^\top \mathbf{C}^{-1} \bigl[\tilde{\mathbf{m}} - \mathbf{h}(\mathbf{u})\bigr],\]

where \(\mathbf{C}\) is the covariance of the range-difference errors. This cost has no closed-form minimizer because \(\mathbf{h}\) is nonlinear, so we descend it iteratively.

Taylor-Series (Gauss-Newton) Method¶

Foy’s classical approach linearizes \(\mathbf{h}\) about the current estimate \(\mathbf{u}^{(k)}\) using its Jacobian \(\mathbf{J}\), whose row \(i\) is the gradient of \(h_i\):

\[\frac{\partial h_i}{\partial \mathbf{u}} = \frac{\mathbf{u}-\mathbf{s}_i}{|\mathbf{u}-\mathbf{s}_i|} - \frac{\mathbf{u}-\mathbf{s}_1}{|\mathbf{u}-\mathbf{s}_1|} = \hat{\mathbf{e}}_i - \hat{\mathbf{e}}_1,\]

a difference of unit vectors pointing from the candidate emitter toward sensor \(i\) and the reference. The Gauss-Newton update is

\[\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + (\mathbf{J}^\top \mathbf{C}^{-1}\mathbf{J})^{-1}\mathbf{J}^\top \mathbf{C}^{-1}\bigl[\tilde{\mathbf{m}}-\mathbf{h}(\mathbf{u}^{(k)})\bigr],\]

iterated to convergence. Each step solves a small linear system. The method converges quickly when started near the solution, which is exactly why Chan’s closed-form estimate is the preferred seed: it places the iteration in the basin of the global minimum and avoids the spurious local minima that plague hyperbolic cost surfaces, especially in poor geometry.

Maximum-Likelihood Estimation¶

Under Gaussian noise, the negative log-likelihood is, up to constants, exactly the weighted squared residual above. So the maximum-likelihood estimator coincides with weighted nonlinear least squares, the Gauss-Newton iteration is not a heuristic, it is the statistically optimal estimator under the assumed model. This is also the estimator whose covariance the Cramér-Rao bound below predicts.

Let’s put that to work by continuing the Python example one more time. We already have a position from the closed-form solver, emitter_est, and the theory tells us two things: the maximum-likelihood estimate is just the Gauss-Newton iteration above, and the closed-form fix is the ideal seed for it because it drops us right inside the basin of the true minimum. So we’ll start at emitter_est and take a few Gauss-Newton steps, each one re-linearizing the range-difference model at the current guess and solving a tiny least-squares problem for the correction. Unlike Fang’s solver, which used only the two pairs touching the reference sensor, this one uses all three pairs in range_diff, so the extra pair acts as redundancy that the iteration averages over.

# Refine the closed-form fix with Gauss-Newton (= maximum likelihood under Gaussian noise)
u = emitter_est.copy() # seed the iteration with the closed-form estimate
for _ in range(10):
   h = np.zeros(len(pairs))      # predicted range differences at the current guess
   J = np.zeros((len(pairs), 2)) # Jacobian, one row per pair
   for k, (a, b) in enumerate(pairs):
      e_a = (u - s[a]) / np.linalg.norm(u - s[a]) # unit vector from Rx_a toward the guess
      e_b = (u - s[b]) / np.linalg.norm(u - s[b]) # unit vector from Rx_b toward the guess
      h[k] = np.linalg.norm(u - s[b]) - np.linalg.norm(u - s[a]) # predicted r_b - r_a
      J[k] = e_b - e_a # row of the Jacobian is a difference of unit bearing vectors

   residual = range_diff - h # measured minus predicted range differences
   delta, *_ = np.linalg.lstsq(J, residual, rcond=None) # Gauss-Newton step (J^T J)^-1 J^T residual
   u = u + delta
   if np.linalg.norm(delta) < 1e-9: # stop once the update stops moving the estimate
      break

emitter_ml = u
print("ML (Gauss-Newton) estimate:", emitter_ml) # ~[153, 355]
print("True emitter position:     ", tx_position)

A couple of details worth pointing out. Because we assumed the range-difference errors are independent with equal variance, the weight \(\mathbf{C}^{-1}=\sigma^{-2}\mathbf{I}\) is a scalar that cancels out of the update, which is why a plain np.linalg.lstsq (no weight matrix) computes the step exactly; if the pairs had unequal quality we would fold their inverse variances in here. The Jacobian rows are literally the e_b - e_a differences of unit bearing vectors from the math above, so you can watch the geometry enter the estimator directly. Starting from the already-good closed-form seed, the iteration converges in just a handful of steps and lands on the true emitter at \((153, 355)\). In our high-SNR simulation it barely moves off the closed-form answer, but with noisier measurements this is where the extra pair and the iterative refinement pay off, and it is this same \(\mathbf{J}^\top\mathbf{C}^{-1}\mathbf{J}\) that reappears in the Cramér-Rao bound below as the estimator’s covariance.

Robust, Recursive, and Bayesian Extensions¶

Real measurements contain outliers, a multipath-corrupted TDOA can be wildly wrong while the rest are fine. Plain least squares, which squares residuals, is badly distorted by such outliers. Robust estimators replace the squared loss with one that grows more slowly (e.g. Huber’s), or explicitly detect and discard inconsistent TDOAs via residual tests or RANSAC-style consensus.

When the emitter moves, we want to fuse measurements over time rather than localize each instant independently. State-space filtering does this by modeling the emitter’s position (and velocity) as an evolving state. The Kalman filter is optimal for linear-Gaussian dynamics, but the TDOA measurement is nonlinear, so practitioners use the Extended Kalman Filter (which linearizes the measurement with the same Jacobian as above), the Unscented Kalman Filter (which propagates a deterministic set of sigma points through the nonlinearity, avoiding explicit Jacobians and handling stronger nonlinearity better), or, for multimodal or heavily non-Gaussian problems, the particle filter (which represents the posterior by a weighted sample cloud). These trackers also naturally enforce motion continuity, which suppresses the per-snapshot ambiguities of static localization.

Brute-Force Heatmap Approach¶

Every method so far has been algebraic or iterative: we manipulated equations or descended a gradient. But there is a refreshingly simple alternative that needs neither. Lay a grid over the search area, and at every candidate position ask a single question: if the emitter were here, what range differences would the sensors see, and how far off are those from what we actually measured? Squaring and summing those mismatches gives a cost at each grid point, and the emitter is wherever that cost is smallest. The result is a heatmap of the same cost surface the Gauss-Newton iteration was quietly walking down, except now we can see all of it at once.

Back to the Python example, we can do this with the variables we already have, range_diff, rx_positions, and pairs:

# Evaluate the TDOA cost on a grid of candidate emitter positions
gx = np.linspace(0, 700, 400)
gy = np.linspace(0, 700, 400)
GX, GY = np.meshgrid(gx, gy)

cost = np.zeros_like(GX)
for k, (a, b) in enumerate(pairs):
   r_a = np.hypot(GX - rx_positions[a, 0], GY - rx_positions[a, 1]) # range to Rx_a
   r_b = np.hypot(GX - rx_positions[b, 0], GY - rx_positions[b, 1]) # range to Rx_b
   cost += ((r_b - r_a) - range_diff[k])**2 # squared mismatch for this pair, summed over pairs

# The best estimate is simply the grid cell with the lowest cost
iy, ix = np.unravel_index(np.argmin(cost), cost.shape)
emitter_grid = np.array([gx[ix], gy[iy]])
print("Grid estimate:", emitter_grid) # ~[153, 355]

# Invert the cost into a likelihood-style surface so higher = more likely emitter location
likelihood = -np.log10(cost + 1e-9)

Plotting likelihood as an image reveals the geometry directly: the bright ridges trace out the hyperbolas from earlier, and they all funnel into one bright peak at the true emitter. We take the negative log of the cost so that the most likely location is the maximum rather than a minimum, which is easier to read off visually. The trade-offs are exactly what you’d expect. The method is dead simple, needs no initial guess, and cannot diverge or land on the wrong root, so it is a great sanity check and a robust way to seed the iterative refiner. It also handles multimodal cost surfaces gracefully, since it sees every minimum, not just the nearest one. The price is resolution and speed: accuracy is limited by the grid spacing, and cost grows with the number of grid cells, so for a fine answer over a large area you would localize coarsely first and then refine, either by zooming the grid or by handing the result to Gauss-Newton. Below shows the heatmap approach applied to our Python example.

Adding heatmap to the tdoa plot shown earlier

One nice part about the heatmap approach is if there is a lot of error, or sensors with low SNR without realizing it, there may be multiple hot spots on the heatmap, which your brain can notice. The heatmap can even be overlaid on top of a satellite view of the area!

This brute-force approach is not very computationally efficient, and it’s not really an option at all for 3D TDOA. One alternative to calculating every grid point but still brute-forcing it is to draw all of the hyperbolas in 2D but with “width” applied to each one, e.g. by applying a lobe shaped function along the hyperbola so it tapers off.

Performance Analysis and Fundamental Bounds¶

Having estimators in hand, we ask: how accurate can a TDOA system be, and what governs that accuracy? Two ideas answer this: the Cramér-Rao bound, which sets a noise floor from the signals, and geometric dilution of precision, which describes how sensor-emitter geometry amplifies that floor.

Error Propagation¶

System accuracy is a two-stage cascade. First, finite SNR and bandwidth limit how precisely each delay can be measured (TDE error). Second, the geometry maps those range-difference errors into a position error. Writing \(\delta\mathbf{u}\) for the position error and \(\boldsymbol{\varepsilon}\) for the range-difference errors, the linearized relation near the solution is \(\boldsymbol{\varepsilon}\approx \mathbf{J}\,\delta\mathbf{u}\), so the position-error covariance is

\[\mathrm{Cov}(\hat{\mathbf{u}}) \approx (\mathbf{J}^\top \mathbf{C}^{-1}\mathbf{J})^{-1}.\]

This single expression contains both stages: \(\mathbf{C}\) is the measurement quality (from TDE) and \(\mathbf{J}\) is the geometry.

The Time-Delay Estimation Bound¶

We can bound how well any estimator can measure a single delay. For a signal of RMS bandwidth \(\beta\) observed over time \(T\), the variance of any unbiased delay estimate obeys

\[\mathrm{var}(\hat\tau_{ij}) \gtrsim \frac{1}{8\pi^2 \beta^2 T \gamma},\]

where \(\beta\) is the RMS (Gabor) bandwidth of the signal and \(\gamma\) is an effective SNR factor combining the two sensors’ SNRs. Three design lessons fall straight out: variance improves with integration time \(T\), with effective SNR \(\gamma\), and, most strikingly, with the square of bandwidth \(\beta^2\). Doubling the bandwidth quarters the delay variance. This is why wideband and spread-spectrum waveforms are so prized for ranging, and why narrowband emitters are intrinsically hard to localize by TDOA alone.

The Localization Cramér-Rao Lower Bound¶

Combining measurement quality and geometry, the Fisher information matrix for the emitter position is

\[\mathbf{F} = \mathbf{J}^\top \mathbf{C}^{-1} \mathbf{J},\]

and the Cramér-Rao Lower Bound states that any unbiased estimator has covariance no smaller than its inverse:

\[\mathrm{Cov}(\hat{\mathbf{u}}) \succeq \mathbf{F}^{-1} = (\mathbf{J}^\top \mathbf{C}^{-1}\mathbf{J})^{-1}.\]

The bound is the benchmark against which estimators are judged: a method that attains it is efficient. The maximum-likelihood estimator above attains it asymptotically (large \(T\), high SNR), and Chan’s closed-form method attains it at small noise, which is exactly why both are used. The CRLB also cleanly separates the two influences on accuracy: \(\mathbf{C}\) (signal-and-noise quality, improvable by more bandwidth, power, or integration) and \(\mathbf{J}\) (geometry, improvable by sensor placement), studied next. The plot below shows a few example bandwidths and the lower bound over SNR, to give you a feel for how much error you should expect, or at least the floor. The y-axis is the 1-\(\mathrm{\sigma}\) value (one standard deviation).

Geometric Dilution of Precision¶

Suppose your sensors can measure range differences to about 1 m of accuracy, a respectable number for a well-synchronized radio system. You might expect to then pin down the emitter to roughly 1 m as well. But where is the emitter? Picture it sitting comfortably inside a triangle of three sensors: the hyperbolas from each sensor pair slice across one another at steep, nearly right angles, and where they cross is pinned down tightly, so your 1 m of ranging error turns into maybe 1.5 m of position error. Now slide that same emitter far off to one side, well outside the cluster. The hyperbolas now graze each other at a shallow angle, like two gently curving lines that nearly overlap, and the crossing point smears out along the direction they share. The very same 1 m of ranging error can now balloon into tens of meters of position error. Nothing about your hardware changed, only the geometry did.

That blow-up factor has a name: Geometric Dilution of Precision (GDOP). It captures how much the sensor-emitter layout magnifies measurement error into position error. If the range-difference errors are independent and each has the same standard deviation \(\sigma\), so the covariance is \(\mathbf{C}=\sigma^2\mathbf{I}\) (a diagonal matrix with \(\sigma^2\) on the diagonal), then

\[\mathrm{GDOP} = \sqrt{\mathrm{tr}\bigl[(\mathbf{J}^\top\mathbf{J})^{-1}\bigr]}, \qquad \sigma_{\text{position}} = \mathrm{GDOP}\cdot \sigma .\]

Your position error is just your ranging error multiplied by GDOP, so GDOP is a unitless number, always \(\ge 1\), telling you the factor by which ranging error gets magnified at a given emitter location.

Where does the magnification come from? It is baked into the Jacobian \(\mathbf{J}\), whose rows are differences of unit bearing vectors \(\hat{\mathbf{e}}_i - \hat{\mathbf{e}}_1\) (the direction to one sensor minus the direction to another). When those directions point all over the place, \(\mathbf{J}^\top\mathbf{J}\) is well-conditioned (far from singular, so its inverse stays small) and GDOP is small. When they nearly line up, \(\mathbf{J}^\top\mathbf{J}\) becomes nearly singular and GDOP blows up. So an emitter surrounded by the sensors, with bearing vectors well-spread and hyperbolas crossing at large angles, gets a small GDOP (good), while an emitter far outside the cluster, or sensors nearly collinear (almost in a straight line), leaves the bearing vectors nearly parallel and the hyperbolas grazing at shallow angles, giving a huge GDOP (bad).

This is the same effect we saw when hyperbolas degenerate near the ends of the baseline. The takeaway: a TDOA system can be limited far more by where its sensors sit than by how well it measures time, all the nanosecond synchronization and wide bandwidth in the world won’t save a fix in a high-GDOP region of the map.

The figure below shows GDOP heat maps over a plane for (left) three sensors at the vertices of an equilateral triangle and (right) three nearly collinear sensors, showing a broad low-GDOP region inside the triangle versus a narrow usable corridor for the collinear array, with GDOP rising sharply outside the convex hull in both cases.

GDOP heat maps over a plane for (left) three sensors at the vertices of an equilateral triangle and (right) three nearly collinear sensors, showing a broad low-GDOP region inside the triangle versus a narrow usable corridor for the collinear array, with GDOP rising sharply outside the convex hull in both cases.

Sensor-Placement Optimization¶

Because geometry is often a design variable, we can place sensors to minimize error. Common objectives minimize a scalar derived from \(\mathbf{F}^{-1}\), such as its trace (equivalent to GDOP), its determinant (the confidence-ellipse volume), or its largest eigenvalue (worst-case error). The qualitative results are intuitive: spread the sensors widely so long baselines sharpen angular resolution, surround the region of interest so emitters fall inside the convex hull, avoid collinear or coplanar layouts that create ill-conditioned directions, and add sensors where redundancy both lowers variance and guards against outliers. For a moving target or large coverage area, placement is optimized over the whole region, minimizing average or worst-case GDOP, usually by numerical search.

Practical Challenges in Real Systems¶

The model used in this chapter so far omits a few effects that usually dominate the error budget in actual TDOA deployments. Three deserve detailed treatment:

Receiver Synchronization¶

TDOA’s defining advantage, that it needs no synchronized transmitter, comes paired with its defining burden: the receivers must share a common time reference, and any error in that reference enters the measurement directly. If sensor \(i\)’s clock is offset from truth by \(\delta t_i\), the measured TDOA is corrupted by \(\delta t_i - \delta t_j\), an error multiplied by \(c\) in range. The scale is unforgiving for radio systems:

\[c \times 1\ \text{ns} = (3\times10^8\,\text{m/s})(10^{-9}\,\text{s}) = 0.30\ \text{m}.\]

So a 1 ns synchronization error already costs \(\sim\)0.3 m, and 100 ns costs 30 m. Achieving and holding nanosecond-level synchronization across distributed sensors is therefore central to system design. Common mechanisms include GPS-disciplined oscillators (each sensor recovers a \(\sim\)10-100 ns timing reference from satellites), the Precision Time Protocol (IEEE 1588, distributing time over a network to sub-microsecond or, with hardware timestamping, sub-100 ns accuracy), and for the most demanding installations White Rabbit, which reaches sub-nanosecond synchronization over fiber. Two further subtleties matter: clocks not only have a static offset but drift over time, requiring continual discipline; and in acoustic systems, where \(c\) is a million times smaller, the same absolute timing error is a million times less harmful, which is why microphone-array TDOA is comparatively forgiving while radio TDOA lives or dies by its clocks.

Off-the-shelf SDRs that can be easily synchronized include any of the Ettus Research USRPs that can take a GPS disciplined oscillator (GPSDO), such as a B200 with a TCXO (a type of GPSDO). If the sensors are close enough, they can also be synchronized by sharing a PPS signal over a cable, e.g., generated using an OctoClock, which also generates a 10 MHz signal used to frequency synchronize them. Nearly all of the USRPs have a PPS and 10 MHz input, and most have room for a GPSDO, or come with one.

Multipath and Non-Line-of-Sight Propagation¶

Everything so far has assumed a single line-of-sight path between the emitter and receivers. Real environments add reflections (multipath) and can block the direct path entirely. Multipath superimposes delayed copies of the signal, which distort or split the correlation peak and bias the delay estimate; this is exactly the failure GCC-PHAT was designed to resist, since whitening sharpens the direct-path peak relative to the smeared reflections. When the direct path is entirely obstructed, the earliest arriving energy travels an excess distance, so the measured TDOA is biased long in a way no amount of averaging removes, because the error is systematic rather than random. Mitigation strategies include identifying these non-line-of-sight links statistically (non-line-of-sight measurements often show larger variance or violate geometric consistency among redundant sensors), down-weighting or discarding them (requires having way more sensors than three), and exploiting redundancy so that a few corrupted links among many can be detected and rejected by the robust estimators described earlier. In dense indoor multipath, which is an extremely difficult environment for TDOA, model-based delay estimation and machine-learning approaches increasingly outperform classical correlation.

Sensor-Position Uncertainty and Calibration¶

The geometry assumed exact knowledge of the sensor coordinates \(\mathbf{s}_i\). Errors in those coordinates propagate into the position estimate just as measurement errors do, and for distant emitters can be amplified by the same poor geometry that inflates GDOP. Careful survey of fixed installations, GPS positioning of mobile sensors, and self-calibration, jointly estimating sensor positions and emitter locations from emitters of opportunity at known or constrained locations, are the standard responses. A full error budget must include sensor-position uncertainty alongside timing and TDE error; in well-synchronized systems it is often the next-largest term.

Advanced Topics¶

Joint TDOA/FDOA Estimation¶

When the emitter, the sensors, or both are moving, the relative motion imparts a Doppler shift that differs between sensors, a Frequency Difference of Arrival (FDOA). FDOA carries information about the emitter’s velocity and, crucially, adds an independent geometric constraint that improves position observability, especially for the difficult far-field and few-sensor cases where TDOA alone is poorly conditioned. TDOA and FDOA are estimated jointly by maximizing the Complex Ambiguity Function (CAF) over both delay and frequency offset:

\[A(\tau,\nu) = \int_0^T x_i(t)\, x_j^{*}(t-\tau)\, e^{-j2\pi \nu t}\, dt,\]

whose two-dimensional peak gives \((\hat\tau_{ij},\hat\nu_{ij})\) simultaneously. The CAF generalizes the cross-correlation technique above by adding a frequency-search dimension, at correspondingly higher computational cost. Note that this is not the same CAF introduced in the cyclostationary chapter as the cyclic autocorrelation function. Joint TDOA/FDOA processing is the backbone of satellite and airborne geolocation of radio emitters, where a single pair of moving platforms can localize a stationary emitter from the combined delay and Doppler constraints.