Deep Learning-Based AoA Estimation

Train a neural network to perform angle of arrival estimation and see how it generalizes to unseen data.

Angle Estimation: Classical or Deep Learning-Based?

Our reference transmitter calibration tutorial outlines the steps required to perform angle of arrival (AoA) estimation with classical signal processing methods. So, why use neural networks for AoA estimation if good conventional algorithms such as MUSIC exist? Is it just so that we can check a box and claim that we are using "artificial intelligence" or "machine learning"?

No, of course not. There are several advantages (but also some disadvantages) to using a neural network-based approach over model-based AoA estimation. Here is a short comparison:

Classical AoA Estimation
  • No training data (CSI with AoA labels) required
  • Requires absolute phase and time synchronization between antennas
  • Well-known algorithms such as MUSIC and ESPRIT (and simpler, less optimal alternatives) exist
  • Objects in the environment will lead to accuracy issues due to reflections. Antenna properties such as radiation pattern and phase center location also need to be considered for accurate estimates.
  • Locations of individual antennas (assignment of channels, orientation of array, distance between antennas, ...) have to be known precisely
Neural Network-Based Estimation
  • Needs large amounts of labelled training data
  • The neural network can learn to compensate for any phase and time offsets between antennas, as well as other impairments
  • Neural network architecture and training hyperparameters need to be tuned; there is no provably "optimal" solution
  • The neural network can learn to ignore multipath propagation issues caused by the radio environment. Antenna properties (radiation pattern, phase center) are also implicitly learned during training.
  • No need to specify antenna array properties, the neural network can learn these from the training set

Given a sufficient amount of training data, neural network-based AoA estimation usually performs better than classical techniques, as we will show in this tutorial. We will put a special focus on testing the neural network's ability to generalize AoA estimation to physical regions in space that it has not seen during training.

In case you are unfamiliar with DICHASUS datasets, it might be a good idea to have a look at our position estimation tutorial first. It uses very similar feature extraction and neural network training steps and explains both in greater detail.

When it comes to angle of arrival estimation, there are two incident angles that one might want to estimate at the antenna array: elevation and azimuth. Since, at the time of writing, most of our datasets exhibit a much greater variance in azimuth (compared to elevation), we will focus on azimuth angle estimation in this tutorial. Estimating the elevation angle instead is, however, simply a matter of changing which label to train on; no modifications to the neural network are required.
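For instance, the elevation label could be computed as \( \theta = \mathrm{atan2}\left(z, \sqrt{x^2 + y^2}\right) \), assuming that the third coordinate of the position label is the transmitter height \( z \) relative to the antenna array.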

Training Set and Test Set

As always, we start by downloading the dataset and importing it with TensorFlow. We use subcarrier averaging as a simple feature engineering technique, just like we did in the indoor positioning tutorial:

!mkdir dichasus
!wget --content-disposition https://darus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-2202/2 -P dichasus # dichasus-0152
!wget --content-disposition https://darus.uni-stuttgart.de/api/access/datafile/:persistentId?persistentId=doi:10.18419/darus-2202/3 -P dichasus # dichasus-0153
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

def record_parse_function(proto):
	record = tf.io.parse_single_example(proto, {
		"csi": tf.io.FixedLenFeature([], tf.string, default_value = ""),
		"pos-tachy": tf.io.FixedLenFeature([], tf.string, default_value = "")
	})
	csi = tf.ensure_shape(tf.io.parse_tensor(record["csi"], out_type = tf.float32), (32, 1024, 2)) # 32 antennas, 1024 subcarriers, real / imaginary part
	pos_tachy = tf.ensure_shape(tf.io.parse_tensor(record["pos-tachy"], out_type = tf.float64), (3,)) # (x, y, z) transmitter position measured by the tachymeter

	dist = tf.sqrt(tf.square(pos_tachy[0]) + tf.square(pos_tachy[1])) # horizontal distance to the antenna array at (x, y) = (0, 0)
	angle = tf.math.atan2(pos_tachy[1], -pos_tachy[0]) # azimuth angle of the transmitter as seen from the array

	return csi, pos_tachy[:2], angle, dist

def get_feature_mapping(chunksize = 32):
	def compute_features(csi, pos_tachy, angle, dist):
		assert(csi.shape[1] % chunksize == 0)
		featurecount = csi.shape[1] // chunksize
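		# Average the CSI over non-overlapping blocks of <chunksize> adjacent subcarriers:
		# the shape (32 antennas, 1024 subcarriers, 2) becomes (32, 1024 / chunksize, 2).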
		csi_averaged = tf.stack([tf.math.reduce_mean(csi[:, (chunksize * s):(chunksize * (s + 1)), :], axis = 1) for s in range(featurecount)], axis = 1)
		return csi_averaged, pos_tachy, angle, dist

	return compute_features

datafiles = ["dichasus/dichasus-0152.tfrecords", "dichasus/dichasus-0153.tfrecords"]
dataset = tf.data.TFRecordDataset(datafiles).map(record_parse_function)

training_set = dataset.filter(lambda csi, pos, angle, dist: tf.logical_and(dist > 0.5, dist <= 4))
test_set = dataset.filter(lambda csi, pos, angle, dist: dist > 4)

training_set_features = training_set.map(get_feature_mapping(32))
test_set_features = test_set.map(get_feature_mapping(32))

training_set_features = training_set_features.shuffle(buffer_size = 100000).cache()
test_set_features = test_set_features.shuffle(buffer_size = 100000).cache()

In the above code snippet, we already split the dataset into a training set and a test set. The training set contains all datapoints located at a distance \( d \) between \( 0.5 \,\mathrm{m} \) and \( 4 \,\mathrm{m} \) from the antenna array, which itself is located at \( (x, y) = (0, 0) \). The test set contains all points that are farther than \( 4\,\mathrm{m} \) away from the array. This way, we ensure that the neural network has to infer angles of arrival for datapoints at locations that it has never seen during training. Let's visualize this:
positions_train = np.vstack([pos for csi, pos, angle, dist in training_set_features])
positions_test = np.vstack([pos for csi, pos, angle, dist in test_set_features])

plt.figure(figsize = (8, 8))
plt.title("Training Set and Test Set", fontsize = 16, pad = 16)
plt.axis("equal")
plt.xlim(-6, 0)
plt.scatter(x = positions_train[:,0], y = positions_train[:,1], marker = ".", s = 1000, label = "Training Set")
plt.scatter(x = positions_test[:,0], y = positions_test[:,1], marker = ".", s = 1000, label = "Test Set")
plt.legend(fontsize = 16)
plt.xlabel("$x$ coordinate [m]", fontsize = 16)
plt.ylabel("$y$ coordinate [m]", fontsize = 16)
plt.tick_params(axis = "both", labelsize = 16)
plt.show()
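As an optional sanity check, we can also count how many datapoints ended up in each set. This is just a small sketch; the exact numbers depend on the dataset and are not important for the rest of the tutorial:
train_count = sum(1 for _ in training_set_features)
test_count = sum(1 for _ in test_set_features)
print("Training set size:", train_count, "- test set size:", test_count)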

Neural Network Architecture and Training

We use a simple dense neural network with mean squared error (MSE) loss for the AoA estimate. There are only a few things to pay attention to here: First, we need to make sure to provide only the channel state information features as input and to use only the AoA as the training target. This is why there is a function called only_input_output, which removes all irrelevant information from the dataset. Second, and perhaps less obvious, we need to make sure that there is no discontinuity in the desired AoA values in the dataset. This could occur if the dataset contained angles on both sides of the wrap-around point, e.g., both close to \( 0^\circ \) and close to \( 360^\circ \); the MSE loss would not be suitable under these circumstances. However, thanks to the way the desired (ground truth) azimuth angle was computed earlier, we have already avoided this issue: All angles lie in the continuous range \( (-90^\circ, 90^\circ) \).
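If the dataset did contain such a wrap-around, one possible workaround would be a loss function that is periodic in the angle. The following is merely a sketch of this idea; it is not used anywhere in this tutorial:
def periodic_angle_loss(y_true, y_pred):
	# 1 - cos(error) behaves like error^2 / 2 for small errors (angles in radians),
	# but, unlike plain MSE, it is continuous across the wrap-around point.
	return tf.reduce_mean(1.0 - tf.cos(y_true - y_pred))
The actual model and MSE-based training used in this tutorial are defined as follows: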
nn_input = tf.keras.Input(shape=(32, 32, 2), name = "input") # 32 antennas, 32 averaged subcarrier chunks, real / imaginary part
nn_output = tf.keras.layers.Flatten()(nn_input)

nn_output = tf.keras.layers.Dense(units = 64, activation = "relu")(nn_output)
nn_output = tf.keras.layers.Dense(units = 64, activation = "relu")(nn_output)
nn_output = tf.keras.layers.Dense(units = 64, activation = "relu")(nn_output)
nn_output = tf.keras.layers.Dense(units = 1, activation = "linear", name = "output")(nn_output)
model = tf.keras.Model(inputs = nn_input, outputs = nn_output, name = "AoA_NN")
model.compile(optimizer = tf.keras.optimizers.Adam(), loss = "mse")

def only_input_output(csi, pos, angle, dist):
	return csi, angle

batch_sizes = [32, 64, 256, 1024, 4096]
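# Train in several stages with increasing batch size; this has an effect
# roughly comparable to gradually decaying the learning rate.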
for b in batch_sizes:
	dataset_batched = training_set_features.batch(b)
	test_set_batched = test_set_features.batch(b)
	print("\nBatch Size:", b)
	model.fit(dataset_batched.map(only_input_output), epochs = 10, validation_data = test_set_batched.map(only_input_output))

Performance Evaluation

positions = []
predicted_angles = []
true_angles = []
distances = []

for csi, pos, angle, dist in test_set_features.batch(100):
	positions.append(pos.numpy())
	predicted_angles.append(np.transpose(model.predict(csi))[0])
	true_angles.append(angle.numpy())
	distances.append(dist.numpy())

positions = np.vstack(positions)
predicted_angles = np.hstack(predicted_angles)
true_angles = np.hstack(true_angles)
distances = np.hstack(distances)

errorvectors = np.transpose(distances * np.vstack([-np.cos(predicted_angles), np.sin(predicted_angles)])) - positions
errors_abs_deg = np.rad2deg(np.abs(true_angles - predicted_angles))

We feed the complete test set into the neural network (in batches) and let it predict an azimuth angle estimate for every datapoint. We also store the ground truth positions as well as the true angles and true distances to the antenna array in NumPy arrays.

Based on this information, we can compute error vectors, that is, vectors that point from the ground truth position (as provided in the dataset) to the estimated position. Since we only estimate an angle here, we combine the predicted angle with the true (ground truth) distance to obtain the coordinates of the estimated position that the error vector points at.
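Written as a formula, this restates the computation in the code above: with the angle convention \( \varphi = \mathrm{atan2}(y, -x) \) from the feature extraction step, the estimated position is \( \hat{p} = \left( -d \cos\hat{\varphi},\ d \sin\hat{\varphi} \right) \) and the error vector is \( e = \hat{p} - p \), where \( d \) and \( p \) denote the ground truth distance and position.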

Before taking a closer look at angle estimation errors (and thereby highlighting all the places where AoA estimation does not work so well), let's first see what does work. We can do this by visualizing the estimated AoA over the UE location in a heatmap. Remember, the antenna array is located at \( (x, y) = (0, 0) \).
plt.figure(figsize=(10, 10))
plt.title("AoA Estimate", fontsize = 16, pad = 16)

plt.axis("equal")
plt.xlim(-6, 0)
plt.hexbin(x = positions[:, 0], y = positions[:, 1], C = np.rad2deg(predicted_angles), gridsize = 30)
cb = plt.colorbar()
cb.set_label("AoA Estimate [deg]", fontsize = 16)
plt.xlabel("$x$ coordinate [m]", fontsize = 16)
plt.ylabel("$y$ coordinate [m]", fontsize = 16)
plt.tick_params(axis = "both", labelsize = 16)
plt.show()

Next, we visualize the estimation errors. We could try to plot all the error vectors, but that would simply be too many lines. Instead, we only plot a few vectors (the first 300 entries of the randomly shuffled test set) and again display the absolute estimation errors as a heatmap.

plt.figure(figsize=(10, 10))
plt.title("AoA Estimation Error", fontsize = 16, pad = 16)

plt.axis("equal")
plt.xlim(-6, 0)
plt.hexbin(x = positions[:, 0], y = positions[:, 1], C = errors_abs_deg, gridsize = 30)
plt.quiver(positions[:300, 0], positions[:300, 1], errorvectors[:300, 0], errorvectors[:300, 1], color = "red", angles = "xy", scale_units = "xy", scale = 1)

cb = plt.colorbar()
cb.set_label("AoA Estimation Error [deg]", fontsize = 16)
plt.xlabel("$x$ coordinate [m]", fontsize = 16)
plt.ylabel("$y$ coordinate [m]", fontsize = 16)
plt.tick_params(axis = "both", labelsize = 16)
plt.show()
Clearly, estimation errors are higher in some locations than in others, but overall, most estimates are accurate to within roughly \( 10^\circ \). The comparably poor performance in some places could be due to particularly strong multipath components there. Since the neural network has never seen these areas during training, it does not really get a chance to learn to compensate for them.
Finally, let's have a look at the distribution of angle estimation errors by plotting an error histogram. The histogram confirms that most estimation errors are below \( 10^\circ \), which shows that we achieve at least some generalization capability on unseen data, even with a very simple dense neural network.
plt.figure(figsize=(15, 4))
plt.title("AoA Estimation Error Distibution", fontsize = 16)
plt.xlabel("AoA Estimation Error [deg]", fontsize = 16)
plt.ylabel("Number of Occurences", fontsize = 16)
plt.tick_params(axis = "both", labelsize = 14)
		
plt.hist(errors_abs_deg, bins = 100)
plt.show()
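To complement the histogram, we can also print a few summary statistics of the absolute angle estimation error. The exact values depend on the dataset split and on the training run:
print("Mean absolute error [deg]:", np.mean(errors_abs_deg))
print("Median absolute error [deg]:", np.median(errors_abs_deg))
print("95th percentile [deg]:", np.percentile(errors_abs_deg, 95))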

Licensing and Authors

All our datasets are licensed under the CC-BY license, i.e., you are free to use them for whatever you like as long as you reference us in your publications. All code in this tutorial is CC0-licensed. This tutorial was written jointly by Robin Sauerzapf and Florian Euchner.