Home/Magazine Archive/July 2014 (Vol. 57, No. 7)/JMB: Scaling Wireless Capacity with User Demands/Full Text

Research highlights
# JMB: Scaling Wireless Capacity with User Demands

We present JMB, a *joint* multiuser beamforming system that enables independent access points (APs) to beamform their signals and communicate with their clients on the same channel as if they were one large MIMO transmitter. The key enabling technology behind JMB is a new low-overhead technique for synchronizing the phase of multiple transmitters in a distributed manner. The design allows a wireless LAN to scale its throughput by continually adding more APs on the same channel. JMB is implemented and tested with both software radio clients and off-the-shelf 802.11n cards, and evaluated in a dense congested deployment resembling a conference room. Results from a 10-AP software-radio testbed show a linear increase in network throughput with a median gain of 8.1–9.4×. Our results also demonstrate that JMB's joint multiuser beamforming can provide throughput gains with unmodified 802.11n cards.

Wireless spectrum is limited; wireless demands can, however, grow unlimited. Busy Wi-Fi networks, for instance, in conference rooms, hotels, and enterprises are unable to keep up with user demands,^{10, 24} even causing high-profile failures like the wireless network collapse during the Steve Jobs iPhone 4 keynote. Cellular networks are in a similar predicament, with their demands forecast to exceed available capacity within the next few years.^{20} This is not for lack of improvement in the performance of wireless devices. Indeed, individual devices have improved dramatically in recent years through innovations like multi-antenna systems, better hardware, and lower receiver noise. The problem, however, is that there is a mismatch between the way user demands scale and network throughput scales; user demands scale with the number of devices in the network but network throughput does not. Unless network throughput also scales with the number of devices, wireless networks will always find it hard to keep up with their demands, and the projected demands will keep exceeding the projected capacity.

In this paper, we present a system that enables a network to scale its throughput with the number of transmitting devices. We focus on the scenario of typical busy wireless environments such as multiple users in a conference room, enterprise, hotel, etc. We enable a wireless LAN to keep increasing its total throughput by continuously adding more access points (APs) on the same channel.

The key idea behind our system is joint multiuser beamforming (JMB). Multiuser beamforming is a known technique that enables a MIMO transmitter to deliver multiple independent streams (i.e., packets) to receivers that have fewer antennas, as shown in Figure 1(a), where a 2-antenna access point delivers two packets concurrently to two single antenna receivers. In contrast, as shown in Figure 1(b), JMB enables multiple access points on the same channel to deliver their packets concurrently to multiple receivers, without interfering with each other. This system scales network throughput with the number of devices and delivers as many concurrent streams/packets as the total number of antennas on all APs. Furthermore, it leverages the continuing performance and reliability improvements of individual devices (e.g., more antennas per device).

The main challenge in implementing JMB stems from the need to synchronize the phases of distributed transmitters. Specifically, the goal of beamforming is to ensure that each client can decode its intended signal without interference. Thus, at each client, the signals intended for the other clients have to cancel each other out. This requires the transmitters to control the relative phases of their transmitted signals so that the desired cancellation can be achieved. Such a requirement is naturally satisfied in the case of a single device performing multiuser beamforming. However, in the case of JMB, the transmitters have independent oscillators, which are bound to have differences in their carrier frequencies. If one simply tries to jointly beamform these independent signals from different transmitters, the drift between their oscillators will make the signals rotate at different speeds relative to each other, causing the phases to diverge and hence preventing beamforming.

At first blush, it might seem that it would be sufficient to estimate the frequency offset Δ*f* (i.e., the drift) between the transmitters, and compensate for the beamforming phase errors as Δ*φ* = 2πΔ*f* *t*, where *t* is the elapsed time. However, such an approach is not practical. It is well known^{9} that frequency offset estimates have errors due to noise, and using such estimates to compute phases causes rapidly accumulating errors over time. Even a small error of, say, 10 Hz (4 × 10^{−3} ppm, which is several orders of magnitude smaller than the mandated 802.11 tolerance of 20 ppm, or cellular tolerance of 12 ppm) can lead to a large error of 20 degrees (0.35 radians) within a short time interval of 5.5 ms. Such a large error in the phase of the beamformed signals will cause significant interference at the receivers, preventing them from decoding.
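The arithmetic behind this example can be checked with a short sketch. It assumes the phase error grows as 2π times the frequency error times the elapsed time; the function names and the 2.4 GHz carrier used for the ppm conversion are illustrative assumptions, not values taken from the paper.

```python
import math

def phase_error_deg(freq_error_hz: float, elapsed_s: float) -> float:
    """Phase error in degrees accumulated by a residual frequency error."""
    # 2*pi*df*t radians is the same as 360*df*t degrees.
    return 360.0 * freq_error_hz * elapsed_s

def freq_error_ppm(freq_error_hz: float, carrier_hz: float) -> float:
    """Frequency error expressed in parts per million of the carrier."""
    return freq_error_hz / carrier_hz * 1e6

CARRIER_HZ = 2.4e9  # assumed carrier frequency (hypothetical, for illustration)

error_deg = phase_error_deg(10.0, 5.5e-3)
error_rad = math.radians(error_deg)
error_ppm = freq_error_ppm(10.0, CARRIER_HZ)
print(f"{error_deg:.1f} degrees, {error_rad:.2f} radians, {error_ppm:.1e} ppm")
```

Under these assumptions, a 10 Hz error over 5.5 ms comes to roughly 20 degrees, which matches the figure quoted above.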

JMB presents a simple, practical approach for synchronizing phases of multiple distributed transmitters. Its key idea is to elect one of the APs as a lead and use its phase as a reference for the whole system. Other APs (i.e., the slaves) directly measure the phase of the lead AP and change the phase of their signals to maintain a desired alignment with respect to the lead. In particular, JMB precedes every data packet with a couple of symbols transmitted by the lead AP. The slave APs use these symbols to directly measure the required phase correction for proper beamforming. Since this is a direct phase measurement as opposed to a prediction based on frequency offsets, it has no accumulated errors. After correcting for this phase error, the slave APs use the estimate for their frequency offset to predict any phase changes throughout the packet and correct for it. This bounds the maximum phase error accumulation to the duration of a packet. One can use a simple long-term average for the frequency offset to ensure that the phase error accumulated for the duration of a packet is within the desired performance bounds.

In the rest of the paper, we expand on this basic idea and demonstrate that it can deliver accurate joint beamforming across distributed transmitters. Further, we also extend this idea to work with off-the-shelf 802.11n cards. This would allow organizations to directly leverage JMB by simply upgrading their AP infrastructure, without requiring any modification to the clients.

We implemented JMB in two environments:

- The first environment consists of USRP2 APs and receivers, where both APs and clients can be modified. Here, we verify the scaling properties of JMB and also perform finer grained analysis of its components.
- The second environment consists of USRP2 APs and receivers with Intel Wi-Fi Link 5300 adapters. Each AP consists of two USRP2s connected via an external clock and configured to act as a 2-antenna MIMO AP. Correspondingly, each receiver Wi-Fi card has two antennas enabled. Here, we verify that JMB can provide throughput gains with off-the-shelf 802.11n cards, and further that it can provide these gains with multi-antenna devices.

We evaluated JMB in an indoor testbed using APs and receivers deployed densely in a room to simulate a conference room scenario. Our results reveal the following findings:

- **USRP testbed**: JMB's throughput increases linearly with the number of APs. In particular, in our testbed, which has 10 APs, JMB can achieve a median throughput gain of 8.1–9.4× over traditional 802.11 unicast, across the range of 802.11 signal to noise ratios (SNRs).
- **802.11 testbed**: JMB's ability to linearly scale the network throughput with the number of transmitters applies to off-the-shelf 802.11 clients. Specifically, JMB can transmit simultaneously from two 2-antenna APs to two 2-antenna 802.11n clients to deliver a median throughput gain of 1.8× compared to traditional 802.11n.

**Contributions**: This work presents the first system that scales wireless throughput by enabling joint beamforming from distributed independent transmitters. We achieve this by designing a simple, practical approach for phase synchronization across multiple distributed transmitters. We also show that our system can deliver throughput gains from joint beamforming with off-the-shelf 802.11n cards.

The full version of the paper^{18} has a detailed survey of related work. In this version, we provide a brief overview.

Prior empirical systems that attempt to perform distributed multiuser beamforming ^{4, 15, 19} require tight synchronization using global positioning system (GPS) clocks or a shared oscillator, or joint decoding by exchanging received signals. Other systems that allow multiple nodes to transmit simultaneously, such as MU-MIMO in LTE,^{12} SAM,^{21} and multiuser beamforming,^{1} provide only constant throughput gain and do not scale with the number of APs in the system. A third strand of work harnesses channel diversity gains using systems like distributed antennas and SourceSync^{3, 17} or provides directional gains using phased arrays,^{6} but cannot provide multiplexing gains and hence cannot scale throughput with the number of APs in the system. In contrast to all these systems, JMB empirically achieves tight phase synchronization using independent oscillators at the devices in the network, allows devices to work independently without sharing clock signals, and scales throughput linearly with the number of APs in the system. Further, it can work with off-the-shelf 802.11n cards.

Prior theoretical work^{2, 22} on distributed phase synchronization assumes synchronous oscillators and only provides one-time phase offset calibration. Prior theory^{16} also proves that distributed MIMO scales wireless capacity with the number of nodes. While JMB builds on this foundational work, JMB is the first empirical system that shows linear scaling of throughput with the number of transmitters in practical systems with unsynchronized oscillators and resulting time-varying phase differences.

JMB is designed for the wireless downlink. It is applicable to wireless LANs, especially in dense deployments like enterprises, hotels, and conference rooms. JMB APs can operate with off-the-shelf Wi-Fi client hardware. Our techniques are applicable to cellular networks, but the details are beyond the scope of this paper.

JMB APs are connected by a high-throughput backend, say, Gigabit Ethernet, like APs are today. Packets intended for receivers are distributed to all APs over the shared backend. JMB enables the APs to transmit concurrently to multiple clients as if they were one large MIMO node, potentially delivering as many streams (i.e., packets) as the total number of antennas on all APs.

In the next few sections, we describe how JMB works. We start with the basic idea that enables distributed phase synchronization. We then describe our protocol implementing this basic idea for emulating a large MIMO node. We then extend our system to integrate our design with off-the-shelf Wi-Fi cards.

The chief goal of distributed phase synchronization is to enable different transmitters powered by different oscillators to emulate a single multi-antenna transmitter where all antennas are driven by the same oscillator. Intuitively our solution is simple: We declare one transmitter the lead, and make all other transmitters synchronize to the oscillator of the lead transmitter, that is, each transmitter measures the offset between its oscillator and the lead oscillator and compensates for the offset by appropriately correcting the phase of its transmitted signal. This behavior makes all transmitters act as if they were antennas on the same chip controlled by the same oscillator.

We now demonstrate how this intuitive design can deliver the proper MIMO behavior and hence enable each receiver to correctly decode its intended signal without interference. For simplicity, we consider a scenario of two single-antenna APs transmitting to two single-antenna clients, as shown in Figure 2. Let *h*_{ij}, where *i, j* ∈ {1, 2}, be the channel to client *i* from AP *j*, *x*_{j}(*t*) the symbol that needs to be delivered to client *j* at time *t*, and *y*_{j}(*t*) the symbol that is received by client *j* at time *t*. Correspondingly, let **H** = [*h*_{ij}], *i, j* ∈ {1, 2}, be the 2 × 2 channel matrix, $\mathbf{x}(t) = [x_1(t), x_2(t)]^T$ be the desired symbol vector, and $\mathbf{y}(t) = [y_1(t), y_2(t)]^T$ be the received symbol vector.

**No oscillator offset**: Assume first that there are no oscillator offsets between any of the APs and clients. If each AP *i* simply transmits the signal *x*_{i}(*t*), each client will receive a linear combination of the transmitted signals. Since each client has only one antenna, client 1 receives *y*_{1}(*t*) = *h*_{11}*x*_{1}(*t*) + *h*_{12}*x*_{2}(*t*) and client 2 receives *y*_{2}(*t*) = *h*_{21}*x*_{1}(*t*) + *h*_{22}*x*_{2}(*t*). Each of these equations has two unknowns, and hence, neither client can decode its intended data.

In order to deliver two concurrent packets to the two clients, the APs need to ensure that each client receives only the signal intended for it (i.e., it experiences no interference from the signal intended for the other client). Specifically, we need the effective channel experienced by the transmitted signal to be diagonal, that is, it should satisfy:

$$\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} = \mathbf{G} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}, \qquad \mathbf{G} = \begin{pmatrix} g_{11} & 0 \\ 0 & g_{22} \end{pmatrix} \tag{1}$$

where *g*_{11} and *g*_{22} are any nonzero complex numbers. In this case, the received signal will simply appear at each receiver as if it has experienced the channel *g*_{ii}, which each receiver can estimate using standard techniques.

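The APs can achieve this result by using *beamforming*. In beamforming, the APs measure all the channel coefficients from the transmitters to the receivers at time 0. Then, instead of transmitting *x*_{1}(*t*) and *x*_{2}(*t*) directly, the APs transmit:^{a}

$$\begin{pmatrix} s_1(t) \\ s_2(t) \end{pmatrix} = \mathbf{H}^{-1} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$

In this case, the two clients receive:

$$\begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} = \mathbf{H}\,\mathbf{H}^{-1} \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$

Since $\mathbf{H}\mathbf{H}^{-1} = \mathbf{I}$, the effective channel is diagonal, and each client receives only its intended symbol, free of interference from the other.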
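As a minimal sketch of this step, the snippet below precodes two symbols with the inverse of a 2 × 2 channel matrix and checks that each client then sees only its own symbol. The channel coefficients and symbols are arbitrary illustrative values, and the helper names `inv2` and `matvec` are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Illustrative channel: H[i][k] is the channel to client i from AP k.
H = [[0.9 + 0.2j, 0.3 - 0.5j],
     [-0.4 + 0.6j, 0.8 + 0.1j]]
x = [1 + 1j, -1 + 1j]          # symbols intended for client 0 and client 1

y_plain = matvec(H, x)         # no beamforming: each client sees a mixture
s = matvec(inv2(H), x)         # beamformed transmit signal, s = H^-1 x
y_beam = matvec(H, s)          # what the clients receive: H H^-1 x = x

print(y_plain)
print(y_beam)
```

Without precoding, each received value mixes both symbols; with precoding, the received vector equals the intended symbols up to floating-point error.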

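**With oscillator offset**: What happens when the oscillators of the APs and clients have different frequencies? Let *ω*_{Ti} be the oscillator frequency of AP *i*, and *ω*_{Rj} the oscillator frequency of client *j*, *i, j* ∈ {1, 2}. In this case, the channel at time *t*, **H**(*t*), can be written as:

$$\mathbf{H}(t) = \begin{pmatrix} h_{11}\,e^{j(\omega_{T1} - \omega_{R1})t} & h_{12}\,e^{j(\omega_{T2} - \omega_{R1})t} \\ h_{21}\,e^{j(\omega_{T1} - \omega_{R2})t} & h_{22}\,e^{j(\omega_{T2} - \omega_{R2})t} \end{pmatrix}$$

where $j = \sqrt{-1}$. Because the oscillators rotate with respect to each other, the channel no longer has a fixed phase.

Now, if the APs try to perform beamforming as before, using the channel value they computed at time *t* = 0 and transmitting $\mathbf{H}^{-1}\mathbf{x}(t)$, the clients receive:

$$\mathbf{y}(t) = \mathbf{H}(t)\,\mathbf{H}^{-1}\,\mathbf{x}(t)$$

where $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are the desired and received symbol vectors. The product **H**(*t*)**H**^{−1} is no longer diagonal, and hence the receivers cannot decode their intended signal. Thus, standard MIMO beamforming does not work in this case.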

So, how can one do beamforming with such a time-varying channel? A naive approach would try to make each transmitter compute **H**(*t*) at every *t* and then multiply its time signal by **H**(*t*)^{−1}. Say that the network has *N* APs and *N* clients. Then such an approach would require each transmitter to maintain accurate estimates of *N*^{2} frequency offsets of the form Δ*ω*_{ij} = *ω*_{Tj} − *ω*_{Ri}. (Further, since nodes can only measure offsets relative to other nodes, but not the absolute frequencies of their oscillators, the number of estimates cannot be reduced to *N*.) Measurement errors from all of these estimates will accumulate, degrade the accuracy of beamforming, and create interference at the receivers. However, according to our initial intuition, we can make multiple transmitters act as if they were one MIMO node, and hence do accurate beamforming, by having each transmitter estimate only its frequency offset to the lead transmitter. Said differently, our intuition tells us that it should be possible to reduce the number of frequency offset estimates that each transmitter maintains from *N*^{2} to 1. Let us see how we can achieve this goal.

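Observe that we can decompose the channel matrix at time *t* as **H**(*t*) = **R**(*t*)**H** **T**(*t*), where **H** is time invariant and **R**(*t*) and **T**(*t*) are diagonal matrices defined as:

$$\mathbf{R}(t) = \begin{pmatrix} e^{-j\omega_{R1}t} & 0 \\ 0 & e^{-j\omega_{R2}t} \end{pmatrix}, \qquad \mathbf{T}(t) = \begin{pmatrix} e^{j\omega_{T1}t} & 0 \\ 0 & e^{j\omega_{T2}t} \end{pmatrix}$$

Since **R**(*t*) is diagonal, it can function analogously to the **G** matrix in Equation (1). Thus, if the transmitters transmit the modified signal $\mathbf{T}(t)^{-1}\mathbf{H}^{-1}\mathbf{x}(t)$ at time *t*, where $\mathbf{x}(t)$ is the desired symbol vector, then the received signal can be written as:

$$\mathbf{y}(t) = \mathbf{R}(t)\,\mathbf{H}\,\mathbf{T}(t)\,\mathbf{T}(t)^{-1}\,\mathbf{H}^{-1}\,\mathbf{x}(t)$$

which reduces to the desired form of Equation (1):

$$\mathbf{y}(t) = \mathbf{R}(t)\,\mathbf{x}(t)$$

Note that **T**(*t*) is also diagonal, and as a result the transmitter phase correction matrix

$$\mathbf{T}(t)^{-1} = \begin{pmatrix} e^{-j\omega_{T1}t} & 0 \\ 0 & e^{-j\omega_{T2}t} \end{pmatrix}$$

is also diagonal. Further, the phase correction entry for each AP depends only on the oscillator phase of that AP. This means that if each AP, *i*, knows its phase, $e^{j\omega_{Ti}t}$, at time *t*, it can simply compensate for that phase and the AP will not need any additional frequency or phase measurements. Unfortunately, this is not practical. An AP has no way to measure the exact phase change of its oscillator locally.

We address this difficulty by observing that the channel equation is unchanged when we multiply by $e^{j\omega_{T1}t}\,e^{-j\omega_{T1}t} = 1$, that is,

$$\mathbf{H}(t) = \left[e^{j\omega_{T1}t}\,\mathbf{R}(t)\right] \mathbf{H} \left[e^{-j\omega_{T1}t}\,\mathbf{T}(t)\right] = \begin{pmatrix} e^{j(\omega_{T1} - \omega_{R1})t} & 0 \\ 0 & e^{j(\omega_{T1} - \omega_{R2})t} \end{pmatrix} \mathbf{H} \begin{pmatrix} 1 & 0 \\ 0 & e^{j(\omega_{T2} - \omega_{T1})t} \end{pmatrix}$$

Taking AP 1 as the lead, the transmitter-side correction now requires nothing from the lead and only the factor $e^{j(\omega_{T1} - \omega_{T2})t}$ from the slave AP, which depends solely on the slave's offset from the lead. Since the new observed channel matrix is still diagonal, the clients can still continue to decode the received signal as before.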

The resulting system implements our initial intuition.
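The following sketch illustrates the argument numerically for two APs and two clients. It assumes AP 0 is the lead, uses arbitrary illustrative channel values and oscillator offsets, and treats the slave's offset to the lead as known exactly; all names are hypothetical. It compares naive beamforming with the stale inverse against beamforming where the slave first rotates its signal by its phase offset to the lead.

```python
import cmath
import math

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def channel_at(h, w_tx, w_rx, t):
    """Entry (i, k) rotates by the AP-k minus client-i oscillator offset."""
    return [[h[i][k] * cmath.exp(1j * (w_tx[k] - w_rx[i]) * t)
             for k in range(2)] for i in range(2)]

H = [[0.9 + 0.2j, 0.3 - 0.5j], [-0.4 + 0.6j, 0.8 + 0.1j]]   # measured at t = 0
W_TX = [2 * math.pi * f for f in (0.0, 1300.0)]    # AP offsets (rad/s), AP 0 leads
W_RX = [2 * math.pi * f for f in (-800.0, 2300.0)]  # client offsets (rad/s)
T_ELAPSED = 1e-3                                    # seconds since measurement

H_t = channel_at(H, W_TX, W_RX, T_ELAPSED)
H_inv = inv2(H)

# Naive: reuse the inverse computed at t = 0 with no phase correction.
naive = matmul(H_t, H_inv)

# Lead-referenced: AP k rotates its row of the precoder by exp(j(w_lead - w_k)t).
slave_corr = [cmath.exp(1j * (W_TX[0] - W_TX[k]) * T_ELAPSED) for k in range(2)]
precoder = [[slave_corr[k] * H_inv[k][j] for j in range(2)] for k in range(2)]
synced = matmul(H_t, precoder)

print("naive off-diagonal magnitude:", abs(naive[0][1]))
print("synced off-diagonal magnitude:", abs(synced[0][1]))
```

With the correction applied, the effective channel is diagonal with unit-magnitude entries, so each client sees only a phase rotation that it can estimate with standard techniques. The naive product retains substantial off-diagonal terms, which appear as interference.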

We start by describing the protocol at a high level and follow by the detailed explanation. JMB's distributed transmission protocol works in two phases:

- JMB starts with a *channel measurement phase*, in which the APs measure two types of channels: (1) the channels from themselves to the receivers (i.e., the channel matrix **H**), which is the beamforming channel matrix whose inverse the APs use to transmit data concurrently to their clients; and (2) the channels from the lead AP to each slave AP (the *h*_{i}^{lead}'s), which enable each slave AP to determine its relative oscillator offset from the lead AP.
- The channel measurement phase is followed by the *data transmission phase*. In this phase, the APs transmit jointly to deliver concurrent packets to multiple receivers. Data transmission uses beamforming after having each slave AP correct for its frequency offset with respect to the lead AP.

Note that a single channel measurement phase can be followed by multiple data transmissions. Channels only need to be recomputed on the order of the coherence time, which is several hundreds of milliseconds in typical indoor scenarios.^{5} Section 7 describes how JMB reduces channel measurement overhead in greater detail.

We now describe the channel measurement and data transmission phases in greater detail. (The description below assumes symbol level time synchronization, for which we use the scheme in Rahul et al.,^{17} which provides tight synchronization up to a few nanoseconds. Our experimental results also incorporate an implementation of that scheme.)

**5.1. Channel measurement**

The goal of channel measurement is to obtain a snapshot of the channels from all APs to all clients, that is, **H**, and the reference channels from the lead AP to the slave APs, that is, the *h*_{i}^{lead}, ∀*i*.

The key point is that *all these channels have to be measured at the same time*, which is the reference time *t* = 0. Otherwise the channels would rotate with respect to each other due to frequency offsets and hence be inconsistent. Below, we divide channel measurement into a few subprocedures.

**Collecting measurements**. The lead AP starts the channel measurement phase with a synchronization header, followed by channel measurement symbols, that is, known orthogonal frequency division multiplexing (OFDM) symbols that the clients can use to estimate the channel. The channel measurement symbols are separated by a constant gap, whose value is chosen to permit the slave APs to send their channel measurement symbols interleaved with the symbols from the lead AP. When the slave APs hear the synchronization header, they know to transmit their channel measurement symbols in the gap, one after another, as shown in Figure 3.

Thus, channel measurement symbols are repeated and interleaved. They are repeated to enable the clients to obtain accurate channel measurements by averaging multiple estimates to reduce the impact of noise. They are interleaved because we want the channels to be measured as if they were measured at the same time. Since exactly simultaneous transmissions will lead the APs to interfere with each other, JMB performs a close approximation to simultaneous transmission by interleaving symbols from different APs.

**Estimating H at the clients**. Upon reception of the packet in Figure 3, each client performs three tasks: it computes its carrier frequency offset (CFO) to each AP; it then uses its knowledge of the transmitted symbols and the CFO to compute the channel from each AP to itself; and finally it uses its knowledge of the CFOs to rotate the phase of the channels so that they look as if they were measured exactly at the same time. We detail these tasks below.

Different transmitters (i.e., APs) have different oscillator offsets to receivers, and each receiver needs to measure the frequency offset from each transmitter to correct the corresponding symbols from that transmitter appropriately. To enable this, the channel measurement transmission uses CFO symbols from each AP followed by channel estimation symbols similar to traditional OFDM.^{9} The only departure is that the receiver computes and uses different CFO and channel estimates for symbols corresponding to different APs.

Note that these channel estimates are still not completely simultaneous; in particular, the channel estimation symbol of slave AP *i* is separated from the symbol of the lead AP by *i* − 1 symbol widths, as shown in Figure 3. The receiver compensates for this by rotating the estimated channel for AP *i* (in each OFDM subcarrier) by a phase factor that undoes the rotation its CFO to AP *i* has accumulated since the reference time. The elapsed time is determined by *T*, the duration of one OFDM symbol, *k*, the index of the interleaved symbol, and *D*, the duration of the lead AP synchronization header. This ensures that all channels are measured at one reference time, which is the start of the synchronization header. The receiver averages the channel estimates (in each OFDM subcarrier) from each AP to cancel out the noise and obtain an accurate estimate. The receivers then communicate these estimated channels back to the transmitters over the wireless channel.
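The rotation to a common reference time can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact expression: it assumes the channel seen by the receiver rotates as exp(*j*·CFO·*t*) from the reference time, and the symbol timing layout (header duration, symbol duration, number of APs, number of repetitions) is a hypothetical one chosen for the example.

```python
import cmath
import math

def refer_to_reference(h_est, cfo_rad_s, t_meas_s):
    """Undo the rotation exp(j*cfo*t) accumulated since reference time 0."""
    return h_est * cmath.exp(-1j * cfo_rad_s * t_meas_s)

def averaged_channel(estimates, cfo_rad_s, times_s):
    """Refer every repeated estimate to time 0, then average them."""
    referred = [refer_to_reference(h, cfo_rad_s, t)
                for h, t in zip(estimates, times_s)]
    return sum(referred) / len(referred)

# Hypothetical timing layout for one subcarrier of one AP.
SYMBOL_S = 4e-6      # assumed OFDM symbol duration
HEADER_S = 16e-6     # assumed synchronization header duration
NUM_APS = 3          # APs taking turns in each interleaving round
AP_INDEX = 2         # 0 is the lead AP
REPEATS = 4          # number of interleaved repetitions

cfo = 2 * math.pi * 2000.0        # assumed receiver CFO to this AP (rad/s)
h_true = 0.7 - 0.3j               # channel at the reference time

times = [HEADER_S + (k * NUM_APS + AP_INDEX) * SYMBOL_S for k in range(REPEATS)]
measured = [h_true * cmath.exp(1j * cfo * t) for t in times]  # noiseless here

h_ref = averaged_channel(measured, cfo, times)
print(h_ref)
```

In this noiseless example the referred average recovers the reference-time channel exactly; with noise, averaging the referred estimates reduces the error, whereas averaging the raw rotated estimates would not.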

**Estimating the *h*_{i}^{lead}'s at the slave APs**. Each slave AP uses the synchronization header to compute the value of the channel from the lead AP to itself at the reference time, *h*_{i}^{lead}(0).

Note that at the end of the channel measurement phase, each slave AP *i* has the entire channel matrix to be used for beamforming, as well as a reference channel, *h*_{i}^{lead} (0), from the lead AP which it will use during data transmissions, with all channels measured with respect to one reference time.

**5.2. Data transmission**

Now that the channels are measured, the APs can use beamforming to transmit data concurrently without interference.

- **AP coordination**: The APs need to agree on which packets are sent concurrently in one beamforming frame. To do this we leverage the bandwidth of the backend Gigabit Ethernet to send all client packets to all APs. The lead AP makes all control decisions and communicates them to the slave APs over the Ethernet. In particular, it determines which packets will be combined in a data transmission and communicates this to the slave APs over the wired backend.
- **Beamforming**: Client packets are transmitted by joint beamforming from the JMB APs participating in the system. Note that slave APs need to correct the phase of their signal prior to transmission. One way to do this would be for each slave to estimate the frequency offset Δ*ω* = *ω*_{lead} − *ω*_{slave} from the lead to itself (using the synchronization header from the previous phase) and then compute the net elapsed phase by calculating (*ω*_{lead} − *ω*_{slave})*t*, where *t* is the time elapsed since the channel measurement was taken. However, this would lead to large accumulated errors over time because of inaccuracies in the initial frequency offset measurement. For example, even a small error of 100 Hz in the measurement of the initial frequency offset can lead to a large phase error of several radians in as short a timespan as 20 ms, and hence significantly affect the phase alignment required for correct beamforming. Unless addressed, this error would prevent JMB from amortizing the cost of a single channel measurement over the coherence time of the channel, for example, 250 ms, and would force the system to repeat the process of measuring **H** every few milliseconds, which means incurring the overhead of communicating the channels from all clients to the APs almost every packet.

JMB avoids this issue of accumulating error over large timescales by directly measuring the phase difference between the lead AP and the slave AP. Said differently, instead of multiplying the frequency offset Δ*ω* (= *ω*_{lead} − *ω*_{slave}) by the elapsed time (which leads to errors that accumulate over time), JMB directly measures the phase difference Δ*φ*(*t*) (= (*ω*_{lead} − *ω*_{slave})*t*).

In JMB the lead AP initiates data transmission using a synchronization header, as in channel estimation. Each slave AP uses this synchronization header to measure the current channel, *h*_{i}^{lead}(*t*), from the lead AP to itself. Note that the current channel will be rotated relative to the reference channel because of the oscillator offset between the lead AP and slave AP. In particular, $h_i^{lead}(t) = h_i^{lead}(0)\,e^{j(\omega_{T1} - \omega_{Ti})t}$, where *ω*_{T1} is the oscillator frequency of the lead AP and *ω*_{Ti} that of slave AP *i*. Each slave can therefore compute $e^{j(\omega_{T1} - \omega_{Ti})t}$ directly, from its two measurements of the lead AP channel. Such an estimate does not have errors that accumulate over time because it is purely a division of two direct measurements. The slave then multiplies its transmitted signal by this quantity, as described in Section 4.
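A small sketch contrasts the two ways of obtaining the phase correction. It assumes noiseless channel measurements, an illustrative true offset of 3 kHz, a frequency estimate that is wrong by 10 Hz, and 20 ms elapsed since the reference measurement; the function names are hypothetical.

```python
import cmath
import math

def direct_phase_correction(h_now, h_ref):
    """Unit-magnitude phase factor from the ratio of two direct measurements."""
    ratio = h_now / h_ref
    return ratio / abs(ratio)

def predicted_phase_correction(offset_est_hz, elapsed_s):
    """Phase factor extrapolated from a frequency offset estimate."""
    return cmath.exp(2j * math.pi * offset_est_hz * elapsed_s)

TRUE_OFFSET_HZ = 3000.0                 # lead minus slave oscillator offset
EST_OFFSET_HZ = TRUE_OFFSET_HZ + 10.0   # estimate with a 10 Hz error
ELAPSED_S = 20e-3                       # time since the reference measurement

h_ref = 0.6 + 0.4j                      # lead-to-slave channel at time 0
truth = cmath.exp(2j * math.pi * TRUE_OFFSET_HZ * ELAPSED_S)
h_now = h_ref * truth                   # channel seen in the sync header

direct = direct_phase_correction(h_now, h_ref)
predicted = predicted_phase_correction(EST_OFFSET_HZ, ELAPSED_S)

err_direct = abs(cmath.phase(direct / truth))
err_predicted = abs(cmath.phase(predicted / truth))
print(f"direct: {err_direct:.3e} rad, predicted: {err_predicted:.3f} rad")
```

Here the extrapolated correction is off by about 1.26 radians (72 degrees), while the ratio of the two measurements recovers the phase regardless of how much time has elapsed. In practice the direct measurement is limited by noise in the two channel estimates rather than by elapsed time.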

Now that all AP oscillators are synchronized at the beginning of the data transmission, the slave AP also needs to keep its oscillator synchronized with the lead transmitter through the actual data packet itself. It does this by multiplying its transmitted signal by $e^{j(\omega_{T1} - \omega_{Ti})t}$, computed from its estimate of the frequency offset to the lead, where *t* is the time since the initial phase synchronization at the beginning of the joint transmission. Note that this offset estimate only needs to be accurate within the packet, that is, for a few hundred microseconds or about 2 ms at most. JMB APs maintain a continuously averaged estimate of their offset with the lead transmitter across multiple transmissions to obtain a robust estimate that can maintain accurate phase synchronization within a packet.
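One way to picture the averaged estimate is a running mean over per-packet offset measurements, as in the sketch below. The source does not specify the averaging scheme, so the running mean, the class name `OffsetTracker`, and the sample measurements are all assumptions made for illustration.

```python
import cmath
import math

class OffsetTracker:
    """Running mean of per-packet frequency offset measurements (in Hz)."""

    def __init__(self):
        self.count = 0
        self.mean_hz = 0.0

    def update(self, measured_hz):
        """Fold one new measurement into the long-term average."""
        self.count += 1
        self.mean_hz += (measured_hz - self.mean_hz) / self.count
        return self.mean_hz

    def in_packet_correction(self, t_s):
        """Phase factor applied t_s seconds after the per-packet resync."""
        return cmath.exp(2j * math.pi * self.mean_hz * t_s)

def worst_case_phase_error_deg(residual_hz, packet_s):
    """Phase error accumulated by a residual offset error over one packet."""
    return 360.0 * residual_hz * packet_s

tracker = OffsetTracker()
for measurement_hz in (3012.0, 2991.0, 3004.0, 2993.0):  # hypothetical samples
    tracker.update(measurement_hz)

print(tracker.mean_hz)
print(worst_case_phase_error_deg(10.0, 2e-3))
```

Because the phase is resynchronized at the start of every packet, a residual error only accumulates over the packet duration: a 10 Hz residual over a 2 ms packet gives about 7 degrees, compared with 72 degrees if the same error were extrapolated over 20 ms.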

Two additional points are worth noting. First, for ease of exposition, we have discussed the entire system so far in the context of correcting carrier frequency offsets. However, any practical wireless system has to also account for the sampling frequency offsets. Note that any offset in the sampling frequency just adds to the phase error in each OFDM subcarrier. Since our phase offset estimation using the synchronization header, described in Section 5, estimates the overall phase, it automatically accounts for the initial phase error accumulated from sampling frequency offset. Within each packet, the JMB slave APs correct for the effect of sampling frequency offset during the packet by using a long-term averaged estimate, similar to the carrier frequency offset.

Second, as mentioned earlier, in Section 5, JMB APs are synchronized in time using Rahul et al.^{17} As described in Rahul et al.,^{17} due to differences in propagation delays between different transmitters and different receivers, one cannot synchronize all transmitted signals to arrive exactly at the same time at all receivers. It is important to note that JMB works correctly even in the presence of different propagation delays between different transmitters and receivers. This is because the signals from different JMB APs will arrive within a cyclic prefix of each other at all receivers.^{b} The delay differences between the signals from different APs at a receiver translate to a relative phase difference between the channels from these APs to that receiver. JMB's channel measurement phase captures these relative phase differences in the channel matrix, and JMB's beamforming then applies the effect of these phase differences while computing the inverse of the channel matrix.
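The relation between a delay difference and a per-subcarrier phase can be sketched as follows. The sketch assumes standard 20 MHz 802.11 OFDM numerology (312.5 kHz subcarrier spacing, 0.8 µs cyclic prefix) and ignores multipath delay spread, which in practice also consumes part of the cyclic prefix; the function names and delay values are illustrative.

```python
import cmath
import math

SUBCARRIER_SPACING_HZ = 312.5e3   # 20 MHz divided by 64 subcarriers
CYCLIC_PREFIX_S = 0.8e-6          # standard 802.11 OFDM guard interval

def delay_phase(subcarrier_index, delay_s):
    """Phase factor that a pure delay applies to one baseband subcarrier."""
    freq_hz = subcarrier_index * SUBCARRIER_SPACING_HZ
    return cmath.exp(-2j * math.pi * freq_hz * delay_s)

def within_cyclic_prefix(arrival_delays_s):
    """True if all arrivals fall within one cyclic prefix of each other."""
    return max(arrival_delays_s) - min(arrival_delays_s) < CYCLIC_PREFIX_S

delays = [10e-9, 60e-9]           # hypothetical propagation delays from two APs
print(within_cyclic_prefix(delays))
print(cmath.phase(delay_phase(20, delays[1] - delays[0])))
```

Since the delay only multiplies each subcarrier's channel by a unit-magnitude factor, it is absorbed into the measured channel matrix, which is why the channel measurement phase captures it without extra machinery.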

**5.3. Overarching principles**

In summary, the core challenge met by JMB's design is to accurately estimate and track the phase differences between each of the *N* clients and *N* APs. This challenge is particularly arduous for two reasons: (1) each receiver must simultaneously track the phase of *N* independent transmitters, and (2) errors in the CFO estimates result in phase offsets that accumulate over time, quickly leading to very large errors. Our general approach to tackling these challenges is to have all transmitters and receivers synchronize their phase to that of a single lead transmitter. Our implementation of this approach has been guided by the following three overarching principles:

- **Between APs and within a packet we can use estimated frequency and sampling offsets to track phase**: We can measure the frequency and sampling offsets *between APs* accurately enough that the accumulated phase differences within a single packet (tens to a few hundreds of *microseconds*) are not significant enough to harm performance. Specifically, since APs are a part of the infrastructure, and CFOs do not change significantly over time, we can get very accurate estimates of the CFO between APs by averaging over samples taken across many packets.
- **Between APs and *across* packets we *cannot* use estimated frequency and sampling offsets to track phase**: The across-packet time scales (tens to hundreds of *milliseconds*) are large enough that even with extremely accurate estimates of the frequency and sampling offsets, the accumulated phase differences from residual errors will lead to significant performance degradation. To handle this, JMB uses a single header symbol to directly estimate the total *phase offset* and resync the phases of all nodes at the beginning of each packet.
- **Between a client and an AP we *cannot* use estimated frequency and sampling offsets to track phase even through a packet**: Since clients are a transient part of the network, we cannot get accurate enough estimates of frequency and sampling offsets to use for phase tracking even within a single packet. Thus, each client uses standard OFDM techniques to track the phase of the lead AP symbol by symbol. Additionally, when performing channel estimation, the APs interleave their packets so that the correction of the channels to a common reference time has minimal error.

In order for JMB to work with clients using off-the-shelf 802.11n cards, JMB needs to address two challenges:

- **Sync header**: The sync header transmitted by the lead AP to allow the slave APs to compute their oscillator offset, and trigger their transmission, is not supported by 802.11.
- **Channel measurement**: Recall that JMB requires a snapshot of the channel from all transmitters to all receivers measured at the same time. In Section 4, we described how to do this with a custom channel measurement packet format with interleaved symbols, which allows a receiver to measure channels from all transmitters. However, such a packet format is not supported by 802.11, and hence 802.11n cards cannot simultaneously measure channels from all APs at the same time.

JMB solves these issues by leveraging 802.11n channel state information (CSI) feedback for beamforming. We now describe JMB's solutions to the above challenges.

**6.1. Sync header**

The lead AP in JMB needs to prefix each transmission with a sync header that allows the slave transmitters to measure their relative oscillator offset from the lead, and also triggers their joint transmission. A mixed mode 802.11n packet essentially consists of an 802.11n packet prefixed with five legacy symbols. These legacy symbols are only intended to trigger carrier sense in 802.11a/g nodes and are not used by 802.11n receivers. Thus, the lead JMB AP can use these legacy symbols as a sync header. JMB slave APs use the legacy symbols to measure their oscillator phase offset from the lead, correct their transmission signal, and join the lead AP's transmission after the legacy symbols when the actual 802.11n symbols are transmitted.

**6.2. Channel measurement**

802.11n does not support the interleaved packet format that allows JMB to measure a snapshot of the channels from all the transmitters to a receiver simultaneously. Further, an 802.11n receiver with *K* (at most 4) antennas can measure at most *K* channels at a time. In a JMB system, the total number of transmit antennas across all APs is larger than the number of antennas on any single receiver. Thus, a receiver with off-the-shelf 802.11n cards will be unable to simultaneously measure channels from all transmit antennas to itself.

Naively, one could measure the channels from all transmit antennas by transmitting a separate packet from each AP, and then correcting these measurements using the estimated frequency offsets to the receiver as described in Section 5.1.

Unlike the scenario in Section 5.1 where the transmissions from different APs are separated from each other by only a few symbols (using interleaving), the transmissions from different APs here are separated by at least one packet width. As discussed in Section 5.3, this separation would induce a large accumulated phase error due to inaccuracy in receiver frequency offset estimates.

JMB instead performs channel measurement by "tricking" the receiver into measuring channels from different AP antennas simultaneously. This trick allows JMB to measure the channel from each AP antenna to the receiver in conjunction with a common reference channel to the receiver. Using such a common reference across all measurements allows JMB to avoid measuring *receiver frequency offset*, and instead directly estimate and compensate *phase offset* between different measurements, as we describe in Section 7.

For simplicity, we focus on the scenario in Figure 4 with two APs and one client, where each node has two antennas. We will only describe the measurements to *R*_{1} since channels to *R*_{2} are naturally measured simultaneously with *R*_{1} in exactly the same manner.

At time *t*_{0}, *L*_{1} and *L*_{2} transmit a two-stream packet jointly to *R*_{1}. This measurement gives us the channels *L*_{1} *R*_{1} and *L*_{2} *R*_{1} at time *t*_{0}. In addition, *S*_{1} measures the channel *L*_{1} *S*_{1} using the synchronization header.

At time *t*_{1}, *L*_{1} and *S*_{1} trick the receiver by jointly transmitting a two-stream packet from two different APs. This measurement gives us the channels *L*_{1} *R*_{1} and *S*_{1} *R*_{1} at time *t*_{1}. Again, *S*_{1} measures the channel *L*_{1} *S*_{1} using the synchronization header.

The challenge is that we would like to obtain the channel *S*_{1} *R*_{1} at time *t*_{0} but we have only the channel *S*_{1} *R*_{1} measured at *t*_{1}.

We therefore need to correct our measured channel by the accumulated phase offset between *S*_{1} and *R*_{1} in the time interval *t*_{0} to *t*_{1}. To do this, we take advantage of the fact that we can compute the accumulated phase offset between both *L*_{1} and *R*_{1}, and between *L*_{1} and *S*_{1} in the time interval *t*_{0} to *t*_{1}.

- *L*_{1} and *R*_{1}: We can compute this accumulated phase offset using the measurements of the channel *L*_{1} *R*_{1} at time *t*_{0} and time *t*_{1}.
- *L*_{1} and *S*_{1}: We can compute this accumulated phase offset using the measurements of the channel *L*_{1} *S*_{1} at time *t*_{0} and time *t*_{1}.

The difference between these two accumulated phase offsets gives us the desired accumulated phase offset between *S*_{1} and *R*_{1} in the time interval *t*_{0} to *t*_{1}.

We can similarly measure the channel *S*_{2} *R*_{1} in the next time slot, say *t*_{2}, and rotate it back to time *t*_{0}. We can repeat this process for all AP antennas.
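The rotation back to *t*_{0} can be illustrated with a small numerical sketch. It assumes an idealized model in which each measured channel is a fixed physical channel rotated by the difference of the transmitter and receiver oscillator phases, with no noise; the function names and the oscillator offsets are hypothetical.

```python
import cmath

def accumulated_phase(h_t0: complex, h_t1: complex) -> float:
    """Phase accumulated on one link between two measurements of its channel."""
    return cmath.phase(h_t1 * h_t0.conjugate())

def rotate_to_t0(h_sr_t1, h_lr_t0, h_lr_t1, h_ls_t0, h_ls_t1):
    """Rotate the S1->R1 channel measured at t1 back to reference time t0."""
    theta_lr = accumulated_phase(h_lr_t0, h_lr_t1)  # between L1 and R1
    theta_ls = accumulated_phase(h_ls_t0, h_ls_t1)  # between L1 and S1
    theta_sr = theta_lr - theta_ls                  # between S1 and R1
    return h_sr_t1 * cmath.exp(-1j * theta_sr)

def measured(h_physical, f_tx, f_rx, t):
    """Toy measurement model: physical channel times oscillator phase difference."""
    return h_physical * cmath.exp(2j * cmath.pi * (f_tx - f_rx) * t)

# Hypothetical oscillator offsets (Hz), physical channels, and times (s).
F_L, F_S, F_R = 200.0, -350.0, 1200.0
H_LR, H_LS, H_SR = 0.9 * cmath.exp(0.4j), 0.7 * cmath.exp(-1.1j), 0.5 * cmath.exp(2.0j)
T0, T1 = 0.001, 0.0042

recovered = rotate_to_t0(
    measured(H_SR, F_S, F_R, T1),
    measured(H_LR, F_L, F_R, T0), measured(H_LR, F_L, F_R, T1),
    measured(H_LS, F_L, F_S, T0), measured(H_LS, F_L, F_S, T1),
)
expected = measured(H_SR, F_S, F_R, T0)
print(abs(recovered - expected))
```

Note that no frequency offset to the receiver is ever estimated: only phase differences between pairs of measurements of the same link are used.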

The scheme in Section 4 assumed that all channels from all APs to all receivers are measured simultaneously. In Section 6.2, we showed that we can relax this assumption for a *single* receiver. That is, we can measure channels from different APs to that receiver at different times by using a shared reference measurement across all APs for that receiver. But what about channels to *another* receiver? If this receiver joins the network after the channels to the first receiver are measured, there is no opportunity for a shared reference measurement between the two receivers. It might therefore seem that JMB's requirement for all channels to be measured at the same time would necessitate measurement of channels to all receivers whenever a receiver joins the network, or when a single receiver's channels change.

In fact, we can show that such full measurement is not necessary, and that JMB can decouple channel measurements to different receivers. The key idea is that JMB can use the channels from the lead AP to slave APs as a shared reference, instead of the channel from the lead AP to a receiver as was the case in Section 6.2. We prove in Rahul et al.^{18} that using such a shared reference allows JMB to measure channels to different receivers at different times, and still correctly perform multiuser beamforming.

So far, we have described the use of JMB for multiplexing. The same principles apply to diversity except that in this case, we have all the APs transmitting jointly to a single client, say client 1. Each AP then computes its beamformed signal as and slaves continue to perform distributed phase synchronization as before.

In this paper, we have described JMB's physical layer that enables multiple APs to transmit simultaneously to multiple receivers. We refer the reader to the full version^{18} for a discussion of how JMB's link layer (MAC, carrier sense, acknowledgments, retransmissions, etc.) is designed to use this capability.

We implement JMB's AP design in USRPs and evaluate it with both USRP and off-the-shelf 802.11n clients.

**Implementation for the software radio testbed**: Each node is equipped with a USRP2 board and an RFX2400 daughterboard, and communicates on a 10 MHz channel in the 2.4 GHz range. We implement OFDM in GNURadio with the various 802.11 modulations (BPSK, 4QAM, 16QAM, and 64QAM) and coding rates, and choose between them using the effective SNR bitrate selection algorithm.^{8}

To perform correct phase alignment, concurrent transmitters must be synchronized at the sample level. We do this by using USRP2 timestamps to synchronize transmitters despite delays introduced by software. Before every data packet, the lead AP sends a trigger signal on the medium at *t*_{trigger}. All other APs log the timestamp of this signal, add a fixed delay *t*_{Δ} to it, and then transmit concurrently at this new time. We select *t*_{Δ} as 150 μs based on the maximum delay of our software implementation. Finally, to optimize the software turnaround, we did not use GNURadio, but wrote our own C code that directly interacts with the USRP hardware.
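The trigger-plus-fixed-delay scheduling can be sketched in a few lines. The function name, the 100 MHz timestamp clock, the 150 μs delay, and the per-slave software latencies below are all assumptions made for illustration; the sketch only shows that the transmit time depends on the logged hardware timestamp and not on how long the software took to react.

```python
def joint_tx_tick(trigger_tick: int, delay_s: float, clock_hz: float) -> int:
    """Hardware clock tick at which a slave AP should start transmitting."""
    return trigger_tick + round(delay_s * clock_hz)

CLOCK_HZ = 100e6  # assumed timestamp clock rate
DELAY_S = 150e-6  # assumed fixed delay added to the trigger timestamp

trigger_tick = 1_000_000  # timestamp a slave logged for the trigger signal
tx_tick = joint_tx_tick(trigger_tick, DELAY_S, CLOCK_HZ)

# Hypothetical software latencies: a slave meets the deadline as long as its
# software finishes before the fixed delay elapses.
software_latency_s = {"slave_a": 40e-6, "slave_b": 120e-6}
on_time = {name: latency < DELAY_S for name, latency in software_latency_s.items()}

print(tx_tick, on_time)
```

Choosing the fixed delay from the maximum software delay is what lets every slave hit the same transmit instant regardless of its own processing time.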

**Implementation for the 802.11n testbed**: There are two main differences between this testbed and the one above. First, each client in this testbed uses an off-the-shelf 802.11n card. Second, each node has two antennas and can act as a MIMO node. Our objective is to show that JMB extends beyond single antenna systems; for example, it can combine two 2 × 2 MIMO systems to create a 4 × 4 MIMO system.

Each AP consists of two USRP2 nodes connected to an external clock, acting as a 2-antenna node. Each client is a PC equipped with an Intel Wi-Fi Link 5300 a/b/g/n wireless network adapter on which two antennas are enabled. The Intel Wi-Fi Link 5300 adapters are updated with a custom firmware and associated `iwlwifi` driver in order to obtain the channel state information in user space.^{7}

The AP software implementation is similar to the other testbed except that we make the channel width 20 MHz to communicate with 802.11n cards. The packet format is also changed to match 802.11n. The client software collects the channel measurements from the firmware and logs correctly decoded packets.

**Testbed topology**: We evaluate JMB in an indoor testbed (shown in Rahul et al.^{18}) that simulates a conference room, with APs deployed on ledges near the ceiling and clients scattered through the room. In every run, the APs and clients are assigned randomly to these locations. The testbed exhibits diverse SNRs as well as both line-of-sight and non-line-of-sight paths due to obstacles such as pillars, furniture, ledges, etc. The APs transmit 1500 byte packets to the clients.

We evaluate JMB both through microbenchmarks of its individual components and as an integrated system on both the USRP and 802.11n testbeds. We refer the reader to Rahul et al.^{18} for the microbenchmarks and focus on the system performance in this section.

**11.1. Increase of network throughput with the number of APs**

JMB's key goal is to increase network throughput with the number of APs. This experiment verifies whether JMB delivers on that promise.

**Method**. We evaluate JMB's performance in three effective SNR ranges: low (6–12 dB), medium (12–18 dB), and high (>18 dB). For each range, we place a certain number of JMB nodes in random AP locations in the testbed. We then place the same number of nodes in random client locations such that all clients obtain an effective SNR in the desired range. For each such topology, we measure the throughput obtained both with 802.11n and JMB. Since USRP2 cannot perform carrier sense due to software latency, we measure 802.11n throughput by scheduling each client so that it gets an equal share of the medium. We repeat the experiment for 20 different topologies and also vary the number of JMB APs for each SNR range.

**Results**. Figures 5(a), (b), and (c) show the total throughput obtained by 802.11n and by JMB for different numbers of APs and different SNR ranges. Note that, as one would expect, the obtained throughput increases with SNR (802.11n throughput at low SNR is 7.75 Mbps, at medium SNR is around 14.9 Mbps, and at high SNR is 23.6 Mbps). There are two main points worth noting:

- 802.11n cannot benefit from additional APs operating in the same channel, and allows only one AP to be active at any given time. As a result, its throughput stays constant even as the number of APs increases. This throughput might vary with the number of APs in a real 802.11n network due to increased contention; however, since USRPs do not have carrier sense, we compute 802.11n throughput by providing each client with an equal share of the medium. In contrast, with JMB, as we add more APs, JMB can use these APs to transmit concurrent packets to more receivers. As a result, the throughput of JMB increases linearly with the number of APs.
- The absolute gains provided by JMB are higher at high (~9.4× for 10 APs) and medium (~9.1×) SNRs than at low SNRs (~8.1×). This is a consequence of the theoretically predicted throughput of beamforming. In particular, the beamforming throughput with *N* APs scales as *N* log(SNR/*K*) = *N* log(SNR) − *N* log(*K*), where *K* depends on the channel matrix **H** and is related to how well conditioned it is.^{23} Natural channel matrices can be considered random and well conditioned, and hence *K* can essentially be treated as constant for our purposes. The 802.11n throughput scales roughly as log(SNR).^{23} The expected gain of JMB over 802.11n can therefore be written as *N*(1 − log(*K*)/log(SNR)) and hence becomes closer to *N* as SNR increases. This is why JMB's gains at lower SNR grow at a lower rate than the gains at high SNR.
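The SNR dependence of the gain can be checked numerically. Taking the beamforming throughput as *N* log(SNR/*K*) and the 802.11n throughput as log(SNR), their ratio is *N*(1 − log(*K*)/log(SNR)). The sketch below evaluates this ratio for *N* = 10 at one SNR point in each regime; the value *K* = 1.5 and the three SNR points are assumptions picked for illustration, not values fitted to the measurements.

```python
import math

def jmb_gain(n_aps: int, snr_db: float, k: float) -> float:
    """Predicted throughput ratio N * (1 - log K / log SNR)."""
    snr_linear = 10 ** (snr_db / 10)
    return n_aps * (1 - math.log2(k) / math.log2(snr_linear))

N_APS = 10
K = 1.5                          # assumed conditioning constant (illustrative)
SNR_POINTS_DB = (9.0, 15.0, 21.0)  # one assumed point per SNR regime

gains = [jmb_gain(N_APS, snr_db, K) for snr_db in SNR_POINTS_DB]
for snr_db, gain in zip(SNR_POINTS_DB, gains):
    print(f"{snr_db:4.1f} dB -> {gain:.2f}x")
```

The predicted gain stays below *N* and rises toward it as SNR grows, matching the qualitative trend described above.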

**11.2. Compatibility with 802.11**

Finally, as described in Section 6, JMB is compatible with existing 802.11n cards. In this section, we investigate whether JMB can deliver throughput gains when used with commodity 802.11n cards. Further, since each AP and each 802.11n card in this system has two antennas, this experiment also verifies that JMB can provide its expected gains with multi-antenna transmitters and receivers.

**Method**. We place two JMB nodes at random AP locations in the testbed and two 802.11n receivers at random client locations in the testbed. For each topology, we compute the throughput with 802.11n and with JMB. As before, we compute 802.11n throughput by giving each transmitter an equal share of the medium. We repeat the experiment across multiple topologies and the entire range of SNRs.

**Results**. Figure 6 shows the total throughput with and without JMB at high, medium, and low SNRs. Since we have two receivers in this experiment, the theoretical gain over 802.11n is 2×. The chart shows that JMB delivers an average gain of 1.67–1.83× across all SNR ranges. Similar to the case with USRP receivers, the gains in the high SNR regime are larger than the gains in the low SNR regime.

We now investigate JMB's fairness, that is, whether JMB can deliver its throughput gains for every receiver in the network across all locations and SNRs. Figure 7 shows the cumulative distribution function (CDF) of the throughput gain achieved by JMB as compared to 802.11n across all the runs. The results show that JMB delivers throughput gains between 1.65× and 2× for all the receivers and hence is fair to the receivers in the network.

This paper enables joint beamforming from distributed independent transmitters. The key challenge in delivering this system is to perform accurate phase synchronization across multiple distributed transmitters. The lessons learnt from building the system and testing it with real hardware are the following: (1) Estimates of frequency offset can be made accurate enough to predict (and hence correct) phase misalignment within an 802.11 packet; however, these estimates cannot be used across multiple packets due to large buildups in phase errors over time; and (2) Joint multiuser beamforming can be achieved by synchronizing the phases of all senders to one lead sender, and does not impose any phase synchronization constraints on the receivers.

We believe that the design of JMB has wider implications than explored in this paper. In particular, several areas of information theory such as lattice coding, noisy network coding, and transmitter cooperation for cognitive networks^{11,13,14} assume tight phase synchronization across transmitters. We are optimistic that the algorithms presented in this paper can bring these ideas closer to practice.

1. Aryafar, E., Anand, N., Salonidis, T., Knightly, E. Design and experimental evaluation of multi-user beamforming in wireless LANs. In *Proceedings of the 16th Annual International Conference on Mobile Computing and Networking* (MobiCom '10) (2010), ACM, New York, NY, 197–208. doi:10.1145/1859995.1860019.

2. Berger, S., Wittneben, A. Carrier phase synchronization of multiple distributed nodes in a wireless network. In *Proceedings of 8th IEEE Workshop on Signal Processing Advances for Wireless Communications (SPAWC)* (Helsinki, Finland, Jun. 2007).

3. Distributed Antenna Systems. http://medicalconnectivity.com/2008/02/05/distributed-antenna-systems-no-replacement-for-wireless-strategy.

4. Forenza, A., Heath, R.W., Jr., Perlman, S.G. *System and Method for Distributed Input-Distributed Output Wireless Communications*. U.S. Patent Application number 20090067402.

5. Goldsmith, A. *Wireless Communications*. Cambridge University Press, 2005.

6. Greentouch Consortium. https://www.youtube.com/watch?v=U3euDDr0uvo. GreenTouch Demonstrates Large-Scale Antenna.

7. Halperin, D., Hu, W., Sheth, A., Wetherall, D. Tool release: Gathering 802.11n traces with channel state information. *SIGCOMM Comput. Commun. Rev. 41*, 53–53. doi:10.1145/1925861.1925870.

8. Halperin, D., Hu, W., Sheth, A., Wetherall, D. Predictable 802.11 packet delivery from wireless channel measurements. In *ACM SIGCOMM* (2010).

9. Heiskala, J., Terry, J. *OFDM Wireless LANs: A Theoretical & Practical Guide*. Sams Publishing, 2001.

10. The iPad and its impact on hotel owners and operators. http://www.ibahn.com/en-us/public/docs/The_Impact_of_iPad.pdf. iBAHN.

11. Lim, S., Kim, Y., El Gamal, A., Chung, S. Noisy network coding. In *IEEE Information Theory Workshop* (2010).

12. LTE: MIMO techniques in 3GPP-LTE. http://lteportal.com/Files/MarketSpace/Download/130_LTEMIMOTechniquesFreescaleNov52008.pdf.

13. Maric, I., Liu, N., Goldsmith, A. Encoding against an interferer's codebook. In *Allerton* (2008).

14. Nazer, B., Gastpar, M. The case for structured random codes in network capacity theorems. *Eur. Trans. Telecommun. 19*, 4 (2008).

15. Network MIMO. http://www.alcatel-lucent.com/wps/DocumentStreamerServlet?LMSG_CABINET=Docs_and_Resource_Ctr&LMSG_CONTENT_FILE=Data_Sheets/Network_MIMO.pdf. Alcatel-Lucent.

16. Ozgur, A., Leveque, O., Tse, D. Hierarchical cooperation achieves optimal capacity scaling in ad hoc networks. *IEEE Trans. Info. Theor. 53*, 10 (Oct. 2007), 3549–3572. doi:10.1109/TIT.2007.905002.

17. Rahul, H., Hassanieh, H., Katabi, D. Sourcesync: A distributed wireless architecture for exploiting sender diversity. In *SIGCOMM* (2010).

18. Rahul, H., Kumar, S., Katabi, D. JMB: Scaling wireless capacity with user demands. In *ACM SIGCOMM 2012* (Helsinki, Finland, Aug. 2012).

19. Distributed-input distributed-output wireless technology. http://www.rearden.com/DIDO/DIDO_White_Paper_110727.pdf. Rearden Companies.

20. Mobile broadband capacity constraints and the need for optimization. http://rysavy.com/Articles/2010_02_Rysavy_Mobile_Broadband_Capacity_Constraints.pdf. Rysavy Research.

21. Tan, K., Liu, H., Fang, J., Wang, W., Zhang, J., Chen, M., Voelker, G.M. SAM: Enabling practical spatial multiple access in wireless LAN. In *MobiCom* (2009).

22. Thibault, I., Corazza, G., Deambrogio, L. Phase synchronization algorithms for distributed beamforming with time varying channels in wireless sensor networks. In *Wireless Communications and Mobile Computing Conference (IWCMC), 2011 7th International* (Jul. 2011), 77–82.

23. Tse, D., Vishwanath, P. *Fundamentals of Wireless Communications*. Cambridge University Press, 2005.

24. Turn off your Wi-Fi network! http://www.youtube.com/watch?v=fFiJ5rnIPVw. Steve Jobs iPhone4 Keynote.

a. The APs also need to normalize **H**^{−1} to respect power constraints, but we omit that detail for simplicity.

b. In fact, since the common design scenario for JMB is confined locations like conference rooms and auditoriums, the propagation delay differences between different APs to a receiver are in the tens of nanoseconds, which is smaller than the 802.11 cyclic prefix of 400 or 800 ns, which is designed for worst-case multipaths.

A full version of this paper was published in *Proceedings of ACM SIGCOMM 2012*, Helsinki, Finland.

Figure 1. Traditional vs. joint multiuser beamforming. (a) In a traditional multiuser beamforming system with multiple 2-antenna APs, only one AP can transmit on a given channel at any given time. This leads to a maximum of two simultaneous packet transmissions regardless of the total number of APs. (b) In contrast, JMB enables all APs to transmit on the same channel, allowing up to 2*N* simultaneous packet transmissions if there are *N* 2-antenna APs.

Figure 2. Channel matrix with two APs transmitting to two clients.

Figure 3. Packet structure from the perspective of APs and the receiver. Symbols in blue are transmitted by the lead AP, symbols in red by the slave AP, and symbols in white reflect silence periods.

Figure 4. 802.11n channel measurement. JMB measures channels to 802.11n clients by sending a series of two-stream transmissions. Every transmission includes the reference antenna, *L*_{1}, as well as one other antenna (either *L*_{2} or *S*_{1} in our example). For clarity, the figure does not show the transmissions to/from *R*_{2} and *S*_{2}, but JMB naturally measures the channels to *R*_{2} simultaneously.

Figure 5. Scaling of throughput with the number of APs. In this experiment, the number of APs equals the number of receivers. At all SNRs, JMB's network throughput increases linearly with the number of APs while total 802.11 throughput remains constant. (a) High SNR (>18 dB); (b) medium SNR (12–18 dB); (c) low SNR (6–12 dB).

Figure 6. Throughput achieved using JMB on off-the-shelf 802.11n cards. JMB significantly improves the performance of off-the-shelf 802.11n cards at high (>18 dB), medium (12–18 dB), and low (6–12 dB) SNRs.

Figure 7. Fairness results. For all nodes in our testbed, JMB delivers a throughput gain between 1.65× and 2×, with a median gain of 1.8× across SNRs. This shows that JMB provides similar throughput gains for every node in the network.

**©2014 ACM 0001-0782/14/07**

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

