Designing A DSP Broadband Modem
for Mesh Radio Networks
Article published in EETimes (USA),
10 September 2002
David Spreadbury, Principal Technology
Consultant at Plextek Ltd
Recently
the engineering team at Plextek was involved in the initial design and construction
of a broadband wireless access system for Radiant Networks plc (Essex, England),
which is a mesh of point-to-point microwave links in the 28GHz band. With
paths of variable length and quality, each modem is designed to support four
bi-directional Time-Division Multiple-Access (TDMA) channels at 100Mb/s if
possible, with fallback rates of 50Mb/s and 25Mb/s if conditions demand, using
adaptive equalisation and modulation. In addition there are mesh-specific
operations like antenna steering, exploration, node discovery and link formation
and maintenance. Statistics are also gathered to support mesh management.
The system design phase explored
a range of installation, operating and management scenarios, resulting in
high-level operational and technical requirements. The modem described here
was conceived as a system component with convenient electrical and mechanical
boundaries. It had to be fully functional but flexible, and developed within
time and cost constraints. The uniqueness of mesh radio required additional
monitoring and control features, which suggested that the development would
not be without technical risk. Both raw signal processing capacity and significant
flexibility and intelligence were required, in a limited space. The tasks
range from demodulating a received IF signal near 50MHz to controlling the
drive to motors which steer the microwave horns! Any practical solution would
have to contain hardware and software.
Plextek have considerable experience
of implementing Digital Signal Processing (DSP) in both areas. For this application
a high-performance fixed-point Texas Instruments TMS320C6201 DSP processor
was selected to run C code at 1600 MIPS, and a pair of Xilinx XCV400
Field-Programmable Gate Arrays (FPGAs) were selected to complement this with
about a million "system gates". The logic design would be entered
in VHDL. This is a powerful route which, although perhaps lacking in circuit
visualisation, allows comprehensive synthesis and placement with timing constraints
and efficient simulation.
The modem is constructed on a single circuit
board of approximately A5 size, which also includes 200Msps Analogue-to-Digital
and Digital-to-Analogue Converters (ADCs and DACs) for interfacing directly
to the radio at Intermediate Frequency (IF), and a pair of 100Msps DACs for
debugging/diagnostics (e.g. displaying a constellation on an oscilloscope).
Lower-level functions to be performed
include:
- Receive In-phase and Quadrature
(I and Q) sampling
- Root Raised Cosine (RRC) Channel
filtering
- Correlation
- Frame, slot and symbol timing
extraction
- Half-symbol spaced ("T/2")
equalisation and demodulation
- Carrier phase and frequency
offset tracking
- Scrambling, and Hamming and Reed-Solomon Forward Error Correction (FEC) codes
- Automatic Gain Control (AGC)
and power control
- Transmit RRC IQ and IF synthesis
- Quadrature Phase-shift Keying
(QPSK) and Quadrature Amplitude Modulation (QAM16 and QAM64) operation
depending on link conditions
Detailed design required each function
to be allocated to hardware, software, or a mix of the two, at an early stage,
with well-defined interfaces. Both contribute necessary resources but neither
is adequate on its own.
Simplicity reigns
Perhaps contrary to modern trends,
simplicity is the key. Although device performance is ever increasing (which
has come to the rescue of some projects!), for a given technology a simpler
design will always be cheaper, faster, and consume less power. Here are a
few areas where simplicity can be won or lost:
- Each resource should be used
for functions that make the best use of its strengths, e.g. can be readily
implemented using native primitive operations. Hardware is perfect for deterministic,
repetitive and logical processes at continuous high speed. Software provides
far greater decision-making capability and flexibility.
- Careful frequency planning is
important: integer-related IF, symbol, logic and sampling frequencies can
greatly reduce the complexity of key operations like IQ sampling and mixing
to and from the IF frequency. Here, the IF frequency was twice the symbol
frequency, and the sampling frequency was eight times the symbol frequency,
that is, four times the IF frequency.
This allowed direct IQ subsampling and very simple mixing to and from IF
by assigning samples to I, Q, -I and -Q in sequence. Non-integer relationships
can be accommodated through the use of numerical oscillators, CORDIC processors,
and the like. These are extremely useful, but consume silicon area, power,
and time, and should only be used when careful system design confirms their
necessity.
- The use of a common clock for
the logic and DSP processor makes communication between the two faster,
more deterministic, and reliable.
- For processing-intensive functions
(e.g. correlation and equalisation), pipelining and/or parallelism are key
hardware advantages. The silicon efficiency of fixed-architecture hardware
is highest when running the logic at its maximum speed.
- Where coefficients have constant
or trivial values and where there is an advantage in not using a multiplier
or adder, the rearrangement of some algorithms at bit level can reduce complexity.
This applies equally well to data, e.g. when constructing complex modulation
from symbol values. Rather than building a complete Finite Impulse Response
(FIR) filter, a look-up table (LUT) can be used. The table length is given
by the product of bits per symbol, samples per symbol, and the extent (in
symbols) of the filter transfer function. This approach is fast, readily
adaptable (by changing the table entries or using multiple banks), and handles
a wide range of linear and non-linear symbol-based modulation schemes. This method
is used in the Radiant modem, with a 64-entry table for each modulation
type, incorporating the necessary scaling to give the required transmit
power in each case. Note that this approach is also particularly efficient
when used with FPGAs that employ LUTs in their internal construction.
- High level tools should be used
with due consideration of the target architecture - portability usually
comes at a price. One can describe processes in C or VHDL that read elegantly
but give rise to cumbersome implementations. Tools can sanitise design entry,
but they should not be expected to replace design effort.
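The frequency-planning point above can be illustrated with a minimal sketch. It assumes a sampling rate of exactly four times the IF and the sign convention s[n] = I*cos(pi*n/2) + Q*sin(pi*n/2); the function names are hypothetical and this is not the modem's actual logic, which was entered in VHDL. The small offset in time between the I and Q samples is ignored here.

```python
import math

def mix_to_if(i_samples, q_samples):
    """Place baseband I/Q (one pair per output sample) on an IF at fs/4."""
    out = []
    for n, (i, q) in enumerate(zip(i_samples, q_samples)):
        # cos(pi*n/2) and sin(pi*n/2) only take the values 0, +1 and -1,
        # so the mixer reduces to selection and sign change.
        out.append((i, q, -i, -q)[n % 4])
    return out

def mix_from_if(if_samples):
    """Assign real IF samples to I, Q, -I and -Q in sequence."""
    i_out, q_out = [], []
    for n, s in enumerate(if_samples):
        phase = n % 4
        if phase == 0:
            i_out.append(s)
        elif phase == 1:
            q_out.append(s)
        elif phase == 2:
            i_out.append(-s)
        else:
            q_out.append(-s)
    return i_out, q_out

def reference_if(i_samples, q_samples):
    """The same up-conversion written with explicit multipliers, for comparison."""
    return [i * math.cos(math.pi * n / 2) + q * math.sin(math.pi * n / 2)
            for n, (i, q) in enumerate(zip(i_samples, q_samples))]
```

With a constant input of I = 3 and Q = -5, mix_to_if produces the repeating pattern 3, -5, -3, 5, and mix_from_if recovers the two rails without a single multiplication.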
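The look-up table idea above can also be sketched in a few lines. The organisation shown is only one possibility: the table is addressed by the most recent symbols and the sample phase, and each entry holds the filter output already summed. The tap values are placeholders rather than RRC coefficients, and the sizes (one binary rail, four samples per symbol, a four-symbol extent) are assumptions chosen for illustration; none of this is claimed to match the table used in the Radiant modem.

```python
SPS = 4              # samples per symbol (assumed)
SPAN = 4             # filter extent in symbols (assumed)
LEVELS = (-1, 1)     # one binary rail, e.g. the I rail of QPSK (assumed)
# Placeholder pulse of SPS * SPAN taps; a real design would use RRC values.
TAPS = [1, 2, 3, 4, 5, 6, 7, 8, 8, 7, 6, 5, 4, 3, 2, 1]

def build_table(taps, levels, sps, span):
    """Precompute the filter output for every symbol history and sample phase."""
    n = len(levels)
    table = [0] * (n ** span * sps)
    for history in range(n ** span):
        # Digit j of the history word is the symbol sent j symbols ago.
        digits = [(history // n ** j) % n for j in range(span)]
        for phase in range(sps):
            table[history * sps + phase] = sum(
                levels[d] * taps[j * sps + phase] for j, d in enumerate(digits))
    return table

def shape_with_lut(symbol_indices, table, n_levels, sps, span):
    """Pulse-shape by table look-up only: no multipliers or adders at run time."""
    history = 0  # assumes the rail idled at level index 0 before the first symbol
    out = []
    for idx in symbol_indices:
        # Shift the newest symbol into the history word, dropping the oldest.
        history = (history * n_levels + idx) % (n_levels ** span)
        out.extend(table[history * sps:history * sps + sps])
    return out

def shape_with_fir(symbol_indices, taps, levels, sps, span):
    """Reference: zero-stuff the symbols and run a conventional FIR filter."""
    padded = [0] * (span - 1) + list(symbol_indices)  # same idle assumption
    upsampled = []
    for idx in padded:
        upsampled.append(levels[idx])
        upsampled.extend([0] * (sps - 1))
    return [sum(taps[t] * upsampled[m - t]
                for t in range(len(taps)) if m - t >= 0)
            for m in range((span - 1) * sps, len(upsampled))]
```

Both routines give identical samples for the same symbol stream. Changing the modulation or its transmit scaling then amounts to loading a different table, which is the adaptability the text describes.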
Traditionally, a hardware front
end runs continuously at the ADC sampling rate to process the incoming signal,
implementing a complete signal path which feeds partially processed results
(e.g. symbols or bits) on to the DSP. This can be inflexible, however, and
certainly would not be adequate here because of the many modes of operation
required, many responsive to "live" conditions. The final solution
employed in the modem was therefore to use the hardware in two complementary
ways:
- Functions which are intimately
related to time-critical signals (e.g. front and back end sampling and filtering,
frame timing) run continuously in hardware to provide truly deterministic
operation. These functions tend to transfer data to and from real-time memory,
mapping time to signal memory location and thereby relieving other functions
of this burden (for example, the hardware correlator, which is used to find
precise symbol positioning, informs the processor where in memory to initiate
a demodulation process, not when to do it).
- The remaining functions interact
with the software through control, status and data registers, mapped into
the processor address space. The trick is to design in flexibility through
the use of coefficients, parameters and flags, with an adequate repertoire
that does not compromise performance. Such functions operate as peripherals,
hardware accelerators or "toolboxes" for the software, allowing
their intelligent and flexible use, so that the software is free to construct
and adapt the overall processing algorithm as necessary. It benefits both
from the speed and parallelism of the hardware (both between different hardware
functions and with the software) and from a simple interface mechanism, so that
enough processing capacity remains to handle those tasks which are performed
in software. Even if the processor takes a large fraction of a microsecond
to get round to a task, as long as each task is completed in adequate time
all is well. Run time concerns have been migrated from symbol to slot levels,
which is far more appropriate.
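The idea of mapping time to memory location can be modelled with a short sketch. A circular buffer stands in for the real-time memory, and a software loop stands in for the hardware correlator. The class and function names, the buffer size and the 7-symbol sync pattern are all assumptions made for illustration and are not taken from the modem.

```python
class RealTimeMemory:
    """Circular buffer that a hardware front end would fill at the sample rate."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.write_addr = 0

    def write(self, sample):
        self.data[self.write_addr] = sample
        self.write_addr = (self.write_addr + 1) % self.size

    def read(self, addr, count):
        # Addresses wrap, so a burst may straddle the end of the buffer.
        return [self.data[(addr + k) % self.size] for k in range(count)]

def correlate(memory, pattern):
    """Return (address, score) of the best match of pattern in the memory."""
    best_addr, best_score = 0, None
    for addr in range(memory.size):
        window = memory.read(addr, len(pattern))
        score = sum(w * p for w, p in zip(window, pattern))
        if best_score is None or score > best_score:
            best_addr, best_score = addr, score
    return best_addr, best_score

SYNC = [1, 1, 1, -1, -1, 1, -1]   # hypothetical sync word (a Barker-7 sequence)
PAYLOAD = [1, -1, -1, 1]          # hypothetical burst contents

memory = RealTimeMemory(16)
for sample in [0] * 13 + SYNC + PAYLOAD:   # idle samples, then one burst
    memory.write(sample)

# The "hardware" reports an address; the "software" can act on it later.
sync_addr, score = correlate(memory, SYNC)
payload_addr = (sync_addr + len(SYNC)) % memory.size
recovered = memory.read(payload_addr, len(PAYLOAD))
```

Here the burst wraps around the end of the buffer, yet the software only needs the reported address to find the payload. It can read the data whenever it is ready, provided it does so before the buffer is overwritten.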
The end application may be unusual,
but this basic approach is applicable to any high-performance DSP-based wireless
product. With care, "the best of both worlds" can be achieved,
as long as sufficient effort is given to task partitioning and interfacing,
both between the hardware and software and between the logic designers and programmers!