A 128-Point FFT/IFFT Processor for MIMO-OFDM Transceivers – a Broader Survey

Abstract: The fast Fourier transform (FFT) has been used in digital signal processing algorithms and in communication systems for several decades. FFT and inverse FFT (IFFT) algorithms simplify the implementation of multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) architectures. This paper discusses the variations of the FFT architecture that are employed in MIMO-OFDM transceivers. The surveyed designs indicate that the SDF and MDC pipelined FFT architectures offer advantages in hardware area, power, throughput and SQNR. In this work, the proposed mixed-radix algorithm uses fewer logic gates, adders and counters than the implementations it is compared with. The mixed pipeline SDF-MDC architecture improves area utilization by up to a factor of three relative to earlier implementations.

of the FFT processor, pipelined architectures turned out to be of greater significance. Pipelined FFT processors are computationally efficient in hardware. These processors are capable of processing an uninterrupted stream of input data samples while producing a stream of output data samples at a matching rate.
There are two main types of pipelined FFT architecture: feedback (FB) and feed-forward (FF). Feedback architectures are characterized by their feedback loops, i.e., some outputs of the butterflies are fed back to the memories at the same stage. They can be divided into Single-path Delay Feedback (SDF), which processes a continuous flow of one sample per clock cycle, and Multipath Delay Feedback (MDF) or parallel feedback, which processes several samples in parallel. Feed-forward architectures, namely Single-path Delay Commutator (SDC) and Multipath Delay Commutator (MDC), have no feedback loops; each stage passes the processed data to the next stage. These architectures can also process several samples in parallel.
In Single-path Delay Feedback (SDF) architectures, a single data stream passes through the multiplier at every stage. The delay units are used efficiently because the same storage is shared between the inputs and outputs of the butterfly. SDF reduces the number of multipliers, but it complicates the control mechanism and uses more memory resources. SDF FFT architectures are well suited to long FFT lengths. To reach a higher symbol rate, the Multipath Delay Feedback (MDF) FFT is preferable to a conventional serial architecture such as SDF.
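The way an SDF stage shares one delay line between butterfly inputs and outputs can be illustrated with a small software model. The sketch below is a behavioural Python model under stated assumptions, not the hardware of any cited design: it assumes a radix-2 decimation-in-frequency pipeline with FIFO lengths N/2, N/4, ..., 1, floating-point arithmetic, and a power-of-two length of at least 2. The names `SdfStage` and `sdf_fft` are hypothetical.

```python
import cmath
from collections import deque


class SdfStage:
    """One radix-2 DIF stage: a butterfly plus a feedback FIFO of `delay` samples."""

    def __init__(self, delay):
        self.delay = delay
        self.fifo = deque([0j] * delay)  # storage shared by inputs and outputs
        self.count = 0                   # position inside a block of 2*delay samples

    def step(self, sample):
        """Consume one sample and produce one sample (one clock cycle)."""
        oldest = self.fifo.popleft()
        if self.count < self.delay:
            # First half of the block: buffer the input and emit the stored
            # difference from the previous block, rotated by its twiddle factor.
            self.fifo.append(sample)
            twiddle = cmath.exp(-2j * cmath.pi * self.count / (2 * self.delay))
            out = oldest * twiddle
        else:
            # Second half: butterfly. The sum leaves at once, the difference
            # is fed back into the same FIFO.
            self.fifo.append(oldest - sample)
            out = oldest + sample
        self.count = (self.count + 1) % (2 * self.delay)
        return out


def sdf_fft(x):
    """Stream x (length N = 2**m, N >= 2) through log2(N) SDF stages."""
    n = len(x)
    stages = []
    delay = n // 2
    while delay >= 1:
        stages.append(SdfStage(delay))
        delay //= 2
    stream = []
    # N - 1 extra zero samples flush the pipeline (latency is N - 1 cycles).
    for sample in list(x) + [0j] * (n - 1):
        for stage in stages:
            sample = stage.step(sample)
        stream.append(sample)
    valid = stream[n - 1:]
    # A DIF pipeline delivers the spectrum in bit-reversed order; undo it here.
    bits = n.bit_length() - 1
    result = [0j] * n
    for i, value in enumerate(valid):
        result[int(format(i, "0{}b".format(bits))[::-1], 2)] = value
    return result
```

In this model a 128-point transform uses seven stages and 64 + 32 + ... + 1 = 127 delay elements in total, with one sample entering and one leaving per cycle.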
In the Multipath Delay Commutator (MDC) FFT, a commutator first divides the input sequence into multiple parallel data streams; the butterfly operation followed by twiddle factor multiplication is then applied to each data stream with the proper delays. MDC turns the feedback paths into feed-forward streams by using switch boxes together with memory. It saves area because it needs fewer complex multipliers and uses the butterfly units efficiently. The Single-path Delay Commutator (SDC) FFT architecture is based on the MDC architecture, but each stage produces a single output rather than multiple outputs. Each stage has one complex multiplier, a butterfly element and a delay commutator that corrects the order of the data.
The mixed-radix algorithm combines highly optimized small-length FFTs to create larger FFTs, and mixed-radix functions work for FFTs of any length. The split-radix FFT algorithm uses a blend of radices 2 and 4: it recursively expresses a DFT of length N in terms of one smaller DFT of length N/2 and two smaller DFTs of length N/4. The split-radix algorithm can only be applied when N is a multiple of 4, but since it breaks a DFT into smaller DFTs, it can be combined with any other FFT algorithm as desired. Using the split-radix algorithm reduces the computing cost of the FFT. The mixed-radix multipath delay feedback (MRMDF) structure has difficulty meeting the requirements on transmission rate and power consumption.
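The split-radix recursion described above (one DFT of length N/2 plus two of length N/4) can be sketched in a few lines of Python. This is a minimal illustration assuming a decimation-in-time formulation, a power-of-two length and floating-point arithmetic; the function name `split_radix_fft` is hypothetical and the code makes no attempt at the operation-count optimizations a real implementation would use.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix DIT FFT for a list whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    even = split_radix_fft(x[0::2])   # one DFT of length N/2
    odd1 = split_radix_fft(x[1::4])   # two DFTs of length N/4
    odd3 = split_radix_fft(x[3::4])
    quarter = n // 4
    out = [0j] * n
    for k in range(quarter):
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        total = t1 + t3
        diff = 1j * (t1 - t3)         # multiplication by j is only a swap in hardware
        out[k] = even[k] + total
        out[k + 2 * quarter] = even[k] - total
        out[k + quarter] = even[k + quarter] - diff
        out[k + 3 * quarter] = even[k + quarter] + diff
    return out
```

Each group of four outputs needs only two general twiddle multiplications, which is where the saving over a plain radix-2 recursion comes from.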

III. FFT PROCESSOR IMPLEMENTATIONS IN MIMO-OFDM APPLICATIONS
A dynamic voltage and frequency scaling (DVFS) FFT processor for MIMO-OFDM applications is presented in [1]. With the DVFS technique, both voltage and frequency can be scaled to optimal values in real time according to the processing needs. A multimode multipath-delay-feedback (MMDF) architecture is proposed for the FFT processor; it can process 1- to 8-stream 256-point FFTs or a high-speed 256-point FFT in two processing domains at the minimum clock frequency for DVFS operation. A parallelized radix-2^4 FFT algorithm and scheduling techniques are employed to reduce the number of complex multipliers, and hence the power consumption and hardware cost. Using parallel-8 data paths, a throughput of up to 8-stream 300 Msample/s, or 2.4 Gsample/s, was obtained.
A 2048-point FFT processor for wireless personal area network applications is presented in [11]. It provides a high throughput rate by applying an eight-data-path pipelined approach together with a hardware reduction method and a multi-data scaling scheme. The hardware costs, including power consumption and area, increase because of the multiple data paths and the word length that grows along the stages. To resolve this, a simplification method is proposed that reduces the hardware cost of the multiplication units in the multiple-path FFT approach. A multi-data scaling scheme is also presented, in which the mantissa and exponent parts are handled in separate paths to reduce word lengths while preserving the signal-to-quantization-noise ratio. The mantissa data are processed by eight data paths, and the exponent by one data path.
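The idea of carrying several mantissas alongside a single exponent can be illustrated with a generic block-scaling routine. The sketch below is an assumption-laden illustration, not the scheme of [11]: it assumes real-valued samples, a block of eight values sharing one power-of-two exponent, and round-to-nearest mantissas. The names `block_scale` and `block_restore` and the parameter `mant_bits` are hypothetical.

```python
import math


def block_scale(values, mant_bits):
    """Quantise a block to signed integer mantissas that share one exponent."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    limit = 2 ** (mant_bits - 1) - 1           # largest positive mantissa
    # Smallest power-of-two scale that keeps the peak within the mantissa range.
    exponent = math.ceil(math.log2(peak / limit))
    scale = 2.0 ** exponent
    mantissas = [round(v / scale) for v in values]
    return mantissas, exponent


def block_restore(mantissas, exponent):
    """Rebuild approximate values from the mantissas and the shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]
```

With eight samples per block, the eight mantissas would travel on eight data paths and the exponent on a ninth, narrower path, so the per-sample word length stays at `mant_bits` regardless of how the signal level grows along the stages.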
A multimode FFT processor for wireless personal area network (WPAN), wireless local area network (WLAN), and wireless metropolitan area network (WMAN) applications is presented in [12]. Using the proposed flexible-radix-configuration multipath-delay-feedback (FRCMDF) architecture, variable-length/multiple-stream FFTs capable of achieving a high throughput can be performed in a hardware-efficient manner. A dual-optimized multipath multiplication scheme is also proposed to improve the area and energy efficiency of the multiple-path multiplier units in high-throughput FFT designs. The FFT processor supports high-throughput 128/256/512-point FFTs for WPAN, 1- to 4-stream 64/128-point FFTs for WLAN, and 128- to 1024-point FFTs for WMAN.
FIFO registers and complex multipliers dominate the area and power consumption at each stage of an FFT processor. To obtain high throughput, low hardware cost and low power consumption, [9] proposes an 8-path feedback (8PFB) structure for the FFT processor, implemented for OFDM-based ultra-wideband (UWB) communication systems. The 8PFB structure can halve the register reverse frequency of the MRMDF structure. The combination of two 64-point FFTs and one 128-point FFT is achieved through 8×8×2 mixed-radix arithmetic. The 8-path feedback parallel structure enables a throughput of 1 GS/s at a comparatively low clock frequency of 125 MHz, so it saves a large amount of power without sacrificing signal processing ability. The modified shift-add algorithm used there can remove the complex multipliers from the FFT processor.
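The throughput figure quoted for [9] is consistent with the usual relation between parallelism and clock rate, assuming each of the eight paths accepts one sample per clock cycle. The symbols T (throughput), P (number of parallel paths) and f_clk (clock frequency) are introduced here only for illustration:

```latex
\[
T = P \cdot f_{\mathrm{clk}} = 8 \times 125\,\mathrm{MHz} = 1\,\mathrm{GS/s}
\]
```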
A modified radix-2^5 algorithm for 512-point FFT computation and a high-speed eight-parallel data-path architecture for multi-gigabit WPAN systems are presented in [2]. The FFT processor provides high data throughput and low hardware complexity by using the eight-parallel data path and a multipath delay feedback (MDF) structure. It reduces the number of complex multiplications through the use of Booth multipliers and twiddle factor look-up tables. The MDF architecture, based on a multipath parallel structure, is used because a gigabit WPAN system requires a symbol rate of up to 2.5 GS/s.
Before [3], radix-2^k had been proposed only for Single-path Delay Feedback (SDF) architectures, not for feed-forward ones, which are also called Multipath Delay Commutator (MDC) architectures. That paper presents radix-2^k feed-forward (MDC) FFT architectures. In feed-forward architectures, radix-2^k can be used for any number of parallel samples that is a power of two. Furthermore, both decimation-in-frequency (DIF) and decimation-in-time (DIT) decompositions can be used. The proposed designs include radix-2^2, radix-2^3 and radix-2^4 architectures.
A 128- to 2048-point variable-length FFT processor for 4×4 MIMO-OFDM systems, with a 256-point FFT algorithm as the basic FFT core, is presented in [6]. A radix-4^2 algorithm is used to deal with four data sequences simultaneously, together with a butterfly sharing technique that improves hardware utilization: a radix-r butterfly is active for only 1/r of the cycles and idle for the rest. An FFT architecture that uses multiple data paths and feedback memory achieves high data throughput with less hardware complexity.
The memory allocation method is also modified, and a Butterfly and Multiplier Sharing (BMS) architecture is proposed.
A 128/64-point FFT/inverse FFT (IFFT) processor for a multiple-input multiple-output orthogonal frequency-division multiplexing based IEEE 802.11n wireless local area network baseband processor is presented in [7]. An unfolding mixed-radix multipath delay feedback (MDF) FFT architecture is proposed to deal efficiently with 1 to 4 simultaneous data sequences. Because a higher-radix FFT algorithm saves power, a three-step radix-8 FFT algorithm is chosen to reduce the number of complex multiplications. The mixed-radix multipath delay feedback (MRMDF) FFT architecture provides a higher throughput rate at minimal hardware cost by combining the features of MDC and SDF. The hardware costs of memory and complex multipliers are reduced by adopting delay feedback and data scheduling approaches.
A high-speed, low-complexity 128/64-point radix-2^4 FFT/IFFT processor for MIMO-OFDM systems is presented in [8]. The high-radix multipath delay feedback (MDF) FFT architecture provides a higher throughput rate and low hardware complexity by using a four-parallel data-path scheme. This radix-2^4 MDF (R2^4MDF) architecture combines the features of MDC and SDF to achieve a higher throughput rate at minimal hardware cost. It uses CSD complex constant multipliers and complex Booth multipliers with a Dadda reduction network, and keeps the input and output at a 10-bit width with an SQNR of 30 dB. The radix-2^4 FFT algorithm needs fewer multipliers than lower-radix FFT algorithms and efficiently reduces the multiplicative complexity.
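A canonical signed-digit (CSD) constant multiplier replaces a general multiplier with a fixed pattern of shifts, additions and subtractions. The sketch below shows the standard integer CSD recoding and a shift-add multiply built on it; it is a generic illustration, not the multiplier of [8], and the names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(value):
    """Canonical signed-digit form of a non-negative integer, least significant digit first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit            # leaves a multiple of 4, so the next digit is 0
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by the encoded constant using only shifts, adds and subtracts."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, the twiddle constant cos(π/4) with 8 fractional bits is round(0.7071 × 256) = 181, which recodes as 1 + 4 - 16 - 64 + 256. No two adjacent digits are nonzero, which is the property that bounds the number of adders in hardware.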
The FFT processor in [10] is organized according to the Single-path Delay Feedback (SDF) scheme and computes mixed-radix FFT algorithms with radices 2, 2^2, 2^3 and 2^4. The proposed SDF executes FFTs of sizes from 128 to 2048 in continuous flow by exploiting the memory of each stage to store the elements of any FFT frame size efficiently. The design handles FFT size variation without additional buffers or idle time for reconfiguration, while keeping the complexity and memory size comparable to those of a radix-2 SDF for 2048 points. When computing FFTs shorter than 2048 points, the technique bypasses the processing units that are not required in the computation, but exploits their local memory to pipeline contiguous FFT frames of either the same or different sizes.

IV. DESIGN OF FFT PROCESSOR
The previous section surveyed many FFT/IFFT processors and processing elements. Each uses one of the pipelined FFT architectures to reduce chip area; SDF, MDC and MDF architectures are the most common. In this work, we propose a mixed architecture that combines the SDF and MDC pipelined FFT architectures. A mixed-radix FFT is also used to improve performance.
The discrete Fourier transform (DFT) of N sub-carriers is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

Here x(n) and X(k) are complex numbers, k is the frequency index and n is the time index. W_N is the twiddle factor, which denotes the N-th primitive root of unity with its exponent evaluated modulo N. The twiddle factor is defined as

$$W_N^{nk} = e^{-j 2\pi nk / N} = \cos\left(\frac{2\pi nk}{N}\right) - j \sin\left(\frac{2\pi nk}{N}\right)$$

Applying eq. (5) and eq. (6) in eq. (1) gives, for N = 128, a 2×2×2×4×4 structure. The transform is thus divided into two stages. The first stage consists of the radix-2 DFTs with the SDF architecture, and the second stage consists of the radix-4 DFTs with the MDC architecture. Fig. 1 shows the mixed pipeline SDF-MDC architecture for the 128-point FFT.
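That a 128-point DFT factors into three radix-2 and two radix-4 steps can be checked numerically. The sketch below is a generic recursive Cooley-Tukey decomposition in decimation-in-time form, used only to confirm the factorization; it does not model the SDF or MDC data flow, and the order in which the recursion applies the radices need not match the hardware stage order. The names `mixed_radix_fft` and `direct_dft` are hypothetical.

```python
import cmath
import math


def direct_dft(x):
    """Reference DFT evaluated straight from eq. (1)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def mixed_radix_fft(x, radices):
    """DFT of x computed by splitting its length according to `radices`."""
    n = len(x)
    if math.prod(radices) != n:
        raise ValueError("product of radices must equal len(x)")
    if n == 1:
        return list(x)
    r, rest = radices[0], radices[1:]
    m = n // r
    # r interleaved sub-sequences, each transformed with the remaining radices.
    subs = [mixed_radix_fft(x[i::r], rest) for i in range(r)]
    out = [0j] * n
    for k in range(m):
        # Twiddle multiplication between the sub-transforms and the radix-r butterfly.
        t = [subs[i][k] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(r)]
        for q in range(r):
            # Radix-r butterfly: an r-point DFT across the twiddled values.
            out[k + q * m] = sum(t[i] * cmath.exp(-2j * cmath.pi * i * q / r)
                                 for i in range(r))
    return out
```

Calling `mixed_radix_fft(x, [2, 2, 2, 4, 4])` on a 128-sample list should agree with `direct_dft(x)` up to floating-point rounding.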
The first stage consists of three radix-2 butterfly units followed by twiddle factor multiplication units, as shown in Fig. 2. The second stage has two radix-4 butterfly units, one twiddle factor multiplication unit and two commutator units, as presented in Fig. 3. The twiddle factors used are those of the Radix-2, Radix-4 and Radix-128 FFT. Their values are stored in look-up tables and used in the design. Traditional butterfly units are used for both the SDF and MDC stages. Commutators are used to align the real and imaginary data properly.
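Two of the building blocks named above, the radix-4 butterfly and the twiddle look-up table, are easy to sketch in software. The code below is an illustrative model only: the fixed-point format (the `frac_bits` parameter) and the function names `mul_neg_j`, `radix4_butterfly` and `twiddle_lut` are assumptions, not details taken from the proposed design.

```python
import math


def mul_neg_j(z):
    """Multiply by -j without a multiplier: swap the parts and negate one."""
    return complex(z.imag, -z.real)


def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using additions and one trivial rotation."""
    s0, s1 = a + c, a - c
    s2, s3 = b + d, mul_neg_j(b - d)
    return s0 + s2, s1 + s3, s0 - s2, s1 - s3


def twiddle_lut(n, frac_bits):
    """Fixed-point (real, imag) pairs for W_n^k, k = 0..n-1, scaled by 2**frac_bits."""
    scale = 1 << frac_bits
    table = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        table.append((round(math.cos(angle) * scale),
                      round(-math.sin(angle) * scale)))
    return table
```

The radix-4 butterfly itself needs no general multiplier, so in a structure like the one described only the inter-stage twiddle factors, read from a table such as `twiddle_lut(128, frac_bits)`, require true complex multiplication.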

V. VLSI IMPLEMENTATION OF FFT/IFFT PROCESSOR IN MIMO-OFDM TRANSCEIVER
Various FFT/IFFT processors have been implemented on software platforms. In the past decade, hardware implementation of FFT/IFFT processors has grown significantly. FFT/IFFT processors have been implemented at chip level, and their area, power and performance are discussed in [1], [11] and [12]. The paper [10] concentrates on the development of a 32-bit FFT using the radix-2 algorithm, based on decimation in time (DIT), in the Verilog hardware description language and its realization on a Xilinx FPGA chip. The paper [11] presents a real-time FPGA prototype for a 4-stream MIMO-OFDM transceiver capable of transmitting 216 Mbit/s in a 20 MHz bandwidth. The 128-point mixed radix-2/4 pipelined SDF-MDC FFT/IFFT processor has been implemented on an FPGA using Xilinx PlanAhead. Table 1 gives its area analysis alongside other 128-point FFT implementations. The table shows that slice register usage is reduced by a factor of three compared with the normal 128-point FFT and by a factor of two compared with the mixed-radix case. Look-up table (LUT) usage is reduced considerably, and far fewer counters and adder trees are used than in the previous work.
From these observations, the proposed mixed-radix algorithm uses fewer logic gates, adders and counters. The proposed algorithm therefore gives better area optimization, two to three times better than the earlier works.
VI. CONCLUSION
From this analysis, it is evident that the MDC FFT architecture has promising features for the implementation of FFT/IFFT processors in MIMO-OFDM transceivers. At the same time, the SDF architecture offers efficient use of delay units and needs fewer multipliers. By using both the SDF and MDC FFT architectures, better performance can be obtained in an FPGA implementation. In summary, mixed-radix FFT algorithms combined with SDF and MDC FFT architectures offer a clear advantage over traditional FFT implementations. The proposed mixed-radix algorithm uses fewer logic gates, adders and counters, and is thus up to three times better in area than earlier implementations. The work can be extended by optimizing the multipliers to reduce area utilization further.