# Low Power Complex Multiplier based FFT Processor

V.Sarada<sup>1</sup>, Dr.T.Vigneswaran<sup>2</sup> <sup>1</sup>ECE, SRM University, Chennai,India saradasaran@gmail.com <sup>2</sup>ECE, VIT University, Chennai,India vigneshvlsi@gmail.com

*Abstract*- High speed processing of signals has led to the requirement of very high speed conversion of signals from time domain to frequency domain. Recent years there has been increasing demand for low power designs in the field of Digital signal processing. Power consumption is the most important aspect while considering the system performance. In order to design high performance Fast Fourier Transform (FFT) and realization, efficient internal structure is required. In this paper we present FFT Single Path Delay feedback (SDF) pipeline architecture using radix -2<sup>4</sup> algorithm .The complex multiplier is realized by using Digit Slicing Concept multiplier less architecture. To reduce computation complexity radix 2<sup>4</sup> algorithms is used. The proposed design has been coded in Verilog HDL and synthesizes by Cadence tool. The result demonstrates that the power is reduced compared with complex multiplication used CSD (Canonic Signed Digit) multiplier.

Keyword- Fast Fourier Transform, Multiplier less multiplier, radix 2<sup>4</sup>

## 1. INTRODUCTION

The Fast Fourier Transform and Inverse Fast Fourier Transform (IFFT) have played important roles in communication applications. The demand for variable length, high speed and low power FFT has increased, especially Orthogonal Frequency Division Multiplexing (OFDM) application such as 3GPP, LTE(Long Term Evaluation), WiMax(Worldwide Interoperability for Microwave Access).There are various architecture to implement FFT such as memory based, pipeline based. In memory based architecture hardware cost and power consumption are lower but has long latency.In pipeline based architecture are acceptable hardware cost, high throughput and low latency .Mostly Preferred design in pipelined architecture are SDF[1] and Multipath Delay Commutator(MDC).The SDF is good ,it require less memory space about(N-1) delay element and a multiplication computation less than 50 % also design control unit is simple based on these reason SDF Architecture is adopted in this paper. Various FFT algorithm start with Cooley-Tukey algorithm can reduce computation complexity from  $O(N^2)$  to  $O(N \log_2 N)$ . Higher radix FFT algorithm reduces computation complexity for the algorithm makes it suitable for VLSI implementation. In this paper we propose radix 2<sup>4</sup> algorithm, SDF pipeline Architecture based FFT Processor with Digit slicing multiplier for complex multiplication [3].

This paper is organized as follows; Section II explains the FFT algorithms, section III shows FFT architecture, Section IV complex constant multiplier. In section V the proposed design are compared with existing one. Finally the contribution of this work is summarized.

### II. FFT ALGORITHMS

An N-point DFT of a sequence x[n] is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n) W_{N}^{nk}, k = 0, 1, ..., N - 1$$

$$w_{N}^{nK} = e^{-j2\Pi nk/N} = \cos(\frac{2\Pi nk}{N}) - j\sin(\frac{2\Pi nk}{N})$$
(1)

Where k is frequency index, n is time index and the twiddle factor  $W_N^{nk}$  are the complex roots of unity, they are symmetric and equally spaced on the unity circle.

The straight forward implementation of this algorithm is impractical due to high hardware requirement. Therefore FFT was developed [2] to reduce computation time and hardware cost. Our work uses DIF (Decimation In Frequency) decomposition because it facilitate SDF. The radix- $2^{k}$  algorithm has the same butterfly structure regardless of k value. The radix  $2^{4}$  was formulated [1] using K=4 dimension linear index mapping. Radix  $2^{4}$  algorithm can be expressed as various formula using Common factor algorithm. The Radix  $2^{4}$  algorithm is given as follows.

Applying a 5 dimensional linear index mapping

 $X(k_{1}+2k_{2}+4k_{3}+8k_{4}+16k_{5}) = \sum_{n_{3}=0n_{4}=0n_{3}=0n_{2}=0n_{1}=0}^{\frac{N}{16}-1} \sum_{n_{3}=1}^{1} \sum_{n_{3}=0n_{4}=0n_{3}=0n_{2}=0n_{1}=0}^{1} x(\frac{N}{2}n_{1}+\frac{N}{4}n_{2}+\frac{N}{8}n_{3}+\frac{N}{16}n_{4}+n_{5})W_{N}^{nk}$ (2) With the cascade decomposition the twiddle factor can be expressed in the form

$$W_{N}^{nk} = \{(-1)^{n_{1}k_{1}}(-j)^{n_{2}(k_{1}+2k_{2})}W_{8}^{n_{3}(k_{1}+2k_{2}+4k_{3})} \\ W_{16}^{n_{4}(k_{1}+2k_{2}+4k_{3}+8k_{4})}\}W_{N}^{n_{5}(k_{1}+2k_{2}+4k_{3}+8k_{4})}W_{N/16}^{n_{5}k_{5}}$$
(3)

The signal flow graph for 16 point radix  $2^4$  algorithm is shown in Fig. 1



Fig 1.Signa flow graph for 16 point radix 2<sup>4</sup> algorithm

# III. FFT ARCHITECTURE

There are many pipeline architecture to design FFT Processor for reducing complexity, power and overall to improve the performance. Among the structure three kind of pipeline architecture are used such as Multipath Delay Commutator(MDC),Single path delay Commutator(SDC) and Single path Delay feedback(SDF).In proposed FFT SDF architecture hardware and control circuit complexity is reduced. The proposed FFT processor shown in Fig.2. The hardware requirements of different pipelined SDF architecture are listed in Table 1.

TABLE 1. Comparison of Hardware requirement for N-length with different SDF architecture

|                     | Complex               | Complex     | Register | Contro  |
|---------------------|-----------------------|-------------|----------|---------|
|                     | multiplier            | adder       |          | 1       |
|                     |                       |             |          | circuit |
| Radix2              | $2(Log_4N-1)$         | $4(Log_4N)$ | N-1      | Simple  |
| SDF                 |                       |             |          |         |
| Radix2 <sup>2</sup> | Log <sub>4</sub> N-1  | $4(Log_4N)$ | N-1      | Simple  |
| [1]SDF              | -                     | _           |          | _       |
| Radix2 <sup>3</sup> | Log <sub>8</sub> N-1  | $4(Log_4N)$ | N-1      | Simple  |
| [1]SDF              | -                     | _           |          | _       |
| Radix2 <sup>4</sup> | Log <sub>16</sub> N-1 | $4(Log_4N)$ | N-1      | Simple  |
| SDF                 |                       |             |          | _       |



Fig. 2 16 point radix 2<sup>4</sup> SDF architecture

It is composed of Radix-2 butterfly unit, Delay Buffer (FIFO) for support time multiplexing,-j for non-trivial multiplication, complex multiplier, and a control unit.

The Dual port FIFO function as shift register, it is used to accept data from butterfly unit and then shift left and feed back to the same butterfly unit.

The operation of radix-2 butterfly is shown in Fig 3



Fig. 3 Radix-2 butterfly

#### **IV. COMPLEX MULTIPLIER**

The complex multiplication is regarded as most important for FFT operation as it affects the speed performance of the digital signal processing system.

The complex multiplication with three multiplier is given by expression (4) and shown in Fig. 4

 $(Ar+jAi)^{*}(Br+jBi) = \{Br(Ar-Ai) + Ai(Br-Bj)\} + j\{Bi(Ar+Ai) + Ai(Br-Bj)\}$ 



Fig. 4 Complex multiplication

In order to reduce the complexity of computation and to enhance the throughput of complex multiplication the digit slicing concept is applied to the multiplier.

#### A. Digit Slicing multiplier

The concept behind the digit slicing architecture is any binary number can be sliced into a few blocks of shorter binary numbers, each having a shorter word length. [4] Depending on the word length, the fundamental sliced algorithm is applied to the binary number.

$$F=F_{R}+jF_{I}$$

$$F=\sum_{k=0}^{b-1} (2^{p-1}) FR_{k} + j \sum_{k=0}^{b-1} (2^{p-1}) FI_{k}$$
Where  $F_{Rk}=\sum_{j=0}^{p-1} 2^{j} F_{Rk,j}$  and  $F_{Ik}=\sum_{j=0}^{p-1} 2^{j} F_{Ik,j}$ 
(5)

(4)

In this equation  $FR_k$  and  $FI_k$  have values which are either zero or one. Any value whose absolute value is less than one can be represented in two's complement as  $x = [\sum_{k=0}^{b-1} 2^{pk} X_k] 2^{-1(pb-1)}$ . Here x is any number with an absolute value less than one and x is sliced into b blocks, each block being p bits wide  $X_k = \sum_{j=0}^{p-1} 2^j X_{k,j}$ 

For example X=A\*B In this multiplication one of the operand (A) divided into four parts as shown in Fig. 5



Fig. 5 Digit slice

A divided into four parts Such as

 $part1 = A_3A_2A_1A_0$   $part2 = A_7A_6A_5A_4,$   $part3 = A_{11}A_{10}A_9A_8$  $part4 = A_{15}A_{14}A_{13}A_{12}.$ 

There are four different cases for the multiplication between the four bits and the twiddle factors. Fig 6 shows the block diagram of the digit-slicing multiplier less using the shift and addition technique. Shift-and-add multiplication is similar to the multiplication performed by paper and pencil.

$$\begin{split} &K_0 = (A_3 A_2 A_1 A_0)^* B, \\ &K_1 = (A_7 A_6 A_5 A_4)^* B, \\ &K_2 = (A_{11} A_{10} A_9 A_8)^* B, \\ &K_3 = (A_{15} A_{14} A_{13} A_{12})^* B \\ &X = A^* B = K_0 + 2^4 K_1 + 2^8 K_2 + 2^{12} K_3 \end{split}$$

(7)

|                                                        | A, 'B                       | ~·                          |
|--------------------------------------------------------|-----------------------------|-----------------------------|
| A.0                                                    | A, 'B                       | · · n                       |
| + + + + + + + + +                                      |                             |                             |
| Ass                |                             | Aus Aus Aus Aus             |
|                                                        |                             | ***                         |
| B++3 B++2 B++1 B 0 B++3 B++2 B++1 B 0 B+               | Head Beer Beet H O          | B-+3 B++2 B++1 B            |
| Adder Adder                                            | Adder                       | Adder                       |
| Arithmetic left shift by 3 Arithmetic right shift by 1 | Arithmetic right shift by 5 | Arithmetic right shift by 9 |
|                                                        | +                           | +                           |
| Adder                                                  |                             |                             |

Fig. 6 Complex multiplication using Digit slicing

### B. CSD Multiplier

The Canonic signed digit (CSD) representation to realize the function of complex multiplier in non-trivial multiplication. The complex multiplier performs the computation of a constant factor and a non-constant number. The real and imaginary parts of the constant factor are represented in CSD form. The corresponding parts of the non-constant factor are represented in two's complement form. This method has potential advantage of easier circuit structures than the conventional complex multiplier.

| Coefficient | 16 Bit Expression |                    |  |  |
|-------------|-------------------|--------------------|--|--|
| Value       | Binary            | CSD                |  |  |
| 0.923828    | 0111011001000000  | 100ī 10 ī          |  |  |
|             |                   | 001000000          |  |  |
| 0.707092    | 0101101010000010  | 0110 ī             |  |  |
|             |                   | 01010000010        |  |  |
| 0.382629    | 0011000100000110  | 010 ī 0001000010 ī |  |  |
|             |                   | 0                  |  |  |

TABLE 2. Represent of binary and CSD

The CSD unit consists of many shifters and adders. It is used to realize the multiplication when the one factor is constant. The CSD unit is used to construct number that contains the minimum possible number of non-zero bits.



Fig. 7. CSD unit to realize the multiplication of 0.707092

0.707092= [(x>>1 + x>>2) - (x>>4+ x>>6 + x>>8)]

(8)

Where x denotes input data, can be shared by two or more constant multiplications. The values in the diamonds represent the times of right shift. So this realization only needs four adders/subtractors [5].

### V. RESULT

The architecture of the proposed FFT processor was designed in Verilog and simulated to verify its functionality. The simulation and synthesis were performed using the cadence design tool 180 nm CMOS Technology. Table 3 shows the performance comparison between the proposed and FFT processor using existing CSD multiplier based FFT. The results shows the proposed FFT processor obtain low power when compared with CSD based processor with little increase in area.

| Generated by: Enc         | ounter(R) RTL Compiler v11.20-s017_1               |
|---------------------------|----------------------------------------------------|
| Generated on:             | Mar 24 2015 03:10:50 pm                            |
| Module:                   | sixteenpointfft_ver2                               |
| Technology library:       | tsmc18 1.0                                         |
| Operating conditions:     | slow (balanced_tree)                               |
| Wireload mode:            | enclosed                                           |
| Area mode:                | timing library                                     |
|                           | Leakage Dynamic Total                              |
| Instance Cel              | ls Power(nW) Power(nW) Power(nW)                   |
| sixteenpointfft_ver2 389  | 2 2895.751 6694217.130 6697112.881                 |
| Generated by:             | Encounter(R) RTL Compiler v11.20-s017 1            |
| Generated on:             | Mar 24 2015 02:58:24 pm                            |
| Module:                   | sixteenpointfft_digislice                          |
| Technology library:       | tsmc18 1.0                                         |
| Operating conditions:     | slow (balanced_tree)                               |
| Wireload mode:            | enclosed                                           |
| Area mode:                | timing library                                     |
| Instance                  | Leakage Dynamic Total<br>Cells Power(nW) Power(nW) |
| sixteenpointfft_digislice | 4022 4036.295 4599368.597 4603404.892              |

|                   |     | Word   | Power | Frequency | Area(No. | slice |
|-------------------|-----|--------|-------|-----------|----------|-------|
|                   |     | length | (mW)  | (MHz)     | used)    |       |
| FFT(using C       | CSD | 16     | 6.69  | 166.6     | 3892     |       |
| multiplier based) |     |        |       |           |          |       |
| Proposed          |     | 16     | 4.60  | 166.6     | 4022     |       |

TABLE 3. Power and area Analysis

## VI. CONCLUSION

In this paper, low power 16 point FFT processor using Digit slice based multiplier less multiplier has been designed and 16 point FFT processor using CSD multiplier has also been designed. The result shows that our design lowers hardware cost and power consumption. Our proposed scheme can also be adapted to high point FFT Fixed and variable FFT processor which is used in various OFDM based communication.

#### ACKNOWLEDGEMENT

The author would like to thank SRM University Research lab for supporting this work.

## REFERENCES

- [1] S.He and M.Torkelson,"Designing pipeline FFT processor for OFDM (de) modulation, vol 29, Oct 1998 pp 257-262.
- J.W. Cooley and J.W.Tukey,"An algorithm for the machine calculation and complex Fourier series", Math Comp, vol 19 pp 297-301 Apr. 1965
- [3] Yazan Samir Algnabi, FuratA.Aldaamee Rozita Teymourzadeh, Masuri Othman, Md Shabiul Islam," Novel Architecture of Pipeline Radix 2<sup>2</sup> SDF FFT Based on Digit-Slicing Technique" IEEE-ICSE2012 Proc., 2012, Kuala Lumpur, Malaysia pp 470-474.
- [4] Jung-yeol and Myoung-seob LIM,"New Radix-2 to 4<sup>th</sup> power pipeline FFT processor", IEICE Trans Electron vol 88-c ,No 8 Aug 2005.
- [5] Bahram Rashidi,"High performance and low power finite impulse response filter based on ring topology with modified retiming serial Multiplier on FPGA", IET Signal process 2013 vol 7 pp 743-753.
- [6] Sharrif Z. A. M. "Digit slicing architecture for real time digital filters". PhD thesis. Loughborough University. UK, 1980.
- [7] Hung-Yu Wang, Jhong-Jhou Wu, Chun-Wei Chiu and Yu-Hsuan Lai ,"A Modified Pipeline FFT Architecture", International Conference on Electrical and Control Engineering, 2010.