Implementation of Radon-Framelet Based OFDM on FPGA Platform

Abstract: This article proposes an efficient design and implementation of Framelet-based Orthogonal Frequency Division Multiplexing (OFDM) with the Finite Radon Transform (FRAT), named FRAT-FT-OFDM, on a Field Programmable Gate Array (FPGA) platform. The transceiver of the FRAT-FT-OFDM system is implemented with a modern FPGA design tool, Xilinx System Generator (XSG). The FPGA implementation is carried out on a Zynq (XC7Z020-1CLG484) evaluation board with Joint Test Action Group (JTAG) hardware co-simulation. The performance of the implemented architecture has been validated through hardware co-simulation in terms of utilized area and consumed power. The results show that the system was implemented efficiently, with few resources and little consumed power, and can support real-time operation.

This article aims to present an efficient design and implementation of the FRAT-FT-OFDM model [8] on an FPGA platform using a higher-abstraction-level tool, Xilinx System Generator (XSG), which is part of the System Edition of the Vivado® Design Suite. The Vivado Design Suite (v2015.2) is used in this work. The performance of the proposed architecture has been validated through hardware co-simulation in terms of utilized area and consumed power.
The rest of this article is organized as follows. Section II gives an overview of the FRAT-FT-OFDM system. Section III presents the design of FRAT-FT-OFDM on the FPGA platform. Section IV discusses the hardware co-simulation. Finally, Section V concludes the article.

II. FRAT-FT-OFDM SYSTEM
Fig. 1 depicts the basic blocks of the FRAT-FT-OFDM transceiver. At the transmitter, a serial binary input data stream is converted to parallel form by a serial-to-parallel (S/P) converter to construct symbols and create a (p×p) matrix (A), which is the first step in the FRAT mapping technique. In the FRAT mapping procedure, a two-dimensional FFT (2D-FFT) is applied to A, and the resulting matrix is reordered with the optimal ordering algorithm [8] to produce the optimal matrix (F_opt). A one-dimensional IFFT (1D-IFFT) is then applied to each column of F_opt to obtain the FRAT matrix.
In order to increase the bits per Hertz of the mapping, a complex matrix is constructed from the real FRAT matrix according to equation (1) [8], where r_{l,m} denotes the elements of the complex matrix and r_{i,j} denotes the elements of the real matrix.
After that, a set of zero symbols is appended to the end of the signal to reduce adjacent-carrier interference. The N_f-point inverse FT (IFT) is then applied to the signal to achieve orthogonality between subcarriers. The IFT is computed as a matrix multiplication between the reconstruction matrix (W_2), given in equation (2), and the input signal [8]; h_0, h_1, and h_2 are the low-pass and the two high-pass filter coefficients, respectively. Finally, the transformed data are converted to serial form by a parallel-to-serial (P/S) converter and sent to the receiver over the wireless channel.
At the receiver, the inverse of each transmitter block's operation is performed to recover the correct data stream [8].

III. XILINX DESIGN OF FRAT-FT-OFDM
The FPGA design of the FRAT-FT-OFDM model has been carried out with the XSG tool. In our design, 4-FRAT mapping, which is equivalent to 16-QAM, is selected because it can easily be modified to implement more complex mapping schemes, such as 6-FRAT and 8-FRAT, which are equivalent to 64-QAM and 256-QAM, respectively. The FRAT matrix dimensions (i.e., those of matrix A) are set to (3×3). The number of subcarriers (N_f) used in this implementation is 8, whilst the number of useful subcarriers is 6, so two zeros are padded.
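The dimension bookkeeping implied by these parameters can be sketched in a few lines of Python. This is only an illustrative check, not part of the XSG design; the variable names are hypothetical, and the (3×4) and (3×2) matrix sizes are the ones quoted for the reordering and complex-matrix steps later in this section.

```python
# Dimension bookkeeping for the configuration described in the text
# (p = 3, two bits per 4-FRAT symbol, N_f = 8). Names are illustrative only.
p = 3                 # matrix A is p x p
bits_per_symbol = 2   # two consecutive input bits form one 4-FRAT symbol
n_subcarriers = 8     # N_f

symbols_per_block = p * p                           # entries of matrix A
bits_per_block = symbols_per_block * bits_per_symbol
frat_shape = (p, p + 1)                             # real matrix after reordering and IDFT
complex_shape = (p, (p + 1) // 2)                   # after combining real values into complex ones
useful_subcarriers = complex_shape[0] * complex_shape[1]
zero_padding = n_subcarriers - useful_subcarriers

print("bits per block:", bits_per_block)
print("real FRAT matrix:", frat_shape, "complex matrix:", complex_shape)
print("useful subcarriers:", useful_subcarriers, "padded zeros:", zero_padding)
```

With these values, one (3×3) block carries 18 input bits and fills 6 of the 8 subcarriers, which matches the two padded zeros stated above.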
Each block in the FRAT-FT-OFDM system is designed and tested as a relatively independent subsystem using XSG with the aid of MATLAB SIMULINK. All subsystems are then combined to create the entire system. The following subsections detail each subsystem block.

A. Xilinx Design of the Transmitter
Fig. 2 depicts the XSG design for the transmitter side of FRAT-FT-OFDM. As can be seen, this design consists of four subsystems. The "Gateway In" and "Gateway Out" blocks define the inputs and outputs of the Xilinx design, while the "System Generator" token serves as a control panel for system and simulation parameters and is also used to automatically compile designs into low-level representations. The concept design of each subsystem block, with its functionality and specifications, is illustrated in the following subsections:

1) S/P Subsystem
This subsystem converts the input serial bits to parallel form and constructs the symbols, which are then arranged in a (3×3) matrix (i.e., matrix A). The rows of matrix A are passed sequentially to the next subsystem. The XSG block construction of this subsystem is shown in Fig. 3. The "Serial-to-Parallel" block converts each two consecutive input bits into a single output that is mapped to the corresponding 4-FRAT symbol; the clock frequency after this block is therefore halved. The multiplexer ("Mux") blocks arrange the elements of each row of the matrix and are controlled by the "Counter" and "MCode" blocks. After the entry of the first vector is completed, the "Constant" block with zero value resets the "Mux" blocks until the next input vector is generated.
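As a software analogue of this subsystem, the following minimal Python sketch groups a bit stream into pairs and fills a (3×3) matrix row by row. It assumes, for illustration only, that each bit pair is mapped MSB-first to an integer symbol in {0, 1, 2, 3}; the actual symbol alphabet and the function name `bits_to_matrix` are not taken from the XSG design.

```python
def bits_to_matrix(bits, p=3):
    """Pair up bits, map each pair to a symbol, and fill a p x p matrix row by row.

    Assumption: the pair (b1, b0) maps to the integer 2*b1 + b0.
    """
    if len(bits) != 2 * p * p:
        raise ValueError("expected exactly 2*p*p bits for one matrix")
    symbols = [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]
    return [symbols[r * p:(r + 1) * p] for r in range(p)]


bits = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
A = bits_to_matrix(bits)
for row in A:
    print(row)
# A is [[0, 1, 2], [3, 1, 0], [3, 2, 1]]
```

Because two input bits produce one symbol, the symbol stream runs at half the bit rate, which mirrors the halved clock frequency after the "Serial-to-Parallel" block.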

2) FRAT Mapping Subsystem
The FRAT mapping procedure consists of four steps. Each step is designed in XSG as a separate subsystem, and the subsystems are then combined to construct the FRAT mapping subsystem, as shown in Fig. 4.
Step 1: 2D-DFT Subsystem
The first three subsystems in the FRAT mapping design apply the 2D-DFT to matrix A. The 2D-DFT is computed using Row-Column (RC) decomposition [11], where the 1D-DFT is computed for each row and then for each column of the resulting matrix. RC decomposition gives a high-performance implementation of the 2D-DFT, but it is not favored for large matrix sizes because of the increasing complexity of the design.
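The RC decomposition can be illustrated with a short Python sketch that mirrors the three hardware stages (row DFT, transpose, column DFT) and checks the result against the direct 2D-DFT definition. It uses the standard unnormalized DFT; the function names and the example matrix are hypothetical and not taken from the XSG model.

```python
import cmath


def dft(x):
    """Standard unnormalized 1D-DFT of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dft2_rc(a):
    """2D-DFT by Row-Column decomposition."""
    rows = [dft(r) for r in a]                 # stage 1: 1D-DFT of each row
    cols = [dft(c) for c in transpose(rows)]   # stages 2 and 3: transpose, then 1D-DFT
    return transpose(cols)                     # restore the original orientation


def dft2_direct(a):
    """2D-DFT computed directly from its definition (reference only)."""
    p, q = len(a), len(a[0])
    return [[sum(a[m][n] * cmath.exp(-2j * cmath.pi * (u * m / p + v * n / q))
                 for m in range(p) for n in range(q))
             for v in range(q)] for u in range(p)]


A = [[0, 1, 2], [3, 1, 0], [3, 2, 1]]
F_rc = dft2_rc(A)
F_ref = dft2_direct(A)
max_err = max(abs(F_rc[u][v] - F_ref[u][v]) for u in range(3) for v in range(3))
print("max difference between RC and direct 2D-DFT:", max_err)
```

The two results agree to within floating-point rounding, which is why the hardware can be built from 1D-DFT subsystems and a transpose stage only.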
The first subsystem performs the 1D-DFT on each row of matrix A, computed using equation (3) [12]. Fig. 5 shows the XSG model of this subsystem. In this figure, the "Xilinx Constant Multiplier" blocks perform the multiplications, while the "AddSub" blocks combine the results. The input of this subsystem consists of real numbers only (the imaginary part is 0), which reduces the number of computational operations.
The second subsystem in the 2D-DFT design transposes the input matrix. The next subsystem expects its input in row form, whereas the 1D-DFT must now be performed on each column; this subsystem therefore transposes the matrix so that each column is presented as a row before entering the next subsystem. Fig. 6 illustrates the XSG design of this subsystem: the "Mux" blocks arrange the elements in column form and are controlled by the "Counter" and "MCode" blocks, while the "Delay" blocks control the entry and exit times of the elements. The last subsystem in the 2D-DFT design computes the 1D-DFT on each input column coming from the second subsystem. The XSG design of this subsystem is shown in Fig. 7. Here the 1D-DFT design is more complex than the subsystem in Fig. 5, because the imaginary part must be taken into account.
Step 2: Reordering Subsystem
Fig. 8 depicts the XSG design of the reordering subsystem. This design rearranges the elements of the (3×3) matrix produced by the 2D-DFT of A and converts it to the (3×4) matrix F_opt (as shown in [13]), using several "Time Division Demultiplexer" and "Time Division Multiplexer" blocks. The left part of the figure reorders the real part of the elements, while the right part reorders the imaginary part.
Step 3: IDFT Subsystem
Fig. 9 shows the XSG design of the Inverse DFT (IDFT) subsystem. The 1D-IDFT has been computed using the following equation [12].
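For reference, the standard N-point 1D-IDFT is given below. It is assumed here to be the form intended, with N = 3 for the three-element columns of F_opt; the 1/N normalization is the common convention and may differ from the one used in [12].

```latex
x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{\,j 2\pi k n / N},
\qquad n = 0, 1, \dots, N-1
```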
As pointed out previously, the multiplication and addition operations are performed using the "Xilinx Constant Multiplier" and "AddSub" blocks, respectively. The imaginary part of the output of this step is always zero, so "Terminator" blocks discard it and only real numbers are passed to the next subsystem.
Step 4: Construct the Complex Matrix Subsystem
Fig. 10 illustrates the XSG design of this subsystem, which serves two purposes. The first is to construct the complex matrix, which is the last step of the FRAT mapping technique. This matrix is built according to equation (1), so the (3×4) real matrix is converted into a (3×2) complex matrix. The second is to control the input flow to the next subsystem: the (3×2) matrix is converted to a single vector of length 6, and the real and imaginary components of this vector are then passed through the same next subsystem (i.e., the IFT subsystem), so that each output consists of real values followed by imaginary values. This makes the design more efficient and requires fewer resources.

3) IFT Subsystem
Fig. 11 depicts the Xilinx design of the 8-point IFT subsystem. This subsystem performs the IFT on an input vector of eight elements, two of which are zeros padded to the end of the vector. The IFT is performed as a matrix multiplication between W_2 (given in equation (2)) and the input vector.
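The resource-sharing scheme of Step 4 works because a real-valued transform matrix acts on the real and imaginary parts independently. The Python sketch below illustrates this under two stated assumptions: that W_2 is real-valued, and that an arbitrary real 8×8 matrix (`w2_standin`, which is NOT the framelet matrix of equation (2)) can stand in for it. It passes the real part and then the imaginary part through one real matrix multiplication and compares the result with a direct complex multiplication.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


N_F = 8
# Hypothetical real-valued stand-in for W_2; the values are arbitrary.
w2_standin = [[float((3 * i + 5 * j) % 7 - 3) for j in range(N_F)]
              for i in range(N_F)]

# Six complex symbols (the length-6 vector) followed by two padded zeros.
symbols = [1 + 2j, -1 + 0.5j, 3 - 1j, 0.5 + 0j, -2 - 2j, 1.5 + 1j]
x = symbols + [0j, 0j]

# One complex matrix multiplication (reference).
direct = matvec(w2_standin, x)

# Time-shared version: the real part first, then the imaginary part,
# both through the same real-valued multiplication.
real_out = matvec(w2_standin, [v.real for v in x])
imag_out = matvec(w2_standin, [v.imag for v in x])
shared = [complex(r, i) for r, i in zip(real_out, imag_out)]

max_err = max(abs(d - s) for d, s in zip(direct, shared))
print("max difference between direct and time-shared results:", max_err)
```

Since both paths give the same output, a single real-valued IFT datapath used twice can replace a full complex datapath, at the cost of extra latency.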

4) P/S Subsystem
The XSG design of the P/S converter subsystem is illustrated in Fig. 12. First, the "Time Division Multiplexer" block converts the incoming parallel data to one serial sequence consisting of the real components followed by the imaginary components. Then, two "Mux" blocks separate the real and imaginary parts of the data so that they can be transmitted simultaneously.

B. Xilinx Design of the Receiver
Fig. 13 illustrates the XSG design of the receiver side of the FRAT-FT-OFDM system. The concept design of each subsystem block, with its functionality and specifications, is illustrated in the following subsections:

1) S/P Subsystem
This subsystem converts the serial input data to parallel form while controlling the access of the parallel data to the next subsystem (i.e., the FT subsystem): the real part enters the FT subsystem first and, after a few sample periods, the imaginary part enters the same subsystem, which reduces resource utilization. The XSG design of this subsystem is illustrated in Fig. 14. A "Mux" block arranges the flow of the input data with the aid of the "Counter" and "MCode" blocks; its d0 and d1 input ports pass the real and imaginary components, respectively, while d2 resets the multiplexer between input vectors. First, the serial real components pass through the multiplexer to the "Time Division Demultiplexer" block to be converted to parallel form. After 12 sampling periods, set by the "Delay" block, the imaginary components pass to the same "Time Division Demultiplexer" block to be converted to parallel form as well.

2) FT Subsystem
The purpose of this subsystem is to perform the 8-point FT on the input vector, consisting of twelve elements. The FT is computed as a matrix multiplication between the transformation matrix W_1 (the transpose of W_2 given in equation (2)) and the input vector.
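The relationship between the transmitter IFT (multiplication by W_2) and the receiver FT (multiplication by W_1, the transpose of W_2) can be illustrated with a small Python sketch. The framelet matrix of equation (2) is not reproduced here; a normalized 8×8 Hadamard matrix is swapped in as a stand-in purely because it is orthogonal, so that W_1 W_2 = I holds. All names are hypothetical.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


N_F = 8
# Stand-in for W_2: an orthogonal matrix, NOT the framelet reconstruction matrix.
w2 = [[v / math.sqrt(N_F) for v in row] for row in hadamard(N_F)]
w1 = transpose(w2)  # receiver matrix, the transpose of W_2 as stated in the text

x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.0, 0.0, 0.0]  # six values plus two padded zeros
tx = matvec(w2, x)    # transmitter side: inverse transform
rx = matvec(w1, tx)   # receiver side: forward transform

max_err = max(abs(a - b) for a, b in zip(x, rx))
print("max reconstruction error:", max_err)
```

Under the stated orthogonality assumption, the receiver recovers the transmitted vector, including the two padded zeros that are removed in the next subsystem.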

3) FRAT Demapping Subsystem
FRAT demapping also consists of four steps. Each step is designed in XSG as a separate subsystem, and these subsystems are then combined to form the FRAT demapping subsystem, as shown in Fig. 15.
Step 1: Construct Real Matrix Subsystem
Fig. 16 illustrates the XSG design of this subsystem, which serves two purposes. The first is to remove the zero padding that was added at the transmitter. The second is to construct a real matrix with (3×4) dimensions from the complex numbers. Each data item arriving from the previous subsystem consists of real components followed by imaginary components, so to construct a real matrix the two components are first separated and then rearranged. This is achieved using "Time Division Demultiplexer" and "Time Division Multiplexer" blocks.
Step 2: 1D-DFT Subsystem
In this subsystem, the 1D-DFT is computed on each column of the matrix constructed in Step 1. The XSG design of this subsystem was illustrated in Fig. 5, so its description is not repeated here.
Step 3: Retrieve the Original Ordering Subsystem
Fig. 17 depicts the XSG design of the subsystem that retrieves the original ordering; it rearranges the elements of the (3×4) matrix back into a (3×3) matrix. Like the Reordering subsystem in the transmitter, this design is implemented using "Time Division Demultiplexer" and "Time Division Multiplexer" blocks.
Step 4: 2D-IDFT Subsystem
The last step in the FRAT demapping design applies the 2D-IDFT to the (3×3) input matrix. The 2D-IDFT is computed by inverting the 2D-DFT procedure, using Column-Row (CR) decomposition: the 1D-IDFT is computed for each column and then for each row of the resulting matrix. The XSG design of this subsystem therefore follows the same procedure as the 2D-DFT design in the transmitter, which was described in detail previously.
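A minimal Python sketch of the CR decomposition is given below. It applies a row-then-column 2D-DFT to a small real matrix and then recovers the matrix with a column-then-row 2D-IDFT. The standard DFT definitions with 1/N normalization on the inverse are assumed, and the function names and example matrix are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Standard 1D-DFT, or the 1D-IDFT (with 1/N scaling) when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / n if inverse else 1
    return [scale * sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dft2_rc(a):
    """Transmitter side: rows first, then columns."""
    rows = [dft(r) for r in a]
    return transpose([dft(c) for c in transpose(rows)])


def idft2_cr(f):
    """Receiver side: 1D-IDFT of each column, then of each row."""
    cols = [dft(c, inverse=True) for c in transpose(f)]   # column-wise IDFT
    return [dft(r, inverse=True) for r in transpose(cols)]  # row-wise IDFT


A = [[0, 1, 2], [3, 1, 0], [3, 2, 1]]
recovered = idft2_cr(dft2_rc(A))
max_err = max(abs(recovered[m][n] - A[m][n]) for m in range(3) for n in range(3))
print("max round-trip error:", max_err)
```

The recovered values have a negligible imaginary part, which is consistent with the final subsystem discarding the imaginary output.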

4) Decision and P/S Subsystem
This subsystem applies a suitable threshold decision to the incoming data to complete the FRAT demapping design, and then converts the data to binary serial form, which is the final step in the design of the receiver side. Fig. 18 illustrates the XSG design of this subsystem, in which two "MCode" blocks perform the threshold decision on the incoming data: one for the real components and the other for the imaginary components. A number of "Delay" blocks and two "Mux" blocks, with the aid of a "Counter" block and another "MCode" block, control the entry of the incoming data to the decision "MCode" blocks. The 4-FRAT symbols are then converted to serial binary data by the "Parallel-to-Serial" block. Because the imaginary component of the output is always zero, it is discarded using a "Terminator" block.
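The decision and P/S steps can be sketched in Python as follows. The thresholds used in the "MCode" blocks are not given in the text, so this sketch assumes a symbol alphabet of {0, 1, 2, 3} with nearest-integer decisions, and an MSB-first conversion of each symbol to two bits; the function names are hypothetical.

```python
def decide(value, levels=4):
    """Nearest-level decision, clamped to the assumed alphabet {0, ..., levels-1}."""
    nearest = int(round(value))
    return min(max(nearest, 0), levels - 1)


def symbols_to_bits(symbols):
    """Convert each symbol to two bits, most significant bit first."""
    bits = []
    for s in symbols:
        bits.extend([(s >> 1) & 1, s & 1])
    return bits


# Hypothetical noisy receiver values for one (3x3) block, read row by row.
noisy = [0.1, 0.9, 2.2, 2.8, 1.05, -0.3, 3.4, 1.6, 1.2]
decided = [decide(v) for v in noisy]
bits = symbols_to_bits(decided)
print(decided)  # [0, 1, 2, 3, 1, 0, 3, 2, 1]
print(bits)
```

Nine decided symbols yield 18 serial bits, so the bit rate at the output is twice the symbol rate, the reverse of the rate change in the transmitter S/P subsystem.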

IV. HARDWARE CO-SIMULATION
System Generator provides several methods for transforming models built in SIMULINK into hardware, simplifying hardware verification and accelerating simulation; one of these methods is hardware co-simulation. This method provides hardware-in-the-loop verification: the inputs and test vectors are generated by SIMULINK, and System Generator sends them to the design on the FPGA target board over a Joint Test Action Group (JTAG) connection so that the system simulation runs on the FPGA platform. System Generator then reads the output back over JTAG and sends it to SIMULINK for display. Fig. 19 shows the hardware co-simulation model of the FRAT-FT-OFDM system and its programming onto the Zynq (XC7Z020-1CLG484) FPGA board over a JTAG connection. As seen from this figure, the output from the FPGA board is identical to the output of the MATLAB simulation model, and both are identical to the input data without any error in the bit sequences, meaning that the complete model has been successfully loaded onto the FPGA board.
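The comparison described above amounts to counting mismatches between the input bit sequence and the sequences returned by the simulation and by the hardware. A minimal Python sketch of such a check is shown below; the bit sequences are made-up placeholders, not data from the co-simulation, and the function name is hypothetical.

```python
def bit_errors(sent, received):
    """Count positions where two equal-length bit sequences differ."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    return sum(s != r for s, r in zip(sent, received))


# Placeholder sequences standing in for the input, simulation, and hardware outputs.
input_bits = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]
simulation_bits = list(input_bits)
hardware_bits = list(input_bits)

errors_sim = bit_errors(input_bits, simulation_bits)
errors_hw = bit_errors(input_bits, hardware_bits)
print("errors (simulation):", errors_sim, "errors (hardware):", errors_hw)
```

A zero count for both comparisons corresponds to the error-free result reported for Fig. 19.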
In addition, VHDL (Very High Speed Integrated Circuit Hardware Description Language) code was automatically generated from the System Generator blocksets and then analyzed with the Vivado Design Suite for further system verification at the Register Transfer Level (RTL), where synthesis and implementation are performed. Several reports are produced after these operations, such as the device resource utilization report, the timing report, and the power consumption report for the FPGA design.
The resource utilization reports for the transmitter and receiver designs of the Xilinx FRAT-FT-OFDM model are depicted in Figs. 20 and 21, respectively. They show that our design uses very few of the resources available on the board, so the design performs well in terms of resource consumption for both the transmitter and the receiver.
Figs. 22 and 23 show the power analysis reports for the transmitter and receiver designs, respectively. These reports show that the transmitter design consumed about 14% of the total power of the chip, while the receiver design consumed about 15%; this power is distributed among the clocks, signals, logic elements, and input/output pins in specific proportions. In summary, the Xilinx FPGA design consumes little power compared with the total power of the FPGA chip, which is a desirable property in any design.

V. CONCLUSION
In this article, the FRAT-FT-OFDM transceiver was designed and successfully implemented on an FPGA platform using Xilinx System Generator (XSG). XSG is compatible with MATLAB SIMULINK, which makes it possible to provide design parameters, bit- and cycle-true simulations, hardware co-simulation, and VHDL code generation. After the design was completed, the system was verified with hardware co-simulation and successfully implemented on the Zynq (XC7Z020-1CLG484) evaluation board. The hardware simulation and synthesis results showed that the implemented system works properly, is efficient in terms of resource utilization, and supports real-time operation with little consumed power.