An Efficient Implementation of Image Mipmapping using Frequency Domain Techniques

Abstract: Mipmapping is a popular technique used in real-time processors to increase rendering speed and minimize aliasing effects. The basic idea is to construct a pyramid of downscaled versions of the original image, each scaled by a binary fraction. The existing time domain convolution approach requires a large number of computations for downscaling, particularly when the image resolution is very high. In this paper, a frequency domain approach based on the overlap save method and basic FFT properties is described for implementing mipmaps efficiently with fewer computations. The theoretical computational cost of the proposed approach is provided. The proposed approach was implemented on the ADSP-BF533 DSP processor for input images of different resolutions. The computational comparison shows that the proposed approach is attractive for scaling very high resolution images and performs better than time domain filtering. The experimental results also show that the proposed approach does not degrade the quality of the downscaled images, based on the PSNR measured between the outputs of the proposed and time domain techniques.


II. REVIEW OF PREVIOUS WORK
In the literature, several techniques are available for obtaining downsampled images in mipmapping. The aim is to pick the correct pixel colour from among the neighbouring pixels. The simplest approach is nearest neighbour interpolation, in which the colour of the closest of the four surrounding pixels is used, as shown in Fig.3(a). Here the interpolated pixel colour is to be determined at co-ordinate $(s, t)$. The four closest pixels are located at $(s_0, t_0)$, $(s_0, t_1)$, $(s_1, t_0)$ and $(s_1, t_1)$. Among these, the colour of the pixel at $(s_1, t_0)$ is selected as the interpolated pixel colour at $(s, t)$ because $(s_1, t_0)$ is nearest to $(s, t)$. Though this approach is very simple, the downsampled image suffers from low quality and aliasing [9]-[10]. Another approach is bilinear interpolation [9], in which linear interpolation is applied horizontally and then vertically to obtain the desired pixel from the neighbouring pixels, as shown in Fig.3(b) and (c) respectively. Referring to Fig.3(b), the pixels $C_{top}$ and $C_{bot}$ are obtained by horizontal linear interpolation of the pixel values at the neighbouring texel co-ordinates $(i, j)$, weighted by $r_s$, the pixel ratio in the s-direction. The pixel value at $(s, t)$ is then obtained from $C_{top}$ and $C_{bot}$ by vertical linear interpolation (refer to Fig.3(c)), weighted by $r_t$, the pixel ratio in the t-direction. However, bilinear interpolation is not attractive in all cases because, for very small downscaled images, missed pixels cause accuracy problems; this in turn results in blurriness and a loss of smoothness in the downscaled image [10]. An extension of bilinear interpolation is trilinear interpolation, in which bilinear interpolation followed by a further linear interpolation is applied to the neighbouring pixels to yield the desired pixel value.
However, this technique can often combine pixels from outside the sample area, particularly when the samples are not square, which is incorrect [11].
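The two interpolation schemes described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: the image is a list of rows indexed as `img[t][s]`, texels sit at integer co-ordinates, the "top" row is taken to be $t_0$, and the pixel ratios are taken as $r_s = s - s_0$ and $r_t = t - t_0$. The function names are illustrative only.

```python
import math

def _clamp(v, lo, hi):
    """Limit v to the closed interval [lo, hi]."""
    return max(lo, min(v, hi))

def nearest_neighbour(img, s, t):
    """Return the value of the texel closest to (s, t)."""
    rows, cols = len(img), len(img[0])
    i = _clamp(int(math.floor(s + 0.5)), 0, cols - 1)
    j = _clamp(int(math.floor(t + 0.5)), 0, rows - 1)
    return img[j][i]

def bilinear(img, s, t):
    """Interpolate horizontally on two rows, then vertically between them."""
    rows, cols = len(img), len(img[0])
    s = _clamp(s, 0.0, cols - 1.0)
    t = _clamp(t, 0.0, rows - 1.0)
    s0, t0 = int(math.floor(s)), int(math.floor(t))
    s1, t1 = min(s0 + 1, cols - 1), min(t0 + 1, rows - 1)
    r_s, r_t = s - s0, t - t0          # assumed form of the pixel ratios
    c_top = (1 - r_s) * img[t0][s0] + r_s * img[t0][s1]
    c_bot = (1 - r_s) * img[t1][s0] + r_s * img[t1][s1]
    return (1 - r_t) * c_top + r_t * c_bot
```

For a sample point close to $(s_1, t_0)$, `nearest_neighbour` returns that texel unchanged, whereas `bilinear` blends all four neighbours.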
To overcome these issues, the original image is first filtered with a low-pass filter (LPF) and then downsampled (Fig.2). To avoid aliasing effects, either a Gaussian or a sinc function is used as the LPF [12]. Convolving the original image with a discrete sinc function in the time domain is computationally expensive because of the large number of multiplications and additions involved, and the cost becomes severe when the original image has high resolution. Efficient implementation techniques are therefore desired that give the same output quality as time domain convolution with far fewer computations. The proposed approach is based on the principle that time domain convolution is equivalent to frequency domain multiplication. The overlap save method is used for the convolution, which is efficient particularly at high resolutions. This paper explains the mathematical background of using the overlap save approach for downscaling, provides implementation details, and compares the computational cost of the time domain and frequency domain approaches.

Referring to Fig.2, let X(k), V(k) and Y(k) be the discrete frequency equivalents of the time domain signals x(n), v(n) and y(n) respectively. Referring to [5], the frequency domain outputs can be expressed in terms of continuous frequency variables as given in equations (6) and (7), where $\omega_x$ and $\omega_y$ are the continuous frequency variables of x(n) and y(n) respectively and D is the downsampling factor. The discrete frequency equivalents of equations (6) and (7) are given by equations (8) and (9), where k is the discrete frequency index and N is the FFT length. The discrete frequency index of x(n) varies from 0 to N-1 as the continuous frequency $\omega_x$ varies from 0 to 2π. In a similar manner, the discrete frequency index k varies from 0 to N/D-1 as the output frequency $\omega_y$ varies from 0 to 2π/D. From equation (9), it is understood that the FFT of y(n) is the sum of the downsampled component, V(k), k = 0, 1, …, N/D-1, and the aliased components from the remaining frequency bands of V(k). This sum is fed to the IFFT as input and the IFFT output is scaled by the factor 1/D. Fig.4 shows this process as a block diagram.
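For reference, the standard filter-then-decimate identities that match the description above can be written as follows. This is a sketch under the assumption that H denotes the frequency response of the low-pass filter and that i indexes the D frequency bands of length N/D; the numbering follows the references in the text, and the exact form of the original equations may differ.

```latex
\begin{align}
V(e^{j\omega_x}) &= H(e^{j\omega_x})\, X(e^{j\omega_x}) \tag{6} \\
Y(e^{j\omega_y}) &= \frac{1}{D} \sum_{i=0}^{D-1} V\!\left(e^{j(\omega_y + 2\pi i)/D}\right) \tag{7} \\
V(k) &= H(k)\, X(k), \quad k = 0, 1, \dots, N-1 \tag{8} \\
Y(k) &= \frac{1}{D} \sum_{i=0}^{D-1} V\!\left(k + i\,\frac{N}{D}\right), \quad k = 0, 1, \dots, \frac{N}{D}-1 \tag{9}
\end{align}
```

In this form, the i = 0 term of (9) is the downsampled component, the terms i = 1, …, D-1 are the aliased components, and the 1/D factor corresponds to the scaling applied to the IFFT output.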

A. Implementation for Downsampling Original Image
The implementation procedure is explained in the flow chart of Fig.5. Initially, a complex sequence is formed from two successive rows of the original image, one row as the real part and the other as the imaginary part. The FFT length is decided based on N = L+M-1, where L and M are the lengths of the row and the filter response respectively. If N is not a power of 2, sufficient zeros are appended to the complex sequence. The FFT of the filter response, H(k), is calculated once and reused for every row and column. The FFT of the input complex sequence, X(k), is evaluated and multiplied with H(k) at each frequency index. From the product, the actual downsampled component and the aliasing components are separated, and all of these components are summed to obtain the FFT, Y(k), of x(n) downsampled by the factor D. Since the downsampled component has N/D frequency points, the IFFT length is N/D. Y(k) is fed to an IFFT block of length N/D and the IFFT output is scaled by the factor 1/D. From the output y(n), the desired pixels are copied to the output buffer by separating the real and imaginary components into two individual rows. This process is continued for the remaining rows. After all rows have been downsampled, the same approach is repeated for all columns to obtain the final image downsampled by the factor D. The process is the same for obtaining the downsampled image at each mipmap level. Note that the overlap save approach introduces a pixel delay equal to the filter length in the worst case. While copying the IFFT output, the initial M pixels should be skipped because of this delay, as they do not contain the desired information.
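The row-pair procedure described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the BF533 implementation: the filter coefficients are real, D divides the FFT length, a simple recursive radix-2 FFT stands in for an optimized library routine, and the function names are hypothetical. The returned rows still contain the initial filter delay, which a caller would skip as described in the text.

```python
import cmath

def fft(a, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], inverse), fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + w, even[k] - w
    return out

def next_pow2(n):
    """Smallest power of 2 that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def downsample_row_pair(row_a, row_b, h, d):
    """Filter two rows with real taps h and keep every d-th sample."""
    L, M = len(row_a), len(h)
    n = next_pow2(L + M - 1)                      # FFT length, zero padded
    x = [complex(a, b) for a, b in zip(row_a, row_b)] + [0j] * (n - L)
    H = fft([complex(c) for c in h] + [0j] * (n - M))   # reusable per image
    V = [xk * hk for xk, hk in zip(fft(x), H)]    # frequency multiplication
    nd = n // d
    # Sum the baseband component and the d-1 aliased bands.
    Y = [sum(V[k + i * nd] for i in range(d)) for k in range(nd)]
    # Divide by nd for the IFFT normalisation and by d for the 1/D scaling.
    y = [v / (nd * d) for v in fft(Y, inverse=True)]
    return [v.real for v in y], [v.imag for v in y]
```

Because the filter is real, the real part of the output carries the first row and the imaginary part the second, so one complex FFT serves two rows.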

B. Computational Complexity of Overlap Save based Downsampling
Table I provides the computational complexity details. Here N = L+M-1 and D is the downsampling factor; L and M are the lengths of the row/column and the filter respectively. The figures in Table I are valid when the decimation factor is a power of 2 and when two rows/columns are processed simultaneously by forming a complex sequence. The overall computational complexity scales with the width and height of the original image.
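As a rough illustration of how such a comparison can be set up, the following Python sketch estimates real multiplications per row from textbook formulas. These are assumptions and not the entries of Table I: a radix-2 FFT of length N is taken to need (N/2)·log2(N) complex multiplications, each complex multiplication is counted as four real ones, the filter FFT is treated as precomputed, additions are ignored, and the time domain filter is assumed to compute only the retained outputs.

```python
import math

def next_pow2(n):
    """Smallest power of 2 that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def fft_cmults(n):
    """Textbook complex-multiply count for a radix-2 FFT of length n."""
    return (n // 2) * (n.bit_length() - 1)

def os_real_mults_per_row(L, M, d):
    """Frequency domain estimate; two rows share one complex transform."""
    n = next_pow2(L + M - 1)
    cmults = fft_cmults(n) + n + fft_cmults(n // d)   # FFT, product, IFFT
    return 4 * cmults // 2

def td_real_mults_per_row(L, M, d):
    """Time domain estimate: M multiplies for each retained output pixel."""
    return math.ceil(L / d) * M
```

Under these assumptions, a 1920-pixel row with a 128-tap filter gives 36864 (frequency domain) against 122880 (time domain) multiplications at D = 2, and 26688 against 1920 at D = 128. These are operation counts from the formulas above, not measured cycle counts.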

IV. EXPERIMENTAL RESULTS AND DISCUSSION
To demonstrate the efficiency of the proposed method, several images of different resolutions (1920x1080, 800x533 and 640x480) were taken as inputs to the proposed method. A 128-tap sinc filter, designed using the elliptic filter approach, is used as the LPF. At each mipmap level, the FFT length is decided based on N = L+M-1. If N is not a power of 2, sufficient zeros are padded to the row or column sequence. During processing, a complex sequence was formed from either successive rows or successive columns before being fed as input to the overlap save approach. This is done to save computations, and a 50% performance improvement can be obtained with this approach.
The Analog Devices BF533 Ez-Kit Lite [13]-[14] was chosen as the DSP platform for implementing the proposed approach because of its high performance, power efficient processor architecture, with the features listed below:
• High performance (up to 756 MHz) 16-bit/32-bit embedded processor core
• 10-stage RISC MCU/DSP pipeline with mixed 16-bit/32-bit ISA for optimal code density
• Full SIMD architecture, including instructions for accelerated video and image processing
• Memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, Flash, or ROM
• Large on-chip SRAM for maximum system performance
• Clean, orthogonal RISC-like microprocessor instruction set
• Flexible instructions to process 8, 16 and 32 bits in a register

The Visual DSP++ software environment [15] is used for development of the proposed method. The image data was initially stored in external SDRAM, and rows/columns were copied into on-chip DSP RAM as needed for processing. The original image was downscaled to various mipmap levels, with decimation factors ranging from 2 to 128. The computations are measured in terms of cycle count for the proposed approach at each mipmap level. The same measurement was done for the time domain filtering approach at all mipmap levels. Table II provides the computations taken by the proposed approach and by time domain filtering for image resolutions of 1920x1080, 800x533 and 640x480 respectively. The comparison of computations between time domain filtering (TDF) and the proposed overlap save (OS) approach is shown in Fig.6 (A), (B) and (C) respectively for these resolutions. From the comparison, it is clear that the proposed approach yields better performance at low decimation factors, i.e. at high resolutions, and that the performance gap between TDF and OS narrows as the decimation factor increases. At high decimation factors, TDF performs better than the OS approach (indicated with coloured cells in Table II). The reason is that, at high decimation factors, the FFT size is large relative to the small number of pixels processed, so the OS approach requires more cycles than TDF. The output quality of the proposed approach is measured by the PSNR (peak signal-to-noise ratio) between the output images obtained from the TDF and OS methods. The output downscaled images are shown in Fig.7, 8 and 9 for input images of 1920x1080, 800x533 and 640x480 resolution respectively.
In each figure, (A), (B) and (C) show the original image, the frequency domain processed output image and the time domain processed output image respectively. The PSNR for all downscaled images was found to be infinite (+∞ dB), i.e. the mean squared error between the two outputs is zero, which shows that there is no quality difference between the TDF and OS approaches. At the same time, the OS approach performs better than TDF at high resolutions for mipmapping.
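The PSNR comparison between two output images can be sketched as below. This is a minimal Python example assuming 8-bit greyscale images stored as lists of rows with a peak value of 255; the function name and the choice of returning `math.inf` for identical images are illustrative assumptions.

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = 0
    sq_err = 0.0
    for row_a, row_b in zip(img_a, img_b):
        for p, q in zip(row_a, row_b):
            sq_err += (p - q) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf        # identical images: the ratio is unbounded
    return 10.0 * math.log10(peak * peak / mse)
```

When the two images match pixel for pixel, the mean squared error is zero and the ratio is unbounded, which is the situation in which two downscaling methods can be said to produce the same output.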

V. CONCLUSION
An efficient implementation approach based on the overlap save method is proposed for generating downscaled images in mipmapping applications. This approach performs better than the existing time domain filtering approach, particularly at high resolutions. To prove the concept, the method was implemented on the ADSP-BF533 processor for several input images. The results show that the proposed implementation is very efficient at high resolutions and introduces no deviation in the output quality of the downscaled images.