Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor

Marc Moreno Maza

Abstract

We study various algorithms for the Truncated Fourier Transform (TFT) which is a variation of the Discrete Fourier Transform (DFT) that allows one to work with an input vector of arbitrary size without zero padding. After a review of the original algorithms for the forward and inverse TFT introduced by J. van der Hoeven, we consider the variation of D. Harvey as well as that of J. Johnson and L.C. Meng. Both variations are based on Cooley-Tukey like formulas. The former is called strict general radix as it strictly follows the specifications proposed by J. van der Hoeven, while the latter is called relaxed general radix as it requires some zero padding so as to improve data flow which supports full vectorization and parallelization.

In this thesis, we report on an implementation of the relaxed general radix forward TFT and a strict general radix inverse TFT. We have three objectives. First, obtaining a software tool generating optimized code forward and inverse TFT, extending the previous work of S. Covanov dedicated to FFT code generation. Second, comparing the practical efficiency of the strict and relaxed general radix schemes. Third, investigating the parallelization of one-dimensional TFT algorithms.

Our experimental results show that, in practice, the relaxed general radix forward TFT can reach similar performance (in terms of running time, clock cycles and cache misses) as the optimized FFT code of the BPAS library (on input vectors on which both codes apply without zero padding). Moreover, for an input vector whose size ranges between two consecutive values for which FFT does not require zero padding, our relaxed TFT generated code provides an effective implementation. Unfortunately, the same satisfactory observation does not hold for the strict radix scheme when comparing the inverse TFT and FFT. As for parallelization, here again the relaxed general radix scheme is satisfactory while the strict general radix is not. For instance, w.r.t. to the FFT code, the parallel forward TFT code has a speedup factor of 5.31 and 6.78 for an input vector of size 2^23 and 2^26 respectively.


Share

COinS