Reconfigurable Computing

Reconfigurable computing is the application of field programmable gate array (FPGA) technology to solve computational problems, often taking advantage of the available flexibility to produce problem-specific architectures that achieve high performance through customization.

We are applying this technology to computational problems in application domains including satisfiability [1], cryptography [2], random number generation [3], neuromorphic engineering [4], and DNA string comparison [5], demonstrating performance that far exceeds that of traditional approaches.

[1] P.H.W. Leong, C.W. Sham, W.C. Wong, H.Y. Wong, W.S. Yuen, and M.P. Leong. A bitstream reconfigurable FPGA implementation of the WSAT algorithm. IEEE Transactions on VLSI Systems, 9(1):197–201, 2001.

[2] P.H.W. Leong and K.H. Leung. A microcoded elliptic curve processor using FPGA technology. IEEE Transactions on VLSI Systems, 10(5):550–559, 2002.

[3] Dong-U Lee, Wayne Luk, John D. Villasenor, Guanglie Zhang, and Philip H.W. Leong. A hardware Gaussian noise generator using the Wallace method. IEEE Transactions on VLSI Systems, 13(8):911–920, August 2005.

[4] M.P. Leong, C.T. Jin, and P.H.W. Leong. An FPGA-based electronic cochlea. EURASIP Journal on Applied Signal Processing, 2003(7):629–638, 2003. Special issue on neuromorphic signal processing and implementations.

[5] C.W. Yu, K.H. Kwong, K.H. Lee, and P.H.W. Leong. A Smith-Waterman systolic cell. In Proc. International Conference on Field Programmable Logic and Applications (FPL), pages 375–384, 2003. Selected as one of the 27 most significant papers from the first 25 years of FPL.

Low-latency Machine Learning

ARC Linkage Grant: LP130101034

Recent advances in machine learning theory have made previously intractable problems in speech recognition, image recognition, handwriting analysis and machine translation feasible. However, processing massive high-frequency data streams, such as those seen in financial markets, internet traffic and hyperspectral imaging, remains a challenge. We are working on real-time machine learning algorithms and implementations that employ circuit-based execution to avoid the overheads incurred by software running on a processor. If real-time, high-speed machine learning can be realized on a hardware-accelerated platform, machine learning can be applied to a broader range of problems. Our research has demonstrated the feasibility of real-time machine learning with 100 ns latency [1] and at 70 Gbps throughput [2].

In collaboration with Xilinx Research Labs, we have developed a convolutional neural network implementation that uses binarized synaptic values, which can represent only +1 and -1, yet still achieves high performance. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrated up to 12.3 million image classifications per second (cps) with 0.31 µs latency on the MNIST dataset at 95.8% accuracy, and 21,906 cps with 283 µs latency on the CIFAR-10 and SVHN datasets at 80.1% and 94.9% accuracy respectively [3].

Application domains include compliance and low-latency trading for the finance industry; line-rate monitoring of network data for cyber security applications; understanding of data from sensor networks; and scene analysis using audio and video signals.
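Restricting synaptic values to +1 and -1 lets a multiply-accumulate collapse into bitwise logic: if each value is stored as one bit, the dot product of two vectors equals twice the number of agreeing positions (an XNOR followed by a population count) minus the vector length. The sketch below is a minimal Python illustration of that general identity, assuming the encoding in which a set bit stands for +1 and a clear bit for -1. It is not the FINN implementation itself, and the function names and the threshold parameter are hypothetical.

```python
def pack_bits(values):
    """Encode a list of +1/-1 values as an integer mask (hypothetical helper; bit i set means +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            mask |= 1 << i
    return mask


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors computed from their packed masks."""
    full = (1 << n) - 1
    # XNOR: a bit is set where the two signs agree, i.e. where the product is +1
    agree = ~(a_bits ^ b_bits) & full
    matches = bin(agree).count("1")  # population count
    # each agreement contributes +1 and each disagreement -1
    return 2 * matches - n


def binary_neuron(x_bits, w_bits, n, threshold=0):
    """Hypothetical binarized neuron: +1 if the dot product reaches the threshold, else -1."""
    return 1 if binary_dot(x_bits, w_bits, n) >= threshold else -1


def reference_dot(a, b):
    """Ordinary integer dot product, for comparison."""
    return sum(x * y for x, y in zip(a, b))


a = [1, -1, -1, 1, 1, -1, 1, 1]
w = [1, 1, -1, -1, 1, -1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(w), len(a)))  # → 2
print(reference_dot(a, w))                             # → 2
```

In hardware, the XNOR and population count can be built from lookup tables rather than multipliers, which is one plausible reason binarized networks reach such high classification rates on FPGAs.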

[1] Stephen Tridgell, Duncan J.M. Moss, Nicholas J. Fraser, and Philip H.W. Leong. Braiding: a scheme for resolving hazards in kernel adaptive filters. In Proc. International Conference on Field Programmable Technology (FPT), pages 136–143, 2015. (doi:10.1109/FPT.2015.7393140)

[2] Nicholas J. Fraser, Duncan J.M. Moss, JunKyu Lee, Stephen Tridgell, Craig T. Jin, and Philip H.W. Leong. A fully pipelined kernel normalised least mean squares processor for accelerated parameter optimisation. In Proc. International Conference on Field Programmable Logic and Applications (FPL), pages 1–6, 2015. (doi:10.1109/FPL.2015.7293952)

[3] Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, and Kees Vissers. FINN: A framework for fast, scalable binarized neural network inference. In Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), pages 65–74, 2017. (doi:10.1145/3020078.3021744)

Design Methodologies

We are also studying design methodologies for modelling and implementation. Our work includes automated floating-point rounding analysis [1]; a variable-radix, variable-wordlength arithmetic scheme that allows tradeoffs between resource utilisation, performance and accuracy [2]; an arithmetic library that supports different number systems such as fixed point, floating point and logarithmic [3]; the virtual embedded block technique for exploring tradeoffs with coarse-grained blocks in FPGAs [4]; and the use of map-reduce for hardware/software codesign [5].
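The accuracy side of a wordlength tradeoff can be illustrated with plain fixed-point rounding: each extra fractional bit halves the worst-case rounding error, at the cost of a wider datapath and therefore more hardware. The sketch below is a simplified Python model, assuming round-to-nearest and ignoring integer bits and overflow. It is not the variable-radix digit-serial scheme of [2] or the library of [3], and the helper names are hypothetical.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (hypothetical helper).

    Models only fractional precision; integer bits and overflow are ignored.
    Python's round() breaks ties to even, which is still round-to-nearest.
    """
    scale = 1 << frac_bits
    return round(x * scale) / scale


def max_error(values, frac_bits):
    """Worst-case absolute rounding error over a set of sample values."""
    return max(abs(v - quantize(v, frac_bits)) for v in values)


# Sample signal in [-1, 1]; the error bound for round-to-nearest is half a unit
# in the last place, i.e. 2**-(frac_bits + 1).
samples = [math.sin(0.1 * k) for k in range(100)]
for f in (4, 8, 12, 16):
    err = max_error(samples, f)
    print(f"{f:2d} fractional bits: max error {err:.2e} (bound {2.0 ** -(f + 1):.2e})")
```

Each line of output should stay within its bound, making explicit how many bits a given accuracy target costs; choosing that number per operation, rather than one global width, is what makes wordlength a design parameter.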

[1] Michael Frechtling and Philip H. W. Leong. MCALIB - a tool for automated rounding error analysis. ACM Transactions on Programming Languages and Systems, 37:5:1–5:25, April 2015. (doi:10.1145/2665073)

[2] M.P. Leong and P.H.W. Leong. A variable-radix digit-serial design methodology and its application to the discrete cosine transform. IEEE Transactions on VLSI Systems, 11(1):90–104, 2003.

[3] K.H. Tsoi, C.H. Ho, H.C. Yeung, and P.H.W. Leong. An arithmetic library and its application to the N-body problem. In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 68–78, 2004.

[4] Chun Hok Ho, Chi Wai Yu, Philip Leong, Wayne Luk, and Steven J.E. Wilton. Floating point FPGA: Architecture and modelling. IEEE Transactions on VLSI Systems, 17(12):1709–1719, 2009.

[5] Jackson H.C. Yeung, C.C. Tsang, K.H. Tsoi, Bill S.H. Kwan, Chris C.C. Cheung, Anthony P.C. Chan, and Philip H.W. Leong. Map-reduce as a programming model for custom computing machines. In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 149–159, 2008.

FPGA Hardware

FPGAs are among the fastest-growing segments of the semiconductor industry. We have a strong interest in developing hardware for embedded systems. Our projects include reconfigurable computing cards such as Pilchard, the first memory-slot-based reconfigurable computing card [1]; the architecture and printed circuit board design for The Cube Project, a 512-FPGA supercomputer [2]; and the first university-designed structured ASIC [3].

[1] P.H.W. Leong, M.P. Leong, O.Y.H. Cheung, T. Tung, C.M. Kwok, M.Y. Wong, and K.H. Lee. Pilchard - a reconfigurable computing platform with memory slot interface. In Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 170–179, 2001. Selected as one of the 25 most significant papers from the first 20 years of FCCM.

[2] Oskar Mencer, Kuen Hung Tsoi, Stephen Craimer, Timothy Todman, Wayne Luk, Ming Yee Wong, and Philip Heng Wai Leong. CUBE: A 512-FPGA cluster. In Proc. Southern Programmable Logic Conference (SPL), pages 51–57, 2009.

[3] Man-Ho Ho, Yan-Qing Ai, Thomas C.P. Chau, Steve C.L. Yuen, Chiu-Sing Choy, Philip H.W. Leong, and Kong-Pang Pun. Architecture and design flow for a highly efficient structured ASIC. IEEE Transactions on VLSI Systems, 21(3):424–433, 2013.

FPGA Architectures

We also study how FPGAs can be improved for domain-specific applications. Along these lines, we have studied how they can be optimised to support floating-point arithmetic [1], and how reconfigurable fabrics can be included in synthesisable ASICs for debugging purposes [2].

[1] Chun Hok Ho, Chi Wai Yu, Philip H.W. Leong, Wayne Luk, and Steve Wilton. Domain-specific hybrid FPGA: architecture and floating point applications. In Proc. International Conference on Field Programmable Logic and Applications (FPL), pages 196–201, 2007. Stamatis Vassiliadis Award for Outstanding Paper.

[2] Steve J.E. Wilton, C.H. Ho, Brad Quinton, P.H.W. Leong, and Wayne Luk. A synthesizable datapath-oriented embedded FPGA fabric for silicon debug applications. ACM Transactions on Reconfigurable Technology and Systems, 1(1):7:1–7:25, March 2008.