Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials

Date:

More information here

I presented in Session 4C: Cryptographic Hardware Implementation and Secure Approximate Computing at ASP-DAC 2020. My talk was titled “Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials”. The session chair was Weiqiang Liu from Nanjing University of Aeronautics and Astronautics, China.

Kyber is a promising candidate in the post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging operations, we save 29.4% of the clock cycles for Kyber512 and 33.3% for Kyber1024 compared with the textbook implementations. We use the Gentleman-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. A dual-column sequential scheme removes the memory-access bottleneck, and a pipelined architecture further improves performance. With these optimizations, the processor achieves 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art NTT module at a similar security level.
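
For readers unfamiliar with the GS butterfly mentioned above, the following is a minimal software sketch of an NTT built from Gentleman-Sande butterflies, which map a pair (u, v) to (u + v, (u - v) * w). It is not the processor's datapath and does not use Kyber's parameters: the modulus q = 17, length n = 8, and root omega = 2 are toy values chosen for illustration, the transform is a plain cyclic NTT, and all function names are hypothetical.

```python
def gs_ntt(a, omega, q):
    """Cyclic NTT using Gentleman-Sande (decimation-in-frequency) butterflies.

    Input is in natural order; output is in bit-reversed order.
    omega must be a primitive n-th root of unity mod q, with n a power of two.
    """
    a = list(a)
    n = len(a)
    length = n // 2
    w_stage = omega  # root of unity for the current sub-transform size
    while length >= 1:
        for start in range(0, n, 2 * length):
            w = 1
            for j in range(start, start + length):
                u, v = a[j], a[j + length]
                a[j] = (u + v) % q                 # GS butterfly: sum
                a[j + length] = ((u - v) * w) % q  # GS butterfly: twiddled difference
                w = (w * w_stage) % q
        w_stage = (w_stage * w_stage) % q
        length //= 2
    return a


def naive_ntt(a, omega, q):
    """Reference O(n^2) transform: X[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, j * k, q) for j in range(n)) % q
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


# Toy parameters (assumptions for illustration, not Kyber's):
Q, N, OMEGA = 17, 8, 2  # 2 has multiplicative order 8 modulo 17
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]

fast = gs_ntt(coeffs, OMEGA, Q)
slow = naive_ntt(coeffs, OMEGA, Q)
# Undo the bit-reversed ordering so both results can be compared directly.
fast_natural = [fast[bit_reverse(k, 3)] for k in range(N)]
print(fast_natural == slow)
```

Because the GS butterfly multiplies by the twiddle factor after the subtraction, it accepts natural-order input and produces bit-reversed output, which is why the comparison above reorders the result before checking it against the reference transform.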