Faster multiplication in ℤ_{2^m}[x] on Cortex-M4 to speed up NIST PQC candidates

Matthias J. Kannwischer, Joost Rijneveld and Peter Schwabe

Abstract: In this paper we optimize multiplication of polynomials in ℤ_{2^m}[x] on the ARM Cortex-M4 microprocessor. We use these optimized multiplication routines to speed up the NIST post-quantum candidates RLizard, NTRU-HRSS, NTRUEncrypt, Saber, and Kindi. For most of those schemes the only previous implementation that executes on the Cortex-M4 is the reference implementation submitted to NIST; for some of those schemes our optimized software is more than a factor of 20 faster. One of the schemes, namely Saber, has been optimized on the Cortex-M4 in a CHES 2018 paper; the multiplication routine for Saber we present here outperforms the multiplication from that paper by 37%, yielding speedups of 17% for key generation, 15% for encapsulation and 18% for decapsulation. Out of the five schemes optimized in this paper, the best performance for encapsulation and decapsulation is achieved by NTRU-HRSS. Specifically, encapsulation takes just over 430 000 cycles, which is more than twice as fast as for any other NIST candidate that has previously been optimized on the ARM Cortex-M4.

Paper: 2018-10-19

Source code: Available on GitHub

Related talks:
Faster multiplication in ℤ_{2^m}[x] on Cortex-M4 to speed up NIST PQC candidates
2018-11-09 – Crypto Working Group – by Matthias J. Kannwischer
PQM4: Implementing Post-Quantum Crypto on the Cortex M4
2018-09-13 – RIOT Summit 2018

@misc{KRS18,
  author       = {Matthias J. Kannwischer and Joost Rijneveld and Peter Schwabe},
  title        = {Faster multiplication in $\mathbb{Z}_{2^m}[x]$ on {Cortex-M4} to speed up {NIST PQC} candidates},
  year         = {2018},
}