### Faster multiplication in ℤ_{2^m}[x] on Cortex-M4 to speed up NIST PQC candidates

Matthias J. Kannwischer, Joost Rijneveld and Peter Schwabe

**Abstract:** In this paper we optimize multiplication of polynomials in ℤ_{2^m}[x] on the ARM Cortex-M4 microprocessor. We use these optimized multiplication routines to speed up the NIST post-quantum candidates RLizard, NTRU-HRSS, NTRUEncrypt, Saber, and Kindi. For most of those schemes the only previous implementation that executes on the Cortex-M4 is the reference implementation submitted to NIST; for some of those schemes our optimized software is more than a factor of 20 faster. One of the schemes, namely Saber, has been optimized on the Cortex-M4 in a CHES 2018 paper; the multiplication routine for Saber we present here outperforms the multiplication from that paper by 37%, yielding speedups of 17% for key generation, 15% for encapsulation and 18% for decapsulation. Out of the five schemes optimized in this paper, the best performance for encapsulation and decapsulation is achieved by NTRU-HRSS. Specifically, encapsulation takes just over 430 000 cycles, which is more than twice as fast as for any other NIST candidate that has previously been optimized on the ARM Cortex-M4.
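For orientation, multiplying two polynomials in ℤ_{2^m}[x] means forming the usual polynomial product and reducing every coefficient modulo 2^m. The following is a minimal schoolbook sketch in Python, intended only to illustrate the operation being optimized; it is not the Cortex-M4 routine from the paper. The function name `polymul_mod_2m` and the list-of-coefficients representation (lowest degree first) are assumptions made for this illustration.

```python
def polymul_mod_2m(a, b, m):
    """Schoolbook product of a and b in Z_{2^m}[x].

    a and b are lists of integer coefficients, lowest degree first.
    """
    if not a or not b:
        return []
    mask = (1 << m) - 1  # reducing mod 2^m is a bitwise AND with 2^m - 1
    result = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # accumulate the x^(i+j) term and keep only the low m bits
            result[i + j] = (result[i + j] + ai * bj) & mask
    return result


# (3 + 5x)(7 + 2x) = 21 + 41x + 10x^2, reduced mod 2^4 = 16
print(polymul_mod_2m([3, 5], [7, 2], 4))  # prints [5, 9, 10]
```

Because the modulus is a power of two, reduction costs only a mask (or comes for free when m matches the machine word size), which is one reason this setting is a natural fit for fixed-width integer arithmetic.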

**Source code:**
Available on GitHub

**Related talks:**

*Faster multiplication in ℤ_{2^m}[x] on Cortex-M4 to speed up NIST PQC candidates*

2018-11-09 – Crypto Working Group – by Matthias J. Kannwischer

*PQM4: Implementing Post-Quantum Crypto on the Cortex M4*

2018-09-13 – RIOT Summit 2018

@misc{KRS18,
  author = {Matthias J. Kannwischer and Joost Rijneveld and Peter Schwabe},
  title = {Faster multiplication in $\mathbb{Z}_{2^m}[x]$ on {Cortex-M4} to speed up {NIST PQC} candidates},
  year = {2018},
}