Publication:
Parallel Implementation of Nussbaumer Algorithm and Number Theoretic Transform on a GPU Platform: Application to QTESLA

dc.authorscopusid56528569300
dc.authorscopusid15833929800
dc.authorscopusid55813782300
dc.authorscopusid14827620500
dc.authorscopusid6508137561
dc.authorscopusid7404626583
dc.authorwosidHwang, Seong Oun/Afn-1294-2022
dc.authorwosidLee, Wai/L-2715-2018
dc.authorwosidAkleylek, Sedat/D-2090-2015
dc.authorwosidYap, Wun-She/Abb-5158-2021
dc.contributor.authorLee, Wai-Kong
dc.contributor.authorAkleylek, Sedat
dc.contributor.authorWong, Denis Chee-Keong
dc.contributor.authorYap, Wun-She
dc.contributor.authorGoi, Bok-Min
dc.contributor.authorHwang, Seong-Oun
dc.contributor.authorIDLee, Wai Kong/0000-0003-4659-8979
dc.contributor.authorIDAkleylek, Sedat/0000-0001-7005-6489
dc.date.accessioned2025-12-11T01:13:27Z
dc.date.issued2021
dc.departmentOndokuz Mayıs Üniversitesien_US
dc.department-temp[Lee, Wai-Kong; Hwang, Seong-Oun] Gachon Univ, Seongnam, South Korea; [Akleylek, Sedat] Ondokuz Mayis Univ, Samsun, Turkey; [Wong, Denis Chee-Keong; Yap, Wun-She; Goi, Bok-Min] Univ Tunku Abdul Rahman, Bandar Sungai Long, Malaysiaen_US
dc.descriptionLee, Wai Kong/0000-0003-4659-8979; Akleylek, Sedat/0000-0001-7005-6489;en_US
dc.description.abstractAmong the popular post-quantum schemes, lattice-based cryptosystems have received renewed interest since there are relatively simple, highly parallelizable and provably secure under a worst-case hardness assumption. However, polynomial multiplication over rings is the most time-consuming operation in most of the lattice-based cryptosystems. To further improve the performance of lattice-based cryptosystems for large scale usage, polynomial multiplication must be implemented in parallel. The polynomial multiplication can be performed using either number theoretic transform (NTT) or Nussbaumer algorithm. However, Nussbaumer algorithm is inherently serial. Meanwhile, the efficient implementation of NTT using various indexing methods on GPU platform remains unknown. In this paper, we explore the best combination of various indexing methods to implement NTT on GPU platform and the efficient way to parallelize the Nussbaumer algorithm. Our results suggest that the combination of Gentleman-Sande and Cooley-Tukey (GS-CT) indexing methods produced the best performance on RTX2060 GPU (i.e. 422,638 polynomial multiplications per second). A technique to parallelize Nussbaumer algorithm by reducing the non-coalesced global memory access to half is produced. To the best of our knowledge, this is the first GPU implementation of Nussbaumer algorithm and it outperforms the best aforementioned NTT (GS-CT) implementation by 14.5%. For illustration purpose, the proposed GPU implementation techniques are applied to qTESLA, a state-of-the-art lattice based signature scheme. We emphasize that the proposed implementation techniques are not specific to any cryptosystem; they can be easily adapted to any other lattice-based cryptosystems.en_US
dc.description.sponsorshipKorea Research Fellowship program - Ministry of Science and ICT, Korea through the National Research Foundation (NRF) of Korea [2019H1D3A1A01102607]; NRF - Korea government (MSIT) [2020R1A2B5B01002145]; Fundamental Research Grant Scheme (FRGS), Malaysia [FRGS/1/2018/STG06/UTAR/03/1]; TUBITAK [EEEAG-117E636]en_US
dc.description.sponsorshipWai-Kong Lee was supported by Korea Research Fellowship program funded by the Ministry of Science and ICT, Korea through the National Research Foundation (NRF) of Korea (2019H1D3A1A01102607). Seong-Oun Hwang was supported by the NRF grant funded by the Korea government (MSIT) (2020R1A2B5B01002145). Denis Chee-Keong Wong, Wun-She Yap and Bok-Min Goi were supported by the Fundamental Research Grant Scheme (FRGS), Malaysia under Project Number FRGS/1/2018/STG06/UTAR/03/1. Sedat Akleylek was partially supported by TUBITAK under Grant No. EEEAG-117E636.en_US
dc.description.woscitationindexScience Citation Index Expanded
dc.identifier.doi10.1007/s11227-020-03392-x
dc.identifier.endpage3314en_US
dc.identifier.issn0920-8542
dc.identifier.issn1573-0484
dc.identifier.issue4en_US
dc.identifier.scopus2-s2.0-85089298096
dc.identifier.scopusqualityQ2
dc.identifier.startpage3289en_US
dc.identifier.urihttps://doi.org/10.1007/s11227-020-03392-x
dc.identifier.urihttps://hdl.handle.net/20.500.12712/42120
dc.identifier.volume77en_US
dc.identifier.wosWOS:000559632500001
dc.identifier.wosqualityQ2
dc.language.isoenen_US
dc.publisherSpringeren_US
dc.relation.ispartofJournal of Supercomputingen_US
dc.relation.publicationcategoryMakale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanıen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.subjectNumber Theoretic Transformen_US
dc.subjectNussbaumer Algorithmen_US
dc.subjectLattice-Based Cryptographyen_US
dc.subjectGraphics Processing Unitsen_US
dc.subjectPost-Quantum Cryptographyen_US
dc.titleParallel Implementation of Nussbaumer Algorithm and Number Theoretic Transform on a GPU Platform: Application to QTESLAen_US
dc.typeArticleen_US
dspace.entity.typePublication

Files