There are at least two problems with the first Saber theorem, Theorem 6.1. (Also with Round5 Theorem 2.6.1, which copied the Saber theorem and proof, modulo tweaks for the Round5 details; I won't bother filing this as a separate comment.)

1. Formally, the theorem statement is trivially correct and useless, since the statement fails to require an _efficient_ reduction. The most useful fix from the perspective of reviewers is to define the specific reductions before the theorem, and then use those definitions in the theorem statement.

2. More fundamentally, instead of claiming

   * security of Saber.PKE against all IND-CPA attacks under two Mod-LWR assumptions and a standard-model PRF assumption,

the theorem has to be weakened to claim merely

   * security of Saber.PKE against _ROM_ IND-CPA attacks under Mod-LWR assumptions.

In other words, cryptanalysts have to check not merely whether the hash function is a PRF, but whether the hash function as used inside the PKE enables any sort of non-ROM IND-CPA attacks.

Section 4.1 of my latticeproofs paper pinpoints the proof error. I'm not saying that I vouch for the correctness and applicability of the theorem with the above changes; I did only a sanity check, not a serious review.

---Dan
Hi,

SaberX4 is a software implementation of Saber to achieve higher throughput, i.e., more KEM operations per second, on server machines with AVX2 support. A server needs to compute thousands of key exchanges every second; hence high throughput is desired in server applications.

The source code of SaberX4 is available at https://github.com/sujoyetc/SaberX4

SaberX4 batches four Saber KEM operations and, for most of the computation, processes them in parallel using AVX2 instructions. Compared with the existing AVX2 implementation of Saber, SaberX4 achieves nearly 38%, 45%, and 35% higher throughput for key generation, encapsulation, and decapsulation, respectively.
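
To illustrate the batching idea in general terms, here is a minimal Python sketch in which four independent polynomial multiplications share one loop, with each inner step updating all four "lanes" (the part a SIMD instruction would do at once). The interleaved data layout, the function names, and the modulus q = 2^13 are assumptions made for illustration; the message does not describe the layout SaberX4 actually uses.

```python
Q = 1 << 13   # assumed modulus (Saber's q = 2^13); not stated in the message
LANES = 4     # four independent KEM operations per batch


def negacyclic_mul(a, b, q=Q):
    """Serial reference: a * b mod (x^n + 1, q) for a single operation."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] = (out[i + j] + a[i] * b[j]) % q
            else:
                # x^n = -1, so wrapped terms are subtracted
                out[i + j - n] = (out[i + j - n] - a[i] * b[j]) % q
    return out


def interleave(polys):
    """Pack LANES polynomials so entry i holds coefficient i of every lane."""
    return [tuple(p[i] for p in polys) for i in range(len(polys[0]))]


def batched_negacyclic_mul(A, B, q=Q):
    """Same control flow as the serial version, applied to all lanes per step."""
    n = len(A)
    out = [[0] * LANES for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k, sign = (i + j, 1) if i + j < n else (i + j - n, -1)
            for lane in range(LANES):  # one vector instruction in a SIMD version
                out[k][lane] = (out[k][lane] + sign * A[i][lane] * B[j][lane]) % q
    return [tuple(c) for c in out]
```

Because the four operations never interact, the batched result is just the interleaving of the four serial results; the gain in a real implementation comes from executing the lane loop as a single vector instruction.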

The current implementation is a proof of concept. I believe there is scope for further optimization to improve throughput. Some routines, such as the message encoder/decoder, the byte-string-to-polynomial and polynomial-to-byte-string conversions, and binomial sampling, are still executed serially, although they could be batched to achieve higher throughput.

Regards
Sujoy

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+unsubscribe@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/284a7b51df7ede43d63244fb614264db%40esat.kuleuven.be.
Dear all,

We would like to let you know that our paper


presents a co-processor architecture to speed up Saber. The HW/SW co-design runs more than 5 times faster than the SW-only implementation.

The architecture was implemented on the Xilinx ZedBoard Zynq-7000 ARM/FPGA SoC Development Board following a HW/SW co-design strategy. The paper shows how to implement the Toom-Cook polynomial multiplier in hardware.
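
For readers unfamiliar with Toom-Cook multiplication, the following is a minimal software sketch of the 3-way variant (evaluation points 0, 1, -1, 2, and infinity). It is not the paper's hardware design: the 3-way split is chosen only for brevity, the helper names are hypothetical, and the modulus q = 2^13 is an assumption taken from the Saber parameter set. The sketch works over plain integers, which sidesteps the fact that the divisions by 2 during interpolation are not invertible modulo a power of two.

```python
Q = 1 << 13  # assumed Saber modulus q = 2^13


def schoolbook(a, b):
    """Plain O(n^2) product over the integers, used for the small sub-products."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def evaluate(p0, p1, p2):
    """Evaluate p0 + p1*y + p2*y^2 at y = 0, 1, -1, 2, infinity."""
    return (
        p0,
        [x + y + z for x, y, z in zip(p0, p1, p2)],
        [x - y + z for x, y, z in zip(p0, p1, p2)],
        [x + 2 * y + 4 * z for x, y, z in zip(p0, p1, p2)],
        p2,
    )


def toom3(a, b):
    """Toom-Cook 3-way product of two equal-length polynomials (length % 3 == 0)."""
    n = len(a)
    assert n == len(b) and n % 3 == 0
    m = n // 3
    ea = evaluate(a[:m], a[m:2 * m], a[2 * m:])
    eb = evaluate(b[:m], b[m:2 * m], b[2 * m:])
    # Five small products replace the nine a naive 3x3 split would need.
    r0, r1, rm1, r2, rinf = (schoolbook(x, y) for x, y in zip(ea, eb))

    # Interpolation: recover w0..w4 with result = sum_k w_k * y^k.
    w0, w4 = r0, rinf
    w2 = [(p + q) // 2 - x - y for p, q, x, y in zip(r1, rm1, w0, w4)]
    t = [(p - q) // 2 for p, q in zip(r1, rm1)]                    # w1 + w3
    u = [(s - x - 4 * y - 16 * z) // 2
         for s, x, y, z in zip(r2, w0, w2, w4)]                    # w1 + 4*w3
    w3 = [(x - y) // 3 for x, y in zip(u, t)]
    w1 = [x - y for x, y in zip(t, w3)]

    # Recombine with y = x^m; neighbouring pieces overlap and are added.
    out = [0] * (2 * n - 1)
    for k, w in enumerate((w0, w1, w2, w3, w4)):
        for i, c in enumerate(w):
            out[k * m + i] += c
    return out


def reduce_negacyclic(c, n, q=Q):
    """Reduce a product modulo (x^n + 1, q)."""
    out = [0] * n
    for i, x in enumerate(c):
        if i < n:
            out[i] += x
        else:
            out[i - n] -= x
    return [x % q for x in out]
```

All divisions in the interpolation step are exact over the integers. An implementation that works directly modulo 2^13 instead has to carry a few extra bits of precision to absorb the divisions by 2.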

Execution time (in millions of CPU cycles):

<table>
<thead>
<tr>
<th></th>
<th>SW only</th>
<th>HW/SW</th>
<th>Speedup (×)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Key Generation</td>
<td>11.761</td>
<td>2.180</td>
<td>5.4</td>
</tr>
<tr>
<td>Encapsulation</td>
<td>14.944</td>
<td>2.762</td>
<td>5.4</td>
</tr>
<tr>
<td>Decapsulation</td>
<td>17.983</td>
<td>2.560</td>
<td>7.0</td>
</tr>
</tbody>
</table>

For a detailed description of the architecture, optimization techniques, results, and comparisons, please have a look at the paper at https://eprint.iacr.org/2020/321.pdf

More information regarding the Saber cryptosystem can be found on the fully updated website https://www.esat.kuleuven.be/cosic/pqcrypto/saber/

Regards
SABER Team
Dear all,

We would like to let you know about the latest updates on Saber:


shows how pre-computation and lazy interpolation can be applied to Toom-Cook multiplication to offer different time-memory trade-offs. The effect of such techniques is highly influenced by the underlying platform, but they can be combined for any SW implementation. Additionally, memory optimizations for embedded implementations are also shown.
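
As a rough illustration of the two techniques, the sketch below computes an inner product of polynomial vectors (the core of a matrix-vector product). It assumes one common reading of the terms: "pre-computation" means evaluating an operand once and reusing the evaluated form, and "lazy interpolation" means accumulating the pointwise products in the evaluation domain and interpolating only once at the end. For brevity it uses the simplest Toom-Cook instance (2-way, i.e. Karatsuba, with points 0, 1, and infinity); the function names are hypothetical and the paper should be consulted for the actual method.

```python
def mul(a, b):
    """Schoolbook product over the integers."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def evaluate(p):
    """Toom-2 evaluation of p = p0 + p1*y at y = 0, 1, infinity."""
    m = len(p) // 2
    p0, p1 = p[:m], p[m:]
    return [p0, [x + y for x, y in zip(p0, p1)], p1]


def pointwise(ea, eb):
    """Multiply two evaluated operands point by point."""
    return [mul(x, y) for x, y in zip(ea, eb)]


def interpolate(r, n):
    """Recover the length-(2n-1) product from its three evaluations."""
    r0, r1, rinf = r
    mid = [x - y - z for x, y, z in zip(r1, r0, rinf)]
    m = n // 2
    out = [0] * (2 * n - 1)
    for k, w in enumerate((r0, mid, rinf)):
        for i, c in enumerate(w):
            out[k * m + i] += c
    return out


def inner_product_eager(avec, bvec):
    """Baseline: evaluate, multiply, and interpolate for every single product."""
    n = len(avec[0])
    total = [0] * (2 * n - 1)
    for a, b in zip(avec, bvec):
        prod = interpolate(pointwise(evaluate(a), evaluate(b)), n)
        total = [x + y for x, y in zip(total, prod)]
    return total


def inner_product_lazy(avec, bvec_eval):
    """bvec_eval is pre-computed once; interpolation happens once at the end."""
    n = len(avec[0])
    acc = None
    for a, eb in zip(avec, bvec_eval):
        r = pointwise(evaluate(a), eb)
        if acc is None:
            acc = r
        else:
            acc = [[x + y for x, y in zip(u, v)] for u, v in zip(acc, r)]
    return interpolate(acc, n)
```

The two functions agree because interpolation is linear. The lazy version trades memory for the stored evaluations against fewer evaluation and interpolation passes, which is the kind of time-memory trade-off the message refers to.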

The execution time for key generation/encapsulation/decapsulation is now:

Saber on AVX2 : 82 / 105 / 108 [x1000 clock cycles]
Saber on Cortex-M4 : 846 / 1098 / 1112 [x1000 clock cycles]

2) The Saber website ([https://www.esat.kuleuven.be/cosic/pqcrypto/saber/](https://www.esat.kuleuven.be/cosic/pqcrypto/saber/)) has been updated to include these latest results, comparisons with other schemes, and references to the latest publications by our team and other teams.

3) Sources for software implementations of Saber on different platforms are now publicly available on GitHub ([https://github.com/KULeuven-COSIC/SABER](https://github.com/KULeuven-COSIC/SABER)). Further updates will follow as we keep improving the Saber software and documentation.

Best regards,

SABER Team

Dear all,

We would like to announce our high-speed hardware implementation results for Saber. Our instruction-set coprocessor architecture for Saber implements all the building blocks (including the CCA transformations) in hardware. The unified architecture is programmable and supports all KEM operations (key generation, encapsulation, and decapsulation).

During a KEM operation, the operand data is transferred from the host processor to the coprocessor in one go, all computations are then performed in the FPGA, and finally the host processor reads back the result.

The implementation benefits from the modular algorithmic nature of Saber and its power-of-two moduli. For a security level similar to AES-192, the architecture computes Saber key generation, encapsulation, and decapsulation in only 21.8, 26.5, and 32.1 microseconds, respectively, and consumes only 9% of the LUTs and 2% of the flip-flops available in the XCZU9EG-2FFVB1156 FPGA.
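
A small sketch of why power-of-two moduli are convenient: reduction becomes a bit mask and rounding becomes an add followed by a shift, so neither needs a division or a multiplication. The concrete values q = 2^13 and p = 2^10 and the rounding constant are assumptions taken from the Saber parameter set rather than from this message, and the function names are hypothetical.

```python
EQ, EP = 13, 10              # assumed exponents: q = 2^13, p = 2^10
Q, P = 1 << EQ, 1 << EP
H = 1 << (EQ - EP - 1)       # rounding constant: half of q/p, here 4


def mod_q(x):
    """Reduce modulo q with a mask; in hardware this just drops the high bits."""
    return x & (Q - 1)


def round_q_to_p(x):
    """Scale a value from Z_q to Z_p with round-half-up: add H, then shift."""
    return mod_q(x + H) >> (EQ - EP)


# Example: 8191 is the largest residue mod q and rounds (wrapping around) to 0.
print(mod_q(8192 + 5), round_q_to_p(8191), round_q_to_p(12))
```

With a prime modulus the same two steps would need a genuine modular reduction (for example Barrett or Montgomery), which costs extra logic in an FPGA.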

We have uploaded a report to ePrint. The report contains details of the architecture and comparisons with HW implementations of other schemes. Hopefully, the report will be available soon.

Regards,
Andrea and Sujoy