From 0e752a6e215aee21dc73da097c3225495d54a5b6 Mon Sep 17 00:00:00 2001 From: SoniEx2 Date: Fri, 9 Apr 2021 07:19:03 -0300 Subject: Add libotr/etc sources --- libotr/libgcrypt-1.8.7/mpi/i586/README | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 libotr/libgcrypt-1.8.7/mpi/i586/README (limited to 'libotr/libgcrypt-1.8.7/mpi/i586/README') diff --git a/libotr/libgcrypt-1.8.7/mpi/i586/README b/libotr/libgcrypt-1.8.7/mpi/i586/README new file mode 100644 index 0000000..d73b082 --- /dev/null +++ b/libotr/libgcrypt-1.8.7/mpi/i586/README @@ -0,0 +1,26 @@ +This directory contains mpn functions optimized for Intel Pentium +processors. + +RELEVANT OPTIMIZATION ISSUES + +1. Pentium doesn't allocate cache lines on writes, unlike most other modern +processors. Since the functions in the mpn class do array writes, we have to +handle allocating the destination cache lines by reading a word from it in the +loops, to achieve the best performance. + +2. Pairing of memory operations requires that the two issued operations refer +to different cache banks. The simplest way to insure this is to read/write +two words from the same object. If we make operations on different objects, +they might or might not be to the same cache bank. + +STATUS + +1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium +documentation indicates that they should take only 43/8 = 5.375 cycles/limb, +or 5 cycles/limb asymptotically. + +2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb. Due to loop +overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb. + +3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they +should... -- cgit 1.4.1