This is largely a save-your-work checkin.
Created p521/arch_ref64 code to make sure E-521 basically works.
Fixed some of the testing code around E-521. It doesn't quite pass
everything yet.
Created p521/arch_x86_64 code with optimized multiply. In this
checkin, the multiply is fast and works, but all the other code in
that directory is the completely unoptimized ref64 build which
reduces after every add and sub. So the whole thing isn't fast yet.
Very experimental Ed480-Ridinghood support is now in. It's not fully optimized,
but in general the current build is 8-15% slower than Goldilocks. It only works on
arch_x86_64, though arch_ref64 support ought to be easy. Support on other arches
will be trickier, which is of course why I chose Goldilocks over Ridinghood in the
first place.
Next up, E-521. Hopefully.
The code is starting to get spread out over a lot of files. Some are per field*arch,
some per field, some per curve, some global. It's hard to do much about this, though,
with a rather ugly .c.inc system.
There's currently no way to make a Ridinghood eBAT. In fact, I haven't tested eBAT
support in this commit. I also haven't tested NEON, but at least ARCH_32 works on
Intel.
Continuing demagication and factoring of field code.
Removing high-level ops from p448.h and putting them in field.h. That way they
won't need rewriting for new fields and architectures.
Create constant_time.h which contains constant-time lookups, condswaps, etc.
That way the code is the same on all architectures, instead of varying depending
on whether the field size is a multiple of the vector register size. I should
still add a constant_time_select to factor out field_cond_negate.
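A minimal sketch of the kind of helpers described above, assuming a 64-bit mask type where all ones means true and all zeros means false. The names ct_select_word, ct_cond_swap and ct_lookup are hypothetical and are not taken from constant_time.h; a compiler is also free to reintroduce branches, so real code needs its output checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mask convention: all-zero bits = false, all-one bits = true. */
typedef uint64_t mask_t;

/* Turn a 0/1 flag into a full-width mask without branching. */
static mask_t mask_from_bit(uint64_t bit) {
    return (mask_t)0 - (mask_t)(bit & 1);
}

/* Return a if mask is all ones, b if mask is all zeros. */
static uint64_t ct_select_word(mask_t mask, uint64_t a, uint64_t b) {
    return (a & mask) | (b & ~mask);
}

/* Swap the nwords-long arrays a and b if mask is all ones; otherwise
 * leave them alone. The same instructions run in both cases. */
static void ct_cond_swap(uint64_t *a, uint64_t *b, size_t nwords, mask_t mask) {
    for (size_t i = 0; i < nwords; i++) {
        uint64_t diff = (a[i] ^ b[i]) & mask;
        a[i] ^= diff;
        b[i] ^= diff;
    }
}

/* Copy entry idx of a table (nentries entries of nwords words each) into
 * out, reading every entry so the access pattern does not depend on idx. */
static void ct_lookup(uint64_t *out, const uint64_t *table,
                      size_t nentries, size_t nwords, size_t idx) {
    for (size_t w = 0; w < nwords; w++) out[w] = 0;
    for (size_t i = 0; i < nentries; i++) {
        uint64_t x = (uint64_t)i ^ (uint64_t)idx;
        /* (x | -x) has its top bit set exactly when x != 0. */
        uint64_t nonzero = (x | ((uint64_t)0 - x)) >> 63;
        mask_t eq = mask_from_bit(nonzero ^ 1);
        for (size_t w = 0; w < nwords; w++) {
            out[w] |= table[i * nwords + w] & eq;
        }
    }
}
```

Working word-by-word like this is what keeps the code identical whether or not the field size is a multiple of the vector register size.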
TODO: I need to test this for correctness and performance on various platforms.
It works on my Mac, but since Yosemite the timing is totally unpredictable
(background tasks? variable boost?).
Create new src/arithmetic.c for field-independent arithmetic (e.g. batch invert, is_square).
Replace p448_ with field_ where possible.
Create constant EDWARDS_D = -39081.
Create inline function field_mulw_scc for multiplying by compile-time signed curve constants.
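A rough sketch of how a multiply by a signed compile-time constant can be built from an unsigned word multiply plus a negation. It uses a toy field modulo 2^31 - 1 rather than any real Goldilocks field, and the toy_ names are hypothetical; only EDWARDS_D = -39081 comes from this commit. When the constant is known at compile time and the function is inlined, the sign test should fold away.

```c
#include <stdint.h>

/* Toy prime 2^31 - 1, chosen only so the arithmetic fits in 64 bits.
 * This is NOT one of the fields used by the library. */
#define TOY_P 2147483647ULL

#define EDWARDS_D (-39081)

/* Toy field element, always kept fully reduced in [0, TOY_P). */
typedef uint64_t toy_felem_t;

/* Multiply by an unsigned word. Both factors are below 2^31 after
 * reduction, so the product cannot overflow 64 bits. */
static inline toy_felem_t toy_mulw(toy_felem_t a, uint64_t w) {
    return (a * (w % TOY_P)) % TOY_P;
}

/* Negate modulo TOY_P; the outer reduction maps -0 back to 0. */
static inline toy_felem_t toy_neg(toy_felem_t a) {
    return (TOY_P - a) % TOY_P;
}

/* Multiply by a signed curve constant: use the magnitude, then negate
 * if the constant was negative. The magnitude is computed in unsigned
 * arithmetic to avoid signed overflow. */
static inline toy_felem_t toy_mulw_scc(toy_felem_t a, int64_t scc) {
    if (scc >= 0) {
        return toy_mulw(a, (uint64_t)scc);
    }
    return toy_neg(toy_mulw(a, (uint64_t)0 - (uint64_t)scc));
}
```

With this shape, a call such as toy_mulw_scc(x, EDWARDS_D) needs no per-curve code at the call site; changing the curve only changes the constant.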
src/include/barrett_field.h:
- Requires review: corrected failure to cast to (mask_t) prior to negation. (Or, if this is wrong, it should cast to the needed bit width explicitly.)
- Changed type of nwords_out to uint32_t to agree with header.
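To illustrate why the order of cast and negation matters, here is a small sketch. It assumes mask_t is 64 bits wide and int is 32 bits wide, and the function names are hypothetical; it is not the code from barrett_field.h.

```c
#include <stdint.h>

/* Assumption for this sketch: masks are 64 bits wide. */
typedef uint64_t mask_t;

/* Negation happens in 32-bit unsigned arithmetic, then the result is
 * zero-extended: a set bit yields 0x00000000FFFFFFFF, not all ones. */
static mask_t mask_negate_then_widen(uint32_t bit) {
    return (mask_t)(-bit);
}

/* Widening first makes the negation happen at the mask's full width,
 * so a set bit yields 0xFFFFFFFFFFFFFFFF. */
static mask_t mask_widen_then_negate(uint32_t bit) {
    return -(mask_t)bit;
}
```

Both functions return 0 for a clear bit, so the difference only shows up when the mask is actually supposed to be set.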
src/include/intrinsics.h:
- Fixed up various preprocessor statements to check for definition rather than value of built-ins.
- Added macro to use Clang’s __builtin_readcyclecounter on platforms where it’s available (which is most platforms these days).
src/include/magic.h: Use preprocessor “if defined” rather than “if”.
src/include/word.h: Fixed ifdefs; enabled support for memset_s on Darwin. Added explicit cast to mask_t.
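A short sketch of the difference between testing a macro's value and testing whether it is defined, plus one way to guard a Clang builtin. The HYPOTHETICAL_ macros and read_cycles are made up for illustration and are not the macros in intrinsics.h; the clock() fallback is likewise only a placeholder.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical feature macro, defined with no value (as a bare #define does). */
#define HYPOTHETICAL_FEATURE

/* "#if HYPOTHETICAL_FEATURE" would fail to compile here, because the
 * macro expands to nothing. Testing for definition works for any value. */
#if defined(HYPOTHETICAL_FEATURE)
#define HAVE_HYPOTHETICAL_FEATURE 1
#else
#define HAVE_HYPOTHETICAL_FEATURE 0
#endif

/* Under "#if", an undefined macro silently evaluates to 0 (and warns
 * with -Wundef); "defined" states the intent explicitly. */
#if defined(HYPOTHETICAL_MISSING_FEATURE)
#define HAVE_MISSING_FEATURE 1
#else
#define HAVE_MISSING_FEATURE 0
#endif

/* Compilers without __has_builtin get a fallback that always answers no. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Use the Clang cycle counter builtin where the compiler offers it,
 * otherwise fall back to clock() as a coarse stand-in. */
static uint64_t read_cycles(void) {
#if __has_builtin(__builtin_readcyclecounter)
    return (uint64_t)__builtin_readcyclecounter();
#else
    return (uint64_t)clock();
#endif
}
```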
Added void to function definitions and declarations in the following files (an empty parameter list is fine in C++, but in C before C23 it declares a function with unspecified parameters rather than a prototype):
include/goldilocks.h, src/crandom.c, src/goldilocks.c, src/include/api.h, src/include/intrinsics.h, test/bench.c, test/test.c, test/test.h, test/test_arithmetic.c, test/test_goldilocks.c, test/test_pointops.c, test/test_scalarmul.c, test/test_sha512.c
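For illustration, the pattern looks like this; the function name is hypothetical and not taken from any of the files listed.

```c
/* Before C23, "int answer();" would leave the parameters unspecified,
 * so a call like answer(1, 2, 3) could compile without a diagnostic.
 * Writing (void) makes it a real prototype that takes no arguments. */
int answer(void);

int answer(void) {
    return 42;
}
```

With the prototype in place, calling answer with any argument is rejected at compile time in every C dialect.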
Trying to work around an apparent GCC bug on SSE2, thanks Samuel
Neves.
Added an experimental NEON arch. It's fast. It's not yet GCC clean.
It needs some more work on general cleanliness too.
Improve GCC-cleanness, etc.
Disable the crandom output buffer so that it won't return duplicate
data across fork(). I should still stir more entropy into the
buffer at least when RDRAND is available, but this should prevent
disasters for now.
The Elligator code in the current version is incompatible with past
versions due to a minor tweak. It wasn't being called by any of
the API functions, though.
Removing "magic" constants and type names. So for example p448_t
is now field_t (though maybe it should really be felem_t?). This
should enable other curves with the Goldilocks code in the not-too-
distant future.
Added CRANDOM_MIGHT_IS_MUST so that you don't have to -D a bunch of
things on the command line.
You can `make bat` to make an eBAT which probably doesn't work.
I haven't implemented the improved nonce generation from the
curves@moderncrypto.org thread yet.