|
- October 23, 2014:
- Pushing through changes for curve flexibility. First up is
- Ed480-Ridinghood, because it has the same number of words. Next
- is E-521.
-
- Experimental support for Ed480-Ridinghood. To use, compile with
- make ... FIELD=p480 -XCFLAGS=-DGOLDI_FIELD_BITS=480
-
- I still need to figure out what to do about the fact that the library
- is called "goldilocks", but in will soon support curves that are not
- ed448-goldilocks, at least experimentally.
-
- Currently the whole system's header "goldilocks.h" doesn't have
- a simpler way to override field size, but it does work (as a hack)
- with -DGOLDI_FIELD_BITS=...
-
- There is no support yet for coexistence of multiple fields in one
- library. The field routines will have unique names, but scalarmul*
- won't, and the top-level goldilocks routines have fixed names.
-
- Current timings on Haswell:
- Goldilocks: 178kcy keygen, 536kcy ecdh
- Ridinghood: 193kcy keygen, 617kcy ecdh
-
- Note that Ridinghood ECDH does worse than 480/448. This is at least
- in part because I haven't calculated the overflow handling limits yet
- in ec_point.h (this is a disadvantage of dropping the automated
- tool for generating that file). So I'm reducing much more often
- than I need to. (There's a really loud TODO in ec_point.h for that.)
-
- Also, I haven't tested the limits on these reductions in a while, so
- it could be that there are actual (security-critical) bugs in this
- area, at least for p448. Now that there's field flexibility, it's
- probably a good idea to make a field impl with extra words to check
- this.
-
- Furthermore, field_mulw_scc will perform differently on these two
- curves based on whether the curve constant is positive or negative.
- I should probably go optimize the "hot" routines like montgomery_step
- to have separate cases for positive and negative.
-
- September 29, 2014:
- Yesterday I put in some more architecture detection, but it should
- really be based on the arch directory, because what's in there really
- was a terrible hack. So I've changed it to use $arch/arch_config.h
- to get WORD_BITS.
-
- I've tweaked the eBAT construction code to rename the architectures
- using test/batarch.map. Maybe I should also rename them internally,
- but not yet.
-
- I added some new TODO.txt items. Some folks have been asking for a
- more factored library, instead of this combined arithmetic, curve code,
- encodings and protocol all-in-one jumble. Likewise the hash and RNG
- should be flexible.
-
- I've also been meaning to put more work in on SPAKE2EE, which would
- also mean finalizing the Elligator code.
-
- September 18, 2014:
- Begin work on a "ref" implementation. Currently this is just the
- arch_ref64 architecture. The ref implementation always weak_reduces
- after arithmetic, and doesn't use vectors or other hackery. Currently
- it still must declare field elements as vector aligned, though,
- other code outside the arch directory can be vectorized.
-
- Change goldilocks.c to use field_eq instead of calling deep into field
- apis.
-
- September 6, 2014:
- Pull in minor changes from David Leon Gil and Nicholas Wilson, with
- some adjustments. I hope the adjustments don't break their compiles.
-
- `make bat` now makes a bat which passes supercop-fastbuild, though
- the benchmarks are rather different from `make bench`. I need to track
- down why.
-
- August 4, 2014:
- Experiments and bug fixes.
-
- Add really_memset = memset_s (except not because I'm setting -std=c99),
- thanks David Leon Gil. I think I put it in the right places.
-
- Try to work around what I think is a compiler bug in GCC -O3 on non-AVX
- platforms. I can't seem to work around it as -Os, so I'm just flagging
- a warning (-Werror makes it an error) for now. Will take more
- investigation. Thanks Samuel Neves.
-
- Added an experimental (not ready yet!) ARM NEON implementation in
- arch_neon_experimental. This implementation seems to work, but needs
- more testing. It is currently asm-heavy and not GCC clean. I am
- planning to have a flag for it to use intrinsics instead of asm;
- currently the intrinsics are commented out. On clang this does ECDH
- in 1850kcy on my BeagleBone Black, comparable to Curve41417. Once this
- is ready, I will probably move it to arch_neon proper, since arch_neon
- isn't particularly tuned.
-
- July 11, 2014:
- This is mostly a cleanup release.
-
- Added CRANDOM_MIGHT_IS_MUST config flag (default: 1). When set, this
- causes crandom to assume that all features in the target arch will
- be available, instead of detecting them. This makes sense because
- the rest of the Goldilocks code is not (yet?) able to detect features.
- Also, I'd like to submit this to SUPERCOP eventually, and SUPERCOP won't
- pass -DMUST_HAVE_XXX on the command line the way the Makefile here did.
-
- Flag EXPERIMENT_CRANDOM_BUFFER_CUTOFF_BYTES to disable the crandom
- output buffer. This buffer improves performance (very marginally at
- Goldilocks sizes), but can cause problems with forking and VM
- snapshotting. By default, the buffer is now disabled.
-
- I've slightly tweaked the Elligator implementation (which is still
- unused) to make it easier to invert. This makes anything using Elligator
- (i.e. nothing) incompatible with previous releases.
-
- I've been factoring "magic" constants such as curve orders, window sizes,
- etc into a few headers, to reduce the effort to port the code to other
- primes, curves, etc. For example, I could test the Microsoft curves, and
- something like:
- x^2 + y^2 = 1 +- 5382[45] x^2 y^2 mod 2^480-2^240-1
- ("Goldeneye"? "Ridinghood"?) might be a reasonable thing to try for
- 64-bit CPUs.
-
- In a similar vein, most of the internal code has been changed to say
- "field" instead of p448, so that a future version of magic.h can decide
- which field header to include.
-
- You can now `make bat` to create an eBAT in build/ed448-goldilocks. This
- is only minimally tested, though, because SUPERCOP doesn't work on my
- machine and I'm too lazy to reverse engineer it. It sets a new macro,
- SUPERCOP_WONT_LET_ME_OPEN_FILES, which causes goldilocks_init() to fall
- back to something horribly insecure if crandom_init_from_file raises
- EMFILE.
-
- Slightly improved documentation.
-
- Removed some old commented-out code; restored the /* C-style */ comment
- discipline.
-
- The AMD-64 version should now be GCC clean, at least for reasonably
- recent GCC (tested on OS X.9.3, Haswell, gcc-4.9).
-
- History no longer says "2104".
-
- May 3, 2014:
- Minor changes to internal routines mean that this version is not
- compatible with the previous one.
-
- Added ARM NEON code.
-
- Added the ability to precompute multiples of a partner's public key. This
- takes slightly longer than a signature verification, but reduces future
- verifications with the precomputed key by ~63% and ECDH by ~70%.
-
- goldilocks_precompute_public_key
- goldilocks_destroy_precomputed_public_key
- goldilocks_verify_precomputed
- goldilocks_shared_secret_precomputed
-
- The precomputation feature are is protected by a macro
- GOLDI_IMPLEMENT_PRECOMPUTED_KEYS
- which can be #defined to 0 to compile these functions out. Unlike most
- of Goldilocks' functions, goldilocks_precompute_public_key uses malloc()
- (and goldilocks_destroy_precomputed_public_key uses free()).
-
- Changed private keys to be derived from just the symmetric part. This
- means that you can compress them to 32 bytes for cold storage, or derive
- keypairs from crypto secrets from other systems.
- goldilocks_derive_private_key
- goldilocks_underive_private_key
- goldilocks_private_to_public
-
- Fixed a number of bugs related to vector alignment on Sandy Bridge, which
- has AVX but uses SSE2 alignment (because it doesn't have AVX2). Maybe I
- should just switch it to use AVX2 alignment?
-
- Beginning to factor out curve-specific magic, so as to build other curves
- with the Goldilocks framework. That would enable fair tests against eg
- E-521, Ed25519 etc. Still would be a lot of work.
-
- More thorough testing of arithmetic. Now uses GMP for testing framework,
- but not in the actual library.
-
- Added some high-level tests for the whole library, including some (bs)
- negative testing. Obviously, effective negative testing is a very difficult
- proposition in a crypto library.
-
- March 29, 2014:
- Added a test directory with various tests. Currently testing SHA512 Monte
- Carlo, compatibility of the different scalarmul functions, and some
- identities on EC point ops. Began moving these tests out of benchmarker.
-
- Added scan-build support.
-
- Improved some internal interfaces. Made a structure for Barrett primes
- instead of passing parameters individually. Moved some field operations
- to places that make more sense, eg Barrett serialize and deserialize. The
- deserialize operation now checks that its argument is in [0,q).
-
- Added more documentation.
-
- Changed the names of a bunch of functions. Still not entirely consistent,
- but getting more so.
-
- Some minor speed improvements. For example, multiply is now a couple cycles
- faster.
-
- Added a hackish attempt at thread-safety and initialization sanity checking
- in the Goldilocks top-level routines.
-
- Fixed some vector alignment bugs. Compiling with -O0 should now work.
-
- Slightly simplified recode_wnaf.
-
- Add a config.h file for future configuration. EXPERIMENT flags moved here.
-
- I've decided against major changes to SHA512 for the moment. They add speed
- but also significantly bloat the code, which is going to hurt L1 cache
- performance. Perhaps we should link to OpenSSL if a faster SHA512 is desired.
-
- Reorganize the source tree into src, test; factor arch stuff into src/arch_*.
-
- Make most of the code 32-bit clean. There's now a 32-bit generic and 32-bit
- vectorless ARM version. No NEON version yet because I don't have a test
- machine (could use my phone in a pinch I guess?). The 32-bit version still
- isn't heavily optimized, but on ARM it's using a nicely reworked signed/phi-adic
- multiplier. The squaring is also based on this, but could really stand some
- improvement.
-
- When passed an even exponent (or extra doubles), the Montgomery ladder should
- now be accept points if and only if they lie on the curve. This needs
- additional testing, but it passes the zero bit exponent test.
-
- On 32-bit, use 8x4x14 instead of 5x5x18 table organization. Probably there's
- a better heuristic.
-
- March 5, 2014:
- First revision.
-
- Private keys are now longer. They now store a copy of the public key, and
- a secret symmetric key for signing purposes.
-
- Signatures are now supported, though like everything else in this library,
- their format is not stable. They use a deterministic Schnorr mode,
- similar to EdDSA. Precomputed low-latency signing is not supported (yet?).
- The hash function is SHA-512.
-
- The deterministic hashing mode needs to be changed to HMAC (TODO!). It's
- currently envelope-MAC.
-
- Probably in the future there will be a distinction between ECDH key and
- signing keys (and possibly also MQV keys etc).
-
- Began renaming internal functions. Removing p448_ prefixes from EC point
- operations. Trying to put the verb first. For example,
- "p448_isogeny_un_to_tw" is now called "twist_and_double".
-
- Began documenting with Doxygen. Use "make doc" to make a very incomplete
- documentation directory.
-
- There have been many other internal changes.
-
- Feb 21, 2014:
- Initial import and benchmarking scripts.
-
- Keygen and ECDH are implemented, but there's no hash function.
|