Important work items for Ed448-Goldilocks:

* Import SHA-512 or SHA-3.
    * Decide which.
    * Get a public-domain version which is 64-bit and 32-bit clean.
    * Update LICENSE and README to reflect that SHA is not my code.

* Incorporate hashing into goldilocks_shared_secret.
    * It's a pretty terrible shared secret right now.
    * Decide on output size

* Documentation: write high-level API docs, and internal docs to help
  other implementors.

* Documentation: write a spec or add to Watson's

* Cleanup: rename everything consistently.
    * namespace_op or op_namespace? namespace_op_type?
    * We don't have to be super-careful with the namespacing, because
      symbols will be scrubbed by exported.sym.

* Cleanup: hard-coded tables (probably?)
    * This reduces the work required for goldilocks_init() at the expense
      of library size.
    * Makes error-handling and thread safety easier.
    * Use the SAGE tool?

* Cleanup: unify intrinsics code
    * Word_t, mask_t, bigregister_t, etc.
    * Generate asm intrinsics with a script?

* Bugfix: make sure that init() and randomization are thread-safe.

* Security: check on deserialization that points are < p.
    * Check also that they're nonzero or otherwise non-pathological?

* Testing:
    * Corner-case testing
    * more bulk random testing
    * SAGE-(auto?)-generated test vectors
    * Test the Barrett fields

* Safety: add static analysis attributes for compilers that support them
    * EG, warn on ignored return types

* Safety:
    * Check for init() if it's still required once we've done the above
    * Decide what to do about RNG failures
        * abort
        * return error and zeroize
        * return error but continue if RNG is kind of mostly OK

* Flexibility: decide which API options are good.
    * Eg, should functions take nbits and table sizes?

* Remove hardcoded adjustments from comb control.
    * These adjustments make the output wrong when it's not 450 bits.
    * Other slow Barrett fields? Montgomery fields?

* Mid-level API
    * Make it easier to work with untwisted Edwards objects.
        * Probably use extended or projective, not extensible coordinates.
    * Scalarmul with other cofactor modes.

* High-level API:
    * Signatures.
        * Decide on strictness level.
    * SPAKE2 Elligator Edition? Maybe write a paper first.
    * Elligator.
        * Need to write Elligator inverse. Might not be Elligator-2S.

* What low-level APIs to expose?
    * Edwards points with add, sub, scalarmul, =, ==, ser/deser?

* Portability: try to make the vector code as portable as possible
    * Currently using clang ext_vector_length.
    * I can't get a simple for-loop to autovectorize :-/
    * SAGE tool?

* Portability: make the inner layers of the code 32-bit clean.
    * Write new versions of the field code.
        * 28-bit limbs give less headroom for carries.
        * NEON and vectorless ARM.
    * Run through the SAGE tool to generate new bias & bound.

* Portability: make the outer layers of the code 32-bit clean.
    * I don't think that there are endian bugs, but who knows?
    * NEON and vectorless constant-time comparison.

* Performance: write and incorporate some extra routines
    * Deserialize_and_isogeny
    * Unconditional negate (or just plain subtract)

* Performance: fixed parameters?
    * Perhaps useful for comb precomputation.

* Performance: improve the Barrett field code.
    * Support other primes?
    * Capture prime shape into a struct instead of passing 3 params.
    * Make 32-bit clean. (SAGE?)

* Automation:
    * Improve the SAGE tool to cover more cases
        * Real SSA classes to cover branching and looping
        * Constant-time selection
        * Intrinsics code
        * Field code?
        * Vector-mul-chains
        * Negation "bubble pushing" optimization

* Clear other TODO/FIXME/HACK/PERF items in the code

* Submit to SUPERCOP
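
Sketch for the "Security: check on deserialization that points are < p" item. This is only an illustration of the range check, not the library's code. It assumes field elements are serialized as 56 little-endian bytes, and uses p = 2^448 - 2^224 - 1. The names field_elem_is_canonical and p448_byte are hypothetical. The check subtracts p from the input with a borrow chain, so the running time does not depend on the input bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define FIELD_BYTES 56 /* assumption: 448-bit little-endian encoding */

/* Byte i of p = 2^448 - 2^224 - 1 in little-endian order.
 * Every bit of p is set except bit 224, which is bit 0 of byte 28.
 * The index i is public, so branching on it leaks nothing secret. */
static uint8_t p448_byte(size_t i) {
    return (i == 28) ? 0xFE : 0xFF;
}

/* Hypothetical helper: returns 1 if x < p, 0 otherwise.
 * Computes x - p byte by byte; a final borrow means x < p. */
int field_elem_is_canonical(const uint8_t x[FIELD_BYTES]) {
    uint32_t borrow = 0;
    for (size_t i = 0; i < FIELD_BYTES; i++) {
        uint32_t diff = (uint32_t)x[i] - (uint32_t)p448_byte(i) - borrow;
        borrow = diff >> 31; /* 1 if the byte subtraction went negative */
    }
    return (int)borrow;
}
```

A deserializer would reject the input when this returns 0. The "nonzero or otherwise non-pathological" question from the list is not covered here.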
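
Sketch for the "Safety: add static analysis attributes" item. GCC and clang both support __attribute__((warn_unused_result)), which warns when a caller ignores a function's return value. The macro name EXAMPLE_WARN_UNUSED and the function example_copy_checked are made up for this note and are not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand to the attribute where it is supported, to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_WARN_UNUSED __attribute__((warn_unused_result))
#else
#define EXAMPLE_WARN_UNUSED
#endif

/* Hypothetical function whose error code must not be ignored.
 * Returns 0 on success, -1 if dst is too small for src. */
EXAMPLE_WARN_UNUSED
int example_copy_checked(uint8_t *dst, size_t dst_len,
                         const uint8_t *src, size_t src_len) {
    if (src_len > dst_len) {
        return -1;
    }
    memcpy(dst, src, src_len);
    return 0;
}
```

With the attribute in place, a bare call such as example_copy_checked(dst, n, src, m); should produce a warning on supporting compilers, while if (example_copy_checked(...) != 0) { ... } compiles cleanly.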
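
Sketch for the "vectorless constant-time comparison" item. This is a generic portable pattern, not the library's existing routine: OR together the XOR of every byte pair, then turn "accumulator is zero" into a 0/1 result without a data-dependent branch. The name ct_bytes_equal is hypothetical. A compiler could in principle still introduce branches, so the generated code would need checking on each target.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a and b hold the same len bytes,
 * 0 otherwise. The loop always visits every byte. */
int ct_bytes_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= (uint32_t)(a[i] ^ b[i]); /* nonzero once any byte differs */
    }
    /* acc is in 0..255. acc - 1 wraps to 0xFFFFFFFF only when acc == 0,
     * so the top bit is 1 exactly when all bytes matched. */
    return (int)((acc - 1) >> 31);
}
```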