diff --git a/content/2021/05/nearly-complete-rng-guide.html b/content/2021/05/nearly-complete-rng-guide.html
new file mode 100644
index 0000000..e7a5ac5
--- /dev/null
+++ b/content/2021/05/nearly-complete-rng-guide.html
@@ -0,0 +1,297 @@
+---
+title: Nearly Complete Guide to RNG on a microcontroller
+description: >
+  How to initialize and run an RNG on an STM32L151CC microcontroller.
+created: !!timestamp '2021-05-18'
+listable: false
+time: 12:00 PM
+tags:
+  - security
+  - rng
+  - microcontroller
+---
+
+Security depends upon cryptography, which in turn depends upon a
+Random Number Generator (RNG). An RNG is used for key generation (both
+symmetric and asymmetric) and key negotiation (session establishment).
+The latter is an absolute requirement to ensure that communications can
+be secured. The former (key generation) can be used at first boot for
+personalization, but isn't necessary as it could be done when personalizing
+the device at programming or first deployment.
+
+There are two types of RNGs. The first is a True Random Number Generator
+(TRNG). This is one that takes some non-deterministic process, often
+physical, and measures it. Often, these are slow and are not uniform,
+requiring a post processing step before they are useful.
+
+The second type is a Pseudo Random Number Generator (PRNG).
+([NIST](https://www.nist.gov/) also refers to a
+PRNG as a Deterministic Random Bit Generator (DRBG).) PRNGs
+take a seed, and can generate large (effectively unlimited, when seeded
+properly) amounts of random looking data from it. The issue is that
+if someone is able to obtain the seed, they will be able to predict
+the subsequent values, breaking security.
+
+The standard practice is to gather data from a TRNG, and use it to seed
+a PRNG. It used to be common that the PRNG would be reseeded, but I
+agree w/ djb (D. J. Bernstein) that once seeded, no additional seeding
+is needed (see his blog post
+[Entropy Attacks!](https://blog.cr.yp.to/20140205-entropy.html)),
+as modern PRNGs are secure enough and can generate enough randomness
+that their state will not leak.
+
+There are lots of libraries and papers that talk about how to solve the
+problem for RNGs on a microcontroller that may not have an integrated
+[T]RNG block, but I have not been able to find a complete guide for
+integrating their work into a project where even a relative beginner
+could get it functional.
+
+This article was written as I developed the
+[lora-irrigation](https://www.funkthat.com/gitea/jmg/lora-irrigation)
+project. This project will be used as an example, and the code referenced
+is mostly licensed under the 2-clause BSD license, and so is freely
+usable for your own projects.
+
+
+Sources of Randomness
+---------------------
+
+As mentioned, most microcontrollers do not have a dedicated hardware
+block like modern AMD64 (aka x86-64) processors do w/ the RDRAND
+instruction. Though they do not, there are other sources that are
+available.
+
+The first, and easiest one is the Analog-to-Digital Converter (ADC). Even
+if the ADC pin is tied to ground, the process of digital conversion is
+not 100% deterministic as there are errors in the converter or noise
+introduced on the pin. The article
+[ADC Input Noise: The Good, The Bad, and The Ugly. Is No Noise Good
+Noise?](https://www.analog.com/en/analog-dialogue/articles/adc-input-noise.html)
+talks about this.
+
+The data sheet for the microcontroller will help determine the expected
+randomness from the part. In the case of the
+[STM32L151CC](https://www.st.com/content/st_com/en/products/microcontrollers-microprocessors/stm32-32-bit-arm-cortex-mcus/stm32-ultra-low-power-mcus/stm32l1-series/stm32l151-152/stm32l151cc.html)
+that I'm using, Table 57 of the data sheet lists the Effective number
+of bits (ENOB) as typically 10 bits, which is a couple bits short of
+the 12 bit resolution of the ADC. This means that the 2 least
+significant bits are likely to have some noise in them. I did a run,
+and collected 114200 samples from the ADC. The [Shannon
+entropy](https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#Shannon_entropy)
+calculated using the empirical probabilities was 2.48 bits.
+
+Now this is not strictly Shannon entropy, as the
+values were calculated from the experiment, and Shannon entropy should
+be calculated from the a priori probabilities. Discarding the
+0's (which make up over half the results) improves the entropy
+calculation to 3.29. The
+[min-entropy](https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#Min-entropy),
+a better indicator of entropy, is 1.2 bits (see the min-entropy `awk`
+script later in this section), and if all the
+0's are dropped, it improves to 2.943. This does help, but in the end,
+subtracting the data sheet's ENOB from the ADC resolution does result
+in an approximate estimate of entropy.
+
+It is possible that a correlation analysis between samples could
+further reduce the entropy gathered via the ADC, but with sufficient
+collection, this should be able to be avoided.
+
+The second is using uninitialized SRAM. It turns out that this has
+been studied in [Software Only, Extremely Compact, Keccak-based Secure
+PRNG on ARM Cortex-M](https://dl.acm.org/doi/10.1145/2593069.2593218)
+and [Secure PRNG Seeding on Commercial Off-the-Shelf
+Microcontrollers](https://www.intrinsic-id.com/wp-content/uploads/2017/05/prng_seeding.pdf).
+Depending upon how the SRAM is designed in the chip, it can create a
+situation where each bit of SRAM will be indeterminate at boot up.
+Both of these papers studied a microcontroller, an
+STM32F100R8, similar to the one I am using, an STM32L151CC.
+
+I ran my own experiments where I powered on an STM32L151CC and dumped
+the SRAM 8 times and analyzed the results. I limited my analysis to
+26863 bytes of the 32 KiBytes of RAM (the remainder was data/bss or stack, so
+would not change, or was zeros). I then calculated the min-entropy for
+each bit across power cycles and the resulting sum was 11188 bits, or
+approximately .416 bits per byte. This is 5.2% and in line with what
+the latter paper observed for a similar device.
+
+Part of using a source of randomness is making sure that it is usable.
+In the case of the ADC, each reading can be evaluated against previous
+reads to ensure that the data being obtained is possibly random. In
+the case of SRAM, this is more tricky, as the state of SRAM is static,
+and short of a reset, will not change. This means that to use SRAM,
+the device, or family of devices, needs to be properly analyzed
+for suitability. There are cases where a device's SRAM does not provide
+adequate entropy, as discussed in the papers, and so this method should
+not be used in those cases, or not solely relied upon.
+
+The following is an `awk` script for calculating the min-entropy of the
+provided data. Each sample must be the first item on a line, and each sample
+must be a hexadecimal value w/o any leading `0x` or other leading
+identifier:
+
+# Copyright 2021 John-Mark Gurney
+# This script is licensed under the 2-clause BSD license
+
+function max(a, b)
+{
+    if (a > b)
+        return a;
+    else
+        return b;
+}
+
+{
+    v = ("0x" $1) + 0;
+    a[NR] = v;
+    maxv = max(maxv, v);
+}
+
+END {
+    tcnt = length(a);
+    me = 0;
+    for (bit = 0; 2^bit <= maxv; bit += 1) {
+        cnt0 = 0;
+        cnt1 = 0;
+        for (i in a) {
+            tbit = int((a[i] / 2 ^ bit) % 2);
+            if (tbit)
+                cnt1 += 1;
+            else
+                cnt0 += 1;
+        }
+        v = -log(max(cnt0, cnt1) / tcnt) / log(2);
+        print "bit " bit ":\t" v;
+        me += v;
+    }
+    printf "total:\t%0.3f\n", me;
+}
+
+
+It is also possible that there are other parts of the board/design
+that could be a source of randomness. The project that started this
+journey is using [LoRa](https://en.wikipedia.org/wiki/LoRa) for
+communication. It turns out that the sample code for the radio chip
+([LoRaMac‑node](https://github.com/Lora-net/LoRaMac-node)) implements
+a [random interface](https://github.com/Lora-net/LoRaMac-node/blob/7f12997754ad8e38a84daa85f62e7e6c0e5dbe59/src/radio/radio.h#L154-L163).
+The function just waits one millisecond, reads the RSSI value, takes
+the low bit and repeats this 32 times to return a 32-bit word. There
+are issues with this, as I cannot find any description of the expected
+randomness in the data sheet, nor in the code. It also does not do
+any conditioning, so just because it returns 32 bits does not guarantee
+32 bits of usable entropy. I have briefly looked at the output, and
+there do appear to be longer runs than expected. Another
+issue is that its collection takes a while, as the fastest is 1 bit
+per ms. So, assuming the need to collect 8 bits for 1 bit of entropy
+(pure speculation), that means at minimum 2 seconds to collect the
+2048 bits necessary for 256 bits of entropy.
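
As a rough sketch of that estimate, the arithmetic can be written as a small C helper. The name `collection_time_ms` and its parameters are my own for illustration (they are not part of LoRaMac-node), and the 8:1 ratio is the same speculation as above, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how long it takes to collect raw bits from a slow source.
 *
 *   entropy_bits:        bits of entropy wanted (e.g. 256)
 *   raw_per_entropy_bit: raw bits assumed per bit of entropy (a guess)
 *   us_per_raw_bit:      time to collect one raw bit, in microseconds
 *
 * Returns the collection time in milliseconds, rounded up.
 */
uint32_t
collection_time_ms(uint32_t entropy_bits, uint32_t raw_per_entropy_bit,
    uint32_t us_per_raw_bit)
{
    uint64_t total_us;

    /* A ratio of zero would claim entropy from no data at all. */
    assert(raw_per_entropy_bit > 0);

    total_us = (uint64_t)entropy_bits * raw_per_entropy_bit *
        us_per_raw_bit;

    return (uint32_t)((total_us + 999) / 1000);
}
```

Calling `collection_time_ms(256, 8, 1000)` gives 2048 ms, which is where the "at minimum 2 seconds" figure comes from. A worse ratio, or a slower source, scales the time linearly.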
+
+
+Uniquifying
+-----------
+
+One of the other ways to help ensure that a microcontroller's PRNG
+output is unique is to
+integrate per device values into the PRNG. This does not guarantee
+uniqueness between boots, but it does make it harder to attack if an
+attacker is able to control the other sources of randomness.
+
+In the case of the STM32L151 chip I am using, there is a unique
+device id register. This register is programmed at the
+factory. Because it is unknown if this unique id is recorded by the
+manufacturer, and possibly traced through the supply chain, and no
+guarantees are made about either its uniqueness or its privacy, it has
+limited use to provide any serious additional randomization.
+
+Another method is to write entropy at provisioning time. This can be
+done in either flash memory or EEPROM, which may have a more granular
+write access.
+
+
+Using SRAM
+----------
+
+The tricky part of using SRAM is figuring out how to access the
+uninitialized memory. Despite having full access to the environment,
+modifying the startup code, which is often written in assembly, to do
+the harvesting makes an implementation less portable. Using standard
+C, or another high level language, makes this easier, *but* we need to
+know where the end of the data and bss segments are. This is where
+looking at the linker script will come in.
+
+A linker script is used to allocate and map the program's data to the
+correct locations. This includes allocating memory so that all the
+code and data fits in flash, but also allocating ram for variables, and
+stack. Often there will be a symbol provided that marks where the data
+and bss sections in ram end, and the heap should begin. For example,
+in [`STM32L151CCUX_FLASH.ld` at lines 185 &
+186](https://www.funkthat.com/gitea/jmg/lora-irrigation/src/commit/91a6fb590b68af1bcd34f776d4a58c89ac581c7d/stm32/l151ccux/STM32L151CCUX_FLASH.ld#L185-L186)
+it defines the symbols `end` and `_end`, the latter of which is often
+used by `sbrk` (or `_sbrk`, which in my project's case is in
+libnosys) to allocate memory for the heap. A sample `_sbrk` is in
+[utils_syscalls.c](https://www.funkthat.com/gitea/jmg/lora-irrigation/src/commit/91a6fb590b68af1bcd34f776d4a58c89ac581c7d/loramac/src/boards/mcu/saml21/hal/utils/src/utils_syscalls.c#L67-L83),
+though this particular implementation is not used by my project.
+Using `sbrk` is the easiest method to
+access uninitialized SRAM, but modifying or adding a symbol can be used
+if your microcontroller's framework does not support `sbrk`.
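
A minimal sketch of the `sbrk` approach follows. It is not the code from my project: `sram_harvest`, `sbrk_fn` and `absorb_fn` are hypothetical names, and the absorb callback stands in for whatever call feeds bytes to your PRNG. The `fake_sbrk` and `count_bytes` functions are only host-side stand-ins so the sketch can be tried off-target. On the device you would pass the real `sbrk`, and do so before anything else has used the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sbrk-style allocator: returns the old break, or (void *)-1 on failure. */
typedef void *(*sbrk_fn)(ptrdiff_t incr);

/* Hypothetical hook that feeds bytes into the PRNG being seeded. */
typedef void (*absorb_fn)(void *ctx, const uint8_t *buf, size_t len);

/*
 * Grow the heap by `want` bytes, hand the untouched memory to the
 * absorb hook, then shrink the heap back so no memory is lost.
 * Returns the number of bytes absorbed, or 0 if the heap cannot grow.
 */
size_t
sram_harvest(sbrk_fn grow, size_t want, absorb_fn absorb, void *ctx)
{
    void *p;

    assert(grow != NULL && absorb != NULL);

    p = grow((ptrdiff_t)want);
    if (p == (void *)-1)
        return 0;

    /* Deliberately read the memory without initializing it first. */
    absorb(ctx, (const uint8_t *)p, want);

    grow(-(ptrdiff_t)want);

    return want;
}

/* Host-side stand-in for sbrk, backed by a small static array. */
static uint8_t fake_ram[256];
static size_t fake_brk;

void *
fake_sbrk(ptrdiff_t incr)
{
    size_t old = fake_brk;

    if (incr > 0 && (size_t)incr > sizeof fake_ram - fake_brk)
        return (void *)-1;
    if (incr < 0 && (size_t)(-incr) > fake_brk)
        return (void *)-1;

    fake_brk = (size_t)((ptrdiff_t)fake_brk + incr);

    return &fake_ram[old];
}

/* Stand-in absorb hook: only counts the bytes it was handed. */
void
count_bytes(void *ctx, const uint8_t *buf, size_t len)
{
    (void)buf;
    *(size_t *)ctx += len;
}
```

Passing the allocator in as a function pointer keeps the harvesting code independent of whether the framework calls it `sbrk` or `_sbrk`. Keep in mind that the memory is only unpredictable after a real power cycle, as discussed above.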
+
+
+Putting it together
+-------------------
+
+It is accepted that integrating as many different sources of entropy
+(TRNGs) as possible is best. This ensures that as long as any single source is
+good, or each one is not great, but combined they provide enough
+entropy (preferably at least 128 bits), the seeded PRNG will be
+secure and unpredictable.
+
+As some sources are only available at first boot, e.g. SRAM, it is
+best to save a fork of the PRNG to stable storage. In my
+implementation, I decided to use EEPROM for this. I added an
+additional EEPROM section in the linker script, and then added a symbol
+[rng_save](https://www.funkthat.com/gitea/jmg/lora-irrigation/src/branch/main/strobe_rng_init.c#L39)
+that is put in this section. This should be 256 bits (32 bytes), as
+the savings from a smaller seed are not worth it, and any proper PRNG
+seeded with 256 bits will provide enough randomness. Writing to EEPROM
+does require a little more work to have the code save to this region,
+rather than RAM, but the STM32 HAL layer has functions that make this
+easy.
+
+It would be great if the PRNG seed could be stored in read-once,
+write-once memory to ensure that it can be read, mixed in with any
+additional entropy, and then written out, but I do not know of any
+microcontroller that supports this feature.
+
+Part of this is to ensure that the state of the saved
+seed and the PRNG state used for this boot are disjoint, and that if
+either one is compromised, it cannot be backtracked to obtain the
+other. In the case of [strobe](https://strobe.sourceforge.io/papers/strobe-latest.pdf),
+the function [strobe_randomize](https://www.funkthat.com/gitea/jmg/lora-irrigation/src/branch/main/strobe/strobe.c#L319-L331)
+does a RATCHET operation at the end, which ensures the state cannot be rolled
+back to figure out what was generated, and as the generated bytes do
+not contain the entire state of the PRNG, they cannot be used to
+reconstruct the future seed.
+
+Another advantage of using EEPROM is the ability to provide an initial
+set of entropy bytes at firmware flashing time. I did attempt to add
+this, but OpenOCD, which I use for programming the Node151 device,
+does not support programming EEPROM, so in my case, this was not
+possible. Despite not using it, the infrastructure to generate perso
+entropy is still present in the
+[Makefile](https://www.funkthat.com/gitea/jmg/lora-irrigation/src/branch/main/Makefile#L152-L157).
+I could have added an additional source data file to the flash, but
+figured that the other sources of entropy were adequate for my
+project.
+
+
+{#
+Conclusion
+----------
+
+Modern microcontrollers do have a number of sources of entropy that can
+be used. With a little bit of work, a PRNG seed can be saved between
+resets, allowing for more secure operation, and even preloading of
+entropy. #}