Embedded Board dev cluster
==========================

This document describes how the cluster is architected and set up. A few
design decisions are likely to be different in other environments, but
these made the most sense for mine.

Goal
----

The goal of this setup is to allow people to install and test images on
various boards remotely.

### Definitions

- User - A person or entity that is able to reserve a system and test
  with it.
- Host system - The machine that controls and provides access to the
  systems.

### Features

- Remote power control
- Serial console access
- Network boot
- Internet connectivity for boards w/ ethernet
- Isolation between board environments

  Likely implemented via VLANs+jails w/ VNET to provide complete control
- Long term goals:
  - Emulated microSD cards

### Physical Architecture

The diagram below shows the connections between the components. The
connections shown for the RockPro64 are expected to be similar for other
embedded devices.

On the switch, the RockPro64 will be put on a VLAN which will be
delivered to the host machine tagged. This gives a jail on the host
machine direct control over the broadcast domain for the device, which
allows running dhcp/bootp/tftp services for netbooting via dnsmasq or
another service. As fusefs is now jail friendly, the root FS could even
be mounted via sshfs.

The PoE part of the switch will be used for power control. PoE splitters
(~$10) are readily available and inexpensive, and the cost per port is
relatively low considering that power consumption will also be provided.

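As a rough illustration of the per-board wiring described above, each board could be recorded with its VLAN, its PoE switch port, and the hub ports leading to its serial adapter. This is only a sketch: `BoardWiring`, `check_isolation`, and every field name and value are hypothetical and not part of the actual setup.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BoardWiring:
    """Hypothetical record of how one board is wired into the cluster."""

    name: str                           # label for the board (made up)
    vlan_id: int                        # VLAN delivered tagged to the host
    poe_port: int                       # switch port feeding the PoE splitter
    serial_hub_ports: Tuple[int, ...]   # hub ports leading to the serial adapter

    def __post_init__(self) -> None:
        # 802.1Q reserves VLAN IDs 0 and 4095.
        if not 1 <= self.vlan_id <= 4094:
            raise ValueError(f"invalid VLAN id: {self.vlan_id}")


def check_isolation(boards: Iterable[BoardWiring]) -> None:
    """Raise ValueError if two boards share a VLAN or a PoE port."""
    vlan_owner = {}
    port_owner = {}
    for board in boards:
        for owners, key, label in (
            (vlan_owner, board.vlan_id, "VLAN"),
            (port_owner, board.poe_port, "PoE port"),
        ):
            if key in owners:
                raise ValueError(
                    f"{label} {key} is used by both {owners[key]} and {board.name}"
                )
            owners[key] = board.name


boards = [
    BoardWiring("rockpro64-0", vlan_id=101, poe_port=1, serial_hub_ports=(1, 3)),
    BoardWiring("rockpro64-1", vlan_id=102, poe_port=2, serial_hub_ports=(1, 4)),
]
check_isolation(boards)  # passes silently: no VLAN or PoE port is shared
```

Giving each board its own VLAN is what lets a single jail own the broadcast domain for that board, so a check like this would catch wiring mistakes before a board is handed to a user.
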
**************************************************************************
*                                                                        *
* +--------------+                                   +-----------------+ *
* | Host machine +-------------------------+         | Internet        | *
* +-+------------+                         |         +-+---------------+ *
*   |                                      |           |                 *
*   |                                      |           |                 *
*   |                                      |           |                 *
* +-+------------+     +----------------+  |         +-+---------------+ *
* | USB Hub      +-----+ Serial adapter |  +---------+ PoE/VLAN Switch | *
* +--------------+     +-+--------------+            +-+---------------+ *
*                        |                             |                 *
*                        |                             |                 *
*                        |                             |                 *
*                      +-+--------------+  Network   +-+---------------+ *
*                      | RockPro64      +------------+ PoE Splitter    | *
*                      +-+--------------+            +-+---------------+ *
*                        |            Power            |                 *
*                        +-----------------------------+                 *
*                                                                        *
**************************************************************************

### Logical Architecture

No user will be able to log into the host machine directly. The only user
interface exposed on the host will be an RPC interface, e.g.
`echo function xxx | ssh labhost labcli`, or a socket from within the
jail.

The user will be able to log into the jail that is created during the
reservation. Any modifications to the jail will be rolled back and
discarded after the system has been released or the reservation has
expired.

Workflow
--------

### Functions

Functions that take a device handle can also be executed from within the
device jail, in which case the device handle of that jail is used. An
error will be raised if a device handle is provided and it does not match
the current jail.

These are the functions a user can execute:

1. List device classes (onlyavailable=false)

   List the device classes that are known. If onlyavailable is true, then
   only the ones that are currently available for claiming are returned.

2. Device status (claimed=false)

   List the current status of every device, including whether it is
   claimed or unclaimed. If a device is claimed, the user who claimed it
   is included. If claimed is true, only list devices that are currently
   claimed by you. The default is to list all devices.

3. Claim device (device class, power=false)

   A user can use this to lock a device. This will return either a device
   handle with the device information, such as the IP address of the
   device's jail, or an error. Once a user has obtained a device, it will
   not be allocated to another user until they have released it (or, in
   the future, a timeout has been hit). The user's ssh keys will be
   automatically populated in that jail.

   By default, the device will be in the power off state. When a default
   configuration is provided for the board, it can be automatically
   powered on to make it easier to integrate w/ systems like CI.

4. Reinit device (device handle)

   This will remove the current jail and recreate it as if the device was
   claimed for the first time. This lets you get the jail back to a clean
   state w/o the risk of losing the lock that comes with freeing the
   device and then reclaiming it. If you have not claimed the device, an
   error will be returned. If the reinit fails, an error will be
   returned, but the claim will be maintained.

5. Release device (device handle)

   Release a claim on the device handle returned by claim device. This
   will make the device available to other users again. All data in the
   jail will be deleted. An error will be returned if you do not have a
   claim on the device.

6. Power off (device handle)

   Turn off the power to the device.

7. Power on (device handle)

   Turn on the power to the device.

### Services provided in the device jail

Once logged in, the jail will have the following services:

1. ssh

   Incoming ssh must be provided for the user to log in.

2. One interface for internet

   NAT is set up so that all traffic appears to come from the jail's IP.

3. One interface for device

4. Serial port for console access

   The configuration will identify which device to put in the jail. USB
   devices can be specified via a list of ports on the hubs so that there
   is no issue w/ device probe order.

5. dhcpd for device w/ tftp already configured

6. inetd w/ tftpd setup

7. nfsd setup w/ exports configured

8. power control for device

As the user will have root access, all of these can be modified
afterwards.

Future Work
===========

There are a number of ideas to make developing on boards remotely more
doable.

We can use devices that support OTG to act as USB devices for the test
boards. This means you can simulate a keyboard and/or a mouse. Combined
with an HDMI to ethernet adapter (there are cheap ones for ~$40 each), a
developer can work on making X or another GUI work remotely.

Questions
=========

Should the host key for the device be kept between invocations?

Pros: Users don't have to munge known_hosts every time.

Cons: A malicious user could impersonate the jail, as they can copy out
the key.

IPv6 support? I have enough address space that each device can get a
/64, but this removes privacy, in that the network address will reveal
which device was active at the time.
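
To make the IPv6 question concrete, here is a minimal sketch of handing each device its own /64 out of a larger allocation, using Python's standard `ipaddress` module. The prefix `2001:db8::/56` is a documentation prefix standing in for the real allocation, and the device names and the `device_subnets` helper are hypothetical.

```python
import ipaddress

# Assumption: a documentation prefix, not the real allocation.
SITE_PREFIX = ipaddress.ip_network("2001:db8::/56")


def device_subnets(devices, site_prefix=SITE_PREFIX):
    """Map each device name to its own /64, assigned in list order."""
    devices = list(devices)
    available = 2 ** (64 - site_prefix.prefixlen)
    if len(devices) > available:
        raise ValueError(
            f"{len(devices)} devices but only {available} /64s in {site_prefix}"
        )
    # subnets() yields the /64s in ascending order; zip stops at the last device.
    return dict(zip(devices, site_prefix.subnets(new_prefix=64)))


# Hypothetical device names, for illustration only.
for name, subnet in device_subnets(["rockpro64-0", "rockpro64-1"]).items():
    print(name, subnet)
```

A fixed mapping like this is exactly what creates the privacy concern: anyone who sees the source /64 can tell which device was in use. Whether that trade-off is acceptable is still an open question.
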