Overlayroot Problem Statement

This week I've been looking at overlayroot as a potential solution to reduce the effort to make changes to the nodes in my Raspberry Pi cluster. In this post I want to brain-dump about the problem I'm hoping it solves and the problems I'm running into with respect to implementing overlayroot as well as potential solutions that I'm exploring.

What am I hoping to solve with `overlayroot`?

For my Raspberry Pi cluster, I've been using cloud-init's NoCloud provider to configure each node, which entails mounting my SD card to my Mac and writing new user-data. This worked fine when changes were infrequent, but I'm pivoting toward pushing more of the complexity into the user-data so I want to remove the "burn an SD card image" from my iteration loop (as well as the incumbent searching around for my SD card adapter).

There are a few solutions to this problem, but the one that seems most appealing is using overlayroot to keep a read-only filesystem layer (the "lower" layer in overlayfs terminology) that essentially preserves the initial image as it was at install-time while writing changes to an "upper" layer. The upper layer can then be erased in order to reset the system to its initial state.

Roadblocks & potential solutions

The problem I'm running into is that overlayroot seems to assume that these layers are running on distinct block devices, and it seems overwhelmingly people use an in-memory tmpfs for the upper layer (so much so that it seems to be implicit when people write about overlayroot online). As far as I can tell from the overlayroot documentation and Googling, I have these options:

Use a tmpfs upper layer
Partition my SD card such that I have a device for "upper" and "lower"
Maybe create a loop device from a file on a single primary partition

Use a tmpfs upper layer

This option is the easiest to get working, but it seems like overlayroot will use half of the available memory for the tmpfs which is a lot for a raspberry pi (particularly my 1GB pis). As far as I can tell, there's no way to specify an amount of memory before it starts swapping to disk, and I'm also not sure how the performance of swapping a tmpfs to disk will compare to just writing to a block device (moreover, I would think I still would need a large swap partition anyway, so I don't think this ends up buying me anything over a partitioned SD card).

Partition SD card into "upper" and "lower" partitions

Basically this approach involves writing a Raspberry Pi disk image file to an SD card and then creating a third "upper" partition with the remaining space. This will give me devices that I can reference in the overlayroot.conf file, but it's a pretty significant mount of work everytime I want to burn a new SD card (there's probably some way to automatically grow the third partition, but I'm not sure how to make that happen at the moment).

Maybe create a loop device from a file on a single primary partition

I'm not sure if this is feasible for a few reasons, but the idea is that I shouldn't have to create partitions just to get a separate block device. I should be able to create a file on my primary partition, create a loop device from that file, and reference that loop device from my overlayroot.conf file. The main challenge here is that loop devices don't persist across boots, so I would need to inject a script that runs before overlayroot which creates the loop device, and I'm not sure when exactly overlayroot runs during the boot process, nor how to hook in my script. I haven't found anything online about using overlayroot with loop devices (a few things about using a DIY overlayroot-like system with a loop device, but nothing about overlayroot specifically).

Conclusion

The tmpfs solution seems unworkable for my requirements (not using a bunch of memory), and the loop device solution seems like a thread that I could pull for a very long time before getting anything working (mostly because of my poor understanding of the Linux boot process). The partitioned SD card solution seems like the fastest path to a working solution, but it also requires a lot of work each time I'm burning an SD card (any time I want to add a new Pi to the cluster or replace an SD card).

As such, I'll start by getting the partitioned SD card working and see how that works out in practice. Hopefully I won't be burning SD cards so often as to be burdensome, and even if it is painful, I can probably automate away a fair amount of that pain (scripting the SD card burning process).

Craig Weber