Last Sunday morning, April 25th 2021, I decided to upgrade my server hosted at Hetzner. For more information about my configuration, please have a look at the Install Geli blog post.
The different steps
To make it short, I have a first, otherwise empty zpool holding a minimal FreeBSD installation. Once connected to it, I use geli to attach the encrypted partitions of the two disks, which form an encrypted mirror. Then I define the next root mount point with the kenv command and use reboot -r to make a partial reboot (a "reroot": the kernel keeps running and only userland restarts) into the newly defined root.
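As an illustration, here is a minimal POSIX sh sketch of that sequence. It assumes the provider names ada0p4/ada1p4 and the root dataset tank/root used later in this post, and the function name reroot_into is hypothetical. It only prints the commands unless DRY_RUN=0 is set, so it can be read and tried safely.

```sh
#!/bin/sh
# Sketch of the "reroot" sequence (FreeBSD), not the author's actual script.
# Assumed names: geli providers ada0p4/ada1p4, root dataset tank/root.
# Dry-run by default: commands are printed, not executed; set DRY_RUN=0 to run them.
set -eu
: "${DRY_RUN:=1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Usage: reroot_into DATASET PROVIDER...
reroot_into() {
    dataset=$1
    shift
    # geli asks for the passphrase of each provider it attaches
    run geli attach "$@"
    # tell the kernel which filesystem to mount as / after the reroot
    run kenv "vfs.root.mountfrom=zfs:$dataset"
    # reroot: kill userland, mount the new root, run the usual startup sequence
    run reboot -r
}

if [ $# -gt 0 ]; then
    reroot_into "$@"
fi
```

For example, sh reroot.sh tank/root ada0p4 ada1p4 prints the three commands; DRY_RUN=0 sh reroot.sh tank/root ada0p4 ada1p4 runs them.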
The backup
I have a backup for this machine, but I always prefer to make a fresh backup before every important operation. My script is very simple, but it takes hours.
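The post does not show that backup script, so the following is not it. It is a hypothetical complement: a recursive ZFS snapshot taken right before the upgrade, which is quick to create and easy to roll back to. It does not replace a real backup, since it lives on the same disks. The pool name tank and the snapshot label are assumptions.

```sh
#!/bin/sh
# Hypothetical complement to a full backup: a recursive snapshot before upgrading.
# Pool name and label are assumptions. Dry-run by default; set DRY_RUN=0 to execute.
set -eu
: "${DRY_RUN:=1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Usage: snapshot_before_upgrade POOL LABEL
snapshot_before_upgrade() {
    pool=$1
    label=$2
    # -r snapshots every dataset of the pool, atomically
    run zfs snapshot -r "$pool@$label"
    # list the snapshots, to double-check before going further
    run zfs list -t snapshot -r "$pool"
}

if [ $# -gt 0 ]; then
    snapshot_before_upgrade "$1" "pre-upgrade-$(date +%Y%m%d)"
fi
```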
The first reboot
Once the backup is finished, I reboot the machine. By default, it boots into the small (50 GB), unencrypted zpool named zboot.
The first upgrade
Connected to this host via ssh, I just use the freebsd-update fetch and freebsd-update install commands, read the messages, reboot, update the installed packages, and everything is fine.
First step ✓
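For reference, moving to a new release (rather than applying patches within one) usually follows the sequence in the FreeBSD Handbook, sketched below. The post only names fetch and install, so treat this as an assumption about what the full upgrade looks like; the target release 13.0-RELEASE comes from the archives mentioned later, and the function names are hypothetical.

```sh
#!/bin/sh
# Sketch of a binary release upgrade with freebsd-update(8), Handbook-style.
# Target release is an assumption. Dry-run by default; set DRY_RUN=0 to execute.
set -eu
: "${DRY_RUN:=1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Part 1: on the old kernel.
upgrade_before_reboot() {
    run freebsd-update -r "$1" upgrade  # fetch the new release and merge configs
    run freebsd-update install          # install the new kernel first
    run shutdown -r now
}

# Part 2: after rebooting on the new kernel.
upgrade_after_reboot() {
    run freebsd-update install  # install the new userland
    run pkg-static upgrade -f   # reinstall packages for the new release
    run freebsd-update install  # remove the old libraries
}

case "${1:-}" in
    before) upgrade_before_reboot "${2:-13.0-RELEASE}" ;;
    after)  upgrade_after_reboot ;;
esac
```

The split into two functions mirrors the reboot in the middle: run it with before, reboot, then run it again with after.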
The second upgrade and how it failed
TL;DR: I can find out how it failed, but I don't see why it failed. But… it did.
My goal was to attach the encrypted partitions, define the next root to use and, once connected to the entire machine, proceed with freebsd-update as I had already done. Sounds logical to me.
geli attach ada0p4 ada1p4
Enter passphrase
Did I do a zfs update at this point? I can't remember, but I most likely did.
kenv vfs.root.mountfrom="zfs:tank/root"
reboot -r
…
ssh <host>
sshd, permission denied
Damn, something went wrong.
I started to make mistakes at this point. I was tired and wanted to finish this upgrade before going to bed.
THIS IS A VERY BAD IDEA
The third (and all the following) reboots
I connected to the Hetzner console and issued a hard reboot to get back to the unencrypted zboot system.
Next, instead of investigating why reboot -r failed and why sshd did not start, I decided to make a "raw" upgrade.
The “raw” upgrade
Pretty simple:
- Download the FreeBSD 13.0-RELEASE archives (base.txz, kernel.txz, lib32.txz, src.txz and ports.txz).
- Attach the encrypted partitions and import the zpool at an alternate root (zpool import -o altroot=/tank/root tank).
- Make a little loop to install the new OS on the encrypted partition:
    foreach foo ( base.txz kernel.txz lib32.txz src.txz ports.txz doc.txz )
        xz -d -c -v $foo | tar -C /root -xf -
    end
- Define the next root (kenv vfs.root.mountfrom="zfs:tank/root").
- Then reboot (reboot -r).
No chance, ssh was still unavailable.
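The loop above is tcsh. As a hedged POSIX sh variant, here is the same extraction with one safety net added: it refuses to unpack into / or into a directory that does not exist, which is an easy mistake when tired. The target directory is passed as an argument (for instance wherever the new root is mounted under the altroot), and install_sets is a hypothetical name.

```sh
#!/bin/sh
# POSIX sh variant of the extraction loop, with a guard against unpacking over
# the running system. Dry-run by default; set DRY_RUN=0 to execute.
set -eu
: "${DRY_RUN:=1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Usage: install_sets TARGET_DIR ARCHIVE...
install_sets() {
    target=$1
    shift
    # refuse "/" and anything that is not an existing directory
    if [ "$target" = / ] || [ ! -d "$target" ]; then
        echo "refusing to extract into '$target'" >&2
        return 1
    fi
    for archive in "$@"; do
        # tar detects xz compression by itself; -p preserves permissions
        run tar -C "$target" -xpf "$archive"
    done
}

if [ $# -gt 0 ]; then
    install_sets "$@"
fi
```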
What next?
Many stupid things.
- Clean the encrypted partitions of all system directories (/lib, /lib32, /usr, /usr/libexec, /usr/libdata, /etc, /var) and change the flags of some libraries (chflags noschg <file>).
- Reinstall the OS (see above).
As expected, no chance again.
STOP
I need help. Even if I unmount the data datasets before I install or remove things, I'm going to make more and more mistakes, and one day the only solution will be to reinstall the whole machine. And I don't want that.
Calling a friend
So I created a Protonmail address (my mail server is one of the jails on the problematic machine) and asked my friend Ollivier for help.
He asked me for access to the machine. I put his ssh key on the machine, created a user and gave him all the credentials: to mount the encrypted partitions, to access the Hetzner console, etc. Yes, I have absolute trust in him, for many, many reasons.
He asked me if I had access to the machine's console. Sure, so why hadn't I thought about it before? I must be a sucker!
The console
I asked Hetzner support to plug a console into my machine, and I rebooted it.
As expected, nothing went wrong for the first part. As usual, I attached the encrypted partitions and ran reboot -r.
Damn, the console said that libraries needed by sshd were missing, and that the /var/tmp folder was missing too.
The revelation
Yes, I am a sucker, a real one, and stupid. During all my "experimentation", I had created zfs datasets for all the important folders, the ones I had cleaned (/lib, /usr, …), and they were missing at startup.
I fixed that and recreated /var/tmp. Reboot once again.
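In hindsight, a quick check of the new root before reboot -r might have caught this earlier. Here is a minimal sketch, run against the directory where the new root is mounted. The list of paths is an assumption based on what was missing here (libraries for sshd and /var/tmp), check_root is a hypothetical name, and the commented zfs/mkdir lines are only hints for FreeBSD, not what the author ran.

```sh
#!/bin/sh
# Sketch: does the new root contain what sshd and the boot scripts need?
# The path list is an assumption; adjust it to your own layout.
set -eu

# Usage: check_root ROOT_DIR ; prints missing paths, returns 1 if any
check_root() {
    root=$1
    status=0
    for path in lib libexec usr/lib usr/sbin/sshd etc/rc var/tmp; do
        if [ ! -e "$root/$path" ]; then
            echo "missing: /$path"
            status=1
        fi
    done
    return "$status"
}

# On FreeBSD, datasets that exist but are not mounted can be listed with:
#   zfs list -H -r -o name,mounted,mountpoint tank | awk '$2 == "no"'
# and /var/tmp is normally recreated sticky and world-writable:
#   mkdir -p /var/tmp && chmod 1777 /var/tmp

if [ $# -gt 0 ]; then
    check_root "$1"
fi
```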
The miracle
At this reboot, sshd found the mandatory libraries and /var/tmp, and it worked. I could connect to the entire machine again. \o/
One more fix
The encrypted zpool was not mounted in the right place. I rebooted once again, fixed the altroot problem for the zpool, issued a last reboot -r and… YES.
It works and the jails start as expected.
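The post does not say exactly how the altroot problem was fixed. One possible approach, sketched here under assumptions: altroot is not persistent and only lasts for the current import, so if the pool is still imported with an altroot from the "raw" upgrade, exporting it and importing it again without one clears it. The -N flag imports without mounting anything, assuming zfs_enable="YES" in rc.conf so the datasets get mounted at their normal places after the reroot. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: clear a leftover altroot before the final "reboot -r".
# Pool name "tank" is from this post. Dry-run by default; set DRY_RUN=0 to execute.
set -eu
: "${DRY_RUN:=1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Usage: reimport_without_altroot POOL CURRENT_ALTROOT
# CURRENT_ALTROOT is what "zpool get -H -o value altroot POOL" prints ("-" if unset)
reimport_without_altroot() {
    pool=$1
    altroot=$2
    if [ "$altroot" = "-" ]; then
        echo "$pool: no altroot set"
        return 0
    fi
    run zpool export "$pool"
    # -N: import without mounting datasets; the rc scripts mount them after the reroot
    run zpool import -N "$pool"
}

if [ $# -gt 0 ]; then
    reimport_without_altroot "$1" "$(zpool get -H -o value altroot "$1")"
fi
```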
What I learned
- do not insist if you are tired, you will make mistakes;
- the same behaviour gives you the same result; insisting while tired will only irritate you;
- never forget that you have friends to help you! You're not alone;
- stop what you are doing and do something totally different (gardening, running, cooking…). It will clear your mind.
To conclude
It took me two and a half days¹ to make this upgrade. It's too long. Before doing things, think about them and their consequences; double- and triple-check what you are doing. And never forget to make a backup. The backup of /etc I made before doing anything else saved my ass too.
Acknowledgement
Big thanks to my friend Ollivier, who put me on the right track to find the solution. He was my rubber duck for rubber duck debugging :-)
Big thanks, too, to MacLemon for rereading and fixing this post.
-
¹ To be clear and honest, I was off for one day for health reasons.