Sep 16, 2017

Using pm-utils to save/restore VMs on workstation suspend/restore

I use Ubuntu (16.04 for now) and Vagrant (1.9.0 for now) on a bunch of my projects, and for a while I've been running into what looks like a power management bug: after the workstation resumes from suspend, my vagrant sessions are dead, and I have to 'vagrant halt' and 'vagrant up' before another 'vagrant ssh' will succeed.

To work around this, I came up with two /etc/pm/sleep.d scripts: one saves any running vagrant boxes when the workstation suspends, and the other resumes those VMs when the workstation resumes.

Now if I'm in a 'vagrant ssh' session when Ubuntu suspends and resumes, instead of coming back to a frozen session, I see that I've been disconnected from the ssh session, and I can run 'vagrant ssh' again without having to halt and re-up the VM. That's better than nothing, but the next step is to start using something like screen or tmux in my vagrant sessions so I can pick up right where I left off.

So why bother with two scripts when one script with a single case statement would do? I wanted the running vagrant boxes to be saved while all the usual services and userspace infrastructure were still running, so I put that script in the 00-49 range, as described in the 'Sleep Hook Ordering Convention' section of 'man 8 pm-action'. However, I didn't want the restoration to happen until all the services had restarted, so I pushed that script to the end of the service handling hook range (50-74). I may want to revisit this and rename it to 75_vagrant.
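One detail worth checking against that ordering convention: as far as I can tell from the pm-utils scripts, sleep.d hooks are sorted by name and run in ascending order for suspend but in descending order for resume. The sketch below only mimics that sort so you can see where a hook name lands in each direction. `hook_order` is a hypothetical helper (not part of pm-utils), and every hook name other than 01_vagrant and 74_vagrant.sh is made up. If the descending-order behaviour holds on your system, 74_vagrant.sh runs before the other service hooks (50-73) on resume rather than after them, and a name below 50 would be needed to run after them.

```shell
#!/bin/bash
# Sketch: mimic the order in which pm-utils runs /etc/pm/sleep.d hooks.
# Assumption: hooks are sorted by name, ascending for suspend/hibernate
# and descending for resume/thaw. hook_order is a hypothetical helper.

hook_order() {
    local action="$1"
    shift
    case "$action" in
        suspend|hibernate) printf '%s\n' "$@" | sort ;;     # ascending
        resume|thaw)       printf '%s\n' "$@" | sort -r ;;  # descending
        *) return 1 ;;                                      # unknown action
    esac
}

# 01_vagrant and 74_vagrant.sh come from this post; the others are made up.
hooks=(74_vagrant.sh 01_vagrant 60_example_service 90_example_critical)

echo "suspend order:"
hook_order suspend "${hooks[@]}"
echo "resume order:"
hook_order resume "${hooks[@]}"
```

In the resume listing, 74_vagrant.sh comes out ahead of 60_example_service and 01_vagrant comes out last, which is the opposite of what the suspend listing shows.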

Note that in the resume script, the command is pushed into the background because I didn't want Ubuntu's resume to wait for the VMs to be restored. I'm usually checking email or the web for a bit before going back to my VMs, so I'm OK if they aren't ready immediately.

Here are some other lessons I learned from these scripts:

The first script is /etc/pm/sleep.d/01_vagrant:

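#!/bin/bash
# /etc/pm/sleep.d/01_vagrant: save running Vagrant boxes before the workstation suspends.

YOURNAME="your normal nonroot user name"   # the user who owns the Vagrant boxes
LOG=/var/log/pm-suspend-vagrant.log

case "$1" in
    suspend)
        {
            echo "$(date --rfc-3339=seconds): $0 output"
            # List every known environment as $YOURNAME, keep the IDs whose state
            # column (4th field of `vagrant global-status`) is "running", and
            # suspend each one. xargs -r does nothing if no box is running.
            /sbin/runuser -u "$YOURNAME" -- /usr/bin/vagrant global-status \
                | awk '$4 == "running" { print $1 }' \
                | xargs -r -I % /sbin/runuser -u "$YOURNAME" -- /usr/bin/vagrant suspend %
        } >> "$LOG" 2>&1
        ;;
    *)
        ;;
esac

# Don't let errors above stop suspension
exit 0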

The second script is /etc/pm/sleep.d/74_vagrant.sh:

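#!/bin/bash
# /etc/pm/sleep.d/74_vagrant.sh: resume saved Vagrant boxes after the workstation wakes.

YOURNAME="your normal nonroot user name"   # the user who owns the Vagrant boxes
LOG=/var/log/pm-resume-vagrant.log

case "$1" in
    resume)
        # Push the restoration into a background subshell so it doesn't slow
        # down the workstation's resume; this hook returns right away.
        (
            echo "$(date --rfc-3339=seconds): $0 output"
            # Resume every box that global-status reports as "saved".
            /sbin/runuser -u "$YOURNAME" -- /usr/bin/vagrant global-status \
                | awk '$4 == "saved" { print $1 }' \
                | xargs -r -I % /sbin/runuser -u "$YOURNAME" -- /usr/bin/vagrant resume %
        ) >> "$LOG" 2>&1 &
        ;;
    *)
        ;;
esac

# Don't let errors above stop restoration
exit 0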
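Both scripts pick machine IDs out of the `vagrant global-status` table by state ("running" before suspend, "saved" before resume). Pulling that step into a small filter function makes it easy to check offline against output you've captured, without suspending anything. This is a sketch: `vagrant_ids_in_state` is a hypothetical name, and it assumes the default table layout of `id name provider state directory`, so it matches on the fourth column instead of grepping the whole line (a plain grep could also hit a project directory whose path contains the word "running").

```shell
#!/bin/bash
# Sketch: print the IDs of Vagrant machines in a given state.
# Reads `vagrant global-status` output on stdin. Assumes the default table
# layout "id name provider state directory", so the state is field 4.
# vagrant_ids_in_state is a hypothetical helper name.

vagrant_ids_in_state() {
    local state="$1"
    # Header, separator, and footer lines never have the wanted state in
    # field 4, so only real machine rows are printed.
    awk -v want="$state" '$4 == want { print $1 }'
}

# Usage: capture the table once, then try the filter against it.
#   vagrant global-status > /tmp/global-status.txt
#   vagrant_ids_in_state running < /tmp/global-status.txt
#   vagrant_ids_in_state saved   < /tmp/global-status.txt
```

The hooks themselves can also be exercised without suspending the workstation: pm-utils passes the action as the first argument, so running `sudo /etc/pm/sleep.d/01_vagrant suspend` (which really will suspend your running boxes) and then checking /var/log/pm-suspend-vagrant.log shows what the hook would do.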

Sources:

- http://manpages.ubuntu.com/manpages/xenial/man8/pm-action.8.html