Dead Simple Network Failover (Ubuntu)

Got two ISPs? Want automatic failover when one dies? Here’s a zero-dependency solution that just works on Ubuntu (or any systemd/NetworkManager distro).

The Problem

I have cable and Starlink. Cable is OK-ish but occasionally goes down. Starlink is my backup. I wanted automatic failover without buying expensive hardware or running complex software.

The Solution

A simple bash script that:

  • Checks if primary connection is healthy every 3 seconds
  • Switches to backup after 3 consecutive failures
  • Switches back when primary is stable again
  • Prevents flapping between connections

Setup (5 minutes)

1. Find your network interfaces

ip link show  # Find your interface names
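`ip link show` lists every interface, including ones NetworkManager does not manage. Since the script below reads gateways from NetworkManager, it can help to confirm which devices it reports as connected. A minimal sketch, assuming the terse `nmcli -t -f DEVICE,TYPE,STATE device` output format; `connected_devices` is a hypothetical helper name and the sample input is illustrative, not captured from a real machine:

```shell
# List devices NetworkManager reports as connected, one "name type" per line.
# Reads the terse output of `nmcli -t -f DEVICE,TYPE,STATE device` on stdin.
connected_devices() {
  awk -F: '$3 == "connected" { print $1, $2 }'
}

# On a real system:
#   nmcli -t -f DEVICE,TYPE,STATE device | connected_devices
# Illustrative input only:
printf '%s\n' 'eth0:ethernet:connected' 'wlan0:wifi:connected' 'lo:loopback:unmanaged' |
  connected_devices
```

The two names it prints are the candidates for PRIMARY_IF and BACKUP_IF.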

2. Create the failover script

Save this to /usr/local/bin/netfailover.sh:

#!/usr/bin/env bash
set -euo pipefail

# Configuration - CHANGE THESE!
PRIMARY_IF="eth0"           # Your primary ISP interface
BACKUP_IF="wlan0"          # Your backup ISP interface
PRIMARY_NAME="Primary"      # Display name
BACKUP_NAME="Backup"       # Display name

# Tuning
OK_THRESHOLD=3             # Checks before switching back
FAIL_THRESHOLD=3           # Failures before switching
SLEEP_SECS=3              # Check interval

require_root(){ [[ $EUID -eq 0 ]] || { echo "Run with sudo"; exit 1; }; }
log(){ logger -t netfailover "$1"; echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"; }
gw(){ nmcli -g IP4.GATEWAY device show "$1" 2>/dev/null || true; }
setdef(){
  local ifc=$1 gw=$2
  [[ -n "$gw" ]] && ip route replace default via "$gw" dev "$ifc" && return 0
  return 1
}

healthy(){
  local ok=0
  # Test multiple endpoints to avoid false positives
  for dst in 1.1.1.1 8.8.8.8 9.9.9.9; do
    ping -I "$PRIMARY_IF" -c 1 -W 1 "$dst" >/dev/null 2>&1 && ok=$((ok+1))
  done
  [[ $ok -ge 2 ]]  # Need 2/3 responding
}

# Main loop
require_root
cur_primary=1; fail=0; ok=0

log "Starting failover monitor ($PRIMARY_NAME → $BACKUP_NAME)"

while true; do
  PGW=$(gw "$PRIMARY_IF")
  BGW=$(gw "$BACKUP_IF")

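  if healthy; then
    ok=$((ok+1)); fail=0
    if (( cur_primary==0 && ok>=OK_THRESHOLD )); then
      # Only record the switch if the route change actually succeeded
      if setdef "$PRIMARY_IF" "$PGW"; then
        cur_primary=1
        log "✓ $PRIMARY_NAME healthy → switched back"
      else
        log "! $PRIMARY_NAME healthy but route change failed, staying on $BACKUP_NAME"
      fi
      ok=0
    fi
  else
    fail=$((fail+1)); ok=0
    if (( cur_primary==1 && fail>=FAIL_THRESHOLD )); then
      # Same rule for failover: no gateway on the backup means no switch
      if setdef "$BACKUP_IF" "$BGW"; then
        cur_primary=0
        log "✗ $PRIMARY_NAME failed → switched to $BACKUP_NAME"
      else
        log "! $PRIMARY_NAME failed but $BACKUP_NAME has no usable gateway"
      fi
      fail=0
    fi
  fi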

  sleep "$SLEEP_SECS"
done

3. Create the systemd service

Save this to /etc/systemd/system/netfailover.service:

[Unit]
Description=Network Failover Service
After=network-online.target NetworkManager-wait-online.service
Wants=network-online.target NetworkManager-wait-online.service

[Service]
Type=simple
ExecStartPre=/bin/sleep 8
ExecStart=/usr/local/bin/netfailover.sh
Restart=always
RestartSec=2
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

4. Enable and start

sudo chmod +x /usr/local/bin/netfailover.sh
sudo systemctl daemon-reload
sudo systemctl enable netfailover.service
sudo systemctl start netfailover.service

5. Check it’s working

# Watch the logs
sudo journalctl -u netfailover -f

# Test failover (unplug primary cable or disable interface)
sudo ip link set eth0 down  # Wait ~10 seconds
sudo ip link set eth0 up    # Should switch back after ~10 seconds

How It Works

  1. Health checks: Pings 3 public DNS servers through the primary interface; a check passes when at least 2 of the 3 reply
  2. Smart thresholds: Requires 3 consecutive failures before switching (prevents false positives)
  3. Stable switching: Waits for 3 successful checks before switching back (prevents flapping)
  4. Route manipulation: Replaces Linux’s default route (ip route replace) to control which ISP handles traffic
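The threshold logic in points 2 and 3 can be tried without touching any routes. The sketch below replays a list of check results (1 = healthy, 0 = failed) through the same counters the monitor keeps. `simulate` is a hypothetical helper, and it assumes every route change succeeds:

```shell
FAIL_THRESHOLD=3   # same defaults as the script
OK_THRESHOLD=3

# Replay check results and print the tick at which the monitor would switch.
simulate() {
  local cur_primary=1 fail=0 ok=0 tick=0 result
  for result in "$@"; do
    tick=$((tick + 1))
    if [ "$result" -eq 1 ]; then
      ok=$((ok + 1)); fail=0
      if [ "$cur_primary" -eq 0 ] && [ "$ok" -ge "$OK_THRESHOLD" ]; then
        echo "tick $tick: switch back to primary"
        cur_primary=1; ok=0
      fi
    else
      fail=$((fail + 1)); ok=0
      if [ "$cur_primary" -eq 1 ] && [ "$fail" -ge "$FAIL_THRESHOLD" ]; then
        echo "tick $tick: switch to backup"
        cur_primary=0; fail=0
      fi
    fi
  done
}

# A single dropped check is ignored; three in a row trigger failover,
# and three clean checks in a row are needed to return.
simulate 1 0 1 0 0 0 1 0 1 1 1
# prints:
#   tick 6: switch to backup
#   tick 11: switch back to primary
```

Note how the failed check at tick 8 resets the recovery counter, which is what keeps a flaky primary from being re-selected too early.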

Why This Approach?

  • No extra dependencies: Just bash, ping, and ip, plus the nmcli and logger tools already present on a NetworkManager/systemd install
  • Simple: ~60 lines of code you can understand and modify
  • Battle-tested: Based on traditional network monitoring patterns
  • Fast: Detects failures in roughly 6 to 15 seconds (three checks 3 seconds apart, each taking up to 3 seconds if every ping times out), recovers in ~9 seconds
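The detection time follows from the tuning values in the script. As a rough sketch of the arithmetic: the best case assumes failed pings return immediately (for example when the link is down), the worst case assumes every ping waits out its full `-W` timeout. `detection_window` is a hypothetical helper, and it ignores the wait of up to one interval before the first failed check starts:

```shell
# Estimate seconds from the first failed check to the route switch.
# Arguments: failure threshold, sleep between checks, probes per check,
# per-probe timeout. Prints "best worst".
detection_window() {
  local threshold=$1 sleep_secs=$2 probes=$3 timeout=$4
  local gaps=$(( (threshold - 1) * sleep_secs ))      # sleeps between the checks
  local waiting=$(( threshold * probes * timeout ))   # every probe times out
  echo "$gaps $((gaps + waiting))"
}

# FAIL_THRESHOLD=3, SLEEP_SECS=3, three ping targets, -W 1
detection_window 3 3 3 1   # prints: 6 15
```

Lowering SLEEP_SECS or FAIL_THRESHOLD shortens both numbers, at the cost of reacting to shorter glitches.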

Gotchas

  • Both connections need to be up with a gateway known to NetworkManager (typically from a DHCP lease), because the script reads gateways via nmcli
  • Existing connections may drop during switchover
  • DNS might need a flush after switching: resolvectl flush-caches (older releases: systemd-resolve --flush-caches)
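The first gotcha can be checked before starting the service. A minimal sketch, using the same `nmcli -g IP4.GATEWAY device show` lookup as the script; `missing_gateways` is a hypothetical helper and the interface names are the placeholders from the configuration:

```shell
# Report every interface NetworkManager has no IPv4 gateway for.
# Returns non-zero if at least one gateway is missing.
missing_gateways() {
  local ifc gw missing=0
  for ifc in "$@"; do
    gw=$(nmcli -g IP4.GATEWAY device show "$ifc" 2>/dev/null || true)
    if [ -z "$gw" ]; then
      echo "no gateway for $ifc"
      missing=1
    fi
  done
  return "$missing"
}

# Example call:
#   missing_gateways eth0 wlan0 || echo "fix the connections above first"
```

If an interface is reported here, the monitor has nothing to point the default route at on that side, so failover to it cannot succeed.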

Monitoring

Check which connection is active:

ip route | grep default
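That command can print several lines, since NetworkManager usually keeps its own default route per interface alongside the one the script installs. The kernel prefers the lowest metric, and a route listed without a metric has metric 0. A small sketch that picks the winner; `active_if` is a hypothetical helper and the addresses are documentation examples:

```shell
# Print the interface of the default route with the lowest metric.
# Reads `ip route show default` style lines on stdin.
active_if() {
  awk '
    $1 == "default" {
      dev = ""; metric = 0            # a route with no metric field has metric 0
      for (i = 2; i < NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "metric") metric = $(i + 1)
      }
      if (best == "" || metric + 0 < bestm) { best = dev; bestm = metric + 0 }
    }
    END { if (best != "") print best }
  '
}

# On a real system:
#   ip route show default | active_if
# Illustrative input only:
printf '%s\n' \
  'default via 198.51.100.1 dev eth0 proto dhcp metric 100' \
  'default via 192.0.2.1 dev wlan0' |
  active_if   # prints: wlan0
```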

View failover logs:

sudo journalctl -u netfailover --since "1 hour ago"

That’s it. No BGP, no VRRP, no expensive hardware. Just a simple script that keeps me online.

Tested on Ubuntu 22.04/24.04 with cable + Starlink, and dual WAN setups. Should work on any Linux with NetworkManager and systemd.



