Sophos UTM WAN Link Failover Fix
Problem
If the primary ADSL connection fails, the UTM fails over to the standby interface with no
issues. What doesn't work as intended is falling back to the primary connection once it is back up.
It could be argued that the monitoring for the primary connection is not set up correctly, but that is not the
case. I configured the UTM to monitor the default gateway of the WAN connection (the first hop in the traceroute
of the primary ADSL ISP), because if I monitored the router itself or any public IP, it would be reachable
regardless of which WAN connection is in use.
Why A Backup WAN Connection Is Always Needed
The main issue is not just losing the internet connection. There are other things to consider, like being able to connect remotely to a machine in the network when the primary connection goes down to see what's wrong, and being able to reach the wireless security cameras.
Prerequisites
-A router flashed with open source firmware like DD-WRT, or any Linux device in the network with a bash shell.
-A 4G modem with an Ethernet interface, or any other backup WAN connection, preferably from a different ISP.
-A Sophos UTM with Uplink Balancing & Uplink Monitoring configured, running either on a VM or on a dedicated device.
Current Configuration
[Figure: The VM has three virtual interfaces configured in bridged mode, this way they are seen by …]
I have a Sophos UTM running in an Oracle VirtualBox virtual machine on an Intel NUC. The NUC has
one Ethernet interface and one wireless interface, but I am only using the Ethernet interface. In the VM configuration, I
have created three virtual interfaces in bridged mode:
-LAN interface.
-Primary WAN interface, set to use the main router as its default gateway.
-Backup WAN interface, set to use the 4G modem as its default gateway. In my case this is a TP-Link MR-3020.
-If you use the TP-Link MR-3020 or a similar device, you will also need a 4G USB modem plugged into its USB port.
Creating the interfaces in bridged mode makes them appear to other devices in the network as
physical interfaces; no device in the network can tell the difference.
[Figure: Problematic setup: Uplink Balancing is set to …]
The issue arises when the UTM tries to ping the aforementioned first hop via the
backup WAN interface, which will never work, simply because that hop belongs to a totally different ISP.
To Confirm The Root Cause Of The Problem
To confirm the root cause of the problem, we will trigger a failover by disabling the primary WAN
connection on the router and then watching the monitoring traffic 🙂
[Figure: Disconnecting the WAN from the DD-WRT router's interface and simulating a …]
But first, we have to set up tcpdump on the UTM and note how the monitoring traffic behaves
normally. We will use the tcpdump flag -e to show the MAC addresses in each frame:
elutm:/root # tcpdump -nei eth1 host 10.45.3.134
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
08:22:14.847351 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:22:14.869246 e0:3f:49:9c:5a:78 > 08:00:27:a1:c4:f6, ethertype IPv4 (0x0800),
08:22:29.849211 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:22:29.870805 e0:3f:49:9c:5a:78 > 08:00:27:a1:c4:f6, ethertype IPv4 (0x0800),
08:22:44.852065 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:22:44.873276 e0:3f:49:9c:5a:78 > 08:00:27:a1:c4:f6, ethertype IPv4 (0x0800),
The ICMP packets are flowing as expected: they go to the correct IP via the router's MAC
address, and the replies come back.
Now let's disable and re-enable the WAN connection and capture the monitoring traffic on
both WAN interfaces, eth1 and eth2:
elutm:/root # tcpdump -nei eth1 host 10.45.3.134
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
08:39:00.768403 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:39:01.021330 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:39:01.272891 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:39:01.524324 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),

elutm:/root # tcpdump -nei eth1 host 10.45.3.134
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
08:46:00.924674 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:46:01.175700 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:46:01.427553 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
08:46:01.678668 08:00:27:a1:c4:f6 > e0:3f:49:9c:5a:78, ethertype IPv4 (0x0800),
As the captures above show, although the primary WAN connection is back up, the UTM still
cannot reach the monitored gateway on either interface (only outgoing frames appear, with no replies), so it never falls back to the primary WAN interface.
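One way to make the difference between the healthy and the failed captures explicit is to count frames per direction. The following is a minimal sketch and not part of the original troubleshooting: count_directions is a hypothetical helper, it assumes the tcpdump -e line layout shown above (timestamp first, source MAC second), and the two MAC addresses are the ones that appear in the captures in this post.

```shell
#!/bin/sh
# Sketch: count frames per direction in `tcpdump -e` output read from stdin.
# Both MAC addresses are copied from the captures in this post; the helper
# name count_directions is hypothetical.
UTM_MAC="08:00:27:a1:c4:f6"      # UTM WAN interface (source of the probes)
ROUTER_MAC="e0:3f:49:9c:5a:78"   # router (source of the replies)

count_directions() {
    # With -e, field 1 is the timestamp and field 2 is the source MAC.
    # Header lines from tcpdump match neither address and are ignored.
    awk -v utm="$UTM_MAC" -v rtr="$ROUTER_MAC" '
        $2 == utm { sent++ }
        $2 == rtr { received++ }
        END { printf "sent=%d received=%d\n", sent, received }
    '
}
```

Piping a short capture into it, for example tcpdump -c 20 -nei eth1 host 10.45.3.134 | count_directions, should report matching counts while monitoring is healthy and received=0 in the failed state.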
How To Fix The WAN Failover Issue
To fix this, I came up with the idea of having the UTM monitor an IP inside the LAN instead of
the default gateway of the ISP, while making the reachability of that IP depend on the reachability of
the default gateway of the ISP.
The solution is very simple. A small bash script on the router (or on any Linux-based device with a
CLI) pings the default gateway of the primary ADSL ISP every 30 seconds. As long as the ping succeeds, the script keeps an additional IP on the
router's LAN interface, and the Sophos UTM monitors this IP address instead of the gateway of the
ISP. Here's how it works:
-On router boot, assume the WAN connection is up and add an additional IP address (192.168.1.250) to the router's LAN interface.
-The Sophos UTM tests the reachability of the internal IP 192.168.1.250.
-Every 30 seconds, the router pings the default gateway of the ISP.
-If the default gateway of the ISP is not reachable, the router changes the additional IP address on its LAN interface to any other IP address, so 192.168.1.250 disappears.
-As soon as 192.168.1.250 is unreachable, the UTM fails over to the backup WAN connection.
-A while true loop keeps pinging the default gateway of the ISP.
-While we are on the backup WAN connection, the UTM can still test the monitored IP, because it is an internal address that is reachable regardless of which WAN connection is active.
-Once the router can ping the default gateway of the ISP again, it re-adds the previously removed additional IP 192.168.1.250 on its LAN interface.
-The UTM detects that the monitored host is up and falls back to the primary WAN connection :).
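The steps above can be sketched as a small shell script. This is an illustration under stated assumptions, not the script used in this post: the LAN interface name br0, the /24 prefix, the ping options and all function names are hypothetical, and it simply removes 192.168.1.250 while the gateway is down instead of swapping it for another address. 10.45.3.134 is the monitored ISP hop from the captures.

```shell
#!/bin/sh
# Sketch only. Values marked "assumed" are not from the original post.
ISP_GW="${ISP_GW:-10.45.3.134}"            # ISP first hop seen in the captures
MONITOR_IP="${MONITOR_IP:-192.168.1.250}"  # internal IP the UTM monitors
LAN_IF="${LAN_IF:-br0}"                    # assumed LAN bridge name
PREFIX="${PREFIX:-24}"                     # assumed LAN prefix length
INTERVAL="${INTERVAL:-30}"                 # seconds between checks

# Commands live in variables so they can be swapped out for a dry run.
PING_CMD="${PING_CMD:-ping -c 3 -W 2}"     # assumed ping options
IP_CMD="${IP_CMD:-ip}"

# Succeeds when the ISP gateway answers.
gateway_up() {
    $PING_CMD "$ISP_GW" >/dev/null 2>&1
}

# Map (previous state, current state) to an action: add, del or none.
next_action() {
    _prev="$1"; _now="$2"
    if [ "$_prev" = "$_now" ]; then
        echo none
    elif [ "$_now" = "up" ]; then
        echo add
    else
        echo del
    fi
}

# Add or remove the monitored IP on the LAN interface; "none" does nothing.
apply_action() {
    case "$1" in
        add) $IP_CMD addr add "$MONITOR_IP/$PREFIX" dev "$LAN_IF" ;;
        del) $IP_CMD addr del "$MONITOR_IP/$PREFIX" dev "$LAN_IF" ;;
    esac
}

main_loop() {
    # On boot, assume the WAN is up and publish the monitored IP.
    state=up
    apply_action add
    while true; do
        if gateway_up; then now=up; else now=down; fi
        apply_action "$(next_action "$state" "$now")"
        state="$now"
        sleep "$INTERVAL"
    done
}

# Loop only when asked to, so the functions can be loaded without side effects.
if [ "${1:-}" = "run" ]; then
    main_loop
fi
```

Acting only on state changes avoids re-adding an address that is already present every 30 seconds. Setting IP_CMD="echo ip" prints the commands instead of running them, which is a cheap way to check the logic before putting it on the router.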
[Figure: Under “Interfaces > Uplink Balancing”, this is how it looks like with the …]
The Bash Script In Action
One of the most powerful features of DD-WRT is the ability to add custom scripts that are executed at startup
and can do virtually anything. I added this script to the “Custom scripts” section under Administration > Commands and called it from the “Startup script”
section.
#!/bin/sh
The Final Result
Now the UTM sees the primary interface as up and the interface connected to the 4G modem as
standby, which is exactly the intended behaviour.
[Figure: The ADSL connection is up and the 4G connection is standby in case of any issues with …]