High availability Internet connectivity on a budget

This post was originally written back in 2012 in my previous job. I’ve reposted it here in case it’s of any use in the future.

How can you improve Internet connectivity for an office on a budget?

This was a problem we had recently for a customer of ours. The customer did not have a budget for a highly available Internet connection from an ISP. Their existing service was a standard business/consumer cable connection that met all of their needs most of the time and cost about as much as they felt they needed to pay. However, they needed to insure against the occasional outage that is not uncommon for such a service (one or two hours a month, say). Spending thousands of pounds was not an option.

The service did indeed fail every few weeks, and the provider had been unable to improve its reliability, so the choice remained: tolerate the outages or look for a more reliable alternative provider.

However, the customer already had a Cisco 1811 ISR router (circa £600), which would be able to provide route failover if only there were a second Internet connection.

The solution we offered and implemented was to install a second low-cost Internet connection from another supplier, and to implement IP SLAs to handle failover. The existing router already had all of the required features, and the setup needs only static routes, with no routing protocols.

[Figure: two ISPs connected to one router (the planned configuration)]

The following post is very Cisco oriented, but similar features may be available with alternative vendors’ equipment.

Basic configuration

The cable connection was the faster of the two, so it would be the active connection. The DSL connection would be redundant and only come into use when the cable connection failed.

The DSL connection was tested and then connected to a free Layer 3 port on the router. It was added to the routing table thus:
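A minimal sketch of what such static routes might look like. The interface roles (FastEthernet0 for cable, FastEthernet1 for DSL), the next-hop addresses (taken from the documentation ranges) and the distance value of 10 are all assumptions for illustration, not the customer's actual values:

```
! Assumption: FastEthernet0 = cable (primary), next hop 192.0.2.1
! Assumption: FastEthernet1 = DSL (backup), next hop 198.51.100.1
!
! Primary default route, using the default distance of 1
ip route 0.0.0.0 0.0.0.0 FastEthernet0 192.0.2.1
!
! Backup default route: the trailing 10 is a higher (less preferred)
! value, so this route is only installed when the primary is withdrawn
ip route 0.0.0.0 0.0.0.0 FastEthernet1 198.51.100.1 10
```

Strictly speaking, the trailing number on an IOS static route is the administrative distance; it plays the role of the "metric" discussed here.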

The lower metric for FastEthernet0 means that all traffic will be routed out of that interface unless it is down (for example, administratively shut down). However, if the interface is up and the ISP has a failure upstream, then the connection will not automatically fail over. More on that later.

Address translation and route maps

Outgoing connections need to be translated to the correct public IP address depending on which route is taken. (This is in contrast to higher availability solutions where the IP addresses might be rerouted.) The existing NAT rules were simply duplicated for the new connection. Here, using route maps, the connection is translated to the correct IP address depending on which interface it leaves by:
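As a rough illustration of the route-map approach, the following sketch assumes an inside network of 192.168.1.0/24 on Vlan1, overload (PAT) translation to each outside interface's own address, and hypothetical route-map names NAT-CABLE and NAT-DSL. The customer's real NAT rules are not reproduced here and may have differed:

```
! Assumption: Vlan1 is the inside LAN, 192.168.1.0/24
interface Vlan1
 ip nat inside
interface FastEthernet0
 ip nat outside
interface FastEthernet1
 ip nat outside
!
! Traffic eligible for translation
access-list 100 permit ip 192.168.1.0 0.0.0.255 any
!
! Match traffic leaving via the cable interface
route-map NAT-CABLE permit 10
 match ip address 100
 match interface FastEthernet0
!
! Match traffic leaving via the DSL interface
route-map NAT-DSL permit 10
 match ip address 100
 match interface FastEthernet1
!
! One translation rule per outside interface
ip nat inside source route-map NAT-CABLE interface FastEthernet0 overload
ip nat inside source route-map NAT-DSL interface FastEthernet1 overload
```

The `match interface` clause is what ties each translation rule to the interface chosen by the routing table, so the source address always belongs to the ISP that carries the packet.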

At this stage, the failover can be tested by manually shutting down FastEthernet0. (It may also be necessary to clear the NAT translation table.) If all is correct, Internet connectivity should remain available.
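The manual test could look something like the sketch below, assuming FastEthernet0 is the primary (cable) interface. Run it from a console or from the inside network, not over the connection being shut down:

```
! Simulate a failure of the primary connection
configure terminal
 interface FastEthernet0
  shutdown
 end
!
! Flush translations that still point at the old public address
clear ip nat translation *
!
! ...test connectivity from a LAN host, then restore the interface
configure terminal
 interface FastEthernet0
  no shutdown
 end
```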

IP SLAs

The final step to providing failover is remarkably simple.

First, a suitable upstream IP address to monitor for availability needs to be found for each connection. Your ISPs have probably provided equipment that serves as the next hop. However, monitoring that equipment is not suitable, because it will remain available when the connection has failed further upstream.

Perform a traceroute over each connection to determine a suitable test candidate. Here, upstream routers have been identified that can be monitored for availability.

[Figure: service provider routers identified for monitoring, found by running traceroute over each connection] *

Now, SLA rules can be defined to monitor the two connections.
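A sketch of what such a configuration might look like, using the timings described in this section. The monitored addresses 203.0.113.1 and 203.0.113.129 are placeholders from a documentation range, and the SLA and track numbers are arbitrary. The syntax shown is that of newer IOS releases; older 12.4 images use different keywords (for example `ip sla monitor` and `track 1 rtr 1 reachability`), so check your own release:

```
! Assumption: 203.0.113.1 is the monitored router on the cable path,
! 203.0.113.129 the monitored router on the DSL path
ip sla 1
 icmp-echo 203.0.113.1 source-interface FastEthernet0
 ! threshold is lowered first, as it must not exceed the timeout
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 1 life forever start-time now
!
ip sla 2
 icmp-echo 203.0.113.129 source-interface FastEthernet1
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 2 life forever start-time now
!
! Track objects: go down after 15 s of failure, come back after 120 s
track 1 ip sla 1 reachability
 delay down 15 up 120
!
track 2 ip sla 2 reachability
 delay down 15 up 120
```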

Above, two separate SLAs have been configured. Each independently pings its upstream router every 5 seconds, and notes a problem if there is no response within 1 second (1000ms). However, we want to avoid invoking a failover if the disruption is short, so the connection must be down for 15 seconds before a failover is triggered. We also want to avoid flapping (if the connection is going down every minute, bringing it back each time will only cause frustration), so a failed state will only end after 120 seconds without a timeout.

You can now check the current status with the ‘show track’ command. The command should show ‘Reachability is Up’ for each connection. To put the rules into effect, the routing table must be modified to use the rules. In the following, the routes have been altered to reference the track objects, and so a failure will cause the route entry to be disabled.
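For illustration, tracked default routes might look like this. The next hops 192.0.2.1 and 198.51.100.1, the distance of 10 and the track numbers 1 and 2 are assumed placeholder values:

```
! Assumption: track 1 follows the cable path, track 2 the DSL path
!
! Primary default route, withdrawn while track 1 is down
ip route 0.0.0.0 0.0.0.0 FastEthernet0 192.0.2.1 track 1
!
! Backup default route with a higher distance, withdrawn while track 2 is down
ip route 0.0.0.0 0.0.0.0 FastEthernet1 198.51.100.1 10 track 2
```

When track 1 goes down, the primary route is removed and the backup route, previously hidden by its higher distance, takes over.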

One more thing…

Once a failover does occur, the route is effectively gone from the routing table. That applies equally for traffic originating from the router as it does for traffic going across it. That includes the monitoring traffic. To allow the router to detect the resolution of a service outage after a failover, explicit routes need to be added to the routing table for each monitored router:
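A sketch of such host routes, again with placeholder addresses (203.0.113.1 and 203.0.113.129 as the monitored routers, 192.0.2.1 and 198.51.100.1 as the respective next hops):

```
! Always reach the cable-side monitored router via the cable connection
ip route 203.0.113.1 255.255.255.255 FastEthernet0 192.0.2.1
!
! Always reach the DSL-side monitored router via the DSL connection
ip route 203.0.113.129 255.255.255.255 FastEthernet1 198.51.100.1
```

Because these routes are not tied to a track object, the probes keep using their own connection even while its default route is withdrawn, which lets the track object come back up once the service recovers.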

You now have a simple failover between two basic Internet connections using standard features available in most Cisco IOS routers, while avoiding the use of routing protocols and expensive availability solutions from service providers.

Results

So far, in the last two months, the primary connection (the faster and preferred one, but unfortunately the less reliable) has been down on three occasions, for an average of 3 hours each time. The improvements have already paid off.

In future articles I will elaborate on expanding the configuration to include load-balancing traffic to make better use of the redundant connection, and how incoming traffic for services such as web and email can continue to be served during a service provider failure.

* There are caveats with this post’s selection of router to monitor. If the ISP has a failure somewhere further upstream then your monitoring will fail to notice it. Alternatively, if the ISP reorganises the network and the router you were monitoring is removed, the monitoring will incorrectly believe the connection is down. These problems must be considered on a per connection basis and may need to be solved with the assistance of the particular service providers involved.

Emulating a Cisco ASA in GNS3

I originally wrote a post with this title for my last employer’s website 2 years ago. It was pretty popular for some reason (perhaps because information about ASA emulation was a lot less common than it is now), so I decided to revisit it and update it if required.

Back at the time, I was working on an upgrade of a pair of critical Cisco ASA firewalls from version 8.2 to a version greater than 8.3; this is a major upgrade that changes the commands for NAT significantly.

Cisco have incorporated a migration script into the upgrade process that attempts to convert the old 8.2 commands to 8.3 syntax. However, it’s not perfect and some configurations will just not migrate without intervention.

I initially attempted the upgrade on the standby ASA, but the automatically generated configuration it produced caused undesirable behaviour. I rolled the ASA back, but not before taking a copy of the configuration. As purchasing another ASA for lab testing was not an option, I loaded the bad configuration into an emulated ASA in GNS3. Through trial and error, I worked out the new quirks in the ASA configuration, corrected them, and solved the problem in the live environment.

Preparation

An excellent script by dmz at 7200emu.hacki.at (repack.v4.sh.gz) is necessary. This will take an ASA image and separate it into two files: a RAM disk and a kernel image. (Register and log in to be able to download it.)

You will also need the Cisco ASA 8.4(2) image (asa842-k8.bin), as that is the image that the repack script is designed for. I did not attempt to use or remodel the repack script for other ASA versions, so that’s an interesting challenge for another day.

On a Linux system (I’m using Linux Mint 17, which should be very similar in behaviour to Ubuntu 14.04), run the script against the image. (The script needs to be run as root to avoid errors from cpio, so run it at your own risk.)

Keep two of the files produced:

  • asa842-vmlinuz
  • asa842-initrd.gz

Since Linux is what I work in these days, I was mainly interested in getting GNS3 working there, but I could not get it to run without hitting an assertion failure. Since I had no such problems in Windows, and I will probably rarely or never need to emulate an ASA again, Windows will suffice.

Windows guide

GNS3 has had quite a few updates since 0.8.3. As of version 1.1, the installer includes WinPcap and Wireshark by default. I’ll assume that GNS3-1.1-all-in-one.exe has been installed with at least the default packages.

Once installed, open GNS3 and in Edit > Preferences > QEMU > QEMU VMs, create a New VM.

Name the VM whatever you want, but set the Type to ‘ASA 8.4(2)’. The default Qemu binary of qemu-system-x86_64w.exe is fine, as is the 1024MB of RAM. Choose the initrd and vmlinuz binary files created earlier, and then save the VM preferences.

The completed VM should look something like this:

[Figure: GNS3 1.1 QEMU VM configuration]

Now, having created the VM definition, simply drag an ASA device into a topology; you should now be able to start it, connect to the console, and make connections just like any IOS device…

[Figure: ASA successfully booted in GNS3 on Windows]

Linux guide

As mentioned above, I was unable to get a working setup in Linux, but I will add one here if I ever do.

(It’s possible that I was making problems for myself by not downloading the latest versions of GNS3 and Qemu. I prefer to use the distro packages wherever possible.)