While you may have some measures in place to allow for failures of your network and the servers within it, how much have you really thought about it? In many solutions something is overlooked or missed simply because of a common mindset of "We'll probably never need it". The idea of an IT solution being vulnerable to a complete outage because simple tasks were left incomplete or ignored is very scary!
Every business has its own business-critical applications, but the general servers and applications which every business should consider backing up, making highly available and taking extra care of are listed below:
Domain controllers
DNS and DHCP servers
Exchange server (if a cloud email solution is not in use)
Virtual hosts (VMware ESX or Hyper-V)
The servers mentioned above should all be able to fail completely without causing a severe impact on your business operations. Just having two of each will be enough to allow for a failure as long as the configuration is correct, but what is the correct configuration?
Domain controllers
Domain controllers should be configured to act as a single entity, which is very achievable. Active Directory data is replicated between all of the domain controllers in a domain, which allows your business to run as normal if one were to fail.
DNS and DHCP Server
DHCP is a well-known feature of the Windows Server OS. It can be configured across multiple servers, and placing the service on a domain controller is usually a good idea. If you place this service on both domain controllers, clients will still receive an IP address in the event of a failure. There are a few configurations which you have to replicate manually, but the critical operations of DHCP would continue to function on your network in the event of a failure. DNS services follow the same principles as DHCP and can be configured in the same way.
Exchange server
Microsoft Exchange allows for high availability in many ways, and a trained specialist should always configure a highly available Exchange environment. Databases can be replicated across multiple servers so that, if one server fails, you can activate the database on another server and your mail will continue to flow. These days I would personally recommend a hybrid Office 365 solution or a full Office 365 solution, as the service is extremely reliable, takes up a lot less of your IT team's time and can be cheaper to use in the long run.
Virtual hosts
Now this is a big one. Some people think that just having two of each server is enough and that they have thought of everything, but this is not the case! You may have two virtual servers which are domain controllers, DHCP servers and DNS servers, but what if the physical host they both run on fails? This is where you need to consider placing the secondary servers on a separate, secondary host which has enough resources to take on the load of the primary host in the event of a failure, and vice versa. In my experience VMware performs a lot better than Hyper-V for high availability, and I would always recommend VMware over Hyper-V.
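The "enough resource to take on the load" requirement is easy to sanity-check on paper before a failure forces the question. Below is a minimal sketch, assuming each host is described by its core count, its RAM and the virtual machines it normally runs. The host names, VM names and sizes are made up for illustration, and the check assumes no CPU or memory overcommit, which your hypervisor may well allow.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_cores: int
    ram_gb: int
    # Each VM is (name, vCPUs, RAM in GB); the figures are illustrative.
    vms: list = field(default_factory=list)

    def load(self):
        """Return the (vCPU, RAM) total of the VMs normally placed on this host."""
        return (sum(v[1] for v in self.vms), sum(v[2] for v in self.vms))


def can_absorb(survivor: Host, failed: Host) -> bool:
    """True if `survivor` can run its own VMs plus those of `failed`.

    Assumes no overcommit: one vCPU needs one core, 1 GB needs 1 GB.
    """
    own_cpu, own_ram = survivor.load()
    extra_cpu, extra_ram = failed.load()
    return (own_cpu + extra_cpu <= survivor.cpu_cores
            and own_ram + extra_ram <= survivor.ram_gb)


def pair_is_redundant(a: Host, b: Host) -> bool:
    """Both directions must hold: a covers b, and b covers a."""
    return can_absorb(a, b) and can_absorb(b, a)


host1 = Host("host1", cpu_cores=16, ram_gb=128,
             vms=[("dc1", 2, 8), ("exch1", 8, 64)])
host2 = Host("host2", cpu_cores=16, ram_gb=64,
             vms=[("dc2", 2, 8), ("exch2", 4, 32)])

print(pair_is_redundant(host1, host2))  # False: host2 lacks the RAM for host1's VMs
```

In this example host1 could carry everything, but host2 could not, so the pair only protects you against one of the two possible failures.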
Now that we have covered the generic critical servers, what about your network? In order to have a fully redundant solution, you need a robust network that can handle equipment outages at any time. Here are the key factors to think about when designing and implementing a redundant network solution:
HA Firewall cluster
Switch stacks
Port channels
DHCP (if this service is not on a server)
Multiple internet circuits
In order for the above to work effectively, you need to have the correct configurations in place, otherwise your network will not perform as expected even though you have the correct equipment. Here is a breakdown of what you should consider:
HA Firewall cluster
Creating a firewall cluster can be a very complex task depending on the manufacturer and the model you are using. I would always recommend a well-known brand such as Cisco or WatchGuard, both of which have very reliable HA solutions available. The firewall should have rules in place so that traffic still flows after a failure. This means the rules should allow traffic through the secondary firewall exactly as they do through the primary, which may seem like a simple task but can be very complicated. With different private and public interfaces, you will need multiple public IP addresses or internet circuits to allow for such configurations. HSRP (Cisco's Hot Standby Router Protocol, with VRRP as the open-standard equivalent) is a very good feature which you should consider using, as it allows any of your publicly accessible services to remain accessible in the event of a hardware failure. The reason for this is that your public DNS records will point to the one public IP address used for a specific service, and if this IP is able to "float" across both firewalls then it will always be accessible. The IP is just the start, though, because behind that IP are multiple firewall rules to NAT traffic and to allow or deny traffic. As long as the secondary firewall's private/LAN interface is given the same access as the primary firewall's private/LAN interface, you should not see an issue.
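One way to confirm that the secondary firewall really does allow what the primary allows is to export both rule sets and compare them. The sketch below assumes the rules have already been exported into simple records. The `Rule` fields, the addresses and the `rule_drift` helper are hypothetical and do not match any vendor's export format. It also ignores rule order, which matters on a real firewall.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    action: str        # "allow" or "deny"
    protocol: str      # "tcp" or "udp"
    public_ip: str     # the floating IP the service is published on
    port: int
    nat_target: str    # internal host the traffic is translated to


def rule_drift(primary, secondary):
    """Return (missing_on_secondary, extra_on_secondary) as sorted lists."""
    p, s = set(primary), set(secondary)
    return sorted(p - s), sorted(s - p)


# Illustrative rule sets using documentation-range addresses.
primary_rules = [
    Rule("allow", "tcp", "203.0.113.10", 443, "10.0.0.20"),
    Rule("allow", "tcp", "203.0.113.10", 25, "10.0.0.30"),
]
secondary_rules = [
    Rule("allow", "tcp", "203.0.113.10", 443, "10.0.0.20"),
]

missing, extra = rule_drift(primary_rules, secondary_rules)
for rule in missing:
    print("missing on secondary:", rule)
```

Here the mail rule on port 25 exists only on the primary, so inbound mail would stop after a failover even though the floating IP moved across correctly.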
Switch stacks
Switch stacks are an amazing feature which allows multiple physical switches to become a single entity. This means that if one switch in the stack were to fail, you would still have access to the network. Unfortunately there will always be a single point of failure for end users unless they have two network cables going to their PC or have the option of both wired and wireless connections. The reason for this is that a single cable goes to a single interface on one switch, and even though that switch is part of a stack, if it goes down the end user's PC will fail to connect to the network. Wireless is a very good backup method for accessing the network but, strangely, it is not often configured in such a way. Servers, more often than not, have two network interfaces which you can configure as a bond/team and then connect to separate switches in the stack, which allows for redundancy. Overall, when configuring switch stacks, you have to remember that a stack is useless for redundancy unless each device is connected to separate switches within the stack.
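A cabling record makes it simple to spot devices that would drop off the network if one stack member died. This is a minimal sketch, assuming you keep a mapping of each device to the stack member numbers its interfaces are patched into. The device names and the `single_switch_devices` helper are invented for the example.

```python
def single_switch_devices(connections):
    """Return devices whose links all terminate on one stack member.

    `connections` maps a device name to the list of stack member
    numbers its network interfaces are cabled to.
    """
    return sorted(
        device
        for device, members in connections.items()
        if len(set(members)) < 2
    )


cabling = {
    "dc1": [1, 2],         # teamed NICs split across members 1 and 2
    "exch1": [1, 1],       # teamed NICs, but both on member 1
    "reception-pc": [2],   # single cable
}

print(single_switch_devices(cabling))  # ['exch1', 'reception-pc']
```

Note that `exch1` has a team of two interfaces and still fails the check, because both cables land on the same physical switch.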
Port channels
Also known as EtherChannels and trunked ports, port channels allow multiple interfaces on a switch, firewall or router to act as a single interface. With a port channel configured, one interface can go down and there will still be a connection between the two devices. These are absolutely critical for links between firewalls and core switches, as well as for links between all switches on the network. The reason for this is that if you only have a single cable/connection on the only link out to the internet, then all it takes is for that link to fail to cause a total outage.
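Finding the links that still rely on a single cable is a quick counting exercise. The following sketch assumes you have a list of physical cables, each recorded as the pair of devices it joins. The device names and the `single_link_pairs` helper are hypothetical. It only counts cables and cannot tell whether two parallel cables have actually been bundled into a port channel.

```python
from collections import Counter


def single_link_pairs(cables):
    """Return device pairs joined by exactly one physical cable."""
    # frozenset makes ("fw1", "core-sw") and ("core-sw", "fw1") the same pair.
    counts = Counter(frozenset(cable) for cable in cables)
    return sorted(tuple(sorted(pair)) for pair, n in counts.items() if n == 1)


cables = [
    ("fw1", "core-sw"), ("core-sw", "fw1"),   # two cables, same pair
    ("core-sw", "access-sw1"),                # only one cable
    ("core-sw", "access-sw2"), ("core-sw", "access-sw2"),
]

print(single_link_pairs(cables))  # [('access-sw1', 'core-sw')]
```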
DHCP
As discussed earlier in this article, DHCP is required for clients to connect to the network, and it can be configured on networking equipment such as firewalls and switches. If you have a DHCP server configured on your network equipment, then you need to be sure to have a secondary DHCP server configured which replicates the same DHCP pools and exclusions. If you do not have this, then a single piece of equipment failing could take your entire network offline.
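To check that a secondary DHCP server really does mirror the primary, you can work out the addresses each one is able to lease and compare the two sets. This is a rough sketch using Python's standard `ipaddress` module. The pool boundaries, the excluded address and the `leasable` helper are illustrative values, not taken from any real configuration.

```python
import ipaddress


def leasable(start, end, exclusions=()):
    """Return the set of addresses a pool can hand out."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    excluded = {ipaddress.IPv4Address(ip) for ip in exclusions}
    pool = {ipaddress.IPv4Address(n) for n in range(first, last + 1)}
    return pool - excluded


# The primary excludes .150 (say, a printer with a static address);
# the secondary was set up without that exclusion.
primary = leasable("192.168.1.100", "192.168.1.200", ["192.168.1.150"])
secondary = leasable("192.168.1.100", "192.168.1.200")

# Addresses the secondary would lease that the primary is told to avoid.
print(sorted(secondary - primary))  # [IPv4Address('192.168.1.150')]
```

A non-empty result means the secondary could hand out an address that is already in static use, which is exactly the kind of problem that only shows up after a failover.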
Multiple internet circuits
You should have multiple internet circuits going into the building in which you are situated and the building where your infrastructure is situated. If one line were to fail, you would not lose your connection to the internet, provided the correct failover configuration was in place internally. To make sure your configuration allows for an internet line to fail, consider HSRP or, if you are unable to use HSRP, at least have a NAT rule for the failover public IP.
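The failover decision itself is simple: use the most preferred circuit that is currently up. Here is a small sketch of that logic, assuming each circuit has a priority and that something else (a ping or a gateway check, not shown here) reports whether it is healthy. The circuit names and the `pick_circuit` helper are made up for illustration and are not how any particular firewall implements it.

```python
def pick_circuit(circuits, health):
    """Return the name of the highest-priority healthy circuit, or None.

    `circuits` is a list of (name, priority) with lower numbers preferred;
    `health` maps a circuit name to True (up) or False (down).
    """
    for name, _priority in sorted(circuits, key=lambda c: c[1]):
        if health.get(name, False):
            return name
    return None


circuits = [("fibre-primary", 1), ("dsl-backup", 2)]

print(pick_circuit(circuits, {"fibre-primary": True, "dsl-backup": True}))
# fibre-primary
print(pick_circuit(circuits, {"fibre-primary": False, "dsl-backup": True}))
# dsl-backup
```

If the function ever returns `None`, every circuit is down at once, which usually means the circuits share something they should not, such as the same provider or the same duct into the building.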
Overall, you need to make sure that each device has a backup for the services it provides and that every device is configured in a redundant manner. There are cheap and expensive ways of doing this, with the expensive option usually being the one where little disruption is seen when a device fails. I would always recommend the more costly option.