One Byte at a Time: High Availability and Disaster Recovery Options for Dynamics 365 (CRM)

Planning the infrastructure for your Dynamics 365 environment is a critical step in the deployment process, especially when the software is mission critical to business needs and day-to-day operations. In scenarios like these, it is imperative to have high availability and even disaster recovery plans in place. So the question is: What options do you have when it comes to Dynamics 365 on premise? In this article, we will break it down from both an application (Dynamics 365) and database (SQL Server) perspective.

High Availability – Dynamics 365
For most companies, having high availability for their Dynamics 365 deployment is a must and the setup for this is similar to that of other web-based applications. Keep in mind that through this section we are strictly referring to Dynamcis 365 servers with the “Front End” or “Full” role – in other words, the servers that host the website in IIS. Having multiple “Back End”-only servers (those with the Async and Sandbox services) is also a good idea but those will natively load balance themselves if they are in the same deployment.

Network Load Balancing – Probably the most obvious method for high availability is running multiple Dynamics 365 servers in a load-balancing configuration. This will not only help with carrying the day-to-day load of traffic but also serve as the foundation for a good high availability option. If a server were to fail, just pop it out of the NLB pool. Load balancing comes in a few different flavors because of all the applications out there but mainly it comes down to two options – hardware and software.

Hardware Load Balancing involves the use of a third-party appliance such as F5 or NetScaler that sits in front of the Dynamics 365 servers to handle which server receives the user requests. While this is the preferred option as it does not create any additional processes for the servers to handle and usually provides a wider array of configuration options (e.g. monitoring, security, filtering, etc…), it can be more difficult and much more costly to implement.

Software Load Balancing on the other hand is relatively easy to setup and in the case of Windows NLB, is free, as you have already paid for your Windows Server licenses. Critics of software load balancing will contend that it is not “true” high-availability due to the lack of many key features found in hardware load balancing. The fact stands that Windows NLB is just Round Robin DNS so if a server were to go down, requests would still be sent to the faulted server until it is told otherwise.

As one can probably infer, this is simply a cost vs. ROI decision – which is the case with most IT situations. If the hardware option is being considered, make sure to reach out to your appliance vendor to ensure there are no known issues with it and Dynamics 365. It is also a good idea to inquire about documentation for setup. For more information about load balancing Dynamics 365 in general and the required configurations, check out this TechNet: https://technet.microsoft.com/en-us/library/hh699803.aspx.

Backup Server(s) – While this method may be a little more rigid, it is still effective and again, has a couple options. The meat and potatoes of this route is having a server to fall back to in the event that your primary Dynamics 365 server(s) fails and NLB has not been implemented.

VM Snapshots – If you are reading this, you most likely understand the concept of VM snapshots so we will not dive too heavily into this. The idea is to have automatic snapshots taken automatically on a set interval that the Dynamics 365 server can be reverted to or new server built from in the event of failure. Reverting the existing server is the easier option but also the more risky of the two. Spinning up a new server is safer but more involved – and requires the shutdown of the existing server as you will need to use the same host name and IP address.

Laying in wait – We have had some customers opt for this route as the return to uptime is a little quicker and/or VMs were not being used. Dynamics 365 deployments can obviously be comprised of multiple servers – but not all need to be active or enabled. The concept here is to build a secondary Dynamics 365 server as part of the deployment but leave it disabled until it is needed. If the primary server were to fail, all that would be required is enabling of the server via the Deployment Manager and changes to DNS (and firewall rules, if applicable).

High Availability – SQL Server

Now that we have covered high availability from an application perspective, what about the database? Don’t worry! Microsoft has that covered too!

SQL Server Failover Cluster – There is nothing new about this concept. Two SQL Servers are setup to use a shared disk, one server goes down and the other picks up the load. When installing Dynamics 365 just make sure to point connection strings to the cluster name and not a single node and you should not have any issues but be sure to test failover prior to going live. Just to note - although you can install Dynamics 365 to a SQL Server cluster configured for either active-active or active-passive clustering, the cluster will function in an active-passive manner. While clustering is tried and true, Microsoft decided to go a step further with...

SQL Server Always On – New to SQL Server 2012 (Enterprise Edition), Always On introduced the concept of Availability Groups. An Availability Group is simply a container for a set of databases that fail over together. For purposes of Dynamics 365 databases, this is important considering there are always at least two databases per deployment. With Always On, the databases in the Availability Groups are synchronized amongst all the configured replicas. We won’t get into the details of setting up Always On (there are tons of articles out there already) but be sure to reference this TechNet for how to configure Dynamics 365 with SQL Always On: https://technet.microsoft.com/en-us/library/jj822357.aspx

Disaster Recovery – Dynamics 365 and SQL Server

While having a single server fail is far more likely than seeing your entire data center swallowed into a pit only to be burnt up in the Earth’s core, it is a situation that should also be planned for! Unlike the sections regarding high availability, we need to think of a DR scenario in terms of the entire infrastructure – both Dynamics 365 and SQL Server – because if there is a disaster, chances are it is taking both so these plans go hand-in-hand.

Over the years, we have implemented CRM/Dynamics 365 disaster recovery plans for numerous customers and as a result, have learned (what we consider) the best option – which is what we will be covering. This is not to say there are no other viable methods (such as VM snapshot migrations or SQL Log Shipping), but we will keep focus on our preferred option. In a nutshell, we are going to have a separate, self-contained Dynamics 365 environment in the DR data center that has data synced from the production SQL servers in the primary site.

SQL Server

Let us start from a database perspective since it is a little more complex and setting up Dynamics 365 for DR will require this information. As previously touched on, SQL Always On will come in handy yet again and recall the concept of Availability Groups as this will play heavy into this strategy.

For DR scenarios with Dynamics 365, we will need two Availability Groups in SQL AlwaysOn – one for the MSCRM_CONFIG database to be synchronized between the two primary data center nodes and another to synchronize the Organization_MSCRM database between not only the two primary data center SQL Servers but also a third SQL Server in your DR data center. The reason for this will become clear as you keep reading.

Dynamics 365

Going into the Dynamics 365 server installation(s), the third SQL node should already be built and ready with SQL Always On. Unlike the installation of the Dynamics 365 servers in the primary data center, you do NOT want to set the SQL connection to use the Availability Group but rather just the SQL Server located in the DR data center. The reasoning behind this is the MSCRM_CONFIG database – this database is specific per deployment of Dynamics 365 and two cannot exist in the same SQL server/instance (nor can the name be changed). Remember that this Dynamics 365 DR deployment is to be isolated from the primary.

At this point, you may be asking yourself how this all comes together. When the primary data center goes down (mainly both primary SQL nodes), the organization database will be failed over to the third node in DR. Once that happens, launch the deployment manager on the Dynamics 365 DR Server and import the organization database into the deployment. That should take about 5 minutes and then Dynamics 365 will be back up and running.

Great! We have the servers all setup in DR and are ready for that giant pit to open up beneath the data center! Now what? RUN THROUGH A TEST DISASTER RECOVERY SCENARIO! This cannot be stressed enough – you do not want to wait for a real disaster to find some small configuration issue messed up the entire plan.

4 comments:

PranavOctober 10, 2018 at 2:35 AM
While using Always-On,
• What would happen to the inflight asynchronous plugins triggered using OrganizationServiceProxy?
• Will the context of the DB be available in the OrganizationServiceProxy or would it fail?
• Would the connections be stale or active?
• Do you know if CRM would automatically refresh the connections for OrganizationServiceProxy or would the requests get failed?
• Should the async service be restarted?
• Is the Plugin instance a Singleton setup?
kingsMay 12, 2019 at 1:47 PM
I finally found great post here.I will get back here. I just added your blog to my bookmark sites. thanks.Quality posts is the crucial to invite the visitors to visit the web page, that's what this web page is providing.
commercial
McRueOctober 29, 2020 at 4:54 AM
quick one. just trying to understand the configuration for the CRM server in the DR, will that be a completely new CRM configuration or will we re-use the MSCRM_Config database?

Thursday, June 29, 2017

High Availability and Disaster Recovery Options for Dynamics 365 (CRM)

4 comments: