Monday, 2 January 2012

Dynamics CRM 4.0 Disaster Recovery

I've posted a new article on the TechNet Wiki about disaster recovery planning. You can find the article here: http://social.technet.microsoft.com/wiki/contents/articles/6404.aspx

Dynamics CRM 4.0 Disaster Recovery

When designing a CRM 4.0 disaster recovery solution, a few system tweaks are needed depending on which method you use to restore a lost CRM environment. This is not widely documented, so I wrote this article to help admins understand the different approaches to DR and the small tweaks the system needs. When talking about disaster recovery, the first thing to do is identify two important pieces of information: (a) the time within which the business requires you to bring CRM back to a working state, and (b) how much data loss is acceptable. These two pieces of information are critical to designing a disaster recovery solution that fits your business needs. They apply to all kinds of disaster recovery systems and are called:
  1. Recovery Time Objective (RTO)
  2. Recovery Point Objective (RPO)
Recovery Time Objective
RTO refers to the total acceptable time to recover the system.
Recovery Point Objective
RPO refers to the acceptable amount of data loss if the system is lost.
The brief explanation above gives you an idea of what RPO and RTO measure, both expressed in time. Together they define the starting point of your disaster recovery solution. The business should set these values and define how critical CRM is for the company. If 4 hours is defined for both RTO and RPO, you need a backup strategy that cannot lose more than 4 hours of data, and you must be able to recover the system within the 4-hour RTO.
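As a rough illustration of the 4-hour example, the check below compares a backup interval and an estimated recovery time against the RPO and RTO. It is a minimal sketch in Python: the function name `meets_objectives`, the 15-minute log shipping interval and the 3-hour recovery estimate are made-up values for illustration, not figures from a real environment.

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_recovery, rpo, rto):
    """Return (rpo_ok, rto_ok) for a proposed DR design.

    Worst-case data loss is taken to be one full backup interval,
    which is a simplifying assumption (it ignores copy/restore lag).
    """
    rpo_ok = backup_interval <= rpo
    rto_ok = estimated_recovery <= rto
    return rpo_ok, rto_ok


# 4-hour RPO and RTO, as in the example in the text.
rpo = timedelta(hours=4)
rto = timedelta(hours=4)

result = meets_objectives(
    backup_interval=timedelta(minutes=15),  # assumed log shipping interval
    estimated_recovery=timedelta(hours=3),  # assumed time to rebuild CRM
    rpo=rpo,
    rto=rto,
)
print(result)
```

A design that backs up every 6 hours, or that takes 5 hours to rebuild, would fail the corresponding check, which is the signal to revisit the backup method or the recovery scenario.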

Disaster Recovery Scenarios
CRM also relies on DNS, database backups and other network components, which you also need to account for in a disaster recovery solution. Covering those components is outside the scope of this article; I'm only covering the technical aspects of CRM. In a typical DR solution you have stand-by servers on a separate site, also known as fail-over servers, which are specifically configured for CRM or any other application. I'm considering the following servers:
  • 1x Application Server
  • 1x Platform Server
  • 1x SQL server
In your live environment you have an enterprise role-based CRM configuration with multiple Application and Platform servers. The MSCRM_CONFIG and COMPANY_MSCRM databases are backed up regularly with log shipping to comply with the 4-hour RPO. The following are the possible scenarios to consider for disaster recovery after the complete loss of the live CRM environment:

Restore Databases & Join servers to existing Organisation
Based on Microsoft documentation, you can do a full restore of both CRM databases to a fail-over SQL server, run the CRM setup wizard and, instead of creating a new environment, select the option to join an existing environment. This method is valid, although a manual tweak to the database is needed to actually make it work. The problem with this method is the time it takes to configure the servers with the IIS pre-requisites and other server roles (e.g. Indexing Service), then run the setup wizard and cross your fingers that no installation error occurs; if you don't update the installation setup files, setup will most likely fail. The MSCRM_CONFIG database holds a SQL connection string which CRM servers use to identify which SQL server to connect to. When you join the servers to the organisation, they will try to connect to the server hard-coded in this connection string, which in this case is the live server that is down, even though the servers were joined to a working SQL server. You can test this during a DR exercise, when the live database server is still working: after joining the servers to the fail-over database server, you will notice that data gets updated on the live server.
Before installing CRM and joining the servers to the new CRM environment, I highly recommend that you manually change this SQL connection string:
  1. Open SQL Server Management Studio
  2. Connect to the fail-over SQL server and expand the MSCRM_CONFIG database and its Tables folder
  3. Locate the Organization table, right-click it and select Edit Top 200 Rows
  4. The first column is ConnectionString; this is the field we need to edit
  5. Edit each entry and replace the server name or IP address in the Data Source value, e.g. Provider=SQLOLEDB;Data Source=SERVER_NAME;Initial Catalog=ArupSandBox;Integrated Security=SSPI
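If there are many organisations, editing each row by hand is error-prone. The sketch below shows only the string manipulation from step 5, in Python. `replace_data_source` and `DR_SQL_SERVER` are hypothetical names used for illustration, and writing the result back to the Organization table is deliberately left out.

```python
def replace_data_source(connection_string, new_server):
    """Return the connection string with its Data Source value replaced.

    Assumes a simple key=value;key=value string like the one shown in
    step 5; quoted values containing ';' are not handled.
    """
    parts = []
    for part in connection_string.split(";"):
        key, sep, _old_value = part.partition("=")
        if sep and key.strip().lower() == "data source":
            part = f"{key}={new_server}"
        parts.append(part)
    return ";".join(parts)


old = ("Provider=SQLOLEDB;Data Source=SERVER_NAME;"
       "Initial Catalog=ArupSandBox;Integrated Security=SSPI")
new = replace_data_source(old, "DR_SQL_SERVER")  # hypothetical fail-over server name
print(new)
# Provider=SQLOLEDB;Data Source=DR_SQL_SERVER;Initial Catalog=ArupSandBox;Integrated Security=SSPI
```

Only the Data Source value changes; the provider, catalog and security settings are passed through untouched, which matches what the manual edit is meant to do.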
This database change makes sure the servers joining this environment also connect to the correct SQL server. You are now ready to install the Application and Platform servers and join them to the new environment. When that is complete, test CRM and confirm it is working; all previously registered plugins will also be published and working as normal. The second step is configuring the report server: the live report server is also down, so reports are not working. On the application server, open Deployment Manager, disable the organisation, then right-click it and choose to edit the organisation. You can specify the fail-over SQL server with Reporting Services and click Next to configure it. When completed, this creates a new folder on the SRS server and publishes all existing reports, including custom reports. However, if you use custom iframes that point at specific reports outside the 4.0 folder, those reports will be lost unless you back up the ReportServer database, so in that case include the ReportServer database in your daily backups. If you do restore the ReportServer database, make sure the data sources are correctly configured.
If you have a heavily customized CRM environment, make sure you restore, from backup or a network location, all the images and ISV files associated with custom solutions. Environments configured to use Kerberos authentication also require extra IIS configuration in order to work correctly. Take note of all your manually added CRM registry keys, as these are also something you want to restore to the DR environment. One example is DisableOutlookClient, which disables the CRM Outlook client button in the CRM interface; you may not want the extra load of calls asking why the Outlook client is not working after users have installed it from the web interface.
The process is not as straightforward as you may have envisaged, and the time needed to perform all the configuration may not comply with the company RTO. It is important to mention that other components (e.g. DNS, the database backup method and restore time, network load balancers) have not been taken into consideration; if you have a low RTO, it is important to consider all the components CRM depends on. The advantage of this method is that it does not force the use of specific servers, giving you the flexibility to use any server available in a DR scenario. The following is a summary of the steps:
  1. Restore Databases
  2. Configure servers with minimum pre-requisites (if not done already)
  3. Database Connection String manual update
  4. Run CRM setup wizard and join Application and Platform servers to the organisation
  5. Manually configure extra IIS or other components
  6. Restore any custom images or ISV solutions
  7. Restore any CRM web.config configuration and Registry keys
  8. Deployment manager EDIT Report Server
Restore Databases onto a stand-by environment
With this solution you configure a fully working stand-by CRM environment on your DR site. You can install a new CRM environment, creating an organisation with the same name, or join the servers to an existing restored database. Configure IIS, copy the ISV and image files, restore registry keys and reports, confirm everything is working correctly, set up log shipping for the databases and you are ready to go.
The main difference is that you need to allocate servers and resources permanently even though you are not using them. The benefit is a faster recovery, with many of the configuration and installation tasks already done. You still need to edit the SQL connection string as explained in the previous scenario, because the databases still need to be restored from backup, and you again need to edit the report server configuration. The following is a summary of the steps:
  1. Restore Databases
  2. Database Connection String manual update
  3. Deployment Manager EDIT Report Server

Conclusion
I hope this article has helped you design your disaster recovery solution for CRM 4.0. For an advanced admin the two solutions described in this article are obvious; I believe the value of the article lies in the small details, which help in understanding how CRM works and which situations and configurations to consider in order to design an appropriate disaster recovery solution.
For those with very low RTO and RPO times who need a more automated solution, please look at PlateSpin or Double-Take; these are software products that can copy servers in real time to a different location.

3 comments:

  1. This is a good post. I have been looking for a long while for some Dynamics CRM 2011 DR guidance.

    Does this also apply to Dynamics CRM 2011?

  2. Hi Gee, Thanks.
    I haven't done a full test yet on CRM 2011, but I believe the concept is the same.

  3. Hello we have read your post and i think it's more relevant for others.Thanks for the sharing this website. it is very useful professional knowledge.
    Disaster Restoration Web Design
