I've posted a new article on the TechNet Wiki about disaster recovery planning. You can find the article here:
http://social.technet.microsoft.com/wiki/contents/articles/6404.aspx
Dynamics CRM 4.0 Disaster Recovery
When designing a CRM 4.0 disaster recovery solution, a few system
tweaks are needed depending on which method you use to restore a lost
CRM environment. These tweaks are not widely documented, so I am writing
this article to help admins understand the different approaches to DR
and the small tweaks the system needs. When planning disaster recovery,
the first thing to do is identify two important pieces of information:
(a) how quickly the business requires you to bring CRM back to a working
state, and (b) how much data loss is acceptable. These two pieces of
information are critical to designing a disaster recovery solution that
fits your business needs. They apply to all kinds of disaster recovery
systems and are called:
- Recovery Time Objective (RTO)
- Recovery Point Objective (RPO)
Recovery Time Objective
RTO refers to the total acceptable time to recover the system.
Recovery Point Objective
RPO refers to the acceptable amount of data loss if the system is lost.
The brief explanation above gives you an idea of what RPO and RTO
measure, both expressed in time. Together they define the starting point
of your disaster recovery solution. The business should set these values
and define how critical CRM is for the company. If 4 hours is set for
both RTO and RPO, you need a backup strategy that cannot lose more than
4 hours of data, and you must be able to recover the system within the
4-hour RTO.
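As a rough illustration of the 4-hour example above, the check can be written down in a few lines of Python. This is only a sketch: the function name and the sample figures (a 15-minute log backup interval, a 3-hour estimated restore) are assumptions for illustration, and it treats the backup interval as the worst-case data loss, ignoring copy and restore delays.

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_restore_time, rpo, rto):
    """Return (rpo_ok, rto_ok) for a proposed backup strategy.

    Assumption: worst-case data loss is roughly one backup interval.
    """
    rpo_ok = backup_interval <= rpo
    rto_ok = estimated_restore_time <= rto
    return rpo_ok, rto_ok


# Objectives from the example in the text: 4 hours for both RTO and RPO.
rpo = rto = timedelta(hours=4)

# Hypothetical figures, not taken from the article.
backup_interval = timedelta(minutes=15)
estimated_restore_time = timedelta(hours=3)

print(meets_objectives(backup_interval, estimated_restore_time, rpo, rto))
```

With these sample figures both objectives are met. Raising the backup interval above 4 hours would fail the RPO check even if the restore itself is fast.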
Disaster Recovery Scenarios
CRM also relies on DNS, database backups and other network components,
which you also need to account for in a disaster recovery solution.
Covering those components is out of the scope of this article; I'm only
covering the technical aspects of CRM. In a typical DR solution, you
have stand-by servers on a separate site, also known as fail-over
servers, specifically configured for CRM or any other application. I'm
considering the following servers:
- 1x Application Server
- 1x Platform Server
- 1x SQL server
In your live environment you have an enterprise role-based CRM
configuration with multiple Application and Platform servers. The
databases, MSCRM_CONFIG and COMPANY_MSCRM, are backed up regularly with
log shipping to comply with the 4-hour RPO. The following are the
possible scenarios to consider for disaster recovery after the complete
loss of the live CRM environment:
Restore Databases & Join servers to existing Organisation
Based on the Microsoft documentation, you can do a full restore of both
CRM databases to a fail-over SQL server, then run the CRM setup wizard
and, instead of creating a new environment, select the option to join an
existing one. This method is valid, although a manual tweak to the
database is needed to make it actually work. The problem with this
method is the time it takes to configure the servers with the IIS
prerequisites and other server roles (e.g. the Indexing Service), then
run the setup wizard while crossing your fingers that no installation
error occurs; if you don't update the installation setup files, the
installation will most likely fail.
The MSCRM_CONFIG database holds a SQL connection string that CRM
servers use to identify which SQL server to connect to. When you join
the servers to the organisation, they will try to connect to the server
hard-coded in this connection string, which in this case is the live
server that is down, even though they were joined through a working
fail-over SQL server. You can test this during a DR exercise: because
the live database server is still working, you will notice that after
joining the servers to the fail-over database server, data still gets
updated on the live server.
Before installing CRM and joining the servers to the new CRM
environment, I highly recommend that you manually change this SQL
connection string:
- Open SQL Server Management Studio
- Connect to the fail-over SQL server and expand the MSCRM_CONFIG database and its Tables folder
- Locate the Organization table, right-click it and select Edit Top 200 Rows
- The first column is ConnectionString; this is the field we need to edit
- Edit each entry and replace the server name or IP address in the Data Source value, for example:
Provider=SQLOLEDB;Data Source=SERVER_NAME;Initial Catalog=ArupSandBox;Integrated Security=SSPI
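The Data Source replacement described in the steps above can be sketched in Python to show exactly which part of the string changes. This is a minimal illustration, not a tool that touches the database: the function name and the server names LIVE_SQL and DR_SQL are hypothetical, and the simple split on ";" assumes no value in the string contains a quoted semicolon.

```python
def replace_data_source(connection_string: str, new_server: str) -> str:
    """Return the connection string with its Data Source value replaced.

    All other key/value pairs are kept exactly as they were.
    """
    parts = []
    for part in connection_string.split(";"):
        key, sep, _value = part.partition("=")
        # Match the key case-insensitively; keep the original key text.
        if sep and key.strip().lower() == "data source":
            part = f"{key}={new_server}"
        parts.append(part)
    return ";".join(parts)


# LIVE_SQL and DR_SQL are placeholder server names for illustration.
original = (
    "Provider=SQLOLEDB;Data Source=LIVE_SQL;"
    "Initial Catalog=ArupSandBox;Integrated Security=SSPI"
)
print(replace_data_source(original, "DR_SQL"))
# Provider=SQLOLEDB;Data Source=DR_SQL;Initial Catalog=ArupSandBox;Integrated Security=SSPI
```

Only the server name changes; the provider, catalog and security settings stay untouched, which is what you want to verify for every row you edit in the Organization table.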
This database change makes sure the servers joining this environment
also connect to the correct SQL server. You are now ready to install the
Application and Platform servers and join them to the new environment.
When this is complete, test CRM and confirm it is working; all
previously registered plug-ins will also be published and working as
normal. The second step is configuring the report server: the live
report server is also down, so reports are not working. On the
application server, open Deployment Manager, disable the organisation,
then right-click it and choose to edit the organisation. Specify the
fail-over SQL server running Reporting Services and click Next to
configure. When completed, this creates a new folder on the SRS server
and publishes all existing reports, including custom reports. However,
if you use custom iframes that point at specific reports outside the
4.0 folder, those reports will be lost unless you back up the
ReportServer database, so if you use custom iframes you need to include
the ReportServer database in your daily backups. If you do restore the
ReportServer database, make sure the data sources are correctly
configured.
If you have a heavily customized CRM environment, make sure you
restore, from backup or a network location, all the images and ISV
files associated with custom solutions. Environments configured to use
Kerberos authentication also require extra IIS configuration in order
to work correctly. Take note of all your manually added CRM registry
keys, as you will also want to restore them to the DR environment. One
example is DisableOutlookClient, which disables the CRM Outlook client
button in the CRM interface; you may not want an extra load of calls
asking why the Outlook client is not working after users have installed
it from the web interface.
The process is not as straightforward as you may have envisaged, and
the time needed to perform all the configuration may not comply with
the company RTO. It is important to mention that other components, e.g.
DNS, the database backup method and restore time, network load
balancers etc., have not been taken into consideration; if you have a
low RTO, it is important that you consider all the components CRM
depends on. The advantage of this method is that it does not force the
use of specific servers, giving you the flexibility to use any server
available in a DR scenario. The following is a summary of the steps:
- Restore Databases
- Configure servers with minimum pre-requisites (if not done already)
- Database Connection String manual update
- Run CRM setup wizard and join Application and Platform servers to the organisation
- Manually configure extra IIS or other components
- Restore any custom images or ISV solutions
- Restore any CRM web.config configuration and Registry keys
- Deployment Manager EDIT Report Server
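To judge whether the steps above fit inside the RTO, it can help to add up an estimated duration for each one. The sketch below does that in Python; every duration is a made-up placeholder, so replace them with timings measured during your own DR exercises.

```python
from datetime import timedelta

# Step names follow the list above; all durations are hypothetical.
STEPS = [
    ("Restore databases", timedelta(minutes=60)),
    ("Configure servers with minimum prerequisites", timedelta(minutes=45)),
    ("Update database connection string", timedelta(minutes=10)),
    ("Run CRM setup wizard and join servers", timedelta(minutes=60)),
    ("Configure extra IIS or other components", timedelta(minutes=30)),
    ("Restore custom images or ISV solutions", timedelta(minutes=15)),
    ("Restore web.config and registry keys", timedelta(minutes=15)),
    ("Edit report server in Deployment Manager", timedelta(minutes=20)),
]


def total_recovery_time(steps):
    """Sum the estimated duration of every step, assuming they run in sequence."""
    return sum((duration for _, duration in steps), timedelta())


def fits_rto(steps, rto):
    """Return True if the estimated total is within the RTO."""
    return total_recovery_time(steps) <= rto


print(total_recovery_time(STEPS))             # 4:15:00
print(fits_rto(STEPS, timedelta(hours=4)))    # False
```

With these placeholder numbers the total overshoots a 4-hour RTO by 15 minutes, which is the kind of result that would push you towards preparing more of the work in advance.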
Restore Databases onto a stand-by environment
With this solution you configure a fully working stand-by CRM
environment on your DR site. You can install a new CRM environment,
creating an organisation with the same name, or join the servers to an
existing restored database. Configure IIS, copy the ISV and image files,
restore registry keys and reports, confirm everything is working
correctly, set up log shipping for the databases and you are ready to go.
The main difference is that you need to allocate servers and
resources permanently even though you are not using them. The benefit is
a faster recovery, with many of the configuration and installation tasks
already done. You still need to edit the SQL connection string as
explained in the previous scenario, because the databases still need to
be restored from backup, and you again need to edit the report server
configuration. The following is a summary of the steps:
- Restore Databases
- Database Connection String manual update
- Deployment Manager EDIT Report Server
Conclusion
I hope this article has helped you in designing your disaster
recovery solution for CRM 4.0. For an advanced admin, the two solutions
described in this article are obvious; I believe the value to take from
it lies in the small details that help in understanding how CRM works
and which situations and configurations to consider in order to design
an appropriate disaster recovery solution.
For those with very low RTO and RPO times who need a more automated
solution, please look at PlateSpin or Double-Take; these are software
products that can make a copy of other servers in real time to a
different location.