Building Microsoft MEAP: Scaling Out SQL Server 2012

Super Scale

In this third article on building Microsoft MEAP, I’ll show you how to shard your SQL Server 2012 database using replication to create a shared-nothing data architecture that supports Internet-scale mobile solutions.

In the previous article, I discussed Gartner’s Enterprise Application Integration (EAI) Tools critical capability for building a Mobile Enterprise Application Platform (MEAP) using Microsoft technologies.  Tying into multiple backend packages and data sources is an essential CIO requirement for moving enterprise data out to mobile devices, and SQL Server Integration Services (SSIS) performs this task beautifully.

Like virtually every enterprise and Internet-based application or website, the database is the heart of the system.  This is also true for MEAP systems.  Don’t be fooled by MEAP vendors that use clever marketing terms to cover up this fact.  The Mobile Middleware often associated with MEAP systems is a database and some kind of web/app server or HTTP listener.  Staging tables in the database are needed to cache data moving between devices and backend systems.  Since mobile devices run on the Internet via mobile data networks, web/app servers are needed to transmit data over HTTPS.  The problem with most on-premises and even some cloud-based MEAP solutions is that they can’t deliver Internet scalability.  I’m talking tens of thousands, hundreds of thousands, or even millions of devices.

When it comes to boosting performance and supporting more concurrent clients, you think of scaling up with beefier hardware and scaling out with more servers. Unfortunately, most systems I’ve observed around the world limit their scaling to load-balanced web/app servers pointed at a single database.  I know, hard to believe.

SQL Server + 2 IIS Servers

Obviously, this only takes you so far before you run out of gas.  Infrastructure Architects need to aim higher and learn from the scalability best practices of the world’s largest search engines, social networks, and e-commerce sites.  Guess what: if your favorite social network has a billion users, a single database and a bunch of load-balanced web/app servers are going to melt down in the first few nanoseconds of operation.  Instead, multiple layers of scale-out architectures are employed to support a large percentage of the global population, and the databases are no exception.  Since databases are usually responsible for around 90% of all system bottlenecks, I would venture to say scaling out your database is one of the most important things you can do.

Data replication technologies are used by the world’s largest sites to horizontally partition relational and NoSQL databases.  Google popularized the term “sharding” to refer to the shared-nothing database partitioning used by its Bigtable architecture.  Common sense tells you that even the world’s most powerful database running on the world’s biggest server will eventually hit a saturation point.  Replicating either complete or partial copies of your big database to tens of thousands of commodity servers that don’t share a common disk is the key to scaling out.  Sorry about the big SAN you just spent a few million dollars on.

The other nuance you see with these large, replicated systems is breaking up the reads and writes into different groups of servers.  Following the 80/20 rule, most clients on the Internet are SELECTing data while a much smaller group are uploading INSERTs, UPDATEs, and DELETEs.  It therefore follows that the bulk of the replicated database shards are read-only and load-balanced just like web servers.  These servers have a lower amount of replication overhead because data updates only come to them in one direction from the top of the hierarchy.  A second, smaller group of load-balanced database servers handles all the writes coming in from the Internet clients.  They track and merge those changes into the multi-master databases at the top of the hierarchy.

Hopefully, all this makes sense to you.  To put it simply, instead of a billion people trying to hit a single database, smaller fractions of a billion people are hitting multiple copies of that same database.  Additionally, these database shards are smaller and therefore faster because disk I/O is reduced since the ratio between memory and data on disk is improved.  Of course, it can get more granular than that.  Instead of replicating complete copies of your database, you could replicate each table down to its own individual database server to scale out even further.  If that’s not enough, you can replicate different ranges of table rows down to their own individual database servers so that a single table spans multiple servers.  In all cases, your app servers would maintain the intelligence to know which sharded database servers to communicate with in order to get the right answer for a mobile user.  It also means that you need to design your databases differently to avoid table JOINs across replicated server nodes.

Super Scale

While you’re probably well-versed in scaling out your stateless web/app servers using load-balancers, there’s one other ingredient that’s commonly used across the world’s largest systems: distributed caching.  While scaling out replicated database shards and web/app servers makes it possible to support a large chunk of the planet, it’s caching that takes this performance and scalability to the next level.  It’s great that the answer to your query is found on a nearby, replicated database shard.  But what if your answer was already in RAM and you didn’t have to query the database in the first place?  A tier of distributed caching servers holding terabytes of data in RAM is what helps your Facebook page load fast when you and a few hundred million of your best friends are all trying to access the site at the same time.

Now that you have a bite-sized primer on how the world’s largest systems scale and perform, you’re probably wondering if you can do the same thing to scale your MEAP solution using the Microsoft servers and technologies that you already own.

The answer is Yes!

While I understand that you might not need to support millions of users with your solution, the important thing to take away from this article is that with Microsoft MEAP you can.  As you can guess from the title of this article, I’m going to focus on scaling out SQL Server since this might be more unfamiliar to you than scaling out Internet Information Services (IIS) using the Network Load Balancing (NLB) feature of Windows Server.  Also, if you’re wondering about that cool distributed caching thing, we have that too in the form of the Windows Server AppFabric Cache.  Both our load-balancing and distributed cache technologies are included features of Windows Server.

Let’s jump into scaling SQL Server since it’s the heart of Microsoft MEAP.  From a scale-up perspective, here are a few things you can do with SQL Server Standard Edition:

  • It supports up to 64 GB of RAM.  Hey, RAM is cheap.  Buy all 64 GB and allow your database to run in memory as much as possible to avoid disk I/O (see the sketch after this list).
  • It supports up to 16 cores.  While you might use this many cores (or more) on the SQL Server at the top of your hierarchy, you won’t need this many for your shard servers.  Faster clock speeds and the largest possible L3 shared cache you can find are key.
  • It’s time to make the move from spinning disks to solid state drives (SSDs).  Upgrading to SATA SSDs can give you up to a 100x performance boost over their rotating counterparts.  Do your research when looking at different vendors.
  • Make sure the replicated data is moving as fast as possible between your servers by using 10 gigabit Ethernet network cards and switches.  Make sure to keep your network drivers updated as your vendor releases new ones.
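Putting that first bullet into practice, here’s a quick sketch of capping SQL Server’s memory so the OS keeps some headroom.  The 57 GB value is an illustrative choice for a 64 GB box, not an official recommendation:

    -- Let SQL Server use most of the RAM while leaving headroom for the OS.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 58368;  -- ~57 GB, illustrative
    RECONFIGURE;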

Let’s scale out!

The first thing to consider when scaling out any database is that the schema must be designed to support breaking the database apart.  In other words, think about how you might turn your relational database with referential integrity, foreign keys, and other constraints into something that looks more like a NoSQL database.  Denormalize to eliminate performance-killing JOINs.  Keep in mind that if you can’t build a database this way, or you have an existing database that you can’t change, you’ll need to use Peer-to-Peer Transactional Replication to make complete copies of your database to scale out.  Normally, I would prefer that you only use P2P or AlwaysOn to maintain a separate replica hierarchy in another data center.
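To make that concrete, here’s a minimal sketch of a denormalized, sync-friendly table.  Every name here is illustrative rather than part of the ContosoFruit schema:

    -- Order rows carry the customer and product details they need, so reads
    -- never JOIN across tables (or across replicated server nodes).  Note
    -- the deliberate absence of FOREIGN KEY constraints.
    CREATE TABLE dbo.Orders
    (
        OrderId      UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL DEFAULT NEWID()
                     CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
        CustomerId   UNIQUEIDENTIFIER NOT NULL,  -- reference, not a constraint
        CustomerName NVARCHAR(100)    NOT NULL,  -- denormalized from Customers
        ProductName  NVARCHAR(100)    NOT NULL,  -- denormalized from Products
        Quantity     INT              NOT NULL,
        OrderDate    DATETIME         NOT NULL
    );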

Those of you who know me are aware that I’ve been lucky enough to build many of the largest mobile systems in the world utilizing SQL Server Merge Replication.  I’ve also done my best to teach others what I’ve learned along the way.  This time around, I’m going to show you how to use this same replication technology to create a variety of database shards.  As you might imagine, I’ll use the ContosoFruit database of data aggregated via SSIS from the backend data sources from the last article.

ContosoFruit

Replication in SQL Server uses the concepts of Publishers to describe which database is making its data available and Subscribers to describe which databases are consuming replicas.  As you might imagine, the ContosoFruit database will be the Publisher.  That being said, since I will be creating 3 shards, one for each table, I won’t be publishing the entire database as a single entity.  You’ll have 3 Publications instead.  In order to create a new Publication based on this database, I need you to expand the Replication folder in the Object Explorer.

Customer Publication

Right click on the Local Publications folder and select New Publication to launch the New Publication Wizard.  Click Next.

New Publication Wizard

  • If your Distributor isn’t already configured, you’ll be taken to the Distributor dialog where you will select the first radio button to allow your SQL Server to act as its own Distributor.  Click Next.
  • If your SQL Server Agent isn’t already configured, you’ll be taken to the SQL Server Agent Start dialog where you should select the Yes radio button to start the Agent service automatically.  Click Next.
  • If you don’t yet have a folder to hold the initial database snapshots, you’ll be taken to the Snapshot Folder dialog.  Before entering a path in the Snapshot folder text box, create a folder on your local PC called Snapshot.  Share that folder as a network share that’s available to Everyone with Read/Write permissions.  Now go back to the Snapshot folder text box and enter \\MachineName\Snapshot and then click Next.
  • Click Next to move to the Publication Database dialog.
  • Select the ContosoFruit database and click Next.
  • In the Publication Type dialog, select Merge Publication and click Next.
  • In the Subscriber Types dialog, select SQL Server 2008 or later and click Next.
  • In the Articles dialog, expand the Tables tree view and check the Customers check box.  This means that the Customers table and all its columns will be replicated.  Check the Highlighted table is download-only check box.  This ensures that only changes made to the Publisher at the top of the hierarchy will be replicated down to read-only Subscribers.  Click Next.
  • In the Article Issues dialog you’re informed that a Uniqueidentifier column will be added to your table.  Merge replication uses this to track each row for changes. If your Primary Key is already a Uniqueidentifier, the system will use it instead of adding a new one.  Uniqueness is an important part of any data sync system. Click Next.
  • In the Filter Table Row dialog, I don’t want you to create any filters because I want this Customers shard to contain the complete table.  Click Next.
  • In the Snapshot Agent dialog, check both check boxes.  Click the Change button and in the New Job Schedule dialog, change the Recurs every: text box to 1 day(s) instead of 14.  Click OK and then click Next.
  • In the Agent Security dialog, click the Security Settings button.  In the Snapshot Agent Security dialog, select the second radio button to run it under the SQL Server Agent service account.  You won’t do this in production, but we’re doing it now for the sake of expedience.  Click OK and then click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, enter CustomerShard and click Finish.
  • In the Creating Publication dialog, if everything succeeds, click Close.
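If you’d rather script than click, the wizard’s work can be approximated with the replication stored procedures.  Here’s a rough, hedged sketch; the parameter choices are my assumptions for this walkthrough, not a dump of what the wizard actually generates:

    USE ContosoFruit;
    GO
    -- Create the Merge Publication for the Customers shard.
    EXEC sp_addmergepublication
        @publication = N'CustomerShard',
        @description = N'Read-only Customers shard',
        @sync_mode   = N'native',
        @retention   = 14;
    GO
    -- Create and schedule the Snapshot Agent for this Publication.
    EXEC sp_addpublication_snapshot
        @publication = N'CustomerShard';
    GO
    -- Publish Customers as an Article; @subscriber_upload_options = 2
    -- prohibits Subscriber changes (my reading of the wizard's
    -- download-only check box).
    EXEC sp_addmergearticle
        @publication   = N'CustomerShard',
        @article       = N'Customers',
        @source_owner  = N'dbo',
        @source_object = N'Customers',
        @subscriber_upload_options = 2;
    GO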

Sorry about all those tedious steps.  Keep in mind that the next two shard Publications will be easier to create, so let’s get to it.

Product Publication

Like before, expand the Replication folder in the Object Explorer, right click on the Local Publications folder, select New Publication to launch the New Publication Wizard and click Next.

  • In the Publication Database dialog, select the ContosoFruit database and click Next.
  • In the Publication Type dialog, select Merge Publication and click Next.
  • In the Subscriber Types dialog, select SQL Server 2008 or later and click Next.
  • In the Articles dialog, expand the Tables tree view and check the Products check box.  Check the Highlighted table is download-only check box.  Click Next.
  • In the Article Issues dialog you’re informed that a Uniqueidentifier column will be added to your table.  Click Next.
  • In the Filter Table Row dialog, I don’t want you to create any filters because I want this Products shard to contain the complete table.  Click Next.
  • In the Snapshot Agent dialog, check both check boxes.  Click the Change button and in the New Job Schedule dialog, change the Recurs every: text box to 1 day(s) instead of 14.  Click OK and then click Next.
  • In the Agent Security dialog, click the Security Settings button.  In the Snapshot Agent Security dialog, select the second radio button to run it under the SQL Server Agent service account.  Click OK and then click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, enter ProductShard and click Finish.
  • In the Creating Publication dialog, if everything succeeds, click Close.

Only one more shard to go.  This one will be different because it’s designed to support write operations coming in from mobile devices.

Order Publication

Expand the Replication folder in the Object Explorer, right click on the Local Publications folder, select New Publication to launch the New Publication Wizard and click Next.

  • In the Publication Database dialog, select the ContosoFruit database and click Next.
  • In the Publication Type dialog, select Merge Publication and click Next.
  • In the Subscriber Types dialog, select SQL Server 2008 or later and click Next.
  • In the Articles dialog, expand the Tables tree view and check the Orders check box.  Click Next.
  • In the Article Issues dialog you’re informed that a Uniqueidentifier column will be added to your table.  Click Next.
  • In the Filter Table Row dialog, click the Add button and select Add Filter.  In the Add Filter dialog, go to the Filter statement text box and add 1 = 0 to the end of the WHERE clause.  The filter should look like the following when you’re done:

Add Filter
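The screenshot isn’t reproduced here, but the completed Filter statement typically reads like the line below.  The wizard generates the SELECT shell; everything after WHERE is yours:

    SELECT <published_columns> FROM [dbo].[Orders] WHERE 1 = 0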

  • Using 1 = 0 as the table filter causes Replication to work in an upload-only manner.  When the Orders table is synchronized, only the empty shell of the table will be created in the Subscriber database.  Any new data added to it will be uploaded to ContosoFruit and then removed from the Subscriber database.  Click OK and then click Next.
  • In the Snapshot Agent dialog, check both check boxes.  Click the Change button and in the New Job Schedule dialog, change the Recurs every: text box to 1 day(s) instead of 14.  Click OK and then click Next.
  • In the Agent Security dialog, click the Security Settings button.  In the Snapshot Agent Security dialog, select the second radio button to run it under the SQL Server Agent service account.  Click OK and then click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, enter OrderShard and click Finish.
  • In the Creating Publication dialog, if everything succeeds, click Close.

You now have 3 Publications and you can have as many Subscription databases as you need to scale out.  In this example, I will just have you create 1 Subscription to match each Publication.

Customer Subscription

Expand the Replication folder in the Object Explorer, right click on the Local Subscriptions folder, select New Subscription to launch the New Subscription Wizard and click Next.

  • In the Publication dialog, select CustomerShard and click Next.
  • In the Merge Agent Location dialog, select the first radio button to run all agents at the Distributor.  Click Next.
  • In the Subscribers dialog, check the check box for the local SQL Server that you’re using.  Click the Subscription Database combo box and select New Database.  In the New Database dialog, enter Customer1 in the Database name text box and click OK.  Click Next.
  • In the Merge Agent Security dialog, click the ellipsis on the far right.  In the dialog, select the second radio button to run under the SQL Server Agent service account.  You won’t use this security option in production.  Click OK and click Next.
  • In the Synchronization Schedule dialog, click the Agent Schedule combo box and select Run Continuously.  This is the obvious choice since you’re creating a real-time, OLTP database.  Click Next.
  • In the Initialize Subscriptions dialog, stick with the default value of initializing immediately and click Next.
  • In the Subscription Type dialog, click the Subscription Type combo box and select Client.  This prevents new data from being added at the Subscriber and uploaded back to the Publisher.  Click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, click Finish.  If everything succeeds, click Close.

To verify that all your settings are correct and that everything is working, open your new Customer1 database, right click on the Customers table and Select Top 1000 Rows. The table should be filled with the same list of customers that you find in the ContosoFruit database.  The next test is to add a new row in the ContosoFruit Customers table, wait for several seconds, and then refresh the Customer1 Customers table.  The new row should appear and you now have your first read-only database shard based on SQL Server.
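If you prefer a query window to right-clicking, the same checks look roughly like this, assuming both databases live on your single test server and that Customers has a Name column (a hypothetical column for illustration):

    -- The Subscriber should mirror the Publisher.
    SELECT TOP 1000 * FROM Customer1.dbo.Customers;

    -- Add a row at the Publisher...
    INSERT INTO ContosoFruit.dbo.Customers (Name) VALUES (N'Test Customer');

    -- ...wait several seconds for the continuously running Merge Agent,
    -- then confirm the row arrived at the Subscriber.
    SELECT * FROM Customer1.dbo.Customers WHERE Name = N'Test Customer';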

Product Subscription

Expand the Replication folder in the Object Explorer, right click on the Local Subscriptions folder, select New Subscription to launch the New Subscription Wizard and click Next.

  • In the Publication dialog, select ProductShard and click Next.
  • In the Merge Agent Location dialog, select the first radio button to run all agents at the Distributor.  Click Next.
  • In the Subscribers dialog, check the check box for the local SQL Server that you’re using.  Click the Subscription Database combo box and select New Database.  In the New Database dialog, enter Product1 in the Database name text box and click OK.  Click Next.
  • In the Merge Agent Security dialog, click the ellipsis on the far right.  In the dialog, select the second radio button to run under the SQL Server Agent service account.  Click OK and click Next.
  • In the Synchronization Schedule dialog, click the Agent Schedule combo box and select Run Continuously.  Click Next.
  • In the Initialize Subscriptions dialog, stick with the default value of initializing immediately and click Next.
  • In the Subscription Type dialog, click the Subscription Type combo box and select Client.   Click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, click Finish.  If everything succeeds, click Close.

To verify that all your settings are correct and that everything is working, open your new Product1 database, right click on the Products table and Select Top 1000 Rows.  The table should be filled with the same list of products that you find in the ContosoFruit database.  The next test is to add a new row in the ContosoFruit Products table, wait for several seconds, and then refresh the Product1 Products table.  The new row should appear and you now have your second read-only database shard based on SQL Server.

Order Subscription

Expand the Replication folder in the Object Explorer, right click on the Local Subscriptions folder, select New Subscription to launch the New Subscription Wizard and click Next.

  • In the Publication dialog, select OrderShard and click Next.
  • In the Merge Agent Location dialog, select the first radio button to run all agents at the Distributor.  Click Next.
  • In the Subscribers dialog, check the check box for the local SQL Server that you’re using.  Click the Subscription Database combo box and select New Database.  In the New Database dialog, enter Order1 in the Database name text box and click OK.  Click Next.
  • In the Merge Agent Security dialog, click the ellipsis on the far right.  In the dialog, select the second radio button to run under the SQL Server Agent service account.  You won’t use this security option in production.  Click OK and click Next.
  • In the Synchronization Schedule dialog, click the Agent Schedule combo box and select Run Continuously.  This is the obvious choice since you’re creating a real-time, OLTP database.  Click Next.
  • In the Initialize Subscriptions dialog, stick with the default value of initializing immediately and click Next.
  • In the Subscription Type dialog, click the Subscription Type combo box and select Client.  This prevents new data from being added at the Subscriber and uploaded back to the Publisher.  Click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, click Finish.  If everything succeeds, click Close.

To verify that all your settings are correct and that everything is working, open your new Order1 database, right click on the Orders table and Select Top 1000 Rows.  The table should be empty.  The next test is to add a new row in the empty Order1 Orders table, wait for several seconds, and then refresh the ContosoFruit Orders table.  The new row should appear and you now have your writable database shard based on SQL Server.
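In query form, with hypothetical Orders columns, the upload-only behavior looks like this:

    -- The Subscriber's Orders table stays empty between syncs.
    SELECT COUNT(*) FROM Order1.dbo.Orders;

    -- Capture a new order at the Subscriber (column names are assumptions).
    INSERT INTO Order1.dbo.Orders (OrderId, CustomerId, Quantity)
    VALUES (NEWID(), NEWID(), 5);

    -- After the next sync, the row appears at the Publisher and is
    -- removed from the Subscriber.
    SELECT TOP 10 * FROM ContosoFruit.dbo.Orders;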

Congratulations!  You’ve scaled out one database into one writable database shard and two read-only shards as shown below:

Publications and Subscriptions

The ContosoFruit database will no longer have to bear the brunt of all your mobile devices retrieving Customers and Products while uploading new Orders.  ContosoFruit will only see 3 connections moving and merging data back and forth instead of thousands.

3 Pubs + 3 Subs

The web services I’ll be showing you how to create in the next article will point to the appropriate shards from the IIS app servers.  Keep in mind that in a production system, you’ll need to create at least 2 load-balanced SQL Servers for each shard in order to maintain high availability.

Now that you know how to shard complete tables to n database server nodes, you probably want to know how to shard at an even more granular level.  I’m talking about scaling ranges of table rows across multiple server nodes.  The example you always hear about is partitioning a Customer table with tens of millions of rows by the first letter of a customer’s last name.  For instance, node 1 gets customers A – I, node 2 gets J – R, and node 3 gets S – Z.

Customers A - Z

You can slice and dice this any way you want.  You could even have 26 separate nodes, one for every letter of the alphabet, if you need that level of scale.  Keep in mind that you won’t necessarily get an even distribution of table rows across nodes, since the “S” node will have dramatically more customers than the “Q” node.  Using a customer Id column to filter on might yield better results when it comes to numerically balancing the load.  Speaking of balancing, as the number of rows in a given table increases, you will find that some nodes start to have more rows than others.  From time to time, you’ll need to re-balance them.
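Expressed as Merge filter WHERE clauses, those two approaches might look like the sketches below; the column names are assumptions:

    -- By last name (uneven distribution):
    WHERE LastName LIKE '[A-I]%'   -- node 1
    WHERE LastName LIKE '[J-R]%'   -- node 2
    WHERE LastName LIKE '[S-Z]%'   -- node 3

    -- By customer Id (tends to balance better):
    WHERE Id % 3 = 0               -- node 1
    WHERE Id % 3 = 1               -- node 2
    WHERE Id % 3 = 2               -- node 3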

Luckily, Merge Replication Publications have the ability to perform row filtering which makes this more granular level of sharding pretty simple.  Keep in mind that you will only do this type of filtering for your download-only/read-only shards.  For this example, I’m going to create 2 Customer Publications that filter the rows based on the Id column in order to get 2 nodes with a roughly equal number of customers.

Customer (First Half) Publication

Expand the Replication folder in the Object Explorer, right click on the Local Publications folder, select New Publication to launch the New Publication Wizard and click Next.

  • In the Publication Database dialog, select the ContosoFruit database and click Next.
  • In the Publication Type dialog, select Merge Publication and click Next.
  • In the Subscriber Types dialog, select SQL Server 2008 or later and click Next.
  • In the Articles dialog, expand the Tables tree view and check the Customers check box.  Check the Highlighted table is download-only check box.  Click Next.
  • In the Article Issues dialog you’re informed that a Uniqueidentifier column will be added to your table.  Click Next.
  • In the Filter Table Row dialog, click the Add button and select Add Filter.   In the Add Filter dialog, go to the Filter statement text box and add Id <= (SELECT COUNT(*)/2 FROM [dbo].[Customers]) to the end of the WHERE clause.  The filter should look like the following when you’re done:

Add Filter
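As with the Orders filter earlier, the completed statement in the dialog typically reads:

    SELECT <published_columns> FROM [dbo].[Customers]
    WHERE Id <= (SELECT COUNT(*)/2 FROM [dbo].[Customers])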

  • The subquery in the WHERE clause calculates the total number of rows, divides them by 2, and then returns a list of customers whose Id is less than or equal to the midpoint of the list.  Click OK and then click Next.
  • In the Snapshot Agent dialog, check both check boxes.  Click the Change button and in the New Job Schedule dialog, change the Recurs every: text box to 1 day(s) instead of 14.  Click OK and then click Next.
  • In the Agent Security dialog, click the Security Settings button.  In the Snapshot Agent Security dialog, select the second radio button to run it under the SQL Server Agent service account.  Click OK and then click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, enter CustomerFirstHalfShard and click Finish.
  • In the Creating Publication dialog, if everything succeeds, click Close.

Customer (Second Half) Publication

Expand the Replication folder in the Object Explorer, right click on the Local Publications folder, select New Publication to launch the New Publication Wizard and click Next.

  • In the Publication Database dialog, select the ContosoFruit database and click Next.
  • In the Publication Type dialog, select Merge Publication and click Next.
  • In the Subscriber Types dialog, select SQL Server 2008 or later and click Next.
  • In the Articles dialog, expand the Tables tree view and check the Customers check box.  Check the Highlighted table is download-only check box.  Click Next.
  • In the Filter Table Row dialog, click the Add button and select Add Filter.   In the Add Filter dialog, go to the Filter statement text box and add Id > (SELECT COUNT(*)/2 FROM [dbo].[Customers]) to the end of the WHERE clause.  The filter should look like the following when you’re done:

Add Filter

  • The subquery in the WHERE clause calculates the total number of rows, divides them by 2, and then returns a list of customers whose Id is greater than the midpoint of the list.  Click OK and then click Next.
  • In the Snapshot Agent dialog, check both check boxes.  Click the Change button and in the New Job Schedule dialog, change the Recurs every: text box to 1 day(s) instead of 14.  Click OK and then click Next.
  • In the Agent Security dialog, click the Security Settings button.  In the Snapshot Agent Security dialog, select the second radio button to run it under the SQL Server Agent service account.  Click OK and then click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, enter CustomerSecondHalfShard and click Finish.
  • In the Creating Publication dialog, if everything succeeds, click Close.

Customer (First Half) Subscription

Expand the Replication folder in the Object Explorer, right click on the Local Subscriptions folder, select New Subscription to launch the New Subscription Wizard and click Next.

  • In the Publication dialog, select CustomerFirstHalfShard and click Next.
  • In the Merge Agent Location dialog, select the first radio button to run all agents at the Distributor.  Click Next.
  • In the Subscribers dialog, check the check box for the local SQL Server that you’re using.  Click the Subscription Database combo box and select New Database.  In the New Database dialog, enter CustomerFirstHalf in the Database name text box and click OK.  Click Next.
  • In the Merge Agent Security dialog, click the ellipsis on the far right.  In the dialog, select the second radio button to run under the SQL Server Agent service account.  Click OK and click Next.
  • In the Synchronization Schedule dialog, click the Agent Schedule combo box and select Run Continuously.  Click Next.
  • In the Initialize Subscriptions dialog, stick with the default value of initializing immediately and click Next.
  • In the Subscription Type dialog, click the Subscription Type combo box and select Client.   Click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, click Finish.  If everything succeeds, click Close.

To verify that all your settings are correct and that everything is working, open your new CustomerFirstHalf database, right click on the Customers table and Select Top 1000 Rows.  The table should be filled with the first half of the customers that you find in the ContosoFruit database.

Customer (Second Half) Subscription

Expand the Replication folder in the Object Explorer, right click on the Local Subscriptions folder, select New Subscription to launch the New Subscription Wizard and click Next.

  • In the Publication dialog, select CustomerSecondHalfShard and click Next.
  • In the Merge Agent Location dialog, select the first radio button to run all agents at the Distributor.  Click Next.
  • In the Subscribers dialog, check the check box for the local SQL Server that you’re using.  Click the Subscription Database combo box and select New Database.  In the New Database dialog, enter CustomerSecondHalf in the Database name text box and click OK.  Click Next.
  • In the Merge Agent Security dialog, click the ellipsis on the far right.  In the dialog, select the second radio button to run under the SQL Server Agent service account.  Click OK and click Next.
  • In the Synchronization Schedule dialog, click the Agent Schedule combo box and select Run Continuously.  Click Next.
  • In the Initialize Subscriptions dialog, stick with the default value of initializing immediately and click Next.
  • In the Subscription Type dialog, click the Subscription Type combo box and select Client.   Click Next.
  • In the Wizard Actions dialog, check the first check box and click Next.
  • In the Complete the Wizard dialog, click Finish.  If everything succeeds, click Close.

To verify that all your settings are correct and that everything is working, open your new CustomerSecondHalf database, right click on the Customers table and Select Top 1000 Rows.  The table should be filled with the second half of the customers that you find in the ContosoFruit database.

Customer First + Second Half

You’ve now scaled out one Customer table shard into two read-only shards that split the number of customers evenly as shown below:

Publications and Subscriptions

Hopefully, you now see the power of horizontally scaling out SQL Server into shards of partial or complete tables.  When you take this shared-nothing architecture into production, you’ll have n SQL Server Subscriber nodes with their own storage, CPU, memory, and networking.  Merge Replication is a powerful, supported component of the SQL Server database engine that allows the Microsoft MEAP mobile middleware to meet your performance and scalability needs just like the world’s largest Internet sites.  In the next article in the Building Microsoft MEAP series, I’ll show you how to build REST web services that connect to the various database shards in order to expose their data out to smartphones and tablets.

Keep scaling!

– Rob

Sharing my knowledge and helping others never stops, so connect with me on my blog at http://robtiffany.com , follow me on Twitter at https://twitter.com/RobTiffany and on LinkedIn at https://www.linkedin.com/in/robtiffany

Sign Up for my Newsletter and get a FREE Chapter of “Mobile Strategies for Business!”


SQL Server Compact 3.5 Service Pack 2 Cumulative Update Package 7 RTW

SQL Server Compact

Hey folks, just wanted to let you know that cumulative update package 7 for SQL Server Compact 3.5 Service Pack 2 has been released to the web.

You can download the new bits over at http://support.microsoft.com/kb/2665342.

This is a hotfix for an incorrect sort order for a subscriber in SQL Server Compact 3.5 SP2 that synchronizes with a publisher in SQL Server.  For instance, you may have a column with an ASC index on SQL Server, but during sync, the sort order may not be specified.  The problem occurs due to an incorrect index creation statement in the .OUT file in the virtual directory on IIS.  Therefore, only the Server Tools need to be updated.  Both x86 and x64 versions of the update are available to download.

Keep in mind that cumulative updates 6 and above will allow your Windows tablets, laptops and Windows Embedded Handheld devices to sync with SQL Server 2012.

Go get things sorted out,

-Rob


Performance and Memory Management Improvements with Windows Embedded Handheld

A lot has changed since the launch of Windows Phone in the Fall of 2010.

Microsoft now has a compelling phone platform that targets consumers inside and outside the office.  One thing that hasn’t changed is the widespread use of Windows Embedded Handheld to solve tough enterprise mobility problems.  It should be no surprise that over 80% of enterprise handhelds shipped are running Windows Mobile or Windows Embedded Handheld.  They include support for barcode scanning, RFID reading, rugged hardware, every type of wireless, full device encryption, complete over-the-air software distribution and device management support, FIPS compliance, and both capacitive touch and stylus operation.  On the application platform side of the equation, they have rich support for WinForms development using Visual Studio and the .NET Compact Framework, C++, and a full-featured database with built-in sync capabilities via SQL Server Compact.  They can easily communicate with WCF SOAP and REST web services running on Windows Servers on-premises or with Azure in the cloud.  Support for Merge Replication means faster time to market to get devices synchronizing with SQL Server with almost no coding.

Since Windows Embedded Handheld uses an advanced version of the operating system kernel used by Windows Mobile 6.5.3, many of the techniques and best practices I’ve taught customers and developers all over the world still apply.  While it still uses the slotted memory model found in Windows CE 5 with 32 processes and 32 MB of memory per process, you’ll find that numerous enhancements and much tuning have taken place to give your line-of-business apps more of what they need.  I’m talking about more memory per process and improved performance.

A recent Gartner report recommends that organizations stay with Windows Embedded Handheld as the best mobile platform for enterprise line-of-business needs.  Great devices are available from OEMs like Intermec, Motorola, Psion, and Honeywell, just to name a few.  I hope this video helps you with any memory management or performance issues you may need to deal with in your enterprise mobile apps.

Rob


Reducing SQL Server Sync I/O Contention :: Tip 5

Trident Submarine

When it comes to delivering server solutions with Windows Server and SQL Server, speed is your friend (as we used to say in the submarine service).

More speed means more things can happen in a given period of time. If more things can happen in a given period of time, then you can derive greater scalability. Duh, Winning!

Okay, this stuff is obvious so let’s move on.

Have you ever noticed that when you’ve finished installing your shiny new Windows Server 2008 R2 box, the default Power Plan is set to “Balanced”?  Guess what kind of performance and scalability you get when you decide to “Go Green” and save the world with a “Balanced” power plan?  Needless to say, you’re not making the most of the high-powered CPUs you just paid big money for.

So how does this relate to SQL Server and reducing I/O contention?

Would it surprise you to know that the amount of time your CPUs spend processing your queries could actually double with a Balanced plan?  If it takes more CPU time to execute a query, then imagine all those Merge Agent processes locking and blocking each other as they try to perform DML operations on the change tracking tables.

So what’s the takeaway here?

Set your Windows Server 2008 R2 power plan to High Performance! If you’re part of a Windows Domain and you need to make this setting stick, have your sys admin enforce this setting on all your SQL Servers via Group Policy.

Go fast or go home because your users care about performance.

-Rob


Reducing SQL Server Sync I/O Contention :: Tip 4

Add Filter

One of the features that makes SQL Server Merge Replication superior to other sync technologies is something called Precomputed Partitions.

SQL Server creates and maintains distinct data partitions/subsets for each unique user or other type of filter value.  Other sync technologies figure out “what’s-changed” on the fly when a Subscriber synchronizes.  This means all change-tracking tables must be evaluated to figure out the data-delta while the user (not so patiently) waits.  Merge Replication with Precomputed Partitions does all this hard work in advance, so the Subscribers can start downloading changes instantly.  Not making your customer wait contributes to a good user experience (UX).

To pull this off, SQL Server has to use a more advanced set of tracking tables + additional trigger logic than simpler sync systems.  Since it does more work in advance, it goes without saying that the Merge Agents are busily creating extra contention through locking and blocking of all those extra tracking tables.  Luckily, you have some control over how many extra tracking tables are used depending on the partition choices you make.  Using the fewest tables possible, while still getting the functionality you need, is one of the keys to reducing SQL Server contention.

Let’s take a quick look at the tracking tables to give you some context:

  • MSmerge_contents (A row is inserted for each user table insert or update)
  • MSmerge_tombstone (A row is inserted for user table delete)
  • MSmerge_genhistory (A grouping of the above inserts called a generation)
  • MSmerge_partition_groups (One row for each distinct partition defined)
  • MSmerge_current_partition_mappings (One row for each unique combination of rows in MSmerge_contents and MSmerge_partition_groups)
  • MSmerge_past_partition_mappings (One row for each row that no longer belongs in a given partition)

Wow, that’s a bunch of tracking tables!

So imagine a bunch of Merge Agents performing SELECTs, INSERTs, UPDATEs, and DELETEs against all these tables in order to make the magic happen for the end user.  The more Merge Agents you have running concurrently, the more locking and blocking the system will experience, which degrades performance.

What if I told you that you could reduce contention by eliminating 2 tracking tables from the equation?

Anyone who has set up Merge Replication might recognize the Add Filter dialog below:

Add Filter

You will see one of these dialog boxes for every filter you create.  For the purposes of this article, I just want you to focus on the 2 radio buttons at the bottom.  You get to choose that a row from a table will sync with multiple subscriptions or just one subscription.  If multiple users share the same distinct partition, you select the top one, because it’s an overlapping partition and it will need to use all 6 of the tracking tables I mentioned previously.  This means multiple users send the exact same HOST_NAME() filter value when they sync.  This might be a scenario where a group of sales professionals share the same region and clients, but nothing specific to themselves.

On the other hand, if you know that only one user subscribes to a given partition, then you select the bottom radio button because it’s a non-overlapping partition, and only uses the top 4 tracking tables I mentioned above.  This is actually the most common scenario I see with my customers.  Imagine a delivery driver that syncs down the data he needs to tell him where to go, what to do, and then uploads data he’s captured about deliveries.

So the big takeaway here is to always select the bottom radio button so that SQL Server will create more efficient, non-overlapping partitions and use fewer tracking tables.

But wait, there’s more!

Non-overlapping partitions are less likely to create conflicts that have to be resolved by the built-in conflict resolvers.  If you’re not updating other users’ rows, then you’re not crashing into each other, which leads to faster, smoother operation.

Back at the beginning of the article, I mentioned that there’s quite a lot of code executed by triggers and associated stored procedures to make this sophisticated machine work.  Depending on your schema, the amount of data in the tracking tables, and the types of filters you’ve created, a potentially slow 5-way join is executed by sp_MSenumchanges.  You can turn that into a 3-way join and speed up the system by sticking to non-overlapping partitions.  That was easy.

It’s all about UX, and trust me, your users will thank you for caring about this stuff!

Keep Synching,

Rob


SQL Server Compact 4.0 Lands on the Web

SQL Server

A decade has passed since I first started using SQL CE on my Compaq iPAQ.  What started as a great upgrade to Pocket Access turned into the ultimate embedded database for Windows CE, the Pocket PC, Windows Mobile and Windows Phones.  The one-two punch of Outlook Mobile synchronizing email with Exchange and SQL Server Compact synchronizing data with SQL Server helped set the mobile enterprise on fire.  In 2005, version 3.0 supported Windows Tablets, and progressive enhancements to the code base led to full Windows support on both x86 and x64 platforms.  With the new version 4.0, the little-database-that-could has grown up into a powerful server database ready to take on the web.

We’ve come a long way and you’re probably wondering what qualifies this new embedded database to take on the Internet:

  • Native support for x64 Windows Servers
  • Virtual memory usage has been optimized to ensure the database can support up to 256 open connections – (Are you actually using 256 pooled connections with your “Big” database today?)
  • Supports databases up to 4 GB in size – (Feel free to implement your own data sharding scheme)
  • Developed, stress-tested, and tuned to support ASP.NET web applications
  • Avoids the interprocess communications performance hit by running in-process with your web application
  • Row-level locking to boost concurrency
  • Step up to government + military grade security with the SHA2 algorithm to secure data with FIPS compliance
  • Enhanced data reliability via true atomicity, consistency, isolation, and durability (ACID) support
  • Transaction support to commit and roll back grouped changes
  • Full referential integrity with cascading deletes and updates
  • Supports the ADO.NET Entity Framework 4 – (Do I hear WCF Data Services?)
  • Paging queries are supported via T-SQL syntax to only return the data you actually need (example below)

Wow, that’s quite a list!  SQL Server Compact 4.0 databases are easily developed using the new WebMatrix IDE or through Visual Studio 2010 SP1.  I’m loving the new ASP.NET Web Pages.  It reminds me of the good old days of building web applications with Classic ASP back in the 90s with Visual InterDev and HomeSite.
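Speaking of paging queries, here’s what the new OFFSET/FETCH syntax looks like; the table and column names are assumptions:

    -- Return page 3 of the product list, 20 rows per page.
    SELECT ProductId, Name, Price
    FROM Products
    ORDER BY Name
    OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;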

What about Mobility?

Since SQL Server Compact owes its heritage to mobile and embedded versions of Windows, you might want to know what our story is there.  The good news is that you can build and deploy v4.0 databases on Windows XP, Windows Vista, and Windows 7.  If you want to implement an occasionally-connected solution that utilizes the Sync Framework, Remote Data Access (RDA), or Merge Replication, you’ll need to stick with SQL Server Compact 3.5 SP2.  Time and resource constraints prevented the Compact team from enabling these features.  Luckily, single-user WPF/WinForms database applications running on Windows Slates, laptops and Windows Embedded Handheld devices will work just fine with the v3.5 SP2 runtime.  Get a jumpstart by picking up “Enterprise Data Synchronization with Microsoft SQL Server 2008 and SQL Server Compact 3.5 Mobile Merge Replication” at http://www.amazon.com/Enterprise-Synchronization-Microsoft-Compact-Replication/dp/0979891213/ref=sr_1_1?s=books&ie=UTF8&qid=1281715114&sr=1-1 to start building those MEAP solutions.

With the tidal wave of Windows Slates hitting the market, a secure, powerful mobile database that allows users to work offline and syncs with SQL Server is definitely going to be a hot item!

So run, don’t walk to the Microsoft Download site to download the Next-Gen database for the web:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=033cfb76-5382-44fb-bc7e-b3c8174832e2

If you need to support occasionally-connected mobile applications with sync capabilities on multiple Windows platforms, download SQL Server Compact 3.5 SP2:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=e497988a-c93a-404c-b161-3a0b323dce24

Keep Syncing,

Rob


Reducing SQL Server Sync I/O Contention :: Tip 3

Primary Key
GUIDs and Clustered Indexes

Uniqueness is a key factor when synchronizing data between SQL Server/Azure and multiple endpoints like Slates and Smartphones.  With data simultaneously created and updated on servers and clients, ensuring rows are unique to avoid key collisions is critical.  As you know, each row is uniquely identified by its Primary Key.

Primary Key

When creating Primary Keys, it’s common to use a compound key based on things like account numbers, insert time, and other appropriate business items.  Based on what I see from my customers, it’s even more popular to create Identity columns for the Primary Key using an Int or BigInt data type.  When you designate a column (or columns) to be a Primary Key, SQL Server automatically makes it a Clustered Index.  Clustered indexes are faster than normal indexes for sequential values because the B-Tree leaf nodes are the actual data pages on disk, rather than just pointers to data pages.

While Identity Columns work well in most database situations, they often break down in a data synchronization scenario since multiple clients could find themselves creating new rows using the same key value.  When these clients sync their data with SQL Server, key collisions would occur.  Merge Replication includes a feature that hands out blocks of Identity Ranges to each client to prevent this.

When using other Microsoft sync technologies like the Sync Framework or RDA, no such Identity Range mechanism exists and therefore I often see GUIDs utilized as Primary Keys to ensure uniqueness across all endpoints.  In fact, I see this more and more with Merge Replication too since SQL Server adds a GUID column to the end of each row for tracking purposes anyway.  Two birds get killed with one Uniqueidentifier stone.

Using the Uniqueidentifier data type is not necessarily a bad idea.  Despite the tradeoff of reduced join performance vs. integers, the solved uniqueness problem allows sync pros to sleep better at night.  The primary drawback with using GUIDs as Primary Keys goes back to the fact that SQL Server automatically gives those columns a Clustered Index.

I thought Clustered Indexes were a good thing?

They are a good thing when the values found in the indexed column are sequential.  Unfortunately, GUIDs generated with the default NewId() function are completely random and therefore create a serious performance problem.  All those mobile devices uploading captured data mean lots of INSERTs for SQL Server.  Inserting random key values like GUIDs can cause fragmentation in excess of 90%, because page splits force new pages to be allocated and existing rows to be moved onto them just to fit each record into the right spot.  This performance-killing, space-wasting page splitting wouldn’t happen with sequential Integer or Datetime values, since they actually help fill the existing page.

What about NEWSEQUENTIALID()?

Generating your GUIDs on SQL Server with this function will dramatically reduce fragmentation and wasted space since it guarantees that each GUID will be sequential.  Unfortunately, this isn’t bulletproof.  If your Windows Server is restarted for any reason, your GUIDs may start from a lower range.  They’ll still be globally unique, but your fragmentation will increase and performance will decrease.  Also keep in mind that all the devices synchronizing with SQL Server will be creating their own GUIDs which blows the whole NEWSEQUENTIALID() strategy out of the water.

Takeaway

If you’re going to use the Uniqueidentifier data type for your Primary Keys and you plan to sync your data with RDA, the Sync Framework or Merge Replication, ensure that Create as Clustered == No for better performance.  You’ll still get fragmentation, but it will be closer to the ~30% range instead of almost 100%.
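In DDL terms, the takeaway looks something like this sketch (names are illustrative):

    -- GUID primary key declared NONCLUSTERED to avoid random-insert
    -- page splits.
    CREATE TABLE dbo.Assets
    (
        AssetId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL DEFAULT NEWID()
                  CONSTRAINT PK_Assets PRIMARY KEY NONCLUSTERED,
        CreatedAt DATETIME NOT NULL DEFAULT GETDATE()
    );

    -- If you still want a clustered index, put it on a sequential column.
    CREATE CLUSTERED INDEX CIX_Assets_CreatedAt ON dbo.Assets (CreatedAt);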

Keep synching,

Rob


Reducing SQL Server I/O Contention during Sync :: Tip 2

Database Storage
Indexing Join Columns

In my last Sync/Contention post, I beat up on a select group of SAN administrators who aren’t willing to go the extra mile to optimize the very heart of their organization, SQL Server.  You guys know who you are.

This time, I want to look at something more basic, yet often overlooked.

All DBAs know that joining tables on non-indexed columns is one of the most expensive operations SQL Server can perform.  Amazingly, I run into this problem over and over with many of my customers.  Sync technologies like the Sync Framework, RDA and Merge Replication allow for varying levels of server-side filtering.  This is a popular feature used to reduce the size of the tables and rows being downloaded to Silverlight Isolated Storage or SQL Server Compact.

It’s also a performance killer when tables and columns participating in a Join filter are not properly indexed.  Keeping rows locked longer than necessary creates undue blocking and deadlocking.  It also creates unhappy slate and smartphone users who have to wait longer for their sync to complete.

Do yourself a favor and go take a look at all the filters you’ve created and make sure that you have indexes on all those joined columns.
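For example, if a join filter ties Orders to Customers on a CustomerId column (hypothetical names), the child side of the join needs an index like this:

    -- Index the joined column on the child table; the parent side is
    -- usually covered by its primary key already.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId);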

Keep synching,

Rob


Reducing SQL Server I/O Contention during Sync :: Tip 1

SAN Storage
RAID

Sync technologies like Merge Replication and the Sync Framework track changes on SQL Server using triggers, stored procedures and special tracking tables.  The act of tracking changes made by each SQL Server Compact or Silverlight sync subscriber can cause a lot of locking and blocking on the server.  This diminishes performance and sometimes leads to deadlocks.

Therefore, don’t listen to your SAN administrator when he says that RAID 5 will do.  RAID 1 or 10 must always be used for all databases, tempdb, and transaction logs.  Furthermore, each of these database objects must be placed on its own dedicated RAID array.  No sharing!  Remember, as a DBA and sync expert, knowledge of SAN configuration must always be part of your skillset.
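As a sketch of that separation, here’s a database created with its data and log files on different dedicated arrays; the database name, drive letters, and paths are hypothetical:

    -- Data on one RAID 10 array, the transaction log on another.
    -- tempdb belongs on its own array as well (configured separately).
    CREATE DATABASE SalesDB
    ON PRIMARY
        ( NAME = SalesDB_data, FILENAME = 'D:\SQLData\SalesDB.mdf' )
    LOG ON
        ( NAME = SalesDB_log,  FILENAME = 'L:\SQLLogs\SalesDB.ldf' );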

Keep synching,

Rob


Mobile Merge Replication Performance and Scalability Cheat Sheet

SQL Server Compact

If your Mobile Enterprise Application Platform (MEAP) is using SQL Server Merge Replication to provide the mobile middleware and reliable wireless wire protocol for SQL Server Compact (SSCE) running on Windows Mobile and Windows Embedded Handheld devices + Windows XP/Vista/7 tablets, laptops, desktops, slates, and netbooks; below is a guide to help you build the fastest, most scalable systems:

Active Directory

  • Since your clients will be passing in their Domain\username + password credentials when they sync, both IIS and SQL Server will make auth requests of the Domain Controller. Ensure that you have at least a primary and backup Domain Controller, that the NTDS.dit disk drives are big enough to handle the creation of a large number of new AD DS objects (mobile users and groups), and that your servers have enough RAM to cache all those objects in memory.

Database Schema

  • Ensure your schema is sufficiently de-normalized so that you never have to perform more than a 4-way JOIN across tables. This affects server-side JOIN filters as well as SSCE performance.
  • To ensure uniqueness across all nodes participating in the sync infrastructure, use GUIDs for your primary keys so that SQL Server doesn’t have to deal with the overhead of managing Identity ranges. Make sure to mark your GUIDs as ROWGUIDCOL for that table so that Merge won’t try to add an additional Uniqueidentifier column to the table.  Don’t create clustered indexes when using GUIDs as primary keys because they will suffer horrible fragmentation that will rapidly degrade performance.  Just use a normal index (see the sketch after this list).
  • Create clustered indexes for your primary keys when using Identity columns, Datetime, or other natural keys.  Ensure that every column in every table that participates in a WHERE clause is indexed.
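Here’s a minimal sketch of such a table.  The table and column names are hypothetical; the important parts are the ROWGUIDCOL marking, the NONCLUSTERED primary key on the GUID, and the index on the column used in a WHERE clause:

CREATE TABLE dbo.WorkOrder
(
    WorkOrderID uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_WorkOrder_WorkOrderID DEFAULT NEWID()
        CONSTRAINT PK_WorkOrder PRIMARY KEY NONCLUSTERED,
    AssignedTo nvarchar(50) NOT NULL,  -- used in a filter's WHERE clause
    StatusCode int NOT NULL
);

CREATE NONCLUSTERED INDEX IX_WorkOrder_AssignedTo
ON dbo.WorkOrder (AssignedTo);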

Distributor

  • If your network connection is fast and reliable like Wi-Fi or Ethernet, your SSCE client has more than 32 MB of free RAM, and SQL Server isn’t experiencing any deadlocks due to contention with ETL operations or too many concurrent Merge Agents, create a new Merge Agent Profile based on the High Volume Server-to-Server Profile so that SQL Server will perform more work per round-trip and speed up your synchronizations.
  • If you’re using a 2G/3G Wireless Wide Area Network connection, create a Merge Agent Profile based on the Default Profile.  It performs less work and uses fewer threads per round-trip than the High Volume Server-to-Server Profile, which reduces server locking contention and makes your synchronizations more likely to succeed.
  • In order to prevent SQL Server from performing Metadata Cleanup every time a Subscriber synchronizes, set the -MetadataRetentionCleanup parameter to 0.
  • As SQL Server has to scale up to handle a higher number of concurrent users in the future, locking contention will increase due to more Merge Agents trying to perform work at the same time.  When this happens, adjust the parameters of the Default Profile so that both -SrcThreads and -DestThreads are equal to 1 (a sketch of scripting such a profile follows this list).
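If you’d rather script the profile than click through the dialogs, a sketch like the following, run at the Distributor, is one way to do it.  I’m writing the parameter names from memory, so verify their exact spelling against msdb.dbo.MSagent_parameters on your own Distributor before relying on this:

DECLARE @profile_id int;

-- Agent type 4 = Merge Agent.
EXEC sp_add_agent_profile
    @profile_id   = @profile_id OUTPUT,
    @profile_name = N'Mobile WWAN Profile',
    @agent_type   = 4,
    @description  = N'Low-contention profile for 2G/3G Subscribers';

EXEC sp_add_agent_parameter @profile_id = @profile_id,
    @parameter_name = N'-MetadataRetentionCleanup', @parameter_value = 0;
EXEC sp_add_agent_parameter @profile_id = @profile_id,
    @parameter_name = N'-SrcThreads', @parameter_value = 1;
EXEC sp_add_agent_parameter @profile_id = @profile_id,
    @parameter_name = N'-DestThreads', @parameter_value = 1;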

Publication

  • When defining the Articles you’re going to sync, only check the minimum tables and columns needed by the Subscriber to successfully perform its work.
  • For Lookup/Reference tables that aren’t modified by the Subscriber, mark those as Download-only to prevent change-tracking metadata from being sent to the Subscriber.
  • Despite the fact that column-level tracking sends less data over the air, stick with row-level tracking so SQL Server won’t have to do as much work to track the changes.
  • Use the default conflict resolver where the “Server wins” unless you absolutely need a different manner of picking a winner during a conflict.
  • Use Static Filters to reduce the amount of server data going out to all Subscribers.
  • Make limited use of Parameterized Filters, which reduce and further specify the subset of data going out to a particular Subscriber based on its HOST_NAME(), creating data partitions.  This powerful feature slows performance and reduces scalability with each additional filter, so it must be used sparingly.
  • Keep filter queries simple and don’t use IN clauses, sub-selects or any kind of circular logic.
  • Strive to always create “well-partitioned” Articles where all changes that are uploaded/downloaded are mapped to only the single partition ID for best performance and scalability.
    • When using Parameterized Filters, always create non-overlapping data partitions where each row from a filtered table only goes to a single Subscriber instead of more than one which will avoid the use of certain Merge metadata tables.
    • Each Article in this scenario can only be published to a single Publication.
    • A Subscriber cannot insert rows that do not belong to its partition ID.
    • A Subscriber cannot update columns that are involved in filtering.
    • In a join filter hierarchy, a regular article cannot be the parent of a “well-partitioned” article.
    • The join filter in which a well-partitioned article is the child must have the join_unique_key set to a value of 1 which relates to the Unique key check box of the Add Join dialog.  This means there’s a one-to-one or one-to-many relationship with the foreign key.
    • Each “well-partitioned” Article can have only one subset or join filter. The article can have a subset filter and be the parent of a join filter, but cannot have a subset filter and be the child of a join filter.
  • Never extend a filter out to more than 4 joined tables.
  • Do not filter tables that are primarily lookup/reference tables, small tables, and tables with data that does not change.
  • Schedule the Snapshot Agent to run once per day to create an unfiltered schema Snapshot.
  • Set your Subscriptions to expire as soon as possible to keep the amount of change-tracking metadata SQL Server has to manage to an absolute minimum. Normally, set the value to 2 to accommodate 3-day weekends since 24 hours are automatically added to the time to account for multiple time zones. If server-side change tracking isn’t needed and Subscribers are pulling down a new database every day and aren’t uploading data, then set the expiration value to 1.
  • Set Allow parameterized filters equal to True.
  • Set Validate Subscribers equal to HOST_NAME().
  • Set Precompute partitions equal to True to allow SQL Server to optimize synchronization by computing in advance which data rows belong in which partitions.
  • Set Optimize synchronization equal to False if Precompute partitions is equal to True.  Otherwise set it to True to optimize filtered Subscriptions by storing more metadata at the Publisher.
  • Set Limit concurrent processes equal to True.
  • Set Maximum concurrent processes equal to the number of SQL Server processor cores.  If excessive locking contention occurs, reduce the number of concurrent processes until the problem is fixed.
  • Set Replicate schema changes equal to True.
  • Check Automatically define a partition and generate a snapshot if needed when a new Subscriber tries to synchronize. This will reduce Initialization times since SQL Server creates and applies snapshots using the fast BCP utility instead of a series of slower SELECT and INSERT statements.
  • Add data partitions based on unique HOST_NAMEs and schedule the Snapshot Agent to create those filtered Snapshots nightly or on the weekend so they’ll be built using the fast BCP utility and waiting for new Subscribers to download in the morning.
  • Ensure that SQL Server has 1 processor core and 2 GB of RAM for every 100 concurrent Subscribers utilizing bi-directional sync. Add 1 core and 2 GB of RAM to the server for every additional 100 concurrent Subscribers you want to add to the system.  Never add more Subscribers and/or IIS servers without also adding new cores and RAM to the Publisher.
  • Turn off Hyperthreading in the BIOS of the SQL Server as it has been known to degrade SQL Server performance.
  • Do not add your own user-defined triggers to tables on a Published database since Merge places 3 triggers on each table already.
  • Add one or more Filegroups to your database to contain multiple, secondary database files spread out across many physical disks.
  • Limit use of large object types such as text, ntext, image, varchar(max), nvarchar(max) or varbinary(max) as they require a significant memory allocation and will negatively impact performance.
  • Set SQL Server’s minimum and maximum memory usage to within 2 GB of total system memory so it doesn’t have to allocate more memory on-demand.
  • Always use SQL Server 2008 R2 and Windows Server 2008 R2 since they work better together because they take advantage of the next generation networking stack which dramatically increases network throughput. They can also scale up as high as 256 cores.
  • Due to how Merge Replication tracks changes with triggers, Merge Agents, and tracking tables, it will create locking contention with DML/ETL operations.  This contention degrades server performance, which negatively impacts sync times with devices.  Mitigate it by performing large INSERT/UPDATE/DELETE DML/ETL operations during a nightly maintenance window when Subscribers aren’t synchronizing.
  • Since Published databases result in slower DML/ETL operations, perform changes in bulk by using XML Stored Procedures to boost performance.
  • To improve the performance of pre-computed partitions when DML/ETL operations result in lots of data changes, ensure that changes to a Parent table in a join filter are made before corresponding changes in the child tables.  This means that when DML/ETL operations are pushing new data into SQL Server, they must add master data to the parent filter table first, and then add detail data to all the related child tables second, in order for that data to be pre-computed and optimized for sync.
  • Create filter partitions based on things that don’t change every day.  Partitions that are added to and deleted from SQL Server, and Subscribers that move from one partition to another, are very disruptive to the performance of Merge Replication.
  • Always perform initializations and re-initializations over Wi-Fi or Ethernet when the device is docked because this is the slowest operation where the entire database must be downloaded instead of just deltas.  To determine rough estimates for initialization time, divide the size of the resulting SSCE .sdf file by the bandwidth speed available to the device.  A file copy over the expected network will also yield estimates for minimum sync times.  These times don’t include the work SQL Server and IIS must perform to provide the data or data INSERT times on SSCE.
  • If your SQL Server Publisher hits a saturation point with too many concurrent mobile Subscribers, you can scale it out by creating a Server/Push Republishing hierarchy. Put the primary SQL Server Publisher at the top of the pyramid and have two or more SQL Servers subscribe to it. These can be unfiltered Subscriptions where all SQL Servers get the same data or the Subscribers can filter their data feeds by region for example. Then have the Subscribing SQL Servers Publish their Subscription for consumption by mobile SSCE clients.
  • Create just a single Publication.  (The T-SQL sketch after this list shows how several of the settings above map to the replication stored procedures.)
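Here’s a partial sketch of scripting a Publication with some of the settings above.  The publication and article names are hypothetical, and you should double-check each parameter against Books Online for your version of SQL Server:

-- Run in the Published database.
EXEC sp_addmergepublication
    @publication               = N'MobileFieldService',
    @retention                 = 2,               -- expire Subscriptions quickly
    @validate_subscriber_info  = N'HOST_NAME()',  -- Validate Subscribers
    @use_partition_groups      = N'true',         -- Precompute partitions
    @max_concurrent_merge      = 8,               -- match Publisher core count
    @replicate_ddl             = 1,               -- Replicate schema changes
    @allow_web_synchronization = N'true';

-- A download-only Lookup/Reference Article: no change-tracking
-- metadata is sent down to the Subscriber.
EXEC sp_addmergearticle
    @publication               = N'MobileFieldService',
    @article                   = N'ProductCodes',
    @source_object             = N'ProductCodes',
    @subscriber_upload_options = 2,          -- download-only, no device changes
    @column_tracking           = N'false';   -- row-level tracking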

Internet Information Services

  • Use the x64 version of the SQL Server Compact 3.5 SP2 Server Tools with Windows Server 2008 R2 running inside IIS 7.5.
  • Use a single Server Agent in a single Virtual Directory.
  • Ensure the IIS Virtual Directory where the Server Agent resides is on a fast solid-state drive that’s separate from the disk where Windows Server is installed to better support file I/O.
  • Use a low-end server with 2 processor cores and 2 GB of RAM to support 400 concurrent Subscribers queued at the same time.
  • Set the MAX_THREADS_PER_POOL Server Agent registry key equal to 2 to match the IIS processor cores and RAM. Do not set this value to a higher number than the number of cores.
  • Set the MAX_PENDING_REQUEST Server Agent registry key equal to 400 which means the Server Agent will queue up to 400 concurrent Subscribers waiting for one of the 2 worker threads to become available to sync with SQL Server.
  • Set the IIS Connection Limits property to 400 to prevent an unlimited number of connections reaching the Server Agent.
  • Add a new load-balanced IIS server for every additional 400 concurrent Subscribers you want to add to the system.

Subscriber

  • Use the appropriate x64, x86 or ARM version of SQL Server Compact 3.5 SP2 to take advantage of the PostSyncCleanup property of the SqlCeReplication object that can reduce the time it takes to perform an initial synchronization. Set the PostSyncCleanup property equal to 3 where neither UpdateStats nor CleanByRetention are performed.
  • Increase the Max Buffer Size connection string parameter to 1024 on a phone and 4096 on a PC to boost both replication and SQL query processing performance. If you have more RAM available, set those values even higher until you reach the law of diminishing returns.
  • Keep your SSCE database compact and fast by setting the Autoshrink Threshold connection string parameter to 10 so it starts reclaiming empty data pages once the database has become 10% fragmented.
  • Replication performance testing must be performed using actual PDAs to observe how available RAM, storage space and CPU speed affect moving data into the device’s memory area and how quickly this data is inserted into the SSCE database tables.  Since the SSCE database doubles in size during replication, the device must have enough storage available or the operation will fail.  Having plenty of available RAM is important so that SSCE can utilize its memory buffer to complete a Merge Replication operation more quickly.  With plenty of available RAM and storage, a fast CPU will make all operations faster.
  • The PDA must have at least an extra 32 MB of available free RAM that can be used by the .NET Compact Framework (NETCF) application.  If additional applications are running on the device at the same time, even more RAM is needed.  If a NETCF application has insufficient RAM, it will discard its compiled code and run in interpreted mode, which will slow the application down.  If the NETCF app is still under memory pressure after discarding compiled code, Windows Mobile will first tell the application to return free memory to the operating system and then will terminate the app if needed.
  • Set the CompressionLevel property of the SqlCeReplication object to 0 for fast connections and increment it from 1 to 6 on slower connections like GPRS to increase speed and reduce bandwidth consumption.
  • Tune the ConnectionRetryTimeout, ConnectTimeout, ReceiveTimeout and SendTimeout properties of the SqlCeReplication object based on expected bandwidth speeds:

    Property                 High Bandwidth   Medium Bandwidth   Low Bandwidth
    ConnectionRetryTimeout   30               60                 120
    ConnectTimeout           3000             6000               12000
    ReceiveTimeout           1000             3000               6000
    SendTimeout              1000             3000               6000
  • You can decrease potentially slow SSCE file I/O by adjusting the Flush Interval connection string parameter to write committed transactions to disk less often than the default of every 10 seconds.  Test longer intervals between flushes like 20 or 30 seconds. Keep in mind that these transactions can be lost if the disk or system fails before flushing occurs so be careful.
  • When replicating data that has been captured in the field by the device, perform Upload-only syncs to shorten the duration.

Storage

  • Use a Fibre Channel SAN with 15k RPM or solid-state disks for best I/O performance.
  • Databases should reside on a RAID 10, unshared LUN comprised of at least 6 disks.
  • Database logs should reside on a RAID 10, unshared LUN comprised of at least 6 disks.
  • Tempdb should reside on a RAID 10, unshared LUN comprised of at least 6 disks.
  • The Tempdb log should reside on a RAID 10, unshared LUN comprised of at least 6 disks.
  • The Snapshot share should reside on a RAID 10, unshared LUN comprised of at least 6 disks.  This disk array should be large enough to accommodate a growing number of filtered Snapshots. Snapshot folders for Subscribers that no longer use the system must be manually deleted.
  • Merge Replication metadata tables should reside on a RAID 10, unshared LUN comprised of at least 6 disks.
  • Increase your Host Bus Adapter (HBA) queue depths to 64.
  • Your Publication database should be broken up into half as many data files as the SQL Server has processor cores. Each file must be the same size.
  • Tempdb should be pre-sized with an auto-growth increment of 10%. It should be broken up into the same number of files as the SQL Server has processor cores. Each file must be the same size (see the sketch below).
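As an example, here’s a sketch of pre-sizing tempdb on a hypothetical 4-core server with a dedicated T: array.  The file sizes and paths are placeholders; the point is one file per core, all the same size, with 10% auto-growth:

ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 10%);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 4096MB, FILEGROWTH = 10%);

-- Repeat ADD FILE for tempdev3 and tempdev4 so the file count
-- matches the processor core count.  A restart is only needed if
-- you also move existing files to a new FILENAME.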

High Availability

  • Load-balance the IIS servers to scale them out. Enable Server Affinity (stickiness) since the Replication Session Control Blocks that transmit data between the Server Agent and SSCE are stateful. Test to ensure that your load-balancer is actually sending equal amounts of Subscriber sync traffic to each IIS server.  Some load-balancers can erroneously send all traffic to a single IIS server if not properly configured.
  • Implement Windows Clustering so that SQL Server can failover to a second node.
  • Use SQL Server Database Mirroring so that your Published database will fail over to a standby server.
  • Make a second SQL Server into an unfiltered Subscriber to your Publisher so that it can take over Merge Replication duties for mobile clients as a Republisher if the primary SQL Server fails. SSCE clients would just have to reinitialize their Subscriptions to begin synchronizing with the new Republisher.

Ongoing Maintenance

  • Use the Replication Monitor to have a real-time view of the synchronization performance of all your Subscribers.
  • Use the web-based SQL Server Compact Server Agent Statistics and Diagnostics tools to monitor the health and activity of the Server Agent running on IIS.
  • Create a SQL Job to execute the sp_MSmakegeneration stored procedure after large DML operations. Regular execution after INSERTING/UPDATING/DELETING data from either DML/ETL operations or after receiving lots of changes from Subscribers will maintain subsequent sync performance. Executing this stored procedure from the server-side is preferable to having it executed as a result of a Subscriber sync which would block all other Subscribers.
  • During your nightly maintenance window, rebuild the indexes and update the statistics of the following Merge Replication metadata tables (a sample maintenance script follows at the end of this section):
    • MSmerge_contents
    • MSmerge_tombstone
    • MSmerge_genhistory
    • MSmerge_current_partition_mappings
    • MSmerge_past_partition_mappings
    • MSmerge_generation_partition_mappings
  • If you notice performance degradation during the day due to a large number of Subscribers making large changes to the database, you can update the statistics (with fullscan) of the Merge Replication metadata tables more frequently throughout the day to force stored proc recompiles to get a better query plan.
    • UPDATE STATISTICS MSmerge_generation_partition_mappings WITH FULLSCAN
    • UPDATE STATISTICS MSmerge_genhistory WITH FULLSCAN
  • Rebuild/defrag indexes on your database tables and Merge Replication metadata tables throughout the day to reduce locking contention and maintain performance.
  • Use the Missing Indexes feature of SQL Server to tell you which indexes you could add that would give your system a performance boost. Do not add recommended indexes to Merge system tables.
  • Use the Database Engine Tuning Advisor to give you comprehensive performance tuning recommendations that cover every aspect of SQL Server.
  • Monitor the performance of the following counters:
    • Processor Object: % Processor Time: This counter represents the percentage of processor utilization. A value over 80% is a CPU bottleneck.
    • System Object: Processor Queue Length: This counter represents the number of threads that are delayed in the processor Ready Queue and waiting to be scheduled for execution. A value over 2 is a bottleneck and shows that there is more work available than the processor can handle. Remember to divide the value by the number of processor cores on your server.
    • Memory Object: Available Mbytes: This counter represents the amount of physical memory available for allocation to a process or for system use. Values below 10% of total system RAM indicate that you need to add additional RAM to your server.
    • PhysicalDisk Object: % Disk Time: This counter represents the percentage of time that the selected disk is busy responding to read or write requests. A value greater than 50% is an I/O bottleneck.
    • PhysicalDisk Object: Average Disk Queue Length: This counter represents the average number of read/write requests that are queued on a given physical disk. If your disk queue length is greater than 2, you’ve got an I/O bottleneck with too many read/write operations waiting to be performed.
    • PhysicalDisk Object: Average Disk Seconds/Read and Disk Seconds/Write: These counters represent the average time in seconds of a read or write of data to and from a disk. A value of less than 10 ms is what you’re shooting for in terms of best performance. You can get by with subpar values between 10 and 20 ms, but anything above that is considered slow. Times above 50 ms represent a very serious I/O bottleneck.
    • PhysicalDisk Object: Average Disk Reads/Second and Disk Writes/Second: These counters represent the rate of read and write operations against a given disk. You need to ensure that these values stay below 85% of a disk’s capacity by adding disks or reducing the load from SQL Server. Disk access times will increase exponentially when you get beyond 85% capacity.
  • A limited number of database schema changes can be made and synchronized down to SSCE Subscribers without any code changes which makes it easier to update your system as it evolves over time.
  • Use a Merge Replication Test Harness to stress test the entire system.  The ability to simulate hundreds or thousands of concurrent synchronizing Subscribers allows you to monitor performance and the load on the system.  This is helpful in properly configuring and tuning SQL Server, IIS, and the Windows Mobile devices.  It will tell you where you’re having problems and it will let you predict how much server hardware you will need to support growing numbers of Subscribers over time.  It’s also very important to simulate worst-case scenarios that you never expect to happen.
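Here’s a sketch of the nightly maintenance described above, suitable for a SQL Server Agent job step running in the Published database.  Only two of the six metadata tables are shown; repeat for the rest.  Keep in mind that sp_MSmakegeneration is an undocumented system procedure, so test this in a non-production environment first:

-- Level generations after the day's bulk DML so Subscriber syncs stay fast.
EXEC sp_MSmakegeneration;

-- Rebuild indexes and refresh statistics on the Merge metadata tables.
ALTER INDEX ALL ON dbo.MSmerge_contents REBUILD;
ALTER INDEX ALL ON dbo.MSmerge_genhistory REBUILD;

UPDATE STATISTICS dbo.MSmerge_contents WITH FULLSCAN;
UPDATE STATISTICS dbo.MSmerge_genhistory WITH FULLSCAN;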

I hope this information sufficiently empowers you to take on the largest MEAP solutions that involve SQL Server Merge Replication and SQL Server Compact.  If you need a deeper dive, go check out my book on Enterprise Data Synchronization http://www.amazon.com/Enterprise-Synchronization-Microsoft-Compact-Replication/dp/0979891213/ref=sr_1_1?ie=UTF8&s=books&qid=1271964573&sr=1-1 over at Amazon.  Now go build a fast and scalable solution for your company or your customers.

Best Regards,

Rob

P.S.  If your solution doesn’t require all the advanced features found in Merge Replication, I highly recommend you use Remote Data Access (RDA).  This is a much simpler sync technology that’s extremely fast, scalable, and easier to manage.

Sharing my knowledge and helping others never stops, so connect with me on my blog at http://robtiffany.com , follow me on Twitter at https://twitter.com/RobTiffany and on LinkedIn at https://www.linkedin.com/in/robtiffany

Sign Up for my Newsletter and get a FREE Chapter of “Mobile Strategies for Business!”
