Ruminations on Multi-Tenant Data Architectures
Posted by: Scott on Writing,
on 19 Aug 2009 |
View original | Bookmarked: 0 time(s)
One of the key advantages of web applications is that they can be deployed as a hosted service and accessed over the Internet rather than needing to be locally installed at a customer's site. Over the years I've helped numerous clients build web applications of this sort. When building such an application there are bound to be questions on the best way to model and store data from the different customers that access the hosted application. Do you co-mingle customer data in a single database with a single schema? Do you use a single database, but separate customer data intodifferentschemas?Or do you create a separate database with separate tables for each customer?
These three data architectures are examined and discussed in detail in the article Multi-Tenant Data Architecture. The authors introduce three different multi-tenant data architectures, which they name:
- Separate Databases - in this architecture each customer's data is stored in a separate database. The databases may all be on the same database server or they could be partitioned across multiple database servers. This approach provides for maximum isolation of customer data. If we were building a hosted application using the Northwind database and had three customers then with this approach we would have three databases - Northwind01, Northwind02, and Northwind03 - and each database would have the same tables, views, stored procedures, and so on. Customer 1's data would all be located in the Northwind01 database, while Customer 2's data would be over in the Northwind02 database.
- Shared Databases, Separate Schemas- SQL Server 2005 introduced the concept of schemas, which offer a way to group a set of tables. With this approach you can have a single database with one schema for each customer. Returning to the Northwind example, with this approach there would be a single database, but there would be three schemas - Customer01Schema, Customer02Schema, and Customer03 schema. Each schema would have the same set of tables, views, stored procedures, and so forth. Customer1's product information would be found in the Customer01Schema.Products table, whereas Customer2's product information would be stored in Customer02Schema.Products.
- Shared Database, Shared Schema - here we have only a single database and a single schema. There would be only one Products table. To differentiate one customer's products from another we'd need to add a NorthwindCustomers table that would have a record for each customer and then add a NorthwindCustomerID foreign key to the Products table (and to the other pertinent tables).
The Multi-Tenant Data Architectureexamines these three approaches in much more detail with screen shots, T-SQL snippets, and more in-depth examples, and is definitely worth reading.
I'd like to talk a little bit about my from the trenches experiencewith multi-tenant data architectures. First, a little background.
Medical Software
In 2003 I started work with a client we'll call Acme Medical, which was building a hosted, ASP.NET medical software application. This application had been in existence for a couple of years by this point, but as a Microsoft Access application that was installed and run locally from a couple of hospitals and clinics. I joined the project with the aim of moving the application from Microsoft Access to ASP.NET and SQL Server. We chose to use a Separate Databases architecture. I continue to work on this project today. Originally, there were two customers, each with a few thousand records. Thedata model contained maybe 20-30 tables. Today there are more than15 customers, each with hundreds of thousands if not millions of records. There are nearly 350 database tables, hundreds of ASP.NET pages, and several automated backend processes.
Print Management
Another client, Acme Printers, was a prominent player in the print shop marketplace, selling in-house applications to large-scale print shops. I helped this company build an online, hosted version for order placement and fulfillment. We used a Shared Database, Shared Schema data model. In a nutshell, there is a Customers table with a CustomerID primary key value uniquely identifying each customer. Every other database table has a CustomerID field that identifies what data belongs to what customer. When a user signs on,welook up what customer the user is associated with and then store thisCustomerID in session. This CustomerID variable is then used inother pages on the site to pull backthedata pertinent to that customer.
I have never used the Shared Database, Shared Schema approach.I recently spoke with Michael Campbellabout this topic, and he shared the following advice: I'd recommend against using shared DBs and separate schemas. In my experience that almost NEVER works out as advertised, adds all sorts of difficulty in terms of disaster recovery, management, and isolation, and really doesn't offer any worthwhile benefits. It also becomes absolutely insane in terms of managing permissions/security as well.
The factors that should influence whether yougo with a Separate Database or Shared Database, Shared Schema architecture are not technical, but rather are regulatory-, security-,or business-related. If you find yourself deciding to use one architecture over another because of some technological reason you're probably asking the wrong questions and evaluating the wrong criteria.
Here's how I tackle this problem. I start by assuming that I'm going to use the Shared Database, Shared Schema architecture, as it's usually the best fit for the types of projects I work on. It easier to setup and implement and test and debug and scale up (to a point) than the Separate Database architecture. For the majority of web applications, the Shared Database, Shared Schema architecture is the best approach when weighing just the technological- and development-relatedfactors.
Sadly, in the real-world there are many factors that outweigh technological ones. While the Separate Database architecture has more friction associated with it than the Shared Database, Shared Schema, it does have certain advantages in terms of security, privacy, isolation, and so on. When evaluating which architecture to use, I like to askmy clientthe following questions:
-
How important is thesecurity and privacy of the data?
-
Are there any regulatory reasons for choosing one architecture over the other?
-
HIPAA, which definesguidelinesfor patient privacy. It's been several years since I've explored HIPAA, but from my recollection HIPAA does not require that a Separate Database architecture be used, but using one ensures that patient data in one hospital is isolated from the doctors and staff at other hospitals.
-
Do your customers want/need access to the database?
-
How many customers do you expect to be using the system?
Note that the above questions don't touch on technological or development issues. Rather, they are focusing on security, privacy, regulations, and the end user's needs or expectations.
While the Separate Databases architecture provides a higher degree of security andprivacy through data isolation, there are a couple of challenges worth noting.
- Rolling out changes to the data model is more difficult. Imagine that you've added a new feature, which entailed adding two database tables, a new view, and six new stored procedures. In a Shared Database, Shared Schema architecture, rolling out that change is a matter of adding those database objects to the production database. With a Separate Databases architecture, you need to make sure to roll out those changes to all of the databases. With a systematic, tested, scriptedprocess this is not much of a challenge, but if you don't have such a system defined - if this work is done manually, for instance - you're asking for trouble as you're undoubtedly going to have a scenario where the database changes get rolled out to some databases, but not all.
- Viewing data aggregated across the databases is difficult. I've touched upon this topic in an earlier blog post, Running the Same Query Against Multiple Databases. When you find a bug on one database and need to see whether it affects data in other databases there are not many tools at your disposal. One poor man's tool is sp_msForEachDb,but it's less than ideal.
- Adding a new customer requires creating a new database. Signing up a new customer is great news for the company's bottom line, but what changes does it entail in the data model? For a Shared Database, Shared Schema architecture, adding a new customer is as simple as adding a new record to the Customers table. For a Separate Databases architecture, adding a new customer means creating a new database and adding the appropriate database objects and any initial data in these tables. Using the SQL Server model database you can simplify this process, but in order to do so you need to make sure that any changes to the data model- new stored procedures, tables, views, UDFs, etc. - need to also be added to the model database. Similarly, any changes to the initial state - new default records in a lookup table, say - need to also be added to the appropriate tables in model.
Given the pros and cons of these two multi-tenant architectures, you can guess what architecture a company is using based on their price point and their targeted industry. If there is a high cost to sign up to the site then chances are the company has only a few clients (maybe just a few, maybe dozens, but probably not hundreds or thousands), and if the application is for medical, financial, or legal purposes then they probably use a Separate Databases architecture. For web applications with a lower price point and geared to industries where data privacy and security is less important and less encumbered by regulations, chances are a Shared Database, Shared Schema approach is being used.
Whether you choose the Shared Database, Shared Schema or Separate Databases architecture depends largely on non-technological factors. As I noted earlier, I typically choose the Shared Database, Shared Schema architecture by default, only switching to a Separate Databases architecture if there are particular security, privacy, regulatory, or other business needs that necessitate it.