Published: 22 Dec 2008
Dino Esposito talks about the lazy load pattern in LINQ-to-SQL.
Suppose you have an object model that represents your application’s domain. Now suppose you ask your object model to retrieve and return an instance of the Customer type—in particular, the customer that matches a given ID. What data would you expect it to contain?
For sure, the returned object will store personal information about the customer such as company name, contact information, address, maybe the URL to the Web site. But what about the relationships between, say, that customer and the collection of orders, invoices, territories, and products bought? Furthermore, what about the relationships between a given order and its order items, and between an order item and associated products, and so on?
To cut a long story short, the more complex the object model, the more you risk pulling up a very large graph of objects.
Lazy loading indicates the ability of the data access layer (DAL) to load only a portion of the object graph based on some rules. When you add this capability to your DAL, you have implemented the “Lazy Load” pattern (LL). What about lazy loading in LINQ-to-SQL?
You can hardly find a mention of the expression “lazy loading” in the whole LINQ-to-SQL documentation. Does it mean that LINQ-to-SQL doesn’t support lazy loading? Quite the reverse, I’d say. LINQ-to-SQL provides excellent support for lazy loading; it just calls it by a different name. The LINQ-to-SQL equivalent of lazy loading is deferred loading. In this article, I’ll use the expressions deferred loading and lazy loading interchangeably, and you can safely treat the two terms as synonyms. At the end of the day, the expression lazy loading describes how the behavior is perceived: you load as late as possible, so you’re lazy. The expression deferred loading, instead, focuses on how the behavior is implemented: you load as late as possible, so you defer loading.
Deferred (Lazy) Loading Defined
Lazy loading describes the behavior of an object that doesn’t hold all the data it needs but knows how to retrieve the missing data on demand. When you load an instance of, say, a Customer entity in a lazy loading model, only essential data is loaded (for example, only personal information). Later, when some piece of code needs to go through the orders associated with the customer, the missing information is located and loaded.
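To make the definition concrete, here is a minimal hand-rolled sketch of the pattern in C#. It is not how LINQ-to-SQL implements deferred loading; the `Customer` class, the loader delegate, and the sample order IDs are all invented for illustration, and `Lazy<T>` requires .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity, for illustration only.
public class Customer
{
    private readonly Lazy<List<string>> _orders;

    public Customer(string id, Func<string, List<string>> loadOrders)
    {
        Id = id;
        // The loader runs at most once, on first access to Orders.
        _orders = new Lazy<List<string>>(() => loadOrders(id));
    }

    public string Id { get; private set; }
    public bool OrdersLoaded { get { return _orders.IsValueCreated; } }
    public List<string> Orders { get { return _orders.Value; } }
}

public static class Program
{
    public static void Main()
    {
        int roundtrips = 0;
        var customer = new Customer("ALFKI", id =>
        {
            roundtrips++;   // stands in for a database query
            return new List<string> { "order-1", "order-2" };
        });

        Console.WriteLine(customer.OrdersLoaded);   // False: nothing fetched yet
        Console.WriteLine(customer.Orders.Count);   // 2: first access triggers the load
        Console.WriteLine(customer.Orders.Count);   // 2: served from memory
        Console.WriteLine(roundtrips);              // 1: the loader ran only once
    }
}
```

The calling code never asks for the orders to be loaded; it simply reads the property, and the object fills the gap on its own.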
A framework that fully supports lazy loading is capable of recognizing automatically any situation where missing data is required. This is an important point to understand—lazy loading is a feature of the object model that represents your application’s domain. Lazy loading is not implemented by your application code. Let me rephrase this point in the context of a concrete example to sweep away all clouds and doubts around it.
LINQ-to-SQL is a framework that fully implements the lazy loading pattern. Once you hold a reference to a Customer object, you can always access its Orders entity set. If the set has not been loaded yet, the framework fills it for you as your code attempts to access it. As a developer, you don’t have to add a single line of code to handle the loading.
Entity Framework version 1.0 is a framework that doesn’t support the lazy loading pattern. Period. Entity Framework, though, allows you to obtain a similar effect of loading data only when needed. So what’s the point? In Entity Framework, developers are required to explicitly invoke a Load method on the entity to populate dependencies. The team explained this choice with the deliberate intent of making every database roundtrip as explicit as possible. It works and it is understandable and acceptable. Technically speaking, though, this is not lazy loading. (Hence, perhaps, the decision to give it a slightly different name: deferred loading.)
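As a rough sketch of what that explicit call looks like in Entity Framework 1.0, assuming a designer-generated `NorthwindEntities` object context with `Customers` and `Orders` mapped to the Northwind sample database (both names are assumptions, not taken from the article):

```csharp
using System;
using System.Linq;

class ExplicitLoadSketch
{
    static void Main()
    {
        // NorthwindEntities is an assumed, designer-generated ObjectContext.
        using (var context = new NorthwindEntities())
        {
            var customer = context.Customers.First(c => c.CustomerID == "ALFKI");

            // Related orders are not fetched until you ask for them explicitly.
            if (!customer.Orders.IsLoaded)
            {
                customer.Orders.Load();   // one explicit, visible roundtrip
            }

            Console.WriteLine(customer.Orders.Count);
        }
    }
}
```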
Is the distinction between lazy and deferred loading really important?
Well, the fundamental point of design patterns is the common language they represent. Each design pattern carries a specific and unambiguous meaning. If you say lazy loading, you know what you mean, and so do others. You can’t just use known words with a different meaning. It adds confusion and obfuscates concepts.
Deferred (Lazy) Loading in LINQ-to-SQL
As mentioned, lazy loading is enabled by default in LINQ-to-SQL and the framework plans the required queries based on that setting. As you’ll see in a moment, this may hold some unpleasant surprises. Let’s consider the following code snippet:
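A minimal sketch of such a snippet follows. It assumes a designer-generated `NorthwindDataContext` over the Northwind sample database; the shipper ID and the `SendNotification` helper are hypothetical placeholders.

```csharp
using System;
using System.Linq;

class FreightNotifier
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            // Orders delivered through shipper #1 (arbitrary ID for the sketch).
            var orders = from o in db.Orders
                         where o.ShipVia == 1
                         select o;

            foreach (var order in orders)
            {
                if (order.Freight > 300)
                {
                    // First touch of order.Customer for this order.
                    SendNotification(order.Customer);
                }
            }
        }
    }

    // Hypothetical helper; a real one would send an email or queue a message.
    static void SendNotification(Customer customer)
    {
        Console.WriteLine("Notify {0}", customer.CompanyName);
    }
}
```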
The query selects all orders in the data context that have been delivered through a given shipper. Next, you loop over the selected orders, check the freight costs and, if necessary, send a notification to the customer. How many database roundtrips and queries do you think it will take? Why should it take more than just one query? That, at least, is the answer I would expect.
Amazingly, the previous code runs N+1 queries, where N is the number of selected orders whose freight costs exceed 300. How is that possible? Well, it is all about lazy loading being enabled by default.
Because lazy loading is enabled by default in LINQ-to-SQL, the query engine attempts to minimize the costs of each query by looking at the fields that are really used and requested. The first query just requests orders shipped via a given carrier. Reasonably, LINQ-to-SQL just fires the following T-SQL statement:
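The exact text depends on your mapping. Assuming the same designer-generated `NorthwindDataContext` (an assumption, as is the shipper ID), you can print the statement with `DataContext.GetCommand`; the comment shows its approximate shape, with the column list abbreviated.

```csharp
using System;
using System.Linq;

class ShowOrdersQuery
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            var orders = db.Orders.Where(o => o.ShipVia == 1);

            // GetCommand exposes the SQL without executing the query.
            Console.WriteLine(db.GetCommand(orders).CommandText);
        }
    }
}

// Approximate shape of the printed text (column list abbreviated):
//   SELECT [t0].[OrderID], [t0].[CustomerID], ..., [t0].[Freight], ...
//   FROM [dbo].[Orders] AS [t0]
//   WHERE [t0].[ShipVia] = @p0
```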
As you can see, no JOIN is performed on the Customers table and no customer-related information is retrieved at first.
Next, the code enters the loop and, for each selected order, checks the Freight column. If the value is greater than 300, a notification is sent to the customer related to the order. At this point, LINQ-to-SQL figures out that there’s no information in the data context about the customer who placed that order. LINQ-to-SQL realizes it was lazy and makes up for that. However, LINQ-to-SQL has no clue about how many orders you’re going to process, so it keeps on being lazy. So you want the customer related to order #123? Great, you’ll have that, and only that! The following T-SQL statement is issued:
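One way to watch that statement go by is to attach `DataContext.Log` to the console and touch `order.Customer` for a single order. The sketch below again assumes a designer-generated `NorthwindDataContext`; the SQL in the comment is an approximation with the column list abbreviated.

```csharp
using System;
using System.Linq;

class DeferredCustomerLoad
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            db.Log = Console.Out;   // echo every statement sent to the database

            var order = db.Orders.First(o => o.Freight > 300);
            var customer = order.Customer;   // triggers the deferred load
            Console.WriteLine(customer.CompanyName);
        }
    }
}

// The second statement written to the log is the deferred load, roughly:
//   SELECT [t0].[CustomerID], [t0].[CompanyName], ...
//   FROM [dbo].[Customers] AS [t0]
//   WHERE [t0].[CustomerID] = @p0
//   -- @p0 carries the customer ID (PICCO in the article's run)
```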
Needless to say, PICCO is the ID of the customer who placed the first order with freight greater than 300. Got the point? You’ll have one such query for each order whose freight exceeds 300.
What’s the Purpose of Lazy Loading?
What’s the purpose of lazy loading? Wasn’t it to make application performance scream? Or was it just to make developers scream in frustration? Quite simply, lazy loading is not something you can ignore. It is an optional setting that delivers a very well-known behavior. This behavior may, or may not, be helpful in a particular scenario. If not, you just disable lazy loading. Lazy loading is controlled at the data context level, as shown below:
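A minimal sketch of that switch, assuming the same designer-generated `NorthwindDataContext` (the context name and the sample query are assumptions):

```csharp
using System;
using System.Linq;

class DisableDeferredLoading
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            // Turn lazy loading off for everything this context retrieves.
            db.DeferredLoadingEnabled = false;

            var order = db.Orders.First(o => o.Freight > 300);

            // No follow-up query is issued for the related customer;
            // the documented behavior is that the reference stays null.
            Console.WriteLine(order.Customer == null);
        }
    }
}
```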
If you know beforehand that you are going to work on orders and customers, you let LINQ-to-SQL know about that. More generally, whenever you’re working with an O/RM tool, you must provide it with any information that can be used to optimize the SQL statements. And even more importantly, you must carefully check any SQL code it emits. It is your responsibility as a developer, or architect, to ensure that the database code is the best possible and that there are no obvious inefficiencies like the one shown earlier. The database profiler is your best friend.
That said, let’s see which tools you have in LINQ-to-SQL to bend lazy loading to your real needs.
Disabling Lazy Loading
In LINQ-to-SQL, lazy loading is controlled by a Boolean property exposed by the DataContext class. By setting the DeferredLoadingEnabled property to false, you disable lazy loading altogether. What does it mean, exactly? Consider the following statement:
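A sketch of such a statement, wrapped in just enough code to run it. It assumes the same designer-generated `NorthwindDataContext`; the loop that prints order counts is added for illustration.

```csharp
using System;
using System.Linq;

class LondonCustomers
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            db.DeferredLoadingEnabled = false;

            var customers = from c in db.Customers
                            where c.City == "London"
                            select c;

            foreach (var customer in customers)
            {
                // Orders is never populated behind the scenes now.
                Console.WriteLine("{0}: {1}", customer.CustomerID, customer.Orders.Count);
            }
        }
    }
}
```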
It selects all customers from London and packs them into an enumerable collection. With lazy loading disabled, try counting the orders a given customer has placed. You invariably receive 0 and need to arrange an ad hoc query yourself to get the number. With lazy loading disabled, you only get what you explicitly ask for. This is also the behavior you get in Entity Framework 1.0.
Specify Your Fetch Plan
Having lazy loading enabled may generate a less than optimal sequence of calls to a database. On the other hand, disabling lazy loading forces you to write more code than is reasonably necessary when using an O/RM tool. If you write all of your code from A to Z refusing any productivity enhancer, then lazy loading, like anything else, is up to you. Disabling lazy loading wipes out much of the benefit of having LINQ-to-SQL, or any other O/RM, aboard. In the end, there’s a lot of room between the two extremes of having lazy loading enabled or disabled. For example, you can specify your own fetch plan.
In general, a fetch plan indicates your strategy to retrieve data within a class. Lazy loading, for instance, is just a hardcoded strategy to retrieve data within the classes of your model. In this case, you have a static and immutable fetch plan. A good O/RM tool will always provide support for a dynamic fetch plan.
In LINQ-to-SQL, you specify your own fetch plan using the DataLoadOptions class, as shown below. Note that the class works in conjunction with DeferredLoadingEnabled set to true. In other words, a dynamic fetch plan is still a form of lazy loading.
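A minimal sketch of a fetch plan for the orders-and-customers scenario, assuming the same designer-generated `NorthwindDataContext` and a hypothetical shipper ID:

```csharp
using System;
using System.Data.Linq;
using System.Linq;

class FetchPlanSketch
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            // Ask for the related customer to be loaded along with each order.
            var options = new DataLoadOptions();
            options.LoadWith<Order>(o => o.Customer);

            // Assign the plan before the context returns any results.
            db.LoadOptions = options;

            var orders = from o in db.Orders
                         where o.ShipVia == 1
                         select o;

            foreach (var order in orders)
            {
                if (order.Freight > 300)
                {
                    // Customer data is already in memory at this point.
                    Console.WriteLine("Notify {0}", order.Customer.CompanyName);
                }
            }
        }
    }
}
```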
The DataLoadOptions class features a method named LoadWith through which you specify the relationships you want the tool to take into account when running a query. In the preceding code snippet, you instruct LINQ-to-SQL to load customer information for each order and to ignore any other relationships between involved entities. Now if you go back to the first example shown where you send notification to customers subject to high freight costs, you’ll face a different T-SQL query:
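To see that query for yourself, you can attach `DataContext.Log` and run the orders query with the load option in place. The sketch assumes the same designer-generated `NorthwindDataContext`; the SQL in the comment is a rough approximation with abbreviated column lists, and the exact text may differ with your mapping and version.

```csharp
using System;
using System.Data.Linq;
using System.Linq;

class ShowJoinedQuery
{
    static void Main()
    {
        // NorthwindDataContext: assumed designer-generated DataContext.
        using (var db = new NorthwindDataContext())
        {
            var options = new DataLoadOptions();
            options.LoadWith<Order>(o => o.Customer);
            db.LoadOptions = options;

            db.Log = Console.Out;   // echo every statement sent to the database

            // ToList forces execution so the statement is logged.
            var orders = db.Orders.Where(o => o.ShipVia == 1).ToList();
            Console.WriteLine(orders.Count);
        }
    }
}

// Roughly what gets logged: one statement, customers joined in.
//   SELECT [t0].[OrderID], ..., [t2].[CustomerID] AS [CustomerID2], [t2].[CompanyName], ...
//   FROM [dbo].[Orders] AS [t0]
//   LEFT OUTER JOIN (
//       SELECT 1 AS [test], [t1].[CustomerID], [t1].[CompanyName], ...
//       FROM [dbo].[Customers] AS [t1]
//       ) AS [t2] ON [t2].[CustomerID] = [t0].[CustomerID]
//   WHERE [t0].[ShipVia] = @p0
```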
Once you have provided LINQ-to-SQL with enough information, a much more optimized query is planned and executed.
O/RM tools are powerful instruments that save developers a lot of work and are becoming more necessary every day. However, you cannot blindly delegate the generation of database code to tools. Make sure you cross-check any T-SQL code and use the SQL profiler extensively during development. If you don’t like the database code being generated, change it. Sometimes you can do better by simply tweaking the O/RM configuration; sometimes you need to manually rewrite the T-SQL code.
Dino Esposito is one of the world's authorities on Web technology and software architecture. Dino published an array of books, most of which are considered state-of-the-art in their respective areas. His most recent books are “Microsoft ® .NET: Architecting Applications for the Enterprise” and “...
This author has published 54 articles on DotNetSlackers.