Introduction
Garbage collection is one of the most important aspects of compiler design, with a direct impact on a program's performance and speed of execution. The .NET Framework has a powerful and sophisticated garbage collector in its virtual machine, yet it is not widely understood. It saw no major changes from the initial release of version 1.0 of the framework until the most recent version, 4.0, where the background collector was added to improve the quality of garbage collection. This article reviews the principles of garbage collection, explores the fundamentals of garbage collection in .NET, and discusses the changes to the garbage collector in .NET Framework 4.0.
As a software developer, understanding certain aspects of the compiler you use is a key factor in building high-quality software with respect to parameters such as performance and resource usage. Many advanced details of compiler design and implementation are not a primary concern for a programmer, but some of them are closely tied to making better use of your compiler and programming language. In this article we explore one of the most important of these aspects, garbage collection, which has a huge impact on the memory use and performance of a program.
We split the discussion into three major sections:
- General discussion of principles of garbage collection and related techniques in compilers
- How garbage collection is implemented in the .NET Framework
- Major changes applied to garbage collection in .NET Framework 4.0
Note that the .NET Framework offers a rich set of APIs for manual control of garbage collection through the GC class in the System namespace, which is beyond the scope of this article. Here we aim to understand the fundamentals of garbage collection. In most common programming scenarios with .NET you don't need to use this class and its APIs unless your code uses unmanaged resources.
Principles of Garbage Collection
Essentially, compilers use two types of memory allocations for variables that you create in your programs:
- Stack allocation: Variables that are alive only within the scope of a single method are allocated on a stack that keeps track of an activation record for each method call. An activation record is pushed onto the stack before a method executes and popped off after it returns, at which point the variables allocated in it are deallocated automatically.
- Heap allocation: Certain variables in a program must outlive the method that creates them, such as global variables, variables referenced through pointers, and parameters passed by reference, among others. Such objects are allocated on a heap and kept there as long as needed.
Obviously, heap allocation needs special attention, because we need mechanisms both to allocate a portion of the heap to an object and to release that portion when we're done with it. This process is explicit in languages like C and C++, where you call functions like malloc and free yourself.
This is challenging for programmers, as they need to take care of the lifetime of every variable and write extra code by hand to keep track of the objects they create. Newer generations of compilers hide this requirement by automating the release of variables. After creating a new object with an instruction like new (which is similar to malloc in C), you no longer need to take care of deallocating the memory. These compilers, including the .NET Framework, implement a technique commonly known as garbage collection, in which they use different algorithms to detect objects allocated on the heap that are no longer needed and deallocate them when available memory is low.
At first glance, garbage collection sounds like an unimportant and easy task, but it is in fact important and challenging, and both academia and industry dedicate much time and effort to new techniques that improve on existing methods. Garbage collection alone can determine the success of a compiler, because typical programs allocate many objects on the heap.
In general, there are two main challenges for garbage collection:
- Finding the memory slots to deallocate from the heap. The main challenge here is to make sure that the slots we find are truly no longer needed (think about dangling pointers and how they can make things worse). A related challenge is to choose slots that are least likely to be needed again in the future.
- When to perform garbage collection. The process of garbage collection is usually expensive and can have an impact on the resource usage and performance of code execution, so we need to find the best time to perform garbage collection. The tradeoff is between doing that too early or too late. If it is done too early, we're wasting our resources for something that is probably not necessary, and if it is done too late, we're making the penalty bigger.
These two challenges have driven the research community to develop different algorithms and techniques, each with its own pros and cons; one of them, or a combination, is selected based on the type of compiler and its applications. Some of the major techniques are Reference Counting, Tracing Collection, Mark and Sweep, Pointer Reversal, Stop and Copy, Generational Collection, and Conservative Collection. Although knowing about these techniques can be very helpful (e.g., Generational Collection is used in the .NET Framework, as we will see later), they are beyond the scope of this article and are left to the interested reader to explore in more detail.
Garbage Collection in the .NET Framework
Like many other modern operating systems, Windows is built on top of a virtual memory infrastructure that hides physical memory from developers so that they work only with virtual memory addresses. Each process has its own dedicated virtual address space, and .NET works with the Windows API to manipulate virtual memory for the process. The threads inside a process share the same virtual memory and address space.
As you may guess, allocating and deallocating virtual memory can lead to fragmentation: there may be many small free blocks when you need a single contiguous block to allocate to your process. Windows takes care of this issue with several algorithms that are beyond the scope of this article.
Garbage collection in the .NET Framework happens in three cases. First, if physical memory is low, some objects must be deallocated to free space. Second, if the memory used on the heap passes a threshold, some space must be reclaimed. Third, a collection can be triggered manually by calling the GC.Collect method.
The .NET Framework allocates a part of memory for storing your objects. This is called the managed heap, and it is different from the heap that is managed by the operating system. Each process has its own managed heap, which all the threads in the process share. The managed heap consists of a large object heap and a small object heap, which hold large and small objects respectively, although for many purposes we can consider the managed heap a single entity.
Generally, the garbage collector in .NET tries to keep a good balance between two priorities by adjusting its thresholds: keeping the amount of memory occupied by an application or process at a reasonable level, and reducing the time spent on garbage collection.
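The three triggers above can be sketched as a simple decision function. This is an illustrative model in Python, not the CLR's actual logic: the HeapState class, its field names, and the order in which the conditions are checked are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HeapState:
    """Hypothetical snapshot of the values a collector might consult."""
    physical_memory_low: bool   # the operating system reports memory pressure
    allocated_bytes: int        # bytes currently allocated on the managed heap
    threshold_bytes: int        # current allocation threshold
    explicit_request: bool      # the program asked for a collection


def collection_reason(state: HeapState):
    """Return why a collection should start, or None if none is needed."""
    if state.physical_memory_low:
        return "low physical memory"
    if state.allocated_bytes > state.threshold_bytes:
        return "heap threshold exceeded"
    if state.explicit_request:
        return "explicit request"
    return None
```

For instance, collection_reason(HeapState(False, 150, 100, False)) reports that the heap threshold was exceeded, while a state with no pressure and no request returns None.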
Generations
To perform effectively, the .NET garbage collector follows a generational collection approach, which splits the heap into different generations for short-lived and long-lived objects. In essence, an object that stays in use is promoted from a lower generation, which is collected more often, to a higher generation, which is collected less often. The .NET collector tries to keep long-lived objects in higher generations so that it does not have to examine them in every collection.
There are three generations in the .NET Framework on the managed heap:
- Generation 0: A generation for short-lived objects that are garbage collected more often than others. Most objects in this generation are deallocated before they can go to the next generation.
- Generation 1: This is like a bridge between generation 0 and generation 2 to keep short-lived objects before they can be promoted to the long-lived level.
- Generation 2: A generation for long-lived objects like some server objects that appear in ASP.NET and are alive for the duration of the application lifetime.
Note that a newly created object is placed based on its size: a small object goes in generation 0, whereas a large object goes on the large object heap, which is collected together with generation 2. Because they hold short-lived objects, generations 0 and 1 are called ephemeral generations.
An object that is not deallocated during a garbage collection of its generation is called a survivor, and survivors are promoted to the next generation. Objects that survive a collection of generation 2 stay there. The garbage collector is smart enough to adjust its thresholds based on how many objects are promoted to the next level, so if many objects survive a generation, it changes that generation's threshold to improve performance and memory use.
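The promotion rule can be illustrated with a toy model. The sketch below is written in Python for brevity and is not how the CLR is implemented: the GenerationalHeap class and its method names are hypothetical, liveness is passed in as a ready-made set, and the sketch assumes that collecting a generation also collects every younger generation.

```python
class GenerationalHeap:
    """Toy model of a three-generation heap; not the CLR's implementation."""

    MAX_GENERATION = 2

    def __init__(self):
        # generations[g] holds the names of the objects currently in generation g
        self.generations = [set(), set(), set()]

    def allocate(self, name):
        """New (small) objects always start in generation 0."""
        self.generations[0].add(name)

    def generation_of(self, name):
        for g, members in enumerate(self.generations):
            if name in members:
                return g
        return None  # the object has been collected

    def collect(self, generation, live):
        """Collect `generation` and every younger one.

        `live` is the set of names that are still reachable. Dead objects
        are dropped; survivors move up one generation, except that
        generation 2 survivors stay in generation 2.
        """
        # Walk from the oldest collected generation down so that an object
        # promoted in this round is not promoted a second time.
        for g in range(generation, -1, -1):
            survivors = self.generations[g] & live
            self.generations[g] = set()
            target = min(g + 1, self.MAX_GENERATION)
            self.generations[target] |= survivors
```

In this model, an object that is still live after a generation 0 collection moves to generation 1, moves to generation 2 after the next collection that includes generation 1, and then stays in generation 2 until it is no longer reachable.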
How Garbage Collection Works
The .NET Framework uses the concept of roots to find the objects that are candidates for deallocation from the managed heap. A root is a storage location that points to an object on the managed heap.
The garbage collector examines the objects on the managed heap to determine whether each one is reachable from a root, and it does this by walking the object graph. Each object is examined only once, so circular references do not cause problems. If an object is not reachable from any root, it can be evicted from the managed heap, and the garbage collector can compact the managed heap after such evictions.
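A minimal sketch of this root-based search follows, in Python. It is a generic mark-and-compact illustration under simplifying assumptions, not the CLR's algorithm: objects are plain names, the object graph is a dictionary called references, and the heap is a list in address order. All of these names are made up for the example.

```python
def mark(roots, references):
    """Return the set of objects reachable from `roots`.

    `references` maps each object name to the names it points to.
    The `marked` set ensures each object is visited only once, so cycles
    in the object graph cannot cause an endless loop.
    """
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue
        marked.add(obj)
        pending.extend(references.get(obj, ()))
    return marked


def collect(heap, roots, references):
    """Evict unreachable objects and compact the survivors.

    `heap` is a list of object names in address order. The returned list
    keeps the survivors in their original order with the gaps removed.
    """
    marked = mark(roots, references)
    return [obj for obj in heap if obj in marked]
```

With heap = ["a", "b", "c", "d", "e"], roots ["a", "e"], and references in which a and b point to each other and c and d point to each other, collect keeps a, b, and e. The c and d pair is evicted even though each one references the other, because neither is reachable from a root.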
Workstation and Server Garbage Collection
Many aspects of garbage collection are controlled automatically by the .NET Framework, which sets its thresholds based on the performance of the runtime. However, developers are given a few options for working with the garbage collector. One of these options is to configure garbage collection for your application in your web or application configuration file using the gcServer element. Using this element, you can set the garbage collector to work in workstation or server mode.
The workstation mode is designed for client systems like personal computers and can run concurrently or non-concurrently. Concurrent garbage collection lets application threads keep running during a garbage collection; it is replaced by background garbage collection in .NET Framework 4.0 (discussed later).
On the other hand, server garbage collection is designed for applications running on servers that have specific requirements such as throughput and availability.
You can set the enabled attribute of the gcServer element to false to enable workstation garbage collection (which is the default behavior), or to true to enable server garbage collection.
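As a configuration fragment, the setting looks roughly like the following. The gcServer element goes inside the runtime section of the configuration file; the rest of the file is omitted here.

```xml
<configuration>
  <runtime>
    <!-- true = server garbage collection; false = workstation (the default) -->
    <gcServer enabled="true" />
  </runtime>
</configuration>
```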
Concurrent Garbage Collection
Normally, garbage collection blocks all threads of execution except the one that requested the collection, which can pause the application. To reduce this side effect, there is an alternative mechanism called concurrent garbage collection, in which application threads execute concurrently with a dedicated thread that performs the collection. It works only for generation 2 objects; it is not applied to the ephemeral generations because their collection is faster.
The default garbage collection for workstation mode is concurrent collection but if you have many processes running, it's better to disable this option.
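The article does not name the setting, but in .NET Framework configuration files concurrent collection is controlled by the gcConcurrent runtime element. A fragment that turns it off might look like this:

```xml
<configuration>
  <runtime>
    <!-- false disables concurrent collection; true is the workstation default -->
    <gcConcurrent enabled="false" />
  </runtime>
</configuration>
```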
Background Garbage Collection
The five major releases of the .NET Framework before 4.0 (1.0, 1.1, 2.0, 3.0, and 3.5) either did not update the CLR or, when they did, made no major modifications to the garbage collector. .NET 4.0 introduces background garbage collection, which is essentially a replacement for and an improvement on concurrent garbage collection. Background garbage collection is available only in workstation mode.
Background garbage collection is performed by default when concurrent garbage collection is enabled, and it is applied to generation 2 objects (and not ephemeral generations). It also uses a dedicated thread to perform garbage collection.
Foreground garbage collection is performed on the ephemeral generations whenever necessary. When this happens, the background garbage collection, which runs on a dedicated thread, suspends itself; after the foreground collection completes, the background one resumes.
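The suspend-and-resume interplay can be modelled without real threads. The Python sketch below is a deterministic simulation under stated assumptions, not the CLR's mechanism: the background collection is a generator that does one unit of work per step, and the function names, the step-numbered foreground_requests set, and the log messages are all invented for the illustration.

```python
def background_collection(gen2_objects, log):
    """Generator modelling a generation 2 collection done in small steps.

    Each `yield` is a point where the background collector can be
    suspended so that a foreground collection can run.
    """
    for obj in gen2_objects:
        log.append(f"background: examined {obj}")
        yield


def foreground_collection(log):
    """Models a short, blocking collection of generations 0 and 1."""
    log.append("foreground: collected ephemeral generations")


def run(gen2_objects, foreground_requests):
    """Drive the background collection one step at a time.

    `foreground_requests` is a set of step numbers at which an ephemeral
    collection becomes necessary. At those steps the background work is
    suspended (the generator is simply not advanced), the foreground
    collection runs to completion, and the background work then resumes.
    """
    log = []
    background = background_collection(gen2_objects, log)
    step = 0
    while True:
        if step in foreground_requests:
            log.append("background: suspended")
            foreground_collection(log)
            log.append("background: resumed")
        try:
            next(background)
        except StopIteration:
            break
        step += 1
    return log
```

Calling run(["x", "y", "z"], {1}) produces a log in which x is examined, the background work is suspended for one foreground collection, and then y and z are examined after it resumes.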
Summary
Garbage collection is one of the most important aspects of a compiler, and developers should understand it. In .NET it saw no major changes from version 1.0 until version 4.0, where the background garbage collector was introduced to improve the quality of garbage collection. In this article we discussed the principles of garbage collection in compilers in general, explored how garbage collection works in the .NET Framework, and wrapped up by reviewing the new background collector in .NET 4.0.
About Keyvan Nayyeri
Keyvan Nayyeri is a Ph.D. student in Computer Science and already has a B.Sc. degree in Applied Mathematics.
His primary research interests are Software Engineering and Programming Languages & Compilers. He’s also a software architect and developer with a focus on Microsoft stack of developm...