Published: 06 Jun 2011
By: Scott Mitchell
Download Sample Code

This article looks at using NPOI to programmatically generate Excel spreadshets. NPOI is a free, open-source library for creating and reading Excel spreadsheets and started as a port of the Java POI project.

Contents [hide]

Introduction

Microsoft Excel is, perhaps, the most popular business analysis tool; countless organizations across all types of industry rely on Excel each and every day to make sense of their sales, customer metrics, and other data. As a developer, Excel's predominance comes into sharp focus when building an intranet application, as one of the most common feature requests you'll hear is, "Can we have that report available in Excel?"

There are a variety of techniques to programmatically create Excel spreadsheets, which include:

  • Rendering the data to display as a CSV file, XML file, or HTML <table>, which Excel can parse and display.
  • Using a reporting framework like Crystal Reports or SQL Server Reporting Services, which include built-in options to export reports to Excel.
  • Working with the Open XML SDK for Microsoft Office to create an Excel spreadsheet.
  • Turning to a third-party library, of which there are many commercial and open-source choices.

Each option has its pros and cons. For instance, rendering the data to display as an HTML <table> is usually the quickest and easiest approach of the lot, but limits your formatting and layout options significantly.

This article looks at using NPOI to programmatically generate Excel spreadshets. NPOI is a free, open-source library for creating and reading Excel spreadsheets and started as a port of the Java POI project. Like with the other techniques, NPOI has its own set of pros and cons. The pros: NPOI creates .xls-format Excel spreadsheets, which can be opened in all Microsoft Excel versions dating back to Excel 97; the API is pretty straightforward and easy to learn; and NPOI does not use Office automation to create Excel spreadsheets, meaning that you do not need to have Microsoft Office installed on the web server. The cons: creating non-trivial spreadsheets requires a bit of code, as you essentially have to create the spreadsheet row-by-row and cell-by-cell; also, the NPOI documentation is lacking, although there are enough examples and articles online to get up to speed rather quickly.

To get started with NPOI you need the NPOI.dll assembly, which can be downloaded from the NPOI project page on CodePlex. Once you have the assembly, you'll need to add a reference to it in your web application by right-clicking on the website in Solution Explorer, choosing Add Reference, and then browsing to the assembly.

NOTE

You can get your hands on the NPOI assembly by downloading the demo application for this article, which uses NPOI version 1.2.3. You'll find this assembly in the demo application's Bin folder. Alternatively, you can download the latest version of NPOI from its project page, http://npoi.codeplex.com.

An Overview of Excel and the Terminology Used in this Article

This article includes a lot of jargon that not everyone may be familiar with. Before we dive into the meat of the article, I'd like to take a minute to define the terms I use in this article so as to help avoid any confusion. Figure 1 shows an Excel spreadsheet, which is the entirety of the Excel document. This particular spreadsheet was generated by NPOI using code that is part of the demo available for download. In particular, the spreadsheet reports product sales from the Northwind database for 1997.

A spreadsheet contains a collection of sheets. The sheets are displayed as tabs at the bottom left corner of the spreadsheet. For example, the spreadsheet in Figure 1 has two sheets, Summary and Details, with the Details sheet being currently displayed.

A sheet is a collection of rows. The spreadsheet in Figure 1 shows rows 1 through 32, although in this particular sheet there are a total of 1,290 rows.

A row is a collection of cells. Cells can contain static values – text, numbers, dates, and so on – or they can hold formulae that report a computed value. Cells D20, E20, and F20 in Figure 1 have forumale that compute the summation of the Unit Price, Quantity, and Total amounts for the product (Alice mutton, in the case of cells D20, E20, and F20).

Figure 1: A spreadsheet is a collection of sheets, which is a collection of rows, which is a collection of cells.

A spreadsheet is a collection of sheets, which is a collection of rows, which is a collection of cells.

Creating Your First Excel Spreadsheet with NPOI

NPOI's classes and interfaces model the components of an Excel spreadsheet. For example, the Workbook interface defines the properties and methods necessary for a spreadsheet. Similarly, the Sheet, Row, and Cell interfaces define properties and methods for the sheets, rows, and cells.

Creating an Excel spreadsheet with NPOI involves two high-level steps:

  1. Building the spreadsheet model, which involves constructing the necessary objects and setting their properties accordingly, and
  2. Converting the NPOI-representation of the spreadsheet into an actual Excel spreadsheet, which can be saved to the web server's file system or streamed directly to the visitor's browser.

To illustrate these two steps let's build an ASP.NET page that generates a rather simple Excel spreadsheet. In particular, let's create an Excel spreadsheet that lists the details about each user account on the website. (This code and the subsequent examples are included in the download associated with this article.)

Before we look at the code itself, let's take a sneak peek at the final results so that our goal is clear. Figure 2 shows the generated Excel spreadsheet. There is a row for each user in the system listing their username, email, the date they joined, their last login date, whether or not they are approved, and any comments.

Figure 2: The Excel spreadsheet lists the details about each user on the site.

The Excel spreadsheet lists the details about each user on the site.

First, create a new ASP.NET page (or ASP.NET MVC Controller) and add the using statements in Listing 1. The first six using statements are for namespaces in the NPOI library. The System.IO namespace is useful because very often you'll want to take the spreadsheet created via the NPOI library and either save it to disk or to a MemoryStream whose contents will be streamed down to the browser.

Listing 1: Add these using statements to your class file.

Next, we need to get the details about all of the user accounts on the website. Presuming you are using ASP.NET’s Membership system you can use the Membership.GetAllUsers method to retrieve a complete list of users (which is why I included the System.Web.Security namespace in Listing 1).

Listing 2: Retrieve details about the users on your site.

We're now ready to start building the spreadsheet! NPOI's HSSFWorkbook class models an Excel spreadsheet; consequently, programmatically creating a spreadsheet using NPOI starts with creating a new HSSFWorkbook object. Once we have a workbook object at the ready we can add sheets to it by calling the CreateSheet method, which returns an object that implements the Sheet interface.

The code in Listing 3 creates a new workbook object and then adds a single sheet named "User Accounts."

Listing 3: Create a new workbook object and add a sheet.

Next we need to create the header row, which labels the data points displayed for each user – Username, Email, and so forth. To add a row to a sheet use the Sheet's CreateRow method, passing in the index of the row. CreateRow returns an object that implements the Row interface. Using this row object you can add one or more cells to the row by calling its CreateCell method. Similarly, CreateCell returns an object that implements the Cell interface. The cell object has a variety of methods and properties, one of the most germane being SetCellValue, which assigns a value to the cell.

The code in Listing 4 generates the header row. It starts by creating a new row in position 0. Next, the row's CreateCell method is called six times to create the six cells for the header row. Each cell has its value assigned – Username, Email, etc. – via the SetCellValue method.

Because the CreateCell method returns an object that implements the Cell interface you can use method chaining to call the SetCellValue. This is a useful shortcut if you don't need to call any other methods of the cell or assign any of its properties. Also, note that the index for both CreateRow and CreateCell starts at zero. To add the first row – which is labelled as row 1 in the Excel spreadsheet – you'd call CreateRow(0).

Listing 4: Create the header row and its cells.

We're now ready to add the data rows – namely, those rows that represent the user accounts on the website. This is done by looping through the collection of MembershipUser objects in userAccounts and adding a new row with cells that hold the values for each user's fields of interest – Username, Email, and so on (see Listing 5). The rowIndex variable, created and initialized to 0 in Listing 4, is incremented at each step through the loop and instructs NPOI where to place the new row.

Listing 5: A row is added to the sheet for each user account.

There are two minor steps that I've omitted from Listings 1 through 5 for brevity – namely, adding the "Report generated on ..." message after the last user account and auto-sizing each of the columns – but other than those unessential tasks we've got our spreadsheet constructed in NPOI.

At this point we need to decide what to do with this in-memory representation of a spreadsheet. Do we want to save it to the web server's harddrive or do we want to stream it directly to the visitor? The workbook object has a Write method that will save the Excel spreadsheet to a Stream of our choosing. For example, to save the spreadsheet to a file use the code in Listing 6.

Listing 6: Save the spreadsheet to a file on the web server’s file system.

Streaming the Excel spreadsheet back to the client so that it will open in the browser takes a bit more work as we have to:

  • Tell the browser that we're sending it a Microsoft Excel spreadsheet (as opposed to an HTML document),
  • Instruct the browser to treat this as an attachment so that is will open the View / Save As dialog box, and
  • Send the contents of the spreadsheet back to the client in the response.

These three steps are handled by the code in Listing 7, which starts by creating a new MemoryStream object. The spreadsheet is then "saved" to this MemoryStream via the workbook's Write method. Next, the Response object's Content-Type and Content-Disposition HTTP headers are configured (steps 1 and 2 above) and then the contents of the spreadsheet are streamed down to the client via a call to Response.BinaryWrite (step 3). Note that the Content-Disposition header is set such that the browser will suggest saving the spreadsheet using the filename MembershipExport-date.xls, where date is the month, day, and year the report was run; for example, MembershipExport-4-29-2011.xls.

Listing 7: The Excel spreadsheet is saved to a MemoryStream and streamed to the client.

Figure 3 shows the effects of the code in Listing 7 when viewed through Internet Explorer 9. Here, the user has clicked the Generate Report button, which triggered a postback. On postback, the Excel spreadsheet was created, saved to a MemoryStream, and streamed back to the browser. The browser then prompts the user whether to open the file (MembershipExport-4-29-2011.xls) or to save it.

Figure 3: The user is prompted to open or save the generated Excel spreadsheet.

The user is prompted to open or save the generated Excel spreadsheet.

Multiple Sheets, Styles, and Formulae, Oh My!

The membership report Excel spreadsheet is rather simple and dull. It is composed of a single sheet, employs no formatting, and does not require the use of any formulae. Let's look at a more interesting example.

The demo available for download includes a sales data report from the Northwind database. In short, a user can select the year of sales that interest them and then an Excel spreadsheet is generated that contains two sheets:

  • Summary, which lists the total sales for the year across all products (a single number) and breaks out the sales for each product, and
  • Details, which lists every single sale for the year – the order, date, price, quantity, and total – grouping the sales by product.

Additionally, this generated spreadsheet includes a variety of style settings not seen in the previous example, including:

  • Alignment,
  • Borders,
  • Formatting, and
  • Font weight

And the spreadsheet makes use of formulae, as well. The styles and formulae can be seen in action in Figure 1. For instance, the text in row 20 is bold and has a thin top and bottom border. Cells D20, E20, and F20 each use Excel's SUM formula to sum the figures in their respective columns. And the figures in the D and F columns are formatted as currency values.

The remainder of this article shows how to create and apply styles to cells and how to have a cell's value computed via a formula. Download the demo for a complete working example.

Creating and Using Cell Styles

Applying style settings to a cell involves the following steps:

  1. Create a new object that implements the CellStyle interface,
  2. Set the object's various properties, and
  3. Assign the object to the cell's CellStyle property.

Step 1 is accomplished by calling the workbook's CreateCellStyle method, which creates and returns the object used in steps 2 and 3. Listing 8 shows the code used to create the style for the subtotal row in the Details sheet – namely, the style used on row 20 – along with the code that assigns this style to the cells in the subtotal row.

Listing 8: The cell style for the subtotal row is created, configured, and assigned to the appropriate cell.

Listing 8 shows just the first two cells in the subtotal row having their CellStyle property set to the detailSubtotalCellStyle object. Keep in mind that this style is also assigned to the other cells in this row – cells C through F – but these assignments are omitted from Listing 8 for brevity.

Create Each Distinct Style Once, Not Once Per Cell

Note how in Listing 8 the detailsSubtotalCellStyle style object is created once and then assigned multiple times – once to the "Total" cell, once to the second cell in the row, and so on. When applying styles it's important to create the shared style once and then assign it to the needed cells rather than creating a new style with identical settings for each cell. The reason is because each style object you create – even if it has the same settings as another style object – is recorded in the Excel spreadsheet, and Excel has a limit on how many different cell styles it supports. Consequently, if you have a spreadsheet with many cells and you create a new style object for each cell you may bump into this limitation.

You can also use the style object to define a format string for the cell's value via the style object's DataFormat property. Listing 9 shows the code used to create the currencyCellStyle style object, which is is used for the detail records in columns D and F. Specifically, the style uses a DataFormat property setting that formats the cell's value as a currency.

Listing 9: The style’s DataFormat property is used to display the cell’s text as a currency.

Excel has a number of built-in formats. The code in Listing 9 starts by checking to see if the format string of interest - $#,##0.00 – is a built-in format by calling the HSSFDataFormat.GetBuiltinFormat method. If this format exists, its ID is returned and assigned to the style object's DataFormat property. If the format does not exist a new format object is created by calling the workbooks' CreateDataFormat method and defining its format string.

Computing Cell Values Using Formulae

In the examples we've seen thus far, we've assigned a value to a cell using the SetCellValue method. This works well when displaying a static value in the cell, but part of Excel's power is its ability to use formulae. By assigning a formula to a cell (rather than a hard-coded value), the cell's value is updated automatically when the values of the cells it references are modified.

To assign a formula to a cell, set the cell's type to FORMULA via the SetCellType method and then specify the formula via the CellFormula property. Listing 10 shows the code used to assign the formula to the D column in the subtotal row of the Details sheet, which is the summation of the unit price values for the current product. This is accomplished using Excel's SUM formula, which takes the format: SUM(startCell, endCell). The startRowIndexForProductDetails and rowIndex variables present in Listing 10 mark the first and last row indexes of the current product and are used to craft the formula so that the for the subtotal row ranges over the unit price values for the current product. For instance, for the 1997 sales report for Alice Mutton the resulting formula is SUM(D2:D19).

Listing 10: The SetCellType method and CellFormula property are used to craft the formula for the subtotal row’s Unit Price column.

Because the cells in the D, E, and F columns in the subtotal rows use formulae and not a hard-coded value, if the user manually modifies the detail values in any of those columns the subtotal values will automatically update to reflect the change.

Conclusion

There are a variety of ways to programmatically create an Excel spreadsheet from an ASP.NET website. This article explored one particular approach – using NPOI, a free, open-source library for working with Excel 2003 spreadsheets. With NPOI and a bit of code, you can create spiffy looking, multi-sheet Excel spreadsheets rich with formatting, styling, and forumlae. And, while not explored in this article, NPOI can also be used to read the contents of existing Excel spreadsheets.

Happy Programming!

Further Reading

<<  Previous Article Continue reading and see our next or previous articles Next Article >>

About Scott Mitchell

Scott Mitchell, author of eight ASP/ASP.NET books and founder of 4GuysFromRolla.com, has been working with Microsoft Web technologies since 1998. Scott works as an independent consultant, trainer, ...

This author has published 16 articles on DotNetSlackers. View other articles or the complete profile here.

Other articles in this category


Code First Approach using Entity Framework 4.1, Inversion of Control, Unity Framework, Repository and Unit of Work Patterns, and MVC3 Razor View
A detailed introduction about the code first approach using Entity Framework 4.1, Inversion of Contr...
jQuery Mobile ListView
In this article, we're going to look at what JQuery Mobile uses to represent lists, and how capable ...
Exception Handling and .Net (A practical approach)
Error Handling has always been crucial for an application in a number of ways. It may affect the exe...
JQuery Mobile Widgets Overview
An overview of widgets in jQuery Mobile.
Book Review: SignalR: Real-time Application Development
A book review of SignalR by Simone.

You might also be interested in the following related blog posts


An alternative to Crystal read more
Code Review Singleton Pattern Issues read more
Improved type safety when dealing with generic types, generic methods and reflection read more
ASP.NET Membership Tip: Requiring New Users To Change Their Password When Logging On For The First Time read more
Writing A Page To A String read more
MAB ContainerModel / Funq: a transparent container read more
Silverlight Release History : Q1 2009 SP1 (version 2009.1.413) read more
WPF Release History : Q1 2009 SP1 (version 2009.1.413) read more
Assembly Loading across AppDomains read more
Getting a reference to a behavior read more
Top
 
 
 

Discussion


Subject Author Date
placeholder nice to see another NPOI article Anotheranonymous Anotheranonymous 6/6/2011 6:29 PM

Please login to rate or to leave a comment.