Bulk Importing Content into a SharePoint Publishing Site

Posted by: Clarity Blogs: ASP.NET, on 11 Jan 2009

A little while ago, I posted a question on StackOverflow looking for suggestions on how best to migrate content into a SharePoint publishing site. I got quite a few good suggestions, but none of them felt like a great fit for our needs.

As you may have realized from my last few posts, I'm working on a project to migrate an existing web site to SharePoint. A very important thing for me was to have all the site content in source control, so that we could deploy it into SharePoint repeatably during development and testing.

Additionally, we were working with a topology where all authoring would happen in an Authoring environment behind the firewall, with scheduled Content Deployment jobs responsible for migrating the content to the Production environment. We had to be able to initially populate the content into the Authoring environment so that the content authors (or we) didn't have to do it manually.

We came up with what I think is a pretty original approach to this. The general idea is that since this is a SharePoint Publishing site, all of our content obviously corresponds to a certain content type (inheriting from the Page content type), and is displayed using a specific Page Layout.

Content Type Import Template

This lets us define our site content in Xml, making it part of our source code, and use a simple application to deploy it to a specific site collection.

<?xml version="1.0" encoding="utf-8" ?>
<Pages SiteUrl="/">
  <Page
      Name="AboutUs.aspx"
      Default="false"
      ContentTypeId=""
      PageLayout="GeneralContent.aspx">
    <Columns>
      <Column Name="Title" Id="{fa564e0f-0c70-4ab9-b863-0177e6ddd247}">About Us</Column>
      <Column Name="Page Content" Id="{f55c4d88-1f2e-4ad9-aaa8-819af4ee7ee8}">
        <![CDATA[
        <p>Some Html</p>
        ]]>
      </Column>
      <Column Name="Meta_Keyword" Id="{74dcba16-0809-4bca-87d5-b0fc67d2086f}"></Column>
      <Column Name="Meta_Description" Id="{6b1216bb-3023-462d-af6f-31878e9808bb}"></Column>
    </Columns>
  </Page>
</Pages>

Page Element

What's happening here is actually pretty simple: our Xml file contains a Page element for every Publishing page that we need to import into the SPSite at the url specified by the SiteUrl attribute.

There are several attributes of the Page element:

  • Name - the name of the page when it's created in the Pages document library
  • Default - whether or not this is the Welcome Page of the specific site
  • ContentTypeId - the id of the content type that describes this content (I omitted it from the snippet above because content type ids are so damn long)
  • PageLayout - the page layout to use when creating this page
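
Because the import file is hand-edited, it can help to check these attributes before anything is sent to SharePoint. The following is only a sketch of such a check, not part of the original tool. The class name ImportFileChecks and the two rules it enforces (unique page names, at most one Default page per file) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: sanity-checks an import file before it is deployed.
public static class ImportFileChecks
{
    // Returns human-readable problems; an empty list means the file passed.
    public static List<string> Validate(XDocument importFile)
    {
        var problems = new List<string>();
        List<XElement> pages = importFile.Descendants("Page").ToList();

        // Each Page element should carry all four attributes described above.
        string[] required = { "Name", "Default", "ContentTypeId", "PageLayout" };
        foreach (XElement page in pages)
        {
            foreach (string attribute in required)
            {
                if (page.Attribute(attribute) == null)
                    problems.Add("Page '" + (string)page.Attribute("Name")
                        + "' is missing " + attribute);
            }
        }

        // Assumed rule: two pages in one Pages library cannot share a name.
        IEnumerable<string> duplicates = pages
            .Select(p => (string)p.Attribute("Name"))
            .Where(n => n != null)
            .GroupBy(n => n, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key);
        foreach (string name in duplicates)
            problems.Add("Duplicate page name: " + name);

        // Assumed rule: a site has one Welcome Page, so at most one Default="true".
        int defaults = pages.Count(p => string.Equals(
            (string)p.Attribute("Default"), "true",
            StringComparison.OrdinalIgnoreCase));
        if (defaults > 1)
            problems.Add("More than one page is marked Default");

        return problems;
    }
}
```

Calling ImportFileChecks.Validate(XDocument.Load(path)) and stopping when the returned list is not empty keeps a bad file from producing a half-imported site.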

Columns

Each Page also contains a collection of columns corresponding to the columns in the content type. As I mentioned, all of our content types inherited from the Page content type, allowing us to use its columns such as Title and Page Content.

For each column, we provide both the Name and the Id, for two reasons. The first is readability of our import files. The second is that when creating these pages programmatically, we felt it was safer to reference the columns by their Id.

Dealing with HTML Content

The Page Content column contains Publishing HTML. We can specify this in our Xml import files by wrapping the HTML content in a <![CDATA[ ]]> block. This is great because we can paste our HTML in here without escaping it or worrying about Xml validation. The one exception is HTML that itself contains the sequence ]]>, which would end the block early.
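
To make the CDATA behavior concrete, here is a small sketch using LINQ to XML. It shows that the Value of an element returns the CDATA content with its markup untouched. The class and method names are made up for this example, and the Trim call is an addition of mine to drop the indentation around the HTML, not something the import tool is known to do.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper showing how a CDATA-wrapped column value is read back.
public static class CDataDemo
{
    public static string ReadColumnValue(string columnXml)
    {
        XElement column = XElement.Parse(columnXml);

        // Value concatenates the element's text and CDATA nodes as-is,
        // so tags and entities inside the CDATA block are not interpreted.
        // Trim removes the line breaks and indentation around the HTML.
        return column.Value.Trim();
    }
}
```

For a column written as in the template above, with <p>Some Html</p> inside the CDATA block, ReadColumnValue returns just the <p>Some Html</p> string, ready to be assigned to the list item.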

Importing the Content

Before we look at the code to import this content, let's define a couple of classes that will help us:

public class Page
{
    public string Name { get; set; }
    public string ContentTypeId { get; set; }
    public string PageLayout { get; set; }
    public List<Column> Columns { get; set; }
    public bool Default { get; set; }
}
 
public class Column
{
    public string Name { get; set; }
    public Guid Id { get; set; }
    public string Value { get; set; }
}

After specifying a particular Xml file to import content from, we use some LINQ to get the set of Pages:

// xmlFile is assumed to be an XDocument loaded from the import file,
// for example with XDocument.Load(path)
var query = from xElem in xmlFile.Descendants("Page")
    select new Page
    {
        Name = xElem.Attribute("Name").Value,
        ContentTypeId = xElem.Attribute("ContentTypeId").Value,
        PageLayout = xElem.Attribute("PageLayout").Value,
        Default = Boolean.Parse(xElem.Attribute("Default").Value),
        Columns = (from column in xElem.Descendants("Column")
                   select new Column
                   {
                       Name = column.Attribute("Name").Value,
                       Id = new Guid(column.Attribute("Id").Value),
                       Value = column.Value
                   }).ToList()
    };
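
One thing to be aware of with the query above is that Attribute("...").Value throws a NullReferenceException when an attribute is left out of the file. A slightly more forgiving variant, sketched below, uses the explicit conversions that LINQ to XML defines on XAttribute. The Page and Column classes are repeated so the block stands alone, and the ImportFileReader name and the choice to treat a missing Default as false are assumptions of mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Column
{
    public string Name { get; set; }
    public Guid Id { get; set; }
    public string Value { get; set; }
}

public class Page
{
    public string Name { get; set; }
    public string ContentTypeId { get; set; }
    public string PageLayout { get; set; }
    public List<Column> Columns { get; set; }
    public bool Default { get; set; }
}

// Hypothetical reader: same shape as the query above, tolerant of missing attributes.
public static class ImportFileReader
{
    public static List<Page> ReadPages(XDocument xmlFile)
    {
        return (from xElem in xmlFile.Descendants("Page")
                select new Page
                {
                    // Casting a missing XAttribute to string yields null instead of throwing
                    Name = (string)xElem.Attribute("Name"),
                    ContentTypeId = (string)xElem.Attribute("ContentTypeId"),
                    PageLayout = (string)xElem.Attribute("PageLayout"),
                    // Assumption: a page with no Default attribute is not the Welcome Page
                    Default = (bool?)xElem.Attribute("Default") ?? false,
                    Columns = (from column in xElem.Descendants("Column")
                               select new Column
                               {
                                   Name = (string)column.Attribute("Name"),
                                   // The Id is still required; a missing one fails here
                                   Id = new Guid((string)column.Attribute("Id")),
                                   Value = column.Value
                               }).ToList()
                }).ToList();
    }
}
```

Returning a List rather than a deferred query also means the file is read once, up front, before any SharePoint objects are opened.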

This now becomes a matter of iterating through each Page element and programmatically creating and publishing the Page:

// siteURL and siteName are supplied by the caller.
// Open the site collection and the web once, rather than once per page.
using (SPSite siteCollection = new SPSite(siteURL))
using (SPWeb site = siteCollection.AllWebs[siteName])
{
    PublishingWeb publishingWeb = PublishingWeb.GetPublishingWeb(site);

    foreach (var page in query)
    {
        // Reuse the page if it already exists in the Pages library
        List<PublishingPage> pages =
            publishingWeb.GetPublishingPages().ToList();

        PublishingPage pubPage = pages.Find(
            delegate(PublishingPage pp)
                { return pp.Name == page.Name; });

        if (pubPage != null)
        {
            // Page exists: check it out so it can be updated
            pubPage.CheckOut();
        }
        else
        {
            //get ContentTypeID
            SPContentTypeId contentTypeId =
                new SPContentTypeId(page.ContentTypeId);

            //get PageLayout by name from those available for the content type
            List<PageLayout> layouts =
                publishingWeb.GetAvailablePageLayouts(contentTypeId).ToList();
            PageLayout layout = layouts.Find(
                delegate(PageLayout pl)
                    { return pl.Name == page.PageLayout; });

            // Fail with a clear message instead of passing a null layout to Add
            if (layout == null)
                throw new InvalidOperationException(
                    "Page layout not found: " + page.PageLayout);

            pubPage = publishingWeb.GetPublishingPages().Add(page.Name, layout);
        }

        // Columns are set by Id rather than by display name
        foreach (Column siteColumn in page.Columns)
            pubPage.ListItem[siteColumn.Id] = siteColumn.Value;

        pubPage.Update();

        // Note: page.Default is read from the Xml but not acted on here

        //check in and publish; a library with content approval enabled
        //would also need an Approve call on the file
        SPListItem newPage = pubPage.ListItem;
        newPage.File.CheckIn("Checked in on creation");
        newPage.File.Publish("Published on creation");
    }
}

Conclusion

There are a bunch of things we can improve on here, but we just needed a quick and dirty tool that provided us with a repeatable way of creating our site content. We can now fully populate our Authoring environment and let the Content Deployment job take care of the rest.
