Protecting against XML Entity Expansion attacks

Posted by: Tom Hollander's blog, on 21 May 2009

One of the critical responsibilities of every developer and architect is to understand, and know how to prevent, as many kinds of security attacks as possible. While there are many types of attacks and many weapons at our disposal to thwart them, the most basic defence we have is input validation. The rule of thumb really needs to be to assume all input from uncontrolled sources is malicious, unless you can prove otherwise. This includes input from end users, as well as input from other systems.

Recently I worked on an application that received XML files from that most untrustworthy of sources, the Internet. Knowing the kind of people who lurk there, we took what seemed like a responsibly paranoid approach involving validating each parsed document against an XML schema, checking a digital signature to ensure it came from a known sender, and cherry-picking the values we needed out of the document.

So I was quite surprised to learn that there was a class of attack which we had not mitigated. It turns out that you should never load untrusted XML content into a .NET XmlDocument as a first step, even if you plan to do all sorts of checks on it afterwards. This is because there is a class of attack which can bring your server to meltdown just by getting it to parse some XML.

Consider this piece of XML:

<!DOCTYPE foo [ 

<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;" >
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;" >
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;" >
<!ENTITY e "&d;&d;&d;&d;&d;&d;&d;&d;" >
<!ENTITY f "&e;&e;&e;&e;&e;&e;&e;&e;" >
<!ENTITY g "&f;&f;&f;&f;&f;&f;&f;&f;" >
<!ENTITY h "&g;&g;&g;&g;&g;&g;&g;&g;" >
<!ENTITY i "&h;&h;&h;&h;&h;&h;&h;&h;" >
<!ENTITY j "&i;&i;&i;&i;&i;&i;&i;&i;" >
<!ENTITY k "&j;&j;&j;&j;&j;&j;&j;&j;" >
<!ENTITY l "&k;&k;&k;&k;&k;&k;&k;&k;" >
<!ENTITY m "&l;&l;&l;&l;&l;&l;&l;&l;" >
]>
<foo>&m;</foo>

This certainly looks like an odd bit of XML, but at first glance it doesn't appear overly scary. It's compact, well-formed and actually only contains one element: <foo>. But what's in that element? It's a single custom-defined entity, &m;. And how is that defined? Well, it's 8 other custom &l; entities. So what's an &l; then? Hmm, it's 8 &k;s. You can see where this is going. The document will end up with 8^12 &a;s, where each &a; has 10 characters, so that innocent-looking &m; will blow out to 10 x 8^12, or 687,194,767,360 characters. And on my reasonably well-specced developer machine, expanding that number of characters consumed all of my CPU for longer than I was prepared to put up with. A bad guy armed with this attack isn't going to steal any data, but they could still cause a lot of damage through denial of service.
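
The arithmetic above can be checked with a few lines of C#. This is only an illustrative sketch: the EntityExpansion class and its ExpandedLength method are hypothetical names made up for this example, not part of any library.

```csharp
using System;

static class EntityExpansion
{
    // Hypothetical helper: length of the text produced by an entity that sits
    // `levels` definitions above a base entity of `baseLength` characters,
    // where each level references the previous entity `fanOut` times.
    public static long ExpandedLength(long baseLength, int fanOut, int levels)
    {
        long length = baseLength;
        for (int i = 0; i < levels; i++)
        {
            // checked: throw OverflowException instead of silently wrapping
            length = checked(length * fanOut);
        }
        return length;
    }

    static void Main()
    {
        // &a; is 10 characters; &b; through &m; are 12 levels of 8 references each.
        Console.WriteLine(ExpandedLength(10, 8, 12)); // 687194767360
    }
}
```

Note how little the attacker has to type: each extra line in the DTD multiplies the expanded size by 8, so the cost to the server grows exponentially while the document itself grows by only a few dozen characters.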

The good news is that it's actually very easy to stop this entity expansion in its tracks. The key is to use an XmlReader before parsing the document into an XmlDocument (or instead of, if you can live without a fully-parsed document). It's possible to validate against an XSD or other schema type using an XmlReader too, but here's a minimalist example showing how you can check that a document is well-formed, contains no DTDs (and hence no entity definitions) and is no more than 10,000 characters long:

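// Requires: using System.IO; using System.Xml;

// Prepare the reader settings for XML validation
XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = null;                // never fetch external resources
settings.MaxCharactersInDocument = 10000;   // reject documents over 10,000 characters
// DTDs are rejected by default (ProhibitDtd = true in .NET 2.0 to 3.5,
// DtdProcessing.Prohibit in .NET 4.0 and later), so no extra setting is needed.

// Read to the end of the document; an XmlException is thrown if it is
// malformed, contains a DTD or exceeds the character limit
using (StringReader textReader = new StringReader(unparsedXml))
using (XmlReader reader = XmlReader.Create(textReader, settings))
{
    while (reader.Read()) { }
}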

If you get to this point without an XmlException being thrown, the document should be safe to parse. Of course, there could be all sorts of evil things lurking within the elements of the document, so you need to continue to use appropriate validation and encoding as you would for any untrusted input.
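
If you do need a fully-parsed document, one variation is to load the XmlDocument through the restricted reader, so the same limits apply during the one and only parse. The following is a sketch rather than a definitive implementation: the UntrustedXml class and TryLoad method are hypothetical names, the 10,000 character limit is carried over from the example above, and the code relies on the default XmlReaderSettings behaviour of prohibiting DTDs.

```csharp
using System;
using System.IO;
using System.Xml;

static class UntrustedXml
{
    // Hypothetical helper: returns true and a loaded document only if the text
    // is well-formed, contains no DTD and fits within the character limit.
    // A null unparsedXml is a caller bug and still throws ArgumentNullException.
    public static bool TryLoad(string unparsedXml, out XmlDocument document)
    {
        document = null;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.XmlResolver = null;               // never fetch external resources
        settings.MaxCharactersInDocument = 10000;  // assumed limit, tune as needed
        // DTD processing is left at its default, which prohibits DTDs.

        try
        {
            using (StringReader textReader = new StringReader(unparsedXml))
            using (XmlReader reader = XmlReader.Create(textReader, settings))
            {
                XmlDocument candidate = new XmlDocument();
                candidate.Load(reader);            // parsing happens via the restricted reader
                document = candidate;
                return true;
            }
        }
        catch (XmlException)
        {
            // Malformed XML, a DTD, or too many characters
            return false;
        }
    }

    static void Main()
    {
        XmlDocument doc;
        Console.WriteLine(TryLoad("<foo>bar</foo>", out doc)); // True
        Console.WriteLine(TryLoad(
            "<!DOCTYPE foo [<!ENTITY a \"1234567890\">]><foo>&a;</foo>", out doc)); // False
    }
}
```

Returning false for every XmlException keeps the calling code simple. If you need to tell a malformed document apart from a hostile one, catch the exception at the call site and log its message instead.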
