Published: 10 Aug 2009
Andrew Siemer will walk you through, from start to finish (in a series of articles), how he would go about creating a StackOverflow-style knowledge exchange.
The Stack Overflow Inspired Knowledge Exchange Series
Check out the project homepage of this series to follow our journey toward building a knowledge exchange inspired by the famous StackOverflow website.
With all the hype surrounding the great site StackOverflow (currently my third place), I have found a lot of people wondering how that site was built. There is quite a lot of information about how it was built from a high-level, generic view. But I have yet to find any site that details an actual implementation of a StackOverflow-style knowledge exchange from start to finish (though there are some copies of their idea out there: cnprog.com, code.google.com/p/cnprog, code.google.com/p/stacked).
While it would be nice for Jeff and Joel to open up their code base to the world (as they did with their data, and their WMD WYSIWYM Markdown Editor), it is highly unlikely, as they are offering their software to generate revenue. And so, with the permission of Jeff Atwood himself (via email), I will walk you through, from start to finish (in a series of articles), how I would go about creating a StackOverflow-style knowledge exchange.
The idea of egoless programming came up while researching this series. I first read about it on Jeff Atwood’s site, but I also found other references to it. The basic idea, as stated by Johanna Rothman, is this:
Egoless programming occurs when a technical peer group uses frequent and often peer reviews to find defects in software under development. The objective is for everyone to find defects, including the author, not to prove the work product has no defects. People exchange work products to review, with the expectation that as authors, they will produce errors, and as reviewers, they will find errors. Everyone ends up learning from their own mistakes and other people's mistakes. That's why it's called egoless programming. My ego is not tied to my "perfect" or "imperfect" work product. My ego is only tied to my attempts to do the best job I know how, and to learn from my mistakes, not the initial result of my work.
Following this guideline, I will attempt to do my best while designing and building this knowledge exchange software in full view of the public. I plan to build this software in a manner that uses all of the latest and greatest industry buzzwords and technologies such as n-tier, TDD, DDD, continuous integration, MVC, LINQ to SQL, AutoMapper, MvcContrib, SOLID, DRY, IoC, StructureMap, SketchFlow, etc. I want to admit up front, however, that I do not claim to be an expert in all of these, and so I expect to learn along with you in some cases. I fully expect a great many of you to give me coarse corrections (word play?) along the way where you think I am wrong, and I will make adjustments where possible. I expect some of you to send full-on flames instead of suggestions. And this is where egoless programming will come in!
What information is currently available about StackOverflow?
There is some information regarding the StackOverflow architecture in a readable form but a good majority of it is buried in a podcast here or there. In a blog post on blog.stackoverflow.com entitled “What was stack overflow built with” you will see a list of technologies the SO team used.
Table 1: StackOverflow technology stack
And then there are the other tools that are used by StackOverflow.
Table 2: Other dependencies
There is also a good article on highscalability.com regarding StackOverflow, the stack, the hardware, and the stats for their site. We will use these stats as something to shoot for in our site's design. There is also a great list of “lessons learned” that you might be interested to read if you plan on having a site even remotely as popular as theirs!
Stats excerpt from highscalability.com
- 16 million page views a month
- 3 million unique visitors a month (Facebook reaches 77 million unique visitors a month)
- 6 million visits a month
- 86% of traffic comes from Google
- 9 million active programmers in the world and 30% have used Stack Overflow.
- Cheaper licensing was attained through Microsoft's BizSpark program. My impression is they pay about $11K for OS and SQL licensing.
- Monetization strategy: unobtrusive ads, job placement ads, DevDays conferences, extend the software to target other related niches (Server Fault, Super User), develop StackExchange as a white label and self hosted version of Stack Overflow, and perhaps develop some sort of programmer rating system.
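To turn these monthly figures into something we can shoot for, it helps to convert them into average rates. The following is only a rough sketch: the 30-day month and the evenly spread traffic are my own simplifying assumptions, not part of the excerpt, and real peak load will be well above these averages.

```python
# Monthly figures quoted in the highscalability.com excerpt above.
PAGE_VIEWS_PER_MONTH = 16_000_000
VISITS_PER_MONTH = 6_000_000
UNIQUE_VISITORS_PER_MONTH = 3_000_000

# Assumption (not from the source): a 30-day month with evenly spread traffic.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_rates():
    """Convert the monthly totals into rough per-second and per-visit averages."""
    return {
        "page_views_per_second": PAGE_VIEWS_PER_MONTH / SECONDS_PER_MONTH,
        "pages_per_visit": PAGE_VIEWS_PER_MONTH / VISITS_PER_MONTH,
        "visits_per_visitor": VISITS_PER_MONTH / UNIQUE_VISITORS_PER_MONTH,
    }


if __name__ == "__main__":
    for name, value in average_rates().items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions the site averages roughly 6 page views per second, a little under 3 pages per visit, and 2 visits per unique visitor each month.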
StackOverflow public database
We will discuss the database for this application in a future chapter. However, it is interesting to take a look at the database that StackOverflow has made public. There are many people data mining this to see what sort of coolness can be found. A good article on this topic is at sqlserverpedia.com entitled “Understanding the StackOverflow Database Schema”. We will return to this subject later.
Semi-controversial architecture and design decisions
I am a big podcast fan and listen to them for roughly 3 hours a day (as I drive 75 miles to and from work in Los Angeles). I love to listen to the greats such as Hanselminutes, Polymorphic Podcast, and many more. Especially relevant to the readers of this series are two Hanselminutes episodes that really drove my decision to follow known design patterns and best practices to the best of my ability. In the first episode Scott Hanselman interviews Jeff Atwood and team to discuss StackOverflow. During that interview Scott unearths several things that certainly took his breath away. You could tell that he was a bit shocked at what he heard regarding some decisions not to use known best practices. Listen to it - it was sort of funny. In the second episode, which wasn’t scheduled, Scott continued to record the behind-the-scenes discussion that took place after the initial interview with the StackOverflow team. In this discussion Scott really dug into what he had heard in the first interview. I found this to be quite funny too!
I think that Jeff Atwood and his team are probably way better programmers than I am. Having said that, I can’t bring myself to develop a project of this nature without following best practices. Their goal was to have a great and very successful site up and running quickly, so making decisions to do things in a non-standard fashion met their needs. My goal is a great series of tutorials on current trends, followed by a great OSS platform that others can use and modify. For that reason everything that I build will, to the best of my abilities, follow today’s best practices and design patterns. For example: there will be no direct dips to the database from the presentation tier in our implementation!
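To make the "no direct dips to the database from the presentation tier" rule concrete, here is a minimal, language-neutral sketch of the layering I have in mind. It is written in Python purely for brevity; the series itself targets ASP.NET MVC and LINQ to SQL, and every name below (`Question`, `QuestionRepository`, `QuestionService`, `render_question_page`) is hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Question:
    id: int
    title: str


class QuestionRepository(Protocol):
    """Data tier contract: the only layer allowed to talk to storage."""

    def get_by_id(self, question_id: int) -> Optional[Question]: ...


class InMemoryQuestionRepository:
    """Hypothetical stand-in for a real database-backed repository."""

    def __init__(self, questions):
        self._questions = {q.id: q for q in questions}

    def get_by_id(self, question_id):
        return self._questions.get(question_id)


class QuestionService:
    """Business tier: depends on the repository contract, not on a database."""

    def __init__(self, repository: QuestionRepository):
        self._repository = repository

    def get_title(self, question_id: int) -> str:
        question = self._repository.get_by_id(question_id)
        if question is None:
            raise LookupError(f"Question {question_id} not found")
        return question.title


def render_question_page(service: QuestionService, question_id: int) -> str:
    """Presentation tier: calls the service and never touches storage itself."""
    return f"<h1>{service.get_title(question_id)}</h1>"


if __name__ == "__main__":
    repo = InMemoryQuestionRepository([Question(1, "How do I unit test a controller?")])
    print(render_question_page(QuestionService(repo), 1))
```

Because the presentation function only knows about the service, the storage behind the repository can be swapped (for example, for a test double) without changing any page code.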
What will be covered?
I will admit up front that I think this series will be a long one. I am going to do my best to attack this project in the manner that I would any other project. I want to show the programmer who has not done a project of this nature ALL of the steps that go into it, not just code snippets. This will include setting up some infrastructure aspects, configuring continuous integration, automated builds, source control, test suites, logging, and many other aspects of a software project that generally don’t get mainstream attention (but should).
We will take a look at creating wireframes and screen mockups using the new SketchFlow tool in Microsoft’s Expression product. We will discuss solution and project structure. We will cover some of the great tools out there such as NAnt, StructureMap, AutoMapper, ELMAH, NUnit, Rhino Mocks, and CruiseControl.NET. We might also discuss some platforms for managing a project such as this using Zen or VersionOne. Pretty much every step of this project, from A to Z, will get at least an article’s worth of coverage!
For that reason I am unable to create a specific article list of a 1, 2, … 30 nature outlining what is to come. We will instead create an index page that gives anyone interested in jumping around the series a starting place (once it is completed, that is!). The articles will reference back to this index to keep you going.
What is required?
I will do my best to only use technologies, frameworks, and other tools that you will have access to. For that reason I will ensure that there is a trial version, free version, or open source version of whatever we use in our discussions. This way you can walk along with me side by side to develop your own version of this knowledge exchange. Also, a copy of the source code will be maintained for each article. Along with that you will also have access to the knowledgeexchange.codeplex.com site where my incremental check-ins will be stored. That repository will be further along than the article release, so if you want to jump ahead that is the place to do it.
Some additional information
While none of this is required reading by any means, the following videos, blog posts, and articles are worth a look. These are things that I found interesting as I started to research this project.
This article was primarily an introduction to the upcoming series of articles. It introduced why I chose to create a StackOverflow-style knowledge exchange and gave a high-level look at how StackOverflow currently runs. We discussed egoless programming and the fact that I fully expect my readers to help keep me straight with what we are going to build. I also loosely discussed some of the technologies that we will use in our implementation of this knowledge exchange. I then stated that this series will be built using publicly available software so that anyone can follow along with me. Lastly I listed a few articles that I think might be worth reading to get your mind in the right place to understand what StackOverflow is, and its core features.
In the next article we will take a look at setting up our development environment. We will get into setting up a version control project on CodePlex. We will also discuss an appropriate file and folder structure to support a big project like this. Then we will create our initial solution which will include an ASP.NET MVC 2 web application and its test project. Next we will get the TortoiseSVN client set up to communicate with our version control repository on CodePlex. And at the end of the article we will get to perform our first commit into our code repository and make our StackOverflow inspired knowledge exchange project public.
I am a 33 year old, ex-Army Ranger, father of 6, geeky software engineer that loves to code, teach, and write. In my spare time (ha!) I like playing with my 6 kids, horses, and various other animals.
This author has published 29 articles on DotNetSlackers.