Monday, February 20, 2012

Business Intelligence Shootout: Microsoft SQL Server 2008 BI vs. Pentaho BI Enterprise Edition (Part 1: Pentaho Analysis Services)

A Tale of Two Analytic Engines
In this first head-to-head comparison, we pit Microsoft SQL Server Analysis Services (SSAS) 2008 against Pentaho BI's Analysis Services.  As mentioned in the introductory post, in-memory analytic engines aren't exactly new hat.  In fact, the inspiration for these engines came from very simple spreadsheet applications, hence the origins of Essbase's name -- "Extended Spread Sheet Database."  Despite their age, the benefits of analytic engines remain the same: data retrieval and ad-hoc analysis at lightning speeds.  Because most of the data is persisted in RAM, the speed at which you retrieve the data is only limited by your network speed and your user interface's rendering.  With that in mind, we have two analytic engines here that drew inspiration from similar roots, but go about their implementations differently.  

Microsoft SQL Server Analysis Services has actually been around since SQL Server 7, thanks to Big Redmond's acquisition of Panorama Software.  It has only been a recent development, starting with SQL Server 2005, that Microsoft has made a serious push into the BI space, literally offering its entire BI stack for free with a SQL Server 2005 (and then later, 2008) Standard Edition license.  Microsoft innovated the now ubiquitous Multi-Dimension Expression (MDX) query, and has made strides in usability, deeply integrating the SQL Server BI platform to all of its core enterprise offerings, Microsoft Office and Microsoft Sharepoint, as well as its well renowned integrated development environment, Visual Studio.

Pentaho Analysis Services, aka Mondrian in the open source world, is a relative newcomer to the BI marketplace.  Started by industry veterans from the defunct Arbor Software (where Essbase was incubated and released) at the turn of the 21st century, this Java-based analytic platform began its roots as an open source platform, along with the other components of Pentaho BI.  The goal of its founders was to create a powerful, flexible, cohesive, scalable, and cost-effective platform, meeting or exceeding the capabilities of its commercial conglomerate counterparts.  Who better to architect and develop such a solution but some of the very pioneers of the BI movement?  PAS is the core of the Pentaho BI stack, offering seemingly the same capabilities as other analytic engines in the market.


With the history of both analytic engines in mind, let's take a deep dive at both, using AdventureWorks as the star schema base.  For the impatient, I have published all the artifacts produced in this blog post on my source control system, which grants everyone read access.  If you have a Subversion client, point to https://edg.sourcerepo.com/edg/PentahoAdventureWorks




Friday, February 3, 2012

Business Intelligence Shootout: Microsoft SQL Server 2008 BI vs. Pentaho BI Enterprise Edition (Introduction)

Business Intelligence: A History in Review
Business Intelligence (BI) has become a hot topic in IT and business in recent years.  In the past, BI seemed more like a gimmick, with disconnected pieces of software made by various purveyors that seemingly required a significant upfront investment from both IT and business resources, from both a fiscal and training standpoint.


You might have a tremendous analytic engine from someone the likes of Arbor Software with their Essbase platform, but to effectively syphon the data out into a presentable, maintainable format, you'd also have to invest in some sort of enterprise reporting tool like Arcplan.  Furthermore, to grant users the ability to perform ad-hoc slice and dice analysis, you'd need an analytic software package pushed by a company like Hyperion.  The overall experience of BI, much like its roots, seemed disjointed, with no common vision, and as a result, was expensive to learn and maintain and did not impact the business world the way it had hoped.


The days of disjointed BI have thankfully past, and we are in an era of computing where these tools have become nearly as slick as their web and desktop enterprise application counterparts.  Speed of deployment, ease of use, and cost efficiency are the key elements in modern BI.  Users want their data faster, more flexible, and robust enough to truly gain that holy grail of turning business intelligence into business INSIGHT. 


The Geekstantialism BI Shootout!
In this eight-part series, we will bring two up-and-coming BI platforms front and center, and objectively review the strengths and weaknesses of each, as well as make some recommendations on their optimal effectiveness.


In one corner stands a slick platform backed by the billions of the Microsoft Corporation -- SQL Server 2008 BI.


In the other corner, a young upstart Java-based open source BI platform architected by some of the brightest minds in BI since its inception -- Pentaho Business Analytics.


The Parameters
In order to gain the best insight of the strengths and weaknesses of both platforms when stacked up against each other, I've put together the following parameters/requirements that both must follow in this thought exercise...

Pentaho System Setup
  • Ubuntu Linux Server 11.10
  • Apache Tomcat Server
  • MySQL database backend
SQL Server 2008 Setup
  • Windows Server 2008 Enterprise Edition
  • IIS
  • SQL Server 2008 backend

Data Source
The base data warehouse for both will be the AdventureWorksDW2008 database offered by Microsoft.  Obviously, as Pentaho BI runs off of MySQL out of the box, I will be converting said SQL Server database to MySQL, keeping it structurally the same.


Analytic Engine
Using the design tools for both platforms, cubes will be built against all fact tables in the AdventureWorksDW2008 database.  All dimensions will be created.  Dimensional hierarchies will be created where applicable.  Simple calculated members will be created.  Data mining options will be briefly evaluated.


Ad-hoc Analysis
The following ad-hoc slice, dice, and drillthrough will be performed against both platforms using the respective front-ends:

  1. Investigation of Internet Sales by Product, Customer, and Sales Territory
  2. Investigation of Reseller Sales by Order Date, Employee, and Promotion
Reporting
Both platforms will be evaluated in producing the following reports using provided design tools and persisting them to their respective report servers for distribution:
  1. Call Center Report by Date Shift
  2. Category of Items Purchased by Customer and Date.
Integration Engine
Both integration engines will be evaluated by using the design tools provided and will perform the following:
  1. Refresh DimProduct with new data (CSV source)
  2. Refresh DimCustomer with new data (CSV source)
  3. Rebuild the respective cubes and dimensions on the respective platform
The Final Verdict
After going through the components of both BI platforms, I will outline the overall strengths and weaknesses of both, and make some educated recommendations on the optimal implementation of either platform.  The goal here isn't necessarily to determine which is truly "better" (as that can be a relative term, depending on the circumstances), but instead to gain a better understanding of what scenarios would call for what tool.  After all, that's the whole fun in technology!


My Qualifications
Finally, if you haven't done so already, you must be asking yourself what makes me qualified to make any sort of judgement call on BI platforms?  Professionally, business intelligence has been my specialty area for the past 6 years.  Having been involved as the lead developer for multiple BI platforms on projects of varying sizes in finance and investments, I have practical experience on BI implementations and the demands of its users.


I do have a depth of knowledge with the Microsoft product stack, as that has also been my professional specialty for about 9 years now (specifically Visual Studio [C#, VB, ASP .Net], SQL Server, and the SQL Server BI stack).


I also have a depth of knowledge in the open source and Java, always experimenting with various flavors of Linux (from RedHat to Mandrake to Debian and now Ubuntu).  For fun, I like to develop Google Android apps.  


This will be my first deep dive into the Pentaho platform.  I hope to gain and document insight into this well-regarded platform and see how it stacks up to the SQL Server BI stack that I know well.  Furthermore, I've always had a soft spot for open source initiatives, but found many times that they're a little too rough around the edges for risk-averse enterprises.  Pentaho looks a bit more promising.


Pentaho claims to be able to lure people away from "big commercial BI" with its platform, and I want to put that to the test.  Is it really a BI platform I can confidently recommend as a viable, and even superior, option when scoping out projects?  Time will tell, and I'm excited to find out!   




Stay tuned for Part 1: Pentaho Analysis Services!