Friday, February 3, 2012

Business Intelligence Shootout: Microsoft SQL Server 2008 BI vs. Pentaho BI Enterprise Edition (Introduction)

Business Intelligence: A History in Review
Business Intelligence (BI) has become a hot topic in IT and business in recent years.  In the past, BI seemed more like a gimmick, with disconnected pieces of software made by various purveyors that seemingly required a significant upfront investment from both IT and business resources, from both a fiscal and training standpoint.


You might have a tremendous analytic engine from someone the likes of Arbor Software with their Essbase platform, but to effectively syphon the data out into a presentable, maintainable format, you'd also have to invest in some sort of enterprise reporting tool like Arcplan.  Furthermore, to grant users the ability to perform ad-hoc slice and dice analysis, you'd need an analytic software package pushed by a company like Hyperion.  The overall experience of BI, much like its roots, seemed disjointed, with no common vision, and as a result, was expensive to learn and maintain and did not impact the business world the way it had hoped.


The days of disjointed BI have thankfully past, and we are in an era of computing where these tools have become nearly as slick as their web and desktop enterprise application counterparts.  Speed of deployment, ease of use, and cost efficiency are the key elements in modern BI.  Users want their data faster, more flexible, and robust enough to truly gain that holy grail of turning business intelligence into business INSIGHT. 


The Geekstantialism BI Shootout!
In this eight-part series, we will bring two up-and-coming BI platforms front and center, and objectively review the strengths and weaknesses of each, as well as make some recommendations on their optimal effectiveness.


In one corner stands a slick platform backed by the billions of the Microsoft Corporation -- SQL Server 2008 BI.


In the other corner, a young upstart Java-based open source BI platform architected by some of the brightest minds in BI since its inception -- Pentaho Business Analytics.


The Parameters
In order to gain the best insight of the strengths and weaknesses of both platforms when stacked up against each other, I've put together the following parameters/requirements that both must follow in this thought exercise...

Pentaho System Setup
  • Ubuntu Linux Server 11.10
  • Apache Tomcat Server
  • MySQL database backend
SQL Server 2008 Setup
  • Windows Server 2008 Enterprise Edition
  • IIS
  • SQL Server 2008 backend

Data Source
The base data warehouse for both will be the AdventureWorksDW2008 database offered by Microsoft.  Obviously, as Pentaho BI runs off of MySQL out of the box, I will be converting said SQL Server database to MySQL, keeping it structurally the same.


Analytic Engine
Using the design tools for both platforms, cubes will be built against all fact tables in the AdventureWorksDW2008 database.  All dimensions will be created.  Dimensional hierarchies will be created where applicable.  Simple calculated members will be created.  Data mining options will be briefly evaluated.


Ad-hoc Analysis
The following ad-hoc slice, dice, and drillthrough will be performed against both platforms using the respective front-ends:

  1. Investigation of Internet Sales by Product, Customer, and Sales Territory
  2. Investigation of Reseller Sales by Order Date, Employee, and Promotion
Reporting
Both platforms will be evaluated in producing the following reports using provided design tools and persisting them to their respective report servers for distribution:
  1. Call Center Report by Date Shift
  2. Category of Items Purchased by Customer and Date.
Integration Engine
Both integration engines will be evaluated by using the design tools provided and will perform the following:
  1. Refresh DimProduct with new data (CSV source)
  2. Refresh DimCustomer with new data (CSV source)
  3. Rebuild the respective cubes and dimensions on the respective platform
The Final Verdict
After going through the components of both BI platforms, I will outline the overall strengths and weaknesses of both, and make some educated recommendations on the optimal implementation of either platform.  The goal here isn't necessarily to determine which is truly "better" (as that can be a relative term, depending on the circumstances), but instead to gain a better understanding of what scenarios would call for what tool.  After all, that's the whole fun in technology!


My Qualifications
Finally, if you haven't done so already, you must be asking yourself what makes me qualified to make any sort of judgement call on BI platforms?  Professionally, business intelligence has been my specialty area for the past 6 years.  Having been involved as the lead developer for multiple BI platforms on projects of varying sizes in finance and investments, I have practical experience on BI implementations and the demands of its users.


I do have a depth of knowledge with the Microsoft product stack, as that has also been my professional specialty for about 9 years now (specifically Visual Studio [C#, VB, ASP .Net], SQL Server, and the SQL Server BI stack).


I also have a depth of knowledge in the open source and Java, always experimenting with various flavors of Linux (from RedHat to Mandrake to Debian and now Ubuntu).  For fun, I like to develop Google Android apps.  


This will be my first deep dive into the Pentaho platform.  I hope to gain and document insight into this well-regarded platform and see how it stacks up to the SQL Server BI stack that I know well.  Furthermore, I've always had a soft spot for open source initiatives, but found many times that they're a little too rough around the edges for risk-averse enterprises.  Pentaho looks a bit more promising.


Pentaho claims to be able to lure people away from "big commercial BI" with its platform, and I want to put that to the test.  Is it really a BI platform I can confidently recommend as a viable, and even superior, option when scoping out projects?  Time will tell, and I'm excited to find out!   




Stay tuned for Part 1: Pentaho Analysis Services!

No comments: