Sunday, February 15, 2009

Capability Maturity Model Integration - Applied to a DW (Part 1)

As software professionals, we should all be familiar with the Capability Maturity Model Integration (CMMI). CMMI is a process improvement approach that provides organizations with the elements needed to measure and improve their business processes. Before we get into what that means for a DW/BI solution, let's take a quick look at what exactly CMMI is and how it helps an organization.

CMMI supersedes its predecessor, the CMM (Capability Maturity Model), and was developed to provide guidance for developing system and software processes.



The document below shows the key process areas and how they are classified into the CMMI levels.



We will talk about how this applies to a DW/BI Solution in the next part.

Tuesday, January 20, 2009

BI winners and losers - 2009 Predictions

Winners
1. SaaS (Software as a Service)
2. Cloud
3. Analytics

Loser
1. SOA

Backburner
1. MDM
2. Data Quality
3. Metadata

Have a look at the complete list here or summary here.

Amid the current economic conditions, it makes sense for models with lower capital cost, like SaaS, to gain popularity, and for "luxuries" like MDM and metadata management to be set aside for better days. While the long-term benefits of MDM, DQ and metadata cannot be denied, many managers I know have yet to become seers.

Another interesting factor to consider is the advent of open source tools, at least on the BI side if not on the ETL side. Companies that want to tackle the "cost" beast will find tools like BIRT promising and interesting. Medium-size companies, at least, might try them out.

Also, most BI tools and implementations lack a way to make part of the high-level data available across the enterprise through "enterprise search" functionality. Information access tools like Endeca can target this market.

The combination of open source, SaaS and cloud is a very promising market that consulting firms have yet to take up. However, the lack of open source tool experts, and the effort of convincing management about cloud and SaaS, will remain hurdles for this option.

And then, with the economic depression looming, you never know!

Sunday, January 4, 2009

Information Warehouse 1.0 - Final part

Now that we have seen what Endeca is and how it helps address some of our problems, let's come to the topic at hand: the Information Warehouse. For those of us familiar with a data warehouse, this is an easy concept. Instead of dealing with data, we handle the end product, information, directly, thereby skipping a few steps and saving a lot of money from the budget.

Yes, an Information Warehouse presents your data as information directly to the end user, from the source itself. Since Endeca allows for information integration and doesn't necessarily require the different sources to be conformed, this is easily achievable.

While data quality issues like cleansing, and best practices like master data management, still require physical data integration, an Information Warehouse should be a viable option for the many companies that do not have the time or appetite for it.

Even for those of us who work on a data warehouse project and hear the end user complain every day about the data he wants, an Information Warehouse can be the way to open up that data.

For those who still want to stand by relational databases and the architectures built on top of them, all I can say is that they are a dying breed. Don't believe me? Check out this MIT article, which details why.

Here are a few RDBMS features that, according to the article, date back to the 1970s:

1. Disk oriented storage and indexing structures
2. Multithreading to hide latency
3. Locking-based concurrency control mechanisms
4. Log-based recovery


Maybe it is time to rethink the way we handle data, and to start visualising how data is handled and accessed from the perspective of today's technology.

Maybe it is time we stopped building data warehouses that cannot give end users complete access, and started building information warehouses. After all, it is the information that we care about!

Monday, December 15, 2008

Information Warehouse 1.0 - Part 4

Imagine a piece of data floating in a three-dimensional world.


It is related to different elements: other data, and the attributes or dimensions of both itself and that other data. Every data element in this world is linked to other data or attributes; the only difference is the level of the link. The key here is to understand that the "database" for this kind of information access is not in relational format.
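One way to picture this world, purely as an illustration and not as a description of how Endeca stores data internally, is a graph in which data elements and attribute values are nodes, and the "level of the link" is read as the number of hops between two nodes. The node names and links below are hypothetical sample data; the hop-count interpretation is an assumption.

```python
from collections import deque

# Hypothetical sample links between data elements and attribute values.
# This is an illustrative graph, not Endeca's actual storage format.
LINKS = [
    ("order:1001", "customer:acme"),
    ("order:1001", "region:west"),
    ("customer:acme", "industry:retail"),
    ("order:1002", "customer:acme"),
    ("order:1002", "region:east"),
]

def build_graph(links):
    """Build an undirected adjacency map: each node knows its direct neighbours."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def link_levels(graph, start):
    """Breadth-first search: map each reachable node to its hop distance from start.

    The hop distance stands in for the "level of the link" (an assumption).
    """
    levels = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels

graph = build_graph(LINKS)
levels = link_levels(graph, "order:1001")
print(sorted(levels.items(), key=lambda kv: (kv[1], kv[0])))
```

Starting from one order, the customer and region are one level away, while the other order for the same customer is two levels away, even though no table join was ever declared between them.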

Now add a patented "Google-like" technology that makes finding information a nifty process, plus a highly customizable Java / .NET API for the front end. The possibilities are endless. Even better is Guided summarization, which makes finding what you want even easier. This goes one step further: free-form text search is still an option, but what about questions whose answers do not fit neatly into a report or a search box? Guided summarization shows you the way.

In addition to letting you look up structured, semi-structured and unstructured data, relate them to each other and search them with free-form queries, Guided summarization lets you analyze dimensions of the data that you might not have been aware of, easing your way to the information you need. For more information about Endeca, and to view a demo of how guided summarization works, please visit http://www.endeca.com/
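Endeca's own implementation is not described here, so the sketch below assumes that guided summarization behaves like faceted navigation: after each selection the user makes, show the remaining attribute values along with how many records carry each one, so that dimensions the user did not think to ask about become visible. The records, attribute names and functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records with structured attributes.
RECORDS = [
    {"id": 1, "region": "West", "product": "Laptop", "status": "Shipped"},
    {"id": 2, "region": "East", "product": "Laptop", "status": "Returned"},
    {"id": 3, "region": "West", "product": "Monitor", "status": "Shipped"},
    {"id": 4, "region": "West", "product": "Laptop", "status": "Returned"},
]

FACETS = ("region", "product", "status")

def refine(records, selections):
    """Keep only the records that match every selected attribute value."""
    return [r for r in records
            if all(r.get(attr) == value for attr, value in selections.items())]

def facet_counts(records, facets):
    """For each facet, count how many of the remaining records carry each value."""
    counts = defaultdict(Counter)
    for r in records:
        for attr in facets:
            if attr in r:
                counts[attr][r[attr]] += 1
    return {attr: dict(c) for attr, c in counts.items()}

# The user selects region=West; the counts then suggest where to drill next.
matches = refine(RECORDS, {"region": "West"})
print([r["id"] for r in matches])
print(facet_counts(matches, FACETS))
```

After selecting the West region, the counts show that returns exist there too, a dimension the user may not have thought to look at. In a real system the counts would come from an index rather than a scan over every record.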

P.S.: The picture is only for visualization purposes.

Monday, December 1, 2008

Information Warehouse 1.0 - Part 3

Then the question of data integration comes to mind. If all I want to do is look at related data that happen to be in separate silos, do we really have to go down the path of expensive data integration projects? Why do we keep trying to hammer square pegs into triangular holes?

Web search might show us the way. Relational databases are admittedly one of the best ways to store data. But this is not the 1980s. Disk space and memory are less expensive than consultants. We shouldn't be shy about taking the "Google" approach and improving upon it.

Introducing Endeca. Endeca is an Information Access platform. It doesn't replace your ETL or BI tool, but it lets you get more information out of your data, and faster. For those who have always found the BI front end too restrictive, and custom-built SQL queries too complex, when looking for information, here is the solution.

Imagine Google-like free-form text search for your queries. Imagine associating structured data with unstructured and semi-structured data without having to bring them into your database, and integrating your information without having to integrate the underlying data. That is building an Information Warehouse. This is Endeca.
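The usual technique behind "Google-like" free-form search is an inverted index, which maps each word to the set of records containing it. The sketch below is a generic illustration, not Endeca's API: hypothetical records from different sources (a structured row, a support ticket, an email) are flattened to text and indexed side by side, so that one query can relate them without first conforming them to a shared schema. All names and data are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical sources: a structured CRM row and two free-text items.
# Nothing forces them into a shared schema; each is flattened to text for indexing.
DOCUMENTS = {
    "crm:42": {"customer": "Acme Corp", "city": "Boston", "tier": "Gold"},
    "ticket:7": "Acme Corp reported a late shipment to Boston",
    "email:3": "Quarterly review scheduled with Globex",
}

def tokens(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def as_text(doc):
    """Flatten a record (dict) or a plain string into searchable text."""
    if isinstance(doc, dict):
        return " ".join(f"{key} {value}" for key, value in doc.items())
    return doc

def build_index(documents):
    """Build the inverted index: word -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in tokens(as_text(doc)):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (AND semantics)."""
    words = tokens(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(DOCUMENTS)
print(sorted(search(index, "acme boston")))
```

A query only touches the posting sets for its own words and intersects them, instead of scanning every row, which is one reason search engines can answer so quickly. Here the structured CRM row and the unstructured ticket come back together for the same query.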

So how does this work?

Monday, November 17, 2008

Information Warehouse 1.0 - Part 2

It all began with their inability to understand human-friendly, unstructured data. Yes, I am talking about computers. To use data effectively with a computer, the position of the data becomes more important than its content or context. Rows and columns become the norm of the day. Companies wishing to utilize the value of their data have put tremendous effort into organising it in a structured way. Yet despite the enormous amounts of money spent converting data to a structured format, data in its original unstructured or semi-structured format is still useful. Many companies still have unstructured data that they simply do not have the time or money to convert to a structured format.

And then the question of looking at the data arises. Data itself is meaningless unless it is presented, as information, in a context humans can understand. Data, structured or unstructured, has to be retrieved and shown in context as information to be of any value. Companies that have invested heavily in Business Intelligence or custom-built applications still cannot open up their data as completely or efficiently as they wish. The turnaround time for a typical BI report, for instance, is still measured in minutes.

Why doesn't the traditional warehouse fix the problem entirely? First, it has a hard (and expensive) time getting access to unstructured and semi-structured data, and then it has limited ways of relating that data to existing structured data. Add the limitations of the BI tool to this. And even if you manage all of it, the query powering the front-end report that shows the information you are looking for still takes a considerable amount of time. Why the discrepancy in query time? When Google, for instance, can return results from a dataset considerably larger than any company owns in less than a second, why does the typical BI tool take so much longer?

Let's see why.

Saturday, November 1, 2008

Information Warehouse 1.0 - Part 1

Information Access! Yes, that is what I am talking about. Have you ever wondered how Google works? If so, have you also asked yourself why your BI tool takes 2 minutes to run a report?

If you answered yes, continue reading. If not, read on anyway!

OK. We have integrated all our company data. We are producing regular operational and analytic reports. Has that stopped the business user from coming to you and asking for information? Usually not. The reason is that BI tools tend to restrict the window into information to their own views, and finding anything is not easy.

The other side of the story is that the business user is often not sure what he needs either. So how do we give users access to something they themselves are not sure of? Well, like Google: free-form text search queries. And yes, relational databases are not exactly the best places to run them!

So how do we go about that? Introducing Endeca! More on Information Access in the next part!