Monday, May 26, 2008

Audit Dimension and Data Quality

Is Data Quality confined to using a specific tool? Of course not. It begins right from Data Modelling. Effective tracking and monitoring of Data Quality does not necessarily need a Data Quality tool (though I am sure the tools make it easier). It can be delivered by a normal ETL job, if the Data Modelling is done right.

Kimball talks about the modelling aspects of gathering information about Data Quality.


He makes some interesting points about how to manage the audit "dimension" tables

....Our resulting audit dimension record should be of low cardinality compared to the fact table. All the fact table records loaded in the same batch run of the ETL system will probably have the same audit key, except for the few exceptional records that you have to modify or artificially supply. These exceptional records will generate only a few more audit keys......

The Full article can be accessed here.
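
To make the low-cardinality point concrete, here is a minimal sketch in Python of how an ordinary ETL job might hand out audit keys. It is not Kimball's implementation: the names AuditRecord, AuditDimension, load_batch and the AMOUNT_SUPPLIED flag are all hypothetical, and a real audit dimension would carry more context than a batch id and a single flag.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class AuditRecord:
    """One row of a hypothetical audit dimension."""
    batch_id: str
    quality_flag: str  # e.g. "OK" or "AMOUNT_SUPPLIED" (made-up flags)


class AuditDimension:
    """Hands out one surrogate key per distinct audit record."""

    def __init__(self):
        self._keys = {}
        self._next_key = count(1)

    def key_for(self, record):
        # Reuse the key if this audit record was seen before.
        if record not in self._keys:
            self._keys[record] = next(self._next_key)
        return self._keys[record]

    def __len__(self):
        return len(self._keys)


def load_batch(rows, batch_id, audit_dim):
    """Attach an audit key to each fact row, supplying 0.0 for a missing amount."""
    facts = []
    for row in rows:
        amount = row.get("amount")
        if amount is None:
            # Exceptional record: the value is artificially supplied.
            amount, flag = 0.0, "AMOUNT_SUPPLIED"
        else:
            flag = "OK"
        audit_key = audit_dim.key_for(AuditRecord(batch_id, flag))
        facts.append({"order_id": row["order_id"], "amount": amount, "audit_key": audit_key})
    return facts


audit_dim = AuditDimension()
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
]
facts = load_batch(rows, "2008-05-26", audit_dim)
```

The two normal rows share one audit key and the row with the supplied amount gets a second one, so the audit dimension stays tiny no matter how many fact rows the batch loads.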

Tuesday, May 20, 2008

For the success of Data Quality

OK, we started a Data Quality project on the enterprise data, in whatever form it is available. Now, how do we make it a success?

Trillium, a Data Quality software provider, says this is how.

1. Establish measurable business goals
2. Align business and IT expectations
3. Confirm senior management buy in
4. Ensure that the business goals drive functionality
5. Understand the costs of building the solution in house
6. Commit trained personnel to your organization’s data quality issues
7. Understand the real costs and causes of poor data quality
8. Employ a proven methodology
9. Use a phased roll-out schedule
10. Track ROI

As it should be, it is more about the business than about the technical solution!
Read the full article here. (requires registration)

Wednesday, May 14, 2008

Customer Data Integration - How to sell

CDI, as Customer Data Integration is usually called, means achieving a single holistic view of the customer. Integrating, cleansing, standardizing and de-duplicating customer data and then propagating it sounds like an easy sell to the business. Apparently it is not. Most companies have not yet realised the business value they forgo and the costs they incur without it.

Take an example. Company A has two subdivisions, each with its own silo of data. Once a month I get a promotional mailer from them. Of course, I get two envelopes with the same information.
Consider the cost savings from de-duplicating the customer data, and the focused marketing that can be achieved by identifying an entire household.
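
As a rough illustration of the two-envelope problem, the sketch below groups customer records from two divisions by a normalized address. The field names and the normalize rule are assumptions made for the example. Real CDI matching is far more involved (fuzzy matching, address standardization, survivorship rules), so treat this only as the simplest possible version of the idea.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lower-case, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def group_by_household(customers):
    """Group customer records that share a normalized address."""
    households = defaultdict(list)
    for customer in customers:
        households[normalize(customer["address"])].append(customer)
    return dict(households)


# Hypothetical records from the two divisions' silos.
customers = [
    {"source": "division_1", "name": "J. Smith", "address": "12 Oak St."},
    {"source": "division_2", "name": "John Smith", "address": "12  oak st"},
    {"source": "division_2", "name": "A. Jones", "address": "4 Elm Rd"},
]
households = group_by_household(customers)

# One mailer per household instead of one per record.
mailers_saved = len(customers) - len(households)
```

Here the two Smith records collapse into one household, so one envelope is saved on every mailing.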

DataFlux talks about the right way to sell CDI to the business. Don't scare them by telling them how bad the data is. Instead, focus on the benefits CDI brings.

"........'We need CDI, and we’ll be at risk if we don’t adopt it soon. Our existing systems can’t cut the mustard – they’re all operating on different versions of the data. Our data warehouse was designed for analytics, our ODS is read-only, our ERP system has no customer detail, and our operational applications don’t talk to one another, let alone share data. Oh, and we need to put some governance mechanisms in place, meaning we need time from executives. And by the way, we need to hire skills we don’t have. And auditing our current systems would be a good idea…'

Such statements – while probably painfully true in your organization – are simply too scary for most managers to hear..... ..... All too often, that means your CDI program will be dead in the water. A better approach is to begin with the business impact of not having CDI. Have your facts straight, be unemotional and be ready to produce proof. Here are some statements that are non-threatening yet provocative. Each has proven very effective in getting management’s attention.

'12 percent of our accounts don’t have an associated account holder name. And we’re using this data in our financial reporting. Some day, someone’s going to notice and ask questions.'


'Our Siebel system was built to recognize a customer as anyone with an address. Our SAP system considers a customer to be anyone with a shipping address. And we have 72 other systems with their own business rules. The one that wins is the one that generated the data most recently. Which one are you using?'

'Of our 568,000 business customers, 23,452 of the monthly statements were returned by the post office. This represents $1.6 million in delayed or lost revenue.'
...............
Providing these facts is not fear-mongering. Instead, you need to understand and measure existing business problems that affect (or are affected by) inaccurate, unsynchronized, duplicated, missing or contradictory customer data. And if your company is like most companies, those problems proliferate. "

Download the DataFlux white paper here. (Needs registration)

Friday, May 9, 2008

Data Profiling - The First step

Profiling the existing data is the first step in any data-driven initiative. Take a look at the DQ Management process available at Informatica's website.



When to profile the data?

Dr. Claudia Imhoff, President & Founder, Intelligent Solutions, and Ed Lindsey, National Product Specialist, Informatica, answer: profile early and profile often.

Some of the key points made are:

"...Data Stewards should come from the business. Generally they are the few people who demonstrate a true interest in the data and information you are generating.."

Forgetting the business that IT serves, and not involving it, is a sure-fire way for an IT project to fail.

"The DQ process is different from customer to customer. The best place to fix a data quality problem is at the source. However, many customers will not allow modifications of the data at the source for fear of breaking the original system. Also, a lot of the data is not under the control of the department using the feeds because it comes from outside the company or business unit. Most of the time the data is corrected as it enters the business unit as part of an operational data store, data warehouse or enterprise application. Over time, as DQ issues are corrected downstream, the data customer gives their feedback to the provider and hopefully they initiate their own DQ process so that over time quality ultimately finds it way back to the source system"

This is a much more realistic approach in the real world. Companies usually have some form of data they use for reporting or analysis, and they have Data Quality problems right there. This means profiling and identifying the problems is indeed usually the first step in a DQ initiative.
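
For a feel of what a first profiling pass looks like without any DQ tool, here is a minimal sketch in plain Python. The function profile_column and the sample rows are hypothetical. A real profiling tool would also look at patterns, value ranges and cross-column relationships.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    # Treat both None and the empty string as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical extract from a reporting table.
rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "us"},
    {"customer_id": 3, "country": ""},
    {"customer_id": 4, "country": "US"},
    {"customer_id": 5},
]
country_profile = profile_column(rows, "country")
```

Even this tiny profile surfaces two problems worth raising with the data provider: two of five rows have no country, and "US" and "us" are being stored as different values.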

Read the complete article here. There is also a download link available there if you want to view the webinar.

Monday, May 5, 2008

Setting up an IQM Framework

IQM stands for Information Quality Management.

Larry P. English, author of the widely acclaimed book 'Improving Data Warehouse and Business Information Quality', says:

Data cleansing [sic., correction] is NOT information quality improvement. It is the cost of having defective processes. The absolute goal of a sound IQ management system is the elimination of the need for data correction, a.k.a. “information scrap and rework.”

I still think cleansing the existing data and fixing the existing Data Quality problems is the first step in any IQM initiative. The reality is corrupted/bad data is here to stay until it is fixed.

Similarly, the discussion of IQ methodologies such as Deming, Crosby, Kaizen and Six Sigma is interesting. Personally, I would vote for Kaizen. (And I am supported by the success of Japanese and Korean automakers while the American auto giants are struggling.)

Any Data Quality effort must improve the quality of the data, and must keep improving it continuously.

Six Sigma is another interesting methodology. It usually follows either DMAIC (Define, Measure, Analyze, Improve, Control) or DMADV (Define, Measure, Analyze, Design, Verify). DMAIC is generally used for correcting existing problems and DMADV for new implementations.

He also makes a couple of other interesting points. Read the complete article here.

Thursday, May 1, 2008

Data Quality - How and When

Information is one of the key assets of any organisation. Sending newsletters to Mr. John addressed as Ms. John is a sure way to lose business or create a negative image of one's company.

While everyone talks about Data Quality, how do we measure it?

The following six aspects are how we measure the quality of data.

1. Completeness – What is missing?
2. Conformity – What is not in a standardized format?
3. Consistency – What are the conflicts in the data?
4. Accuracy – What is incorrect?
5. Duplicates – What is repeated?
6. Integrity – What fails to have proper reference data?
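
Several of these aspects can be turned into simple rule counts. The sketch below is one possible way to do it in Python, and it covers only four of the six: completeness, conformity, duplicates and integrity. Consistency and accuracy are left out because they generally need a second system or a trusted real-world source to compare against. The reference set, the postcode pattern and the field names are all assumptions made for the example.

```python
import re
from collections import Counter

VALID_COUNTRIES = {"US", "GB", "IN"}        # hypothetical reference data
POSTCODE_PATTERN = re.compile(r"^\d{5}$")   # hypothetical standard format


def quality_report(records):
    """Count rule violations for four of the six aspects."""
    # Names are compared case-insensitively and without surrounding spaces.
    names = Counter(r["name"].strip().lower() for r in records if r.get("name"))
    return {
        # Completeness: records with no name at all.
        "completeness": sum(1 for r in records if not r.get("name")),
        # Conformity: postcodes that do not follow the standard format.
        "conformity": sum(
            1 for r in records if not POSTCODE_PATTERN.match(r.get("postcode") or "")
        ),
        # Duplicates: extra copies beyond the first of each name.
        "duplicates": sum(n - 1 for n in names.values()),
        # Integrity: country codes missing from the reference data.
        "integrity": sum(1 for r in records if r.get("country") not in VALID_COUNTRIES),
    }


records = [
    {"name": "John Smith", "postcode": "12345", "country": "US"},
    {"name": "john smith ", "postcode": "1234", "country": "US"},
    {"name": "", "postcode": "54321", "country": "XX"},
]
report = quality_report(records)
```

Tracked batch after batch, counts like these give a trend line for each aspect, which is what makes the quality of the data measurable rather than anecdotal.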

When to start worrying about Data Quality?

Ideally, as soon as possible. Any data-driven initiative, including ERP, CRM and EDW solutions, should start with a Data Quality component. And if it did not, it is better late than never.