How frequently do you see project timelines suffer because of data quality issues discovered late in the game?
On my most recent ETL engagement, I noticed a multiplicity issue with the value of one of the dimension attributes early in the effort, prior to development. Consequently, quite a bit of time was spent addressing the problem, but I did not want to commence any coding until the issue was resolved because of the potential impact on the target model and downstream processes.
Soon enough, the week for code delivery came and the PM grew anxious about the perceived lack of progress, to which I explained that ETL is 50% analysis, 30% design and 20% development. The fact of the matter is that in a data integration effort, analysis and design normally dovetail. Design ideas pop up in the course of analyzing the source data. More often than not, what is required at the end of analysis is only a formalization of the design, since its components are already present by then.
I ended up completing the design the same day the PM voiced his "bewilderment" and finished the majority of the coding two days later, all within the planned delivery week. Granted, it was a small project, but there were most definitely challenges in sourcing the data.
Whereas my sense of it was primarily anecdotal on prior occasions, this experience ingrained in me the intuition that the bulk of data integration work is analysis and design. Properly conducted, analysis and design can greatly facilitate the entire coding process; that is, development itself is not as burdensome a phase as most professionals are led to believe, especially when employing highly productive data integration tools.
Do as complete a data analysis upfront as possible, resist the urge to start coding early, and you will increase your odds of on-time delivery.
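To make the upfront analysis a little more concrete, here is a minimal sketch of the kind of check that can surface a multiplicity issue before any ETL code is written. It assumes "multiplicity" means one natural key carrying more than one distinct value for a dimension attribute; the field names (`customer_id`, `region`) and the sample rows are hypothetical and not taken from the engagement described above.

```python
from collections import defaultdict


def find_multiplicity(rows, key_field, attr_field):
    """Return {key: sorted distinct values} for keys that carry
    more than one distinct value of the given attribute."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_field]].add(row[attr_field])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Hypothetical source extract: customer 1 appears with two spellings of region.
source_rows = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 1, "region": "EAST"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 3, "region": "North"},
]

conflicts = find_multiplicity(source_rows, "customer_id", "region")
print(conflicts)
```

Running a profile like this against each candidate dimension attribute during analysis is cheap, and any key it reports is a question to settle with the data owners before the target model is finalized.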
Thursday, October 30, 2008