
3. Failing to Involve the Business People
Data warehouse projects are notoriously dynamic. That can be good or bad. Usually, it is perceived as bad because requirements change constantly; the scope is hard to control; the timeframes for delivering applications are unreasonably short; the data is usually dirtier than expected; business people are hard to pin down to provide business rules for data cleansing; project team members are often unclear about their specific roles and responsibilities, etc.
On the other hand, dynamic projects can be good because business people have an opportunity to learn about any new technology or tools early on. They also have an opportunity to “play” with their requirements and adjust them as they learn more about the capabilities and limitations of the data warehouse. IT folks can experiment with different database and application designs, and they can negotiate the project scope and delivery time to be more realistic depending on the difficulties they encounter. They can also profile the source data early and show all the data defects to the business people for resolution or deferment.
It should be obvious that dynamic projects have to be set up differently from traditional projects if the dynamics are to have a positive impact on the project team or the project schedule. The differences include adoption of a rapid development approach similar to prototyping, acceptance of the software release concept, a self-organizing SWAT team, and full-time involvement of business people in project activities. This is a paradigm shift for how applications are developed, and only a few business and IT people are comfortable with it. In contrast, organizations that successfully practice “extreme programming” techniques understand the benefits of this new approach because the prerequisites for extreme programming are the same (rapid development, software releases, SWAT teams, and participation from business people).
Why is it so important for business people to participate and what would they be doing on the projects? The most important reason business people should participate is to speed up the development work. It is a common complaint among IT people that situations come up several times a week for which they need input from the business people. But business people don’t often make themselves available. And when they do, it can be too late, especially when weeks have passed and IT has already assumed how best to resolve the situation. Sometimes IT guesses incorrectly, leading to rework, which can impact the project schedule. When combining these situations with changing business requirements that are not subjected to rigorous impact analysis, the result is a frustrated IT team and unhappy business users who don’t understand why IT takes so long to deliver the application. Business people must “live” the projects alongside IT in order for the entire project team to be more productive. Hence, business people must make time to become participating members of data warehouse project teams. They must participate in project planning, perform impact analysis on their own requirement (scope) changes, remove business-related road blocks (like data disputes between business areas), and perform other business-related project activities such as:
Business people must be told that they are an invaluable and indispensable part of an effective data warehouse project team because they possess certain knowledge and authority their IT counterparts don’t have. In addition, business people understand the severity and monetary impact of their organization’s business problems, and they are the only ones with the position and authority to negotiate the priorities of data warehouse projects.
4. Failing to Have Application Releases
The deliverable of a data warehouse project is usually a fully operable data warehouse application with a lot of functionality and a lot of data. The source data typically resides on multiple (and often heterogeneous) operational files or databases, which adds to the complexity of integrating the source data. The amount of data redundancy, data inconsistency, and data defects is habitually underestimated. Project teams who did not plan to spend the majority of their time on data cleansing are caught off guard—especially because they are expected to implement the fully functioning data warehouse application in an extremely short timeframe. In short, many project teams bite off much more than they can chew (i.e., their scope is much too large for their deadline).
It has been said for years that “you cannot build a whole data warehouse [environment] in one big bang.” Nobody challenges that anymore. But that does not go far enough in the attempt to reduce scope and complexity to a manageable chunk of work. Therefore, the new mantra should be: “you should not build a data warehouse application in one big bang.”
Following the principles of extreme programming, extreme project management, extreme methodologies, we should also adopt extreme scoping. Extreme scoping means reducing the complexity, and thus, the scope of each project in order to deliver something to the business users in a very short period of time. The first something would equate to only a fraction of the requested fully functioning application, but more functionality and more data would be added rapidly with each subsequent application release. Many business users still balk at this approach and insist on the “minimum required” deliverable, saying that anything less is of no use to them. They don’t realize that not only are they not losing anything, but they are gaining a lot with application releases.
Building an application in small iterations (application releases) will not take any longer than building the whole enchilada at one time. In addition, the business users can see their application grow and catch mistakes or adjust their requirements if needed (under strict change control procedures). This will greatly enhance the quality of the final fully functioning application. Another benefit of this approach is the opportunity for business people to slowly become familiar with their data warehouse application and any new technology or tools before the application is even completed.
Maybe the best way to illustrate the effectiveness of extreme scoping with application releases is to recall the concepts of prototyping. In prototyping we focus on a small (partial and incomplete) scope that is not too complex, and we produce a not yet fully functioning application. The next prototype release includes another small portion of the overall scope with a little more complexity and a little more functionality. This process can be repeated until the application is fully functioning. Although application releases are based on the same concepts as prototyping, they are not equal to prototyping. The difference is that traditional prototyping is pure ad hoc development, whereas application releases demand all necessary project activities to be performed with the rigor of a methodology.
Scoping data warehouse projects will remain a struggle as long as we are married to the idea that a project must produce a fully functioning application. But if we use application releases to build data warehouse applications, controlling the scope becomes much easier because the complexity of each release is reduced, the number of activities performed is decreased, and the project team can be smaller, rendering the methods for controlling the project more effective.
Most project managers create a short document that describes their project in terms of high-level requirements, users, schedule, resources, and budget. This document is often known as a document of understanding, project agreement, scope agreement, or project charter. Frequently, this document is created by copying an old document from a previous project and changing a few details here and there. Once the project is kicked off, this document disappears into a project manual—never to be seen again.
A well-thought-out project charter is a very useful instrument and should be used actively to monitor and control project activities during the entire development cycle of a data warehouse project. Therefore, the project manager and the business user, or business sponsor, should spend some time documenting the details of their agreement in this charter. While a detailed project charter can contain as many as 20 sections, the following four sections are the most useful to serve as a baseline for change control:
Traditionally, scope has been measured by the number of functions the system will perform (function point analysis)—a sure way to underestimate effort, budget, and resources. Data warehouse applications are data-intensive, not function-intensive. Therefore, scope must be measured by the number of data elements that have to be extracted from the source systems, transformed and cleansed, and loaded into the data warehouse target databases.
Every project is subject to some risks—risks are unavoidable. Such risks could severely affect the project schedule as well as the project deliverables, depending on the likelihood the risks will materialize and the impact they would have on the data warehouse project. The project manager must identify triggers for each risk and incorporate a risk mitigation plan as well as a contingency plan into the project program.
An assumption is anything taken for granted; a supposition or a presumption. It’s important to document assumptions because a wrong assumption could very quickly turn into a risk. Important assumptions should always have counterpart risks, in case the assumptions either turn out to be false or do not materialize.
All projects are subject to the four constraints of scope, effort (time), budget, and resources (capable and available people). In reality, there is a fifth constraint: quality. Although quality is a measure of how well the deliverables meet the requirements, it can also be considered a constraint that must be balanced with the other constraints because higher quality requires more effort and therefore more time to deliver.
The project charter should record the agreed-upon scope negotiated under the stated risks, assumptions, and constraints. If any of these components change, the entire project has to be reevaluated and renegotiated, and the changes should be reflected in the revised project charter so it can be used as a baseline for the next change request.
Unfortunately, business managers and IT managers frequently put their project teams under unwarranted pressure to incorporate scope changes without ever referring back to the original agreement in the project charter, and also without performing the necessary impact analysis.
Many project managers attend data warehouse conferences and learn best practices for planning, designing, and implementing data warehouses. But, when they return to their organizations and try to apply the best practices, they often encounter resistance from business users and “uninitiated” IT members. Regardless of how hard the project managers try to educate, convince, or force those who oppose them to follow best practices, they are often unsuccessful.
The mistake these project managers make is not realizing their organizations aren’t ready to suddenly stop building traditional stand-alone systems and begin building an integrated data warehouse environment. Some organizations are bent on trying every shortcut and silver-bullet solution before they admit those solutions only add to their data chaos.
Therefore, at the beginning of the initiative, an organization’s readiness should be assessed (understanding, ability, and willingness). A readiness assessment would include the following questions:
1. Have the goals and objectives been defined?
2. Do the goals and objectives for the data warehouse map to those
of the organization?
3. Has the source data been inventoried and modeled?
4. What is the quality of the source data?
5. Are the skills in place to build and support the data warehouse?
6. Is an adequate budget in place?
7. Has supporting software (ETL, cleansing, DBMS, etc.) been
selected and installed?
8. Is there a strong, well-placed, and reasonable business sponsor?
9. Does the business sponsor understand that a data warehouse is
not a stand-alone system?
10. Are the primary business users computer literate?
11. Are the business users’ expectations realistic?
12. Do the business users understand they have to participate in
project activities?
13. Does the business sponsor accept the approach of building
applications iteratively (using the software release concept)?
Based on the assessment, the project manager can determine which best practices to implement. Periodically, the questionnaire should be re-distributed to gauge the organization’s understanding of data warehousing. At that time, more best practices can be incorporated.
Data warehouse testing is often done poorly. It’s unacceptable—and so is the excuse that “it can be fixed in the next release.” If it takes too long to test the data warehouse application properly now, it will take even longer to test it later because the next release will be larger and more complicated.
The same types of testing activities that apply to operational systems also apply to data warehouse applications, including unit testing, integration (systems) testing, performance (stress) testing, quality assurance testing, and user acceptance testing.
Unit testing refers to the testing of discrete program modules and scripts. Every developer must test his or her program modules and scripts individually.
Integration testing tests the complete process. The interactions and flow of all programs must be observed and validated. Every time actual test results do not equal the expected test results, the program producing the error must be corrected, and all programs must be rerun.
The most complicated and time consuming type of testing is regression testing. The main goal of regression testing is to ensure any modifications to existing programs did not inadvertently produce errors.
Performance testing, also known as stress testing, is performed to predict system behavior and performance. Performance testing can be limited to only the most critical program modules with the highest volumes of data and the longest runtimes.
Most large organizations have strict procedures for moving applications into production. These procedures usually include QA testing, at which time the operations staff goes through a simulated production run before allowing the application to transfer to the production environment.
Acceptance testing is done by business users. They validate the functionality of the data warehouse application.
With the possible exceptions of unit and performance testing, all other testing activities are controlled by a test plan. The bulk of the plan will be a list of test cases. Each test case specifies the input criteria and the expected output results for each run. It also describes the program logic performed and the appearance of the resulting data.
8. Underestimating Data Cleansing Efforts
Most organizations admit they are not paying sufficient attention to their data quality. As disheartening as the situation is with operational source systems, it is very discouraging to see that many IT and business managers are continuing to put pressure on project teams to build data warehouses quicker using the motto: “There’s never enough time to do it right, but always enough time to do it over.”
With project schedules shrinking and project scopes expanding, project managers are under the gun to deliver more in less time. Therefore, they habitually do not allocate enough time for source data analysis, business rule discovery, data cleansing, data reconciliation, and ETL testing. As a result, two things happen: (1) many data defects propagate into the data warehouse unnoticed, and (2) some dirty data is discovered too late when data exceptions are caught during ETL testing or while loading the data warehouse databases.
To avoid project delays, the project manager should build sufficient time into the project plan to profile each data element. Common data violations to look for include:
Metadata is nothing new—it has always been a part of automated systems. It was found in system documentation, record layouts, database catalogs, and data declaration sections in programs. In fact, it used to be called the dirty “D” word—documentation. Since technicians detested the thought of doing documentation, they often simply didn’t do it.
However, in a data warehouse environment, metadata takes on a new level of importance. Since one data warehouse objective is to eliminate inconsistencies, data must be standardized. Standardization may result in renaming the data, splitting one source data element into multiple target columns, or populating one target column from multiple source data elements. It can also mean translating codes into mnemonics, standardizing (changing) data values, and filtering out inappropriate or invalid records. At the end of the day, business people will not be able to reconcile their operational source data to the data warehouse data unless they have the ability to trace these changes.
Therefore, metadata is now the nice “N” word—navigation. It helps the business people locate, manage, understand, and use the data in the data warehouse databases. It describes what data is available in which database, what it means, where it came from, how it was processed, how clean it is, and how it is used in reports and queries. Not delivering any metadata that could help business people navigate through their data warehouse environment is a mistake.
Granted, implementing a metadata repository has its challenges. Although many data warehouse experts consider metadata to be the “glue” holding the warehouse environment together, most organizations allocate little or no money for creating and maintaining a metadata repository. Also, metadata should be “living” documentation that is constantly updated, which means—at a minimum—one metadata administrator must be assigned full time to manage the metadata repository. But regardless of the challenges, metadata must be an integral part of a data warehouse environment.
And, since the importance of metadata is still not understood by most business executives, project managers need to do a much better job of promoting metadata and communicating its benefits to the business people.
10. Being a Slave to Project Management Tools
Planning and controlling a project is not a trivial task. It takes a long time to create a work breakdown structure, estimate effort and duration time for all tasks, apply task dependencies and resource dependencies, and determine the critical path. Critical path refers to a string of dependent tasks that cannot be late without affecting the project schedule, as compared to other tasks executed at the same time with slack time built into them. For example, if it takes four days to evaluate a product but only three days to create a project charter (at the same time the product is being evaluated), then the task of evaluating a product is the critical path because it has no leeway in timing. If that task is a day late, the project schedule is impacted.
Knowing where the critical path tasks are at any point during a project is crucial to staying on track. Since estimates are only best guesses based on prior experience with similar tasks, the actual time it takes to complete a task usually differs from its estimate. These differences can easily change the critical path. Tracking the differences between estimated and actual time, and continuously adjusting the critical path has enslaved more than one project manager to his/her project management tool. Other project managers find this activity too laborious and too tedious, and they stop tracking the critical path altogether.
The best approach (and compromise) for tracking the critical path is to do it at the milestone level rather than at the task level because there are fewer milestones than there are tasks. Thus, it takes less time and effort to continuously adjust the project plan. The critical path among the tasks between the milestones can be tracked more informally using a whiteboard or a flipchart instead of the project management tool. This approach can be safely and effectively used when the scope of the data warehouse project is very small and the project team is managed by a self-organizing core team (SWAT team) rather than by
a project manager who is not involved in daily project activities.