What is a Data-Centric Development Project?

By FSFP

This is the first article in a two-part series on data-centric projects and our Data-Centric Development Life Cycle (DCLC) methodology. DCLC brings a new approach to address the unique needs of data-centric projects such as Data Lake, Big Data and Business Intelligence (BI)/Analytics initiatives.

Data-Centric Projects Defined

When computers were first widely adopted by enterprises in the mid-1960s, the focus was on automating processes, particularly books and records. These processes were manual and well understood by the business. At that time, people considered data merely a byproduct of computerized systems, if it was considered at all.

By contrast, many of today's development projects are not trying to automate processes but to distill valuable information from pre-existing production data. These are data-centric projects: they begin with data that is already available, rather than with a need for automation. They are the projects taking center stage today. Enterprises, at last, understand the value of data and see the success of companies that put data at the core of their business models. The result is a surge in data-centric projects; however, success with them can be elusive.

Do data-centric projects exist in your organization? Do they produce the intended outcomes? If not, it’s likely your project management methodology is not aligned to data-centric projects.

Traditional Project Methodology Flaws

While hard statistics are difficult to come by, there are widespread anecdotes about data-centric projects failing to meet the expectations set for them. Data warehouses, data marts, data lakes and environments for BI/Analytics, predictive modeling and Big Data often seem to have issues. Some of these projects can be expensive, requiring enormous investments in hardware and software even before any development work begins. This takes the focus off data and puts it on technology. Once this happens, traditional IT concerns take over and expectations start to grow that, once the infrastructure investment has been made, success is inevitable — and it is a simple matter of using what was purchased. Incredible as it may sound, a number of organizations have spent millions of dollars in this way and have very little to show for it.

Typically, data-centric projects are started and planned using either the traditional Systems Development Life Cycle (SDLC or Waterfall) methodology or Agile. Both methodologies are more attuned to process-centric projects than to data-centric ones. They presume that requirements can be specified at the beginning of a project (or a sprint or epic in the case of Agile). This may be true for processes that have business owners, but business users often don’t understand the source data that must be the starting point in any data-centric project. In fact, they may not know where to get the data, or whether it exists at all. All too often, development work goes ahead, and only at a later stage do the users discover that the outputs are wrong or not what was intended. The users blame the development staff, and the development staff blame the users for not providing sufficiently clear and detailed requirements.

In our work with clients of all sizes and industries, we regularly see problems with data-centric initiatives. We decided there has to be a better way to assure success with these projects, and that led us to create the Data-Centric Development Life Cycle (DCLC).

The DCLC Methodology

DCLC is First San Francisco Partners’ methodology for undertaking data-centric projects, such as data lake, Big Data and BI/Analytics initiatives. It is based on the premise that data-centric projects are different from process-centric projects, and that process-centric methodologies will never fully meet the needs of a data-centric initiative.

DCLC highlights the following:

  • Business users cannot give specific requirements at the outset of the project. No matter how much IT demands these inputs, the users do not know the details of the source data. IT has to be a true partner to them and help the users to gain this knowledge. Only then can the users provide the business analysts with what they need to capture requirements in sufficient detail to allow IT to begin development.
  • Finding the best source of data is not simple. It should not be done by IT staff searching for what they think is the right source. Frequently, this is simply based on who they know on other application teams.
  • Source data analysis, a complex set of tasks that need to be carried out early on, is critical for project success. The right staffing is needed for these tasks.
  • Legal, compliance and regulatory issues affect the permitted use of data. These issues need to be clarified before outputs are developed.
  • Data quality has to be tackled from the start. It is especially important in the testing phase, where data quality work generates deliverables that travel to production.
  • Data knowledge or understanding that is generated during the project is another production deliverable. This knowledge will help future users gain value from the data the project works with.
  • Many activities can be done in parallel. This puts more responsibility on the project manager to coordinate the project, rather than overseeing transfer of work from one group to another within the project.
  • There are more stakeholders than just the project sponsors. For instance, since data is being produced, it should be available for analytics. Data scientists and others may have suggestions that can easily be accommodated.
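
To make the points about source data analysis and data quality more concrete, here is a minimal Python sketch of the kind of column profiling an analyst might run early in a project. It is an illustration only and not part of the DCLC methodology itself: the function name profile_column, the column names and the sample rows are all hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarise one column: row count, missing values, distinct values, top values."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing (an assumption for this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "column": column,
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


# Hypothetical extract from a source system (illustrative data only).
sample_rows = [
    {"customer_id": "C001", "country": "US"},
    {"customer_id": "C002", "country": "us"},
    {"customer_id": "C003", "country": ""},
    {"customer_id": "C003", "country": "DE"},
]

for name in ("customer_id", "country"):
    print(profile_column(sample_rows, name))
```

Even a summary this small surfaces questions that users and analysts can work through together, such as why a customer identifier repeats or why country codes appear in mixed case. Findings like these can feed both the evolving requirements and the data knowledge that travels to production.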

Benefits of DCLC

DCLC reduces risks on data-centric projects. One way it does this is by recognizing that information requirements must evolve. Traditional methodologies expect all requirements to be gathered at the beginning of the project and carried through development to User Acceptance Testing before the users see the outputs. All too often, the users then say this is not what they wanted, and the developers blame the users for not providing specific requirements. With DCLC, tight cycles of iteration in the early phases of the project allow users to refine information requirements as the source data is understood. This also creates synergy between the users and analysts, as they are jointly responsible for the information requirements.

Another benefit of DCLC is the ability to have more activities in parallel. Both Waterfall and Agile conceive of linear flows of activity, but in DCLC there are parallel tracks — particularly in terms of beginning work on the target data store while analysis is being performed on the source data. This can speed up the project.

Perhaps one of the greatest benefits of the DCLC methodology is that it calls out all the specific data-centric roles on the project. Traditional roles in IT have persisted through the shift from process-centricity to data-centricity and have often been left unchanged. This has resulted in mismatches between how data-centric projects are staffed and what they actually require. This, too, has contributed to risks on these projects. With DCLC, the roles are specifically data-centric, which helps ensure proper staffing of data-centric projects.

Part two of this series covers organizational readiness for DCLC, integrating data-centricity and process-centricity and getting started with DCLC.



Article contributed by Malcolm Chisholm. He brings more than 25 years’ experience in data management, having worked in a variety of sectors including finance, insurance, manufacturing, government, defense and intelligence, pharmaceuticals and retail. Malcolm’s deep experience spans specializations in data governance, master/reference data management, metadata engineering, business rules management/execution, data architecture and design, and the organization of enterprise information management.
