posted 25 May 2007 in Volume 10 Issue 8
Workshop: Master-data management
Managing master data, part II
Having recognised the need for master-data management, what is the appropriate architecture for a company?
By Mike Fleckenstein
[This article contains references to illustrations that cannot be reproduced with our current web technology. Please accept our apologies for this shortcoming and e-mail the editor, firstname.lastname@example.org, for a full PDF version if you would like to read it in its original print form]
Most companies today have anywhere from a handful to thousands of transaction data stores (TDS) in which daily business processes are recorded. Examples include financial systems, marketing systems and operational systems, such as order entry. These systems are often developed separately and, as a result, contain the same data with inconsistent definitions. This is particularly true for companies that have undergone mergers or acquisitions. Additionally, companies undergo constant change. This adds to data structure inconsistencies over time and between systems. Integrating these systems to glean operational, historical, future and summary data is a challenge for any company.
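As a small, invented illustration of the problem: the same customer recorded in two independently developed systems under different schemas cannot be matched by a naive comparison, while even a crude normalisation reveals it is one entity. The system names, field names and values below are hypothetical.

```python
# The same real-world customer as recorded in two invented source systems.
finance_rec = {"cust_name": "ACME LTD.", "cust_id": "F-0042"}
marketing_rec = {"company": "Acme Ltd", "key": 9917}

# A naive equality check finds no overlap between the two systems...
assert finance_rec["cust_name"] != marketing_rec["company"]

# ...while even a crude normalisation shows they are the same entity.
def normalise(name):
    return name.lower().rstrip(".")

assert normalise(finance_rec["cust_name"]) == normalise(marketing_rec["company"])
```

Real matching logic is far more involved (addresses, fuzzy matching, survivorship rules), but the mismatch in keys and formats is the root of the reporting inconsistencies described above.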
Master-data management (MDM) addresses consistent data entry, a single system of record and the (data-model) evolution of key corporate data items. Examples include customer, product and location. Architecting the right solution depends on a company’s use of data.
Data management in the past
It is not uncommon for even small companies to have hundreds of reports with conflicting information. This problem begins with the multitude of systems typically present in a company that process daily business. These systems often have limited overlap and have been independently developed, resulting in data inconsistencies. To address inconsistencies across such transactional data stores, companies integrate information in a number of ways, including:
- Batch reporting – these reports are run nightly to have minimal impact on the production system; the biggest drawback is the latency of available data;
- Replication – replicating systems allows more real-time reporting but only integrates the data within a given report rather than integrating it physically;
- Federated reporting – this approach creates a virtual, integrated view of the data against which multiple reports can be written; however, data physically still resides in the source systems;
- Operational data store – this is a low-latency, physical data store that integrates transactional data from multiple systems; this integrated data can be used to build new applications and feed downstream systems;
- Enterprise data warehouse – this is a physical data store that contains a history (and sometimes summary) of transactional data for analytical reporting.
All these approaches enable companies to report on integrated data. Each approach offers benefits, and which one gets deployed depends on how the data is used. For example, storing transactional data in an operational data store reduces latency when that data is required by downstream applications. However, it does not lend itself to historical and summary analysis.
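The distinction between federated reporting and a physical operational data store can be sketched in a few lines, with plain Python structures standing in for two transactional systems. All names and schemas here are invented for illustration: a federated view maps the sources into a common shape at query time, while an ODS materialises those same rows into its own store.

```python
# Two stand-in transactional systems holding the same entity, 'customer',
# under different field names (both schemas are invented for illustration).
order_entry = [{"id": 1, "name": "Acme Ltd", "city": "Leeds"}]
billing = [{"cust_no": 7, "cust_name": "Bolt plc", "town": "York"}]

def federated_view():
    """Virtual integration: map each source into a common shape at
    query time; the data still resides only in the source systems."""
    for r in order_entry:
        yield {"customer": r["name"], "location": r["city"], "source": "order_entry"}
    for r in billing:
        yield {"customer": r["cust_name"], "location": r["town"], "source": "billing"}

def load_ods():
    """Physical integration: materialise the same mapped rows into a
    separate, low-latency store that downstream systems can read."""
    return list(federated_view())

ods = load_ods()
order_entry.clear()   # a federated view would now lose these rows...
assert len(ods) == 2  # ...but the ODS still holds its own copy
```

The trade-off shown in the last two lines is exactly the one described above: the federated approach avoids duplicating data but depends on the sources being available, while the ODS carries its own physical copy at the cost of an extraction step.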
Ultimately many companies feed integrated data to an enterprise data warehouse (EDW). This enables organisations to perform strategic and tactical analysis, also called business intelligence (BI), on historical and summary data. Figure one illustrates how integrated data is fed to the enterprise data warehouse via an operational data store (ODS). Ideally, the data in each source system is kept in sync. More realistically, it is integrated before being extracted to the operational data store.
Depending on the company’s use of data (operational versus analytical), transactional systems can also feed the EDW directly. This is suitable for a company with no real-time need for integrated, transactional data, but which still wishes to perform historical and summary analysis against the data. In either case, data must be integrated via an extraction process from the TDS before being deposited either in the ODS or the enterprise data warehouse.
In Figure one, master-data entities are individually managed in transactional systems. Often the same entities, such as ‘customer’, are managed independently in multiple transactional systems and then integrated into the ODS and EDW. This type of data integration works well for entities where the definition and relationships remain consistent. However, when looking at the definition of master entities such as customer, product and location, it is easy to see that these can change dramatically over time, especially due to mergers and acquisitions. Operational data stores, by design, store current business transactions and are therefore ill-suited to storing the evolution of these master-data entities and their changing relationships. Enterprise data warehouses do store data history but are equally ill-suited to this task. Reasons for this include:
- Data warehouse data models are not designed to track the evolution (and the associated evolving relationships) of master-data entities. Rather, they are designed to lend themselves to a much broader set of queries assessing trends;
- Data warehouses lack the ability to dynamically query and write back to the source. They are updated each day, week or month for the sole purpose of providing query access.
Data management in the future
Master data refers to key business entities within an organisation.
The data model for these entities, as well as the relationships between them, both evolve over time. This is intensified by company mergers and acquisitions. Companies are realising that these entities, their evolution and the evolution of the relationships between them must be tracked in order to get an accurate data picture. To accomplish this, companies can segment master data from other transactional and historical data. How this is done depends on whether the need for master data is operational or analytic.
Companies that have a need for operational master data will want to integrate this data into an operational master data store (OMDS) separate from their ODS. Figure two illustrates this concept. In a fully compliant enterprise MDM environment, the system of entry and the system of record are the same system. This eliminates data redundancy and improves data quality and consistency. More realistically, master data is maintained in a disparate set of transactional systems and integrated into the OMDS in the same way transactional data is integrated into the ODS. Companies can and should move progressively towards making the OMDS both the entry point and the system of record for master data. In either case, master data is dynamically written back to the transactional data stores.
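The write-back behaviour that distinguishes an OMDS from an ordinary ODS can be sketched as follows. This is a minimal, hypothetical model, not a vendor implementation: the class and field names are invented, and real hubs add matching, survivorship and security on top of this pattern.

```python
# A minimal sketch of an operational master-data store (OMDS) acting as
# the system of record and writing changes back to the source systems.

class TransactionalStore:
    """Stand-in for a source system with its own local customer table."""
    def __init__(self, name):
        self.name = name
        self.customers = {}   # local key -> record

class OMDS:
    def __init__(self):
        self.master = {}      # master id -> consolidated record
        self.links = {}       # master id -> [(store, local key), ...]

    def register(self, master_id, record, links):
        """Consolidate a record and remember which source rows it maps to."""
        self.master[master_id] = dict(record)
        self.links[master_id] = links
        self.write_back(master_id)

    def update(self, master_id, **changes):
        """Change master data once; propagate it to every source system."""
        self.master[master_id].update(changes)
        self.write_back(master_id)

    def write_back(self, master_id):
        for store, key in self.links[master_id]:
            store.customers[key] = dict(self.master[master_id])

orders = TransactionalStore("order_entry")
billing = TransactionalStore("billing")
hub = OMDS()
hub.register("C-1", {"name": "Acme Ltd", "city": "Leeds"},
             [(orders, 1), (billing, 7)])
hub.update("C-1", city="York")   # one change, written back everywhere
assert orders.customers[1]["city"] == billing.customers[7]["city"] == "York"
```

The final assertion captures the point of the architecture: because the OMDS is the system of record, a single update keeps every transactional store consistent, rather than each system drifting independently.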
Organisations that have a need for historical analysis of master data will want to deploy a historical master-data store (HMDS). This can be done by using the OMDS as a source for the HMDS, much as the operational data store acts as a source for the enterprise data warehouse. Alternatively, a company can deploy a hybrid MDS that houses both operational and historic master data. Figure three illustrates this concept. A subset of master data could be extracted into the data warehouse for ease of access or performance.
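The history tracking an HMDS provides can be sketched with effective-dated versions of a master entity: each change closes the current version and opens a new one, so the entity and its relationships can be queried as of any date. This is a hypothetical illustration (the class, field names and dates are invented), but the versioning pattern is the essence of what a warehouse data model, as noted above, is not designed to do for master entities.

```python
from datetime import date

class HMDS:
    """Minimal historical master-data store: effective-dated versions."""
    def __init__(self):
        self.versions = {}   # master id -> list of versioned records

    def record(self, master_id, attrs, effective):
        """Close the current version (if any) and open a new one."""
        history = self.versions.setdefault(master_id, [])
        if history:
            history[-1]["valid_to"] = effective
        history.append({**attrs, "valid_from": effective, "valid_to": None})

    def as_of(self, master_id, when):
        """Return the version of the entity that was valid on a given date."""
        for v in self.versions[master_id]:
            if v["valid_from"] <= when and (v["valid_to"] is None or when < v["valid_to"]):
                return v

hmds = HMDS()
hmds.record("C-1", {"name": "Acme Ltd", "parent": None}, date(2005, 1, 1))
# An acquisition changes the entity's relationships, not just a field value.
hmds.record("C-1", {"name": "Acme Ltd", "parent": "Bolt plc"}, date(2006, 6, 1))

assert hmds.as_of("C-1", date(2005, 12, 31))["parent"] is None
assert hmds.as_of("C-1", date(2007, 1, 1))["parent"] == "Bolt plc"
```

Because each version carries its validity interval, a query can reconstruct the corporate structure as it stood before or after the acquisition, which is precisely the evolution of entities and relationships that an ODS or EDW does not capture.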
According to an ARC Advisory Group study, ‘Master Data Management Worldwide Outlook’, the MDM software market in 2006 was worth $680m and is forecast to grow to about $1.35bn in 2011 – a market that is not just significant but growing fast.
The MDM market is currently served by two types of product vendors. Data-hub providers, on the one hand, offer narrow, vertical systems for a single master-data-entity type, such as ‘customer’. These products have been developed by the major vendors, such as IBM, Siebel (now part of Oracle) and Teradata. They are typically of the hybrid MDS variety and accommodate both operational and analytic master-data management.
Alternatively, more universal MDM platform providers such as Kalido, Purisma, Siperian and others have put forward solutions that allow for the definition of a much broader set of master-data entities. These products can be limited in terms of query capability, end-user interface and performance. Custom-coded solutions offer the greatest flexibility, naturally.
The increasing focus on MDM has led, in some cases, to framework constructs in custom-coded solutions that enable them to be deployed more rapidly. However, these solutions may be difficult to maintain as the number of master-data entities, and changes to their underlying model, increase.
Choosing the right product depends on the role of master data within an organisation. If it is most critical for a company to integrate customers worldwide, for example, then a single-entity data-hub solution is ideal. If the number of master-entity types is likely to be two or more and if those entity types are subject to frequent model changes, then a more broad-based MDM platform makes sense. Custom-coded solutions apply any time the underlying master-entity model remains stable. If that is a given, then they may also be cost-effective, relative to data hubs, when the number of master entities to be managed is greater than one. Figure four illustrates these points.
Vendors in each area are focusing on incorporating more features in their products. Data-hub vendors, for example, are focusing on the ability to incorporate a broader spectrum of master-data entities, while broad-based MDM solution providers are improving performance. Additional areas of focus include integration of business process management, which helps discipline data entry and allows analysis of data at specific points in a process.
Master-data management enables organisations to consistently define, enter, store and manage key business entities on an enterprise-wide level. Master-data stores and their respective applications are specifically designed to incorporate master entity (data model) evolutions and to dynamically write back to the source systems. Neither an operational data store nor an enterprise data warehouse is designed for these functions. Organisations that have a need for operational business intelligence should consider a separate operational master-data store. If there is no need for operational business intelligence, organisations should consider a historical master-data store which tracks the evolution of master-data entities and their relationships over time. It is also possible to combine these into a hybrid master-data store.
Master-data-management software vendors can be divided into two groups. On the one hand, there are comprehensive solutions for managing single master-data entities, such as ‘customer’. On the other, there are providers that can accommodate a broad set of master-entity types. Custom-coded solutions present a third option. Which is best for an organisation depends on what types of master-data entities, and how many, an organisation wants to manage.
Mike Fleckenstein is principal analyst at Project Performance Corporation (PPC). He has 20 years’ experience developing and deploying data solutions for public and private sector clients around the globe, focusing on the insurance industry for the past six years. He currently leads the insurance practice at PPC (www.ppc.com) using best-of-breed technologies and solutions for data warehousing and master-data management, among others. Prior to joining PPC, Fleckenstein served as application manager at Medmarc Insurance and ran his own IT consulting firm, Windsor Systems Inc, specialising in IT and data solutions.