posted 1 Aug 2007 in Volume 10 Issue 10
Managing master data, part IV
In today’s data-integration mix, where does master-data management fit in?
By Mike Fleckenstein
[For a full PDF of the article or all articles in the series, including all graphics, please e-mail the editor, Graeme Burton. We will turn around requests within 24 hours. Apologies for any previous requests for PDFs that may have gone ignored.]
As the world’s economic interactions have become more electronic, so too has the need to integrate data. After all, how can systems communicate consistently unless they agree on what each element of data refers to in each application? How can transactions be accurately recorded if the relevant data is inconsistent, incomplete or inadequately described?
Many technology concepts and offerings have been used to address data integration. This final article in the series explores some of these concepts: what they are and how they relate to master-data management (MDM).
We will examine how traditional approaches to data integration are developing: offerings in the areas of data cleansing, data extraction and transformation, and data warehousing have been around for some time. But how are they evolving and where does MDM fit in? We will also examine some more recent concepts. For example, we’ve all heard about the service-oriented architecture (SOA), enterprise-application integration (EAI) and enterprise-information integration (EII). But how do they relate to master-data management?
In some respects, better data integration is being forced upon us; regulatory guidelines and mandates provide a case in point. Examples include the Basel II Accord, which is intended to improve risk management in the banking system, and, in the US, the Sarbanes-Oxley Act, which requires publicly listed companies to report detailed financial data, with the CEO’s head on the line if they get it wrong.
Additionally, many companies realise that they need to better integrate data in order to improve their own business performance and to serve customers better. Obviously, few organisations can afford to replace all of their application software at once – it must be done one system at a time and the more new applications that can ‘plug and play’, the better. It is simply cost-effective to minimise data-integration efforts in this way because custom-coded integration is expensive to achieve and poses ongoing maintenance and upgrade challenges, too.
In previous articles we’ve explored everything from candidates for MDM to architectural approaches for housing master data. We also looked at some real-life case studies. In part IV, we will examine different approaches to data integration available in the industry as a whole and their relationship to MDM.
MDM and the data warehouse
Historically, one common approach to integrating data – normally for the purpose of analysis – has been to construct a data warehouse and to export a copy of the data there. In a traditional data warehouse, data from several different sources is extracted, cleansed and transformed into an appropriate format, then loaded into a single database schema.
Bill Inmon, sometimes referred to as ‘the father of data warehousing’, stated that a data warehouse must be subject-oriented (that is to say, real-world relationships should be reflected in the data structure); time-variant (changes are tracked over time); non-volatile (data is never over-written) – and integrated (the data from multiple applications must be consistent).
Typically, the data in data warehouses is batch-loaded, normally every night at the close of business. Such data warehouses are very useful for analysing historic trends over time and projecting trends going forward.
However, data warehouses are evolving and becoming more dynamic. There are a number of reasons for this. First, continual improvements in compression technology enable much more data to be stored, reducing hardware and input/output costs.
Second – and related to the first – is the proliferation of unstructured data (such as documentation, letters and invoices) and the need to incorporate this information in a data warehouse, too.
For example, in addition to line-by-line details of transactions, an organisation might also need access to e-mails, notes or other related information. Data warehouses, such as IBM’s DB2 Viper, now accommodate access to unstructured data by making it searchable and integrating it side by side with structured, relational-database data. Furthermore, some vendors, such as Sybase, which is particularly strong in the financial-services industry, are delivering new data-indexing techniques to make data access very fast. In a nutshell, all these offerings are helping to make data warehouses larger, more comprehensive and faster to access.
Another development is more tightly integrated data. The need for tighter coupling of data is a natural outcome of having more data in one place, since only then can the data be viewed in its proper context. Naturally, regulatory mandates and guidelines – such as Sarbanes-Oxley in corporate America and Basel II in the European banking community – have forced a tighter coupling of data as well.
Finally, companies are seeking faster feedback on changes in data. While batch updates serve historical, analytical purposes well, they are insufficient for tuning operational processes, which increasingly must respond in real time. Think of the benefit a retailer gains by realising that it is running low on widgets while the consumer is checking out; that information can be instantly integrated and reported to alert the supplier. The desire for near real-time access to data stems from the need to adjust business processes more quickly in a service-oriented world.
To facilitate these requirements, data warehouses are turning to master-data management. While separate from the warehouse, an MDM repository can dynamically share tightly coupled key data with the data warehouse as well as with source and downstream applications. Table one summarises some of the trends in data warehousing.
MDM and EAI
Enterprise-application integration fosters data propagation and business-process execution among distinct applications to make them appear as a single, global application. Its focus is on the messaging between applications, to integrate operational business functions that involve several different applications or systems, such as taking an order, creating an invoice and shipping a product. One common way to accomplish this is to leverage an enterprise-service bus. This provides a unified interface that enables application developers to more easily tap into multiple environments without having to custom-code message transport between those environments.
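The publish-and-subscribe pattern at the heart of an enterprise-service bus can be illustrated with a minimal sketch. The class, topic names and message fields below are invented for illustration; real ESB products add transformation, routing and reliability on top of this basic shape.

```python
# Minimal sketch of an enterprise-service-bus pattern: applications publish
# messages to named topics and subscribe only to the topics they care about,
# so no application needs custom point-to-point transport code.
from collections import defaultdict
from typing import Callable

class ServiceBus:
    def __init__(self):
        # topic name -> list of handler callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # deliver the message to every subscriber of the topic
        for handler in self._subscribers[topic]:
            handler(message)

# Example: one 'order created' event drives both invoicing and shipping.
bus = ServiceBus()
invoices, shipments = [], []
bus.subscribe("order.created", lambda m: invoices.append(m["order_id"]))
bus.subscribe("order.created", lambda m: shipments.append(m["order_id"]))
bus.publish("order.created", {"order_id": "A-100", "customer_id": "C-1"})
```

The point is that the order-taking application publishes one message without knowing who consumes it; invoicing and shipping plug in independently.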
The intent of EAI can be summarised as providing a common façade for multiple systems, integrating data and processes across applications, making an environment vendor-independent and ensuring that data is kept consistent across different applications.
Note the last item in that list – consistent data. EAI’s focus is on managing the message flow among the disparate systems. Thus, it is left to MDM to ensure that key entities are defined consistently. Once defined, the MDM repository can be linked to the enterprise-service bus to accommodate enterprise application integration. While an EAI system may provide for data transformation, it certainly does not ensure that a given product or customer is dynamically defined the same way in two or more applications.
Figure one illustrates how a master-data repository fits into the EAI architecture using an enterprise-service bus.
MDM and EII
In stark contrast to the tighter coupling of data noted above, a parallel trend in data integration has been to loosen the coupling between data. The idea behind enterprise-information integration is to provide a uniform query-interface over a virtual schema.
The EII tool seamlessly transforms the initial query into database-specific queries against the physical source-databases.
The end-users can therefore utilise business-intelligence tools and other applications to query a single, albeit virtual, schema. The loose coupling of data sources enables data in the virtual schema to be reflected both in terms of the source as well as in integrated terms. The extent to which data is integrated (that is to say, tightly coupled) determines the extent of the global view.
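The query-rewriting idea behind EII can be sketched as follows. The source layouts, field names and mapping tables are invented for illustration; a real EII tool would generate SQL against live databases rather than filter in-memory rows.

```python
# Sketch of EII-style federation: a projection over one virtual schema is
# rewritten into source-specific lookups and the results are merged.
crm = [{"cust_name": "Acme Ltd", "cust_city": "London"}]
billing = [{"name": "Beta plc", "town": "Leeds"}]

# Per-source mappings from virtual-schema columns to physical columns.
MAPPINGS = {
    "crm": {"customer": "cust_name", "city": "cust_city"},
    "billing": {"customer": "name", "city": "town"},
}
SOURCES = {"crm": crm, "billing": billing}

def query_virtual(columns):
    """Answer a projection over the virtual schema by querying every source."""
    results = []
    for source_name, rows in SOURCES.items():
        mapping = MAPPINGS[source_name]
        for row in rows:
            # translate each physical row into virtual-schema terms
            results.append({col: row[mapping[col]] for col in columns})
    return results

rows = query_virtual(["customer", "city"])
```

The caller sees one uniform `customer`/`city` schema even though neither physical source uses those column names.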
One issue is how to resolve data-definition differences between heterogeneous systems. For example, if two companies merge their databases, certain definitions (such as ‘earnings’) in their respective schemas will conceivably have different meanings. In one database it may mean profits in dollars, while in the other it might be the number of sales.
Here, of course, MDM can help homogenise key corporate data prior to its EII incorporation, thereby lessening the resulting semantic conflicts. By integrating data before it comes under the EII umbrella, a company assures better data consistency. Figure two reflects this concept.
MDM and SOA
Service-oriented architecture (SOA) can be defined as a loosely coupled array of re-usable software components, exposed as services, which can be integrated with each other and also invoked as services by other applications. The strength of the SOA concept is flexibility. The key idea behind it is to enable organisations to put together applications and processes by stringing together different software components – but the data, more than ever under SOA, must be consistent.
Think of placing an online order, for example. This type of software service can be integrated, as necessary, with finance, marketing, third-party inventory systems and so on. EAI and EII, as described above, can be part of the SOA structure.
In practice, SOA means different things to different people. To the business manager, it means the process governance and organisation for project/program management, and the business components that can be tweaked or re-used to reduce cost. To the legal team, it raises the question of whether a service creates a liability outside the company, and what regulatory issues and exposure this might cause. To an IT architect, SOA means the overall enterprise design that enables the IT department to deploy business services rapidly. And for the chief information officer, SOA is simply the IT strategy for delivering business capability: which business functions are automated, at what cost, maturity and return on investment?
The roadmap to SOA is the same as with any other IT effort:
- Understand business services and how they need to be integrated. This requires a close working relationship between IT and the business community (ie. requirements);
- Identify key performance metrics, such as reducing product defects by a certain percentage (ie. design);
- Develop an SOA outline that highlights the benefits in business terms (ie. user involvement);
- Identify quick wins (ie. implementation).
So how does MDM fit into the SOA construct? Note the second and fourth points in the above outline. It is impossible to correctly measure key metrics across the enterprise unless they are consistently defined. So, defining key metrics is an integral part of SOA and MDM is the underlying construct to accomplish this; it definitely helps to produce quick wins.
MDM, data cleansing and ETL
The above approaches to data integration show that MDM is key to improving data integration as data is increasingly drawn from diverse systems. However, data must first be cleansed, extracted and transformed before it can reside in a master-data repository – it must be consistent and correct. Extraction, transformation and loading (ETL) tools, together with data-cleansing tools, can help accomplish much of this task. However, getting data correct may also require a manual component to cleansing.
For example, take customer data that is stored in a series of fields labelled as follows: address1, address2, address3, address4 and address5. The first step here is to ensure that the first name is always in the address1 field, the last name is always in the address2 field, and so on. This type of effort can only be done manually; no cleansing tool can accomplish it. Once manual cleansing is completed, though, data-cleansing tools can be applied to examine the input data, de-duplicate names and standardise the addresses.
There are many third-party vendors that can also contribute to the process.
Services range from dynamic postal-code validation, provided by companies such as Dun & Bradstreet, to bulk-cleansing services performed for pennies per record. Another example is a nickname list that recognises that ‘Richard’ and ‘Dick’ may be the same person. These types of tools are often bundled into cleansing software, but can also be purchased independently.
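The nickname technique above can be sketched in a few lines. The nickname table and matching rule here are a tiny illustrative sample; commercial cleansing tools use far larger dictionaries plus fuzzy matching on misspellings.

```python
# Sketch of nickname-aware duplicate detection: two customer records match
# when their surnames agree and their given names are the same or are
# nickname variants of one another (illustrative sample list only).
NICKNAMES = {"dick": "richard", "rich": "richard",
             "bob": "robert", "liz": "elizabeth"}

def canonical_given_name(name: str) -> str:
    # map a nickname to its formal form; pass other names through unchanged
    name = name.strip().lower()
    return NICKNAMES.get(name, name)

def same_person(a: dict, b: dict) -> bool:
    return (a["last"].strip().lower() == b["last"].strip().lower()
            and canonical_given_name(a["first"]) == canonical_given_name(b["first"]))

match = same_person({"first": "Dick", "last": "Smith"},
                    {"first": "Richard", "last": "Smith"})
```

Records for ‘Dick Smith’ and ‘Richard Smith’ collapse to one candidate pair, which an end-user can then confirm or reject.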
Data-cleansing tools can be set up to enable continuous cleansing as well. It is relatively easy to conceive that a customer may be entered into the system multiple times, even once the data has been initially cleansed and a master-data repository set up.
No tool on the market can prevent this. However, allowing end-users to de-duplicate or merge these types of entries in an ongoing way helps to ensure that the master-data repository remains correct.
ETL tools can also extract similar data (for example, customer data) from multiple applications, transform that data into a homogenous form and deposit it in an integrated way into a single repository.
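The extract-transform-load pattern just described can be sketched as below. The two source layouts, field names and the country-code table are invented for illustration; a real ETL tool would read from databases or files and handle far richer transformations.

```python
# Sketch of ETL: customer data is extracted from two applications with
# different layouts, transformed into one homogeneous form, and loaded
# into a single integrated repository.
def extract_from_sales():
    # hypothetical sales-system layout
    return [{"CustName": "Acme Ltd", "Ctry": "UK"}]

def extract_from_support():
    # hypothetical support-system layout
    return [{"customer_name": "Beta plc", "country_code": "GB"}]

COUNTRY_CODES = {"UK": "GB"}  # normalise country representations

def transform(record, name_field, country_field):
    # map a source-specific record onto the common target form
    return {
        "name": record[name_field].strip(),
        "country": COUNTRY_CODES.get(record[country_field], record[country_field]),
    }

repository = []  # the single, integrated target
repository += [transform(r, "CustName", "Ctry") for r in extract_from_sales()]
repository += [transform(r, "customer_name", "country_code")
               for r in extract_from_support()]
```

Both sources end up in one schema with a consistent country representation, but note that the transformation happens only at extraction time, which is exactly the limitation discussed next.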
However, these tools do not actually share a standardised definition of an entity with their source systems. Some ETL-tool vendors are positioning themselves as MDM software providers; the difference, though, is that these tools simply transform the data at the time of extraction.
That’s where the management of master data begins. Once data is extracted and deposited into the master-data repository, changes to it are dynamically shared with source and downstream systems. Whether that means exposing customer data over the web so that customers can manage their own profiles – which ought to make the data more accurate while cutting costs – or internal users managing a product hierarchy, this is the point at which a decision on what software to use to manage master data can be made.
Bringing it all together
This article has examined a number of approaches towards integrating corporate data. We looked at some technologies that have been around for a while, such as data warehousing and data cleansing, and how they are evolving and relate to MDM. We also discussed how MDM relates to some more recent technology trends, such as EAI, EII and SOA.
In each case MDM aids the integration effort. It is a fundamental requirement for tightly coupled data and fosters the global view of data for loosely coupled data. Finally, it is difficult to imagine how a service-oriented environment can function if each service has its own definition of key entities.
Mike Fleckenstein is principal analyst at Project Performance Corporation (PPC). He has 20 years’ experience developing and deploying data solutions for public and private-sector clients around the world. He is currently involved in leading the MDM and insurance practices at PPC (www.ppc.com), a Washington DC-based IT consultancy, using best-of-breed technologies and solutions for data warehousing and master-data management, among others. Prior to joining PPC, Fleckenstein served as application manager at Medmarc Insurance and ran his own IT consulting firm, Windsor Systems Inc, which specialised in IT and data solutions.
Table one: Traditional and dynamic data warehousing
Traditional: Provides a window into past operational data for historical analysis and reporting.
Dynamic: Provides a window into near-real-time operational and transactional data for both strategic planning and operational purposes.

Traditional: Accesses only a limited number of business processes and systems.
Dynamic: Provides tight integration among enterprise-wide business systems.

Traditional: Supports only structured data.
Dynamic: Accesses structured data, unstructured data and metadata.

Traditional: Requires specialised skills or knowledge to access and use.
Dynamic: Delivers information to all enterprise constituents within the context of the activities they are performing.