Problems consolidating databases
The group on the left shows processes that bring data into the database from outside, either manually or through various interfaces and data integration techniques. Some of these incoming data may be incorrect in the first place and simply migrate from one place to another. The data certainly did not add up to what was showing on the summary reports, yet the reports were produced from these very data! In fact they were even missing from the data dictionary! Certain codes were used in some years but ignored in other years. In other cases, the errors are introduced in the process of data extraction, transformation, or loading. High volumes of data traffic dramatically magnify these problems. The new system was certainly not programmed to do so, and nobody remembered to indicate all this logic in the mapping document.
Among other things, we needed to convert employee compensation data from the "legacy" HR database. The group on the bottom shows processes that cause accurate data to become inaccurate over time, without any physical changes made to it. The data values are not modified, but their accuracy takes a plunge! This usually happens when the real-world object described by the data changes, but the data collection processes do not capture the change. In this chapter we will systematically discuss the 13 processes presented in Figure 1-1 and explain how and why they negatively affect data quality. More often the starting point in their lifecycle is a data conversion from some previously existing data source. And by a cruel twist of fate, it is usually a rather violent beginning. Most companies live with the consequences of bad data conversions for years or even decades. In fact, some data problems can be traced to "grandfathers of data conversions," i.e. Records with negative amounts (retroactive adjustments) were aggregated into the previous month, which they technically belonged to, rather than the month of the paycheck. Apparently the old system had a ton of code that applied all these rules to calculate proper monthly pensionable earnings. While each situation is different, I eventually came up with a classification shown in Figure 1-1. It shows 13 categories of processes that cause the data problems, grouped into three high-level categories.
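The retroactive-adjustment rule mentioned above is the kind of logic that easily gets lost when it lives only in legacy code. As a minimal sketch of what such a rule might look like, assume each compensation record is a (pay_date, amount) pair; that layout, the function names, and the sample values are all hypothetical and are not taken from the actual legacy system.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def previous_month(year, month):
    """Return the (year, month) pair immediately before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def monthly_pensionable_earnings(records):
    """Sum paycheck amounts per (year, month).

    records: iterable of (pay_date, amount) pairs (hypothetical layout).
    Negative amounts are treated as retroactive adjustments and are
    credited to the month before the paycheck month.
    """
    totals = defaultdict(Decimal)
    for pay_date, amount in records:
        key = (pay_date.year, pay_date.month)
        if amount < 0:
            # Assumed rule: the adjustment belongs to the prior month.
            key = previous_month(*key)
        totals[key] += amount
    return dict(totals)


# Made-up sample data for illustration only.
paychecks = [
    (date(2004, 1, 15), Decimal("3000.00")),
    (date(2004, 2, 15), Decimal("3000.00")),
    (date(2004, 2, 15), Decimal("-200.00")),  # adjustment paid in February
]
earnings = monthly_pensionable_earnings(paychecks)
```

With this sample, the adjustment lowers the January total rather than the February total. A conversion that simply grouped records by paycheck month would shift that amount into February, which is how a rule missing from the mapping document turns into a data quality problem.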