For Internet retailers, customer information is the lifeline. It’s the information used to deliver products and services, and it creates the profile of customers.
But self-reported data entered on the web is highly prone to error. Consumers hurry through data entry, abbreviating information and neglecting to check what they have typed. The result is inaccurate data, which harms the fulfillment process. More important, the damage grows as retailers become multi-channel outlets. Retailers risk alienating customers if they don’t have an accurate record of interactions across all channels.
Companies traditionally address this problem by cleansing their data periodically, commonly in a batch process run several times a year or just before a direct-mail campaign.
On the web, however, the batch approach spells disaster. That’s because retailers mail products directly to the name and address that consumers enter online during registration or purchase. Any inaccuracies create delays in product deliveries and unhappy customers, even if the problem is the customer’s fault.
Thus Internet retailers cannot rely on cleansing data in batches. When conducting business on the web, it’s critical to build into the site technology that cleanses consumer data one record at a time, directly after it is entered and before the purchase is final. Such technology exists. It ranges from software that identifies and fixes common typos and updates records with change-of-address information from the U.S. Postal Service, to sophisticated matching processes that compare data to an up-to-the-second consumer database.
Here’s how it works: As a prospect types in data, that information is verified at the same time against a source of accurate consumer information, whether internal or third-party data. When data inaccuracies appear, the system can prompt visitors to re-enter or correct their data. Data can also be updated automatically. The retailer now has accurate name and address information for fulfillment and customer identification.
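As a rough illustration of that entry-time check, here is a minimal Python sketch. Everything in it is an assumption made for the example: the tiny REFERENCE_STREETS table stands in for whatever internal or third-party reference source a retailer actually uses, and the names check_street and CheckResult and the 0.8 similarity cutoff are hypothetical. It shows the three outcomes described above: accept the entry, correct it automatically, or ask the visitor to re-enter it.

```python
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical reference data: ZIP code -> street names known to be valid.
# A real site would query an internal or third-party source instead.
REFERENCE_STREETS = {
    "72201": ["Main Street", "Capitol Avenue", "Broadway Street"],
}


@dataclass
class CheckResult:
    status: str  # "ok", "corrected", or "reenter"
    street: str  # the street to store, or the original entry if rejected


def check_street(zip_code: str, street: str) -> CheckResult:
    """Verify one street entry against the reference data."""
    known = REFERENCE_STREETS.get(zip_code)
    if known is None:
        # Nothing to verify against: ask the visitor to check the entry.
        return CheckResult("reenter", street)

    # Collapse repeated spaces and standardize capitalization.
    normalized = " ".join(street.split()).title()
    if normalized in known:
        return CheckResult("ok", normalized)

    # Look for a near match, such as a one-letter typo (assumed cutoff).
    close = get_close_matches(normalized, known, n=1, cutoff=0.8)
    if close:
        return CheckResult("corrected", close[0])
    return CheckResult("reenter", street)
```

A site might call check_street when the visitor leaves the address field, store the returned street when the status is "ok" or "corrected", and redisplay the form when it is "reenter". Whether to correct silently or ask the visitor to confirm the correction is a design choice the sketch does not settle.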
Once web data has been cleansed, the question arises: How does a retail organization integrate this data into its network of offline databases? Let’s say you just took an order from J. Smith. Is this the same person as an existing customer named Janet Smith? Or is this another customer named Jane Smith?
When new information enters offline databases, it cannot be integrated accurately unless the operator can answer the questions: Was the person online a new or existing customer, and if an existing customer, which one?
Here’s the rub. Companies maintain multiple databases for billing, marketing and analytical purposes. Rarely are these databases adequately connected to each other. Worse, each database is often riddled with redundant, inaccurate, out-of-date and incomplete data. So when a company attempts to integrate data from one channel, such as the web, across the appropriate company databases, database managers have a difficult time determining which source of data is the most accurate.
If customer data is integrated properly, a retailer will know that J. Smith is really Jane Smith, and that Janet Smith is a misspelling of Jane Smith. It just happened that when Jane Smith placed an order over the phone with her new address information, the customer-service representative mistakenly created a new record with a misspelled first name.
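The "new or existing, and which one?" question can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the record layout (first, last, address), the function names and the rule that an initial is compatible with any first name starting with that letter are all hypothetical.

```python
def first_name_compatible(entered: str, on_file: str) -> bool:
    """Treat an initial such as "J." as compatible with "Jane" or "Janet"."""
    a = entered.strip().rstrip(".").lower()
    b = on_file.strip().rstrip(".").lower()
    if not a or not b:
        return False
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def resolve_customer(order: dict, customers: list) -> tuple:
    """Return ("new" | "existing" | "ambiguous", matching records)."""
    candidates = [
        c for c in customers
        if c["last"].lower() == order["last"].lower()
        and first_name_compatible(order["first"], c["first"])
    ]
    if not candidates:
        return ("new", [])

    # Use the address to choose among customers whose names are compatible.
    same_address = [
        c for c in candidates
        if c["address"].lower() == order["address"].lower()
    ]
    if len(same_address) == 1:
        return ("existing", same_address)
    # Zero or several address matches: leave the decision to a later step.
    return ("ambiguous", same_address or candidates)
```

With hypothetical records for Janet Smith at 12 Oak Lane and Jane Smith at 40 Elm Street, an order from J. Smith at 40 Elm Street resolves to the Jane Smith record, while the same order from an unknown address comes back as ambiguous with both candidates attached. Exact address comparison is a deliberate simplification here. In practice the address would be cleansed first.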
Customer-data integration is a detailed process of name-and-address matching. To match data across databases, an organization first must scan the network of databases for duplicate and look-alike customer records. It then must compare these records to each other to identify actual matches. It also can compare the data to a third-party reference repository of consumer data to validate matches and to resolve cases where customer records resemble each other but the organization isn’t sure they are an actual match. It then synthesizes all this information into a best customer record and a single view of each customer.
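The scan, compare and synthesize steps can be sketched as follows. This is a simplified illustration, not a production matching engine. The record fields (name, zip, phone, updated), the 0.85 similarity threshold and the "newest non-empty value wins" merge rule are all assumptions made for the example, and the validation against a third-party repository is left out.

```python
from difflib import SequenceMatcher


def looks_alike(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Two records look alike if ZIP codes agree and the names are similar."""
    if a["zip"] != b["zip"]:
        return False
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def group_records(records: list) -> list:
    """Greedily place each record in the first group it resembles."""
    groups = []
    for rec in records:
        for group in groups:
            if any(looks_alike(rec, other) for other in group):
                group.append(rec)
                break
        else:
            # No existing group matched, so this record starts a new one.
            groups.append([rec])
    return groups


def best_record(group: list) -> dict:
    """Merge a group: for each field, the newest non-empty value wins."""
    best = {}
    # Oldest first, so values from newer records overwrite older ones.
    for rec in sorted(group, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value not in ("", None):
                best[field] = value
    return best
```

Given an older "Janet Smith" record that has a phone number and a newer "Jane Smith" record in the same ZIP code that does not, group_records puts the two together and best_record keeps the newer name along with the older phone number. The merge rule is the weakest assumption in the sketch, since a newer record is not always the more accurate one. That is where the reference repository mentioned above would come in.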
Integrating customer data across databases also serves another critical function: It can be a discovery process by which a company measures its overall data quality and detects critical deficiencies. Companies learn which databases contain redundant, inaccurate and outdated information and which contain the best information. They can then take steps to improve data quality.
The never-ending job
Data cleansing and integration are themselves only a part of data management. Owners of databases must assess and improve data quality continually, fixing data deficiencies as well as the processes that create them. They must also perform rigorous, ongoing data management through surveillance of both existing and new data.
Here is a brief guide to building a solid data-management program. First, survey all databases and assess the data in them. Start by cataloguing databases. Then map their integration by answering these key questions: Which databases are connected, which need to be connected, and where do you store the consumer data you collect from the web?
Next, examine each database for its structure and content. Is each table within a database clearly defined? How are tables updated or refreshed? How comprehensive is each table in scope? What meaning is assigned to each field within the table? Has that meaning changed with new usage? How complete or accurate are the data in each field? Are any fields missing, such as work phone number? Does the company collect the data it needs to build loyalty through customer knowledge?
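One of these questions, how complete the data in each field is, lends itself to a quick measurement. The Python sketch below assumes each table has been exported as a list of dictionaries. The function name completeness and the field names in the usage note are hypothetical.

```python
def completeness(rows: list, fields: list) -> dict:
    """Return, for each field, the share of rows with a non-blank value."""
    total = len(rows)
    report = {}
    for field in fields:
        filled = 0
        for row in rows:
            value = row.get(field)  # None if the field is absent from the row
            if value is not None and str(value).strip():
                filled += 1
        report[field] = filled / total if total else 0.0
    return report
```

Calling completeness(rows, ["name", "work_phone"]) on a customer table returns a fraction between 0 and 1 for each field. A low figure for work_phone would flag exactly the kind of missing field mentioned above. The sketch measures completeness only. Accuracy still has to be checked against a trusted source.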
When you answer these and related questions, you can assess the overall quality of your existing data. This is the environment to which you are introducing data collected from the Internet.
After the assessment process, determine which problems are most vexing. These can range from easy-to-spot but widespread errors, such as redundant data, unpopulated database fields and data placed in the wrong fields, to problems that require some analysis, such as outdated information and data models (a customer loyalty score, for example) that no longer seem reliable.
Then develop and execute a comprehensive plan to fix these problems and their causes. The data-cleansing and integration processes already mentioned are the main tools to fix data quality. In addition, incomplete data may be resolved by finding new data sources such as other internal databases or data from third-party vendors.