The objective was to approximate on the site ahead of time what Funk calls “real battlefield conditions.” Vermont Teddy Bear worked with KeyLabs to simulate user traffic at different levels and at different times of day to determine error rates and cycle times. The idea was that it was better to uncover problems through simulated orders than while serving actual customers, when site problems could mean lost revenue. The testing did exactly that. “We built the new application; we thought we had it right, but it had issues when we load-tested it,” says Funk. “We wanted to make sure the site could handle the traffic. But our first test showed we still needed to fix things before we were ready to go live.”
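The kind of test described above can be illustrated with a minimal sketch. It is not the KeyLabs method, which the article does not detail; it only assumes that simulated orders are pushed through a handler at several concurrency levels while the error rate and cycle time are recorded. The names `run_load_test` and `fake_checkout` are hypothetical, and `fake_checkout` stands in for a real order submission.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(handler, n_requests, concurrency):
    """Push n_requests simulated orders through handler with `concurrency` workers."""

    def timed_call(order_id):
        # Time one simulated order and note whether it raised an error.
        start = time.perf_counter()
        try:
            handler(order_id)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(n_requests)))

    errors = sum(1 for ok, _ in results if not ok)
    durations = [duration for _, duration in results]
    return {
        "concurrency": concurrency,
        "error_rate": errors / n_requests,
        "mean_cycle_time": sum(durations) / n_requests,
        "max_cycle_time": max(durations),
    }


def fake_checkout(order_id):
    """Hypothetical stand-in for a real checkout call; fails on every 4th order."""
    if order_id % 4 == 3:
        raise RuntimeError("simulated script error")


if __name__ == "__main__":
    # Repeat the same batch of simulated orders at rising traffic levels.
    for level in (1, 5, 20):
        print(run_load_test(fake_checkout, n_requests=40, concurrency=level))
```

In a real test the handler would submit an order to a staging copy of the site, and the results at each traffic level would show where error rates or cycle times start to climb.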
Issues affecting site performance can range from load balancing to server or bandwidth capacity to problems with the applications themselves, such as script errors, Funk notes. That means identifying a problem through testing and monitoring is only the first step toward resolution; isolating its cause among a multitude of possibilities is something else. Once a problem is found, Funk’s team pinpoints the cause by reviewing error messages, visually inspecting the pages flagged as problems, and sifting through other variables. “Our sites are online stores that use common elements, common templates. It’s not that hard for us to diagnose issues; usually the problems float up to the surface somehow,” he says.
But that can be time-consuming and, at a site as complex as consumer and auto electronics retailer Crutchfield.com, downright unwieldy. About 85% of Crutchfield’s applications are internally developed. The order entry system alone, developed over the last 10 years, now has almost 1 million lines of code. The site runs upwards of 350 applications in multiple languages and several thousand ASP pages. That raises the possibility of interaction problems on several fronts: among its internal applications, for example when switching from one language to another; between one third-party application and another; or between outside applications and its own code.
That makes sifting through possible sources of trouble to find the actual culprit impractical. It’s so time-consuming, in fact, that Crutchfield chose to track down and resolve only about 15% of the minor problems on the internally facing system used by its call center agents, says CIO Steve Weiskircher. The company weighed the programmer time required to re-create the sequence of events leading to such a problem against an agent productivity loss of perhaps 10 seconds.
But recently, Crutchfield has been using a tool that shrinks that cycle time. The AppSight Black Box from Identify Software represents a category of products that tackle the issue by capturing actual event sequences, allowing IT staff to simply replay a sequence rather than try to re-create it.
The software captures a real-time log of user actions, system events, performance metrics and other site operations data. Among other uses, Crutchfield has tapped the software to find the source of a problem within an internally written application that was designed to shop for rates at carriers for packages awaiting shipment. Crutchfield wrote the web service that made those calls to carriers, noticed when the application appeared to be slowing down, and saw that, unresolved, it stood to slow the turnaround on customer orders. The Black Box identified the time associated with each element of the application and revealed the location of the bottleneck: internal servers weren’t up to supporting the new rate-shopping application.
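The idea of attaching a time to each element of an application can be shown with a small sketch. This is not how the AppSight Black Box works internally, which the article does not describe; it is a hand-rolled timer, and the class `StepTimer` and the step names are hypothetical, chosen only to echo the rate-shopping example.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Accumulates elapsed wall-clock time per named step of an application."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def step(self, name):
        # Record how long the wrapped block takes, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def bottleneck(self):
        """Return the name of the step with the largest total time."""
        return max(self.timings, key=self.timings.get)


if __name__ == "__main__":
    timer = StepTimer()
    # Hypothetical steps; the sleeps stand in for real work.
    with timer.step("build_request"):
        time.sleep(0.001)
    with timer.step("call_carrier_service"):
        time.sleep(0.02)
    with timer.step("parse_rates"):
        time.sleep(0.001)
    print(timer.timings)
    print("slowest step:", timer.bottleneck())
```

Wrapping each stage this way points to where the time goes, though not why; in Crutchfield’s case the underlying cause turned out to be server capacity.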
Crutchfield resolved the issue by boosting server capacity, an upgrade that already was in the works. “The real benefit of the troubleshooting software in this case was in the time it shaved off finding the problem,” Weiskircher says. It took minutes, compared with the hours IT staff might have needed to retrieve the same information by combing through the source code and testing and re-testing the application.
Other technology providers such as Xaffire Inc. also provide software that monitors and replays web site user sessions to troubleshoot problems. Tower Records gets similar functionality from TeaLeaf Technology Inc.’s RealiTea. To cover all bases in ensuring that technology supporting site operations is up to standard, Tower uses monitoring and applications testing services from Keynote Systems Inc. and Digital River Inc.’s Fireclick Inc. as well. The three vendors provide feedback on site performance and site problems at different levels of detail.
TeaLeaf captures data by individual user sessions to give Tower a window on customer-specific interactions with the site, data it uses primarily for tech support and customer service. Recently, for example, a customer contacted Tower.com to complain he couldn’t log onto his account and that repeated e-mails to Tower requesting help with his account password had gone unanswered. Replaying session information showed that while the customer kept hitting the “view hint” button, the site couldn’t display a hint because the customer had never supplied one when setting up his account.
Session information also showed that Tower had sent a number of e-mails in response to the customer’s questions. It turned out the customer never received them because they’d been mistakenly blocked as spam at his AOL address. “In that case our software was working okay, but it would have been much harder to figure out what was happening had we not been able to just look at the session,” says vice president of e-commerce Kevin Ertell.
With testing and monitoring services and products getting better at identifying performance and application problems, it’s no surprise that some companies are starting to quantify the accumulated data and use it as the basis of service-level agreements that guarantee technology performance. And as systems have become more reliable, some of the focus of web site monitoring and testing has shifted away from an earlier emphasis on basic site availability to site responsiveness and all the elements that affect it.
As a result, forward-thinking retailers are now looking to monitor and test site operations with a new metric in mind: consistency of response time. “In addition to expectations of functionality and ease of use, customers now carry expectations related to site performance,” says Matthew Poepsel, director of business development at Gomez. Meeting them is critical for retailers, he contends: any gap between visitors’ expectations of site performance and their experience with it is an indicator of customers’ frustration, their propensity to click off and go elsewhere, and a wasted opportunity for that online retailer.
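One way to put a number on consistency of response time is to look at the spread of measurements as well as the average. The sketch below is an assumption about how such a metric might be computed, not a method attributed to Gomez; the function `consistency_report` and the sample timings are hypothetical.

```python
import statistics


def consistency_report(response_times):
    """Summarize response times (in seconds), including their spread."""
    mean = statistics.mean(response_times)
    stdev = statistics.pstdev(response_times)
    return {
        "mean": mean,
        "median": statistics.median(response_times),
        "stdev": stdev,
        # Coefficient of variation: spread relative to the average.
        "cv": stdev / mean if mean else 0.0,
        "worst": max(response_times),
    }


if __name__ == "__main__":
    # Hypothetical samples: both sites average 2.0 seconds per page.
    steady_site = [2.0, 2.0, 2.0, 2.0]
    erratic_site = [0.5, 0.5, 0.5, 6.5]
    print("steady: ", consistency_report(steady_site))
    print("erratic:", consistency_report(erratic_site))
```

Two sites with the same average can feel very different to a shopper: the steadier site has a coefficient of variation of zero, while the erratic one occasionally makes a visitor wait far longer than expected.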