E-retailers wave the 24/7 banner as a strategic advantage over brick-and-mortar businesses. But downtime and system failures continue to tear away at that always-open image.
Outages have struck a who’s who of Internet retailers, from Amazon.com to Toys ‘R’ Us. Last June, analysts had a field day ripping into eBay after a 22-hour system failure frustrated customers and drove down its stock price by 26%. Recently eBay renewed its contract with Sun Microsystems to supply servers and data storage units at the center of that outage.
While eBay and other business-to-consumer sites have vowed to avoid crashes by paying stricter attention to backend systems integration and site traffic management, many still haven’t, says Eric Pakman, chief technology officer for Network-shop Inc., an Internet systems integration and consulting firm based in Montreal. “People are putting so much time and energy into building a site that monitoring anything related to systems is an afterthought,” he explains. “But site performance will ultimately make or break their business.”
Slow site performance and major server crashes are already at an all-time high: 50 so far this year. And analysts project that up to 70% of Web sites will suffer some power failure or system slowdown by December.
Amazon.com executives won’t discuss their internal monitoring software, but they acknowledge contracting with Keynote Systems for more than two years to monitor site traffic externally. Even so, an Amazon security executive questions the service’s accuracy. Keynote tracks how sites are responding to traffic from different access points around the globe, then notifies its clients when problems occur. But during the denial of service attacks that struck various e-commerce sites in February, Keynote erroneously reported that Amazon was one of the sites blocked, says Tom Killalea, Amazon’s director of information security. “They said Amazon was blocked when people were actually able to access the site and perform transactions,” he contends. “So there are limitations to Keynote’s ability to accurately reflect what happens on the Web.”
The problem, according to Killalea, is that Keynote does not account for the fact that previous Amazon visitors access the site using cookies, giving them a personalized greeting page rather than the standard Amazon start page.
E-commerce sites are made up of multiple and disparate operating systems, including networking and database hardware or servers. The applications cover Web server, networking, load balancing and database software. Because sites are connected by many hardware and software configurations, problems can occur when traffic overwhelms a particular subsystem or when servers can’t handle sudden, unexpected volume.
After some troubling failures, site developers certainly aren’t ignoring the problem. Charles Schwab wound up pumping more than $70 million of emergency money into its Internet infrastructure last year after heavier-than-usual trading crashed the site for four consecutive days. And despite more than doubling its processing horsepower, the Toys ‘R’ Us site crashed during the holiday shopping season for the second consecutive year. EBay, Schwab and Toys ‘R’ Us are responding to performance problems by building more backup systems and redundant data centers or by investing in clustering and load-balancing technology. These systems can detect a server failure and then route messages or transactions to an unused portion of the network.
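The failover idea behind clustering and load balancing can be pictured with a short sketch. It is an illustration only, not how eBay, Schwab or Toys ‘R’ Us actually route traffic; the `Server` record, its `healthy` flag and the `route` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str                 # hypothetical server label
    healthy: bool             # result of the most recent health check
    active_requests: int = 0  # work currently assigned to this server


def route(servers):
    """Pick the healthy server with the fewest active requests.

    Failed servers are skipped, so work shifts to the least-used
    part of the pool. Raises RuntimeError if every server is down.
    """
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy servers available")
    target = min(candidates, key=lambda s: s.active_requests)
    target.active_requests += 1
    return target
```

In this sketch a failed server simply drops out of the candidate list, which is the simplest form of the detect-and-reroute behavior described above. Real load balancers also handle health-check timing and in-flight transactions, which are left out here.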
Raising red flags
But such measures aren’t enough. Newport Group Inc., a Barnstable, Mass.-based information technology consulting company, estimates that almost 52% of site crashes could be prevented if Web companies did a better job of troubleshooting potential problems.
That’s why more CIOs and systems managers are looking at a new option: performance-management software, which sends up a red flag whenever the e-commerce application being monitored fails to meet performance expectations.
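A minimal sketch of that red-flag check follows. It assumes performance expectations are expressed as response-time limits per component; the component names, the limits and the `red_flags` function are hypothetical and are not taken from any vendor’s product.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    component: str      # e.g. "web", "database" (hypothetical labels)
    response_ms: float  # measured response time in milliseconds


def red_flags(samples, limits_ms):
    """Return (component, measured_ms, limit_ms) for each sample over its limit.

    Components with no configured limit are ignored.
    """
    flags = []
    for s in samples:
        limit = limits_ms.get(s.component)
        if limit is not None and s.response_ms > limit:
            flags.append((s.component, s.response_ms, limit))
    return flags
```

A monitoring loop would feed fresh samples into a check like this at regular intervals and notify an administrator whenever the returned list is not empty.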
Written to work with Oracle 7.2.2 databases or higher and with various server operating systems, performance-management software monitors front- and middle-tier systems and backend databases for message routing or Web traffic problems.
The applications also track the flow of information across a commerce server platform by scanning operating systems, switches, routers, load balancers, database engines, Ethernet networks and other areas for signs of trouble. Using Windows-based monitoring and diagnostic tools linked to the Oracle or main administration database, the software can pinpoint and prepare instant summaries on various trouble spots.
Once the software spots a problem, electrical pulses traveling between the performance-management software and server networks create an instant graphic on the database administrator’s computer console in real time. “The conventional answer to solving traffic trouble has been throwing more hardware at the problem,” says Peter Urban, senior analyst of database technologies for AMR Research Inc., Boston. “What they really should do is build in a performance-management program that spots problems before they hit the pipeline.”
Performance management software is relatively inexpensive to buy and install. Most packages consist of a recorder for creating system workload scripts, a controller for stress-testing Web applications and a performance reporting tool.
The software sells for less than $50,000 and is available from such developers as BMC Software of Houston; Computer Associates International, Islandia, N.Y.; Keynote Systems, San Mateo, Calif.; and Landmark Systems Corp., Reston, Va., among others.
Austin, Texas-based Hoover’s Corp. is installing Quest’s Instance Monitor 1.0 software to give its 2.5 million online business customers faster and more consistent access to its business information repository on 55,000 U.S. companies and corporations. Unlike several major B2C sites, Hoover’s site has yet to incur an outage. But traffic has more than doubled in the last year to 1.2 million daily page views, and the company’s server network is running at close to capacity.
To prevent potential traffic problems, Hoover’s wants to ensure its central processing unit always has at least 20% unused capacity, both to absorb a big spike in processing volume and to head off a potential system crash.
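The 20% reserve rule is simple arithmetic, and a short sketch makes it concrete. This is an assumption about how such a check might be written, not a description of Quest’s software; the function names and the sample readings are hypothetical.

```python
def cpu_headroom_pct(utilization_pct):
    """Unused CPU capacity, in percent, for a utilization reading of 0 to 100."""
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    return 100.0 - utilization_pct


def below_reserve(readings_pct, min_free_pct=20.0):
    """Return the utilization readings that leave less than the required reserve."""
    return [r for r in readings_pct if cpu_headroom_pct(r) < min_free_pct]
```

With a 20% reserve, a reading of 80% utilization still passes, while anything above it would be reported as cutting into the safety margin.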