intertek

NMSO - National Master Standing Offer
Quality Assurance

Test Methodology

Methodology Overview

Traditional Standalone Desktop/Notebook Testing:
Stress/Compatibility Testing; Features and Usability; Intertek Standalone System Performance.
Application Server Testing:
Performance results for the Advanced Server Category were obtained using Intertek's Application Server benchmark suite, which includes SQL Server testing, E-mail Server testing, and Web Server testing.
Features and Usability:
Battery Performance Testing was executed on all Notebook categories.
All test methodologies, including application versions, scripts, and workloads, will remain constant for the entire life of the NMSO, with the exception that, in the event of a new release, Intertek will evaluate its impact on the methodologies and make a recommendation with respect to its substitution.

Compatibility Testing

The compatibility assessment is organized into the following test areas:

Stress Testing
The first step in compatibility testing is to select the products to be tested. Intertek selected the following Windows applications, which were expected to be among the most frequently used by end users: Microsoft Excel, Microsoft Word, Microsoft Access, MultiOffice, Photoshop, RARzip, Handbrake, and WMEncoding. The Stress test consisted of these Windows applications executing typical workloads in a continuous loop for 24 hours. Execution of the entire mixture of applications was followed by a simulated cold boot of the system before the mixture was repeated. The Stress test was executed on systems configured for the identical video mode/resolution.
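A minimal sketch of how such a 24-hour stress loop could be driven is shown below. It is illustrative only: the workload script names are placeholders, and the cold boot is merely marked rather than performed so the loop stays self-contained.

    import subprocess, time

    # Hypothetical workload scripts standing in for the scripted application runs
    # (Word, Excel, Access, MultiOffice, Photoshop, RARzip, Handbrake, WMEncoding).
    WORKLOADS = [
        ["cscript", "word_workload.vbs"],
        ["cscript", "excel_workload.vbs"],
        ["cscript", "access_workload.vbs"],
    ]

    STRESS_HOURS = 24
    end_time = time.time() + STRESS_HOURS * 3600

    while time.time() < end_time:
        for cmd in WORKLOADS:
            # Each workload runs to completion; a non-zero exit code is treated
            # as a compatibility failure and logged for follow-up.
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print("workload failed:", cmd)
        # After the full mixture, the real test simulates a cold boot of the
        # system; here that step is only marked.
        print("cold-boot point reached; repeating application mixture")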

Design Test Plans
The Intertek test development staff has developed a standard format for test plans which provides a detailed audit trail for the testing process. Intertek's test plans are not developed 'on-the-fly'. Each test plan is specifically designed to identify which functions are to be tested, how to execute each function, precisely the order in which the functions should be tested, and what the test engineer should expect to happen. Using this technique, Intertek provides a scenario by which the Intertek test engineer or the client can reproduce any incompatibilities or discrepancies which have been discovered, using the same methodical, step-by-step approach.

Versatility Testing

Intertek conducted versatility/usability testing for each system by assessing the diversity of features for each system submitted for testing, and each system's usability.
Usability Scoring Methodology
Intertek examines each system for usability by focusing on a few key areas: system design, documentation, and vendor-specific web site content. In looking at system design, testers start by rating system teardown. Once inside the system, testers assess how easy it is to perform normal system upgrades, such as increasing the system's RAM and adding mass storage devices.
Outside the unit, testers look for non-mandatory enhancements such as a reset button and clearly marked, color-coded I/O ports. Testers navigate through each system's Setup Utility (if supplied) to see if the integrated hard disk controller and built-in I/O ports can be disabled (if applicable). Documentation considerations include manuals that are comprehensive, include easy-to-read diagrams, and offer up-to-date technical information. Additional points are given for refinements such as the quality and detail of illustrations and the ease and breadth of comprehension. The presence of a glossary and an index is a bonus. Web sites are rated on ease of navigation and technical areas, as well as client (NMSO) specific areas. Restore CDs (including hidden-partition restores) are evaluated for ease of use, completeness of driver installation, etc. (if supplied with the system).
The ease of performing the activities is weighted and scored, and a 'basic usability score' is then created by summing these values. This basic score is scaled to a 10-point scale, where a '10' represents 'perfect in all scored categories'.
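To illustrate the weighting-and-scaling step, the sketch below computes a basic usability score from hypothetical per-area ratings and weights and rescales it to the 10-point scale. The area names, ratings, and weights are examples only, not Intertek's actual values.

    # Hypothetical per-area ratings (0.0-1.0) and weights; not Intertek's actual values.
    ratings = {"system design": 0.8, "documentation": 0.6, "web site": 0.9}
    weights = {"system design": 5.0, "documentation": 3.0, "web site": 2.0}

    # Weighted sum of the individual ratings ...
    basic_score = sum(ratings[area] * weights[area] for area in ratings)
    # ... scaled so that a perfect rating in every category maps to 10.
    max_score = sum(weights.values())
    usability_score = 10.0 * basic_score / max_score

    print(round(usability_score, 2))  # 7.6 for the example values above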
Each system was evaluated based on the following criteria:
Desktop Systems:
  • Potential Problems
  • Ease of Service
  • Restore CD
  • Web Site
Servers:
  • Cover Removal / Configuration / Internal Access
  • Hot Swap Usability
  • CPU/Memory Installation
  • End-user Documentation
  • Server Setup / Configuration
  • Hard Drive Installation
  • Potential Problems
  • Ease of Service
Notebooks:
  • Battery and Power Management
  • Drive Device Management
  • CMOS Setup / Configuration
  • Keyboard Usability
  • Documentation
  • Pointing Device / Battery Life / Weight
  • Restore Utility
  • Potential Problems
  • Ease of Service
  • Website
Features Scoring Methodology
Prior to testing, questionnaires were sent to each system manufacturer. These questionnaires required detailed specification of more than 150 attributes of each system. Some items are informational (e.g., CPU manufacturer), whereas other features (e.g., on-board dual-processor capability) are considered relevant features which contribute to the overall flexibility of the system.
This latter set of features is weighted and scored, and a 'basic feature score' is created by summing these values. This basic score is scaled to a 10-point scale, where a '10' represents 'perfect in all scored categories'.
TCO Features Scoring Methodology
The attributes for TCO consist of questions relating to the Environment (green capabilities), Security, Deployment, and Certifications. The TCO assessment also includes verification of the System Restore, System Migration, and Network Management utilities. Value-added services are also noted.
Each system was evaluated based on the following criteria:
Desktop Systems:
  • Equipment Information
  • Processor
  • BIOS
  • System Management
  • Certifications
  • Security
  • Video Controller
  • Video Monitor
  • I/O Ports
  • System RAM
  • Secondary Cache
  • Hard Disk and Controller
  • CDROM/Audio
  • Value Added Services
  • NIC
Servers:
  • Equipment Information
  • Processor
  • Certifications
  • System RAM
  • Hard Disks
  • SCSI Controller
  • Chassis, Power Supplies
  • System Bus and Motherboard
  • BIOS
  • I/O Ports
  • Server Configuration Utility
  • Server Management
  • Security
  • NIC
Notebooks:
  • Microprocessor
  • System RAM
  • System BIOS
  • Software
  • Security
  • Video Display
  • External Video
  • I/O Ports
  • Sound / Audio
  • Communications
  • Hard Drive / CD-ROM
  • Power Supply / Battery
  • Power Management
  • Status Indicator
  • Chassis/Case
  • Pointing Device
  • Port Replicator / Docking Station
  • Portability

Performance Testing

One way of expressing a system's performance is through the analysis of the raw measurements of its various components, including the speed of the processor, the number of wait states on memory access, and the speed at which the hard disk reads randomly and sequentially stored data. These measurements allow for the comparison of the performance of the individual system components but in and of themselves do not provide a realistic indication of the performance of the system operating as a whole.
Software applications vary by the extent to which their performance is dependent upon each of these factors. For example, the performance of some applications, like spreadsheets, is determined primarily by the processing and memory access speeds of the systems. For others, like databases, the speed of the hard disk and memory sub-system is the principal factor affecting performance. The performance of a single application benchmark, however, depends not only on the speed of individual components, but also on the interaction between the different components and sub-systems.
Intertek's Standalone System Performance Benchmark Suite was specifically designed to assess how effectively a system's sub-systems interact and how that interaction affects overall performance. Since Intertek's Standalone System Performance Benchmark Suite performs a series of tasks which replicate actual use, the application benchmarks provide a more realistic measurement of how each software package will perform on a particular system. Consequently, results obtained from the various Performance Benchmark Suites reveal concrete, real-world differences between comparable systems.
The Intertek Standalone System Performance Benchmark Suite
The speed at which a computer operates is significant in that it directly affects how much work a user can produce over time. Intertek's suite of Standalone System Performance Benchmarks emulates typical use in order to provide a realistic measurement of how each software package would perform on a range of systems. In general, the specific functions tested in the benchmarks were chosen to highlight performance differences between test systems. In designing its benchmarks, Intertek considers both the time required by the system to perform the task as well as the unproductive time a user must wait for the computer to respond. A faster system can enhance the user's productivity by taking less real time to perform a function, ultimately decreasing the amount of unproductive time the user must spend waiting for the system.
All of Intertek's software application benchmarks test the speed of performing time-consuming functions within each program. For example, the word processing benchmarks each include a spell check: a relatively lengthy procedure. Although all of the benchmark tests of a single application category exercise the same functions, the results should not be used to compare the relative performance of the different test applications. The benchmarks were designed specifically to test system performance.
Office (Word, Excel): Windows .vbs scripted, performing real-world functions of the individual apps. These two tend to test the system as a whole. Testing both at once (Multi-Office) allows the multi-threading capabilities of each system to be differentiated.
Office (Access): Windows .vbs scripted, performing real-world functions of the individual app. This app tests the system as a whole, but memory and its sub-system also play a very important part in the test.
Photoshop: Scripted to carry out numerous tasks within the app. Tests the system as a whole and the GPU (video) capabilities of the system.
RARzip: Scripted to perform the following: extract a Microsoft Access database; extract multiple Microsoft Word and Excel documents; extract Photoshop graphic files. To accomplish this we compress all of the above files using WinRAR at various compression levels and compress a large video file using WinRAR with various compression methods.
Handbrake: Encoding captured mp4 video into Apple TV mp4 videos. To accomplish this we use Handbrake with the preset 'AppleTV'.
Windows Media: Encoding XVID mp4 video into Microsoft Windows Media videos. To accomplish this we use Microsoft's wmcmd.vbs script and pass it the source file and the destination file. The script then encodes the video using default values for quality.
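The sketch below illustrates how one of these scripted encoding workloads might be timed from a harness. The HandBrakeCLI invocation uses its standard -i/-o/--preset options, but the file names are placeholders and the preset name simply mirrors the 'AppleTV' preset mentioned above.

    import subprocess, time

    # Placeholder input/output paths; the 'AppleTV' preset follows the description above.
    cmd = ["HandBrakeCLI", "-i", "source.mp4", "-o", "encoded_appletv.mp4",
           "--preset", "AppleTV"]

    start = time.perf_counter()
    subprocess.run(cmd, check=True)   # run the encode to completion
    elapsed = time.perf_counter() - start

    print(f"Handbrake encode completed in {elapsed:.1f} s")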
Test Automation and Reliability
Intertek designed the System Performance Benchmark Suite to be as automated as possible. Automation improves the accuracy and consistency of the timing and procedures. In addition, automation ensures that the testing methodologies utilized for the Performance Benchmark Test Suite are identical for each benchmark on each of the systems tested. When conducting comparative performance benchmarks, Intertek applies strict testing procedures to ensure the accuracy of the results:
  • All machines are similarly configured to minimize variations in test conditions.
  • Each computer's hard disk is formatted and contains only the program, operating system, and data files necessary to run the tests. An identical procedure is used to install the required test files on the hard disk of each of the test systems.
  • All benchmarks are run a minimum of three times to obtain consistent and accurate results.
Intertek has developed the Benchmark Management System in order to automate the Intertek System Performance Benchmarks. The Performance tests are executed with no Network connection and the test systems are rebooted (a simulated cold-boot) after each iteration of each application. The Benchmark Management System controls the sequence of the tests and the collection of the data for each test system.
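A simplified sketch of this kind of test sequencing is shown below. It is not the Benchmark Management System itself: the benchmark commands, iteration count, and results file are illustrative placeholders, and the simulated cold boot between iterations is only noted.

    import csv, subprocess, time

    # Illustrative benchmark commands; the real suite drives scripted applications.
    BENCHMARKS = {
        "office": ["cscript", "office_benchmark.vbs"],
        "photoshop": ["cscript", "photoshop_benchmark.vbs"],
    }
    RUNS = 3  # each benchmark is run a minimum of three times

    with open("results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["benchmark", "run", "seconds"])
        for name, cmd in BENCHMARKS.items():
            for run in range(1, RUNS + 1):
                start = time.perf_counter()
                subprocess.run(cmd, check=True)
                writer.writerow([name, run, round(time.perf_counter() - start, 2)])
                # The real harness reboots the test system (a simulated cold
                # boot) after each iteration; that step is only noted here.
                print(f"{name} run {run} complete; reboot point")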
Performance Scoring
The score for each application is determined as (individual score x individual weight). The overall score is then computed as POWER(product of the weighted application scores, 1 / total weight), a geometric-mean-style combination of the per-application results.
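Read this way, the calculation can be sketched as follows; the application scores and weights are placeholder values used only to show the arithmetic, not actual benchmark results.

    import math

    # Placeholder per-application scores and weights (not Intertek's values).
    scores = {"office": 120.0, "photoshop": 95.0, "rarzip": 110.0}
    weights = {"office": 2.0, "photoshop": 1.0, "rarzip": 1.0}

    # Per-application weighted score ...
    weighted = {app: scores[app] * weights[app] for app in scores}

    # ... combined as POWER(product of weighted scores, 1 / total weight).
    total_weight = sum(weights.values())
    overall = math.prod(weighted.values()) ** (1.0 / total_weight)

    print(round(overall, 2))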
Battery Life Testing
Battery life is highly dependent upon how a system is used. Consequently, battery life testing is more subjective than objective. No two users operate a computer identically, so it is impossible to emulate real-world usage. Also, the effectiveness of a system's power management is dependent on system usage. To reduce subjectivity, Intertek proposes to measure battery discharge and charge under a worst-case scenario. The worst-case scenario has APM enabled, but with the least amount of managed power savings; in other words, no devices are power managed. It also has the hard drive accessing a file and simulates keyboard activity.
Today, all systems should support Advanced Power Management (APM). APM allows Windows 9x to obtain power information about the battery and to put the system into standby or suspend. APM also allows Windows 9x to monitor changes in a system's power state. This information can be used to perform battery life testing.
Intertek uses its proprietary Windows based Power Monitor program to perform battery life testing. This program interfaces directly with Windows' VPOWERD device driver and Windows' messaging to record power status and Windows events.
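For illustration, the sketch below logs battery percentage against elapsed time in the same spirit. It is not the Power Monitor program: the cross-platform psutil library stands in for the VPOWERD/Windows messaging interface, and the polling interval is a placeholder.

    import csv, time
    import psutil  # stand-in for the proprietary Power Monitor / VPOWERD interface

    POLL_SECONDS = 60  # placeholder polling interval

    with open("battery_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "percent", "on_ac_power"])
        start = time.time()
        while True:
            batt = psutil.sensors_battery()
            if batt is None:          # no battery reported on this system
                break
            writer.writerow([int(time.time() - start), batt.percent, batt.power_plugged])
            f.flush()
            if batt.percent <= 1 and not batt.power_plugged:
                break                 # stop just before the system loses power
            time.sleep(POLL_SECONDS)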
Before testing, battery conditioning must be performed. This consists of completely discharging the battery, completely charging the battery, completely discharging the battery, and finally completely recharging the battery. This will help reduce any charge 'memory' the battery may contain. APM is enabled with no devices being power managed.
Deviations in the battery charts from the expected result of a relatively smooth decline are noted and explained below. For example, if the plot of power vs. time is not linear, an analysis of the plot is provided.
Typical inferior plots are:
  • A sudden drop of power within a very short time frame. An unsuspecting user could be caught off guard, alarmed, and unable to complete work in the amount of time anticipated.
  • A sawtooth or staircase pattern. This commonly occurs when the power reporting algorithm uses a modulo format (90%, 80%, 70% ... 20% indicates a modulo of 10). Modulo algorithms make it more difficult for users to predict the rate of power loss.
  • A flat rate of power loss at either end of the chart:
    • Very little reported power loss when the battery is fully charged can cause the user to falsely predict a very long battery life.
    • Very little reported power loss when the battery power is low causes the user to prematurely halt work in the belief that there is no power left, when in reality there is significant time left before actual power loss.

Application Server Performance Testing

One way of expressing a system's performance is through the analysis of the raw measurements of its various components, including the speed of the processor, the number of wait states on memory access, and the speed at which the hard disk reads randomly and sequentially stored data. These measurements allow for the comparison of the performance of the individual system components but in and of themselves do not provide a realistic indication of the performance of the system operating as a whole.
Intertek's Application Server Benchmark Suite was specifically designed to assess how effectively a system's sub-systems interact and how that interaction affects overall performance. Since Intertek's Application Server Benchmark Suite performs a series of tasks which replicate actual use, the application benchmarks provide a more realistic measurement of how each software package will perform on a particular system. Consequently, results obtained from the various Performance Benchmark Suites reveal concrete, real-world differences between comparable servers.

Overview
The server methodology Intertek uses for benchmarking is designed to simulate a real-world environment. The testing is based on four different areas of performance: Disk, E-Mail, SQL, and Web. Each of these is individually tuned for every server category based on the specifications for that category. The following applications are used in the testing: Internet Information Services (Web), Microsoft SQL Server (SQL), Microsoft Exchange Server (E-Mail), and the Intertek Disk test (Disk).

SQL Testing
This methodology examines SQL server systems to measure performance in the client/server environment and the time taken to process and index millions of records. The testing is set up to simulate a typical data warehouse application. Each client station is configured with Windows XP. Intertek conducted a test of the performance with each server running Windows Server 2003. The tests are conducted using client station(s) simulating a number of users accessing the server at one time; this number is configured independently for each category. The servers were configured to run in a 1 Gbps Base-T network topology, and the TCP/IP communication protocol was utilized between clients and server. Each client station controlled the execution of tests for each of its 'clients' and collected the results upon successful completion of the tests. A variety of SQL statements are executed at random on every client. The connection and execution time is recorded for each statement. These times are then averaged across all runs.
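A minimal sketch of what one simulated client might do is shown below. The connection string, statement mix, and the use of the pyodbc library are assumptions for illustration; they are not the Intertek test harness.

    import random, time
    import pyodbc  # assumed client library; the real harness is Intertek's own

    # Placeholder connection string and statement mix for the simulated data warehouse.
    CONN_STR = "DRIVER={SQL Server};SERVER=testserver;DATABASE=warehouse;Trusted_Connection=yes"
    STATEMENTS = [
        "SELECT COUNT(*) FROM orders",
        "SELECT TOP 100 * FROM orders ORDER BY order_date DESC",
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id",
    ]

    timings = []
    for _ in range(50):                      # each 'client' issues statements at random
        stmt = random.choice(STATEMENTS)
        start = time.perf_counter()
        conn = pyodbc.connect(CONN_STR)      # connection time is part of the measurement
        conn.cursor().execute(stmt).fetchall()
        conn.close()
        timings.append(time.perf_counter() - start)

    print("average connect+execute time:", sum(timings) / len(timings))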

E-Mail Testing
This methodology examines e-mail server software products to determine the maximum rates at which an SMTP server accepts messages and a POP3 client retrieves messages from a POP3 server. For this test, each client PC is configured with Windows XP and the server with Windows Server 2003. The e-mail server tests stress the performance of the server in handling message delivery and retrieval. The performance test looks for the optimum thread rate at which the server is able to accept and retrieve messages without compromising performance (without losing throughput). Servers are expected to perform well under heavy load.
The focus of the test suite is on performance, with the goal of determining the number of messages per second the e-mail server is able to accept (SMTP) and the number of messages per second the server is able to deliver (POP3).
Two Intertek traffic generation tools are utilized to determine the highest rate at which a server could accept messages and the maximum rate at which messages could be retrieved:
SMTPTEST - an Intertek-developed tool to generate SMTP traffic. (Messages are passed to user accounts on the server.)
IMAPCLIENT - an Intertek-developed tool to generate POP3 (Get) traffic. All messages on the server are retrieved.
The score is based on a twenty-five-minute window of activity in which the number of messages per second is recorded. The harmonic mean of three runs is taken, and a small CPU factor is added to distinguish between high and low CPU utilization.
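A simplified sketch of the SMTP side of this measurement, together with the harmonic-mean combination of runs, is given below. The server name, accounts, and message counts are placeholders, Python's smtplib stands in for the SMTPTEST tool, and the CPU-utilization factor is omitted.

    import smtplib, statistics, time
    from email.message import EmailMessage

    SERVER, PORT = "mailserver.test", 25          # placeholder server under test
    MESSAGES_PER_RUN, RUNS = 500, 3

    def smtp_run():
        """Send a batch of messages and return the accepted-message rate (msgs/s)."""
        msg = EmailMessage()
        msg["From"], msg["To"], msg["Subject"] = "load@test", "user1@test", "load test"
        msg.set_content("benchmark payload")
        start = time.perf_counter()
        with smtplib.SMTP(SERVER, PORT) as smtp:
            for _ in range(MESSAGES_PER_RUN):
                smtp.send_message(msg)
        return MESSAGES_PER_RUN / (time.perf_counter() - start)

    rates = [smtp_run() for _ in range(RUNS)]
    print("harmonic mean rate (msgs/s):", statistics.harmonic_mean(rates))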

Web Testing
This methodology examines web server systems to determine how well each performs as an organization's web server. For this methodology, the tested systems were pre-configured with IIS software. The server systems ran under Windows Server 2003.
Intertek's Web Server Benchmarks stress the web server's ability to handle HTTP requests. Intertek's benchmarks simulate realistic use of the web server system under test by using network interface cards to connect to the workstations and by imposing heavy loads. The overall transaction time for each client data transfer is measured and reported under various load conditions, and performance ratings for the various scenarios are calculated from the individual performance scores for the weighted benchmarks.
Up to 4 physical clients in an Ethernet network are connected to the web server under test via a 1 Gbps Ethernet switch. One server is tested at a time.
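The sketch below shows one way to measure per-request transaction time under concurrent client load. The URL, client count, and request count are placeholders, and the standard urllib client stands in for Intertek's web benchmark tools.

    import statistics, time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URL = "http://webserver.test/index.html"   # placeholder page on the server under test
    CLIENTS, REQUESTS_PER_CLIENT = 4, 100

    def client_worker(_):
        times = []
        for _ in range(REQUESTS_PER_CLIENT):
            start = time.perf_counter()
            with urlopen(URL) as resp:          # the full transfer is part of the timing
                resp.read()
            times.append(time.perf_counter() - start)
        return times

    with ThreadPoolExecutor(max_workers=CLIENTS) as pool:
        all_times = [t for worker in pool.map(client_worker, range(CLIENTS)) for t in worker]

    print("mean transaction time (s):", statistics.mean(all_times))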

Disk Testing
This methodology examines disk subsystem performance by simulating disk usage in a server environment. The testing is set up to simulate various types of disk activity, including reading and writing both randomly and sequentially. Each run begins with a preparation stage to ensure that the disk cache is cleared and does not affect the results. The tests are done using several different block sizes. The total time and throughput are measured and reported for every test. The harmonic mean of these results is compared to a theoretical maximum value, which is then used to calculate the overall performance scores for the weighted benchmarks.
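A minimal sketch of the sequential-write part of such a measurement is given below. The file size, block sizes, and theoretical maximum are placeholders, and the cache-clearing preparation stage described above is omitted.

    import os, statistics, time

    FILE, FILE_SIZE = "disk_test.bin", 256 * 1024 * 1024    # placeholder test file (256 MB)
    BLOCK_SIZES = [4 * 1024, 64 * 1024, 1024 * 1024]        # several block sizes, as above
    THEORETICAL_MAX_MBS = 500.0                             # placeholder reference throughput

    throughputs = []
    for block in BLOCK_SIZES:
        start = time.perf_counter()
        with open(FILE, "wb") as f:                         # sequential write test
            for _ in range(FILE_SIZE // block):
                f.write(b"\0" * block)
            f.flush()
            os.fsync(f.fileno())                            # force the data to disk
        elapsed = time.perf_counter() - start
        throughputs.append(FILE_SIZE / elapsed / (1024 * 1024))   # MB/s

    score = statistics.harmonic_mean(throughputs) / THEORETICAL_MAX_MBS
    print("relative disk score:", round(score, 3))
    os.remove(FILE)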