NSTL/SIPSS Benchmark Report
 A. Introduction
 B. Methodology Overview
 C. Compatibility Testing
 D. Versatility Testing
 E. Performance Testing

A. INTRODUCTION

Each advance in hardware and software technology poses new challenges to evaluation techniques. In previous evaluations, NSTL was able to apply common test methodologies across all categories of the NMSO. Except for some minor variations to reflect the category, desktop systems and notebooks were each subjected to the same suite of tests for Performance, Compatibility, and Features/Usability. Because it became increasingly difficult to differentiate between products in the server categories using results produced by methodologies designed for standalone categories, NSTL utilized a test suite specifically targeted at the server products. In addition, the methodologies for the standalone system categories have been upgraded to reflect environments that are current in the industry. Some of the features of the new test suites include:

All test suites for standalone systems were executed with Windows 2000 as the OS. The network environment of Windows 2000 Advanced Server was utilized.

All network related aspects of the benchmark methodologies were conducted in a 10BaseT Ethernet environment (Desktops and Notebooks).



B. METHODOLOGY OVERVIEW

Traditional Standalone Desktop/Notebook Testing included the following tests:

Stress/Compatibility Testing

Features and Usability

NSTL Standalone System Performance

Application Server Testing included the following tests:

Performance results for the Advanced Server Category were obtained with the use of NSTL's Application Server benchmark suite, including SQL Server testing, E-mail Server testing, and Web Server testing.

Features and Usability

Battery Performance Testing was executed on all Notebook categories.

All test methodologies, including application versions, scripts and workloads, will remain constant for the entire life of the NMSO, with the exception that, in the event of a new release, NSTL will evaluate its impact on the methodologies and make a recommendation with respect to its substitution.

 

 

 


C. COMPATIBILITY TESTING

The compatibility assessment for the SIPSS Benchmark Project was organized into the following test areas:

1. Stress Testing

Desktop/Notebook

Task 1: Select and Acquire Products to be Tested

The first step in compatibility testing is to select the products to be tested. SIPSS selected the following Windows applications, which were expected to be among the products most frequently used by end users:

The Stress test consisted of a mixture of Windows applications executing typical workloads in a continuous loop for 24 hours. Execution of the entire mixture of applications was followed by the simulation of a cold boot of the system before the mixture was repeated. The Stress test was executed on systems that were configured for the 1024x768x16-bit video mode.

The mixture of applications included the following:

Lotus 1-2-3 for Windows Release 9

Microsoft Word 97 for Windows

Microsoft Excel 97 for Windows

Microsoft Word 2000 for Windows

Microsoft Excel 2000 for Windows

Microsoft FoxPro for Windows 2.6

Adobe Photoshop 6.01/7.0

WordPerfect for Windows 9.0

AutoCAD for Windows Release 13.0

Microsoft Word XP for Windows

Microsoft Excel XP for Windows

 

Task 2: Design Test Plans

The NSTL test development staff has developed a standard format for test plans which provides a detailed audit trail for the testing process. NSTL's test plans are not developed "on-the-fly". Each test plan specifies which functions are to be tested, how to execute each function, the precise order in which the functions should be tested, and what the test engineer should expect to happen. Using this technique, NSTL provides a scenario by which the NSTL test engineer or the client can reproduce any incompatibilities or discrepancies that have been discovered, using the same methodical, step-by-step approach.




D. VERSATILITY TESTING

NSTL conducted versatility testing for each system by assessing the diversity of features of each system submitted for testing, and each system's usability.

1. Usability

During the course of the benchmark testing, NSTL test engineers reviewed and scored each system on a number of predefined usability criteria. A list of those criteria is presented below. The NSTL test engineers worked  with the system and documentation to perform some basic tasks, such as setting up the CMOS and installing memory. Each task was scored according to the problems identified or the difficulties experienced by the test engineers.

An overall usability score was derived by averaging the scores for all items. The final score was adjusted to a 10-point scale for easy comparison. An overall usability score of 10 thus represents the highest possible score for a system on the usability questionnaire.

Desktop Systems

The usability of each desktop system was evaluated based on the following criteria:

System Teardown

CMOS Setup/Configuration

Documentation

Hard Drive Installation

Monitor

Potential Problems

Ease of Service

Restore CD

Web Site

 

Advanced Servers

The usability of each advanced server was evaluated based on the following criteria:

Cover Removal/Configuration/Internal Access

Hot Swap Usability

CPU/Memory Installation

End-user Documentation

Server Setup/Configuration

Hard Drive Installation

Potential Problems

Ease of Service

 

Notebooks

The usability of each notebook was evaluated based on the following criteria:

Battery and Power Management

Drive Device Management

CMOS Setup/Configuration

Keyboard Usability

Documentation

Pointing Device/Battery Life/Weight

Restore Utility

Potential Problems

Ease of Service

Website

 

2. Features

Prior to the commencement of the benchmark testing, vendors were to fill out a web-based features questionnaire for each system they were submitting. The questionnaire consisted of many items for which vendors specified the particular attributes of each system. During the course of the benchmark testing, NSTL technicians reviewed the features questionnaire and verified the responses against the system and documentation provided for the benchmark. Individual detailed features for each system are presented in a table format in the results section of this report.

The verified features for each system were scored according to the following methodology. Many of the items were recorded for information purposes only and were not included in deriving the score (e.g., CPU manufacturer). A number of items were assigned weights and scored (e.g., CPU speed). Exact weights for each item can be found in the first column of the features table in the data section of this report. A system received a score of "x" (where x is the score to be applied) for each weighted feature it included and a score of "0" for each weighted feature it did not include. In some instances, answers were awarded a partial score (i.e., "0.5") for the item. The score for each item was then multiplied by the corresponding weight, and the weighted total was compared to the total possible number of points for that category. The final score of each system was adjusted to a 10-point scale for easy comparison. An overall features score of 10 thus represents the highest possible score for a system on the features questionnaire.
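The weighted scoring described above can be sketched roughly as follows. This is only an illustration, not NSTL's actual scoring tool: the function name features_score, the (weight, score) pair layout, and the assumption that a fully present feature scores 1 are all hypothetical.

```python
def features_score(items):
    """Return a 10-point features score.

    items: list of (weight, score) pairs. Assumption: score is 1 for a
    feature that is present, 0 for one that is absent, and a fraction
    such as 0.5 for a partial answer. Information-only items carry a
    weight of 0 and therefore do not affect the result.
    """
    earned = sum(weight * score for weight, score in items)
    possible = sum(weight for weight, _ in items)
    if possible <= 0:
        raise ValueError("at least one item must carry a positive weight")
    # Compare the weighted total to the total possible points, then rescale to 10.
    return 10.0 * earned / possible


# Hypothetical questionnaire: three weighted items and one information-only item.
example_items = [(3, 1), (2, 0), (1, 0.5), (0, 1)]
example_score = features_score(example_items)
```

With these made-up weights the system earns 3.5 of 6 possible points, which rescales to roughly 5.8 on the 10-point scale.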

Servers

The features of each advanced server were evaluated based on the following criteria:

Equipment Information

Processor

Certifications

System RAM

Hard Disks

SCSI Controller

Chassis, Power Supplies

System Bus and Motherboard

BIOS

I/O Ports

Server Configuration Utility

Server Management

Security

NIC

 

Desktop Systems

The features of each desktop system were evaluated based on the following criteria:

Equipment Information

Processor

BIOS

System Management

Certifications

Security

Video Controller

Video Monitor

I/O Ports

System RAM

Secondary Cache

Hard Disk and Controller

CDROM/Audio

Value Added Services

NIC

 

Notebooks

The features of each notebook were evaluated based on the following criteria:

Microprocessor

System RAM 

System BIOS

Software 

Security 

Video Display

External Video

I/O Ports

Sound/Audio

Communications

Hard Drive/CD-ROM

Power Supply/Battery

Power Management

Status Indicator

Chassis/Case

Pointing Device

Port Replicator/Docking Station

Portability



E. PERFORMANCE TESTING
  • Standalone System Performance

  • NSTL's Approach to Performance Testing

    One way of expressing a system's performance is through the analysis of the raw measurements of its various components, including the speed of the processor, the number of wait states on memory access, and the speed at which the hard disk reads randomly and sequentially stored data. These measurements allow for the comparison of the performance of the individual system components but in and of themselves do not provide a realistic indication of the performance of the system operating as a whole.

    Software applications vary by the extent to which their performance is dependent upon each of these factors. For example, the performance of some applications, like spreadsheets, is determined primarily by the processing and memory access speeds of the systems. For others, like databases, the speed of the hard disk sub-system is the principal factor affecting performance. The performance of a single application benchmark, however, depends not only on the speed of individual components, but also on the interaction between the different components and sub-systems.

    NSTL's Standalone System Performance Benchmark Suite was specifically designed to assess how effectively a system's sub-systems interact and how that interaction affects overall performance. Since NSTL's benchmarks perform a series of tasks which replicate actual use, the application benchmarks provide a more realistic measurement of how each software package will perform on a particular system. Consequently, results obtained from the Standalone System Performance Benchmark Suite reveal concrete, real-world differences between comparable systems.

    The NSTL Standalone System Performance Benchmark Suite

    The speed at which a computer operates is significant in that it directly affects how much work a user can produce over time. NSTL's suite of Standalone System Performance Benchmarks emulates typical use in order to provide a realistic measurement of how each software package would perform on a range of systems.

    The software applications selected by SIPSS/NSTL to measure the performance of competitive hardware systems are as follows:

    Notebooks:

    Microsoft Word XP

    Microsoft Excel XP

    Adobe Photoshop 7.0

    Microsoft Access XP

    3dMark video test

     

    Desktops:

    Microsoft Word XP

    Microsoft Excel XP

    Adobe Photoshop 7.0

    Microsoft Access XP

    Individual Test Design

    In general, the specific functions tested in the benchmarks were chosen to highlight performance differences between test systems. In designing its benchmarks, NSTL considers both the time required by the system to perform the task as well as the unproductive time a user must wait for the computer to respond. A faster system can enhance the user's productivity by taking less real time to perform a function, ultimately decreasing the amount of unproductive time the user must spend waiting for the system.

    All of NSTL's software application benchmarks test the speed of performing time-consuming functions within each program.  For example, the word processing benchmarks each include a spell check: a relatively lengthy procedure. Although all of the benchmark tests of a single application category exercise the same functions, the results should not be used to compare the relative performance of the different test applications. The benchmarks were designed specifically to test system performance.

    The tasks that make up each benchmark test were structured to emulate the way in which a typical user might perform a series of operations utilizing the test application.  The results for the tests are then reported in time taken to complete each task and converted to "tests per hour".

    Test Automation and Reliability

    NSTL designed the System Performance Benchmark Suite to be as automated as possible. Automation improves the accuracy and consistency of the timing and procedures. In addition, automation ensures that the testing methodologies utilized for the Performance Benchmark Test Suite are identical for each benchmark on each of the systems tested.

    When conducting comparative performance benchmarks, NSTL applies strict testing procedures to ensure the accuracy of the results:

    All machines are similarly configured to minimize variations in test conditions.

    Each computer's hard disk was formatted and contained only the program, operating system, and data files necessary to run the tests.

    An identical procedure was utilized to install the required test files to the hard disk of each of the test systems.

    All benchmarks are run a minimum of two times to obtain consistent and accurate results.

    NSTL has developed the Benchmark Management System in order to automate the NSTL System Performance Benchmarks. The Performance tests are executed with no Network connection and the test systems are rebooted (a simulated cold-boot) after each iteration of each application. The Benchmark Management System controls the sequence of the tests and the collection of the data for each test system.

    Each benchmark records its results in a text file. The control program reads in the results text files and stores them in a database. The database of results allows for easy reporting and comparison of information for different machines and different test runs.
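The flow from result text files into a results database might look something like the sketch below. The report does not describe the file format or the database used by the Benchmark Management System, so the comma-separated line layout ("machine,benchmark,seconds"), the table schema, the function names, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3


def parse_result_line(line):
    """Parse one result line in the assumed 'machine,benchmark,seconds' layout."""
    machine, benchmark, seconds = line.strip().split(",")
    return machine, benchmark, float(seconds)


def load_results(lines, conn):
    """Store parsed result lines in a results table; return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(machine TEXT, benchmark TEXT, seconds REAL)"
    )
    rows = [parse_result_line(line) for line in lines if line.strip()]
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def mean_seconds(conn, machine, benchmark):
    """Average time for one machine and benchmark across all stored runs."""
    row = conn.execute(
        "SELECT AVG(seconds) FROM results WHERE machine = ? AND benchmark = ?",
        (machine, benchmark),
    ).fetchone()
    return row[0]


# Two runs of a hypothetical word-processing benchmark on machine A, one on machine B.
conn = sqlite3.connect(":memory:")
load_results(["A,word,10.0", "A,word,12.0", "B,word,9.0"], conn)
average_a = mean_seconds(conn, "A", "word")
```

Keeping every run as its own row makes it straightforward to compare machines or repeat runs with ordinary queries.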

    Battery Life Testing 

    Battery life is highly dependent upon how a system is used.  Consequently, battery life testing is more subjective than objective.  No two users operate a computer identically, so it is impossible to emulate real-world usage exactly.  Also, the effectiveness of a system’s power management is dependent on system usage.  To reduce subjectivity, NSTL proposes to measure battery discharge and charge under a worst-case scenario.  The worst-case scenario has APM enabled, but with the least amount of managed power savings; in other words, no devices are power managed.  During the test, the hard drive repeatedly accesses a file and keyboard activity is simulated.

    Today, all systems should support Advanced Power Management (APM).  APM allows Windows 9x to obtain power information about the battery and to put the system into standby or suspend.  APM also allows Windows 9x to monitor changes in a system’s power state.  This information can be used to perform battery life testing.

    NSTL uses its proprietary Windows-based Power Monitor program to perform battery life testing.  This program interfaces directly with Windows’ VPOWERD device driver and Windows’ messaging to record power status and Windows events.

    Power Monitor is designed to follow the APM 1.2 specification and the Windows power management event message WM_POWERBROADCAST.

    Before testing, battery conditioning must be performed.  This consists of completely discharging the battery, completely charging the battery, completely discharging the battery, and finally completely recharging the battery.  This will help reduce any charge "memory" the battery may contain.  APM is enabled with no devices being power managed.

    Deviations in the battery charts from the expected result, a relatively smooth decline, are noted and explained below.  For example, if the plot of power vs. time is not linear, an analysis of the plot is provided.

    Typical patterns in inferior plots are:

    A sudden drop of power within a very short time frame.  An unsuspecting user could be caught off guard, alarmed, and unable to complete work in the amount of time anticipated.

    A sawtooth or staircase pattern.  This commonly occurs when the power reporting algorithm uses a modulo format (90%, 80%, 70%…20% indicates that a modulo of 10 is implemented).  Modulo algorithms make it more difficult for users to predict rates of power loss.

    A flat rate of power loss at either end of the chart:

    Very little power loss reported when the battery is fully charged can cause the user to falsely predict a very long battery life.

    Very little power loss reported when the battery power is low can cause the user to halt work prematurely in the belief that there is no power left, when in reality there is significant time left before actual power loss.
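Two of the patterns listed above can be checked mechanically from a series of reported battery percentages. The following is a rough sketch only and is not how Power Monitor works; the function names and the assumption that readings are whole-number percentages sampled at regular intervals are hypothetical.

```python
from functools import reduce
from math import gcd


def reporting_step(percents):
    """Greatest common divisor of the nonzero changes between consecutive readings.

    A result such as 10 suggests a modulo-10 (staircase) reporting algorithm,
    while a result of 1 suggests fine-grained reporting. Returns 0 if the
    readings never change.
    """
    deltas = [abs(b - a) for a, b in zip(percents, percents[1:]) if b != a]
    return reduce(gcd, deltas, 0)


def flat_run_at_start(percents):
    """Count how many readings after the first repeat the first reading unchanged."""
    count = 0
    for value in percents[1:]:
        if value != percents[0]:
            break
        count += 1
    return count


# Hypothetical traces of reported battery percentage over time.
staircase_step = reporting_step([100, 90, 90, 80, 70])
smooth_step = reporting_step([100, 99, 97, 96])
flat_samples = flat_run_at_start([100, 100, 100, 98, 95])
```

A large step or a long flat run is only a hint that a chart deserves a closer look; a coarse sampling interval can produce similar numbers on a well-behaved system.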

    Application Server Performance Testing

    One way of expressing a server's performance is through the analysis of the raw measurements of its various components, including the speed of the server's processor, network bandwidth, and the speed at which the server's hard disk reads randomly and sequentially stored data. These measurements allow for the comparison of the performance of these individual components but in and of themselves do not provide a realistic indication of the performance of the server operating as a whole.

    NSTL's Performance Benchmark Suite was specifically designed to assess how effectively a server's sub-systems interact and how that interaction affects overall performance. Since NSTL's benchmarks perform a series of tasks which replicate actual use, the application benchmarks provide a more realistic measurement of how each software package will perform on a particular system. Consequently, results obtained from the various Performance Benchmark Suites reveal concrete, real-world differences between comparable servers.

    Overview

    The server methodology NSTL uses for benchmarking is designed to simulate a real-world environment. The testing is based on four different areas of performance: Disk, E-Mail, SQL and Web. Each of these is individually tuned for every server category based on the specifications for that category. The following applications are used in the testing: Internet Information Services (Web), Microsoft SQL Server (SQL), Microsoft Exchange Server (E-Mail), and the NSTL Disk test.

    SQL Testing

    This methodology examines SQL server systems to measure performance in the client/server environment and the time taken to process and index millions of records. The testing is set up to simulate a typical data warehouse application.
    Each client station is configured with Windows XP. NSTL conducted a test of the performance with each server running Windows Server 2003. The tests are conducted using client station(s), simulating a number of users accessing the server at one time. This number is configured independently for each category. The servers were configured to run in a 1000Base-T (1 Gbps) network topology. The TCP/IP communication protocol was utilized between clients and server. Each client station controlled the execution of tests for each of its ‘clients’ and collected the results upon successful completion of the tests. A variety of SQL statements are executed at random on every client. The connection and execution times are recorded for each statement. These times are then averaged across all runs.
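The timing loop on a client might be sketched as follows. This is not NSTL's test harness: it uses Python's built-in SQLite module as a stand-in for Microsoft SQL Server so that it runs anywhere, and the statement list, function name, and seed parameter are hypothetical.

```python
import random
import sqlite3
import time

# Placeholder statements; a real workload would query the warehouse tables.
STATEMENTS = [
    "SELECT 1",
    "SELECT abs(-5)",
    "SELECT length('benchmark')",
]


def run_client(db_path, statements, n_runs, seed=0):
    """Run randomly chosen statements and return (average connect time, average execute time)."""
    rng = random.Random(seed)
    connect_times = []
    execute_times = []
    for _ in range(n_runs):
        sql = rng.choice(statements)
        start = time.perf_counter()
        conn = sqlite3.connect(db_path)
        connected = time.perf_counter()
        conn.execute(sql).fetchall()
        finished = time.perf_counter()
        conn.close()
        connect_times.append(connected - start)
        execute_times.append(finished - connected)
    return sum(connect_times) / n_runs, sum(execute_times) / n_runs


avg_connect, avg_execute = run_client(":memory:", STATEMENTS, n_runs=5)
```

Recording connection time separately from execution time keeps network and login overhead from being confused with query processing cost.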

    E-Mail Testing

    This methodology examines e-mail server software products to determine the maximum rates at which an SMTP server accepts messages and a POP3 client retrieves messages from a POP3 server. For this test, each client PC is configured with Windows XP and the server with Windows Server 2003. The e-mail server tests stress the performance of the server in handling message delivery and retrieval. The performance test looks at the optimum thread rate where the server is able to accept and retrieve messages without compromising performance (without losing bits/s). Servers are expected to perform well under heavy load.
    The focus of the test suite is on performance, with a goal of determining the number of messages per second the e-mail server is able to accept (SMTP), and the number of messages per second the server is able to send (POP3).
    Two NSTL traffic generation tools are utilized to determine the highest rate at which a server could accept messages and the maximum rate at which messages could be retrieved:

    • SMTPTEST – an NSTL-developed tool to generate SMTP traffic. Messages are passed to user accounts on the server.
    • IMAPCLIENT – an NSTL-developed tool to generate POP3 (Get) traffic. All messages on the server are retrieved.

    The score is based on a twenty-five-minute window of activity in which the number of messages per second is recorded. The harmonic mean of three runs is taken and a small CPU factor is added to distinguish between high and low CPU utilization.
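The harmonic mean step can be illustrated with a short sketch. The function name and the input layout (total messages handled in each run) are assumptions, and the CPU factor is left out because the report does not say how it is calculated.

```python
from statistics import harmonic_mean

WINDOW_SECONDS = 25 * 60  # the twenty-five-minute measurement window


def email_throughput(messages_per_run, window_seconds=WINDOW_SECONDS):
    """Harmonic mean of the messages-per-second rates observed in each run."""
    rates = [count / window_seconds for count in messages_per_run]
    return harmonic_mean(rates)


# Three hypothetical runs handling 30,000, 60,000 and 60,000 messages,
# i.e. rates of 20, 40 and 40 messages per second.
throughput = email_throughput([30000, 60000, 60000])
```

For these made-up runs the harmonic mean is 30 messages per second, lower than the arithmetic mean of about 33.3, because the harmonic mean gives more weight to the slowest run.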

Web Testing

This methodology examines web server systems to determine how well each performs as an organization’s web server.  For this methodology, the tested systems were pre-configured with IIS software. The server systems ran under Windows Server 2003.
NSTL’s Web Server Benchmarks stress the web server’s ability to handle HTTP requests. NSTL’s benchmarks simulate a realistic use of the web server system under test by using network interface cards to connect to the workstations, and by imposing heavy loads.  Overall transaction time for each client data transfer is measured and reported under various load conditions, and performance ratings for the various scenarios are calculated from individual performance scores for the weighted benchmarks.
There are up to 4 physical clients in an Ethernet network connected to the web server under test via a 1 Gbps Ethernet switch.  One server is tested at a time.

Disk Testing

This methodology examines disk subsystem performance by simulating disk usage in a server environment. The testing is set up to simulate various types of disk activity, including reading and writing both randomly and sequentially. Each run begins with a preparation stage to ensure that the disk cache is cleared and does not affect the results. The tests are done using several different block sizes. The total time and throughput are measured and reported for every test. The harmonic mean of these results is compared to a theoretical maximum value, and the comparison is then used to calculate the overall performance scores for the weighted benchmarks.
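One plausible reading of the comparison against a theoretical maximum is a simple ratio, sketched below. The report does not give the actual formula, so the ratio itself, the function name, and the MB/s units are assumptions.

```python
from statistics import harmonic_mean


def disk_ratio(throughputs_mb_s, theoretical_max_mb_s):
    """Harmonic mean of the measured throughputs as a fraction of the theoretical maximum."""
    if theoretical_max_mb_s <= 0:
        raise ValueError("theoretical maximum must be positive")
    return harmonic_mean(throughputs_mb_s) / theoretical_max_mb_s


# Hypothetical throughputs for two block sizes against a 200 MB/s maximum.
ratio = disk_ratio([50.0, 100.0], 200.0)
```

A ratio near 1 would indicate a disk subsystem running close to its theoretical limit across all tested block sizes and access patterns.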

 

If you have any questions or comments on the Benchmark Testing Report please contact Ian Kirk.