Usability testing

Usability testing is a method for measuring how well people can actually use something (such as a web page, a computer interface, a document, or a device) for its intended purpose. If users, or test subjects, have difficulty understanding instructions, manipulating parts, or interpreting feedback, then the developers must go back to the drawing board, improve the design, and test it again. During usability testing, developers are not expected to explain their product to the user or argue about its merits. Their aim is to observe real users working with the product in as realistic a situation as possible, so as to discover errors and areas for improvement. A common mistake designers make, for instance, is to concentrate on designs that look "cool" at the expense of usability and functionality.

"Caution: simply gathering opinions is not usability testing -- you must arrange an experiment that measures a subject's ability to use your document." 1

Rather than showing users a rough draft of a document and asking, "Do you understand this?", usability testing involves watching people try to use something for its intended purpose. For example, when testing a set of instructions for assembling a toy, the test subjects should be given the instructions and a box of parts. The phrasing of the instructions, the quality of the illustrations, and the actual design of the toy will all affect the assembly process.

Setting up a usability test involves carefully creating a scenario, or realistic situation, in which the user performs a list of tasks using the product being tested while observers watch and take notes. Other test instruments, such as scripted instructions, paper prototypes, and pre- and post-test questionnaires, are used to gather further feedback on the product. For example, if the aim is to test the attachment function of an e-mail program, the scenario would describe a situation where the user needs to send an e-mail attachment, and ask the user to go through all the steps needed to perform this task. The aim is to observe users working in a realistic setting, performing realistic tasks, so that developers can see where users run into problems and what they like. The technique popularly used to gather data during a usability test is called a talk-aloud (or think-aloud) protocol, in which users describe what they are doing as they work.

What to Measure

Usability testing generally involves measuring how well test subjects respond in four areas: time on task, accuracy, recall, and emotional response. The results of the first test are the baseline or control measurement; all subsequent tests are compared to the baseline.

Time on Task -- How long does it take users to complete a set of basic tasks? (For example, find something you want to buy, create a new user account, and order the item.)
Accuracy -- How many mistakes did users make? (Can the user correct these errors, if given the proper feedback, or are the errors fatal?)
Recall -- How much information does the user remember, after completing the assigned tasks?
Emotional Response -- How does the user feel about the tasks completed? (Confident? Stressed? Would the user recommend this system to a friend?)
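
The comparison against a baseline described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names, the 1-to-5 satisfaction scale, and the sample figures are all assumptions made for the example, not part of any standard test procedure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One test subject's results in one round (hypothetical record layout)."""
    time_on_task_s: float  # seconds needed to complete the task set
    errors: int            # mistakes made along the way
    recall_score: float    # fraction of follow-up questions answered, 0.0 to 1.0
    satisfaction: int      # self-reported rating, 1 (worst) to 5 (best)


MEASURES = ("time_on_task_s", "errors", "recall_score", "satisfaction")


def summarize(sessions):
    """Average each of the four measures across the subjects in a round."""
    return {m: mean(getattr(s, m) for s in sessions) for m in MEASURES}


def compare_to_baseline(baseline, later):
    """Change in each averaged measure relative to the baseline round."""
    base, new = summarize(baseline), summarize(later)
    return {m: new[m] - base[m] for m in MEASURES}


# Invented sample data: the first round is the baseline, the second
# round was run after a redesign.
round_1 = [Session(120, 4, 0.5, 2), Session(100, 2, 0.7, 3)]
round_2 = [Session(90, 1, 0.75, 4), Session(70, 1, 0.75, 4)]

delta = compare_to_baseline(round_1, round_2)
for measure, change in delta.items():
    print(f"{measure}: {change:+.2f}")
```

A negative change in time on task or errors, and a positive change in recall or satisfaction, would suggest the redesign helped. With only a handful of subjects per round, such averages are indications rather than statistically reliable results.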

In the late 1990s, Jakob Nielsen, at that time a researcher with Sun Microsystems, popularized the concept of using numerous small usability tests -- typically with only five test subjects each -- at various stages of the development process. His argument is that once you find out that two or three people are totally confused by the home page, you gain very little by watching a dozen more people suffer through the same flawed design. "Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford." 2 Nielsen had earlier coined the term heuristic evaluation, together with Rolf Molich, for a separate inspection method that does not involve test subjects.
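
The diminishing returns behind this argument can be illustrated with a simple model that is often cited alongside Nielsen's recommendation: if each test subject independently reveals any given problem with probability L, then n subjects are expected to reveal a share 1 - (1 - L)^n of all problems. The sketch below assumes this model and an average rate of L = 0.31, a commonly quoted figure; neither the formula nor the rate comes from the text above, and real products can differ considerably. The function name is made up for the example.

```python
def share_of_problems_found(n_users, per_user_rate=0.31):
    """Expected fraction of usability problems seen at least once.

    Assumes every subject independently reveals any given problem with
    probability per_user_rate (0.31 is an assumed average, not a constant).
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= per_user_rate <= 1.0:
        raise ValueError("per_user_rate must be between 0 and 1")
    return 1.0 - (1.0 - per_user_rate) ** n_users


for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users: {share_of_problems_found(n):.0%} of problems")
```

Under these assumptions, the first few subjects account for most of the problems found, and each additional subject mostly repeats what earlier ones already showed. This is why the budget may be better spent on another small test after the design has been fixed.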

Bruce Tognazzini advocates "close-coupled testing": "Run a test subject through the product, figure out what's wrong, change it, and repeat until everything works. Using this technique, I've gone through seven design iterations in three-and-a-half days, testing in the morning, changing the prototype at noon, testing in the afternoon, and making more elaborate changes at night." 3