SALT Interactors

 


Metrics

 

 

To measure the usability of the software, we employed two types of metrics: preference metrics and performance metrics.  Preference metrics are subjective measurements of how the test subject evaluated the software: whether it was easy to use, whether he or she enjoyed the experience, and so on.  Performance metrics are objective measurements of the results of the test subject’s interaction with the software: how often errors occurred, how often the test subject spoke a word that was not understood, and so on.

 

 

 

 

Preference Metrics

Preference metrics were measured on a scale from 1 to 5.  In phase 1 of the tests, the test subject was simply asked to give a numerical rating for each.  In phase 2, we assigned specific words to each of the numeric measurements.  Not all preference metrics were used with all interactors.

Clarity – The test subject was told, “Rate how well you understood what the computer wanted you to do.”  In phase 2, the value assignments were: 1=very poorly, 2=somewhat poorly, 3=ok, 4=well, 5=very well

Effectiveness – The test subject was told, “Rate how well you were able to accomplish your task.”  In phase 2, the value assignments were: 1=very poorly, 2=somewhat poorly, 3=ok, 4=well, 5=very well

Ease of use – The test subject was told, “Rate how easy this program was to use.”  In phase 2, the value assignments were: 1=very difficult, 2=somewhat difficult, 3=ok, 4=easy, 5=very easy

Ease of Multiple Selection – The test subject was told, “Rate how easy it was to select multiple items.”  In phase 2, the value assignments were: 1=very difficult, 2=somewhat difficult, 3=ok, 4=easy, 5=very easy

Confirmation – The test subject was told, “Rate how you liked the confirmation of your choices at the end of the dialog.”  In phase 2, the value assignments were: 1=did not like at all, 2=liked somewhat, 3=neutral, 4=liked, 5=liked a lot

Deselection – The test subject was told, “Rate how well changing your selections at the end of the program worked.”  In phase 2, the value assignments were: 1=very poorly, 2=somewhat poorly, 3=ok, 4=well, 5=very well

Prompt Speed – The test subject was asked, “Were the options provided too slow, too fast, or just right?”  In phase 2, the value assignments were: 1=very slowly, 2=somewhat slowly, 3=just right, 4=somewhat quickly, 5=very quickly

Speech Preference – The test subject was asked, “Do you prefer to say the items you want immediately, or select them from a list of options?”  In phase 2, the value assignments were: 1=I strongly prefer to speak the items, 2=I slightly prefer to speak the items, 3=I like having both options, 4=I slightly prefer to select from a list, 5=I strongly prefer to select from a list

Second Prompt – The test subject was told, “Rate how helpful the second prompt was.”  This metric was not used in the second round of testing.
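As an illustration of how the phase 2 word labels relate to the 1 to 5 scale, the following is a minimal sketch, not the tooling used in the tests. The label lists are copied from the definitions above; the names `SCALES`, `label_to_rating`, and `mean_rating` are hypothetical, the sample answers are made up, and averaging the ratings is an assumption, since this page does not say how ratings were aggregated.

```python
from statistics import mean

# Phase 2 label sets, copied from the definitions above.
# Position 0 corresponds to a rating of 1, position 4 to a rating of 5.
HOW_WELL = ["very poorly", "somewhat poorly", "ok", "well", "very well"]
HOW_EASY = ["very difficult", "somewhat difficult", "ok", "easy", "very easy"]

# Hypothetical lookup table: which label set each preference metric uses.
SCALES = {
    "Clarity": HOW_WELL,
    "Effectiveness": HOW_WELL,
    "Deselection": HOW_WELL,
    "Ease of use": HOW_EASY,
    "Ease of Multiple Selection": HOW_EASY,
}


def label_to_rating(metric, label):
    """Convert a spoken label such as 'well' to its numeric rating (1 to 5)."""
    return SCALES[metric].index(label.lower()) + 1


def mean_rating(metric, labels):
    """Average the numeric ratings for one metric (aggregation is an assumption)."""
    return mean([label_to_rating(metric, label) for label in labels])


# Made-up answers from four test subjects for the Clarity metric.
answers = ["well", "very well", "ok", "well"]
clarity = mean_rating("Clarity", answers)
```

An unknown label raises `ValueError` from `list.index`, which seems preferable to silently recording a wrong rating.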

 

 

 

 

Performance Metrics

The following performance metrics were used for these tests.  Not all metrics were used with all interactors.

Time per iteration – The time (in milliseconds) it took to make a single choice in the interactor.

Error Rate – The rate at which the interactor returned incorrect results.  This is distinct from the bad response rate because it judges the end result of the interactor execution, not any intermediate data.  The error rate was measured as a percentage of the total number of results returned from the interactor.

Bad Response Rate – The rate at which the test subject uttered a word or phrase that the interactor did not understand. The bad response rate was measured as a percentage of the total number of times the interactor was listening for a response.

No Response Rate – The rate at which the test subject said nothing when the interactor expected a response.  The no response rate was measured as a percentage of the total number of times the interactor was listening for a response.

Second-level No Rate – The rate at which test subjects answered in the negative when asked to confirm their response.  The second-level no rate was measured as a percentage of the total number of results returned from the interactor.
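The rate definitions above differ mainly in their denominators: the error rate and second-level no rate are percentages of results returned, while the bad response rate and no response rate are percentages of the times the interactor was listening.  The sketch below shows that arithmetic under stated assumptions.  The `InteractorLog` class, its field names, and the sample counts are hypothetical and not taken from the test harness, and reporting time per iteration as a mean is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InteractorLog:
    """Hypothetical tallies for one interactor; field names are illustrative."""
    results_returned: int    # total results returned from the interactor
    incorrect_results: int   # returned results that were wrong
    confirmation_nos: int    # "no" answers when asked to confirm a response
    listens: int             # times the interactor was listening for a response
    bad_responses: int       # utterances the interactor did not understand
    no_responses: int        # listens during which the subject said nothing


def percentage(count, total):
    """Return count as a percentage of total, or 0.0 when total is zero."""
    return 100.0 * count / total if total else 0.0


def performance_metrics(log):
    """Compute the four rate metrics, each against its own denominator."""
    return {
        "error_rate": percentage(log.incorrect_results, log.results_returned),
        "second_level_no_rate": percentage(log.confirmation_nos, log.results_returned),
        "bad_response_rate": percentage(log.bad_responses, log.listens),
        "no_response_rate": percentage(log.no_responses, log.listens),
    }


# Made-up counts for a single interactor.
log = InteractorLog(results_returned=20, incorrect_results=3, confirmation_nos=2,
                    listens=50, bad_responses=5, no_responses=4)
metrics = performance_metrics(log)

# Time per iteration in milliseconds, summarized here as a mean (an assumption).
iteration_times_ms = [1200, 1500, 900]
mean_time_ms = mean(iteration_times_ms)
```

With these made-up counts the error rate is 15% of 20 results, while the bad response rate is 10% of 50 listens, which shows why the two rates are not directly comparable.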

 

 

 

 

Phase 1 Metrics

Performance Metrics Calculations

Performance Metrics

General Comments

Single List Selection

Multiple List Selection

Multiple List Selection with Deselecting

Double Yes / No

Double Yes / No (using DTMF)

Selection from N-best List

Different Prompts for Different Responses (n-ary prompt)

Whisper Prompt

Confirmation and Correction

Multiple Related Field Selection

Help Prompt

Timeout Adjuster

Preference Metrics

 

Phase 2 Metrics

Performance Metrics Calculations

Performance Metrics

General Comments

Single List Selection 1

Single List Selection 2

Multiple List Selection 2A

Multiple List Selection with Deselecting 3

Double Yes/No (using Speech) 4

Double Yes/No (using Speech) 5

Selection from N-best List 6

Context-Sensitive Help Prompt

Whisper Prompt 9.A

Whisper Prompt 9.B

Preference Metrics

 

A definition of the word “metric” can be found at Dictionary.com.

 

Portland State University
Capstone Summer 2004