EXPERIMENT 3

Experiment 3 was conducted as an extension of Experiment 2. Its purpose was to examine whether lengthened training time would lead to different learning development and outcomes when working with complex systems whose interface design adopted multiple metaphors rather than a single metaphor. The experiment followed the basic format of Experiment 2, with several modifications.

The first and most important modification was the extension of training time: the experiment ran three times longer than Experiment 2. Second, a second training phase (training phase 2) was added between the original training phase (training phase 1) and the testing phase. This phase was composed of six questions on the St. Louis stack, typical of the three types of tasks; two questions were selected for each task type. The phase provided additional training as well as an opportunity for the experimenter to "check" to what extent the subjects had mastered the interface(s) they were assigned to use. The third change was that subjects were given a summary of interface characteristics to read after each interface was used. They were also explicitly encouraged, before moving into the testing phase, to think about and try to use the strategies they might have acquired from using the St. Louis information system. The last modification was that only three of the six questions in each set were matched to the interface used. Subjects from both groups therefore had the opportunity to experience tasks that were and were not matched to the interface model they used.

Method

Design

The experiment used a 2 x 3 mixed between- and within-subjects design (Kirk, 1982). The between-subjects independent variable was the number of interfaces used for training, with two treatment levels: I1 = one interface and I2 = three interfaces. The within-subjects independent variable was training session, with three treatment levels: S1 = session 1, S2 = session 2, and S3 = session 3. The same dependent measures were used as in Experiments 1 and 2 (response time, errors, blocking).
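As a concrete illustration of this design (not part of the original study), the data can be laid out in long format with one row per subject per session. In the sketch below, all column names and values are invented for illustration; 'group' is the between-subjects factor, 'session' the within-subjects factor, and 'rt' a per-subject mean response time.

```python
import pandas as pd

# Minimal sketch of the 2 x 3 mixed design in long format.
# Column names and values are illustrative assumptions only;
# two of the six subjects are shown.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "group":   ["SI", "SI", "SI", "MI", "MI", "MI"],
    "session": ["S1", "S2", "S3", "S1", "S2", "S3"],
    "rt":      [95.0, 82.0, 78.0, 120.0, 88.0, 74.0],
})
print(df)
```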

Subjects

Subjects were six graduate students from the University of Illinois at Urbana-Champaign.

Procedure

Subjects were randomly assigned to one of two treatment groups: the single-interface (SI) group and the multiple-interface (MI) group. The three SI subjects were each assigned to use one of the three interfaces.

Training phase 1. The MI subjects first read on-line instructions explaining how the book interface they would be using operated and what they would be asked to do. They then used the book interface to look for information for six questions about St. Louis. After the questions were completed, the subjects were given a summary of the characteristics of the interface they had just used. They then moved on to the note card interface through the same procedure and answered another six questions. The third interface they encountered, again through the same procedure, was the map interface.

The SI subjects first read on-line instructions explaining how their assigned interface operated and what they would be asked to do. They then worked through the same 18 questions on the St. Louis tourist information system, in the same order as the MI group.

Training phase 2. After the initial 18 questions, subjects proceeded to training phase 2. The MI subjects were given six questions in random order on the St. Louis information system. Their task was to read each question and decide which interface would be the most appropriate, in terms of efficiency and effectiveness, for finding the answer to that question. They chose the interface first and then used it to find the information.

Training phase 2 for the SI group was no different from training phase 1 except that the six questions were given in random order. After completing the questions, subjects were given a summary of the characteristics of the interface they had just used.

The testing phase. Both groups of subjects were tested on the Apple Annual Report HyperCard stack using six questions. Before they started, they read a paragraph encouraging them to apply the experience and strategies gained from the St. Louis information system to the new Apple Annual Report stack.

Response time and search pattern were recorded for each question. Subjects returned for a second session a week later and a third session two weeks after that. Sessions 2 and 3 used the same procedure as session 1 but with different questions.

Results

Response time, number of errors, and blocking were again used as dependent measures. Because the major concern of this experiment was development across time, each phase of the experiment was treated as a unit of analysis, in the hope of better illustrating the trends of learning development.

Response Time

A two-way ANOVA with repeated measures was carried out on response time for each of the three phases. The two factors were (a) the number of interfaces used for training and (b) session, the factor to which the repeated measures applied. As a general outcome, response time decreased significantly for both groups as training proceeded from session 1 to session 3 in all three phases (training phase 1: F(2,8) = 22.97, p < .0005; training phase 2: F(2,8) = 20.07, p < .0008; testing phase: F(2,8) = 7.64, p < .014). Figures 5, 6, and 7 compare the response times of the two groups across all sessions in the three phases, respectively.
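As a point of reference, the sketch below shows how such a two-way mixed ANOVA could be run with present-day tools, using the long-format layout sketched under Design. The synthetic data and the pingouin library are stand-ins chosen for illustration; they are not the software or the measurements used in the study.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Synthetic stand-in data with the study's shape: 6 subjects,
# 2 groups (SI/MI), 3 sessions. Values are random draws, not the
# experiment's measurements.
rng = np.random.default_rng(0)
rows = [(s, "SI" if s < 3 else "MI", f"S{t + 1}",
         100.0 - 10.0 * t + rng.normal(0.0, 5.0))
        for s in range(6) for t in range(3)]
df = pd.DataFrame(rows, columns=["subject", "group", "session", "rt"])

# Two-way mixed ANOVA: 'group' varies between subjects,
# 'session' is the repeated (within-subjects) factor.
aov = pg.mixed_anova(data=df, dv="rt", within="session",
                     subject="subject", between="group")
print(aov[["Source", "F", "p-unc"]])
```

With real data, the same call would be applied to each phase separately, and swapping the dependent variable covers the blocking and error analyses reported below.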

Training phase 1. For training phase 1, neither the main effect of number of interfaces used nor the interaction was significant. In general, the two groups did not perform differently in training phase 1 across sessions, although Figure 5 does show differences in the two groups' development trends. The MI subjects, starting with longer response times, seemed to maintain a steady decrease in response time. The SI group, though its initial response times were shorter, seemed to level off after session 2. The effect, however, was not large enough to reach significance.

Training phase 2. For training phase 2, the main effect of number of interfaces used approached significance (F(1,4) = 6.98, p < .058); the SI subjects were generally faster in training phase 2 than the MI subjects. The interaction of the two factors was significant (F(2,8) = 15.31, p < .002).


Figure 5: Average response time of training phase 1 for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.


Figure 6: Average response time of training phase 2 for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.


Figure 7: Average response time of the testing phase for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.

Training phase 2 was composed of typical, straightforward questions of each task type. Because the tasks in this phase were the easiest, the learning process should be relatively compressed, giving us an opportunity to observe the two groups' trends of development. As Figure 6 shows, the MI subjects started out significantly slower than the SI group (p < .05) but improved much more from session 1 to session 2 and stayed at about the same level of performance in session 3. In contrast, the SI subjects, having had a good grasp of their interface by session 1, seemed to stay at the same performance level throughout all three sessions. There appeared to be a distinction between the learning curves of the MI group and the SI group. The question, then, was whether the cross-over point had been reached with the training provided in this experiment. The nonsignificant main effect of number of interfaces used in the testing phase implied that the cross-over point had not been passed. If so, could the findings provide any clues as to whether learning could be expected to continue along these trends and eventually pass the point?

The number of correct interface choices made by the MI subjects in training phase 2 provided some evidence on this issue. The total number of correct choices for the three sessions was 11, 10, and 13 out of 18, respectively. Apparently, the MI subjects had not reached complete mastery of the three interfaces, though they appeared to perform at a level similar to the SI subjects in session 3 on the easier tasks of this phase. There was still potential for the MI group's continuing development along the trend. With the SI group not progressing as much, the MI group might be expected to surpass the SI group given even more training time.
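The original analysis reported these counts descriptively. As an illustrative check not performed in the study, one could ask how far each session's count sits above chance, which is 1/3 here since there were three interfaces to choose from. The sketch below uses scipy for this; note that it treats the 18 decisions as independent, which they are not, since they come from only three subjects.

```python
from scipy.stats import binomtest

# MI group's correct interface choices per session (from the text),
# out of 18 decisions (3 subjects x 6 questions each). Chance level
# is 1/3 with three interfaces. Caveat: decisions within a subject
# are not independent, so this is only a rough illustration.
for session, correct in zip(("S1", "S2", "S3"), (11, 10, 13)):
    result = binomtest(correct, n=18, p=1 / 3, alternative="greater")
    print(f"{session}: {correct}/18 correct, p = {result.pvalue:.3f}")
```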

The testing phase. For the testing phase, neither the main effect of number of interfaces used nor the interaction was significant, so the null hypotheses for the testing phase could not be rejected. In general, the two groups did not perform differently in the testing phase across sessions. Although Figure 7 seems to show differences in the two groups' development trends similar to those found in training phase 1, the effect was not large enough to reach significance.

Blocking

Another indicator that might reveal different development is blocking. A two-way ANOVA with repeated measures was performed on blocking for each of the three phases; none of the effects tested was significant. Figures 8, 9, and 10 compare the blocking of the two groups in the three phases. Note the development trends, particularly in training phase 2 and the testing phase, even though no significant differences were found. One way to tell whether the different training was producing different results was to see whether the SI subjects, as a result of extended use of a single interface, had become fixed on a limited set of strategies and therefore showed more signs of being blocked.


Figure 8: Total blocking in training phase 1 for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.


Figure 9: Total blocking in training phase 2 for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.


Figure 10: Total blocking in the testing phase for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.

Looking at blocking in training phase 2 and the testing phase, especially the latter, the MI subjects seemed to move toward less blocking as time went on, while the SI subjects showed increasing signs of being blocked. The small number of subjects in this experiment may have been a major reason significant differences could not be demonstrated.

Errors

Not many errors occurred during the experiment. A two-way ANOVA with repeated measures was performed on total errors. Errors decreased significantly from session 1 to session 3 (F(2,8) = 6.12, p < .024); the other effects were not significant. Figure 11 compares the total errors of the two groups. The number of errors may have been too small to realistically reflect any variation in learning development, even though the MI subjects seemed to have a more difficult start in session 1 than the SI subjects.


Figure 11: Total errors for multiple-interface (MI) and single-interface (SI) groups across three sessions in Experiment 3.

Discussion

Though not all the differences reached statistical significance, owing to the small number of subjects, the results show a difference in the learning trends of the two groups. We believe the results have significant implications for the research questions concerning training with multiple versus single interfaces and training time. It was especially suggestive that on two occasions (response time in training phase 2 and blocking in the testing phase) the SI subjects displayed signs of fixation. These indications of greater blocking in the SI group, possibly due to less flexible mental models, take us one step further than Experiment 2. Given sufficient training, users trained with multiple interface models for complicated information systems may form more flexible and comprehensive mental models of those systems than users trained with just one interface model.

Though this study samples only a small set of information retrieval metaphors, it demonstrates the limitation of using a single metaphor in interface design for complex systems. As discussed earlier, the effectiveness of a single-interface system depends on the interactions among the interface design model, the nature of the tasks to be accomplished, and the users. With this in mind, in situations where only a single interface design model is allowed for a system, detailed analyses of the nature of the tasks and the purpose of the system should be performed to ensure that the interface model covers as many task characteristics as possible.

Given the limitations of a single metaphor, adopting multiple metaphors in interface design for complex systems seems a more appropriate strategy. To a certain degree, this study demonstrated the possible effects of the interaction between the number of models used for an interface and experience with systems on the formation of mental models for complex systems. The results support the use of multiple interface design models for complex systems. This strategy is somewhat in conflict with a widely adopted interface design principle: consistency. On the one hand, the consistency principle urges conformity to a universal interface metaphor (such as the standard Apple Macintosh desktop interface). On the other hand, the strategy of adopting multiple-metaphor interfaces requires breaking the consistency rule to switch metaphor models within a system.

Interface designers and researchers advocate the principle of consistency because it is an important means of achieving ease of use and ease of learning for computer systems (Apple Computer, 1986; Heckel, 1984; Rubinstein & Hersh, 1984; Shneiderman, 1987b; Sutcliffe, 1988). Indeed, consistency in all aspects of a system helps users build a coherent system image quickly. If ease of learning and use for beginning users were the only goals of interface design, then adopting a single metaphor for a system interface for the sake of consistency would doubtless be sufficient. The advantages of single-metaphor models over multiple-metaphor models at the initial learning stage were apparent in this study as well: single-metaphor interfaces provided a less complicated system environment, so the single-interface subjects were able to "get into the ball park" more quickly at the initial learning stage.

Nevertheless, the study also showed the limitations of a single interface model for complex systems, which can be seen from two aspects. First, single-model interfaces are limited in their applicability to different situations. As mentioned earlier, it is not uncommon for systems to serve multiple purposes, and the results of this study show that a single-model interface is appropriate only for the types of tasks that match its metaphor model. Whichever metaphor is chosen, there will be aspects of the system for which the interface model does not function effectively. Second, despite their initial benefits, there is a risk that the simplicity of single-model interfaces may produce oversimplified mental models, which may in turn limit the models' applicability. As discussed earlier, subjects seemed to become less flexible as their experience with the single interface increased; the simpler mental models seemed to impede the absorption of new, heterogeneous experience. One major trade-off of consistency, in this case, could be that it acts as a barrier to the acquisition of expertise.

For complex systems, time with the system alone does not seem sufficient to turn new users into expert users; the richness of the system interface environment may also be crucial. If the design goal of a system interface is not only initial ease of use but also the generation of the comprehensive mental models required of expert users, then adopting multiple metaphors for the system interface is a strategy to be considered seriously. The overhead of overcoming users' initial difficulty could be worthwhile for achieving better-functioning and more flexible mental models.

