Visualization of Web Based Information and Sense Making

James A. Levin
Daniel A. Kauwell

University of Illinois

Paper presented at Symposium 15.49
"Technology-based Tools for Learning: Analysis and Synthesis in the 21st Century"
Annual Meeting of the American Educational Research Association
Montreal, April 1999

Abstract

More and more schools, teachers and students are being connected to the Internet so that they may access the ever-growing wealth of information on the World Wide Web. Students, often with little or no training, are expected to conduct Internet searches on research topics with which they may have little prior experience. Often these searches produce tens or hundreds of thousands of "hits" delivered in list form, from which the students are expected to begin their research. They must make sense of the information gathered from hundreds of web pages.

To help people make sense of this overload of information, we have developed a heuristic approach to the visualization of information returned by a web search. With our software tool, a user can "see" the information space returned by a web search, view more easily how the sites and the documents within the sites are related to each other, and manipulate that information into a form more useful to the user. Initially sites/documents are arranged by a series of weights such that the site most likely to contain the most relevant information is displayed in the center of the information space and other sites determined to be less relevant are displayed further and further from the center. When viewing all of information space, a "mouse over" displays additional information about the page to help the user determine if this is a page worth visiting before actually going to the web page. Macro and micro views are available to the user with a macro navigational representation of information space available at all times so the user can keep track of where they are and where they have been. Users can rearrange the information space as they like once they begin to visit various sites. Users can also save, annotate, and restructure their information space.

We will report on preliminary studies of the effectiveness of this graphical representation of information returned by a web search in comparison to traditional list-based information. We will also discuss future plans and directions for developing and evaluating this approach for helping learners make sense of the wealth of information available on the web.

Introduction

There is an ever-growing amount of information on the Internet. It resides in newsgroups, email archives (and listserv repositories), and in various forms on the World Wide Web. Existing interfaces for locating, displaying, and navigating through this information are quite similar. Newsgroups are displayed in text-based "threads" which present discussions in an indented list format; email archives are usually presented as directories, file systems, or lists; and Web resources are found by conducting searches which return lengthy lists of potential matches to search queries. All of these rely on linear (usually vertical) and predominantly text-based or minimally iconic displays of information. Navigation through the Internet remains problematic, with no "meta" views of where users are or where they have been, other than a reliance on color changes in links, browser-based history lists (again linear text lists), or pull-down menu lists of the last URLs visited.

To keep track of the results of a web search, users are currently limited to creating bookmark lists or constructing HTML files (typically lists of links). These linear representations of Internet resources can quickly grow to unwieldy proportions, making the location of previously found information difficult. Furthermore, minimal tools exist to help users to annotate web resources once found, to edit, or to organize these resources for future use.

Finally, the current systems available to users for the display and archiving of information and the navigation through the Internet are lacking in machine-assisted learning (artificial intelligence) and lack collaborative capabilities (the ability for various users to "pool" resources to procure better results). Another problem is that there are hundreds of search engines and many are specialized or may be better for a particular search than others.

While the tools for information gathering and Web navigation remain far from ideal, more and more students are being required to use the Internet, and the Web in particular, as a means of gathering information for school work. Many students are new to the Web, inexperienced in collecting data from it and navigating in it. When novices use search terms that are too broadly defined, results can be overwhelming. Often a student can be presented with thousands of "hits" spread over a large number of pages. This presentation of search results can frustrate students, which frequently causes them to over-rely on the search engine’s confidence ratings or to get "lost" in cyberspace. Once frustrated and lost, a student can easily pursue seductive details or irrelevant information strewn in their path, engaging in superficial web surfing rather than web sense making.

Theoretical motivation

Linear displays of search results and list-based archives of information do not map well onto the user’s representations or mental models of the Web. Furthermore, they provide little in the way of cognitive assistance and few perceptual clues to help in the process of finding information, nor do they assist in navigation.

Users have many different representations of the Web (Levin, Stuve, & Jacobson, in press, 1999) and most views of the Web usually have a spatial quality (highway, Web, etc.) that is not represented by the list returned from search engines and indexes. Our approach has been to develop a variety of innovative ways to represent the results of web searches, to assist novices by presenting the results in a way that is easier for them to understand.

In addition, we want to provide a variety of different ways of presenting these results, to help searchers acquire a diverse set of representations of information. It is also important to provide users with a more powerful set of tools for reorganizing the search results, so that they can engage in a more active process of making sense and constructing knowledge from the large set of information made available by web searches.

Such interfaces may be more effective than current displays because they more closely model users’ representations. They can serve as cognitive artifacts (Norman, 1990), "artificial devices that maintain, display, or operate upon information in order to serve a representational function and that affect human cognitive performance."

The VisIT software

To address this problem of searching and learning on the Internet, we are developing and evaluating a software tool which we call VisIT that applies visualization techniques to Internet search results. This development is part of a larger Computer Companion project supported by the Yamaha Motor Corporation conducted at the Beckman Institute for Advanced Science and Technology <http://www.beckman.uiuc.edu> at the University of Illinois, Urbana-Champaign.

VisIT displays the search results in a graphical, user modifiable manner that can be saved for future use and modification. Its graphical display allows for the presentation of search results in a way that conveys information that is not represented in conventional displays of search results, and allows for macro and micro views of the information space to aid in navigation. Numerous perceptual and graphical elements accomplish this display of information, in addition to textual information. VisIT, however, is more than just an initial display that exhibits some of the properties of a cognitive artifact. It allows the user to manipulate the elements of the display and make propositional relations between the elements as well as annotations. This process allows for the user to reorganize the display to better represent the user’s cognitive knowledge of the search space. These user-modified search spaces can serve as an external model of the user’s internal representation.

Here’s a typical sequence of usage of the VisIT software. When you start it up, you see a relatively simple display (see figure 1), prompting you to type in text that is to be passed to your specified search engines.

Figure 1: the initial VisIT screen

Once you type in some text, that text is passed through the Internet to the specified search engines, and the returned results are parsed, compared (with duplicates eliminated), and then displayed (see figure 2).

Figure 2: initial display of search results
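The merging step described above (results from several search engines parsed, compared, and stripped of duplicates) might be sketched as follows. This is an illustrative Python sketch only, not the VisIT Java code: the function names, the URL normalization rule, and the shape of the input data are all assumptions made for the example.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (an assumed rule, not VisIT's)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


def merge_results(results_by_engine):
    """Merge per-engine hit lists, dropping duplicates and grouping by site.

    results_by_engine maps a (hypothetical) engine name to a list of
    (url, description) pairs. Returns {site_host: [hit, ...]}, where each
    hit records every engine that returned it.
    """
    seen = {}               # normalized URL -> merged hit
    sites = OrderedDict()   # site host -> hits, in first-seen order
    for engine, hits in results_by_engine.items():
        for url, description in hits:
            key = normalize(url)
            if key in seen:
                # Duplicate: note the extra engine, keep the first copy.
                seen[key]["engines"].append(engine)
                continue
            hit = {"url": url, "description": description, "engines": [engine]}
            seen[key] = hit
            host = urlsplit(url).netloc.lower()
            sites.setdefault(host, []).append(hit)
    return sites
```

Grouping by host is one simple way to obtain the per-site clusters that the display draws. A real system would probably need a more careful notion of "same page" than this normalization provides.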

Each rectangle in this display represents a web page. Those displayed in color are the actual "hits" returned by the search engines. The others are either top level web site pages, used to group together web pages from the same site, or they are web pages that are linked from the "hit" web pages. If there are links between any of these web pages, they are displayed as arrows between rectangles. The VisIT software computes a weighting for each cluster of web pages, and places those with the largest weights in the center of the display. The URL of each site cluster is displayed above that cluster.
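As a rough illustration of weight-based placement, the Python sketch below ranks clusters by weight, puts the heaviest at the center, and places the rest on successively larger rings. The weighting formula (hit count plus a bonus for ".edu" hosts), the ring geometry, and all names are hypothetical. The paper states only that weights are computed and that the largest are placed in the center.

```python
import math


def cluster_weight(host, hit_count, edu_bonus=2.0):
    """Hypothetical weight: more hits and an .edu host both raise it."""
    bonus = edu_bonus if host.endswith(".edu") else 0.0
    return hit_count + bonus


def radial_layout(weights, center=(0.0, 0.0), ring_spacing=100.0, per_ring=6):
    """Place the heaviest cluster at the center and the rest on rings.

    weights maps a cluster name to its weight. Returns {name: (x, y)}.
    Clusters are ranked by descending weight; rank 0 takes the center,
    and ring k (k >= 1) holds per_ring * k clusters at radius
    k * ring_spacing.
    """
    ranked = sorted(weights, key=lambda name: (-weights[name], name))
    positions = {}
    ring, slot, capacity = 0, 0, 1
    for name in ranked:
        if slot == capacity:           # current ring is full, move outward
            ring, slot = ring + 1, 0
            capacity = per_ring * ring
        radius = ring * ring_spacing
        angle = 2 * math.pi * slot / capacity
        positions[name] = (center[0] + radius * math.cos(angle),
                           center[1] + radius * math.sin(angle))
        slot += 1
    return positions
```

Because placement depends only on rank, changing the weighting function re-centers the display without touching the layout code.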

As the user moves the cursor over a rectangle, a "pop-up" box shows the URL for that web page. If the user leaves the cursor on a rectangle, the pop-up box is expanded to show the explanatory text returned by the search engine, as shown in figure 3.

Figure 3: pop-up display of description returned by search engines

The user can add annotations to any web page, and these later can be viewed in the pop-up box.

When the user clicks on any rectangle, two things happen. First, the display of that cluster is shown in a larger format, with the rest of the search space shown in a smaller format, as the "context" for the current focus of interest, as shown in figure 4.

Figure 4: expanded display of selected web pages

At the same time, the user’s web browser is started up and given the URL of the specified web page. The user can then continue to browse or can return to VisIT to view the search space and select other web pages to view.

The user can save this search space, and then re-open it later for continued searching. The user can also re-organize this search space in a wide variety of ways. The user can delete web pages that he/she doesn’t want. The user can move similar web pages to any part of the screen, to "cluster" the web search results in a meaningful way.

Figure 5: rearranged display of web pages

The user can then create "groups" that contain any number of search result pages. Once grouped, these pages are treated as a conceptual unit, and can be moved and deleted together.

Figure 6: creating a group of web pages
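One way to model groups that move and delete as a unit is sketched below in Python. The class and method names are hypothetical and do not reflect the actual VisIT data structures. The sketch shows only the idea that a group operation applies the same change to every member page.

```python
class SearchSpace:
    """Pages with screen positions plus user-defined groups of pages."""

    def __init__(self, positions):
        self.positions = dict(positions)   # page URL -> (x, y)
        self.groups = {}                   # group name -> set of page URLs

    def create_group(self, name, pages):
        """Group the given pages, ignoring any not in the search space."""
        self.groups[name] = {p for p in pages if p in self.positions}

    def move_group(self, name, dx, dy):
        """Shift every member by the same offset so the group stays intact."""
        for page in self.groups[name]:
            x, y = self.positions[page]
            self.positions[page] = (x + dx, y + dy)

    def delete_group(self, name):
        """Remove the group and all of its pages from the search space."""
        for page in self.groups.pop(name):
            del self.positions[page]
            for members in self.groups.values():
                members.discard(page)
```

Applying one offset to all members in a single pass keeps the pages and their group visually in step.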

We allow users to control the display window and many of the mechanisms that drive it. Users can pick colors, turn arrows on and off, and adjust other features. We also allow for multiple search engines: we filter out duplicate URLs, arrange URLs by site, and display hits from various search engines in one site representation. Our goal is to allow users to customize the presentation of search results in ways that range from something close to the current conventional display of search results to the fully automated graphical presentation of the search space that is currently the default for the VisIT software.

Research on Web Searches

Formative Studies

We have been conducting extended formative studies of the use of the VisIT software, to guide our continued development and as a pilot of procedures and data collection for a larger comparative study to be described later. Several expert web users participated in these studies. We report here our observations of two expert users of computers, who have conducted numerous web searches with conventional interfaces.

For each session, the subject was asked to select a web search that would be of help to a project the subject was currently pursuing. The subject was asked to speak aloud, and what they said was tape-recorded. The sequence of actions taken by the subjects was also recorded by the computer. In addition, the authors were present during these uses of the software, and took field notes on the processes the formative study subjects went through.

Session 1

The first session for each of the formative study subjects was conducted in December 1998. Each subject spent an hour interacting with the VisIT software. Subject #1 did a search on "technology and education". He began exploring the web pages returned in the center, and deleted those that did not interest him. He requested the ability to move related pages to a corner of the screen, a feature that was not implemented at that time, but that has been implemented since then. When he had a small number left, he requested that we email him the URLs for the pages remaining.

Subject #2 did a search on "online courses". She started scanning the web pages returned starting in the top left corner of the display and continuing to the right, then continuing on the left end of the second row of pages (English reading order). She requested the ability to annotate some of the nodes, again a feature that had not been implemented at that time, but that has since been added. She used the top-level domain of the web page URL as a basis for deciding which "hits" to look at in more detail (if it had a ".edu" domain, she was more likely to look more closely). This is one of the criteria that the VisIT software uses to decide which web pages to display in the center of the screen.

When she was done, she also requested that we send her the URLs that she found. She found some that were valuable, even though she had been doing extensive searches using conventional search interfaces during the weeks before this session.

Each of the two subjects used a different strategy for processing the pages returned by Web search engines, but each was quite excited at the end of the hour about the usefulness of the VisIT software for conducting Web searches.

Session 2

The same two formative study subjects each spent another hour with the VisIT software in April 1999 in a second session. Subject #1 conducted a search for "corporal punishment", which was a topic he was interested in based on a recent discussion. Not surprisingly, the search returned a number of "x-rated" sites, but Subject #1 was pleased to see that many of these were clustered by the VisIT software in the same part of the screen, so that he could easily select and delete them all at once. He also moved web pages together into different places on the screen as he found that they were similar. He used the "clustering" mechanism, both to put together web pages that were conceptually similar and as a means of deleting a group of web pages all at once. He also ran into a problem with moving a cluster he created, since the web pages in the cluster did not move at the same rate as the cluster. This was due to a bug in the program. One overall negative comment concerned how slowly the system operated.

Subject #2 tried to conduct a search on gardening, but the system was very slow and had to be restarted several times, so she did not get much useful searching done. She did recommend that instructions for using the system be provided. We will reschedule her as soon as a more stable version of the system is available.

A comparative study

We have been conducting a comparative study of 18 undergraduate students, examining in detail their use of conventional search engines as compared to their use of the VisIT software. These students are in a section of the Educational Psychology 211 class at the University of Illinois, Urbana-Champaign <http://www.ed.uiuc.edu/courses/edpsy211/>, an introduction to educational psychology. About 60% of these students plan to become secondary education majors. Each student has chosen a topic of focus, and is doing web searches to aid in the construction of a web site on that topic by the end of the course. So far, they have completed their use of conventional web search engines, using standard web browsers. We used commercial "filtering" software (CyberSnoop) to record the URLs of the web pages they visited. We also have copies of the web pages they created at the end of each session.

Before they started, each subject filled out a survey (shown in Appendix A). Six of the 18 used the term "library" in their Internet metaphor, while the other 12 mentioned other unique metaphors.

After they finished their session, they filled out a survey shown in Appendix B. Four of the 18 chose as their topic "block scheduling", three chose "gifted education", three chose "inclusion in the classroom", and the other eight students chose other, unique topics.

We will soon run these same subjects through the use of the VisIT software, and will then be able to compare their use and their survey results to the use and survey results of conventional search engines.

Future directions

There are several directions we are pursuing for research, development, and evaluation in this general area of sense making and knowledge building based on Internet information.

We plan to continue to add additional features for visualizing and manipulating the search space information. We want to add the ability to annotate user-defined groups, to "collapse" the groups to hide the web pages inside to simplify the display in a way that the user can easily expand, to pan across the search space, to zoom in and out. We want to provide a richer display of the representation of each web page, including graphic icons.

Another direction we plan to take is to add more user control over the nature of the interface. Even our preliminary studies have shown considerable individual differences in the uses of the different features of the current VisIT software. Other studies of the nature of expertise have documented that experts have a variety of coordinated representations of their domain of expertise. We want to allow users not only to customize these knowledge building tools, but also to develop a "toolkit" of sense making tools, which they can select for specific purposes. In addition, this feature will make it easier for us to conduct our studies of which of the new aspects of visualization and manipulation are important and which just add unneeded complexity.

Thirdly, we are working with Jerry DeJong and Mark Brodie to add machine learning capabilities to this interface. As a user conducts searches and then constructs knowledge spaces out of the search spaces that are returned, his/her actions can be monitored and used in learning algorithms to determine how the user judges similarity and therefore how the user prefers to have searches displayed. In an initial effort along these lines, Mark Brodie has implemented an algorithm that allows the VisIT software to have a "smart move". Once the user has specified some groups of nodes, the algorithm computes the similarity among all the web pages displayed in the search space. Then, when the user moves one page around on the screen, other pages will move with it, based on how similar they are to it. We plan to continue to implement machine learning algorithms like this, and to evaluate how users react to such automated mechanisms in their knowledge building process.
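The "smart move" behavior can be illustrated with a small Python sketch. It is not Mark Brodie's algorithm, which learns similarity from the groups the user has specified. Here a plain Jaccard similarity over each page's words is swapped in as a stand-in, and every name and parameter is hypothetical. The sketch shows only the final step: when one page is dragged, other pages follow in proportion to their similarity to it.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity of two word collections, in [0, 1]."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def smart_move(positions, words, moved, delta, threshold=0.0):
    """Move `moved` by `delta` and drag other pages in proportion to similarity.

    positions: {page: (x, y)}; words: {page: iterable of words};
    delta: (dx, dy). Pages whose similarity to the moved page is not
    above `threshold` stay where they are. Returns a new positions dict.
    """
    dx, dy = delta
    updated = {}
    for page, (x, y) in positions.items():
        if page == moved:
            scale = 1.0                    # the dragged page moves fully
        else:
            scale = jaccard(words[moved], words[page])
            if scale <= threshold:
                scale = 0.0                # dissimilar pages do not move
        updated[page] = (x + scale * dx, y + scale * dy)
    return updated
```

A learned similarity measure could replace `jaccard` without changing the movement logic, which is the part the user actually sees.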

Finally, we are beginning a line of research that explores collaborative uses of such visualization and knowledge building tools in a community of learners. For example, even though the way that an expert organizes a specific search space may be of limited utility, given the dynamic nature of the Web and the wide diversity of searches that people conduct, we will be investigating whether "meta-knowledge" from experts might be useful for more novice learners to aid in their sense making given search spaces of information. For example, the specific web pages found may be of limited value, but the groups that the experts use may be of value, even for a search that retrieves totally different web pages. We will look into the utility of "meta-knowledge servers", which allow experts to electronically "publish" aspects of their own practice (for example, the set of categories they typically use for a given knowledge building purpose) in a way that novices can draw upon that meta-knowledge when needed.

Summary

We have described a series of explorations of tools for helping learners make sense of the overwhelming complexity presented by web search tools, so that they can build knowledge from the mass of information. We have described the VisIT software as developed so far, have reported on our preliminary studies of the uses of this software by users, and have sketched out some directions that our research is taking. In general, an application of the techniques of visualization developed recently for experts to make sense of massive datasets to the sense making and knowledge building of learners has a lot of potential — our efforts have only scratched the surface of what is possible.

References

Levin, J. A., Stuve, M. J., & Jacobson, M. J. (1999 in press). Teachers’ conceptions of the Internet and the World Wide Web: A representational toolkit as a model of expertise. Journal of Educational Computing Research.

Norman, D. A. (1990). The design of everyday things. New York: Doubleday.

Acknowledgments

We would like to thank Sam Hicks, Qiang Sun, Mitch Fishman, Gerth Zhenzhou, and Michael Hamman for their excellent Java programming of the VisIT software. This research was supported in part by a grant from Yamaha Motor Corporation, and is a part of a larger research project to develop a "Computer Companion" at the Beckman Institute for Advanced Science and Technology at the University of Illinois, Urbana-Champaign.

Appendix A
Pre-survey administered before conventional search session

Name: Date:

1. How much experience do you have with using the Web (Web Surfing)? On a scale of one to five with "1" being no experience and "5" being very experienced, rate yourself as a Web Surfer:

No Experience 1 2 3 4 5 Very Experienced

2. How long have you been using the Web?

First time. Less than 1 year. 1 to 2 years. 2 to 3 years. More than 3 years

3. How many hours a week do you normally use the Web?

Less than 1 1 to 5 5 to 10 10 to 15 More than 15

4. Which of the following Internet-based activities have you done (before today)?

Circle all that apply:

Recreational web surfing. Research/homework Email Chats Other (explain)

5. What is your primary use of the Internet?

N/A Recreational web surfing. Research/homework Email Chats Other (explain)

6. Please tell us in your words if you think the Internet is, or is not, an effective instructional medium (use back of page if necessary).

7. How do you define or view the Web? (i.e., what metaphor do you think best describes it?)

8. Where do you most often access the Internet/Web?

Home School (if school, in an open lab or office?) Other? (explain)

9. At what speed do you usually connect to the Internet/Web?

Campus Ethernet 56k 33k 28k 14k other unsure.

10. How much experience do you have with using a word processor? On a scale of one to five with "1" being no experience and "5" being very experienced, rate your experience with a word processor:

No Experience 1 2 3 4 5 Very Experienced

11. How much experience do you have creating Web pages? On a scale of one to five with "1" being no experience and "5" being very experienced, rate your experience creating Web pages:

No Experience 1 2 3 4 5 Very Experienced

12. If you have created web pages, how do you usually create them (i.e., write HTML, Composer, PageMill, etc.)

Appendix B
Post-survey after conventional search session

Name: Date

1. What was the question you were researching this week (i.e. what were you looking for)?

2. Did you find the information you were looking for? Yes / no

3. If you answered yes to number 2, how easily did you find the information you were looking for? On a scale of one to five with "1" being very easily and "5" with great difficulty:

Very Easily 1 2 3 4 5 With Great Difficulty

Did you find anything difficult about this assignment? If so what?

Comments/questions:

Last updated: 12 April 1999