Information Ecology of Collaborations in Educational Settings: Influence of Tool

Mark Guzdial

EduTech Institute and GVU Center

College of Computing

Georgia Institute of Technology

801 Atlantic Dr.

Atlanta, GA 30332-0280

guzdial@cc.gatech.edu

http://www.cc.gatech.edu/gvu/people/Faculty/Mark.Guzdial.html

I. An Information Ecology Perspective of CSCL

Much work in computer-supported collaborative learning (CSCL) focuses on learning at the individual and group level (Koschmann, 1996). Collaborations in this line of research are analyzed for how learning emerges from the group's interactions. The content of those interactions (textual, or in some cases physical and graphical) is the critical object of analysis, used to identify misconceptions and developing conceptualizations (e.g., Roschelle, 1992).

A somewhat different perspective on studying computer-supported collaborative learning is analysis at a higher level of aggregation: multiple-group or whole-class discussion forums, such as an entire CSILE knowledgebase (Scardamalia, Bereiter, McLean, Swallow, & Woodruff, 1989) or the newsgroup of an entire class. (I refer to the entire discussion space for a class generically as a forum, which the developers of a tool may describe as a knowledgebase, newsgroup, discussion, or shared space.) The questions at this level concern the behavior of all the participants in the forum. When do students read notes? When do they write notes? What is the level of participation in the class? Do the kind of computer technology and the way it is used affect students' reading and writing behaviors?

At such a high level of aggregation it is difficult to make statements about the content of notes. We cannot determine much about what students are learning, or even whether they are learning. However, there are benefits to analyzing aggregate behaviors in CSCL forums:

Stuart Card and his colleagues at Xerox PARC refer to aggregate behavior within an information space such as the World Wide Web as an "information ecology" (Card, Robertson, & York, 1996). Participants in an information ecology (called "informavores" in Card's paper) are producers, gatherers, and consumers of information. By studying the rules of behavior and the relationships between variables in an information ecology, we can learn how to improve it (i.e., obtain more information at lower cost). Researchers in information ecology are developing models of the WWW, for example, that describe when pages are created or deleted and when they are accessed (Pitkow & Pirolli, 1997).

The information ecology perspective can be applied to CSCL forums as well. Collaboration forums are a kind of information space. Although the focus in a learning forum is not only information gathering and consumption, information is being produced (written) and consumed (read). An understanding of student reading and writing behaviors from an information ecology perspective may help us better understand and design CSCL environments. Papers that present new CSCL tools often do report information ecology statistics when describing use of the tools (e.g., MFK/Speakeasy (Hsi & Hoadley, 1997), CoNote (Davis & Huttenlocher, 1995), CoVis (O'Neill, Edelson, Gomez, & D'Amico, 1995)). In this paper, I describe the information ecology of forums used in educational settings with two different collaboration tools, and then contrast these findings with those for other systems.

The results in this paper are based on analysis of 35 collaborative forums, comprising over 7000 notes written by 1300 students, teachers, and teaching assistants. These forums are split almost equally between two different kinds of CSCL tools: CaMILE (Guzdial et al., 1996; Guzdial, Turns, Rappin, & Carlson, 1995b) and newsgroups (Taylor, 1996). I also present an analysis of reading behavior in one forum, and then contrast the findings with those for other information ecologies. The results provide a picture of common behavior at a high level of aggregation and of how the difference in tools affects the information ecology of CSCL forums.

II. Data and Methods

The CSCL forums used in these analyses come from two sources: class newsgroups used in the College of Computing at Georgia Tech, and CaMILE class discussions from a variety of academic units at Georgia Tech. Seventeen CaMILE discussions and eighteen class newsgroups were analyzed. I first describe the two kinds of collaboration tools (summarized in Table 1), and then describe how the data sets were selected and analyzed.

Newsgroups: Newsgroups are a long-established form of asynchronous collaboration support on the Internet. They have been used for years to discuss everything from TV shows to the latest computer operating system. Computers on the Internet that subscribe to a given newsgroup agree to distribute notes posted by users to that newsgroup. The newsgroup is thus distributed across multiple machines, which improves access (if a single machine goes down, students might be able to reach the newsgroup from another news server) but makes usage difficult to track. Notes are threaded: the newsgroup protocol tracks which notes were composed in response to other notes.

Users read newsgroup messages with one of many available newsgroup readers. The interface, and even the modality of messages, depend on the reader an individual uses. Newer newsgroup readers, for example, permit composing and reading notes that combine multiple media, but participants with a text-only reader do not see the additional media. By default, most newsgroup readers show a note only once: unless the participant makes an explicit effort (and what that effort is depends on the reader), a viewed note will not be shown again. This lack of persistence may hinder sustained discussion: if a note is not commented upon immediately, it may be difficult to retrieve for later comment or review.

At Georgia Tech, all College of Computing classes have an associated newsgroup for use by students to discuss the class, ask and answer questions, and perhaps interact with the class teacher or teaching assistants. Other academic units at Georgia Tech also use newsgroups for class discussions, but not all units and not all classes. Use of newsgroups varies dramatically between classes, but is rarely a requirement of the class.

CaMILE: CaMILE (Collaborative and Multimedia Interactive Learning Environment) is an asynchronous collaboration support tool designed by me and my colleagues in the EduTech Institute at Georgia Tech. CaMILE is a Web-based application: all access is through a Web browser connecting to a single server. The interface is forms-based and is the same for all users. CaMILE discussions are threaded, as in a newsgroup. Unlike newsgroup notes, CaMILE threads are persistent: they are always available to users and do not disappear after viewing. CaMILE was designed specifically as a CSCL tool. It provides a form of procedural facilitation like that of CSILE (Scardamalia, Bereiter, & Steinbach, 1984): students are asked to identify the type of collaborative note they are posting (e.g., a question, a new idea, or a rebuttal) and are offered suggested starter phrases for a note of that type. CaMILE notes can contain anything that a Web page can contain. In one forum, approximately 30% of all notes contained some kind of HTML tag (e.g., links out from the note or embedded images) (Guzdial & Turns, 1997).

An important distinction between newsgroups and CaMILE is that CaMILE supports anchored collaboration (Guzdial, Carlson, & Turns, 1995a; Guzdial & Turns, 1997). Each individual note can be referenced uniquely through a Web browser. This direct addressing of notes allows for the creation of Web pages that contain single-click hyperlinks to a thread of discussion (a collaboration space) related to the given page. The feature has been used to create comment-and-critique spaces for design reports, question-and-answer spaces for projects and assignments, and study spaces for exam review questions. Anchors serve as indices (e.g., all the notes related to a given assignment are in the thread accessed from the assignment Web page) and as reminders of what students are to talk about in a given thread. Typically, teachers create the anchors.
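To make the idea of anchors as indices concrete, here is a minimal Python sketch of an anchored forum. It is a toy model, not CaMILE's implementation: the `Note` and `AnchoredForum` classes and the anchor names are hypothetical, and it assumes each anchor page points at exactly one root note whose replies form that anchor's thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    note_id: int
    parent_id: Optional[int]  # None if the note starts a thread
    text: str


class AnchoredForum:
    """Hypothetical toy model: each anchor (e.g., an assignment page) owns one thread."""

    def __init__(self) -> None:
        self.notes: dict[int, Note] = {}
        self.anchor_root: dict[str, int] = {}  # anchor name -> id of the thread's root note
        self.children: dict[int, list[int]] = defaultdict(list)

    def add_note(self, note: Note, anchor: Optional[str] = None) -> None:
        self.notes[note.note_id] = note
        if note.parent_id is not None:
            self.children[note.parent_id].append(note.note_id)
        elif anchor is not None:
            # A root note created from an anchor page becomes that anchor's index entry.
            self.anchor_root[anchor] = note.note_id

    def thread_for(self, anchor: str) -> list[Note]:
        """All notes reachable from the anchor's root note, in depth-first order."""
        ordered, stack = [], [self.anchor_root[anchor]]
        while stack:
            nid = stack.pop()
            ordered.append(self.notes[nid])
            # Push children in reverse so they are visited in posting order.
            stack.extend(reversed(self.children[nid]))
        return ordered


forum = AnchoredForum()
forum.add_note(Note(1, None, "Questions about Assignment 3"), anchor="assignment-3")
forum.add_note(Note(2, 1, "Does part (b) need a diagram?"))
forum.add_note(Note(3, 2, "Yes, one per design alternative."))
forum.add_note(Note(4, None, "Exam review question 1"), anchor="exam-review")
print([n.note_id for n in forum.thread_for("assignment-3")])  # [1, 2, 3]
```

The anchor lookup plays the role of the assignment page's hyperlink: one click (here, one dictionary lookup) yields the whole related thread, which a newsgroup with no indexing cannot offer.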

Feature                 Newsgroups                                             CaMILE
General structure       Threaded notes in an asynchronous forum                Threaded notes in an asynchronous forum
Searching               Newsreader dependent                                   None
Indexing                None                                                   Index through anchors
Persistence of notes    Newsreader dependent, but default is not persistent    Persistent
Use of multiple media   Newsreader dependent, but not typical                  In anchors and notes
Location of notes       Distributed                                            Centralized

Table 1: Describing and Contrasting Newsgroup and CaMILE Collaboration Tools

Selection of Data Sets: Data sets were selected to emphasize larger classes (where more forum activity may occur) and a predominantly undergraduate population. More undergraduate than graduate forums were available, and I predicted (but did not test) that use would differ between undergraduates and graduates. Summary statistics for the two data sets appear in Table 2. Overall, 7262 notes by 1300 authors were analyzed: 3007 CaMILE notes by 526 authors, and 4255 newsgroup notes by 774 authors.

CaMILE has been in use for about two years at Georgia Tech in a variety of academic units. I chose 17 CaMILE undergraduate class discussions from across the two years of use, eliminating four graduate classes. The units represented are Computer Science (CS), Chemical Engineering (CHE), English (ENGL), History (HIST), and Literature, Culture, and Communication (LCC). Even a brief skim of the CaMILE forum summaries in Table 2 makes it clear that there were some very sparse uses (e.g., two to four notes in the entire quarter) and some very narrow distributions of authorship (e.g., one author out of a class of 31). These forums were still included in the analysis, as part of the broad range of use that might be expected with a new tool.

Not all academic units at Georgia Tech provide newsgroups for every course. I chose 18 Computer Science undergraduate course newsgroups at the sophomore level or above, to ensure that the students were familiar with newsgroups (from newsgroup use in first-year CS courses) and were at the same academic level (if not in the same unit) as the CaMILE users. I chose required courses to ensure larger numbers of users. I chose nine courses from each of two quarters (Winter and Spring '97) to get a better spread over time, though not over the same period as the CaMILE group.

In general, I can make few assumptions about how the forums were used in each class. From discussion with teachers in these classes and review of the forums (and from personal experience: I am the teacher in five of the six CS2390 offerings and of the CS6397 and CS6398 offerings), I can make some generalizations about use. Use was not required in any of these classes. The main purpose was asking and answering questions. CaMILE-using teachers were encouraged to make use of anchored collaboration, and many of the CS and CHE classes definitely did use that feature. We might also assume that CaMILE-using teachers, since they sought out a new tool, were more interested in collaboration in their classes and may have encouraged its use more (perhaps subtly or implicitly).
CaMILE class    # Notes  # Authors  # In class    Newsgroup class # Notes  # Authors  # In class
CS2390 f96          409         61          81    2360 Sp             446         59          81
CS2390 sp96         464         65          79    2360 Wi            1110        103          75
CS2390 w96          487         57          79    2430 Sp             587         83          92
CS2390 w97          503         60          80    2430 Wi             536         98          89
CS2390 sp97         452        109          92    2760 Sp             159         45          61
CS4345 w97           35         15          30    2760 Wi             108         54          57
CS6397 w97          141         23          32    3156 Sp              40         20          51
CS6398 sp96          15          7          16    3156 Wi             159         54          49
CHE2208 sp97         13          1          31    3158 Sp              62         16          50
CHE2210 sum96        71         16          40    3158 Wi              26          9          44
CHE2210 win97       103         18          66    3302 Sp              14          6          50
CHE4803 win96        42          9          20    3302 Wi              88         27          47
ENGL1002e sp97       75         29          35    3361 Sp             186         37          49
ENGL1002l sp97       76         28          37    3361 Wi             233         45          47
HIST3043 sp97         4          3          40    3411 Sp             214         43          49
LCC4875 f96         115         23          24    3411 Wi             204         44          50
LCC6607 f96           2          2           4    3431 Sp              79         28          60
                                                  3431 Wi               4          3          45

Table 2: Summary statistics for CaMILE and Newsgroup-using dataset classes

Analysis Methods: Analysis focused on writing (information-producing) behavior and reading (information-consuming) behavior. The writing analysis covered the entire data set. The reading analysis, however, covered only the CaMILE CS2390 Spring '97 data (452 notes by 109 authors). Because newsgroups are distributed, reading data are very difficult to obtain for that tool. CaMILE is centralized, so access data can be collected. The Spring '97 quarter was the first forum for which usage data had been collected and analyzed.

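Three questions about writing behavior were addressed:

1. How much do individual students write over time?
2. How broad is participation in the forum?
3. How many of the notes are in response to others' notes (i.e., threaded)?

Three questions about reading behavior were addressed:

1. How much reading do students do?
2. How much reading does each note receive?
3. When does a note get read?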

III. Writing (Information-Producing) Behavior

How much do individual students write over time? On average, a student using either tool wrote 4.8 notes (standard deviation 9.8). Newsgroup authors wrote slightly fewer (4.4, SD 10.1) and CaMILE authors slightly more (5.2, SD 9.3). The difference is not reliable (p=0.72, two-tailed t-test). Overall, this is about half a note per student per week of the course (4.8 notes over a ten-week quarter).

Figure 1 depicts the distribution of authors by the number of notes they wrote. Of all authors, 87% wrote between 1 and 10 notes in the ten-week quarter, and 92% wrote between 1 and 20 notes. Only 5% of authors wrote more than 50 notes, that is, more than five notes per week.

Authors who write relatively little produce the majority of the notes in a forum. Authors writing 1-10 notes produce 44% of all notes in a forum, and authors writing 1-20 notes produce 60%. The high-end authors (writing 50 or more notes in a quarter) account for 16% of all notes. These findings suggest that forums are not typically dominated by a small number of authors.
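The two views of the author distribution (share of authors in a band, and share of notes those authors wrote) can be computed together. The Python sketch below assumes the input is simply a list of per-author note counts for one forum; the counts in the example are hypothetical, chosen to show a forum that a single prolific author does dominate, which is the opposite of what the data here show.

```python
import math


def band_shares(
    notes_per_author: list[int], bands: list[tuple[int, float]]
) -> dict[tuple[int, float], tuple[float, float]]:
    """For each inclusive (lo, hi) band of notes written, return
    (percent of authors in the band, percent of all notes they wrote)."""
    total_authors = len(notes_per_author)
    total_notes = sum(notes_per_author)
    shares = {}
    for lo, hi in bands:
        in_band = [c for c in notes_per_author if lo <= c <= hi]
        shares[(lo, hi)] = (
            100 * len(in_band) / total_authors,
            100 * sum(in_band) / total_notes,
        )
    return shares


# Hypothetical per-author note counts for one small forum (not data from this study).
counts = [1, 1, 2, 3, 5, 8, 12, 60]
for (lo, hi), (pct_authors, pct_notes) in band_shares(
    counts, [(1, 10), (1, 20), (50, math.inf)]
).items():
    print(f"{lo}-{hi} notes: {pct_authors:.1f}% of authors wrote {pct_notes:.1f}% of notes")
```

Bands may overlap (1-10 and 1-20), matching the cumulative way the percentages are reported in the text.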

Figure 1: Percentage of Authors by Number of Notes Written

How broad is participation in the forum? Overall, 64% (SD 33%) of the students registered for a course participated in its collaboration forum. Mean per-class participation was slightly lower in CaMILE (60%, SD 30%) and slightly higher in newsgroups (70%, SD 37%). The difference was not reliable (p=0.40).
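The text does not say exactly how participation was averaged. One reading that comes within a percentage point of the reported CaMILE and newsgroup figures is to compute, for each forum, the ratio of authors to students in class from Table 2, and then take the mean and sample standard deviation of those ratios. The Python sketch below does that. Author counts include teachers and teaching assistants, which is presumably why some ratios exceed 1 (e.g., CS2390 sp97: 109 authors, 92 in class).

```python
from statistics import mean, stdev

# (authors, students in class) for each forum, transcribed from Table 2.
CAMILE = [(61, 81), (65, 79), (57, 79), (60, 80), (109, 92), (15, 30), (23, 32),
          (7, 16), (1, 31), (16, 40), (18, 66), (9, 20), (29, 35), (28, 37),
          (3, 40), (23, 24), (2, 4)]
NEWSGROUP = [(59, 81), (103, 75), (83, 92), (98, 89), (45, 61), (54, 57),
             (20, 51), (54, 49), (16, 50), (9, 44), (6, 50), (27, 47),
             (37, 49), (45, 47), (43, 49), (44, 50), (28, 60), (3, 45)]


def participation(forums: list[tuple[int, int]]) -> tuple[float, float]:
    """Mean and sample SD of per-forum participation (authors / students in class).

    Assumption: each forum is weighted equally, regardless of class size.
    """
    rates = [authors / in_class for authors, in_class in forums]
    return mean(rates), stdev(rates)


for name, forums in (("CaMILE", CAMILE), ("Newsgroups", NEWSGROUP)):
    m, sd = participation(forums)
    print(f"{name}: {m:.0%} (SD {sd:.0%}), {sum(a for a, _ in forums)} authors")
```

This prints roughly 60% (SD 30%) for CaMILE and 69% (SD 37%) for newsgroups, and the author totals (526 and 774) match those given in the data description.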

How many of the notes are in response to others' notes (i.e., threaded)? Overall, 55% of the notes posted in a forum are responses to other notes: 60% in CaMILE and 50% in newsgroups. The average thread length across all forums is 2.8 notes (SD 6.5), which suggests that most notes get a response and many get a third note in the thread. Newsgroup threads are shorter: 2.2 notes (SD 2.1). This implies that most threads in a newsgroup are simply a note (perhaps a question) and a response (perhaps an answer). In CaMILE, the average thread length is significantly higher (p<.001, two-tailed t-test): 4.2 notes (SD 10.9). The longest thread in any newsgroup had 56 notes, while the longest in a CaMILE forum had 176 notes. These results suggest that there is not a great deal of sustained discussion in these forums, but that the tool plays a significant role in encouraging more sustained discussion.
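The threading measures above can be computed from reply links alone. The Python sketch below assumes (the text does not spell this out) that a thread is a root note plus every note that replies to it directly or indirectly, and that its length counts every note in it, so an unanswered note is a thread of length 1. The `parent_of` mapping is a hypothetical input format.

```python
from statistics import mean
from typing import Optional


def thread_stats(parent_of: dict[int, Optional[int]]) -> tuple[float, float, int]:
    """Return (fraction of notes that are replies, mean thread length, max thread length).

    parent_of maps each note id to the id of the note it responds to,
    or to None if the note starts a new thread.
    """
    def root(note_id: int) -> int:
        # Walk reply links up to the note that started the thread.
        while parent_of[note_id] is not None:
            note_id = parent_of[note_id]
        return note_id

    thread_size: dict[int, int] = {}
    for note_id in parent_of:
        r = root(note_id)
        thread_size[r] = thread_size.get(r, 0) + 1
    replies = sum(1 for p in parent_of.values() if p is not None)
    return (
        replies / len(parent_of),
        float(mean(thread_size.values())),
        max(thread_size.values()),
    )


# Hypothetical forum: a three-note exchange, a question-answer pair, and an unanswered note.
replies_to = {1: None, 2: 1, 3: 2, 4: None, 5: 4, 6: None}
print(thread_stats(replies_to))  # (0.5, 2.0, 3)
```

Because one very long thread raises the mean a great deal, the large standard deviations reported for thread length (especially in CaMILE) are worth reading alongside the means.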

IV. Reading (Information-Consuming) Behavior

How much reading do students do? There were 452 notes in the Spring '97 CS2390 CaMILE forum. The average number of notes read per student was 163 (36% of the notes), with a maximum of 543 (repeat reads were counted). The standard deviation of reads per student is very large, at 158. On average, students in this forum wrote 3.8 notes each (SD 12.4), giving a read/write ratio of 42.04. (The highest number of notes written by one author was 117, by me.)

How much reading does each note receive? As discussed in Section V, wide variances in notes read per student are not uncommon. It may be more fruitful to consider the number of reads attributed to each note.

Figure 2 presents the distribution of the number of reads per note (aggregated across the entire course and all students). The log graph makes it more obvious that many notes receive a moderate number of reads (between 10 and 100 over the quarter, from the 92 students in the class), while a few get almost no attention and a few others are markedly popular. The maximum number of reads for a single note was 229.
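The log view in Figure 2 effectively groups notes by the order of magnitude of their read totals. A minimal Python sketch of that binning follows, assuming per-note read totals have already been counted (repeat reads included, as in the text); the input values are hypothetical.

```python
from collections import Counter


def decade_bins(reads_per_note: list[int]) -> dict[str, int]:
    """Count notes by order of magnitude of their read totals: 0, 1-9, 10-99, 100-999, ..."""
    bins: Counter[str] = Counter()
    for reads in reads_per_note:
        if reads == 0:
            bins["0"] += 1
        else:
            k = len(str(reads)) - 1  # digits minus one, i.e. floor(log10(reads)) without float error
            bins[f"{10**k}-{10**(k + 1) - 1}"] += 1
    return dict(bins)


# Hypothetical read totals for seven notes (not data from this study).
print(decade_bins([0, 3, 12, 45, 99, 100, 229]))
# {'0': 1, '1-9': 1, '10-99': 3, '100-999': 2}
```

Counting digits rather than calling a floating-point logarithm keeps exact powers of ten (100, 1000) in the right bin.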

Figure 2: Number of Reads per Note and the Log Graph of the Number of Reads

When does a note get read? The number of reads tells us little about when reading occurs. Figure 3 shows the percentage of notes by lifetime in days, where a note's lifetime is the difference between the first and last time it was read. Of all notes, 38% were dead within a week or less: they were never accessed again more than a week after being written, and 81% were dead within a month or less. Two notes (out of 452) had a lifetime of 67 days, out of the 73 days the forum was open (from the start of the quarter to the final exam).
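Following the text's definition of lifetime (last read minus first read), the Figure 3 percentages can be computed from an access log. The Python sketch below assumes a hypothetical log format of (note id, day) pairs, where day counts days since the start of the quarter; CaMILE's actual log format is not described here.

```python
def note_lifetimes(reads: list[tuple[int, int]]) -> dict[int, int]:
    """Map each note id to its lifetime in days: last read day minus first read day.

    reads holds (note_id, day) pairs, where day counts days since the start of
    the quarter (a hypothetical log format).
    """
    first: dict[int, int] = {}
    last: dict[int, int] = {}
    for note_id, day in reads:
        first[note_id] = min(day, first.get(note_id, day))
        last[note_id] = max(day, last.get(note_id, day))
    return {note_id: last[note_id] - first[note_id] for note_id in first}


def percent_dead_within(lifetimes: dict[int, int], days: int) -> float:
    """Percentage of notes whose lifetime is `days` or less."""
    return 100 * sum(1 for life in lifetimes.values() if life <= days) / len(lifetimes)


log = [(1, 0), (1, 3), (2, 1), (2, 40), (3, 5), (3, 10), (3, 20)]
life = note_lifetimes(log)
print(life)  # {1: 3, 2: 39, 3: 15}
print(f"{percent_dead_within(life, 7):.1f}% dead within a week")
print(f"{percent_dead_within(life, 30):.1f}% dead within a month")
```

Note that this definition measures from the first read, not from when the note was written; if writing time were logged, it could replace `first` to match the "after a week of writing" phrasing.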

Figure 3: Percentage of Notes across the Lifetime of a Note in Days

V. Discussion and Contrast with Literature

Writing Behavior: A positive finding in these results is that participation in these forums is rather broad-based. Most students participate, and authors with modest output (in terms of number of notes written) create the majority of the notes. These results do not show a small number of highly prolific authors dominating the discussion, at least in terms of their share of notes in the entire forum. Through force of ideas or language, a small percentage of authors may actually be controlling the discussion, but that is not possible to determine at this level of analysis.

What is more disturbing is that few authors write much. Four or five notes (the average per author in either tool) over ten weeks is not what one might call a broad-based dialogue in which individuals present their views and respond to others. However, compared with the literature on similar forums, these findings are not surprising.

It may be that low rates of student contribution are normal in an asynchronous forum, independent of tool, where use is driven by student interest. The literature also shows examples where use has been driven much higher, to the point where one might imagine a dialogue taking place.

While the newsgroup average thread length of 2.2 notes is another indication that students are not conducting much dialogue in these forums, the CaMILE average of 4.2 (with a large standard deviation of 10.9 notes) suggests that thread length is a variable that a tool can influence. We have argued elsewhere that threads are longer in CaMILE because of anchored collaboration (Guzdial & Turns, 1997), based on data sets where we could carefully track use of anchors. Other factors besides anchoring might be contributing to the longer threads in broader use of CaMILE. For example, the persistence of notes may enable students to revisit and extend discussions, and multimedia in notes or anchors may hold students' attention and encourage revisiting of notes. Both explanations are supported by the high rate of reading, and even re-reading, in CaMILE. For a designer, the good news is that tool design can encourage what is probably a desirable characteristic and a mediating factor in a successful CSCL forum.

Reading Behavior: Hsi and Hoadley pointed out that reading behavior and the reading-to-writing ratio varied dramatically among students using MFK/Speakeasy (Hsi & Hoadley, 1997). These results show a similarly large variance in reading behavior using CaMILE. Hsi and Hoadley identified several variables, such as gender, that influence lurking (reading without writing in a forum) versus writing behavior. A model that explains this variance may be quite complicated.

The perspective of reading per note may be more amenable to modeling. The results presented here support the notion that not all notes are read equally: some get a lot of attention, while others get little. Models that describe desirability (in terms of the amount of attention or usage some information receives) have had some success explaining page usage on the WWW. In his dissertation, Pitkow (1997) described a model of page desirability that focuses on the odds of a page being needed, based on the observed probability of a page being accessed on the eighth day given any access in the previous week, as a function of the recency of last access (one day ago, two days ago, etc.). For several Web sites at different time periods, Pitkow showed that usage data fit a log curve between need odds and recency with Pearson's r² of 0.95 or better. In short, he found that recency drives access: if a page has been accessed recently (i.e., was found desirable by somebody), it is likely to be accessed again soon. But as the time since last access grew, desirability dropped very quickly.
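The need-odds measure can be sketched in Python for a daily access log, such as CaMILE notes instead of Web pages. This is one reading of the definition above, not Pitkow's code: the 8-day window construction, the `access_days` format (item id mapped to the set of day numbers with at least one access), and the log-log fit are all assumptions, since the text says only that the data fit "a log curve".

```python
import math
from collections import defaultdict


def need_odds_by_recency(access_days: dict[str, set[int]], n_days: int) -> dict[int, float]:
    """Need odds for each recency r (1-7 days), using 8-day windows.

    For every item and every window of days d..d+7 in which the item was
    accessed at least once during the first seven days, recency is
    r = (d + 7) - (last access day in d..d+6); the window is a "hit" if the
    item is accessed on day d + 7. Need odds = p / (1 - p), p = hits / windows.
    """
    windows: dict[int, int] = defaultdict(int)
    hits: dict[int, int] = defaultdict(int)
    for days in access_days.values():
        for d in range(n_days - 7):
            week = [day for day in range(d, d + 7) if day in days]
            if not week:
                continue
            r = (d + 7) - max(week)
            windows[r] += 1
            if d + 7 in days:
                hits[r] += 1
    odds = {}
    for r in sorted(windows):
        p = hits[r] / windows[r]
        if 0 < p < 1:  # odds are 0 or infinite at the extremes; skip so logs stay defined
            odds[r] = p / (1 - p)
    return odds


def fit_log_log(odds: dict[int, float]) -> tuple[float, float]:
    """Least-squares line ln(odds) = slope * ln(r) + intercept (assumed curve form)."""
    xs = [math.log(r) for r in odds]
    ys = [math.log(o) for o in odds.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


print(need_odds_by_recency({"a": {6, 7}, "b": {6}}, n_days=8))  # {1: 1.0}
slope, intercept = fit_log_log({1: 2.0, 2: 1.0, 4: 0.5})
print(round(slope, 3), round(intercept, 3))  # -1.0 0.693
```

A negative slope in the fit corresponds to the pattern described above: the longer since an item was last accessed, the lower the odds it is needed again. The fit requires at least two recency values with defined odds.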

A similar sense of desirability may be at work in these results. Most notes are accessed only soon after they are written, and their desirability drops quickly over time. Only about a fifth of the notes (19%) have an information lifespan longer than a month. There are several possible explanations for these results. Perhaps only that fraction of the content was worth revisiting. Perhaps better indexing or searching mechanisms would have increased revisiting. In any case, the CaMILE notes are not generally being accessed as a database of useful information.

It may be that CSCL notes, in general, are subject to the same patterns of access as other information ecologies, such as the WWW. CaMILE usage may be driven by recency, as on the WWW in general. Results of use of CoNote (Davis & Huttenlocher, 1995), for example, are consistent with the results presented here and with Pitkow's results. Davis and Huttenlocher found that access to CoNote had enormous spikes, where access would increase dramatically (by almost an order of magnitude) in a short period and then drop quickly. These usage spikes correlated very strongly with the dates on which problem sets were due. Though they did not specify which annotations were accessed during these spikes, one might imagine that different annotations would be read for different problem sets, which would lead to short lifetimes and recency-driven access similar to those in CaMILE. Thus, the CoNote usage data are consistent with the recency effect seen in WWW usage data and hypothesized for the CaMILE data.

VI. Conclusions

The results presented in this paper begin to paint a picture of the information ecology of CSCL forums such as CaMILE and newsgroups.

These lessons can inform designers of new CSCL tools:

As networking technologies continue to improve and large information spaces such as the WWW are created and used, theories of information ecologies can be expected to develop. CSCL forums are also information ecologies, in some ways distinct from general access on the WWW but in other ways quite similar. As we better understand the information ecologies of CSCL forums, we can better design and use these facilities to facilitate learning.

References

Bruckman, A. (1994). Programming for Fun: MUDs as a Context for Collaborative Learning. Proceedings of the National Educational Computing Conference (NECC'94). Eugene, OR: International Society for Technology in Education (ISTE).
Card, S. K., Robertson, G. G., & York, W. (1996). The WebBook and the Web Forager: An information workspace for the World-Wide Web. In M. J. Tauber (Ed.), CHI96 Conference Proceedings (pp. 111-117). Vancouver, BC: ACM.
Davis, J. R., & Huttenlocher, D. P. (1995). Shared Annotation for Cooperative Learning. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 84-88). Bloomington, IN: Lawrence Erlbaum and Associates.
Guzdial, M., Carlson, D., & Turns, J. (1995a). Facilitating learning design with software-realized scaffolding for collaboration, Proceedings of the Frontiers in Education Conference: American Society for Engineering Education.
Guzdial, M., Kolodner, J. L., Hmelo, C., Narayanan, H., Carlson, D., Rappin, N., Hübscher, R., Turns, J., & Newstetter, W. (1996). Computer support for learning through complex problem-solving. Communications of the ACM, 39(4), 43-45.
Guzdial, M., & Turns, J. (1997). Technological Support for Anchored Collaboration (Draft).
Guzdial, M., Turns, J., Rappin, N., & Carlson, D. (1995b). Collaborative support for learning in complex domains. In J. L. Schnase & E. L. Cunnius (Eds.), Computer Support for Collaborative Learning (CSCL '95) (pp. 157-160). Bloomington, IN: Lawrence Erlbaum Associates.
Hsi, S., & Hoadley, C. M. (1997). Productive discussion in science: Gender equity through electronic discourse. Journal of Science Education and Technology, 6(1), 23-36.
Koschmann, T. (1996). CSCL: Theory and Practice of an Emerging Paradigm. Hillsdale, NJ: Lawrence Erlbaum and Associates.
O'Neill, D. K., Edelson, D. C., Gomez, L. M., & D'Amico, L. (1995). Learning to Weave Collaborative Hypermedia into Classroom Practice. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 255-258). Bloomington, IN: Lawrence Erlbaum and Associates.
Pitkow, J. (1997). Characterizing WWW Ecologies. Unpublished doctoral dissertation, College of Computing, Georgia Institute of Technology.
Pitkow, J., & Pirolli, P. (1997). Life, Death, and Lawfulness on the Electronic Frontier. In S. Pemberton (Ed.), CHI97 Conference Proceedings (pp. 383-390). Atlanta, GA: ACM.
Roschelle, J. (1992). Learning by Collaborating: Convergent Conceptual Change. Journal of the Learning Sciences, 2(3), 235-276.
Scardamalia, M., Bereiter, C., McLean, R., Swallow, J., & Woodruff, E. (1989). Computer-supported intentional learning environments. Journal of Educational Computing Research, 5(1), 51-68.
Scardamalia, M., Bereiter, C., & Steinbach, R. (1984). Teachability of reflective processes in written composition. Cognitive Science, 8, 173-190.
Suthers, D., & Weiner, A. (1995). Groupware for Developing Critical Discussion Skills. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 341-348). Bloomington, IN: Lawrence Erlbaum and Associates.
Taylor, D. (1996). Process metrics for asynchronous concurrent engineering: Communication in an Internet newsgroup. Proceedings of the 1996 ASME Design Engineering Technical Conferences and Computers in Engineering Conference. Irvine, California: ASME.
Wan, D., & Johnson, P. M. (1994). Computer Supported Collaborative Learning using CLARE: The Approach and Experimental Findings. In R. Furuta & C. Neuwirth (Eds.), Proceedings of CSCW'94 (pp. 187-198). Chapel Hill, NC: ACM.