Much work in computer-supported collaborative learning (CSCL)
focuses on learning at the individual and group level (Koschmann, 1996).
Collaborations in this line of research are analyzed for how learning
emerges from the group's interactions. The content of interactions
(textual, or in some cases, physical and graphical) is the critical
piece of analysis, in order to determine misconceptions and growing
conceptualizations (e.g., Roschelle, 1992).
A somewhat different perspective on studying computer-supported
collaborative learning is analysis at a higher level of aggregation:
multiple-group or whole-class discussion forums, such as
an entire CSILE knowledgebase (Scardamalia, Bereiter, McLean, Swallow, & Woodruff, 1989)
or the newsgroup of an entire class. (I refer to the entire discussion
space for a class generically as a forum, which the developers
of a tool may describe as a knowledgebase, newsgroup, discussion,
or shared space.) The questions at this level are about the behavior
of all the participants in the forum. When do students read notes?
When do they write notes? What is the level of participation in
the class? Does the kind of computer technology and how it is
used impact reading and writing behaviors of students?
At such a high level of aggregation it is difficult to make statements about content of notes. We really cannot even determine much about what students are learning and whether they are learning. However, there are benefits to analysis of aggregate behaviors in CSCL forums:
Stuart Card and his colleagues at Xerox PARC refer to aggregate
behavior within an information space such as the World Wide Web
as an "information ecology" (Card, Robertson, & York, 1996).
We participants in an information ecology (referred to as "informavores"
in Card's paper) are producers, gatherers, and consumers of information.
By studying the rules of behavior and the relationships between
variables in the information ecology, we can learn better how
to maximize the ecology (i.e., achieve more information at lower
cost). Researchers in information ecology are developing models of
the WWW, for example, that describe when pages are created or
deleted, and when they are accessed (Pitkow & Pirolli, 1997).
The information ecology perspective can be applied to CSCL forums,
as well. Collaboration forums are a kind of information space.
While the focus in a learning forum is not just information gathering
and consumption, information is being produced (written) and consumed
(read). An understanding of student reading and writing behaviors
from an information ecologies perspective may help us better understand
and better design CSCL environments. Papers that present new tools
for CSCL often do present information ecology statistics in describing
use of the tools (e.g., MFK/Speakeasy (Hsi & Hoadley, 1997),
CoNote (Davis & Huttenlocher, 1995), CoVis (O'Neill, Edelson, Gomez, & D'Amico, 1995)).
In this paper, I describe the information ecology for forums used
in educational settings in two different collaboration tools,
and then contrast these answers with those for other systems.
The results in this paper are based on analysis of 35 collaborative forums, comprising over 7000 notes written by 1300 students, teachers, and teaching assistants. These forums are split almost equally between two different kinds of CSCL tools: CaMILE (Guzdial et al., 1996; Guzdial, Turns, Rappin, & Carlson, 1995b) and newsgroups (Taylor, 1996). I also present reading behavior analysis of one forum, and then contrast the findings with those for other information ecologies. The results provide a picture of what is common behavior at a high level of aggregation and how the difference in tools affects the information ecology of CSCL forums.
The CSCL forums used in these analyses come from two different
sources: class newsgroups used in the College of Computing at
Georgia Tech and CaMILE class discussions from a variety of different
academic units at Georgia Tech. 17 CaMILE discussions and 18 class
newsgroups were analyzed. I first describe the two different kinds
of collaboration tools (summarized in Table 1), and then describe
how the data sets were selected and analyzed.
Newsgroups: Newsgroups are an old form of asynchronous
collaboration support on the Internet. They have been used for years
to discuss everything from TV shows to the latest computer operating
system. Computers on the Internet that subscribe to a given newsgroup
agree to distribute notes posted by users to the given newsgroup.
The newsgroup is thus distributed across multiple machines, which
means that access is improved (if a single machine goes down,
students might be able to access the desired newsgroup from another
news server) but is difficult to track. Notes are threaded
- the newsgroup protocol tracks which notes were composed in response
to other notes.
Users read newsgroup messages using one of many available newsgroup
readers. The interface and even the modality of messages depend
on the newsgroup reader used by an individual, e.g., newer newsgroup
readers permit the composition and reading of newsgroup notes
combining multiple media, but individual participants with a text-only
newsgroup reader do not see the additional media. Most newsgroup
readers, by default, show a note only once: unless the participant
makes an explicit effort, a viewed note will never be shown
again (and what that explicit effort is depends on the news reader).
The lack of persistence may make a difference in sustaining discussion: if
a note is not commented upon immediately, it may be difficult
to retrieve for later comment or review.
At Georgia Tech, all College of Computing classes have an associated
newsgroup for use by students to discuss the class, ask and answer
questions, and perhaps interact with the class teacher or teaching
assistants. Other academic units at Georgia Tech also use newsgroups
for class discussions, but not all units and not all classes.
Use of newsgroups varies dramatically between classes, but is
rarely a requirement of the class.
CaMILE: CaMILE (Collaborative and Multimedia Interactive
Learning Environment) is an asynchronous collaboration support
designed by me and my colleagues in the EduTech Institute at Georgia
Tech. CaMILE is a Web-based application, where all access is through
a Web browser accessing a single server. The interface is forms-based
and is the same for all users. CaMILE discussions are also threaded,
as in a newsgroup. Unlike newsgroup threads, CaMILE threads are persistent: they
are always available to users and do not disappear after viewing.
CaMILE was designed to be a CSCL tool. CaMILE provides a form
of procedural facilitation like CSILE (Scardamalia, Bereiter, & Steinbach, 1984),
where students are asked to identify the type of collaborative
note they are posting (e.g., a question, or a new idea, or a rebuttal)
and are offered suggestions for productive starter phrases to
use in a note of that type. CaMILE notes can contain anything
that a Web page can contain. In one forum, approximately 30% of
all notes contained some kind of HTML tag (e.g., links out from
the note, embedding images, etc.) (Guzdial & Turns, 1997).
An important distinction between newsgroups and CaMILE is that
CaMILE supports anchored collaboration (Guzdial, Carlson, & Turns, 1995a; Guzdial & Turns, 1997).
Each individual note can be referenced uniquely through a Web
browser. Direct addressing of notes allows for the creation of
Web pages that can contain single-click hyperlinks to a thread
of discussion (a collaboration space) related to the given Web
page. This feature has been used to create comment-and-critique
spaces for design reports, question-and-answer spaces for projects
and assignments, and study spaces for exam review questions. Anchors
serve as indices (e.g., all the notes related to a given assignment
are in the thread of notes accessed from the assignment Web page)
and as reminders of what students are to talk about in a given
thread. Typically, teachers create the anchors.
| | Newsgroups | CaMILE |
| General structure | Threaded notes in an asynchronous forum | Threaded notes in an asynchronous forum |
| Searching | Newsreader dependent | None |
| Indexing | None | Index through anchors |
| Persistence of Notes | Newsreader dependent, but default is not persistent | Persistent |
| Use of multiple media | Newsreader dependent, but not typical | In anchors and notes |
| Location of notes | Distributed | Centralized |
Table 1: Describing and Contrasting Newsgroup and CaMILE Collaboration
Tools
Selection of Data Sets: Data sets were selected to emphasize
larger classes (where more forum activity may occur) and a predominantly
undergraduate population. More undergraduate than graduate forums
were available, and I predicted (but did not test) that use would
differ between undergraduates and graduates. Summary statistics
for the two datasets appear in Table 2. Overall, there were 7262
notes analyzed, with 1300 authors. There were 3007 CaMILE notes
by 526 authors, and 4255 Newsgroup notes by 774 authors.
CaMILE has been in use for about two years now at Georgia Tech
in a variety of different academic units. I chose 17 CaMILE undergraduate
class discussions from over the two years of use, eliminating
four graduate classes. The units represented are Computer Science
(CS), Chemical Engineering (CHE), English (ENGL), History (HIST),
and Literature, Culture, and Communication (LCC). From just a
brief skim of the CaMILE forum summaries in Table 2, it's clear
that there were some very sparse uses (e.g., two to four notes
in the entire quarter) and some very narrow distribution of authors
(e.g., one author out of a class of 31). These forums were still
included in the analysis, as part of the broad range of use which
might be expected with a new tool.
Not all academic units at Georgia Tech provide course newsgroups
to every course. I chose 18 Computer Science undergraduate course
newsgroups at the sophomore level or above, to be sure that the audience
was familiar with newsgroups (from first-year CS course newsgroup
use) and was at the same academic level (if not the same unit) as
the CaMILE users. I chose required courses, to be sure of larger
numbers of users. I chose 9 courses from each of two quarters
(Winter and Spring '97) to get a better spread over time, though
not exactly the same as in the CaMILE group.
In general, I can make few assumptions about how the forum was
used in the class. From discussion with teachers in these classes
and review of the forums (and from personal experience: I am the
teacher in five of the six CS2390 offerings and of the CS6397
and CS6398 offerings), I can make some generalizations about use.
Use was not required in any of these classes. The main purpose
was question asking and answering. CaMILE-using teachers were
encouraged to make use of anchored collaborations, and many of the
CS and CHE classes definitely did use that feature. We might also
assume that CaMILE-using teachers, since they sought out use of
a new tool, were more interested in collaboration in the classes
and may have encouraged its use more (perhaps subtly or implicitly).
| CaMILE Classes | # Notes | # Authors | # InClass | Newsgroup Classes | # Notes | # Authors | # InClass |
| CS2390 f96 | 409 | 61 | 81 | 2360 Sp | 446 | 59 | 81 |
| CS2390 sp96 | 464 | 65 | 79 | 2360 Wi | 1110 | 103 | 75 |
| CS2390 w96 | 487 | 57 | 79 | 2430 Sp | 587 | 83 | 92 |
| CS2390 w97 | 503 | 60 | 80 | 2430 Wi | 536 | 98 | 89 |
| CS2390 sp97 | 452 | 109 | 92 | 2760 Sp | 159 | 45 | 61 |
| CS4345 w97 | 35 | 15 | 30 | 2760 Wi | 108 | 54 | 57 |
| CS6397 w97 | 141 | 23 | 32 | 3156 Sp | 40 | 20 | 51 |
| CS6398 sp96 | 15 | 7 | 16 | 3156 Wi | 159 | 54 | 49 |
| CHE2208 sp97 | 13 | 1 | 31 | 3158 Sp | 62 | 16 | 50 |
| CHE2210 sum96 | 71 | 16 | 40 | 3158 Wi | 26 | 9 | 44 |
| CHE2210 win97 | 103 | 18 | 66 | 3302 Sp | 14 | 6 | 50 |
| CHE4803 win96 | 42 | 9 | 20 | 3302 Wi | 88 | 27 | 47 |
| ENGL1002e sp97 | 75 | 29 | 35 | 3361 Sp | 186 | 37 | 49 |
| ENGL1002l sp97 | 76 | 28 | 37 | 3361 Wi | 233 | 45 | 47 |
| HIST3043 sp97 | 4 | 3 | 40 | 3411 Sp | 214 | 43 | 49 |
| LCC4875 f96 | 115 | 23 | 24 | 3411 Wi | 204 | 44 | 50 |
| LCC6607 f96 | 2 | 2 | 4 | 3431 Sp | 79 | 28 | 60 |
| | | | | 3431 Wi | 4 | 3 | 45 |
Table 2: Summary statistics for CaMILE and Newsgroup-using
dataset classes
Analysis Methods: Analysis focused on writing (information-producing)
behavior and reading (information-consuming) behavior. Writing
behavior analysis looked at the entire dataset. Reading behavior,
however, only looked at the CaMILE CS2390 Spring '97 data (452
notes with 109 authors). Since use of newsgroups is distributed,
it is very difficult to get reading behavior data in that tool.
CaMILE is centralized, so access data are possible to collect.
The Spring '97 quarter was the first in which usage data
were collected and analyzed.
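Three questions about writing behavior were addressed: how much individual students write over time, how broad participation in the forum is, and how many of the notes are in response to others' notes.
Three questions about reading behavior were addressed: how much reading students do, how much reading each note receives, and when a note gets read.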
How much do individual students write over time? On average,
a student using either tool wrote 4.8 notes (standard deviation
of 9.8). Newsgroup authors wrote slightly fewer (4.4, SD 10.1)
and CaMILE authors wrote slightly more (5.2, SD 9.3). The difference
is not reliable (p=0.72 on a two-tailed t-test). Overall, this
is about 0.4 notes per student per week of the course.
Figure 1 depicts the distribution of authors and the number of
notes that they wrote. 87% of all authors wrote between 1 and 10
notes in the ten-week quarter. 92% wrote between 1 and 20 notes.
Only 5% of authors wrote more than 50 notes, that is, more than
five notes per week.
The authors that write relatively little produce the majority
of the notes in a forum. Authors writing 1-10 notes produce 44%
of all notes in a forum, authors writing 1-20 produce 60% of the
notes. The high-end authors (writing 50 or more notes in a quarter)
account for 16% of all the notes in a forum. These findings suggest
that forums are not typically dominated by a small number of authors.
Figure 1: Percentage of Authors by Number of Notes Written
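Both views used in this section (the share of authors in each notes-written range, and the share of all notes those authors contribute) can be derived from a simple list of per-author note counts. The following is a minimal sketch under stated assumptions, not the analysis script used for this paper: the function name `author_distribution`, the bin edges, and the sample counts are all hypothetical, and the input is assumed to be non-empty.

```python
def author_distribution(notes_per_author,
                        bins=((1, 10), (11, 20), (21, 50), (51, None))):
    """For each (low, high) bin, return a pair:
    (share of authors in the bin, share of all notes written by them).
    A high bound of None means the bin is open-ended."""
    total_authors = len(notes_per_author)
    total_notes = sum(notes_per_author)
    result = {}
    for low, high in bins:
        in_bin = [n for n in notes_per_author
                  if n >= low and (high is None or n <= high)]
        result[(low, high)] = (len(in_bin) / total_authors,
                               sum(in_bin) / total_notes)
    return result


# Hypothetical counts (not data from the study): eight light authors,
# one moderate author, and one prolific author.
sample = [1, 1, 2, 2, 3, 4, 5, 7, 15, 60]
for (low, high), (author_share, note_share) in author_distribution(sample).items():
    label = f"{low}+" if high is None else f"{low}-{high}"
    print(f"{label:>6}: {author_share:.0%} of authors, {note_share:.0%} of notes")
```

Comparing the two shares per bin is what distinguishes a forum carried by many light authors from one dominated by a few prolific ones.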
How broad is participation in the forum? Overall, 64% (SD
33%) of students registered for a course participated in the collaboration
forum for the course. CaMILE participation in each class was slightly
lower (60%, SD 30%), and Newsgroup participation was slightly
higher (70%, SD 37%). The difference was not reliable (p=0.40).
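One plausible way to arrive at these participation figures is to divide the number of authors by the number of registered students in each forum and then average the per-forum rates. The sketch below makes that assumption explicit; the exact procedure is not stated in the text, so this is an illustration rather than a reproduction. The author and enrollment counts are transcribed from Table 2, and the function name `participation_rates` is hypothetical. Under this assumption the group means come out close to the reported 60% and 70%.

```python
from statistics import mean, stdev

# (authors, enrolled) per forum, transcribed from Table 2.
CAMILE = [(61, 81), (65, 79), (57, 79), (60, 80), (109, 92), (15, 30),
          (23, 32), (7, 16), (1, 31), (16, 40), (18, 66), (9, 20),
          (29, 35), (28, 37), (3, 40), (23, 24), (2, 4)]
NEWSGROUP = [(59, 81), (103, 75), (83, 92), (98, 89), (45, 61), (54, 57),
             (20, 51), (54, 49), (16, 50), (9, 44), (6, 50), (27, 47),
             (37, 49), (45, 47), (43, 49), (44, 50), (28, 60), (3, 45)]


def participation_rates(forums):
    """Per-forum participation: authors divided by registered students.
    Assumption: rates are computed per forum and averaged afterwards."""
    return [authors / enrolled for authors, enrolled in forums]


for name, forums in (("CaMILE", CAMILE), ("Newsgroup", NEWSGROUP)):
    rates = participation_rates(forums)
    print(f"{name}: mean {mean(rates):.0%}, SD {stdev(rates):.0%}")
```

Note that a few forums in Table 2 list more authors than registered students, so a per-forum rate can exceed 100%; this is presumably because the author counts include teachers and teaching assistants.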
How many of the notes are in response to others' notes (i.e., threaded)? Overall, 55% of notes posted in a forum are in response to other notes. In CaMILE, it's higher at 60%, and in newsgroups, it's lower at 50%. The average length of a thread across all forums is 2.8 notes (SD 6.5), which suggests that most notes get a response and many get a third note in the thread. Newsgroup threads are shorter: 2.2 notes (SD 2.1). This implies that most threads in a newsgroup are simply a note (perhaps a question) and a response (perhaps an answer). In CaMILE, the average thread length is significantly higher (p<.001, two-tailed t-test): 4.2 notes (SD 10.9). The maximum thread length in any newsgroup was 56 notes, while the maximum in a CaMILE forum was 176 notes. These results suggest that there is not a great deal of sustained discussion going on in these forums, but that the tool does have a significant role to play in encouraging more sustained discussion.
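The two threading measures used here (the fraction of notes that are replies, and thread length) can be computed from the reply links that both tools record. The following is a minimal sketch with hypothetical names and data. It assumes that a thread is a starting note plus every note that replies to it directly or indirectly, that every parent id appears in the mapping, and that reply links contain no cycles. Whether unanswered notes count as one-note threads in the reported averages is not stated in the text; this sketch counts them.

```python
from collections import defaultdict


def thread_stats(parent_of):
    """parent_of maps note id -> id of the note it replies to
    (None for a note that starts a thread).
    Returns (fraction of notes that are replies, sorted thread lengths)."""
    def root(note):
        # Follow reply links up to the note that started the thread.
        while parent_of[note] is not None:
            note = parent_of[note]
        return note

    lengths = defaultdict(int)
    for note in parent_of:
        lengths[root(note)] += 1
    replies = sum(1 for parent in parent_of.values() if parent is not None)
    return replies / len(parent_of), sorted(lengths.values())


# Hypothetical forum: a question with one answer, a three-note thread,
# and a note that nobody answered.
notes = {"q1": None, "a1": "q1",
         "q2": None, "a2": "q2", "a3": "a2",
         "q3": None}
reply_fraction, lengths = thread_stats(notes)
print(reply_fraction)               # 0.5
print(sum(lengths) / len(lengths))  # 2.0
```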
How much reading do students do? There were 452 notes in
the Spring '97 CS2390 CaMILE forum. The average number of notes
read per student was 163 (36%), with the maximum being 543 (multiple
reads were counted). The standard deviation on reads per student
is very large, at 158. On average, students in this forum wrote
3.8 notes each (SD 12.4), giving a read/write ratio of 42.04.
(The highest number of notes written by a single author was 117, by me.)
How much reading does each note receive? As is presented
in the discussion section, wide variances in notes read per student
are not uncommon. What may be more fruitful is to consider the
number of reads attributed to each note.
Figure 2 presents the distribution of the number of reads per
note (aggregated across the entire course and all students). The
log graph makes it easier to see that most
notes get a moderate number of reads (between
10 and 100 references over the course of the quarter by 92 students),
while a few get almost no attention and another few
are markedly popular, with many reads. The maximum number
of reads was 229 (for a single note).
Figure 2: Number of Reads per Note and the Log Graph of the
Number of Reads
When does a note get read? The number of reads alone
tells us little about when reading occurs. Figure 3 shows the percentage
of notes across the length of the note's lifetime (difference
between first time read and last time read) in days. 38% of all
notes were dead in a week or less: they were never accessed again
after their first week. 81% of all notes were dead in a month
or less. Two notes (out of 452) had a lifetime of 67 days, out
of the 73 days (from the start of the quarter to final exam) in
the forum.
Figure 3: Percentage of Notes across the Lifetime of a Note
in Days
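The lifetime measure defined above (days between a note's first and last recorded read) is straightforward to derive from a centralized access log. The sketch below assumes a log of (note id, timestamp) pairs; the log format, function names, and sample dates are hypothetical and are not CaMILE's actual logging format. Notes that were never read do not appear in such a log and are therefore not counted.

```python
from datetime import datetime


def note_lifetimes(read_log):
    """read_log: iterable of (note_id, datetime of a read).
    Returns a dict mapping note id -> whole days between the first
    and last recorded read of that note."""
    first, last = {}, {}
    for note_id, when in read_log:
        if note_id not in first or when < first[note_id]:
            first[note_id] = when
        if note_id not in last or when > last[note_id]:
            last[note_id] = when
    return {n: (last[n] - first[n]).days for n in first}


def share_dead_within(lifetimes, days):
    """Fraction of notes whose lifetime is at most `days`."""
    return sum(1 for d in lifetimes.values() if d <= days) / len(lifetimes)


# Hypothetical access log (dates are made up for illustration).
log = [("n1", datetime(1997, 4, 1, 9)), ("n1", datetime(1997, 4, 3, 14)),
       ("n2", datetime(1997, 4, 2, 10)), ("n2", datetime(1997, 5, 20, 16)),
       ("n3", datetime(1997, 4, 5, 11))]
lifetimes = note_lifetimes(log)
print(lifetimes)  # {'n1': 2, 'n2': 48, 'n3': 0}
print(share_dead_within(lifetimes, 7))
```

In this toy log, two of the three notes have a lifetime of a week or less, so `share_dead_within(lifetimes, 7)` is two thirds.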
Writing Behavior: A positive finding in these results is
that participation in these forums is rather broad-based. Most
students participate, and the average author (in terms of number
of notes written) creates the majority of the notes. It is not
true in these results that a small number of highly prolific authors
dominate the discussion, at least in terms of percentage
of notes in the entire forum. Through force of ideas or language,
a small percentage of authors may actually be controlling the
discussion, but that is not possible to determine from this level
of analysis.
What is more disturbing is that few authors are writing much. Four or five notes (the average writing by an author in either forum) over the course of ten weeks is not what one might call a broad-based dialogue where individuals are presenting their views and responding to others. However, when compared with the literature on similar forums, these findings are not surprising.
It may be that low rates of student participation are normal in an asynchronous forum, independent of tool, where use is driven by student interest. The literature also shows other examples where use has been driven up much higher, to where one might imagine a dialogue taking place.
While the newsgroup average thread length of 2.2 notes is another
indication that students are not conducting much of a dialogue
in these forums, the average thread length of 4.2 (with a large
standard deviation of 10.9 notes) in CaMILE suggests that thread
length is a variable that a tool can influence. We have argued
elsewhere that thread lengths are longer in CaMILE due to anchored
collaboration (Guzdial & Turns, 1997), based on datasets where
we could carefully track use of anchors. There are other factors,
besides anchoring, that might be influencing the longer threads
in broader use of CaMILE. For example, the persistence of notes
may be enabling students to revisit and extend discussions, and
the multimedia in notes or anchors may be holding students' attention
and may be encouraging revisiting of notes. Both explanations are
consistent with the high rate of reading and even re-reading in CaMILE.
For a designer, the good news here is that design of a tool can
facilitate what is probably a desirable characteristic, a mediating
factor of a successful CSCL forum.
Reading Behavior: Hsi and Hoadley pointed out that reading
behavior and the reading-to-writing ratio varied dramatically
among students using MFK/Speakeasy (Hsi & Hoadley, 1997).
The results here show a similarly large variance in reading behavior
using CaMILE. Hsi and Hoadley have pointed out several variables
that influence lurking (reading without writing in a forum) vs.
writing behavior, such as gender. A model that explains this variance
may be quite complicated.
The perspective of reading per note may be more amenable to modeling.
The results presented here support the notion that all notes are
not read equally-some get a lot of attention, while some get little
attention. Models that describe desirability (in terms of the
amount of attention or usage some information receives) have had
some success explaining page usage on the WWW. Pitkow, in his dissertation
(Pitkow, 1997), described a model of desirability that focuses
on the odds of being needed (the observed probability of a page being
accessed on the eighth day given any access in the previous week)
in terms of the recency of last access (had it been one day ago,
two days ago, etc.). For several web sites at different time periods,
Pitkow showed that usage data fit a log curve between needs-odds
and recency with Pearson's r^2 of 0.95 and better. In short,
he found that recency drives access: if a page has been accessed
recently (i.e., was found desirable by somebody), it will likely
be accessed again soon. But as soon as recency dropped, desirability
dropped very quickly.
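To make the needs-odds idea concrete, the sketch below tabulates, for each recency value, the odds that a page is accessed on the day following a seven-day window in which it was accessed at least once. This follows only the parenthetical definition given above; Pitkow's actual estimation procedure and the curve fit are not reproduced here, and the function name, input format, and sample data are hypothetical.

```python
from collections import defaultdict


def needs_odds_by_recency(access_days, first_day, last_day, window=7):
    """access_days maps page id -> set of integer day numbers on which
    the page was accessed. For every target day preceded by a full
    window containing at least one access, record how many days ago the
    latest access was (the recency) and whether the page was accessed on
    the target day. Returns a dict recency -> odds p / (1 - p), skipping
    recencies where p == 1 (odds undefined)."""
    hits = defaultdict(int)
    trials = defaultdict(int)
    for days in access_days.values():
        for target in range(first_day + window, last_day + 1):
            recent = [d for d in days if target - window <= d < target]
            if not recent:
                continue  # the definition conditions on an access in the window
            recency = target - max(recent)
            trials[recency] += 1
            hits[recency] += target in days
    odds = {}
    for recency in sorted(trials):
        p = hits[recency] / trials[recency]
        if p < 1:
            odds[recency] = p / (1 - p)
    return odds


# Hypothetical access histories over a 10-day log.
sample = {"a": {1, 2, 3, 9}, "b": {1, 8}, "c": {4, 5, 6, 7, 8}}
print(needs_odds_by_recency(sample, first_day=1, last_day=10))
```

With real usage data, plotting these odds against recency is where the rapid drop in desirability described above would become visible.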
A similar sense of desirability may be at work in these results.
Most notes are accessed only soon after they are written,
and their desirability drops quickly over time. Only about 20% of the
notes have an information lifespan of longer than a month. There
are several possible explanations for these results. Perhaps only
20% of the content was worth revisiting. Perhaps better indexing
or searching mechanisms may have driven up revisiting. In any
case, the CaMILE notes are not generally being accessed as a database
of useful information.
It may be that CSCL notes, in general, are subject to the same
patterns of access as other information ecologies, such as the
WWW. CaMILE usage data may be driven by recency, as on the WWW
in general. Results of use on CoNote (Davis & Huttenlocher, 1995),
for example, are consistent with the results presented here and
with Pitkow's results. Davis and Huttenlocher found that access
to CoNote had enormous spikes, where access would increase dramatically
(by almost an order of magnitude) in a short period, and then drop down
quickly. They found that these usage spikes correlated very strongly
with the dates that problem sets were due. Though they did not
specify which annotations were accessed during these spikes, one
might imagine that different annotations would be read for different
problem sets, which would lead to similar short lifetimes and
recency-driven access as in CaMILE. Thus, the CoNote usage data
are consistent with the same recency effect seen in WWW usage data
and hypothesized in the CaMILE data.
The results presented in this paper begin to paint a picture of the information ecology of CSCL forums such as CaMILE and newsgroups.
These lessons can inform designers of new CSCL tools:
As networking technologies continue to improve and large information
spaces such as the WWW are created and utilized, theory of information
ecologies can be expected to develop. CSCL forums are also information
ecologies, in some ways distinct from general access on the WWW
but in other ways quite similar. As we better understand the information
ecologies of CSCL forums, we can better design and use these facilities
in order to better facilitate learning.
Bruckman, A. (1994). Programming for Fun: MUDs as a Context for Collaborative Learning, Proceedings of the National Educational Computing Conference (NECC'94). Eugene, OR: International Society for Technology in Education (ISTE).
Card, S. K., Robertson, G. G., & York, W. (1996). The WebBook and the Web Forager: An information workspace for the World-Wide Web. In M. J. Tauber (Ed.), CHI96 Conference Proceedings (pp. 111-117). Vancouver, BC: ACM.
Davis, J. R., & Huttenlocher, D. P. (1995). Shared Annotation for Cooperative Learning. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 84-88). Bloomington, IN: Lawrence Erlbaum Associates.
Guzdial, M., Carlson, D., & Turns, J. (1995a). Facilitating learning design with software-realized scaffolding for collaboration, Proceedings of the Frontiers in Education Conference: American Society for Engineering Education.
Guzdial, M., Kolodner, J. L., Hmelo, C., Narayanan, H., Carlson, D., Rappin, N., Hübscher, R., Turns, J., & Newstetter, W. (1996). Computer support for learning through complex problem-solving. Communications of the ACM, 39(4), 43-45.
Guzdial, M., & Turns, J. (1997). Technological Support for Anchored Collaboration: Draft.
Guzdial, M., Turns, J., Rappin, N., & Carlson, D. (1995b). Collaborative support for learning in complex domains. In J. L. Schnase & E. L. Cunnius (Eds.), Computer Support for Collaborative Learning (CSCL '95) (pp. 157-160). Bloomington, IN: Lawrence Erlbaum Associates.
Hsi, S., & Hoadley, C. M. (1997). Productive discussion in science: Gender equity through electronic discourse. Journal of Science Education and Technology, 6(1), 23-36.
Koschmann, T. (1996). CSCL: Theory and Practices of an Emerging Paradigm. Hillsdale, NJ: Lawrence Erlbaum Associates.
O'Neill, D. K., Edelson, D. C., Gomez, L. M., & D'Amico, L. (1995). Learning to Weave Collaborative Hypermedia into Classroom Practice. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 255-258). Bloomington, IN: Lawrence Erlbaum Associates.
Pitkow, J. (1997). Characterizing WWW Ecologies. Unpublished dissertation, College of Computing, Georgia Institute of Technology.
Pitkow, J., & Pirolli, P. (1997). Life, Death, and Lawfulness on the Electronic Frontier. In S. Pemberton (Ed.), CHI97 Conference Proceedings (pp. 383-390). Atlanta, GA: ACM.
Roschelle, J. (1992). Learning by Collaborating: Convergent Conceptual Change. Journal of the Learning Sciences, 2(3), 235-276.
Scardamalia, M., Bereiter, C., McLean, R., Swallow, J., & Woodruff, E. (1989). Computer-supported intentional learning environments. Journal of Educational Computing Research, 5(1), 51-68.
Scardamalia, M., Bereiter, C., & Steinbach, R. (1984). Teachability of reflective processes in written composition. Cognitive Science, 8, 173-190.
Suthers, D., & Weiner, A. (1995). Groupware for Developing Critical Discussion Skills. In J. L. Schnase & E. L. Cunnius (Eds.), CSCL'95 Proceedings (pp. 341-348). Bloomington, IN: Lawrence Erlbaum Associates.
Taylor, D. (1996). Process metrics for asynchronous concurrent engineering: Communication in an Internet newsgroup, Proceedings of the 1996 ASME Design Engineering Technical Conferences and Computers in Engineering Conference. Irvine, California: ASME.
Wan, D., & Johnson, P. M. (1994). Computer Supported Collaborative Learning using CLARE: The Approach and Experimental Findings. In R. Furuta & C. Neuwirth (Eds.), Proceedings of CSCW'94 (pp. 187-198). Chapel Hill, NC: ACM.