
PART IV New indicators


Chapter 12

Educational process mining: New possibilities for understanding students’ problem-solving skills

By Krisztina Tóth

Center for Research on Learning and Instruction, Szeged University, Hungary.

Heiko Rölke and Frank Goldhammer

German Institute for International Educational Research, Frankfurt, Germany.

and Ingo Barkow

Swiss Institute for Information Science (SII), University of Applied Sciences HTW Chur, Switzerland.

The NEPS data collection is part of the Framework Programme for the Promotion of Empirical Educational Research, funded by the German Federal Ministry of Education and Research, and supported by the Federal States.

The assessment of problem-solving skills relies heavily on computer-based assessment (CBA). In CBA, all student interactions with the assessment system are automatically stored. Using the accumulated data, individual test-taking processes can be reproduced at any time. Going one step further, recorded processes can even be used to extend the problem-solving assessment itself: test-taking process data give us the opportunity to 1) examine human-computer interactions via traces left in the log file; 2) map students’ response processes to find distinguishable problem-solving strategies; and 3) discover relationships between students’ activities and task performance. This chapter describes how to extract process-related information from event logs, how to use these data in problem-solving assessments, and which methods help discover novel, useful information based on individual problem-solving behaviour.


Introduction

In educational measurement, problem-solving skills are of particular importance. They represent cross-curricular skills that are assumed to be involved in various specific domains like reading, mathematics and science, enabling one to master demands in real-life adult settings (compare Wirth and Klieme, 2003). Accordingly, the assessment of problem-solving skills has become an important goal of the Organisation for Economic Co-operation and Development’s (OECD) Programme for International Student Assessment (PISA) and its Programme for the International Assessment of Adult Competencies (PIAAC).

Problem solving is not viewed as a one-dimensional construct, but can conceptually be decomposed into analytical aspects and dynamic or interactive aspects. For instance, PISA 2003 assessed students’ analytical problem-solving skills using static problems. Students were required to complete paper-based tasks where all the information needed to obtain a solution was complete and transparent to the problem solver (compare OECD, 2004). In PISA 2012, however, there was a move to assessing interactive problem-solving skills, requiring students to deal with problems where they had to uncover information by interactively exploring the problem situation (compare OECD, 2010).

The problem-solving construct PIAAC assessed refers to “information-rich” problems requiring solvers to locate and evaluate information by interacting with simulated software applications, such as a web browser (OECD, 2012). Constructing valid interactive problem-solving measures requires a computer-based assessment strategy containing complex and interactive tasks, as the problem solver needs to explore and perhaps manipulate a (simulated) problem situation. At the behavioural level, such measures do not only provide product data (i.e. the correct or incorrect response), but also rich process data about the test taker’s path to finding a solution. One promising analysis strategy for exploiting this huge amount of unstructured data is called process mining.

Process mining enables scientists to apply new forms of data analysis in educational assessment. While classical measurement (e.g. paper-and-pencil tests) can only provide insights from analysing end results, process mining can trace how students go about solving the tasks. This provides a more granular picture of students’ skills, as the way they solve the problems supplies valuable information above and beyond the results. Process mining as a method can therefore open new ways of analysing students’ problem-solving behaviour. Nevertheless, implementing process mining is an interdisciplinary challenge, requiring subject matter experts, item developers, psychometricians and computer scientists to work together to extract, aggregate, model and interpret the data appropriately. The following sections discuss the first steps and approaches involved.

This chapter is divided into five parts. First it gives a brief overview of the type of information that can be extracted from tracked data, and describes how to extract useful information from logs. The next section presents the process-mining method that allows the analysis of process data retrieved from different educational settings. These background sections are followed by a definition of the research objectives that can typically be addressed by means of process mining. In accordance with the main objective of this chapter, the next section provides insights into applying process mining to problem-solving behaviour data (e.g. visualisation, clustering and classification). Finally, it describes the implications for online problem-solving assessments.

Background: From logs to knowledge

The way people use their knowledge during assessment cannot be directly observed, but the outcomes can be derived from task results. Moreover, students’ actions in a computerised environment can be automatically traced as an indicator of the process of using knowledge (Chung and Baker, 2003). These tracking data (logs) support the observation of students’ online behaviour and enable researchers to examine problem-solving processes and other phenomena.

These log data are automatically accumulated during the assessment, as all the actions of the students are usually stored in generated log files or databases. The process of recording students’ interactions (e.g. clicks, key presses and drags/drops) with a test delivery platform is called data logging. It allows one to reproduce and investigate students’ individual online activity (for example changing an answer, using an embedded link or moving a slider).

The test item in Figure 12.1 demonstrates a problem situation designed by Pfaff and Goldhammer (2012) simulating a web search engine. The opening screen of this interactive task displays a list of webpages according to the query (using the keyword “electromagnetism”), with short summaries containing the documents’ titles and small parts of the hypertext. Three connected websites can be accessed from the list of search results in Figure 12.1 via hyperlinks.

Figure 12.1. Web search item

Source: Pfaff and Goldhammer (2012), “Evaluating online information: The influence of cognitive components”.

Figure 12.2 depicts possible human-computer interactions in this simulation in the form of part of a log file. It shows a student’s actions in an Extensible Markup Language (XML) schema and is composed of five log entries (lines). The first number in each line is a line reference. The second number is a time stamp (in milliseconds) giving the time the event happened. The third number (“15”) is the task id. The character string in the fourth position of each line describes an action during the assessment process.

The first line of Figure 12.2 shows the beginning of Task 15 (loading data). The second line shows the time at which all the information of the hypertext item appeared to the student (loaded data). The third records a page request, when the student asked to load Item15_website1 after 848 153 milliseconds. This student had explored the opening screen of the task containing the list of web search results (source page: Item15_linklist) and clicked on the first embedded link available in the search results list. After exploring this webpage, the student navigated back to the list of links with a back button (line 4: cbaBackButton). This was followed by a visit to the third website (Item15_website3), depicted in line 5.
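To make this log structure concrete, the following minimal Python sketch parses event lines of the kind just described into structured records and computes the time elapsed between consecutive events. It is a sketch under stated assumptions: the flat field layout, the action names and all timestamps except the 848 153 ms page request are invented for illustration, and do not reproduce the actual XML schema of the delivery platform.

```python
from typing import NamedTuple

class LogEvent(NamedTuple):
    line_no: int       # first field: line reference
    timestamp_ms: int  # second field: time stamp in milliseconds
    task_id: int       # third field: task id ("15")
    action: str        # fourth field: action during the assessment

def parse_line(raw: str) -> LogEvent:
    # Split into exactly four fields; the action part may contain spaces.
    line_no, timestamp, task_id, action = raw.split(maxsplit=3)
    return LogEvent(int(line_no), int(timestamp), int(task_id), action)

# Hypothetical flat rendering of the five entries walked through above.
sample = [
    "1 846900 15 loadingData",                    # task 15 starts loading
    "2 847050 15 loadedData",                     # item fully displayed
    "3 848153 15 pageRequest Item15_website1",    # first link clicked
    "4 852310 15 cbaBackButton Item15_linklist",  # back to the link list
    "5 855742 15 pageRequest Item15_website3",    # third website visited
]

events = [parse_line(line) for line in sample]
for a, b in zip(events, events[1:]):
    print(f"{a.action!r} -> {b.action!r}: {b.timestamp_ms - a.timestamp_ms} ms")
```

Once events are available in this structured form, derived measures such as time spent per page or the order of page visits follow directly.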

The results of related work investigating students’ learning or test-taking behaviour in online environments (for example, see Hershkovitz and Nachmias, 2011; Mazza and Milani, 2005; Pechenizkiy et al., 2009) are based on similar log information accumulated during online assessment. But these raise the following questions (see Tóth, Rölke and Goldhammer, 2012a): How can the logs be analysed? How can educational researchers gain insights, or at least useful information, from students’ activities extracted from the individual assessment processes?

Figure 12.2. Sample log file

Tóth et al. (2012a:2) address these questions by arguing that the process of extracting useful information from large datasets is consistent with the knowledge discovery process in databases of Fayyad et al. (1996), presented in Figure 12.3. First, one has to select, from the large amount of log entries, the relevant pieces of information that may help to answer the research questions (Fayyad et al., 1996). In this phase, information is extracted from the logs and a target dataset is created.

Figure 12.3. The knowledge discovery process

Source: Fayyad et al. (1996), “From data mining to knowledge discovery: an overview”.

Afterwards, the target data are preprocessed, i.e. cleaned (for example handling missing data and removing noise) and transformed (normalised and aggregated) into the format required by statistical analysis software (Romero, Ventura and García, 2008). This transformation step is followed by the analysis phase: choosing the most appropriate analysis methods and applying them to obtain patterns, models or results. The data analyses can be accomplished by applying data-mining techniques (see Baker and Yacef, 2009), taking the research aims and the target dataset into account.

The final step is the interpretation and evaluation of the (educational) data (Fayyad et al., 1996). As the arrow in this model shows, the knowledge-discovery process is iterative, allowing one to return to previous steps for more effective knowledge discovery.
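As a concrete illustration of the selection and transformation steps, the sketch below aggregates raw timestamped events into one feature row per student, in a form a statistical package could consume. The chosen features (counts of page requests and back-button uses, total time on task) and all values are illustrative assumptions, not a prescribed feature set.

```python
from collections import defaultdict

# Hypothetical raw events: (student_id, timestamp_ms, action).
raw_events = [
    ("s01", 846900, "loadingData"),
    ("s01", 848153, "pageRequest"),
    ("s01", 852310, "cbaBackButton"),
    ("s01", 855742, "pageRequest"),
    ("s02", 901200, "loadingData"),
    ("s02", 905544, "pageRequest"),
]

def to_target_dataset(events):
    per_student = defaultdict(list)
    for student, ts, action in events:
        per_student[student].append((ts, action))
    rows = []
    for student, evs in sorted(per_student.items()):
        evs.sort()  # order by time stamp; process mining relies on this ordering
        actions = [action for _, action in evs]
        rows.append({
            "student": student,
            "n_page_requests": actions.count("pageRequest"),
            "n_back_clicks": actions.count("cbaBackButton"),
            "time_on_task_ms": evs[-1][0] - evs[0][0],
        })
    return rows

for row in to_target_dataset(raw_events):
    print(row)
```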

Analysing problem-solving behaviour data: Educational process mining

The term process mining refers to “the development of a set of intelligent tools and techniques aimed at extracting process-related knowledge from event logs recorded by an information system” (Pechenizkiy et al., 2009:1). It makes it possible to extend the examination of dichotomous test results, which simply reflect whether the student solved the task correctly or not, with online behaviour data. Furthermore, it “can be seen as a sub-domain of data mining” (van der Aalst and Weijters, 2004:239) which requires data that are (totally) ordered in a predefined way. The ordering may be given explicitly in terms of time or implicitly in terms of a sequence. Exploiting this ordering, process-mining algorithms try to (re-)construct models based on observed interaction paths within the assessment system.

If process-mining techniques are applied to educational data, they are often referred to as educational process mining (EPM; for example by Trčka and Pechenizkiy, 2009). There are three types of EPM: 1) process model discovery; 2) conformance analysis; and 3) process model extension (Pechenizkiy et al., 2009).

Process model discovery seeks to construct a model based on log events (van der Aalst, 2011) which describes the log data in a compact and concise way. Process model discovery therefore often produces formal models using graphical representations like finite state machines or Petri nets. Using these visual representations helps item and test developers to better understand how test takers actually interacted with their items and where to look for possible item functioning issues.
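The simplest discovery idea can be sketched by counting “directly follows” relations over observed action sequences, yielding a frequency-weighted transition graph. Mature discovery algorithms derive finite state machines or Petri nets from such relations; the sketch below stops at the raw counts, and the traces are invented for illustration rather than taken from the item in Figure 12.1.

```python
from collections import Counter

# Hypothetical action sequences, one per test taker.
traces = [
    ["start", "linkList", "website1", "back", "website3", "answer"],
    ["start", "linkList", "website2", "answer"],
    ["start", "linkList", "website1", "back", "website2", "answer"],
]

# Count every "directly follows" pair across all traces.
transitions = Counter()
for trace in traces:
    transitions.update(zip(trace, trace[1:]))

# Each edge of the discovered graph, with how often it was observed.
for (src, dst), count in transitions.most_common():
    print(f"{src:>9} -> {dst:<9} x{count}")
```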

Conformance analysis requires an existing theoretical model that can be compared with the discovered model built from log events (van der Aalst, 2011). It can therefore confirm whether the observed behaviour is in accordance with the model assumptions (Pechenizkiy et al., 2009). For instance, in the case of a complex problem-solving item, there may be several ways of solving the problem, depending on the level of expertise of the test taker. The question is whether these ways are reflected in the log data as expected. It might be the case that certain solutions are never tried or, the other way round, that additional solutions are possible which the item developer did not foresee.
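Conformance analysis can be sketched in the same vein: a theoretical model, simplified here to a set of allowed transitions, is replayed against each observed trace, and any step outside the model is flagged. Both the model and the traces below are illustrative assumptions, not an item’s actual solution paths.

```python
# Hypothetical theoretical model: the transitions the item developer expects.
allowed = {
    ("start", "linkList"),
    ("linkList", "website1"), ("linkList", "website2"),
    ("website1", "back"), ("back", "website2"),
    ("website2", "answer"),
}

def violations(trace):
    # Return every observed step that the model does not allow.
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]

observed = [
    ["start", "linkList", "website1", "back", "website2", "answer"],
    ["start", "linkList", "website3", "answer"],  # a path the developer did not foresee
]

for trace in observed:
    v = violations(trace)
    print("conforms" if not v else f"deviates at {v}", "-", trace)
```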

Process model extension also involves an existing theoretical model. But in contrast with conformance analysis, in which the expectations are compared with the tracked data, process model extension (also called enhancement) seeks to extend or improve process models based on observed behaviour data (van der Aalst, 2011).
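Extension can be sketched as annotating an existing model with information mined from the logs, for example enriching each transition with how long students took to traverse it on average. The timestamped traces below are again hypothetical.

```python
from collections import defaultdict

# Hypothetical timestamped traces: (action, timestamp_ms) per step.
timed_traces = [
    [("start", 0), ("linkList", 1200), ("website1", 4800), ("back", 9100),
     ("website2", 10400), ("answer", 21000)],
    [("start", 0), ("linkList", 900), ("website2", 3600), ("answer", 15800)],
]

# Collect traversal times for every observed transition.
durations = defaultdict(list)
for trace in timed_traces:
    for (a, t1), (b, t2) in zip(trace, trace[1:]):
        durations[(a, b)].append(t2 - t1)

# The extended model: each edge now carries an average traversal time.
for (src, dst), ds in durations.items():
    print(f"{src} -> {dst}: mean {sum(ds) / len(ds):.0f} ms over {len(ds)} traversal(s)")
```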

While the roadmap for applying all three types of process mining to problem-solving assessments is clearly laid out, its implementation is still a work in progress. To provide an overview of the possibilities of educational process mining, this chapter focuses on the first type, process model discovery.

Objectives

The main aim of this chapter is to show how test-taking process data can be integrated into educational assessment. What follows attempts to show how process data can be used in problem-solving assessments to 1) examine human-computer interaction via traces left in the log file; 2) map and cluster students based on response processes to find different problem-solving behaviour patterns; and 3) discover relationships between test takers’ activities and their test performance.

Applying process mining to problem-solving behaviour data

Process mining methods include visualisation, clustering and classification. This section briefly describes these methods and their application in learning and/or assessment environments, including related research work, and discusses their use in problem-solving assessment.
