
5.2 Evaluating the applicability of modeling techniques

In this section we shall analyze some proposed approaches to evaluating ISD methods and their weaknesses. Most research on methods is based on the assumption that the methods are applicable. ME research is no exception: otherwise ME principles would not be proposed. ME research differs, however, from most method research in one major respect, namely situational dependency. Whilst method developers often aim to prove the general applicability of their methods, ME research is interested in improving method use in given circumstances. We examine method applicability based on the situation in which the method is applied. This is also emphasized in our re-evaluation of method use (cf. Section 2.5.1). Similar definitions are also proposed by Fitzgerald (1991) as an “applicability to the case or circumstances”, by Schipper and Joosten (1996) as “serving in the intended purpose”, and by Kitchenham et al. (1995), who focus on the specific cases in which a method is used.

Our focus is on studies which address the applicability of modeling techniques. First, we shall analyze what kinds of approaches the developers of text-book methods have applied to validate their methods, and what validation approaches are proposed for situational methods. The former covers the evaluation of methods in general and the latter focuses on evaluating methods in their use situations. Our interest is in approaches which can be used to collect, analyze and apply experiences to refine methods. This means that the evaluation approach should not only analyze whether and how a method has met the applicability requirements, but also how it could be improved. Therefore, the analysis does not include approaches which are not linked to evaluating method knowledge or which do not offer opportunities for method improvement, such as Kitchenham et al. (1995) and Jayaratna (1994). Second, we acknowledge some key problems related to method evaluation. Some of these problems are partly solved by the systematic ME principles proposed.

5.2.1 Evaluation and validation of text-book methods

The applicability of a method forms a core assumption in all method development. Most method development efforts, however, do not aim to validate the proposed method at all. Only a few methods are in any way proven or justified for the tasks for which they are promoted (Fitzgerald 1991). As a result, it is difficult to see how claims made in favor of a particular method, such as being “more expressive, yet cleaner and more uniform ... than other methods” (Booch et al. 1997, p 15) or aiming “to support a seamless ISD process, or the reversibility of models” (Walden and Nerson 1995), can be proven.

An analysis of the methods modeled in Chapter 4 shows that two kinds of approaches are used for determining the viability of a method: a demonstration of the method in some imaginary or real-world case, and a comparison with other methods. Both of these approaches are also widely used in studies comparing and evaluating methods, e.g. those reported in the CRIS conferences (Olle et al. 1982, 1983). Neither of these approaches, however, can provide strong evidence for method applicability (Fitzgerald 1991). Demonstration (e.g. Yourdon 1989a, Booch 1991) describes how one or more ISs are captured into models. The subject of validation is mostly a modeling technique (i.e. concepts and notations) rather than the process or design objectives which explain the use of such models. Little information is given about the background of the cases, such as contingencies, participants, method users, or alternative solutions. Weaknesses of the method are not considered or mentioned at all. As a result, “validation” here means only that the method can be used in modeling, rather than that it is useful or leads to better results than other methods.

The latter, the comparison approach, focuses on similarities and differences between the proposed method and other similar methods (e.g. Firesmith et al. 1996, Booch and Rumbaugh 1995). The proposed method is typically used as the yardstick: little wonder that it is often described as more comprehensive than the others. The main emphasis in the evaluation is on explaining why the new concepts are required. Here the justification of the method is normally given at the type level only, because method use is not addressed at all. Comparisons thus focus mainly on modeling techniques and their underlying conceptual structures. One reason for focusing on modeling techniques only is that method books do not usually describe other parts of method knowledge systematically. As a result, the main argument for a particular method rests on endorsement by an authority.

Yet, as is described in method books[24], one can state that the best “proof” of a method is its use: an assessment of a number of ISD efforts following a method shows its viability. Using a method’s popularity as a mechanism to prove its applicability is, however, questionable. First, on this basis certain modeling techniques, like ER diagrams (Chen 1976) or data flow diagrams (Yourdon 1989a), should be considered applicable. There is, however, a great number of dialects available for these techniques, and criticisms have been raised against some of the principles they apply. The different versions of the ER model (e.g. Chen 1976, Batini et al. 1992, Teorey 1990) show that no single variant of the ER model accounts for this popularity. Similarly, criticism has been presented against the top-down decomposition applied in data flow diagrams (e.g. Goldkuhl 1990, Booch and Rumbaugh 1995). For example, Goldkuhl (1990) claims that top-down refinement of a system leads to costly maintenance of designs and to a loss of information between the different levels. Second, the low acceptance of methods in general (cf. Section 2.4.1) and their adaptation in particular (cf. Section 2.4.2) indicate that most methods as proposed by their developers have failed. Therefore, instead of a general validation of methods, we are more interested in the validation of methods in a given situation. This is discussed in the next section.

5.2.2 Evaluation of methods in the problem context

In our literature review we found only a few studies that aimed to validate the applicability of methods more systematically. These are the validation of action modeling by Fitzgerald (1991), and of trigger modeling by Schipper and Joosten (1996). These can be distinguished from earlier method evaluations or validation approaches by their use of explicit measurements as a basis for the evaluation. In the following these approaches are briefly described by explaining the context in which they are used, what aspect of the method is evaluated, how data is collected, and what measures are used. Finally, we describe how the evaluation results are interpreted and applied for method refinements. The results of this analysis are summarized in Table 5-3.

TABLE 5-3 Summary of validation approaches



Fitzgerald (1991) evaluates the richness of a technique in terms of what is abstracted, i.e. its modeling power. He starts by describing the objectives and design criteria of the technique, which form the basis for the evaluation. He also describes the rationale of the technique (i.e. why the technique has been constructed as it is). In the evaluation the modeling technique is related to the larger context of the whole method, but no clear distinction is made between arguments in favor of the method in general and those in favor of the specific modeling technique. The evaluation is carried out through modeling, by trying to find out how well relevant aspects of the object system can be represented. The main instrument for data collection is the use of examples from different situations, although nothing is explained about the types of situations involved (e.g. domain, contingencies) or why they were selected as modeling subjects. The results are derived from the opinions of the researchers (acting as method users) about the richness and modeling power of the technique. Thus, the evaluation approach, as noted by the author, is highly subjective and dependent on the selected modeling situations. Furthermore, the approach offers no possibilities to refine the method during or after the evaluation.

Schipper and Joosten’s (1996) contribution to method evaluation is their proposal and use of multiple evaluation instruments. They base their approach on a review of how validation and evaluation of modeling techniques are studied in other modeling-related areas, and of what types of validity are recognized in the literature. They propose a model of validation which focuses on observing how the intentions of the method developer, expressed as characteristics associated with the method, are met. The approach evaluates a modeling technique separately from other parts of the method, and starts by describing the method developer’s intentions (e.g. to allow modeling of logistic processes) and characteristics (e.g. easy to learn) for the modeling technique. Next, the rationale for the technique is stated by arguing how the intentions are met and relating them to characteristics of the modeling technique. This method construction rationale is derived in much the same way as in Fitzgerald (1991), but Schipper and Joosten also include characteristics other than those related to modeling power. Therefore, the instruments for observing the characteristics are also different. Schipper and Joosten (1996) propose and emphasize the use of both qualitative and quantitative instruments in validation, depending on the type of intention. Various instruments can be used simultaneously to check convergent and discriminant validity. The instruments proposed include a literature study, method metrics, analysis of deliverables, content analysis of interviews, and measurement scales for ease of use and usefulness. Of these, method metrics (Rossi and Brinkkemper 1996) and the ease of use and usefulness scales (Davis 1989) are instruments which are not used by Fitzgerald (1991). A major reason for this difference is that Fitzgerald carried out the validation effort by himself, whereas Schipper and Joosten aim at an automated method selection procedure in CAME, i.e. one to be used also by people other than the method developer.
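
To make the notion of method metrics more concrete, the following sketch illustrates the kind of structural complexity counts that metamodel-based method metrics rely on, i.e. counting the object types, relationship types and property types of a modeling technique. It is our own minimal illustration in Python, not the actual metrics suite of Rossi and Brinkkemper (1996); the metamodel content and names are hypothetical.

```python
# Minimal sketch (not Rossi and Brinkkemper's tooling): deriving simple
# structural complexity counts from a simplified technique metamodel.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TechniqueMetamodel:
    """Hypothetical, simplified metamodel of one modeling technique."""
    name: str
    object_types: List[str] = field(default_factory=list)
    relationship_types: List[str] = field(default_factory=list)
    property_types: List[str] = field(default_factory=list)

    def complexity_counts(self) -> Dict[str, int]:
        # Size-based counts only; richer metrics would also consider how
        # properties and relationships are attached to object types.
        return {
            "object_types": len(self.object_types),
            "relationship_types": len(self.relationship_types),
            "property_types": len(self.property_types),
            "total": (len(self.object_types)
                      + len(self.relationship_types)
                      + len(self.property_types)),
        }


if __name__ == "__main__":
    er = TechniqueMetamodel(
        name="ER diagram (illustrative subset)",
        object_types=["Entity", "Relationship", "Attribute"],
        relationship_types=["Entity-Relationship link", "Attribute link"],
        property_types=["Name", "Cardinality", "Key"],
    )
    print(er.name, er.complexity_counts())
```

Counts of this kind can be produced without any data on method use, which is precisely why such metrics complement, but cannot replace, use-based instruments such as the ease of use and usefulness scales.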

The study by Schipper and Joosten, however, does not describe how the instruments are applied for data collection and analysis, nor does it provide examples in favor of the evaluated trigger modeling technique. The result of the validation effort should be a list of intentions together with arguments derived from the observations made with the instruments. Some examples are given to illustrate the approach: a goal to model business processes quickly can be supported if method metrics show that the method is not complex, or if users consider it effective and quick to use. Because multiple instruments are used, the observations made can provide better evidence of how successfully the method fulfills its developer’s intentions.

Finally, unlike Fitzgerald, Schipper and Joosten allow method refinements during the validation process to improve the applicability of the modeling technique. The modifications should, however, ensure that earlier observations and intentions remain the same. Although these conditions are understandable, since the focus is on validating a fixed method, they do not recognize experience-based learning, the uniqueness of situations, or method evolution. Accordingly, in the following we shall analyze method evaluation approaches which accept, or even promote, method evolution.

5.2.3 Evaluation of methods as a part of a continuous ME process

As discussed in Section 3.2, the evaluation of the applicability of methods in ME research is dominated by a priori evaluation occurring in the method construction phase. To our knowledge, only the learning-based approaches to method development (Checkland 1981, Kaasbol and Smordal 1996, Wood-Harper 1985, Mathiassen et al. 1996) point out the importance of experiences and learning from method use as key mechanisms both to evaluate and to refine methods. Checkland (1981) advocated the learning-based approach to method development and evaluation by introducing a cycle of action research in which experiences of method use provide the main source for method modifications: one first uses the method and then learns from its use. This cycle is illustrated in Figure 5-1.



FIGURE 5-1 The evolution of a method through a learning cycle (Checkland 1981, p 254).

According to this cycle, ME can be viewed as a continuous and never-ending process, in which experiences are elicited from working with the method. Checkland has used the action research cycle as a key mechanism to develop Soft Systems Methodology (SSM) by repeating the cycle in many development cases and situations; in fact, the cycle for developing SSM began in 1969. The main reason for the cyclic, learning-based approach seems to be the difficulty of developing methods for a new field. In the case of SSM this meant moving from methods applied successfully in developing “hard” systems to methods applicable to soft, human activity systems. This point also confirms the motivation for the incremental approach to ME given earlier (Section 5.1.3).

Because the cycle of method evolution is carried out as action research, it is sensitive to the context in which the method is used and thus situation-bound. Although SSM includes the idea of incremental ME, the objective of the learning cycle has been to develop general or universal principles for developing human activity systems. In other words, the iterative cycles have not been used to develop SSM towards situation-specific needs in the same sense as in the ME literature, but rather towards learning about the various situations in which SSM is applied. Moreover, Checkland emphasizes that SSM is not a method in the same sense as defined in this thesis, but something between a philosophy or a framework and a method[25]. However, as Checkland notes, some parts of SSM are very close to our definition of a modeling technique (e.g. CATWOE). Therefore, from the ME point of view, the learning cycle has been used to define method knowledge in terms of concepts, processes, assumptions and values. In line with our interest, we shall focus only on the incremental development of SSM’s concepts and modeling techniques.

In Checkland’s approach the applicability of a method is evaluated based on its strength as a working device in a process of developing human activity systems. As such, it applies a general question for the evaluation: “Was the problem solved” (Checkland 1981, p 192). However, he does not provide detailed principles for how experiences are collected (other than case records), analyzed, and applied when starting the next cycle, i.e. when creating or modifying the method. It can of course be claimed that such more concrete and systematic principles exist but have simply not been reported. In general, the concepts and notations used to develop conceptual models are not defined and thus not evaluated according to any systematic principles. The only reported exceptions are the root definition according to CATWOE and the sequencing between the stages of the method. The former, especially, is relevant for our study, since CATWOE is closest to our definition of a modeling technique. The applicability of the concepts behind the root definition (Customer, Actor, Transformation, Weltanschauung, Ownership and Environmental constraints) was studied by taking a dozen well-formulated root definitions from earlier projects and testing whether the concepts behind the mnemonic CATWOE could be found in them. As a result of this analysis, Smyth and Checkland (1976) conclude that the CATWOE concepts are relevant because they would speed up the process of finding root definitions and enrich debates. The sequence of the method’s tasks is another example of experience-based evaluation, although it concerns the process more than the modeling techniques. In SSM the transitions between modeling tasks, and thus also between modeling techniques, are left open because the examination of earlier studies has revealed that different starting points and sequences are possible. From a modeling technique point of view this means that “conceptual models” of the system under development can be made before root definitions or vice versa. An obvious reason why other parts of the SSM modeling techniques are not evaluated is the universal nature of the method: different human activity systems require different types of conceptual models, as can also be seen from the documented case studies, and therefore their validation in a universal manner is difficult (cf. Section 5.2.4).
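
The CATWOE check described above amounts to a simple coverage test: for each collected root definition, record which of the six elements could be identified and report those that are missing. The sketch below is our own hypothetical illustration of such a test; Smyth and Checkland’s analysis was of course performed manually on textual root definitions, and the example definitions here are invented.

```python
# Illustrative sketch: checking whether the CATWOE elements can be found
# in a set of root definitions. Each definition is represented as a dict
# mapping an element to the phrase identified for it, or None if absent.
CATWOE = ["Customer", "Actor", "Transformation",
          "Weltanschauung", "Ownership", "Environmental constraints"]

root_definitions = {
    "RD-1": {
        "Customer": "library users",
        "Actor": "library staff",
        "Transformation": "loan requests turned into completed loans",
        "Weltanschauung": "lending supports public education",
        "Ownership": "city council",
        "Environmental constraints": "annual acquisition budget",
    },
    "RD-2": {
        "Customer": "patients",
        "Actor": "nursing staff",
        "Transformation": "admissions turned into treated patients",
        "Weltanschauung": None,             # element not identifiable
        "Ownership": "hospital board",
        "Environmental constraints": None,  # element not identifiable
    },
}

for name, elements in root_definitions.items():
    missing = [e for e in CATWOE if not elements.get(e)]
    verdict = ("all CATWOE elements present" if not missing
               else "missing: " + ", ".join(missing))
    print(f"{name}: {verdict}")
```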

The importance of Checkland’s view of method development (1981, 1991) is that it highlights the continuous learning cycle and shows that this cycle occurs at several levels: the IS level, the method level and the ME level. Although Checkland (1981) did not promote the learning cycle as a mechanism to develop methods (in the same sense as defined in this thesis), other researchers have applied it directly to ME (Wood-Harper 1985, Avison and Wood-Harper 1990, Mathiassen et al. 1996, Kaasbol and Smordal 1996). These studies are described in the following.

The developers of Multiview (Wood-Harper et al. 1985) have applied an action research cycle in using and testing the method. Like SSM, Multiview was developed as a fixed method, and thus it supports narrow situational adaptability (Harmsen 1997) through in-built flexibility. One major distinction of Multiview from other methods, and from situational method adaptation, is that it follows a contingency approach to selecting among the several techniques it includes. However, no concrete suggestions are given either for ME criteria or for choosing the components of Multiview (Harmsen 1997).

Mathiassen et al. (1996) have applied the action research cycle to developing a method called OOA&D (Mathiassen et al. 1995). Here the applicability of the method is evaluated based on how well it has supported teaching, i.e. served as a learning device. By eliciting experiences from students in a classroom setting they have shortened the method refinement cycle. As in the other studies, no concrete ME principles for collecting and analyzing experiences and refining the method accordingly are given. The authors have, however, distinguished some types of method knowledge which should be specified in ME, namely concepts, guidelines, principles and patterns, but no further details are given about these. The first three are covered by our taxonomy: conceptual structure, process and design objectives. The last one, patterns, deals more with instance-level information, as it shows partial solutions to IS modeling tasks in specific domains.

These approaches have, however, several limitations in addition to their universal view of methods. First and foremost, they do not include any explicit mechanism for collecting and analyzing experiences and refining methods accordingly, which would be required for more systematic ME. Thus, after method use, no mechanisms are applied to study whether the method has been applicable. As such, they are general frameworks for method evaluation rather than applicable principles for evaluating and refining modeling languages.

Second, in all of them the iterative cycle is carried out by the method developers rather than by others. For example, in Mathiassen et al. (1996) the role of the students in refining the method is not explained, nor is the frequency of modifications. As a result, the modifications are highly dependent on the method developers’ opinions. No indications are given as to how a larger group of stakeholders could participate in the cycle. In other words, the process and roles involved in ME are not described.

Third, based on what is reported, the learning cycle is applied at a general level rather than being related to detailed method knowledge (an exception is the evaluation of the CATWOE concepts in SSM). Because method knowledge is defined loosely in these approaches, they do not apply any ME languages or tools. If a more systematic approach to method development had been applied (as proposed by Parsons et al. 1997), method knowledge could obviously also have been specified and evaluated in more detail. One reason why such approaches have not been followed may lie in the aim of situation-independent applicability: it is difficult to specify method knowledge in detail and at the same time for general purposes. A good indication of this can be seen in the development of the UML method and its versions (e.g. Booch and Rumbaugh 1995, Booch et al. 1996, 1997), which have become less precisely specified, in terms of details documented in metamodels, as the need to satisfy more general situations has increased.

To summarize, there is a surprising and disappointing lack of well-documented method evaluation cases, evaluation mechanisms, and criteria. As a result, it is hard to find out, from the ME point of view, why methods like OOA&D (Mathiassen et al. 1996) or Multiview (Avison et al. 1990) are constructed as they are. For example, what evidence from use experiences shows that the concept of cardinality should be used in object models (Mathiassen et al. 1996) or in entity models (Avison et al. 1990)?

5.2.4 Problems of a posteriori evaluation

The analysis of the evaluation approaches, and especially of their limitations, is not intended as a criticism of the method development approaches. Rather, it indicates the difficulty of method evaluation and why one of the key research questions, “Are methods useful?”, has remained unanswered. In this section our aim is to discuss the difficulties in making a posteriori, use-based evaluations of methods. This view is important, since it allows us to describe how the incremental ME principles could solve these problems, and which problems they can not solve.

First, one major reason why method developers have not evaluated or validated their approaches lies in the difficulty of such a task. When applying ‘scientific’ research methods to method evaluation and validation, we can not satisfy the requirements of scientific theory testing, which involve reducing domain complexity, controlling data collection, and meeting replication requirements (see Galliers 1985, Fitzgerald 1991, Grant et al. 1992). The application of a scientific method typically involves constructing an experiment so that only one or a few factors are identified and studied at a time. This means breaking the research subject into smaller parts which can be examined with a smaller number of factors. The experiment is first conducted in a standard way and then repeated a number of times with one factor changed at a time (ceteris paribus). A larger set of factors can not be considered at once because of their possible interactions. Thus, an understanding of the applicability of a method, i.e. the big picture, would have to be constructed on the basis of these small factors. This type of research setting is, however, hard to achieve in daily ISD practice.
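
Purely as an illustration of the ceteris paribus setting described above, the sketch below enumerates experimental runs that each differ from a baseline configuration in exactly one factor. The factors and their values are hypothetical; the point is only to show how such a design isolates single factors while leaving their interactions unexamined.

```python
# Hypothetical one-factor-at-a-time (ceteris paribus) design: every run
# differs from the baseline configuration in exactly one factor.
baseline = {
    "team_size": "small",
    "domain": "business systems",
    "user_involvement": "low",
}

alternatives = {
    "team_size": ["large"],
    "domain": ["embedded systems"],
    "user_involvement": ["high"],
}

runs = [("baseline", dict(baseline))]
for factor, values in alternatives.items():
    for value in values:
        variant = dict(baseline)
        variant[factor] = value            # change exactly one factor
        runs.append((f"vary {factor} -> {value}", variant))

for label, configuration in runs:
    print(label, configuration)
```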

The replication requirement is also difficult to meet in ME research because ISD and thus method use is considered situational, or even unique. In this sense, the requirement for replication could be met only in situations where the ME criteria are the same. Moreover, if differences in a method’s applicability occurred between similar (in terms of ME criteria) ISD efforts, there would probably be factors which had not been identified. These factors could even be considered as candidate criteria for ME.

In terms of ME, the evaluation should deal with inspecting the applicability of method knowledge according to the ME criteria used in the construction phase. In other words, a posteriori evaluation could focus on studying how the a priori factors were satisfied. Was the method applicable in the expected circumstances and contingencies? Did the method help to solve the development problems? Did the method satisfy its users’ requirements? Because of the expected complexity of ME criteria, it is difficult to study one or a few of these in different cases and expect that the other criteria do not interfere with the results.
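
To make this pairing of a priori criteria and a posteriori observations concrete, one could record, for each ME criterion applied in the construction phase, what was expected and what was actually experienced in the completed project. The structure below is our own hypothetical sketch, not a notation proposed in the ME literature; criteria whose expectations were not met become candidates for method refinement.

```python
# Hypothetical sketch: pairing a priori ME criteria with a posteriori
# observations so that unmet criteria become refinement candidates.
from dataclasses import dataclass
from typing import List


@dataclass
class CriterionAssessment:
    criterion: str   # ME criterion used in the construction phase
    expected: str    # assumption made when the method was constructed
    observed: str    # experience from the completed ISD effort
    met: bool        # was the expectation satisfied?


def refinement_candidates(assessments: List[CriterionAssessment]) -> List[str]:
    """Return the criteria whose expectations were not met."""
    return [a.criterion for a in assessments if not a.met]


if __name__ == "__main__":
    case = [
        CriterionAssessment("contingencies", "stable requirements",
                            "requirements changed monthly", met=False),
        CriterionAssessment("development problems", "unclear data structures",
                            "data models clarified the structures", met=True),
        CriterionAssessment("user requirements", "notation easy to learn",
                            "users needed extensive training", met=False),
    ]
    print("Refine the method with respect to:", refinement_candidates(case))
```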

Second, coming up with hypotheses that show the applicability of methods is problematic, because the hypotheses can not be formally tested. According to the scientific approach, when several independent studies have consistently supported a hypothesis it becomes a theory or even a law. This type of proof of method applicability is not available, and as Fitzgerald (1991, p 662) sarcastically notes, this has troubled IS research very little. In the context of ME, confirming a hypothesis means that there is some evidence that a method has been applicable. For example, in the case of validating the root definition technique, Checkland (1981, p 227) notes that the presence of the CATWOE concepts does not guarantee a good definition, but it provides evidence that in a well-formed definition such concepts are used. Coming up with hypotheses is, however, important because we can reject them by finding aspects of applicability which were not fully supported (Kitchenham et al. 1995). In other words, incremental method refinements occur only when a method has not been fully applicable.

A third difficulty in studying method applicability is ensuring that the method has actually been used (Jarke et al. 1994). In terms of our ME scenarios this means that each source of experience should be based on verifiable experiences. In our subset of method knowledge this problem is bounded: the use of modeling techniques is easier to analyze than the use of other types of method knowledge, such as the process (as in Jarke et al. 1994), or whether the design objectives and assumptions of the method have actually been followed. This is also an obvious reason why most validation approaches focus on conceptual structures and modeling techniques. It does not mean, however, that studying method use in terms of modeling techniques is without problems. For example, method users can apply other modeling techniques than those proposed by the method engineers, and intermediate models, design sketches, and different working versions of models are labor-intensive and costly to analyze for the purposes of ME (Hofstede and Verhoef 1996).

Fourth, the acquisition of experiences is difficult because experiences are personal and subjective (Nonaka 1994), they deal with situations that occurred at one point in time (Schön 1983), and they are often tacit: not all experiences can be made explicit and thus used for method refinements. Nor is all method knowledge explicit: practitioners’ method knowledge is partly embedded in their practices and can not be fully described. Furthermore, collecting experiences can be time-consuming and costly. As a result, method evaluations and refinements tend to be highly subjective. For example, Fitzgerald (1991, p 668) believes “that the best that can be achieved is that people may be convinced about a technique’s applicability and usefulness only by argument and example, not by any concept of scientific proof”. It must be noted, however, that subjective perceptions and opinions are vital for the acceptance of methods.

Finally, it is difficult to determine what the role of a modeling technique has been (Checkland 1981). A modeling language can be evaluated based on what it has abstracted from the current situation (Fitzgerald 1991), but whether it has provided alternative solutions, or supported choices among them, is more difficult to evaluate. As the analysis of the method evaluation literature showed, evaluation has mostly been based on the researcher’s judgment that the problem has been “solved” or that the problem situation has been improved (Checkland 1981). At the level of a whole method, an evaluation can be carried out more easily (e.g. Kitchenham et al. 1995) because method knowledge can be treated in its entirety, and detailed alternative compositions of method knowledge can be neglected. For example, problem solving capabilities can be measured based on the number of errors in the developed program, or on whether the developed IS satisfies the users’ requirements; the method is then treated as a whole. In addition, there remains the question of whether the problems have really been solved with the method, or through other means (e.g. the whole problem disappeared because of external changes). Naturally method users can judge the influence of methods, but evaluation research does not discuss sufficiently how the method users’ experiences are collected and analyzed for improving methods.

5.2.5 Summary and discussion of method evaluation approaches

In this section we have analyzed approaches for carrying out a posteriori evaluation of modeling languages. Our aim was to find mechanisms for collecting and analyzing experiences of method use, because we believe that the applicability of a method can only be known once the method is used. In short, the analysis shows a lack of instruments for evaluation, and problems in carrying out such evaluations. There seems to be no generally recognized way to determine whether a modeling technique has been applicable. The reasons are summarized below.

First, the most important limitation of the approaches is that they do not aim to apply evaluation results to improve the methods. Methods are considered as a whole and the evaluation is not targeted at inspecting them in more detail. Instead of making small changes to the methods, evaluators often seek a general proof or disproof. Second, none of the approaches describe the method evaluation process in detail, and only Schipper and Joosten (1996) describe some explicit instruments for evaluation. Even in their case, the use of the instruments during the actual evaluation is not explained in detail (Schipper and Joosten 1996). Some of the instruments, like method metrics, do not deal with method use at all, and most of the instruments are applied in snapshot cases. Third, all the approaches target their validation at situation-independent methods. Although they recognize various situations of method use, they do not recognize that a method could be situation-dependent. In ME terms, the evaluation is not targeted only at studying whether a method has been applicable in the case at hand. Some possible reasons for this focus are the search for generality, an aspiration to follow scientific methods, and the method developers’ desire to prove their own methods.

To characterize the incremental approach in relation to the others described above: we focus on detailed method knowledge, and our primary aim is not to seek a universal validation of methods following a “scientific” proof. Instead we focus on situational validation, in which better applicability is sought by making gradual changes to the method currently in use.

[24] It may be the case that some validation efforts have been carried out but not described. Similarly, it is most likely that evaluations are performed during method development, but it must be noted that method developers have not described how this has been carried out: i.e. how data is collected, how it is analyzed, and how it has led to improvements in the method.

[25] For the same reason SSM has not been included among the methods modeled in Section 4.
