1 Introduction
Since their inception, spreadsheets have been widely used in many scientific and mission-critical applications [1] such as linear programming and regression [2], control systems [3], statistical radiobiological evaluation [4], drug-alcohol interaction [5], patient specimen collection [6], neuroscience [7], and nuclear fuel production [8]. Many decision support systems (DSS) in the commercial and social sectors are built on spreadsheets to generate useful information for strategic decision making [9].
Along with the high popularity of spreadsheet applications [10], it was found that about 94% of the spreadsheets in use contained faults [11,12]. A major reason for the high number of faulty spreadsheets is the accelerating trend in end-user computing (or end-user programming) over the last few decades [13-15]. Spreadsheet development, now a prominent example of end-user computing [16,17], has shifted from a task performed mostly by well-trained IT professionals to one for which millions of non-technical departmental end users, or end-user programmers, are now responsible. As most end-user programmers are not well trained in software development and testing [18], it is not surprising that many of the spreadsheets they develop are poorly coded and inadequately tested [19]. Consequently, these spreadsheets are likely to contain faults that are not properly detected and removed before release for daily operational use.
Faulty spreadsheets can result in business risks including: (a) loss of revenue, profit, cash, assets, and tax; (b) mispricing and poor decision-making; and (c) financial failure [20]. If faulty spreadsheets are used in critical areas such as clinical medicine [21] and nuclear operations [8], catastrophic consequences may result. Hence, spreadsheet quality assurance (QA) is a serious issue that cannot be ignored. Our literature review aims to provide a holistic view of how the various quality issues of spreadsheets can be addressed. This paper offers the following contributions: (a) an extensive literature review on spreadsheet QA covering a 35.5-year period (January 1987 to June 2022) for target journals and a 10.5-year period (January 2012 to June 2022) for target conferences; (b) a holistic and comprehensive review of spreadsheet QA issues across the entire spreadsheet life cycle, so that the quality of spreadsheets and the data generated from them can be improved; and (c) identification of existing research gaps in the spreadsheet QA area, shedding light on future research directions.
2 Existing reviews related to spreadsheet QA
Jannach et al. [22] performed a review mainly on automated spreadsheet QA approaches. Most of their collected papers were technically oriented and information-technology (IT) in nature. Our review differs from their work in several respects: (a) our literature search includes not only IT journals but also business-oriented information-systems (IS) journals, dual-focused IT/IS journals, and management science (MS)/operational research (OR) journals; (b) our study covers not only automated QA approaches but also static and manual approaches, which are at least equally important; and (c) our study covers all stages of the spreadsheet life cycle.
Thorne [23] performed a literature review of spreadsheet mistake reduction techniques. Similarly, Powell et al. [24] conducted a critical review of the literature on spreadsheet errors. Since the publication of these two papers in 2008 and 2009, many new research studies related to spreadsheet QA have appeared. More importantly, our literature review clearly differs from [23,24] in several significant respects: (a) our study covers work published over a long period (35.5 years for target journals and 10.5 years for target conferences); (b) our study is based on a well-defined search strategy; (c) the scope of our study is more comprehensive in terms of both the number and type of published works discussed (e.g., journals: technical-oriented IT, business-oriented IS, dual-focused IT/IS, MS/OR; conferences: technical-oriented IT, business-oriented IS); (d) our discussion of spreadsheet QA techniques is set within a spreadsheet life-cycle framework to facilitate in-depth analysis; and (e) our discussion follows the established IEEE terminology [25-27], e.g., making a clear distinction (i) among mistakes, faults, and failures, and (ii) among different types of static testing.
More recently, Hofer et al. [28] reported a systematic review on product metrics for spreadsheets. Their paper is fairly restricted in scope: it focuses only on spreadsheet metrics in contexts such as fault prediction, refactoring, and risk assessment for audits. Concepts and uses of spreadsheet metrics constitute only a rather small part of the entire spectrum of spreadsheet QA.
3 Methodology and scope
3.1 Target journals and conferences
Our extensive and systematic search of the literature involved a total of 32 target journals plus three reputable professional magazines published by ACM and IEEE, which are known or expected publication outlets for spreadsheet-related studies (see Tab.1). To avoid verbosity, we henceforth refer to the three selected magazines [T06,T07,T08] in Tab.1 as “journals”.
The 35 target journals in Tab.1 can be broadly classified as follows: (a) technical-oriented IT journals, which publish papers mainly in computer science, software engineering, and other technical IT areas (16 journals [T01–T16]); (b) business-oriented IS journals, which publish papers mainly in information systems with more emphasis on business applications (14 journals [B17–B30]); (c) dual-focused IT/IS journals, which publish both technical-oriented IT and business-oriented IS papers (3 journals [D31–D33]); and (d) management science (MS)/operational research (OR) journals, which publish papers in MS and OR areas (2 journals [M34–M35]).
Besides journals, our literature search also included the following eight conferences:
● ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA),
● ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE),
● ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA),
● IEEE/ACM International Conference on Automated Software Engineering (ASE),
● IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC),
● International Conference on Software Engineering (ICSE),
● Hawaii International Conference on System Sciences (HICSS), and
● International Conference on Information Systems (ICIS).
In the above, the first six are technical-oriented IT conferences and the last two are business-oriented IS conferences. Note that: (a) since 2017, papers accepted by OOPSLA have been published annually as a special issue of the Proceedings of the ACM on Programming Languages (PACMPL), and (b) HICSS and ICIS are commonly considered top-tier conferences on information systems.
3.2 Search strategy and results
We started our literature search for journal papers from January 1987. This start date is based on the development history of spreadsheets: it was reported that Microsoft launched the Windows operating system in 1987 and that Excel was one of the first application products bundled with it [29]. Our search period thus ran from January 1987 to June 2022 (35.5 years).
Tab.2 lists the search terms used for searching the 35 target journals in Tab.1. If the title, abstract, or keywords of a paper contained any of the search terms in Tab.2, the paper was included in our initial pool. The search was performed twice, a manual scan followed by an automated search (e.g., via the journals’ search engines), to reduce the chance of accidentally omitting relevant papers. In total, 544 potentially relevant papers were included in the initial pool. Not all of these 544 papers are related to spreadsheet QA: for example, some concern spreadsheets in a general context, and some concern end-user computing outside the spreadsheet domain. We therefore manually evaluated the 544 papers and selected only those related to spreadsheet QA. To reduce possible omissions, this manual evaluation was first done by one of the authors, and then independently verified by another author. We found 86 such papers: 41 from technical-oriented IT journals, 30 from business-oriented IS journals, 7 from dual-focused IT/IS journals, and 8 from MS/OR journals (see Tab.3).
We took a holistic view in determining whether a paper in the initial pool is related to spreadsheet QA. Instead of focusing only on specific activities such as testing and debugging, we take the view that a wide range of activities throughout the entire spreadsheet life cycle (e.g., problem identification, specification, modeling, design, implementation, testing, debugging, and maintenance) play a part in spreadsheet QA. For example, if problem identification is not properly performed, the implemented spreadsheet will not satisfy the users’ expectations; the spreadsheet then clearly suffers from a quality problem. In other words, we judge the quality of a spreadsheet by its ability to satisfy users’ expectations. Accordingly, a paper is considered related to spreadsheet QA if it concerns any activity in the spreadsheet life cycle that affects the ability of a spreadsheet to fulfil its users’ expectations.
We repeated the above search strategy for the eight target conferences listed in Section 3.1 over a 10.5-year period (from January 2012 to June 2022). This search found an additional 36 papers related to spreadsheet QA (see Tab.4 for the breakdown). All in all, our search found 122 (= 86 + 36) journal and conference papers related to spreadsheet QA.
The 86 relevant journal papers (see Tab.3) and the 36 relevant conference papers (see Tab.4) formed our “primary” source of literature. We also followed the relevant references (including other journal papers, conference papers, and edited book chapters) cited by the papers in the primary source. The relevant references found in this way formed our “secondary” source of literature.
3.3 Paper analysis
To frame our analysis of the papers, we first conducted a rigorous review of several software-engineering-based approaches that have been advocated in the literature for achieving quality development of spreadsheet software. Based on this review, we decided to adapt and extend an existing spreadsheet life-cycle framework for the purposes of this study. The framework consists of five stages, which are further refined into activities or substages, as detailed in Section 5.
4 Key terminologies and concepts
Inconsistent use of terminologies and concepts is observed in some of the literature on spreadsheets, and even within the broader software community. Such use differs from the standard meanings attached to these terminologies/concepts in the context of software engineering/development. This inconsistency may confuse software practitioners/researchers in the field, and would make our subsequent discussion difficult to comprehend. To alleviate this problem, we first discuss the standard meanings of some key terminologies/concepts used in our literature review.
4.1 Failure, fault, mistake, and error
According to the IEEE terminologies [25-27], a failure is an “externally visible” deviation from the system’s specification, which is caused by a fault in the program, which in turn is created by a human mistake (or simply mistake). Suppose, for example, we test a spreadsheet by entering some input data and then examining the output, and a discrepancy between the expected and actual results is found. This “observed” discrepancy represents a failure, indicating the existence of one or more faults (e.g., incorrect formulae) in the spreadsheet. These faults were introduced into the spreadsheet by its developer because of human mistakes such as typos or wrong logical deductions. Note that, unlike observing failures and identifying faults, mistakes cannot be determined solely by examining the software.
Consider, for example, a spreadsheet with a formula that calculates a salesperson’s sales commission given his/her total sales s and the applicable commission rate r. According to the company policy, if s ≥ $1,500, r = 5%; otherwise, r = 3%. Suppose the value of s is stored in cell F6. The correct formula should be “= F6 * (IF(F6 >= 1500, 5%, 3%))”. When s = $1,500, the correct output should be $75 (= $1,500 × 5%). Suppose the actual formula coded in the spreadsheet is “= F6 * (IF(F6 > 1500, 5%, 3%))” and, hence, the actual output is $45 (= $1,500 × 3%). In this case, the miscoding of the relational operator (“>” instead of “>=”) in the formula represents a fault, and the discrepancy observed between the expected result ($75) and the actual result ($45) represents a failure. Unless we check with the spreadsheet developer, it is impossible to determine the underlying cause of this mistake (e.g., a typing slip or a misinterpretation of the company policy in the developer’s mind).
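To make the fault/failure distinction concrete, the following minimal Python sketch (our own illustration, not taken from the surveyed papers) simulates the correct and faulty commission formulae; the fault manifests as a failure only at the boundary input s = 1500, which is why boundary test cases matter.

```python
def commission_correct(s):
    # Company policy: rate is 5% when total sales s >= 1500, else 3%.
    return s * (0.05 if s >= 1500 else 0.03)

def commission_faulty(s):
    # Fault: ">" miscoded instead of ">=" (mirrors the faulty formula).
    return s * (0.05 if s > 1500 else 0.03)

# Dynamic testing: compare expected and actual outputs for some test data.
for s in (1000, 1500, 2000):
    expected, actual = commission_correct(s), commission_faulty(s)
    status = "FAILURE" if expected != actual else "ok"
    print(f"s={s}: expected={expected:.2f}, actual={actual:.2f} [{status}]")
# Only s=1500 reveals a failure (75.00 vs 45.00); the other inputs mask the fault.
```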
Senders and Moray [30] define an error as “an action that is not intended by the actor [a mistake]; not desired by a set of rules [a fault caused by a mistake] or an external observer [a failure]; or that led the task or system outside its accepted limits [a fault leading to failure(s)]”. Here, an error is defined in a “coarse” manner that collectively refers to a failure, a fault, and a mistake. Sheridan [31] defines (human) error as “an action that fails to meet some implicit or explicit criterion”. If the term “action” is interpreted as “human action”, then Sheridan’s definition of “error” is similar to the meaning of “mistake” in the IEEE terminology. In this way, the definition of Sheridan [31] is relatively narrow in scope compared with that of Senders and Moray [30]. To further add to the inconsistency, Panko and Sprague [32] state in their paper that “programmers refer to what we call errors as faults”.
In this paper, as far as possible, we use the terms “failure”, “fault”, and “mistake” in accordance with the IEEE terminology. When there is no ambiguity, we also use the term “error” (following the definition in [30]) to refer to failure, fault, and mistake collectively.
4.2 Different types of testing
Testing is a QA technique in software development, and is often classified as either dynamic or static [33,34]. Dynamic testing involves executing the software system with test data, and then verifying the output and operational behavior of the software [34]. Static testing, by contrast, refers to the examination and analysis of a work product (e.g., a software system or document). If the work product is a spreadsheet, static testing does not involve directly executing the formulae or macros of the spreadsheet to compute outputs from given test data. Technical reviews, inspections, audits, and desk checking are examples of static testing [35]. Static testing is sometimes called human testing when done manually [34], but automated static analysis tools are now also available for spreadsheets. For example, MS Excel provides a built-in static analysis tool that enables a spreadsheet developer to trace the precedents and dependents of each cell value. Neither dynamic testing nor static testing is sufficient on its own; the two have been found to be largely complementary in terms of the types of faults they detect [36].
Tab.5 outlines some common static testing techniques based on the IEEE terminology. In this paper, when discussing the relevant spreadsheet QA studies, we follow the definitions in Tab.5 as far as possible. For example, inspection refers to a QA exercise that takes a formal, team-based approach, while what some researchers have coined a “one-person inspection” will be referred to as desk checking, which is often, but not necessarily, done by the author of the work product under review. An author is the person who generates/develops the work product.
Obviously, the target of dynamic testing is the software program (a spreadsheet in the context of this paper). For static testing, however, the scope is broader: static testing can be applied to a software program or to a non-software product (e.g., a software specification document). Since the focus of this paper is on spreadsheets, when we discuss static testing, we assume that it is applied to spreadsheets only. This implies that static testing (like dynamic testing) is applied after a spreadsheet has been implemented. For this reason, we discuss static testing and dynamic testing in Section 6.4 (after discussing the “implementation” stage in Section 6.3).
Recall from Section 4.1 the difference in meaning among “failure”, “fault”, and “mistake”. In general, a static testing technique requires visual examination of a spreadsheet, including its embedded formulae, for correctness; thus, static testing typically focuses on fault detection. In dynamic testing, the tester executes a spreadsheet and then compares the expected and actual outputs to look for externally observable discrepancies between the two; hence, dynamic testing typically targets failure detection. Upon observing a failure, the tester performs fault localization (a debugging task) to identify and locate the faults in the spreadsheet that cause the failure, followed by fault removal or repair (another debugging task).
Testing can also be broadly classified as functional or performance. Functional testing aims at evaluating the system’s compliance with its specified “functional” requirements, whereas performance testing covers “non-functional” aspects of a system such as efficiency, scalability, reliability, and usability. This paper mainly focuses on functional testing of spreadsheets; in the rest of the paper, we simply refer to functional testing as “testing”. For a similar reason, the term “quality assurance (QA)” used in this paper mainly concerns the “functional” aspect of spreadsheets.
5 Software engineering-based spreadsheet life cycle
Following the growing importance of spreadsheet applications, some researchers (e.g., [18]) proposed using a software engineering-based approach to spreadsheet development. Some even coined the term “spreadsheet engineering” to emphasize the importance of this development approach [39,40]. Along this direction, the research community generally argues for a life-cycle approach to spreadsheet QA, because mistakes and faults can be introduced at any stage of the development life cycle [41].
We found five studies [41-45] related to the spreadsheet life cycle. With respect to their main focus, these studies fall into two categories. The first category includes three studies [41-43] that used a spreadsheet life-cycle framework to facilitate discussion of the existing practices and problems associated with each life-cycle stage. The second category includes the other two studies [44,45], which used a spreadsheet life cycle as a vehicle for proposing (new) “best practices” for individual life-cycle stages.
We now look at the first category of studies. Leon et al. [41] proposed a spreadsheet life-cycle framework with five stages: (a) planning, (b) design and development, (c) usage, (d) modification, and (e) disposition. The last three stages (c)-(e) are collectively known as “operation”. Leon et al. [41] then used their framework to investigate the specific controls that firms implemented to mitigate potential spreadsheet risks throughout a spreadsheet life cycle. They found that most firms had implemented neither formal rules nor informal guidelines for QA [41]. Panko and Halverson [42] proposed a life-cycle framework with five stages: (a) requirements and design, (b) cell entry, (c) drafting, (d) debugging, and (e) operation. It was observed that: (i) error rates varied over the life cycle, (ii) when entering numbers and formulae in cells, spreadsheet developers made many mistakes that they corrected immediately, (iii) developers often spent little effort on initial analysis and design, and (iv) many did not perform systematic error detection even after the drafting stage, apart from simple checks such as the reasonableness of numbers [46]. Lawson et al. [43] developed a seven-stage framework (designing, testing, documenting, using, modifying, sharing, and archiving) to study spreadsheet “best practices” by comparing “experienced” spreadsheet users with “inexperienced” ones. They found that some practices were performed more often by the most experienced developers and users, such as the use of model evaluation techniques (e.g., extreme-case testing and the built-in static analysis toolbar for checking formulae and cell references) and commercial static analysis software (which some refer to as “auditing software”).
Next, we turn to the second category of studies [44,45]. Ronen et al. [44] are among the pioneers who proposed a spreadsheet life-cycle framework. Their framework consists of nine stages: (a) problem identification, (b) definition of model outcome/decision variables, (c) model construction, (d) testing, (e) documentation, (f) QA (called “auditing” in [44]) of spreadsheet models and structure, (g) user manual preparation, (h) training, and (i) installation. Stages (g) and (h) are optional if the spreadsheet is designed to be used only by its developer, whereas the other seven stages are mandatory. Their framework was mainly introduced for the situation where spreadsheet development is the sole responsibility of one developer, without much involvement from other stakeholders. In practice, however, some spreadsheet development tasks involve people other than the developer. In this paper, we use the term “other stakeholders” to refer to all people, except the developer, engaged in spreadsheet development. Read and Batson [45] introduced a six-stage framework (scope, specify, design, build, test, and use), which is applicable both with and without the involvement of other stakeholders. This framework classifies spreadsheet models into four types: simple, complex, time-critical, and ill-defined. Each type is associated with a different set of “best practices” [45].
From Section 6 onwards, we discuss and analyze in detail the major studies associated with each stage of a spreadsheet life cycle. To facilitate our discussion, a “specific” life-cycle framework needs to be adopted. When deciding what framework to adopt, we observed the following.
● Except for the framework in [45], the other frameworks [41-44] do not explicitly address the situation where other stakeholders are involved in spreadsheet development.
● Despite this merit, the framework in [45] does not address an early task of spreadsheet development: problem identification. This task is important because if the problem is ill defined, the resulting spreadsheet will be unable to fulfil all the user requirements and expectations.
● After a spreadsheet has been released for use, it will often be maintained as errors are later detected or new user requirements arise. The task of maintenance is either explicitly or implicitly addressed in the relevant stage of four of the above frameworks [41-43,45]; in the framework of [45], for instance, maintenance is implicit in the “use” stage.
In view of the above observations, we revise and extend the original six-stage framework in [45] by: (a) incorporating “problem identification” into the first stage and renaming it “problem and scope identification”; (b) renaming the “use” stage “usage and maintenance” to explicitly include the maintenance task; (c) combining the “specify” and “design” stages into a single stage for ease of discussion, and incorporating a “modeling” substage; and (d) revising the original “test” stage to “testing and debugging”. The resulting framework is hereafter referred to as the “extended spreadsheet life-cycle framework”, or simply the “extended framework”. It consists of five stages: (i) problem and scope identification; (ii) specification, modeling, and design; (iii) implementation (or building); (iv) testing and debugging; and (v) usage and maintenance. Readers are cautioned, however, that in practice many spreadsheet development projects do not follow every stage of the extended framework in a highly structured manner. For example, many spreadsheet developers jump straight to implementation (stage (iii)) without first spending sufficient effort on problem and scope identification (stage (i)) or on specification, modeling, and design (stage (ii)) [45].
6 QA issues in each stage of the spreadsheet life cycle
6.1 Problem and scope identification
6.1.1 Problem identification
In this substage, the developer defines the nature of the problem to be solved by the spreadsheet, and investigates other issues such as: (a) the way the problem is currently solved (if at all), (b) the performance bottlenecks, and (c) the sources of information [44]. This substage is similar to the “analyzing the existing system” task of the waterfall model, and also includes a “make or buy” analysis in which the developer determines whether an existing template can be purchased for the application to be developed. Numerous such templates are available for purchase, particularly in the areas of income tax calculation, rental analysis, and real estate investment.
6.1.2 Scope identification
The developer defines the spreadsheet’s objectives and boundaries, and also considers: (a) the level of complexity appropriate to meet the spreadsheet’s objectives, (b) the input assumptions (i.e., the estimates or forecasts used in the model) and logical assumptions (for determining the model structure), (c) the data requirements in broad terms (which data are required and how to obtain them), and (d) the time and other resources required for model development. In addition, workshops should be organized for: (i) collecting opinions from the stakeholders involved in model development, (ii) soliciting consent on the model objectives, (iii) discussing the trade-off between the scope of the model and its effectiveness, (iv) resolving differences in the model requirements, and (v) agreeing on the data required for the model and the responsibilities for producing them [45].
6.2 Specification, modeling, and design
6.2.1 Specification
For large software development, a specification is often prepared that describes in detail the function of a software system prior to programming. Most spreadsheets, however, do not have “formal” specifications, and it is unlikely that spreadsheet developers will be avid specification preparers [40]. Thus, the specification substage is often omitted. It is a challenge for spreadsheet developers to determine under what circumstances a specification is essential, cost-effective, or otherwise appropriate. If the specification task is considered necessary, bubble diagrams, calculation tables, and prototypes can be used for specifying spreadsheet models [45].
6.2.2 Modeling
Mental models Developers have mental models of the spreadsheets which they develop or with which they interact [47-50]. Understanding these mental models leads to better knowledge of why spreadsheet development is error-prone, and enables the development of new tools and techniques that better correspond to spreadsheet developers’ cognitive abilities. Three mental models in the developer’s mind have been proposed: the real-world model, the domain model, and the spreadsheet model [47-50]. When explaining a spreadsheet, the real-world and domain models are prominent, while the spreadsheet model is suppressed. However, when locating and fixing a fault, one must constantly switch back and forth between the domain model and the spreadsheet model. This suggests that we should strive to improve the mapping between these mental models by improving the correspondence between spreadsheet-specific concepts and application/problem-domain concepts.
Model building and training In spreadsheet development, model building is seldom performed, possibly due to developers’ lack of time [51]. One study [52] found that spreadsheet models were often built in an informal, iterative manner by employees at all ranks. Another study [53] sought to identify procedures that could reduce mistakes in spreadsheet development. It found that six weeks of training significantly improved the logical reasoning of trainees, which in turn enhanced their ability to develop competent spreadsheet models. We emphasize that effective spreadsheet training is more than just “learning to hit the keys” without conceptual learning [53].
Model-driven or model-related techniques Several model-driven techniques [39,54-62] have been proposed to define a spreadsheet business model that is kept consistent with its corresponding customized spreadsheet application. Refactoring techniques have also been proposed to improve the overall quality characteristics of ClassSheet models [63,64], which can then be embedded in spreadsheets [57,65,66]. This embedding practice aims to close the gap between creating a domain-specific language for spreadsheet models and using a totally different framework for actually editing spreadsheet data. By combining and adapting the above model-driven techniques and ClassSheet models, together with situational method engineering, an integrated approach was proposed in which the structure and computational behavior of a spreadsheet can be specified by a model with a process-like notation, based on pre-defined functional spreadsheet services with typed interfaces [67]. An Example-Driven Modeling technique [68] was also proposed for developing spreadsheet models as an alternative to spreadsheet programming. The main idea is to feed an example data set to a machine learning algorithm in order to infer the relationships between input and output, from which a generalized decision support model is generated. Furthermore, the SpreadSheet-based Modeling (SSM) approach proposed in [69] provides a simplified representation for editing spreadsheet models while leaving the underlying metamodels unchanged.
6.2.3 Design
The design substage involves producing the most effective structure for the spreadsheet model. A good design makes the model easy to use and understand, reduces the likelihood that a developer makes mistakes when using the model, and facilitates the detection of faults arising from such mistakes [70].
Design rules and formal design practices Six “golden” rules of spreadsheet design were introduced in [45] to make models easier to comprehend and modify and to reduce the risk of errors. These rules are: (a) separate inputs, calculations, and results; (b) use one formula per row or column; (c) refer only to the left and above; (d) use multiple worksheets; (e) use each column for the same purpose throughout the model; and (f) include a documentation sheet. In addition to rules, a block structure for spreadsheet models was proposed in [44]. This structure has five blocks: (a) identification, (b) macros/menus, (c) map of the model, (d) parameters/assumptions, and (e) the model itself. The two purposes of this structure are to separate the parts of a spreadsheet into blocks so as to reduce the potential for making mistakes, and to clarify the assumptions of the model to its users.
Although the above rules [45] and block structure [44] are useful, they are not enough for designing large, complex spreadsheets of good quality. In view of this, a top-down design approach is highly desirable. Based on the notion of data flow diagrams (DFDs) commonly used in traditional systems development, Ronen et al. [44] introduced spreadsheet flow diagrams (SFDs) to reduce complexity and to encourage structured, top-down design. They argued that SFDs are preferable to DFDs because SFDs focus more on modeling relationships (instead of data flow) and provide an effective means of showing the algorithm or the underlying formulae of the model [44]. Besides SFDs, state transition diagrams were proposed for defining macros and menus, such that each keystroke selection from a menu moves the user to a state represented by a circle in a state transition diagram.
Similar to the work in [44], a structured design approach for reducing spreadsheet development risks was proposed in [71]. We noted a major difference between [44] and [71]: in [44], SFDs were proposed to replace DFDs in spreadsheet development because SFDs are relatively more effective; in [71], however, DFDs were proposed to explicitly model and document the links among data cells in different modules during the design substage. It was also reported in [71] that this DFD-based design approach significantly reduced the number of linking errors (mistakes in coding the links from cells in one part of a spreadsheet to another). In addition, a framework was proposed in [72] which divides the variables among different parts of the workbook, with no duplication, linked through formulae. This framework makes spreadsheet models easy to use, debug, and modify.
Because spreadsheets are inherently free-form, developers accustomed to solving problems using very structured, dedicated MS/OR software packages are often challenged when using spreadsheets to solve their optimization problems. Some respond by devising rules for implementing models that impose artificial structure on spreadsheets [44,71,72]. In MS/OR, however, this artificial-structure approach can occasionally result in spreadsheet models that are difficult to construct and comprehend, and less reliable [73]. To address this problem, guidelines and suggestions for creating more effective spreadsheet models for optimization problems were provided in [73]. These include, for instance, avoiding embedding numeric constraints in formulae, and designing formulae that can be copied. To some extent, adopting these guidelines is similar to applying the “golden” rules of spreadsheet design proposed in [45].
A large class of errors can be attributed to the inability to clearly visualize the underlying computational structure, and to poor support for abstraction. To alleviate this problem, a multiple-representation spreadsheet was developed [74]. This spreadsheet contains additional representations that allow abstract operations without changing the conventional grid representation or its formula syntax.
Test-driven development (TDD) is reported to be useful for developing better-quality spreadsheets [11,75,76]. TDD is a development methodology that “forces” the developer to consider a system element’s design before coding it, and to take small steps when writing software. Built upon TDD, test-driven spreadsheet development (TDSD) was proposed [11,75,76], and its effectiveness has been verified by several case studies. TDSD was found to be easily understood and used, even by users without prior knowledge of TDD.
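As a minimal illustration of the test-first cycle that TDSD adapts from TDD (our own Python sketch; the cited studies [11,75,76] define TDSD at the spreadsheet level), a check on the expected cell value is written before the “formula” it exercises:

```python
def total_cell(prices, quantities):
    # Step 2: implement the cell's formula only after the test below exists.
    # Spreadsheet equivalent: =SUMPRODUCT(B2:B4, C2:C4)
    return sum(p * q for p, q in zip(prices, quantities))

def test_total_cell():
    # Step 1: specify the expected cell values first (initially a failing test).
    assert total_cell([10.0, 2.5], [3, 4]) == 40.0
    assert total_cell([], []) == 0.0   # edge case: empty input range

test_total_cell()
print("all TDSD-style checks passed")
```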
Based on the logical/physical theory, there are four principal components that characterize any spreadsheet model: schema, data, editorial, and binding [77]. A factoring algorithm was developed to identify and extract these components from conventional spreadsheets with minimal user intervention, followed by a synthesis algorithm that assists users in constructing executable spreadsheets from reusable model components [77].
Support for capturing and sharing the problem-solving knowledge associated with end-user developed software (including spreadsheets) is often lacking [78]. To improve “literate” programming in the spreadsheet domain, the following design techniques were proposed: (a) chunking (breaking the code into logical pieces); (b) a dependency-based computational structure (eliminating the need to order declarations for execution); and (c) automatic generation of a table of contents providing active hyperlinks to each chunk [78].
Design tools Today, tools are widely used in software development to improve design productivity and effectiveness, so tools supporting spreadsheet design are highly desirable. One such tool was proposed in [79]; it ensures that developed spreadsheets have a compatible structure and follow concrete standards. This tool requires the developer first to identify and state unambiguously the spreadsheet’s overall objective, and then to refine the design into three parts: (a) input data and assumptions; (b) the applicable model, formulae, or technique for data analysis/processing; and (c) outcomes and results. Each part is then decomposed into a lower level of detail. The refinement strategy associated with the tool facilitates structuring a spreadsheet into different sections: introduction, data and assumptions, model, analysis, macros, and possibly other sections such as a data-entry screen and database lookup. Another tool incorporates a block-based formula editor (known as XLBlocks) to replace the default text-based formula editor [80].
6.3 Implementation
6.3.1 Implementation planning and practices
Implementation involves the actual coding of the model. Many spreadsheet development projects start their life cycles at this stage, with little effort spent on the earlier stages (i.e., the problem and scope identification stage and the specification, modeling, and design stage) [45]. Also, many spreadsheet developers rarely do much planning before they start filling in the cells of a spreadsheet [37,46,52]. Such a practice is likely to generate several problems: (a) an unnecessarily complex model that takes longer to build, (b) assumptions that were not part of the original intention, and (c) a lack of common understanding of what the model is doing.
A visual description language (CogMap) was developed for spreadsheets to address problem (a) [81]. In essence, CogMap is a simple yet cognitively effective tool for the visual expression of simple assertions about the structure, function, or any other propositions associated with a spreadsheet. A technique for the bidirectionalization of spreadsheet formulae was also developed, allowing users to trace backwards through a chain of calculations, e.g., in “what-if” scenarios where the user changes the output and inspects how it possibly affects the input cells [82]. Yet another technique was developed for understanding and inferring units (e.g., dollars, metres, and kilograms) in spreadsheets, in view of the fact that many spreadsheet cells do not carry unit information [83]. Missing unit information can cause a class of spreadsheet faults in which calculations (without proper unit transformation) are performed on cells with different units.
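As a rough illustration of this kind of unit analysis (a minimal sketch under our own assumptions, not the algorithm of [83]), the following Python fragment attaches a unit, such as one inferred from a column header, to each cell value and flags additions across incompatible units:

```python
from dataclasses import dataclass

@dataclass
class Cell:
    value: float
    unit: str   # e.g., a unit inferred from a header such as "Price ($)"

def add_cells(a: Cell, b: Cell) -> Cell:
    # Addition is only meaningful for cells carrying the same unit;
    # a mismatch signals a potential unit fault in the spreadsheet.
    if a.unit != b.unit:
        raise ValueError(f"unit fault: cannot add '{a.unit}' to '{b.unit}'")
    return Cell(a.value + b.value, a.unit)

print(add_cells(Cell(3.0, "$"), Cell(4.5, "$")).value)   # fine: 7.5
try:
    add_cells(Cell(3.0, "$"), Cell(2.0, "kg"))           # mismatched units
except ValueError as e:
    print(e)                                             # unit fault reported
```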
Implementation can be done by individuals or by teams. In this regard, the performance of individual development versus team development (in teams of three) was investigated. Compared with individual development, team-developed spreadsheets were empirically of higher quality: the percentage of faulty spreadsheets fell from 86% to 27%, and the percentage of cells containing faults fell from 4.6% to 1.0% [84].
6.3.2 Coding mistakes, knowledge, and techniques
It was found empirically that most mistakes end users made in coding spreadsheets were due to poor strategies and attention problems (e.g., overload of working memory, or paying attention to the wrong part of the spreadsheet) [85]. Also, many lookup functions used in spreadsheets were problematic [86]. Furthermore, end users who are more knowledgeable about spreadsheets developed better-quality spreadsheet applications [17]. Thus, there is a need to develop metrics of a developer’s spreadsheet expertise [87].
When coding spreadsheets, decreasing the complexity of formulae improves the accuracy of a spreadsheet [70]. Specific data and logic validation, as well as coding controls through the use of @NA, @IF, and @ERR functions, crossfooting totals, and mechanisms for interactive error feedback, should be used where applicable [88]. Also recommended are structured programming techniques; consistent and meaningful names for data ranges (instead of hardcoded cell addresses) and files; macro-controlled templates to avoid user tampering with the process logic; and menu-based macros for printing reports. In [89], an approach was proposed to generate elastic sheet-defined functions (SDFs) that work over inputs of arbitrary size. The purpose of SDFs is to facilitate reuse by allowing spreadsheet developers to write code once, rather than repeatedly.
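The following Python sketch mimics the “elastic” property (our own illustration; [89] synthesizes such functions within the spreadsheet itself): a single definition handles input ranges of any length, so the developer writes the logic once instead of re-entering it for each sheet size.

```python
from typing import Sequence

def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    # One definition serves input ranges of any size: the "elastic" property.
    # A non-elastic equivalent would hardcode a fixed number of terms, e.g.,
    #   =B2*C2 + B3*C3 + B4*C4
    if len(values) != len(weights):
        raise ValueError("ranges must have equal length")
    return sum(v * w for v, w in zip(values, weights))

print(weighted_total([10, 20], [0.5, 0.25]))           # 2 rows -> 10.0
print(weighted_total([10, 20, 30, 40], [1, 1, 1, 1]))  # 4 rows -> 100.0
```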
6.4 Testing and debugging
6.4.1 Role of testing and debugging
In the past, few corporations maintained effective controls to deal with spreadsheet faults. After the financial reporting scandals at Enron, the U.S. Congress passed the Sarbanes-Oxley Act in 2002, and many countries/regions have followed with similar legislation. Consequently, some large corporations have started to examine how they use spreadsheets in critical business functions and to pay more attention to spreadsheet testing [14,36]. Leon et al. [90] even argued that planning and implementing spreadsheet controls in corporations should go beyond regulatory compliance.
Kruck [70] contended that increasing the extent of testing and debugging improves spreadsheet accuracy. The importance of spreadsheet testing was also emphasized in [91,92]. Although most spreadsheet practitioners and researchers generally agree on the importance of testing (e.g., desk checking, inspection, static analysis tools, and comprehensive test cases), the findings of Caulkins et al. [9] seem to contradict this mainstream opinion. Their interviews found that a significant minority of respondents believed informal quality control procedures were generally sufficient because the “human in the loop” could detect any gross errors. Also, an online survey [41] found that only 24% of the respondent companies had independent QA groups, and only 7% of respondents had used third-party static analysis software. The first part of this finding indicates that QA exercises (particularly desk checking), if performed at all, were most likely done by the spreadsheet developers themselves.
6.4.2 Spreadsheet reuse
One way to lower spreadsheet development costs is reuse. A challenge in reusing spreadsheets is the inadequate support for comparing spreadsheets that descend from a common ancestor or are “siblings” (generated from templates and later repeatedly instantiated); it is therefore often difficult to choose which spreadsheets to reuse. To deal with this issue, an algorithm was proposed in [93] to identify the differences between two spreadsheets. Similarly, a heuristics-based method and an associated tool were proposed in [94] to analyze the differences between related spreadsheets.
6.4.3 Contingency factors and test planning
Some researchers (e.g., [95,96]) argued that the traditional validation methodology was primarily developed for use by expert system builders in the context of large, complex software, which is not the case in spreadsheet development. Thus, this traditional methodology needs to be augmented so that appropriate validation can be performed by non-technical end users. To this end, a validity framework for DSS (including spreadsheets) was developed, involving five types of validity: general, logical, interface, data, and system builder [96]. Several contingency factors were also found to influence the process of validating DSS (including spreadsheets), including the importance/complexity of the decision, the life span of the DSS (or spreadsheet), the budget, the abilities of the owner/system builder, and organizational factors [96]. Another framework was proposed in [97,98] for initially estimating the validation effort required to ensure that a developed spreadsheet is valid. This framework considers several contingency factors such as risk, complexity, significance, system builder competence, contentiousness, and deadline.
6.4.4 Static testing
Testing effectiveness Compared with inspection, desk checking was reported to have relatively low fault-detection effectiveness (only about 10%) [32,99]. People were generally overconfident about the fault-detection effectiveness of desk checking, despite the finding that some types of faults are especially difficult to detect by desk checking compared with inspection [12]. To improve the effectiveness of desk checking, a general-purpose protocol (using two commercial static analysis tools, XL Analyst and Spreadsheet Professional) was developed [10,100].
In [38], a conceptual model of the factors affecting fault-detection performance was developed. These factors fall into four categories: individual (e.g., the reviewer’s domain experience and skill), presentation (e.g., whether the spreadsheet is reviewed on paper or on screen), nature of the fault (e.g., its type or complexity), and external (e.g., time pressure and desired accuracy). An experiment was also performed to investigate whether the presentation factor was indeed significant with respect to the number of faults detected. It was found that: (a) overall, the reviewers detected only about half of the errors; (b) the experimental subjects who checked on paper detected more faults, but took longer to finish, than those who checked on screen; and (c) checking spreadsheets with the formulae displayed or printed (even when printed alongside the corresponding cells rather than separately as a linear list) did not detect significantly more faults than checking with no formulae available. When no formulae were available for checking, the checking effectively targeted failure detection.
Static testing techniques and analysis tools MS Excel provides a built-in static analysis toolbar called “formula auditing” (note here the deviation of the term “audit” from the IEEE terminology) for the developer to trace the precedents and dependents of each cell value. Using such a tool, however, is tedious and time-consuming for spreadsheets with many worksheets and many linked cells [101]. To alleviate this problem, many automatic approaches for detecting spreadsheet faults have been proposed [18,102-112]. Often supported by associated tools, these approaches target specific types of faults using various heuristic decision algorithms. For example, UCheck [113-115] and Dimension [116,117] target the “data types” assigned to input cells and formulae; these data types are derived from the text values of header cells positioned in the same row or column as the related input cells. Some other tools focus on cell arrays (e.g., AmCheck [118], CACheck [119], and EmptyCheck [120]) or use adaptive learning to detect anomalies in formula cells (e.g., CUSTODES [121], ExceLint [122], and WARDER [123]). Rather than focusing on spreadsheet faults, an algorithm was proposed in [124] to detect a particular type of mistake, data cloning: formulae whose values are copied as plain text to a different location in a spreadsheet. Furthermore, an Excel add-in tool, CheckCell [125], was developed to automatically find potential data errors; it locates data that have a disproportionate impact on the spreadsheet computation.
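To give a flavour of what array-oriented checkers such as AmCheck look for, the following deliberately simplified Python sketch (our own, not the published algorithm) abstracts cell references out of the formulae in a cell array and flags the cell whose structural pattern deviates from the majority:

```python
import re
from collections import Counter

def formula_pattern(formula: str) -> str:
    # Abstract away concrete cell addresses so structurally identical
    # formulae (e.g., "=B2*C2" and "=B3*C3") map to the same pattern.
    return re.sub(r"[A-Z]+[0-9]+", "REF", formula)

def flag_outliers(cell_array: dict[str, str]) -> list[str]:
    # In a cell array, most cells are expected to share one pattern;
    # a cell with a minority pattern is a candidate "smelly" formula.
    patterns = {addr: formula_pattern(f) for addr, f in cell_array.items()}
    majority, _ = Counter(patterns.values()).most_common(1)[0]
    return [addr for addr, p in patterns.items() if p != majority]

row = {"D2": "=B2*C2", "D3": "=B3*C3", "D4": "=B4*1500", "D5": "=B5*C5"}
print(flag_outliers(row))   # ['D4'] -- hardcoded constant breaks the pattern
```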
However, some researchers (e.g., [126,127]) argued that the above automatic detection approaches are not effective because they focus only on specific fault/mistake types. To address this problem, supervised machine learning algorithms were developed to obtain fault predictors that utilize all the data provided by the multiple spreadsheet metrics in their “extensive” catalog [126]. Experiments showed that these fault predictors were effective for a wide range of faults [126]. Complementing the work in [126,127], we noted six more studies [128-133] that use the concept of smelly formulae (i.e., spreadsheet formulae with potential faults) and/or a list of metrics, together with detection/refactoring techniques, to resolve these smelly formulae. In addition to smelly formulae, other studies [101,134,135] proposed visualization approaches to detecting spreadsheet faults. For example, a desk-checking approach (supported by its associated tool LinkMaster) was proposed which provides useful visual cues on individual worksheets for finding linking faults (incorrect references to spreadsheet cell values in separate work areas) [101].
Two studies [136,137] were conducted to: (a) investigate the different kinds of spreadsheet mistakes; (b) determine the effectiveness of two static analysis tools (Excel’s built-in “Error Check” function and the add-in tool “Spreadsheet Professional”) for detecting mistakes; and (c) compare these tools with manual desk checking. One finding in [136] was that the performance of both static analysis tools was very poor for every category of spreadsheet mistake. This finding contradicted another study [100], which reported that the performance of “Spreadsheet Professional” was superior to manual desk checking. Another finding in [136] showed that static analysis tools for spreadsheets differ in their fault-detection effectiveness. To allow a systematic and objective comparison among these tools, the Formulae, Formats, Relations (FFR) model was developed in [138], serving as a basis for describing and comparing different static analysis tools.
6.4.5 Dynamic testing
Dynamic testing techniques Besides static testing, dynamic testing is also a common spreadsheet QA approach, and numerous techniques have been developed to improve its effectiveness for spreadsheets. Below we outline some of these techniques.
In [139], metamorphic testing (MT) was proposed for detecting spreadsheet failures, and experiments showed that MT is highly effective for such detection [140]. As follow-up work, two failure-detection techniques, error trapping (ET) [141] and MT, were compared [20]. Experiments found that neither technique is sufficient on its own for spreadsheet testing [20]; rather, ET and MT are complementary and should be used together whenever possible.
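To illustrate the idea behind MT (a minimal sketch under our own assumptions; the cited studies [20,139,140] define spreadsheet-specific metamorphic relations), the following Python fragment checks an AVERAGE-like formula against two metamorphic relations, permutation invariance and scaling, without needing to know the exact expected output for any single input:

```python
import random

def avg(cells):
    # The "formula" under test; a fault here (e.g., dividing by a
    # hardcoded count) would violate the relations below.
    return sum(cells) / len(cells)

def mt_check(cells, tol=1e-9):
    base = avg(cells)
    # MR1: permuting the input range must not change the average.
    shuffled = cells[:]
    random.shuffle(shuffled)
    assert abs(avg(shuffled) - base) <= tol, "MR1 violated: failure detected"
    # MR2: scaling every input by k must scale the average by k.
    k = 3.0
    assert abs(avg([k * c for c in cells]) - k * base) <= tol, \
        "MR2 violated: failure detected"

mt_check([12.0, 7.5, 3.25, 9.0])
print("no failures revealed by the metamorphic relations")
```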
The “What You See Is What You Test” (WYSIWYT) methodology [13,142-144] is a well-researched dynamic testing technique. Based on a definition-use test adequacy criterion [145], this methodology helps users locate faults and highlights potential discrepancies [104]. Another work [146] integrated automated test generation into the WYSIWYT methodology to support incremental testing and provide immediate visual feedback. Furthermore, a test-case generation technique, together with an associated tool, was developed based on backward propagation and the solution of constraints on cell values [147].
While most dynamic testing techniques focus on checking numerical data, another study [148] took a different approach: it developed a user-extensible model for string-like data (e.g., names) to distinguish at runtime between invalid data, valid data, and questionable data that could be either valid or invalid.
Testing effectiveness The WYSIWYT methodology described above uses data-flow concepts. Several empirical studies (e.g., [149]) compared the performance of data-flow and mutation testing, and found that mutation-adequate tests detected more faults. Inspired by this finding, a suite of mutation operators was developed to support mutation testing for spreadsheets [150]. These mutation operators can also be used for evaluating the effectiveness of dynamic testing tools and debugging tools for spreadsheets.
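As a toy example of a spreadsheet-level mutation operator (our own sketch; the operator suite in [150] is considerably richer), a relational-operator-replacement mutant of the commission formula from Section 4.1 is killed only if the test set includes the boundary value:

```python
def mutate_relational(formula: str) -> str:
    # Relational Operator Replacement: ">=" -> ">" (one operator per mutant).
    return formula.replace(">=", ">", 1)

def evaluate(formula: str, f6: float) -> float:
    # Tiny evaluator covering only the two formula variants used here.
    uses_ge = ">=" in formula
    condition = (f6 >= 1500) if uses_ge else (f6 > 1500)
    return f6 * (0.05 if condition else 0.03)

original = "=F6*(IF(F6>=1500,5%,3%))"
mutant = mutate_relational(original)

tests = [1000.0, 1500.0, 2000.0]   # adequate: includes the boundary 1500
killed = any(evaluate(original, t) != evaluate(mutant, t) for t in tests)
print("mutant killed" if killed else "mutant survived")   # -> mutant killed
```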
Test case construction To perform dynamic testing, spreadsheets are executed with test data so that the output results can be checked against the expected results. The comprehensiveness of the test data has a profound effect on the effectiveness of spreadsheet testing. Thus, a “good” set of test data should be constructed to validate the logic of spreadsheets before release to users [88]. In this regard, the MT technique [20,139] discussed above provides an effective and systematic means of generating a “good” set of test cases for dynamically testing a spreadsheet. In addition, an algorithm was developed for splitting spreadsheets into smaller, logically connected parts (called fragments), which can then be tested individually [151]. Studies [151] found that this approach significantly reduces the effort required to generate test cases for testing a spreadsheet.
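One plausible reading of fragment splitting (a sketch under our own assumptions; [151] gives the precise algorithm) is to treat cells as nodes of a dependency graph and use its connected components as independently testable fragments:

```python
from collections import defaultdict

def fragments(deps: dict[str, set[str]]) -> list[set[str]]:
    # Build an undirected cell-dependency graph and return its connected
    # components; cells in different components never influence each other,
    # so each component can be tested with its own (smaller) test cases.
    graph = defaultdict(set)
    for cell, refs in deps.items():
        graph[cell]  # ensure cells without references still appear
        for r in refs:
            graph[cell].add(r)
            graph[r].add(cell)
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            c = stack.pop()
            if c not in comp:
                comp.add(c)
                stack.extend(graph[c] - comp)
        seen |= comp
        parts.append(comp)
    return parts

# Two independent calculation chains -> two fragments.
deps = {"C1": {"A1", "B1"}, "D1": {"C1"}, "C2": {"A2"}, "D2": {"C2"}}
print(fragments(deps))   # e.g., [{'A1','B1','C1','D1'}, {'A2','C2','D2'}]
```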
6.4.6 Spreadsheet error classifications and taxonomies
Many studies have developed classifications or taxonomies of spreadsheet errors. The study in [44] is one of the pioneering works. Although the classification scheme in [44] includes eight error types, it is not fine-grained enough because: (a) some of these error types are mistakes (e.g., incorrectly copied formulae) and some are faults (e.g., incorrect ranges in formulae and incorrect cell references), and (b) one “error” type, called confused range names, is in fact an undesirable practice potentially contributing to confusion rather than a fault that directly causes failures. Another work [152] distinguished between two classes of errors: domain errors (e.g., a “mistake” in logic due to misunderstanding the depreciation concept in accounting) and device errors (e.g., a “fault” involving a wrong reference in the depreciation function). Like the work in [44], the error classification in [152] does not differentiate between mistakes and faults.
With respect to spreadsheet faults, the work in [153] distinguished between location faults and formula faults. A location fault is a fault in a formula that is conceptually correct but in which one or more cell references are wrong, while a formula fault is a misuse of operators or a wrong number of operators. On the other hand, two studies [42,137] developed a taxonomy specifically focused on spreadsheet mistakes (Fig.1). They defined quantitative mistakes as those which cause “immediate” incorrect output from spreadsheets, whereas qualitative mistakes are those which do not result in “immediate” wrong output but, rather, often represent poor design and coding practices (e.g., putting a constant instead of a cell reference into a formula). This definition of qualitative “mistakes” is debatable, because such practices do not necessarily produce failures; they are poor spreadsheet modeling practices and, strictly speaking, not mistakes. Note that, in [154], latent errors are defined as those that “do not directly cause the error and occur upstream of the event”. Thus, the latent errors of [154] are essentially the same in nature as the qualitative “mistakes” of [42,137].
Extending this taxonomy, the study in [155] added two more “mistakes”: jamming (values of more than one variable are placed in a single cell) and duplication (information of a variable is duplicated in the spreadsheet, possibly resulting in data inconsistency). As with the qualitative “mistakes” of [42,137], classifying jamming and duplication (which are unrecommended practices) as mistakes is controversial. Similarly, the study in [156] developed a taxonomy that classifies the qualitative “mistakes” in Fig.1 into four subtypes: formula integrity, semantics, extendibility, and logic transparency (see Fig.2). With respect to the qualitative “mistakes” and quantitative mistakes in Fig.1, an experiment involving desk checking was performed [157]. This experiment found that: (a) quantitative mistakes were more easily detected than qualitative “mistakes”; and (b) the detection rate depended on the type and prominence of the mistakes (e.g., whether they were conspicuous) as well as on prior incremental practice with spreadsheet error detection [157].
6.4.7 Impact of spreadsheet calculation paradigms
It was argued in [
158] that a cause of spreadsheet errors is the low conceptual level of spreadsheets, e.g., lack of abstraction and modularity mechanisms. This study [
158] compared two spreadsheet calculation paradigms:
traditional (e.g., Excel) versus
structural. The structural paradigm operates at a higher conceptual level, utilizing goals, plans, and spreadsheet data structures in computation [
159]. The study [
158] found that the two paradigms produced different error behaviors.
6.4.8 Impact of spreadsheet faults
In [
160], the financial impacts of faults in 25 operational spreadsheets were investigated. The study identified 117 genuine faults, of which 70 (60%) had financial impact and the remaining 47 (40%) did not. The largest financial impact of a single fault was over $100 million, followed by several faults whose impact exceeded $10 million each. A similar study [
161] was performed to investigate the impact of spreadsheet errors in an Irish healthcare setting, and found that more than 90% of the spreadsheets studied were faulty and the cell-error rate was 13%.
6.4.9 Debugging
Debugging effort and performance Some researchers (e.g., [
162]) argued that the human-computer interaction (HCI) literature tended to emphasize the strengths of spreadsheets (e.g., quick gratification of immediate users’ needs) but often neglected their weaknesses, one of which is the difficulty of debugging. Some other researchers (e.g., [
46]) reported that users did not spend much time in a separate debugging stage, thereby reducing the effectiveness of this task.
With respect to the performance and behavior of expert and novice end users in spreadsheet debugging, an experiment [
163] yielded the following important results. First, on average, the debugging performance of experts (72%) was 14 percentage points higher than that of novices (58%), and this difference was statistically significant, especially for formula faults. Second, regarding the debugging behavior of both types of users in terms of coverage per cell, formula cells took precedence over data cells, with a greater percentage of expert users examining each cell. Also, summation formula cells and bottom-line value cells attracted more attention than other formula cells. Furthermore, formula cells generating
text values were not checked as much as formula cells generating
numeric values. Third, using a debugging tool that gives feedback on cell coverage resulted in slightly higher debugging performance (62% for the experimental group versus 59% for the control group). While this difference was not statistically significant, the cell-coverage feedback provided by the debugging tool greatly increased cell-checking rates.
Debugging strategies, models, and techniques For effective debugging, the task should be well supported by associated strategies, models, and techniques. In view of this, a sensemaking model was derived to analyze how end users went about debugging their spreadsheets [
164,
165]. The study in [
164] found the dominance of information foraging, which occupied half to two-thirds of participants’ time. This study also identified
selective (striving for efficiency by following contextually salient cues) and
comprehensive (seeking a more complete understanding of the problem) styles as two successful strategies for traversing the sensemaking model. The above findings thus revealed important implications for the design of spreadsheet tools to support end-user developers’ sensemaking during debugging.
Model-based software debugging (MBSD) is a well-known technique for fault localization in software written in common programming languages such as C++ and Java. One study [
166] empirically analyzed and compared the use of three MBSD models in locating spreadsheet faults: the value-based model, the dependency-based model, and the novel dependency-based model [
167]. Since this study [
166] found that each of the three models has its merits and drawbacks, combining them to produce a more effective bug-diagnosis partitioning technique should be considered. Another study [
168] introduced a model-based diagnosis for spreadsheet debugging by translating the relevant parts of a spreadsheet to a constraint satisfaction problem for performing further reasoning (e.g., about input propagations).
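As a minimal sketch of this translation (our own construction using the z3 solver, not the system from [168]), the following encodes a two-formula spreadsheet as constraints and asks which single formula, if assumed abnormal, reconciles the model with the user's expected output.

```python
# A minimal sketch of model-based spreadsheet diagnosis with the z3 solver;
# the spreadsheet, values, and single-fault assumption are ours, not [168]'s.
from z3 import Ints, Bool, Solver, Implies, Not, sat

a1, a2, b1, b2 = Ints('a1 a2 b1 b2')
ab_b1, ab_b2 = Bool('ab_b1'), Bool('ab_b2')   # "abnormal" flag per formula cell

base = [a1 == 3, a2 == 4,                     # input cells
        Implies(Not(ab_b1), b1 == a1 + a2),   # B1 = A1 + A2, unless B1 faulty
        Implies(Not(ab_b2), b2 == b1 * 2),    # B2 = B1 * 2, unless B2 faulty
        b2 == 20]                             # the user's expected output

# Single-fault diagnosis: which one formula, assumed faulty, makes the
# constraint system satisfiable again?
for faulty, healthy in [(ab_b1, ab_b2), (ab_b2, ab_b1)]:
    s = Solver()
    s.add(*base)
    s.add(faulty, Not(healthy))
    if s.check() == sat:
        print(f'{faulty} is a candidate single-fault diagnosis')
# Here both B1 and B2 are reported as single-fault candidates.
```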
Fault localization is an important task for debugging. We noted several studies on fault localization. The first study [
169] investigated three fault localization techniques: blocking, nearest consumers, and test count. This study found empirically that two individual factors, information base and mapping, both had a significant impact on the effectiveness of these three techniques. The second study [
170] proposed several ways to combine reasoning from UCheck [
113-
115] and WYSIWYT [
13,
142-
144] for locating faults and mapping the fault information visually to the users. This combined technique was reported to be more effective than either technique alone [
170]. The third study [
171] investigated the impact of different similarity coefficients on the accuracy of spectrum-based fault localization (SFL, which is a software debugging approach originally developed for traditional procedural or object-oriented programming languages) applied to spreadsheets. This study [
171] identified three of the 42 studied similarity coefficients as the best ones to diagnose spreadsheets with SFL. The fourth study [
172] proposed a fragment-based spreadsheet debugging approach, based on the rationale that, after partitioning a spreadsheet into fragments, users find it easier to assess the correctness or faultiness of each smaller fragment than of the entire spreadsheet.
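To illustrate the mechanics of SFL on spreadsheets, the sketch below ranks formula cells by the Ochiai coefficient, a well-known similarity coefficient from the SFL literature (the cells and test outcomes are hypothetical).

```python
# A minimal sketch of spectrum-based fault localization over spreadsheet
# cells, using the Ochiai coefficient; cells and test outcomes are made up.
from math import sqrt

# Per test case: the formula cells involved in computing the checked output,
# and whether the test passed.
spectra = [({'B1', 'B2', 'C1'}, False),
           ({'B1', 'C1'}, True),
           ({'B2', 'C1'}, False)]

total_failed = sum(1 for _, passed in spectra if not passed)
for cell in sorted(set().union(*(inv for inv, _ in spectra))):
    n_ef = sum(1 for inv, ok in spectra if cell in inv and not ok)
    n_ep = sum(1 for inv, ok in spectra if cell in inv and ok)
    denom = sqrt(total_failed * (n_ef + n_ep))  # Ochiai: n_ef / sqrt(F*(n_ef+n_ep))
    print(cell, round(n_ef / denom, 3) if denom else 0.0)
# B2 ranks highest: it appears in every failing run and in no passing one.
```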
Debugging tools A debugging tool, called GoalDebug, was developed to propagate the change from an old, incorrect value to a new, expected value backward over the formula to its arguments and referenced cells, deriving formula changes and generating change suggestions for the developer [
173]. Similarly, an interactive debugging tool, called EXQUISITE, based on artificial intelligence was developed [
174]. In addition, the role of visualization tools in spreadsheet debugging and fault detection was studied in [
175]. It was reported that developers using a visualization tool that facilitates chaining to trace formula cell references were significantly faster in identifying and correcting linking faults than developers not using the tool.
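The backward-propagation idea behind GoalDebug can be sketched as follows (a simplified illustration of ours, not the tool's actual algorithm; the formula and values are hypothetical): an expected value for an output cell is pushed back over its formula, yielding one candidate repair per argument plus a formula-change option.

```python
# A simplified sketch of backward propagation in the style of GoalDebug;
# the formula, cell names, and values are hypothetical.
def repair_suggestions(a1, b1, expected_c1):
    """For C1 = A1 + B1, derive candidate changes making C1 == expected."""
    return [f'set A1 to {expected_c1 - b1} (currently {a1})',
            f'set B1 to {expected_c1 - a1} (currently {b1})',
            'change the formula in C1 (currently =A1+B1)']

for suggestion in repair_suggestions(a1=3, b1=4, expected_c1=10):
    print(suggestion)
```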
Furthermore, two interactive tools for detecting data dependencies in a spreadsheet were developed and evaluated [
176]. One of them is an online tool which displays flowchart-like diagrams, while the other tool interactively depicts cell dependencies directly on the spreadsheet. It was found that users generally preferred the second tool though both tools were considered helpful [
176]. Recently, a study [
177] reported that, although debugging tools are useful, over-reliance on them can reduce fault identification performance: users of debugging tools might focus solely on the potential problems pinpointed by the tools, paying less attention to other parts of the spreadsheet.
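Returning to the dependency tools of [176]: both rest on first recovering the cell-dependency graph from the formulas, a step that can be sketched as follows (our own illustration, using a regular expression over A1-style references; the formulas are hypothetical).

```python
# A minimal sketch of recovering the cell-dependency graph that such
# visualizations are built on; not the implementation from [176].
import re

formulas = {'B1': '=A1+A2', 'B2': '=B1*2', 'C1': '=SUM(B1:B2)'}
ref = re.compile(r'\b([A-Z]+[0-9]+)\b')   # matches A1-style cell references

depends_on = {cell: set(ref.findall(f)) for cell, f in formulas.items()}
print(depends_on)
# {'B1': {'A1', 'A2'}, 'B2': {'B1'}, 'C1': {'B1', 'B2'}}
```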
6.5 Usage and maintenance
Generally, users of enterprise (non-spreadsheet) applications do not need to know the tools or programming languages used for developing the applications. By contrast, effective use of a spreadsheet by an end user (including the management) who is not the developer appears to require substantial spreadsheet knowledge [
17]. Thus, problems with the quality of spreadsheet applications can be partially offset by the end user’s spreadsheet knowledge [
17]. In this regard, a training program should be developed to raise the effectiveness of using and accepting end-user applications, including spreadsheets [
178]. It was also reported that companies generally offered more user training for high-risk spreadsheet applications [
19].
Some studies [
45,
88] recommended employing specific controls in the use of spreadsheets, such as: (a) documenting processing logic in detail to cater for the future need of reconstructing formulae, (b) preparing user manuals that describe the general context, input, processing, output, and backup and recovery procedures, (c) timestamping outputs to safeguard data integrity, (d) formulating explicit error detection and handling procedures, and (e) implementing security measures to prevent data loss.
Like most conventional software, spreadsheets are subject to software evolution or maintenance. However, this task is often not supported by version management tools, which can lead to new mistakes or faults being introduced during evolution. In view of this problem, a spreadsheet corpus (known as VEnron) was developed to provide useful version information on spreadsheet evolution [
179]. Also, during the maintenance phase, end users often need to modify a large (legacy) spreadsheet that was developed by others and whose functionality they do not understand. To address this problem, several techniques were proposed in [
180], including: (a) reverse engineering techniques for deriving relational models from existing spreadsheets, and (b) data mining techniques for reasoning about spreadsheet data and inferring functional dependencies among columns (these dependencies being the building blocks to infer the spreadsheet business model as a database schema).
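As an illustration of the second kind of technique, the sketch below (hypothetical data, not the algorithm of [180]) checks the simplest form of functional dependency: column X determines column Y if no X value maps to two different Y values.

```python
# A minimal sketch of a functional-dependency check over spreadsheet rows;
# the column names and data are hypothetical.
def determines(rows, x, y):
    """Return True if column x functionally determines column y."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False   # same x value mapped to two different y values
    return True

rows = [{'dept': 'HR', 'site': 'NY'},
        {'dept': 'IT', 'site': 'SF'},
        {'dept': 'HR', 'site': 'NY'}]
print(determines(rows, 'dept', 'site'))   # True: dept -> site holds here
```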
7 Analysis, discussions, implications, and future research
7.1 Analysis and discussions
We call the 122 selected journal and conference papers related to spreadsheet QA the “primary” papers, and the other relevant ones listed in the References Section the “secondary” papers. We counted the number, P, of the primary papers as well as the number, T, of both primary and secondary papers related to the overall spreadsheet life cycle. We also performed similar counts for papers on individual spreadsheet life-cycle stages or substages. A paper may be counted more than once if it addresses multiple issues or areas. Tab.6 shows all these P and T values.
Overall, at the level of life-cycle stages, the values for
P range from 1 to 81, and the values for
T range from 2 to 107. We found that the life-cycle stages “Problem and Scope Identification” and “Usage and Maintenance” have the smallest
P and
T values. Consider the “Problem and Scope Identification” stage. There are two plausible reasons contributing to its small
P and
T values. First, Ronen et al. [
44] argued that problem identification in spreadsheet development is similar to the task “Studying Existing Systems” in the traditional waterfall model used for implementing non-spreadsheet applications. As problem identification is largely independent of the development platform, existing studies related to problem identification for non-spreadsheet applications are likely to be applicable to spreadsheet development. Consequently, the demand is not high for additional studies that specifically address problem identification for spreadsheets. Second, unlike professional programming, where users and developers are different groups of people, users often undertake the spreadsheet development work themselves. Obviously, for such end users, problems and requirements are both more easily understood (because the problems and requirements are their own) and more likely to change (because end users may need to negotiate such changes only with themselves) [
16,
181]. Thus, problem and requirement identification for spreadsheet development are relatively more implicit. The above two reasons may result in fewer studies related to the “Problem and Scope Identification” stage.
In Section 6.5 above, we discussed seven studies [
17,
19,
45,
88,
178-
180] related to the “Usage and Maintenance” stage. From them, we identified several spreadsheet QA techniques and controls, such as input and output controls, documentation, backup and recovery, user training, and change/version management. We noted not only that very few of our collected papers have addressed the relevant techniques and controls of this stage, but also that four of these few papers [
17,
19,
45,
88] merely touched on these techniques and controls without in-depth discussion.
The “Testing and Debugging” stage has the largest values of P (= 81) and T (= 107), much larger than the P (= 26) and T (= 37) values of the “Specification, Modeling, and Design” stage. Our further investigation of the “Testing and Debugging” stage found that, overall, the three most popular research areas are: static testing (P = 30, T = 44), dynamic testing (P = 10, T = 13), and debugging (P = 17, T = 20). For each of these three areas, we also counted the numbers and percentages of relevant studies in two categories: technical-oriented IT research and business-oriented IS research. The results are shown in Tab.7. We observed that, across all three areas, the numbers and percentages of relevant studies in the technical-oriented IT category are much larger than those in the business-oriented IS category. A major reason is that the software engineering community has produced many technical-oriented studies on the development of methodologies, techniques, and tools for these three areas. In contrast, most of the relevant studies in the business-oriented IS category primarily focused on the performance evaluation (via experiments and case studies) and application aspects of testing and debugging.
We also made an interesting observation regarding the area “Spreadsheet Error Classifications and Taxonomies” (see Section 6.4.6). Among the eight relevant papers (
P = 7,
T = 8), two of them [
44,
153] came from technical-oriented IT journals, four of them [
42,
152,
155,
156] from business-oriented IS journals or conferences, one of them [
137] from a dual-focused IT/IS journal (
Decision Support Systems (
DSS)), and the remaining one [
157] from an MS/OR journal (
Omega). Furthermore, the two papers, one from
DSS and the other from
Omega, were written by IS researchers. Thus, overall, the majority of papers related to spreadsheet error classifications and taxonomies are in the business-oriented IS category. We assert that most IS researchers pay relatively less attention to the IEEE terminology than software-engineering researchers do. This could be a reason why many IS researchers have not differentiated among mistakes, faults, and failures in their error classifications and taxonomies. Although their practices may be appropriate in their own study contexts, it would be desirable for future studies in this area to follow the well-defined IEEE terminology, in order to avoid potential confusion and to generate a greater impact in the spreadsheet community.
7.2 Implications and suggestions for future research
In Section 7.1, it is mentioned that the values of
P and
T are relatively small for the life-cycle stage “Problem and Scope Identification”. One plausible reason mentioned there is that problem identification is not much different between spreadsheet and non-spreadsheet development [
44], implying that new studies on this life-cycle stage are not in high demand. This reasoning is debatable. For example, problem identification in spreadsheet development may involve a “make or buy (existing template)” analysis [
44], and such an analysis may not exist in non-spreadsheet development projects. Hence, more studies on problem identification specifically related to spreadsheets should be performed.
Also, our analysis in Section 7.1 found that very few studies relate to the “Usage and Maintenance” stage. Thus, we recommend that more future studies be conducted on areas related to this stage (e.g., input and output controls, documentation, backup and recovery, user training, and change management). For example, with respect to user training on spreadsheets, several questions are worth exploring: What modes (e.g., in-class, online, on-the-job, and help-desk) should be used for cost-effective spreadsheet training? Under time constraints (e.g., a half-day training session), what training approaches and focuses (e.g., technical versus application-domain knowledge, model building, static and dynamic testing, testing analysis tools, debugging, and which kinds of cells, blocks, or structures in the spreadsheet should be spotlighted) would achieve the best outcomes for effective usage and maintenance?
Our observation that the P and T values of the “Specification, Modeling, and Design” stage are much smaller than the corresponding values of the “Testing and Debugging” stage also has an important implication. The former stage involves developing spreadsheets with higher quality and fewer faults (i.e., fault prevention), whereas the latter stage mainly focuses on fault detection and removal. As prevention is better than cure, more future research should be performed to develop new and promising strategies, approaches, and techniques for spreadsheet specification, modeling, and design, as well as to determine how to apply them effectively in industry.
Section 6.2.3 has discussed the application of TDD to spreadsheet development [
11,
75,
76]. TDD has been advocated by many software researchers and practitioners in
non-spreadsheet domains for developing quality software. TDD is an iterative development approach with a “test early and continuously” attitude, so that developers need to design applications with testing in mind [
182-
184]. TDD offers many benefits. For example, it can help developers catch errors early, well before a separate testing phase commences, and also helps clarify the design. Despite these merits, we have not noted any studies other than [
11,
75,
76] (which were published by the same group of researchers) that apply TDD to spreadsheet development. Thus, a strong need exists to investigate and explore this issue in future research. A key concept in TDD is that design and testing should not be two distinct tasks performed separately; rather, there is a close relationship between the two. Following this logic, it is also worthwhile to strengthen and extend the few existing studies (e.g., [
71]) investigating which spreadsheet errors are likely to be reduced by each design practice or technique.
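To give a flavor of what this could look like, the sketch below shows a hypothetical TDD-style workflow for a spreadsheet model (our own illustration, not the approach prescribed in [11, 75, 76]): the intended cell logic is mirrored as a Python function so that a failing unit test can be written before the logic itself.

```python
# A hypothetical TDD-style sketch for a spreadsheet calculation: the test is
# written first and fails ("red") until the mirrored logic is implemented.
def straight_line_depreciation(cost, salvage, years):
    raise NotImplementedError  # "red" step: body written only after the test
    # "green" step would replace the line above with:
    # return (cost - salvage) / years

def test_straight_line_depreciation():
    assert straight_line_depreciation(10_000, 1_000, 9) == 1_000
```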
We also note an important finding related to dynamic testing of spreadsheets. Although some techniques (e.g., the WYSIWYT methodology [
13,
142-
144], an automated test case generation based on backward propagation and solution of constraints on cell values [
147], and metamorphic testing (MT) [
20,
139]) have been proposed to systematically generate test cases for more comprehensive spreadsheet testing, studies [
91,
92] found that many spreadsheet developers often work with an ad-hoc set of example inputs during development. A plausible reason is that spreadsheet users are unwilling to invest time in the specification of test cases [
168]. Thus, more work needs to be done to educate spreadsheet developers and to promote the use of more systematic test case generation techniques for spreadsheets. Also, current spreadsheet environments do not support the “explicit” management of test cases [
168]. To deal with this issue, the EXQUISITE framework was developed with a test case management feature [
168]. Through this framework, test cases can be defined directly in the spreadsheet. We suggest that more tools be developed for test case management in spreadsheet environments.
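To illustrate how such a technique generates checkable test cases without a full oracle, below is a minimal sketch of one metamorphic relation (the stand-in function and data are hypothetical): for a SUM-like computation, permuting the input cells must not change the total, even when the correct total is unknown.

```python
# A minimal sketch of a metamorphic relation for a SUM-like spreadsheet
# computation; the stand-in function and data are hypothetical.
import random

def spreadsheet_total(values):       # stand-in for the formula under test
    return sum(values)

inputs = [120, 37, 81, 4]
shuffled = random.sample(inputs, k=len(inputs))
assert spreadsheet_total(inputs) == spreadsheet_total(shuffled), \
    'metamorphic relation violated: permuting inputs changed the total'
```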
Nowadays, with the advent and popularity of the Microsoft Office Suite, a growing number of more sophisticated spreadsheets are integrated with Visual Basic for Applications (VBA) [
185]. This integration introduces two new sources of quality problems for spreadsheet outputs: the VBA module and its interface with the spreadsheet. Similarly, Excel spreadsheets can now be integrated with the Power BI platform to provide enterprise business-intelligence (BI) capabilities [
186]. As a result, the issue of spreadsheet quality is no longer restricted to the spreadsheet itself, but is also extended to its interface with other modules or platforms. Thus, a more holistic view of spreadsheet quality is necessary.
Although spreadsheets provide data analysis and computation capability, we noted that in some situations, spreadsheets are primarily used as a data repository [
187-
190]. For example, in [
190], spreadsheets were used as a universal data repository to unify the heterogeneous data sources in a distributed enterprise environment. In [
191], Excel was compared with Access (another Microsoft software product) in terms of their relative strengths for storing large amounts of data. It was argued in [
191] that if the goal is to maintain data integrity in a format that can be accessed by multiple users, Access is a better choice. On the other hand, although spreadsheets have some data analysis functions, Broman and Woo [
192] recommended restricting spreadsheets to data entry and storage only, with analysis and visualization handled by a separate program. This practice helps reduce the chance of contaminating or destroying the raw data in the spreadsheet. In view of the use of spreadsheets for data storage, several data-related issues need to be considered, such as data quality, data integrity, data integration, data interoperability, and a well-defined data schema. More research along this direction should be conducted.
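In that spirit, the following minimal sketch (the schema, column meanings, and file name are hypothetical) enforces a simple per-column type schema on a spreadsheet used purely as a data repository, using the openpyxl Python library.

```python
# A minimal sketch of schema validation for a data-repository spreadsheet;
# the schema, column meanings, and file name are hypothetical.
from openpyxl import load_workbook

schema = {'A': int, 'B': str, 'C': float}      # expected type per column

ws = load_workbook('repository.xlsx').active
for row in ws.iter_rows(min_row=2):            # row 1 holds the headers
    for cell in row:
        expected = schema.get(cell.column_letter)
        if expected and not isinstance(cell.value, expected):
            print(f'{cell.coordinate}: expected {expected.__name__}, '
                  f'got {type(cell.value).__name__}')
```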
The above major discussions in Section 7 are summarized in Tab.8.
8 Conclusion and limitations
Spreadsheets will continue to be a popular implementation platform for many decision support systems, intelligent systems, and expert systems in various application domains. Although spreadsheets are handy to develop and use, their inherently free-form development style may result in many mistakes and, hence, faults being introduced at various stages of the development life cycle. Thus, to effectively address spreadsheet QA issues, a holistic life-cycle approach is recommended. To the best of our knowledge, this paper is one of the first of its kind to integrate the major research studies on spreadsheet QA from the technical-oriented IT community, the business-oriented IS community, and the MS/OR community.
Almost all literature reviews have their limitations, and ours is no exception. Our literature search did not strictly follow a systematic approach. Moreover, even though our review is fairly extensive in terms of the survey time period, the number of search terms, the target journals and conferences, and other supporting sources, the collection of relevant papers will never be entirely complete in a broader sense. In other words, some relevant papers might have been inadvertently omitted from our review because they do not satisfy our inclusion criteria (e.g., they were published in a journal outside our list). Despite these limitations, this paper should serve as a handy reference for a comprehensive overview and big picture of the research work done in the important area of spreadsheet QA.