1. Department of Structures, Av. Brasil 101, Lisbon 1700-066, Portugal
2. Technical Center for Bridge Engineering, CEREMA, B.P. 214, Provins Cedex 77487, France
3. Materials and Structures Department, IFSTTAR, University Paris-Est, 14-20 Boulevard Newton, Champs sur Marne, Marne la Vallée Cedex 2, F-77447, France
4. Department of Civil Engineering, Technical University Lisbon, Avenida Rovisco Pais, Lisbon 1096, Portugal
christian.cremona@cerema.fr
History: Received 2014-08-12; Accepted 2014-09-15; Revised 2014-12-12; Published 2015-04-02
Abstract
A large number of studies have recently been performed applying statistical and machine learning techniques to vibration-based damage detection. However, the global character inherent to the limited number of modal properties issued from operational modal analysis may not be appropriate for detecting early-damage, which generally has a local character.
The present paper aims at detecting this type of damage by using static SHM data and by assuming that early-damage produces dead load redistribution. To achieve this objective, a data-driven strategy is proposed, consisting of the combination of advanced statistical and machine learning methods such as principal component analysis, symbolic data analysis and cluster analysis.
From this analysis it was observed that, under the noise levels measured on site, the proposed strategy is able to automatically detect stiffness reductions in stay cables as small as 1%.
João Pedro SANTOS, Christian CREMONA, André D. ORCESI, Paulo SILVEIRA, Luis CALADO.
Static-based early-damage detection using symbolic data analysis and unsupervised learning methods.
Front. Struct. Civ. Eng., 2015, 9(1): 1-16. DOI: 10.1007/s11709-014-0277-3
Structural Health Monitoring (SHM) can be defined as the development and application of strategies to identify abnormal behaviors (such as damage) in structural systems [1]. In civil engineering structures, damage may lead to expensive maintenance actions and, if it occurs with significant magnitude, may result in dramatic social and human consequences. An effective SHM strategy should therefore aim at identifying damage at an early stage, which is generally related to local phenomena of small magnitude.
Damage identification has been extensively studied in the framework of mechanical, aerospace and civil engineering structural systems by using model-based or data-driven approaches [1,2]. The first type typically aims at identifying damage by fitting a numerical model to real data, a procedure which is usually time-consuming and therefore must be combined with optimization techniques. Conversely, data-driven approaches are more flexible and less expensive since they do not require the development of structure-specific models. This type of method is therefore addressed in this paper.
Damage identification has been described in the literature as a four-level scale [3]: 1) damage detection, 2) damage localization, 3) type and severity assessment and 4) lifetime prediction update. While the first and second levels can be carried out by data-driven methods alone, the third and fourth require the use of numerical models, and the last may also require local non-destructive testing and additional theoretical concepts such as fracture mechanics or fatigue analysis [4]. This paper is mainly focused on the first level of this scale (damage detection) by means of data-driven techniques: early-damage detection is targeted. To fulfil this objective, three main tasks are required [5] and addressed herein: 1) feature extraction, 2) data normalization and 3) statistical classification.
Damage-sensitive feature extraction is mandatory for early-damage detection approaches since the acquired data, alone, may not be informative about the presence of damage. Modal-based quantities are by far the most reported features in the literature [6], yet autoregressive models [7,8] and wavelet transforms [7,9] have also been reported as damage-sensitive feature extractors for both static and dynamic monitoring. Principal Component Analysis (PCA) is usually applied after feature extraction for dimensionality reduction. However, Posenato et al. [7] showed that damage-sensitive features can also be obtained from this multivariate statistical method in the realm of static monitoring. Symbolic data has been shown to be useful in compressing and representing data without loss of information [10] and has proved to be a sensitive feature extractor in SHM works applied to bridges [11].
Data normalization can be defined as the process of separating data changes caused by environmental and operational effects from those caused by damage occurrence [12]. This process is crucial for early-damage detection and false alarm prevention since environmental conditions, such as temperature, may impose larger variations than those due to damage [13,14]. In large structures, normalization strategies are generally based on linear or nonlinear regression methods such as neural networks [14,15], support vector regression [13,16] and linear regression [17,18]. Latent variable methods have also been used recently for data normalization and present the advantage of intrinsically accounting for actions such as temperature without the need to measure them. Among these methods, PCA [19,20] and one of its nonlinear counterparts, the auto-associative neural network [21,22], have lately shown great effectiveness and attracted much attention.
Statistical classification aims at distinguishing damage-related from undamaged-related data and can be divided into supervised and unsupervised methods. Since data obtained from damaged structures are scarce or nonexistent, unsupervised techniques such as statistical process control (outlier detection) have been the most reported for damage detection purposes [19,21,23]. Cluster analysis can be seen as an alternative to outlier detection since it allows distinguishing different groups in data without the need to define reference baselines during which the structure must be assumed undamaged and unchanged. Even though cluster analysis has been reported as an efficient damage detection approach [24,25], its computational complexity has discouraged its use in the SHM of large civil structures. To circumvent this disadvantage, symbolic dissimilarity measures have been included in cluster analysis to provide computational efficiency [18,26,27].
The large majority of damage detection methods are based on vibration monitoring [4-6,21,26]; they assume that damage produces changes in structural stiffness and use modal quantities (frequencies, mode shapes or damping ratios) to detect it. The global character of these features makes damage detection algorithms less sensitive to early-damage, which has, in general, a local character [1,5,28]. Furthermore, dynamic SHM systems are extremely expensive and sensitive to ambient noise. Static-based SHM systems can be deployed at lower cost, yet few works report damage detection based on their use. Recently, numerical detection and localization were performed under the principle that damage produces changes in the measured effects generated by dead loads [1,29]. However, this method has not gained attention in the active SHM fields of mechanical or aerospace systems since its application requires that dead load effects prevail significantly over the remaining ones [1], a condition which, in general, is only observed in civil engineering structures.
The present paper describes an original early-damage detection strategy which, unlike those found in the literature, does not require the definition of baseline references and therefore does not require that the monitored structure be unchanged and undamaged during a certain period of time. This milestone is achieved by conducting feature extraction, data normalization and statistical classification using only unsupervised methods such as PCA, cluster analysis and symbolic dissimilarities. Another original feature of the proposed strategy is its sensitivity to early-damage (with local character), which is obtained by combining clustering methods with static effects data generated by dead load redistributions. The use of static effects within the proposed strategy also renders its application to practical case studies less expensive, thus promoting its practical use.
To test and validate the effectiveness of the developed methodology, data acquired from a cable-stayed bridge were used as input in finite element time-history analyses comprising damage simulations. After this introduction, section 2 describes the theoretical background of the unsupervised learning methods applied herein, and section 3 describes the cable-stayed bridge, its monitoring system and the numerical simulations conducted. In section 4 the proposed strategy is presented and applied, and in section 5 the main conclusions obtained from this work are drawn.
Principal component analysis

Principal Component Analysis (PCA), or Karhunen-Loève transform, is a well-known multivariate statistical method which allows obtaining, from a group of correlated variables, a set of linearly uncorrelated vectors called principal components or scores [30]. In static SHM, where measurements are highly correlated, this method can be useful to distinguish, without significant computational complexity, the uncorrelated ("independent") effects generated by different loads acting on a structure.
Let us consider a centered data matrix $X \in \mathbb{R}^{n \times p}$, with $n$ measurements performed by $p$ sensors. The PCA consists of a linear mapping between the original variables, $X$, and the principal components, $T$, as follows:

$$T = X\Phi \qquad (1)$$

where the orthonormal linear transformation matrix, $\Phi$, is given by the solution of an eigenproblem on the correlation matrix $\Sigma$ (built from the measured data $X$), as shown in Eq. (2):

$$\Sigma\Phi = \Phi\Gamma \qquad (2)$$

The matrix $\Gamma$ is a diagonal matrix whose non-negative entries are the eigenvalues of the correlation matrix. The transformation shown in Eq. (1) is defined such that each principal component (column vector of $T$) captures the highest possible variance under the constraint of orthogonality to the preceding ones. Hence, the elements of the diagonal of $\Gamma$, which are usually named "active energies" and express the relative importance of each principal component in the entire data set's variation [19], are usually placed in descending order.
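As an illustration, the mapping of Eqs. (1) and (2) can be sketched with NumPy. The synthetic sensor data below (a common temperature-like driver plus independent noise) is an assumption used only for demonstration and does not correspond to the bridge measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: n = 500 hourly readings from p = 4 correlated sensors
# (a common temperature-like driver plus independent noise).
n, p = 500, 4
driver = rng.normal(size=n)
X = np.column_stack([c * driver + 0.1 * rng.normal(size=n)
                     for c in (1.0, 0.8, -0.6, 0.4)])

# Centre and scale so that Xc'Xc/(n-1) is the correlation matrix of X.
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = Xc.T @ Xc / (n - 1)

# Eq. (2): eigenproblem on the correlation matrix.
eigvals, Phi = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]     # "active energies" in descending order
eigvals, Phi = eigvals[order], Phi[:, order]

# Eq. (1): principal components (scores).
T = Xc @ Phi

print(eigvals / eigvals.sum())        # the first component dominates
```

Here `eigh` solves the eigenproblem of Eq. (2); sorting the eigenvalues in descending order reproduces the usual ordering of the "active energies".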
Symbolic data objects and dissimilarity measures
Symbolic data can be defined as a richer, less voluminous and less specific type of information when compared to classical data [11,18,26]. While classical data mining focuses on detecting groups or patterns in individual measurements, symbolic data deals with concepts, which must be properly defined to statistically describe the analyzed data. For instance, a week of hourly measurements acquired from 15 sensors can be described by the 2520 individual measurements, or it can consist of a single symbolic object named "week of acquired data." This object must be described by statistical quantities such as histograms or interquartile intervals, providing data compression without significant loss of generality or information [10]. In the case of interquartile intervals, a week of data is reduced to only 30 values (15 intervals). This type of statistical quantity has proved to be sensitive in detecting structural changes [11,18,26] and is consequently used in the present paper.
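The compression described above can be sketched as follows; the sensor count matches the example in the text, while the values themselves are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# One week of hourly readings from 15 sensors: 7 * 24 * 15 = 2520 values.
hours, sensors = 7 * 24, 15
week = rng.normal(loc=20.0, scale=2.0, size=(hours, sensors))

# Symbolic object "week of acquired data": one interquartile interval
# [Q1, Q3] per sensor, i.e. 15 intervals = 30 values instead of 2520.
q1 = np.percentile(week, 25, axis=0)
q3 = np.percentile(week, 75, axis=0)
symbolic_week = np.column_stack([q1, q3])

print(week.size, "->", symbolic_week.size)   # 2520 -> 30
```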
The effectiveness of symbolic data analysis in SHM relies heavily on the definition of symbolic dissimilarities, or distances, between data objects [10,31]. These measures supply numerical values which reflect the distance between a pair of data objects: the lower these values are, the more similar the objects are according to their intrinsic features; conversely, the objects with the greatest discrepancies are the ones exhibiting the largest distances between them. A distance measure can therefore be used to quantify similarities as well as dissimilarities in data. Dissimilarity measures can take a variety of forms and some applications might require specific ones. Their choice can, therefore, greatly influence the effectiveness of the damage detection strategy in which they are included [32].
For the present work, three distinct symbolic dissimilarity measures were considered: the Normalized Euclidean Ichino-Yaguchi distance [33], the Gowda-Diday dissimilarity measure [34] and the Normalized Euclidean Hausdorff distance [31]. Their theoretical background can be described by considering two symbolic objects $T_i$ and $T_j$, obtained from a data set of symbolic objects and described, for each of the $p$ variables, by the interquartile intervals $A_l = [\underline{a}_l, \overline{a}_l]$ and $B_l = [\underline{b}_l, \overline{b}_l]$, respectively. The index $l = 1, \ldots, p$ stands for each of the variables used to define the dissimilarity measure, $|A_l| = \overline{a}_l - \underline{a}_l$ denotes the length of an interval, and $|Y_l|$ denotes the length of the domain of variable $l$. The join and meet operators, $\oplus$ and $\otimes$, are defined by:

$$A_l \oplus B_l = [\min(\underline{a}_l, \underline{b}_l), \max(\overline{a}_l, \overline{b}_l)], \qquad A_l \otimes B_l = A_l \cap B_l$$

The Gowda-Diday dissimilarity measure, $d_{GD}$, defined between the pair of objects $T_i$ and $T_j$, is given by:

$$d_{GD}(T_i, T_j) = \sum_{l=1}^{p} \left[ D_p(l) + D_s(l) + D_c(l) \right]$$

where the position, span and content components are, respectively,

$$D_p(l) = \frac{|\underline{a}_l - \underline{b}_l|}{|Y_l|}, \qquad D_s(l) = \frac{\big| |A_l| - |B_l| \big|}{|A_l \oplus B_l|}, \qquad D_c(l) = \frac{|A_l| + |B_l| - 2|A_l \otimes B_l|}{|A_l \oplus B_l|}$$

The Normalized Euclidean Ichino-Yaguchi distance measure can be calculated as follows:

$$d_{IY}(T_i, T_j) = \sqrt{\frac{1}{p} \sum_{l=1}^{p} \left( \frac{\phi(A_l, B_l)}{|Y_l|} \right)^2}$$

where

$$\phi(A_l, B_l) = |A_l \oplus B_l| - |A_l \otimes B_l| + \gamma \left( 2|A_l \otimes B_l| - |A_l| - |B_l| \right)$$

and where $\gamma$ is a pre-specified constant ranging from 0 to 0.5.

The Normalized Euclidean Hausdorff distance is given by:

$$d_{H}(T_i, T_j) = \sqrt{\frac{1}{p} \sum_{l=1}^{p} \left( \frac{\max\left( |\underline{a}_l - \underline{b}_l|, |\overline{a}_l - \overline{b}_l| \right)}{|Y_l|} \right)^2}$$
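Two of these measures, in their standard interval forms, can be sketched as follows; the interval values and domain lengths $|Y_l|$ below are illustrative assumptions:

```python
import numpy as np

def join_len(a, b):
    # |A (+) B|: length of the smallest interval containing both A and B.
    return max(a[1], b[1]) - min(a[0], b[0])

def meet_len(a, b):
    # |A (x) B|: length of the intersection of A and B (0 if disjoint).
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def hausdorff(a, b):
    # Interval Hausdorff distance: largest deviation of the two bounds.
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def ichino_yaguchi(a, b, gamma=0.5):
    la, lb = a[1] - a[0], b[1] - b[0]
    m = meet_len(a, b)
    return join_len(a, b) - m + gamma * (2.0 * m - la - lb)

def normalized_euclidean(obj_i, obj_j, per_variable, domain_lengths):
    # Aggregate a per-variable measure over the p interval variables.
    terms = [(per_variable(a, b) / y) ** 2
             for a, b, y in zip(obj_i, obj_j, domain_lengths)]
    return float(np.sqrt(np.mean(terms)))

# Two symbolic objects described by p = 2 interquartile intervals each,
# and the domain length |Y_l| of each variable (illustrative values).
Ti = [(0.0, 1.0), (2.0, 4.0)]
Tj = [(0.5, 1.5), (2.0, 4.0)]
Y = [10.0, 10.0]

d_h = normalized_euclidean(Ti, Tj, hausdorff, Y)
d_iy = normalized_euclidean(Ti, Tj, ichino_yaguchi, Y)
print(d_h, d_iy)
```

Identical objects yield a distance of zero, and normalizing by the domain lengths keeps variables with different scales comparable.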
Cluster analysis
Clustering methods
Clustering methods consist of unsupervised learning algorithms with the ability to classify objects as members of different subsets (or clusters). Unlike supervised learning algorithms such as decision trees or neural networks, clustering methods do not require previous information about the objects’ memberships, which are obtained according to the data’s intrinsic characteristics, or dissimilarities.
The aim of a clustering method can be defined as the division of a data set into groups, which must be as compact and separated as possible [32]. To fulfil this objective, allocation rules must be defined so that pair-wise dissimilarities between objects assigned to the same cluster tend to be smaller than those between objects allocated to different clusters [26]. Considering a given partition containing $K$ clusters, the within-cluster dissimilarity can be generally defined as [35]:

$$W = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} d_{ij} \qquad (16)$$

where $C(i)$ is an allocation rule which assigns element $i$ to cluster $k$, based on the dissimilarity measure $d_{ij}$. The total variation of the data can be defined as in Eq. (17), where $N$ is the total number of objects considered in the cluster analysis:

$$T = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij} \qquad (17)$$

Finally, the between-cluster dissimilarity, $B$, can be obtained by subtracting the other two, $B = T - W$.
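The decomposition of the total variation into within- and between-cluster parts can be sketched directly from a pairwise dissimilarity matrix; the matrix and labels below are illustrative:

```python
import numpy as np

def within_between(D, labels):
    # Within-cluster (W), total (T) and between-cluster (B) dissimilarity
    # from a symmetric pairwise dissimilarity matrix and cluster labels.
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    total = D.sum() / 2.0                       # T = (1/2) sum_ij d_ij
    same = labels[:, None] == labels[None, :]   # pairs with C(i) == C(j)
    W = D[same].sum() / 2.0                     # within-cluster dissimilarity
    return W, total, total - W                  # B = T - W

# Four objects forming two well-separated clusters (illustrative distances).
D = [[0, 1, 9, 9],
     [1, 0, 9, 9],
     [9, 9, 0, 2],
     [9, 9, 2, 0]]
W, T_tot, B = within_between(D, [0, 0, 1, 1])
print(W, T_tot, B)   # 3.0 39.0 36.0
```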
Once a set of clusters is defined, each can be described by its prototype, which generally consists of an object of the same type as the ones being clustered. This representation provides great data fusion while remaining sensitive, since it is based on the data's compactness and density [32]. In the present work, the location of each prototype is obtained by computing the centroid of the cluster's members. Considering a cluster $C_k$ with $M$ objects, each described by the interquartile intervals $[\underline{t}_m^{\,l}, \overline{t}_m^{\,l}]$, with $m = 1, \ldots, M$ and $l = 1, \ldots, p$, the centroid of cluster $C_k$ is then simply defined, variable by variable, by the interval of averaged bounds:

$$\bar{T}_k^{\,l} = \left[ \frac{1}{M} \sum_{m=1}^{M} \underline{t}_m^{\,l}, \; \frac{1}{M} \sum_{m=1}^{M} \overline{t}_m^{\,l} \right]$$
Several families of clustering methods can be found in the literature [32,35]; however, the most used are the combinatorial and the hierarchical methods. While the former are iterative in nature and require the input of an initial set of cluster prototypes (and their centroids), hierarchical methods provide a merging (or separation) hierarchy in which all partitions are defined, regardless of the number of elements [35]. This hierarchy can be displayed in a plot named a dendrogram, which allows for a clear visualization of the structure of high-dimensional data in a single and unambiguous plot. This type of plot remains one of the main advantages of hierarchical methods and an important reason for their use [35].
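A minimal sketch of hierarchical clustering on a precomputed dissimilarity matrix, using SciPy's agglomerative routines (the matrix below is illustrative; `Z` encodes the merging hierarchy that a dendrogram would display):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Pairwise dissimilarities between six symbolic objects (illustrative):
# objects 0-3 are mutually close; objects 4-5 form a distant group.
D = np.array([[0, 1, 1, 2, 9, 9],
              [1, 0, 1, 2, 9, 9],
              [1, 1, 0, 2, 9, 9],
              [2, 2, 2, 0, 9, 9],
              [9, 9, 9, 9, 0, 1],
              [9, 9, 9, 9, 1, 0]], dtype=float)

# Agglomerative clustering on the condensed form of the dissimilarities.
Z = linkage(squareform(D), method='average')

# Cut the hierarchy into K = 2 clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```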
Cluster validity
Clustering algorithms aim at defining a certain number of partitions in data irrespective of whether this number of clusters really exists. As a result, high values of within-cluster dissimilarity and, thus, solutions which are far from optimal [32] can be obtained. To overcome this issue, a quantitative evaluation known as cluster validity is usually performed on the outputs of cluster analyses. It consists of computing validity indexes for several pre-chosen cluster partitions with different numbers of clusters [32]. The structure and dimensionality of the data can influence the outcome of this task and, thus, the choice of the most appropriate validity index, which identifies the optimal partition by its maximum or minimum value. In the present work, four validity indexes were tested: three which were reported as exhibiting the best performance in the reference study [36], namely the Calinski-Harabasz (CH) index, the C* index and the Gamma index; and a more recent one named the Global Silhouette (SIL) index [32]. Their theoretical background is presented herein by considering a symbolic data set of $N$ objects, $K$ clustering partitions, a partition containing $t$ distinct clusters and its $k$-th cluster constituted by $M_k$ objects ($\sum_k M_k = N$).
The Calinski-Harabasz (CH) index is given by:

$$CH(t) = \frac{B(t)/(t-1)}{W(t)/(N-t)}$$

where $B(t)$ is the overall between-cluster dissimilarity and $W(t)$ is the overall within-cluster dissimilarity of the partition containing $t$ clusters. The partition corresponding to the maximal absolute value is identified as the optimal clustering partition (i.e., the optimal number of clusters).
The C* index can be calculated as:

$$C^{*} = \frac{S - S_{\min}}{S_{\max} - S_{\min}}$$

where $S$ represents the sum of the $n_w$ pair-wise distances among objects belonging to the same cluster, $S_{\min}$ is the sum of the $n_w$ smallest distances among all pairs of objects and, conversely, $S_{\max}$ is the sum of the $n_w$ largest distances among all pairs of objects. The optimal partition is given by the minimal absolute value of the C* index.
The Gamma ($\Gamma$) index is obtained by:

$$\Gamma = \frac{s^{(+)} - s^{(-)}}{s^{(+)} + s^{(-)}}$$

where $s^{(+)}$ represents the number of within-cluster distances which are smaller than between-cluster distances, and $s^{(-)}$ is the number of within-cluster distances which are larger than between-cluster distances. The optimal partition is given by the maximal absolute value.
The silhouette width of the $i$-th object belonging to cluster $C_k$ is defined in the following way:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average dissimilarity between the $i$-th object and the remaining objects assigned to the same cluster, and $b_i$ is the minimum, taken over the remaining clusters, of the average distance between object $i$ and all the objects of each of those clusters.

The silhouette index of cluster $C_k$, $s_k$, and the global silhouette index of the partition containing $t$ clusters, $SIL(t)$, are then respectively given by:

$$s_k = \frac{1}{M_k} \sum_{i \in C_k} s_i, \qquad SIL(t) = \frac{1}{t} \sum_{k=1}^{t} s_k$$

The higher the global silhouette, the more compact and separated the clusters are. Hence, its maximal value indicates the optimal partition.
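The silhouette computation can be sketched directly from a dissimilarity matrix; the matrix and the two candidate partitions below are illustrative:

```python
import numpy as np

def global_silhouette(D, labels):
    # Global silhouette index SIL of a partition, computed from a pairwise
    # dissimilarity matrix D and the cluster label of each object.
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False
        a = D[i, own].mean()                    # average within-cluster distance
        b = min(D[i, labels == c].mean()        # nearest of the other clusters
                for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    # s_k: mean silhouette of each cluster; SIL: mean over the clusters.
    return float(np.mean([s[labels == c].mean() for c in clusters]))

D = [[0, 1, 8, 8],
     [1, 0, 8, 8],
     [8, 8, 0, 1],
     [8, 8, 1, 0]]
good = global_silhouette(D, [0, 0, 1, 1])   # matches the true grouping
bad = global_silhouette(D, [0, 1, 0, 1])    # mixes the two groups
print(good, bad)
```

As expected, the partition matching the true grouping scores close to one, while the mixed partition scores negative.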
Case study—International Bridge over River Guadiana
Description of the structure and the SHM system
The International Bridge over River Guadiana (Fig. 1) is a cable-stayed bridge located in the southwest of the Iberian Peninsula, built to connect the regions of Algarve (Portugal) and Andalucía (Spain). The bridge has a central span of 324 m and two lateral and transition spans of 135 m and 36 m, respectively. The deck is a pre-stressed concrete box girder, 18 m wide and 2.5 m high, suspended by 128 stay cables composed of 22 to 55 individually sheathed monostrands (Fig. 1(a)). Their length varies from 48 to 167 m, and the stay cables are equally spaced every 9.0 m on the deck and every 1.8 m on the pylons. The shorter cables are clamped at mid-length while the longer ones are clamped at third-length. The A-shaped pylons are 95 and 96 m high and consist of concrete hollow sections which, besides anchoring the cables, support the deck at a height of 35 m by means of hollow-section transverse beams.
The bridge was opened to traffic in 1991 and was the target of extensive studies prior to, during and after construction. In addition, a permanent monitoring system consisting of acoustic strain gages and resistance thermometers was installed for periodic manual data acquisition. In December 2010 an autonomous online SHM system [37-39] was installed on the bridge with the aim of carrying out early-damage detection, thus contributing to an increase in safety and a reduction of maintenance costs. The sensors' locations (Fig. 1(b)) were based on the principle that any damage in a cable-stayed bridge may be revealed by dead load redistribution in the cables' tensions, with consequent displacement and rotation changes close to the anchorages. Load cells, which would directly measure the cables' forces, were not installed, as they are usually avoided due to operational, economic and applicability constraints. Instead, hydrostatic pressure cells (named NL throughout the present paper) and magnetostrictive transducers (named herein DH) were used for measuring deck and joint displacements, respectively. Bi-axial inclinometers (named CL, with suffixes L and T for rotations observed within and out of the bridge's plane) were installed on the top of the pylons. Infrastructure differential displacements and rotations are also monitored using the same type of inclinometers (CL), installed in each foundation and abutment (Fig. 1(b)).
Data acquisition is carried out synchronously and hourly by a locally deployed industrial computer. At each hour, robust estimators of scale and covariance are computed and compared to limits established by outlier and goodness-of-fit statistical tests. This strategy successfully removes values related to sensor malfunction and dynamic effects from wind and traffic [37-39]. Data are sent daily to an FTP server which is queried hourly by a routine that automatically stores the data in a MySQL database server [37].
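The hourly condensation step can be illustrated by a simple median/MAD screening rule; this is a stand-in sketch under assumed thresholds, not the exact estimators and statistical tests used by the deployed system:

```python
import numpy as np

def robust_hourly_value(samples, k=3.0):
    # Condense one hour of raw samples into a single value, discarding
    # outliers (sensor glitches, traffic/wind spikes) with a median/MAD rule.
    samples = np.asarray(samples, dtype=float)
    med = np.median(samples)
    mad = np.median(np.abs(samples - med))
    if mad == 0.0:
        return float(med)
    keep = np.abs(samples - med) <= k * 1.4826 * mad  # ~k-sigma for normal data
    return float(samples[keep].mean())

# One hour of displacement readings polluted by two spurious spikes
# (illustrative values in millimetres).
rng = np.random.default_rng(2)
hour = np.concatenate([rng.normal(12.0, 0.05, 358), [120.0, 95.0]])
print(robust_hourly_value(hour), hour.mean())
```

The robust value stays close to the true level, while the plain mean is pulled away by the spikes.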
Numerical simulation
The absence of damage in the Guadiana Bridge motivated the development of a numerical model for simulating its behavior. In the present work, this behavior was simulated by running finite element time-history analyses using only experimental data as input. The three-dimensional numerical model is geometrically and physically linear for the sake of computational simplicity and is composed of 404 beam elements and 543 nodes (Fig. 2), reproducing the geometry of the original design. Stay cables were defined as beam elements free of bending moments and compression. The piles' shafts were continuously restrained by linear elastic Winkler springs with stiffness values varying from 24 MN/m to 105 MN/m, according to the design studies. The Young's moduli and unit weights were defined as Ec = 42 GPa and γc = 25 kN/m3 for concrete and Esp = 195 GPa and γsp = 78.5 kN/m3 for stay cable steel [40]. The coefficients of linear thermal expansion are αsp = 1.2 × 10-5 °C-1 for stay cable steel and αc = 1.0 × 10-5 °C-1 for concrete.
To guarantee that the numerical model accurately simulates the bridge's structural behavior, its mode shapes were compared with those identified in the last experimental modal analysis performed and reported in [40]. Table 1 and Fig. 3 show the good agreement between the modal frequencies obtained from the developed numerical model and the experimental natural frequencies obtained in [40]. The average modal frequency error obtained across all mode shapes is 1.76%. The mode shapes used to evaluate the fit between the numerical model and the real structure are shown in Fig. 4 and have either a vertical or a lateral nature. Mode shapes comprising mainly torsion of the bridge's deck were not obtained since this element was modeled as a single beam element, for the sake of computational efficiency.
The use of modal quantities to fit the numerical model to the real structure was based on the following premise: if the numerical frequencies and mode shapes correspond to those measured in situ, then the stiffness and mass are well modeled and the model will respond to damage as the real structure would (regardless of the actions imposed on it). Static measurements could also have been considered for evaluating the fit between the model and the real structure. However, besides depending on the stiffness and mass (as the modal quantities do), these also depend on the representativeness of the temperature measurements throughout the structure. Since only a few thermometers are installed on the structure (Fig. 1), the choice was made to perform the fitting using the modal quantities.
The time-history numerical simulation aims at reproducing, as faithfully as possible, the structural behavior using measured temperature and noise data as input (Fig. 5). Measured temperature data (average temperatures in the deck, pylons and cables, and differential temperatures in the latter two) are used as input in the numerical time-history simulation (Fig. 6(a)-(c)). This analysis generates simulated displacements and rotations (Fig. 6(e)). To obtain the most similar and trustworthy reproduction of the real SHM data, the uniformly distributed noise (Fig. 6(d)) measured on site by the sensors is added to the numerical output.
Uniform noise distributions with 3'' and 0.1 mm spans were observed for rotations and displacements, respectively. Each time series used in the numerical simulation consists of over 6500 data points, spanning a 9-month period between January 2011 and February 2012. The gaps shown in the time series of Figs. 6(a), 6(e) and 7 are related to maintenance actions carried out by the structure's owner on the bridge's power supply system.
Damage was simulated by applying controlled temperature time series (Fig. 6(b)) to one selected stay cable. The applied temperature values were calculated to reproduce equivalent stiffness losses under dead loading, which might simulate the breakage of stay cable wires due to corrosion or the slipping of wedges in the cables' anchorages. Both phenomena are difficult to detect by inspection during the structure's life and might result in sudden stiffness losses regardless of whether the steel is in the elastic or plastic regime. This numerical simulation procedure was used to accurately reproduce damage scenarios and to test the proposed data-driven strategy.
All the structural quantities given by the numerical simulation are highly correlated with the temperature measurements, as can be observed in Fig. 6(a) and Fig. 7.
Damage detection strategy
The early-damage detection strategy proposed herein is illustrated using five scenarios: one undamaged reference scenario and four early-damage scenarios. The latter were simulated as stiffness reductions of 1%, 2%, 5% and 10% in stay cable 78, occurring instantaneously on the 1st of July, 2011. These reduction percentages correspond to the rupture of approximately 2, 4, 11 and 22 wires, out of the 217 wires composing the stay cable.
The choice of this stay cable was arbitrary and different ones could have been used. However, this one was chosen because it: 1) exhibits an intermediate length within the fans, 2) sustains a significant dead load of 2676 kN (the minimum is 600 kN in the shorter cables and the maximum is 3300 kN in the longer ones) and 3) is not anchored close to any of the installed sensors, thus constituting a challenging example for the detection strategy proposed herein. The changes generated by the early-damage simulations could not be identified in the resulting time series since they were completely masked by the effects generated by temperature.
Data normalization
In static SHM, early-damage can be outlined by normalizing (or suppressing) the effects related to "regular" actions such as temperature. PCA-based normalization relies on the principle that, since early-damage generates local dead load redistribution, it produces variations distinct from those of environmental loads acting globally on the structure [22].
While the time series output by the early-damage simulations did not exhibit visible changes, their corresponding principal components (extracted using Eq. (1)) did. As can be observed in Fig. 8, the local dead load redistribution produced clear shifts in one or more principal components at the instant at which damage was simulated. This result shows that the linear transformation described by Eq. (1) separated the effects generated by damage from the remaining ones. Moreover, these simulations also showed that larger damage magnitudes produce more pronounced shifts in components with higher "active energies". These remarks are illustrated using a few components in Fig. 8; however, they were obtained by observing all the principal components computed for each simulated damage scenario.
To assess which principal components are affected by local dead load redistribution, Kolmogorov-Smirnov (K-S) goodness-of-fit tests [41] were performed on each of the 15 principal components. Each K-S test is performed on a pair of principal components of the same order: one obtained from the undamaged reference state and the other from one of the damage scenarios. The result of this analysis is summarized in Fig. 9, where a p-value close to one suggests that the principal component is independent of the corresponding damage scenario. For the undamaged scenario, all principal components are reported as independent of damage, with p-values equal to one. From this figure it can also be observed that, for the four damage magnitudes tested, the first five principal components seem not to be influenced by damage and are therefore assumed to be related to global variations caused by temperature.
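The pairwise K-S comparison can be sketched with SciPy; the series below are synthetic stand-ins for principal components of the same order, with an artificial mean shift representing the damage instant:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# Same-order principal components from a reference run and a damaged run
# (synthetic stand-ins). The affected component has a mean shift half-way
# through the series, mimicking the instant at which damage is simulated.
n = 2000
pc_ref = rng.normal(0.0, 1.0, n)
pc_unaffected = rng.normal(0.0, 1.0, n)
pc_affected = rng.normal(0.0, 1.0, n)
pc_affected[n // 2:] += 1.5          # shift at the "damage" instant

p_unaffected = ks_2samp(pc_ref, pc_unaffected).pvalue
p_affected = ks_2samp(pc_ref, pc_affected).pvalue
print(p_unaffected, p_affected)      # the affected component's p-value is ~0
```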
As observed in Fig. 9, PCA is able to retain meaningful information related to global effects, such as temperature, in the first components, whereas variations related to measurement inaccuracy, noise or other small-magnitude effects, such as early-damage, may be summarized in later components. However, determining whether or not a given component summarizes meaningful variation remains unclear in many cases. When the correct number of principal components is not retained for subsequent analysis, either relevant information is lost (underestimation) or surplus effects are included (overestimation), causing the sensitivity of damage detection algorithms to decrease. Determining the number of meaningful principal components remains one of the greatest challenges in providing a truthful interpretation of multivariate data. It has been a long-standing issue in both the biological and statistical literature, and a variety of stopping rules have been proposed for its estimation without resorting to external comparison or baseline information [30,42].
Since the aim of the present work is to detect early-damage, which generally has a local character, the principal components retained for damage detection are the ones exhibiting smaller "active energies". The components identified as meaningful by the stopping rules are associated with temperature or other "regular" actions and are not considered for damage detection.
Among the different stopping rules found in the literature, the Kaiser-Guttmann parameter [43] is very popular: it consists in considering as meaningful (related to temperature) only the principal components whose eigenvalues are larger than 1. The Kaiser-Guttmann parameter was tested on the five simulated scenarios (undamaged and damaged) and returned three as the number of principal components related to the global effects of temperature, as can be observed in Fig. 10(a).
Another stopping rule found in the literature is the broken-stick method. The idea of this rule is that if a stick is randomly broken into $p$ pieces, $b_1$ would be the average size of the largest piece in each set of broken sticks, $b_2$ would be the average size of the second largest piece, and so on. The parameter $p$ equals the number of components. The proportion of total variance associated with the eigenvalue of the $k$-th component under the broken-stick model is obtained by Eq. (26):

$$b_k = \frac{1}{p} \sum_{i=k}^{p} \frac{1}{i} \qquad (26)$$

If the $k$-th component accounts for a proportion of the total variance larger than $b_k$, then this component is considered as related to global effects and is not considered for damage detection. Unlike the Kaiser-Guttmann parameter, this method identified five principal components as meaningful (Figs. 10(a) and (b)) in all simulations carried out in this work, a result which is in agreement with the baseline comparison performed with the Kolmogorov-Smirnov tests (Fig. 9). This result suggests that the first five principal components are related to temperature and that only the last ten axes should be subsequently considered for damage detection, since these may be related to local effects such as early-damage.
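Both stopping rules can be sketched as follows; the eigenvalue spectrum below is illustrative (15 components with eigenvalues summing to p, as for a correlation matrix), and the two rules need not return the same count:

```python
import numpy as np

def broken_stick(p):
    # Eq. (26): expected proportion of total variance of the k-th component
    # under the broken-stick model, b_k = (1/p) * sum_{i=k}^{p} 1/i.
    return np.array([sum(1.0 / i for i in range(k, p + 1)) / p
                     for k in range(1, p + 1)])

# Illustrative eigenvalue spectrum of 15 principal components (from a
# correlation matrix, so the eigenvalues sum to p = 15).
eigvals = np.array([6.1, 3.0, 2.0, 1.5, 1.1, 0.5, 0.3, 0.2,
                    0.1, 0.1, 0.04, 0.03, 0.02, 0.007, 0.003])
p = len(eigvals)

# Broken-stick rule: components whose variance proportion exceeds b_k are
# "meaningful" (temperature-related) and excluded from damage detection.
proportions = eigvals / eigvals.sum()
meaningful_bs = int(np.sum(proportions > broken_stick(p)))

# Kaiser-Guttmann rule: eigenvalues larger than 1.
meaningful_kg = int(np.sum(eigvals > 1.0))
print(meaningful_bs, meaningful_kg)
```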
Feature extraction
In the present study, it was decided to extract symbolic objects and dissimilarity measures from the set of standardized principal components (SPC) identified by the broken-stick rule as not related to global environmental effects. This strategy increased the sensitivity to early-damage by regularizing the principal components’ “active energies”, making damage detection independent of the components in which the shifts (Figs. 10 and 11) are observed.
Standardization is performed by subtracting from each score its mean and dividing the result by its standard deviation, as shown in Eq. (27):

$\mathrm{SPC}_i = \dfrac{\mathrm{PC}_i - \mu_i}{\sigma_i}$ (27)

These standardized scores are dimensionless and their values (shown on the vertical axes in Fig. 11) are expressed in standard deviations. In Eq. (27), $\mu_i$ and $\sigma_i$ are the average and standard deviation of PC_i.
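Eq. (27) is a column-wise z-score; a minimal sketch (the score matrix here is an invented example):

```python
import numpy as np

def standardize_scores(PC):
    """Eq. (27): subtract each column's mean, divide by its
    standard deviation -> dimensionless standardized scores."""
    return (PC - PC.mean(axis=0)) / PC.std(axis=0)

# toy principal component scores, one column per component
scores = np.array([[2.0, 10.0],
                   [4.0, 30.0],
                   [6.0, 20.0]])
spc = standardize_scores(scores)
# each standardized column now has zero mean and unit standard deviation
```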
Considering the simulation of a 5% instantaneous stiffness reduction in stay cable 78 (Fig. 8), symbolic objects defined as “weeks of standardized principal components” were obtained and described by ten interquartile intervals (corresponding to the last ten principal components, identified by the broken-stick rule). Four series of these statistical quantities are presented by the colored regions of the box-and-whiskers plots shown in Fig. 11(b). By comparison with Fig. 11(a), it can easily be observed that, even though significant data compression took place, damage-related shifts are present in both types of data and appear to be outlined by the symbolic data. This fact seems to be related to the stability and generalization capacity of interquartile values in representing the data’s structure.
From the 43 symbolic objects, each described by ten interquartile intervals, symbolic dissimilarity matrices were obtained and are presented in Fig. 12. These matrices contain the pairwise dissimilarities between all 43 symbolic objects and constitute the input for cluster analysis. As can be observed in Fig. 12, variations related to small-magnitude damage are clearly highlighted in this type of information, where two distinct groups of data can be identified regardless of the dissimilarity measure used. However, the Ichino-Yaguchi dissimilarity (Fig. 12(c)) outlines the effect of dead load redistribution more sharply than the other two (Figs. 12(a) and (b)), giving the proposed strategy a better sensitivity to early-damage.
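The interquartile description and the interval form of the Ichino-Yaguchi dissimilarity can be sketched as follows; the span parameter γ = 0.5 is a common choice assumed here, and the paper’s exact aggregation across the ten intervals (a generalized Minkowski combination) is not reproduced:

```python
import numpy as np

def iqr_interval(x):
    """Describe a window of scores by its interquartile interval [Q1, Q3]."""
    return np.percentile(x, 25), np.percentile(x, 75)

def ichino_yaguchi(a, b, gamma=0.5):
    """Ichino-Yaguchi component dissimilarity between two intervals
    a = (a1, a2) and b = (b1, b2): |join| - |meet| plus a gamma-weighted
    span correction; gamma in [0, 0.5] is assumed here."""
    join = max(a[1], b[1]) - min(a[0], b[0])          # length of the hull
    meet = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # overlap length
    return join - meet + gamma * (2 * meet - (a[1] - a[0]) - (b[1] - b[0]))

d_same = ichino_yaguchi((0, 1), (0, 1))   # identical intervals -> 0
d_far = ichino_yaguchi((0, 1), (3, 4))    # disjoint intervals -> positive
```

Identical intervals yield zero dissimilarity, while disjoint intervals (such as pre- and post-damage interquartile ranges shifted by dead load redistribution) yield large values.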
Statistical classification
In the present section, the application of cluster analysis as a statistical classification method for baseline-free early-damage detection is highlighted using data from: 1) an undamaged simulated scenario and 2) a 5% instantaneous stiffness reduction in stay cable 78. The Ichino-Yaguchi dissimilarity measure was used as input in a hierarchical agglomerative cluster method, providing the results presented in Fig. 13. Two of the ten interquartile series (SPC6 and SPC7) that describe the 43 symbolic objects are presented in Figs. 13(c) and (d). Their corresponding dendrograms can be observed in Figs. 13(a) and (b), respectively. From the latter (dendrograms), where the vertical distance is directly related to the dissimilarity measure between objects, it can be observed that the damage occurrence (Fig. 13(b)) generates more compact clusters when compared to the undamaged scenario (Fig. 13(a)). In the dendrogram plots, greater compactness is identified by a smaller cluster height. Hence, comparing Figs. 13(a) and (b) allows one to conclude that when damage occurs, the distance within clusters, W(P_k), is reduced and its counterpart, the distance between clusters, B(P_k), is increased. This fact can prove useful in defining early-damage sensitive features and suggests the use of B(P_k) in their definition. In spite of being single-valued, features based on the distance between clusters can be sensitive to early-damage (with local character) and still reflect changes in a set of sensors installed throughout an entire structure, as will be shown in the next subsection.
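The within- and between-cluster distances can be computed directly from a symbolic dissimilarity matrix once a partition is fixed; a sketch with an invented four-object matrix (the function name and toy values are illustrative assumptions):

```python
import numpy as np

def within_between(D, labels):
    """Average within-cluster distance W(P_k) and average
    between-cluster distance B(P_k) from a dissimilarity matrix D."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices_from(D, k=1)       # each object pair once
    w = D[iu][same[iu]].mean()              # pairs in the same cluster
    b = D[iu][~same[iu]].mean()             # pairs in different clusters
    return w, b

# toy dissimilarity matrix: objects {0,1} close, {2,3} close, groups far apart
D = np.array([[0., 1., 8., 9.],
              [1., 0., 9., 8.],
              [8., 9., 0., 1.],
              [9., 8., 1., 0.]])
w, b = within_between(D, [0, 0, 1, 1])
# compact, well-separated clusters: small W(P_k), large B(P_k)
```

A damage occurrence that compacts the clusters, as in Fig. 13(b), shows up as a decrease of w together with an increase of b.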
The observation of the dendrograms presented in Figs. 13(a) and (b) suggests that there are two preferential groups in the data, regardless of the existence of damage. However, this conclusion must be confirmed using cluster validity, as described in Section 2.3. For the two time-history simulation scenarios used to plot Fig. 13, the four validity indices presented in Section 2.3.2 were obtained. These are presented in Fig. 14, considering cluster partitions with two to eight clusters. Each index’s values are divided by its maximum so that no plotted value surpasses 1.0, easing the comparison between indices.
Figures 14(a) and (b) show that the CH index (the index with the best performance in Ref. [36]) and the SIL index are highly correlated and exhibit a more stable behavior, regardless of the presence of damage. These two indices identify two as the optimal cluster partition for both data sets considered. Conversely, the C* and Gamma indices vary significantly. While the former exhibits large changes but suggests the same optimal cluster partition for both simulated scenarios, the latter changes the optimal partition estimate from eight in the undamaged state to two under a 5% stiffness reduction.
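The Global Silhouette (SIL) index can be computed from a precomputed dissimilarity matrix; the sketch below follows the standard silhouette definition, assumed here to match the paper’s SIL, with an invented four-object matrix:

```python
import numpy as np

def global_silhouette(D, labels):
    """Global silhouette index from a dissimilarity matrix:
    mean over objects of (b_i - a_i) / max(a_i, b_i), where a_i is the
    mean distance to the object's own cluster and b_i to the nearest
    other cluster."""
    labels = np.asarray(labels)
    s = np.empty(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False                                   # exclude self
        a = D[i, own].mean() if own.any() else 0.0
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

D = np.array([[0., 1., 8., 9.],
              [1., 0., 9., 8.],
              [8., 9., 0., 1.],
              [9., 8., 1., 0.]])
sil = global_silhouette(D, [0, 0, 1, 1])  # near 1: compact, well separated
```

Values near 1 indicate compact, well-separated clusters; the optimal partition is the cluster count that maximizes this index.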
As noted above, the distances between clusters, B(P_k), can be used as early-damage sensitive features. To obtain an accurate description of these features through time, under environmental and/or structural changes, the more stable indices should be preferred, increasing confidence and reducing the possibility of false detections. Thus, in the present work the SIL index was used, not only in the analyses presented in Fig. 13, but also in the real-time simulation presented in the next subsection.
Real-time simulation
To study the effectiveness of the proposed strategy for real-time SHM applications, five scenarios (stiffness reductions of 0%, 1%, 2%, 5% and 10%, each occurring instantaneously in stay cable 78 on the 1st of July) were considered.
For each of these simulated scenarios, 43 analyses, corresponding to the 43 weeks of measured data, were performed to simulate the functioning of an online SHM system which collects data weekly from the hardware deployed in situ. At each data collection, the time series used as input spans from instant 0 (on 11/01/2011) to the last instant of collected data, resulting in an analyzed data set of increasing size.
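The growing-window protocol can be sketched as follows; the weekly sample count and the stand-in feature are illustrative assumptions only, standing in for the full PCA/symbolic/cluster pipeline applied at each collection:

```python
import numpy as np

# invented monitoring stream: 43 weeks of synthetic samples
rng = np.random.default_rng(1)
samples_per_week, n_weeks = 100, 43        # assumed, not from the paper
data = rng.standard_normal(samples_per_week * n_weeks)

features = []
for week in range(1, n_weeks + 1):
    # window always starts at instant 0 and grows by one week per collection
    window = data[: week * samples_per_week]
    features.append(window.std())          # placeholder for B(P_k)
```

In the paper, the quantity appended each week is the average between-cluster distance B(P_k) computed on the full window, not the placeholder used here.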
The output obtained each week consists of the average distance between clusters, B(P_k), which was shown to be a single-valued early-damage sensitive feature in the previous subsection. Figure 15 presents the B(P_k)-values for each of the five simulated scenarios during the SHM period on the Guadiana Bridge (11/01/2011 to 01/02/2012). The figure shows the great stability of the chosen feature in the undamaged scenario, whose B(P_k)-values exhibit no increasing trend and small variability. When damage scenarios are analyzed in real-time, very significant increases of the feature values are observed, even for a stiffness reduction as small as 1%, thus allowing clear early-damage detections under the amount of noise measured in situ.
Conclusions
The present paper describes a novel data-driven strategy to detect early-damage under environmental effects, based on static monitoring data, statistical techniques and unsupervised learning algorithms. The strategy fuses sets of measurements in a computationally simple manner that allows real-time implementation, and consists of the combination of: 1) PCA, 2) the broken-stick rule, 3) symbolic data, 4) interquartile-based dissimilarity measures and 5) cluster analysis.
The proposed strategy detected stiffness reductions as small as 1% in a stay cable. These results, obtained using a static SHM system composed of a few inexpensive sensors, clearly evidence the effectiveness of the proposed strategy.
From the analyses performed with respect to environmental normalization, it was observed that combining PCA with the broken-stick rule provides an effective distinction between temperature-related and damage-related effects. Great sensitivity to early-damage was achieved by: 1) computing the Ichino-Yaguchi dissimilarity measure from symbolic objects described by interquartile intervals, and 2) standardizing the principal components prior to the definition of the symbolic objects, which normalizes the components’ eigenvalues or “active energies.” Strategies based on the Gowda-Diday and Hausdorff dissimilarity measures, or on non-standardized principal components, led to less efficient damage detection.
Four cluster validity indices were tested to automatically and objectively obtain the data’s optimal cluster partitions. All of them revealed sensitivity to early-damage; however, two of them, the Calinski-Harabasz and the Global Silhouette indices, revealed greater stability and were therefore adopted in the proposed strategy.
The effectiveness of the proposed strategy was verified by performing a real-time procedure simulation. Its results showed that the average distance between clusters is an effective early-damage sensitive feature which, in spite of being single-valued, reflects changes from all sensors installed throughout an entire structure. The simulation showed that all simulated early-damage scenarios (1%, 2%, 5% and 10%) can be clearly and unambiguously detected and distinguished from the undamaged scenario.
References
[1]
Hu X, Shenton H W. Damage identification based on dead load redistribution methodology. Journal of Structural Engineering, 2006, 132(8): 1254–1263
[2]
Teughels A, De Roeck G. Damage detection and parameter identification by finite element model updating. Revue Européenne de Génie Civil, 2005, 9(1): 109–158
[3]
Rytter A. Vibration Based Inspection of Civil Engineering Structures. Aalborg University, 1993
[4]
Doebling S W, Farrar C R, Prime M B, Shevitz D W. Damage Identification and Health Monitoring of Structural and Mechanical Systems from Changes in Their Vibration Characteristics: A Literature Review. Los Alamos, USA, 1996
[5]
Sohn H, Farrar C, Hemez F M, Shunk D D, Stinemates D W, Nadler B R. A Review of Structural Health Monitoring Literature: 1996 - 2001. Los Alamos, USA, 2004
[6]
Alvandi A, Crémona C. Assessment of vibration-based damage identification techniques. Journal of Sound and Vibration, 2006, 292(1-2): 179–202
[7]
Posenato D, Kripakaran P, Inaudi D, Smith I F C. Methodologies for model-free data interpretation of civil engineering structures. Computers & Structures, 2010, 88(7-8): 467–482
[8]
Nair K K, Kiremidjian A S, Law K H. Time series-based damage detection and localization algorithm with application to the ASCE benchmark structure. Journal of Sound and Vibration, 2006, 291(1-2): 349–368
[9]
Moyo P, Brownjohn J M W. Detection of anomalous structural behaviour using wavelet analysis. Mechanical Systems and Signal Processing, 2002, 16(2-3): 429–445
[10]
Diday E, Noirhomme-Fraiture M. Symbolic Data Analysis and the SODAS Software. Chichester: John Wiley and Sons, 2008, 445
[11]
Cury A, Crémona C. Assignment of structural behaviours in long-term monitoring: Application to a strengthened railway bridge. Structural Health Monitoring, 2012, 11(4): 422–441
[12]
Oh C K, Sohn H. Damage diagnosis under environmental and operational variations using unsupervised support vector machine. Journal of Sound and Vibration, 2009, 325(1-2): 224–239
[13]
Hua X G, Ni Y Q, Ko J M, Wong K Y. Modeling of temperature – frequency correlation using combined principal component analysis and support vector regression technique. Journal of Computing in Civil Engineering, 2007, 21(2): 122–135
[14]
Zhou H F, Ni Y Q, Ko J M. Constructing input to neural networks for modelling temperature-caused modal variability: Mean temperatures, effective temperatures, and principal components of temperatures. Engineering Structures, 2010, 32(6): 1747–1759
[15]
Mata J. Interpretation of concrete dam behaviour with artificial neural network and multiple linear regression models. Engineering Structures, 2011, 33(3): 903–910
[16]
Ni Y Q, Hua X G, Fan K Q, Ko J M. Correlating modal properties with temperature using long-term monitoring data and support vector machine technique. Engineering Structures, 2005, 27(12): 1762–1773
[17]
Posenato D. Model-Free Data Interpretation for Continuous Monitoring of Complex Structures. École Polytechnique Fédérale de Lausanne, 2009
[18]
Cury A. Techniques d’Anormalité Appliquées à la Surveillance de Santé Structurale. Université Paris-Est, 2010
[19]
Yan A, Kerschen G, De Boe P, Golinval J C. Structural damage diagnosis under varying environmental conditions—Part I: A linear analysis. Mechanical Systems and Signal Processing, 2005, 19(4): 865–880
[20]
Bellino A, Fasana A, Garibaldi L, Marchesiello S Ã. PCA-based detection of damage in time-varying systems. Mechanical Systems and Signal Processing, 2010, 24(7): 2250–2260
[21]
Zhou H F, Ni Y Q, Ko J M. Structural damage alarming using auto-associative neural network technique: Exploration of environment-tolerant capacity and setup of alarming threshold. Mechanical Systems and Signal Processing, 2011, 25(5): 1508–1526
[22]
Hsu T Y, Loh C H. Damage detection accommodating nonlinear environmental effects by nonlinear principal component analysis. Structural Control and Health Monitoring, 2010, 17(3): 338–354
[23]
Mujica L, Rodellar J, Fernandez A, Guemes A. Q-statistic and T2-statistic PCA-based measures for damage assessment in structures. Structural Health Monitoring, 2011, 10(5): 539–553
[24]
da Silva S, Dias Júnior M, Lopes Junior V, Brennan M J. Structural damage detection by fuzzy clustering. Mechanical Systems and Signal Processing, 2008, 22(7): 1636–1649
[25]
Sohn H, Kim S D, Harries K. Reference-free damage classification based on cluster analysis. Computer-Aided Civil and Infrastructure Engineering, 2008, 23(5): 324–338
[26]
Cury A, Crémona C, Diday E. Application of symbolic data analysis for structural modification assessment. Engineering Structures, 2010, 32(3): 762–775
[27]
Santos J, Orcesi A D, Silveira P, Guo W. Real time assessment of rehabilitation works under operational loads. In: Proceedings of the ICDS12 - International Conference on Durable Structures: From construction to rehabilitation. 2012
[28]
Hua X G, Ni Y Q, Chen Z Q, Ko J M. Structural damage detection of cable-stayed bridges using changes in cable forces and model updating. Journal of Structural Engineering, 2009, 135(9): 1093–1106
[29]
Hu X, Shenton H W III. Damage identification based on dead load redistribution effect of measurement error. Journal of Structural Engineering, 2006, 132(8): 1264–1273
[30]
Jolliffe I T. Principal Component Analysis. 2nd ed. New York: Springer, 2002, 518
[31]
Billard L, Diday E. Symbolic Data Analysis. Chichester: John Wiley and Sons, 2006, 321
Ichino M, Yaguchi H. Generalized Minkowski metrics for mixed feature-type data analysis. IEEE Transactions on Systems, Man, and Cybernetics, 1994, 24(4): 698–708
[34]
Gowda K C, Diday E. Symbolic clustering using a new dissimilarity measure. IEEE Transactions on Systems, Man, and Cybernetics, 1991, 21(6): 567–578
[35]
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd ed. New York: Springer, 2009, 763
[36]
Milligan G, Cooper M. An examination of procedures for determining the number of clusters in a data set. Psychometrika, 1985, 50(2): 159–179
[37]
Santos J, Silveira P. A SHM framework comprising real time data validation. In: Proceedings of IALCCE 2012 - 3rd International Symposium on Life-Cycle Civil Engineering. 2012, 2
[38]
Santos J, Silveira P, Santos L O, Calado L. Monitoring of road structures—real time acquisition and control of data. In: Proceedings of the 16th IRF World Road Meeting. Lisbon, May 2010
[39]
Santos J, Orcesi A D, Silveira P, Pina C. Damage detection under environmental and operational loads on large span bridges. In: V Congresso Brasileiro de Pontes e Estruturas - Soluções Inovadoras para Projeto, Execução e Manutenção. 2012
[40]
Caetano E, Cunha Á, Gattulli V, Lepidi M. Cable–deck dynamic interactions at the international Guadiana Bridge on-site measurements and finite element modelling. Structural Control and Health Monitoring, 2008, 15(3): 237–264
[41]
Massey F J Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 1951, 46(253): 68–78
[42]
Jackson J E. A User’s Guide to Principal Components. New York: Wiley-Interscience, 1991, 641
[43]
Jackson D. Stopping rules in principal components analysis: A comparison of heuristical and statistical approaches. Ecology, 1993, 74(8): 2204–2214
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag Berlin Heidelberg