Consistency of weighted feature set and polyspectral kernels in individual communication transmitter identification

Na SUN , Yajian ZHOU , Yixian YANG

Front. Electr. Electron. Eng., 2010, 5(4): 488-492

RESEARCH ARTICLE

Abstract

This paper presents a method using support vector machine with polyspectral kernels for classification of individual transmitters. Then, the neighborhood-rough-set-based weighted feature set is proposed. The experiments of the algorithms mentioned above indicate that they have consistency, which raises a new weighted kernel. The experiment shows that better classification rate can be achieved.

Keywords

polyspectral kernel / support vector machine (SVM) / neighborhood rough set / weighted feature set / weighted kernel

Cite this article

Na SUN, Yajian ZHOU, Yixian YANG. Consistency of weighted feature set and polyspectral kernels in individual communication transmitter identification. Front. Electr. Electron. Eng., 2010, 5(4): 488-492. DOI: 10.1007/s11460-010-0094-y


Introduction

Because the discrepancies among individual communication transmitters are faint, it is difficult to find an optimal feature set, and it is extremely hard to classify the transmitters accurately, especially when they are of the same model and manufacturing lot. Features of "turn-on" transient signals are commonly used to classify transmitters [1-3] because such transients have unique characteristics. These methods achieve good recognition rates; however, the features are raw and too unstable to be widely applicable.

Researchers have therefore paid more attention to stable characteristics of transmitters instead of transient features. There is a consensus that the higher-order spectra (polyspectra) of a signal play an important role in the analysis of subtle transmitter characteristics. The major problem with polyspectra is the computational complexity, which grows with the order; even computing the lowest-order polyspectrum (the bispectrum) is expensive. Researchers have worked on transforming the two-dimensional bispectrum into a one-dimensional template [4-6]. In essence, these algorithms perform feature extraction: via a linear or nonlinear mapping, high-dimensional data are projected onto an output space whose dimension equals the intrinsic dimension. Feature extraction destroys the original physical significance of the features, which is disadvantageous to classification. Feature selection is another approach: it chooses a subset of the input variables by eliminating features with little or no predictive information, so the physical significance is preserved. The major problem of feature selection is how to evaluate the variables remaining in the subset.

In this paper, we construct polyspectral kernels to be applied in support vector machines (SVMs). Meanwhile, a neighborhood-rough-set-based weighted feature subset is proposed. We design a simulation to analyze the consistency of the weighted feature set and the polyspectral kernels.

The rest of this paper is organized as follows. In Sect. 2, we define polyspectral kernels in support vector machines and investigate their performance. In Sect. 3, a method of evaluating features by their "significances" is proposed. In Sect. 4, the performance of the polyspectral kernels is compared with that of the neighborhood-rough-set-based weighted feature subset. In Sect. 5, a weighted kernel is constructed from the algorithms above. Finally, conclusions are drawn in Sect. 6.

Polyspectral kernels in SVMs

Definition of polyspectra

Let $x(t)$ be a real stationary signal. The $k$th-order moment of $x(t)$ is defined as
$$m_{kx}(\tau_1,\tau_2,\ldots,\tau_{k-1}) = E\{x(t)\,x(t+\tau_1)\cdots x(t+\tau_{k-1})\} = \int_{-\infty}^{\infty} x(t)\,x(t+\tau_1)\cdots x(t+\tau_{k-1})\,dt.$$
$m_{kx}(\tau_1,\tau_2,\ldots,\tau_{k-1})$ is the $k$th-order correlation function of $x(t)$, where $k\ge 2$.

The $k$th-order moment spectrum of $x(t)$ is the discrete multi-dimensional Fourier transform of the $k$th-order moment, defined as
$$M_{kx}(\omega_1,\omega_2,\ldots,\omega_{k-1}) = \sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{k-1}=-\infty}^{\infty} m_{kx}(\tau_1,\tau_2,\ldots,\tau_{k-1})\,e^{-j\sum_{i=1}^{k-1}\omega_i\tau_i} = \sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{k-1}=-\infty}^{\infty}\int_{-\infty}^{\infty} x(t)\prod_{n=1}^{k-1}x(t+\tau_n)\,e^{-j\sum_{i=1}^{k-1}\omega_i\tau_i}\,dt.$$

For $k=2$ the polyspectrum is the power spectrum; for $k=3$ it is called the bispectrum, and for $k=4$ the trispectrum.
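As an illustration, the discrete sample estimate below replaces the expectation above with a time average over a finite signal; the function name and interface are ours, not from the paper.

```python
import numpy as np

def kth_order_moment(x, lags):
    """Sample estimate of the kth-order moment m_kx(tau_1, ..., tau_{k-1}).

    x    : 1-D real signal, assumed stationary
    lags : k-1 integer lags (tau_1, ..., tau_{k-1})
    """
    lags = list(lags)
    n = len(x)
    # restrict t so that every shifted sample x[t + tau] stays in range
    lo = -min(lags + [0])
    hi = n - max(lags + [0])
    t = np.arange(lo, hi)
    prod = x[t].astype(float)
    for tau in lags:
        prod = prod * x[t + tau]
    # the time average replaces the expectation / integral
    return float(prod.mean())
```

For `lags=[1]` this reduces to the ordinary sample autocorrelation at lag 1; for two lags it estimates the third-order moment used by the bispectrum.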

In this paper, the $k$th-order moment spectrum is used as the polyspectrum, although, strictly speaking, the higher-order spectra (polyspectra) comprise both higher-order moment spectra and higher-order cumulant spectra.

Polyspectral kernels

The $n$th-order polyspectral characteristics reflect $n-1$ frequency components and therefore carry more information about the transmitters. However, the $n$th-order polyspectrum is much more expensive to compute than the $(n-1)$th, and the dimensionality of the feature space grows with the order, so polyspectra of high order are difficult to use directly. To address this problem, we present an algorithm using polyspectral kernels in SVMs: the explicit computation of the polyspectra is avoided and only their inner product is used, which reduces the computational complexity.

SVM is a popular technique for pattern recognition and is considered a state-of-the-art tool for linear and nonlinear classification [7]. Conventional kernels include the radial-basis function (RBF) kernel and the polynomial kernel [8]. The kernel function is defined as
$$K(x,y) = \sum_i \Phi(x)_i\,\Phi(y)_i,$$
where $\Phi$ is a mapping function. Computing $K(x,y)$ is usually much cheaper than evaluating $\Phi$, which allows us to construct hyperplanes in very high-dimensional spaces. In this paper, the new polyspectral kernel takes a simple mathematical form and reduces the computational complexity. In the same experimental environment, experiments show that the calculation speed increases by at least one order of magnitude even when the order of the polyspectra is low, and the higher the order, the more pronounced the advantage of the polyspectral kernel becomes.
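To illustrate this point in a familiar setting, the sketch below (our own illustrative code, not from the paper) checks that a degree-2 polynomial kernel evaluates the same inner product as an explicit feature map, without ever forming the mapped vectors:

```python
import numpy as np

def phi(v):
    """Explicit degree-2 monomial map for a 2-D input (illustrative only)."""
    a, b = v
    return np.array([a * a, b * b, np.sqrt(2.0) * a * b])

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2 evaluates the same inner product without phi."""
    return float(np.dot(x, y) ** 2)

x, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
# the kernel value and the explicit feature-space inner product agree
assert np.isclose(np.dot(phi(x), phi(y)), poly2_kernel(x, y))
```

The polyspectral kernel of the next subsection exploits the same idea: the inner product of two polyspectra collapses to a function of the signals' cross-correlation.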

By defining $\Omega_{k-1}=(\omega_1,\omega_2,\ldots,\omega_{k-1})^{\mathrm T}$ and $T_{k-1}=(\tau_1,\tau_2,\ldots,\tau_{k-1})^{\mathrm T}$, the $k$th polyspectral kernel is defined as follows:
$$\begin{aligned}
K(x,y) &= \langle M_{kx}(\Omega_{k-1}),\, M_{ky}(\Omega_{k-1})\rangle = \int M_{kx}(\Omega_{k-1})\, M_{ky}^{*}(\Omega_{k-1})\, d\omega_1\cdots d\omega_{k-1}\\
&= \Big\langle \sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{k-1}=-\infty}^{\infty}\int_{-\infty}^{\infty} x(t)\prod_{n=1}^{k-1}x(t+\tau_n)\,e^{-j\Omega_{k-1}^{\mathrm T}T_{k-1}}\,dt,\ \sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{k-1}=-\infty}^{\infty}\int_{-\infty}^{\infty} y(t)\prod_{n=1}^{k-1}y(t+\tau_n)\,e^{-j\Omega_{k-1}^{\mathrm T}T_{k-1}}\,dt\Big\rangle\\
&= \int_{\omega_1=-\infty}^{\infty}\cdots\int_{\omega_{k-1}=-\infty}^{\infty}\Bigg[\sum_{\tau_1=-\infty}^{\infty}\cdots\sum_{\tau_{k-1}=-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x(t_1)\,y(t_2)\prod_{n=1}^{k-1}x(t_1+\tau_n)\prod_{n=1}^{k-1}y(t_2+\tau_n)\,e^{-j\Omega_{k-1}^{\mathrm T}T_{k-1}}\,dt_1\,dt_2\Bigg]d\omega_1\cdots d\omega_{k-1}\\
&= (2\pi)^{2(k-1)}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x(t)\,y(t+\tau)\,dt\right]^{k}d\tau.
\end{aligned}$$

It is clear that the polyspectral kernel $K(x,y)$ is a function of the correlation of $x$ and $y$.
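A discrete sketch of this closed form, assuming uniformly sampled signals and dropping the constant $(2\pi)^{2(k-1)}$ (which only rescales the kernel), might look as follows; the function name is ours.

```python
import numpy as np

def polyspectral_kernel(x, y, k):
    """Discrete sketch of the kth polyspectral kernel.

    Uses the closed form K(x, y) ~ integral of [r_xy(tau)]^k over tau,
    where r_xy is the cross-correlation, so the polyspectra themselves
    are never computed.  The constant (2*pi)^{2(k-1)} is dropped because
    it only rescales the kernel.
    """
    # full cross-correlation r_xy over all lags tau
    r = np.correlate(x, y, mode="full").astype(float)
    # raise each correlation value to the kth power and sum over lags
    return float(np.sum(r ** k))
```

Note that the double sum over $t$ and $\tau$ costs $O(N^2)$ for length-$N$ signals, regardless of the polyspectral order $k$, which is the source of the speedup claimed above.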

Experiment 1

To illustrate the performance of the polyspectral kernel, four sets of data are used: amplitude modulation (AM), frequency modulation (FM), frequency shift keying (FSK), and frequency-hopping spread spectrum (FHSS) samples. Figure 1 shows how the recognition rates vary with the order of the polyspectra for two-class patterns. The classification performance for AM and FM signals is much better than that for FSK and FHSS signals. The results also show that a higher polyspectral order does not always yield a higher classification rate. Naturally, recognition rates decline as the number of classes increases in multi-class classification.

Neighborhood-rough-set-based weighted feature set

Neighborhood rough set

The neighborhood rough set is adopted in this paper not only for feature selection but also for feature evaluation. In rough set theory (RST), a set of all similar objects is called an elementary set, which forms a fundamental atom of knowledge. Any union of elementary sets is called a crisp set; other sets are referred to as rough sets [9]. Reference [10] introduced the neighborhood relation into rough set methodology.

Let an information system be defined by a function $f: U\times A \to V$, where $U$ is the set of all objects, $A$ the set of all attributes, and $V$ the set of all attribute values. A neighborhood decision system is denoted as a 6-tuple $NDS=\{U,A,V,f,\delta,\theta\}$, where $\delta$ is the neighborhood measure function and $\theta$ is the threshold. The lower and upper approximations of the decision $D$ with respect to the attribute subset $B$ are denoted as [10]
$$\underline{N_B}D = \bigcup_{i=1}^{N}\underline{N_B}X_i, \qquad \overline{N_B}D = \bigcup_{i=1}^{N}\overline{N_B}X_i,$$
where
$$\underline{N_B}X = \{x_i \mid \delta_B(x_i) \subseteq X,\ x_i \in U\}, \qquad \overline{N_B}X = \{x_i \mid \delta_B(x_i) \cap X \neq \varnothing,\ x_i \in U\}.$$

Then the decision boundary region of $D$ with respect to the attribute subset $B$ is defined as [10]
$$BN(D) = \overline{N_B}D - \underline{N_B}D.$$

The dependency degree of $D$ on $B$ [10] is defined as the ratio of consistent objects:
$$\gamma_B(D) = \frac{|POS_B(D)|}{|U|},$$
in which $POS_B(D)$ is called the positive region of the decision [10].

Given a decision system $\langle U, C, D\rangle$ with $B\subseteq C$ and $a\in B$, the significance of attribute $a$ in $B$ is defined as [10]
$$SIG(a,B,D) = \gamma_B(D) - \gamma_{B-a}(D).$$

For an attribute $a\notin B$, the significance is defined as [10]
$$SIG(a,B,D) = \gamma_{B\cup a}(D) - \gamma_B(D).$$
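As a hedged illustration of these definitions, the sketch below (our own code, assuming a Euclidean neighborhood and names of our choosing) computes the dependency degree and the significance of an attribute outside $B$:

```python
import numpy as np

def dependency(X, y, delta):
    """gamma_B(D): fraction of samples in the positive region, i.e.
    samples whose delta-neighborhood (Euclidean, over the chosen
    attribute subset) is pure with respect to the decision labels y."""
    consistent = 0
    for i in range(len(y)):
        dist = np.linalg.norm(X - X[i], axis=1)   # distances to sample i
        neighbours = dist <= delta                # delta-neighborhood of sample i
        if np.all(y[neighbours] == y[i]):         # neighborhood is label-pure
            consistent += 1
    return consistent / len(y)

def significance(X, y, B, a, delta):
    """SIG(a, B, D) = gamma_{B + {a}}(D) - gamma_B(D) for a not in B."""
    gamma_b = dependency(X[:, B], y, delta) if B else 0.0
    gamma_ba = dependency(X[:, B + [a]], y, delta)
    return gamma_ba - gamma_b
```

An informative attribute added to an uninformative subset raises the dependency degree, and its significance is positive; a redundant attribute leaves it unchanged.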

In Ref. [11], a forward greedy search algorithm for attribute reduction is proposed, and the minimal feature subset is achieved. More details of how to get the minimal feature subset can be found in Ref. [11].

Neighborhood-rough-set-based weighted feature set

Generally, feature selection aims to evaluate and grade subsets of features so as to remove the most redundant features from the data. The disadvantage is that after the reduction process, the significances $SIG(a,B,D)$ are usually discarded. We argue that they can be used to construct a more effective subset, that is, a "weighted" feature set to be used in the SVM.

The process will be described as follows:

Step 1 Let the raw feature dataset be a matrix $X^{\mathrm T}=\{x_1^{\mathrm T},x_2^{\mathrm T},\ldots,x_N^{\mathrm T}\}$, where $X$ is of size $N\times M_1$, $x_i$ is a feature vector, and $N$ is the number of samples.

Step 2 Using the forward greedy search algorithm of Ref. [11], the matrix $X$ is transformed into a new matrix, denoted $RX^{\mathrm T}=\{r_1^{\mathrm T},r_2^{\mathrm T},\ldots,r_N^{\mathrm T}\}$. $RX$ is of size $N\times M_2$ with $M_2\le M_1$, meaning that the feature set has been reduced by the algorithm.

Step 3 In Step 2, the significance of each attribute selected into the feature subset has been calculated, and the columns $r_{i1}, r_{i2}, \ldots, r_{iM_2}$, $i=1,2,\ldots,N$, are ordered by descending significance. Record the significance of every selected attribute and denote the significances as $\gamma_1,\gamma_2,\ldots,\gamma_{M_2}$, with $\gamma_1\ge\gamma_2\ge\cdots\ge\gamma_{M_2}$.

Step 4 Normalize the reduced dataset $RX$. Let the normalization of $RX$ be
$$z_{i,j} = \frac{r_{ij}-\bar r_j}{s_j}, \quad i=1,2,\ldots,N,\ j=1,2,\ldots,M_2,$$
where
$$\bar r_j = \frac{1}{N}\sum_{i=1}^{N} r_{ij}, \qquad s_j^2 = \frac{1}{N}\sum_{i=1}^{N}\left(r_{ij}-\bar r_j\right)^2.$$

Step 5 In standard SVMs [8], the decision function is obtained as
$$f(x) = \operatorname{sgn}\left[\sum_{i=1}^{l} y_i\alpha_i K(x_i,x)+b\right],$$
where K is the kernel function.

Define $A=\operatorname{diag}(\gamma_1,\gamma_2,\ldots,\gamma_{M_2})$; the decision function is then rewritten as
$$f(x) = \operatorname{sgn}\left[\sum_{i=1}^{l} y_i\alpha_i K(Ax_i,Ax)+b\right].$$
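A minimal sketch of this weighted decision function, assuming a Gaussian base kernel and treating the significances simply as the diagonal scaling $A$ (all names and parameters are ours):

```python
import numpy as np

def weighted_gaussian_kernel(xi, xj, sig, gamma=1.0):
    """K(A xi, A xj) with A = diag(gamma_1, ..., gamma_M2): a Gaussian
    kernel on features rescaled by their significances sig."""
    d = np.asarray(sig, float) * (np.asarray(xi) - np.asarray(xj))
    return float(np.exp(-gamma * np.dot(d, d)))

def decision(x, support_x, support_y, alpha, b, sig):
    """f(x) = sgn( sum_i y_i alpha_i K(A x_i, A x) + b )."""
    s = sum(a * yi * weighted_gaussian_kernel(xi, x, sig)
            for xi, yi, a in zip(support_x, support_y, alpha))
    return np.sign(s + b)
```

A significance of zero removes the corresponding feature from the kernel entirely, so the weighting interpolates between keeping and discarding each selected attribute.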

Experiments for consistency of significances and polyspectral kernels

Experiment 2

The data used in Experiment 1 are also used for the weighted-feature-set experiment in this section. Polyspectra of different orders are chosen as the attributes, and a Gaussian kernel is selected for the SVM in the simulation. A series of neighborhood measure functions $\delta$ is adopted for comparison. Taking the AM samples as an example, with the polyspectral order ranging from 2 to 20, Table 1 shows the significance of every attribute for each value of $\delta$.

Table 1 appears complex, with no obvious pattern to follow, so we examine the classification rate instead.

Experiment 3

When weighted features are used with a multi-class classifier (five classes), Fig. 2 shows that the AM classification rate is better and more stable when $0.2\le\delta\le0.5$. This is the desired outcome: an appropriate value of $\delta$ yields stable and satisfactory, rather than best-case, performance. Experiments on FM and FHSS signals are similar, whereas the performance on FSK is unsatisfactory, as in Experiment 1.

Figure 3 shows the significances of the attributes, namely the $n$th-order polyspectra of the AM samples, for $\delta=0.4$. Compared with Fig. 1, the fluctuations of the significances are, in general, consistent with the curve in Fig. 1. This result reveals two points:

1) The method of neighborhood-rough-set-based weighted feature set and the method of polyspectral kernels are both effective to some degree. As Experiment 1 shows, higher order does not always bring about higher performance of recognition.

2) If the two methods are combined, it becomes easier to choose the polyspectral order and the neighborhood measure function $\delta$; at the least, recognition performance would not be too bad.

To address the issue, an improved algorithm, which can be called weighted kernel, will be proposed in the next section.

Weighted polyspectral kernel

The combined kernel function was proposed in Ref. [12]: different kernel functions are combined by addition and multiplication operators. In this section, we improve the neighborhood-rough-set-based weighted feature set algorithm by using a combined kernel function, called the weighted kernel, described as follows:

1) Assuming the $n$th-order polyspectra, $2\le n\le N$, are chosen as features, denote the corresponding significances as $\lambda_1,\lambda_2,\ldots,\lambda_N$.

2) The combined kernel function is represented as [13]
$$K_{\mathrm{comb}} = \alpha_1 K_1 \oplus \alpha_2 K_2 \oplus \cdots \oplus \alpha_m K_m,$$
where $K_i$, $i=1,2,\ldots,m$, denotes a kernel function, $\oplus$ represents the operator between two kernel functions (addition or multiplication), and $\alpha_i$ is the weight of kernel $K_i$, with $\sum_{i=1}^{m}\alpha_i=1$.

3) Since $\sum_{i=1}^{N}\lambda_i=1$ and the significances have already been evaluated, we construct a weighted kernel as
$$K_{w} = \lambda_1 K_1 + \lambda_2 K_2 + \cdots + \lambda_N K_N.$$
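The weighted kernel is straightforward to sketch; the code below (our own illustrative version) combines arbitrary base kernels with significance weights summing to one:

```python
import numpy as np

def weighted_kernel(x, y, base_kernels, lam):
    """K_w = lam_1 K_1 + ... + lam_N K_N, where the significances
    lam_i sum to one and each K_i is a callable kernel function."""
    lam = np.asarray(lam, float)
    assert np.isclose(lam.sum(), 1.0), "significances must sum to one"
    return float(sum(l * K(x, y) for l, K in zip(lam, base_kernels)))
```

For instance, a linear kernel and a quadratic kernel weighted 0.5 each give a valid kernel, because a convex combination of positive-definite kernels is itself positive definite.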

Experiment 4

The dataset of Experiment 1 is used again. Figure 4 shows the classification rates for AM, FM, FHSS, and FSK signals when the weighted kernel is used. The performance is stable, and the results indicate that polyspectral features are inapplicable to the recognition of FHSS and FSK signals, especially FSK. Moreover, in terms of training time, the weighted polyspectral kernels are much faster than conventional kernels, although they look more complex.

Conclusions

In this paper, we construct polyspectral kernels applied in SVMs and discuss the performance of this algorithm on different signals. To address the problem of fluctuating classification rates, we propose a novel neighborhood-rough-set-based weighted feature set. The experiments show how to choose a suitable neighborhood measure function $\delta$ and also suggest the significance of every feature. However, conventional kernels in SVMs are more expensive to compute than polyspectral kernels; thus, the weighted polyspectral kernel is developed. Although the combined kernel function is more computationally expensive than a single polyspectral kernel, it yields better and more stable recognition performance. A remaining limitation is that the method appears inapplicable to FHSS and FSK signals.

References

[1]

Toonstra J, Kinsner W. Transient analysis and genetic algorithms for classification. In: Proceedings of IEEE WESCANEX 95: Communications, Power, and Computing. 1995, 2: 432-437

[2]

Shaw D, Kinsner W. Multifractal modelling of radio transmitter transients for classification. In: Proceedings of IEEE WESCANEX 97: Communications, Power and Computing. 1997, 306-312

[3]

Sun L, Kinsner W, Serinken N. Characterization and feature extraction of transient signals using multifractal measures. In: Proceedings of 1999 IEEE Canadian Conference on Electrical and Computer Engineering. 1999, 2: 781-785

[4]

Tugnait J K. Detection of non-Gaussian signals using integrated polyspectrum. IEEE Transactions on Signal Processing, 1994, 42(11): 3137-3149

[5]

Liao X J, Bao Z. Circularly integrated bispectra: novel shift invariant features for high resolution radar target recognition. Electronics Letters, 1998, 34(19): 1879-1880

[6]

Xu S H, Huang B X, Huang Y C, Xu Z G. Identification of individual radio transmitters based on selected surrounding-line integral bispectra. In: Proceedings of the 9th International Conference on Advanced Communication Technology. 2007, 2: 1147-1150

[7]

Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167

[8]

Vapnik V. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995

[9]

Pawlak Z. Rough Sets: Theoretical Aspects of Reasoning about Data. Boston: Kluwer Academic Publishers, 1991

[10]

Hu Q H, Yu D R, Xie Z X. Neighborhood classifiers. Expert Systems with Applications, 2008, 34(2): 866-876

[11]

Hu Q H, Yu D R, Liu J F, Wu C X. Neighborhood rough set based heterogeneous feature subset selection. Information Sciences, 2008, 178(18): 3577-3594

[12]

Nguyen H N, Ohn S Y, Park J, Park K S. Combined kernel function approach in SVM for diagnosis of cancer. Lecture Notes in Computer Science, 2005, 3610: 1017-1026

[13]

Tan Y, Wang J. A support vector machine with a hybrid kernel and minimal Vapnik-Chervonenkis dimension. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(4): 385-395

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg
