Data-driven consumer-phase identification in low-voltage distribution networks considering prosumers

Geofrey Mugerwa, Tamer F. Megahed, Maha Elsabrouty, Sobhy M. Abdelkader

Front. Energy, 2024, Vol. 18, Issue (6): 827–840. DOI: 10.1007/s11708-024-0946-4

RESEARCH ARTICLE

Abstract

Knowing the correct phase connectivity information plays a significant role in maintaining high-quality power and a reliable electricity supply to end-consumers. However, managing the consumer-phase connectivity of a low-voltage distribution network is often costly, prone to human error, and time-intensive, as it involves either installing expensive high-precision devices or employing field-based methods. Besides, the ever-increasing electricity demand and the proliferation of behind-the-meter resources have also increased the complexity of the phase connectivity problem. To overcome these challenges, this paper develops a data-driven model that identifies the phase connectivity of end-consumers from advanced metering infrastructure voltage and current measurements. First, a preprocessing method that employs linear interpolation and singular value decomposition is adopted to improve the quality of the smart meter data. Then, using Kirchhoff’s current law and correlation analysis, a discrete convolution optimization model is built to uniquely identify the phase to which each end-consumer is connected. The data sets used are obtained by performing power flow simulations on a modified IEEE-906 test system in OpenDSS. The robustness of the model is tested against data set size, missing smart meter data, measurement errors, and the influence of prosumers. The results show that the proposed method correctly identifies the phase connections of end-consumers with an accuracy of about 98%.

Keywords

consumer-phase identification / data-driven / low-voltage distribution network / advanced metering infrastructure / singular value decomposition

Cite this article

Geofrey Mugerwa, Tamer F. Megahed, Maha Elsabrouty, Sobhy M. Abdelkader. Data-driven consumer-phase identification in low-voltage distribution networks considering prosumers. Front. Energy, 2024, 18(6): 827-840 DOI:10.1007/s11708-024-0946-4

1 Introduction

Low-voltage distribution networks (LVDNs) are designed to deliver electrical energy to hundreds of residential and commercial loads spread over a wide geographical area [1]. As a result, any inefficiency in the operation and control of an LVDN directly affects the reliability of the electricity supply to end-users.

In recent years, management of the LVDNs has received much attention from researchers and power system engineers owing to the deregulation of the energy industry, ever-increasing electricity demand, and the proliferation of behind-the-meter (BTM) resources such as electric vehicles and rooftop photovoltaic (PV) systems [2,3]. One of the emerging challenges in the management of LVDNs is consumer-phase identification (CPI). This complex task involves determining the exact phase to which each end-consumer is connected within a multi-phase distribution network (DN) [4].

Knowing the phase connectivity information is of immense benefit to distribution system operators (DSOs) for executing maintenance and repair services, minimizing power losses, balancing the system loads, controlling bus voltages, detecting power outages [5], managing electricity restoration, and planning the capacity of assets such as transformers [6,7]. In addition, such information is essential for increasing the hosting capacity of BTM resources in DNs [8]. Conversely, inaccurate consumer-phase connectivity may lead to phase imbalance, which increases the dissipation losses of the DN and reduces the life expectancy of the system assets and protection devices.

In practice, the complexity of the LVDN makes the CPI problem difficult to solve, leading to discrepancies between the ground-truth phase connectivity data and the records stored in the databases at the power management center (PMC). For instance, in Fig.1, end-user 30 is connected to phase feeder A according to the records maintained at the PMC, whereas its true connection is to phase feeder B. These inconsistencies are usually caused by changes in the LVDN due to factors such as system reconfiguration, cyberattacks, new customer connections, emergency maintenance and power restoration services, and asset upgrades, which often take place without the original data files being updated [5].

CPI strategies can be categorized into three approaches: field-based, equipment placement, and data-driven. Field-based approaches involve on-site physical inspection, which requires a substantial workforce of utility operators. Thus, such approaches are not suitable for large LVDNs due to their labor-intensive nature and susceptibility to human error [1]. In equipment placement approaches, additional sensors or high-precision devices such as power line carriers (PLCs) [9] and micro-phasor measurement units (µPMUs) [10] need to be installed at different points of the DN. Equipment placement has been successfully implemented in high-voltage (HV) and medium-voltage (MV) networks because of the availability of supervisory control and data acquisition (SCADA) systems [11].

Other CPI methods involve injecting high-frequency signals into each phase at the substation or transformer side [12]. The injected signals are then traced at the consumers’ premises to verify which signals are correctly reproduced, i.e., if the injected and retrieved signals match, the phase through which the signal was injected is the true phase to which the end-consumer is connected. However, the high maintenance costs, the unavailability of SCADA systems, and the need for redundant measuring devices limit the applicability of equipment installation methods in LVDNs. Besides, the implementation of such approaches often results in momentary power disconnections, which interrupt the continuous supply of electricity and incur losses for both end-consumers and DSOs [6]. This lack of flexibility and convenience raises the need for alternative CPI strategies that mitigate the above-mentioned challenges.

Fortunately, the widespread deployment of advanced metering infrastructure (AMI) devices at the consumers’ premises and upstream in the grid has enabled the collection and storage of numerical measurement data. Such data encompass power factor, current, voltage, and active and reactive power measurements, recorded at predetermined intervals, often every 15 min or 1 h [13]. These data open new possibilities for data-driven CPI algorithms in LVDNs. However, only a few previous studies have tackled the CPI problem using AMI data.

Data-driven CPI methods can be classified into voltage-based, power-based, and current-based methods, depending on the input AMI data. These techniques apply well-established power engineering and data analytics concepts, such as correlation, unsupervised machine learning, 0–1 linear integer programming (LIP), linear regression, and probability theory, to solve the CPI problem. Voltage-based correlation methods are proposed in Refs. [14–16] to determine the phase connectivity of the end-consumers. In Yan et al. [14], distance and trend similarity are used to correct errors in the phase label files using Pearson correlation and the discrete Fréchet distance. Its limitation is the dependence of the results on geographical information system (GIS) data, which makes the approach prone to GIS-based errors. Besides, GIS data are generally not available for LVDNs. Multi-dimension calibration is applied in Zhou et al. [15], where multiple voltage correlation coefficients and a location index are used. Phases are then identified based on the highest voltage correlation coefficient and the proximity of the consumers to the feeder bus. In Izadi & Mohsenian-Rad [16], granular window and reliability analysis are used, and phases are assigned to smart meters based on the highest correlation factors. However, the sensitivity of the correlation results to measurement errors may lead to incorrect phase identification.

Unsupervised machine learning is proposed in Refs. [17–20], where the CPI problem is solved by applying data clustering techniques to the voltage measurements. Nonnegative matrix factorization-based reduction and label propagation are proposed in Chen et al. [17], and phases are assigned to smart meters using graph theory analysis. In Chiu et al. [18], a Fourier transformation is first applied to reduce the dimensions of the voltage matrix, and clustering is then used to group consumers into three clusters, each representing one of the transformer phases. A similar approach is adopted in Zaragoza & Rao [19], where one minus the Pearson correlation coefficient is used as the dissimilarity metric to cluster the end-users. A majority vote is then employed to assign each cluster to the phase to which most of its end-users are connected. In Ma et al. [20], constrained k-means and k-nearest neighbor (KNN) algorithms are employed to predict the phase connections of smart meters in an LVDN; the algorithm uses the similarity between users in the same cluster as a constraint to group the end-consumers. Generally, data clustering techniques are promising candidates for solving the phase connectivity problem in LVDNs. However, such methods may require visual site inspection by utility engineers to assign phases to clusters, which makes the overall identification process costly, time-consuming, and prone to human error. Besides, CPI methods that rely entirely on the correlation of voltage measurements may fail to determine the correct phase connections of end-users when the LVDN is well balanced, because the end-users of such systems tend to have similar voltage characteristics, making it difficult to identify phases by simply calculating correlation coefficients. Additionally, correlation-based methods fail to uniquely identify the phase to which each end-consumer is connected.
In the present paper, the identification results of the proposed CPI method are compared with those of the approaches in Refs. [19,20].

In contrast to voltage-based methods, power-based approaches apply the energy balance principle to the feeder and end-consumer smart meter power data. The phase connections of the consumers are then determined by minimizing the difference between the two data sets. For instance, Bayesian inference is used in García et al. [5] to estimate the likelihood of the end-users’ connection to each of the three phases of the DN by applying Bayes’ theorem. Wavelet and similarity analysis are applied in Peng et al. [21], where the power data are first passed through a time-frequency filter and an influence factor to eliminate the low-frequency power components. Pearson correlation is then used to identify the phase connection sequences of the end-users. Both approaches provide promising results. However, the impact of local generation on the accuracy of the identification results is not considered.

Current-based CPI approaches work similarly to power-based methods, except that the nodal current law is used instead of the energy balance. For example, quadratic programming and probability distribution are proposed in Zhou et al. [22] to identify the phase connectivity of end-consumers in an LVDN using smart meter current measurements. The approach applies optimization to solve the CPI problem, and the identification results are enhanced using Monte Carlo simulations. In Yi et al. [23], the CPI problem is first formulated as an LIP problem, after which stepwise regression analysis is employed. The phase connections of consumers are then identified based on the significance values and correlation factors. Its disadvantage is the high computational time. This limitation is overcome in the present study by introducing a softargmax activation function. A summary of the advantages of the proposed identification model over the existing methods is presented in Tab.1.

Based on the above discussion, most of the available CPI techniques require accurate measurements and can only tolerate small errors. However, in practical distribution systems, measurement errors are inevitable. Besides, the aging of AMI devices degrades their accuracy, which further increases the errors in the smart meter measurements [24]. Moreover, the transition toward carbon-free power systems has resulted in high penetration levels of BTM resources in LVDNs. Owing to this transformation, LVDNs now contain prosumers with small-sized rooftop PVs, battery storage, or electrified heating systems. These consumers are usually connected to single-phase feeders, and they consume little or no energy from the main grid during certain periods of the day or year. As a result, such end-users contribute to power quality problems and phase unbalance, which disrupt the normal operation of the LVDN. Hence, to effectively manage the LVDN, utility companies need to know the phase connectivity of prosumers.

In summary, despite the numerous benefits associated with CPI in LVDNs, identifying the correct phase connectivity of the end-consumers in the presence of smart meter measurement errors, missing data, and prosumers remains a serious challenge that needs to be investigated. To address these research gaps, the present study develops a data-driven phase identification model based on the AMI data of the phase feeders and consumers. The key contributions of the study are as follows. A matrix approximation model based on singular value decomposition (SVD) is built to remove noise from the smart meter voltage data, which enhances the phase identification results. Moreover, a model based on Kirchhoff’s current law (KCL) is proposed to uniquely identify the phase connections of end-consumers in an LVDN. In particular, a discrete convolution-based LIP optimization algorithm that balances the current at the phase feeder and consumer sides is developed. The computational burden of solving the LIP problem is circumvented by introducing a softargmax function that converts the integer variables to probabilities. Furthermore, the impact of prosumers on the accuracy of the identification results is investigated.

2 Consumer-phase identification in low-voltage distribution networks

2.1 Mathematical description of consumer-phase connectivity problem

The LVDN used in this study is the modified IEEE-906 LV test system shown in Fig.2. This network is connected to a medium-voltage DN through a low-voltage transformer that steps down the voltage from 11 to 0.416 kV at a frequency of 50 Hz. The system has 116 nodes and supplies electricity to a total of 55 residential consumers. Of these end-consumers, 21 are connected to phase feeder A, 19 to B, and 15 to C [25]. Consequently, the phases and consumers are indexed by $\ell \in L$ and $n \in \{1, 2, 3, \ldots, N\}$, respectively, where $\ell$ represents the phases, i.e., $L = \{A, B, C\}$, $n$ denotes each connected end-consumer, and $N$ is the total number of consumers, i.e., $N = 55$. The mapping between phases and end-consumers is given by a $3 \times N$ connectivity matrix $\boldsymbol{\beta}_{\ell,n} = [x_{\ell,n}]$, where $x_{\ell,n} \in \{0, 1\}$. The binary elements represent the connection between each phase $\ell$ and end-consumer $n$, i.e., 1 indicates that consumer $n$ is connected to phase $\ell$, while 0 implies that the consumer is not connected to that phase.

2.2 Data sets utilized for consumer-phase identification

The data sets utilized in this study are synthetic smart meter voltage and current measurements obtained by performing power flow calculations using OpenDSS software. The daily measurements of each phase feeder and end-consumers consist of 96 data points, i.e., the interval between the measurements is 15 min. It should also be noted that the phase and consumer measurement data are assumed to be time-synchronized, i.e., there are no time discrepancies between the measurement instants.

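Let $\mathbf{E}_n$ and $\mathbf{I}_n$ represent the smart meter voltage and current measurements of consumer $n$, given by Eqs. (1) and (2), respectively.

$$\mathbf{E}_n = [E_{1,n}, E_{2,n}, \ldots, E_{m,n}, \ldots, E_{M,n}]^{\mathrm{T}}, \quad m \in \{1, 2, 3, \ldots, M\}, \ n \in \{1, 2, 3, \ldots, N\}, \tag{1}$$

$$\mathbf{I}_n = [I_{1,n}, I_{2,n}, \ldots, I_{m,n}, \ldots, I_{M,n}]^{\mathrm{T}}, \quad m \in \{1, 2, 3, \ldots, M\}, \ n \in \{1, 2, 3, \ldots, N\}, \tag{2}$$

where $M$ denotes the total number of measurements, and $E_{m,n}$ and $I_{m,n}$ are the average voltage and current smart meter measurements of consumer $n$ at instant $m$.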

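Similarly, $V_{m,\ell}$ and $I_{m,\ell}$ are the average feeder-head voltage and current measurements of phase $\ell$ at instant $m$, and the corresponding data vectors are given by Eqs. (3) and (4), respectively.

$$\mathbf{V}_\ell = [V_{1,\ell}, V_{2,\ell}, V_{3,\ell}, \ldots, V_{m,\ell}, \ldots, V_{M,\ell}]^{\mathrm{T}}, \quad \ell \in L, \ m \in \{1, 2, 3, \ldots, M\}, \tag{3}$$

$$\mathbf{I}_\ell = [I_{1,\ell}, I_{2,\ell}, I_{3,\ell}, \ldots, I_{m,\ell}, \ldots, I_{M,\ell}]^{\mathrm{T}}, \quad \ell \in L, \ m \in \{1, 2, 3, \ldots, M\}, \tag{4}$$

where $\mathbf{V}_\ell$ and $\mathbf{I}_\ell$ represent the voltage and current measurements of phase $\ell$.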

The approach employed in this study is built on two data characteristic concepts of the LVDN, i.e., voltage similarity and current balance, as elaborated below.

1) The voltage measured at any consumer node is governed by Ohm’s law and Kirchhoff’s voltage law (KVL) and varies with time [26]. Consequently, end-consumers sharing the same phase feeder exhibit similar voltage characteristics and strong correlation coefficients. For example, in Fig.1, consumers $1, 2, \ldots, 2n$ are connected to the same phase feeder A. Thus, the nodal voltages $V_1, V_2, \ldots, V_{2n}$ of these consumers are closely correlated with each other and with the supply voltage $V_A$. Based on this voltage similarity, the connectivity between consumers and phase feeders can be established using correlation analysis techniques such as Pearson correlation, as discussed in Section 4.

2) According to KCL, the algebraic sum of the currents entering a node equals the algebraic sum of the currents leaving it [22]. Thus, at any given instant, the aggregate current of all the consumers served by the same phase feeder is ideally equal to the total current supplied by that phase. For instance, for phase feeder A in Fig.1, $I_1 + I_2 + \cdots + I_{2n} = I_A$. However, in practical DNs, there is always a difference between the sum of the currents leaving a node and the current entering that node. This may be caused by factors such as energy theft, the limited accuracy of the metering devices, and leakage current. Thus, at any given instant $m$, the connectivity between the consumers and phase $\ell$ is established as given in Eq. (5).

$$\sum_{n=1}^{N} I_{m,n}\, x_{\ell,n} + \xi_{m,\ell} = I_{m,\ell}, \tag{5}$$

where $\xi_{m,\ell}$ denotes the current error of phase $\ell$ at instant $m$, and $x_{\ell,n}$ is the binary variable representing the connectivity of end-consumer $n$ to phase $\ell$.

2.3 Problem formulation of current balance optimization model proposed

In this section, a 0–1 LIP optimization model based on KCL is established. The integer variables {0, 1} of the LIP problem are then relaxed to probability variables in [0, 1] to make the CPI problem computationally tractable.

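Considering a three-phase LVDN with $N$ single-phase end-consumers, the current matrices of the phase feeders and consumers can be expressed as $\mathbf{I}_{M,\ell} \in \mathbb{R}^{M\times 3}$ and $\mathbf{I}_{M,n} \in \mathbb{R}^{M\times N}$, given by Eqs. (6) and (7), respectively.

$$\mathbf{I}_{M,\ell} = \begin{bmatrix} I_{1,A} & I_{1,B} & I_{1,C} \\ \vdots & \vdots & \vdots \\ I_{M,A} & I_{M,B} & I_{M,C} \end{bmatrix}_{(M\times 3)}, \tag{6}$$

$$\mathbf{I}_{M,n} = \begin{bmatrix} I_{1,1} & \cdots & I_{1,N} \\ \vdots & \ddots & \vdots \\ I_{M,1} & \cdots & I_{M,N} \end{bmatrix}_{(M\times N)}. \tag{7}$$

The relationship between the phase feeders and the consumers is described by the connectivity matrix $\boldsymbol{\beta}_{\ell,n} \in \mathbb{R}^{3\times N}$, given by Eq. (8).

$$\boldsymbol{\beta}_{\ell,n} = \begin{bmatrix} x_{A,1} & \cdots & x_{A,N} \\ x_{B,1} & \cdots & x_{B,N} \\ x_{C,1} & \cdots & x_{C,N} \end{bmatrix}_{(3\times N)}. \tag{8}$$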

Since each end-user is served by exactly one phase feeder, Eq. (9) must be satisfied.

$$\sum_{\ell \in L} x_{\ell,n} = 1, \quad n \in \{1, 2, 3, \ldots, N\}, \ x_{\ell,n} \in \{0, 1\}. \tag{9}$$

From Eq. (8), the connectivity of the end-users to phase $\ell$ is given by the row vector in Eq. (10).

$$\boldsymbol{\beta}_\ell = [x_{\ell,1}, x_{\ell,2}, x_{\ell,3}, \ldots, x_{\ell,N}]. \tag{10}$$

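Based on Eq. (10), the connectivity matrix for all the connected end-users can be rewritten by stacking the three row vectors, as given in Eq. (11).

$$\boldsymbol{\beta} = [\boldsymbol{\beta}_A^{\mathrm T}, \boldsymbol{\beta}_B^{\mathrm T}, \boldsymbol{\beta}_C^{\mathrm T}]^{\mathrm T} \in \mathbb{R}^{3\times N}. \tag{11}$$

To guarantee that each end-user connects to only a single phase feeder, Eq. (9) is further expressed in the more concise matrix form of Eq. (12), in which the three row vectors of Eq. (10) are concatenated side by side.

$$[\boldsymbol{\beta}_A, \boldsymbol{\beta}_B, \boldsymbol{\beta}_C]\, \mathbf{R}_3 = \mathbf{P}^{\mathrm T}, \tag{12}$$

where $\mathbf{R}_3 = [\mathbf{I}_N, \mathbf{I}_N, \mathbf{I}_N]^{\mathrm T}$ is a $3N \times N$ matrix composed of three $N \times N$ identity matrices $\mathbf{I}_N$, and $\mathbf{P}$ is an $N \times 1$ column vector of ones.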

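Accordingly, the connectivity between phases and consumers is established by extending Eq. (5) to the matrix form in Eq. (13).

$$\mathbf{I}_{M,n}\, \boldsymbol{\beta}^{\mathrm T} + \boldsymbol{\xi}_{M,\ell} = \mathbf{I}_{M,\ell}, \tag{13}$$

where $\boldsymbol{\xi}_{M,\ell}$ is the current error matrix with dimensions $M\times 3$.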
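As a small illustration of Eqs. (12) and (13), the Python/NumPy sketch below builds the binary matrix β from one phase label per consumer, checks the one-phase-per-consumer condition, and computes the current error matrix ξ. The function names and the toy currents (including the 0.1 A unmetered excess) are hypothetical and chosen only for illustration.

```python
import numpy as np

PHASES = ("A", "B", "C")


def connectivity_matrix(labels, phases=PHASES):
    """Build the 3 x N binary connectivity matrix beta from one phase label per consumer."""
    beta = np.zeros((len(phases), len(labels)), dtype=int)
    for n, label in enumerate(labels):
        beta[phases.index(label), n] = 1
    return beta


def current_error(i_cons, i_phase, beta):
    """Return xi = I_phase - I_cons @ beta.T (an M x 3 matrix), as in Eq. (13).

    i_cons: M x N consumer currents; i_phase: M x 3 feeder-head currents.
    """
    # Eq. (12): the three phase rows must add up to a vector of ones (one phase per consumer).
    if not np.array_equal(beta.sum(axis=0), np.ones(beta.shape[1], dtype=int)):
        raise ValueError("each consumer must be connected to exactly one phase")
    return i_phase - i_cons @ beta.T


# Toy example: 2 instants, 4 consumers on phases A, B, A, C.
i_cons = np.array([[1.0, 2.0, 3.0, 4.0],
                   [2.0, 2.0, 2.0, 2.0]])
beta = connectivity_matrix(["A", "B", "A", "C"])
i_phase = i_cons @ beta.T + 0.1  # feeder currents with a hypothetical 0.1 A unmetered excess
xi = current_error(i_cons, i_phase, beta)
```

With the correct β, every entry of ξ equals the injected 0.1 A mismatch; a wrong assignment would shift current between columns of ξ and enlarge its norm, which is what the optimization model exploits.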

The CPI problem in Eq. (13) is a 0–1 LIP problem, and the objective is to determine the integer elements of matrix β. However, solving Eq. (13) directly requires exponential time in the worst case to obtain an optimal solution. Moreover, for large LVDNs, the dimensions of matrix β also grow, making the problem computationally expensive to solve in terms of memory and CPU time. To overcome these drawbacks, the binary elements of matrix β are converted to probabilities by introducing a softargmax function. This function exponentiates each element of the input vector and then divides the results by the sum of all the exponentiated values, as given in Eq. (14) [27].

$$h_{\ell,n} = \frac{e^{x_{\ell,n}}}{\sum_{j=1}^{N} e^{x_{\ell,j}}}, \quad n \in \{1, 2, \ldots, N\}. \tag{14}$$

Equation (14) is used to convert the binary elements of matrix β to probability variables, i.e., $h_{\ell,n} \in [0, 1]$.
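The softargmax of Eq. (14) can be sketched in NumPy as follows. Eq. (14) as printed normalizes over the consumer index j, whereas the constraint in Eq. (18) bounds the sum of each consumer's probabilities over the three phases; because the normalization axis used in the authors' implementation is not stated in this section, the sketch exposes it as a parameter (axis=0 normalizes over phases, axis=1 over consumers). This is an assumption-laden illustration, not the authors' code.

```python
import numpy as np


def softargmax(x, axis=0):
    """Numerically stable softargmax (softmax) of x along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)  # subtracting the max leaves the result unchanged
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


# Relax a binary 3 x N matrix (rows A, B, C) into connection probabilities.
beta = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]], dtype=float)
h_per_consumer = softargmax(beta, axis=0)  # each column sums to 1 over the three phases
h_per_phase = softargmax(beta, axis=1)     # each row sums to 1 over the consumers, as Eq. (14) is printed
```

Applied to a binary matrix, the connected phase of each consumer receives the largest probability, e/(e + 2) ≈ 0.576, so the relaxation preserves the ranking encoded in β while making it continuous.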

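Let $\mathbf{H}_{\ell,n} \in \mathbb{R}^{3\times N}$ be the matrix containing the connectivity probabilities of the end-users to each of the three phases; Eq. (8) is then modified to Eq. (15).

$$\mathbf{H}_{\ell,n} = \begin{bmatrix} h_{A,1} & \cdots & h_{A,N} \\ h_{B,1} & \cdots & h_{B,N} \\ h_{C,1} & \cdots & h_{C,N} \end{bmatrix}_{(3\times N)}. \tag{15}$$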

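Thus, based on KCL, the proposed discrete convolution CPI model is established as in Eq. (16).

$$\mathbf{I}_{M,n} \ast \mathbf{H}_{\ell,n} + \boldsymbol{\xi}_{M,\ell} = \mathbf{I}_{M,\ell}, \tag{16}$$

where $\ast$ is the discrete convolution operator, used here to calculate the weighted sum of the products of the consumer currents and the connectivity probabilities, as given in Eq. (17).

$$\left(\mathbf{I}_{M,n} \ast \mathbf{H}_{\ell,n}\right)_{m,\ell} = \sum_{n=1}^{N} I_{m,n}\, h_{\ell,n}, \quad \ell \in L, \ m \in \{1, 2, 3, \ldots, M\}. \tag{17}$$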

From Eq. (16), as the row vector of matrix $\mathbf{H}_{\ell,n}$ slides over the input matrix $\mathbf{I}_{M,n}$, the weighted sum is calculated and compared with the output matrix $\mathbf{I}_{M,\ell}$ to iteratively update the elements of the connectivity matrix until an optimal solution is obtained. Hence, substituting Eq. (17) into Eq. (16) and rearranging gives the proposed discrete convolution-based current balance optimization model, expressed in Eq. (18).

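Objective function I: Minimizing the difference between the current measurements

$$\min \sum_{\ell \in L}\sum_{m=1}^{M} \left(\xi_{m,\ell}\right)^2 = \min \sum_{\ell \in L}\sum_{m=1}^{M} \left\| I_{m,\ell} - \sum_{n=1}^{N} I_{m,n}\, h_{\ell,n} \right\|_2^2, \tag{18}$$

$$\text{subject to} \quad \sum_{\ell \in L} h_{\ell,n} \le 1, \quad n \in \{1, 2, 3, \ldots, N\}, \ h_{\ell,n} \in [0, 1].$$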

Considering that the current supplied by each phase feeder should not be less than the aggregate current of the end-consumers connected to that particular phase, another objective function in Eq. (19) is added to limit the negative current difference.

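Objective function II: Minimizing the negative current difference

$$\min \sum_{\ell \in L}\sum_{m=1}^{M} \max\left(0, \sum_{n=1}^{N} I_{m,n}\, h_{\ell,n} - I_{m,\ell}\right), \tag{19}$$

$$\text{subject to} \quad \sum_{n=1}^{N} I_{m,n}\, h_{\ell,n} > I_{m,\ell}.$$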

Hence, the CPI problem is solved by optimizing the two objective functions in Eqs. (18) and (19).
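Evaluated over all instants at once, the weighted sum of Eq. (17) is simply the product of the M × N consumer-current matrix with the transpose of the 3 × N probability matrix. The sketch below computes the two objectives of Eqs. (18) and (19) for a given H. How the two objectives are weighted or combined during optimization is not specified in this section, so the sketch only evaluates them; the function names are illustrative.

```python
import numpy as np


def weighted_sum(i_cons, h):
    """Eq. (17): element [m, l] is sum_n I[m, n] * h[l, n]; returns an M x 3 matrix."""
    return i_cons @ h.T


def objective_1(i_cons, i_phase, h):
    """Eq. (18): sum over phases and instants of the squared current error xi."""
    xi = i_phase - weighted_sum(i_cons, h)
    return float(np.sum(xi ** 2))


def objective_2(i_cons, i_phase, h):
    """Eq. (19): total amount by which the aggregated consumer current exceeds the feeder current."""
    excess = weighted_sum(i_cons, h) - i_phase
    return float(np.sum(np.maximum(0.0, excess)))


i_cons = np.array([[1.0, 2.0],
                   [3.0, 4.0]])          # 2 instants x 2 consumers
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])               # consumer 1 on A, consumer 2 on B
i_phase = np.array([[0.5, 2.0, 0.0],
                    [3.0, 4.0, 1.0]])    # measured feeder-head currents of A, B, C
j1 = objective_1(i_cons, i_phase, h)
j2 = objective_2(i_cons, i_phase, h)
```

In this toy case, phase A at the first instant is "over-assigned" by 0.5 A, which both objectives penalize, while the 1 A of unexplained current on phase C at the second instant only increases objective I.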

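To initialize the binary elements of matrix $\boldsymbol{\beta}_{\ell,n}$, the voltage correlation coefficients between the phase feeders and consumers are calculated using Pearson’s correlation, as given in Eq. (20) [4].

$$\tau(\mathbf{V}_\ell, \mathbf{E}_n) = \frac{\sum_{m=1}^{M} V_{m,\ell}E_{m,n} - \frac{1}{M}\sum_{m=1}^{M} V_{m,\ell}\sum_{m=1}^{M} E_{m,n}}{\sqrt{\left(\sum_{m=1}^{M} V_{m,\ell}^2 - \frac{1}{M}\left(\sum_{m=1}^{M} V_{m,\ell}\right)^2\right)\left(\sum_{m=1}^{M} E_{m,n}^2 - \frac{1}{M}\left(\sum_{m=1}^{M} E_{m,n}\right)^2\right)}}. \tag{20}$$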

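The elements of the correlation matrix are then set to binary values based on a threshold voltage correlation coefficient $\tau_0$, as expressed in Eq. (21),

$$\tau(\mathbf{V}_\ell, \mathbf{E}_n)_{\mathrm{int}} = \begin{cases} 1, & \tau(\mathbf{V}_\ell, \mathbf{E}_n) \ge \tau_0, \\ 0, & \text{otherwise}, \end{cases} \tag{21}$$

where $\tau(\mathbf{V}_\ell, \mathbf{E}_n)_{\mathrm{int}}$ represents the integer form of the voltage correlation coefficients used to initialize $\boldsymbol{\beta}_{\ell,n}$.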

In this study, $\tau_0$ is set to 0.950 after several trials, and at every iteration, a step size of $10^{-4}$ is used to update the element weights of the connectivity matrix. The process is repeated until a stopping condition is satisfied, i.e., the algorithm terminates when it reaches the set number of iterations or when the current error $\xi_{m,\ell}$ is less than or equal to the tolerance value. In this study, the algorithm is run for a fixed number of iterations, and the values of $\mathbf{H}_{\ell,n}$ from the iteration with the smallest current error are taken as the end-consumers’ phase connection probabilities.
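The correlation-based initialization of Eqs. (20) and (21) can be sketched in NumPy as follows, computing all 3 × N coefficients at once with the threshold τ0 = 0.950 reported above. Note that this initialization can mark a consumer on more than one phase, or on none, when several or no coefficients reach τ0. The function names and the synthetic voltages are hypothetical.

```python
import numpy as np


def pearson_matrix(v_phase, e_cons):
    """Eq. (20): Pearson coefficients between feeder voltages (M x 3) and consumer voltages (M x N).

    Returns a 3 x N matrix whose element [l, n] is tau(V_l, E_n).
    """
    m = v_phase.shape[0]
    sv, se = v_phase.sum(axis=0), e_cons.sum(axis=0)
    num = v_phase.T @ e_cons - np.outer(sv, se) / m
    var_v = (v_phase ** 2).sum(axis=0) - sv ** 2 / m
    var_e = (e_cons ** 2).sum(axis=0) - se ** 2 / m
    return num / np.sqrt(np.outer(var_v, var_e))


def threshold_init(tau, tau0=0.950):
    """Eq. (21): 1 where the coefficient reaches the threshold tau0, 0 otherwise."""
    return (tau >= tau0).astype(int)


rng = np.random.default_rng(0)
v_phase = 230.0 + rng.normal(0.0, 2.0, size=(96, 3))  # one day of synthetic 15-min feeder voltages
# Four synthetic consumers on phases A, B, A, C: a small voltage drop plus measurement noise.
e_cons = v_phase[:, [0, 1, 0, 2]] - 1.0 + rng.normal(0.0, 0.05, size=(96, 4))
tau = pearson_matrix(v_phase, e_cons)
beta0 = threshold_init(tau)
```

The summation form mirrors Eq. (20) directly; for long series of voltages near 230 V, centering each column before multiplying is numerically more robust and gives the same coefficients.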

The values of $h_{\ell,n}$ obtained from Eq. (18) are fractional. Therefore, to clearly determine the consumers’ phase connections and strictly ensure that each end-consumer is assigned to only one phase, the $h_{\ell,n}$ variables are converted to 1 or 0. This is achieved by introducing a matrix $\mathbf{A}_{\ell,n}$ with the same dimensions as $\mathbf{H}_{\ell,n}$, i.e., $\mathbf{A}_{\ell,n} \in \mathbb{R}^{3\times N}$. At the start, all the elements of $\mathbf{A}_{\ell,n}$ are initialized to zero, and the subsequent weights are then set as described in Tab.2. The overall phase identification process of the model is illustrated in Fig.3.
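Tab. 2, which specifies how the weights of A are set, is not reproduced in this section. A common rule, assumed here only for illustration and not necessarily identical to Tab. 2, is to set A[ℓ, n] = 1 for the phase with the largest probability in each column and 0 elsewhere, which guarantees exactly one phase per consumer:

```python
import numpy as np


def binarize(h):
    """Map a 3 x N probability matrix H to a 0/1 matrix A with exactly one 1 per column.

    Assumed rule (hypothetical, not necessarily Tab. 2): keep the most probable phase.
    """
    h = np.asarray(h, dtype=float)
    a = np.zeros(h.shape, dtype=int)
    a[np.argmax(h, axis=0), np.arange(h.shape[1])] = 1
    return a


h = np.array([[0.7, 0.1, 0.2],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.5]])  # columns: three consumers; rows: phases A, B, C
a = binarize(h)
```

If two phases tie for a consumer, `np.argmax` returns the first one, so a practical implementation may want to flag such consumers for review instead.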

2.4 Influence of prosumers

For prosumers, the energy drawn from the main grid is reduced, because part of the energy they consume is supplied by BTM resources, i.e., micro-generation renewable sources such as rooftop PV and/or residential storage systems. Depending on the BTM generating capacity, the resulting net energy consumption can be negative, zero, or close to zero. When the net energy consumption of the end-users is negative, the LVDN experiences reverse current (power) flow, which alters the voltage profiles of the end-users. Therefore, since the proposed phase identification method is initialized based on the voltage correlation between phases and consumers, it is only applicable to scenarios where the net energy consumption of end-users with BTM resources is zero or close to zero. In general, the mathematical formulation of the proposed CPI method applies to end-consumers with and without BTM resources, provided their net energy consumption is not negative. The impact of these end-users on the performance of the proposed identification model is investigated in Section 4.

3 Data pre-processing

In practical DNs, the data sets collected from the AMI devices often contain missing data points and noise, which may lead to misidentification of the consumers’ phase connectivity. Therefore, to enhance the phase identification process, data pre-processing is needed.

3.1 Recovering missing end-consumers’ smart meter voltage data

As discussed in Section 2, the voltage measurements of a particular consumer at different timestamps are linearly correlated. Consequently, for smart meter voltage series with missing data points, linear techniques can be used to estimate the missing values. In this paper, linear interpolation and extrapolation are adopted to fill in the missing data based on Eq. (22).

$$E_m^{\mathrm{missing}} = E_{1,n} + \left(\frac{E_{2,n} - E_{1,n}}{m_2 - m_1}\right)(m - m_1), \tag{22}$$

where $E_m^{\mathrm{missing}}$ is the missing smart meter voltage at instant $m$, and $E_{1,n}$ and $E_{2,n}$ are the known voltage measurements at instants $m_1$ and $m_2$, respectively.
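A minimal sketch of Eq. (22) in NumPy, assuming missing samples are stored as NaN and instants are the sample indices: `np.interp` handles interior gaps, and because it clamps outside the known range, the leading and trailing gaps are extrapolated explicitly from the two nearest known samples. The function name is hypothetical.

```python
import numpy as np


def fill_missing(e):
    """Fill NaN gaps in one consumer's voltage series using Eq. (22).

    Interior gaps are interpolated between the nearest known samples; leading and
    trailing gaps are extrapolated from the first two or last two known samples.
    """
    e = np.asarray(e, dtype=float)
    m = np.arange(e.size, dtype=float)  # measurement instants (sample index)
    known = ~np.isnan(e)
    if known.sum() < 2:
        raise ValueError("at least two known samples are required")
    mk, ek = m[known], e[known]
    filled = np.interp(m, mk, ek)  # np.interp clamps outside [mk[0], mk[-1]] ...
    left, right = m < mk[0], m > mk[-1]  # ... so both ends are extrapolated explicitly
    filled[left] = ek[0] + (ek[1] - ek[0]) / (mk[1] - mk[0]) * (m[left] - mk[0])
    filled[right] = ek[-2] + (ek[-1] - ek[-2]) / (mk[-1] - mk[-2]) * (m[right] - mk[-2])
    return filled


series = np.array([np.nan, 231.0, np.nan, 233.0, np.nan])
filled = fill_missing(series)
```

For an M × N voltage matrix, the same function can be applied to every consumer column with `np.apply_along_axis(fill_missing, 0, E)`.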

3.2 Noise in consumers’ smart meter data

The noise model considered in this study is additive Gaussian noise (AGN), which follows a normal distribution with zero mean and variance $\sigma_{m,n}^2$. The value of $\sigma_{m,n}^2$ depends directly on the measured quantity and on the tolerance ($E_{\mathrm{class}}$) of the metering device [5,28]. The distribution of the AGN at instant $m$ is given by Eq. (23), where the maximum error $E_{\mathrm{class}}X_{m,n}$ is taken as three standard deviations $\sigma_{m,n}$, i.e., 99.73% of the measurement errors are assumed to lie within $\pm 3\sigma_{m,n}$ of the true value [23].

$$\gamma_{m,n} \sim \mathcal{N}\left(0,\ \sigma_{m,n}^2 = \left(\frac{E_{\mathrm{class}}\, X_{m,n}}{3}\right)^2\right), \tag{23}$$

where $\gamma_{m,n}$ is the AGN, and $X_{m,n}$ is the consumer’s smart meter data at instant $m$.

According to the International Electrotechnical Commission (IEC), smart meters deployed in DNs should be of accuracy classes 2.0, 1.0, 0.5, and 0.1s, with corresponding accuracy errors ($E_{\mathrm{class}}$) of ±2%, ±1%, ±0.5%, and ±0.1%, respectively [4]. In this study, AGN is added to the end-consumers’ voltage and current data by randomly injecting different noise levels, and the resulting noisy data $X^{\mathrm{noisy}}_{m,n}$ are given by Eq. (24).

$$X^{\mathrm{noisy}}_{m,n} = X_{m,n} + \gamma_{m,n}. \tag{24}$$

Hence, practical smart meter data always take the form of Eq. (24), and the impact of such data on the phase identification accuracy is discussed in Section 4. The following subsection discusses the removal of noise from the voltage data.
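The noise injection of Eqs. (23) and (24) can be sketched as below, with `e_class` given as a fraction (e.g., 0.01 for a ±1% meter). Taking the absolute value of X so that σ stays non-negative for negative readings is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np


def add_meter_noise(x, e_class, rng=None):
    """Eqs. (23)-(24): X_noisy = X + gamma, gamma ~ N(0, (e_class * X / 3)^2).

    e_class is the accuracy error as a fraction, e.g. 0.01 for a +/-1% meter.
    np.abs keeps sigma non-negative if X can be negative (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    sigma = e_class * np.abs(x) / 3.0
    return x + sigma * rng.standard_normal(x.shape)


rng = np.random.default_rng(1)
clean = np.full(20000, 230.0)  # constant 230 V samples, only to inspect the noise statistics
noisy = add_meter_noise(clean, e_class=0.01, rng=rng)
```

For a class 1.0 meter at 230 V, σ = 0.01 × 230 / 3 ≈ 0.77 V, so about 99.73% of the injected errors stay within ±2.3 V, i.e., within the ±1% class limit.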

3.3 Denoising end-consumers’ smart meter voltage using singular value decomposition

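SVD is a matrix factorization that can be used to denoise data sets by approximating the noisy data matrix with a low-rank matrix close to its original noiseless form [29]. SVD factorizes an $M\times N$ matrix, in this case the end-users’ voltage matrix $\mathbf{E}^{\mathrm{noisy}} \in \mathbb{R}^{M\times N}$, into three matrices, $\mathbf{F}$, $\mathbf{S}$, and $\mathbf{Q}^{\mathrm T}$, as given in Eq. (25) [29].

$$\mathbf{E}^{\mathrm{noisy}} = \mathbf{F}\mathbf{S}\mathbf{Q}^{\mathrm T} = \sum_{m=1}^{\min(M,N)} s_m \tilde{\mathbf f}_m \tilde{\mathbf q}_m^{\mathrm T}, \tag{25}$$

where $\mathbf{F} \in \mathbb{R}^{M\times M}$ and $\mathbf{Q} \in \mathbb{R}^{N\times N}$ are unitary (for real data, orthogonal) matrices whose columns are orthonormal, i.e., $\mathbf{F}$ and $\mathbf{Q}$ satisfy the condition in Eq. (26); and $\mathbf{S} \in \mathbb{R}^{M\times N}$ is a rectangular diagonal matrix with non-negative real elements on the diagonal and zeros elsewhere.

$$\mathbf{F}\mathbf{F}^{\mathrm T} = \mathbf{F}^{\mathrm T}\mathbf{F} = \mathbf{I}_M, \quad \mathbf{Q}\mathbf{Q}^{\mathrm T} = \mathbf{Q}^{\mathrm T}\mathbf{Q} = \mathbf{I}_N. \tag{26}$$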

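For practical systems, $M$ is always greater than $N$, i.e., $M > N$, so matrix $\mathbf{S}$ has at most $N$ nonzero entries on its diagonal and can be written as $\mathbf{S} = \begin{bmatrix} \hat{\mathbf S} \\ \mathbf 0 \end{bmatrix}$. Consequently, instead of the full SVD in Eq. (25), an economy SVD can be used to represent $\mathbf{E}^{\mathrm{noisy}}$ exactly, as given in Eq. (27) [29],

$$\mathbf{E}^{\mathrm{noisy}} = [\hat{\mathbf F}\ \ \hat{\mathbf F}^{\perp}] \begin{bmatrix} \hat{\mathbf S} \\ \mathbf 0 \end{bmatrix} \mathbf{Q}^{\mathrm T} = \hat{\mathbf F}\hat{\mathbf S}\mathbf{Q}^{\mathrm T} = \sum_{m=1}^{r} s_m \tilde{\mathbf f}_m \tilde{\mathbf q}_m^{\mathrm T}, \tag{27}$$

where $\hat{\mathbf F} \in \mathbb{R}^{M\times r}$ contains the left singular vectors, which represent the relationships between the data sets; $\hat{\mathbf F}^{\perp}$ is the orthogonal complement of $\hat{\mathbf F}$; $\mathbf Q$ is the matrix of right singular vectors, which represents the relationships between the variables; and $\hat{\mathbf S} \in \mathbb{R}^{r\times r}$ is a diagonal matrix of singular values arranged in descending order, i.e., $s_1 \ge s_2 \ge \cdots \ge s_r > 0$. Thus, the magnitude of each singular value in $\hat{\mathbf S}$ reflects the importance of the corresponding component.

$$\hat{\mathbf S} = \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_r \end{bmatrix}, \tag{28}$$

where $r$ is the rank of $\mathbf{E}^{\mathrm{noisy}}$, i.e., $r = \mathrm{rank}(\mathbf{E}^{\mathrm{noisy}})$, which equals the number of nonzero singular values.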

SVD-based matrix approximation involves determining the optimal number of singular values to retain, so that noise is discarded while meaningful information is preserved. This study adopts the profile likelihood method proposed in Zhu & Ghodsi [30], which automatically selects this number by maximizing the profile log-likelihood function in Eq. (29).

$$l_k(k) = \sum_{m=1}^{k} \log f(s_m; \mu_1, \sigma^2) + \sum_{x=k+1}^{r} \log f(s_x; \mu_2, \sigma^2), \tag{29}$$

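where

$$f(s_j; \mu_i, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(s_j - \mu_i)^2}{2\sigma^2}\right\}, \quad j = m, x, \ i = 1, 2, \tag{30}$$

in which $\mu_1$ and $\mu_2$ are the means of the two groups of singular values and $\sigma^2$ is their common variance. The corresponding MLEs of the means and variance are given by Eqs. (31)–(33) [30], respectively.

$$\hat\mu_1 = \frac{1}{k}\sum_{m=1}^{k} s_m, \tag{31}$$

$$\hat\mu_2 = \frac{1}{r-k}\sum_{x=k+1}^{r} s_x, \tag{32}$$

$$\hat\sigma^2 = \frac{(k-1)\alpha_1^2 + (r-k-1)\alpha_2^2}{r-2}, \tag{33}$$

where $\alpha_1^2$ and $\alpha_2^2$ are the sample variances of the two groups.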

The optimal number of singular values, $\hat k$, is determined by an exhaustive search, i.e., the values of $l_k(1), l_k(2), l_k(3), \ldots, l_k(r)$ are computed, and the value of $k$ that maximizes $l_k(k)$ is selected, as given in Eq. (34).

$$\hat k = \arg\max_{\vartheta \in \{1, 2, 3, \ldots, r\}} l_k(\vartheta). \tag{34}$$
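The search of Eqs. (29)–(34) can be sketched as follows. The pooled variance (k − 1)α1² + (r − k − 1)α2² is computed as the sum of squared deviations within the two groups, which avoids dividing by zero when a group has a single member. For k = r the second group is empty, and this sketch then uses the sample variance of all singular values; that boundary handling is an assumption of the sketch. The function name is hypothetical.

```python
import numpy as np


def profile_likelihood_rank(s):
    """Return k-hat of Eq. (34) for singular values s sorted in descending order (length r >= 3)."""
    s = np.asarray(s, dtype=float)
    r = s.size
    best_k, best_ll = 1, -np.inf
    for k in range(1, r + 1):
        g1, g2 = s[:k], s[k:]
        dev = g1 - g1.mean()  # deviations from mu_1, Eq. (31)
        if k < r:
            dev = np.concatenate([dev, g2 - g2.mean()])  # deviations from mu_2, Eq. (32)
            var = np.sum(dev ** 2) / (r - 2)  # pooled variance, Eq. (33)
        else:
            var = np.sum(dev ** 2) / (r - 1)  # k = r: a single group (assumption)
        var = max(var, 1e-12)  # guard against log(0) when all values coincide
        # Eqs. (29)-(30): sum of Gaussian log-densities over both groups.
        ll = np.sum(-0.5 * np.log(2.0 * np.pi * var) - dev ** 2 / (2.0 * var))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


s = np.array([50.0, 45.0, 40.0, 1.2, 1.1, 1.0, 0.9, 0.8])  # three dominant components plus a noise floor
k_hat = profile_likelihood_rank(s)
```

With three clearly dominant singular values followed by a flat noise floor, the pooled variance is smallest when the split falls after the third value, so the search returns k̂ = 3.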

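From the above discussion, only the first $\hat k$ singular values of matrix $\hat{\mathbf S}$ are kept, and all the others are set to zero. Hence, the denoised matrix $\mathbf{D}_{m,n}$ of the consumers’ smart meter voltage is computed using the truncated SVD, as expressed in Eq. (35) [29].

$$\mathbf{D}_{m,n} = \hat{\mathbf F}_{(M\times\hat k)}\, \hat{\mathbf S}_{(\hat k\times\hat k)}\, \mathbf{Q}^{\mathrm T}_{(\hat k\times N)} = \sum_{m=1}^{\hat k} s_m \tilde{\mathbf f}_m \tilde{\mathbf q}_m^{\mathrm T}. \tag{35}$$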

To measure the performance of the SVD noise filter, the Frobenius norm is utilized to calculate the error between the noiseless voltage matrix Em,n and the denoised voltage matrix Dm,n, as given in Eq. (36) [29].

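$$\lVert E_{m,n} - D_{m,n} \rVert_F = \sqrt{\sum_{m=1}^{M}\sum_{n=1}^{N} \left| e_{m,n} - d_{m,n} \right|^2}, \tag{36}$$

where $e_{m,n}$ and $d_{m,n}$ are the elements of matrices $E_{m,n}$ and $D_{m,n}$, respectively.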

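According to Eq. (36), a small Frobenius error indicates that the denoised matrix closely matches the noiseless matrix element by element, i.e., that the filter has removed most of the noise while preserving the underlying voltage profiles.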
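To illustrate Eq. (36), the following sketch builds a synthetic, exactly rank-3 "noiseless" voltage matrix (each consumer's profile is a scaled copy of one of three hypothetical phase profiles), adds zero-mean Gaussian noise, truncates the SVD to three components, and compares the Frobenius errors before and after denoising. All profiles, noise levels, and sizes are illustrative assumptions rather than the paper's data, and $\hat{k}=3$ is fixed by construction here instead of being selected with Eq. (34).

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 30, 384  # consumers x time steps (four days at 15-min resolution)
t = np.arange(N)

# Three hypothetical phase voltage profiles: daily sinusoid + random-walk drift (V)
phase_profiles = np.stack([
    230 + 5 * np.sin(2 * np.pi * t / 96 + theta) + np.cumsum(rng.normal(0, 0.3, N))
    for theta in (0.0, 2 * np.pi / 3, 4 * np.pi / 3)
])
phase_of = rng.integers(0, 3, size=M)         # phase each consumer is connected to
scale = 1 - 0.02 * rng.random(M)[:, None]     # small consumer-specific voltage drop
E = scale * phase_profiles[phase_of]          # noiseless matrix, rank <= 3
E_noisy = E + rng.normal(0, 0.2, size=E.shape)

F, s, Qt = np.linalg.svd(E_noisy, full_matrices=False)
k_hat = 3
D = F[:, :k_hat] @ np.diag(s[:k_hat]) @ Qt[:k_hat, :]


def frobenius_error(A, B):
    """||A - B||_F computed term by term as in Eq. (36)."""
    return float(np.sqrt(np.sum(np.abs(A - B) ** 2)))


err_noisy = frobenius_error(E, E_noisy)
err_denoised = frobenius_error(E, D)
print(f"noisy: {err_noisy:.2f}  denoised: {err_denoised:.2f}")
```

Truncation helps here because most of the noise energy lies outside the three-dimensional subspace spanned by the phase profiles, so discarding the trailing singular components removes it.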

4 Performance evaluation of phase identification model proposed

4.1 Performance index of results

The identification of consumer-phase connectivity in LVDNs is treated as a binary classification problem. For this reason, evaluating the method proposed with only a single metric, such as accuracy, cannot guarantee the validity of the results. Therefore, the effectiveness of the CPI model proposed is evaluated using the F1-score, a widely used metric in statistics and machine learning for binary classification problems. This index combines precision $P$ and recall $R$ into a single score, where $P$ and $R$ measure the accuracy and sensitivity of the binary results, respectively. The expressions for $P$ and $R$ are given in Eqs. (37) and (38), respectively.

$$P = \frac{TP + TN}{FP + FN + TP + TN}, \tag{37}$$

$$R = \frac{TP}{FN + TP}. \tag{38}$$

Eqs. (37) and (38) use four indices, TP, TN, FP, and FN, which classify the identification results into positives and negatives, as illustrated in Tab.3.

The indices TP and TN denote true positives and true negatives, while FP and FN denote false positives and false negatives, respectively. For TP and TN, the model correctly identifies the phase connection of an end-consumer according to the ground truth data, whereas for FP and FN, the consumer’s phase connection is incorrectly identified. The F1-score is calculated using Eq. (39):

$$F_1 = \frac{2(P \times R)}{P + R}, \tag{39}$$

where $F_1$ takes continuous values between 0 and 1, i.e., $F_1 \in [0, 1]$, and values close to 1 indicate that the model accurately predicts the true phase connections of the end-consumers.
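A minimal sketch of Eqs. (37)–(39) is given below, assuming each phase is scored one-vs-rest (consumers truly on the phase under evaluation are positives, all others negatives); that convention and the function name are our assumptions. $P$ is computed exactly as in Eq. (37), which counts TN in the numerator and therefore coincides with the usual definition of accuracy; the conventional precision, $TP/(TP+FP)$, would be a drop-in replacement if needed.

```python
def phase_scores(true_phase, pred_phase, phase):
    """One-vs-rest TP/TN/FP/FN counts and P, R, F1 for a single phase (Eqs. (37)-(39))."""
    tp = tn = fp = fn = 0
    for actual_label, predicted_label in zip(true_phase, pred_phase):
        actual, predicted = actual_label == phase, predicted_label == phase
        if actual and predicted:
            tp += 1
        elif not actual and not predicted:
            tn += 1
        elif predicted:  # assigned to this phase but actually on another
            fp += 1
        else:            # actually on this phase but assigned elsewhere
            fn += 1
    total = tp + tn + fp + fn
    P = (tp + tn) / total if total else 0.0        # Eq. (37) as written
    R = tp / (tp + fn) if (tp + fn) else 0.0       # Eq. (38)
    F1 = 2 * P * R / (P + R) if (P + R) else 0.0   # Eq. (39)
    return (tp, tn, fp, fn), P, R, F1


true_phase = ["A", "A", "B", "C", "B", "A"]
pred_phase = ["A", "B", "B", "C", "B", "A"]
counts, P, R, F1 = phase_scores(true_phase, pred_phase, "A")
print(counts, round(P, 4), round(R, 4), round(F1, 4))  # (2, 3, 0, 1) 0.8333 0.6667 0.7407
```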

4.2 Results of identification model proposed and its comparison with other previous studies

The results of the phase identification model proposed are compared with those of previous studies in Refs. [19,20], which employ voltage correlation analysis (VCA) and data clustering methods to solve the CPI problem. For simplicity, the approaches in Refs. [19,20] are referred to as Methods 1 and 2, respectively. The details of these methods are discussed in Section 1.

As shown in Fig.4, phase identification algorithms that directly apply VCA methods, such as the Pearson correlation coefficient, may fail to identify the phase connections of end-consumers accurately when the LVDN data suffer from quality problems. This is evident in Fig.4(b), where the structure of the correlation results fades when AGN of 0.5% is randomly injected into the voltage data. In Fig.4(c), applying the SVD noise filter proposed approximately recovers the original structure of the correlation results shown in Fig.4(a).

The heatmap in Fig.4(d) represents the consumer-phase connectivity obtained with the identification method proposed. The model proposed uniquely identifies the phase to which each end-consumer is connected, as indicated by the darker-colored regions. This is a notable advantage over existing correlation-based approaches, which cannot determine exactly which phase an end-user is connected to. The difficulty arises because a consumer’s voltage is always correlated to some degree with every phase, including phases to which it is not connected, as visualized in Fig.4(a)–Fig.4(c). As a result, when end-users exhibit similar voltage profiles, such approaches may misidentify phases because the correlation coefficients become difficult to distinguish.
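For context, a generic voltage-correlation baseline of the kind discussed above can be sketched as follows: each consumer is assigned to the phase whose reference voltage profile has the highest Pearson correlation with the consumer’s voltage. This is a simplified illustration under our own assumptions (synthetic random-walk phase profiles, a known reference profile per phase, a hypothetical function name), not the exact Method 1 or Method 2 of Refs. [19,20]. The printed matrix shows the issue noted above: every consumer has a nonzero correlation with every phase, and only the relative magnitudes decide the assignment.

```python
import numpy as np


def correlation_phase_assignment(consumer_v, phase_v):
    """Assign each consumer (rows of consumer_v) to the most correlated phase (rows of phase_v)."""
    n = consumer_v.shape[0]
    # np.corrcoef stacks both inputs row-wise; keep only the consumer-vs-phase block
    C = np.corrcoef(consumer_v, phase_v)[:n, n:]
    return C.argmax(axis=1), C


rng = np.random.default_rng(7)
T, n_consumers = 384, 12
phase_v = 230 + np.cumsum(rng.normal(0, 0.5, size=(3, T)), axis=1)  # hypothetical phase profiles
true_phase = rng.integers(0, 3, size=n_consumers)
drop = rng.uniform(0.5, 3.0, size=(n_consumers, 1))                   # constant voltage drop (V)
consumer_v = phase_v[true_phase] - drop + rng.normal(0, 0.05, size=(n_consumers, T))

pred_phase, C = correlation_phase_assignment(consumer_v, phase_v)
print(np.round(C, 2))
print("accuracy:", np.mean(pred_phase == true_phase))
```

On such clean synthetic data the argmax rule works; its weakness appears when noise or similar voltage profiles shrink the gap between the largest and second-largest correlation in a row.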

The identification accuracies of the algorithms are presented in Tab.4. The results indicate that when noise in the voltage data is neglected, Method 1 performs slightly better than the method proposed. However, when noise is injected into the smart meter data, the CPI method proposed outperforms both Methods 1 and 2. The robustness of the method proposed is attributed to the SVD denoising technique, which removes noise from the voltage data. This is another novel contribution of this work, since most existing phase identification methods only add noise to synthetic data and do not examine how it can be eliminated or reduced to improve identification accuracy in practical distribution systems.

Fig.5 shows the identification results of the algorithms for each phase. Despite the presence of noise in the data, the method proposed identifies considerably more of the connected end-users per phase than Methods 1 and 2: out of the 55 connected end-users, 51 are correctly identified by the method proposed, 19 by Method 1, and only 18 by Method 2. Here, the actual connected end-users refer to the true phase connectivity information, the identified end-users are the consumers assigned to each phase by the algorithms, and the correctly identified end-users are those among the identified end-users whose assignment matches the true connectivity.

4.3 Sensitivity analysis of identification model proposed

This subsection analyzes the impact of data amount, missing data, measurement errors, and the influence of prosumers on the phase identification results of the model proposed.

4.3.1 Sensitivity of the identification results to data amount

To examine the impact of data size on the identification results, data sets with window sizes ranging from 1 to 7 days are used to test the phase identification method proposed. Fig.6 shows the impact of increasing the window size on the performance of the algorithm proposed. The results indicate that the identification accuracy improves with the size of the data sets, with the F1-score reaching a maximum of approximately 0.985 for data sets spanning four days or more. Consequently, for the CPI method proposed, a window size of four days, i.e., 384 data points sampled every 15 min, is sufficient to identify the end-users’ phase connections. Thus, all the analyses in this paper are conducted using voltage and current measurements with a window size of four days.
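The window length used throughout follows from the 15-min sampling interval: one day contains $24 \times 60 / 15 = 96$ samples, so four days give $4 \times 96 = 384$ data points. A small helper for cutting such windows from a measurement array could look like this (the function names and the consumers-by-time array layout are assumptions).

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60


def samples_per_window(days, interval_min=15):
    """Number of samples in a window of `days` days at a fixed sampling interval."""
    if MINUTES_PER_DAY % interval_min:
        raise ValueError("interval must divide one day evenly")
    return days * (MINUTES_PER_DAY // interval_min)


def take_window(measurements, days, interval_min=15):
    """Return the first `days` days of a (consumers x time) measurement array."""
    n = samples_per_window(days, interval_min)
    if measurements.shape[-1] < n:
        raise ValueError(f"need {n} samples, got {measurements.shape[-1]}")
    return measurements[..., :n]


week = np.zeros((55, samples_per_window(7)))  # e.g., 55 consumers, seven days of data
print(samples_per_window(4), take_window(week, 4).shape)  # 384 (55, 384)
```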

4.3.2 Sensitivity to missing data

In practical LVDNs, missing data are often encountered due to communication failures, so that no measurements are available from some end-consumers at certain measurement instants. To examine the effect of missing data on the efficacy of the method proposed, missing-data rates from 0% to 50% are randomly introduced into the end-consumers’ smart meter data. Fig.7 shows the impact of the different missing-data percentages on the performance of the identification algorithm proposed, and the corresponding performance indices are presented in Tab.5. The technique proposed still identifies the phase connectivity of end-consumers in the presence of missing data. However, to obtain more accurate results, the percentage of missing data should be kept below 20%. Nevertheless, the CPI method proposed is well suited for practical applications, as most modern automation systems are designed to operate with minimal data losses.
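The missing-data experiment can be reproduced in outline as follows: a chosen fraction of samples is set to NaN at random, and the gaps are then filled by linear interpolation in time, in line with the linear-interpolation preprocessing step described in the abstract. This is a sketch under our own assumptions (uniformly random gaps, one 1-D series per consumer, hypothetical function names); note that `numpy.interp` holds the first or last valid value constant for gaps at the edges of a series.

```python
import numpy as np


def inject_missing(x, rate, rng):
    """Return a copy of 1-D series x with round(rate * len(x)) random samples set to NaN."""
    x = np.array(x, dtype=float)
    n_missing = int(round(rate * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x[idx] = np.nan
    return x


def fill_linear(x):
    """Fill NaN gaps by linear interpolation between neighbouring valid samples."""
    x = np.array(x, dtype=float)
    t = np.arange(x.size)
    valid = ~np.isnan(x)
    x[~valid] = np.interp(t[~valid], t[valid], x[valid])
    return x


rng = np.random.default_rng(3)
v = 230 + np.linspace(0.0, 3.0, 384)        # hypothetical voltage series (V)
v_missing = inject_missing(v, 0.20, rng)     # 20% missing data
v_filled = fill_linear(v_missing)
print(int(np.isnan(v_missing).sum()), np.isnan(v_filled).any())
```

For a matrix of consumers, the same filling can be applied row by row, e.g., with `np.apply_along_axis(fill_linear, 1, V)`.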

4.3.3 Sensitivity to smart meter measurement errors

In practice, measurement data often contain noise due to the inherent errors of smart metering devices. It is therefore essential to evaluate the effect of such errors on the identification accuracy of the algorithm proposed. In this paper, measurement errors corresponding to commonly used smart meters with IEC accuracy classes of 0.2s, 0.5s, 1.0, and 2.0 are added to the data. Fig.8 shows the F1-scores corresponding to the different accuracy classes. Increasing the measurement errors lowers the F1-scores, leading to wrong phase identification results. To mitigate this, noise is removed from the voltage data by applying the SVD denoising technique proposed, which reduces the impact of noise on the identification accuracy, as shown in Fig.8 and Tab.6. However, the effectiveness of the SVD technique deteriorates at higher noise levels; the authors are currently working on addressing this limitation.
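One simple way to emulate meter accuracy classes, sketched below under explicitly assumed conventions, is to read the class index as a percentage error limit and draw zero-mean Gaussian relative errors whose three-standard-deviation bound equals that percentage. The 3σ convention, the class-to-percentage dictionary, and the function name are illustrative assumptions; the exact error model used in the paper is not specified in this section.

```python
import numpy as np

# Assumed mapping from accuracy class to error limit in percent
ACCURACY_CLASS_PERCENT = {"0.2s": 0.2, "0.5s": 0.5, "1.0": 1.0, "2.0": 2.0}


def add_meter_error(x, accuracy_class, rng):
    """Add zero-mean Gaussian relative error with 3*sigma equal to the class limit (assumption)."""
    x = np.asarray(x, dtype=float)
    sigma_rel = ACCURACY_CLASS_PERCENT[accuracy_class] / 100 / 3
    return x * (1 + sigma_rel * rng.standard_normal(x.shape))


rng = np.random.default_rng(5)
v_true = np.full(100_000, 230.0)
for cls in ACCURACY_CLASS_PERCENT:
    rel_err = add_meter_error(v_true, cls, rng) / v_true - 1
    print(cls, f"std of relative error = {rel_err.std():.5f}")
```

Under this assumption, the relative error standard deviation grows linearly with the class index, from about 0.067% for class 0.2s to about 0.67% for class 2.0.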

4.3.4 Sensitivity of identification results to prosumers

As mentioned in Section 2.4, the current drawn from the grid by end-consumers with BTM resources is reduced to zero or close to zero rather than becoming negative. It is therefore assumed that the generating capacity of the installed BTM resources is just sufficient to meet the end-consumers’ energy demand at the measurement instants when such power is available. For instance, the current profiles of end-user 5 (indicated on the LV test system in Fig.2) are shown in Fig.9(a). Whenever power is generated by BTM resources, the current drawn from the grid drops to zero, i.e., the load current at such instants is supplied by micro-renewable energy sources. Fig.9(b) shows the performance of the identification model at different penetration levels (0%–100%) of BTM resources. The model proposed identifies the phase connectivity of end-consumers with a maximum F1-score of 0.985, as in the case without BTM resources. However, when the penetration level exceeds 60%, the accuracy of the CPI method proposed decreases, leading to wrong phase identification.
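The prosumer scenario can be emulated as sketched below, assuming, as stated above, that BTM generation is never exported: the grid-side current of each selected prosumer is the load current minus the BTM current, clipped at zero. The penetration level is interpreted here as the fraction of consumers that host BTM resources; that interpretation, the synthetic load and PV profiles, and the function name are our assumptions. With PV output exceeding the load during daylight, the grid current of every prosumer drops to zero at those instants, mirroring the behaviour described for end-user 5.

```python
import numpy as np


def apply_btm_generation(i_load, i_btm, penetration, rng):
    """Return grid-side currents and the indices of consumers selected as prosumers.

    i_load, i_btm: (consumers x time) load and BTM-generation currents (A).
    penetration: fraction of consumers that host BTM resources (assumed meaning).
    """
    i_load = np.asarray(i_load, dtype=float)
    i_btm = np.asarray(i_btm, dtype=float)
    n_consumers = i_load.shape[0]
    prosumers = rng.choice(n_consumers, size=int(round(penetration * n_consumers)), replace=False)
    i_grid = i_load.copy()
    # No reverse flow: surplus generation is not exported, so grid current is clipped at zero
    i_grid[prosumers] = np.maximum(i_load[prosumers] - i_btm[prosumers], 0.0)
    return i_grid, prosumers


rng = np.random.default_rng(11)
T = 96                                                  # one day at 15-min resolution
daylight = (np.arange(T) >= 24) & (np.arange(T) < 72)   # 06:00-18:00, hypothetical
i_load = 5 + 10 * rng.random((10, T))                   # synthetic load currents, 5-15 A
i_btm = np.tile(np.where(daylight, 20.0, 0.0), (10, 1)) # PV current exceeding the load in daylight
i_grid, prosumers = apply_btm_generation(i_load, i_btm, 0.6, rng)
print(sorted(prosumers.tolist()), float(i_grid[prosumers][:, daylight].max()))
```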

5 Conclusions

This paper has proposed a data-driven phase identification model based on AMI voltage and current measurements of a low-voltage distribution network. The model proposed employs a discrete convolution-based current balance optimization algorithm to uniquely identify the phase connectivity of each end-user. The effectiveness of the method proposed has been tested in various case studies covering data set size, missing data, measurement errors, and the influence of prosumers. The identification accuracy of the algorithm has been enhanced by applying SVD to the end-users’ voltage data. The results demonstrate that the method proposed identifies the phase connectivity of end-users with an accuracy of about 98% using four days of measurements, i.e., 384 data points sampled every 15 min. Further, to obtain a high identification accuracy, the percentage of missing data should be kept below 20%. Regarding measurement errors, measuring devices with accuracy classes of 0.2s, 0.5s, and 1.0 perform better than class 2.0, with identification accuracies of 97.58%, 95.15%, and 84.24%, respectively. For end-consumers with micro-generation renewable sources, e.g., rooftop PV systems, the results indicate that the penetration level of such sources should not exceed 60% for accurate phase identification.

Future work will focus on solving the phase connectivity problem in LVDNs with reverse current flow and limited smart meter coverage. In addition, denoising methods other than SVD that can cope with the high noise levels of measuring devices will be explored.

References

[1] Hashmi M U, Brummund D, Lundholm R, et al. Consensus based phase connectivity identification for distribution network with limited observability. Sustainable Energy, Grids and Networks, 2023, 34: 101070

[2] Gao J, Lu Y, Wu B, et al. Coordinated management and control strategy in the low-voltage distribution network based on the cloud-edge collaborative mechanism. Frontiers in Energy Research, 2022, 10: 903768

[3] Al-Jaafreh M A A, Mokryani G. Planning and operation of LV distribution networks: A comprehensive review. IET Energy Systems Integration, 2019, 1(3): 133–146

[4] Hoogsteyn A, Vanin M, Koirala A, et al. Low voltage customer phase identification methods based on smart meter data. Electric Power Systems Research, 2022, 212: 108524

[5] García S, Mora-Merchán J M, Larios D F, et al. Phase topology identification in low-voltage distribution networks: A Bayesian approach. International Journal of Electrical Power & Energy Systems, 2023, 144: 108525

[6] Tang X, Milanovic J V. Phase identification of LV distribution network with smart meter data. In: 2018 IEEE Power & Energy Society General Meeting. Portland: IEEE, 2018, 1–5. DOI: 10.1109/PESGM.2018.8586483

[7] Matijašević T, Antić T, Capuder T. Voltage-based machine learning algorithm for distribution of end-users consumption among the phases. In: 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology. Opatija: IEEE, 2022, 974–979. DOI: 10.23919/MIPRO55190.2022.9803565

[8] Hosseini Z S, Khodaei A, Paaso A. Machine learning-enabled distribution network phase identification. IEEE Transactions on Power Systems, 2021, 36(2): 842–850

[9] Marrón L, Osorio X, Llano A, et al. Low voltage feeder identification for smart grids with standard narrowband PLC smart meters. In: 2013 IEEE 17th International Symposium on Power Line Communications and its Applications. Johannesburg: IEEE, 2013, 120–125. DOI: 10.1109/ISPLC.2013.6525836

[10] Wen M H F, Arghandeh R, von Meier A, et al. Phase identification in distribution networks with micro-synchrophasors. In: 2015 IEEE Power & Energy Society General Meeting. Denver: IEEE, 2015, 1–5. DOI: 10.1109/PESGM.2015.7286066

[11] Bindi M, Piccirilli M C, Luchetta A, et al. A comprehensive review of fault diagnosis and prognosis techniques in high voltage and medium voltage electrical power lines. Energies, 2023, 16(21): 7317

[12] Shen Z, Jaksic M, Mattavelli P, et al. Three-phase AC system impedance measurement unit (IMU) using chirp signal injection. In: 2013 28th Annual IEEE Applied Power Electronics Conference and Exposition. Long Beach: IEEE, 2013, 2666–2673. DOI: 10.1109/APEC.2013.6520673

[13] Luan W, Peng J, Maras M, et al. Smart meter data analytics for distribution network connectivity verification. IEEE Transactions on Smart Grid, 2015, 6(4): 1964–1971

[14] Yan Y, Zhou X, Bao W, et al. Connection identification of low voltage distribution areas based on distance measurement and trend similarity. In: 2020 12th IEEE PES Asia-Pacific Power and Energy Engineering Conference. Nanjing: IEEE, 2020, 1–5. DOI: 10.1109/APPEEC48164.2020.9220469

[15] Zhou L, Li Q, Zhang Y, et al. Consumer phase identification under incomplete data condition with dimensional calibration. International Journal of Electrical Power & Energy Systems, 2021, 129: 106851

[16] Izadi M, Mohsenian-Rad H. Improving real-world measurement-based phase identification in power distribution feeders with a novel reliability criteria assessment. In: 2021 IEEE PES Innovative Smart Grid Technologies Europe. Espoo: IEEE, 2021, 1–5. DOI: 10.1109/ISGTEurope52324.2021.9640040

[17] Chen K, Shi J, Wei X, et al. Phase identification with single-phase meter and concentrator based on NMF dimension reduction and label propagation. In: 2021 IEEE 11th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems. Jiaxing: IEEE, 2021, 1–6. DOI: 10.1109/CYBER53097.2021.9588325

[18] Chiu J, Wong A, Park J, et al. Phase identification of smart meters using a Fourier series compression and a statistical clustering algorithm. In: 2022 IEEE Electrical Power and Energy Conference. Victoria: IEEE, 2022, 224–228. DOI: 10.1109/EPEC56903.2022.10000137

[19] Zaragoza N, Rao V. Phase identification of power distribution systems using hierarchical clustering methods. In: 2021 North American Power Symposium. College Station: IEEE, 2021, 1–6. DOI: 10.1109/NAPS52732.2021.9654617

[20] Ma Y, Fan X, Tang R, et al. Phase identification of smart meters by spectral clustering. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration. Beijing: IEEE, 2018, 1–5. DOI: 10.1109/EI2.2018.8582318

[21] Peng Q, Liu X, Hu F, et al. Consumers’ phase identification in low voltage station area based on wavelet analysis of consumption data. In: 2021 IEEE International Conference on Power, Intelligent Computing and Systems. Shenyang: IEEE, 2021, 346–350. DOI: 10.1109/ICPICS52425.2021.9524193

[22] Zhou L, Zhang Y, Liu S, et al. Consumer phase identification in low-voltage distribution network considering vacant users. International Journal of Electrical Power & Energy Systems, 2020, 121: 106079

[23] Yi Y, Liu S, Zhang Y, et al. Phase identification of low-voltage distribution network based on stepwise regression method. Journal of Modern Power Systems and Clean Energy, 2023, 11(4): 1224–1234

[24] Kong X, Zhang X, Lu N, et al. Online smart meter measurement error estimation based on EKF and LMRLS method. IEEE Transactions on Smart Grid, 2021, 12(5): 4269–4279

[25] Khan M A, Hayes B P. A reduced electrically-equivalent model of the IEEE European low voltage test feeder. In: 2022 IEEE Power & Energy Society General Meeting. Denver: IEEE, 2022, 1–5. DOI: 10.1109/PESGM48719.2022.9916806

[26] Ni F, Liu J Q, Wei F, et al. Phase identification in distribution systems by data mining methods. In: 2017 IEEE Conference on Energy Internet and Energy System Integration. Beijing: IEEE, 2017, 1–6. DOI: 10.1109/EI2.2017.8245748

[27] Dukhan M, Ablavatski A. Two-pass softmax algorithm. In: 2020 IEEE International Parallel and Distributed Processing Symposium Workshops. New Orleans: IEEE, 2020, 386–395. DOI: 10.1109/IPDPSW50202.2020.00074

[28] Wang W, Yu N. Maximum marginal likelihood estimation of phase connections in power distribution systems. IEEE Transactions on Power Systems, 2020, 35(5): 3906–3917

[29] Brunton S L, Kutz J N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge: Cambridge University Press, 2019

[30] Zhu M, Ghodsi A. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 2006, 51(2): 918–930

RIGHTS & PERMISSIONS

Higher Education Press
