RESEARCH ARTICLE

Endowing rotation invariance for 3D finger shape and vein verification

  • Hongbin XU 1,
  • Weili YANG 2,
  • Qiuxia WU 1,
  • Wenxiong KANG 2
  • 1. School of Software Engineering, South China University of Technology, Guangzhou 510006, China
  • 2. School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China

Received date: 23 Sep 2020

Accepted date: 22 Feb 2021

Published date: 15 Oct 2022

Copyright

2022 Higher Education Press

Abstract

Finger vein biometrics has been extensively studied owing to its liveness detection capability and the high security of intrinsic traits. However, vein pattern distortion caused by finger rotation degrades the performance of CNNs in 2D finger vein recognition, especially in contactless mode. To address the finger posture variation problem, we propose a 3D finger vein verification system that extracts axial rotation invariant features. An efficient 3D finger vein reconstruction optimization model is proposed, and several accelerating strategies are adopted to achieve real-time 3D reconstruction on an embedded platform. The main contribution of this paper is that we are the first to propose a novel 3D point-cloud-based end-to-end neural network, namely 3DFVSNet, to extract deep axial rotation invariant features. In the network, the rotation problem is transformed into a permutation problem with the help of specially designed rotation groups. Finally, to validate the performance of the proposed network more rigorously and to enrich the database resources of the finger vein recognition community, we built the largest publicly available 3D finger vein dataset with different degrees of finger rotation, namely the Large-scale Finger Multi-Biometric Database-3D Pose Varied Finger Vein (SCUT LFMB-3DPVFV) Dataset. Experimental results on 3D finger vein datasets show that our 3DFVSNet holds strong robustness against axial rotation compared to other approaches.

Cite this article

Hongbin XU, Weili YANG, Qiuxia WU, Wenxiong KANG. Endowing rotation invariance for 3D finger shape and vein verification[J]. Frontiers of Computer Science, 2022, 16(5): 165332. DOI: 10.1007/s11704-021-0475-9

1 Introduction

With growing awareness of the significance of, and demand for, secure systems in recent years, biometric verification technology has gained plenty of attention. Biometrics utilizes physiological or behavioural modalities for identification, which is more secure, reliable and convenient than traditional approaches [1]. The physiological modalities can be divided into two categories according to where the biometric information is captured [2]: 1) extrinsic modalities, such as face, fingerprint, palmprint and iris, are the external characteristics of the human body; 2) intrinsic modalities, such as finger vein and palm vein, are acquired from intrinsic features within the human body, which lie underneath the skin and are hard to forge [3]. Finger vein acquisition is achieved by irradiating the finger with near-infrared light, which scatters in the finger; the hemoglobin in the shallow vein vessels on the other side of the finger absorbs part of the near-infrared light energy, thus forming the vein pattern on the near-infrared (NIR) camera. This particular imaging approach introduces the problem of erratic image quality caused by vein pattern fuzziness [4], ambient light [5] and finger posture variation [6]. Although finger vein verification has been thoroughly investigated since 2004 [7], it remains a challenging task when disturbing factors like rotation occur. In particular, most finger vein systems adopt a contactless method to capture 2D vein images, so the finger posture has a high degree of freedom, leading to a shrunken common area and distorted vein patterns across different acquisitions of the same finger. Since the acquired vein information comes from the shallow vein vessels of the finger, and the cross section of the finger can be approximated as an ellipse, the sampling frequency of the middle part of the finger cross section is high in the 2D image while decreasing on both sides. In consequence, the projection of the vein texture on the finger surface onto the imaging plane is non-linear. In addition, the depth of a vein beneath the skin surface also varies with the view direction of the camera. Therefore, when the finger rotates axially, the vein pattern in the image is distorted irregularly, which we denote as the finger posture variation problem. A schematic diagram of the finger posture variation problem is shown in Fig. 1.
Fig.1 Illustration of the finger posture variation problem if rotation is taken into consideration in finger vein verification

In this paper, we propose a 3D finger vein verification system to solve the finger posture variation problem. The system consists of two parts: 3D finger vein reconstruction and 3D finger vein verification. We first reconstruct the 3D vein model before recognition. The 3D finger vein model contains the full-view vein texture around the finger. Unlike a 2D image, the effective imaging region cannot be reduced by finger axial rotation, and the distortion of the vein pattern is significantly decreased due to the introduction of 3D information. To obtain the 3D finger vein model, we propose a silhouette-based 3D reconstruction method using the epipolar constraints among cameras and the prior that the cross section of a finger approximates an ellipse. The reason for adopting this 3D reconstruction method is that the low contrast of vein images cannot provide enough features for multi-view 3D reconstruction, while other active 3D reconstruction methods like structured light and time-of-flight (ToF) have difficulty associating the 3D geometry with the 2D vein texture simultaneously. Although our previous work [8] first proposed a similar 3D finger vein reconstruction method, its time consumption, memory requirements and computation costs are relatively large. Hence we rebuild the 3D reconstruction optimization problem in a completely different and more efficient form, and several accelerating strategies are employed as well.
After reconstructing the 3D vein model, represented as a 3D point cloud, we propose a novel neural network to extract rotation invariant features, designed specifically for point cloud data. A point cloud cannot be formalized as a grid structure and processed by standard operations like convolution because of its unordered configuration. PointNet [9] is a pioneer in processing point clouds with a deep model and has inspired a series of works. However, our experimental results show that directly utilizing PointNet for 3D finger vein verification is not appropriate. We argue that the reason lies in the particularity of the 3D finger vein point cloud model, as follows:
1) Recognizing our 3D vein model needs fine-grained features, because the coarse 3D shapes of different fingers are all similar to elliptic cylinders, whereas PointNet is designed to recognize coarse-grained shapes and has only been shown to perform well on coarse-grained datasets like ModelNet40 [10].
2) PointNet cannot handle rotation well, because in theory it requires the input point cloud to be aligned, while our 3D vein models have different degrees of finger axial rotation.
In order to handle the finger axial rotation problem, we design a novel neural network architecture, namely the 3D Finger Vein and Shape Network (3DFVSNet), to extract rotation invariant features from the 3D finger vein model. We transform the original axial rotation problem into a permutation problem with the help of a specially defined rotation group $R = \{r_1, r_2, \ldots, r_n\}$ for the 3D finger vein model. For each input point cloud, an arbitrary rotation corresponds to a permutation $\pi$ of the rotation group, $R' = \{r_{\pi(1)}, r_{\pi(2)}, \ldots, r_{\pi(n)}\}$. Convolution on the rotation group keeps rotation equivariance, while pooling on the rotation group achieves rotation invariance.
The key contributions of our work are as follows:
1) We propose a silhouette-based 3D finger vein reconstruction optimization model and corresponding accelerating strategies to obtain a 3D finger point cloud with finger vein texture. Our proposed method is efficient enough to run on an embedded platform, which is desirable for real-time applications.
2) We are the first to design a novel end-to-end neural network architecture for point cloud-based 3D finger vein verification, namely the 3D Finger Vein and Shape Network (3DFVSNet). A special rotation group is designed for 3DFVSNet, transforming the original rotation problem into a permutation problem.
3) To the best of our knowledge, almost all open finger vein databases are based on 2D images collected in a controlled environment without disturbing factors like rotation and illumination. Considering the limited scale of the only open 3D finger vein dataset [8], we construct the largest publicly available 3D finger vein dataset, namely the Large-scale Finger Multi-Biometric Database-3D Pose Varied Finger Vein (SCUT LFMB-3DPVFV) Dataset, by collecting more 3D finger vein models in various postures; it is three times larger than [8].
4) Experiments on SCUT-3DFV-V1 [8] and SCUT LFMB-3DPVFV Dataset demonstrate the superiority of the proposed 3DFVSNet model. Furthermore, proof for rotation invariance is given by the visualization of 3DFVSNet.
The remainder of this paper is organized as follows. A brief introduction to related works is presented in Section 2. Section 3 provides a detailed description of our silhouette-based 3D finger vein reconstruction method, and presents the intuition behind and topology of our axial rotation invariant neural network. In Section 4, experimental results are provided. Further discussion is given in Section 5.

2 Related work

In this paper, we aim at designing an anti-rotation method for finger vein verification utilizing reconstructed 3D data. Hence we first introduce the development of standard 2D image based finger vein verification. Then some anti-rotation methods in finger vein verification are discussed. Furthermore, related works on 3D finger vein verification are provided. To the best of our knowledge, no deep learning work on 3D point clouds has been investigated for 3D finger vein verification, so we also introduce some point cloud based deep learning methods in the last part.

2.1 Finger vein verification

A 2D finger vein verification system generally consists of four procedures: vein pattern acquisition, region of interest (ROI) extraction, feature enhancement, and feature matching. Traditional methods focus more on feature enhancement and extraction, and there are mainly four groups of feature extraction methods: vein structure based [7, 11, 12], local feature based [13, 14], coding based [15, 16], and subspace learning based [5, 17]. In recent years, deep learning based methods have been transferred to finger vein verification and identification. Early attempts utilized shallow CNN models [18, 19], and some classical deep CNN models were later adopted for finger vein recognition [20-22]. As a data-driven method, a CNN can extract discriminative features from 2D finger vein images, so the vein texture distortion in the finger axial rotation problem can be suppressed by increasing the amount of data, but the reduction of the common area still seriously degrades recognition performance.

2.2 Anti-rotation finger vein verification

Finger axial rotation causes the reduction of the common area and the distortion of the vein texture. Chen et al. [23] provided a deep analysis of finger vein image deformation. Prommegger et al. [6, 24] systematically evaluated the influence of axial rotation on the performance of 2D finger vein recognition systems. There are mainly two anti-rotation strategies. 1) Construct a rotation invariant feature extractor [25, 26]: Xi et al. [25] proposed a feature combining global texture characteristics, which is invariant to small rotations; Matsuda et al. [26] designed a feature point extractor that is robust against irregular shading and vein deformation. 2) Normalize or align first, then extract features; prior knowledge is needed in these methods [5, 27-31]: Lee et al. [27] aligned the image by vein minutia points and utilized an affine model. Jang et al. [28] adopted a bit-shift matching strategy to reduce the variations. Huang et al. [29] developed a normalization algorithm based on the prior that the cross sections of fingers are approximately ellipses and that the vein pattern captured by the camera is close to the finger surface; we also adopt this prior in our work. Qiu et al. [5] proposed a similar pseudo-elliptical sampling model to retain the spatial distribution of vein patterns and reduce the nonlinear distortion caused by the 3D-to-2D projection mapping. In this work, we combine these two strategies: reconstructing the 3D vein model and restoring 3D information to normalize the vein distortion, and designing a novel 3DFVSNet network to extract rotation invariant features.

2.3 3D finger vein verification

Binocular [32] and trinocular [33] stereo vision have been adopted to reconstruct 3D finger vein patterns. However, only the frontal-view vein skeletons were used, which have few matching points and lead to the loss of a large amount of texture information. In our previous work [8], a three-camera system for full-view 3D finger vein reconstruction was developed. Since the whole 3D reconstruction pipeline took too much time, we improve the optimization problem for reconstructing the 3D finger model into a more efficient form, and reduce the number of parameters to make it easier to converge and run in real time. Besides, the proposed 3DFVSNet can achieve axial rotation invariance in theory, while the CNN model in [8] has difficulty representing rotation transformations, which is supported by our experimental results in Section 4.

2.4 Deep learning on 3D point cloud

A point cloud is the most common representation of 3D geometric information in a non-Euclidean structure; it is a set of irregular and unordered points, to which standard convolution cannot be extended. Recently, more research has focused on presenting a unified architecture that directly takes a point cloud as input and outputs features or class labels. This task was pioneered by PointNet [9], which proposed a basic vanilla architecture with permutation invariance among points by operating on each point independently and finally applying a simple symmetric function to accumulate features from the points. PointNet++ [34] was then proposed to consider neighborhoods of local regions rather than merely utilizing each point independently. More related works [35-37] have been proposed to improve the network's ability to efficiently extract features from local regions in 3D space.
As discussed in Section 1, PointNet and its extension networks cannot be directly applied to 3D finger vein verification because of the necessity of fine-grained features and robustness towards rotation. Hence we propose a novel neural network integrating a specially designed rotation group to handle these problems. Comparisons with these state-of-the-art PointNet-based networks in Section 4 demonstrate the superiority of the proposed 3DFVSNet.

3 3D finger vein verification system

In this section, we briefly introduce the proposed 3D finger vein verification system framework. Then we present the system in detail by two parts: 3D finger vein reconstruction and 3D vein verification network architecture.
The proposed system framework is shown in Fig. 2. The 3D vein capture device consists of three infrared cameras mounted at the vertices of an equilateral triangle, with three infrared light sources mounted opposite the cameras. When a finger is inserted into the device, the light sources quickly take turns illuminating the finger, and the corresponding cameras capture three finger vein images. Our proposed 3D finger vein reconstruction algorithm uses the three images to reconstruct a 3D finger vein point cloud model, which is then fed into the proposed 3DFVSNet to extract a rotation invariant global feature. Finally, the cosine similarity distance is adopted for verification.
Fig.2 The framework of the proposed 3D finger vein verification system

3.1 3D finger vein reconstruction

In this section, we introduce the silhouette-based 3D finger vein reconstruction optimization problem in a completely different and more efficient form, as well as the accelerating strategies employed, because the previous method [8] consumes a huge amount of time, making real-time operation infeasible. Further details about the 3D reconstruction process are provided in the Appendix.

3.1.1 The silhouette-based 3D finger vein reconstruction

Following the prior that the cross section of a finger approximates an ellipse, we employ the finger contours and epipolar constraints to reconstruct the 3D finger vein model. We rebuild an optimization model to estimate the parameters of each ellipse, and obtain the 3D model by stacking up the ellipses of all cross sections. Then pixel intensities representing the vein texture are mapped according to the correspondence between the point cloud and the images.
As Fig. 3 shows, the 3D vein capture device is fitted with three NIR cameras, strictly placed on the vertices of an equilateral triangle, namely $C_1$, $C_2$ and $C_3$. Images from the three views are captured simultaneously during 3D reconstruction. We obtain the intrinsic and extrinsic parameters of the cameras using the calibration method in [38], and define a 3D Cartesian coordinate system following Fig. 1. A cross section of the finger is represented by an ellipse in this 3D Cartesian coordinate system. As we adopt the pinhole camera model and epipolar constraints, a set of constraint lines is obtained, which pass through the finger edges in the finger vein images and are tangent to the ellipse, as shown in Fig. 3. Image Img$_i$ is captured by camera $C_i$, where $i$ denotes the camera index. The edge points of the finger in these images are $u_i$ and $b_i$, and $U_i$ and $B_i$ are the corresponding points of $u_i$ and $b_i$ on the 3D camera imaging planes. The dashed lines $C_iU_i$ and $C_iB_i$ in red, green and yellow are the constraint lines, which are tangent to the ellipse in the Z-X plane.
Fig.3 3D reconstruction schematic diagram. (a) shows a finger model in our predefined 3D coordinate system; (b) shows a cross section extracted from the finger model in (a), spread on a 2D plane

An ellipse can be represented by a quadratic equation:
$$ AX^2 + BXZ + CZ^2 + DX + EZ + F = 0, \tag{1} $$
where $A, B, C, D, E, F$ are undetermined coefficients. By default, $A$ is set to 1 to normalize the coefficients, so only the five remaining coefficients need to be estimated to determine a unique ellipse. Naturally, feeding exact point coordinates $X$ and $Z$ would allow solving the quadratic equation in Eq. (1). However, in the image coordinate system of each view, we can only obtain the $X$ and $Y$ coordinates from the images, whereas the $Z$ coordinate representing depth is uncertain. Since no direct $Z$ coordinates are acquirable in our system, we instead build an optimization problem with constraints describing the correspondence among different views to infer the depth information ($Z$ coordinates). The constraint lines $C_iU_i$ and $C_iB_i$ are given in Eqs. (2) and (3):
$$ C_iU_i:\ z = k_{C_iU_i}\, x + b_{C_iU_i}, \tag{2} $$
$$ C_iB_i:\ z = k_{C_iB_i}\, x + b_{C_iB_i}, \tag{3} $$
where $k_{C_iU_i}$ and $k_{C_iB_i}$ are slopes, and $b_{C_iU_i}$ and $b_{C_iB_i}$ are intercepts. Thus we have six constraint lines in total to solve for an ellipse. Consequently, we construct the following optimization problem:
$$ \min_{B,C,D,E,F}\ \sum_{i=1,2,3} \left( d_{C_iU_i} + d_{C_iB_i} \right) \quad \text{s.t.}\ \Delta_{C_iU_i} \le 0,\ \Delta_{C_iB_i} \le 0,\ i = 1, 2, 3, \tag{4} $$
$$ \begin{cases} \Delta_{C_iU_i} = \beta_{C_iU_i}^2 - 4\,\alpha_{C_iU_i}\,\gamma_{C_iU_i} \le 0, \\ \alpha_{C_iU_i} = 1 + B\,k_{C_iU_i} + C\,k_{C_iU_i}^2, \\ \beta_{C_iU_i} = B\,b_{C_iU_i} + 2\,C\,k_{C_iU_i} b_{C_iU_i} + D + E\,k_{C_iU_i}, \\ \gamma_{C_iU_i} = C\,b_{C_iU_i}^2 + E\,b_{C_iU_i} + F, \end{cases} \tag{5} $$
$$ \begin{cases} \Delta_{C_iB_i} = \beta_{C_iB_i}^2 - 4\,\alpha_{C_iB_i}\,\gamma_{C_iB_i} \le 0, \\ \alpha_{C_iB_i} = 1 + B\,k_{C_iB_i} + C\,k_{C_iB_i}^2, \\ \beta_{C_iB_i} = B\,b_{C_iB_i} + 2\,C\,k_{C_iB_i} b_{C_iB_i} + D + E\,k_{C_iB_i}, \\ \gamma_{C_iB_i} = C\,b_{C_iB_i}^2 + E\,b_{C_iB_i} + F, \end{cases} \tag{6} $$
where $d_{C_iU_i}$ and $d_{C_iB_i}$ are the minimal distances from the ellipse to the constraint lines $C_iU_i$ and $C_iB_i$, respectively. $\Delta_{C_iU_i}$ and $\Delta_{C_iB_i}$ are the discriminants obtained by substituting the constraint lines into the ellipse equation determined by $B, C, D, E, F$; requiring them to be non-positive encodes the tangency relationship between the six constraint lines and the ellipse. The detailed formulas of the constraints $\Delta_{C_iU_i}$ and $\Delta_{C_iB_i}$ are given in Eqs. (5) and (6).
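To make the optimization concrete, the following is a minimal sketch of how Eq. (4) could be solved with an off-the-shelf constrained solver. The line parameters, variable names and the use of scipy.optimize.minimize are our illustrative assumptions; as a further simplification, the magnitude of the discriminant, which vanishes exactly at tangency, stands in for the ellipse-to-line distances $d_{C_iU_i}$ and $d_{C_iB_i}$.

```python
# A sketch of the tangency-constrained ellipse fit of Eqs. (4)-(6), assuming six
# constraint lines z = k*x + b have been extracted beforehand (hypothetical values
# below). The discriminant of the line/ellipse intersection is used both in the
# constraints (Delta <= 0: the line does not cross the ellipse) and, as a
# simplifying surrogate for the distances d, in the objective (drive Delta -> 0).
import numpy as np
from scipy.optimize import minimize

def discriminant(params, k, b):
    # Substituting z = k*x + b into Eq. (1) with A = 1 gives
    # alpha*x^2 + beta*x + gamma = 0; Delta = beta^2 - 4*alpha*gamma.
    B, C, D, E, F = params
    alpha = 1.0 + B * k + C * k ** 2
    beta = B * b + 2.0 * C * k * b + D + E * k
    gamma = C * b ** 2 + E * b + F
    return beta ** 2 - 4.0 * alpha * gamma

def objective(params, lines):
    # Tangency pushes every discriminant toward zero from below.
    return sum(-discriminant(params, k, b) for k, b in lines)

lines = [(0.5, 2.0), (-0.5, 2.0), (0.1, -3.0),   # hypothetical (slope, intercept)
         (2.0, 6.0), (-2.0, 6.0), (-0.1, -3.0)]  # pairs for the six tangent lines

constraints = [{"type": "ineq", "fun": lambda p, k=k, b=b: -discriminant(p, k, b)}
               for k, b in lines]                # -Delta >= 0  <=>  Delta <= 0

x0 = np.array([0.0, 1.0, 0.0, 0.0, -1.0])        # initial guess: circle-like conic
res = minimize(objective, x0, args=(lines,), method="SLSQP", constraints=constraints)
B, C, D, E, F = res.x                            # ellipse coefficients of Eq. (1)
```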
The 3D reconstruction model proposed in this paper is based on the same prior knowledge as our previous work [8], but it is a completely different 3D reconstruction optimization model. Our model is much simpler and converges faster, and is thus more suitable for real-time application scenarios. Further comparisons and some reconstructed 3D finger vein models are provided in the Appendix.

3.1.2 Accelerated scheme

The 3D finger model is obtained by stacking all the ellipses together, so the optimization problem must be solved for every ellipse, which can accumulate a huge computation time. Considering that verification requires instantaneity, we attach great importance to the trade-off between speed and performance. Hence, we propose three accelerating strategies specifically for this 3D reconstruction process.
Sharing parameters among similar shapes We observe that the cross-section ellipses of a finger usually share similar patterns in some respects, such as rotation angle and eccentricity, so we reduce the optimization problem to a sub-optimization problem by fixing some parameters like $B$ and $C$ and optimizing only the remaining parameters $D, E, F$. The modified sub-optimization problem is presented in Eq. (7). We first solve the optimization problem in Eq. (4) to get $B_0, C_0, D_0, E_0, F_0$. Then, for the sub-optimization problem in Eq. (7), the previously solved $B_0$ and $C_0$ are assigned to $B$ and $C$ as shared parameters during optimization.
$$ \min_{D,E,F}\ \sum_{i=1,2,3} \left( d_{C_iU_i} + d_{C_iB_i} \right) \quad \text{s.t.}\ \Delta_{C_iU_i} \le 0,\ \Delta_{C_iB_i} \le 0,\ i = 1, 2, 3,\quad B = B_0,\ C = C_0. \tag{7} $$
Interpolation among sparse profiles If a finger has $M$ cross sections in total at equal intervals, then $M$ optimization problems must be solved, accumulating a non-trivial time cost. An alternative is to reduce the number of candidate cross sections: given $M$ cross sections and $N < M$, we solve only $N$ cross sections instead of estimating all $M$ ellipses, and then interpolate among the $N$ solved cross sections to obtain all $M$ with bilinear interpolation.
Good initialization using the former ellipse A good initialization helps the optimization converge faster and provides better results. Based on this insight, we use the former solution as the initial guess for the next cross-section ellipse, because we find that the finger shape does not vary abruptly but follows a smooth, continuous variation.
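The three strategies can be combined in a straightforward way. The sketch below, reusing a hypothetical fit_section wrapper around the solver sketched above, fixes $B, C$ after the first solve, warm-starts each section from its predecessor, solves only every fourth section, and interpolates the coefficients of the skipped sections (simple per-coefficient linear interpolation stands in for the interpolation step here).

```python
# A sketch combining the three accelerating strategies; names and the stride are ours.
import numpy as np

def fit_section(lines, x0, fixed_BC=None):
    # Placeholder: plug in the SLSQP fit from the previous sketch, freezing
    # B and C to fixed_BC when it is given (the sub-problem of Eq. (7)).
    return np.asarray(x0, dtype=float)

def reconstruct(sections, stride=4):
    params, solved = {}, []
    guess = np.array([0.0, 1.0, 0.0, 0.0, -1.0])
    B0C0 = None
    for idx in range(0, len(sections), stride):
        p = fit_section(sections[idx], guess, fixed_BC=B0C0)
        if B0C0 is None:
            B0C0 = (p[0], p[1])           # share B, C across remaining sections
        params[idx], guess = p, p         # warm start the next solve
        solved.append(idx)
    for idx in range(len(sections)):      # interpolate the skipped sections
        if idx in params:
            continue
        lo = max(i for i in solved if i < idx)
        hi = min((i for i in solved if i > idx), default=lo)
        t = 0.0 if hi == lo else (idx - lo) / (hi - lo)
        params[idx] = (1 - t) * params[lo] + t * params[hi]
    return [params[i] for i in range(len(sections))]
```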

3.2 Rotation invariant network architecture

In this section, we first introduce some basic concepts of equivariance, invariance and rotation groups. We then introduce the designed rotation group and a mathematical proof of its equivariance, which converts the rotation problem into a permutation problem. Finally, the structure of our proposed network for extracting rotation invariant features is presented.

3.2.1 Preliminary

Equivariance and invariance
To prevent misunderstanding, we first introduce the concepts of equivariance and invariance with respect to rigid transformations like rotation and translation. In [39], the authors analyzed the importance of equivariance and invariance in the standard 2D CNN architecture:
● Convolution maintains translation equivariance.
● Pooling achieves translation invariance.
Translation equivariance means that if the input image is shifted, the feature map is shifted correspondingly, while translation invariance means that the output prediction is invariant no matter how the input image is shifted. The concepts of equivariance and invariance can be extended to rotation as well. However, it was proved in [39] that the standard convolution operation in a CNN cannot achieve rotation equivariance. Hence, if we want to handle the rotation problem with a CNN, we cannot directly apply standard convolution layers in the architecture. To solve this issue, in this paper we propose to extend the convolution to a specially designed rotation group based on the properties of our 3D finger vein point cloud.
Definition of Rotation Group
Given a set $G$ and an operation $\circ$ that combines two elements of $G$ (the combination of $a$ and $b$ in $G$ is denoted as $a \circ b$), the pair $(G, \circ)$ is a group if it obeys four properties:
1. Closure: $\forall a, b \in G,\ a \circ b \in G$;
2. Associativity: $\forall a, b, c \in G,\ (a \circ b) \circ c = a \circ (b \circ c)$;
3. Identity element: $\exists e \in G,\ a \circ e = e \circ a = a$;
4. Inverse element: $\forall a \in G,\ \exists b \in G,\ \text{s.t.}\ a \circ b = b \circ a = e$.
Like [39], we can easily construct a 2D rotation group $R = \{r_i = \frac{2\pi i}{M},\ i = 0, 1, \ldots, M-1\}$. There are $M$ different rotations in the group $R$. If any element $r_j$ of the rotation group $R = \{r_0, \ldots, r_{j-1}, r_j, r_{j+1}, \ldots, r_{M-1}\}$ is applied to the input, the rotation group transforms to $R' = \{r_j, r_{j+1}, \ldots, r_{M-1}, r_0, \ldots, r_{j-1}\}$. For each input sample $x$, we rotate it with every element of the rotation group $R$ and then feed the results into a shared network to obtain $M$ feature vectors in total. Rotating $x$ with any element of the rotation group $R$ is then equivalent to permuting the $M$ feature vectors. In this way, the rotation problem can be transformed into a permutation problem, and rotation equivariance is ensured as well. Hence, in this paper, we build a special rotation group based on the elliptic cylindrical structure of a 3D finger model to achieve rotation equivariance.
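The following toy numerical check, with a random two-layer map standing in for the shared network (our assumption, not the actual architecture), illustrates the property: rotating the input by group element $r_j$ cyclically shifts the $M$ feature vectors, and max-pooling over the group removes the shift entirely.

```python
# Toy check: rotation of the input <=> cyclic permutation of the group features.
import numpy as np

M = 8
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))                    # a fixed stand-in "shared network"

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def f(points):                                  # shared feature extractor
    return np.tanh(points @ W).max(axis=0)

def F(points):                                  # features over the rotation group
    return np.stack([f(points @ rot(2 * np.pi * i / M).T) for i in range(M)])

P = rng.normal(size=(50, 2))                    # a toy 2D point cloud
j = 3
rotated = P @ rot(2 * np.pi * j / M).T          # apply group element r_j
assert np.allclose(F(rotated), np.roll(F(P), -j, axis=0), atol=1e-6)     # equivariance
assert np.allclose(F(rotated).max(axis=0), F(P).max(axis=0), atol=1e-6)  # invariance
```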

3.2.2 Design of rotation group

Preprocessing
An illustration of how we preprocess the point cloud is provided in Fig. 4. As mentioned in previous sections, the 3D finger vein point cloud follows an elliptic cylindrical structure. Based on this special prior, it is unnecessary to enumerate all possible rotation situations in SO(3); instead we can align the center axis of the 3D finger model to the Z axis. During preprocessing, we calculate the center coordinates $(x_c, y_c)$ of every ellipse profile along the Z axis. The center axis formed by the center points of the ellipse profiles is then estimated by the least squares method and aligned to the Z axis. As shown in Fig. 4, only the points between the boundary lines (the two horizontal dashed lines) are retained. In this way, rigid transformations along the X and Y axes can easily be eliminated. The remaining problem lies in rigid transformation around the Z axis, especially axial rotation. We can then focus on constructing a rotation group for a single axis rather than all of the X/Y/Z axes.
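A minimal sketch of this alignment, assuming our own slicing scheme and helper name, fits the axis through per-slice centroids by least squares and rotates it onto the Z axis with Rodrigues' formula:

```python
# Estimate the finger's center axis from per-slice centroids and rotate the
# cloud so the axis coincides with the Z axis (Rodrigues' rotation formula).
import numpy as np

def align_to_z(points, n_slices=400):
    """points: (N, 3) array of X/Y/Z coordinates."""
    z = points[:, 2]
    edges = np.linspace(z.min(), z.max(), n_slices + 1)
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_slices - 1)
    centers = np.stack([points[idx == i].mean(axis=0)
                        for i in range(n_slices) if np.any(idx == i)])
    # Least-squares line fit through the centroids: x = a_x*z + b_x, y = a_y*z + b_y.
    A = np.stack([centers[:, 2], np.ones(len(centers))], axis=1)
    (a_x, b_x), *_ = np.linalg.lstsq(A, centers[:, 0], rcond=None)
    (a_y, b_y), *_ = np.linalg.lstsq(A, centers[:, 1], rcond=None)
    axis = np.array([a_x, a_y, 1.0])
    axis /= np.linalg.norm(axis)
    # Rotation taking the fitted axis onto z-hat: R = I + [v]x + [v]x^2 / (1 + c).
    zhat = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(axis, zhat), axis @ zhat
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + K + K @ K / (1.0 + c)
    return (points - np.array([b_x, b_y, 0.0])) @ R.T  # shift axis to origin, rotate
```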
Fig.4 Illustration of the preprocessing procedure on 3D finger vein point cloud

Rotation group
In the previous section, we preprocessed the 3D finger vein point cloud into an aligned position and eliminated the influence of rotation around the X and Y axes. Now we only need to consider the axial rotation around the Z axis. In Fig. 5, we illustrate our rotation group on the 3D finger vein point cloud. Since the profile of the finger is assumed to be an ellipse, we can separate it into different parts by degree. For example, as shown in the figure, the edge line $E$ at $0°$ ranges from $-\epsilon°$ to $\epsilon°$; all points inside this range belong to the edge line at $0°$. In this way, we can separate the original point cloud into a combination of $M$ edge lines ranging from $0°$ to $360°$ regularly:
$$ P = \left\{ E\!\left(\tfrac{2\pi}{M} \cdot 0\right), \ldots, E\!\left(\tfrac{2\pi}{M}(r-1)\right), E\!\left(\tfrac{2\pi}{M}r\right), \ldots, E\!\left(\tfrac{2\pi}{M}(M-1)\right) \right\}. $$
If the point cloud is rotated by $\frac{2\pi}{M}r$ around the Z axis, the new point cloud can be separated into
$$ P' = \left\{ E\!\left(\tfrac{2\pi}{M}r\right), \ldots, E\!\left(\tfrac{2\pi}{M}(2r-1)\right), E\!\left(\tfrac{2\pi}{M}(2r)\right), \ldots, E\!\left(\tfrac{2\pi}{M}(r+M-1)\right) \right\}. $$
Considering that the angle ranges from $0°$ to $360°$, i.e., the indices are taken modulo $M$, the new point cloud can be normalized to
$$ P' = \left\{ E\!\left(\tfrac{2\pi}{M}r\right), \ldots, E\!\left(\tfrac{2\pi}{M}(M-1)\right), E\!\left(\tfrac{2\pi}{M} \cdot 0\right), \ldots, E\!\left(\tfrac{2\pi}{M}(r-1)\right) \right\}. $$
We find that rotating point cloud $P$ by $\frac{2\pi}{M}r$ into $P'$ is equivalent to a permutation of the separated edge lines. Hence, the point cloud can be organized as a rotation group in which the edge lines are the group elements. In this way, for applications like classification, rotation equivariance is ensured on this rotation group by transforming rotation into a permutation problem, and the permutation caused by rotation can easily be converted to invariance with operations like max-pooling or average-pooling.
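A minimal sketch of this separation, with names of our own choosing, bins the aligned points by their azimuth around the Z axis; rotating the cloud by $\frac{2\pi}{M}r$ then simply shifts the resulting list cyclically by $r$ positions:

```python
# Separate an aligned point cloud into the M edge lines of the rotation group.
import numpy as np

def separate_edge_lines(points, M=360):
    """points: (N, 3) or (N, 4) array of X/Y/Z coordinates (plus gray intensity).
    Returns a list of M arrays, one per edge line E(2*pi*i/M)."""
    theta = np.arctan2(points[:, 1], points[:, 0]) % (2 * np.pi)  # azimuth in [0, 2pi)
    bin_idx = (theta // (2 * np.pi / M)).astype(int) % M          # angular bin index
    return [points[bin_idx == i] for i in range(M)]
```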
Fig.5 Illustration of the rotation group on our 3D finger vein point cloud

Proof of equivariance
Denote a point cloud as $P \in \mathbb{R}^{N \times 3}$, which stores the X/Y/Z coordinates of $N$ points. As discussed in the previous section, only axial rotation needs to be considered. We have a rotation group $R = \{r_i \in \mathbb{R}^{3 \times 3},\ i = 0, 1, \ldots, M-1\}$, where the rotation matrix $r_i$ of each element in the rotation group is given by the following formula:
$$ r_i = \begin{bmatrix} \cos(2\pi i/M) & -\sin(2\pi i/M) & 0 \\ \sin(2\pi i/M) & \cos(2\pi i/M) & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{8} $$
The closure property of the group, stated in Eq. (9), is easily satisfied because of the periodicity of the $\sin$ and $\cos$ functions:
$$ \forall r_a, r_b \in R,\quad r_a r_b = r_{(a+b) \bmod M} \in R. \tag{9} $$
The rotation transform operator $T_{r_i}$ is defined as:
$$ T_{r_i}(P) = (r_i P^{\mathrm{T}})^{\mathrm{T}} = P r_i^{\mathrm{T}}. \tag{10} $$
The rotation operator $T$ is linear with respect to the rotation group $R$:
$$ T_{r_i}(T_{r_j}(P)) = P r_j^{\mathrm{T}} r_i^{\mathrm{T}} = P (r_i r_j)^{\mathrm{T}} = T_{r_i r_j}(P). \tag{11} $$
Given a neural network, denoted as the function $f$ in Eq. (12), all elements of the rotation group are processed by the network $f$ with shared weights, and the whole process handling the rotation group is denoted as the function $F$:
$$ F(P) = [f(T_{r_0}(P)),\ f(T_{r_1}(P)),\ \ldots,\ f(T_{r_{M-1}}(P))]. \tag{12} $$
If an arbitrary rotation $r \in R$ is applied to the point cloud, all elements of the group are rotated as well, as in Eq. (13):
$$ F(T_r(P)) = [f(T_{r_0}(T_r(P))),\ f(T_{r_1}(T_r(P))),\ \ldots,\ f(T_{r_{M-1}}(T_r(P)))]. \tag{13} $$
Simplifying Eq. (13) with Eq. (11), we obtain Eq. (14):
$$ F(T_r(P)) = [f(T_{r_0 r}(P)),\ f(T_{r_1 r}(P)),\ \ldots,\ f(T_{r_{M-1} r}(P))]. \tag{14} $$
Because of the closure property in the definition of a group ($\forall a, b \in R,\ ab \in R$), we can always find a corresponding element in the original group $R$ for each element in Eq. (14). In other words, the rotation operation $T_r$ corresponds to a particular permutation function $\pi$ that changes the order of the elements in the rotation group:
$$ \forall r \in R,\ \forall P,\ \exists \pi\ \text{s.t.}\ F(T_r(P)) = \pi(F(P)). \tag{15} $$
In summary, rotating the input point cloud $P$ is equivalent to permuting the output $F(P)$ of the network under the designed rotation group $R$. A simple example illustrating how the rotation group transforms the rotation problem into a permutation problem is provided in Fig. 6.
Fig.6 A simple example of the rotation group. “$\ast$” represents the convolution operator. Rotation of the input data equals permutation in the group

3.2.3 Integration with hierarchical neural network

In this section, we integrate our designed rotation group into a hierarchical network to achieve rotation equivariance. In theory, rotating the input equals permuting the output features. To achieve invariance towards permutation, global pooling is a simple and effective trick, first proposed in PointNet [9]. However, we find that PointNet is not suitable for an equivariant design, because PointNet has only a single global feature aggregation, in which global pooling over the whole point cloud may lose much detailed information. As subsequent research like PointNet++ [34] and DGCNN [36] suggests, applying a hierarchical structure for feature aggregation achieves better performance than a single global feature aggregation [9]. Hence, in this paper we apply a hierarchical structure to aggregate features from the designed rotation group and the 3D finger vein point cloud. In Fig. 7, we present the overall architecture of our network.
Fig.7 Illustration of the network architecture. The input point cloud in the first row is rotated arbitrarily to become the input point cloud in the second row. “GCN” means graph convolutional network, “MLP” represents multi-layer perceptron and “Pool” means global pooling. The initial rotation problem is transformed into a permutation problem by the rotation group, and finally converted to invariance by global pooling

Separating rotation group
Before being input into the network, the point cloud may undergo an arbitrary SO(3) rotation. As mentioned in Section 3.2.2 Preprocessing, we align the center axis of the 3D finger vein point cloud to the Z axis based on the geometric prior of the finger. In this way, the rotation of the 3D finger model is simplified to an axial rotation problem. However, since some points outside the boundary are cropped during preprocessing, the aligned point clouds might have different numbers of points, so the dimension of the input data is not fixed: regular convolution and pooling operations cannot be directly applied, and the input data cannot simply be treated as an image as in [8]. Based on the rotation group designed in Section 3.2.2 Rotation group, we separate the original point cloud into M edge lines. Each edge line represents the feature at a specific angle; in other words, the 360 degrees of rotation are discretely separated into M parts. The larger M is, the more possible rotation situations are taken into consideration.
Graph convolutional network
After separating the rotation group, points are assigned to different edge lines. In order to aggregate local information, we construct a local KNN-Graph around each point of the point cloud [36]. Note that the local relationship between each point and its k-nearest neighbors remains unchanged no matter how the point cloud rotates. The local region is represented by a graph in which the point and each of its k-nearest neighbors are connected by an edge, namely the KNN-Graph. A simple example of graph convolution processing a KNN-Graph is shown in Fig. 8. Denote a point $p$ with k-nearest neighbors $n_i,\ i = 1, \ldots, K$, each point encoded with its X/Y/Z coordinates $p^{geo} \in \mathbb{R}^3$ and gray intensity $p^{tex} \in \mathbb{R}$. We build a KNN-Graph from them, assuming that the point $p$ and each of its k-nearest neighbors have an edge $e_i$ between them. Geometric and texture features between $p$ and $n_i$ are used to construct the edge feature $e_i$ describing the KNN-Graph, as in Eq. (16).
Fig.8 An example of a simple graph convolutional layer. $p$ represents the center point of this local region and $n_i,\ i = 1, 2, 3, 4, 5$ represent the k-nearest neighbors of point $p$.

$$ e_i = \mathrm{Concat}\left( \left\| p^{geo} - n_i^{geo} \right\|_2,\ p^{tex},\ p^{tex} - n_i^{tex} \right). \tag{16} $$
Then MLP layers are applied to the edge feature $e_i$, as in Eq. (17):
$$ h_i = \mathrm{MLP}(e_i). \tag{17} $$
Pooling is used to aggregate the features $h_i$ from all edges in the graph. Usually max-pooling or average-pooling is preferred; we choose max-pooling in our network:
$$ o = \mathrm{Max}(\{h_i,\ i = 1, \ldots, K\}). \tag{18} $$
In the following graph layers, the input is a pointwise feature $x \in \mathbb{R}^D$ rather than the initial four-dimensional point cloud; the features of point $p$ and its neighbor $n_i$ on the KNN-Graph are concatenated to construct the edge feature $e_i$, as in Eq. (19):
$$ e_i = \mathrm{Concat}\left( p^{feat},\ p^{feat} - n_i^{feat} \right). \tag{19} $$
By stacking multiple graph convolutional layers, we progressively aggregate the local information around each point of the point cloud. By default, we set $K = 20$ for constructing the local KNN-Graph.
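As an illustration, a condensed PyTorch sketch of one such graph layer is given below, in the spirit of Eqs. (17)-(19) and of DGCNN's edge convolution [36]; the paper's first layer instead uses the handcrafted edge feature of Eq. (16), and the module name and layer sizes here are our own assumptions.

```python
# A sketch of an edge-conv style graph layer over a KNN-Graph (Eqs. (17)-(19)).
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    def __init__(self, in_dim, out_dim, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(                      # shared MLP as 1x1 conv
            nn.Conv2d(2 * in_dim, out_dim, kernel_size=1),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        """x: (B, F, N) pointwise features; requires N > k."""
        B, F, N = x.shape
        dist = torch.cdist(x.transpose(1, 2), x.transpose(1, 2))     # (B, N, N)
        knn = dist.topk(self.k + 1, largest=False).indices[..., 1:]  # drop self
        idx = knn.unsqueeze(1).expand(B, F, N, self.k)
        neighbors = x.unsqueeze(3).expand(B, F, N, self.k).gather(2, idx)
        center = x.unsqueeze(3).expand(B, F, N, self.k)
        edge = torch.cat([center, center - neighbors], dim=1)        # Eq. (19)
        h = self.mlp(edge)                                           # Eq. (17)
        return h.max(dim=3).values                                   # Eq. (18)
```

For instance, `GraphConvLayer(4, 32)(torch.randn(1, 4, 1000))` yields a `(1, 32, 1000)` pointwise feature map; as in DGCNN [36], the graph is recomputed in feature space at every layer.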
From rotation to permutation
As shown in Fig. 7, the axial rotation is transformed into a permutation by constructing the rotation group. When separating the rotation group, the original 3D finger vein point cloud is divided into different edge lines. Then the graph convolutional network aggregates pointwise features from local regions. In the first stage, we apply global max-pooling to each of the separated M edge lines for feature aggregation; after the group separation, the features remain equivariant to the permutation throughout the network. Afterwards, MLP layers are further applied, and another global max-pooling performs feature aggregation in the second stage. This final global max-pooling achieves invariance towards the rotation group.

4 Experiment

4.1 Implementation

The 3D finger vein reconstruction method is implemented in C++. To provide a direct comparison with [8], we additionally provide a Matlab implementation. The proposed network is implemented in PyTorch and trained on an NVIDIA 1080Ti GPU. We randomly divide each dataset into training, validation and test sets with a ratio of 6∶2∶2. During training, the Adam optimizer is used with a learning rate of 0.0002, the dropout rate is set to 0.5 and the batch size is four. The network is trained for 200 epochs in total. For constructing the KNN-Graph, 20 nearest neighbors are considered. Besides the cross entropy loss, a center loss [40] with a weight of 0.001 is appended to construct the training loss. The embedding layer of the network is used to calculate the cosine distance [40] for verification.
During the preprocessing stage, we divide the 3D finger model into 400 ellipses and calculate the center axis formed by the centers of all 400 ellipses. Then the center axis is aligned to the Z axis and some redundant regions (points) are removed, as mentioned in Section 3.2.2 Preprocessing. Hence the exact number of points in the input point cloud of our proposed 3DFVSNet is not fixed. As a simple trick to process a variable number of points [9] and construct a graph convolutional network [36], 1×1 convolution layers are used instead of fully-connected layers.
Figure 7 presents the basic structure of our proposed 3DFVSNet. The input point cloud is encoded by the X/Y/Z coordinates and the gray intensities, i.e., $P \in \mathbb{R}^{N \times 4}$. The point cloud $P$ is separated into a rotation group $R = \{E_1, \ldots, E_M\}$, $E_1 \in \mathbb{R}^{N_1 \times 4}, \ldots, E_M \in \mathbb{R}^{N_M \times 4}$, $N = \sum_{i=1}^{M} N_i$. By default, $M$ is set to 360. Afterwards, a KNN-Graph with $K = 20$ is constructed and fed to the graph convolutional layers. In each graph layer, the dimension of the KNN-Graph is $N \times K \times F$. The graph layer uses convolutions with kernel size $1 \times 1$ and global max pooling to aggregate the information of the $K$ neighbors [36]. Two graph layers are stacked together with channels $3 \to 32 \to 64$. Then MLP layers follow with features $64 \to 128 \to 512$. The first global max pooling aggregates information from the aforementioned $M$ elements (edge lines) of the rotation group, and the output is of dimension $M \times 512$. Furthermore, several MLP layers with the mapping $512 \to 1024$ are applied, followed by the second global max pooling, which outputs the final 1024-dimensional global feature vector. Three fully-connected layers follow with channel mapping $1024 \to 512 \to 256 \to C$, where $C$ is the number of classes in the training set. The cosine distance calculated on the 256-dimensional embedding feature is finally used for verification.
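For concreteness, the sketch below strings the stated dimensions together, reusing the hypothetical separate_edge_lines and GraphConvLayer helpers from the earlier sketches. It simplifies the first layer to a generic 4-channel graph layer (the paper's 3-channel Eq. (16) edge feature is omitted) and assumes every edge line is non-empty with more than $K$ points, so it should be read as an outline rather than the exact implementation.

```python
# An outline of the 3DFVSNet pipeline dimensions described above.
import torch
import torch.nn as nn

class FVSNetSketch(nn.Module):
    def __init__(self, num_classes, M=360):
        super().__init__()
        self.M = M
        self.graph1 = GraphConvLayer(4, 32)        # simplified first layer
        self.graph2 = GraphConvLayer(32, 64)
        self.mlp1 = nn.Sequential(nn.Conv1d(64, 128, 1), nn.ReLU(),
                                  nn.Conv1d(128, 512, 1), nn.ReLU())
        self.mlp2 = nn.Sequential(nn.Conv1d(512, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                                  nn.Linear(512, 256), nn.ReLU(),  # 256-d embedding
                                  nn.Linear(256, num_classes))

    def forward(self, edge_lines):
        """edge_lines: list of M tensors shaped (1, 4, N_i), one per group element."""
        pooled = []
        for E in edge_lines:                       # shared weights across the group
            h = self.mlp1(self.graph2(self.graph1(E)))
            pooled.append(h.max(dim=2).values)     # stage-1 pooling per edge line
        g = torch.stack(pooled, dim=2)             # (1, 512, M)
        g = self.mlp2(g).max(dim=2).values         # stage-2 pooling over the group
        return self.head(g)                        # logits; cosine distance uses
                                                   # the 256-d embedding at test time
```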

4.2 Datasets

To the best of our knowledge, the open finger vein datasets are mostly comprised of 2D images collected in a controlled environment without disturbances of rotation and illumination. Consequently, we construct a new 3D finger vein database for the evaluation of the proposed system, namely the Large-scale Finger Multi-Biometric Database-3D Pose Varied Finger Vein (SCUT LFMB-3DPVFV) Dataset, which is released at https://github.com/SCUT-BIP-Lab/SCUT-LFMB-3DPVFV. The new SCUT LFMB-3DPVFV Dataset is three times larger than SCUT-3DFV-V1 [8]. The details of the adopted datasets are given in Table 1. The samples in the datasets are collected from the index and middle fingers of both left and right hands. To simulate arbitrary rotation in realistic scenarios, the volunteers were asked to rotate their fingers into various poses. After deleting some samples that could not be reconstructed as 3D models because of outlier movement and overexposure, there are 203 fingers in SCUT-3DFV-V1 and 702 fingers in our proposed LFMB-3DPVFV Dataset. Each finger was acquired 14 times, and each 3D model was reconstructed from images of three different views. In SCUT-3DFV-V1, for the first 6 of the 14 acquisitions, the data are collected with each finger in a normal pose with small rotation of no more than ±30 degrees; for the next 6 acquisitions, the data are collected with larger rotation of no more than ±80 degrees; for the remaining two, rotation in directions other than the axial direction is included. In the LFMB-3DPVFV Dataset, the first 10 acquisitions are collected with rotation of no more than ±30 degrees, while the remaining four are collected by rotating the fingers with larger rotation of no more than ±80 degrees. Finally, we have 1218 samples for 3DFV-V1-ER (easy rotation), 2842 samples for 3DFV-V1-HR (hard rotation), 7020 samples for LFMB-3DPVFV-ER and 9828 samples for LFMB-3DPVFV-HR.
Tab.1 Description of the SCUT-3DFV-V1 and LFMB-3DPVFV datasets
Subset Rotation Fingers Times Samples
SCUT-3DFV-V1-ER [8] Small 203 6 1218
SCUT-3DFV-V1-HR [8] Large 203 14 2842
3DPVFV-ER Small 702 10 7020
3DPVFV-HR Large 702 14 9828

4.3 Time consumption for 3D reconstruction

The original 3D reconstruction method [8] was only implemented in Matlab. In this paper, we provide both a Matlab implementation and a C++ implementation. The time consumption comparison is presented in Table 2. The whole 3D reconstruction pipeline is divided into three parts: preprocessing, reconstruction and texture mapping. As shown in Table 2, two different platforms are used to evaluate the time consumption of the proposed 3D reconstruction method. Comparisons on Device A (Intel i5-8500 CPU + 12G RAM) demonstrate the efficiency of the 3D reconstruction pipeline proposed in this paper. Furthermore, the 3D reconstruction algorithm can also be deployed on an embedded device like Device B (Intel Atom Z8350 CPU + 2G RAM).
Tab.2 Time consumption of the 3D reconstruction pipeline (ms)
Methods [8] Ours
Device A¹ A A B²
Implementation Matlab Matlab C++ C++
Acceleration
Preprocessing 848.19 948.33 59.52 270.79
3D reconstruction 1532.09 724.65 4.75 19.94
Texture mapping 5331.14 328.04 28.49 138.5
Total 7711.42 2001.02 92.76 429.23

¹ Device A: Intel i5-8500 CPU + 12G RAM. ² Device B: Intel Atom Z8350 CPU + 2G RAM.

4.4 Visualization proof of rotation invariance

In Section 3, we proved that the integration of a rotation group can transform a rotation problem into a permutation problem. In other words, convolution on the rotation group is equivariant to rotation, while pooling over the rotation group is invariant to rotation. Apart from the simplified theoretical proof in Section 3, we want to give a visual proof of what our 3DFVSNet focuses on under different rotations. Intuitively, if the aforementioned assumptions are right, no matter what rotation happens, the network should concentrate on the same areas. Hence, in order to understand where our 3DFVSNet concentrates when facing rotation, Grad-CAM++ [41] is applied here for the neural network's visualization and interpretation.
First, for ideal situations, we apply random rotations to SCUT-3DFV-V1-ER to simulate arbitrary postures. The visualization results are provided in Fig. 9. The first row shows the 3D heatmap of the network, while the second row shows the top view of the 3D heatmap for clearer comprehension. In the figure, the original point cloud is visualized in the left column of each sample; it is then rotated randomly and visualized in the right column. The black dotted arrow provides a reference direction for comparison. Along the reference direction, we can see that the heatmap rotates as the point cloud rotates, which further supports our assumption of equivariance.
Fig.9 Visualization of our 3DFVSNet's attention towards arbitrary simulated rotation. Grad-CAM++ [41] is applied to calculate the heatmap of the input point cloud. Besides the 3D heatmap, we also provide the top view of the 3D heatmap point cloud for clearer comprehension. The black dotted arrow indicates the reference direction

However, the experiments in Fig. 9 consider only ideal situations with randomly generated rigid transformations. In realistic situations, non-rigid deformation might happen, and the finger vein texture might be affected by noise and illumination as well. What if some unpredictable factors in realistic situations disturb the results?
Therefore, for realistic situations, we visualize the network on SCUT-3DFV-V1-HR to find out whether the assumption still holds under rotation. The visualization results are presented in Fig. 10. For each sample in Fig. 10, the first column provides the 3D heatmap of the point cloud and the second column presents the top view of the 3D heatmap. Furthermore, the heatmap of the point cloud is unfolded into an image by interpolating the corresponding neighbor points, as shown in the third column. In the fourth column, the corresponding 2D unfolded full-view texture image is provided for direct comparison. The black dotted arrow represents the reference position and the red arrow indicates the unfolding direction used to construct the 2D maps. From Fig. 10, we can see that most of the activated areas in the 2D heatmaps are similar among different postures, and most of the activations surround the finger vein patterns in the heatmap. Due to unexpected disturbances in the imaging system such as illumination change, the captured vein texture may differ somewhat in 2D, as in samples 1 and 3 in Fig. 10. Nevertheless, the figure shows that the activated regions under different rotations share a common concentration on the vein structure.
Fig.10 Visualization of our 3DFVSNet's attention towards arbitrary rotation in realistic situations. Grad-CAM++ [41] is applied to calculate the heatmap of the input point cloud. For each sample, the 3D heatmap and its top view are presented in the first two columns. The black dotted arrow provides a reference direction and the red arrow represents the unfolding direction. Following the direction of the red arrow, we unfold the point cloud and interpolate the discrete points into a 2D texture map and heatmap with a resolution of 200×200, shown in the remaining two columns

As the experimental visualization shows, our 3DFVSNet maintains rotation equivariance through the convolutional layers and achieves rotation invariance at the final global pooling layer. The integration of a rotation group into the network not only works in theory (Fig. 9), but is also practicable in realistic situations (Fig. 10).

4.5 3D finger vein verification

In this section, we provide quantitative evaluation results comparing our proposed 3DFVSNet with other representative algorithms on the task of finger vein verification. The adopted evaluation metrics are the equal error rate (EER) and the receiver operating characteristic (ROC) curve. The EER is the value at which the false acceptance rate (FAR) equals the false rejection rate (FRR), acting as a standard metric for estimating the performance of a verification system. The ROC curve shows the performance of the verification algorithm under different thresholds, with FAR on the X axis and FRR on the Y axis. The FAR is the number of false acceptances divided by the number of identification attempts, and the FRR is the number of false rejections divided by the number of authentication attempts.
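For reference, a minimal sketch of how the EER can be computed from cosine similarity scores of genuine and impostor pairs is given below; the helper names are ours, and the threshold scan is a simple approximation of the FAR/FRR crossing point.

```python
# Sketch: EER from genuine/impostor cosine similarity scores.
import numpy as np

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compute_eer(genuine, impostor):
    """genuine/impostor: 1D arrays of match scores (higher = more similar)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))        # threshold closest to the crossing
    return (far[i] + frr[i]) / 2.0
```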
Since we aim to explore robustness towards rotation in finger vein verification, we divide each dataset into an easy partition with small rotation (ER) and a hard partition with large rotation (HR). Note that the original point clouds in our 3D vein model datasets cannot be directly processed by 2D image based methods; hence we unfold the 3D finger vein model into a full-view image based on the unfolding method introduced in [8], and the state-of-the-art 2D based methods are evaluated on these unfolded full-view images. Traditional 2D finger vein verification methods like DOCH [42], Uniform LBP [43] and MCP [11] are compared. 2D deep learning based approaches such as DeepVein [20] and Das et al. [22] are evaluated on the unfolded full-view images as well. Furthermore, 3D point cloud based methods (PointNet [9], DGCNN [36], DensePoints [37]) are also taken into consideration in the comparison experiment. The experimental results for the various methods are summarized in Tables 3 and 4. The ROC curves for comparison are provided in Figs. 11-14.
Tab.3 Performance comparison on SCUT-3DFV-V1
Methods Data 3DFV-V1-ER/% 3DFV-V1-HR/%
DOCH [42] Image 8.42 25.56
Uniform LBP [43] Image 8.65 17.29
MCP [11] Image 4.32 20.75
Deepvein [20] Image 8.08 9.25
Das et al. [22] Image 6.25 8.77
Mobile CNN [8] Image 3.05 9.54
ResNet50 [44] Image 3.67 11.76
PointNet [9] Points 10.10 15.39
DensePoints [37] Points 4.11 7.73
DGCNN [36] Points 3.89 7.00
3DFVSNet Points 2.61 5.07
Tab.4 Performance comparison on LFMB-3DPVFV
Methods Data 3DPVFV-ER/% 3DPVFV-HR/%
DOCH [42] Image 18.05 31.78
Uniform LBP [43] Image 19.09 31.17
MCP [11] Image 14.44 28.40
Deepvein [20] Image 5.56 8.71
Das et al. [22] Image 5.67 8.91
Mobile CNN [8] Image 2.97 5.38
ResNet50 [44] Image 3.23 5.97
PointNet [9] Points 4.23 8.43
DensePoints [37] Points 4.88 8.36
DGCNN [36] Points 3.60 6.30
3DFVSNet Points 2.81 4.49
Fig.11 The ROC curves of various algorithms evaluated on dataset 3DFV-V1-ER (easy rotation)

Fig.12 The ROC curves of various algorithms evaluated on dataset 3DFV-V1-HR (hard rotation)

Fig.13 The ROC curves of various algorithms evaluated on dataset LFMB-3DPVFV-ER (easy rotation)

Fig.14 The ROC curves of various algorithms evaluated on dataset LFMB-3DPVFV-HR (hard rotation)

Compared with classic 2D based methods, our 3DFVSNet performs better on all four benchmarks of the SCUT-3DFV-V1 and LFMB-3DPVFV datasets. Traditional methods like DOCH [42], MCP [11] and Uniform LBP [43] have much higher EERs than our 3DFVSNet. In normal situations without any rotation, those methods have been proven able to extract discriminative features. However, because they divide the image into non-overlapping patches, finger rotation may lead to severe changes in local patches, degrading their recognition performance. For the 2D deep learning based methods, such as DeepVein [20] and Das et al. [22], the drop in accuracy caused by rotation is alleviated considerably compared with traditional methods. Since rotation is included in the training data, learning based methods can achieve better robustness than traditional methods. However, as mentioned in Section 1 and Fig. 1, the problems of image distortion and imaging different regions cannot be effectively solved merely by stacking more data. Intuitively, 3D data like point clouds contain more information than 2D data like images; for example, the posture of the finger is explicitly modeled in the 3D point cloud, while the full-view image loses it. As a result, our 3DFVSNet performs better than the 2D based methods in Tables 3 and 4.
Compared with 3D based methods, our 3DFVSNet outperforms the representative 3D point cloud based networks (PointNet [9], DensePoints [37], DGCNN [36]). In Tables 3 and 4, the EERs of PointNet, DGCNN and DensePoints are higher than those of our 3DFVSNet, and these networks suffer more from the degradation in recognition performance caused by rotation. As mentioned in Section 1, PointNet and its variants have been shown in related works to be unable to handle rotation well. On the one hand, PointNet only utilizes the global feature and ignores local regions. On the other hand, PointNet has only been proven to perform well on coarse-grained datasets and lacks feasibility for fine-grained tasks like finger vein verification. DensePoints and DGCNN are state-of-the-art point cloud based methods built upon PointNet, focusing on how to utilize local regions effectively; their EERs are much lower than PointNet's because local features are utilized more effectively. Owing to the integration of a rotation group based on the spatial prior of finger shape, our 3DFVSNet performs better than DensePoints and DGCNN on all benchmarks, which demonstrates the superiority of our 3DFVSNet for the task of finger vein verification.

4.6 Ablation study

Ablation study on different modalities
The reconstructed 3D finger vein point cloud contains two different modalities: shape features from the geometric distribution of the point cloud, and texture features from the vein patterns. Different from regular point cloud based feature representation methods [9], which only consider the geometric information of the 3D shape, our 3D finger vein point cloud also contains abundant texture information comprised of unique finger vein patterns. To evaluate the effect of these modalities on our proposed 3DFVSNet, we conduct ablation experiments on two benchmarks: SCUT-3DFV-V1 and LFMB-3DPVFV. The experimental results are provided in Table 5. Since the aggregation of texture patterns relies on the position of each point in the point cloud, it is meaningless to directly use an unordered point set constructed from the texture modality without the shape modality. From Table 5, we find that involving the finger vein texture pattern significantly improves performance. Furthermore, the shape modality is less discriminative under larger rotation; for example, the EER on the “ER” benchmark is lower than the EER on the “HR” benchmark for both datasets.
Tab.5 Ablation study on different modalities of 3D finger vein
Benchmark Modalities EER/%
3DFV-V1-ER Shape 4.81
Shape + Texture 2.61
3DFV-V1-HR Shape 12.15
Shape + Texture 5.07
3DPVFV-ER Shape 5.84
Shape + Texture 2.81
3DPVFV-HR Shape 10.00
Shape + Texture 4.49
Ablation study on group separation
To evaluate the effect of the specially designed rotation group in the network, we further conduct ablation experiments on the group separation. As discussed in Section 3.2.2 Rotation group, the number of elements in the rotation group is a crucial factor for the “resolution” of possible rotations. In Tables 6 and 7, we respectively present experiments with different rotation group separations on SCUT-3DFV-V1 and LFMB-3DPVFV. By default, we uniformly separate the 360 degrees of axial rotation into $M$ possible cases in the rotation group. We provide experimental results for four different resolutions of the rotation group: $M = 360$, $M = 180$, $M = 90$ and $M = 45$. Note that if the resolution $M$ of the rotation group equals one, the model degrades to a normal PointNet based network. Hence, we also provide the performance of normal PointNet based networks (PointNet [9], DensePoints [37], DGCNN [36]) as comparable baselines with a resolution of $M = 1$. From Tables 6 and 7, we find that when the resolution of the rotation group decreases, the model obviously suffers a drop in performance on the benchmarks with large rotations (SCUT-3DFV-V1-HR and LFMB-3DPVFV-HR). In contrast, on the benchmarks with easy rotations (SCUT-3DFV-V1-ER and LFMB-3DPVFV-ER), the influence of the drop in the rotation group's resolution is smaller. Furthermore, compared with SCUT-3DFV-V1, LFMB-3DPVFV, which is three times larger, suffers less from the drop of resolution in the rotation group.
Tab.6 Ablation study of different groups on SCUT-3DFV-V1
Method Groups 3DFV-V1-ER/% 3DFV-V1-HR/%
3DFVSNet 360 2.61 5.07
180 3.55 7.14
90 3.70 7.20
45 3.70 7.79
PointNet [9] 1 10.10 15.39
DensePoints [37] 1 4.11 7.73
DGCNN [36] 1 3.89 7.00
Tab.7 Ablation study of different groups on SCUT-LFMB-3DPVFV
Methods Groups 3DPVFV-ER/% 3DPVFV-HR/%
3DFVSNet 360 2.81 4.49
180 2.83 4.60
90 2.85 4.70
45 2.95 4.99
PointNet [9] 1 4.23 8.43
DensePoints [37] 1 4.88 8.36
DGCNN [36] 1 3.60 6.30

4.7 Network efficiency

To evaluate the time efficiency of our proposed 3DFVSNet, we provide a comparison of the efficiency of different networks in Table 8. We present the number of parameters, the FLOPs, the CPU runtime on an Intel i5-8500 CPU and the GPU runtime on a GTX 1080Ti. As a standard setting of finger vein image verification, the image resolution usually ranges from 100×100 to 200×200. Hence, we evaluate the time efficiency on point clouds with comparable resolutions of 10,000 and 40,000 points. Compared with the other point cloud based methods, our 3DFVSNet requires far fewer FLOPs and runs faster. Standard PointNet based networks only consider a sparse point cloud with about 1024 to 2048 points by default; once the number of points increases, the calculation cost increases dramatically, because all layers conduct point-wise convolutions on all points rather than shrinking the resolution of the feature map as a CNN does. Due to the construction of a dynamic graph in each layer, the computation cost becomes even larger in DGCNN; furthermore, DGCNN with 40,000 points overflows the GPU memory (11G) of a GTX 1080Ti. In contrast, with the help of the rotation group, our 3DFVSNet does not need to conduct point-wise convolutions on all points: in analogy with CNNs, by aggregating features from the specially designed rotation group with global max pooling, we no longer conduct convolutions on the whole point cloud but aggregate features from an abstracted point set with fewer points. In comparison with the image based methods, the point cloud based networks including our 3DFVSNet perform competitively. Nevertheless, given the trade-off between performance and efficiency, an appropriate resolution should be selected according to the realistic situation.
Tab.8 Comparison of time efficiency between our proposed 3DFVSNet and other methods
Network | Input | Resolution | Params | FLOPs | CPU runtime/ms | GPU runtime/ms
3DFVSNet | Points | 40000 × 4 | 1.55M | 3.41G | 360 | 12.5
3DFVSNet | Points | 10000 × 4 | 1.55M | 891M | 90 | 3.2
PointNet [9] | Points | 40000 × 4 | 0.83M | 6.07G | 570 | 15.7
PointNet [9] | Points | 10000 × 4 | 0.83M | 1.52G | 150 | 6.3
DGCNN [36] | Points | 40000 × 4 | 1.82M | 40.39G | – | – (GPU memory overflow)
DGCNN [36] | Points | 10000 × 4 | 1.82M | 10.1G | 5340 | 97.6
ResNet50 [44] | Image | 224 × 224 × 3 | 24.05M | 4.11G | 210 | 11.8
Deepvein [20] | Image | 128 × 128 × 1 | 70.63M | 3.56G | 220 | 8.7
Das et al. [22] | Image | 153 × 153 × 1 | 188.82M | 19.32G | 520 | 24.4
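The linear dependence of point-wise convolution cost on the number of points can be checked with a back-of-the-envelope estimate. The sketch below counts multiply-accumulate operations for a generic shared point-wise MLP; the channel widths are illustrative PointNet-style values, not the exact layers of any network in Table 8.

```python
def pointwise_mlp_flops(n_points, channels):
    """Approximate FLOPs of a shared (point-wise) MLP: each layer applies
    a c_in x c_out linear map to every point, so the cost is linear in
    the number of points (counting 2 FLOPs per multiply-accumulate)."""
    return sum(2 * n_points * c_in * c_out
               for c_in, c_out in zip(channels[:-1], channels[1:]))

# Illustrative channel widths only, not an exact architecture from Tab.8:
mlp = [4, 64, 64, 128, 1024]
for n in (2_048, 10_000, 40_000):
    print(f"{n:>6} points: {pointwise_mlp_flops(n, mlp) / 1e9:.2f} GFLOPs")
# -> 0.59, 2.87 and 11.49 GFLOPs: quadrupling the points quadruples the cost.
```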

5 Discussion

In this paper, we aim to solve the finger posture variation problem, especially axial rotation of the finger. We propose a new silhouette-based 3D finger vein reconstruction method that simplifies the optimization problem and reduces both time consumption and convergence difficulty, together with several accelerating strategies that enable real-time reconstruction of the 3D finger model on an embedded platform. Our main contribution is that we are the first to propose a 3D rotation invariant network (3DFVSNet) designed specifically for processing point clouds. The integration of a special rotation group ensures rotation equivariance, while the final global pooling achieves rotation invariance. Visualizations of the network’s attention provide clear evidence of the rotation equivariance of the rotation group and the rotation invariance of the feature extracted by our 3DFVSNet; a toy numerical check of this property is sketched below. Furthermore, experimental results on two 3D vein datasets demonstrate the robustness of our 3DFVSNet in handling rotation transformations.
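The following self-contained toy check (a deliberately simplified stand-in for the learned network, using a per-axis maximum as the encoder, which by itself is not rotation invariant) verifies numerically that max pooling over an axial rotation group yields a descriptor unchanged by group-aligned rotations.

```python
import numpy as np

def rz(deg):
    """Rotation matrix about the z (finger) axis."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def group_pooled(pts, M=360):
    """Encode each of the M rotated copies with a toy per-axis max
    (not rotation invariant on its own), then max-pool over the group."""
    feats = [(pts @ rz(360.0 * m / M).T).max(axis=0) for m in range(M)]
    return np.max(feats, axis=0)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(2048, 3))          # synthetic stand-in point cloud

f_orig = group_pooled(cloud)
f_rot = group_pooled(cloud @ rz(37).T)      # 37 deg is a multiple of 360/M
print(np.allclose(f_orig, f_rot))           # -> True: pooled feature invariant
```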
Since the proposed method is built on the premise of rigid transformation, non-rigid transformations of fingers may degrade its performance. Because fingers are flexible limbs of the human body, non-rigid deformations such as bending inevitably occur. As non-rigid deformations may involve challenging posture variations and require additional computation to decouple the motion fragments, volunteers are asked to straighten their fingers during verification to alleviate the effect of non-rigid transformations, following the standard premise adopted by previous methods. However, if the acquisition device follows an open space design and the fingers have more degrees of freedom, the effect of non-rigid deformation on finger vein verification becomes non-negligible, which points to an interesting direction for future work.
Fig.15 Visualization of the reconstructed 3D models of the same finger under different postures


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61976095 and 61772225), the Guangdong Natural Science Foundation (2016A030313468 and 2020A1515010558), the Science and Technology Planning Project of Guangdong Province (2018B030323026), and the Fundamental Research Funds for the Central Universities (2018PY24).
1
Zhou Y. Human identification using finger images. IEEE Transactions on Image Processing, 2011, 21(4): 2228–2244

2
Qin H, El-Yacoubi M A. Deep representation-based feature extraction and recovering for finger-vein verification. IEEE Transactions on Information Forensics and Security, 2017, 12(8): 1816–1829

3
Tome P, Raghavendra R, Busch C, Tirunagari S, Poh N, Shekar B, Gragnaniello D, Sansone C, Verdoliva L, Marcel S. The 1st competition on counter measures to finger vein spoofing attacks. In: Proceedings of the International Conference on Biometrics. 2015, 513–518

4
Lee E C, Park K R. Image restoration of skin scattering and optical blurring for finger vein recognition. Optics and Lasers in Engineering, 2011, 49(7): 816–828

5
Qiu S, Liu Y, Zhou Y, Huang J, Nie Y. Finger-vein recognition based on dual-sliding window localization and pseudo-elliptical transformer. Expert Systems with Applications, 2016, 64: 618–632

6
Prommegger B, Kauba C, Linortner M, Uhl A. Longitudinal finger rotation deformation detection and correction. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2019, 1(2): 123–138

7
Miura N, Nagasaka A, Miyatake T. Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification. Machine Vision and Applications, 2004, 15(4): 194–203

8
Kang W, Liu H, Luo W, Deng F. Study of a full-view 3D finger vein verification technique. IEEE Transactions on Information Forensics and Security, 2020, 15: 1175–1189

9
Qi C R, Su H, Mo K, Guibas L J. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, 652–660

10
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, 1912–1920

11
Miura N, Nagasaka A, Miyatake T. Extraction of finger-vein patterns using maximum curvature points in image profiles. IEICE Transactions on Information and Systems, 2007, 90(8): 1185–1194

12
Huang B, Dai Y, Li R, Tang D, Li W. Finger-vein authentication based on wide line detector and pattern normalization. In: Proceedings of the 20th International Conference on Pattern Recognition. 2010, 1269–1272

13
Hartung D, Olsen M A, Xu H, Nguyen H T, Busch C. Comprehensive analysis of spectral minutiae for vein pattern recognition. IET Biometrics, 2012, 1(1): 25–36

14
Liu F, Yang G, Yin Y, Wang S. Singular value decomposition based minutiae matching method for finger vein recognition. Neurocomputing, 2014, 145: 75–89

15
Kang W, Chen X, Wu Q. The biometric recognition on contactless multi-spectrum finger images. Infrared Physics & Technology, 2015, 68: 19–27

16
Yang W, Ji W, Xue J H, Ren Y, Liao Q. A hybrid finger identification pattern using polarized depth-weighted binary direction coding. Neurocomputing, 2019, 325: 260–268

17
Yang G, Xi X, Yin Y. Finger vein recognition based on (2D)2 PCA and metric learning. Journal of Biomedicine and Biotechnology, 2012, 2012

18
Radzi S A, Hani M K, Bakhteri R. Finger-vein biometric identification using convolutional neural network. Turkish Journal of Electrical Engineering & Computer Sciences, 2016, 24(3): 1863–1878

19
Liu W, Li W, Sun L, Zhang L, Chen P. Finger vein recognition based on deep learning. In: Proceedings of the 12th IEEE Conference on Industrial Electronics and Applications. 2017, 205–210

20
Huang H, Liu S, Zheng H, Ni L, Zhang Y, Li W. DeepVein: novel finger vein verification methods based on deep convolutional neural networks. In: Proceedings of the IEEE International Conference on Identity, Security and Behavior Analysis. 2017, 1–8

21
Song J M, Kim W, Park K R. Finger-vein recognition based on deep DenseNet using composite image. IEEE Access, 2019, 7: 66845–66863

22
Das R, Piciucco E, Maiorana E, Campisi P. Convolutional neural network for finger-vein-based biometric identification. IEEE Transactions on Information Forensics and Security, 2018, 14(2): 360–373

23
Chen Q, Yang L, Yang G, Yin Y, Meng X. DFVR: deformable finger vein recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2017, 1278–1282

24
Prommegger B, Uhl A. Rotation invariant finger vein recognition. In: Proceedings of the 10th IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS). 2019, 1–9

25
Xi X, Yang G, Yin Y, Meng X. Finger vein recognition with personalized feature selection. Sensors, 2013, 13(9): 11243–11259

26
Matsuda Y, Miura N, Nagasaka A, Kiyomizu H, Miyatake T. Finger-vein authentication based on deformation-tolerant feature-point matching. Machine Vision and Applications, 2016, 27(2): 237–250

27
Lee E C, Lee H C, Park K R. Finger vein recognition using minutia-based alignment and local binary pattern-based feature extraction. International Journal of Imaging Systems and Technology, 2009, 19(3): 179–186

28
Park K R, Jang Y K, Kang B J. A study on touchless finger vein recognition robust to the alignment and rotation of finger. The KIPS Transactions: Part B, 2008, 15(4): 275–284

29
Huang B, Dai Y, Li R, Tang D, Li W. Finger-vein authentication based on wide line detector and pattern normalization. In: Proceedings of the 20th International Conference on Pattern Recognition. 2010, 1269–1272

30
Yang L, Yang G, Yin Y, Xi X. Finger vein recognition with anatomy structure analysis. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(8): 1892–1905

31
Chen Q, Yang L, Yang G, Yin Y. Geometric shape analysis based finger vein deformation detection and correction. Neurocomputing, 2018, 311: 112–125

32
Ma Z, Fang L, Duan J, Xie S, Wang Z. Personal identification based on finger vein and contour point clouds matching. In: Proceedings of the IEEE International Conference on Mechatronics and Automation. 2016, 1983–1988

33
Bunda S. 3D point cloud reconstruction based on the finger vascular pattern. B.S. thesis, University of Twente, 2018

34
Qi C R, Yi L, Su H, Guibas L J. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems. 2017, 5099–5108

35
Jiang M, Wu Y, Zhao T, Zhao Z, Lu C. PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation. arXiv preprint arXiv:1807.00652, 2018

36
Wang Y, Sun Y, Liu Z, Sarma S E, Bronstein M M, Solomon J M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 2019, 38(5): 1–12

37
Liu Y, Fan B, Meng G, Lu J, Xiang S, Pan C. DensePoint: learning densely contextual representation for efficient point cloud processing. In: Proceedings of the IEEE International Conference on Computer Vision. 2019, 5239–5248

38
Svoboda T, Martinec D, Pajdla T. A convenient multi-camera self-calibration for virtual environments. Presence: Teleoperators & Virtual Environments, 2005, 14(4): 407–422

39
Cohen T, Welling M. Group equivariant convolutional networks. In: Proceedings of the International Conference on Machine Learning. 2016, 2990–2999

40
Wen Y, Zhang K, Li Z, Qiao Y. A discriminative feature learning approach for deep face recognition. In: Proceedings of the European Conference on Computer Vision. 2016, 499–515

41
Chattopadhyay A, Sarkar A, Howlader P, Balasubramanian V. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision. 2018, 839–847

42
Lu Y, Tu M, Wang H, Zhao J, Kang W. Finger vein recognition based on double-orientation coding histogram. In: Proceedings of the 14th Chinese Conference on Biometric Recognition. 2019, 20–27

43
Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971–987

44
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, 770–778
