Path synthesis of spatial revolute–spherical–cylindrical–revolute mechanisms using deep learning

Xueting DENG, Anar NURIZADA, Anurag PURWAR

Front. Mech. Eng., 2025, Vol. 20, Issue 2: 9. DOI: 10.1007/s11465-025-0825-7

RESEARCH ARTICLE
Abstract

The design of single-degree-of-freedom spatial mechanisms tracing a given path is challenging due to the highly non-linear relationships between coupler curves and mechanism parameters. This work introduces an innovative application of deep learning to the path synthesis of one-degree-of-freedom spatial revolute–spherical–cylindrical–revolute (RSCR) mechanisms, aiming to find the non-linear mapping between coupler curves and mechanism parameters and to generate diverse solutions to the path synthesis problem. Several deep learning models are explored, including a multi-layer perceptron (MLP), a variational autoencoder (VAE) plus MLP, and a novel model using a conditional β VAE (cβVAE). We found that the cβVAE model with β = 10 achieves superior performance by predicting multiple mechanisms capable of generating paths that closely approximate the desired input path. This study also builds a publicly available database of over 5 million paths and their corresponding RSCR mechanisms. The database provides a solid foundation for training deep learning models. An application to the design of a human upper-limb rehabilitation mechanism is presented. Several RSCR mechanisms closely matching the wrist and elbow paths collected from human movements are found using our deep learning models. This application underscores the potential of RSCR mechanisms and the effectiveness of our model in addressing complex, real-world spatial mechanism design problems.


Keywords

spatial mechanism / neural networks / path synthesis / machine learning / deep learning / generative models

Cite this article

Xueting DENG, Anar NURIZADA, Anurag PURWAR. Path synthesis of spatial revolute–spherical–cylindrical–revolute mechanisms using deep learning. Front. Mech. Eng., 2025, 20(2): 9. DOI: 10.1007/s11465-025-0825-7

1 Introduction

Multi-degree-of-freedom (multi-DOF) robots, which can be programmed to produce paths with their end-effectors, are prevalent. Meanwhile, closed-loop, single-DOF mechanisms can generate coupler curves described by high-order multivariate polynomials with only one actuator and do not require complex control and programming. Such mechanisms are light, easy to build and maintain, and less prone to mechanical failure. As a result, single-DOF mechanisms are widely used in daily life, such as in car window wipers, rehabilitation devices [1–4], and robots, including flapping-wing robots. However, designing closed-loop single-DOF mechanisms, especially spatial mechanisms, is a difficult task because it requires specialized knowledge and access to design tools that can facilitate such design. Previous research efforts have focused on generating a specific motion or a functional relationship between input and output. In general, the motion synthesis problem is often simplified because the mechanisms can be decomposed using dyadic decomposition; however, the path synthesis problem does not benefit from such simplification. This work is concerned with the path synthesis of a single-DOF spatial mechanism called the revolute–spherical–cylindrical–revolute (RSCR) mechanism (Fig.1). Although spatial RSCR mechanisms have potential applications in function, path, and motion synthesis, they have remained relatively underexplored. In addition, the planar RRRR mechanism can be viewed as a special spatial RSCR mechanism in which all joints lie in parallel planes and all rotational axes are perpendicular to the joint plane.

The goal of path synthesis is to find mechanisms whose coupler follows a desired path or sequence of points. For a single-DOF mechanism, the inverse process of determining the mechanism parameters from a desired path is challenging due to the non-linear relationships between the mechanism’s parameters and its generated path. The direct analytical functions for a single-DOF mechanism, particularly for spatial RSCR mechanisms, are multivariate and of high order [5,6]. Bagci [7] and Thompson [6] first analyzed the kinematic properties of RSCR spatial mechanisms, laying the groundwork and providing foundational formulas for synthesis problems. Chiang et al. [8] derived a motion synthesis method accommodating four precision positions with relaxed specifications. Ananthasuresh and Kramer [9] and Watanabe et al. [5] worked on the branch identification of RSCR mechanisms. Ananthasuresh and Kramer [9] also classified RSCR mechanisms into cone, cylinder, and one-sheet hyperboloid types based on their geometric properties. Shrivastava and Hunt [10] highlighted that RSCR mechanisms can achieve dwell motion, a feature critical in many mechanical systems. Wang and Guo [11] demonstrated that RSCR mechanisms and other closed-chain mechanisms can be applied to the problem of stator blade adjustment in aero-engines. Most of these works focused on the function and motion synthesis problems and utilized special configurations of RSCR mechanisms to simplify calculations. For example, Osman and Segev [12] and Osman et al. [13] assumed that the two rotational axes of the revolute–cylindrical (RC) dyad are parallel, whereas Huang and Youm [14] assumed that those two axes intersect. As two spatial lines, the two rotational axes of the RC dyad can be parallel, intersecting, or neither intersecting nor parallel. For parallel or intersecting configurations, the motion of the coupler is constrained to a cone or a cylinder instead of a general one-sheet hyperboloid. Such simplification significantly reduces the complexity of the calculations.

Deriving analytical relationships for the path synthesis of spatial RSCR mechanisms and subsequently obtaining solutions are challenging tasks. With the advent of deep learning, its potential applications to engineering design problems have become evident [15]. These models can be used to 1) discover complex approximate relationships between inputs and outputs and 2) instantaneously generate many design solutions once training is completed. As early as 2001, Vasiliu and Yannou [16] employed a multi-layer perceptron (MLP) deep learning model for the closed-path synthesis of planar four-bar linkages. Later, Galán-Marín et al. [17] focused on crank–rocker planar four-bar linkages and improved the results produced by MLPs by changing the representation of the path from raw Cartesian coordinates to wavelets [18]. With the recent surge in deep learning, models with more complex structures than the MLP, especially deep generative models, have proven effective in mechanism synthesis [19–26]. The variational autoencoder (VAE) [27] is known for extracting data features and thus can be applied to explore path features. The encoder of a VAE can map the input path to a low-dimensional latent space representation where similar paths can be searched in the neighborhood of the latent representation [19,20,28]. A method similar to the VAE also proved effective for the path synthesis of spatial 5-spherical–spherical (5-SS) mechanisms [21]. Instead of generating new mechanisms, the standard VAE functions as a library, finding paths similar to the desired one from the database used for training. To overcome this limitation, researchers proposed a pipeline that combines the VAE encoder and an MLP to extract features and generate new mechanisms [22]. They also examined the influence of various path representations on VAE performance. Meanwhile, the generative adversarial network was applied to planar four-bar synthesis problems, integrating kinematic and dynamic conditions [23]. Regenwetter et al. [15] published a review of the application of deep generative models in engineering design. Purwar and Chakraborty [24] recently discussed potential future research directions in robot mechanism design with deep learning.

Inspired by Nurizada et al. [29], this work introduces a novel generative model utilizing a conditional β VAE (cβVAE) for the path synthesis of spatial RSCR mechanisms. The cβVAE is a modified version of the VAE for supervised learning that allows the model to generate reconstructed data with given conditions (c). This model also introduces an adjustable loss weight β. Our approach uses a spatial path as the condition of the cβVAE, with the respective mechanisms serving as inputs to the model. The aim of the model is to predict mechanisms based on the provided spatial paths as the condition. For path synthesis after training, a desired path is provided, and the model combines this path with multiple random samples from its trained latent space to generate several possible mechanisms that approximate the desired path. Inspired by the work of Nurizada and Purwar [22], we train several models with simple MLP architectures and VAE plus MLP architectures using the same dataset to demonstrate the effectiveness of our model relative to existing ones. By comparing the results of the three different models, we establish the superiority of the novel cβVAE model. This work also briefly describes the generation of a path database for RSCR mechanisms, including a simulation algorithm and path normalization.

The remainder of the paper is organized as follows. Section 2 briefly introduces the structure of RSCR mechanisms, followed by the generation and processing of the database used in this work. Section 3 discusses the rationale for using deep learning and introduces the architectures of the employed deep learning models. Section 4 presents the analysis of the results from all the architectures utilized. Section 5 presents a practical application of RSCR mechanisms in human upper-limb rehabilitation, and Section 6 concludes the work.

2 Database generation

Fig.1 shows the structure of an RSCR mechanism, which includes two fixed revolute (R) joints, a spherical (S) joint, a cylindrical (C) joint, and a welded joint. Joint J6 is another welded joint attached to the rigid body l345 and is also called the coupler point, which generates the coupler curve.

Deep learning requires a high-quality database with evenly distributed and sufficient data to achieve good results. The generation of such a database involves three main steps: implementing a simulator, collecting data from the simulator, and normalizing the collected data. This section briefly outlines these steps. Please see Deng et al. [30] for additional details.

2.1 Simulator

The simulator is implemented using the geometrical constraints of the spatial RSCR mechanism and a gradient-based optimization algorithm. The coordinates of J3 are calculated from the input angle ϕ and the link length l13. Some of the geometrical constraint equations below are referenced from García de Jalón and Bayo [31]. In particular, the length constraints of the rigid links l34 and l25 are given by

$$f_1=\sqrt{(x_3-x_4)^2+(y_3-y_4)^2+(z_3-z_4)^2}-l_{34}=0,$$

$$f_2=\sqrt{(x_2-x_5)^2+(y_2-y_5)^2+(z_2-z_5)^2}-l_{25}=0,$$

where $(x_a, y_a, z_a)$ are the coordinates of joint $J_a$. The dot product gives the constant angular relationships between the links and axes:

$$f_3=u_{43x}u_{5x}+u_{43y}u_{5y}+u_{43z}u_{5z}-l_{34}\cos\theta_1=0,$$

$$f_4=u_{52x}u_{5x}+u_{52y}u_{5y}+u_{52z}u_{5z}-l_{25}\cos\theta_2=0,$$

$$f_5=u_{25x}u_{2x}+u_{25y}u_{2y}+u_{25z}u_{2z}-l_{25}\cos\theta_3=0,$$

$$f_6=u_{2x}u_{5x}+u_{2y}u_{5y}+u_{2z}u_{5z}-\cos\theta_4=0,$$

where the vectors $\mathbf{J}_a\mathbf{J}_b=(u_{abx},u_{aby},u_{abz})$ and $\mathbf{u}_a=(u_{ax},u_{ay},u_{az})$. Unfortunately, the dot product cannot represent the parallel relationship between $\mathbf{u}_5$ and $\mathbf{J}_4\mathbf{J}_5$ because of the variable length of $l_{45}$. We use the cross product to capture this geometrical constraint when $u_{5z}\neq 0$:

$$f_7^1=u_{45y}u_{5z}-u_{45z}u_{5y}=0,$$

$$f_8^1=u_{45z}u_{5x}-u_{45x}u_{5z}=0.$$

When $u_{5z}=0$, the two constraints change to

$$f_7^2=u_{45x}u_{5y}-u_{45y}u_{5x}=0,$$

$$f_8^2=u_{5z}=0,$$

where $f_8^2$ is automatically satisfied. The last constraint equation requires $\mathbf{u}_5$ to be a unit vector:

$$f_9=\|\mathbf{u}_5\|-1=0.$$

During the simulation, l13 is set to be the input link. The fixed joints J1 and J2 and their rotational axes u1 and u2 are assumed to be known. In this case, the coordinates of J3 can be easily calculated while it rotates about u1. The remaining unknowns are J4, J5, and u5, collected in a vector $\lambda=(x_4,y_4,z_4,x_5,y_5,z_5,u_{5x},u_{5y},u_{5z})$. With nine constraint equations and nine unknowns, the objective function $F(\lambda)$ to be solved is

$$F(\lambda)=\sum_{i=1}^{9}f_i^2(\lambda)=0.$$

For one simulation, the initial state of the mechanism, including the coordinates of all joints and vectors, is defined. During the simulation, a small angular increment is applied to the input angle ϕ, which updates the coordinates of J3. Powell’s hybrid method (“hybr”) of the scipy.optimize.root function in Python [32] is then used to find the new root of F for the updated J3. The coordinates of J6 are calculated from the relative coordinates of J3 and J4 because they form a rigid body.
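A minimal Python sketch of this solution step is given below; the residuals implement $f_1$–$f_9$ as defined above, while the function and variable names (`residuals`, `lam_prev`, `params`) and the warm-start pattern are illustrative assumptions rather than the authors’ exact implementation:

```python
import numpy as np
from scipy.optimize import root

def residuals(lam, J2, J3, u2, params):
    """Nine constraint residuals f1-f9 for the unknowns
    lam = (x4, y4, z4, x5, y5, z5, u5x, u5y, u5z)."""
    J4, J5, u5 = lam[0:3], lam[3:6], lam[6:9]
    u43, u52, u25, u45 = J3 - J4, J2 - J5, J5 - J2, J5 - J4
    f = np.empty(9)
    f[0] = np.linalg.norm(u43) - params["l34"]                  # f1: length of l34
    f[1] = np.linalg.norm(u52) - params["l25"]                  # f2: length of l25
    f[2] = u43 @ u5 - params["l34"] * np.cos(params["theta1"])  # f3
    f[3] = u52 @ u5 - params["l25"] * np.cos(params["theta2"])  # f4
    f[4] = u25 @ u2 - params["l25"] * np.cos(params["theta3"])  # f5
    f[5] = u2 @ u5 - np.cos(params["theta4"])                   # f6
    c = np.cross(u45, u5)               # parallelism of J4J5 and u5
    if abs(u5[2]) > 1e-9:
        f[6], f[7] = c[0], c[1]                                 # f7, f8
    else:
        f[6], f[7] = c[2], u5[2]                                # switched form
    f[8] = np.linalg.norm(u5) - 1.0                             # f9: unit u5
    return f

# One increment of the input angle: J3 is rotated about u1 to J3_new, and
# the previous solution lam_prev warm-starts Powell's hybrid method.
sol = root(residuals, x0=lam_prev, args=(J2, J3_new, u2, params), method="hybr")
if sol.success:
    lam_prev = sol.x
```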

2.2 Data collection

For deep learning training, we first need to create a database of mechanisms. We begin by setting up the 7×7×7 mesh grid shown in Fig.2 and locating the joints and axes of rotation as follows:

1) The center of the 7×7×7 mesh grid is the origin (0, 0, 0), and the x-, y-, and z-values of the nodes range over [−3, 3] with a unit interval of 1.

2) The fixed revolute joint J1 is located at (0, 0, 0), and its rotational axis is fixed at u1 = (0, 0, 1).

3) Given that three non-collinear points define a unique plane in $\mathbb{R}^3$, without loss of generality, the three joints J1, J2, and J3 are set on the same plane z = 0, while their x- and y-coordinates are allowed to vary among the nodes.

4) J4 is set at the same coordinates as J5 for simplification. Given that J4 exists only to maintain the constant angle θ1, its exact coordinates do not matter: although J4 and J5 cannot physically coincide in practice, in theory, once θ1 is constant, any J4 lying along the rotational axis u5 results in the same mechanism.

5) Rotational axes u2 and u5 are each set in unit spherical coordinates (1, β1, β2), where β1 and β2 are chosen from {0, π/3, 2π/3}. This step offers nine different choices for each of u2 and u5, and their values can be calculated from

$$\mathbf{u}_i=(\sin\beta_1\cos\beta_2,\ \sin\beta_1\sin\beta_2,\ \cos\beta_1),\quad i=2,5.$$

6) With the other joints and axes chosen, the coupler point J6 is assumed to be located at the nodes of a 3×3×3 mesh grid attached to the rigid link l345, as shown in Fig.3.

The above rules aim to place the joints at all grid nodes to build a relatively evenly distributed database. Unfortunately, even with such a small mesh grid and the limited choices of β1 and β2, the number of possible combinations of different mechanisms is $7^2\times 7^2\times 7^3\times 3^2\times 3^2=66706983$, with $3\times 3\times 3=27$ coupler curves for each combination. Moreover, the grid-based generation method provides a limited number of distinct link lengths, leading to numerous repeated calculations. To streamline the generation process and avoid duplicated mechanisms, we select only one combination for each unique link ratio $(l_{13}/l_{12},\ l_{34}/l_{12},\ l_{25}/l_{12})$; a simplified sketch of this enumeration is given below. A closed-loop mechanism can be assembled in different ways even with the same link lengths, leading to circuit defects [33] during synthesis. Although the grid-based method generates a limited number of different link ratios, initializing the joints’ locations provides the benefit of avoiding circuit defects. The density of the mesh grid is determined by weighing the computing time, the storage requirement, and the actual effect on the deep learning results. In the end, the algorithm produces 196912 mechanisms, each with 27 coupler curves. Given that this work focuses only on closed paths, the final database consists of 5316624 coupler curves with their corresponding mechanisms after the open paths are removed.
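The sketch below illustrates the grid enumeration and the link-ratio deduplication rule under the assumptions above; the bookkeeping (rounding of ratios, data structures) is our illustrative choice, not the authors’ exact generation code:

```python
import itertools
import numpy as np

coords = np.arange(-3, 4)                  # 7 nodes per axis of the mesh grid
betas = [0.0, np.pi / 3, 2 * np.pi / 3]    # angles for the rotational axes

def axis(b1, b2):
    """Unit rotational axis from the spherical coordinates (1, b1, b2)."""
    return np.array([np.sin(b1) * np.cos(b2),
                     np.sin(b1) * np.sin(b2),
                     np.cos(b1)])

J1 = np.zeros(3)                           # rule 2): J1 fixed at the origin
seen, mechanisms = set(), []
for x2, y2, x3, y3 in itertools.product(coords, repeat=4):   # rule 3): z = 0
    J2, J3 = np.array([x2, y2, 0.0]), np.array([x3, y3, 0.0])
    for J5 in itertools.product(coords, repeat=3):           # rule 4): J4 = J5
        J5 = np.array(J5, dtype=float)
        l12, l13 = np.linalg.norm(J2 - J1), np.linalg.norm(J3 - J1)
        l34, l25 = np.linalg.norm(J5 - J3), np.linalg.norm(J5 - J2)
        if min(l12, l13, l34, l25) == 0.0:
            continue                       # skip degenerate linkages
        key = tuple(np.round([l13 / l12, l34 / l12, l25 / l12], 6))
        if key in seen:
            continue                       # one assembly per unique link ratio
        seen.add(key)
        for b in itertools.product(range(3), repeat=4):      # rule 5): 9 x 9 axes
            u2 = axis(betas[b[0]], betas[b[1]])
            u5 = axis(betas[b[2]], betas[b[3]])
            mechanisms.append((J1, J2, J3, J5, u2, u5))
```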

Before the data are sent to the deep learning models, each raw coupler curve with Cartesian points obtained from the simulator is converted to a third-degree B-spline interpolation curve [34]. Thus, each path has 100 data points that are uniformly sampled with the same time parametrization. The path is then translated to its centroid, rotated with principal component analysis [20,35–37], and reflected with independent component analysis [38]. Fig.4 shows a mechanism with its path before and after the normalization. The final database used in this work contains normalized paths with 100 evenly distributed data points each. The generated database is available on the Kaggle website [39].
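A minimal sketch of this normalization, assuming SciPy’s periodic B-spline routines and an SVD-based PCA; the ICA-based reflection handling and the exact scaling convention are omitted here and marked as assumptions:

```python
import numpy as np
from scipy.interpolate import splprep, splev

def normalize_path(points, n_samples=100):
    """Resample a raw closed coupler curve (3 x N array of Cartesian
    points) with a cubic B-spline, then center and PCA-align it."""
    tck, _ = splprep(points, k=3, per=True, s=0.0)   # periodic cubic spline
    u = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    path = np.stack(splev(u, tck), axis=1)           # (n_samples, 3)

    path -= path.mean(axis=0)                        # translate to centroid
    _, _, vt = np.linalg.svd(path, full_matrices=False)
    path = path @ vt.T                               # rotate onto principal axes
    path /= np.abs(path).max()                       # uniform scale (assumed)
    return path.reshape(-1)                          # flattened 1 x 300 vector
```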

3 Deep learning models

The analytical functions for mapping from the RSCR mechanism parameters to its coupler curve have been derived [5,6]. However, an inverse function from the coupler curve to the mechanism parameters does not exist because multiple mechanism parameters could correspond to the same path. Deep learning excels at approximating the non-linear relationships between different data sets. As mentioned in Section 2, during data generation, mechanisms with the same link ratio are avoided to maintain a one-to-one mapping between the path and the mechanism. Furthermore, the mechanism parameters are represented by the initial coordinates of the joints and rotational axes to prevent circuit defects. Similar methods for planar linkage mechanisms have been validated in Refs. [16,17,22].

In our work, we leverage the capabilities of deep learning models to learn the complex relationship between the coupler curve and its corresponding RSCR mechanism parameters, essentially solving the closed-path synthesis problem for spatial RSCR mechanisms. We propose three distinct models for this purpose. Similar to Refs. [16,17], the first model utilizes an MLP to directly map the coupler curve to its mechanism parameters. Based on Nurizada and Purwar [22], the second model first trains a VAE to represent the features of the path data by a latent space representation. It then incorporates a separate MLP to learn the mapping between the latent representations of the paths and the respective mechanisms. Unlike the first model, this one allows for the exploration of the latent space in the vicinity of the latent representation of the input, yielding multiple potential solutions, i.e., mechanisms, for a single input path. The third model introduces a novel cβVAE that treats the coupler curve as a condition for reconstructing the mechanism parameters. After training, the cβVAE learns a set of latent distributions conditioned on an input. To predict mechanisms for a desired input path, data can be randomly sampled from the latent distributions and combined with the desired path before being fed to the decoder. Given that the random samples are drawn from the learned latent distributions, they are valid and ensure diverse output mechanisms for a single input path. Later in this section, we provide the theoretical background for each model. The results and comparisons of these methods are discussed in Section 4.

This work aims to train deep learning models that take a desired path as input and then predict one or several RSCR mechanisms capable of generating paths that approximate the input (Fig.5). First, the raw input path is normalized and represented as 100 3D Cartesian points flattened into a 1×300 vector. This vector corresponds to the normalized path coordinates $P=(P_{1x},P_{1y},P_{1z},\ldots,P_{100x},P_{100y},P_{100z})$. The output data, representing the mechanisms, are encoded as 1×18 vectors $J=(J_1,J_2,J_3,J_4,u_2,u_5)$ of the initial coordinates of the RSCR mechanisms. The rotational axis u1 is always perpendicular to J1J3, and J5 coincides with J4 in the initial coordinates. The rest of this section is devoted to the theoretical background needed to understand the proposed methods.

3.1 MLP

The MLP architecture is a fundamental model within the realm of deep learning. It comprises an input layer, one or more hidden layers, and an output layer. Each layer consists of multiple neurons, each of which is connected to the neurons of neighboring layers through weighted connections. For this reason, the MLP is also called a “fully connected neural network”. An exemplary architecture of an MLP is depicted in Fig.6.

Assuming m neurons in the jth layer, the output value of the ith neuron in the (j+1)th layer, $n_i^{j+1}$, can be calculated as

$$n_i^{j+1}=g^j\left(\sum_{k=1}^{m}w_k^j n_k^j+b_k^j\right),$$

where $w_k^j$ and $b_k^j$ represent the weight and bias of the kth neuron in the jth layer connected to the ith neuron in the (j+1)th layer, respectively, and $g^j$ denotes the activation function, which introduces non-linearity into the output.

A critical aspect of deep learning is the utilization of a loss function to measure the model’s accuracy and to train the model by calculating the difference between its predictions and the actual data. The objective of training is to minimize this loss function by fine-tuning the model’s weights and biases. During training, the MLP undergoes forward-propagation, where data traverse through the network, followed by the adjustment of weights and biases based on the loss function’s gradient through backpropagation [40].

For the evaluation of the difference between the predicted mechanism $\hat{J}$ and the ground-truth mechanism $J$, the mean squared error (MSE) is chosen as the loss function:

$$L_{\mathrm{MSE}}=\frac{1}{18}\sum_{i=1}^{18}\left(\hat{J}_i-J_i\right)^2,$$

where $\hat{J}$ denotes the predicted value of $J$ from the model. The rectified linear unit activation function [41], coupled with batch normalization [42] and a dropout layer [43], is employed to enhance performance and counteract overfitting during training.

Our first method capitalizes on the approximating capabilities of MLPs to directly learn the mapping between the path and its respective mechanism, which serve as input and output to the MLP, respectively. We train several MLP architectures by varying the number of hidden layers and the number of neurons within each layer. The general architecture is shown in Tab.1.
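A PyTorch sketch of this direct regression architecture is shown below; the width, depth, and dropout rate are illustrative values in the spirit of Tab.1 (the best variant reported later uses 9 hidden layers of 4096 neurons), not the exact published configuration:

```python
import torch
import torch.nn as nn

class PathToMechanismMLP(nn.Module):
    """Direct regressor: 300-dim flattened path in, 18-dim vector of
    joint coordinates and rotational axes out."""
    def __init__(self, hidden=4096, n_hidden=9, p_drop=0.1):
        super().__init__()
        layers, d_in = [], 300
        for _ in range(n_hidden):
            layers += [nn.Linear(d_in, hidden), nn.BatchNorm1d(hidden),
                       nn.ReLU(), nn.Dropout(p_drop)]
            d_in = hidden
        layers.append(nn.Linear(d_in, 18))   # J1, J2, J3, J4, u2, u5
        self.net = nn.Sequential(*layers)

    def forward(self, path):
        return self.net(path)

model = PathToMechanismMLP()
loss_fn = nn.MSELoss()   # L_MSE between predicted and ground-truth vectors
```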

3.2 VAE

Introduced by Kingma and Welling [27], the VAE excels in data dimensionality reduction and reconstruction. It follows the classic autoencoder architecture, employing an encoder to reduce the input data $x=(x_1,x_2,\ldots,x_m)$ to a low-dimensional latent space representation $z=(z_1,z_2,\ldots,z_n)$. A decoder then reconstructs the latent space representation into the output $\hat{x}=(\hat{x}_1,\hat{x}_2,\ldots,\hat{x}_m)$ while maintaining the input’s format. In generative tasks, the output $\hat{x}$ is expected to be similar but not exactly identical to the input $x$ so that new data can be generated. Although this work primarily employs the encoder of the VAE to extract path data into the latent space, a brief introduction of the full VAE is provided below.

To generate new data $\hat{x}$, the true distribution of the input $p(x)$ must be known. Unfortunately, $p(x)$ is often unknown. Instead, we assume a latent space representation $z$ with $p(z)\sim\mathcal{N}(0,I)$ to indirectly calculate $p(x)=\int p(x|z)p(z)\,\mathrm{d}z$. In practice, the latent space can be complex and high dimensional, making $p(x|z)$ nearly zero for most $z$. The key idea behind the VAE is to sample values of $z$ that are likely to produce $x$ using $p(z|x)$ and to estimate $p(x)$ just from those samples [11]. This process leads to one of the most important equations of the VAE:

$$D_{\mathrm{KL}}[q(z|x)\,\|\,p(z|x)]=\mathbb{E}_{z\sim q}[\log q(z|x)-\log p(z|x)],$$

where $q(z|x)$ is a new distribution that approximates $p(z|x)$, and $D_{\mathrm{KL}}$ is the Kullback–Leibler divergence (KLD), a measure of the statistical distance between two distributions. According to Bayes’ theorem, this equation can be transformed into

$$\log p(x)-D_{\mathrm{KL}}[q(z|x)\,\|\,p(z|x)]=\mathbb{E}_{z\sim q}[\log p(x|z)]-D_{\mathrm{KL}}[q(z|x)\,\|\,p(z)].$$

Given that the value of the KLD is always non-negative, finding the lower bound of $\log p(x)$ involves minimizing $D_{\mathrm{KL}}[q(z|x)\,\|\,p(z|x)]$. On the right-hand side of this equation, the term $\mathbb{E}_{z\sim q}[\log p(x|z)]$ corresponds to reconstructing $x$ from $z$, which is the function of the decoder in the VAE. The term $D_{\mathrm{KL}}[q(z|x)\,\|\,p(z)]$ relates to the encoder’s role of keeping $q(z|x)$ close to $p(z)$. To maximize this lower bound of $\log p(x)$, we should maximize $\mathbb{E}_{z\sim q}[\log p(x|z)]$ and minimize $D_{\mathrm{KL}}[q(z|x)\,\|\,p(z)]$. Ultimately, the VAE loss function is

$$L=-\mathbb{E}_{z\sim q}[\log p(x|z)]+D_{\mathrm{KL}}[q(z|x)\,\|\,p(z)].$$

Here, the prior distribution $p(z)$ is assumed to follow a standard normal distribution, $z\sim\mathcal{N}(0,I)$. The learnable distribution follows $q(z|x_i)\sim\mathcal{N}(\mu_i,\sigma_i^2)$, with the VAE learning $\mu_i=f_{\mathrm{encoder}}(x_i)$ and $\log\sigma_i^2=f_{\mathrm{encoder}}(x_i)$. With these assumptions, the loss function of the VAE can be further simplified to

$$L=L_{\mathrm{MSE}}(\hat{x},x)+\frac{1}{2}\left(\mu^2+\sigma^2-\log\sigma^2-1\right).$$

Finally, a technique called the reparameterization trick is utilized for sampling the latent data $z$. To ensure that the sampling step remains differentiable, the VAE samples a number $\varepsilon$ from $\mathcal{N}(0,I)$ and calculates the latent data as

$$z=\mu+\varepsilon\times\sigma.$$

A well-trained VAE maps similar data close to each other in the latent space. Given two similar inputs $x_1$ and $x_2$, the encoder outputs $q(z|x_1)\sim\mathcal{N}(\mu(x_1),\sigma^2(x_1))$ and $q(z|x_2)\sim\mathcal{N}(\mu(x_2),\sigma^2(x_2))$, respectively. Owing to the similarity between $x_1$ and $x_2$, the distributions $q(z|x_1)$ and $q(z|x_2)$ have similar means and variances, ensuring that the latent variables $z_1$ and $z_2$ sampled from these distributions are close to each other.

In our application, the VAE has a two-fold task: it reduces the dimension of the input path and maps similar paths close together. After training, the VAE can project an input path P with 300 dimensions into a low-dimensional latent space of n dimensions. We then train an MLP that learns the mapping between the latent representations of the paths and their respective mechanisms (Fig.7). Similar to the first method, we train several VAEs with varying latent dimensions to assess their impact on the reconstruction quality. The detailed architecture of the VAE is shown in Tab.2.
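A compact PyTorch sketch of this stage is given below, assuming the single-linear-layer encoder/decoder described above; the latent dimension (here 32) is an illustrative choice, and the KLD term uses the closed form derived in this subsection:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathVAE(nn.Module):
    """VAE over flattened 300-dim paths with single linear encoder/decoder
    layers, in the spirit of the architectures of Tab.2."""
    def __init__(self, n_latent=32):
        super().__init__()
        self.enc_mu = nn.Linear(300, n_latent)      # mu = f_encoder(x)
        self.enc_logvar = nn.Linear(300, n_latent)  # log sigma^2 = f_encoder(x)
        self.dec = nn.Linear(n_latent, 300)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        eps = torch.randn_like(mu)                  # reparameterization trick
        z = mu + eps * torch.exp(0.5 * logvar)      # z = mu + eps * sigma
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    """Reconstruction MSE plus the closed-form Gaussian KLD."""
    recon = F.mse_loss(x_hat, x)
    kld = 0.5 * torch.mean(mu**2 + logvar.exp() - logvar - 1.0)
    return recon + kld
```

The latent vectors produced by the trained encoder then serve as inputs to the subsequent MLP of Fig.7.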

3.3 Attention mechanisms

Prior to the discussion of the last deep learning model, an important concept known as attention mechanism must be introduced. Renowned for their ability to focus on the most relevant parts of the input data for a given task, attention mechanisms have become one of the most prominent neural network architectures. Originally developed for natural language processing applications [44], attention mechanisms have been adapted across various fields, including computer vision [45].

The architecture of a commonly used attention mechanism called scaled dot-product attention comprises three main components: Query (Q), Key (K), and Value (V). Each of the Q, K, and V values is calculated by its own trainable linear layer from the input data of the attention mechanism and represents the input data from a different perspective. Every query takes a different amount of information from the values by weighting them with their keys. This process allows the model to pay close attention to the data that have a great impact on the overall model. The dimensions of Q, K, and V are often chosen to be equal for simplicity but can be set to any value. The output of the attention mechanism is computed by [44,46]

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V,$$

where the softmax function converts the weighted scores into a probability distribution over the V vectors, and $d_k$ represents the dimensionality of the K vectors.

Self-attention is a foundational version of the attention mechanism, where the Q, K, and V vectors are all derived from the same input data. This process allows the model to focus on different parts of a single input, capturing various dependencies and relationships within that input. Multi-head attention is an enhanced version of the self-attention mechanism that incorporates multiple parallel attention mechanisms. By calculating several attentions in parallel, multi-head attention employs different sets of Q, K, and V vectors to determine distinct sets of weights for the input data and thereby capture various aspects of the data.

Cross-attention is another variant in which the attention mechanism is applied between two distinct sets of inputs to integrate information across these sets. In cross-attention, Q is calculated from the first set, and K and V are derived from the second set. This approach enables the model to effectively combine the two distinct sets and capture complementary information from both sources. For additional details on multi-head attention and cross-attention, please refer to Ref. [44]. In this work, we use all of the attention variants mentioned above in the cβVAE; a sketch of the basic operation follows.
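The following sketch shows scaled dot-product attention exactly as defined above; the comments note how the same operation is reused for self- and cross-attention:

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for inputs of shape (batch, seq, d)."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    return torch.softmax(scores, dim=-1) @ v

# Self-attention: q, k, v are all linear projections of the same input.
# Cross-attention (as used later): q is projected from the first input,
# while k and v are projected from the second input.
```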

3.4 Conditional variational autoencoder (cVAE)

The cVAE [47], a variation of the VAE designed for supervised learning, incorporates a condition alongside the input, allowing the model to generate reconstructed data with a given condition. The general architecture of the cVAE is similar to that of the VAE, with an encoder, a latent space, and a decoder. The three variables in the cVAE are the input x, the condition y, and the latent space data z. The encoder processes the merged data of x and y to generate z, which is then merged with y and fed into the decoder. The decoder’s output, $\hat{x}$, is a reconstruction of the input that satisfies the given condition. The loss function of the cVAE is an extension of that of the regular VAE and is given by

$$L=-\mathbb{E}_{z\sim q}[\log p(x|z,y)]+D_{\mathrm{KL}}[q(z|x,y)\,\|\,p(z|y)].$$

The βVAE is introduced to encourage a disentangled representation of the latent space [48]. As a hyperparameter, β controls the balance between the reconstruction loss and the KL divergence by scaling the KL divergence term. The loss of the βVAE is given by

$$L=-\mathbb{E}_{z\sim q}[\log p(x|z)]+\beta D_{\mathrm{KL}}[q(z|x)\,\|\,p(z)].$$

In this work, we employ a combination of the βVAE and the cVAE as the cβVAE. Thus, the loss function of the final model is

$$L=-\mathbb{E}_{z\sim q}[\log p(x|z,y)]+\beta D_{\mathrm{KL}}[q(z|x,y)\,\|\,p(z|y)].$$

Our cβVAE model takes the initial coordinates of the mechanism J as both the input and the expected (reconstructed) output, while the normalized path coordinates P serve as the condition. During training, the MSE loss minimizes the reconstruction error between the input and output coordinates of the mechanism’s joints, aiding accurate reconstruction of the mechanism parameters. The aim is to reconstruct mechanisms from the database and to generate defect-free mechanisms for a given coupler curve by sampling from the latent space, as sketched below.
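A sketch of this training loss, assuming (as in the plain VAE above) that the conditional prior reduces to a standard normal so that the KLD has the usual closed form; variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def cbeta_vae_loss(J_hat, J, mu, logvar, beta=10.0):
    """MSE reconstruction of the 18 mechanism parameters plus the
    beta-weighted KLD; beta = 10 gave the best results in this work."""
    recon = F.mse_loss(J_hat, J)
    kld = 0.5 * torch.mean(mu**2 + logvar.exp() - logvar - 1.0)
    return recon + beta * kld
```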

Fig.8 shows our third model’s architecture, which also incorporates self- and cross-attention mechanisms. As mentioned in Section 3.3, self-attention evaluates the relevance of elements within the same input vector, and cross-attention calculates the correlations between distinct vectors.

The process begins by amalgamating the distinctive features of the input mechanism parameters with the path condition: the input mechanism J and the path condition P are standardized using MLPs to obtain eJ and eP, respectively. A cross-attention block then takes eJ as the first input for calculating Q and eP as the second input for calculating K and V. This cross-attention fully integrates information across eJ and eP and is followed by a self-attention block. A latent vector z is then generated by MLPs after the self-attention block. The latent vector z is fed into a second cross-attention block, again with eP, and then sent to another self-attention block. In the end, a single linear layer projects the result of the final self-attention block to $\hat{J}$.

During inference, the exploration of the latent space under a specified path condition allows for the generation of mechanisms closely aligned with the desired path. The number of self-attention blocks, akin to the layer count in the MLP model, serves as a hyperparameter: increasing it reduces the loss until a threshold is reached, beyond which the training efficacy diminishes. The detailed architecture of the cβVAE is presented in Tab.3.
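Inference can then be sketched as follows, with `decoder` standing in for the attention-based decoder path of Fig.8 and `n_latent`, `desired_path` as placeholders for the trained model’s components (names and shapes are illustrative):

```python
import torch

# 30 latent samples from the learned prior, each paired with the same
# desired (normalized, 300-dim) path condition.
z = torch.randn(30, n_latent)
P_cond = desired_path.unsqueeze(0).expand(30, -1)  # repeated path condition
J_candidates = decoder(z, P_cond)                  # 30 candidate mechanisms
```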

4 Results and discussion

With the models’ architectures detailed in Section 3, this section presents the results for all the architectures employed. Our dataset comprises 5316624 paths in total, with 90% of the data used for training, 5% for validation, and 5% for testing. All the models discussed in this paper were trained on the same dataset. To evaluate the robustness of our models, we added random paths amounting to 20% of the original test set to the test set. These paths are generated by RSCR mechanisms with randomly picked joint locations and rotational axes that are not restricted to the discrete mesh grid nodes. We employed the Hausdorff distance [49] to assess the similarity between two paths. Assuming A and B are two sets of points, the Hausdorff distance is given by $H(A,B)=\max\{h(A,B),\ h(B,A)\}$, where $h(A,B)=\max_{a\in A}\{\min_{b\in B}d(a,b)\}$ and $d(a,b)$ is any metric between the two sets of points. Experimental studies indicated that a Hausdorff distance of 0.027 is indicative of satisfactory similarity between two paths, and this threshold is employed to evaluate satisfactory results.
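The symmetric Hausdorff distance can be computed with SciPy’s directed Hausdorff routine, as in the short sketch below:

```python
from scipy.spatial.distance import directed_hausdorff

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two (n, 3) point sets."""
    return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])

# A predicted path is deemed satisfactory when hausdorff(pred, target) < 0.027.
```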

Tab.4 shows only the most significant MLP models that we trained. Starting with a learning rate of 0.001 and employing an automatic scheduler for learning-rate reduction, we trained each of these models for over 200 epochs until the loss plateaued. The initial training loss for all models began at $L_{\mathrm{MSE}}\approx 0.62$. Every model contains several hidden layers, each with the same number of neurons in our case; the detailed architecture is shown in Tab.1. For the test database evaluation, only a predicted mechanism that generates a closed path, the focus of this work, was considered valid. A predicted path was deemed satisfactory if its Hausdorff distance from the input path was less than 0.027. The table also includes the mean and median values of the Hausdorff distances across all the test paths.

The results presented in Tab.4 indicate that deep MLP architectures with more neurons tend to perform better than shallow ones with few neurons. However, models with a larger number of parameters require longer training times. The most effective model is an MLP with 9 hidden layers and 4096 neurons in each layer. The disparity between a low median Hausdorff distance and a higher mean Hausdorff distance suggests that this model performs well for certain types of input paths but is less effective for others. This characteristic is further illustrated in Fig.9, which shows a comparison of the predicted paths (in green) against randomly chosen input paths from the test dataset (in red). For clarity, we do not show the mechanisms generating the predicted paths.

Tab.5 presents three of the VAE plus MLP models we have trained; for the detailed architecture, please see Tab.2. All three models share the same encoder architecture, which consists of a single MLP layer mapping the input dimensions to the latent space dimensions, and a decoder that reverses this mapping. The subsequent MLP architectures in these models are identical, each featuring 8 hidden layers with 4096 neurons per layer. We also explored various architectures with different encoder/decoder configurations and subsequent MLP architectures, discovering that the dimensionality of the latent space has the most significant impact on the results. Each VAE model was trained for over 400 epochs with an automatic learning-rate reduction scheduler until the loss plateaued. Similarly, each MLP was trained for over 200 epochs.

Compared with the MLP-only method, the VAE plus MLP architecture yields slightly better results, achieving a higher ratio of good paths and lower mean and median values of the Hausdorff distance. Whereas the MLP-only method generates a single mechanism approximating the input path, the VAE plus MLP architecture can produce multiple predicted mechanisms, which is critical to support concept generation. During training, the latent space data are extracted directly as the input for the subsequent MLP. With the trained model, the input desired path is first mapped to a latent representation. Samples drawn from the neighborhood of this latent representation are then fed to the subsequent trained MLP, yielding multiple results. Fig.10 shows a comparison of the predicted paths in green with randomly chosen input paths from the test dataset in red. Fig.11(a) displays nine different predicted paths for the same input path in red, and Fig.11(b) shows the nine different RSCR mechanisms corresponding to these predicted paths. These results demonstrate that the VAE plus MLP architecture performs better than the MLP-only architecture and provides various mechanisms for the same input.

Tab.6 presents some of the cβVAE models we have trained, highlighting the impact of varying β values. The listed models use exactly the architecture in Tab.3 and differ only in β. Compared with the other models discussed previously, the cβVAE achieves significantly better results. During testing, we took each test path as a condition, combined it with 30 random samples from the latent space, and sent it to the decoder to generate 30 different mechanisms. After decoding, the predicted path with the lowest Hausdorff distance from the desired path was selected as the result. The random samples were drawn from the learned latent distribution, so they are valid and introduce diverse outcomes. Among all the cβVAE models, the best result is achieved with β = 10, leading to a 78.9% ratio of good paths. An additional test database of 1000 paths generated by fully random RSCR mechanisms was used to assess the robustness of the best model. As shown in the third row of Tab.6, the results of this evaluation indicate that even with randomly generated paths, the best model maintains high performance and achieves a 70.8% ratio of good paths. Fig.12(a) illustrates a randomly selected result from the test path dataset, and Fig.12(b) shows part of the results from the additional random test path dataset. Furthermore, Fig.13 shows nine different predicted paths with their corresponding RSCR mechanisms, all derived from the same input path. The link lengths and rotational axes of these nine mechanisms are listed in Tab.7 to illustrate their uniqueness. Although some of them, for example the 1st and the 4th mechanisms, are quite similar, the majority are distinct. This finding further proves our model’s capability to generate multiple, diverse RSCR mechanisms.

In this section, we presented the results from the three different models we explored. To verify the statistical difference between these results, McNemar’s tests [50,51] were applied to compare the Hausdorff distances from the different architectures. When the Hausdorff distance is less than 0.027, the prediction is considered correct; otherwise, it is considered wrong. Using the best model from each architecture, the p-value obtained between MLP and VAE plus MLP is 0.738, indicating no significant difference between their results. The p-value between cβVAE and MLP is $2.75\times 10^{-11}$, and that between cβVAE and VAE plus MLP is $3.95\times 10^{-11}$. These two p-values indicate significant differences between the results of the cβVAE and the other two architectures. Among all the architectures, the cβVAE generally performs better, exhibiting lower losses, a higher closed-path ratio, and a greater satisfactory ratio. When the hyperparameter β is set to 10, the cβVAE demonstrates excellent performance across various paths. In addition, the cβVAE architecture can generate multiple solutions for a single input path. Although the results of MLP and VAE plus MLP are similar, the latter still stands out due to its ability to generate multiple solutions.
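For reference, the sketch below shows one way to run this paired test with statsmodels (an assumed tooling choice; the paper does not name its implementation):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_models(correct_a, correct_b):
    """McNemar's test on paired per-path outcomes; correct_* flags a
    Hausdorff distance below 0.027 for each test path."""
    a, b = np.asarray(correct_a, bool), np.asarray(correct_b, bool)
    table = [[np.sum(a & b), np.sum(a & ~b)],
             [np.sum(~a & b), np.sum(~a & ~b)]]
    return mcnemar(table, exact=False, correction=True).pvalue
```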

5 Application example

This section presents an example of the practical application of RSCR mechanisms. Zhao et al. [1] collected data on human upper-limb motion and designed a single-DOF planar four-bar mechanism to approximate this motion. Considering the limitation of planar mechanisms, they projected the spatial upper-limb motion onto a neutral plane. The spatial RSCR mechanism can directly accommodate the spatial motion without the need for projection. By utilizing the originally collected spatial data, the RSCR mechanism can provide additional natural solutions. Although this work focuses on path synthesis and only utilizes the path data given by Zhao et al. [1], the RSCR mechanism also holds potential for motion synthesis in future research. The raw paths given by Ref. [1] are shown in Fig.14, where the green path represents the trajectory of the wrist joint and the red path illustrates the trajectory of the elbow joint.

Following our pipeline, the raw trajectories were first converted into B-splines and then normalized. These normalized paths were fed into our optimal model, the cβVAE (β = 10), where each was copied 100 times and each copy was combined with a random sample from the latent space to ensure robust matching. The best matches were selected as the final design. Fig.15 displays the top matches for the wrist and elbow paths, alongside the RSCR mechanisms that generate these optimal paths. Our model achieves excellent matches, with Hausdorff distances of 0.01031 and 0.0175 for the wrist and elbow paths, respectively. After scaling back, the design parameters for the practical application of the RSCR mechanisms are detailed in Tab.8.

6 Conclusions and future work

In this work, we detailed a comprehensive process for the path synthesis of spatial RSCR mechanisms using deep learning models. A dedicated simulator was developed to generate a robust database containing over 5 million normalized paths, providing a reliable foundation for deep learning training. We explored three distinct deep learning models: the MLP, the VAE plus MLP, and a novel approach using the cβVAE. Comprehensive evaluations proved the cβVAE with β = 10 to be the most effective model.

We also presented a practical application in human upper-limb rehabilitation, where our trained cβVAE model successfully matched RSCR mechanisms with the wrist and elbow trajectories from real human movements. This demonstration highlighted the potential of RSCR mechanisms and our methods to make significant contributions in real-world scenarios.

Looking forward, we aim to extend these methods to diverse mechanism types and motion synthesis problems, continuing to refine the accuracy and applicability of deep learning models in mechanical design.

References

[1] Zhao P, Zhang Y T, Guan H W, Deng X T, Chen H D. Design of a single-degree-of-freedom immersive rehabilitation device for clustered upper-limb motion. Journal of Mechanisms and Robotics, 2021, 13(3): 031006

[2] Song W, Zhao P, Li X, Deng X T, Zi B. Data-driven design of a six-bar lower-limb rehabilitation mechanism based on gait trajectory prediction. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2023, 31: 109–118

[3] Zhang Y, Deng X T, Zhou B, Zhao P. Design and optimization of a multi-mode single-DOF watt-I six-bar mechanism with one adjustable parameter. In: Proceedings of Advances in Mechanism, Machine Science and Engineering in China. Singapore: Springer, 2022

[4] Deng X T, Purwar A. A matrix-based approach to unified synthesis of planar four-bar mechanisms for motion generation with position, velocity, and acceleration constraints. ASME Journal of Computing and Information Science in Engineering, 2024, 24(12): 121003

[5] Watanabe K, Sekine T, Nango J. Kinematic analysis and branch identification of RSCR spatial four link mechanisms. JSME International Journal Series C: Mechanical Systems, Machine Elements and Manufacturing, 1998, 41(3): 450–459

[6] Thompson J M. Computer aided design and synthesis of the RSCR spatial mechanism. Thesis for the Master’s Degree. Blacksburg: Virginia Polytechnic Institute and State University, 1987

[7] Bagci C. The RSRC space mechanism – analysis by 3 × 3 screw matrix, synthesis for screw generation by variational methods. Dissertation for the Doctoral Degree. Oklahoma State University, 1969

[8] Chiang C H, Chieng W H, Hoeltzel D A. Synthesis of the RSCR mechanism for four precision positions with relaxed specifications. Mechanism and Machine Theory, 1992, 27(2): 157–167

[9] Ananthasuresh G K, Kramer S N. Analysis and optimal synthesis of the RSCR spatial mechanisms. Journal of Mechanical Design, 1994, 116(1): 174–181

[10] Shrivastava A K, Hunt K H. Dwell motion from spatial linkages. Journal of Engineering for Industry, 1973, 95(2): 511–518

[11] Wang X, Guo W Z. The design of looped-synchronous mechanism with duplicated spatial Assur-groups. Journal of Mechanisms and Robotics, 2019, 11(4): 041014

[12] Osman M O M, Segev D. Kinematic analysis of spatial mechanisms by means of constant distance equations. Transactions of the Canadian Society for Mechanical Engineering, 1972, 1(3): 129–134

[13] Osman M O M, Bahgat B M, Dukkipati R V. Kinematic analysis of spatial mechanisms using train components. Journal of Mechanical Design, 1981, 103(4): 823–830

[14] Huang T C, Youm Y. Exact displacement analysis of four-link spatial mechanisms by the direction cosine matrix method. Journal of Applied Mechanics, 1984, 51(4): 921–928

[15] Regenwetter L, Nobari A H, Ahmed F. Deep generative models in engineering design: a review. Journal of Mechanical Design, 2022, 144(7): 071704

[16] Vasiliu A, Yannou B. Dimensional synthesis of planar mechanisms using neural networks: application to path generator linkages. Mechanism and Machine Theory, 2001, 36(2): 299–310

[17] Galán-Marín G, Alonso F J, Del Castillo J M. Shape optimization for path synthesis of crank-rocker mechanisms using a wavelet-based neural network. Mechanism and Machine Theory, 2009, 44(6): 1132–1143

[18] Chui C K. An Introduction to Wavelets. San Diego: Academic Press, 1992

[19] Deshpande S, Purwar A. A machine learning approach to kinematic synthesis of defect-free planar four-bar linkages. Journal of Computing and Information Science in Engineering, 2019, 19(2): 021004

[20] Deshpande S, Purwar A. Computational creativity via assisted variational synthesis of mechanisms using deep generative models. Journal of Mechanical Design, 2019, 141(12): 121402

[21] Sharma S, Purwar A. A machine learning approach to solve the Alt–Burmester problem for synthesis of defect-free spatial mechanisms. Journal of Computing and Information Science in Engineering, 2022, 22(2): 021003

[22] Nurizada A, Purwar A. An invariant representation of coupler curves using a variational autoencoder: application to path synthesis of four-bar mechanisms. Journal of Computing and Information Science in Engineering, 2024, 24(1): 011008

[23] Lee S, Kim J, Kang N. Deep generative model-based synthesis of four-bar linkage mechanisms considering both kinematic and dynamic conditions. In: Proceedings of the ASME 2023 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. Boston: ASME, 2023, V03AT03A016

[24] Purwar A, Chakraborty N. Deep learning-driven design of robot mechanisms. Journal of Computing and Information Science in Engineering, 2023, 23(6): 060811

[25] Nobari A H, Srivastava A, Gutfreund D, Xu K, Ahmed F. LInK: learning joint representations of design and performance spaces through contrastive learning for mechanism synthesis. 2024, arXiv preprint arXiv:2405.20592

[26] Yim N H, Ryu J, Kim Y Y. Big data approach for synthesizing a spatial linkage mechanism. In: Proceedings of IEEE International Conference on Robotics and Automation. London: IEEE, 2023, 7433–7439

[27] Kingma D P, Welling M. Auto-encoding variational Bayes. 2013, arXiv preprint arXiv:1312.6114

[28] Nurizada A, Dhaipule R, Lyu Z, Purwar A. A dataset of 3M single-DOF planar 4-, 6-, and 8-bar linkage mechanisms with open and closed coupler curves for machine learning-driven path synthesis. ASME Journal of Mechanical Design, 2025, 147(4): 041702

[29] Nurizada A, Lyu Z, Purwar A. Path generative model based on conditional β-variational auto encoder for four-bar mechanism design. Journal of Mechanisms and Robotics, 2025, 17(6): 061004

[30] Deng X, Nurizada A, Purwar A. Synthesizing spatial RSCR mechanisms for path generation using a deep neural network. In: Proceedings of International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASME, 2024

[31] García de Jalón J, Bayo E. Kinematic and Dynamic Simulation of Multibody Systems: The Real-Time Challenge. New York: Springer, 2011, 72–74

[32] Virtanen P, Gommers R, Oliphant T E, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt S J, Brett M, Wilson J, Millman K J, Mayorov N, Nelson A R J, Jones E, Kern R, Larson E, Carey C J, Polat İ, Feng Y, Moore E W, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero E A, Harris C R, Archibald A M, Ribeiro A H, Pedregosa F, van Mulbregt P, SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 2020, 17: 261–272

[33] Chase T R, Mirth J A. Circuits and branches of single-degree-of-freedom planar linkages. Journal of Mechanical Design, 1993, 115(2): 223–230

[34] Piegl L, Tiller W. The NURBS Book. 2nd ed. Berlin: Springer, 1997

[35] Lyu Z J, Purwar A. Design and development of a sit-to-stand device using a variational autoencoder-based deep neural network. In: Proceedings of ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. St. Louis: ASME, 2022, V007T07A027

[36] Jolliffe I T. Principal Component Analysis. 2nd ed. New York: Springer, 2002

[37] Yu S C, Chang Y, Lee J J. A generative model for path synthesis of four-bar linkages via uniform sampling dataset. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 2023, 237(4): 811–829

[38] Sener S, Unel M. Geometric invariant curve and surface normalization. In: Campilho A, Kamel M, eds. Image Analysis and Recognition. Berlin: Springer, 2006, 445–456

[39] Purwarlab. RSCR Mechanisms. 2025, available at the Kaggle website: www.kaggle.com/datasets/purwarlab/rscr-mechanisms

[40] Haykin S. Neural Networks: A Comprehensive Foundation. 2nd ed. Upper Saddle River: Prentice Hall, 1998

[41] Agarap A F. Deep learning using rectified linear units (ReLU). 2019, arXiv preprint arXiv:1803.08375

[42] Ba J L, Kiros J R, Hinton G E. Layer normalization. 2016, arXiv preprint arXiv:1607.06450

[43] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15(1): 1929–1958

[44] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017). Red Hook: Curran Associates Inc., 2017, 6000–6010

[45] Han K, Wang Y H, Chen H T, Chen X H, Guo J Y, Liu Z H, Tang Y H, Xiao A, Xu C J, Xu Y X, Yang Z H, Zhang Y M, Tao D C. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 87–110

[46] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. 2nd ed. Cambridge: MIT Press, 2018

[47] Sohn K, Lee H, Yan X C. Learning structured output representation using deep conditional generative models. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015, 3483–3491

[48] Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A. β-VAE: learning basic visual concepts with a constrained variational framework. In: Proceedings of the International Conference on Learning Representations (ICLR 2017). 2017

[49] Rote G. Computing the minimum Hausdorff distance between two point sets on a line under translation. Information Processing Letters, 1991, 38(3): 123–127

[50] McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 1947, 12(2): 153–157

[51] Raschka S. Model evaluation, model selection, and algorithm selection in machine learning. 2020, arXiv preprint arXiv:1811.12808

RIGHTS & PERMISSIONS

The Author(s). This article is published with open access at link.springer.com and journal.hep.com.cn
