Multi-component gas analysis based on the overlapping region of near-infrared gas photothermal spectra

Xianjie Zhong; Chao Wang; Shoulin Jiang; Shuangxiang Zhao; Wei Jin

doi:10.2738/foe.2026.0016

Front. Optoelectron. ›› 2026, Vol. 19 ›› Issue (2) :16 DOI: 10.2738/foe.2026.0016

RESEARCH ARTICLE


Abstract

In this paper, we report multi-component gas analysis from a narrow overlapping near-infrared spectral window using second-harmonic photothermal interferometry (PTI) combined with partial least-squares regression (PLSR). The analytes in a 5 cm hollow core Fabry−Pérot probe are pumped by a 40 mW DFB laser tuned from 1680 nm to 1681.2 nm and probed at 1570 nm. A total of 460 spectra for the gases CH₄, C₂H₆, and C₂H₄ at different concentrations were automatically recorded, with each spectrum containing 520 points. Using 80%/20% train/validation splits and 5-fold cross-validation, the PLSR model exhibits an overall relative error of 0.319%. The model predictions can maintain a good relative error of about 0.5% with only 180 training samples, or 150 attention-focused points, or 33-point down-sampled spectra. This narrow-band single-laser fiber-integrated PTI with PLSR would enable accurate gas component prediction for industrial and medical applications.

Keywords

Narrow-band / Multi-component gas analysis / Hollow-core fibers / Partial least-squares regression / Near-infrared photothermal spectroscopy

Cite this article

Xianjie Zhong, Chao Wang, Shoulin Jiang, Shuangxiang Zhao, Wei Jin. Multi-component gas analysis based on the overlapping region of near-infrared gas photothermal spectra. Front. Optoelectron., 2026, 19(2): 16 DOI:10.2738/foe.2026.0016

1 Introduction

Near-infrared (NIR) spectroscopic gas sensing technologies are important tools in food quality assessment, biomedical diagnosis, industrial process control, and environmental monitoring owing to their good gas selectivity, high device maturity, compact system architecture, and excellent fiber compatibility, which enables remote and distributed monitoring [1–3]. Gas NIR spectra typically refer to the absorption of gas molecules in the 780–2526 nm (12820–3959 cm⁻¹) band. The absorption primarily results from the overtone and combination bands of fundamental vibrations of molecular groups [4]. Multi-component gas analysis, which simultaneously determines the concentrations of multiple gas components in a sample, is a more challenging task in NIR gas spectroscopy. The spectrum of a gas mixture is the superposition of the absorptions of each component, so spectral line overlap and interference become more prominent. The characteristic absorption peaks of each gas may no longer be as easily distinguishable as in single-component cases, and algorithms are needed to “disentangle” the mixed spectra into the contributions of each component. At present, multi-component gas analysis is mainly performed using Raman spectroscopy and broadband absorption spectroscopy (BAS). Raman spectroscopy is typically implemented in the visible spectral region, with a sensitivity at the level of parts-per-million (ppm) [5]. The BAS technology uses a single measurement over a wide spectral range to obtain the overall absorption characteristics of a mixed gas. Sensitivity of the BAS technology is determined by the resolution of the spectrometer, such as a Fourier transform infrared (FTIR) spectrometer, a fiber optic spectrometer (FOS), or an array detector. The typical resolution of an FTIR system is between 0.02 and 20 nm [6,7]. Furthermore, BAS technology based on frequency-modulated continuous-wave (FMCW) interferometry has been developed to enable multi-point distributed gas sensing [8]. In the NIR region, the FOS and array-detector resolutions are 3–30 nm, depending on the slit and grating. This resolution is not sufficient to distinguish individual gas absorption lines (typically tens of picometers in width).
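The wavelength and wavenumber limits quoted for the NIR band are related by ν̃ [cm⁻¹] = 10⁷ / λ [nm]. A minimal sketch of that conversion follows; the function name is illustrative and not taken from the source.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nm to a wavenumber in cm^-1."""
    return 1e7 / wavelength_nm

# Band edges quoted in the text: 780 nm and 2526 nm.
short_edge = nm_to_wavenumber(780.0)    # about 12820 cm^-1
long_edge = nm_to_wavenumber(2526.0)    # about 3959 cm^-1
print(f"{short_edge:.1f} cm^-1, {long_edge:.1f} cm^-1")
```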

In contrast to broadband strategies, narrowband absorption spectroscopy (NAS) confines the measurement to one or a few discrete rovibrational transitions of the target species, rather than broad absorption envelopes. Gas analysis systems based on this principle include photoacoustic spectroscopy (PAS), tunable diode laser absorption spectroscopy (TDLAS), and photothermal spectroscopy (PTS). Recent advances have demonstrated the vitality of these techniques. For instance, TDLAS has achieved ppb-level HF detection in complex matrices using long-path (76 m) multi-pass cells [9], while PAS has expanded its capabilities through UV [10] and QCL [11] excitation for highly sensitive detection of formaldehyde, volatile organic compounds, and COₓ [12]. However, these high-performance systems often rely on bulky free-space optics or complex resonant cavities. As a compact, fiber-integrated alternative, hollow-core fiber photothermal interferometry has demonstrated a detection limit down to the parts-per-trillion level [13]. The main issue with using NAS technologies for multi-component gas analysis is the need for multiple lasers corresponding to the absorption lines of different gases [14,15], and the optical signals of different gases overlap after photoelectric conversion. To distinguish the gases, time-division multiplexing [16] or frequency-division multiplexing (FDM) methods [17] need to be employed. In FDM PTS-based multi-component analysis of CO, CO₂ and H₂O, the limits of detection (LODs) achieved were 0.6 ppb for CO, 1.5 ppb for CO₂, and 222 ppb for H₂O [18].

The use of multiple lasers in NAS systems increases system complexity and reduces reliability. Although exploiting overlapping spectral regions can significantly reduce the number of lasers, this single-laser narrowband approach inherently introduces cross-interference during detection. Recently, analytical methods, such as using a Taylor expansion to solve for the second harmonic (2f) peak shift, have been proposed to mitigate this interference [19]. Nevertheless, the applicability of such analytical models is limited in complex gas environments; multivariate regression algorithms offer a more powerful solution for disentangling severe spectral overlaps. Zhao et al. studied the analysis of overlapping absorption lines of CO and CH₄ in the 2300 nm band using TDLAS with a single semiconductor distributed feedback (DFB) laser. They constructed a multivariate partial least-squares regression (PLSR) model with 1024 sampling points in the full 2326.66–2327.75 nm window as independent variables and the concentrations of CO and CH₄ as dependent variables. With 30 sets of mixed-gas standards (CO: 2–100 ppm, CH₄: 200–8000 ppm) for training and prediction, they achieved a prediction error of less than 5% [20]. Similar methods have been applied in photothermal gas analysis. Zhou et al. developed a fiber-optic measurement system employing a single DFB laser (linewidth 2 MHz) and photothermal interferometry to resolve the overlapping absorption spectra of acetylene and ammonia at 1530.371 nm and 1530.327 nm using a PLSR model trained on 63 samples (NH₃: 100–700 ppm; C₂H₂: 100–900 ppm). The prediction error of the PLSR model was 3.84% [21]. While these studies reported relatively high prediction errors, they effectively demonstrate the capability of PLSR to handle overlapping spectra in both TDLAS and PTS systems. Furthermore, the zero-background nature of PTS provides a superior dynamic range compared to TDLAS, enabling the extraction of more robust spectral features for high-precision regression.

Beyond conventional multivariate methods, deep learning approaches, such as Convolutional Neural Networks (CNNs) combined with Gated Recurrent Units (GRUs), have recently demonstrated impressive accuracy in multi-component gas quantification [22]. Additionally, advanced neural network architectures like U-net have been successfully employed as noise filters to achieve sub-ppb level sensitivity in photoacoustic sensors [23]. However, these data-driven neural network models typically require extensive data sets comprising thousands of samples to minimize overfitting [24,25] and often lack transparency in feature selection. In the context of spectroscopic analysis, where the underlying physical principle (Beer-Lambert law) [26] is inherently linear and the available sample size is moderate, PLSR remains the preferred method. Unlike deep neural networks, PLSR offers superior computational efficiency and, crucially, provides spectral interpretability through latent variables (LVs) [27]. This capability allows for the physical verification of gas absorption features even amidst overlapping interference. Therefore, considering the balance between interpretability and data efficiency, PLSR is adopted in this work to disentangle the overlapping spectral signals.

While PLSR is efficient with moderate data sets, expanding the training sample size and diversity remains a critical strategy for further minimizing prediction errors compared to previous small-scale studies. To achieve this, we constructed an automated gas-mixing system. Using 460 spectra for PLSR model training and prediction, we achieved an average $R^2$ of 0.99984 and a prediction error of 0.319%. Leveraging the large data set, we further investigated the impact of different data representations and training strategies on the accuracy of mixed-gas concentration estimation.

2 Principle and method

The PTS gas sensing technology transduces weak optical absorption into variations of temperature, refractive index, and optical path length, which are subsequently interrogated interferometrically or by other phase-sensitive techniques. Compared with direct absorption spectroscopy, PTS eliminates the need for long free-space multi-pass cells, can be fully implemented in fiber, and is intrinsically compatible with narrow-linewidth lasers and compact hollow-core fiber gas cells. These characteristics make photothermal interferometry particularly attractive for high-sensitivity, multi-component gas detection in the NIR, where the overtone and combination transitions are relatively weak.

In a gas-filled hollow-core fiber, a NIR pump beam propagates in the fundamental mode with intensity profile $I(r, z)$, where $r$ denotes the radial coordinate in the fiber core cross-section and $z$ is the longitudinal propagation coordinate along the fiber. For a weakly absorbing gas with peak absorption coefficient $A = \alpha C$ (where $C$ is the relative gas concentration and $\alpha$ is the peak coefficient at 100% concentration) and normalized line-shape function $\varphi_w(\lambda_{\mathrm{pump}})$, the local volumetric heat generation rate due to non-radiative relaxation can be written as [28]

$$H(r, z) = Y_H \, A \, \varphi_w(\lambda_{\mathrm{pump}}) \, I(r, z),$$

where $Y_H$ is the heat yield. This non-uniform heating leads to spatially varying temperature, density, and pressure in the hollow core, reducing the gas refractive index in the core center and inducing a small thermoelastic change in the effective length of the fiber. The combined effect of the fractional change in effective refractive index $\zeta(z) = \Delta n_{\mathrm{eff}}(z) / n_{\mathrm{eff}}$ and the fractional length change $\varepsilon(z) = \Delta l(z) / l$ produces a longitudinally varying optical path perturbation for a co-propagating probe at wavelength $\lambda_{\mathrm{probe}}$. The accumulated photothermal phase shift over a fiber length $L$ can then be expressed in integral form as [28]

$$\Delta\phi = \frac{2\pi n_{\mathrm{eff}}}{\lambda_{\mathrm{probe}}} \int_0^L \left[\zeta(z) + \varepsilon(z)\right] \mathrm{d}z,$$

where $\zeta(z) + \varepsilon(z)$ is proportional to $A \, \varphi_w(\lambda_{\mathrm{pump}}) \, I_0(z)$ through the thermo–optic and thermo–elastic response. Substituting this proportionality into the integral shows that $\Delta\phi$ scales linearly with the product of the peak absorption coefficient $A$ (and hence concentration $C$), the line-shape function $\varphi_w(\lambda_{\mathrm{pump}})$, and the pump power averaged along the interaction length, enabling quantitative trace-gas detection via interferometric phase readout.
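The substitution step can be written out explicitly. The sketch below assumes the proportionality takes the form $\zeta(z) + \varepsilon(z) = k^{*} A \, \varphi_w(\lambda_{\mathrm{pump}}) \, I_0(z)$, where $k^{*}$ is a hypothetical lumped thermo-optic/thermo-elastic coefficient and $\bar{I}_0$ is the length-averaged pump intensity; neither symbol appears in the source.

```latex
\Delta\phi
  = \frac{2\pi n_{\mathrm{eff}}}{\lambda_{\mathrm{probe}}}
    \int_0^L k^{*} A \,\varphi_w(\lambda_{\mathrm{pump}})\, I_0(z)\,\mathrm{d}z
  = \frac{2\pi n_{\mathrm{eff}} L}{\lambda_{\mathrm{probe}}}\,
    k^{*}\, \alpha C \,\varphi_w(\lambda_{\mathrm{pump}})\, \bar{I}_0,
\qquad
\bar{I}_0 = \frac{1}{L}\int_0^L I_0(z)\,\mathrm{d}z .
```

Under this assumption, every factor other than $C$ is fixed for a given pump wavelength and power, which is the linear concentration dependence stated in the text.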

PLSR is employed to process these overlapping spectral features and predict the gas concentrations by constructing the regression coefficient matrix $\mathbf{B}$. The core idea of PLSR is to reduce the dimensionality of the original multivariate data by constructing a set of low-dimensional latent variables (LVs) that simultaneously capture the main variation in the predictor matrix $\mathbf{X}$ and exhibit strong linear correlation with the response matrix $\mathbf{Y}$. Each LV is obtained as a weighted linear combination of the original variables, and the corresponding weight vector $\mathbf{w}$ is iteratively optimized by maximizing the covariance between the $\mathbf{X}$-side and $\mathbf{Y}$-side latent scores. Different LVs are approximately orthogonal and mutually uncorrelated, so that a small number of LVs can condense the information in $\mathbf{X}$ that is most relevant to $\mathbf{Y}$, while mitigating the adverse effects of strong collinearity among spectral variables [29].

Assume the spectrum of a mixed-gas sample containing $m$ wavelength components is denoted as the row vector $\mathbf{x} = [x(1), x(2), \ldots, x(m)]$, and the concentrations of the $l$ gases in that sample as $\mathbf{y} = [y(1), y(2), \ldots, y(l)]$. The total number of samples is $n$. The predictor and response matrices are $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^{T}$ and $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_n]^{T}$. The data of sample $\mathbf{x}_i$ is usually normalized using the z-score to eliminate mean offsets and equalize scale. For a feature $x = \{x_k\}_{k=1}^{N}$,

$$z_k = \frac{x_k - \mu}{\sigma}, \qquad \mu = \frac{1}{N} \sum_{k=1}^{N} x_k, \qquad \sigma = \sqrt{\frac{1}{N-1} \sum_{k=1}^{N} (x_k - \mu)^2}.$$
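The z-score step can be sketched in a few lines of NumPy. This is an illustration only: it assumes each spectral point (column) is standardized across samples, and it uses `ddof=1` to match the $N-1$ denominator in the formula. The function name and the toy data are hypothetical.

```python
import numpy as np

def zscore(X):
    """Standardize each column of X to zero mean and unit sample standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # ddof=1 gives the 1/(N-1) estimator
    return (X - mu) / sigma, mu, sigma

# Toy data standing in for spectra: 20 samples x 4 spectral points.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=3.0, scale=2.0, size=(20, 4))
Z_demo, mu_demo, sigma_demo = zscore(X_demo)
```

Keeping `mu` and `sigma` from the training set allows the same transform to be applied to validation spectra without leaking their statistics into the model.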

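At iteration $j$, the weight vector $\mathbf{w}_j$ is derived to maximize the cross-covariance between the spectral matrix $\mathbf{X}$ and the concentration matrix $\mathbf{Y}$. After $\ell_2$ normalization, this weight vector is used to extract the latent score vector $\mathbf{t}_j$ via

$$\mathbf{t}_j = \mathbf{X}^{(j)} \mathbf{w}_j,$$

where $\mathbf{X}^{(j)}$ denotes the current residual. The current residuals are then regressed onto $\mathbf{t}_j$ by least squares to obtain a spectral loading $\mathbf{p}_j$ and a response coefficient $\mathbf{q}_j$, i.e., $\mathbf{X}^{(j)} \approx \mathbf{t}_j \mathbf{p}_j^{T}$ and $\mathbf{Y}^{(j)} \approx \mathbf{t}_j \mathbf{q}_j^{T}$. The contributions $\mathbf{t}_j \mathbf{p}_j^{T}$ and $\mathbf{t}_j \mathbf{q}_j^{T}$ are subtracted (deflation) to form new residuals for the next iteration, and this procedure is repeated for $j = 1, \ldots, a$. Collecting the results gives $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_a]$, $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_a]$, and $\mathbf{Q} = [\mathbf{q}_1, \ldots, \mathbf{q}_a]$, from which the overall regression matrix is

$$\mathbf{B} = \mathbf{W} \left(\mathbf{P}^{T} \mathbf{W}\right)^{-1} \mathbf{Q}^{T}, \qquad \mathbf{Y} \approx \mathbf{X} \mathbf{B}.$$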
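The iteration described above (weight, score, loading, deflation, then assembly of $\mathbf{B}$) can be sketched with a NIPALS-style loop. The source does not state which PLSR algorithm or library was used, so this is an assumption-laden illustration, not the authors' implementation. The function name, tolerances, and toy data are hypothetical.

```python
import numpy as np

def pls_nipals(X, Y, n_lv, tol=1e-10, max_iter=500):
    """NIPALS-style PLS2 sketch. Returns B with Y ≈ X @ B (X, Y assumed pre-scaled)."""
    Xr, Yr = X.astype(float).copy(), Y.astype(float).copy()
    m, l = X.shape[1], Y.shape[1]
    W, P, Q = np.zeros((m, n_lv)), np.zeros((m, n_lv)), np.zeros((l, n_lv))
    for j in range(n_lv):
        # Start from the response column with the largest residual variance.
        u = Yr[:, [int(np.argmax(Yr.var(axis=0)))]]
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)            # l2-normalized weight vector w_j
            t = Xr @ w                        # latent score t_j = X^(j) w_j
            q = Yr.T @ t / (t.T @ t)          # response coefficient q_j
            u_new = Yr @ q / (q.T @ q)
            converged = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = Xr.T @ t / (t.T @ t)              # spectral loading p_j
        Xr -= t @ p.T                         # deflation of X
        Yr -= t @ q.T                         # deflation of Y
        W[:, [j]], P[:, [j]], Q[:, [j]] = w, p, q
    return W @ np.linalg.inv(P.T @ W) @ Q.T   # B = W (P^T W)^-1 Q^T

# Sanity check on toy data: with as many LVs as (full-rank) predictors,
# the PLSR solution coincides with ordinary least squares.
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((50, 5))
Y_toy = X_toy @ rng.standard_normal((5, 3)) + 0.01 * rng.standard_normal((50, 3))
B_pls = pls_nipals(X_toy, Y_toy, n_lv=5)
B_ols = np.linalg.lstsq(X_toy, Y_toy, rcond=None)[0]
```

In practice far fewer LVs than spectral points are retained, which is what regularizes the regression against collinear spectral variables.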

The number of LVs $a$ is chosen by $K$-fold cross-validation using the mean absolute error (MAE),

$$\mathrm{MAE}_{\mathrm{CV}}(a) = \frac{1}{N_{\mathrm{train}} \, l} \sum_{i=1}^{N_{\mathrm{train}}} \sum_{k=1}^{l} \left| y_{ik} - \hat{y}_{ik}^{(a)} \right|,$$

where $N_{\mathrm{train}}$ is the number of training samples and $l$ is the number of output analytes. In $K$-fold cross-validation, the training set is randomly partitioned into $K$ approximately equal subsets, each subset being used once as a validation set while the remaining $K - 1$ subsets are used to fit the PLSR model. The value of $a$ minimizing $\mathrm{MAE}_{\mathrm{CV}}(a)$ is retained. With this $a$, the extraction is rerun on the full training set to obtain the final $\mathbf{B}$; a new spectrum $\mathbf{x} \in \mathbb{R}^{1 \times m}$ is then mapped to concentrations by

$$\hat{\mathbf{y}} = \mathbf{x} \mathbf{B} \in \mathbb{R}^{1 \times l}.$$
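The LV selection by $K$-fold cross-validation can be sketched as follows. This is an illustration under stated assumptions: `pls_fit` is a compact PLS2 variant that takes each weight vector as the dominant left singular vector of the residual cross-covariance, centering is done per fold, and the data are synthetic three-component spectra built from hypothetical Gaussian line shapes. None of these specifics come from the source.

```python
import numpy as np

def pls_fit(X, Y, a):
    """Compact PLS2 fit on centered data; returns B = W (P^T W)^-1 Q^T."""
    Xr, Yr = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(a):
        # Weight = dominant left singular vector of the cross-covariance X^T Y.
        w = np.linalg.svd(Xr.T @ Yr, full_matrices=False)[0][:, 0]
        t = Xr @ w
        tt = t @ t
        p, q = Xr.T @ t / tt, Yr.T @ t / tt
        Xr -= np.outer(t, p)                  # deflate X
        Yr -= np.outer(t, q)                  # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = (np.array(M).T for M in (W, P, Q))
    return W @ np.linalg.solve(P.T @ W, Q.T)

def cv_mae(X, Y, a, K=5, seed=0):
    """K-fold cross-validated mean absolute error for a model with `a` LVs."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, K)
    abs_err = 0.0
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[i] for i in range(K) if i != k])
        mu_x, mu_y = X[trn].mean(axis=0), Y[trn].mean(axis=0)
        B = pls_fit(X[trn] - mu_x, Y[trn] - mu_y, a)
        Y_hat = (X[val] - mu_x) @ B + mu_y
        abs_err += np.abs(Y[val] - Y_hat).sum()
    return abs_err / (len(X) * Y.shape[1])

# Synthetic overlapping "spectra": three Gaussian lines mixed by random concentrations.
rng = np.random.default_rng(1)
grid = np.arange(60)
S = np.stack([np.exp(-0.5 * ((grid - c) / 6.0) ** 2) for c in (20, 30, 40)])
C = rng.uniform(0.1, 0.8, size=(100, 3))
X_syn = C @ S + 1e-3 * rng.standard_normal((100, 60))

scores = {a: cv_mae(X_syn, C, a) for a in range(1, 9)}
best_a = min(scores, key=scores.get)          # LV count with the lowest CV MAE
```

Repeating `cv_mae` with several seeds and inspecting the spread, rather than a single value, gives the kind of error distribution that reveals the onset of overfitting.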

To explicitly define the error metric used throughout this work, we calculated the mean of the absolute relative deviations between the predicted ($\hat{y}_i$) and reference ($y_i$) concentrations as the prediction error, herein referred to as the relative error. Mathematically, this is equivalent to the mean absolute percentage error (MAPE):

$$\text{Relative error} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%.$$
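The relative error defined above is straightforward to compute. The sketch below uses a hypothetical function name and made-up concentration values purely to show the arithmetic.

```python
import numpy as np

def relative_error_percent(y_true, y_pred):
    """Mean absolute percentage error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Illustrative values (percent concentration): deviations of 1%, 1%, and 0.5%.
err = relative_error_percent([10.0, 50.0, 80.0], [10.1, 49.5, 80.4])
print(f"relative error = {err:.3f}%")
```

Because each deviation is divided by its reference value, the same absolute error weighs more heavily at low concentrations.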

To optimize the model, the data for regression analysis are sometimes pre-processed with algorithms such as standard normal variate (SNV), multiplicative scatter correction (MSC), 1st/2nd derivative transforms, and Savitzky−Golay (SG) smoothing. These procedures help attenuate additive and multiplicative scatter [30], correct spectral distortions [31], remove slowly varying baselines [32], improve the signal-to-noise ratio [33], and preserve peak shape [34]. In addition, we systematically examined how data set design choices affect model performance, including spectral-region selection, training-set size reduction, value clipping, and data-point down-sampling, to identify a more efficient modeling workflow.

3 Experiment of multicomponent gas analysis

The training data for PLSR analysis, i.e., the narrow-band spectra of the gas mixtures, were acquired using the system shown in Fig. 1. The PTS demodulation system is similar to the Fabry−Pérot interferometric (FPI) PTS system reported in Ref. [35]. A temperature-tuned DFB laser with a center wavelength of 1680.6 nm was used as the pump light source; a narrow-linewidth laser with a wavelength of 1570 nm (Connet Ltd. VENUS) was used as the probe source. During gas sensing, the absorption of pump energy by the target gas molecules induces a localized temperature rise via non-radiative relaxation, which modulates the refractive index in the hollow core. This refractive index variation results in a phase shift of the probe light that co-propagates in the hollow core. A lock-in amplifier (Zurich Instruments MFLI 500 kHz) was employed to demodulate this photothermal-induced phase shift, which is proportional to the gas concentration. As illustrated on the right side of Fig. 1, the gas delivery system employs three mass flow controllers (MFCs, Shanxi Yidu Ltd. EC series) to independently regulate the flow rates of CH₄, C₂H₆, and C₂H₄. These MFCs operate within a control range of 0–200 SCCM (Standard Cubic Centimeters per Minute), featuring an accuracy of ±1% F.S. (Full Scale) and a repeatability of ±0.2% F.S. Downstream of the MFCs, the individual gas streams are routed through check valves before merging into a single pipeline. The mixture is then directed through a flow valve and introduced into the gas chamber via the “Gas in” port. Inside the chamber, the mixed gas diffuses into the AR-HCF sensing element via the open micro-channels of the PCF.

The Fabry−Pérot interferometer (FPI) sensor head was fabricated by fusion splicing a 5-cm-length anti-resonant hollow-core fiber (AR-HCF) between a single-mode fiber (SMF) and a solid-core photonic crystal fiber (PCF). The free spectral range (FSR) of the fiber FPI is 24 pm. The AR-HCF possesses a 7-ring cladding structure with a mode field diameter (MFD) of about 22 µm. This fiber is identical to the one used in Ref. [36]. To minimize coupling loss due to the mode mismatch between SMF and AR-HCF, the SMF core was thermally expanded to 20 µm before splicing. By adopting the splicing technique proposed by Ref. [37], a splicing loss of approximately 1 dB can be achieved. At the other side of the FPI, the solid-core PCF with core diameter of ~15 µm is used as the second reflector. Unlike previous designs that rely on femtosecond laser drilling for gas access [35], the intrinsic air holes (~3 µm) within the PCF cladding directly serve as gas inlets, allowing the gas mixture to diffuse effectively into the hollow core. To eliminate parasitic interference arising from Fresnel reflection at the PCF, the far-end of the PCF was manually angle-cleaved using a fiber cleaver. The final contrast of the sensor can be optimized to about 8 dB.
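The quoted free spectral range can be cross-checked with the standard Fabry−Pérot relation FSR ≈ λ² / (2nL). The sketch below assumes the FSR is evaluated at the 1570 nm probe wavelength and that the gas-filled hollow core has a refractive index of about 1.0; both are assumptions, as the source does not state how the 24 pm value was obtained.

```python
def fabry_perot_fsr_pm(wavelength_nm: float, cavity_length_cm: float, n: float = 1.0) -> float:
    """Free spectral range (in pm) of a Fabry-Perot cavity: FSR = lambda^2 / (2 n L)."""
    wavelength_m = wavelength_nm * 1e-9
    length_m = cavity_length_cm * 1e-2
    return wavelength_m ** 2 / (2.0 * n * length_m) * 1e12

# 5 cm hollow-core cavity probed at 1570 nm (n ~ 1.0 assumed).
fsr_pm = fabry_perot_fsr_pm(1570.0, 5.0)
print(f"FSR ~ {fsr_pm:.1f} pm")
```

The result is close to 24.6 pm, consistent with the roughly 24 pm stated for the 5 cm cavity.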

The absorption spectra of target analytes (CH₄/C₂H₆/C₂H₄) in the 1680–1681.2 nm region are shown in Fig. 2 (10000 ppm, 298 K, 1 atm, data from HITRAN/PNNL [38,39]). Although the main absorption peaks of C₂H₆, C₂H₄, and CH₄ are separated from each other by intervals of approximately 0.38 nm and 0.23 nm in the selected wavelength range, the spectral window (1680–1681.2 nm) is densely populated by numerous weaker rovibrational transitions and overlapping line wings. As illustrated in Fig. 2, the absorption intensities of these features are often only several times lower than those of the main peaks. Consequently, significant overlap persists, most notably between the C₂H₄ main peak and the C₂H₆ side peaks in the 1680.16–1680.27 nm region. Especially when the concentration of the interfering gas is high, these “minor” side peaks generate substantial background signals. This cross-interference is particularly pronounced in the second harmonic (2f) spectra, where the extended side lobes of the modulation waveform significantly broaden the effective spectral profile of each gas, inducing varying degrees of distortion and spectral shifts that render simple peak-tracking methods ineffective.

The spacing between the gas absorption peaks is only about 200 pm and is accompanied by numerous weaker side peaks, which together give rise to significant crosstalk in the 2f spectra.

In the gas mixture, the concentration of each gas analyte ranges from 100000 to 800000 ppm (10%−80%). To exclude the possibility of the model overfitting to fixed analyte correlations rather than decoupling individual components, we employed a randomized experimental design strategy. The flow rates of the three analytes were modulated independently with a minimum control step of 5 SCCM. Because the number of potential permutations is excessive, a subset of combinations was randomly selected rather than conducting an exhaustive sweep, which resulted in an uneven distribution of concentration intervals. The volume of the gas chamber was about 12 mL, and the measurement and stabilization time for each set of data was 3 min. The photothermal signal at each gas ratio was automatically measured and recorded. As illustrated in Fig. 3a, the resulting concentration distribution of the 460 samples covers the multidimensional state space comprehensively. Furthermore, the histogram of concentration differences in Fig. 3b confirms the non-uniformity arising from this strategy: the adjacent concentration differences fluctuate over a wide dynamic range (up to ~3.3% or 33153 ppm) rather than following a fixed step, providing diverse spectral superposition cases for robust model training.
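A randomized mixing plan of this kind can be sketched as follows. This is a hypothetical illustration, not the authors' control program: it assumes each MFC flow is drawn in 5 SCCM steps within the 0–200 SCCM range, that concentration equals the flow fraction, and that combinations outside 10%–80% are rejected. It may also produce duplicate mixtures, which a real plan would likely filter.

```python
import numpy as np

STEP_SCCM, MAX_SCCM = 5, 200          # MFC control step and upper range from the text
N_SAMPLES, MINUTES_PER_SAMPLE = 460, 3

def random_mixes(n, lo=0.10, hi=0.80, seed=0):
    """Draw n random three-gas mixtures whose fractions all lie in [lo, hi]."""
    rng = np.random.default_rng(seed)
    mixes = []
    while len(mixes) < n:
        # Flows of 5, 10, ..., 200 SCCM for CH4, C2H6, C2H4 (upper bound is exclusive).
        flows = STEP_SCCM * rng.integers(1, MAX_SCCM // STEP_SCCM + 1, size=3)
        frac = flows / flows.sum()
        if frac.min() >= lo and frac.max() <= hi:   # rejection sampling
            mixes.append(frac)
    return np.array(mixes)

mixes = random_mixes(N_SAMPLES)
total_hours = N_SAMPLES * MINUTES_PER_SAMPLE / 60   # 460 x 3 min = 23 h
```

The acquisition-time arithmetic matches the roughly 23 h reported for the full campaign.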

Figure 4 illustrates the measured 2f signals of CH₄, C₂H₆, and C₂H₄, both individually (at ~10% concentration) and within a mixture. The pump laser was temperature-tuned by a 0.2 Hz sawtooth current to cover a wavelength range from 1680 nm to 1681.2 nm, with an output power of 40 mW, and the probe laser power was maintained at 15 mW. The signals were demodulated using a lock-in amplifier configured with a modulation frequency of 25 kHz and an integration time of 100 ms. As shown, when CH₄ and C₂H₆ of the same concentration are in a mixed gas, the mixed signal exhibits complex interference and shifting, deviating from a simple linear superposition compared to when they are present individually.

Using the system in Fig. 1 and the custom-developed control program, we recorded 460 sets of harmonic data of photothermal spectra for CH₄, C₂H₆, and C₂H₄ gas mixtures. The concentration range of the spectral samples is from 10% to 80%. The gas ratios and photothermal 2f signals were automatically adjusted and collected by the custom-developed program, and the total time for recording the 460 sets of samples was approximately 23 h. Each data set contains 520 2f data points, with a wavelength interval of about 1.7 pm between every two points. The data sets were used for PLSR analysis and model validation according to the process shown in Fig. 5.

4 Results and discussion

The data set consists of 460 samples acquired continuously over approximately 23 h in a laboratory environment with temperature controlled within ±1°C. Given that the gas concentration states were randomized throughout the experiment, a random 80/20 split was employed as the primary validation strategy. This approach evaluates the model's interpolation capability within the multi-dimensional concentration space, yielding an overall relative error of 0.319%. To further investigate the system’s robustness against potential long-term drift, we supplemented this with a time-ordered validation (using the first 80% for training and the final 20% for testing). Although the relative error increased to 0.803% in this scenario, it remains within an acceptable range. This comparison indicates that the random split strategy effectively allows the model to learn and compensate for the temporal spectral variations, including the slight wavelength drift of the DFB laser, distributed throughout the experiment. Furthermore, we analyzed the intensity variations at the 2f peak positions of each individual gas (data-point indices CH₄: 314, C₂H₆: 502, C₂H₄: 386). As illustrated in Fig. 6, the peak absorption intensities exhibit slight deviations from ideal linearity, with $R^2$ values ranging from 0.8871 to 0.9814. This observed nonlinearity is primarily attributed to the concentration-dependent variations in the thermophysical properties (e.g., heat capacity and thermal conductivity) and molecular relaxation dynamics of the gas mixture as the composition evolves [40].

These nonlinear trends shown in Fig. 6 also characterize the system’s operational boundary when one component approaches strong absorption or saturation. Although high concentrations (up to 80%) introduce spectral distortions and cross-interference, the PLSR model can compensate for such matrix effects by capturing higher-order variations through its LVs. Following this validation, the optimal number of LVs was determined by evaluating the MAE distribution through 5 repetitions of 5-fold cross-validation, spanning from 1 to 110 LVs. As illustrated in Fig. 7a, when the number of LVs is insufficient, the model underfits, resulting in a high error. As the number of LVs increases, the MAE decreases and stabilizes at approximately 35 LVs. Notably, as the number of LVs increases beyond 50, both the mean and the variance of the cross-validation MAE grow significantly. This widening error distribution is a hallmark of overfitting: higher-order LVs cease to extract meaningful physical absorption features and instead begin to fit high-frequency random instrument noise. Therefore, the PLSR model with 35 LVs was selected for the final evaluation.

Using the 35 LVs, the PLSR model was trained on a newly generated random training set and then validated using the validation set that was not involved in the training process. The results in Table 1 indicate that, over the 10%–80% concentration range of the analytes, the PLSR model achieves an average $R^2$ of 0.99984 and an overall relative error of 0.319%. It is important to note that the concentration labels used for training were synthesized by mass flow controllers (MFCs) with an accuracy of ±1% F.S. and a repeatability of ±0.2% F.S. The reported 0.319% relative error represents the algorithmic prediction performance evaluated against these synthesized labels, which themselves contain uncertainty propagated from the multi-channel MFCs.

The average concentration interval of the training sample data is about 1600 ppm. To assess how prediction performance depends on concentration, the validation set was partitioned into 10%-wide concentration intervals, and predictions were evaluated separately for each interval. As illustrated in Fig. 7b, the average relative error in the mid-concentration region is approximately 0.3%. Notably, for CH₄ concentrations in the 70%–80% range, the error drops to below 0.1%. In contrast, the relative error increases to approximately 0.6% in the 10%–20% interval. This slightly higher error in the low-concentration range is partly attributed to the amplified relative uncertainty of the MFC labels at low flow rates. This lower range typically coexists with a dominant gas; for instance, in a mixture dominated by approximately 61.5% C₂H₄, the C₂H₆ (~15.3%) and CH₄ (~23.0%) concentrations were retrieved with errors of 0.461% and 0.653%, respectively, demonstrating that the high concentration of one component does not compromise the accurate prediction of the others.
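The per-interval evaluation can be sketched by binning validation samples on their reference concentration. The function name and the three illustrative data points below are hypothetical; they are chosen only to reproduce relative errors of about 0.6%, 0.3%, and 0.1%.

```python
import numpy as np

def binned_relative_error(y_true, y_pred, edges=np.arange(10, 81, 10)):
    """Mean relative error (%) within each 10%-wide reference-concentration interval."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel = np.abs((y_true - y_pred) / y_true) * 100.0
    # Interior edges 20, 30, ..., 70 map each sample to interval index 0..6.
    bins = np.digitize(y_true, edges[1:-1])
    return {f"{edges[b]}-{edges[b + 1]}%": float(rel[bins == b].mean())
            for b in np.unique(bins)}

# Illustrative reference and predicted concentrations, in percent.
per_bin = binned_relative_error([15.0, 45.0, 75.0], [15.09, 45.135, 75.075])
print(per_bin)
```

Only intervals that contain at least one validation sample appear in the result, which avoids averaging over empty bins.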

Benefiting from this comprehensive multi-dimensional data set and robust model training, our system achieves exceptional overall performance. Compared with the previous two-component experiment based on a smaller data set of 63 samples [21], this study substantially expands the data scale and the gas mixing scheme, reducing the prediction error of the model from 3.84% to 0.319%.

To optimize the model performance and reduce the relative error, we evaluated the influence of five spectral preprocessing methods: Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), 1st and 2nd derivatives, and Savitzky–Golay (SG) smoothing. To ensure the rigor of the preprocessing methodology, the implementation workflows and key parameters were strictly controlled. SNV normalized each spectrum by subtracting its mean and dividing by its standard deviation, while MSC corrected scattering effects using the mean spectrum of the data set as a reference for linear regression. The 1st and 2nd derivatives were calculated using the finite difference method. For SG smoothing, a grid search set the window length to 5 with a polynomial order of 4. Finally, prior to PLSR training, all spectral data (fixed length of 520 points) and target concentration vectors underwent z-score normalization (mean = 0, standard deviation = 1) to standardize the input features. The experimental results shown in Fig. 8 indicate that, among the five preprocessing methods, SG smoothing yields the best performance apart from using the raw data, with a minimum relative error of 0.361%. That no preprocessing method outperforms the raw data is likely due to the high spectral dimensionality and the relatively large sample size: the abundance of spectral information weakens the beneficial effect of preprocessing. In cases with a small sample size, low data sampling rate, or low precision, these preprocessing steps might play a more significant role.
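Three of the preprocessing steps described above can be sketched in plain NumPy. This is an illustration under assumptions: SNV uses the sample standard deviation (`ddof=1`), MSC regresses each spectrum on the data set mean with a first-order fit, and the derivative is a simple forward difference. The source does not specify these details, and SG smoothing is omitted to keep the sketch dependency-free. The toy spectra are hypothetical.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, ref=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = X.mean(axis=0) if ref is None else ref
    out = np.empty_like(X, dtype=float)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)   # fit x ≈ intercept + slope * ref
        out[i] = (x - intercept) / slope
    return out

def finite_difference(X, order=1):
    """1st or 2nd derivative along the spectral axis by finite differences."""
    return np.diff(X, n=order, axis=1)

# Toy spectra: one base line shape with per-sample gain and offset ("scatter").
grid = np.linspace(-1.0, 1.0, 520)
base = np.exp(-0.5 * (grid / 0.1) ** 2)
gains = np.array([0.8, 1.0, 1.3])[:, None]
offsets = np.array([0.05, 0.00, -0.02])[:, None]
X_scatter = gains * base + offsets

X_snv, X_msc, X_d1 = snv(X_scatter), msc(X_scatter), finite_difference(X_scatter)
```

On these toy spectra both SNV and MSC collapse the three rows onto a single curve, which is the intended removal of gain and offset differences; on real 2f spectra the gain carries concentration information, which may be one reason such corrections do not help here.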

In addition, we explored the influence of the training sample size on the prediction results. With 20% (92 groups) of the data fixed as the test set, the size of the training set was gradually and randomly increased from 5% (23 groups) to 80% (368 groups), as shown in Fig. 9. The error rapidly decreased before the sample size reached approximately 200; when the training samples were fewer than 100, the average relative error was about 1%.
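The split used for this experiment can be sketched as index bookkeeping. The sketch assumes nested training subsets drawn from a single shuffled pool, and the intermediate fractions (10%, 20%, 40%) are hypothetical; the source only states the 5% and 80% endpoints and the fixed 20% test set.

```python
import numpy as np

N_TOTAL = 460
rng = np.random.default_rng(0)
perm = rng.permutation(N_TOTAL)

n_test = int(round(0.20 * N_TOTAL))          # 92 groups held out as the fixed test set
test_idx, pool_idx = perm[:n_test], perm[n_test:]

# Training-set sizes as fractions of the full data set (endpoints from the text).
fractions = (0.05, 0.10, 0.20, 0.40, 0.80)
sizes = [int(round(f * N_TOTAL)) for f in fractions]

# Nested subsets: each larger training set contains the smaller ones.
train_subsets = {s: pool_idx[:s] for s in sizes}
```

Fitting the model on each subset and scoring it on the same `test_idx` yields one point of the error-versus-sample-size curve.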

For the scanning-window clipping experiment, we treated the 2f main peaks as the regions of primary model attention and centered the analysis windows on the peak indices 314, 386, and 502, corresponding to the three gases identified from the single-gas 2f spectra in Fig. 4. In Fig. 10, the secondary horizontal axis denotes the total span of data points obtained by symmetrically extending from each of these three centers; when the spans of two centers overlap, the overlapping segment is counted only once. As the window span around the 2f main peaks increases, the training error decreases monotonically at the expense of slower computation and higher resource usage, and the relative error is reduced to approximately 0.5% when the number of data points reaches 150.

Subsequently, we examined the effect of data scale by down-sampling the original spectra and evaluating the PLSR prediction performance, as summarized in Fig. 11. In this figure, the secondary horizontal axis represents the down-sampling factor, i.e., the factor by which the data length is reduced. When the down-sampling factor reaches ×16 (corresponding to 33 data points), the relative error is approximately 0.4%. Therefore, the PLSR method does not place high requirements on the sampling rate of the data and can achieve accurate multi-component concentration prediction based on a relatively small number of informative spectral points.
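The point counts quoted for down-sampling follow from keeping every k-th sample. The sketch assumes plain decimation without averaging or filtering, which the source does not specify; the function name and placeholder data are hypothetical.

```python
import numpy as np

def downsample(X, factor):
    """Keep every `factor`-th spectral point (simple decimation, no averaging)."""
    return X[:, ::factor]

# Placeholder spectra: 4 samples x 520 points.
X_full = np.arange(4 * 520, dtype=float).reshape(4, 520)
X_ds16 = downsample(X_full, 16)
print(X_ds16.shape)
```

With 520 points and a factor of 16, the retained indices are 0, 16, ..., 512, giving the 33 points mentioned in the text.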

Notably, this quantitative accuracy is superior to that of most commercial gas sensors of the same range (typically ±1%–5%). The above performance was achieved by a single laser, narrow linewidth 2f photothermal system in combination with PLSR multivariate regression, highlighting the robustness of the proposed optical platform and the effectiveness of the algorithm framework. Currently, the development of multi-component gas regression analysis has taken two major technical routes: one is the wide-spectrum method, which measures the continuous spectrum within a wide band and simultaneously resolves the concentrations of all components using algorithms; the other is the narrow-spectrum method, which selects specific narrow bands (such as a certain independent absorption line) for each gas for measurement, and in the case of multiple components, covers all targets through multi-channel or multiple measurements.

The above method demonstrates the use of overlapping regions in gas absorption spectra to accurately predict the concentrations of gases CH₄, C₂H₆, and C₂H₄ through PLSR modeling. This method might also be applied to predict more components in gas mixtures. By scanning the gas absorption data in the HITRAN database, multiple spectral regions can be found where the strong absorption lines of four gases overlap. For example, Fig. 12 shows that in the spectral range from 1568.1 nm to 1569.6 nm, seven gases have absorption lines with intensities greater than 10⁻⁵. In this region, it may be possible to measure seven types of gases using only one pump laser. In addition, because photothermal gas detection technology is highly precise, using training data with a wider concentration range is expected to further improve prediction accuracy. It is worth noting that while the PTI hardware exhibits high intrinsic sensitivity (ppm level) due to its superior signal-to-noise ratio, this study focuses on validating the decoupling capability within a high-concentration regime (10%–80%) to address specific industrial monitoring requirements. Therefore, the minimal detectable concentration reported here is constrained by the training data set boundaries rather than the sensor's instrumental noise floor.
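The kind of database scan mentioned above can be illustrated with a small sketch. It is not the authors' procedure: the helper names `gases_in_window` and `scan_windows` are hypothetical, the line lists are made-up placeholder values rather than HITRAN data, and the intensity threshold is treated as a unitless number because the text gives no units.

```python
import numpy as np


def gases_in_window(lines, lo_nm, hi_nm, min_intensity):
    """Sorted names of gases with at least one strong line in [lo_nm, hi_nm].

    `lines` maps a gas name to (wavelengths_nm, intensities).
    """
    found = []
    for gas, (wavelengths, intensities) in lines.items():
        wl = np.asarray(wavelengths, dtype=float)
        s = np.asarray(intensities, dtype=float)
        in_window = (wl >= lo_nm) & (wl <= hi_nm)
        if np.any(in_window & (s > min_intensity)):
            found.append(gas)
    return sorted(found)


def scan_windows(lines, start_nm, stop_nm, width_nm, step_nm,
                 min_intensity, min_gases):
    """Slide a fixed-width window and report where enough gases overlap."""
    hits = []
    n_steps = int(round((stop_nm - width_nm - start_nm) / step_nm)) + 1
    for k in range(n_steps):
        lo = start_nm + k * step_nm
        gases = gases_in_window(lines, lo, lo + width_nm, min_intensity)
        if len(gases) >= min_gases:
            hits.append((lo, lo + width_nm, gases))
    return hits


if __name__ == "__main__":
    # Placeholder line lists (NOT real HITRAN values), for illustration only.
    toy_lines = {
        "gas_A": ([1568.3, 1575.0], [2e-5, 3e-5]),
        "gas_B": ([1569.0], [5e-5]),
        "gas_C": ([1569.4, 1568.5], [1e-6, 4e-5]),
    }
    for lo, hi, gases in scan_windows(toy_lines, 1568.0, 1576.0,
                                      1.5, 0.5, 1e-5, 3):
        print(f"{lo:.1f}-{hi:.1f} nm: {gases}")
```

In practice the placeholder dictionary would be replaced by line positions and intensities exported from the database for each candidate gas, and the window width would be set by the tuning range of the pump laser.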

5 Conclusions

In this work, quantitative multi-component gas analysis is realized in the overlapping region of NIR photothermal spectra. This method can achieve high-precision prediction of multiple gas components using only a single pump laser. Using a custom-developed automatic gas mixing and collection system, we collected 460 sets of second-harmonic photothermal spectral data samples for gases CH₄, C₂H₆, and C₂H₄ at different concentration ratios. Based on the data, a PLSR model is then established for concentration retrieval. The model achieves an average R² of 0.99984 and an overall relative error of 0.319% across the concentration range of 10% to 80%. Even when the training set is reduced to 180 samples, or the analysis is restricted to 150 attention-focused data points, or the spectra are down-sampled to only 33 points, the proposed method still maintains high prediction accuracy, with a relative error on the order of 0.5%. By searching the absorption data of gases included in HITRAN, this method is expected to be applicable for predicting up to seven mixed gas components and their concentrations.

In addition to gas components and concentrations, more physical quantities (such as temperature and pressure) could also be included in this model, which would provide an efficient method and a simple system for multi-component gas analysis in applications such as industrial production, medical diagnosis, and exploration.

References

[1] Guo, X., Huang, H., Wang, H., Cai, C., Wang, Y., Wu, X., Wang, J., Wang, B., Zhu, B., Xiang, Y.: Near-infrared spectroscopy and machine learning for fast quality prediction of bottle gourd. Foods 14(14), 2503 (2025)

[2] Batra, D., Chen, M., Meis, J., Möhlenbruch, M.A., Klose, C., Ringleb, P., Shah, V., Bösel, J., Schönenberger, S.: Feasibility of non-invasive neuromonitoring using BIS and NIRS during endovascular treatment of acute ischemic stroke. Neurol. Res. Pract 7(1), 60 (2025)

[3] Peng, C., Wu, Z., Zhang, S., Lin, B., Nie, L., Tian, W., Zang, H.: Online monitoring of water quality in industrial wastewater treatment process based on near-infrared spectroscopy. Water Res 275, 123165 (2025)

[4] Blanco, M., Coello, J., Iturriaga, H., Maspoch, S., de la Pezuela, C.: Near-infrared spectroscopy in the pharmaceutical industry. Analyst 123(8), 135R–150R (1998)

[5] Yang, D., Li, W., Tian, H., Chen, Z., Ji, Y., Dong, H., Wang, Y.: High-sensitivity and in situ multi-component detection of gases based on multiple-reflection-cavity-enhanced Raman spectroscopy. Sensors (Basel) 24(17), 5825 (2024)

[6] Makhoukhi, N., Péré, E., Creff, R., Pouchan, C.: Determination of the composition of a mixture of gases by infrared analysis, chemometric methods. J. Mol. Struct. 744–747, 744–747 (2005)

[7] Yamamoto, Y., Oshita, M., Saito, S., Kan, T.: Near-infrared spectroscopic gas detection using a surface plasmon resonance photodetector with 20 nm resolution. ACS Appl. Nano Mater 4(12), 13405–13412 (2021)

[8] Lou, X., Xu, N., Tian, B., Xia, M., Ba, D., Dong, Y.: Multi-point multi-component gas sensing by frequency-modulated continuous-wave interferometry. Opt. Express 33(13), 27639–27650 (2025)

[9] Yin, X., Hu, Q., Yang, X., Xu, K., Liang, Y., Hou, J., Sampaolo, A., Patimisco, P., Spagnolo, V., Zhang, D., Liu, Z., Xu, H., Wu, H.: Ppb-level HF sensor in GeF₄ gas matrices with a 76 m TDLAS cell. Sens. Actuators B Chem 449, 139141 (2026)

[10] Yang, X., Li, B., Geng, X., Zhao, T., Yu, Q., Hou, J., Zhang, D., Liang, Y., Xu, K., Wu, H., Yin, X.: Ppb-level formaldehyde sensor utilizing a compact 3D-printed differential photoacoustic cell and a 320 nm UV laser. Photoacoustics 46, 100780 (2025)

[11] Kinjalk, K., Paciolla, F., Sun, B., Zifarelli, A., Menduni, G., Giglio, M., Wu, H., Dong, L., Ayache, D., Pinto, D., Vicet, A., Baranov, A., Patimisco, P., Sampaolo, A., Spagnolo, V.: Highly selective and sensitive detection of volatile organic compounds using long wavelength InAs-based quantum cascade lasers through quartz-enhanced photoacoustic spectroscopy. Appl. Phys. Rev 11(2), 021427 (2024)

[12] Yin, X., Zhu, C., Yang, X., Xu, K., Liang, Y., Zhang, D., Mao, W., Wu, H.: Trace photoacoustic spectroscopy gas sensors for CO_x detection. Measurement 262, 120059 (2026)

[13] Bao, H., Jin, W., Hong, Y., Ho, H.L., Gao, S., Wang, Y.: Phase-modulation-amplifying hollow-core fiber photothermal interferometry for ultrasensitive gas sensing. J. Lightwave Technol 40(1), 313–322 (2022)

[14] Kluczynski, P., Gustafsson, J., Lindberg, Å.M., Axner, O.: Wavelength modulation absorption spectrometry — an extensive scrutiny of the generation of signals. Spectrochim. Acta B At. Spectrosc 56(8), 1277–1354 (2001)

[15] Rieker, G.B., Jeffries, J.B., Hanson, R.K.: Calibration-free wavelength-modulation spectroscopy for measurements of gas temperature and concentration in harsh environments. Appl. Opt 48(29), 5546–5560 (2009)

[16] Xu, D., Cai, Q., Zhang, G., Ge, Q., Xu, L.: Dual-gas sensor employing wavelength-stabilized tunable diode laser absorption spectroscopy and H-infinity filtering algorithm. Appl. Spectrosc 79(8), 1266–1278 (2025)

[17] Tian, X., Cao, Y., Chen, J., Liu, K., Wang, G., Tan, T., Mei, J., Chen, W., Gao, X.: Dual-gas sensor of CH₄/C₂H₆ based on wavelength modulation spectroscopy coupled to a home-made compact dense-pattern multipass cell. Sensors (Basel) 19(4), 820 (2019)

[18] Chen, F., Jiang, S., Ho, H.L., Gao, S., Wang, Y., Jin, W.: Frequency-division-multiplexed multicomponent gas sensing with photothermal spectroscopy and a single NIR/MIR fiber-optic gas cell. Anal. Chem 94(39), 13473–13480 (2022)

[19] Gao, H., Yang, Q., Wang, Q., Zhang, Z., Lu, Y., Wang, L.: Dual-gas sensor for ultra-close overlapping spectra based on second harmonic peak shift. Sens. Actuators B Chem 427, 137159 (2025)

[20] Zhao, X., Sun, P., Zhang, Z., Wang, Q., Wu, B., Pang, T., Xia, H., Guo, Q., Sun, M.: Method for demodulating the overlapping absorption spectra of CO and CH₄. Opt. Express 30(24), 43464–43479 (2022)

[21] Zhou, Y., Jiang, M., Dou, W., Meng, D., Wang, C., Wang, J., Wang, X., Sun, L., Jiang, S., Chen, F., Jin, W.: Narrow-band multi-component gas analysis based on photothermal spectroscopy and partial least squares regression method. Sens. Actuators B Chem 377, 133029 (2023)

[22] Wang, D., Zhang, D., Zhang, H., Wang, Z., Wang, J., Xi, G.: Quantitative detection of multi-component chemical gas via MXene-based sensor array driven by triboelectric nanogenerators with CNN-GRU model. Sens. Actuators B Chem 417, 136101 (2024)

[23] Wang, L., Lv, H., Zhao, Y., Wang, C., Luo, H., Lin, H., Xie, J., Zhu, W., Zhong, Y., Liu, B., Yu, J., Zheng, H.: Sub-ppb level HCN photoacoustic sensor employing dual-tube resonator enhanced clamp-type tuning fork and U-net neural network noise filter. Photoacoustics 38, 100629 (2024)

[24] Tian, L., Xia, J., Kolomenskii, A.A., Schuessler, H.A., Zhu, F., Li, Y., He, J., Dong, Q., Zhang, S.: Gas phase multicomponent detection and analysis combining broadband dual-frequency comb absorption spectroscopy and deep learning. Commun. Eng 2(1), 54 (2023)

[25] Chowdhury, M.A.Z., Oehlschlaeger, M.A.: Deep learning for gas sensing via infrared spectroscopy. Sensors (Basel) 24(6), 1873 (2024)

[26] Swinehart, D.F.: The Beer-Lambert law. J. Chem. Educ 39(7), 333 (1962)

[27] Balabin, R.M., Lomakina, E.I.: Support vector machine regression (SVR/LS-SVM)-an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data. Analyst 136(8), 1703–1712 (2011)

[28] Jin, W., Cao, Y., Yang, F., Ho, H.L.: Ultra-sensitive all-fibre photothermal spectroscopy with large dynamic range. Nat. Commun 6(1), 6767 (2015)

[29] Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst 58(2), 109–130 (2001)

[30] Barnes, R.J., Dhanoa, M.S., Lister, S.J.: Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc 43(5), 772–777 (1989)

[31] Geladi, P., MacDougall, D., Martens, H.: Linearization and scatter-correction for near-infrared reflectance spectra of meat. Appl. Spectrosc 39(3), 491–500 (1985)

[32] Norris, K., Williams, P.: Optimization of mathematical treatments of raw near-infrared signal in the. Cereal Chem 61(2), 158–165 (1984)

[33] Savitzky, A., Golay, M.J.E.: Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem 36(8), 1627–1639 (1964)

[34] Schafer, R.W.: What is a Savitzky−Golay filter? IEEE Signal Process. Mag. 28(4), 111–117 (2011)

[35] Yang, F., Tan, Y., Jin, W., Lin, Y., Qi, Y., Ho, H.L.: Hollow-core fiber Fabry−Perot photothermal gas sensor. Opt. Lett 41(13), 3025–3028 (2016)

[36] Zhao, P., Zhao, Y., Bao, H., Ho, H.L., Jin, W., Fan, S., Gao, S., Wang, Y., Wang, P.: Mode-phase-difference photothermal spectroscopy for gas detection with an anti-resonant hollow-core optical fiber. Nat. Commun 11(1), 847 (2020)

[37] Xiao, L., Demokan, M.S., Jin, W., Wang, Y., Zhao, C.L.: Fusion splicing photonic crystal fibers and conventional single-mode fibers: microhole collapse effect. J. Lightwave Technol 25(11), 3563–3574 (2007)

[38] Gordon, I.E., Rothman, L.S., Hargreaves, R.J., Hashemi, R., Karlovets, E.V., Skinner, F.M., Conway, E.K., Hill, C., Kochanov, R.V., Tan, Y., Wcisło, P., Finenko, A.A., Nelson, K., Bernath, P.F., Birk, M., Boudon, V., Campargue, A., Chance, K.V., Coustenis, A., Drouin, B.J., Flaud, J.M., Gamache, R.R., Hodges, J.T., Jacquemart, D., Mlawer, E.J., Nikitin, A.V., Perevalov, V.I., Rotger, M., Tennyson, J., Toon, G.C., Tran, H., Tyuterev, V.G., Adkins, E.M., Baker, A., Barbe, A., Canè, E., Császár, A.G., Dudaryonok, A., Egorov, O., Fleisher, A.J., Fleurbaey, H., Foltynowicz, A., Furtenbacher, T., Harrison, J.J., Hartmann, J.M., Horneman, V.M., Huang, X., Karman, T., Karns, J., Kassi, S., Kleiner, I., Kofman, V., Kwabia–Tchana, F., Lavrentieva, N.N., Lee, T.J., Long, D.A., Lukashevskaya, A.A., Lyulin, O.M., Makhnev, V.Y., Matt, W., Massie, S.T., Melosso, M., Mikhailenko, S.N., Mondelain, D., Müller, H.S.P., Naumenko, O.V., Perrin, A., Polyansky, O.L., Raddaoui, E., Raston, P.L., Reed, Z.D., Rey, M., Richard, C., Tóbiás, R., Sadiek, I., Schwenke, D.W., Starikova, E., Sung, K., Tamassia, F., Tashkun, S.A., Vander Auwera, J., Vasilenko, I.A., Vigasin, A.A., Villanueva, G.L., Vispoel, B., Wagner, G., Yachmenev, A., Yurchenko, S.N.: The HITRAN2020 molecular spectroscopic database. J. Quant. Spectrosc. Radiat. Transf 277, 107949 (2022)

[39] Sharpe, S.W., Johnson, T.J., Sams, R.L., Chu, P.M., Rhoderick, G.C., Johnson, P.A.: Gas-phase databases for quantitative infrared spectroscopy. Appl. Spectrosc 58(12), 1452–1461 (2004)

[40] Schilt, S., Besson, J.P., Thévenaz, L.: Near-infrared laser photoacoustic detection of methane: the impact of molecular relaxation. Appl. Phys. B 82(2), 319–328 (2006)

RIGHTS & PERMISSIONS

The author(s)
