Large-scale non-destructive crown-level assessment of Ginkgo pigments via hyperspectral and machine learning techniques

Yang, Xin; Wei, Zihan; Li, Lehao; Yang, Xiaoming; Han, Jimei; Ming, Meiling; Wang, Guibin; Cao, Fuliang; Zhou, Kai; Fu, Fangfang

doi:10.1186/s13007-025-01439-9

Published: 16 October 2025

Plant Methods volume 21, Article number: 130 (2025)

Abstract

The photosynthetic pigments – chlorophyll a (Chl a), chlorophyll b (Chl b), and carotenoids (Car) – in juvenile ginkgo leaves are crucial for growth monitoring as they reflect physiological status and directly influence the biosynthesis of bioactive compounds such as flavonoids and terpene lactones. Traditional pigment measurement methods (acetone/ethanol extraction, SPAD, etc.) are inadequate for large-scale dynamic monitoring and high-throughput phenotyping analysis. To address this, this study developed a non-destructive prediction model for Chl a, Chl b, and Car contents in ginkgo seedlings using hyperspectral imaging combined with machine learning algorithms, which is applicable to seedlings with different genetic backgrounds and at various color development phases. A total of 3,460 seedlings from 590 families, sourced from ancient trees across 19 provinces in China, were analyzed using hyperspectral imaging and biochemical pigment quantification. A phased optimization strategy was implemented, including preprocessing method screening, model comparison, and feature wavelength selection. Among the four tested preprocessing methods (raw reflectance, normalization, first derivative, and second derivative), normalization significantly improved model accuracy. The Adaptive Boosting (AdaBoost) algorithm outperformed partial least squares regression (PLSR) and random forest (RF), achieving coefficients of determination (R²) above 0.83 and the ratio of performance to deviation (RPD) values exceeding 2.4 across all pigments. Compared with competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) demonstrated more effective spectral dimensionality reduction while preserving predictive power. This framework enables efficient, accurate, and scalable pigment phenotyping in Ginkgo biloba, offering technical support for large-scale germplasm screening and precision breeding.

Introduction

Ginkgo biloba L., recognized as one of the most ancient extant seed plants, is celebrated as a “living fossil” in the plant kingdom. This species not only occupies a unique evolutionary position but also produces valuable secondary metabolites, including flavonoids and terpene lactones, while demonstrating remarkable environmental adaptability. Consequently, precise monitoring of its physiological status is a critical prerequisite for the scientific conservation and sustainable utilization of this precious germplasm resource. In medicinal applications, the leaves of juvenile ginkgo trees (1–5 years old) serve as the primary source of bioactive compounds such as flavonoids and terpene lactones, whose biosynthesis and accumulation efficiency are closely correlated with the plant’s photosynthetic capacity and metabolic activity [1,2,3].

The photosynthetic pigments in leaves (chlorophyll a, chlorophyll b, and carotenoids) are fundamental components of the photosynthetic apparatus, with their contents and ratios directly reflecting plant physiological status. Chlorophyll (particularly Chl a) is a key indicator for assessing photosynthetic efficiency and biomass accumulation potential [4, 5], whereas carotenoids facilitate light harvesting and play vital roles in photoprotection mechanisms, mitigating photooxidative damage to photosynthetic systems [6, 7]. Given that seedling photosynthetic performance and redox homeostasis directly influence medicinal compound biosynthesis flux, accurate monitoring of these pigment dynamics is essential for both selecting superior ginkgo germplasms (characterized by high photosynthetic efficiency, strong stress resistance, and enhanced medicinal value) and optimizing cultivation practices to improve economic yield [8,9,10].

Conventional pigment quantification methods (e.g., Arnon’s spectrophotometry approach) rely on destructive extraction using organic solvents (acetone/ethanol) [11]. While providing accurate measurements, these methods present three major limitations: (1) their destructive nature precludes long-term monitoring of valuable ancient trees or endangered germplasms; (2) labor-intensive procedures, including grinding, > 24-hour dark extraction, and centrifugation, hinder high-throughput screening in large germplasm repositories; and (3) inherent time delays prevent real-time tracking of plant responses to environmental fluctuations, thereby constraining precise germplasm evaluation and breeding progress.

Hyperspectral non-destructive detection technology presents a promising solution to these challenges, especially when integrated with machine learning algorithms such as Partial Least Squares Regression (PLSR), Random Forest (RF), 1-D Convolutional Neural Networks (1-D CNN), and Adaptive Boosting (AdaBoost), among others [12,13,14,15]. However, existing techniques exhibit specific shortcomings: chlorophyll meters (e.g., SPAD-502) provide rapid measurements but yield only relative indices that are susceptible to interference from leaf structural parameters (e.g., thickness and water content) [16,17,18,19,20]; and while drone/satellite remote sensing covers extensive areas, its limited spatial resolution cannot accurately retrieve leaf-scale physiological parameters and remains vulnerable to atmospheric conditions [21]. In contrast, proximal hyperspectral techniques employing portable spectrometers or hyperspectral imaging devices can acquire continuous narrowband (1–3 nm resolution) reflectance spectra from leaves or canopies, enabling precise analysis of pigment-specific absorption features in both the blue-violet (430–450 nm) and red-edge (680–720 nm) regions [22].

Current hyperspectral techniques for the quantitative analysis of leaf pigments present significant methodological limitations across different application scenarios, as systematically compared in Table 1. In precision agriculture, empirical regression models based on vegetation indices have demonstrated satisfactory predictive performance for specific crop varieties, yet they struggle to account for interference from complex interactions of secondary metabolites and intraspecific genetic variations [24,25,26,27,28]. For forest health monitoring, aerial hyperspectral platforms [29,30,31,32,33] are constrained by spatial resolution limitations (> 1 m/pixel), resulting in substantial dilution of critical pigment-structure covariation characteristics at the leaf level. Notably, while a dedicated study on Ginkgo biloba [33] innovatively integrated UAV-LiDAR multi-source data fusion, it still suffers from insufficient sample representativeness (n < 100) and limited spectral-spatial resolution. Furthermore, Yue et al.’s hyperspectral modeling of seedlings using SPAD chlorophyll relative values not only lacked key physiological parameters for carotenoids but also showed clear applicability limitations in large-scale precision assessment of Ginkgo biloba germplasm resources [34].

These methodological shortcomings collectively highlight two critical unresolved issues in Ginkgo germplasm evaluation: (1) inadequate genetic coverage in existing sample sets, which fail to represent the diversity of natural Ginkgo populations, and (2) a disconnect between static monitoring approaches and dynamic physiological processes, as most studies rely on single-time-point sampling data, whereas Ginkgo leaf coloration is fundamentally a complex physiological process involving the dynamic balance of chlorophyll synthesis–degradation and carotenoid metabolism.

This study bridges these methodological gaps by pioneering the development of quantitative retrieval models for Chl a, Chl b, and Car in Ginkgo biloba leaves using portable hyperspectral imaging. Our comprehensive sampling framework incorporates three key innovations: (1) an unprecedented sample size (n = 3,460 seedlings) that ensures exceptional model robustness across genetic and environmental variations; (2) representative genetic diversity encompassing progeny from 590 ancient ginkgo trees across 19 Chinese provinces (25°–41°N latitude), covering the species’ major natural distribution range; and (3) temporally optimized sampling spanning the 20-day autumnal color transition period to capture the complete pigment remodeling dynamics during the critical chlorophyll degradation phase, thereby maximizing model generalizability across developmental phases.

To systematically address these gaps, our study was designed with a phased analytical framework (Fig. 1). First, to minimize non-biochemical spectral noise and enhance pigment-specific absorption features, we evaluated multiple spectral preprocessing methods, including raw reflectance, normalization, and first- and second-derivative transformations. Second, we compared the performance of representative machine learning algorithms (PLSR, RF, and AdaBoost) for their capacity to capture the complex, nonlinear relationships between spectral data and pigment concentrations across diverse genotypes. Finally, we employed feature wavelength selection algorithms (SPA and CARS) to refine the most informative spectral variables, aiming to improve model efficiency and portability for future high-throughput field applications. This structured approach was designed to ensure that our final models achieve not only high predictive accuracy but also essential robustness and generalizability across the genetic and temporal variability inherent in large-scale germplasm collections.

Table 1 Comparative review of core methodologies and limitations in hyperspectral pigment estimation

Materials and methods

Plant materials and sampling protocol

The experimental materials were collected from the Ginkgo biloba Ancient Tree Germplasm Resource Nursery at Nanjing Forestry University. The geographical origins of the germplasm sources and the nursery locations are mapped in Supplementary Figure S1. The germplasm was initially acquired in 2021, and seedlings were cultivated in a controlled greenhouse environment at the Xiashu Practice Forest Station (119.218°E, 32.120°N) in Jurong, Jiangsu Province, starting in March 2022.

To establish a hyperspectral prediction model capable of accommodating diverse leaf senescence phases, sampling was strategically conducted during the critical natural senescence period from late October to late November. The samples were divided into four phases according to sampling time: Phase-1 (October 31–November 5) represented the initial senescence transition; Phase-2 (November 6–11) captured mid-phase senescence development; Phase-3 (November 12–15) reflected advanced senescence; and Phase-4 (November 20) included supplemental early-yellowing specimens to ensure complete phenotypic representation. Hyperspectral imaging was performed on 3,460 seedlings, representing 590 distinct families derived from ancient trees across 19 provinces in China. Following the hyperspectral data acquisition, four leaf discs (14 mm diameter) were sampled from each seedling, one from each cardinal direction, to ensure representative tissue collection. The excised leaf discs were immediately placed in centrifuge tubes, flash-frozen on dry ice under dark conditions, and transported to the laboratory for subsequent biochemical analysis (Fig. 1b).

Hyperspectral imaging system

We used a portable hyperspectral imaging system for non-destructive spectral measurement of Ginkgo biloba seedlings (Fig. 1a). The core component was an Image-λ-V10E-HR hyperspectral imager (Dualix Spectral Imaging, China), covering 350–1000 nm with 176 spectral channels. The imaging system was integrated with an HSIA-RAK100-IMS motorized rotary stage that enabled comprehensive scanning of the plant samples. For illumination, two halogen lamps provided stable light output across 350–2500 nm, powered by an uninterruptible power supply to maintain consistent lighting conditions during data acquisition.

The system incorporated critical calibration components including a PTFE-coated white reference panel (Dualix Spectral Imaging) with 99% reflectance for spectral calibration, and black velvet background material exhibiting near-zero reflectance (< 0.1%) across the working spectral range to minimize interference. All operations were controlled via a Lenovo ThinkPad T14 computer running SpecVIEW software (v2.9.3.8) for image acquisition and spectral calibration, whereas MATLAB 2022b and Python 3.10 were employed for region-of-interest analysis and machine learning model development. The entire setup was enclosed in a light-isolated darkroom to eliminate ambient light interference and ensure measurement consistency.

During operation, the seedling samples were sequentially positioned at the center of the imaging platform. The hyperspectral camera, mounted on the rotary stage and precisely aligned using checkerboard calibration, acquired top-down images through a left-to-right scanning motion with approximately 15 s per scan. This brief acquisition time ensured negligible thermal effects from the illumination system on the plant samples. The system output consisted of three-dimensional hyperspectral reflectance cubes containing 176 spectral bands across the 350–1000 nm range, with each band corresponding to a specific wavelength interval. Automated data collection and preliminary processing were performed through the integrated SpecVIEW control software.

The raw hyperspectral images were calibrated using white and dark references to remove system noise. The white reference was acquired from a standard calibration panel, whereas the dark reference was obtained with the lens covered and lights off. The reflectance (R) was calculated as:

$${\text{R}}=\frac{{{R_{raw}} - Doc}}{{Woc - Doc}}$$

(1)

where R_raw is the raw sample value, Doc is the dark reference, and Woc is the white reference. This correction eliminates sensor noise and ensures accurate reflectance measurements.
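As a minimal illustration of Eq. (1), the sketch below applies the white/dark correction element-wise to a small synthetic cube. The function name `calibrate_reflectance`, the `eps` guard against a zero denominator, and the toy digital numbers are assumptions made for illustration; they are not taken from the SpecVIEW workflow described above.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Apply Eq. (1): R = (R_raw - Doc) / (Woc - Doc), element-wise."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Assumption: mark pixels/bands with a zero denominator as NaN
    # instead of dividing by zero (not specified in the text).
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

# Toy cube: 2 x 2 pixels, 3 bands (hypothetical digital numbers).
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 4100.0)
raw = np.full((2, 2, 3), 2100.0)

reflectance = calibrate_reflectance(raw, white, dark)
# Every value equals (2100 - 100) / (4100 - 100) = 0.5
print(reflectance[0, 0])
```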

Data preprocessing

To extract spectral reflectance information from ginkgo seedlings, we processed the white/dark-corrected hyperspectral images using MATLAB R2022b. The procedure involved: (1) selecting a characteristic wavelength band (747 nm) for single-band image processing, (2) applying Otsu’s threshold method combined with morphological opening operations to remove background interference and eliminate small-area noise, and (3) generating a target mask based on the segmentation results. This mask was then used to extract the leaf regions of interest (ROIs). For each ROI, we calculated the mean reflectance across all pixels at each wavelength, generating a one-dimensional average reflectance spectrum vector representing each sample (Fig. 2).
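The segmentation and averaging steps can be sketched in a few lines. The authors performed this step in MATLAB; the Python version below is only an illustrative stand-in that uses a plain NumPy implementation of Otsu's threshold and omits the morphological opening. The names `otsu_threshold` and `mean_roi_spectrum`, the toy cube, and the band index standing in for the 747 nm band are all hypothetical.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist, edges = np.histogram(image.ravel(), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)      # pixel count at or below each bin
    w1 = w0[-1] - w0                        # pixel count above each bin
    cum_sum = np.cumsum(hist * centers)
    mu0 = cum_sum / np.maximum(w0, 1)       # mean of the lower class
    mu1 = (cum_sum[-1] - cum_sum) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]

def mean_roi_spectrum(cube, band_index):
    """Mask the plant on one band, then average all masked pixels per band."""
    band = cube[:, :, band_index]
    mask = band > otsu_threshold(band)
    return cube[mask].mean(axis=0), mask

# Toy cube: dark background with an 8 x 8 "leaf" patch, 5 bands.
leaf = np.array([0.05, 0.12, 0.06, 0.55, 0.50])
cube = np.full((20, 20, 5), 0.02)
cube[6:14, 6:14, :] = leaf

# band_index=3 stands in for the high-contrast 747 nm band (assumption).
spectrum, mask = mean_roi_spectrum(cube, band_index=3)
print(int(mask.sum()), spectrum)
```

A near-infrared band is a natural choice for the mask because healthy leaf tissue reflects strongly there while the black background does not, which gives the histogram two well-separated modes.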

To mitigate interference from instrument noise, illumination variations, and scattering effects in the hyperspectral data while evaluating different preprocessing approaches, we implemented the following spectral preprocessing methods: (1) Raw reflectance spectra (untreated average reflectance data), (2) normalization (Min–Max scaling to [-1, 1] range for dimension unification), (3) first derivative (emphasizing spectral slope features while suppressing baseline drift), and (4) second derivative (enhancing spectral curvature features and precisely identifying absorption peaks/valleys). This comprehensive preprocessing strategy significantly improved subsequent modeling data quality.
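The three transformations can be sketched as follows. The text does not state whether Min–Max scaling was applied per spectrum or per wavelength, nor how the derivatives were computed; this sketch assumes per-spectrum scaling and simple finite differences via `numpy.gradient`. The function names and toy spectra are hypothetical.

```python
import numpy as np

def minmax_scale(spectra, low=-1.0, high=1.0):
    """Linearly rescale each spectrum (row) to the range [low, high]."""
    spectra = np.asarray(spectra, dtype=float)
    mn = spectra.min(axis=1, keepdims=True)
    mx = spectra.max(axis=1, keepdims=True)
    span = np.where(mx > mn, mx - mn, 1.0)  # avoid 0/0 for flat spectra
    return low + (spectra - mn) * (high - low) / span

def spectral_derivative(spectra, wavelengths, order=1):
    """Finite-difference derivative along the wavelength axis."""
    out = np.asarray(spectra, dtype=float)
    for _ in range(order):
        out = np.gradient(out, wavelengths, axis=1)
    return out

wavelengths = np.linspace(400.0, 1000.0, 7)
spectra = np.vstack([0.001 * wavelengths,            # linear ramp
                     0.2 + 0.0005 * wavelengths])    # shifted, flatter ramp

normalized = minmax_scale(spectra)
first = spectral_derivative(spectra, wavelengths, order=1)
second = spectral_derivative(spectra, wavelengths, order=2)
print(normalized[0, [0, -1]], first[0, 0], second[0, 0])
```

In this toy case both ramps collapse onto the same normalized curve, which illustrates how scaling removes reflectance-level offsets while keeping spectral shape.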

Pigment extraction and quantification

As shown in Fig. 1b, the leaf discs were subjected to ethanol extraction using immersion in 9 mL of 96% ethanol and maintained in complete darkness for > 24 h until complete chlorophyll bleaching was achieved. Absorbance measurements were conducted at 665 nm (A₆₆₅), 649 nm (A₆₄₉), and 470 nm (A₄₇₀) using a SPECORD 200PLUS spectrophotometer with ethanol as blank [36].

Pigment concentrations (µg·mL⁻¹ extract) were calculated as:

$${C_a}=13.95{A_{665}} - 6.88{A_{649}}$$

(2)

$${C_b}=24.96{A_{649}} - 7.32{A_{665}}$$

(3)

$${C_{x+c}}=(1000{A_{470}} - 2.05{C_a} - 114.8{C_b})/245$$

(4)

where $C_a$ denotes the concentration of chlorophyll a, $C_b$ represents the concentration of chlorophyll b, and $C_{x+c}$ corresponds to the concentration of carotenoids.

The pigment content per unit leaf area (µg·cm⁻²) was determined as follows:

$$Cont=\left( {C \times V} \right)/\left( {A \times N} \right)$$

(5)

where C represents the pigment concentration, V is the extraction volume (9 mL), A is the disc area [π×(0.7 cm)²], and N is the number of discs per sample (N = 4).
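Equations (2)–(5) can be chained in a short script. The sketch below uses the coefficients, extraction volume, disc diameter, and disc count given above; the function names and the three absorbance readings are hypothetical values chosen only to show the arithmetic.

```python
import math

def pigment_concentrations(a665, a649, a470):
    """Eqs. (2)-(4): concentrations in ug/mL of extract."""
    c_a = 13.95 * a665 - 6.88 * a649
    c_b = 24.96 * a649 - 7.32 * a665
    c_xc = (1000 * a470 - 2.05 * c_a - 114.8 * c_b) / 245
    return c_a, c_b, c_xc

def content_per_area(conc, volume_ml=9.0, disc_diameter_cm=1.4, n_discs=4):
    """Eq. (5): pigment content in ug/cm^2 of leaf area."""
    disc_area = math.pi * (disc_diameter_cm / 2) ** 2
    return conc * volume_ml / (disc_area * n_discs)

# Hypothetical absorbance readings for one sample.
c_a, c_b, c_xc = pigment_concentrations(a665=0.60, a649=0.25, a470=0.50)
chl_a_content = content_per_area(c_a)

print(f"Chl a = {c_a:.3f}, Chl b = {c_b:.3f}, Car = {c_xc:.3f} ug/mL")
print(f"Chl a content = {chl_a_content:.2f} ug/cm^2")
```

With four 14 mm discs the sampled area is about 6.16 cm², so each µg·mL⁻¹ in the 9 mL extract corresponds to roughly 1.46 µg·cm⁻².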

The obtained pigment contents were analyzed across the four sampling phases. Pigment concentrations were calculated in Excel; all subsequent statistical analyses (Kruskal–Wallis H tests and post hoc Dunn’s tests with Bonferroni correction) and data visualization were performed in Python 3.10 to assess phase-dependent differences and identify specific intergroup variations.

Dataset partition

The Kennard-Stone algorithm was employed to partition the 3,460 seedling samples into calibration (n = 2,075) and prediction (n = 1,385) sets based on distances in spectral feature space [37]. This approach maximizes spectral variability representation in the calibration set while keeping it completely separate from the prediction set. The algorithm iteratively selects the sample whose minimum Euclidean distance to the already selected samples is largest, ensuring broad coverage of spectral characteristics. This partitioning supports model robustness and enables reliable evaluation of generalization performance on the independent prediction set [38, 39].
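A compact version of the Kennard-Stone selection is sketched below: it starts from the two most distant samples and then repeatedly adds the sample whose nearest selected neighbour is farthest away. This is an illustrative implementation, not the authors' code; the function name and the four-sample toy data are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (calibration_indices, prediction_indices) by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances from the Gram matrix.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)

    # Seed with the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]

    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy "spectra" with one feature each.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
cal_idx, pred_idx = kennard_stone(X, n_select=3)
print(cal_idx, pred_idx)
```

For the split reported here, `n_select` would be 2,075 out of 3,460 samples, which is close to a 60:40 ratio.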

Modeling methods and evaluation metrics

This study adopted a stepwise optimization approach to develop hyperspectral prediction models for pigment content in ginkgo seedling leaves (Fig. 1c), systematically implementing three key optimization phases: (1) preprocessing method selection, (2) modeling strategy comparison, and (3) feature wavelength identification. Throughout the process, a unified model evaluation framework was consistently applied to ensure comparability and consistency across different modeling strategies.

The calibration set was evaluated using two metrics: the cross-validated coefficient of determination ($R_{CV}^{2}$), which characterizes goodness-of-fit (values closer to 1 indicate a better fit), and the standard error of cross-validation ($SECV$), which quantifies prediction deviation (smaller values indicate lower deviation between predicted and actual values). For the prediction set, an independent validation set was used to assess model generalization capability. The core metrics were the coefficient of determination ($R_{p}^{2}$), which quantifies the model’s explanatory power for variability in unknown samples (higher values indicate better performance); the root mean square error of prediction ($RMSEP$), which reflects the average deviation between predicted and true values (smaller values indicate higher accuracy); and the ratio of performance to deviation ($RPD$), defined as the ratio of the prediction set standard deviation to $RMSEP$ (models with $RPD$ > 2 are generally considered to have strong predictive ability).
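The prediction-set metrics can be written out directly from their definitions. In the sketch below the function names and the five toy values are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption, since the text does not specify which estimator was used for RPD. SECV is not covered here because its exact cross-validation scheme is not described.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error (RMSEP when applied to the prediction set)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(y_true, y_pred), rmse(y_true, y_pred), rpd(y_true, y_pred))
```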

Partial least squares regression (PLSR) was selected as the benchmark model to evaluate and compare the effects of different preprocessing methods. PLSR establishes linear regression relationships between spectra and target parameters by extracting latent variables (LVs), demonstrating effective handling of high-dimensional collinear data [39]. The specific procedure involved constructing PLSR models for each of the four preprocessed datasets, followed by comprehensive performance comparison using calibration set metrics ($R_{CV}^{2}$, $SECV$) and prediction set metrics ($R_{p}^{2}$, $RMSEP$, $RPD$). The optimal preprocessing method was selected based on superior performance across all evaluation indicators in both calibration and prediction sets, thereby establishing a high-quality data foundation for subsequent modeling.

After applying the optimal preprocessing method, three representative models, namely PLSR, random forest (RF), and adaptive boosting (AdaBoost), were systematically compared to evaluate different modeling strategies comprehensively. PLSR, as a classical multivariate statistical regression method, excels in handling high-dimensional, multicollinear spectral data. Its core advantage lies in extracting the most explanatory latent variables from the original spectral variables and target parameters to establish linear regression models, achieving dimensionality reduction while maximizing covariance information, which makes it a widely used benchmark algorithm in this field.

RF, as a decision tree algorithm based on bagging ensemble strategy, enhances prediction robustness and generalization capability by constructing numerous uncorrelated decision trees and employing voting (classification) or averaging (regression) [40]. It inherently handles nonlinear relationships, automatically evaluates feature importance, and demonstrates strong resistance to overfitting and noisy data.

AdaBoost, based on boosting ensemble strategy, operates by iteratively training a series of “weak learners” (e.g., simple decision trees) and adjusting sample weights and model weights according to previous model errors, enabling subsequent models to focus more on difficult-to-predict samples [41]. This mechanism allows gradual improvement in the overall model prediction accuracy, particularly excelling in learning complex patterns.

The selection of PLSR, RF and AdaBoost for comparison aimed to cover different modeling paradigms from linear (PLSR) to nonlinear (RF, AdaBoost) approaches, while utilizing ensemble learning (RF, AdaBoost) to enhance model robustness, ultimately identifying the most suitable predictive model for the data characteristics and modeling objectives.

After determining the optimal preprocessing method and predictive model, the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) were employed for characteristic wavelength selection to reduce data dimensionality (thereby enhancing model interpretability and practicality) and potentially improve performance. SPA selects a set of wavelengths with rich information content and minimal mutual collinearity from the full spectral bands [42]. Its core principle is to iteratively add the wavelength whose vector has the largest norm after projection onto the subspace orthogonal to the already selected wavelength vectors, thereby minimizing redundant information between wavelengths and obtaining a highly representative, low-correlation feature subset.
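The projection step at the heart of SPA can be sketched as below. This covers only the construction of one wavelength chain from a given starting band; a full SPA run typically also compares chains from different starting bands and chain lengths using a validation error, which is omitted here. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def spa_chain(X, n_select, start=0):
    """Build one SPA chain of column (wavelength) indices.

    X: array of shape (n_samples, n_wavelengths), usually mean-centered.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Remove from every column its component along the last selected column.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0          # never re-select a wavelength
        selected.append(int(np.argmax(norms)))
    return selected

# Columns: 0 and 1 are collinear; 3 carries the most independent information.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 0.5, 3.0],
              [0.0, 0.0, 0.0, 0.0]])
chain = spa_chain(X, n_select=2, start=0)
print(chain)
```

Because column 1 is an exact multiple of column 0, its projected norm drops to zero and it is never chosen, which is the collinearity-avoidance behaviour described above.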

The CARS algorithm simulates the “survival of the fittest” principle, combining Monte Carlo sampling with an exponentially decreasing function (EDF) for wavelength selection [43]. Each sampling iteration involves three steps. First, the proportion of wavelengths to eliminate in the current round is determined using the EDF. Second, retention weights are calculated dynamically for each wavelength based on the absolute value of its regression coefficient in the PLSR model (or the selected base model), with higher absolute values corresponding to greater weights. Finally, weighted sampling is performed to eliminate wavelengths. The model performance of the wavelength subset generated in each iteration is assessed by cross-validation, and the subset with the best cross-validation performance is ultimately selected as the set of characteristic wavelengths.
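Two ingredients of CARS, the EDF schedule and the coefficient-weighted sampling, can be illustrated in isolation. The sketch below assumes the commonly used EDF form r_i = a·exp(−k·i), with a and k chosen so that all wavelengths are kept in the first run and two in the last; the model fitting and cross-validation steps are omitted. The function names, the number of runs, and the toy coefficients are hypothetical.

```python
import numpy as np

def edf_keep_counts(n_wavelengths, n_runs):
    """Number of wavelengths retained in each run under the EDF schedule."""
    a = (n_wavelengths / 2.0) ** (1.0 / (n_runs - 1))
    k = np.log(n_wavelengths / 2.0) / (n_runs - 1)
    runs = np.arange(1, n_runs + 1)
    ratios = a * np.exp(-k * runs)          # 1.0 at run 1, 2/p at the last run
    return np.rint(n_wavelengths * ratios).astype(int)

def reweighted_sample(coefs, n_draws, rng):
    """Sample wavelengths with probability proportional to |coefficient|."""
    weights = np.abs(coefs) / np.abs(coefs).sum()
    drawn = rng.choice(len(coefs), size=n_draws, replace=True, p=weights)
    return np.unique(drawn)                 # surviving wavelength indices

counts = edf_keep_counts(n_wavelengths=176, n_runs=50)
print(counts[:5], counts[-5:])

rng = np.random.default_rng(0)
toy_coefs = np.array([0.0, 3.0, 0.0, 1.0])  # hypothetical regression coefficients
kept = reweighted_sample(toy_coefs, n_draws=4, rng=rng)
print(kept)
```

The schedule removes wavelengths quickly in early runs and slowly in later ones, so most of the fine selection happens among the strongest candidates.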

Results

Temporal changes in chlorophyll and carotenoid contents

A comprehensive dataset comprising 3,460 seedlings was analyzed for photosynthetic pigment content dynamics. The samples were systematically categorized into four distinct temporal phases to track progressive senescence patterns. In particular, Phase-1 (n = 566) represented the initial senescence transition; Phase-2 (n = 1,276) captured mid-phase senescence development; Phase-3 (n = 1,600) reflected advanced senescence; and Phase-4 (n = 18) included supplemental early-yellowing specimens to ensure complete phenotypic representation. As visually documented in Fig. 3, this sampling window captures a continuous senescence gradient spanning from dark green to yellowing leaves, with standardized imaging conditions ensuring accurate color representations.

As shown in Fig. 4, the stacked bar chart reveals significant temporally declining trends in Chl a, Chl b, and Car contents. Phase-1 exhibited peak levels, with the most drastic decline toward Phase-2; values stabilized between Phase-2 and Phase-3; and Phase-4 showed minima due to high yellowing-leaf prevalence.

The Kruskal–Wallis H test confirmed highly significant overall differences across senescence phases (p < 0.001), with effect sizes (ε² = 0.094–0.098) indicating moderate strength. Bonferroni-corrected Dunn’s post hoc tests delineated specific differentiation patterns: Phase-1 differed significantly from Phase-2/3/4 (p < 10⁻¹⁰); all pigments differed between Phase-2 and Phase-3 (p < 0.05), with Chl a most distinct (p = 1.7 × 10⁻⁴); Phase-2 and Phase-3 significantly differed from Phase-4 in all pigments (p < 0.05) (Fig. 5).
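The relationship between the H statistic and the ε² effect size can be made concrete with a small example. The sketch below computes a tie-corrected H with NumPy and assumes the common definition ε² = H / (n − 1); the text does not state which formula was used, and p-values and Dunn's post hoc tests are omitted. The function name and the three toy groups are hypothetical.

```python
import numpy as np

def kruskal_wallis_h(groups):
    """Return (tie-corrected H statistic, total sample size n)."""
    values = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = len(values)

    # Ordinal ranks, then average the ranks within each set of tied values.
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    _, inverse, counts = np.unique(values, return_inverse=True,
                                   return_counts=True)
    ranks = (np.bincount(inverse, weights=ranks) / counts)[inverse]

    total, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        total += r.sum() ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    tie_correction = 1.0 - (counts ** 3 - counts).sum() / (n ** 3 - n)
    return h / tie_correction, n

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # perfectly separated toy phases
h_stat, n_total = kruskal_wallis_h(groups)
epsilon_sq = h_stat / (n_total - 1)
print(h_stat, epsilon_sq)
```

Under this definition, an ε² near 0.09 to 0.10 with n = 3,460 would correspond to an H statistic on the order of a few hundred.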

Spectral characteristics and preprocessing effects

Raw spectra contain valuable information but require preprocessing to mitigate noise and baseline drift.

Regarding raw spectral characteristics (Fig. 6a), the 375–1000 nm reflectance curve shows key features: a 400 nm absorption valley (short-wave absorption), a 550 nm chlorophyll reflection peak, a 625 nm inflection valley, and a dominant 725–750 nm near-infrared peak. Subtle fluctuations at 800–900 nm and 950 nm reflect intrinsic responses with potential noise. This provides fundamental but noisy baseline data.

In terms of normalization effects (Fig. 6b), compression of reflectance magnitude (≈ -0.075 to 0.075) reduces absolute differences while preserving shape. The 375–450 nm valley and 600–650 nm relative valley correspond to original features at 400 nm and 625 nm, while 500–550 nm and 700–750 nm peaks retain main peak morphology. This enables cross-sample shape comparison by eliminating reflectance-level variations.

In terms of first-derivative effects (Fig. 6c), this transformation highlights rate-of-change features: negative peaks at 400/600 nm mark valley inflection points; positive peaks at 500 nm and 675–700 nm indicate maximal ascent slopes. Critical wavelength dynamics (e.g., chlorophyll-related red-edge shifts) are amplified.

In terms of second-derivative effects (Fig. 6d), curvature features (≈ -0.015 to 0.015) are resolved: negative peaks at 375–425/525/700 nm show maximal bending; positive features at 475/675 nm and 850–950 nm capture subtle curvature. Fine structures such as carotenoid-chlorophyll competition in 450–550 nm are enhanced despite noise sensitivity.

With regard to the integrated impact, combined preprocessing mitigates sample variations while enhancing discriminative features: normalization standardizes shapes, first-derivative extracts trends, second-derivative refines curvature. The transformed spectra improve feature interpretability for quantitative analysis.

Optimal preprocessing results

This study evaluated the impact of four spectral preprocessing methods, namely raw spectra, normalization, first-order derivative (1st derivative), and second-order derivative (2nd derivative), on partial least squares regression (PLSR) models predicting Chl a, Chl b, and Car contents.

Derivative preprocessing yielded only marginal improvements over raw spectra, with most metrics remaining comparable or slightly degraded across pigments. In contrast, the normalization strategy demonstrated unequivocal superiority, substantially outperforming all other methods in every evaluation dimension (Fig. 7). Comprehensive metrics (Table 2) confirmed its robust advantages: normalization significantly enhanced prediction accuracy (systematically higher $R_{p}^{2}$), reduced modeling errors (lower $RMSEP$/$SECV$), and improved model stability (higher $RPD$) for all pigments. The most pronounced gains occurred in the carotenoid prediction, although substantial improvements consistently manifested across all biochemical parameters.

Table 2 Predictive performance of PLSR models with different spectral preprocessing methods

Model comparison and optimal model determination

Using hyperspectral data preprocessed via normalization, this study systematically assessed the predictive capabilities of PLSR, RF, and AdaBoost models for Chl a, Chl b, and Car contents (Table 3). All pigment datasets exhibited consistent model performance ranking: AdaBoost > RF > PLSR. Compared with PLSR, AdaBoost demonstrated comprehensive superiority, achieving a mean 10.9% improvement in the prediction set R² and a 22.7% increase in RPD, with carotenoid prediction reaching an RPD of 2.53. Compared with RF, AdaBoost delivered respective mean enhancements of 2.4% in prediction R² and 5.8% in RPD. Critical error metrics confirmed these advantages, including a 20% reduction in prediction error (RMSEP) for Chl a.

Table 3 Comparison of machine learning model performance for estimating pigment contents

As demonstrated in Fig. 8, AdaBoost’s measured versus predicted values cluster tightly along the 1:1 line for both the calibration and prediction sets, confirming its exceptional fitting and generalization capabilities. Across all pigments, the prediction set exhibits superior aggregation around the ideal line compared to the calibration set, particularly within typical concentration ranges. This consistency is occasionally interrupted by minor dispersion at high-concentration extremes. Notably, the prediction set fitting slopes approach the ideal value of 1.0, indicating reduced systematic bias.

These distribution patterns corroborate the quantitative metrics. Although prediction set R² values appear suppressed because of range concentration, actual prediction accuracy is higher, as evidenced by consistently lower RMSEP than calibration SECV and visually reduced dispersion. AdaBoost’s dynamic weighting mechanism effectively captures nonlinear relationships between spectral features and pigment concentrations, enabling significant accuracy improvements (RPD > 2.4) in agriculturally critical monitoring ranges. The superiority of AdaBoost, validated through distribution compactness, error reduction (21% average RMSEP decrease versus PLSR), and stability (RPD threshold compliance), provides a robust technical foundation for crop physiological monitoring.

Results of feature wavelength selection

The feature wavelength selection results using the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) are presented in Supplementary Figures S2 and S3, respectively. The results of the quantitative analysis (Table 4) demonstrate the systematic superiority of SPA. For chlorophyll a estimation, the 45 SPA-selected wavelengths achieved a prediction R² of 0.8207, exceeding CARS by 0.94 percentage points, while improving model robustness with a 2.12% higher RPD value. The chlorophyll b results further highlight the advantages of SPA, with 44 wavelengths yielding a prediction R² of 0.8117 and an SECV of 0.0008, representing a 1.08 percentage point accuracy improvement and an 11.1% error reduction compared with the sparse 9-wavelength CARS model. The most significant improvement occurred in carotenoid prediction, where SPA’s 57 wavelengths outperformed CARS (11 wavelengths) by 3.29 percentage points in R² and 7.74% in RPD while maintaining an equivalent RMSEP. The combined SPA–CARS approach (Supplementary Table S4) further reduced the number of selected wavelengths but at the cost of decreased predictive accuracy, suggesting that overly aggressive dimensionality reduction may eliminate key spectral information and compromise model performance.

Table 4 Comparative modeling metrics across calibration and prediction sets: CARS vs. SPA for pigment quantification

Wavelength distribution patterns (Fig. 9) reveal the underlying mechanisms. SPA achieves comprehensive spectral coverage: it preserves chlorophyll a’s red-edge sensitive region (680 nm) and near-infrared senescence-indicating peak (747 nm), precisely targets chlorophyll b’s primary absorption features (450–500 nm) and secondary responses (600–700 nm), and captures the carotenoid primary (470 nm) and secondary (490 nm) peaks. In contrast, CARS concentrates on local high-variance regions, which fragments the spectral fingerprints. For chlorophyll b, CARS retains only 9 wavelengths, losing critical 600–650 nm absorption information, and carotenoid characterization is compromised by the omission of the 490 nm secondary peak.
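The broad coverage attributed to SPA follows from how the algorithm chooses bands: each new band is the one least collinear with those already selected. The sketch below covers only this projection phase, under simplifying assumptions. The function name, the fixed starting band, and the toy data are hypothetical, and the full algorithm additionally compares candidate subsets by validation error, which is omitted here.

```python
import numpy as np

def spa_projection_phase(X, n_select, start=0):
    """Greedy projection phase of SPA on a (samples x bands) matrix.

    Each step removes from every column its component along the most
    recently selected column, then picks the column with the largest
    remaining norm. n_select should not exceed the rank of X.
    """
    Xp = np.array(X, dtype=float)            # working copy; columns are bands
    selected = [start]
    for _ in range(n_select - 1):
        ref = Xp[:, selected[-1]].copy()
        coeffs = (ref @ Xp) / (ref @ ref)    # projection coefficient per band
        Xp = Xp - np.outer(ref, coeffs)      # project onto the orthogonal complement
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0               # never re-select a chosen band
        selected.append(int(np.argmax(norms)))
    return selected

# Toy data: band 1 is an exact multiple of band 0, bands 2 and 3 are independent.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 50))
X = np.column_stack([a, 3.0 * a, b, c])
picked = spa_projection_phase(X, n_select=3, start=0)
print(sorted(picked))
```

In this toy case the redundant band is skipped because its projected norm collapses to zero once band 0 has been selected.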

Model simplification validation (Fig. 10) confirms that, despite an average reduction of 1.2 to 2.5 percentage points in prediction R² versus full-spectrum models, wavelength reduction (ranging from 87 to 94%) significantly enhances practicality. SPA maintains high accuracy (R² of 0.8207 for chlorophyll a) when only 45 wavelengths are used (10.4% of the full spectrum), with fitting curves approaching ideal responses, whereas CARS results in greater prediction bias despite using fewer wavelengths. Notably, SPA achieves an RPD value of 2.4583 for carotenoid prediction, indicating that feature selection enhances robustness by eliminating non-informative spectral variables. This approach preserves core predictive performance while establishing theoretical foundations for portable vegetation monitoring devices.

Spatiotemporal inversion of pigment content

The established models were applied to monitor the dynamic changes in pigment content during the growth cycle of ginkgo seedlings. As shown in Fig. 11, the inversion results used heatmap color coding (blue: low concentration; yellow: medium concentration; red: high concentration) to reveal pigment distribution patterns across the four growth phases.

In Phase-1, the leaves presented predominantly red (high concentration) areas. This transitioned to yellow-dominated patterns with reduced red coverage in Phase-2. By Phase-3, yellow areas significantly diminished, whereas midrib regions retained stronger signals. Finally, in Phase-4, pigment distributions were barely distinguishable from those in the background. This progressive color variation comprehensively documents the physiological process from vigorous growth to senescence in Ginkgo seedlings.

The inversion results accurately captured the spatial characteristics of leaf senescence—color fading initially appeared at the leaf margins (Phase-3) and gradually progressed toward the petiole (Phase-4). This pattern aligns perfectly with the classical spatial progression of plant organ senescence (from distal to basal regions). The hyperspectral imaging technology, through its non-destructive 2D visualization capability, provides a new dimension for monitoring Ginkgo physiological status.
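Pixel-wise inversion of this kind amounts to applying a spectrum-level model to every pixel of the hyperspectral cube and reshaping the predictions into an image. The following is a minimal sketch under stated assumptions: the cube layout (height, width, bands), the plant mask, and the linear stand-in model are all hypothetical, not the study's trained model.

```python
import numpy as np

def invert_pigment_map(cube, predict, mask=None):
    """Apply a per-spectrum model to every pixel of an (H, W, B) reflectance cube."""
    h, w, b = cube.shape
    spectra = cube.reshape(-1, b)                      # one row per pixel
    values = np.asarray(predict(spectra), dtype=float).reshape(h, w)
    if mask is not None:
        values = np.where(mask, values, np.nan)        # blank out background pixels
    return values

# Hypothetical stand-in for a trained regressor: a fixed linear band combination.
rng = np.random.default_rng(0)
weights = rng.normal(size=5)

def linear_model(spectra):
    return spectra @ weights

cube = rng.random((4, 6, 5))                           # toy 4 x 6 image, 5 bands
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True                                  # toy plant region
pigment_map = invert_pigment_map(cube, linear_model, mask)
print(pigment_map.shape)
```

The resulting 2D array can then be rendered with any colour map, with background pixels left as NaN so that they are not drawn.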

Discussion

This study implemented a high-resolution ground-based hyperspectral imaging system (spectral resolution: 2.8 nm; spatial resolution: 1936 × 1456 pixels) to enable non-destructive monitoring of ginkgo plants. Compared with conventional ASD single-point measurements [44] and UAV remote sensing approaches [45], this proximal sensing technology demonstrates superior performance in both data quality and leaf feature extraction. Building upon Li’s foundational work [32], we developed a specialized imaging protocol coupled with an adaptive spectral processing framework that successfully resolved the longstanding compatibility challenges between woody plants and high-throughput phenotyping platforms. This technological innovation not only advances ginkgo research but also establishes a methodological paradigm for hyperspectral analysis of other woody species. Furthermore, our whole-plant spectral measurements provide a more holistic characterization of physiological states, overcoming the spatial limitations of traditional point-based sampling.

Through an extensive sampling strategy encompassing 3,460 samples across multiple genotypes and senescence phases, our study significantly advances the field of senescence monitoring. Unlike previous investigations limited to healthy leaves [46], our model achieves robust predictive capability (RPD > 2.3) throughout all discoloration phases while effectively addressing NDVI saturation. The results reveal a distinct degradation pattern: (1) rapid initial decline (Phase-1–2), (2) moderated mid-term attenuation (Phase-2–3), and (3) accelerated late-phase yellowing (Phase-4). This nonlinear progression provides crucial insights into phase-dependent senescence dynamics that were previously overlooked in hyperspectral studies [32, 47, 48], including temporally resolved analyses [17]. Our systematic characterization of these degradation phases establishes a more reliable monitoring methodology, which is particularly valuable for economically important species such as ginkgo, and represents a significant improvement over existing phenotyping approaches.

The screening of spectral data preprocessing methods demonstrates that normalization is the optimal preprocessing strategy for spectral data. By standardizing data scales and eliminating dimensional interference, normalization effectively aligns spectral distributions with the requirements of PLSR, whereas derivative methods failed to yield meaningful model improvements. The consistent performance enhancements underscore normalization’s critical role in optimizing spectral models for plant pigment quantification. This conclusion is supported by prior studies, which acknowledge the importance of preprocessing but do not explicitly validate normalization’s superiority. For instance, Li et al. [32, 44] employed hyperspectral data for plant trait quantification, relying implicitly on effective preprocessing—yet our results uniquely identify normalization as the optimal approach, particularly for PLSR. In contrast, Daughtry [23] employed derivative methods without rigorously assessing their limitations; our findings not only confirm that derivatives do not enhance model performance but also reinforce normalization’s reliability for accurate pigment quantification.

This superiority arises because normalization enhances spectral interpretability while preserving physiologically relevant information. As illustrated in Figure S5 (a–d), raw spectra (a) display high variability in absolute reflectance due to factors such as illumination conditions, sensor geometry, and leaf orientation, which obscure biochemical features. Normalization (b) mitigates these artifacts by rescaling reflectance values to a standardized intensity range, thereby suppressing non-physiological noise while maintaining diagnostically significant spectral contours—including chlorophyll absorption features around 430, 460, 640, and 660 nm, the red-edge transition between 680 and 750 nm, and the NIR plateau beyond 750 nm [49]. In contrast, first- and second-derivative transformations (c, d), though effective at highlighting subtle absorption characteristics, tend to amplify high-frequency noise. This amplification compromises data quality and consequently reduces the robustness of models built on high-dimensional hyperspectral data [50].
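The two effects described above can be reproduced on synthetic data. The sketch below assumes per-spectrum min-max scaling as the normalization (the exact formula used in the study is not restated in this section) and an idealized, purely multiplicative illumination change. The reflectance curve is a synthetic logistic red edge, not measured ginkgo data.

```python
import numpy as np

def minmax_normalize(spectra):
    """Rescale each spectrum (row) to the [0, 1] range."""
    spectra = np.asarray(spectra, dtype=float)
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / (hi - lo)

wavelengths = np.linspace(400, 1000, 301)              # hypothetical 2 nm grid
# Synthetic vegetation-like curve with a red edge near 715 nm (illustrative only).
clean = 0.05 + 0.45 / (1.0 + np.exp(-(wavelengths - 715.0) / 15.0))

# The same leaf under two illumination levels, modelled as a scale factor.
bright, dim = 1.0 * clean, 0.6 * clean
normalized = minmax_normalize(np.vstack([bright, dim]))
print(np.allclose(normalized[0], normalized[1]))       # scale difference removed

# First differences amplify uncorrelated noise relative to the signal.
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.005, size=clean.size)
snr_raw = clean.std() / noise.std()
snr_deriv = np.diff(clean).std() / np.diff(noise).std()
print(snr_raw > snr_deriv)
```

In practice derivative spectra are usually smoothed first, which reduces but does not remove this sensitivity to noise.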

The study also confirmed the importance of full-spectrum analysis in hyperspectral modeling. Unlike traditional methods relying on empirical vegetation indices [51], our AdaBoost-SPA integrated framework retained key spectral sensitivity regions while achieving an 85% dimensionality reduction [52].

At the methodological level, the optimized “Normalization-AdaBoost-SPA” framework significantly improved the model practicality and efficiency. Normalization effectively mitigated shadow and illumination variability issues in whole-plant measurements. This finding is consistent with Rautiainen et al.’s [53] conclusions in forest canopy research. Moreover, the AdaBoost algorithm demonstrated advantages in terms of large-sample processing speed and adaptability to class imbalance.
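The sample reweighting that drives AdaBoost in regression can be sketched in a few lines. This example assumes the AdaBoost.R2 variant with a linear loss, which is the algorithm behind scikit-learn's `AdaBoostRegressor`; the text does not state which variant or implementation was used, and the residuals below are hypothetical.

```python
import numpy as np

def adaboost_r2_reweight(weights, abs_errors):
    """One AdaBoost.R2 weight update with the linear loss."""
    weights = np.asarray(weights, dtype=float)
    abs_errors = np.asarray(abs_errors, dtype=float)
    loss = abs_errors / abs_errors.max()           # per-sample loss in [0, 1]
    avg_loss = np.sum(weights * loss)              # weighted average loss
    beta = avg_loss / (1.0 - avg_loss)             # below 1 when avg_loss < 0.5
    new_weights = weights * beta ** (1.0 - loss)   # well-fitted samples shrink most
    return new_weights / new_weights.sum(), beta

w0 = np.full(5, 0.2)                                # uniform starting weights
errors = np.array([0.01, 0.02, 0.01, 0.30, 0.02])   # hypothetical absolute residuals
w1, beta = adaboost_r2_reweight(w0, errors)
print(np.round(w1, 3), round(float(beta), 3))
```

After the update, the poorly predicted sample carries the largest weight, so the next weak learner concentrates on it.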

Furthermore, the feature selection capability of this framework identified a suite of characteristic bands that are both interpretable and mechanistically sound. These include the green reflectance peak at 550 nm, chlorophyll a absorption between 650 and 680 nm, the red-edge inflection point around 720–740 nm, and the NIR plateau beyond 800 nm. The close alignment of these features with established vegetation spectral principles [54, 55] confirms that the pipeline not only improves predictive accuracy but also strengthens the biological interpretability of the model, creating a credible mapping between spectral features and pigment dynamics. Collectively, these attributes—noise suppression through normalization, targeted dimensionality reduction, and powerful ensemble modeling—enable the proposed framework to achieve an optimal balance between signal fidelity and feature relevance, outperforming derivative-based approaches in both accuracy and mechanistic insight.

Consequently, this technical approach is not only suitable for laboratory research but also offers a feasible solution for future field-scale high-throughput phenotyping analysis.

The established “Normalization-AdaBoost-SPA” framework provides a reliable tool for high-throughput phenotyping of ginkgo germplasm resources. Its application prospects are mainly reflected in three aspects: First, this technology can be integrated with multi-omics platforms. For example, combining hyperspectral-predicted photosynthetic pigment phenotypes with genomic and metabolomic data can reveal the genetic basis of pigment metabolism and accelerate the selection of genotypes with high photosynthetic efficiency or specific secondary metabolite content. Second, the key characteristic wavelengths identified in this study (such as the characteristic absorption band of Chl a around 650–680 nm) provide a basis for developing low-cost, portable multispectral sensors suitable for large-scale field breeding screening, addressing the issues of expensive hyperspectral equipment and complex data processing. Finally, this model has unique advantages for monitoring the dynamic changes of pigments during the autumn leaf color transition period of ginkgo, and is expected to be applied in landscape plant physio-ecological monitoring and optimization of harvesting time for medicinal plants.

However, this study has some limitations that need to be addressed in future work. First, the model training data all came from a greenhouse seedling environment. Environmental factors (such as water stress and soil nutrient differences) may affect the spectral response, and introducing environmental correction factors may be necessary to improve the model’s generalization ability. Second, although the AdaBoost algorithm performed excellently, its computational efficiency when processing extremely high-dimensional spectral data may still become a bottleneck for large-scale real-time processing. Lightweight deep learning models (such as 1-D CNN [56, 57]) or model compression techniques could be explored in the future to balance accuracy and efficiency. Third, hyperspectral imaging is susceptible to ambient light changes and leaf inclination angles. Although normalization preprocessing alleviated this problem to some extent, developing active imaging systems that incorporate sensor fusion approaches (e.g., IMU, LiDAR, or polarization) may be an important direction for achieving robust field measurements.

In summary, the hyperspectral analysis framework proposed in this study shows great potential in non-destructive detection of ginkgo pigments, laying a technical foundation for its application in precision breeding and physiological monitoring. Future research should focus on promoting the transition of this technology from controlled environments to complex field environments and improving its robustness and practicality through multi-technology integration.

Conclusion

This study focused on Ginkgo biloba seedlings and established a non-destructive prediction model for chlorophyll and carotenoid contents based on hyperspectral imaging technology. By applying normalization preprocessing and the AdaBoost-SPA feature selection algorithm, sensitive spectral features in the visible range were optimized, resulting in a high-precision and stable prediction model. This model is suitable for rapid phenotyping analysis of Ginkgo biloba seedlings, providing an efficient and non-destructive detection method for high-throughput screening of germplasm resources and breeding of superior varieties, demonstrating significant practical value.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

Chl a: Chlorophyll a
Chl b: Chlorophyll b
Car: Carotenoids
AdaBoost: Adaptive boosting
PLSR: Partial least squares regression
RF: Random forest
CARS: Competitive adaptive reweighted sampling
SPA: Successive projections algorithm
ROI: Region of interest
R²_cv: Cross-validated coefficient of determination
SECV: Standard error of cross-validation
R²_p: Coefficient of determination
RMSEP: Root mean square error of prediction
RPD: Ratio of performance to deviation
EDF: Exponentially decreasing function

References

Zhao YP, Fan G, Yin PP, Sun S, Li N, Hong X, Hu G, Zhang H, Zhang FM, Han JD, Hao YJ, Xu Q, Yang X, Xia W, Chen W, Lin HY, Zhang R, Chen J, Zheng XM, Lee SMY, Lee J, Uehara K, Wang J, Yang H, Fu CX, Liu X, Xu X, Ge S. Resequencing 545 Ginkgo genomes across the world reveals the evolutionary history of the living fossil. Nat Commun. 2019;10(1):4201. https://doi.org/10.1038/s41467-019-12133-5.
Lu J, Xu Y, Meng Z, Cao M, Liu S, Kato-Noguchi H, Yu W, Jin B, Wang L. Integration of morphological, physiological and multi-omics analysis reveals the optimal planting density improving leaf yield and active compound accumulation in Ginkgo Biloba. Ind Crops Prod. 2021;172:114055. https://doi.org/10.1016/j.indcrop.2021.114055.
Van Beek TA, Montoro P. Chemical analysis and quality control of Ginkgo Biloba leaves, extracts, and phytopharmaceuticals. J Chromatogr A. 2009;1216(11):2002–32. https://doi.org/10.1016/j.chroma.2009.01.013.
Lieth H, Whittaker RH, editors. Primary productivity of the biosphere. Volume 14. Springer Berlin Heidelberg; 1975. https://doi.org/10.1007/978-3-642-80913-2.
Gitelson AA, Viña A, Ciganda V, Rundquist DC, Arkebauer TJ. Remote Estimation of canopy chlorophyll content in crops. Geophys Res Lett. 2005;32(8):2005GL022688. https://doi.org/10.1029/2005GL022688.
Demmig-Adams B. Carotenoids and photoprotection in plants: A role for the xanthophyll Zeaxanthin. Biochim Et Biophys Acta (BBA) - Bioenergetics. 1990;1020(1):1–24. https://doi.org/10.1016/0005-2728(90)90088-L.
Frank HA, Brudvig GW. Redox functions of carotenoids in photosynthesis. Biochemistry. 2004;43(27):8607–15. https://doi.org/10.1021/bi0492096.
Xu Y, Wang G, Cao F, Zhu C, Wang G, El-Kassaby YA. Light intensity affects the growth and flavonol biosynthesis of Ginkgo (Ginkgo Biloba L). New Forest. 2014;45(6):765–76. https://doi.org/10.1007/s11056-014-9435-7.
Zhang H, Ge Y, Xie X, Atefi A, Wijewardane NK, Thapa S. High throughput analysis of leaf chlorophyll content in sorghum using RGB, hyperspectral, and fluorescence imaging and sensor fusion. Plant Methods. 2022;18(1):60. https://doi.org/10.1186/s13007-022-00892-0.
Kurepin LV, Zaman M, Pharis RP. Phytohormonal basis for the plant growth promoting action of naturally occurring biostimulators: basis for the action of biostimulators. J Sci Food Agric. 2014;94(9):1715–22. https://doi.org/10.1002/jsfa.6545.
Arnon DI. Copper enzymes in isolated chloroplasts. Polyphenoloxidase in BETA VULGARIS. Plant Physiol. 1949;24(1):1–15. https://doi.org/10.1104/pp.24.1.1.
Houborg R, McCabe MF. A hybrid training approach for leaf area index Estimation via cubist and random forests machine-learning. ISPRS J Photogrammetry Remote Sens. 2018;135:173–88. https://doi.org/10.1016/j.isprsjprs.2017.10.004.
Koirala B, Zahiri Z, Scheunders P. A machine learning framework for estimating leaf biochemical parameters from its spectral reflectance and transmission measurements. IEEE Trans Geosci Remote Sens. 2020;58(10):7393–405. https://doi.org/10.1109/TGRS.2020.2982263.
Zhang J, Tian H, Wang D, Li H, Mouazen AM. A novel spectral index for Estimation of relative chlorophyll content of sugar beet. Comput Electron Agric. 2021;184:106088. https://doi.org/10.1016/j.compag.2021.106088.
Chen X, Li F, Shi B, Chang Q. Estimation of winter wheat plant nitrogen concentration from UAV hyperspectral remote sensing combined with machine learning methods. Remote Sens. 2023;15(11):2831. https://doi.org/10.3390/rs15112831.
Yuan Z, Cao Q, Zhang K, Ata-Ul-Karim ST, Tian Y, Zhu Y, Cao W, Liu X. Optimal leaf positions for SPAD meter measurement in rice. Front Plant Sci. 2016;7. https://doi.org/10.3389/fpls.2016.00719.
Ye Z, Tan X, Dai M, Chen X, Zhong Y, Zhang Y, Ruan Y, Kong D. A hyperspectral deep learning attention model for predicting lettuce chlorophyll content. Plant Methods. 2024;20(1):22. https://doi.org/10.1186/s13007-024-01148-9.
Liu H, Bruning B, Garnett T, Berger B. Hyperspectral imaging and 3D technologies for plant phenotyping: from satellite to close-range sensing. Comput Electron Agric. 2020;175:105621. https://doi.org/10.1016/j.compag.2020.105621.
Uddling J, Gelang-Alfredsson J, Piikki K, Pleijel H. Evaluating the relationship between leaf chlorophyll concentration and SPAD-502 chlorophyll meter readings. Photosynth Res. 2007;91(1):37–46. https://doi.org/10.1007/s11120-006-9077-5.
Gonthier P, Garbelotto M, Nicolotti G. Swiss stone pine trees and Spruce stumps represent an important habitat for Heterobasidion spp. In subalpine forests. Forest Pathol. 2003;33(3):191–203. https://doi.org/10.1046/j.1439-0329.2003.00323.x.
Gitelson AA, Gritz Y, Merzlyak MN. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J Plant Physiol. 2003;160(3):271–82. https://doi.org/10.1078/0176-1617-00887.
Sims DA, Gamon JA. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens Environ. 2002;81(2–3):337–54. https://doi.org/10.1016/S0034-4257(02)00010-X.
Daughtry C. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens Environ. 2000;74(2):229–39. https://doi.org/10.1016/S0034-4257(00)00113-9.
Zhi X, Massey-Reed SR, Wu A, Potgieter A, Borrell A, Hunt C, Jordan D, Zhao Y, Chapman S, Hammer G, George-Jaeggli B. Estimating photosynthetic attributes from High-Throughput canopy hyperspectral sensing in sorghum. Plant Phenomics. 2022;2022:9768502. https://doi.org/10.34133/2022/9768502.
Bendig J, Yu K, Aasen H, Bolten A, Bennertz S, Broscheit J, Gnyp ML, Bareth G. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int J Appl Earth Obs Geoinf. 2015;39:79–87. https://doi.org/10.1016/j.jag.2015.02.012.
Deery D, Jimenez-Berni J, Jones H, Sirault X, Furbank R. Proximal remote sensing buggies and potential applications for Field-Based phenotyping. Agronomy. 2014;4(3):349–79. https://doi.org/10.3390/agronomy4030349.
Jin X, Zarco-Tejada PJ, Schmidhalter U, Reynolds MP, Hawkesford MJ, Varshney RK, Yang T, Nie C, Li Z, Ming B, Xiao Y, Xie Y, Li S. High-Throughput Estimation of crop traits: A review of ground and aerial phenotyping platforms. IEEE Geoscience Remote Sens Magazine. 2021;9(1):200–31. https://doi.org/10.1109/MGRS.2020.2998816.
Zarco-Tejada PJ, Guillén-Climent ML, Hernández-Clemente R, Catalina A, González MR, Martín P. Estimating leaf carotenoid content in vineyards using high resolution hyperspectral imagery acquired from an unmanned aerial vehicle (UAV). Agric for Meteorol. 2013;171–172:281–94. https://doi.org/10.1016/j.agrformet.2012.12.013.
Cheng T, Riaño D, Ustin SL. Detecting diurnal and seasonal variation in canopy water content of nut tree orchards from airborne imaging spectroscopy data using continuous wavelet analysis. Remote Sens Environ. 2014;143:39–53. https://doi.org/10.1016/j.rse.2013.11.018.
Aasen H, Burkart A, Bolten A, Bareth G. Generating 3D hyperspectral information with lightweight UAV snapshot cameras for vegetation monitoring: from camera calibration to quality assurance. ISPRS J Photogrammetry Remote Sens. 2015;108:245–59. https://doi.org/10.1016/j.isprsjprs.2015.08.002.
Kanning M, Kühling I, Trautz D, Jarmer T. High-resolution UAV-based hyperspectral imagery for LAI and chlorophyll estimations from wheat for yield prediction. Remote Sens. 2018;10(12):2000. https://doi.org/10.3390/rs10122000.
Li W, Weiss M, Jay S, Wei S, Zhao N, Comar A, Lopez-Lozano R, De Solan B, Yu Q, Wu W, Baret F. Daily monitoring of effective green area index and vegetation chlorophyll content from continuous acquisitions of a multi-band spectrometer over winter wheat. Remote Sens Environ. 2024;300:113883. https://doi.org/10.1016/j.rse.2023.113883.
Yin S, Zhou K, Cao L, Shen X. Estimating the horizontal and vertical distributions of pigments in canopies of Ginkgo plantation based on UAV-Borne lidar, hyperspectral data by coupling PROSAIL model. Remote Sens. 2022;14(3):715. https://doi.org/10.3390/rs14030715.
Yue Z, Zhang Q, Zhu X, Zhou K. Chlorophyll content Estimation of Ginkgo seedlings based on deep learning and hyperspectral imagery. Forests. 2024;15(11):2010. https://doi.org/10.3390/f15112010.
Hu B. Studies on Ginkgo Biloba Leaf Chlorophyll Content Estimation Based on Image Analysis and Hyperspectral Analysis. Master’s thesis. Shenyang Agricultural University; 2017.
Lichtenthaler HK, Wellburn AR. Determinations of total carotenoids and chlorophylls a and b of leaf extracts in different solvents. Biochem Soc Trans. 1983;11(5):591–2. https://doi.org/10.1042/bst0110591.
Kennard RW, Stone LA. Computer aided design of experiments. Technometrics. 1969;11(1):137–48. https://doi.org/10.1080/00401706.1969.10490666.
Wu Q, Zhu Z, Wu J, Xu X. A dataset representativeness metric and a slicing sampling strategy for the Kennard-Stone algorithm. Chem J Chin Universities. 2022;43(10):0397. https://doi.org/10.7503/cjcu20220397.
Abdi H. Partial least squares regression and projection on latent structure regression (PLS Regression). WIRE Comput Stat. 2010;2(1):97–106. https://doi.org/10.1002/wics.51.
Merentitis A, Debes C, Heremans R. Ensemble learning in hyperspectral image classification: toward selecting a favorable Bias-Variance tradeoff. IEEE J Sel Top Appl Earth Observations Remote Sens. 2014;7(4):1089–102. https://doi.org/10.1109/JSTARS.2013.2295513.
Fan W, Stolfo SJ, Zhang J. The application of AdaBoost for distributed, scalable and on-line learning. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 1999:362–366. https://doi.org/10.1145/312129.312283
Ding D, Yu H, Yin Y, Yuan Y, Li Z, Li F. Determination of chlorophyll and hardness in cucumbers by Raman spectroscopy with successive projections algorithm (SPA) – Extreme learning machine (ELM). Anal Lett. 2023;56(8):1216–28. https://doi.org/10.1080/00032719.2022.2123922.
Li H, Liang Y, Xu Q, Cao D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal Chim Acta. 2009;648(1):77–84. https://doi.org/10.1016/j.aca.2009.06.046.
Atzberger C, Guérif M, Baret F, Werner W. Comparative analysis of three chemometric techniques for the spectroradiometric assessment of canopy chlorophyll content in winter wheat. Comput Electron Agric. 2010;73(2):165–73. https://doi.org/10.1016/j.compag.2010.05.006.
Dalponte M, Ørka HO, Ene LT, Gobakken T, Næsset E. Tree crown delineation and tree species classification in boreal forests using hyperspectral and ALS data. Remote Sens Environ. 2014;140:306–17. https://doi.org/10.1016/j.rse.2013.09.006.
Ustin SL, Gamon JA. Remote sensing of plant functional types. New Phytol. 2010;186(4):795–816. https://doi.org/10.1111/j.1469-8137.2010.03284.x.
Yendrek CR, Tomaz T, Montes CM, Cao Y, Morse AM, Brown PJ, McIntyre LM, Leakey ADB, Ainsworth EA. High-Throughput phenotyping of maize leaf physiological and biochemical traits using hyperspectral reflectance. Plant Physiol. 2017;173(1):614–26. https://doi.org/10.1104/pp.16.01447.
Angel Y, McCabe MF. Machine learning strategies for the retrieval of Leaf-Chlorophyll dynamics: model choice, sequential versus retraining learning, and hyperspectral predictors. Front Plant Sci. 2022;13. https://doi.org/10.3389/fpls.2022.722442.
Song D, Gao D, Sun H, Qiao L, Zhao R, Tang W, Li M. Chlorophyll content Estimation based on cascade spectral optimizations of interval and wavelength characteristics. Comput Electron Agric. 2021;189:106413. https://doi.org/10.1016/j.compag.2021.106413.
Chen X, Dong Z, Liu J, Wang H, Zhang Y, Chen T, Du Y, Shao L, Xie J. Hyperspectral characteristics and quantitative analysis of leaf chlorophyll by reflectance spectroscopy based on a genetic algorithm in combination with partial least squares regression. Spectrochim Acta Part A Mol Biomol Spectrosc. 2020;243:118786. https://doi.org/10.1016/j.saa.2020.118786.
Verma B, Prasad R, Srivastava PK, Yadav SA, Singh P, Singh RK. Investigation of optimal vegetation indices for retrieval of leaf chlorophyll and leaf area index using enhanced learning algorithms. Comput Electron Agric. 2022;192:106581. https://doi.org/10.1016/j.compag.2021.106581.
Xu Y, Mao Y, Li H, Sun L, Wang S, Li X, Shen J, Yin X, Fan K, Ding Z, Wang Y. A deep learning model for rapid classification of tea coal disease. Plant Methods. 2023;19(1):98. https://doi.org/10.1186/s13007-023-01074-2.
Rautiainen M, Lukeš P, Homolová L, Hovi A, Pisek J, Mõttus M. Spectral properties of coniferous forests: A review of in situ and laboratory measurements. Remote Sens. 2018;10(2):207. https://doi.org/10.3390/rs10020207.
Liu N, Xing Z, Zhao R, Qiao L, Li M, Liu G, Sun H. Analysis of chlorophyll concentration in potato crop by coupling continuous wavelet transform and spectral variable optimization. Remote Sens. 2020;12(17):2826. https://doi.org/10.3390/rs12172826.
Cheng J, Yang G, Xu W, Feng H, Han S, Liu M, Zhao F, Zhu Y, Zhao Y, Wu B, Yang H. Improving the Estimation of Apple leaf photosynthetic pigment content using fractional derivatives and machine learning. Agronomy. 2022;12(7):1497. https://doi.org/10.3390/agronomy12071497.
Sun Q, Zhao G, Xia X, Xie Y, Fang C, Sun L, Wu Z, Pan C. Hyperspectral image classification based on Multi-Scale convolutional features and Multi-Attention mechanisms. Remote Sens. 2024;16(12):2185. https://doi.org/10.3390/rs16122185.
Jia J, Zheng X, Wang Y, Chen Y, Karjalainen M, Dong S, Lu R, Wang J, Hyyppä J. The effect of artificial intelligence evolving on hyperspectral imagery with different signal-to-noise ratio, spectral and Spatial resolutions. Remote Sens Environ. 2024;311:114291. https://doi.org/10.1016/j.rse.2024.114291.

Funding

This work was funded by the Jiangsu Agriculture Science and Technology Innovation Fund (CX(23)1027) and the National Natural Science Foundation of China (32101521).

Author information

Authors and Affiliations

State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, Jiangsu, China
Xin Yang, Zihan Wei, Lehao Li, Xiaoming Yang, Jimei Han, Meiling Ming, Guibin Wang, Fuliang Cao, Kai Zhou & Fangfang Fu

Contributions

X.Y. and F.F. conceptualized the study and designed the methodology. X.Y., Z.W., and L.L. conducted the investigation. X.Y. performed data analysis, validated results, managed data, and wrote the original draft. F.F. and F.C. provided resources, supervised the research, and managed the project. X.Y., J.H., M.M., G.W., and K.Z. reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Fuliang Cao, Kai Zhou or Fangfang Fu.

Ethics declarations

Ethics approval and consent to participate

All authors agreed to publish this manuscript.

Consent for publication

Consent and approval for publication were obtained from all authors.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Supplementary Material 4.

Supplementary Material 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yang, X., Wei, Z., Li, L. et al. Large-scale non-destructive crown-level assessment of Ginkgo pigments via hyperspectral and machine learning techniques. Plant Methods 21, 130 (2025). https://doi.org/10.1186/s13007-025-01439-9

Received: 25 July 2025
Accepted: 25 August 2025
Published: 16 October 2025
DOI: https://doi.org/10.1186/s13007-025-01439-9
