BMC is moving to Springer Nature Link. Visit this journal in its new home.

Large-scale non-destructive crown-level assessment of Ginkgo pigments via hyperspectral and machine learning techniques

Abstract

The photosynthetic pigments – chlorophyll a (Chl a), chlorophyll b (Chl b), and carotenoids (Car) – in juvenile ginkgo leaves are crucial for growth monitoring as they reflect physiological status and directly influence the biosynthesis of bioactive compounds such as flavonoids and terpene lactones. Traditional pigment measurement methods (acetone/ethanol extraction, SPAD, etc.) are inadequate for large-scale dynamic monitoring and high-throughput phenotyping analysis. To address this, this study developed a non-destructive prediction model for Chl a, Chl b, and Car contents in ginkgo seedlings using hyperspectral imaging combined with machine learning algorithms, which is applicable to seedlings with different genetic backgrounds and at various color development phases. A total of 3,460 seedlings from 590 families, sourced from ancient trees across 19 provinces in China, were analyzed using hyperspectral imaging and biochemical pigment quantification. A phased optimization strategy was implemented, including preprocessing method screening, model comparison, and feature wavelength selection. Among the four tested preprocessing methods (raw reflectance, normalization, first derivative, and second derivative), normalization significantly improved model accuracy. The Adaptive Boosting (AdaBoost) algorithm outperformed partial least squares regression (PLSR) and random forest (RF), achieving coefficients of determination (R²) above 0.83 and the ratio of performance to deviation (RPD) values exceeding 2.4 across all pigments. Compared with competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) demonstrated more effective spectral dimensionality reduction while preserving predictive power. This framework enables efficient, accurate, and scalable pigment phenotyping in Ginkgo biloba, offering technical support for large-scale germplasm screening and precision breeding.

Introduction

Ginkgo biloba L., recognized as one of the most ancient extant seed plants, is celebrated as a “living fossil” in the plant kingdom. This species not only occupies a unique evolutionary position but also produces valuable secondary metabolites, including flavonoids and terpene lactones, while demonstrating remarkable environmental adaptability. Consequently, precise monitoring of its physiological status represents a critical prerequisite for the scientific conservation and sustainable utilization of this precious germplasm resource. In medicinal applications, the leaves of juvenile ginkgo trees (1–5 years old) serve as the primary source of bioactive compounds such as flavonoids and terpene lactones, whose biosynthesis and accumulation efficiency are closely correlated with the plant’s photosynthetic capacity and metabolic activity [1,2,3].

The photosynthetic pigments in leaves (chlorophyll a, chlorophyll b, and carotenoids) are fundamental components of the photosynthetic apparatus, with their contents and ratios directly reflecting plant physiological status. Chlorophyll (particularly Chl a) represents a key indicator for assessing photosynthetic efficiency and biomass accumulation potential [4, 5], whereas carotenoids facilitate light harvesting and play vital roles in photoprotection mechanisms, mitigating photooxidative damage to photosynthetic systems [6, 7]. Given that seedling photosynthetic performance and redox homeostasis directly influence medicinal compound biosynthesis flux, accurate monitoring of these pigment dynamics is essential both for selecting superior ginkgo germplasms (characterized by high photosynthetic efficiency, strong stress resistance, and enhanced medicinal value) and for optimizing cultivation practices to improve economic yield [8,9,10].

Conventional pigment quantification methods (e.g., Arnon’s spectrophotometry approach) rely on destructive extraction using organic solvents (acetone/ethanol) [11]. While providing accurate measurements, these methods present three major limitations: (1) their destructive nature precludes long-term monitoring of valuable ancient trees or endangered germplasms; (2) labor-intensive procedures, including grinding, > 24-hour dark extraction, and centrifugation, hinder high-throughput screening in large germplasm repositories; and (3) inherent time delays prevent real-time tracking of plant responses to environmental fluctuations, thereby constraining precise germplasm evaluation and breeding progress.

Hyperspectral non-destructive detection technology presents a promising solution to these challenges, especially when integrated with machine learning algorithms such as Partial Least Squares Regression (PLSR), Random Forest (RF), 1-D Convolutional Neural Networks (1-D CNN), and Adaptive Boosting (AdaBoost), among others [12,13,14,15]. However, existing techniques exhibit specific shortcomings: chlorophyll meters (e.g., SPAD-502) provide rapid measurements but yield only relative indices that are susceptible to interference from leaf structural parameters (e.g., thickness and water content) [16,17,18,19,20]; and while drone/satellite remote sensing covers extensive areas, its limited spatial resolution cannot accurately retrieve leaf-scale physiological parameters and remains vulnerable to atmospheric conditions [21]. In contrast, proximal hyperspectral techniques employing portable spectrometers or hyperspectral imaging devices can acquire continuous narrowband (1–3 nm resolution) reflectance spectra from leaves or canopies, enabling precise analysis of pigment-specific absorption features in both the blue-violet (430–450 nm) and red-edge (680–720 nm) regions [22].

Current hyperspectral techniques for the quantitative analysis of leaf pigments present significant methodological limitations across different application scenarios, as systematically compared in Table 1. In precision agriculture, empirical regression models based on vegetation indices have demonstrated satisfactory predictive performance for specific crop varieties, yet they struggle to account for interference from complex interactions of secondary metabolites and intraspecific genetic variations [24,25,26,27,28]. For forest health monitoring, aerial hyperspectral platforms [29,30,31,32,33] are constrained by spatial resolution limitations (> 1 m/pixel), resulting in substantial dilution of critical pigment-structure covariation characteristics at the leaf level. Notably, while a dedicated study on Ginkgo biloba [33] innovatively integrated UAV-LiDAR multi-source data fusion, it still suffers from insufficient sample representativeness (n < 100) and limited spectral-spatial resolution. Furthermore, Yue et al.’s hyperspectral modeling of seedlings using SPAD chlorophyll relative values not only lacked key physiological parameters for carotenoids but also showed clear applicability limitations in large-scale precision assessment of Ginkgo biloba germplasm resources [34].

These methodological shortcomings collectively highlight two critical unresolved issues in Ginkgo germplasm evaluation: (1) inadequate genetic coverage in existing sample sets, which fail to represent the diversity of natural Ginkgo populations, and (2) a disconnect between static monitoring approaches and dynamic physiological processes, as most studies rely on single-time-point sampling data, whereas Ginkgo leaf coloration is fundamentally a complex physiological process involving the dynamic balance of chlorophyll synthesis–degradation and carotenoid metabolism.

This study bridges these methodological gaps by pioneering the development of quantitative retrieval models for Chl a, Chl b, and Car in Ginkgo biloba leaves using portable hyperspectral imaging. Our comprehensive sampling framework incorporates three key innovations: (1) an unprecedented sample size (n = 3,460 seedlings) that ensures exceptional model robustness across genetic and environmental variations; (2) representative genetic diversity encompassing progeny from 590 ancient ginkgo trees across 19 Chinese provinces (25°–41°N latitude), covering the species’ major natural distribution range; and (3) temporally optimized sampling spanning the 20-day autumnal color transition period to capture the complete pigment remodeling dynamics during the critical chlorophyll degradation phase, thereby maximizing model generalizability across developmental phases.

To systematically address these gaps, our study was designed with a phased analytical framework (Fig. 1). First, to minimize non-biochemical spectral noise and enhance pigment-specific absorption features, we evaluated multiple spectral preprocessing methods, including raw reflectance, normalization, and first- and second-derivative transformations. Second, we compared the performance of representative machine learning algorithms (PLSR, RF, and AdaBoost) for their capacity to capture the complex, nonlinear relationships between spectral data and pigment concentrations across diverse genotypes. Finally, we employed feature wavelength selection algorithms (SPA and CARS) to refine the most informative spectral variables, aiming to improve model efficiency and portability for future high-throughput field applications. This structured approach was designed to ensure that our final models achieve not only high predictive accuracy but also essential robustness and generalizability across the genetic and temporal variability inherent in large-scale germplasm collections.

Table 1 Comparative review of core methodologies and limitations in hyperspectral pigment estimation

Materials and methods

Plant materials and sampling protocol

Fig. 1

Flowchart of pigment prediction in Ginkgo seedlings using hyperspectral imaging modeling: a spectral extraction; b pigment measurement; c modeling process

The experimental materials were collected from the Ginkgo biloba Ancient Tree Germplasm Resource Nursery at Nanjing Forestry University. The geographical origins of the germplasm sources and the nursery locations are mapped in Supplementary Figure S1. The germplasm was initially acquired in 2021, and seedlings were cultivated in a controlled greenhouse environment at the Xiashu Practice Forest Station (119.218°E, 32.120°N) in Jurong, Jiangsu Province, starting in March 2022.

To establish a hyperspectral prediction model capable of accommodating diverse leaf senescence phases, sampling was strategically conducted during the critical natural senescence period from late October to late November. The samples were divided into four phases according to sampling time: Phase-1 (October 31–November 5) represented the initial senescence transition; Phase-2 (November 6–11) captured mid-phase senescence development; Phase-3 (November 12–15) reflected advanced senescence; and Phase-4 (November 20) included supplemental early-yellowing specimens to ensure complete phenotypic representation. Hyperspectral imaging was performed on 3,460 seedlings, representing 590 distinct families derived from ancient trees across 19 provinces in China. Following the hyperspectral data acquisition, four leaf discs (14 mm diameter) were sampled from each seedling, one from each cardinal direction, to ensure representative tissue collection. The excised leaf discs were immediately placed in centrifuge tubes, flash-frozen on dry ice under dark conditions, and transported to the laboratory for subsequent biochemical analysis (Fig. 1b).

Hyperspectral imaging system

We used a portable hyperspectral imaging system for non-destructive spectral measurement of Ginkgo biloba seedlings (Fig. 1a). The core component was an Image-λ-V10E-HR hyperspectral imager (Dualix Spectral Imaging, China), covering 350–1000 nm with 176 spectral channels. The imaging system was integrated with an HSIA-RAK100-IMS motorized rotary stage that enabled comprehensive scanning of the plant samples. For illumination, two halogen lamps provided stable light output across 350–2500 nm, powered by an uninterruptible power supply to maintain consistent lighting conditions during data acquisition.

The system incorporated critical calibration components including a PTFE-coated white reference panel (Dualix Spectral Imaging) with 99% reflectance for spectral calibration, and black velvet background material exhibiting near-zero reflectance (< 0.1%) across the working spectral range to minimize interference. All operations were controlled via a Lenovo ThinkPad T14 computer running SpecVIEW software (v2.9.3.8) for image acquisition and spectral calibration, whereas MATLAB 2022b and Python 3.10 were employed for region-of-interest analysis and machine learning model development. The entire setup was enclosed in a light-isolated darkroom to eliminate ambient light interference and ensure measurement consistency.

During operation, the seedling samples were sequentially positioned at the center of the imaging platform. The hyperspectral camera, mounted on the rotary stage and precisely aligned using checkerboard calibration, acquired top-down images through a left-to-right scanning motion at approximately 15 s per scan. This brief acquisition time ensured negligible thermal effects from the illumination system on the plant samples. The system output consisted of three-dimensional hyperspectral reflectance cubes containing 176 spectral bands across the 350–1000 nm range, with each band corresponding to a specific wavelength interval. Automated data collection and preliminary processing were performed through the integrated SpecVIEW control software.

The raw hyperspectral images were calibrated using white and dark references to remove system noise. The white reference was acquired from a standard calibration panel, whereas the dark reference was obtained with the lens covered and lights off. The reflectance (R) was calculated as:

$${\text{R}}=\frac{{{R_{raw}} - Doc}}{{Woc - Doc}}$$
(1)

where \({R_{raw}}\) is the raw sample value, \(Doc\) is the dark reference, and \(Woc\) is the white reference. This correction removes the dark-signal offset of the sensor and scales each band by the white-reference response, yielding relative reflectance.
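The correction in Eq. (1) can be illustrated with a minimal Python sketch. The function name `calibrate_reflectance` and the digital-number values are illustrative assumptions; in the study this step was handled by the SpecVIEW software.

```python
def calibrate_reflectance(raw, dark, white):
    """Apply Eq. (1), R = (raw - dark) / (white - dark), band by band.

    Each argument is a sequence with one value per spectral band
    (hypothetical layout chosen for this sketch).
    """
    if not (len(raw) == len(dark) == len(white)):
        raise ValueError("raw, dark and white must have the same number of bands")
    reflectance = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        if span <= 0:
            # A white reference that is not brighter than the dark frame
            # indicates a failed calibration for this band.
            raise ValueError("white reference must exceed dark reference")
        reflectance.append((r - d) / span)
    return reflectance


# Illustrative digital numbers for two bands (not measured data).
example = calibrate_reflectance([600.0, 1100.0], [100.0, 100.0], [2100.0, 2100.0])
```

With these made-up values the two bands evaluate to reflectances of 0.25 and 0.5.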

Data preprocessing

Fig. 2

Workflow for extracting reflectance spectra from Ginkgo canopy hyperspectral images: a single-plant RGB representation; b single-band image at 747 nm; c region of interest (ROI) extracted image; d averaged one-dimensional reflectance spectral plot

To extract spectral reflectance information from ginkgo seedlings, we processed the white/dark-corrected hyperspectral images using MATLAB R2022b. The procedure involved: (1) selecting a characteristic wavelength band (747 nm) for single-band image processing, (2) applying Otsu’s threshold method combined with morphological opening operations to remove background interference and eliminate small-area noise, and (3) generating a target mask based on the segmentation results. This mask was then used to extract the leaf regions of interest (ROIs). For each ROI, we calculated the mean reflectance across all pixels at each wavelength, generating a one-dimensional average reflectance spectrum vector representing each sample (Fig. 2).
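The segmentation and averaging steps can be sketched as follows. This is a simplified pure-Python illustration, not the MATLAB routine used in the study: Otsu's criterion is evaluated exactly over the sorted pixel values rather than over a histogram, the morphological opening step is omitted, and the data layout (a flat list of per-pixel spectra) and all function names are assumptions. It also assumes that plant pixels are brighter than the background in the chosen band, which is plausible at 747 nm against black velvet.

```python
def otsu_threshold(values):
    """Return the cut that maximizes between-class variance (Otsu's criterion)."""
    if not values:
        raise ValueError("values must not be empty")
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    cum = 0.0  # running sum of the lower (background) class
    for i in range(1, n):
        cum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # no valid cut between two equal values
        w0 = i / n                      # weight of the lower class
        m0 = cum / i                    # mean of the lower class
        m1 = (total - cum) / (n - i)    # mean of the upper class
        var_between = w0 * (1.0 - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = (xs[i - 1] + xs[i]) / 2.0
    return best_t


def mean_roi_spectrum(pixels, band_index):
    """Average the spectra of all pixels brighter than the Otsu cut in one band."""
    threshold = otsu_threshold([p[band_index] for p in pixels])
    roi = [p for p in pixels if p[band_index] > threshold]
    if not roi:
        raise ValueError("no pixels above the threshold")
    n_bands = len(pixels[0])
    return [sum(p[b] for p in roi) / len(roi) for b in range(n_bands)]


# Four toy pixels with two bands each; band 1 plays the role of 747 nm.
toy_pixels = [[0.02, 0.01], [0.03, 0.02], [0.10, 0.60], [0.20, 0.80]]
toy_spectrum = mean_roi_spectrum(toy_pixels, band_index=1)
```

In the toy example the two dark pixels are rejected as background and the returned spectrum is the mean of the two bright pixels.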

To mitigate interference from instrument noise, illumination variations, and scattering effects in the hyperspectral data, we compared the following spectral preprocessing methods: (1) raw reflectance spectra (untreated average reflectance data), (2) normalization (Min–Max scaling to the [-1, 1] range for dimension unification), (3) first derivative (emphasizing spectral slope features while suppressing baseline drift), and (4) second derivative (enhancing spectral curvature features and sharpening absorption peaks and valleys). Each variant was then evaluated in the modeling stage to determine which most improved data quality for modeling.
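A minimal Python sketch of the three transformations is given below. Several details are assumptions, because the text does not specify them: scaling is applied to each spectrum individually, and the derivatives are plain finite differences (central in the interior, one-sided at the ends) without any smoothing filter. Function names are hypothetical.

```python
def minmax_scale(spectrum, lo=-1.0, hi=1.0):
    """Linearly rescale one spectrum so its minimum maps to lo and maximum to hi."""
    s_min, s_max = min(spectrum), max(spectrum)
    if s_max == s_min:
        raise ValueError("a constant spectrum cannot be rescaled")
    scale = (hi - lo) / (s_max - s_min)
    return [lo + (v - s_min) * scale for v in spectrum]


def first_derivative(spectrum, wavelengths):
    """Finite-difference slope dR/d(lambda); wavelengths must be strictly increasing."""
    n = len(spectrum)
    slopes = []
    for i in range(n):
        a = max(i - 1, 0)       # one-sided difference at the left edge
        b = min(i + 1, n - 1)   # one-sided difference at the right edge
        slopes.append((spectrum[b] - spectrum[a]) / (wavelengths[b] - wavelengths[a]))
    return slopes


def second_derivative(spectrum, wavelengths):
    """Curvature, obtained by differentiating the first derivative once more."""
    return first_derivative(first_derivative(spectrum, wavelengths), wavelengths)


# Sanity check on a parabola R = lambda^2, whose slope is 2*lambda and curvature 2.
wl = [0.0, 1.0, 2.0, 3.0, 4.0]
parabola = [w ** 2 for w in wl]
d1 = first_derivative(parabola, wl)
d2 = second_derivative(parabola, wl)
```

Away from the edges the sketch recovers the analytic slope and curvature of the parabola, which is a quick way to confirm the sign and scale of derivative spectra before modeling.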

Pigment extraction and quantification

As shown in Fig. 1b, the leaf discs were subjected to ethanol extraction using immersion in 9 mL of 96% ethanol and maintained in complete darkness for > 24 h until complete chlorophyll bleaching was achieved. Absorbance measurements were conducted at 665 nm (A665), 649 nm (A649), and 470 nm (A470) using a SPECORD 200PLUS spectrophotometer with ethanol as blank [36].

Pigment concentrations (µg·mL⁻¹ extract) were calculated as:

$${C_a}=13.95{A_{665}} - 6.88{A_{649}}$$
(2)
$${C_b}=24.96{A_{649}} - 7.32{A_{665}}$$
(3)
$${C_{x+c}}=(1000{A_{470}} - 2.05{C_a} - 114.8{C_b})/245$$
(4)

where \(\:{C}_{a}\) denotes the concentration of chlorophyll a, \(\:{C}_{b}\) represents the concentration of chlorophyll b, and \(\:{C}_{x+c}\) corresponds to the concentration of carotenoids.

The pigment content per unit leaf area (µg·cm⁻²) was determined as follows:

$$Cont=\left( {C \times V} \right)/\left( {A \times N} \right)$$
(5)

where C represents the pigment concentration, V is the extraction volume (9 mL), A is the disc area [π×(0.7 cm)²], and N is the number of discs per sample (N = 4).
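Equations (2)–(5) can be combined into a short worked example. The coefficients, extraction volume, disc radius, and disc count below are those stated in the text; the absorbance readings and function names are made up for illustration.

```python
import math


def pigment_concentrations(a665, a649, a470):
    """Eqs. (2)-(4): concentrations in the extract, in micrograms per mL."""
    c_a = 13.95 * a665 - 6.88 * a649
    c_b = 24.96 * a649 - 7.32 * a665
    c_xc = (1000.0 * a470 - 2.05 * c_a - 114.8 * c_b) / 245.0
    return c_a, c_b, c_xc


def content_per_area(conc, volume_ml=9.0, disc_radius_cm=0.7, n_discs=4):
    """Eq. (5): convert an extract concentration to micrograms per cm^2 of leaf."""
    disc_area = math.pi * disc_radius_cm ** 2  # area of one 14 mm disc
    return conc * volume_ml / (disc_area * n_discs)


# Illustrative absorbances (not measured values).
c_a, c_b, c_xc = pigment_concentrations(a665=0.5, a649=0.2, a470=0.4)
# c_a = 5.599 and c_b = 1.332 micrograms per mL for these readings.
chl_a_content = content_per_area(c_a)
```

For these readings the four discs have a combined area of about 6.16 cm², so the Chl a content comes to roughly 8.18 µg·cm⁻².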

Pigment concentrations were calculated in Excel and analyzed by sampling phase. All subsequent statistical analyses (Kruskal–Wallis H tests and post hoc Dunn’s tests with Bonferroni correction) and data visualization were performed in Python 3.10 to assess phase-dependent differences and identify specific intergroup variations.

Dataset partition

The Kennard-Stone algorithm was employed to partition the 3,460 seedling samples into calibration (n = 2,075) and prediction (n = 1,385) sets based on spectral feature space distances [37]. This approach maximizes spectral variability representation in the calibration set while ensuring complete separation from the prediction set. The algorithm iteratively selects samples with maximal Euclidean distances from those already chosen, giving comprehensive coverage of spectral characteristics. This partitioning helps guard against overfitting and enables reliable evaluation of generalization performance using the independent prediction set [38, 39].
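The selection rule can be sketched in a few lines of Python. This follows the usual textbook formulation of Kennard-Stone (seed with the two most distant samples, then repeatedly add the sample whose nearest selected neighbor is farthest away); the study's exact implementation is not described, and the function name and toy data are assumptions. The full distance matrix makes this sketch suitable only for small examples.

```python
import math


def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indices using Euclidean distances."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed the calibration set with the two mutually farthest samples.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # Distance from every unselected sample to its nearest selected sample.
    nearest = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    while len(selected) < n_select:
        # Pick the sample that is farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            nearest[k] = min(nearest[k], dist[k][nxt])
    return selected, sorted(remaining)


# Five one-dimensional toy "spectra"; three go to calibration.
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
calibration_idx, prediction_idx = kennard_stone(toy, n_select=3)
```

In the toy data the extremes (0 and 10) are chosen first, followed by the mid-range point 5, so the samples left for prediction lie inside the range spanned by the calibration set.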

Modeling methods and evaluation metrics

This study adopted a stepwise optimization approach to develop hyperspectral prediction models for pigment content in ginkgo seedling leaves (Fig. 1c), systematically implementing three key optimization phases: (1) preprocessing method selection, (2) modeling strategy comparison, and (3) feature wavelength identification. Throughout the process, a unified model evaluation framework was consistently applied to ensure comparability and consistency across different modeling strategies.

The calibration set was evaluated using two metrics: the cross-validated coefficient of determination (\(\:{R}_{CV}^{2}\)), which characterizes model goodness-of-fit (values closer to 1 indicating better fit), and the standard error of cross-validation (\(\:SECV\)), which quantifies prediction deviation (smaller values indicating lower deviation between predicted and actual values). The prediction set served as an independent validation set to assess model generalization capability, with core metrics comprising the coefficient of determination (\(\:{R}_{p}^{2}\)), which quantifies the model’s explanatory power for unknown sample variability (higher values indicating better performance); the root mean square error of prediction (RMSEP), which reflects the average deviation between predicted and true values (smaller values indicating higher accuracy); and the ratio of performance to deviation (\(\:RPD\)), defined as the ratio of the prediction set standard deviation to RMSEP (models with \(\:RPD\) > 2 are generally considered to have strong predictive ability).
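The prediction-set metrics can be written out explicitly, as in the sketch below. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, RMSEP with a divisor of n, and the sample standard deviation (n − 1 divisor) in RPD; the text does not state which variants were used. Under these definitions RPD is approximately 1/√(1 − R²) for large n, so an R² of 0.83 corresponds to an RPD of about 2.4.

```python
import math
import statistics


def prediction_metrics(y_true, y_pred):
    """Return (R2, RMSEP, RPD) for paired measured and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    rpd = statistics.stdev(y_true) / rmsep  # sample SD of measured values
    return r2, rmsep, rpd


# Toy values, not study data.
r2, rmsep, rpd = prediction_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                                    [1.1, 1.9, 3.2, 3.8, 5.0])
```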

Partial least squares regression (PLSR) was selected as the benchmark model to evaluate and compare the effects of different preprocessing methods. PLSR establishes linear regression relationships between spectra and target parameters by extracting latent variables (LVs), demonstrating effective handling of high-dimensional collinear data [39]. The specific procedure involved constructing PLSR models for each of the four preprocessed datasets, followed by comprehensive performance comparison using calibration set metrics (\(\:{R}_{CV}^{2}\), \(\:SECV\)) and prediction set metrics (\(\:{R}_{p}^{2}\), RMSEP, \(\:RPD\)). The optimal preprocessing method was selected based on superior performance across all evaluation indicators in both calibration and prediction sets, thereby establishing a high-quality data foundation for subsequent modeling.

After applying the optimal preprocessing method, three representative models (PLSR, random forest (RF), and adaptive boosting (AdaBoost)) were systematically compared to evaluate different modeling strategies. PLSR, as a classical multivariate statistical regression method, excels in handling high-dimensional, multicollinear spectral data. Its core advantage lies in extracting the most explanatory latent variables from the original spectral variables and target parameters to establish linear regression models, achieving dimensionality reduction while maximizing covariance information, which makes it a widely used benchmark algorithm in this field.

RF, as a decision tree algorithm based on bagging ensemble strategy, enhances prediction robustness and generalization capability by constructing numerous uncorrelated decision trees and employing voting (classification) or averaging (regression) [40]. It inherently handles nonlinear relationships, automatically evaluates feature importance, and demonstrates strong resistance to overfitting and noisy data.

AdaBoost, based on boosting ensemble strategy, operates by iteratively training a series of “weak learners” (e.g., simple decision trees) and adjusting sample weights and model weights according to previous model errors, enabling subsequent models to focus more on difficult-to-predict samples [41]. This mechanism allows gradual improvement in the overall model prediction accuracy, particularly excelling in learning complex patterns.

The selection of PLSR, RF and AdaBoost for comparison aimed to cover different modeling paradigms from linear (PLSR) to nonlinear (RF, AdaBoost) approaches, while utilizing ensemble learning (RF, AdaBoost) to enhance model robustness, ultimately identifying the most suitable predictive model for the data characteristics and modeling objectives.

After determining the optimal preprocessing method and predictive model, the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) were employed for characteristic wavelength selection to reduce data dimensionality (thereby enhancing model interpretability and practicality) and potentially improve performance. SPA selects a set of wavelengths with rich information content and minimal mutual collinearity from the full spectral bands [42]. At each iteration, it selects the new wavelength whose projection onto the subspace orthogonal to the already selected wavelength vectors has the largest norm, thereby minimizing redundant information between wavelengths and obtaining a highly representative, low-correlation feature subset.
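The projection step of SPA can be sketched as follows. This covers only the core selection loop in its standard formulation: the choice of starting wavelength and of the final number of wavelengths, which is normally made by comparing validation errors, is left out, and the function name and toy matrix are assumptions rather than details from the study.

```python
def spa_select(spectra, n_select, start=0):
    """Pick n_select column (wavelength) indices with minimal collinearity.

    spectra: list of rows, one row per sample and one column per wavelength.
    """
    cols = [list(c) for c in zip(*spectra)]  # one vector per wavelength
    if not 1 <= n_select <= len(cols):
        raise ValueError("n_select must be between 1 and the number of wavelengths")
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]  # last selected vector (already projected)
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0.0:
            raise ValueError("selected wavelength carries no remaining information")
        best_j, best_norm = None, -1.0
        for j in range(len(cols)):
            if j in selected:
                continue
            # Remove the component of column j that lies along ref.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm_sq = sum(v * v for v in cols[j])
            if norm_sq > best_norm:
                best_j, best_norm = j, norm_sq
        selected.append(best_j)
    return selected


# Column 1 is an exact multiple of column 0, column 2 is orthogonal to both.
toy_spectra = [[1.0, 2.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]]
chosen = spa_select(toy_spectra, n_select=2, start=0)
```

Starting from column 0, the collinear column 1 projects to zero and is skipped, so the orthogonal column 2 is selected next.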

The CARS algorithm simulates the “survival of the fittest” principle, combining Monte Carlo sampling with an exponentially decreasing function (EDF) for wavelength selection [43]. Each sampling iteration involves three steps. First, the EDF determines the proportion of wavelengths to eliminate in the current round. Second, retention weights are calculated for each wavelength from the absolute value of its regression coefficient in the PLSR model (or selected base model), with higher absolute values corresponding to greater weights. Finally, weighted sampling is performed to eliminate wavelengths. The wavelength subset generated in each iteration is cross-validated, and the subset with the best cross-validation performance is ultimately selected as the set of characteristic wavelengths.
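The EDF step can be made concrete with a small sketch. It uses the form commonly given for CARS, r_i = a·exp(−k·i) with a = (p/2)^(1/(N−1)) and k = ln(p/2)/(N−1), which retains all p wavelengths in the first run and two in the last. The number of sampling runs (50 here) is an assumption, since the text does not report it; 176 is the number of spectral channels of the imager.

```python
import math


def edf_retention_ratios(n_wavelengths, n_runs):
    """Fraction of wavelengths kept in each of n_runs CARS sampling runs."""
    if n_wavelengths < 2 or n_runs < 2:
        raise ValueError("need at least 2 wavelengths and 2 runs")
    half = n_wavelengths / 2.0
    a = half ** (1.0 / (n_runs - 1))
    k = math.log(half) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


ratios = edf_retention_ratios(n_wavelengths=176, n_runs=50)
# Number of wavelengths kept after the EDF step of each run.
kept = [round(r * 176) for r in ratios]
```

The schedule removes wavelengths quickly in early runs and slowly in late runs, which is why the weighted sampling that follows mainly refines a small candidate set toward the end.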

Results

Temporal changes in chlorophyll and carotenoid contents

Fig. 3

Ginkgo seedlings collected at four phases illustrated with true color RGB images

A comprehensive dataset comprising 3,460 seedlings was analyzed for photosynthetic pigment content dynamics. The samples were systematically categorized into four distinct temporal phases to track progressive senescence patterns. In particular, Phase-1 (n = 566) represented the initial senescence transition; Phase-2 (n = 1,276) captured mid-phase senescence development; Phase-3 (n = 1,600) reflected advanced senescence; and Phase-4 (n = 18) included supplemental early-yellowing specimens to ensure complete phenotypic representation. As visually documented in Fig. 3, this sampling window captures a continuous senescence gradient spanning from dark green to yellowing leaves, with standardized imaging conditions ensuring accurate color representations.

Fig. 4

Stacked bar chart showing the temporal decline in photosynthetic pigment contents (Chl a, Chl b, and Car) across four growth phases

As shown in Fig. 4, the stacked bar chart reveals significant temporally declining trends in Chl a, Chl b, and Car contents. Phase-1 exhibited peak levels, with the most drastic decline toward Phase-2; values stabilized between Phase-2 and Phase-3; and Phase-4 showed minima due to high yellowing-leaf prevalence.

Fig. 5

Post-hoc analysis (Dunn’s test with Bonferroni correction) of pigment content variations across the four growth phases: a Chl a; b Chl b; c Car. Significance levels between phases are indicated by asterisks: *, p < 0.05; **, p < 0.01; ***, p < 0.001

The Kruskal–Wallis H test confirmed highly significant overall differences across senescence phases (p < 0.001), with effect sizes (ε² = 0.094–0.098) indicating moderate strength. Bonferroni-corrected Dunn’s post hoc tests delineated specific differentiation patterns: Phase-1 differed significantly from Phase-2/3/4 (p < 10⁻¹⁰); all pigments differed between Phase-2 and Phase-3 (p < 0.05), with Chl a most distinct (p = 1.7 × 10⁻⁴); Phase-2 and Phase-3 significantly differed from Phase-4 in all pigments (p < 0.05) (Fig. 5).
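For reference, the H statistic and the ε² effect size can be computed as in the sketch below. It assumes the common definition ε² = H/(n − 1) and omits the tie correction and the Dunn post hoc step; the text does not state which formula or software routine was used, and the function names are hypothetical. Under this definition, the reported ε² of 0.094–0.098 with n = 3,460 would correspond to H values of roughly 325 to 339.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def epsilon_squared(h, n):
    """Effect size for Kruskal-Wallis, assuming epsilon^2 = H / (n - 1)."""
    return h / (n - 1)


# Three fully separated toy groups (not study data).
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(toy_groups)
effect = epsilon_squared(h_stat, n=9)
```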

Spectral characteristics and preprocessing effects

Raw spectra contain valuable information but require preprocessing to mitigate noise and baseline drift.

Regarding raw spectral characteristics (Fig. 6a), the 375–1000 nm reflectance curve shows key features: a 400 nm absorption valley (short-wave absorption), a 550 nm chlorophyll reflection peak, a 625 nm inflection valley, and a dominant 725–750 nm near-infrared peak. Subtle fluctuations at 800–900 nm and 950 nm reflect intrinsic responses with potential noise. This provides fundamental but noisy baseline data.

In terms of normalization effects (Fig. 6b), compression of reflectance magnitude (≈ -0.075 to 0.075) reduces absolute differences while preserving shape. The 375–450 nm valley and 600–650 nm relative valley correspond to original features at 400 nm and 625 nm, while 500–550 nm and 700–750 nm peaks retain main peak morphology. This enables cross-sample shape comparison by eliminating reflectance-level variations.

In terms of first-derivative effects (Fig. 6c), this transformation highlights rate-of-change features: negative peaks at 400/600 nm mark valley inflection points; positive peaks at 500 nm and 675–700 nm indicate maximal ascent slopes. Critical wavelength dynamics (e.g., chlorophyll-related red-edge shifts) are amplified.

Fig. 6

Comparison of spectral pretreatment methods: a Raw spectra; b Normalization; c First-order Derivative; d Second-order Derivative

In terms of second-derivative effects (Fig. 6d), curvature features (≈ −0.015 to 0.015) are resolved: negative peaks at 375–425, 525, and 700 nm show maximal bending; positive features at 475 and 675 nm and at 850–950 nm capture subtle curvature. Fine structures such as carotenoid-chlorophyll competition in 450–550 nm are enhanced despite noise sensitivity.

With regard to the integrated impact, combined preprocessing mitigates sample variations while enhancing discriminative features: normalization standardizes shapes, first-derivative extracts trends, second-derivative refines curvature. The transformed spectra improve feature interpretability for quantitative analysis.

Optimal preprocessing results

This study evaluated four spectral preprocessing methods, namely raw spectra, normalization, first-order derivative (1st derivative), and second-order derivative (2nd derivative), for their impact on partial least squares regression (PLSR) models predicting Chl a, Chl b, and Car content.

Fig. 7

Comparative performance of different spectral preprocessing methods on pigment prediction metrics: a coefficient of determination (R²); b standard error of cross-validation (SECV) and root mean square error of prediction (RMSEP); c ratio of performance to deviation (RPD)

Derivative preprocessing yielded only marginal improvements over raw spectra, with most metrics remaining comparable or slightly degraded across pigments. In contrast, the normalization strategy demonstrated unequivocal superiority, substantially outperforming all other methods in every evaluation dimension (Fig. 7). Comprehensive metrics (Table 2) confirmed its robust advantages: normalization significantly enhanced prediction accuracy (systematically higher \(\:{R}_{p}^{2}\)), reduced modeling errors (lower \(\:{RMSEP}\)/ \(\:SECV\)), and improved model stability (higher \(\:RPD\)) for all pigments. The most pronounced gains occurred in the carotenoid prediction, although substantial improvements consistently manifested across all biochemical parameters.

Table 2 Predictive performance of PLSR models with different spectral preprocessing methods

Model comparison and optimal model determination

Using hyperspectral data preprocessed via normalization, this study systematically assessed the predictive capabilities of PLSR, RF, and AdaBoost models for Chl a, Chl b, and Car contents (Table 3). All pigment datasets exhibited consistent model performance ranking: AdaBoost > RF > PLSR. Compared with PLSR, AdaBoost demonstrated comprehensive superiority, achieving a mean 10.9% improvement in the prediction set R² and a 22.7% increase in RPD, with carotenoid prediction reaching an RPD of 2.53. Compared with RF, AdaBoost delivered respective mean enhancements of 2.4% in prediction R² and 5.8% in RPD. Critical error metrics confirmed these advantages, including a 20% reduction in prediction error (RMSEP) for Chl a.

Table 3 Comparison of machine learning model performance for estimating pigment contents

As demonstrated in Fig. 8, AdaBoost’s measured versus predicted values cluster tightly along the 1:1 line for both the calibration and prediction sets, indicating strong fitting and generalization capabilities. Across all pigments, the prediction set exhibits tighter aggregation around the ideal line than the calibration set, particularly within typical concentration ranges. This consistency is occasionally interrupted by minor dispersion at high-concentration extremes. Notably, the prediction set fitting slopes approach the ideal value of 1.0, indicating reduced systematic bias.

Fig. 8

Measured vs. predicted values of pigment contents using the AdaBoost regression model: a Chl a; b Chl b; c Car

These distribution patterns corroborate the quantitative metrics. Although prediction set R² values appear suppressed due to range concentration, actual prediction accuracy is higher, as evidenced by consistently lower RMSEP than calibration SECV and visually reduced dispersion. AdaBoost’s dynamic weighting mechanism effectively captures nonlinear relationships between spectral features and pigment concentrations, enabling significant accuracy improvements (RPD > 2.4) in agriculturally critical monitoring ranges. The superiority of AdaBoost, validated through distribution compactness, error reduction (21% average RMSEP decrease versus PLSR), and stability (RPD threshold compliance), provides a robust technical foundation for crop physiological monitoring.

Results of feature wavelength selection

The feature wavelength selection results using the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) are presented in Supplementary Figures S2 and S3, respectively. The quantitative analysis (Table 4) demonstrates the systematic superiority of SPA. For chlorophyll a estimation, the 45 SPA-selected wavelengths achieved a prediction R² of 0.8207, exceeding CARS by 0.94 percentage points, while improving model robustness with a 2.12% higher RPD value. The chlorophyll b results further highlight the advantages of SPA: 44 wavelengths yielded a prediction R² of 0.8117 and an SECV of 0.0008, representing an accuracy improvement of 1.08 percentage points and an 11.1% error reduction compared with CARS’s sparse 9-wavelength model. The most significant improvement occurred in carotenoid prediction, where SPA’s 57 wavelengths outperformed CARS (11 wavelengths) by 3.29 percentage points in R² and 7.74% in RPD while maintaining an equivalent RMSEP. The combined SPA–CARS approach (Supplementary Table S4) further reduced the number of selected wavelengths but at the cost of decreased predictive accuracy, suggesting that overly aggressive dimensionality reduction may eliminate key spectral information and compromise model performance.

Table 4 Comparative modeling metrics across calibration and prediction sets: CARS vs. SPA for pigments quantification

Fig. 9 Comparison of characteristic wavelengths selected by the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS) for predicting three pigments: (a) Chl a; (b) Chl b; (c) Car

Wavelength distribution patterns (Fig. 9) reveal the underlying mechanisms. SPA achieves comprehensive spectral coverage: it preserves chlorophyll a’s red-edge sensitive region (680 nm) and near-infrared senescence-indicating peak (747 nm), precisely targets chlorophyll b’s primary absorption features (450–500 nm) and secondary responses (600–700 nm), and captures the carotenoid primary (470 nm) and secondary (490 nm) peaks. In contrast, CARS concentrates on local high-variance regions and thereby fragments the spectral fingerprints. For chlorophyll b, CARS retains only 9 wavelengths and loses critical 600–650 nm absorption information, and carotenoid characterization is compromised by the omission of the 490 nm secondary peak.
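SPA's tendency to spread its selection across the spectrum follows from its projection step, which repeatedly picks the band carrying the most information not already explained by the previously chosen band. The sketch below is a minimal version of that step only, working on mean-centred columns (an assumption). It omits the usual outer search over starting bands and subset sizes, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Select n_select column indices of X by successive projections."""
    Xp = np.asarray(X, dtype=float)
    Xp = Xp - Xp.mean(axis=0)            # work on a mean-centred copy
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick.
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0           # never pick the same band twice
        selected.append(int(np.argmax(norms)))
    return selected

# Synthetic example: bands 1 and 2 are collinear with bands 0 and 3.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 50))
X = np.column_stack([a, 2 * a, b, 3 * b, c])

bands = spa_select(X, n_select=3, start=0)
print(bands)
```

Collinear bands are projected to near-zero norm once one of them has been chosen, which is why redundant neighbouring wavelengths are skipped in favour of bands from other spectral regions.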

Fig. 10 Validation of model simplification based on feature wavelengths selected by CARS (a–c) and SPA (d–f) for predicting (a, d) Chl a, (b, e) Chl b, and (c, f) Car

Model simplification validation (Fig. 10) confirms that, despite an average reduction of 1.2 to 2.5 percentage points in prediction R² versus full-spectrum models, wavelength reduction (ranging from 87 to 94%) significantly enhances practicality. SPA maintains high accuracy (R² of 0.8207 for chlorophyll a) when only 45 wavelengths are used (10.4% of the full spectrum), with fitting curves approaching ideal responses, whereas CARS results in greater prediction bias despite the use of fewer wavelengths. Notably, SPA achieves an RPD value of 2.4583 for carotenoid prediction, indicating that feature selection enhances robustness by eliminating non-informative spectral variables. This approach preserves core predictive performance while establishing theoretical foundations for portable vegetation monitoring devices.
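The band-reduction percentages can be checked with simple arithmetic. The sketch below infers the full-spectrum band count from the statement that 45 wavelengths correspond to 10.4% of the spectrum. That count is an assumption derived here and is not given in this section, so the resulting percentages are approximate.

```python
def reduction_percent(n_selected, n_total):
    """Percentage of spectral bands removed by feature selection."""
    return 100.0 * (1.0 - n_selected / n_total)

# Assumption: total band count inferred from "45 wavelengths = 10.4%".
n_total = round(45 / 0.104)

# SPA wavelength counts quoted in the text for each pigment.
spa_counts = {"Chl a": 45, "Chl b": 44, "Car": 57}
reductions = {name: reduction_percent(n, n_total) for name, n in spa_counts.items()}

for name, pct in reductions.items():
    print(f"{name}: {spa_counts[name]} of {n_total} bands kept, {pct:.1f}% removed")
```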

Spatiotemporal inversion of pigment content

The established models were applied to monitor the dynamic changes in pigment content during the growth cycle of ginkgo seedlings. As shown in Fig. 11, the inversion results used heatmap color coding (blue: low concentration; yellow: medium concentration; red: high concentration) to reveal pigment distribution patterns across the four growth phases.

In Phase-1, the leaves presented predominantly red (high concentration) areas. This transitioned to yellow-dominated patterns with reduced red coverage in Phase-2. By Phase-3, yellow areas significantly diminished, whereas midrib regions retained stronger signals. Finally, in Phase-4, pigment distributions were barely distinguishable from those in the background. This progressive color variation comprehensively documents the physiological process from vigorous growth to senescence in Ginkgo seedlings.

Fig. 11 Spatially inverted distribution maps of Chl a, Chl b, and Car in Ginkgo seedlings across the four growth phases

The inversion results accurately captured the spatial characteristics of leaf senescence—color fading initially appeared at the leaf margins (Phase-3) and gradually progressed toward the petiole (Phase-4). This pattern aligns perfectly with the classical spatial progression of plant organ senescence (from distal to basal regions). The hyperspectral imaging technology, through its non-destructive 2D visualization capability, provides a new dimension for monitoring Ginkgo physiological status.
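One way such a spatial inversion can be produced is to apply a spectrum-level predictor to every plant pixel of the hyperspectral cube and leave background pixels empty. The sketch below assumes this pixel-wise approach; the linear predictor, its weights, and the threshold mask are hypothetical stand-ins for the trained model and the segmentation actually used.

```python
import numpy as np

def invert_pigment_map(cube, predict, plant_mask):
    """Apply a spectrum-level predictor to each masked pixel of a cube.

    cube: (rows, cols, bands) reflectance array.
    predict: callable mapping (n_pixels, bands) to (n_pixels,) pigment values.
    plant_mask: (rows, cols) boolean array, True for plant pixels.
    """
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    flat_mask = plant_mask.reshape(-1)
    pigment = np.full(rows * cols, np.nan)       # background stays NaN
    pigment[flat_mask] = predict(pixels[flat_mask])
    return pigment.reshape(rows, cols)

# Tiny synthetic cube: 4 x 5 pixels, 3 bands (hypothetical).
rng = np.random.default_rng(1)
cube = rng.uniform(0.05, 0.6, size=(4, 5, 3))
mask = cube[:, :, 2] > 0.3                       # stand-in plant/background split
weights = np.array([0.5, -0.2, 0.1])             # stand-in for a trained model

pigment_map = invert_pigment_map(cube, lambda s: s @ weights, mask)
print(pigment_map.shape)
```

The resulting 2D array can then be rendered with a blue-yellow-red colour map to obtain heatmaps of the kind described for Fig. 11.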

Discussion

This study implemented a high-resolution ground-based hyperspectral imaging system (spectral resolution: 2.8 nm; spatial resolution: 1936 × 1456 pixels) to enable non-destructive monitoring of ginkgo plants. Compared with conventional ASD single-point measurements [44] and UAV remote sensing approaches [45], this proximal sensing technology demonstrates superior performance in both data quality and leaf feature extraction. Building upon Li’s foundational work [32], we developed a specialized imaging protocol coupled with an adaptive spectral processing framework that successfully resolved the longstanding compatibility challenges between woody plants and high-throughput phenotyping platforms. This technological innovation not only advances ginkgo research but also establishes a methodological paradigm for hyperspectral analysis of other woody species. Furthermore, our whole-plant spectral measurements provide a more holistic characterization of physiological states, overcoming the spatial limitations of traditional point-based sampling.

Through an extensive sampling strategy encompassing 3,460 samples across multiple genotypes and senescence phases, our study significantly advances the field of senescence monitoring. Unlike previous investigations limited to healthy leaves [46], our model achieves robust predictive capability (RPD > 2.3) throughout all discoloration phases while effectively addressing NDVI saturation. The results reveal a distinct degradation pattern: (1) rapid initial decline (Phase-1–2), (2) moderated mid-term attenuation (Phase-2–3), and (3) accelerated late-phase yellowing (Phase-4). This nonlinear progression provides crucial insights into phase-dependent senescence dynamics that were previously overlooked in hyperspectral studies [32, 47, 48], including temporally resolved analyses [17]. Our systematic characterization of these degradation phases establishes a more reliable monitoring methodology, which is particularly valuable for economically important species such as ginkgo, and represents a significant improvement over existing phenotyping approaches.

The screening of spectral data preprocessing methods demonstrates that normalization is the optimal preprocessing strategy for spectral data. By standardizing data scales and eliminating dimensional interference, normalization effectively aligns spectral distributions with the requirements of PLSR, whereas derivative methods failed to yield meaningful model improvements. The consistent performance enhancements underscore normalization’s critical role in optimizing spectral models for plant pigment quantification. This conclusion is supported by prior studies, which acknowledge the importance of preprocessing but do not explicitly validate normalization’s superiority. For instance, Li et al. [32, 44] employed hyperspectral data for plant trait quantification, relying implicitly on effective preprocessing—yet our results uniquely identify normalization as the optimal approach, particularly for PLSR. In contrast, Daughtry [23] employed derivative methods without rigorously assessing their limitations; our findings not only confirm that derivatives do not enhance model performance but also reinforce normalization’s reliability for accurate pigment quantification.

This superiority arises because normalization enhances spectral interpretability while preserving physiologically relevant information. As illustrated in Figure S5 (a–d), raw spectra (a) display high variability in absolute reflectance due to factors such as illumination conditions, sensor geometry, and leaf orientation, which obscure biochemical features. Normalization (b) mitigates these artifacts by rescaling reflectance values to a standardized intensity range, thereby suppressing non-physiological noise while maintaining diagnostically significant spectral contours—including chlorophyll absorption features around 430, 460, 640, and 660 nm, the red-edge transition between 680 and 750 nm, and the NIR plateau beyond 750 nm [49]. In contrast, first- and second-derivative transformations (c, d), though effective at highlighting subtle absorption characteristics, tend to amplify high-frequency noise. This amplification compromises data quality and consequently reduces the robustness of models built on high-dimensional hyperspectral data [50].
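The contrast between normalization and derivative preprocessing can be sketched as follows. The example assumes per-spectrum min-max scaling as the normalization (the exact variant used in the study is not specified here) and uses a synthetic red-edge-like curve with hypothetical wavelengths and noise level.

```python
import numpy as np

def minmax_normalize(spectra):
    """Rescale each spectrum (row) to the [0, 1] range."""
    spectra = np.asarray(spectra, dtype=float)
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / (hi - lo)

def derivative(spectra, wavelengths, order=1):
    """Finite-difference derivative of each spectrum along wavelength."""
    out = np.asarray(spectra, dtype=float)
    for _ in range(order):
        out = np.gradient(out, wavelengths, axis=1)
    return out

def signal_to_noise(signal, noise):
    return (signal.max() - signal.min()) / noise.std()

wl = np.linspace(400.0, 1000.0, 301)                     # hypothetical band centres (nm)
base = 0.1 + 0.4 / (1.0 + np.exp(-(wl - 715.0) / 15.0))  # smooth red-edge-like curve

# 1) A brightness change (x1.5) disappears after normalization.
norm = minmax_normalize(np.vstack([base, 1.5 * base]))
same_shape = np.allclose(norm[0], norm[1])

# 2) Differentiation lowers the signal-to-noise ratio.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.005, size=wl.size)
d_clean = derivative(base[None, :], wl)
d_noise = derivative((base + noise)[None, :], wl) - d_clean
snr_raw = signal_to_noise(base, noise)
snr_d1 = signal_to_noise(d_clean, d_noise)
print(same_shape, round(snr_raw, 1), round(snr_d1, 1))
```

Min-max scaling removes multiplicative brightness differences while keeping the spectral shape, whereas differencing adjacent noisy bands shrinks the signal faster than the noise.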

The study also confirmed the importance of full-spectrum analysis in hyperspectral modeling. Unlike traditional methods relying on empirical vegetation indices [51], our AdaBoost-SPA integrated framework retained key spectral sensitivity regions while achieving an 85% dimensionality reduction [52].

At the methodological level, the optimized “Normalization-AdaBoost-SPA” framework significantly improved the model practicality and efficiency. Normalization effectively mitigated shadow and illumination variability issues in whole-plant measurements. This finding is consistent with Rautiainen et al.’s [53] conclusions in forest canopy research. Moreover, the AdaBoost algorithm demonstrated advantages in terms of large-sample processing speed and adaptability to class imbalance.

Furthermore, the feature selection capability of this framework identified a suite of characteristic bands that are both interpretable and mechanistically sound. These include the green reflectance peak at 550 nm, chlorophyll a absorption between 650 and 680 nm, the red-edge inflection point around 720–740 nm, and the NIR plateau beyond 800 nm. The close alignment of these features with established vegetation spectral principles [54, 55] confirms that the pipeline not only improves predictive accuracy but also strengthens the biological interpretability of the model, creating a credible mapping between spectral features and pigment dynamics. Collectively, these attributes—noise suppression through normalization, targeted dimensionality reduction, and powerful ensemble modeling—enable the proposed framework to achieve an optimal balance between signal fidelity and feature relevance, outperforming derivative-based approaches in both accuracy and mechanistic insight.

Consequently, this technical approach is not only suitable for laboratory research but also offers a feasible solution for future field-scale high-throughput phenotyping analysis.

The established “Normalization-AdaBoost-SPA” framework provides a reliable tool for high-throughput phenotyping of ginkgo germplasm resources. Its application prospects are mainly reflected in three aspects: First, this technology can be integrated with multi-omics platforms. For example, combining hyperspectral-predicted photosynthetic pigment phenotypes with genomic and metabolomic data can reveal the genetic basis of pigment metabolism and accelerate the selection of genotypes with high photosynthetic efficiency or specific secondary metabolite content. Second, the key characteristic wavelengths identified in this study (such as the characteristic absorption band of Chl a around 650–680 nm) provide a basis for developing low-cost, portable multispectral sensors suitable for large-scale field breeding screening, addressing the issues of expensive hyperspectral equipment and complex data processing. Finally, this model has unique advantages for monitoring the dynamic changes of pigments during the autumn leaf color transition period of ginkgo, and is expected to be applied in landscape plant physio-ecological monitoring and optimization of harvesting time for medicinal plants.

However, this study has some limitations that need to be addressed in future work. First, the model training data all came from a greenhouse seedling environment. Environmental factors (such as water stress and soil nutrient differences) may affect the spectral response, and introducing environmental correction factors may be necessary to improve the model’s generalization ability. Second, although the AdaBoost algorithm performed excellently, its computational efficiency when processing extremely high-dimensional spectral data may still become a bottleneck for large-scale real-time processing. Lightweight deep learning models (such as 1-D CNN [56, 57]) or model compression techniques could be explored in the future to balance accuracy and efficiency. Third, hyperspectral imaging is susceptible to ambient light changes and leaf inclination angles. Although normalized preprocessing alleviated this problem to some extent, developing active imaging systems that incorporate sensor fusion approaches (e.g., IMU, LiDAR, or polarization) may be an important direction for achieving robust field measurements.

In summary, the hyperspectral analysis framework proposed in this study shows great potential in non-destructive detection of ginkgo pigments, laying a technical foundation for its application in precision breeding and physiological monitoring. Future research should focus on promoting the transition of this technology from controlled environments to complex field environments and improving its robustness and practicality through multi-technology integration.

Conclusion

This study focused on Ginkgo biloba seedlings and established a non-destructive prediction model for chlorophyll and carotenoid contents based on hyperspectral imaging technology. By combining normalization preprocessing, AdaBoost regression, and SPA feature selection, sensitive spectral features in the visible range were optimized, resulting in a high-precision and stable prediction model. This model is suitable for rapid phenotyping analysis of Ginkgo biloba seedlings and provides an efficient, non-destructive detection method for high-throughput screening of germplasm resources and breeding of superior varieties.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

Chl a:

Chlorophyll a

Chl b:

Chlorophyll b

Car:

Carotenoids

AdaBoost:

Adaptive boosting

PLSR:

Partial least squares regression

RF:

Random forest

CARS:

Competitive adaptive reweighted sampling

SPA:

Successive projections algorithm

ROI:

Region of interest

R²cv:

Cross-validated coefficient of determination

SECV:

Standard error of cross-validation

R²p:

Coefficient of determination of the prediction set

RMSEP:

Root mean square error of prediction

RPD:

Ratio of performance to deviation

EDF:

Exponentially decreasing function

References

  1. Zhao YP, Fan G, Yin PP, Sun S, Li N, Hong X, Hu G, Zhang H, Zhang FM, Han JD, Hao YJ, Xu Q, Yang X, Xia W, Chen W, Lin HY, Zhang R, Chen J, Zheng XM, Lee SMY, Lee J, Uehara K, Wang J, Yang H, Fu CX, Liu X, Xu X, Ge S. Resequencing 545 Ginkgo genomes across the world reveals the evolutionary history of the living fossil. Nat Commun. 2019;10(1):4201. https://doi.org/10.1038/s41467-019-12133-5.

  2. Lu J, Xu Y, Meng Z, Cao M, Liu S, Kato-Noguchi H, Yu W, Jin B, Wang L. Integration of morphological, physiological and multi-omics analysis reveals the optimal planting density improving leaf yield and active compound accumulation in Ginkgo Biloba. Ind Crops Prod. 2021;172:114055. https://doi.org/10.1016/j.indcrop.2021.114055.

  3. Van Beek TA, Montoro P. Chemical analysis and quality control of Ginkgo Biloba leaves, extracts, and phytopharmaceuticals. J Chromatogr A. 2009;1216(11):2002–32. https://doi.org/10.1016/j.chroma.2009.01.013.

  4. Lieth H, Whittaker RH, editors. Primary productivity of the biosphere. Volume 14. Springer Berlin Heidelberg; 1975. https://doi.org/10.1007/978-3-642-80913-2.

  5. Gitelson AA, Viña A, Ciganda V, Rundquist DC, Arkebauer TJ. Remote Estimation of canopy chlorophyll content in crops. Geophys Res Lett. 2005;32(8):2005GL022688. https://doi.org/10.1029/2005GL022688.

  6. Demmig-Adams B. Carotenoids and photoprotection in plants: A role for the xanthophyll Zeaxanthin. Biochim Et Biophys Acta (BBA) - Bioenergetics. 1990;1020(1):1–24. https://doi.org/10.1016/0005-2728(90)90088-L.

  7. Frank HA, Brudvig GW. Redox functions of carotenoids in photosynthesis. Biochemistry. 2004;43(27):8607–15. https://doi.org/10.1021/bi0492096.

  8. Xu Y, Wang G, Cao F, Zhu C, Wang G, El-Kassaby YA. Light intensity affects the growth and flavonol biosynthesis of Ginkgo (Ginkgo Biloba L). New Forest. 2014;45(6):765–76. https://doi.org/10.1007/s11056-014-9435-7.

  9. Zhang H, Ge Y, Xie X, Atefi A, Wijewardane NK, Thapa S. High throughput analysis of leaf chlorophyll content in sorghum using RGB, hyperspectral, and fluorescence imaging and sensor fusion. Plant Methods. 2022;18(1):60. https://doi.org/10.1186/s13007-022-00892-0.

  10. Kurepin LV, Zaman M, Pharis RP. Phytohormonal basis for the plant growth promoting action of naturally occurring biostimulators: basis for the action of biostimulators. J Sci Food Agric. 2014;94(9):1715–22. https://doi.org/10.1002/jsfa.6545.

  11. Arnon DI. Copper enzymes in isolated chloroplasts. Polyphenoloxidase in BETA VULGARIS. Plant Physiol. 1949;24(1):1–15. https://doi.org/10.1104/pp.24.1.1.

  12. Houborg R, McCabe MF. A hybrid training approach for leaf area index Estimation via cubist and random forests machine-learning. ISPRS J Photogrammetry Remote Sens. 2018;135:173–88. https://doi.org/10.1016/j.isprsjprs.2017.10.004.

  13. Koirala B, Zahiri Z, Scheunders P. A machine learning framework for estimating leaf biochemical parameters from its spectral reflectance and transmission measurements. IEEE Trans Geosci Remote Sens. 2020;58(10):7393–405. https://doi.org/10.1109/TGRS.2020.2982263.

  14. Zhang J, Tian H, Wang D, Li H, Mouazen AM. A novel spectral index for Estimation of relative chlorophyll content of sugar beet. Comput Electron Agric. 2021;184:106088. https://doi.org/10.1016/j.compag.2021.106088.

  15. Chen X, Li F, Shi B, Chang Q. Estimation of winter wheat plant nitrogen concentration from UAV hyperspectral remote sensing combined with machine learning methods. Remote Sens. 2023;15(11):2831. https://doi.org/10.3390/rs15112831.

  16. Yuan Z, Cao Q, Zhang K, Ata-Ul-Karim ST, Tian Y, Zhu Y, Cao W, Liu X. Optimal leaf positions for SPAD meter measurement in rice. Front Plant Sci. 2016;7. https://doi.org/10.3389/fpls.2016.00719.

  17. Ye Z, Tan X, Dai M, Chen X, Zhong Y, Zhang Y, Ruan Y, Kong D. A hyperspectral deep learning attention model for predicting lettuce chlorophyll content. Plant Methods. 2024;20(1):22. https://doi.org/10.1186/s13007-024-01148-9.

  18. Liu H, Bruning B, Garnett T, Berger B. Hyperspectral imaging and 3D technologies for plant phenotyping: from satellite to close-range sensing. Comput Electron Agric. 2020;175:105621. https://doi.org/10.1016/j.compag.2020.105621.

  19. Uddling J, Gelang-Alfredsson J, Piikki K, Pleijel H. Evaluating the relationship between leaf chlorophyll concentration and SPAD-502 chlorophyll meter readings. Photosynth Res. 2007;91(1):37–46. https://doi.org/10.1007/s11120-006-9077-5.

  20. Gonthier P, Garbelotto M, Nicolotti G. Swiss stone pine trees and Spruce stumps represent an important habitat for Heterobasidion spp. In subalpine forests. Forest Pathol. 2003;33(3):191–203. https://doi.org/10.1046/j.1439-0329.2003.00323.x.

  21. Gitelson AA, Gritz Y, Merzlyak MN. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J Plant Physiol. 2003;160(3):271–82. https://doi.org/10.1078/0176-1617-00887.

  22. Sims DA, Gamon JA. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens Environ. 2002;81(2–3):337–54. https://doi.org/10.1016/S0034-4257(02)00010-X.

  23. Daughtry C. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens Environ. 2000;74(2):229–39. https://doi.org/10.1016/S0034-4257(00)00113-9.

  24. Zhi X, Massey-Reed SR, Wu A, Potgieter A, Borrell A, Hunt C, Jordan D, Zhao Y, Chapman S, Hammer G, George-Jaeggli B. Estimating photosynthetic attributes from High-Throughput canopy hyperspectral sensing in sorghum. Plant Phenomics. 2022;2022:9768502. https://doi.org/10.34133/2022/9768502.

  25. Bendig J, Yu K, Aasen H, Bolten A, Bennertz S, Broscheit J, Gnyp ML, Bareth G. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int J Appl Earth Obs Geoinf. 2015;39:79–87. https://doi.org/10.1016/j.jag.2015.02.012.

  26. Deery D, Jimenez-Berni J, Jones H, Sirault X, Furbank R. Proximal remote sensing buggies and potential applications for Field-Based phenotyping. Agronomy. 2014;4(3):349–79. https://doi.org/10.3390/agronomy4030349.

  27. Jin X, Zarco-Tejada PJ, Schmidhalter U, Reynolds MP, Hawkesford MJ, Varshney RK, Yang T, Nie C, Li Z, Ming B, Xiao Y, Xie Y, Li S. High-Throughput Estimation of crop traits: A review of ground and aerial phenotyping platforms. IEEE Geoscience Remote Sens Magazine. 2021;9(1):200–31. https://doi.org/10.1109/MGRS.2020.2998816.

  28. Zarco-Tejada PJ, Guillén-Climent ML, Hernández-Clemente R, Catalina A, González MR, Martín P. Estimating leaf carotenoid content in vineyards using high resolution hyperspectral imagery acquired from an unmanned aerial vehicle (UAV). Agric for Meteorol. 2013;171–172:281–94. https://doi.org/10.1016/j.agrformet.2012.12.013.

  29. Cheng T, Riaño D, Ustin SL. Detecting diurnal and seasonal variation in canopy water content of nut tree orchards from airborne imaging spectroscopy data using continuous wavelet analysis. Remote Sens Environ. 2014;143:39–53. https://doi.org/10.1016/j.rse.2013.11.018.

  30. Aasen H, Burkart A, Bolten A, Bareth G. Generating 3D hyperspectral information with lightweight UAV snapshot cameras for vegetation monitoring: from camera calibration to quality assurance. ISPRS J Photogrammetry Remote Sens. 2015;108:245–59. https://doi.org/10.1016/j.isprsjprs.2015.08.002.

  31. Kanning M, Kühling I, Trautz D, Jarmer T. High-resolution UAV-based hyperspectral imagery for LAI and chlorophyll estimations from wheat for yield prediction. Remote Sens. 2018;10(12):2000. https://doi.org/10.3390/rs10122000.

  32. Li W, Weiss M, Jay S, Wei S, Zhao N, Comar A, Lopez-Lozano R, De Solan B, Yu Q, Wu W, Baret F. Daily monitoring of effective green area index and vegetation chlorophyll content from continuous acquisitions of a multi-band spectrometer over winter wheat. Remote Sens Environ. 2024;300:113883. https://doi.org/10.1016/j.rse.2023.113883.

  33. Yin S, Zhou K, Cao L, Shen X. Estimating the horizontal and vertical distributions of pigments in canopies of Ginkgo plantation based on UAV-Borne lidar, hyperspectral data by coupling PROSAIL model. Remote Sens. 2022;14(3):715. https://doi.org/10.3390/rs14030715.

  34. Yue Z, Zhang Q, Zhu X, Zhou K. Chlorophyll content Estimation of Ginkgo seedlings based on deep learning and hyperspectral imagery. Forests. 2024;15(11):2010. https://doi.org/10.3390/f15112010.

  35. Hu B. Studies on Ginkgo Biloba Leaf Chlorophyll Content Estimation Based on Image Analysis and Hyperspectral Analysis. Master’s thesis. Shenyang Agricultural University; 2017.

  36. Lichtenthaler HK, Wellburn AR. Determinations of total carotenoids and chlorophylls a and b of leaf extracts in different solvents. Biochem Soc Trans. 1983;11(5):591–2. https://doi.org/10.1042/bst0110591.

  37. Kennard RW, Stone LA. Computer aided design of experiments. Technometrics. 1969;11(1):137–48. https://doi.org/10.1080/00401706.1969.10490666.

  38. Wu Q, Zhu Z, Wu J, Xu X. A dataset representativeness metric and a slicing sampling strategy for the Kennard-Stone algorithm. Chem J Chin Universities. 2022;43(10):0397. https://doi.org/10.7503/cjcu20220397.

  39. Abdi H. Partial least squares regression and projection on latent structure regression (PLS Regression). WIRE Comput Stat. 2010;2(1):97–106. https://doi.org/10.1002/wics.51.

  40. Merentitis A, Debes C, Heremans R. Ensemble learning in hyperspectral image classification: toward selecting a favorable Bias-Variance tradeoff. IEEE J Sel Top Appl Earth Observations Remote Sens. 2014;7(4):1089–102. https://doi.org/10.1109/JSTARS.2013.2295513.

  41. Fan W, Stolfo SJ, Zhang J. The application of AdaBoost for distributed, scalable and on-line learning. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 1999:362–366. https://doi.org/10.1145/312129.312283

  42. Ding D, Yu H, Yin Y, Yuan Y, Li Z, Li F. Determination of chlorophyll and hardness in cucumbers by Raman spectroscopy with successive projections algorithm (SPA) – Extreme learning machine (ELM). Anal Lett. 2023;56(8):1216–28. https://doi.org/10.1080/00032719.2022.2123922.

  43. Li H, Liang Y, Xu Q, Cao D. Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal Chim Acta. 2009;648(1):77–84. https://doi.org/10.1016/j.aca.2009.06.046.

  44. Atzberger C, Guérif M, Baret F, Werner W. Comparative analysis of three chemometric techniques for the spectroradiometric assessment of canopy chlorophyll content in winter wheat. Comput Electron Agric. 2010;73(2):165–73. https://doi.org/10.1016/j.compag.2010.05.006.

  45. Dalponte M, Ørka HO, Ene LT, Gobakken T, Næsset E. Tree crown delineation and tree species classification in boreal forests using hyperspectral and ALS data. Remote Sens Environ. 2014;140:306–17. https://doi.org/10.1016/j.rse.2013.09.006.

  46. Ustin SL, Gamon JA. Remote sensing of plant functional types. New Phytol. 2010;186(4):795–816. https://doi.org/10.1111/j.1469-8137.2010.03284.x.

  47. Yendrek CR, Tomaz T, Montes CM, Cao Y, Morse AM, Brown PJ, McIntyre LM, Leakey ADB, Ainsworth EA. High-Throughput phenotyping of maize leaf physiological and biochemical traits using hyperspectral reflectance. Plant Physiol. 2017;173(1):614–26. https://doi.org/10.1104/pp.16.01447.

  48. Angel Y, McCabe MF. Machine learning strategies for the retrieval of Leaf-Chlorophyll dynamics: model choice, sequential versus retraining learning, and hyperspectral predictors. Front Plant Sci. 2022;13. https://doi.org/10.3389/fpls.2022.722442.

  49. Song D, Gao D, Sun H, Qiao L, Zhao R, Tang W, Li M. Chlorophyll content Estimation based on cascade spectral optimizations of interval and wavelength characteristics. Comput Electron Agric. 2021;189:106413. https://doi.org/10.1016/j.compag.2021.106413.

  50. Chen X, Dong Z, Liu J, Wang H, Zhang Y, Chen T, Du Y, Shao L, Xie J. Hyperspectral characteristics and quantitative analysis of leaf chlorophyll by reflectance spectroscopy based on a genetic algorithm in combination with partial least squares regression. Spectrochim Acta Part A Mol Biomol Spectrosc. 2020;243:118786. https://doi.org/10.1016/j.saa.2020.118786.

  51. Verma B, Prasad R, Srivastava PK, Yadav SA, Singh P, Singh RK. Investigation of optimal vegetation indices for retrieval of leaf chlorophyll and leaf area index using enhanced learning algorithms. Comput Electron Agric. 2022;192:106581. https://doi.org/10.1016/j.compag.2021.106581.

  52. Xu Y, Mao Y, Li H, Sun L, Wang S, Li X, Shen J, Yin X, Fan K, Ding Z, Wang Y. A deep learning model for rapid classification of tea coal disease. Plant Methods. 2023;19(1):98. https://doi.org/10.1186/s13007-023-01074-2.

  53. Rautiainen M, Lukeš P, Homolová L, Hovi A, Pisek J, Mõttus M. Spectral properties of coniferous forests: A review of in situ and laboratory measurements. Remote Sens. 2018;10(2):207. https://doi.org/10.3390/rs10020207.

  54. Liu N, Xing Z, Zhao R, Qiao L, Li M, Liu G, Sun H. Analysis of chlorophyll concentration in potato crop by coupling continuous wavelet transform and spectral variable optimization. Remote Sens. 2020;12(17):2826. https://doi.org/10.3390/rs12172826.

  55. Cheng J, Yang G, Xu W, Feng H, Han S, Liu M, Zhao F, Zhu Y, Zhao Y, Wu B, Yang H. Improving the Estimation of Apple leaf photosynthetic pigment content using fractional derivatives and machine learning. Agronomy. 2022;12(7):1497. https://doi.org/10.3390/agronomy12071497.

  56. Sun Q, Zhao G, Xia X, Xie Y, Fang C, Sun L, Wu Z, Pan C. Hyperspectral image classification based on Multi-Scale convolutional features and Multi-Attention mechanisms. Remote Sens. 2024;16(12):2185. https://doi.org/10.3390/rs16122185.

  57. Jia J, Zheng X, Wang Y, Chen Y, Karjalainen M, Dong S, Lu R, Wang J, Hyyppä J. The effect of artificial intelligence evolving on hyperspectral imagery with different signal-to-noise ratio, spectral and Spatial resolutions. Remote Sens Environ. 2024;311:114291. https://doi.org/10.1016/j.rse.2024.114291.

Funding

This work was funded by the Jiangsu Agriculture Science and Technology Innovation Fund (CX(23)1027) and the National Natural Science Foundation of China (32101521).

Author information

Contributions

X.Y. and F.F. conceptualized the study and designed the methodology. X.Y., Z.W., and L.L. conducted the investigation. X.Y. performed data analysis, validated results, managed data, and wrote the original draft. F.F. and F.C. provided resources, supervised the research, and managed the project. X.Y., J.H., M.M., GW, and K.Z. reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Fuliang Cao, Kai Zhou or Fangfang Fu.

Ethics declarations

Ethics approval and consent to participate

All authors agreed to publish this manuscript.

Consent for publication

Consent and approval for publication were obtained from all authors.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Yang, X., Wei, Z., Li, L. et al. Large-scale non-destructive crown-level assessment of Ginkgo pigments via hyperspectral and machine learning techniques. Plant Methods 21, 130 (2025). https://doi.org/10.1186/s13007-025-01439-9

  • DOI: https://doi.org/10.1186/s13007-025-01439-9

Keywords