
mitkeng/SEER



S∈∈R: Gas Phase Molecular Charge State Predictor



Introduction

SEER (State Ensemble Energy Recognition) is a hybrid knowledge-based machine learning program that ranks molecular charge sites and then predicts the gas-phase "global" minimum-energy charge state. Its objective is to assign a minimum-energy charge state to a given molecule accurately, with higher-energy ranked candidate charge states available as auxiliary models, and thereby avoid the heavy workload and computational cost of charge modeling a system with numerous titratable sites. The program is appropriate for modeling the mass spectrometry $[M-H]^-$ and $[M+H]^+$ charge modes.

The ensembles of molecular ions used to train SEER were geometry optimized with density functional theory at the D3BJ-B3LYP/6-31G(d,p) or D3BJ-B3LYP/6-31+G(d,p) level. The accuracy of our previously predicted gas-phase molecular ions (used in the training) was assessed and screened by comparing computed collision cross section values against experimental ion mobility mass spectrometry reference values.

Benefits

  • Fast turnaround time
  • Unambiguous results
  • Good generalizability
  • Competitive accuracy
  • No structural artifacts
  • User-friendly interface
  • Seamless workflow integration

Specifications

Currently, SEER supports biomolecules and small molecules in which the commonly observed oxygen and nitrogen atoms act as the proton-donating or proton-accepting sites.

Although the ANI-2x geometry optimization used in S∈∈R supports only the atom types H, C, N, O, F, Cl, and S, we run a surrogate optimization for the atom types P, Se, Br, and I to extend SEER's applicability to systems containing these atoms.

Functionalities

  • Rank charge sites
  • Predict equilibrium charge state
  • Auto generate $[M+H]^+$ or $[M-H]^-$ structure
  • Soft geometry optimization
  • Compute model relative energy score

🧬 SEER Workflow Documentation


🛠 Setup & Installation

01. Environment Initialization

Initialize the system by executing the core installation script:

bash install_seer.sh

Important

Colab Session Restart Required: The environment will automatically restart. Once reconnected, deploy the PyMOL dependency via the Mamba package manager:

!mamba install -c schrodinger pymol-bundle --yes

🛰 Resource Acquisition

02. Models Retrieval

Populate the local directory with the required pre-trained neural models:

# Pre-trained Models
!wget https://github.com/mitkeng/SEER/raw/refs/heads/main/models/seer_neg_model.zip
!wget https://github.com/mitkeng/SEER/raw/refs/heads/main/models/seer_pos_model.zip

⚡ Execution

03. Running the Workflow

Invoke the seer.py engine by providing the SMILES string and ion configuration.

!python seer.py --smiles "CNC[C@@H](C1=CC(=C(C=C1)O)O)O" --name adrenaline --mode "[M+H]+"

Parameter Specification

| Argument | Definition | Requirements |
|----------|------------|--------------|
| --smiles | Target molecule identifier | Valid SMILES string |
| --name | Project identifier | Output directory name |
| --mode | Ionization mode (Optional) | [M+H]+ (Pos) or [M-H]- (Neg) |
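
When several molecules or both ion modes need to be processed, the command above can be scripted. The following is a minimal sketch, assuming that seer.py sits in the working directory and accepts exactly the three arguments listed in the table. The helper names `build_seer_command` and `run_batch` are hypothetical and not part of SEER, and the behavior when `--mode` is omitted is not documented here, so the sketch simply leaves the flag out in that case.

```python
import shlex
import subprocess
import sys


def build_seer_command(smiles, name, mode=None, script="seer.py"):
    """Build the argument list for one seer.py run (hypothetical helper)."""
    cmd = [sys.executable, script, "--smiles", smiles, "--name", name]
    if mode is not None:
        # --mode is listed as optional; pass it only when requested.
        cmd += ["--mode", mode]
    return cmd


def run_batch(jobs, dry_run=True):
    """Build one command per job and run them in order unless dry_run is set."""
    commands = [build_seer_command(**job) for job in jobs]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


# Example jobs: the adrenaline SMILES from this README in both ion modes.
# The second job name is an arbitrary label chosen for illustration.
jobs = [
    {"smiles": "CNC[C@@H](C1=CC(=C(C=C1)O)O)O", "name": "adrenaline", "mode": "[M+H]+"},
    {"smiles": "CNC[C@@H](C1=CC(=C(C=C1)O)O)O", "name": "adrenaline_neg", "mode": "[M-H]-"},
]

# Dry run: print the commands without executing seer.py.
for cmd in run_batch(jobs, dry_run=True):
    print(shlex.join(cmd))
```

Passing `dry_run=False` executes the runs one after another. Building the command as a list avoids shell quoting problems with SMILES characters such as parentheses and `@`.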

📊 Output Manifest

04. Result Analysis

All analytical outputs are routed to Completed_Job/[molecule_name]/.

  • final_ranking_summary.csv — Quantitative energy ranking of protomers.
  • summary.txt — System-generated filtering report.
  • 3D Models — Optimized 3D geometric protomer structures.
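
The ranking file can be inspected programmatically. The sketch below assumes only the directory layout described above (Completed_Job/[molecule_name]/final_ranking_summary.csv) and makes no assumption about the column names, which are not listed in this README; it reads whatever header the file contains. The function name `load_ranking` is hypothetical.

```python
import csv
from pathlib import Path


def load_ranking(job_name, root="Completed_Job"):
    """Return (column_names, rows) from a job's final_ranking_summary.csv."""
    path = Path(root) / job_name / "final_ranking_summary.csv"
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by the header names
        columns = list(reader.fieldnames or [])
    return columns, rows


# Example: show the header and the first few rows if the job has been run.
job = "adrenaline"
if (Path("Completed_Job") / job / "final_ranking_summary.csv").exists():
    columns, rows = load_ranking(job)
    print(columns)
    for row in rows[:5]:
        print(row)
```

All values are returned as strings; convert the energy or score columns to `float` once their names are known from the printed header.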

System Automated Documentation | SEER v1.0

Additional Information

Input xyz file format:

32               <--- number of atoms
Adenosine        <--- system name
C          0.99780        0.54800        0.55510    
N          0.50730       -0.59360        0.05790
C         -0.81000       -0.67980       -0.21870
C         -1.63810        0.44290        0.02080
C         -0.97870        1.58660        0.54380
N          0.34920        1.68350        0.83200
N         -1.95520        2.54830        0.68610
C         -3.12290        1.92820        0.28910
N         -2.99430        0.69750       -0.13430
C         -1.76470        3.92500        1.24140
C         -0.69150        4.70330        0.43840
C         -1.45460        5.92710       -0.04840
C         -2.88530        5.38690       -0.11800
O         -2.97540        4.60630        1.06550
C         -4.03200        6.41380       -0.17650
O         -3.81480        7.50930        0.69290
O         -0.95000        6.36360       -1.29830
O          0.39650        5.14920        1.22310
...

S∈∈R output xyz file for $[M+H]^+$:

33                   <--- number of atoms
model_energy_-963.90 <--- state energy
C          1.05953          0.66186          0.43596
N          0.52345         -0.51138         -0.04995
H          1.15112         -1.27216         -0.28706
C         -0.82830         -0.63465         -0.30184
C         -1.59541          0.47619         -0.01025
C         -0.94599          1.60849          0.46715
N          0.38790          1.73564          0.69982
N         -1.92473          2.55063          0.65226
C         -3.10928          1.92333          0.30223
N         -2.96391          0.69732         -0.10316
C         -1.77957          3.91399          1.17078
C         -0.73075          4.75616          0.41687
C         -1.52082          5.96647         -0.07566
C         -2.93187          5.39797         -0.14912
O         -3.01393          4.57743          1.02142
C         -4.05373          6.40627         -0.15382
O         -3.77147          7.42167          0.78567
O         -1.01295          6.42594         -1.30962
...
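
Both layouts above follow the usual xyz convention: an atom count, a comment line, then one line per atom. A minimal parsing sketch follows, assuming that the comment line of a S∈∈R output file starts with `model_energy_` followed by the numeric value, as in the listing above. The function names are hypothetical, and the sample string is a three-atom excerpt of the listing with the count adjusted, used for illustration only.

```python
def parse_xyz(text):
    """Parse xyz text into (comment, [(symbol, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])  # first token of line 1 is the atom count
    comment = lines[1].strip()          # system name or state energy label
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


def energy_from_comment(comment, prefix="model_energy_"):
    """Extract the energy value from a comment such as 'model_energy_-963.90'."""
    if not comment.startswith(prefix):
        return None  # e.g. an input file whose comment is the system name
    return float(comment[len(prefix):].split()[0])


# Three-atom excerpt of the output listing above (count adjusted to 3).
sample = """3
model_energy_-963.90
C          1.05953          0.66186          0.43596
N          0.52345         -0.51138         -0.04995
H          1.15112         -1.27216         -0.28706
"""

comment, atoms = parse_xyz(sample)
print(energy_from_comment(comment))
print([symbol for symbol, *_ in atoms])
```

For an input file, `energy_from_comment` returns `None` because the comment line holds the system name rather than an energy.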

Web Application Option

Open in Colab: access to the SEER application

Limitation

Titratable sites for protonation or deprotonation are limited to nitrogen and oxygen; all other atom types are completely neglected.

Disclaimer and Remarks

All data used in the development of this ML model were generated in-house. We do not intend to make publicly available any original quantum mechanical data pertaining to the DFT geometry optimization and single point energy calculation, unless reasonably requested. SEER results may differ significantly from pKa-based methods since 1) pKa is not considered in the training and 2) the target is the gas phase.

Although only "soft" geometry optimization is carried out, we recommend visually comparing the input and output structures to confirm that the original molecular integrity is preserved, apart from the change at the protonation or deprotonation site.
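
A quick automated check can complement the visual comparison, though it does not replace it. The sketch below assumes that both structures are available as xyz text in the format shown earlier, and verifies only the elemental composition: heavy-atom counts should be unchanged, and the hydrogen count should change by +1 for $[M+H]^+$ or -1 for $[M-H]^-$. It does not inspect connectivity or geometry. The function names are hypothetical, and the coordinates in the example are placeholders rather than real geometries.

```python
from collections import Counter


def element_counts(xyz_text):
    """Count atoms per element symbol in xyz-formatted text."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    return Counter(line.split()[0] for line in lines[2:2 + n_atoms])


def composition_check(input_xyz, output_xyz, mode):
    """Compare elemental composition before and after the charge change."""
    before = element_counts(input_xyz)
    after = element_counts(output_xyz)
    expected_h_change = {"[M+H]+": 1, "[M-H]-": -1}[mode]
    heavy_before = {el: n for el, n in before.items() if el != "H"}
    heavy_after = {el: n for el, n in after.items() if el != "H"}
    return {
        "heavy_atoms_preserved": heavy_before == heavy_after,
        "hydrogen_change_ok": after["H"] - before["H"] == expected_h_change,
    }


# Toy example with placeholder coordinates (not real geometries).
neutral = """3
toy_input
O 0.0 0.0 0.0
H 0.0 0.0 1.0
H 0.0 1.0 0.0
"""
protonated = """4
toy_output
O 0.0 0.0 0.0
H 0.0 0.0 1.0
H 0.0 1.0 0.0
H 1.0 0.0 0.0
"""

print(composition_check(neutral, protonated, "[M+H]+"))
```

A failed check points to a lost or gained atom during structure generation; a passed check still leaves bond rearrangements undetected, which is why the visual comparison remains necessary.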

Literature Citation

Software can be cited at: https://pubs.acs.org/action/showCitFormats?doi=10.1021/jasms.5c00078

Acknowledgement

  • Merz research group

  • Quantum mechanical calculations and data used in building this model were organically generated through computational resources and services provided by the Institute for Cyber-Enabled Research (ICER) at Michigan State University.