Predicting CO2 Absorption in Ionic Liquids with Molecular Descriptors and Explainable Graph Neural Networks

Yue Jian,† Yuyang Wang,‡,¶ and Amir Barati Farimani,‡,†,¶
†Department of Materials Science and Engineering, Carnegie Mellon University, USA
‡Department of Mechanical Engineering, Carnegie Mellon University, USA
¶Machine Learning Department, Carnegie Mellon University, USA
E-mail: barati@cmu.edu
arXiv:2210.01120v2 [physics.chem-ph] 9 Nov 2022
Abstract
Ionic liquids (ILs) provide a promising solution for CO2 capture and storage to
mitigate global warming. However, identifying and designing high-capacity ILs from
the vast chemical space requires expensive and exhaustive simulations and
experiments. Machine learning (ML) can accelerate the search for desirable ionic
molecules through accurate and efficient data-driven property prediction. However,
existing descriptors and ML models for ionic molecules make inefficient use of the
molecular graph structure. Moreover, few works have investigated the explainability
of ML models, which could reveal the learned features and guide the design of
efficient ionic molecules. In this work, we develop both fingerprint-based ML models
and Graph Neural Networks (GNNs) to predict CO2 absorption in ILs. Fingerprints
operate on the graph structure at the feature extraction stage, while GNNs handle
the molecular structure directly in both the feature extraction and model prediction
stages. We show that our method outperforms previous ML models, reaching high
accuracy (MAE of 0.0137, R² of 0.9884). Furthermore, we take advantage of the GNN
feature representation and develop a substructure-based explanation method that
provides insight into how each chemical fragment within an IL molecule contributes
to the CO2 absorption predicted by the ML models. We also show that our results
agree with some ground truth on functional group importance drawn from the
theoretical understanding of CO2 absorption in ILs, which can inform the design of
novel and efficient functional ILs in the future.
Introduction
Global warming is a major environmental problem. According to the Intergovernmental
Panel on Climate Change (IPCC), the global average temperature will rise by about
1.9 °C by 2100 if no action is taken [1]. Among all greenhouse gases, CO2 contributes
the most to global warming, accounting for about 78.6% [2]. Effectively capturing and
storing CO2 is therefore crucial for addressing global warming. Existing methods for
absorbing CO2 include physisorption/chemisorption [3,4], membrane separation [5] or
molecular sieves [6], carbamation, amine physical absorption [7], amine dry
scrubbing [8], and mineral carbonation [9,10]. However, the reagents used in these
methods suffer from insufficient CO2 storage capacity, high energy demand in the
absorption process, and low thermal stability [11,12]. The evaporation and
degradation of reagents can also make the storage process costly [13].
Ionic liquids (ILs) are a family of molten salts that remain liquid at room
temperature. Over the past decades, they have received significant attention and
become an area of intensive research due to their unique physical and chemical
properties, such as nonvolatility, high chemical stability, high CO2 solubility, and
easy operation in the liquid state. These properties make ILs ideal candidates for
CO2 storage [14–22]. An IL usually consists of a pair of ions with opposite charges,
and the combination of ions largely determines its properties. However,
such combinations of cations and anions, together with the wide choice of cations and
anions themselves, make it challenging to explore the IL design space exhaustively
for efficient CO2 storage through experiments. To estimate the CO2 absorption of ILs
efficiently, researchers have investigated the quantitative structure-property
relationship (QSPR). QSPR methods aim to build mathematical models that predict
numerical properties from the structural information of chemical compounds [23,24].
However, traditional methods used in QSPR, such as Molecular Dynamics (MD) and
Density Functional Theory (DFT), can be computationally challenging for ILs due to
the complexity of inter- and intramolecular interactions [25–27].
Recent developments in Machine Learning (ML) hold promise for QSPR modeling through
accurate and efficient property prediction for chemical compounds. Compared with
conventional simulation methods such as MD or DFT [28], ML methods have demonstrated
similar accuracy at lower computational cost in various chemical applications
[29–31]. In particular, several works have applied ML models to ionic liquid problems
using various descriptors of IL molecules. Group Contribution Theory (GC) provides
one of the earliest descriptors for IL molecules [32–35]. GC manually breaks the
molecule down into characteristic functional groups and counts the occurrences of
each group. However, this descriptor depends heavily on human experience and may lose
information about substructures within a group. Another class of molecular
descriptors is the Quantum Chemical Descriptor (QC), which uses properties calculated
with DFT to provide sub-molecule-level representations of IL molecules [36–38].
However, obtaining QC descriptors requires expensive and time-consuming QC
calculations such as DFT. ML models such as Support Vector Machine (SVM) and Random
Forest (RF), and deep learning models such as Multi-layer Perceptron (MLP),
Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), have been
applied on top of these descriptors to perform various property prediction tasks
[39,40]. However, both GC and QC descriptors can fail to model the structural
information of molecules, which limits the performance of ML models. Other molecular
descriptors, such as Extended-Connectivity Fingerprints
(ECFPs), create a feature vector by iteratively aggregating the neighbor information
of each atom and hashing it into a vector [41,42]. Such methods encode the structural
information of molecules more directly and can be more expressive. However, molecular
fingerprints (FPs), and the performance of different ML models built upon them, have
not been well studied for CO2 absorption in ILs.
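
The contrast between the two descriptor families can be made concrete with a small sketch. The code below is an illustration only, not the implementation used in this work: the toy molecule, the group vocabulary, and the helper names `group_counts` and `ecfp_like` are hypothetical, and the fingerprint is a simplified ECFP-style scheme that ignores bond orders and the duplicate-removal step of the real algorithm.

```python
import hashlib
from collections import Counter


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def stable_hash(text):
    """Deterministic integer hash (the built-in hash() is salted per run)."""
    return int(hashlib.sha256(text.encode()).hexdigest(), 16)


def group_counts(groups, vocabulary):
    """GC-style descriptor: count occurrences of each predefined group."""
    counts = Counter(groups)
    return [counts[g] for g in vocabulary]


def ecfp_like(atoms, bonds, radius=2, n_bits=64):
    """Simplified ECFP-style bit vector (assumption: no bond orders)."""
    adj = neighbors(len(atoms), bonds)
    ids = [stable_hash(a) for a in atoms]  # radius-0 atom identifiers
    bits = [0] * n_bits
    for ident in ids:
        bits[ident % n_bits] = 1
    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            # Sort neighbor identifiers so the result ignores atom ordering.
            env = [ids[i]] + sorted(ids[j] for j in adj[i])
            new_ids.append(stable_hash(",".join(map(str, env))))
        ids = new_ids
        for ident in ids:
            bits[ident % n_bits] = 1
    return bits


# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
gc = group_counts(["CH3", "CH2", "OH"], ["CH3", "CH2", "OH", "NH2"])
fp = ecfp_like(["C", "C", "O"], [(0, 1), (1, 2)])
print(gc)  # -> [1, 1, 1, 0]
print(sum(fp), "bits set out of", len(fp))
```

The GC vector depends entirely on the hand-chosen vocabulary, whereas the fingerprint is derived from connectivity alone, which is the sense in which fingerprints encode structure more directly.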
Recently, GNNs have been shown to be a powerful tool for molecular feature
representation and property prediction, and have received a significant amount of
attention [43–49]. At the feature representation stage, GNNs work directly on the
molecular structure. They treat the molecule as a graph and use an adjacency matrix
to encode bonds and connectivity, and a node feature matrix to encode atoms and their
properties. This representation is more generalizable, stable, and less
computationally expensive than GC and QC descriptors. At the model training and
prediction stage, a GNN aggregates node messages along edges during the forward pass
[46,50,51], and it outperforms other neural-network-based models on unstructured
graph data [46]. GNNs have been applied in many areas related to molecules, such as
drug discovery, quantum chemistry, and structural biology [52]. However, existing
works using GNNs on ILs tend to focus on a single family of anions or cations, and
still need to be expanded and generalized [53,54].
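
As a rough illustration of the graph representation described above, the following sketch builds an adjacency matrix and a one-hot node feature matrix for a toy molecule and applies one unweighted round of neighbor aggregation. It is a minimal sketch under stated assumptions: the function names and the toy molecule are hypothetical, and practical GNN layers such as GCN, GAT, or GIN add learned weights, normalization, or attention on top of this aggregation step.

```python
def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix: A[i][j] = 1 if atoms i and j share a bond."""
    A = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        A[i][j] = 1
        A[j][i] = 1
    return A


def one_hot_features(atoms, elements):
    """Node feature matrix: one row per atom, one column per element type."""
    return [[1.0 if a == e else 0.0 for e in elements] for a in atoms]


def message_passing_step(A, X):
    """One sum-aggregation round: h_i = x_i + sum of x_j over bonded j."""
    n, d = len(X), len(X[0])
    H = []
    for i in range(n):
        row = list(X[i])  # start from the atom's own features
        for j in range(n):
            if A[i][j]:
                for k in range(d):
                    row[k] += X[j][k]
        H.append(row)
    return H


# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
A = adjacency_matrix(len(atoms), bonds)
X = one_hot_features(atoms, ["C", "O"])
H = message_passing_step(A, X)
print(H)  # -> [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
```

After one round, each row of `H` summarizes an atom together with its bonded neighbors; stacking further rounds widens the neighborhood each atom can see.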
Besides building ML models to obtain accurate and efficient predictions, explaining
the output of an ML model for given input data is also an active research area
[25,55]. In traditional experimental and computational chemistry, researchers rely
heavily on their knowledge and experience when designing new compounds. ML models, on
the other hand, are black boxes whose intermediate decision process is hard to
unveil. Understanding how ML models make decisions can provide extra insight, from a
data-driven perspective, into how the structure of the input molecule affects the
properties of an IL and into the design of new ILs. Explainable algorithms have been
developed for GNNs to analyze the importance of each input edge [56]. In IL research,
however, researchers usually focus on the predictive power of GNNs for various
properties and ignore the explainability of the GNN model. Benefiting from
the graph representation, GNNs have the potential to explain the importance of
molecular structure down to the atom and bond level.
In this paper, we introduce two categories of methods for CO2 solubility prediction,
namely FP-based machine learning models and GNNs. We also develop an explanation
method for analyzing the importance of IL molecule substructures. For the FP-based
machine learning models, GC and FP are included as descriptors, and we compare the
expressiveness of FP with that of GC across different machine learning models. For
the GNNs, we include Graph Convolutional Networks (GCN), Graph Attention Networks
(GAT), and Graph Isomorphism Networks (GIN) for CO2 solubility prediction. Moreover,
we make two improvements to the data representation and the GNN framework in order to
build an IL explainer. First, instead of treating the cation and anion as two
separate graphs, we treat them as a single disconnected graph and feed the whole
graph into one GNN. Second, we replace the final pooling layer of the GNN with a
global node that connects to every atom within one data point. Based on that, we
develop an IL molecule explainer by combining the improved GNN framework with
sub-graph-based GNN explanation methods [56]. Thanks to these two improvements, the
IL molecule explainer can provide importance scores down to the single-atom level
within the IL molecule. We also find that our explanation method provides a
reasonable fragment importance ranking for the IL molecule in the prediction task and
can be a useful tool to guide the design of new IL molecules. To the best of our
knowledge, this is one of the first works to apply GNNs, together with a fragment
importance explanation study that reaches the single-atom level, to CO2 absorption in
ILs.
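
The two changes to the data representation can be sketched as follows. This is a hedged illustration rather than the authors' code: the ion fragments are hypothetical toy graphs, the helper names are made up for this example, and how the global node's features are initialized and read out is not specified in this section, so it is left out.

```python
def merge_ion_graphs(cation, anion):
    """Combine two (atoms, bonds) graphs into one disconnected graph."""
    c_atoms, c_bonds = cation
    a_atoms, a_bonds = anion
    offset = len(c_atoms)  # anion atom indices are shifted past the cation
    atoms = list(c_atoms) + list(a_atoms)
    bonds = list(c_bonds) + [(i + offset, j + offset) for i, j in a_bonds]
    return atoms, bonds


def add_global_node(atoms, bonds, label="GLOBAL"):
    """Append one extra node and connect it to every atom."""
    g = len(atoms)  # index of the new global node
    return atoms + [label], bonds + [(i, g) for i in range(g)], g


def count_components(n_nodes, bonds):
    """Number of connected components, computed with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n_nodes)})


# Hypothetical toy fragments standing in for a cation and an anion.
cation = (["N", "C", "C"], [(0, 1), (0, 2)])
anion = (["B", "F", "F"], [(0, 1), (0, 2)])

atoms, bonds = merge_ion_graphs(cation, anion)
print(count_components(len(atoms), bonds))  # -> 2 (ions are not bonded)

atoms_g, bonds_g, g = add_global_node(atoms, bonds)
print(count_components(len(atoms_g), bonds_g))  # -> 1
```

In this sketch the global node is the only path between the two ions, so its embedding after message passing can serve as the graph-level readout in place of a pooling layer, and every atom keeps an explicit edge to that readout.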
Method
Fig. 1 gives an overview of the whole work. IL ion pairs are represented in three
ways: GC, FP, and graph representation. GC and FP are combined with various machine
learning models such as SVM, RF, XGBoost, and MLP to perform solubility prediction