Predicting CO2 Absorption in Ionic Liquids with Molecular Descriptors and Explainable Graph Neural Networks

Yue Jian,† Yuyang Wang,‡,¶ and Amir Barati Farimani∗,‡,†,¶

†Department of Materials Science and Engineering, Carnegie Mellon University, USA

‡Department of Mechanical Engineering, Carnegie Mellon University, USA

¶Machine Learning Department, Carnegie Mellon University, USA

E-mail: barati@cmu.edu

arXiv:2210.01120v2 [physics.chem-ph] 9 Nov 2022

Abstract

Ionic liquids (ILs) provide a promising solution for CO2 capture and storage to mitigate global warming. However, identifying and designing high-capacity ILs from the vast chemical space requires expensive and exhaustive simulations and experiments. Machine learning (ML) can accelerate the search for desirable ionic molecules through accurate and efficient property prediction in a data-driven manner. However, existing descriptors and ML models for ionic molecules make inefficient use of the molecular graph structure. Moreover, few works have investigated the explainability of ML models to help understand the learned features that can guide the design of efficient ionic molecules. In this work, we develop both fingerprint-based ML models and Graph Neural Networks (GNNs) to predict CO2 absorption in ILs. Fingerprints operate on the graph structure at the feature extraction stage, while GNNs handle the molecular structure directly in both the feature extraction and model prediction stages. We show that our method outperforms previous ML models, reaching high accuracy (MAE of 0.0137, R² of 0.9884). Furthermore, we take advantage of the GNN feature representation and develop a substructure-based explanation method that provides insight into how each chemical fragment within an IL molecule contributes to the CO2 absorption predicted by the ML models. We also show that our results agree with some ground truth on functional group importance drawn from the theoretical understanding of CO2 absorption in ILs, which can inform the design of novel and efficient functional ILs in the future.

Introduction

Global warming is a major environmental problem. According to the Intergovernmental Panel on Climate Change (IPCC), the global average temperature will rise by about 1.9 °C by 2100 if no action is taken [1]. Among all greenhouse gases, CO2 contributes the most to global warming, accounting for about 78.6% [2]. Capturing and storing CO2 effectively is therefore crucial to addressing global warming. Existing methods for absorbing CO2 include physisorption/chemisorption [3,4], membrane separation [5] or molecular sieves [6], carbamation, amine physical absorption [7], amine dry scrubbing [8], and mineral carbonation [9,10]. However, the reagents used in these methods suffer from insufficient CO2 storage capacity, high energy demand during absorption, and low thermal stability [11,12]. The evaporation and degradation of reagents can also make the storage process costly [13].

Ionic liquids (ILs) are a family of molten salts that remain liquid at room temperature. Over the past decades, they have received significant attention and become an intensive research area owing to their unique physical and chemical properties, such as nonvolatility, high chemical stability, high CO2 solubility, and easy operation in the liquid state. These properties make ILs ideal candidates for CO2 storage [14–22]. An IL usually consists of a pair of oppositely charged ions, and the combination of ions largely determines its properties. However, the many possible cation-anion combinations, together with the wide choice of cations and anions themselves, make it challenging to exhaust the IL design space for efficient CO2 storage through experiments. To estimate the CO2 absorption of ILs efficiently, researchers have investigated the quantitative structure-property relationship (QSPR). QSPR methods aim to build mathematical models that predict numerical properties from the structural information of chemical compounds [23,24]. However, traditional methods used in QSPR, such as Molecular Dynamics (MD) and Density Functional Theory (DFT), can be computationally challenging for ILs because of the complexity of inter- and intramolecular interactions [25–27].

Recent developments in Machine Learning (ML) hold promise for QSPR modeling through accurate and efficient property prediction for chemical compounds. Compared with conventional simulation methods such as MD or DFT [28], ML methods have demonstrated similar accuracy at lower computational cost in various chemical applications [29–31]. In particular, several works have applied ML models to ionic liquid problems using various descriptors of IL molecules. Group Contribution Theory (GC) provides one of the earliest descriptors for IL molecules [32–35]. GC manually breaks a molecule down into characteristic functional groups and counts how often each group occurs. However, this descriptor depends heavily on human experience and may lose information about substructures within a group. Another family of molecular descriptors is the Quantum Chemical (QC) descriptor, which uses properties calculated with DFT to provide sub-molecule-level representations of IL molecules [36–38]. However, obtaining QC descriptors requires expensive and time-consuming QC calculations such as DFT. ML models such as Support Vector Machine (SVM) and Random Forest (RF), and deep learning models such as Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), have been applied on top of these descriptors to perform various property prediction tasks [39,40]. However, both GC and QC descriptors can fail to capture the structural information of molecules, which limits the performance of ML models. Other molecular descriptors, such as Extended-Connectivity Fingerprints (ECFPs), create a feature vector by iteratively aggregating the neighbor information of each atom and hashing it into a vector [41,42]. Such methods encode the structural information of molecules more directly and can be more expressive. However, molecular fingerprints (FPs), and how different ML models built upon them perform, have not been well studied for CO2 absorption in ILs.
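The iterative aggregate-and-hash idea behind ECFP-style fingerprints can be illustrated with a minimal sketch. This is a toy illustration under simplifying assumptions, not the fingerprint implementation used in this work: atoms are described by element symbol only, bond orders and hydrogens are ignored, and the names `stable_hash` and `toy_circular_fingerprint`, the radius of 2, and the 64-bit vector length are all hypothetical choices made for the example.

```python
import hashlib


def stable_hash(item) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def toy_circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Return a bit vector built from hashed atom neighborhoods.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of undirected (i, j) atom-index pairs
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: each identifier depends only on the atom itself.
    identifiers = [stable_hash(symbol) for symbol in atoms]
    seen = set(identifiers)

    for _ in range(radius):
        # Each round folds the neighbors' current identifiers into the atom's
        # own, so the identifier describes a neighborhood one bond larger.
        identifiers = [
            stable_hash(
                (identifiers[i], tuple(sorted(identifiers[j] for j in neighbors[i])))
            )
            for i in range(len(atoms))
        ]
        seen.update(identifiers)

    # Fold every identifier seen at any radius into a fixed-length bit vector.
    bits = [0] * n_bits
    for identifier in seen:
        bits[identifier % n_bits] = 1
    return bits


# Ethanol heavy atoms (C-C-O), hydrogens omitted for brevity.
ethanol = toy_circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
# Same atoms and bonds, listed in a different order.
ethanol_reordered = toy_circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
```

Because neighbor identifiers are sorted before hashing and the final vector is built from a set of identifiers, the result does not depend on the order in which atoms are listed, which is the property that makes such a vector usable as a fixed-length input for models such as SVM or RF.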

Recently, GNNs have been shown to be a powerful tool for molecular feature representation and property prediction, and have received a significant amount of attention [43–49]. At the feature representation stage, GNNs work directly on the molecular structure. They treat the molecule as a graph, using an adjacency matrix to encode bonds (edges) and connectivity, and a node feature matrix to encode atoms and their properties. This representation is more generalizable, more stable, and less computationally expensive than GC and QC descriptors. At the model training and prediction stage, a GNN aggregates node messages along edges during the forward pass [46,50,51], and it outperforms other neural-network-based models on unstructured graph data [46]. GNNs have been used in many molecule-related areas, such as drug discovery, quantum chemistry, and structural biology [52]. However, existing works applying GNNs to ILs tend to focus on a single family of anions or cations, and still need to be expanded and generalized [53,54].
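To make the adjacency matrix, node feature matrix, and message aggregation concrete, the following sketch performs one round of neighbor averaging on a toy three-atom chain. It is a simplified illustration only: real GNN layers such as GCN, GAT, or GIN also apply learned weights and a nonlinearity, which are omitted here, and the function name `message_passing_step` and the two-dimensional one-hot features are assumptions made for the example.

```python
def message_passing_step(adjacency, node_features):
    """One round of mean aggregation over each node and its bonded neighbors."""
    n_nodes = len(adjacency)
    n_feats = len(node_features[0])
    updated = []
    for i in range(n_nodes):
        # Include the node itself (a self-loop) plus every bonded neighbor.
        group = [i] + [j for j in range(n_nodes) if adjacency[i][j] and j != i]
        updated.append(
            [sum(node_features[j][k] for j in group) / len(group) for k in range(n_feats)]
        )
    return updated


# Toy three-atom chain 0-1-2; a 1 in the adjacency matrix marks a bond.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
# One-hot "element type" feature per atom (hypothetical types A and B).
node_features = [
    [1.0, 0.0],  # atom 0: type A
    [1.0, 0.0],  # atom 1: type A
    [0.0, 1.0],  # atom 2: type B
]
hidden = message_passing_step(adjacency, node_features)
```

After one step, the middle atom's feature vector mixes information from both of its neighbors, while each end atom only sees the middle atom. Stacking several such steps lets information travel further along the bonds.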

Besides building ML models that give accurate and efficient predictions, explaining the output of an ML model for a given input is also an active research area [25,55]. In traditional experimental and computational chemistry, researchers rely heavily on their knowledge and experience when designing new compounds. ML models, on the other hand, are black boxes whose intermediate decision process is hard to unveil. Understanding how ML models make decisions can provide additional, data-driven insight into how the structure of the input molecule affects the properties of an IL and into the design of new ILs. Explainable algorithms have been developed for GNNs to analyze the importance of each input edge [56]. In IL research, however, work usually focuses on the predictive power of GNNs for various properties and ignores the explainability of the GNN model. Benefiting from the graph representation, GNNs have the potential to explain the importance of molecular structure down to the atom and bond level.

In this paper, we introduce two categories of methods for CO2 solubility prediction, namely FP-based machine learning models and GNNs. We also develop an explanation method for analyzing the importance of IL molecule substructures. For the FP-based machine learning models, both GC and FP are included as descriptors, and we compare the expressiveness of FP with that of GC across different machine learning models. For the GNNs, we include Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and Graph Isomorphism Networks (GIN) for CO2 solubility prediction. Moreover, we make two improvements to the data representation and the GNN framework in order to build an IL explainer. First, instead of treating the cation and anion as two separate graphs, we treat them as a single disconnected graph and feed the whole graph into one GNN. Second, we replace the final pooling layer of the GNN with a global node that connects to every atom within one data point. On this basis, we develop an IL molecule explainer by combining the improved GNN framework with sub-graph-based GNN explanation methods [56]. Thanks to these two improvements, the IL molecule explainer can provide importance scores down to the level of a single atom within the IL molecule. We also find that our explanation method provides a reasonable fragment importance ranking for the IL molecule in the prediction task and can be a useful tool to guide the design of new IL molecules. To the best of our knowledge, this is one of the first works to apply GNNs, together with a fragment importance explanation study that reaches the single-atom level, to CO2 absorption in ILs.
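The two data-representation changes described above (one disconnected graph for the ion pair, plus a global node linked to every atom) can be sketched as a small graph-construction routine. This is a hedged illustration rather than the authors' code: the function name `build_ion_pair_graph`, the `"GLOBAL"` label, and the toy fragments are hypothetical, and node features and edge directions used by an actual GNN library are left out.

```python
def build_ion_pair_graph(cation_atoms, cation_bonds, anion_atoms, anion_bonds):
    """Merge two ion graphs into one node list and edge list, then add a global node."""
    offset = len(cation_atoms)
    atoms = list(cation_atoms) + list(anion_atoms)
    # Anion atom indices are shifted so both ions share one index space.
    # No edge joins the two ions, so the merged graph stays disconnected.
    edges = list(cation_bonds) + [(i + offset, j + offset) for i, j in anion_bonds]

    # The global node is appended last and linked to every atom of both ions,
    # taking the place of a separate pooling step over node embeddings.
    global_index = len(atoms)
    nodes = atoms + ["GLOBAL"]
    edges += [(global_index, i) for i in range(global_index)]
    return nodes, edges, global_index


# Toy fragments, not real ions: a three-atom "cation" and a two-atom "anion".
nodes, edges, global_index = build_ion_pair_graph(
    ["N", "C", "C"], [(0, 1), (1, 2)],
    ["B", "F"], [(0, 1)],
)
```

In such a layout, the embedding of the global node after message passing can serve as the representation of the whole ion pair, and every atom has a direct edge to it, which is what makes per-atom attribution natural to express.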

Method

Fig. 1 gives an overview of the whole work. IL molecule pairs are represented in three ways: GC, FP, and graph representation. GC and FP are combined with various machine learning models such as SVM, RF, XGBoost, and MLP to perform solubility prediction
