Estimating oil and gas recovery factors via machine learning: Database-dependent accuracy and reliability
Alireza Roustazadeh1, Behzad Ghanbarian1*, Mohammad B. Shadmand2, Vahid Taslimitehrani3,
and Larry W. Lake4
1 Porous Media Research Lab, Department of Geology, Kansas State University, Manhattan 66506
KS, United States
2 Department of Electrical and Computer Engineering, College of Engineering, University of
Illinois at Chicago, Chicago 60607 IL, United States
3 Staff Machine Learning Scientist at Realtor.com, San Francisco CA, United States
4 Hildebrand Department of Petroleum and Geosystems Engineering, University of Texas at
Austin, Austin 78712 TX, United States
* Corresponding author’s email address: ghanbarian@ksu.edu
Abstract
With recent advances in artificial intelligence, machine learning (ML) approaches have
become an attractive tool in petroleum engineering, particularly for reservoir characterization. A
key reservoir property is the hydrocarbon recovery factor (RF), whose accurate estimation would
provide decisive insights into drilling and production strategies. Therefore, this study aims to
estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as
porosity, permeability, pressure, and water saturation, via ML. We applied three regression-based
models, namely extreme gradient boosting (XGBoost), support vector machine (SVM),
and stepwise multiple linear regression (MLR), and various combinations of three databases to
construct ML models and estimate the oil and/or gas RF. Using two databases and the
cross-validation method, we evaluated the performance of the ML models. In each iteration, 90%
and 10% of the data were used to train and test the models, respectively. The third, independent
database was then used to further assess the constructed models. For both oil and gas RFs, we found that the
XGBoost model estimated the RF for the train and test datasets more accurately than the SVM and
MLR models. However, the performance of all the models was unsatisfactory for the independent
databases. Results demonstrated that the ML algorithms were highly dependent on, and sensitive to,
the databases on which they were trained. Statistical tests revealed that such unsatisfactory
performance arose because the distributions of input features and target variables in the train
datasets were significantly different from those in the independent databases (p-value < 0.05).
Keywords: Hydrocarbon, Machine learning, Recovery factor, Regression, XGBoost
1. Introduction
Successful exploration and production of a hydrocarbon reservoir have always been
challenging in petroleum engineering [1]. Because of high fluctuations in oil and gas prices, it is
necessary to determine whether investing in a field is economically feasible. A conventional
reservoir goes through different stages during its lifetime, namely exploration, appraisal,
development, production, and lastly abandonment. Once a reservoir is discovered (exploration
stage), the main challenge is to determine whether it has high potential to produce a substantial
amount of hydrocarbon and subsequently yield enough profit. Even for a currently producing
reservoir, one still needs to analyze its performance and production rate to ensure profits above
economic margins.
The performance of a reservoir influences its economic feasibility and is a function of
reservoir quality. Reservoir performance may be quantified by the initial production rate of a
reservoir and the decline in its production rate. Another indicator is the amount of recovered
hydrocarbon relative to the initial volume of hydrocarbon in a reservoir [2], also known as the recovery
factor (RF). The RF, ranging between 0 and 1, is a quantity used to evaluate the fate of a reservoir. It is
a function of displacement mechanisms, such as water drive, gas cap, rock compaction drive,
solution gas, and gravity drainage [3]. Its value may be determined at different stages of a reservoir’s
lifetime. In this study, however, RF refers to the ultimate recovery factor.
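The verbal definition above can be written compactly as follows. The symbols are assumptions following common petroleum-engineering notation and do not appear in the text: N_p denotes the ultimately recovered hydrocarbon volume and N the initial hydrocarbon volume in place, both expressed in the same units.

```latex
\mathrm{RF} = \frac{N_p}{N}, \qquad 0 \le \mathrm{RF} \le 1
```

For a gas reservoir the same ratio would be taken between the recovered gas volume and the initial gas in place.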
There exist different methods to determine the RF during the lifetime of a reservoir. When
a field is in its appraisal phase and no production data are available, analogs and empirical
formulas, e.g., history matching and volumetric reserve estimation, are commonly used. Such
methods, however, generally come with substantial uncertainties due to the lack of adequate
analogs [4]. During the production phase, dynamic properties of a field vary with production and
time and, thus, the RF determination is influenced by enhanced and improved recovery methods
[5] as well as more data being collected from a field [6]. Once a field approaches the abandonment phase,
the emphasis should be more on the economic margin and how close the field production is to such
a margin. At the plateau production phase, the RF is typically determined by simulations and
dynamic reservoir modeling. However, at late stages of production, when plateau
production is no longer in place, decline curve analysis should be applied. Various methods
used at different field life stages may lead to different RF determinations and uncertainties [4]. In
addition to the decline curve analysis, one may apply the material balance method to determine
the reserve and, accordingly, the recovery factor [4,6,7]. Both the material balance and production
decline curve analysis methods calculate reserves and recovery factor from the performance of a
field [8,9].
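As a rough illustration of how a decline curve can yield a reserve and hence an RF, the sketch below assumes an exponential (Arps-type) decline, a form the text does not specify. The function names, the economic-limit rate, and all numerical values are hypothetical and chosen only for the example.

```python
import math


def exponential_decline_rate(q_initial, decline, t):
    """Production rate at time t under exponential decline q(t) = q_i * exp(-D * t)."""
    return q_initial * math.exp(-decline * t)


def cumulative_to_limit(q_initial, q_limit, decline):
    """Cumulative production from t = 0 until the rate falls to q_limit.

    Integrating q(t) = q_initial * exp(-decline * t) over that period gives
    (q_initial - q_limit) / decline.
    """
    if not (0 < q_limit <= q_initial) or decline <= 0:
        raise ValueError("need 0 < q_limit <= q_initial and decline > 0")
    return (q_initial - q_limit) / decline


def recovery_factor(recovered_volume, volume_in_place):
    """Recovery factor as recovered volume over initial volume in place."""
    return recovered_volume / volume_in_place


# Hypothetical field; none of these numbers come from the study.
# Rates in bbl/day, decline in 1/day, volumes in bbl.
eur = cumulative_to_limit(q_initial=1000.0, q_limit=100.0, decline=0.001)
rf = recovery_factor(eur, volume_in_place=3.0e6)
print(f"estimated ultimate recovery: {eur:.0f} bbl, RF: {rf:.2f}")
```

In this simplified setting the RF depends only on the initial rate, the decline constant, the chosen economic limit, and the volume in place, which illustrates why such estimates are sensitive to how well the decline trend is established.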
The aforementioned methods are time-consuming, and their results may have a wide range
of uncertainties [10–12]. Moreover, the RF is a function of multiple variables, e.g., permeability,
temperature, and gas-oil ratio (GOR), which are not necessarily incorporated in such methods.
In addition, those methods are not cost-efficient. With recent advances in artificial intelligence and
data analytics, one may construct a machine learning-based model to estimate the RF from other
available reservoir characteristics [13]. Such models may provide a cost-efficient and accurate
platform as well as better insights into reservoir characterization and hydrocarbon production, if
implemented appropriately.
Machine learning (ML) techniques have been previously used to estimate the RF [14–17].
For a recent review, see Tahmasebi et al. [18]. For example, Lee and Lake [19] applied various
methods including multiple linear regression (MLR), MLR with sequential feature selection,
artificial neural networks (ANN), and a Bayesian network to estimate both oil and gas RFs. In their
study, the ANN and MLR with sequential feature selection showed better performance than the
other two methods. Aliyuda and Howell [4] applied the MLR and support vector machine (SVM)
with a Gaussian kernel to 93 reservoirs from the Norwegian Continental Shelf. Of these, 75 reservoirs were
from the Norwegian Sea, the Norwegian North Sea, and the Barents Sea, and the remaining 18
reservoirs were from the Viking Graben in the UK sector of the North Sea. Aliyuda and Howell [4]
found that the SVM model outperformed the MLR model.
In another study, Chen et al. [16] applied the ANN approach to develop predictive oil RF
models with different sets of input data using the TORIS database and identified 19 principal
features (out of 70) in the construction of their ML model. Tewari et al. [20] used six different
approaches, i.e., random tree, random forest, SVM, bagging, radial basis function, and multilayer
perceptron, and compared them with their proposed ensemble estimator (E2) model. Ensemble
methods are classifier algorithms that develop multiple classifiers and make inferences using the
weighted vote of the multiple estimations that they have made [21]. Tewari et al. [20] showed that
their E2-based model outperformed the other six methods.
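The weighted-vote idea behind ensemble methods can be sketched in a few lines. This is a generic illustration, not the E2 model of Tewari et al. [20]; the function names, labels, and weights are hypothetical, and the weighted average is included only as the analogous combination rule for a continuous target such as the RF.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier.
    """
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("need one weight per prediction, and at least one")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight; ties go to the
    # label that was encountered first.
    return max(totals, key=totals.get)


def weighted_average(estimates, weights):
    """Combine continuous estimates (e.g. RF values) by a weighted mean."""
    if len(estimates) != len(weights) or not estimates:
        raise ValueError("need one weight per estimate, and at least one")
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)


# Three hypothetical base models: the first is trusted more than the others.
label = weighted_vote(["high", "low", "low"], [0.6, 0.2, 0.3])
combined_rf = weighted_average([0.2, 0.4], [1.0, 3.0])
```

With these weights the single heavily weighted model outvotes the other two; with equal weights the majority label would win instead.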
In the literature, most ML studies divide a database into train and test splits. Although the
performance of ML models is evaluated using the test dataset, an independent database with no
data overlap is not used to further assess the accuracy and reliability of the constructed ML models.
Furthermore, to the best of the authors’ knowledge, the extreme gradient boosting algorithm has
neither been applied to estimate the hydrocarbon recovery factor at the reservoir scale, nor has it
been compared with support vector machine and multiple linear regression. Therefore, the main
objectives of this study are to: (1) apply the extreme gradient boosting (XGBoost) approach to
develop an ML-based model, (2) compare its performance with that of multiple linear regression
(MLR) and support vector machine (SVM) in the estimation of oil and gas RFs using reservoirs
from around the world, and (3) address the database dependence of accuracy and uncertainty in
ML-based models.
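The difference between a train/test evaluation and an assessment on an independent database can be sketched as follows. This is a minimal, standard-library illustration under several assumptions: ten folds are used so that each iteration trains on 90% and tests on 10% of the data, a one-feature least-squares line stands in for the XGBoost, SVM, and MLR models, and all data are synthetic. None of the names or numbers come from the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root mean square error of a (slope, intercept) model."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def cross_validate(xs, ys, k=10, seed=0):
    """Train on k-1 folds and test on the held-out fold, once per fold."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(rmse(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return scores


# Synthetic stand-in data (hypothetical, not from any of the study's databases).
rng = random.Random(42)
train_x = [rng.uniform(0.05, 0.35) for _ in range(200)]          # e.g. porosity
train_y = [0.1 + 1.0 * x + rng.gauss(0, 0.02) for x in train_x]  # pseudo-RF
# "Independent" set drawn from a shifted range with a different trend.
indep_x = [rng.uniform(0.20, 0.50) for _ in range(100)]
indep_y = [0.3 + 0.2 * x + rng.gauss(0, 0.02) for x in indep_x]

cv_scores = cross_validate(train_x, train_y, k=10)
final_model = fit_line(train_x, train_y)
independent_rmse = rmse(final_model, indep_x, indep_y)
print(f"mean CV RMSE: {sum(cv_scores) / len(cv_scores):.3f}")
print(f"independent RMSE: {independent_rmse:.3f}")
```

Because the held-out folds share the distribution of the training data, the cross-validation error stays low, whereas the error on the shifted independent set is markedly higher. A test split alone therefore says little about how a model behaves on data from a different source.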
2. Materials and Methods
2.1. Databases
- Commercial database
The commercial database analyzed in this study is the same one that Lee and Lake [19] used. This
database contains more than 1200 samples including conventional reservoirs with different
geologic formations from around the world. The oil RF values were reported for more than 600