Vector graphics extraction and analysis of electrical resistance data
in Nature volume 586, pages 373–377 (2020)
J. J. Hamlin1
1Department of Physics, University of Florida, Gainesville, FL 32611, USA
(Dated: September 2022)
In this paper, I present an analysis of the electrical resistance graphs in Ref. [1], which reported
the discovery of room temperature superconductivity in a carbonaceous sulfur hydride and was
subsequently retracted on September 26th, 2022. I show that, over a single temperature interval,
the electrical resistance data can be decomposed into at least two signals of differing digital precision,
thus raising questions concerning the methods used to obtain the published data. Since the raw data-
files for the electrical resistance measurements have not been made available, in order to perform
this analysis, I have developed a set of Python scripts to extract the data points with high precision
from the internal structure of the vector graphics image files. I describe the data extraction method.
Example code and the resulting electrical resistance vs temperature data-files are made available in
public repositories.
INTRODUCTION
In October 2020, a paper was published reporting the
discovery of room temperature superconductivity in a
carbonaceous sulfur hydride (CSH) under high pressure
conditions [1]. The claim was based primarily on three
pieces of evidence: diamagnetic transitions in the ac mag-
netic susceptibility, zero electrical resistance, and sup-
pression of the transition in an applied magnetic field.
On Monday, September 26th, 2022, a retraction was published,
with the notice focusing on the validity of the background
subtraction method applied to the ac magnetic suscepti-
bility data. The retraction notice notes that the authors
maintain that the raw data provide strong support for
the main claims of the original paper.
In this work, I analyze the electrical resistance
graphs in Fig. 1a and Fig. 2b of Ref. [1]. Hereafter, I refer to
these figures as CSH-Fig1a and CSH-Fig2b, respectively.
I find artifacts similar to those that appear in the mag-
netic susceptibility data [2]. In the case of the magnetic
susceptibility data, two of the authors of Ref. [1] stated
that such artifacts are a consequence of the user-defined
background that was subtracted from the data [3]. No
such background subtraction for the published electrical
resistance data was indicated in Ref. [1].
VECTOR-BASED DATA EXTRACTION
The authors of Ref. [1] have not made the electrical
resistance raw data files publicly available. In such situations,
interested parties are left to analyze the published
graphs. Typically, this is done using bitmap-based data
extraction methods implemented by programs such as
DataThief [4] and WebPlotDigitizer [5]. In this method,
the axes scale bars are manually calibrated and the data
points are selected either by hand or, in some cases, using
automated methods that function with varying levels of
success. Bitmap-based data extraction faces several lim-
itations. The precision with which the data points can
be extracted is limited by the resolution of the image file.
In favorable cases, the image resolution may be approx-
imately 600 pixels-per-inch. However, figures are often
included in formats such as jpg, where image compres-
sion may introduce artifacts that blur the edges of data
points. Most importantly, data points that are visually
covered by other data points will be invisible and thus im-
possible to extract. This is often the case for graphs that
contain a large number of densely-spaced data points.
Many journals adopt policies that graphics should be
included in vector format when possible. Common vector
graphics formats include pdf, eps, and svg (scalable
vector graphics). A simple way to determine whether a
graph in a paper is a vector-based image is to use the
text-selection tool to try to select some of the text in the
graph. If you are able to select the text, it is likely a
vector-based image. A vector format image will remain
sharp no matter how much you zoom-in on it. Rather
than describing graphics as a grid of pixels, these for-
mats can include instructions describing a set of paths to
draw. The full svg instruction specification is available
at Ref. [6]. For example, a square data point might be
described by specifying the coordinates of a set of four
lines. Figure 1 shows examples of the path instructions
for (a) a single red, Run 2 data point in CSH-Fig1a and
(b) a single red 9 T data point in CSH-Fig2b, after con-
verting the corresponding pages of the article pdf into
svg format using pdf2svg [7]. Typically, the coordinates
associated with these paths are stored with substantially
higher precision than a corresponding bitmap equivalent.
In principle then, it is possible to analyze the instruc-
tions in a vector image file to extract the coordinates of
the data points. Extraction of the data points from the
vector image data allows extraction of points that are
completely covered by other data points and are there-
fore not directly visible when viewing the image in a pdf
arXiv:2210.10766v1 [cond-mat.supr-con] 19 Oct 2022
(a)
<path style="stroke:none;fill-rule:evenodd;fill:rgb(94.117737%,32.156372%,32.548523%);fill-opacity:1;"
d="M 208.292969 217.472656 C 208.292969 218.171875 207.726562 218.738281
207.03125 218.738281 C 206.332031 218.738281 205.765625 218.171875 205.765625 217.472656 C
205.765625 216.777344 206.332031 216.210938 207.03125 216.210938 C 207.726562 216.210938
208.292969 216.777344 208.292969 217.472656 "/>
(b)
<path style="fill-rule:evenodd;fill:rgb(92.941284%,13.33313%,14.117432%);fill-opacity:1;stroke-width:0.139;stroke-linecap:round;stroke-linejoin:round;stroke:rgb(52.941895%,9.01947%,9.803772%);stroke-opacity:1;stroke-miterlimit:10;"
d="M 0.00045777 -0.00143164 L 1.340302 1.338412 L 2.684052 -0.00143164 L 1.340302 -1.337369 Z M 0.00045777 -0.00143164 "
transform="matrix(1,0,0,-1,463.726105,81.529818)"/>
FIG. 1. Example of instructions used to draw data points in SVG files. (a) Shows an example of a red, Run 2 data point from
CSH-Fig1a, while (b) shows an example of a red 9 T data point from CSH-Fig2b. Note the different structure of the two sets of
instructions: (b) uses a transform matrix, while (a) does not. Different graphs may describe the data in slightly different ways,
but the instructions can be straightforwardly parsed to extract the coordinates of each data point.
of a manuscript. Such is not possible from a bitmap im-
age. While not common, vector-based data extraction
has been implemented and shown to be reliable in at least
one other study [8]. The technique has the potential to
aid in future data-mining efforts. Databases of experimental
information, such as The Pauling File [9], may
consider the feasibility of adding vector-extracted data to
their collections.
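To illustrate how such path instructions can be parsed, the following Python sketch computes the marker centers for the two paths of Fig. 1 using only the standard library. It is not the code from the repositories accompanying this paper; the function names are hypothetical. It assumes absolute path commands whose arguments are all coordinate pairs (M, L, C, Z) and an optional matrix(...) transform, and it averages all listed vertices, including Bezier control points, which locates the center only for symmetric markers such as these.

```python
import re

# Matches integers, decimals, and numbers with exponents.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

# Path data copied from Fig. 1(a) and Fig. 1(b).
PATH_A = (
    "M 208.292969 217.472656 C 208.292969 218.171875 207.726562 218.738281 "
    "207.03125 218.738281 C 206.332031 218.738281 205.765625 218.171875 "
    "205.765625 217.472656 C 205.765625 216.777344 206.332031 216.210938 "
    "207.03125 216.210938 C 207.726562 216.210938 208.292969 216.777344 "
    "208.292969 217.472656 "
)
PATH_B = (
    "M 0.00045777 -0.00143164 L 1.340302 1.338412 L 2.684052 -0.00143164 "
    "L 1.340302 -1.337369 Z M 0.00045777 -0.00143164 "
)
TRANSFORM_B = "matrix(1,0,0,-1,463.726105,81.529818)"


def path_points(d):
    """Return the unique (x, y) pairs listed in a path 'd' string.

    Assumption: only absolute commands with coordinate-pair arguments.
    """
    values = [float(v) for v in NUMBER.findall(d)]
    points = list(zip(values[0::2], values[1::2]))
    # Drop repeated vertices (e.g. the closing point) while keeping order.
    return list(dict.fromkeys(points))


def apply_matrix(point, transform):
    """Apply an SVG matrix(a,b,c,d,e,f) transform to a point."""
    if not transform:
        return point
    match = re.search(r"matrix\(([^)]*)\)", transform)
    if match is None:
        raise ValueError("only matrix(...) transforms are handled in this sketch")
    a, b, c, d, e, f = (float(v) for v in NUMBER.findall(match.group(1)))
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def marker_center(d, transform=None):
    """Centroid of the path vertices, mapped into page coordinates."""
    pts = path_points(d)
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return apply_matrix((cx, cy), transform)


center_a = marker_center(PATH_A)
center_b = marker_center(PATH_B, TRANSFORM_B)
```

Paths that use relative commands, or commands such as H, V, or A, would need a full path parser, which is what a library such as svgpathtools provides.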
The data extraction workflow employed in this work is
as follows:
1. Convert an entire page of the pdf into svg format
using the pdf2svg utility [7].
2. Identify the hexadecimal color for each data set.
3. Use the svgpathtools Python library [10] to extract
the svg path objects that have the correct color and
compute the centroid of the vertices of each path
(i.e. the center of each data point), or in the case
of line plots, the coordinates of each line segment.
4. Remove data points from the series that come from
the figure legends or other figures on the same page.
5. Analyze the locations of the tick marks in the svg
file in order to calibrate the scale.
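Parts of this workflow can be sketched in Python as follows. This is a minimal illustration, not the code from the notebooks: it swaps in the standard library (xml.etree.ElementTree) for svgpathtools, all function names and numerical tick values are hypothetical, and it assumes linear axes and fill colors written as rgb(...) percentages, as in the pdf2svg output of Fig. 1. Step 1 would be run beforehand in a shell, for example `pdf2svg paper.pdf page.svg 3` (file names and page number are placeholders).

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def fill_to_hex(style):
    """Step 2: convert 'fill:rgb(r%,g%,b%)' in a style string to '#rrggbb'.

    Returns None when the style has no rgb(...) fill.
    """
    match = re.search(r"(?:^|;)\s*fill\s*:\s*rgb\(([^)]*)\)", style)
    if match is None:
        return None
    channels = [float(v.strip().rstrip("%")) for v in match.group(1).split(",")]
    return "#" + "".join(f"{round(c * 255 / 100):02x}" for c in channels)


def paths_with_fill(svg_text, hex_color):
    """Step 3 (selection only): path elements whose fill matches hex_color."""
    root = ET.fromstring(svg_text)
    return [
        el for el in root.iter(SVG_NS + "path")
        if fill_to_hex(el.get("style", "")) == hex_color
    ]


def inside(point, box):
    """Step 4: True if (x, y) lies within box = (xmin, ymin, xmax, ymax).

    The box, in svg units, would be chosen to enclose the plot area so that
    legend markers and other figures on the page are rejected.
    """
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def axis_calibration(tick_a, tick_b):
    """Step 5: map an svg coordinate to a data value on a linear axis.

    Each tick is a (svg_coordinate, data_value) pair.
    """
    (p0, v0), (p1, v1) = tick_a, tick_b
    slope = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + slope * (p - p0)


# Tiny hypothetical svg with one red and one blue marker.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
<path style="stroke:none;fill:rgb(94.117737%,32.156372%,32.548523%);" d="M 1 1 L 2 2"/>
<path style="stroke:none;fill:rgb(0%,0%,100%);" d="M 3 3 L 4 4"/>
</svg>"""

red_paths = paths_with_fill(EXAMPLE_SVG, "#f05253")
# Hypothetical ticks: svg x = 100 is 0 K and svg x = 300 is 100 K.
to_temperature = axis_calibration((100.0, 0.0), (300.0, 100.0))
```

Because the svg y axis points downward, the vertical calibration simply has a negative slope; no separate handling is needed.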
Full details of the data extraction method and example
code are available in extensively commented jupyterlab
notebooks hosted on github [11] and archived on zen-
odo [12]. The program Inkscape [13] can also be useful
for initial exploration of the structure of the svg file.
Inkscape’s pdf to svg conversion assigns a path identi-
fication (ID) number property to every path, which can
be helpful for identifying the parts of the svg code that
correspond to certain objects in the graph. One can open
the svg in Inkscape, click on an item to find the path ID,
and then open the svg in a text editor and search for
that path ID to view the corresponding code.
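The same lookup can be done programmatically rather than in a text editor. The sketch below uses the Python standard library to find the element with a given id; the function name, the example svg, and the id values are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_by_id(svg_text, element_id):
    """Return the first element whose id attribute equals element_id, or None."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    return None


# Hypothetical svg fragment with ids of the kind assigned on import.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
<path id="path1001" d="M 0 0 L 1 1"/>
<path id="path1002" d="M 2 2 L 3 3"/>
</svg>"""

element = find_by_id(EXAMPLE_SVG, "path1002")
path_data = element.get("d")
```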
VALIDATION
In order to validate the extraction method, I include
here an analysis of the extracted data for Fig. 3 of
Ref. [14] (Be22Re-Fig3), for which I have access to the
original raw data files. The paper was recently published
in Physical Review B, and Be22Re-Fig3 was included as
a vector graphics image. This figure presents normalized
electrical resistivity vs temperature measured at several
pressures for the material Be22Re. Superconducting transitions
are present in each measurement. For this analysis,
I focus on the measurement at 15 GPa, because
at that pressure, the data at both high and low tem-
perature are mostly obscured by measurements at other
pressures. Bitmap-based data extraction would not be
able to recover data in those regions.
Figure 2 shows a comparison of the extracted data
and the original raw data. The raw resistance data have
been normalized to the value at 10 K, consistent with the
axis labeling in Be22Re-Fig3. The insets of Fig. 2 show
zoomed comparisons of the extracted and raw data. Very
small differences between the two data sets are caused by
the limited precision of the svg path data.
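One way to quantify such a comparison is sketched below in Python. It is an illustration under stated assumptions rather than the analysis used for Fig. 2: the function names are hypothetical, the raw curve is linearly interpolated at each extracted temperature, and the numbers are synthetic placeholders, not measured data.

```python
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; xs must be sorted ascending."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x outside the tabulated range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def max_deviation(extracted, raw_t, raw_r, t_norm=10.0):
    """Largest |extracted - normalized raw| over the extracted points.

    extracted: list of (temperature, normalized resistance) pairs.
    raw_t, raw_r: raw temperatures (ascending) and resistances.
    The raw resistance is normalized to its value at t_norm (10 K here).
    """
    r_norm = interpolate(t_norm, raw_t, raw_r)
    return max(
        abs(r - interpolate(t, raw_t, raw_r) / r_norm) for t, r in extracted
    )


# Synthetic placeholder values, for illustration only.
raw_t = [2.0, 6.0, 10.0, 20.0]
raw_r = [0.0, 0.0, 4.0, 5.0]
extracted = [(2.0, 0.0), (8.0, 0.5001), (20.0, 1.25)]

deviation = max_deviation(extracted, raw_t, raw_r)
```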
PRECISION OF EXTRACTION
While a vector image plot in a publication has higher
precision than a corresponding bitmap, the precision is
not unlimited. The precision of data in a raw data file is
set not only by the number of digits recorded in the file
but also by the number of bits in the analog to digital