Towards Language-driven Scientific AI

arXiv:2210.15327v2 [cs.CL] 31 Oct 2022
José Manuel Gómez-Pérez (a)
(a) Language Technology Research Lab, Expert.ai, 3 Poeta Joan
Maragall, Madrid, 28020, Spain
Abstract
Inspired by recent and revolutionary developments in AI, particularly in lan-
guage understanding and generation, we set about designing AI systems that
are able to address complex scientific tasks that challenge human capabilities
to make new discoveries. Central to our approach is the notion of natural
language as the core representation, reasoning, and exchange format between
scientific AI and human scientists. In this paper, we identify and discuss
some of the main research challenges in accomplishing such a vision.
Keywords: Science, Artificial Intelligence, Language Understanding
1. Introduction
During her presidential address at the AAAI Conference, Gil (2022) pon-
dered whether artificial intelligence (AI) will write scientific papers in the
future. She believed that we can be hopeful that the answer is yes and
that it may happen sooner than we might expect. As scientific questions
become significantly more complex, our capacity to make scientific
breakthroughs needs to be augmented. Compare, for instance, the challenges
of formulating Kepler’s laws of planetary motion or discovering a cure
for polio with demonstrating the existence of binary stellar-mass black hole
systems (Abbott et al., 2016) or treating glioblastoma, a type of
brain cancer. While the former were achieved by a single scientist, the
latter require large, interdisciplinary teams in which hundreds of
scientists from different fields work together for years to
produce results.
In this paper, we present a personal perspective inspired by recent break-
throughs in AI and particularly language technologies to enable a next gen-
eration of AI systems that may become an effective part of the scientific
Preprint submitted to arXiv November 1, 2022
ecosystem, collaborate, contribute, and eventually produce significant find-
ings (Kitano, 2016). In recent years, the incorporation of intelligent tech-
niques for data mining and machine learning has provided scientists with
powerful data-driven analytics and discovery capabilities. However, such
techniques have been focused on solving well-defined narrow tasks. Confin-
ing intelligent machines to such tasks can severely limit our ability to truly
harness the potential of AI to enable us to tackle larger scientific problems.
It is time to take a quantum leap. Future scientific endeavors will require
partnerships of scientists and AI, where machines may independently pursue
substantial aspects of the research and contribute their own discoveries. Such
thoughtful AI systems (Gil, 2017) should be capable of formulating their
own research goals, proposing and evaluating hypotheses, designing theories,
debating alternative options, and generating new knowledge. They should
be able to explain their reasoning, compare their rationales to others, and
situate their findings in the existing literature. AI systems should be able to
communicate with scientists with different levels of expertise in a topic. To
form a true partnership, they should be able to take guidance from scientists
as well as to provide guidance to them. Today, this vision remains out of
reach, and new research is required to make it happen.
The following sections delve into the challenges this vision entails and
how it could be accomplished from a language-driven research perspective.
2. Scientific AI will be language-driven
As part of the scientific task forces of the future, AI systems will need
to exchange feedback with human scientists and learn from their interaction.
Rather than fixed, structured formalisms to represent scientific knowledge,
which can be brittle and constrained by our ability to represent things
explicitly, we propose a natural language-driven approach where language is
the main formalism to represent and exchange scientific information between
the different agents in the scientific ecosystem, be they humans or machines.
Generative language models like GPT-3 (Brown et al., 2020) or T5 (Raffel et al.,
2020) produce realistic human text based on a statistical bias acquired through
self-supervised training over an extremely large document corpus, learning
to guess the word that is most likely to come next given a prompt, with ap-
plications in many language tasks like information extraction, reading com-
prehension and question answering, conversation, summarization or machine
translation. Such models promote a change of paradigm in NLP, from “pre-
train, fine-tune, predict” to “pre-train, prompt, predict”, where a prompt is
a piece of text inserted in the input examples so that the task that needs to
be solved can be formulated as a language modeling problem. Subsequently,
prompt-based prediction (Gao et al., 2021; Schick and Schütze, 2021) seeks
to specify such prompts as effectively as possible.
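To make the "pre-train, prompt, predict" idea concrete, the following is a minimal sketch of cloze-style prompt-based prediction. It is an illustration under stated assumptions, not the method of the cited works: the template wording, the `[MASK]` convention, the verbalizer, and in particular `toy_mask_scores` (a keyword-counting stand-in for a real pre-trained language model) are all hypothetical.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # placeholder the language model is asked to fill (assumed convention)


def build_prompt(text: str, template: str = "{text} Overall, it was " + MASK + ".") -> str:
    """Insert the input into a template so the task becomes a fill-in-the-blank problem."""
    return template.format(text=text)


def toy_mask_scores(prompt: str, candidates: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a pre-trained LM: scores each candidate filler
    by counting crude keyword cues in the prompt. A real system would query a model."""
    cues = {
        "great": {"novel", "rigorous", "convincing"},
        "terrible": {"flawed", "unclear", "unsupported"},
    }
    words = set(prompt.lower().replace(".", " ").replace(",", " ").split())
    return {c: float(len(cues.get(c, set()) & words)) for c in candidates}


def prompt_predict(
    text: str,
    verbalizer: Dict[str, str],
    score_fn: Callable[[str, List[str]], Dict[str, float]] = toy_mask_scores,
) -> str:
    """Map the best-scoring filler word back to a task label via the verbalizer."""
    prompt = build_prompt(text)
    scores = score_fn(prompt, list(verbalizer))
    best_word = max(scores, key=scores.get)
    return verbalizer[best_word]


# Verbalizer: filler word -> task label (illustrative choice).
VERBALIZER = {"great": "positive", "terrible": "negative"}
```

Calling `prompt_predict("The analysis is rigorous and convincing.", VERBALIZER)` routes a classification task through a language-modeling question. Swapping `toy_mask_scores` for a real model's scores over the candidate words is the only change a practical system would need in this sketch.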
We posit that the task of formulating research goals, hypotheses, and
claims by machines in natural language, as well as the evaluation of those
produced by other scientists, can be recast into a series of instructions and
prompts in natural language that inform the model. However, no research
has explored this path yet. Generative language models and prompt-based
prediction are promising but still in their infancy. Scientific tasks like
the formulation of hypotheses, goals, and claims require a level of
knowledge, abstract thinking, and reasoning that only humans have been
capable of so far. Moreover, there are no datasets that enable the evaluation
and testing of systems that aim to solve such tasks at human level.
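As a purely illustrative sketch of what recasting these tasks into instructions and prompts might look like, the code below assembles two natural language prompts: one asking a model to formulate a hypothesis and one asking it to evaluate a hypothesis against evidence. The function names, the instruction wording, and the verdict labels are assumptions made for this example. No model is called, and nothing here is claimed to reach the human-level capability discussed above.

```python
from typing import List


def formulation_prompt(research_goal: str, findings: List[str]) -> str:
    """Build an instruction prompt asking a model to propose a hypothesis (hypothetical format)."""
    lines = [
        "Instruction: propose one testable hypothesis for the research goal below.",
        f"Research goal: {research_goal}",
        "Known findings:",
    ]
    lines += [f"- {finding}" for finding in findings]
    lines.append("Hypothesis:")  # the model is expected to continue from here
    return "\n".join(lines)


def evaluation_prompt(hypothesis: str, evidence: List[str]) -> str:
    """Build an instruction prompt asking a model to assess a hypothesis (hypothetical format)."""
    lines = [
        "Instruction: assess whether the evidence supports the hypothesis and explain why.",
        f"Hypothesis: {hypothesis}",
        "Evidence:",
    ]
    lines += [f"- {item}" for item in evidence]
    lines.append("Verdict (supported / refuted / undetermined):")
    return "\n".join(lines)
```

The two prompts can be chained: the text a model produces after `Hypothesis:` becomes the `hypothesis` argument of `evaluation_prompt`, which mirrors the formulate-then-evaluate cycle described in the paragraph above.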
3. ...but also multi-modal
Although we propose language as the main representation and exchange
formalism for scientific AI systems, scientific knowledge is heterogeneous and
can present itself in many forms. As originally put by Reddy (1988),
“Reading a chapter in a college freshman text and answering the questions
at the end of the chapter is a hard problem that requires advances in
vision, language, problem-solving, and learning theory.” As of today, this
is still one of the grand challenges to be tackled in AI.
Like many other manifestations of human thought, scientific discourse
usually adopts the form of a narrative, a scientific publication or techni-
cal report, where related information is presented in mutually supportive
ways over different modalities, including text, diagrams, figures, mathemat-
ical equations or tables, which need to be accounted for, represented, and
understood across the different modalities. Visually grounded language and
visual reasoning are frequent in science. However, dealing with scientific
visual information entails additional complexity compared to natural images.
For example, scientific diagrams are more abstract and symbolic than
natural images, hindering the application of conventional language and vi-
sion understanding methods. Some approaches like (Kembhavi et al., 2016)
propose to parse diagram components and connectors as a graph that can be