Towards Language-driven Scientific AI

arXiv:2210.15327v2 [cs.CL] 31 Oct 2022

José Manuel Gómez-Pérez

Language Technology Research Lab, Expert.ai, 3 Poeta Joan Maragall, Madrid, 28020, Spain

Abstract

Inspired by recent and revolutionary developments in AI, particularly in language understanding and generation, we set about designing AI systems that are able to address complex scientific tasks that challenge human capabilities to make new discoveries. Central to our approach is the notion of natural language as core representation, reasoning, and exchange format between scientific AI and human scientists. In this paper, we identify and discuss some of the main research challenges to accomplish such a vision.

Keywords: Science, Artificial Intelligence, Language Understanding

1. Introduction

During her presidential address at the AAAI Conference, Gil (2022) pondered whether artificial intelligence (AI) will write scientific papers in the future. She believed that we can be hopeful that the answer is yes and that it may happen sooner than we might expect. As scientific questions become significantly more complex, our capabilities to make scientific breakthroughs need to be augmented. Compare, for instance, the challenges of formulating Kepler's laws of planetary motion or the discovery of a cure for polio with demonstrating the existence of binary stellar-mass black hole systems (Abbott et al., 2016) or the treatment of glioblastoma, a type of brain cancer. While the former were achieved by a single scientist, the latter require large, interdisciplinary teams in which hundreds of scientists from different fields work together for years to produce results.

Preprint submitted to arXiv, November 1, 2022

In this paper, we present a personal perspective inspired by recent breakthroughs in AI and particularly language technologies to enable a next generation of AI systems that may become an effective part of the scientific ecosystem, collaborate, contribute, and eventually produce significant findings (Kitano, 2016). In recent years, the incorporation of intelligent techniques for data mining and machine learning has provided scientists with powerful data-driven analytics and discovery capabilities. However, such techniques have been focused on solving well-defined narrow tasks. Confining intelligent machines to such tasks can severely limit our ability to truly harness the potential of AI to enable us to tackle larger scientific problems.

It is time to take a quantum leap. Future scientific endeavors will require partnerships of scientists and AI, where machines may independently pursue substantial aspects of the research and contribute their own discoveries. Such thoughtful AI systems (Gil, 2017) should be capable of formulating their own research goals, proposing and evaluating hypotheses, designing theories, debating alternative options, and generating new knowledge. They should be able to explain their reasoning, compare their rationales to others, and situate their findings in the existing literature. AI systems should be able to communicate with scientists with different levels of expertise in a topic. To form a true partnership, they should be able to take guidance from scientists as well as provide guidance to them. Today, this vision remains out of reach, and new research is required to make it happen.

The following sections delve into the challenges this vision entails and how it could be accomplished from a language-driven research perspective.

2. Scientific AI will be language-driven

As part of the scientific task forces of the future, AI systems will need to exchange feedback with human scientists and learn from their interaction. Rather than fixed, structured formalisms to represent scientific knowledge, which can be brittle and constrained by our ability to represent things explicitly, we propose a natural language-driven approach where language is the main formalism to represent and exchange scientific information between the different agents in the scientific ecosystem, be they humans or machines.

Generative language models like GPT-3 (Brown et al., 2020) or T5 (Raffel et al., 2020) produce realistic human text based on a statistical bias acquired through self-supervised training over an extremely large document corpus, learning to guess the word that is most likely to come next given a prompt. They have applications in many language tasks, such as information extraction, reading comprehension and question answering, conversation, summarization, and machine translation. Such models promote a change of paradigm in NLP, from "pre-train, fine-tune, predict" to "pre-train, prompt, predict", where a prompt is a piece of text inserted in the input examples so that the task that needs to be solved can be formulated as a language modeling problem. Accordingly, prompt-based prediction (Gao et al., 2021; Schick and Schütze, 2021) seeks to specify such prompts as effectively as possible.
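To make the "pre-train, prompt, predict" idea concrete, the following minimal Python sketch recasts a two-class task as filling a blank in a prompt. Everything in it is an illustrative assumption rather than part of the cited work: the template, the label words (the "verbalizer"), and in particular `toy_score_fill`, a keyword-counting stand-in for the score a real pre-trained language model would assign to a candidate word.

```python
from typing import Callable, Dict

# Hypothetical template: the input is wrapped in text that ends in a blank slot.
TEMPLATE = "Review: {text} Overall, it was [MASK]."

# Hypothetical verbalizer: one candidate filler word per task label.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def build_prompt(text: str, template: str = TEMPLATE) -> str:
    """Insert the input into the template so the task becomes a fill-in-the-blank problem."""
    return template.format(text=text)


def predict(
    text: str,
    score_fill: Callable[[str, str], float],
    verbalizer: Dict[str, str] = VERBALIZER,
) -> str:
    """Return the label whose filler word receives the highest score for the blank."""
    prompt = build_prompt(text)
    scores = {label: score_fill(prompt, word) for label, word in verbalizer.items()}
    return max(scores, key=scores.get)


def toy_score_fill(prompt: str, word: str) -> float:
    """Assumed stand-in for a language model: counts hand-picked cue words.

    A real system would instead return the model's probability of `word`
    at the [MASK] position.
    """
    cues = {"great": ("enjoyed", "loved"), "terrible": ("boring", "hated")}
    lowered = prompt.lower()
    return float(sum(lowered.count(cue) for cue in cues[word]))


if __name__ == "__main__":
    print(build_prompt("I loved this film."))
    print(predict("I loved this film.", toy_score_fill))
```

The point of the sketch is the separation of concerns: the task-specific knowledge lives in the template and verbalizer, which is what prompt-based prediction tries to specify well, while the scoring function can be swapped for any pre-trained model without changing the rest.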

We posit that the task of formulating research goals, hypotheses, and claims by machines in natural language, as well as the evaluation of those produced by other scientists, can be recast into a series of instructions and prompts in natural language that inform the model. However, there is no research that has explored this path yet. Generative language models and prompt-based prediction are promising but still in their infancy; scientific tasks like the formulation of hypotheses, goals, and claims require a level of knowledge, abstract thinking, and reasoning that only humans have so far been capable of; and there are no datasets that enable the evaluation and testing of systems that aim to solve such tasks at human level.
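As a rough illustration of what recasting hypothesis formulation into instructions and prompts might look like, the sketch below assembles an instruction-style prompt from a research goal and a list of observations. The `HypothesisTask` structure, the field names, and the prompt wording are all hypothetical choices made for this example; the paper does not prescribe a format, and the example content is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HypothesisTask:
    """Hypothetical container for the inputs of a hypothesis-formulation prompt."""
    research_goal: str
    observations: List[str]


def to_instruction_prompt(task: HypothesisTask) -> str:
    """Render the task as plain natural-language text that a generative model could continue."""
    lines = [
        "Instruction: Propose one testable hypothesis that addresses the research goal.",
        f"Research goal: {task.research_goal}",
        "Observations:",
    ]
    lines.extend(f"- {observation}" for observation in task.observations)
    # The prompt ends with an open slot; the model's continuation is the hypothesis.
    lines.append("Hypothesis:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example content, used only to show the rendered prompt.
    example = HypothesisTask(
        research_goal="Explain the variation in measurement X across samples.",
        observations=["Observation A (placeholder).", "Observation B (placeholder)."],
    )
    print(to_instruction_prompt(example))
```

Evaluating a hypothesis produced by another scientist could be framed the same way, by changing the instruction line and supplying the hypothesis as an additional field.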

3. ...but also multi-modal

Although we propose language as the main representation and exchange formalism for scientific AI systems, scientific knowledge is heterogeneous and can present itself in many forms. As originally put by Reddy (1988), "Reading a chapter in a college freshman text and answering the questions at the end of the chapter is a hard problem that requires advances in vision, language, problem-solving, and learning theory." As of today, this is still one of the grand challenges to be tackled in AI.

Like many other manifestations of human thought, scientific discourse usually adopts the form of a narrative, a scientific publication or technical report, where related information is presented in mutually supportive ways over different modalities, including text, diagrams, figures, mathematical equations, or tables, which need to be accounted for, represented, and understood across the different modalities. Visually grounded language and visual reasoning are frequent in science. However, dealing with scientific visual information entails additional complexity compared to natural images.

For example, scientific diagrams are more abstract and symbolic than natural images, hindering the application of conventional language and vision understanding methods. Some approaches, like Kembhavi et al. (2016), propose to parse diagram components and connectors as a graph that can be
