Conversing with Copilot: Exploring Prompt Engineering for
Solving CS1 Problems Using Natural Language
Paul Denny
p.denny@auckland.ac.nz
University of Auckland
Auckland, New Zealand
Viraj Kumar
viraj@iisc.ac.in
Indian Institute of Science
Bengaluru, India
Nasser Giacaman
n.giacaman@auckland.ac.nz
University of Auckland
Auckland, New Zealand
ABSTRACT
GitHub Copilot is an articial intelligence model for automatically
generating source code from natural language problem descriptions.
Since June 2022, Copilot has ocially been available for free to all
students as a plug-in to development environments like Visual
Studio Code. Prior work exploring OpenAI Codex, the underlying
model that powers Copilot, has shown it performs well on typical
CS1 problems thus raising concerns about the impact it will have
on how introductory programming courses are taught. However,
little is known about the types of problems for which Copilot does
not perform well, or about the natural language interactions that a
student might have with Copilot when resolving errors. We explore
these questions by evaluating the performance of Copilot on a
publicly available dataset of 166 programming problems. We find
that it successfully solves around half of these problems on its very
first attempt, and that it solves 60% of the remaining problems
using only natural language changes to the problem description.
We argue that this type of prompt engineering, which we believe
will become a standard interaction between human and Copilot
when it initially fails, is a potentially useful learning activity that
promotes computational thinking skills, and is likely to change the
nature of code writing skill development.
CCS CONCEPTS
• Applied computing → Education; • Social and professional topics; • Software and its engineering → Designing software.
KEYWORDS
OpenAI, GitHub Copilot, foundation models, large language models,
CS1, articial intelligence, introductory programming.
ACM Reference Format:
Paul Denny, Viraj Kumar, and Nasser Giacaman. 2023. Conversing with
Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using
Natural Language. In SIGCSE ’23: ACM Technical Symposium on Computer
Science Education, March 15–18, 2023, Toronto, Ontario, Canada. ACM, New
York, NY, USA, 7 pages.
1 INTRODUCTION
Recent breakthroughs in deep learning have led to the emergence
of transformer language models that exhibit extraordinary perfor-
mance at generating novel human-like content such as text (e.g.,
SIGCSE ’23, March 15–18, 2023, Toronto, Ontario, Canada
©2023 Association for Computing Machinery.
This is the author’s version of the work. It is posted here for your personal use. Not
for redistribution. The denitive Version of Record was published in SIGCSE ’23:
ACM Technical Symposium on Computer Science Education, March 15–18, 2023, Toronto,
Ontario, Canada.
GPT-3 [5]), images (e.g., DALL-E [18]) and source code (e.g., Codex [6]).
Producing source code automatically from natural language
prompts promises to greatly improve the efficiency of professional
developers [21], and is being actively explored by groups such as
], and is being actively explored by groups such as
OpenAI (Codex), Amazon (CodeWhisperer) and Google (Alpha-
Code). After less than one year in technical preview, a production
version of Codex called Copilot¹ has recently been released as an
extension for development environments such as Visual Studio
Code. This extension is available for free to students, and claims to
be their “AI pair programmer”. Just how students will adopt and
make use of tools like Copilot is unclear [9], but it seems certain
they will play an increasing role inside and outside the classroom.
Very recent work has shown that these code generation models
are good at solving simple programming tasks. For instance, Finnie-
Ansley et al. evaluated the performance of OpenAI’s Codex on a
private repository of CS1 exam questions, finding that roughly half
of the questions were solved by Codex on its very first attempt [10].
However, very little is known about the types of problems for
which these models tend to fail, or about how students will interact
with code generation tools when such failures occur. One hypothe-
sized interaction that seems very likely is that students will learn
to modify, or engineer, natural language problem descriptions to
guide the model into generating solutions that “work” (at least in
the sense of passing available test cases). Indeed, it is well known
that language model outputs are very sensitive to their inputs [19].
For example, when using Codex to solve probability and statistics
problems, engineering the prompt to include explicit hints on the
strategy for solving a problem is extremely effective [23]. An inter-
esting open question in computing education is how well this kind
of prompt engineering [16] works in practice, and whether it might
be a useful activity for students who are learning to program. In
this paper, we explore the current performance of Copilot and the
ecacy of prompt engineering guided by the following research
questions:
RQ1:
How well does Copilot perform, at the current time, on a
public repository of CS1 programming problems?
RQ2:
To what extent do natural language modifications to the
problem description lead to the generation of successful
solutions after Copilot initially fails?
RQ3:
What commonalities exist amongst problems that Copilot
cannot solve, despite reasonable prompt engineering effort?
2 MOTIVATING EXAMPLE
CodeCheck² is an online autograder that includes a large collec-
tion of publicly accessible programming problems. The goal of
¹ https://github.com/features/copilot
² https://horstmann.com/codecheck/python-questions.html
arXiv:2210.15157v1 [cs.HC] 27 Oct 2022
CodeCheck is to assist instructors in offering students exercises
they can use for self-practice [15].
Consider the following problem which appears as Problem 5
under the “Sums, Averages, Products” category of the “Lists (Simple
Exercises)” group in the Python programming problem bank. The
problem is displayed below exactly as it appears on the CodeCheck
website, with the problem description appearing as a comment
within the code editor, the function signature provided, and the
statement “Your code here...” as a comment prompting the user to
enter their solution:
1  # Given a list of strings and a string s,
2  # return the average length of all strings
3  # containing s.
4
5  def averageContainingStrings(strings, s):
6      # Your code here...
What might happen if a student used Copilot as their “AI pair
programmer” here? We provided the problem description to Copi-
lot (in July 2022, shortly after its general availability release), by
pasting the comments and function header, excluding the “Your
code here...” prompt, into a Visual Studio Code editor with the Copi-
lot extension enabled. The following suggested solution appeared
almost instantly:
1   # Given a list of strings and a string s,
2   # return the average length of all strings
3   # containing s.
4
5   def averageContainingStrings(strings, s):
6       count = 0
7       for string in strings:
8           if s in string:
9               count += 1
10      return len(s) * count / len(strings)
This is, of course, incorrect—the combined length of the matching
strings should be divided by their frequency. The arithmetic above is
somewhat nonsensical. If the student who used Copilot to generate
this solution submitted it to the CodeCheck website for evaluation,
they would see that all of the provided test cases fail.
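To make the failure concrete, the incorrect suggestion (as we transcribe it) can be run against a simple input. The argument values and the expected answer below are our own illustration, not CodeCheck's hidden tests:

```python
# Copilot's incorrect suggestion, transcribed for illustration
def averageContainingStrings(strings, s):
    count = 0
    for string in strings:
        if s in string:
            count += 1
    return len(s) * count / len(strings)

# "apple" (length 5) and "banana" (length 6) contain "a",
# so the correct average length is (5 + 6) / 2 = 5.5.
# The formula above instead computes len("a") * 2 / 3 = 2/3.
result = averageContainingStrings(["apple", "banana", "cherry"], "a")
print(result)
```

Running this makes the mismatch visible immediately, which is exactly what the CodeCheck test cases report.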
One way this error could be resolved is for the student to fix the
arithmetic in the code manually. In this case, Copilot has still served
a useful purpose—it helped the student most of the way to a solu-
tion by producing code that iterates over the list of strings and that
counts the number of matching strings correctly. However, a different
approach, and one that we envisage as becoming increasingly
common, involves modifying the natural language description of
the problem and querying Copilot again. How could the prompt be
engineered to help guide Copilot towards generating a correct solu-
tion? The current (incorrect) solution is not computing the lengths
of the matching strings. Consider the following modification to
the prompt, which adds comment lines 4–6 outlining a simplistic
algorithm for solving the problem. With these modifications in
place, Copilot produced the following solution:
1   # Given a list of strings and a string s,
2   # return the average length of all strings
3   # containing s.
4   # Count the number of strings that contain s
5   # Also, sum the length of all strings containing s
6   # Finally, divide the total length by the count
7
8   def averageContainingStrings(strings, s):
9       count = 0
10      totalLength = 0
11      for string in strings:
12          if s in string:
13              count += 1
14              totalLength += len(string)
15      return totalLength / count
The above solution is correct provided the final value of count is
positive (the CodeCheck website does not specify, or test for, the
return value if none of the strings in strings contain s). The com-
ments added to lines 4–6 of this example illustrate one potentially
valuable pedagogical use of Copilot: describing computational steps
in natural language as a way of guiding code generation models.
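For completeness, a defensive variant of the generated solution can guard the division. This is our own sketch; the return value for the no-match case is an assumption on our part, since CodeCheck leaves that behavior unspecified:

```python
def averageContainingStrings(strings, s):
    # Collect the lengths of all strings that contain s
    matching = [len(string) for string in strings if s in string]
    if not matching:
        return 0.0  # assumed behavior for the unspecified no-match case
    return sum(matching) / len(matching)

print(averageContainingStrings(["apple", "banana", "cherry"], "a"))  # 5.5
print(averageContainingStrings(["cherry"], "z"))                     # 0.0
```

Whether a student would think to add such a guard, or whether a further prompt modification could elicit it from Copilot, is itself an instance of the prompt engineering question we study.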
Although prior work in computing education has shown that
tools like Codex (which powers Copilot) perform well on typical
CS1 problems, little is known about the types of problems for which
they tend to fail. In addition, there is currently no work exploring
prompt engineering as a strategy for students to resolve errors. We
explore both of these ideas using a dataset of publicly accessible
problems, thus establishing a baseline for future evaluations of code
generation models which we expect will rapidly improve.
3 RELATED WORK
Large language models, or foundation models, are deep neural
networks trained with self-supervised learning on broad data sets
at a very large scale [4]. These models can then be adapted, or
fine-tuned, for application to a wide range of tasks including the
generation of natural language, digital images, and source code.
While their ability to generate novel human-like outputs is on
the one hand fascinating, their rapidly increasing deployment has
caused alarm among some researchers and led to calls for better
understanding of their implications and risks [3, 22].
GPT-3, released by OpenAI in May 2020, is a groundbreaking
large language model that is trained to predict the next token in
a text sequence [5]. The Codex model is the result of fine-tuning
GPT-3 with an enormous amount of code samples—159GB of code
from 54 million GitHub repositories [6]. Copilot is a production
version of Codex that has been released as an extension for devel-
opment environments like Visual Studio Code. It became generally
available to all developers in June of 2022, at which time GitHub an-
nounced it would be free for students³. The impact on educational
practice of such technologies is unknown, with arguments on both
sides—highlighting concerns of over-reliance by novices [6], and
suggesting that the ability to synthesize code automatically could
play a revolutionary role in teaching [11].
In the computing education literature, there have been very few
evaluations to date of code generation models. Finnie-Ansley et al.
explored the performance of Codex on a private dataset of CS1 exam
problems and on several common variations of the well-known
³ https://github.blog/2022-06-21-github-copilot-is-generally-available-to-all-developers