Snapshot of Algebraic Vision
Joe Kileel and Kathlén Kohn
In honor of Bernd Sturmfels’ 60th birthday
Abstract. In this survey article, we present interactions between algebraic
geometry and computer vision, which have recently come under the header of
algebraic vision. The subject has given new insights in multiple view geometry
and its application to 3D scene reconstruction and carried a host of novel
problems and ideas back into algebraic geometry.

2020 Mathematics Subject Classification. Primary 68T45, 14Q20, 13P25; Secondary 13P15, 65H14, 13P10.
J.K. is supported in part by NSF awards DMS-2309782 and IIS-2312746, and start-up grants from the Department of Mathematics and Oden Institute at UT Austin.
K.K. is supported in part by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
Computer vision is the research field that studies how computers can gain un-
derstanding from 2D images and videos, similar to human cognitive abilities. Typ-
ical computer vision tasks include the automatic recognition of objects in images,
the detection of events in videos, and the reconstruction of 3D scenes from many
given 2D images. A general overview of computer vision is presented in textbook
form in [172]. The subject is a pillar in the AI revolution.
Algebraic vision is the symbiosis of computer vision and algebraic geometry.
Motivated by Chris Aholt’s Ph.D. thesis titled Polynomials in Multiview Geometry
[9] and earlier works, the term “algebraic vision” was coined during a particular
lunch held at a Seattle office of Google in early spring 2014, attended by Sameer
Agarwal, Chris Aholt, Joe Kileel, Hon-Leung Lee, Max Lieblich, Bernd Sturmfels,
and Rekha Thomas. The intent was to encourage interactions between the applied
algebraic geometry community and the 3D reconstruction community in computer
vision. A short discussion of algebraic vision can be found in the review [34] on
nonlinear algebra and its applications.
Historically, computer vision made substantial use of projective geometry and
computational algebra in parts of its foundations. Specifically multiple view geom-
etry, as described in the textbook [94] of Hartley and Zisserman, is modeled on
projective three-space and two-space and group-equivariant (multi-)linear transfor-
mations between these. Similar algebraic treatments of the subject are the text-
books [133] and [137]. Previously, this connection was not well-appreciated by the
computational algebraic geometry community. However, in the last decade, algebro-
geometric papers and workshops on 3D reconstruction have been appearing, leading
to novel results in multiple view geometry while motivating developments in applied
algebraic geometry.
The present article provides a survey of algebraic vision. No previous knowledge
of computer vision is assumed, and the prerequisites for computational algebraic
geometry are kept mostly to the level of undergraduate texts [51]. Due to space lim-
itations, the article makes no attempt to be comprehensive in any way, but instead
it focuses narrowly on the role of projective varieties and systems of polynomial
equations in 3D vision. An outline of the sections is as follows:
In Section 1, we introduce the problem of 3D scene reconstruction from
unknown cameras and its algebro-geometric nature.
In Section 2, we discuss a variety of commonly used camera models.
In Section 3, we study multiview varieties which characterize feasible im-
ages of points under fixed cameras. Their defining equations play a key
role in 3D reconstruction algorithms, and their Euclidean distance degrees
measure the intrinsic complexity of noisy triangulation (i.e., the task of
recovering the 3D coordinates of a point observed by known cameras).
In Section 4, we consider the space of all cameras. We explain how tuples
of cameras – up to changes of world coordinates – can be encoded via
multifocal tensors [94].
In Section 5, we overview the most popular algorithmic pipeline to solve
3D scene reconstruction, highlighting minimal problems that are the algebro-
geometric heart of the pipeline.
In Section 6, we describe polynomial solvers for minimal problems, focusing
on Gröbner basis methods using elimination templates and homotopy
continuation. These methods apply to zero-dimensional parameterized
polynomial systems in general.
In Section 7, we discuss algebro-geometric approaches to understand de-
generate world scenes and image data, where uniqueness of reconstruction
breaks down and algorithms can encounter difficulty.
After reading Sections 1 and 2, the other sections are essentially independent;
only Section 6 builds on Section 5. We provide specific pointers to earlier sections
in case of partial dependencies.
Some important topics in algebraic vision that are omitted include group syn-
chronization (e.g., [161, 127]), uses of polynomial optimization (e.g., [104, 50, 6,
190, 44]), and approaches based on differential invariants (e.g., [42, 30]). Readers
may consult [147] for a survey that covers numerical and large-scale optimization
aspects in 3D reconstruction.
Acknowledgements. We thank Sameer Agarwal, Paul Breiding, Luca Car-
lone, Tim Duff, Hongyi Fan, Fredrik Kahl, Anton Leykin, Tomas Pajdla, Jean
Ponce, Kristian Ranestad, Felix Rydell, Elima Shehu, Rekha Thomas, Matthew
Trager and Uli Walther for their comments on earlier versions of the manuscript.
1. Computer vision through the algebraic lens
One of the main challenges in computer vision is the structure-from-motion
(SfM) problem: given many 2D images, the task is to reconstruct the 3D scene
and also the positions of the cameras that took the pictures. This has many
applications, such as 3D mapping from images taken by drones [158], localizing and
navigating autonomous cars and robots in a 3D world [83], reconstructing 3D
backgrounds in the movie industry [107], photo tourism [2], and combining real
and virtual worlds [60].
The structure-from-motion problem is typically solved using the 3D reconstruc-
tion pipeline. We will now sketch a highly simplified version of that pipeline, il-
lustrated in Figure 1. We provide more details in Section 5.1. Given a set of 2D
images, the first step in the pipeline is to take a few of the given images and identify
geometric features, such as points or lines, that they have in common. In Figure 1b,
a detection algorithm has been used that only identifies points. In the second step
of the pipeline, we forget the original images and only keep the geometric features
we have identified. We reconstruct the 3D coordinates of those features and also the
camera poses, that is, the locations and orientations of the cameras. In Figure 1c,
five common points were identified on two images, so we aim to reconstruct the
five points in 3-space and the two cameras. Finally, we repeat this process several
times until we have recovered all cameras and also enough geometric features to
approximate the 3D scene.
Figure 1. 3D reconstruction pipeline (courtesy of Tomas Pajdla): (a) input images; (b) image matching; (c) reconstruction of cameras and 3D points; (d) output.
As the second step of the pipeline forgets the pictures and only works with
algebro-geometric features, such as points or lines, the reconstruction problem be-
comes purely algebraic. More specifically, we aim to compute a fiber of the joint
camera map:
\[ \Phi : \mathcal{X} \times \mathcal{C}^m \dashrightarrow \mathcal{Y}, \tag{1.1} \]
that maps an arrangement $X \in \mathcal{X}$ of 3D features and a tuple $(C_1, \dots, C_m) \in \mathcal{C}^m$ of
cameras to the $m$ 2D images of $X$ taken by the cameras. For instance in Figure 1c,
the joint camera map becomes
\[ \Phi : (\mathbb{R}^3)^5 \times \mathcal{C}^2 \dashrightarrow (\mathbb{R}^2)^5 \times (\mathbb{R}^2)^5. \tag{1.2} \]
A full specification of the joint camera map requires a choice of camera model.
The simplest model is a pinhole camera; see Figure 2. Such a camera simply takes
a picture of a point in space by projecting it onto a plane. A pinhole camera in
standard position is typically assumed to be centered at the origin such that its
image plane is $H = \{(x, y, z) \in \mathbb{R}^3 \mid z = 1\}$. In these coordinates, the pinhole
camera is the map
\[ \mathbb{R}^3 \dashrightarrow H, \quad (x, y, z) \longmapsto \left( \tfrac{x}{z}, \tfrac{y}{z}, 1 \right). \]
Figure 2. A pinhole camera in standard position is centered at $c = (0, 0, 0)$ and maps world points $(x, y, z)$ to image points $\left( \tfrac{x}{z}, \tfrac{y}{z}, 1 \right)$ on the image plane $H$.
Often homogeneous coordinates are used to model cameras. This means that
each point in the image plane is identified with the light ray passing through the
point and the origin. In homogeneous coordinates, the standard pinhole camera in
Figure 2 becomes
\[ \mathbb{P}^3_{\mathbb{R}} \dashrightarrow \mathbb{P}^2_{\mathbb{R}}, \quad [x : y : z : w] \longmapsto [x : y : z]. \]
This map is defined everywhere except at the camera center $[0 : 0 : 0 : 1]$, i.e., the
origin in the affine chart where $w = 1$.
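To make the two descriptions concrete, here is a minimal numerical sketch (ours, in Python with NumPy; not code from the article) checking that the affine pinhole map and its homogeneous version agree away from the camera center:

```python
import numpy as np

def project_affine(X):
    """Standard pinhole camera R^3 --> H, (x, y, z) |-> (x/z, y/z, 1); needs z != 0."""
    x, y, z = X
    return np.array([x / z, y / z, 1.0])

def project_homogeneous(X_hom):
    """Homogeneous version [x : y : z : w] |-> [x : y : z]; undefined at [0:0:0:1]."""
    return X_hom[:3]

X = np.array([2.0, 4.0, 2.0])        # a world point with z != 0
X_hom = np.append(X, 1.0)            # its homogeneous coordinates [x : y : z : 1]

img = project_affine(X)              # (1, 2, 1)
img_hom = project_homogeneous(X_hom) # [2 : 4 : 2], a representative in R^3 \ {0}

# Dehomogenizing [2 : 4 : 2] by its last coordinate recovers (1, 2, 1):
assert np.allclose(img, img_hom / img_hom[-1])
```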
The projective geometry approach to modeling cameras is thoroughly explained
in the textbook [94]. That book laid many foundations and conventions used in
modern computer vision and offers a great entry point for the algebraic community
into the field of computer vision. The main focus of the book [94] is multiview
geometry, where a 3D object is viewed by several cameras, such as in Figure 1.
In that setting, we cannot assume that all cameras are in standard position as
described above. Instead, a pinhole camera is more generally given by a $3 \times 4$
matrix $A$ of rank three. The corresponding camera map
\[ \mathbb{P}^3_{\mathbb{R}} \dashrightarrow \mathbb{P}^2_{\mathbb{R}}, \quad X \longmapsto AX \]
is defined everywhere except at the camera center, which is given by the kernel of $A$.
The standard camera in Figure 2 corresponds to the matrix $\left[ \begin{smallmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{smallmatrix} \right]$.
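As a quick sanity check (our sketch, not from the article), the camera center can be computed numerically as the kernel of $A$, e.g., from the singular value decomposition:

```python
import numpy as np

# For a rank-three 3x4 camera matrix A, ker(A) is one-dimensional and is
# spanned by the right-singular vector of A for its zero singular value.
A = np.array([[1.0, 0.0, 0.0, 0.0],   # the standard camera from Figure 2
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

_, _, Vt = np.linalg.svd(A)
center = Vt[-1]                        # last row of V^T spans ker(A)

assert np.allclose(A @ center, 0.0)    # center is proportional to (0, 0, 0, 1)
```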
Hence, when using pinhole cameras and homogeneous coordinates, the camera
variety $\mathcal{C}^m$ in (1.1) that describes all $m$-tuples of such cameras is
\[ \mathcal{C}^m = (\mathbb{P}\,\mathrm{Mat}^{3 \times 4}_3)^m, \]
where $\mathrm{Mat}^{3 \times 4}_3 \subseteq \mathbb{R}^{3 \times 4}$ denotes the set of $3 \times 4$ matrices of rank three. For instance,
the joint camera map in (1.2) becomes
\begin{align*} \Phi : (\mathbb{P}^3_{\mathbb{R}})^5 \times (\mathbb{P}\,\mathrm{Mat}^{3 \times 4}_3)^2 &\dashrightarrow (\mathbb{P}^2_{\mathbb{R}})^5 \times (\mathbb{P}^2_{\mathbb{R}})^5, \\ (X_1, \dots, X_5, A_1, A_2) &\longmapsto (A_1 X_1, \dots, A_1 X_5, A_2 X_1, \dots, A_2 X_5). \end{align*}
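The following sketch (our illustration; the random setup is ours) evaluates this joint camera map on generic data:

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_camera_map(points, cameras):
    """Phi: maps (X_1..X_5, A_1, A_2) to (A_1 X_1, ..., A_1 X_5, A_2 X_1, ..., A_2 X_5)."""
    return [A @ X for A in cameras for X in points]

# Five world points in homogeneous coordinates [x : y : z : 1].
points = [np.append(rng.standard_normal(3), 1.0) for _ in range(5)]

# Two random 3x4 camera matrices; these have rank three generically.
cameras = [rng.standard_normal((3, 4)) for _ in range(2)]

images = joint_camera_map(points, cameras)
print(len(images))   # 10 image points, each a representative of a point in P^2
```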
In the next section, we review common camera models and highlight algebraic
vision articles studying camera geometry. In the remaining sections, our focus
returns to the joint camera map in (1.1): We will see that many computer vision
problems can be formulated using the joint camera map – such as understanding
the image of a shape in space or reconstructing a 3D shape from several images –
and are thus natural to study through the algebraic lens. The recent paper [1] gives
a similar unifying algebro-geometric framework for computer vision problems.
2. Camera models
Calibrated cameras. The camera model described in the previous section
is known as the projective / uncalibrated pinhole camera. The calibrated pinhole
camera model assumes that every camera is obtained from the standard pinhole
camera in Figure 2 by translation and rotation. This means that every camera
matrix $A$ is of the form $[R \mid t]$ where $R \in \mathrm{SO}(3)$ is the relative rotation from the
standard pinhole camera to the camera with matrix $A$ and the relative translation
can be read off from the vector $t \in \mathbb{R}^3$: the camera center $c$, which is the origin
in Figure 2, is now $c = -R^\top t$ (note that the vector $(c, 1) \in \mathbb{R}^4$ spans the kernel
of the camera matrix $[R \mid t]$). In particular, every calibrated pinhole camera has 6
degrees of freedom (3 for $R$ and 3 for $t$), whereas a projective pinhole camera has
11 degrees of freedom.
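A minimal numerical check (our sketch) that $(c, 1)$ with $c = -R^\top t$ indeed spans the kernel of $[R \mid t]$:

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],   # a rotation about the z-axis
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, -1.0, 2.0])

A = np.hstack([R, t[:, None]])    # the 3x4 calibrated camera matrix [R | t]
c = -R.T @ t                      # camera center: R c + t = -R R^T t + t = 0

assert np.allclose(A @ np.append(c, 1.0), 0.0)
```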
Calibrated pinhole cameras are a commonly used model in applications, corre-
sponding to the case when the internal parameters of the cameras are known (such
as from meta data stored inside the image file). There is also a variety of partly
calibrated pinhole cameras, e.g., a camera with unknown focal length, that impose less
strict structural assumptions on the $3 \times 4$ camera matrices than the fully calibrated
model described above. Partly calibrated pinhole cameras are modeled as $K[R \mid t]$,
where $K$ is a $3 \times 3$ upper triangular calibration matrix whose entries are partially
known [94, Chapter 6].
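For concreteness, a common parameterization of the calibration matrix (cf. [94, Chapter 6]) is
\[ K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}, \]
where $\alpha_x, \alpha_y$ are the focal lengths in pixel units along the two image axes, $s$ is the skew, and $(x_0, y_0)$ is the principal point. A camera with unknown focal length, for instance, corresponds to assuming $s = 0$, a known principal point, and $\alpha_x = \alpha_y = f$ with $f$ unknown.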
Distortion. In practice, cameras are not as ideal as in the calibrated model.
As seen in Figure 2, the pinhole cameras described so far assume that the world
point, the camera center, and the image point are collinear. This assumption does
not hold for real-life camera lenses, because they are affected by various kinds of
distortion. The main factor of deviation from the idealistic pinhole camera model
is typically radial distortion; see Figure 3.
Often, calibrated cameras are a sufficient approximation of real-life cameras.
However, sometimes the impact of radial distortion is too large to ignore, e.g., for fisheye
cameras. One approach to address radial distortion is to make the camera model
more complicated by adding distortion parameters that have to be estimated during
3D reconstruction (see [94, Chapter 7.4] for an overview and [106] for an algebraic
treatment of distortion varieties). Another approach is to simplify the camera model
by not estimating the radial distortion at all: Once the center of radial distortion on
a given image is determined, we know, for every 3D point, onto which line through
the distortion center it gets mapped by the camera (see Figure 3), although we