Note
This is the website of the course taught in Fall 2023. If you are looking for the website of the course taught in Fall 2024, please click here.
Graduate-level course at ETH Zurich in Autumn Semester 2023
Lecturer:
Christos Sakaridis.
6 ECTS.
Class size limited to 90 students.
Lecture Team
- All
- General
- Project 1
- Project 2
- Website & Forum
Lectures
Date
Time
Room
Slides
Topic
22.09.2023
14:15 - 17:00
Fundamentals of Autonomous Cars
29.09.2023
14:15 - 17:00
Fundamental Computer Vision Architectures and Algorithms
for Autonomous Cars
06.10.2023
14:15 - 17:00
Fundamental Computer Vision Architectures and Algorithms
for Autonomous Cars (continued)
13.10.2023
14:15 - 17:00
Semantic Segmentation
20.10.2023
14:15 - 17:00
Depth Estimation
27.10.2023
14:15 - 17:00
Object Detection
03.11.2023
14:15 - 17:00
Instance Segmentation and Panoptic Segmentation
10.11.2023
14:15 - 17:00
Unimodal 3D Object Detection
17.11.2023
No lecture - CVPR conference deadline
24.11.2023
14:15 - 17:00
3D Reconstruction and Localization
01.12.2023
14:15 - 17:00
Domain Adaptation
08.12.2023
14:15 - 17:00
Multi-modal 2D and 3D Object Detection
(last updated 14.12.23)
15.12.2023
14:15 - 17:00
Visual Grounding, Anomaly Segmentation and
Vehicle-to-Vehicle Communication
22.12.2023
14:15 - 17:00
Multiple Object Tracking and Motion Prediction
Practical Sessions
Date
Time
Room
Slides
Topic
22.09.2023
No practical session
29.09.2023
No practical session
06.10.2023
10:15 - 12:00
Getting Started with Python and SLURM
13.10.2023
10:15 - 12:00
Project 1: Semantic Segmentation and Depth Estimation (Introduction)
20.10.2023
10:15 - 12:00
Project 1: Semantic Segmentation and Depth Estimation (Attention)
27.10.2023
10:15 - 12:00
Project 1: Q&A
03.11.2023
10:15 - 12:00
Project 1: Q&A
10.11.2023
10:15 - 12:00
Project 1: Q&A
17.11.2023
10:15 - 12:00
Project 1: Hand-in
24.11.2023
10:15 - 12:00
Project 2: Introduction
01.12.2023
10:15 - 12:00
Project 2: Q&A
08.12.2023
11:00 - 12:00
Project 2: Q&A
15.12.2023
10:15 - 12:00
Project 2: Q&A
22.12.2023
10:15 - 12:00
Project 2: Hand-in
Abstract
This course introduces the core computer vision techniques and algorithms that autonomous cars use
to perceive the semantics and geometry of their driving environment, localize themselves in it,
and predict its dynamic evolution. Emphasis is placed on techniques tailored for real-world settings,
such as multi-modal fusion, domain-adaptive and outlier-aware architectures, and multi-agent methods.
Objective
Students will learn about the fundamentals of autonomous cars and of the computer vision models and methods these cars use to analyze
their environment and navigate themselves in it. Students will be presented with state-of-the-art representations
and algorithms for semantic, geometric and temporal visual reasoning in automated driving and will gain hands-on experience
in developing computer vision algorithms and architectures for solving such tasks.
After completing this course, students will be able to:
- understand the operating principles of visual sensors in autonomous cars,
- differentiate between the core architectural paradigms and components of modern visual perception models and describe their logic and the role of their parameters,
- systematically categorize the main visual tasks related to automated driving and understand the primary representations and algorithms which are used for solving them,
- critically analyze and evaluate current research in the area of computer vision for autonomous cars,
- practically reproduce state-of-the-art computer vision methods in automated driving,
- independently develop new models for visual perception.
Content
The content of the lectures consists in the following topics:
- Fundamentals
- Fundamentals of autonomous cars and their visual sensors
- Fundamental computer vision architectures and algorithms for autonomous cars
- Semantic perception
- Semantic segmentation
- Object detection
- Instance segmentation and panoptic segmentation
- Geometric perception and localization
- Depth estimation
- 3D reconstruction
- Visual localization
- Unimodal visual/lidar 3D object detection
- Robust perception: multi-modal, multi-domain and multi-agent methods
- Multi-modal 2D and 3D object detection
- Visual grounding and verbo-visual fusion
- Domain-adaptive and outlier-aware semantic perception
- Vehicle-to-vehicle communication for perception
- Temporal perception
- Multiple object tracking
- Motion prediction
Projects
The practical projects involve implementing complex computer vision architectures and algorithms and applying them to real-world, multi-modal driving datasets.
In particular, students will develop models and algorithms for:
- Semantic segmentation and depth estimation,
- 3D object detection using LiDARs.
Prerequisites
Students are expected to have a solid basic knowledge of linear algebra, multivariate calculus, and probability theory,
and a basic background in computer vision and machine learning. All practical projects will require solid background
in programming and will be based on Python and libraries of it such as PyTorch, scikit-learn and scikit-image.
Exam
Examiners:
Christos Sakaridis
A session examination is offered. The mode of the exam is written and its duration is 120 minutes.
The language of examination is English. The performance assessment is only offered in the session after
the course unit. Repetition is only possible after re-enrolling.
The final grade will be calculated from the session examination grade and the overall projects
grade, with each of the two elements weighing 50%. The projects are an integral part of the course,
they are group-based and their completion is compulsory. Receiving a failing overall projects grade results
in a failing final grade for the course. Students who do not pass the projects are required to de-register from the exam.
Written aids for the final exam: two A4 pages (i.e. one A4 sheet of paper), either handwritten or 11-point font size minimum.
Simple non-programmable calculator.