Sacha Morin

I am a PhD student in Machine Learning at Université de Montréal and MILA. I am advised by Guy Wolf in the RAFALES lab and Liam Paull in the Robotics and Embodied AI Lab (REAL).

I obtained a BSc in Mathematics and Computer Science from the Université de Montréal in 2021 and a Bachelor of Law from the Université de Sherbrooke in 2017.  




My research interests include:

  • Foundation models for building multimodal 3D representations.
  • Generative models for mapping and planning.
  • Self-supervised representation learning for topological and visual navigation.

My CV includes more details.



ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Qiao Gu *, Alihusein Kuwajerwala *, Sacha Morin *, Krishna Murthy Jatavallabhula *, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
International Conference on Robotics and Automation (ICRA), 2024
code / arXiv / webpage / bibtex
  author    = {Gu, Qiao and Kuwajerwala, Alihusein and Morin, Sacha and Jatavallabhula, {Krishna Murthy} and  Sen, Bipasha and Agarwal, Aditya and Rivera, Corban and Paul, William and Ellis, Kirsty and Chellappa, Rama and Gan, Chuang and {de Melo}, {Celso Miguel} and Tenenbaum, {Joshua B.} and Torralba, Antonio and Shkurti, Florian and Paull, Liam},
  title     = {ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning},
  journal   = {arXiv},
  year      = {2023},

ConceptGraphs uses off-the-shelf models to build an object-based map from RGB-D images. Objects have associated multi-view fused CLIP features and language captions that can be leveraged by robots to answer abstract queries.
For robots to perform a wide variety of tasks, they require a 3D representation
of the world that is semantically rich, yet compact and efficient for task-driven
perception and planning. Recent approaches have attempted to leverage features from
large vision-language models to encode semantics in 3D representations. However,
these approaches tend to produce maps with per-point feature vectors, which do not
scale well in larger environments, nor do they contain semantic spatial relationships
between entities in the environment, which are useful for downstream planning.
In this work, we propose ConceptGraphs, an open-vocabulary graph-structured
representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models
and fusing their output to 3D by multi-view association. The resulting representations
generalize to novel semantic classes, without the need to collect large 3D datasets or
finetune models. We demonstrate the utility of this representation through a number of
downstream planning tasks that are specified through abstract (language) prompts and
require complex reasoning over spatial and semantic concepts.

One-4-All: Neural Potential Fields for Embodied Navigation
Sacha Morin *, Miguel Saavedra-Ruiz *, Liam Paull
International Conference on Intelligent Robots and Systems (IROS), 2023
code / arXiv / webpage / bibtex
	title        = {One-4-All: Neural Potential Fields for Embodied Navigation},
	author       = {Morin, Sacha and Saavedra-Ruiz, Miguel and Paull, Liam},
	year         = 2023,
	journal      = {arXiv preprint arXiv:2303.04011}

An end-to-end, fully parametric method for image-goal navigation that leverages self-supervised and manifold learning to replace a topological graph with a geodesic regressor. During navigation, the geodesic regressor serves as an attractor in a potential function defined in latent space, which allows navigation to be framed as a minimization problem.
A fundamental task in robotics is to navigate between two locations.
In particular, real-world navigation can require long-horizon planning
using high-dimensional RGB images, which poses a substantial challenge
for end-to-end learning-based approaches. Current semi-parametric methods
instead achieve long-horizon navigation by combining learned modules with
a topological memory of the environment, often represented as a graph over
previously collected images. However, using these graphs in practice typically
involves tuning a number of pruning heuristics to avoid spurious edges, limit
runtime memory usage and allow reasonably fast graph queries. In this work,
we present One-4-All (O4A), a method leveraging self-supervised and manifold
learning to obtain a graph-free, end-to-end navigation pipeline in which the
goal is specified as an image. Navigation is achieved by greedily minimizing
a potential function defined continuously over the O4A latent space. Our system
is trained offline on non-expert exploration sequences of RGB data and controls,
and does not require any depth or pose measurements. We show that O4A can reach
long-range goals in 8 simulated Gibson indoor environments, and further demonstrate
successful real-world navigation using a Jackal UGV platform.
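
The greedy minimization described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the latent space is a 2D grid, `forward_model` is a hand-written stand-in for a learned dynamics model, and Euclidean distance stands in for the learned geodesic regressor. It is not the O4A implementation.

```python
import math

# Hypothetical discrete action set, expressed as displacements in a toy 2D latent space.
ACTIONS = {
    "forward": (0.0, 1.0),
    "right": (1.0, 0.0),
    "left": (-1.0, 0.0),
    "back": (0.0, -1.0),
}


def forward_model(z, action):
    """Toy stand-in for a learned model predicting the next latent code."""
    dx, dy = ACTIONS[action]
    return (z[0] + dx, z[1] + dy)


def potential(z, goal):
    """Attractor-only potential; Euclidean distance replaces the geodesic regressor."""
    return math.dist(z, goal)


def navigate(start, goal, max_steps=50, tol=1e-9):
    """Greedily pick, at each step, the action whose predicted latent has the lowest potential."""
    z, path = start, [start]
    for _ in range(max_steps):
        if potential(z, goal) <= tol:
            break
        best = min(ACTIONS, key=lambda a: potential(forward_model(z, a), goal))
        z = forward_model(z, best)
        path.append(z)
    return path


path = navigate((0.0, 0.0), (2.0, 3.0))
```

In this toy setting the agent reaches the goal without any graph search: the only planning step is comparing the potential of each candidate next state.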

Geometry Regularized Autoencoders (GRAE)
Andres F. Duque *, Sacha Morin *, Guy Wolf, Kevin R. Moon
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
IEEE International Conference on Big Data, 2020
DiffGeo4DL, NeurIPS 2020 Workshop
code / paper / webpage / bibtex
  title={Geometry Regularized Autoencoders},
  author={Duque, Andres F and Morin, Sacha and Wolf, Guy and Moon, Kevin R},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},

Manifold-based regularization for learning better representations in autoencoders.
A fundamental task in data exploration is to extract simplified
low dimensional representations that capture intrinsic geometry
in data, especially for faithfully visualizing data in two or
three dimensions. Common approaches to this task use kernel methods
for manifold learning. However, these methods typically only provide an
embedding of fixed input data and cannot extend to new data points.
Autoencoders have also recently become popular for representation
learning. But while they naturally compute feature extractors that
are both extendable to new data and invertible (i.e., reconstructing
original features from latent representation), they have limited capabilities
to follow global intrinsic geometry compared to kernel-based manifold learning.
We present a new method for integrating both approaches by incorporating a
geometric regularization term in the bottleneck of the autoencoder. Our
regularization, based on the diffusion potential distances from the
recently-proposed PHATE visualization method, encourages the learned latent
representation to follow intrinsic data geometry, similar to manifold learning
algorithms, while still enabling faithful extension to new data and reconstruction
of data in the original feature space from latent coordinates. We compare our
approach with leading kernel methods and autoencoder models for manifold learning
to provide qualitative and quantitative evidence of our advantages in preserving
intrinsic structure, out-of-sample extension, and reconstruction. Our method is easily
implemented for big-data applications, whereas other methods are limited in this regard.
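
A minimal sketch of what a geometry-regularized autoencoder objective might look like, assuming the regularizer penalizes the squared distance between each latent code and a precomputed target embedding (for example, one produced by PHATE). The function names, the weight `lam`, and the sample values are hypothetical and chosen only for illustration.

```python
def squared_error(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def grae_style_loss(inputs, reconstructions, latents, target_embedding, lam=1.0):
    """Mean reconstruction error plus lam times the mean geometric penalty.

    latents: bottleneck codes produced by the encoder.
    target_embedding: precomputed manifold-learning coordinates (assumed given).
    """
    n = len(inputs)
    recon = sum(squared_error(x, r) for x, r in zip(inputs, reconstructions)) / n
    geom = sum(squared_error(z, e) for z, e in zip(latents, target_embedding)) / n
    return recon + lam * geom


# Made-up values: two samples, 3 input features, 2 latent dimensions.
loss = grae_style_loss(
    inputs=[(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)],
    reconstructions=[(1.0, 2.0, 3.0), (0.0, 0.0, 1.0)],
    latents=[(0.0, 0.0), (1.0, 1.0)],
    target_embedding=[(0.0, 0.0), (1.0, 0.0)],
    lam=2.0,
)
```

Because the penalty acts on the encoder output rather than on a fixed set of points, the trained encoder can still embed new data, and the decoder can still map latent coordinates back to the original feature space.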

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
Miguel Saavedra-Ruiz *, Sacha Morin *, Liam Paull
Conference on Robotics and Vision (CRV), 2022
code (model) / code (servoing) / arXiv / webpage / duckietown coverage / poster / youtube [FR] / bibtex
	title        = {Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers},
	author       = {Saavedra-Ruiz, Miguel and Morin, Sacha and Paull, Liam},
	year         = 2022,
	journal      = {arXiv preprint arXiv:2203.03682}

Visual servoing navigation using pretrained self-supervised Vision Transformers.
In this work, we consider the problem of learning a perception model for
monocular robot navigation using few annotated images. Using a Vision
Transformer (ViT) pretrained with a label-free self-supervised method, we
successfully train a coarse image segmentation model for the Duckietown
environment using 70 training images. Our model performs coarse image
segmentation at the 8x8 patch level, and the inference resolution can be
adjusted to balance prediction granularity and real-time perception constraints.
We study how best to adapt a ViT to our task and environment, and find that some
lightweight architectures can yield good single-image segmentations at a usable
frame rate, even on CPU. The resulting perception model is used as the backbone
for a simple yet robust visual servoing agent, which we deploy on a differential
drive mobile robot to perform two tasks: lane following and obstacle avoidance. 
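
To make the resolution trade-off concrete, the sketch below counts how many 8x8 patch predictions a square input produces. The helper name and the resolutions are illustrative assumptions, not values taken from the paper.

```python
def patch_grid(height, width, patch_size=8):
    """Return (rows, cols) of the patch-level prediction grid for an image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of the patch size")
    return height // patch_size, width // patch_size


# Illustrative resolutions only: halving each side quarters the number of predictions.
for side in (480, 240, 120):
    rows, cols = patch_grid(side, side)
    print(f"{side}x{side} input -> {rows}x{cols} grid, {rows * cols} patch predictions")
```

Lower resolutions give a coarser segmentation mask but fewer patches for the transformer to process, which is the granularity versus frame rate balance mentioned in the abstract.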

Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes
Paré et al.
PLOS One, 2021
paper / bibtex
  title={Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes},
  author={Par{\'e}, Bastien and Rozendaal, Marieke and Morin, Sacha and Kaufmann, L{\'e}a and Simpson, Shawn M and Poujol, Rapha{\"e}l and Mostefai, Fatima and Grenier, Jean-Christophe and Xing, Henry and Sanchez, Miguelle and others},
  journal={PloS one},
  publisher={Public Library of Science San Francisco, CA USA}

Updated March 19, 2024

Template from here, here and here.