Sacha Morin

I am a PhD student in Machine Learning at Université de Montréal and Mila. I am advised by Guy Wolf in the RAFALES lab and Liam Paull in the Robotics and Embodied AI Lab (REAL).

I obtained a BSc in Mathematics and Computer Science from the Université de Montréal in 2021 and a Bachelor of Laws from the Université de Sherbrooke in 2017.

sacha.morin@mila.quebec

Research

My research interests include:

Foundation models for building multimodal 3D representations.
Generative models for mapping and planning.
Self-supervised representation learning for topological and visual navigation.

My CV includes more details.

Featured Work
	One-4-All: Neural Potential Fields for Embodied Navigation
	Sacha Morin, Miguel Saavedra-Ruiz, Liam Paull
	International Conference on Intelligent Robots and Systems (IROS), 2023
	code / arXiv / webpage / bibtex
	An end-to-end, fully parametric method for image-goal navigation that leverages self-supervised and manifold learning to replace a topological graph with a geodesic regressor. During navigation, the geodesic regressor is used as an attractor in a potential function defined in latent space, which frames navigation as a minimization problem.

	ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
	Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
	International Conference on Robotics and Automation (ICRA), 2024
	code / arXiv / webpage / bibtex
	ConceptGraphs uses off-the-shelf models to build an object-based map from RGB-D images. Objects have associated multi-view fused CLIP features and language captions that robots can leverage to answer abstract queries.

News

[July 2023] One-4-All has been accepted to IROS 2023!
[July 2023] I was a participant in the ETH Robotics Summer School in Geneva, Switzerland. We worked on an autonomous rough-terrain UGV for search and rescue operations.
[June 2023] I was one of the organizers of the 2023 Mila Robotics Summer School.
[May 2023] I was awarded the NSERC PGS D Scholarship!
[May 2023] I was awarded the FRQNT Doctoral Scholarship!
[January 2023] I will be a TA for STT3795 - Theoretical Foundations of Data Science.
[December 2022] I gave a short talk on AI, Data & Algorithms at UQÀM in Sylvano Santini's SEM9500 Seminar.
[November 2022] GRAE has been accepted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
[September 2022] Happy to join the IVADO Student Intersectoral Committee for 2022-2023!
[September 2022] Started a PhD in Machine Learning with Guy Wolf at Université de Montréal and Mila. Still working on combining manifold learning techniques and deep learning!
[April 2022] Our project Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers was featured on Duckietown's webpage!
[March 2022] Presented our project Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers at the "My IVADO project in 180 seconds" event. Watch the presentation on YouTube [French].
[March 2022] Our paper Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers has been accepted for Poster presentation at the 19th Conference on Robotics and Vision (CRV).
[October 2021] Presented an ongoing project on COVID-19 data at the IVADO Digital October. Watch the presentation on YouTube [French].
[September 2021] Started a research MSc in Machine Learning with Guy Wolf at Université de Montréal and Mila. I will be working on blending manifold learning techniques and deep learning.
[May 2021] I was awarded the IVADO MSc Excellence Scholarship!
[May 2021] I was awarded the FRQNT B1X Scholarship!
[December 2020] GRAE has been accepted to the IEEE International Conference on Big Data.
[December 2020] Presented GRAE at the DiffGeo4DL NeurIPS Workshop. The presentation can be watched under the "Extendable and invertible manifold learning with geometry regularized autoencoders" section [English].

Publications
	ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
	Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull
	International Conference on Robotics and Automation (ICRA), 2024
	code / arXiv / webpage / bibtex
	@article{conceptgraphs,
	  author = {Gu, Qiao and Kuwajerwala, Alihusein and Morin, Sacha and Jatavallabhula, {Krishna Murthy} and Sen, Bipasha and Agarwal, Aditya and Rivera, Corban and Paul, William and Ellis, Kirsty and Chellappa, Rama and Gan, Chuang and {de Melo}, {Celso Miguel} and Tenenbaum, {Joshua B.} and Torralba, Antonio and Shkurti, Florian and Paull, Liam},
	  title = {ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning},
	  journal = {arXiv},
	  year = {2023},
	}
	ConceptGraphs uses off-the-shelf models to build an object-based map from RGB-D images. Objects have associated multi-view fused CLIP features and language captions that robots can leverage to answer abstract queries.
	For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.
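	As a rough illustration of multi-view fusion followed by language querying, the sketch below merges two observations of an object by averaging their features weighted by detection count, then answers a query by cosine similarity against a text embedding. It is a simplified sketch under assumptions, not the ConceptGraphs code: the 2-D vectors stand in for CLIP image and text embeddings, the names `fuse` and `query` are hypothetical, and the paper's exact update rule may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def fuse(feat_a, count_a, feat_b, count_b):
    """Merge two object features with a detection-count-weighted average, re-normalized."""
    total = count_a + count_b
    mixed = [(count_a * a + count_b * b) / total for a, b in zip(feat_a, feat_b)]
    return normalize(mixed), total


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query(objects, text_feature):
    """Return the name of the object whose feature best matches the text embedding."""
    return max(objects, key=lambda name: cosine(objects[name], text_feature))


# Two views of the same chair, detected 3 times and once, merged into one object.
chair, count = fuse([1.0, 0.0], 3, [0.8, 0.6], 1)
objects = {"chair": chair, "lamp": normalize([0.0, 1.0])}
print(count, query(objects, [0.9, 0.2]))  # 4 chair
```

	Weighting by detection count keeps an object's feature stable as more views accumulate, so a single noisy view cannot dominate the fused representation.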

	One-4-All: Neural Potential Fields for Embodied Navigation
	Sacha Morin, Miguel Saavedra-Ruiz, Liam Paull
	International Conference on Intelligent Robots and Systems (IROS), 2023
	code / arXiv / webpage / bibtex
	@article{morin2023one,
	  title = {One-4-All: Neural Potential Fields for Embodied Navigation},
	  author = {Morin, Sacha and Saavedra-Ruiz, Miguel and Paull, Liam},
	  year = 2023,
	  journal = {arXiv preprint arXiv:2303.04011}
	}
	An end-to-end, fully parametric method for image-goal navigation that leverages self-supervised and manifold learning to replace a topological graph with a geodesic regressor. During navigation, the geodesic regressor is used as an attractor in a potential function defined in latent space, which frames navigation as a minimization problem.
	A fundamental task in robotics is to navigate between two locations. In particular, real-world navigation can require long-horizon planning using high-dimensional RGB images, which poses a substantial challenge for end-to-end learning-based approaches. Current semi-parametric methods instead achieve long-horizon navigation by combining learned modules with a topological memory of the environment, often represented as a graph over previously collected images. However, using these graphs in practice typically involves tuning a number of pruning heuristics to avoid spurious edges, limit runtime memory usage, and allow reasonably fast graph queries. In this work, we present One-4-All (O4A), a method leveraging self-supervised and manifold learning to obtain a graph-free, end-to-end navigation pipeline in which the goal is specified as an image. Navigation is achieved by greedily minimizing a potential function defined continuously over the O4A latent space. Our system is trained offline on non-expert exploration sequences of RGB data and controls, and does not require any depth or pose measurements. We show that O4A can reach long-range goals in 8 simulated Gibson indoor environments, and further demonstrate successful real-world navigation using a Jackal UGV platform.
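	The idea of navigating by greedily minimizing a potential can be sketched in a few lines. This toy version is an illustration under assumptions, not the O4A implementation: a 2-D tuple stands in for the learned latent state, `forward_model` and `geodesic_regressor` are hypothetical stand-ins (fixed action offsets and Euclidean distance) for the learned modules, and the potential keeps only the attractor term.

```python
import math

# Hypothetical discrete action set: each action shifts the 2-D latent by a fixed offset.
ACTIONS = {
    "forward": (1.0, 0.0),
    "left": (0.0, 1.0),
    "right": (0.0, -1.0),
    "stop": (0.0, 0.0),
}


def forward_model(z, action):
    """Stand-in for a learned latent forward model z_{t+1} = f(z_t, a_t)."""
    dx, dy = ACTIONS[action]
    return (z[0] + dx, z[1] + dy)


def geodesic_regressor(z, z_goal):
    """Stand-in for a learned geodesic-distance regressor (Euclidean distance here)."""
    return math.dist(z, z_goal)


def potential(z, z_goal):
    # Attractor term only; a repulsive obstacle term could be added here.
    return geodesic_regressor(z, z_goal)


def navigate(z_start, z_goal, tol=0.5, max_steps=50):
    """Greedily pick the action whose predicted next latent has the lowest potential."""
    z, path = z_start, []
    for _ in range(max_steps):
        if potential(z, z_goal) <= tol:
            break
        action = min(ACTIONS, key=lambda a: potential(forward_model(z, a), z_goal))
        # "stop" is listed last, so it is only chosen when every movement would
        # increase the potential, i.e. the agent sits at a local minimum.
        if action == "stop":
            break
        z = forward_model(z, action)
        path.append(action)
    return z, path


final_z, path = navigate((0.0, 0.0), (3.0, 2.0))
print(final_z, path)
```

	Because each step only looks one action ahead, the quality of this kind of planner rests almost entirely on how well the potential reflects true travel distance; that is the motivation for learning a geodesic regressor rather than using a plain latent distance.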

	Geometry Regularized Autoencoders (GRAE)
	Andres F. Duque, Sacha Morin, Guy Wolf, Kevin R. Moon
	IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
	IEEE International Conference on Big Data, 2020
	DiffGeo4DL, NeurIPS 2020 Workshop
	code / paper / webpage / bibtex
	@article{duque2022geometry,
	  title={Geometry Regularized Autoencoders},
	  author={Duque, Andres F and Morin, Sacha and Wolf, Guy and Moon, Kevin R},
	  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
	  year={2022},
	  publisher={IEEE}
	}
	Manifold-based regularization for learning better representations in autoencoders.
	A fundamental task in data exploration is to extract simplified low-dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out-of-sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
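	The regularized objective can be sketched as an autoencoder loss with an extra geometric term on the bottleneck. The version below assumes a squared-error form for both terms and treats `z_targets` as precomputed PHATE coordinates for each training point; `grae_loss` and `lam` are illustrative names, not the API of the GRAE code release.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def grae_loss(xs, x_hats, zs, z_targets, lam=1.0):
    """Mean reconstruction error plus lam times the mean squared distance
    between latent codes and their precomputed target embedding."""
    n = len(xs)
    reconstruction = sum(squared_distance(x, xh) for x, xh in zip(xs, x_hats)) / n
    geometric = sum(squared_distance(z, zt) for z, zt in zip(zs, z_targets)) / n
    return reconstruction + lam * geometric


# Toy batch: perfect reconstructions, but each latent sits one unit off its target per axis.
xs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
x_hats = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
zs = [[0.0, 0.0], [1.0, 1.0]]
z_targets = [[1.0, 1.0], [0.0, 0.0]]
print(grae_loss(xs, x_hats, zs, z_targets, lam=0.5))  # 1.0
```

	Setting `lam=0` recovers a plain autoencoder objective; larger values pull the latent space toward the manifold-learning embedding while the encoder and decoder still provide out-of-sample extension and reconstruction.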

	Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
	Miguel Saavedra-Ruiz, Sacha Morin, Liam Paull
	Conference on Robotics and Vision (CRV), 2022
	code (model) / code (servoing) / arXiv / webpage / Duckietown coverage / poster / YouTube [FR] / bibtex
	@article{saavedra2022monocular,
	  title = {Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers},
	  author = {Saavedra-Ruiz, Miguel and Morin, Sacha and Paull, Liam},
	  year = 2022,
	  journal = {arXiv preprint arXiv:2203.03682}
	}
	Visual servoing navigation using pretrained self-supervised Vision Transformers.
	In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
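	The trade-off between granularity and speed follows from patch arithmetic: with 8x8 patches, an H x W input yields (H/8) x (W/8) tokens, each receiving one label, and self-attention compares every pair of tokens. The helpers below are a hypothetical sketch of that arithmetic; the resolutions are example values rather than the paper's settings, and the class token is ignored.

```python
def patch_grid(height, width, patch=8):
    """Rows and columns of the patch grid for a ViT with square patches."""
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    return height // patch, width // patch


def attention_pairs(height, width, patch=8):
    """Token count and number of token pairs compared by one self-attention layer."""
    rows, cols = patch_grid(height, width, patch)
    tokens = rows * cols
    return tokens, tokens * tokens


def upsample_labels(grid, patch=8):
    """Expand a patch-level label grid to pixels: each label covers a patch x patch block."""
    pixels = []
    for row in grid:
        expanded = [label for label in row for _ in range(patch)]
        pixels.extend([list(expanded) for _ in range(patch)])
    return pixels


for h, w in [(480, 640), (240, 320)]:
    tokens, pairs = attention_pairs(h, w)
    print(f"{h}x{w}: grid {patch_grid(h, w)}, {tokens} tokens, {pairs} attention pairs")
```

	Halving each side of the input cuts the token count by 4 and the attention pairs by 16, at the cost of each 8x8 label covering a 16x16 region of the original image.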

	Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes
	Paré et al.
	PLOS One, 2021
	paper / bibtex
	@article{pare2021patient,
	  title={Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes},
	  author={Par{\'e}, Bastien and Rozendaal, Marieke and Morin, Sacha and Kaufmann, L{\'e}a and Simpson, Shawn M and Poujol, Rapha{\"e}l and Mostefai, Fatima and Grenier, Jean-Christophe and Xing, Henry and Sanchez, Miguelle and others},
	  journal={PloS one},
	  volume={16},
	  number={12},
	  pages={e0260714},
	  year={2021},
	  publisher={Public Library of Science San Francisco, CA USA}
	}
	Analysis of patient outcomes in a SARS-CoV-2 outbreak in a Quebec hospital.

Projects and Preprints
	Sustained IFN signaling is associated with delayed development of SARS-CoV-2-specific immunity
	Elsa Brunet-Ratnasingham, Sacha Morin, Haley E. Randolph* et al.
	medRxiv
	Integrated analysis using k-means and manifold learning to uncover endotypes in a cohort of COVID-19 patients.

	StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables
	Sacha Morin, Robin Legault, Félix Laliberté, Zsuzsa Bakk, Charles-Édouard Giguère, Roxane de la Sablonnière, Éric Lacourse
	code / arXiv
	A Python package for multi-step estimation of latent class models with measurement and structural components. Enables joint clustering of continuous and categorical features with missing values.

	Mila COVID-19 Taskforce
	The Mila COVID-19 Taskforce is a collaboration between researchers to answer COVID-19 research questions via data-driven methods. Our team is composed of members from the Université de Montréal, Yale University and McGill University. Participating research laboratories include Mila, Krishnaswamy Lab, MHI-omics, Kaufmann Lab, Quantitative and Translational Medicine Laboratory, and Smith Lab.

Updated March 19, 2024
