This dedicated page presents one of the strategic themes/domains currently under discussion as part of our Strategic Research Funding Program. All the themes/domains under discussion are listed on the program page. Each dedicated page (including this one) may vary in level of detail and maturity. To take part in the discussion on this theme or to propose new ones, please use this form. If you would like to be kept informed of developments around this theme/domain, sign up below.
AI for the discovery of molecules and materials
Description and rationale of the domain
The SARS-CoV-2 pandemic made us realize how fragile our healthcare system is when confronted with infectious disease pandemics, and thus how important it is to develop a local innovation pipeline for the discovery of new drugs to protect public health. Recent AI research shows enormous potential for machine learning across the whole research and innovation process of the pharmaceutical industry. In addition, recent Mila research on the discovery of new molecules opens the door to a substantial acceleration in the discovery of new drugs, and to exploring molecular space much more broadly and thus discovering drugs that are better suited to their therapeutic objective. The same kind of algorithms could also be used for the discovery of novel materials, which could be transformative in the fight against climate change, whether for carbon capture or improved batteries.
The IVADO AI community has contributed to recent advances in relevant methodologies, including reinforcement learning, representation learning, out-of-distribution generalization, causality, graph neural networks, generative models, meta-learning and black-box optimization with very few iterations. Since the technical goal is a form of exploration and optimization, and molecular space is discrete and highly structured, there is also major potential for collaboration at the intersection of machine learning and optimization. At its core, this project is highly multidisciplinary: on the drug discovery and biology side, collaborations are in place between Mila and IRIC on the proposed theme, and IRIC has developed new synthetic biology assays that make it possible to screen a very large number of training examples. We plan to apply these ideas to yeast-based synthetic biology of antimicrobial peptides in order to discover new antibiotics.
The evolution of new pathogens that resist existing antibiotic, antifungal and antiviral drugs is a growing global public health concern that, if not quickly addressed, will soon surpass the economic and medical impact of the SARS-CoV-2 virus. On the materials discovery side, collaborations are ongoing between Mila and materials experts at McGill, and grant proposals and new financing opportunities are being developed between Mila and materials experts at Université de Montréal.
Our proposed project fits with the current efforts and initiatives at the provincial level (e.g. Université de Montréal’s Digital Health Consortium, Génome Québec’s Roundtable for Precision Medicine, and Montreal InVivo’s Strategic Committee on AI in Quebec’s life sciences and health technologies sector, among others). This, together with the strong ties between Mila, our team members and the private sector, will help ensure the economic impact of the proposed research through concerted actions within the provincial ecosystem. All of this will in turn contribute to maintaining our competitiveness. Projects at the intersection of AI and Health have also been of strategic interest at the federal level, as demonstrated by the recently funded pan-Canadian genomics strategy and the financial investments made by CIFAR towards the creation of Canadian Chairs in AI and Health. The federal government also signaled a strong interest in seeing these chairs investigate the application of AI to fight climate change (which is what the materials discovery component of this project is about).
Added 13/07: Advances in materials have underpinned the transformation of our society for more than 150 years. Drawing on an ever-wider palette of chemical elements and the fabrication of ever more complex compounds tailored to increasingly specialized uses, they have enabled major developments in chemistry, including fertilizers and petrochemicals; in physics, with semiconductors and the mastery of nuclear fission; as well as in metallurgy, ceramics, energy and much more.
While this approach has led to contemporary society, it rests on an almost complete decoupling between fundamental work and the environmental costs of the proposed solutions, leading to materials made of rare, often toxic elements that cause significant environmental damage and whose reuse and recycling are difficult to put in place. It is no longer tenable as the planet is increasingly disrupted by human activity.
Over the past decades, major efforts have been made to replace products with a disproportionate environmental impact, such as CFCs and, more recently, lead-based piezoelectrics (PZT), which are essential components in microphones, ultrasound generators and mechanical sensors, for example. Yet despite 20 years of research, more than $500 M in investment and many promising compounds, the goal is still far off, partly for lack of a systemic approach to the problem.
This strategic theme aims to put in place an integrated approach to the design of sustainable materials able to replace those we use today: materials that will find their place in green, easily recyclable batteries, concretes with a lower environmental impact, more efficient engines, drugs based on green molecules and much more.
This effort will require big data processing and numerical simulation for the development and characterization of materials. This methodology is particularly relevant for Quebec and Canada, which have an integrated industry that ranges from raw material extraction to sophisticated, high-value-added products.
Context
Keywords: Artificial intelligence, machine learning, drug discovery, materials discovery, deep learning, reinforcement learning, discrete optimization, search, generative models, causality, active learning, out-of-distribution optimization, exploration algorithms, materials physics, machine-learning-assisted development of new materials, materials life cycle
Relevant organizations:
- Mila, IRIC, Médicament-Québec, Génome Québec, NRC, Valence Discovery, IBM, Samsung, Recursion Pharmaceuticals, Novartis, Genentech/Roche, GPAI, GARDP, DNDi, Bill and Melinda Gates Foundation, Chan Zuckerberg Initiative.
- Valence Discovery, IBM, Samsung, Recursion Pharmaceuticals, Genentech/Roche, and Novartis have partnered with Mila and have a very keen interest in supporting research in AI and drug discovery, both financially and scientifically. If the theme we are proposing is selected by IVADO, we will pursue our discussions with these companies to further explore their involvement and level of support. Mila will also look into the possibility of allocating to the proposed project additional funding that our institute has already secured from MEI (Ministère de l’Économie et de l’Innovation). This funding is aimed at fostering technology transfer in specific strategic application areas, including drug and materials discovery; as such, it would be a relevant complement to the IVADO funding.
- Our team is composed of a critical mass of world-class academic researchers with expertise in drug discovery and materials discovery, and a track record of securing competitive funding for interdisciplinary projects in these areas, e.g. from Génome Québec, the Bill and Melinda Gates Foundation and the Consortium Québécois du médicament (CQDM), among others. We will be seeking funding from these organizations and from others (e.g. NRC).
Relevant people suggested during the consultation:
The following names were proposed by the community, and the people listed below have agreed to have their names displayed publicly. Note, however, that the names of all professors (whether or not they are displayed publicly on our website) will be passed on to the advisory committee for the stage of identifying and selecting strategic themes. Note also that people identified during the consultation stage are not guaranteed to receive any part of the funding. This stage serves above all to present an overview of the domain, including the relevant people, and not to assemble teams for the framework programs.
Machine Learning:
- Yoshua Bengio (UdeM)
- Doina Precup (McGill)
- Jian Tang (HEC)
- Pierre-Luc Bacon (UdeM)
- Sarath Chandar (Polytechnique)
- Guy Wolf (UdeM)
- Mathieu Blanchette (McGill)
- David Rolnick (McGill)
- Irina Rish (UdeM)
Drug discovery:
- Mike Tyers (UdeM)
- Anne Marinier (UdeM)
- Yves Brun (UdeM)
Materials discovery:
- Yelena Simine (McGill)
- Jim Wuest (UdeM)
- Mickael Dollé (UdeM)
- Audrey Laventure (UdeM)
- Michel Côté (UdeM)
- Normand Mousseau (UdeM)
Potential framework programs
The task of searching a high-dimensional space such as that of molecules, looking for a few needles in a haystack, calls for methodological advances in AI that not only have potentially transformative applications of major societal and economic value but also connect with some of the frontiers of AI research that must be pushed in order to approach human-level intelligence. In this project we propose to focus in particular on the following question: where and how should a learning agent explore in order to acquire the knowledge that will be most useful later? We will focus on the specific space of molecules or multi-atom configurations, and we propose to work with domain experts in drug discovery and materials discovery to evaluate the methodological advances made in this project. We also propose to work with industrial and non-profit partners who could provide practical experience and feedback and take up the most promising candidates for further study and potentially for clinical trials or more expensive assays.
The general methodological framework in which we propose to situate the above question can be summarized as follows. The learning agent interacts through a series of rounds with the real world (which we call an oracle). The oracle can score how useful a candidate molecule or material might be (not necessarily with full certainty, e.g., cellular assays do not give the full picture that mice assays provide, which do not yet reveal everything that clinical trials do). At each round, the learning agent can propose a batch of candidates, and receives a scalar score for each of them, thus augmenting its training set. The learning agent itself is broken down into two main parts: a world model and an experiment or query generator. The world model can be trained on the available data in a supervised way to predict the outcome (score) of experiments. The experiment generator produces the batch of candidates (queries for the oracle) at each round and can be trained by interrogating the world model, constructing a generative model which amortizes the cost of searching virtually for candidates, so it can then be used to actually generate those sent to the oracle.
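The round-based interaction described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the project's method: candidates are bit strings rather than molecules, the oracle is a hypothetical noisy scoring function, the world model is a simple nearest-neighbour predictor, and the amortized generative model is replaced by plain random search ranked by the world model. All names (`oracle`, `WorldModel`, `generate_batch`, `run_agent`, `TARGET`) are invented for this example.

```python
import random

N_BITS = 12
# Hypothetical hidden optimum; the agent never reads it directly.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)


def oracle(candidate, rng):
    """Toy 'real world': noisy count of positions matching the hidden target."""
    matches = sum(int(a == b) for a, b in zip(candidate, TARGET))
    return matches + rng.gauss(0.0, 0.5)


class WorldModel:
    """Supervised score predictor (here: mean score of the k nearest neighbours)."""

    def __init__(self, k=3):
        self.k = k
        self.data = []  # list of (candidate, score) pairs

    def fit(self, data):
        self.data = list(data)

    def predict(self, candidate):
        if not self.data:
            return 0.0  # no knowledge yet: every candidate looks the same
        nearest = sorted(
            self.data,
            key=lambda pair: sum(a != b for a, b in zip(candidate, pair[0])),
        )[: self.k]
        return sum(score for _, score in nearest) / len(nearest)


def generate_batch(model, seen, batch_size, rng, n_virtual=200):
    """Query generator: search virtually against the world model, keep the top ones."""
    pool = set()
    while len(pool) < n_virtual:
        cand = tuple(rng.randint(0, 1) for _ in range(N_BITS))
        if cand not in seen:
            pool.add(cand)
    # Sort once for a deterministic order, then rank by predicted score.
    ranked = sorted(sorted(pool), key=model.predict, reverse=True)
    return ranked[:batch_size]


def run_agent(n_rounds=5, batch_size=8, seed=0):
    """Run the rounds: fit world model, propose a batch, query the oracle, grow the data."""
    rng = random.Random(seed)
    model = WorldModel()
    dataset = {}  # candidate -> observed scalar score
    for _ in range(n_rounds):
        model.fit(dataset.items())
        batch = generate_batch(model, dataset, batch_size, rng)
        for cand in batch:
            dataset[cand] = oracle(cand, rng)
    return dataset


if __name__ == "__main__":
    data = run_agent()
    best = max(data, key=data.get)
    print(len(data), "candidates scored; best observed:", best, round(data[best], 2))
```

In a real setting the oracle call is the expensive step (an assay or a simulation), which is why the generator spends its effort querying the cheap world model first.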
The above raises many research questions, which different researchers in the project may focus on, including the following:
- How to represent molecules and materials as input and output (to generate new ones), e.g., using graph neural networks?
- How to predict properties of a candidate and, more generally, train a world model that tries to capture the relevant aspects of the experimental assays?
- How to discover, represent and exploit epistemic uncertainty (lack of knowledge due to insufficient or incomplete data) in the world model?
- How to efficiently optimize in the discrete space of molecules or materials?
- How to train a generative policy which can sample from as many of the high-reward modes as possible of some reward function (which combines the score estimated by the world model and the epistemic uncertainty about it)?
- How to set up experimental assays from which large quantities of data can be obtained and that may have significant economic or societal value if the search succeeds?
- How to assess the value of a candidate molecule or material in a way that can guide the AI-driven search in the directions most likely to be of significant economic or societal value?
- What properties of a molecule or material matter to the above and what constraints on these properties (like synthesizability) must be taken into account by the AI candidate generator?
- How to exploit domain knowledge (about the molecules or materials and their effect) in the design of the world model used by the AI search to score candidates? For example this may include models of molecular dynamics or of low-energy joint atomic conformations.
- How to use machine learning to accelerate physical simulations that may be exploited by the world model (or are used as a proxy to an experimental assay)?
- How to design the world model and the generative model so as to take advantage of the causal view on the whole process, as a series of interventions aiming at uncovering causal structure?
- Given that the learning agent will have a single "life" (made of, say, a few dozen rounds of interrogations of the oracle), how should it pick its hyper-parameters when it is in the middle of its life and has not yet experienced the end of it? This raises questions about the exploration-exploitation trade-off and the possibility of using meta-learning to simulate, under different scenarios, how the rest of the "life" could unfold.
- How to explore efficiently in the high-dimensional space of molecules, taking into account the fact that the stream of experimental data feeding the learning agent is non-stationary (because the agent explores and pushes the boundaries of its knowledge as it progresses)?
- If the world model is causally structured (broken down into cause-effect mechanisms), how to discover some of its structure and how to reason over it in order to guide the exploration and the optimization?
- Considering that targeted pathogens can develop resistance to antimicrobials, how can we characterize the evolutionary distribution of likely mutations in order to favour candidate treatments while minimizing the probability of resistant mutations to these treatments?
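One of the questions above mentions a reward that combines the score estimated by the world model with the epistemic uncertainty about it. A common way to illustrate this idea, used here as an assumption rather than as the approach the project would adopt, is an upper-confidence-bound style rule in which the disagreement between members of a model ensemble stands in for epistemic uncertainty. The function names, the `beta` weight and the candidate labels below are hypothetical.

```python
import statistics


def acquisition(member_predictions, beta=1.0):
    """Ensemble mean plus beta times ensemble spread (an uncertainty proxy)."""
    mean = statistics.fmean(member_predictions)
    spread = statistics.pstdev(member_predictions)  # disagreement between members
    return mean + beta * spread


def rank_candidates(predictions, beta=1.0):
    """Rank candidates by acquisition value, highest first.

    predictions: dict mapping a candidate label to the list of scores
    predicted for it by each ensemble member.
    """
    return sorted(
        predictions,
        key=lambda cand: acquisition(predictions[cand], beta),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up predictions from a three-member ensemble, for illustration only.
    preds = {
        "A": [1.1, 1.1, 1.1],  # confident, fairly good
        "B": [0.5, 1.0, 1.5],  # slightly lower mean, high disagreement
        "C": [0.2, 0.2, 0.2],  # confident, poor
    }
    print(rank_candidates(preds, beta=0.0))  # pure exploitation
    print(rank_candidates(preds, beta=1.0))  # uncertainty bonus added
```

With `beta` set to zero the ranking depends only on the predicted score, and raising it shifts the search toward candidates the ensemble disagrees on. Note that taking the top-ranked candidates is simpler than what the list asks for, namely sampling from many high-reward modes.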
Added 13/07:
Here are a few examples of research directions that could be considered under this strategic theme.
Batteries:
Challenge: make the materials used in energy storage consistent with the principles of sustainable development.
Metal alloys (steels, aluminum alloys, etc.):
Challenge: achieve the desired mechanical properties while avoiding the most polluting or rarest elements.
Electronics:
Challenge: recover the desired electrical properties using elements that facilitate recycling or integration.
Piezoelectric materials:
Challenge: develop piezoelectric materials that could recover the heat given off by the servers running artificial intelligence applications, in order to limit their impact in terms of energy consumption and greenhouse gas emissions.
Neuromorphic materials:
Challenge: develop materials that operate more like the human brain. Essentially, neural-network-based algorithms use the binary architecture of today's computers to simulate analog processes. Neuromorphic materials could perform this function directly.
Additional documentation
(no additional documentation at this time)
History
6 July 2021: First version.
13 July 2021: Additional information added to all sections.
15 July 2021: Relevant people added.