Research Teams and Lines

Projects

List of ongoing projects

A Statistical Physics' perspective to Energy Based Models (StatPhysEBM)

PI: Aurélien Decelle and Beatriz Seoane

Affiliated members:

  • Elisabeth Agoritsas,
  • Alessandra Carbone,
  • Cyril Furtlehner,
  • Flora Jay,
  • Edoardo Sarti.
Program «Proyectos de Generación de Conocimiento», 2021 - Referencia: PID2021-125506NA-I00

Duration: 01/01/2022 to 31/12/2024

Description: The project aims to develop practical aspects of Restricted Boltzmann Machines (RBMs) for real-world applications.

An RBM corresponds to a bipartite Ising model whose two layers are used in a Machine Learning setting. The visible layer corresponds to the observed variables, that is, the dataset, while the hidden (or latent) variables encode effective interactions between the visible nodes. These machines are typically trained by maximum likelihood, with the model parameters updated by gradient ascent. In this project, we plan to study, on the one hand, the phenomenology of RBMs during learning: the phase transition occurring along the learning trajectory and the effect of using non-convergent Monte Carlo chains for training, together with the development of the Tethered method to avoid this problem in low-dimensional datasets. On the other hand, we wish to develop practical tools to extract and interpret the features learned from biological datasets (proteins, DNA, ...).
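For reference, the bipartite Ising model behind an RBM is commonly written as below. This is a standard textbook form given as a sketch; the symbols (visible units v_i, hidden units h_mu, couplings W, local fields a and b, M training samples, learning rate gamma) are notation introduced here for illustration and do not come from the project text.

```latex
% Energy of a joint configuration: only visible-hidden couplings, no intra-layer terms
E(\mathbf{v},\mathbf{h}) = -\sum_{i} a_i v_i - \sum_{\mu} b_\mu h_\mu - \sum_{i,\mu} v_i W_{i\mu} h_\mu ,
\qquad
p(\mathbf{v},\mathbf{h}) = \frac{e^{-E(\mathbf{v},\mathbf{h})}}{Z},
\quad
Z = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}

% Maximum-likelihood training: gradient ascent on the average log-likelihood of the data
\mathcal{L} = \frac{1}{M}\sum_{m=1}^{M} \log \sum_{\mathbf{h}} p\big(\mathbf{v}^{(m)},\mathbf{h}\big),
\qquad
W_{i\mu} \leftarrow W_{i\mu} + \gamma \, \frac{\partial \mathcal{L}}{\partial W_{i\mu}}
```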

The key aspects that will be studied are:

  • The development of the Tethered approach for RBM learning on both artificial and real datasets, and its adaptation to categorical variables.
  • Out-of-equilibrium training of RBMs and its effect on the learned features. In particular, we wish to understand how it relates to diffusion models.
  • The phase diagram of RBM learning on real datasets and its relation to the observed glassy dynamics.
  • How to extract and interpret the learned features in biological cases.

 

Disordered Physics for Biology and Artificial Intelligence

PI: Aurélien Decelle and Beatriz Seoane

Affiliated members:

  • Aurélien Decelle,
  • Beatriz Seoane
Banco Santander and Universidad Complutense de Madrid - Referencia: PR44/21-29937

Duration: 20/07/2022 to 19/07/2023

Description: The goal of this project is to design a Restricted Boltzmann Machine for proteomics datasets. The main difficulty is to adapt the training procedure to proteomics datasets, which can be strongly clustered and are therefore difficult to train on.

Advanced aspects of the Restricted Boltzmann Machines using Statistical Physics

PI: Aurélien Decelle

Affiliated members:

  • Giovanni Catania,
  • Aurélien Decelle,
  • Cyril Furtlehner,
  • Flora Jay,
  • Javier Moreno Gordo,
  • Beatriz Seoane
Program "atracción de talento", 2019 - Referencia: 2019-T1/TIC-13298

Duration: 01/07/2020 to 30/06/2024

Description: The project aims to study various aspects of the Restricted Boltzmann Machine (RBM), ranging from the equilibrium properties of learned machines to the understanding of the learning dynamics.

An RBM corresponds to a bipartite Ising model whose two layers are used in a Machine Learning setting. The visible layer corresponds to the observed variables, that is, the dataset, while the hidden (or latent) variables encode effective interactions between the visible nodes. These machines are typically trained by maximum likelihood, with the model parameters updated by gradient ascent. The gradient is made of two terms. The first is the correlation between the visible variables, evaluated on the dataset, and the response of the hidden nodes given by the model. The second is the same correlation estimated from the model itself. This second term is the problematic one: it cannot be evaluated exactly, so it is typically estimated with a Monte Carlo approximation.
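The two gradient terms described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the project's code. It assumes binary {0, 1} units rather than ±1 Ising spins and block Gibbs sampling with persistent chains. All names (`log_likelihood_gradient`, `n_gibbs`, `W`, `a`, `b`) and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    """Return p(h=1|v) and a binary sample of the hidden layer."""
    p_h = sigmoid(v @ W + b)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a, rng):
    """Return p(v=1|h) and a binary sample of the visible layer."""
    p_v = sigmoid(h @ W.T + a)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)

def log_likelihood_gradient(data, chains, W, a, b, rng, n_gibbs=10):
    # First term: visible-hidden correlations with the visible units
    # clamped to the data and the hidden response given by the model.
    p_h_data, _ = sample_hidden(data, W, b, rng)
    positive = data.T @ p_h_data / data.shape[0]

    # Second term: the same correlations under the model, estimated by
    # Monte Carlo (alternating block Gibbs updates of the two layers).
    v = chains
    for _ in range(n_gibbs):
        _, h = sample_hidden(v, W, b, rng)
        _, v = sample_visible(h, W, a, rng)
    p_h_model, _ = sample_hidden(v, W, b, rng)
    negative = v.T @ p_h_model / v.shape[0]

    grad_W = positive - negative
    grad_a = data.mean(axis=0) - v.mean(axis=0)
    grad_b = p_h_data.mean(axis=0) - p_h_model.mean(axis=0)
    return grad_W, grad_a, grad_b, v

# Toy usage on random binary data (illustrative only).
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
data = rng.integers(0, 2, size=(20, n_visible)).astype(float)
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)
chains = rng.integers(0, 2, size=(20, n_visible)).astype(float)

grad_W, grad_a, grad_b, chains = log_likelihood_gradient(data, chains, W, a, b, rng)
W += 0.1 * grad_W  # one gradient-ascent step with a hypothetical learning rate
```

With a small `n_gibbs` the chains generally do not reach equilibrium between updates, so the second term is only a rough estimate of the true model correlations.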

Despite the simplicity of the model, several aspects are not yet understood:

  • How to evaluate such models
  • How to speed up the Monte Carlo evaluation (reducing the mixing time)
  • Whether it makes sense to add more layers

The project therefore focuses on

  • Understanding the learned machine and its equilibrium properties
  • Designing applications to explore the range of possibilities of the machine: genomes, proteins, ...
  • Developing new models based on the same topology, e.g. Gaussian RBM with adaptive variance, Deep Boltzmann Machine, ...

A (Spin)Glass Approach to Bioinformatics

PI: Beatriz Seoane

Affiliated members:

  • Alessandra Carbone,
  • Aurélien Decelle,
  • Juan Neftali Morillo Garcia,
  • Beatriz Seoane
Program "atracción de talento", 2019 - Referencia: 2019-T1/TIC-12776

Duration: 01/07/2020 to 30/06/2024

Description: This project seeks to apply tools from the Statistical Physics of disordered systems to proteins and other Computational Biology problems. In particular, we investigate the glassy behavior of intrinsically disordered proteins and develop tools for automatic feature extraction from large-scale genomic databases using disordered spin models.