CryoSPHERE: bridging the gap between AlphaFold and cryoEM images.


¹Linköping University, ²Uppsala University
ICLR 2025

Abstract

The three-dimensional structure of proteins plays a crucial role in determining their function. Protein structure prediction methods, like AlphaFold, offer rapid access to a protein’s structure. However, large protein complexes cannot be reliably predicted, and proteins are dynamic, making it important to resolve their full conformational distribution. Single-particle cryo-electron microscopy (cryo-EM) is a powerful tool for determining the structures of large protein complexes. Importantly, the numerous images of a given protein contain underutilized information about conformational heterogeneity. These images are very noisy projections of the protein, and traditional methods for cryo-EM reconstruction are limited to recovering only one or a few consensus conformations. In this paper, we introduce cryoSPHERE, which is a deep learning method that uses a nominal protein structure (e.g., from AlphaFold) as input, learns how to divide it into segments, and moves these segments as approximately rigid bodies to fit the different conformations present in the cryo-EM dataset. This approach provides enough constraints to enable meaningful reconstructions of single protein structural ensembles. We demonstrate this with two synthetic datasets featuring varying levels of noise, as well as one real dataset. We show that cryoSPHERE is very resilient to the high levels of noise typically encountered in experiments, where we see consistent improvements over the current state-of-the-art for heterogeneous reconstruction.

What is cryo-EM?

Single-particle cryo-electron microscopy (cryo-EM) is a powerful technique for determining the three-dimensional structure of biological macromolecules, including proteins. In a cryo-EM experiment, millions of copies of the same protein are first frozen in a thin layer of vitreous ice and then imaged using an electron microscope. This yields micrographs: noisy images containing 2D projections of individual proteins. The protein projections are then located on these micrographs and cut out, so that a typical experiment yields tens of thousands to tens of millions of images of individual proteins, referred to as particles. Our goal is to reconstruct the possible structures (called conformations) of the protein from these images. Proteins are frequently conformationally heterogeneous, so different copies may adopt different structures. Conventionally, this information has been discarded, and all of the sampled structures were assumed to be in only one or a few conformations (homogeneous reconstruction). Here, we would like to recover all of the structures in a heterogeneous reconstruction. There are a number of challenges:

  • The signal-to-noise ratio (SNR) is low, typically 0.01 or lower.
  • The pose of the protein on a given image is unknown.
  • The protein might have different conformations on different images.
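To give a sense of what an SNR of 0.01 means, the sketch below adds white Gaussian noise to a hypothetical clean projection (a 32x32 Gaussian blob). It assumes the SNR is defined as the ratio of signal variance to noise variance, a common convention for simulated particles but not necessarily the exact one used in the paper. Under that definition, the noise standard deviation is ten times that of the signal.

```python
import math
import random


def std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def add_noise_at_snr(clean, snr, rng):
    """Add white Gaussian noise so that var(signal) / var(noise) == snr.

    Assumption: SNR is a ratio of variances, as is common for simulated
    cryo-EM particles.
    """
    noise_std = std(clean) / math.sqrt(snr)
    noisy = [v + rng.gauss(0.0, noise_std) for v in clean]
    return noisy, noise_std


# Hypothetical clean projection: a 32x32 Gaussian blob, flattened row by row.
size = 32
clean = [
    math.exp(-((x - 16) ** 2 + (y - 16) ** 2) / (2 * 4.0 ** 2))
    for y in range(size)
    for x in range(size)
]
noisy, noise_std = add_noise_at_snr(clean, snr=0.01, rng=random.Random(0))

# At SNR = 0.01 the noise standard deviation is 10x the signal's.
print(round(noise_std / std(clean), 6))  # 10.0
```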

Why cryoSPHERE?

Traditional methods reconstruct a volume of the protein, typically in Fourier space. Framing the problem this way has two shortcomings that make these methods susceptible to the low SNR:

  • It disregards prior information about the protein: we may already have a good idea of its structure, obtained through previous experiments such as X-ray crystallography or using AlphaFold.
  • The motion of the protein is restricted by the laws of physics. Integrating such constraints is difficult for volume methods.
To remedy this, we root cryoSPHERE in the observation that different conformations can often be explained by large scale movements of domains of the protein. Specifically, we develop a variational auto-encoder (VAE) that, from a nominal structure and a set of cryo-EM images:
  • Learns how to divide the amino-acid chain into segments, given a user-defined maximum number of segments. The nominal structure can, for instance, be obtained using AlphaFold.
  • For each image, learns approximately rigid transformations of the identified segments of the nominal structure, which effectively allows us to recover different conformations on an image-by-image basis.
These two steps happen concurrently, and the model is end-to-end differentiable.
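End-to-end differentiability requires gradients to flow through the sampling of the latent variable. In a standard VAE this is handled by the reparameterization trick, sketched below under the assumption of a diagonal Gaussian encoder distribution (hypothetical encoder outputs `mu` and `log_var`) and a standard normal prior; cryoSPHERE's exact latent parameterization may differ.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I).

    The randomness is isolated in eps, so z is a differentiable function of
    the encoder outputs mu and log_var.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the usual VAE regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one image, with an 8-dimensional latent space.
mu = [0.0] * 8
log_var = [0.0] * 8
z = sample_latent(mu, log_var, random.Random(1))
print(len(z), kl_to_standard_normal(mu, log_var))  # 8 0.0
```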

Advantages of cryoSPHERE

Formulating the problem this way has a number of advantages:
  • Efficiency in deformation: Deforming a base structure into a density map avoids the computationally expensive N_pix^2 evaluations (one per pixel) of a decoder neural network required by methods that implicitly parameterise the grid. Furthermore, deforming the structure directly avoids the need to subsequently fit an atomic model into the recovered density map.
  • Reduced dimensionality and noise resilience: Learning a transformation per segment is a lower-dimensional problem than learning a transformation per residue. In addition, segment motions correspond to lower-frequency movements, which are less corrupted by noise.
  • Interpretability: CryoSPHERE outputs segments along with one rotation and one translation per segment, providing valuable and interpretable information.
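A back-of-the-envelope count makes the first two points concrete. The sizes below (a 128-pixel box, 1,000 residues, 20 segments) are hypothetical, and we assume each rigid transformation has 6 degrees of freedom (3 for the rotation and 3 for the translation), regardless of how the decoder actually parameterises rotations.

```python
DOF_PER_RIGID_MOTION = 6  # 3 for a 3D rotation + 3 for a 3D translation


def grid_decoder_evaluations(n_pix):
    """Network queries per image when a decoder is evaluated at every pixel
    of an n_pix x n_pix grid."""
    return n_pix ** 2


def motion_parameters(n_bodies):
    """Values the decoder must output to move n_bodies rigid bodies."""
    return DOF_PER_RIGID_MOTION * n_bodies


# Hypothetical sizes, chosen only for illustration.
print(grid_decoder_evaluations(128))  # 16384
print(motion_parameters(1000))  # 6000: one transformation per residue
print(motion_parameters(20))  # 120: one transformation per segment
```

With these numbers, the decoder predicts 120 values per image instead of 6,000, and no network is queried per pixel.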

How does cryoSPHERE work?

[Figure: VAE flow]


CryoSPHERE takes the number of segments N used to cut the protein as a hyperparameter. During a forward pass, two steps happen concurrently:

  • An encoder maps the image to a distribution over a latent space. A latent variable is sampled and the decoder maps it to N rotations and N translations, that is, one rotation and translation per segment.
  • A Gaussian mixture with N modes, defined over the real line on which the residue indices lie, is used to compute the probability that each residue belongs to each segment.
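The segment memberships can be sketched as the posterior responsibilities of a one-dimensional Gaussian mixture evaluated at each residue index. This is a minimal sketch assuming a learnable mean, standard deviation and log-weight per segment (hypothetical names `means`, `stds`, `log_weights`); the exact parameterization used by cryoSPHERE may differ.

```python
import math


def segment_membership(n_residues, means, stds, log_weights):
    """Probability that each residue belongs to each segment.

    Residue indices 0..n_residues-1 are treated as points on the real line.
    Segment k is a Gaussian mode with mean means[k], standard deviation
    stds[k] and unnormalised log mixture weight log_weights[k]; a residue's
    membership is the posterior responsibility of each mode, computed with a
    max-shift for numerical stability.
    """
    memberships = []
    for r in range(n_residues):
        # Log of weight * Gaussian density, up to a constant shared by all modes.
        log_p = [
            lw - math.log(s) - 0.5 * ((r - m) / s) ** 2
            for m, s, lw in zip(means, stds, log_weights)
        ]
        top = max(log_p)
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)
        memberships.append([u / total for u in unnorm])
    return memberships


# Hypothetical 100-residue chain and N = 2 segments.
probs = segment_membership(100, means=[25.0, 75.0], stds=[10.0, 10.0], log_weights=[0.0, 0.0])
print([round(p, 3) for p in probs[10]])  # [1.0, 0.0]
print([round(p, 3) for p in probs[50]])  # [0.5, 0.5]
```

Because the memberships vary smoothly along the chain, residues near a segment boundary are shared between neighbouring segments, and the means and standard deviations receive gradients like any other parameter.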

Each residue position in the nominal structure is then rotated and translated according to the transformation of each segment, with these transformations weighted by how much the residue belongs to each segment. This way, the deformation of the protein is approximately rigid within each segment. The transformed nominal structure is then turned into a volume and projected into an image, which is compared with the observed image. Finally, the parameters of the encoder, decoder and Gaussian mixture are updated by backpropagating the resulting loss.
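The steps in this paragraph can be sketched end to end on a toy structure. Several choices here are assumptions made for illustration, not cryoSPHERE's exact formulas: the weighted transformation is a linear blend x' = sum_k p_rk (R_k x + t_k), rotations are about the origin, residues are rendered as isotropic Gaussians and projected along z (pose and CTF are ignored), and the comparison uses a mean squared error. All helper names are hypothetical.

```python
import math


def rot_z(angle):
    """3x3 rotation matrix about the z axis (enough for a toy example)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rigid_motion(rot, t, x):
    """Apply R @ x + t to a single 3D point."""
    return [sum(rot[i][j] * x[j] for j in range(3)) + t[i] for i in range(3)]


def deform(coords, memberships, rotations, translations):
    """Move each residue by the membership-weighted blend of segment motions.

    Assumed formula: x' = sum_k p_rk * (R_k x + t_k), with rows of
    `memberships` summing to one.
    """
    deformed = []
    for x, p in zip(coords, memberships):
        moved = [rigid_motion(R, t, x) for R, t in zip(rotations, translations)]
        deformed.append([sum(p[k] * moved[k][i] for k in range(len(p))) for i in range(3)])
    return deformed


def project(coords, size, sigma=1.0):
    """Render residues as isotropic 3D Gaussians and integrate along z.

    Integrating such a Gaussian along z leaves a 2D Gaussian centred at (x, y),
    so z drops out. Pose, CTF and normalisation constants are ignored; pixel
    (row, col) sits at (u, v) = (col - size // 2, row - size // 2).
    """
    half = size // 2
    image = [[0.0] * size for _ in range(size)]
    for x, y, _z in coords:
        for row in range(size):
            for col in range(size):
                u, v = col - half, row - half
                image[row][col] += math.exp(-((u - x) ** 2 + (v - y) ** 2) / (2 * sigma ** 2))
    return image


def mse(a, b):
    """Mean squared error between two images of the same shape."""
    diffs = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Hypothetical two-residue "protein", each residue fully in its own segment.
coords = [[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
memberships = [[1.0, 0.0], [0.0, 1.0]]
rotations = [rot_z(0.0), rot_z(math.pi / 2)]  # segment 1 turns by 90 degrees
translations = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

deformed = deform(coords, memberships, rotations, translations)
print([[round(c, 6) for c in x] for x in deformed])  # [[-2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

observed = project(deformed, size=9)   # synthetic target image
predicted = project(coords, size=9)    # undeformed nominal structure
print(mse(predicted, observed) > 0.0)  # True
print(mse(observed, observed))         # 0.0
```

The loss vanishes once the deformed structure projects onto the target image; in training, the gradient of this loss with respect to the rotations, translations and memberships is what drives the updates.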

Results

cryoSPHERE outputs, for each image:

  • A latent distribution associated with that image.
  • The nominal structure deformed to match that image.
  • The segmentation of the protein, which is shared across all images.

One can perform a PCA of the latent space recovered by cryoSPHERE and generate structures along a traversal of each principal component. This provides a direct interpretation of the motion recovered by cryoSPHERE.
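The PCA and traversal can be sketched without external libraries by finding the leading principal component with power iteration (used here instead of a full eigendecomposition). The latents are synthetic, the range of plus or minus 2 standard deviations is an assumption, and each traversal point would then be passed through the trained decoder and deformation (not shown) to produce a structure.

```python
import math
import random


def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def covariance(points, mean):
    """Population covariance matrix of the points around `mean`."""
    n, d = len(points), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def first_principal_component(cov, iters=200, seed=0):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval


def traversal(mean, direction, eigval, n_steps=5, span=2.0):
    """Latent points from -span to +span standard deviations along `direction`."""
    sd = math.sqrt(eigval)
    steps = [-span + 2 * span * k / (n_steps - 1) for k in range(n_steps)]
    return [[m + a * sd * u for m, u in zip(mean, direction)] for a in steps]


# Synthetic 3D latents spread mostly along the (1, 1, 0) direction.
rng = random.Random(0)
latents = []
for _ in range(500):
    t = rng.gauss(0.0, 3.0)
    latents.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)])

mu = mean_vector(latents)
pc1, var1 = first_principal_component(covariance(latents, mu))
path = traversal(mu, pc1, var1)
# Each point of `path` would be decoded into a structure by the trained model.
print(len(path))  # 5
```

The middle point of the traversal is the latent mean, so the generated structures move symmetrically around the average conformation along the dominant mode of variation.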

We can plot structures recovered by cryoSPHERE to interpret the motion or compare it to the ground truth on toy datasets:

For visualization purposes, we can also create movies of the motion along principal component traversals. Below is a video of the motion recovered on EMPIAR-10180: