Source: A History of AI Computational Structures

Article
https://doi.org/10.1038/s41586-019-1923-7


Supplementary information

Improved protein structure prediction using potentials from deep learning

Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan, Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu & Demis Hassabis

In the format provided by the authors and unedited




Nature | www.nature.com/nature
Supplementary information for “Improved protein structure prediction using potentials from
deep learning” Senior et al.

This document contains equations referenced by the Methods section.

Distance potentials. The basic distance potential is computed as a sum over all residue pairs of the negative log-likelihood of the inter-residue distances:

$$V_{\mathrm{distance}}(x) = -\sum_{i,j,\; i \neq j} \log P(d_{ij} \mid S, \mathrm{MSA}(S)). \tag{1}$$

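The distance potential with a reference state subtracts, inside the sum, the log-likelihood of each distance under a reference distribution conditioned only on the protein length and δαβ:

$$V_{\mathrm{distance}}(x) = -\sum_{i,j,\; i \neq j} \left[ \log P(d_{ij} \mid S, \mathrm{MSA}(S)) - \log P(d_{ij} \mid \mathrm{length}, \delta_{\alpha\beta}) \right]. \tag{2}$$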
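As a minimal sketch of Eqs. (1) and (2), assume the distogram is stored as an array `probs` of shape (L, L, n_bins) holding bin probabilities over equal-width bins that start at `d_min` with width `bin_width` (all hypothetical names), and that each pair's likelihood is read from the bin containing d_ij. This bin lookup is not differentiable, so it only evaluates the potential; an optimiser would need a smoothed version.

```python
import numpy as np

def distance_potential(coords, probs, d_min, bin_width, ref_probs=None, eps=1e-8):
    """Binned-distogram version of Eq. (1), or Eq. (2) if ref_probs is given.

    coords:    (L, 3) residue positions x (hypothetical layout).
    probs:     (L, L, n_bins) bin probabilities P(d_ij | S, MSA(S)).
    ref_probs: optional (L, L, n_bins) reference probabilities
               P(d_ij | length, delta_ab), subtracted inside the sum.
    """
    L, _, n_bins = probs.shape
    # Pairwise Euclidean distances d_ij between all residues.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Index of the bin containing d_ij, clipped to the binned range.
    bins = np.clip(((d - d_min) / bin_width).astype(int), 0, n_bins - 1)
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    log_p = np.log(probs[i, j, bins] + eps)
    if ref_probs is not None:
        log_p = log_p - np.log(ref_probs[i, j, bins] + eps)
    off_diagonal = ~np.eye(L, dtype=bool)  # keep only pairs with i != j
    return -float(np.sum(log_p[off_diagonal]))
```

With a uniform distogram every pair contributes log(n_bins), and passing the same array as `ref_probs` cancels the potential to zero, which is a quick sanity check of the reference-state subtraction.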

The torsions are modelled with a von Mises distribution for each residue:

$$V_{\mathrm{torsion}}(\phi, \psi) = -\sum_{i} \log p_{\mathrm{vonMises}}(\phi_i, \psi_i \mid S, \mathrm{MSA}(S)). \tag{3}$$
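A minimal sketch of Eq. (3), under the assumption (not stated in the source) that φ and ψ each follow an independent von Mises distribution with per-residue means `mu_phi`, `mu_psi` and concentrations `kappa_phi`, `kappa_psi` (hypothetical names), so that the joint density factorises. The von Mises log-density is κ cos(θ − μ) − log(2π I₀(κ)).

```python
import numpy as np

def von_mises_logpdf(theta, mu, kappa):
    """Log-density of the von Mises distribution on the circle."""
    # np.i0 is the modified Bessel function of the first kind, order 0.
    return kappa * np.cos(theta - mu) - np.log(2.0 * np.pi * np.i0(kappa))

def torsion_potential(phi, psi, mu_phi, kappa_phi, mu_psi, kappa_psi):
    """V_torsion = -sum_i log p(phi_i, psi_i), assuming phi and psi are independent."""
    log_p = (von_mises_logpdf(phi, mu_phi, kappa_phi)
             + von_mises_logpdf(psi, mu_psi, kappa_psi))
    return -float(np.sum(log_p))
```

With κ = 0 the density is uniform, 1/(2π), so each angle contributes log(2π); larger κ makes angles near μ cheaper.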

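The total potential that we optimise is thus:

$$V_{\mathrm{total}}(\phi, \psi) = V_{\mathrm{distance}}(G(\phi, \psi)) + V_{\mathrm{torsion}}(\phi, \psi) + V_{\mathrm{score2\_smooth}}(G(\phi, \psi)), \tag{4}$$

where $G(\phi, \psi)$ denotes the coordinates $x$ obtained from the torsion angles. The three terms are weighted equally; this weighting was chosen by cross-validation.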
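Eq. (4) can be sketched as an equally weighted sum of the three terms. Here `geometry`, `v_distance`, `v_torsion` and `v_clash` are hypothetical callables standing in for G, V_distance, V_torsion and V_score2_smooth; the optimisation over (φ, ψ) itself is not shown.

```python
from typing import Callable
import numpy as np

def total_potential(phi: np.ndarray, psi: np.ndarray,
                    geometry: Callable, v_distance: Callable,
                    v_torsion: Callable, v_clash: Callable) -> float:
    """Eq. (4): equally weighted sum of the three potential terms.

    geometry   stands in for G, mapping torsions (phi, psi) to coordinates x.
    v_distance stands in for V_distance(x), v_torsion for V_torsion(phi, psi),
    v_clash    for V_score2_smooth(x). All four are hypothetical callables.
    """
    x = geometry(phi, psi)  # x = G(phi, psi), shared by both coordinate-based terms
    return float(v_distance(x) + v_torsion(phi, psi) + v_clash(x))
```

Computing x once and passing it to both coordinate-based terms mirrors the two occurrences of G(φ, ψ) in Eq. (4).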

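Distogram lDDT. lDDT is computed as follows, where $D_{ij}$ is the distance between residues $i$ and $j$ in the experimental structure, $d_{ij}$ the corresponding distance in the predicted structure, $L$ the number of residues and $r$ the minimum sequence separation; the 15 Å inclusion radius and the thresholds $t$ are in ångströms:

$$\mathrm{lDDT}_r = \frac{100}{4L} \sum_{t \in \{0.5, 1, 2, 4\}} \sum_{i=1}^{L} \frac{\sum_{j,\; |i-j| \geq r,\; D_{ij} < 15} \mathbb{1}(|D_{ij} - d_{ij}| < t)}{\sum_{j,\; |i-j| \geq r,\; D_{ij} < 15} 1}. \tag{5}$$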
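A direct NumPy transcription of Eq. (5), assuming `D` and `d` are (L, L) arrays of true and predicted distances in ångströms. The equation leaves a residue with no qualifying partners undefined (0/0); treating its ratio as 0 is an assumption of this sketch.

```python
import numpy as np

def lddt(D, d, r, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Eq. (5). D: (L, L) true distances, d: (L, L) predicted distances, in angstroms."""
    L = D.shape[0]
    idx = np.arange(L)
    # Pairs (i, j) that enter the sums: sequence separation >= r and true distance < cutoff.
    mask = (np.abs(idx[:, None] - idx[None, :]) >= r) & (D < cutoff)
    denom = mask.sum(axis=1)
    err = np.abs(D - d)
    total = 0.0
    for t in thresholds:
        num = ((err < t) & mask).sum(axis=1)
        # Residues with no valid partners contribute 0 (assumption: Eq. 5 leaves 0/0 undefined).
        total += np.sum(np.where(denom > 0, num / np.maximum(denom, 1), 0.0))
    # 100 / (4L): average over the four thresholds and the L residues, as a percentage.
    return 100.0 / (len(thresholds) * L) * total
```

A perfect prediction scores 100, and a uniform 1.5 Å error passes only the 2 Å and 4 Å thresholds, giving 50.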

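We define Distogram lDDT (DLDDT) analogously, replacing the indicator with the probability, under the predicted distogram, that the predicted distance $d_{ij}$ lies within $t$ of $D_{ij}$:

$$\mathrm{DLDDT}_r = \frac{100}{4L} \sum_{t \in \{0.5, 1, 2, 4\}} \sum_{i=1}^{L} \frac{\sum_{j,\; |i-j| \geq r,\; D_{ij} < 15} P(|D_{ij} - d_{ij}| < t \mid S, \mathrm{MSA}(S))}{\sum_{j,\; |i-j| \geq r,\; D_{ij} < 15} 1}. \tag{6}$$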
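A sketch of Eq. (6) for a binned distogram. How the probability P(|D_ij − d_ij| < t) is read off the bins is not specified here; this sketch assumes (hypothetically) that a bin's probability mass is credited whenever its representative distance in `bin_centres` lies within t of D_ij. The same 0/0 convention as the lDDT sketch is assumed.

```python
import numpy as np

def distogram_lddt(D, probs, bin_centres, r, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Eq. (6): lDDT with the indicator replaced by a distogram probability.

    D:           (L, L) true distances in angstroms.
    probs:       (L, L, n_bins) predicted bin probabilities.
    bin_centres: (n_bins,) representative distance of each bin (assumed convention).
    """
    L = D.shape[0]
    idx = np.arange(L)
    mask = (np.abs(idx[:, None] - idx[None, :]) >= r) & (D < cutoff)
    denom = mask.sum(axis=1)
    # |D_ij - centre_b| for every pair and bin: shape (L, L, n_bins).
    dev = np.abs(D[:, :, None] - bin_centres[None, None, :])
    total = 0.0
    for t in thresholds:
        p_within = np.sum(probs * (dev < t), axis=-1)  # P(|D_ij - d_ij| < t | S, MSA(S))
        num = np.sum(p_within * mask, axis=1)
        total += np.sum(np.where(denom > 0, num / np.maximum(denom, 1), 0.0))
    return 100.0 / (len(thresholds) * L) * total
```

If the distogram puts all its mass on the correct bin, DLDDT reaches 100; moving half the mass to a distant bin halves it.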



Integrated gradients. Given the expected value $d_{I,J}(x)$ of the distance between any two residues $I$ and $J$, we can consider its derivatives with respect to the input features $x_{i,j,c}$, where $i$ and $j$ are residue indices and $c$ is the feature channel index. The attribution function, as calculated using Integrated Gradients, of the expected distance between residues $I$ and $J$ with respect to the input features is then defined as

$$S^{I,J}_{i,j,c} = (x_{i,j,c} - x'_c) \int_{\alpha=0}^{1} \frac{\partial d_{I,J}(\alpha x + (1 - \alpha) x')}{\partial x_{i,j,c}} \, d\alpha, \tag{7}$$

which satisfies the completeness property

$$\sum_{i,j,c} S^{I,J}_{i,j,c} = d_{I,J}(x) - d_{I,J}(x'), \tag{8}$$


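where $x'$ is a reference set of features (the baseline); in this case we average the input features spatially over the $N \times N$ grid of residue pairs, $N$ being the number of residues:

$$x'_c = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} x_{i,j,c}. \tag{9}$$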

The derivatives of d can be calculated using backpropagation on the trained distogram network,
and the integral over α is approximated as a numerical summation.
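The real derivative comes from backpropagation through the trained distogram network, which is not reproduced here. As a hedged sketch, the code below substitutes a toy differentiable function `toy_expected_distance` (hypothetical, with a hand-written gradient `toy_grad`) for $d_{I,J}$, builds the spatially averaged baseline of Eq. (9), and approximates the α integral of Eq. (7) with a midpoint Riemann sum. The completeness property of Eq. (8) then holds up to discretisation error.

```python
import numpy as np

def spatial_mean_baseline(x):
    """Eq. (9): average each feature channel over all (i, j); x has shape (N, N, C)."""
    return np.broadcast_to(x.mean(axis=(0, 1)), x.shape)

def integrated_gradients(x, baseline, grad_fn, steps=64):
    """Approximate Eq. (7) with a midpoint Riemann sum over alpha in [0, 1]."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x)
    for a in alphas:
        # Point on the straight path alpha * x + (1 - alpha) * x'.
        avg_grad += grad_fn(baseline + a * (x - baseline))
    avg_grad /= steps
    return (x - baseline) * avg_grad

# Hypothetical stand-in for d_{I,J}: a fixed weighted sum of tanh-squashed features.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8, 3))

def toy_expected_distance(x):
    return float(np.sum(W * np.tanh(x)))

def toy_grad(x):
    # Elementwise derivative of sum(W * tanh(x)): W * (1 - tanh(x)^2).
    return W * (1.0 - np.tanh(x) ** 2)

x = rng.normal(size=(8, 8, 3))
x_ref = spatial_mean_baseline(x)
S = integrated_gradients(x, x_ref, toy_grad, steps=200)
# Completeness (Eq. 8): attributions sum to d(x) - d(x'), up to discretisation error.
print(S.sum(), toy_expected_distance(x) - toy_expected_distance(x_ref))
```

Increasing `steps` tightens the agreement between the two printed values; the midpoint rule is one common choice of numerical summation, not necessarily the one used in the original work.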



