RotamerTrie < BioGeometry

A topic under OptimizingRosetta

We should try to submit to WABI, April 20. This will take some work... The basic idea is described in Hellinga's paper, so we need to add something to it, which we can do by 1. experiments showing the effectiveness, 2. new pruning methods, and 3. handling of H-bonds if we can incorporate them into the tree.

I think the way the data structure and computation should be organized is that there is a "rotamer library trie" (write-only, organized as a linear list) that stores the entire rotamer library in a standard coordinate frame.

We then have a routine that makes "rotamer sets tries" at the modifiable residues. A rotamer set trie will take the subset of rotamers from the rotamer library trie that can be placed without background collisions in the coordinate frame of that residue on the protein. These become smaller linear lists with reduced atom descriptors that contain only the variable quantities (and can have a pointer back to the original atom for the static properties). Variable quantities include the coordinates in the world frame, the unit vectors for H or acceptor atoms, whether backbone D/A participate in H bonds, and skip values that are smaller because the rotamer set trie is smaller, bounding radii or ellipses if we want them to get smaller as we prune from the library to the set.

We then compute the rotamer-rotamer tables (suitably sparse) from pairs of rotamer set tries that have edges in the interaction graph and use these in the energy graph or MC evaluation as before. -- JackSnoeyink - 02 Apr 2005

Calculating Rotamer Interactions Using a TRIE

I have been designing and implementing an efficient method for calculating rotamer pair energies, that comes from a question posed in RotamerEnergyCalculation. The method relies on a tree data structure called a 'trie.'

Representing a set of rotamers with a trie

Tries are often used for dictionaries; their name comes from their ability to do fast reTRIEval of words. From the origin of the word, you would understand why some pronounce them "trees"; however, the "tris" pronunciation makes the distinction clear. In a dictionary, each node represents one letter and words trace out a path from the root to some node (not necessarily a leaf) in the trie. Words that share a common prefix, like 'apple' and 'apply,' would have a common path for the length of the shared prefix.

For a single to-be-redesigned residue, we represent all of its rotamers in a trie as follows. Each internal node in the trie represents an atom with a given type (C, N, ...) and position (xyz coordinates). Each leaf node stands for a rotamer that uses all atoms along the path from the root to that leaf. Each node stores an interaction energy variable or table, initially unassigned, which we calculate as described later.

Building a trie is easy once we decide on an ordering for the atoms of each rotamer. This order need not reflect the way these atoms are connected, but it might as well have the most commonly shared atoms first.

Some details on constructing a trie

To construct the trie, I first define an ordering on the atoms; a < b is true if ((a.type < b.type) || (a.type = b.type && a.x < b.x) || (a.type = b.type && a.x = b.x && a.y < b.y) || ...). Then I come up with a canonical ordering of the atoms (by atom name) for each rotamer. Two rotamers of the same amino acid will produce two lists of atoms, which may differ by the atomic coordinates. I can extend the ordering on atoms to a simple lexographic ordering on rotamers: r < s if (( r.at[1] < s.at[1] ) || (r.at[1] == s.at[1] && r.at[2] < s.at[2]) || ... ). If I now sort a list of rotamers according to this order, then rotamers with a common prefix will be adjacent. To build my trie, I first create a list of rotamers, and then sort this list. With the sorted list, it is easy to see which atoms are shared between two adjacent rotamers. jss It isn't any easier -- you still have to do it bottom up. ssj Knowing how many shared atoms there are lets me count the number of trie nodes I will have to allocate.

Let MAX_ATOM = the maximum number of atoms in an amino acid.

Rather than explicitly storing atom-less leaves for rotamers, we can mark internal nodes with a rotamer_id that stores the -1 if that atom is not the last atom in a rotamer, and the rotamer id if that atom is the last atom in a rotamer. Note that rotamers may terminate in non-leaf nodes. Consider two rotamers of histadine; a singly protonated (uncharged) version and a doubly protonated (positively charged) version that share the same coordinates for all of their heavy atoms. Let's say that the singly protonated histadine rotamer has its proton attached to ND1. In the rotamer trie where the canonical ordering for histadine atoms ends in "... ND1 HD1 NE2 HE2", these two rotamers would share almost all their computations. The trie node for NE2 would have to note that it is the last atom in the singly protonated rotamer X, and the HE2 trie node would note that it is the last atom in rotamer Y.

Calculations using rotamer tries

The interaction energies between all rotamer pairs for two residues can be efficiently calculated using a trie for each rotamer set, say R and S on residues 1 and 2.

To calculate the rotamer / rotamer pair energies, we use two recursive functions: atom_vs_trie() and trie_vs_trie(). For clarity we will describe these functions recursively; for speed we will implement these functions iteratively. These functions rely on a low level function, atom_atom_energy(), which encapsulates the details of the atomic forcefield.

atom_atom_energy

atom_atom_energy(r, s) returns the interaction energy between atoms r and s. Atoms are described by their atom type (e.g. sp3 carbon, sp2 carbon, polar hydrogen, alaphatic hydrogen, etc.), their center in 3-speace, and by unit vectors pointing to their base atoms. The Lennard-Jones atractive and repulsive terms and the Lazaridis-Karplus implicit solvation term depend upon the atom type and the atom center only. The hydrogen bonding term depends in addition on the cosine of the base-atom with hbond vectors.

atom_vs_trie interaction

atom_vs_trie(r, s, ancestral_e) recursively computes the interaction energy between atom r and all the rotamers in the subtree of S rooted at node s. It stores these energies in a global variable, at_vs_rot_stack, a stack of arrays. atom_vs_trie calls atom_atom_energy() and is called by trie_vs_trie(r, S).

trie_vs_trie interaction

trie_vs_trie(r, S) recursively computes the interaction energy between the rotamers in the subtree of R rooted at node r and rotamers in the trie S. It stores these energies in a global variable, rot_rot_energies, a table. An invocation of trie_vs_trie(R.root, S) will calculate all rotamer/rotamer pair energies. trie_vs_trie invokes atom_vs_trie.

Functions in detail.

Now we give the algorithm to calculate rotamer / rotamer energies.

Global variables

rot_rot_table. This table has (R.num_rotamers $\times$ S.num_rotamers) entries which will be the computed interaction energies for the corresponding pair of rotamers.
at_vs_rot_stack. This is a stack of arrays, each with S.num_rotamers enetries, that collect interaction energies of ancestor atoms of nodes in rotamer tree R with rotamers of S. The stack depth is limited to MAX_S_DEPTH, which is the maximum number of ancestors with siblings of any leaf in a rotamer tree.
ARstack_top. Top of stack pointer for at_vs_rot_stack.

at_vs_trie(r, s, ancestral_e) in detail

Pre-Condition: r is an atom of R, s is an atom of S. at_v_rot_stack(stack_depth, : ) contains the sum of the interaction energies of all of r's ancestors with the rotamers of S that, in the preorder traversal of S, terminate at or after s, and contains the sum of r's ancestors' and r's interaction energies for all rotamers of S that, in the preorder traversal of S, that terminate before s. ancestral_e holds the sum of the interaction energies r has with all ancestors of s.

Post-Condition: at_v_rot_stack( stack_depth, : ) contains the sum of r's ancestors' and r's interaction energies with all rotamers of S whose terminals are before s in a depth-first traversal of S or descendants of s. For all other rotamers, at_v_rot contains the sum of r's ancestors' interaction energies.

psuedocode:

atom_vs_trie(r, s, ancestral_e) 
{  ancestral_e += atom_atom_energy(r, s);
   if (s._terminal_rotamer_id != -1)  
      at_vs_rot_stack[stack_depth][s._terminal_rotamer_id] += ancestral_e;
   for (int i = 0; i < s._num_children; i++)
      atom_vs_trie(r, s._children[i], ancestral_e);
   return;
}

trie_vs_trie(r, S) in detail

Pre-Condition: r is an atom of R. at_vs_rot_stack[ stack_depth ] contains the sum of the interaction energies of r's ancestors with the rotamers of S. If r is the root, then stack_depth must be zero and each entry in at_v_rot_stack is zero. rot_rot_table contains the interaction energies for all rotamers of S against the rotamers of R that, in the preorder traversal of R, terminate before r.

Post-Condition: rot_rot_table contains the interaction energies for all rotamers of S and the rotamers of R that, in the preorder traversal of R, terminate before r or 3) that terminate in the subtree of R rooted at r.

Psuedocode

trie_vs_trie(r, S)
{  
   atom_vs_trie(r, S.root, 0);

   if (r._terminal_rotamer_id != -1)
      rot_rot_table[ r._terminal_rotamer_id b] = at_v_rot_stack[ stack_depth ]; // 
   
   if (R._num_children > 0) {
     stack_depth++;
     for (int i = 0; i < R._num_children-1, i++) {
       //copy stack top for nodes with siblings remaining
       at_v_rot_stack[stack_depth] = at_v_rot_stack[stack_depth - 1];
       trie_vs_trie(r.child[i], S);
     }
     stack_depth--;
     trie_vs_trie(r.child[R._num_children-1], S);
   }
   return;
}

Data structure refinements for efficient storage and computation

Here we consider ways to reduce the space and time. Our traversal of the trie for R must actually store interaction energies for its nodes with all rotamers of S. Since it only ever needs the energies on a root to leaf path, we can save memory (and have better cache coherency) by simply maintaining a stack of those energies that we need. And there are some instances in traversals of S where we do not need to evaluate children, either because the cost is already too high, or because we can see that they will not contribute. We consider this savings first.

Pruning in the trie traversal

There are two opportunities to prune, of which the current implementation uses only the second, I think. say how you would do these in the trie -- what variables must you add?

Pruning subtrees whose interaction cost has exceeded a threshold

Suppose that all interactions with energy exceeding a threshold are to be discarded. If we estimate the beneficial terms in a subtree, when we can halt the descent whenever the current energy exceeds the threshold by more than the beneficial amount. Since the threshold can be set to infinity, there is no harm in using it.

Pruning atoms with zero interaction

There are two kinds of zero interactions that Rosetta prunes. The first prune is of entire rotamer/rotamer calculations. Rosetta prunes with a CB-distance-based cutoff . The cutoff depends on the amino acid type of both rotamers. It's worth noting that all rotamers of a single amino acid on a single residue place their CB's identically.

I have not figured out an excellent way to implement a similar pruning. Such pruning is critical, as the number of CB's outside the cutoff distance is very large compared to the number of CB's inside of it.

Here is one idea:

At each trie_node, keep a smallest enclosing sphere for the atoms in the subtree rooted at that node. In at_v_trie(r, S) the distance between r and s.sphere will determine if the subtree rooted at s needs examining. If the distance exceeds an interaction threshold, skip over the atoms in that subtree. To maintain the at_v_rotamer_stack invariant, at_v_rotamer stack must be updated for all the rotamers which terminate in the skipped-over subtree. s will have to keep the index of the first and last rotamers that terminate in that subtree it roots, and update those entries in at_v_rotamer_stack.
If you really mean smallest enclosing sphere, that is too much because it makes you store a center and compute another distance -- too much when you are saving just some distance evaluations and lookups. -- JackSnoeyink - 20 Mar 2005
- However, keeping the radius of the sphere with the current atom center is good. You then have a few options with your existing distance computation: if dist(r,s)^2 > (r.radius + s.radius + interaction_threshold)^2, then all rotamers in r subtree x all rotamers in s subtree get ancestor_energy and both subtrees are skipped, else if dist(r,s)^2 > (s.radius + interaction_threshold)^2, then the stacked R energy vector gets the ancestor energy for all rotamers in s.subtree and the s subtree is skipped, else you continue down the s.subtree as usual. (Check if it is worth computing the thresholds or just making conservative overestimates in advance.)

The second prune concerns hydrogen atoms. Rosetta considers only the Lennard-Jones repulsive forces for hydrogens. The drop-off of the repulsive term is steep; at short distances, the repulsive energy is 0. To calculate the interaction between two rotamers rosetta examines all heavy-atom pairs. Rosetta will look at the hydrogen/heavy-atom and hydrogen/hydrogen pairs only if the heavy-atoms to which the hydrogens are attached are sufficiently close.

Note that this is the same prune now.

Currently I avoid calling fast_pairenergy for hydrogen atoms that are outside of the cutoff range, but I still pay to look at every hydrogen/heavyatom and hydrogen/hydrogen pair. Below is the code I use at this moment.

I maintain the stack of boolean arrays, parent_heavy_wi_hydrogen_cutoff which allows me to decide for hydrogen atoms in both R and S (R is "this" in the code below, and S is "rt") whether I should call atom_atom_energy. I have a few loop invariants that I've put into the comments.

not needed. Don't complicate things, make them simpler. I think that now you have the general idea, you need to start thinking from the perspective of the output data, so you can allocate everything up front and avoid allocating anything in the middle of your loops.
- You are trying to fill rot_rot_table. You have two different types of tries, R and S. They have a three different types of nodes: leaves (always marked with rot_id; stack popped to level of next node), nodes with one child (any with rot_id? stack need not be updated), nodes with two or more nodes (some with rot_id; need to push stack). Each node can be marked with a rotamer_id.
- The S tree tries to fill a rot vector entry when it hits a rot_id, otherwise it keeps a single value in the stack. If you prune a subtree, you get to fill several entries, depending on how many ids are in the subtree. (See man memset() for possibly faster fills.) (If you store the last rotamer id output with each node, then you don't even have to look at the subtree to determine which rotamers you missed.)
- The R tree fills a line of the rot_rot_table for a rotamer, or uses S to fill a vector in the stack. When you get to prune an r.subtree against an s.subtree, you can fill a rectangle of the table.

The main problem with this structure is that it does not skip over hydrogen atoms, it still considers them, even if only to copy a level of the stack downwards.

Yes, this is why you want to group atoms in one trie node, so you don't have to copy the stack. (You can still think of it as copying the stack without actually doing so. When you linearize the tree, just record the necessary stack operation with each node. See below.)

I do avoid calling "atom_atom_energy" but this doesn't feel like it's enough. I have some measurements on my laptop where I compare the existing code with this code; I observe a factor of .6 speedup ( 14.0 seconds for the existing code vs 8.9 seconds for the new code). Specifically, I am measuring two adjacent residues, residues 2 and 4, on the surface of ubiquitin while using a large rotamer library (arg, lys, met excluded). Each rotamer trie contains about 5K atoms; each trie_node takes 70 bytes (atom descriptor requires 9 floats and two integers). If I go and increase the rotamer library to include arg, lys, and met, then there are ~12K atoms in each rotamer_trie. I measured the trie_v_trie call to take 45 seconds compared against the existing code's 40 seconds.

It is good to hear that you are getting some speedup already. There are lots of ways to improve what you have.

I've done some math. The two tries with 12K atoms chew up 1.6 megs. I've exceeded the cache size by a mile.

So, I'm thinking if I can decrease the size of the trie_node, I can fit at least one rotamer_trie into memory. I suggest creating a light weight atom descriptor which keeps pointers (shorts) to atoms. I will also need a list of atoms. A list of 12K atoms with 12 bytes / atom will take 144K. The light_weight_trie_node will shrink to 23 bytes, so that with 12K atoms in the rotamer_trie, the atom_coords + trie_node * 12 combo will cost ~420K. That's almost the entire cache.

The pointers to atoms are worse than useless. The only purpose of the pointer is to access the atom, so it just takes more memory. This is what I mean when I suggest to think starting from the data. What do you need in a rotamer tree node to carry out the computation of energy and decide which node to visit next? Suppose that we linearize with the preorder traversal. Then we get to the next node by +1, and we only need to know what to do with the stack. When we prune, we need to know what node to skip to, and what to do with the stack.
- coordinates (12bytes), atom type (1) to call fast_pairenergy,
- rotamer_id (2),
- stack operation (1 byte: push = -1, or # to pop from 0-stack depth)
- prune radius (4), skip (2), & stack operation (1) When you can prune, shows how far in the list to skip and how many to pop off the stack to get to the next trie node.
- Maybe another radius for H bonds if you wish.
- Total <32 bytes x 12K nodes = 384Kb. It might be worth doing the experiment to see if not storing the prune radius with everything helps. (Just treat Cb special and H special.) But get this working first so we can instrument and collect data on how often things are pruned at different levels.
You only want one tree in cache (the other is touched once) and if you can prune some of that tree, you won't need it all. (Of course, cache lines will also occupied by the lookup tables that come in, etc...

rot_rot_table is about 16MB (2K x 2K x 4 bytes/float);

at_v_rot_stack is about 240K; two levels of the stack require only 16K

parent_wi_hydrogen_cutoff is about 480K; one level of the stack requires only 12K.

I fail to see how you can make the stack larger than the table you will fill, since every number is in the stack in order to eventually fill a slot in the table. (That is, if you have a number that is not being kept to contribute to the table, then there is something wrong.) And as I said above, I don't see the need for this hydrogen cutoff -- it belongs in the trie as part of a more general mechanism. (You can actually use the end of the table for the stack if your are careful, but I don't recommend this.)

Some other observations for speed
- Use class assignment or, if you can't, use memcopy() instead of writing your own memory copy loops.
- don't try to outsmart the compiler. E.g., keeping both indices ld and ldminus1 in a loop is a bad idea. The complier will convert table[ld-1] access to *((table-sizeof(entry))+ld*sizeof(entry)), where the first term is computed at compile time, and the second is reused from the table[ld] access. Thus, your table[ldminus1], which is *(table + ldminus1*sizeof(entry)) occupies another register and adds another multiplication for no benefit.
- Avoid the FA classes for 2 & higher d. They index column major, which means you are filling your rot_rot_table with 2k strides, and the compiler won't optimize them. I guess the rot_rot_table can be kept, since it is for output, but then you should fill it column first.
- fast_pairenergy can be sped up if we combine the tables into one struct and do one lookup. (The compiler will do better with C arrays because it won't have to first compute an index and then compute a pointer.)

Here's the code I promised.

void 
rotamer_trie::trie_v_trie_iterative( 
   rotamer_trie & rt,
   FArray2Da_float & rot_rot_table
)
{
   using namespace pdbstatistics_pack; //hydrogen_interaction_cutoff defined here.
   
   //cerr << "begin trie_v_trie_iterative" << endl;
   
   int rt_num_atoms = rt.get_num_atoms();
   int rt_num_rotamers = rt._num_rotamers;
   int this_num_atoms = get_num_atoms();
   
   FArray2D_float at_v_rot_stack( rt_num_atoms, _max_stack_depth, 0.0);           
   FArray2D_bool  parent_heavy_wi_hydrogen_cutoff(rt_num_atoms, _max_heavyatom_depth, true);  
   FArray2D_float hbderiv(3,2,0.0);
   
   float energy_stack[ MAX_ATOM + 1 + 1];  //index by 1, + 1 for root_depth = 1.
   energy_stack[1] = 0;
   
   int r_curr_stack_depth = 1;
   int r_last_tree_depth =  1;
   int s_curr_stack_depth; 
   int s_last_tree_depth;
      
   for (int ii = 1; ii <= this_num_atoms; ii++)
   {  
      energy_stack[1] = 0;
      trie_node & r = _trie[ii];
            
      if (r._depth <= r_last_tree_depth)
      {  //pop the stack;
         r_curr_stack_depth--;
      }
      r_last_tree_depth = r._depth;
            
      if (r._has_sibling)
      {  //push - copy stack downwards
         r_curr_stack_depth++;
         FArray1Da_float at_rot_array_d_proxy(at_v_rot_stack(1, r_curr_stack_depth));
         FArray1Da_float at_rot_array_dminus1_proxy(at_v_rot_stack(1, r_curr_stack_depth-1));
         at_rot_array_d_proxy.dimension(rt_num_rotamers);
         at_rot_array_dminus1_proxy.dimension( rt_num_rotamers );
         
         at_rot_array_d_proxy = at_rot_array_dminus1_proxy;
      }
      
      FArray1Da_float at_rot_array_proxy(at_v_rot_stack(1, r_curr_stack_depth));
      at_rot_array_proxy.dimension( rt_num_rotamers);
      
      if (r._is_H)
      {  FArray1Da_bool parent_wi_h_dist( parent_heavy_wi_hydrogen_cutoff( 1, r._heavyatom_depth) );
         parent_wi_h_dist.dimension(rt_num_atoms);
         
         s_curr_stack_depth = 1;
         s_last_tree_depth = 1;
         
         for (int jj = 1; jj <= rt_num_atoms; jj++)
         {  
            trie_node & s = rt._trie[jj];
            if (s._depth <= s_last_tree_depth)
               {  s_curr_stack_depth--;}
            s_last_tree_depth = s._depth;
            
            if (s._has_sibling)
            {  //push - copy stack downwards
               s_curr_stack_depth++;
               energy_stack[s_curr_stack_depth] = energy_stack[s_curr_stack_depth - 1];
            }
            
            if (! parent_wi_h_dist(jj) ) 
            {
            }
            else
            {   
               energy_stack[ s_curr_stack_depth ] += atom_atom_energy(r, s, _res_id, rt._res_id, hbderiv);
            }
            
            if (s._rotamer_terminal != 0)
               at_rot_array_proxy(s._rotamer_terminal) += energy_stack[ s_curr_stack_depth ];
         }
      }
      else //r is a heavy atom 
      {
         FArray1Da_bool parent_wi_h_dist( parent_heavy_wi_hydrogen_cutoff(1, r._heavyatom_depth));
         parent_wi_h_dist.dimension(rt_num_atoms);
         
         s_curr_stack_depth = 1;
         s_last_tree_depth = 1;
         
         for (int jj = 1; jj <= rt_num_atoms; jj++)
         {  
            trie_node & s = rt._trie[jj];
                       
            if (s._depth <= s_last_tree_depth)
            {  s_curr_stack_depth--;}
            
            s_last_tree_depth = s._depth;
            
            if (s._has_sibling)
            {  //push - copy stack downwards
               s_curr_stack_depth++;
               energy_stack[s_curr_stack_depth] = energy_stack[s_curr_stack_depth - 1];
            }

           
            if (s._is_H)
            {
               if (! (parent_wi_h_dist(s._parent_heavyatom)) )
               {  parent_wi_h_dist(jj) = false; 
               }
               else
               {  parent_wi_h_dist(jj) = true;
                  energy_stack[ s_curr_stack_depth ] += atom_atom_energy(r, s, _res_id, rt._res_id, hbderiv);
               }
              
            }
            else
            {
               float d2;              
               energy_stack[s_curr_stack_depth] += heavyatom_heavyatom_energy(r, s, _res_id, rt._res_id, hbderiv, d2);
               parent_wi_h_dist(jj) = (d2 < hydrogen_interaction_cutoff);
              
            }
           
            if (s._rotamer_terminal != 0)
            { 
              at_rot_array_proxy(s._rotamer_terminal) += energy_stack[s_curr_stack_depth];
            }
         }
      }
      
      if (r._rotamer_terminal != 0)
      {  
         for (int jj = 1, lrot_rot_array = rot_rot_table.index(1, r._rotamer_terminal), 
         lat_rot_array_proxy = at_rot_array_proxy.index(1); jj <= rt_num_rotamers; jj++, 
         lrot_rot_array++, lat_rot_array_proxy++)
       {  rot_rot_table[ lrot_rot_array ] = at_rot_array_proxy[ lat_rot_array_proxy ];
       } 
      }
   }
   
   return;
}

(We may want to do this for non-H atoms, too, to see if it helps. The general idea is to assign each node of S a radius such that atoms of R outside the radius have no contribution to the energy of the subtree of that node. Then bail whenever you are outside the radius (remembering to record the current energy value for all the rotamers in that

Reducing k-arity trees to binary by left-child/right-sibling

One trick that helps us is to reduce the trie from a k-arity tree to binary. We can use a standard binary tree representation of a k-ary tree where the k child pointers are replaced by a single left-child pointer and a right-sibling pointer. Note that pre-order traversal visits the same nodes, so we need not change our subroutines much.

On second thought, we don't have to do this to linearize. Since we usually traverse the tree in preorder, just make a linear list of the nodes in preorder and scan linearly. You need to record whether to push or how often to pop the stack of ancestral energy values or vectors to go from one node to the next. And you need to point out where you jump to if you can prune (jumping you do by scanning over the jumped nodes and outputting the current energy for all of them marked by rotamer ids. Much faster than recursion.

apl I had this thought when I woke up this morning and started implementing it before I read your comments. pla

Reducing left-child/right sibling trees to depth trees

Ordering rotamers as I do, I insert atoms into a large array such that they are ordered by their position in the preorder traversal. When I branch to children and then to siblings, I march across the large array from beginning to end. Keeping this preorder traversal, I can avoid representing child and sibling pointers all together; I'll just march. All I need to represent at each node is its depth. What does this let me do? I have created an iterative routine atom_v_trie_iterative which marches across the entire trie S, maintaining an ancestral_energy_stack, which makes up for the "ancestral_e" variable in the recursive function.

atom_vs_atom interaction

atom_v_atom(r, s, at_v_rot, ancestral_e) :

precondition: r is a single atom of rotamer trie R at depth d. s is a single atom of rotamer trie S at depth h, at_v_rot contains the sum of the interaction energies of all of r's ancestors with the rotamers of S for all rotamers with terminals at s or after s in a depth first traversal of S, and the sum of r's ancestors' and r's interaction energies for all rotamers of S that precede s in the depth first traversal of S. ancestral_e contains the sum of the interaction energies r has with the ancestors of s.

post-condition: at_v_rot contains the sum of r's ancestors' and r's interaction energies with all rotamers of S whose terminals are before s in a depth-first traversal of S or descendants of s. For all other rotamers, at_v_rot contains the sum of r's ancestors' interaction energies.

psuedo-code:

atom_v_atom(r, s, at_v_rot, ancestral_e) 
{  e = atom_interaction_energy(r, s);
   if (s._terminal_rotamer_id != 0)  
      at_v_rot[s._terminal_rotamer_id] = ancestral_e + e;
   if (s.child)
      atom_v_atom(r, s.child, at_v_rot, ancestral_e + e);
   if (s.sibling)
      atom_v_atom(r, s.sibling, at_v_rot, ancestral_e);
   return;
}

atom_vs_trie interaction

atom_vs_trie(r, T, energy_table) :

precondition: r is a single atom of rotamer trie R at depth d. at_v_rot_stack[d + 1] is the stack level currently being used; it is an array with S._num_rotamers elements. at_v_rot contains the sum of the interaction energies of r's ancestors in R with the rotamers of S. If r is the root of R, then all of at_v_rot_stack[d] entries will be zero. rot_rot_table contains the interaction energies for all rotamers of S against the rotamers of R that terminate in nodes that proceed r in a depth first traversal of R.

postcondition: at_v_rot contains the sum of the interaction energies of r's ancestors with the rotamers of S, and the interaction energy of r with the rotamers of S as well. rot_rot_table contains the interaction energies for all rotamers of S and the rotamers of R that either 1) terminate in nodes that preceed r in a depth first traversal of R, 2) that terminate in r itself, or 3) that terminate in descendants of r.

psuedo-code:

atom_v_trie(r, S, d, at_v_rot_stack, rot_rot_table)
{  
   //copy preceeding stack to this level.
   at_v_rot_stack[d + 1. :] = at_v_rot_stack[ d, :];

   atom_v_atom(r, S.root, at_v_rot_stack[d + 1, :], 0);

   if (r._terminal_rotamer_id != 0)
      for (ii = 1:S._num_rotamers)
         rot_rot_table[r._terminal_rotamer_id, ii] = at_v_rot_stack[d+1, ii];
   
   if (r.child)
      atom_v_trie(r.child, S, d+1, at_v_rot_stack, rot_rot_table);
   if (r.sibling)
      atom_v_trie(r.sibling, S, d, at_v_rot_stack, rot_rot_table);
   return;
}

We use two stacks to implement these two recursive functions; the calling stack for atom_v_atom and the at_v_rot_stack for atom_v_trie. This isn't mutual recursion. I did not have anything in mind previously when I referred to "plural" recursion. jss What the hell for... One stack is sufficient. I think you mean that you will use two stack in the implementation, but don't say something requires something unless it does. And this is hardly a plural recursion. It isn't tail recursion but it is pretty standard. Please look up "tail recursion", "inorder", "preorder", and "postorder" in the white book if you're not familiar with these concepts. ssj

jss It seems to me you have several things to do.

Have you precisely defined your tries? Do you want to attach atoms to nodes or edges? (my usual definition of a tree often will say, "a leaf represents... an internal node represents..., an edge represents..."" In particular, how are H represented in the trie?
You should define processing on the trie, including H atoms.
Then you can convert your trie into left child, right sibling representation, and say how that changes the processing.
Then you can linearize so that you store your binary tree in an array, and say how that changes the processing.

If you make each step of processing clear, before you convert the representation to a new one, then your algorithm will be clear. ssj

Linearizing the binary tree

jss don't worry about this until you've stated the invariants for the previous level of abstraction ssj

I lay out the nodes in the trie in an array, in preorder traversal order. I can represent the tree structure using only "depth" indicators, instead of having child and sibling pointers.

Detailed subroutines

Detailed pruning

jss write this as what you do, not as what you would like to do ssj I would like also to avoid considering hydrogen interactions in the trie structure unless the distance threshold for their heavy atoms is met. jss say why (what invariants are maintained) instead of specific implementation details ssj I can do so by introducing two more pointers. I will order each rotamer so that hydrogen atoms will line up immediately behind their heavy atoms. A "skip" pointer will bypass these hydrogen atoms, pointing at the next heavy atom in the list. For example, the leucine ordering is "CA HA CB 1HB 2HB CG 1HG CD1 ...". CB's child pointer is 1HB. It's skip pointer is CG. The second additional pointer is what I call the "special sibling" pointer. If two leucine rotamers share a Chi-1 dihedral but differ at Chi-2, then they will trace out the same path along the trie until the 1HG atom. jss the example is only helpful if you have concisely stated the general principle ssj

rot a: CAa HAa CBa 1HBa 2HBa CGa 1HGa CD1a 1HD1a 2HD1a ...
rot b: CAa HAa CBa 1HBa 2HBa CGa 1HGb CD1b 1HD1b 2HD1b ...

where a and b are used to point out if atoms have the same coordinates. The sibling pointer that connects these two rotamers is 1HGa's sibling pointer which points at 1HGb. Rotamer a's CGa skip pointer points at CD1a. If we skipped past 1HGa, then we would not follow the path over to rotamer b. To make up for this, CD1a's special-sibling pointer will point at CD1b. We will follow a special-sibling pointer only if we have arrived at a heavy atom having skipped some hydrogen atoms before it. Evaluate(r,s,e) will be modified to take a new parameter: Evaluate(r, s, e, did_we_skip_any_H);

With skip and special sibling pointers we can skip hydrogen atoms of S when we're considering a single heavy atom r. jss what invariant are you maintaining? ssj We cannot skip hydrogen atoms of R so easily. We must still consdier each one. However, when we're looking at a hydrogen atom, r, we can skip over S's hydrogen atoms when appropriate without making any distance calculations. To do this, we store a boolean "last_d2_close_enough" on each of S's heavy atoms that records whether the distance between that heavy atom and the heavy atom ancestor of r that r is attached were close enough that r's interaction should be calculated. If r's ancestor was not close enough to heavy atom s, then not only to we not have to calculate that interaction, but we may follow s's skip pointer and avoid the hydrogens attached to it.

Revision: r1.1 - 13 Apr 2005 - 09:59 - Main.guest

BioGeometry > ProgressReports > JackProgress > OptimizingRosetta > RotamerTrie