AndrewProgress < Main

Main

Main.AndrewProgressr1.1 - 25 Mar 2005 - 20:14 - Main.guesttopic end

Start of topic | Skip to actions

Moved rotamers in trie topic to RotamerTrie, linked from OptimizingRosetta.

Status

At the moment, I have

created an atom ordering for each amino acid
created a rotamer_descriptor class with an appropriate "operator <" for sorting
succesfully sorted a rotamer set
implemented a rotamer_trie constructor that compiles (and crashes)

I will begin implementing the recursive energy evaluation shortly. --3.14.2005

Comment

You may implement the recursive energy if you want to do the comparison, but non-recursive will be better unless your compiler is VERY smart. Implement both types of pruning, and let's see if we can get a threshold that will help.

Update

Past week:

Implemented a function called "get_fast_rotamer_energies" which uses the unique-atom method I previously described in OpenProblems. After a significant amount of work, I reduced the energy computation time by 5%. In a large rotamer library redesign of Ubiquitin's surface, a previous call to get_energies() that took 8.09 minutes now takes me 7.79 minutes. While this result is disappointing, I now have code in place that will make speeding up my non-pairwise decomposable energy model possible. As evaluating the NPD model takes ~3 times longer than a single call to get_energies in its current form, I'm expecting the improvement in moving to this new method to be large.

-- 2.28.2005

Update

Progress for the past week.

I have the energy_graph structure working for pairwise decomposable energy functions. For the small problems I have tested on my computer, I observe a factor of 2 to 3 speedup for simulated annealing. Simulated annealing takes on the order of 4 seconds on my laptop for these problems. With my new code, it takes 1 to 2 seconds. When I tested my code on some larger problems on Brian's machines, I observed no speedup whatsoever. I'm not sure why, though I'll keep thinking about it.
Rosetta's get_energies() function is painfully slow. And painfully redundant. Where simulated annealing takes seconds, get_energies() can take 20 minutes or longer. Speeding up simulated annealing seems superfluous. On Friday I began working on a "unique atom" approach to speeding up the rotamer pair calculations, and I think I can have it done reasonably soon. I have created a section of the Wiki for OpenProblems and included a description of the general problem of rotamer-pair energy calculation.
My non-pairwise-decomposable energy model currently uses 7 times as much memory as Rosetta is currently using. This is a problem. I will want to decrease the memory usage. I keep around an additional 6 floating point values for each rotamer pair. I don't need 6 additional values - it's just easier to allocate that much space. I have thought of a way to drop the extra memory to something like 1 additional floating point value for each rotamer pair. I will begin implementing this improvement.
My NPD model is very slow to evaluate. I want to first speed up Brian's get_energies() function, and then apply these speedups to the NPD model.

-- 2.21.2005

Update

Progress for the past week.

I have fixed all the bugs I know about with the NPD function and it compiles and runs to completion with the gccdebug compile option. I get an error when I use the -03 compile option - Rosetta crashes very early - well before the code I've written is evaluated. I think this is an inline problem. I solved a few other inline problems that the linker complained about with the 03 option along the way, and I think the problem I'm now seeing is related.
I have added a "proxy" functionality to the energy_graph edge classes (bg_sc_edge, and sc_sc_edge) so that they will not allocate any memory until they get a non-zero value to hold. They will delete themselves if they never allocated their energy tables.
I installed CVS on my computer.
I have begun modifying the energy_graph class to work with pairwise decomposable energy functions. I should be done shortly.

-- 2.14.2005

Compiling and Running A Non-Pairwise Decomposable Solvation Model in Rosetta

I have a running, though buggy version of my non-pairwise decomposable solvation model. I will describe briefly the code structure.

Brian wrote the "pack" module of Rosetta. To pack new sidechains into an existing scaffold, he has to calculate and store

the new rotamers he will consider
the interaction energy each new rotamer has with the background
the rotamer pair interaction energies

Brian creates the new rotamers, then calls get_energies(...) to fill in two tables. In the first table, energy1b, he keeps the interaction energy for the rotamers and the background. In the second table, energy2b, he keeps the pair interaction energies. energy2b is a sparse matrix, and Brian has a special way of indexing into it.

I created an energy_graph graph structure to hold the energy1b and energy2b values instead. Each residue is a vertex, and the rotamers for a residue are the state space for that vertex. Two residues with non-zero interaction energy are connected by an edge, which holds the table of interaction energies for all rotamer pairs. If two residues have zero interaction energy regardless of the rotamers assigned, then there is no corresponding edge in the energy_graph.

This graph structure represents interaction energies as efficiently as the sparse matrix energy2b. To me, it is also more intuitive. (I currently use the energy_graph structure for my model. I cannot instantly replace Brian's energy1b and energy2b with the energy_graph class, because mine is not pairwise decomposable. I think it would be worth making the changes needed to energy_graph so that Brian could use it, as it would make simulated annealing faster.)

I have to mimic get_energies(...) as I implement my Non-Pairwise Decomposable (NPD) model. The function call hierarchy to get_energies(...) is something like get_energies() --> get_sc_bbE() --> fast_pairenergy_with_hydrogens() --> fast_pairenergy()_. The lowest level -- fast_pairenergy() -- is a table lookup. The position in the table depends on the atom types of the two atoms and their separation distance, which is hashed to a bin. The next level up, fast_pairenergy_with_hydrogens() takes a pair of heavy atoms and evaluates the interaction energies between them and all of their attached hydrogen atoms. For two methyl carbons (e.g. on the end of a methionine), this would mean a single call to fast_pairenergy() for the two carbons, six fast_pairenergy() calls between each carbon and the other's three attached hydrogen atoms, and nine calls between the hydrogen atoms themselves.

The next level higher -- get_sc_bbE() -- starts working on entire rotamers. It evaluates the interaction energy for a rotamer (sc) and part of the background (one backbone segment, bb). Summing energy across each background component will produce the interaction energy of a rotamer with the background. In addition to get_sc_bbE() are other similar functions, e.g. get_sc_scE(), get_bb_bbE(). These other functions also rely on fast_pairenergy_with_hydrogens().

For the NPD model, I need to work at atom level. I need to know the solvE value from the implicit solvation model for a single heavy atom so that I may later remove it. In my code, I have rewritten the get_sc_bbE() functions, creating corresponding functions such as get_sc_bbE_npd(). These functions still use the fast_pairenergy_with_hydrogens() function.

At the top level, I have written a new function get_energies_npd() which creates the energy_graph and calling get_bb_bbE_npd(), get_sc_bbE_npd(), and get_sc_scE_npd() places the appropriate values into the tables held by the energy_graph structure.

Simulated Annealing

Simulated annealing works like this. Assign a state to the system. Consider modifying that state: if the modification decrases the system's energy, make that modification. If it increases the system's energy, then accept the modification with a certain probability. This probability will depend on the value of the energy change, and a "temperature" of the system.

What is requried of a structure representing the energy of a system is that it be able to determine the energy of the system when in a current state and that it be able to report the change in energy if the state is changed.

My energy_graph structure maintains the current state assignment and the energies associated with that state assignment. It has a function float energy_graph::consider_state_substitution(int vertex, int alternate_state) which returns the change in energy if the current state assignment is altered to a new state assignment.

The way my energy_graph works is more efficient at computing the change in energy than Brian's current code. Let's suppose vertex v is in state s and Brian wants to compute the change in energy induced by entering state s'. What he does is first compute the interaction energy s has with everything. He then computes the interaction energy s' has with everything. If Brian stored the interaction energy s had, then he would only need to compute the interaction energy s' would have. It is inefficient to keep recomputing the interaction energy of s. The energy_graph structure I have created keeps the energy of s around, thus making it more efficient.

If simulated annealing made a replacement every time it considered a new state, then the computation would be cut in half. However, since many state assignments are rejected, the computation could be cut far better than half. For the purposes of simulated annealing on pairwise decomposable energy functions, I could create an energy_graph structure within a few days. I'm currious to see if I can speed up SA using such a structure.

Current Bugs

My code does not yet work:

The energies my NPD model is coming back with are way wrong. Using SA, I came back with a score of -1170.37 kcal/mol when Brian's model would have come back with something like -43 kcal/mol. I'm evaluating the energy incorrectly, though I don't know where.
The bookkeeping in my energy_graph structure is incorrect. What the energy_graph thinks the current energy of the system is ends up very wrong when I use it in SA. In my small testbed, it seems to work fine - so I need a larger testbed to tease out the problem.

-- AndrewLeaverFay - 07 Feb 2005

Comments:

You will have to be sure that the system is not changing to invalidate the quantities that you have stored on edges. Otherwise it sounds like a good effiency. -- JackSnoeyink - 07 Feb 2005
to top

End of topic
Skip to action links | Back to top

Edit | Attach image or document | Printable version | Raw text | More topic actions
Revisions: | r1.17 | > | r1.16 | > | r1.15 | Total page history | Backlinks

You are here: Main > ProgressReports > AndrewProgress

to top