Estimating protein void volumes

Question: is it worthwhile to include an estimate of the void volume in the energy function for optimizing rotamer choices? Since a void-volume term is not pairwise, it is not in Rosetta at present, and the impression is that redesigned proteins consequently tend not to pack as well as native structures.

To answer this, we studied whether a simple grid-based method would robustly separate native structures from decoys.

Definition of void volumes

Approach to computing

Results of experiments on 1bbz

Misc.

Brian's email on 1bbz, 7 Oct 05

Here are two pdb files relevant to the SASApack term (Phil's scoring term) and void volumes. The redesigned pdb (1bbz_redesign.pdb) has a higher SASApack score. You can see the table of SASApack scores if you search for SASApack in the pdb file. They are broken down on a residue-by-residue basis.

My impression is that you are going to look and see whether the grid-based approach would be a practical way to rapidly calculate void volumes during a rotamer search.

Email to Craig after presentation

Enlarging the atom radius slowed the program down significantly because more grid boxes have to be checked for relevant atoms, so I spent some time hacking this to get it back up to speed.

It's called pv2 only because I get lazy while developing -- we'll convert the name back to proteinvoid when it stabilizes.

The main modifications from what we talked about yesterday are:

1. With each water box, consider the atoms that kill the water at the lower corner, and save as abest an atom that we estimate (in preprocessing) will cover much of the box by being close to the center and having a large radius. Then, when you consider a box, you can use that atom to kill off void points before beginning (and bail out early if all void points are killed).

2. When visiting the boxes around a box to collect the relevant atoms, I can avoid the 3rd loop because of my sorting method: I know that all the atoms in boxes along the first coordinate are consecutive. Thus, I can avoid the loop overhead of the for ii loop for atoms, and only need it for waters. (I switched to the counting sort I showed you so that the first coordinate is the most rapidly varying instead of the last.)
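The sorting idea in item 2 can be sketched as follows. This is a minimal illustration, not the pv2 code: it assumes atoms are binned into a uniform grid of cubic boxes, and the names box_index, counting_sort_atoms and atoms_in_x_run are made up for the example.

```python
def box_index(ix, iy, iz, nx, ny):
    """Linear box index with the first coordinate varying fastest."""
    return ix + nx * (iy + ny * iz)


def counting_sort_atoms(atoms, spacing, nx, ny, nz):
    """Sort atoms by box index with a counting sort.

    atoms: list of (x, y, z, radius), coordinates inside the grid.
    Returns (sorted_atoms, start); the atoms of box b are
    sorted_atoms[start[b]:start[b + 1]].
    """
    nboxes = nx * ny * nz
    keys = [
        box_index(int(x // spacing), int(y // spacing), int(z // spacing), nx, ny)
        for x, y, z, _ in atoms
    ]
    # Histogram of atoms per box, then prefix sums give each box's start.
    start = [0] * (nboxes + 1)
    for k in keys:
        start[k + 1] += 1
    for b in range(nboxes):
        start[b + 1] += start[b]
    # Scatter each atom into the next free slot of its box.
    fill = start[:-1]
    sorted_atoms = [None] * len(atoms)
    for atom, k in zip(atoms, keys):
        sorted_atoms[fill[k]] = atom
        fill[k] += 1
    return sorted_atoms, start


def atoms_in_x_run(sorted_atoms, start, ix_lo, ix_hi, iy, iz, nx, ny):
    """Atoms of boxes (ix_lo..ix_hi, iy, iz) as one contiguous slice."""
    first = box_index(ix_lo, iy, iz, nx, ny)
    last = box_index(ix_hi, iy, iz, nx, ny)
    return sorted_atoms[start[first]:start[last + 1]]
```

Because neighboring boxes along the first coordinate have consecutive indices, the innermost of the three neighbor loops collapses into a single slice, which is the saving described in item 2.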

I did some runs with different void spacings and still get nice separation on the number of void points again:

                     avg # void points over #runs
vs (void spacing)    native       redesign     #runs
2                    12359.3      13403.7      1000
3                    37742.7      41113.7      1000
4                    85807.6      93664.4      1000
5                    163753.7     178825.6     100
6                    276417.5     302130.5     100

When I tried to convert to void volume, I found a phenomenon that has been called fractal dimension, which I wasn't expecting. Going from i = 2..6 void points per water radius, the volume associated with each void point should be waterrad^3/i^3, so the number of void points should grow as i^3, because I'm not changing the volume, just how it is sampled and counted. These numbers grow more like i^2.85.
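One way to check the growth rate is a least-squares fit on a log-log scale, using the averages from the runs above. This is a minimal sketch, not the analysis that was actually run: the function name loglog_slope is made up here, and the fit weights every spacing equally even though the run counts differ.

```python
import math

# Average void point counts from the runs above, indexed by void spacing i.
spacings = [2, 3, 4, 5, 6]
native = [12359.3, 37742.7, 85807.6, 163753.7, 276417.5]
redesign = [13403.7, 41113.7, 93664.4, 178825.6, 302130.5]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    If y grows like x**p, the slope estimates p.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


native_exponent = loglog_slope(spacings, native)
redesign_exponent = loglog_slope(spacings, redesign)
print(native_exponent, redesign_exponent)
```

An exponent of exactly 3 would mean the count scales with the sampling density alone; a fitted slope below 3 is the effect described in the text.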

I haven't tried other water_spacing values. I'd like to, although that slows things down a lot. A natural water_spacing would be about 0.8 A, because then the diameter of a box is sqrt(3*0.8^2) < 1.4 A, and if any water remains around a box then we would not have to check it. (This means changing && to || in the water_remaining3 if statement.) There might be other things we could do to tune for smaller water_spacing values.
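The box-size argument can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 1.4 A figure is the water radius and that boxes are cubes of side water_spacing, and the function names are made up for illustration.

```python
import math

WATER_RADIUS = 1.4  # A; assumed to be the water radius behind the 1.4 A bound


def box_diagonal(water_spacing):
    """Longest distance between two points of a cubic box of this side."""
    return math.sqrt(3.0 * water_spacing ** 2)


def largest_safe_spacing(water_radius=WATER_RADIUS):
    """Largest cube side whose diagonal still fits within one water radius."""
    return water_radius / math.sqrt(3.0)


print(box_diagonal(0.8), largest_safe_spacing())
```

Under these assumptions the limit is a little above 0.8 A, so 0.8 A sits just inside it with very little slack.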

Other unimplemented ideas:

1. Visit the boxes in a spiral outward, so that the atoms most likely to kill voids are found as early as possible. Since with voidspacing = 3 visiting the boxes is slower than killing voids, this would not be a win right now.

2. Sweep, so that we can use a 2d array (sort the 3rd coordinate).

3. Kill whole segments of void points at a time by finding where lines pierce spheres.
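The last idea, killing a whole segment of void points where a grid line pierces a sphere, could look like the sketch below. It assumes void points sit on lines parallel to the first coordinate at positions x0 + k*h; the function name and parameters are hypothetical and nothing like this exists in the current code.

```python
import math


def killed_index_range(x0, h, n, y, z, center, radius):
    """Indices of void points on one grid line that lie inside a sphere.

    The line holds points (x0 + k*h, y, z) for k = 0..n-1.
    Returns (k_lo, k_hi), inclusive, or None if the sphere kills none.
    """
    cx, cy, cz = center
    # Squared distance from the sphere center to the line.
    d2 = (y - cy) ** 2 + (z - cz) ** 2
    if d2 > radius * radius:
        return None  # the line misses the sphere
    # Half-length of the chord that the sphere cuts out of the line.
    half = math.sqrt(radius * radius - d2)
    k_lo = max(0, math.ceil((cx - half - x0) / h))
    k_hi = min(n - 1, math.floor((cx + half - x0) / h))
    if k_lo > k_hi:
        return None  # the chord falls between or beyond the grid points
    return k_lo, k_hi
```

Each atom then costs one square root per grid line it touches instead of one distance test per void point, which matters more as the void spacing gets finer.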

These unimplemented ideas are not so applicable to the next task, which is to be able to do rotamer replacement.

For rotamer replacement, we need to go back to the type of ideas you were using -- when you add a bunch of atoms, you kill off waters, and then want to consider a list of potential voids that are uncovered by the waters and not covered by the bunch of atoms you added. With each rotamer we could have lists of these potential waters and voids. We also need to be able to take a rotamer out...

By a rotamer, I mean a small set of (4-20) atoms given in global (even water grid) coordinates that are associated with one sidechain of one amino acid. Given 1000-10K rotamers, grouped into 20-100 rotamers at each of 20-100 positions on the protein, if we select rotamer B to replace existing rotamer A, what is the net increase/decrease in void points?
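One possible way to answer that question is reference counting, sketched below. This is an assumption-laden illustration rather than a settled design: it assumes a void point counts as void when neither a placed rotamer nor a surviving water covers it, that a water survives while no placed rotamer kills it, and that each rotamer carries precomputed lists of the waters it kills and the void points it covers. The class VoidCounter and the dictionary keys "waters" and "voids" are made up for the example.

```python
class VoidCounter:
    """Track the number of void points as rotamers are added and removed."""

    def __init__(self, num_voids, water_voids):
        # water_voids[w] lists the void point ids covered by water w.
        self.water_voids = water_voids
        # Number of placed rotamers that kill each water.
        self.water_kills = [0] * len(water_voids)
        # Number of placed rotamers plus live waters covering each void point.
        self.cover = [0] * num_voids
        for voids in water_voids:
            for v in voids:
                self.cover[v] += 1
        self.void_count = sum(1 for c in self.cover if c == 0)

    def _bump(self, v, delta):
        was_void = self.cover[v] == 0
        self.cover[v] += delta
        is_void = self.cover[v] == 0
        self.void_count += int(is_void) - int(was_void)

    def add(self, rot):
        for v in rot["voids"]:
            self._bump(v, +1)
        for w in rot["waters"]:
            self.water_kills[w] += 1
            if self.water_kills[w] == 1:  # this water was just killed
                for v in self.water_voids[w]:
                    self._bump(v, -1)

    def remove(self, rot):
        for w in rot["waters"]:
            self.water_kills[w] -= 1
            if self.water_kills[w] == 0:  # this water comes back
                for v in self.water_voids[w]:
                    self._bump(v, +1)
        for v in rot["voids"]:
            self._bump(v, -1)

    def swap_delta(self, old, new):
        """Replace rotamer old by new; return the net change in void points."""
        before = self.void_count
        self.remove(old)
        self.add(new)
        return self.void_count - before
```

The cost of a swap is proportional to the lengths of the two rotamers' lists plus the void lists of any waters that change state, so the list sizes drive the running time.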

You may do whatever preprocessing you wish on individual rotamers, including making lists of the waters and voids that they affect. We can now measure how large we can expect these lists to be for different choices of water and void grid spacings.

We'll eventually want to do this in C++, but for now, pseudocode and back-of-the-envelope estimates will do.
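A back-of-the-envelope bound on the list sizes follows from dividing sphere volumes by grid cell volumes. The sketch below is only that: the function names are made up, overlap between atoms is ignored, and it assumes an atom kills waters within atom_radius + water_radius of its center and covers void points within atom_radius.

```python
import math


def points_in_sphere(radius, spacing):
    """Expected number of grid points inside a sphere: volume / cell volume."""
    return (4.0 / 3.0) * math.pi * radius ** 3 / spacing ** 3


def rotamer_list_bounds(n_atoms, atom_radius, water_radius,
                        water_spacing, voids_per_radius):
    """Upper bounds on the water and void list lengths for one rotamer.

    Overlap between atoms is ignored, so real lists should be shorter.
    """
    void_spacing = water_radius / voids_per_radius
    # Assumption: waters within atom_radius + water_radius are killed.
    waters = n_atoms * points_in_sphere(atom_radius + water_radius, water_spacing)
    # Assumption: void points within atom_radius are covered.
    voids = n_atoms * points_in_sphere(atom_radius, void_spacing)
    return waters, voids
```

Doubling the number of void points per water radius multiplies the void bound by 8 and leaves the water bound unchanged, which gives a quick feel for how the two spacings trade off.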

-- JackSnoeyink - 23 Nov 2005

Attachment	Size	Date	Who	Comment
protvoid.zip	65.6 K	23 Nov 2005 - 16:52	JackSnoeyink	23 Nov version of code, which generated test graphs for 1bbz

Revision: r1.1 - 23 Nov 2005 - 17:39 - Main.guest
