Skip to topic | Skip to bottom
Home
Main
Main.XueyiProgressr1.1 - 22 Mar 2005 - 04:05 - Main.guesttopic end

Start of topic | Skip to actions
Things need to do
  • RNASC -- RNA Structure Correction. A C++ program for correcting bad clashes in existing RNA structure.
  • ColCheck? -- Collision Check. Comparing the performances of a bunch of collision check algorithms using real proteins for simulation.
  • Derive Ramachandran-like plots for RNA residue/suite.

Weekly Reports

Mar. 21, 2005

I finished a book "Molecular Modeling" (by four authors in Europe) during the spring break and found a method for clustering the 6/7 dimension dihedral data of RNA residue/suite. Instead of calculating the isosurfaces on these data, maybe a good way is to calculate the energy for each conformation in the 6/7 dimension data and search for the local minima. These local minima should be good candidates of clustered conformations.

From computer science perspective, calculating energy is trival as long as energy functions are correct, but finding the local minima in 6/7 dimension data and further calculating the isosurfaces (must be projected to 3/4 dimensions?) by the energy level may be hard. I think Ramachandran-like plots represented by energy isosurfaces should be more convincing to the biologists.

Mar. 02, 2005

Finished two collision checking algorithms -- grid algorithm (#1) and its improvement (#2) (only update the rotated chain -- in one rotation per step, 3 atoms per residue, 100,000 steps and [-10, 10] per rotation.

#1, SEED = 1.

  residues clash runtime per step (ms) check pairs per step
1A92 50 3 0.2014 851
1CTF 68 3.039 0.271 1121
1O16 154 4.961 0.64132 2952
1SBP 309 10.631 1.28444 5549
1QRR 390 5.762 1.63581 7151
1LOX 647 7.936 2.75856 12106
1JB0 740 7.517 3.29283 15440

#2, SEED = 1.

  residues clash % runtime per step check pairs per step
1A92 50 3.0 0.11465 29
1CTF 68 3.039 0.15682 35
1O16 154 4.961 0.3493 64
1SBP 309 10.631 0.66927 91
1QRR 390 5.617 0.87015 113
1LOX 647 7.936 1.45691 102
1JB0 740 7.517 1.67429 126

#1, SEED = NULL, reflesh every 50 steps.

  residues clash % runtime per step check pairs per step
1A92 50 59.767 0.18987 789
1CTF 68 60.023 0.25247 1034
1O16 154 62.105 0.54649 1902
1SBP 309 63.116 1.09537 3666
1QRR 390 44.984 1.47702 5640
1LOX 647 35.439 2.56809 10210
1JB0 740 35.213 3.07102 12381

#2, SEED = NULL, reflesh every 50 steps.

  residues clash % runtime per step check pairs per step
1A92 50 74.131 0.03765 46
1CTF 68 77.017 0.05268 56
1O16 154 82.929 0.09274 52
1SBP 309 85.112 0.23554 110
1QRR 390 81.432 0.31726 94
1LOX 647 73.863 0.60418 85
1JB0 740 73.785 0.65384 86

Some observations:

1. For Algorithm #1. SEED = 1 yields ~5% collisions while SEED = NULL yields > 50% collisions. The atoms will be updated and hashed every step, so the advantage of having many collisions doesn't significantly reduce the processing time (improved ~7%).

2. For Algorithm #2. SEED = 1 yields ~5% collisions while SEED = NULL yields > 50% collisions. In each step, only rotated atoms will be updated, so the advantage of having many collisions significantly reduces the processing time (2.5~3.5 times faster). The similar comparion pairs mean that the comparing time is almost the same (bug???), the only difference is the update time when the rotated step is accepted. Most of operations in this step is copy coodinates and update the grid.

3. When SEED = 1, #2 is about 2 times faster than #1. The significant difference of comparing pairs per step means that about half of the time is spend on the calculating new coordinates and updating grids and coordinates (when the rotation is accepted). It is because most of the rotations are accepted, we need to update the grids and coordinates in #2 in every step (this work is done before checking collisions in #1).

4. According to 1, 2 and 3, it is not surprise that when SEED = NULL, #2 is about 5 times faster than #1. It is because more than half rotations are failed, so we no need to update the grids and coordinates.

5.

Feb. 22, 2005

RNASC -- still working on the rough step, it turns out debugging in C++ is harder than Matlab.

Collision Algorithm Comparision (ColCheck?) -- working on three grid algorithms when considering backbone atoms only (3 atoms per residue).

I realize there is some trickes in the chain-tree algorithm. They use Oriented Bounding Box (OBB), as shown in the figure 7 of Lotan 04. When the bond between F and G is rotated, only the transformation matrices in F and K are changed and only the OBBs in N and O are re-computed. It means they delayed the update of coordinates of atoms in G and H and the update of the OBB in M (OBB in M doesn't need to be re-computed, it just needs a rotation) to the step for checking collisions. So they can achieve the update in O(logn) for one rotation.

The trick here is that, when they decend from the root to leaves for checking bad collisions, they will stop and discard the rotation as soon as they find a bad collision, so the delay of update has some advantages. But considering the practical situations, we can find some interesting findings:

1. When the length of protein backbone is short (= 374 residues in 1HTB, i.e. ~1500 nodes on the backbone), the update time for grid and chain-tree is close, as shown in figure 9. So for longer chain, chain-tree algorithm is much better. But I think most simulated protein should have short backbone, since we still don't have accurate energy function and knowledges to simulate large protein. Note in this simulation, they only consider the atoms on the backbone, 3 atoms per residue, one rotation each step, 0-30 degree for each rotation.

2. They choose randomly from 0-30 degree for each rotation in this paper, but they also choose randomly from 0-12 degree for each rotation (calculating energy instead of collision) in Lotan 03 and they claim that the rejection ratio is 60% (bad collision) for one rotation each step in 1CTF, which contains only 68 residues. So I assume the rejection ratio should be much higher in the simulations done in this paper. Question: What's the normal/acceptable rejection ratio in actual simulation?

3. In figure 9b, chain-tree algorithm is 2-5 times faster than grid algorithm, and we can see that for 1HTB, chain-tree is approximately 3 times faster. Note here the algorithm stops as soon as they find the bad collision. But in figure 10, they try to find all collisions in each step and they use 11 different conformations (gyration radius ranges from 20A to 85A) for simulation. We can see that for one rotation per step, the chain-tree can be 10 times faster than grid. Does it mean the grid algorithm can find collision faster than chain-tree? Or Does it mean grid is faster in compact conformation (native conformation) than in noncompact conformation? I think this figure is a little misleading.

-- XueyiWang - 20 Feb 2005

Feb. 15, 2005

RNASC:

I'm still working on the rough step of adjusting RNA backbone. In rough step for adjusting suite, the brute force algorithm fixes the phosphorus position, samples the dihedrals of O5*, C5*, C4* and O3*, C3* at every 5 (or user defined) degree, and uses some criterions like the maximum and minimum distances of C4*-C1* and C3*-C1*, the maximum and minimum angles of C5*-C4*-C1* and O3*-C3*-C1*. For extended adjustment, say, adjusting atoms within two residues instead of one suite, the adjusted atoms are one times more than before, but the criterions are the same. When rough step finishes, the valid dihedrals of O5*, C5*, C4* and O3*, C3* are kept for refining.

I added some more flags in command line, but I still didn't figure out the better way to do clustering.

-ROUGHSTEP#   Span of degrees in rough step.
-REFINESTEP#   Span of degrees in refine step.
     
-FIRSTPHOSPHORUS [#.# #.# #.#]
    Adjust the positions of first phosphorus. In suite, it is
    the only phosphorus can be adjusted.
    In suite, the x axis is defined as from C1* in first base
    to C1* in second base. When defining triangle
    P-C1*(1)-C1*(2) in counter clockwise, z axis points out of
    the triangle and perpendicular to it.
-SECONDPHOSPHORUS [#.# #.# #.#]
    Adjust the positions of second phosphorus.
-FIRSTBASE [#.# #.# #.#]
    Adjust the positions of first base. In residue, it is
    the only base can be adjusted.
    In residue, the x axis is defined as from first phosphorus
    to second phosphorus. When defining triangle P(1)-C1*-P(2)
    in counter clockwise, z axis points out of the triangle
    and is perpendicular to it.
-SECONDBASE [#.# #.# #.#]
    Adjust the positions of second base.

Collision algorithms -- I'm not sure I can finish all of these:

Experiment contents:

1. Use real PDB files. Remove the atoms on the side chains except Cb.

2. Randomly generate the rotation position and degree of dihedrals.

3. Compare performance of different algorithms in two situations: a. rotating 1 dihedral in each step; b. rotating 3 dihedrals in each step; c. each rotate chooses from [-10,10] randomly.

4. Another situations: a. only atoms on the main chain are considered. b. atoms on the backbone and the cb atoms are considered.

5. 100,000 simulation steps each experiment.

Different algorithms to be compared:

1. General grid algorithm: resampling all atoms and re-checking collisions in each step.

2. Improved grid algorithm by updating rotated atoms only and checking collisions with un-rotated atoms.

3. Further improvement on grid algorithm by setting the origin to the center of protein strand instead of one end. This one is trival but I'd like to see the performance.

4. Sweep-plane algorithm. It seems that the reason why the general grid algorithm is much more slower than chain-tree algorithm seems lies on the too many un-needed interacting pairs in grid algorithm. From the comparison in Lotan's paper, 20 times more comparison causes 10 times slower in running time. The improvement in 2 should reduce much more comparison, but because the span of grid should equal to the largest bad-collision threshold in all atom pairs, there are still lots of un-needed pair comparison. So I'm thinking of using a sweep-plane algorithm to improve the performance. First split the protein backbone into a series of strands, as shown in the figure. Each strand contains an atom with local maximum z coordinate in the middle and two atoms with local minimum z coordinates in the two ends. After each rotation, we reconstruct the rotated protein structure into strands. Then we sweep down the protein structure, compare the rotated and un-rotated protein strands. In the sweep plane, we can also use a sweep line to detect possible pair interactions. It seems that this method reduces the pair comparisons (I assume the time complexity is close to O(nlogn)??), but in practical, I'm not quite sure if the overhead will eat up the saved comparison time.

5. chain-tree algorithm by Itay Lotan.

6. Modified chain-tree algorithm: using Axis Aligned Bounding Volume instead of Oriented Bounding Volume and calculating overlapped volume. If at a certain level, bounding boxes A and B have overlapped volume V, then we walk down the tree one level, get the bounding boxes C & D under A and E & F under B. For the algorithm in 5, we need to check four pairs at most, C&E, C&F, D&E and D&F. But now if C has no overlaps with V, we can reduce the pairs to two: D&E and D&F. Although AABB has larger volume than OBB, I think we can improve the performance by reducing the number of pairs and simplifying the overlap check. Question: does this method reduce the time complexity, which is O(n4/3). I still didn't figure it out.

-- XueyiWang - 14 Feb 2005

Feb. 08, 2005

I decided to use the CCP4 Coordinate Library and Clipper Library for reading PDB files, constructing RNA structure, and manipulating X-ray crystallography data.

I finished the command line parts, defined as follows:

RNASC [-Flags [value]] filename

Flags:

-RESNUM#   Number of residue to be adjusted (default=first residue)
    For clash in suite, it's the residue of the phosphate.
-CHAIN#   The chain ID (default = the first chain)
-MODEL#   The model ID (default = the first model)
-PUCKER#[-#]   Specific pucker types of first and second sugar (default=1))
    For clash in residue, there is only one sugar pucker.)
    0 - determined by the perpendicular distance from phosphate)
    to first base.)
    1 - determined by the perpendicular distance from phosphate)
    to bond N1/9 - C1*)
    2,3 - C2*endo and C3*endo.
-RESIDUECLASH   Clash in residue.
     
-PARAMETER# [file]   Specify the set of bond lengths and angles (default=1))
    1 -- canonical paramters)
    2 -- bond lengths and angles used in pdb file)
    4 -- the average 1 and 2)
    8 -- user defined parameters (unavailable)
    Different sets can be used at the same time, e.g. 7 means
    calculating the structure three times, each time use one
    set of parameters from 1, 2 and 4.
-SIG#   The allowable range of the bond angles in sugar puckers,
    designated by the times of the sigma values of each bond
    angle (default=3).
-SHORTEXTENSION   Adjust the atoms in one suite/residue range, if this item
    is not specified, adjusting will be in two residues/suites
    range.
     
-ENERGYFUNC#.#   Energy function used for checking bad clashes in pairwise
    (default=0.4). If desginated by number, e.g. default=0.4
    means the function is vdwi+vdwj-0.4. Otherwise, new energy
    function must be specified (unavailable)
     
-CONFORMNUM#   The number of comformations to be output (default=100).
-NOMEASUREDIHEDRAL   Don't measure dihedrals for each conformation.
-NOMEASURECONFORM   Don't determine conformation type for each conformation.
     
-Help   all command line parameters.

-- XueyiWang - 07 Feb 2005

Other options

You might also want options to cluster results, so that nearby solutions (maybe measured by RMSD of coordinates or of parameters) are considered the same and -CONFORMNUM counts distinct solutions.

-- JackSnoeyink - 08 Feb 2005
to top


You are here: Main > ProgressReports > XueyiProgress

to top

Copyright © 1999-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback