A protocol for fitting the weights of the energy function
Documentation for OptE
INTRODUCTION
OptE is a protocol for optimizing the weights of the Mini Rosetta score function, developed with significant contributions from Jim Havranek and Andrew Leaver-Fay. Given a set of score terms and their initial weights, the goal is to adjust the weights to optimize some objective, such as recovering the native amino acid sequence (or the native rotamers) given the native backbone conformation of a protein. This document is a short introduction to the OptE system, with some pointers on how to get started using it.
CONCEPTS
Sequence recovery: The percentage of positions where the designed amino acid matches the amino acid of the native protein.
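Particle Swarm Optimization: A population-based optimization algorithm, similar in spirit to a genetic algorithm. A collection of candidate weight sets is represented as particles, each with a position and a velocity in weight space. At each time step, each particle's velocity is updated from its current velocity, the best position that particle has found so far, and the best position found by the whole swarm; the particle then moves by its new velocity.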
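Reference energies: Per-amino-acid-type energy offsets added to the score of each residue of that type, approximating the energy of that amino acid in the unfolded state. OptE fits them along with the weights.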
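Sequence recovery, as defined above, is simple arithmetic over aligned sequences. The following is a minimal sketch, assuming both sequences are one-letter amino acid strings of the same length; the function name is hypothetical and not part of OptE.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Percentage of positions where the designed residue matches the native one."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return 100.0 * matches / len(native)


print(sequence_recovery("MKVLAG", "MKILAG"))  # 5 of 6 positions match
```

Recovery over a benchmark set of several proteins would normally pool positions across all of them rather than averaging per-protein percentages.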
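To make the particle swarm description concrete, here is a minimal, generic PSO sketch that minimizes an arbitrary objective over a weight vector. It is not OptE's implementation; the parameter names and default values (inertia 0.7, both attraction coefficients 1.5) are common textbook choices assumed for illustration.

```python
import random
from typing import Callable, List


def particle_swarm_minimize(
    objective: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 20,
    n_steps: int = 200,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    bounds: float = 5.0,
    seed: int = 0,
) -> List[float]:
    """Return the best weight vector found by a basic particle swarm."""
    rng = random.Random(seed)
    # Each particle has a position (a candidate weight set) and a velocity.
    pos = [[rng.uniform(-bounds, bounds) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Best position seen by each particle, and by the whole swarm.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: best_val[k])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_steps):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Keep some momentum, then pull toward the particle's own best
                # position and toward the swarm's best position.
                vel[k][d] = (inertia * vel[k][d]
                             + c_personal * r1 * (best_pos[k][d] - pos[k][d])
                             + c_global * r2 * (global_pos[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            val = objective(pos[k])
            if val < best_val[k]:
                best_pos[k], best_val[k] = pos[k][:], val
                if val < global_val:
                    global_pos, global_val = pos[k][:], val
    return global_pos


# Toy objective with its minimum at weights (1, -2).
w = particle_swarm_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, dim=2)
```

Because PSO only needs objective values, not gradients, it can search a weight space whose objective is noisy or piecewise constant, which is one reason to run it before a gradient-based minimizer.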
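The role of reference energies can be illustrated with a minimal sketch, assuming a residue's score is the weighted sum of its score-term energies plus a reference energy looked up by amino acid type. The term names are Rosetta-style labels, but all weights and energies below are hypothetical numbers chosen only for illustration.

```python
from typing import Dict


def residue_energy(term_energies: Dict[str, float],
                   weights: Dict[str, float],
                   aa: str,
                   ref_energies: Dict[str, float]) -> float:
    """Weighted sum of score-term energies plus the reference energy for `aa`."""
    weighted = sum(weights[term] * e for term, e in term_energies.items())
    return weighted + ref_energies[aa]


# Hypothetical weights, energies, and reference energies (illustration only).
weights = {"fa_atr": 0.8, "fa_rep": 0.44, "hbond_sc": 1.1}
ref = {"ALA": 0.16, "TRP": 1.2}
e = residue_energy({"fa_atr": -3.0, "fa_rep": 0.5, "hbond_sc": -0.4}, weights, "TRP", ref)
```

Since the reference energy depends only on the amino acid type, raising or lowering it shifts how often the designer picks that amino acid, which is why it is fitted together with the term weights.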
USAGE
Here is how we (the Kuhlman Lab) have gotten OptE to work.
Set up the directory structure as follows.
/mini_optE
  /mini (checked out from svn.rosettacommons.org/source/trunk/mini)
  /minirosetta_database (checked out from svn.rosettacommons.org/source/trunk/minirosetta_database)
  /optE_runs
    /001.pnataa (for each run, make a copy of the entire directory; e.g. 002.pnataa is the next run)
      /weightdir
        optE_scorefile_1.wts
        ...
        optE_scorefile_10.wts (final weights)
        sensitivity_1.dat
        ...
        sensitivity_10.dat
      /workdir_0
      ...
      /workdir_9
      /logdir
        minimization_dat_1.dat
        ...
        minimization_dat_10.dat
      fixed_wts.txt (weights file format; specifies which score terms are fixed)
      free_wts.txt (weights file format; specifies which score terms are free)
      log (redirect output here)
      command (put the execution script here; note: this depends on how the server is set up!)
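Because OptE is sensitive to missing directories, it can help to create a run's skeleton with a script rather than by hand. The following is a minimal sketch using only the Python standard library. The helper name `make_run_dir` is hypothetical, and the assumption that weightdir, logdir and workdir_0 through workdir_9 sit directly inside the run directory is inferred from the listing. The fixed/free weight files and the command script still have to be written by hand.

```python
import tempfile
from pathlib import Path


def make_run_dir(runs_root: Path, run_name: str, n_workdirs: int = 10) -> Path:
    """Create weightdir, logdir and workdir_0..workdir_{n-1} under runs_root/run_name."""
    run = runs_root / run_name
    subdirs = ["weightdir", "logdir"] + [f"workdir_{i}" for i in range(n_workdirs)]
    for sub in subdirs:
        # parents=True also creates runs_root and the run directory itself.
        (run / sub).mkdir(parents=True, exist_ok=True)
    return run


# Example: build the skeleton inside a throwaway directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    run = make_run_dir(Path(tmp) / "optE_runs", "001.pnataa")
    print(sorted(p.name for p in run.iterdir()))
```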
Set up the command script for the server that the OptE jobs will run on. Here is an example of the command file for the Bass cluster in the Computer Science Department at UNC, which runs Grid Engine.
It is common to change fixed_wts.txt and free_wts.txt between runs.
Note that this directory structure is currently fragile: OptE may not work correctly if any of the directories is missing.
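Starting a new run means copying the previous run directory (001.pnataa becomes 002.pnataa) and then editing the weight files. The following is a minimal sketch of that step which first checks that the expected subdirectories exist, since a missing one can break OptE. The helper names are hypothetical, and the list of required directories (weightdir, logdir, workdir_0 through workdir_9) is an assumption based on the example layout.

```python
import shutil
from pathlib import Path
from typing import List


def next_run_name(run_name: str) -> str:
    """'001.pnataa' -> '002.pnataa', keeping the zero padding of the number."""
    number, _, suffix = run_name.partition(".")
    return f"{int(number) + 1:0{len(number)}d}.{suffix}"


def missing_dirs(run: Path, n_workdirs: int = 10) -> List[str]:
    """Names of expected subdirectories that are absent from `run`."""
    names = ["weightdir", "logdir"] + [f"workdir_{i}" for i in range(n_workdirs)]
    return [n for n in names if not (run / n).is_dir()]


def copy_run(run: Path) -> Path:
    """Copy the whole run directory to the next run number after checking its layout."""
    missing = missing_dirs(run)
    if missing:
        raise FileNotFoundError(f"{run} is missing: {', '.join(missing)}")
    new_run = run.with_name(next_run_name(run.name))
    shutil.copytree(run, new_run)  # raises FileExistsError if new_run already exists
    return new_run
```

Copying the entire directory carries over the previous run's fixed_wts.txt and free_wts.txt, which are then edited for the new run.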
ALGORITHM
The OptE code is currently in the mini/src/protocols/optimize_weights directory.
OptE process (IterativeOptEDriver):
  divide up the pdbs among processes
  outer loop 10 times:
    collect rotamer energies
    optimize weights
    inner loop 6 times:
      write new score file
      test sequence recovery
      break if sequence recovery improved
collect_rotamer_energies(...):
  compute rotamer energies for the assigned pdbs
  compute rotamers around ligands
  collect decoy discrimination, ligand discrimination, dG of binding and ddG of mutation data
optimize_weights(...):
  if the optE::optimize_starting_free_weights option is set, run particle swarm optimization on the weights
  run optimization::Minimizer on the weights
write_new_score_file(...):
  // mix the old weights with the weights found after minimizing them
  define mixing_factor_:
    let o be the outer loop counter and i be the inner loop counter
    if o == 1 -> mixing_factor_ = 1
    else if i <= 5 -> mixing_factor_ = 1/(o+i)
    else -> mixing_factor_ = 1
  weights = (1 - mixing_factor_)*old_weights + mixing_factor_*new_weights
  ref_energies = (1 - mixing_factor_)*old_ref_energies + mixing_factor_*new_ref_energies
test_sequence_recovery(...):
  design the pdbs with the newly weighted score function -> get the sequence recovery rate
  repack the pdbs with the newly weighted score function -> get the rotamer recovery rate
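The control flow of the driver pseudocode can be sketched in Python with the four steps passed in as callables. This is a minimal sketch, not the C++ IterativeOptEDriver. The function names are hypothetical, and two details the pseudocode leaves open are assumptions: "improved" is taken to mean better than the best recovery seen so far, and the pdbs are divided among processes round-robin.

```python
from typing import Callable, List, Sequence


def divide_pdbs(pdbs: Sequence[str], n_procs: int) -> List[List[str]]:
    """Round-robin assignment of pdbs to processes (assumed scheme)."""
    return [list(pdbs[rank::n_procs]) for rank in range(n_procs)]


def iterative_opte(
    collect_rotamer_energies: Callable[[], None],
    optimize_weights: Callable[[], None],
    write_new_score_file: Callable[[int, int], None],
    test_sequence_recovery: Callable[[], float],
    n_outer: int = 10,
    n_inner: int = 6,
) -> float:
    """Run the outer/inner loops and return the best sequence recovery seen."""
    best_recovery = float("-inf")
    for outer in range(1, n_outer + 1):
        collect_rotamer_energies()
        optimize_weights()
        for inner in range(1, n_inner + 1):
            # Mixes old and new weights using (outer, inner), then re-tests them.
            write_new_score_file(outer, inner)
            recovery = test_sequence_recovery()
            if recovery > best_recovery:
                best_recovery = recovery
                break
    return best_recovery
```

With counters starting at 1, the first outer iteration always "improves" on the initial value, so its first score file is kept, which is consistent with the mixing factor being 1 when o == 1.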
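Collecting rotamer energies once per outer iteration lets the weight optimizer score many candidate weight sets cheaply: each rotamer's weighted energy is just a dot product of the weights with its cached, unweighted term energies. The following is a minimal sketch of that idea. The cache layout and the objective (negative fraction of positions whose lowest-energy amino acid is native) are hypothetical stand-ins chosen for illustration, not OptE's actual data structures or objective function. A piecewise-constant objective like this one suits a derivative-free search such as particle swarm optimization.

```python
from typing import Dict, List, Sequence

Rotamers = List[List[float]]    # each rotamer: unweighted energies, one per score term
Position = Dict[str, Rotamers]  # amino acid -> its candidate rotamers at this position


def aa_energy(rotamers: Rotamers, weights: Sequence[float], ref_energy: float) -> float:
    """Energy of the best rotamer under `weights`, plus the reference energy."""
    return min(sum(w * e for w, e in zip(weights, rot)) for rot in rotamers) + ref_energy


def predicted_sequence(cache: List[Position], weights: Sequence[float],
                       ref: Dict[str, float]) -> List[str]:
    """At each position, pick the amino acid with the lowest weighted energy."""
    return [min(pos, key=lambda aa: aa_energy(pos[aa], weights, ref[aa])) for pos in cache]


def recovery_objective(cache: List[Position], natives: Sequence[str],
                       weights: Sequence[float], ref: Dict[str, float]) -> float:
    """Negative fraction of positions recovered, so that lower is better."""
    pred = predicted_sequence(cache, weights, ref)
    return -sum(p == n for p, n in zip(pred, natives)) / len(natives)
```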
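The mixing step in write_new_score_file is a linear interpolation between the old and the newly minimized weights, and the reference energies are mixed the same way. The following is a minimal sketch that transcribes the mixing-factor rule from the pseudocode; the function names are hypothetical.

```python
from typing import List


def mixing_factor(outer: int, inner: int) -> float:
    """Mixing factor as defined in the write_new_score_file pseudocode."""
    if outer == 1:
        return 1.0
    if inner <= 5:
        return 1.0 / (outer + inner)
    return 1.0


def mix(old: List[float], new: List[float], factor: float) -> List[float]:
    """Element-wise (1 - factor) * old + factor * new."""
    return [(1.0 - factor) * o + factor * n for o, n in zip(old, new)]


weights = mix([0.0, 2.0], [1.0, 4.0], mixing_factor(2, 2))
```

Assuming both counters start at 1, in outer iteration 2 the inner loop tries factors 1/3, 1/4, 1/5, 1/6 and 1/7, so each retry after a failed recovery test moves less far from the old weights, until the sixth try takes the new weights outright.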
Tasks to clean up in OptE:
- make the folder names not hard-coded (or at least SUPER clear)
- remove the return value of measure_sequence_recovery
- make the centroid-mode code an option, not a hack
--
MatthewOmeara - 04 Feb 2009