MANUAL

Chapter 3: The Phaser Elves


Introduction

The Phaser Elves run mlphare and dm. They repeatedly run mlphare until all the atomic parameters stop moving (or start oscillating). Any atoms that refine to insignificant occupancy or a ridiculous B-factor are removed, and the refinement is repeated without them. Once this rejection-refinement procedure has converged, Phaser Elves apply a simple dm solvent flattening procedure to the mlphare results, and check the real-space residuals from dm as a measure of the correctness of the phasing solution. Alternative occupancy signs, site hands, and space-group choices are explored automatically. New sites can also be found by Phaser Elves using difference Fouriers, and added to the mlphare script.

Finding Heavy-Atom Sites in the First Place

Phaser Elves were designed simply to refine heavy-atom sites that you provide. They can currently run shelx, rantan or rsps to find a set of sites de novo, but work is underway on improving this procedure.

At the moment, I recommend solve as the best, most reliable way to find heavy atom sites, and solve is set up by Scaler Elves, and run by Processer Elves.

Phaser Elves' difference-Fourier/Patterson cross-check procedure for finding heavy atoms has turned out to be quite powerful. So far, in three separate cases, site constellations of 3, 5, and 8 seleniums have been "rolled up" by starting Phaser Elves with just one of the correct sites. More testing needs to be done to confirm the generality of this procedure, but it might actually be a better idea to start Phaser Elves with one good atom, and let the difference Fouriers do the rest.

Convergence Criteria

Phaser Elves consider their mlphare refinement "done" when the output, refined values for x, y, z, occupancy and B-factor are the same as the ones that were input. They also check a history of the last three mlphare runs, and if the same set of values turns up twice, then the mlphare refinement is oscillating, and Phaser Elves will then consider it "done". These are very general convergence criteria, and I have not encountered any crystal structures where these criteria could not be met.

If you are in a hurry, type "hurry" on the Phaser Elves command line. If you do this, Phaser Elves will loosen their convergence criteria to requiring only that the first decimal place be unchanged during the run (instead of the last). This will speed things up quite a bit, but might not give you the most accurate phases.
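
The convergence test described above can be sketched in a few lines of Python. This is only an illustration, not the Elves' own code: the function names, the layout of the history list, and the `decimals` knob (a small value imitates "hurry" mode) are all assumptions made for the example.

```python
def rounded(params, decimals):
    """Round every refined value so that finer digits are ignored."""
    return tuple(tuple(round(v, decimals) for v in site) for site in params)


def refinement_status(history, decimals=3):
    """Classify the newest mlphare run.

    history  -- parameter sets, oldest first, newest last; each set is a
                list of sites, each site is (x, y, z, occupancy, B).
    decimals -- hypothetical precision knob; 1 imitates "hurry" mode.
    """
    if len(history) < 2:
        return "continue"
    runs = [rounded(run, decimals) for run in history]
    newest = runs[-1]
    if newest == runs[-2]:
        return "converged"      # output is the same as what went in
    if newest in runs[-4:-2]:
        return "oscillating"    # same values turned up earlier in the history
    return "continue"
```

For example, `refinement_status([a, b, a])` reports "oscillating" when the parameter set `a` comes back after one intervening run.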

If one of your metal sites refines to a B-factor that would make its contribution to the diffraction pattern fall significantly out of the resolution range of your data (i.e. B<10 for 2A data, or B>500 for a ~10A low-res cutoff), or both the real and anomalous occupancies are near zero, Phaser Elves will consider it a "bad" site, and throw it out. A record of all "bad" sites is kept at the bottom of the mlphare script.
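
As a rough illustration of this rejection rule, the following Python sketch flags a site as "bad". The B-factor limits simply echo the two examples given above; the occupancy tolerance `occ_tol` is a made-up value standing in for "near zero", and the real limits presumably depend on the resolution range of your data.

```python
def is_bad_site(occ_real, occ_anom, b_factor,
                b_min=10.0, b_max=500.0, occ_tol=0.01):
    """Return True if a refined site looks like one to throw out.

    b_min, b_max -- example limits from the text (2 A data, ~10 A cutoff)
    occ_tol      -- hypothetical threshold for "near zero" occupancy
    """
    # B-factor puts the site's contribution outside the resolution range
    if b_factor < b_min or b_factor > b_max:
        return True
    # both the real and the anomalous occupancy have refined to ~zero
    return abs(occ_real) < occ_tol and abs(occ_anom) < occ_tol
```

Note that a site with a near-zero real occupancy but a healthy anomalous occupancy is kept, since the rule requires both to vanish.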

The Reference Data Set

Since mlphare requires one of your data sets to be used as a "reference" (against which all the other data sets are compared), Phaser Elves need to know what this reference data set is. There are several ways to tell them.

The easiest way to specify a reference data set is to mention the name of the data column in the mtz on the Phaser Elves command line. You can also specify which sigma goes with it, but Phaser Elves are pretty good at figuring that out. Alternately, you can just provide Phaser Elves with a script to imitate.

If you don't supply a reference data set, Phaser Elves will evaluate each data set in the provided mtz file, in turn, looking for the one with the largest "isomorphous" differences from the rest. It is assumed that the larger differences will provide the strongest phasing information (generally true for MAD, not always true for MIR). In addition, care is taken to make sure the reference data set has an appreciable anomalous signal, since those anomalous differences will be all that determines the atom positions for that derivative.

The simplest way to override the above procedure (which is a little slow) and specify the reference data set yourself, is to simply mention the name of the data set (as it appears in mtzdmp output) on the Phaser Elves command line. Phaser Elves are usually smart enough to match the reference "F" column with its appropriate sigma, but you can specify this too. Also, it is always possible to provide Phaser Elves with a pre-written mlphare script. If you do, they will adhere to the reference data set provided therein.

Solvent Flattening

Phaser Elves use dm to do solvent flattening. To do this, they write and use a dm.com script which they will deploy into the "scripts" subdirectory. If you know your solvent content, you can give this to Phaser Elves on their command line as a number ending in "%". If you're not exactly sure, you can specify more than one solvent content: "40% 42% 45%", and Phaser Elves will try them all, in turn. If you do not specify a solvent content, Phaser Elves will try every value between 25% and 70%, in 5% increments. The results of each flattening run will be kept in a file called mtz/dm_0.40.mtz (for 40%), but the one that has the best free-solvent residual from dm will be copied to mtz/best_phased.mtz, and/or used to calculate a map.
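
The choice of solvent contents can be pictured with the short Python sketch below. It is an illustration under assumptions, not the Elves' implementation: the function names are invented, and only the "%" convention, the 25% to 70% default scan, and the mtz/dm_0.40.mtz naming are taken from the description above.

```python
def solvent_contents(words):
    """Return the solvent fractions named on a command line.

    Any word ending in "%" is read as a solvent content. If none are
    given, fall back to the default scan: 25% to 70% in 5% steps.
    """
    picked = []
    for word in words:
        if word.endswith("%"):
            try:
                picked.append(float(word[:-1]) / 100.0)
            except ValueError:
                pass  # something like "abc%"; not a solvent content
    if picked:
        return picked
    return [pct / 100.0 for pct in range(25, 75, 5)]


def dm_output_name(fraction):
    """File name used for one flattening run, e.g. mtz/dm_0.40.mtz."""
    return "mtz/dm_%.2f.mtz" % fraction
```

With no "%" words on the command line this yields ten runs, from mtz/dm_0.25.mtz to mtz/dm_0.70.mtz.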

Sign and Hand Ambiguities

Particularly for MAD data, there are several unresolved ambiguities in the correctness of your heavy atom constellation once the atomic parameters have been refined. By default, Phaser Elves will try inverting the hand of your site constellation, and repeat the entire refinement and solvent-flattening procedure. The statistics from mlphare will be almost identical if you do this, but the real-space residual from dm will not. The maps produced by these different hand conventions will look VERY different. At least one of them will be completely uninterpretable. Since dm is sensitive to the "look" of the initial map, it is a good program for resolving this hand ambiguity.

In addition, Phaser Elves will try inverting the sign of the "real" occupancies in mlphare (the ones that measure dispersive differences), because these alternate solutions are also difficult to distinguish from each other on phasing statistics alone. You might think this is silly if you know what your f' and f" values are at each wavelength, but I have rarely encountered a crystal where I was sure the dispersive maximum had been hit exactly. Besides, it doesn't take very long for Phaser Elves to check this.

In handed space groups (such as P43/P41), Phaser Elves will also flip between the two alternative space group assignments, again checking to see which one works best (total of 7 possibilities).

Phaser Elves base their estimation of map quality on the real-space free residual from dm. This number almost always identifies the correctly-phased map. If mapman or "bones" are available, Phaser Elves will try a bones trace, and decide on the quality of the map by the length of the bones. Long, traceable segments are a very good hallmark of a correctly phased map. Nevertheless, all of these maps are kept on disk, for you to evaluate visually if you want to. An o macro called o/allmaps.omac will load each map, in turn. If you don't want Phaser Elves to try all these alternatives, just put the words "no flip" on their command line. This can also be qualified: "no flip sg", "no flip sign", and "no flip hand", all work.
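
To make the bookkeeping concrete, here is a small Python sketch that lists the flip trials implied by a command line. It rests on assumptions: the function name and the word-matching logic are invented, and it treats hand, sign and space group as three independent on/off choices. That gives eight combinations counting the unflipped input, or seven alternatives to it, which may be what the count of 7 above refers to. The space-group flip would only apply in handed space groups.

```python
from itertools import product

AXES = ("hand", "sign", "sg")


def flip_trials(command_line):
    """List the flip combinations to try, honoring "no flip" requests."""
    words = command_line.lower().split()
    active = set(AXES)
    for i in range(len(words) - 1):
        if words[i] == "no" and words[i + 1] == "flip":
            qualifier = words[i + 2] if i + 2 < len(words) else ""
            if qualifier in AXES:
                active.discard(qualifier)   # e.g. "no flip sg"
            else:
                active.clear()              # bare "no flip": try nothing
    trials = []
    for flags in product((False, True), repeat=len(AXES)):
        flipped = tuple(axis for axis, on in zip(AXES, flags) if on)
        if set(flipped) <= active:
            trials.append(flipped)
    return trials
```

Each entry names what is flipped relative to the input; the empty tuple is the input itself.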

Finding More Sites

After trying all the above possibilities, the "best" site constellation is used to compute difference Fouriers (from solvent-flattened phases) to look for new metal sites. These metal sites are checked for consistency with the Patterson map (combined from all the difference data), and then recorded on disk in mlphare format. By default, these new sites are added into the mlphare script, and a new round of refinement is begun on all the sites. As with user-input sites, newly added sites that refine to near-zero occupancy or unreasonable B-factors are removed from the mlphare input, and retired to the bottom of the script (which is never executed), as "bad" atoms. Also, each putative new site that has been "tried" is recorded as an "old" atom at the bottom of the mlphare script. The positions of all these "legacy" atoms are avoided in future rounds of atom finding. In this way, Phaser Elves can exhaustively investigate nearly every significant peak in the difference Fourier. This CAN take a very long time, and if you don't want Phaser Elves to do this, simply put the words "no add" on their command line.
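
The idea of steering clear of "legacy" atoms can be sketched as follows in Python. This is a deliberately simplified picture: the function names and the tolerance are assumptions, and positions are compared only modulo whole unit-cell translations, whereas a real implementation would presumably also have to consider symmetry-equivalent positions.

```python
def frac_distance(a, b):
    """Largest per-axis separation of two fractional positions.

    Whole unit-cell translations are ignored; symmetry mates are NOT
    considered in this simplified sketch.
    """
    worst = 0.0
    for u, v in zip(a, b):
        d = abs(u - v) % 1.0
        worst = max(worst, min(d, 1.0 - d))
    return worst


def untried_peaks(peaks, legacy, tol=0.02):
    """Keep only the peaks that are not close to any legacy atom.

    tol is a hypothetical tolerance in fractional coordinates.
    """
    return [p for p in peaks
            if all(frac_distance(p, old) > tol for old in legacy)]
```

Every peak that survives this filter would be added to the script, refined, and then recorded as "old" (or "bad") itself, so the list of untried peaks shrinks with each round.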

Customizing

Although they were developed for phasing MAD data, Phaser Elves are just as happy working with MIR, SIR, SIRAS or SAD. All you have to do is run Phaser Elves on an mtz file containing all your F and Dano data, and then edit the mlphare script they write to make sure they have the right constellation of metal sites listed under the right derivative. Then you can run Phaser Elves again, this time providing the edited script on their command line, and they will proceed to optimize it.

Input File Formats

Phaser Elves are most familiar with multi-column mtz data output from Scaler Elves (the same mtz format used by SHARP), but they will soon also be able to read scalepack output.

Phaser Elves can directly recognize and import metal sites from:

These files need only be listed on the Phaser Elves command line. Phaser Elves will recognize the file format, and read it appropriately. If you do not provide Phaser Elves with metal sites, they will use shelx, rantan, or rsps to try and find some.

Phaser Elves read more than just heavy-atom coordinates from mlphare scripts and log files; they also detect and preserve other mlphare keywords. They can then "pick up" their refinement procedure from where they left off, or optimize a script that you just wrote or modified.

Output Formats

Phaser Elves use mtz files. They keep most of their intermediate files around, to make it easier for you to start working at any particular place in the project. The most recent mlphare output is kept in mtz/mlphare.mtz, the script that produced it is saved as ../scripts/mlphare.com, and the log file it produced is kept in logs/mlphare.log. A running history of the last three mlphare scripts and logs is kept by appending .old, .older, and .oldest to the script or log file name. Phaser Elves use these files to check for the oscillation of parameters.
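
A history of this kind is easy to picture as a chain of renames. The Python sketch below shows one way it could be done; the function name is invented, and the Elves themselves may well copy rather than move the newest file.

```python
import os


def rotate_history(path):
    """Shift path -> path.old -> path.older -> path.oldest.

    The previous .oldest file, if any, is overwritten and so dropped.
    """
    chain = [path, path + ".old", path + ".older", path + ".oldest"]
    # start at the old end of the chain so nothing is overwritten too early
    for newer, older in reversed(list(zip(chain, chain[1:]))):
        if os.path.exists(newer):
            os.replace(newer, older)
```

Calling `rotate_history("logs/mlphare.log")` before each new mlphare run would leave the three most recent logs on disk under the .old, .older and .oldest names.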

The results of the solvent-flattening runs are kept as logs/dm_##.log, where ## is the %-solvent content used in that run. The output mtz file is stored as mtz/dm_##.mtz. The mtz file that Phaser Elves consider the "best" solution is always copied to mtz/best_phased.mtz.

Map files are stored in the maps/ subdirectory, and are either called "maps/phased.map" for the most-recently calculated map, or "maps/best_phased.map" for the map Phaser Elves believe is the best solution they have found so far.

The results of sign/hand/occupancy flipping are saved under similar names, but with a string like "flip_+hand_-occ" added to the filename. The "+" and "-" signs in this string are always relative to the input file, and are not necessarily the absolute signs.

In a directory called "o", many files are created to assist in quickly evaluating Phaser Elves' progress in o:

  • o/sites.pdb - is a PDB version of the sites in ../scripts/mlphare.com
  • o/phased.omap - is the "o" version of maps/phased.map
  • o/bones.o - is a bones trace of o/phased.omap
  • o/pick.pdb - is a peak-pick of o/phased.omap (in case bones isn't around)
  • o/latest.omac - is an o macro for loading all the above files
  • o/latest - is a "mini" o macro for loading just the map
  • o/best_* - is the prefix given to copies of all the above files that were created from mtz/best_phased.mtz.
  • o/best.omac - is an o macro for loading the "o/best_*" files
  • o/map - mini-macro for loading best_phased.omap.
  • o/allmaps.omac - is an o macro for loading all the o maps (from each "flip" trial), one-by-one, into o.

    Phaser Elves HOWTO

    So, how do you use Phaser Elves? The simplest way is just to try it. Run the script with a sentence about what you want Phaser Elves to do on their command line:

    unix% Phaser Elves, phase the data in mtz/all.mtz with the sites in SOLVE/solve.status in P22121 with 40% 50% solvent

    will write an mlphare script using the last site constellation found in the solve output file, re-index mtz/all.mtz to P21212 (with the cell permuted so that the old "a" axis becomes "c"), and then start running iterative mlphare, followed by solvent-flattening at 40% and 50% solvent in dm.

    Alternately, if you already have an mlphare script (perhaps edited from the above run), you can "pick up" an interrupted Phaser Elves job by just mentioning the starting script:

    unix% Phaser ./start_mlphare.com

    and Phaser Elves will begin by imitating this script, and then trying all the solvent-flattening and hand-flipping stuff described above.


    Scripts written by Phaser Elves

    Like all Elves, Phaser Elves work by writing and editing shell scripts for CCP4 programs. These scripts have been designed to be easy to read and edit, but also contain a "smart setup" section to make them as flexible as possible in their unmodified form. Some examples of these scripts are shown here, with a brief description of how to run them:

    mlphare.com
    is the template mlphare script written by Phaser Elves. Phaser Elves will have set up mlphare.com with some metal sites in a proper number of DERIV entries, and reasonable EXCLUDE cards for eliminating unusually large difference data. The output file will always be called mtz/mlphare.mtz. Use it like this:

    usage: mlphare.com mtz/all.mtz

    will run mlphare on the data found in mtz/all.mtz. mlphare.com will automatically use an input mtz file supplied on its command line. Other than that, it is relatively brain-dead, and you will probably have to edit it to suit your needs. In fact, the most useful thing you can do with mlphare.com is edit it, and feed it back to Phaser Elves (on their command line) so that they can imitate your changes.

    dm.com
    is the Phaser Elves solvent-flattening script. It can be used to apply solvent flattening to partially-phased mtz files from almost any source. The script will scan its command line for input mtz files (ending in "mtz"), solvent contents (ending in "%") and dataset names (something that matches a column title in the mtz file). If you don't specify a preferred data set, the script automatically selects a column of Fs from the mtz file you provide based on its overall resolution, completeness, and signal/noise. This script can be applied to the output of mlphare.com (above), but can also be applied directly to the mtz file created by solve.

    usage: dm.com mtz/mlphare.mtz 50% Fpp

    will run dm on the data set called Fpp in mtz/mlphare.mtz for an automatic number of cycles with 50% solvent. If you include an integer on the dm.com command line, it will be taken as a number of cycles to run (instead of the internally-determined automatic number). If you include the word "omit" on the dm.com command line, dm will be run in "combine omit" mode instead of "PERT" or "FREE" (depending on the version). The output file will be called: dm.mtz

    fft.com
    is the Phaser Elves map calculation script. Its input is quite similar to that of the dm.com script, in that it scans its command line for an F and a phase to use to make the map, and defaults to the F with the best resolution, completeness, or signal/noise, and the most recently added phase set. After calculating the map, fft.com also converts it to o format (dsn6), performs a basic bones trace of the map, and writes a short o macro called map.omac for reading in the map.

    bestFH.com
    applies a generalization of the procedure developed by Matthews et al (1965) for estimating the amplitude of the heavy-atom contribution (FH) at each hkl by combining anomalous and isomorphous (or dispersive) differences. It takes a standard, merged CCP4 mtz file, and uses all the differences between all the "F" datasets found therein, along with all the anomalous difference "D" datasets to estimate FH = |FPH-FP|. This is a similar procedure to the CCP4 program revise, except that it requires no keyworded input, and, theoretically, works for all kinds of difference data, not just MAD. The only caveat is that all the difference data provided to bestFH.com should be from metal sites at the same XYZ location, otherwise, you will get an "averaged" FH for all the site constellations.
    usage: bestFH.com alldata.mtz

    will calculate the best estimate of FH from the native and derivative data sets found in alldata.mtz. In my experience, the Pattersons produced by FH are cleaner than simple isomorphous and anomalous difference Pattersons, and direct-methods programs like shelx also work better with FH.

    crosscheck.com
    This script checks heavy-atom sites against a Patterson map. The map should be a CCP4-type Patterson, and the input sites can be in the form of a shelx *.res file, a pdb file, or just any kind of "site" file containing fractional coordinates as a list of three consecutive numbers between 0 and 1 with three or more decimal places. Sites can also be "provided" in the form of an electron density map (usually a difference Fourier of some kind). If such an input map is provided, it can also be converted to a Patterson by the script.

    usage: crosscheck.com shelx.res best_Patt.map
    will evaluate the (usually 20) sites in the shelx.res file to see if they can explain the Harker and (mutual) cross peaks in best_Patt.map. This "Patterson score" is calculated by rsps. Alternately:
    usage: crosscheck.com diff.map

    will pick peaks in diff.map, then back-transform diff.map, square it, and convert it to a Patterson. The list of picked peaks will then be checked against this Patterson, as above. A text file will be produced, containing all the various scores for each site, called crosscheck.list.
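
    The generic "site" file rule mentioned above (three consecutive numbers between 0 and 1, each with three or more decimal places) is easy to express as a pattern. The Python sketch below is one possible reading of that rule, not the script's actual parser; the function name and the exact pattern are assumptions. Negative numbers are skipped, since the rule asks for values between 0 and 1.

```python
import re

# one number between 0 and 1 written with three or more decimal places
FRAC = r"(?<![\w.-])(?:0?\.\d{3,}|1\.0{3,})(?![\w.])"
TRIPLE = re.compile(FRAC + r"\s+" + FRAC + r"\s+" + FRAC)


def read_sites(text):
    """Return the first (x, y, z) fractional triple found on each line."""
    sites = []
    for line in text.splitlines():
        match = TRIPLE.search(line)
        if match:
            sites.append(tuple(float(v) for v in match.group(0).split()))
    return sites
```

    On a line such as "SE1  1  0.1234  0.5678  0.9012  11.0  0.05" this picks out (0.1234, 0.5678, 0.9012) and ignores the remaining columns.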

    reindex.com
    This is a "smart" script for doing simple reindexing between Laue-equivalent space groups. All you need to give it are the mtz file and the new space group. It works on both merged and unmerged data, and the "pseudo" space groups P2122, P2212, etc. are supported (whether or not you have CCP4 4.x). The result of reindexing to a "pseudo" space group will be an mtz with a canonical space group name (P2221 or P21212), but with the cell axes permuted appropriately.

    usage: reindex.com merged.mtz P222
    will change the space group of "merged.mtz" to P222. The output filename will be "reindexed.mtz".

    rrsps.com
    Recursive, Real-Space Patterson Search is basically a more comprehensive extension of the CCP4 rsps program. After an initial "Harker scan", possible sites are each, in turn, checked for cross-scoring new sites. For each of these pairs of sites, another cross-score is computed, and a list of candidate third sites is obtained. This process is repeated, recursively, until no significant (default: 3 sigma) new sites are found. Each constellation of sites is then given a score, which is the product of all the peak heights in the constellation. This list is sorted, and provided to the user for subsequent evaluation. rrsps.com is a genuinely recursive shell program: it actually launches a new instance of itself for each new crossvector search! One might think that an exhaustive recursive search like this would take a really long time, but, for modest numbers of sites (<10) it only takes 15-60 minutes on an SGI Octane workstation.

    usage: rrsps.com patterson.map P212121 P222

    Will search for site constellations consistent with the Patterson in patterson.map, first using P212121, and then P222 symmetry. Each space group is given a separate output file. If you like, more than one Patterson map can also be given, and each will be considered in separate runs. Remember, however, that because this is a Patterson search, inversion-related constellations get the same score.
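
    The scoring and sorting step can be illustrated with a few lines of Python. This only shows the "product of peak heights" idea described above; the function name and the input layout (a list of labelled peak heights per constellation) are assumptions, and the recursive search itself is not reproduced here.

```python
from math import prod


def rank_constellations(constellations):
    """Score each constellation by the product of its peak heights.

    constellations -- list of constellations; each one is a list of
                      (site_label, peak_height) pairs.
    Returns (score, [site labels]) pairs, best score first.
    """
    scored = [(prod(height for _, height in sites),
               [label for label, _ in sites])
              for sites in constellations]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

    Because the score is a product, a constellation with one more significant site will usually outrank its own sub-constellations.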

    rantan.com

    shelx.com


    The Future of Phaser Elves

    1. The site permutation/reindexing logic should be more thoroughly tested
    2. "minus-one" test: eliminate a site, and make sure it comes back as a difference feature
    3. run oasis on SAD data
    4. more intelligent CUI


    This page is not finished. It will never be finished, and neither will yours. Admit it.

    James Holton <jamesh@ucxray.berkeley.edu>