MANUAL

Chapter 3: The Scaler Elves

Introduction

The Scaler Elves run scala. They perform smooth localscaling on raw, x-ray spot intensity data from an arbitrary number of wedges and wavelengths. They are most familiar with data processed with mosflm, but they can also read denzo output. When the scaling is done, they merge data belonging to each wavelength together, and output files that are ready to go into mlphare, solve, shelx, and cns/xplor.

Scaler Elves also set-up scripts to run solve and shelx in each possible alternative space group (ones in the same point group as the space group you give to Scaler).

Advantages of Smooth Scaling

Scala optionally applies a smoothing constraint to the frame scales and B-factors assigned to your data. This is almost always a good idea, but it is not usually available in other scaling programs. Traditional "batch" scaling programs tend to over-fit frame-by-frame scales because they rely on a few fully-recorded spots to determine the scales. This has the effect of amplifying the noise in a few of your measurments, and applying that noize to the rest of your data. A sensible way around this is to apply the "prior knowlege" that the scale (dose) should vary smoothly and slowly across a wedge of data. Of course, problems can arise if you try to smooth over a real discontinuity (beam refills, sub-wedge boundaries, or actual changes in exposure time). Scaler Elves try to find these discontinuities before imposing smooth scales, but you are the only one who really knows how the data collection went, so check.

Localscaling (allowing the scale to vary smoothly over all of reciprocal space) was shown over a decade ago to give cleaner Patterson maps, superior direct-methods performance, and better agreement with model data. This is because most of the systematic error in your data set is the result of the absorption of x-rays by your sample. Unless your sample (crystal, loop, etc.) is perfectly symmetric, different diffraction geometries will represent different pathlengths through the sample, and "symmetry-equivalent" spots will be attenuated by different amounts by x-ray absorption effects. Localscaling attempts to compensate for these absorption effects by changing the scale assigned to each region of reciprocal space. In this way, spots with similar HKL indicies are forced to have similar scale factors. Scaler Elves do localscaling by using scala's detector face scaling option. This allows the scale to vary (smoothly) across the detector face. In this way, localscaling is done in the camera frame, rather than abstracting it into reciprocal space.

The Reference Data Set

The scala manual stresses the importance of the reference data set. Having a reference helps stabilize the scaling procedure, and can be used to minimize the effects of sample absorption. Localscaling can compensate for relative differences in absorption, but, if your crystal shape is symmetric with its own lattice symmetry (and it usualy is), then there will be no way to correct for total absorption effects. For example, a needle crystal with tetragonal symmetry (c along the long axis of the needle), then spots located near the c-axis will be systematically low because their diffracted rays must pass through more crystal before they hit the detector. Some of this effect can be compensated for by using an overall anisotropic B-factor in model refinement, but this is only a first-order correction.

Scaler Elves also allow you to supply a merged mtz file of your own to use as a reference data set. This could be a high-resolution native data set, or, perhaps, a calculated data set from a refined structure. Using the latter would, conceivably, provide a very good way to correct for crystal absorption in scaling, but might bias the scaling toward the refined Fs. To this end, hkls flagged as part of the Free-R set are removed from the reference data set.

If you don't supply a reference data set, Scaler Elves will scale and merge the most complete wavelength, and then use that as the reference set. To minimize absorption effects, you want this to be data collected at a wavelength where the crystal absorbs the least (below the metal's absorption edge).

Output Formats

Because converting between file formats is such a pain for most programs, Scaler Elves ouptut your final, merged data in many file formats. The standard CCP4 multi-column mtz file all.mtz can be input into most programs. Mlphare, SHARP, and recent versions of solve should be able to read it. The data are also output in old solve format (*.fmt files) in both merged and unmerged form. All the difference data (anomalous and isomorphous) are output in shelx format and an estimate of F_H (combining all the difference data) is also output in shelx format.

Free-R flags

Scaler Elves employ a sensible and flexible free-R flag assignment program called FreeRer.com. This script behaves almost identically to the CCP4 uniqueify script, except that it is capable of "inheriting" a free-R flag set from another data file (either mtz or xplor/cns format). If there are Free-R flags in the mtz file used as a reference data set, they will be passed along to Scaler's output files.

Solvent Content

Although not, technically, important for scaling, your solvent content is important to know in the mergeing step. The CCP4 program truncate requires an estimate of a protein's size in order to place your final F data on an absolute scale (1 "F" unit = 1 electron equivalent). If you don't care about this, Scaler Elves will assume you have a V_M of 2.4, and calculate an expected protein size from that. You can, of course, tell Scaler Elves what your V_M or your solvent content is:

unix% Scaler Elves */*/raw.mtz and my crystal is 70% solvent

You can also give Scaler Elves a sequence file (fasta, pdb, etc.), and they can calculate the molecular weight from it. Given a molecular weight, (and, optinally, a V_M), Scaler Elves will calculate the number of monomers that will probably fit in your asymmetric unit.

Metal Site Finding Program Set-up

Scaler Elves also set up data and scripts for solve, shelx, and xplor/cns heavy-atom finding and phasing programs. For this reason, they want to know the number and type of expected metal sites in your protein. You can, of course, explicitly tell Scaler Elves that you have 8 mercury sites, for example. Or, if you give them a sequence file and tell them that you have mercury sites, they will assume that you have one mercury bound to each cystiene (and this number will be multiplied by the expected number of monomers in the asymmetric unit). The same goes for selenium sites and the number of methionines you have. If you don't tell Scaler Elves anything about metal sites, they will try to guess which metal you have from the wavelength you used to collect the data. For example, data collected primarily at 1.04A is probably from a gold derivative.

Remember, Elves only make these assumptions in the absence of information from you. If you mention the word "platinum" on the command line, they will take that as meaning you have a platinum derivative. The purpose of this guessing is to save you from having to type in stuff that is either standard practice (and, therefore, predictable), or unimportant (i.e. you are scaling native data).

Scaler Elves HOWTO

So, how do you use Scaler Elves? The simplest way is just to try it. Run the script with a sentence about what you want Scaler Elves to do on their command line:

unix% Scaler Elves, scale the data in */*/raw.mtz which is from a 22kD protein

Or, if you have denzo data:

unix% Scaler Elves, scale the data in denzo/frame*.x which is from a 22kD protein

If you don't like the way Scaler has decided to organize your frames, you can overrule them by editing the runlist.txt file as you see fit. It shouldn't be difficult to figure out how the runlist.txt file works, the comment lines (beginning with "#") are used to initiate a "wavelength" (group of runs that will, eventually, be merged together), and a blank line terminates the list of runs. Note that, at present, a "wavelength" is the default "mergeing group." But you can specify runs of data collected at different wavelengths be merged together, if you want too. At the moment, however, you must have different values for the wavelength for each "mergeing group" in runlist.txt, even if two mergeing groups have the same wavelength in reality (it won't affect scaling at all), This will be fixed soon.

When you mention the new runlist.txt on a new run of Scaler Elves, they will recognize this file as the user-preferred run definitions. They will use them, and never change them.

Scripts written by Scaler Elves

Like all Elves, Scaler Elves work by writing and editing shell scripts for CCP4 programs. These scripts have been designed to be easy to read and edit. Some of them also contain a "smart setup" section to make them as flexible as possible in their unmodified form. Some examples of these scripts are shown here, with a brief description of how to run them:

autoscala
Autoscala optimizes the SDCORR card in a given scala script, displaying which one gives the best scatter/sigma (chi²). Scaler Elves use this to optimize their treatment of SDCORR for your particular data set. At the moment, it is only optimized for one wavelength (mergeing group), and the results are applied to the whole experiment.

usage: autoscala merge.com

will run the command "merge.com" but substitute the line beginning with "SDCORR" in merge.com with a series of possible SDCORR x y z command cards. The scatter/sigma table produced by merge.com is checked, and a new SDCORR line is chosen, based on the principle of the Golden Section search. Once the Golden Section search converges, a file called merge.com_best will be created, which is identical to merge.com, but with the SDCORR line edited to the "best" values found.

autoscalepack
is another script which does the same thing for the error_scale_factor and estimated_error lines of a scalepack script.

The Future of Scaler Elves

Mergeing Groups
At present, Scaler considers same-wavelength data as a "mergeing group", this will be made more flexible in the beta version.
Postrefinement
One of the major remaining shortcomings of the Scaler Elves routine is postrefinement. Although mosflm postrefines as it integrates, it cannot take advantage of orientation information outside the current wedge. Therefore, data that has been broken up into many small wedges might not be postrefined very well.
Absorption corrections
Scaler Elves currently support the concept of model-derived absorption corrections, monitored by the free-R, but it has not been thoroughly tested. Use of refined Fs instead of a merged wavelength in Scala's scaling procedure has been seen to produce data sets that are a percent or so different from each other, but it is still not clear which is better. An iterative approach, where scaling and refinement are cycled back and forth is currently under development.
smart scaleit.com
the current scaleit.com script will only work for the data Scaler was run on. In the future, Scaler's scaleit.com script will be smart enough to scale an arbitrary number of merged mtzs together into one file.
recognize and read x-plor/cns or scalepack reference data
At the moment, only merged mtz files can be used as an externally-defined reference data set for Scaler, but future versions will automatically recognize and convert merged data in x-plor or scalepack output formats.

Back to the Elves Manual Table of Contents.

This page is not finished. It will never be finished, and neither will yours. Admit it.

James Holton <JMHolton@lbl.gov>