
Scaler Elves guide to their scripts



What am I supposed to do with all these scripts!?


################################################################################
sort_everything.com	- script for combining and sorting all raw data into one file
    reads: ../raw.mtz	    
    makes: ./mtz/rawdata.mtz
    usage: ./scripts/sort_everything.com [SG]
    where: [SG] is the (optional) new space group

    example: ./scripts/sort_everything.com P6122 >! ./logs/sorting.log
	
	Will sort all raw data and reindex it to P6122
    (Note: the reindexing done here is at the mercy of the CCP4 program "reindex". 
    It is always safer to change your indices in the integration program.)
    For asymmetric orthorhombic space groups (P2221, P21212 and C2221) you can
    also specify "pseudo" space groups like "P2212" to indicate P2221 with the 
    screw axis along "b" etc.
    
################################################################################
make_reference_set.com	- scale and merge the reference data set.
    reads: ./mtz/rawdata.mtz
    makes: ./mtz/reference.mtz
    usage: ./scripts/make_reference_set.com [infile.mtz]

    example: ./scripts/make_reference_set.com >! ./logs/make_reference_set.log

    This script creates a "pre-merged" reference data set out of one of 
    your wavelengths.  This reference set will be used by SCALA as a guide 
    for scaling all raw data.  It exists solely to stabilize the scaling run, 
    and is not included in the final merging step.  Merging of the wavelength used
    to make this reference set will follow the same procedure as the rest of 
    the wavelengths.
    
    Note: this is not necessarily the reference set you will be using in
     mlphare, etc.

################################################################################
import_reference.com  - combine the reference data set with ./mtz/rawdata.mtz.
    reads: ./mtz/rawdata.mtz ./mtz/reference.mtz
    makes: ./mtz/sorted_ref.mtz
    usage: ./scripts/import_reference.com reference.mtz [infile.mtz]

    example: ./scripts/import_reference.com ./mtz/reference.mtz >> ./logs/make_reference_set.log

    This script imports an arbitrary reference data set into your scaling
    run.  This can be almost any set of unique data.  By default, Scaler
    Elves will make a reference data set from your most complete wavelength
    using the script above.
    You could also use a calculated data set from your final, refined structure
    as ./mtz/reference.mtz here.  This would have the effect of traditional Fc-directed 
    absorption corrections in localscaling (below).  By default, the
    Free-R flagged HKLs will be excluded, so as not to bias scaling of Fobs to your
    final Fc.  This allows you to use the free R to see if this absorption 
    correction did you any good.

################################################################################
rough_scale.com		- first round of all-data scaling
    reads: ./mtz/sorted_ref.mtz ./runlist.txt
    makes: ./mtz/rough_scaled.mtz
    usage: ./scripts/rough_scale.com [cycles] spacing [spacing] filter [infile]

    where: [cycles] is the number of scaling cycles you want (default: 50)
	   [spacing] is the Bfactor smoothing window, in degrees (default: 10)
	   [infile] is the input mtz file (default: ./mtz/sorted_ref.mtz)
	   filter turns on the "eigenvalue filter"
    
    example: ./scripts/rough_scale.com 5   spacing 10   filter

    This script scales all data (guided by the "pre-merged" reference) using
    one scale factor per frame, but requiring that the B-factor vary smoothly
    over all frames.  This script mainly serves to remove large discontinuities
    in scale that would crash localscale.com.  If your runs have no discontinuities
    (no fills, inverse beam, etc.), then you can skip rough_scale.com.

    If ./runlist.txt is missing, it will be regenerated by rough_scale.com
    
################################################################################
localscale.com		- second round of all-data scaling (3D scales)
    reads: ./mtz/rough_scaled.mtz ./runlist.txt
    makes: ./mtz/localscaled.mtz
    usage: ./scripts/localscale.com [cycles] spacing [spacing] filter [infile]

    where: [cycles] is the number of scaling cycles you want (default: 50)
	   [spacing] is the scale smoothing window, in degrees (default: 10)
	   [infile] is the input mtz file (default: ./mtz/rough_scaled.mtz)
	   filter turns on the "eigenvalue filter"

    example: ./scripts/localscale.com 50    ./mtz/sorted_ref.mtz

    This script scales all data, guided by the "pre-merged" reference, and 
    building on the scales obtained by rough_scale.com in a "3D" localscaling
    procedure.  Scale factors are required to vary smoothly within each "run".
    B-factors are not refined; instead, the scale is allowed to vary (smoothly) 
    across the detector face.  Combined with the smooth scaling over frames, this
    has the effect of assigning a smoothly-varying scale to every point in the
    observed reciprocal space, and, hence, localscaling.  This is the MAD scaling
    procedure recommended by Phil Evans in the SCALA documentation, and JMH has
    found it to improve Rmerge significantly.
    
    If ./runlist.txt is missing, it will be regenerated by localscale.com

################################################################################
merge.com		- merging utility
    reads: ./mtz/localscaled.mtz ./runlist.txt
    makes: merged.mtz
    usage: ./scripts/merge.com [wave] [RESO] [SG]

    where: [wave] is the wavelength name to merge (default: all of them)
           [RESO] is the high-resolution cutoff (default: 2)
	   [SG] is the space group, reindexed using "reindex" (default: P6122)
    
    example: ./scripts/merge.com FP 2 P6322

    This script merges all data from the provided wavelength.  No scaling is done, 
    so you should use a scaled MTZ.  merge.com "knows" which wavelength is which
    from the information in ./runlist.txt
 		    
    If ./runlist.txt is missing, it will be regenerated by merge.com

################################################################################
extract.com		- non-merging wavelength extractor
    reads: ./mtz/localscaled.mtz ./runlist.txt
    makes: unmerged.mtz
    usage: ./scripts/extract.com wave [RESO] [SG]

    where: wave is the wavelength name to extract (required)
           [RESO] is the high-resolution cutoff (default: 2)
	   [SG] is the space group, reindexed using "reindex" (default: P6122)
    
    example: ./scripts/extract.com FP 2

    This script works pretty much the same as merge.com, except it does
    not merge equivalent reflection data.  HOWEVER, it does add partials.
    extract.com serves mainly to migrate scaled, but unmerged reflection
    data to another scaling program (such as SOLVE's localscaling procedure).

    If ./runlist.txt is missing, it will be regenerated by extract.com

################################################################################
scaleit.com		- place merged data in a multicolumn MTZ file
    reads: ./mtz/localscaled.mtz ./runlist.txt
    makes: ./mtz/all.mtz
    usage: ./scripts/scaleit.com [FP]
			    
    where:  [FP] is the wavelength name to use as a reference
	    (one of: 0.97973)

	This script combines each of the files produced by merge.com into
    a single, multi-column mtz file.  This is the file you should use for
    SHARP and mlphare.
	If you type "./scripts/scaleit.com all", scaleit.com will do a 
    scaleit run with each wavelength in turn as the reference.
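
    The "reads:" and "makes:" lines above imply a running order.  The Python
    sketch below simply checks that each script's input is produced by an
    earlier step.  The table is copied from this guide; the function name
    missing_inputs is an illustrative assumption, and ./runlist.txt is left
    out because the scripts regenerate it when it is missing.

```python
# (script, files it reads, file it makes), as listed in this guide.
PIPELINE = [
    ("sort_everything.com", ["../raw.mtz"], "./mtz/rawdata.mtz"),
    ("make_reference_set.com", ["./mtz/rawdata.mtz"], "./mtz/reference.mtz"),
    ("import_reference.com",
     ["./mtz/rawdata.mtz", "./mtz/reference.mtz"], "./mtz/sorted_ref.mtz"),
    ("rough_scale.com", ["./mtz/sorted_ref.mtz"], "./mtz/rough_scaled.mtz"),
    ("localscale.com", ["./mtz/rough_scaled.mtz"], "./mtz/localscaled.mtz"),
    ("merge.com", ["./mtz/localscaled.mtz"], "merged.mtz"),
    ("scaleit.com", ["./mtz/localscaled.mtz"], "./mtz/all.mtz"),
]


def missing_inputs(pipeline, initial=("../raw.mtz",)):
    """Return (script, file) pairs whose input was not made by an earlier step."""
    available = set(initial)
    problems = []
    for script, reads, makes in pipeline:
        for path in reads:
            if path not in available:
                problems.append((script, path))
        available.add(makes)
    return problems


problems = missing_inputs(PIPELINE)  # empty list when the order is consistent
```
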
    
    

Utilities:

################################################################################
scaleit_sum.com		- summarize Diso and Dano
    reads: scaleit logs
    makes: an xloggraph plot (to screen)
    usage: ./scripts/scaleit_sum.com ./logs/scaleit.log

	This little jiffy serves primarily in edge walking.  It gives you
    a quick plot of Dano and Diso (relative to the reference) vs. x-ray energy.

################################################################################
mtz_sum.com		- summarize an MTZ file
    reads: merged MTZ files
    makes: a nice table of completeness and <F>/<sigF>
    usage: ./scripts/mtz_sum.com mtzfile.mtz [RESO]
    
    example: ./scripts/mtz_sum.com ./mtz/all.mtz 2

	Prints out completeness and F/sigF for every F in the mtz file provided.
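
    As a rough illustration of the two numbers in that table, here is a
    minimal Python sketch.  It assumes completeness is "observed / possible"
    as a percentage, and that the second statistic is the ratio of the mean F
    to the mean sigF (as the "<F>/<sigF>" heading suggests).  How mtz_sum.com
    itself counts the possible reflections is not described here, so
    n_possible is simply passed in.

```python
def summarize(f_values, sigf_values, n_possible):
    """Return (completeness in percent, <F>/<sigF>) for one F column."""
    n_obs = len(f_values)
    completeness = 100.0 * n_obs / n_possible
    mean_f = sum(f_values) / n_obs
    mean_sigf = sum(sigf_values) / n_obs
    return completeness, mean_f / mean_sigf


# Three made-up reflections out of four possible ones.
completeness, f_over_sig = summarize([100.0, 200.0, 300.0],
                                     [10.0, 20.0, 30.0], 4)
```
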
    
################################################################################
scala_summary.com	- summarize merging results from one or more scala logs
    reads: scala/truncate log files
    makes: a nice table of Rmerge Ranom I/sigma Completeness Multiplicity and Wilson B
    usage: ./scripts/scala_summary.com scala.log [otherscala.log ... ]

    example: ./scripts/scala_summary.com ./logs/merge_*

	Prints out a table of Rmerge, Ranom, I/sigma, completeness, multiplicity
    and Wilson B for each log file provided.

################################################################################
FreeRer.com		- add/inherit Free-R flags
    reads: one or two mtzs (or one mtz and an x-plor file)
    makes: FreeRed.mtz, FreeR_flag.mtz and XPLOR.cv
    usage: ./scripts/FreeRer.com mtzfile.mtz [free-R source] [fraction[%]]
    where: mtzfile.mtz is the file you want to ADD Free-R flags to
           [free-R source] is the file you want to get the flags from (mtz or X-plor)
	   [fraction[%]] is the fraction of spots to put in the free-R set (default: 10%)

    example: ./scripts/FreeRer.com ./mtz/all.mtz /some/random/place/xplor/olddata.cv
    
    FreeRed.mtz, FreeR_flag.mtz and XPLOR.cv will always be made, and they contain equivalent 
    representations of the Free-R set.  FreeR_flag.mtz, however, will contain Free-R assignments 
    extending out to 1.5A.  That way, FreeR_flag.mtz can be used to assign the Free-R set for 
    future crystals (which might diffract better).
    If no [free-R source] is given, the Free-R flags will be made up (as in uniqueify).  However, 
    if a second file is given (mtz or X-PLOR format), the Free-R flags will be taken from it.  
    Any "holes" in an externally-obtained set (i.e. missing HKLs) will be filled in as described
    in the CCP4 documentation.  
    The given example will produce a file called FreeRed.mtz that contains the Free-R flags used 
    in /some/random/place/xplor/olddata.cv.  
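
    The "inherit, then fill the holes" idea can be sketched in Python as
    below.  This is only an illustration under stated assumptions: flags are
    plain True/False, holes are filled by an independent random draw with the
    requested fraction, and the function name and seed are hypothetical.  The
    real flag convention and hole-filling rule are the ones in the CCP4
    documentation, not this sketch.

```python
import random


def assign_free_flags(hkls, fraction=0.10, inherited=None, seed=0):
    """Return {hkl: is_free}, keeping inherited flags and filling the rest."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    inherited = inherited or {}
    flags = {}
    for hkl in hkls:
        if hkl in inherited:
            flags[hkl] = inherited[hkl]            # keep the old assignment
        else:
            flags[hkl] = rng.random() < fraction   # assumed hole-filling rule
    return flags


# Made-up data: 1000 reflections, two of which already carry flags.
hkls = [(h, k, 0) for h in range(40) for k in range(25)]
old_flags = {(0, 0, 0): True, (1, 1, 0): False}
flags = assign_free_flags(hkls, 0.10, old_flags)
```
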
    
################################################################################
bestFH.com	- Matthews "best" FH estimator

    input:  all.mtz	- a cad-ed mtz file with multiple data sets
    output: FH.mtz	- an mtz containing only the estimate of FH
            fh.hkl	- shelx version of FH.mtz
	    FH_Patt.map - a Patterson map of FH
	    FH_Four.map - phased map of FH (if a phase is in all.mtz)
	   
    usage: ./scripts/bestFH.com all.mtz [Fset] [Dset] [1.8A]
    where: 
    all.mtz    contains same-site reflection data      (default: mtz/all.mtz)
    Fset       are the sets of Fs you want to use      (default: all of them)
    Dset       are the sets of Danos you want to use   (default: all of them)
    1.8A       is the desired outer resolution limit   (default: all data)
    PHI        is the phase set you want to use	       (default: most recent phase)
    
    FH_Patt.map is calculated with a 4*rms(FH) cutoff, as calculated by scaleit.
    
    example1: ./scripts/bestFH.com mtz/all.mtz
	will calculate an estimate of FH from all the difference data in 
	mtz/all.mtz.
	
    example2: ./scripts/bestFH.com dmed.mtz
	will calculate an estimate of FH from all the difference data in 
	mtz/all.mtz (same as above), but will also calculate a phased map 
	of FH, using the most recently-added phase in dmed.mtz (PHIDM).  
	This is usually superior to ordinary difference Fouriers for finding 
	new heavy-atom sites.
    
    example3: ./scripts/bestFH.com mtz/all.mtz no DANOFlo Flo
	same thing, but leave the "DANOFlo" difference data set and the "Flo" 
	data set out of the calculation.
    
    description:
	This script offers the "new" functionality of computing a "Matthews FH" 
	estimate.  This analysis not only "averages" information from all your 
	difference data into a single data set, but reduces the systematic error
	produced by cross-terms in the subtraction of anomalous and 
	isomorphous difference data: |FH| == |FPH-FP| != |FPH|-|FP|
    
	In bestFH.com, all anomalous difference data are scaled together, 
	and then added (sigma-weighted) together.  Then, all the possible
	isomorphous differences between "F"s in the mtz are subtracted, 
	scaled, and also added together.  Finally, Dano is scaled against
	Diso, and FH is calculated by the Pythagorean theorem.
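
	The last two steps can be illustrated with a short Python sketch.  It 
	assumes "sigma-weighted" means inverse-variance (1/sigma^2) weights, and
	that Dano has already been put on the same scale as Diso; all the scaling
	steps are left out.  The function names and numbers are made up for
	illustration and are not taken from bestFH.com.

```python
import math


def sigma_weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its sigma (assumed weighting)."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def fh_estimate(diso, dano):
    """Combine Diso and (already scaled) Dano by the Pythagorean theorem."""
    return math.hypot(diso, dano)


# One reflection: two isomorphous differences and one anomalous difference.
diso, sig_diso = sigma_weighted_mean([30.0, 30.0], [2.0, 2.0])
dano, sig_dano = sigma_weighted_mean([40.0], [3.0])
fh = fh_estimate(diso, dano)   # sqrt(30^2 + 40^2)
```
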
	
	Care must be taken in the ordering of the "F" datasets.  For example, 
	in a 3-wavelength MAD experiment: Finf Fpeak Fhi should be the order
	used.  Fhi Fpeak Finf is okay too, but not Finf Fhi Fpeak.  The latter
	would result in Finf-Fhi and Fhi-Fpeak "canceling" each other, because
	the f' differences will have opposite signs.  bestFH.com will try to
	get this ordering right, but you should check the difference dataset
	list to make sure none of them are opposing each other.

	Note also that all the data in mtz/all.mtz should be from 
	crystals with metal sites at the same positions, otherwise, FH will 
	be a mix of the two site constellations.  
    
    
################################################################################
reindex.com	- general-purpose re-indexing script

    input:  data.mtz	    - mtz file to re-index (merged or unmerged)
    output: reindexed.mtz   - mtz file with the new space group
    
    examples: 
	./scripts/reindex.com data.P41212.mtz P43212
	  - will change the space group of "data.P41212.mtz" to P43212 
	    (assuming that is possible), and write the results to 
	    "reindexed.mtz"
	./scripts/reindex.com data.P2221.mtz P2122
	  - will change the space group of "data.P2221.mtz" to the "pseudo" 
	    space group "P2122", which is P2221, but with "a" as the screw 
	    axis.  This is done by leaving the mtz file in P2221, but 
	    permuting the cell (and the data) so that the shortest cell edge 
	    (normally "a") is moved to the third cell parameter (the one 
	    with screw symmetry).
	
    description:
	This is a general utility for changing the assigned space group of 
	mtz data using the CCP4 program "reindex".  It works on merged and 
	unmerged data.  Re-assignment of the screw/rotation axes of 
	anisotropic orthorhombic space groups is supported (see example 2).
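
	The cell permutation in example 2 can be pictured with the Python sketch
	below.  It assumes a cyclic permutation (a,b,c) -> (b,c,a), which moves
	the old "a" axis to the third position and keeps the axes right-handed.
	The matrix that reindex.com actually hands to "reindex" is not given in
	this guide, and the function name is hypothetical.

```python
def permute_a_to_c(cell, hkl_list):
    """Cyclic permutation (a,b,c) -> (b,c,a); indices follow as (k,l,h)."""
    a, b, c, alpha, beta, gamma = cell
    # The angles travel with their axes: the new alpha lies between the
    # new b and c axes (old c and a), which is the old beta, and so on.
    new_cell = (b, c, a, beta, gamma, alpha)
    new_hkl = [(k, l, h) for (h, k, l) in hkl_list]
    return new_cell, new_hkl


# Made-up orthorhombic cell with "a" as the shortest edge.
cell = (40.0, 60.0, 80.0, 90.0, 90.0, 90.0)
new_cell, new_hkl = permute_a_to_c(cell, [(1, 2, 3), (3, 0, 0)])
# An axial (h,0,0) reflection ends up on the (0,0,l) axis, where P2221
# has its screw-axis absences.
```
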
	
	"Flipping" between alternative axis assignments is also easily done.
	Just include the word "flip" on the reindex.com command line to switch
	to the "other" axis assignment.  This may be necessary for any space
	group having two or more cell edges exactly the same length.  The only 
	tricky ones are R3 and P3x, which have four possible axis assignments.  
	To specify the remaining two, use the word "flip" two or three times 
	(respectively).  See /programs/ccp4-5.0.2/doc/reindexing.doc for details.
	
	Changing between space groups with different point group, or even
	lattice symmetry, is allowed, but inadvisable!  These transformations
	involve merging or "un-merging" spots, which reindex can't do.
    
	Note: moving/removing screw axes will result in the "loss" of 
	some systematic absence reflections, so be careful.  It is probably
	advisable to always merge in P222, and reindex later.
    
################################################################################
SGsearch.com		- exhaustive space-group search
    reads: a scala script
    makes: a table of merging statistics
    usage: SGsearch.com [script.com] [raw.mtz] [rootSGs]
    where: 
	script.com is the scala script to use        (default: merge.com)
	raw.mtz    is the raw, unscaled data         (default: raw.mtz)
	rootSGs    is/are the "starting" space group (default: SG from raw.mtz)

    example: SGsearch.com merge.com P212121

	will run merge.com with every orthorhombic space group:
	    P222, P2221, (P2122, P2212), P21212, (P21221, P22121), and P212121

    Picking the wrong space group has been known to waste weeks to years of an 
    investigator's time.  SGsearch.com uses the space group provided to get the
    general crystal system your crystal was indexed with, and will then try 
    merging your data in EVERY space group belonging to that crystal system.
    The Rmerge, systematic absences, and asymmetric unit volume will be presented
    in a neat table for your review.  
    
    The actual logs from the individual merge.com runs will be placed in the ./logs/
    directory, named merge.SG.log.  If SGsearch.com finds these logs already exist, 
    it will use the statistics in them to make the table.  This usually saves you a 
    lot of time re-generating the table, and you can always delete these logs and 
    run SGsearch.com again.
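
    The log re-use described above amounts to a simple existence check.  A
    minimal Python sketch, assuming the merge.SG.log naming given above and
    using the orthorhombic list from the example (the function name is
    hypothetical, and SGsearch.com itself may decide differently):

```python
import os

# The space groups listed in the orthorhombic example above.
ORTHORHOMBIC_P = ["P222", "P2221", "P2122", "P2212",
                  "P21212", "P21221", "P22121", "P212121"]


def pending_space_groups(space_groups, log_dir="./logs"):
    """Return the space groups that have no merge.SG.log in log_dir yet."""
    return [sg for sg in space_groups
            if not os.path.exists(os.path.join(log_dir, "merge.%s.log" % sg))]


# Only these would need a fresh merge.com run; the others reuse old logs.
todo = pending_space_groups(ORTHORHOMBIC_P)
```
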
    
    SGsearch.com is designed to work with the merge.com provided by Wedger Elves, 
    but should work fine with any scala/truncate script that is capable of 
    accepting and applying a space group provided on its command line.

################################################################################
autoscala		- optimizer for SDCORR card
    reads: a scala script
    makes: a better scala script
    usage: ./scripts/autoscala script.com
    where: script.com is the scala script to optimize

    example: ./scripts/autoscala ./scripts/merge.com

    Scala's SDCORRECTION card allows the assigned error (sigma) of the spot 
    intensities to be edited.  Most measurement programs cannot predict the
    effects of absorption and other systematic measurement errors, and therefore
    usually give unrealistically low estimates of the error in the measured
    spot intensities.  You should read the scala documentation to find out 
    exactly how SDCORR works.
    Briefly, "correct" sigmas should be similar to the scatter of observed intensities.
    That is, if the 10 observations of hkl=(5,9,12) deviate from the average value
    of (5,9,12) by 100 units (rms), then the sigma of (5,9,12) should be 100.  So, 
    if the assigned sigma is 50, then the scatter/sigma will be 2.  This analysis, 
    grouped by intensity bins, is the last graph in the scala logfile.  You want 
    all the points on this graph to be as close to 1.0 as possible.  If you see this, 
    then your assigned sigmas are probably realistic.
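
    The arithmetic in this paragraph can be checked with a few lines of
    Python.  The function name and the intensities are made up for
    illustration; scala does this analysis itself, grouped into intensity
    bins.

```python
import math


def scatter_over_sigma(intensities, assigned_sigma):
    """Rms deviation from the mean intensity, divided by the assigned sigma."""
    mean = sum(intensities) / len(intensities)
    rms = math.sqrt(sum((i - mean) ** 2 for i in intensities)
                    / len(intensities))
    return rms / assigned_sigma


# Ten made-up observations of one hkl: five are 100 units above the mean
# and five are 100 units below it, so the rms scatter is 100.
obs = [1100.0] * 5 + [900.0] * 5

ratio_low = scatter_over_sigma(obs, 50.0)    # assigned sigma too small
ratio_ok = scatter_over_sigma(obs, 100.0)    # sigma matches the scatter
```
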
    To save you from hours of diddling with the SDCORR numbers, autoscala uses a 
    "Golden-Section" search (derived from Numerical Recipes) to optimize the three 
    numbers for scala's SDCORRECTION card, using the deviation of the aforementioned
    graph from 1.0 as a target.  In CCP4 3.3 and beyond, the first number on the SDCORR 
    card is optimized internally (and might as well be "1"), but the remaining two can 
    be tuned up by autoscala.
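
    For readers who have not met a golden-section search, here is a generic
    one-dimensional version in Python.  It is a textbook sketch, not
    autoscala's code: the real target needs a scala run for every trial, so a
    toy stand-in is used here, and the names toy_target and multiplier are
    hypothetical.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0    # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def toy_target(multiplier):
    """Stand-in target: how far scatter/sigma is from 1.0 after rescaling."""
    scatter, assigned_sigma = 100.0, 50.0
    return abs(scatter / (multiplier * assigned_sigma) - 1.0)


best = golden_section_minimize(toy_target, 0.5, 5.0)   # close to 2.0
```

    Each pass re-uses one of the two interior points, so only one new
    evaluation of the target is needed per step.  That matters when every
    evaluation is a full scala run.
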

################################################################################
mtz2various.com		- basic format-converter script
    reads: merged.mtz
    makes: outfile.EXT
	EXT -> FORMAT
	cif -> CIF
	hkl -> shelx
	tnt -> TNT
	fin -> XtalView
	phs -> XtalView
	fobs-> XPLOR
	cv  -> XPLOR
	cns -> CNS
    usage: mtz2various.com merged.mtz outfile.EXT [format]
    where: 
	merged.mtz   is the merged mtz file (containing Fs)
	outfile.EXT  is the filename you want to use for the exported data
	format	     is the (optional) program you want outfile.EXT formatted for

    examples:
	mtz2various.com merged.mtz merged.cif
	  - will convert merged.mtz to CIF format
	mtz2various.com all.mtz "F1" merged.fobs
	  - will convert "F1" in all.mtz to XPLOR format
	mtz2various.com merged.mtz merged.hkl shelx
	  - will convert merged.mtz to shelx format
	mtz2various.com merged.mtz merged.hkl tnt
	  - will convert merged.mtz to TNT format

    description:
	mtz2various.com is a general-purpose "smart" script for converting
	"F" data from an mtz file (such as ./mtz/all.mtz) to the file formats
	used by non-CCP4 programs.  The format of the output file can either
	be implied by using a standard file extension in the output file name,
	or declared explicitly and separately on the command line.  Free-R 
	flags are exported automatically, if they are present.  In the case of 
	XtalView files, a suitable CRYSTAL file is also generated.
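
	The "implied or declared" format choice can be pictured with this Python
	sketch.  The extension table is the one listed above; the function name
	is hypothetical, and returning the declared format word unchanged is an
	assumption about how the script treats it.

```python
import os

# Extension -> format, copied from the table above.
EXT_TO_FORMAT = {
    "cif": "CIF", "hkl": "shelx", "tnt": "TNT", "fin": "XtalView",
    "phs": "XtalView", "fobs": "XPLOR", "cv": "XPLOR", "cns": "CNS",
}


def guess_format(outfile, declared=None):
    """Use the declared format if given, else infer it from the extension."""
    if declared is not None:
        return declared
    ext = os.path.splitext(outfile)[1].lstrip(".").lower()
    return EXT_TO_FORMAT.get(ext)    # None for an unknown extension


implied = guess_format("merged.cif")
declared = guess_format("merged.hkl", "tnt")
```
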

################################################################################
For more detailed information, see your CCP4 documentation in:
/programs/ccp4-5.0.2/doc

or go to the CCP4 homepage at:
netscape http://www.dl.ac.uk/CCP/CCP4/main.html

