P1 Lysozyme example

These are notes I made while processing this data, so it is like a diary
of what I did. The notes are raw, so you will see my "Ooops" every now
and then.

I have downloaded what appear to be 6 scans. There are more scans, but I
will not use them for now.

Scan  SeqStart  SeqEnd  Dist  2theta  Start  End  Inc
a     0001      0060    400   0       -90    90   3.0
b     0001      0060    400   0       -90    90   3.0  (phi 180)
c     0001      0120    200   0       -90    90   1.5
d     0001      0120    200   0       -90    90   1.5  (phi 180)
e     0001      0360    100   0       -90    90   0.5
f     0001      0360    100   0       -90    90   0.5  (phi 180)

By examination, I see that a & b are related and so are c & d. It
appears that the crystal is rotated 180 degrees around phi or omega
before collecting b and d. The direction of rotation is important since
these scans start at -90 degrees. The phi axis vector is either (1 0 0)
or (-1 0 0). Trial and error shows it is (-1 0 0). Modified d*TREK on
2011-Apr-27 to have the correct gonio vectors and to use the
CRYSTAL_GONIO_VALUES keyword in the header.

=======================================================================

The basic problems with this crystal are rather trivial:

P1. The crystal ends up being split at some point, so care must be
    taken to index from the major part of the crystal and not be misled
    by the smaller satellite bits of diffraction. This problem is
    overcome by using higher resolution reflections for the refinement
    and indexing. d*TREK allows one to specify different resolutions
    for the different steps, including during integration. For example,
    we integrate the entire active area of the detector, but use only
    the higher resolution spots to refine the crystal and experimental
    parameters.

P2. The beam center is wrong in every scan. This is easily overcome by
    using dtdisplay to overlay a few images and inspecting the
    diffraction pattern. The dtdisplay "Beam circle" cursor can
    trivially get the beam center.

P3. This is more insidious. The rotation axis is not perpendicular to
    the X-ray beam, and the detector position is not well known.
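To see why the sign of the phi axis vector matters for scans collected from -90 to +90 degrees, note that rotating by +theta about (1 0 0) is the same physical motion as rotating by -theta about (-1 0 0), and at exactly 180 degrees the two axes give the identical matrix. A minimal Python sketch of this (just the standard Rodrigues rotation formula, not d*TREK code):

```python
import math

def rot_matrix(axis, deg):
    """Rodrigues rotation matrix for a unit axis and an angle in degrees."""
    x, y, z = axis
    t = math.radians(deg)
    c, s, C = math.cos(t), math.sin(t), 1.0 - math.cos(t)
    return [
        [c + x*x*C,   x*y*C - z*s, x*z*C + y*s],
        [y*x*C + z*s, c + y*y*C,   y*z*C - x*s],
        [z*x*C - y*s, z*y*C + x*s, c + z*z*C],
    ]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Flipping the axis sign reverses the apparent rotation direction,
# which is why only one choice predicts the -90 -> +90 scans correctly:
assert close(rot_matrix((1, 0, 0), 30.0), rot_matrix((-1, 0, 0), -30.0))

# At exactly 180 degrees both axes give the same matrix, so the 180-deg
# phi flip alone cannot distinguish them; the scan direction (trial and
# error) is what settles it:
assert close(rot_matrix((1, 0, 0), 180.0), rot_matrix((-1, 0, 0), 180.0))
```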
    For example, the longer crystal-to-detector distance of scans a & b
    means that the detector sags and has a significant 2-theta angle
    (it is NOT at 2theta 0 degrees). We overcome these kinds of
    problems by doing a global refinement of all parameters before we
    start to integrate. In order to do a global refinement, we find
    spots on many images rather than on one image.

P4. Many reflections are saturated. They are deleted by d*TREK by
    default, but one could change this.

P5. The multiple scans are not a problem by themselves even when
    merged together. However, I suggest that each scan be processed
    separately, then scaled separately at first to see if there are
    any problems. One can reject reflections from any scan, too. Then
    combine all the reflections and scale together.

P6. Mosaicity. It may make sense to fix the mosaicity for some of the
    scans during integration. We will test a few hypotheses.

P7. The default size for spot circles in dtdisplay is 9 pixels, which
    is too small to see them, so change it to 33
    (dtdisplay>Edit>Refln view props...>Size 33).

===========================================================================

Things to watch out for:

W1. Make sure predictions are dead on. Watch during integration with
    dtdisplay, especially in the corners and for
    widely-spaced-in-rotation images.

W2. Be sure to mask out the shadows of the beam stop.

W3. Watch how I created the refinement macro to use during
    integration. It is tricky, but better than the default macro. In
    essence: restrict resolution, use -cycles 30 and multiple -go
    steps:
    ... -reso 2 .5 -cycles 30 -verbose 0 -go -go -go -go -verbose 1 -go

W4. During scaling, set the -sigma higher than normal if the scale
    factors do not vary smoothly. Then if still not smooth, reject
    some Batch IDs (that is, reflections with a given Batch ID).

W5. We will use the d*TREK Prefix to name files and results. The
    prefix will be a_ for scan a, b_ for scan b, ..., and f_ for
    scan f.
    In dtintegrate, we will use Batch Prefix 1, 2, 3, ..., 6 for scans
    a, b, c, ..., f, respectively.

W6. During scaling together of all scans, we may wish to "pre-scale"
    scans by multiplying I and sigI by a scale factor, so that the
    plots look nice. See prescale.csh.

We will process the separate scans in subdirectories A, B, C, D, E, F.

W7. I saved ALL the log files and most of the other files. Since
    d*TREK does automatic versioning of files (see the manual), the
    first version is file_1.log, the 2nd is file_2.log, while the most
    recent file will not have a version number: file.log. This may
    help you understand the path I took to get there.

================================================================

Scan A

dtdisplay>File>New>Overlay ... 1-6, use the BeamCircle to set the beam
center.
Process>dtprocess...
In dtprocess, change the Prefix to "a_", then "Write a_dtprocess.head".
For autoindexing choose the P1 spacegroup.
If we Predict for image 1, things are good, but when we Predict for
image 60, it does not match perfectly. This suggests that the
experimental hardware is not exactly as specified. So to refine all
this (don't forget dtdisplay>Edit>Refln view props...>Size 33):
Set mode to Manual. Go to Find. Find spots on images 1, 15, 30, 45,
and 60.
Now do a better job of refinement, since we suspect the hardware
values. First use the macro above.
Notice that at the corners, predictions are still not perfect, so
suspect lower reso spots have more weight; change reso to
"-reso 2.5 0.5" and allow lower sigma spots with "-sigma 3".
Change the rejection limits to 1 1 1 (larger distances between
predicted and observed get included, but not too far away).
Integrate - Be sure to set BatchPrefix to 1 (we will use 2 for scan b,
etc.). Double check that the refinement macro was used. If it wasn't,
then you forgot to click on (i.e. select) the resultant
a_dtrefine.head earlier on.
Scaling - try the defaults: Run Scale. Click on Utils>PlotStats...
The batch scale factors were not used, so increase Sigma from 5 to 10
and Run Scale.
Looks better, Emul is < 1, so very nice. I restrained the batch scale
factors with 0.001 instead of 0.002. Also be sure to output
a_dtunavg.ref for later use.
Rmerge = 2.9%

=================================================

Scan B

We could use the detector position and beam center from Scan A, but
let's just repeat what we did for Scan A, but with the scan B images.
In dtscaleaverage, use -sigma 10.
Rmerge = 2.7%

=================================================

Scan C

New detector position. And we need to mask out the beamstop shadow.
Index from image 1, predict for image 120 (see c1.xwd): not so good,
so find spots on images 1, 30, 60, 90, 120, 45, 75, 105 and refine
with the resolution restricted to higher resolution.
Predictions are better, but not perfect, so restrict reso to 1.5 to
0.5. OK, that looks good. So integrate (don't forget BatchPrefix 3).
While integrating, I noticed that maybe the mosaicity should be fixed
to a larger value like 0.8 or so. Let's see what scaling gives us,
then perhaps re-integrate.
Scaling 1: Seems OK; set -sigma 10 and repeat. Rmerge = 3.8%, but Emul
is 1.90 and not below 1 as before. Let's re-integrate with fixed
mosaicity.
Re-integrate: Set the d*TREK prefix to c_m0p8_ and in Integrate use
-mosaicitymodel 0 0.8. Since I used the .head file from the END of the
previous integration, I wanted to make sure predictions at the
BEGINNING of integration still match nicely. They do.
Scaling 2: Rmerge = 3.0%, so that's probably all I want to mess with
here.

==============================

Scan D

Proceed like Scan C (use c_beam.mask).
Bad matching predictions for image 120. So try something drastic:
reindex with the c_dtfind.ref from multiple images. Then try
refinement with -reso 1.2 0.5, then with the macro used in this
exercise. Now the predictions look outstanding.
Integrate with a fixed mosaicity of 0.8.
Scaling: Use the defaults. Rmerge = 3.0%

===================================================================

Scan E

New scan, need a new bad pixel mask.
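The per-scan Rmerge values quoted here follow the standard definition, Rmerge = sum_hkl sum_i |I_i - &lt;I&gt;| / sum_hkl sum_i I_i, over symmetry-equivalent measurements. A small illustrative Python sketch of that sum (toy data; this is not how dtscaleaverage computes it internally):

```python
from collections import defaultdict

def rmerge(measurements):
    """measurements: list of (hkl, intensity) pairs, where hkl is the
    symmetry-reduced index.  Returns
    sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i."""
    groups = defaultdict(list)
    for hkl, intensity in measurements:
        groups[hkl].append(intensity)
    num = den = 0.0
    for ints in groups.values():
        mean = sum(ints) / len(ints)
        num += sum(abs(i - mean) for i in ints)
        den += sum(ints)
    return num / den

# Toy example: two unique reflections, each measured a few times.
data = [((1, 2, 3), 100.0), ((1, 2, 3), 104.0), ((1, 2, 3), 96.0),
        ((-2, 0, 1), 50.0), ((-2, 0, 1), 52.0)]
print(round(rmerge(data), 4))   # -> 0.0249
```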
Find spots on a number of images, use them in dtrefine; things look
great. Predictions match for images 1 to 360.
Integrate with Batch prefix 5.
Scaling: With the defaults, a couple of images have bogus scale
factors, so change from -sigma 5 to -sigma 10 to see if that makes
things better... It does not quite work, so exclude batch 50321 from
scaling and scale again. This worked! Rmerge = 2.9%

=====================================================================

Scan F

Like scan E, use e_beam.mask.
Use the same refine macro:
... +All -sigma 2 -reso 1 0.5 -rej 1 1 1 -cycles 30 ... -verbose 0 -go -go -go -go -verbose 1 -go
Scan F proceeds smoothly.
Scaling: Suggest the last batch should be excluded: 60360.
Rmerge = 3.0%
Ooops! All the files in the F subdirectory had d*TREK prefix e_.
Simply re-run dtscaleaverage with the f_ prefix.

=====================================================================

OVERALL SCALING of the 6 scans

There are many possible ways to scale, but we will start with the most
straightforward: we will combine and scale the previously scaled (but
unaveraged) measurements from the 6 scans that have been processed.
Copy [a-f]_dtunavg.ref to the SCALE directory.
Copy e_dtintegrate.head to the SCALE directory.
Now use "dtprocess e_dtintegrate.head -nodisplay" in the SCALE
directory. In the Setup dialog, change the d*TREK output file prefix
to abcdef_ and then click on "Write abcdef_dtprocess.head".
Then click on "[Merge refln files]" in the flow bar. Select the 6
*_dtunavg.ref files and click on "Run merge". (You will need to use
ctrl-click to add to the selection.) This will also create an
abcdef_dtreflnmerge.scom file that you will edit later.
Select "Scale/Average" in the flow bar. You will now scale the merged
file. Since the absorption correction was already applied, it does not
need to be applied again (but see below), so select "Batch only".
Since the error model has already been adjusted, it does not need to
be adjusted again.
Set "Weight multiplier" to 1.0 and "Weight addend" to 0.0001.
Since batches were already rejected from the *_dtunavg.ref files, no
batches need to be rejected at first.
Click on "Run scale". Then click on Utils>PlotStats... (see scale1.xwd)
The incoming intensities are wide-ranging, as shown in the scale
factor table and in the plot. This is OK, but not conducive to
understanding what is going on. Let's get a better plot by pre-scaling
all the scans before running dtscaleaverage. We do this by examining
the log file and estimating a single scale factor to apply separately
to each of the 6 scans. We could also base this on exposure time, but
it is also easy to do this by inspection of the numbers.
Let's try to make everything close to scan f. In the initial scaling,
scan f has scale factors around 0.6, and so does scan e (that makes
sense!), so we will not scale them, but will scale the other scans to
them. It looks like scans a and b need a factor of 100, while scans c
and d need a factor of 15.
To do this, see prescale.scom in the directory. We multiply the
intensities and sigmas by the scale factors with these options of
dtreflnmerge:
  -fIntensity\*=100 -fSigmaI\*=100
where 100 is the multiplying factor and \*= is an "escaped" *=
operator. So edit abcdef_dtreflnmerge.scom, save the changes as
prescale.scom, then run that.
Oops, the output file name was *dtprofit.ref when it should have been
*dtunavg.ref, so simply edit prescale.scom and re-run.
Now back in dtprocess, select the newly merged reflnlist file. It is
not found in the "Reflnlists" list at first, so click on
File>Reflnlist... and select it. This will also refresh the list in
the main dialog.
Click "Run Scale".
The scales now plot better (scale2.xwd), but there are issues with the
scale factors of scans a and b. To try to calm them down, I will use
-sigma 10. That seemed to work (see scale3.xwd). Let's also restrain
the batch scale factors more with -batchrestrain 0.001.
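The pre-scaling done by prescale.scom via -fIntensity\*= and -fSigmaI\*= amounts to multiplying each reflection's intensity and sigma by one constant per scan. A hypothetical Python equivalent of that arithmetic (the dict field names are illustrative; real .ref files are handled by dtreflnmerge, and the factors mirror the ones estimated above):

```python
# Per-scan multipliers estimated above: a,b -> 100; c,d -> 15; e,f -> 1.
# Scans are identified by the leading Batch prefix digit (1..6 for a..f).
PRESCALE = {1: 100.0, 2: 100.0, 3: 15.0, 4: 15.0, 5: 1.0, 6: 1.0}

def prescale(refln):
    """refln: dict with 'batch' (e.g. 50321), 'fIntensity', 'fSigmaI'.
    Returns a copy with I and sigI multiplied by the scan's factor."""
    scan_digit = int(str(refln["batch"])[0])   # leading digit = Batch prefix
    k = PRESCALE[scan_digit]
    out = dict(refln)
    out["fIntensity"] = refln["fIntensity"] * k
    out["fSigmaI"] = refln["fSigmaI"] * k
    return out

# A reflection from scan a (Batch prefix 1) gets multiplied by 100:
r = prescale({"batch": 10001, "fIntensity": 2.5, "fSigmaI": 0.4})
print(r["fIntensity"], r["fSigmaI"])
```

Note that multiplying I and sigI by the same factor leaves I/sigI unchanged, so this only shifts the batch scale factors into a nicer range for plotting.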
That is better, but there are still little unexplained blips for batch
IDs 50001 and 40094, so reject them and scale again.
Looks OK, so we are done.

==============

Another way to Scale?

Instead of using the *_dtunavg.ref files to scale, let's use the
*_dtprofit.ref files instead.
First combine them with the prescale_dtprofit.scom script.
Then in dtprocess, change the d*TREK prefix to abcdef_dtp_ in order to
keep things separated from the previous scaling.
In the Scale/Average dialog, use "Batch+4th 3D", Weight multiplier -2,
Weight addend -0.03, no rejects, and batchrestrain 0.002 (i.e. the
defaults).
Ooops, had the wrong file name (extra "_e" in the name) for the merged
*_dtprofit.ref files. So edit prescale_dtprofit.scom and run again.
Then "Run scale" again. Results are in abcdef_dtp_dtscaleaverage.log.
In theory, it looks OK, but in practice, I do not trust the error
model (with Emul 0.15). I can try to manually adjust this and see the
result. Change the d*TREK prefix to 2abcdef_dtp_ to keep these results
and tests separate from the others. Set Emul to 1 and Eadd to 0.05.
The results look rather good, so I'll reject batch IDs 40096 & 50001
and call it a day.

FINAL results from 2abcdef_dtp_dtscaleaverage.log

Summary of data collection statistics
-------------------------------------------------------------
Spacegroup                           P1
Unit cell dimensions                 27.08 31.26 33.76 87.97 71.99 67.86
Mosaicity                            0.20
Resolution range                     28.83 - 0.60 (0.62 - 0.60)
Total number of reflections          981757
Number of unique reflections         182609
Average redundancy                   5.38 (2.77)
% completeness                       73.8 (8.9)
Rmerge                               0.056 (0.551)
Rmeas                                0.060 (0.665)
RmeasA (I+,I- reflns kept apart)     0.062 (0.647)
Reduced ChiSquared                   1.20 (1.50)
Output <I/sigI>                      16.4 (1.5)
-------------------------------------------------------------
Note: Values in () are for the last resolution shell.
995105 reflections in data set
     2 reflections rejected (|ChiSq| > 50.00)
 13348 reflections total rejected ( 1.34% |Deviation|/sigma > 40.14)

FINAL output file for the next step: 2abcdef_dtp_dtscale.ref

==================================================================

As an alternative, let's cut the resolution to 0.65 Angstrom and check
the statistics. Use 0.65A_2abcdef_dtp_ for the PREFIX. The results end
up in 0.65A_2abcdef_dtp_dtscaleaverage.log and
0.65A_2abcdef_dtp_dtscale.ref.
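As a quick sanity check on the final log, the average redundancy is just total/unique reflections from the summary table, and the quoted rejection percentage matches the counts in the rejection report:

```python
# Numbers copied from the final dtscaleaverage log above.
total, unique = 981757, 182609           # total and unique reflections
print(round(total / unique, 2))          # average redundancy -> 5.38

in_dataset, rejected = 995105, 13348     # from the rejection report
print(round(100.0 * rejected / in_dataset, 2))   # -> 1.34 (%)
```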