The Tom Terwilliger challenge for crystallography

The challenge:

I dare anyone who considers themself an expert macromolecular crystallographer to find a way to solve the structure from this data using only the anomalous differences.

Tom Terwilliger got it with this data. Can you do any better? If you can, I'll name this challenge after you.

Why?

Just as the legendary John Henry faced down a machine built to take his job, macromolecular crystallographers have seen a similar cyberization of their profession in the early 21st century. Everything these days seems to be automated and push-button, and program developers seem to compete only on how much faster their automation package can solve a structure than someone else's. Although the field is moving faster than ever before, some might find this situation a little depressing. Are there no genuine crystallographic challenges left for a human being to solve?

Well, of course there are. And this is one of them. The classic challenge in macromolecular crystallography is looking critically at a "ropy" electron density map and divining a helix here and a side chain there that eventually lets you build your way out to a refined structure. Yes, there are now many automation programs for doing this for you, but the important question is: are they better at it than you?

Here's your chance to prove they aren't:

Tom Terwilliger took up my challenge last year and managed to take the anomalous data in this file:

possible.mtz

feed them, along with the right sites and the right set of parameters, into phenix.autobuild, and converge on the correct structure. Specifically, the "right answer" here is the PDB entry 3dko.

However, the anomalous data in this file:

impossible.mtz

although only slightly different from those in possible.mtz above, have so far led to abysmal failure in every phasing and model-building package anyone has tried.

Short of actually cheating (see below), there doesn't seem to be any automated way to arrive at a solved structure from these data. What is interesting is how remarkably similar the two data sets are: they differ by only 0.6% in F and 7% in the anomalous differences. The question is: can you do any better?
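If you want to put a number on how similar two data sets of your own are, one simple choice is the summed absolute difference divided by the summed mean magnitude, taken reflection by reflection. The Python sketch below is only that: the function name and the reflection values are made up for illustration, and it is an assumed definition, not necessarily the one behind the percentages quoted above.

```python
def percent_difference(a, b):
    """Summed |a - b| over the summed mean magnitude, as a percentage.

    a and b are equal-length lists holding the same quantity (for example
    F, or the anomalous difference) for the same reflections in two data sets.
    """
    if len(a) != len(b):
        raise ValueError("lists must pair up reflection by reflection")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(0.5 * (abs(x) + abs(y)) for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("all values are zero")
    return 100.0 * numerator / denominator


# Made-up values for four reflections, for illustration only.
f_a = [100.0, 200.0, 50.0, 400.0]
f_b = [101.0, 199.0, 50.0, 402.0]
dano_a = [4.0, -2.0, 1.0, -6.0]
dano_b = [3.5, -2.0, 1.5, -6.0]

print("F differs by %.2f%%" % percent_difference(f_a, f_b))
print("anomalous differences differ by %.2f%%" % percent_difference(dano_a, dano_b))
```

Because anomalous differences are small numbers obtained by subtracting large ones, the same absolute change shows up as a much larger percentage there than in F.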

Where did these data come from?

They are actually from simulated diffraction patterns created for an educational workshop to demonstrate to novice crystallographers what a good anomalous signal and a bad anomalous signal look like. The challenge data here were made by interpolating between the "goodsignal" and "badsignal" images. This is effectively equivalent to changing the fraction of Se incorporation in the Met sites of the structure in 3dko. This particular PDB entry was selected primarily because the one long unit cell axis can be used to demonstrate spot overlap problems.

However, in the present "Tom Terwilliger challenge" case, the only variable that changed between the two data sets was the fraction of Se incorporation. The difference between "possible" and "impossible" is just a 1% increment in Se occupancy. The "possible" dataset has 11% Se incorporation into the 12 methionine residues you will find in the sequence, but the "impossible" dataset only has 10% Se incorporation.

What if I want to process the data myself?

You can download the raw images for the goodsignal (tarball) and badsignal (tarball) data sets and then run my image-mixing script img_mix.com like this:

./img_mix.com badsignal/fake_1_\#\#\#.img 0.89 goodsignal/fake_1_\#\#\#.img -outprefix possible_
./img_mix.com badsignal/fake_1_\#\#\#.img 0.90 goodsignal/fake_1_\#\#\#.img -outprefix impossible_

Or you can try 0.895, 0.891, or even finer steps to get something in between. You are limited only by the fact that the image file format stores pixel values as integers, so the mixed images cannot carry fractional counts.
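To see why the integer limitation matters, here is a minimal Python sketch of a linear two-image mix. It is not the code in img_mix.com. It assumes that the number on the command line is the weight on the first (badsignal) image, that the second image gets the remainder, and that mixed values are rounded to the nearest integer. The function name and the pixel values are made up.

```python
def mix_pixels(bad, good, bad_fraction):
    """Blend two equal-length lists of integer pixel values.

    bad_fraction is the weight on the "badsignal" pixels; the "goodsignal"
    pixels get the remaining 1 - bad_fraction (an assumed convention).
    """
    if len(bad) != len(good):
        raise ValueError("images must have the same number of pixels")
    good_fraction = 1.0 - bad_fraction
    # Integer storage means every mixed value has to be rounded.
    return [int(round(bad_fraction * b + good_fraction * g))
            for b, g in zip(bad, good)]


# Made-up pixel values, for illustration only.
bad = [100, 250, 40, 1000]
good = [120, 240, 40, 1100]

possible = mix_pixels(bad, good, 0.89)
impossible = mix_pixels(bad, good, 0.90)

# Count how many pixels actually change between the two mixes.
changed = sum(1 for p, q in zip(possible, impossible) if p != q)
print(possible, impossible, changed)
```

With these toy numbers only the last pixel differs between the two mixes; the others round to the same integer. A 0.01 step in the mixing fraction only registers where the two source images differ by enough counts, and finer steps register in fewer pixels still.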

What constitutes "cheating"?

For the results of this challenge to be useful to practicing crystallographers (both those of us who use software and those who develop it), your "solution" to the "impossible" problem must be a plausible "before you knew the right answer" scenario. For example, simply dropping in the "right answer" (3dko) and refining is definitely cheating.

Using the right sequence information is not cheating, since that is generally something you will know before you sit down to collect data. You might also know that your Se incorporation level is ~10%.

I also don't think it is cheating to take advantage of available technology. Even John Henry had a hammer. And finding a way to work with the steam hammer instead of against it would have been healthier. However, simply feeding the sequence into BALBES is definitely cheating! This is because BALBES will simply use 3dko as a molecular-replacement search model, and since the structure is absolutely identical to the "right answer", the subsequent molecular replacement and refinement will be trivial.

And yes, this does raise another important question: how close a homolog can you use without it being considered cheating? Well, I'd define this as the homolog that is just beyond the reach of current automation technology. I have not done a search for which starting model fits this bill, but if you can find two very closely related structures, one that can be used to solve 3dko and one that cannot, let me know!

So, is there anyone out there who can beat Tom Terwilliger? Anyone?

If you think you have found a way to build out of this map without "cheating" in any way, let me know! I will rename the challenge after you.

James Holton <JMHolton@lbl.gov>