Small crystals burn up too fast.

I dare anyone to write an automation program for solving the structure by SAD phasing using only
the data in xtal001 through xtal070 of these images:
img files

The images are also available as 15 tarballs of 530 MB each.

500 MB tarballs are also available for just image numbers 001, 002, and 003.

Don't let the unit cell fool you. This is not lysozyme. This is titin (1g1c), with the unit cell squeezed a bit so that two of the axes are exactly the same length. Although the space group is P212121, there will be an indexing ambiguity that you must resolve.

These data are a realistic simulation of the radiation damage situation faced with a lysozyme-sized protein growing ~5 micron crystals and shot with a 6 micron beam.

The exposure time was adjusted to get decent resolution on the first image, but unfortunately, you don't get very many shots before the crystal dies! And once you start trying to scale and merge the data, it gets even worse. The rad dam creates "non-isomorphism" that rapidly becomes unmanageable, despite the fact that the damage model used here is actually a very simple equation (described by Holton & Frankel, 2010).

The real trick with this dataset, however, is the fact that the a and b axes are the same length, but the space group is P212121. This means that autoindexing will get a and b swapped for about half of the wedges and you will need to check each one of them and "flip" the ones that don't agree with the rest.
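As a rough illustration of that check, here is a minimal Python sketch that compares one wedge against a reference under both indexing choices and reports whether the swapped choice agrees better. Everything in it is an assumption made for illustration: the function names are hypothetical, each wedge is taken to be a plain list of `((h, k, l), intensity)` pairs, the swap is written as (h, k, l) -> (k, h, -l) so that it stays a proper rotation, and Friedel pairs are merged for the comparison only. It also assumes you already have a reference (for example, one wedge you trust) that shares enough reflections with the wedge being tested, which is not guaranteed when every wedge is highly incomplete.

```python
from math import sqrt

def flip_ab(hkl):
    """Swap the a and b axes: (h, k, l) -> (k, h, -l).

    The sign change on l keeps the operation a proper rotation
    (assumption: this is what you want when anomalous differences matter).
    """
    h, k, l = hkl
    return (k, h, -l)

def asu_key(hkl):
    """Unique key with Friedel pairs merged (point group mmm).

    Only meant for comparing indexing choices, not for the final SAD merge.
    """
    h, k, l = hkl
    return (abs(h), abs(k), abs(l))

def to_asu(wedge, flipped=False):
    """Average intensities per unique reflection, optionally after the swap."""
    sums = {}
    for hkl, inten in wedge:
        key = asu_key(flip_ab(hkl) if flipped else hkl)
        total, n = sums.get(key, (0.0, 0))
        sums[key] = (total + inten, n + 1)
    return {key: total / n for key, (total, n) in sums.items()}

def pearson(a, b):
    """Pearson correlation over the reflections common to dicts a and b."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 3:
        return 0.0  # too little overlap to say anything
    xs = [a[key] for key in common]
    ys = [b[key] for key in common]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def needs_flip(wedge, reference):
    """True if the wedge agrees better with the reference after swapping a and b."""
    ref = to_asu(reference)
    return pearson(to_asu(wedge, flipped=True), ref) > pearson(to_asu(wedge), ref)
```

In practice the hard part is choosing the reference and deciding what to do when the two correlations are close; this sketch does neither, and it ignores scaling and radiation damage entirely.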

Is this a common problem when merging data from many crystals? Yes.

Is there a program for doing this automatically? No.

Good luck.

With 5 micron crystals and a 6 micron beam it is formally impossible to get a complete data set from a single crystal (Holton & Frankel, 2010). They burn up too fast. Solving structures from micro-focus beams therefore requires data processing software that can assemble data from multiple crystals that may or may not have ambiguous indexing, may have radiation damage, and are individually highly incomplete. Short of actually cheating (see below), can you figure this out?
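To make "assembling data from multiple crystals" a little more concrete, here is a minimal Python sketch of the bookkeeping only: an inverse-variance weighted mean of each reflection across wedges, plus a completeness figure against a list of expected reflections. The function names and the `(hkl, intensity, sigma)` tuple layout are hypothetical. The sketch assumes the wedges are already consistently indexed, already reduced to a common asymmetric unit, and already on a common scale; it applies no radiation damage correction, which is exactly the part that makes this data set hard.

```python
def merge_wedges(wedges):
    """Merge observations from many wedges.

    wedges: iterable of lists of (hkl, intensity, sigma) tuples.
    Returns {hkl: (weighted_mean_intensity, sigma_of_mean)}.
    """
    acc = {}
    for wedge in wedges:
        for hkl, inten, sigma in wedge:
            if sigma <= 0.0:
                continue  # skip observations without a usable error estimate
            weight = 1.0 / (sigma * sigma)
            sum_w, sum_wi = acc.get(hkl, (0.0, 0.0))
            acc[hkl] = (sum_w + weight, sum_wi + weight * inten)
    return {hkl: (sum_wi / sum_w, sum_w ** -0.5)
            for hkl, (sum_w, sum_wi) in acc.items()}

def completeness(merged, expected):
    """Fraction of the expected unique reflections that were observed."""
    expected = set(expected)
    if not expected:
        return 0.0
    return len(expected & set(merged)) / len(expected)
```

For example, two wedges that each cover a different half of the expected reflections would individually report 50% completeness but 100% once merged, provided their indexing agrees.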

If you can, I'll post your solution, and a link to your software here.

They are from simulated diffraction patterns of titin (1g1c).

Processing has many options, and XDS also changes over time. A recent "naive" processing run, where XDS was given no information about the data other than the image headers, is available as XDS_ASCII.HKL (51 MB) or INTEGRATE.HKL (70 MB).

An earlier version of XDS (2015) did things a little differently: INTEGRATE.HKL (34 MB) and XDS_ASCII.HKL (18 MB).

If you just want to experiment with radiation damage correction and data merging strategy, the correctly-indexed and full-resolution INTEGRATE.HKL (74 MB) and XDS_ASCII.HKL (37 MB) files are available for you.
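If you want to pull these files into your own scripts, here is a minimal Python sketch for reading unmerged observations from an XDS_ASCII.HKL-style file. It relies on the `!ITEM_...=N` header lines to find the columns rather than hard-coding positions, skips every other line starting with `!`, and by default drops observations with a negative sigma, which XDS uses to flag rejected "misfits". The function name and the choice of returned fields are hypothetical, and the sketch has not been run against the files on this page.

```python
def read_xds_ascii(lines, keep_misfits=False):
    """Yield (h, k, l, iobs, sigma, zd) from an XDS_ASCII.HKL-style stream.

    lines: any iterable of text lines (an open file works).
    zd is the frame-number centroid of the spot, which can serve as a
    rough proxy for exposure order when studying damage.
    """
    cols = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("!"):
            # Header and footer lines; record column positions from !ITEM_ records.
            if line.startswith("!ITEM_") and "=" in line:
                name, _, number = line[len("!ITEM_"):].partition("=")
                cols[name.strip()] = int(number) - 1  # convert to 0-based index
            continue
        fields = line.split()
        sigma = float(fields[cols["SIGMA(IOBS)"]])
        if sigma < 0.0 and not keep_misfits:
            continue  # negative sigma marks a rejected observation
        yield (int(fields[cols["H"]]),
               int(fields[cols["K"]]),
               int(fields[cols["L"]]),
               float(fields[cols["IOBS"]]),
               sigma,
               float(fields[cols["ZD"]]))
```

Typical use would be something like `with open("XDS_ASCII.HKL") as fh: obs = list(read_xds_ascii(fh))`, after which the observations can be grouped by frame or by crystal however your merging strategy needs.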

So, is there anyone out there who can piece this data together? Anyone?

If you think you have found a way to solve this without "cheating" in any way, let me know! I will re-name the challenge after you.

James Holton <JMHolton@lbl.gov>