My assessment is that the data taken as a whole constitutes proof. Absolute proof? Definitely not. There's almost no such thing in science. Proof beyond "reasonable doubt"? Taking all the evidence together, I think so, yes. Proof "on the balance of probability"? Certainly. That was found by the Full Bench of the Australian Family Court as far back as 2003, when there was far less evidence than today.
But what about taking each piece of evidence individually, on its own? Is there a single, reliable touchstone? Adequate for a diagnosis, as conjectured by a New Scientist article I blogged about a month ago?
Er.... I don't think so, no. Much as I'd like to believe otherwise.
To illustrate the limits of reliability of each individual piece of evidence taken on its own, there's a marvellous post that explains it far better than I could, on Sugar and Slugs: Why Sex Differences Don’t Always Measure Up.
Please read the whole thing. I'll just give some pictorial hints. First, some graphs showing the overlaps involved. To start with, nothing controversial: just height.
Then what I consider the single best piece of evidence, numbers and types of cells in one particular area of the brain.
Well, so far, so good. There's overlap, but no worse than for height, and it's accepted without demur that "women are shorter than men" even though we all know many exceptions.
The one that illustrates graphically the problem we have regarding reliability is this:
On the "number of cases examined" axis, we flatline. The difference from zero is too small to see at this resolution; we've lost all the data.
We can be really certain about men and women in general, but for trans people, not so much.
First - there aren't that many Transsexuals. 1 in 3000 people or so.
Second - in medicine and psychology, many propositions have been accepted not so much as "True" as "True Enough". We routinely prescribe and conduct surgery on the basis of propositions only proven to the 0.05 level of significance. Meaning, roughly, that a result this strong would turn up by chance alone about 1 time in 20 even if there were no real effect. This "medical standard of proof", once met, is deemed enough - and getting funding just to remove a bit of doubt from something "everyone already knows" is problematic. Add in the fact that the test involves autopsying well-preserved brains of trans people, and repeated experiments here just aren't going to happen.
fMRI and PET scans don't have this disadvantage, so while the overlap there is greater, and the fuzziness more fuzzy, we can get better reliability by repeating things hundreds or thousands of times without requiring the subject to be dead.
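To put a rough number on what repetition buys, here is a minimal sketch. It assumes the repeated scans are independent and equally noisy, in which case the uncertainty in the average shrinks with the square root of the number of scans. The function name `standard_error` and the spread of 1.0 are made up for illustration; they don't come from any real scan data.

```python
import math


def standard_error(spread: float, n_scans: int) -> float:
    """Uncertainty in the average of n_scans independent measurements.

    Assumes every measurement has the same spread (standard deviation)
    and that the measurements don't influence one another.
    """
    return spread / math.sqrt(n_scans)


# Hypothetical spread of 1.0 (arbitrary units) for a single noisy scan.
for n in (1, 100, 10_000):
    print(n, standard_error(1.0, n))
# 1 1.0
# 100 0.1
# 10000 0.01
```

A hundred scans cut the fuzziness in the average to a tenth, ten thousand to a hundredth. Note that this sharpens the group average; it does nothing about the overlap between individuals.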
Which is good for Science, but for diagnostic tests? When there's so much overlap in individuals and so much uncertainty on single measurements? No. Not enough. It might help, but a definitive test it's not, and on its own never will be.
A series of different, independent tests of the same kind might, though. If each test has a 50% chance of being wrong, and we do 10 different tests, all saying the same thing... then the odds of every one of them being wrong are less than 1 in 1000.
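The arithmetic behind that "1 in 1000" is just multiplying the error rates together, which is only valid if the tests really are independent of each other. A minimal sketch, with a made-up function name (`chance_all_wrong`):

```python
def chance_all_wrong(error_rate: float, n_tests: int) -> float:
    """Probability that every one of n_tests independent tests is wrong.

    Assumes each test has the same error rate and that the tests are
    independent, so the individual probabilities simply multiply.
    """
    return error_rate ** n_tests


p = chance_all_wrong(0.5, 10)
print(p)      # 0.0009765625
print(1 / p)  # 1024.0, i.e. 1 in 1024
```

If the tests share a common flaw, they are not independent, and the real figure will be nowhere near as good as this.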