Thursday, 6 March 2014

How do you imagine the future of data sharing in healthcare or research?

Each day more and more data is collected. This data could have a huge impact on healthcare and research if properly used.

Will this knowledge be harnessed?
Will it be used to its full advantage?

There is growing potential in sharing medical data, but there is still much work to be done in gaining people's trust and convincing them that it's the right thing to do: it's for our own good and that of mankind. Some people may be naturally less inclined to share, or have good reason not to. And rightfully so, since there might be a few immediate drawbacks, like getting scooped in research or the illicit use of someone's medical data to discriminate against them. We have a fear of being singled out, of being watched.
Are we willing to accept that data sharing might have some negative effects at the timescale of the individual for the sake of the greater good?
We need to encourage the sharing of data, and we need to promote the moral trust to think rationally and altruistically.
But it is our responsibility to share the benefits of our research, to support greater public involvement, and to prove that the benefits outweigh the drawbacks, for instance by showing that our data sharing is saving lives in the long run.
Convincing people is hard, but if we share the good that comes from data sharing in healthcare and research, then we will be able to spread this positive message and set up a positive feedback loop.

Trust takes years to build but only moments to tear down. Internet companies such as Facebook and Google have deceived us. Already people give out a lot of information unwittingly. We would like to avoid this.
Scientific journals have deceived us too, e.g. recent articles arguing that most research findings are false positives. Scientists are often much more cynical about science than people outside the field.

Ideally we would like people to know that they are giving information; we do not want to deceive them. We should make it clear what their data will be used for.
For example, Google is able to trace and predict epidemics based on search terms. That's a good use of our data, one that benefits mankind as a whole.
However we don't like people making a profit at our expense by selling our data to third parties.
But if data is considered private then there will always be a market for it. Better an open market.
Nothing is worse than breaking a promise: promising confidentiality, or giving the illusion of confidentiality, only to later sell off the information to a third party.
If everything is in the public domain to start with then there is no need for this black market.
However, this is where information sharing can help. Sharing information has had a huge psychological impact on our society. I believe, perhaps quite naively, that we are promoting a more open, honest and tolerant society, one where we have nothing to hide.
Thinking in terms of utility. Bentham: the greater good.
Possibly we need to distinguish population data from individual longitudinal data.
We are scared of being watched.

Promoting scientific honesty, thinking in terms of utility and greater good
It's ok to be wrong. And it's ok for things to be incomplete. In fact, sometimes we learn new things from a person's draft, things about their way of thinking that are not obvious in the final product. But it's in our nature to fear being singled out as being wrong, as an outlier. In fact, we (collectively) learn more when things don't work than when they do. The important thing is to learn from our collective mistakes.
Simply put, when something works we don't need to fix it, and so we are less motivated to understand how it works.


I believe that the Wikipedia model shows that objectivity and scientific honesty prevail amidst dialogue. Open data encourages scientific dialogue and complete transparency. Compare the performance of an athlete who trains alone with that of one who trains with others. There are many cases where competition drives progress, but also cases where competition distracts from alternative roads less travelled, inhibits diversity and encourages lying and deceit.

Massive parallelisation of collection and analysis
It is pretty clear that everything needs to be parallelised/distributed.
Massively parallel collection of phenotypes.
Collating data efficiently.  Preserving anonymity.
Genetic risk factors will be updated on the fly.
Parallelisation makes consistency harder.
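As a rough illustration of these notes, here is a minimal sketch of how collection sites working in parallel might report only aggregate counts, which are then merged and used to recompute a risk estimate on the fly. Everything in it is an assumption made for illustration: the `SiteCounts` class, the two sites and their counts are made up, and the odds ratio (with the standard 0.5 continuity correction) simply stands in for whatever risk measure one would actually use. Sharing counts rather than individual records is only a first step towards anonymity, not a guarantee of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Aggregate counts from one collection site (no individual records)."""
    cases_with: int = 0        # cases carrying the variant
    cases_without: int = 0     # cases not carrying the variant
    controls_with: int = 0     # controls carrying the variant
    controls_without: int = 0  # controls not carrying the variant

    def merge(self, other: "SiteCounts") -> "SiteCounts":
        # Addition is commutative and associative, so sites can report in any
        # order and the merged totals stay consistent.
        return SiteCounts(
            self.cases_with + other.cases_with,
            self.cases_without + other.cases_without,
            self.controls_with + other.controls_with,
            self.controls_without + other.controls_without,
        )

    def odds_ratio(self) -> float:
        # Add 0.5 to every cell (Haldane-Anscombe correction) so that an
        # empty cell does not cause a division by zero.
        a = self.cases_with + 0.5
        b = self.cases_without + 0.5
        c = self.controls_with + 0.5
        d = self.controls_without + 0.5
        return (a * d) / (b * c)


# Made-up counts from two hypothetical sites.
site_a = SiteCounts(30, 70, 10, 90)
site_b = SiteCounts(20, 80, 15, 85)

merged = site_a.merge(site_b)
print(merged)
print(round(merged.odds_ratio(), 2))
```

Because merging is just addition, the order in which sites report does not matter. That sidesteps the consistency problem for simple counts, though it would not for analyses that depend on the order of events.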

The objectivity of data, unlabelled data
Putting labels on things can be as useful as it can be destructive. We've learned not to label ourselves; now we need to learn not to label our data. The issue with data sharing is not so much the data itself but the interpretation of it, the label that comes with it, which can be misleading. For example, when you buy a product in the supermarket it comes with a detailed list of ingredients but not with a risk factor. A recent example is 23andMe, who were sued over their diagnostics. It's one thing to collect the data, it's another thing to interpret it. Over-dramatisation carries the risk of causing mass hysteria.
I believe we need to encourage data sharing without the interpretation of the data, or at least provide several interpretations of the data. Every dataset should come with a disclaimer stating that the data is provided as is, that it came off machine X, and that it has undergone the following QC steps.
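Such a disclaimer could travel with the dataset as a small structured record. The sketch below is only one way to picture it: the `DatasetNotice` class, its field names and the example values are all hypothetical, not an existing standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetNotice:
    """Hypothetical disclaimer attached to a shared dataset."""
    instrument: str                                            # where the data came from
    qc_steps: List[str] = field(default_factory=list)          # QC applied, in order
    interpretations: List[str] = field(default_factory=list)   # optional, never just one

    def render(self) -> str:
        lines = [
            "This data is provided as is.",
            f"Source instrument: {self.instrument}",
        ]
        if self.qc_steps:
            lines.append("QC steps applied:")
            lines += [f"  {i}. {step}" for i, step in enumerate(self.qc_steps, 1)]
        else:
            lines.append("QC steps applied: none recorded")
        if self.interpretations:
            lines.append("Possible interpretations (none is definitive):")
            lines += [f"  - {text}" for text in self.interpretations]
        return "\n".join(lines)


# Example values are placeholders, not real instruments or pipelines.
notice = DatasetNotice(
    instrument="machine X",
    qc_steps=["removed samples with low call rate", "removed duplicate samples"],
)
print(notice.render())
```

Keeping the interpretations in a separate, optional list makes it easy to ship the data with none at all, or with several side by side.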
There are many levels of raw data.
Many possible data labels, no label is permanent, many data interpretations
Drowning in data, starving for information
In science there are often many competing hypotheses which, depending on the data, have posterior probabilities of being true. In the light of new data, these posterior probabilities might change or new hypotheses might emerge.
This is the increasingly popular Bayesian way of thinking whereby our beliefs are continuously updated in light of new data.
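The updating described above can be made concrete with a small worked example. This is a minimal sketch: the three hypotheses, their prior probabilities and the likelihood values are made-up numbers chosen purely for illustration, and `update_posteriors` is a hypothetical helper rather than part of any library.

```python
from typing import Dict


def update_posteriors(priors: Dict[str, float],
                      likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("the data is impossible under every hypothesis")
    return {h: value / evidence for h, value in unnormalised.items()}


# Illustrative numbers only: three competing hypotheses and our initial beliefs.
beliefs = {"H1": 0.5, "H2": 0.3, "H3": 0.2}

# Probability of each new observation under each hypothesis (assumed values).
first_observation = {"H1": 0.1, "H2": 0.4, "H3": 0.4}
second_observation = {"H1": 0.2, "H2": 0.1, "H3": 0.6}

# Yesterday's posterior becomes today's prior.
after_first = update_posteriors(beliefs, first_observation)
after_second = update_posteriors(after_first, second_observation)

for name, posterior in (("after first", after_first), ("after second", after_second)):
    print(name, {h: round(p, 3) for h, p in posterior.items()})
```

In this example H1 starts out as the favourite, but after two observations most of the belief has shifted to H3. No hypothesis is discarded outright; its probability simply shrinks. The sketch does not cover the case where an entirely new hypothesis emerges, which would mean adding an entry and redistributing the prior.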
Although we have learned a great deal about genetic data in the last 20 years, there is still a lot we don't know. We have high-level conceptual models of genes, of how the immune system works, of how cancer metastasizes. But in some cases we still have very little power to predict how a disease will evolve or how effective a vaccination will be.
Jumping to conclusions, or to a diagnosis, for the sake of impact is one of the biggest problems we are facing in research: a lack of objectivity. So-called expert judgements overrule objectivity: dismissing competing hypotheses, oversimplifying before enough evidence has been gathered, making leaps of reasoning, favouring elegant solutions. People blindly follow the opinion of so-called experts. Still, we need a minimum of trust, which is the point of peer review: to establish a knowledge base. Some journals are more trusted than others.
Keeping our options open
But what if we lack the expertise to analyse the data? When do we choose to suspend our disbelief? A more mundane example: say I bring my car to the garage for a road test. Do I trust the mechanic's diagnosis? Do I seek a second opinion?

What matters is that we are allowed to question and have these options at our disposal. We don't own the earth and we don't own our genetic code; we are merely borrowing it from future generations.
We are only transiently here but we have the chance of contributing to something that may outlive us all.
My naive hope is that the future of data sharing is much simpler than the present.

I believe (albeit naively) that humanity has come of age, that our tolerance, understanding, and scientific openness have reached a point where data can be put in the public space without worries about confidentiality or fear of judgement or reprisal.
That we become as open about our genetics and medical problems as we are about our thoughts, religion, sexual orientation. That these things aren't newsworthy anymore. If anything, genetics shows us that we are all exceptions: we all carry minor alleles which distinguish us from everyone else. We are all genetically flawed in some way. It's normal to be different; the mean doesn't exist.
I see a future so simple that data sharing is no longer a newsworthy question. My job, though, is not to predict that future but to enable it.