Hunting single genes does not give a complete picture of how the human body works and how illnesses develop. Instead, we should look at the networks of genes and proteins, says University of Tartu Professor Johan Björkegren.
You published a review article in Science Translational Medicine where you talk about ‘new biology’. What is wrong with the old one?
We are using ‘new’ in a double sense (also short for ‘network enabled wisdom’ – A.O.). We wanted to refer to the fact that, until recently, researchers could address little more than a single gene or a single protein and then try to figure out the role of that protein in a particular pathway.
We are not saying there was anything wrong with that but there were no tools to do anything else. All over the world most professors in biology and medicine are professors because they are experts on a particular pathway, sometimes even a particular gene.
Since the turn of the millennium there has been great belief in what the new knowledge of the genome would bring. It is a little bit like entering a library. The library with all the books and the information that is in there is amazing, but if you don’t know what to look for you are sort of lost.
That describes a little of what happened when we saw the entire library at the turn of the millennium. I guess the first ten years have been quite a disappointment, but we are now starting to understand that the genome is very complex. In order to understand the genome we need to understand what is activated in the genome.
With the new biology, when we can screen the activity of genes, we can also understand that in different cell types, and under different physiological and pathological conditions, different sections of the library are used. Depending on what particular section is used in a particular cell – say, a liver cell – we understand which DNA variants are important for that particular situation. The new biology builds on the fact that there are all these new tools which are demanding in many ways.
Which tools?
Besides genome-wide screens we also needed a way to handle all this big data, as they call it more frequently now. We not only generate the data, we also need to analyse it. This is what [co-author, professor of Mount Sinai School of Medicine] Eric Schadt has been really instrumental in. He developed many useful ways to analyse human genome data to find these networks.
Three developments have been instrumental in using the new biology: the development of the Internet, the development of new sequencing technologies and also the development of algorithms to infer networks from this type of data.
How to describe those networks of active genes or proteins?
If you look at the way we transport things, we do not transport things directly from one small town to another, but rather use hubs. The real big hubs, if you look at airports, for example, are quite few. Also, in social networks, most people don’t have interactions beyond the family and maybe a few friends.
But then there are these people that are very well connected. Most of us maybe know someone who knows someone who is really famous or otherwise well connected. This seems to be an efficient way of transporting information or energy or whatever it could be.
What the networks are really providing, why they are so important, is to show what these hubs are, these central regulators that we know are so important in the social and transportation networks. We believe that the hubs in these molecular networks of genes and proteins are equally important.
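(Editorial illustration: the hub idea described above can be made concrete with a small sketch. It is not the inference method used by Schadt and Björkegren. It assumes a network is already given as a list of connections and simply counts how many connections each node has. The node names, the edges and the threshold of four connections are all invented for the example.)

```python
from collections import Counter

# Hypothetical toy network: node names are placeholders, not real genes,
# and the edges are made up purely for illustration.
edges = [
    ("regulator_A", "enzyme_1"),
    ("regulator_A", "enzyme_2"),
    ("regulator_A", "receptor_1"),
    ("regulator_A", "signal_1"),
    ("enzyme_1", "enzyme_2"),
    ("receptor_1", "signal_1"),
    ("regulator_B", "signal_1"),
]


def degree_counts(edge_list):
    """Count how many connections (edges) touch each node."""
    degrees = Counter()
    for a, b in edge_list:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def find_hubs(edge_list, min_degree):
    """Return the nodes with at least min_degree connections, sorted by name."""
    degrees = degree_counts(edge_list)
    return sorted(node for node, d in degrees.items() if d >= min_degree)


degrees = degree_counts(edges)
# The threshold of 4 is an arbitrary assumption for this toy example.
hubs = find_hubs(edges, min_degree=4)
print(hubs)
```

In this toy network only the placeholder node `regulator_A` passes the threshold, even though it takes no part in the "effector" connections between the other nodes. Counting connections is just one simple way to rank nodes; real network inference from expression data is considerably more involved.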
Surprisingly, many of the genes seen in the networks we have inferred so far are new genes. For example, we find that many of the fatty acid metabolism genes in the network that are already known are not the hubs. Except for one or two of the known genes, many of the hubs of fatty acid metabolism were previously not linked to fatty acid metabolism at all. Why is that?
Many of the genes that we already know seem to be what we call effector genes. They have a particular role in biology: They can code for receptors, enzymes or signalling molecules. What we also noticed is that we have not associated the new genes with well-known pathways before.
The reason is that they don’t have an effector role, they have a regulatory role. They are not discovered when you study fatty acids because they are not involved in the actual production or degradation of fatty acids, they are just regulators [of other genes].
Your article leaves the impression that genome-wide association studies (GWAS) that keep producing news about finding genes for one or another thing, are a wild goose chase when we try to understand human physiology or complex diseases. How do network models help us to understand the real causes of these diseases?
It is not really a wild goose chase, but we are criticising the way the GWAS databases have been analysed thus far. They are seeking the most significant variants for a particular disease. We are not saying that these variants are not true, not even saying these are not relevant, but we are saying the way the analysis has been performed thus far on the GWAS data sets has only revealed a very small fraction of the total risk scenario.
The analysis has been designed in such a way that the environmental factors we think activate DNA variants important for the disease are not discovered. The variants that are discovered are completely independent of micro- and macro-environments. They do not have a huge effect on disease.
To make a comparison: when listening to a song, the GWAS findings would be like lowering the volume a little bit or raising it a little bit. It could be that they are systemically important but we don’t think they are really good targets for finding new treatments because the variants that have been found are likely to affect the disease over a very long time.
They are independent of the environment. They actually should start affecting the risk for the disease from the day you were born, then subtly increase in risk throughout life, and when you are 60 it will give you some effect on cardiovascular disease.
But what we are saying and what has been also shown in type-II diabetes is that important risk factors, the inherited DNA risk factors for these complex diseases, remain hidden in the GWAS data set.
If you look at people of normal weight, they don’t suffer from any of these risk variants, so for them having a risk variant is not really a problem because it is not affecting risk. However, when you look at the same variant in people who are slightly overweight, with a body mass index over 26, then you see that those variants constitute a risk for disease.
Meaning that certain environmental factors affect a certain gene, which then distorts the balance of the network, thereby causing illness?
Exactly. The traditional way has only found the variants that continuously distorted the network, but with the network models we can find those that need given environmental pressure. It could be aging, smoking, obesity or something else that has an impact on you.
We distinguish between the macro-environment, which includes factors such as smoking and being overweight, and the micro-environment, which is the environment in different tissues. The macro-environment affects the micro-environment on a second level. That will change the battery of co-factors and all of a sudden a variant that has been silent is active and starts contributing to disease. In order to find these you need to look at the networks instead of analysing individual genes.
The GWAS data sets are a good asset, and the Estonian biobank will be useful in the future as well, but we need to find new ways to examine these databases. The traditional ways of finding one or two variants that are the most significant ones will only explain 5–10 per cent of the risk variation of the population. 80–90 per cent remains hidden. We still think these variants are to be found in the GWAS database, but you need networks to find these variants that are environmentally dependent.
If one disease can be caused by distorting the networks in many different ways, are we actually seeing many different diseases where we thought we had just one?
To take the example of atherosclerosis, the molecular mechanism is very similar even between humans and mice. However, there are many ways to trigger atherosclerosis depending on different environmental risk factors that act in conjunction with DNA variants.
It could be hypertension, or it could be obesity. Therefore, variants that affect hypertension, hypercholesterolemia, obesity or diabetes all have the potential to cause the development of atherosclerosis in a secondary way.
You can view this as a funnel where many different diseases can contribute to atherosclerosis, but the process for atherosclerosis is very similar. But there are many ways of triggering that network to cause more atherosclerosis.
Does the multitude of root causes mean that novel treatments are harder to find? In the article you also talk more about prevention than treatment.
Networks are not only about disease, they are there for physiology. There are clearly many networks to be defined, even normally functioning ones.
Let’s take the example of liver and say you have a liver disease, maybe hepatitis. It is still uncertain how that will affect the physiological networks – whether they are replaced or altered with inflammatory components or if there are new cell types that will contribute to an entirely new inflammatory network. We need to understand those physiological networks and then we also need to understand the pathological networks.
What we see in the future is that these networks will eventually also be mirrored by markers in blood. If we can understand fairly well how the networks are functioning in organs, then taking a blood sample and screening maybe 3000 protein markers will give us a good hint when we have something going on with, for example, the liver network or some other network in the fat.
We see that the future of medicine is in finding new drugs that treat different pathological networks which are the root of the evil. Today, I would say that the majority of our treatments are actually treating symptoms and not the cause.
We hope there will be much more focus on health so that in the future you might normally have your first blood and DNA screen at 35 or 40. From that we could tell when there is clearly some pathological thing going on, and then you could receive treatment, although you feel perfectly healthy. In that way we will treat early on – that’s the preventive part.
When is the future coming?
In maybe five years we will have an atlas of networks, and then in maybe another five years we will start understanding treatments. It is important to mention that what we are suggesting now has already been achieved. This is not only a vision, this is actually happening.
In a cohort similar to the one I am working with for cardiovascular disease, Eric Schadt has collected samples from the fat tissue and liver of diabetic patients. That has led to several new drugs in the pipeline.
You’re in Tartu not only because of blood ties (Johan Björkegren’s maternal grandparents fled Estonia in 1944 – A.O.) but also for the quality of research you can do in Tartu. What does Estonia have to offer?
I also have a lab at Karolinska Institute, where my research goals are the same: We are focusing on myocardial infarction and atherosclerosis. At Karolinska there is quite a lot of research interest from the clinical doctors, and from early on I learned of Arno Ruusalepp, who defended his PhD at Karolinska. He moved back to Tartu and is now one of the most important heart surgeons in Tartu.
I realised that Tartu has an ambition to really make a difference. You are in a state of development in Estonia, where you are building up the society, and while doing it why not do it the best way? University of Tartu, with its centres – particularly the Centre of Translational Genomics that has now been set up – very much wants to develop this personalised and preventive medicine based on the new insights in genomic sciences.
The Estonian version of the interview originally appeared in Postimees.
Schadt EE, Björkegren JL (2012). NEW: network-enabled wisdom in biology, medicine, and health care. Science Translational Medicine, 4(115). PMID: 22218693