Not So Big Data Blog Ramblings of a data engineer (or two)

Attempting to simulate the Antagonistic Pleiotropy Hypothesis

17 minute read


Foreword

Hi all. This was a fascinating rabbit hole I found myself descending into. The world of biology, and evolutionary biology in particular, is intoxicating to me. It seeks to both explain the world that came before, and to predict certain behaviours of the natural world (often) long before we have the scientific means to prove the underlying mechanism. In this post, we’ll explore a small sliver of evolutionary biology – and do so entirely as a non-expert. If I’ve made any glaring mistakes, please send us an email or leave a comment!

I was set off along this exploratory journey after listening to brothers Brett and Eric Weinstein discussing Brett’s fascinating career as an evolutionary biologist over on Eric’s podcast, The Portal. The episode runs over two hours and is well worth a listen to hear Brett’s story of his masterful and insightful prediction regarding long telomeres (we’ll get to what these are later) in lab mice, as well as the corrupt forces in academia that, to paraphrase his older brother, “robbed him of his place in history”.

The topic of the Antagonistic Pleiotropy Hypothesis is a relatively minor footnote in their larger discussion, but the idea was a fun one that I impulsively began exploring with code. I’ve decided to split this post into two major parts, the first exploring what the Antagonistic Pleiotropy Hypothesis is and its implications. In the second part, I’ll share how we can potentially see its effect and behaviour in action by simulating an evolutionary environment with its own selective pressures, and observe the prevalence of various genes within a population of simplified animals.

Part Zero: A primer on Evolution


I understand that not everyone may be familiar with evolution (or its most famous mechanism – natural selection) and the associated terminology. So, just to make sure we’re all on the same page, let’s go over the basics[1] at a high level. Hopefully we’ll also clear up some minor misconceptions along the way.

There are two important terms to understand, the first of which is evolution.

Evolution is a change in heritable characteristics of biological populations over time.

In other words, evolution is a process of change. But what causes evolution to occur? Is it purely random change at the genetic level (for example, through mutation), or is there a more deterministic process? The most famous evolutionary mechanism is natural selection, as popularized by Charles Darwin in On the Origin of Species.

Natural selection is the differential survival and reproduction of individuals due to differences in phenotype.

Put differently, there is some degree of variation within biological populations, due to differences in the genetics of individual members. Some of these traits are beneficial or detrimental to an individual, either in terms of survival or reproduction. Since an offspring’s genes come from its parents (plus some chance of a random mutation), over time these “beneficial” genes will accumulate within the population. Accumulate enough of these changes, and you eventually arrive at speciation, which is another similarly fascinating topic we’ll cover another day.

What’s important to grasp, however, is that natural selection can only act on what nature “sees”. If a gene is present but doesn’t express itself as an observable trait (what’s known as a phenotype), then nature cannot “act” on that trait (i.e. select for or against that gene). This becomes important when we discuss the Antagonistic Pleiotropy Hypothesis.

Part One: What is the Antagonistic Pleiotropy Hypothesis anyway?


Against all odds, the Wikipedia article actually gives a rather good summary.

But, put differently, the Antagonistic Pleiotropy Hypothesis suggests that if you have a single gene that controls more than one trait or phenotype (pleiotropy), and one of these traits is beneficial to the organism in early life, and another is detrimental to the organism in later life (making the two phenotypes antagonistic in nature), then this gene will accumulate in the population.

This idea, among a few other foundational ones, was proposed by George C. Williams in his 1957 article Pleiotropy, Natural Selection, and the Evolution of Senescence. George C. Williams is a big deal in the biological world and, if you’re even slightly curious to learn more, I’d highly recommend skimming through his paper (this particular link isn’t behind a paywall) or grabbing it for later reading.

Let’s take an example. Imagine a single gene controlling two traits in an animal. Let’s assume that if the gene is present, it:

  1. Makes the animal better at finding food, and thus surviving in early life (since if it cannot find food, it’ll die).
  2. But makes the animal more likely to die of disease as it ages.

In this case, the hypothesis predicts that, since finding food contributes favourably to surviving in early life, this gene will accumulate in the population despite the penalty the animal will pay in later life.

Intuitively, this makes sense: if your primary bottleneck for surviving until you can reproduce is finding food, then nature is unable to “see” the detrimental trait that occurs later in life, and thus the gene will accumulate (at least initially!).
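The following back-of-the-envelope calculation in Python makes that intuition concrete. It is a toy of my own and is separate from the simulation in Part Two. The yearly survival numbers are made up, and the calculation assumes an animal gets one chance to reproduce in every year it survives. It compares the expected lifetime number of offspring for an animal without the gene against one whose gene helps early on but is punishing later in life.

```python
def expected_offspring(yearly_survival, p_reproduce=0.5):
    """Expected number of births over a lifetime for one hypothetical genotype.

    yearly_survival[k] is the probability of surviving year k, given the animal
    was alive at the start of that year. An animal that survives a year gets one
    chance (with probability p_reproduce) to reproduce in that year.
    """
    p_alive = 1.0
    total = 0.0
    for p_survive in yearly_survival:
        p_alive *= p_survive              # probability of still being alive after this year
        total += p_alive * p_reproduce    # expected births contributed by this year
    return total


# Hypothetical numbers, chosen only to illustrate the argument.
no_gene = [0.6] * 6                       # steady 60% yearly survival
with_gene = [0.75] * 3 + [0.3] * 3        # better early survival, far worse late survival

print(f"{expected_offspring(no_gene):.3f}")    # 0.715
print(f"{expected_offspring(with_gene):.3f}")  # 0.955
```

With these numbers, the gene carrier comes out ahead. Most of its expected offspring are produced in the first few years, before the late-life penalty has a chance to matter.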

But then things get fascinating. As the gene begins to accumulate in the population, and the individuals become more and more successful at surviving, the population begins to increasingly suffer from the detrimental trait as they age. Steadily, nature begins to “see” this detrimental phenotype, and can now select against it. Et voilà, you have two “antagonistic” phenotypes, controlled by a single gene. Now, it becomes a balancing game for nature.

So why is this such a curious hypothesis? There are multiple reasons, but the one that absolutely captured my imagination is that it quite possibly explains why we age. And this was, in fact, what George C. Williams based his idea on – using the Antagonistic Pleiotropy Hypothesis as an explanation for senescence (aging).

Death by mortality. Death by immortality.


What I’m about to briefly summarize is discussed in much greater detail in the conversation between Brett and Eric I mentioned in the Foreword. If you’re hungry for more after reading through this, that’s where you should begin (perhaps along with George C. Williams’ paper).

So, what’s with the title about death and (im)mortality?

There’s this curious entity called a telomere, which is a stretch of repeating nucleotide sequences (the stuff DNA is made of) at the end of a chromosome. The telomere is interesting in that it shortens each time a chromosome replicates (when cells divide!). When the telomere becomes too short, the cell can no longer divide (it has reached what’s known as the Hayflick limit). If our cells are no longer able to divide, we can no longer repair and maintain our bodies – and so, essentially, we age.

But why do telomeres exist in the first place? Isn’t it bizarre that evolution has selected for aging? Surely it’s advantageous for cells to be able to replicate forever, dividing continuously, allowing organisms to indefinitely repair damage and maintain their bodies?

“Continuously divide”.

Oh.

You mean like cancer?

Yes indeed, nature already knows how to live indefinitely – it solved the mortality problem a long time ago. After all, cancer cells just keep replicating if left alone, seemingly without ever encountering their own Hayflick limit.

Have you ever thought about how your body is able to heal when you cut your finger? Poorly summarized, your cells essentially “listen” to chemical signals from their neighbors. If they can’t “hear” this signal (presumably because you’ve separated them using a knife), they divide in order to grow into the gap. But let’s imagine something strange occurs: what happens if a clump of cells goes deaf?

Presumably, they divide. And they keep dividing, unable to “hear” their neighbors. And they’ll divide indefinitely. If only there were a way to stop cells from dividing after a certain point, to prevent something like this from happening. Enter the telomere. This is likely where moles on your skin come from – cells that have gone, for lack of a better term, “deaf”. Fortunately, however, they stopped dividing once their division “counter” ran out.

And so, we have ourselves an antagonistic pleiotropy scenario. We have, say, one gene that controls the length of our telomeres. Longer telomeres allow us to heal more rapidly and effectively, sustain more cellular damage, and therefore live longer. But they come at a risk of uncontrolled cell division (cancer). In contrast, shorter telomeres greatly limit our ability to heal, so we age more quickly, but the risk of cancer is significantly reduced.

One gene (telomere length). Multiple phenotypes (living longer vs. cancer). And since not dying of cancer early in life greatly increases the odds of reproducing, this gene will accumulate in the population.

What this signals to me personally, however, is that, sadly[2], according to the rules of nature anyway, we appear to be destined to die. Either from mortality, as our cell-division counter winds down and we age, or through the scourge of immortality (cancer). A thought both sobering and profound, perhaps worth dwelling on for a moment.

But let’s not linger too long here. Onwards to Part Two.

Part Two: Can we simulate it?


Whenever I encounter a process where small differences in probability result in larger, accumulating changes over time, I always find myself wanting to build and simulate a model, if for no other reason than as a fun exercise that might grant a little insight. This was no exception.

So, what do we need to simulate the Antagonistic Pleiotropy Hypothesis? It’s (perhaps surprisingly) not that complicated to set up an evolutionary scenario. We need:

  1. An environment, that contains some kind of resource (let’s say, food).
  2. This resource is limited in some way (this enforces competition, or selective pressure).
  3. A species of organism that contains “genes”.
  4. These “genes” may or may not manifest as observable phenotypes in the organism.
  5. The organism must be able to reproduce, die and find food, with varying degrees of success (we’ll model these as probabilities).
  6. If an organism reproduces, the offspring’s genes must come from its parents.
  7. If an organism reproduces, the offspring must have a small probability for its genes to mutate (one of the genes can randomly mutate to any possible gene, even those that are not inherited).

The last point is particularly important – there needs to be some level of variation in the genetic material of the offspring. Not having any variation rapidly becomes a genetic death march, as a species is unable to adapt in any way to its environment through natural selection. This is a key requirement for adaptation, so we mustn’t forget this!

And, to extend the evolutionary scenario to one where we can investigate the Antagonistic Pleiotropy Hypothesis, we’ll modify things so that:

  1. One of the genes becomes pleiotropic in nature (influences at least 2 of the organism’s observable traits: being able to reproduce, die and find food)
  2. This pleiotropic gene must benefit the organism in its early life, and penalize it in its late life. The strength of these effects is currently unknown.

There’s some nuance in there, but it’s simple enough. It shouldn’t be too tricky to implement.

Our magical organism

Our species of Animal (any organism will do, but I’m going with Animal) has a single “chromosome” that consists of four genes, each of which can be one of a, b, c or d[3].

We’ll also be simplifying things just a tiny bit for the sake of the code: each Animal can have multiples of the same gene. So, an Animal with ['a', 'b', 'c', 'd'] as its chromosome is just as valid as ['a', 'a', 'd', 'c']. We ignore the order, and the more copies of a gene you have, the more powerful its effect[4] (so having two a genes means its effect will be twice as strong). Also, if an animal does not find food (whether from lack of food or otherwise), it dies.
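The post’s own code isn’t published yet, so here is a minimal sketch of this representation in Python. The `Animal` class, its `count` method and `same_genotype` are hypothetical names. The chromosome is just a list, and counting copies of a gene with `collections.Counter` gives the “more copies, stronger effect” multiplier while ignoring order.

```python
from collections import Counter
from dataclasses import dataclass

GENES = ("a", "b", "c", "d")


@dataclass
class Animal:
    chromosome: list  # four genes from GENES; duplicates allowed, order ignored
    age: int = 0

    def count(self, gene):
        """How many copies of `gene` this animal carries (more copies, stronger effect)."""
        return Counter(self.chromosome)[gene]


def same_genotype(x, y):
    """Order is ignored: two animals match if they carry the same multiset of genes."""
    return Counter(x.chromosome) == Counter(y.chromosome)


print(Animal(["a", "b", "c", "d"]).count("a"))  # 1
print(Animal(["a", "a", "d", "c"]).count("a"))  # 2
print(same_genotype(Animal(["a", "a", "d", "c"]), Animal(["c", "a", "d", "a"])))  # True
```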

We also need to discuss how reproduction is implemented in our universe. When an animal reproduces, it breeds with another randomly chosen member of the population. The offspring’s chromosome takes two randomly chosen genes from each parent (a 50/50 split). In the case of a mutation, one of these inherited genes is replaced by a gene uniformly sampled from the list of genes (['a', 'b', 'c', 'd']).
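The breeding rule can be sketched as follows. `breed` is a hypothetical helper, not the post’s actual code. I’ve read “two genes from each parent” as drawing two genes without replacement from each parent’s chromosome. I’ve also assumed that a mutation hits exactly one randomly chosen position.

```python
import random

GENES = ("a", "b", "c", "d")
P_MUTATE = 0.01  # p(mutate | reproduce)


def breed(parent1, parent2, rng=random):
    """Return an offspring chromosome: two random genes from each parent.

    With probability P_MUTATE, one of the four inherited genes is replaced by a gene
    sampled uniformly from GENES, so genes neither parent carries can reappear.
    """
    child = rng.sample(parent1, 2) + rng.sample(parent2, 2)
    if rng.random() < P_MUTATE:
        child[rng.randrange(len(child))] = rng.choice(GENES)
    return child


print(breed(["a", "a", "b", "c"], ["d", "d", "d", "a"], random.Random(42)))
```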

Each Animal has the following base probabilities:

  • \( p(\text{food}) = 0.6 \)
  • \( p(\text{reproduce}) = 0.5 \)
  • \( p(\text{death}) = 0.05 \cdot \text{age}\)
  • \( p(\text{mutate} \vert \text{reproduce}) = 0.01 \)
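In code, these base probabilities are just constants plus a small helper for drawing a random event. The names are illustrative. Capping the death probability at 1 is my own assumption, mirroring how the post clips the food probability. The cap only matters once \(0.05 \cdot \text{age}\) exceeds 1, i.e. beyond age 20.

```python
import random

P_FOOD = 0.6
P_REPRODUCE = 0.5
P_MUTATE_GIVEN_REPRODUCE = 0.01


def p_death(age):
    """Base yearly probability of dying: 0.05 * age, capped at 1 (the cap is an assumption)."""
    return min(1.0, 0.05 * age)


def happens(p, rng=random):
    """Draw a yes/no event that occurs with probability p."""
    return rng.random() < p


rng = random.Random(7)
print([p_death(age) for age in (0, 10, 30)])                        # [0.0, 0.5, 1.0]
print(sum(happens(P_FOOD, rng) for _ in range(10_000)) / 10_000)   # close to 0.6
```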

Our environment

Our environment is simple. The environment starts with a certain amount of food at the beginning of the simulation. Each time step (which represents one “year” or “generation”) sees the environment replenish a fixed amount of food. In order to keep things fair-ish when food is scarce, animals eat in a random order. And, of course, if the food is exhausted within a year, no additional animals can eat. This is part of the environmental pressure that will drive natural selection.
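One year of feeding might look like the sketch below. The following assumptions are not stated in the post:

- each successful meal uses exactly one unit of food;
- an animal that fails its p(food) roll doesn’t consume anything;
- the caller adds the yearly replenishment to `food` before calling the hypothetical `feed` function.

```python
import random


def feed(animals, food, p_food, rng=random):
    """One year of feeding; returns (fed_animals, food_left).

    Animals look for food in a random order. Each one eats (using up one unit)
    with probability p_food(animal), but only while food remains. Animals that
    don't eat die, so only the fed ones are returned.
    """
    order = list(animals)
    rng.shuffle(order)                      # random order keeps things fair-ish when food is scarce
    fed = []
    for animal in order:
        if food > 0 and rng.random() < p_food(animal):
            food -= 1                       # assumption: each meal uses exactly one unit
            fed.append(animal)
    return fed, food


# Hypothetical year: 10 animals, but only 3 units of food in the environment
fed, left = feed(range(10), food=3, p_food=lambda animal: 0.6, rng=random.Random(0))
print(len(fed), left)                       # at most 3 animals eat; left == 3 - len(fed)
```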

Simulation without gene effects

For now, we haven’t programmed any gene effects (so there’s no benefit or detriment to having a particular gene). In this scenario, we’d expect the distribution of genes in the population to be random for an individual simulation, and to stay approximately uniform in general (law of large numbers and all that). Things are potentially a bit trickier than that[5], but it’s a decent enough hypothesis. We’ll also start off our initial batch of organisms by uniformly sampling from all possible genes when constructing each initial Animal (so the gene distribution will be more or less uniform for our first batch).
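Seeding the starting population and measuring gene proportions could look like this. `initial_population` and `gene_proportions` are hypothetical names. The proportion is computed over every gene in every chromosome, which is my reading of the “proportion of a gene in the population” that the plots track.

```python
import random
from collections import Counter

GENES = ("a", "b", "c", "d")


def initial_population(size, rng=random):
    """Each starting chromosome gets four genes sampled uniformly (with replacement) from GENES."""
    return [[rng.choice(GENES) for _ in range(4)] for _ in range(size)]


def gene_proportions(population):
    """Share of each gene among all genes carried by the population."""
    counts = Counter(gene for chromosome in population for gene in chromosome)
    total = sum(counts.values())
    if total == 0:                            # extinct population: nothing to measure
        return {gene: 0.0 for gene in GENES}
    return {gene: counts[gene] / total for gene in GENES}


population = initial_population(1000, random.Random(1))
print(gene_proportions(population))           # each share should land close to 0.25
```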

Let’s run the simulation a few times and see what happens:

[Figure: individual simulations]

As expected, more or less random. Some genes rise to prominence some of the time. This is what we’d expect to see when the genetics of an individual has zero effect on their observable traits. Let’s aggregate all of the individual trials to get a more general description of our simulations:

[Figure: aggregate simulations]

And as you can see, it’s more or less uniform (and tends towards uniform as we increase the number of trials). All good, nothing too surprising.

Let’s get on to the fun part.

Simulating with gene effects (but no pleiotropy)

To test our understanding (and that everything is working correctly), let’s add a single gene effect.

Let’s encode that having an a gene makes you a little bit more effective at finding food, say 5 percentage points better: \( p(\text{food} \vert \text{gene a}) = p(\text{food}) + 0.05 \cdot n_{\text{a}} \)[6]. Remember that having more of gene a will multiply the effect (\(n_{\text{a}}\) is the number of a genes). We don’t make the animal pay any penalties, for now.
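As a sketch, the food bonus and the clipping from footnote 6 fit in one line. The function name and the `bonus_per_a` parameter are my own. The default of 0.05 matches the scenario above.

```python
P_FOOD = 0.6  # base probability of finding food


def p_food(chromosome, bonus_per_a=0.05):
    """p(food | gene a) = p(food) + bonus * n_a, clipped to a maximum of 1."""
    n_a = chromosome.count("a")
    return min(1.0, P_FOOD + bonus_per_a * n_a)


print(p_food(["b", "c", "d", "b"]))                    # 0.6 (no a genes, base rate)
print(p_food(["a", "a", "c", "d"]))                    # roughly 0.7 (two a genes)
print(p_food(["a", "a", "a", "a"], bonus_per_a=0.15))  # 1.0 (0.6 + 0.6 gets clipped)
```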

We would expect to see the a gene become more prevalent within the population[7], since it provides a beneficial trait to individuals, making them more likely to survive and reproduce (and thus pass on their genes).

Let’s see what happens:

[Figure: individual simulations, single gene]

Great! So, we definitely see the a gene consistently become more common in the population – and quite rapidly so. It’s remarkable how strong the effect of a relatively small 0.05 probability bump can be, given enough time. Mind you, that effect is diluted whenever an animal is unlucky enough to get its turn after the food has already run out. It demonstrates quite clearly how, given enough time, marginal gains result in large-scale change across a population.
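A textbook deterministic selection model gives some intuition for why a 0.05 bump compounds so quickly. This is a deliberately idealized stand-in, not the simulation itself. It treats the population as just two types:

- carriers of a single a gene, which find food with probability 0.65;
- everyone else, at 0.6.

It uses p(food) as relative fitness and ignores food running out, aging, mutation and recombination. `carrier_share` is a hypothetical name.

```python
def carrier_share(generations, p_food_carrier=0.65, p_food_other=0.6, start=0.25):
    """Deterministic two-type selection: share of 'a' carriers after each generation.

    Each type's share is weighted by its assumed relative fitness (here, just its
    probability of finding food) and the shares are then renormalised to sum to 1.
    """
    x = start
    history = [x]
    for _ in range(generations):
        w_carrier = x * p_food_carrier
        w_other = (1 - x) * p_food_other
        x = w_carrier / (w_carrier + w_other)
        history.append(x)
    return history


shares = carrier_share(50)
print(f"start {shares[0]:.2f}, after 10: {shares[10]:.2f}, after 50: {shares[50]:.2f}")
```

Each generation multiplies the carriers’ odds (the carriers’ share divided by everyone else’s share) by 0.65 / 0.6 ≈ 1.083. The odds therefore grow geometrically, which is the “marginal gains compound” effect in its simplest form.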

There’s also an interesting side effect – a’s rise to prominence seems to occasionally be accompanied by another random gene. This is likely an artifact of our reproduction mechanism (two genes from each parent).

Let’s look at the aggregate, and let the law of large numbers reveal some less noisy trends for us:

[Figure: aggregate simulations, single gene]

The trend-line tells the whole story. Over time, the a gene accumulates, as we predicted.

But now, let’s tackle the whole point of this post – the pleiotropic case.

Simulating antagonistic pleiotropy

Let’s, at last, run the simulation for the antagonistic pleiotropic case – where one gene expresses in two observable ways, one benefiting the organism in early life and the other penalizing the organism in later life.

Let’s take our previous scenario, and add an antagonistic effect of a that makes the organism more susceptible to death as it ages:

\[p(\text{death} \vert \text{gene a}) = p(\text{death} \vert \text{age}) + 0.15 \cdot n_{\text{a}} \cdot \text{age},\]
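In code, the death penalty just adds an age-scaled term per copy of a. As before, this is a sketch with a hypothetical function name, and capping at 1 is an assumption. The penalty is exactly zero at age 0, which is precisely why selection can’t “see” it early on.

```python
def p_death(chromosome, age, penalty_per_a=0.15):
    """p(death | gene a) = 0.05 * age + penalty * n_a * age, capped at 1."""
    n_a = chromosome.count("a")
    return min(1.0, 0.05 * age + penalty_per_a * n_a * age)


print(p_death(["a", "a", "a", "a"], age=0))  # 0.0: no penalty at all for the young
print(p_death(["a", "b", "c", "d"], age=1))  # roughly 0.2 (0.05 base + 0.15 for one a)
print(p_death(["a", "a", "c", "d"], age=3))  # 1.0: 0.15 + 0.9 exceeds 1 and is capped
```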

and watch what happens:

[Figure: individual simulations, pleiotropy]

Ooph. That’s no good. It looks as if our a gene punishes the animals a little too harshly[8]. So let’s boost the benefit the animal gets in early life a little bit. We’ll keep the effect of a on dying the same, and instead boost the probability of finding food a touch:

\[p(\text{food} \vert \text{gene a}) = p(\text{food}) + 0.15 \cdot n_{\text{a}}.\]

Let’s re-run things and take a look:

[Figure: individual simulations, pleiotropy (boosted food effect)]

Ah. Even though each a gene raises the probability of death by 0.15 for every year of age, we do see the a gene accumulate in the population, provided each a gene also bumps the probability of finding food by 0.15. This nicely illustrates the “antagonistic” in “Antagonistic Pleiotropy Hypothesis”.

Finding the antagonistic tipping point

What’s peculiar about these kinds of experiments is that natural selection selects for genes in unpredictable ways. The accumulation of the a gene in the population for different values of \(p(\text{food} \vert \text{gene a})\) and \(p(\text{death} \vert \text{gene a})\) is not always easily predicted.

But we have computation on our side! So let’s run simulations[9] for a range of combinations of \(p(\text{food} \vert \text{gene a})\) and \(p(\text{death} \vert \text{gene a})\), and plot the average proportion of a genes in the population for each scenario:

[Figure: average proportion of gene a for each combination of food and death effects (food_death_grid)]

In this plot, the fill indicates the average proportion of the a gene across the whole gene population. The x-axis indicates the effective probability of finding food if an organism has one a gene (of course, more a genes increase the effect, up to a maximum of \(p(\text{food}) = 1\)). The y-axis shows the same thing, but for \(p(\text{death})\) per ‘year’ the organism survives.
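The sweep itself is conceptually simple. Loop over a grid of (food bonus, death penalty) pairs, run several seeded trials for each, and average the final share of a. The post’s code isn’t published yet, so everything below is a compact, hypothetical reimplementation. The following choices are all my own assumptions:

- the within-year order (feed, then age-related death, then breeding);
- a fixed food budget per year with no carry-over;
- skipping runs where the population dies out.

The grid is kept tiny so it runs in seconds rather than hours.

```python
import random

GENES = ("a", "b", "c", "d")


def simulate(food_bonus, death_penalty, years=50, start=100, food_per_year=100, seed=0):
    """One run of a simplified model; returns the final share of gene a (None if extinct)."""
    rng = random.Random(seed)
    # Each animal is (chromosome, age); starting genes are sampled uniformly.
    animals = [([rng.choice(GENES) for _ in range(4)], 0) for _ in range(start)]
    for _ in range(years):
        food = food_per_year                  # assumption: fixed yearly budget, no carry-over
        rng.shuffle(animals)                  # animals eat in a random order
        survivors = []
        for chrom, age in animals:
            n_a = chrom.count("a")
            p_food = min(1.0, 0.6 + food_bonus * n_a)
            if food > 0 and rng.random() < p_food:   # unfed animals die
                food -= 1
                p_death = min(1.0, 0.05 * age + death_penalty * n_a * age)
                if rng.random() >= p_death:
                    survivors.append((chrom, age + 1))
        babies = []
        for chrom, _ in survivors:
            if rng.random() < 0.5:                   # p(reproduce)
                mate, _ = rng.choice(survivors)      # random partner (may pick itself: a simplification)
                child = rng.sample(chrom, 2) + rng.sample(mate, 2)
                if rng.random() < 0.01:              # p(mutate | reproduce)
                    child[rng.randrange(4)] = rng.choice(GENES)
                babies.append((child, 0))
        animals = survivors + babies
        if not animals:
            return None
    genes = [g for chrom, _ in animals for g in chrom]
    return genes.count("a") / len(genes)


def sweep(food_bonuses, death_penalties, trials=3, **kwargs):
    """Average final share of gene a for each (food bonus, death penalty) pair."""
    grid = {}
    for fb in food_bonuses:
        for dp in death_penalties:
            runs = [simulate(fb, dp, seed=s, **kwargs) for s in range(trials)]
            alive = [r for r in runs if r is not None]   # skip extinct populations
            grid[(fb, dp)] = sum(alive) / len(alive) if alive else float("nan")
    return grid


grid = sweep([0.05, 0.15, 0.3], [0.0, 0.15, 0.3])
for (fb, dp), share in sorted(grid.items()):
    print(f"food bonus {fb:.2f}, death penalty {dp:.2f}: average share of a = {share:.2f}")
```

The parameters are plain function arguments, so refining the grid or adding trials is just a matter of passing longer lists. The cost grows with grid size × trials × years, which is presumably why a full-size sweep takes hours.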

Here, we can clearly see the antagonistic pleiotropic nature of the a gene at play. Punish the organism in late life too severely, and the gene gets weeded out. Do the opposite, and the gene accumulates.

The “antagonistic boundary” is quite beautifully illustrated with a single plot, I think.

Conclusion


Thanks for making it all the way through! This was a longer post than what we usually put out here. I hope the dive was worth it.

We’ve taken a whirlwind tour of the fascinating Antagonistic Pleiotropy Hypothesis proposed by George C. Williams and its repercussions for evolution, natural selection and perhaps even our own mortal / immortal destiny (whew!).

We then attempted to simulate both regular and antagonistic pleiotropic gene scenarios in order to gain insight into the effect of natural selection on the gene population under different gene effects, before finally mapping out the antagonistic boundary of our gene effects in a single plot. Hope it was fun.

Till next time, Michael.

PS: The source code will become available soon - I’m very likely going to do another post on it!

Footnotes

  1. I’m not speaking as an expert, just as someone who holds a general interest. So I may accidentally say some things that aren’t technically correct. Please let me know! 

  2. Or not, depending on your life philosophy. 

  3. I’m very imaginative, I know. 

  4. I know this isn’t technically how genes work. I presume if it did in reality, nature would min-max, but we are running a vastly simplified scenario. (Spoiler, changing this in the code doesn’t drastically affect the outcome!) 

  5. A “bad-luck” event (say, a large portion of the animals with the a gene fail to find food by sheer randomness and die) can often lead to quite bizarre gene distributions and wild swings in a gene’s prevalence, since no gene is selected for by nature. So occasionally, you might see a gene take a knock early on, and then (with fewer individuals to propagate it to the next generation) eventually go extinct. 

  6. Up to a maximum probability of 1, of course. We’ll clip anything higher than 1 to 1. 

  7. Of course, we still have a \(p(\text{mutate} \vert \text{reproduce}) = 0.01\), so it won’t become the only gene. 

  8. Astute readers will no doubt have suspected that this is done on purpose to illustrate the point :). 

  9. These took ~6 hours to run. I’ll likely share the details in another post in the future.