Below is a Venn diagram. Venn diagrams use circles to represent sets of objects. In our case the circles represent two sets: one a group of high-sugar foods and the other a group of healthy foods. The rectangle around these sets can be thought of as representing the whole universe of all possible foods (mmm…). Where the two circles overlap, the sets are said to “intersect”. The mathematical way of writing the intersection of sets A and B, meaning the objects that are in both, is A ∩ B. All of the objects that are in at least one of the two sets can be written as A ∪ B, the “union” of A and B (think ‘U’ for ‘u’nion). One other useful thing to know is the “complement” of a set, A’, which just means everything in the universe that is not in A.

Let’s examine the items in the intersection High sugar ∩ Healthy. I think it would be a bit extreme to claim that anything in the intersection isn’t healthy, but it’s a fact that fruits like bananas, grapes and mangoes contain a lot of sugar.
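If you like to see things in code, the three set operations map neatly onto Python's built-in `set` type. This is only a sketch: the foods and which circle each one sits in are my own illustrative assumptions (only the bananas, grapes and mangoes come from the diagram discussion above), and the "universe" here is a tiny made-up one.

```python
# A tiny, made-up universe of foods (illustrative assumption, not real nutrition data).
universe = {"banana", "grapes", "mango", "broccoli", "spinach",
            "cola", "doughnut", "plain crisps"}

high_sugar = {"banana", "grapes", "mango", "cola", "doughnut"}   # set A
healthy = {"banana", "grapes", "mango", "broccoli", "spinach"}   # set B

intersection = high_sugar & healthy        # A ∩ B: in both sets
union = high_sugar | healthy               # A ∪ B: in at least one set
not_high_sugar = universe - high_sugar     # A': in the universe but not in A

print("A ∩ B:", sorted(intersection))
print("A ∪ B:", sorted(union))
print("A':   ", sorted(not_high_sugar))
```

Note that the complement only makes sense once you have said what the universe is, which is exactly what the rectangle in the diagram is for.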

So we can already see how a Venn diagram might be useful as a visual tool to make sense of the world. But are there any other uses? Well, it turns out they can be very helpful for getting a handle on certain problems that human beings generally have very little intuition for. Welcome to the world of “conditional probability”, where we answer questions like “what is the probability of something happening, given that another thing has already happened?”. What is the probability of you reading this any further, given that I have just started talking about conditional probability? Go on, keep reading. What is the probability now, given that I just encouraged you to keep reading? In the real world, many events are conditional on other events. When knowing that one event has happened makes no difference to the probability of another, the two are said to be “independent” events.

Let me start with a simple example involving a pack of playing cards and a Venn diagram. There are 52 cards in a full deck, and 4 different suits: clubs, spades, diamonds and hearts. Each suit has 13 cards, comprising the numbers 2 to 10 plus the Ace, Jack, Queen and King.

In the diagram above, the yellow circle labelled A represents all of the cards which are Aces. There are four Aces: the Ace of Clubs, the Ace of Hearts, the Ace of Diamonds and the Ace of Spades. The red circle labelled B represents all of the cards which are hearts. There are 13 hearts in the whole pack of 52 cards. Only one card in the whole pack is both an Ace and a heart, and that is the intersection A ∩ B.

So I could ask the question: given that the card is a heart, what is the probability that it is an Ace? Well, there are 13 hearts and only one Ace of Hearts, so the probability is 1 in 13. Can we get to this using a Venn diagram?

So “given” that we know the card is a heart, we know it is in circle B. Circle B contains 13 items, only one of which is an Ace, the Ace of Hearts. There is a mathematical way of saying “given”: A | B means “A given that B is true”. In the case above, A stands for “the card is an Ace” and B for “the card is a heart”. So literally, we want to know P(A | B), the probability that the card is an Ace given that we know the card is a heart.

Perhaps we can now start to see the intersection of two sets in a different way. If circle B is a given, i.e. we know we have a heart, then the intersection is the only bit of circle B that also lies inside circle A. Therefore, the number of items in the intersection divided by the total number of items in circle B gives us our probability (1/13).
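The counting argument above is easy to check by brute force. Here is a minimal sketch that builds the 52-card deck, picks out the Aces (A) and the hearts (B), and divides the size of the intersection by the size of B. The variable names and the way cards are represented as (rank, suit) pairs are my own choices for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "spades", "diamonds", "hearts"]

# Every card is a (rank, suit) pair; 13 ranks x 4 suits = 52 cards.
deck = set(product(ranks, suits))

aces = {card for card in deck if card[0] == "A"}         # set A
hearts = {card for card in deck if card[1] == "hearts"}  # set B

# Counting version: items in the intersection / items in B.
p_ace_given_heart = Fraction(len(aces & hearts), len(hearts))

# Probability version: P(A ∩ B) / P(B) gives the same answer.
p_both = Fraction(len(aces & hearts), len(deck))
p_heart = Fraction(len(hearts), len(deck))
assert p_both / p_heart == p_ace_given_heart

print(p_ace_given_heart)  # 1/13
```

Using `Fraction` rather than floating point keeps the answer as an exact 1/13, which matches the way we counted it by hand.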

More generally we can write:

P(A | B) = P(A ∩ B) / P(B)

and from this one can derive Bayes’ Theorem, the workhorse of conditional probability. From this, quite surprising results can follow, like this one.

Suppose you are a famous athlete (you might be, for all I know) and to your horror you test positive for a performance-enhancing drug. It’s the kind of test result that could get you banned from your sport, stop you earning a living and ruin your reputation forever. You are innocent, and you want to prove it to the world using maths…

So you do some research and find out that the test is 95% accurate at picking up the drugs when they really *are* in your body (a “true positive”). But you also find out that the test isn’t perfect and can sometimes get it wrong, producing a positive result even if you didn’t take any drugs (a “false positive”). However, you are a bit discouraged when you find out that the test is pretty good and a false positive result occurs only 3% of the time. The final bit of information you find out, by talking to some experts at WADA and UKAD, is that only 1% of all athletes take performance-enhancing drugs. So what is the chance that you did take drugs, given that you tested positive?

You read up on conditional probability (you are trying to find out the chances that you took drugs *given* that you tested positive), you put the numbers into Bayes’ Theorem, and you discover that there is only a 24% chance you took drugs despite the test being positive. Of course you knew you were innocent, but this shows the world that your innocence is more likely than your guilt. Phew! Maths has saved your career. Despite the test having a true positive rate of 95% and producing a false positive result only 3% of the time, it is still much more likely that you didn’t take drugs even though you tested positive! Your second sample comes back negative and, vindicated, you go on to win Olympic gold.
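If you want to check the 24% for yourself, here is the calculation spelled out. Bayes’ Theorem for this problem reads P(drugs | positive) = P(positive | drugs) × P(drugs) / P(positive), where P(positive) adds up the two ways of testing positive: a true positive from a drug taker and a false positive from a clean athlete. The function and parameter names below are my own, chosen for illustration; the three numbers are the ones from the story.

```python
def probability_of_drugs_given_positive(prior, true_positive_rate, false_positive_rate):
    """Bayes' Theorem: P(drugs | positive test).

    prior               -- P(drugs), the fraction of athletes who take drugs
    true_positive_rate  -- P(positive | drugs)
    false_positive_rate -- P(positive | no drugs)
    """
    # Total probability of a positive test, from drug takers and clean athletes.
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / p_positive


p = probability_of_drugs_given_positive(
    prior=0.01, true_positive_rate=0.95, false_positive_rate=0.03
)
print(f"{p:.1%}")  # 24.2%
```

The surprise comes from the denominator: because 99% of athletes are clean, the 3% of them who test positive by mistake (0.0297 of all athletes) outnumber the 95% of drug takers who are caught (0.0095 of all athletes) by roughly three to one.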

In the drug testing of real sportsmen and women, the tests have to be extremely accurate, with very high true positive rates and very low false positive rates. Furthermore, there are multiple tests to reduce the chances of false positives wrecking somebody’s career.
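To get a feel for why a second test helps so much, here is a rough sketch that reuses the made-up numbers from the athlete story (1% prior, 95% true positive rate, 3% false positive rate). It rests on a big assumption that I want to flag loudly: the two tests are treated as independent of each other given whether the athlete took drugs, which real samples from the same athlete may well not be. The `update` helper is my own illustrative function, not anything used by real testing bodies.

```python
def update(prior, p_result_if_drugs, p_result_if_clean):
    """One Bayes update: P(drugs | this test result)."""
    numerator = p_result_if_drugs * prior
    return numerator / (numerator + p_result_if_clean * (1 - prior))


TRUE_POSITIVE_RATE = 0.95   # P(positive | drugs), from the story
FALSE_POSITIVE_RATE = 0.03  # P(positive | no drugs), from the story
PRIOR = 0.01                # P(drugs), from the story

after_first_positive = update(PRIOR, TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE)

# ASSUMPTION: the second test is independent of the first, given the truth.
after_second_positive = update(
    after_first_positive, TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE
)
after_second_negative = update(
    after_first_positive, 1 - TRUE_POSITIVE_RATE, 1 - FALSE_POSITIVE_RATE
)

print(f"after one positive:        {after_first_positive:.1%}")   # 24.2%
print(f"after two positives:       {after_second_positive:.1%}")  # 91.0%
print(f"positive then negative:    {after_second_negative:.1%}")  # 1.6%
```

Under these assumptions, the probability after the first test simply becomes the starting point for the second, so one more result moves you a long way in either direction.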