If you only have one data point, then you have an “anecdote”, like the photo below:

The information we have is that of a single Chacma Baboon *Papio ursinus* on a date (30 December 2018) at a locality (Bettys Bay, Western Cape, South Africa), engaging in a nefarious activity (housebreaking and theft). This is an anecdote. From this single observation, we cannot draw the conclusion that lone baboons regularly raid holiday homes at Bettys Bay in summer. This is a sample of size one. To make decisions about how and when baboons in Bettys Bay need to be managed, a much larger sample of data is needed.

Sometimes a sample of size one is massively important. It can alert us to a new and emerging issue. There is an awesome paper in *Biodiversity Observations* by citizen scientists John Fincham and Nollie Lambrechts. It is called “How many tortoises do a pair of Pied Crows *Corvus albus* need to kill to feed their chicks?” The abstract reads: “This paper presents proof of heavy predation on tortoises by a pair of Pied Crows at a single nest site in order to rear successive broods of chicks.” The operative word is “single”. This is a sample of size one. From this single observation, it would be irresponsible to decide to cull Pied Crows to save the tortoise.

The paper by John Fincham and Nollie Lambrechts comes to precisely the correct conclusion by saying: “A comprehensive survey to establish the extent to which this degree of damage is replicated needs to be undertaken urgently.” This is an important “biodiversity observation”. The authors might be onto a real conservation issue for tortoises. But they might equally well have discovered an unusual pair of Pied Crows! You cannot take management action on an anecdote, a sample of size one.

If the sample size is two, then you really just have two anecdotes, two data points. You cannot draw conclusions from a sample of size two. How about three? How large a sample do you need to be able to draw reliable conclusions? How many data points do you need before you can decide whether an intervention is needed? A statistician would talk about “sample size” and denote this unknown number with the letter *n*.

There is (unfortunately) no straightforward answer to questions about sample size. There is no rule of thumb. Ultimately the answer lies in discovering how variable the thing you are trying to measure is.

Suppose you are a budding astronomer, say in Ancient Egypt, and you want to find out the number of days from one full moon to the next. The answer is very dull: 29½ days. After you have got the same answer repeatedly, it is clear that you got it right first time. All you really needed was a sample of size one. If there is no variability, a sample of size one is adequate. But you cannot know that at first!

The eggs of the African Black Oystercatcher *Haematopus moquini* are variable in length, so you definitely need to measure more than one egg to get a good handle on average egg length. But this is not a particularly variable characteristic, so once you have measured a small sample, you have a pretty accurate estimate of egg length.

In contrast to the lengths of oystercatcher eggs, the sizes of African Elephant herds are very variable. So to get a good estimate of “average” herd size, a large sample size is essential.
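The contrast between eggs and elephant herds can be explored with a small simulation. What follows is only a rough sketch in Python, not a recipe. The means and standard deviations are invented for illustration (they are not real measurements of oystercatcher eggs or elephant herds), the function name `spread_of_sample_means` is made up for this example, and the bell-shaped (normal) distribution is a simplifying assumption that real herd sizes certainly do not follow. The idea is to draw many samples of size *n*, compute the average of each, and see how much those averages jump around.

```python
import random
import statistics


def spread_of_sample_means(population_mean, population_sd, n, trials=2000, seed=1):
    """Draw `trials` samples of size n and return the standard deviation
    of the resulting sample means (how much the "average" jumps around)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Assumption: values are normally distributed (a simplification).
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to contrast low and high variability.
    cases = {
        "egg length (mm)": (60.0, 2.0),   # small spread relative to the mean
        "herd size (animals)": (12.0, 8.0),  # large spread relative to the mean
    }
    for label, (mean, sd) in cases.items():
        for n in (1, 5, 25, 100):
            spread = spread_of_sample_means(mean, sd, n)
            # Express the spread as a percentage of the true mean.
            print(f"{label}: n={n:>3}  spread of averages = {100 * spread / mean:.1f}% of the mean")
```

With numbers like these, the averages for the low-variability quantity settle down after only a handful of measurements, while the averages for the high-variability quantity still wobble noticeably at the same sample sizes. In both cases the wobble shrinks as *n* grows, but the more variable quantity needs a much larger *n* to reach the same precision.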

In this blog, we have learnt that a sample of size one can (usually) be dismissed as an “anecdote”. We have learnt that, as the thing we want to measure gets more variable, we need larger and larger sample sizes to be able to draw conclusions from the data. Most of the time, large variability is a pain, requiring that we get large samples to estimate the “average” of the thing we want to measure.

In future blogs, we will think about sensible ways to measure the average in a sample of data, and about how we measure variability.

A very clear explanation of the sample size concept. I have really enjoyed reading this article and, with your permission, would like to use it during the introductory discussions with the students in my biostatistics class.

Thank you, Sam. You (and everyone else) are welcome to use this in any way. This is going to be part of a series, so please keep coming back. There is already one on means and medians (http://thebdi.org/blog/2019/06/26/exploring-data-the-median-and-the-mean-and-everything-in-between/)