Cluster Busting – Why K, not R, is the most important variable in the pandemic

Highly Unpredictable 

On 6 February 2020, a woman living in Daegu checked herself into hospital following a minor traffic accident. Despite a high fever, she continued about her day-to-day life in between visits to the hospital. She attended two church services and met a friend for a buffet lunch at a hotel. As her symptoms worsened, she was advised to take a test for COVID-19, and she was soon identified as South Korea’s 31st case. Referred to as Patient 31, she played an important role in accelerating the country’s epidemic. Within days, hundreds of individuals in her church had tested positive. Investigators soon drew up a list of thousands of individuals who had been in contact with Patient 31 in those crucial days. Clusters of cases were all linked back to Patient 31, the largest of which was that of her church, which saw over 5,000 cases. Contact tracers suggested that this single patient could be the source of up to 60% of South Korea’s cases.

The situation in South Korea, in which infections spread in an unpredictable way and appeared in bursts and clusters, has been repeated all over the world. After many months of trying to understand the dynamics of this pandemic, questions still remain.

There are plenty of suggestions about why the pandemic has progressed differently in different parts of the world. Might there be different levels of exposure to related coronaviruses in different regions, leading to some degree of immunity and protection in certain communities? Might there be inherent differences in the ability of different populations to fight off infections? Do older, more infirm populations lead to a worse epidemic?

The Problem with Averages 

Our efforts to understand the progression of the pandemic and the impact of our protective measures often focus on the now famous R number. We hear about how this number – the average number of secondary infections generated by a single infection – changes on an almost real-time basis. In the U.K. and many other countries around the world, the current R number in a given region dictates the severity of the restrictions that are imposed. The problem with the R number is that it is an average generated from a tremendous number of data points. What it doesn’t do is show how that data is distributed. An R number of 2 could mean that each individual does indeed infect two others, but it could also mean that a handful of individuals infect hundreds while most infect nobody. In both cases the average would still be 2.
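
To make the point concrete, here is a minimal Python sketch using made-up numbers (both groups are invented for illustration and are not real data). It builds two groups of 100 infected people that share the same average of 2 secondary infections but spread the virus in completely different ways.

```python
from statistics import mean

# Hypothetical group A: each of 100 infected people infects exactly two others.
even_spread = [2] * 100

# Hypothetical group B: one person infects 200 others, the other 99 infect nobody.
clustered = [200] + [0] * 99


def share_infecting_nobody(secondary_infections):
    """Fraction of infected people who pass the virus on to no one."""
    return secondary_infections.count(0) / len(secondary_infections)


for name, group in [("even spread", even_spread), ("clustered", clustered)]:
    print(
        name,
        "| average (R) =", mean(group),
        "| share infecting nobody =", share_infecting_nobody(group),
    )
```

Both groups report an average of 2, yet in the second group 99% of people infect nobody. The average alone cannot tell the two apart.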

Let’s imagine I am a multi-billionaire. Let’s say I’m even richer than Jeff Bezos (currently the wealthiest individual in the world) – I’m as rich as Mansa Musa, the King of Mali who ruled from around 1312 to 1337 and, adjusted for inflation, had a net worth of roughly $400 billion. If I walk into a room containing 100 people on the average salary in the UK (£29,000) then the average wealth of the now 101 people in the room grows to nearly $4 billion. That average wealth figure tells you nothing about the distribution of wealth in that room.

The distribution of new infections is becoming an increasingly important factor in understanding this pandemic. It is becoming clear that not all COVID-spreaders are made equal. Some individuals seem capable of infecting dozens or hundreds of others if the conditions are right, but others simply do not infect anyone. Epidemiologists measure this discrepancy with the dispersion parameter. It also has a letter associated with it – K. It is a statistical value that tells us how much variation there is in the distribution of infections. The lower the value of K, the more transmission comes from a smaller number of people. This is important because it tells us a great deal about the dynamics of the spread of the virus – and could challenge the measures that have been taken to control it.

The K Number 

A recent study of clusters of infection outside of China estimated that as few as 10% of infected people may be responsible for up to 80% of all transmission – meaning that the vast majority of infected people transmit the virus to very few others, or to nobody at all. This means that different countries can experience substantially different epidemics – one might see an explosion of cases resulting from a small number of super-spreading events, while another may see a very large number of introductions of COVID-19 and not have a serious outbreak. The K number for SARS-CoV-2, the virus responsible for COVID-19, has been estimated to be as low as 0.1, compared to roughly 1 for the 1918 Spanish Influenza pandemic. This may not sound like a large difference, but in reality it means the two viruses are transmitted in completely different ways. Viruses such as SARS-CoV-2, with a highly varied dispersal of transmission and a very low K number, are referred to as ‘over-dispersed’.

A virus with an evenly distributed transmission and a high K number (roughly 1). This virus does not rely on super-spreading events and so the R number is a more accurate reflection of transmission.
A virus with a highly uneven distribution of transmission and a low K number (roughly 0.1). Very few infected individuals pass on the virus to others. Instead, there are small explosions of clusters generated by ‘super spreaders’. The R number would not be a very accurate reflection of transmission in this case.
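
The contrast between a K of roughly 1 and a K of roughly 0.1 can be sketched numerically. The Python below assumes, as is common in epidemiological modelling, that the number of secondary infections per case follows a negative binomial distribution with mean R and dispersion K. The value R = 2 is an illustrative assumption borrowed from the example earlier in this article, and the function names are hypothetical. Treat it as a rough sketch rather than a reproduction of any study's method.

```python
def negative_binomial_pmf(r_number, k, n_max=2000):
    """Probabilities of 0..n_max secondary infections (mean r_number, dispersion k)."""
    ratio = r_number / (r_number + k)
    probs = [(k / (r_number + k)) ** k]  # chance of infecting nobody
    for n in range(n_max):
        # Standard recurrence: P(n + 1) = P(n) * (n + k) / (n + 1) * R / (R + k)
        probs.append(probs[-1] * (n + k) / (n + 1) * ratio)
    return probs


def share_from_top(r_number, k, top_fraction):
    """Share of all transmission caused by the most infectious top_fraction of cases."""
    probs = negative_binomial_pmf(r_number, k)
    remaining = top_fraction
    transmitted = 0.0
    # Walk down from the biggest spreaders until top_fraction of cases is used up.
    for n in range(len(probs) - 1, -1, -1):
        mass = min(probs[n], remaining)
        transmitted += n * mass
        remaining -= mass
        if remaining <= 0:
            break
    return transmitted / r_number


ASSUMED_R = 2  # illustrative value only
for k in (1, 0.1):
    p_nobody = negative_binomial_pmf(ASSUMED_R, k)[0]
    top_share = share_from_top(ASSUMED_R, k, 0.10)
    print(f"K = {k}: {p_nobody:.0%} infect nobody, "
          f"top 10% of cases cause {top_share:.0%} of transmission")
```

Under these assumed values, roughly a third of cases infect nobody when K is 1, against roughly three quarters when K is 0.1. The most infectious 10% of cases account for the majority of transmission only in the low-K case.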

That COVID-19 is so over-dispersed is not surprising – the coronaviruses that caused the SARS epidemic and the evolving MERS epidemic also rely on super-spreading events. However, not all viruses spread in this way. As highlighted above, influenza viruses are far more predictable, with a far less random sequence of infections. This means that for influenza the R number is a more faithful reflection of the actual pattern of spread. The stochastic nature of over-dispersed viruses such as SARS-CoV-2 makes it far harder to judge the nature of the pandemic through the R number alone.

This has a real-world impact: if you consider a school environment, it is often the case that a single pupil or a cluster of pupils cause all of a teacher’s grief in the classroom. Specifically dealing with that pupil could solve the problem. The teacher’s approach would be very different if the bad behaviour were evenly distributed across all the pupils in the class. In the same way, the methods used to control an epidemic should be different depending on the distribution and clustering of infections.  

Cluster Busting

It isn’t always clear why a disease spreads in clusters. It may be because individuals simply transmit the virus at different rates through varied shedding of viral particles, or perhaps because of a difference in how soon after infection individuals can spread it to others. While there is still more to understand here, it is quite clearly established that certain settings are more likely to result in lots of transmission. Poorly ventilated indoor settings are thought to be responsible for the majority of transmission – weddings, churches, funerals, gyms, meat-packing facilities, schools, restaurants and other settings in which masks tend not to be worn are regularly identified as hotspots. We have already seen these establishments become the focus of national and local lockdown measures – when a virus spreads through super-spreading events, it is crucially important to prevent those events from occurring.

Many nations do not consider clustering in their contact tracing measures. A lot of contact tracing methods are prospective, meaning that when an individual tests positive for COVID-19, tracers try to find out who their close contacts were after they became infectious, so that those potential exposures can be warned, isolated and tested.

However, there is an argument that due to the clustering exhibited by this virus, all contact tracing should be retrospective in nature, meaning contacts should be traced back from the identified case as well as forwards.  

Because of the clustered nature of COVID-19 infections, most people will have been infected by someone who infected a lot of other people – because most people do not infect anyone at all. The logic of retrospective contact tracing is that if we trace back the contacts of an infected person to find who infected them, and then trace forward from there, we will find a large number of cases and at-risk groups. Prospective tracing alone would only give us an indication of potential cases, many of which will not lead to further infections.
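
The reasoning in the paragraph above can be illustrated with a small Python sketch. The outbreak below is entirely hypothetical (ten cases with invented secondary infection counts that average 2). The sketch compares what a tracer sees on average when looking forward from a case with what they see when they step back to that case's infector.

```python
# Hypothetical outbreak: secondary infections caused by each of ten cases.
# Seven infect nobody, two infect a few, and one is a super-spreader.
secondary_cases = [0] * 7 + [1, 2, 17]

total_new_infections = sum(secondary_cases)

# Forward view: pick any case and count the people they infected.
forward_average = total_new_infections / len(secondary_cases)

# Backward view: pick any newly infected person and look at their infector.
# A case who infected n people is reached n times this way, so each infector
# is weighted by the number of people they infected.
backward_average = sum(n * n for n in secondary_cases) / total_new_infections

# Share of new infections that trace back to the single largest cluster.
share_from_largest_cluster = max(secondary_cases) / total_new_infections

print("forward average:", forward_average)
print("backward average:", backward_average)
print("share from largest cluster:", share_from_largest_cluster)
```

In this toy outbreak the forward view finds 2 further cases on average, while stepping back to the infector lands on someone who infected far more people, because most new infections belong to the one large cluster. These figures are idealised, assume every link can be traced perfectly, and are not comparable to estimates from real tracing studies.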

A lot of contact tracing is prospective, or forward contact tracing. This method can miss super-spreading events and mean many potential pathways of infection go unchecked.
A combination of retrospective – or ‘backward’ – and prospective – ‘forward’ – contact tracing acknowledges the importance of super-spreading events and may result in more potential exposures being discovered.

A recent study suggested that retrospective contact tracing could be far more effective, stating:

“Forward tracing alone can, on average, identify at most the mean number of secondary infections [the R number]. In contrast, backward tracing increases this maximum number of traceable individuals by a factor of 2-3”.  

What all this tells us is that a combination of backward and forward contact tracing would more accurately reflect the way the virus spreads, by identifying clusters and super-spreaders. Forward tracing is not without merit, but it should not be the centrepiece of test-and-trace systems. 

The Future of COVID-19

Overdispersion and clustering are difficult things for us to contend with when trying to understand the dynamics of this pandemic. We want to believe in a simple model: patient A infects three people, who then each go on to infect three more. We want this to be the case because that makes the pandemic easier to predict and mitigate. Highly varied dispersal of infections adds randomness and inexplicable variation to patterns of spread. It breaks down cause-and-effect relationships, which makes studying the impact of certain factors and establishing multi-nation studies more difficult.

South Korea and Japan, both of which were the subject of dire warnings about an explosion of cases, overwhelmed hospitals and soaring death tolls, have demonstrated how to mitigate even a highly varied epidemic. South Korea, whose epidemic took off after a super-spreading event involving the infamous Patient 31, has particularly demonstrated the value of backward contact tracing, eliminating outbreaks before they have a chance to take off.

Countries around the world are in a fine balancing act – easing restrictions with one eye on growing case numbers. If the emerging importance of the K number tells us anything, it is that public perception of how this virus spreads needs to change. Our assumption should not be that most people transmit evenly to two or three others, but that a small number of infections account for the majority of transmission.

As we look to avoid the most catastrophic of economic consequences, we will have to learn to engage in ‘cluster-busting’ to prevent super-spreading events, which seem to be the key to the wide-scale transmission of this virus. This can be achieved in two ways. The first is to continue limiting the opportunity for super-spreading to occur, by identifying super-spreading hotspots and avoiding dense, poorly ventilated areas, all the while emphasising the wearing of masks. The second is a shift to embracing the methods of contact tracing that specifically recognise the importance of super-spreading and the clustered nature of infections.

This virus will be with us for a long time. Our success in preventing both a health and economic crisis will rely on our ability to understand how it spreads and continually adapt our measures to control it accordingly.     

Joe

Having studied Biomedical Sciences, I have spent my career sharing my passion for science and making life-changing educational opportunities accessible for anyone, no matter their background. This blog is another way of sharing the stories and ideas that fascinate me - I hope you find them just as interesting!
