Scunthorpe

“What about a holiday in Lincolnshire?”

“Sounds good. Maybe some golf?”

“Yes. I need to work on my handicap. Or rather, our handicap. Tomorrow?”

“Great. See you there – I can’t wait.”

I start setting up a normal weekend getaway. I hope Ben gets the golf reference right so he brings the St. Andrews files. I did not dare to mention their existence during the call.

 

The first versions were online censorship systems blocking or changing offensive words. Many were too literal-minded: looking only for a sequence of characters, they blocked legitimate names and locations (“the Scunthorpe problem”) or made substitutions that made things worse (“the clbuttic mistake”). But as spam filtering and text processing improved, so did performance.

The next version detected IP violations. Rolled out during the IP Wars of the early 21st century, these systems were at first also laughable: blocking videos of political conventions because of some strand of protected music, blocking NASA footage because some news channel had included that footage inside its own IP-protected material. But the economic incentives for getting it right were enormous.

 

“Martha, could you clear a trip to Ashby?”

“The golf course near Silica? Trent is Level 4 advisory. Would Canwick do?”

My office automation is too clever, so it takes me a bit of negotiation to set up a travel plan that passes the insurance and safety guidelines. The irony is that due to my seniority I have less freedom than many of the junior filterers. I am essential personnel, as HR loves to tell me.

 

Being able to block certain information was of course useful to governments, democratic or not. The IP infrastructure was a natural synergy: if a protest movement began using a certain meme or symbol, just file a spurious infringement complaint and get it blocked everywhere. Much more seamless than direct takedown notices or using the anti-paedophilia infrastructure since people were used to seeing information blocked for IP reasons every day. Sure, humans are innovative and will quickly invent new codes. But it could stop riots from reaching the percolation threshold, prevent memes from reaching consensus, hide embarrassing facts, and the system gave you surveillance for free.

 

“Could you help me access the ClearWater forum? It forgot that I am the PI again,” Hannu asks as I pass the work den. I beam over a signed limited-time certificate and wait for the acknowledgement that it has diffused properly: it wouldn’t do to seem in a hurry. We chitchat about the annoying roadworks outside – why were we not informed about them beforehand?

Hannu, Ben and Ali used to argue with me about how to strike a balance. Of course we saw the potential totalitarian aspects, and we did try to find solutions.

 

Then came the Cytokine Wars. 2 billion dead. As battling biohackers – if that was what it actually was – spread their designer plagues using commodity DNA printers and home pharma systems, it became clear that some information must be stopped at all costs. Filtering dangerous sequences had a high price, but it managed to quell the War. There were of course plenty of problems – researchers trying to find cures had their files censored, or were even hauled away by some intelligence service with no notion of what was going on. Natural species were censored for reasons nobody could put a finger on: who knows what Gymnocalycium hides, since no legal lab equipment will touch it? Limits to human freedom, inquiry and innovation, yes… but 2 billion dead. It could have been 8.

Since then things have accelerated. We have other technology now, technology that makes the early biohacking look tame. Macro-EBC origami, rosettatronics, even charge-flipping. Each potentially worse than all nightmares of designer plagues and white goo together.

People have mostly accepted it. At least we do not hear much criticism or suggested alternatives. Ali of course suggested that it was because the system increasingly filters out criticism, dampening it until it no longer has social critical mass. But he was always a bit paranoid.

 

“Have you heard from Ali recently?” Tina asks in the elevator.

“Isn’t he on a sabbatical at Zhejiang?”

“I thought so, but I cannot reach him.”

“I’m sure he is just on one of his surfing holidays.”

Tina looks sceptical, briefly glances upwards, but agrees. I just smile and ask how the kids are doing.

 

The problem with a censorship system is that it tends to censor discussion about itself. It is only natural: if you know how it works you can undermine it, unleashing danger. At first our system even ended up censoring itself, getting dragged into amusing software loops as it tried to hide evidence that it was trying to hide evidence. But huge investments of effort and ingenuity solved the problem. It now balances itself like the immune system, with virtual antibodies forming epistatic networks: filter autoimmunity was a solved problem, we proclaimed. And excessive false positives could always be managed, we thought.

But now… At first it was just more and more glitches on the whitelisted forums where we did our development, slowing discussion. We were losing contact with other experts. Then access to the software layer became blocked. Colleagues trying to fix things disappeared.

 

They grab me in the parking lot. Nondescript people whose faces I cannot focus on. I try to beam my clearance and certificates at them, but the communication is blocked.

“What is it? What are you accusing me of?” I ask loudly.

They cannot hear or answer.

What makes a watchable watchlist?

Watch the skies.

Stefan Heck managed to troll a lot of people into googling “how to join ISIS”. Very amusing, and now a lot of people think they are on an NSA watchlist.

This kind of prank is of course why naive keyword-based watchlists are total failures. One prank and the list gets overloaded. I would be shocked if any serious intelligence agency actually used them for real. Given that people’s Facebook likes give pretty good predictions of who they are (indeed, better than many of their friends know them), there are better methods if you happen to be a big intelligence agency.

Still, while texts and other online behavior signal a lot about a person, they might not be a great tool for making proper watchlists, since there is a lot of noise. For example, this paper extracts personality dimensions from online texts and looks at civilian mass murderers. The authors state:

Using this ranking procedure, it was found that all of the murderers’ texts were located within the highest ranked 33 places. It means that using only two simple measures for screening these texts, we can reduce the size of the population under inquiry to 0.013% of its original size, in order to manually identify all of the murderers’ texts.

At first, this sounds great. But for the US, that means the watchlist for being a mass murderer would currently have about 41,000 entries. Given that over the past 150 years there have been about 150 mass murders in the US, this suggests that the precision is not going to be great – most of those people are just normal people. The base rate problem crops up again and again when trying to find rare, scary people.
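To make the base rate problem concrete, here is a minimal back-of-the-envelope sketch (Python, using assumed round numbers for the US population and an assumed ten-year horizon for “will commit a mass murder”) of what that 0.013% screening fraction implies:

```python
# Back-of-the-envelope base rate calculation (illustrative, assumed numbers).
population = 320_000_000          # rough US population
flagged_fraction = 0.00013        # the paper's 0.013% screening fraction
murders_per_year = 150 / 150      # ~150 mass murders over ~150 years
years_ahead = 10                  # assumed prediction horizon

flagged = population * flagged_fraction
expected_positives = murders_per_year * years_ahead
ppv = expected_positives / flagged

print(f"people flagged:            {flagged:,.0f}")
print(f"expected true positives:   {expected_positives:.0f} over {years_ahead} years")
print(f"positive predictive value: {ppv:.4%}")
# Even assuming the screen catches every future mass murderer, more than
# 99.9% of the people on the list are false positives.
```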

The deep problem is that there are not enough positive data points (the above paper used seven people) to make a reliable algorithm. The same issue cropped up with NSA’s SKYNET program – they also had seven positive examples and hundreds of thousands of negatives, and hence massive overfitting (suggesting the Islamabad Al Jazeera bureau chief was a prime Al Qaeda suspect).
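To see how little it takes, here is a toy simulation of my own (not the SKYNET methodology): the features are pure noise, yet scoring everyone by similarity to the average profile of the known positives makes those positives look perfectly detectable, while a held-out positive is indistinguishable from the crowd.

```python
import numpy as np

# Toy illustration of overfitting to a handful of positive examples.
# All features are pure noise, so there is no real signal to find.
rng = np.random.default_rng(0)
n_negatives, n_features = 20_000, 500      # scaled-down "everyone else"

negatives = rng.standard_normal((n_negatives, n_features))
positives = rng.standard_normal((7, n_features))

# "Train" on six positives: score everyone by similarity to their mean profile.
centroid = positives[:6].mean(axis=0)
neg_scores = negatives @ centroid
train_scores = positives[:6] @ centroid
heldout_score = positives[6] @ centroid

print("training positives' scores:", np.round(train_scores, 1))
print(f"negatives: mean {neg_scores.mean():.1f}, max {neg_scores.max():.1f}")
print(f"held-out positive's score: {heldout_score:.1f}")
# The six training positives stand far above every negative (they define the
# centroid), while the held-out positive lands at an arbitrary, chance-level
# score: classic overfitting when positives are this scarce.
```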

Rational watchlists

The rare positive data point problem strikes any method, no matter what it is based on. Yes, looking at the social network around people might give useful information, but if you only have a few examples of bad people, the system will just pick up on networks like the ones they had. This is also true for human learning: if you look too hard for people like the ones that committed attacks in the past, you will focus too much on people like them and miss enemies that look different. I was told by an anti-terrorism expert about a particular tell-tale sign of veterans of Afghan guerrilla warfare: great if and only if such veterans are the enemy, but rather useless if the enemy can recruit others. Even if such veterans are a sizable fraction of the enemy, the base rate problem may make you spend your resources on innocent “noise” veterans if the enemy is a small group. Add confirmation bias, and trouble will follow.

Note that actually looking for a small set of people on the watchlist gets around the positive data point problem: the system can look for them and just them, and this can be made precise. The problem is not watching, but predicting who else should be watched.

The point of a watchlist is that it represents a subset of something (whether people or stocks) that merits closer scrutiny. It should essentially be an allocation of attention towards items that need higher level analysis or decision-making. The U.S. Government’s Consolidated Terrorist Watch List requires nomination from various agencies, who presumably decide based on reasonable criteria (modulo confirmation bias and mistakes). The key problem is that attention is a limited resource, so adding extra items has a cost: less attention can be spent on the rest.

This is why automatic watchlist generation is likely to be a bad idea, despite much research. Mining intelligence to help an analyst figure out whether somebody might fit a profile or merit further scrutiny is likely more doable. But analyst time is expensive and easily overwhelmed if something keeps filling the input folder: HUMINT is less likely to do this than SIGINT, even if the analyst is just doing the preliminary nomination for a watchlist.

The optimal Bayesian watchlist

One can analyse this in a Bayesian framework: assume each item has a value x_i distributed as f(x_i). The goal of the watchlist is to spend expensive investigatory resources to figure out the true values; say the cost is 1 per item. Then a watchlist of randomly selected items will have a mean value V=E[x]-1. Suppose a cursory investigation costing much less gives some indication about x_i, so that it is now known with some error: y_i = x_i+\epsilon. One approach is to select all items above a threshold \theta, making V=E[x_i|y_i>\theta]-1.

If we imagine that everything is Gaussian x_i \sim N(\mu_x,\sigma_x^2), \epsilon \sim N(0,\sigma_\epsilon^2), then  V=\int_\theta^\infty t \phi(\frac{t-\mu_x}{\sigma_x}) \Phi\left(\frac{t-\mu_x}{\sqrt{\sigma_x^2+\sigma_\epsilon^2}}\right)dt. While one can ram through this using Owen’s useful work, here is a Monte Carlo simulation of what happens when we use \mu_x=0, \sigma_x^2=1, \sigma_\epsilon^2=1 (the correlation between x and y is 0.707, so this is not too much noise):

Utility of selecting items for watchlist as a function of threshold. Red curve without noise, blue with N(0,1) noise added.

Note that in this case the addition of noise forces a far higher threshold than without noise (1.22 instead of 0.31, the break-even points where the expected value of an investigation just covers its cost). This selects just 19% of all items, while in the noise-less case 37% of items would be worth investigating. As noise becomes worse the selection for a watchlist should become stricter: a really cursory inspection should not lead to insertion unless it looks really relevant.
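For readers who want to reproduce the numbers, here is a minimal Monte Carlo sketch (my reconstruction of the setup described above, not the original script) that estimates the net value per investigated item as a function of the threshold, with and without observation noise:

```python
import numpy as np

# Monte Carlo sketch of the threshold model above: x is an item's true value,
# y = x + noise is the cheap estimate, and a full investigation costs 1.
rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(0.0, 1.0, n)            # mu_x = 0, sigma_x = 1
y = x + rng.normal(0.0, 1.0, n)        # sigma_epsilon = 1

thresholds = np.linspace(-1.0, 3.0, 401)
v_clean = np.array([x[x > t].mean() - 1.0 for t in thresholds])
v_noisy = np.array([x[y > t].mean() - 1.0 for t in thresholds])

# Break-even thresholds: where the net value per investigated item turns positive.
t_clean = thresholds[np.argmax(v_clean > 0)]
t_noisy = thresholds[np.argmax(v_noisy > 0)]
print(f"break-even threshold without noise: {t_clean:.2f} "
      f"(selects {(x > t_clean).mean():.0%} of items)")
print(f"break-even threshold with noise:    {t_noisy:.2f} "
      f"(selects {(y > t_noisy).mean():.0%} of items)")
# With these parameters the break-even points come out near 0.31 and 1.22,
# selecting roughly 37% and 19% of items respectively, as in the figure.
```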

Here we used a mild Gaussian distribution. In terms of danger, I think people or things are more likely to be lognormally distributed, since danger tends to be the product of many relatively independent factors. Using lognormal x and y leads to a situation where there is a maximum utility for some threshold. This is likely a problematic model, but clearly the shape of the distributions matters a lot for where the threshold should be.

Note that having huge resources can be a bane: if you build your watchlist from the top priority down as long as you have budget or manpower, the lower priority (but still above threshold!) entries will be more likely to be a waste of time and effort. The average utility will decline.

Predictive validity matters more?

In any case, a cursory and cheap decision process is going to give so many so-so evaluations that one shouldn’t build the watchlist on it. Instead one should aim for a series of filters of increasing sophistication (and cost) to sift the relevant items from the dross.
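As a rough illustration of the staged-filter idea (my own toy model, with made-up costs and noise levels, not anything from the literature), compare running an expensive accurate test on everyone against first triaging with a cheap noisy score and only spending the expensive test on the survivors:

```python
import numpy as np

# Toy two-stage screening pipeline with made-up costs and noise levels.
rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(0.0, 1.0, n)                   # true value of each item

cheap = x + rng.normal(0.0, 2.0, n)           # crude triage score, cost 0.01 each
costly = x + rng.normal(0.0, 0.3, n)          # accurate score, cost 1 each
k = n // 100                                  # final watchlist: top 1%

# Strategy A: accurate score on everyone, keep the top 1%.
top_a = np.argsort(costly)[-k:]
cost_a = 1.0 * n
value_a = x[top_a].mean()

# Strategy B: cheap score on everyone, accurate score only on the top 10%,
# then keep the best k among those survivors.
survivors = np.argsort(cheap)[-(n // 10):]
top_b = survivors[np.argsort(costly[survivors])[-k:]]
cost_b = 0.01 * n + 1.0 * len(survivors)
value_b = x[top_b].mean()

print(f"A (accurate test for all):   mean true value {value_a:.2f}, cost {cost_a:,.0f}")
print(f"B (cheap triage, then test): mean true value {value_b:.2f}, cost {cost_b:,.0f}")
# In this toy setup the staged filter keeps much of the value at roughly a
# tenth of the cost, provided the cheap filter is informative enough not to
# throw away the best items in the first pass.
```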

But even there, there are pitfalls, as this paper looking at the pharma R&D industry shows:

We find that when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (i.e., an 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10 fold, even 100 fold) changes in models’ brute-force efficiency.

Just like for drugs (an example where the watchlist is a set of candidate compounds), it might be more important for terrorist watchlists to aim for signs with real predictive power for being a bad guy, rather than signs merely correlated with being a bad guy. Otherwise anti-terrorism will suffer the same problem of declining productivity, despite ever more sophisticated algorithms.
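One can get a feel for this with a small Monte Carlo toy (entirely my own construction, with made-up prevalence and correlations, not the quoted paper's model): rare positives, a screening score whose correlation with the truth is its predictive validity, and a fixed budget of follow-up investigations.

```python
import numpy as np

# Toy model: rare positives, a screening score with predictive validity rho,
# and capacity to investigate only the 100 top-scoring candidates.
rng = np.random.default_rng(3)

def mean_hits(n_candidates, rho, follow_up=100, prevalence=1e-3, trials=50):
    hits = []
    for _ in range(trials):
        q = rng.standard_normal(n_candidates)              # unobserved truth
        positive = q > np.quantile(q, 1.0 - prevalence)    # rare true positives
        score = rho * q + np.sqrt(1 - rho**2) * rng.standard_normal(n_candidates)
        top = np.argsort(score)[-follow_up:]               # investigate top scores
        hits.append(positive[top].sum())
    return np.mean(hits)

for n, rho in [(100_000, 0.3), (100_000, 0.4), (1_000_000, 0.3)]:
    print(f"screen {n:>9,} candidates at validity {rho}: "
          f"{mean_hits(n, rho):.1f} true positives among the top 100")
# In this toy setup a 0.1 gain in predictive validity is roughly competitive
# with screening ten times as many candidates; which one wins depends on the
# prevalence, the follow-up capacity and the validities involved.
```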