When machine learning expert Rebecca Portnoff learned that data overload was part of the problem in identifying human traffickers, she thought artificial intelligence might help solve it.
It’s what classic detective stories are made of: a scarcity of information. In fiction, when a lack of data or evidence made a case too difficult to solve, the authorities would call in a Sherlock Holmes, a Hercule Poirot or a Miss Marple to piece together snippets of insight and crack it. Until recently, law enforcement suffered a similar shortage of clues in real life.
Today, they often face the precise opposite: an abundance of information. They are swamped with data – from phone records to tweets to historical crime records. The job now is to sort the wheat from the chaff. This requires a whole new set of skills and techniques, and especially new technology.
Future master detectives will partner with a different kind of Watson.
Artificial Intelligence can be applied to nearly any kind of data problem and is already used in sectors ranging from banking to mining to medicine. We’re only starting to see AI networks’ potential uses, says Fernando Lucini, Accenture Digital’s Managing Director for Artificial Intelligence.
“They do amazing work in pattern matching situations: What’s normal versus what’s not normal?” he explains. “Think of all the use cases you could put out there where technology can tell you what’s normal and what’s not?”
Rebecca Portnoff is a PhD student at the University of California, Berkeley, specialising in how to apply machine learning – a branch of artificial intelligence focused on training machines to replicate human decision making – to help identify and rescue trafficked people. As a computer science undergraduate, she wondered – like many idealistic students – how she could apply her skills to help people.
She read Half the Sky by Nicholas Kristof and Sheryl WuDunn, a book about the oppression of women, and was moved by its stories about human trafficking. She wondered if she might help and picked up the phone. “I spent a year cold-calling NGOs asking them what help they might need from people like me.”
She heard the same problem over and over again: data overload. Fortunately, this is precisely the sort of problem that machine learning could tackle. She came across Thorn, a charity which tries to stop child abuse and trafficking by developing new technological solutions and working with law enforcement to apply them.
“What I heard was that there was too much data, and they didn’t know how to find the right people.”
Thorn – where Rebecca now works as a data scientist – sketched out the scale of the task. There are millions of slaves trafficked around the world, and approximately 15,000 are trafficked into the U.S. each year. It’s a multibillion-dollar industry. Much of it is not international trafficking at all – it involves U.S. citizens sold on by people they might have met online, or thought they were in a relationship with. Sometimes it’s runaway kids who have been abused at home. All too often, it’s poor or marginalised people who don’t have a support network to help them.
So how to rescue them? It’s well known to the authorities that trafficked people often appear on classified advertising websites that have sections dedicated to adult services, including escorts. The problem is that those ads appear, on the surface, indistinguishable from legal adverts. They are buried among the thousands of new adverts uploaded to these sites every day. Law enforcement officers usually just scroll through the ads and read each one manually, hoping to find a name or spot some other clue or pattern. But of course, criminals are smart, and routinely use multiple phones and several email accounts. It’s infuriating – trafficked slaves are almost in plain sight, and yet so hard to find.
This is where machine learning comes in. Rebecca’s idea was to use “stylometry”, a technique for analysing language patterns to attribute authorship, to find ads based on “latent” data. (It was used to identify J.K. Rowling as the author of “The Cuckoo’s Calling”, which had been published under a pseudonym.) Everyone unwittingly gives away clues in their style of writing, such as the repetition of certain words, emoji use and punctuation. Although you don’t realise it, your writing style is like a fingerprint. Given the Harry Potter series for comparison, algorithms were able to determine that its style was extremely similar to that of The Cuckoo’s Calling.
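To make the “fingerprint” idea concrete, here is a minimal sketch of one common stylometric approach: count short character sequences (which capture punctuation habits, emoji and favourite words) and compare two texts by cosine similarity. The article does not say which features Rebecca’s system uses, so the character n-gram choice, the function names and the sample texts below are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def style_profile(text, n=3):
    """Count overlapping character n-grams, keeping case, punctuation and emoji."""
    normalized = re.sub(r"\s+", " ", text.strip())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample texts (assumption): two share a punctuation-heavy style,
# the third is written in a formal register.
known = "Brand new!!! Call now :) best price in town!!! Call now :)"
same_style = "Best deal!!! Call now :) brand new stock in town!!!"
other_style = ("We are pleased to offer a selection of refurbished bicycles. "
               "Enquiries are welcome by email.")

score_same = cosine_similarity(style_profile(known), style_profile(same_style))
score_other = cosine_similarity(style_profile(known), style_profile(other_style))
print(f"same style:  {score_same:.2f}")
print(f"other style: {score_other:.2f}")
```

A higher score suggests, but does not prove, a shared author; a real system would need far more text per writer and a calibrated threshold.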
In the same way, Rebecca thought it would be possible to determine if two adverts were written by the same person, even if they used fake names and different emails. This is important because, typically, someone trafficking people is selling multiple people at the same time – and so this is a vital clue to help the authorities home in on suspects. (In conjunction with this work, Rebecca is also analysing Bitcoin transactions on the site, trying to link wallets to specific adverts and identifying suspicious clusters of activity where a lot of money is being transferred.)
No human could reliably spot such subtle differences in language style, but with enough data, a machine can. To test the idea, Rebecca trained an algorithm on adverts written by the same people (say, 20 by the same vendor) and on thousands of others written by different people. She then tested how well it could attribute a previously unseen advert to an author already in her set. Her model turned out to be remarkably accurate: tested on 91 cases, it correctly identified 90 matches. This was all done on historical data, but her team at Thorn is working with law enforcement to apply it to new cases in progress.
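The test protocol described above (train on adverts with known authors, then attribute unseen adverts and count the hits) can be sketched as follows. This is a simplified stand-in, not Rebecca’s actual model: the nearest-profile classifier, the vendor labels and the toy adverts are assumptions made for illustration. Only the 90-out-of-91 figure comes from the article.

```python
import math
from collections import Counter, defaultdict


def profile(text, n=3):
    """Character n-gram counts for one text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled_ads):
    """Merge each author's adverts into one aggregate n-gram profile."""
    profiles = defaultdict(Counter)
    for author, text in labelled_ads:
        profiles[author].update(profile(text))
    return dict(profiles)


def predict(profiles, text):
    """Attribute a text to the known author whose profile is most similar."""
    candidate = profile(text)
    return max(profiles, key=lambda author: cosine(profiles[author], candidate))


def accuracy(profiles, held_out):
    """Fraction of held-out adverts attributed to their true author."""
    correct = sum(predict(profiles, text) == author for author, text in held_out)
    return correct / len(held_out)


# Toy data (assumption): hypothetical vendors with deliberately distinct styles.
training_ads = [
    ("vendor_a", "Brand new!!! Call now :)"),
    ("vendor_a", "Best price in town!!! Call now :)"),
    ("vendor_b", "We are pleased to offer refurbished bicycles. Enquiries welcome."),
    ("vendor_b", "We are pleased to offer quality furniture. Enquiries welcome by email."),
]
held_out_ads = [
    ("vendor_a", "Great deal!!! Call now :)"),
    ("vendor_b", "We are pleased to offer garden tools. Enquiries welcome."),
]

profiles = train(training_ads)
toy_accuracy = accuracy(profiles, held_out_ads)
print(f"toy accuracy: {toy_accuracy:.0%}")

# The figure reported in the article: 90 correct matches out of 91 test cases.
reported_accuracy = 90 / 91
print(f"reported accuracy: {reported_accuracy:.1%}")  # 98.9%
```

Keeping the test adverts out of the training set is the key design point: accuracy measured on adverts the model has already seen would overstate how well it generalises to new ones.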
Obviously, this creates new problems. The same technique could be used by the authorities to unmask people who are staying hidden for legitimate reasons, such as investigative journalists. “I do worry about the privacy implications,” she told me. “It’s one of the difficulties of working in this space. It really depends on who is using the tool.” What’s more, any criminal reading this will of course immediately change their language use to trick the system. “I’m sure there will soon be freely available anti-stylometry software,” she said. “It’s an arms race.” And so it is. But there are many young victims who need the authorities to win it.