Consider an app like Siri, which, when you stop to think about it, combines two amazing feats: it automatically converts what you say into text (and does so for a wide range of accents and languages!), and it then uses that text to answer a wide range of questions with information it finds by searching the Internet. Or take facial recognition on a social network. Have you ever stopped to wonder exactly how a computer recognises and automatically tags pictures of you and your friends?

Whilst humans are very good at recognising and categorising objects, when it comes to certain tasks there’s just too much information for any human to get through. So if we wanted to set computers on such tasks, how would we find efficient rules for computers to follow?

Machine learning is the field of computer science dedicated to teaching computers how to learn from big datasets. An algorithm is a set of rules, a recipe, for a computer to follow. What makes machine learning algorithms special is that they rely on data to work, rather than on a programmer telling them what the rules are, and the more data the better. That means a machine learning algorithm is only as good as its data. For example, an image recognition algorithm is going to be very accurate at identifying Big Ben, because there are so many pictures of it on the Internet for it to learn from, but it won’t be as good at identifying your cat Ben, unless of course you happen to post lots of pictures of Ben on the Internet.

When you combine statistics, which is all about understanding data, with computer science, which is about telling computers how to process data in the most efficient way possible, you get machine learning: efficient algorithms that automatically learn from data, improving as they get more data.

Big datasets are all around us, and they’re being collected all the time. When you tag yourself and others in photos, you’ve contributed data. And when everyone is doing this every day, it quickly becomes big data! When you click on a web search result, you’re teaching the algorithm that it was the best result, so it’ll be more likely to show that result to someone like you who makes the same search. Data can also be created by specialists, for example doctors who record medical diagnoses in databases, which scientists can then use to develop new methods for diagnosis.

Every day, machine learning algorithms use the equivalent of 65,000 people’s worth of energy just for Google searches. That’s half of Oxford’s population! So if we can make Google searches more efficient, we can cut down on a lot of energy and hence CO2 emissions.

In our example we have a large set of images of cats and dogs, labelled for us. We feed these into our algorithm, which you could think of as a black box. The algorithm is trying to learn a relationship so that when it sees a new picture it can accurately classify it and provide an output of either “dog” or “cat”.

Let’s look inside the black box and see how it might do that. A class of simple, but highly effective, algorithms focuses on learning a decision boundary: a line that separates images of cats from images of dogs.

We can think of laying out these pictures on a graph, and we want to find a way of positioning them before drawing the decision boundary. There’s an infinite number of ways of arranging the pictures, and a lot of the engineering work that goes into machine learning focuses on coming up with good ways of telling the computer to process the images so that it can arrange them.

Once the computer has sorted the pictures, with cats over here and dogs over there, the computer searches for the best boundary to separate the cats from the dogs, maybe just a straight line to start.

When we feed in a new picture we see where it falls, then we check which side of the boundary it’s on and output “cat” or “dog” accordingly.
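To make this concrete, here is a minimal sketch in Python of one classic way to learn a straight-line boundary, the perceptron rule. It is an illustration only, not necessarily what any real photo-tagging system uses. The function names (`train_perceptron`, `classify`) and the 2-D position given to each picture are made up for the example; in practice, working out good positions from raw images is the hard engineering step described above.

```python
import random


def train_perceptron(points, labels, epochs=1000, lr=0.1, seed=0):
    """Learn a line w[0]*x + w[1]*y + b = 0 that separates two groups.

    points: list of (x, y) positions, one per picture (hypothetical features).
    labels: +1 for "cat", -1 for "dog".
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    b = 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        mistakes = 0
        for i in order:
            x, y = points[i]
            target = labels[i]
            score = w[0] * x + w[1] * y + b
            # A picture on the wrong side (or exactly on the line) nudges the line.
            if target * score <= 0:
                w[0] += lr * target * x
                w[1] += lr * target * y
                b += lr * target
                mistakes += 1
        if mistakes == 0:
            # Every labelled picture is now on the correct side.
            break
    return w, b


def classify(w, b, point):
    """Check which side of the boundary a picture falls on."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return "cat" if score > 0 else "dog"


# Made-up positions: cats cluster in one corner, dogs in another.
cats = [(1.0, 1.2), (1.5, 0.8), (0.8, 1.9), (1.2, 1.5)]
dogs = [(3.0, 3.2), (3.5, 2.8), (2.9, 3.9), (4.0, 3.0)]
points = cats + dogs
labels = [1] * len(cats) + [-1] * len(dogs)

w, b = train_perceptron(points, labels)

new_picture = (1.25, 1.0)
print(classify(w, b, new_picture))
```

The line here is learnt entirely from the labelled examples; nobody tells the computer where to draw it. This simple rule only settles down when a straight line really can separate the two groups, which is one reason the way the pictures are arranged matters so much.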

*Spatial representation of machine learning decisions.*

Now imagine teaching a computer to solve this problem, but instead of having just cats and dogs, it has pictures of more than a billion different people to decide amongst! Or imagine trying to solve this problem without having big computing resources in a cloud, but solving it entirely on your phone.

Or, what if we’re searching for criminals, rather than cats or dogs? Then we’ll also need to rate our level of confidence in our predictions.
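Sticking with cats and dogs for a moment, one simple way to attach a confidence to each answer is to look at how far a picture sits from the boundary: far away means confident, close to the line means unsure. The Python sketch below is an assumption-laden illustration, far simpler than a full probabilistic model. The logistic (sigmoid) function used to squash distance into a probability, the `sharpness` parameter, the function name `cat_probability` and the example boundary are all chosen just for this example.

```python
import math


def cat_probability(w, b, point, sharpness=1.0):
    """Turn distance from the line w[0]*x + w[1]*y + b = 0 into a confidence.

    Values near 1 mean "confidently cat", values near 0 mean
    "confidently dog", and 0.5 means "right on the boundary".
    """
    score = w[0] * point[0] + w[1] * point[1] + b
    # Divide by the length of w to get a true geometric distance.
    signed_distance = score / math.hypot(w[0], w[1])
    # The logistic function maps any distance to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-sharpness * signed_distance))


# Hypothetical boundary: the line x + y = 4.5, with cats on the lower-left side.
w, b = [-1.0, -1.0], 4.5

for picture in [(1.0, 1.0), (2.0, 2.4), (2.25, 2.25), (4.0, 4.0)]:
    print(picture, round(cat_probability(w, b, picture), 2))
```

A picture sitting right on the line gets a probability of one half, which is the computer's way of saying "I really can't tell".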

Our group at the University of Oxford focuses on scaling to big problems, solving them quickly or with less computing power, and trying to quantify our degree of certainty in our predictions.

In particular, our focus lies in probabilistic modelling, that is, using probability to assign a value to the uncertainties inherent in the world. Our work runs from theory and the development of methods all the way to applying them to real-world data. Analysis of (factors correlated with) US presidential election outcomes, modelling social networks and document classification are only a few of the applications that we explore.
