New Dark Age - by James Bridle

Monday 12 November 2018

I had seen Mr Bridle's TED talk about YouTube videos for children and found it interesting enough to get his book and read it. Unfortunately, I was seriously underwhelmed. I usually wait a few weeks after finishing a book before writing about it, to let it digest properly, but in this case I don't really remember much of what he said, which is probably not a good sign.

The book covers the downsides of technology, specifically AI. This is a subject I am extremely familiar with, so it's possible I already knew most of the points he made. It's also possible that I simply disagree with many of his opinions: my familiarity with the technology keeps me from buying into the fears many people have about AI, which are based more on movies than on any actual aspect of the technology.

As of today's state of the art, AI, or machine learning, amounts to converting a real-world problem into a mathematical function. You feed a lot of data into it and hope the result is a function that accurately predicts data it has not seen. Mr Bridle seems to think it a problem that many of these models are "black boxes", meaning that we understand the input and the output but not what happens inside. I personally don't understand why this is such a problem. I don't think anyone really understands quantum mechanics, but that doesn't really matter so long as the predictions are accurate.
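
To make the "function fed with data" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the "real world" is a hypothetical rule y = 3x + 2 plus noise, the model is a straight line fitted by least squares, and the helper names `fit_line` and `mean_abs_error` are my own. Real machine learning models are far larger, but the workflow of fitting on some data and checking predictions on data the model has not seen is the same.

```python
import random


def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(a, b, xs, ys):
    """Average absolute difference between predictions and true values."""
    return sum(abs((a * x + b) - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "real world": y = 3x + 2 with Gaussian noise (an assumption).
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 points, hold out the last 50 as unseen data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

a, b = fit_line(train_x, train_y)
test_error = mean_abs_error(a, b, test_x, test_y)

print(f"learned function: y = {a:.2f}x + {b:.2f}")
print(f"mean absolute error on unseen data: {test_error:.2f}")
```

The fitted line happens to be easy to read, but the check on the held-out points would work the same way if the model were an opaque black box: only its inputs and outputs are used.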

There were certain parts of the book which I thought were very interesting, but overall the book did not make much of an impression. However, as I said, I am very familiar with the topic covered and someone who is not may get more out of this book.

Labels: technical, books, data science, technology

Given my interest in data science I was very excited to read this book, and I was not disappointed. The book mainly discusses information that can be gleaned from web searches, and how it differs from how people respond to surveys and polls, which is a rather narrow topic, but the author manages to find some rather interesting tidbits from the data.

I am more interested in scientific applications of data science, but for people who are not interested in the subject, the book gives a nice overview of what data is really about. Here is an example - you want to find out about something, say how happy people are in their marriages. You send out surveys asking people about how happy they are with their spouses. The people who are responding to the surveys can say whatever they want. Maybe they are miserable, but they want to project a positive image so they say they are very happy. Maybe they see all of their friends on Facebook constantly posting about how wonderful their husbands are so they say their husbands are wonderful too. The researcher receives the surveys and concludes that all marriages are wonderful.

In the meantime, the people who filled out the surveys are going onto Google and searching for "I hate my husband" or "how can I tell if my husband is cheating?" This scenario turns out to be not far from reality. For a variety of reasons, people say things that may not be quite true. On Facebook, people tend to post idealized pictures of themselves and idealized versions of their lives. But when they go to Google, the searches they perform are more honest and revealing.

Google records every single search made (although the data is anonymized), and Mr. Stephens-Davidowitz has gone through those searches to try to draw some actual conclusions about people. For me, the results were much as expected, though less cynical people may be in for quite a shock. One example: after Obama was elected, searches for "n-word president" shot through the roof, and the places where those searches were concentrated voted heavily for Donald Trump. The search data seems to indicate that racism in the US is alive and well, and that the election of Trump was largely driven by a racist backlash against the election of Obama.

The fact that most people do not describe themselves as racist may seem to contradict the search data, but in my opinion very few people think of themselves as racist, including those who are. The people who really are racist are probably going to say "I am not racist, it's just a fact that other races are inferior to mine." Depending on people to self-report their thoughts, attitudes and actions is a very unreliable way to gather information. The Dunning-Kruger effect describes how people who are experts in a subject tend to downplay their expertise, while those who are not experts tend to consider themselves far more knowledgeable than they really are. The experts know enough to know how much they don't know, while the non-experts think they know it all. This points to a general inability of people to evaluate themselves objectively, and this is where data comes in.

Data itself is a truly objective description of reality, but drawing conclusions from it is another matter. There is the saying "if you torture numbers enough they will confess to almost anything", which means that it is easy to draw almost any conclusion from a large enough data set. Data science is the science of trying to find signals in data in an objective way, and that is something that is desperately needed in the world today, especially as experts are labelled "elitists" when they say things people don't want to hear.
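
The "torturing numbers" problem is easy to reproduce. The sketch below is my own illustration, not something from the book: it generates a target made of pure random noise, then searches 1,000 equally random, unrelated columns for the one that correlates with it best. The sizes (20 observations, 1,000 candidates), the seed, and the helper name `pearson` are all arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
n_obs = 20           # observations per column (arbitrary choice)
n_candidates = 1000  # unrelated columns to search through (arbitrary choice)

# Every value here is independent noise, so no real relationship exists.
target = [rng.gauss(0, 1) for _ in range(n_obs)]
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_candidates)]

correlations = [pearson(column, target) for column in candidates]
best = max(correlations, key=abs)

print(f"strongest correlation found in pure noise: {best:.2f}")
```

With this many candidates and so few observations, the strongest correlation found will look impressive even though every column is noise. Looking through enough variables and reporting only the best one is exactly how numbers get tortured into a confession.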

As a data scientist, I enjoyed this book very much. However, it is written in such a way that you do not need to be a data scientist to understand it. I highly recommend this book.

Labels: books, data science