I've been thinking more and more lately about which problems in the world can be reduced to a math problem solved by analyzing massive sets of data. It's along the lines of Chris Anderson's fantastic Wired cover story, The End Of Science. Some hard problems become easier when a lot of data has been collected about them. It's been a major theme of this blog. I suggested in 2006 that math can even be applied to better understand history.
Just when I was thinking of problems, the biggest and baddest of them all, the current financial crisis, hit in September. How could this have been avoided? Could big data help us here?
Could the answer be in linguistics?
Here we digress for a second. One of the more brilliant CTOs I've worked with (most MIT PhDs are) suggested over dinner once that there was a theory that English was the language most suited for deception: the language where you can say a lot and mean nothing, and more importantly conceal your real intentions. It was postulated at that very same dinner that it was this language advantage that made the English-speaking world dominate the world. You can tell we had some really good wine that night.
Let's assume that there is no such theory that says English is better suited for deception, but let's accept that some languages may be better or worse at hiding emotions and intent. Even the position of the verb in a sentence could make a difference. In Turkish it's at the end, in English it is in second place, and in German it is both. If languages can be analyzed this way for the ability to lie, then with enough data can computers determine whether a person is prone to lie, or whether a certain paragraph is likely to be misleading?
Can Google, by looking at enough of somebody's spoken or written text, tell whether the person is misleading, naive, or powerless over his/her own BS?
Could the Google computer see through Madoff, and the Ponzi scheme he pulled?
If you can make the leap of faith that this is possible, the rest is easy; we are almost there. Surely getting the data is not a problem. One day every word spoken by any public speaker will be indexed. Quotes in articles, speeches, radio talks: all of that will go into the computer to be analyzed. All that data can be used to extract indices of emotions, quantified and relevant.
Why stop at individuals either? People can be grouped together and collectively analyzed. A common honesty coefficient can be found for them as a group. Then perhaps questions like "Are the bank CEOs being honest?" or "Does this administration believe what they are saying about the effect of a stimulus package?" can be conclusively answered.
Would that help figure out if a crisis is coming? It just might. What do you guys think?


Interesting. And the possibilities are not limited to just language analytics. There's quite a bit of work on determining emotions (as well as stress levels, etc.) in speech (see http://courses.media.mit.edu/2004spring/mas630/04.projects/zeynep_ron/). If we had audio clips of Madoff, could a machine tell that he was lying? I'd love to put this up against political rhetoric. Remember weapons of mass destruction in Iraq? :)
Posted by: Cem Sertoglu | December 16, 2008 at 12:28 AM
I believe the main problem is very far from detecting lies. We may be able to detect lies perfectly one day, but I'm not sure that will be a solution. The main problem is very simple: to be able to predict the right thing for us, or even more simply, to predict whether something will be good/useful/profitable or bad for us. This financial crisis seems to be much bigger than the sum of a few schemes. Using Cem's example: the existence of weapons of mass destruction in Iraq may or may not be true (even though they couldn't find any, it doesn't mean there weren't any), but did saying they exist and carrying out the operation help? From the point of view of the Iraqis liberated (at least from Saddam), it helped. From the army's point of view, they did find some bad guys over there, so it helped. But financially, billions of dollars were spent that could have been very useful to help the economy and to fund the officials who find the schemers (the funds for these departments were cut after 2001), so it didn't help. And a good economy would indeed be helpful for catching the bad guys, so it didn't help. Such dilemmas exist for the economy too. Although there are some mathematical tools such as game theory to help, there aren't any exact solutions, and you can't use data before creating the algorithm.
Posted by: Me | December 16, 2008 at 03:05 PM
Excellent points. If I read you right, you are saying that if you don't know what's right for you, what good is it to know who is lying? True.
But if you flip that around, wouldn't it be good to know what is a lie and what isn't to help you figure out what's right?
Posted by: baris | December 16, 2008 at 03:21 PM
I think it was Malcolm Gladwell who also argued that Asian languages make it easier for children to learn numbers quickly (i.e. not only are the number words very short, monosyllabic sounds, but all you need to learn as a child is around ten numbers/words, and the rest is adding them up; for instance, seventy-four is seven tens and four, and fifteen is one ten and five), effectively making them better at math at younger ages than their English-speaking peers worldwide.
Posted by: marty | December 17, 2008 at 03:00 AM
A quick response to the Iraq comment: for the hundreds of thousands of Iraqis who have been killed since the US-led invasion in 2003 (some estimates go above one million people), it definitely didn't help.
Posted by: marty | December 17, 2008 at 03:01 AM
I agree that it would be good to know what is a lie, but it will only marginally help us figure out what's right, because everybody lies, even when it's for the good of all. The art of persuasion is almost identical to lying, and it has to be. People don't want change, and we already established that we don't even know what's right for us. So if you have an idea of a good thing (it can be good for just you or for everyone), you have to lie, and this is probably why English is considered the language most suited for deception. You can see examples of this everywhere, from detergent ads claiming "whiter than white" to politicians. Most people believe Ataturk deceived people into founding a republic and secularism. If it was deception, it was a necessary one, and it would be bad if people found out it was a lie. On the other hand, Hitler believed in what he was saying, so he wasn't even lying. Therefore, even if you do have the best detergent (killing all the bacteria), you may have to lie (and offer "whiter than white"). In such conditions, detecting lies can hardly help.
Posted by: Me | December 17, 2008 at 08:28 AM
I recently came across a really interesting book on this topic: The Numerati by Stephen Baker. It discusses building predictive mathematical models of people, privacy issues, and a bunch of other thought-provoking topics.
Another application of such detection technologies is currently being used in brand monitoring for companies and agencies. There are a bunch of innovative companies working on sentiment extraction, etc., using complex semantics-based algorithms.
Posted by: Baris Aksoy | December 19, 2008 at 03:23 PM