Date: 2008-12-15

I've been thinking more and more lately about which problems in the world can be reduced to math problems and solved by analyzing massive sets of data. It's along the lines of Chris Anderson's fantastic Wired cover story, The End of Science. Some hard problems become easier once a lot of data has been collected about them. It's been a major theme of this blog. I suggested in 2006 that math could even be applied to better understand history.

Just as I was thinking about problems, the biggest and baddest of them all, the current financial crisis, hit in September. How could this have been avoided? Could big data help us here?

Could the answer be in linguistics?

Here we digress for a second. One of the more brilliant CTOs I've worked with (most MIT PhDs are) suggested over dinner once that there was a theory that English is the language most suited for deception: the language in which you can say a lot and mean nothing and, more importantly, conceal your real intentions. It was postulated at that very same dinner that this language advantage was what made the English-speaking world dominate the world. You can tell we had some really good wine that night.

Let's assume there is no such theory saying English is better suited for deception, but let's accept that some languages may be better or worse at hiding emotions and intent. Even the position of the verb in a sentence could make a difference. In Turkish it's at the end, in English it's in second place, and in German it is both (second in main clauses, last in subordinate ones). If languages can be analyzed this way for their capacity to lie, can computers, given enough data, determine whether a person is prone to lying, or whether a certain paragraph is likely to be misleading?

Can Google, by looking at enough of somebody's spoken or written words, tell whether the person is misleading, naive, or powerless over his/her own BS?

Could Google's computers see through Madoff and the Ponzi scheme he pulled?

If you can make the leap of faith that this is possible, the rest is easy; we are almost there. Surely getting the data is not a problem. One day every word spoken by any public speaker will be indexed. Quotes in articles, speeches, radio talks: all of that will go into the computer to be analyzed. All that data can be used to extract indices of emotion, quantified and relevant.

Why stop at individuals either? People can be grouped together and analyzed collectively. A common honesty coefficient can be found for them as a group. Then perhaps questions like "Are the bank CEOs being honest?" or "Does this administration believe what it is saying about the effect of a stimulus package?" can be conclusively answered.
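
To make the group idea a little more concrete, here is a toy sketch in Python. It is nowhere near a real lie detector. The word list HEDGE_WORDS, the function names, and the choice of "share of hedging words" as a stand-in score are all my own assumptions for illustration, not a validated measure of honesty. All it shows is the plumbing: score each quote, average per speaker, then average the speakers in a group so that one talkative person doesn't dominate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical word list: a made-up stand-in, NOT a validated deception lexicon.
HEDGE_WORDS = {"maybe", "perhaps", "possibly", "somewhat", "arguably", "basically"}


def hedge_rate(text: str) -> float:
    """Return the fraction of words in `text` that appear in HEDGE_WORDS."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in HEDGE_WORDS for w in words) / len(words)


def group_coefficients(quotes):
    """Aggregate (group, speaker, text) records into one score per group.

    Each speaker's quotes are averaged first, then the speakers in a
    group are averaged, so every speaker counts equally.
    """
    per_speaker = defaultdict(list)
    for group, speaker, text in quotes:
        per_speaker[(group, speaker)].append(hedge_rate(text))

    per_group = defaultdict(list)
    for (group, _speaker), rates in per_speaker.items():
        per_group[group].append(mean(rates))

    return {group: mean(rates) for group, rates in per_group.items()}
```

You would feed group_coefficients a list of (group, speaker, text) tuples pulled from whatever index of quotes exists. Whether a number like this says anything about honesty is exactly the leap of faith above; the sketch only shows that once you have a per-quote score, rolling it up to a group is the easy part.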

Would that help figure out if a crisis is coming? It just might. What do you guys think?