The Quest for Meaning (continued)
Bayes' theorem has proven an effective aid to many sorts of pattern-recognition tasks, such as fingerprint identification, facial matching, and handwriting analysis. Take the problem of teaching a computer to read scribbled-on bank checks. As anyone who's tried to scan a magazine article into a Word file knows, getting a computer to read clean, standard fonts accurately is dodgy enough. Add in the variables introduced by eccentric writing styles, ballpoint versus felt-tip pens, differing rates of ink absorption, plus crumpling and folding of the pages, and traditional optical character recognition can be rendered useless.
The Bayesian model allows a computer to incorporate prior knowledge of the billions of ways handwritten letters and numbers can stray from standard forms. The computer trains itself to read the writing on checks by "seeing" lots of examples, building a base of prior probabilities to factor into decisions. If, for example, a wavy shape the computer discerned turned out to be an s in most of the past 1 million instances, then that loopy figure on the check is probably an s, too - unless it's followed by what's probably a 6, in which case it's more likely to be a dollar sign or an 8.
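The reasoning above, a prior built from past examples and then revised by context, can be sketched with Bayes' rule. This is a minimal illustration under stated assumptions, not the recognizer the article describes: the shape tallies (`SHAPE_COUNTS`), the chance that each character is followed by a 6 (`NEXT_IS_SIX`), and the helper `posterior` are all hypothetical, with invented numbers.

```python
# Hypothetical tallies: what 1 million past wavy shapes turned out to be (invented numbers).
SHAPE_COUNTS = {"s": 700_000, "8": 150_000, "$": 150_000}

# Hypothetical likelihoods: P(next glyph looks like a 6 | this character).
NEXT_IS_SIX = {"s": 0.001, "8": 0.10, "$": 0.30}


def posterior(followed_by_six: bool) -> dict[str, float]:
    """Combine the prior from past examples with evidence from the next glyph."""
    total = sum(SHAPE_COUNTS.values())
    scores = {}
    for char, count in SHAPE_COUNTS.items():
        prior = count / total  # prior learned from "seeing" examples
        p6 = NEXT_IS_SIX[char]
        likelihood = p6 if followed_by_six else 1 - p6
        scores[char] = prior * likelihood  # Bayes' rule, before normalizing
    norm = sum(scores.values())
    return {char: s / norm for char, s in scores.items()}


for followed_by_six in (False, True):
    probs = posterior(followed_by_six)
    best = max(probs, key=probs.get)
    print(f"followed by a 6: {followed_by_six} -> {best!r} ({probs[best]:.2f})")
```

With these made-up numbers, the s wins when the shape stands alone; once a 6 follows, the dollar sign takes the lead and the 8 comes second, even though the prior still favors the s.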
In the emerging world of computing applications that employ such networked Bayes reasoning engines, almost any observable phenomenon can be inferred as a symptom of a hidden cause - whether characters in a document, or the behavior of an office worker repeatedly clicking a button on his keyboard when the computer refuses to respond. If an email message has broken out in exclamation points, perhaps the disease is spam. In the case of the office worker, the ailment might be a toxic interface. If traditional computing seems designed for a binary universe only a microchip could love, Bayes nets are made for the world of uncertainty, conflicting truths, static, and frustratingly incomplete information sets we live in.
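Inferring a hidden cause from its symptoms can be sketched in a few lines. A true Bayes net models dependencies among many causes and symptoms; this sketch swaps in the simpler naive Bayes assumption that symptoms are independent once the cause is known. The symptom names, the probabilities, and the helper `posterior_of_cause` are hypothetical, chosen only to illustrate the spam example.

```python
import math

# Hypothetical probabilities, invented for illustration.
P_SPAM = 0.3  # prior: fraction of all mail assumed to be spam
P_SYMPTOM_IF_SPAM = {"many_exclamations": 0.6, "all_caps_subject": 0.4}
P_SYMPTOM_IF_HAM = {"many_exclamations": 0.05, "all_caps_subject": 0.02}


def posterior_of_cause(prior: float,
                       p_if_cause: dict[str, float],
                       p_if_not: dict[str, float],
                       observed: set[str]) -> float:
    """Return P(cause | symptoms), assuming symptoms are independent given the cause."""
    log_yes = math.log(prior)
    log_no = math.log(1 - prior)
    for symptom, p_yes in p_if_cause.items():
        p_no = p_if_not[symptom]
        if symptom in observed:
            log_yes += math.log(p_yes)
            log_no += math.log(p_no)
        else:
            # An absent symptom is evidence too.
            log_yes += math.log(1 - p_yes)
            log_no += math.log(1 - p_no)
    # The logistic of the log-odds turns the two scores back into a probability.
    return 1 / (1 + math.exp(log_no - log_yes))


p = posterior_of_cause(P_SPAM, P_SYMPTOM_IF_SPAM, P_SYMPTOM_IF_HAM, {"many_exclamations"})
print(f"P(spam | exclamation points) = {p:.2f}")
```

Under these assumed numbers, a message with a burst of exclamation points but an ordinary subject line comes out at roughly 76 percent spam, against a 30 percent prior. Working in log space keeps the arithmetic stable when many symptoms are multiplied together.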
One of Peter Rayner's hobbies recalls the problem of the short-order cook: using Bayesian methods to extract clear audio signals from the thickets of random noise on old recordings, resurrecting the glory of '20s jazz musicians from scratchy gramophone discs. As we sipped Côtes du Rhône in the oak-paneled sanctuary of the Master's Lodge, Rayner and Lynch discussed approaches to boosting the sound quality of MP3 files, searching databases of GIFs to find a particular image, and predicting the transport rate of pharmaceuticals through blood.
At Cambridge, researchers have applied the reverend's notions to problems as various as improving hearing aids and determining whether a given dose of a drug will sufficiently anesthetize a surgery patient. "This man of enormous importance for the 20th century - with a philosophy so far-reaching it makes Marx pale into insignificance - was essentially forgotten," Rayner tells me, adding that in a university environment designed to churn out MBAs, the wider implications of Bayes' work would have been written off long ago because it seemed to have few practical applications.
Lynch was recognized as an extraordinary, if unorthodox, student while still an undergraduate in the '80s. Rayner - a ruddy, alabaster-bearded, outspoken embodiment of the Cambridge lineage that produces infidel mathematicians - recalls that his now notably successful former student had a tough time getting out of bed. "Mike didn't do any work at all until a quarter of an hour before the exam, when he was miles away from any textbooks. But he used to invent these solutions which were very creative."
The broad sweep of Rayner's academic and cultural interests was a powerful influence on the young engineer, who says his mentor's insistence on problem solving over "hand-waving, headline-grabbing rubbish" encouraged him to think of innovative and practical applications for Bayes' work. It was over morning coffee with Rayner and other graduate students, says Lynch, that he first considered applying the 250-year-old theorem to the task of training computers to recognize patterns of meaning.
Lynch's first company - created in 1991 during his student years and fortified with an impulsive £2,000 loan offered in a pub - is called Neurodynamics. Working for, among others, companies in the British intelligence and defense industries, Neurodynamics uses neural-network technology and Bayesian methods to create applications that specialize in character, handwriting, and facial recognition, as well as surveillance. Lynch enjoyed cooking up solutions for high-level skunk works because, he says, "they have the most interesting problems."
One of the interesting problems Lynch addressed for British intelligence was how to enable computers to make sense of large volumes of words in many languages for a top-secret project. The young entrepreneur, who didn't have the necessary security clearances, was never told what sort of texts the technology would analyze - intercepted email, faxes, leaked documents? - but was instructed to perform his operations on newspaper stories from around the world. Out of that work came the chunk of code called the Dynamic Reasoning Engine, the Bayesian heart of every Autonomy product.
To determine whether two passages are concerned with the same fundamental ideas, Lynch realized, you don't need to know the meaning of each word. In fact, it's not even necessary to be able to speak the language. As long as you can teach the computer where one word ends and another begins, it can look at the ideas contained in a text as the outcomes of probabilities derived from the clustering of certain symbols. The symbol penguin, for instance, might refer to the Antarctic bird, a hockey team, or Batman's nemesis. If it clusters near certain other symbols in a passage, however - say, ice, South Pole, flightless, and black and white - penguin most likely refers to the bird. You can carry the process further: If those other words are present, there's an excellent chance the text is about penguins, even if the symbol for penguin itself is absent.
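The symbol-clustering idea can be sketched as a naive Bayes topic classifier. This is not Autonomy's Dynamic Reasoning Engine, whose internals the article does not describe; the topics, word counts, and helper names (`TOPIC_COUNTS`, `tokenize`, `topic_posterior`) are hypothetical. The tokenizer plays the role of teaching the computer where one word ends and another begins; beyond that, words are treated as opaque symbols whose meaning the code never knows.

```python
import math
import re

# Hypothetical word counts per topic, invented for illustration.
# The classifier only counts which symbols cluster together.
TOPIC_COUNTS = {
    "antarctic bird": {"penguin": 5, "ice": 4, "south": 3, "pole": 3,
                       "flightless": 4, "black": 2, "white": 2, "krill": 3},
    "hockey team": {"penguin": 5, "ice": 4, "goal": 4, "puck": 4,
                    "rink": 3, "season": 2},
    "batman villain": {"penguin": 5, "batman": 5, "gotham": 4,
                       "umbrella": 3, "villain": 3},
}
VOCAB = set().union(*TOPIC_COUNTS.values())


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (Latin letters only, a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def topic_posterior(text: str) -> dict[str, float]:
    """Return P(topic | words) under a naive Bayes model with a uniform topic prior."""
    log_scores = {}
    for topic, counts in TOPIC_COUNTS.items():
        total = sum(counts.values())
        score = math.log(1 / len(TOPIC_COUNTS))
        for token in tokenize(text):
            if token in VOCAB:  # skip symbols no topic has ever seen
                # Add-one (Laplace) smoothing keeps one unseen word from zeroing a topic.
                score += math.log((counts.get(token, 0) + 1) / (total + len(VOCAB)))
        log_scores[topic] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}


for passage in ["Flightless and black and white, they huddle on the ice near the South Pole.",
                "The Penguin slipped out of Gotham before Batman arrived.",
                "penguin"]:
    probs = topic_posterior(passage)
    best = max(probs, key=probs.get)
    print(f"{passage!r} -> {best} ({probs[best]:.2f})")
```

The first passage never contains the symbol penguin, yet the bird topic wins because ice, south, pole, flightless, black, and white cluster there. Standing alone, penguin spreads its probability almost evenly across the three senses, which is exactly the ambiguity that context resolves.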
Lynch had zeroed in on the Achilles' heel of search engines. A search on penguin is just as likely to generate a list of pages about Penguin-Putnam books, the Purple Penguin Design Group, and why Linus Torvalds chose the bird as Linux's symbol as it is to uncover useful data on flightless aquatic birds of the tuxedoed, krill-munching variety.