Karen Spärck Jones: The Forgotten Mother of the Search Engine
Tue, Jan 15th, 2019
Google processes over 40,000 search queries every second, adding up to nearly 3.5 billion searches each day!
When you think about the sheer volume of those numbers, and the data they yield, it is not hard to understand how Google has been able to position itself as the juggernaut of the tech industry. And although most of us take this technology for granted today, search engines like Google were little more than an outlandish, even laughable, concept in the not-too-distant past.
In the 1960s and early 1970s, most computer scientists still treated code as the only medium through which people could communicate with computers.
That began to change in 1972, when a still relatively unknown Cambridge researcher published a groundbreaking paper in the Journal of Documentation titled “A Statistical Interpretation of Term Specificity and Its Application in Retrieval”. The paper would change the way we interact with technology and lay the groundwork for what would become the modern search engine. That researcher was Karen Spärck Jones. Despite her innovative work in computing, linguistics and computer security, her story and her seminal contributions to what has become one of the largest industries in the world have been largely lost to history. No more.
Who was Karen Spärck Jones?
Karen Ida Boalth Spärck Jones was born in the modest textile manufacturing town of Huddersfield, England, on August 26th, 1935. She was the only child of Alfred Owen Jones, a chemistry lecturer, and Ida Spärck, a Norwegian who fled to England during the Nazi occupation of Norway and worked for her country's government-in-exile in London throughout the Second World War. Even in her early years, Spärck Jones showed a great affinity for the world of academia.
At age 12, Spärck Jones declared to her father that she wanted to study History at the prestigious University of Cambridge. Sitting in her Cambridge office in 2001, just six years before her death, Spärck Jones recounted this pivotal time in her life to Janet Abbate:
“My parents were very keen that I should have a proper education, and encouraged me in thinking about things like wanting to go to university. I was an only child. My father was older, because his first wife had died, and so he’d married mother as his second wife. He was very pleased: you had to take a competitive examination to come to Oxford or Cambridge then, and I think he was very excited that I actually got in to Cambridge—because my school, unlike some of the private schools, which trained people to come to university, didn’t train people to go to Oxford or Cambridge…I’d said I wanted to come to Cambridge from about the age of 12, and I did!”
(Karen Spärck Jones, an oral history conducted in 2001 by Janet Abbate, IEEE History Center, Hoboken, NJ, USA.)
It was during her first year at Cambridge, where she studied History and then Moral Sciences (modern-day Philosophy), that Spärck Jones met her future husband, Roger Needham, a brilliant young man studying Mathematics and Philosophy. In 1956, Spärck Jones graduated with a degree in History and began teaching at a nearby grammar school. Meanwhile, Needham took an interest in computing, enrolled in a one-year course, the “Diploma in Numerical Analysis and Automatic Computing”, and became involved with the Cambridge Language Research Unit, headed by Margaret Masterman.
While teaching, Spärck Jones would visit Cambridge at weekends to “go in to find out what the Language Unit was doing, because I thought that they were rather interesting” (Karen Spärck Jones, an oral history conducted in 2001 by Janet Abbate, IEEE History Center, Hoboken, NJ, USA.). She eventually caught Masterman's attention and was offered a research position in the Language Unit, where she began working on machine translation and information retrieval.
The major turning point in her career came while she was working for Masterman. Spärck Jones wanted to find a way to program a computer to understand words whose meanings vary from one context to another, which led her to the arduous task of programming a massive thesaurus. This work became her 1964 doctoral thesis, “Synonymy and Semantic Classification”, now widely regarded as a foundational work in natural language processing.
In 1972, Spärck Jones published a revolutionary paper that would lay the groundwork for the modern search engine. In “A Statistical Interpretation of Term Specificity and Its Application in Retrieval”, she combined statistics with linguistics to show how a computer could weight index terms according to how specific they are, an approach known as index-term weighting. The measure she introduced, now called inverse document frequency (IDF), does not count how many times a term appears in a given document (that is term frequency). Instead, it gives a term more weight the fewer documents in a collection it appears in: a word like “the”, which appears almost everywhere, says little about any particular document, while a rare term is a strong signal of what a document is about.
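To make the difference between term frequency and inverse document frequency concrete, here is a minimal Python sketch. It assumes the common textbook form idf = log(N / n), where N is the number of documents and n is the number of documents containing the term; the exact weighting function in the 1972 paper may differ in detail. The function names `inverse_document_frequency` and `tf_idf`, and the three toy documents, are illustrative assumptions, not taken from the original paper.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return {term: idf} using the textbook form idf = log(N / n).

    N is the number of documents; n is the number of documents that contain
    the term at least once. (Assumed form, not the paper's exact function.)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


def tf_idf(term, doc, idf):
    """Hypothetical helper: raw term frequency in one document times its IDF."""
    tf = doc.lower().split().count(term)
    return tf * idf.get(term, 0.0)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the history of information retrieval",
]

idf = inverse_document_frequency(docs)
print(round(idf["the"], 3))        # in all 3 documents -> 0.0
print(round(idf["cat"], 3))        # in 2 of 3 documents -> 0.405
print(round(idf["retrieval"], 3))  # in 1 of 3 documents -> 1.099

# "the" occurs twice in the first document, yet its weight is still zero,
# because it appears in every document and so distinguishes none of them.
print(tf_idf("the", docs[0], idf))  # -> 0.0
```

The rarer the term across the collection, the higher its weight, which is exactly the idea of term specificity: a search for "retrieval" can single out the third document, while "the" is useless for ranking.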
Index-term weighting and inverse document frequency, as Spärck Jones outlined them in 1972, are widely recognized as founding principles on which modern search engines operate.
Shortly before her death in 2007, Spärck Jones gave an interview to the British Computer Society in which she reflected on her contribution to search engine technology: “Anything that does index-term weighting using any kind of statistical information will be using a weighting function that I published in 1972…pretty much every web engine uses those principles”.