As an introduction to this blog series, we previously showcased a video that paid tribute to five pioneering women and their influential contributions to laying the groundwork for future generations. Now, we’re at the fourth stop in this journey, a blog series that offers an intimate look into the lives of these women, further exploring how their extraordinary innovations have been instrumental in shaping the world we inhabit.
Our inaugural blog post took us on a trip to the 1800s, stepping into the sophisticated realms of the world’s first computer programmer, Ada Lovelace, and her uniquely eccentric mentor. The subsequent post highlighted the achievements of visionary programmer Dr. Adele Goldberg, with her significant role in the evolution of modern computing and graphical user interfaces at the center of our discussion. In our third article, we shifted our gaze toward Annie Easley, a groundbreaking figure who shattered societal norms to contribute significantly to the aerospace and energy sectors. Now, in this fourth entry, we turn our attention to Karen Spärck Jones, an acclaimed computer scientist, lecturer and author who began her career in the seemingly unrelated field of school teaching.
Born in England in 1935 to a British chemist and a Norwegian government employee, Jones was raised in an environment where her education was deemed crucial. Although she suspected her father might have preferred a science degree, Jones graduated from the University of Cambridge in 1956 with a degree in history. Following her graduation, she launched into a teaching career, but swiftly made a transition that more accurately mirrored her father’s hopes: teaching computers to understand human language.
Married to a member of the Cambridge Language Research Unit (CLRU), a small group studying machine translation of human language at the University of Cambridge, Jones often visited the lab during her time as a schoolteacher, driven by curiosity about its work. She eventually met the founder of CLRU, Margaret Masterman, who inspired Jones to join the field of computer science. Masterman, whom Jones described as a “wowsky individual” with a strong personality, offered Jones a job at the lab in 1962, which she accepted with enthusiasm. Jones admired the fact that Masterman used her maiden name professionally; Jones herself had kept her own surname after marriage, asserting that “it maintains a permanent existence of your own.”
This job signified Jones’s entry into the realm of computer science. She soon began working on information retrieval (IR), the process of identifying and ranking documents containing valuable information in response to a user’s query. Wanting to incorporate linguistics, Jones embarked on a project to transcribe the entire Roget’s Thesaurus onto punch cards. This was intended to serve as an IR classification framework to categorize words for use in machine translation experiments.
In a 2001 oral history session conducted for the Institute of Electrical and Electronics Engineers (IEEE) History Center, Jones laughed about “exploiting” her husband and his resources. She used the IR programs he had developed, along with his access to Cambridge’s more advanced computers, to run experiments for her PhD thesis, Synonymy and Semantic Classification, which she was pursuing at the same time. Her groundbreaking work explored how linguistic data, such as the semantic groups of words found in a thesaurus, could serve as raw input for computers to interpret natural (i.e., human) language text. This seminal thesis is regarded as a cornerstone of natural language processing (NLP), the science behind teaching computers to understand human language.
Jones quickly decided that she wanted to become proficient in computer programming to advance her experiments, so, with her husband’s help, she began teaching herself to program. She described her first attempt as an “enormous data-processing program” designed to input data for retrieval experiments. Reflecting on this period, Jones called it “a real saga” and recalled how her data came on “gigantic paper tapes that broke when you read them in, because they were so enormous […].”
Jones combined statistics with linguistics to produce her most acclaimed contribution to the field of IR. This groundbreaking work was introduced in her 1972 paper, A Statistical Interpretation of Term Specificity and its Application in Retrieval, which presented the concept now known as inverse document frequency (IDF).
In the field of IR, a document’s relevance to a search was traditionally gauged by counting how often each word from the user’s query appeared in that document, a measure called term frequency (TF). The more often a word appeared, the more “hits” a document received, raising its rank in the returned results. Jones, however, introduced a fresh viewpoint, arguing that not all hits deserve equal weight. She proposed that words should also be weighted by their IDF, a measure of how rare, and therefore how distinctive, a word is across the entire collection of documents being searched.
When IDF is paired with term frequency, it reduces the weight of query words that appear in many documents across the collection (for instance, “the” in “the helicopter”) and boosts the weight of words that may occur often within a document but appear in relatively few documents overall (for example, “helicopter” in “the helicopter”). A word’s TF-IDF score for a given document is calculated by multiplying these two statistics; the higher the score, the more relevant the word is considered to be to that document. This not only helps rank the documents presented to the user but also keeps documents that merely contain many common query words, such as “the” or “for,” from rising to the top of the results.
Stating that this achievement has withstood the test of time would be an understatement: IDF-style term weighting is widely regarded as a foundation of modern search engines such as Google, helping users worldwide find what they are searching for.
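As a rough illustration, the common textbook form of IDF can be written as below, where $N$ is the number of documents in the collection and $n_t$ is the number of documents containing term $t$. This modern formulation may differ in detail from the weighting as originally stated in the 1972 paper, and the document counts in the worked example are invented for illustration.

$$
\mathrm{idf}(t) = \log \frac{N}{n_t}
$$

For a hypothetical collection of $N = 1000$ documents, assuming “the” appears in every document and “helicopter” in only 10, and using base-10 logarithms:

$$
\mathrm{idf}(\text{the}) = \log_{10}\frac{1000}{1000} = 0,
\qquad
\mathrm{idf}(\text{helicopter}) = \log_{10}\frac{1000}{10} = 2
$$

A word that appears everywhere carries no weight at all, while a rare word is weighted heavily. The choice of logarithm base only rescales every weight by the same constant, so it does not change how documents are ranked.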
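To make the contrast between plain hit counting and TF-IDF concrete, here is a minimal Python sketch. It assumes raw counts for TF and the common log(N/n) form of IDF; the tiny document collection and all function names (`tokenize`, `idf`, `tf_score`, `tf_idf_score`, `rank`) are hypothetical and invented for illustration. Real search engines use far more elaborate tokenization and weighting.

```python
import math
from collections import Counter

# Toy collection (hypothetical documents, invented for illustration).
DOCS = [
    "the cat sat on the mat",
    "the helicopter landed on the helipad",
    "the the the the dog",
    "a helicopter and another helicopter flew over the town",
]


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def idf(term: str, docs: list[list[str]]) -> float:
    """Inverse document frequency, log(N / n_t).

    n_t is the number of documents containing the term, so a term found
    in every document gets log(1) = 0 and contributes nothing to a score.
    """
    n_t = sum(1 for doc in docs if term in doc)
    if n_t == 0:
        return 0.0  # unseen term: it cannot distinguish any document
    return math.log(len(docs) / n_t)


def tf_score(query: str, doc: list[str]) -> float:
    """Plain term-frequency "hits": every occurrence counts equally."""
    counts = Counter(doc)
    return float(sum(counts[term] for term in tokenize(query)))


def tf_idf_score(query: str, doc: list[str], docs: list[list[str]]) -> float:
    """Sum of TF * IDF over the query terms (raw counts as TF)."""
    counts = Counter(doc)
    return sum(counts[term] * idf(term, docs) for term in tokenize(query))


def rank(query: str, texts: list[str], use_idf: bool = True) -> list[tuple[float, str]]:
    """Return (score, text) pairs sorted from highest to lowest score."""
    docs = [tokenize(t) for t in texts]
    scored = []
    for doc, text in zip(docs, texts):
        score = tf_idf_score(query, doc, docs) if use_idf else tf_score(query, doc)
        scored.append((score, text))
    # sorted() is stable, so ties keep their original collection order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for label, use_idf in (("TF only", False), ("TF-IDF", True)):
        print(label)
        for score, text in rank("the helicopter", DOCS, use_idf):
            print(f"  {score:.3f}  {text}")
```

With plain term frequency, “the the the the dog” ranks first for the query “the helicopter” simply because it has the most hits (4). With IDF, “the” receives a weight of 0 because it appears in all four documents, so the two helicopter documents move to the top, and the one mentioning “helicopter” twice ranks highest.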
Jones spent most of her professional career moving from one research fellowship to another, living on what was commonly referred to as “soft money” (supporting herself with contract work). It wasn’t until 1999 that she officially became a professor at Cambridge. She believed the university’s tendency to be “in many ways not user-friendly, in the sense of woman-friendly” affected her ability to obtain steady work at the institution. “There were people around who thought it was perfectly okay to say, ‘Oh, well, higher education isn’t really a sort of thing you should let women have,’” Jones recalled, labeling such people as “fuddies” who set the tone for colleges “forty years ago, at least—if not four hundred years ago!”
Despite this, Jones led a multifaceted professional life. She conducted experiments, authored papers in information retrieval, and lectured at Cambridge. In 1982, the British government enlisted her services for the Alvey Program to advance British IT research. She co-authored a seminal textbook on natural language processing systems in 1993 and became president of the Association for Computational Linguistics (ACL) in 1994. Her accolades include the ACL Lifetime Achievement Award, the 2007 Athena Award from the Association for Computing Machinery Women’s Group, and an honorary Doctor of Science degree awarded by City University London in 1997. “Don’t you feel it’s odd to be one woman in a whole room full of men?” Jones remembered a company representative asking her while she was discussing the Alvey Program. “No, I’m used to it!” she responded.
In March 2007, Jones became the first woman to receive the Lovelace Medal, awarded by the British Computer Society in recognition of “exceptional contributions to computing research and computing education.” In an interview at the time, she offered the rallying cry she hoped would inspire more women to enter the tech world: “Computing is too important to be left to men.” Jones passed away a month later.
In her oral history, Jones addressed the need to inspire more girls to venture into computer science. She suggested that the spread of computing into secretarial work may have inadvertently devalued the field by associating it with boring, repetitive tasks. Nevertheless, Jones firmly maintained that computing can be captivating and appealing to women, particularly when its social applications are highlighted. She gave the example of driving: “most women drive—traffic modeling is a growing area. What is it? How should it be factored in charging for where you go, convenience of route, and so on? All these things are part of the fabric of one’s life.”
In honor of Jones, the British Computer Society established the Karen Spärck Jones Award in 2008, which recognizes researchers dedicated to natural language processing and information retrieval. In 2019, the New York Times published an extensive obituary of her as part of its Overlooked series. The obituary not only highlighted her remarkable professional accomplishments, but also captured her vibrant personality, including her “booming voice and puckish sense of humour,” and her unconventional habit of bicycling to formal dinners with her dress pinned to her handlebars.
Jones’s technological contributions paved the way for the remarkable growth and success of one of the Internet’s most widely used tools: search engines. However, her impact on modern technology extended far beyond that, as Carolina Bessega, Extreme Networks Innovation Lead, points out. “The importance of her work influenced the evolution of information retrieval, setting the foundations for more sophisticated machine learning techniques,” Bessega states. “This has allowed us to reach the point where we can transform complex content into knowledge graphs for efficient, fast, and contextually relevant information retrieval, which is instrumental in numerous applications such as search engines, personal digital assistants, and advanced data analytics.”
Jones’s revolutionary contribution, the TF-IDF technique, transformed how machines gauge the significance of words within vast amounts of data, laying groundwork for the later evolution of artificial intelligence. Bessega says, “This principle is even evident in modern large language models like GPT-4, which utilize similar concepts to generate coherent and contextually relevant responses.” With the recent explosion of machine learning and artificial intelligence into the mainstream, Bessega expects Jones’s work to keep influencing future generations of language-based technology. “Advancements in AI and data science will undoubtedly continue building upon her pioneering work, driving the next generation of machine understanding and natural language generation.”