How our heritage can be digitally explored

Imagine that you had digital access to all the books that have ever been written. Imagine further that you could search through them with just a few keystrokes, and that you could find out how frequently any word or expression of interest to you was used through the ages. Do you think this would be an extraordinary tool?

In fact, that tool already exists, and it can currently search through more than six million books. That is possible, first, because Google has already digitized about a quarter of the 130 million books that have ever been written, and secondly, because two brilliant scientists have produced software that allows anyone to perform such word searches in a number of languages.

I recently read the book that those two young scientists, Erez Aiden and Jean-Baptiste Michel, published in 2013, in which they describe their work and explain why it may be regarded as a quantum leap in our exploration of various aspects of history. The book, Uncharted: Big Data as a Lens on Human Culture, gives a number of illuminating examples of word comparisons that can be performed through time (going back to the earliest book available) and where, in each case, one sees how humanity sometimes gradually and sometimes suddenly changed its mindset: men and women (spoiler: the word “women” overtakes “men” in books in 1983), war and peace, gold and oil, coffee and tea, science and religion, Catholics and Muslims and many more. One quickly sees how this tool indeed becomes a “lens” on human culture.

As I have mentioned, the tool can be used by everyone; it is freely usable online. In December 2010, Aiden and Michel published a paper presenting their work in the top journal Science and simultaneously released the tool, the “Ngram Viewer”, which allows anyone to compare how often words or expressions appear in books from 1500 to today, or actually to 2008, to be accurate. The next day, it was on the front page of the New York Times, and within 24 hours, it got more than three million visits. Clearly, they had hit on something.
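For readers who like to see the idea in miniature, here is a rough sketch in Python of the kind of counting that underlies such a comparison. It is not the Ngram Viewer's own code. The function names and the tiny corpus are invented for illustration, and it assumes that a word's "frequency" in a given year means its share of all word sequences of the same length in that year's texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_frequency(corpus, phrase):
    """Return {year: share of that year's n-grams that equal the phrase}."""
    target = tuple(tokenize(phrase))
    n = len(target)
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngram_counts(tokenize(text), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus: invented sentences filed under invented years.
TOY_CORPUS = {
    1900: ["the men went to war", "men and women spoke of peace"],
    2000: ["women and men drank coffee", "the women spoke of peace and tea"],
}

if __name__ == "__main__":
    for word in ("men", "women"):
        print(word, relative_frequency(TOY_CORPUS, word))
```

In this toy corpus "men" is the more frequent word in the earlier year and "women" in the later one, which is the sort of crossover the book describes, here on a scale of two sentences per year rather than millions of books.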

I remember hearing about the Ngram Viewer a few years ago and actually visiting the webpage and doing a few searches for fun, but I had not grasped its historical and cultural significance. The book explains this very well, and in a witty style.

Indeed, applications range from the linguistic to the socio-political: One can investigate and come to understand the history and evolution of irregular verbs, social norms (through expressions, such as “slavery”, “heaven and hell”, “terrorism”, etc.), scientific topics (“evolution”, “space ship” or “space travel”, “cancer”, “autism”, etc.), how people become famous (and which types of personalities society prefers), how quickly various technologies spread through society (trains, telephones, radio, TV, internet, etc.), and many other trends.

The moment I understood all this, and well before I finished the book, I went online and decided to do some “interesting” searches. On the bare-bones webpage, I saw the languages menu, and I immediately thought: ‘Yes, I’ll search through the Arabic heritage; there are so many things one can learn from the frequency with which some words or expressions have been used in Arabic books through history!’ How disappointed I was to find that Arabic was not among the available languages, which were American English, British English, Chinese, French, German, Hebrew, Italian, Russian and Spanish.

There have actually been some efforts to digitise Arabic or Islamic manuscripts. For example, the Yemeni Manuscript Digitisation Initiative, the largest of such efforts, covers 50,000 books from the 10th century to the present, and the Wellcome Arabic Manuscripts collection has about 1,000 digitised texts on the history of medicine. There are other such initiatives, often found in Western universities. Compared to the millions of books that Google and Ngram allow one to search, those manuscript collections cover only a tiny fraction of the Arabic/Islamic heritage. The Arab-Muslim world may currently have other fish to fry, but this is an important cultural development that some governments or cultural foundations should seriously consider.

In fact, it is not just books that are being digitised (Google says it will be done with all 130 million of them by 2020). Newspapers are being digitised too (one Australian foundation has already taken care of 100 million articles), and of course our current online activity (emails, tweets, Facebook posts, Instagram pictures, YouTube videos, etc.) is already digital. Soon, the sum total of our digital records will provide both scholars and the general public with extraordinary fields to plough, where human history, society and culture could be examined from various angles.

The Arab world must move its history and culture from the dusty shelves of old libraries and private collections to the digital, open and immediately accessible world of today and tomorrow.

By Nidhal Guessoum, published in Gulf News, April 2nd 2015.

Nidhal Guessoum is a professor at the American University of Sharjah. You can follow him on Twitter at: