Wednesday, October 3, 2007

The Penguin English-Hindi/Hindi-English Thesaurus and Dictionary

published by Penguin (India)

The book consists of three parts:

  1. The English-Hindi/Hindi-English Thesaurus
  2. The English-Hindi Dictionary and Index
  3. The Hindi-English Dictionary and Index

The three parts come in a sturdy hardboard case with a sophisticated design.

The Penguin English-Hindi/Hindi-English Thesaurus and Dictionary

Preface


Language historians, population geneticists and archaeologists believe that a band of early humans, perhaps no more than 2,000 strong, acquired the amazing faculty for complex languages and invented linguistic communication. Blessed with the many advantages of meaningful speech, the band could now organize itself better, take on predators and prevail. Its population grew exponentially.

Around 50,000 years ago, the band developed sufficient navigation skills to cross the seas. Its members travelled far and wide, some settled in new colonies, others moved further, thus launching the first globalization movement. The band underwent successive population and language splits, its descendants becoming many races. Their ancient tongue has long been forgotten, but it left behind more than 5,000 languages.

Thesauruses and dictionaries through the ages

Ancient Sanskrit scholars describe word or language as vyakrita vani or meaningful, analyzed, systematized voice. They called it shabd Brahm, i.e., word, the Brahm. Brahm is personified in the Indian tradition as Brahma, the Creator. His consort, the goddess Sarasvati, is known as gira (voice) or Vagdevi (the goddess of voice). Ancient Greeks called word logos, giving it the status of God. In Christian theology, word is the Ultimate Reality, especially as manifest in the creative and sustaining spirit of God as revealed in Jesus.

Words as specific sound patterns represent things and communicate commands, instructions, ideas and thoughts. They are oral icons, symbols, representations. As societies identified and invented more things, they coined more words for them based on perceived associations, similarities and dissimilarities.

Language development is an ongoing process. It has been pivotal in enriching our mental capabilities, generating new ideas, codifying complex knowledge bases, and inventing and keeping track of philosophical thought, social codes, useful techniques and scientific systems, thus contributing to present-day systemic societal organization.

Before the emergence of early scripts, man had begun to make tools to record words and standardize language by defining rules. The first lexical works were simple word lists, the precursors of the modern, vast and intricate thesauruses and dictionaries. Examples are a short seventh-century BC Akkadian word list, from central Mesopotamia, and the early-third-century BC Erya, the first Chinese language dictionary which organized Chinese characters by semantic groups.

In India, the tradition of glossaries, thesauruses and dictionaries goes back to the Vedic age, between 3000 and 1500 BC. The world's first-known and extant thesaurus is Nighantu, a glossary of 1,800 Vedic words, arranged subject-wise. Its compiler, Kashyap, was bestowed with the lofty title of Prajapati, the progenitor. Nirukt, the sage Yask's treatise on Nighantu, may have been the world's first dictionary-encyclopaedia; it gives words and their meanings which are elaborated upon in great detail.

There were several subsequent compilations of Sanskrit dictionaries. The Shabdakalpadrum, a Sanskrit dictionary of an unknown date, lists twenty-nine such works, most of which were arranged subject-wise and were, in a broad sense, thesauruses.

Amar Kosh is the bible of all the Sanskrit thesauruses. Its author, Amar Singh (Amar Simha in Roman Devanagari), gave his work the title of Namalinganushasan (the Discipline of Names and Genders). It was also called Trikaand, because it was divided into three hierarchical cantos with twenty-five chapters having a total of 8,000 words in 1,502 shlokas or verses. It is popularly known as Amar Kosh to acknowledge the achievement of its author, just as the English thesaurus, in all its editions and variations, is better known as Roget's Thesaurus.

When the Amar Kosh first made its appearance is not known, but it may have been written between the fourth and the tenth centuries AD. Ancient Indians rarely kept records of dates! Like the later Roget's Thesaurus, Amar Kosh was an instant success. Its fame spread beyond the Himalayas and it became the subject of numerous treatises. It is said that one Pandit Gunaraj translated it into Chinese in the sixth century. Khalikbari, by the Hindi–Persian poet Ameer Khusro (thirteenth–fourteenth century AD), was directly inspired by it. His Persian–Hindi thesaurus-cum-dictionary can be counted among the early bilingual thesauruses of the world.

Most Sanskrit and Indo–Persian dictionaries till the nineteenth century were arranged in a rhyming order. In non-script and pre-printing societies, versification was the accepted way of writing important books on the premise that it is easier to remember a verse than a prose paragraph. This also explains the proliferation of synonyms in these languages; it helps to have parallel words at hand, to balance a metric line.

The advent of modern lexicography goes back to early-seventeenth-century England. The first English dictionary is believed to be Robert Cawdrey's Table Alphabeticall of 1604. It included 3,000 words and contained little more than synonyms. The first comprehensive dictionary was Thomas Blount's Glossographia in 1656. But the first true modern English dictionary was Samuel Johnson’s Dictionary of the English Language (1755).

In 1806, Webster published A Compendious Dictionary of the English Language, the first American dictionary. Immediately thereafter, he went to work on his magnum opus, An American Dictionary of the English Language, for which he learned twenty-six languages, including Anglo-Saxon and Sanskrit, in order to research the origins of his mother tongue. This book, published in 1828 with 70,000 entries, set a new standard in lexicography. Many felt that it surpassed Samuel Johnson's 1755 British masterpiece, not only in scope but in authority as well.

The largest dictionary of the world is het Woordenboek der Nederlandsche Taal (WNT) (the Dictionary of the Dutch language). It took 134 years to create (1864–1998) and has approximately 4,30,000 entries on 45,805 pages in 92,000 columns.

A big landmark in modern lexicography was the publishing of Dr Peter Mark Roget's thesaurus in 1852. This edition had 1,500 words arranged in a systematic, subject-wise manner. Roget's work gave the writer his first tool to select the right word for a concept. Since then, its newer editions have had many words added to them, culminating in the vast international editions of today.

Contact with the West and the establishment of British rule in the eighteenth–nineteenth centuries gave a new impetus to language studies in India. The rulers needed to understand their subjects better and better and the discipline of Indology came into being. Simultaneously, great efforts were afoot to propagate Christianity. To make vernacular translations of the Bible, Christian missionaries took to learning Indian languages and made grammars to fulfil their needs. Scholars made bilingual dictionaries; among them is the famous and still unrivalled Sanskrit–English Dictionary (1872) by Sir Monier Monier-Williams.

Even before Independence, many individuals and organizations in India were making Hindi, English–Hindi and Hindi–English dictionaries. The vast Hindi dictionaries of Nagari Pracharini Sabha (Varanasi) and Hindi Sahitya Sammelan (Allahabad) are examples of the remarkable collective work and modern India's attempts in lexicography. India’s independence from British rule in 1947 greatly accelerated the process; the nascent nation had to come to terms with a new world. This gave a new urgency to dictionary making.

Under British rule, many Indians opposed the use of English, which they viewed as an imperial imposition on the country. After Independence, however, English was increasingly perceived as an important portal of India to the world. This explains the emphasis on the creation of English–Hindi and Hindi–English dictionaries. Some bilingual dictionaries between Hindi and languages like Russian and German were also made. The Government of India set up commissions to coin technical terms so that Hindi could replace English as the medium of education, governance and technological development.

We decide to make a thesaurus

Arvind first came to know of and use Roget's work in 1952 and wished Hindi had such a wonderful tool. He hoped that in the new spirit of dictionary making in India, a Hindi thesaurus would soon be made too. Two decades later, Arvind was in Bombay (now Mumbai), editing a Hindi fortnightly magazine, Madhuri, for the Times of India group. There was still no Hindi thesaurus on the horizon. On the evening of Christmas Day 1973, it occurred to him that he would have to make it. The next morning, we discussed the idea during our walk and decided to go ahead with the work.

We were well aware that the colossal job would require our full-time dedication. Arvind would have to leave his lucrative job and, in the absence of any financial support, we would have to live simply off our savings.

We spent some months in collecting reference material. On 19 April 1976, we started work on a part-time basis, in our off hours. Arvind would write words on specially designed cards and Kusum would later create indexes for them on a set of smaller cards. In 1978, Arvind left Bombay and we moved to Delhi. The final plunge into the ocean of Hindi vocabulary had been taken.

Arvind had imagined that we would be able to complete the work in two years (it eventually took twenty!). He had reasoned that we could follow the pattern of Roget’s Thesaurus. We assigned numbers to all the concepts and put the numbered cards in the Rogetian sequence. All that remained to be done was to fill the cards with appropriate Hindi words. Alas, it was not that simple.

To check the model, Arvind went through the first few pages of a Hindi dictionary. A large number of necessary concepts were missing in Roget's and there was no way to add more categories between the already assigned sequential numbers.

Roget's work is based on the so-called scientific classification. Language, however, is anything but scientific. While the study of words is a science, people coin words in various unscientific ways, mostly associative, but sometimes just whimsical. Associations vary from people to people and time to time and have societal contexts. The scientific system is also handicapped by difficulties that the layman may have in making a straight association of concepts. For example, in modern Rogetian editions, wheat is listed with grasses. Among its associations are bamboo, banana. No relationship has been pointed out with cereal or food. Another example is that of steel. The user thinks of steel in the context of iron. But in Roget, it is counted among alloys with no reference to iron.

When Roget’s system failed us, we considered emulating Amar Singh. However, he was out of sync with new realities. Wars or arms no longer conjure up images of warriors from the kshatriya caste. Nor would one associate lion with a kshatriya or cow with a vaishya. The shudras are no longer menials or servants. In Amar Singh’s time, music was a heavenly activity, but a musician a menial. Thus, he put music in the first canto Heavens, and musician under Shudras in the second; this would not work in the contemporary context.

It was now plain to us that we had no model; that we were on our own. There were no pointers to what order, sequence, pattern or structure we would give to our word groups. We decided to evolve our own system as we progressed. There were at least five false starts. It was fourteen years before we came upon a viable structure.

The job of adding words was divided between the two of us. Arvind took care of categories like activities, ideas, abstract nouns, verbs, adjectives, adverbs, idioms and exclamations. Kusum was assigned words relating to things, animals, trees, herbs and mythological names. She had to face unforeseen difficulties. Hindi has many words for a tree/animal and a word may stand for many trees/animals. Her problem was how to find a way to distinguish and insert a word in the right place. Fortunately for her, Sir Monier-Williams' excellent Sanskrit–English dictionary gives the New Latin technical names of such things. Kusum started making an index of New Latin technical terms, to check and re-check if her entries were right.

Computer and the Shabda Lexicographer

By 1990-91, we had a roomful of 60,000 hand-written cards with over 2,50,000 words. The cards were arranged subject-wise in specially designed wooden trays in which we were able to stack two or three rows of about 150 cards. The trays and rows were arranged in conceptual groups and subgroups. To change the sequence, we would inter-shift trays, or subgroups within a tray. The task of handling the data spread all over was getting out of hand. There was also much overlapping of categories and repetition of words.

We also had to think of the means to resolve the logistics of handling the data while publishing. The numerous cards would first have to go to typists who, we feared, would mix up their sequence or lose some cards. There could be typographical errors, or corrected type sheets could get mixed up. Typesetters at the printing press would add their own quota of errors. Even with careful proof-reading, it seemed unlikely that we would have an error-free work.

The formidable task of creating indexes also stared us in the face; once the thesaurus part of the book was typeset, a veritable army would be required to index it and, worse, indexers might supply their own share of unforgivable blunders. Without an index, a thematic thesaurus would have no meaning. Even fifteen years after starting it, the work was nowhere near completion.

At this time, our son, Dr Sumeet Kumar, a double gold-medallist MBBS, MS, from the Seth G.S. Medical College, Mumbai, was working as a resident surgeon at Dr Ram Manohar Lohia Hospital, New Delhi. There, viewing the first personal computers that were beginning to be used in India and the computerization of data at the hospital, he saw their great potential.

He suggested that we computerize our data. We initially turned down the idea, then relented. However, having over the years supported our work from our savings, we had no money for a computer or for programmers. Sumeet took up an assignment as surgeon for the National Iranian Oil Company for one and a half years, with the explicit goal of returning to India as soon as he had saved enough money to computerize our work. He was back in Delhi in 1992. After some research, we purchased our first i386 computer in May 1993.

In Iran, Sumeet also educated himself about computers and computer applications. He had determined that our work required a database programme, not just a word processor.

The importance of a database for a thesaurus or dictionary cannot be overstated. It facilitates the handling and management of data in various ways. One can add as many new categories or concepts as one likes, include extra columns, rows and fields, enter any number of synonyms, and shift groups to change or modify the sequence. Once the data is in place, duplications show up and can be removed; records or expressions can be examined, edited, changed. And, more importantly, indexing is automatic.
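For readers who want to see these advantages concretely, here is a minimal sketch in Python with the standard sqlite3 module. It is not the FoxPro application described in this preface; the table layout, the column names and the helper functions are all assumptions made for illustration, and the sample words are only there to have something to sort. It shows three things the paragraph above mentions: a new category slotted between two existing ones, duplicates found and removed, and an index generated by a query rather than by hand.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory thesaurus (hypothetical schema, sample data)."""
    conn = sqlite3.connect(":memory:")
    # 'seq' is a sortable position, kept separate from the id, so a new
    # concept can be placed between two existing ones without renumbering.
    conn.execute(
        "CREATE TABLE concept (id INTEGER PRIMARY KEY, seq REAL, heading TEXT)"
    )
    conn.execute("CREATE TABLE expression (concept_id INTEGER, word TEXT)")
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [(1, 10.0, "success"), (2, 20.0, "beauty")],
    )
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?)",
        [(1, "saphalata"), (1, "kamyabi"), (1, "saphalata"), (2, "shobha")],
    )
    return conn


def find_duplicates(conn):
    """Return (concept_id, word, count) for words repeated within a concept."""
    return conn.execute(
        "SELECT concept_id, word, COUNT(*) FROM expression "
        "GROUP BY concept_id, word HAVING COUNT(*) > 1 "
        "ORDER BY concept_id, word"
    ).fetchall()


def remove_duplicates(conn):
    """Keep the first copy of each (concept, word) pair and delete the rest."""
    conn.execute(
        "DELETE FROM expression WHERE rowid NOT IN ("
        "SELECT MIN(rowid) FROM expression GROUP BY concept_id, word)"
    )


def insert_concept_between(conn, heading, seq_before, seq_after):
    """Add a concept whose position falls between two existing positions."""
    seq = (seq_before + seq_after) / 2
    cur = conn.execute(
        "INSERT INTO concept (seq, heading) VALUES (?, ?)", (seq, heading)
    )
    return cur.lastrowid


def concept_order(conn):
    """Headings in thesaurus order."""
    rows = conn.execute("SELECT heading FROM concept ORDER BY seq").fetchall()
    return [heading for (heading,) in rows]


def build_index(conn):
    """Alphabetical index: every word with the heading it is filed under."""
    return conn.execute(
        "SELECT e.word, c.heading FROM expression e "
        "JOIN concept c ON c.id = e.concept_id ORDER BY e.word"
    ).fetchall()
```

With cards, each of these operations meant shifting trays or recopying entries; in a database each is a single query, and the index can be regenerated whenever the data changes.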

To be of any use, databases need complex programming. We soon learned that there were no programmes available for making thesauruses. We would need to get our own software package developed and customized. But computer programmers do not come cheap. Further, we discovered, no one from the several software companies we approached had any previous experience of meeting requirements as specific as ours. The task of developing a custom-built solution would take time and cost an astronomical amount.

Sumeet found he had a natural and hitherto undiscovered talent for programming and took on the daunting task. He selected FoxPro 2.0 as the most appropriate platform for our database. Over the next six months, he wrote the initial application for converting our manually written cards. He kept upgrading the programme, adding new modules to satisfy our ever-increasing demands, enabling us to view and examine the growing data, edit it, and reorganize it. His programme allowed us to earmark individual records for selection to feature in various types of mono-, bi- and multilingual thesauruses and dictionaries. He has now evolved a foolproof, almost automatic system of converting DOS data into fully formatted Adobe PageMaker and Microsoft Word documents with multilingual indexes, ready for taking camera-ready printouts.

Our labour of love first bore fruit after twenty years in the shape of Samantar Kosh Hindi Thesaurus—the first ever in Hindi. It contains 1,60,850 expressions grouped in 1,100 categories and 23,759 sub-categories. National Book Trust, India, published it in 1996 as part of the golden jubilee celebrations of Independence. We were thrilled to present its first copy on 13 December 1996 to the then President of India, Dr Shankar Dayal Sharma.

We often wonder what would have happened if we had not taken the computer route. We may still have been writing cards!

Cross-cultural linguistic tools: Need of the day

We are in the throes of yet another wave of linguistic globalization, first reflected in the exhaustive international editions of Roget's thesauruses, designed for English-speaking nations. The new world scenario calls for lexicographical works which can meet the global cross-cultural needs. Indians are contributing to the hectic global scientific, economic and cultural activity; opinion leaders, reporters, newscasters, scientists, teachers, students, and migrants have to deal with proliferating concepts in different geographical and cultural milieus. Bilingual thesauruses can suffice, to begin with, before multi-lingual ones materialize.

Makers of bilingual dictionaries would welcome one-to-one correspondences for words in any two languages. However, as linguists know, it is uncommon to find two words in two languages which have the same meanings, weights, backgrounds and associations. To give a simple example, the English word success has two Hindi equivalent words, saphalata and kamyabi. All three words have different cultural and semantic backgrounds and contexts. The word success represents a sense of reaching somewhere. Saphalata is a word emanating from an agricultural background; it literally means fruitfulness or having come to fruition. Kamyabi has an Indo-Persian origin and denotes the achievement of an objective. Its Sanskrit-based equivalent can be kritkaryata (success in one's endeavour), a term now used for thankfulness. Success leads to succession, but neither saphalata nor kamyabi can lead one to uttaradhikar.

One is also at a loss to find the English equivalent for the commonly used Hindi word, shobha. Hindi–English and Sanskrit–English dictionaries offer a number of English words as its rough equivalents: splendour, brilliance, lustre, beauty, grace, loveliness, elegance, show… None of these is satisfactory. Shobha embodies only a fraction of these put together and a lot more.

A bilingual English–Hindi/Hindi–English thesaurus was the obvious way around this predicament. For a concept in either language, it would offer a host of options to choose from, far exceeding the potential of a simple dictionary.

India has a very high density of English-knowing and -speaking people; many Indians have been educated through the English medium and are more comfortable using English than Hindi. There are also many first- and second-generation non-resident Indians, especially in the USA and UK, non-Indian researchers and scholars of Hindi, others who wish to enrich their Hindi and English vocabularies and some who simply wish to look up a correct Hindi word for an English one. There are also many people translating into and from Hindi who need parallel Hindi/English words. In addition, there are non-English non-Indians who learn English to learn Hindi, as a bridge between their mother tongue and Hindi. South Asians who share cultural traits with us can also be included in the list of people for whom such a work would be useful. Such a work would also serve the many non-Indians who would like to understand South Asian terms in the context of their own sensibilities.

With these factors in mind, we started work on an English–Hindi word bank in 1997.

We took the Hindi data as the base. Now, the first step was to add, in the FoxPro table, columns to accommodate corresponding English headings, subheadings and synonyms. The next was to find equivalent English words for them in the Samantar Kosh. To help us, our daughter Meeta Lall, gold-medallist MSc in Nutrition from Delhi's Lady Irwin College, willingly took up jotting down the English equivalents in a copy of the Samantar Kosh. (She later edited our data on food, nutrition, and health.)

From here on, it was Arvind's task to find and add more English words for all the subheads. Kusum would sometimes be pressed into service to look up Hindi–English, English–Hindi, and standalone English dictionaries to check and cross-check meanings.

Once the Hindi to English part was done, we knew a large number of non-Hindi concepts must have been left out since our data was basically Hindi and Indian. To ensure a true bilingual character with cross-cultural references, we now engaged in entering words from the English vocabulary, going from A to Z. Unique English expressions had to be inserted and linked to Indian culture at appropriate places and Hindi equivalents added for them. This process of cross-fertilization has helped us change, enrich and improve the Hindi data too. Many new categories have been added, and many more expressions included.

Now we may rightfully claim to have a rich cross-cultural bilingual body of English and Hindi expressions, linking Indian and principal world cultures. We can also claim to have developed a unique easy-to-use database system, adaptable to the growing requirements of a lexicographic group.

The initial programming and first entries to the data on the English–Hindi/Hindi–English thesaurus and dictionary were made in Kuala Lumpur (Malaysia) where Sumeet was getting his hospital management system installed. Since then the work has moved from country to country and within India from town to town. For two years, we worked on it in Dallas (Texas) and Tulsa (Oklahoma). In India, we worked on it in Ghaziabad and Chennai. The last four years saw us work in Pondicherry (renamed Puducherry) and Auroville, founded in 1968 as an international township that aspires to realize human unity. As a consequence of the growing worldwide influence of Sri Aurobindo and the Mother, Auroville has residents from over forty countries, engaged in cross-cultural exchange, social experimentation and innovation.

The Penguin English–Hindi/Hindi–English Thesaurus and Dictionary (द पेंगुइन इंग्लिश–हिंदी/हिंदी–इंग्लिश थिसारस ऐंड डिक्शनरी) is in three parts. The first part is The English–Hindi/Hindi–English Thesaurus (द इंग्लिश–हिंदी/हिंदी–इंग्लिश थिसारस), the second is The English–Hindi Dictionary and Index (द इंग्लिश–हिंदी डिक्शनरी ऐंड इंडैक्स) and the third is The Hindi–English Dictionary and Index (द हिंदी–इंग्लिश डिक्शनरी ऐंड इंडैक्स).

We are happy it is being published in the diamond jubilee year of Independence.

We would like to thank…

This work may not have seen the light of day if it were not for the large number of well-wishers who encouraged and applauded us. They are too numerous for us to list individually but we express our heartfelt gratitude to them.

Our special thanks go to Meeta for her initial and valuable input and to her husband Atul who gave her and us moral support. As for Sumeet, we do not know how to thank him!

We also thank Udayan Mitra of Penguin India for taking personal interest in its publication, and Neeta Gupta and Meeta for coordinating between Penguin India and us.

Arvind Kumar, Kusum Kumar

5 May 2007

6 comments:

Nitin Vashist said...

Awesome work!

Shah4U.S.Senate said...

Bravo! Congratulations for the greatest accomplishment in the history of spoken languages.

You deserve the nomination for the Nobel Prize in Literature for your contribution in producing the Koshnama, a marriage between the two major (European and Indian) languages spoken by over half of the world's population.

You have built a linguistic bridge. This is not an insignificant task.

hridayesh said...

Dear sir, I have heard that a software interface to accompany your dictionary is coming in November.

Is that true? When will it be available to purchase?

3D Animation said...

I have started Hindi Sorting Service. Whatever software you are using for Hindi Publishing, You can Export/Save as Text File.
Send me that Text File, I will sort it according to Hindi Alphabets and send you back. No Advance. Send me the unsorted text file,
and the details of the font you are using. 50% of your sorted output I will send free of cost, if you like, pay me Rs.150/- Per Thousand Lines.
I will send you rest of the file.
You can see the sorted output here for ready reference
http://i240.photobucket.com/albums/ff226/kkrawal/hi_mage1.jpg
http://i240.photobucket.com/albums/ff226/kkrawal/hi_mage2.jpg


patricia_animation@hotmail.com

Unknown said...

I have quoted the following paragraph from your post at my link here, with a link back to this post : http://wp.me/pm0v9-1AN
Please let me know that you do not mind...
"The world's first-known and extant thesaurus is Nighantu, a glossary of 1,800 Vedic words, arranged subject-wise. Its compiler, Kashyap, was bestowed with the lofty title of Prajapati, the progenitor. Nirukt, the sage Yask's treatise on Nighantu, may have been the world's first dictionary-encyclopaedia; it gives words and their meanings which are elaborated upon in great detail.

There were several subsequent compilations of Sanskrit dictionaries. The Shabdakalpadrum, a Sanskrit dictionary of an unknown date, lists twenty-nine such works, most of which were arranged subject-wise and were, in a broad sense, thesauruses.

Amar Kosh is the bible of all the Sanskrit thesauruses. Its author, Amar Singh (Amar Simha in Roman Devanagari) gave his work the title of Namalinganushasan (the Discipline of Names and Genders). It was also called Trikaand, because it was divided in three hierarchical cantos with twenty-five chapters having a total of 8,000 words in 1,502 shlokas or verses. It is popularly known as Amar Kosh to acknowledge the achievement of its author."

सत्यात्री / SATYAATRI said...

if i buy this dictionary-thesaurus do i still need to purchase Sahaj Samantar Kosh?