Dr. Mark Davis[1]
The title of this keynote is “Globalization: Resistance is Futile”. While playing with ideas for the title, I mentioned this one to my wife. She is always amused by my California accent, and pointed out that people might think I meant the word feudal — meaning a system of governance in mediæval Europe — rather than futile — meaning pointless. In her view, my accent is degraded English; of course, in my view it is merely removing a few unnecessary distinctions in speech…
The phrase “Resistance is Futile” is instantly recognized by Americans. In a well-known science-fiction television series, this phrase is uttered by the Borg, members of a vast, mechanistic society where people are subsumed into a so-called ‘collective’. As the Borg encounters each new culture, it inevitably consumes and absorbs that culture, making it over into an image of the Borg. Only the intrepid heroes of the series stand between the Borg and eventual domination, everywhere.
Many people have similar fears back here on Earth. As we have seen by the riots associated with the WTO in
Often, the term ‘globalization’ is taken as a synonym for ‘Americanization’. In many cases the most visible external influences are American, and the spread of McDonald’s and Starbucks only serves to confirm this opinion. Yet the phenomenon of globalization is not simply American; the cell-phone culture, with its impact on people’s daily lives, first arose outside of the United States. The whole topic of globalization is quite complex, but it is clear that the sources of globalization are, themselves, global.
To a certain extent, some fears of assimilation are overblown. For example, many are concerned about the number of English words entering their language. It is striking to them how many words are popping up, and it feels like their language is slowly being consumed—and even degraded (perhaps much like my wife feels about my English). Yet if carefully measured, these new words, in percentage, are an extremely small part of the normal daily vocabulary, and do not affect the fundamental grammatical structure of the language. The additional terms that are useful, that do capture a new concept, will probably survive; those that are merely fashionable, in the moment, will wither away.
Yet languages do change; otherwise I’d be standing before you speaking Indo-European.
The world as a whole is changing, and becoming more and more interdependent. Yet such change is not new; this process has been going on since prehistoric times. Geographical and environmental factors shaped the modern world, as argued in books like Guns, Germs, and Steel by Jared Diamond. Societies that had a head start in food production advanced beyond the hunter-gatherer stage, developing more complex social organizations that allowed them to develop technical innovations such as the wheel or the yoke, and skills such as horticulture, or metallurgy. The more connections between cultures, the more communication between peoples, the faster such innovations developed.
A key point is that societies that were open to influences from the outside prevailed. Societies that were shut off, or shut themselves off, from contact with the outside were isolated from the ferment of ideas that contributed to technological and social advances. And thus isolated, these societies inevitably fell behind.
This does not mean that we can’t influence the direction that globalization takes, and the impact on our lives. Societies can adopt different strategies for dealing with the dislocations that it causes. Yet whatever we think of globalization in this sense, in the long term it is inevitable; we will see more and more interaction between different cultures, and without a doubt that will lead to changes in all of them. In the long run, resistance is futile.
Yet in the context of this conference, we deal with the term globalization in a very different sense; and one that is not nearly so controversial nor so complex. I speak here of product globalization, and even more specifically, of software globalization.
So how does this usage differ?
To really trace the origin, we have to go back in time. Not, in this case, millennia, but at least decades.
I myself first became involved in this area some twenty years ago at Apple Computer. At the time, people were only starting to realize the advantages of selling software products outside of their home markets, products that were translated so that others could, well, understand them. But more than simply translated, the products needed to be localized. What does this mean?
A localized product is fully adapted to the cultural requirements and expectations of a given user community. The common conventions of that community are used for any dates, times, numbers, currency amounts, text searching and sorting, screen layout, accounting practices, and so on.
A fully-localized program, insofar as it presents an interface to users, will also make use of images and metaphors for usage that are appropriate to that user community. This is not simply a translation of the text.
An icon of an American mailbox, for example, may be completely puzzling to people in a different environment. The worst case I’ve seen was a Pause button with the image of two animal pawprints… Obvious, and mildly amusing to a native English speaker; utterly baffling to others.
And the need for localized products is clearly there. Over 95% of the world’s population are not native English speakers. Even if we count secondary speakers of English, those who are functional in the language but not native speakers, over 92% of the world needs a language other than English. Take the top ten languages in total, including both primary and secondary speakers[2]:
18% Mandarin Chinese
7.5% English
5% Spanish
4.5% Russian
4% French
4% Hindi/Urdu
3.5% Arabic
3% Portuguese
3% Bengali
2% Japanese
1.5% German
It is still less than half the world’s population (and that does not count overlap in speakers). And if we look at the areas of growth, it is clear that monolingual software is simply missing the boat.
The end goal for localization is to present a product that a user can easily use, no matter what his or her culture or language is. Of course, I am preaching to the choir at this meeting; I wouldn’t expect to get a great hue and cry against localized products at the LISA conference…
But the development of localized products was painful, expensive, and time-consuming. Non-English-speaking customers were often short-changed because of these barriers to product development. And the more their writing systems differed from English, the greater the barriers that loomed.
The next advance was the development of what are called internationalized products.
An internationalized product is one that can be localized without modification, by the addition or replacement of data modules (called resources). Internally, it is modularized, and accesses language-specific services such as sorting through common interfaces.
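A minimal sketch of this separation follows, assuming a hypothetical in-memory resource table. The names RESOURCES and localize are illustrative only and do not come from any particular product. The point it tries to show is that the program logic stays fixed while a new language arrives purely as data.

```python
# Hypothetical resource modules: one dictionary of UI strings per locale.
RESOURCES = {
    "en": {"greeting": "Welcome, {name}!", "quit": "Quit"},
    "fr": {"greeting": "Bienvenue, {name} !", "quit": "Quitter"},
}


def localize(key, locale, fallback="en", **params):
    """Look up a UI string for a locale, falling back to a default locale."""
    table = RESOURCES.get(locale, RESOURCES[fallback])
    template = table.get(key, RESOURCES[fallback][key])
    return template.format(**params)


# Adding a language means adding data, not modifying the code above.
RESOURCES["de"] = {"greeting": "Willkommen, {name}!", "quit": "Beenden"}

print(localize("greeting", "de", name="Anna"))
print(localize("quit", "ja"))  # no Japanese resources yet: falls back to English
```

A real product would load such tables from separate files or resource bundles rather than from a literal in the source, but the lookup-with-fallback shape is the same.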
Internationalized products represented a tremendous advance both for companies and for clients. By separating out all the features of a product that needed localization into separate modules, companies could deliver products much more quickly. With a common code base, maintenance and support became easier, faster, and cheaper. And because the incremental cost associated with additional languages is vastly lowered, markets that would have been uneconomic are now open.
But the development of internationalized products was a house built on a very creaky foundation. That foundation is one of the most basic elements we need in computers, the storage of text.
Now internally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. But as people developed products for different languages, they would develop their own assignments of numbers. Eventually, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone required several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflicted with one another. That is, two encodings would use the same number for two different characters, or use different numbers for the same character. What is even worse is that computers would use the same name for conflicting encodings. The term “Shift-JIS”, for example, can apply to a dozen different encodings. Any given computer (especially servers) needed to support many different encodings; yet whenever data was passed between different encodings or platforms, that data always ran the risk of corruption.
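Both kinds of conflict can be seen in a small sketch. The encodings chosen here (Latin-1, Windows-1251, ISO 8859-7, and the old DOS code page 437) are my own illustrative picks, using the codec names from Python's standard library.

```python
# One stored number (byte value 0xE9) read under three legacy encodings
# yields three unrelated characters: Latin, Cyrillic, and Greek.
raw = bytes([0xE9])
for encoding in ("latin-1", "cp1251", "iso8859_7"):
    print(encoding, raw.decode(encoding))

# Conversely, one character is stored as different numbers
# depending on which legacy encoding is in use.
for encoding in ("latin-1", "cp437"):
    print(encoding, hex("\u00e9".encode(encoding)[0]))
```

If the receiving system guesses the wrong encoding, nothing fails visibly; the text is simply displayed as different characters, which is why this kind of corruption was so hard to detect.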
And because the different encodings required different internal architectures to handle them, it was difficult, slow, and error-prone to have to deal with all of them.
What we needed was a way to deal with text data in a uniform manner, no matter what the language, and no matter what the computer platform or product. That way we could mix data from any source without corruption. That way we could design products with a single internal architecture for text handling, regardless of language. And that way we could finally give people around the world what they needed from their computers.
Back when I was working at Apple, we had kicked around different ideas for a radically new kind of character encoding that would encompass all the world’s languages, but we never fleshed out these ideas. That is, not until we heard about a proposal from colleagues at Xerox for such a new kind of character encoding, a character encoding originated by Joe Becker, and baptized 'Unicode'.
Apple and Xerox formed the core of a group dedicated to developing Unicode, and were soon joined in a consortium by other companies, including Adobe, HP, IBM, Microsoft, RLG, Sybase, and Sun. The consortium later grew to include companies such as JustSystem, Oracle, PeopleSoft, and SAP, as well as many other organizations, even some national governments. This consortium also worked closely with the international standards organization.
The goal was simple: unify the many hundreds of separate character encodings into a single, universal standard. It would provide a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
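As a small illustration of "a unique number for every character", the sketch below uses Python's standard unicodedata module. The helper name describe and the sample characters are my own choices for the example.

```python
import unicodedata


def describe(ch):
    """Return a character's Unicode code point and its standard name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Latin, accented Latin, Cyrillic, and Chinese characters each have exactly
# one number, whatever platform or program is asking.
for ch in ("A", "\u00e9", "\u0439", "\u4e2d"):
    print(describe(ch))

# The number is independent of how the text is stored: different storage
# forms of Unicode decode back to the same code point.
assert "\u4e2d".encode("utf-8").decode("utf-8") == "\u4e2d"
assert "\u4e2d".encode("utf-16").decode("utf-16") == "\u4e2d"
```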
By the way, the people involved in Unicode are a curious mixture. On the one hand, they are hard-headed pragmatists, bent on fashioning a standard that could be used effectively without compromising on speed or capabilities. On the other hand, although the main focus has always been on current and future market requirements, time has also been allocated to the writing systems of uncommon languages such as Mongolian, Limbu, or Tai Le. While these languages may not have immediate commercial application for the Unicode member companies, including their writing systems in Unicode helps in at least some small way to preserve those languages, and allows them to move into the computer age.
With Unicode in place, the stage was set for the next major advance, to globalized products.
A globalized product is internationalized for multiple languages simultaneously, using a single, uniform character encoding in all of its internal processing.
A globalized program can support data from any language without any intervening installation process, and can freely intermix data in those languages without risking data corruption. A globalized product with a user interface also has the capability to localize it to any desired language, and to switch the user interface from any of the localized languages to another one, without reinstalling. Think of this as being able to plug in new languages at will.
A globalized server can simultaneously serve the needs of many different clients, having many different languages. And we must remember that nowadays, even a server will often have a user interface, as it interacts with users via browsers and other mechanisms.
A globalized product can, of course, interact with systems using older, legacy, character encodings. These can be converted on input into Unicode, and then processed in a uniform manner along with any other textual data in the system. Whenever required, a product or system can always convert the text to other encodings on output, in dealing with older interfaces that are not yet Unicode-capable.
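The convert-on-input, convert-on-output pattern might be sketched as follows. The function names to_unicode and from_unicode are hypothetical, and the legacy inputs are simulated by encoding sample strings in Shift-JIS and Windows-1251 rather than read from a real older system.

```python
def to_unicode(raw, source_encoding):
    """Convert legacy-encoded bytes into Unicode text on input."""
    return raw.decode(source_encoding)


def from_unicode(text, target_encoding):
    """Convert Unicode text to another encoding on output.

    Characters the target encoding cannot represent are replaced with '?'.
    """
    return text.encode(target_encoding, errors="replace")


# Simulated input from two older systems using different legacy encodings.
legacy_japanese = "\u65e5\u672c".encode("shift_jis")
legacy_russian = "\u043c\u0438\u0440".encode("cp1251")

# Once converted, both are ordinary Unicode text and can be freely mixed.
mixed = to_unicode(legacy_japanese, "shift_jis") + " / " + to_unicode(legacy_russian, "cp1251")

# Output to a Unicode-capable interface is lossless ...
utf8_out = from_unicode(mixed, "utf-8")
# ... while output to a legacy interface loses whatever it cannot represent.
cp1251_out = from_unicode(mixed, "cp1251")
print(utf8_out.decode("utf-8"))
print(cp1251_out)
```

The lossy second conversion is the reason to keep Unicode as the internal form and to convert to older encodings only at the boundary, when an interface demands it.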
Globalized products bring all the advantages of internationalization, and more. For example, consider large customers, whether they be companies, governments, or other organizations. Large customers only deploy a product after internal testing, so they will see savings in maintenance, testing, and support. They also realize savings in their own applications where they do have to develop and test for different languages. Globalization is of particular advantage in web-based services. In large-scale server systems, for example, the workload can be balanced more effectively when there does not have to be a dedicated server per language.
With Unicode at the base, other pieces of infrastructure began to build upon it. One of the most important is XML, a key component of modern interchange of data. XML not only uses Unicode but requires it. At a basic level, XML interprets text as Unicode. It does allow the use of other encodings, but even in that case the characters in the other encoding are always interpreted by reference to their Unicode equivalents.
This percolates out through all of the different standards built on XML, such as XML Schema and SOAP (Simple Object Access Protocol), and to products built on them. These standards apply directly to the development of globalized software. For example, XML Schema provides mechanisms for interchanging data such as dates, times and numbers in a language-neutral fashion. Whenever such data ends up being displayed to a user, it can then be formatted in a localized fashion for that display.
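To make the interchange-versus-display split concrete, here is a rough sketch. It assumes the data arrives in the lexical forms that XML Schema uses for dates (2003-02-17) and decimals (1234567.89). The CONVENTIONS table and the two display functions are hypothetical stand-ins for a real localization library, and cover only two locales.

```python
from datetime import date
from decimal import Decimal

# Hypothetical per-locale conventions: date pattern, grouping and decimal separators.
CONVENTIONS = {
    "en_US": ("%m/%d/%Y", ",", "."),
    "de_DE": ("%d.%m.%Y", ".", ","),
}


def display_date(iso_text, locale):
    """Format a language-neutral ISO date for display in the given locale."""
    pattern = CONVENTIONS[locale][0]
    return date.fromisoformat(iso_text).strftime(pattern)


def display_number(decimal_text, locale):
    """Format a language-neutral decimal string for display in the given locale."""
    _, group, point = CONVENTIONS[locale]
    neutral = f"{Decimal(decimal_text):,.2f}"  # grouped with ',' and '.' first
    # Swap separators via a placeholder so the two replacements cannot collide.
    return neutral.replace(",", "\0").replace(".", point).replace("\0", group)


for locale in CONVENTIONS:
    print(locale, display_date("2003-02-17", locale), display_number("1234567.89", locale))
```

Only the final display step knows about locales; everything stored or exchanged stays in the neutral form, so the same data can be shown correctly to any user.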
My own company, IBM, is fully committed to globalized software architecture. A product such as WebSphere delivers a powerful and flexible globalized Web application server, partly because of the broad implementation of the range of leading-edge open standards. Other companies are moving in the same direction, supporting a layer of tools and infrastructure that allow globalized products to be easily developed, and easily globalized from the beginning. Languages such as Java and C# use Unicode natively, and provide a wide range of globalization services through their APIs. They thus provide a superstructure that can easily be used to build globalized products.
Languages such as C or C++ do not have globalization services built into them, but they can be supplemented with libraries. In support of this, IBM has even open-sourced its premier set of Unicode globalization services, called the International Components for Unicode (or ICU). These libraries provide robust and full-featured globalization services with the same results across all the various platforms, without sacrificing performance.
Thus pieces are in place for companies to produce fully globalized software. There are still some rough edges in many cases, but companies can and do build tools, products and systems that handle arbitrarily many languages, with full localization for each of those languages.
In this context, it is interesting to think of a recent event here in the States. On Feb. 17, a month after the Bush administration filed a brief with the Supreme Court opposing diversity policies at the University of Michigan, more than 300 organizations representing academia, major corporations, labor unions and nearly 30 of the nation's top former military and civilian defense officials announced that they would file briefs supporting the university. Typical of the comments on this issue is a quote from Kenneth Frazier of Merck, the pharmaceutical company. “Diversity creates stronger companies,” he said. “The work we do directly impacts patients of all types around the globe. Understanding people is essential to our success.”[3]
And selling to people, all around the world, is a key to success for a great many companies.[4] On the product side, the key to being successful in multiple markets is to supply products that are fully localized. And the key to successfully producing localized products is to design them to be globalized in the first place.[5] As the barriers to entering new markets lower, any company that does not do so risks being overwhelmed by its competitors. Companies need to, and will, weigh the costs of marketing and support before they go into any particular market, but by designing globalized products from the beginning, they have the freedom to choose.
With regard to software globalization, we need not fear the Borg. Instead of people around the world being homogenized, those people’s needs are forcing companies to adapt to them. As companies convert their products to be fully globalized, we see the benefits both for the companies, in terms of broader markets, and for their clients, in terms of access to more, and better, products.
In a strong sense, software globalization is the opposite of what most people think of as globalization. Instead of homogenizing the world, it helps to preserve diverse languages and cultures. It is the best practice for software development; companies and other organizations ignore the advantages of software globalization at their peril, for if they don’t adopt it their competitors will.
Thus in the long run, resistance is futile.
[1] Copyright © 2003, Mark Davis. All Rights Reserved.
[2] It is very difficult to determine the number of speakers, primary or secondary, for a given language. So this table should only be taken as an approximation.
[3] This is not to argue for or against any particular form of affirmative action. The reason it is cited here is as an example of how important it is for companies to be able to sell all around the world, and in order to do that, to understand clients all around the world.
[4] A recent study cited by an Oracle white paper claimed that major web sites in the US turned away almost half the orders coming from outside the country. The reason given is the inability to store, access and retrieve multilingual data such as contact and address information, or to process multiple currencies. Of course, like all statistics, this should be treated with some caution: see Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists by Joel Best.
[5] It may make business sense in some cases for a company to not fully globalize the first version of a product. Time-to-market considerations may be crucial for early success. However, with modern tools there is very little marginal cost to designing and developing products that are either globalized from the start, or take little effort to upgrade. But where globalization is not considered at all in the early development, it can be very expensive to retrofit products.