Google is an incredibly multilingual company when it comes right down to it. That’s no negligible factor, either. While Yahoo! U.S. was busy siding with Bing to finally have a shot at breaking into Google’s seemingly impenetrable 70% search share, Yahoo! Japan decided to side with the “enemy” behemoth and its algorithm simply because of Google’s linguistic capabilities. Google hasn’t stopped its linguistic development, either, adding more advanced and versatile language recognition for voice services, additional typesets in Blogger and Docs, and now OCR for 29 additional languages.
OCR, or “optical character recognition,” is the service that allows you to convert a PDF or image into an editable text document. Typically, this is done via the Google Docs platform, where users can both store their uploaded files and edit them as cloud-accessible documents. Users can tap into the power of this new OCR by simply uploading a file as normal through the Docs upload interface, then selecting the appropriate language in the newly expanded drop-down menu under the “Convert text from PDF or image” option.
When OCR was first released in June of 2010, it supported English, Spanish, German, Italian, and French. The 29 added languages, which include Russian, simplified Chinese, and a couple dozen others, bring the grand total to 34. Beyond being impressive in its own right, the addition of these extra languages means that Google is now almost as accomplished a linguist as Noam Chomsky, Leonard Bloomfield, or even my grandpa.
[via the Google Docs Blog]