Monday, November 05, 2007

Ruby Community Seeks Autotranslator

As many of you know, Ruby was created in Japan by Yukihiro Matsumoto, and most of the core development team is still Japanese to this day. This has posed a serious problem for the Ruby community, since the language barrier between the Japanese core team and community on one side and the English-speaking community on the other is extremely high. Only a few members of the core team speak English comfortably, so discussions about Ruby's future, bug fixes, and new features happen almost entirely on the Japanese ruby-dev mailing list. That leaves those of us English speakers on the ruby-core mailing list out in the cold.

We need a two-way autotranslator.

Yes, we all know that automated translation technology is not perfect, and that for East Asian languages the output is often barely readable. But even partial, confusing translations of the Japanese emails would be better than nothing at all, since we'd at least know which topics are being discussed. And English-to-Japanese translation does a bit better than the reverse direction, so core team members interested in ruby-core emails would get the same benefit.

I imagine this is also part of the reason Rails has not taken off as quickly in Japan as it has in the English-speaking world: the Rails core team is peopled primarily by English speakers, and the main Rails lists are all in English. Presumably, an autotranslating gateway would be useful for many such communities.

But here's the problem: I know of no such service.

There are multiple translation services, both free and paid, that can handle Japanese to some degree. Google Translate and Babelfish are the two I use regularly. But these only support translating a block of text or a URL entered into a web form. There also does not appear to be a public API for Google Translate, so screen-scraping would be the only option at present.

The odd thing about this is that autotranslators are now good enough that there could easily be a generic translation service for dozens of languages. Enter the source and target languages and the source and target mailing lists, and it would busily chew through the mail. For closely related European languages, autotranslators do an extremely good job. And just last night I used Google Translate on a Chinese blog post, and the result read as almost perfect English. The time is ripe for such a service, and making it freely available could knock down some huge barriers between international communities.
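To make the idea a little more concrete, here is a minimal Ruby sketch of the gateway's core loop. It assumes a pluggable translator (any callable taking `text, from, to`), since there is no public translation API to call; the names `TranslationGateway`, `relay`, `relay_all`, and `two_way` are hypothetical, and actually fetching and posting mail is left out entirely.

```ruby
# Minimal sketch of a one-way mailing-list translation gateway.
# Hypothetical: the translator is any callable taking (text, from, to);
# no real translation service API is assumed. Fetching and posting mail
# is out of scope.
Message = Struct.new(:subject, :body)

class TranslationGateway
  attr_reader :source_lang, :target_lang, :source_list, :target_list

  def initialize(source_lang:, target_lang:, source_list:, target_list:, translator:)
    @source_lang = source_lang
    @target_lang = target_lang
    @source_list = source_list
    @target_list = target_list
    @translator  = translator
  end

  # Translate one message's subject and body, tagging the result so
  # readers on the target list know it is machine output.
  def relay(message)
    subject = @translator.call(message.subject, source_lang, target_lang)
    body    = @translator.call(message.body, source_lang, target_lang)
    note    = "(machine-translated from #{source_list}, #{source_lang} to #{target_lang})"
    Message.new("[#{source_list}] #{subject}", "#{body}\n\n#{note}")
  end

  # "Chew through" a batch of messages pulled from the source list.
  def relay_all(messages)
    messages.map { |m| relay(m) }
  end
end

# A two-way gateway is just two one-way gateways with the roles swapped.
def two_way(lang_a, list_a, lang_b, list_b, translator)
  [
    TranslationGateway.new(source_lang: lang_a, target_lang: lang_b,
                           source_list: list_a, target_list: list_b,
                           translator: translator),
    TranslationGateway.new(source_lang: lang_b, target_lang: lang_a,
                           source_list: list_b, target_list: list_a,
                           translator: translator)
  ]
end

# Usage with a stand-in translator that only labels the text:
fake = ->(text, from, to) { "#{from}>#{to}: #{text}" }
dev_to_core, core_to_dev = two_way("ja", "ruby-dev", "en", "ruby-core", fake)
puts dev_to_core.relay(Message.new("Subject", "Body")).subject # prints "[ruby-dev] ja>en: Subject"
puts core_to_dev.relay(Message.new("Hi", "Body")).subject      # prints "[ruby-core] en>ja: Hi"
```

Keeping the translator as a plain callable means a screen-scraper, a paid service, or a human volunteer could be swapped in without touching the mailing-list plumbing.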

So, who's going to set it up first and grab the brass ring (or is there a service I've overlooked)?

16 comments:

Anonymous said...

I'd like to put something together; I'll see what I can do.

Jason Toy

Anonymous said...

InterTran (hundreds of languages) might be interesting, but the public service is almost always too busy to return results. Translation is computationally expensive, so in the long run such a service needs some way to cover its hardware and software costs.

Daniel Spiewak said...

What you really need is some guy who is literate in both languages. Throw him into a cage, feed mailing-list dumps and pizza in at one end, and receive translations at the other. I suppose the only sticky point is finding enough pizza to power our gerbil-cage translation system...

Unknown said...

I used to be *reasonably* conversational in Japanese, but a decade of atrophy has mostly left me with "nihongo ga sukoshi wakarimasu" ("I understand a little Japanese").

Perhaps we need a Ruby-focused language-learning community. Like Livemocha, but for geeks.

Stephan.Schmidt said...

When I was building a convention-over-configuration, component-based web framework with Brick and ERb back in 2001, the biggest problem with Ruby was that most of the cool stuff was either not documented at all or documented only in Japanese. And the developers couldn't speak English. Sad to hear that this problem still exists six years later.

The language barrier was one of the reasons we dropped Ruby for commercial work back then, and I had hoped that would have changed by now.

Peace
-stephan

Matt Stine said...

The Java Posse guys talked about this in episode #148. There is now an "unofficial" Java API to Google Translate:

http://code.google.com/p/google-api-translate-java/

Hope this helps.

Matt

Unknown said...

InterTran apparently offers a server-based product.

Someone could probably make good money building out an API on top of their server product and charging for API calls.

And then that somebody could afford to build an auto-translation gateway for the Ruby mailing lists on top of their technology. ;)

Anonymous said...

In the meantime, there is an excellent tool for reading Japanese called "Rikai-chan". You just hover over some text, and a pop-up appears with a translation.
You still need to know some Japanese grammar, but it helps a lot:

The Firefox plugin:
http://www.polarcloud.com/rikaichan/

The web-based one:
http://www.rikai.com/perl/Home.pl

Anonymous said...

Why must members of the Japanese core team use English? Why don't most members of the English-speaking community study Japanese, even though they are able to learn to 'speak Ruby'?

Dr Nic said...

@anon 11/10 - perhaps the reason the entire community is so eager to focus on English is that Ruby's syntax itself is English (alphanumeric characters, left-to-right across the page) rather than Japanese. That syntax encouraged English speakers to use Ruby. That's my guess.

Anonymous said...

Thanks, Dr. Nic, that was a very polite answer. I couldn't manage that. You know I'm really about peace and all... but the cultural ignorance of some Japanese folks just puzzles me.

Hello?! We're talking about software and computer science here. Almost every notable academic and practical achievement in this area has been published in English. This is clearly the language that connects minds across borders. Just a fact. Not my fault. And I'm not a native speaker either.

I mean, you guys really have your own ways with your culture. The reason we don't have proper Unicode support in Ruby today is that the standard didn't fully acknowledge the subtleties of Japanese.

And did you come up with something better? I mean other than an encoding scheme which represents absolutely nothing else but Japanese?

Nobody's telling anyone to learn English, but it sure would help you get a broader perspective and learn more about the parts of the world that are not Japan.

I mean, I really love you all and I have the deepest respect for your culture. But you definitely need to be less stiff - and relax.

P.S. Sorry for ranting so cowardly and anonymously, but hey, it's the internet! You may even insult me in return, and I wouldn't care...

And Mats still rules, no matter what!

Andy said...

I should add that the fact that Chinese-to-English works OK means nothing. Round-tripping English to Japanese and back to English on any of the mentioned services illustrates how bad things can get. East Asian languages are similar as far as the lexical stuff goes, but Chinese just happens to have word order and sentence structure similar to English, whereas Japanese is inflected and highly context-sensitive.

Anonymous said...

The best bet is still Google, because of their new machine-learning algorithms for translation. They do like Ruby, so if you ask them, they might well be willing to help.

Best

Anonymous said...

Hi guys, I've put a site up at: http://translator.rubynow.com
It still needs a lot of work to make it better; tell me what you think.
Thanks,
Jason Toy

Charles Oliver Nutter said...

Jason: This is a great effort! What are you using to translate? I especially like that you provide a way for other human translators to come in and provide a more accurate translation. I'll blog this if you don't mind, so people know it's out there.

A couple of missing features come to mind:

- subscription feeds
- a mailing list for each translated output would be even better

rich said...

"The" Ruby community? We Americans enjoy a lot of convenience because our large and wealthy market magnifies our influence. The Ruby community speaks many languages, and the issues are hardly unique, except that Americans are on the receiving end this time. As a matter of fact, limiting it to two-way translation keeps non-English, non-Japanese speakers from contributing to core. There are some good Spanish and Portuguese . Being Number 2 is better than *actually* being left out.