Using wget to get the translated Google result

I want to write a bash script that uses Google Translate from the terminal to translate English into Chinese. My plan is to first use wget to fetch the translation, then use sed to extract the result. So I ran:
wget -qO- --header="Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" --header="Accept-Charset:GBK,utf-8;q=0.7,*;q=0.3" --header="Accept-Encoding:gzip,deflate,sdch" --header="Accept-Language:en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4" -U "Mozilla/5.0 (X11; Linux i686) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.27 Safari/536.11" http://translate.google.cn/#en/zh-CN/hello | gunzip > out.html
I also tried a simpler version:
wget -U "Mozilla/5.0" http://translate.google.cn/#en/zh-CN/hello
Neither gives me what I want: I can't find the Chinese translation 你好 anywhere in the output.
What am I doing wrong?


The problem is that you're actually requesting only http://translate.google.cn/ from the server, not http://translate.google.cn/#en/zh-CN/hello. The part of a URL after the hash (the fragment) is never sent to the server; it is only meant to be used by the browser. Google's page reads this part and makes the translation request using JavaScript.
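To see the effect without touching the network, here is a minimal sketch that mimics what an HTTP client does before sending the request. The function name request_url is made up for this illustration; it only strips the fragment with plain shell parameter expansion.

```shell
# Hypothetical helper: print the part of a URL that a client actually
# sends to the server, i.e. everything before the first '#'.
request_url() {
    # ${1%%#*} removes the first '#' and everything after it
    printf '%s\n' "${1%%#*}"
}

request_url 'http://translate.google.cn/#en/zh-CN/hello'
```

This prints http://translate.google.cn/, which is why the downloaded page contains no translation.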

To get the translation, you need to request the URL that the JavaScript on that page uses. Something like this would work:

curl -A "Mozilla/5.0" 'http://translate.google.com/translate_a/t?client=t&text=hello&hl=en&sl=en&tl=zh-CN&ie=UTF-8&oe=UTF-8&multires=1&prev=btn&ssel=0&tsel=0&sc=1'

This command prints a result like the following:

[[["你好","hello","Nǐ hǎo",""]],[["interjection",["喂"],[["喂",["hello","hey"],,0.0087879393]]]],"en",,[["你好",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["你好",1000,0,0],["招呼",0,0,0],["打招呼",0,0,0],["个招呼",0,0,0],["喂",0,0,0]],[[0,5]],"hello"]],,,[["en"]],6]

You can then use sed to obtain the result as follows:

curl -A "Mozilla/5.0" 'http://translate.google.com/translate_a/t?client=t&text=hello&hl=en&sl=en&tl=zh-CN&ie=UTF-8&oe=UTF-8&multires=1&prev=btn&ssel=0&tsel=0&sc=1' | sed 's/\[\[\["\([^"]*\).*/\1/'
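If you want this as a reusable function, the following is a rough sketch rather than a definitive script. The names extract_translation and translate are invented here, the sample is an abbreviated copy of the response shown above, and the reading of sl and tl as source and target language is an assumption based on the example URL. The endpoint is undocumented, so it may change or stop answering at any time.

```shell
# Abbreviated copy of the response shown above, for offline testing.
sample='[[["你好","hello","Nǐ hǎo",""]],,"en"]'

# Read a response on stdin and print the first quoted string after "[[[".
extract_translation() {
    sed 's/\[\[\["\([^"]*\).*/\1/'
}

# Hypothetical wrapper: translate "$1" from language "$2" to language "$3".
# Assumes sl/tl are the source/target language parameters.
# -s hides curl's progress meter; -G sends the -d data as a query string;
# --data-urlencode URL-encodes the text, so phrases with spaces work.
translate() {
    curl -s -A "Mozilla/5.0" -G \
        -d 'client=t&hl=en&ie=UTF-8&oe=UTF-8&multires=1&prev=btn&ssel=0&tsel=0&sc=1' \
        -d "sl=$2" -d "tl=$3" \
        --data-urlencode "text=$1" \
        'http://translate.google.com/translate_a/t' |
        extract_translation
}

# Offline check of the extraction step only (no network access needed).
printf '%s\n' "$sample" | extract_translation
```

The last line prints 你好 for the sample. A live call would look like translate 'hello' en zh-CN, but whether it succeeds depends entirely on the service still accepting this kind of request.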

However, as others have mentioned, you should not use this to abuse the translate service. For anything beyond experimentation or CLI-badassery you should (and probably have to) use the Google Translate API to avoid getting yourself in trouble. Google monitors usage and will most definitely detect any attempts to abuse its services.

P.S. I'm not qualified to give legal advice, and what I wrote above about what I consider to be "not abusing the service" is 100% personal opinion, so please don't take it as the final say in the matter.