Small project: a Python command-line dictionary with a 150,000-word offline vocabulary (source code included)

python-translate (Python command-line dictionary)

python-translate is a simple command-line translation tool. Its data comes from the Bing, Youdao, and iCIBA translation services.

[Screenshot: screenshot_v0.1.3.jpg]

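Code notes

Python version
Python 2.6+ (Python 2 only: the source uses the print statement and the commands module; argparse is part of the standard library from 2.7 and must be installed separately on 2.6)
Demo environment
BunsenLabs Linux Hydrogen (Debian GNU/Linux 8.5)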

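Basic features

English-Chinese / Chinese-English translation
Spell checking and spelling suggestions (English only)
Data storage (using the dbm module)
Word pronunciation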

Usage

usage: translate.py [-h] [-n] [-p {espeak,festival,real}]
                    [-s {bing,youdao,iciba}] [-w] [-V]
                    word

positional arguments:
  word                  word or 'some phrase'

optional arguments:
  -h, --help            show this help message and exit
  -n, --nostorage       turn off data storage
  -p {espeak,festival,real}, --pronounce {espeak,festival,real}
                        text-to-speech software: 'espeak', 'festival' or 'real'
  -s {bing,youdao,iciba}, --service {bing,youdao,iciba}
                        translate service: 'bing', 'youdao' or 'iciba'
  -w, --webonly         ignore local data
  -V, --version         show program's version number and exit

About saving query results
Query results are saved by default. To turn this off, use the -n or --nostorage option.

$ python2 translate.py hello -n

About using local data
The local database is consulted by default. To bypass it, use the -w or --webonly option.

$ python2 translate.py hello -w
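
The way the two switches interact can be sketched as follows. This is a minimal illustration written for Python 3 (the tool itself targets Python 2), not the project's own code: lookup, fetch, and db_path are hypothetical names, and fetch stands in for whichever online service is queried.

```python
import dbm


def lookup(word, db_path, fetch, webonly=False, nostorage=False):
    """Return a translation, consulting the local dbm file before `fetch`."""
    key = word.encode('utf-8')
    with dbm.open(db_path, 'c') as db:       # 'c': create the file if missing
        if not webonly and key in db:
            return db[key].decode('utf-8')   # cache hit, no network needed
        trans = fetch(word)                  # cache miss (or -w): go online
        if trans and not nostorage:
            db[key] = trans.encode('utf-8')  # remember the result (unless -n)
        return trans
```

With webonly=True the stored entry is ignored but still refreshed; with nostorage=True nothing is written back.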

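About choosing a translation service
Use the -s or --service option to pick a translation service: bing | youdao | iciba. Bing is the default. All three of the following forms are valid: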

$ python2 translate.py hello -s=youdao
$ python2 translate.py hello -s youdao
$ python2 translate.py hello -syoudao

If this option is given, webonly is switched on automatically, i.e. the local database is not consulted.
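
The three spellings and the implied webonly can be checked with a small argparse sketch. It is written for Python 3 and reproduces only the two relevant options; the parse helper is a hypothetical name used for illustration.

```python
import argparse

DEFAULT_SERVICE = 'bing'


def parse(argv):
    """Parse argv and return (service, webonly) as the CLI resolves them."""
    parser = argparse.ArgumentParser(prog='translate.py')
    parser.add_argument('word')
    parser.add_argument('-s', '--service', choices=['bing', 'youdao', 'iciba'])
    parser.add_argument('-w', '--webonly', action='store_true')
    args = parser.parse_args(argv)
    # Naming a service explicitly forces an online lookup.
    webonly = args.webonly or bool(args.service)
    return (args.service or DEFAULT_SERVICE, webonly)


if __name__ == '__main__':
    for argv in (['hello', '-s=youdao'], ['hello', '-s', 'youdao'],
                 ['hello', '-syoudao'], ['hello']):
        print(argv, '->', parse(argv))
```

The first three calls resolve to the same service, while the bare call falls back to the default and keeps the local database in play.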

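About word pronunciation
Pronunciation is off by default. To enable it, use the -p or --pronounce option and choose a backend: espeak | festival | real. The real backend plays a matching .wav file from data/RealPeopleTTS with aplay.
TTS-synthesized speech is of mediocre quality. If you have recordings of real human speech, they can be played with commands such as aplay, mpg321, or sox; edit the pronounce part of the source to change the pronunciation setup.
P.S. Voice resources can be found by searching for keywords such as "OtdRealPeopleTTS" or "WyabdcRealPeopleTTS".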

$ python2 translate.py hello -p=espeak
$ python2 translate.py hello -p=festival
$ python2 translate.py hello -p=real
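
For readers adapting the pronounce part, one possible approach is to build each backend's command as an argument list and run it without a shell, so the queried word is never interpreted by the shell. The sketch below is for Python 3 and is not the project's code; build_tts_command, speak, and voice_dir are hypothetical names, and the command-line flags are the ones used in the source listing.

```python
import os
import subprocess


def build_tts_command(word, tts, voice_dir=None):
    """Return (argv, stdin_text) for the chosen backend; argv is None if no file."""
    if tts == 'espeak':
        return ['espeak', '-v', 'en-us', word], None
    if tts == 'festival':
        return ['festival', '--tts'], word       # festival reads text on stdin
    if tts == 'real':
        target = (word + '.wav').lower()         # case-insensitive file match
        if voice_dir:
            for root, _dirs, files in os.walk(voice_dir):
                for name in sorted(files):
                    if name.lower() == target:
                        return ['aplay', os.path.join(root, name)], None
        return None, None
    raise ValueError('unknown TTS backend: %r' % tts)


def speak(word, tts, voice_dir=None):
    """Run the backend quietly; return True only if it exited successfully."""
    argv, stdin_text = build_tts_command(word, tts, voice_dir)
    if argv is None:
        return False
    try:
        proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        proc.communicate(stdin_text.encode('utf-8') if stdin_text else None)
    except OSError:                              # backend not installed
        return False
    return proc.returncode == 0
```

Separating command construction from execution also makes the mapping easy to inspect without any audio software installed.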

Library dependencies & software support

$ pip install requests beautifulsoup4 lxml pyenchant
# OR
$ pip install -r requirements.txt

eSpeak (needed for pronunciation; optional)
Festival (needed for pronunciation; optional)
ALSA (needed for pronunciation; optional)

$ sudo apt-get install libxml2-dev libxslt-dev python-dev espeak festival alsa-base alsa-utils

Tips

Set up command aliases

$ alias t="python2 /path/to/the/translate.py"
$ alias te="t -p=espeak"
$ alias tf="t -p=festival"
$ alias tr="t -p=real"
$ alias tb="t -s=bing"
$ alias ty="t -s=youdao"
$ alias ti="t -s=iciba"

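The data folder contains translation results for 150,000 English words
Editing the hosts configuration can speed up online queries; see the hosts file in the test folder
Querying words in bulk ahead of time and saving the results lets the tool work as an offline dictionary; the word list is in the spell-checker folder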
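
The bulk pre-query idea from the last tip could look roughly like this. It is a Python 3 sketch under stated assumptions rather than the script used to build the bundled data: build_offline_db, fetch, db_path, and workers are hypothetical names, and fetch stands in for a call to one of the online services.

```python
import dbm
from multiprocessing.dummy import Pool as ThreadPool


def build_offline_db(words, fetch, db_path, workers=8):
    """Query every word with `fetch` and store the hits; return how many were saved."""
    words = list(words)
    pool = ThreadPool(workers)
    try:
        # Lookups are network-bound, so a thread pool overlaps the waiting.
        results = pool.map(fetch, words)
    finally:
        pool.close()
        pool.join()
    saved = 0
    with dbm.open(db_path, 'c') as db:   # all writes happen in one thread
        for word, trans in zip(words, results):
            if trans:                    # skip words with no translation
                db[word.encode('utf-8')] = trans.encode('utf-8')
                saved += 1
    return saved
```

Collecting the results first and writing them afterwards keeps the dbm file out of the worker threads.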

Source code (v0.1.3)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import argparse
import dbm
import re
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import Process


class Bing(object):

    def __init__(self):
        super(Bing, self).__init__()

    def query(self, word):
        import requests
        from bs4 import BeautifulSoup
        sess = requests.Session()
        headers = {
            'Host': 'cn.bing.com',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
        }
        sess.headers.update(headers)
        url = 'http://cn.bing.com/dict/SerpHoverTrans?q=%s' % (word)
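        try:
            resp = sess.get(url, timeout=100)
        except Exception:  # network error or timeout; Ctrl-C still propagates
            return None
        text = resp.text
        if (resp.status_code == 200) and (text):
            soup = BeautifulSoup(text, 'lxml')
            # The <h4> text is compared with the query; reject a missing one too.
            title = soup.find('h4')
            if title is None or title.text.strip() != word.decode('utf-8'):
                return None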
            lis = soup.find_all('li')
            trans = []
            for item in lis:
                transText = item.get_text()
                if transText:
                    trans.append(transText)
            return '\n'.join(trans)
        else:
            return None


class Youdao(object):

    def __init__(self):
        super(Youdao, self).__init__()

    def query(self, word):
        import requests
        try:
            import xml.etree.cElementTree as ET
        except ImportError:
            import xml.etree.ElementTree as ET
        sess = requests.Session()
        headers = {
            'Host': 'dict.youdao.com',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate'
        }
        sess.headers.update(headers)
        url = 'http://dict.youdao.com/fsearch?q=%s' % (word)
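        try:
            resp = sess.get(url, timeout=100)
        except Exception:  # network error or timeout; Ctrl-C still propagates
            return None
        text = resp.content
        if (resp.status_code == 200) and (text):
            tree = ET.ElementTree(ET.fromstring(text))
            returnPhrase = tree.find('return-phrase')
            # Reject a response whose phrase is missing or differs from the query.
            if returnPhrase is None or returnPhrase.text is None \
                    or returnPhrase.text.strip() != word.decode('utf-8'):
                return None
            customTranslation = tree.find('custom-translation')
            # Compare with None: an Element without children is falsy.
            if customTranslation is None:
                return None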
            trans = []
            for t in customTranslation.findall('translation'):
                transText = t[0].text
                if transText:
                    trans.append(transText)
            return '\n'.join(trans)
        else:
            return None


class Iciba(object):

    def __init__(self):
        super(Iciba, self).__init__()

    def query(self, word):
        import requests
        from bs4 import BeautifulSoup
        sess = requests.Session()
        headers = {
            'Host': 'open.iciba.com',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate'
        }
        sess.headers.update(headers)
        url = 'http://open.iciba.com/huaci_new/dict.php?word=%s' % (word)
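        try:
            resp = sess.get(url, timeout=100)
            text = resp.text
            # The pattern expects backslash-escaped quotes (\") around the class.
            pattern = r'(<div class=\\\"icIBahyI-group_pos\\\">[\s\S]+?</div>)'
            # re.search() returns None on no match; the AttributeError lands below.
            text = re.search(pattern, text).group(1)
        except Exception:
            return None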
        if (resp.status_code == 200) and (text):
            soup = BeautifulSoup(text, 'lxml')
            ps = soup.find_all('p')
            trans = []
            for item in ps:
                transText = item.get_text()
                transText = re.sub(
                    r'\s+', ' ', transText.replace('\t', '')).strip()
                if transText:
                    trans.append(transText)
            return '\n'.join(trans)
        else:
            return None


path = os.path.dirname(os.path.realpath(__file__))
db = dbm.open(path + '/data/vocabulary', 'c')
DEFAULT_SERVICE = 'bing'


class Client(object):

    def __init__(self, word, service=None, webonly=False):
        super(Client, self).__init__()
        if not service:
            service = DEFAULT_SERVICE
        self.service = service
        self.word = word
        self.trans = None
        if webonly:
            self.db = {}
        else:
            self.db = db

    def translate(self):
        trans = self.db.get(self.word)
        if trans:
            return trans
        else:
            if self.service == 'bing':
                S = Bing()
            elif self.service == 'youdao':
                S = Youdao()
            elif self.service == 'iciba':
                S = Iciba()
            trans = S.query(self.word)
            self.trans = trans
            return trans

    def suggest(self):
        if re.sub(r'[a-zA-Z\d\'\-\.\s]', '', self.word):
            return None
        import enchant
        try:
            d = enchant.DictWithPWL(
                'en_US', path + '/data/spell-checker/american-english-large')
        except Exception:  # personal word list unavailable; use the stock dictionary
            d = enchant.Dict('en_US')
        suggestion = d.suggest(self.word)
        return suggestion

    def pronounce(self, tts):
        from pipes import quote  # shell-escape the word before interpolation
        if tts == 'festival':
            cmd = 'echo %s | festival --tts > /dev/null 2>&1' % quote(self.word)
        elif tts == 'espeak':
            cmd = 'espeak -v en-us %s > /dev/null 2>&1' % quote(self.word)
        elif tts == 'real':
            cmd = 'find %s/data/RealPeopleTTS/ -type f -iname %s | head -n1 | xargs -I {} aplay {} > /dev/null 2>&1' % (
                quote(path), quote(self.word + '.wav'))
        import commands
        try:
            status, output = commands.getstatusoutput(cmd)
        except Exception:
            pass
        return True

    def updateDB(self):
        if self.trans:
            db[self.word] = self.trans.encode('utf-8')
        db.close()
        return True


def parseArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('word', help="word or 'some phrase'")
    parser.add_argument('-n', '--nostorage', dest='nostorage',
                        action='store_true', help='turn off data storage')
    parser.add_argument('-p', '--pronounce', dest='pronounce', choices=[
                        'espeak', 'festival', 'real'], help="text-to-speech software: 'espeak', 'festival' or 'real'")
    parser.add_argument('-s', '--service', dest='service', choices=[
                        'bing', 'youdao', 'iciba'], help="translate service: 'bing', 'youdao' or 'iciba'")
    parser.add_argument('-w', '--webonly', dest='webonly',
                        action='store_true', help='ignore local data')
    parser.add_argument('-V', '--version', action='version',
                        version='%(prog)s 0.1.3')
    return parser.parse_args()


if __name__ == '__main__':
    args = parseArgs()
    word = args.word.strip()
    service = args.service
    webonly = args.webonly
    if service:
        webonly = True
    C = Client(word, service=service, webonly=webonly)
    pool = ThreadPool()
    _trans = pool.apply_async(C.translate)
    _suggestion = pool.apply_async(C.suggest)
    trans = _trans.get()
    if trans:
        print trans
        if args.pronounce:
            p1 = Process(target=C.pronounce, args=(args.pronounce,))
            p1.daemon = True
            p1.start()
        if not args.nostorage:
            p2 = Process(target=C.updateDB)
            p2.daemon = True
            p2.start()
    else:
        suggestion = _suggestion.get()
        if not suggestion:
            print 'No translations found for "%s" .' % (word)
        else:
            print 'No translations found for "%s", maybe you meant:\n\n%s' % (
                word, ' / '.join(suggestion))
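
The suggest method above depends on pyenchant. Where that library is not available, a rough stand-in can be built from the standard library's difflib. Note that this swaps enchant's dictionary-based suggestions for plain string similarity, so the results will differ. The sketch is for Python 3, and the vocabulary argument is a hypothetical list of known words supplied by the caller.

```python
import difflib
import re


def suggest(word, vocabulary, limit=5):
    """Return up to `limit` close matches, or None for non-English input."""
    # Same guard as the listing: only letters, digits, ' - . and whitespace.
    if re.sub(r"[a-zA-Z\d'\-.\s]", '', word):
        return None
    return difflib.get_close_matches(word, vocabulary, n=limit, cutoff=0.6)
```

The cutoff of 0.6 is difflib's default similarity threshold; raising it gives fewer, closer suggestions.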

GitHub repository: https://github.com/caspartse/python-translate

Last edited: 2017.12.04 03:07:39
