Releases: hankcs/HanLP
v2.1.1 Ancient Chinese Support
After supporting 130 languages, HanLP has officially released an open-source Ancient Chinese model. The model supports automatic word segmentation, lemmatization, part-of-speech tagging, and dependency parsing for Ancient Chinese. Thanks to multi-task learning, this single model handles all of these tasks, including both coarse-grained and fine-grained segmentation and the UPOS, XPOS, and PKU part-of-speech tag sets.
- Blog post:
- Demo:
- Performance:
- Visualization:
Full Changelog: v2.1.0...v2.1.1
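As a rough illustration of what "one model, many tasks" looks like from the caller's side, the sketch below loads a pretrained model with `hanlp.load` and groups the returned annotations by task. It is a minimal sketch under stated assumptions: `model_id` is a placeholder for whatever identifier the release publishes, and the output key names (`tok/fine`, `pos/upos`, and so on) are illustrative guesses that follow the `task/standard` naming pattern, not confirmed keys of the Ancient Chinese model.

```python
from typing import List, Mapping, Optional, Union


def keys_for_task(doc: Mapping[str, object], task: str) -> List[str]:
    """Return the output keys in `doc` that belong to `task`.

    Assumes keys look like 'dep' or 'tok/fine' (a task name, optionally
    followed by '/' and the annotation standard).
    """
    return sorted(k for k in doc if k == task or k.startswith(task + "/"))


def annotate(
    texts: Union[str, List[str]],
    model_id: str,
    tasks: Optional[Union[str, List[str]]] = None,
):
    """Load a pretrained model and run it once (requires `pip install hanlp`).

    `model_id` is a placeholder: pass the identifier published for the model.
    `tasks=None` asks the model to run every task it supports.
    """
    import hanlp  # imported lazily so keys_for_task works without HanLP installed

    model = hanlp.load(model_id)
    return model(texts, tasks=tasks)


# Hand-written stand-in for a multi-task result, NOT real model output.
# The key names are assumptions made for illustration.
example = {
    "tok/fine": ["学", "而", "时", "习", "之"],
    "tok/coarse": ["学", "而", "时", "习", "之"],
    "lem": ["学", "而", "时", "习", "之"],
    "pos/upos": ["VERB", "CCONJ", "NOUN", "VERB", "PRON"],
    "pos/xpos": [],
    "pos/pku": [],
    "dep": [],
}

print(keys_for_task(example, "tok"))
print(keys_for_task(example, "pos"))
```

Grouping by key prefix is one way to compare the tag sets a multi-task model emits side by side; passing a narrower `tasks` value to `annotate` would skip the annotations that are not needed.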
v2.1.0 English Support
What's Changed
- Release an English MTL model with ModernBERT encoder:
- Enhance Security Practices for HanLP Based on OpenSSF Scorecard by @Fix3dP0int in #1931
New Contributors
- @Fix3dP0int made their first contribution in #1931
Full Changelog: v2.1.0-beta.62...v2.1.0
v1.8.6 Routine Maintenance
What's Changed
- Update the custom dictionary in the Portable version fix: #1936
- Cleanup
- Data package remains compatible
- Portable version upgraded in sync to v1.8.6
Full Changelog: v1.8.5...v1.8.6
v1.8.5 Routine Maintenance
What's Changed
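- Fix a possible inconsistency in the mini bigram model on the first segmentation after JRE initialization fix: #1851 (comment)
- Fix unexpected segmentation results in ViterbiSegment caused by the DoubleArrayTrie not being replaced when a custom dictionary is loaded by @wxy929629 in #1835
- fix: Fix a calculation error in CWSEvaluator when comparing segmented sentences by @webSue in #1853
- Data package remains compatible
- Portable version upgraded in sync to v1.8.5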
New Contributors
- @wxy929629 made their first contribution in #1835
Full Changelog: v1.8.4...v1.8.5
v2.1.0-beta.62 Routine Release
What's Changed
- Release mMiniLMv2L12 version of MTL on UD210
- Release a small MTL model trained on our new corpora
- Multi-process compatible loader
- Support new versions of tensorflow and numpy
- Add support for Python 3.10
- Implementation of "Graph Pre-training for AMR Parsing and Generation"
- Let PipeLine support copy() by @Vela-zz in #1861
New Contributors
Full Changelog: v2.1.0-beta.0...v2.1.0-beta.62
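v1.8.4 Routine Maintenance
- Treat <> as delimiters fix
- Add a configuration method to Segment for turning Normalize on or off close #1714
- Fix the scorer.boost bug in the score calculation of the text suggestion scorer fix: #1718
- bugfix: Fix the premature loop exit in the bintrie during full segmentation by @carl10086 in #1775
- Custom dictionaries support the .tsv format fix: #1785
- Fix passing of the custom dictionary path argument fix: #1799
- Add enableFastBuild to DoubleArrayTrie by @qiangwang in #1805
- Data package remains compatible
- Portable version upgraded in sync to v1.8.4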
New Contributors
- @carl10086 made their first contribution in #1775
- @qiangwang made their first contribution in #1805
Full Changelog: v1.8.3...v1.8.4
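To make the .tsv custom dictionary item in this release more concrete, here is a small sketch of reading one tab-separated entry. It assumes the same field order as HanLP's plain-text custom dictionaries (a word followed by optional tag and frequency pairs), with tabs as the separator; `parse_tsv_entry` is a hypothetical helper written for illustration and is not part of HanLP.

```python
from typing import List, Tuple


def parse_tsv_entry(line: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Split one tab-separated entry into (word, [(tag, frequency), ...]).

    Assumed layout: word<TAB>tag<TAB>frequency[<TAB>tag<TAB>frequency ...].
    An entry holding only a word yields an empty list of pairs.
    """
    fields = line.rstrip("\r\n").split("\t")
    word, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"expected tag/frequency pairs after the word: {line!r}")
    pairs = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return word, pairs


# Because tabs separate the fields, the word itself may contain spaces.
print(parse_tsv_entry("New York\tns\t1024\n"))
```

A tab separator keeps multi-word entries such as "New York" intact, which a space-separated format cannot do.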
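v1.8.3 Routine Maintenance
- Fix the interaction between dynamic custom dictionaries and CustomDictionaryForcing fix #1712
- Adjustment fix #1670
- Dynamically determine the default frequency of out-of-vocabulary words from the total word frequency
- The next method of LongestSearcher in DoubleArrayTrie supports null as a value by @tiandiweizun in #1674
- Update the comments in DoubleArrayTrie.java by @TITC in #1699
- Data package remains compatible
- Portable version upgraded in sync to v1.8.3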
Full Changelog: v1.8.2...v1.8.3
New Contributors
v2.1.0-beta 104 languages, 10 tasks, dual backends
We are proud to announce the beta release of HanLP 2.1, which now offers 10 joint tasks on 104 languages, including tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, and abstract meaning representation (AMR) parsing.
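v1.8.2 Routine Maintenance and Accuracy Improvements
- Adjust the formula, raising Viterbi segmentation accuracy from 94.49 to 94.69
- Improve the HMM sampling function
- Support disabling automatic refresh of the dictionary cache (CustomDictionaryAutoRefreshCache=false) fix #1655
- Fix the reload method of CoreDictionary
- Revise the bigram model
- Revise the Simplified-Traditional Chinese mapping table
- Correct the final (yunmu) of lve4 to ve fix #1644
- Fix CustomDictionary.reload() fix #1635
- Data package remains compatible
- Portable version upgraded in sync to v1.8.2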
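v1.8.1 Routine Maintenance and Fixes
- Fix convertToPinyinList fix #1634
- Fix CharTable normalizing some characters incorrectly fix #1615
- Data package remains compatible
- Portable version upgraded in sync to v1.8.1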