Commit Graph

368 Commits

Author SHA1 Message Date
Anthony MOI
c02d4e2202 Python - Improve AddedToken interface 2020-06-19 17:53:46 -04:00
Anthony MOI
a14cd7b219 Python - Bump version to 0.8.0.rc2 for release 2020-06-19 10:48:53 -04:00
Anthony MOI
898a4a812e Python - Make AddedToken pickable 2020-06-19 10:34:11 -04:00
Anthony MOI
63edb95130 Python - Update AddedToken repr 2020-06-19 10:18:55 -04:00
Anthony MOI
4c7a0ff4ec Update CHANGELOGs 2020-06-18 14:50:16 -04:00
Anthony MOI
fc63d56eab AddedVocabulary - Add tests, update bindings + various tweaks 2020-06-18 14:50:16 -04:00
Anthony MOI
c6f633eb1c Rust - Fix/Tweak AddedVocabulary + Fix python tests 2020-06-16 14:42:53 -04:00
Anthony MOI
397cc539da Rust - Add AddedVocabulary + normalized option on AddedToken 2020-06-16 14:42:53 -04:00
Anthony MOI
fb964adfdb Python - Bump version to 0.8.0.rc1 for release 2020-06-11 14:24:34 -04:00
Anthony MOI
847651445e Fix build-wheels.sh script for manylinux wheels
Before this change, we added the `.so` files from previous version in
the `.whl` files of later versions.

Fix #301
2020-06-11 12:43:40 -04:00
Anthony MOI
433a311887 Update CHANGELOGs 2020-06-09 17:33:41 -04:00
Anthony MOI
794759b56d Python - Improve truncation/padding management 2020-06-09 17:33:41 -04:00
Anthony MOI
d00ac60162 Update changelogs and bump version for python release 2020-06-03 18:27:49 -04:00
Morgan Funtowicz
fcb4e76d9b Ensure pad_to_multiple_of is correctly forwarded in base_tokenizer.py 2020-05-31 10:02:59 +02:00
Anthony MOI
0934fe5803 Python - Bindings for pad_to_multiple_of 2020-05-29 20:34:41 -04:00
Anthony MOI
2a0f2337db Python - Update CHANGELOG and bump version to 0.8.0.dev1 for release 2020-05-27 14:22:00 -04:00
Anthony MOI
c205afe7a5 Python - Also allow creating Tokenizer from_buffer 2020-05-27 13:46:37 -04:00
Anthony MOI
0e890d0d05 Update CHANGELOGs 2020-05-27 13:46:37 -04:00
Anthony MOI
de9feae0b5 Python - Make Encoding pickable 2020-05-27 13:46:37 -04:00
Anthony MOI
c5bba91bf4 Python - Test and fix classes pickling 2020-05-27 13:46:37 -04:00
Anthony MOI
6a70162d78 Python - Make all relevant classes pickable 2020-05-27 13:46:37 -04:00
Anthony MOI
93bb82c657 Update READMEs and CHANGELOGs 2020-05-27 13:32:20 -04:00
Anthony MOI
b24904513c Update READMEs and CHANGELOGs 2020-05-27 13:12:46 -04:00
Anthony MOI
85c7c94809 Python - Add to/from str and files for Tokenizer 2020-05-27 13:07:53 -04:00
Anthony MOI
cffcbb95fc Rust - serialization fixes + loading/saving methods 2020-05-27 13:07:53 -04:00
Anthony MOI
c800813bbe Python - Add Tokenizer saving capability 2020-05-27 13:07:52 -04:00
Anthony MOI
2b17d4221c Python - Restore custom PyDecoder and PyPreTokenizer 2020-05-27 13:07:52 -04:00
Anthony MOI
07fb3283f4 Python - Disable custom Decoder/PreTokenizer for now 2020-05-27 13:07:52 -04:00
Anthony MOI
400d9545fd Update rust toolchain for now 2020-05-21 19:15:40 -04:00
Anthony MOI
5a01792413 Python - Update CHANGELOGs and bump to 0.8.0-dev for release 2020-05-21 18:57:02 -04:00
Anthony MOI
7ad3bda369 Merge pull request #249 from huggingface/pre-tokenized
Allow pre-tokenized inputs to encode/encode_batch
2020-05-21 18:39:46 -04:00
Anthony MOI
8cb4ca72b6 Python - Update dependencies 2020-05-20 19:55:14 -04:00
Anthony MOI
30216190e5 Python - Improve typings for new encode/encode_batch 2020-05-01 17:11:55 -04:00
Anthony MOI
3fb8033770 Python - Improve tests for new encode/encode_batch 2020-05-01 17:11:55 -04:00
Anthony MOI
efaa6f589a Python - Improve encode/encode_batch 2020-05-01 17:11:54 -04:00
Anthony MOI
dbc8e68c68 Python - Update tests for new encode 2020-05-01 17:11:54 -04:00
Anthony MOI
2e105c4258 Python - Update typings for new encode 2020-05-01 17:11:54 -04:00
Anthony MOI
835f08ab02 Python - Update bindings for new encode 2020-05-01 17:11:54 -04:00
Anthony MOI
02cc97756f Rust - Improve TruncationError 2020-04-24 12:13:17 -04:00
Anthony MOI
7d2b59b0aa Rust - Add len() and is_empty() on Encoding 2020-04-24 11:44:10 -04:00
jaymody
a28fd29204 Python - Fix bug in bert wordpiece example script 2020-04-18 17:50:52 -04:00
Anthony MOI
670f619ab5 Python - bump to 0.7.0 for final release 2020-04-17 12:48:10 -04:00
Anthony MOI
3312ad75d9 Python - Bump to 0.7.0rc6 for release 2020-04-16 19:39:04 -04:00
Anthony MOI
ad0e488998 Python - Update changelog 2020-04-16 19:32:54 -04:00
Anthony MOI
249a282f1d Python - Fix style 2020-04-16 19:31:00 -04:00
Thomas Wolf
77590b9291 style! 2020-04-17 01:29:52 +02:00
Thomas Wolf
7216486686 Update CharLevelBPE 2020-04-17 01:15:02 +02:00
Anthony MOI
873ac2d9a8 Python - Add missing char_to_word 2020-04-16 18:20:30 -04:00
Anthony MOI
bdfb02f473 Python - Bump to 0.7.0rc6 for release 2020-04-16 14:42:22 -04:00
Anthony MOI
8834508547 Update CHANGELOGs 2020-04-16 14:25:19 -04:00