Anthony MOI
e874641cf9
Merge pull request #333 from huggingface/fix-added-tokens
...
Python - Fix Added token deserialization
2020-07-06 14:52:37 -04:00
Anthony MOI
2194970679
Merge pull request #330 from huggingface/bert-normalization
...
Improve BertNormalizer behavior
2020-07-06 14:52:23 -04:00
Anthony MOI
d33af1a3be
Python - Fix Added token deserialization
2020-07-06 14:46:12 -04:00
Anthony MOI
7a95ffc4fa
BertNormalizer has the same behavior as the original implementation
2020-07-06 13:55:18 -04:00
Anthony MOI
8bf482cecc
Improve parallelism tracking and warning
2020-07-06 13:05:14 -04:00
आलोक
6fe284dd8d
Use supplied UNK token even when vocab absent
...
If a vocab file isn't provided, the supplied unk token (when different from [UNK]) is ignored, and encoding an input string that contains an unknown token later fails with:
Exception: WordPiece error: Missing [UNK] token from the vocabulary
2020-07-05 19:01:04 +05:30
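The failure mode described in the commit above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyWordPiece`, `make_model`, and the `honor_unk_without_vocab` flag are hypothetical names invented here, not the library's code or API. The sketch only shows why dropping the caller's unk token when no vocab is given leads to a "Missing [UNK]" error later.

```python
class ToyWordPiece:
    """Hypothetical stand-in for a WordPiece-style model (not the library's code)."""

    def __init__(self, vocab, unk_token):
        self.vocab = dict(vocab)
        self.unk_token = unk_token

    def token_to_id(self, token):
        # Known tokens map directly to their id.
        if token in self.vocab:
            return self.vocab[token]
        # Unknown tokens fall back to the configured unk token,
        # which must itself be present in the vocabulary.
        if self.unk_token not in self.vocab:
            raise KeyError(f"Missing {self.unk_token} token from the vocabulary")
        return self.vocab[self.unk_token]


def make_model(vocab=None, unk_token="[UNK]", honor_unk_without_vocab=True):
    """Build the toy model; the flag is an assumption used to contrast both behaviors."""
    if vocab is None and not honor_unk_without_vocab:
        # Behavior described as the bug: the supplied unk token is ignored.
        return ToyWordPiece({}, "[UNK]")
    # Behavior after the fix: the supplied unk token is kept even without a vocab.
    return ToyWordPiece(vocab or {}, unk_token)


# With the supplied token honored, an unknown word resolves to "<unk>".
fixed = make_model(unk_token="<unk>")
fixed.vocab["<unk>"] = 0  # e.g. added later as a special token
fixed_id = fixed.token_to_id("never-seen")

# With the supplied token ignored, the lookup searches for "[UNK]" and fails.
buggy = make_model(unk_token="<unk>", honor_unk_without_vocab=False)
buggy.vocab["<unk>"] = 0
try:
    buggy.token_to_id("never-seen")
    buggy_error = None
except KeyError as exc:
    buggy_error = str(exc)
```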
Anthony MOI
6349ca51b3
Python - Bump version for 0.8.0 release
2020-06-26 16:12:26 -04:00
Anthony MOI
8ae1982149
Finally it will be rc4 for transformers
2020-06-26 15:36:08 -04:00
Anthony MOI
5a653869af
Try local version for transformers
2020-06-26 15:19:00 -04:00
Anthony MOI
1a08b21329
Python - Bump version for 0.8.0.transformers release
2020-06-26 14:37:22 -04:00
Anthony MOI
bb668bc439
Try with target_family = unix
2020-06-23 16:52:21 -04:00
Anthony MOI
f8b1630aa6
Update CHANGELOGs
2020-06-23 13:32:21 -04:00
Anthony MOI
aa3b39f692
Python - Tests for parallelism with multiprocessing
...
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
2020-06-23 11:25:39 -04:00
Anthony MOI
ae743f5dc1
Python - Automatically disable parallelism after fork
2020-06-22 20:31:52 -04:00
Anthony MOI
5d20322319
Rust - Fix optional parallelism with par_bridge
2020-06-22 20:31:52 -04:00
Anthony MOI
dce52621c6
Rust - Make parallelism optional
2020-06-22 20:31:52 -04:00
Anthony MOI
74d812d401
Python - Bump version to 0.8.0.rc3 for release
2020-06-22 12:54:31 -04:00
Anthony MOI
c02d4e2202
Python - Improve AddedToken interface
2020-06-19 17:53:46 -04:00
Anthony MOI
a14cd7b219
Python - Bump version to 0.8.0.rc2 for release
2020-06-19 10:48:53 -04:00
Anthony MOI
898a4a812e
Python - Make AddedToken picklable
2020-06-19 10:34:11 -04:00
Anthony MOI
63edb95130
Python - Update AddedToken repr
2020-06-19 10:18:55 -04:00
Anthony MOI
4c7a0ff4ec
Update CHANGELOGs
2020-06-18 14:50:16 -04:00
Anthony MOI
fc63d56eab
AddedVocabulary - Add tests, update bindings + various tweaks
2020-06-18 14:50:16 -04:00
Anthony MOI
c6f633eb1c
Rust - Fix/Tweak AddedVocabulary + Fix python tests
2020-06-16 14:42:53 -04:00
Anthony MOI
397cc539da
Rust - Add AddedVocabulary + normalized option on AddedToken
2020-06-16 14:42:53 -04:00
Anthony MOI
fb964adfdb
Python - Bump version to 0.8.0.rc1 for release
2020-06-11 14:24:34 -04:00
Anthony MOI
847651445e
Fix build-wheels.sh script for manylinux wheels
...
Before this change, the `.so` files from previous versions were added to
the `.whl` files of later versions.
Fix #301
2020-06-11 12:43:40 -04:00
Anthony MOI
433a311887
Update CHANGELOGs
2020-06-09 17:33:41 -04:00
Anthony MOI
794759b56d
Python - Improve truncation/padding management
2020-06-09 17:33:41 -04:00
Anthony MOI
d00ac60162
Update changelogs and bump version for python release
2020-06-03 18:27:49 -04:00
Morgan Funtowicz
fcb4e76d9b
Ensure pad_to_multiple_of is correctly forwarded in base_tokenizer.py
2020-05-31 10:02:59 +02:00
Anthony MOI
0934fe5803
Python - Bindings for pad_to_multiple_of
2020-05-29 20:34:41 -04:00
Anthony MOI
2a0f2337db
Python - Update CHANGELOG and bump version to 0.8.0.dev1 for release
2020-05-27 14:22:00 -04:00
Anthony MOI
c205afe7a5
Python - Also allow creating Tokenizer from_buffer
2020-05-27 13:46:37 -04:00
Anthony MOI
0e890d0d05
Update CHANGELOGs
2020-05-27 13:46:37 -04:00
Anthony MOI
de9feae0b5
Python - Make Encoding picklable
2020-05-27 13:46:37 -04:00
Anthony MOI
c5bba91bf4
Python - Test and fix classes pickling
2020-05-27 13:46:37 -04:00
Anthony MOI
6a70162d78
Python - Make all relevant classes picklable
2020-05-27 13:46:37 -04:00
Anthony MOI
93bb82c657
Update READMEs and CHANGELOGs
2020-05-27 13:32:20 -04:00
Anthony MOI
b24904513c
Update READMEs and CHANGELOGs
2020-05-27 13:12:46 -04:00
Anthony MOI
85c7c94809
Python - Add to/from str and files for Tokenizer
2020-05-27 13:07:53 -04:00
Anthony MOI
cffcbb95fc
Rust - serialization fixes + loading/saving methods
2020-05-27 13:07:53 -04:00
Anthony MOI
c800813bbe
Python - Add Tokenizer saving capability
2020-05-27 13:07:52 -04:00
Anthony MOI
2b17d4221c
Python - Restore custom PyDecoder and PyPreTokenizer
2020-05-27 13:07:52 -04:00
Anthony MOI
07fb3283f4
Python - Disable custom Decoder/PreTokenizer for now
2020-05-27 13:07:52 -04:00
Anthony MOI
400d9545fd
Update rust toolchain for now
2020-05-21 19:15:40 -04:00
Anthony MOI
5a01792413
Python - Update CHANGELOGs and bump to 0.8.0-dev for release
2020-05-21 18:57:02 -04:00
Anthony MOI
7ad3bda369
Merge pull request #249 from huggingface/pre-tokenized
...
Allow pre-tokenized inputs to encode/encode_batch
2020-05-21 18:39:46 -04:00
Anthony MOI
8cb4ca72b6
Python - Update dependencies
2020-05-20 19:55:14 -04:00
Anthony MOI
30216190e5
Python - Improve typings for new encode/encode_batch
2020-05-01 17:11:55 -04:00