Commit Graph

992 Commits

Author SHA1 Message Date
Setu Shah
1f2cc6ee73 Include license in PyPI package 2020-07-16 14:20:32 -04:00
Anthony MOI
5be375eaea Update CHANGELOGs and bump version for python release 2020-07-06 15:21:47 -04:00
Anthony MOI
e874641cf9 Merge pull request #333 from huggingface/fix-added-tokens
Python - Fix Added token deserialization
2020-07-06 14:52:37 -04:00
Anthony MOI
2194970679 Merge pull request #330 from huggingface/bert-normalization
Improve BertNormalizer behavior
2020-07-06 14:52:23 -04:00
Anthony MOI
d34a172ec6 Merge pull request #329 from huggingface/improve-parallelism-warning
Improve parallelism tracking and warning
2020-07-06 14:52:07 -04:00
Anthony MOI
d33af1a3be Python - Fix Added token deserialization 2020-07-06 14:46:12 -04:00
Anthony MOI
7a95ffc4fa BertNormalizer has same behavior than original implem 2020-07-06 13:55:18 -04:00
Anthony MOI
8bf482cecc Improve parallelism tracking and warning 2020-07-06 13:05:14 -04:00
Anthony MOI
b91deeaa3d Merge pull request #327 from aalok-sathe/patch-1
Use supplied UNK token even when vocab absent
2020-07-06 09:01:57 -04:00
आलोक
6fe284dd8d Use supplied UNK token even when vocab absent
If a vocab file isn't provided the supplied unk token (different from [UNK]) gets ignored and later throws an error:
`Exception: WordPiece error: Missing [UNK] token from the vocabulary`
when trying to encode an input string with an unknown token.
2020-07-05 19:01:04 +05:30
Pierric Cistac
9294db78a4 Node - Version 0.7.0 2020-07-01 17:48:23 -04:00
Anthony MOI
6349ca51b3 Python - Bump version for 0.8.0 release 2020-06-26 16:12:26 -04:00
Anthony MOI
8ae1982149 Finally it will be rc4 for transformers 2020-06-26 15:36:08 -04:00
Anthony MOI
5a653869af Try local version for transformers 2020-06-26 15:19:00 -04:00
Anthony MOI
1a08b21329 Python - Bump version for 0.8.0.transformers release 2020-06-26 14:37:22 -04:00
Anthony MOI
6d531a435e Merge pull request #311 from huggingface/optional-parallelism
Make parallelism optional
2020-06-26 10:55:20 -04:00
Anthony MOI
bb668bc439 Try with target_family = unix 2020-06-23 16:52:21 -04:00
Anthony MOI
f8b1630aa6 Update CHANGELOGs 2020-06-23 13:32:21 -04:00
Anthony MOI
aa3b39f692 Python - Tests for parallelism with multiprocessing
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
2020-06-23 11:25:39 -04:00
Anthony MOI
5f760df231 CI - Add build checks on macos for Python 2020-06-22 20:31:52 -04:00
Anthony MOI
ae743f5dc1 Python - Automatically disable parallelism after fork 2020-06-22 20:31:52 -04:00
Anthony MOI
5d20322319 Rust - Fix optional parallelism with par_bridge 2020-06-22 20:31:52 -04:00
Anthony MOI
dce52621c6 Rust - Make parallelism optional 2020-06-22 20:31:52 -04:00
Anthony MOI
74d812d401 Python - Bump version to 0.8.0.rc3 for release 2020-06-22 12:54:31 -04:00
Anthony MOI
42983ccb8e Merge pull request #315 from huggingface/fix-serialization
Rust - Fix Serialization when tokens are part of original vocab
2020-06-22 12:52:28 -04:00
Anthony MOI
f84d20835a Rust - Fix Serialization when tokens are part of original vocab 2020-06-22 12:41:37 -04:00
Anthony MOI
5b5fbed088 Merge pull request #312 from huggingface/improve-added-token-python
Python - Improve AddedToken interface
2020-06-22 12:30:20 -04:00
Anthony MOI
c02d4e2202 Python - Improve AddedToken interface 2020-06-19 17:53:46 -04:00
Anthony MOI
a14cd7b219 Python - Bump version to 0.8.0.rc2 for release 2020-06-19 10:48:53 -04:00
Anthony MOI
58e29fd30c Merge pull request #309 from huggingface/improve-added-tokens
Improve additionnal vocabulary management
2020-06-19 10:42:58 -04:00
Anthony MOI
898a4a812e Python - Make AddedToken pickable 2020-06-19 10:34:11 -04:00
Anthony MOI
63edb95130 Python - Update AddedToken repr 2020-06-19 10:18:55 -04:00
Anthony MOI
4c7a0ff4ec Update CHANGELOGs 2020-06-18 14:50:16 -04:00
Anthony MOI
b92d739808 Rust - Fix byte-level decoding for added tokens 2020-06-18 14:50:16 -04:00
Anthony MOI
fc63d56eab AddedVocabulary - Add tests, update bindings + various tweaks 2020-06-18 14:50:16 -04:00
Anthony MOI
c6f633eb1c Rust - Fix/Tweak AddedVocabulary + Fix python tests 2020-06-16 14:42:53 -04:00
Anthony MOI
397cc539da Rust - Add AddedVocabulary + normalized option on AddedToken 2020-06-16 14:42:53 -04:00
Anthony MOI
7dff86b704 Rust - Add slice and slice_bytes to NormalizedString 2020-06-16 14:42:52 -04:00
Anthony MOI
66be62b6e6 Rust - Extract AddedVocabulary management from Tokenizer 2020-06-16 14:42:52 -04:00
Anthony MOI
6091c9b229 Merge pull request #303 from huggingface/node-extraction
Node bindings - Huge refactoring
2020-06-16 14:37:14 -04:00
Pierric Cistac
88354b0e40 gitignore vscode workspace files 2020-06-15 18:08:41 -04:00
Pierric Cistac
252b84a100 Node - Fix encoding pad 2020-06-15 17:59:54 -04:00
Pierric Cistac
c3ec7b1544 Node - Tweaks doc / test warning 2020-06-15 17:59:19 -04:00
Stefan Mesken
14b12b47a3 127 rust example not working (#277)
* fix Rust example

* fix formatting

* create vocab.json and merges.txt from trained encoder

* Example of training, serializing and deserializing a tokenizer
2020-06-15 08:38:41 -04:00
Anthony MOI
fb964adfdb Python - Bump version to 0.8.0.rc1 for release 2020-06-11 14:24:34 -04:00
Anthony MOI
bb68ec3414 Fix number of added tokens
Related to #302
2020-06-11 14:22:58 -04:00
Anthony MOI
67b458b134 Fix get_vocab_size returning the wrong number
cc @thomwolf
2020-06-11 14:13:50 -04:00
Anthony MOI
847651445e Fix build-wheels.sh script for manylinux wheels
Before this change, we added the `.so` files from previous version in
the `.whl` files of later versions.

Fix #301
2020-06-11 12:43:40 -04:00
Anthony MOI
a0e32914e9 Node: Activate rustfmt and clippy in the CI 2020-06-11 12:01:42 -04:00
Anthony MOI
99b3f0ba4d Node - Fix Clippy lint warnings 2020-06-11 12:01:42 -04:00