Anthony MOI
d953d58cee
Rust - Fix offsets when there are added tokens
2020-03-19 12:53:03 -04:00
Anthony MOI
d53de0e2da
Python - Expose normalize on BaseTokenizer
2020-03-18 16:44:31 -04:00
Anthony MOI
ae0d330907
Update CHANGELOGs
2020-03-18 16:42:27 -04:00
Anthony MOI
60a4fb35f4
Python - Update bindings
2020-03-16 10:36:42 -04:00
Morgan Funtowicz
505bfbba82
Fix invalid error messages.
2020-03-12 15:38:29 +01:00
Morgan Funtowicz
5ed1f26c71
Throw a more meaningful error when provided python input is None.
2020-03-12 10:59:05 +01:00
Anthony MOI
257360acec
Python - encode & encode batch with add_special_tokens
2020-03-10 16:21:10 -04:00
Anthony MOI
a9be177185
Update CHANGELOGs
2020-03-10 13:12:34 -04:00
Anthony MOI
28f022058c
Keep default values as true
2020-03-10 12:58:53 -04:00
Anthony MOI
45f3eaaf72
Update bindings and typings
2020-03-10 12:28:24 -04:00
Anthony MOI
efbbfea558
Update ByteLevel PostProcessor
2020-03-10 12:05:04 -04:00
Anthony MOI
7e9003ccb7
Python - Update bindings
2020-03-09 18:37:03 -04:00
Anthony MOI
86d2e90ad2
Update CHANGELOGs
2020-03-06 17:44:44 -05:00
Anthony MOI
d778ed5e0a
Python - Update README and implementation
2020-03-06 17:44:44 -05:00
Anthony MOI
52180a9179
Python - Add ByteLevel PostProcessor
2020-03-06 17:44:44 -05:00
Anthony MOI
b60eef5245
Python - Make style
2020-03-06 17:44:44 -05:00
Anthony MOI
d8e7a830b2
Update CHANGELOGs
2020-03-06 17:44:34 -05:00
Anthony MOI
b2e5f54b6f
Python - Fix ByteLevelBPETokenizer implementation
2020-03-06 17:44:03 -05:00
Anthony MOI
f1460fadb9
Python - Update docs and implementations
2020-03-06 17:44:03 -05:00
Anthony MOI
2393506dc7
Python - Add ByteLevel Normalizer
2020-03-06 17:44:03 -05:00
Anthony MOI
47cef0e13a
Python - Fix BPE and WordPiece builders usage
2020-03-06 12:20:39 -05:00
Anthony MOI
4b596e19dd
Rust - Improve training progress for multiple files
2020-03-03 11:04:24 -05:00
Anthony MOI
8e791791d1
Python - prepare for release
2020-03-02 14:56:42 -05:00
Anthony MOI
4deeb9511f
Update CHANGELOGs
2020-03-02 14:37:17 -05:00
Anthony MOI
f8f0702d98
Fix LongestFirst truncation strategy
2020-02-29 16:26:13 -05:00
Anthony MOI
657f8b6c15
Rust & Python - Update CHANGELOGs
2020-02-26 11:30:44 -05:00
Anthony MOI
3b10d640d5
Rust & Python - Update CHANGELOGs
2020-02-26 10:51:40 -05:00
Anthony MOI
2425fe877d
Python - Update CHANGELOG
2020-02-26 09:31:17 -05:00
Anthony MOI
61b4c9c30a
Python - Add missing tokens to BertWordPieceTokenizer
2020-02-26 09:21:54 -05:00
Anthony MOI
440e8e9bd9
Python - Bump version for release
2020-02-24 16:08:49 -05:00
Anthony MOI
be08d9574c
Python - Add Changelog
2020-02-24 10:12:50 -05:00
Anthony MOI
999088ef94
Python - Bump version for release
2020-02-24 09:56:08 -05:00
Morgan Funtowicz
817b760ab9
Make name parameter Optional[str] on BaseTokenizer
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 14:57:43 +01:00
Morgan Funtowicz
d274a7691d
Avoid breaking changes and let parameter name be Optional.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 14:56:59 +01:00
Morgan Funtowicz
0fc8be9d69
Formatting for python binding.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 00:17:44 +01:00
Morgan Funtowicz
f88a6b40ac
Make parameter name on Model.save() optional.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 00:01:32 +01:00
Anthony MOI
11dd6c8bae
Python - Bump version for release
2020-02-18 18:49:11 -05:00
Anthony MOI
41929462c7
Python - Add classifiers
2020-02-18 18:48:21 -05:00
Anthony MOI
d8a73c89a7
Python - Add Encoding length
2020-02-18 18:24:13 -05:00
Anthony MOI
d48fdbe057
Python - Only add special tokens when in-vocabulary
2020-02-18 17:27:27 -05:00
Anthony MOI
5daf1eea86
Python - Replace last BPETokenizer occurrences
2020-02-18 16:25:59 -05:00
Anthony MOI
f263d7651f
Python - RustFmt
2020-02-18 15:07:34 -05:00
Anthony MOI
8e9fae6be4
Python - Add check-style to Makefile
2020-02-18 11:11:07 -05:00
Anthony MOI
81be207819
Python - Black auto formatting
2020-02-18 10:45:36 -05:00
Anthony MOI
4706151c32
Python - Add Makefile with Black formatting
2020-02-18 10:45:10 -05:00
Anthony MOI
1509f747af
Python - Uniformize implementations parameters
2020-02-18 10:27:10 -05:00
MOI Anthony
3512bd3400
Merge pull request #149 from colinclement/master
Allow dropout option in ByteLevelBPETokenizer
2020-02-18 09:59:40 -05:00
Morgan Funtowicz
891dd4adb8
Fix invalid num_added_tokens method call in BaseTokenizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-17 15:32:34 +01:00
Funtowicz Morgan
bb8321ac0d
Add Strip normalizer (#140)
* WIP strip.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Rust StripNormalizer
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Allow to specify strip direction
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Renamed StripNormalizer to Strip
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Python binding.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Makes Strip python compatible with pythonic constructor.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Run RustFmt
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Clippy next ofc.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move lstrip and rstrip on NormalizedString
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* implement strip() for normalizer + unittests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add some more unittests on edge cases.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* clippy and fmt.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simplify strip and fix offsets
* Python - Update strip bindings with default values
Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
2020-02-17 11:26:40 +01:00
Colin Clement
e591cfce7b
pass through dropout option in ByteLevelBPETokenizer
2020-02-15 01:58:55 +00:00