Commit Graph

687 Commits

SHA1 Message Date
a80d0a302f Update .gitignores for data folder 2020-03-16 10:36:41 -04:00
e0779c50b5 Rust - Improve utils and unzipping encode results 2020-03-16 10:36:41 -04:00
45f3449096 Rust - Add some integration tests on offsets 2020-03-15 10:46:42 -04:00
9fa11bf28b Rust - Remove NormalizedString from Encoding 2020-03-15 10:45:05 -04:00
7d511287ef Rust - Shared data directory for benches and tests 2020-03-14 23:17:54 -04:00
f89f867adc Rust - Expose get_range_of from normalizer 2020-03-14 23:16:11 -04:00
61294a6db7 Merge pull request #196 from huggingface/python_none_handling (Throw a more meaningful error when provided python input is None.) 2020-03-12 10:43:20 -04:00
505bfbba82 Fix invalid error messages. 2020-03-12 15:38:29 +01:00
5ed1f26c71 Throw a more meaningful error when provided python input is None. 2020-03-12 10:59:05 +01:00
5673b00724 Merge pull request #193 from huggingface/add-special-tokens (Ability to choose whether to add special tokens at `encode`) 2020-03-11 14:45:55 -04:00
3abf615a51 Node - Update bindings 2020-03-10 18:22:36 -04:00
c1a92d581a Node - encode & encodeBatch with add_special_tokens (rust-side) 2020-03-10 16:55:12 -04:00
257360acec Python - encode & encode batch with add_special_tokens 2020-03-10 16:21:10 -04:00
9e3d69389d Rust - update benchmarks 2020-03-10 16:18:09 -04:00
d761d406cf Rust - encode & encode_batch with add_special_tokens 2020-03-10 16:10:07 -04:00
523e173ddf Merge pull request #188 from huggingface/fix-byte-level (Fix byte level BPE offsets) 2020-03-10 14:37:47 -04:00
7764d3d770 Node - Fix bindings 2020-03-10 14:31:42 -04:00
a9be177185 Update CHANGELOGs 2020-03-10 13:12:34 -04:00
28f022058c Keep default values as true 2020-03-10 12:58:53 -04:00
45f3eaaf72 Update bindings and typings 2020-03-10 12:28:24 -04:00
efbbfea558 Update ByteLevel PostProcessor 2020-03-10 12:05:04 -04:00
aa62c951dc Node - Update bindings 2020-03-09 22:45:33 -04:00
7e9003ccb7 Python - Update bindings 2020-03-09 18:37:03 -04:00
6a50ecfa5c Rust - Remove unnecessary ByteLevel Normalizer 2020-03-09 18:28:19 -04:00
5cc78706e8 Rust - PreTokenizer can update the NormalizedString 2020-03-09 18:10:33 -04:00
4510ea5ce3 node: type errors 2020-03-06 18:01:11 -05:00
a44eb2b5cd node: update bytelevel bindings 2020-03-06 17:44:45 -05:00
55f38698dd Node - Add ByteLevel PostProcessor 2020-03-06 17:44:44 -05:00
86d2e90ad2 Update CHANGELOGs 2020-03-06 17:44:44 -05:00
d778ed5e0a Python - Update README and implementation 2020-03-06 17:44:44 -05:00
52180a9179 Python - Add ByteLevel PostProcessor 2020-03-06 17:44:44 -05:00
8dcbc8377e Make ByteLevel a PostProcessor to fix offsets 2020-03-06 17:44:44 -05:00
adf6501609 Node - Hotfix 2020-03-06 17:44:44 -05:00
b60eef5245 Python - Make style 2020-03-06 17:44:44 -05:00
43811698a1 Node - Add ByteLevel normalizer 2020-03-06 17:44:44 -05:00
d8e7a830b2 Update CHANGELOGs 2020-03-06 17:44:34 -05:00
b2e5f54b6f Python - Fix ByteLevelBPETokenizer implementation 2020-03-06 17:44:03 -05:00
f1460fadb9 Python - Update docs and implementations 2020-03-06 17:44:03 -05:00
2393506dc7 Python - Add ByteLevel Normalizer 2020-03-06 17:44:03 -05:00
760ceda632 Make ByteLevel a Normalizer for add_prefix_space 2020-03-06 17:44:03 -05:00
8f1a8f2734 Add append on NormalizedString 2020-03-06 17:44:02 -05:00
402a6871e4 Add prepend on NormalizedString 2020-03-06 17:44:00 -05:00
efc75332f1 Merge pull request #184 from huggingface/node-async (node: "asyncification") 2020-03-06 17:42:52 -05:00
dae345cc6d node: add continuingSubwordPrefix to wordpiece model 2020-03-06 16:30:36 -05:00
6693e6992f Node - BPE_fromFiles uses a builder for the task 2020-03-06 16:27:27 -05:00
578eddcdf9 node: expose decode / decodeBatch in BaseTokenizer 2020-03-06 16:27:27 -05:00
3eaeb64cc8 node: make decode and decodeBatch async 2020-03-06 16:27:27 -05:00
ffcd5c63bf node: make BPE.fromfiles async 2020-03-06 16:27:27 -05:00
dc0a054f9e fix indentation 2020-03-06 16:06:06 -05:00
fe49512d37 node: make WordPiece.fromFiles async 2020-03-06 16:06:06 -05:00