| a80d0a302f | Update .gitignores for data folder | 2020-03-16 10:36:41 -04:00 |
| e0779c50b5 | Rust - Improve utils and unzipping encode results | 2020-03-16 10:36:41 -04:00 |
| 45f3449096 | Rust - Add some integration tests on offsets | 2020-03-15 10:46:42 -04:00 |
| 9fa11bf28b | Rust - Remove NormalizedString from Encoding | 2020-03-15 10:45:05 -04:00 |
| 7d511287ef | Rust - Shared data directory for benches and tests | 2020-03-14 23:17:54 -04:00 |
| f89f867adc | Rust - Expose get_range_of from normalizer | 2020-03-14 23:16:11 -04:00 |
| 61294a6db7 | Merge pull request #196 from huggingface/python_none_handling (Throw a more meaningful error when provided python input is None.) | 2020-03-12 10:43:20 -04:00 |
| 505bfbba82 | Fix invalid error messages. | 2020-03-12 15:38:29 +01:00 |
| 5ed1f26c71 | Throw a more meaningful error when provided python input is None. | 2020-03-12 10:59:05 +01:00 |
| 5673b00724 | Merge pull request #193 from huggingface/add-special-tokens (Ability to choose whether to add special tokens at `encode`) | 2020-03-11 14:45:55 -04:00 |
| 3abf615a51 | Node - Update bindings | 2020-03-10 18:22:36 -04:00 |
| c1a92d581a | Node - encode & encodeBatch with add_special_tokens (rust-side) | 2020-03-10 16:55:12 -04:00 |
| 257360acec | Python - encode & encode batch with add_special_tokens | 2020-03-10 16:21:10 -04:00 |
| 9e3d69389d | Rust - update benchmarks | 2020-03-10 16:18:09 -04:00 |
| d761d406cf | Rust - encode & encode_batch with add_special_tokens | 2020-03-10 16:10:07 -04:00 |
| 523e173ddf | Merge pull request #188 from huggingface/fix-byte-level (Fix byte level BPE offsets) | 2020-03-10 14:37:47 -04:00 |
| 7764d3d770 | Node - Fix bindings | 2020-03-10 14:31:42 -04:00 |
| a9be177185 | Update CHANGELOGs | 2020-03-10 13:12:34 -04:00 |
| 28f022058c | Keep default values as true | 2020-03-10 12:58:53 -04:00 |
| 45f3eaaf72 | Update bindings and typings | 2020-03-10 12:28:24 -04:00 |
| efbbfea558 | Update ByteLevel PostProcessor | 2020-03-10 12:05:04 -04:00 |
| aa62c951dc | Node - Update bindings | 2020-03-09 22:45:33 -04:00 |
| 7e9003ccb7 | Python - Update bindings | 2020-03-09 18:37:03 -04:00 |
| 6a50ecfa5c | Rust - Remove unnecessary ByteLevel Normalizer | 2020-03-09 18:28:19 -04:00 |
| 5cc78706e8 | Rust - PreTokenizer can update the NormalizedString | 2020-03-09 18:10:33 -04:00 |
| 4510ea5ce3 | node: type errors | 2020-03-06 18:01:11 -05:00 |
| a44eb2b5cd | node: update bytelevel bindings | 2020-03-06 17:44:45 -05:00 |
| 55f38698dd | Node - Add ByteLevel PostProcessor | 2020-03-06 17:44:44 -05:00 |
| 86d2e90ad2 | Update CHANGELOGs | 2020-03-06 17:44:44 -05:00 |
| d778ed5e0a | Python - Update README and implementation | 2020-03-06 17:44:44 -05:00 |
| 52180a9179 | Python - Add ByteLevel PostProcessor | 2020-03-06 17:44:44 -05:00 |
| 8dcbc8377e | Make ByteLevel a PostProcessor to fix offsets | 2020-03-06 17:44:44 -05:00 |
| adf6501609 | Node - Hotfix | 2020-03-06 17:44:44 -05:00 |
| b60eef5245 | Python - Make style | 2020-03-06 17:44:44 -05:00 |
| 43811698a1 | Node - Add ByteLevel normalizer | 2020-03-06 17:44:44 -05:00 |
| d8e7a830b2 | Update CHANGELOGs | 2020-03-06 17:44:34 -05:00 |
| b2e5f54b6f | Python - Fix ByteLevelBPETokenizer implementation | 2020-03-06 17:44:03 -05:00 |
| f1460fadb9 | Python - Update docs and implementations | 2020-03-06 17:44:03 -05:00 |
| 2393506dc7 | Python - Add ByteLevel Normalizer | 2020-03-06 17:44:03 -05:00 |
| 760ceda632 | Make ByteLevel a Normalizer for add_prefix_space | 2020-03-06 17:44:03 -05:00 |
| 8f1a8f2734 | Add append on NormalizedString | 2020-03-06 17:44:02 -05:00 |
| 402a6871e4 | Add prepend on NormalizedString | 2020-03-06 17:44:00 -05:00 |
| efc75332f1 | Merge pull request #184 from huggingface/node-async (node: "asyncification") | 2020-03-06 17:42:52 -05:00 |
| dae345cc6d | node: add continuingSubwordPrefix to wordpiece model | 2020-03-06 16:30:36 -05:00 |
| 6693e6992f | Node - BPE_fromFiles uses a builder for the task | 2020-03-06 16:27:27 -05:00 |
| 578eddcdf9 | node: expose decode / decodeBatch in BaseTokenizer | 2020-03-06 16:27:27 -05:00 |
| 3eaeb64cc8 | node: make decode and decodeBatch async | 2020-03-06 16:27:27 -05:00 |
| ffcd5c63bf | node: make BPE.fromfiles async | 2020-03-06 16:27:27 -05:00 |
| dc0a054f9e | fix indentation | 2020-03-06 16:06:06 -05:00 |
| fe49512d37 | node: make WordPiece.fromFiles async | 2020-03-06 16:06:06 -05:00 |