152880ab3e
Adding truncation_side within TruncationParams. (#860)
* Add truncation to enable_truncation
* Fix typo
* Adding truncation_side within `TruncationParams`.
* Node serialization of this direction param.
* Update the test.
* Fixing warnings/lint.
* Adding stuff (can't local debug :( )
* Slow loop... ;(
* Stub.py.
Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
2021-12-28 12:37:06 +01:00
04368b1998
Truncate Right (#841)
* feat(tokenizers): add truncate test case
* !feat(tokenizer): truncate right
* refacto(tokenizers): clippy
* feat(bindings): update bindings for truncate()
* fix(tokenizers): remove unsafe code
* refacto(tokenizers): truncate direction
* truncate direction enum
* compute parts ranges beforehand
* 2n space because encoding is dropped at the end of procedure
* update bindings
* add pip install in python bindings' make test
* fix(node): clippy asks to use unwrap_or_else
* fix(node): lint
* refacto(tokenizers): replace Vec<Range<usize>> by Vec<(usize, usize)>
* refacto(bindings): add match syntax
* refacto(tokenizers): use mem::replace instead of mem::swap
* refacto(tokenizers): assign value the normal way
2021-12-23 13:34:21 +01:00
884bfb7970
Prepare node release (#794)
* Node - Update changelog for release
* Update node release to add v14 & v15
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
* Node - Update version number
* Node - Update dependencies
* Node - Lint
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
2021-09-02 09:58:01 -04:00
d3d9f2c76b
words -> word_ids & sequences -> sequence_ids
2020-11-09 16:02:07 -05:00
57d162b269
Add an Encoding.sequences to allow masking
2020-11-06 10:41:56 -05:00
385d25720a
Simplify the API for Encoding.token_to_XXX
2020-11-06 10:41:56 -05:00
a79cc55e08
Node - Encoding mappings handle sequence_id
2020-11-06 10:41:56 -05:00
95cc8c47ad
Changed rust api for merges, that is now Vec<(String, String)>
2020-09-24 08:57:02 +02:00
26cafe0d6c
Fixing eslint.
2020-09-10 14:00:53 -04:00
a16d71abd0
Node - Update bindings
2020-08-19 12:42:12 -04:00
e9a2e63a67
Node - Fix new linting errors
2020-07-24 15:44:39 -04:00
a03eba2fe9
Node - Typings proposal
2020-05-27 13:12:47 -04:00
b5247f41f1
Node - Update base tokenizer
2020-05-12 18:08:26 -04:00
4aecd82d07
Node - Improve mappings on Encoding
2020-04-16 14:23:37 -04:00
38d53a7b84
Node - Expose more bindings
2020-04-13 16:48:32 -04:00
3ad1360210
Word indices are None for special tokens
2020-04-09 09:52:02 -04:00
e9667a7b83
Node - tokenizer.postProcess bindings
2020-03-26 15:42:45 -04:00
0408567f23
Node - Merge encodings
2020-03-26 15:42:45 -04:00
70552812fe
Node - Bindings for tokenized encoding
2020-03-26 15:42:45 -04:00
ce3cf78ea5
Node - Bindings for Encoding mappings
2020-03-26 15:42:45 -04:00
7dd2400214
Node - Remove addSpecialTokens from BertWordPieceTokenizer
2020-03-26 15:10:08 -04:00
d25eb075c8
Node - Finalize AddedToken support
2020-03-25 12:36:03 -04:00
f53a885fdd
Node - Expand AddedToken supported use
2020-03-25 11:12:29 -04:00
2aeae555e2
Node - Expose normalize on tokenizer
2020-03-18 17:10:26 -04:00
25ef729a5a
Node - Update bindings
2020-03-18 15:13:29 -04:00
3abf615a51
Node - Update bindings
2020-03-10 18:22:36 -04:00
523e173ddf
Merge pull request #188 from huggingface/fix-byte-level
Fix byte level BPE offsets
2020-03-10 14:37:47 -04:00
7764d3d770
Node - Fix bindings
2020-03-10 14:31:42 -04:00
45f3eaaf72
Update bindings and typings
2020-03-10 12:28:24 -04:00
efbbfea558
Update ByteLevel PostProcessor
2020-03-10 12:05:04 -04:00
aa62c951dc
Node - Update bindings
2020-03-09 22:45:33 -04:00
4510ea5ce3
node: type errors
2020-03-06 18:01:11 -05:00
a44eb2b5cd
node: update bytelevel bindings
2020-03-06 17:44:45 -05:00
dae345cc6d
node: add continuingSubwordPrefix to wordpiece model
2020-03-06 16:30:36 -05:00
578eddcdf9
node: expose decode / decodeBatch in BaseTokenizer
2020-03-06 16:27:27 -05:00
ffcd5c63bf
node: make BPE.fromfiles async
2020-03-06 16:27:27 -05:00
fe49512d37
node: make WordPiece.fromFiles async
2020-03-06 16:06:06 -05:00
917996841d
node: "proxy" raw Encoding with getters
2020-02-26 18:15:16 -05:00
f836f2109b
build ts
2020-01-14 15:20:34 -05:00
a95d0e6ba1
Node - Fix import
2020-01-10 16:11:44 -05:00
24c08b2530
fix sentencepiece tokenizer name
2020-01-10 16:03:47 -05:00
80f6d58177
big big big
2020-01-10 14:49:13 -05:00
6b0935d5de
first implementations draft
2020-01-10 11:53:30 -05:00