15 Commits

Author SHA1 Message Date
152880ab3e Adding truncation_side within TruncationParams. (#860)
* Add truncation to enable_truncation

* Fix typo

* Adding truncation_side within `TruncationParams`.

* Node serialization of this direction param.

* Update the test.

* Fixing warnings/lint.

* Adding stuff (can't local debug :( )

* Slow loop... ;(

* Stub.py.

Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
2021-12-28 12:37:06 +01:00
04368b1998 Truncate Right (#841)
* feat(tokenizers): add truncate test case

* !feat(tokenizer): truncate right

* refacto(tokenizers): clippy

* feat(bindings): update bindings for truncate()

* fix(tokenizers): remove unsafe code

* refacto(tokenizers): truncate direction

  * truncate direction enum
  * compute parts ranges beforehand
  * 2n space because encoding is dropped at the end of procedure
  * update bindings
  * add pip install in python bindings' make test

* fix(node): clippy asks to use unwrap_or_else

* fix(node): lint

* refacto(tokenizers): replace Vec<Range<usize>> by Vec<(usize, usize)>

* refacto(bindings): add match syntax

* refacto(tokenizers): use mem::replace instead of mem::swap

* refacto(tokenizers): assign value the normal way
2021-12-23 13:34:21 +01:00
d3d9f2c76b words -> word_ids & sequences -> sequence_ids 2020-11-09 16:02:07 -05:00
57d162b269 Add an Encoding.sequences to allow masking 2020-11-06 10:41:56 -05:00
385d25720a Simplify the API for Encoding.token_to_XXX 2020-11-06 10:41:56 -05:00
a79cc55e08 Node - Encoding mappings handle sequence_id 2020-11-06 10:41:56 -05:00
e9a2e63a67 Node - Fix new linting errors 2020-07-24 15:44:39 -04:00
4aecd82d07 Node - Improve mappings on Encoding 2020-04-16 14:23:37 -04:00
3ad1360210 Word indices are None for special tokens 2020-04-09 09:52:02 -04:00
e9667a7b83 Node - tokenizer.postProcess bindings 2020-03-26 15:42:45 -04:00
0408567f23 Node - Merge encodings 2020-03-26 15:42:45 -04:00
ce3cf78ea5 Node - Bindings for Encoding mappings 2020-03-26 15:42:45 -04:00
25ef729a5a Node - Update bindings 2020-03-18 15:13:29 -04:00
fe49512d37 node: make WordPiece.fromFiles async 2020-03-06 16:06:06 -05:00
917996841d node: "proxy" raw Encoding with getters 2020-02-26 18:15:16 -05:00