Truncate Right (#841)

* feat(tokenizers): add truncate test case

* !feat(tokenizer): truncate right

* refacto(tokenizers): clippy

* feat(bindings): update bindings for truncate()

* fix(tokenizers): remove unsafe code

* refacto(tokenizers): truncate direction

* truncate direction enum
* compute parts ranges beforehand
* 2n space because encoding is dropped at the end of procedure
* update bindings
* add pip install in python bindings' make test

* fix(node): clippy asks to use unwrap_or_else

* fix(node): lint

* refacto(tokenizers): replace Vec<Range<usize>> by Vec<(usize, usize)>

* refacto(bindings): add match syntax

* refacto(tokenizers): use mem::replace instead of mem::swap

* refacto(tokenizers): assign value the normal way
Luc Georges
2021-12-23 13:34:21 +01:00
committed by GitHub
parent 362df327b0
commit 04368b1998
9 changed files with 337 additions and 219 deletions


@@ -286,7 +286,7 @@ class Encoding:
:obj:`List[str]`: The list of tokens
"""
pass
-def truncate(self, max_length, stride=0):
+def truncate(self, max_length, stride=0, direction="right"):
"""
Truncate the :class:`~tokenizers.Encoding` at the given length
@@ -299,6 +299,9 @@ class Encoding:
stride (:obj:`int`, defaults to :obj:`0`):
The length of previous content to be included in each overflowing piece
+direction (:obj:`str`, defaults to :obj:`right`)
+Truncate direction
"""
pass
@property
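
The new `direction` argument can be illustrated with a small standalone sketch. This is not the code from this commit: it is a plain-Python approximation that works on a list of token ids, and the helper names `part_ranges` and `truncate_ids` are hypothetical. It assumes that `"right"` keeps the start of the sequence, that `"left"` keeps the end, and that `stride` is the number of tokens shared between consecutive pieces. It follows the commit message in computing the `(start, stop)` ranges of all parts beforehand.

```python
from typing import List, Tuple


def part_ranges(length: int, max_length: int, stride: int = 0,
                direction: str = "right") -> List[Tuple[int, int]]:
    """Return (start, stop) ranges; the first one is the part that is kept.

    Hypothetical helper for illustration, not the library's implementation.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if max_length < 1:
        raise ValueError("max_length must be at least 1 in this sketch")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    if max_length >= length:
        return [(0, length)]  # nothing to truncate

    step = max_length - stride  # how far each window moves
    ranges = []
    if direction == "right":
        # Walk from the start; the first window is the kept part.
        start = 0
        while True:
            stop = min(start + max_length, length)
            ranges.append((start, stop))
            if stop == length:
                break
            start += step
    else:
        # Walk from the end; the last max_length tokens are kept.
        stop = length
        while True:
            start = max(stop - max_length, 0)
            ranges.append((start, stop))
            if start == 0:
                break
            stop -= step
    return ranges


def truncate_ids(ids: List[int], max_length: int, stride: int = 0,
                 direction: str = "right") -> Tuple[List[int], List[List[int]]]:
    """Split ids into the kept part and the overflowing pieces."""
    ranges = part_ranges(len(ids), max_length, stride, direction)
    kept, *overflow = [ids[a:b] for a, b in ranges]
    return kept, overflow
```

With ten ids, `max_length=4` and `stride=1`, the sketch keeps ids 0 to 3 when truncating right and ids 6 to 9 when truncating left; the remaining pieces overlap their neighbour by one id in both cases.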