Add a way to specify the unknown token in the SentencePieceUnigramTokenizer Python implementation (#762)

* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`

* add a test that verifies an exception is raised when the unknown token is missing

* style

* add test tokens
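
The behaviour these changes target can be pictured with a minimal pure-Python sketch of an unknown-token fallback. It does not use the `tokenizers` API: `encode_pieces`, the toy `vocab`, and the error message are hypothetical names chosen only for illustration, on the assumption that an out-of-vocabulary piece either maps to the configured unknown token or raises an error.

```python
from typing import Dict, List, Optional


def encode_pieces(
    pieces: List[str],
    vocab: Dict[str, int],
    unk_token: Optional[str] = None,
) -> List[int]:
    """Map pieces to ids, using unk_token for out-of-vocabulary pieces.

    Hypothetical helper for illustration; not part of the tokenizers library.
    """
    ids = []
    for piece in pieces:
        if piece in vocab:
            ids.append(vocab[piece])
        elif unk_token is not None and unk_token in vocab:
            # Out-of-vocabulary piece: fall back to the unknown token's id.
            ids.append(vocab[unk_token])
        else:
            # No usable unknown token, so the piece cannot be encoded.
            raise ValueError(f"unknown piece {piece!r} and no unknown token is set")
    return ids


# Toy vocabulary (assumed for this example only).
vocab = {"<unk>": 0, "▁hello": 1, "▁world": 2}
print(encode_pieces(["▁hello", "▁moon"], vocab, unk_token="<unk>"))  # [1, 0]
```

Calling `encode_pieces(["▁moon"], vocab)` without `unk_token` raises `ValueError`, which is the kind of failure the new test is described as checking for.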
SaulLu
2021-08-12 15:42:44 +02:00
committed by GitHub
parent 46bed542fa
commit da4c7b10e4
3 changed files with 77 additions and 3 deletions


@@ -2,7 +2,7 @@ from setuptools import setup
 from setuptools_rust import Binding, RustExtension
 extras = {}
-extras["testing"] = ["pytest"]
+extras["testing"] = ["pytest", "requests", "numpy", "datasets"]
 setup(
     name="tokenizers",