If a vocab file isn't provided, the supplied `unk_token` (even when it differs from the default `[UNK]`) is silently ignored. Encoding an input string that contains an unknown token then fails with: `Exception: WordPiece error: Missing [UNK] token from the vocabulary`.
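
A minimal reproduction sketch, assuming the Python bindings and the `BertWordPieceTokenizer` convenience class (the training corpus, vocab size, and special-token list below are illustrative, not from the original report):

```python
from tokenizers import BertWordPieceTokenizer

# Construct without a vocab file but with a non-default unk token.
# Assumption: because no vocab is given, the unk_token argument is
# dropped and the underlying WordPiece model keeps its default "[UNK]".
tokenizer = BertWordPieceTokenizer(unk_token="<UNK>")

# Train with the custom token as a special token, so "[UNK]" never
# makes it into the vocabulary.
tokenizer.train_from_iterator(
    ["hello world", "hello there"],
    vocab_size=100,
    special_tokens=["<UNK>", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)

# "q" is outside the trained alphabet, so WordPiece falls back to its
# unk token, which is still "[UNK]" rather than "<UNK>" and is absent
# from the vocabulary:
# Exception: WordPiece error: Missing [UNK] token from the vocabulary
tokenizer.encode("qqq")
```

The expected behavior would be for the model to use `<UNK>` here, since that is the token the constructor was given.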