into a proper Rust Tokenizer format and check it on a file. - Also fuse_unks by default in `tokenizers`'s BPE.