tokenizers

mirror of https://github.com/mii443/tokenizers.git synced 2025-08-22 16:25:30 +00:00

Files

Kaito Sugimoto 1bb9884f45 Fixing the vocab size of the trained Unigram model (#952)

* Fixing the vocab size of the trained Unigram model

* add test for the vocab size of the trained Unigram model

* Revert "add test for the vocab size of the trained Unigram model"

This reverts commit fb8955c831b357d1037548ceaa8789734d544646.

* Fixing the vocab size of the trained Unigram model

* format codes

* get the position of vocab-size calculation out of loop

2022-03-18 18:13:17 +01:00

node

Making the regex in ByteLevel optional. (#939)

2022-03-18 09:03:20 +01:00

python

Fixing the vocab size of the trained Unigram model (#952)

2022-03-18 18:13:17 +01:00