|
f1faec1756
|
Fix typos in strings and comments (#1770)
|
2025-05-27 08:17:36 +02:00 |
|
|
91393ef75e
|
Fixing doc. (#1499)
* Fixing doc.
* SentencePieceUnigram and Convert.py still used sentencepiece
* stub
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
|
2024-04-17 09:32:40 +02:00 |
|
|
29fef1e7aa
|
[remove black ] And use ruff (#1436)
* nits
* Fixing deps.
* Ruff update.
* Import order matters.
* Fix.
* Revert ruff fix.
* Visualizer.
* Putting back the imports.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
|
2024-03-12 11:24:21 +01:00 |
|
|
4b0dc6b947
|
Fix SPM conversions (#686)
* Fix SPM conversions
* Update changelog
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
|
2021-05-20 09:55:55 -04:00 |
|
|
e999a7b5f9
|
Revert "Fix SPM conversions"
This reverts commit e1ffe39764.
|
2021-04-21 18:09:58 -04:00 |
|
|
e1ffe39764
|
Fix SPM conversions
|
2021-04-21 18:09:49 -04:00 |
|
|
96b9972842
|
Fix SentencePiece tokenizers conversion
|
2021-02-03 12:44:46 -05:00 |
|
|
598ce61229
|
Removed now wrong code in convert.py, fixed strange black magic.
|
2020-09-24 08:57:02 +02:00 |
|
|
8f8156fd2c
|
Addressing first pass of comments.
|
2020-09-24 08:57:02 +02:00 |
|
|
9d3a93db5b
|
Going back to not using fuse_unk by default for BPE, but adding a flag to
enable it.
|
2020-09-22 16:27:09 -04:00 |
|
|
033b98ce59
|
Updating convert scripts with Replace normalizer.
|
2020-09-22 08:21:38 +02:00 |
|
|
c59b216baa
|
Fixing convert/check scripts.
|
2020-09-22 08:21:38 +02:00 |
|
|
b16406c900
|
Moving StripAccents within normalizer for Albert + XLNet, but it now crashes
in Precompiled. Offsets are wrong?
|
2020-09-22 08:21:38 +02:00 |
|
|
275ee6d4c4
|
Making convert script machine agnostic.
|
2020-09-22 08:21:38 +02:00 |
|
|
2fd1d9cf06
|
Adding a new convert script, that will convert all python Tokenizer code
into a proper Rust Tokenizer format and check it on a file.
- Also fuse_unks by default in `tokenizers`'s BPE.
|
2020-09-22 08:21:38 +02:00 |
|