29fef1e7aa
[remove black] And use ruff (#1436)
...
* nits
* Fixing deps.
* Ruff update.
* Import order matters.
* Fix.
* Revert ruff fix.
* Visualizer.
* Putting back the imports.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-03-12 11:24:21 +01:00
6113666624
Updating python formatting. (#1079)
...
* Updating python formatting.
* Forgot gh action.
* Skipping isort to prevent circular imports.
* Updating stub.
* Removing `isort` (it contradicts `stub.py`).
* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
da4c7b10e4
Add a way to specify the unknown token in SentencePieceUnigramTokenizer python implem (#762)
...
* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`
* add test that verifies that an exception is raised for the missing unknown token
* style
* add test tokens
2021-08-12 09:42:44 -04:00
49d11b1f69
Python - Add components getter/setters to BaseTokenizer
2021-01-11 16:08:38 -05:00
d94fa220b6
Python - Add train_from_iterator to implementations
2021-01-07 09:02:20 -05:00
000c19a7a5
Doc - Improve snippets testing
2020-11-02 17:07:27 -05:00
655809c718
Attempt to get some documentation going.
2020-11-02 17:07:27 -05:00
a410903051
Upgrading to black 20.8b1
2020-09-24 09:27:30 -04:00
36832bfa12
from_files -> from_file everywhere
...
- read_files -> read_file
- from_file pure rust impl in python bindings
- Fix some typing in python binding
- Added {BPE,WordLevel,WordPiece}.from_file tests.
2020-09-24 08:57:02 +02:00
9672995a56
We use 19.10b0 not 20 here...
2020-09-24 08:57:02 +02:00
9b1ef9d895
Black pre-commit after rebase.
2020-09-24 08:57:02 +02:00
8f8156fd2c
Addressing first pass of comments.
2020-09-24 08:57:02 +02:00
98a30eead1
Temp work to make the APIs uniform (build from memory by default).
2020-09-24 08:57:02 +02:00
aa3b39f692
Python - Tests for parallelism with multiprocessing
...
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
2020-06-23 11:25:39 -04:00
837791ee1f
Python - Test BertWordPieceTokenizer
2020-04-01 17:25:56 -04:00
7fd7dfd113
Python - Test CharBPETokenizer
2020-04-01 17:25:56 -04:00
5ebe687753
Python - Add first implementations tests
2020-04-01 17:25:55 -04:00