17 Commits

Author SHA1 Message Date
29fef1e7aa [remove black] And use ruff (#1436)
* nits

* Fixing deps.

* Ruff update.

* Import order matters.

* Fix.

* Revert ruff fix.

* Visualizer.

* Putting back the imports.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-03-12 11:24:21 +01:00
6113666624 Updating python formatting. (#1079)
* Updating python formatting.

* Forgot gh action.

* Skipping isort to prevent circular imports.

* Updating stub.

* Removing `isort` (it contradicts `stub.py`).

* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
da4c7b10e4 Add a way to specify the unknown token in SentencePieceUnigramTokenizer Python implementation (#762)
* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`

* add test that verifies that an exception is raised for the missing unknown token

* style

* add test tokens
2021-08-12 09:42:44 -04:00
49d11b1f69 Python - Add components getter/setters to BaseTokenizer 2021-01-11 16:08:38 -05:00
d94fa220b6 Python - Add train_from_iterator to implementations 2021-01-07 09:02:20 -05:00
000c19a7a5 Doc - Improve snippets testing 2020-11-02 17:07:27 -05:00
655809c718 Attempt to get some documentation going. 2020-11-02 17:07:27 -05:00
a410903051 Upgrading to black 20.8b1 2020-09-24 09:27:30 -04:00
36832bfa12 from_files -> from_file everywhere
- read_files -> read_file
- from_file pure rust impl in python bindings
- Fix some typing in python binding
- Added {BPE,WordLevel,WordPiece}.from_file tests.
2020-09-24 08:57:02 +02:00
9672995a56 We use 19.10b0 not 20 here... 2020-09-24 08:57:02 +02:00
9b1ef9d895 Black pre-commit after rebase. 2020-09-24 08:57:02 +02:00
8f8156fd2c Addressing first pass of comments. 2020-09-24 08:57:02 +02:00
98a30eead1 Temp work to make the APIs uniform (build from memory by default). 2020-09-24 08:57:02 +02:00
aa3b39f692 Python - Tests for parallelism with multiprocessing
Co-authored-by: Evan Pete Walsh <epwalsh10@gmail.com>
2020-06-23 11:25:39 -04:00
837791ee1f Python - Test BertWordPieceTokenizer 2020-04-01 17:25:56 -04:00
7fd7dfd113 Python - Test CharBPETokenizer 2020-04-01 17:25:56 -04:00
5ebe687753 Python - Add first implementations tests 2020-04-01 17:25:55 -04:00