* Move to maturin, mimicking the move for `safetensors`.
* Tmp.
* Fix sdist.
* Wat?
* Clippy 1.72
* Remove if.
* Conda sed.
* Fix doc check workflow.
* Moving to maturin AND removing the http + openssl mess (smoothing the
transition to `huggingface_hub`)
* Fix dep
* Black.
* New node bindings.
* Fix docs + node cache ?
* Yarn.
* Working dir.
* Extension module.
* Put back interpreter.
* Remove cache.
* New attempt
* Multi python.
* Remove FromPretrained.
* Remove traces of `fromPretrained`.
* Drop 3.12 for windows?
* Typo.
* Put back the default feature for ignoring links during simple test.
* Fix ?
* x86_64 -> x64.
* Remove warning for windows bindings.
* Exclude aarch.
* Include/exclude.
* Put back workflows in correct states.
* Split `get_n_added_tokens` into separate method
* Modify `TokenizerImpl.with_truncation()` to raise an error if given bad parameters
* Return Python error if `tokenizer.with_truncation()` fails
* Add dummy variable assignment for `no_truncation()` case
* Unrelated fmt fix.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Makes `decode` and `decode_batch` work on borrowed content.
* Make `decode_batch` work with borrowed content.
* Fix lint.
* Attempt to map it into Node.
* Second attempt.
* Step by step.
* One more step.
* Fix lint.
* Please ...
* Removing collect.
* Revert "Removing collect."
This reverts commit 2f7ec04dc84df3cc5488625a4fcb492fdc3545e2.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Upgrade pyo3 to 0.15
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>
* Upgrade pyo3 to 0.16
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>
* Install Python before running cargo clippy
* Fix clippy warnings
* Use `PyArray_Check` instead of downcasting to `PyArray1<u8>`
* Enable `auto-initialize` of pyo3 to fix
`cargo test --no-default-features`
* Fix some test cases
Why do they change?
* Refactor and add SAFETY comments to `PyArrayUnicode`
Replace deprecated `PyUnicode_FromUnicode` with `PyUnicode_FromKindAndData`
Co-authored-by: messense <messense@icloud.com>
* `tokenizer.save` has the wrong arguments compared to the documentation
* Fixing doc of `save` function.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
This lets us keep everything that was set on the model except the vocabulary when it is trained. For example, this lets us keep the configured `unk_token` of BPE when it's trained.
* First pass on automatically stubbing our Python files.
* And now modifying all Rust docs to be visible in .pyi files.
* Better assert fail message.
* Fixing github workflow.
* Removing types not exported anymore.
* Fixing `Tokenizer` signature.
* Disabling auto __init__.py.
* Re-enabling some types.
* Don't overwrite non automated __init__.py
* Automated most __init__.py
* Restubbing after rebase.
* Fixing env for tests.
* Install black in the env.
* Use PY35 target in stub.py
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Fixing a hang while acquiring the GIL from a custom pretokenizer
during training.
Fixes #469
* cleanup
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>