* add doc in the code
* add option to skip special tokens
* nits
* add api dummy for now
* Fmt.
* Fix fmt.
* Fix the stub.
* add a test
* add a test in python
* style it
* nits
* add getter and setters
* stub
* update python test
* fmt
* last nit
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* nits
* allow for legacy behaviour without making any breaking changes
* add a todo
* set to legacy by default
* skip legacy serialization
* push correct update
* lint
* add deserialization test
* add a python test as well
* updates
* fix serialization tests
* nits
* python styling of the tests
* better tests
* fix offsets
* fix imports
* fmt
* update metaspace
* remove TODO
* use enum
* fix some tests
* nits
* use enum
* update tests
* styling
* remove `impl From` for PrependScheme
* use simple getters and setters
* lint
* update tests
* add test new == new_with_prepend_scheme
* revert a change
* use setters and getters
* Update bindings/python/src/pre_tokenizers.rs
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* nits
* use copy rather than ref
* nits format
* more nits
* allow option string
* enforce First Never Always camel cased
* nits
* refactor
* update test as well
* fmt
* nits
* properly error out
* Update bindings/python/src/pre_tokenizers.rs
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* suggestion changes
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Fixing the progressbar.
* Upgrade deps.
* Update cargo audit
* Ssh this action.
* Fixing esaxx by using slower rust version.
* Trying the new esaxx version.
* Publish.
* Get cache again.
* Move to maturin, mimicking the move for `safetensors`.
* Tmp.
* Fix sdist.
* Wat?
* Clippy 1.72
* Remove if.
* Conda sed.
* Fix doc check workflow.
* Moving to maturin AND removing http + openssl mess (smoothing the
transition to `huggingface_hub`)
* Fix dep
* Black.
* New node bindings.
* Fix docs + node cache ?
* Yarn.
* Working dir.
* Extension module.
* Put back interpreter.
* Remove cache.
* New attempt
* Multi python.
* Remove FromPretrained.
* Remove traces of `fromPretrained`.
* Drop 3.12 for windows?
* Typo.
* Put back the default feature for ignoring links during simple test.
* Fix ?
* x86_64 -> x64.
* Remove warning for windows bindings.
* Exclude aarch.
* Include/exclude.
* Put back workflows in correct states.
* CD backports
following huggingface/safetensors#317
* fix node bindings?
`cargo check` doesn't work on my local configuration from `tokenizers/bindings/node/native`.
I don't think it will be a problem, but I have difficulty telling.
* backport #315
* safetensors#317 back ports
* Split `get_n_added_tokens` into separate method
* Modify `TokenizerImpl.with_truncation()` to raise an error if given bad parameters
* Return Python error if `tokenizer.with_truncation()` fails
* Add dummy variable assignment for `no_truncation()` case
* Unrelated fmt fix.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>