* TMP.
* Adding support for pickling Python trainers.
* Remove unwarranted files + missed naming updates.
* Stubbing.
* Making sure serialized format is written in python tests.
* Fixing deserialization order of added_tokens.
* Actually adding a test made things more obvious.
It was a mess to handle `special` outside the notion of `AddedToken`.
This would merit an actual rework, as including `special` within the
token should make everything simpler.
For now we just make our lives easy.
* Cleanup.
* Fixing comment.
* Making the test stronger.
* Fixing off by one error in `single_word` AddedToken.
* Real fix for all unicode ranges.
Both `single_word` and `lstrip`, `rstrip` were affected.
* Adding warning when unexpected code path is taken.
* In `serialization.rs`, the supplementary tokens are now added to the tokenizer vocabulary in a single batch, so that the tokenizer trie is built just once (Fix #914).
Building the trie is a very expensive operation: previously it was rebuilt for each and every token, so for large vocabularies the overall loading time was unacceptable.
* Reformatted `serialization.rs` with rustfmt.
* Propose code cleanups.
Co-authored-by: Piercarlo Slavazza <p.slavazza@elibra.eu>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* `tokenizer.save` has the wrong arguments compared to the documentation.
* Fixing doc of `save` function.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Fixing bad deserialization following inclusion of a default for
`Punctuation`.
* don't remove the type now...
* Adding slow test to run on all the tokenizers of the hub.
* `PartialEq` everywhere.
* Forcing `type` to exist on the `pre_tokenizers`.
* Starting from master again.
Upgrade libssl everywhere on quay
Extra is ubuntu based (running the quay in a container).
Making only extra run + attempt to fix the ssl update.
Extra with newer openssl versions.
`-y`.
Use checkout@v2 + remove `-` from environment name.
Debugging back the conda release.
Attempt to use `base` env.
3.7 requires `activate-environment: true`.
macOS and Windows don't run on manylinux.
Remove yum on Windows/macOS.
Miniconda doesn't like manylinux2014 anymore?
Attempting different approach for manylinux + conda.
Use wget.
Extra bracket.
Executing $filename
Activate the env.
Activate the env on every step that requires it.
Openssl-devel.
Activating env for extracting version?
Retest all workflows.
Manylinux2010 requires checkout@v1
Run on tag for extra and conda again.
openssl-devel.
* Putting back into deploy state.
* Adding links in CHANGELOG.
* Remove clippy from changelog.
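The `serialization.rs` entry above (Fix #914) is about rebuilding the trie once per batch instead of once per token. Here is a minimal sketch of that difference, assuming a plain character trie stands in for whatever matcher the tokenizer really builds. `Trie`, `AddedVocabulary`, `add_token`, `add_tokens` and the `rebuilds` counter are hypothetical names for illustration only. They are not the tokenizers crate API.

```rust
use std::collections::HashMap;

/// Hypothetical character trie standing in for the tokenizer's matcher.
#[derive(Default)]
struct Trie {
    children: HashMap<char, Trie>,
    is_end: bool,
}

impl Trie {
    fn insert(&mut self, token: &str) {
        let mut node = self;
        for c in token.chars() {
            node = node.children.entry(c).or_default();
        }
        node.is_end = true;
    }

    /// Builds a fresh trie from scratch: cost grows with the total token length.
    fn build<'a, I: IntoIterator<Item = &'a str>>(tokens: I) -> Self {
        let mut trie = Trie::default();
        for t in tokens {
            trie.insert(t);
        }
        trie
    }

    fn contains(&self, token: &str) -> bool {
        let mut node = self;
        for c in token.chars() {
            match node.children.get(&c) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.is_end
    }
}

/// Hypothetical vocabulary of added tokens plus the trie derived from them.
struct AddedVocabulary {
    tokens: Vec<String>,
    trie: Trie,
    rebuilds: usize, // how many times the trie was rebuilt from scratch
}

impl AddedVocabulary {
    fn new() -> Self {
        AddedVocabulary { tokens: Vec::new(), trie: Trie::default(), rebuilds: 0 }
    }

    fn refresh_trie(&mut self) {
        self.trie = Trie::build(self.tokens.iter().map(String::as_str));
        self.rebuilds += 1;
    }

    /// Old pattern: every single insertion triggers a full rebuild.
    fn add_token(&mut self, token: &str) {
        self.tokens.push(token.to_string());
        self.refresh_trie();
    }

    /// Batched pattern: push all tokens first, then rebuild exactly once.
    fn add_tokens(&mut self, tokens: &[&str]) {
        self.tokens.extend(tokens.iter().map(|t| t.to_string()));
        self.refresh_trie();
    }
}

fn main() {
    let tokens = ["<pad>", "<unk>", "<mask>", "[CLS]"];

    let mut one_by_one = AddedVocabulary::new();
    for &t in tokens.iter() {
        one_by_one.add_token(t);
    }

    let mut batched = AddedVocabulary::new();
    batched.add_tokens(&tokens);

    // Same resulting trie, but 4 rebuilds versus 1.
    assert_eq!(one_by_one.rebuilds, 4);
    assert_eq!(batched.rebuilds, 1);
    assert!(batched.trie.contains("<mask>") && one_by_one.trie.contains("<mask>"));
    println!("one-by-one: {} rebuilds, batched: {} rebuilds", one_by_one.rebuilds, batched.rebuilds);
}
```

In this sketch, adding n tokens one at a time rebuilds the trie n times. The k-th rebuild re-inserts k tokens, so the total work grows quadratically in n. The batched path inserts each token once.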