0eb7455fe5
Preparing `0.12` release. ( #967 )
...
* Preparing `0.12` release.
* Fix click version: https://github.com/psf/black/issues/2964
2022-03-31 11:06:33 +02:00
28cd3dce2a
Bump minimist from 1.2.5 to 1.2.6 in /bindings/node ( #966 )
...
Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases)
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)
---
updated-dependencies:
- dependency-name: minimist
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 09:52:43 +02:00
a5f644616b
Fix the error test for Python 3.10 (error message is different). ( #962 )
2022-03-23 10:35:58 +01:00
cd730594e9
Fixing issue with ConvBert not being able to save because of holes in ( #954 )
...
the vocab.
2022-03-21 19:28:49 +01:00
1bb9884f45
Fixing the vocab size of the trained Unigram model ( #952 )
...
* Fixing the vocab size of the trained Unigram model
* add test for the vocab size of the trained Unigram model
* Revert "add test for the vocab size of the trained Unigram model"
This reverts commit fb8955c831b357d1037548ceaa8789734d544646.
* Fixing the vocab size of the trained Unigram model
* format codes
* get the position of vocab-size calculation out of loop
2022-03-18 18:13:17 +01:00
daa4dd2288
Making the regex in ByteLevel optional. ( #939 )
...
* Making the regex in ByteLevel optional.
* Changed the stub.
* Better stub.
* Typo fix.
* Remove bad comments.
2022-03-18 09:03:20 +01:00
cdabef14c4
Changing `Decoder` trait to be more composable. ( #938 )
...
* Changing `Decoder` trait to be more composable.
Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.
2022-03-17 10:32:09 +01:00
4b6055d4fb
Adding pickling support for trainers ( #949 )
...
* TMP.
* Adding support for pickling Python trainers.
* Remove not warranted files + missed naming updates.
* Stubbing.
* Making sure serialized format is written in python tests.
2022-03-14 12:18:11 +01:00
71ae5421eb
Python - add initial_alphabet to spm unigram trainer ( #942 )
...
* Python - add initial_alphabet to spm unigram trainer
* Python - use optional instead of mutable defaults in spm unigram trainer
2022-03-09 09:54:03 +01:00
98249dfb0f
Python - add doctype to length in implementations spm unigram ( #943 )
2022-03-08 11:59:07 +01:00
4a8f5db067
Python - Add length to train_from_iterator in implementations ( #937 )
2022-03-04 14:11:58 +01:00
845da6d8e8
Feat/m1 manual build ( #936 )
...
* feat(bindings): move target compilation flags to correct config file
* feat(bindings): m1 build 'script'
* feat(ci): for loop in bdist_wheel script for m1
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-03-02 14:44:13 +01:00
a4a68de98a
Workarounds for publishing issues:
...
- Upgrade package-lock.json (cannot find VS code attempt)
- Use published `macro_rules_attribute` so `cargo publish` works.
2022-02-28 11:16:46 +01:00
ffaee13994
Preparing for 0.11.6 release.
2022-02-28 10:20:49 +01:00
2fecdc10dd
Update the CHANGELOG.
2022-02-16 13:07:31 +01:00
5679323bbc
Minor version bump.
2022-02-16 12:51:11 +01:00
88d718207a
tokenizer.save has the wrong arguments compared to documentation ( #901 )
...
* tokenizer.save has the wrong arguments compared to documentation
* Fixing doc of `save` function.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-02-15 17:55:55 +01:00
448054f3c7
fix python3.10 build ( #895 )
2022-01-28 17:51:51 +01:00
a8e07d734f
Changelog.
2022-01-17 22:31:54 +01:00
9b85424520
Version bump.
2022-01-17 22:30:25 +01:00
1a84958cc8
Fixing bad deserialization following inclusion of a default for `Punctuation`. ( #884 )
...
* Fixing bad deserialization following inclusion of a default for
`Punctuation`.
* don't remove the type now...
* Adding slow test to run on all the tokenizers of the hub.
* `PartialEq` everywhere.
* Forcing `type` to exist on the `pre_tokenizers`.
2022-01-17 22:28:25 +01:00
c2fd765087
Update Cargo.lock for Python.
2022-01-17 10:32:46 +01:00
a4cf53f6a7
Update CHANGELOG.
2022-01-17 09:56:56 +01:00
ab9a2f3100
Update versions.
2022-01-17 09:40:01 +01:00
b18b572ed2
Bump shelljs from 0.8.4 to 0.8.5 in /bindings/node ( #881 )
...
Bumps [shelljs](https://github.com/shelljs/shelljs) from 0.8.4 to 0.8.5.
- [Release notes](https://github.com/shelljs/shelljs/releases)
- [Changelog](https://github.com/shelljs/shelljs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/shelljs/shelljs/compare/v0.8.4...v0.8.5)
---
updated-dependencies:
- dependency-name: shelljs
dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-17 09:26:09 +01:00
cabbecb96c
add python3.10 release ( #877 )
...
* add missing python3.9 classifier
* add python3.10 release
* run tests on 3.10
* Revert "run tests on 3.10"
This reverts commit ceed64249e54b6ec622b06c59bf47da7c6dfc1b0.
2022-01-12 09:42:13 +01:00
076319d542
Aho corasick version for many added tokens. ( #871 )
...
* Aho corasick version.
* Remove test file.
* Compile on `stable`.
2022-01-06 16:04:51 +01:00
8e0d66a254
New python version.
2022-01-04 14:58:02 +01:00
6972e49f1d
Fix the clippy warnings. ( #869 )
2022-01-04 14:32:07 +01:00
1054e243e2
Fix invalid continuing subword prefix. ( #864 )
...
* Creating failing test for invalid continuing subword prefix.
* Test in rust + the associated fix.
* Clippy.
* Black.
2022-01-04 14:25:35 +01:00
4122a33f09
Fixing missing `direction` in TruncationParams. ( #868 )
2022-01-04 14:21:46 +01:00
7069988ffe
Update to 0.11.1
2021-12-28 13:59:31 +01:00
152880ab3e
Adding truncation_side within `TruncationParams`. ( #860 )
...
* Add truncation to enable_truncation
* Fix typo
* Adding truncation_side within `TruncationParams`.
* Node serialization of this direction param.
* Update the test.
* Fixing warnings/lint.
* Adding stuff (can't local debug :( )
* Slow loop... ;(
* Stub.py.
Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
2021-12-28 12:37:06 +01:00
c4c9de23a5
Feature: Handle invalid truncate direction ( #858 )
...
* refacto: TruncateDirection -> TruncationDirection
* feat(node): invalid direction will throw
* feat(python): invalid direction will throw
* Update bindings/node/lib/bindings/raw-encoding.test.ts
* Update bindings/python/tests/bindings/test_encoding.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2021-12-27 14:31:57 +01:00
943f4ef469
Preparing for 0.11.0 Re-release. ( #856 )
...
* Starting from master again.
Upgrade libssl everywhere on quay
Extra is ubuntu based (running the quay in a container).
making only extra run + attempt to fix ssl update.
Extra with newer openssl versions.
`-y`.
Use checkout@v2 + remove `-` from environment name.
Debugging back the conda release..
Attempt to use `base` env.
3.7 requires `activate-environment: true`.
MacOS and windows don't run on manylinux.
Remove yum on windows/macOs.
Miniconda doesn't like manylinux2014 anymore ?
Attempting different approach for manylinux + conda.
Use wget.
Extra bracket.
Executing $filename
Activate the env.
Activate the env on every step that requires it.
Openssl-devel.
Activating env for extracting version ?
Retest all workflows.
Manylinux2010 requires checkout@v1
Run on tag for extra and conda again.
openssl-devel.
* Putting back into deploy state.
* Adding links in CHANGELOG.
* Remove clippy from changelog.
2021-12-23 16:43:48 +01:00
04368b1998
Truncate Right ( #841 )
...
* feat(tokenizers): add truncate test case
* !feat(tokenizer): truncate right
* refacto(tokenizers): clippy
* feat(bindings): update bindings for truncate()
* fix(tokenizers): remove unsafe code
* refacto(tokenizers): truncate direction
* truncate direction enum
* compute parts ranges beforehand
* 2n space because encoding is dropped at the end of procedure
* update bindings
* add pip install in python bindings' make test
* fix(node): clippy asks to use unwrap_or_else
* fix(node): lint
* refacto(tokenizers): replace Vec<Range<usize>> by Vec<(usize, usize)>
* refacto(bindings): add match syntax
* refacto(tokenizers): use mem::replace instead of mem::swap
* refacto(tokenizers): assign value the normal way
2021-12-23 13:34:21 +01:00
c1100ec542
Clippy fixes. ( #846 )
...
* Clippy fixes.
* Drop support for Python 3.6
* Remove other 3.6
* Re-enabling caches for build (5h + seems too long and issue seems
solved)
https://github.com/actions/virtual-environments/issues/572
* `npm audit fix`.
* Fix yaml ?
* Pyarrow issue fixed: https://github.com/huggingface/datasets/pull/2268
* Installing dev libraries.
* Install python dev elsewhere ?
* Typo.
* No sudo.
* ...
* Testing the GH again.
* Maybe v2 will fix ?
* Fixing tests on MacOS Python 3.8+
2021-12-15 15:55:48 +01:00
1dc19e0dd4
Fix Python README example
2021-10-07 16:56:48 +02:00
b0ee27847f
Python - Prepare for release 0.11.0 ( #799 )
2021-09-08 03:15:47 -04:00
fd316bdc61
Update esaxx-rs to 0.1.7 to fix building on windows
2021-09-02 20:11:27 +02:00
884bfb7970
Prepare node release ( #794 )
...
* Node - Update changelog for release
* Update node release to add v14 & v15
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
* Node - Update version number
* Node - Update dependencies
* Node - Lint
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
2021-09-02 09:58:01 -04:00
b8b584d4e5
Python - Pretty json saving defaults to true ( #793 )
...
* Python - Pretty json saving defaults to true
* Update changelog
2021-09-02 08:43:54 -04:00
23cf8c69ae
Bump tar from 4.4.17 to 4.4.19 in /bindings/node ( #792 )
...
Bumps [tar](https://github.com/npm/node-tar) from 4.4.17 to 4.4.19.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v4.4.17...v4.4.19)
---
updated-dependencies:
- dependency-name: tar
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-02 08:06:54 -04:00
e68aecc442
Python - Update Cargo.lock
2021-09-02 14:04:35 +02:00
35c96e5e3f
Add tests for from_pretrained
2021-08-31 09:00:05 -04:00
ad7090a5c7
Improve READMEs for from_pretrained
2021-08-31 09:00:05 -04:00
a4d0f3dd18
Update docs for from_pretrained
2021-08-31 09:00:05 -04:00
528c9a532e
Node - Add bindings to Tokenizer.from_pretrained
2021-08-31 09:00:05 -04:00
6f9e867330
Better export for FromPretrainedParameters
2021-08-31 09:00:05 -04:00
e44fdee4a1
Python - Add bindings to Tokenizer.from_pretrained
2021-08-31 09:00:05 -04:00