152880ab3e
Adding truncation_side within `TruncationParams`. (#860)
* Add truncation to enable_truncation
* Fix typo
* Adding truncation_side within `TruncationParams`.
* Node serialization of this direction param.
* Update the test.
* Fixing warnings/lint.
* Adding stuff (can't local debug :( )
* Slow loop... ;(
* Stub.py.
Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
2021-12-28 12:37:06 +01:00
c4c9de23a5
Feature: Handle invalid truncate direction (#858)
* refacto: TruncateDirection -> TruncationDirection
* feat(node): invalid direction will throw
* feat(python): invalid direction will throw
* Update bindings/node/lib/bindings/raw-encoding.test.ts
* Update bindings/python/tests/bindings/test_encoding.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2021-12-27 14:31:57 +01:00
38a85b2112
Last touches for conda hopefully
- Missing env activation for manylinux + upload
2021-12-24 08:05:09 +01:00
943f4ef469
Preparing for 0.11.0 Re-release. (#856)
* Starting from master again.
Upgrade libssl everywhere on quay
Extra is ubuntu based (running the quay in a container).
making only extra run + attempt to fix ssl update.
Extra with newer openssl versions.
`-y`.
Use checkout@v2 + remove `-` from environment name.
Debugging back the conda release..
Attempt to use `base` env.
3.7 requires `activate-environment: true`.
macOS and Windows don't run on manylinux.
Remove yum on Windows/macOS.
Miniconda doesn't like manylinux2014 anymore ?
Attempting different approach for manylinux + conda.
Use wget.
Extra bracket.
Executing $filename
Activate the env.
Activate the env on every step that requires it.
Openssl-devel.
Activating env for extracting version ?
Retest all workflows.
Manylinux2010 requires checkout@v1
Run on tag for extra and conda again.
openssl-devel.
* Putting back into deploy state.
* Adding links in CHANGELOG.
* Remove clippy from changelog.
2021-12-23 16:43:48 +01:00
04368b1998
Truncate Right (#841)
* feat(tokenizers): add truncate test case
* !feat(tokenizer): truncate right
* refacto(tokenizers): clippy
* feat(bindings): update bindings for truncate()
* fix(tokenizers): remove unsafe code
* refacto(tokenizers): truncate direction
* truncate direction enum
* compute parts ranges beforehand
* 2n space because encoding is dropped at the end of procedure
* update bindings
* add pip install in python bindings' make test
* fix(node): clippy asks to use unwrap_or_else
* fix(node): lint
* refacto(tokenizers): replace Vec<Range<usize>> by Vec<(usize, usize)>
* refacto(bindings): add match syntax
* refacto(tokenizers): use mem::replace instead of mem::swap
* refacto(tokenizers): assign value the normal way
2021-12-23 13:34:21 +01:00
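The truncate direction introduced in this commit can be illustrated with a minimal Python sketch. This is not the library's Rust implementation: the `truncate` helper, its signature, and the assumption that "right" drops tokens from the end while "left" drops them from the start are illustrative only.

```python
from enum import Enum
from typing import List


class TruncationDirection(Enum):
    """Hypothetical mirror of the direction enum named in the commit."""
    LEFT = "left"
    RIGHT = "right"


def truncate(ids: List[int], max_length: int, direction: str = "right") -> List[int]:
    """Keep at most `max_length` ids, dropping from the given side.

    Assumption: "right" removes ids from the end, "left" from the start.
    """
    # Enum lookup by value raises ValueError for an unknown direction.
    side = TruncationDirection(direction)
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if len(ids) <= max_length:
        return list(ids)
    if side is TruncationDirection.RIGHT:
        return ids[:max_length]
    # Slice from an explicit start index so max_length == 0 yields [].
    return ids[len(ids) - max_length:]
```

With `ids = [1, 2, 3, 4, 5]` and `max_length = 3`, the "right" direction keeps `[1, 2, 3]` and the "left" direction keeps `[3, 4, 5]`.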
362df327b0
Adding `Decoders` to the API doc in Python. (#845)
2021-12-20 10:53:58 +01:00
4759700da8
Fixing interaction between `is_pretokenized` and `trim_offsets`. (#844)
2021-12-20 10:53:46 +01:00
31dd4364f0
Feature gate http-deps (#850)
* Feature gate http-deps
* Default features cleanup
* Review fixups
* One more import fix
2021-12-20 10:53:09 +01:00
b240ccb68a
Updating doc with real links. (#851)
* Updating doc with real links.
* Remove cache to make it build ?
2021-12-17 17:50:24 +01:00
c1100ec542
Clippy fixes. (#846)
* Clippy fixes.
* Drop support for Python 3.6
* Remove other 3.6
* Re-enabling caches for build (5h+ seems too long and the issue seems solved): https://github.com/actions/virtual-environments/issues/572
* `npm audit fix`.
* Fix yaml ?
* Pyarrow issue fixed: https://github.com/huggingface/datasets/pull/2268
* Installing dev libraries.
* Install python dev elsewhere ?
* Typo.
* No sudo.
* ...
* Testing the GH again.
* Maybe v2 will fix ?
* Fixing tests on MacOS Python 3.8+
2021-12-15 15:55:48 +01:00
1dc19e0dd4
Fix Python README example
2021-10-07 16:56:48 +02:00
b0ee27847f
Python - Prepare for release 0.11.0 (#799)
2021-09-08 03:15:47 -04:00
0a37bd8d55
Attempt at fixing Conda builds
Ref #585
2021-09-08 08:56:58 +02:00
fd316bdc61
Update esaxx-rs to 0.1.7 to fix building on windows
2021-09-02 20:11:27 +02:00
36204c8dde
Exclude node 15.x for windows
2021-09-02 16:11:41 +02:00
884bfb7970
Prepare node release (#794)
* Node - Update changelog for release
* Update node release to add v14 & v15
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
* Node - Update version number
* Node - Update dependencies
* Node - Lint
Co-authored-by: Huan (李卓桓) <zixia@zixia.net>
2021-09-02 09:58:01 -04:00
b8b584d4e5
Python - Pretty json saving defaults to true (#793)
* Python - Pretty json saving defaults to true
* Update changelog
2021-09-02 08:43:54 -04:00
23cf8c69ae
Bump tar from 4.4.17 to 4.4.19 in /bindings/node (#792)
Bumps [tar](https://github.com/npm/node-tar) from 4.4.17 to 4.4.19.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v4.4.17...v4.4.19)
---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-02 08:06:54 -04:00
e68aecc442
Python - Update Cargo.lock
2021-09-02 14:04:35 +02:00
c65b72dec7
Rust - Prepare for release 0.11.0 (#789)
2021-08-31 10:57:21 -04:00
35c96e5e3f
Add tests for from_pretrained
2021-08-31 09:00:05 -04:00
ad7090a5c7
Improve READMEs for from_pretrained
2021-08-31 09:00:05 -04:00
a4d0f3dd18
Update docs for from_pretrained
2021-08-31 09:00:05 -04:00
528c9a532e
Node - Add bindings to Tokenizer.from_pretrained
2021-08-31 09:00:05 -04:00
6f9e867330
Better export for FromPretrainedParameters
2021-08-31 09:00:05 -04:00
e44fdee4a1
Python - Add bindings to Tokenizer.from_pretrained
2021-08-31 09:00:05 -04:00
e71e5be64f
Rust - Add from_pretrained on Tokenizer
2021-08-31 09:00:05 -04:00
e7dd6436dd
Fix word level tokenizer determinism (#718)
* compare not only counts of words, but if equal also words themselves
* add missing semicolon
* Fix a few clippy warnings and imports
Co-authored-by: Anthony Moi <m.anthony.moi@gmail.com>
2021-08-13 10:53:39 -04:00
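The tie-breaking idea in this fix (compare counts first, then the words themselves) can be sketched in Python as follows. This is an illustration only, not the Rust trainer code: `build_vocab` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable


def build_vocab(words: Iterable[str], vocab_size: int) -> Dict[str, int]:
    """Assign ids by descending count, breaking ties alphabetically.

    Sorting on count alone leaves words with equal counts in an order that
    depends on how they were collected; adding the word itself as a
    secondary key makes the resulting vocabulary deterministic.
    """
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: idx for idx, (word, _) in enumerate(ordered[:vocab_size])}
```

Two corpora with the same word counts but a different word order then produce the same mapping.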
5982498195
Switch git dependencies in Cargo.toml back to regular versions (#728)
* Switch git dependencies in Cargo.toml back to regular versions
rayon-cond turned out to be a rustc bug that has been fixed for a while
(see cuviper/rayon-cond#2), so we can revert the git dependency.
numpy has released the commit in question as part of 0.12.
* Also update Cargo.lock files
Co-authored-by: Anthony Moi <m.anthony.moi@gmail.com>
2021-08-13 09:32:00 -04:00
e2bf8daa3a
Add SplitDelimiterBehavior to Punctuation constructor (#657)
Resolves: #642
2021-08-13 09:19:23 -04:00
c1100dcbe3
Fix typo in documentation (#743)
* Doc - Fix typo (And instance of -> An instance of)
* Add missing text_signature for WordLevel.from_file
Co-authored-by: Anthony Moi <m.anthony.moi@gmail.com>
2021-08-13 08:08:23 -04:00
71fb73e129
update lexical-core because 0.7.4 doesn't compile (#758)
* update lexical-core because 0.7.4 doesn't compile
Fix the issue as described in https://github.com/rust-lang/rust/issues/81654
2021-08-12 10:34:45 -04:00
6616e699f7
Expand documentation of UnigramTrainer (#770)
* Expand documentation of UnigramTrainer
* Put doc at the source
* Add signature
* make style
Co-authored-by: Anthony Moi <m.anthony.moi@gmail.com>
2021-08-12 10:12:26 -04:00
da4c7b10e4
Add a way to specify the unknown token in `SentencePieceUnigramTokenizer` python implem (#762)
* add a way to specify the unknown token in `SentencePieceUnigramTokenizer`
* add a test that verifies that an exception is raised for the missing unknown token
* style
* add test tokens
2021-08-12 09:42:44 -04:00
46bed542fa
Bump path-parse from 1.0.6 to 1.0.7 in /bindings/node (#774)
Bumps [path-parse](https://github.com/jbgutierrez/path-parse) from 1.0.6 to 1.0.7.
- [Release notes](https://github.com/jbgutierrez/path-parse/releases)
- [Commits](https://github.com/jbgutierrez/path-parse/commits/v1.0.7)
---
updated-dependencies:
- dependency-name: path-parse
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-12 09:41:25 -04:00
ab3d3bcbfb
Bump tar from 4.4.13 to 4.4.17 in /bindings/node (#775)
Bumps [tar](https://github.com/npm/node-tar) from 4.4.13 to 4.4.17.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/npm/node-tar/compare/v4.4.13...v4.4.17)
---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-12 09:31:47 -04:00
5d1b0a9381
Bump glob-parent from 5.1.1 to 5.1.2 in /bindings/node (#734)
Bumps [glob-parent](https://github.com/gulpjs/glob-parent) from 5.1.1 to 5.1.2.
- [Release notes](https://github.com/gulpjs/glob-parent/releases)
- [Changelog](https://github.com/gulpjs/glob-parent/blob/main/CHANGELOG.md)
- [Commits](https://github.com/gulpjs/glob-parent/compare/v5.1.1...v5.1.2)
---
updated-dependencies:
- dependency-name: glob-parent
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-12 09:21:00 -04:00
96c122ccf6
Bump ws from 7.3.1 to 7.4.6 in /bindings/node (#721)
Bumps [ws](https://github.com/websockets/ws) from 7.3.1 to 7.4.6.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/7.3.1...7.4.6)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-12 09:20:36 -04:00
256a71c1f2
Clippy 1.54. (#773)
2021-08-11 14:43:49 +02:00
d83772d62c
Fixing tokenizers with 1.53 (updated some dependencies + clippy) (#764)
2021-07-21 09:58:38 +02:00
755e5f5c1e
Remove support for Python 3.5 (#714)
* Python - remove support for python 3.5
* revert ci
* revert build-wheels.sh
* Update CHANGELOG.md
2021-05-24 17:31:01 -04:00
3a002c1aa8
Python - prepare for release 0.10.3
2021-05-24 16:59:10 -04:00
c046da7679
Fix stripping strings containing Unicode characters (#707)
* Strip seems to have been broken for a while on unicode strings.
- Includes a failing test + the fix.
- This function could maybe be optimized: we're now scanning the string 3 times, and once fully for chars.
* Update CHANGELOG.md
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-05-24 16:49:59 -04:00
4b7f8c2d7c
Fix CHANGELOG.md
2021-05-24 16:16:40 -04:00
bd19584580
Bump lodash from 4.17.19 to 4.17.21 in /bindings/node (#701)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.19 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.19...4.17.21)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-20 14:22:02 -04:00
8f639b42ea
Bump hosted-git-info from 2.8.8 to 2.8.9 in /bindings/node (#702)
Bumps [hosted-git-info](https://github.com/npm/hosted-git-info) from 2.8.8 to 2.8.9.
- [Release notes](https://github.com/npm/hosted-git-info/releases)
- [Changelog](https://github.com/npm/hosted-git-info/blob/v2.8.9/CHANGELOG.md)
- [Commits](https://github.com/npm/hosted-git-info/compare/v2.8.8...v2.8.9)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-20 14:21:52 -04:00
7574349223
Bump y18n from 4.0.0 to 4.0.3 in /bindings/node (#708)
Bumps [y18n](https://github.com/yargs/y18n) from 4.0.0 to 4.0.3.
- [Release notes](https://github.com/yargs/y18n/releases)
- [Changelog](https://github.com/yargs/y18n/blob/y18n-v4.0.3/CHANGELOG.md)
- [Commits](https://github.com/yargs/y18n/compare/v4.0.0...y18n-v4.0.3)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-20 14:21:40 -04:00
3cf957e6f8
Bump handlebars from 4.7.6 to 4.7.7 in /bindings/node (#700)
Bumps [handlebars](https://github.com/wycats/handlebars.js) from 4.7.6 to 4.7.7.
- [Release notes](https://github.com/wycats/handlebars.js/releases)
- [Changelog](https://github.com/handlebars-lang/handlebars.js/blob/master/release-notes.md)
- [Commits](https://github.com/wycats/handlebars.js/compare/v4.7.6...v4.7.7)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-05-20 14:21:28 -04:00
4b0dc6b947
Fix SPM conversions (#686)
* Fix SPM conversions
* Update changelog
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-05-20 09:55:55 -04:00
2e2e7558f7
Add CTC Decoder for Wav2Vec models (#693)
* Rust - add a CTCDecoder as a separate mod
* Adding bindings to Node + Python.
* Clippy update.
* Stub.
* Fixing roberta.json URLs.
* Moving test files to hf.co.
* Update cargo check and clippy to 1.52.
* Inner ':' actually is used for domains in sphinx.
Making `domain` work correctly was just too much work so I went the easy
way and have global roles for the custom rust extension.
* Update struct naming and docs
* Update changelog
Co-authored-by: Thomaub <github.thomaub@gmail.com>
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2021-05-20 09:30:09 -04:00
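For readers unfamiliar with CTC decoding, the general technique behind a CTC decoder can be sketched in a few lines of Python. This is a generic illustration, not the code added in this commit: the `ctc_decode` helper, the `<pad>` blank token, and the `|` word delimiter are assumed placeholder values.

```python
from itertools import groupby
from typing import List


def ctc_decode(tokens: List[str], pad_token: str = "<pad>", word_delimiter: str = "|") -> str:
    """Greedy CTC post-processing of a per-frame token sequence.

    Steps: collapse consecutive repeats, drop the blank/pad token,
    then turn the word delimiter into a space.
    """
    # 1. Collapse runs of identical consecutive tokens into one token.
    collapsed = [token for token, _ in groupby(tokens)]
    # 2. Remove the blank token, which only separates genuine repeats.
    kept = [token for token in collapsed if token != pad_token]
    # 3. Join characters and map the word delimiter to a space.
    return "".join(kept).replace(word_delimiter, " ").strip()
```

The blank token is what lets a genuinely doubled letter survive the collapse step: `["l", "<pad>", "l"]` decodes to `ll`, while `["l", "l"]` decodes to a single `l`.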