5886179eee
Bump decode-uri-component in /tokenizers/examples/unstable_wasm/www ( #1125 )
...
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component ) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases )
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2 )
---
updated-dependencies:
- dependency-name: decode-uri-component
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-19 14:24:24 +01:00
a408b44429
Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node ( #1126 )
...
Bumps [minimatch](https://github.com/isaacs/minimatch ) from 3.0.4 to 3.1.2.
- [Release notes](https://github.com/isaacs/minimatch/releases )
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md )
- [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2 )
---
updated-dependencies:
- dependency-name: minimatch
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-19 14:09:24 +01:00
bfa842e063
Adding stale bot ? ( #1123 )
...
* Adding stale bot ?
* Clippy.
2022-12-19 13:50:48 +01:00
1649d74536
Fixing conda ssl location ( #1124 )
...
* Fixing conda build ?
* Reduce the scope to speedup testing.
* Reduce more.
* Trying to link to conda lib.
* Trying to enable `pkg-config` on the conda env.
* Really publish.
* Update conda builds.
* Remove 3.11
* Putting releases back onto release track.
2022-12-19 13:50:36 +01:00
9a25b2cb8e
[FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. ( #1120 )
...
* [fix] Use unk_token
In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used.
* [fix] If unk_token is None, this case is also considered.
* Update bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-12-19 13:40:04 +01:00
102dfe87a3
Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node ( #1116 )
...
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component ) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases )
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2 )
---
updated-dependencies:
- dependency-name: decode-uri-component
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-05 18:09:38 +01:00
67080e163a
Include license file in Rust crate ( #1115 )
...
* Include license file in Rust crate
* Ignore security warning.
* Also for python.
* Upgrading ubuntu version.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-11-30 23:17:56 +01:00
c74e9e62f6
Bump loader-utils in /tokenizers/examples/unstable_wasm/www ( #1108 )
...
Bumps [loader-utils](https://github.com/webpack/loader-utils ) from 1.4.0 to 1.4.2.
- [Release notes](https://github.com/webpack/loader-utils/releases )
- [Changelog](https://github.com/webpack/loader-utils/blob/v1.4.2/CHANGELOG.md )
- [Commits](https://github.com/webpack/loader-utils/compare/v1.4.0...v1.4.2 )
---
updated-dependencies:
- dependency-name: loader-utils
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-16 12:01:25 +01:00
e9529cb02f
Merge pull request #1107 from huggingface/revert-1101-update_doc_pr_actions
...
Revert "Update pr docs actions"
2022-11-16 11:41:51 +01:00
ffcf5a4136
Revert "Update pr docs actions ( #1101 )"
...
This reverts commit 99c06c82e0.
2022-11-16 11:41:38 +01:00
bbae829a72
Adding rust audit. ( #1099 )
...
* Adding rust audit.
* Update clap version + derive_builder (they clashed).
* Ignoring specific CVE which can be ignored
https://github.com/Azure/iot-identity-service/issues/481
* Updating python lock.
* Revert `derive-builder` update.
* Adding back help msg.
2022-11-09 12:59:36 +01:00
99c06c82e0
Update pr docs actions ( #1101 )
2022-11-09 11:09:52 +01:00
b8a4aa6000
Fixing extra wheels memory usage. ( #1098 )
2022-11-07 09:11:18 +01:00
11bb2e00f2
Add python 3.11 to manylinux buildwheels ( #1096 )
...
* Add python 3.11 to manylinux buildwheels
* Fixing clippy.
* Node clippy.
* Python clippy.
* Changelog + version number update.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-11-07 08:45:04 +01:00
96a9e5715c
New version. ( #1082 )
...
* New version.
The actual release will happen *before* PyO3 0.17.2 because
the tests were run before then.
* Manylinux2014 necessary now with Rust 1.64.
2022-10-06 15:45:56 +02:00
4ef0afbeb6
Update old gh actions, remove deprecated doc building. ( #1069 )
2022-10-05 17:59:46 +02:00
8129dd3309
pyo3: update to 0.17 ( #1066 )
...
* python: update bindings to edition 2021
* python: update to pyo3 0.17
* Updating testing.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-10-05 16:59:01 +02:00
6113666624
Updating python formatting. ( #1079 )
...
* Updating python formatting.
* Forgot gh action.
* Skipping isort to prevent circular imports.
* Updating stub.
* Removing `isort` (it contradicts `stub.py`).
* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
5f6e978452
Fixing roberta type id (everything is zero). ( #1072 )
...
* Fixing roberta type ids (everything is zero).
* We need to fix type_ids for all sequence even when not changing
anything else.
* Fixing tests hopefully better.
2022-09-26 18:00:41 +02:00
6e5569a540
Moving versions numbers to `dev` mode. ( #1067 )
2022-09-22 18:24:07 +02:00
63082c4d11
Enabling static interpreter embedding for manylinux. ( #1064 )
...
* Removing dead file.
* Checking that we can distribute with static python embedding for
manylinux
* Many linux embed interpreter.
* Building wheels manylinux with static embedding
* Better script.
* typo.
* Using a dummy feature?
* default features ?
* Back into order.
* Fixing manylinux ??.
* Local dir.
* Missing star.
* Makedir ?
* Monkey coding this.
* extension module ?
* Building with default features `RustExtension`.
* bdist_wheel + rustextension any better ?
* update rust-py version.
* Forcing extension module.
* No default features.
* Remove py37 out of spite
* Revert "Remove py37 out of spite"
This reverts commit 6ab7facd792b59c2e30be82fe42816d24c32cf0d.
* Really extraneous feature.
* Fix build wheels.
* Putting things back in place.
2022-09-21 12:18:46 +02:00
655f4057b7
Removing python3.6 from manylinux it's not supported anymore. ( #1063 )
2022-09-19 12:22:02 +02:00
7c146d9ce5
Turns out we introduced a regression because bad code. ( #1060 )
2022-09-16 11:20:59 +02:00
7bfab48979
Preparing rc1 release. ( #1056 )
...
* Preparing rc1 release.
* Fixing test_alignment_methods
* Fixing the overflowing sequence_id issue (LayoutLMv2 tests caught this).
* Adding overly complex overflowing test.
2022-09-12 16:07:06 +02:00
06025e4ca1
Adding `Sequence` for `PostProcessor`. ( #1052 )
...
* Adding `Sequence` for `PostProcessor`.
* Fixing node? Writing in the dark here, don't have Python2.7
* `undefined` is not accepted.
* Other test.
2022-08-25 14:50:06 +02:00
37f7bae0f7
Making `process_encodings` not eat up the encodings any more. ( #1051 )
...
* Making `process_encodings` not eat up the encodings any more.
* Fixing clippy.
2022-08-25 11:49:18 +02:00
c174b5bd34
Adding m1 build to the release process for Python. ( #1055 )
...
* Adding m1 build to the release process for Python.
* typo.
2022-08-25 11:06:03 +02:00
6878ab028d
Bump node-forge and webpack-dev-server ( #1053 )
...
Bumps [node-forge](https://github.com/digitalbazaar/forge ) and [webpack-dev-server](https://github.com/webpack/webpack-dev-server ). These dependencies needed to be updated together.
Updates `node-forge` from 0.10.0 to 1.3.1
- [Release notes](https://github.com/digitalbazaar/forge/releases )
- [Changelog](https://github.com/digitalbazaar/forge/blob/main/CHANGELOG.md )
- [Commits](https://github.com/digitalbazaar/forge/compare/0.10.0...v1.3.1 )
Updates `webpack-dev-server` from 3.11.3 to 4.10.0
- [Release notes](https://github.com/webpack/webpack-dev-server/releases )
- [Changelog](https://github.com/webpack/webpack-dev-server/blob/master/CHANGELOG.md )
- [Commits](https://github.com/webpack/webpack-dev-server/compare/v3.11.3...v4.10.0 )
---
updated-dependencies:
- dependency-name: node-forge
dependency-type: indirect
- dependency-name: webpack-dev-server
dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-24 20:08:46 +02:00
460bdded80
Modify `Processor` trait to support chaining. ( #1054 )
...
0 modifications yet, everything will consume the vector.
Every test should be green without any modifications.
2022-08-24 19:49:23 +02:00
b1c9bc68b5
Updating code according to clippy. ( #1048 )
...
- Adding `Eq` where possible
- Denied the ref deref warnings as it was spamming and solution not
really better.
2022-08-24 19:45:15 +02:00
67c56adf68
Upgrade macro_rules_attribute to 0.1.2 ( #1038 )
2022-08-08 14:03:19 +02:00
67fb60a33c
Bump terser in /tokenizers/examples/unstable_wasm/www ( #1032 )
...
Bumps [terser](https://github.com/terser/terser ) from 4.8.0 to 4.8.1.
- [Release notes](https://github.com/terser/terser/releases )
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md )
- [Commits](https://github.com/terser/terser/commits )
---
updated-dependencies:
- dependency-name: terser
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-22 09:00:14 +02:00
eb2213842b
Update README.md ( #1019 )
...
* Update README.md
Add reference to normalizer blog post
* Update lib.rs
* Fixing PR + clippy on node.
* Update readme to match docstring.
* Other clippy warning.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-07-19 09:54:29 +02:00
3564f24311
Add `from_bytes` approach for creating tokenizers ( #1024 )
...
Signed-off-by: HaoboGu <haobogu@outlook.com >
2022-07-18 16:25:45 +02:00
adf90dcd72
Adding `unstable_wasm` feature + example to run `tokenizers` on wasm. ( #1009 )
...
* Adding `unstable_wasm` feature + example to run `tokenizers` on wasm.
Co-Authored-By: josephrocca <1167575+josephrocca@users.noreply.github.com >
Co-Authored-By: Matthias Brunel <matthias.brunel@mithrilsecurity.io >
* Adding some serialization tests.
* Updating with comments.
Co-authored-by: josephrocca <1167575+josephrocca@users.noreply.github.com >
Co-authored-by: Matthias Brunel <matthias.brunel@mithrilsecurity.io >
2022-06-10 14:58:02 +02:00
943b5421aa
Changing `Decoder` trait to be more composable. ( #938 ) ( #1008 )
...
* Changing `Decoder` trait to be more composable. (#938 )
* Changing `Decoder` trait to be more composable.
Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.
* Adding `Sequence` Decoder.
2022-06-02 14:43:42 +02:00
519cc13be0
Upgrade pyo3 to 0.16 ( #956 )
...
* Upgrade pyo3 to 0.15
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com >
* Upgrade pyo3 to 0.16
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com >
* Install Python before running cargo clippy
* Fix clippy warnings
* Use `PyArray_Check` instead of downcasting to `PyArray1<u8>`
* Enable `auto-initialize` of pyo3 to fix `cargo test --no-default-features`
* Fix some test cases
Why do they change?
* Refactor and add SAFETY comments to `PyArrayUnicode`
Replace deprecated `PyUnicode_FromUnicode` with `PyUnicode_FromKindAndData`
Co-authored-by: messense <messense@icloud.com >
2022-05-05 15:48:40 +02:00
6533bf0fad
Merge pull request #989 from huggingface/mishig25-patch-2
...
Update pipeline.mdx
2022-04-25 21:03:52 +02:00
00132ba836
Update pipeline.mdx
...
Fix conversion errors
2022-04-25 21:03:31 +02:00
0bd4976dba
Merge pull request #988 from huggingface/mishig25-patch-1
...
Update pipeline.mdx
2022-04-25 17:54:10 +02:00
6a84727368
Update pipeline.mdx
2022-04-25 17:50:12 +02:00
e6cd73a291
`.dev0` suffix in python version (#987 )
2022-04-22 09:36:18 +02:00
e7d9e34f9e
Merge pull request #986 from huggingface/doc_build_typo
...
Fix typo in doc-build GH workflow
2022-04-21 16:42:49 +02:00
37957f67f1
Fix typo in doc-build GH workflow
2022-04-21 16:42:04 +02:00
142d7ba381
Merge pull request #980 from huggingface/docs_new_frontend
...
Migrate docs to new frontend
2022-04-21 16:35:42 +02:00
dad9c6c0d2
Revert dev changes
2022-04-21 16:08:22 +02:00
95b5d066d5
Update doc build gh workflow to install rust
2022-04-21 09:20:20 +02:00
c2aa87a256
Add `setup.py` `extras["dev"]`
2022-04-19 15:14:44 +02:00
5c97125d22
Fix hashlink ids
2022-04-18 12:13:40 +02:00
f6ba840e3e
Add @property docs
2022-04-18 11:58:52 +02:00