Commit Graph

1825 Commits

Author SHA1 Message Date
5a94a2b6e7 Add missing build targets (#1145)
* M1 3.11 was not out, and neither was Windows amd64.

* python@v4.

* Actually upload.

* Update needs.

* Preparing the actual PR.
2023-01-15 10:18:08 +01:00
fe4ae7dc38 Bump json5 from 2.2.0 to 2.2.3 in /bindings/node (#1140)
Bumps [json5](https://github.com/json5/json5) from 2.2.0 to 2.2.3.
- [Release notes](https://github.com/json5/json5/releases)
- [Changelog](https://github.com/json5/json5/blob/main/CHANGELOG.md)
- [Commits](https://github.com/json5/json5/compare/v2.2.0...v2.2.3)

---
updated-dependencies:
- dependency-name: json5
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-03 11:50:51 +01:00
c3fedd96b3 Bump json5, copy-webpack-plugin, webpack and webpack-cli (#1139)
Removes [json5](https://github.com/json5/json5). It's no longer used after updating ancestor dependencies [json5](https://github.com/json5/json5), [copy-webpack-plugin](https://github.com/webpack-contrib/copy-webpack-plugin), [webpack](https://github.com/webpack/webpack) and [webpack-cli](https://github.com/webpack/webpack-cli). These dependencies need to be updated together.


Removes `json5`

Updates `copy-webpack-plugin` from 5.1.2 to 11.0.0
- [Release notes](https://github.com/webpack-contrib/copy-webpack-plugin/releases)
- [Changelog](https://github.com/webpack-contrib/copy-webpack-plugin/blob/master/CHANGELOG.md)
- [Commits](https://github.com/webpack-contrib/copy-webpack-plugin/compare/v5.1.2...v11.0.0)

Updates `webpack` from 4.46.0 to 5.75.0
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](https://github.com/webpack/webpack/compare/v4.46.0...v5.75.0)

Updates `webpack-cli` from 3.3.12 to 5.0.1
- [Release notes](https://github.com/webpack/webpack-cli/releases)
- [Changelog](https://github.com/webpack/webpack-cli/blob/master/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-cli/compare/v3.3.12...webpack-cli@5.0.1)

---
updated-dependencies:
- dependency-name: json5
  dependency-type: indirect
- dependency-name: copy-webpack-plugin
  dependency-type: direct:development
- dependency-name: webpack
  dependency-type: direct:development
- dependency-name: webpack-cli
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-03 10:22:49 +01:00
9b155b5723 [FIX] In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used. (#1136)
* [fix] Use unk_token

In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used.

* [fix] If unk_token is None, this case is also considered.

* Update bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* [FIX] In CharBPETokenizer, Use unk_token.

In CharBPETokenizer, when Vocab or merges is None, unk_token cannot be used.

* Update bindings/python/py_src/tokenizers/implementations/char_level_bpe.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update bindings/python/py_src/tokenizers/implementations/char_level_bpe.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-12-27 11:13:52 +01:00
60a00dda44 Fix one char super tiny typo (#1137)
* Update pipeline.mdx

* Update pipeline.rst
2022-12-26 11:13:38 +01:00
4d520c9664 Ignore Cargo.lock for subfolders (#1131) 2022-12-25 11:35:47 +01:00
fbad581128 Bump derive_builder from 0.9 to 0.12 (#1129) 2022-12-23 23:37:16 +01:00
2bed678958 Fix broken links in docs (#1133) 2022-12-23 23:35:18 +01:00
3e7476de86 Wrap rustdoc html entity in code block (#1130) 2022-12-23 23:30:45 +01:00
03ce27d2fa Bump cached-path from 0.5 to 0.6 (#1127) 2022-12-21 18:10:48 +01:00
5886179eee Bump decode-uri-component in /tokenizers/examples/unstable_wasm/www (#1125)
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases)
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2)

---
updated-dependencies:
- dependency-name: decode-uri-component
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-19 14:24:24 +01:00
a408b44429 Bump minimatch from 3.0.4 to 3.1.2 in /bindings/node (#1126)
Bumps [minimatch](https://github.com/isaacs/minimatch) from 3.0.4 to 3.1.2.
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.1.2)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-19 14:09:24 +01:00
bfa842e063 Adding stale bot ? (#1123)
* Adding stale bot ?

* Clippy.
2022-12-19 13:50:48 +01:00
1649d74536 Fixing conda ssl location (#1124)
* Fixing conda build ?

* Reduce the scope to speedup testing.

* Reduce more.

* Trying to link to conda lib.

* Trying to enable `pkg-config` on the conda env.

* Really publish.

* Update conda builds.

* Remove 3.11

* Putting releases back onto release track.
2022-12-19 13:50:36 +01:00
9a25b2cb8e [FIX] In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used. (#1120)
* [fix] Use unk_token

In SentencePieceBPETokenizer, when Vocab or merges is None, unk_token cannot be used.

* [fix] If unk_token is None, this case is also considered.

* Update bindings/python/py_src/tokenizers/implementations/sentencepiece_bpe.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-12-19 13:40:04 +01:00
102dfe87a3 Bump decode-uri-component from 0.2.0 to 0.2.2 in /bindings/node (#1116)
Bumps [decode-uri-component](https://github.com/SamVerschueren/decode-uri-component) from 0.2.0 to 0.2.2.
- [Release notes](https://github.com/SamVerschueren/decode-uri-component/releases)
- [Commits](https://github.com/SamVerschueren/decode-uri-component/compare/v0.2.0...v0.2.2)

---
updated-dependencies:
- dependency-name: decode-uri-component
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-05 18:09:38 +01:00
67080e163a Include license file in Rust crate (#1115)
* Include license file in Rust crate

* Ignore security warning.

* Also for python.

* Upgrading ubuntu version.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-11-30 23:17:56 +01:00
c74e9e62f6 Bump loader-utils in /tokenizers/examples/unstable_wasm/www (#1108)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 1.4.0 to 1.4.2.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v1.4.2/CHANGELOG.md)
- [Commits](https://github.com/webpack/loader-utils/compare/v1.4.0...v1.4.2)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-16 12:01:25 +01:00
e9529cb02f Merge pull request #1107 from huggingface/revert-1101-update_doc_pr_actions
Revert "Update pr docs actions"
2022-11-16 11:41:51 +01:00
ffcf5a4136 Revert "Update pr docs actions (#1101)"
This reverts commit 99c06c82e0.
2022-11-16 11:41:38 +01:00
bbae829a72 Adding rust audit. (#1099)
* Adding rust audit.

* Update clap version + derive_builder (they clashed).

* Ignoring specific CVE which can be ignored

https://github.com/Azure/iot-identity-service/issues/481

* Updating python lock.

* Revert `derive-builder` update.

* Adding back help msg.
2022-11-09 12:59:36 +01:00
99c06c82e0 Update pr docs actions (#1101) 2022-11-09 11:09:52 +01:00
b8a4aa6000 Fixing extra wheels memory usage. (#1098) 2022-11-07 09:11:18 +01:00
11bb2e00f2 Add python 3.11 to manylinux buildwheels (#1096)
* Add python 3.11 to manylinux buildwheels

* Fixing clippy.

* Node clippy.

* Python clippy.

* Changelog + version number update.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-11-07 08:45:04 +01:00
96a9e5715c New version. (#1082)
* New version.

The actual release will happen *before* PyO3 0.17.2 because
the tests were run before then.

* Manylinux2014 necessary now with Rust 1.64.
2022-10-06 15:45:56 +02:00
4ef0afbeb6 Update old gh actions, remove deprecated doc building. (#1069) 2022-10-05 17:59:46 +02:00
8129dd3309 pyo3: update to 0.17 (#1066)
* python: update bindings to edition 2021

* python: update to pyo3 0.17

* Updating testing.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-10-05 16:59:01 +02:00
6113666624 Updating python formatting. (#1079)
* Updating python formatting.

* Forgot gh action.

* Skipping isort to prevent circular imports.

* Updating stub.

* Removing `isort` (it contradicts `stub.py`).

* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
5f6e978452 Fixing roberta type id (everything is zero). (#1072)
* Fixing roberta type ids (everything is zero).

* We need to fix type_ids for all sequences even when not changing anything else.

* Fixing tests hopefully better.
2022-09-26 18:00:41 +02:00
6e5569a540 Moving version numbers to dev mode. (#1067) 2022-09-22 18:24:07 +02:00
63082c4d11 Enabling static interpreter embedding for manylinux. (#1064)
* Removing dead file.

* Checking that we can distribute with static python embedding for manylinux

* Many linux embed interpreter.

* Building wheels manylinux with static embedding

* Better script.

* typo.

* Using a dummy feature?

* default features ?

* Back into order.

* Fixing manylinux ??.

* Local dir.

* Missing star.

* Makedir ?

* Monkey coding this.

* extension module ?

* Building with default features `RustExtension`.

* bdist_wheel + rustextension any better ?

* update rust-py version.

* Forcing extension module.

* No default features.

* Remove py37 out of spite

* Revert "Remove py37 out of spite"

This reverts commit 6ab7facd792b59c2e30be82fe42816d24c32cf0d.

* Really extraneous feature.

* Fix build wheels.

* Putting things back in place.
2022-09-21 12:18:46 +02:00
655f4057b7 Removing python3.6 from manylinux, as it's not supported anymore. (#1063) 2022-09-19 12:22:02 +02:00
7c146d9ce5 Turns out we introduced a regression because of bad code. (#1060) 2022-09-16 11:20:59 +02:00
7bfab48979 Preparing rc1 release. (#1056)
* Preparing rc1 release.

* Fixing test_alignment_methods

* Fixing the overflowing sequence_id issue (LayoutLMv2 tests caught this).

* Adding overly complex overflowing test.
2022-09-12 16:07:06 +02:00
06025e4ca1 Adding Sequence for PostProcessor. (#1052)
* Adding `Sequence` for `PostProcessor`.

* Fixing node? Writing in the dark here, don't have Python2.7

* `undefined` is not accepted.

* Other test.
2022-08-25 14:50:06 +02:00
37f7bae0f7 Making process_encodings not eat up the encodings any more. (#1051)
* Making `process_encodings` not eat up the encodings any more.

* Fixing clippy.
2022-08-25 11:49:18 +02:00
c174b5bd34 Adding m1 build to the release process for Python. (#1055)
* Adding m1 build to the release process for Python.

* typo.
2022-08-25 11:06:03 +02:00
6878ab028d Bump node-forge and webpack-dev-server (#1053)
Bumps [node-forge](https://github.com/digitalbazaar/forge) and [webpack-dev-server](https://github.com/webpack/webpack-dev-server). These dependencies needed to be updated together.

Updates `node-forge` from 0.10.0 to 1.3.1
- [Release notes](https://github.com/digitalbazaar/forge/releases)
- [Changelog](https://github.com/digitalbazaar/forge/blob/main/CHANGELOG.md)
- [Commits](https://github.com/digitalbazaar/forge/compare/0.10.0...v1.3.1)

Updates `webpack-dev-server` from 3.11.3 to 4.10.0
- [Release notes](https://github.com/webpack/webpack-dev-server/releases)
- [Changelog](https://github.com/webpack/webpack-dev-server/blob/master/CHANGELOG.md)
- [Commits](https://github.com/webpack/webpack-dev-server/compare/v3.11.3...v4.10.0)

---
updated-dependencies:
- dependency-name: node-forge
  dependency-type: indirect
- dependency-name: webpack-dev-server
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-24 20:08:46 +02:00
460bdded80 Modify Processor trait to support chaining. (#1054)
0 modifications yet, everything will consume the vector.
Every test should be green without any modifications.
2022-08-24 19:49:23 +02:00
b1c9bc68b5 Updating code according to clippy. (#1048)
- Adding `Eq` where possible
- Denied the ref deref warnings, as they were spamming and the suggested
  solution was not really better.
2022-08-24 19:45:15 +02:00
67c56adf68 Upgrade macro_rules_attribute to 0.1.2 (#1038) 2022-08-08 14:03:19 +02:00
67fb60a33c Bump terser in /tokenizers/examples/unstable_wasm/www (#1032)
Bumps [terser](https://github.com/terser/terser) from 4.8.0 to 4.8.1.
- [Release notes](https://github.com/terser/terser/releases)
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md)
- [Commits](https://github.com/terser/terser/commits)

---
updated-dependencies:
- dependency-name: terser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-22 09:00:14 +02:00
eb2213842b Update README.md (#1019)
* Update README.md

Add reference to normalizer blog post

* Update lib.rs

* Fixing PR + clippy on node.

* Update readme to match docstring.

* Other clippy warning.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-07-19 09:54:29 +02:00
3564f24311 Add from_bytes approach for creating tokenizers (#1024)
Signed-off-by: HaoboGu <haobogu@outlook.com>
2022-07-18 16:25:45 +02:00
adf90dcd72 Adding unstable_wasm feature + example to run tokenizers on wasm. (#1009)
* Adding `unstable_wasm` feature + example to run `tokenizers` on wasm.

Co-Authored-By: josephrocca <1167575+josephrocca@users.noreply.github.com>
Co-Authored-By: Matthias Brunel <matthias.brunel@mithrilsecurity.io>

* Adding some serialization tests.

* Updating with comments.

Co-authored-by: josephrocca <1167575+josephrocca@users.noreply.github.com>
Co-authored-by: Matthias Brunel <matthias.brunel@mithrilsecurity.io>
2022-06-10 14:58:02 +02:00
943b5421aa Changing Decoder trait to be more composable. (#938) (#1008)
* Changing `Decoder` trait to be more composable. (#938)

* Changing `Decoder` trait to be more composable.

Fix #872

* Fixing Python side.

* Fixing test.

* Updating cleanup signature, removing turbofish.

* Adding `Sequence` Decoder.
2022-06-02 14:43:42 +02:00
519cc13be0 Upgrade pyo3 to 0.16 (#956)
* Upgrade pyo3 to 0.15

Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>

* Upgrade pyo3 to 0.16

Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>

* Install Python before running cargo clippy

* Fix clippy warnings

* Use `PyArray_Check` instead of downcasting to `PyArray1<u8>`

* Enable `auto-initialize` of pyo3 to fix `cargo test --no-default-features`

* Fix some test cases

Why do they change?

* Refactor and add SAFETY comments to `PyArrayUnicode`

Replace deprecated `PyUnicode_FromUnicode` with `PyUnicode_FromKindAndData`

Co-authored-by: messense <messense@icloud.com>
2022-05-05 15:48:40 +02:00
6533bf0fad Merge pull request #989 from huggingface/mishig25-patch-2
Update pipeline.mdx
2022-04-25 21:03:52 +02:00
00132ba836 Update pipeline.mdx
Fix conversion errors
2022-04-25 21:03:31 +02:00
0bd4976dba Merge pull request #988 from huggingface/mishig25-patch-1
Update pipeline.mdx
2022-04-25 17:54:10 +02:00