b8a4aa6000
Fixing extra wheels memory usage. ( #1098 )
2022-11-07 09:11:18 +01:00
11bb2e00f2
Add python 3.11 to manylinux buildwheels ( #1096 )
* Add python 3.11 to manylinux buildwheels
* Fixing clippy.
* Node clippy.
* Python clippy.
* Changelog + version number update.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-11-07 08:45:04 +01:00
96a9e5715c
New version. ( #1082 )
* New version.
The actual release will happen *before* PyO3 0.17.2 because
the tests were run before then.
* Manylinux2014 necessary now with Rust 1.64.
2022-10-06 15:45:56 +02:00
4ef0afbeb6
Update old gh actions, remove deprecated doc building. ( #1069 )
2022-10-05 17:59:46 +02:00
8129dd3309
pyo3: update to 0.17 ( #1066 )
* python: update bindings to edition 2021
* python: update to pyo3 0.17
* Updating testing.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-10-05 16:59:01 +02:00
6113666624
Updating python formatting. ( #1079 )
* Updating python formatting.
* Forgot gh action.
* Skipping isort to prevent circular imports.
* Updating stub.
* Removing `isort` (it contradicts `stub.py`).
* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
5f6e978452
Fixing roberta type id (everything is zero). ( #1072 )
* Fixing roberta type ids (everything is zero).
* We need to fix type_ids for all sequences even when not changing
anything else.
* Fixing tests hopefully better.
2022-09-26 18:00:41 +02:00
6e5569a540
Moving versions numbers to `dev` mode. ( #1067 )
2022-09-22 18:24:07 +02:00
63082c4d11
Enabling static interpreter embedding for manylinux. ( #1064 )
* Removing dead file.
* Checking that we can distribute with static python embedding for
manylinux
* Many linux embed interpreter.
* Building wheels manylinux with static embedding
* Better script.
* typo.
* Using a dummy feature?
* default features ?
* Back into order.
* Fixing manylinux ??.
* Local dir.
* Missing star.
* Makedir ?
* Monkey coding this.
* extension module ?
* Building with default features `RustExtension`.
* bdist_wheel + rustextension any better ?
* update rust-py version.
* Forcing extension module.
* No default features.
* Remove py37 out of spite
* Revert "Remove py37 out of spite"
This reverts commit 6ab7facd792b59c2e30be82fe42816d24c32cf0d.
* Really extraneous feature.
* Fix build wheels.
* Putting things back in place.
2022-09-21 12:18:46 +02:00
655f4057b7
Removing python3.6 from manylinux it's not supported anymore. ( #1063 )
2022-09-19 12:22:02 +02:00
7c146d9ce5
Turns out we introduced a regression because bad code. ( #1060 )
2022-09-16 11:20:59 +02:00
7bfab48979
Preparing rc1 release. ( #1056 )
* Preparing rc1 release.
* Fixing test_alignment_methods
* Fixing the overflowing sequence_id issue (LayoutLMv2 tests caught this).
* Adding overly complex overflowing test.
2022-09-12 16:07:06 +02:00
06025e4ca1
Adding `Sequence` for `PostProcessor`. ( #1052 )
* Adding `Sequence` for `PostProcessor`.
* Fixing node? Writing in the dark here, don't have Python2.7
* `undefined` is not accepted.
* Other test.
2022-08-25 14:50:06 +02:00
37f7bae0f7
Making `process_encodings` not eat up the encodings any more. ( #1051 )
* Making `process_encodings` not eat up the encodings any more.
* Fixing clippy.
2022-08-25 11:49:18 +02:00
c174b5bd34
Adding m1 build to the release process for Python. ( #1055 )
* Adding m1 build to the release process for Python.
* typo.
2022-08-25 11:06:03 +02:00
6878ab028d
Bump node-forge and webpack-dev-server ( #1053 )
Bumps [node-forge](https://github.com/digitalbazaar/forge ) and [webpack-dev-server](https://github.com/webpack/webpack-dev-server ). These dependencies needed to be updated together.
Updates `node-forge` from 0.10.0 to 1.3.1
- [Release notes](https://github.com/digitalbazaar/forge/releases )
- [Changelog](https://github.com/digitalbazaar/forge/blob/main/CHANGELOG.md )
- [Commits](https://github.com/digitalbazaar/forge/compare/0.10.0...v1.3.1 )
Updates `webpack-dev-server` from 3.11.3 to 4.10.0
- [Release notes](https://github.com/webpack/webpack-dev-server/releases )
- [Changelog](https://github.com/webpack/webpack-dev-server/blob/master/CHANGELOG.md )
- [Commits](https://github.com/webpack/webpack-dev-server/compare/v3.11.3...v4.10.0 )
---
updated-dependencies:
- dependency-name: node-forge
dependency-type: indirect
- dependency-name: webpack-dev-server
dependency-type: direct:development
...
Signed-off-by: dependabot[bot] <support@github.com >
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-24 20:08:46 +02:00
460bdded80
Modify `Processor` trait to support chaining. ( #1054 )
0 modifications yet, everything will consume the vector.
Every test should be green without any modifications.
2022-08-24 19:49:23 +02:00
b1c9bc68b5
Updating code according to clippy. ( #1048 )
- Adding `Eq` where possible
- Denied the ref deref warnings as it was spamming and solution not
really better.
2022-08-24 19:45:15 +02:00
67c56adf68
Upgrade macro_rules_attribute to 0.1.2 ( #1038 )
2022-08-08 14:03:19 +02:00
67fb60a33c
Bump terser in /tokenizers/examples/unstable_wasm/www ( #1032 )
Bumps [terser](https://github.com/terser/terser ) from 4.8.0 to 4.8.1.
- [Release notes](https://github.com/terser/terser/releases )
- [Changelog](https://github.com/terser/terser/blob/master/CHANGELOG.md )
- [Commits](https://github.com/terser/terser/commits )
---
updated-dependencies:
- dependency-name: terser
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-22 09:00:14 +02:00
eb2213842b
Update README.md ( #1019 )
* Update README.md
Add reference to normalizer blog post
* Update lib.rs
* Fixing PR + clippy on node.
* Update readme to match docstring.
* Other clippy warning.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2022-07-19 09:54:29 +02:00
3564f24311
Add `from_bytes` approach for creating tokenizers ( #1024 )
Signed-off-by: HaoboGu <haobogu@outlook.com >
2022-07-18 16:25:45 +02:00
adf90dcd72
Adding `unstable_wasm` feature + example to run `tokenizers` on wasm. ( #1009 )
* Adding `unstable_wasm` feature + example to run `tokenizers` on wasm.
Co-Authored-By: josephrocca <1167575+josephrocca@users.noreply.github.com >
Co-Authored-By: Matthias Brunel <matthias.brunel@mithrilsecurity.io >
* Adding some serialization tests.
* Updating with comments.
Co-authored-by: josephrocca <1167575+josephrocca@users.noreply.github.com >
Co-authored-by: Matthias Brunel <matthias.brunel@mithrilsecurity.io >
2022-06-10 14:58:02 +02:00
943b5421aa
Changing `Decoder` trait to be more composable. ( #938 ) ( #1008 )
* Changing `Decoder` trait to be more composable. (#938 )
* Changing `Decoder` trait to be more composable.
Fix #872
* Fixing Python side.
* Fixing test.
* Updating cleanup signature, removing turbofish.
* Adding `Sequence` Decoder.
2022-06-02 14:43:42 +02:00
519cc13be0
Upgrade pyo3 to 0.16 ( #956 )
* Upgrade pyo3 to 0.15
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com >
* Upgrade pyo3 to 0.16
Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com >
* Install Python before running cargo clippy
* Fix clippy warnings
* Use `PyArray_Check` instead of downcasting to `PyArray1<u8>`
* Enable `auto-initialize` of pyo3 to fix `cargo test --no-default-features`
* Fix some test cases
Why do they change?
* Refactor and add SAFETY comments to `PyArrayUnicode`
Replace deprecated `PyUnicode_FromUnicode` with `PyUnicode_FromKindAndData`
Co-authored-by: messense <messense@icloud.com >
2022-05-05 15:48:40 +02:00
6533bf0fad
Merge pull request #989 from huggingface/mishig25-patch-2
Update pipeline.mdx
2022-04-25 21:03:52 +02:00
00132ba836
Update pipeline.mdx
Fix conversion errors
2022-04-25 21:03:31 +02:00
0bd4976dba
Merge pull request #988 from huggingface/mishig25-patch-1
Update pipeline.mdx
2022-04-25 17:54:10 +02:00
6a84727368
Update pipeline.mdx
2022-04-25 17:50:12 +02:00
e6cd73a291
`.dev0` suffix in python version (#987 )
2022-04-22 09:36:18 +02:00
e7d9e34f9e
Merge pull request #986 from huggingface/doc_build_typo
Fix typo in doc-build GH workflow
2022-04-21 16:42:49 +02:00
37957f67f1
Fix typo in doc-build GH workflow
2022-04-21 16:42:04 +02:00
142d7ba381
Merge pull request #980 from huggingface/docs_new_frontend
Migrate docs to new frontend
2022-04-21 16:35:42 +02:00
dad9c6c0d2
Revert dev changes
2022-04-21 16:08:22 +02:00
95b5d066d5
Update doc build gh workflow to install rust
2022-04-21 09:20:20 +02:00
c2aa87a256
Add `setup.py` `extras["dev"]`
2022-04-19 15:14:44 +02:00
5c97125d22
Fix hashlink ids
2022-04-18 12:13:40 +02:00
f6ba840e3e
Add @property docs
2022-04-18 11:58:52 +02:00
fd005a7c4e
Add doc-builder gh workflows
2022-04-18 09:50:31 +02:00
6eda286ab1
Init new docs
2022-04-18 09:37:14 +02:00
66c9af26f6
Fixing the documentation for `ByteLevel` in Python ( #982 )
* Fixing the documentation for `ByteLevel` in Python
* Python stub.py (after rebuilding ofc).
2022-04-14 16:29:50 +02:00
8a9bb28f46
Preparing for 0.12.1 ( #978 )
* Preparing for 0.12.1
* Updated the changelog.
2022-04-12 17:57:33 +02:00
4a9da798e2
Adding a new document that is the checklist to make ( #975 )
* Adding a new document that is the checklist to make
a new `tokenizers` release.
This will help make sure nothing is forgotten.
* Update RELEASE.md
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com >
* Update RELEASE.md
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com >
* Update RELEASE.md
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com >
* Update RELEASE.md
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com >
* Adding instructions for running the full test suite.
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com >
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com >
2022-04-12 14:18:09 +02:00
ec43947786
Revert "Changing `Decoder` trait to be more composable. ( #938 )" ( #971 )
This reverts commit cdabef14c4.
2022-04-04 09:43:28 +02:00
23a22da18c
Update the builder to use an earlier Windows version (2022 is not understood). ( #969 )
* Update the builder to use an earlier Windows version (2022 is not
understood).
* No node for windows.
* Ready to deploy.
2022-03-31 15:00:11 +02:00
0eb7455fe5
Preparing `0.12` release. ( #967 )
* Preparing `0.12` release.
* Fix click version: https://github.com/psf/black/issues/2964
2022-03-31 11:06:33 +02:00
28cd3dce2a
Bump minimist from 1.2.5 to 1.2.6 in /bindings/node ( #966 )
Bumps [minimist](https://github.com/substack/minimist ) from 1.2.5 to 1.2.6.
- [Release notes](https://github.com/substack/minimist/releases )
- [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6 )
---
updated-dependencies:
- dependency-name: minimist
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 09:52:43 +02:00
20ec60aeba
Adding a link to the Ruby port of `tokenizers` ( #961 )
2022-03-24 17:09:30 +01:00
28fe0e40e7
Preventing yelling on empty OrderedVocab (triggered by pickle.dumps). ( #963 )
2022-03-24 17:09:18 +01:00
a5f644616b
Fix the error test for Python 3.10 (error message is different). ( #962 )
2022-03-23 10:35:58 +01:00