93b37f36dc
styling
2023-09-04 20:54:55 +00:00
058e34b421
make special editable as well
2023-09-04 20:54:29 +00:00
2291c89896
python stub.py
2023-09-04 19:49:36 +00:00
b235f85527
clippy
2023-09-04 19:31:48 +00:00
9aab096da8
fmt
2023-09-04 19:31:05 +00:00
a59bb76aa1
update and todo
2023-09-04 19:21:38 +00:00
c599db1421
nits
2023-09-04 19:11:19 +00:00
d4008b0d7a
clippy
2023-09-04 19:11:05 +00:00
b117ac7f16
updates
2023-09-04 19:10:22 +00:00
a53dff9bc5
make content writable in python
2023-09-04 18:18:21 +00:00
d9829cdc6e
fix more tests
2023-09-04 17:22:27 +00:00
39bd27e673
fix build
2023-09-01 21:22:07 +00:00
9f0c703f03
update init and src for python bindings
2023-09-01 21:07:01 +00:00
587748ab09
clean derive partial eq
2023-09-01 20:50:34 +00:00
fdef4a118b
fmt
2023-09-01 20:48:47 +00:00
d1566a9ecc
update, // AddedTokens can be updated if value changed
2023-09-01 20:48:36 +00:00
399c6fe852
fix and update tests
2023-09-01 20:40:06 +00:00
2b72017e17
correctly compute the new id: we take the max of the AddedToken + get_vocab_size
2023-09-01 19:03:33 +00:00
db319492f7
clippy
2023-09-01 18:57:39 +00:00
2dca476810
fix some tests
2023-09-01 18:48:50 +00:00
6cca5716af
fix one test?
2023-09-01 18:42:30 +00:00
345b4eba96
updates
2023-09-01 18:41:36 +00:00
8e522a38d9
Updating the docs with the new command. ( #1333 )
2023-08-29 13:15:26 +02:00
d2010d5165
Move to maturin mimicking move for `safetensors`. + Rewritten node bindings. ( #1331 )
...
* Move to maturin mimicking move for `safetensors`.
* Tmp.
* Fix sdist.
* Wat?
* Clippy 1.72
* Remove if.
* Conda sed.
* Fix doc check workflow.
* Moving to maturin AND removing http + openssl mess (smoothing transition
moving to `huggingface_hub`)
* Fix dep
* Black.
* New node bindings.
* Fix docs + node cache ?
* Yarn.
* Working dir.
* Extension module.
* Put back interpreter.
* Remove cache.
* New attempt
* Multi python.
* Remove FromPretrained.
* Remove traces of `fromPretrained`.
* Drop 3.12 for windows?
* Typo.
* Put back the default feature for ignoring links during simple test.
* Fix ?
* x86_64 -> x64.
* Remove warning for windows bindings.
* Exclude aarch.
* Include/exclude.
* Put back workflows in correct states.
2023-08-28 16:24:14 +02:00
f2952020d5
Python 38 arm ( #1330 )
2023-08-23 16:29:16 +02:00
f08058ab2b
Reduce number of different revisions by 1 ( #1329 )
2023-08-23 15:57:36 +02:00
6c350d88fe
Re-using scripts from safetensors. ( #1328 )
2023-08-23 15:37:38 +02:00
d0bb35d5a6
Merge pull request #1316 from boyleconnor/add-expect-for-no-truncation
...
Add `expect()` for disabling truncation
2023-08-18 19:30:53 +02:00
540bf2eb01
pyo3: update to 0.19 ( #1322 )
...
* Bump pyo3 dependency versions
* Fix deprecation warnings from pyo3
---------
Co-authored-by: Mike Lui <mikelui@meta.com>
2023-08-16 18:40:32 +02:00
9a93c50c25
Fix stride condition. ( #1321 )
...
* Release all at once for simplicity.
* rc2
2023-08-14 15:27:55 +02:00
b35d33f981
Release all at once for simplicity. ( #1320 )
2023-08-14 13:49:45 +02:00
fb292d1eae
0.13.4.rc1 ( #1319 )
2023-08-14 12:06:43 +02:00
862046ac94
CD backports ( #1318 )
...
* CD backports
follow huggingface/safetensors#317
* fix node bindings?
`cargo check` doesn't work on my local configuration from `tokenizers/bindings/node/native`
i don't think it will be a problem but i have difficulty telling
* backport #315
* safetensors#317 back ports
2023-08-10 18:52:22 +02:00
748556a9ed
Fix code style
2023-08-07 15:17:43 -07:00
d47d3e377c
Derive clone for TrainerWrapper ( #1317 )
2023-08-07 15:15:10 +02:00
a0a8ebe03f
Add `expect()` for disabling truncation
2023-08-06 13:25:50 -07:00
efea6c7246
Handle when precompiled charsmap is empty ( #1308 )
...
* Handle when precompiled charsmap is empty
* Black
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-07-31 14:35:24 +02:00
c2664ae13f
Give error when initializing tokenizer with too high stride ( #1306 )
...
* Split `get_n_added_tokens` into separate method
* Modify `TokenizerImpl.with_truncation()` to raise an error if given bad parameters
* Return Python error if `tokenizer.with_truncation()` fails
* Add dummy variable assignment for `no_truncation()` case
* Unrelated fmt fix.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-07-28 09:16:44 +02:00
bb38f390a6
Single warning for holes. ( #1303 )
...
* Single warning for holes.
* Dummy.
2023-07-25 12:57:23 +02:00
d6326b2b88
feat: Added CITATION.cff. ( #1302 )
2023-07-25 12:16:09 +02:00
ea4d3f634c
Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node ( #1299 )
...
Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4)
---
updated-dependencies:
- dependency-name: word-wrap
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-21 08:08:10 +02:00
291b2e23ae
Fixing clippy warnings on 1.71. ( #1296 )
...
* Fixing clippy warnings on 1.71.
* Fix.
* Fmt.
* Python clippy.
* Should really set my env back again.
* Fix.
2023-07-16 15:58:38 +02:00
4811f769a1
import Tuple from typing ( #1295 )
2023-07-14 17:39:29 +02:00
150559b61e
master -> main ( #1292 )
2023-07-12 11:51:22 +02:00
92bfb9c993
Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node ( #1291 )
...
Bumps [tough-cookie](https://github.com/salesforce/tough-cookie) from 4.0.0 to 4.1.3.
- [Release notes](https://github.com/salesforce/tough-cookie/releases)
- [Changelog](https://github.com/salesforce/tough-cookie/blob/master/CHANGELOG.md)
- [Commits](https://github.com/salesforce/tough-cookie/compare/v4.0.0...v4.1.3)
---
updated-dependencies:
- dependency-name: tough-cookie
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-10 09:44:31 +02:00
26659de473
revise type specification ( #1289 )
2023-07-06 16:36:48 +02:00
864135bef1
Add unigram bytefallback ( #1217 )
...
* current updates will go red
* cargo fmt
* npm install
* refactor train for unigram to allow bytefallback (breaking)
* fmt
* nits
* update
* add a proper test
* fix encode optimised fallback + add trainer arg
* fixes
* fixes
* fix tests
* add test
* fmt
* fix rust test
* update python bindings
* update
* pub is okay and needed
* more fix
* cleanup
* remove useless id
* MissingUnkId error
* nits
* fix offset
* add a test in python
* update src bindings
* remove bytefallback from trainer
* styling
* update pckg
* lint
* fmt
* stup with dev
* update code based on review
* remove unused function
* update python test to compare ids
* fix option bool issues
* final fix
* clippy
* fix npm install
* update
* update test
* more in depth testing
* Lint
* last attempt to fix node
* update node bindings
* fmt
* Update tokenizers/src/models/unigram/model.rs
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* update based on review
* simpler test
* lint
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-06-26 10:46:59 +02:00
8c9cfb0b68
Improve error for truncation with too high stride ( #1275 )
2023-06-12 10:38:42 +02:00
348ed70e58
[doc build] Use secrets ( #1273 )
2023-06-09 12:58:27 +02:00
5d70f15bfb
Update README.md - Broken link ( #1272 )
...
* Update README.md - Broken link
fixed "python documentation" link
* Update README.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-06-08 10:20:11 +02:00