40 Commits

Author SHA1 Message Date
tinyboxvk
bdfc38b78d Fix typos (#1715)
* Fix typos

Signed-off-by: tinyboxvk <13696594+tinyboxvk@users.noreply.github.com>

* Update docs/source/quicktour.rst

* Update docs/source-doc-builder/quicktour.mdx

---------

Signed-off-by: tinyboxvk <13696594+tinyboxvk@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-01-09 11:53:20 +01:00
Arthur
3d51a1695f Fix documentation build (#1642)
* use v4

* fix ruff

* style
2024-10-01 14:48:02 +02:00
Arthur
29fef1e7aa [remove black] And use ruff (#1436)
* nits

* Fixing deps.

* Ruff update.

* Import order matters.

* Fix.

* Revert ruff fix.

* Visualizer.

* Putting back the imports.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-03-12 11:24:21 +01:00
Quentin Lhoest
e76f900bc0 Faster datasets train example
Using .iter() is much faster than accessing using row ids
2023-03-23 11:24:30 +01:00
mert-kurttutan
5c18ec5ff5 pyo3 v0.18 migration (#1173)
* pyo v0.18 migration

* Fix formatting issues of black
2023-03-08 11:27:47 +01:00
Nicolas Patry
6113666624 Updating python formatting. (#1079)
* Updating python formatting.

* Forgot gh action.

* Skipping isort to prevent circular imports.

* Updating stub.

* Removing `isort` (it contradicts `stub.py`).

* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
Tal Perry
8916b6bb27 Add a visualization utility to render tokens and annotations in a notebook (#508)
* Draft functionality of visualization

* Added comments to make code more intelligible

* polish the styles

* Ensure colors are stable and comment the css

* Code clean up

* Made visualizer importable and added some docs

* Fix styling

* implement comments from PR

* Fixed the regex for UNK tokens and examples in notebook

* Converted docs to google format

* Added a notebook showing multiple languages and tokenizers

* Added visual indication of chars that are tokenized with >1 token

* Reorganize things a bit and fix import

* Update docs

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-12-04 10:25:56 -05:00
Anthony MOI
3a8627ce4d Improve docs and fix tests around training
2020-11-28 12:29:35 -05:00
Anthony MOI
ed9baeabb7 Add example for training with datasets
2020-11-28 12:29:35 -05:00
Anthony MOI
a0a163fd62 Remove unwanted file
2020-09-24 14:05:47 -04:00
Anthony MOI
171a042ee0 Python - Bump version for dev4 release
2020-09-24 10:16:18 -04:00
Nicolas Patry
a410903051 Upgrading to black 20.8b1
2020-09-24 09:27:30 -04:00
Nicolas Patry
9672995a56 We use 19.10b0 not 20 here...
2020-09-24 08:57:02 +02:00
Nicolas Patry
8f8156fd2c Addressing first pass of comments.
2020-09-24 08:57:02 +02:00
Anthony MOI
b24a2fc178 Some suggestions from @narsil
2020-09-23 15:50:01 -04:00
Anthony MOI
b1097a988f Python - Improved example with custom components
2020-09-23 15:50:01 -04:00
Anthony MOI
5d20322319 Rust - Fix optional parallelism with par_bridge
2020-06-22 20:31:52 -04:00
jaymody
a28fd29204 Python - Fix bug in bert wordpiece example script
2020-04-18 17:50:52 -04:00
Bjarte Johansen
fab97475e5 Python - Update examples to use new models API
2020-04-06 21:40:23 +02:00
Anthony MOI
81be207819 Python - Black auto formatting
2020-02-18 10:45:36 -05:00
Anthony MOI
dd9270a406 Python - Fix example.py for GPT-2
cc @mfuntowicz `from_pretrained` takes only one argument. Do you know if
we can make this compatible otherwise?
2020-02-10 13:51:03 -05:00
Anthony MOI
8585b761d1 Python - More updates to the new API
2020-02-10 11:57:30 -05:00
Anthony MOI
505c428f72 Python - Update example.py with new API
2020-02-10 11:55:14 -05:00
Anthony MOI
42c4691e4d Python - Update Bert default special tokens
Closes #106
2020-02-05 12:55:01 -05:00
Morgan Funtowicz
374f944e32 Use the same vocabs/merges for Python and Rust comparison.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-15 11:57:34 +01:00
Morgan Funtowicz
894f887444 Updated train_bert_wordpiece.py as well.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-14 13:32:02 +01:00
Morgan Funtowicz
7caf9fd823 Updated train_bytelevel_bpe.py to use the high level Python API.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-14 12:00:50 +01:00
Anthony MOI
cc33418044 Python - Update examples with getter/setter
2020-01-07 15:23:11 -05:00
Anthony MOI
7eebd06409 Python - Improve imports
2020-01-06 12:03:01 -05:00
Anthony MOI
fab4e96b51 Python - Add bert wordpiece training example
2020-01-03 19:37:29 -05:00
Anthony MOI
04cfeea2d5 Python - ByteLevel BPE training example file
cc @julien-c
2020-01-02 18:39:31 -05:00
Anthony MOI
3779bf3e19 Python - Update example
2019-12-29 00:38:37 -05:00
Anthony MOI
a7734ffc9f Python - Update doc and readme for add_prefix_space
2019-12-26 10:34:53 -05:00
Anthony MOI
4bc5a7bbe7 Python - fix example
2019-12-24 11:20:40 -05:00
Anthony MOI
036ee603f4 Python - Update example
2019-12-16 18:50:21 -05:00
Anthony MOI
e93cc62a71 Python - Handle kwargs for bert modules
2019-12-13 15:28:29 -05:00
Anthony MOI
3355be89cd Python - Update examples and improve errors
2019-12-13 14:37:29 -05:00
Anthony MOI
6c294c60b0 Python - Add Encoding repr + improve example
2019-12-10 15:18:07 -05:00
Anthony MOI
018f57f054 Python - Update example
2019-12-09 12:51:05 -05:00
Anthony MOI
6437c40235 Python - PoC Custom PreTokenizer
2019-11-24 00:52:13 -05:00