tinyboxvk
bdfc38b78d
Fix typos (#1715)
...
* Fix typos
Signed-off-by: tinyboxvk <13696594+tinyboxvk@users.noreply.github.com>
* Update docs/source/quicktour.rst
* Update docs/source-doc-builder/quicktour.mdx
---------
Signed-off-by: tinyboxvk <13696594+tinyboxvk@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-01-09 11:53:20 +01:00
Arthur
3d51a1695f
Fix documentation build (#1642)
...
* use v4
* fix ruff
* style
2024-10-01 14:48:02 +02:00
Arthur
29fef1e7aa
[remove black] And use ruff (#1436)
...
* nits
* Fixing deps.
* Ruff update.
* Import order matters.
* Fix.
* Revert ruff fix.
* Visualizer.
* Putting back the imports.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-03-12 11:24:21 +01:00
Quentin Lhoest
e76f900bc0
Faster datasets train example
...
Using `.iter()` is much faster than accessing rows by their ids
2023-03-23 11:24:30 +01:00
mert-kurttutan
5c18ec5ff5
pyo3 v0.18 migration (#1173)
...
* pyo3 v0.18 migration
* Fix formatting issues of black
2023-03-08 11:27:47 +01:00
Nicolas Patry
6113666624
Updating python formatting. (#1079)
...
* Updating python formatting.
* Forgot gh action.
* Skipping isort to prevent circular imports.
* Updating stub.
* Removing `isort` (it contradicts `stub.py`).
* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
Tal Perry
8916b6bb27
Add a visualization utility to render tokens and annotations in a notebook (#508)
...
* Draft functionality of visualization
* Added comments to make code more intelligible
* polish the styles
* Ensure colors are stable and comment the css
* Code clean up
* Made visualizer importable and added some docs
* Fix styling
* implement comments from PR
* Fixed the regex for UNK tokens and examples in notebook
* Converted docs to google format
* Added a notebook showing multiple languages and tokenizers
* Added visual indication of chars that are tokenized with >1 token
* Reorganize things a bit and fix import
* Update docs
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-12-04 10:25:56 -05:00
Anthony MOI
3a8627ce4d
Improve docs and fix tests around training
2020-11-28 12:29:35 -05:00
Anthony MOI
ed9baeabb7
Add example for training with datasets
2020-11-28 12:29:35 -05:00
Anthony MOI
a0a163fd62
Remove unwanted file
2020-09-24 14:05:47 -04:00
Anthony MOI
171a042ee0
Python - Bump version for dev4 release
2020-09-24 10:16:18 -04:00
Nicolas Patry
a410903051
Upgrading to black 20.8b1
2020-09-24 09:27:30 -04:00
Nicolas Patry
9672995a56
We use 19.10b0 not 20 here...
2020-09-24 08:57:02 +02:00
Nicolas Patry
8f8156fd2c
Addressing first pass of comments.
2020-09-24 08:57:02 +02:00
Anthony MOI
b24a2fc178
Some suggestions from @narsil
2020-09-23 15:50:01 -04:00
Anthony MOI
b1097a988f
Python - Improved example with custom components
2020-09-23 15:50:01 -04:00
Anthony MOI
5d20322319
Rust - Fix optional parallelism with par_bridge
2020-06-22 20:31:52 -04:00
jaymody
a28fd29204
Python - Fix bug in bert wordpiece example script
2020-04-18 17:50:52 -04:00
Bjarte Johansen
fab97475e5
Python - Update examples to use new models API
2020-04-06 21:40:23 +02:00
Anthony MOI
81be207819
Python - Black auto formatting
2020-02-18 10:45:36 -05:00
Anthony MOI
dd9270a406
Python - Fix example.py for GPT-2
...
cc @mfuntowicz `from_pretrained` takes only one argument. Do you know if
we can make this compatible otherwise?
2020-02-10 13:51:03 -05:00
Anthony MOI
8585b761d1
Python - More updates to the new API
2020-02-10 11:57:30 -05:00
Anthony MOI
505c428f72
Python - Update example.py with new API
2020-02-10 11:55:14 -05:00
Anthony MOI
42c4691e4d
Python - Update Bert default special tokens
...
Closes #106
2020-02-05 12:55:01 -05:00
Morgan Funtowicz
374f944e32
Use the same vocabs/merges for Python and Rust comparison.
...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-15 11:57:34 +01:00
Morgan Funtowicz
894f887444
Updated train_bert_wordpiece.py as well.
...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-14 13:32:02 +01:00
Morgan Funtowicz
7caf9fd823
Updated train_bytelevel_bpe.py to use the high level Python API.
...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-14 12:00:50 +01:00
Anthony MOI
cc33418044
Python - Update examples with getter/setter
2020-01-07 15:23:11 -05:00
Anthony MOI
7eebd06409
Python - Improve imports
2020-01-06 12:03:01 -05:00
Anthony MOI
fab4e96b51
Python - Add bert wordpiece training example
2020-01-03 19:37:29 -05:00
Anthony MOI
04cfeea2d5
Python - ByteLevel BPE training example file
...
cc @julien-c
2020-01-02 18:39:31 -05:00
Anthony MOI
3779bf3e19
Python - Update example
2019-12-29 00:38:37 -05:00
Anthony MOI
a7734ffc9f
Python - Update doc and readme for add_prefix_space
2019-12-26 10:34:53 -05:00
Anthony MOI
4bc5a7bbe7
Python - fix example
2019-12-24 11:20:40 -05:00
Anthony MOI
036ee603f4
Python - Update example
2019-12-16 18:50:21 -05:00
Anthony MOI
e93cc62a71
Python - Handle kwargs for bert modules
2019-12-13 15:28:29 -05:00
Anthony MOI
3355be89cd
Python - Update examples and improve errors
2019-12-13 14:37:29 -05:00
Anthony MOI
6c294c60b0
Python - Add Encoding repr + improve example
2019-12-10 15:18:07 -05:00
Anthony MOI
018f57f054
Python - Update example
2019-12-09 12:51:05 -05:00
Anthony MOI
6437c40235
Python - PoC Custom PreTokenizer
2019-11-24 00:52:13 -05:00