Anthony MOI
59d66c6db8
Doc - Add CI for automatic deployment
2020-11-02 17:07:27 -05:00
Anthony MOI
16e1348038
Doc - Add CI for checking build
2020-11-02 17:07:27 -05:00
Anthony MOI
8e5d90d94d
Doc - Quick js/css update + remove sphinx_tabs deps
2020-11-02 17:07:27 -05:00
Anthony MOI
000c19a7a5
Doc - Improve snippets testing
2020-11-02 17:07:27 -05:00
Anthony MOI
f4e7754112
Doc - Rust snippets moved in tests
2020-11-02 17:07:27 -05:00
Anthony MOI
080302e8c2
Doc - Better language/version selector colors
2020-11-02 17:07:27 -05:00
Anthony MOI
6fb48cb5aa
Doc - more customization + language/version selector
2020-11-02 17:07:27 -05:00
Anthony MOI
6b1b7551f4
Doc - Use huggingface theme
2020-11-02 17:07:27 -05:00
Anthony MOI
e865b7cd7c
Customize the doc for each language
2020-11-02 17:07:27 -05:00
Anthony MOI
7366b9e797
Build the doc for each language
2020-11-02 17:07:27 -05:00
Nicolas Patry
9ebe26b179
Fix imports on node tests.
2020-11-02 17:07:27 -05:00
Nicolas Patry
128197b59a
Fixing node doc location and rust README.
2020-11-02 17:07:27 -05:00
Nicolas Patry
44e8f4be8f
Fixing node.js example.
- Now supports a more lenient syntax, better aligned with Python & Rust.
- Backward compatible.
2020-11-02 17:07:27 -05:00
Nicolas Patry
6f8892e3ae
Upgrade neon version + tests in JS instead of TS.
2020-11-02 17:07:27 -05:00
Nicolas Patry
81bb4f6da3
Actually adding docs.
2020-11-02 17:07:27 -05:00
Nicolas Patry
655809c718
Attempt to get some documentation going.
2020-11-02 17:07:27 -05:00
taufique74
4929809af0
makes from_file() method static
2020-11-01 13:15:15 -05:00
Anthony MOI
8f03d6ddc1
Node - Update tests for models
2020-10-30 13:47:04 -04:00
Anthony MOI
ae88a55ed9
Node - Add UnigramTrainer
2020-10-30 13:47:04 -04:00
Anthony MOI
fb37054b3c
Node - Lint
2020-10-30 13:47:04 -04:00
Anthony MOI
b16a3c6b4d
Node - Add bindings for Unigram
2020-10-30 13:47:04 -04:00
Anthony MOI
991128f9e1
Node - Fix models init methods & add WordLevel
2020-10-30 13:47:04 -04:00
Anthony MOI
816d6ecc9d
Node - Add missing post-processors
2020-10-30 13:47:04 -04:00
Anthony MOI
e8917ad9b6
Node - Add missing pre-tokenizers
2020-10-30 13:47:04 -04:00
Anthony MOI
8fd6388533
Node - Add missing normalizers
2020-10-30 13:47:04 -04:00
Anthony MOI
212594c92f
Node - Add bindings for some components methods
2020-10-30 13:47:04 -04:00
Anthony MOI
2364d376f7
Python - Update CHANGELOG and bump to 0.9.3 for release
2020-10-26 16:40:24 -04:00
Anthony MOI
466f5303eb
Fix UnigramTrainer
2020-10-26 16:31:58 -04:00
Anthony MOI
73b5da917f
Unigram - Add special_tokens at the end of training + optional unk
2020-10-26 10:57:29 -04:00
Anthony MOI
390ef2f9f3
Use special_tokens in UnigramTrainer
2020-10-26 10:57:29 -04:00
Anthony MOI
1a6f4b5204
Allow initial_alphabet on UnigramTrainer
2020-10-26 10:57:29 -04:00
Timur Ganiev
f7c61c267a
Fixed BPE.read_files -> BPE.read_file in SentencePieceBPETokenizer
2020-10-26 10:57:14 -04:00
Anthony MOI
a2289d49b4
Finish exposing the UnicodeScripts PreTokenizer
2020-10-21 11:01:54 -04:00
Anthony MOI
25e74b5400
TemplateProcessing serialization is now deterministic
2020-10-21 11:01:37 -04:00
Nicolas Patry
180371d929
Fixing hanging error while acquiring GIL from custom pretokenizer during training. (#470)
* Fixing hanging error while acquiring GIL from custom pretokenizer
during training.
Fixes #469
* cleanup
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-10-20 14:23:39 -04:00
Anthony MOI
91f602f744
Python - Update CHANGELOG and bump to 0.9.2 for release
2020-10-15 10:14:58 -04:00
Nicolas Patry
2ccd16bf5c
Adding a new test for PreTokenizer.custom.
This example is more illustrative of what's doable for custom
PreTokenizer.
2020-10-15 10:07:48 -04:00
Anthony MOI
2fc0edda01
Fix RobertaProcessing deserialization in PostProcessorWrapper
2020-10-15 09:37:34 -04:00
Anthony MOI
f94a274702
Python - Update CHANGELOG and bump version for release
2020-10-13 14:45:21 -04:00
Nicolas Patry
9e6a992310
Proper fixing of new clippy lints. (#454)
* Proper fixing of new clippy lints.
* Adding a comment to explain how `filter` works.
* Better fix for the `rev() rev()` problem.
* Limiting visibility of `process_tokens_with_offsets_mut`
2020-10-13 20:43:56 +02:00
Nicolas Patry
88556790e7
Fixing a bug where long tokenizer files would be incorrectly deserialized (#459)
* Fixing a bug where long tokenizer files would be incorrectly
deserialized
- Add a bunch of tests to check deserialization behaviour
- One test also confirms current Single deserialization of Sequence.
* Better test locations for Windows + no file dependency in Python binding
Rust side.
* Addressing @n1t0 comments.
2020-10-13 18:44:24 +02:00
Nicolas Patry
b3c016cf9c
Fixing sampling test heuristics to fail less often.
2020-10-13 12:15:02 -04:00
Anthony MOI
3bb794681c
Python - Use 1.46.0 for now
2020-10-09 13:40:35 -04:00
Anthony MOI
83e11a8de4
Python - Update dependencies for release
2020-10-09 13:09:35 -04:00
Anthony MOI
4f4ba4a11a
Python - Bump version for 0.9.0 release
2020-10-09 13:00:19 -04:00
Nicolas Patry
fbca797b3d
Fixing Trainer with u8 instead of chars. (#452)
* Fixing Trainer with u8 instead of chars.
Now checks both optimized and unoptimized encoding schemes for Unigram.
* Small fixes.
* Fixing makefile.
2020-10-09 18:57:14 +02:00
Nicolas Patry
35feff0042
Fixing 1.47 complaints from clippy.
needless_collect seems to be off because removing them
*will* cause some borrow checker complaints. We might be able
to correct those better, but probably not everywhere.
2020-10-09 18:39:03 +02:00
Nicolas Patry
dd9fda5d05
Bump rc version.
2020-10-06 11:04:36 +02:00
Anthony MOI
dcb3bba235
TemplateProcessing - pair must use both sequences
2020-10-05 10:21:36 -04:00
Anthony MOI
04163f4ffd
Rust - Fix TemplateProcessing overflowings handling
2020-10-05 10:21:36 -04:00