1294 Commits

Author SHA1 Message Date
Anthony MOI
59d66c6db8 Doc - Add CI for automatic deployment 2020-11-02 17:07:27 -05:00
Anthony MOI
16e1348038 Doc - Add CI for checking build 2020-11-02 17:07:27 -05:00
Anthony MOI
8e5d90d94d Doc - Quick js/css update + remove sphinx_tabs deps 2020-11-02 17:07:27 -05:00
Anthony MOI
000c19a7a5 Doc - Improve snippets testing 2020-11-02 17:07:27 -05:00
Anthony MOI
f4e7754112 Doc - Rust snippets moved in tests 2020-11-02 17:07:27 -05:00
Anthony MOI
080302e8c2 Doc - Better language/version selector colors 2020-11-02 17:07:27 -05:00
Anthony MOI
6fb48cb5aa Doc - more customization + language/version selector 2020-11-02 17:07:27 -05:00
Anthony MOI
6b1b7551f4 Doc - Use huggingface theme 2020-11-02 17:07:27 -05:00
Anthony MOI
e865b7cd7c Customize the doc for each language 2020-11-02 17:07:27 -05:00
Anthony MOI
7366b9e797 Build the doc for each language 2020-11-02 17:07:27 -05:00
Nicolas Patry
9ebe26b179 Fix imports on node tests. 2020-11-02 17:07:27 -05:00
Nicolas Patry
128197b59a Fixing node doc location and rust README. 2020-11-02 17:07:27 -05:00
Nicolas Patry
44e8f4be8f Fixing node.js example.
- Now supports more lenient syntax and more aligned with python&Rust.
- Backward compatible.
2020-11-02 17:07:27 -05:00
Nicolas Patry
6f8892e3ae Upgrade neon version + tests in JS instead of TS. 2020-11-02 17:07:27 -05:00
Nicolas Patry
81bb4f6da3 Actually adding docs. 2020-11-02 17:07:27 -05:00
Nicolas Patry
655809c718 Attempt to get some documentation going. 2020-11-02 17:07:27 -05:00
taufique74
4929809af0 makes from_file() method static 2020-11-01 13:15:15 -05:00
Anthony MOI
8f03d6ddc1 Node - Update tests for models 2020-10-30 13:47:04 -04:00
Anthony MOI
ae88a55ed9 Node - Add UnigramTrainer 2020-10-30 13:47:04 -04:00
Anthony MOI
fb37054b3c Node - Lint 2020-10-30 13:47:04 -04:00
Anthony MOI
b16a3c6b4d Node - Add bindings for Unigram 2020-10-30 13:47:04 -04:00
Anthony MOI
991128f9e1 Node - Fix models init methods & add WordLevel 2020-10-30 13:47:04 -04:00
Anthony MOI
816d6ecc9d Node - Add missing post-processors 2020-10-30 13:47:04 -04:00
Anthony MOI
e8917ad9b6 Node - Add missing pre-tokenizers 2020-10-30 13:47:04 -04:00
Anthony MOI
8fd6388533 Node - Add missing normalizers 2020-10-30 13:47:04 -04:00
Anthony MOI
212594c92f Node - Add bindings for some components methods 2020-10-30 13:47:04 -04:00
Anthony MOI
2364d376f7 Python - Update CHANGELOG and bump to 0.9.3 for release 2020-10-26 16:40:24 -04:00
Anthony MOI
466f5303eb Fix UnigramTrainer 2020-10-26 16:31:58 -04:00
Anthony MOI
73b5da917f Unigram - Add special_tokens at the end of training + optional unk 2020-10-26 10:57:29 -04:00
Anthony MOI
390ef2f9f3 Use special_tokens in UnigramTrainer 2020-10-26 10:57:29 -04:00
Anthony MOI
1a6f4b5204 Allow initial_alphabet on UnigramTrainer 2020-10-26 10:57:29 -04:00
Timur Ganiev
f7c61c267a Fixed BPE.read_files -> BPE.read_file in SentencePieceBPETokenizer 2020-10-26 10:57:14 -04:00
Anthony MOI
a2289d49b4 Finish exposing the UnicodeScripts PreTokenizer 2020-10-21 11:01:54 -04:00
Anthony MOI
25e74b5400 TemplateProcessing serialization is now deterministic 2020-10-21 11:01:37 -04:00
Nicolas Patry
180371d929 Fixing hanging error while acquiring GIL from custom pretokenizer during training. (#470)
* Fixing hanging error while acquiring GIL from custom pretokenizer
during training.

Fixes #469

* cleanup

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-10-20 14:23:39 -04:00
Anthony MOI
91f602f744 Python - Update CHANGELOG and bump to 0.9.2 for release 2020-10-15 10:14:58 -04:00
Nicolas Patry
2ccd16bf5c Adding a new test for PreTokenizer.custom.
This example is more illustrative of what's doable for custom
PreTokenizer.
2020-10-15 10:07:48 -04:00
Anthony MOI
2fc0edda01 Fix RobertaProcessing deserialization in PostProcessorWrapper 2020-10-15 09:37:34 -04:00
Anthony MOI
f94a274702 Python - Update CHANGELOG and bump version for release 2020-10-13 14:45:21 -04:00
Nicolas Patry
9e6a992310 Proper fixing of new clippy lints. (#454)
* Proper fixing of new clippy lints.

* Adding a comment to explain how `filter` works.

* Better fix for the `rev() rev()` problem.

* Limiting visibility of `process_tokens_with_offsets_mut`
2020-10-13 20:43:56 +02:00
Nicolas Patry
88556790e7 Fixing a bug where long tokenizer files would be incorrectly deserialized (#459)
* Fixing a bug where long tokenizer files would be incorrectly
deserialized

- Add a bunch of tests to check deserialization behaviour
- One test also confirms current Single deserialization of Sequence.

* Better test locations for Windows + no file dependency in Python binding
Rust side.

* Addressing @n1t0 comments.
2020-10-13 18:44:24 +02:00
Nicolas Patry
b3c016cf9c Fixing sampling test heuristics to fail less often. 2020-10-13 12:15:02 -04:00
Anthony MOI
3bb794681c Python - Use 1.46.0 for now 2020-10-09 13:40:35 -04:00
Anthony MOI
83e11a8de4 Python - Update dependencies for release 2020-10-09 13:09:35 -04:00
Anthony MOI
4f4ba4a11a Python - Bump version for 0.9.0 release 2020-10-09 13:00:19 -04:00
Nicolas Patry
fbca797b3d Fixing Trainer with u8 instead of chars. (#452)
* Fixing Trainer with u8 instead of chars.

Now check both optimized and unoptimized encodings schemes for Unigram.

* Small fixes.

* Fixing makefile.
2020-10-09 18:57:14 +02:00
Nicolas Patry
35feff0042 Fixing 1.47 complaints from clippy.
needless_collect seems to be off because removing them
*will* cause some borrow checker complaints. We might be able
to correct those better, but probably not everywhere.
2020-10-09 18:39:03 +02:00
Nicolas Patry
dd9fda5d05 Bump rc version. 2020-10-06 11:04:36 +02:00
Anthony MOI
dcb3bba235 TemplateProcessing - pair must use both sequences 2020-10-05 10:21:36 -04:00
Anthony MOI
04163f4ffd Rust - Fix TemplateProcessing overflowings handling 2020-10-05 10:21:36 -04:00