1870 Commits

Author SHA1 Message Date
c46ec97855 Update README 2019-12-03 17:26:20 -05:00
75232c0f06 Fix setup.py 2019-12-03 16:20:20 -05:00
499f5507df Bump versions for 0.0.3 release 2019-12-03 16:11:45 -05:00
ec2ed483a3 Improve python readme with training example 2019-12-03 16:11:03 -05:00
eaafb22511 Add bindings for Trainer in Python 2019-12-03 15:54:15 -05:00
310a2af76b Add BPE empty constructor 2019-12-03 15:39:54 -05:00
0324beea57 BpeTrainer is a Trainer 2019-12-03 15:39:33 -05:00
466555bade Add Trainer trait and Tokenizer.train 2019-12-03 15:38:45 -05:00
768eb9b920 bpe::Error implements std::error::Error 2019-12-03 15:23:08 -05:00
5011523e99 Update python readme 2019-12-03 10:26:19 -05:00
5f31ac3f75 Python release CI (#2) 2019-12-02 19:04:25 -05:00
1a52cda912 Fix yaml indent 2019-11-30 13:06:32 -05:00
f9ccf62301 Try updating to official Rust GitHub Action to avoid missing Rust components. 2019-11-30 13:06:32 -05:00
78e7591780 Fix Cargo.toml not found in Rust workflow 2019-11-30 13:06:32 -05:00
5db08ac15d Update wheel building 2019-11-29 22:36:17 -05:00
27ac65c466 Remove onig dependency 2019-11-29 21:35:16 -05:00
d1b6b14bd7 Attempt fix workflows 2019-11-29 19:28:49 -05:00
989e9b03ca Ignore some python files 2019-11-27 12:22:01 -05:00
428890d6e0 Basic python setuptools 2019-11-27 12:21:37 -05:00
e49abab747 Python - Add Decoder/PreTokenizer standalone capabilities 2019-11-26 17:52:19 -05:00
d565bbf309 Container - Add ability to execute 2019-11-26 17:51:26 -05:00
5c6834f363 Added GitHub Action workflow for Rust
This allows for automated build & test of the library.
2019-11-26 09:47:48 +00:00
f4369b312d Python - Add ability to create custom Decoder 2019-11-25 19:14:07 -05:00
d7ba6802df Update gitignore 2019-11-25 15:35:54 -05:00
512e85dfda Update python README 2019-11-24 00:55:13 -05:00
bafdc5e157 Code style 2019-11-24 00:52:48 -05:00
6437c40235 Python - PoC Custom PreTokenizer 2019-11-24 00:52:13 -05:00
b081e6ca04 Python - Also expose default classes 2019-11-24 00:35:05 -05:00
bd1aa80d8a Python - Custom PreTokenizer backbone 2019-11-23 23:59:33 -05:00
891fc12de2 Python - Update example with new format 2019-11-22 21:09:17 -05:00
8fbe3c2662 Python - Add decoders 2019-11-22 21:08:57 -05:00
e44f52024c Python - Set a PreTokenizer in a model 2019-11-22 21:01:52 -05:00
9b71c8f8de Python - BPE construction 2019-11-22 20:57:54 -05:00
f6a9b57b5b Python - Add pre_tokenizers module 2019-11-22 20:56:50 -05:00
39a6d04c53 Improve Python bindings
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00
663644e041 Fix ByteLevel Decoder
The join was done after replacing bytes and building subwords, which prevented bytes spanning these subwords from being merged correctly. We need to join first.
2019-11-21 16:50:25 -05:00
634415c098 Add a parallel capable cache for BPE
This allows for some performance improvement in best-case scenarios (up to 40% in some tests).
2019-11-21 16:09:07 -05:00
070fd08583 Update python example 2019-11-21 11:57:57 -05:00
c28a83cdc4 Update python bindings 2019-11-21 11:55:07 -05:00
6853e6c904 Tokenizer decoding 2019-11-21 11:54:54 -05:00
2419c14e42 ByteLevel is also a Decoder 2019-11-21 11:52:55 -05:00
56e37475c3 Add Decoder to Tokenizer 2019-11-21 11:51:43 -05:00
3ec26b332c Add Tokenizer token_to_id/id_to_token 2019-11-20 17:28:28 -05:00
8b3d7d1aa0 Add vocab/merge arguments to example.py 2019-11-20 16:47:02 -05:00
98323d1f21 Update readme and fix example 2019-11-19 19:38:57 -05:00
351d526e1e Basic python bindings 2019-11-19 19:31:37 -05:00
39afc64e13 impl PreTokenizer for Whitespace 2019-11-19 19:31:37 -05:00
2d7c5f04f8 Fix readme indentation 2019-11-18 16:34:13 -05:00
1b32560067 Update readme with simple example 2019-11-18 16:31:35 -05:00
872aa86b71 Basic cli for testing 2019-11-18 15:47:35 -05:00
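
The ByteLevel Decoder fix (663644e041) says the join has to happen before bytes are replaced, because a multi-byte character can be split across subwords. The sketch below illustrates that ordering issue only. The byte-to-character table and the function names are hypothetical stand-ins chosen for brevity, not the library's actual table or API.

```python
# Toy byte <-> character table: an assumption for illustration,
# NOT the table used by the library.
BYTE_TO_CHAR = {b: chr(b + 256) for b in range(256)}
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def encode_bytes(text):
    """Map each UTF-8 byte of `text` to one placeholder character."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def decode_join_first(tokens):
    """Join all tokens, then map characters back to bytes and decode once."""
    joined = "".join(tokens)
    raw = bytes(CHAR_TO_BYTE[c] for c in joined)
    return raw.decode("utf-8", errors="replace")


def decode_per_token(tokens):
    """Decode each token on its own, then join (the problematic order)."""
    return "".join(
        bytes(CHAR_TO_BYTE[c] for c in tok).decode("utf-8", errors="replace")
        for tok in tokens
    )


# "é" is two bytes in UTF-8; here each byte lands in a different token.
mapped = encode_bytes("é")
tokens = [mapped[0], mapped[1]]

print(decode_join_first(tokens) == "é")  # True: bytes are reassembled first
print(decode_per_token(tokens) == "é")   # False: each half is invalid alone
```

Decoding per token fails here because neither byte of the character is valid UTF-8 on its own, so the bytes can only be interpreted correctly after the tokens are joined.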