c46ec97855
Update README
2019-12-03 17:26:20 -05:00
75232c0f06
Fix setup.py
2019-12-03 16:20:20 -05:00
499f5507df
Bump versions for 0.0.3 release
2019-12-03 16:11:45 -05:00
ec2ed483a3
Improve python readme with training example
2019-12-03 16:11:03 -05:00
eaafb22511
Add bindings for Trainer in Python
2019-12-03 15:54:15 -05:00
310a2af76b
Add BPE empty constructor
2019-12-03 15:39:54 -05:00
0324beea57
BpeTrainer is a Trainer
2019-12-03 15:39:33 -05:00
466555bade
Add Trainer trait and Tokenizer.train
2019-12-03 15:38:45 -05:00
768eb9b920
bpe::Error implements std::error::Error
2019-12-03 15:23:08 -05:00
5011523e99
Update python readme
2019-12-03 10:26:19 -05:00
5f31ac3f75
Python release CI (#2)
2019-12-02 19:04:25 -05:00
1a52cda912
Fix yaml indent
2019-11-30 13:06:32 -05:00
f9ccf62301
Try updating to the official Rust GitHub Action to avoid missing Rust components.
2019-11-30 13:06:32 -05:00
78e7591780
Fix Cargo.toml not found in Rust workflow
2019-11-30 13:06:32 -05:00
5db08ac15d
Update wheel building
2019-11-29 22:36:17 -05:00
27ac65c466
Remove onig dependency
2019-11-29 21:35:16 -05:00
d1b6b14bd7
Attempt fix workflows
2019-11-29 19:28:49 -05:00
989e9b03ca
Ignore some python files
2019-11-27 12:22:01 -05:00
428890d6e0
Basic python setuptools
2019-11-27 12:21:37 -05:00
e49abab747
Python - Add Decoder/PreTokenizer standalone capabilities
2019-11-26 17:52:19 -05:00
d565bbf309
Container - Add ability to execute
2019-11-26 17:51:26 -05:00
5c6834f363
Added GitHub Action workflow for Rust
This allows for automated build & test of the library.
2019-11-26 09:47:48 +00:00
f4369b312d
Python - Add ability to create custom Decoder
2019-11-25 19:14:07 -05:00
d7ba6802df
Update gitignore
2019-11-25 15:35:54 -05:00
512e85dfda
Update python README
2019-11-24 00:55:13 -05:00
bafdc5e157
Code style
2019-11-24 00:52:48 -05:00
6437c40235
Python - PoC Custom PreTokenizer
2019-11-24 00:52:13 -05:00
b081e6ca04
Python - Also expose default classes
2019-11-24 00:35:05 -05:00
bd1aa80d8a
Python - Custom PreTokenizer backbone
2019-11-23 23:59:33 -05:00
891fc12de2
Python - Update example with new format
2019-11-22 21:09:17 -05:00
8fbe3c2662
Python - Add decoders
2019-11-22 21:08:57 -05:00
e44f52024c
Python - Set a PreTokenizer in a model
2019-11-22 21:01:52 -05:00
9b71c8f8de
Python - BPE construction
2019-11-22 20:57:54 -05:00
f6a9b57b5b
Python - Add pre_tokenizers module
2019-11-22 20:56:50 -05:00
39a6d04c53
Improve Python bindings
This is an attempt at actually exposing the same structure that we use in the Rust lib. This will allow Python to instantiate Model/PreTokenizer/... with their own arguments, combining everything without relying on parsed kwargs.
2019-11-22 17:57:36 -05:00
663644e041
Fix ByteLevel Decoder
The join was done after replacing bytes and building subwords, which prevented bytes spanning several subwords from being merged correctly. We need to join first.
2019-11-21 16:50:25 -05:00
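A minimal sketch of why the join order matters in the ByteLevel Decoder fix above. It assumes a GPT-2 style byte-to-character table; the helper names (`bytes_to_unicode`, `decode_per_token`, `decode_join_first`) are hypothetical and are not the library's API.

```python
def bytes_to_unicode():
    """Assumed GPT-2 style table: map each byte value to a printable character."""
    keep = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    table = {b: chr(b) for b in keep}
    shift = 0
    for b in range(256):
        if b not in table:
            # Remaining bytes get code points starting at 256.
            table[b] = chr(256 + shift)
            shift += 1
    return table


CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_per_token(tokens):
    """Old order: convert each subword to bytes and decode it, then join."""
    parts = []
    for token in tokens:
        raw = bytes(CHAR_TO_BYTE[c] for c in token)
        parts.append(raw.decode("utf-8", errors="replace"))
    return "".join(parts)


def decode_join_first(tokens):
    """Fixed order: join the subwords, then convert to bytes and decode once."""
    joined = "".join(tokens)
    raw = bytes(CHAR_TO_BYTE[c] for c in joined)
    return raw.decode("utf-8", errors="replace")


# U+00E9 is the two bytes 0xC3 0xA9 in UTF-8; here each byte sits in its own subword.
tokens = ["h", "\u00c3", "\u00a9"]
```

With these tokens, `decode_join_first(tokens)` recovers `"h\u00e9"`, while `decode_per_token(tokens)` yields replacement characters, because neither half of the two-byte sequence is valid UTF-8 on its own.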
634415c098
Add a parallel capable cache for BPE
This allows for some performance improvement in best-case scenarios (up to 40% in some tests).
2019-11-21 16:09:07 -05:00
070fd08583
Update python example
2019-11-21 11:57:57 -05:00
c28a83cdc4
Update python bindings
2019-11-21 11:55:07 -05:00
6853e6c904
Tokenizer decoding
2019-11-21 11:54:54 -05:00
2419c14e42
ByteLevel is also a Decoder
2019-11-21 11:52:55 -05:00
56e37475c3
Add Decoder to Tokenizer
2019-11-21 11:51:43 -05:00
3ec26b332c
Add Tokenizer token_to_id/id_to_token
2019-11-20 17:28:28 -05:00
8b3d7d1aa0
Add vocab/merge arguments to example.py
2019-11-20 16:47:02 -05:00
98323d1f21
Update readme and fix example
2019-11-19 19:38:57 -05:00
351d526e1e
Basic python bindings
2019-11-19 19:31:37 -05:00
39afc64e13
impl PreTokenizer for Whitespace
2019-11-19 19:31:37 -05:00
2d7c5f04f8
Fix readme indentation
2019-11-18 16:34:13 -05:00
1b32560067
Update readme with simple example
2019-11-18 16:31:35 -05:00
872aa86b71
Basic cli for testing
2019-11-18 15:47:35 -05:00