36e3c28a23
Ignore .vim folder
2020-05-01 17:11:54 -04:00
dbc8e68c68
Python - Update tests for new encode
2020-05-01 17:11:54 -04:00
2e105c4258
Python - Update typings for new encode
2020-05-01 17:11:54 -04:00
835f08ab02
Python - Update bindings for new encode
2020-05-01 17:11:54 -04:00
993c1c80a8
Rust - Add some tests
2020-05-01 17:11:54 -04:00
d7a5496606
Rust - encode_batch uses the new interface
2020-05-01 17:07:10 -04:00
52fda08f6e
Rust - Update tests with new encode interface
2020-05-01 17:07:10 -04:00
6ed5ce22e0
Rust - Encode uses the new interface
2020-05-01 17:07:09 -04:00
15aae7bab2
Rust - Further improve encode interface
2020-04-24 22:44:12 -04:00
b5d47754ad
Rust - New encode interface
2020-04-24 22:44:11 -04:00
02cc97756f
Rust - Improve TruncationError
2020-04-24 12:13:17 -04:00
7d2b59b0aa
Rust - Add len() and is_empty() on Encoding
2020-04-24 11:44:10 -04:00
9d75e38cc9
Merge pull request #241 from jaymody/master
Python - Fix bug in bert wordpiece example script
2020-04-22 16:12:20 -04:00
a28fd29204
Python - Fix bug in bert wordpiece example script
2020-04-18 17:50:52 -04:00
670f619ab5
Python - bump to 0.7.0 for final release
2020-04-17 12:48:10 -04:00
5be775df0e
Rust - ByteLevel can trim "real" whitespaces too
This shouldn't be needed in most cases, but if the tokens include
an AddedToken with a whitespace, it will handle this case too.
2020-04-17 10:47:39 -04:00
3312ad75d9
Python - Bump to 0.7.0rc6 for release
2020-04-16 19:39:04 -04:00
db25a29e96
Python - Update CharBPETokenizer to match GPT BPE (#239)
2020-04-16 19:36:41 -04:00
0756480b83
Fix offsets (#238)
2020-04-16 19:35:07 -04:00
ad0e488998
Python - Update changelog
2020-04-16 19:32:54 -04:00
249a282f1d
Python - Fix style
2020-04-16 19:31:00 -04:00
77590b9291
style!
2020-04-17 01:29:52 +02:00
7216486686
Update CharLevelBPE
2020-04-17 01:15:02 +02:00
873ac2d9a8
Python - Add missing char_to_word
2020-04-16 18:20:30 -04:00
0524efa8a4
Rust - Fix trimming trailing offset
2020-04-16 16:49:10 -04:00
75e88464a7
Make bytelevel trim offsets test fail 😬
2020-04-16 16:18:10 -04:00
1865ec8d66
Node - Tweak robertaProcessing param types
2020-04-16 16:17:20 -04:00
bdfb02f473
Python - Bump to 0.7.0rc6 for release
2020-04-16 14:42:22 -04:00
5945d2892c
Improve mappings (#234)
2020-04-16 14:36:53 -04:00
8834508547
Update CHANGELOGs
2020-04-16 14:25:19 -04:00
71b7830d1b
Rust | Python | Node - Also add char_to_word
2020-04-16 14:23:37 -04:00
4aecd82d07
Node - Improve mappings on Encoding
2020-04-16 14:23:37 -04:00
c5e22c14cb
Python - Improve mappings on Encoding
2020-04-16 14:23:37 -04:00
3fb347b453
Rust - Improve mappings on Encoding
2020-04-16 14:23:35 -04:00
0de276d2a9
Fix offsets (#236)
2020-04-16 14:22:50 -04:00
c96c4d95bd
Update CHANGELOGs
2020-04-16 10:34:34 -04:00
95d4ee18f7
Node - Add offsets trimming to RobertaProcessing
2020-04-15 19:15:32 -04:00
81e2cc2fc4
Python - Add offsets trimming to RobertaProcessing
2020-04-15 18:49:38 -04:00
7caa4d94d2
Rust - Add offsets trimming to RobertaProcessing
2020-04-15 18:34:12 -04:00
6058f7576e
Rust - ByteLevel also trims overflowing encodings
2020-04-15 17:24:15 -04:00
690a0dfb6d
Rust - Fix ByteLevel trimming original offsets
2020-04-15 17:07:24 -04:00
26d4aa3c79
Rust - Fix offsets when merging multiple sequences
When the input sequence gets split into multiple sub-sequences,
there may be changes in the offsets (original <=> normalized) that
don't get reverted when merging back to one single sequence.
So in order to avoid this, we have to convert back to original offsets
before actually merging the various encodings and normalized strings back
together.
2020-04-15 16:41:57 -04:00
c164baf539
Node - Version 0.6.2
2020-04-13 16:57:44 -04:00
38d53a7b84
Node - Expose more bindings
2020-04-13 16:48:32 -04:00
a42f3581ba
Python - improve compatibility with sentencepiece in the conversion script (#229)
2020-04-13 10:48:07 -04:00
0865a9ad55
Python - improve compatibility with sentencepiece in the conversion script
2020-04-11 17:35:50 +02:00
09104afd07
Python - Bump to 0.7.0-rc5 for release
2020-04-09 11:41:10 -04:00
af66d6fc6f
Rust - Bump to 0.10.1 for release
2020-04-09 11:30:59 -04:00
f9c76b6c82
Python - Use PyO3 0.9.2 (#227)
2020-04-09 11:26:36 -04:00
a6c33f5de8
Python - update some dependencies
2020-04-09 10:56:26 -04:00