mirror of
https://github.com/mii443/tokenizers.git
synced 2025-12-07 21:28:19 +00:00
Update readme and fix example
This commit is contained in:
@@ -1,10 +1,20 @@
|
||||
### Python Bindings
|
||||
|
||||
```
|
||||
python3.7 -m venv .env
|
||||
# This expect the rust chain to be nightly
|
||||
rustup update
|
||||
rustup default nightly
|
||||
|
||||
# In this folder:
|
||||
python3 -m venv .env
|
||||
source .env/bin/activate
|
||||
pip install maturin
|
||||
maturin build
|
||||
maturin develop --release
|
||||
|
||||
# Then test:
|
||||
pip install transformers
|
||||
|
||||
python example.py --file <FILE_PATH>
|
||||
# or
|
||||
python example.py
|
||||
```
|
||||
|
||||
@@ -17,27 +17,27 @@ if args.file is not None:
|
||||
text = [ line.strip() for line in fp ]
|
||||
else:
|
||||
text = """
|
||||
The Zen of Python, by Tim Peters
|
||||
Beautiful is better than ugly.
|
||||
Explicit is better than implicit.
|
||||
Simple is better than complex.
|
||||
Complex is better than complicated.
|
||||
Flat is better than nested.
|
||||
Sparse is better than dense.
|
||||
Readability counts.
|
||||
Special cases aren't special enough to break the rules.
|
||||
Although practicality beats purity.
|
||||
Errors should never pass silently.
|
||||
Unless explicitly silenced.
|
||||
In the face of ambiguity, refuse the temptation to guess.
|
||||
There should be one-- and preferably only one --obvious way to do it.
|
||||
Although that way may not be obvious at first unless you're Dutch.
|
||||
Now is better than never.
|
||||
Although never is often better than *right* now.
|
||||
If the implementation is hard to explain, it's a bad idea.
|
||||
If the implementation is easy to explain, it may be a good idea.
|
||||
Namespaces are one honking great idea -- let's do more of those!
|
||||
""".split("\n")
|
||||
The Zen of Python, by Tim Peters
|
||||
Beautiful is better than ugly.
|
||||
Explicit is better than implicit.
|
||||
Simple is better than complex.
|
||||
Complex is better than complicated.
|
||||
Flat is better than nested.
|
||||
Sparse is better than dense.
|
||||
Readability counts.
|
||||
Special cases aren't special enough to break the rules.
|
||||
Although practicality beats purity.
|
||||
Errors should never pass silently.
|
||||
Unless explicitly silenced.
|
||||
In the face of ambiguity, refuse the temptation to guess.
|
||||
There should be one-- and preferably only one --obvious way to do it.
|
||||
Although that way may not be obvious at first unless you're Dutch.
|
||||
Now is better than never.
|
||||
Although never is often better than *right* now.
|
||||
If the implementation is hard to explain, it's a bad idea.
|
||||
If the implementation is easy to explain, it may be a good idea.
|
||||
Namespaces are one honking great idea -- let's do more of those!
|
||||
""".split("\n")
|
||||
|
||||
|
||||
tok_p = GPT2Tokenizer.from_pretrained('gpt2')
|
||||
|
||||
Reference in New Issue
Block a user