A small compiler that can convert Python scripts to pickle bytecode.
- Python 3.8+
No third-party modules are required.
Using pip:
$ pip install pickora
From source:
$ git clone https://github.com/splitline/Pickora.git
$ cd Pickora
$ python setup.py install
Compile from a string:
$ pickora -c 'from builtins import print; print("Hello, world!")' -o output.pkl
$ python -m pickle output.pkl # load the pickle bytecode
Hello, world!
None
Compile from a file:
$ echo 'from builtins import print; print("Hello, world!")' > hello.py
$ pickora hello.py # output compiled pickle bytecode to stdout directly
b'\x80\x04\x95(\x00\x00\x00\x00\x00\x00\x00\x8c\x08builtins\x8c\x05print\x93\x94\x94h\x01\x8c\rHello, world!\x85R.'
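As a quick check, the bytes printed above can also be loaded directly from Python. This is a minimal sketch that assumes the output is copied verbatim; `hello_payload` is a name chosen for illustration, and `pickletools.genops` is used only to list the opcodes without executing them.

```python
import io
import pickle
import pickletools
from contextlib import redirect_stdout

# Bytes copied from the `pickora hello.py` output above.
hello_payload = (
    b'\x80\x04\x95(\x00\x00\x00\x00\x00\x00\x00\x8c\x08builtins\x8c\x05print'
    b'\x93\x94\x94h\x01\x8c\rHello, world!\x85R.'
)

# List the opcode names; genops only parses, it does not run the payload.
opcode_names = [op.name for op, arg, pos in pickletools.genops(hello_payload)]

# Loading the payload runs it: print(...) is called during unpickling.
captured = io.StringIO()
with redirect_stdout(captured):
    result = pickle.loads(hello_payload)

printed = captured.getvalue()
print(opcode_names)
print(repr(printed), result)
```

The loaded value is `None` because the last thing left on the stack is the return value of `print`, which matches the `None` shown by `python -m pickle` earlier.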
usage: pickora [-h] [-c CODE] [-p PROTOCOL] [-e] [-O] [-o OUTPUT] [-d] [-r]
               [-f {repr,raw,hex,base64,none}]
               [source]

A toy compiler that can convert Python scripts into pickle bytecode.

positional arguments:
  source                source code file

optional arguments:
  -h, --help            show this help message and exit
  -c CODE, --code CODE  source code string
  -p PROTOCOL, --protocol PROTOCOL
                        pickle protocol
  -e, --extended        enable extended syntax (trigger find_class)
  -O, --optimize        optimize pickle bytecode (with pickletools.optimize)
  -o OUTPUT, --output OUTPUT
                        output file
  -d, --disassemble     disassemble pickle bytecode
  -r, --run             run (load) pickle bytecode immediately
  -f {repr,raw,hex,base64,none}, --format {repr,raw,hex,base64,none}
                        output format, none means no output
Basic usage: `pickora samples/hello.py` or `pickora --code 'print("Hello, world!")' --extended`
- Basic types: int, float, bytes, string, dict, list, set, tuple, bool, None
- Assignment: `val = dict_['x'] = obj.attr = 'meow'`
- Augmented assignment: `x += 1`
- Named assignment: `(x := 1337)`
- Unpacking: `a, b, c = 1, 2, 3`
- Function call: `f(arg1, arg2)`
  - Keyword arguments are not supported.
- Import: `from module import things` (directly using the `STACK_GLOBAL` bytecode)
- Macros (see below for more details)
  - `STACK_GLOBAL`
  - `GLOBAL`
  - `INST`
  - `OBJ`
  - `NEWOBJ`
  - `NEWOBJ_EX`
  - `BUILD`
Note: All extended syntaxes are implemented by importing other built-in modules, so enabling the `--extended` option will trigger `find_class` when the pickle bytecode is loaded.
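To see what "triggering `find_class`" means in practice, here is a small sketch that records every lookup made while loading. `RecordingUnpickler` is a hypothetical helper written for this example, and the payload is hand-written protocol 0 bytecode rather than Pickora output.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Unpickler that records every find_class lookup (hypothetical helper)."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        # Called whenever the bytecode asks for a global (GLOBAL, STACK_GLOBAL, INST, ...).
        self.lookups.append((module, name))
        return super().find_class(module, name)


# Hand-written payload: GLOBAL operator.add, then call add(1, 2) via REDUCE.
payload = b"coperator\nadd\n(I1\nI2\ntR."

unpickler = RecordingUnpickler(io.BytesIO(payload))
result = unpickler.load()
print(result, unpickler.lookups)
```

Overriding `find_class` is also the hook that the Python documentation suggests for restricting which globals a pickle may load, so a loader that filters lookups would see (and could reject) the imports that the extended syntax relies on.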
- Attributes: `obj.attr` (using `builtins.getattr` only when you need to "load" an attribute)
- Operators (using the `operator` module)
  - Binary operators: `+`, `-`, `*`, `/`, etc.
  - Unary operators: `not`, `~`, `+val`, `-val`
  - Compare: `0 < 3 > 2 == 2 > 1` (using `builtins.all` for chained comparisons)
  - Subscript: `list_[1:3]`, `dict_['key']` (using `builtins.slice` for slices)
  - Boolean operators (using `builtins.next`, `builtins.filter`)
    - and: using `operator.not_`
    - or: using `operator.truth`
    - `(a or b or c)` -> `next(filter(truth, (a, b, c)), c)`
    - `(a and b and c)` -> `next(filter(not_, (a, b, c)), c)`
- Import: `import module` (using `importlib.import_module`)
- Lambda: `lambda x,y=1: x+y`
  - Using `types.CodeType` and `types.FunctionType`
  - [Known bug] If any global variables are changed after the lambda definition, the lambda function won't see those changes.
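The boolean and comparison rewrites listed above can be tried out in plain Python. The sketch below compares them against the native operators; `or_chain` and `and_chain` are illustrative helpers, and the exact shape of the `all(...)` call that Pickora emits for chained comparisons is an assumption.

```python
from operator import eq, gt, lt, not_, truth


def or_chain(*values):
    # (a or b or c): first truthy operand, otherwise the last operand
    return next(filter(truth, values), values[-1])


def and_chain(*values):
    # (a and b and c): first falsy operand, otherwise the last operand
    return next(filter(not_, values), values[-1])


# Chained comparison 0 < 3 > 2 == 2 > 1 as all(...) over the pairwise results.
chained = all((lt(0, 3), gt(3, 2), eq(2, 2), gt(2, 1)))

checks = [
    or_chain(0, "", "x") == (0 or "" or "x"),
    or_chain(0, "", None) is (0 or "" or None),
    and_chain(1, "a", 0) == (1 and "a" and 0),
    and_chain(1, "a", "z") == (1 and "a" and "z"),
    chained == (0 < 3 > 2 == 2 > 1),
]
print(checks)
```

One difference worth keeping in mind: in this form every operand is placed in a tuple before `filter` runs, so all operands are evaluated and there is no short-circuiting as with native `and` / `or`.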
There are currently 4 macros available: `STACK_GLOBAL`, `GLOBAL`, `INST` and `BUILD`.
Example:
function_name = input("> ") # > system
func = STACK_GLOBAL('os', function_name) # <built-in function system>
func("date") # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
- PUSH modname
- PUSH name
- STACK_GLOBAL
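The three steps above can be reproduced by hand with the opcode constants from the standard `pickle` module. This is a sketch of the resulting bytecode, not Pickora's own code; `short_unicode` and `stack_global_payload` are hypothetical helpers, and `math.sqrt` stands in for `os.system` to keep the example harmless.

```python
import math
import pickle


def short_unicode(text):
    # SHORT_BINUNICODE: opcode, 1-byte length, UTF-8 data (strings under 256 bytes)
    data = text.encode("utf-8")
    return pickle.SHORT_BINUNICODE + bytes([len(data)]) + data


def stack_global_payload(modname, name):
    return (
        pickle.PROTO + bytes([4])   # protocol 4 header
        + short_unicode(modname)    # PUSH modname
        + short_unicode(name)       # PUSH name
        + pickle.STACK_GLOBAL       # resolve modname.name at load time
        + pickle.STOP
    )


payload = stack_global_payload("math", "sqrt")
func = pickle.loads(payload)
print(func is math.sqrt, func(9.0))
```

Because both names are ordinary values on the stack, they can be computed while the pickle is being loaded, which is what the `input("> ")` example relies on.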
Example:
func = GLOBAL("os", "system") # <built-in function system>
func("date") # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
Simply writes this piece of bytecode: `f"c{modname}\n{name}\n"`
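A minimal sketch of that bytecode, loaded with the standard unpickler. `global_payload` is a hypothetical helper, and `math.floor` replaces `os.system` so nothing is executed on the host.

```python
import math
import pickle


def global_payload(modname, name):
    # GLOBAL is the text opcode "c": module and name are newline-terminated.
    return f"c{modname}\n{name}\n".encode("ascii") + pickle.STOP


payload = global_payload("math", "floor")
func = pickle.loads(payload)
print(payload, func(2.7))
```

Unlike `STACK_GLOBAL`, the module and attribute names are written into the bytecode itself, so in this form they have to be known when the payload is built.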
Example:
command = input("cmd> ") # cmd> date
INST("os", "system", (command,)) # Tue Jan 13 33:33:37 UTC 2077
Behaviour:
- PUSH a MARK
- PUSH `args` in order
- Run this piece of bytecode: `f'i{modname}\n{name}\n'`
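The same sequence can be assembled by hand to see what the unpickler does with it. This is a sketch under the assumption that arguments are already encoded as bytecode; `inst_payload` is a hypothetical helper, and `builtins.int` is used instead of `os.system`.

```python
import pickle


def inst_payload(modname, name, pushed_args):
    # pushed_args: already-encoded bytecode that pushes each argument in order
    return (
        pickle.MARK                                             # PUSH a MARK
        + pushed_args                                           # PUSH args in order
        + pickle.INST + f"{modname}\n{name}\n".encode("ascii")  # call modname.name(*args)
        + pickle.STOP
    )


# UNICODE ("V") pushes a newline-terminated string; here the single argument "42".
arg = pickle.UNICODE + b"42\n"
payload = inst_payload("builtins", "int", arg)
result = pickle.loads(payload)
print(payload, result)
```

With a non-empty argument list, CPython's unpickler calls the looked-up object with everything pushed after the MARK, so the payload above behaves like `int("42")`.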
For `BUILD`, `state` is for `inst.__setstate__(state)` and `slotstate` is for setting attributes.
Example:
from collections import _collections_abc
BUILD(_collections_abc, None, {'__all__': ['ChainMap', 'Counter', 'OrderedDict']})
Behaviour:
- PUSH `inst`
- PUSH `(state, slotstate)` (tuple)
- PUSH `BUILD`
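To make the `BUILD` step concrete, the sketch below assembles an equivalent payload by hand. It uses `argparse.Namespace` as a throwaway instance (an arbitrary choice for this example, not something Pickora requires) and relies on CPython's behaviour for objects without `__setstate__`: a two-item tuple is split into `state`, which is merged into `__dict__`, and `slotstate`, which is applied with `setattr`.

```python
import pickle

payload = (
    pickle.GLOBAL + b"argparse\nNamespace\n"  # push the class
    + pickle.EMPTY_TUPLE + pickle.REDUCE      # call it with no args: PUSH inst
    + pickle.MARK
    + pickle.NONE                             # state = None
    + pickle.EMPTY_DICT                       # slotstate = {}
    + pickle.UNICODE + b"answer\n"
    + pickle.INT + b"42\n"
    + pickle.SETITEM                          # slotstate["answer"] = 42
    + pickle.TUPLE                            # PUSH (state, slotstate)
    + pickle.BUILD
    + pickle.STOP
)

inst = pickle.loads(payload)
print(type(inst).__name__, inst.answer)
```

If the instance does define `__setstate__`, CPython passes the pushed object to it unchanged instead of splitting it.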
RTFM.
It's cool.
No, not at all, it's definitely useless.
Yep, it's cool garbage.
No. All pickle can do is define variables and call functions, so this kind of syntax wouldn't exist.
But if you want to do things like:
ans = input("Yes/No: ")
if ans == 'Yes':
print("Great!")
elif ans == 'No':
exit()
It's still achievable! You can rewrite your code like this:
from functools import partial
condition = {'Yes': partial(print, 'Great!'), 'No': exit}
ans = input("Yes/No: ")
condition.get(ans, repr)()
ta-da!
For loops, you can try `map`, `starmap`, `reduce`, etc.
And yes, you are right, it's functional programming time!
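As a rough illustration of the loop rewrites, here are three common `for` loops expressed with `map`, `reduce` and `starmap`. This is plain Python; whether each line compiles unchanged under Pickora is an assumption, and names such as `list` and `map` would presumably need to be imported from `builtins` as in the hello example.

```python
from functools import reduce
from itertools import starmap
from operator import add, mul

numbers = [1, 2, 3, 4]

# for n in numbers: squares.append(n * n)
squares = list(map(mul, numbers, numbers))

# total = 0; for n in numbers: total += n
total = reduce(add, numbers, 0)

# for a, b in pairs: products.append(a * b)
pairs = [(1, 2), (3, 4), (5, 6)]
products = list(starmap(mul, pairs))

print(squares, total, products)
```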