Skip to content

hsfzxjy/lambdex

Repository files navigation

lambdex

PyPI version fury.io PyPI pyversions PyPI status Build Status GPLv3 license

lambdex allows you to write multi-line anonymous function expression (called a lambdex) in an idiomatic manner. Below is a quick example of a recursive Fibonacci function:

def_(lambda n: [
    if_[n <= 0] [
        raise_[ValueError(f'{n} should be positive')]
    ],
    if_[n <= 2] [
        return_[1]
    ],
    return_[callee_(n - 1) + callee_(n - 2)]
])(10)  # 55

Compared with ordinary lambda, which only allows single expression as body, lambdex may contain multiple "statements" in analogue to imperative control flows, whilst does not violate the basic syntax of Python.

Table of Content

More about lambdex

An anonymous function is a function definition that is not bound to an identifier, which is ubiquitous in most languages with first-class functions. The language feature could be handy for logics that appear to be short-term use, and therefore adopted widely in some functional programming paradigms.

Python provides lambda <arg>: <expr> for such purpose. Lambdas are good for simple functionalities, but appear limited if logical complexity goes up. Consequently, higher-order functions (e.g., decorators) are often implemented as nested named functions, which is not concise enough.

lambdex as an experimental complement to lambdas, aims to provide a syntax similar to Python for anonymous functions. The syntax itself is built upon valid Python expressions, and therefore requires no modification to the interpreter. This package transpiles lambdexes into Python bytecodes at runtime, and therefore ensures the efficiency.

Installation & Usage

You can install lambdex from PyPI by

pip install pylambdex

or from Github by

pip install git+https://github.com/hsfzxjy/lambdex

To use lambdex, a simple import is required:

from lambdex import def_

my_sum = def_(lambda a, b: [
    return_[a + b]
])

That's it! You don't even need to import other keywords such as return_.

Language Features

We are going to explore a wide range of features supported by lambdex in the following sections.

Parameters

The parameter declaration of lambdexes appears after the lambda. The syntax supports most variants of declaration just as ordinary functions.

show code
# ordinary parameters
def_(lambda a, b: [...])

# parameters with default values
def_(lambda a, b=1: [...])

# starred arguments
def_(lambda *args, **kwargs: [...])

# keyword-only arguments
def_(lambda *, a, b: [...])

# positional-only arguments (Python 3.8+)
def_(lambda a, b, /: [...])

Variable assignment

Lambdexes use < instead of = for assignments, since = in Python is valid only in statements.

show code
def_(lambda: [
    foo < "bar",
])
show equivalent function
def anonymous():
    foo = "bar"

< is chainable like ordinary =.

show code
def_(lambda: [
    foo < baz < "bar",
])
show equivalent function
def anonymous():
    foo = baz = "bar"

Note that < has a higher precedence than not, and, or and if...else.... R-value with these operators should be enclosed by parentheses:

show code
def_(lambda: [
    foo < (a or b and not c),
    foo < (a if cond else b),
])

Tuple destruction is also supported:

show code
def_(lambda: [
    (a, b) < (b, a),
    (a, *rest, c) < [1, 2, 3],
])

In Python 3.8 or above, the walrus operator := might also be used. But be careful that Python enforces parentheses around := in many cases.

show code
def_(lambda: [
    foo := "bar",           # OK
    foo := baz := "bar",    # syntax error
    foo := (baz := "bar"),  # OK
    if_[condition] [
        foo := "bar",       # syntax error
        (foo := "bar"),     # OK
    ]
])

Augmented assignment

The augmented assignments are written as [op]_<, for example, +_< for +=. The snippet below illustrates all supported augmented assignments:

show code
def_(lambda: [
    a +_< 1,
    a -_< 1,
    a *_< 1,
    a /_< 1,
    a //_< 1,
    a @_< 1,
    a %_< 1,
    a <<_< 1,
    a >>_< 1,
    a **_< 1,
    a &_< 1,
    a |_< 1,
    a ^_< 1,
])
show equivalent function
def anonymous():
    a += 1
    a -= 1
    a *= 1
    a /= 1
    a //= 1
    a @= 1
    a %= 1
    a <<= 1
    a >>= 1
    a **= 1
    a &= 1
    a |= 1
    a ^= 1

Conditional statement

Lambdexes use if_, elif_ and else_ for conditional control flows.

show code
def_(lambda: [
    if_[condition_1] [
        ...,
    ].elif_[condition_2] [
        ...,
    ].else_[
        ...,
    ]
])
show equivalent function
def anonymous():
    if condition_1:
        ...
    elif condition_2:
        ...
    else:
        ...

Looping

Lambdexes support the two kinds of looping by keywords for_ and while_.

show code
def_(lambda: [
    # for...in...else...
    for_[i in range(10)] [
        print_(i),
    ].else_[
        print("the optional else clause"),
    ],

    # while...else...
    while_[condition] [
        ...,
    ].else_[
        print("the optional else clause"),
    ]
])
show equivalent function
def anonymous():
    # for...in...else...
    for i in range(10):
        print(i)
    else:
        print("the optional else clause")

    # while...else...
    while condition:
        print("the optional else clause")

break_ and continue_ are also supported.

show code
def_(lambda: [
    for_[i in range(10)] [
        if_[i >= 5] [
            break_
        ].else_[
            continue_
        ]
    ]
])
show equivalent function
def anonymous():
    for i in range(10):
        if i >= 5:
            break
        else:
            continue

With statement

With statements are supported by the with_ keyword. The optional as is written using >.

show code
def_(lambda: [
    # simple `with`
    with_[open("foo")] [
        ...
    ]

    # `with` with `as`
    with_[open("foo") > fd] [
        ...
    ]

    # multiple `with`
    with_[open("foo"), open("bar") > fd] [
        ...
    ]
])
show equivalent function
def anonymous():
    # simple `with`
    with open("foo"):
        ...

    # `with` with `as`
    with open("foo") as fd:
        ...

    # multiple `with`
    with open("foo"), open("bar") as fd:
        ...

Try statement

The ordinary try statements are supported by keywords try_, except_, else_ and finally_.

show code
def_(lambda: [
    try_[
        ...
    ].except_[RuntimeError] [
        ...
    ].except_[
        ...
    ].else_[
        ...
    ].finally_[
        ...
    ]
])
show equivalent function
def anonymous():
    try:
        ...
    except RuntimeError:
        ...
    except:
        ...
    else:
        ...
    finally:
        ...

The optional as in except clause is written as >.

show code
def_(lambda: [
    try_[
        ...
    ].except_[RuntimeError > e] [
        ...
    ]
])
show equivalent function
def anonymous():
    try:
        ...
    except RuntimeError as e:
        ...

Yield statement

The yield and yield...from... structures are supported by keywords yield_ and yield_from_. A lambdex contains one or more yield_ or yield_from_ will automatically become a generator.

show code
def_(lambda: [
    yield_[1, 2],
    yield_from_[range(3, 10)],
])
show equivalent function
def anonymous():
    yield (1, 2)
    yield from range(3, 10)

yield_ itself is an expression, and thus can appear in anywhere an expression is allowed. Note that parentheses might be added.

show code
def_(lambda: [
    a < (yield_),

    if_[a < (yield_)] [...],

    with_[(yield_) > cm] [...],

    for_[i in (yield_)] [...]
])
show equivalent function
def anonymous():
    a = (yield)
    if a < (yield): ...
    with (yield) as cm: ...
    for i in (yield): ...

Async and Await

lambdex supports coroutines by keywords async_def_, async_for_, async_with_ and await_.

show code
from lambdex import async_def_
async_def_(lambda: [
    async_for_[a in b] [ ... ],
    async_with_[a > b] [ ... ],
    await_[a],
])
show equivalent function
async def anonymous():
    async for a in b: ...
    async with a as b: ...
    await a

Miscellaneous

Lambdexes support some other keywords in Python too.

The return_ is analogue to keyword return.

show code
def_(lambda: [
    return_[a, b]
])
show equivalent function
def anonymous():
    return a, b

The pass_ is analogue to keyword pass.

show code
def_(lambda: [
    pass_
])
show equivalent function
def anonymous():
    pass

The raise_ is analogue to keyword raise.

show code
def_(lambda: [
    try_[
        raise_[RuntimeError]
    ].except_[ValueError > e] [
        # the optional from clause
        raise_[RuntimeError].from_[e]
    ].except_[
        # the bare raise
        raise_
    ]
])
show equivalent function
def anonymous():
    try:
        raise RuntimeError
    except ValueError as e:
        raise RuntimeError from e
    except:
        raise

The del_ is analogue to keyword del.

show code
def_(lambda: [
    a < [1, 2],
    del_[a[0], a],
])
show equivalent function
def anonymous():
    a = [1, 2]
    del a[0], a

The global_ and nonlocal_ are analogue to keywords global and nonlocal.

show code
def_(lambda: [
    global_[a],

    return_[def_(lambda: [
        nonlocal_[a],
    ])]
])
show equivalent function
def anonymous():
    global a

    def _inner():
        nonlocal a

    return _inner

Nested lambdexes

Lambdexes can be nested to construct more complicated logics. Lambdexes respect the nested scoping rules in Python, i.e., inner lambdex captures names defined in its parent scopes. For example, we can define IIFE like in JavaScript to capture looping variables.

show code
# without IIFE
arr = []
for i in range(10):
    arr.append(def_(lambda: [
        print(i)
    ]))
for func in arr:
    func()  # print "9" x10 times

# with IIFE
arr = []
for i in range(10):
    func = def_(lambda i: [
        return_[def_(lambda: [
            print(i)
        ])]
    ])(i)
    arr.append(func)
for func in arr:
    func()  # print from "0" to "9"

Recursion

One can always access the current lambdex itself via callee_ within a lambdex. The feature is quite handy since you don't need to assign a lambdex to a name for doing recursion.

# Summing from 1 to 10
(def_(lambda n: [
    if_[n == 1] [
        return_[n]
    ],
    return_[callee_(n - 1) + n]
]))(10)

Note that callee_ within an inner lambdex repesents itself instead of the outer one:

f = def_(lambda: [
    inner < def_(lambda: [
        return_[callee_]
    ]),
    return_[inner, inner(), callee_]
])
f1, f2, f3 = f()
f1 is f2  # True
f3 is f   # True

Renaming functions

A lambdex may have an optional name which uses the syntax def_.<name>(...). For example:

def_.one_divided_by_zero(lambda: [
    1 / 0,
])

The name is used for improving readability of a traceback when an error occurs. For example, the function above yields an exception:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    f()
  File "test.py", line 3, in one_divided_by_zero
    1 / 0,
ZeroDivisionError: division by zero

The last frame of the traceback displays a name one_divided_by_zero instead of some anonymous_xxx by default.

But be careful that this feature does not imply any name bindings, that is, you can not use the name as a variable to reference a function:

def_.one_divided_by_zero(lambda: [
    1 / 0,
    one_divided_by_zero,  # NameError
])
one_divided_by_zero  # NameError

def_(lambda: [
    def_.inner_func(lambda: [
        inner_func  # NameError
    ]),
    inner_func,  # NameError
])

Detailed Compile-time and Runtime Error

lambdex preserves information of source code such as line number or token offsets. The information are used to provide detailed messages when error occurs.

For example, the following code mis-types else_ as els_:

from lambdex import def_

def_(lambda: [
    if_[cond][
        ...
    ].els_[
        ...
    ]
])

which will yield a SyntaxError at compile-time:

Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    def_(lambda: [
  --- Traceback omitted ---
  File "demo.py", line 6
    ].els_[
         ^
SyntaxError: expect 'else_' or 'elif_'

Errors at runtime can also be located to corresponding lines. For example:

from lambdex import def_

def_(lambda: [
    def_(lambda: [
        a < 1 / 0,
        return_[a]
    ])()
])()

will yield:

Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    def_(lambda: [
  File "demo.py", line 4, in anonymous_d598829c
    def_(lambda: [
  File "demo.py", line 5, in anonymous_dc2006c1
    a < 1 / 0,
ZeroDivisionError: division by zero

EDGE CASES

We are going to discuss several edge cases in this section.

Running in an REPL

If you are using an interactive environment (REPL), like IDLE or IPython, you should import the keywords from lambdex.repl:

>>> from lambdex.repl import def_
>>> my_sum = def_(lambda a, b: [
...     return_[a + b]
... ])
...
>>> my_sum(1, 2)
3

The statement should be executed at the beginning to ensure that corresponding patching stuff is enabled.

Currently lambdex has been well tested on 3 REPL environments: the built-in Python REPL, IDLE and IPython (Jupyter). Other REPL may or may not be supported.

Declaration Disambiguity

Suppose you are running the following code:

f1, f2 = def_(lambda a, b: [return_[a + b]]), def_(lambda a, b: [return_[a * b]])

The code yields an exception SyntaxError: ambiguious declaration 'def_'.

What's going on here? The problem is that there are more than one lambdexes defined on the same line. Since CPython provides no effective way but a line number for locating a given lambda, the lambdex compiler fails to obtain the source code of the lambda in this case. A workaround is to prepend an identifier after def_ of lambdex:

f1, f2 = def_.f1(lambda a, b: [return_[a + b]]), def_.f2(lambda a, b: [return_[a * b]])

With this, the compiler can now tell them from each other.

In the example above, it's not necessary to add identifier for both lambdexes. The following is also acceptable, as long as their declarations are different:

f1, f2 = def_.f1(lambda a, b: [return_[a + b]]), def_(lambda a, b: [return_[a * b]])

Runtime Efficiency

The transpilation procedure could be very time-consuming, and thus degrades the runtime efficiency. To solve the problem, lambdex itself provides several mechanisms on different levels for optimizing the bytecodes.

Bytecode Caching

By default, a lambdex defined at a specific location will be compiled only once. The code object of compiled lambdex will be cached and reused in the future execution. Such mechanism applies to lambdexes defined either in a looping or as an inner function, i.e., the two lambdexes below would be compiled only once:

s = 0
for i in range(10000):
    def_(lambda: [          # compiled at i = 0
        global_[s],
        s < s + i,
    ])()

def foo(i):
    return def_(lambda: [   # compiled at the first time `foo()` executed
        global_[s],
        s < s + i,
        return_[s],
    ])

Bytecode Optimization at Function Level

Bytecode caching reduces most of the redundant and heavy jobs, but still has some overhead -- the core of lambdex needs to update some metadata (such as closure cellvars) every time def_ was executed. For example, one may find that the snippet below costs too much time to run (like >3s):

from lambdex import def_
s = 0
def sum():
    n = 1000000
    for i in range(n):
        adder = def_(lambda: [
            global_[s],
            s < s + i
        ])
        adder()
    assert s == n * (n - 1) / 2
sum()

To optimize, one can use the @asmopt decorator:

from lambdex import def_, asmopt
s = 0
@asmopt
def sum():
    n = 1000000
    for i in range(n):
        adder = def_(lambda: [
            global_[s],
            s < s + i
        ])
        adder()
    assert s == n * (n - 1) / 2
sum()

The running time will now reduce to ~0.3s, which is x10 faster and the same as using ordinary functions. The magical @asmopt eliminates def_ calling and directly stores compiled lambdex on sum. It is worth to note that @asmopt should always be the innermost decorator.

Bytecode Optimization at Module Level

The previous mechanism only applies to lambdexes within some functions, and still has some overhead at module initialization phase. Can we do better? Absolutely yes! One can use the # lambdex: modopt directive to optimize the whole module, and persist the optimized bytecode into corresponding .pyc files.

# modopt_demo.py

# the directive could be placed everywhere
# lambdex: modopt
from lambdex import def_
s = 0
n = 1000000
for i in range(n):
    adder = def_(lambda: [
        global_[s],
        s < s + i
    ])
    adder()
assert s == n * (n - 1) / 2
sum()
$ time python3 -m modopt_demo  #  > 3s: 1st time, unoptimized
$ time python3 -m modopt_demo  # ~0.3s: 2nd time and later, optimized
$ time python3 -m modopt_demo  # ~0.3s

Optimized bytecodes will be invalidated when the source file is edited, but be available in the following executions. Thus you can see that the script costs rather long time at first, but becomes efficient afterwards.

It's worth to note that such mechanism is unavailable when you run the file as a script via python3 modopt_demo.py, which is a limitation of CPython. In other cases, such as using python3 -m modopt_demo or importing as a module in other files, the mechanism works well.

Customization

Users are able to customize some aspects of lambdex, in order to fit their preference.

Keyword and Operator Aliasing

If you don't like the default keywords or operators, lambdex allows you to use alternative ones. See the doc for detailed configuration.

Language Extension

lambdex allows you to customize some of the syntax. For how to enable specific extension, please forward to the doc.

Currently the following ones are supported:


await_attribute

With this enabled, you can use Rust-style await expressions.

show code
async_def_(lambda: [
    a.await_.b.await_.c,
])
show equivalent function
async def anonymous():
    (await (await a).b).c

implicit_return

With this enabled, the last statement of a function body will be regarded as the return value.

show code
def_(lambda: [
    1 + 1
])
show equivalent function
def anonymous():
    return 1 + 1

But be careful that this doesn't apply to assignments at the last:

show code
def_(lambda: [
    a < 1
])
show equivalent function
def anonymous():
    a = 1

Code Formatting

The proposed lambdex syntax violates the convention of most code formatters. In order to keep the code tidy, this library provides a light-weight formatter lxfmt for lambdex syntax, which can either work standalonely or cooperate with existing formatters.

Here's an example of what lxfmt does:

show code
from lambdex import def_

def f():
    return def_.myfunc(  # comment1
        lambda a, b:   [# comment2
  if_[condition] [
      f2 < def_(lambda:[a+b]),
      return_[f2],
            ],try_[# comment3
      body,
  ].except_[Exception > e] [
      except_handler
  ] # comment4
  ,         ],# comment5
        )
show code
from lambdex import def_


def f():
    return def_.myfunc(lambda a, b: [  # comment1
        # comment2
        if_[condition] [
            f2 < def_(lambda: [
                a+b
            ]),
            return_[f2],
        ],
        try_ [  # comment3
            body,
        ].except_[Exception > e] [
            except_handler
        ],  # comment4
        # comment5
    ])

Standalone lambdex formatter

The usage of standalone lxfmt is shown below:

usage: lxfmt [-h] [-d | -i | -q] [-p] [files [files ...]]

Default formatter for lambdex

positional arguments:
  files           reads from stdin when no files are specified.

optional arguments:
  -h, --help      show this help message and exit
  -d, --diff      print the diff for the fixed source
  -i, --in-place  make changes to files in place
  -q, --quiet     output nothing and set return value
  -p, --parallel  run in parallel when formatting multiple files.

For example, use lxfmt -i file.py to format in-place, or lxfmt -d file.py to show the difference before and after formatting.

Lambdex formatter as post-processor

lxfmt can work as a post-processor of existing formatter, such as yapf. One can specify a formatter backend by prepending -- -b BACKEND to the command. The overall usage is shown below:

usage: lxfmt [ARGS OF BACKEND] -- [-h] [-b BACKEND] [-e EXECUTABLE]

Lambdex formatter as a post-processor for specific backend

optional arguments:
  -h, --help            show this help message and exit
  -b BACKEND, --backend BACKEND
                        name of formatter backend (default: dummy)
  -e EXECUTABLE, --executable EXECUTABLE
                        executable of backend

Note that [ARGS OF BACKEND] are the arguments fed to the specified backend.

For example, to use lxfmt after yapf, with yapf style configuration at ~/.config/yapf/style, one may use:

lxfmt file.py --style ~/.config/yapf/style -- -b yapf

Currently lxfmt supports only yapf. Adapters for other formatters will be added in the future.

Mocking existing formatter executable

The -- -b BACKEND appears to be verbose, and sometimes you may want to alias the command of "formatter backend + post-processor" to save the typing work. The library provides another tool lxfmt-mock to do the job.

usage: lxfmt-mock [-h] [-r] BACKEND

Mock or reset specified formater backend

positional arguments:
  BACKEND      The backend to be mocked/reset

optional arguments:
  -h, --help   show this help message and exit
  -r, --reset  If specified, the selected command will be reset

For example, running lxfmt-mock yapf, the tool will search for and list out available yapf executables to be mocked:

$ lxfmt-mock yapf
[?] Which one do you want to mock?:
 ❯ /home/me/.local/bin/yapf
   /usr/data/anaconda3/bin/yapf

By choosing a executable, e.g. /home/me/.local/bin/yapf, /home/me/.local/bin/yapf <args> will become a shorthand for lxfmt <args> -- -b yapf -e /home/me/.local/bin/yapf. The original executable will be stored at /home/me/.local/bin/original_yapf.

To reset a mocked executable, simply run lxfmt-mock yapf -r and choose from the list.

Mocking a formatter backend could be very useful when you want to enable lambdex code formatting in your IDE/editor. By mocking the executable your IDE/editor uses, you can enjoy the feature on the fly without modifying any settings.

Known Issues & Future

Currently lambdex doesn't support:

  1. type annotation
  2. import statements

Type annotation [1] and import statements [2] will not be supported.

Lambdexes also violate linters, which is inevitable.

Besides, the upcoming versions will:

  • add style options for lxfmt

, in order to provide a better developing experience.

Q & A

Why using brackets "[]" to enclose statement heads and bodies instead of parentheses "()"?

Brackets are easier to type than parentheses on most of the keyboards.


Why using "<" and ">" for assignment and as?

The design is from three considerations. 1) Comparators such as "<", "<=", ">" or ">=" have lower precedence than most of the other operators, thus allowing R-values without parentheses for most of the time; 2) in AST representation, chained comparators have a flat structure, which is easier to parse; 3) "<" and ">" visually illustrate the direction of data flows.

The preference of "<" ">" over "<=" "=>" is that the previous ones consume only one character and are easier to type.


Why use configuration file based keyword and operator aliasing instead of a programmatic approach?

The design is from two concerns.

  1. A programmatic approach may cause inconsistency at runtime, which is difficult for troubleshooting. For example, if one declares the aliasing in mod/__init__.py and uses the new keywords in mod/A.py, the aliasing works fine if mod/A.py imported as mod.A, but fails if run as a standalone script.
  2. The compiler and formatter should behave consistently when processing the same file. If a programmatic approach used, the formatter must apply semantic analysis to figure out the aliasing rules, which is far more complicated.

Lambdex appears to be less readable than functions and will mess up my code. Why should I use it?

The project is not to criticize the present design of Python, but an experimental attempt to provide alternative for the ones who need a better anonymous function expression. The need may be from a second language they are familiar with, or the paradigms they want to use.

lambdex decides not to perform any modification on the interpreter, but build the new syntax upon existing Python syntax. The choice determines that keywords should be aliased and artifacts like "[]" would appear everywhere, which reduces the readability. To mitigate this, lambdex is paying effort to make the lambdex syntax resemble the Python syntax.

It is true that there's a trade-off between readability and functionality. The decision should depend on your own requirements.


What's the magic behind lambdex?

def_ or def_.<ident> are actually callables, which take a lambda object as input, transpile it into an ordinary function and then return. The definition is in lambdex.keywords module.

The transpilation process can be roughly separated into three stages.

In the first stage, we try to find the source code of given lambda object and parse it into AST. Source code searching is performed by lambdex.utils.ast::ast_from_source, which is modified from inspect.getsourcelines to work more robust on lambdas. The obtained source is parsed into AST, which then pattern-matched to locate the Lambda node. The entry of this stage is lambdex.ast_parser::lambda_to_ast.

In the second stage, we traverse the AST of lambda object, replacing some node patterns with correponding Python statements, recursively building up the body of new function. Inner lambdexes are detached from where they lay, and become nested functions within the constructed body. This part is at lambdex.compiler.dispatcher and lambdex.compiler.rules.

In the last stage, we compile the new AST into code object, restore metadata (globals, closures, etc.) from the original lambda object, and wrap it by a Function object. Modification might be applied to AST to correct the compilation result, e.g., wrap AST in a dummy function to make specific names nonlocal instead of global. This is done in lambdex.compiler.core::compile_lambdex. Bytecode caching also happens in this stage.

For better understanding, you may look into the source code and check the detailed implementation.

License

Copyright (c) 2021 Jingyi Xie (hsfzxjy). Licensed under the GNU General Public License version 3.