How Pony (ORM) does its tricks?

Question

Pony ORM does the nice trick of converting a generator expression into SQL. Example:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

I know Python has wonderful introspection and metaprogramming builtin, but how this library is able to translate the generator expression without preprocessing? It looks like magic.

[update]

Blender wrote:

Here is the file that you're after. It seems to reconstruct the generator using some introspection wizardry. I'm not sure if it supports 100% of Python's syntax, but this is pretty cool. – Blender

I was thinking they were exploring some feature from the generator expression protocol, but looking this file, and seeing the ast module involved... No, they are not inspecting the program source on the fly, are they? Mind-blowing...

@BrenBarn: If I try to call the generator outside the select function call, the result is:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

Seems like they are doing more arcane incantations like inspecting the select function call and processing the Python abstract syntax grammar tree on the fly.

I still would like to see someone explaining it, the source is way beyond my wizardry level.

Presumably the p object is an object of a type implemented by Pony that looks at what methods/properties are being accessed on it (e.g., name, startswith) and converts them to SQL. — BrenBarn, Commented Apr 20, 2013 at 2:02
Here is the file that you're after. It seems to reconstruct the generator using some introspection wizardry. I'm not sure if it supports 100% of Python's syntax, but this is pretty cool. — Blender, Commented Apr 20, 2013 at 2:03
@Blender: I've seen this kind of trick in LISP - pulling this stunt in Python is just plain sick! — Paulo Scardine, Commented Apr 20, 2013 at 3:00

Alexander Kozlovsky · Accepted Answer · 2013-04-22 17:37:38Z

Pony ORM author is here.

Pony translates Python generator into SQL query in three steps:

Decompiling of generator bytecode and rebuilding generator AST (abstract syntax tree)
Translation of Python AST into "abstract SQL" -- universal list-based representation of a SQL query
Converting abstract SQL representation into specific database-dependent SQL dialect

The most complex part is the second step, where Pony must understand the "meaning" of Python expressions. Seems you are most interested in the first step, so let me explain how decompiling works.

Let's consider this query:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

Which will be translated into the following SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

And below is the result of this query which will be printed out:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |[email protected]   |***     |John Smith    |USA    |address 1
2 |[email protected]|***     |Matthew Reed  |USA    |address 2
4 |[email protected]|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a python generator as argument, and then analyzes its bytecode. We can get bytecode instructions of this generator using standard python dis module:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM has the function decompile() within module pony.orm.decompiling which can restore an AST from the bytecode:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

Let's now see how the decompile() function works.

The decompile() function creates a Decompiler object, which implements the Visitor pattern. The decompiler instance gets bytecode instructions one-by-one. For each instruction the decompiler object calls its own method. The name of this method is equal to the name of current bytecode instruction.

When Python calculates an expression, it uses stack, which stores an intermediate result of calculation. The decompiler object also has its own stack, but this stack stores not the result of expression calculation, but AST node for the expression.

When decompiler method for the next bytecode instruction is called, it takes AST nodes from the stack, combines them into a new AST node, and then puts this node on the top of the stack.

For example, let's see how the subexpression c.country == 'USA' is calculated. The corresponding bytecode fragment is:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

So, the decompiler object does the following:

Calls decompiler.LOAD_FAST('c'). This method puts the Name('c') node on the top of the decompiler stack.
Calls decompiler.LOAD_ATTR('country'). This method takes the Name('c') node from the stack, creates the Geattr(Name('c'), 'country') node and puts it on the top of the stack.
Calls decompiler.LOAD_CONST('USA'). This method puts the Const('USA') node on top of the stack.
Calls decompiler.COMPARE_OP('=='). This method takes two nodes (Getattr and Const) from the stack, and then puts Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on the top of the stack.

After all bytecode instructions are processed, the decompiler stack contains a single AST node which corresponds to the whole generator expression.

Since Pony ORM needs to decompile generators and lambdas only, this is not that complex, because the instruction flow for a generator is relatively straightforward - it is just a bunch of nested loops.

Currently Pony ORM covers the whole generator instructions set except two things:

Inline if expressions: a if b else c
Compound comparisons: a < b < c

If Pony encounters such expression it raises the NotImplementedError exception. But even in this case you can make it work by passing the generator expression as a string. When you pass a generator as a string Pony doesn't use the decompiler module. Instead it gets the AST using the standard Python compiler.parse function.

Hope this answers your question.

Very performant: (1) Bytecode decompiling is very fast. (2) Since each query has corresponding code object, this code object can be used as a cache key. Because of this, Pony ORM translates each query only once, whereas Django and SQLAlchemy have to translate the same query again and again. (3) As Pony ORM uses IdentityMap pattern, it caches query results within the same transaction. There is a post (in russian) where author states that Pony ORM turned out to be 1.5-3 times faster than Django and SQLAlchemy even without query result caching: habrahabr.ru/post/188842 — Alexander Kozlovsky, Commented Feb 27, 2014 at 19:36
I don't tested it, but some Reddit commenter says it is compatible: tinyurl.com/ponyorm-pypy — Alexander Kozlovsky, Commented Feb 28, 2014 at 10:06
SQLAlchemy has query caching and the ORM makes extensive use of this feature. It isn't on by default because its true we don't have a feature in place to link the construction of a SQL expression to the position in source code which it is declared, which is what the code object is really giving you. We could use stack frame inspection to get the same result but that's just a little too hacky for my tastes. Generation of SQL is the least critical performance area in any case; fetching rows and bookkeeping changes is. — zzzeek, Commented Apr 8, 2014 at 4:13
@randomsurfer_123 probably not, we just need some time to implement it (maybe a week), and there are other tasks that are more important to us. — Alexander Kozlovsky, Commented Mar 18, 2016 at 10:43

Collectives™ on Stack Overflow

How Pony (ORM) does its tricks?

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged
python
orm
metaprogramming
dsl
ponyorm
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged pythonormmetaprogrammingdslponyorm or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
orm
metaprogramming
dsl
ponyorm
or ask your own question.