Port FlatBuffers to Python. #112

rw · 2014-12-23T00:00:06Z

Implement code generation and runtime library for Python 2 and 3, derived
from the Go implementation. Additionally, the test suite verifies:

the exact bytes in the Builder buffer during many scenarios,
vtable deduplication, and
table construction, via a fuzzer derived from the Go implementation.

rw · 2014-12-23T00:01:52Z

This is a feature-complete Python port, derived from the Go version. I'm not trying to collide with other PRs, but instead provide more options for whatever ends up being merged.

rw · 2014-12-23T00:36:53Z

Please provide feedback on my use of ctypes; it seems mostly unnecessary now but I would benefit from additional opinions.

shaxbee · 2014-12-23T06:33:44Z

I've tried ctypes and ended up using struct because it enforces value ranges automatically, takes care of endianess and returns native python data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
   ....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
   ....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.

rw · 2014-12-23T07:23:41Z

Nice microbenchmarks.

I am personally more interested in correctness than speed at this point,
hence my emphasis on quite a bit of testing.
On Dec 22, 2014 10:33 PM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

I've tried ctypes and ended up using struct because it enforces value
ranges automatically, takes care of endianess and returns native python
data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.

—
Reply to this email directly or view it on GitHub
#112 (comment).

rw · 2014-12-23T07:49:12Z

Also, what do you mean by steal?
On Dec 22, 2014 11:23 PM, "Robert Winslow" me@rwinslow.com wrote:

Nice microbenchmarks.

I am personally more interested in correctness than speed at this point,
hence my emphasis on quite a bit of testing.
On Dec 22, 2014 10:33 PM, "Zbigniew Mandziejewicz" <
notifications@github.com> wrote:

I've tried ctypes and ended up using struct because it enforces value
ranges automatically, takes care of endianess and returns native python
data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.

—
Reply to this email directly or view it on GitHub
#112 (comment).

shaxbee · 2014-12-23T11:45:09Z

If you don't mind I'll use parts of your builder implementation and put datatype definitions in separate module as you did and remove hardcoded datatype sizes.

rw · 2014-12-23T11:54:34Z

I'd rather you keep your PR at its current scope..."lifting" code is not
the kind way to do things.
On Dec 23, 2014 6:45 AM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

If you don't mind I'll use parts of your builder implementation and put
datatype definitions in separate module as you did and remove hardcoded
datatype sizes.

—
Reply to this email directly or view it on GitHub
#112 (comment).

shaxbee · 2014-12-23T12:00:59Z

I don't understand - you've mentioned combining my PR with yours in #110. I'll go ahead and provide own implementation of Builder then...

rw · 2014-12-23T12:24:38Z

I think we are both miscommunicating. Reconciling our efforts is my goal.
I'd rather not have flames here :-)
On Dec 23, 2014 7:01 AM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

I don't understand - you've mentioned combining my PR with yours in #110
#110. I'll go ahead and
provide own implementation of Builder then...

—
Reply to this email directly or view it on GitHub
#112 (comment).

shaxbee · 2014-12-23T12:36:13Z

I think we are talking about same thing then :-)

rw · 2014-12-28T04:30:31Z

Where do our implementations overlap?

shaxbee · 2015-01-02T18:30:56Z

They don't seem to overlap that much as #110 is based on C++ API and uses struct / array heavily.
My focus was on performance and therefore I reduced the scope of v1 features to reading.

rw · 2015-01-02T18:36:57Z

That's important, but not as much as architecture choices. I'm more comfortable with the very strict byte-layout focused test suite #112 is based on. @shaxbee feel like working with me on a joint PR?

Anyone else--thoughts?

shaxbee · 2015-01-02T18:59:28Z

Sure. How do you want to do that? Merge tests and builder codegen from #112 to #110?

ghost · 2015-01-06T00:57:06Z

A joint PR would be great :)

shaxbee · 2015-01-06T15:16:10Z

@rw: I've merged encode / numtypes code, using struct module but keeping the methods. Could we sync via email?

rw · 2015-01-17T20:55:33Z

@shaxbee Yes, please email me. Address is on my profile page. :-)

rw · 2015-03-04T10:48:48Z

Beta testers: please evaluate this PR for merging.

I force-pushed some updates:

Add read benchmark for the gold example data (currently at 2100+ traversals/sec).
Add write benchmark for the gold example data (currently at 1300+ builds/sec).
Add the PythonUsage.md documentation file.
Include dependencies when installing with setup.py install.
Add code coverage reports to the test runner (currently reports 87%).
Use faster struct packing.

This branch has feature parity with the Java and Go versions, supports both Python 2 and 3, and is thoroughly tested.

gkogan · 2015-03-05T00:40:14Z

Looks good to me.

jordan52 · 2015-03-07T03:00:15Z

this looks to be in really good shape.

rw · 2015-03-08T06:41:40Z

Reified exceptions into their own types (in the exceptions.py file).
Sped up read traversals by 50%, to 3000/sec, by removing a supermethod call in the packer.py hotpath.

rw · 2015-03-09T06:12:35Z

Increase code coverage to 97%: Add cases to generate and test conditionals not traversed with the 'gold' example Monster data.

shaxbee · 2015-03-09T06:15:30Z

+1 for this PR, i don't have capacity at the moment to finish up #110.

On Mon, 9 Mar 2015 at 14:12 Robert notifications@github.com wrote:

Increase code coverage to 97%: Add cases to generate and test conditionals
not traversed with the 'gold' example Monster data.

—
Reply to this email directly or view it on GitHub
#112 (comment).

ghost · 2015-04-08T18:35:01Z

Robert, do you want to do more to this commit, or is it "good enough" for now?

rw · 2015-04-08T19:28:09Z

@gwvo I'll address your comments in this thread, squash, and this should be good to go. I'll comment here when that's ready. Ideally we'd do a round of speed work afterwards, similar to #165.

rw · 2015-04-12T09:28:56Z

Pushed some updates:

Rebase on top of latest master.
Update use of GenComment to match the new type signature.
Squash commits.
Comment out the 'typed Python'-style asserts.

ghost · 2015-04-13T16:51:15Z

ready then? :)

rw · 2015-04-13T17:39:35Z

@gwvo Yep!

ghost · 2015-04-13T17:54:41Z

Python2 tests:
Traceback (most recent call last):
File "py_test.py", line 17, in
import MyGame.Example.Any # refers to generated code
File "/usr/local/google/home/wvo/rep/vendor/unbundled_google/libs/flatbuffers/tests/MyGame/Example/Any.py", line 5, in
from enum import Enum
ImportError: No module named enum

rw · 2015-04-17T20:56:49Z

Improved compatibility:

Remove all external dependencies for the runtime library and generated code.
Add Python 2.6 compatibility. CPython 2.6, 2.7, 3+, and PyPy are now passing, using the same generated code.

ghost · 2015-04-20T22:21:14Z

Still get this: ImportError: No module named enum in __init__.py on all 3 tests. My actual Python version is 2.7.6. On Linux.

Also maybe have the files generates in tests rather than tests/py_gen, that way they line up with Java/C#.

rw · 2015-04-21T08:44:43Z

More tweaks:

Lift generated Python files into tests/MyGame, like we do for the other ports.
Tests will run if at least one Python interpreter is found.
Tests will run if no coverage utility is found.
Disable benchmarks by default.

rw · 2015-04-22T09:54:33Z

Now we disable Go/Java comparison checks by default.

layzerar · 2015-04-29T07:05:08Z

@rw I found a little bug in the setup script, the following patch should fix.

 python/setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/setup.py b/python/setup.py
index f067b38..44f5ee7 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -8,7 +8,7 @@ setup(
     author_email='me@rwinslow.com',
     url='https://github.com/python/flatbuffers/python',
     long_description='Python runtime library and code generator for use with the Flatbuffers serialization format.',
-    packages=['flatbuffers'],
+    packages=['flatbuffers', 'flatbuffers.vendor'],
     include_package_data=True,
     requires=[],
     description='Runtime library and code generator for use with the Flatbuffers serialization format.',

layzerar · 2015-04-29T07:06:12Z

After reinstall this package, everything is OK for me.

rw · 2015-04-29T07:16:43Z

Thanks @layzerar! Pushed.

layzerar · 2015-05-11T12:29:51Z

Life is short, how about this?

rw · 2015-05-11T18:52:27Z

@layzerar Almost there, incorporating some out-of-band feedback.

Implement code generation and self-contained runtime library for Python. The test suite verifies: - Correctness of generated Python code by comparing output to that of the other language ports. - The exact bytes in the Builder buffer during many scenarios. - Vtable deduplication correctness. - Edge cases for table construction, via a fuzzer derived from the Go implementation. - All code is simultaneously valid in Python 2.6, 2.7, and 3.4. The test suite includes benchmarks for: - Building 'gold' data. - Parsing 'gold' data. - Deduplicating vtables. All tests pass on this author's system for the following Python implementations: - CPython 2.6.7 - CPython 2.7.8 - CPython 3.4.2 - PyPy 2.5.0 (CPython 2.7.8 compatible)

rw · 2015-05-12T22:44:47Z

Factored out some duplicated code, made the runtime library PEP8-compliant, and made a number of other stylistic fixes. The structure of the code has not changed.

This is looking OK to me. Anyone else want to give feedback?

layzerar · 2015-05-13T08:06:03Z

flatc failed to compile in VS2013, error log is here:

1>------ Rebuild All started: Project: flatc, Configuration: Release Win32 ------
2>------ Rebuild All started: Project: flatsamplebinary, Configuration: Release Win32 ------
3>------ Rebuild All started: Project: flatsampletext, Configuration: Release Win32 ------
4>------ Rebuild All started: Project: flattests, Configuration: Release Win32 ------
1>  idl_gen_fbs.cpp
2>  sample_binary.cpp
4>  idl_gen_fbs.cpp
3>  idl_parser.cpp
1>  idl_gen_general.cpp
4>  idl_gen_general.cpp
3>  idl_gen_text.cpp
1>  idl_gen_go.cpp
4>  idl_parser.cpp
3>  sample_text.cpp
2>  flatsamplebinary.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flatsamplebinary.exe
1>  idl_gen_python.cpp
3>  Generating Code...
4>  idl_gen_text.cpp
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IRWXU' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IRGRP' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IXGRP' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IROTH' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IXOTH' : undeclared identifier
1>  idl_parser.cpp
4>  test.cpp
1>  idl_gen_cpp.cpp
4>  Generating Code...
1>..\..\src\idl_gen_cpp.cpp(276): error C2220: warning treated as error - no 'object' file generated
1>..\..\src\idl_gen_cpp.cpp(276): warning C4189: 'nested_root' : local variable is initialized but not referenced
1>  idl_gen_text.cpp
3>  flatsampletext.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flatsampletext.exe
1>  flatc.cpp
1>  Generating Code...
4>  flattests.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flattests.exe
========== Rebuild All: 3 succeeded, 1 failed, 0 skipped ==========

py_test.py run with following error (Python2.6) :

Traceback (most recent call last):
  File "C:\Users\layz\Desktop\flatbuffers\flatbuffers\tests\py_test.py", line 1331, in <module>
    main()
  File "C:\Users\layz\Desktop\flatbuffers\flatbuffers\tests\py_test.py", line 1294, in main
    ('<benchmark read count> <benchmark build count>')
TypeError: 'str' object is not callable

this patch would fix:

 tests/py_test.py | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/py_test.py b/tests/py_test.py
index c6861f1..2934394 100644
--- a/tests/py_test.py
+++ b/tests/py_test.py
@@ -1290,13 +1290,13 @@ def main():
     import os
     import sys
     if not len(sys.argv) == 4:
-       sys.stderr.write(('Usage: %s <benchmark vtable count>')
-                        ('<benchmark read count> <benchmark build count>')
-                        ('\n' % sys.argv[0]))
-       sys.stderr.write(('       Provide COMPARE_GENERATED_TO_GO=1   to check')
-                        ('for bytewise comparison to Go data.\n'))
-       sys.stderr.write(('       Provide COMPARE_GENERATED_TO_JAVA=1 to check')
-                        ('for bytewise comparison to Java data.\n'))
+       sys.stderr.write('Usage: %s <benchmark vtable count>'
+                        '<benchmark read count> <benchmark build count>'
+                        '\n' % sys.argv[0])
+       sys.stderr.write('       Provide COMPARE_GENERATED_TO_GO=1   to check'
+                        'for bytewise comparison to Go data.\n')
+       sys.stderr.write('       Provide COMPARE_GENERATED_TO_JAVA=1 to check'
+                        'for bytewise comparison to Java data.\n')
        sys.stderr.flush()
        sys.exit(1)

ghost · 2015-05-13T17:00:56Z

For S_IRWXU etc: make sure you use the mkdir related function in util.h instead.
The nested_root thing I can fix.

rw · 2015-05-13T17:51:45Z

@layzerar Thanks, fixed string catenation in py_test.py's error message.

rw · 2015-05-13T17:55:27Z

@gwvo Switched from mkdir to EnsureDirExists (from util.h).

ghost · 2015-05-13T18:18:30Z

Ok, merge at will :)

Port FlatBuffers to Python.

layzerar · 2015-05-14T02:15:50Z

Briliant!

rw · 2015-05-15T03:07:24Z

We've got it on PyPI:
$ pip install flatbuffers

Downchuck · 2015-09-25T17:48:30Z

Is ./flatc still needed to do code gen? I'm hitting issues with its python output.
EDIT: My bad -- looks like "namespace" is required when generating python output.

shaxbee · 2015-09-27T10:39:32Z

@rw Congrats on getting this through! :-)

…k_multi_repos support multi repositories in github webhook

rw mentioned this pull request Dec 23, 2014

Python flatbuffers read support #110

Closed

ghost force-pushed the master branch from f177663 to b929c62 Compare January 7, 2015 23:51

shaxbee added a commit to shaxbee/flatbuffers that referenced this pull request Jan 29, 2015

Merging with PR google#112

09c6913

fix string catenation

2746aab

use EnsureDirExists from util.h instead of mkdir

776c4eb

rw added a commit that referenced this pull request May 13, 2015

Merge pull request #112 from rw/python23-read-write-gen-port

f8139b0

Port FlatBuffers to Python.

rw merged commit f8139b0 into google:master May 13, 2015

kakikubo pushed a commit to kakikubo/flatbuffers that referenced this pull request Apr 19, 2016

Merge pull request google#112 from kiyoto-suzuki/feature/github_weboo…

d1d5fea

…k_multi_repos support multi repositories in github webhook

rw mentioned this pull request Jul 3, 2019

Contributing Golang and Rust implementations zarr-developers/community#5

Open

Port FlatBuffers to Python. #112

Port FlatBuffers to Python. #112

Conversation

rw commented Dec 23, 2014

rw commented Dec 23, 2014

rw commented Dec 23, 2014

shaxbee commented Dec 23, 2014

rw commented Dec 23, 2014

rw commented Dec 23, 2014

shaxbee commented Dec 23, 2014

rw commented Dec 23, 2014

shaxbee commented Dec 23, 2014

rw commented Dec 23, 2014

shaxbee commented Dec 23, 2014

rw commented Dec 28, 2014

shaxbee commented Jan 2, 2015

rw commented Jan 2, 2015

shaxbee commented Jan 2, 2015

ghost commented Jan 6, 2015

shaxbee commented Jan 6, 2015

rw commented Jan 17, 2015

rw commented Mar 4, 2015

gkogan commented Mar 5, 2015

jordan52 commented Mar 7, 2015

rw commented Mar 8, 2015

rw commented Mar 9, 2015

shaxbee commented Mar 9, 2015

ghost commented Apr 8, 2015

rw commented Apr 8, 2015

rw commented Apr 12, 2015

ghost commented Apr 13, 2015

rw commented Apr 13, 2015

ghost commented Apr 13, 2015

rw commented Apr 17, 2015

ghost commented Apr 20, 2015

rw commented Apr 21, 2015

rw commented Apr 22, 2015

layzerar commented Apr 29, 2015

layzerar commented Apr 29, 2015

rw commented Apr 29, 2015

layzerar commented May 11, 2015

rw commented May 11, 2015

rw commented May 12, 2015

layzerar commented May 13, 2015

ghost commented May 13, 2015

rw commented May 13, 2015

rw commented May 13, 2015

ghost commented May 13, 2015

layzerar commented May 14, 2015

rw commented May 15, 2015

Downchuck commented Sep 25, 2015

shaxbee commented Sep 27, 2015