Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port FlatBuffers to Python. #112

Merged
merged 3 commits into from
May 13, 2015
Merged

Port FlatBuffers to Python. #112

merged 3 commits into from
May 13, 2015

Conversation

rw
Copy link
Collaborator

@rw rw commented Dec 23, 2014

Implement code generation and runtime library for Python 2 and 3, derived
from the Go implementation. Additionally, the test suite verifies:

the exact bytes in the Builder buffer during many scenarios,
vtable deduplication, and
table construction, via a fuzzer derived from the Go implementation.

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

This is a feature-complete Python port, derived from the Go version. I'm not trying to collide with other PRs, but instead provide more options for whatever ends up being merged.

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

Please provide feedback on my use of ctypes; it seems mostly unnecessary now but I would benefit from additional opinions.

@shaxbee
Copy link
Contributor

shaxbee commented Dec 23, 2014

I've tried ctypes and ended up using struct because it enforces value ranges automatically, takes care of endianess and returns native python data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
   ....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
   ....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

Nice microbenchmarks.

I am personally more interested in correctness than speed at this point,
hence my emphasis on quite a bit of testing.
On Dec 22, 2014 10:33 PM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

I've tried ctypes and ended up using struct because it enforces value
ranges automatically, takes care of endianess and returns native python
data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.


Reply to this email directly or view it on GitHub
#112 (comment).

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

Also, what do you mean by steal?
On Dec 22, 2014 11:23 PM, "Robert Winslow" me@rwinslow.com wrote:

Nice microbenchmarks.

I am personally more interested in correctness than speed at this point,
hence my emphasis on quite a bit of testing.
On Dec 22, 2014 10:33 PM, "Zbigniew Mandziejewicz" <
notifications@github.com> wrote:

I've tried ctypes and ended up using struct because it enforces value
ranges automatically, takes care of endianess and returns native python
data types. Also it seems to be much faster:

ctypes:

%%timeit raw = struct.pack('<i', 534217); head = 0
n = 0
n |= raw[head]
n |= raw[head + 1] << 8
n |= raw[head + 2] << 16
n |= raw[head + 3] << 24
ctypes.c_int32(n).value
....:
1000000 loops, best of 3: 994 ns per loop

struct:

%%timeit raw = struct.pack('<i', 534217); fmt = struct.Struct('<i'); offset = 0
fmt.unpack_from(raw, offset)[0]
....:
1000000 loops, best of 3: 268 ns per loop

I'll steal the idea of explicit type definitions though.


Reply to this email directly or view it on GitHub
#112 (comment).

@shaxbee
Copy link
Contributor

shaxbee commented Dec 23, 2014

If you don't mind I'll use parts of your builder implementation and put datatype definitions in separate module as you did and remove hardcoded datatype sizes.

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

I'd rather you keep your PR at its current scope..."lifting" code is not
the kind way to do things.
On Dec 23, 2014 6:45 AM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

If you don't mind I'll use parts of your builder implementation and put
datatype definitions in separate module as you did and remove hardcoded
datatype sizes.


Reply to this email directly or view it on GitHub
#112 (comment).

@shaxbee
Copy link
Contributor

shaxbee commented Dec 23, 2014

I don't understand - you've mentioned combining my PR with yours in #110. I'll go ahead and provide own implementation of Builder then...

@rw
Copy link
Collaborator Author

rw commented Dec 23, 2014

I think we are both miscommunicating. Reconciling our efforts is my goal.
I'd rather not have flames here :-)
On Dec 23, 2014 7:01 AM, "Zbigniew Mandziejewicz" notifications@github.com
wrote:

I don't understand - you've mentioned combining my PR with yours in #110
#110. I'll go ahead and
provide own implementation of Builder then...


Reply to this email directly or view it on GitHub
#112 (comment).

@shaxbee
Copy link
Contributor

shaxbee commented Dec 23, 2014

I think we are talking about same thing then :-)

@rw
Copy link
Collaborator Author

rw commented Dec 28, 2014

Where do our implementations overlap?

@shaxbee
Copy link
Contributor

shaxbee commented Jan 2, 2015

They don't seem to overlap that much as #110 is based on C++ API and uses struct / array heavily.
My focus was on performance and therefore I reduced the scope of v1 features to reading.

@rw
Copy link
Collaborator Author

rw commented Jan 2, 2015

That's important, but not as much as architecture choices. I'm more comfortable with the very strict byte-layout focused test suite #112 is based on. @shaxbee feel like working with me on a joint PR?

Anyone else--thoughts?

@shaxbee
Copy link
Contributor

shaxbee commented Jan 2, 2015

Sure. How do you want to do that? Merge tests and builder codegen from #112 to #110?

@ghost
Copy link

ghost commented Jan 6, 2015

A joint PR would be great :)

@shaxbee
Copy link
Contributor

shaxbee commented Jan 6, 2015

@rw: I've merged encode / numtypes code, using struct module but keeping the methods. Could we sync via email?

@rw
Copy link
Collaborator Author

rw commented Jan 17, 2015

@shaxbee Yes, please email me. Address is on my profile page. :-)

shaxbee added a commit to shaxbee/flatbuffers that referenced this pull request Jan 29, 2015
@rw
Copy link
Collaborator Author

rw commented Mar 4, 2015

Beta testers: please evaluate this PR for merging.

I force-pushed some updates:

  1. Add read benchmark for the gold example data (currently at 2100+ traversals/sec).
  2. Add write benchmark for the gold example data (currently at 1300+ builds/sec).
  3. Add the PythonUsage.md documentation file.
  4. Include dependencies when installing with setup.py install.
  5. Add code coverage reports to the test runner (currently reports 87%).
  6. Use faster struct packing.

This branch has feature parity with the Java and Go versions, supports both Python 2 and 3, and is thoroughly tested.

@gkogan
Copy link

gkogan commented Mar 5, 2015

Looks good to me.

@jordan52
Copy link

jordan52 commented Mar 7, 2015

this looks to be in really good shape.

@rw
Copy link
Collaborator Author

rw commented Mar 8, 2015

  • Reified exceptions into their own types (in the exceptions.py file).
  • Sped up read traversals by 50%, to 3000/sec, by removing a supermethod call in the packer.py hotpath.

@rw
Copy link
Collaborator Author

rw commented Mar 9, 2015

Increase code coverage to 97%: Add cases to generate and test conditionals not traversed with the 'gold' example Monster data.

@shaxbee
Copy link
Contributor

shaxbee commented Mar 9, 2015

+1 for this PR, i don't have capacity at the moment to finish up #110.

On Mon, 9 Mar 2015 at 14:12 Robert notifications@github.com wrote:

Increase code coverage to 97%: Add cases to generate and test conditionals
not traversed with the 'gold' example Monster data.


Reply to this email directly or view it on GitHub
#112 (comment).

@ghost
Copy link

ghost commented Apr 8, 2015

Robert, do you want to do more to this commit, or is it "good enough" for now?

@rw
Copy link
Collaborator Author

rw commented Apr 8, 2015

@gwvo I'll address your comments in this thread, squash, and this should be good to go. I'll comment here when that's ready. Ideally we'd do a round of speed work afterwards, similar to #165.

@rw
Copy link
Collaborator Author

rw commented Apr 12, 2015

Pushed some updates:

  1. Rebase on top of latest master.
  2. Update use of GenComment to match the new type signature.
  3. Squash commits.
  4. Comment out the 'typed Python'-style asserts.

@ghost
Copy link

ghost commented Apr 13, 2015

ready then? :)

@rw
Copy link
Collaborator Author

rw commented Apr 13, 2015

@gwvo Yep!

@ghost
Copy link

ghost commented Apr 13, 2015

Python2 tests:
Traceback (most recent call last):
File "py_test.py", line 17, in
import MyGame.Example.Any # refers to generated code
File "/usr/local/google/home/wvo/rep/vendor/unbundled_google/libs/flatbuffers/tests/MyGame/Example/Any.py", line 5, in
from enum import Enum
ImportError: No module named enum

@rw
Copy link
Collaborator Author

rw commented Apr 17, 2015

Improved compatibility:

  • Remove all external dependencies for the runtime library and generated code.
  • Add Python 2.6 compatibility. CPython 2.6, 2.7, 3+, and PyPy are now passing, using the same generated code.

@ghost
Copy link

ghost commented Apr 20, 2015

Still get this: ImportError: No module named enum in __init__.py on all 3 tests. My actual Python version is 2.7.6. On Linux.

Also maybe have the files generates in tests rather than tests/py_gen, that way they line up with Java/C#.

@rw
Copy link
Collaborator Author

rw commented Apr 21, 2015

More tweaks:

Lift generated Python files into tests/MyGame, like we do for the other ports.
Tests will run if at least one Python interpreter is found.
Tests will run if no coverage utility is found.
Disable benchmarks by default.

@rw
Copy link
Collaborator Author

rw commented Apr 22, 2015

Now we disable Go/Java comparison checks by default.

@layzerar
Copy link
Contributor

@rw I found a little bug in the setup script, the following patch should fix.

 python/setup.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/setup.py b/python/setup.py
index f067b38..44f5ee7 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -8,7 +8,7 @@ setup(
     author_email='me@rwinslow.com',
     url='https://github.com/python/flatbuffers/python',
     long_description='Python runtime library and code generator for use with the Flatbuffers serialization format.',
-    packages=['flatbuffers'],
+    packages=['flatbuffers', 'flatbuffers.vendor'],
     include_package_data=True,
     requires=[],
     description='Runtime library and code generator for use with the Flatbuffers serialization format.',

@layzerar
Copy link
Contributor

After reinstall this package, everything is OK for me.

@rw
Copy link
Collaborator Author

rw commented Apr 29, 2015

Thanks @layzerar! Pushed.

@layzerar
Copy link
Contributor

Life is short, how about this?

@rw
Copy link
Collaborator Author

rw commented May 11, 2015

@layzerar Almost there, incorporating some out-of-band feedback.

Implement code generation and self-contained runtime library for Python.

The test suite verifies:
  - Correctness of generated Python code by comparing output to that of
    the other language ports.
  - The exact bytes in the Builder buffer during many scenarios.
  - Vtable deduplication correctness.
  - Edge cases for table construction, via a fuzzer derived from the Go
    implementation.
  - All code is simultaneously valid in Python 2.6, 2.7, and 3.4.

The test suite includes benchmarks for:
  - Building 'gold' data.
  - Parsing 'gold' data.
  - Deduplicating vtables.

All tests pass on this author's system for the following Python
implementations:
  - CPython 2.6.7
  - CPython 2.7.8
  - CPython 3.4.2
  - PyPy 2.5.0 (CPython 2.7.8 compatible)
@rw
Copy link
Collaborator Author

rw commented May 12, 2015

Factored out some duplicated code, made the runtime library PEP8-compliant, and made a number of other stylistic fixes. The structure of the code has not changed.

This is looking OK to me. Anyone else want to give feedback?

@layzerar
Copy link
Contributor

  • flatc failed to compile in VS2013, error log is here:
1>------ Rebuild All started: Project: flatc, Configuration: Release Win32 ------
2>------ Rebuild All started: Project: flatsamplebinary, Configuration: Release Win32 ------
3>------ Rebuild All started: Project: flatsampletext, Configuration: Release Win32 ------
4>------ Rebuild All started: Project: flattests, Configuration: Release Win32 ------
1>  idl_gen_fbs.cpp
2>  sample_binary.cpp
4>  idl_gen_fbs.cpp
3>  idl_parser.cpp
1>  idl_gen_general.cpp
4>  idl_gen_general.cpp
3>  idl_gen_text.cpp
1>  idl_gen_go.cpp
4>  idl_parser.cpp
3>  sample_text.cpp
2>  flatsamplebinary.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flatsamplebinary.exe
1>  idl_gen_python.cpp
3>  Generating Code...
4>  idl_gen_text.cpp
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IRWXU' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IRGRP' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IXGRP' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IROTH' : undeclared identifier
1>..\..\src\idl_gen_python.cpp(577): error C2065: 'S_IXOTH' : undeclared identifier
1>  idl_parser.cpp
4>  test.cpp
1>  idl_gen_cpp.cpp
4>  Generating Code...
1>..\..\src\idl_gen_cpp.cpp(276): error C2220: warning treated as error - no 'object' file generated
1>..\..\src\idl_gen_cpp.cpp(276): warning C4189: 'nested_root' : local variable is initialized but not referenced
1>  idl_gen_text.cpp
3>  flatsampletext.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flatsampletext.exe
1>  flatc.cpp
1>  Generating Code...
4>  flattests.vcxproj -> C:\Users\layz\Desktop\flatbuffers\flatbuffers\build\VS2010\Release\flattests.exe
========== Rebuild All: 3 succeeded, 1 failed, 0 skipped ==========
  • py_test.py run with following error (Python2.6) :
Traceback (most recent call last):
  File "C:\Users\layz\Desktop\flatbuffers\flatbuffers\tests\py_test.py", line 1331, in <module>
    main()
  File "C:\Users\layz\Desktop\flatbuffers\flatbuffers\tests\py_test.py", line 1294, in main
    ('<benchmark read count> <benchmark build count>')
TypeError: 'str' object is not callable
  • this patch would fix:
 tests/py_test.py | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/py_test.py b/tests/py_test.py
index c6861f1..2934394 100644
--- a/tests/py_test.py
+++ b/tests/py_test.py
@@ -1290,13 +1290,13 @@ def main():
     import os
     import sys
     if not len(sys.argv) == 4:
-       sys.stderr.write(('Usage: %s <benchmark vtable count>')
-                        ('<benchmark read count> <benchmark build count>')
-                        ('\n' % sys.argv[0]))
-       sys.stderr.write(('       Provide COMPARE_GENERATED_TO_GO=1   to check')
-                        ('for bytewise comparison to Go data.\n'))
-       sys.stderr.write(('       Provide COMPARE_GENERATED_TO_JAVA=1 to check')
-                        ('for bytewise comparison to Java data.\n'))
+       sys.stderr.write('Usage: %s <benchmark vtable count>'
+                        '<benchmark read count> <benchmark build count>'
+                        '\n' % sys.argv[0])
+       sys.stderr.write('       Provide COMPARE_GENERATED_TO_GO=1   to check'
+                        'for bytewise comparison to Go data.\n')
+       sys.stderr.write('       Provide COMPARE_GENERATED_TO_JAVA=1 to check'
+                        'for bytewise comparison to Java data.\n')
        sys.stderr.flush()
        sys.exit(1)

@ghost
Copy link

ghost commented May 13, 2015

For S_IRWXU etc: make sure you use the mkdir related function in util.h instead.
The nested_root thing I can fix.

@rw
Copy link
Collaborator Author

rw commented May 13, 2015

@layzerar Thanks, fixed string catenation in py_test.py's error message.

@rw
Copy link
Collaborator Author

rw commented May 13, 2015

@gwvo Switched from mkdir to EnsureDirExists (from util.h).

@ghost
Copy link

ghost commented May 13, 2015

Ok, merge at will :)

rw added a commit that referenced this pull request May 13, 2015
@rw rw merged commit f8139b0 into google:master May 13, 2015
@layzerar
Copy link
Contributor

Briliant!

@rw
Copy link
Collaborator Author

rw commented May 15, 2015

We've got it on PyPI:
$ pip install flatbuffers

@Downchuck
Copy link

Is ./flatc still needed to do code gen? I'm hitting issues with its python output.
EDIT: My bad -- looks like "namespace" is required when generating python output.

@shaxbee
Copy link
Contributor

shaxbee commented Sep 27, 2015

@rw Congrats on getting this through! :-)

kakikubo pushed a commit to kakikubo/flatbuffers that referenced this pull request Apr 19, 2016
…k_multi_repos

support multi repositories in github webhook
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants