This repository has been archived by the owner on Mar 8, 2018. It is now read-only.

Iterate over a stream of json objects + support new yajl version + bug fixes #33

Open: wants to merge 21 commits into base: master

@iainb iainb commented Sep 17, 2011

Note: I've aimed to keep backwards API compatibility, but I've made some additions; how these additions should actually work is open to discussion.

I think that this patch aims to address issues:

The main improvement is that you can now write something like:

import yajl
import sys

for i in yajl.Decoder(allow_multiple_values=True, stream=sys.stdin):
    print(i)

This lets you iterate over a stream of JSON objects read from the process's standard input. It's not too slow either: on my Core 2 Duo, a yajl-based producer and consumer connected via a Unix pipe and exchanging a very basic object can process about 100,000 JSON objects per second.

I don't think the iterator will handle non-blocking sockets very well at the moment; it may not handle them at all, as I've not tested it yet. If the file object is in blocking mode, the iterator handles it fine.

I've also fixed a number of bugs relating to memory management (including issue #32) along the way.

Summary of changes.

  • Maintain backwards compatibility
  • Upgrade to newer version of yajl
  • Change the yajl decoder class to decode objects into an internal Python list (each read while iterating may decode zero, one, or several objects, yet we must return only one at a time).
  • Add iterator method to decoder
  • Add len() method to the decoder (though I'm not really sure if this is needed - it returns the size of the internal list).
  • The decoder now takes three optional arguments when initialised:
    1. allow_multiple_values - true / false - allow yajl to continue decoding past the first value
    2. stream - a file-like object to read from when iterating
    3. bufsize - integer - the size of each read performed internally when iterating over a stream
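
The buffering behaviour described in the list above can be illustrated with a small pure-Python sketch. This is not the C implementation in this patch: the class name StreamDecoder is hypothetical, the standard library's json module stands in for yajl, and the sketch assumes every top-level value is a JSON object or array (a bare number split across two reads would be decoded incorrectly).

```python
import io
import json


class StreamDecoder(object):
    """Hypothetical stand-in: iterate over JSON values read from a stream."""

    def __init__(self, stream, bufsize=4096):
        # Reject anything that cannot be read from, as early as possible.
        if not hasattr(stream, "read"):
            raise TypeError("stream must have a read() method")
        self._stream = stream
        self._bufsize = bufsize
        self._pending = ""    # text read so far that is not yet a full value
        self._decoded = []    # internal list of decoded, not yet returned objects
        self._json = json.JSONDecoder()

    def __len__(self):
        # Size of the internal list, i.e. objects decoded but not yet returned.
        return len(self._decoded)

    def _feed(self, chunk):
        # One chunk may complete zero, one, or several values.
        self._pending += chunk
        while True:
            text = self._pending.lstrip()
            if not text:
                self._pending = ""
                return
            try:
                obj, end = self._json.raw_decode(text)
            except ValueError:
                # Treated as an incomplete value: keep it and wait for more data.
                self._pending = text
                return
            self._decoded.append(obj)
            self._pending = text[end:]

    def __iter__(self):
        while True:
            # Read until at least one object is available or the stream ends.
            while not self._decoded:
                chunk = self._stream.read(self._bufsize)
                if not chunk:
                    if self._pending.strip():
                        raise ValueError("stream ended inside a JSON value")
                    return
                self._feed(chunk)
            # Return exactly one object per iteration step.
            yield self._decoded.pop(0)


if __name__ == "__main__":
    demo = io.StringIO('{"a": 1} {"b": [1, 2]}\n{"c": null}')
    for item in StreamDecoder(demo, bufsize=5):
        print(item)
```

A small bufsize forces values to span several reads, while a large one means a single read can fill the internal list with several objects that are then handed out one per iteration.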

The unit tests still pass and the current version also supports Python 3, though I really should write more unit tests for the new features and some documentation to accompany them.

* updated build instructions
rickeyski@cf3c29d

Move IssueTwentySevenTest dict into python2 to prevent python3 test from bailing on loading the test suite

Update IssueTwentySevenTest to only run on python2. (probably want to run this on python3 too).
* Store decoded objects in a list
Remove actions from _internal_decode that reset the parser state
clang -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -Iincludes/ -Iyajl/src -I/usr/include/python2.6 -c yajl.c -o build/temp.linux-x86_64-2.6/yajl.o -Wall -DMOD_VERSION="0.3.6-38862b0"
yajl.c:411:15: warning: incompatible pointer types initializing 'PyCFunction' (aka 'PyObject *(*)(PyObject *, PyObject *)') with
      an expression of type 'PyCFunctionWithKeywords' (aka 'PyObject *(*)(PyObject *, PyObject *, PyObject *)')
    {"dumps", (PyCFunctionWithKeywords)(py_dumps), METH_VARARGS | METH_KEYWORDS,
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
yajl.c:427:14: warning: incompatible pointer types initializing 'PyCFunction' (aka 'PyObject *(*)(PyObject *, PyObject *)') with
      an expression of type 'PyCFunctionWithKeywords' (aka 'PyObject *(*)(PyObject *, PyObject *, PyObject *)')
    {"dump", (PyCFunctionWithKeywords)(py_dump), METH_VARARGS | METH_KEYWORDS,
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
yajl.c:331:18: warning: unused function 'py_iterload' [-Wunused-function]
static PyObject *py_iterload(PYARGS)
                 ^
3 warnings generated.

clang -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -Iincludes/ -Iyajl/src -I/usr/include/python2.6 -c encoder.c -o build/temp.linux-x86_64-2.6/encoder.o -Wall -DMOD_VERSION="0.3.6-38862b0"
encoder.c:316:9: warning: expression result unused [-Wunused-value]
        PyObject_INIT_VAR(op, &PyString_Type, size);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from encoder.c:32:
In file included from /usr/include/python2.6/Python.h:81:
/usr/include/python2.6/objimpl.h:159:29: note: instantiated from:
    ( Py_SIZE(op) = (size), PyObject_INIT((op), (typeobj)) )
                            ^
encoder.c:316:9: note: instantiated from:
        PyObject_INIT_VAR(op, &PyString_Type, size);
        ^                 ~~
encoder.c:316:27: note: instantiated from:
        PyObject_INIT_VAR(op, &PyString_Type, size);
                          ^~
1 warning generated.
Updated runtests.sh to run issue_11 test under python3
To iterate over json items written to the stdin of a python process:

import yajl
import sys

for i in yajl.Decoder(allow_multiple_values=True,stream=sys.stdin):
    print i

Python3 support is broken in this build.
Refactor py_yajldecoder_decode
add check to decoder init function to check stream has read() method
Also fix functions which were returning true/false/none types but not incrementing the refcount
Unicode buffer was not being freed after use
@rtyler rtyler (Owner) commented Mar 13, 2014

@iainb I was reminded again of this pull request. Would you like me to transfer ownership of the yajl module on PyPI over to you?
