bpo-24905: Support BLOB incremental I/O in sqlite module #271

Closed
wants to merge 18 commits into from

Conversation

palaviv
Contributor

@palaviv palaviv commented Feb 24, 2017

This PR adds support for BLOB incremental I/O to the sqlite module. As requested by @serhiy-storchaka and @berkerpeksag, I will try to get more developers to give their input on the desired API. I am tagging some people who are active in ghaering/pysqlite and rogerbinns/apsw:
@ghaering, @rianhunter, @rogerbinns, @phdru. Please look at the PR and share your notes.

https://bugs.python.org/issue24905

@rogerbinns

The APSW doc for reference is at https://rogerbinns.github.io/apsw/blob.html

Does having len make sense? Files don't have that method. It is also confusing - should len return the size from the current seek offset?

The documentation should make clearer that you cannot change the size of a blob, and mention zeroblob as the means to make a blob in a query without having to fill it in.
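As a minimal sketch of the zeroblob approach mentioned above, using only the standard `sqlite3` module; the table and column names (`files`, `data`) are hypothetical and chosen for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, data BLOB)")

# zeroblob(N) makes SQLite create an N-byte blob of zeros, so Python
# never has to build an N-byte bytes object just to reserve the space.
conn.execute("INSERT INTO files (id, data) VALUES (1, zeroblob(?))", (1024,))

size = conn.execute("SELECT length(data) FROM files WHERE id = 1").fetchone()[0]
data = conn.execute("SELECT data FROM files WHERE id = 1").fetchone()[0]
conn.close()
```

The blob's size is fixed at insert time; the content can be filled in afterwards.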

It may be worth mentioning that another approach is to store large data in a file, and only store the filename in the database. (This comes up on the sqlite-users mailing list quite a lot.)
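A rough sketch of that alternative, assuming a hypothetical `attachments` table and helper names invented for this example: the payload lives in an ordinary file, and the database row only records where to find it.

```python
import os
import sqlite3
import tempfile

def store_payload(conn, directory, name, payload):
    """Write payload to a file and record its path in the database."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(payload)
    conn.execute("INSERT INTO attachments (name, path) VALUES (?, ?)", (name, path))
    return path

def read_chunk(conn, name, offset, length):
    """Read part of a stored payload using ordinary file seeks."""
    (path,) = conn.execute(
        "SELECT path FROM attachments WHERE name = ?", (name,)
    ).fetchone()
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachments (name TEXT PRIMARY KEY, path TEXT)")
with tempfile.TemporaryDirectory() as tmp:
    store_payload(conn, tmp, "big.bin", b"0123456789")
    chunk = read_chunk(conn, "big.bin", 3, 4)
conn.close()
```

The trade-off is that the file and the row are no longer updated atomically together, which is something the application has to manage itself.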

@phdru

phdru commented Feb 24, 2017

I cannot remember ever needing to read or write part of a BLOB; it was always "all or nothing" for me. So I have never used BLOB APIs; instead I always SELECT/INSERT/UPDATE BLOB columns. In Postgres they are not even BLOB columns; I always use the BYTEA type.

So I'm -0 on exposing BLOB API for SQLite.

@rogerbinns

@phdru SQLite is the same with regular queries: you can only read or write blobs in their entirety. That for example means that if you store a 25MB blob then you must read or write 25MB at once.
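To make the "entirety" point concrete, here is a small sketch with the standard `sqlite3` module (the `files` table is a made-up example): changing two bytes still means fetching the whole value, rebuilding it, and writing all of it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, data BLOB)")

payload = bytes(range(256)) * 4  # 1 KiB stand-in for a much larger blob
conn.execute("INSERT INTO files (id, data) VALUES (1, ?)", (payload,))

# Reading: the column arrives as one complete bytes object.
(value,) = conn.execute("SELECT data FROM files WHERE id = 1").fetchone()

# Writing: to change two bytes we rebuild and resend the whole value.
patched = value[:10] + b"\xff\xff" + value[12:]
conn.execute("UPDATE files SET data = ? WHERE id = 1", (patched,))
(after,) = conn.execute("SELECT data FROM files WHERE id = 1").fetchone()
conn.close()
```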

SQLite has the "incremental blob" API for accessing just portions of blobs. The motivation comes from "Lite" in the name - developers use SQLite because it is lighter weight (amongst other reasons). DBAPI doesn't specify incremental blob I/O so only developers intending to use SQLite directly and not another database would use it. Should they be able to?

@phdru

phdru commented Feb 24, 2017

-0 from me means: I don't care and if there will be such an API I'm not gonna use it. That's all.

@palaviv
Contributor Author

palaviv commented Feb 25, 2017

Thanks for the input @rogerbinns.

Does having len make sense? Files don't have that method. It is also confusing - should len return the size from the current seek offset?

What is the difference between implementing __len__ and the length method that the APSW blob has?

@rogerbinns

@palaviv there is no difference between the value returned by len and length or similar methods. It is however very uncommon to have a len method on file like objects - I couldn't find an example of any! For example StringIO is closest and has no len. Hence my recommendation to avoid len in favour of another method name.

@serhiy-storchaka
Member

The mmap.mmap() object is an example of file-like object supporting len().

@rogerbinns

@serhiy-storchaka good example. They don't document it though, and there is a size() method, although it returns something slightly different. There also seems to be a correlation between types that have len and those that you can access like an array.

In any event my recommendation is to avoid breaking new ground with a len method, since that does not seem to be normal practice for this kind of thing that provides a file-like interface.
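The observation about `len` going together with array-style access can be checked quickly in the standard library. A small sketch, using an anonymous `mmap` so no file is needed:

```python
import io
import mmap

# An anonymous 16-byte mapping supports both len() and slicing.
m = mmap.mmap(-1, 16)
mmap_len = len(m)
m[0:4] = b"abcd"
first_four = m[0:4]
m.close()

# The in-memory stream classes are file-like but define no __len__.
bytesio_has_len = hasattr(io.BytesIO(b"abcd"), "__len__")
stringio_has_len = hasattr(io.StringIO("abcd"), "__len__")
```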

@palaviv
Contributor Author

palaviv commented Feb 28, 2017

I actually think that we should use __len__, since by definition it is the length of the object. The Blob object is a representation of the BLOB, and that is the BLOB's length.

@serhiy-storchaka
Member

The pull request conversation is meant for discussing the code.

It would be better to continue the design discussion on the bug tracker or mailing list.

@palaviv
Contributor Author

palaviv commented Mar 4, 2017

@serhiy-storchaka I have implemented the sequence protocol but I have a few questions:

  1. Do I need both PySequenceMethods.sq_item, PySequenceMethods.sq_ass_item and PyMappingMethods.mp_subscript, PyMappingMethods.mp_ass_subscript?
  2. I can't make __contains__ work. Could you point me to how to fix that?
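For orientation only, here is how the same behaviour looks at the Python level rather than in the C slots asked about. `FixedBuffer` is a hypothetical stand-in, not code from this PR: a class that defines just `__len__`, `__getitem__` and `__setitem__` supports indexing and slicing, and `in` works through Python's fallback of calling `__getitem__` with 0, 1, 2, ... until `IndexError`.

```python
class FixedBuffer:
    """Hypothetical fixed-size, blob-like container (illustration only)."""

    def __init__(self, size):
        self._data = bytearray(size)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # bytearray raises IndexError past the end, which is what ends
        # the fallback iteration used by the `in` operator.
        return self._data[index]

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self._data))
            # Refuse slice assignments that would resize the buffer.
            if len(range(start, stop, step)) != len(value):
                raise ValueError("cannot change the size of the buffer")
        self._data[index] = value

buf = FixedBuffer(8)
buf[0] = 7
buf[2:4] = b"\x01\x02"
found = 7 in buf      # no __contains__ defined; uses __getitem__
missing = 9 in buf
```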

@palaviv
Contributor Author

palaviv commented Apr 18, 2018

I think that the contains operation should not be supported for blobs. As blobs can be very large, looking for a subset of bytes inside them would be very inefficient in either memory or compute.
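To illustrate the cost, here is a sketch of what a memory-friendly search would have to do. `find_bytes` is a hypothetical helper that works on any object with a `read()` method (an `io.BytesIO` stands in for a blob here): memory stays bounded, but in the worst case every byte still has to be read.

```python
import io

def find_bytes(stream, needle, chunk_size=64 * 1024):
    """Return the offset of the first occurrence of needle, or -1."""
    if not needle:
        return 0
    overlap = len(needle) - 1
    tail = b""   # bytes carried over so matches spanning chunks are found
    base = 0     # absolute stream offset of tail[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        pos = window.find(needle)
        if pos != -1:
            return base + pos
        keep = window[-overlap:] if overlap else b""
        base += len(window) - len(keep)
        tail = keep

data = b"a" * 10 + b"needle" + b"b" * 10
offset = find_bytes(io.BytesIO(data), b"needle", chunk_size=4)
not_found = find_bytes(io.BytesIO(data), b"absent", chunk_size=4)
```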

@palaviv palaviv requested a review from a team as a code owner April 18, 2018 13:32
The BLOB size cannot be changed using the :class:`Blob` class. Use
``zeroblob`` to create a blob of the desired size in advance.

.. versionadded:: 3.7

3.8

Contributor Author


Done

Blob Objects
------------

.. versionadded:: 3.7

3.8

Contributor Author


Done


you can retarget this for py 3.9

@matrixise
Member

Hi @palaviv

Would you be interested in updating your PR to the latest master?

Thank you

@matrixise matrixise requested a review from berkerpeksag May 7, 2019 20:47
@palaviv palaviv force-pushed the issue-24905-sqlite-blob branch 2 times, most recently from efac873 to 765545e Compare May 9, 2019 09:55

@auvipy auvipy left a comment


Retarget for python 3.9

Blob Objects
------------

.. versionadded:: 3.8

3.9

@eamanu
Contributor

eamanu commented Jun 21, 2020

Hi @palaviv, is there a plan for this PR?
It was opened 3 years ago. Are you still interested in this patch?
If so, could you please fix the conflicts?

@palaviv
Contributor Author

palaviv commented Jul 4, 2020

Hi @eamanu,
This patch has existed since January 2016 and I have kind of given up on it ever going into CPython, as there is no core developer who works on sqlite. I would recommend using apsw, which supports this feature. If any core developer is interested in working on this, I will gladly make any needed changes.

@simonw
Contributor

simonw commented Jul 27, 2020

I'd love to see this land in Python. I think there's a strong case for it: SQLite lets you store up to 2GB of data in a BLOB, and reading an entire 2GB value into memory at once isn't nearly as pleasant as reading it incrementally, which is what this would let us do.
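A sketch of what incremental reading looks like with the API that eventually shipped, assuming Python 3.11 or newer where `sqlite3.Connection.blobopen` is available. The `iter_blob_chunks` helper and the `files` table are hypothetical names for this example; on older interpreters the sketch simply skips the read.

```python
import sqlite3

def iter_blob_chunks(conn, table, column, rowid, chunk_size=64 * 1024):
    """Yield a BLOB's contents piece by piece (needs Python 3.11+)."""
    with conn.blobopen(table, column, rowid, readonly=True) as blob:
        while True:
            chunk = blob.read(chunk_size)
            if not chunk:  # an empty result means end of blob
                break
            yield chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO files (id, data) VALUES (1, ?)", (b"x" * 1000,))

if hasattr(sqlite3.Connection, "blobopen"):
    sizes = [len(c) for c in iter_blob_chunks(conn, "files", "data", 1, chunk_size=256)]
else:
    sizes = None  # blobopen is not available before Python 3.11
conn.close()
```

Only one chunk is held in memory at a time, so peak memory is governed by `chunk_size` rather than by the size of the stored value.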

@palaviv
Contributor Author

palaviv commented Aug 3, 2020

Thanks for the review @berkerpeksag. I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@berkerpeksag: please review the changes made to this pull request.

@nightlark
Contributor

nightlark commented Aug 18, 2021

Other than rebasing (due to new conflicts arising over time), is there anything that can be done to help move this PR along?

(@palaviv do you want to do the rebase? if you'd like or are too busy I can do the rebase, though I'd need to open a new PR since I don't think I can modify yours)

@erlend-aasland
Contributor

erlend-aasland commented Aug 18, 2021

@nightlark, @palaviv: Here's a short list, off the top of my head, of what is needed to rebase this onto main:

  • the test suite has been normalized; we now use snake case test_foo_bar method names
  • Argument Clinic
  • use heap types iso. static types
  • exception types are accessed through the (temporary) global state; for Connection objects, it's available through self->state

If you want to try to land this, Ryan, please give Aviv a week or so to respond before opening a new PR :)

@nightlark
Contributor

@erlend-aasland Okay, I think I understand how to use Argument Clinic. Is there a guide to what iso. static types (or heap types) are? Is the iso. a prefix for the types or an abbreviation? If it's an abbreviation, maybe that's the missing search term I should be using to find relevant resources.

@erlend-aasland
Contributor

erlend-aasland commented Aug 20, 2021

@erlend-aasland Okay — I think I understand how to use argument clinic.

Great! AC is nice once you get into it. Feel free to ask if you get stuck :)

Is there a guide to what iso. static types (or heap types)?

There's some info in the docs, but you can also check the PRs that converted the existing types:

Don't hesitate to ask if you need more pointers.

Is the iso. a prefix for the types or an abbreviation?

Sorry, it's an abbreviation for "instead of" :) I have the bad habit of using it too much.

@erlend-aasland
Contributor

Regarding heap types, take a look at Victor's blog: https://vstinner.github.io/isolate-subinterpreters.html

@erlend-aasland
Contributor

@nightlark Are you still working on this PR?

@CoolCat467

It's 2022 now woooo

@dholth
Contributor

dholth commented Feb 16, 2022

+1

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Mar 19, 2022
erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this pull request Apr 14, 2022
erlend-aasland pushed a commit to erlend-aasland/cpython that referenced this pull request Apr 14, 2022
@JelleZijlstra
Member

I just merged #30680, a simplified version of this PR. Blobs will be in Python 3.11.
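For readers landing here, a brief sketch of in-place writes with the merged API, assuming Python 3.11 or newer (the `files` table is a made-up example). It combines the points raised earlier in the thread: the blob is pre-sized with `zeroblob`, its length is available through `len()`, and both file-style and slice-style access work without changing the size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO files (id, data) VALUES (1, zeroblob(8))")

if hasattr(sqlite3.Connection, "blobopen"):
    with conn.blobopen("files", "data", 1) as blob:
        blob_len = len(blob)      # fixed size of the blob
        blob.write(b"abcd")       # file-style write at offset 0
        blob.seek(0)
        head = blob.read(4)
        blob[4:8] = b"wxyz"       # slice assignment of equal length
    (stored,) = conn.execute("SELECT data FROM files WHERE id = 1").fetchone()
else:
    blob_len = head = stored = None  # blobopen needs Python 3.11+
conn.close()
```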

Labels
awaiting change review stale Stale PR or inactive for long period of time.