etag errors in ceph backed instance #10

Open
rklra opened this issue Jan 8, 2024 · 4 comments

Comments

@rklra

rklra commented Jan 8, 2024

Hi,

I was trying to set up the package with Ceph object storage (theoretically S3 compatible) as the backing store, but the etag check used to protect against concurrent writes, in

func getInfo(dst OutputFS, fp string, out blockfmt.Uploader) (string, time.Time, error) {

does not seem to work with Ceph's etag handling (the value of e.ETag() is blank), as seen here:

sync: etag "[etag here]" from Stat disagrees with etag

rebuilding sdb with the if statement here:

if e, ok := out.(etagger); ok && e.ETag() != etag {
	return "", time.Time{}, fmt.Errorf("etag %s from Stat disagrees with etag %s", etag, e.ETag())
}

commented out, queries appear to function normally and syncing continues to work (as long as no concurrent reading/writing to the table is occurring)

full outputs here:
(unmodified version of sdb)

root@avx1 ~/sneller # go run ./cmd/sdb/ -- sync main table
sync: etag "[full etag here]" from Stat disagrees with etag
exit status 1

(modified version of sdb)

root@avx1 ~/sneller # go run ./cmd/sdb/ -- sync main table

(edited version -- worked with no errors)

note on concurrent read/write

in my (limited) testing with concurrent reads/writes occurring while the sync task is running, few to no issues occur with querying data that was synced during a write on my modified instance

(sync during write)

root@avx1 ~/sneller # go run ./cmd/sdb/ -- sync main table
root@avx1 ~/sneller # sdb query -fmt json "SELECT COUNT(*) FROM main.table"
{"count": 76}

(sync after write)

root@avx1 ~/sneller # go run ./cmd/sdb/ -- sync main table
root@avx1 ~/sneller # sdb -v  query -fmt json "SELECT COUNT(*) FROM main.table"
{"count": 87}

(full rebuild of indexes and zion files)

root@avx1 ~/sneller # go run ./cmd/sdb/ -- sync main table
root@avx1 ~/sneller # sdb -v  query -fmt json "SELECT COUNT(*) FROM main.table"
{"count": 87}
@ramondeklein
Member

I checked the Ceph documentation and it looks like the Get Object method doesn't return the ETag header, which differs from the normal S3 API. Ceph does seem to know about the ETag, because the GET method supports the if-match request header. I would suggest filing a PR with Ceph so that it returns the ETag when listing buckets and fetching objects.

Another option may be to use Minio instead (if that suits your use-case). We have done tests with Minio and it works fine with Sneller.

Disabling the ETag check is not a suitable solution, because it could result in problems when sdb is run in parallel with executing queries. Feel free to disable the check yourself if you are certain that you never run it in parallel.

@rklra
Author

rklra commented Jan 8, 2024

Hi, thanks so much for the info;

I will try to get the changes made to Ceph, but in the meantime, what would be the best interim solution that avoids execution issues?

Unfortunately using aws/minio is not possible for our use case (we have an extremely write heavy workload that makes these solutions infeasible to operate, but is seemingly well suited for sneller).

Thanks again for the help with resolving these issues!

@ramondeklein
Member

The safest way would be to interleave sdb and Sneller query execution (if that's feasible).

You may also want to check whether it's feasible to change the code to check timestamps instead of etags, but that is not something that is currently supported. I also don't know if that would be completely safe, because timestamps are often not very precise.

@philhofer
Contributor

philhofer commented Jan 10, 2024 via email
