SuperDB

🔴 NOTICE OF PROJECT READINESS 🔴

SuperDB is still under development so there's not yet a GA release. You're welcome to try it out in its early form (i.e., tip-of-main) and we'd love to hear your feedback. Read on for more info!

SuperDB is a new analytics database that supports relational tables and JSON on an equal footing. It shines when it comes to data wrangling where you need to explore or process large eclectic data sets. It's also pretty decent at analytics and search use cases.

Unlike other relational systems that do performance-fragile "schema inference" of JSON, SuperDB won't fall over if you throw a bunch of eclectic JSON at it. You can easily do schema inference if you want, but by default data is ingested in its natural form, no matter how heterogeneous it is. And unlike systems based on the document data model, every value in SuperDB is strongly and dynamically typed, providing the best of both worlds: the flexibility of the document model and the efficiency and performance of the relational model.
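
To make "strongly and dynamically typed" a little more concrete, here is a minimal Python sketch, not SuperDB code. It assumes a hypothetical helper, type_of, and an illustrative type notation invented for this example. The point it tries to show is that each value carries its own complete type, so a heterogeneous sequence never has to be forced into a single schema.

```python
import json


def type_of(value):
    """Return a structural type description for one parsed JSON value.

    Hypothetical helper for illustration only; the notation is made up
    and is not SuperDB's type syntax.
    """
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int (bool subclasses int)
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # An array's element type is the sorted set of distinct element types.
        elems = sorted({type_of(v) for v in value})
        return "[" + "|".join(elems) + "]"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{type_of(v)}" for k, v in value.items())
        return "{" + fields + "}"
    raise TypeError(f"unsupported value: {value!r}")


# Invented sample input: three values with three different shapes.
lines = [
    '{"id": 1, "name": "alice"}',
    '{"id": "b-2", "tags": ["x", 3]}',
    '42',
]

# Each value is typed on its own; no shared schema is inferred or imposed.
types = [type_of(json.loads(line)) for line in lines]
for t in types:
    print(t)
```

In this sketch the field id is an integer in one record and a string in the next, and the third value is not a record at all, yet every value still has a precise type.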

In SuperDB's SQL dialect, there are no "JSON columns", so there isn't a "relational way to do things" and a different "JSON way to do things". Instead of having a relational type system for structured data and a completely separate JSON type system for semi-structured data, all data handled by SuperDB (e.g., JSON, CSV, Parquet files, Arrow streams, relational tables, etc.) is automatically massaged into super-structured form. This super-structured data is then processed by a runtime that simultaneously supports the statically typed relational model and the dynamically typed JSON data model in a unified compute engine.

SuperSQL

SuperDB uses SQL as its query language, but it's a SQL that has been extended with pipe syntax and lots of fun shortcuts. This extended SQL is called SuperSQL.

Here's a SuperSQL query that fetches some data from GitHub Archive, computes the set of repos touched by each user, ranks them by number of repos, picks the top five, and joins each user with their original created_at time from the current GitHub API:

FROM 'https://data.gharchive.org/2015-01-01-15.json.gz'
| SELECT union(repo.name) AS repos, actor.login AS user
  GROUP BY user
  ORDER BY len(repos) DESC
  LIMIT 5
| FORK (
  => FROM eval(f'https://api.github.com/users/{user}')
   | SELECT VALUE {user:login,created_at:time(created_at)}
  => PASS
  )
| JOIN USING (user) repos
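
For readers more at home in a general-purpose language, the aggregation part of the query above (the SELECT with GROUP BY, ORDER BY, and LIMIT) can be approximated in plain Python. This is a rough sketch, not a description of how SuperDB executes the query. The function name top_users_by_repos and the sample events are invented for illustration, and the FORK and JOIN stages are deliberately left out.

```python
from collections import defaultdict


def top_users_by_repos(events, limit=5):
    """Collect the set of repos touched by each user and rank users by set size.

    Hypothetical helper; assumes each event has actor.login and repo.name,
    the two fields the query above reads.
    """
    repos_by_user = defaultdict(set)
    for event in events:
        # union(repo.name) ... GROUP BY user
        repos_by_user[event["actor"]["login"]].add(event["repo"]["name"])
    # ORDER BY len(repos) DESC
    ranked = sorted(repos_by_user.items(), key=lambda item: len(item[1]), reverse=True)
    # LIMIT 5 (or whatever limit the caller passes)
    return [{"user": user, "repos": repos} for user, repos in ranked[:limit]]


# Invented sample events, shaped like the fields the query uses.
events = [
    {"actor": {"login": "ann"}, "repo": {"name": "ann/a"}},
    {"actor": {"login": "bob"}, "repo": {"name": "bob/x"}},
    {"actor": {"login": "ann"}, "repo": {"name": "ann/b"}},
    {"actor": {"login": "ann"}, "repo": {"name": "ann/a"}},  # duplicate repo, absorbed by the set
]

result = top_users_by_repos(events, limit=1)
print(result)
```

The Python version needs an explicit dictionary of sets and a sort; the SuperSQL query expresses the same steps declaratively and then continues into the FORK and JOIN stages in the same pipeline.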

Super JSON

Super-structured data is strongly typed and "polymorphic": any value can take on any type and sequences of data need not all conform to a predefined schema. To this end, SuperDB extends the JSON format to support super-structured data in a format called Super JSON where all JSON values are also Super JSON values. Similarly, the Super Binary format is an efficient binary representation of Super JSON (a bit like Avro) and the Super Columnar format is a columnar representation of Super JSON (a bit like Parquet).

Even though SuperDB is based on these super-structured data formats, it can read and write most common data formats.

Project Status

Our long-term goal for SuperSQL is to be Postgres-compatible and interoperate with existing SQL tooling. In the meantime, SuperSQL is a bit of a moving target and we would love community engagement to evolve and fine-tune its syntax and semantics.

Our areas of active development include:

the SuperSQL query language,
the type-based query compiler and optimizer,
fast, vectorized ingest of common file formats,
a complete vectorized runtime, and
a data lake based on super-structured data.

Try It

As SuperDB is still under construction, GA releases are not yet available. However, you can install a build of the super command-line tool based on code that's under active development to start tinkering. Detailed documentation for the SuperDB system and its piped SQL syntax is available on the SuperDB docs site.

As the code and docs are evolving, we recommend focusing first on what's in the super command doc. Feel free to explore other docs and try things out, but please don't be shocked if you hit speedbumps in the near term, particularly in areas like performance and full SQL coverage. We're working on it! 😉

Once you've tried it out, we'd love to hear your feedback via our community Slack.

NOTE: The SuperDB query engine can run locally without a storage engine by accessing files, HTTP endpoints, or S3 paths using the super command. SuperDB can also run on a super-structured data lake using the super db sub-commands, though this mode is at an earlier stage of development.

SuperDB Desktop - Coming Soon

SuperDB Desktop is an Electron-based desktop app to explore, query, and shape data in a SuperDB data lake. It combines a search experience with SQL querying and has a really slick design for dealing with large, complex JSON data.

Unlike most JSON browsing tools, it won't slow to a crawl (or worse, crash) if you load it up with ginormous JSON values.

Contributing

See the contributing guide to learn how you can help improve SuperDB!

Join the Community

Join our public Slack workspace for announcements, Q&A, and to trade tips!
