
HASH

HASH is an open-source, self-building database. You can read more about it on our blog.

HASH provides a powerful graph datastore with its own GUI, for creating and using types and entities, and managing the database's growth. Intelligent, autonomous agents can be deployed to grow, check, and maintain the database, integrating and structuring information from the public internet as well as your own connected private sources.

In the future, we envisage HASH serving as an all-in-one workspace, or complete operating system.

We currently recommend using the hosted version of HASH. We haven't yet written up an official guide to self-hosting HASH, although you can find the code powering the application here in this (rather large) GitHub repository.

Warning: The repository is currently in a state of flux while some large improvements are being implemented. As such, portions of this README may prove outdated in the interim; this could include guides on how to load blocks, references to various services, broken tests or features, etc.

About the HASH application

This folder contains only the HASH project README. The application itself is split across several different services which can be found co-located alongside this directory. See the respective section in the parent README for descriptions of these services.

Getting started

Running HASH locally

To run HASH locally, please follow these steps:

  1. Make sure you have Git, Node LTS, Yarn Classic, Rust, Docker, Protobuf, and Java installed. Building the Docker containers requires Docker Buildx. Run each of these version commands and make sure the output meets the minimum versions shown:

    git --version
    ## ≥ 2.17
    
    node --version
    ## ≥ 20.12
    
    yarn --version
    ## ≥ 1.16
    
    rustup --version
    ## ≥ 1.27.1 (Required to match the toolchain as specified in `rust-toolchain.toml`)
    
    docker --version
    ## ≥ 20.10
    
    docker compose version
    ## ≥ 2.17.2
    
    docker buildx version
    ## ≥ 0.10.4
    
    protoc --version
    ## ≥ 25
    
    java --version
    ## ≥ 8

    If you have difficulties with git --version on macOS you may need to install Xcode Command Line Tools first: xcode-select --install.

    If you use Docker for macOS or Windows, go to Preferences → Resources and ensure that Docker can use at least 4GB of RAM (8GB is recommended).

  2. Clone this repository and navigate to the root of the repository folder in your terminal.

  3. Enable corepack:

corepack enable

You may need to restart your shell session afterwards.

  4. Install dependencies:

    yarn install
  5. Ensure Docker is running. If you are on Windows or macOS, you should see the Docker icon in the system tray or the menu bar. Alternatively, you can use this command to check Docker:

    docker run hello-world
  6. If you need to test or develop AI-related features, you will need to create an .env.local file in the repository root with the following values:

    OPENAI_API_KEY=your-open-ai-api-key                                      # required for most AI features
    ANTHROPIC_API_KEY=your-anthropic-api-key                                 # required for most AI features
    HASH_TEMPORAL_WORKER_AI_AWS_ACCESS_KEY_ID=your-aws-access-key-id         # required for most AI features
    HASH_TEMPORAL_WORKER_AI_AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key # required for most AI features
    E2B_API_KEY=your-e2b-api-key                                             # only required for the question-answering flow action

    Note on environment files: .env.local is not committed to the repo – put any secrets that should remain secret here. The default environment variables are taken from .env, extended by .env.development, and finally by .env.local. If you want to overwrite values specified in .env or .env.development, you can add them to .env.local. Do not change any other .env files unless you intend to change the defaults for development or testing.
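
    For example, to override one of the defaults documented under "Environment variables" below, add the variable to .env.local (the value here is just an illustration):

    # .env.local – overrides values inherited from .env and .env.development
    LOG_LEVEL=debug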

  7. Launch external services (Postgres, the graph query layer, Kratos, Redis, and OpenSearch) as Docker containers:

    yarn external-services up --wait
    1. You can optionally force a rebuild of the Docker containers by adding the --build argument (this is necessary if changes have been made to the graph query layer). It's recommended to do this whenever updating your branch from upstream.

    2. You can keep external services running between app restarts by adding the --detach argument to run the containers in the background. It is possible to tear down the external services with yarn external-services down.

    3. When using yarn external-services:offline up, the Graph service does not try to connect to https://blockprotocol.org to fetch required schemas. This is useful for development when the internet connection is slow or unreliable.

    4. You can also run the Graph API and AI Temporal worker outside of Docker – this is useful if they are changing frequently and you want to avoid rebuilding the Docker containers. To do so, stop them in Docker and then run yarn dev:graph and yarn workspace @apps/hash-ai-worker-ts dev respectively in separate terminals.
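
    As a combined example of the options above, you can rebuild the images, keep the services running in the background, and later tear them down:

    yarn external-services up --wait --build --detach
    # ...develop against the running services...
    yarn external-services down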

  8. Launch app services:

    yarn start

    This will start the backend and frontend in a single terminal. Once you see http://localhost:3000 in the output, the frontend is ready to visit at that address. The API is online once you see localhost:5001 in the terminal. Both must be online for the frontend to function.

    You can also launch parts of the app in separate terminals, e.g.:

    yarn start:graph
    yarn start:backend
    yarn start:frontend

    See the scripts section of package.json for details and more options.

  9. Log in

    There are three users seeded automatically for development. Their passwords are all password.

    • alice@example.com, bob@example.com – regular users
    • admin@example.com – an admin

If you need to run the browser plugin locally, see the README.md in the apps/plugin-browser directory.

Resetting the local database

If you need to reset the local database, to clear out test data or because it has become corrupted during development, you have two options:

  1. The slow option – rebuild in Docker

    1. In the Docker UI (or via the CLI, if you prefer), stop and delete the hash-external-services container
    2. In 'Volumes', search for 'hash-external-services' and delete the volumes shown
    3. Run yarn external-services up --wait to rebuild the services
  2. The fast option – reset the database via the Graph API

    1. Run the Graph API in test mode by running yarn dev:graph:test-server
    2. Run yarn graph:reset-database to reset the database
    3. If you need to use the frontend, you will also need to delete the rows in the identities table in the dev_kratos database, or sign-in will not work. You can do so via any Postgres UI or CLI. The database connection and user details are in .env
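
    As a sketch of the fast option (the psql command is illustrative – take the actual connection details from .env):

    # terminal 1: run the Graph API in test mode
    yarn dev:graph:test-server

    # terminal 2: reset the database
    yarn graph:reset-database

    # optional, only if you need the frontend afterwards: clear Kratos identities so sign-in works again
    # e.g. psql postgresql://kratos:kratos@localhost:5432/dev_kratos -c "DELETE FROM identities;"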

External services test mode

The external services of the system can be started in 'test mode' to prevent polluting the development database. This is useful for situations where the database is used for tests that modify the database without cleaning up afterwards.

To make use of this test mode, the external services can be started as follows:

yarn external-services:test up

Deploying HASH to the cloud

To deploy HASH in the cloud, follow the instructions contained in the root /infra directory.

User authentication

Development users are seeded when the HASH API is started: alice@example.com and bob@example.com. You can sign in as either user with the password password.

Sending emails

Email-sending in HASH is handled by either Kratos (in the case of authentication-related emails) or through the HASH API Email Transport (for everything else).

Transactional email templates are located in the following locations:

  • Kratos emails in ./../../apps/hash-external-services/kratos/templates/. This directory contains the following templates:
    • recovery_code - Email templates for the account recovery flow using a code for the UI.
      • When an email belongs to a registered HASH user, it will use the valid template; otherwise the invalid template is used.
    • verification_code - Email verification templates for the account registration flow using a code for the UI.
      • When an email belongs to a registered HASH user, it will use the valid template; otherwise the invalid template is used.
  • HASH emails in ../hash-api/src/email/index.ts

To use AwsSesEmailTransporter instead, set export HASH_EMAIL_TRANSPORTER=aws_ses in your terminal before running the app. Note that you will need valid AWS credentials for this email transporter to work.
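
A minimal example, assuming valid AWS credentials are already present in your environment:

export HASH_EMAIL_TRANSPORTER=aws_ses
yarn start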

Integration with the Block Protocol

HASH is built around the open Block Protocol (@blockprotocol/blockprotocol on GitHub).

Using blocks

Blocks published to the Þ Hub can be run within HASH via the 'insert block' (aka. 'slash') menu.

While running the app in development mode, you can also test local blocks out in HASH by going to any page, clicking on the menu next to an empty block, and pasting in the URL to your block's distribution folder (i.e. the one containing block-metadata.json, block-schema.json, and the block's code). If you need a way of serving your folder, try serve.
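
For example, one way to serve a block's distribution folder (the dist path is an assumption – use whichever folder your block builds into):

npx serve --cors dist
# then paste the URL that serve prints into the block menu in HASH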

HASH blocks

The code pertaining to HASH-developed blocks can be found in the /blocks directory in the root of this monorepo.

Creating new blocks

See the Developing Blocks page in the Þ Docs for instructions on developing and publishing your own blocks.

Development

The Graph Query Layer

HASH's primary datastore is an entity graph. The service that provides this is located within the /apps/hash-graph folder. The README contains more information for development. You do not need to visit that README or folder unless you want to amend the graph service.

Testing

Debug mode

Some parts of the UI, designed to help with development/debugging, are hidden by default. You can display these elements by running the following in your browser console:

localStorage["hash.internal.debugging"] = "true";

Backend integration tests

Backend integration tests are located in the /tests/hash-backend-integration folder.

The tests require a running instance of hash-external-services. See the 'External services test mode' section above for information on doing this without polluting the development database.

yarn test:backend-integration

We originally planned to use Playwright's API testing feature instead of Jest (since replaced by Vitest), which would have led to the convergence of yarn test:backend-integration and yarn test:playwright -- this may still happen.

Playwright tests

Playwright tests are browser-based integration and end-to-end tests. The playwright tests are located within the /tests/hash-playwright/tests folder. To run these tests locally, you will need to have both backend and frontend running.

  • The tests require a running instance of external-services. See the 'External services test mode' section above for information on doing this without polluting the development database.

Terminal 1

yarn dev:backend

Terminal 2

yarn seed-data

## option 1: frontend in dev mode
yarn dev:frontend

## option 2: frontend in prod mode
yarn workspace @apps/hash-frontend build
yarn workspace @apps/hash-frontend start

Terminal 3

yarn test:playwright

You can add extra arguments to configure how Playwright runs, e.g.:

yarn test:playwright --headed

See yarn test:playwright --help for more info.

Unit tests

Unit tests are executed by Vitest, which we use in place of Jest, due to its improved TS/ESM compatibility.

Unit tests can be launched at any time with this command:

yarn test:unit

Note: some of the unit tests may output console.error messages. Please disregard these and focus on the pass/fail indicators.

Going forward, consider using Playwright if you want to test the UI. Your tests will be less coupled to implementation details and thus closer to what real users see and do.

Code quality

We perform automated linting and formatting checks on pull requests using GitHub Actions. When a pull request is created or updated, GitHub Actions will run those checks. This includes ESLint, TSC, Biome, Markdownlint, rustfmt, and a few other tools. Some checks may be skipped depending on the files that have been changed in the pull request.

First-time contributors need to wait for a maintainer to manually launch the checks.

Monorepo

We use Yarn Workspaces to work with multiple packages in a single repository. Turborepo is used to cache script results and thus speed up their execution.

New packages

New local packages should follow these rules:

  1. Anything which is imported or consumed by something else belongs in libs/ and should have a package.json "name":
    • beginning with @local/ for non-published JavaScript dependencies
    • identical to their npm name for published JavaScript dependencies
    • beginning with @rust/ for Rust dependencies
  2. Things which are executed belong in apps/, and are named @apps/app-name
  3. Packages which aren't published to npm should have "private": true in their package.json
  4. All TypeScript packages should be "type": "module"
  5. ESLint and TypeScript configuration should all extend the base configs (see existing examples in other packages). Don't modify or override anything unless necessary.

Read the next section to understand how to configure compilation for packages.

TypeScript package resolution / compilation

The package resolution setup is designed to meet two goals:

  1. Enable the local dependency graph for any application to be executed directly as TypeScript code during development, whilst
  2. Enabling it to be run as transpiled JavaScript in production.

This is achieved by maintaining two parallel exports definitions for each package:

  1. The exports field in package.json should point to the transpiled JavaScript (and typesVersions to the type definition files)
  2. The paths map in the base TSConfig should map the same import paths to their TypeScript source
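
For illustration, the two parallel definitions for a hypothetical library @local/example-lib might look roughly like this (the name and paths are made up – copy an existing package in the repo for the exact shape). In the package's package.json:

"name": "@local/example-lib",
"exports": { ".": "./dist/main.js" },
"typesVersions": { "*": { "*": ["dist/*"] } }

And in the base TSConfig's paths map, pointing the same specifier at the TypeScript source:

"paths": {
  "@local/example-lib": ["libs/example-lib/src/main.ts"]
}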

During development (e.g. running yarn dev for an application), the paths override will be in effect, meaning that the source TypeScript is being run directly, and modifying any dependent file in the repo will trigger a reload of the application (assuming tsx watch or equivalent is used).

For production builds, where they are created, a tsconfig.build.json in the package is used which overwrites the paths field in the root config, meaning that the imports will resolve to the transpiled JavaScript (usually in a git-ignored dist/ folder).

Creating a production build should be done by running turbo run build, so that turbo takes care of building its dependencies first. Running yarn build may not work as expected, as the built JavaScript for its dependencies may be (a) missing or (b) out of date.
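
If you only need one application and its local dependencies, Turborepo's standard --filter flag can scope the build, for example (assuming turbo is invocable from the workspace root):

turbo run build --filter=@apps/hash-frontend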

If a bundler is used rather than tsc, the paths override needs to be translated into the appropriate configuration for the bundler. For webpack, this is automated by adding the TsconfigPathsPlugin to the configuration's resolve field (search existing examples in repo).

New packages which are to be built as JavaScript, whether as an app or dependency, must follow these rules (a sketch of the key config files follows this list):

  1. They must have a tsconfig.json which extends the base config and sets "module": "NodeNext" and "moduleResolution": "NodeNext"
  2. Imports within a package must use relative imports and not the package's name (they will not be resolved when built otherwise)
  3. Relative imports within a package must have a .js file extension (tsc will enforce this)
  4. They must have a tsconfig.build.json which overrides the paths field ("paths": {})
  5. They must have a build command which uses this file (typically rimraf ./dist/ && tsc -p tsconfig.build.json)
  6. They must specify the paths exposed to consumers in exports and typesVersions in package.json, and paths in the base TSConfig
  7. They must have a turbo.json which extends the root and specifies the outputs for caching (see existing examples)
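
As a sketch of rules 4 and 5 for a hypothetical package (the extends target and output folder are assumptions – check an existing package for the exact setup), the tsconfig.build.json:

{
  "extends": "./tsconfig.json",
  "compilerOptions": {
    "paths": {},
    "outDir": "dist"
  }
}

and the corresponding package.json build script:

"scripts": {
  "build": "rimraf ./dist/ && tsc -p tsconfig.build.json"
}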

Authoring Patches

Patches to JavaScript packages are managed by Yarn, using the yarn patch command.

Creating a new patch

yarn patch <package>
# ➤ YN0000: Package <package>@npm:<version> got extracted with success!
# ➤ YN0000: You can now edit the following folder: /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-df787c87/user
# ➤ YN0000: Once you are done run yarn patch-commit -s /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-df787c87/user and Yarn will store a patchfile based on your changes.
# ➤ YN0000: Done in 0s 702ms

Once you have completed your changes, run the command that was output to commit the patch:

yarn patch-commit -s /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-df787c87/user

This will automatically create a patch file and put it into the .yarn/patches directory. If you're modifying a direct dependency in any workspace it will replace the package.json entry with a patch: reference to the patch file. In case you're patching an indirect dependency a new resolutions entry will be added to the root workspace package.json.

You will need to run yarn install for the patch to be installed and applied to the lockfile.

Modifying an existing patch

The procedure to modify an existing patch is very similar, but instead of running yarn patch <package> you will need to run yarn patch -u <package>. This will apply existing patches and then extract the package for you to modify.

yarn patch -u <package>
# ➤ YN0000: Package <package>@npm:<version> got extracted with success along with its current modifications!
# ➤ YN0000: You can now edit the following folder: /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-d772c076/user
# ➤ YN0000: Once you are done run yarn patch-commit -s /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-d772c076/user and Yarn will store a patchfile based on your changes.
# ➤ YN0000: Done in 1s 455ms

Once you have completed your changes, run the command that was output to commit the patch:

yarn patch-commit -s /private/var/folders/lk/j93xz9pd7nqgd5_2wlyxmbh00000gp/T/xfs-d772c076/user

This will automatically update the patch file with your changes. Do not forget to run yarn install for the patch to be installed and applied to the lockfile.

Removing a patch

Locate any patch: protocol entries in any workspace package.json. An entry will look something like: patch:@changesets/assemble-release-plan@npm%3A5.2.4#~/.yarn/patches/@changesets-assemble-release-plan-npm-5.2.4-2920e4dc4c.patch. To remove the patch, do not delete the dependency line entirely; instead, replace the patch: specifier with the original version: take everything after patch: and before #, URL-decode it, and extract the version from it (5.2.4 in this example).
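
For example, using the specifier above, the dependency entry changes from the patch reference back to the plain version:

"@changesets/assemble-release-plan": "patch:@changesets/assemble-release-plan@npm%3A5.2.4#~/.yarn/patches/@changesets-assemble-release-plan-npm-5.2.4-2920e4dc4c.patch"

becomes

"@changesets/assemble-release-plan": "5.2.4"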

In case the patch has been applied in the resolutions field you should also check if the resolution is made redundant. This is the case if the left side is the same as the right, e.g. "react@npm:18.2.0": "18.2.0" is redundant, same as "react@npm:18.2.0": "npm:18.2.0", or "react@npm:18.2.0": "npm:react@18.2.0", but "react": "npm:react@18.2.0" is not redundant.

A resolution specifier like "react": "npm:react@18.2.0" is also valid; it simply means that the react package should be resolved to the npm package react@18.2.0. In fact, "react": "18.2.0" is shorthand for "react": "npm:react@18.2.0".

If the left hand of a resolution has no version specifier it is assumed to be npm:*, e.g. "react": "18.2.0" is equivalent to "react@npm:*": "18.2.0" (replace react with version 18.2.0 regardless of the dependency requirement).

For more examples, see the Yarn documentation.

Then run yarn install to remove the patch.

You can then safely remove the patch file from .yarn/patches.

Yarn currently does not provide a command to remove a patch, so you will need to do this manually.

Troubleshooting

eslint parserOptions.project

There is a mismatch between VSCode's ESLint plugin and the ESLint CLI tool. Specifically, the option parserOptions.project is not interpreted the same way by both. If VSCode complains about a file not being "on the project" and underlines an import statement, try adding the following to the plugin's settings:

"eslint.workingDirectories": [
  { "directory": "apps/hash-api", "!cwd": true }
]

Services are not launched because ports are reported as busy

Make sure that ports 3000, 3333, 3838, 5001, 5432, 6379 and 9200 are not used by any other processes. You can test this by running:

lsof -n -i:PORT_NUMBER

TODO: replace lsof with npx ??? A,B,...N for a better DX. Suggestions welcome!

User Registration failing (WSL users)

If you're running the application on Windows through Windows Subsystem for Linux (WSL) you might need to change the registration url in apps/hash-external-services/docker-compose.yml from http://host.docker.internal:5001/kratos-after-registration to http://{WSL_IP}:5001/kratos-after-registration, where WSL_IP is the IP address you get by running:

wsl hostname -I

The kratos and kratos-migrate services will need to be restarted/rebuilt for the change to take effect.

Environment variables

Here's a list of possible environment variables. Everything that's necessary already has a default value.

You do not need to set any environment variables to run the application.

General API server environment variables

  • NODE_ENV: ("development" or "production") the runtime environment. Controls default logging levels and output formatting.
  • PORT: the port number the API will listen on.

AWS configuration

If you want to use AWS for file uploads or emails, you will need to have it configured:

  • AWS_REGION: The region, e.g. us-east-1
  • AWS_ACCESS_KEY_ID: Your AWS access key
  • AWS_SECRET_ACCESS_KEY: Your AWS secret key
  • AWS_S3_UPLOADS_BUCKET: The name of the bucket to use for file uploads (if you want to use S3 for file uploads), e.g. my_uploads_bucket
  • AWS_S3_UPLOADS_ACCESS_KEY_ID: (optional) the AWS access key ID to use for file uploads. Must be provided along with the secret access key if the API is not otherwise authorized to access the bucket (e.g. via an IAM role).
  • AWS_S3_UPLOADS_SECRET_ACCESS_KEY: (optional) the AWS secret access key to use for file uploads.
  • AWS_S3_UPLOADS_ENDPOINT: (optional) the endpoint to use for S3 operations. If not set, the AWS S3 default for the given region is used. Useful if you are using a different S3-compatible storage provider.
  • AWS_S3_UPLOADS_FORCE_PATH_STYLE: (optional) set to true if your S3 setup requires path-style rather than virtual hosted-style S3 requests.

For some in-browser functionality (e.g. document previewing), you must configure an Access-Control-Allow-Origin header on your bucket to be something other than '*'.

File uploads

By default, files are uploaded locally, which is not recommended for production use. It is also possible to upload files to AWS S3.

  • FILE_UPLOAD_PROVIDER: Which type of provider is used for file uploads. Possible values: LOCAL_FILE_SYSTEM or AWS_S3. If choosing AWS_S3, you also need to configure the AWS_S3_UPLOADS_ variables above.
  • LOCAL_FILE_UPLOAD_PATH: Relative path to store uploaded files if using the local file system storage provider. Default is var/uploads (the var folder is the folder normally used for application data).
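
For example, the following .env.local entries (with placeholder values) would switch uploads to S3:

FILE_UPLOAD_PROVIDER=AWS_S3
AWS_REGION=us-east-1
AWS_S3_UPLOADS_BUCKET=my_uploads_bucket
AWS_S3_UPLOADS_ACCESS_KEY_ID=your-access-key-id
AWS_S3_UPLOADS_SECRET_ACCESS_KEY=your-secret-access-key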

Email

During development, the dummy email transporter writes emails to a local folder.

  • HASH_EMAIL_TRANSPORTER: dummy or aws. If set to dummy, the local dummy email transporter will be used during development instead of aws (default: dummy)
  • DUMMY_EMAIL_TRANSPORTER_FILE_PATH: Default is var/api/dummy-email-transporter/email-dumps.yml
  • DUMMY_EMAIL_TRANSPORTER_USE_CLIPBOARD: true or false (default: true)

OpenSearch

NOTE: OpenSearch is currently disabled by default due to issues.

  • HASH_OPENSEARCH_ENABLED: whether OpenSearch is used or not. true or false. (default: false).
  • HASH_OPENSEARCH_HOST: the hostname of the OpenSearch cluster to connect to. (default: localhost)
  • HASH_OPENSEARCH_PASSWORD: the password to use when making the connection. (default: admin)
  • HASH_OPENSEARCH_PORT: the port number that the cluster accepts (default: 9200)
  • HASH_OPENSEARCH_USERNAME: the username to connect to the cluster as. (default: admin)
  • HASH_OPENSEARCH_HTTPS_ENABLED: (optional) set to "1" to connect to the cluster over an HTTPS connection.

Postgres

  • POSTGRES_PORT (default: 5432)

Various services also have their own configuration.

The Postgres superuser is configured through:

  • POSTGRES_USER (default: postgres)
  • POSTGRES_PASSWORD (default: postgres)

The Postgres information for Kratos is configured through:

  • HASH_KRATOS_PG_USER (default: kratos)
  • HASH_KRATOS_PG_PASSWORD (default: kratos)
  • HASH_KRATOS_PG_DATABASE (default: kratos)

The Postgres information for Temporal is configured through:

  • HASH_TEMPORAL_PG_USER (default: temporal)
  • HASH_TEMPORAL_PG_PASSWORD (default: temporal)
  • HASH_TEMPORAL_PG_DATABASE (default: temporal)
  • HASH_TEMPORAL_VISIBILITY_PG_DATABASE (default: temporal_visibility)

The Postgres information for the graph query layer is configured through:

  • HASH_GRAPH_PG_USER (default: graph)
  • HASH_GRAPH_PG_PASSWORD (default: graph)
  • HASH_GRAPH_PG_DATABASE (default: graph)

Redis

  • HASH_REDIS_HOST (default: localhost)
  • HASH_REDIS_PORT (default: 6379)

Statsd

If the service should report metrics to a StatsD server, the following variables must be set.

  • STATSD_ENABLED: Set to "1" if the service should report metrics to a StatsD server.
  • STATSD_HOST: the hostname of the StatsD server.
  • STATSD_PORT: (default: 8125) the port number the StatsD server is listening on.

Snowplow telemetry

  • HASH_TELEMETRY_ENABLED: whether Snowplow is used or not. true or false. (default: false)
  • HASH_TELEMETRY_HTTPS: set to "1" to connect to the Snowplow over an HTTPS connection. true or false. (default: false)
  • HASH_TELEMETRY_DESTINATION: the hostname of the Snowplow tracker endpoint to connect to. (required)
  • HASH_TELEMETRY_APP_ID: ID used to differentiate the application by. Can be any string. (default: hash-workspace-app)

Others

  • FRONTEND_URL: URL of the frontend website for links (default: http://localhost:3000)
  • NOTIFICATION_POLL_INTERVAL: the interval in milliseconds at which the frontend will poll for new notifications, or 0 for no polling. (default: 10_000)
  • HASH_INTEGRATION_QUEUE_NAME: the name of the Redis queue to which entity updates are published
  • HASH_REALTIME_PORT: Realtime service listening port. (default: 3333)
  • HASH_SEARCH_LOADER_PORT: (default: 3838)
  • HASH_SEARCH_QUEUE_NAME: The name of the queue to push changes for the search loader service (default: search)
  • API_ORIGIN: The origin that the API service can be reached on (default: http://localhost:5001)
  • SESSION_SECRET: The secret used to sign sessions (default: secret)
  • LOG_LEVEL: the level of runtime logs that should be emitted; set to debug, info, warn, or error (default: info)
  • BLOCK_PROTOCOL_API_KEY: the api key for fetching blocks from the Þ Hub. Generate a key at https://blockprotocol.org/settings/api-keys.

Contributors

The HASH application's development is overseen by HASH (the company).

As an open-source project, we gratefully accept external contributions and have published a contributing guide that outlines the process. If you have questions, please open a discussion.