Skip to content

feature: Add FIPS compliant hashing algorithm SHA256 #909

@mdambski

Description

@mdambski

Did you check the docs?

  • I have read all the NeMo-Guardrails docs

Is your feature request related to a problem? Please describe.

The library utilizes MD5 for hashing in embeddings cache and knowledge base.

In embeddings cache:

  • The library utilizes two hash key generators. One uses MD5 and other uses Python built-in hash function, which has utilized the SipHash algorithm since Python 3.4, as introduced in PEP456.

For knowledgebase:

Both algorithms (MD5, SipHash) are not approved for use under US federal requirements (FIPS regulation). To ensure the library is usable by companies that need to remain compliant, adding an additional compliant algorithm like SHA-256 as a configuration option is a small change but would make a difference.

Describe the solution you'd like

In nemoguardrails/embeddings/cache.py, add one more (non-default) implementation of the Key Generator which would allow the usage of SHA-256 as an alternative.

In KnowledgeBaseConfig located in nemoguardrails/rails/llm/config.py, allow configuring the hash algorithm of choice (which would allow choosing SHA-256 instead of only MD5) and use it in the KnowledgeBase class for computing the hash.

Describe alternatives you've considered

Alternative supported algorithms are also a viable option.

In case of KnowledgeBase class the most straightforward solution would be just replacing the call to hashlib.md5 with hashlib.sha256 but this would not be backward compatibile and might cause recomputation of knowledge base. This is why I proposed to add configuration option, even though it is only used for naming of the cache file.

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions