This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

[feature request] support configurable hashing and crypto algorithms for Dragonfly #1204

Open
starnop opened this issue Feb 3, 2020 · 5 comments
Labels
areas/security kind/feature kind/feature-request This is a feature request from community for project

Comments

@starnop
Contributor

starnop commented Feb 3, 2020

Why you need it?

I also had another question on the crypto algorithms used in the project - the presentation seemed to indicate a dependence on MD5 for hashing and DES for encryption. Both of those algorithms are very outdated with several published flaws and not recommended for a modern project. We also discussed on the call that the on-block format and layout makes it hard to change these algorithms due to issues with block versioning etc ... Could you provide some info on the plans for moving to more modern algorithms like SHA256 and AES for the project?

FROM Alex Chircop

How it could be?

Other related information

@pouchrobot pouchrobot added kind/feature kind/feature-request This is a feature request from community for project labels Feb 3, 2020
@starnop starnop changed the title [feature request] support configurable algorithms for hashing [feature request] support configurable algorithms for Dragonfly Feb 4, 2020
@starnop
Contributor Author

starnop commented Feb 4, 2020

How to handle the configurable algorithms of Dragonfly

We also discussed on the call that the on-block format and layout makes it hard to change these algorithms due to issues with block versioning etc ...

First, let's take a look at how the current CDN stores data.
The supernode generates a taskID for each file, and the CDN manager creates a directory named after the first three characters of the taskID. For each task, the CDN manager uses three files to store information about the task:

| File Name | File Content | Example |
| --- | --- | --- |
| taskID | The task file content. | 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1 |
| taskID.meta | The task meta info. | 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.meta |
| taskID.md5 | The task hash info. | 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.md5 |

$ ll -h /home/admin/supernode/repo/download/1a0
-rw-r--r-- 1 root root  22M Feb  4 09:23 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1
-rw-r--r-- 1 root root  319 Feb  4 09:23 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.md5
-rw-r--r-- 1 root root  376 Feb  4 09:23 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.meta
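
The directory layout in this listing can be sketched roughly as follows. This is only an illustration of the naming scheme visible above: the `taskPaths` type, the `pathsFor` function, and the `baseDir` parameter are hypothetical names, not the supernode's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// taskPaths holds the three on-disk files kept for one task.
// The type and its field names are hypothetical.
type taskPaths struct {
	Data string // processed block data
	Meta string // task meta info (JSON)
	MD5  string // per-block and whole-file hash info
}

// pathsFor derives the three file locations from a taskID. The directory is
// named after the first three characters of the taskID, as in the listing.
func pathsFor(baseDir, taskID string) (taskPaths, error) {
	if len(taskID) < 3 {
		return taskPaths{}, fmt.Errorf("taskID %q is too short", taskID)
	}
	data := filepath.Join(baseDir, taskID[:3], taskID)
	return taskPaths{Data: data, Meta: data + ".meta", MD5: data + ".md5"}, nil
}

func main() {
	p, err := pathsFor("/home/admin/supernode/repo/download",
		"1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Data)
	fmt.Println(p.Meta)
	fmt.Println(p.MD5)
}
```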

$ cat 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.md5
# block MD5: block content size
ce5584163a368f2856c0a28cdac1a731:4194304
73ac985ba8d37fbc99c3c113c9170e84:4194304
60df935374c85b208c4cb43d1959f331:4194304
35d319f56c4895daa5a9f8427de4340e:4194304
b0a07657a0b1b0e133c79785c63e8724:4194304
2362dcbf5d5294be0b1744f40f0e423a:1048606
# file MD5
4aae22e14d5a70eaa769d3ee50804427
# the SHA1 info of the above data
418fe37595d7d3f9731a6d6b335b275605bdb791
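
A reader for this hash file could look roughly like the sketch below. It assumes the layout is exactly as printed above (`hash:size` lines, then the file MD5, then the SHA1) and it skips the `#` lines, which I treat as annotations. The `pieceHash` and `hashFile` types and the `parseHashFile` function are hypothetical names for illustration, not the supernode's parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// pieceHash is one "hash:size" line of the hash file.
type pieceHash struct {
	MD5  string
	Size int64
}

// hashFile is the parsed content of a taskID.md5 file.
type hashFile struct {
	Pieces  []pieceHash
	FileMD5 string
	SHA1    string
}

// parseHashFile parses "hash:size" lines followed by two bare lines
// (file MD5, then SHA1). Blank lines and lines starting with '#' are skipped.
func parseHashFile(content string) (hashFile, error) {
	var hf hashFile
	var tail []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) == 2 {
			n, err := strconv.ParseInt(parts[1], 10, 64)
			if err != nil {
				return hf, fmt.Errorf("bad block size in %q: %w", line, err)
			}
			hf.Pieces = append(hf.Pieces, pieceHash{MD5: parts[0], Size: n})
			continue
		}
		tail = append(tail, line)
	}
	if err := sc.Err(); err != nil {
		return hf, err
	}
	if len(tail) != 2 {
		return hf, fmt.Errorf("expected file MD5 and SHA1 lines, got %d", len(tail))
	}
	hf.FileMD5, hf.SHA1 = tail[0], tail[1]
	return hf, nil
}

// totalSize sums the block content sizes.
func (hf hashFile) totalSize() int64 {
	var sum int64
	for _, p := range hf.Pieces {
		sum += p.Size
	}
	return sum
}

func main() {
	content := `ce5584163a368f2856c0a28cdac1a731:4194304
73ac985ba8d37fbc99c3c113c9170e84:4194304
60df935374c85b208c4cb43d1959f331:4194304
35d319f56c4895daa5a9f8427de4340e:4194304
b0a07657a0b1b0e133c79785c63e8724:4194304
2362dcbf5d5294be0b1744f40f0e423a:1048606
4aae22e14d5a70eaa769d3ee50804427
418fe37595d7d3f9731a6d6b335b275605bdb791`
	hf, err := parseHashFile(content)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(hf.Pieces), hf.totalSize(), hf.FileMD5)
}
```

For the sample data, the block sizes add up to 22020126, which matches the `fileLength` field of the meta file rather than `httpFileLen` (22020096).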

$ cat 1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1.meta | jq 
{
  "taskID": "1a09b5b0c0bb42c2f87b217b71a637c766ee8c598ba3790025d83212f2e7dce1",
  "url": "http://127.0.0.1:8001/randomFile",
  "pieceSize": 4194304,
  "httpFileLen": 22020096,
  "bizId": "",
  "accessTime": 1580806876806,
  "interval": 0,
  "fileLength": 22020126,
  "md5": "",
  "realMd5": "4aae22e14d5a70eaa769d3ee50804427",
  "lastModified": 1580778763000,
  "eTag": "\"5e38c50b-1500000\"",
  "finish": true,
  "success": true
}

NOTE: The task file does not store the original raw data; it sequentially stores all the block data after processing. The structure of each block is shown in the following figure:
image

Given how the CDN stores data, we prefer to configure and record both the hash algorithm and the encryption algorithm in the task meta file by adding new fields to the task meta. Configuration items that may be added in the future can be recorded the same way.

  • It will not break compatibility.
  • Flexibility and scalability can also be achieved through task meta.

At present, the task meta struct is as follows:

type fileMetaData struct {
	TaskID       string `json:"taskID"`
	URL          string `json:"url"`
	PieceSize    int32  `json:"pieceSize"`
	HTTPFileLen  int64  `json:"httpFileLen"`
	Identifier   string `json:"bizId"`
	AccessTime   int64  `json:"accessTime"`
	Interval     int64  `json:"interval"`
	FileLength   int64  `json:"fileLength"`
	Md5          string `json:"md5"`
	RealMd5      string `json:"realMd5"`
	LastModified int64  `json:"lastModified"`
	ETag         string `json:"eTag"`
	Finish       bool   `json:"finish"`
	Success      bool   `json:"success"`
}
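
To illustrate why new fields in the task meta should not break compatibility, here is a minimal sketch. The `hashAlgorithm` and `encryptAlgorithm` fields, the trimmed `taskMeta` type, and the `loadTaskMeta` function are all hypothetical; nothing here is decided yet. It assumes that a meta file written before the fields existed is read as MD5 with no encryption, and the shortened taskID is only for readability.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskMeta is a trimmed, illustrative subset of fileMetaData.
// HashAlgorithm and EncryptAlgorithm are hypothetical new fields.
type taskMeta struct {
	TaskID           string `json:"taskID"`
	PieceSize        int32  `json:"pieceSize"`
	RealMd5          string `json:"realMd5"`
	HashAlgorithm    string `json:"hashAlgorithm,omitempty"`
	EncryptAlgorithm string `json:"encryptAlgorithm,omitempty"`
}

// loadTaskMeta decodes a meta file and fills in defaults for files that were
// written before the new fields existed.
func loadTaskMeta(data []byte) (taskMeta, error) {
	var m taskMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return m, err
	}
	if m.HashAlgorithm == "" {
		// Assumption: old meta files imply the current behaviour, MD5.
		m.HashAlgorithm = "md5"
	}
	// An empty EncryptAlgorithm is read as "no encryption".
	return m, nil
}

func main() {
	// A meta file written before the new fields existed.
	old := []byte(`{"taskID":"1a09","pieceSize":4194304,"realMd5":"4aae22e14d5a70eaa769d3ee50804427"}`)
	// A meta file written with the hypothetical new fields.
	cur := []byte(`{"taskID":"1a09","pieceSize":4194304,"hashAlgorithm":"sha256","encryptAlgorithm":"aes"}`)
	for _, raw := range [][]byte{old, cur} {
		m, err := loadTaskMeta(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("hash=%s encrypt=%q\n", m.HashAlgorithm, m.EncryptAlgorithm)
	}
}
```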

Use configurable algorithms for hashing, not only MD5

As dragonflyoss/dragonfly#1205 (comment) mentioned, Dragonfly uses MD5 and SHA1 to guarantee data integrity. However, hashing also costs some performance, and SHA256 would take more time to compute.

So far we have, in a way, preferred performance over security. That may not be the best choice for everyone, so we should make the hash algorithm configurable.

Let's look at an example with configurable algorithms for hashing:

  1. Start the supernode with the specified hash algorithm, e.g. MD5 or SHA256.
  2. dfget starts to download a file.
    2.1 dfget registers with the supernode and gets the register result, which includes the hash algorithm used by the supernode.
    2.2 The supernode triggers the CDN if the task does not exist yet, and the CDN uses the specified hash algorithm.
  3. When dfget succeeds in downloading a block, it checks the block content with the hash algorithm that was returned when it registered with the supernode.
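
The client-side check in step 3 could be sketched like this. It is a minimal sketch under assumptions: the algorithm arrives as a plain string in the register result, and the `newHasher` and `verifyBlock` names are hypothetical. Only Go standard library hash packages are used.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"strings"
)

// newHasher maps an algorithm name to a hash implementation. The set of
// supported names here is an assumption for illustration.
func newHasher(name string) (hash.Hash, error) {
	switch strings.ToLower(name) {
	case "md5":
		return md5.New(), nil
	case "sha1":
		return sha1.New(), nil
	case "sha256":
		return sha256.New(), nil
	default:
		return nil, fmt.Errorf("unsupported hash algorithm %q", name)
	}
}

// verifyBlock hashes the block with the given algorithm and compares the
// hex digest with the expected value.
func verifyBlock(algorithm string, block []byte, want string) error {
	h, err := newHasher(algorithm)
	if err != nil {
		return err
	}
	h.Write(block) // hash.Hash.Write never returns an error
	got := hex.EncodeToString(h.Sum(nil))
	if !strings.EqualFold(got, want) {
		return fmt.Errorf("%s mismatch: got %s, want %s", algorithm, got, want)
	}
	return nil
}

func main() {
	block := []byte("hello")
	// The algorithm name would come from the register result.
	err := verifyBlock("sha256", block,
		"2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824")
	fmt.Println("sha256 check error:", err)
	err = verifyBlock("md5", block, "00000000000000000000000000000000")
	fmt.Println("md5 check error:", err)
}
```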

Use configurable algorithms for encryption, not only DES

In fact, we plan to split the scheduler and the CDN modules of the supernode. In the new CDN architecture, the CDN manager will process each piece with the Post Processor List, which includes encryption processing. The encryptor will use the algorithm specified by the CDN config, which will also be stored in the task meta file and sent along with the register result.

image
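
As a rough idea of what a post processor list with an encryption step might look like, here is a sketch. The `PostProcessor` interface, `runPostProcessors`, and `aesGCMEncryptor` are hypothetical names, and AES-GCM is only my choice for the example; the actual interface and algorithm are not decided. Key handling is out of scope, and the fixed key below is for demonstration only.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// PostProcessor transforms one piece before it is stored (hypothetical).
type PostProcessor interface {
	Process(piece []byte) ([]byte, error)
}

// aesGCMEncryptor encrypts pieces with AES-GCM (example algorithm only).
type aesGCMEncryptor struct{ aead cipher.AEAD }

func newAESGCMEncryptor(key []byte) (*aesGCMEncryptor, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &aesGCMEncryptor{aead: aead}, nil
}

// Process returns nonce || ciphertext || tag, using a fresh random nonce.
func (e *aesGCMEncryptor) Process(piece []byte) ([]byte, error) {
	nonce := make([]byte, e.aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return e.aead.Seal(nonce, nonce, piece, nil), nil
}

// open reverses Process; a client would need something like this.
func (e *aesGCMEncryptor) open(sealed []byte) ([]byte, error) {
	n := e.aead.NonceSize()
	if len(sealed) < n {
		return nil, fmt.Errorf("sealed piece is too short")
	}
	return e.aead.Open(nil, sealed[:n], sealed[n:], nil)
}

// runPostProcessors applies each processor in order to the piece.
func runPostProcessors(piece []byte, list []PostProcessor) ([]byte, error) {
	out := piece
	for _, p := range list {
		var err error
		if out, err = p.Process(out); err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	key := bytes.Repeat([]byte{0x42}, 32) // demo key only, never hard-code keys
	enc, err := newAESGCMEncryptor(key)
	if err != nil {
		panic(err)
	}
	piece := []byte("piece content")
	sealed, err := runPostProcessors(piece, []PostProcessor{enc})
	if err != nil {
		panic(err)
	}
	plain, err := enc.open(sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(plain, piece))
}
```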

@allencloud allencloud changed the title [feature request] support configurable algorithms for Dragonfly [feature request] support configurable hashing and crypto algorithms for Dragonfly Feb 7, 2020
@chira001

Thank you for the feature request and for clarifying how compatibility could be integrated.

@justincormack

I am confused by the reference to DES here, as I can't find any DES code, can someone point me at it?

@chira001

I am confused by the reference to DES here, as I can't find any DES code, can someone point me at it?

See the project presentation recording at the 38:00 mark: https://www.youtube.com/watch?v=Sl7ZDt7rx4M

@lowzj
Member

lowzj commented Mar 15, 2020

I am confused by the reference to DES here, as I can't find any DES code, can someone point me at it?

There may be a small mistake in the question. At present, Dragonfly uses MD5 and SHA1 to guarantee data integrity, but it does not use DES. Making it easy to encrypt data, with a configurable encryption algorithm, is planned for Dragonfly's new architecture. This comment describes the detailed information.
