Solving issue of different ckpt files with same hash #2459
Conversation
@raefu had a nice and consistently fast solution to this: hashing the zip directory section at the end of the file, so it's a hash of the attributes and CRCs of all the contents. |
@dfaker cool, but I don't know how to do it, so someone will need to help with that solution. |
Silly question - would this break existing hashes stored in images? If so, then I'd definitely want an option to enable this or not. Or use a "hashv2" param in infotext or something. |
@d8ahazard yes, the new hash will be different from the old one. There is an option to enable this; the default is disabled. Your idea to add something to the info text to differentiate is good, I think we should discuss it and find a good way to indicate which hash method was used. Some options:
|
it's a small section at the end of the file, starting with the signature 0x02014b50, so taking the last MB of the .pt should capture it. |
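For reference, a minimal sketch of that suggestion (illustrative only, not any actual implementation from this PR): scan the last MB of the file for the central-directory signature and hash from there to the end. Function and variable names are just placeholders, and the naive `find()` assumes the signature doesn't also occur in stray payload bytes inside the window.

```python
import hashlib
import os

CD_SIG = b"\x50\x4b\x01\x02"  # central directory file header signature 0x02014b50, little-endian on disk

def hash_zip_directory_tail(path, window=1024 * 1024):
    """Hash everything from the first central-directory entry found in the last `window` bytes to end of file."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - window), os.SEEK_SET)
        tail = fh.read()
    start = tail.find(CD_SIG)
    if start == -1:
        raise ValueError("central directory signature not found in the last MB; not a zip-based checkpoint?")
    return hashlib.sha256(tail[start:]).hexdigest()
```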
the biggest problem is what we do with all current model hashes |
I like the
Just riffing off the top of my head here and haven't fully thought this through, but if the original hashing method is kept alongside this new method (particularly if the new method designates itself as a v2/etc in the hash in some identifiable way), then presumably it would be possible to look up the hash either in the 'old' (current) v1 way or the new v2 way. There will still be edge cases where the current v1 hash clashes for distinct models obviously, but in those cases perhaps it could just show a list of the models that match, and potentially offer to upgrade the embedded hash to the v2 hash. (I'm not actually familiar with the workflow around how the hashes are used, so if the above doesn't match the reality of how they're used, adjust/ignore as appropriate) eg.
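The example that originally followed isn't shown here; purely as an illustration of that fallback idea (all names, the "v2:" prefix, and the data shape are hypothetical):

```python
def lookup_model(stored_hash, models):
    """Illustrative only: try the (hypothetical) v2 hash first, then fall back to the old v1 hash.

    `models` is assumed to map a model name to {"hash_v1": ..., "hash_v2": ...},
    and a v2 hash is assumed to mark itself with a "v2:" prefix.
    """
    if stored_hash.startswith("v2:"):
        return [name for name, h in models.items() if h["hash_v2"] == stored_hash]
    # Old v1 hashes can collide, so return every candidate and let the UI show the list
    return [name for name, h in models.items() if h["hash_v1"] == stored_hash]
```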
|
Also, just as a 'prior art' reference, this (hashing the entire model file) seems to be how InvokeAI handles it:

```shell
# ~/dev/stable-diffusion/InvokeAI/models/ldm/stable-diffusion-v1
⇒ ls
model.ckpt model.sha256

⇒ cat model.sha256
fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556%

⇒ time sha256sum --binary model.ckpt
fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556 *model.ckpt
sha256sum --binary model.ckpt  18.98s user 0.60s system 99% cpu 19.728 total
```
|
@0xdevalias thank you, I was thinking the same, I just didn't have time to code it. I will take a look at that InvokeAI code to see if it helps. |
So it sounds like you could probably open the `.ckpt` as a zip and read the central directory entries directly. Some quick Google/StackOverflow results with some example code:
Though this might be a little 'low-level', and it may just be worth seeing if it's possible to use an existing Python zip lib such as: |
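A minimal sketch of that zipfile-based route mentioned above (assumptions: the `.ckpt` really is a plain zip, and hashing the per-entry name/CRC/size attributes already stored in the central directory is enough to distinguish models):

```python
import hashlib
import zipfile

def hash_ckpt_central_directory(path):
    """Hash the per-entry attributes (name, CRC32, size) that the zip central directory already stores."""
    h = hashlib.sha256()
    with zipfile.ZipFile(path) as zf:
        for info in sorted(zf.infolist(), key=lambda i: i.filename):  # sort for a deterministic digest
            h.update(f"{info.filename}:{info.CRC:08x}:{info.file_size}".encode("utf-8"))
    return h.hexdigest()
```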
So my brain got curious, and decided to dive into writing a little PoC script for this. This script will efficiently read the Zip Central Directory from the end of the `*.ckpt` weights file and calculate its sha256 hash. Running it looks like this:

```shell
⇒ ./quick-zip-sha256hash.py
>> *.ckpt file looks good!
>> Calculating sha256 hash of the Zip Central Directory from the *.ckpt weights file
>> sha256(ckpt_cd) = 685bf114177d8ed310eead5838d4ca5aa6e396a64ab978ca91a0dbfcb6247f02 (0.00s)
```

The code also writes the hash out to a `model.sha256-cd` file:

```shell
⇒ cat model.sha256-cd
685bf114177d8ed310eead5838d4ca5aa6e396a64ab978ca91a0dbfcb6247f02%
```

Note that … |
@0xdevalias Sorry, your code didn't work for other models; I got an error for the model |
I don't have that particular model, but you can see the code that raises that error here. Essentially it seeks to the file offset that the EOCD told it should be the start of the CD record, then attempts to read in the length of the record's bytes. It then checks the first 4 bytes to see if they match the 'magic number' that defines the CD record start, and if not, throws that error:

```python
# Seek to where we expect the Central Directory (CD) record to start and read it in
fh.seek(ckpt_eocd_cd_offset, os.SEEK_SET)
ckpt_cd = fh.read(ckpt_eocd_cd_size_bytes)

# https://en.wikipedia.org/wiki/ZIP_(file_format)#Central_directory_file_header
ckpt_cd_sig = ckpt_cd[0:4]
if ckpt_cd_sig != cd_sig:
    raise Exception("Didn't find the *.ckpt Zip file Central Directory (CD) signature where we expected to. Is the *.ckpt corrupted?")
```

Original message I wrote before I realised I read the error you pasted wrong; in case it's helpful still:

I don't have that particular …

```python
if ckpt_eocd_sig != eocd_sig:
    raise Exception("Didn't find the *.ckpt Zip file End of Central Directory (EOCD) signature where we expected to. Is the *.ckpt corrupted, or does the Zip file have comments in it?")

# NOTE: If the Zip file has comments, then you'd need to seek further back in chunks, and search for the EOCD signature
```

Shouldn't be too hard to implement, as all the bits and pieces are already there. But I'll leave that as an exploration/exercise for the reader to implement :) (aka: feel free to iterate on my proof of concept to make it more robust/cover edge cases/etc) |
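For anyone who wants to pick up that reader's exercise, a minimal sketch of a comment-tolerant EOCD search (not the PoC code above): instead of assuming a fixed 22-byte record at the end of the file, read back far enough to cover the maximum zip comment length and search for the signature.

```python
import os

EOCD_SIG = b"\x50\x4b\x05\x06"  # End of Central Directory signature (0x06054b50, little-endian on disk)
EOCD_MIN_SIZE = 22              # EOCD record size without the optional trailing comment
MAX_COMMENT = 0xFFFF            # zip comments are at most 65535 bytes

def find_eocd_offset(fh):
    """Return the file offset of the EOCD record, tolerating a trailing zip comment."""
    fh.seek(0, os.SEEK_END)
    file_size = fh.tell()
    # Read at most EOCD + max comment length from the end of the file
    read_size = min(file_size, EOCD_MIN_SIZE + MAX_COMMENT)
    fh.seek(file_size - read_size, os.SEEK_SET)
    tail = fh.read(read_size)
    pos = tail.rfind(EOCD_SIG)
    if pos == -1:
        raise ValueError("EOCD signature not found; file is probably not a zip-based *.ckpt")
    return file_size - read_size + pos
```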
My approach is to sum the CRC32s of the files inside the /archive folder and subdirectories (/archive/data). See #4478. To be clear, it's more of a quick and dirty content-hash rather than a true hash, but for my intended purpose it's fast and generates unique values.

The reason being, I would like there to be support for an embedded diffusion-specific metadata file, containing info about the model, most especially the trigger words and descriptions. By limiting the hash function to model-specific files, we can freely inject and modify metadata without worrying about breaking the hash. The hash function can be a sum of CRC32s, or a SHA256 of the concatenation, whichever is more unique. From my limited tests, a CRC32 sum of all model files is good enough to produce unique hashes for most checkpoints, even ones with similar base trainings, and it is as fast as the current method.

The SD community sorely needs to embed metadata into ckpts, with the explosion of different models and their trained triggers. Currently if you download a dreambooth model, there is no way to know the triggers unless you find the original download site or the post on Reddit that the author wrote. Having embedded metadata that documents the triggers and other info will be extremely useful to the community.

I also suggest keeping the existing hash, but adding a hash-v2 in the UI and PNGInfo to avoid breaking existing hashes. Eventually(?) we can phase out the old hash. Here's my implementation of the v2 hash:
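The actual implementation isn't shown here (see #4478); a minimal sketch of that kind of content hash, assuming the zip-based `.ckpt` keeps its payload under an `archive/` prefix and that a hypothetical `metadata.json` should be excluded so it can be edited freely:

```python
import zipfile

def crc_content_hash(path, skip=("metadata.json",)):
    """Quick-and-dirty content hash: sum the CRC32s the zip already stores for the model payload entries."""
    total = 0
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            # torch .ckpt zips are assumed here to keep their payload under an "archive/" prefix
            if not info.filename.startswith("archive/"):
                continue
            # skip any (future) metadata file so it can be edited without changing the hash
            if info.filename.split("/")[-1] in skip:
                continue
            total = (total + info.CRC) & 0xFFFFFFFF  # keep the sum a 32-bit value
    return f"{total:08x}"
```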
|
I can confirm that @RupertAvery's way of hashing produces unique hashes, and it is very fast. I also like the idea of adding metadata to the ckpt files, as they are proliferating in number and becoming increasingly difficult to organize. Can metadata be added to the textual inversion embeddings, hypernetwork embeddings, and aesthetic gradients too? Or would this have to be a separate file? Should these also have a hash to uniquely identify them? |
@RupertAvery @Jonseed That's basically the same solution proposed earlier in this PR:
And PoC implemented by me above:
@RupertAvery See prior discussions above in this PR about exactly that:
👏🏻👌🏻 Agreed. |
I made a new PR #4546 with the code of this PR and the code suggested by @RupertAvery in #2459 (comment). @RupertAvery just to be clear, I made this PR before your feature request #4478. The proposal of this PR is to solve the issue of different ckpt files having the same hash while keeping backward compatibility, which this PR does. If you wish to add a metadata file inside the ckpt file, feel free to open a PR with that code later. |
So I just tried loading a checkpoint with a file … It's an easy fix, not one I'm entirely happy with.
|
This problem still exists, right? |
With safetensors, the proposed method of hashing the CRCs no longer works, because safetensors aren't zip files. This leads to the question: how can we hash safetensors properly? @JustMaier I've also thought about indexing the Civitai models, and whether it's possible to use just the part of the file necessary to compute the hashes, or maybe ask Civitai to expose the hashes in an API |
So I'm the guy behind Civitai. We don't have the hashes right now because I wasn't sure what was going to be done about this issue and I didn't want to pull everything down for hashing more than once. Since CRC hashing isn't an option for safetensors, I think the standard should be SHA256. If I understand correctly, the concern with that is that it will take longer to compute, right? |
Hi! The problem with computing hashes is that Automatic1111 does it in real time, and that's probably why such a shortcut method was used.

If we're going to add a new format anyway, i.e. safetensors, then I advocate creating a standard container for checkpoints, safetensor OR ckpt, with metadata to say which it is, what the hash is, and a whole lot of space for author information. Just a 2MB header with space for plaintext JSON metadata and a precomputed SHA256 would be good enough. Though, 2MB is probably overkill. Having that empty padded space would allow authors to freely edit their metadata without having to repack the actual weights. Also, the SHA256 is just there for anyone who wants to actually check it.

We just need tools to enable authors to move to the new format, and support from the WebUIs to read it. Like they say, build it, and they will come. It's an additional burden on finetuners, but hopefully dreambooth UIs can integrate this into their process.

It isn't even a new format, it's just an additional header, with the actual data at an offset. I don't know how easy it is for pytorch to load a file from an offset instead of directly, though. If it is possible, we don't even have to break anything; it's just an alternate way of loading checkpoints into memory. Does anybody know if this is feasible?

Also, what's a good extension for this container format? .diffusion? |
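To make the container idea above concrete, a rough sketch of one possible layout (entirely hypothetical, nothing here is an agreed spec): a fixed-size, zero-padded header reserved for plaintext JSON metadata, followed by the unchanged weights payload at a known offset.

```python
import json

HEADER_SIZE = 2 * 1024 * 1024  # hypothetical 2 MB reserved for metadata, as floated above

def write_container(weights_path, out_path, metadata):
    """Write [padded JSON header][original weights] so metadata can be edited without repacking the weights."""
    header = json.dumps(metadata).encode("utf-8")
    if len(header) > HEADER_SIZE:
        raise ValueError("metadata too large for the reserved header")
    with open(weights_path, "rb") as src, open(out_path, "wb") as dst:
        dst.write(header.ljust(HEADER_SIZE, b"\x00"))  # zero-pad so the payload always starts at HEADER_SIZE
        while chunk := src.read(1024 * 1024):
            dst.write(chunk)

def read_metadata(path):
    """Read back just the JSON header, ignoring the padding."""
    with open(path, "rb") as fh:
        return json.loads(fh.read(HEADER_SIZE).rstrip(b"\x00"))
```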
I like the idea of this, there is plenty of metadata that could and should be included in models (think merge tracking etc); the challenge would be propagating the standard. I think this could be helped with easy-to-use python packages that make implementing the standard easier, and additional tools that can be used by end users to add metadata to existing AI art resources (checkpoints, embeds, hypernetworks, etc). Additionally, I'd like to think that if we implemented it as part of Civitai, we could automatically apply this metadata to the 1,284 resources we're currently housing and help start the trend. One bonus of coming up with a metadata standard is that it can continue to be used as new checkpoint formats or other AI art resources are released. For example, LORAs just hit the scene; wouldn't it be great if they were able to include the same metadata format? |
Yea, and we could extend this concept to embeddings, hypernetworks. Different extension perhaps, but having a header there. I'm the author of Diffusion Toolkit, https://github.com/RupertAvery/DiffusionToolkit and I see that as a way to make it accessible to Windows users. I plan to put more checkpoint-related tooling into it anyway. |
I wonder if there is a way to include the metadata without making the files not work in tools that don't support the metadata, so that adoption doesn't have to be blocked by "waiting for my favorite generator to support the format".

Great tool btw. I need to dig into it to see how you've pulled out the metadata from each of those tools. I need the same thing on Civitai; right now it only supports AUTO metadata. Got a file I should look at?

Also, if you're serious about this, we should start a proposal somewhere to see if we can gather any input and get a few tool maintainers on board. |
That was my original goal with ckpts: since they are zip files, they can contain anything without breaking the loading, you just have to tweak some scripts to allow the file we're going to add. The possibility of that went away with safetensors. Another way of course is just to have a .json file next to the ckpt. Instant "metadata". Even then, we still have to somehow add support to GUIs to read the metadata and display it somewhere useful. It would help if we could get someone already familiar with gradio and A1111's GUI code on board, like maybe an extension author. I'm willing to dive in, but I'm a little busy with other things right now.
Everything is in Metadata.cs. If you look at the closed issues, there will be images there with test data.
We'll just have to try to push this forward as much as possible by making a branch and promoting it (like, telling anyone willing to try the fork). Unfortunately, not everyone will be git-savvy or waiting for us to merge the constant influx of commits from the main repo.

By including metadata authoring in Diffusion Toolkit, I hope to generate some hype for it. I could start with storing it in a side-along JSON file, or in the database. It's only useful to Diffusion Toolkit, but at least users can manage their models a bit. I actually thought of loading sample images and other information from Civitai when viewing models in Toolkit, and was hoping to reach out to you for that, but I don't know if that's okay (probably isn't). You have a great site and it really contributes to making models searchable and accessible, and to documenting their information (triggers) where possible.

I have started something by building a file wrapper that SHOULD make it so that, when it gets read by the consumer, it offsets everything. It kind of works; right now I'm testing it with an offset of zero, so I don't have to actually offset everything, just a proof of concept. It works for the zip file loader part (I'm testing it on a ckpt renamed to .diffusion) but as soon as it gets into torch.load, I get this:
This is the wrapper
And this is where I inject it in
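Neither the wrapper nor the injection point is shown above; a minimal sketch of the general idea (a file-like object that shifts every seek/read past a metadata header before the stream is handed to torch.load) might look roughly like this, with HEADER_SIZE purely hypothetical:

```python
import os

class OffsetFile:
    """File-like wrapper that hides a fixed-size header so the consumer only ever sees the payload."""

    def __init__(self, path, offset):
        self._fh = open(path, "rb")
        self._offset = offset
        self._fh.seek(offset)

    def read(self, size=-1):
        return self._fh.read(size)

    def seek(self, pos, whence=os.SEEK_SET):
        # Absolute positions are shifted by the header size; relative/end seeks pass through unchanged
        if whence == os.SEEK_SET:
            return self._fh.seek(pos + self._offset) - self._offset
        return self._fh.seek(pos, whence) - self._offset

    def tell(self):
        return self._fh.tell() - self._offset

    def readable(self):
        return True

    def seekable(self):
        return True

    def close(self):
        self._fh.close()

# In principle torch.load accepts a file-like object, so the idea would be something like:
#   state_dict = torch.load(OffsetFile("model.diffusion", HEADER_SIZE), map_location="cpu")
```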
This is going way off topic of course. We should definitely start an issue or something somewhere else, focused on making a container format and defining its spec. But where, so that we can still get good visibility from other devs? |
See this repository for more information and as a place for discussion on the proposed container format: |
The two hashes are different:
`filename + '.sha256'`
|
All of this PR was made before safetensors existed in this project. This PR isn't going anywhere, because auto didn't show interest in a new hash method, so I won't bother updating it until there is some chance of a new hash method being accepted in this repo. |
This extension is working well |
Issue solved by a95f135 |
Issue
This issue may happen with specific ckpt files when merged with interpolation pairs that add up to 1, as you can see in the screenshot below:
First I thought it might just be the first characters that matched, but it turns out the whole hash is equal, as you can see:
Checking the files proves their contents are different:
So only the bytes being used to create the hash are equal, as I could verify:
Solution
To solve this issue I added an option to create the hash using the entire model and save it to a `.sha256` file next to the `.ckpt` file on the first execution, reading from it on the following executions.

The content of the `.sha256` file follows the default sha256 format, allowing it to be checked using the command `sha256sum -c model.sha256`, and you can even create the `.sha256` file yourself using a command like `sha256sum -b model.ckpt > model.sha256` or equivalent and upload it to the models folder along with the ckpt to avoid the hash generation time on the first start.

This solves the issue of different ckpt files with the same hash without breaking anything for those who don't have this issue; only the first execution is slower, and the following executions are as fast as the default hash code.
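For illustration, a minimal, simplified sketch of that flow (not the actual PR code; the `model.sha256` naming follows the examples above and the helper name is just a placeholder): hash the whole file in chunks on the first run and cache the digest next to the model in sha256sum-compatible format.

```python
import hashlib
import os

def model_sha256(ckpt_path):
    """Return the full-file sha256, reading a cached <model>.sha256 file if present, writing it otherwise."""
    sha_path = os.path.splitext(ckpt_path)[0] + ".sha256"
    if os.path.exists(sha_path):
        with open(sha_path, "r") as fh:
            return fh.read().split()[0]  # first field of the sha256sum-style line
    h = hashlib.sha256()
    with open(ckpt_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MB chunks to keep memory use flat
            h.update(chunk)
    digest = h.hexdigest()
    with open(sha_path, "w") as fh:
        fh.write(f"{digest} *{os.path.basename(ckpt_path)}\n")  # matches `sha256sum -b` output
    return digest
```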
The new hashes for comparison:
Environment this was tested in