
Feature brainstorming #4

Open · 4 of 9 tasks
dmarx opened this issue Apr 9, 2022 · 6 comments
Comments

dmarx (Owner) commented Apr 9, 2022

  • loss scoring
  • multi-perceptor
  • weighted multi-perceptor
  • cutout methods + augmentations? maybe make that an independent library?
  • perceptor weight interpolations/schedules - https://discord.com/channels/729741769192767510/730484623028519072/956979309686423602
  • API should be agnostic wrt media type, i.e. the contrasting modalities could both be text, or one could be audio and the other video, etc.
  • optionally augment w positional information/embeddings?
  • Maybe some minimal translation API to facilitate use by non-English users, and conversely support for non-English models
  • Check for installed/available CLIP, use the vendored copy if not available (see the sketch after this list)
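
A minimal sketch of the last bullet, assuming a hypothetical mmc.vendored module for the bundled copy; this illustrates the fallback idea, not existing code.

# Prefer a locally installed OpenAI CLIP; fall back to a bundled copy otherwise.
try:
    import clip  # the installed OpenAI CLIP package, if present
except ImportError:
    from mmc.vendored import clip  # hypothetical path to a vendored copy
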
dmarx (Owner) commented Apr 9, 2022

Let's not boil the ocean. Goals for the MVP:

  • centralize stuff
  • simplify downloading and installing
    • assuming the target use case is notebooks, we don't even need to worry about updating or checking for existing models for the immediate use case

The MVP is basically just a git clone --recurse-submodules with maybe a few bells and whistles.

dmarx (Owner) commented Apr 9, 2022

imagining usage...

import perceptors as pct

pct.available_models() # list all models
pct.available_models('clip') # pattern match

clip_rn50 = pct.Perceptor('clip_rn50') # load a model
clip_vit16 = pct.Perceptor('clip_vit16') # load another

# combine models for multi-clip
multi_clip = clip_rn50 + clip_vit16

# adjust model-specific weight
multi_clip.set_weight('clip_vit16', .1) # set weight by name
multi_clip.set_weight(0, .5) # set weight by index

# manage models
multi_clip += pct.Perceptor('clip_rn101') # add another model algebraically
multi_clip.bind('clip_vit32') # add another clip model by name
multi_clip.unbind('clip_vit16') # dissociate a bound model by name

text = clip_rn50.tokenize_text('foo bar')
text_emb = clip_rn50.embed_text('foo bar')

img_emb = clip_rn50.embed_image('path/to/image')  # from a file path
img_emb = clip_rn50.embed_image(image_tensor)     # from a torch.Tensor
img_emb = clip_rn50.embed_image(pil_image)        # from a PIL.Image

multi_clip.embed_text('foo bar')
multi_clip.embed_image(pil_image)  # accepts the same image inputs as above
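
A minimal sketch of how the weighted multi-perceptor above might combine its members internally; the MultiPerceptor class, its weight list, and the similarity() method here are assumptions for illustration, not a committed API.

import torch

class MultiPerceptor:
    """Hypothetical wrapper that combines several perceptors with per-model weights."""
    def __init__(self, perceptors, weights=None):
        self.perceptors = list(perceptors)
        self.weights = list(weights) if weights is not None else [1.0] * len(self.perceptors)

    def similarity(self, image, text):
        # Weighted sum of per-model image/text cosine similarities.
        total = 0.0
        for weight, perceptor in zip(self.weights, self.perceptors):
            img_emb = perceptor.embed_image(image)
            txt_emb = perceptor.embed_text(text)
            total = total + weight * torch.cosine_similarity(img_emb, txt_emb, dim=-1)
        return total
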

apolinario (Collaborator) commented Apr 19, 2022

One small issue people had when adding SLIP to many different text-to-image notebooks and codebases was that the input resolution wasn't part of the model.

So you see things like this in Disco Diffusion, for example:

# when using the SLIP Base model, the dimensions need to be hard-coded to avoid
# AttributeError: 'VisionTransformer' object has no attribute 'input_resolution'
try:
    input_resolution = model_stat["clip_model"].visual.input_resolution
except:
    input_resolution = 224

I feel that having a default but user-changeable input resolution per model, for when the model itself doesn't expose one, could be part of the feature list.
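
A minimal sketch of that suggestion, assuming a hypothetical helper function; the attribute lookup mirrors the workaround above, with a per-call user override and a library-wide default.

DEFAULT_INPUT_RESOLUTION = 224  # fallback for models that don't expose a resolution

def get_input_resolution(model, override=None):
    """Return the user override, the model's own value, or the default."""
    if override is not None:
        return override
    return getattr(model.visual, 'input_resolution', DEFAULT_INPUT_RESOLUTION)
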

dmarx (Owner) commented Apr 19, 2022

100%, I've already encountered this issue with other CLIP providers too. I tracked down the code snippet in the original OpenAI release that calculates this, but I like the idea of a default attribute too.

apolinario (Collaborator) commented

Another point in reference to usage:
I feel there could be two ways of using it. One would be very similar to what you wrote under "imagining usage...", but the other could be identical to OpenAI's CLIP. It may be the case that this wouldn't allow for some of the fancy combinations of perceptors (although I feel this could be bridged), but on the other hand it would allow for snappy adoption.

Someone could just replace from CLIP import clip with from mmc import clip and everything would work automatically, with a bunch more perceptors out of the box. It could be an entry point to then say "hey, now that you are using this library, why not replace your custom multi-perceptor code with this one?"
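
A minimal sketch of that drop-in idea, assuming a hypothetical mmc/clip.py shim; for now it simply re-exports OpenAI's clip interface (load, tokenize, and available_models are real functions in that package), so existing notebooks only need to change one import.

# mmc/clip.py (hypothetical): expose an OpenAI-CLIP-compatible surface
from clip import available_models, load, tokenize  # delegate to the installed package

# Existing notebook code would then keep working unchanged:
#   from mmc import clip
#   model, preprocess = clip.load("ViT-B/32")
#   tokens = clip.tokenize(["foo bar"])
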

dmarx (Owner) commented Apr 30, 2022 via email
