AudioLDM2 model reproduction: forward inference #366
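Thanks for your contribution!
There are quite a lot of files, and some of the code files duplicate ones already in the toolkit. Those can be imported directly, for example the gpt2, latent_encoder, and unet folders.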
Config files like this do not need to be uploaded. Configure them uniformly through an interface such as from_pretrained().
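As a rough illustration of the from_pretrained() pattern the reviewer mentions, the sketch below reads a config from a model directory instead of shipping it in the repository. `ExampleConfig`, its fields, and the `config.json` file name are all hypothetical and are not the toolkit's actual API.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    """Hypothetical config class; the field names are illustrative only."""

    sample_rate: int = 16000
    latent_channels: int = 8

    @classmethod
    def from_pretrained(cls, model_name_or_path: str) -> "ExampleConfig":
        # Read config.json from the given directory (assumed file name)
        # rather than committing per-model config files to the repo.
        config_file = os.path.join(model_name_or_path, "config.json")
        with open(config_file, encoding="utf-8") as f:
            values = json.load(f)
        # Keep only the keys this class knows about; others are ignored.
        known = {k: v for k, v in values.items() if k in cls.__dataclass_fields__}
        return cls(**known)
```

With this shape, the caller only passes a model name or directory, and the config travels with the weights.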
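Please provide alignment results for forward inference. This can be a result file, a screenshot showing that the output tensors match, or similar.
inference sample results.zip
These models in the original work differ somewhat from the toolkit's existing models in both structure and inference process, for example roberta-base and gpt2 here. I will simplify them as much as I can.
Converted parameter file (model_state.pdparams) and config file: https://aistudio.baidu.com/datasetdetail/252967
If the model structure is the same and only the inference process differs, you can override just forward. See https://github.com/PaddlePaddle/PaddleMIX/blob/develop/paddlemix/models/qwen_vl/modeling.py#L101
The names in the network definition are not fully aligned. I suggest aligning them with the models that already exist in the toolkit. Where a structure differs from the existing models, the forward function needs to be overridden.
OK, I will revise it again.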
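The suggestion to override only forward can be sketched as below. This is a framework-free stand-in: `ExistingEncoder`, `AudioEncoder`, and the `normalize` step are hypothetical, and in the toolkit the base class would be one of its existing model classes rather than a plain Python class.

```python
class ExistingEncoder:
    """Stand-in for a model class the toolkit already provides (hypothetical)."""

    def __init__(self, scale: float = 2.0):
        self.scale = scale  # plays the role of the shared weights

    def forward(self, x):
        return [self.scale * v for v in x]

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class AudioEncoder(ExistingEncoder):
    """Same structure and weights; only the inference path differs."""

    def forward(self, x, normalize: bool = False):
        out = super().forward(x)
        if normalize:
            # Extra step the reproduced model is assumed to need.
            peak = max(abs(v) for v in out) or 1.0
            out = [v / peak for v in out]
        return out
```

Because the subclass adds no new parameters, weight names stay aligned with the parent class and converted checkpoints can be shared.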
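LGTM. Please update to the latest paddlemix; this will be merged once CI passes.
OK. The autoencoder and unet also need further changes; I should be able to finish them today.
Updated parameters and config: https://aistudio.baidu.com/datasetdetail/257191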
```bash
python run_predict.py \
    --text "Musical constellations twinkling in the night sky, forming a cosmic melody." \
    --model_name_or_path "/home/aistudio/data/data252967" \
```
Do not use a concrete local path here.
Fixed.
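One way to keep machine-specific paths out of both the script and the README is to require the path as a command-line argument. The sketch below assumes only the two flags visible in the excerpt above (`--text` and `--model_name_or_path`); the real run_predict.py may define more, and `parse_args` is an illustrative helper name.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Text-to-audio inference (sketch)")
    parser.add_argument("--text", type=str, required=True, help="Prompt describing the audio.")
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        required=True,
        help="Model name or local directory; supplied by the user, never hard-coded.",
    )
    return parser.parse_args(argv)
```

Documentation can then show a neutral placeholder for the path instead of a directory from one particular machine.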
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
This __init__.py under examples can be removed.
Deleted.
# = sum_i x[i] sinc(pi * orig_freq * ((i - orig_freq) / orig_freq - j / new_freq))
# = sum_i x[i + orig_freq] sinc(pi * orig_freq * (i / orig_freq - j / new_freq))
# so y[j+new_freq] uses the same filter as y[j], but on a shifted version of x by `orig_freq`.
# This will explain the F.conv1d after, with a stride of orig_freq.
Some of these comments can be deleted where appropriate.
Deleted.
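The property stated in the excerpted comments, that y[j+new_freq] uses the same filter as y[j] on an input shifted by `orig_freq`, can be checked numerically. This is a minimal sketch that assumes the unnormalized sinc, sin(t) / t; the helper names and the example frequencies 4 and 5 are illustrative and not taken from the PR.

```python
import math


def sinc(t: float) -> float:
    # Unnormalized sinc, sin(t) / t, equal to 1 at t = 0 (assumed convention).
    return 1.0 if t == 0.0 else math.sin(t) / t


def weight(i: int, j: int, orig_freq: int, new_freq: int) -> float:
    # Weight applied to x[i] when computing y[j], following the comment's formula.
    return sinc(math.pi * orig_freq * (i / orig_freq - j / new_freq))


def max_shift_error(orig_freq: int, new_freq: int, span: int = 8) -> float:
    # Largest difference between the weight for (i, j) and the weight for
    # (i + orig_freq, j + new_freq) over a small grid of indices.
    worst = 0.0
    for j in range(new_freq):
        for i in range(-span, span + 1):
            a = weight(i, j, orig_freq, new_freq)
            b = weight(i + orig_freq, j + new_freq, orig_freq, new_freq)
            worst = max(worst, abs(a - b))
    return worst


print(max_shift_error(orig_freq=4, new_freq=5))  # close to zero
```

Since only `new_freq` distinct filters exist, they can be applied as a strided convolution, which is what the comment's reference to F.conv1d with a stride of `orig_freq` points at.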
# self.time_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["time_pooling_factors"])
# self.freq_pool = max(self.cond_stage_config["crossattn_audiomae_pooled"]["params"]["freq_pooling_factors"])
# self.mae_token_num = int(512/(self.time_pool*self.freq_pool))
This can be deleted.
Deleted.
cond_dict = self.get_input(batch)

# self.model.train()
# print("!!!!!!!!!!!!!train")
Delete this.
Deleted.
@@ -0,0 +1,5 @@
librosa
ppdiffusers
The ppdiffusers package does not need to be listed here. In the README.md, just point readers to the installation instructions at https://github.com/PaddlePaddle/PaddleMIX/blob/develop/README.md?plain=1#L62
Fixed.
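I have left a few more comments on minor fixes. Also, CI has still not passed; is this PR based on the latest paddlemix? It is worth comparing against this unit test script in particular: https://github.com/PaddlePaddle/PaddleMIX/blob/develop/tests/models/test_minigpt4.py#L561 .
Thanks for the review. I just ran git pull --rebase, so my local branch should now be up to date. The test failing in CI, test_save_load in tests.models.test_minigpt4.MiniGPT4VisionModelTest, calls the method inherited from the parent class ModelTesterMixin. Should test_save_load be overridden as pass in the MiniGPT4VisionModelTest class?
We will look into that together later. The current CI failure does not block the merge.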
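For reference, the override proposed above would look roughly like the sketch below. The class names come from the thread, but both bodies are stand-ins: the real ModelTesterMixin and MiniGPT4VisionModelTest live in the PaddleMIX test suite and contain much more.

```python
import unittest


class ModelTesterMixin:
    """Stand-in for the shared mixin; the real one lives in the test suite."""

    def test_save_load(self):
        raise NotImplementedError("generic save/load check not applicable here")


class MiniGPT4VisionModelTest(ModelTesterMixin, unittest.TestCase):
    def test_save_load(self):
        # Override the inherited test so the mixin's check no longer runs.
        pass
```

Decorating the override with `@unittest.skip("reason")` instead of using `pass` would make the test show up as skipped rather than passed, which keeps the gap visible in CI reports.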
Task: PaddlePaddle#250 - text-to-audio inference now runs end to end
Task: #250