Downloading from Myportfolio #95

Closed
RonaldBolado opened this issue Jul 14, 2018 · 7 comments

Comments

@RonaldBolado

RonaldBolado commented Jul 14, 2018

Hello mikf, this is more of a question than an issue, but GitHub lacks private messaging, so I'm asking here. Don't know if you'd wanna help, but it doesn't hurt to ask.

Do you have any idea how I could download original images from Adobe's myportfolio.com? I want to download the originals from this gallery https://hannahcosgrove.myportfolio.com/chloe but I can't crack the URL no matter what I do.

For example, the fourth image's URL in that gallery is:

https://pro2-bar.myportfolio.com/v1/assets/fd9fa03abcd36223281f9896f812fb93/aa98eca7-087d-42b6-bfc4-a8b2b0d3a3e2_rw_600.jpg?h=cc86793c96bdf2d8ebd2b9f345e2ebd5

But through Google Images I was able to find a higher-res version of it:

https://pro2-bar-s3-cdn-cf.myportfolio.com/fd9fa03abcd36223281f9896f812fb93/b370dc78-d71f-45d8-8eb9-7a62383f3d18_rwc_0x781x3289x3289x3289.jpg?h=2f2696760729f970b4a86b5112f5c22c

It looks like the URL has two IDs (separated by "?h=") that have to match in order to display the original image, though I'm not sure. The image's filename on the server also carries some odd information (0x781x3289x3289x3289.jpg), and I don't know how to fetch the original. Also, as you will notice, the one I got through Google Images is in the original resolution but cropped. So could there be cropped and uncropped versions in different resolutions that you can get through the right URL?
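
To make the URL anatomy described above concrete, here is a minimal Python sketch that splits such an image URL into its visible parts. The function name `split_asset_url` and the labels `uuid` and `variant` are made up for illustration; the meaning of the `h` value is not established here, and the sketch only separates it out.

```python
import re
from urllib.parse import urlsplit, parse_qs


def split_asset_url(url):
    """Split a myportfolio image URL into host, file name parts and h value.

    The field names are illustrative labels, not official terminology.
    """
    parts = urlsplit(url)
    # The last path segment is the file name, e.g. "<uuid>_rw_600.jpg".
    filename = parts.path.rsplit("/", 1)[-1]
    match = re.fullmatch(r"([0-9a-f-]{36})(?:_(.+))?\.(\w+)", filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    uuid, variant, extension = match.groups()
    return {
        "host": parts.netloc,
        "uuid": uuid,
        "variant": variant,  # None when the file name has no suffix
        "extension": extension,
        "h": parse_qs(parts.query).get("h", [None])[0],
    }
```

Applied to the two URLs above, this gives the variants `rw_600` and `rwc_0x781x3289x3289x3289`, which suggests that the suffix selects a resized or cropped rendition of the file.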

Since you made this software for scraping images, I figured you'd have an insight or two. Do you have a clue as to how to download the original pics?

Thanks a lot.

@mikf
Owner

mikf commented Jul 14, 2018

Hmm, I managed to find the original URLs. Here is the one for the 4th picture:
https://pro2-bar-s3-cdn-cf6.myportfolio.com/fd9fa03abcd36223281f9896f812fb93/aa98eca7-087d-42b6-bfc4-a8b2b0d3a3e2.jpg?h=0c8b03bdd125d846670e88c4e7e7b34c

As you might notice, this URL has no _rw_600, _rw_1200 or whatever at its end, but you have to find the correct ?h=... value. I was about to write a big paragraph about how this is probably an MD5 hex-digest and so on, but, as it turns out, the whole site is just one big Lightbox gallery and all the original image URLs are in the data-src attributes of a few div elements:

    <div class="js-lightbox" data-src="https://pro2-bar-s3-cdn-cf5.myportfolio.com/fd9fa03abcd36223281f9896f812fb93/6ba271fa-9fef-457d-9857-41208fc2f2f7.jpg?h=c1ba7d8ec8cdeb83afb3af2d5ac0bb18">

The higher-res version you found with Google Images is the <meta property=og:image ...> value of the page.
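
As a rough illustration of the markup described above, the following Python sketch collects the data-src values of all elements with the js-lightbox class, plus the og:image value, using only the standard library. The names `LightboxParser` and `extract_images` are hypothetical, and this is not how gallery-dl does it; it also assumes the page HTML has already been downloaded into a string.

```python
from html.parser import HTMLParser


class LightboxParser(HTMLParser):
    """Collect data-src values of "js-lightbox" elements and og:image."""

    def __init__(self):
        super().__init__()
        self.image_urls = []
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "js-lightbox" in classes and attrs.get("data-src"):
            # Full-size image URL, including its ?h=... value.
            self.image_urls.append(attrs["data-src"])
        elif tag == "meta" and attrs.get("property") == "og:image":
            # Preview image, which may be cropped.
            self.og_image = attrs.get("content")


def extract_images(html):
    """Return (list of data-src URLs, og:image URL or None)."""
    parser = LightboxParser()
    parser.feed(html)
    parser.close()
    return parser.image_urls, parser.og_image
```

Because the complete URL is read straight from the attribute, there is no need to work out how the ?h=... value is computed.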

Should I add a generic lightbox extractor to gallery-dl? Wouldn't be too hard if they are all built like this one.

@RonaldBolado
Author

RonaldBolado commented Jul 14, 2018

Hey, thanks a lot for the information. I was able to download all of them except for the first and the last one. The links I find under the "project-modules" div for the first and last pics are the lower-res ones. Can you check if I'm doing something wrong? For the first image, I'm getting this link: https://pro2-bar.myportfolio.com/v1/assets/fd9fa03abcd36223281f9896f812fb93/09154b97-ca07-4a78-8523-3343902e16c5.jpg?h=f9501a25640e45395f79b0fcea9c5fc9 I think they for some reason uploaded the low-res one, which is a bummer; I really wanted the first one in native res.

If you can, do add it; there's some stuff there I'd like to download.

@Hrxn
Contributor

Hrxn commented Jul 14, 2018

> Should I add a generic lightbox extractor to gallery-dl? Wouldn't be too hard if they are all built like this one.

Sounds interesting. There are probably lots of different JS implementations for stuff like this, but they all work in a similar fashion. With the odd exception here and there, maybe..

A little overview of sites that use such a lightbox gallery would be nice, so that one could estimate how far such a feature would get.

OP mentioned that myportfolio.com is from Adobe. I don't see any connection to Behance (https://www.behance.net/), although that is owned by Adobe as well, and is kinda big. Lots of image galleries, definitely. And their lightbox seems to be pretty similar.

Thinking about that, the real crux here is probably only accessing the right element in the HTML DOM tree. But that's what CSS selectors (or XPath) are for, so this should also be relatively straightforward with Python, I think. And such selectors could easily be added to gallery-dl.conf for any site. I was imagining something like this:

    [
        {
            "custom_site_1": "selector_string"
        },
        {
            "custom_site_2": ...
        }
        ... and so on ...
    ]

😅
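
A config-driven lookup along these lines could be sketched in Python as follows. Everything here is hypothetical: the `SITE_RULES` format is invented for illustration and is not a gallery-dl.conf option, and instead of full CSS selectors (which would need a third-party library) each rule is just a class name plus the attribute that holds the image URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical rules: host suffix -> (CSS class, attribute holding the URL).
SITE_RULES = {
    "myportfolio.com": ("js-lightbox", "data-src"),
}


class AttrCollector(HTMLParser):
    """Collect one attribute from every element carrying a given class."""

    def __init__(self, css_class, attribute):
        super().__init__()
        self.css_class = css_class
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.css_class in (attrs.get("class") or "").split():
            value = attrs.get(self.attribute)
            if value:
                self.values.append(value)


def rule_for(page_url, rules=SITE_RULES):
    """Return the rule whose host suffix matches page_url, or None."""
    host = urlsplit(page_url).hostname or ""
    for suffix, rule in rules.items():
        if host == suffix or host.endswith("." + suffix):
            return rule
    return None


def extract(page_url, html, rules=SITE_RULES):
    """Apply the matching site rule to already-downloaded HTML."""
    rule = rule_for(page_url, rules)
    if rule is None:
        return []
    collector = AttrCollector(*rule)
    collector.feed(html)
    collector.close()
    return collector.values
```

Supporting another site would then mean adding one entry to the mapping, provided its gallery markup is present in the plain HTML.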

@mikf
Owner

mikf commented Jul 14, 2018

@RonaldBolado
No, you are not doing anything wrong. That is the link to the best/highest-res version available. The first and last images have, for whatever reason, a lower resolution than the other four.
If you take another look at the HTML source, you can see that there are only 600w and 1200w versions for the first and last one, but also a 1920w version for images 2 to 5.
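
To check which widths a given image offers, a small Python helper like the one below can pick the widest candidate. It assumes the 600w/1200w/1920w entries appear as width descriptors in a srcset-style attribute and that the URLs contain no commas; the function name is made up for this sketch.

```python
def largest_srcset_candidate(srcset):
    """Return (width, url) of the widest candidate in a srcset string.

    Returns None if no candidate with a width descriptor is found.
    """
    best = None
    for candidate in srcset.split(","):
        fields = candidate.split()
        if len(fields) != 2 or not fields[1].endswith("w"):
            continue  # skip empty entries and non-width descriptors
        try:
            width = int(fields[1][:-1])
        except ValueError:
            continue  # descriptor ends in "w" but is not a number
        if best is None or width > best[0]:
            best = (width, fields[0])
    return best
```

An image whose widest candidate is 1200w simply has no 1920w rendition to fetch.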

@Hrxn
https://www.behance.net/'s implementation of Lightbox is quite different from the one at myportfolio.com, so maybe this is going to be a bit harder than anticipated. Maybe taking a look at the lightbox source code itself might help.

@Hrxn
Contributor

Hrxn commented Jul 14, 2018

I admit, I did not really look too closely. The purpose was primarily to find out whether there is a lightbox at all that is part of the normal HTML of the site, and not something hidden behind JS, React-style.

@RonaldBolado
Author

@mikf

So implementing Behance's would be way too difficult?

@mikf
Owner

mikf commented Jul 19, 2018

No, that's not what I meant. I hoped to be able to implement some kind of generalized solution for what I believed to be (official) Lightbox galleries, but 1) both sites differ quite a bit, which makes generalizing harder, and 2) the system they use, as it turns out, has nothing to do with Lightbox except the class names of a few HTML elements.

Anyway, I added support for myportfolio.com galleries and users, and will probably do behance.net tomorrow. For domain names not looking like <user>.myportfolio.com, you can add myportfolio: in front of the whole URL; for example myportfolio:https://tooco.com.ar/.
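
The two URL forms described above can be told apart with a pattern like the one in this Python sketch. The regular expression and the function name are illustrative assumptions and not the pattern gallery-dl actually uses.

```python
import re

# Illustrative pattern only; not taken from gallery-dl.
PATTERN = re.compile(
    r"(?:myportfolio:(?:https?://)?([^/?#]+)"      # explicit prefix, any domain
    r"|(?:https?://)?([\w-]+\.myportfolio\.com))"  # <user>.myportfolio.com
    r"(/[^?#]*)?"                                  # optional path
)


def match_myportfolio(url):
    """Return (domain, path) if url looks like a myportfolio URL, else None."""
    match = PATTERN.match(url)
    if match is None:
        return None
    domain = match.group(1) or match.group(2)
    path = match.group(3) or "/"
    return domain, path
```

Without the prefix, a custom domain such as tooco.com.ar gives no hint that it is a myportfolio site, which is why the explicit myportfolio: marker is needed.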
