[Bitbucket] Add bitbucket backend to perceval #653

Closed
wants to merge 1 commit into from

imnitishng
Contributor

This PR adds Bitbucket backend to Perceval.
Issues and Pull requests from bitbucket repositories can be extracted with this iteration.
Based on the discussions here #367

Signed-off-by: Nitish Gupta <imnitish.ng@gmail.com>

@imnitishng
Contributor Author

imnitishng commented Apr 23, 2020

Hi @valeriocos, here is the first iteration for the backend.
Bitbucket has a different way of handling authorization and retrieving data from access tokens.

To use the backend we need 3 tokens:

  • Bitbucket Client ID / Key (obtained from bitbucket)
  • Bitbucket Secret Key (obtained from bitbucket)
  • Refresh Token (obtained via CURL instructions given below)

Using these 3 keys, an access token is obtained, which is then used to make API requests.
Access tokens expire after 2 hours (ref), so a new one has to be created regularly.

To get a refresh token from bitbucket, follow these steps:

  • In your browser go to
    https://bitbucket.org/site/oauth2/authorize?client_id={client_id}&response_type=code
    (replace {client_id} with the key from bitbucket)
  • Authorize under your bitbucket account.
  • After that, your browser will be redirected to
    {your_redirect_link}/?code={code}
    (save this {code}; it is what we need)
  • After this, run curl with our details filled in:
    curl -X POST -u {client_id}:{secret_id} https://bitbucket.org/site/oauth2/token -d grant_type=authorization_code -d code={code}
    (use the code extracted above in place of {code})
  • The response will be a JSON containing our refresh token.
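
As a rough sketch, the steps above could look like this in Python (standard library only). The helper names are made up for illustration and are not part of this PR; the endpoints and parameters are the ones from the curl command above. Nothing is sent over the network here, the functions only build the URL and the request.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

AUTHORIZE_URL = "https://bitbucket.org/site/oauth2/authorize"
TOKEN_URL = "https://bitbucket.org/site/oauth2/token"


def build_authorize_url(client_id):
    """Return the URL to open in the browser (hypothetical helper)."""
    query = urlencode({"client_id": client_id, "response_type": "code"})
    return f"{AUTHORIZE_URL}?{query}"


def build_code_exchange_request(client_id, secret_id, code):
    """Build the POST request equivalent to the curl command (not sent here)."""
    body = urlencode({"grant_type": "authorization_code", "code": code}).encode()
    # Same effect as curl's -u {client_id}:{secret_id}
    credentials = base64.b64encode(f"{client_id}:{secret_id}".encode()).decode()
    request = Request(TOKEN_URL, data=body, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request
```

Passing the request to urllib.request.urlopen would perform the exchange; the refresh token is then read from the JSON response.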

Usage -
perceval bitbucket owner repo --category issue -c {client_id} -s {secret_id} -r {refresh_token}
I tested on this public repo.
categories - issue, pull_request
Will add all this info in README once we finalize the PR

@valeriocos
Member

Thank you @imnitishng for the PR and details

Using these 3 keys, the access token is extracted which is used to make API requests.
Access tokens expire in 2 hours (ref), hence the process to create a new one is mandatory.

This may be a really big problem. Is there any other way to bypass the 2 hours limit?
I see that the reference you posted is about OAuth on Bitbucket Cloud, however Bitbucket server seems to be less restrictive.

What's the difference between Bitbucket Cloud and Bitbucket server?
Are you targeting a specific version of the Bitbucket API?

Thanks!

@vchrombie
Member

Hi @imnitishng
Nice work.

I just have a few questions.

I understand that you need 3 tokens to fetch the data.

  • client id
  • secret key
  • refresh token

Using these 3 keys, the access token is extracted which is used to make API requests.

Do you mean there will be a new token?

Access tokens expire in 2 hours (ref), hence the process to create a new one is mandatory.

Will all 3 tokens expire, or only the refresh token?

For getting refresh token from bitbucket one needs to follow these steps -

  • In your browser go to
    https://bitbucket.org/site/oauth2/authorize?client_id={client_id}&response_type=code
    (replace {client_id} with the key from bitbucket)
  • Authorize under your bitbucket account.
  • After that, your browser will be redirected to
    {your_redirect_link}/?code={code}
    (save this {code} this is what we need)
  • After this we need to CURL filling in our details -
    curl -X POST -u {client_id}:{secret_id} https://bitbucket.org/site/oauth2/token -d grant_type=authorization_code -d code={code}
    (use the above extracted code in place of {code})
  • The response will be a JSON containing our refresh token.

Is it possible to automate this process?

@imnitishng
Contributor Author

imnitishng commented Apr 23, 2020

@valeriocos

Is there any other way to bypass the 2 hours limit?

No, these are restrictions set by Bitbucket. They don't use API tokens; Bearer tokens are used instead.
I don't think the 2 hour expiry is a problem.
We already have the 3 necessary tokens and every time perceval is called we create a new access token to interact with bitbucket. It is highly unlikely that perceval will run for more than 2 hours. It will hit the rate limit a lot sooner (see here for limits). So essentially we always start data retrieval with a fresh, automatically generated token; no end-user participation is needed except for providing the 3 tokens.

however Bitbucket server seems to be less restrictive.

I'm sorry, I had not looked at Bitbucket Server yet; I will have a look. I followed this.
I went through it, and I believe that the Server APIs cannot be used for the publicly hosted repositories under bitbucket.org (Bitbucket Cloud). We need to use the Cloud APIs for this. Let me know if you discover something else.

What's the difference between Bitbucket Cloud and Bitbucket server?

Bitbucket Server is another product offered by Atlassian. Bitbucket Server is hosted on-premise, while Bitbucket Cloud is hosted on Atlassian's servers and accessed via a URL (just like GitHub). Bitbucket Server is a separate paid product with extra features, which I don't think we need to target since its APIs won't work with Bitbucket Cloud repositories.

Are you targeting a specific version of the Bitbucket API?

I am using the APIs that are meant to communicate with repositories hosted on bitbucket.org. The one you specified seems to be used to communicate with projects hosted on Bitbucket Server.
The Bitbucket Server documentation makes no mention of communicating with publicly hosted repositories, while the Bitbucket Cloud documentation covers this for third-party development (ref). I believe that is what we should be using, because the Bitbucket Server REST APIs look to me like they are meant for local development or interaction (ref).

@imnitishng
Contributor Author

imnitishng commented Apr 23, 2020

Hi @vchrombie, thanks.

Do you mean there will be a new token?

Yes! It is the access token; without it we cannot communicate. No other token can be used for communication.

Will the 3 tokens gonna expire or only the refresh token?

None of the 3 tokens mentioned will expire.
Only the access token we derive from them will expire, after 2 hours, and it is renewed on every run.
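
To make the renewal step concrete, here is a minimal sketch of how an access token could be derived from the 3 tokens on every run. It is not the code in this PR: the function names are hypothetical, and it assumes the token endpoint accepts the standard OAuth2 refresh_token grant (RFC 6749) and answers with a JSON body containing an access_token field.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://bitbucket.org/site/oauth2/token"


def build_refresh_request(client_id, secret_id, refresh_token):
    """Build (but do not send) the request that trades a refresh token
    for a fresh access token. Hypothetical helper."""
    body = urlencode(
        {"grant_type": "refresh_token", "refresh_token": refresh_token}
    ).encode()
    credentials = base64.b64encode(f"{client_id}:{secret_id}".encode()).decode()
    request = Request(TOKEN_URL, data=body, method="POST")
    request.add_header("Authorization", f"Basic {credentials}")
    return request


def parse_access_token(response_body):
    """Extract the access token from the JSON response body.
    Assumes the standard OAuth2 field name `access_token`."""
    payload = json.loads(response_body)
    return payload["access_token"]
```

The access token would then be sent as a Bearer token in the Authorization header of each API request.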

Is it possible to automate this process?

I tried to do it, but it is not possible since

  • Authorize under your bitbucket account.

we need to approve the authorization in the browser at this step, just like granting OAuth permission for GitHub. So as far as I know, it can only be done manually.

@imnitishng
Contributor Author

imnitishng commented Apr 23, 2020

Here is the part where the access token is extracted from the 3 tokens. Please have a look.
def _extract_access_token(self, client_id, secret_id, refresh_token)

@vchrombie
Member

Hi @imnitishng
Thanks for the detailed reply.

I have a few more confirmations from my side.

We already have the 3 necessary tokens and every time perceval is called we create a new access token to interact with bitbucket.

Yes! It is the access token without it we cannot communicate. No other token can be used for communication.

None of the 3 tokens mentioned will expire.
Only the access token we derive will expire in 2 hours. Which will be renewed on every run.

So, we should just provide the 3 keys, the access token will be generated every time you execute the backend (bitbucket.py#L447) which is valid for 2 hours. So, the next time you are executing the backend, a new access token is generated. So there will be a unique access token for every execution of Perceval backend.

It is highly unlikely that perceval will run for more than 2 hours.

Reading this gives me another doubt. I am just curious to know what happens if the fetching takes more than 2 hours?

What's the difference between Bitbucket Cloud and Bitbucket server?

Bitbucket server is another ...

Is it something like gitlab?
self-hosted like https://invent.kde.org/
and under gitlab like https://gitlab.com/amfoss

Is it possible to automate this process?

we need to grant a yes permission for authorization in the browser at this step. Just like granting OAuth permission for GitHub. So as far as I know it is only possible manually.

Thanks for the clarification. Anyway, since it only has to be done once, I think it is fine. I thought this would be the one expiring. I misunderstood, sorry.
Maybe selenium can help you. Just have a look when you are free.

@imnitishng
Contributor Author

imnitishng commented Apr 24, 2020

So, we should just provide the 3 keys, the access token will be generated every time you execute the backend (bitbucket.py#L447) which is valid for 2 hours. So, the next time you are executing the backend, a new access token is generated. So there will be a unique access token for every execution of Perceval backend.

Yes, you are correct.

Reading this gives me another doubt. I am just curious to know what happens if the fetching takes more than 2 hours?

A 401 response will be returned and the Perceval run ends.
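
If ending the run on a 401 turned out to be a problem, one option would be to renew the access token once and retry. A minimal sketch under that assumption; TokenExpired, fetch_page and refresh_access_token are hypothetical names, not Perceval code and not what this PR does:

```python
class TokenExpired(Exception):
    """Raised by the (hypothetical) fetch function on an HTTP 401."""


def fetch_with_refresh(fetch_page, refresh_access_token, token, max_refreshes=1):
    """Call fetch_page(token); on TokenExpired, get a new token and retry.

    Returns a (result, token) pair so the caller can keep using the
    renewed token for later requests.
    """
    refreshes = 0
    while True:
        try:
            return fetch_page(token), token
        except TokenExpired:
            if refreshes >= max_refreshes:
                # Give up: the renewed token was rejected as well.
                raise
            token = refresh_access_token()
            refreshes += 1
```

Passing the two callables in keeps the retry logic independent of how requests are made and how the token is renewed.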

Is it something like gitlab?
self-hosted like https://invent.kde.org/
and under gitlab like https://gitlab.com/amfoss

Well, I'm not sure about that. From all the info I've found about Bitbucket Server, I didn't see anywhere that Bitbucket Server projects are available on the web. They are hosted in-house and available only to the team working on them (not open source).
Only Bitbucket Cloud projects are available for everyone to explore on bitbucket.org.

Maybe selenium can help you. Just have a look when you are free.

Sure, thanks.

@valeriocos
Member

valeriocos commented Apr 24, 2020

Please keep in mind that Perceval has been conceived to do one thing only, which is fetching data from software repositories. A backend that needs to autogenerate a token doesn't seem to fit in this schema. Furthermore, a token that expires in 2 hours doesn't seem to have been designed to fetch data from a repository, instead its use is probably meant to address other needs (e.g., pushing commits, merging pull requests, etc.).

Before evaluating the next development tasks, it would be convenient to:

  • check token usage policy and restrictions.
  • check if the API(s) can be queried using the username/password of the user.
  • know the difference between the two APIs (do they have the same endpoints, do they return the same information).
  • check how to grant a yes permission from low-level libraries (e.g., requests). Selenium (and other similar tools) doesn't seem a viable option since (i) it adds an additional level of complexity in the code, thus more points of failure, (ii) it may require some tuning in the server where the backend runs.

Thanks!

@vchrombie
Member

  • check how to grant a yes permission from low-level libraries (e.g., requests). Selenium (and other similar tools) doesn't seem a viable option since (i) it adds an additional level of complexity in the code, thus more points of failure, (ii) it may require some tuning in the server where the backend runs.

Okay. I suggested it so that it could be an individual script, not included in perceval. Like a gist, just to extract the refresh token.

Other than that.

#653 (comment)

Agreed, thanks.

@imnitishng
Contributor Author

imnitishng commented Apr 24, 2020

Hi @valeriocos, I understand your concern. Here is some more info.

A backend that needs to autogenerate a token

I believe this should be seen as a step for choosing the token used for communication, similar to the step done in the GitHub backend

def _choose_best_api_token(self):
for choosing the best token out of several tokens. (Here there is just one token, obtained by querying bitbucket with the 3 user authorization tokens we have.)

check token usage policy and restrictions.

Access tokens are generated from the Key, Secret and Refresh tokens.
Here is a snippet of the usage scopes Bitbucket allows for users authenticating using these keys.
Screenshot from 2020-04-24 13-33-07
So they serve both read and write purposes. There are no restrictions for reading repository data.

The final keys obtained will look like these (they won't expire; deleted before posting)
Screenshot from 2020-04-24 13-33-16

For rate limit restrictions, we get 1000 queries per hour for any access to URLs under api.bitbucket.org/2.0/repositories/* (more details here)
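
As a quick sanity check on these numbers: with the 1000 requests/hour limit and the 2 hour access token lifetime mentioned earlier, one access token can cover at most about 2000 requests, assuming the limit applies as a plain hourly budget. A small sketch of that arithmetic (the function names are only for illustration):

```python
SECONDS_PER_HOUR = 3600


def max_requests_per_token(rate_per_hour=1000, token_lifetime_hours=2):
    """Upper bound on requests made before the access token expires."""
    return rate_per_hour * token_lifetime_hours


def min_seconds_between_requests(rate_per_hour=1000):
    """Average spacing between requests that stays within the limit."""
    return SECONDS_PER_HOUR / rate_per_hour


print(max_requests_per_token())        # 2000
print(min_seconds_between_requests())  # 3.6
```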

check if the API(s) can be queried using user username/password of the user.

Coming to the authentication problem, Bitbucket allows 3 ways to authenticate users:
Basic
Basic authentication can be used; querying bitbucket with a username and password works, e.g.
https://username:password@api.bitbucket.org/2.0/repositories/imnitish/libqxt/issues
So we can use basic authentication if you don't like the idea of extracting access tokens.
A drawback of basic authentication is that it cannot be used on accounts with 2-factor-auth / 2-step-verification.
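
For illustration, the same credentials can also be sent in an Authorization header instead of in the URL, which is what HTTP Basic authentication does underneath. A minimal sketch using only the standard library; the helper name is hypothetical and the request is built but not sent:

```python
import base64
from urllib.request import Request


def build_basic_auth_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials in a header."""
    raw = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    request = Request(url)
    request.add_header("Authorization", f"Basic {encoded}")
    return request
```

Keeping the password out of the URL avoids it ending up in logs or shell history.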

API Key
API keys are supported but are meant only for team accounts, so an individual user like us cannot get one.

OAuth2
This is the secure way bitbucket recommends for authentication, and it is how I implemented the backend. It uses the authorization code grant flow from RFC 6749.
This picture might make things clear.
Screenshot from 2020-04-24 13-36-40
This is the same thing as authenticating with a username (client ID) and password (secret key), but with the added step of generating an access token.

Find more info about authentication with Bitbucket API here.

know the difference between the two APIs (do they have the same endpoints, do they return the same information).

They are two different APIs meant for different products offered by Atlassian.

Bitbucket Cloud is the product we are targeting. It is a hosted platform where projects, including public open source ones, are available on the web via URLs (e.g. https://bitbucket.org/libqxt/libqxt/).
Bitbucket Server APIs cannot be used for Bitbucket Cloud.

Bitbucket Server is a different product, used by professional teams to self-host projects. Its APIs are served under custom domains, like http://example.com/rest/access-tokens/1.0/users/{userSlug}, not under bitbucket.org.

I hope we are clear that Bitbucket Server and Bitbucket Cloud are 2 different products with 2 completely different APIs that cannot be interchanged.
I know this point was raised here #367 (comment). I believe we can add Bitbucket Server support later as a different backend named bitbucketserver in perceval, independent of this backend.

@imnitishng
Contributor Author

We can also remove all this complexity and instead give the user instructions on how to generate the access token, but they would have to do it every time they run perceval (if they run it after the 2 hour window).
But Bitbucket access tokens are pretty large; here is an expired one:

access_token=
9q1wWP3IHN2IuHp427CCADhsDToG9ArZoHY741OnQLOaFRw1zse-v9mFXfKbOkLwE5_DM5JHZJJ0EPFGUp814GIqOW7PK-9XvEf_QkKEFBLU5kyWfOUZnICT

so it might not be a good option to go with.

@imnitishng
Contributor Author

Any instructions on this @valeriocos?
If you want then I can submit a new PR with username and password authentication method as you suggested.

@vchrombie
Member

Hi @imnitishng, thanks for the work on the bitbucket backend. Do you think you can move the work to a new repository?

Supporting this #645 (comment), it would be great if you can create a new repository and maintain this backend.

Recently, I moved the Zulip Backend to a new repo. The workflows, documentation, and integrations are all in place. You can use it as a template repository and create the project grimoirelab-perceval-bitbucket. What do you think? Please let me know if you need any help.

Zulip Backend for Perceval: https://github.com/vchrombie/grimoirelab-perceval-zulip

@vchrombie
Member

Adhering to the contributing guidelines (#incubating-repositories) and after communicating with @imnitishng, this work is moved to a separate repository perceval-backends/grimoirelab-perceval-bitbucket and will be maintained over there for some time. We can open this PR again and update it when we want to merge this backend into the main module.

I'll be opening a PR adding the repo to the readme. The pull request will be closed soon if no one has any objection.

Best,
Venu

@vchrombie
Member

Closing this PR in favour of #653 (comment).
