Splitting #1

Merged
merged 2 commits into from
Oct 21, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -129,4 +129,5 @@ dmypy.json
.pyre/

.idea
.DS_Store
proxies_extension.zip
41 changes: 31 additions & 10 deletions README.md
@@ -4,10 +4,19 @@ This project automates solving Google reCAPTCHA v2 with image challenges (3x3 an

## Features

-- Uses **Selenium WebDriver** to interact with the browser and manipulate elements on the reCAPTCHA page.
-- **2Captcha API** helps solve image-based captchas using artificial intelligence.
+- **Selenium WebDriver**: Interacts with the browser and manipulates elements on the reCAPTCHA page.
+- **2Captcha API**: Solves image-based captchas using artificial intelligence.
 - Handles both **3x3** and **4x4** captchas with custom logic for each.
-- Tracks image updates and handles captcha error messages efficiently.
+- Modular design with separated logic into helper classes for easy code maintenance and future expansion.
+- Tracks image updates and handles captcha error messages efficiently using custom error handling.
+
+## Code Structure
+
+The project is structured as follows:
+
+- **`utils/actions.py`**: Contains the `PageActions` class, which encapsulates common browser actions (clicking, switching frames, etc.).
+- **`utils/helpers.py`**: Contains the `CaptchaHelper` class, responsible for solving captchas, executing JS, and handling captcha error messages.
+- **`js_scripts/`**: JavaScript files that extract captcha data and track image updates.

## Usage

@@ -43,16 +52,28 @@ python solve_recaptcha.py

## How It Works

-1. Browser Initialization: A browser is opened using Selenium WebDriver.
-2. Captcha Data Retrieval: JavaScript extracts the image tiles from reCAPTCHA and sends them to the 2Captcha service for solving.
-3. Captcha Submission: Once a solution is received from 2Captcha, Selenium simulates clicking on the correct image tiles based on the solution.
-4. Captcha Submission: The solution is submitted once the captcha is solved.
+1. **Browser Initialization:** A browser is opened using Selenium WebDriver.
+2. **Captcha Data Retrieval:** JavaScript extracts the image tiles from reCAPTCHA and sends them to the 2Captcha service for solving.
+3. **Captcha Submission:** Once a solution is received from 2Captcha, Selenium simulates clicking on the correct image tiles based on the solution.
+4. **Final Submission:** The solution is submitted once the captcha is solved.

 ## Captcha Solving Logic

-- For 3x3 captchas, the previous captcha ID (previousID) is saved to speed up solving when images are updated.
-- For 4x4 captchas, no previousID is saved, and each solution is processed from scratch.
-- Error messages, such as “Please try again” are handled, and the solving process is retried if needed.
+- **3x3 Captchas:** Previous captcha ID (previousID) is saved to speed up solving when images are updated.
+- **4x4 Captchas:** No previousID is saved, and each solution is processed from scratch.
+- **Error Handling:** Messages like “Please try again” are handled, and the solving process is retried if needed.
+
+## Modular Design
+
+The project follows a modular design for better maintainability:
+
+- **PageActions Class:** Handles general browser interactions like switching to iframes, clicking elements, and returning focus to the main content.
+- **CaptchaHelper Class:** Encapsulates captcha-specific logic, such as solving the captcha via 2Captcha API, handling error messages, and executing JavaScript in the browser.
+
+## JavaScript Scripts
+
+- `get_captcha_data.js`: Extracts captcha image tiles for solving. The source code of the script is located here https://gist.github.com/kratzky/20ea5f4f142cec8f1de748b3f3f84bfc
+- `track_image_updates.js`: Monitors requests to check if captcha images are updated.

<!-- Shared links -->
[2captcha-demo]: https://2captcha.com/demo
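
The 3x3 versus 4x4 rule described under "Captcha Solving Logic" can be sketched as a small pure function. The parameter names mirror the `params` dictionary in `main.py` from this pull request; the function name `build_grid_params` is hypothetical and is not part of the project.

```python
from typing import Optional


def build_grid_params(captcha_data: dict, previous_id: Optional[str] = None) -> dict:
    """Build the parameters for one solving attempt (illustrative helper)."""
    params = {
        "method": "base64",
        "img_type": "recaptcha",
        "recaptcha": 1,
        "cols": captcha_data["columns"],
        "rows": captcha_data["rows"],
        "textinstructions": captcha_data["comment"],
        "lang": "en",
        "can_no_answer": 1,
    }
    # Only 3x3 grids carry the id of an earlier task forward;
    # 4x4 grids are always submitted from scratch.
    if params["cols"] == 3 and previous_id:
        params["previousID"] = previous_id
    return params
```

Keeping this decision in one function makes the "save the id only for 3x3" rule easy to test without a browser.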
53 changes: 53 additions & 0 deletions js_scripts/get_captcha_data.js
@@ -0,0 +1,53 @@
window.getCaptchaData = () => {
    return new Promise((resolve, reject) => {
        let canvas = document.createElement('canvas');
        let ctx = canvas.getContext('2d');
        // Challenge instructions, flattened to a single line
        let comment = document.querySelector('.rc-imageselect-desc-wrapper').innerText.replace(/\n/g, ' ');
        // Matches the "data:image/...;base64," prefix so only the base64 payload is returned
        const dataUrlPrefix = /^data:image\/?[A-Za-z]*;base64,/;

        let img4x4 = document.querySelector('img.rc-image-tile-44');
        if (!img4x4) {
            let table3x3 = document.querySelector('table.rc-imageselect-table-33 > tbody');
            if (!table3x3) {
                reject('Can not find reCAPTCHA elements');
                return; // stop here instead of dereferencing a missing table
            }

            let initial3x3img = table3x3.querySelector('img.rc-image-tile-33');

            canvas.width = initial3x3img.naturalWidth;
            canvas.height = initial3x3img.naturalHeight;
            ctx.drawImage(initial3x3img, 0, 0);

            // Tiles that were replaced after a click are drawn over the original grid image
            let updatedTiles = document.querySelectorAll('img.rc-image-tile-11');

            if (updatedTiles.length > 0) {
                // Top-left corner of each of the nine tiles, in row-major order
                const pos = [
                    { x: 0, y: 0 }, { x: ctx.canvas.width / 3, y: 0 }, { x: ctx.canvas.width / 3 * 2, y: 0 },
                    { x: 0, y: ctx.canvas.height / 3 }, { x: ctx.canvas.width / 3, y: ctx.canvas.height / 3 }, { x: ctx.canvas.width / 3 * 2, y: ctx.canvas.height / 3 },
                    { x: 0, y: ctx.canvas.height / 3 * 2 }, { x: ctx.canvas.width / 3, y: ctx.canvas.height / 3 * 2 }, { x: ctx.canvas.width / 3 * 2, y: ctx.canvas.height / 3 * 2 }
                ];
                updatedTiles.forEach((t) => {
                    // The tile's table cell carries the tabIndex; subtracting 3 gives a 1-based tile number
                    const ind = t.parentElement.parentElement.parentElement.tabIndex - 3;
                    ctx.drawImage(t, pos[ind - 1].x, pos[ind - 1].y);
                });
            }
            resolve({
                rows: 3,
                columns: 3,
                type: 'GridTask',
                comment,
                body: canvas.toDataURL().replace(dataUrlPrefix, '')
            });
        } else {
            canvas.width = img4x4.naturalWidth;
            canvas.height = img4x4.naturalHeight;
            ctx.drawImage(img4x4, 0, 0);
            resolve({
                rows: 4,
                columns: 4,
                comment,
                body: canvas.toDataURL().replace(dataUrlPrefix, ''),
                type: 'GridTask'
            });
        }
    });
};
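
The `pos` table in the script above is just the top-left corner of each tile in row-major order. A minimal Python sketch of the same arithmetic follows; the function names `tile_origins` and `origin_for_tabindex` are hypothetical, and the tabIndex mapping simply follows the script's own `tabIndex - 3` then `ind - 1` steps.

```python
from typing import List, Tuple


def tile_origins(width: float, height: float, rows: int = 3, cols: int = 3) -> List[Tuple[float, float]]:
    """Return the (x, y) top-left corner of every tile, row by row."""
    return [
        (c * width / cols, r * height / rows)
        for r in range(rows)
        for c in range(cols)
    ]


def origin_for_tabindex(tab_index: int, width: float, height: float) -> Tuple[float, float]:
    """Map a cell tabIndex to its tile origin, as the script does.

    The script computes ind = tabIndex - 3 and then reads pos[ind - 1],
    so tabIndex 4 corresponds to the first tile.
    """
    return tile_origins(width, height)[tab_index - 4]
```

For a 300 by 300 image, the centre tile (the fifth) starts at x = 100, y = 100.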
22 changes: 22 additions & 0 deletions js_scripts/track_image_updates.js
@@ -0,0 +1,22 @@
window.monitorRequests = () => {
    let found = false;

    const observer = new PerformanceObserver((list) => {
        const entries = list.getEntries();
        entries.forEach((entry) => {
            // Only XHR and fetch requests are of interest
            if (entry.initiatorType === 'xmlhttprequest' || entry.initiatorType === 'fetch') {
                const url = new URL(entry.name);
                if (url.href.includes("recaptcha/api2/replaceimage")) {
                    found = true; // If the request is found, set the flag to true
                }
            }
        });
    });

    observer.observe({ entryTypes: ['resource'] });

    // Return the result after 10 seconds and stop observing
    return new Promise((resolve) => {
        setTimeout(() => {
            observer.disconnect();
            resolve(found);
        }, 10000);
    });
};
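
The filter inside the observer callback can be expressed as a pure function, which makes the matching rule easy to check in isolation. This is a Python sketch of the same condition, assuming each entry is a dictionary with the `initiatorType` and `name` fields used above; the function name `saw_replaceimage` is hypothetical.

```python
from typing import Iterable, Mapping

REPLACE_IMAGE_FRAGMENT = "recaptcha/api2/replaceimage"


def saw_replaceimage(entries: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any XHR or fetch entry targets the replaceimage endpoint."""
    for entry in entries:
        is_request = entry.get("initiatorType") in ("xmlhttprequest", "fetch")
        # "name" holds the request URL in a resource timing entry
        if is_request and REPLACE_IMAGE_FRAGMENT in entry.get("name", ""):
            return True
    return False
```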
139 changes: 139 additions & 0 deletions main.py
@@ -0,0 +1,139 @@
import os
import time

from selenium import webdriver
from twocaptcha import TwoCaptcha

from utils.actions import PageActions
from utils.helpers import CaptchaHelper

# CONFIGURATION
url = "https://2captcha.com/demo/recaptcha-v2"
apikey = os.getenv('APIKEY_2CAPTCHA') # Get the API key for the 2Captcha service from environment variables
solver = TwoCaptcha(apikey)

# LOCATORS
l_iframe_captcha = "//iframe[@title='reCAPTCHA']"
l_checkbox_captcha = "//span[@role='checkbox']"
l_popup_captcha = "//iframe[contains(@title, 'two minutes')]"
l_verify_button = "//button[@id='recaptcha-verify-button']"
l_submit_button_captcha = "//button[@type='submit']"
l_try_again = "//div[@class='rc-imageselect-incorrect-response']"
l_select_more = "//div[@class='rc-imageselect-error-select-more']"
l_dynamic_more = "//div[@class='rc-imageselect-error-dynamic-more']"
l_select_something = "//div[@class='rc-imageselect-error-select-something']"

# MAIN LOGIC
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})

with webdriver.Chrome(options=options) as browser:
    browser.get(url)
    print("Started")

    # Instantiate helper classes
    page_actions = PageActions(browser)
    captcha_helper = CaptchaHelper(browser, solver)

    # Start by clicking on the captcha checkbox
    page_actions.switch_to_iframe(l_iframe_captcha)
    page_actions.click_checkbox(l_checkbox_captcha)
    page_actions.switch_to_default_content()
    page_actions.switch_to_iframe(l_popup_captcha)
    time.sleep(1)

    # Load JS files
    script_get_data_captcha = captcha_helper.load_js_script('js_scripts/get_captcha_data.js')
    script_change_tracking = captcha_helper.load_js_script('js_scripts/track_image_updates.js')

    # Inject JS once
    captcha_helper.execute_js(script_get_data_captcha)
    captcha_helper.execute_js(script_change_tracking)

    captcha_id = None  # Id of the first solved 3x3 task, reused as previousID

    while True:
        # Get captcha data by calling the JS function directly
        captcha_data = browser.execute_script("return getCaptchaData();")

        # Form the parameters for solving the captcha
        params = {
            "method": "base64",
            "img_type": "recaptcha",
            "recaptcha": 1,
            "cols": captcha_data['columns'],
            "rows": captcha_data['rows'],
            "textinstructions": captcha_data['comment'],
            "lang": "en",
            "can_no_answer": 1
        }

        # For a 3x3 captcha with a saved id, add previousID to the parameters
        if params['cols'] == 3 and captcha_id:
            params["previousID"] = captcha_id

        print("Params before solving captcha:", params)

        # Send captcha for solution
        result = captcha_helper.solver_captcha(file=captcha_data['body'], **params)

        if result is None:
            print("Captcha solving failed or timed out. Stopping the process.")
            break

        # Check if the captcha was solved successfully
        elif result and 'no_matching_images' not in result['code']:
            # Save the id only on the first successful iteration for a 3x3 captcha
            if captcha_id is None and params['cols'] == 3 and result['captchaId']:
                captcha_id = result['captchaId']  # Save id for subsequent iterations

            answer = result['code']
            number_list = captcha_helper.pars_answer(answer)

            # Processing for 3x3
            if params['cols'] == 3:
                # Click on the answers found
                page_actions.clicks(number_list)

                # Check if the images have been updated
                image_update = page_actions.check_for_image_updates()

                if image_update:
                    # If the images have been updated, continue with the saved id
                    print(f"Images updated, continuing with previousID: {captcha_id}")
                    continue  # Continue the loop

                # Press the check button after clicks
                page_actions.click_check_button(l_verify_button)

            # Processing for 4x4
            elif params['cols'] == 4:
                # Click on the answers found and immediately press the check button
                page_actions.clicks(number_list)
                page_actions.click_check_button(l_verify_button)

                # After clicking, check for errors and image updates
                image_update = page_actions.check_for_image_updates()

                if image_update:
                    print("Images updated, continuing without previousID")
                    continue  # Continue the loop

            # If the images are not updated, check the error messages
            if captcha_helper.handle_error_messages(l_try_again, l_select_more, l_dynamic_more, l_select_something):
                continue  # If an error is visible, restart the loop

            # If there are no errors, submit the captcha
            page_actions.switch_to_default_content()
            page_actions.click_check_button(l_submit_button_captcha)
            break  # Exit the loop if the captcha is solved

        elif 'no_matching_images' in result['code']:
            # If the solver returned the code "no_matching_images", check the errors
            page_actions.click_check_button(l_verify_button)
            if captcha_helper.handle_error_messages(l_try_again, l_select_more, l_dynamic_more, l_select_something):
                continue  # Restart the loop if an error is visible
            else:
                page_actions.switch_to_default_content()
                page_actions.click_check_button(l_submit_button_captcha)
                break  # Exit loop

    # Pause before the browser is closed
    time.sleep(10)
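
`main.py` passes `result['code']` to `captcha_helper.pars_answer`, which lives in `utils/helpers.py` and is not shown in this diff. The following is only a sketch of what such a parser might look like, assuming the answer string has the form `click:1/3/5` (1-based tile numbers separated by slashes); the function name `parse_grid_answer` is hypothetical and may differ from the project's actual helper.

```python
from typing import List


def parse_grid_answer(code: str) -> List[int]:
    """Turn an answer such as 'click:1/3/5' into [1, 3, 5].

    The 'click:' prefix and '/' separator are assumptions about the
    answer format, not taken from the diff.
    """
    prefix = "click:"
    if not code.startswith(prefix):
        raise ValueError(f"Unexpected answer format: {code!r}")
    # Skip empty fragments so a trailing slash does not break parsing
    return [int(part) for part in code[len(prefix):].split("/") if part]
```

The resulting list is what a call like `page_actions.clicks(number_list)` would consume, one tile number per click.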