Optical character recognition (OCR), WITHOUT AI, of the auction house inside Wakfu. The project is discontinued because I believe it would make the game worse for most players if retrieving data from the auction house became easy for everyone. If you compare with an MMORPG like WoW, there is not, and never was, an API provided by the game to retrieve the auction house.
Keep in mind this is being written in 2025, the project is from 2019.
At the time, I tried several OCR engines based on neural networks. The results ranged from medium to mediocre, and I didn't want data filled with inaccurate, randomly wrong values, especially in variable fields like the price of the item. A slightly wrong item name can be fixed with other methods.
I chose to make a pixel-perfect OCR, since an uncompressed screenshot was the intended way to retrieve the initial data.
I also intended this project as an exercise in implementing proper unit tests.
Auction house:
Blueprints:
It contains:
- Font options:
  - Which font is used.
  - How the font is used:
    - line/word/char spacing
    - color
    - style
- Debug options
- Capture zone options
{
"defaults": {
"dymanicAnchorTolerance": 0,
"debugDraw": true,
"debugDrawOutputFile": "wakfu_blueprint_input_debug_draw.png",
"debugColorHTMLCode": "#FF0000",
"font": {
"lineSpacing": {
"min": 2,
"max": 7
},
"wordSpacing": {
"min": 5,
"max": 6
},
"charSpacing": {
"min": 0,
"max": 2
},
"fontTTFFile": "wci.ttf",
"color": "#FFFFFF",
"size": 10,
"tolerance": 7,
"fontStyle": "Regular",
"fontFamily": "Wakfu Client Interface Verdana",
"whiteListChars": "abcdefghijklmnopqrstuvwxyz123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"debug": false
}
}
}
Using the font options, each character we need is generated from the TTF (TrueType Font) file. TTF files use a vector-based approach, similar to vector art in tools like Illustrator.
{
"label": "Opened craft window",
"dynamicAnchorImageName": "wakfu_blueprint_opened_blueprint_anchor.png",
"dymanicAnchorTolerance": 5,
"debugDraw": true,
"rect": {
"topLeft": {
"x": -69,
"y": -18
},
"bottomRight": {
"x": 181,
"y": 373
}
},
"captureZones": []
}
A capture zone is a rectangular area of the image, with a constant size, that is located either from an "anchor image" or from its position relative to its parent.
The anchor image represents a small part of the image that may or may not be unique, but should be constant between screenshots. Anchors allow us to identify parts of the game UI whose position varies.
The letter R followed by the start of a lowercase e; the letters are in white:
The anchor image above is used in the blueprint screenshot. The text "Recette niveau 26" appears in the middle right of the screenshot, and "Recette niveau" is always there with the same color, which allows us to identify a row.
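The anchor lookup described above can be sketched as a brute-force search. This is a minimal illustration in Python, not the project's code: the names `find_anchor` and `pixels_match` are hypothetical, and reading the tolerance as a maximum per-channel difference is an assumption.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # image[y][x]


def pixels_match(a: Pixel, b: Pixel, tolerance: int) -> bool:
    """True when every channel differs by at most `tolerance` (assumed meaning)."""
    return all(abs(ca - cb) <= tolerance for ca, cb in zip(a, b))


def find_anchor(image: Image, anchor: Image, tolerance: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first top-left position where `anchor` matches, else None."""
    img_h, img_w = len(image), len(image[0])
    anc_h, anc_w = len(anchor), len(anchor[0])
    for y in range(img_h - anc_h + 1):
        for x in range(img_w - anc_w + 1):
            if all(
                pixels_match(image[y + dy][x + dx], anchor[dy][dx], tolerance)
                for dy in range(anc_h)
                for dx in range(anc_w)
            ):
                return (x, y)
    return None


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
screenshot = [[BLACK] * 6 for _ in range(4)]
screenshot[1][3] = WHITE
screenshot[2][3] = (250, 252, 251)   # near-white pixel, only matches with a tolerance
screenshot[2][4] = WHITE
anchor = [[WHITE, BLACK], [WHITE, WHITE]]

print(find_anchor(screenshot, anchor, tolerance=5))  # (3, 1)
print(find_anchor(screenshot, anchor, tolerance=0))  # None
```

Since an anchor is not necessarily unique, a variant could collect every matching position instead of returning the first one.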
A capture zone nested inside another capture zone can use its parent's position to place itself.
Notice how "canBeMissing" is used to remove the rows for which the recipe ingredients are not shown.
{
"defaults": {
"dymanicAnchorTolerance": 0,
"debugDraw": true,
"debugDrawOutputFile": "wakfu_blueprint_input_debug_draw.png",
"debugColorHTMLCode": "#FF0000",
"font": {
"lineSpacing": {
"min": 2,
"max": 7
},
"wordSpacing": {
"min": 5,
"max": 6
},
"charSpacing": {
"min": 0,
"max": 2
},
"fontTTFFile": "wci.ttf",
"color": "#FFFFFF",
"size": 10,
"tolerance": 7,
"fontStyle": "Regular",
"fontFamily": "Wakfu Client Interface Verdana",
"whiteListChars": "abcdefghijklmnopqrstuvwxyz123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"debug": false
}
},
"captureZones": [
{
"label": "Opened craft window",
"dynamicAnchorImageName": "wakfu_blueprint_opened_blueprint_anchor.png",
"dymanicAnchorTolerance": 5,
"debugDraw": true,
"rect": {
"topLeft": {
"x": -69,
"y": -18
},
"bottomRight": {
"x": 181,
"y": 373
}
},
"captureZones": [
{
"label": "Ingredient row 1",
"rect": {
"topLeft": {
"x": 12,
"y": 64
},
"bottomRight": {
"x": 218,
"y": 90
}
}
},
{
"label": "Ingredient row 2",
"rect": {
"topLeft": {
"x": 12,
"y": 90
},
"bottomRight": {
"x": 218,
"y": 116
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 3",
"rect": {
"topLeft": {
"x": 12,
"y": 116
},
"bottomRight": {
"x": 218,
"y": 142
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 4",
"rect": {
"topLeft": {
"x": 12,
"y": 142
},
"bottomRight": {
"x": 218,
"y": 168
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 5",
"rect": {
"topLeft": {
"x": 12,
"y": 168
},
"bottomRight": {
"x": 218,
"y": 194
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 6",
"rect": {
"topLeft": {
"x": 12,
"y": 194
},
"bottomRight": {
"x": 218,
"y": 220
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 7",
"rect": {
"topLeft": {
"x": 12,
"y": 220
},
"bottomRight": {
"x": 218,
"y": 246
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 8",
"rect": {
"topLeft": {
"x": 12,
"y": 246
},
"bottomRight": {
"x": 218,
"y": 272
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 9",
"rect": {
"topLeft": {
"x": 12,
"y": 272
},
"bottomRight": {
"x": 218,
"y": 298
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 10",
"rect": {
"topLeft": {
"x": 12,
"y": 298
},
"bottomRight": {
"x": 218,
"y": 324
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 11",
"rect": {
"topLeft": {
"x": 12,
"y": 324
},
"bottomRight": {
"x": 218,
"y": 350
}
},
"canBeMissing": true
},
{
"label": "Ingredient row 12",
"rect": {
"topLeft": {
"x": 12,
"y": 350
},
"bottomRight": {
"x": 218,
"y": 376
}
},
"canBeMissing": true
}
]
}
]
}
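To make the relative positioning concrete, here is a rough Python sketch that turns nested capture zones into absolute rectangles. It is not the project's code: `resolve_zones` is a hypothetical name, and it assumes that a top-level rect is relative to the top-left corner of the matched anchor and that a child rect is relative to the top-left corner of its parent. The anchor position (500, 300) is made up, and "canBeMissing" is not handled.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), absolute pixels


def resolve_zones(zone: dict, origin: Tuple[int, int],
                  path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], Rect]]:
    """Return (labels, absolute rect) for `zone` and all of its nested zones."""
    ox, oy = origin
    rect = zone["rect"]
    absolute = (
        ox + rect["topLeft"]["x"],
        oy + rect["topLeft"]["y"],
        ox + rect["bottomRight"]["x"],
        oy + rect["bottomRight"]["y"],
    )
    labels = path + (zone["label"],)
    resolved = [(labels, absolute)]
    for child in zone.get("captureZones", []):
        # Assumption: children are positioned from the parent's top-left corner.
        resolved.extend(resolve_zones(child, (absolute[0], absolute[1]), labels))
    return resolved


window = {
    "label": "Opened craft window",
    "rect": {"topLeft": {"x": -69, "y": -18}, "bottomRight": {"x": 181, "y": 373}},
    "captureZones": [
        {
            "label": "Ingredient row 1",
            "rect": {"topLeft": {"x": 12, "y": 64}, "bottomRight": {"x": 218, "y": 90}},
        }
    ],
}

resolved = resolve_zones(window, origin=(500, 300))  # hypothetical anchor position
for labels, rect in resolved:
    print(labels, rect)
# ('Opened craft window',) (431, 282, 681, 673)
# ('Opened craft window', 'Ingredient row 1') (443, 346, 649, 372)
```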
Once you have found the smallest piece of the image you want to send to the OCR, characters are read from left to right, using the provided options to give a precise and organized (!) result.
[
{
"labels": [
"Opened craft window",
"Ingredient row 1"
],
"text": "49xCoudeTruche"
},
{
"labels": [
"Opened craft window",
"Ingredient row 2"
],
"text": "41xplumedeTruche"
},
{
"labels": [
"Opened craft window",
"Ingredient row 3"
],
"text": "7xMancheRudimentaire"
},
{
"labels": [
"Opened craft window",
"Ingredient row 4"
],
"text": "3xplancheFragile"
},
{
"labels": [
"Opened craft window",
"Ingredient row 5"
],
"text": ""
},
{
"labels": [
"Opened craft window",
"Ingredient row 6"
],
"text": "Chancesderussite"
},
{
"labels": [
"Opened craft window",
"Ingredient row 7"
],
"text": ""
},
{
"labels": [
"Opened craft window",
"Ingredient row 8"
],
"text": "Chancesderussite"
},
{
"labels": [
"Opened craft window",
"Ingredient row 9"
],
"text": ""
},
{
"labels": [
"Opened craft window",
"Ingredient row 10"
],
"text": ""
},
{
"labels": [
"Opened craft window",
"Ingredient row 11"
],
"text": ""
},
{
"labels": [
"Opened craft window",
"Ingredient row 12"
],
"text": ""
}
]
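The left-to-right reading that produces this kind of output can be illustrated with a toy example. This is a simplified Python sketch, not the project's implementation: the 3-pixel-high glyphs are invented for the example (real glyphs would be rendered from the TTF file), a `#` stands for a font-colored pixel, and trying the widest glyph first is an assumption.

```python
from typing import Dict, List

Bitmap = List[str]  # one string per pixel row, '#' = font color, '.' = anything else

# Invented glyphs, all with the same height.
GLYPHS: Dict[str, Bitmap] = {
    "1": ["#",
          "#",
          "#"],
    "7": ["##",
          ".#",
          ".#"],
    "x": ["#.#",
          ".#.",
          "#.#"],
}


def matches_at(line: Bitmap, glyph: Bitmap, x: int) -> bool:
    """True when `glyph` matches `line` pixel for pixel, starting at column x."""
    width = len(glyph[0])
    if x + width > len(line[0]):
        return False
    return all(row[x:x + width] == glyph_row for row, glyph_row in zip(line, glyph))


def read_line(line: Bitmap, glyphs: Dict[str, Bitmap]) -> str:
    """Scan columns left to right, consuming the pixels of each matched glyph."""
    ordered = sorted(glyphs.items(), key=lambda item: len(item[1][0]), reverse=True)
    text = []
    x = 0
    while x < len(line[0]):
        for char, glyph in ordered:
            if matches_at(line, glyph, x):
                text.append(char)
                x += len(glyph[0])  # these pixels are now used
                break
        else:
            x += 1  # nothing starts at this column, move on
    return "".join(text)


# "7", "x" and "1" separated by one blank column each.
LINE = ["##.#.#.#",
        ".#..#..#",
        ".#.#.#.#"]

print(read_line(LINE, GLYPHS))  # 7x1
```

Like the output above, this sketch does not emit spaces; word spacing would need an extra rule based on the width of the blank gap.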
A limitation of this approach is that, in a simple font, some characters can have exactly the same pixels, especially at smaller sizes. Some uppercase and lowercase letters end up with the same pixels. The spacing between characters can also be the same as the width of the space character. With additional data processing, however, this should not pose an issue.
In PixelsFinder.service.cs, the correct x/y loop order for the e2e (functional) test is different from the one for the unit tests. Some non-character symbols get confused with three "l" characters (lll) on the auction house screenshot.
- Uppercase and lowercase characters are not differentiated when their pixels are the same but at a different position.
- Some "not font color" space is needed above and below each character tested on the main image (if you don't get any matches, try making the capture zone taller).
Baseline: we could add a baseline input model to indicate where the line is. This would make it possible to solve the pP issue (p has the same pixels as P, but at a different position). It would also be an optimization, and would fix the -- and T issue. Cons:
- If the line moves slightly, we lose everything.
Each char is supposed to have a baseline. After finding every possible match without taking the baseline into account, we could filter using the baseline (and take the biggest, as described for the -- and T conflict).
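A baseline filter along these lines could look like the following Python sketch. None of this exists in the project: `Candidate`, `baseline_offset` and `filter_by_baseline` are hypothetical names, and the numbers are invented. The `tolerance` parameter is one possible way to soften the "line moves slightly" drawback.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    char: str
    x: int
    top: int              # y of the top pixel row of the matched shape
    baseline_offset: int  # rows from the glyph's top down to its baseline

    @property
    def baseline_y(self) -> int:
        """Where the text baseline would be if the shape really were `char`."""
        return self.top + self.baseline_offset


def filter_by_baseline(candidates: List[Candidate], line_baseline_y: int,
                       tolerance: int = 0) -> List[Candidate]:
    """Keep only the candidates whose implied baseline agrees with the line's."""
    return [c for c in candidates
            if abs(c.baseline_y - line_baseline_y) <= tolerance]


# The same 8-row shape was found with its top at y=15 and matches both glyphs.
# "P" sits on the baseline, "p" descends 3 rows below it.
candidates = [
    Candidate(char="P", x=40, top=15, baseline_offset=8),  # implies baseline at 23
    Candidate(char="p", x=40, top=15, baseline_offset=5),  # implies baseline at 20
]

kept = filter_by_baseline(candidates, line_baseline_y=20)
print([c.char for c in kept])  # ['p']
```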
Current problem: -- and T get confused, because there is no "not font color" exclusion zone on the sides AND the vertical exclusion zone for -- is higher than the one for T. The parser checks for - first and finds it. Its pixels get used, and T can no longer be formed. Also, T should not have an exclusion zone on the sides, so we cannot modify the horizontal exclusion zone. --> We would try to parse every pixel without excluding anything, then check for conflicts between the chars that were found. On a line, two chars cannot collide on X. If something collides on X, it has to be a conflict like -- and T: take the biggest.
(The solution would be to test every char, then make the appropriate choice, or to use the baseline.)
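The "test every char, then choose" idea can be sketched as follows. This Python snippet is only an illustration of the proposal, not existing project code: `Match` and `resolve_conflicts` are hypothetical names, the positions are invented, and "biggest" is assumed to mean the match with the most font-colored pixels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    char: str
    x: int            # leftmost column of the match
    width: int
    pixel_count: int  # number of font-colored pixels in the glyph

    @property
    def right(self) -> int:
        return self.x + self.width  # exclusive


def overlaps_on_x(a: Match, b: Match) -> bool:
    """Two chars on the same line collide when their column ranges intersect."""
    return a.x < b.right and b.x < a.right


def resolve_conflicts(matches: List[Match]) -> List[Match]:
    """Keep the biggest match of every group colliding on X, in reading order."""
    kept: List[Match] = []
    for match in sorted(matches, key=lambda m: m.pixel_count, reverse=True):
        if not any(overlaps_on_x(match, other) for other in kept):
            kept.append(match)
    return sorted(kept, key=lambda m: m.x)


# Every glyph was tested without consuming pixels: the top bar of "T" also
# matched as two "-", and an "a" was found further right.
found = [
    Match("-", x=10, width=4, pixel_count=4),
    Match("-", x=14, width=4, pixel_count=4),
    Match("T", x=10, width=8, pixel_count=15),
    Match("a", x=20, width=5, pixel_count=12),
]

print("".join(m.char for m in resolve_conflicts(found)))  # Ta
```

The trade-off is that every glyph has to be tested at every position before any decision is made, which costs more than consuming pixels greedily.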