Note
This repo contains replication code and data for the paper Beze (2024).
The real estate dataset is available on Zenodo. For more details, see the data section below.
Tip
The order in which the scripts should be run is provided in script/main.sh.
- R 4.3.3
  The necessary R packages are listed in the `renv.lock` file. Install them by running `renv::restore()` in the R console. If you did not clone the repo, first run `renv::init()` to initialize renv on the project.
- Python 3.12
  The necessary Python packages are listed in the `requirements.txt` file. You can install them with uv: `uv pip install -r requirements.txt`
The data used in the analysis consists of two main parts: real estate data and building footprint data.
Variable description
var | description | group | remark |
---|---|---|---|
id | ID of the property (prepended with the provider name) | | The ID uniquely identifies properties; in the raw data, IDs may not have been unique, even within a provider. |
listing_type | Listing type (for rent or sale etc.) | listing and property types | Parsed if not provided |
property_type | Property type (house, apartment, etc.) | listing and property types | Parsed if not provided |
price | Price of the property in local currency (Ethiopian Birr (ETB)) | price | Other currency units are converted to ETB |
price_type | The type of price (fixed, negotiable, etc.) | price | Parsed if not provided |
price_adj | Price of the property adjusted for inflation | price | |
price_sqm | Price of the property per square meter | price | |
price_adj_sqm | Price of the property per square meter adjusted for inflation | price | |
size_sqm | Floor area of the property in square meters | size | Imputed if not provided |
size_sqm_is_imputed | Yes if the floor area of the property was imputed | size | |
plot_size | Lot size of the property in square meters | size | |
address | Address of the property (untouched as provided) | address | |
address_main | Address of the property (manually corrected or cleaned) | address | Addresses were manually extracted from the description of the property. |
address_alt | Address of the property (extracted with Gemini Pro) | address | Equal to address_main if extraction failed or returned null |
unique_address_grp | Address group counter | address | This variable identifies properties with the same addresses. |
place_name | The name of the geocoded place, as returned by the geocoding API for the address | address | |
place_id | The id of the geocoded place | address | |
subcity | The subcity name | address | |
lng | The longitude of the property location | address | |
lat | The latitude of the property location | address | |
is_lng_lat_sampled | Yes if (lng, lat) is sampled | address | When the address is broad, like “Bole” or even “Addis Ababa”, a random (lng, lat) can be sampled from the subcity or Addis polygons. |
date_published | The date the property was published on the website | time | |
time | The month (formatted year-month-01) the property was published on the website | time | |
year | The year the property was published on the website | time | |
quarter | The quarter the property was published on the website | time | |
title | The title of the property ad | description | |
description | The description of the property ad | description | |
num_bedrooms | The number of bedrooms in the property | features | |
num_bathrooms | The number of bathrooms in the property | features | |
num_images | The number of images in the property ad | features | |
features | A list of additional features of the property | features | A list of additional features, unstructured. |
condition | The condition of the property | features | |
furnishing | The furnishing level of the property | features | E.g. fully furnished, semi-furnished, etc. |
pets | Yes if pets are allowed in the property | features | Applicable to rentals. Parsed if not provided |
floor | The floor location of the property | features | Applicable to apartments. It may refer to the number of floors in some cases. |
garden | Yes if the property has a garden | features | Parsed if not provided |
parking | Yes if the property has parking | features | Parsed if not provided |
kitchen | Yes if the property has a kitchen | features | Parsed if not provided |
elevator | Yes if the property has an elevator | features | Parsed if not provided |
balcony | Yes if the property has a balcony | features | Parsed if not provided |
water | Yes if the property has water | features | Parsed if not provided |
power | Yes if the property has electricity | features | Parsed if not provided |
seller_address | The address of the seller mentioned in the ad | | Phone number, email or social media information about the seller/agent. |
dist_meskel_square | The distance from the property location to the CBD (Meskel Square) in km | Distance to the CBD | |
dist_arat_kilo | The distance from the property location to the CBD (Arat Kilo) in km | Distance to the CBD | |
dist_piassa | The distance from the property location to the CBD (Piassa) in km | Distance to the CBD | |
exchange_rate | Monthly Birr to USD exchange rates | | Source: National Bank of Ethiopia |
misclassified_or_outliers_flag | Yes if the property’s listing type or property type is thought to be misclassified, or the property is thought to be an outlier. | | |
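
Several variables in the table are derived from others (`price_sqm`, `time`, `year`, `quarter`, and the `dist_*` columns). The sketch below shows one plausible way such columns could be computed for a single listing; it is not the repository's actual code. The haversine formula, the approximate Meskel Square coordinates, the integer quarter encoding, and the helper names `haversine_km` and `derive` are all assumptions made for illustration.

```python
import math
from datetime import date

# Approximate (lng, lat) of Meskel Square. Assumed for illustration only;
# the reference point used in the paper may differ.
MESKEL_SQUARE = (38.7612, 9.0105)
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in km between two (lng, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def derive(listing):
    """Return a copy of the listing with hypothetical derived columns added."""
    out = dict(listing)
    published = date.fromisoformat(listing["date_published"])
    out["price_sqm"] = listing["price"] / listing["size_sqm"]
    out["time"] = published.replace(day=1).isoformat()  # year-month-01
    out["year"] = published.year
    out["quarter"] = (published.month - 1) // 3 + 1  # 1..4, assumed encoding
    out["dist_meskel_square"] = haversine_km(
        listing["lng"], listing["lat"], *MESKEL_SQUARE
    )
    return out


# Made-up listing, not taken from the dataset.
example = {
    "id": "provider_1",
    "price": 12_000_000.0,  # ETB
    "size_sqm": 150.0,
    "lng": 38.7890,
    "lat": 9.0105,
    "date_published": "2023-08-17",
}
row = derive(example)
print(row["price_sqm"], row["time"], row["year"], row["quarter"])
print(round(row["dist_meskel_square"], 2), "km to Meskel Square")
```

The inflation-adjusted columns (`price_adj`, `price_adj_sqm`) are left out because the table does not state which price index or base period is used.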
To reproduce the data with the scripts, follow the steps in `script/main.sh`.
After the scripts run successfully, you will have the following files. The primary dataset for the analysis is constructed from `data/housing/processed/listings_cleaned.csv`, a cleaned version of the scraped data from all providers. The raw data is available in `data/housing/raw` for the providers included in the analysis. Missing attributes in the dataset are imputed using Gemini Pro, and the imputed data can be found in `data/housing/processed/structured/tidy`. Finally, property addresses are geocoded using the Google Places API and OSM Nominatim. The georeferenced data is available in `data/housing/processed/tidy/listings_cleaned_tidy__geocoded.csv`.
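
As a minimal sketch of how the georeferenced file might be read and filtered with the flag columns from the variable table, the example below uses only the Python standard library. It runs on a small in-memory sample with made-up rows. The `Yes`/`No` encoding of the flags, the example values of `listing_type`, and the helper name `load_clean` are assumptions, so check them against the actual CSV before reusing this.

```python
import csv
import io

# Made-up stand-in for a few columns of the geocoded listings file.
SAMPLE = """id,listing_type,property_type,price_adj_sqm,size_sqm_is_imputed,is_lng_lat_sampled,misclassified_or_outliers_flag
a_1,for sale,apartment,90000,No,No,No
a_2,for sale,house,120000,Yes,No,No
b_1,for rent,apartment,400,No,Yes,No
b_2,for sale,apartment,5,No,No,Yes
"""


def load_clean(handle, drop_imputed_size=False, drop_sampled_location=False):
    """Read listings and drop flagged rows (flag encoding is assumed)."""
    rows = []
    for row in csv.DictReader(handle):
        # Always drop listings flagged as misclassified or outliers.
        if row["misclassified_or_outliers_flag"] == "Yes":
            continue
        # Optionally drop listings whose floor area was imputed.
        if drop_imputed_size and row["size_sqm_is_imputed"] == "Yes":
            continue
        # Optionally drop listings whose coordinates were randomly sampled.
        if drop_sampled_location and row["is_lng_lat_sampled"] == "Yes":
            continue
        rows.append(row)
    return rows


all_rows = load_clean(io.StringIO(SAMPLE))
strict_rows = load_clean(
    io.StringIO(SAMPLE), drop_imputed_size=True, drop_sampled_location=True
)
print(len(all_rows), [r["id"] for r in strict_rows])
```

To run it on the real data, replace `io.StringIO(SAMPLE)` with a file handle opened on the geocoded CSV path given above.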
Important
During web scraping, I tried to respect the `robots.txt` file of each website. See the contents in `data/housing/robots_txt`.
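
For readers unfamiliar with `robots.txt`, the sketch below shows how a scraper could check whether a URL may be fetched, using `urllib.robotparser` from the Python standard library. It is not the scraper used in this repository. The rules, the user agent name `my-scraper`, and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; real files are stored per provider.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
allowed_listing = parser.can_fetch("my-scraper", "https://example.com/listings/1")
allowed_admin = parser.can_fetch("my-scraper", "https://example.com/admin/users")
delay = parser.crawl_delay("my-scraper")  # seconds between requests, or None
print(allowed_listing, allowed_admin, delay)
```

A scraper would typically skip any URL for which `can_fetch` returns `False` and wait at least `delay` seconds between requests when a crawl delay is set.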
A list of real estate providers in Addis
name | num_ads |
---|---|
Loozap Ethiopia | 75358 |
Cari Africa Homes | 42612 |
AfroTie | 30000 |
Jiji | 12272 |
Qefira | 8121 |
Ethiopia Property Centre | 3649 |
Engocha | 2059 |
Real Ethio | 1585 |
Airbnb Addis Ababa | 1000 |
EthiopianHome | 990 |
Ethiopian Properties | 880 |
Sarrbet | 741 |
Ethiopia Realty | 717 |
Ermithe Ethiopia | 645 |
LiveEthio | 625 |
ZeGebeya.com | 560 |
Zerzir | 539 |
Real Addis | 513 |
Beten | 495 |
Kemezor | 434 |
HahuZon | 400 |
Ethiobetoch | 315 |
Verenda | 285 |
Mondinion | 268 |
Yegna Home | 247 |
Expat | 233 |
Keys to Addis | 219 |
Ebuy | 216 |
Addis Agents | 195 |
Rent in Addis Agent | 175 |
Betoch | 126 |
Sheger Home | 120 |
Ethio Broker | 105 |
Betbegara | 83 |
Addis Property Listings | 76 |
Shega Home | 60 |
Realtor Ethiopia | 33 |
Addis Gojo | 32 |

Notes: The number of ads is as of April 2024. Qefira shut down in June 2023.
The building variables are extracted from two sources:
- The German Aerospace Center (DLR): the World Settlement Footprint (WSF) 3D and WSF 2019v1 datasets.
- Open buildings from Google.
Please cite the paper or dataset for any use of the code or data in this repository.
@article{Beze_2024,
title = {Testing the Gradient Predictions of the Monocentric City Model in Addis Ababa},
ISSN = {1556-5068},
url = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4803607},
DOI = {10.2139/ssrn.4803607},
journal = {SSRN Electronic Journal},
publisher = {Elsevier BV},
author = {Beze, Eyayaw},
year = {2024}
}
@misc{Beze_2024_dataset,
title = {Georeferenced real estate data for Addis Ababa},
author = {Beze, Eyayaw},
year = {2024},
doi = {10.5281/ZENODO.11205969},
url = {https://zenodo.org/doi/10.5281/zenodo.11205969},
publisher = {Zenodo},
copyright = {Creative Commons Attribution 4.0 International}
}