# Arcx feature generation The objective of this project is to generate a general purpose feature or index that can be used for different use cases within arcx. # Initial hypothesis I'd like to explore the effect of having / not having stablecoins stored in a wallet as a protection to fluctuations in the market. `are wallets which hold a good non-volatile (e.g stablecoins) / volatile asset relation less propense to liquidation events? ` to answer this question we would need to calculate the `volatile` assets balances in a wallet across time along with the `non-volatile`. This might be too complex to come by for the scope of this project, instead I propose a `proxy` index to analyze this hypothesis: ` stablecoin_to_eth_ratio = ETH_USD / STABLE_USD ` where: * `STABLE_USD = DAI+USDC+USDT` * `ETH_USD = ETH_VALUE*ETH_PRICE_USD` in representation of volatile assets we only consider ETH. We established an initial threshold or operating boundaries for the index, meaning it would only apply when the following condition holds ``` ETH_USD>=100 and STABLE_USD>=100 ``` The repurposed hypothesis would then be that if you hold a fair amount of stablecoins to support your ETH ( **high stablecoin_to_eth_ratio**) then you have a more robust position against market fluctuations and therefore are less likely to have liquidation events. (** see caveats section **) # Sample data using the `blockchain-etl.ethereum_compound` dataset available in bigquery, I sampled some `borrow` and `liquidate` events from the compound protocol. We calculate the index only in this subset of addresses to reduce the workload and also to get more insights in a real scenario. # Pipeline to calculate this index we process a few datasets from biguqery and using external APIs: * ERC20 transfers to asses usdc,dai,usdt balances over time * Ethereum transfers and fees to asses eth over time * Ethereum prices per date to combine with ethereum balances I leverage DBT to run the pipeline, the model queries can be found under **models/eth_to_stable**. It is worth noting that both `ethereum_balances_over_time` and `stablecoin_balance_over_time` could be used as standalone indexes in other contexts. # Indexes the table `supple-antenna-326401.arcx_dev.eth_to_stable_index` has view access enabled for allUsers so you should be able to query it in your bq connsole. ``` COLUMN TYPE address STRING date DATE stablecoin_usd_balance NUMERIC eth_usd_balance FLOAT eth_to_stablecoin_ratio FLOAT ``` * *eth_to_stablecoin_ratio*: holds the value of the index explained in the section above across different dates. * *eth_usd_balance* : is the eth balance in usd value across lifetime of the wallet * *stablecoin_usd_balance* : is the usd balance of all (usdc+dai+usdt) stablecoins combined across the lifetime of the wallet. regardless of the proposed index the last two indices should be useful for many problems and a lot other features could be derived from them. # Using dbt You can use the dbt models and generate this and intermediate datasets by running ```dbt seed && dbt run``` you'll need to setup a project in bigquery first since it relies on public datasets on that platform. # Analysis we do some validation of the features, explore the distribution of the index, analyze the sample data and use the index within the borrowing context in the notebook: https://github.com/dnul/arcx_stablecoin/blob/main/Analyzing_index_eth_to_stablecoin_ratio.ipynb # Caveats * ETH might not be representative of the value of volatile assets. Addresses might contain several other tokens that accrue more value and are not captured by this approach * Representing stablecoin by using usdc+dai+usdt might miss some other cases such as cDAI,Paxos or other assets and even having such assets in an exchange or other address. # Further work There are so many paths to go from here, i'll just list a few * calculating total value for wallet across time considering all erc20 tokens and re-calculate ratio against stable assets. * expand address base to sample the index on more wallets. * aggregate all three indexes into different time-windows to detect trends,spikes or relative growth to engineer new features. * sample data from other lending protcols to further validate predictive power of index