
making money using end-to-end reinforcement learning with self-replicating agents #3752

Closed
synctext opened this issue Jul 18, 2018 · 8 comments
synctext (Member) commented Jul 18, 2018

Scientific goal: create a collaborative live research ecosystem for reinforcement learning

It is impossible to publish in leading AI venues without industry-level resources. Scientists are starved of the chance to contribute their knowledge due to lack of access: without industry-level resources (thousands of cores from Google, Facebook, or DeepMind clusters) and huge, valuable datasets it is impossible to compete.

The "publish or perish" model encourages scientists to cut as many corners as they can in order to maximise their publication count. This directly conflicts with the realities of AI: advancing the state of the art is hard and requires a lot of work.

Unpublished code and a sensitivity to training conditions have made it difficult for AI researchers to reproduce many key results; AI has become a form of "alchemy". This initiative will create the first fully open, re-usable environment in which ideas compete for success and can be re-used.

More specifically, we need open re-usable AI with embodiment and self-replication, see this Science Magazine publication. Bio-inspiration is key.

Robotics researchers increasingly agree that ideas from biology and self-organization can
strongly benefit the design of autonomous robots. Biological organisms have evolved to
perform and survive in a world characterized by rapid changes, high uncertainty, indefinite
richness, and limited availability of information. Industrial robots, in contrast, operate in
highly controlled environments with no or very little uncertainty. Although many
challenges remain, concepts from biologically inspired (bio-inspired) robotics will eventually
enable researchers to engineer machines for the real world that possess at least some of
the desirable properties of biological organisms, such as adaptivity, robustness,
versatility, and agility.


Engineering goal: making money in our micro-economy with end-to-end reinforcement learning, using our framework of self-replicating agents, VPS/VPN buying, and our decentralised market

This type of robot will sense the world around it and act upon it. Without intelligent actions it will fail to reproduce and die off. The "motor commands" of the classic-AI robot picture above are replaced with robo trading. The whole ecosystem is fully self-organising and has no point of control, central server, or single point of failure.

Robo trading is based on crypto tokens. Since the launch of our first primitive ledger in 2007 we have been working on an accounting system for BitTorrent, something we now call a token for BitTorrent. Our live deployment and self-replicating AI now make the next step possible: self-replicating AI based on deep reinforcement learning. One full-time PhD student is responsible for realising our token economy: #3337 (see pages of detail there).

From #2925 :
The basic idea is to create a micro-economy within the Tribler platform for earning, spending and trading bandwidth tokens. This brings together various research topics, including a blockchain-powered decentralized market, anonymous downloading and hidden seeding. Trustworthy behavior and participation should be rewarded while cheating should be punished. A basic policy should prevent users from selfishly consuming bandwidth without giving anything back. This directly addresses the tragedy-of-the-commons phenomenon.
Our initial release should provide basic primitives to earn, trade and spend tokens. Our work could be extended with more sophisticated techniques like TrustChain record mixing, multiple identities, a robust reputation mechanism for tunnel selection, global consensus and verifiable public proofs (proof-of-bandwidth/proof-of-relay).
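A basic policy like the one described above could be sketched as follows. This is a minimal illustration, not Tribler's actual implementation: the function name, the free allowance, and the spend/earn ratio threshold are all hypothetical.

```python
# Hypothetical sketch of a basic token policy: peers that consume far more
# bandwidth than they contribute are refused further service. The free
# allowance bootstraps newcomers who have not yet earned anything.

def allowed_to_download(earned_mb: int, spent_mb: int,
                        free_allowance_mb: int = 1024,
                        max_ratio: float = 4.0) -> bool:
    """Return True if a peer may keep consuming bandwidth."""
    if spent_mb <= free_allowance_mb:        # newcomers get a small free allowance
        return True
    return spent_mb < max_ratio * earned_mb  # otherwise enforce a spend/earn ratio

# Example: a peer that earned 500 MB and spent 1500 MB is still served,
# but one that earned 100 MB and spent 2000 MB is cut off.
```

The ratio-based rule directly targets the tragedy-of-the-commons behaviour the quoted text mentions: selfish consumption without contribution eventually trips the threshold.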

Agent specifications

The agent will earn TrustChain records by seeding in BitTorrent and relaying Tor-like traffic. It will sell these records for Bitcoin or Ethereum on our decentralised market. With these coins and our Plebnet framework it will buy VPN and VPS infrastructure and essentially replicate. Detailed architecture of this ecosystem:

[Figure: token architecture of the ecosystem]
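The lifecycle above (earn records, sell them for coins, pay for the server, replicate on surplus) can be sketched as a monthly loop. All names and numbers below are illustrative placeholders, not the real Plebnet API.

```python
from dataclasses import dataclass

# Hypothetical monthly lifecycle of a self-replicating agent: earn TrustChain
# records by seeding/relaying, sell them on the market, renew the VPS, and
# spawn a child agent when the wallet covers a second server.

@dataclass
class Agent:
    wallet: int = 0        # balance in integer units (illustrative)
    vps_cost: int = 10     # assumed monthly VPS price
    alive: bool = True
    children: int = 0

    def earn_and_sell(self) -> int:
        # placeholder: seed/relay to earn records, sell them for coins
        return 15

    def monthly_cycle(self) -> None:
        self.wallet += self.earn_and_sell()
        if self.wallet < self.vps_cost:
            self.alive = False            # cannot pay next month's rent: "die"
            return
        self.wallet -= self.vps_cost      # renew its own VPS
        if self.wallet >= self.vps_cost:  # surplus covers a second server
            self.wallet -= self.vps_cost
            self.children += 1            # replicate onto a freshly bought VPS
```

With these numbers the agent survives its first month with a small surplus and replicates in its second month; an agent whose income drops below the VPS price dies off, exactly the selection pressure the issue describes.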

The role of AI

The challenge is to put AI at the core of this work. All decisions about money, tokens, replication, and hoarding credits for survival will be taken by autonomous intelligence. By applying end-to-end reinforcement learning we will use a single goal which will drive the behavior of agents: survival.
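The single survival goal could translate into a reward signal along these lines; a minimal sketch with hypothetical names and magnitudes, in which income earns no direct reward and only matters insofar as it keeps the agent solvent.

```python
# Hypothetical survival-driven reward for end-to-end RL: +1 for every month
# the agent stays solvent, a large terminal penalty when it cannot afford
# its server. The exact values are illustrative, not tuned.

def survival_reward(balance: float, monthly_cost: float) -> float:
    if balance < monthly_cost:
        return -100.0   # the agent "dies": strong terminal penalty
    return 1.0          # survived another month
```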

Generating income is only a means to an end; the primary objective is survival. Every month the agent needs sufficient Bitcoin or Ethereum to pay for its replacement or it will "die". Various parameters will be implemented to influence the behavior and strategies of an agent.

  • survival based on cost of VPN/VPS, optional multiply (e.g. what providers to prefer)
    • strategy: buy cheapest or buy at random
    • strategy: high probability to stick with current provider or alternate with providers each month
  • understand bid/ask volume and market (over)supply (basic understanding of the market)
  • communicate to other agents; e.g. not expected to survive this month or thriving and breeding
    • cooperative experience sharing: disclose private decisions and obtained reward (security?)
    • gossip performance and reliability of VPN/VPS providers
  • parameters for cpu/disk storage versus bandwidth
    • what product is more in demand
    • long-term archiving or short-term flashcrowd boosting
  • market making for Bitcoin versus Trustchain bandwidth coins
    • probably best implemented using traditional techniques (as discussed)
    • no need for or critical role for AI
  • AI innovation
    • not the expert on this, involve experts like: Loog&Tax
    • get creative for the Q-learner
    • something with adversarial?
  • we have no idea what to do
    • implement and deploy the minimal viable agent
    • get creative and be inspired by operational experience
    • don't overthink at the start, just do it.
    • this is pioneering work, nobody can help us.
    • have fun and don't be evil or Skynet
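The provider-choice strategies in the list above ("buy cheapest", "buy at random", "stick with current provider", "switch providers") map naturally onto a discrete action space for the Q-learner mentioned there. A minimal tabular epsilon-greedy sketch, with illustrative states and actions:

```python
import random
from collections import defaultdict

# Hypothetical epsilon-greedy tabular Q-learner over the provider-choice
# strategies listed above. State and action names are illustrative only.

ACTIONS = ["buy_cheapest", "buy_random", "stick", "switch"]

class ProviderQLearner:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)        # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)  # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state, action, reward, next_state):
        # standard Q-learning temporal-difference update
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

After each month the agent would call `update` with the survival reward, gradually preferring the strategy that kept it alive; the gossip bullets above could seed the table with experience from sibling agents.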

A lot of loose parts of this vision are already in place, but the integration step and meaningful intelligence are still lacking. Plebnet is operational:
[Figure: demo of the Plebnet vision]

@synctext synctext added this to the Backlog milestone Jul 18, 2018
synctext (Member, Author) commented:
Diversity is essential to survival: multiple independent code bases.

synctext (Member, Author) commented:
fMRI scans show that humans have social norms and reinforcement learning.
Reinforcement Learning Signal Predicts Social Conformity
A useful source of bio-socio mimicry experiments. Buzzword-bingo score of 6+: data-driven bio-socio mimicry using blockchain-based deep reinforcement learning.
"using functional magnetic resonance imaging, that conformity is based on mechanisms that comply with principles of reinforcement learning. We found that individual judgments of facial attractiveness are adjusted in line with group opinion. Conflict with group opinion triggered a neuronal response in the rostral cingulate zone and the ventral striatum similar to the “prediction error” signal suggested by neuroscientific models of reinforcement learning. "

synctext (Member, Author) commented Oct 20, 2018

fMRI and public goods
Getting to Know You: Reputation and Trust in a Two-Person Economic Exchange
Linking this reinforcement learning issue to the math model of #2805.
"Using a multiround version of an economic exchange (trust game), we report that reciprocity expressed by one player strongly predicts future trust expressed by their partner—a behavioral finding mirrored by neural responses in the dorsal striatum. Here, analyses within and between brains revealed two signals—one encoded by response magnitude, and the other by response timing. Response magnitude correlated with the “intention to trust” on the next play of the game, and the peak of these “intention to trust” responses shifted its time of occurrence by 14 seconds as player reputations developed. This temporal transfer resembles a similar shift of reward prediction errors common to reinforcement learning models, but in the context of a social exchange. These data extend previous model-based functional magnetic resonance imaging studies into the social domain and broaden our view of the spectrum of functions implemented by the dorsal striatum."

ToDo: biology-based models of trust (above paper gives real-world measurements of the human brain for trust).

synctext (Member, Author) commented:
Hidden Technical Debt in Machine Learning Systems: Google's point is that the hardest part of AI is not the AI.
"Only a small fraction of real-world ML systems is composed of the ML code, as shown
by the small black box in the middle. The required surrounding infrastructure is vast and complex."
[Figure: the ML code is the small black box amid vast surrounding infrastructure]

synctext (Member, Author) commented Dec 16, 2019

@MateiAnton A first key start for this hugely ambitious project is to get something more efficient going.

As also mentioned in #4659 the current code is not deployed, only lab experiments:

The current bot can operate on our decentralised exchange and has a primitive understanding of pricing and orderbooks.

The first step is to integrate the stable API provided by the creator of SporeStack into Cloudomate. Then we have the stable building blocks for self-replication and for making everything more sophisticated/"intelligent".
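The self-replication building block this describes could look roughly as follows. This is a sketch only: the provider interface, field names, and payment flow are hypothetical (not the real SporeStack or Cloudomate APIs), and the transport is injected so the flow can be illustrated without a network.

```python
# Hypothetical flow: ask a SporeStack-like provider for a server, pay the
# returned cryptocurrency invoice, and wait until the machine is reachable.

def provision_server(api, days: int, ssh_key: str) -> str:
    order = api.launch(days=days, ssh_key=ssh_key)      # request a server
    api.pay(order["payment_address"], order["price"])   # settle the invoice
    return api.wait_for_ip(order["id"])                 # server is ready

class FakeProvider:
    """Stand-in provider used only to illustrate the call sequence."""
    def launch(self, days, ssh_key):
        return {"id": "srv-1", "payment_address": "bc1...", "price": 1}
    def pay(self, address, amount):
        self.paid = (address, amount)
    def wait_for_ip(self, order_id):
        return "203.0.113.7"
```

Keeping this boundary thin is what makes the rest of the agent testable: the RL policy decides *when* and *where* to buy, while `provision_server` only executes the purchase.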

A quick search turned up some prior work, though nothing I believe we can re-use. Fancy stuff, but far too complex to re-use: Stock Trading Bot using Deep Reinforcement Learning (Deep Q-learning), Keras and TensorFlow.
A Live Machine-Learning based Cryptocurrency Trader

Old 2008 scientific paper: Autonomous Forex Trading Agents

@synctext synctext assigned MateiAnton and unassigned mjuchli Dec 16, 2019
synctext (Member, Author) commented:
Reading during the Christmas break turned up related work: "autonomous bidding agent".
A busy quarter. Next steps:

  • mixed, but focus on reading related scientific work
  • focus on scientific papers describing approaches that actually worked with real money. Note that this is strangely rare (e.g. most work uses fake-money competitions).

synctext (Member, Author) commented:
@MateiAnton busy with regular classes this 3rd quarter. Please understand, re-use, and extend ongoing/prior work: https://github.com/Tribler/distributed-ai-kernel (Python or Kotlin)

qstokkink (Contributor) commented:
It seems like this student work was either completed or dropped several years ago. I'll close this issue now.
