
Reinforcement Learning #55

Open
srini1948 opened this issue Jul 5, 2017 · 49 comments

@srini1948

RL has been added to the original ConvNetJS.
Will you be adding that too?
Any plans for LSTM?

Thanks

@MarcoMeter

+1

Shouldn't be that tough to implement DQN. Maybe I can contribute that in about 8 weeks. Though I haven't checked yet whether ConvNetSharp is suitable, performance-wise, for my implementation in Unity.

@srini1948
Author

srini1948 commented Jul 5, 2017 via email

@cbovar
Owner

cbovar commented Jul 6, 2017

For DQN you can check out this repo. It should be easy to adapt it to the newer version of ConvNetSharp.

I have worked on LSTM. I will eventually release a 'shakespeare' demo. I only worked on the GPU versions.

@cbovar
Owner

cbovar commented Jul 6, 2017

I also see a DQN using WPF for display in this fork.

@MarcoMeter

I have worked with Deep-QLearning-Demo over the past weeks, but it lacks performance (it is single-threaded) and the code is hard to read and to maintain. Then again, it was adapted almost completely from the ConvNetJS version, which uses that rather unusual coding convention.

@srini1948
Author

srini1948 commented Jul 6, 2017 via email

@srini1948
Author

srini1948 commented Jul 6, 2017 via email

@srini1948
Author

srini1948 commented Jul 7, 2017 via email

@MarcoMeter

I'm applying DQN to my game BRO (https://www.youtube.com/watch?v=_mZaGTGn96Y) right now.
Within the next few months, I'll release BRO as open source on GitHub. BRO features an AI framework and a match sequence editor for match automation. The game is made with Unity.

Right now I need a much faster DQN implementation. The DQN demo mentioned above falls short there: training takes 30 minutes. That's why I'm considering contributing DQN to this repo.

And this is a video about the AI framework and the match sequence editor: https://www.youtube.com/watch?v=EE7EqoaOL34

@srini1948
Author

srini1948 commented Jul 7, 2017 via email

@MarcoMeter

Hey,

if anybody has ideas for testing the DQN algorithm that is about to be implemented, please let me know.

So far I've got these ideas for integration testing:

  • ConvNetJS's apples and poison example (Windows Forms), just like the already mentioned C# port (Deep-QLearning-Demo-csharp)
  • slot machine (just console application)
  • a moving target, which has to be shot by the agent (maybe Unity game engine)
  • agent has to move a basket to catch fruits and to avoid stuff like poison (maybe Unity game engine)

I'll probably find more through research.

After that, I'll start with the DQN implementation, probably beginning from the "Deep-QLearning-Demo-csharp" implementation, and then compare it to the Python implementation done by DeepMind for the Atari games.

@srini1948
Author

srini1948 commented Jul 24, 2017 via email

@cbovar
Owner

cbovar commented Jul 24, 2017

Maybe you could also try a very simple task: reproducing the input:

  • 0 -> 0
  • 1 -> 1

It may fit in a unit test.
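A rough sketch of what that unit test could look like (the layer, trainer and volume names follow the ConvNetSharp demos and the snippets later in this thread; the exact namespaces and signatures are assumptions and may need adjusting):

```csharp
// Sketch only: verify the net can learn the identity mapping 0 -> 0, 1 -> 1.
// Namespaces, constructors and the Train(input, output) signature are assumptions
// based on the ConvNetSharp demos; adjust to the actual API.
using ConvNetSharp.Core;
using ConvNetSharp.Core.Layers.Double;
using ConvNetSharp.Core.Training.Double;
using ConvNetSharp.Volume;

var net = new Net<double>();
net.AddLayer(new InputLayer(1, 1, 1));   // a single scalar input
net.AddLayer(new FullyConnLayer(4));     // tiny hidden layer
net.AddLayer(new ReluLayer());
net.AddLayer(new FullyConnLayer(1));     // single scalar output
net.AddLayer(new RegressionLayer());     // L2 regression on that output

var trainer = new SgdTrainer(net) { LearningRate = 0.01 };

// Repeatedly train on the two input/target pairs.
for (var epoch = 0; epoch < 1000; epoch++)
{
    foreach (var value in new[] { 0.0, 1.0 })
    {
        var input = new ConvNetSharp.Volume.Double.Volume(new[] { value }, new Shape(1));
        var target = new ConvNetSharp.Volume.Double.Volume(new[] { value }, new Shape(1));
        trainer.Train(input, target);
    }
}

// After training, net.Forward(input) should be close to the input value itself.
```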

@MarcoMeter

MarcoMeter commented Jul 25, 2017

[image: slot machine]
I wrote a simple slot machine (console app) using 3 reels. Just hold down space to start the slot machine and to stop each reel one by one.

New items for the reels' slots are sampled from a specific probability distribution.

In the end, the agent has to decide when to stop the first reel, the second reel and finally the third one.
(I should consider letting the AI decide which reel to stop first, to add a few more output dimensions.)

SlotMachine.zip

Given this slot machine example, I'm going to approach the DQN implementation now.
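(For illustration, one hypothetical way to encode this task for a DQN agent; all of these names are made up:)

```csharp
// Hypothetical encoding, purely for illustration (names are made up).
// Actions: 0 = wait, 1 = stop the next reel.
// State: the item currently visible on each of the three reels plus the number of
// reels already stopped, flattened into a double[] so it can be fed to the network.
double[] EncodeState(int[] visibleItems, int reelsStopped)
{
    var state = new double[visibleItems.Length + 1];
    for (var i = 0; i < visibleItems.Length; i++)
    {
        state[i] = visibleItems[i];            // item id on reel i
    }
    state[visibleItems.Length] = reelsStopped; // how far the episode has progressed
    return state;
}

// Example: cherry (2), seven (5), lemon (1) visible, one reel already stopped.
var exampleState = EncodeState(new[] { 2, 5, 1 }, reelsStopped: 1);
```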

@srini1948
Author

srini1948 commented Jul 25, 2017 via email

@MarcoMeter

Creating an interface between Python and C# might end up consuming too much time. I know there is the so-called IronPython library (http://ironpython.net/), which allows Python to be used from C#, but I haven't really looked into it.

@srini1948
Author

srini1948 commented Jul 25, 2017 via email

@MarcoMeter

Here is an update on the progress referring to a commit on the DQN branch of my fork:

Added major chunks of the DeepQLearner.cs [WIP]
A few TODOs left before testing and verification:

  • TODO: Overload or modify RetrievePolicy() to make use of Volumes, return output Volume from the net as well
  • TODO: Overload or modify GetNetInput() to make use of Volumes
  • TODO: Compute loss
  • TODO: Verify the consistency of the composed neural net upon initializing the DeepQLearner

https://github.com/MarcoMeter/ConvNetSharp/commit/5711468362d6f3551f82bad1e24d784e31f59a4b

@MarcoMeter

And there is one more major thing on the list:

Adding a regression layer. I guess there is no regression layer implemented yet, right?

@cbovar
Owner

cbovar commented Jul 27, 2017

It seems that RegressionLayer disappeared at some point (from tag 0.3.2). I will try to reintroduce it this weekend.

@MarcoMeter

Maybe this is related to this commit, because the file 'F:\Repositories\ConvNetSharp\src\ConvNetSharp\Layers\RegressionLayer.cs' got removed:

Commit: 56fec45
Parents: 5a47e2e, 37cdfbf
Author: Augustin Juricic ajuricic@neogov.net
Date: Tuesday, 28 March 2017, 11:18:18
Committer: Augustin Juricic
Merge remote-tracking branch 'github/master' into develop

@cbovar
Owner

cbovar commented Jul 28, 2017

I think I have never implemented RegressionLayer, since ConvNetSharp handles batches.

@cbovar
Owner

cbovar commented Jul 29, 2017

RegressionLayer committed

@MarcoMeter

Great, thanks. I'll move on soon.

@MarcoMeter

As of now, I'm struggling with the issue that the computed action values grow exponentially towards positive or negative infinity.

@cbovar
Owner

cbovar commented Jul 31, 2017

Have you tried with a lower learning rate? E.g. 0.001

@MarcoMeter

A lower learning rate only slightly delays this outcome.

Nevertheless, I expect the output values to stay below 2, simply because the maximum reward in the slot machine example is 1, which is probably handed out only after making at least 3 decisions.
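To make that expectation precise: if the per-step rewards are bounded by r_max and the discount factor is γ < 1, the true action values are bounded by Q*(s, a) ≤ Σ_{t≥0} γ^t · r_max = r_max / (1 − γ); for an episodic task that pays out at most 1 once per episode, Q*(s, a) ≤ 1. Values running off towards infinity therefore point to a bug in the update rather than to hyperparameter tuning.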

@MarcoMeter

I'm still trying to figure out the issue. Maybe I'm misusing the Volume class, or maybe I don't have enough experience with the actual implementation of neural nets (like understanding every single detail of the regression layer implementation). So I'm dropping some more information here.

Here is some pseudo code (Matiisen, 2015) featuring the core pieces of the algorithm:

```
initialize replay memory D
initialize action-value function Q with random weights
observe initial state s
repeat
    select an action a
        with probability ε select a random action
        otherwise select a = argmax_a' Q(s, a')
    carry out action a
    observe reward r and new state s'
    store experience <s, a, r, s'> in replay memory D

    sample random transitions <ss, aa, rr, ss'> from replay memory D
    calculate target for each minibatch transition
        if ss' is terminal state then tt = rr
        otherwise tt = rr + γ max_a' Q(ss', aa')
    train the Q network using (tt - Q(ss, aa))^2 as loss

    s = s'
until terminated
```
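A rough C# sketch of that minibatch target step (the Experience fields, SampleMinibatch and MaxValue are made-up names for illustration; Forward, Clone and Set follow the usage discussed later in this thread; the trainer call and the discount value are assumptions):

```csharp
// Sketch only: Experience fields, SampleMinibatch and MaxValue are made-up names.
// Forward, Clone and Set follow the usage discussed later in this thread; the
// trainer call and the discount value are assumptions.
const double gamma = 0.9; // discount factor (assumed value)

foreach (var experience in SampleMinibatch(replayMemory))
{
    // tt = rr                              if ss' is a terminal state
    // tt = rr + γ · max_a' Q(ss', a')      otherwise
    var target = experience.Reward;
    if (!experience.IsTerminal)
    {
        var nextQ = net.Forward(experience.FinalState);
        target += gamma * MaxValue(nextQ);   // MaxValue: largest predicted action value
    }

    // Regress only on the dimension of the action actually taken; keep the
    // network's own predictions for every other action.
    var desiredOutput = net.Forward(experience.InitialState).Clone();
    desiredOutput.Set(experience.Action, target);

    // Train towards the desired output, i.e. minimise (tt - Q(ss, aa))^2 on that dimension.
    trainer.Train(experience.InitialState, desiredOutput);
}
```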

And this is the stated loss function for training:

L = (tt − Q(ss, aa))², with tt = rr + γ · max_a' Q(ss', a')

In Karpathy's DQN implementation this loss function does not seem to be present explicitly. The regression layer implementations look similar (comparing Karpathy's and this repo's). Everything else is implemented accordingly (i.e. sampling experiences to compute new Q-values).

Using the Deep Q Learning Demo CSharp, the output values for the slot machine stay below 0.02.
SlotMachine.zip

@MarcoMeter

MarcoMeter commented Aug 2, 2017

And this is a flow chart of the implementation of the Q-learning part:

[image: flow chart of the brain's forward/backward pass]

@cbovar
Owner

cbovar commented Aug 3, 2017

I haven't had time to look at the code yet. But you could maybe make the problem even simpler (like this) to make it easier to debug.

@MarcoMeter

I could implement an example for contextual bandits along the lines of the Bandit Dungeon Demo (an example from the same author as the link you provided).

I just fear that the bandit examples are not complex enough to warrant a policy network. At least it would show whether or not the Q-values grow to infinity.
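(As a sanity check that does not involve ConvNetSharp at all, a plain tabular epsilon-greedy bandit keeps its value estimates within the reward range, which makes divergence easy to spot. A minimal, illustrative sketch:)

```csharp
// Minimal epsilon-greedy multi-armed bandit, illustrative only (no neural net involved).
// The tabular estimates are running averages, so they must stay within [0, 1].
using System;
using System.Linq;

var random = new Random(0);
var trueMeans = new[] { 0.2, 0.5, 0.8 };   // hidden payout probabilities of the arms
var estimates = new double[trueMeans.Length];
var counts = new int[trueMeans.Length];
const double epsilon = 0.1;

for (var step = 0; step < 10000; step++)
{
    // Explore with probability epsilon, otherwise pick the currently best arm.
    var arm = random.NextDouble() < epsilon
        ? random.Next(trueMeans.Length)
        : Array.IndexOf(estimates, estimates.Max());

    // Bernoulli reward drawn from the chosen arm's payout probability.
    var reward = random.NextDouble() < trueMeans[arm] ? 1.0 : 0.0;

    // Incremental average update: Q <- Q + (r - Q) / n.
    counts[arm]++;
    estimates[arm] += (reward - estimates[arm]) / counts[arm];
}
// estimates should now be close to trueMeans.
```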

@srini1948
Author

srini1948 commented Aug 3, 2017 via email

@MarcoMeter

The only news I have is that I'm working on a different example (made with Unity). This example is about controlling a basket to catch rewarding items and to avoid punishing ones.

[image: new example (basket catch)]

Concerning the DQN implementation I'm still stuck. I hope that Cedric can find some time to check the usage of Volumes.

@cbovar
Owner

cbovar commented Aug 10, 2017

Sorry guys. I have been very busy with my new job. I'll try to look at this soon.

@MarcoMeter

I just tested the implementation on the apples & poison example. The issue of exploding output values shows up there as well.

I didn't add the example to version control, since the code is functional but not well written (I took the known implementation and just substituted the DQN parts).

ApplesPoisonDQNDemo.zip

@MarcoMeter

MarcoMeter commented Aug 18, 2017

Just a quick update:

I created a UI for displaying the training progress. The red graph plots the average reward and the blue one the average loss. I resolved a bug concerning the epsilon exploration strategy (epsilon was always equal to 1 due to an integer division).
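(For illustration, the integer-division pitfall looks like this; the counter names below are made up:)

```csharp
// Hypothetical annealing schedule illustrating the integer-division bug (names made up).
int step = 2500;
int annealSteps = 10000;

// Bug: step / annealSteps is integer division and truncates to 0, so epsilon stays at 1.0.
double epsilonBuggy = 1.0 - step / annealSteps;

// Fix: promote one operand to double before dividing.
double epsilonFixed = 1.0 - (double)step / annealSteps;      // 0.75
```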

[image: training progress GUI — red: average reward, blue: average loss]

Since Cedric fixed a bug in the regression layer, the outputs do not explode anymore. Still, I have not achieved good behavior on the slot machine yet. I did come up with a new reward function, which signals rewards based on the result of each stopped reel: the first stop rewards the agent with the value of the item on that slot (e.g. 0.5 for a cherry or 1 for a 7); stopping the second or third reel rewards the agent with 1 for a matching item; a failure punishes the agent with -0.5; waiting neither rewards nor punishes the agent. Most of the time the agent learns to wait; it seems that this way any punishment is avoided.

I'll probably focus on the Apples and Poison demo now, because suitable hyperparameters are already known there. One drawback is performance: the referenced demo performs much better, so I'll have to find the bottleneck.

@cbovar
Owner

cbovar commented Aug 18, 2017

I think you should focus on getting correct results first. We can look at performance later (using a batch size > 1 and the GPU will help).

@MarcoMeter

MarcoMeter commented Aug 19, 2017

Still, it surprises me that the Apples and Poison demo is much, much slower than Deep-QLearning-Demo-csharp.

[image: performance profile]

Edit 1: If I enable GPU support by changing the namespaces, I get a BadImageFormatException because ConvNetSharp.Volume.GPU cannot be loaded, even though it is added to the references of all project dependencies.

Edit 2: The Apples and Poison demo will probably take a whole day to train. It progresses at about 4 fps.

Edit 3: 240,000 learning steps (DeepQLearner.Backward) take 27 hours. In comparison, 50,000 learning steps in Deep-QLearning-Demo-csharp take less than 9 minutes.

@cbovar
Owner

cbovar commented Aug 21, 2017

You probably get the BadImageFormatException because you are building for 32-bit. GPU only works in 64-bit.

@MarcoMeter

MarcoMeter commented Aug 21, 2017

Thanks, this solved the BadImageFormatException.

And now it's a CudaException thrown at CudaHostMemoryRegion.cs:25, triggered by:
`var chosenAction = _brain.Forward(new ConvNetSharp.Volume.GPU.Double.Volume(GatherInput(), new Shape(GatherInput().Length)));`

One question:
Is there any way to avoid specifying the full path to the Volume type, as seen above? VS complains that Volume is a namespace even though the namespace is imported. The ConvNetSharp.Volume namespace is required for the Shape class, so I guess that's the conflict.
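(One workaround would be a C# using alias so the concrete type gets a short name; the alias name GpuVolume below is arbitrary, and _brain/GatherInput are the members from the snippet above:)

```csharp
using ConvNetSharp.Volume;                                // Shape lives here
using GpuVolume = ConvNetSharp.Volume.GPU.Double.Volume;  // alias for the concrete volume type

// ...
var input = GatherInput();
var chosenAction = _brain.Forward(new GpuVolume(input, new Shape(input.Length)));
```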

@cbovar
Owner

cbovar commented Aug 26, 2017

I have fixed the loss computation in the regression layer.

I think there is an issue here. You take the output for the FinalState and update the value of the current Action, but you should take the output related to the InitialState.

In ConvNetJS, it only regresses on the current Action's dimension here.

You could do something like this:

    // Create desired output volume
    var desiredOutputVolume = _trainingOptions.Net.Forward(experience.InitialState).Clone();
    desiredOutputVolume.Set(actionPolicy.Action, newActionValue);

I applied this modification on this branch: https://github.com/cbovar/ConvNetSharp/tree/DQN

@MarcoMeter

It looks like you are right about that. I missed that detail inside the train function.

@cbovar
Owner

cbovar commented Aug 26, 2017

As for the GPU exception (CudaException thrown at CudaHostMemoryRegion.cs:25), it turns out it's a multi-threading issue: some volume allocation is done on the worker thread whereas the GPU context was acquired on the main thread.

@masatoyamada1973

`desiredOutputVolume.Set(actionPolicy.Action, newActionValue)`

should probably be

`desiredOutputVolume.Set(experience.Action, newActionValue)`

@MarcoMeter

Hey,
I wanted to let you guys know that I have stopped working on this.
I switched to working with Python and Unity's just-released ML-Agents.

@GospodinNoob

@MarcoMeter Hello. The link (https://github.com/MarcoMeter/Basket-Catch-Deep-Reinforcement-Learning) is broken. Is there a way to download the source code of this Unity implementation (Unity project)? Thanks.

@MarcoMeter

@GospodinNoob

@MarcoMeter Thanks

@GospodinNoob

GospodinNoob commented Dec 21, 2017

@MarcoMeter Do you maybe have a repo with Unity and your DQN? I am trying to add it, but I still have some misunderstandings about this system. Of course, only if it's not too much effort for you. :)
