August 8th, 2018 - by Derek Rodriguez
A couple of months ago I stumbled across this paper and thought it was pretty cool. "Deep Q-Learning looks cool, let’s give it a go," I thought to myself. Then I remembered I don’t have a colossal HPC cluster at my disposal to do things like make a Dota 2 AI that goes toe-to-toe with the pros, and I got sad. "Well, maybe I can offload the simulation of the card game to that Raspberry Pi I have sitting around and do all the learning on my desktop." Thus, the idea of an AI Gym with a socket interface was born.
"That sounds like something OpenAI Gym already does by default. This isn’t new at all."
You’re right. There is no such thing as originality, everything new is old, no one is special and art is a lie. Before we get ahead of ourselves, let’s note that apparently in the age of smart thermostats, cloud-connected everything, and Docker witchcraft, OpenAI Gym instances still run locally on your machine. In case you didn’t know, it takes a long time to send a packet over a network, at least when you measure it in CPU cycles.
How long, you ask? If your CPU is ticking away at anywhere around 3 GHz, and it’s executing one instruction per tick, you’d be able to execute roughly 1.5 million instructions before that packet could do a round trip around a single data center. Here are the numbers I used to do the math. For something like blackjack, where each round of play isn’t going to take more than a couple thousand instructions, this meant I would need hardware that could support several thousand threads of execution just to keep everything busy while the packets are in flight. Looks like we’re back at the Dota-2-playing HPC mega-cluster. Well, it was worth a thought.
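For anyone who wants to redo the napkin math, here is a rough sketch in Python. The 0.5 ms data-center round trip is my assumption, picked because it is the figure that reproduces the 1.5 million number above, and the 2,000 instructions per blackjack round is just "a couple thousand" made concrete. All the names are made up for illustration.

```python
# Back-of-the-napkin estimate: how many instructions fit in one network round trip?
CLOCK_HZ = 3_000_000_000        # 3 GHz, as in the text
INSTRUCTIONS_PER_TICK = 1       # simplification from the text; real CPUs vary
ROUND_TRIP_SECONDS = 500e-6     # ASSUMPTION: ~0.5 ms round trip inside one data center
INSTRUCTIONS_PER_ROUND = 2_000  # ASSUMPTION: "a couple thousand" per blackjack round


def instructions_during(seconds, clock_hz=CLOCK_HZ, ipc=INSTRUCTIONS_PER_TICK):
    """Instructions the CPU could retire while waiting `seconds` for a packet."""
    return int(round(clock_hz * ipc * seconds))


# Instructions that could have run while a single packet made its round trip.
stalled = instructions_during(ROUND_TRIP_SECONDS)

# Blackjack rounds that would need to be in flight to fill that gap.
rounds_in_flight = stalled // INSTRUCTIONS_PER_ROUND

print(f"instructions per round trip: {stalled:,}")
print(f"rounds needed to hide the latency: {rounds_in_flight:,}")
```

With these particular guesses the gap works out to 750 rounds' worth of work. Since a couple thousand instructions is my upper bound on a round, cheaper rounds push that count up quickly, which is where the pessimism about thread counts comes from.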
That being said, since that calculation was essentially done on the back of a napkin, I could be very wrong. I’ve begun making my own blackjack gym just to be certain that it isn't worthwhile. I’ve been hired to be a TA for a programming class at my university this semester, and I need to brush up on my C programming skills anyways.
Update: After several months of torturing myself with graduate school fellowships, I've finally got almost all the necessary logic done for the blackjack gym. Realistically, once I get splitting hands implemented, I can fork my work and start rolling out netcode.