twiddling bits and atoms's blog

Posted Thu 09 February 2017

Machine learning on other people's computers

Recently I've been experimenting with machine learning models for image recognition tasks (convolutional neural networks, or CNNs).

The training phase of machine learning requires a whole lot of floating point operations. Even though current-generation CPUs are really good at number crunching, their performance is still not enough: GPUs outperform them by an order of magnitude or even more (for this specific task).
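To get a feel for the scale, here is a back-of-the-envelope sketch that counts the floating point operations in the forward pass of a single convolutional layer. The layer dimensions are hypothetical (they are not taken from the model in this post), and the count assumes one multiply-add equals two operations and ignores biases, padding and activation functions.

```python
def conv2d_flops(out_h, out_w, in_channels, out_channels, kernel_h, kernel_w):
    """Approximate FLOPs for one forward pass of a conv layer on one image.

    Every output value needs kernel_h * kernel_w * in_channels multiply-adds,
    and each multiply-add is counted as 2 floating point operations.
    """
    multiply_adds = (out_h * out_w * out_channels
                     * kernel_h * kernel_w * in_channels)
    return 2 * multiply_adds


# Hypothetical layer: 3x3 kernels, 64 input and 64 output channels,
# producing a 224x224 feature map.
flops = conv2d_flops(224, 224, 64, 64, 3, 3)
print(f"{flops / 1e9:.2f} GFLOPs per image for this single layer")
```

Multiply that by the number of layers, the number of training images and the number of passes over the data, and it becomes clear why raw arithmetic throughput matters so much.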

So, to train machine learning models in a reasonable time one needs a GPU. A slight problem with this is that it requires a desktop PC and $1200 for one of the best-performing consumer GPUs (NVIDIA Titan X).

What about the cloud?

There is no cloud, just other people's computers

One frequently used alternative is to perform the training on virtual machines with GPUs in data centers managed by Amazon (AWS), Microsoft (Azure) and others.

I almost went this route, but for some reason I was unable to confirm my phone number on AWS. Their support has a response time of about 3 days, so I had to look for alternatives.

Nimbix & Docker

Nimbix provides various software products running on their machines (which can have GPUs attached to them). They advertise "bare-metal performance", but I don't really know if that means the Docker images are run on bare metal.

Their platform has ready-to-run configurations for some of the machine learning frameworks (Theano, TensorFlow), but it also has an option to run your own Docker image, which means you can prepare and try out the batch jobs on your local computer and then push them to Nimbix for the real training run.

I used this Dockerfile to train a simple TensorFlow model on a server with a K80 GPU at slightly more than $1/hour, which is a bargain considering its ~$5000 price tag.

Of course, the economics don't work out that well if you're training ML models 24/7, and the cost is slightly more than the regular price of an AWS p2.xlarge instance with the same GPU.
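The trade-off can be sketched as a simple break-even calculation. The numbers are the approximate ones quoted above (a ~$5000 card, roughly $1/hour); the function name is just for illustration, and the sketch deliberately ignores electricity, the host machine and resale value, all of which would shift the result.

```python
def break_even_hours(purchase_price, hourly_rate):
    """Hours of rented GPU time that cost as much as buying the card."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return purchase_price / hourly_rate


# Approximate figures from the text: ~$5000 for a K80, ~$1/hour to rent one.
hours = break_even_hours(5000.0, 1.0)
days_nonstop = hours / 24  # if the GPU were kept busy around the clock

print(f"Break-even after {hours:.0f} rented hours "
      f"(about {days_nonstop:.0f} days of non-stop training)")
```

For occasional experiments that threshold is far away, which is why renting looks like a bargain; for round-the-clock training it arrives within a year.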

The upside is that you don't have to set up and manage an OS: just submit the batch job and receive an e-mail when it's done.

Conclusion

It's interesting to see how computing has come full circle: you have to prepare a program and execute it on a large and expensive machine that you can't access directly (just like people programming the early mainframe computers).

On the other hand, I guess that we're lucky that we don't have to submit the batch jobs in paper binders anymore.

Category: misc
