gpu - Is there a gradient descent implementation that uses matrix-matrix multiplication?
I'm using the gradient descent implementation below in Octave for ML.
I first tried increasing the number of CPU cores and running Octave multithreaded with OpenBLAS, but that still didn't give the results I'm looking for, so I tried NVIDIA's toolkit with a Tesla K80 GPU.
I'm loading Octave with the drop-in NVBLAS library, following the instructions in this article:
Drop-in Acceleration of GNU Octave
When I checked nvidia-smi, I found the GPU idle, although a test using matrix-matrix multiplication was yielding ~9 teraflops.
Later I came to understand that the matrix-vector multiplication used in the above-mentioned implementation is not supported, per the NVBLAS documentation.
So my question is: is there a gradient descent implementation that uses matrix-matrix multiplication, or something equivalent, that can replace the gradient descent implementation I have?
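
To make the question concrete, here is a minimal sketch of the kind of reformulation I have in mind. It assumes a least-squares cost and several target columns trained at once (for example, one column per class in a one-vs-all setup), so the parameters form a matrix and both products in the update are matrix-matrix rather than matrix-vector. The function name gradient_descent_multi, the variable names, and the toy data are all made up for illustration; this is not the implementation I'm currently using.

```octave
1;  % Octave idiom: marks this file as a script so a function can be defined inline

% Batch gradient descent for least squares with k target columns at once.
% X: m-by-n design matrix, Y: m-by-k targets, Theta: n-by-k parameters.
% (Hypothetical helper, written only to illustrate the matrix-matrix form.)
function Theta = gradient_descent_multi(X, Y, Theta, alpha, num_iters)
  m = size(X, 1);
  for iter = 1:num_iters
    residual = X * Theta - Y;         % (m-by-n) * (n-by-k): matrix-matrix product
    gradient = (X' * residual) / m;   % (n-by-m) * (m-by-k): matrix-matrix product
    Theta = Theta - alpha * gradient; % simultaneous update of all k columns
  end
end

% Toy usage: two targets that are exact linear functions of the inputs.
X = [ones(4, 1), [1; 2; 3; 4]];       % intercept column plus one feature
Y = [X * [1; 2], X * [0; -1]];        % targets built from known parameters
Theta = gradient_descent_multi(X, Y, zeros(2, 2), 0.1, 5000);
disp(Theta);                          % compare against [1 0; 2 -1]
```

With a single target column (k = 1) this collapses back to the matrix-vector case, so it would only help when there are genuinely several outputs to fit. I'm also assuming Octave hands these products to the BLAS matrix-matrix routine that NVBLAS intercepts, and that the matrices are large enough for NVBLAS to decide to offload them; I have not verified either on the K80.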