Machine learning - Generating vectors from text data for KMeans using Spark
I am new to Spark and machine learning. I am trying to cluster data like the following using KMeans:

    1::hi how
    2::i fine, how

In this data the separator is `::`, and the actual text to cluster is in the second column. After reading the official Spark page and numerous articles, I have written the following code, but I am not able to generate the vectors to provide as input to the `KMeans.train` step.

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext("local", "test")
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

    // Keep only the text after the "::" separator
    val rawData = sc.textFile("data/mllib/km.txt").map(line => line.split("::")(1))
    val sentenceData = rawData.toDF("sentence")
    val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
    val wordsData = tokenizer...
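One possible way to get the `RDD[Vector]` that `KMeans.train` expects is sketched below. It is not the question's DataFrame pipeline: it swaps in the RDD-based `org.apache.spark.mllib.feature.HashingTF` and `IDF`, which return `org.apache.spark.mllib.linalg.Vector` values directly, so no conversion from a DataFrame column is needed. The names `TextKMeansSketch` and `clusterTexts`, the regex tokenizer, the 1000 hash buckets, `k = 2` and the 20 iterations are all illustrative assumptions, not anything from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper object; a sketch, not a definitive implementation.
object TextKMeansSketch {

  // Turns "id::text" lines into TF-IDF vectors, clusters them with k-means,
  // and returns each text paired with the index of its assigned cluster.
  def clusterTexts(lines: RDD[String], k: Int, numFeatures: Int = 1000): RDD[(String, Int)] = {
    // Keep only the text after the first "::"; lines without a separator are skipped.
    val texts: RDD[String] = lines.filter(_.contains("::")).map(_.split("::", 2)(1))
    texts.cache()

    // Assumed, very simple tokenizer: lowercase and split on non-word characters.
    val words: RDD[Seq[String]] =
      texts.map(_.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq)

    // Hash each document's words into a fixed-size term-frequency vector.
    val tf: RDD[Vector] = new HashingTF(numFeatures).transform(words)
    tf.cache()

    // Rescale the term frequencies by inverse document frequency.
    val tfidf: RDD[Vector] = new IDF().fit(tf).transform(tf)
    tfidf.cache()

    // An RDD[Vector] is what KMeans.train takes; 20 is the max iteration count.
    val model: KMeansModel = KMeans.train(tfidf, k, 20)

    // tfidf is derived from texts by element-wise maps only, so zip lines them up.
    texts.zip(model.predict(tfidf))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("test"))
    val clustered = clusterTexts(sc.textFile("data/mllib/km.txt"), k = 2)
    clustered.collect().foreach { case (text, cluster) => println(s"$cluster\t$text") }
    sc.stop()
  }
}
```

Running `main` prints each sentence prefixed by its cluster index. If you would rather keep the DataFrame pipeline from the question, the DataFrame-based `org.apache.spark.ml.clustering.KMeans` can be fitted on the `features` column directly instead of calling `mllib`'s `KMeans.train`; check the vector type your Spark version's `ml.feature` classes produce before mixing the two APIs.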