I am doing some basic hands-on exercises in Spark using Scala.
I would like to know why the count function does not work with the mapValues and map functions.
When I apply sum, min, or max, it works. Also, is there a place where I can look up all the functions that can be applied to the Iterable[String] from groupbykeyRDD?
My code:
scala> val records = List( "CHN|2", "CHN|3" , "BNG|2","BNG|65")
records: List[String] = List(CHN|2, CHN|3, BNG|2, BNG|65)
scala> val recordsRDD = sc.parallelize(records)
recordsRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[119] at parallelize at <console>:23
scala> val mapRDD = recordsRDD.map(elem => elem.split("\\|"))
mapRDD: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[120] at map at <console>:25
scala> val keyvalueRDD = mapRDD.map(elem => (elem(0),elem(1)))
keyvalueRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[121] at map at <console>:27
scala> val groupbykeyRDD = keyvalueRDD.groupByKey()
groupbykeyRDD: org.apache.spark.rdd.RDD[(String, Iterable[String])] = ShuffledRDD[122] at groupByKey at <console>:29
scala> groupbykeyRDD.mapValues(elem => elem.count).collect
<console>:32: error: missing arguments for method count in trait TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
groupbykeyRDD.mapValues(elem => elem.count).collect
^
scala> groupbykeyRDD.map(elem => (elem._1 ,elem._2.count)).collect
<console>:32: error: missing arguments for method count in trait TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
groupbykeyRDD.map(elem => (elem._1 ,elem._2.count)).collect
Expected output:
Array((CHN,2) ,(BNG,2))
The error you are getting has nothing to do with Spark; it is a pure Scala compilation error.
You can reproduce it in a plain Scala console (no Spark at all):
scala> val iterableTest: Iterable[String] = Iterable("test")
iterableTest: Iterable[String] = List(test)
scala> iterableTest.count
<console>:29: error: missing argument list for method count in trait TraversableOnce
This is because Iterable does not define a count method that takes no arguments. It does define a count method, but that one requires a predicate function as its argument, which is why you get this specific error about partially applied functions.
It does have a size method, though, which you could swap into your sample to make it work.
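To make the difference concrete, here is a minimal sketch in plain Scala (no Spark). The object name CountVsSize and the hand-built grouped map are assumptions for illustration only; the map just mimics the (key, Iterable[String]) pairs that groupByKey produces in the question.

```scala
object CountVsSize {
  // Hypothetical stand-in for the grouped data from the question.
  val grouped: Map[String, Iterable[String]] = Map(
    "CHN" -> Iterable("2", "3"),
    "BNG" -> Iterable("2", "65")
  )

  // size takes no arguments and returns the number of elements.
  val sizes: Map[String, Int] =
    grouped.map { case (key, values) => (key, values.size) }

  // count needs a predicate; one that always holds counts every element.
  val counts: Map[String, Int] =
    grouped.map { case (key, values) => (key, values.count(_ => true)) }

  // count with a real predicate: how many values are greater than 2.
  val greaterThanTwo: Map[String, Int] =
    grouped.map { case (key, values) => (key, values.count(_.toInt > 2)) }

  def main(args: Array[String]): Unit = {
    println(sizes)
    println(counts)
    println(greaterThanTwo)
  }
}
```

In the Spark session from the question, the equivalent change would be groupbykeyRDD.mapValues(elem => elem.size).collect. Use count only when you actually want to count the elements that satisfy some condition.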