Why does the count function not work with mapValues in Spark?


I am doing some basic hands-on practice with Spark using Scala.

I would like to know why the count function does not work with the mapValues and map functions.

When I apply sum, min or max, it works. Also, is there a place where I can look up all the functions that can be applied to the Iterable[String] values in groupbykeyRDD?

My code:

scala> val records = List( "CHN|2", "CHN|3" , "BNG|2","BNG|65")
records: List[String] = List(CHN|2, CHN|3, BNG|2, BNG|65)

scala> val recordsRDD = sc.parallelize(records)
recordsRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[119] at parallelize at <console>:23

scala> val mapRDD = recordsRDD.map(elem => elem.split("\\|"))
mapRDD: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[120] at map at <console>:25

scala> val keyvalueRDD = mapRDD.map(elem => (elem(0),elem(1)))
keyvalueRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[121] at map at <console>:27

scala> val groupbykeyRDD = keyvalueRDD.groupByKey()
groupbykeyRDD: org.apache.spark.rdd.RDD[(String, Iterable[String])] = ShuffledRDD[122] at groupByKey at <console>:29

scala> groupbykeyRDD.mapValues(elem => elem.count).collect
<console>:32: error: missing arguments for method count in trait  TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
          groupbykeyRDD.mapValues(elem => elem.count).collect
                                               ^

scala> groupbykeyRDD.map(elem => (elem._1 ,elem._2.count)).collect
<console>:32: error: missing arguments for method count in trait TraversableOnce;
follow this method with `_' if you want to treat it as a partially applied function
          groupbykeyRDD.map(elem => (elem._1 ,elem._2.count)).collect

Expected output:

Array((CHN,2), (BNG,2))


The error you are getting has nothing to do with Spark; it is a pure Scala compilation error.

You can reproduce it in a plain Scala console (no Spark at all):

scala> val iterableTest: Iterable[String] = Iterable("test")
iterableTest: Iterable[String] = List(test)

scala> iterableTest.count
<console>:29: error: missing argument list for method count in trait TraversableOnce

This is because Iterable does not define a count method that takes no arguments. It does define count(p), but that method requires a predicate function and returns the number of elements that satisfy it. Calling it without that argument is why the compiler reports a missing argument list and suggests treating count as a partially applied function.
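To make the difference concrete, here is a minimal plain-Scala sketch (no Spark needed) on an Iterable[String] like the values that groupByKey produces. The object name CountVsSize and the sample values are hypothetical. It also suggests why min and max compile without arguments: their only parameter list is implicit (an Ordering), whereas count has an explicit predicate parameter.

```scala
object CountVsSize {
  // Hypothetical stand-in for one group produced by groupByKey.
  val values: Iterable[String] = Iterable("2", "3", "65")

  // count takes a predicate and returns how many elements satisfy it.
  val multiDigit: Int = values.count(_.length > 1) // only "65" matches

  // A predicate that is always true counts every element...
  val countAll: Int = values.count(_ => true)

  // ...but size is the direct way to get the number of elements.
  val total: Int = values.size

  // max and min need no explicit argument: their only parameter is an
  // implicit Ordering[String], which compares strings lexicographically.
  val largest: String = values.max
  val smallest: String = values.min

  def main(args: Array[String]): Unit = {
    println(s"multiDigit=$multiDigit countAll=$countAll total=$total")
    println(s"largest=$largest smallest=$smallest")
  }
}
```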

It does, however, have a size method, which you can use in place of count to make your sample work.
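As a sketch of the corrected pipeline, the steps from the question can be mirrored with plain Scala collections, with groupBy standing in for groupByKey. The object name GroupAndCount is hypothetical, and the assumption is that Spark behaves the same way here because mapValues simply applies the function to each Iterable[String].

```scala
object GroupAndCount {
  // The sample records from the question.
  val records: List[String] = List("CHN|2", "CHN|3", "BNG|2", "BNG|65")

  // Split on a literal '|' (escaped, because split takes a regex) and build (key, value) pairs.
  val pairs: List[(String, String)] = records.map { r =>
    val parts = r.split("\\|")
    (parts(0), parts(1))
  }

  // groupBy stands in for groupByKey here: each key maps to all of its values.
  val grouped: Map[String, Iterable[String]] =
    pairs.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2) }

  // size (not count) gives the number of values per key.
  val counts: Map[String, Int] = grouped.map { case (key, vs) => key -> vs.size }

  def main(args: Array[String]): Unit =
    println(counts)
}
```

In the Spark session from the question, the equivalent change should be groupbykeyRDD.mapValues(_.size).collect (or groupbykeyRDD.map(elem => (elem._1, elem._2.size)).collect), which should return the expected counts, although the order of the pairs may vary.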