I am running spark-1.0.0 by connecting to a spark standalone cluster which has one master and two slaves. I ran wordcount.py by Spark-submit, actually it reads data from HDFS and also write the results into HDFS. So far everything is fine and the results will correctly be written into HDFS. But the thing makes me concern is that when I check Stdout for each worker, it is empty I dont know whether it is suppose to be empty? and I got following in stderr:
stderr log page for Some(app-20140704174955-0002)
Spark Executor Command: "java" "-cp" ":: /usr/local/spark-1.0.0/conf: /usr/local/spark-1.0.0 /assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar:/usr/local/hadoop/conf" " -XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend " "akka.tcp://[email protected]:54477/user/CoarseGrainedScheduler" "0" "slave2" "1 " "akka.tcp://[email protected]:41483/user/Worker" "app-20140704174955-0002" ======================================== 14/07/04 17:50:14 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://[email protected]:33758] -> [akka.tcp://[email protected]:54477] disassociated! Shutting down.
Spark always writes everything, even INFO to stderr. People seem to do this to stop stdout buffering messages and causing less predictable logging. It's an acceptable practice when it's known that an application is never going to be used in bash scripting, so especially common for logging.