Can an MCU really do parallel processing?
Let's say I want to run a countdown, send data through another interface, and do one more job, such as lighting up an LED, all at the same time.
Is that even possible?
A processor with multiple execution units or cores can perform parallel processing. Most microcontrollers do not have multiple execution units.
Some architectures support SIMD (Single Instruction/Multiple Data) instructions that can generate multiple results from a single instruction - this is a low-level form of parallel processing. Similarly, DSPs (Digital Signal Processors) and microcontrollers with DSP instructions may provide dual or multiple MAC (multiply/accumulate) units, which are also a form of parallel processing. Both SIMD and MAC are used primarily for number crunching and signal processing applications. High-end DSPs often support other instruction-level parallel execution capabilities.
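To make the MAC idea concrete, here is a minimal sketch in plain C of the kind of loop these units are built for. The function name dot_i16 is my own, purely for illustration. Whether a compiler actually maps this loop onto MAC or SIMD instructions depends on the target and the build settings, so treat it as an illustration of the operation, not of any particular part's instruction set.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit sample buffers (hypothetical helper).
 * Each iteration performs one multiply/accumulate: acc += a[i] * b[i].
 * A core with several MAC units, or SIMD instructions, can in principle
 * perform more than one of these steps per instruction. */
static int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

This is the inner loop of filters and correlations, which is why the hardware support is aimed at signal processing and not at running unrelated jobs side by side.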
Another low-level architecture feature that allows parallel execution is pipelining. This allows instructions that each take multiple cycles to complete at a rate of up to one per cycle, by working on different stages of successive instructions simultaneously.
Most microcontrollers can support a multi-tasking or multi-threading scheduler that gives the impression of concurrent execution by allocating execution time to each task according to the scheduling algorithm used. While this is not parallel processing, and in fact adds overhead rather than accelerating processing, it is useful in other ways, such as functional partitioning of the code and, in the case of a real-time priority-based preemptive scheduler, achieving real-time response to events. For the example use case you give in your question, this form of scheduling is entirely appropriate and adequate. See Real-time Operating System (RTOS).
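As a rough illustration of the scheduling idea applied to the three jobs in the question, here is a minimal sketch of a cooperative, tick-driven scheduler in C. This is a much simpler scheme than a preemptive RTOS, and everything in it is an assumption made for the example: the names (app_state_t, task_t, scheduler_tick, run_demo), the task periods, and the plain struct standing in for the real timer, interface, and LED pin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application state standing in for real peripherals. */
typedef struct {
    uint32_t countdown;   /* remaining countdown steps */
    uint32_t bytes_sent;  /* bytes "sent" on the other interface */
    bool     led_on;      /* current LED state */
} app_state_t;

typedef void (*task_fn)(app_state_t *);

typedef struct {
    task_fn  run;       /* task body; must return quickly */
    uint32_t period;    /* run once every `period` ticks */
    uint32_t next_due;  /* tick at which the task runs next */
} task_t;

static void countdown_task(app_state_t *s) { if (s->countdown > 0) s->countdown--; }
static void send_task(app_state_t *s)      { s->bytes_sent++; }
static void led_task(app_state_t *s)       { s->led_on = !s->led_on; }

/* One scheduler pass for tick `now`: every task that is due runs to
 * completion, one after another, on the single core. */
static void scheduler_tick(task_t *tasks, size_t n, app_state_t *s, uint32_t now)
{
    for (size_t i = 0; i < n; i++) {
        /* Signed difference keeps the comparison valid across tick wrap-around. */
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run(s);
            tasks[i].next_due += tasks[i].period;
        }
    }
}

/* Host-side simulation: on a real MCU the loop body would be driven by a
 * timer tick inside the main super-loop. */
static app_state_t run_demo(uint32_t ticks)
{
    app_state_t s = { .countdown = 10, .bytes_sent = 0, .led_on = false };
    task_t tasks[] = {
        { countdown_task, 100, 0 },  /* countdown step every 100 ticks */
        { send_task,       10, 0 },  /* send one byte every 10 ticks   */
        { led_task,        50, 0 },  /* toggle the LED every 50 ticks  */
    };
    for (uint32_t now = 0; now < ticks; now++) {
        scheduler_tick(tasks, sizeof tasks / sizeof tasks[0], &s, now);
    }
    return s;
}
```

Nothing here runs in parallel: the tasks simply take turns quickly enough that all three appear to progress together. With these assumed periods, run_demo(1000) finishes the countdown, sends 100 bytes, and toggles the LED 20 times. A preemptive RTOS adds priorities and the ability to interrupt a running task, which this sketch does not attempt.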
One way of achieving a high degree of parallelism at a low level, where individual operations of the same process can occur simultaneously (when one does not depend on the result of the other, or a pipeline is used), is to implement the process on an FPGA. This essentially implements the processing in hardware rather than software, although the languages used to program FPGAs share similarities with software languages.