What is the difference between ^M, $ and \n (newline) in Python?


This question already has an answer here:

  • What does ^M character mean in Vim?

I'm working with a text file (containing vCards) and need to remove the ^M at the end of each line (leaving the newlines) except within the NOTES field, since there the ^M seems to signal that the line continues and wraps around. See the enclosed figure, taken from vi, in which vi automatically shows the newline characters in blue.

If I read the lines of the file inside a with statement (and write them out after processing them), how should I process each line?

More broadly, what is the difference between ^M, $, and \n (newline) in Python? What is the role of each?

Note that the focus of my question is completely unrelated to:

Vcard parser with Python

Of course, I could just remove the odd behavior in the NOTES field within vi by entering s/\n //, or by using re, serialize, or even the sortedChildren method in vobject. But that is not the focus of my question.

The focus is broader. I want to understand what this ^M character is: whether it is related to newlines, and whether it is related to Python or is just a vobject construct. If it is the latter and ^M has no general meaning, why does vi highlight it in blue? What I find a bit odd is that in vi these line breaks with ^M are followed by a carriage return and a BLANK space, as if "^M$ " were a special sequence denoting an "unintentional line break"... Again, is this three-character sequence special to vobject, more generic, or just part of my imagination (and in any of these three cases, why is it blue in vi)?

What I'm trying to understand in this question is why vi shows ^M in blue, how it differs from $ in vi, and whether these two characters have any special meaning in Python. Since I notice that vi shows ^M at the line breaks within the "NOTES" field, whose lines seem to have a fixed length (regardless of the line breaks in NOTES), I'm trying to understand why, and I have not found any explanation for it.


RFC 6350 states:

Individual lines within vCard are delimited by […] a CRLF sequence (U+000D followed by U+000A).

[…]

Long logical lines of text can be split into a multiple-physical-line representation […] by inserting a CRLF immediately followed by a single white space character (space (U+0020) or horizontal tab (U+0009)).
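A minimal sketch of that unfolding rule in Python, assuming the whole file has already been read into a string without newline translation. In Python escapes U+000D is "\r" and U+000A is "\n", so a fold is "\r\n" followed by exactly one space or tab, and unfolding removes all three characters. The function name `unfold` and the sample vCard text are made up for illustration.

```python
import re

# RFC 6350 fold: CRLF immediately followed by ONE space or horizontal tab.
FOLD = re.compile(r"\r\n[ \t]")


def unfold(text: str) -> str:
    """Hypothetical helper: join folded physical lines into logical lines.

    Removes the CRLF together with the single whitespace character after it;
    any further whitespace belongs to the value and is kept.
    """
    return FOLD.sub("", text)


# Made-up sample: the NOTE value was folded in the middle of a word.
folded = (
    "BEGIN:VCARD\r\n"
    "NOTE:This is a long note that the exporter decided to wr\r\n"
    " ap onto a second physical line.\r\n"
    "END:VCARD\r\n"
)

# Every CRLF left after unfolding is a real content-line delimiter.
for line in unfold(folded).split("\r\n"):
    print(repr(line))
```

Because the regex only matches CRLF followed by whitespace, ordinary line delimiters are left alone, so splitting on "\r\n" afterwards yields one logical line per property (plus an empty string after the final CRLF).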

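You're only noticing this in the NOTES section because that is where your long lines of text are. The ^M is how vi displays a carriage return (U+000D, Ctrl-M in caret notation, written "\r" in Python), and the line feed (U+000A) is what Python writes as "\n". The $ is not a character in the file at all: it is the end-of-line marker vi draws when its list option is on. So the "^M$ " sequence you see is the CRLF-plus-space fold quoted above. You need to read the specification for the file format you are trying to parse.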
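Applied to the original goal (dropping the ^M characters while keeping the newlines), one approach is to unfold first and only then turn the remaining CRLF delimiters into plain LF. This is a sketch, not a vobject feature: the helper name `unfold_vcard_file`, the UTF-8 encoding, and the choice of LF output are assumptions. The file is read in one go rather than line by line because a fold can only be recognized by looking across a line boundary.

```python
import re

# RFC 6350 fold: CRLF immediately followed by one space or tab.
FOLD = re.compile(r"\r\n[ \t]")


def unfold_vcard_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: unfold long lines, then rewrite with LF endings.

    Assumes UTF-8 input; the LF-only output is a choice made for this sketch.
    """
    # newline="" disables newline translation on read, so the "\r"
    # characters (vi's ^M) survive and the regex can see the CRLF folds.
    with open(src_path, encoding="utf-8", newline="") as src:
        text = FOLD.sub("", src.read())

    # Every remaining CRLF is now a real content-line delimiter.
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final CRLF

    # newline="\n" writes "\n" untranslated on every platform.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in lines:
            dst.write(line + "\n")
```

Usage would look like `unfold_vcard_file("contacts.vcf", "contacts_unfolded.vcf")` (hypothetical file names). Note that the output no longer uses the CRLF delimiters RFC 6350 requires, so keep the original file if a strict vCard reader will consume it.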