Recognize Coding

GNU Emacs Manual. Node: Recognize Coding

PREV Coding Systems International NEXT Specify Coding

16.8: Recognizing Coding Systems

Most of the time, Emacs can recognize which coding system to use for any given file---once you have specified your preferences.

Some coding systems can be recognized or distinguished by which byte sequences appear in the data. However, there are coding systems that cannot be distinguished, not even potentially. For example, there is no way to distinguish between Latin-1 and Latin-2; they use the same byte values with different meanings.

Emacs handles this situation by means of a priority list of coding systems. Whenever Emacs reads a file, if you do not specify the coding system to use, Emacs checks the data against each coding system, starting with the first in priority and working down the list, until it finds a coding system that fits the data. Then it converts the file contents assuming that they are represented in this coding system.

The priority list of coding systems depends on the selected language environment (see Language Environments). For example, if you use French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use Czech, you probably want Latin-2 to be preferred. This is one of the reasons to specify a language environment.

However, you can alter the priority list in detail with the command M-x prefer-coding-system. This command reads the name of a coding system from the minibuffer, and adds it to the front of the priority list, so that it is preferred to all others. If you use this command several times, each use adds one element to the front of the priority list.

If you use a coding system that specifies the end-of-line conversion type, such as iso-8859-1-dos, what that means is that Emacs should attempt to recognize iso-8859-1 with priority, and should use DOS end-of-line conversion in case it recognizes iso-8859-1.

Sometimes a file name indicates which coding system to use for the file. The variable file-coding-system-alist specifies this correspondence. There is a special function modify-coding-system-alist for adding elements to this list. For example, to read and write all `.txt' files using the coding system china-iso-8bit, you can execute this Lisp expression:

(modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)

The first argument should be file, the second argument should be a regular expression that determines which files this applies to, and the third argument says which coding system to use for these files.

Emacs recognizes which kind of end-of-line conversion to use based on the contents of the file: if it sees only carriage-returns, or only carriage-return linefeed sequences, then it chooses the end-of-line conversion accordingly. You can inhibit the automatic use of end-of-line conversion by setting the variable inhibit-eol-conversion to non-nil.

You can specify the coding system for a particular file using the `-*-...-*-' construct at the beginning of a file, or a local variables list at the end (see File Variables). You do this by defining a value for the ``variable'' named coding. Emacs does not really have a variable coding; instead of setting a variable, it uses the specified coding system for the file. For example, `-*-mode: C; coding: latin-1;-*-' specifies use of the Latin-1 coding system, as well as C mode. If you specify the coding explicitly in the file, that overrides file-coding-system-alist.

The variable auto-coding-alist is the strongest way to specify the coding system for certain patterns of file names; this variable even overrides `-*-coding:-*-' tags in the file itself. Emacs uses this feature for tar and archive files, to prevent Emacs from being confused by a `-*-coding:-*-' tag in a member of the archive and thinking it applies to the archive file as a whole.

Once Emacs has chosen a coding system for a buffer, it stores that coding system in buffer-file-coding-system and uses that coding system, by default, for operations that write from this buffer into a file. This includes the commands save-buffer and write-region. If you want to write files from this buffer using a different coding system, you can specify a different coding system for the buffer using set-buffer-file-coding-system (see Specify Coding).

When you send a message with Mail mode (see Sending Mail), Emacs has four different ways to determine the coding system to use for encoding the message text. It tries the buffer's own value of buffer-file-coding-system, if that is non-nil. Otherwise, it uses the value of sendmail-coding-system, if that is non-nil. The third way is to use the default coding system for new files, which is controlled by your choice of language environment, if that is non-nil. If all of these three values are nil, Emacs encodes outgoing mail using the Latin-1 coding system.

When you get new mail in Rmail, each message is translated automatically from the coding system it is written in---as if it were a separate file. This uses the priority list of coding systems that you have specified. If a MIME message specifies a character set, Rmail obeys that specification, unless rmail-decode-mime-charset is nil.

For reading and saving Rmail files themselves, Emacs uses the coding system specified by the variable rmail-file-coding-system. The default value is nil, which means that Rmail files are not translated (they are read and written in the Emacs internal character code).

Coding Systems

International NEXT

Specify Coding