Most of the time, Emacs can recognize which coding system to use for any given file---once you have specified your preferences.
Some coding systems can be recognized or distinguished by which byte sequences appear in the data. However, there are coding systems that cannot be distinguished, not even potentially. For example, there is no way to distinguish between Latin-1 and Latin-2; they use the same byte values with different meanings.
Emacs handles this situation by means of a priority list of coding systems. Whenever Emacs reads a file, if you do not specify the coding system to use, Emacs checks the data against each coding system, starting with the first in priority and working down the list, until it finds a coding system that fits the data. Then it converts the file contents assuming that they are represented in this coding system.
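To see which coding systems Emacs considers possible for a given piece of data, a sketch along these lines may help. It assumes the function detect-coding-string, which this section does not mention, and a made-up byte string. The result depends on your priority list, so no particular value is shown here.

```elisp
;; A unibyte string: "tr", then byte 0350 (octal), then "s".
;; Byte 0350 is e-grave in Latin-1 but c-caron in Latin-2,
;; so the data alone cannot settle which one was meant.
(let ((bytes "tr\350s"))
  ;; All candidate coding systems, ordered by current priority.
  (message "candidates: %S" (detect-coding-string bytes))
  ;; Only the highest-priority candidate.
  (message "first choice: %S" (detect-coding-string bytes t)))
```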
The priority list of coding systems depends on the selected language environment (see Language Environments). For example, if you use French, you probably want Emacs to prefer Latin-1 to Latin-2; if you use Czech, you probably want Latin-2 to be preferred. This is one of the reasons to specify a language environment.
However, you can alter the priority list in detail with the command M-x prefer-coding-system. This command reads the name of a coding system from the minibuffer, and adds it to the front of the priority list, so that it is preferred to all others. If you use this command several times, each use adds one element to the front of the priority list.
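The same command can be called from Lisp, for instance in an init file. The following is a minimal sketch; the two coding systems are chosen only for illustration.

```elisp
;; Each call puts its argument at the front of the priority list,
;; so the coding system named last ends up with the highest priority.
(prefer-coding-system 'iso-8859-1)
(prefer-coding-system 'iso-8859-2)   ; now tried before iso-8859-1
```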
If you use a coding system that specifies the end-of-line conversion type, such as iso-8859-1-dos, what that means is that Emacs should attempt to recognize iso-8859-1 with priority, and should use DOS end-of-line conversion in case it recognizes iso-8859-1.
Sometimes a file name indicates which coding system to use for the file. The variable file-coding-system-alist specifies this correspondence. There is a special function modify-coding-system-alist for adding elements to this list. For example, to read and write all `.txt' files using the coding system china-iso-8bit, you can execute this Lisp expression:
(modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)
The first argument should be file, the second argument should be a regular expression that determines which files this applies to, and the third argument says which coding system to use for these files.
Emacs recognizes which kind of end-of-line conversion to use based on the contents of the file: if it sees only carriage-returns, or only carriage-return linefeed sequences, then it chooses the end-of-line conversion accordingly. You can inhibit the automatic use of end-of-line conversion by setting the variable inhibit-eol-conversion to non-nil.
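For example, a line like the following in an init file would disable it. This is a minimal sketch; whether you want this setting globally is a matter of preference.

```elisp
;; Turn off automatic end-of-line conversion for files visited from now on.
(setq inhibit-eol-conversion t)
```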
You can specify the coding system for a particular file using the `-*-...-*-' construct at the beginning of a file, or a local variables list at the end (see File Variables). You do this by defining a value for the ``variable'' named coding. Emacs does not really have a variable coding; instead of setting a variable, it uses the specified coding system for the file. For example, `-*-mode: C; coding: latin-1;-*-' specifies use of the Latin-1 coding system, as well as C mode. If you specify the coding explicitly in the file, that overrides file-coding-system-alist.
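The local variables list form might look like this at the end of an Emacs Lisp file. This is only a sketch: the comment prefix depends on the language of the file, and latin-1 is used here simply to match the example above.

```elisp
;; ... rest of the file ...

;; Local Variables:
;; coding: latin-1
;; End:
```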
The variable auto-coding-alist is the strongest way to specify the coding system for certain patterns of file names; this variable even overrides `-*-coding:-*-' tags in the file itself. Emacs uses this feature for tar and archive files, to prevent Emacs from being confused by a `-*-coding:-*-' tag in a member of the archive and thinking it applies to the archive file as a whole.
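You could add an entry of your own in the same way. The sketch below assumes that each element of auto-coding-alist is a pair of a file-name regular expression and a coding system; the `.dat' extension is a hypothetical choice made for this example.

```elisp
;; Treat every file whose name ends in ".dat" (a hypothetical choice)
;; as raw bytes, ignoring any coding tag inside the file.
(add-to-list 'auto-coding-alist '("\\.dat\\'" . no-conversion))
```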
Once Emacs has chosen a coding system for a buffer, it stores that coding system in buffer-file-coding-system and uses that coding system, by default, for operations that write from this buffer into a file. This includes the commands save-buffer and write-region. If you want to write files from this buffer using a different coding system, you can specify a different coding system for the buffer using set-buffer-file-coding-system (see Specify Coding).
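Called from Lisp, that might look like the following sketch. The choice of iso-8859-1 is only an illustration.

```elisp
;; Evaluate this in the buffer you want to affect.
;; Later saves of this buffer will then use Latin-1.
(set-buffer-file-coding-system 'iso-8859-1)
```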
When you send a message with Mail mode (see Sending Mail), Emacs has four different ways to determine the coding system to use for encoding the message text. It tries the buffer's own value of buffer-file-coding-system, if that is non-nil. Otherwise, it uses the value of sendmail-coding-system, if that is non-nil. The third way is to use the default coding system for new files, which is controlled by your choice of language environment, if that is non-nil. If all of these three values are nil, Emacs encodes outgoing mail using the Latin-1 coding system.
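The order of these four steps can be sketched in Lisp as follows. The function name my-mail-coding-system is hypothetical and is not part of Emacs; the sketch also assumes that the default for new files is the default value of buffer-file-coding-system. It only mirrors the description above and is not the code Emacs itself runs.

```elisp
;; Hypothetical helper that mirrors the order described above.
(defun my-mail-coding-system ()
  "Return the coding system this section says is used for outgoing mail."
  (or buffer-file-coding-system                    ; 1. the buffer's own value
      (and (boundp 'sendmail-coding-system)
           sendmail-coding-system)                 ; 2. explicit preference
      (default-value 'buffer-file-coding-system)   ; 3. default for new files
      'iso-latin-1))                               ; 4. last resort: Latin-1
```

Setting sendmail-coding-system, for example with (setq sendmail-coding-system 'iso-8859-1), affects only the second step.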
When you get new mail in Rmail, each message is translated automatically from the coding system it is written in, as if it were a separate file. This uses the priority list of coding systems that you have specified. If a MIME message specifies a character set, Rmail obeys that specification, unless rmail-decode-mime-charset is nil.
For reading and saving Rmail files themselves, Emacs uses the coding system specified by the variable rmail-file-coding-system. The default value is nil, which means that Rmail files are not translated (they are read and written in the Emacs internal character code).
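If you would rather have Rmail files translated, a setting like the following could be tried. It is a sketch that assumes the variable behaves as described above in your version of Emacs; the coding system is chosen only for illustration.

```elisp
;; Read and write Rmail files as Latin-1 instead of leaving them untranslated.
(setq rmail-file-coding-system 'iso-8859-1)
```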