There are three main properties of an ADR Cue data text file that we are going to look at:
- Text encoding
- Line ending characters
- Field delimiter characters
The text encoding sets the rule used to convert text to and from binary data; the line ending character splits the text of the whole file into separate lines, or in this case ADR cues; while the field delimiter character splits the text of each line into separate fields.
As an example, a table containing the following data...
| Cue 1 – Field 1 | Cue 1 – Field 2 | Cue 1 – Field 3 |
| Cue 2 – Field 1 | Cue 2 – Field 2 | Cue 2 – Field 3 |
| Cue 3 – Field 1 | Cue 3 – Field 2 | Cue 3 – Field 3 |
| Cue 4 – Field 1 | Cue 4 – Field 2 | Cue 4 – Field 3 |
...would be exported to a text file with the line ending and delimiter characters in the following positions:
Cue 1 – Field 1 [DEL] Cue 1 – Field 2 [DEL] Cue 1 – Field 3 [EOL]
Cue 2 – Field 1 [DEL] Cue 2 – Field 2 [DEL] Cue 2 – Field 3 [EOL]
Cue 3 – Field 1 [DEL] Cue 3 – Field 2 [DEL] Cue 3 – Field 3 [EOL]
Cue 4 – Field 1 [DEL] Cue 4 – Field 2 [DEL] Cue 4 – Field 3 [EOL]
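The layout above can be sketched in a few lines of Python. This is only an illustration, not how any particular application reads its files: the `parse_cues` name is hypothetical, and a tab delimiter and a Windows-style CR+LF line ending are assumed stand-ins for [DEL] and [EOL].

```python
# Hypothetical helper: split exported text into cues (lines) and fields.
# The delimiter and line ending below are assumptions chosen for illustration.
DEL = "\t"     # stands in for [DEL]
EOL = "\r\n"   # stands in for [EOL]


def parse_cues(text, delimiter=DEL, line_ending=EOL):
    """Return a list of cues, each cue being a list of field strings."""
    cues = []
    for line in text.split(line_ending):
        if line:  # skip the empty string left after the final line ending
            cues.append(line.split(delimiter))
    return cues


exported = (
    "Cue 1 - Field 1\tCue 1 - Field 2\tCue 1 - Field 3\r\n"
    "Cue 2 - Field 1\tCue 2 - Field 2\tCue 2 - Field 3\r\n"
)
cues = parse_cues(exported)
print(cues[1][2])  # prints "Cue 2 - Field 3"
```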
To a computer, a text file is just a series of ones and zeros. As a computer cannot store a "letter" or "number" directly, it needs to encode each one as a sequence of bits (ones and zeros). The rule that is used to convert the text and numbers to and from these bits (binary data) is called the Encoding Scheme or Encoding.
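As a small illustration of this conversion, the Python sketch below encodes a short piece of text with the ASCII encoding and prints the resulting bits. The sample text "ADR" is an arbitrary choice.

```python
# Show the bits a computer actually stores for a short piece of text.
text = "ADR"

data = text.encode("ascii")       # text -> binary data (bytes)
bits = " ".join(f"{byte:08b}" for byte in data)
print(bits)                       # prints "01000001 01000100 01010010"

decoded = data.decode("ascii")    # binary data -> text, using the same rule
print(decoded)                    # prints "ADR"
```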
The base encoding used by most computers is called ASCII and contains 128 characters. These include the letters A-Z in upper and lower case, the digits 0-9, and various symbols and control characters. This encoding works fine for basic English text, but to accommodate other languages such as German, additional encodings were created that extend the base ASCII definition to store characters like é, ß, ü, ä and ö. These extended encodings are platform specific, as can be seen from the list of encodings available in the OS X application TextEdit.
In western countries, OS X will, by default, save a text file with a Mac OS Roman encoding, while Windows will save it with a Windows Latin encoding (also referred to as Windows ANSI or Windows 1252). If you create a text file in Windows, save it with the default Windows Latin encoding, and then open it in OS X with its default Mac OS Roman encoding, most characters will display correctly, as both encodings share the base ASCII character set. However, characters outside that base set can be displayed differently, because the two encodings assign different characters to the same byte values. For example, the characters “ and ” saved in Windows will be displayed in OS X as ì and î. To display these characters correctly in OS X, the text file needs to be re-opened with the Windows Latin text encoding (the encoding that was used to save the file).
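The quotation mark example can be reproduced with Python's built-in `cp1252` (Windows Latin) and `mac_roman` (Mac OS Roman) codecs. This is a minimal sketch; the sample text is made up.

```python
# Text containing curly quotes: U+201C (left) and U+201D (right).
original = "\u201cTake 1\u201d"

# Save with Windows Latin: the quotes become the single bytes 0x93 and 0x94.
stored = original.encode("cp1252")

# Open with the wrong encoding: Mac OS Roman maps 0x93 and 0x94 to other characters.
wrong = stored.decode("mac_roman")
print(wrong)   # prints "ìTake 1î"

# Re-open with the encoding that was used to save the file.
right = stored.decode("cp1252")
print(right == original)   # prints "True"
```

The plain ASCII characters ("Take 1") survive either way; only the characters outside the shared ASCII range change.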
In addition to Mac OS Roman and Windows Latin, the other encodings you’ll come across are Unicode encodings, for example UTF-8 and UTF-16. Unicode defines over 120,000 characters and is used to store the text of most languages. Hence, when working with non-English text, especially Asian languages or a mixture of languages, it is best to use a Unicode encoding.
Encodings in practice
Every system has its default encoding depending on the language selected. When all systems involved use the same platform and language, text files can be transferred without any issues. Errors only occur when a text file is transferred between systems that use a different platform or language setting, and therefore a different default encoding. In that case you need to know what encoding was used to save the document, so that the application opening the file can be set up to use that encoding and decode the binary data correctly.
All text-based applications should have a way to set the text encoding used to open and save their files. Some will even try to set the encoding automatically while loading, while others, like our application EdiCue, provide a preview window so that you can try different encoding settings while checking that all characters are displayed correctly.
It is also important to save text files with an encoding that will be able to encode or store the characters in your text.
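One way to check this before saving is to try encoding the text and see whether it fails. The sketch below is illustrative only: `can_encode` is a hypothetical helper name and the sample strings are made up.

```python
def can_encode(text, encoding):
    """Return True if every character in text can be stored with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


german = "Gr\u00f6\u00dfe"      # "Größe"
japanese = "\u6771\u4eac"       # "東京"

print(can_encode(german, "ascii"))     # prints "False"
print(can_encode(german, "cp1252"))    # prints "True"
print(can_encode(japanese, "cp1252"))  # prints "False"
print(can_encode(japanese, "utf-8"))   # prints "True"
```

A Unicode encoding such as UTF-8 accepts all of the samples, which is why it is the safer choice for mixed-language text.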
Line ending characters
For ADR Cue data that has been exported from a third-party application, each line of a text file represents a separate cue. To tell the computer that the next character needs to be displayed on a new line, an invisible line ending character is inserted into the text.
The character or characters used to signal a new line vary between operating systems. Either an LF (line feed) character, a CR (carriage return) character, or both are inserted. Here is a list of what each operating system uses:
- Windows: CR followed by LF
- OS X and other Unix-based systems: LF
- Classic Mac OS (version 9 and earlier): CR
Text files generated in Windows contain both a CR and an LF character at the end of each line by default. When opened by an application in OS X, these files display correctly, as OS X only needs to see an LF character to start a new line. However, when opening a text file in Windows that was created in OS X, you may find that all text is displayed on a single line: Windows applications expect a CR and LF pair to signal a new line, while OS X applications only insert an LF character.
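The single-line symptom can be reproduced with a reader that only recognises the CR+LF pair. The sketch below is an assumption-laden illustration (the `naive_lines` function is hypothetical and does not represent any real application); it contrasts that reader with Python's `str.splitlines()`, which recognises CR, LF and CR+LF, along with a few rarer separators.

```python
windows_text = "Cue 1\r\nCue 2\r\nCue 3\r\n"   # CR+LF line endings
osx_text = "Cue 1\nCue 2\nCue 3\n"             # LF-only line endings


def naive_lines(text):
    """Hypothetical reader that treats only CR+LF as a line ending."""
    return [line for line in text.split("\r\n") if line]


print(len(naive_lines(windows_text)))  # prints "3"
print(len(naive_lines(osx_text)))      # prints "1" (everything on one line)

# splitlines() handles both styles and gives the same three cues.
print(windows_text.splitlines() == osx_text.splitlines())  # prints "True"
```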
To overcome this problem, the OS X and Windows versions of EdiCue generate text files with Windows line endings. When importing text files, both versions of EdiCue automatically detect the line ending used.
Field delimiter characters
To separate the contents of each line into separate fields, a field delimiter character is inserted. Various characters can be used to perform this task. The most common are the tab, comma and semicolon characters.
If you have a choice over which delimiter character is used while exporting a text file, select one that isn’t already present in your data. If you don’t, your fields will not be separated correctly during import. A tab character usually works best as a delimiter, because ADR cue data rarely contains one.
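The effect of a delimiter that also appears in the data, and one way to pick a safe one, can be sketched as follows. The `pick_delimiter` helper and the sample cue rows are hypothetical, made up for this illustration.

```python
def pick_delimiter(rows, candidates=("\t", ",", ";")):
    """Return the first candidate that appears in no field, or None."""
    for candidate in candidates:
        if not any(candidate in field for row in rows for field in row):
            return candidate
    return None


# Made-up cue data: character name, dialogue, timecode.
rows = [
    ["JOHN", "Wait, what?", "01:00:10:00"],
    ["MARY", "No; not now.", "01:00:12:00"],
]

# A comma delimiter collides with the comma in the dialogue field.
broken = ",".join(rows[0]).split(",")
print(len(broken))   # prints "4" (three fields were exported)

# Comma and semicolon both occur in the data, so the tab is chosen.
delimiter = pick_delimiter(rows, candidates=(",", ";", "\t"))
print(repr(delimiter))   # prints "'\t'"

restored = delimiter.join(rows[0]).split(delimiter)
print(restored == rows[0])   # prints "True"
```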