Transferring ADR Cues - Part 1

Being able to transfer ADR cue data from one system or application to another can save hours of manual cueing, or of copying and pasting individual data fields. However, the most common way of transferring this data is via a text file, and there are several things about this method that can get you stuck. When dealing with non-English text, or transferring between different operating systems, there’s a good chance your data won’t come across as expected.

Split into two parts, this first blog post provides a background to how text files work, while part two provides examples of how to manage text files and prepare them for import into EdiCue.


Overview

There are three main properties of an ADR Cue data text file that we are going to look at:

    • Text encoding
    • Line ending characters
    • Field delimiter characters

The text encoding sets the rule used to convert text to and from binary data; the line ending character splits the text of the whole file into separate lines, or in this case ADR cues; while the field delimiter character splits the text of each line into separate fields.

As an example, a table containing the following data...

Sample Data
Cue 1 – Field 1    Cue 1 – Field 2    Cue 1 – Field 3
Cue 2 – Field 1    Cue 2 – Field 2    Cue 2 – Field 3
Cue 3 – Field 1    Cue 3 – Field 2    Cue 3 – Field 3
Cue 4 – Field 1    Cue 4 – Field 2    Cue 4 – Field 3

 ...would be exported to a text file with the line ending and delimiter characters in the following positions:

Cue 1 – Field 1 [DEL] Cue 1 – Field 2 [DEL] Cue 1 – Field 3 [EOL]
Cue 2 – Field 1 [DEL] Cue 2 – Field 2 [DEL] Cue 2 – Field 3 [EOL]
Cue 3 – Field 1 [DEL] Cue 3 – Field 2 [DEL] Cue 3 – Field 3 [EOL]
Cue 4 – Field 1 [DEL] Cue 4 – Field 2 [DEL] Cue 4 – Field 3 [EOL]

Placement of line ending ('EOL') and delimiter ('DEL') characters

Text Encoding

To a computer, a text file is just a bunch of ones and zeros. As it cannot store a "letter" or "number" directly, it needs to encode them into a sequence of bits (ones and zeros). The rule used to convert text and numbers to and from these bits (binary data) is called the encoding scheme, or simply the encoding.
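
As a small illustration of this conversion, here is a Python sketch that encodes a short piece of text with ASCII and prints the resulting bits. The helper name `to_bits` is made up for this example.

```python
def to_bits(text: str, encoding: str = "ascii") -> str:
    """Encode text and return each byte as a group of eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


bits = to_bits("ADR")
print(bits)  # 01000001 01000100 01010010
```

Each group of eight bits is one byte, and in ASCII each byte stores exactly one character.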

The base encoding used by most computers is called ASCII and contains 128 characters. These include the letters A-Z in upper and lower case, the numbers 0-9, and various symbols and control characters. This encoding works fine for basic English text, but to accommodate other languages such as German, additional encodings were created that extend the base ASCII definition to store characters like é, ß, ü, ä and ö. These extended encodings are platform specific, as can be seen below from some of the encodings available in the OS X application TextEdit:

Some of the encodings available in TextEdit

For western countries the encoding Mac OS Roman was created for Mac OS, while Windows Latin was created for Windows. Both contain 256 characters, the first 128 of which are identical to ASCII.

In western countries, OS X will, by default, save a text file with a Mac OS Roman encoding, while Windows will save a text file with a Windows Latin (otherwise referred to as Windows ANSI or Windows 1252) encoding. If you create a text file in Windows, save it with the default Windows Latin text encoding and then open it in OS X with its default text encoding Mac OS Roman, most characters will display correctly, as both encodings use the base ASCII character set. However, certain characters will be displayed differently, because the two encodings assign different characters to the values above 127. For example, the characters “ and ” in Windows will be displayed in OS X as ì and î. To display these characters correctly in OS X, the text file would need to be re-opened with a Windows Latin text encoding (the encoding that was used to save the file).
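
The curly-quote mismatch can be reproduced in a few lines of Python. This is only a sketch: `cp1252` and `mac_roman` are Python's codec names for Windows Latin and Mac OS Roman, and the sample text is made up.

```python
original = "“Cut!”"  # text with curly quotes

# Bytes as a Windows application would save them by default.
saved = original.encode("cp1252")

# Opened with the wrong encoding (Mac OS Roman): the quotes are misread.
wrong = saved.decode("mac_roman")

# Opened with the encoding that was used to save the file.
right = saved.decode("cp1252")

print(wrong)  # ìCut!î
print(right)  # “Cut!”
```

The bytes in the file never change. Only the rule used to read them does.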

In addition to Mac OS Roman and Windows Latin, the other encodings you’ll come across are Unicode encodings, for example UTF-8 and UTF-16. Unicode encodings define over 120,000 characters and are used to store the text of most languages. Hence when working with non-English text, especially Asian languages, or when there is a mixture of languages, it is best to use a Unicode encoding.
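
To see why a Unicode encoding suits mixed-language text, the sketch below encodes a made-up field containing German and Japanese with UTF-8 and UTF-16 and checks that it decodes back unchanged.

```python
cue_text = "Grüß dich, こんにちは"  # German and Japanese in one field

round_trips = {}
for encoding in ("utf-8", "utf-16"):
    data = cue_text.encode(encoding)
    # Decoding with the same encoding should give back the original text.
    round_trips[encoding] = data.decode(encoding) == cue_text
    print(encoding, len(data), "bytes, round trip ok:", round_trips[encoding])
```

The two encodings produce different byte counts for the same text, so the application reading the file still needs to know which one was used.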

Encodings in practice

Every system has its default encoding depending on the language selected. When all systems involved use the same platform and language, text files can be transferred without any issues. Errors only occur when a text file is transferred between systems that are on a different platform or language setting, and therefore use a different default encoding. Then you need to know what encoding was used to save the document, so the application opening the file can be set up to use that encoding and decode the binary data correctly.
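
In a script, this comes down to naming the encoding explicitly rather than relying on the system default. A minimal Python sketch, using a temporary file and made-up cue text to stand in for a file saved on Windows:

```python
import os
import tempfile

cue_line = "Café “scene 12”"

# Simulate a file saved on Windows with its default Western encoding.
with tempfile.NamedTemporaryFile(
    "w", encoding="cp1252", suffix=".txt", delete=False
) as f:
    f.write(cue_line)
    path = f.name

try:
    # Name the encoding that was used to save the file when opening it.
    with open(path, encoding="cp1252") as f:
        decoded = f.read()
finally:
    os.remove(path)

print(decoded == cue_line)  # True
```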

All text-based applications should have a way to set the text encoding used to open and save their files. Some will even try to set the encoding automatically while loading. Others, like our application EdiCue, provide a preview window so that you can try different encoding settings while checking that all characters are displayed correctly.
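
The idea of previewing a file under several encodings can be sketched in Python as below. This is not how EdiCue implements its preview window. The function name `preview`, the list of candidate encodings and the sample text are all assumptions made for the example.

```python
def preview(raw: bytes, encodings=("utf-8", "cp1252", "mac_roman")) -> dict:
    """Decode the same bytes with each candidate encoding for comparison."""
    results = {}
    for name in encodings:
        # errors="replace" shows U+FFFD for undecodable bytes instead of raising.
        results[name] = raw.decode(name, errors="replace")
    return results


raw = "Señor “López”".encode("cp1252")
for name, text in preview(raw).items():
    print(f"{name:10} {text}")
```

Only the line decoded with the encoding used to save the bytes (here `cp1252`) shows the text as it was typed.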

It is also important to save text files with an encoding that will be able to encode or store the characters in your text. 
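
One way to check this before saving is to try the encoding and see whether it fails. A minimal Python sketch, with a hypothetical helper `can_encode` and made-up cue text:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be stored in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


cue = "Line 3: こんにちは"
print(can_encode(cue, "cp1252"))  # False
print(can_encode(cue, "utf-8"))   # True
```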

Line ending characters

For ADR Cue data that has been exported from a third-party application, each line of a text file represents a separate cue. To tell the computer that the next character needs to be displayed on a new line, an invisible line ending character is inserted into the text. 

The character or characters used to signal this new line vary between operating systems. Either an LF (line feed) character, a CR (carriage return) character, or both are inserted. Here is a list of what each operating system uses:

Default line ending characters
Operating System    Character
OS X                LF
OS-9                CR
Windows             CR+LF

Text files generated in Windows contain both a CR and an LF character by default. Applications in OS X open these files correctly, because they only need to see an LF character to start a new line. However, when opening a text file in Windows that was created in OS X, you may find that all text is displayed on a single line: Windows applications expect both a CR and an LF character to signal a new line, while OS X applications only insert an LF character.
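
The behaviour described above can be imitated in Python by splitting text on each line ending. The cue text is made up, and real applications may be more forgiving than these simple splits.

```python
mac_text = "Cue 1\nCue 2\nCue 3"      # LF only, as written on OS X
win_text = "Cue 1\r\nCue 2\r\nCue 3"  # CR+LF, as written on Windows

# A reader that only recognises CR+LF sees the OS X file as one line.
print(mac_text.split("\r\n"))  # ['Cue 1\nCue 2\nCue 3']

# A reader that splits on LF alone finds every line in the Windows file,
# although a stray CR is left at the end of each line but the last.
print(win_text.split("\n"))  # ['Cue 1\r', 'Cue 2\r', 'Cue 3']

# str.splitlines() accepts LF, CR and CR+LF.
print(mac_text.splitlines() == win_text.splitlines())  # True
```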

To overcome this problem, the OS X and Windows versions of EdiCue generate text files with Windows line endings. While importing text files into the OS X and Windows versions of EdiCue, EdiCue automatically detects the line ending used.
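
For readers who want to do something similar in their own scripts, here is a rough Python sketch of detecting a line ending and converting text to Windows line endings. It is not EdiCue's code. Both function names are hypothetical, and the detection simply reports the first style it finds.

```python
def detect_line_ending(text: str) -> str:
    """Guess which line ending a piece of text uses."""
    if "\r\n" in text:
        return "CR+LF"
    if "\r" in text:
        return "CR"
    if "\n" in text:
        return "LF"
    return "none"


def to_windows_line_endings(text: str) -> str:
    """Convert LF, CR and CR+LF line endings to CR+LF."""
    # Normalise everything to LF first so existing CR+LF pairs are not doubled.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


print(detect_line_ending("Cue 1\nCue 2\n"))             # LF
print(repr(to_windows_line_endings("Cue 1\nCue 2\n")))  # 'Cue 1\r\nCue 2\r\n'
```

When reading a file for this purpose in Python, open it with `newline=""` so that the original line endings are passed through untouched.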

Field delimiter characters

To separate the contents of each line into separate fields, a field delimiter character is inserted. Various characters can be used to perform this task. The most common are a tab, comma or semicolon character.
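
Putting line endings and delimiters together, a tab-delimited export can be split into cues and fields as in the Python sketch below. The field layout (cue number, timecode, character, dialogue) is invented for the example and is not the layout of any particular application.

```python
exported = (
    "1\t01:00:10:00\tJOHN\tWhere are you going?\r\n"
    "2\t01:00:14:12\tMARY\tHome.\r\n"
)

# Split the file into lines (cues), then each line into fields.
cues = [line.split("\t") for line in exported.splitlines()]

for cue in cues:
    print(cue)
```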

If you have a choice over which delimiter character is used while exporting a text file, select one that isn’t already present in your data. If you don’t, your fields will not be separated correctly during import. Selecting a tab character as a delimiter usually works best, as ADR cue data rarely contains a tab character.
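
The sketch below shows what goes wrong when the delimiter also appears in the data, and one simple way to pick a safe delimiter. The helper `choose_delimiter`, its candidate list and the sample rows are assumptions made for the example.

```python
def choose_delimiter(rows, candidates=("\t", ";", ",")):
    """Return the first candidate that appears in no field, or None."""
    for delimiter in candidates:
        if not any(delimiter in field for row in rows for field in row):
            return delimiter
    return None


rows = [
    ["1", "JOHN", "Well, I never; not once."],
    ["2", "MARY", "Home."],
]

# A comma delimiter clashes with the comma in the dialogue.
broken = ",".join(rows[0]).split(",")
print(len(broken))  # 4 fields instead of 3

# Tab appears in none of the fields, so it is chosen.
delimiter = choose_delimiter(rows)
print(repr(delimiter))  # '\t'

restored = [line.split(delimiter) for line in (delimiter.join(r) for r in rows)]
print(restored == rows)  # True
```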


Let me know if you have any questions or feedback in the comments below. See part two on how to manage text files and prepare them for import into our application EdiCue.

Mark

Last modified on Wednesday, 24 February 2016 22:08
Mark Franken

Mark Franken is an award-winning software developer and founder of Sounds In Sync, specializing in cutting-edge solutions for sound post-production. His work as a sound editor on some of the top feature films and television shows has inspired him to develop these programs that are used by sound professionals worldwide. He can be contacted via the Sounds In Sync website.

Website: www.soundsinsync.com/contact
