Live Sound Over Ethernet, By Mark Amundson
Well, I could not put it off any longer: this analog curmudgeon has to acknowledge that the invasion of digitized live audio has begun and will benefit us all. So I spent many hours diving through the jargon and FAQs to bring you all the dirt on Ethernet and the popular live sound network schemes. I tell ya, my head hurts, and now I understand why IT networking professionals command six-figure salaries.

No, I am not a “newbie” at digital and Ethernet, as I have done my penance designing communications protocols and seen enough ack/nack packet ladder charts for a couple of lifetimes. My challenge to you is this: kick back and try to digest the fire hose of info I am about to lay on you. I will attempt some layman descriptions along the way.

Ethernet

I am going to forgo the Ethernet history lesson and just mention that Ethernet networking is usually described with a seven-layer model of protocols (the OSI reference model) that defines the method of network communication; Ethernet itself covers the bottom two layers. These layers are briefly described in Table 1. A protocol is simply a set of rules that narrow the freedom of how data/signals flow. If you relate Ethernet to a highway system, the physical layer protocol would be the constraints on the road and lane sizes, the minimum and maximum speed limits, the number of vehicles that could pass per unit of time, and the sizes of the vehicles. The analogy extends to the datalink layer, as the vehicles have to have defined license plates, bumpers, tires, weight restrictions and definite start locations and destination addresses. Extending this to the network layer, you would describe the payload (human or cargo), how many items there are and the order/method of unloading them.
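To make the vehicle analogy a little more concrete, here is a minimal Python sketch of the datalink-layer “vehicle,” an Ethernet II frame header wrapped around an audio payload. The MAC addresses are hypothetical, and the EtherType 0x88B5 is the IEEE “local experimental” value used as a stand-in; it is not the value CobraNet or EtherSound actually uses. The 4-byte frame check sequence is left off because network hardware normally appends it.

```python
import struct

# Stand-in EtherType: 0x88B5 is the IEEE "local experimental" value,
# NOT the EtherType used by CobraNet or EtherSound.
EXPERIMENTAL_ETHERTYPE = 0x88B5
MIN_PAYLOAD = 46  # bytes; shorter payloads are zero-padded
MAX_PAYLOAD = 1500  # standard Ethernet MTU


def build_frame(dst_mac: bytes, src_mac: bytes, payload: bytes,
                ethertype: int = EXPERIMENTAL_ETHERTYPE) -> bytes:
    """Wrap a payload in an Ethernet II header (FCS omitted; the NIC adds it)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the 1500-byte Ethernet MTU")
    # The "license plates": destination and source addresses, plus a
    # type field telling the receiver what kind of cargo is aboard.
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    # Pad short payloads up to the 46-byte minimum.
    return header + payload.ljust(MIN_PAYLOAD, b"\x00")


dst = bytes.fromhex("01005e000001")  # hypothetical multicast destination
src = bytes.fromhex("020000000001")  # hypothetical locally administered source
frame = build_frame(dst, src, b"audio!")
print(len(frame))           # 60: 14-byte header + 46-byte padded payload
print(frame[12:14].hex())   # 88b5
```

The destination and source addresses play the part of the license plates and start/destination locations; everything the higher layers care about rides inside the payload.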

In live sound uses of Ethernet, two network schemes dominate today: CobraNet and EtherSound. While I plan to hit the high points of each, first I need to explain the physical layer aspects they share. Both use a cable system with RJ-45 eight-pin connectors carrying at least four wires (two twisted pairs), defined as 100BaseTX, which is capable of 100 megabit-per-second transmission rates over runs of up to 100 meters. Figure 1 shows the pinout of the RJ-45 plugs and sockets with the likely wire color codes. For most of us, this is supposed to be the end of the technology discussion, and the manufacturers would prefer that we just buy what we need from here out.

Ma Bell to the Rescue

I want to spend some quality time on the cable and connector description, as it will pay dividends in related ways. The whole “RJ” system of connectors was developed for low-cost deployment, but it needs special attention, as it is the weakest link. When dealing with RJ connector cable wiring, described as category 5 unshielded twisted pair or CAT5 UTP, the color code is borrowed from the telephone industry. The basic telephone wire colors, in order, are 1. Blue, 2. Orange, 3. Green, 4. Brown and 5. Slate gray. In addition, each wire is striped, and each pair carries a second color, called a binder color, that identifies which group of five pairs it belongs to, so the code can count up to 25 pairs in a bundle. These binder colors are White for pairs 1-5, Red for 6-10, Black for 11-15, Yellow for 16-20 and Violet for 21-25. In RJ-45 wiring, the binder color is always white, with blue, orange, green and brown as the basic colors of the four pairs on the eight pins.

To complicate things a touch, every wire is part of a pair. The tip (+) conductor has insulation in the binder color, striped with the basic color. The ring (-) conductor twisted with it is reverse colored, meaning its insulation is the basic color striped with the binder color. So, for example, pair one is white/blue (tip) and blue/white (ring). Note that there is no sleeve or ground reference in this cable system. Each conductor in CAT5 cable is usually a flimsy 24-gauge solid wire, prone to breaking if flexed improperly. Note that Ma Bell wires most homes with 19- or 22-gauge pairs, with two pairs provided in case of breakage or if a second phone line is later required. Also note that the blue/white and orange/white pairs are connected to the Network Interface Box on the outside of your residence. With Ethernet wiring, you may find yourself cross-patching to good pairs of wires on RJ-45 connections should you not have spare CAT5 cabling.
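The color rules above reduce to simple arithmetic on the pair number. This Python sketch uses the article’s naming (“binder” and “basic” colors, often called primary and secondary colors elsewhere) and writes each conductor as insulation/stripe.

```python
# Binder colors mark each group of five pairs; basic colors repeat within a group.
BINDER = ["White", "Red", "Black", "Yellow", "Violet"]
BASIC = ["Blue", "Orange", "Green", "Brown", "Slate"]


def pair_colors(pair: int) -> tuple[str, str]:
    """Return (tip, ring) colors for pair 1..25, written as insulation/stripe."""
    if not 1 <= pair <= 25:
        raise ValueError("pair must be 1..25")
    binder = BINDER[(pair - 1) // 5]   # which group of five
    basic = BASIC[(pair - 1) % 5]      # position within the group
    tip = f"{binder}/{basic}"          # binder insulation, basic stripe
    ring = f"{basic}/{binder}"         # reverse colored
    return tip, ring


for n in range(1, 5):  # the four pairs found in CAT5 cable
    print(n, *pair_colors(n))
```

Pairs 1 through 4 all come out with a white binder color, which is why CAT5 only ever shows white paired with blue, orange, green and brown.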

A quick note: in an RJ-45 for 100BaseTX, pair two (orange/white) is the DTE transmit data pair and pair three (green/white) is the DTE receive data pair. Pairs one (blue/white) and four (brown/white) carry no data and may or may not be wired in a given cable. All end-point gear is defined as DTE (data terminal equipment) and can be considered the senders and receivers of the digital audio traffic. DCE is data communications equipment and can be considered gear that combines, splits or filters (e.g. a firewall) data between DTE elements. DCE are hubs, routers, gateways, bridges and switches that pass data to the audio handling processors (patch panels, consoles, computers, etc.). Imagine a data highway with two lanes, with traffic going in opposite directions (full-duplex, transmit and receive). Even though some digital audio DTEs are dedicated senders or receivers, each still uses its return path to exchange commands and status information with the “master” DTE.
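Here is a small Python sketch of the pinout, assuming T568B termination, the scheme that puts pair two (orange) on pins 1 and 2 as the transmit pair; T568A swaps the orange and green pairs. It also shows why two DTEs cabled directly to each other need a crossover cable (or a port that swaps pairs automatically): one box’s transmit pins must land on the other box’s receive pins. A hub or switch port does that swap internally.

```python
# Assumption: T568B termination (pair two, orange, on pins 1-2).
T568B = {
    1: "White/Orange", 2: "Orange",
    3: "White/Green", 4: "Blue",
    5: "White/Blue", 6: "Green",
    7: "White/Brown", 8: "Brown",
}

# 100BaseTX on a DTE port: transmit on pins 1-2, receive on pins 3-6.
DTE_TX_PINS = (1, 2)
DTE_RX_PINS = (3, 6)

# A crossover cable swaps the two active pairs; the idle pins pass straight.
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2, 4: 4, 5: 5, 7: 7, 8: 8}


def far_end_pins(pins, crossover=False):
    """Map near-end pins to the pins they reach at the far end of the cable."""
    return tuple(CROSSOVER[p] if crossover else p for p in pins)


print([T568B[p] for p in DTE_TX_PINS])            # ['White/Orange', 'Orange']
print(far_end_pins(DTE_TX_PINS))                  # (1, 2): lands on far TX, no good DTE-to-DTE
print(far_end_pins(DTE_TX_PINS, crossover=True))  # (3, 6): lands on far RX
```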

Spokes and Chains

Getting back to CobraNet and EtherSound networks, each has a recommended wiring method. CobraNet’s preferred method is a hub-and-spoke interconnection system in which all DTEs connect to a single piece of gear, called an Ethernet hub, whether they are primarily senders (masters) or receivers (slaves) of digital audio. One DTE element is designated the primary master, or “conductor,” and sends about 750 “beat” frames per second as a means of sequencing up to 64 channels of digital audio down the transmit wires to the hub. The hub takes all the DTE transmissions and repeats them back out as receive data on every DTE’s receive wires. Unlike normal computer Ethernet protocols, CobraNet sets up an isochronous (regularly timed) form of frame traffic, in which the conductor DTE orchestrates the order in which the other DTEs transmit their digital audio data. In typical non-audio Ethernet use, DTEs (PCs) just send frames of data whenever they feel like it (asynchronous), and any frames lost to accidental collisions are retransmitted after a random delay.
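The contrast between the conductor’s beat and ordinary Ethernet timing can be put in numbers. This Python sketch first works out what 750 beats per second means at a 48 kHz sample rate, then models the random retransmission delay of ordinary shared (half-duplex) Ethernet, the truncated binary exponential backoff of IEEE 802.3, using a 512-bit slot time of 5.12 microseconds at 100 Mbit/s. It illustrates the timing only; it is not CobraNet’s actual firmware.

```python
import random

SAMPLE_RATE = 48_000     # Hz
BEATS_PER_SECOND = 750   # conductor "beat" rate quoted in the article

beat_period_ms = 1000 / BEATS_PER_SECOND            # about 1.333 ms
samples_per_beat = SAMPLE_RATE // BEATS_PER_SECOND  # 64 sample periods per beat

# Ordinary half-duplex Ethernet: after the n-th collision a station waits a
# random whole number of slot times in 0 .. 2**min(n, 10) - 1.
SLOT_TIME_US = 5.12  # 512 bit times at 100 Mbit/s


def backoff_us(collisions: int, rng: random.Random) -> float:
    """Random wait after `collisions` collisions (truncated binary exponential backoff)."""
    if collisions < 1:
        raise ValueError("backoff only applies after a collision")
    k = min(collisions, 10)
    return rng.randint(0, 2**k - 1) * SLOT_TIME_US


rng = random.Random(1)
print(f"beat every {beat_period_ms:.3f} ms = {samples_per_beat} sample periods")
print([round(backoff_us(n, rng), 2) for n in range(1, 6)])
```

Each beat arrives on a fixed schedule spanning 64 sample periods, while the backoff delay is a dice roll whose range doubles with every collision, exactly the kind of unpredictability a live audio stream cannot tolerate.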

EtherSound’s preferred method chains DTEs along a one-way path, much like you would do with a MIDI system. EtherSound DTEs usually have two or more RJ-45 connections, with receive (in) and transmit (out) jacks marked on each DTE element. DCE elements like hubs can still be used, but only if all the DTE transmitters are chained upstream on one spoke and the hub multicasts to multiple receiving DTE elements (like main and monitor digital consoles). The advantage of EtherSound is that chaining removes the hub, and the 64 channels of 24-bit, 48k-sample digital audio flow in only one direction, from senders to receivers. Digigram is the parent of the EtherSound network protocols, and it also manufactures single-rackspace DTE patch interfaces such as the ES8in for eight line-level inputs on XLR jacks, the ES8out for eight analog line outputs on XLRs, and the ES8mic for eight mic-level XLR inputs. Thus, one could design sub-snake rack cases with multiple ES8mic and ES8in boxes chained back to the monitor world console, then CAT5’d up to the FOH console. Each console could then send CAT5 cables back to side-stage ES8out boxes for main and monitor mixes, or directly to EtherSound-capable speaker processors or power amplifiers.
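The one-way chain can be pictured as a 64-slot frame that travels downstream, with each box filling its own slots and passing the rest along untouched. The Python below is a toy model of the sub-snake example under that assumption; the `InputBox` class and the slot assignments are hypothetical and do not reproduce Digigram’s actual EtherSound frame format.

```python
from dataclasses import dataclass, field

FRAME_CHANNELS = 64  # channels flowing downstream, per the article


@dataclass
class InputBox:
    """Hypothetical 8-input box (think ES8mic or ES8in) on the chain."""
    name: str
    first_slot: int
    samples: list[int] = field(default_factory=lambda: [0] * 8)

    def process(self, frame: list) -> list:
        # Write our 8 channels into our own slots; pass everything else along.
        if self.first_slot < 0 or self.first_slot + 8 > FRAME_CHANNELS:
            raise ValueError(f"{self.name}: no free slots within {FRAME_CHANNELS} channels")
        out = list(frame)
        out[self.first_slot:self.first_slot + 8] = self.samples
        return out


def run_chain(boxes: list) -> list:
    """Pass one sample period's frame down the one-way chain."""
    frame = [None] * FRAME_CHANNELS  # None marks an unused slot
    for box in boxes:                # upstream to downstream, never back
        frame = box.process(frame)
    return frame  # what the console at the end of the chain receives


# Three hypothetical sub-snake boxes, each sending a constant test value.
boxes = [InputBox(f"sub-snake {i + 1}", first_slot=8 * i, samples=[i] * 8)
         for i in range(3)]
frame = run_chain(boxes)
print(frame[:24])  # slots 0-23 filled by the three boxes
print(frame[24])   # None: slot still free
```

With eight-channel boxes, the 64 channels are used up at eight boxes, so in this model a ninth box has no slots left to fill.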

Less of the Cable Mess

While both CobraNet and EtherSound setups will have at least two networks (digital audio sources to consoles, and digital audio mixes from consoles), both share the isochronous traffic requirement to minimize audio latency, and they differ slightly in their control methods at the datalink and network layers. CobraNet may look more difficult with its hub-and-spoke setup, but if the sender and receiver DTEs are just two boxes, no hub is necessary, and all 64 channels are available between the two DTEs. An example would be a single snake box serving as the sender DTE and a single digital console serving as the receiver DTE and conductor, also doing both main and monitor mixes to a return CobraNet. Do not be afraid of buying off-the-shelf Ethernet hubs from the computer store for CobraNet, as their reliability is proven every day by cubicle PC rats like us. Figures 2 and 3 show typical CobraNet and EtherSound networks.

Contrasts and Conclusions

EtherSound uses the chain method described above to move 64 channels of 24-bit, 48k-sample digital audio to where they are needed. Companies like InnovaSON, Fostex, Digigram and others have adopted EtherSound as their system of choice. CobraNet uses a hub-and-spoke method for 64 channels of 48k-sample digital audio with selectable resolutions of 16, 20 or 24 bits. Companies like the Harman Group (JBL, Crown, BSS, Soundcraft, etc.), QSC, Peavey and others have adopted CobraNet as their system of choice. EtherSound has a fixed latency of six samples (125 microseconds at 48 kHz), and CobraNet has a variable latency of up to 256 samples (about 5.3 milliseconds).
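The latency figures and channel counts are easy to translate into more familiar units. This Python sketch assumes a 48 kHz sample rate and counts only the raw audio payload, ignoring frame and packet overhead.

```python
SAMPLE_RATE = 48_000  # Hz
CHANNELS = 64


def samples_to_ms(samples: int, rate: int = SAMPLE_RATE) -> float:
    """Convert a latency in samples to milliseconds."""
    return 1000 * samples / rate


def payload_mbps(bits: int, channels: int = CHANNELS, rate: int = SAMPLE_RATE) -> float:
    """Raw audio payload in Mbit/s, ignoring frame and packet overhead."""
    return channels * bits * rate / 1e6


print(f"EtherSound, 6 samples:  {samples_to_ms(6):.3f} ms")    # 0.125 ms
print(f"CobraNet, 256 samples:  {samples_to_ms(256):.3f} ms")  # 5.333 ms
for bits in (16, 20, 24):  # CobraNet's selectable resolutions
    print(f"{bits}-bit x {CHANNELS} ch: {payload_mbps(bits):.3f} Mbit/s of 100")
```

Even at 24 bits, the raw payload of 64 channels (about 73.7 Mbit/s) fits within the 100 Mbit/s link, leaving the remainder for headers and control traffic.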

The main thing to remember about live audio over Ethernet is that at 100 megabits per second each bit takes 10 nanoseconds to transmit, so each audio sample (usually three bytes) moves in 0.24 microseconds, excluding the frame and packet overheads. Moving 64 samples brings the math up to about 15 microseconds; even against a six-sample latency of 125 microseconds at 48 kHz, that is only about 12%, leaving roughly 90% of the latency budget for overhead and DTE asynchronous messaging. With latencies from a fraction of a millisecond to a few milliseconds, no human should be able to detect any delay in the performance due to the networks.
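The back-of-the-envelope arithmetic above can be checked in a few lines of Python. Like the text, it ignores frame and packet overhead; the last ratio is an extra comparison, showing how much of each 48 kHz sample period the 64-channel payload occupies on the wire.

```python
LINK_BPS = 100e6                  # 100BaseTX line rate, bits per second
BIT_TIME_NS = 1e9 / LINK_BPS      # 10 ns per bit
SAMPLE_BITS = 24                  # "usually three bytes"
CHANNELS = 64
SAMPLE_RATE = 48_000              # Hz

# Wire time for one sample, then for one sample on each of 64 channels.
sample_time_us = SAMPLE_BITS * BIT_TIME_NS / 1000   # 0.24 us
all_channels_us = CHANNELS * sample_time_us          # 15.36 us

# Share of a six-sample latency budget taken by that payload.
latency_us = 6 / SAMPLE_RATE * 1e6                   # 125 us
share_of_latency = all_channels_us / latency_us      # about 12%

# Extra comparison: the same payload recurs every sample period.
sample_period_us = 1e6 / SAMPLE_RATE                 # about 20.83 us
link_utilization = all_channels_us / sample_period_us  # about 74%

print(f"{sample_time_us:.2f} us per sample, {all_channels_us:.2f} us for {CHANNELS}")
print(f"{share_of_latency:.0%} of a {latency_us:.0f} us latency budget")
print(f"{link_utilization:.0%} of each {sample_period_us:.2f} us sample period")
```

The second ratio is a reminder that while the latency budget is comfortable, the raw payload alone uses roughly three quarters of a 100 Mbit/s link, which likely helps explain why both schemes top out at 64 channels.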

Lastly, I want to extend kudos to Neutrik for coming out with a series of rugged RJ-45 connectors with XLR-like features for Ethernet cabling. Now all we need is some tour-grade CAT5 UTP for 100BaseTX transmissions. Ethernet connections are also possible over fiber-optic cable (100BaseFX), which, using media converters, extends transmission distances from 100 meters to two kilometers.

The above article was published by Front of House (FOH) Magazine.
[ www.FOHonline.com ]