Why we should care about DUALabs
J.B. Nicholson-Owens
April, 2004
Minor updates on June 6, 2004, and April 4, 2005
What is DUALabs?
The United States government hired a private programming firm called DUALabs to create a data compression program. This program would be used to distribute the 1960 and 1970 censuses in compressed form on fewer tapes.
What happened with these censuses?
US census data for 1960 and 1970 was distributed only in the proprietary DUALabs compression format. Like any programmers, DUALabs wrote their compression software to run on the computers of the day, which meant writing it in IBM assembler language for the IBM 370/135. With effort, DUALabs' source code could be translated into something that runs on today's computers and operating systems. But we don't have DUALabs' source code to work with.
Data archivists of the 1960s and 1970s decompressed some of the census data and managed to keep decompressed copies of portions of the 1960 and 1970 censuses, but apparently nobody decompressed either census in its entirety. For many years now, data archivists who want to do research with this census data have had to cobble together incomplete copies from previous decompression runs and, if that is insufficient, reverse-engineer the format to decompress the rest. To date, nobody has a complete decompressed copy of the 1960 and 1970 censuses, even though the DUALabs-compressed copies are available.
All of this happened because the federal government didn't have the foresight to require that the contractors at DUALabs write free software: the government didn't require DUALabs to supply complete source code for the software along with the executable copies of the programs. Contractors are amenable to writing software that will be released as free software because they are paid for the work of writing the software, not for distributing copies of the program for a fee.
What became of DUALabs?
DUALabs went out of business in the 1970s, and I know of nobody who has the source code to the compression software they wrote. I have found a textual reference to a Perl program that claims to convert the DUALabs format into ASCII, but I have not been able to find a copy of the program.
What can we learn from the DUALabs example?
Today many people encourage the use of non-free software and non-free data formats. They encourage you to adopt Monkey's Audio, Apple's lossless codec, or Shorten for lossless audio compression. They should encourage the use of FLAC instead, even if other audio compressors are more popular or have some attractive technical feature.
Even the Internet Archive accepts and distributes data in non-free compression formats. The Internet Archive does not go out of its way to recommend the exclusive use of free software programs and free software data formats. Nowhere does it warn patrons or contributors that huge chunks of the archive can become utterly useless because of the format of the data. Future patrons of the archive will discover that software proprietors are no longer around to distribute updated versions of these programs for future computers and future operating systems.
It is foolish to choose to become dependent upon software proprietors. The best way to ensure that you don't make the same mistake the US government did is to use exclusively free software on your computer and to use only free software data formats. Then, even if your hardware becomes obsolete, the source code is still available to you, and you can hire a programmer to write a new program to read the archived data.