Unix Notes

Unix evolution

The Evolution of the Unix Time-Sharing System, by Dennis M. Ritchie

Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in Unix, a forked process continues to run the same program as its parent until it performs an explicit exec. The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system [2], which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else. The system already handled multiple (i.e. two) processes; there was a process table, and the processes were swapped between main memory and the disk. The initial implementation of fork required only

Expansion of the process table

Addition of a fork call that copied the current process to the disk swap area, using the already existing swap IO primitives, and made some adjustments to the process table.

In fact, the PDP-7’s fork call required precisely 27 lines of assembly code.

Also check out the evolution of a prior smes/rmes messaging feature into the current exit and wait.

Advice from Doug McIlroy — the source of pipes!

Literate programming and Unix tools

Programming pearls: Literate programming, by Jon Bentley with Don Knuth, CACM, May 1986

When was the last time you spent a pleasant evening in a comfortable chair, reading a good program? …

… I wrote Knuth a letter, asking whether he had any spare programs handy that I might publish.… He responded, “Why should you let me choose the program? My claim is that programming is an artistic endeavor and that the WEB system gives me the best way to write beautiful programs. Therefore I should be able to meet a stiffer test: I should be able to write a superliterate program that will be noticeably better than an ordinary one, whatever the topic. So how about this: you tell me what sort of program you want me to write, and I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose—at least the best by current standards.” …

I chose a problem that I had assigned to several classes on data structures.

Given a text file and an integer K, you are to print the K most common words in the file (and the number of their occurrences) in decreasing frequency.

I left open a number of issues, such as the precise definition of words and limits on sizes such as maximum number of words. I did impose an efficiency constraint: a user should be able to find the 100 most frequent words in a twenty-page technical paper without undue emotional trauma.

Programming pearls: A literate program, by Jon Bentley with Don Knuth and Doug McIlroy; CACM, June 1986

From Knuth’s program:

Given a word in the buffer, we will want to look for it in a dynamic dictionary of all words that have appeared so far. We expect many words to occur often, so we want a search technique that will find existing words quickly. Furthermore, the dictionary should accommodate words of variable length, and (ideally) it should also facilitate the task of alphabetic ordering.

These constraints suggest a variant of the data structure introduced by Frank M. Liang in his Ph.D. thesis [“Word Hy-phen-a-tion by Com-pu-ter,” Stanford University, 1983]. Liang’s structure, which we may call a hash trie, requires comparatively few operations to find a word that is already present, although it may take somewhat longer to insert a new entry. Some space is sacrificed—we will need two pointers, a count, and another 5-bit field for each character in the dictionary, plus extra space to keep the hash table from becoming congested—but relatively large memories are commonplace nowadays, so the method seems ideal for the present application.

From Doug McIlroy’s review:

I found Don Knuth’s program convincing as a demonstration of WEB and fascinating for its data structure, but I disagree with it on engineering grounds. …

[A] major piece of engineering built for the ages, as Knuth’s program is, should have a large factor of safety. Would it, for example, work on the Bible? [500,000 machine words might be necessary.] Knuth provided for 128K integers … Still, unless the program were run in a multi-megabyte [!] memory, it would likely have to ignore some late-arriving words, and not necessarily the least frequent ones, either: The word “Jesus” doesn’t appear until three-fourths of the way through the Bible.

Very few people can obtain the virtuoso services of Knuth (or afford the equivalent person-weeks of lesser personnel) to attack nonce problems such as Bentley’s from the ground up. But old Unix hands know instinctively how to solve this one in a jiffy. … The following shell script was written on the spot and worked on the first try. It took 30 seconds to handle a 10,000-word file on a VAC-11/750.
(1)   tr -cs A-Za-z '
      ' |
(2)   tr A-Z a-z |
(3)   sort |
(4)   uniq -c |
(5)   sort -rn |
(6)   sed ${1}q                     [or “head -n $1”]
… The utilities employed in this trivial solution are Unix staples. They grew up over the years as people noticed useful steps that tended to recur in real problems. Every one was written first for a particular need, but untangled from the specific application. …

Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial-strength Faberge egg—intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start.