From pnr at planet.nl  Wed May  8 06:59:17 2024
From: pnr at planet.nl (Paul Ruizendaal)
Date: Tue, 7 May 2024 21:59:17 +0100 (GMT+01:00)
Subject: [TUHS] On the uniqueness of DMR's C compiler
Message-ID: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>

In the last months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip

After creating a basic tool chain (edit, asm, link and a simple executive), John  set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Birch-Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.

This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)

As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walkers work may be case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).

There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html

It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.

As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less similar techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor quality native code that evaluated expressions using stack operations rather than registers.

I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.

Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240507/a97ea198/attachment.htm>

From robpike at gmail.com  Wed May  8 08:07:44 2024
From: robpike at gmail.com (Rob Pike)
Date: Wed, 8 May 2024 08:07:44 +1000
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
Message-ID: <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>

I'm not sure I accept your starting position. There were several compilers
for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code,
but I don't believe they all were. And of course there was BCPL, which was
- and is - tiny; was it on the 11?

And there were other small machines from other manufacturers, all of which
had some form of Fortran and other bespoke things, such as RPG on the small
IBMs. I think the uniqueness was in the set of conditions more than in the
Unix C compiler itself.

But you may be right.

-rob


On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal <pnr at planet.nl> wrote:

> In the last months, I've spent a little time on curating John Walker's
> Unix clone and software stack, including an emulator to run it:
> https://gitlab.com/marinchip
>
> After creating a basic tool chain (edit, asm, link and a simple
> executive), John  set out to find a compiler. Among the first programs were
> a port of the META 3 compiler-generator (similar to TMG on early Unix) and
> a port of Birch-Hansen’s Pascal compiler. META was used to create a
> compiler that generated threaded code. He found neither compiler good
> enough for his goals and settled on writing his Unix-like OS in assembler.
> As the 9900 architecture withered after 1980, this sealed the fate of this
> OS early on -- had he found a good compiler, the code might have competed
> alongside Coherent, Idris, and Minix during the 80’s.
>
> This made me realise once more how unique the Ritchie C compiler was. In
> my view its uniqueness combines three aspects:
> 1. The C language itself
> 2. The ability to run natively on small hardware (even an LSI-11 system)
> 3. Generating code with modest overhead versus handwritten assembler (say
> 30%)
>
> As has been observed before, working at a higher abstraction level makes
> it easier to work on algorithms and on refactoring, often earning back the
> efficiency loss. John Walkers work may be case in point: I estimate that
> his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as
> compiled for the 9900 architecture).
>
> There are three papers on DMR’s website about the history of the compiler
> and a compare-and-contrast with other compilers of the era:
> https://www.bell-labs.com/usr/dmr/www/primevalC.html
> https://www.bell-labs.com/usr/dmr/www/chist.html
> https://www.bell-labs.com/usr/dmr/www/hopl.html
>
> It seems to me that these papers rather understate the importance of
> generating good quality code. As far as I can tell, BCPL and BLISS came
> close, but were too large to run on a PDP-11 and only existed as
> cross-compilers. PL/M was a cross-compiler and generated poorer code.
> Pascal on small machines compiled to a virtual machine. As far as I can
> tell, during most of the 70s there was no other compiler that generated
> good quality code and ran natively on a small (i.e. PDP-11 class) machine.
>
> As far as I can tell the uniqueness was mostly in the “c1” phase of the
> compiler. The front-end code of the “c0” phase seems to use more or less
> similar techniques as many contemporary compilers. The “c1” phase seems to
> have been unique in that it managed to do register allocation and
> instruction selection with a pattern matcher and associated code tables
> squeezed into a small address space. On a small machine, other native
> compilers of the era typically proceeded to generate threaded code, code
> for a virtual machine or poor quality native code that evaluated
> expressions using stack operations rather than registers.
>
> I am not sure why DMR's approach was not more widely used in the 1970’s.
> The algorithms he used do not seem to be new and appear to have their roots
> in other (larger) compilers of the 1960’s. The basic design seems to have
> been in place from the very first iterations of his compiler in 1972 (see
> V2 tree on TUHS) and he does not mention these algorithms as being special
> or innovative in his later papers.
>
> Any observations / opinions on why DMR’s approach was not more widely used
> in the 1970’s?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/8eea515e/attachment.htm>

From pnr at planet.nl  Wed May  8 19:35:21 2024
From: pnr at planet.nl (Paul Ruizendaal)
Date: Wed, 8 May 2024 10:35:21 +0100 (GMT+01:00)
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
Message-ID: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>

Thanks for pointing that out. Here's an interesting paper on the DEC PDP11 Fortran compilers:
http://forum.6502.org/download/file.php?id=1724&sid=f6a721f3e05774cff076da72f5a731a6

Before 1975 they used direct threading, thereafter there was a native compiler for the higher-end models. I think this one may have required split i/d, but that is not entirely clear from the text.

I think the same holds for BCPL on the PDP11: compiling to "ocode" or "intcode" in the early 70s, native thereafter -- still have to find source for the latter.

Still, I should have first asked: Does anyone have pointers to small machine native compilers from the 1970's that produced efficient assembler output?

I am already aware of the 1978 Whitesmith C compiler.

7 May 2024 23:07:58 Rob Pike <robpike at gmail.com>:

> I'm not sure I accept your starting position. There were several compilers for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code, but I don't believe they all were. And of course there was BCPL, which was - and is - tiny; was it on the 11?
> 
> And there were other small machines from other manufacturers, all of which had some form of Fortran and other bespoke things, such as RPG on the small IBMs. I think the uniqueness was in the set of conditions more than in the Unix C compiler itself.
> 
> But you may be right.
> 
> -rob
> 
> 
> 
> 
> On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal <pnr at planet.nl> wrote:
>> In the last months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it:
>> https://gitlab.com/marinchip
>> 
>> After creating a basic tool chain (edit, asm, link and a simple executive), John  set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Birch-Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.
>> 
>> This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
>> 1. The C language itself
>> 2. The ability to run natively on small hardware (even an LSI-11 system)
>> 3. Generating code with modest overhead versus handwritten assembler (say 30%)
>> 
>> As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walkers work may be case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).
>> 
>> There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
>> https://www.bell-labs.com/usr/dmr/www/primevalC.html
>> https://www.bell-labs.com/usr/dmr/www/chist.html
>> https://www.bell-labs.com/usr/dmr/www/hopl.html
>> 
>> It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.
>> 
>> As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less similar techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor quality native code that evaluated expressions using stack operations rather than registers.
>> 
>> I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.
>> 
>> Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/b39499c2/attachment-0001.htm>

From e5655f30a07f at ewoof.net  Wed May  8 21:09:29 2024
From: e5655f30a07f at ewoof.net (Michael =?utf-8?B?S2rDtnJsaW5n?=)
Date: Wed, 8 May 2024 11:09:29 +0000
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
Message-ID: <33ca504d-5167-4796-a277-b9d2865b7fb1@home.arpa>

On 7 May 2024 21:59 +0100, from pnr at planet.nl (Paul Ruizendaal):
> It seems to me that these papers rather understate the importance of
> generating good quality code. As far as I can tell, BCPL and BLISS
> came close, but were too large to run on a PDP-11 and only existed
> as cross-compilers.

https://www.softwarepreservation.org/projects/BCPL/index.html#York
appears to indicate that by 1974 there existed a native PDP-11 (/40 or
/45) BCPL compiler which ran under RSX-11.

--
Michael Kjörling                     🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

From robpike at gmail.com  Wed May  8 23:12:28 2024
From: robpike at gmail.com (Rob Pike)
Date: Wed, 8 May 2024 23:12:28 +1000
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
Message-ID: <CAKzdPgwwEXqcvFwRHyA=d-Rvo0UoNwv8JOKJanQbN2XZ+H56AQ@mail.gmail.com>

I believe Ken Thompson might have been a referee for that paper. At least,
he once mentioned to me that he had reviewed a paper about the threading in
the DEC Fortran compiler.

-rob


On Wed, May 8, 2024 at 7:35 PM Paul Ruizendaal <pnr at planet.nl> wrote:

> Thanks for pointing that out. Here's an interesting paper on the DEC PDP11
> Fortran compilers:
>
> http://forum.6502.org/download/file.php?id=1724&sid=f6a721f3e05774cff076da72f5a731a6
>
> Before 1975 they used direct threading, thereafter there was a native
> compiler for the higher-end models. I think this one may have required
> split i/d, but that is not entirely clear from the text.
>
> I think the same holds for BCPL on the PDP11: compiling to "ocode" or
> "intcode" in the early 70s, native thereafter -- still have to find source
> for the latter.
>
> Still, I should have first asked: Does anyone have pointers to small
> machine native compilers from the 1970's that produced efficient assembler
> output?
>
> I am already aware of the 1978 Whitesmith C compiler.
>
> 7 May 2024 23:07:58 Rob Pike <robpike at gmail.com>:
>
> I'm not sure I accept your starting position. There were several compilers
> for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code,
> but I don't believe they all were. And of course there was BCPL, which was
> - and is - tiny; was it on the 11?
>
> And there were other small machines from other manufacturers, all of which
> had some form of Fortran and other bespoke things, such as RPG on the small
> IBMs. I think the uniqueness was in the set of conditions more than in the
> Unix C compiler itself.
>
> But you may be right.
>
> -rob
>
>
>
>
> On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal <pnr at planet.nl> wrote:
>
>> In the last months, I've spent a little time on curating John Walker's
>> Unix clone and software stack, including an emulator to run it:
>> https://gitlab.com/marinchip
>>
>> After creating a basic tool chain (edit, asm, link and a simple
>> executive), John  set out to find a compiler. Among the first programs were
>> a port of the META 3 compiler-generator (similar to TMG on early Unix) and
>> a port of Birch-Hansen’s Pascal compiler. META was used to create a
>> compiler that generated threaded code. He found neither compiler good
>> enough for his goals and settled on writing his Unix-like OS in assembler.
>> As the 9900 architecture withered after 1980, this sealed the fate of this
>> OS early on -- had he found a good compiler, the code might have competed
>> alongside Coherent, Idris, and Minix during the 80’s.
>>
>> This made me realise once more how unique the Ritchie C compiler was. In
>> my view its uniqueness combines three aspects:
>> 1. The C language itself
>> 2. The ability to run natively on small hardware (even an LSI-11 system)
>> 3. Generating code with modest overhead versus handwritten assembler (say
>> 30%)
>>
>> As has been observed before, working at a higher abstraction level makes
>> it easier to work on algorithms and on refactoring, often earning back the
>> efficiency loss. John Walkers work may be case in point: I estimate that
>> his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as
>> compiled for the 9900 architecture).
>>
>> There are three papers on DMR’s website about the history of the compiler
>> and a compare-and-contrast with other compilers of the era:
>> https://www.bell-labs.com/usr/dmr/www/primevalC.html
>> https://www.bell-labs.com/usr/dmr/www/chist.html
>> https://www.bell-labs.com/usr/dmr/www/hopl.html
>>
>> It seems to me that these papers rather understate the importance of
>> generating good quality code. As far as I can tell, BCPL and BLISS came
>> close, but were too large to run on a PDP-11 and only existed as
>> cross-compilers. PL/M was a cross-compiler and generated poorer code.
>> Pascal on small machines compiled to a virtual machine. As far as I can
>> tell, during most of the 70s there was no other compiler that generated
>> good quality code and ran natively on a small (i.e. PDP-11 class) machine.
>>
>> As far as I can tell the uniqueness was mostly in the “c1” phase of the
>> compiler. The front-end code of the “c0” phase seems to use more or less
>> similar techniques as many contemporary compilers. The “c1” phase seems to
>> have been unique in that it managed to do register allocation and
>> instruction selection with a pattern matcher and associated code tables
>> squeezed into a small address space. On a small machine, other native
>> compilers of the era typically proceeded to generate threaded code, code
>> for a virtual machine or poor quality native code that evaluated
>> expressions using stack operations rather than registers.
>>
>> I am not sure why DMR's approach was not more widely used in the 1970’s.
>> The algorithms he used do not seem to be new and appear to have their roots
>> in other (larger) compilers of the 1960’s. The basic design seems to have
>> been in place from the very first iterations of his compiler in 1972 (see
>> V2 tree on TUHS) and he does not mention these algorithms as being special
>> or innovative in his later papers.
>>
>> Any observations / opinions on why DMR’s approach was not more widely
>> used in the 1970’s?
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/968d554f/attachment.htm>

From clemc at ccc.com  Thu May  9 01:51:11 2024
From: clemc at ccc.com (Clem Cole)
Date: Wed, 8 May 2024 11:51:11 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
Message-ID: <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>

I agree with Rob. I fear the OP might have more limited experience with
what was available at the time and how it was developed. The following is
undoubtedly incomplete. It is what I could remember quickly to answer the
question of real compilers for the PDP-11.

As others have pointed out, the original DEC PDP-11 FTN, like the original
PDP-6 and PDP-8, was based on threaded DEC F4 technology.  After
introducing the PDP-10, the 36-bit compiler team at DEC started a project
to rewrite FORTRAN (in BLISS) as a true compiler.  As was reminded at lunch
last week (I still eat weekly with many of the DEC TLG folks), DEC had two
groups -- a development team and a support team.    I think some of the
external confusion comes from both teams releasing products to the world,
and the outside world did not always understand the differences. So, when I
say the "compiler" group, I generally refer to the former - although many
people started in the latter and eventually became part of the former. They
key point here is that F4 (which was from the support folks), lived for a
while in parallel with stuff coming from what eventually would become TLG
[Technical Languages (and tools) Group].

The primary DEC-supported technical languages were all written in BLISS-11
and cross-compiled from the PDP-10 (originally).  However, they could run
in 11/40 class (shared I/D) space machines. Remember, DEC operating systems
could do overlays - although there were probably some differences with what
could be generated [I'd need to pull the old RT11 manuals for each]. Yes,
FORTRAN was the primary technical language, but DEC's TLG supported other
languages for the PDP-11 from COBOL to BASIC, and 3rd parties filled out
the available suite.

Probably the #1 3rd party, PDP-11 compiler, is (was) the OMSI Pascal
compiler (which generated direct PDP-11 code) for all classes of PDP-11s
[the OP referred to the Pascal that generated P4 code and ran interpreter
for same.  The UCSD Pascal worked this way, but I never saw anything other
than students use it for teaching, while the OMSI compiler was a force for
PDP-11 programmers, and you saw it in many PDP-11 shops - including some I
worked].  I'm pretty sure the RT11 and  RSX11 versions of this can be
easily found in the wild, but I have not looked for the UNIX version (note
that there was one).

Note - from a SW marketplace for PDP-11s, the money was on the DEC
operating systems, not UNIX.  So, there was little incentive to move those
tools, which I think is why the OP may not have experienced them.  Another
important political thing to consider is that TLG did their development on
PDP-10s and later Vaxen inside DEC.   Since everything was written in BLISS
and DEC marketing 100% missed/sunk that boat, the concept of self-hosting
the compiler was not taken seriously (ISTR: there was a project to make it
self-host on RSX, but it was abandoned since customers were not beating
DEC's door down for BLISS on many PDP-11 systems).

Besides DMR's compiler for the PDP-11.  Steve Johnson developed PCC and
later PCC2.  Both ran on all flavors of PDP-11s, although I believe since
the lack of support for overlays in the research UNIX editions limited the
compilers and ISTR, there were both 11/40 and 11/45 class binaries with
different-sized tables.

On our Unix boxes, we also had a PDP-11 Pascal compiler from Free
University in Europe (VU) - I don't remember much about it nor can I find
it in a quick search of my currently online stuff. ISTR That one may have
been 11/45 class - we had it on the TekLabs 11/70 and I don't remember
having in on any of our 40-class systems.

The Whitesmith's C has been mentioned.  That compiler ran on all the PDP-11
UNIXs of the day, plus its native Idris, as well as the DEC OSs.  It did
not use an interpreter per se, but instead compiled to something Plauger
called 'ANAT" - a natural assembler.  He then ran an optimizer over this
output and his final pass converted from ANAT code to the PDP-11 (or Z80 as
it turns out).  I argue that ANAT was what we now think of in modern
compilers as the IL, but others might argue differently. We ran it on our
RT-11 systems, although ISTR came with the UNIX version, so we had it on
the 11/70, too. That may have been because we used it to cross-compile for
the Z80.

Tannabaum and the team have the Amsterdam compiler toolkit. This had front
ends for C and Pascal and could generate code for PDP-11s and several other
microprocessors. I do not know how widely it was used for the PDP11s.

Per Brinch, Hansen also implemented Parallel Pascal and his own OS for the
40-class PDP-11s. He talks about this in his book Pascal on Small Systems.

Holt and team wrote Concurrent Euclid and TUNIS for the 40-class machines.

Wirth released a Modula for the 11, although we mostly ran it on the 68000s
and a Lilith system.

IIRC, Mike Malcom and the team built a true B compiler so they could
develop Thoth.   As the 11/40 was one of the original Thoth target
systems,  I would have expected that to exist, but I have never used it.

As was mentioned before, there was  BCPL for the PDP-11.  I believe that a
BCPL compiler can even be found on one of the  USENIX tapes in the TUHS
archives, but I have not looked.

Finally, ISTR, in the mid-late 1970s one of the Universities in Europe
(??Edinburgh, maybe??), developed and released an Algol flavor for the
PDP-11, but I never used it.   Again, you might want to check the TUHS
archives.  In my own case, while I had used Algol on the PDP-8s and 10s,
plus the IBM systems, and by then Pascal had become the hot alternative
language and was close enough I never had a desire/need for it.   Plus
since there were a number of Pascal implementations available for 11s and
no one in Teklabs was asking for it, I never chased it down.

To quote Tom Lehrer .. "*These are the only ones that the news has come to
Huvrd. There may be many others ..*."

Clem


ᐧ
ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/618ef320/attachment.htm>

From nobozo at gmail.com  Thu May  9 02:07:47 2024
From: nobozo at gmail.com (Jon Forrest)
Date: Wed, 8 May 2024 09:07:47 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
Message-ID: <fb33c2d0-bd7f-4730-9b4d-1313fd16dff1@gmail.com>

There was also a Modula2 compiler for the PDP-11 from a university in the UK,
propably York. It was used to some degree at Ford Aerospace for the
KSOS secure Unix project. I think it required separate I&D.

Jon

From ats at offog.org  Thu May  9 03:05:51 2024
From: ats at offog.org (Adam Sampson)
Date: Wed, 08 May 2024 18:05:51 +0100
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 (Clem Cole's message of "Wed, 8 May 2024 11:51:11 -0400")
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
Message-ID: <y2acypw9qts.fsf@offog.org>

Clem Cole <clemc at ccc.com> writes:

> Finally, ISTR, in the mid-late 1970s one of the Universities in Europe
> (??Edinburgh, maybe??), developed and released an Algol flavor for the
> PDP-11, but I never used it.

That sounds like Edinburgh's IMP, which eventually had backends for a
very wide variety of platforms. Several versions are available here:
  https://history.dcs.ed.ac.uk/archive/languages/

-- 
Adam Sampson <ats at offog.org>                         <http://offog.org/>

From aek at bitsavers.org  Thu May  9 03:45:47 2024
From: aek at bitsavers.org (Al Kossow)
Date: Wed, 8 May 2024 10:45:47 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
Message-ID: <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org>

On 5/8/24 8:51 AM, Clem Cole wrote:

> IIRC, Mike Malcom and the team built a true B compiler so they could develop Thoth.   As the 11/40 was one of the original Thoth target 
> systems,  I would have expected that to exist, but I have never used it.
>

Thoth has been a white whale for me for decades. AFAIK nothing has survived from it.

"Decus" (Conroy's) C (transliteration of the assembler Unix C) should also be mentioned.


From tom.perrine+tuhs at gmail.com  Thu May  9 03:49:02 2024
From: tom.perrine+tuhs at gmail.com (Tom Perrine)
Date: Wed, 8 May 2024 10:49:02 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <fb33c2d0-bd7f-4730-9b4d-1313fd16dff1@gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 <fb33c2d0-bd7f-4730-9b4d-1313fd16dff1@gmail.com>
Message-ID: <CAJq=PCWR4zmqan=1m20CLDSvxu5G2Wk6pcpNTGeJnOQUDiOqJA@mail.gmail.com>

Hi Jon (and others),

I was part of the KSOS (later KSOS-11 and KSOS-32) team at LOGICON, which
picked up a follow-on contract to use KSOS-11 in a true multi-level-secure
production environment. Our target was SYSTEM_LOW as TOP SECRET.

Yes, we used that compiler for all the KSOS kernel and all the trusted
user-space code.

KSOS-11 only ran on PDP-11/70, and it did use split I&D.

I have access to the KSOS-11 source code, and have been trying to rebuild
that OS, BUT I haven't been able to find that Modula compiler.

KSOS-11 was a very small kernel, but there was a set of libraries that
presented a UNIX system call interface, so it could run some PWB userspace
tools, if they were re-compiled.

I'm using the term KSOS-11, as there was a follow-on project (KSOS-32) that
ported the original PDP KSOS to 11/780. I wrote a completely new (simpler)
scheduler, the bootstrap and memory management layer for that one.

And, for "reasons", the entire KSOS project at Logicon was shut down just a
week or so after the first user login to KSOS-32.

KSOS-11 itself and some multi-level applications did ship to DoD customers,
and it ran MLS applications for the Navy and USAFE.

--tep

ps. Jon was kind enough to remind me that we had corresponded about this in
the past -and- to remind me to send to the list, and not just him :-)


On Wed, May 8, 2024 at 9:08 AM Jon Forrest <nobozo at gmail.com> wrote:

> There was also a Modula2 compiler for the PDP-11 from a university in the
> UK,
> propably York. It was used to some degree at Ford Aerospace for the
> KSOS secure Unix project. I think it required separate I&D.
>
> Jon
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/790ee591/attachment.htm>

From clemc at ccc.com  Thu May  9 04:12:15 2024
From: clemc at ccc.com (Clem Cole)
Date: Wed, 8 May 2024 14:12:15 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org>
Message-ID: <CAC20D2PwdHJhysU022X=Rk3VicqbgmUWH1dF+QxT6BnWvbQUxg@mail.gmail.com>

On Wed, May 8, 2024 at 1:46 PM Al Kossow <aek at bitsavers.org> wrote:

> Thoth has been a white whale for me for decades.

Ditto.  Although, I believe the late John Beety had his 'Thoth Thucks" tee
shirt for years.  I believe Kelly Booth still does.


> AFAIK nothing has survived from it.
>
You can argue that V-Kernel and QNX are children of Thoth - but they were
both in a flavor of Waterloo C that did not think ever targeted the PDP-11
[that might be a misunderstanding WRT Waterloo C].

>
> "Decus" (Conroy's) C (transliteration of the assembler Unix C) should also
> be mentioned.
>
Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an
end-around for the AT&T lawyers by taking the *.s files, and converting
them to MACRO11, and then
redoing the assembler code to use originally RT11 I/O and later RSX11.
That said, it had its own life and ran on the DEC OSses, not UNIX, so it
probably counts.
That said, I thought Paul was asking about different core compiler
implementations, and I would argue the DECUS/Conroy compiler is the DMR
compiler, while the list I offered was all different core implementations.

I'm curious about Jon and Tom's MOD2 compiler.   Other than Wirth's, which
targeted the 68000, Lilith, and VAX, I did not know of another for the
PDP-11.  Any idea of its origin story? I would have expected it to have
derived from Wirth's Modula subsystem.  FWIW:  The DEC Mod-II and Mod-III
were new implementations from DEC WRL or SRC (I forget).  They targeted
Alpha and I, maybe Vax.  I'd have to ask someone like Larry Stewart or Jeff
Mogul who might know/remember, but I thought that the font end to the DEC
MOD2 compiler might have been partly based on Wirths but rewritten and by
the time of the MOD3 FE was a new one originally written using the previous
MOD2 compiler -- but I don't remember that detail.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/b0f6a39a/attachment.htm>

From clemc at ccc.com  Thu May  9 04:12:55 2024
From: clemc at ccc.com (Clem Cole)
Date: Wed, 8 May 2024 14:12:55 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2PwdHJhysU022X=Rk3VicqbgmUWH1dF+QxT6BnWvbQUxg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org>
 <CAC20D2PwdHJhysU022X=Rk3VicqbgmUWH1dF+QxT6BnWvbQUxg@mail.gmail.com>
Message-ID: <CAC20D2PedGJ3RKcyRN6awj0wLr-UBSupbsbtqdHVAFC_yAHy-w@mail.gmail.com>

s/Beety/Beatty/  -- sorry
ᐧ

On Wed, May 8, 2024 at 2:12 PM Clem Cole <clemc at ccc.com> wrote:

>
>
> On Wed, May 8, 2024 at 1:46 PM Al Kossow <aek at bitsavers.org> wrote:
>
>> Thoth has been a white whale for me for decades.
>
> Ditto.  Although, I believe the late John Beety had his 'Thoth Thucks" tee
> shirt for years.  I believe Kelly Booth still does.
>
>
>
>> AFAIK nothing has survived from it.
>>
> You can argue that V-Kernel and QNX are children of Thoth - but they were
> both in a flavor of Waterloo C that did not think ever targeted the PDP-11
> [that might be a misunderstanding WRT Waterloo C].
>
>>
>> "Decus" (Conroy's) C (transliteration of the assembler Unix C) should
>> also be mentioned.
>>
> Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an
> end-around for the AT&T lawyers by taking the *.s files, and converting
> them to MACRO11, and then
> redoing the assembler code to use originally RT11 I/O and later RSX11.
> That said, it had its own life and ran on the DEC OSses, not UNIX, so it
> probably counts.
> That said, I thought Paul was asking about different core compiler
> implementations, and I would argue the DECUS/Conroy compiler is the DMR
> compiler, while the list I offered was all different core implementations.
>
> I'm curious about Jon and Tom's MOD2 compiler.   Other than Wirth's, which
> targeted the 68000, Lilith, and VAX, I did not know of another for the
> PDP-11.  Any idea of its origin story? I would have expected it to have
> derived from Wirth's Modula subsystem.  FWIW:  The DEC Mod-II and Mod-III
> were new implementations from DEC WRL or SRC (I forget).  They targeted
> Alpha and I, maybe Vax.  I'd have to ask someone like Larry Stewart or Jeff
> Mogul who might know/remember, but I thought that the font end to the DEC
> MOD2 compiler might have been partly based on Wirths but rewritten and by
> the time of the MOD3 FE was a new one originally written using the previous
> MOD2 compiler -- but I don't remember that detail.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/81a3ea43/attachment.htm>

From douglas.mcilroy at dartmouth.edu  Thu May  9 04:29:19 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Wed, 8 May 2024 14:29:19 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
Message-ID: <CAKH6PiXy_-+nAm08ciJvZL0mLeXxh3hFHasy-wcJhpQbE=DwiA@mail.gmail.com>

There was nothing unique about the size or the object code of Dennis's C
compiler. In the 1960s, Digitek had a thriving business of making Fortran
compilers for all manner of machines. To optimize space usage, the
compilers' internal memory model comprised  variable-size movable tables,
called "rolls". To exploit this non-native architecture, the compilers
themselves were interpreted, although they generated native code. Bob
McClure tells me he used one on an SDS910 that had 8K 16-bit words.

Dennis was one-up on Digitek in having a self-maintaining compiler. Thus,
when he implemented an optimization, the source would grow, but the
compiler binary might even shrink thanks to self-application.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/49c0ecdc/attachment.htm>

From stewart at serissa.com  Thu May  9 11:27:41 2024
From: stewart at serissa.com (Lawrence Stewart)
Date: Wed, 8 May 2024 21:27:41 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2PwdHJhysU022X=Rk3VicqbgmUWH1dF+QxT6BnWvbQUxg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org>
 <CAC20D2PwdHJhysU022X=Rk3VicqbgmUWH1dF+QxT6BnWvbQUxg@mail.gmail.com>
Message-ID: <24A2FC48-8720-49B3-BF85-E53C9B09B32A@serissa.com>

Regarding the Dec Research Modula-X compilers, I am fairly sure that Modula-2 for the VAX was a WRL (Digital Western Research Lab) thing, because it was used for the WRL CAD tool suite used to design the WRL Titan and the SRC (Systems Research Center) Firefly machines.  SRC did the Modula-2-Plus compiler for the VAX, which added garbage collection. The Firefly OS was Modula, but included an Ultrix system call set so it could run Ultrix binaries.  I may be wrong about this, but I think Wirth then did Modula-3 and then Oberon.  WRL and SRC never had any PDP-11’s as far as I know.
-L


> On May 8, 2024, at 2:12 PM, Clem Cole <clemc at ccc.com> wrote:
> 
> 
> 
> On Wed, May 8, 2024 at 1:46 PM Al Kossow <aek at bitsavers.org <mailto:aek at bitsavers.org>> wrote:
>> Thoth has been a white whale for me for decades.
> Ditto.  Although, I believe the late John Beety had his 'Thoth Thucks" tee shirt for years.  I believe Kelly Booth still does.
> 
>  
>> AFAIK nothing has survived from it.
> You can argue that V-Kernel and QNX are children of Thoth - but they were both in a flavor of Waterloo C that did not think ever targeted the PDP-11 [that might be a misunderstanding WRT Waterloo C]. 
>> 
>> "Decus" (Conroy's) C (transliteration of the assembler Unix C) should also be mentioned.
> Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an end-around for the AT&T lawyers by taking the *.s files, and converting them to MACRO11, and then
> redoing the assembler code to use originally RT11 I/O and later RSX11.  That said, it had its own life and ran on the DEC OSses, not UNIX, so it probably counts.
> That said, I thought Paul was asking about different core compiler implementations, and I would argue the DECUS/Conroy compiler is the DMR compiler, while the list I offered was all different core implementations.
> 
> I'm curious about Jon and Tom's MOD2 compiler.   Other than Wirth's, which targeted the 68000, Lilith, and VAX, I did not know of another for the PDP-11.  Any idea of its origin story? I would have expected it to have derived from Wirth's Modula subsystem.  FWIW:  The DEC Mod-II and Mod-III were new implementations from DEC WRL or SRC (I forget).  They targeted Alpha and I, maybe Vax.  I'd have to ask someone like Larry Stewart or Jeff Mogul who might know/remember, but I thought that the font end to the DEC MOD2 compiler might have been partly based on Wirths but rewritten and by the time of the MOD3 FE was a new one originally written using the previous MOD2 compiler -- but I don't remember that detail.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/3fcae697/attachment-0001.htm>

From paul at mcjones.org  Thu May  9 13:39:32 2024
From: paul at mcjones.org (Paul McJones)
Date: Wed, 8 May 2024 20:39:32 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <171519201646.4052234.694570138790187562@minnie.tuhs.org>
References: <171519201646.4052234.694570138790187562@minnie.tuhs.org>
Message-ID: <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org>

> On Wed, 8 May 2024 14:12:15 -0400,Clem Cole <clemc at ccc.com <mailto:clemc at ccc.com>> wrote:
> 
> FWIW:  The DEC Mod-II and Mod-III
> were new implementations from DEC WRL or SRC (I forget).  They targeted
> Alpha and I, maybe Vax.  I'd have to ask someone like Larry Stewart or Jeff
> Mogul who might know/remember, but I thought that the font end to the DEC
> MOD2 compiler might have been partly based on Wirths but rewritten and by
> the time of the MOD3 FE was a new one originally written using the previous
> MOD2 compiler -- but I don't remember that detail.

Michael Powell at DEC WRL wrote a Modula 2 compiler that generated VAX code. Here’s an extract from announcement.d accompanying a 1992 release of the compiler from gatekeeper.dec.com <http://gatekeeper.dec.com/>:

The compiler was designed and built by Michael L. Powell, and originally
released in 1984.  Joel McCormack sped the compiler up, fixed lots of bugs, and
swiped/wrote a User's Manual.  Len Lattanzi ported the compiler to the MIPS. 

Later, Paul Rovner and others at DEC SRC designed Modula-2+ (a language extension with exceptions, threads, garbage collection, and runtime type dispatch). The Modula-2+ compiler was originally based on Powell’s compiler. Modula-2+ ran on the VAX.

Here’s a DEC SRC research report on Modula-2+:
http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-3.pdf

Modula-3 was designed at DEC SRC and Olivetti Labs. It had a portable implementation (using the GCC back end) and ran on a number of machines including Alpha.


Paul


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/571805ae/attachment.htm>

From imp at bsdimp.com  Thu May  9 13:46:20 2024
From: imp at bsdimp.com (Warner Losh)
Date: Wed, 8 May 2024 21:46:20 -0600
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org>
References: <171519201646.4052234.694570138790187562@minnie.tuhs.org>
 <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org>
Message-ID: <CANCZdforcuQd6u5-KOVWiUaC_g2pEOYBTN+zGH7+TyqdgUPJAg@mail.gmail.com>

On Wed, May 8, 2024, 9:39 PM Paul McJones <paul at mcjones.org> wrote:

> On Wed, 8 May 2024 14:12:15 -0400,Clem Cole <clemc at ccc.com> wrote:
>
>
> FWIW:  The DEC Mod-II and Mod-III
> were new implementations from DEC WRL or SRC (I forget).  They targeted
> Alpha and I, maybe Vax.  I'd have to ask someone like Larry Stewart or Jeff
> Mogul who might know/remember, but I thought that the font end to the DEC
> MOD2 compiler might have been partly based on Wirths but rewritten and by
> the time of the MOD3 FE was a new one originally written using the previous
> MOD2 compiler -- but I don't remember that detail.
>
>
> Michael Powell at DEC WRL wrote a Modula 2 compiler that generated VAX
> code. Here’s an extract from announcement.d accompanying a 1992 release of
> the compiler from gatekeeper.dec.com:
>
> The compiler was designed and built by Michael L. Powell, and originally
> released in 1984.  Joel McCormack sped the compiler up, fixed lots of
> bugs, and
> swiped/wrote a User's Manual.  Len Lattanzi ported the compiler to the
> MIPS.
>
>
> Later, Paul Rovner and others at DEC SRC designed Modula-2+ (a language
> extension with exceptions, threads, garbage collection, and runtime type
> dispatch). The Modula-2+ compiler was originally based on Powell’s
> compiler. Modula-2+ ran on the VAX.
>
> Here’s a DEC SRC research report on Modula-2+:
> http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-3.pdf
>
> Modula-3 was designed at DEC SRC and Olivetti Labs. It had a portable
> implementation (using the GCC back end) and ran on a number of machines
> including Alpha.
>

FreeBSD's cvsup was written using it. The threading made it possible to
make maximum use of the 56k modems of the time and speed downloads of the
source changes. The port for modula-3 changed a number of time from gcc to
egcs back to gcc before running out of steam...

Warner
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240508/eb324bf9/attachment.htm>

From tuhs at tuhs.org  Fri May 10 06:40:28 2024
From: tuhs at tuhs.org (Paul Ruizendaal via TUHS)
Date: Thu, 9 May 2024 22:40:28 +0200
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
Message-ID: <C7223B04-B299-475E-9EB2-03FA59CF0994@planet.nl>

Thanks everybody for the feedback and pointers, much appreciated!

The main point is clear: the premise that the DMR C compiler had unique (native, small machine) code generation during most of the 70’s does not hold up.

Clean Cole is correct in observing that (certainly for the 70’s) I’m skewed to stuff from academia with a blind spot for the commercial compilers of that era.

Doug McIlroy’s remarks on Digitek were most helpful and I’ll expand a bit on that below.

I was aware of the Digitek / Ryan-Macfarland compilers before, but in my mind they compiled to a virtual machine (mis-understanding a description of “programmed operators” and because their compilers for microcomputers did so in the 80’s). Digging into this more led me to a 1970 report "Programming Languages and their Compilers, Preliminary Notes” by John Cocke and J.T. Schwartz:
https://www.softwarepreservation.org/projects/FORTRAN/paper/Bright-FORTRANComesToWestinghouseBettis-1971.pdf

It is a nearly 800 page review of then current languages and compilers and it includes some discussion of the Digitek compilers as the state of the art for small machines and has some further description of how they worked (pp. 233-237, 749). It also mentions their PL/1 for Multics fiasco (for background https://www.multicians.org/pl1.html).

- The Digitek compilers were indeed small enough to run on PDP-11 class machines and even smaller, and they produced quite reasonable native code. In this sense, they were in the same spot as the DMR C compiler which was hence not unique in this regard -- as Doug points out.

- They consisted of two parts: a front end coded in “Programmed Operators" (POPS) generating an intermediate language, and a custom coded back-end that converted the IL to native code.

- POPS were in effect a VM for compiler construction (although expressed as assembler operations). To move a compiler to a new machine only the POPS VM had to be recoded, which was a very manageable job. From the description in the above book it sounds very similar to the META 3 compiler generator setup, but expressed in a different form.

- Unfortunately, I have not been able to find a description of the POPS IL.

- The smaller Digitek compilers had a limited level of optimisations, carried out at the code generation phase. The optimisations described sound quite similar to what the DMR C compiler did in its c1 phase (special casing +1 and -1, combining constants, mul/div to shift, etc.)

- Code generation seems to have been through code snippets for each IL operation, selecting from one of 3 addressing modes: register, memory and indexed; the text isn’t quite clear. It sounds reasonable for small machines in the 60’s.

- The later Ryan-MacFarland microcomputer compilers seem to have used the same POPS based front-end technology, but using an interpreter to execute the IL directly.

Interestingly, the above book has a final chapter about “the self-compiling compiler”. To quote: “The scheme to be described is one which has often been considered, and in some cases even implemented. It involves the use of a compiler written in its own language, and capable therefore of compiling itself over to a new machine.” It proceeds to describe such a compiler in quite some detail, including using a table driven code generator.

Seen through this lens, the DMR C compiler could be viewed as a re-imagining of the Digitek small system compilers using a self-compiling lexer/parser instead of POPS (or TMG or META) and a (also self-compiling) code generator evolved to handle the richer PDP-11 addressing modes. The concept seems to have been in the air at that time.

Now I am left wondering why the IL-to-native back-ends were not more used in academic small machine compilers in the 70’s -- but this too may be the result of a skewed view on my part.


From aek at bitsavers.org  Fri May 10 06:57:18 2024
From: aek at bitsavers.org (Al Kossow)
Date: Thu, 9 May 2024 13:57:18 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <C7223B04-B299-475E-9EB2-03FA59CF0994@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <C7223B04-B299-475E-9EB2-03FA59CF0994@planet.nl>
Message-ID: <b20f7075-f992-a7b8-40dc-32e572ad4fba@bitsavers.org>

On 5/9/24 1:40 PM, Paul Ruizendaal via TUHS wrote:

> - Unfortunately, I have not been able to find a description of the POPS IL.

I went down that rabbit hole researching why Don Knuth had a high opinion of it.
If you dig around, the IL is described in the SDS Fortran documentation
http://bitsavers.org/pdf/sds/9xx/lang/900883A_9300_FORTRAN_IV_Tech_Aug65.pdf
and http://bitsavers.org/pdf/digitek/Data_Structures_in_Digiteks_FORTRAN_IV_Compiler_for_the_SDS_900_Series.pdf

and a compiler listing
https://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-1-pdf/k-1-C1051-Digitek-FORTRAN.pdf


From davida at pobox.com  Fri May 10 16:15:52 2024
From: davida at pobox.com (David Arnold)
Date: Fri, 10 May 2024 16:15:52 +1000
Subject: [TUHS] nl section delimiters
Message-ID: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>

nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to delimit header, body, and trailer sections within its input. 

I wondered if anyone was able to shed light on the reason those were adopted as the defaults?

I would have expected perhaps something compatible with *roff (like, .\” something). 

FreeBSD claims nl first appeared in System III (although it previously claimed SVR2), but I haven’t dug into the implementation any further. 

Thanks in advance,


d

From robpike at gmail.com  Fri May 10 20:08:33 2024
From: robpike at gmail.com (Rob Pike)
Date: Fri, 10 May 2024 20:08:33 +1000
Subject: [TUHS] nl section delimiters
In-Reply-To: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
Message-ID: <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>

Didn't recognize the command, looked it up. Sigh.

  pr -tn <file>

seems sufficient for me, but then that raises the question of your question.

I've been developing a theory about how the existence of something leads to
things being added to it that you didn't need at all and only thought of
when the original thing was created. Bloat by example, if you will. I
suspect it will not be a popular theory, however accurately it may describe
the technological world.

-rob


On Fri, May 10, 2024 at 4:16 PM David Arnold <davida at pobox.com> wrote:

> nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to
> delimit header, body, and trailer sections within its input.
>
> I wondered if anyone was able to shed light on the reason those were
> adopted as the defaults?
>
> I would have expected perhaps something compatible with *roff (like, .\”
> something).
>
> FreeBSD claims nl first appeared in System III (although it previously
> claimed SVR2), but I haven’t dug into the implementation any further.
>
> Thanks in advance,
>
>
>
> d
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/6f334266/attachment.htm>

From tuhs at tuhs.org  Sat May 11 02:05:15 2024
From: tuhs at tuhs.org (segaloco via TUHS)
Date: Fri, 10 May 2024 16:05:15 +0000
Subject: [TUHS] nl section delimiters
In-Reply-To: <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
 <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
Message-ID: <WvDE-ueTJJYDLwTxL2MK49Ub_KsWOJ3UmpsSheFD-r57lSy2OBPqN_AVPIfmW7NMM-D0XYAAky9O7ZN8DXbgl7IhuDum_duveZLsHFRKXEo=@protonmail.com>

On Friday, May 10th, 2024 at 3:08 AM, Rob Pike <robpike at gmail.com> wrote:

> Didn't recognize the command, looked it up. Sigh.
> 
> pr -tn <file>
> 
> seems sufficient for me, but then that raises the question of your question.
> 
> I've been developing a theory about how the existence of something leads to things being added to it that you didn't need at all and only thought of when the original thing was created. Bloat by example, if you will. I suspect it will not be a popular theory, however accurately it may describe the technological world.
> 
> -rob
> 
> 
> On Fri, May 10, 2024 at 4:16 PM David Arnold <davida at pobox.com> wrote:
> 
> > nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to delimit header, body, and trailer sections within its input.
> > 
> > I wondered if anyone was able to shed light on the reason those were adopted as the defaults?
> > 
> > I would have expected perhaps something compatible with *roff (like, .\” something).
> > 
> > FreeBSD claims nl first appeared in System III (although it previously claimed SVR2), but I haven’t dug into the implementation any further.
> > 
> > Thanks in advance,
> > 
> > 
> > 
> > d

https://www.tuhs.org/pipermail/tuhs/2022-July/026197.html

Here's an earlier thread on nl that doesn't answer your specific question on the sequences but may provide some background on nl(1).

- Matt G.

From clemc at ccc.com  Sat May 11 02:36:43 2024
From: clemc at ccc.com (Clem Cole)
Date: Fri, 10 May 2024 12:36:43 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
Message-ID: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>

While the idea of small tools that do one job well is the core tenant of
what I think of as the UNIX philosophy, this goes a bit beyond UNIX, so I
have moved this discussion to COFF and BCCing TUHS for now.

The key is that not all "bloat" is the same (really)—or maybe one person's
bloat is another person's preference.  That said, NIH leads to pure bloat
with little to recommend it, while multiple offerings are a choice. Maybe
the difference between the two may be one person's view over another.

On Fri, May 10, 2024 at 6:08 AM Rob Pike <robpike at gmail.com> wrote:

> Didn't recognize the command, looked it up. Sigh.
>
Like Rob -- this was a new one for me, too.
I looked, and it is on the SYS3 tape; see:
https://www.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/man/man1/nl.1


>   pr -tn <file>
>

> seems sufficient for me, but then that raises the question of your
> question.
>
Agreed, that has been burned into the ROMs in my  fingers since the
mid-1970s 😀
BTW: SYS3 has pr(1) with both switches too  (more in a minute)


> I've been developing a theory about how the existence of something leads
> to things being added to it that you didn't need at all and only thought of
> when the original thing was created.
>
That is a good point, and I generally agree with you.


> Bloat by example, if you will. I suspect it will not be a popular theory,
> however accurately it may describe the technological world.
>

Of course, sometimes the new features >>are<< easier (more natural *for
some people*).  And herein lies the core problem. The bloat is often
repetitive, and I suggest that it is often implemented in the wrong place -
and usually for the wrong reasons.

Bloat comes about because somebody thinks they need some feature and
probably doesn't understand that it is already there or how they can use
it. But they do know about it, their tool must be set up to exploit it - so
they do not need to reinvent it.  GUI-based tools are notorious for this
failure. Everyone seems to have a built-in (unique) editor, or a private
way to set up configuration options et al. But ... that walled garden is
comfortable for many users and >>can be<< useful sometimes.

Long ago, UNIX programmers learned that looking for $EDITOR in the
environment was way better than creating one.  Configuration was as ASCII
text, stored in /etc for system-wide and dot files in the home for users.
But it also means the >>output<< of each tool needs to be usable by each
other [*i.e.*, docx or xlx files are a no-no).

For example, for many things on my Mac, I do use the GUI-based tools --
there is no doubt they are better integrated with the core Mac system >>for
some tasks.<< But only if I obey a set of rules Apple decrees.  For
instance, this email read is easier much of the time than MH (or the HM
front end, for that matter), which I used for probably 25-30 years. But on
my Mac, I always have 4 or 5 iterm2(1) open running zsh(1) these days. And,
much of my typing (and everything I do as a programmer) is done in the shell
(including a simple text editor, not an 'IDE').  People who love IDEs swear
by them -- I'm just not impressed - there is nothing they do for me that
makes it easier, and I have learned yet another scheme.

That said, sadly, Apple is forcing me to learn yet another debugger since
none of the traditional UNIX-based ones still work on the M1-based systems.
But at least LLDB is in the same key as sdb/dbx/gdb *et al*., so it is a
PITA but not a huge thing as, in the end, LLDB is still based on the UNIX
idea of a single well-designed and specific to the task tool, to do each
job and can work with each other.

FWIW: I was recently a tad gob-smacked by the core idea of UNIX and its
tools, which I have taken for a fact since the 1970s.

It turns out that I've been helping with the PiDP-10 users (all of the
PiDPs are cool, BTW). Before I saw UNIX, I was paid to program a PDP-10. In
fact, my first UNIX job was helping move programs from the 10 to the UNIX.
Thus ... I had been thinking that doing a little PDP-10 hacking shouldn't
be too hard to dust off some of that old knowledge.  While some of it has,
of course, come back.  But daily, I am discovering small things that are so
natural with a few simple tools can be hard on those systems.

I am realizing (rediscovering) that the "build it into my tool" was the
norm in those days.   So instead of a pr(1) command, there was a tool that
created output to the lineprinter. You give it a file, and it is its job to
figure out what to do with it, so it has its set of features (switches) -
so "bloat" is that each tool (like many current GUI tools) has private ways
of doing things. If the maker of tool X decided to support some idea, they
would do it like tool Y.  The problem, of course, was that tools X and Y
had to 'know about' each type of file (in IBM terms, use its "access
method").  Yes, the engineers at DEC, in their wisdom, tried to
"standardize" those access methods/switches/features >>if you implemented
them<< -- but they are not all there.

This leads me back to the question Rob raises.  Years ago, I got into an
argument with Dave Cutler RE: UNIX *vs.* VMS. Dave's #1 complaint about
UNIX in those days was that it was not "standardized."  Every program was
different, and more to Dave's point, there was no attempt to make switches
or errors the same [getopt(3) had been introduced but was not being used by
most applications).  He hated that tar/tp used "keys" and tools like cpio
used switches.  Dave hated that I/O was so simple - in his world all user
programs should use his RMS access method of course [1].  VMS, TOPS, *etc.*,
tried to maintain a system-wide error scheme, and users could look things
like errors up in a system DB by error number, *etc*.  Simply put, VMS is
very "top-down."

My point with Dave was that by being "bottom-up," the best ideas in  UNIX
were able to rise. And yes, it did mean some rough edges and repeated
implementations of the same idea.  But UNIX offered a choice, and while Rob
and I like and find: pr -tn perfectly acceptable thank you, clearly someone
else desired the features that nl provides. The folks that put together
System 3 offer both solutions and let the user choose.

This, of course, comes as bloat, but maybe that is a type of bloat so bad?


My own thinking is this - get things down to the basics and simplest
privatives and then build back up.  It's okay to offer choices, as long as
the foundation is simple and clean.  To me, bloat becomes an issue when you
do the same thing over and over again, particularly because you can not
utilize what is there already, the worst example is NIH - which happens way
more than it should.


I think the kind of bloat that GUI tools and TOPS et al. created forces
recreation, not reuse. But offering choice and the expense of multiple
tools that do the same things strikes me as reasonable/probably a good
thing.


1.]  BTW: One of my favorite DEC stories WRT to VMS engineering has to do
with the RMS I/O system.  Supporting C using VMS was a bit of PITA.
 Eventually, the VMS engineers added Stream I/O - which simplified the C
runtime, but it was also made available for all technical languages.
Fairly soon after it was released, the DEC Marketing folks discovered
almost all new programs, regardless of language, had started to use Stream
I/O and many older programs were being rewritten by customers to use it. In
fact, inside of DEC itself, the languages group eventually rewrote things
like the FTN runtime to use streams, making it much smaller/easier to
maintain.   My line in the old days: "It's not so bad that ever I/O has
offer 1000 options, it's that Dave to check each one for every I/O. It's a
classic example of how you can easily build RMS I/O out of stream-based
I/O, but the other way around is much harder.   My point here is to *use
the right primitives*. RMS may have made it easier to build RDB, but it
impeded everything else.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/4150acba/attachment.htm>

From jpl.jpl at gmail.com  Sat May 11 02:50:22 2024
From: jpl.jpl at gmail.com (John P. Linderman)
Date: Fri, 10 May 2024 12:50:22 -0400
Subject: [TUHS] nl section delimiters
In-Reply-To: <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
 <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
Message-ID: <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA@mail.gmail.com>

I'll accept Rob's theory. Instead of taking the time to go through the
alphabet soup of options to nl and pr and ls, learning a tool like awk or
perl or python makes implementing most of what these commands do (or what
you wish they could do) a one-finger exercise. -- jpl

On Fri, May 10, 2024 at 6:09 AM Rob Pike <robpike at gmail.com> wrote:

> Didn't recognize the command, looked it up. Sigh.
>
>   pr -tn <file>
>
> seems sufficient for me, but then that raises the question of your
> question.
>
> I've been developing a theory about how the existence of something leads
> to things being added to it that you didn't need at all and only thought of
> when the original thing was created. Bloat by example, if you will. I
> suspect it will not be a popular theory, however accurately it may describe
> the technological world.
>
> -rob
>
>
> On Fri, May 10, 2024 at 4:16 PM David Arnold <davida at pobox.com> wrote:
>
>> nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to
>> delimit header, body, and trailer sections within its input.
>>
>> I wondered if anyone was able to shed light on the reason those were
>> adopted as the defaults?
>>
>> I would have expected perhaps something compatible with *roff (like, .\”
>> something).
>>
>> FreeBSD claims nl first appeared in System III (although it previously
>> claimed SVR2), but I haven’t dug into the implementation any further.
>>
>> Thanks in advance,
>>
>>
>>
>> d
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/13bf4982/attachment.htm>

From paul.winalski at gmail.com  Sat May 11 03:28:40 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Fri, 10 May 2024 13:28:40 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAKH6PiXy_-+nAm08ciJvZL0mLeXxh3hFHasy-wcJhpQbE=DwiA@mail.gmail.com>
References: <CAKH6PiXy_-+nAm08ciJvZL0mLeXxh3hFHasy-wcJhpQbE=DwiA@mail.gmail.com>
Message-ID: <CABH=_VTCZz-N2rj6neWFfVb7=HKJ9tMJqNw6PcEv+2aJ_7a8iQ@mail.gmail.com>

On Wed, May 8, 2024 at 2:29 PM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

>
> Dennis was one-up on Digitek in having a self-maintaining compiler. Thus,
> when he implemented an optimization, the source would grow, but the
> compiler binary might even shrink thanks to self-application.
>

Another somewhat non-intuitive aspect of optimizing compilers is that
simply adding optimizations can cause an increase in compilation speed by
reducing the amount of IL in the program being compiled.  Less IL due to
optimization means less time spent in later phases of the compilation
process.

Regarding native compilers for small machines, IBM had compilers for
Fortran, COBOL, and PL/I that ran in 32K on System/360 and produced
tolerably good code (yes, one could do better with handwritten assembler).
And they generated real code, no threaded code cop-out.  And we're talking
full PL/I here, not the subset that ANSI later standardized.  The compilers
were table-driven as much as possible, heavily overlaid, and used three
scratch files on disk (split-cylinder allocated to minimize seek time).

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/4ea5abbd/attachment.htm>

From paul at mcjones.org  Sat May 11 04:55:54 2024
From: paul at mcjones.org (Paul McJones)
Date: Fri, 10 May 2024 11:55:54 -0700
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <171535904627.4052234.5321502833323676423@minnie.tuhs.org>
References: <171535904627.4052234.5321502833323676423@minnie.tuhs.org>
Message-ID: <5F50D05C-C898-4BFF-B57F-028494ED9EBD@mcjones.org>

> On Thu, 9 May 2024 22:40:28 +0200, Paul Ruizendaal <pnr at planet.nl <mailto:pnr at planet.nl>> wrote:
> 
> .... Digging into this more led me to a 1970 report "Programming Languages and their Compilers, Preliminary Notes” by John Cocke and J.T. Schwartz:
> https://www.softwarepreservation.org/projects/FORTRAN/paper/Bright-FORTRANComesToWestinghouseBettis-1971.pdf

Actually, the link is here:

https://www.softwarepreservation.org/projects/FORTRAN/CockeSchwartz_ProgLangCompilers.pdf

And more about Jack Schwartz is here:

https://www.softwarepreservation.org/projects/SETL/index.html#Precursors


Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/503f606c/attachment-0001.htm>

From steffen at sdaoden.eu  Sat May 11 09:05:55 2024
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Sat, 11 May 2024 01:05:55 +0200
Subject: [TUHS] nl section delimiters
In-Reply-To: <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA@mail.gmail.com>
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
 <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
 <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA@mail.gmail.com>
Message-ID: <20240510230555.j3xsYPX4@steffen%sdaoden.eu>

John P. Linderman wrote in
 <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA at mail.gmail.com>:
 |I'll accept Rob's theory. Instead of taking the time to go through the
 |alphabet soup of options to nl and pr and ls, learning a tool like awk or
 |perl or python makes implementing most of what these commands do (or what
 |you wish they could do) a one-finger exercise. -- jpl

But it misses the coolness of the empty true(1), and the last
possibly requires more CPU cycles for startup than you had on
a work day (said into the blue, completely unmathematically).

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From douglas.mcilroy at dartmouth.edu  Sat May 11 12:19:45 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Fri, 10 May 2024 22:19:45 -0400
Subject: [TUHS] nl section delimiters
Message-ID: <CAKH6PiWG-cFwQYPKDhKg_LdDKChC8odbxnnTOjC9+KZea7OgQA@mail.gmail.com>

> But it misses the coolness of the empty true(1).

Too cool. With an empty true(1), execl("true", "true", 0) is out in the
cold.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240510/822f9a7f/attachment.htm>

From ralph at inputplus.co.uk  Sat May 11 19:07:41 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sat, 11 May 2024 10:07:41 +0100
Subject: [TUHS] nl section delimiters
In-Reply-To: <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA@mail.gmail.com>
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
 <CAKzdPgwGdkDUx_FSm0sVSUeFV=R_nox7H4qhBKjVN0drHq89pg@mail.gmail.com>
 <CAC0cEp-E8f8nXaCMM34opqcgPQFpOQnE6Z9m3g1o4N_ET5-nsA@mail.gmail.com>
Message-ID: <20240511090741.906E3215AA@orac.inputplus.co.uk>

Hi jpl,

> Instead of taking the time to go through the alphabet soup of options
> to nl and pr and ls, learning a tool like awk or perl or python

pr(1) was in V5, where one of its stderr messages was ‘Very funny.’.
awk arrived in V7.

-- 
Cheers, Ralph.

From ralph at inputplus.co.uk  Sat May 11 19:16:13 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sat, 11 May 2024 10:16:13 +0100
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CABH=_VTCZz-N2rj6neWFfVb7=HKJ9tMJqNw6PcEv+2aJ_7a8iQ@mail.gmail.com>
References: <CAKH6PiXy_-+nAm08ciJvZL0mLeXxh3hFHasy-wcJhpQbE=DwiA@mail.gmail.com>
 <CABH=_VTCZz-N2rj6neWFfVb7=HKJ9tMJqNw6PcEv+2aJ_7a8iQ@mail.gmail.com>
Message-ID: <20240511091613.2399F215AA@orac.inputplus.co.uk>

Hi,

Paul W. wrote:
> The compilers were table-driven as much as possible, heavily overlaid,
> and used three scratch files on disk (split-cylinder allocated to
> minimize seek time).

I didn't know the term ‘split cylinder’.  A cylinder has multiple
tracks, each with its own read/write head.  Allocate different tracks to
different files and switching files needs no physical movement or
settling time; just electronically switch R/W head.

-- 
Cheers, Ralph.

From g.branden.robinson at gmail.com  Sat May 11 23:42:21 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 11 May 2024 08:42:21 -0500
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CABH=_VTCZz-N2rj6neWFfVb7=HKJ9tMJqNw6PcEv+2aJ_7a8iQ@mail.gmail.com>
References: <CAKH6PiXy_-+nAm08ciJvZL0mLeXxh3hFHasy-wcJhpQbE=DwiA@mail.gmail.com>
 <CABH=_VTCZz-N2rj6neWFfVb7=HKJ9tMJqNw6PcEv+2aJ_7a8iQ@mail.gmail.com>
Message-ID: <20240511134221.w7v35qdey7z7j6wf@illithid>

At 2024-05-10T13:28:40-0400, Paul Winalski wrote:
> On Wed, May 8, 2024 at 2:29 PM Douglas McIlroy <
> douglas.mcilroy at dartmouth.edu> wrote:
> > Dennis was one-up on Digitek in having a self-maintaining compiler.
> > Thus, when he implemented an optimization, the source would grow,
> > but the compiler binary might even shrink thanks to
> > self-application.
> 
> Another somewhat non-intuitive aspect of optimizing compilers is that
> simply adding optimizations can cause an increase in compilation speed
> by reducing the amount of IL in the program being compiled.  Less IL
> due to optimization means less time spent in later phases of the
> compilation process.

This fact was rediscovered later when people found that some code
compiled with "-Os" (optimize for space) was faster than some code
optimized for speed ("-O1", "-O2", and so on).

The reason turned out to be that the reduced code size meant fewer cache
evictions, so you gained performance by skipping instances of
instruction fetches all the way from the slow main memory bus.

Think of all those poor unrolled loops...

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240511/d0356e94/attachment.sig>

From steffen at sdaoden.eu  Sun May 12 06:48:16 2024
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Sat, 11 May 2024 22:48:16 +0200
Subject: [TUHS] nl section delimiters
In-Reply-To: <CAKH6PiWG-cFwQYPKDhKg_LdDKChC8odbxnnTOjC9+KZea7OgQA@mail.gmail.com>
References: <CAKH6PiWG-cFwQYPKDhKg_LdDKChC8odbxnnTOjC9+KZea7OgQA@mail.gmail.com>
Message-ID: <20240511204816.UwAcweCX@steffen%sdaoden.eu>

Douglas McIlroy wrote in
 <CAKH6PiWG-cFwQYPKDhKg_LdDKChC8odbxnnTOjC9+KZea7OgQA at mail.gmail.com>:
 |> But it misses the coolness of the empty true(1).
 |
 |Too cool. With an empty true(1), execl("true", "true", 0) is out in the
 |cold.

There i stand singing "ein Männlein steht im Walde" (a "little
man" stands in the forest).

..ok, but then i do note here and now the certain lists where the
question on whether an additional entry in the search path does
make any sense at all for certain constructs comes up regulary,
(even) i have lived this multiple times already, it is about

  The [.] command search [.] allows for a standard utility to be
  implemented as a regular built-in as long as it is found in the
  appropriate place in a PATH search.
  [.]command -v true might yield /bin/true or some similar pathname.

  Other [non-standard] utilities [.] might exist only as built-ins
  and have no pathname associated with them. These produce output
  identified as (regular) built-ins. Applications encountering
  these are not able to count on execing them, using them with
  nohup, overriding them with a different PATH, and so on.

The next POSIX standard will have around 4058 pages (3950 without
index) and 137171 lines (not counting index).

And i was surely laughing when this list it surely was came along
this somewhen in the past, and isn't that just "a muscle car":

  #?0|kent:unix-hist$ git show Research-V7:bin/true | wc -c
  0

Many greetings and best wishes!!

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From athornton at gmail.com  Mon May 13 05:34:20 2024
From: athornton at gmail.com (Adam Thornton)
Date: Sun, 12 May 2024 12:34:20 -0700
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <20240511213532.GB8330@mit.edu>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
Message-ID: <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>

On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso at mit.edu> wrote:

>
> I bet most of the young'uns would not be trying to do this as a shell
> script, but using the Cloud SDK with perl or python or Go, which is
> *way* more bloaty than using /bin/sh.
>
> So while some of us old farts might be bemoaning the death of the Unix
> philosophy, perhaps part of the reality is that the Unix philosophy
> were ideal for a simpler time, but might not be as good of a fit
> today


I'm finding myself in agreement.  I might well do this with jq, but as you
point out, you're using the jq DSL pretty extensively to pull out the
fields.  On the other hand, I don't think that's very different than piping
stuff through awk, and I don't think anyone feels like _that_ would be
cheating.  And jq -L is pretty much equivalent to awk -F, which is how I
would do this in practice, rather than trying to inline the whole jq bit.

But it does come down to the same argument as
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

And it is true that while fork() is a great model for single-threaded
pipeline-looking tasks, it's not really what you want for an interactive
multithreaded application on your phone's GUI.

Oddly, I'd have a slightly different reason for reaching for Python (which
is probably how I'd do this anyway), and that's the batteries-included
bit.  If I write in Python, I've got the gcloud api available as a Python
module, and I've got a JSON parser also available as a Python module (but I
bet all the JSON unmarshalling is already handled in the gcloud library),
and I don't have to context-switch to the same degree that I would if I
were stringing it together in the shell.  Instead of "make an HTTP request
to get JSON text back, then parse that with repeated calls to jq", I'd just
get an object back from the instance fetch request, pick out the fields I
wanted, and I'd be done.

I'm afraid only old farts write anything in Perl anymore.  The kids just
mutter "OK, Boomer" when you try to tell them how much better CPAN was than
PyPi.  And it sure feels like all the cool kids have abandoned Go for Rust,
although Go would be a perfectly reasonable choice for this task as well
(and would look a lot like Python: get an object back, pick off the useful
fields).

Adam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240512/b2d93f10/attachment.htm>

From lm at mcvoy.com  Mon May 13 05:47:07 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sun, 12 May 2024 12:47:07 -0700
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
Message-ID: <20240512194707.GL9216@mcvoy.com>

On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> But it does come down to the same argument as
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
> 
> And it is true that while fork() is a great model for single-threaded
> pipeline-looking tasks, it's not really what you want for an interactive
> multithreaded application on your phone's GUI.

Perhaps a meaningless aside, but I agree on fork().  In the last major
project I did, which was cross platform {windows,macos, all the major
Unices, Linux}, we adopted spawn() rather than fork/exec.  There is no way
(that I know of) to fake fork() on Windows but it's easy to fake spawn().

--lm

From johnl at taugh.com  Mon May 13 06:13:48 2024
From: johnl at taugh.com (John Levine)
Date: 12 May 2024 16:13:48 -0400
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240512194707.GL9216@mcvoy.com>
Message-ID: <20240512201349.0DB6A8A9D055@ary.qy>

It appears that Larry McVoy <lm at mcvoy.com> said:
>Perhaps a meaningless aside, but I agree on fork().  In the last major
>project I did, which was cross platform {windows,macos, all the major
>Unices, Linux}, we adopted spawn() rather than fork/exec.  There is no way
>(that I know of) to fake fork() on Windows but it's easy to fake spawn().

The whole point of fork() is that it lets you get the effect of spawn with
a lot less internal mechanism.  Spawn is equivalent to:

  fork()
  ... do stuff to files and environment ...
  exec()

By separating the fork and the exec, they didn't have to put all of
the stuff in the 12 paragraphs in the spawn() man page into the the
tiny PDP-11 kernel.

These days now that programs include multi-megabyte shared libraries
just for fun, I agree that the argument is less persuasive.  On the
third hard, we now understand virtual memory and paging systems a lot
better so we don't need kludges like vfork().

R's,
John


From dave at horsfall.org  Mon May 13 06:43:35 2024
From: dave at horsfall.org (Dave Horsfall)
Date: Mon, 13 May 2024 06:43:35 +1000 (EST)
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
Message-ID: <alpine.BSF.2.21.9999.2405130640280.15285@aneurin.horsfall.org>

On Sun, 12 May 2024, Adam Thornton wrote:

> I'm afraid only old farts write anything in Perl anymore.  The kids just
> mutter "OK, Boomer" when you try to tell them how much better CPAN was than
> PyPi.  And it sure feels like all the cool kids have abandoned Go for Rust,
> although Go would be a perfectly reasonable choice for this task as well
> (and would look a lot like Python: get an object back, pick off the useful
> fields).

I must be an old fart then; the last language I used where white space was 
part of the syntax was FORTRAN...

-- Dave

From crossd at gmail.com  Mon May 13 08:56:35 2024
From: crossd at gmail.com (Dan Cross)
Date: Sun, 12 May 2024 18:56:35 -0400
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240512201349.0DB6A8A9D055@ary.qy>
References: <20240512194707.GL9216@mcvoy.com>
 <20240512201349.0DB6A8A9D055@ary.qy>
Message-ID: <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>

On Sun, May 12, 2024 at 4:14 PM John Levine <johnl at taugh.com> wrote:
> It appears that Larry McVoy <lm at mcvoy.com> said:
> >Perhaps a meaningless aside, but I agree on fork().  In the last major
> >project I did, which was cross platform {windows,macos, all the major
> >Unices, Linux}, we adopted spawn() rather than fork/exec.  There is no way
> >(that I know of) to fake fork() on Windows but it's easy to fake spawn().
>
> The whole point of fork() is that it lets you get the effect of spawn with
> a lot less internal mechanism.  Spawn is equivalent to:
>
>   fork()
>   ... do stuff to files and environment ...
>   exec()
>
> By separating the fork and the exec, they didn't have to put all of
> the stuff in the 12 paragraphs in the spawn() man page into the the
> tiny PDP-11 kernel.

Perhaps, but as I've written here before, `fork`/`exec` vs `spawn` is
a false dichotomy. Another alternative is a `proccreate`/`procrun`
pair, the former of which creates an unrunnable process, the latter of
which marks it runnable. Coupled with a set of primitives to
manipulate the state of an extant, but unrunnable, process and you
have the advantages of fork/exec without the downsides (which are
well-known; https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf).
Similarly, this gives you the functionality of spawn, without the
downside of a singularly complicated interface. Could you have
implemented that in something as small as the PDP-7? Perhaps not, but
it does not follow that `fork` now remains a good primitive.

My spelunking in the original GENIE documentation leads me to believe
that its `fork` provided functionality similar to what I described.

        - Dan C.

From lm at mcvoy.com  Mon May 13 09:34:54 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sun, 12 May 2024 16:34:54 -0700
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
References: <20240512194707.GL9216@mcvoy.com>
 <20240512201349.0DB6A8A9D055@ary.qy>
 <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
Message-ID: <20240512233454.GM9216@mcvoy.com>

On Sun, May 12, 2024 at 06:56:35PM -0400, Dan Cross wrote:
> Similarly, this gives you the functionality of spawn, without the
> downside of a singularly complicated interface. Could you have
> implemented that in something as small as the PDP-7? Perhaps not, but
> it does not follow that `fork` now remains a good primitive.

Our spawnvp() implmentation is 40 lines of code.  Worked fine everywhere.
I can post it if you like.

From dave at horsfall.org  Mon May 13 11:34:38 2024
From: dave at horsfall.org (Dave Horsfall)
Date: Mon, 13 May 2024 11:34:38 +1000 (EST)
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240512233454.GM9216@mcvoy.com>
References: <20240512194707.GL9216@mcvoy.com>
 <20240512201349.0DB6A8A9D055@ary.qy>
 <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
 <20240512233454.GM9216@mcvoy.com>
Message-ID: <alpine.BSF.2.21.9999.2405131133310.15285@aneurin.horsfall.org>

On Sun, 12 May 2024, Larry McVoy wrote:

> Our spawnvp() implmentation is 40 lines of code.  Worked fine everywhere.
> I can post it if you like.

Pretty please...

-- Dave

From flexibeast at gmail.com  Mon May 13 12:33:55 2024
From: flexibeast at gmail.com (Alexis)
Date: Mon, 13 May 2024 12:33:55 +1000
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
 (Adam Thornton's message of "Sun, 12 May 2024 12:34:20 -0700")
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
Message-ID: <87y18ebfu4.fsf@gmail.com>


> On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso at mit.edu> 
> wrote:
>
> So while some of us old farts might be bemoaning the death of 
> the
> Unix
> philosophy, perhaps part of the reality is that the Unix 
> philosophy
> were ideal for a simpler time, but might not be as good of a fit
> today

Hm .... i guess it might depend on the specific use-case(s) 
involved?

At one point i realised that a primary reason i enjoy using *n*x 
systems is that they're fundamentally 
_text-oriented_. (Unsurprisingly, of course, given the context in 
which Unix was developed.) i spend a lot of my time interacting 
and working with text, and *n*x systems provide me with many 
useful tools for this. Quoting the old "UNIX As Literature" piece, 
https://theody.net/elements.html:

"[T]he most recurrent complaint was that [Unix] was too 
text-oriented. People really hated the command line, with all the 
utilities, obscure flags, and arguments they had to memorize. They 
hated all the typing. One mislaid character and you had to start 
over. Interestingly, this complaint came most often from users of 
the GUI-laden Macintosh or Windows platforms. ...

"[A] suspiciously high proportion of my UNIX colleagues had 
already developed, in some prior career, a comfort and fluency 
with text and printed words. ...

"With UNIX, text — on the command line, STDIN, STDOUT, STDERR — is 
the primary interface mechanism: UNIX system utilities are a sort 
of Lego construction set for word-smiths. Pipes and filters 
connect one utility to the next, text flows invisibly 
between. Working with a shell, awk/lex derivatives, or the utility 
set is literally a word dance."

Perl, with its pervasive regex-based functionality and extensive 
Unicode support, fits neatly into this. i find regexes an 
_incredibly_ powerful tool for working with text, whether via 
Perl, sed, awk, or whatever. But my experience is that many people 
treat regexes as an anathema, with Zawinski's "Now you have two 
problems" regularly trotted out as a thought-terminating 
cliché. Sure, regexes can, and do, get used where they shouldn't 
be[a]; that doesn't mean the baby should be thrown out with the 
bathwater. 

But if one is only working with text under sufferance, trying to 
avoid it via substantially more graphically-oriented environments, 
the text-based "Unix philosophy" and the tools associated with it 
might feel (and actually be) much less appropriate and 
useful. Fair enough. The Unix construction set will still be there 
for those of us who find them very appropriate and tremendously 
useful.


Alexis.

[a] It seems unlikely that anyone on this list hasn't already seen 
this, but just in case:

https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

i'm looking forward to that comment sending OpenAI over the 
Mountains of Madness.

From imp at bsdimp.com  Mon May 13 12:57:05 2024
From: imp at bsdimp.com (Warner Losh)
Date: Sun, 12 May 2024 20:57:05 -0600
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <87y18ebfu4.fsf@gmail.com>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
 <87y18ebfu4.fsf@gmail.com>
Message-ID: <CANCZdfpKqPiqT7-8bVVOtHb-oQnf9xKm2V5Sm14HBxwbC0Hy1g@mail.gmail.com>

On Sun, May 12, 2024, 8:34 PM Alexis <flexibeast at gmail.com> wrote:

>
> > On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso at mit.edu>
> > wrote:
> >
> > So while some of us old farts might be bemoaning the death of
> > the
> > Unix
> > philosophy, perhaps part of the reality is that the Unix
> > philosophy
> > were ideal for a simpler time, but might not be as good of a fit
> > today
>
> Hm .... i guess it might depend on the specific use-case(s)
> involved?
>

I created, years ago, a set of time legos. They were connected as a network
of producer / consumer interfaces. Each lego would do one thing and pass
the results to the next thing in the chain. A driver would read timing data
from the driver and convert it to a MI interface. Different other legos
would take time differences, compute phase or frequency differences and
these would feed into more sophisticated algorithms or output etc. All
locking was on yhe pipe's queues so all these algorithms were lock free
apart from the queueing or dequeueing of data.

Concrptually, this is just a bunch of pipe, with many to 1 or 1 to many
added. Each lego did one thing and passed the results along to the thing in
the chain... much like 'cmd | grep | awk | more'. Plus MI data
representations for almost everything so only the driver reader thread
cared about the hw. See also tty abstraction or ifnet abstraction in
unix....

So actually not a set of FDs passing data between process, but threads
doing the same sort of thing.

The whole data filtering paradigm works in lots of different ways. And it
still works really well by analogy.

Warner

ObComplaint: fork sucks for address spaces with 100s of threads. Forst
thing we created a child process we used to broker different threads
needing to run popen or system... having a create process / munge process /
start process API is kinda what we did behind the scenes though with "send
this data" and "receive that data". We iterated to this after the first
dozen attempts to closely broker fork/exec dance proved... unreliable.

At one point i realised that a primary reason i enjoy using *n*x
> systems is that they're fundamentally
> _text-oriented_. (Unsurprisingly, of course, given the context in
> which Unix was developed.) i spend a lot of my time interacting
> and working with text, and *n*x systems provide me with many
> useful tools for this. Quoting the old "UNIX As Literature" piece,
> https://theody.net/elements.html:
>
> "[T]he most recurrent complaint was that [Unix] was too
> text-oriented. People really hated the command line, with all the
> utilities, obscure flags, and arguments they had to memorize. They
> hated all the typing. One mislaid character and you had to start
> over. Interestingly, this complaint came most often from users of
> the GUI-laden Macintosh or Windows platforms. ...
>
> "[A] suspiciously high proportion of my UNIX colleagues had
> already developed, in some prior career, a comfort and fluency
> with text and printed words. ...
>
> "With UNIX, text — on the command line, STDIN, STDOUT, STDERR — is
> the primary interface mechanism: UNIX system utilities are a sort
> of Lego construction set for word-smiths. Pipes and filters
> connect one utility to the next, text flows invisibly
> between. Working with a shell, awk/lex derivatives, or the utility
> set is literally a word dance."
>
> Perl, with its pervasive regex-based functionality and extensive
> Unicode support, fits neatly into this. i find regexes an
> _incredibly_ powerful tool for working with text, whether via
> Perl, sed, awk, or whatever. But my experience is that many people
> treat regexes as an anathema, with Zawinski's "Now you have two
> problems" regularly trotted out as a thought-terminating
> cliché. Sure, regexes can, and do, get used where they shouldn't
> be[a]; that doesn't mean the baby should be thrown out with the
> bathwater.
>
> But if one is only working with text under sufferance, trying to
> avoid it via substantially more graphically-oriented environments,
> the text-based "Unix philosophy" and the tools associated with it
> might feel (and actually be) much less appropriate and
> useful. Fair enough. The Unix construction set will still be there
> for those of us who find them very appropriate and tremendously
> useful.
>
>
> Alexis.
>
> [a] It seems unlikely that anyone on this list hasn't already seen
> this, but just in case:
>
>
> https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>
> i'm looking forward to that comment sending OpenAI over the
> Mountains of Madness.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240512/b74a9bbd/attachment.htm>

From andreww591 at gmail.com  Mon May 13 13:29:01 2024
From: andreww591 at gmail.com (Andrew Warkentin)
Date: Sun, 12 May 2024 21:29:01 -0600
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
References: <20240512194707.GL9216@mcvoy.com>
 <20240512201349.0DB6A8A9D055@ary.qy>
 <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
Message-ID: <CAD-qYGp1io4FsTQSkXXUJ-w7S19u23eXxiBE5Jm9fy8+U20++w@mail.gmail.com>

On Sun, May 12, 2024 at 4:57 PM Dan Cross <crossd at gmail.com> wrote:
>l.
>
> Perhaps, but as I've written here before, `fork`/`exec` vs `spawn` is
> a false dichotomy. Another alternative is a `proccreate`/`procrun`
> pair, the former of which creates an unrunnable process, the latter of
> which marks it runnable. Coupled with a set of primitives to
> manipulate the state of an extant, but unrunnable, process and you
> have the advantages of fork/exec without the downsides (which are
> well-known; https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf).
> Similarly, this gives you the functionality of spawn, without the
> downside of a singularly complicated interface. Could you have
> implemented that in something as small as the PDP-7? Perhaps not, but
> it does not follow that `fork` now remains a good primitive.
>
IMO something like that is the best model (although it probably would
have been a bit complicated for a PDP-7/PDP-11). That's basically what
I'm doing in the OS that I'm writing
<https://gitlab.com/uxrt/uxrt-toplevel>. Processes will basically just
be containers for hierarchical groups of threads, and will have pretty
much no other state besides the command line. All of the context
normally associated with a process (file descriptor space,
permissions/UID/GID, filesystem namespace, virtual address space) will
instead be in separate objects that are explicitly bound to threads.
Separate APIs for creating an empty process, creating threads within
it, manipulating context objects and binding threads to them, and
starting the process will be provided (all of these APIs will use a
file-based transport underneath; this will be the first OS I know of
where literally everything is a file). The base process APIs will be
general enough to allow an efficient copy-on-write fork() to be
implemented on top of them for backwards compatibility and the
remaining use cases where forking still makes sense (since even all
process memory will be implemented with files, this will be
implemented with a special in-memory "shadow filesystem" that creates
alternate mappings of other memory filesystems).

Really I'd say there are actually several design decisions in
conventional Unix that made sense on a PDP-7 or PDP-11, but no longer
make sense in the modern world. For instance, the rather inflexible
security model with its fixed set of root-only system calls rather
than some form of role-based access control, or the use of on-disk
device nodes bound by numbers rather than something like separate
special filesystems for each driver that get union mounted together,
or the lack of integrated support for userspace filesystem servers
(yes, there's FUSE, but it's kind of a poorly integrated hack that is
rarely used for anything important).

From meillo at marmaro.de  Mon May 13 15:23:47 2024
From: meillo at marmaro.de (markus schnalke)
Date: Mon, 13 May 2024 07:23:47 +0200
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
Message-ID: <1s6OA7-1nI-00@marmaro.de>

Hoi.

> On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso at mit.edu> wrote:
> 
>     I bet most of the young'uns would not be trying to do this as a shell
>     script, but using the Cloud SDK with perl or python or Go, which is
>     *way* more bloaty than using /bin/sh.
> 
>     So while some of us old farts might be bemoaning the death of the Unix
>     philosophy, perhaps part of the reality is that the Unix philosophy
>     were ideal for a simpler time, but might not be as good of a fit
>     today

It depends on what the Unix philosophy is seen to be. If it is
solving problems by reading text from standard in and printing to
standard out, then that might not be suitable anymore for many of
today's problems. But if it is prefering plain text to binary,
perfering simple solutions to complex ones, increasing the number
of operations one can perform by combining small generic parts,
... all because of good reasons ... Focussing on simplicity,
clarity, generality ... Omitting needless words! ... All this still
holds true, no matter if applied as shell scripts or within the
design of a new programming language or a programming interface.

It's not so much about the tools we use -- these should be suited
for the times you live in and the problems you have to solve --
but it's more about how you look at them and how you look at the
problems and what ideas for solutions you can imagine in your
mind. Here, Unix provides a continuing inspiration.

Only, like with every old book: when we read it today, we have to
read it within the background of the times back then and transfer
its message to today's times. The older the book, the more transfer
work has to be done, the more knowledgable the then younger and
more distant readers have to be, to really understand it.

Thus, in my oppinion, the Unix philosophy remains a good and very
relevant fit today, although not all of its applications from back
then still are.


meillo

From andreww591 at gmail.com  Mon May 13 16:18:00 2024
From: andreww591 at gmail.com (Andrew Warkentin)
Date: Mon, 13 May 2024 00:18:00 -0600
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized
 Tools
In-Reply-To: <1s6OA7-1nI-00@marmaro.de>
References: <CAC20D2PeOLQre8x4XcJdyGGAFmKUC3j-JB=v=j+VJ7KuFrGojw@mail.gmail.com>
 <CAGg_6+Ov6hYTxQ5M-hEBoOiUQ0UVRP0V+aVi0STKAALLDUGY7g@mail.gmail.com>
 <CAEoi9W7FbGZFhiddHWWqdivGFfgFAj9nsUApomswfP56rqTMpQ@mail.gmail.com>
 <20240511213532.GB8330@mit.edu>
 <CAP2nic1zeJG7tkDbUnEMjxUG_TmYLwCQJhfS6Yt+E1w5s2s4cw@mail.gmail.com>
 <1s6OA7-1nI-00@marmaro.de>
Message-ID: <CAD-qYGqbqLZr+3GQ0zBD0HQn9sYLfMy=Y0SS0Km8zYABtZ0QBw@mail.gmail.com>

On Sun, May 12, 2024 at 11:23 PM markus schnalke <meillo at marmaro.de> wrote:
>
>
> It depends on what the Unix philosophy is seen to be. If it is
> solving problems by reading text from standard in and printing to
> standard out, then that might not be suitable anymore for many of
> today's problems. But if it is prefering plain text to binary,
> perfering simple solutions to complex ones, increasing the number
> of operations one can perform by combining small generic parts,
> ... all because of good reasons ... Focussing on simplicity,
> clarity, generality ... Omitting needless words! ... All this still
> holds true, no matter if applied as shell scripts or within the
> design of a new programming language or a programming interface.
>
> It's not so much about the tools we use -- these should be suited
> for the times you live in and the problems you have to solve --
> but it's more about how you look at them and how you look at the
> problems and what ideas for solutions you can imagine in your
> mind. Here, Unix provides a continuing inspiration.
>
> Only, like with every old book: when we read it today, we have to
> read it within the background of the times back then and transfer
> its message to today's times. The older the book, the more transfer
> work has to be done, the more knowledgable the then younger and
> more distant readers have to be, to really understand it.
>
> Thus, in my oppinion, the Unix philosophy remains a good and very
> relevant fit today, although not all of its applications from back
> then still are.
>
I agree, but it seems that most Unix developers haven't really cared
since the side branches and clones effectively took over from Research
Unix in the early 80s. They've added system calls and ad-hoc socket
RPC interfaces with abandon instead of using generic filesystem-based
extensibility APIs, added options to various commands that should just
have been separate programs, and written desktop
environments/applications that have poor composability, extensibility
and modularity (I guess KDE's KParts kind of counts as a mechanism for
composing applications, but it's limited by being based on plugins
rather than an open IPC-based API). The only Unix desktop I can think
of that really tries to follow the Unix philosophy somewhat is the
now-abandoned Étoilé <http://etoileos.com/etoile/>. There's also the
desktops of the rather obscure BTRON family
<http://tronweb.super-nova.co.jp/btronproducts.html>, although those
OSes are only vaguely Unix-like. Both have an object-centric rather
than application-centric model with support for embedding applications
within each other and controlling them with RPC APIs.

IMO, the best practical realization of the Unix philosophy for the
modern era would be a QNX/Plan 9-like OS with an Étoilé/BTRON-like
desktop, hence why I'm working on one. Some of the specifics of the
original Unix philosophy may not be relevant to large parts of modern
computing, but I'd say the general ideas still are.

From chet.ramey at case.edu  Mon May 13 23:12:05 2024
From: chet.ramey at case.edu (Chet Ramey)
Date: Mon, 13 May 2024 09:12:05 -0400
Subject: [TUHS] nl section delimiters
In-Reply-To: <20240511204816.UwAcweCX@steffen%sdaoden.eu>
References: <CAKH6PiWG-cFwQYPKDhKg_LdDKChC8odbxnnTOjC9+KZea7OgQA@mail.gmail.com>
 <20240511204816.UwAcweCX@steffen%sdaoden.eu>
Message-ID: <d5cd2e8f-e480-41bf-b498-b52a0b6457fe@case.edu>

On 5/11/24 4:48 PM, Steffen Nurpmeso wrote:

>    The [.] command search [.] allows for a standard utility to be
>    implemented as a regular built-in as long as it is found in the
>    appropriate place in a PATH search.
>    [.]command -v true might yield /bin/true or some similar pathname.

To be fair, no one really implements this. ksh93 is the shell that comes
closest.

The next edition of the standard acknowledges the status quo with the
`intrinsics' concept.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240513/58355cec/attachment.sig>

From lm at mcvoy.com  Mon May 13 23:21:54 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Mon, 13 May 2024 06:21:54 -0700
Subject: [TUHS] forking,
 Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <alpine.BSF.2.21.9999.2405131133310.15285@aneurin.horsfall.org>
References: <20240512194707.GL9216@mcvoy.com>
 <20240512201349.0DB6A8A9D055@ary.qy>
 <CAEoi9W51WC0R-ZTFz_AC_UY5K1OAjqf6158tG0K5bK1ofhmxuw@mail.gmail.com>
 <20240512233454.GM9216@mcvoy.com>
 <alpine.BSF.2.21.9999.2405131133310.15285@aneurin.horsfall.org>
Message-ID: <20240513132154.GN9216@mcvoy.com>

On Mon, May 13, 2024 at 11:34:38AM +1000, Dave Horsfall wrote:
> On Sun, 12 May 2024, Larry McVoy wrote:
> 
> > Our spawnvp() implmentation is 40 lines of code.  Worked fine everywhere.
> > I can post it if you like.
> 
> Pretty please...
> 
> -- Dave

/*
 * Copyright 1999-2002,2004-2006,2015-2016 BitMover, Inc
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include "system.h"

void	(*spawn_preHook)(int flags, char *av[]) = 0;

#ifndef WIN32
pid_t
bk_spawnvp(int flags, char *cmdname, char *av[])
{
	int	fd, status;
	pid_t	pid;
	char	*exec;

	/* Tell the calling process right away if there is no such program */
	unless (exec = which((char*)cmdname)) return (-1);

	if (spawn_preHook) spawn_preHook(flags, av);
	if (pid = fork()) {	/* parent */
		free(exec);
		if (pid == -1) return (pid);
		unless (flags & (_P_DETACH|_P_NOWAIT)) {
			if (waitpid(pid, &status, 0) != pid) status = -1;
			return (status);
		}
		return (pid);
	} else {		/* child */
		/*
		 * See win32/uwtlib/wapi_intf.c:spawnvp_ex()
		 * We leave nothing open on a detach, but leave
		 * in/out/err open on a normal fork/exec.
		 */
		if (flags & _P_DETACH) {
			unless (getenv("_NO_SETSID")) setsid();
			/* close everything to match winblows */
			for (fd = 0; fd < 100; fd++) (close)(fd);
		} else {
			/*
			 * Emulate having everything except in/out/err
			 * as being marked as close on exec to match winblows.
			 */
			for (fd = 3; fd < 100; fd++) (close)(fd);
		}
		execv(exec, av);
		perror(exec);
		_exit(19);
	}
}

#else /* ======== WIN32 ======== */

pid_t
bk_spawnvp(int flags, char *cmdname, char *av[])
{
	pid_t	pid;
	char	*exec;

	/* Tell the calling process right away if there is no such program */
	unless (exec = which((char*)cmdname)) return (-1);

	if (spawn_preHook) spawn_preHook(flags, av);
	/*
	 * We use our own version of spawn in uwtlib
	 * because the NT spawn() does not work well with tcl
	 */
	pid = _spawnvp_ex(flags, exec, av, 1);
	free(exec);
	return (pid);
}
#endif /* WIN32 */


From douglas.mcilroy at dartmouth.edu  Mon May 13 23:34:36 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Mon, 13 May 2024 09:34:36 -0400
Subject: [TUHS] If forking is bad, how about buffering?
Message-ID: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>

So fork() is a significant nuisance. How about the far more ubiquitous
problem of IO buffering?

On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> But it does come down to the same argument as
>
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf

The Microsoft manifesto says that fork() is an evil hack. One of the cited
evils is that one must remember to flush output buffers before forking, for
fear it will be emitted twice. But buffering is the culprit, not the
victim. Output buffers must be flushed for many other reasons: to avoid
deadlock; to force prompt delivery of urgent output; to keep output from
being lost in case of a subsequent failure. Input buffers can also steal
data by reading ahead into stuff that should go to another consumer. In all
these cases buffering can break compositionality. Yet the manifesto blames
an instance of the hazard on fork()!

To assure compositionality, one must flush output buffers at every possible
point where an unknown downstream consumer might correctly act on the
received data with observable results. And input buffering must never
ingest data that the program will not eventually use. These are tough
criteria to meet in general without sacrificing buffering.

The advent of pipes vividly exposed the non-compositionality of output
buffering. Interactive pipelines froze when users could not provide input
that would force stuff to be flushed until the input was informed by that
very stuff. This phenomenon motivated cat -u, and stdio's convention of
line buffering for stdout. The premier example of input buffering eating
other programs' data was mitigated by "here documents" in the Bourne shell.

These precautions are mere fig leaves that conceal important special cases.
The underlying evil of buffered IO still lurks. The justification is that
it's necessary to match the characteristics of IO devices and to minimize
system-call overhead.  The former necessity requires the attention of
hardware designers, but the latter is in the hands of programmers. What can
be done to mitigate the pain of border-crossing into the kernel? L4 and its
ilk have taken a whack. An even more radical approach might flow from the
"whitepaper" at www.codevalley.com.

In any even the abolition of buffering is a grand challenge.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240513/3b254986/attachment.htm>

From andreww591 at gmail.com  Tue May 14 08:01:24 2024
From: andreww591 at gmail.com (Andrew Warkentin)
Date: Mon, 13 May 2024 16:01:24 -0600
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
Message-ID: <CAD-qYGp_38GsC-V64poiTnkX40bqMZqSbLXm+Ngogvo4fQTbjQ@mail.gmail.com>

On Mon, May 13, 2024 at 7:42 AM Douglas McIlroy
<douglas.mcilroy at dartmouth.edu> wrote:
>
>
> These precautions are mere fig leaves that conceal important special cases. The underlying evil of buffered IO still lurks. The justification is that it's necessary to match the characteristics of IO devices and to minimize system-call overhead.  The former necessity requires the attention of hardware designers, but the latter is in the hands of programmers. What can be done to mitigate the pain of border-crossing into the kernel? L4 and its ilk have taken a whack. An even more radical approach might flow from the "whitepaper" at www.codevalley.com.
>
QNX copies messages directly between address spaces without any
intermediary buffering, similarly to L4-like kernels. However, some of
its libraries and servers do still use intermediary buffers.

From robpike at gmail.com  Tue May 14 17:10:38 2024
From: robpike at gmail.com (Rob Pike)
Date: Tue, 14 May 2024 17:10:38 +1000
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
Message-ID: <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>

I agree with your (as usual) perceptive analysis. Only stopping by to point
out that I took the buffering out of cat. I didn't have your perspicacity
on why it should happen, just a desire to remove all the damn flags. When I
was done, cat.c was 35 lines long. Do a read, do a write, continue until
EOF. Guess what? That's all you need if you want to cat files.

Sad to say Bell Labs's cat door was hard to open and most of the world
still has a cat with flags. And buffers.

-rob


On Mon, May 13, 2024 at 11:35 PM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

> So fork() is a significant nuisance. How about the far more ubiquitous
> problem of IO buffering?
>
> On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> > But it does come down to the same argument as
> >
> https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
>
> The Microsoft manifesto says that fork() is an evil hack. One of the cited
> evils is that one must remember to flush output buffers before forking, for
> fear it will be emitted twice. But buffering is the culprit, not the
> victim. Output buffers must be flushed for many other reasons: to avoid
> deadlock; to force prompt delivery of urgent output; to keep output from
> being lost in case of a subsequent failure. Input buffers can also steal
> data by reading ahead into stuff that should go to another consumer. In all
> these cases buffering can break compositionality. Yet the manifesto blames
> an instance of the hazard on fork()!
>
> To assure compositionality, one must flush output buffers at every
> possible point where an unknown downstream consumer might correctly act on
> the received data with observable results. And input buffering must never
> ingest data that the program will not eventually use. These are tough
> criteria to meet in general without sacrificing buffering.
>
> The advent of pipes vividly exposed the non-compositionality of output
> buffering. Interactive pipelines froze when users could not provide input
> that would force stuff to be flushed until the input was informed by that
> very stuff. This phenomenon motivated cat -u, and stdio's convention of
> line buffering for stdout. The premier example of input buffering eating
> other programs' data was mitigated by "here documents" in the Bourne shell.
>
> These precautions are mere fig leaves that conceal important special
> cases. The underlying evil of buffered IO still lurks. The justification is
> that it's necessary to match the characteristics of IO devices and to
> minimize system-call overhead.  The former necessity requires the attention
> of hardware designers, but the latter is in the hands of programmers. What
> can be done to mitigate the pain of border-crossing into the kernel? L4 and
> its ilk have taken a whack. An even more radical approach might flow from
> the "whitepaper" at www.codevalley.com.
>
> In any even the abolition of buffering is a grand challenge.
>
> Doug
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240514/2f9f5b28/attachment.htm>

From g.branden.robinson at gmail.com  Tue May 14 21:10:32 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Tue, 14 May 2024 06:10:32 -0500
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
Message-ID: <20240514111032.2kotrrjjv772h5f4@illithid>

I've wondered about the cat flag war myself, and have a theory.  Might
as well air it here since the real McCoy (and McIlroy) are available to
shoot it down.  :)

I'm sure the following attempt at knot-slashing is not novel, but people
relentlessly return to this issue as if the presence of _flags_ is the
problem.  (Plan 9 fans recite this point ritually, like a mantra.)

I say it isn't.

At 2024-05-14T17:10:38+1000, Rob Pike wrote:
> I agree with your (as usual) perceptive analysis. Only stopping by to
> point out that I took the buffering out of cat. I didn't have your
> perspicacity on why it should happen, just a desire to remove all the
> damn flags. When I was done, cat.c was 35 lines long. Do a read, do a
> write, continue until EOF. Guess what? That's all you need if you want
> to cat files.
> 
> Sad to say Bell Labs's cat door was hard to open and most of the world
> still has a cat with flags. And buffers.

I think this dispute is a proxy fight between two communities, or more
precisely two views of what cat(1), and other elementary Unix commands,
primarily exist to achieve.  In my opinion both perspectives are valid,
and it's better to consider what each perspective wants than mandate
that either is superior.

Viewpoint 1: Perspective from Pike's Peak

Elementary Unix commands should be elementary.  Unix is a kernel.
Programs that do simple things with system calls should remain simple.
This practices makes the system (the kernel interface) easier to learn,
and to motivate and justify to others.  Programs therefore test the
simplicity and utility of, and can reveal flaws in, the set of
primitives that the kernel exposes.  This is valuable stuff for a
research organization.  "Research" was right there in the CSRC's name.

Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]

cat(1)'s man page did not advertise the traits in the foregoing
viewpoint as objectives, and never did.[2]  Its avowed purpose was to
copy, without interruption or separation, 1..n files from storage to and
output channel or stream (which might be redirected).

I don't need to tell convince that this is a worthwhile application.
But when we think about the many possible ways--and destinations--a
person might have in mind for that I/O channel, we have to face the
necessity of buffering or performance goes through the floor.

It is 1978.  Some VMS or, ugh, CP/M advocate from those piddly little
toy machines will come along.  "Ha ha," they will say, "our OS is way
faster than the storied Unix even at the simple task of dumping files".

Nowhere[citation needed] outside of C tutorials is cat implemented as

int c;
while((c = getchar()) != EOF) putchar(c);

or its read()/write() system call equivalent.

The output channel might be across a network in a distributed computing
environment.  Nobody wants to work with one byte at a time in that
situation.  Ethernet's minimum packet size is 64 bytes.  No one wants
that kind of overhead.

While composing this mail, I had a look at an early, pre-C version of
cat, spelling error in the only comment line and all.

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/cat.s

putc:
	movb	r0,(r2)+
	cmp	r2,$obuf+512.
	blo	1f
	mov	$1,r0
	sys	write; obuf; 512.
	mov	$obuf,r2

Well, look at that.  Buffering.  The author of this tool of course knew
the kernel well, including the size of its internal disk buffers (on the
assumption that I/O would mainly be happening to and from disks).

But that's a "leaky abstraction", or a "layering violation".  (That'll
be two tickets to the eternal fires of Brogrammer Hell, thanks.)  Once
you sweep away the break room buzzwords we understand that cat is
presuming things that it should not (the size of the kernel's buffers,
and the nature of devices serving as source and sink).

And this, as we all know, is one of the reasons the standard I/O library
came into existence.  Mike Lesk, I surmise, understood that the
"applications programmer" having knowledge of kernel internals was in
general neither necessary nor desirable.

What _should_ have happened, IMAO, is that as stdio.h came into
existence and the commercialization and USG/PWB-ification of Unix became
truly inevitable, is that Viewpoint 1 should have been salvaged for the
benefit of continuing operating systems research and kernel development.

But!

We should have kept cat(1), and let it grow as many flags as practical
use demanded--_except_ for `-u`--and at the _same time_ developed a new
kcat(1) command that really was just a thin wrapper around system calls.
Then you'd be a lot closer to measuring what the kernel was really
doing, what you were paying for it, and you could still boast of your
elegance in OS textbooks.

I concede that the name "kcat" would have been twice the length a
certain prominent user of the Unix kernel would have tolerated.  Maybe
"kc" would have been better.  The remaining 61 alphanumeric sigils that
might follow the 'k' would have been reserved for other exercises of the
kernel interface.  If your kernel is sufficiently lean,[3] 62 cases
exercising it ought to be enough for anybody.

Regards,
Branden

[1] https://news.ycombinator.com/item?id=29082014
[2] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[3] https://dl.acm.org/doi/10.1145/224056.224075
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240514/1e5a0f81/attachment.sig>

From ggm at algebras.org  Wed May 15 08:08:49 2024
From: ggm at algebras.org (George Michaelson)
Date: Wed, 15 May 2024 08:08:49 +1000
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
Message-ID: <CAKr6gn0Dm+qc+jM6UTfwe0oZGEXC5tY91e49KNFjps0q0nNS2w@mail.gmail.com>

Maybe dd is the right place to decide how to buffer?  It appears to
understand thats part of it's role. I use mbuffer, and I have
absolutely no idea if its proffered buffer, scatter/gather, SETSOCKOPT
behaviour does or does not improve things but I use it, even though
netcat exists...

G

From tuhs at tuhs.org  Wed May 15 08:34:37 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Tue, 14 May 2024 15:34:37 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
Message-ID: <A673B76D-4E21-4884-8FD1-1038E92F9B04@iitbombay.org>

Buffering is used all over the place. Even serial devices use a 16 byte of buffer -- all to reduce the cost of per unit (character, disk block or packet etc.) processing or to smooth data flow or to utilize the available bandwidth. But in such applications the receiver/sender usually has a way of getting an alert when the FIFO has data/is empty. As long as you provide that you can compose more complex network of components. Imagine components connected via FIFOs that provide empty, almost empty, almost full, full signals. And may be more in case of lossy connections. [Though at a lower level you'd model these fifo as components too so at that level there'd be *no* buffering! Sort of like Carl Hewitt's Actor model!]

Your complaint seems more about how buffers are currently used and where the "network" of components are dynamically formed.

> On May 13, 2024, at 6:34 AM, Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
> 
> So fork() is a significant nuisance. How about the far more ubiquitous problem of IO buffering?
> 
> On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> > But it does come down to the same argument as
> > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
> 
> The Microsoft manifesto says that fork() is an evil hack. One of the cited evils is that one must remember to flush output buffers before forking, for fear it will be emitted twice. But buffering is the culprit, not the victim. Output buffers must be flushed for many other reasons: to avoid deadlock; to force prompt delivery of urgent output; to keep output from being lost in case of a subsequent failure. Input buffers can also steal data by reading ahead into stuff that should go to another consumer. In all these cases buffering can break compositionality. Yet the manifesto blames an instance of the hazard on fork()! 
> 
> To assure compositionality, one must flush output buffers at every possible point where an unknown downstream consumer might correctly act on the received data with observable results. And input buffering must never ingest data that the program will not eventually use. These are tough criteria to meet in general without sacrificing buffering.
> 
> The advent of pipes vividly exposed the non-compositionality of output buffering. Interactive pipelines froze when users could not provide input that would force stuff to be flushed until the input was informed by that very stuff. This phenomenon motivated cat -u, and stdio's convention of line buffering for stdout. The premier example of input buffering eating other programs' data was mitigated by "here documents" in the Bourne shell.
> 
> These precautions are mere fig leaves that conceal important special cases. The underlying evil of buffered IO still lurks. The justification is that it's necessary to match the characteristics of IO devices and to minimize system-call overhead.  The former necessity requires the attention of hardware designers, but the latter is in the hands of programmers. What can be done to mitigate the pain of border-crossing into the kernel? L4 and its ilk have taken a whack. An even more radical approach might flow from the "whitepaper" at www.codevalley.com <http://www.codevalley.com/>.
> 
> In any even the abolition of buffering is a grand challenge.
> 
> Doug

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240514/d6a2bb1b/attachment.htm>

From crossd at gmail.com  Thu May 16 00:42:33 2024
From: crossd at gmail.com (Dan Cross)
Date: Wed, 15 May 2024 10:42:33 -0400
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240514111032.2kotrrjjv772h5f4@illithid>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
Message-ID: <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>

On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson
<g.branden.robinson at gmail.com> wrote:
> [snip]
> Viewpoint 1: Perspective from Pike's Peak

Clever.

> Elementary Unix commands should be elementary.  Unix is a kernel.
> Programs that do simple things with system calls should remain simple.
> This practices makes the system (the kernel interface) easier to learn,
> and to motivate and justify to others.  Programs therefore test the
> simplicity and utility of, and can reveal flaws in, the set of
> primitives that the kernel exposes.  This is valuable stuff for a
> research organization.  "Research" was right there in the CSRC's name.

I believe this is at once making a more complex argument than was
proffered, and at the same misses the contextual essence that Unix was
created in.

> Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]
>
> cat(1)'s man page did not advertise the traits in the foregoing
> viewpoint as objectives, and never did.[2]  Its avowed purpose was to
> copy, without interruption or separation, 1..n files from storage to and
> output channel or stream (which might be redirected).
>
> I don't need to tell convince that this is a worthwhile application.
> But when we think about the many possible ways--and destinations--a
> person might have in mind for that I/O channel, we have to face the
> necessity of buffering or performance goes through the floor.
>
> It is 1978.  Some VMS

I don't know about that; VMS IO is notably slower than Unix IO by
default. Unlike VMS, Unix uses the buffer cache to serialize access to
the underlying storage device(s). Ironically, caching here is a major
win, not just for speed, but to make it relatively easy to reason
about the state of a block, since that state is removed from the
minutiae of the underlying storage device and instead handled in the
bio layer. Treating the block cache as a fixed-size pool yields a
relatively simple state machine for synchronizing between the
in-memory and on-disk representations of data.

>[snip]
> And this, as we all know, is one of the reasons the standard I/O library
> came into existence.  Mike Lesk, I surmise, understood that the
> "applications programmer" having knowledge of kernel internals was in
> general neither necessary nor desirable.

I'm not sure about that.  I suspect that the justification _may_ have
been more along the lines of noting that many programs implemented
their own, largely similar buffering strategies, and that it was
preferable to centralize those into a single library, and also noting
that building some kinds of programs was inconvenient using raw system
calls. For instance, something like `gets` is handy, but is _annoying_
to write using just read(2). It can obviously be done, but if I don't
have to, I'd prefer not to.

> [snip]
> We should have kept cat(1), and let it grow as many flags as practical
> use demanded--_except_ for `-u`--and at the _same time_ developed a new
> kcat(1) command that really was just a thin wrapper around system calls.
> Then you'd be a lot closer to measuring what the kernel was really
> doing, what you were paying for it, and you could still boast of your
> elegance in OS textbooks.
> [snip]

Here's where I think this misses the mark: this focuses too much on
the idea that simple programs exist as to be tests for, and exemplars
of, the kernel system call interface, but what evidence do you have
for that? A simpler explanation is that simple programs are easier to
write, easier to read, easier to reason about, test, and examine for
correctness. Unix amplified this with Doug's "garden hoses of data"
idea and the advent of pipes; here, it was found that small, simple
programs could be combined in often surprisingly unanticipated ways.

Unix built up a philosophy about _how_ to write programs that was
rooted in the problems that were interesting when Unix was first
created. Something we often forget is that research systems are built
to address problems that are interesting _to the researchers who build
them_. This context can shape a system, and we see that with Unix: a
highly synchronous system call interface, because overly elaborate
async interfaces were hard to program; a simple file abstraction that
was easy to use (open/creat/read/write/close/seek/stat) because files
on other contemporary systems were baroque things that were difficult
to use; a simple primitive for the creation of processes because,
again, on other systems processes were very heavy, complicated things
that were difficult to use. Unix took problems related to IO and
processes and made them easy. By the 80s, these were pretty well
understood, so focus shifted to other things (languages, networking,
etc).

Unix is one of those rare beasts that escaped the lab and made it out
there in the wild. It became the workhorse that beget a whole two or
three generations of commercial work; it's unsurprising that when the
web explosion happened, Unix became the basis for it: it was there, it
was familiar, and by then it wasn't a research project anymore, but a
basis for serious commercial work. That it has retained the original
system call interface is almost incidental; perhaps that fits with
your brocolli-man analogy.

        - Dan C.

From g.branden.robinson at gmail.com  Thu May 16 02:42:12 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Wed, 15 May 2024 11:42:12 -0500
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
Message-ID: <20240515164212.beswgy4h2nwvbdck@illithid>

Hi Dan,

Thanks for the considered response.  I was beginning to fear that my
musing was of moronically minimal merit.

At 2024-05-15T10:42:33-0400, Dan Cross wrote:
> On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson
> <g.branden.robinson at gmail.com> wrote:
> > [snip]
> > Viewpoint 1: Perspective from Pike's Peak
> 
> Clever.

If Rob's never heard _that_ one before, I am deeply disappointed.

> > Elementary Unix commands should be elementary.  Unix is a kernel.
> > Programs that do simple things with system calls should remain
> > simple.  This practices makes the system (the kernel interface)
> > easier to learn, and to motivate and justify to others.  Programs
> > therefore test the simplicity and utility of, and can reveal flaws
> > in, the set of primitives that the kernel exposes.  This is valuable
> > stuff for a research organization.  "Research" was right there in
> > the CSRC's name.
> 
> I believe this is at once making a more complex argument than was
> proffered, and at the same misses the contextual essence that Unix was
> created in.

My understanding of that context is, "a pleasant environment for
software development" (McIlroy)[0].  My notion of software development
entails (when not under managerial pressure to bang something together
for the exploitation of "market advantage") analysis and reanalysis of
software components to make them more efficient and more composable.

As a response to the perceived bloat of Multics, the development of the
Unix kernel absolutely involved much critical reappraisal of what
_needed_ to be in a kernel, and of which services were so essential that
they must be offered.

As a microkernel Kool-Aid drinker, I tend to view Unix's origin in that
light, which was reinforced by the severe limitations of the PDP-7 where
it was born.  Possibly many of the decisions about where to draw the
kernel service/userspace service line we made by instinct or seasoned
judgment, but the CSRC being a research organization, I'd be surprised
if matters of empirical measurement were far from top of mind.

It's a shame we don't have more insight into Thompson's development
process, especially in those early days.  I think we have a tendency to
conceive of Unix as having sprung from his fingers already crystallized,
like a mineral Athena from the forehead of Zeus.  I would wager (and
welcome correction if he has the patience) that he made and reversed
decisions based on the experience of using the system.  Some episodes in
McIlroy's "A Research Unix Reader" illustrate that this was a recurring
feature of its _later_ development, so why not in the incubation period?
That, too, is empirical measurement, even if informal.  Many revisions
are made in software because we find in testing that something is "too
damn slow", or runs the system out of memory too often.

So to summarize, I want to push back on your counter here.  Making
little things to measure system features is a salutary practice in OS
development.  Stevens's _Advanced Programming in the Unix Environment_
is, shall we say, tricked out with exhibits along these lines.  The
author's dedication to _measurement_ as opposed to partisan opinion is,
I think, a major factor in its status as a landmark work and as
nigh-essential reading for the serious Unix developer to this day.

Put differently, why would anyone _care_ about making cat(1) simple if
one didn't have these objectives in mind?

> > Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]
> >
> > cat(1)'s man page did not advertise the traits in the foregoing
> > viewpoint as objectives, and never did.[2]  Its avowed purpose was
> > to copy, without interruption or separation, 1..n files from storage
> > to and output channel or stream (which might be redirected).
> >
> > I don't need to tell convince that this is a worthwhile application.
> > But when we think about the many possible ways--and destinations--a
> > person might have in mind for that I/O channel, we have to face the
> > necessity of buffering or performance goes through the floor.
> >
> > It is 1978.  Some VMS
> 
> I don't know about that; VMS IO is notably slower than Unix IO by
> default. Unlike VMS, Unix uses the buffer cache to serialize access to
> the underlying storage device(s).

I must confess I have little experience with VMS (and none more recent
than 30 years ago) and offered it as an example mainly because it was
actually around in 1978 (if still fresh from the foundry).

My personal backstory is much more along the lines of my other example,
CP/M on toy computers (8-bit data bus pffffffft, right?).

> Ironically, caching here is a major win, not just for speed, but to
> make it relatively easy to reason about the state of a block, since
> that state is removed from the minutiae of the underlying storage
> device and instead handled in the bio layer. Treating the block cache
> as a fixed-size pool yields a relatively simple state machine for
> synchronizing between the in-memory and on-disk representations of
> data.

I entirely agree with this.  I contemplated following up Bakul Shah's
post with a mention of Jim Gettys's work on bufferbloat.[1]  So let me
do that here, and venture the opinion that a "buffer" as popularly
conceived and implemented (more or less just a hunk of memory to house
data) is too damn dumb a data structure for many of the uses to which it
is put.

If/when people address these problems, they do what the Unix buffer
cache did; they elaborate it with state.  This is a repeated design
pattern: see SIGURG for example.

Off the top of my head I perceive three circumstances that buffers often
need to manage.

1.  Avoidance of underrun.  Such were the joys of CD-R burning.  But
    also important in streaming or other real-time applications to avoid
    interruption.  Essentially you want to be able to say, "I'm running
    out of data at the current rate, please supply more ASAP".

2.  Avoidance of overrun.  The problems of modem-like flow control are
    familiar to most.  An important insight here, reinforced if not
    pioneered by Gettys, is that "just making the buffer bigger", the
    brogrammer solution, is not always the wise choice.

3.  Cancellation.  Familiar to all as SIGPIPE.  Sometimes all of the
    data in the buffer is invalidated.  The sender needs to stop
    transmitting ASAP, and the receiver can discard whatever it has.

I apologize for the armchair approach.  I have no doubt that much
literature exists that has covered this stuff far more rigorously.  And
yet much of that knowledge has not made its way down the mountain into
practice.  That, I think, was at least part of Doug's point.  Academics
may have considered the topic adequately, but practitioners are too
often solving problems as if it's 1972.

> >[snip]
> > And this, as we all know, is one of the reasons the standard I/O
> > library came into existence.  Mike Lesk, I surmise, understood that
> > the "applications programmer" having knowledge of kernel internals
> > was in general neither necessary nor desirable.
> 
> I'm not sure about that.  I suspect that the justification _may_ have
> been more along the lines of noting that many programs implemented
> their own, largely similar buffering strategies, and that it was
> preferable to centralize those into a single library, and also noting
> that building some kinds of programs was inconvenient using raw system
> calls. For instance, something like `gets` is handy,

An interesting choice given its notoriety as a nuclear landmine of
insecurity.  ;-)

> but is _annoying_ to write using just read(2). It can obviously be
> done, but if I don't have to, I'd prefer not to.

I think you are justifying why stdio was written _as a library_, as your
points seem to be pretty typical examples of why we move code thither
from applications.  My emphasis is a little different: why was buffered
I/O in particular (when it could so easily have been string handling)
the nucleus of what would be become a large standard library with its
toes in many waters, so huge that projects like uclibc and musl arose
for the purpose of (in part) chopping back out the stuff they felt they
didn't need?

My _claim_ is that stdio.h was the first piece of the library to walk
upright because the need for it was most intense.  More so than with
strings; in fact we've learned that Nelson's original C string library
was tricky to use well, was often elaborated by others in unfortunate
ways.[7]

But there was no I/O at all without going through the kernel, and while
there were many ways to get that job done, the best leveraged knowledge
of what the kernel had to work with.  And yet, the kernel might get
redesigned.

Could stdio itself have been done better?  Korn and Vo tried.[8]

> Here's where I think this misses the mark: this focuses too much on
> the idea that simple programs exist as to be tests for, and exemplars
> of, the kernel system call interface, but what evidence do you have
> for that?

A little bit of experience, long after the 1970s, of working with
automated tests for the seL4 microkernel.

> A simpler explanation is that simple programs are easier to
> write, easier to read, easier to reason about, test, and examine for
> correctness.

All certainly true.  But these things are just as true of programs that
don't directly make system calls at all.  cat(1), as ideally envisioned
by Pike (if I understand the Platonic ideal of his position correctly),
not only makes system calls, but dirties its hands with the standard
library as little as possible (if you recognize no options, you need
neither call nor reimplement getopt(3)) and certainly not for the
central task.

Again I think we are not so much disagreeing as much as I'm finding out
that I didn't adequately emphasize the distinctions I was making.

> Unix amplified this with Doug's "garden hoses of data" idea and the
> advent of pipes; here, it was found that small, simple programs could
> be combined in often surprisingly unanticipated ways.

Agreed; but given that pipes-as-a-service are supplied by the _kernel_,
we are once again talking about system calls.

One of the projects I never got off the ground with seL4 was a
reconsideration from first principles of what sorts of more or less
POSIXish buffering and piping mechanisms should be offered (in userland
of course).  For those who are scandalized that a microkernel doesn't
offer pipes itself, see this Heiser piece on "IPC" in that system.[2]

> Unix built up a philosophy about _how_ to write programs that was
> rooted in the problems that were interesting when Unix was first
> created. Something we often forget is that research systems are built
> to address problems that are interesting _to the researchers who build
> them_.

I agree.

> This context can shape a system, and we see that with Unix: a
> highly synchronous system call interface, because overly elaborate
> async interfaces were hard to program;

And still are, apparently even without the qualifier "overly elaborate".
...though Go (and JavaScript?) fans may disagree.

> a simple file abstraction that was easy to use
> (open/creat/read/write/close/seek/stat) because files on other
> contemporary systems were baroque things that were difficult to use;

Absolutely.  It's a truism in the Unix community that it's possible to
simulated record-oriented storage and retrieval on top of a byte stream,
but hard to do the converse.

Though, being a truism, it might be worthwhile to critically reconsider
it and more rigorously establish how we know what we think we know.
That's another reason I endorse the microkernel mission.  Let's lower
the cost of experimentation on parts of the system that of themselves
don't demand privilege.  It's a highly concurrent, NUMA world out there.

> a simple primitive for the creation of processes because, again, on
> other systems processes were very heavy, complicated things that were
> difficult to use.

It is with some dismay that I look at what they are, _on Unix_, today.

https://github.com/torvalds/linux/blob/1b294a1f35616977caddaddf3e9d28e576a1adbc/include/linux/sched.h#L748
https://github.com/openbsd/src/blob/master/sys/sys/proc.h#L138

Contrast:

https://github.com/jeffallen/xv6/blob/master/proc.h#L65

> Unix took problems related to IO and processes and made them easy. By
> the 80s, these were pretty well understood, so focus shifted to other
> things (languages, networking, etc).

True, but beside my point.  Pike's point about cat and its flags was, I
think, a call to reconsider more fundamental things.  To question what
we thought we knew--about how best to design core components of the
system, for example.  Do we really need the efflorescence of options
that perfuses not simply the GNU versions of such components (a popular
sink for abuse), but Busybox and *BSD implementations as well?

Every developer of such a component should consider the cost/benefit
ratio of flags, and then RE-consider them at intervals.  Even at the
cost of backward compatibility.  (Deprecation cycles and
mitigation/migration plans are good.)

> Unix is one of those rare beasts that escaped the lab and made it out
> there in the wild. It became the workhorse that beget a whole two or
> three generations of commercial work; it's unsurprising that when the
> web explosion happened, Unix became the basis for it: it was there, it
> was familiar, and by then it wasn't a research project anymore, but a
> basis for serious commercial work.

Yes, and in a sense this success has cost all of us.[3][4][5]

> That it has retained the original system call interface is almost
> incidental;

In _structure_, sure; in detail, I'm not sure this claim withstands
scrutiny.  Just _count_ the system calls we have today vs. V6 or V7.

> perhaps that fits with your brocolli-man analogy.

I'm unfamiliar with this metaphor.  It makes me wonder how to place it
in company with the requirements documents that led to the Ada language:
Strawman, Woodenman, Ironman, and Steelman.

At least it's likely better eating than any of those.  ;-)

Since no one else ever says it on this list, let me point out what a
terrific and unfairly maligned language Ada is.  In reading the
minutes of the latest WG14 meeting[6] I marvel anew at how C has over
time slowly, slowly accreted type- and memory-safety features that Ada
had in 1983 (or even in 1980, before its formal standardization).

Regards,
Branden

[0] https://www.gnu.org/software/groff/manual/groff.html.node/Background.html
[1] https://gettys.wordpress.com/category/bufferbloat/
[2] https://microkerneldude.org/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/
[3] https://tianyin.github.io/misc/irrelevant.pdf (guess who)
[4] https://www.youtube.com/watch?v=36myc8wQhLo (Timothy Roscoe)
[5] https://queue.acm.org/detail.cfm?id=3212479 (David Chisnall)
[6] https://www.open-std.org/JTC1/sc22/wg14/www/docs/n3227.htm
    Skip down to section 5.  Note particularly `_Optional`.
[7] https://www.symas.com/post/the-sad-state-of-c-strings
[8] https://www.semanticscholar.org/paper/SFIO%3A-Safe-Fast-String-File-IO-Korn-Vo/8014266693afda38a0a177a9b434fedce98eb7de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240515/e075f340/attachment-0001.sig>

From dave at horsfall.org  Thu May 16 16:34:54 2024
From: dave at horsfall.org (Dave Horsfall)
Date: Thu, 16 May 2024 16:34:54 +1000 (EST)
Subject: [TUHS] Be there a "remote diff" utility?
Message-ID: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>

Every so often I want to compare files on remote machines, but all I can 
do is to fetch them first (usually into /tmp); I'd like to do something 
like:

    rdiff host1:file1 host2:file2

Breathes there such a beast?  I see that Penguin/OS has already taken 
"rdiff" which doesn't seem to do what I want.

Think of it as an extension to the Unix philosophy of "Everything looks 
like a file"...

-- Dave

From arnold at skeeve.com  Thu May 16 16:51:43 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Thu, 16 May 2024 00:51:43 -0600
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
Message-ID: <202405160651.44G6pi8G018059@freefriends.org>

Maybe

	diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)

?

That could be put into a shell script that does the approriate
text manipulations on the original $1 and $2.

HTH,

Arnold

Dave Horsfall <dave at horsfall.org> wrote:

> Every so often I want to compare files on remote machines, but all I can 
> do is to fetch them first (usually into /tmp); I'd like to do something 
> like:
>
>     rdiff host1:file1 host2:file2
>
> Breathes there such a beast?  I see that Penguin/OS has already taken 
> "rdiff" which doesn't seem to do what I want.
>
> Think of it as an extension to the Unix philosophy of "Everything looks 
> like a file"...
>
> -- Dave

From ralph at inputplus.co.uk  Thu May 16 17:33:51 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Thu, 16 May 2024 08:33:51 +0100
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <202405160651.44G6pi8G018059@freefriends.org>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
Message-ID: <20240516073351.267351FAE3@orac.inputplus.co.uk>

Hi,

I've set ‘mail-followup-to: coff at tuhs.org’.

> > Every so often I want to compare files on remote machines, but all
> > I can do is to fetch them first (usually into /tmp); I'd like to do
> > something like:
> >
> >     rdiff host1:file1 host2:file2
> >
> > Breathes there such a beast?

No, nor should there.  It would be slain less it beget rcmp, rcomm,
rpaste, ...

> > Think of it as an extension to the Unix philosophy of "Everything
> > looks like a file"...

Then make remote files look local as far as their access is concerned.
Ideally at the system-call level.  Less ideal, at libc.a.

> Maybe
>
>     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)

This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
‘Banner’ set which spews the contents of a file before authentication,
e.g. the pointless

    This computer system is the property of ...

    Disconnect NOW if you have not been expressly authorised to use this
    system.  Unauthorised use is a criminal offence under the Computer
    Misuse Act 1990.

    Communications on or through ...uk's computer systems may be
    monitored or recorded to secure effective system operation and for
    other lawful purposes.

It appears on stderr so doesn't upset the diff but does clutter.
And discarding stderr is too sloppy.

-- 
Cheers, Ralph.

From ggm at algebras.org  Thu May 16 18:59:42 2024
From: ggm at algebras.org (George Michaelson)
Date: Thu, 16 May 2024 18:59:42 +1000
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <20240516073351.267351FAE3@orac.inputplus.co.uk>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
 <20240516073351.267351FAE3@orac.inputplus.co.uk>
Message-ID: <CAKr6gn05iL4JNip01DYax8d-rqH6Azu2xvgGbVLdDfhJ_4tWGQ@mail.gmail.com>

Sshfs

G

On Thu, 16 May 2024, 5:34 pm Ralph Corderoy, <ralph at inputplus.co.uk> wrote:

> Hi,
>
> I've set ‘mail-followup-to: coff at tuhs.org’.
>
> > > Every so often I want to compare files on remote machines, but all
> > > I can do is to fetch them first (usually into /tmp); I'd like to do
> > > something like:
> > >
> > >     rdiff host1:file1 host2:file2
> > >
> > > Breathes there such a beast?
>
> No, nor should there.  It would be slain less it beget rcmp, rcomm,
> rpaste, ...
>
> > > Think of it as an extension to the Unix philosophy of "Everything
> > > looks like a file"...
>
> Then make remote files look local as far as their access is concerned.
> Ideally at the system-call level.  Less ideal, at libc.a.
>
> > Maybe
> >
> >     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)
>
> This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
> ‘Banner’ set which spews the contents of a file before authentication,
> e.g. the pointless
>
>     This computer system is the property of ...
>
>     Disconnect NOW if you have not been expressly authorised to use this
>     system.  Unauthorised use is a criminal offence under the Computer
>     Misuse Act 1990.
>
>     Communications on or through ...uk's computer systems may be
>     monitored or recorded to secure effective system operation and for
>     other lawful purposes.
>
> It appears on stderr so doesn't upset the diff but does clutter.
> And discarding stderr is too sloppy.
>
> --
> Cheers, Ralph.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/a96d0114/attachment.htm>

From arnold at skeeve.com  Thu May 16 19:01:11 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Thu, 16 May 2024 03:01:11 -0600
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <20240516073351.267351FAE3@orac.inputplus.co.uk>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
 <20240516073351.267351FAE3@orac.inputplus.co.uk>
Message-ID: <202405160901.44G91CN0007274@freefriends.org>

Ralph Corderoy <ralph at inputplus.co.uk> wrote:

> > Maybe
> >
> >     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)
>
> This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
> ‘Banner’ set which spews the contents of a file before authentication,
> e.g. the pointless
>
> [....]
>
> It appears on stderr so doesn't upset the diff but does clutter.

All true, I didn't think about that.

> And discarding stderr is too sloppy.

But the author of a personal script knows his/her remote machines
and can decide if

     diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2 2>/dev/null)

is appropriate or not.

My main point was that the problem is easily solved with a
few lines of shell, so no need for a utility, especially one
written in C or some other compiled language.

Thanks,

Arnold

From douglas.mcilroy at dartmouth.edu  Thu May 16 22:31:27 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Thu, 16 May 2024 08:31:27 -0400
Subject: [TUHS] Be there a "remote diff" utility?
Message-ID: <CAKH6PiVk9ZrcwpWVhEbfUAWSMhbK89XKgogWb8GV9BL=gQP3+w@mail.gmail.com>

With the disclaimer that I have never used it, I note that FUSE/sshfs
allows one to  mount remote file systems.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/89bee550/attachment.htm>

From rminnich at gmail.com  Fri May 17 03:08:40 2024
From: rminnich at gmail.com (ron minnich)
Date: Thu, 16 May 2024 10:08:40 -0700
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <202405160901.44G91CN0007274@freefriends.org>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
 <20240516073351.267351FAE3@orac.inputplus.co.uk>
 <202405160901.44G91CN0007274@freefriends.org>
Message-ID: <CAP6exYLYjqFP+kzC7TK19=Esfi2bDv5RfLjMmoj-pyBfgisaLA@mail.gmail.com>

" The 9import tool allows an arbitrary file on a remote system, with the
capability of running the Plan 9 exportfs(4) service, to be imported into
the local name space. Usually file is a directory, so the complete file
tree under the directory is made available."
https://9fans.github.io/plan9port/man/man4/9import.html

9import host1 / /tmp/host1
9import host2 /tmp/host2
diff /tmp/host1/a/b/c /tmp/host2/a/b/c
(or whatever command you want that works with files. No need for stuff like
'rdiff' etc.)

stuff you take for granted on some systems ...

I have the plan 9 cpu command working (written in Go) and I think it's time
I get import working more widely, it's just too useful.

On Thu, May 16, 2024 at 2:01 AM <arnold at skeeve.com> wrote:

> Ralph Corderoy <ralph at inputplus.co.uk> wrote:
>
> > > Maybe
> > >
> > >     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)
> >
> > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
> > ‘Banner’ set which spews the contents of a file before authentication,
> > e.g. the pointless
> >
> > [....]
> >
> > It appears on stderr so doesn't upset the diff but does clutter.
>
> All true, I didn't think about that.
>
> > And discarding stderr is too sloppy.
>
> But the author of a personal script knows his/her remote machines
> and can decide if
>
>      diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2
> 2>/dev/null)
>
> is appropriate or not.
>
> My main point was that the problem is easily solved with a
> few lines of shell, so no need for a utility, especially one
> written in C or some other compiled language.
>
> Thanks,
>
> Arnold
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/36dbae55/attachment.htm>

From tuhs at tuhs.org  Fri May 17 03:12:27 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Thu, 16 May 2024 10:12:27 -0700
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <CAKH6PiVk9ZrcwpWVhEbfUAWSMhbK89XKgogWb8GV9BL=gQP3+w@mail.gmail.com>
References: <CAKH6PiVk9ZrcwpWVhEbfUAWSMhbK89XKgogWb8GV9BL=gQP3+w@mail.gmail.com>
Message-ID: <A5DC2242-12E0-4425-B14B-0DAB287D4752@iitbombay.org>

An interesting question is whether there exists a diff algorithm
which *minimizes* data movement across the network. Assuming similar
lengths, you can halve it by running the diff at one of the hosts
but can one do better if the two files are fairly similar? Is this
even a theoretical possibility? I don't see links to any such
algorithm on wikipedia's diff page but I figured there might be
someone on TUHS who may have speculated or know about this!

Bakul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/261ebae1/attachment.htm>

From rich.salz at gmail.com  Fri May 17 04:12:12 2024
From: rich.salz at gmail.com (Rich Salz)
Date: Thu, 16 May 2024 14:12:12 -0400
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <A5DC2242-12E0-4425-B14B-0DAB287D4752@iitbombay.org>
References: <CAKH6PiVk9ZrcwpWVhEbfUAWSMhbK89XKgogWb8GV9BL=gQP3+w@mail.gmail.com>
 <A5DC2242-12E0-4425-B14B-0DAB287D4752@iitbombay.org>
Message-ID: <CAFH29tryquG_eYFpwcx2ZJVm=3NuvTyCCYHhVk7xr06vmQ0xOg@mail.gmail.com>

The rsync protocol might be appropriate.  See
https://www.samba.org/~tridge/phd_thesis.pdf and
https://rsync.samba.org/tech_report/node2.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/8e8eac1d/attachment.htm>

From tuhs at tuhs.org  Fri May 17 04:38:37 2024
From: tuhs at tuhs.org (Ben Greenfield via TUHS)
Date: Thu, 16 May 2024 14:38:37 -0400
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <CAFH29tryquG_eYFpwcx2ZJVm=3NuvTyCCYHhVk7xr06vmQ0xOg@mail.gmail.com>
References: <CAKH6PiVk9ZrcwpWVhEbfUAWSMhbK89XKgogWb8GV9BL=gQP3+w@mail.gmail.com>
 <A5DC2242-12E0-4425-B14B-0DAB287D4752@iitbombay.org>
 <CAFH29tryquG_eYFpwcx2ZJVm=3NuvTyCCYHhVk7xr06vmQ0xOg@mail.gmail.com>
Message-ID: <EF4ABFCF-F0D9-47D0-B60C-3706DCEB8803@cogs.com>

I use rsync for that with the -n dry run flag

> On May 16, 2024, at 2:12 PM, Rich Salz <rich.salz at gmail.com> wrote:
> 
> The rsync protocol might be appropriate.  See https://www.samba.org/~tridge/phd_thesis.pdf and https://rsync.samba.org/tech_report/node2.html
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/3cb8cf26/attachment.htm>

From fariborz.t at gmail.com  Fri May 17 04:51:11 2024
From: fariborz.t at gmail.com (Skip Tavakkolian)
Date: Thu, 16 May 2024 11:51:11 -0700
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <CAP6exYLYjqFP+kzC7TK19=Esfi2bDv5RfLjMmoj-pyBfgisaLA@mail.gmail.com>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
 <20240516073351.267351FAE3@orac.inputplus.co.uk>
 <202405160901.44G91CN0007274@freefriends.org>
 <CAP6exYLYjqFP+kzC7TK19=Esfi2bDv5RfLjMmoj-pyBfgisaLA@mail.gmail.com>
Message-ID: <CAA1C+h2Khq01Y85qrCZ46A597JCJTM+aCAsry_S4bKMWHgeYHA@mail.gmail.com>

To add to Ron's post, Plan 9's cpu exports the origination's namespace to
the destination; by convention it is mounted on /mnt/term at destination.

host1% cpu -h host2
host2% diff file2 /mnt/term/usr/me/file1


On Thu, May 16, 2024 at 10:09 AM ron minnich <rminnich at gmail.com> wrote:

> " The 9import tool allows an arbitrary file on a remote system, with the
> capability of running the Plan 9 exportfs(4) service, to be imported into
> the local name space. Usually file is a directory, so the complete file
> tree under the directory is made available."
> https://9fans.github.io/plan9port/man/man4/9import.html
>
> 9import host1 / /tmp/host1
> 9import host2 /tmp/host2
> diff /tmp/host1/a/b/c /tmp/host2/a/b/c
> (or whatever command you want that works with files. No need for stuff
> like 'rdiff' etc.)
>
> stuff you take for granted on some systems ...
>
> I have the plan 9 cpu command working (written in Go) and I think it's
> time I get import working more widely, it's just too useful.
>
> On Thu, May 16, 2024 at 2:01 AM <arnold at skeeve.com> wrote:
>
>> Ralph Corderoy <ralph at inputplus.co.uk> wrote:
>>
>> > > Maybe
>> > >
>> > >     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)
>> >
>> > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
>> > ‘Banner’ set which spews the contents of a file before authentication,
>> > e.g. the pointless
>> >
>> > [....]
>> >
>> > It appears on stderr so doesn't upset the diff but does clutter.
>>
>> All true, I didn't think about that.
>>
>> > And discarding stderr is too sloppy.
>>
>> But the author of a personal script knows his/her remote machines
>> and can decide if
>>
>>      diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2
>> 2>/dev/null)
>>
>> is appropriate or not.
>>
>> My main point was that the problem is easily solved with a
>> few lines of shell, so no need for a utility, especially one
>> written in C or some other compiled language.
>>
>> Thanks,
>>
>> Arnold
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/b98cccdb/attachment.htm>

From marc.donner at gmail.com  Fri May 17 05:51:45 2024
From: marc.donner at gmail.com (Marc Donner)
Date: Thu, 16 May 2024 15:51:45 -0400
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <CAA1C+h2Khq01Y85qrCZ46A597JCJTM+aCAsry_S4bKMWHgeYHA@mail.gmail.com>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
 <202405160651.44G6pi8G018059@freefriends.org>
 <20240516073351.267351FAE3@orac.inputplus.co.uk>
 <202405160901.44G91CN0007274@freefriends.org>
 <CAP6exYLYjqFP+kzC7TK19=Esfi2bDv5RfLjMmoj-pyBfgisaLA@mail.gmail.com>
 <CAA1C+h2Khq01Y85qrCZ46A597JCJTM+aCAsry_S4bKMWHgeYHA@mail.gmail.com>
Message-ID: <CALQ0xCD1--+_=71g0+2yGWSXu7WLGZfR5nUQvXWE-FEAKm248g@mail.gmail.com>

If I recall correctly, there is a combination of flags to rsync that will
generate a report on a file, a set of files, or a set of directories to
tell if they are different.

I seem to recall DPK or RCT doing something clever with rsync and cksum to
get this sort of result without having to stream a lot of data across the
long-haul network back in the day.

Best,

Marc
=====
nygeek.net
mindthegapdialogs.com/home <https://www.mindthegapdialogs.com/home>


On Thu, May 16, 2024 at 2:51 PM Skip Tavakkolian <fariborz.t at gmail.com>
wrote:

> To add to Ron's post, Plan 9's cpu exports the origination's namespace to
> the destination; by convention it is mounted on /mnt/term at destination.
>
> host1% cpu -h host2
> host2% diff file2 /mnt/term/usr/me/file1
>
>
> On Thu, May 16, 2024 at 10:09 AM ron minnich <rminnich at gmail.com> wrote:
>
>> " The 9import tool allows an arbitrary file on a remote system, with the
>> capability of running the Plan 9 exportfs(4) service, to be imported into
>> the local name space. Usually file is a directory, so the complete file
>> tree under the directory is made available."
>> https://9fans.github.io/plan9port/man/man4/9import.html
>>
>> 9import host1 / /tmp/host1
>> 9import host2 /tmp/host2
>> diff /tmp/host1/a/b/c /tmp/host2/a/b/c
>> (or whatever command you want that works with files. No need for stuff
>> like 'rdiff' etc.)
>>
>> stuff you take for granted on some systems ...
>>
>> I have the plan 9 cpu command working (written in Go) and I think it's
>> time I get import working more widely, it's just too useful.
>>
>> On Thu, May 16, 2024 at 2:01 AM <arnold at skeeve.com> wrote:
>>
>>> Ralph Corderoy <ralph at inputplus.co.uk> wrote:
>>>
>>> > > Maybe
>>> > >
>>> > >     diff -u <(ssh host1 cat file1) <(ssh host2 cat file2)
>>> >
>>> > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s
>>> > ‘Banner’ set which spews the contents of a file before authentication,
>>> > e.g. the pointless
>>> >
>>> > [....]
>>> >
>>> > It appears on stderr so doesn't upset the diff but does clutter.
>>>
>>> All true, I didn't think about that.
>>>
>>> > And discarding stderr is too sloppy.
>>>
>>> But the author of a personal script knows his/her remote machines
>>> and can decide if
>>>
>>>      diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2
>>> 2>/dev/null)
>>>
>>> is appropriate or not.
>>>
>>> My main point was that the problem is easily solved with a
>>> few lines of shell, so no need for a utility, especially one
>>> written in C or some other compiled language.
>>>
>>> Thanks,
>>>
>>> Arnold
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240516/4eae607d/attachment.htm>

From tytso at mit.edu  Fri May 17 05:53:09 2024
From: tytso at mit.edu (Theodore Ts'o)
Date: Thu, 16 May 2024 13:53:09 -0600
Subject: [TUHS] Be there a "remote diff" utility?
In-Reply-To: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
References: <alpine.BSF.2.21.9999.2405161623130.15285@aneurin.horsfall.org>
Message-ID: <20240516195309.GB287325@mit.edu>

On Thu, May 16, 2024 at 04:34:54PM +1000, Dave Horsfall wrote:
> Every so often I want to compare files on remote machines, but all I can 
> do is to fetch them first (usually into /tmp); I'd like to do something 
> like:
> 
>     rdiff host1:file1 host2:file2
> 
> Breathes there such a beast?  I see that Penguin/OS has already taken 
> "rdiff" which doesn't seem to do what I want.

rdiff is something which someone on the internet had created, as part
of the librsync package[1].  Thia isn't considered part of the core
package (for example, Debian consideres it as an "optional" package)
but rather something which various distributions have packaged for the
convenience for their users.

[1] https://librsync.github.io/

So if this is considered part of Penguin/OS, would we also consider
"nethack" or X11 part of BSD 4.3, since it was available and often
would be commonly installed on BSD 4.3 systems?  Or are all packages
which are in FreeBSD's ports "part of FreeBSD"?  Or all packages in
MacPorts part of MacOS?

In any case, the way I'd suggest that you do this that works as an
extention to the Unix philosohy of "Everything looks like a file" is
to use FUSE:

sshfs host1:/ ~/mnt/host1
sshfs host2:/ ~/mnt/host2
diff ~/mnt/host1/file1 ~/mnt/host2/file2

Cheers,

					- Ted

From douglas.mcilroy at dartmouth.edu  Sun May 19 04:07:38 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Sat, 18 May 2024 14:07:38 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
Message-ID: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>

I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is
more".
       % less --help | wc
      298
Last time I looked, the line count was about 220. Bloat is self-catalyzing.

What prompted me to look was another disheartening discovery. The "small
special tool" Gnu diff has a 95-page manual!  And it doesn't cover the
option I was looking up (-h). To be fair, the manual includes related
programs like diff3(1), sdiff(1) and patch(1), but the original manual for
each fit on one page.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240518/7f723109/attachment.htm>

From brantley at coraid.com  Sun May 19 04:13:36 2024
From: brantley at coraid.com (Brantley Coile)
Date: Sat, 18 May 2024 14:13:36 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
Message-ID: <1F0ECDE3-F653-48FB-AC95-FCE84C9B14A5@coraid.com>

I'm so grateful that we are able to work using Plan 9. 

aztec% wc -l /sys/src/cmd/p.c
     90 /sys/src/cmd/p.c
aztec% 

So the size of Plan 9's paginator's source code is 208 lines smaller than the help for that paginator. And it has no options.

Just say'n.

bwc

> On May 18, 2024, at 2:07 PM, Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
> 
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is more".
>        % less --help | wc
>       298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing. 
> 
> What prompted me to look was another disheartening discovery. The "small special tool" Gnu diff has a 95-page manual!  And it doesn't cover the option I was looking up (-h). To be fair, the manual includes related programs like diff3(1), sdiff(1) and patch(1), but the original manual for each fit on one page.
> 
> Doug


From lm at mcvoy.com  Sun May 19 04:18:25 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sat, 18 May 2024 11:18:25 -0700
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
Message-ID: <20240518181825.GT9216@mcvoy.com>

On Sat, May 18, 2024 at 02:07:38PM -0400, Douglas McIlroy wrote:
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is
> more".
>        % less --help | wc
>       298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.
> 
> What prompted me to look was another disheartening discovery. The "small
> special tool" Gnu diff has a 95-page manual!  And it doesn't cover the
> option I was looking up (-h). To be fair, the manual includes related
> programs like diff3(1), sdiff(1) and patch(1), but the original manual for
> each fit on one page.

Normally I agree with Doug but on documentation, the less is more leaves me
cold.  It's fine when it is V7 cat that had maybe an option or two.  GNU
diff is a complex beast and it needs lots of docs.

Personally, I like it when man pages have a few usage examples, the BitKeeper
docs are like that.  But I'm ok with a terse man page with a SEE ALSO that
points to a user guide.

Docs should be helpful.
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From ralph at inputplus.co.uk  Sun May 19 04:22:18 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sat, 18 May 2024 19:22:18 +0100
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
Message-ID: <20240518182218.44ED921309@orac.inputplus.co.uk>

Hi Doug,

>        % less --help | wc -l
>       298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.

Adding a --help option is a sign the man page lacks succintness.
It's the easier solution.

Another point against adding --help: there's a second attempt to
describe the source.

-- 
Cheers, Ralph.

From tuhs at tuhs.org  Sun May 19 04:31:41 2024
From: tuhs at tuhs.org (=?utf-8?b?UGV0ZXIgV2VpbmJlcmdlciAo5rip5Y2a5qC8KSB2aWEgVFVIUw==?=)
Date: Sat, 18 May 2024 14:31:41 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
Message-ID: <CAOUkXSpMXxMoRZZZwovZGqSm_nzbG5FFzGP82_iKW7HguFhMUg@mail.gmail.com>

There is a common problem in our field. When something (a command, a
language, a library, etc) has a flaw, we say to ourselves, "This is
not good. If we remove this flaw things will be better."  as if it's
an obvious truth.

Sometimes it is true, but it's frequently questionable, and all too
often it's just wrong. We have no commonly accepted way of balancing
complexity and function; usually complexity wins. When AI takes my job
it will be because it's better at dealing with the mindless complexity
of enormous APIs (and command-line flags).


On Sat, May 18, 2024 at 2:08 PM Douglas McIlroy
<douglas.mcilroy at dartmouth.edu> wrote:
>
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is more".
>        % less --help | wc
>       298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.
>
> What prompted me to look was another disheartening discovery. The "small special tool" Gnu diff has a 95-page manual!  And it doesn't cover the option I was looking up (-h). To be fair, the manual includes related programs like diff3(1), sdiff(1) and patch(1), but the original manual for each fit on one page.
>
> Doug

From clemc at ccc.com  Sun May 19 04:52:05 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 18 May 2024 14:52:05 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240518181825.GT9216@mcvoy.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518181825.GT9216@mcvoy.com>
Message-ID: <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>

On Sat, May 18, 2024 at 2:18 PM Larry McVoy <lm at mcvoy.com> wrote:

> But I'm ok with a terse man page with a SEE ALSO that points to a user
> guide.
>
Only if the SEE ALSO has more complete and relevant information -
otherwise, it degrades to VMS's famous "see figure 1" SPR.

>
> Docs should be helpful.
>
And easy to extract information.

The issue to be comes back to the type of information each document is
designed to give. I believe there at least three types of docs:

   1. Full manuals explain how something is built and it it used.  It helps
   to have theory/principles of operations behind it and enough detail when
   done, you can understand why and how to use it.
   2. Tutorials are excellent for someone trying to learn a new tool.  Less
   theory - and more -- examples, showing off the features and how to do
   something.
   3. References pages - need to be quick look-ups to remind someone how to
   use something - particularly for tools you don't use every
   day/generally don't memorize.


There are at least two more: an academic paper which might be looked at as
a start of #1 and full books which take #1 to even more details.  Some
academic papers indeed are fine manuals, and I can also argue the "manual"
for some tools like awk/sed or, for that matter, yacc(1) are full
books. But the idea is the >>complete<< review here.

Tutorials and reference pages are supposed to easy helpful things -- but
often miss the mark for the audience.  To me, the problem is the wrong type
of information is put in each one and, more importantly, people's
expectations from the document.  I love properly built manual pages - I
detest things like the VMS/TOPS help command or gnu info pages. What I
really hate is when there is no manual, but they tell you see the HELP
command -- but which command or "subtopic" -- Yikes.  The traditional
man system is simple quick reminders, basic reference and I can move on.
For instance, I needed to remember which C library has the definition these
days for some set of functions and what are its error return codes -- man 3
functions, I'm done.

Tutorials are funny.  For some people, what they want to learn the ideas
behind a tool.  Typically, I don't need that as much as how this toll does
some function.   For instance, Apple is forcing me the learn lldb because
the traditional debuggers derived from UCB's DBX are not there.   It's
similar to different.  The man page is useful only for the command lines
switches.   It turns out the commands are all really long, but they have
abbreviations and can be aliases.  I found references to this in an lldb
tutorial - but the tutorial is written to teach people more how to use a
debugger to debug there code, and less how this debugger maps into the
traditional functions.  Hey I would like to find an cheat sheet or a set of
aliases that map DBX/GDB into it -- but so far I've found nothing.

So Larry -- I agree with you ... "*Docs should be helpful*," but I fear
saying like that is a bit like the Faber College Motto/Founder's
Quote: "*Knowledge
is good*."


ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240518/37389ad9/attachment.htm>

From luther.johnson at makerlisp.com  Sun May 19 05:19:32 2024
From: luther.johnson at makerlisp.com (Luther Johnson)
Date: Sat, 18 May 2024 12:19:32 -0700
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518181825.GT9216@mcvoy.com>
 <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>
Message-ID: <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com>

Complexity is entropy. It occurs naturally in all human endeavor. It
takes work to keep things small, orderly, and rational. But there is
also a point where although a tool may be perfect in its conception and
execution, from its own perspective, it is not as useful as a slightly
more disorderly version that does what people want it to do. "Well they
shouldn't want that !" is a common response. Then people write scripts
to do for themselves what the tool doesn't do. Which might be right, but
it might lead to a whole bunch of similar scripts to do the same thing,
just a little differently And that's when we discover that it would have
been better to have it in the one tool in the first place.

So it's a back and forth, trial and error process. Eventually new
balances get struck, and people of like minds and tastes find a new
center, like Plan 9, or other things.

Myself, I do tend to like tools that are smaller and more single-minded
in their function (and that makes it possible to have documentation that
is clearer and more concise), but as an example, sometimes I want the
"-u" switch on diff, to make a patch, sometimes I don't, the default
display is better for a quick review (but I think or expect that the
essential diff engine is being shared). It's all a matter of judgment,
but you can't apply good judgment until you have the experience gained
from trying several alternatives. So things will get bloated up, and
then they will need to be pruned and re-engineered, but hopefully we
don't throw out the most helpful exceptions to the rule just because
they don't fit with some sort of consistency aesthetic.

On 05/18/2024 11:52 AM, Clem Cole wrote:
>
>
> On Sat, May 18, 2024 at 2:18 PM Larry McVoy <lm at mcvoy.com
> <mailto:lm at mcvoy.com>> wrote:
>
>     But I'm ok with a terse man page with a SEE ALSO thatpoints to a
>     user guide.
>
> Only if the SEE ALSO has more complete and relevant information -
> otherwise, it degrades to VMS's famous "see figure 1" SPR.
>
>
>     Docs should be helpful.
>
> And easy to extract information.
>
> The issue to be comes back to the type of information each document is
> designed to give. I believe there at least three types of docs:
>
>  1. Full manuals explain how something is built and it it used.  It
>     helps to have theory/principles of operations behind it and enough
>     detail when done, you can understand why and howto use it.
>  2. Tutorials are excellent for someone trying to learn a new tool.
>     Less theory - and more -- examples, showing off the features and
>     how to do something.
>  3. References pages - need to be quick look-ups to remind someone how
>     to use something - particularly for tools you don't use every
>     day/generally don't memorize.
>
>
> There are at least two more: an academic paper which might be looked
> at as a start of #1 and full books which take #1 to even more
> details.  Some academic papers indeed are fine manuals, and I can also
> argue the "manual" for some tools like awk/sed or, for that matter,
> yacc(1) are full books. But the idea is the >>complete<< review here.
>
> Tutorials and reference pages are supposed to easy helpful things --
> but often miss the mark for the audience. To me, the problem is the
> wrong type of information is put in each one and, more importantly,
> people's expectations from the document.  I love properly builtmanual
> pages - I detest things like the VMS/TOPS help command or gnu info
> pages. What I really hate is when there is no manual, but they tell
> you see the HELP command -- but which command or "subtopic" -- Yikes.
> The traditional man system is simple quick reminders,
> basicreferenceand I can move on.  For instance, I needed to remember
> which C library has the definition these days for some set of
> functions and what are its error return codes -- man 3 functions, I'm
> done.
>
> Tutorials are funny.  For some people, what they want to learn the
> ideas behind a tool.  Typically, I don't need that as much as how this
> toll does some function.   For instance, Apple is forcing me the learn
> lldb because the traditional debuggers derived from UCB's DBX are not
> there.   It's similar to different. The man page is useful only for
> the command lines switches.   It turns out the commands are all really
> long, but they have abbreviations and can be aliases.  I found
> references to this in an lldb tutorial - but the tutorial is written
> to teach people more how to use a debugger to debug there code, and
> less how this debugger maps into the traditional functions.  Hey I
> would like to find an cheat sheet or a set of aliases that map DBX/GDB
> into it -- but so far I've found nothing.
>
> So Larry -- I agree with you ... "/Docs should be helpful/," but I
> fear saying like that is a bit like the Faber College
> Motto/Founder's Quote: "/Knowledge is good/."
>
>
>
> ᐧ

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240518/0dd28a27/attachment.htm>

From stuff at riddermarkfarm.ca  Sun May 19 05:32:37 2024
From: stuff at riddermarkfarm.ca (Stuff Received)
Date: Sat, 18 May 2024 15:32:37 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518181825.GT9216@mcvoy.com>
 <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>
Message-ID: <b5cf8b9e-fcbf-20d9-1c2a-4ebaf8164a9f@riddermarkfarm.ca>

On 2024-05-18 14:52, Clem Cole wrote (in part):
> Hey I would like to find 
> an cheat sheet or a set of aliases that map DBX/GDB into it -- but so 
> far I've found nothing.

Does this help? https://lldb.llvm.org/use/map.html

(I confess that learning lldb has been quite the chore.)

S.

From tuhs at tuhs.org  Sun May 19 06:12:38 2024
From: tuhs at tuhs.org (segaloco via TUHS)
Date: Sat, 18 May 2024 20:12:38 +0000
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518181825.GT9216@mcvoy.com>
 <CAC20D2NbHEdxKU6HA_ZQWYmc-HGhLE+F4e8XFoWePToRBP4-9Q@mail.gmail.com>
 <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com>
Message-ID: <aC6McOgZFm_rDmUAD0q7Mex9iJtGdEKNSQOKVV1btRyvuUOXtX1p7ETIIlY9A4A-jdeaPzB2fUaklGEfy9DFv6dfBKLnB-S92Dc8gMlvph0=@protonmail.com>

On Saturday, May 18th, 2024 at 12:19 PM, Luther Johnson <luther.johnson at makerlisp.com> wrote:

> Complexity is entropy. It occurs naturally in all human endeavor. It takes work to keep things small, orderly, and rational. But there is also a point where although a tool may be perfect in its conception and execution, from its own perspective, it is not as useful as a slightly more disorderly version that does what people want it to do. "Well they shouldn't want that !" is a common response. Then people write scripts to do for themselves what the tool doesn't do. Which might be right, but it might lead to a whole bunch of similar scripts to do the same thing, just a little differently And that's when we discover that it would have been better to have it in the one tool in the first place.
> 
> So it's a back and forth, trial and error process. Eventually new balances get struck, and people of like minds and tastes find a new center, like Plan 9, or other things.
> 
> Myself, I do tend to like tools that are smaller and more single-minded in their function (and that makes it possible to have documentation that is clearer and more concise), but as an example, sometimes I want the "-u" switch on diff, to make a patch, sometimes I don't, the default display is better for a quick review (but I think or expect that the essential diff engine is being shared). It's all a matter of judgment, but you can't apply good judgment until you have the experience gained from trying several alternatives. So things will get bloated up, and then they will need to be pruned and re-engineered, but hopefully we don't throw out the most helpful exceptions to the rule just because they don't fit with some sort of consistency aesthetic.
> 
> On 05/18/2024 11:52 AM, Clem Cole wrote:
> 
> > 
> > 
> > On Sat, May 18, 2024 at 2:18 PM Larry McVoy <lm at mcvoy.com> wrote:
> > 
> > > But I'm ok with a terse man page with a SEE ALSO that points to a user guide.
> > 
> > Only if the SEE ALSO has more complete and relevant information - otherwise, it degrades to VMS's famous "see figure 1" SPR.
> > 
> > > 
> > > Docs should be helpful.
> > 
> > And easy to extract information.
> > 
> > 
> > The issue to be comes back to the type of information each document is designed to give. I believe there at least three types of docs:
> > 
> > 1.  Full manuals explain how something is built and it it used. It helps to have theory/principles of operations behind it and enough detail when done, you can understand why and how to use it.
> > 2.  Tutorials are excellent for someone trying to learn a new tool. Less theory - and more -- examples, showing off the features and how to do something.
> > 3.  References pages - need to be quick look-ups to remind someone how to use something - particularly for tools you don't use every day/generally don't memorize.
> > 
> > 
> > 
> > There are at least two more: an academic paper which might be looked at as a start of #1 and full books which take #1 to even more details. Some academic papers indeed are fine manuals, and I can also argue the "manual" for some tools like awk/sed or, for that matter, yacc(1) are full books. But the idea is the >>complete<< review here.
> > 
> > 
> > Tutorials and reference pages are supposed to easy helpful things -- but often miss the mark for the audience. To me, the problem is the wrong type of information is put in each one and, more importantly, people's expectations from the document. I love properly built manual pages - I detest things like the VMS/TOPS help command or gnu info pages. What I really hate is when there is no manual, but they tell you see the HELP command -- but which command or "subtopic" -- Yikes. The traditional man system is simple quick reminders, basic reference and I can move on. For instance, I needed to remember which C library has the definition these days for some set of functions and what are its error return codes -- man 3 functions, I'm done.
> > 
> > 
> > Tutorials are funny. For some people, what they want to learn the ideas behind a tool. Typically, I don't need that as much as how this toll does some function. For instance, Apple is forcing me the learn lldb because the traditional debuggers derived from UCB's DBX are not there. It's similar to different. The man page is useful only for the command lines switches. It turns out the commands are all really long, but they have abbreviations and can be aliases. I found references to this in an lldb tutorial - but the tutorial is written to teach people more how to use a debugger to debug there code, and less how this debugger maps into the traditional functions. Hey I would like to find an cheat sheet or a set of aliases that map DBX/GDB into it -- but so far I've found nothing.
> > 
> > 
> > So Larry -- I agree with you ... "Docs should be helpful," but I fear saying like that is a bit like the Faber College Motto/Founder's Quote: "Knowledge is good."
> > 
> > 
> > 
> > 
> > ᐧ

Facing ever-growing complexity, I often find myself turning strictly to the POSIX/SUS manpages for anything that has one, not only due to an interest in keeping things as portable as possible, but also admittedly out of some trepidation that the cool shiny specific feature of the week for a specific implementation doesn't have quite the same stabilizing standard behind it, and as such has the unlikely but real potential to change right out from under you in a new major version.  Issuing 'man 1p' or 'man 3p' before most studying has become habit, turning to a vendors' docs only when necessary.

Granted, no standardization of debuggers, assemblers, linkers, etc. makes this much trickier when working with embedded stuff or intensive diagnostics, so in that regard I've thus far been aligned with the GNU family of these components.  For the sake of embedded devs, it would be nice if the as/ld/db set of utilities had some sort of guiding light driving disparate implementations.  A particular example of divergent behavior wearing a familiar mask is the cc65 suite.  The assembler and linker smell of predictable UNIX fare, but differ in a number of little, quite annoying ways, among them "export" instead of "globl", cheap labels based strictly on counts forward and backward rather than recyclable numeric labels, just little things.  While a standard isn't the end all be all solution to everything, it certainly decreases at least some of the cognitive load, giving you a subset of behaviors to learn once, only turning to specifics when you've exhausted your options (or patience) with the intersections of various implementations.

I see a domino effect in this sort of thing too, one basal tool diverges a bit from other versions, then folks who only ever use that implementation head down a new fork in the road, those in their "camp" follow, before long whatever that difference is happens to be entrenched in a number of folks' vocabularies.  Like linguistic divergence, eventually the dialectical bits of how they work are no longer mutually intelligible.  The Tower of Babel grows ever higher, only time will tell whether the varied, sometimes contradictory styles of architecture are a strength or weakness.  Like evolution, oft times the most successful, sensible approaches prevail, but nature has a funny way of lapsing on our narrow understanding of "fitness" too.  After all, most of what many of us use on the regular guarantees no "fitness for a particular purpose".  Does this make stability an organic consequence or a happy accident?  I know I couldn't say.

- Matt G.

From steffen at sdaoden.eu  Sun May 19 06:33:19 2024
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Sat, 18 May 2024 22:33:19 +0200
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
Message-ID: <20240518203319.3oAKtOSk@steffen%sdaoden.eu>

Douglas McIlroy wrote in
 <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg at mail.gmail.com>:
 |I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is
 |more".
 |       % less --help | wc
 |      298
 |Last time I looked, the line count was about 220. Bloat is self-catalyzing.

I do not buy that.
You are working on Windows and in the meantime have switched to
one of those graphical browser monsters (i think) where each
instance has more code active than the entire Unix history
altogether.

less(1) can now Unicode, and that is not as easy with ISO/POSIX as
it was on Plan9 for example which simply goes UTF-8 and has some
(smart) lookup tables (now in go, more or less, last i looked),
but that is not the whole picture of it.

It can those ANSI / ISO 6429 color sequences that everybody wants,
as you have them everywhere, even GNU's yacc, bison.

The OpenBSD people took a port done by an OpenSolaris (i think,
that scene, anyhow) guy, and together they stripped it down
massively.

But i do not use it, because after almost exactly a decade i got
upstreamed to Nudelman's less(1) the necessary patches to have
active hyperlinks on the terminal, in a normal Unix (roff mdoc)
manual.  (These work via OSC-8 escape sequences; it was a "15
files changed, 601 insertions(+), 9 deletions(-)" patch, which
included careful quoting of file paths etc. for man(1) openings
(ie, such code gets lengthy), but he did it differently a bit, and
left off some things i wanted, included others (good), but if you
use --mouse with his one then you have a real browser feeling.
I have problems with --mouse, unfortunately, because when used you
can no longer copy+paste -- he would need to add clipboard control
in addition i'd say.., adding even more code.)

You know, it may be viable for some tools, but for others, .. not.
You say it yourself in your "A Research UNIX Reader": "Electronic
mail was there from the start.  Never satisfied with its exact
behavior, everybody touched it at one time or another".
In the meantime the IETF went grazy and produced masses of
standards, and unfortunately each one adds a little bit that needs
to be addressed differently, and all that needs documentation.
Now mail is an extreme example.

And almost a quarter of a century ago i wrote a small pager that
even had a clock, and it required less CPU on a day with some
scrolling than less/ncurses for a one time scroll through the
document.  But that pager is history, and less is still there,
running everywhere, and being used by me dozens to hundreds time
a day.  Also with colours, with searching, and now also with

  ^O^N  ^On         *  Search forward for (N-th) OSC8 hyperlink.
  ^O^P  ^Op         *  Search backward for (N-th) OSC8 hyperlink.
  ^O^L  ^Ol            Jump to the currently selected OSC8 hyperlink.

And prepared mdoc manuals can now display on a normal Unix
terminal in a normal (actively OSC-8 supporting $PAGER) a TOC (at
will, with links), and have external (man:, but also http: etc;
man is built into less(1) -- yay!) links, too.
For example here ∞ is an external, and † are internal links:

   The OpenSSL program ciphers(1)∞ should be referred to when creating a
   custom cipher list.  Variables of interest for TLS in general are
   tls-ca-dir†, tls-ca-file†, tls-ca-flags†, tls-ca-no-defaults†,
   tls-config-file†, tls-config-module†, tls-config-pairs

So ^O^L on that ciphers(1) opens a new man(1)ual instance.
For all this functionality a program with 221K bytes is small:

  221360 May 18 22:13 ...less*

Also it starts up into interactive mode with --help.
So you could have "full interactivity" and colours and mouse, and
configurability to a large extend, which somehow has to be
documented, in just 221 K bytes.

I give in in that i try to have --help/-h and --long-help/-H, but
sometimes that -h is only minimal, because a screenful of data is
simply not enough to allow users to have a notion.

So less could split the manual into a less.1 and a less-book.7.
The same is true for bash, for sure.  (And for my little mailer.)
But things tend to divert, and it is hard enough to keep one
manual in sync with the codebase, especially if you develop
focused and expert-idiotized in a one man show.

 |What prompted me to look was another disheartening discovery. The "small
 |special tool" Gnu diff has a 95-page manual!  And it doesn't cover the
 |option I was looking up (-h). To be fair, the manual includes related
 |programs like diff3(1), sdiff(1) and patch(1), but the original manual for
 |each fit on one page.
 --End of <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg at mail.gmail\
 .com>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From tuhs at tuhs.org  Sun May 19 11:04:23 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Sat, 18 May 2024 18:04:23 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240515164212.beswgy4h2nwvbdck@illithid>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
Message-ID: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>

On May 15, 2024, at 9:42 AM, G. Branden Robinson <g.branden.robinson at gmail.com> wrote:
> 
> I contemplated following up Bakul Shah's
> post with a mention of Jim Gettys's work on bufferbloat.[1]  So let me
> do that here, and venture the opinion that a "buffer" as popularly
> conceived and implemented (more or less just a hunk of memory to house
> data) is too damn dumb a data structure for many of the uses to which it
> is put.

Note that even if you remove every RAM buffer between the two
endpoints of a TCP connection, you still have a "buffer". Example:
If you have a 1Gbps pipe between SF & NYC, the pipe itself can store
something like 3.5MB to 4MB in each direction! As the pipe can be
lossy, you have to buffer up N (=bandwidth*latency) bytes at the
sending end (until you see an ack for the previous Nth byte), if you
want to utilize the full bandwidth.

Now what happens if the sender program exits right after sending
the last byte? Something on behalf of the sender has to buffer up
and stick around to complete the TCP dance. Even if the sender is
cat -u, the kernel or a network daemon process atop a microkernel
has to buffer this data[1].

Unfortunately you can't abolish latency! But where to put buffers
is certainly an engineering choice that can impact compositionality
or other problems such as bufferbloat.

[1] This brings up a separate point: in a microkernel even a simple
thing like "foo | bar" would require a third process - a "pipe
service", to buffer up the output of foo! You may have reduced
the overhead of individual syscalls but you will have more of
cross-domain calls!

From lm at mcvoy.com  Sun May 19 11:21:14 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sat, 18 May 2024 18:21:14 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
Message-ID: <20240519012114.GU9216@mcvoy.com>

On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
> [1] This brings up a separate point: in a microkernel even a simple
> thing like "foo | bar" would require a third process - a "pipe
> service", to buffer up the output of foo! You may have reduced
> the overhead of individual syscalls but you will have more of
> cross-domain calls!

Do any micro kernels do address space to address space bcopy()?
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From stewart at serissa.com  Sun May 19 11:26:31 2024
From: stewart at serissa.com (Serissa)
Date: Sat, 18 May 2024 21:26:31 -0400
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240519012114.GU9216@mcvoy.com>
References: <20240519012114.GU9216@mcvoy.com>
Message-ID: <F1BC6F9C-3740-4EA6-BA70-95CB032466DC@serissa.com>

MIT's FOS (Factored Operating System) research OS did cross address space copies as part of its messaging machinery.

HPC networking does this by using shared memory (Cross Memory Attach and XPMEM) in a traditional kernel. 

-L


> On May 18, 2024, at 9:21 PM, Larry McVoy <lm at mcvoy.com> wrote:
> 
> ﻿On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
>> [1] This brings up a separate point: in a microkernel even a simple
>> thing like "foo | bar" would require a third process - a "pipe
>> service", to buffer up the output of foo! You may have reduced
>> the overhead of individual syscalls but you will have more of
>> cross-domain calls!
> 
> Do any micro kernels do address space to address space bcopy()?
> --
> ---
> Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


From tuhs at tuhs.org  Sun May 19 11:40:42 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Sat, 18 May 2024 18:40:42 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240519012114.GU9216@mcvoy.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
Message-ID: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>

On May 18, 2024, at 6:21 PM, Larry McVoy <lm at mcvoy.com> wrote:
> 
> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
>> [1] This brings up a separate point: in a microkernel even a simple
>> thing like "foo | bar" would require a third process - a "pipe
>> service", to buffer up the output of foo! You may have reduced
>> the overhead of individual syscalls but you will have more of
>> cross-domain calls!
> 
> Do any micro kernels do address space to address space bcopy()?

mmapping the same page in two processes won't be hard but now
you have complicated cat (or some iolib)!

From tuhs at tuhs.org  Sun May 19 11:50:57 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Sat, 18 May 2024 18:50:57 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
 <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
Message-ID: <80302C1F-99D7-4E5F-8656-2E7E67C40422@iitbombay.org>

On May 18, 2024, at 6:40 PM, Bakul Shah <bakul at iitbombay.org> wrote:
> 
> On May 18, 2024, at 6:21 PM, Larry McVoy <lm at mcvoy.com> wrote:
>> 
>> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
>>> [1] This brings up a separate point: in a microkernel even a simple
>>> thing like "foo | bar" would require a third process - a "pipe
>>> service", to buffer up the output of foo! You may have reduced
>>> the overhead of individual syscalls but you will have more of
>>> cross-domain calls!
>> 
>> Do any micro kernels do address space to address space bcopy()?
> 
> mmapping the same page in two processes won't be hard but now
> you have complicated cat (or some iolib)!

And there are other issues. As Doug said in his original message
in this thread: "And input buffering must never ingest data that
the program will not eventually use." Consider something like this:

(echo 1; echo 2)|(read; cat)

This will print 2. Emulating this with mmaped buffers and copying
will not be easy....


From lm at mcvoy.com  Sun May 19 12:02:56 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sat, 18 May 2024 19:02:56 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
 <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
Message-ID: <20240519020256.GV9216@mcvoy.com>

On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote:
> On May 18, 2024, at 6:21???PM, Larry McVoy <lm at mcvoy.com> wrote:
> > 
> > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
> >> [1] This brings up a separate point: in a microkernel even a simple
> >> thing like "foo | bar" would require a third process - a "pipe
> >> service", to buffer up the output of foo! You may have reduced
> >> the overhead of individual syscalls but you will have more of
> >> cross-domain calls!
> > 
> > Do any micro kernels do address space to address space bcopy()?
> 
> mmapping the same page in two processes won't be hard but now
> you have complicated cat (or some iolib)!

I recall asking Linus if that could be done to save TLB entries, as in
multiple processes map a portion of their address space (at the same
virtual location) and then they all use the same TLB entries for that
part of their address space.  He said it couldn't be done because the
process ID concept was hard wired into the TLB.  I don't know if TLB
tech has evolved such that a single process could have multiple "process"
IDs associated with it in the TLB.

I wanted it because if you could share part of your address space with
another process, using the same TLB entries, then motivation for threads
could go away (I've never been a threads fan but I acknowledge why
you might need them).  I was channeling Rob's "If you think you need
threads, your processes are too fat".  

The idea of using processes instead of threads falls down when you
consider TLB usage.  And TLB usage, when you care about performance, is
an issue.  I could craft you some realistic benchmarks, mirroring real
world work loads, that would kill the idea of replacing threads with
processes unless they shared TLB entries.  Think of a N-way threaded
application, lots of address space used, that application uses all of the
TLB.  Now do that with N processes and your TLB is N times less effective.

This was a conversation decades ago so maybe TLB tech now has solved this.
I doubt it, if this was a solved problem I think every OS would say screw
threads, just use processes and mmap().  The nice part of that model
is you can choose what parts of your address space you want to share.
That cuts out a HUGE swath of potential problems where another thread
can go poke in a part of your address space that you don't want poked.
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From andreww591 at gmail.com  Sun May 19 12:26:54 2024
From: andreww591 at gmail.com (Andrew Warkentin)
Date: Sat, 18 May 2024 20:26:54 -0600
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240519012114.GU9216@mcvoy.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
Message-ID: <CAD-qYGp0Y9WxqfFP4oi5X5ZnAi4D7fttUBq+vVxDfWOxVeUsVg@mail.gmail.com>

On Sat, May 18, 2024 at 7:27 PM Larry McVoy <lm at mcvoy.com> wrote:
>
> Do any micro kernels do address space to address space bcopy()?
>

QNX and some L4-like kernels copy directly between address spaces. QNX
copies between readv()/writev()-style vectors of arbitrary length.
L4-like kernels have different forms of direct copy; Pistachio
supports copying between a collection of "strings" that are limited to
4M each. seL4 on the other hand is limited to a single page-sized
fixed buffer for each thread (I've been working on an as-yet unnamed
fork of it that supports QNX-like vectors for the OS I'm working on; I
gave up on my previous plan to use async queues and intermediary
buffers to support arbitrary-length messages in user space, since that
was turning out to be rather ugly and would have had a high risk of
priority inversion).

From tuhs at tuhs.org  Sun May 19 12:28:03 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Sat, 18 May 2024 19:28:03 -0700
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240519020256.GV9216@mcvoy.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
 <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
 <20240519020256.GV9216@mcvoy.com>
Message-ID: <5216605C-37DD-4B39-9363-4DF9327FEEAB@iitbombay.org>


On May 18, 2024, at 7:02 PM, Larry McVoy <lm at mcvoy.com> wrote:
> 
> On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote:
>> On May 18, 2024, at 6:21???PM, Larry McVoy <lm at mcvoy.com> wrote:
>>> 
>>> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
>>>> [1] This brings up a separate point: in a microkernel even a simple
>>>> thing like "foo | bar" would require a third process - a "pipe
>>>> service", to buffer up the output of foo! You may have reduced
>>>> the overhead of individual syscalls but you will have more of
>>>> cross-domain calls!
>>> 
>>> Do any micro kernels do address space to address space bcopy()?
>> 
>> mmapping the same page in two processes won't be hard but now
>> you have complicated cat (or some iolib)!
> 
> I recall asking Linus if that could be done to save TLB entries, as in
> multiple processes map a portion of their address space (at the same
> virtual location) and then they all use the same TLB entries for that
> part of their address space.  He said it couldn't be done because the
> process ID concept was hard wired into the TLB.  I don't know if TLB
> tech has evolved such that a single process could have multiple "process"
> IDs associated with it in the TLB.

Two TLB entries can point to the same physical page. Is that not good
enough? One process can give its address space a..b and the kernel
(or the memory daemon) maps a..b to other process'es a'..b'. a..b may
be associated with a file so any IO would have to be seen by both.

> I wanted it because if you could share part of your address space with
> another process, using the same TLB entries, then motivation for threads
> could go away (I've never been a threads fan but I acknowledge why
> you might need them).  I was channeling Rob's "If you think you need
> threads, your processes are too fat".

> The idea of using processes instead of threads falls down when you
> consider TLB usage.  And TLB usage, when you care about performance, is
> an issue.  I could craft you some realistic benchmarks, mirroring real
> world work loads, that would kill the idea of replacing threads with
> processes unless they shared TLB entries.  Think of a N-way threaded
> application, lots of address space used, that application uses all of the
> TLB.  Now do that with N processes and your TLB is N times less effective.
> 
> This was a conversation decades ago so maybe TLB tech now has solved this.
> I doubt it, if this was a solved problem I think every OS would say screw
> threads, just use processes and mmap().  The nice part of that model
> is you can choose what parts of your address space you want to share.
> That cuts out a HUGE swath of potential problems where another thread
> can go poke in a part of your address space that you don't want poked.

You can sort of evolve plan9's rfork to do a partial address share.
The issue with process vs thread is the context switch time. Sharing
pages doesn't change that.

From andreww591 at gmail.com  Sun May 19 12:53:39 2024
From: andreww591 at gmail.com (Andrew Warkentin)
Date: Sat, 18 May 2024 20:53:39 -0600
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <20240519020256.GV9216@mcvoy.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
 <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
 <20240519020256.GV9216@mcvoy.com>
Message-ID: <CAD-qYGp5Hw2E8O7QkVq86uwa=_wef7c1tS3OgpuZA_GEvHhX3g@mail.gmail.com>

On Sat, May 18, 2024 at 8:03 PM Larry McVoy <lm at mcvoy.com> wrote:
>
> On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote:
> > On May 18, 2024, at 6:21???PM, Larry McVoy <lm at mcvoy.com> wrote:
> > >
> > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
> > >> [1] This brings up a separate point: in a microkernel even a simple
> > >> thing like "foo | bar" would require a third process - a "pipe
> > >> service", to buffer up the output of foo! You may have reduced
> > >> the overhead of individual syscalls but you will have more of
> > >> cross-domain calls!
> > >
> > > Do any micro kernels do address space to address space bcopy()?
> >
> > mmapping the same page in two processes won't be hard but now
> > you have complicated cat (or some iolib)!
>
> I recall asking Linus if that could be done to save TLB entries, as in
> multiple processes map a portion of their address space (at the same
> virtual location) and then they all use the same TLB entries for that
> part of their address space.  He said it couldn't be done because the
> process ID concept was hard wired into the TLB.  I don't know if TLB
> tech has evolved such that a single process could have multiple "process"
> IDs associated with it in the TLB.
>
> I wanted it because if you could share part of your address space with
> another process, using the same TLB entries, then motivation for threads
> could go away (I've never been a threads fan but I acknowledge why
> you might need them).  I was channeling Rob's "If you think you need
> threads, your processes are too fat".
>
> The idea of using processes instead of threads falls down when you
> consider TLB usage.  And TLB usage, when you care about performance, is
> an issue.  I could craft you some realistic benchmarks, mirroring real
> world work loads, that would kill the idea of replacing threads with
> processes unless they shared TLB entries.  Think of a N-way threaded
> application, lots of address space used, that application uses all of the
> TLB.  Now do that with N processes and your TLB is N times less effective.
>
> This was a conversation decades ago so maybe TLB tech now has solved this.
> I doubt it, if this was a solved problem I think every OS would say screw
> threads, just use processes and mmap().  The nice part of that model
> is you can choose what parts of your address space you want to share.
> That cuts out a HUGE swath of potential problems where another thread
> can go poke in a part of your address space that you don't want poked.
>

I've never been a fan of the rfork()/clone() model. With the OS I'm
working on, rather than using processes that share state as threads, a
process will more or less just be a collection of threads that share a
command line and get replaced on exec(). All of the state usually
associated with a process (e.g. file descriptor space, filesystem
namespace, virtual address space, memory allocations) will instead be
stored in separate container objects that can be shared between
threads. It will be possible to share any of these containers between
processes, or use different combinations between threads within a
process. This would allow more control over what gets shared between
threads/processes than rfork()/clone() because the state containers
will appear in the filesystem and be explicitly bound to threads
rather than being anonymous and only transferred on rfork()/clone().
Emulating rfork()/clone on top of this will be easy enough though.

From mrochkind at gmail.com  Sun May 19 18:30:32 2024
From: mrochkind at gmail.com (Marc Rochkind)
Date: Sun, 19 May 2024 11:30:32 +0300
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAD-qYGp5Hw2E8O7QkVq86uwa=_wef7c1tS3OgpuZA_GEvHhX3g@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
 <20240519012114.GU9216@mcvoy.com>
 <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org>
 <20240519020256.GV9216@mcvoy.com>
 <CAD-qYGp5Hw2E8O7QkVq86uwa=_wef7c1tS3OgpuZA_GEvHhX3g@mail.gmail.com>
Message-ID: <CAOkr1zVPfswu5kQ+8xd2zEGWTsi2++kOQkkbGd82gdv0YOrhFQ@mail.gmail.com>

Yes, many classic commands -- cat, cp, and others -- were sleekly and
succinctly written.

In part because they were devoid of error checking.

I recall how annoying it was one time in the early 70s to cp a bunch of
files to a file system that was out of space.

As I grew older, my concept of what constituted elegant programming changed.

UNIX was a *research* project, not a production system!

At one of the first UNIX meetings, somebody from an OSS (operations support
system) was talking about the limitations of UNIX when Doug asked, "Why are
you using UNIX?"

Marc

On Sun, May 19, 2024, 5:54 AM Andrew Warkentin <andreww591 at gmail.com> wrote:

> On Sat, May 18, 2024 at 8:03 PM Larry McVoy <lm at mcvoy.com> wrote:
> >
> > On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote:
> > > On May 18, 2024, at 6:21???PM, Larry McVoy <lm at mcvoy.com> wrote:
> > > >
> > > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote:
> > > >> [1] This brings up a separate point: in a microkernel even a simple
> > > >> thing like "foo | bar" would require a third process - a "pipe
> > > >> service", to buffer up the output of foo! You may have reduced
> > > >> the overhead of individual syscalls but you will have more of
> > > >> cross-domain calls!
> > > >
> > > > Do any micro kernels do address space to address space bcopy()?
> > >
> > > mmapping the same page in two processes won't be hard but now
> > > you have complicated cat (or some iolib)!
> >
> > I recall asking Linus if that could be done to save TLB entries, as in
> > multiple processes map a portion of their address space (at the same
> > virtual location) and then they all use the same TLB entries for that
> > part of their address space.  He said it couldn't be done because the
> > process ID concept was hard wired into the TLB.  I don't know if TLB
> > tech has evolved such that a single process could have multiple "process"
> > IDs associated with it in the TLB.
> >
> > I wanted it because if you could share part of your address space with
> > another process, using the same TLB entries, then motivation for threads
> > could go away (I've never been a threads fan but I acknowledge why
> > you might need them).  I was channeling Rob's "If you think you need
> > threads, your processes are too fat".
> >
> > The idea of using processes instead of threads falls down when you
> > consider TLB usage.  And TLB usage, when you care about performance, is
> > an issue.  I could craft you some realistic benchmarks, mirroring real
> > world work loads, that would kill the idea of replacing threads with
> > processes unless they shared TLB entries.  Think of a N-way threaded
> > application, lots of address space used, that application uses all of the
> > TLB.  Now do that with N processes and your TLB is N times less
> effective.
> >
> > This was a conversation decades ago so maybe TLB tech now has solved
> this.
> > I doubt it, if this was a solved problem I think every OS would say screw
> > threads, just use processes and mmap().  The nice part of that model
> > is you can choose what parts of your address space you want to share.
> > That cuts out a HUGE swath of potential problems where another thread
> > can go poke in a part of your address space that you don't want poked.
> >
>
> I've never been a fan of the rfork()/clone() model. With the OS I'm
> working on, rather than using processes that share state as threads, a
> process will more or less just be a collection of threads that share a
> command line and get replaced on exec(). All of the state usually
> associated with a process (e.g. file descriptor space, filesystem
> namespace, virtual address space, memory allocations) will instead be
> stored in separate container objects that can be shared between
> threads. It will be possible to share any of these containers between
> processes, or use different combinations between threads within a
> process. This would allow more control over what gets shared between
> threads/processes than rfork()/clone() because the state containers
> will appear in the filesystem and be explicitly bound to threads
> rather than being anonymous and only transferred on rfork()/clone().
> Emulating rfork()/clone on top of this will be easy enough though.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/24d5aa1a/attachment.htm>

From mrochkind at gmail.com  Sun May 19 18:39:32 2024
From: mrochkind at gmail.com (Marc Rochkind)
Date: Sun, 19 May 2024 11:39:32 +0300
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
Message-ID: <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>

UNIX man pages were almost universally accurate, complete, and succinct.

That third admirable attribute gave me the opportunity to write *Advanced
UNIX Programming*.

So I wasn't complaining.

Marc

On Sat, May 18, 2024, 11:33 PM Steffen Nurpmeso <steffen at sdaoden.eu> wrote:

> Douglas McIlroy wrote in
>  <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg at mail.gmail.com>:
>  |I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less
> is
>  |more".
>  |       % less --help | wc
>  |      298
>  |Last time I looked, the line count was about 220. Bloat is
> self-catalyzing.
>
> I do not buy that.
> You are working on Windows and in the meantime have switched to
> one of those graphical browser monsters (i think) where each
> instance has more code active than the entire Unix history
> altogether.
>
> less(1) can now Unicode, and that is not as easy with ISO/POSIX as
> it was on Plan9 for example which simply goes UTF-8 and has some
> (smart) lookup tables (now in go, more or less, last i looked),
> but that is not the whole picture of it.
>
> It can those ANSI / ISO 6429 color sequences that everybody wants,
> as you have them everywhere, even GNU's yacc, bison.
>
> The OpenBSD people took a port done by an OpenSolaris (i think,
> that scene, anyhow) guy, and together they stripped it down
> massively.
>
> But i do not use it, because after almost exactly a decade i got
> upstreamed to Nudelman's less(1) the necessary patches to have
> active hyperlinks on the terminal, in a normal Unix (roff mdoc)
> manual.  (These work via OSC-8 escape sequences; it was a "15
> files changed, 601 insertions(+), 9 deletions(-)" patch, which
> included careful quoting of file paths etc. for man(1) openings
> (ie, such code gets lengthy), but he did it differently a bit, and
> left off some things i wanted, included others (good), but if you
> use --mouse with his one then you have a real browser feeling.
> I have problems with --mouse, unfortunately, because when used you
> can no longer copy+paste -- he would need to add clipboard control
> in addition i'd say.., adding even more code.)
>
> You know, it may be viable for some tools, but for others, .. not.
> You say it yourself in your "A Research UNIX Reader": "Electronic
> mail was there from the start.  Never satisfied with its exact
> behavior, everybody touched it at one time or another".
> In the meantime the IETF went grazy and produced masses of
> standards, and unfortunately each one adds a little bit that needs
> to be addressed differently, and all that needs documentation.
> Now mail is an extreme example.
>
> And almost a quarter of a century ago i wrote a small pager that
> even had a clock, and it required less CPU on a day with some
> scrolling than less/ncurses for a one time scroll through the
> document.  But that pager is history, and less is still there,
> running everywhere, and being used by me dozens to hundreds time
> a day.  Also with colours, with searching, and now also with
>
>   ^O^N  ^On         *  Search forward for (N-th) OSC8 hyperlink.
>   ^O^P  ^Op         *  Search backward for (N-th) OSC8 hyperlink.
>   ^O^L  ^Ol            Jump to the currently selected OSC8 hyperlink.
>
> And prepared mdoc manuals can now display on a normal Unix
> terminal in a normal (actively OSC-8 supporting $PAGER) a TOC (at
> will, with links), and have external (man:, but also http: etc;
> man is built into less(1) -- yay!) links, too.
> For example here ∞ is an external, and † are internal links:
>
>    The OpenSSL program ciphers(1)∞ should be referred to when creating a
>    custom cipher list.  Variables of interest for TLS in general are
>    tls-ca-dir†, tls-ca-file†, tls-ca-flags†, tls-ca-no-defaults†,
>    tls-config-file†, tls-config-module†, tls-config-pairs
>
> So ^O^L on that ciphers(1) opens a new man(1)ual instance.
> For all this functionality a program with 221K bytes is small:
>
>   221360 May 18 22:13 ...less*
>
> Also it starts up into interactive mode with --help.
> So you could have "full interactivity" and colours and mouse, and
> configurability to a large extend, which somehow has to be
> documented, in just 221 K bytes.
>
> I give in in that i try to have --help/-h and --long-help/-H, but
> sometimes that -h is only minimal, because a screenful of data is
> simply not enough to allow users to have a notion.
>
> So less could split the manual into a less.1 and a less-book.7.
> The same is true for bash, for sure.  (And for my little mailer.)
> But things tend to divert, and it is hard enough to keep one
> manual in sync with the codebase, especially if you develop
> focused and expert-idiotized in a one man show.
>
>  |What prompted me to look was another disheartening discovery. The "small
>  |special tool" Gnu diff has a 95-page manual!  And it doesn't cover the
>  |option I was looking up (-h). To be fair, the manual includes related
>  |programs like diff3(1), sdiff(1) and patch(1), but the original manual
> for
>  |each fit on one page.
>  --End of <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg at mail.gmail\
>  .com>
>
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/9c8b084d/attachment.htm>

From ralph at inputplus.co.uk  Sun May 19 18:58:15 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sun, 19 May 2024 09:58:15 +0100
Subject: [TUHS] The 'usage: ...' message.  (Was: On Bloat...)
In-Reply-To: <20240518182218.44ED921309@orac.inputplus.co.uk>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518182218.44ED921309@orac.inputplus.co.uk>
Message-ID: <20240519085815.5B37A20146@orac.inputplus.co.uk>

Hi,

I wrote:
> Another point against adding --help: there's a second attempt to
> describe the source.

It occurred to me --help's the third attempt as there's already ‘usage:
argv[0] ...’.  Back when running man took time and paper, I can see
a one-line summary to aid memory was useful.  I wondered when it first
appeared.

I've found V2, https://www.tuhs.org/cgi-bin/utree.pl?file=V2/cmd, has
cmp.s with

    cmp     (sp)+,$3
    beq     1f
    jsr     r5,mesg; <Usage: cmp arg1 arg2\n\0>; .even
    sys     exit

And cp.c has

    if(argc != 3) {
	    write(1,"Usage: cp oldfile newfile\n",26);
	    exit();
    }

Given the lack of options, the need for a usage message surprises me.
But then ‘cp a-src a-dest b-src b-dest ...’ used to copy files in pairs.
Perhaps when this was dropped, one too many losses?, the usage was
needed to remind users of the change.

Any earlier Unix examples known by the list?
And was ‘usage: ...’ adopted from an earlier system?

-- 
Cheers, Ralph.

From ralph at inputplus.co.uk  Sun May 19 20:41:27 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sun, 19 May 2024 11:41:27 +0100
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
Message-ID: <20240519104127.64670208CA@orac.inputplus.co.uk>

Hi,

Doug wrote:
> The underlying evil of buffered IO still lurks.  The justification is
> that it's necessary to match the characteristics of IO devices and to
> minimize system-call overhead.  The former necessity requires the
> attention of hardware designers, but the latter is in the hands of
> programmers.  What can be done to mitigate the pain of border-crossing
> into the kernel?

Has there been any system-on-chip experimentation with hardware ‘pipes’?
They have LIFOs for UARTs.  What about LIFO hardware tracking the
content of shared memory?

Registers can be written to give the base address and buffer size.
Various water marks set: every byte as it arrives versus ‘It's not worth
getting out of bed for less than 64 KiB’.  Read-only registers would
allow polling when the buffer is full or empty, or a ‘device’ could be
configured to interrupt.  Trying to read/write a byte which wasn't
‘yours’ would trap.

It would be two cores synchronising without the kernel thanks to
hardware.

-- 
Cheers, Ralph.

From douglas.mcilroy at dartmouth.edu  Mon May 20 00:03:12 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Sun, 19 May 2024 10:03:12 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
Message-ID: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>

> was ‘usage: ...’ adopted from an earlier system?

"Usage" was one of those lovely ideas, one exposure to which flips its
status from unknown to eternal truth. I am sure my first exposure was on
Unix, but I don't remember when. Perhaps because it radically departs from
Ken's "?" in qed/ed, I have subconsciously attributed it to Dennis.

The genius of "usage" and "?" is that they don't attempt to tell one what's
wrong. Most diagnostics cite a rule or hidden limit that's been violated or
describe the mistake (e.g. "missing semicolon") , sometimes raising more
questions than they answer.

Another non-descriptive style of error message that I admired was that of
Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed,
it reported where, and automatically provided a sample token that would
allow the parsing to progress. I found this uniform convention to be at
least as informative as distinct hand-crafted messages, which almost by
definition can't foresee every contingency. Alas, this elegant scheme seems
not to have inspired imitators.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/a24a5cb1/attachment.htm>

From paul.winalski at gmail.com  Mon May 20 02:04:53 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Sun, 19 May 2024 12:04:53 -0400
Subject: [TUHS] If forking is bad, how about buffering?
In-Reply-To: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
References: <CAKH6PiWr1YXhHUrgy=NW5gerDGrBnxMYEPtv1x1ho+n4FmFUzw@mail.gmail.com>
 <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>
 <20240514111032.2kotrrjjv772h5f4@illithid>
 <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
 <20240515164212.beswgy4h2nwvbdck@illithid>
 <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org>
Message-ID: <CABH=_VTWgkmY=h7PEU2yproyjnDyQB2vdPu7QRgNkUeu=g7akA@mail.gmail.com>

On Sat, May 18, 2024 at 9:04 PM Bakul Shah via TUHS <tuhs at tuhs.org> wrote:

>
> Note that even if you remove every RAM buffer between the two
> endpoints of a TCP connection, you still have a "buffer".


True, and it's unavoidable.  The full name of the virtual circuit
communication protocol is TCP/IP (Transmission Control Protocol over
Internet Protocol).  The underlying IP is the protocol used to actually
transfer the data from machine to machine.  It provides datagram service,
meaning that messages may be duplicated, lost, delivered out of order, or
delivered with errors.  The job of TCP is to provide virtual circuit
service, meaning that messages are delivered once, in order, without
errors, and reliably.

To cope with the underlying datagam service, TCP has to put error checksums
on each message, assign sequence numbers to each message, and has to send
an acknowledgement to the sender when a message is received.  It also has
to be prepared to resend messages if there's no acknowledgement or if the
ack says the message was received with errors.  You can't do all that
without buffering messages.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/f4222dc1/attachment.htm>

From paul.winalski at gmail.com  Mon May 20 02:18:07 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Sun, 19 May 2024 12:18:07 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
References: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
Message-ID: <CABH=_VQ=1rWgxFvE-qS5BXuJ1BN=rGN2mJ620vtBEiUouPTnBQ@mail.gmail.com>

On Sun, May 19, 2024 at 10:03 AM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

>
> Another non-descriptive style of error message that I admired was that of
> Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed,
> it reported where, and automatically provided a sample token that would
> allow the parsing to progress. I found this uniform convention to be at
> least as informative as distinct hand-crafted messages, which almost by
> definition can't foresee every contingency. Alas, this elegant scheme seems
> not to have inspired imitators.
>
> The hazard with this approach is that the suggested syntactic correction
might simply lead the user farther into the weeds.  It depends on how far
the parse has gone off the rails before a grammatical error is found.
Pascal and BASIC (at least the original Dartmouth BASIC) have simple,
well-behaved grammars and the suggested syntactic correction is likely to
be correct.  It doesn't work as well for more syntactically complicated
languages such as C (consider an error resulting from use of == instead of
=) or PL/I.  And it's nigh on impossible for languages with ill-behaved
grammars such as Fortran and COBOL (among other grammatical evils, Fortran
has context-sensitive lexiing).

Commercial compiler writers avoid this techniq
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/488d0361/attachment.htm>

From paul.winalski at gmail.com  Mon May 20 02:21:43 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Sun, 19 May 2024 12:21:43 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CABH=_VQ=1rWgxFvE-qS5BXuJ1BN=rGN2mJ620vtBEiUouPTnBQ@mail.gmail.com>
References: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
 <CABH=_VQ=1rWgxFvE-qS5BXuJ1BN=rGN2mJ620vtBEiUouPTnBQ@mail.gmail.com>
Message-ID: <CABH=_VTh82U1bG5ys=5n+82PvXz7aGQoqTHguGaLHNP_zP+rxw@mail.gmail.com>

Ack!  I finger-fumbled and accidentally sent this message incomplete.
Here's the complete version.  Sorry about that, Chief!

On Sun, May 19, 2024 at 12:18 PM Paul Winalski <paul.winalski at gmail.com>
wrote:

> On Sun, May 19, 2024 at 10:03 AM Douglas McIlroy <
> douglas.mcilroy at dartmouth.edu> wrote:
>
>>
>> Another non-descriptive style of error message that I admired was that of
>> Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed,
>> it reported where, and automatically provided a sample token that would
>> allow the parsing to progress. I found this uniform convention to be at
>> least as informative as distinct hand-crafted messages, which almost by
>> definition can't foresee every contingency. Alas, this elegant scheme seems
>> not to have inspired imitators.
>>
>> The hazard with this approach is that the suggested syntactic correction
> might simply lead the user farther into the weeds.  It depends on how far
> the parse has gone off the rails before a grammatical error is found.
> Pascal and BASIC (at least the original Dartmouth BASIC) have simple,
> well-behaved grammars and the suggested syntactic correction is likely to
> be correct.  It doesn't work as well for more syntactically complicated
> languages such as C (consider an error resulting from use of == instead of
> =) or PL/I.  And it's nigh on impossible for languages with ill-behaved
> grammars such as Fortran and COBOL (among other grammatical evils, Fortran
> has context-sensitive lexiing).
>
> Commercial compiler writers avoid this techniq
>

Commercial compiler writers avoid this technique because it turns into an
error report generator.  The potential user benefit is outweighed by all of
the "your compiler suggested X as a correction when the problem was Y"
that need to be answered.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/6450bc40/attachment.htm>

From ralph at inputplus.co.uk  Mon May 20 03:22:16 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sun, 19 May 2024 18:22:16 +0100
Subject: [TUHS] The 'usage: ...' message.
In-Reply-To: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
References: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
Message-ID: <20240519172216.E554E2130A@orac.inputplus.co.uk>

Hi Doug,

> Perhaps because it radically departs from Ken's "?" in qed/ed

That spread elsewhere.  When PDP-7 Unix's cp.s is given an odd number of
arguments, leaving the last unpaired, it prints the argument followed by
‘ ?’.  https://www.tuhs.org/cgi-bin/utree.pl?file=PDP7-Unix/cmd/cp.s

    mes:
       040000;077012		

    unbal:				
       lac name2			
       tad d4			
       dac 1f
       lac d1
       sys write; 1: 0; 4
       lac d1
       sys write; mes; 2
       sys exit

-- 
Cheers, Ralph.

From dave at horsfall.org  Mon May 20 06:42:09 2024
From: dave at horsfall.org (Dave Horsfall)
Date: Mon, 20 May 2024 06:42:09 +1000 (EST)
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
References: <CAKH6PiVSU4o5twt8EXZO6FAKdxMu19SOWLEyYRVa0yBM3BXAUA@mail.gmail.com>
Message-ID: <alpine.BSF.2.21.9999.2405200632070.15285@aneurin.horsfall.org>

On Sun, 19 May 2024, Douglas McIlroy wrote:

> Another non-descriptive style of error message that I admired was that 
> of Berkeley Pascal's syntax diagnostics. When the LR parser could not 
> proceed, it reported where, and automatically provided a sample token 
> that would allow the parsing to progress. I found this uniform 
> convention to be at least as informative as distinct hand-crafted 
> messages, which almost by definition can't foresee every contingency. 
> Alas, this elegant scheme seems not to have inspired imitators.

I did something like that for our compiler-writing assignment.  An 
ALGOL-like language (I think I used ALGOLW) it would detect when a 
semicolon was missing, and insert it (with a warning).  As a test case, it 
successfully compiled a program with no semicolons at all...

-- Dave

From douglas.mcilroy at dartmouth.edu  Mon May 20 09:08:12 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Sun, 19 May 2024 19:08:12 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
Message-ID: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>

>> Another non-descriptive style of error message that I admired was that
>> of Berkeley Pascal's syntax diagnostics. When the LR parser could not
>> proceed, it reported where, and automatically provided a sample token
>> that would allow the parsing to progress. I found this uniform
>> convention to be at least as informative as distinct hand-crafted
>> messages, which almost by definition can't foresee every contingency.
>> Alas, this elegant scheme seems not to have inspired imitators.

> The hazard with this approach is that the suggested syntactic correction
> might simply lead the user farther into the weeds

I don't think there's enough experience to justify this claim. Before I
experienced the Berkeley compiler, I would have thought such bad outcomes
were inevitable in any language. Although the compilers' suggestions often
bore little or no relationship to the real correction,  I always found them
informative. In particular, the utterly consistent style assured there was
never an issue of ambiguity or of technical jargon.

The compiler taught me Pascal in an evening. I had scanned the Pascal
Report a couple of years before but had never written a Pascal program.
With no manual at hand, I looked at one program to find out what
mumbo-jumbo had to come first and how to print integers, then wrote the
rest by trial and error. Within a couple of hours  I had a working program
good enough to pass muster in an ACM journal.

An example arose that one might think would lead "into the weeds". The
parser balked before 'or' in a compound Boolean expression like  'a=b and
c=d or x=y'. It couldn't suggest a right paren because no left paren had
been seen. Whatever suggestion it did make (perhaps 'then') was enough to
lead me to insert a remote left paren and teach me that parens are required
around Boolean-valued subexpressions. (I will agree that this lesson might
be less clear to a programming novice, but so might be many conventional
diagnostics, e.g. "no effect".)

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/060b437e/attachment.htm>

From robpike at gmail.com  Mon May 20 10:58:54 2024
From: robpike at gmail.com (Rob Pike)
Date: Mon, 20 May 2024 10:58:54 +1000
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
Message-ID: <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>

The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch
input. It tried automatically to keep things running after a parsing error
by inserting some token - semicolon, parenthesis, whatever seemed best -
and continuing to parse, in order to maximize the amount of input that
could be parsed before giving up. At least, that's what I took the
motivation to be. It rarely succeeded in fixing the actual problem, despite
PL/I being plastered with semicolons, but it did tend to ferret out more
errors per run. I found the tactic helpful.

-rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/5422b10b/attachment-0001.htm>

From arnold at skeeve.com  Mon May 20 13:19:02 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Sun, 19 May 2024 21:19:02 -0600
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
Message-ID: <202405200319.44K3J2Jq117819@freefriends.org>

Rob Pike <robpike at gmail.com> wrote:

> The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch
> input. It tried automatically to keep things running after a parsing error
> by inserting some token - semicolon, parenthesis, whatever seemed best -
> and continuing to parse, in order to maximize the amount of input that
> could be parsed before giving up. At least, that's what I took the
> motivation to be. It rarely succeeded in fixing the actual problem, despite
> PL/I being plastered with semicolons, but it did tend to ferret out more
> errors per run. I found the tactic helpful.
>
> -rob

Gawk used to do this, until people started fuzzing it, causing cascading
errors and eventually core dumps. Now the first syntax error is fatal.
It got to the point where I added this text to the manual:

	In recent years, people have been running "fuzzers" to generate
	invalid awk programs in order to find and report (so-called)
	bugs in gawk.

	In general, such reports are not of much practical use. The
	programs they create are not realistic and the bugs found are
	generally from some kind of memory corruption that is fatal
	anyway.

	So, if you want to run a fuzzer against gawk and report the
	results, you may do so, but be aware that such reports don’t
	carry the same weight as reports of real bugs do.

(Yeah, I've just changed the subject, feel free to stay on topic. :-)

Arnold

From imp at bsdimp.com  Mon May 20 13:43:11 2024
From: imp at bsdimp.com (Warner Losh)
Date: Sun, 19 May 2024 21:43:11 -0600
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <202405200319.44K3J2Jq117819@freefriends.org>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
Message-ID: <CANCZdfoQ7EH9mHVK=OY7_86TA1zpwNLAjHJ5V8emo3Q22U3AbQ@mail.gmail.com>

On Sun, May 19, 2024, 9:19 PM <arnold at skeeve.com> wrote:

> Rob Pike <robpike at gmail.com> wrote:
>
> > The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used
> batch
> > input. It tried automatically to keep things running after a parsing
> error
> > by inserting some token - semicolon, parenthesis, whatever seemed best -
> > and continuing to parse, in order to maximize the amount of input that
> > could be parsed before giving up. At least, that's what I took the
> > motivation to be. It rarely succeeded in fixing the actual problem,
> despite
> > PL/I being plastered with semicolons, but it did tend to ferret out more
> > errors per run. I found the tactic helpful.
> >
> > -rob
>
> Gawk used to do this, until people started fuzzing it, causing cascading
> errors and eventually core dumps. Now the first syntax error is fatal.
> It got to the point where I added this text to the manual:
>
>         In recent years, people have been running "fuzzers" to generate
>         invalid awk programs in order to find and report (so-called)
>         bugs in gawk.
>
>         In general, such reports are not of much practical use. The
>         programs they create are not realistic and the bugs found are
>         generally from some kind of memory corruption that is fatal
>         anyway.
>
>         So, if you want to run a fuzzer against gawk and report the
>         results, you may do so, but be aware that such reports don’t
>         carry the same weight as reports of real bugs do.
>
> (Yeah, I've just changed the subject, feel free to stay on topic. :-)
>


Awk bailing out near line 1.

Warner

> Arnold
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/6fa4971d/attachment.htm>

From tuhs at tuhs.org  Mon May 20 13:54:59 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Sun, 19 May 2024 20:54:59 -0700
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
Message-ID: <D0DFE2BB-D8FC-4C0E-B9C3-7E75A2CF821B@iitbombay.org>

I remember helping newbie students at USC who were very confused that even though they made the changes "PL/C USES", their program didn't work! 

> On May 19, 2024, at 5:58 PM, Rob Pike <robpike at gmail.com> wrote:
> 
> The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch input. It tried automatically to keep things running after a parsing error by inserting some token - semicolon, parenthesis, whatever seemed best - and continuing to parse, in order to maximize the amount of input that could be parsed before giving up. At least, that's what I took the motivation to be. It rarely succeeded in fixing the actual problem, despite PL/I being plastered with semicolons, but it did tend to ferret out more errors per run. I found the tactic helpful.
> 
> -rob
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/c21e6739/attachment.htm>

From arnold at skeeve.com  Mon May 20 14:46:53 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Sun, 19 May 2024 22:46:53 -0600
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CANCZdfoQ7EH9mHVK=OY7_86TA1zpwNLAjHJ5V8emo3Q22U3AbQ@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
 <CANCZdfoQ7EH9mHVK=OY7_86TA1zpwNLAjHJ5V8emo3Q22U3AbQ@mail.gmail.com>
Message-ID: <202405200446.44K4kr1Q124396@freefriends.org>

Warner Losh <imp at bsdimp.com> wrote:

> > (Yeah, I've just changed the subject, feel free to stay on topic. :-)
>
> Awk bailing out near line 1.

	$ gawk --nostalgia
	awk: bailing out near line 1
	Aborted (core dumped)

A very long time Easter Egg... :-)

Arnold

From athornton at gmail.com  Mon May 20 16:07:37 2024
From: athornton at gmail.com (Adam Thornton)
Date: Sun, 19 May 2024 23:07:37 -0700
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
 <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
Message-ID: <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>

I can't tell you--although some of you will know--what a delight it is to
be working on a project with an actual documentation engineer.

That person (Jonathan Sick, if any of you want to hire him) has engineered
things such that it is easy to write good documentation for the projects we
write, and not very onerous.

He's put in an enormous amount of effort to ensure that if we write
reasonably clean code, we can also auto-generate accurate and complete API
documentation for it.  And to the degree we want to write explanatory docs,
that's catered for too.

It has been an amazing experience compared to my entire prior history.

Adam
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240519/54cfda86/attachment.htm>

From ralph at inputplus.co.uk  Mon May 20 19:20:13 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Mon, 20 May 2024 10:20:13 +0100
Subject: [TUHS] A fuzzy awk.  (Was: The 'usage: ...' message.)
In-Reply-To: <202405200319.44K3J2Jq117819@freefriends.org>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
Message-ID: <20240520092013.21BD01FB2F@orac.inputplus.co.uk>

Hi Arnold,

> > in order to maximize the amount of input that could be parsed before
> > giving up.
>
> Gawk used to do this, until people started fuzzing it, causing
> cascading errors and eventually core dumps.  Now the first syntax
> error is fatal.

This is the first time I've heard of making life difficult for fuzzers
so I'm curious...

I'm assuming you agree the eventual core dump was a bug somewhere to be
fixed, and probably was.  Stopping on the first error lessens the
‘attack surface’ for the fuzzer.  Do you think there remains a bug which
would bite a user which the fuzzer might have found more easily before
the shrunken surface?

-- 
Cheers, Ralph.

From arnold at skeeve.com  Mon May 20 21:58:51 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Mon, 20 May 2024 05:58:51 -0600
Subject: [TUHS] A fuzzy awk.  (Was: The 'usage: ...' message.)
In-Reply-To: <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
 <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
Message-ID: <202405201158.44KBwpi6166059@freefriends.org>

Ralph Corderoy <ralph at inputplus.co.uk> wrote:

> This is the first time I've heard of making life difficult for fuzzers
> so I'm curious...

I was making life easier for me. :-)

> I'm assuming you agree the eventual core dump was a bug somewhere to be
> fixed, and probably was.

Not really. Hugely syntactically invalid programs can end up causing
memory corruption as necessary data structures don't get built correctly
(or at all); since they're invalid, subsequent bits of gawk that expect
valid data structures end up not working.  These are "bugs" that can't
happen when using the tool correctly.

> Stopping on the first error lessens the ‘attack surface’ for the
> fuzzer.  Do you think there remains a bug which would bite a user which
> the fuzzer might have found more easily before the shrunken surface?

No.

I don't have any examples handy, but you can look back through the
bug-gawk archives for some examples of these reports.  The number
of true bugs that fuzzers have caught (if any!) could be counted
on one hand.

Sometimes they like to claim that the "bugs" they find could cause
denial of service attacks. That's also specious, gawk isn't used for
long-running server kinds of programs.

The joys of being a Free Software Maintainer.

Arnold

P.S. I don't claim that gawk is bug-free.  But I do think that there
are qualitatively different kinds of bugs, and bug reports.

From douglas.mcilroy at dartmouth.edu  Mon May 20 23:06:30 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Mon, 20 May 2024 09:06:30 -0400
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
Message-ID: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>

I'm surprised by nonchalance about bad inputs evoking bad program behavior.
That attitude may have been excusable 50 years ago. By now, though, we have
seen so much malicious exploitation of open avenues of "undefined behavior"
that we can no longer ignore bugs that "can't happen when using the tool
correctly". Mature software should not brook incorrect usage.

"Bailing out near line 1" is a sign of defensive precautions. Crashes and
unjustified output betray their absence.

I commend attention to the LangSec movement, which advocates for rigorously
enforced separation between legal and illegal inputs.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/8e3d3815/attachment.htm>

From chet.ramey at case.edu  Mon May 20 23:10:03 2024
From: chet.ramey at case.edu (Chet Ramey)
Date: Mon, 20 May 2024 09:10:03 -0400
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
In-Reply-To: <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
 <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
Message-ID: <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu>

On 5/20/24 5:20 AM, Ralph Corderoy wrote:
> Hi Arnold,
> 
>>> in order to maximize the amount of input that could be parsed before
>>> giving up.
>>
>> Gawk used to do this, until people started fuzzing it, causing
>> cascading errors and eventually core dumps.  Now the first syntax
>> error is fatal.
> 
> This is the first time I've heard of making life difficult for fuzzers
> so I'm curious...

It's not making life difficult for them -- they can still fuzz all they
want. Chances are better they'll find a genuine bug if you stop right away.


> I'm assuming you agree the eventual core dump was a bug somewhere to be
> fixed, and probably was.  > Stopping on the first error lessens the
> ‘attack surface’ for the fuzzer.  Do you think there remains a bug which
> would bite a user which the fuzzer might have found more easily before
> the shrunken surface?

Chances are small. (People fuzz bash all the time, and that is my
experience.)

Look at it this way. Free Software maintainers have limited resources. Is
it better to spend time on bugs that will affect a larger percentage of
the user population, instead of those that require artificial circumstances
that won't be encountered by normal usage? Those get pushed down on the
priority list.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/c3a3e713/attachment.sig>

From arnold at skeeve.com  Mon May 20 23:14:07 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Mon, 20 May 2024 07:14:07 -0600
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
In-Reply-To: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
Message-ID: <202405201314.44KDE7rq170661@freefriends.org>

Perhaps I should not respond to this immediately. But:

Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:

> I'm surprised by nonchalance about bad inputs evoking bad program behavior.
> That attitude may have been excusable 50 years ago. By now, though, we have
> seen so much malicious exploitation of open avenues of "undefined behavior"
> that we can no longer ignore bugs that "can't happen when using the tool
> correctly". Mature software should not brook incorrect usage.

It's not nonchalance, not at all!

The current behavior is to die on the first syntax error, instead of
trying to be "helpful" by continuing to try to parse the program in the
hope of reporting other errors.

> "Bailing out near line 1" is a sign of defensive precautions. Crashes and
> unjustified output betray their absence.

The crashes came because errors cascaded.  I don't see a reason to spend
valuable, *personal* time on adding defenses *where they aren't needed*.

A steel door on your bedroom closet does no good if your front door
is made of balsa wood. My change was to stop the badness at the
front door.

> I commend attention to the LangSec movement, which advocates for rigorously
> enforced separation between legal and illegal inputs.

Illegal input, in gawk, as far as I know, should always cause a syntax
error report and an immediate exit.

If it doesn't, that is a bug, and I'll be happy to try to fix it.

I hope that clarifies things.

Arnold

From chet.ramey at case.edu  Mon May 20 23:25:11 2024
From: chet.ramey at case.edu (Chet Ramey)
Date: Mon, 20 May 2024 09:25:11 -0400
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
In-Reply-To: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
Message-ID: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu>

On 5/20/24 9:06 AM, Douglas McIlroy wrote:
> I'm surprised by nonchalance about bad inputs evoking bad program behavior.

I think the claim is that it's better to stop immediately with an error
on invalid input rather than guess at the user's intent and try to go on.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/2e9a0285/attachment.sig>

From ralph at inputplus.co.uk  Mon May 20 23:30:17 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Mon, 20 May 2024 14:30:17 +0100
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
 <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
 <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu>
Message-ID: <20240520133017.BFA761FB2F@orac.inputplus.co.uk>

Hi Chet,

> Is it better to spend time on bugs that will affect a larger
> percentage of the user population, instead of those that require
> artificial circumstances that won't be encountered by normal usage?
> Those get pushed down on the priority list.

You're talking about pushing unlikely, fuzzed bugs down the prioritised
list, but we're discussing those bugs not getting onto the list for
consideration.

Lack of resources also applies to triaging bugs and I agree a fuzzed bug
which hands over a 42 KiB of dense, gibberish awk will probably not get
volunteer attention.  But then fuzzers can seek a smaller test case,
similar to Andreas Zeller's delta debugging.

I'm in no way criticising Arnold who, like you, has spent many years
voluntarily enhancing a program many of us use every day.  But it's
interesting to shine some light on this corner to better understand
what's happening.

-- 
Cheers, Ralph.

From ralph at inputplus.co.uk  Mon May 20 23:41:55 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Mon, 20 May 2024 14:41:55 +0100
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu>
Message-ID: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk>

Hi Chet,

> Doug wrote:
> > I'm surprised by nonchalance about bad inputs evoking bad program
> > behavior.
>
> I think the claim is that it's better to stop immediately with an
> error on invalid input rather than guess at the user's intent and try
> to go on.

That aside, having made the decision to patch up the input so more
punched cards are consumed, the patch should be bug free.

Say it's inserting a semicolon token for pretence.  It should have
initialised source-file locations just as if it were real.  Not an
uninitialised pointer to a source filename so a later dereference
failed.

I can see an avalanche of errors in an earlier gawk caused problems, but
each time there would have been a first patch of the input which made
a mistake causing the pebble to start rolling.  My understanding is that
there was potentially a lot of these and rather than fix them it was
more productive of the limited time to stop patching the input.  Then
the code which patched could be deleted, getting rid of the buggy bits
along the way?

-- 
Cheers, Ralph.

From chet.ramey at case.edu  Mon May 20 23:48:12 2024
From: chet.ramey at case.edu (Chet Ramey)
Date: Mon, 20 May 2024 09:48:12 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <20240520133017.BFA761FB2F@orac.inputplus.co.uk>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <202405200319.44K3J2Jq117819@freefriends.org>
 <20240520092013.21BD01FB2F@orac.inputplus.co.uk>
 <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu>
 <20240520133017.BFA761FB2F@orac.inputplus.co.uk>
Message-ID: <2bfd3b3e-5e0a-4685-9dda-63fc6546e46a@case.edu>

On 5/20/24 9:30 AM, Ralph Corderoy wrote:
> Hi Chet,
> 
>> Is it better to spend time on bugs that will affect a larger
>> percentage of the user population, instead of those that require
>> artificial circumstances that won't be encountered by normal usage?
>> Those get pushed down on the priority list.
> 
> You're talking about pushing unlikely, fuzzed bugs down the prioritised
> list, but we're discussing those bugs not getting onto the list for
> consideration.

I think the question is whether they were bugs in gawk at all, or the
result of gawk trying to be helpful by guessing at the script's intent
and trying to go on. Arnold's reaction to that, which had these negative
effects most often as the result of fuzzing attempts, was to exit on the
first syntax error.

Would those `bugs' have manifested themselves if gawk hadn't tried to do
this? Are they bugs at all? Guessing at intent is bound to be wrong some
of the time, and cause errors of its own.

I'm saying that fuzzing does occasionally find obscure bugs -- bugs that
would never be encountered in normal usage -- and those should be fixed.
Eventually.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/5a00b372/attachment.sig>

From ralph at inputplus.co.uk  Mon May 20 23:54:04 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Mon, 20 May 2024 14:54:04 +0100
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
Message-ID: <20240520135404.1B4181FB2F@orac.inputplus.co.uk>

Hi,

Doug wrote:
> I commend attention to the LangSec movement, which advocates for
> rigorously enforced separation between legal and illegal inputs.

    https://langsec.org

   ‘The Language-theoretic approach (LangSec) regards the Internet
    insecurity epidemic as a consequence of ‘ad hoc’ programming of
    input handling at all layers of network stacks, and in other kinds
    of software stacks.  LangSec posits that the only path to
    trustworthy software that takes untrusted inputs is treating all
    valid or expected inputs as a formal language, and the respective
    input-handling routines as a ‘recognizer’ for that language.
    The recognition must be feasible, and the recognizer must match the
    language in required computation power.

   ‘When input handling is done in ad hoc way, the ‘de facto’
    recognizer, i.e. the input recognition and validation code ends up
    scattered throughout the program, does not match the programmers'
    assumptions about safety and validity of data, and thus provides
    ample opportunities for exploitation.  Moreover, for complex input
    languages the problem of full recognition of valid or expected
    inputs may be *undecidable*, in which case no amount of
    input-checking code or testing will suffice to secure the program.
    Many popular protocols and formats fell into this trap, the
    empirical fact with which security practitioners are all too
    familiar.

   ‘LangSec helps draw the boundary between protocols and API designs
    that can and cannot be secured and implemented securely, and charts
    a way to building truly trustworthy protocols and systems.  A longer
    summary of LangSec in this USENIX Security BoF hand-out, and in the
    talks, articles, and papers below.’

That does look interesting; I'd not heard of it.

-- 
Cheers, Ralph.

From g.branden.robinson at gmail.com  Tue May 21 00:00:47 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Mon, 20 May 2024 09:00:47 -0500
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
In-Reply-To: <202405201314.44KDE7rq170661@freefriends.org>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <202405201314.44KDE7rq170661@freefriends.org>
Message-ID: <20240520140047.4x4lwzs6wmo34uge@illithid>

Hi folks,

At 2024-05-20T07:14:07-0600, arnold at skeeve.com wrote:
> Douglas McIlroy <douglas.mcilroy at dartmouth.edu> wrote:
> > I'm surprised by nonchalance about bad inputs evoking bad program
> > behavior.  That attitude may have been excusable 50 years ago. By
> > now, though, we have seen so much malicious exploitation of open
> > avenues of "undefined behavior" that we can no longer ignore bugs
> > that "can't happen when using the tool correctly". Mature software
> > should not brook incorrect usage.
> 
> It's not nonchalance, not at all!
> 
> The current behavior is to die on the first syntax error, instead of
> trying to be "helpful" by continuing to try to parse the program in
> the hope of reporting other errors.
[...]
> The crashes came because errors cascaded.  I don't see a reason to
> spend valuable, *personal* time on adding defenses *where they aren't
> needed*.
> 
> A steel door on your bedroom closet does no good if your front door is
> made of balsa wood. My change was to stop the badness at the front
> door.
> 
> > I commend attention to the LangSec movement, which advocates for
> > rigorously enforced separation between legal and illegal inputs.
> 
> Illegal input, in gawk, as far as I know, should always cause a syntax
> error report and an immediate exit.
> 
> If it doesn't, that is a bug, and I'll be happy to try to fix it.
> 
> I hope that clarifies things.

For grins, and for a data point from elsewhere in GNU-land, GNU troff is
pretty robust to this sort of thing.  Much as I might like to boast of
having improved it in this area, it appears to have already come with
iron long johns courtesy of James Clark and/or Werner Lemberg.  I threw
troff its own ELF executable as a crude fuzz test some years ago, and I
don't recall needing to fix anything except unhelpfully vague diagnostic
messages (a phenomenon I am predisposed to observe anyway).

I did notice today that in one case we were spewing back out unprintable
characters (newlines, character codes > 127) _in_ one (but only one) of
the diagnostic messages, and while that's ugly, it's not an obvious
exploitation vector to me.

Nevertheless I decided to fix it and it will be in my next push.

So here's the mess you get when feeding GNU troff to itself.  No GNU
troff since before 1.22.3 core dumps on this sort of unprepossessing
input.

$ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' | sort | uniq -c
     17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape sequence parameter
     10 troff:/usr/bin/troff: error: a space character is not allowed in an escape sequence parameter
      1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier
      1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting delimiter
      1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression escape sequence
      1 troff:/usr/bin/troff: error: invalid positional argument number ','
      3 troff:/usr/bin/troff: error: invalid positional argument number '<'
      3 troff:/usr/bin/troff: error: invalid positional argument number 'D'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'E'
     10 troff:/usr/bin/troff: error: invalid positional argument number 'H'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'I9'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'L'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'LD'
      2 troff:/usr/bin/troff: error: invalid positional argument number 'LL'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'LT'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'M'
      4 troff:/usr/bin/troff: error: invalid positional argument number 'P'
      5 troff:/usr/bin/troff: error: invalid positional argument number 'X'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'dH'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'h'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'l'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'p'
      1 troff:/usr/bin/troff: error: invalid positional argument number 'x'
      3 troff:/usr/bin/troff: error: invalid positional argument number '|'
     35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable)
      3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence

The second to last (and most frequent) message in the list above is the
"new" one.  Here's the diff.

diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8d828a01e..596ecf6f9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm)
   }
   else {
     const char *p;
-    for (p = s; *p && csdigit(*p); p++)
-      ;
-    if (*p)
-      copy_mode_error("invalid positional argument number '%1'", s);
+    bool is_valid = true;
+    bool is_printable = true;
+    for (p = s; *p != 0 /* nullptr */; p++) {
+      if (!csdigit(*p))
+       is_valid = false;
+      if (!csprint(*p))
+       is_printable = false;
+    }
+    if (!is_valid) {
+      const char msg[] = "invalid positional argument number";
+      if (is_printable)
+       copy_mode_error("%1 '%2'", msg, s);
+      else
+       copy_mode_error("%1 (unprintable)", msg);
+    }
     else
       input_stack::push(input_stack::get_arg(atoi(s)));
   }

GNU troff may have started out with an easier task in this area than an
AWK or a shell had; its syntax is not block-structured in the same way,
so parser state recovery is easier, and it's _inherently_ a filter.

The only fruitful fuzz attack on groff I can recall was upon indexed
bibliographic database files, which are a binary format.  This went
unresolved for several years[1] but I fixed it for groff 1.23.0.

https://bugs.debian.org/716109

Regards,
Branden

[1] I think I understand the low triage priority.  Few groff users use
    the refer(1) preprocessor, and of those who do, even fewer find
    modern systems so poorly performant at text scanning that they
    desire the services of indxbib(1) to speed lookup of bibliographic
    entries.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/14a220d5/attachment-0001.sig>

From stewart at serissa.com  Tue May 21 00:09:11 2024
From: stewart at serissa.com (Serissa)
Date: Mon, 20 May 2024 10:09:11 -0400
Subject: [TUHS] A fuzzy awk.
Message-ID: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>

Well this is obviously a hot button topic.  AFAIK I was nearby when fuzz-testing for software was invented. I was the main advocate for hiring Andy Payne into the Digital Cambridge Research Lab.  One of his little projects was a thing that generated random but correct C programs and fed them to different compilers or compilers with different switches to see if they crashed or generated incorrect results.  Overnight, his tester filed 300 or so bug reports against the Digital C compiler.  This was met with substantial pushback, but it was a mostly an issue that many of the reports traced to the same underlying bugs.

Bill McKeemon expanded the technique and published "Differential Testing of Software" https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf

Andy had encountered the underlying idea while working as an intern on the Alpha processor development team.  Among many other testers, they used an architectural tester called REX to generate more or less random sequences of instructions, which were then run through different simulation chains (functional, RTL, cycle-accurate) to see if they did the same thing.  Finding user-accessible bugs in hardware seems like a good thing.

The point of generating correct programs (mentioned under the term LangSec here) goes a long way to avoid irritating the maintainers.  Making the test cases short is also maintainer-friendly.  The test generator is also in a position to annotate the source with exactly what it is supposed to do, which is also helpful.

-L


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/c1037ade/attachment.htm>

From clemc at ccc.com  Tue May 21 00:23:56 2024
From: clemc at ccc.com (Clem Cole)
Date: Mon, 20 May 2024 10:23:56 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
Message-ID: <CAC20D2O7iqmEt_UQLqDXb6RhJVQNgDuSWjihP_KjfyCpzyazxw@mail.gmail.com>

I was going to keep silent on this one until I realized I disagree with
both Doug and Rob here (always a little dangerous). But because of personal
experience, I have a pretty strong opinion is not really a win.  Note that I
cribbed this email response from an answer I wrote on Quora to the
question:

*When you are programming and commit a minor error, such as forgetting a
semicolon, the compiler throws an error and makes you fix it for yourself.
Why doesn’t it just fix it by itself and notify you of the fix instead?*


FWIW: The first version of the idea that I now about was DWIM - *Do What I
Mean* feature from BBN’s LISP (that eventually made it into InterLISP). As
the Wikipedia page describes DWIM became known as "Damn Warren's Infernal
Machine" [more details in the DWIM section of the jargon file].  As Doug
points out, the original Pascal implementation for Unix, pix(1), also
supported this idea of fixing your code for you, and as Rob points out, UCB’s
pix(1) took the idea of trying to keep going and make the compile work from
the earlier Cornell PL/C compiler for the IBM 360[1], which to quote
Wikipedia:

“The PL/C compiler had the unusual capability of never failing to compile
> any program, through the use of extensive automatic correction of many
> syntax errors and by converting any remaining syntax errors to output
> statements.”


The problem is that people can be lazy, and instead of using " DWIM" as a
tool to speed up their development and fix their own errors, they just
ignore the errors. In fact, when we were teaching the “Intro to CS” course
at UCB in the early 1980s; we actually had students turn in programs that
had syntax errors in them because pix(1) had corrected their code -- instead
of the student fixing his/her code before handing the program into the TA
(and then they would complain when they got “marked down” on the assignment
— sigh).

IMO: All in all, the experiment failed because many (??most??) people
really don’t work that way. Putting a feature like this in an IDE or even
an editor like emacs might be reasonable since the sources would be
modified, but it means you need to like using an IDE. I also ask --> what
happens when the computer’s (IDE) guess is different from the programmer's
real intent, and since it was ‘fixed’ behind the curtain, you don’t notice
it?

Some other people have suggested that DWIM isn’t a little like spelling
‘auto-correct’ or tools like ‘Grammarly.’ The truth is, I have a love/hate
relationship with auto-correct, particularly on my mobile devices. I'm
dyslexic, so tools like this can be helpful to me sometimes, but I spend a
great deal of my time fighting these types of tools because they are so
often wrong, particularly with a small screen/keyboard, that it is just
“not fun.”

This brings me back to my experience. IMO, auto-correct for programming is
like DWIM all over again, and the cure causes more problems than it solves.

Clem

[1] I should add that after Cornell’s PL/C compiler was introduced, IBM
eventually added a similar feature to its own PL/1, although it was not
nearly as extensive as the Cornell solution. I’m sure you can find people
who liked it, but in both cases, I personally never found it that useful.

>
ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/6874536d/attachment.htm>

From chet.ramey at case.edu  Tue May 21 00:26:09 2024
From: chet.ramey at case.edu (Chet Ramey)
Date: Mon, 20 May 2024 10:26:09 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu>
 <20240520134155.7A06E1FB2F@orac.inputplus.co.uk>
Message-ID: <4eb98dcf-a241-44e9-8f73-30a97ac1a353@case.edu>

On 5/20/24 9:41 AM, Ralph Corderoy wrote:
> Hi Chet,
> 
>> Doug wrote:
>>> I'm surprised by nonchalance about bad inputs evoking bad program
>>> behavior.
>>
>> I think the claim is that it's better to stop immediately with an
>> error on invalid input rather than guess at the user's intent and try
>> to go on.
> 
> That aside, having made the decision to patch up the input so more
> punched cards are consumed, the patch should be bug free.
> 
> Say it's inserting a semicolon token for pretence.  It should have
> initialised source-file locations just as if it were real.  Not an
> uninitialised pointer to a source filename so a later dereference
> failed.
> 
> I can see an avalanche of errors in an earlier gawk caused problems, but
> each time there would have been a first patch of the input which made
> a mistake causing the pebble to start rolling.  My understanding is that
> there was potentially a lot of these and rather than fix them it was
> more productive of the limited time to stop patching the input.  Then
> the code which patched could be deleted, getting rid of the buggy bits
> along the way?

Maybe we're talking about the same thing. My impression is that at
each point there was more than one potential token to insert and go on,
and gawk chose one (probably the most common one), in the hopes that it
would be able to report as many errors as possible. There's always the
chance you'll be wrong there.

(I have no insight into the actual nature of these issues, or the actual
corruption that caused the crashes, so take the next with skepticism.)

And then rather than go back and modify other state after inserting
this token -- which gawk did not do -- for the sole purpose of making
this guessing more crash-resistant, Arnold chose a different approach:
exit on invalid input.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet at case.edu    http://tiswww.cwru.edu/~chet/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature.asc
Type: application/pgp-signature
Size: 203 bytes
Desc: OpenPGP digital signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/62bb1a35/attachment.sig>

From ake.nordin at netia.se  Tue May 21 01:39:02 2024
From: ake.nordin at netia.se (=?UTF-8?Q?=C3=85ke_Nordin?=)
Date: Mon, 20 May 2024 17:39:02 +0200
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <20240520135404.1B4181FB2F@orac.inputplus.co.uk>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <20240520135404.1B4181FB2F@orac.inputplus.co.uk>
Message-ID: <a7803330-e7ef-457b-822f-9b035028a2b5@netia.se>

On 2024-05-20 15:54, Ralph Corderoy wrote:

> Doug wrote:
>> I commend attention to the LangSec movement, which advocates for
>> rigorously enforced separation between legal and illegal inputs.
>     https://langsec.org
>
>    ‘The Language-theoretic approach (LangSec) regards the Internet
>     insecurity epidemic as a consequence of ‘ad hoc’ programming of
>     input handling at all layers of network stacks, and in other kinds
>     of software stacks.  LangSec posits that the only path to
>     trustworthy software that takes untrusted inputs is treating all
>     valid or expected inputs as a formal language, and the respective
>     input-handling routines as a ‘recognizer’ for that language.

. . .

>    ‘LangSec helps draw the boundary between protocols and API designs
>     that can and cannot be secured and implemented securely, and charts
>     a way to building truly trustworthy protocols and systems.  A longer
>     summary of LangSec in this USENIX Security BoF hand-out, and in the
>     talks, articles, and papers below.’

Yes, it's an interesting concept. Those *n?x tools that have
lex/yacc frontends are probably closer to this than the average
hack.

It may become hard to reconcile this with the robustness principle 
(Be conservative in what you send, be liberal in what you accept)
that Jon Postel popularized. Maybe it becomes necessary, though.

-- 
Åke Nordin <ake.nordin at netia.se>, resident Net/Lunix/telecom geek.
Netia Data AB, Stockholm SWEDEN *46#7O466OI99#


From paul.winalski at gmail.com  Tue May 21 01:43:49 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Mon, 20 May 2024 11:43:49 -0400
Subject: [TUHS] Documentation (was On Bloat and the Idea of Small
 Specialized Tools)
In-Reply-To: <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
 <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
 <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>
Message-ID: <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>

On Mon, May 20, 2024 at 2:08 AM Adam Thornton <athornton at gmail.com> wrote:

> I can't tell you--although some of you will know--what a delight it is to
> be working on a project with an actual documentation engineer.
>
> That person (Jonathan Sick, if any of you want to hire him) has engineered
> things such that it is easy to write good documentation for the projects we
> write, and not very onerous.
>
> Design for documentability, testability, and ease of maintenance are what
distinguishes good software engineering from hackery.

Back when I worked in DEC's software development tools group, we had
professional technical writers who write the manuals and online help text.
There was an unexpected (at least by me) benefit during a project's design
phase, too.  Documentation was written in parallel with the code, so once
the user interface specification was arrived at, first order of business
was to sit down with the tech writer and explain it to them.  Sometimes in
the process of doing that youd stop and think, "wait a minute--we don't
really want it doing that".  Or you'd find that you bhad difficulty
articulating exactly how a particular feature behaves.  That's a red flag
that you've designed the feature to be too obscure and complex, or that
there's something flat-out wrong with it.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/7133f5fe/attachment.htm>

From paul.winalski at gmail.com  Tue May 21 02:06:49 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Mon, 20 May 2024 12:06:49 -0400
Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.)
In-Reply-To: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
Message-ID: <CABH=_VSDFUhcXnH707nTP3UrQyQBXDdZBFJ077EizFCO1eM=Xw@mail.gmail.com>

On Mon, May 20, 2024 at 9:17 AM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

> I'm surprised by nonchalance about bad inputs evoking bad program
> behavior. That attitude may have been excusable 50 years ago. By now,
> though, we have seen so much malicious exploitation of open avenues of
> "undefined behavior" that we can no longer ignore bugs that "can't happen
> when using the tool correctly". Mature software should not brook incorrect
> usage.
>
> Accepting bad inputs can also lead to security issues.  The data breaches
from SQL-based attacks are a modern case in point.

IMO, as a programmer you owe it to your users to do your best to detect bad
input and to handle it in a graceful fashion.  Nothing is more frustrating
to a user than to have a program blow up in their face with a seg fault, or
even worse, simply exit silently.

As the DEC compiler team's expert on object files, I was called on to add
object file support to a compiler back end originally targeted to VMS
only.  I inherited support of the object file generator for Unix COFF and
later wrote the support for Microsoft PECOFF and ELF.  When our group was
bought by Intel I did the object file support for Apple OS X MACH-O in the
Intel compiler back end.

I found that the folks who write linkers are particularly lazy about error
checking and error handling.  They assume that the compiler always
generates clean object files.  That's OK I suppose if the compiler and
linker people are in the same organization.  If the linker falls over you
can just go down the hall and have the linker developer debug the issue and
tell you where you went wrong.  But that doesn't work when they work for
different companies and the compiler person doesn't have access to the
linker sources.  I ran into a lot of cases where my buggy object file
caused the linker to seg fault or, even worse, simply exit without an error
message.

I ended up writing a very thorough formatted dumper for each object file
format that did very thorough checking for proper syntax and as many
semantic errors (e.g., symbol table index number out of range) as I could.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/7912ce88/attachment.htm>

From benjamin.p.kallus.gr at dartmouth.edu  Tue May 21 02:09:54 2024
From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus)
Date: Mon, 20 May 2024 12:09:54 -0400
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <a7803330-e7ef-457b-822f-9b035028a2b5@netia.se>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <20240520135404.1B4181FB2F@orac.inputplus.co.uk>
 <a7803330-e7ef-457b-822f-9b035028a2b5@netia.se>
Message-ID: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>

> It may become hard to reconcile this with the robustness principle
> (Be conservative in what you send, be liberal in what you accept)
> that Jon Postel popularized. Maybe it becomes necessary, though.

Yes; the LangSec people essentially reject the robustness principle.

See https://langsec.org/papers/postel-patch.pdf

-Ben

From andrew at humeweb.com  Tue May 21 02:37:55 2024
From: andrew at humeweb.com (Andrew Hume)
Date: Mon, 20 May 2024 09:37:55 -0700
Subject: [TUHS] Documentation (was On Bloat and the Idea of Small
 Specialized Tools)
In-Reply-To: <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
 <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
 <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>
 <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>
Message-ID: <10EC571B-A75C-47EE-BECC-1B1800B9843C@humeweb.com>


> On May 20, 2024, at 8:43 AM, Paul Winalski <paul.winalski at gmail.com> wrote:
>   Sometimes in the process of doing that youd stop and think, "wait a minute--we don't really want it doing that".  Or you'd find that you bhad difficulty articulating exactly how a particular feature behaves.  That's a red flag that you've designed the feature to be too obscure and complex, or that there's something flat-out wrong with it.

that’s what i used doug mcilroy for! i especially remember that for mk(1).


From woods at robohack.ca  Tue May 21 03:30:54 2024
From: woods at robohack.ca (Greg A. Woods)
Date: Mon, 20 May 2024 10:30:54 -0700
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAC20D2O7iqmEt_UQLqDXb6RhJVQNgDuSWjihP_KjfyCpzyazxw@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
 <CAC20D2O7iqmEt_UQLqDXb6RhJVQNgDuSWjihP_KjfyCpzyazxw@mail.gmail.com>
Message-ID: <m1s96qc-0036s2C@more.local>

At Mon, 20 May 2024 10:23:56 -0400, Clem Cole <clemc at ccc.com> wrote:
Subject: [TUHS] Re: The 'usage: ...' message. (Was: On Bloat...)
>
> This brings me back to my experience. IMO, auto-correct for programming is
> like DWIM all over again, and the cure causes more problems than it solves.

We're deep down that rabbit hole this time with LLM/GPT systems
generating large swathes of code that I believe all too often gets into
production without any human programmer fully vetting its fitness for
purpose, or perhaps even understanding it.

--
					Greg A. Woods <gwoods at acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods at robohack.ca>
Planix, Inc. <woods at planix.com>     Avoncote Farms <woods at avoncote.ca>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: OpenPGP Digital Signature
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/76733c97/attachment.sig>

From stuff at riddermarkfarm.ca  Tue May 21 03:40:52 2024
From: stuff at riddermarkfarm.ca (Stuff Received)
Date: Mon, 20 May 2024 13:40:52 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
References: <CAKH6PiUoQBjD3cMHWELN3Hp+jf0=2fyFcqNbfwwe76YfKjGFiw@mail.gmail.com>
 <CAKzdPgyr25oP2GP90H0r6U2H6SJjNMxhoWZ6JZmHsrapogYNoQ@mail.gmail.com>
Message-ID: <f6a54237-fd9b-9582-104c-42ea7a9a6172@riddermarkfarm.ca>

On 2024-05-19 20:58, Rob Pike wrote:
> The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used 
> batch input. It tried automatically to keep things running after a 
> parsing error by inserting some token - semicolon, parenthesis, whatever 
> seemed best - and continuing to parse, in order to maximize the amount 
> of input that could be parsed before giving up. At least, that's what I 
> took the motivation to be. It rarely succeeded in fixing the actual 
> problem, despite PL/I being plastered with semicolons, but it did tend 
> to ferret out more errors per run. I found the tactic helpful.
> 
> -rob
> 

Possibly way off topic but Toronto allowed anyone to run PL/C decks for 
free, which I often did.  One day, they decided to allow all of the card 
to be read as text and my card numbers generated all sorts of errors. 
(At least easily fixed by a visit the card punch.)

S.

From ylee at columbia.edu  Tue May 21 04:38:06 2024
From: ylee at columbia.edu (Yeechang Lee)
Date: Mon, 20 May 2024 11:38:06 -0700
Subject: [TUHS] Documentation (was On Bloat and the Idea of Small
 Specialized Tools)
In-Reply-To: <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
 <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
 <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>
 <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>
Message-ID: <26187.39054.137077.761468@dobie-old.ylee.org>

Paul Winalski says:
> Sometimes in the process of doing that youd stop and think, "wait a
> minute--we don't really want it doing that".  Or you'd find that you
> bhad difficulty articulating exactly how a particular feature
> behaves.  That's a red flag that you've designed the feature to be
> too obscure and complex, or that there's something flat-out wrong
> with it.

My understanding is that an unexpected result of the requirement to draft all federal laws in Canada in both English and French is something similar: The discussion process ensuring that a bill's meaning is identical in both languages helps rid the text of ambiguities and errors regardless of language.

From phil at ultimate.com  Tue May 21 05:27:24 2024
From: phil at ultimate.com (Phil Budne)
Date: Mon, 20 May 2024 15:27:24 -0400
Subject: [TUHS] Documentation (was On Bloat and the Idea of Small
 Specialized Tools)
In-Reply-To: <26187.39054.137077.761468@dobie-old.ylee.org>
References: <CAKH6PiXaYGEUmVFRX99eM6s3+nTJrbVvkuBRa-Awhhd69xzJrg@mail.gmail.com>
 <20240518203319.3oAKtOSk@steffen%sdaoden.eu>
 <CAOkr1zUeW9cMt=Dk+bXbRtCkh=m7bZtkKNaqO8zF43kxt6V+ww@mail.gmail.com>
 <CAP2nic07dtEUjgC4Ex8g0iOaDuGsXyhTxY23v0x8wG_cb3D_+A@mail.gmail.com>
 <CABH=_VQU1hQ0FhxWg8ZC=U0VTG0gZQuQKzyn13DCdJwEKvRz2Q@mail.gmail.com>
 <26187.39054.137077.761468@dobie-old.ylee.org>
Message-ID: <202405201927.44KJROeX064950@ultimate.com>

Yeechang Lee:
> My understanding is that an unexpected result of the requirement to
> draft all federal laws in Canada in both English and French is
> something similar: The discussion process ensuring that a bill's
> meaning is identical in both languages helps rid the text of
> ambiguities and errors regardless of language.

It always seemed to me that ISO standards were written to be equally
incomprehensible in all languages, substituting terms like Protocol
Data Unit (PDU) for familiar ones like Packet.

In the early Internet, where there wasn't ANY money to be made in
antisocial conduct, it was easier to justify sentiments like "Rough
consensus and working code" and "be liberal in what you accept".

Lest ye forget, "industry standards" were once limited to things like
magnetic patterns on half-inch tape and the serial transmission of
bits, and at the LOWEST of levels.  Reading a tape written on another
vendor's system wasn't easy when I got started in the early 80's; In
addition to ASCII and EBCDIC, there were still systems with
vendor-specific 6-bit character sets, never mind punched cards.  I
remember going on a campus tour in the late 70's where there was an
ASCII terminal hooked up to some system that had BASIC (the standard
at the time was ANSI "Minimal BASIC"; a full(er) standard took long
enough that it was dead on arrival), but instead of "RETURN" required
typing CTRL/C (defined in ASCII as End Of Text) to enter a line!

In that context, getting ANYTHING working across vendors was a
victory, and having one system refuse to speak to another because of
some small detail in what one of them considered reasonable (or not)
was asking for trouble.

The times and stakes today are distinctly different.

From johnl at taugh.com  Tue May 21 06:02:26 2024
From: johnl at taugh.com (John Levine)
Date: 20 May 2024 16:02:26 -0400
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
Message-ID: <20240520200226.80F428B9493A@ary.qy>

It appears that Ben Kallus <benjamin.p.kallus.gr at dartmouth.edu> said:
>> It may become hard to reconcile this with the robustness principle
>> (Be conservative in what you send, be liberal in what you accept)
>> that Jon Postel popularized. Maybe it becomes necessary, though.
>
>Yes; the LangSec people essentially reject the robustness principle.
>
>See https://langsec.org/papers/postel-patch.pdf

On the contrary, they actually understand it.

Postel was widely misunderstood to say that you should try to accept
arbitrary garbage. People who knew him tell me that he meant to be
liberal when the spec is ambiguous, not to allow stuff that is just
wrong. As their quote from RFC 1122 points out, he also said you
should be prepared for arbitrary garbage so you can reject it.

R's,
John

From johnl at taugh.com  Tue May 21 06:10:58 2024
From: johnl at taugh.com (John Levine)
Date: 20 May 2024 16:10:58 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <CAC20D2O7iqmEt_UQLqDXb6RhJVQNgDuSWjihP_KjfyCpzyazxw@mail.gmail.com>
Message-ID: <20240520201100.50BE18B94A62@ary.qy>

It appears that Clem Cole <clemc at ccc.com> said:
>“The PL/C compiler had the unusual capability of never failing to compile
>> any program, through the use of extensive automatic correction of many
>> syntax errors and by converting any remaining syntax errors to output
>> statements.”
>
>The problem is that people can be lazy, and instead of using " DWIM" as a
>tool to speed up their development and fix their own errors, they just
>ignore the errors. ...

PL/C was a long time ago in the early 1970s. People used it on batch
systems whre you handed in your cards at the window, waited a while,
and later got your printout back. Or at advanced places, you could
run the cards through the reader yourself, then wait until the batch
ran.

In that environment, the benefit from possibly guessing an error
correction right meant fewer trips to the card reader. In my youth I
did a fair amount of programming that way in WATFOR/WATFIV and Algol W
where we really tried to get the programs right since we wanted to
finish up and go home.

When I was using interactive systems where you could fix one bug and
try again, over and over, it seemed like cheating.

R's,
John

From lm at mcvoy.com  Tue May 21 06:11:22 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Mon, 20 May 2024 13:11:22 -0700
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <20240520200226.80F428B9493A@ary.qy>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy>
Message-ID: <20240520201122.GC27662@mcvoy.com>

On Mon, May 20, 2024 at 04:02:26PM -0400, John Levine wrote:
> It appears that Ben Kallus <benjamin.p.kallus.gr at dartmouth.edu> said:
> >> It may become hard to reconcile this with the robustness principle
> >> (Be conservative in what you send, be liberal in what you accept)
> >> that Jon Postel popularized. Maybe it becomes necessary, though.
> >
> >Yes; the LangSec people essentially reject the robustness principle.
> >
> >See https://langsec.org/papers/postel-patch.pdf
> 
> On the contrary, they actually understand it.
> 
> Postel was widely misunderstood to say that you should try to accept
> arbitrary garbage. People who knew him tell me that he meant to be
> liberal when the spec is ambiguous, not to allow stuff that is just
> wrong. As their quote from RFC 1122 points out, he also said you
> should be prepared for arbitrary garbage so you can reject it.

Yeah, I read the pdf and I took away the same thing as John.
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From benjamin.p.kallus.gr at dartmouth.edu  Tue May 21 07:00:40 2024
From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus)
Date: Mon, 20 May 2024 17:00:40 -0400
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <20240520201122.GC27662@mcvoy.com>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com>
Message-ID: <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>

What I meant was that the LangSec people reject the robustness
principle as it is commonly understood (i.e., make a "reasonable"
guess when receiving garbage), not necessarily that their view is
incompatible with Postel's original vision. This interpretation of the
principle is pretty widespread; take a look at the Nginx mailing list
if you have any doubt. I attribute this to the same phenomenon that
inverted the meaning of REST.

-Ben

From johnl at taugh.com  Tue May 21 07:03:23 2024
From: johnl at taugh.com (John R Levine)
Date: 20 May 2024 17:03:23 -0400
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com>
 <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
Message-ID: <b15fde3a-1dda-7506-3cd9-382c83d18be1@taugh.com>

> What I meant was that the LangSec people reject the robustness
> principle as it is commonly understood (i.e., make a "reasonable"
> guess when receiving garbage), not necessarily that their view is
> incompatible with Postel's original vision. This interpretation of the
> principle is pretty widespread; take a look at the Nginx mailing list
> if you have any doubt. I attribute this to the same phenomenon that
> inverted the meaning of REST.

Oh, OK, no disagreement there.  I'm as tired as you are of people invoking 
Postel to excuse slovenly code.

Regards,
John Levine, johnl at taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly

From lm at mcvoy.com  Tue May 21 07:14:38 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Mon, 20 May 2024 14:14:38 -0700
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy>
 <20240520201122.GC27662@mcvoy.com>
 <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
Message-ID: <20240520211438.GF27662@mcvoy.com>

On Mon, May 20, 2024 at 05:00:40PM -0400, Ben Kallus wrote:
> What I meant was that the LangSec people reject the robustness
> principle as it is commonly understood (i.e., make a "reasonable"
> guess when receiving garbage)

That most certainly is not what I took from what Postel said.  And I
say that as someone who designed a distributed system that had client
and server sides and had to make that work across versions from last
week to 10-20 years ago.

I took it more as "Be more and more careful what you say, get that more
correct with each release, but tolerate the less correct stuff you might
get from earlier versions".  In no way did I think he meant ``make a
"reasonable" guess when receiving garbage''.  Garbage is garbage, you
error on that.  
-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From benjamin.p.kallus.gr at dartmouth.edu  Tue May 21 07:46:48 2024
From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus)
Date: Mon, 20 May 2024 17:46:48 -0400
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <20240520211438.GF27662@mcvoy.com>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com>
 <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
 <20240520211438.GF27662@mcvoy.com>
Message-ID: <CAB6pCSbFDabCiRpvx0uk=5BMR1=2F-ATJRWW2kHDz7VYE10=Yw@mail.gmail.com>

My point was that, regardless of Postel's original intent, many people
have interpreted his principle to mean that accepting garbage is good.
*This* interpretation is incompatible with LangSec.

See RFC 9413 for an exploration of the many interpretations of
Postel's principle.

-Ben

From lm at mcvoy.com  Tue May 21 07:57:40 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Mon, 20 May 2024 14:57:40 -0700
Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.)
In-Reply-To: <CAB6pCSbFDabCiRpvx0uk=5BMR1=2F-ATJRWW2kHDz7VYE10=Yw@mail.gmail.com>
References: <CAB6pCSaf-YuBFwvqqH19tt=epqSesga7qFERKNWrFZfS3BuBAg@mail.gmail.com>
 <20240520200226.80F428B9493A@ary.qy>
 <20240520201122.GC27662@mcvoy.com>
 <CAB6pCSYdhxS2hxQE768Pgj62sfNqgM+gY_4L1z3fiyfDXAFPhQ@mail.gmail.com>
 <20240520211438.GF27662@mcvoy.com>
 <CAB6pCSbFDabCiRpvx0uk=5BMR1=2F-ATJRWW2kHDz7VYE10=Yw@mail.gmail.com>
Message-ID: <20240520215740.GG27662@mcvoy.com>

Those would be the stupid people and you can't fix stupid.  Seriously,
people can twist anything into anything.  Just because dumb people
didn't understand his principle doesn't mean it was a bad principle.

On Mon, May 20, 2024 at 05:46:48PM -0400, Ben Kallus wrote:
> My point was that, regardless of Postel's original intent, many people
> have interpreted his principle to mean that accepting garbage is good.
> *This* interpretation is incompatible with LangSec.
> 
> See RFC 9413 for an exploration of the many interpretations of
> Postel's principle.
> 
> -Ben

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From cowan at ccil.org  Tue May 21 11:14:55 2024
From: cowan at ccil.org (John Cowan)
Date: Mon, 20 May 2024 21:14:55 -0400
Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...)
In-Reply-To: <20240520201100.50BE18B94A62@ary.qy>
References: <CAC20D2O7iqmEt_UQLqDXb6RhJVQNgDuSWjihP_KjfyCpzyazxw@mail.gmail.com>
 <20240520201100.50BE18B94A62@ary.qy>
Message-ID: <CAD2gp_T06aeZ607fr6veE9T2CmSpxOPE+=nT5ywAUxAhGBjVZQ@mail.gmail.com>

On Mon, May 20, 2024 at 4:11 PM John Levine <johnl at taugh.com> wrote:

It appears that Clem Cole <clemc at ccc.com> said:
> >“The PL/C compiler had the unusual capability of never failing to compile
> >> any program, through the use of extensive automatic correction of many
> >> syntax errors and by converting any remaining syntax errors to output
> >> statements.”
> PL/C was a long time ago in the early 1970s. People used it on batch
> systems whre you handed in your cards at the window, waited a while,
> and later got your printout back. Or at advanced places, you could
> run the cards through the reader yourself, then wait until the batch
> ran.


PL/C was a 3rd-generation autocorrection programming language.  CORC was
the 1962 version and CUPL was the 1966 version (same date as DWIM), neither
of them based on PL/I.  There is an implementation of both at <
http://www.catb.org/~esr/cupl/>.

The Wikipedia DWIM article also points to Magit, the Emacs git client.

>
> In that environment, the benefit from possibly guessing an error
> correction right meant fewer trips to the card reader. In my youth I
> did a fair amount of programming that way in WATFOR/WATFIV and Algol W
> where we really tried to get the programs right since we wanted to
> finish up and go home.
>
> When I was using interactive systems where you could fix one bug and
> try again, over and over, it seemed like cheating.
>
> R's,
> John
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240520/98aa6102/attachment.htm>

From robpike at gmail.com  Tue May 21 11:56:30 2024
From: robpike at gmail.com (Rob Pike)
Date: Tue, 21 May 2024 11:56:30 +1000
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
Message-ID: <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>

Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before
1998. And I believe Doug McIlroy was generating random regular expressions
to compare different implementations. It's probably impossible to decide
who invented fuzzing, so the credit will surely go to the person who named
it.

-rob


On Tue, May 21, 2024 at 12:09 AM Serissa <stewart at serissa.com> wrote:

> Well this is obviously a hot button topic.  AFAIK I was nearby when
> fuzz-testing for software was invented. I was the main advocate for hiring
> Andy Payne into the Digital Cambridge Research Lab.  One of his little
> projects was a thing that generated random but correct C programs and fed
> them to different compilers or compilers with different switches to see if
> they crashed or generated incorrect results.  Overnight, his tester filed
> 300 or so bug reports against the Digital C compiler.  This was met with
> substantial pushback, but it was a mostly an issue that many of the reports
> traced to the same underlying bugs.
>
> Bill McKeemon expanded the technique and published "Differential Testing
> of Software"
> https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
>
> Andy had encountered the underlying idea while working as an intern on the
> Alpha processor development team.  Among many other testers, they used an
> architectural tester called REX to generate more or less random sequences
> of instructions, which were then run through different simulation chains
> (functional, RTL, cycle-accurate) to see if they did the same thing.
> Finding user-accessible bugs in hardware seems like a good thing.
>
> The point of generating correct programs (mentioned under the term LangSec
> here) goes a long way to avoid irritating the maintainers.  Making the test
> cases short is also maintainer-friendly.  The test generator is also in a
> position to annotate the source with exactly what it is supposed to do,
> which is also helpful.
>
> -L
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240521/d6aff2f5/attachment.htm>

From lm at mcvoy.com  Tue May 21 12:47:43 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Mon, 20 May 2024 19:47:43 -0700
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
Message-ID: <20240521024743.GE25728@mcvoy.com>

I think the title might go to my OS prof, Bart Miller.  He did a paper 

https://www.paradyn.org/papers/fuzz.pdf

that named it that in 1990.  

On Tue, May 21, 2024 at 11:56:30AM +1000, Rob Pike wrote:
> Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before
> 1998. And I believe Doug McIlroy was generating random regular expressions
> to compare different implementations. It's probably impossible to decide
> who invented fuzzing, so the credit will surely go to the person who named
> it.
> 
> -rob
> 
> 
> On Tue, May 21, 2024 at 12:09???AM Serissa <stewart at serissa.com> wrote:
> 
> > Well this is obviously a hot button topic.  AFAIK I was nearby when
> > fuzz-testing for software was invented. I was the main advocate for hiring
> > Andy Payne into the Digital Cambridge Research Lab.  One of his little
> > projects was a thing that generated random but correct C programs and fed
> > them to different compilers or compilers with different switches to see if
> > they crashed or generated incorrect results.  Overnight, his tester filed
> > 300 or so bug reports against the Digital C compiler.  This was met with
> > substantial pushback, but it was a mostly an issue that many of the reports
> > traced to the same underlying bugs.
> >
> > Bill McKeemon expanded the technique and published "Differential Testing
> > of Software"
> > https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
> >
> > Andy had encountered the underlying idea while working as an intern on the
> > Alpha processor development team.  Among many other testers, they used an
> > architectural tester called REX to generate more or less random sequences
> > of instructions, which were then run through different simulation chains
> > (functional, RTL, cycle-accurate) to see if they did the same thing.
> > Finding user-accessible bugs in hardware seems like a good thing.
> >
> > The point of generating correct programs (mentioned under the term LangSec
> > here) goes a long way to avoid irritating the maintainers.  Making the test
> > cases short is also maintainer-friendly.  The test generator is also in a
> > position to annotate the source with exactly what it is supposed to do,
> > which is also helpful.
> >
> > -L
> >
> >
> >

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From stewart at serissa.com  Tue May 21 12:54:36 2024
From: stewart at serissa.com (Lawrence Stewart)
Date: Mon, 20 May 2024 22:54:36 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <20240521024743.GE25728@mcvoy.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <20240521024743.GE25728@mcvoy.com>
Message-ID: <BFB0C596-76F7-4807-9797-8056C0E01BE0@serissa.com>

Good to learn more of the history!  I wonder when the technique got started on the hardware side?  
I wouldn’t be surprised if IBM were doing some of this for the S/360 since it was a nearly 
compatible set of systems.
-L

> On May 20, 2024, at 10:47 PM, Larry McVoy <lm at mcvoy.com> wrote:
> 
> I think the title might go to my OS prof, Bart Miller.  He did a paper 
> 
> https://www.paradyn.org/papers/fuzz.pdf
> 
> that named it that in 1990.  
> 
> On Tue, May 21, 2024 at 11:56:30AM +1000, Rob Pike wrote:
>> Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before
>> 1998. And I believe Doug McIlroy was generating random regular expressions
>> to compare different implementations. It's probably impossible to decide
>> who invented fuzzing, so the credit will surely go to the person who named
>> it.
>> 
>> -rob
>> 
>> 
>> On Tue, May 21, 2024 at 12:09???AM Serissa <stewart at serissa.com> wrote:
>> 
>>> Well this is obviously a hot button topic.  AFAIK I was nearby when
>>> fuzz-testing for software was invented. I was the main advocate for hiring
>>> Andy Payne into the Digital Cambridge Research Lab.  One of his little
>>> projects was a thing that generated random but correct C programs and fed
>>> them to different compilers or compilers with different switches to see if
>>> they crashed or generated incorrect results.  Overnight, his tester filed
>>> 300 or so bug reports against the Digital C compiler.  This was met with
>>> substantial pushback, but it was a mostly an issue that many of the reports
>>> traced to the same underlying bugs.
>>> 
>>> Bill McKeemon expanded the technique and published "Differential Testing
>>> of Software"
>>> https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
>>> 
>>> Andy had encountered the underlying idea while working as an intern on the
>>> Alpha processor development team.  Among many other testers, they used an
>>> architectural tester called REX to generate more or less random sequences
>>> of instructions, which were then run through different simulation chains
>>> (functional, RTL, cycle-accurate) to see if they did the same thing.
>>> Finding user-accessible bugs in hardware seems like a good thing.
>>> 
>>> The point of generating correct programs (mentioned under the term LangSec
>>> here) goes a long way to avoid irritating the maintainers.  Making the test
>>> cases short is also maintainer-friendly.  The test generator is also in a
>>> position to annotate the source with exactly what it is supposed to do,
>>> which is also helpful.
>>> 
>>> -L
>>> 
>>> 
>>> 
> 
> -- 
> ---
> Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat


From robpike at gmail.com  Tue May 21 13:36:13 2024
From: robpike at gmail.com (Rob Pike)
Date: Tue, 21 May 2024 13:36:13 +1000
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <BFB0C596-76F7-4807-9797-8056C0E01BE0@serissa.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <20240521024743.GE25728@mcvoy.com>
 <BFB0C596-76F7-4807-9797-8056C0E01BE0@serissa.com>
Message-ID: <CAKzdPgzBeGRmCLUezUsaY3xM+Q3BJYKGCTQnLO6gVsv4EvsCFg@mail.gmail.com>

Eventually Dennis told Ron to stop as he wasn't interested in protecting
against insane things like "unsigned register union". Now that computing
has become more adversarial, he might feel differently.

-rob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240521/0409a744/attachment.htm>

From ggm at algebras.org  Tue May 21 13:53:35 2024
From: ggm at algebras.org (George Michaelson)
Date: Tue, 21 May 2024 13:53:35 +1000
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
Message-ID: <CAKr6gn0-Hh-2hniTO2ReZTObtzX+2ar0q58-dRFPc6PvYpiQSg@mail.gmail.com>

On Tue, May 21, 2024 at 11:56 AM Rob Pike <robpike at gmail.com> wrote:
>It's probably impossible to decide who invented fuzzing, so the credit will surely go to the person who named it.

That theory probably applies to the Earl of Sandwich, And the Earl of
Cardigan. Hoare Belisha also did ok for giant orange balls at zebra
crossings. I'm less sure the Earl of Zebra feels recognised, or that
Eugène-René Poubelle feels happy with his namesake (he should do, dust
bins are huge)

From tuhs at tuhs.org  Tue May 21 21:59:50 2024
From: tuhs at tuhs.org (=?utf-8?b?UGV0ZXIgV2VpbmJlcmdlciAo5rip5Y2a5qC8KSB2aWEgVFVIUw==?=)
Date: Tue, 21 May 2024 07:59:50 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CAKzdPgzBeGRmCLUezUsaY3xM+Q3BJYKGCTQnLO6gVsv4EvsCFg@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <20240521024743.GE25728@mcvoy.com>
 <BFB0C596-76F7-4807-9797-8056C0E01BE0@serissa.com>
 <CAKzdPgzBeGRmCLUezUsaY3xM+Q3BJYKGCTQnLO6gVsv4EvsCFg@mail.gmail.com>
Message-ID: <CAOUkXSoZJuG5qveOapqV3KP5BqqEDSX=ydHyys5pz8_E8oB=XA@mail.gmail.com>

On a lesser note, one day I got tired of C compiler crashes (probably
on the Vax, possibly originating in my code generator) and converted
them into 'fatal internal error' messages.

On Mon, May 20, 2024 at 11:36 PM Rob Pike <robpike at gmail.com> wrote:
>
> Eventually Dennis told Ron to stop as he wasn't interested in protecting against insane things like "unsigned register union". Now that computing has become more adversarial, he might feel differently.
>
> -rob
>

From paul.winalski at gmail.com  Wed May 22 02:59:38 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Tue, 21 May 2024 12:59:38 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
Message-ID: <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>

On Tue, May 21, 2024 at 12:09 AM Serissa <stewart at serissa.com> wrote:

> Well this is obviously a hot button topic.  AFAIK I was nearby when
>> fuzz-testing for software was invented. I was the main advocate for hiring
>> Andy Payne into the Digital Cambridge Research Lab.  One of his little
>> projects was a thing that generated random but correct C programs and fed
>> them to different compilers or compilers with different switches to see if
>> they crashed or generated incorrect results.  Overnight, his tester filed
>> 300 or so bug reports against the Digital C compiler.  This was met with
>> substantial pushback, but it was a mostly an issue that many of the reports
>> traced to the same underlying bugs.
>>
>> Bill McKeemon expanded the technique and published "Differential Testing
>> of Software"
>> https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
>>
>
In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product
teams to introduce fuzz testing into our testing process.  As with the C
compiler work at DEC Cambridge, fuzz testing for other compilers (Fortran,
PL/I) also found large numbers of bugs.

The pushback from the compiler folks was mainly a matter of priorities.
Fuzz testing is very adept at finding edge conditions, but most failing
fuzz tests have syntax that no human programmer would ever write.  As a
compiler engineer you have limited time to devote to bug testing.  Do you
spend that time addressing real customer issues that have been reported or
do you spend it fixing problems with code that no human being would ever
write?  To take an example that really happened, a fuzz test consisting of
100 nested parentheses caused an overflow in a parser table (it could only
handle 50 nested parens).  Is that worth fixing?

As you pointed out, fuzz test failures tend to occur in clusters and many
of the failures eventually are traced to the same underlying bug.  Which
leads to the counter-argument to the pushback.  The fuzz tests are finding
real underlying bugs.  Why not fix them before a customer runs into them?
That very thing did happen several times.  A customer-reported bug was
fixed and suddenly several of the fuzz test problems that had been reported
went away.  Another consideration is that, even back in the 1980s, humans
weren't the only ones writing programs.  There were programs writing
programs and they sometimes produced bizarre (but syntactically correct)
code.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240521/d3356ee0/attachment.htm>

From tuhs at tuhs.org  Wed May 22 03:56:03 2024
From: tuhs at tuhs.org (segaloco via TUHS)
Date: Tue, 21 May 2024 17:56:03 +0000
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
Message-ID: <KCjmXMPI5ZDuexJE27aZRTMulIK9fIBcXHIE3KYFZCykgH1WVny1RaA2T7gl2POe0b25M0xJT67DxMJTMQTXTth8-CC5UPQMxPta4jlzbYI=@protonmail.com>

On Tuesday, May 21st, 2024 at 9:59 AM, Paul Winalski <paul.winalski at gmail.com> wrote:

> On Tue, May 21, 2024 at 12:09 AM Serissa <stewart at serissa.com> wrote:
> 
> > > Well this is obviously a hot button topic. AFAIK I was nearby when fuzz-testing for software was invented. I was the main advocate for hiring Andy Payne into the Digital Cambridge Research Lab. One of his little projects was a thing that generated random but correct C programs and fed them to different compilers or compilers with different switches to see if they crashed or generated incorrect results. Overnight, his tester filed 300 or so bug reports against the Digital C compiler. This was met with substantial pushback, but it was a mostly an issue that many of the reports traced to the same underlying bugs.
> > > 
> > > Bill McKeemon expanded the technique and published "Differential Testing of Software" https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
> 
> In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product teams to introduce fuzz testing into our testing process. As with the C compiler work at DEC Cambridge, fuzz testing for other compilers (Fortran, PL/I) also found large numbers of bugs.
> 
> The pushback from the compiler folks was mainly a matter of priorities. Fuzz testing is very adept at finding edge conditions, but most failing fuzz tests have syntax that no human programmer would ever write. As a compiler engineer you have limited time to devote to bug testing. Do you spend that time addressing real customer issues that have been reported or do you spend it fixing problems with code that no human being would ever write? To take an example that really happened, a fuzz test consisting of 100 nested parentheses caused an overflow in a parser table (it could only handle 50 nested parens). Is that worth fixing?
> 
> As you pointed out, fuzz test failures tend to occur in clusters and many of the failures eventually are traced to the same underlying bug. Which leads to the counter-argument to the pushback. The fuzz tests are finding real underlying bugs. Why not fix them before a customer runs into them? That very thing did happen several times. A customer-reported bug was fixed and suddenly several of the fuzz test problems that had been reported went away. Another consideration is that, even back in the 1980s, humans weren't the only ones writing programs. There were programs writing programs and they sometimes produced bizarre (but syntactically correct) code.
> 
> -Paul W.

A happy medium could be including far-out fuzzing to characterize issues, but not necessarily then immediately sink the resources into resolving bizarre discoveries from the fuzzing.  Better to know then not but also have the wisdom to determine "is someone actually going to trip this" vs. "this is something that is possible and good to document".  In my own work we have several of the latter where something is almost guaranteed to never happen with a human interaction, but is also something we want documented somewhere so if unlikely problem <xyz> ever does happen, the discovery is already done and we just start plotting out a solution.  That's also some nice low hanging fruit to pluck when there isn't much else going on, but avoids the phenomenon where we sink critical time into bugfixes with a microscopic ROI.

- Matt G.

From luther.johnson at makerlisp.com  Wed May 22 04:12:29 2024
From: luther.johnson at makerlisp.com (Luther Johnson)
Date: Tue, 21 May 2024 11:12:29 -0700
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
Message-ID: <de71e472-dca3-1c69-34ae-05d424847d35@makerlisp.com>

I like this anecdote because it points out the difference between being
able to handle and process bizarre conditions, as if they were something
that should work, which is maybe not that helpful, vs. detecting them
and doing something reasonable, like failiing with a "limit exceeded"
message. A silent, insidious failure down the line because a limit was
exceeded is never good. If "fuzz testing" helps exercise limits and
identifies places where software hasn't realized it has exceeded its
limits, has run off the end of a table, etc., that seems like a good
thing to me.

On 05/21/2024 09:59 AM, Paul Winalski wrote:
> On Tue, May 21, 2024 at 12:09 AM Serissa <stewart at serissa.com
> <mailto:stewart at serissa.com>> wrote:
>
>         Well this is obviously a hot button topic.  AFAIK I was nearby
>         when fuzz-testing for software was invented. I was the main
>         advocate for hiring Andy Payne into the Digital Cambridge
>         Research Lab.  One of his little projects was a thing that
>         generated random but correct C programs and fed them to
>         different compilers or compilers with different switches to
>         see if they crashed or generated incorrect results.
>         Overnight, his tester filed 300 or so bug reports against the
>         Digital C compiler.  This was met with substantial pushback,
>         but it was a mostly an issue that many of the reports traced
>         to the same underlying bugs.
>
>         Bill McKeemon expanded the technique and published
>         "Differential Testing of Software"
>         https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf
>         <https://www.cs.swarthmore.edu/%7Ebylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf>
>
> In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product
> teams to introduce fuzz testing into our testing process.  As with the
> C compiler work at DEC Cambridge, fuzz testing for other compilers
> (Fortran, PL/I) also found large numbers of bugs.
>
> The pushback from the compiler folks was mainly a matter of
> priorities.  Fuzz testing is very adept at finding edge conditions,
> but most failing fuzz tests have syntax that no human programmer would
> ever write.  As a compiler engineer you have limited time to devote to
> bug testing.  Do you spend that time addressing real customer issues
> that have been reported or do you spend it fixing problems with code
> that no human being would ever write?  To take an example that really
> happened, a fuzz test consisting of 100 nested parentheses caused an
> overflow in a parser table (it could only handle 50 nested parens).
> Is that worth fixing?
>
> As you pointed out, fuzz test failures tend to occur in clusters and
> many of the failures eventually are traced to the same underlying
> bug.  Which leads to the counter-argument to the pushback.  The fuzz
> tests are finding real underlying bugs.  Why not fix them before a
> customer runs into them?  That very thing did happen several times.  A
> customer-reported bug was fixed and suddenly several of the fuzz test
> problems that had been reported went away.  Another consideration is
> that, even back in the 1980s, humans weren't the only ones writing
> programs.  There were programs writing programs and they sometimes
> produced bizarre (but syntactically correct) code.
>
> -Paul W.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240521/899f3704/attachment.htm>

From dave at horsfall.org  Wed May 22 13:26:36 2024
From: dave at horsfall.org (Dave Horsfall)
Date: Wed, 22 May 2024 13:26:36 +1000 (EST)
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
Message-ID: <alpine.BSF.2.21.9999.2405221325100.15285@aneurin.horsfall.org>

On Tue, 21 May 2024, Paul Winalski wrote:

> To take an example that really happened, a fuzz test consisting of 100 
> nested parentheses caused an overflow in a parser table (it could only 
> handle 50 nested parens).  Is that worth fixing?

Well, they could be a rabid LISP programmer...

-- Dave

From flexibeast at gmail.com  Wed May 22 15:08:29 2024
From: flexibeast at gmail.com (Alexis)
Date: Wed, 22 May 2024 15:08:29 +1000
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <alpine.BSF.2.21.9999.2405221325100.15285@aneurin.horsfall.org>
 (Dave Horsfall's message of "Wed, 22 May 2024 13:26:36 +1000 (EST)")
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
 <alpine.BSF.2.21.9999.2405221325100.15285@aneurin.horsfall.org>
Message-ID: <875xv6bfhu.fsf@gmail.com>

Dave Horsfall <dave at horsfall.org> writes:

> On Tue, 21 May 2024, Paul Winalski wrote:
>
>> To take an example that really happened, a fuzz test consisting 
>> of 100 
>> nested parentheses caused an overflow in a parser table (it 
>> could only 
>> handle 50 nested parens).  Is that worth fixing?
>
> Well, they could be a rabid LISP programmer...

Just did a quick check of some of the ELisp packages on my system:

* For my own packages, the maximum was 10 closing parentheses.
* For the packages in my elpa/ directory, the maximum was 26 in 
  ducpel-glyphs.el, where they were part of a glyph, rather than 
  delimiting code. The next highest value was 16, in org.el and 
  magit-sequence.el.

i would suggest that any Lisp with more than a couple of dozen 
closing parentheses is in dire need of refactoring. Although of 
course someone who's rabid is probably not in the appropriate 
mental state for that. :-)


Alexis.

From jpl.jpl at gmail.com  Wed May 22 22:20:38 2024
From: jpl.jpl at gmail.com (John P. Linderman)
Date: Wed, 22 May 2024 08:20:38 -0400
Subject: [TUHS] Gordon Bell has died
Message-ID: <CAC0cEp-h_r+YQSWpLZgcW8BskiKphGkqm7Ez2nT_dk9kb7G17Q@mail.gmail.com>

https://www.nytimes.com/2024/05/21/technology/c-gordon-bell-dead.html?unlocked_article_code=1.t00.arl-.blsWtHq8G62d&smid=url-share
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240522/0720cac0/attachment.htm>

From imp at bsdimp.com  Wed May 22 23:12:36 2024
From: imp at bsdimp.com (Warner Losh)
Date: Wed, 22 May 2024 07:12:36 -0600
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <875xv6bfhu.fsf@gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
 <alpine.BSF.2.21.9999.2405221325100.15285@aneurin.horsfall.org>
 <875xv6bfhu.fsf@gmail.com>
Message-ID: <CANCZdfohe9rp60=+xhQbdzQXkz00vHitPCPFppE_7rcas8yHwQ@mail.gmail.com>

On Tue, May 21, 2024, 11:08 PM Alexis <flexibeast at gmail.com> wrote:

> Dave Horsfall <dave at horsfall.org> writes:
>
> > On Tue, 21 May 2024, Paul Winalski wrote:
> >
> >> To take an example that really happened, a fuzz test consisting
> >> of 100
> >> nested parentheses caused an overflow in a parser table (it
> >> could only
> >> handle 50 nested parens).  Is that worth fixing?
> >
> > Well, they could be a rabid LISP programmer...
>
> Just did a quick check of some of the ELisp packages on my system:
>
> * For my own packages, the maximum was 10 closing parentheses.
> * For the packages in my elpa/ directory, the maximum was 26 in
>   ducpel-glyphs.el, where they were part of a glyph, rather than
>   delimiting code. The next highest value was 16, in org.el and
>   magit-sequence.el.
>
> i would suggest that any Lisp with more than a couple of dozen
> closing parentheses is in dire need of refactoring. Although of
> course someone who's rabid is probably not in the appropriate
> mental state for that. :-)
>

That's what ']' is for.

Warner

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240522/58b124c7/attachment.htm>

From arnold at skeeve.com  Wed May 22 23:44:14 2024
From: arnold at skeeve.com (arnold at skeeve.com)
Date: Wed, 22 May 2024 07:44:14 -0600
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk>
References: <CAKH6PiXQDOxRuZDMvzMVqzHbdgykMtWkdSVWNe4EeHEk9oXoxQ@mail.gmail.com>
 <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu>
 <20240520134155.7A06E1FB2F@orac.inputplus.co.uk>
Message-ID: <202405221344.44MDiEGJ326164@freefriends.org>

I've been travelling, so I haven't been able to answer
these mails until now.

Ralph Corderoy <ralph at inputplus.co.uk> wrote:

> I can see an avalanche of errors in an earlier gawk caused problems, but
> each time there would have been a first patch of the input which made
> a mistake causing the pebble to start rolling.  My understanding is that
> there was potentially a lot of these and rather than fix them it was
> more productive of the limited time to stop patching the input.  Then
> the code which patched could be deleted, getting rid of the buggy bits
> along the way?

That's not the case. Gawk didn't try to patch the input. It
simply set a flag saying "don't try to run" but kept on parsing
anyway, in the hope of finding more errors.

That was a bad idea, because the representation of the program
being built was then not in the correct state to have more
stuff parsed and converted into byte code.

Very early on, the first parse error caused an exit. I changed
it to keep going to try to be helpful. But when that became a source
for essentially specious bug reports and a time sink for me, it
became time to go back to exiting on the first problem.

HTH,

Arnold

From paul.winalski at gmail.com  Thu May 23 01:37:39 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Wed, 22 May 2024 11:37:39 -0400
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <de71e472-dca3-1c69-34ae-05d424847d35@makerlisp.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
 <de71e472-dca3-1c69-34ae-05d424847d35@makerlisp.com>
Message-ID: <CABH=_VThuZ+HGkmdbHAzMQt3Ubh4xEvQZZcnrsGxv1uwRfeRLw@mail.gmail.com>

On Tue, May 21, 2024 at 2:12 PM Luther Johnson <luther.johnson at makerlisp.com>
wrote:

> I like this anecdote because it points out the difference between being
> able to handle and process bizarre conditions, as if they were something
> that should work, which is maybe not that helpful, vs. detecting them and
> doing something reasonable, like failiing with a "limit exceeded" message
>
That is in fact precisely how the DEC compiler handled the 100 nested
parentheses condition.

> . A silent, insidious failure down the line because a limit was exceeded
> is never good.
>
Amen!  One should always do bounds checking when dealing with fixed-size
aggregate data structures.  One compiler that I worked on got a bug report
of bad code being generated.  The problem was an illegal optimization that
never should have triggered but did due to a corrupted data table.  Finding
the culprit of the corruption took hours.  It finally turned out to be due
to overflow of an adjacent data table in use elsewhere in the compiler.
The routine to add another entry to that table didn't check for table
overflow.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240522/3919491b/attachment.htm>

From lm at mcvoy.com  Thu May 23 04:49:04 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Wed, 22 May 2024 11:49:04 -0700
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <CABH=_VThuZ+HGkmdbHAzMQt3Ubh4xEvQZZcnrsGxv1uwRfeRLw@mail.gmail.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
 <de71e472-dca3-1c69-34ae-05d424847d35@makerlisp.com>
 <CABH=_VThuZ+HGkmdbHAzMQt3Ubh4xEvQZZcnrsGxv1uwRfeRLw@mail.gmail.com>
Message-ID: <20240522184904.GK25728@mcvoy.com>

On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> On Tue, May 21, 2024 at 2:12???PM Luther Johnson <luther.johnson at makerlisp.com>
> wrote:
> 
> > I like this anecdote because it points out the difference between being
> > able to handle and process bizarre conditions, as if they were something
> > that should work, which is maybe not that helpful, vs. detecting them and
> > doing something reasonable, like failiing with a "limit exceeded" message
> >
> That is in fact precisely how the DEC compiler handled the 100 nested
> parentheses condition.
> 
> > . A silent, insidious failure down the line because a limit was exceeded
> > is never good.
> >
> Amen!  One should always do bounds checking when dealing with fixed-size
> aggregate data structures.  One compiler that I worked on got a bug report
> of bad code being generated.  The problem was an illegal optimization that
> never should have triggered but did due to a corrupted data table.  Finding
> the culprit of the corruption took hours.  It finally turned out to be due
> to overflow of an adjacent data table in use elsewhere in the compiler.
> The routine to add another entry to that table didn't check for table
> overflow.

We invented a data structure that gets around this problem nicely.  It's
an array of pointers that starts at [1] instead of [0].  The [0]
entry encodes 2 things:

In the upper bits, the log(2) the size of the array.  So all arrays
have at least [0] and [1].  So 2 pointers is the smallest array and
that was important to us, we wanted it to scale up and scale down.

In the lower bits, we record the number of used entries in the array.
We assumed 32 bit pointers and with those we got ~134 million entries
as our maximum number of entries.

Usage is like

char **space = allocLines(4);	// start with space for 4 entries

space = addLine(space, "I am [1]");
space = addLine(space, "I am [2]");
space = addLine(space, "I am [3]");
space = addLine(space, "I am [4]");	// realloc's to 8 entries

freelines(space, 0);	// second arg is typically 0 or free()

It works GREAT.  We used it all over BitKeeper, for stuff as small as
commit comments to arrays of data structures.  It scales down, scales
up.  Helper functions:

/*
 * liblines - interfaces for autoexpanding data structures
 *
 * s= allocLines(n)
 *      pre allocate space for slightly less than N entries.
 * s = addLine(s, line)
 *      add line to s, allocating as needed.
 *      line must be a pointer to preallocated space.
 * freeLines(s, freep)
 *      free the lines array; if freep is set, call that on each entry.
 *      if freep is 0, do not free each entry.
 * buf = popLine(s)
 *      return the most recently added line (not an alloced copy of it)
 * reverseLines(s)
 *      reverse the order of the lines in the array
 * sortLines(space, compar)
 *      sort the lines using the compar function if set, else string_sort()
 * removeLine(s, which, freep)
 *      look for all lines which match "which" and remove them from the array
 *      returns number of matches found
 * removeLineN(s, i, freep)
 *      remove the 'i'th line.
 * lines = splitLine(buf, delim, lines)
 *      split buf on any/all chars in delim and put the tokens in lines.
 * buf = joinLines(":", s)
 *      return one string which is all the strings glued together with ":"
 *      does not free s, caller must free s.
 * buf = findLine(lines, needle);
 *      Return the index the line in lines that matches needle
 */

It's all open source, apache licensed, but you'd have to tease it out of
the bitkeeper source tree.  Wouldn't be that hard and it would be useful.

From lm at mcvoy.com  Thu May 23 06:17:53 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Wed, 22 May 2024 13:17:53 -0700
Subject: [TUHS] A fuzzy awk.
In-Reply-To: <20240522184904.GK25728@mcvoy.com>
References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com>
 <CAKzdPgyOPp+M8_nSUD=LeB-qrduBBsQDG-HgJjExKn-GAeVzLA@mail.gmail.com>
 <CABH=_VR9TEnPLtjexUKtpkfG-81bg=g1X2+0v7upN=f-sEkA4A@mail.gmail.com>
 <de71e472-dca3-1c69-34ae-05d424847d35@makerlisp.com>
 <CABH=_VThuZ+HGkmdbHAzMQt3Ubh4xEvQZZcnrsGxv1uwRfeRLw@mail.gmail.com>
 <20240522184904.GK25728@mcvoy.com>
Message-ID: <20240522201753.GL25728@mcvoy.com>

Wayne teased this into a stand alone library here: 

https://github.com/wscott/bksupport

On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote:
> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote:
> > On Tue, May 21, 2024 at 2:12???PM Luther Johnson <luther.johnson at makerlisp.com>
> > wrote:
> > 
> > > I like this anecdote because it points out the difference between being
> > > able to handle and process bizarre conditions, as if they were something
> > > that should work, which is maybe not that helpful, vs. detecting them and
> > > doing something reasonable, like failiing with a "limit exceeded" message
> > >
> > That is in fact precisely how the DEC compiler handled the 100 nested
> > parentheses condition.
> > 
> > > . A silent, insidious failure down the line because a limit was exceeded
> > > is never good.
> > >
> > Amen!  One should always do bounds checking when dealing with fixed-size
> > aggregate data structures.  One compiler that I worked on got a bug report
> > of bad code being generated.  The problem was an illegal optimization that
> > never should have triggered but did due to a corrupted data table.  Finding
> > the culprit of the corruption took hours.  It finally turned out to be due
> > to overflow of an adjacent data table in use elsewhere in the compiler.
> > The routine to add another entry to that table didn't check for table
> > overflow.
> 
> We invented a data structure that gets around this problem nicely.  It's
> an array of pointers that starts at [1] instead of [0].  The [0]
> entry encodes 2 things:
> 
> In the upper bits, the log(2) the size of the array.  So all arrays
> have at least [0] and [1].  So 2 pointers is the smallest array and
> that was important to us, we wanted it to scale up and scale down.
> 
> In the lower bits, we record the number of used entries in the array.
> We assumed 32 bit pointers and with those we got ~134 million entries
> as our maximum number of entries.
> 
> Usage is like
> 
> char **space = allocLines(4);	// start with space for 4 entries
> 
> space = addLine(space, "I am [1]");
> space = addLine(space, "I am [2]");
> space = addLine(space, "I am [3]");
> space = addLine(space, "I am [4]");	// realloc's to 8 entries
> 
> freelines(space, 0);	// second arg is typically 0 or free()
> 
> It works GREAT.  We used it all over BitKeeper, for stuff as small as
> commit comments to arrays of data structures.  It scales down, scales
> up.  Helper functions:
> 
> /*
>  * liblines - interfaces for autoexpanding data structures
>  *
>  * s= allocLines(n)
>  *      pre allocate space for slightly less than N entries.
>  * s = addLine(s, line)
>  *      add line to s, allocating as needed.
>  *      line must be a pointer to preallocated space.
>  * freeLines(s, freep)
>  *      free the lines array; if freep is set, call that on each entry.
>  *      if freep is 0, do not free each entry.
>  * buf = popLine(s)
>  *      return the most recently added line (not an alloced copy of it)
>  * reverseLines(s)
>  *      reverse the order of the lines in the array
>  * sortLines(space, compar)
>  *      sort the lines using the compar function if set, else string_sort()
>  * removeLine(s, which, freep)
>  *      look for all lines which match "which" and remove them from the array
>  *      returns number of matches found
>  * removeLineN(s, i, freep)
>  *      remove the 'i'th line.
>  * lines = splitLine(buf, delim, lines)
>  *      split buf on any/all chars in delim and put the tokens in lines.
>  * buf = joinLines(":", s)
>  *      return one string which is all the strings glued together with ":"
>  *      does not free s, caller must free s.
>  * buf = findLine(lines, needle);
>  *      Return the index the line in lines that matches needle
>  */
> 
> It's all open source, apache licensed, but you'd have to tease it out of
> the bitkeeper source tree.  Wouldn't be that hard and it would be useful.

-- 
---
Larry McVoy           Retired to fishing          http://www.mcvoy.com/lm/boat

From douglas.mcilroy at dartmouth.edu  Thu May 23 23:49:18 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Thu, 23 May 2024 09:49:18 -0400
Subject: [TUHS] A fuzzy awk
Message-ID: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>

> Doug McIlroy was generating random regular expressions

Actually not. I exhaustively (within limits) tested an RE recognizer
without knowingly generating any RE either mechanically or by hand.

The trick: From recursive equations (easily derived from the grammar of
REs), I counted how many REs exist up to various limits on token counts,
Then I generated all strings that satisfied those limits, turned the
recognizer loose on them and counted how many it accepted. Any disagreement
of counts revealed the existence (but not any symptom) of bugs.

Unlike most diagnostic techniques, this scheme produces a certificate of
(very high odds on) correctness over a representative subdomain. The scheme
also agnostically checks behavior on bad inputs as well as good.  It does
not, however, provide a stress test of a recognizer's capacity limits. And
its exponential nature limits its applicability to rather small domains.
(REs have only 5 distinct kinds of token.)

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/596a2139/attachment.htm>

From jefftwopointzero at gmail.com  Fri May 24 04:55:16 2024
From: jefftwopointzero at gmail.com (Jeffrey Joshua Rollin)
Date: Thu, 23 May 2024 19:55:16 +0100
Subject: [TUHS] Gordon Bell has died
In-Reply-To: <CAC0cEp-h_r+YQSWpLZgcW8BskiKphGkqm7Ez2nT_dk9kb7G17Q@mail.gmail.com>
References: <CAC0cEp-h_r+YQSWpLZgcW8BskiKphGkqm7Ez2nT_dk9kb7G17Q@mail.gmail.com>
Message-ID: <5738D239-4D89-4742-A30F-A0CCB1288780@gmail.com>


> On 22 May 2024, at 13:20, John P. Linderman <jpl.jpl at gmail.com> wrote:
> 
> https://www.nytimes.com/2024/05/21/technology/c-gordon-bell-dead.html?unlocked_article_code=1.t00.arl-.blsWtHq8G62d&smid=url-share


Very sad news.

Jeff.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/927bd3fe/attachment.htm>

From will.senn at gmail.com  Fri May 24 04:58:22 2024
From: will.senn at gmail.com (Will Senn)
Date: Thu, 23 May 2024 13:58:22 -0500
Subject: [TUHS] Running v7 in Open-SIMH - update for 2024
Message-ID: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>

All,

I can't believe it's been 9 years since I wrote up my original notes on 
getting Research Unix v7 running in SIMH. Crazy how time flies. Well, 
this past week Clem found a bug in my scripts that create tape images. 
It seem like they were missing a tape mark at the end. Not a showstopper 
by any means, but we like to keep a clean house. So, I applied his fixes 
and updated the scripts along with the resultant tape image and Warren 
has updated them in the archive:

https://www.tuhs.org/Archive/Distributions/Research/Keith_Bostic_v7/

I've also updated the note to address the fixes, to use the latest 
version of Open-SIMH on Linux Mint 21.3 "Virginia" (my host of choice 
these days), and to bring the transcripts up to date:

https://decuser.github.io/unix/research-unix/v7/2024/05/23/research-unix-v7-3.2.html

Later,

Will

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/3acef224/attachment.htm>

From g.branden.robinson at gmail.com  Fri May 24 05:01:55 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Thu, 23 May 2024 14:01:55 -0500
Subject: [TUHS] Running v7 in Open-SIMH - update for 2024
In-Reply-To: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>
References: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>
Message-ID: <20240523190155.rexm26o5rqegvc7u@illithid>

Hi Will,

At 2024-05-23T13:58:22-0500, Will Senn wrote:
> I can't believe it's been 9 years since I wrote up my original notes
> on getting Research Unix v7 running in SIMH. Crazy how time flies.
> Well, this past week Clem found a bug in my scripts that create tape
> images. It seem like they were missing a tape mark at the end. Not a
> showstopper by any means, but we like to keep a clean house. So, I
> applied his fixes and updated the scripts along with the resultant
> tape image
[...]

I'd like to join the many people who have previously thanked you for
this work.  Your resource made V7 Unix troff and nroff accessible to me,
and that access has been invaluable to me in my efforts on groff.

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/3f83b1e2/attachment.sig>

From will.senn at gmail.com  Fri May 24 06:00:18 2024
From: will.senn at gmail.com (Will Senn)
Date: Thu, 23 May 2024 15:00:18 -0500
Subject: [TUHS] Running v7 in Open-SIMH - update for 2024
In-Reply-To: <20240523190155.rexm26o5rqegvc7u@illithid>
References: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>
 <20240523190155.rexm26o5rqegvc7u@illithid>
Message-ID: <8ee116d2-c8b0-467b-a8c1-33ea0aa7a081@gmail.com>

Hi Branden,

On 5/23/24 2:01 PM, G. Branden Robinson wrote:
> Hi Will,
>
> At 2024-05-23T13:58:22-0500, Will Senn wrote:
>> I can't believe it's been 9 years
> [...]
>
> I'd like to join the many people who have previously thanked you for
> this work.  Your resource made V7 Unix troff and nroff accessible to me,
> and that access has been invaluable to me in my efforts on groff.
>
> Regards,
> Branden
Aw, you definitely made my day. As a regular user of groff, I am 
thrilled to have helped, even in this small way.

Will

From clemc at ccc.com  Fri May 24 06:48:53 2024
From: clemc at ccc.com (Clem Cole)
Date: Thu, 23 May 2024 16:48:53 -0400
Subject: [TUHS] Running v7 in Open-SIMH - update for 2024
In-Reply-To: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>
References: <ac2c9284-f87c-4fda-91e4-e149dc7046c4@gmail.com>
Message-ID: <CAC20D2PLXvfg79VsoX9yAzN_y_R53uvu-Xb30jWTkvPHHELxEw@mail.gmail.com>

FYI - POR is to push some new tools I have been creating into OpenSIMH
shortly.

In fairness to Will, this is in the class of a "2-minute minor," not a
"4-minute major."  I back into this issue as I was working on Oscar's new
PiDP-10 and moving a very old (v6 syntax) UNIXC program that manipulates
PDP-10 backup and TOPS-20 Dumper images.  PDP-10s do things in 36 bits,
which does not map cleanly to the 8 data bits of a 9-track tape (you don't
want to know what the 10 does unless you have to deal with it).  So, I
wrote some tools to better examine and flexibly manipulate TAP files [the
debug code for tapes in SIMH is a bit of a mess].  Anyway, as I was testing
something, I thought I had made an error in my new tap_decode(1) tool when
I was looking at the v7.tap.gz file that Warren has in the TUHS archives
(that Will supplied/created with his mktape scripts).   When I looked more
carefully, it was missing a record. It turns out SIMH will silently
"attach" a TAP image without a proper 9-track logical end-of-tape (it
should give a warning).  It also turns out Will's directions never looked
for the actual 9-track EOT records - so nobody ever saw this.  I mentioned
it to him quietly - cudo's for coming clean.

FWIW: I always recommend Will's documents for V6 and V7 (in fact, we point
to them in the OpenSIMH archives at my suggestion).  The truth is, I wish
we had had access to a few more that are as good as Will's for some of the
other OSses.

Clem
ᐧ

On Thu, May 23, 2024 at 2:58 PM Will Senn <will.senn at gmail.com> wrote:

> All,
>
> I can't believe it's been 9 years since I wrote up my original notes on
> getting Research Unix v7 running in SIMH. Crazy how time flies. Well, this
> past week Clem found a bug in my scripts that create tape images. It seem
> like they were missing a tape mark at the end. Not a showstopper by any
> means, but we like to keep a clean house. So, I applied his fixes and
> updated the scripts along with the resultant tape image and Warren has
> updated them in the archive:
>
> https://www.tuhs.org/Archive/Distributions/Research/Keith_Bostic_v7/
>
> I've also updated the note to address the fixes, to use the latest version
> of Open-SIMH on Linux Mint 21.3 "Virginia" (my host of choice these days),
> and to bring the transcripts up to date:
>
>
> https://decuser.github.io/unix/research-unix/v7/2024/05/23/research-unix-v7-3.2.html
>
> Later,
>
> Will
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/11dd9edc/attachment.htm>

From robpike at gmail.com  Fri May 24 06:52:35 2024
From: robpike at gmail.com (Rob Pike)
Date: Fri, 24 May 2024 06:52:35 +1000
Subject: [TUHS] A fuzzy awk
In-Reply-To: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
Message-ID: <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>

The semantic distinction is important but the end result is very similar.
"Fuzzing" as it is now called (for no reason I can intuit) tries to get to
the troublesome cases faster by a sort of depth-first search, but
exhaustive will always beat it for value. Our exhaustive tester for bitblt,
first done by John Reiser if I remember right, set the stage for my own
thinking about how you properly test something.

-rob


On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy <
douglas.mcilroy at dartmouth.edu> wrote:

> > Doug McIlroy was generating random regular expressions
>
> Actually not. I exhaustively (within limits) tested an RE recognizer
> without knowingly generating any RE either mechanically or by hand.
>
> The trick: From recursive equations (easily derived from the grammar of
> REs), I counted how many REs exist up to various limits on token counts,
> Then I generated all strings that satisfied those limits, turned the
> recognizer loose on them and counted how many it accepted. Any disagreement
> of counts revealed the existence (but not any symptom) of bugs.
>
> Unlike most diagnostic techniques, this scheme produces a certificate of
> (very high odds on) correctness over a representative subdomain. The
> scheme also agnostically checks behavior on bad inputs as well as good.  It
> does not, however, provide a stress test of a recognizer's capacity limits. And
> its exponential nature limits its applicability to rather small domains.
> (REs have only 5 distinct kinds of token.)
>
> Doug
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/00a29f2b/attachment.htm>

From andrew at humeweb.com  Fri May 24 15:41:55 2024
From: andrew at humeweb.com (andrew at humeweb.com)
Date: Thu, 23 May 2024 22:41:55 -0700
Subject: [TUHS] A fuzzy awk
In-Reply-To: <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
Message-ID: <E9F14CF4-7BFE-4976-8BCD-20FB92801A68@humeweb.com>

i did some of the later testing of bitblt.
it was a lovely thing, slowly constructing a trustable synthetic bitblt
of ever great size and range that you could compare the bitblt to be tested against.

and we did find a couple of bugs, much to reiser’s chagrin.

> On May 23, 2024, at 1:52 PM, Rob Pike <robpike at gmail.com> wrote:
> 
> The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something.
> 
> -rob
> 
> 
> On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy <douglas.mcilroy at dartmouth.edu <mailto:douglas.mcilroy at dartmouth.edu>> wrote:
>> > Doug McIlroy was generating random regular expressions
>> 
>> Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand.
>> 
>> The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. 
>> 
>> Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good.  It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.)
>> 
>> Doug

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240523/c0494669/attachment-0001.htm>

From ralph at inputplus.co.uk  Fri May 24 17:17:47 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Fri, 24 May 2024 08:17:47 +0100
Subject: [TUHS] A fuzzy awk
In-Reply-To: <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
Message-ID: <20240524071747.77477213B8@orac.inputplus.co.uk>

Hi,

Rob wrote:
> "Fuzzing" as it is now called (for no reason I can intuit)

Barton Miller describes coining the term.

   ‘That night, I was logged on to the Unix system in my office via
    a dial-up phone line over a 1200 baud modem.  ...
    I wanted a name that would evoke the feeling of random, unstructured
    data.  After trying out several ideas, I settled on the term “fuzz”.’

        — https://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html

Line noise inspired him, as he describes.

-- 
Cheers, Ralph.

From robpike at gmail.com  Fri May 24 17:41:36 2024
From: robpike at gmail.com (Rob Pike)
Date: Fri, 24 May 2024 17:41:36 +1000
Subject: [TUHS] A fuzzy awk
In-Reply-To: <20240524071747.77477213B8@orac.inputplus.co.uk>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <20240524071747.77477213B8@orac.inputplus.co.uk>
Message-ID: <CAKzdPgyWdFymYu2B05Xn51Mv+b7z3NOEBXfoCfb9OJTB2z8UKg@mail.gmail.com>

I'm sure that's the etymology but fuzzing isn't exactly random. That's
kinda the point of it.

-rob


On Fri, May 24, 2024 at 5:18 PM Ralph Corderoy <ralph at inputplus.co.uk>
wrote:

> Hi,
>
> Rob wrote:
> > "Fuzzing" as it is now called (for no reason I can intuit)
>
> Barton Miller describes coining the term.
>
>    ‘That night, I was logged on to the Unix system in my office via
>     a dial-up phone line over a 1200 baud modem.  ...
>     I wanted a name that would evoke the feeling of random, unstructured
>     data.  After trying out several ideas, I settled on the term “fuzz”.’
>
>         — https://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html
>
> Line noise inspired him, as he describes.
>
> --
> Cheers, Ralph.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/60c5924d/attachment.htm>

From ralph at inputplus.co.uk  Fri May 24 20:00:56 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Fri, 24 May 2024 11:00:56 +0100
Subject: [TUHS] Is fuzz testing random?  (Was: A fuzzy awk)
In-Reply-To: <CAKzdPgyWdFymYu2B05Xn51Mv+b7z3NOEBXfoCfb9OJTB2z8UKg@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <20240524071747.77477213B8@orac.inputplus.co.uk>
 <CAKzdPgyWdFymYu2B05Xn51Mv+b7z3NOEBXfoCfb9OJTB2z8UKg@mail.gmail.com>
Message-ID: <20240524100056.2B01220210@orac.inputplus.co.uk>

Hi Rob,

> I'm sure that's the etymology but fuzzing isn't exactly random.
> That's kinda the point of it.

I was just curious about the etymology, but thinking about it...

The path crept along isn't random but guided by observation, say new
output or increased coverage.  But rather than exhaustively generate all
possible inputs, a random subset is chosen to allow deeper progress to
be made more quickly.

-- 
Cheers, Ralph.

From halbert at halwitz.org  Fri May 24 21:56:50 2024
From: halbert at halwitz.org (Dan Halbert)
Date: Fri, 24 May 2024 07:56:50 -0400
Subject: [TUHS] A fuzzy awk
In-Reply-To: <20240524071747.77477213B8@orac.inputplus.co.uk>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <20240524071747.77477213B8@orac.inputplus.co.uk>
Message-ID: <422de511-c7d8-4a0d-a548-7bacd98d38ec@halwitz.org>

On 5/24/24 03:17, Ralph Corderoy wrote:
> Rob wrote:
>> "Fuzzing" as it is now called (for no reason I can intuit)
> Barton Miller describes coining the term.
>
As to where the inspiration of choice of word came from, I'll speculate 
: Bart Miller was a CS grad student contemporary of mine at Berkeley. 
Prof. Lotfi Zadeh was working on fuzzy logic, fuzzy sets, and 
"possibility theory". (Prof. William Kahan hated this work, and called 
it "wrong, and pernicious": cf. 
https://www.sciencedirect.com/science/article/abs/pii/S0020025508000716.) 
So the term "fuzzy" was almost infamous in the department.

Prof. Richard Lipton was also at Berkeley at that time, and was working 
on program mutation testing, which fuzzes the program to determine the 
adequacy of test coverage, rather than fuzzing the test data.

Dan H.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/3147c4e4/attachment.htm>

From g.branden.robinson at gmail.com  Sat May 25 10:03:48 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Fri, 24 May 2024 19:03:48 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
Message-ID: <20240525000348.hq5zvwm6x4evl44h@illithid>

Hi folks,

I'm finding it difficult to find any direct sources on the question in
the subject line.

Does anyone here have any source material they can point me to
documenting the existence of a port of BSD curses to Unix Version 7?

I know that curses made it into 2.9BSD for the PDP-11, but that's not
quite the same thing.

There are comments in System V Release 2's curses.h file[1][2] (very
different from 4BSD's[3]) that suggest some effort to accommodate
Version 7's terminal driver.  So I would _presume_ that curses got
ported to Version 7.  But that's System V, right when it started
diverging from BSD curses, and moreover, presumption is not evidence.

Even personal accounts/anecdotes would be helpful.  Maybe some of you
_wrote_ curses applications for Version 7 machines.

Regards,
Branden

[1] System III apparently did not have curses at all.  Both it and 4BSD
    were released in 1980.  System V Release 1 doesn't seem to, either.
[2] https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/include/curses.h
[3] https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/include/curses.h
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/45fa1c73/attachment.sig>

From tuhs at tuhs.org  Sat May 25 10:17:53 2024
From: tuhs at tuhs.org (Bakul Shah via TUHS)
Date: Fri, 24 May 2024 17:17:53 -0700
Subject: [TUHS] A fuzzy awk
In-Reply-To: <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
Message-ID: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>

What would be nice if programming languages provided some support for such exhaustive testing[1].

At one point I had suggested turning Go's Interface type to something like Guttag style abstract data types in that relevant axioms are specified right in the interface definition. The idea was that any concrete type that implements that interface must satisfy its axioms. Even if the compiler ignored these axioms, one can write a support program that can generate a set of comprehensive tests based on these axioms. [Right now a type "implementing" an interface only needs to have a set of methods that exactly match the interface methods but nothing more] The underlying idea is that each type is in essence a constraint on what values an instance of that type can take. So adding such axioms simply tightens (& documents) these constraints. Just the process of coming up with such axioms can improve the design (sor of like test driven design but better!).

Now it may be that applying this to anything more complex than stacks won't work well & it won't be perfect but I thought this was worth experimenting with. This would be like functional testing of all the nuts and bolts and components that go in an airplane. The airplane may still fall apart but that would be a "composition" error!

[1] There are "proof assisant" or formal spec languages such as TLA+, Coq, Isabelle etc. but they don't get used much by the average programmer. I want something more retail!

> On May 23, 2024, at 1:52 PM, Rob Pike <robpike at gmail.com> wrote:
> 
> The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something.
> 
> -rob
> 
> 
> On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy <douglas.mcilroy at dartmouth.edu <mailto:douglas.mcilroy at dartmouth.edu>> wrote:
>> > Doug McIlroy was generating random regular expressions
>> 
>> Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand.
>> 
>> The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. 
>> 
>> Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good.  It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.)
>> 
>> Doug

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/4f684137/attachment.htm>

From clemc at ccc.com  Sat May 25 10:46:19 2024
From: clemc at ccc.com (Clem Cole)
Date: Fri, 24 May 2024 20:46:19 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <20240525000348.hq5zvwm6x4evl44h@illithid>
References: <20240525000348.hq5zvwm6x4evl44h@illithid>
Message-ID: <CAC20D2O8svnQypQcf6j_6jtx+aTyYSeJ9TRySo7xw1Be6+c9=A@mail.gmail.com>

I’m traveling this weekend so I’m doing this by memory.  ISTR The original
curses was developed on Ing70 as part of Rogue and that It missed the 2BSD
tape by about a year.  See if you can find an early Rogue distribution and
I think you’ll find it there.  If not look in the early net news source
distributions.

Sent from a handheld expect more typos than usual


On Fri, May 24, 2024 at 8:04 PM G. Branden Robinson <
g.branden.robinson at gmail.com> wrote:

> Hi folks,
>
> I'm finding it difficult to find any direct sources on the question in
> the subject line.
>
> Does anyone here have any source material they can point me to
> documenting the existence of a port of BSD curses to Unix Version 7?
>
> I know that curses made it into 2.9BSD for the PDP-11, but that's not
> quite the same thing.
>
> There are comments in System V Release 2's curses.h file[1][2] (very
> different from 4BSD's[3]) that suggest some effort to accommodate
> Version 7's terminal driver.  So I would _presume_ that curses got
> ported to Version 7.  But that's System V, right when it started
> diverging from BSD curses, and moreover, presumption is not evidence.
>
> Even personal accounts/anecdotes would be helpful.  Maybe some of you
> _wrote_ curses applications for Version 7 machines.
>
> Regards,
> Branden
>
> [1] System III apparently did not have curses at all.  Both it and 4BSD
>     were released in 1980.  System V Release 1 doesn't seem to, either.
> [2]
> https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/include/curses.h
> [3]
> https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/include/curses.h
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/6b0eb2bd/attachment.htm>

From g.branden.robinson at gmail.com  Sat May 25 10:57:01 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Fri, 24 May 2024 19:57:01 -0500
Subject: [TUHS] A fuzzy awk
In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
Message-ID: <20240525005701.efxidwmww56qmiwa@illithid>

[restricting to list; strong opinions here]

At 2024-05-24T17:17:53-0700, Bakul Shah via TUHS wrote:
> What would be nice if programming languages provided some support for
> such exhaustive testing[1].

[rearranging]
> At one point I had suggested turning Go's Interface type to something
> like Guttag style abstract data types in that relevant axioms are
> specified right in the interface definition.

It's an excellent idea.

> The underlying idea is that each type is in essence a constraint on
> what values an instance of that type can take.

In the simple form of a data type plus a range constraint, that's the
Ada definition of a subtype since day one--Ada '80 or Ada 83 if you
insist on the standardized form of the language.

40 years later we have Linus Torvalds tearing up his achievement
certificate in "kinder, gentler email interactions" just to trash the
notion of range checks on data types.[1][2][3]

Naturally, the brogrammers are quick to take Torvalds's side.[4]

Pascal had range checks too, and Kernighan famously punked on Wirth for
that.  I'm not certain, but I get the feeling the latter got somewhat
over-interpreted.  (To be fair to Kernighan, Pascal _as specced in the
Revised Report of 1973_[5] was in my opinion too weak a language to
leave the lab, for many of the reasons he noted.  The inflexible array
typing was fatal, in my view.)

> The idea was that any concrete type that implements that interface
> must satisfy its axioms.

Yes.  There is of course much more to the universe of potential
constraints than range checks.  Ada 2022 has these in great generality
with "subtype predicates".

http://www.ada-auth.org/standards/22aarm/html/aa-3-2-4.html

> Even if the compiler ignored these axioms,

I don't understand why this idea wasn't seized upon with more force at
the CSRC.  The notion of a compiler flag that turned "extra" (in the
Ritchie compiler circa 1980, this is perhaps expressed better as "any")
correctness checks could not have been a novelty.  NDEBUG and assert()
are similarly extremely old even in Unix.

> one can write a support program that can generate a set of
> comprehensive tests based on these axioms.

Yes.  As I understand it, this is how Spark/Ada got started.  Specially
annotated comments expressing predicates communicated with such a
support program, running much like the sort of automated theorem
prover you characterize below as not "retail".

In the last two revision cycles of the Ada standard (2013, 2022),
Spark/Ada's enhancements have made it into the language--though I am not
certain, and would not claim, that they compose with _every_ language
feature.  Spark/Ada started life as a subset of the language for a
reason.

But C has its own subset, MISRA C, so this is hardly a reason to scoff.

> [Right now a type "implementing" an interface only needs to
> have a set of methods that exactly match the interface methods but
> nothing more] The underlying idea is that each type is in essence a
> constraint on what values an instance of that type can take. So adding
> such axioms simply tightens (& documents) these constraints. Just the
> process of coming up with such axioms can improve the design (sor of
> like test driven design but better!).

Absolutely.  Generally, software engineers like to operationalize things
consistently enough that they can then be scripted/automated.

Evidently software testing is so mind-numblingly tedious that the will
to undertake it, even with automation, evaporates.

> Now it may be that applying this to anything more complex than stacks
> won't work well & it won't be perfect but I thought this was worth
> experimenting with. This would be like functional testing of all the
> nuts and bolts and components that go in an airplane. The airplane may
> still fall apart but that would be a "composition" error!

Yes.  And even if you can prove 100% of the theorems in your system, you
may learn to your dismay that your specification was defective.
Automated provers are as yet no aid to system architects.

> [1] There are "proof assisant" or formal spec languages such as TLA+,
> Coq, Isabelle etc. but they don't get used much by the average
> programmer. I want something more retail!

I've had a little exposure to these.  They are indeed esoteric, but also
extremely resource-hungry.  My _impression_, based on no hard data, is
that increasing the abilities of static analyzers and the expressiveness
with which they are directed with predicates is much cheaper.

But a lot of programmers will not budge at any cost, and will moreover
be celebrated by their peers for their obstinacy.  See footnotes.

There is much work still to be done.

Regards,
Branden

[1] https://lore.kernel.org/all/202404291502.612E0A10 at keescook/
    https://lore.kernel.org/all/CAHk-=wi5YPwWA8f5RAf_Hi8iL0NhGJeL6MN6UFWwRMY8L6UDvQ at mail.gmail.com/
[2] https://lore.kernel.org/lkml/CAHk-=whkGHOmpM_1kNgzX1UDAs10+UuALcpeEWN29EE0m-my=w at mail.gmail.com/
[3] https://www.businessinsider.com/linus-torvalds-linux-time-away-empathy-2018-9
[4] https://lwn.net/Articles/973108/
[5] https://archive.org/details/1973-the-programming-language-pascal-revised-report-wirth
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/43aeb4d3/attachment.sig>

From g.branden.robinson at gmail.com  Sat May 25 10:57:52 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Fri, 24 May 2024 19:57:52 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2O8svnQypQcf6j_6jtx+aTyYSeJ9TRySo7xw1Be6+c9=A@mail.gmail.com>
References: <20240525000348.hq5zvwm6x4evl44h@illithid>
 <CAC20D2O8svnQypQcf6j_6jtx+aTyYSeJ9TRySo7xw1Be6+c9=A@mail.gmail.com>
Message-ID: <20240525005752.bbcyvkd4k2rhcxek@illithid>

At 2024-05-24T20:46:19-0400, Clem Cole wrote:
> I’m traveling this weekend so I’m doing this by memory.  ISTR The original
> curses was developed on Ing70 as part of Rogue and that It missed the 2BSD
> tape by about a year.  See if you can find an early Rogue distribution and
> I think you’ll find it there.  If not look in the early net news source
> distributions.

Thanks, Clem!

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240524/23499887/attachment.sig>

From jsg at jsg.id.au  Sat May 25 20:48:54 2024
From: jsg at jsg.id.au (Jonathan Gray)
Date: Sat, 25 May 2024 20:48:54 +1000
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <20240525000348.hq5zvwm6x4evl44h@illithid>
References: <20240525000348.hq5zvwm6x4evl44h@illithid>
Message-ID: <ZlHCFj67W74P3v_p@largo.jsg.id.au>

On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote:
> Hi folks,
> 
> I'm finding it difficult to find any direct sources on the question in
> the subject line.
> 
> Does anyone here have any source material they can point me to
> documenting the existence of a port of BSD curses to Unix Version 7?

"In particular, the C shell, curses, termcap, vi and job control were
ported back to Version 7 (and later System III) so that it was not
unusual to find these features on otherwise pure Bell releases."
from Documentation/Books/Life_with_Unix_v2.pdf

in some v7ish distributions: unisoft, xenix, nu machine, venix?

https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437
https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up

https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf pg 580
https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up

https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412
https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up

From tuhs at tuhs.org  Sat May 25 21:08:55 2024
From: tuhs at tuhs.org (Arrigo Triulzi via TUHS)
Date: Sat, 25 May 2024 13:08:55 +0200
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <ZlHCFj67W74P3v_p@largo.jsg.id.au>
References: <ZlHCFj67W74P3v_p@largo.jsg.id.au>
Message-ID: <A9B37F57-BE39-4A8D-8F99-5C201D6FF143@alchemistowl.org>


On 25 May 2024, at 12:49, Jonathan Gray <jsg at jsg.id.au> wrote:
> in some v7ish distributions: unisoft, xenix, nu machine, venix?

In Xenix 286 I have “fond” memories of some characters being inverted in curses so you had your windows (if you drew them) looking weird.

I had an #ifdef in my code to flip the characters…

Arrigo


From clemc at ccc.com  Sat May 25 22:16:42 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 08:16:42 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <ZlHCFj67W74P3v_p@largo.jsg.id.au>
References: <20240525000348.hq5zvwm6x4evl44h@illithid>
 <ZlHCFj67W74P3v_p@largo.jsg.id.au>
Message-ID: <CAC20D2NGJHs4RDiZjXWSN4JG0TixDB2DTcGygmn-CYxEUofq+Q@mail.gmail.com>

Oh how I hate history rewrites.  Job control was developed by Kulp on V7 in
Europe and MIT.  Joy saw it and added it what would become 4BSD.

The others were all developed on V7 (PDP11)at UCB.  They were not back
ported either. The vax work inherited them from V7.

It is true, The public tended to see these as 4BSD features as that was the
vehicle that got larger distribution.

Sent from a handheld expect more typos than usual


On Sat, May 25, 2024 at 6:49 AM Jonathan Gray <jsg at jsg.id.au> wrote:

> On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote:
> > Hi folks,
> >
> > I'm finding it difficult to find any direct sources on the question in
> > the subject line.
> >
> > Does anyone here have any source material they can point me to
> > documenting the existence of a port of BSD curses to Unix Version 7?
>
> "In particular, the C shell, curses, termcap, vi and job control were
> ported back to Version 7 (and later System III) so that it was not
> unusual to find these features on otherwise pure Bell releases."
> from Documentation/Books/Life_with_Unix_v2.pdf
>
> in some v7ish distributions: unisoft, xenix, nu machine, venix?
>
> https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437
>
> https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up
>
>
> https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf
> pg 580
>
> https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up
>
> https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412
>
> https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/f895fab2/attachment.htm>

From davida at pobox.com  Sat May 25 23:56:21 2024
From: davida at pobox.com (David Arnold)
Date: Sat, 25 May 2024 23:56:21 +1000
Subject: [TUHS] A fuzzy awk
In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
Message-ID: <52098DD5-4FE0-4892-9288-12FE70793484@pobox.com>


> On 25 May 2024, at 10:18, Bakul Shah via TUHS <tuhs at tuhs.org> wrote:
> 
> ﻿
> What would be nice if programming languages provided some support for such exhaustive testing[1].
> 
> At one point I had suggested turning Go's Interface type to something like Guttag style abstract data types in that relevant axioms are specified right in the interface definition. The idea was that any concrete type that implements that interface must satisfy its axioms. Even if the compiler ignored these axioms, one can write a support program that can generate a set of comprehensive tests based on these axioms.

Sounds like Eiffel, whose compiler had support for checking pre and post conditions (and maybe invariants?) at runtime, or disabling the checks for “performance” mode. 


d

From douglas.mcilroy at dartmouth.edu  Sun May 26 01:06:24 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Sat, 25 May 2024 11:06:24 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
Message-ID: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>

> Does anyone here have any source material they can point me to
> documenting the existence of a port of BSD curses to Unix Version 7?

Curses appears in the v8 manual but not v7. Of course a
conclusion that it was not ported to v7 turns on dates. Does
v7 refer to a point in time or an interval that extended until we
undertook to prepare the v8 manual? Obviously curses was
ported during or before that interval. If curses was available
when the v7 manual was prepared, I (who edited both editions)
evidently was unaware of any dependence on it then.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/4202c189/attachment.htm>

From rich.salz at gmail.com  Sun May 26 01:11:39 2024
From: rich.salz at gmail.com (Rich Salz)
Date: Sat, 25 May 2024 11:11:39 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
Message-ID: <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>

I thought that Rob Pike was involved in the port

    /R$, troll
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/61285e19/attachment.htm>

From g.branden.robinson at gmail.com  Sun May 26 01:28:32 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 25 May 2024 10:28:32 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <ZlHCFj67W74P3v_p@largo.jsg.id.au>
 <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
Message-ID: <20240525152832.zzjipv2wjcuedyld@illithid>

Hi Jonathan & Doug,

At 2024-05-25T20:48:54+1000, Jonathan Gray wrote:
> On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote:
> > Does anyone here have any source material they can point me to
> > documenting the existence of a port of BSD curses to Unix Version 7?
> 
> "In particular, the C shell, curses, termcap, vi and
[ snip per Clem Cole ;-) ]
> were ported back to Version 7 (and later System III) so that it was
> not unusual to find these features on otherwise pure Bell releases."
> from Documentation/Books/Life_with_Unix_v2.pdf

Thanks!  This is exactly the sort of source citation I was looking for.

At 2024-05-25T11:06:24-0400, Douglas McIlroy wrote:
> Curses appears in the v8 manual but not v7. Of course a
> conclusion that it was not ported to v7 turns on dates.

I was confident that curses was not "part" of v7 because of these
factors.  (1) It wasn't in the manual; (2) archives of v7 in which we
now traffic as historical artifacts show no trace of it; and (3) the
story of its origin and development, even when distorted, doesn't place
it at the CSRC as far back as 1977/8.

But, if someone placed to know had claimed that it was, that would have
been a claim worth investigating.

> Does v7 refer to a point in time or an interval that extended until we
> undertook to prepare the v8 manual? Obviously curses was ported during
> or before that interval.

Perhaps one reason my question can be read two ways is that I'm
interested in both aspects of the issue.

I'm trying to write a "History" section for the primary ncurses man page
and clean up other problems its documentation has, like a boilerplate
reference to "Version 7 curses" in many of its other man pages, which
repeatedly implies such a thing as a separate line of development from
"BSD curses" and "System V curses".  I've been dubious of that language
since first encountering it, but I want a good documentary record to
support my proposal to chop it out.

> If curses was available when the v7 manual was prepared, I (who edited
> both editions) evidently was unaware of any dependence on it then.

I see no evidence that you missed it.  :)

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/54d754ed/attachment-0001.sig>

From clemc at ccc.com  Sun May 26 01:40:13 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 11:40:13 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
Message-ID: <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>

It was never needed to be ported -- it was developed on V7.
It was released in comp.sources.unix volume1 as pcurses

That said, I believe late volumes have nervous updates.

Clem
ᐧ

On Sat, May 25, 2024 at 11:11 AM Rich Salz <rich.salz at gmail.com> wrote:

> I thought that Rob Pike was involved in the port
>
>     /R$, troll
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/93cc61b8/attachment.htm>

From clemc at ccc.com  Sun May 26 01:43:54 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 11:43:54 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
Message-ID: <CAC20D2O5S-iU5VOnfbrZZQT1ZwLsPn9tV+FL6xrCUyv7P3gUww@mail.gmail.com>

l hate autocorrect ...  s/nervous/numerous/
ᐧ

On Sat, May 25, 2024 at 11:40 AM Clem Cole <clemc at ccc.com> wrote:

> It was never needed to be ported -- it was developed on V7.
> It was released in comp.sources.unix volume1 as pcurses
>
> That said, I believe late volumes have nervous updates.
>
> Clem
> ᐧ
>
> On Sat, May 25, 2024 at 11:11 AM Rich Salz <rich.salz at gmail.com> wrote:
>
>> I thought that Rob Pike was involved in the port
>>
>>     /R$, troll
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/08708a77/attachment.htm>

From clemc at ccc.com  Sun May 26 01:51:12 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 11:51:12 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
Message-ID: <CAC20D2MnSTfCAq1kJS2aPosacyEOmhcnQ=uXfqnP9Ub9CdnRtw@mail.gmail.com>

On Sat, May 25, 2024 at 11:40 AM Clem Cole <clemc at ccc.com> wrote:

> It was never needed to be ported -- it was developed on V7.
> It was released in comp.sources.unix volume1 as pcurses
>
> That said, I believe late volumes have nervous updates.
>
> Clem
> ᐧ
>
>>
>> As Rich points out, the comp.source.unix version may be a later Cornell
version, but I am fairly sure that the original was developed in Cory Hall,
I believe on Ing70, although it may have been the Cory Hall 11/70. I
remember finding bugs in it when we ran it on the Teklabs 11/70, which was
definitely a heavily hacked V7-based system with much of 2BSD and other UCB
tools added to it.

The point is while Vaxen had been released, we did not have one at
Tektronix at the time, and I got a lot of V7-based tools from folks in Cory
Hall.
ᐧ
ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/752daab9/attachment.htm>

From g.branden.robinson at gmail.com  Sun May 26 01:57:37 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 25 May 2024 10:57:37 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
Message-ID: <20240525155737.bwmngdyf4qnj4avv@illithid>

Hi Clem,

At 2024-05-25T11:40:13-0400, Clem Cole wrote:
> It was never needed to be ported -- it was developed on V7.
> It was released in comp.sources.unix volume1 as pcurses

This bit conflicts with other accounts.  Here's what I have in draft.

HISTORY
     4BSD (1980) introduced curses, implemented largely by Kenneth
     C. R. C. Arnold, who organized the terminal abstraction and screen
     management features of Bill Joy’s vi(1) editor into a library.
     That system ran only on the VAX architecture; curses saw a port to
     2.9BSD (1983) for the PDP‐11.

     System V Release 2 (SVr2, 1984) significantly revised curses and
     replaced the termcap portion thereof with a different API for
     terminal handling, terminfo.  System V added form and menu
     libraries in SVr3 (1987) and enhanced curses with color support in
     SVr3.2 later the same year.  SVr4 (1989) brought the panel library.

pcurses by distinction was, by the accounts I have, a later effort by
Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its
termcap bits with a reimplementation terminfo.  This was apparently done
for licensing reasons, as BSD code was free ("as in freedom") and System
V certainly was not.

The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms,
which starts as follows.  Note the 2nd and 3rd paragraphs.

.po +.5i
.TL
The Curses Reference Manual
.AU
Pavel Curtis
.NH
Introduction
.LP
Terminfo is a database describing many capabilities of over 150
different terminals.  Curses is a subroutine package which
presents a high level screen model to the programmer, while
dealing with issues such as terminal differences and optimization of
output to change one screenfull of text into another.
.LP
Terminfo is based on Berkeley's termcap database, but contains a
number of improvements and extensions.  Parameterized strings are
introduced, making it possible to describe such capabilities as
video attributes, and to handle far more unusual terminals than
possible with termcap.
.LP
Curses is also based on Berkeley's curses package, with many
improvements.  The package makes use of the insert and delete
line and character features of terminals so equipped, and
determines how to optimally use these features with no help from the
programmer.  It allows arbitrary combinations of video attributes
to be displayed, even on terminals that leave ``magic cookies''
on the screen to mark changes in attributes.

> That said, I believe late volumes have nervous updates.

I'm gathering data for another paragraph of that "History" section now.
The long and short of it seems to be that:

BSD curses, besides getting ported to many platforms, begat pcurses.

pcurses begat PCCurses, PDCurses, and ncurses.

PCCurses died.

PDCurses went dormant, begat PDCursesMod, and roused from its slumber.

ncurses, after a long period of erratic early administration that seemed
more concerned with seizing celebrity status for its developers (one of
whom was more single-minded and successful at this goal than the other)
than with software development, has been maintained with a steady hand
over 25 years.

There also exists NetBSD curses, which wasn't developed ex nihilo but
I'm not sure yet what origin it forked from.

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/7a993389/attachment.sig>

From clemc at ccc.com  Sun May 26 02:06:27 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 12:06:27 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <20240525155737.bwmngdyf4qnj4avv@illithid>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
Message-ID: <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>

Ken was working in Ing70 [he was part of the Ingres group] - IngVax did not
yet exist,
ᐧ
ᐧ

On Sat, May 25, 2024 at 11:57 AM G. Branden Robinson <
g.branden.robinson at gmail.com> wrote:

> Hi Clem,
>
> At 2024-05-25T11:40:13-0400, Clem Cole wrote:
> > It was never needed to be ported -- it was developed on V7.
> > It was released in comp.sources.unix volume1 as pcurses
>
> This bit conflicts with other accounts.  Here's what I have in draft.
>
> HISTORY
>      4BSD (1980) introduced curses, implemented largely by Kenneth
>      C. R. C. Arnold, who organized the terminal abstraction and screen
>      management features of Bill Joy’s vi(1) editor into a library.
>      That system ran only on the VAX architecture; curses saw a port to
>      2.9BSD (1983) for the PDP‐11.
>
>      System V Release 2 (SVr2, 1984) significantly revised curses and
>      replaced the termcap portion thereof with a different API for
>      terminal handling, terminfo.  System V added form and menu
>      libraries in SVr3 (1987) and enhanced curses with color support in
>      SVr3.2 later the same year.  SVr4 (1989) brought the panel library.
>
> pcurses by distinction was, by the accounts I have, a later effort by
> Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its
> termcap bits with a reimplementation terminfo.  This was apparently done
> for licensing reasons, as BSD code was free ("as in freedom") and System
> V certainly was not.
>
> The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms,
> which starts as follows.  Note the 2nd and 3rd paragraphs.
>
> .po +.5i
> .TL
> The Curses Reference Manual
> .AU
> Pavel Curtis
> .NH
> Introduction
> .LP
> Terminfo is a database describing many capabilities of over 150
> different terminals.  Curses is a subroutine package which
> presents a high level screen model to the programmer, while
> dealing with issues such as terminal differences and optimization of
> output to change one screenfull of text into another.
> .LP
> Terminfo is based on Berkeley's termcap database, but contains a
> number of improvements and extensions.  Parameterized strings are
> introduced, making it possible to describe such capabilities as
> video attributes, and to handle far more unusual terminals than
> possible with termcap.
> .LP
> Curses is also based on Berkeley's curses package, with many
> improvements.  The package makes use of the insert and delete
> line and character features of terminals so equipped, and
> determines how to optimally use these features with no help from the
> programmer.  It allows arbitrary combinations of video attributes
> to be displayed, even on terminals that leave ``magic cookies''
> on the screen to mark changes in attributes.
>
> > That said, I believe late volumes have nervous updates.
>
> I'm gathering data for another paragraph of that "History" section now.
> The long and short of it seems to be that:
>
> BSD curses, besides getting ported to many platforms, begat pcurses.
>
> pcurses begat PCCurses, PDCurses, and ncurses.
>
> PCCurses died.
>
> PDCurses went dormant, begat PDCursesMod, and roused from its slumber.
>
> ncurses, after a long period of erratic early administration that seemed
> more concerned with seizing celebrity status for its developers (one of
> whom was more single-minded and successful at this goal than the other)
> than with software development, has been maintained with a steady hand
> over 25 years.
>
> There also exists NetBSD curses, which wasn't developed ex nihilo but
> I'm not sure yet what origin it forked from.
>
> Regards,
> Branden
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/a34b4580/attachment.htm>

From g.branden.robinson at gmail.com  Sun May 26 02:13:20 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 25 May 2024 11:13:20 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
Message-ID: <20240525161320.3jvozzlgvr6tfyxl@illithid>

Hi Clem,

At 2024-05-25T12:06:27-0400, Clem Cole wrote:
> Ken [Arnold] was working in Ing70 [he was part of the Ingres group] -
> IngVax did not yet exist,

That does complicate my simplistic story.  Ing70 was, then, as you noted
in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix,
but rather something with various bits of BSD (also in active
development, I reckon).  Nevertheless, I venture, the first officially
distributed curses was in 4BSD, a VAX-only release.  But, it stands to
reason that BSD curses never got far from its -11-portable roots; it
must have been obvious that the library would be desired on such hosts
and the CSRG came to officially support it thus in 2.9BSD 3 years later.

Hmm.  I'll have to chew on how to recast that economically.

Thanks for all the light you're throwing on this!

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/f153f33f/attachment.sig>

From clemc at ccc.com  Sun May 26 02:14:10 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 12:14:10 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
Message-ID: <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>

Ouch -- there was no licensing issue with curses or termcap.

termcap and curses were written at UCB.

When MaryAnn went to Columbus - there was  desire to rewrite to be
"compiled".  That work was terminfo.    AT&T >>restricted<< terminfo.
Pavel (with coaching from a few of us, including me], wrote a  new
implementation of terminfo.
When he was added it, he combined a rewrite of curses.

Clem
ᐧ

On Sat, May 25, 2024 at 12:06 PM Clem Cole <clemc at ccc.com> wrote:

> Ken was working in Ing70 [he was part of the Ingres group] - IngVax did
> not yet exist,
> ᐧ
> ᐧ
>
> On Sat, May 25, 2024 at 11:57 AM G. Branden Robinson <
> g.branden.robinson at gmail.com> wrote:
>
>> Hi Clem,
>>
>> At 2024-05-25T11:40:13-0400, Clem Cole wrote:
>> > It was never needed to be ported -- it was developed on V7.
>> > It was released in comp.sources.unix volume1 as pcurses
>>
>> This bit conflicts with other accounts.  Here's what I have in draft.
>>
>> HISTORY
>>      4BSD (1980) introduced curses, implemented largely by Kenneth
>>      C. R. C. Arnold, who organized the terminal abstraction and screen
>>      management features of Bill Joy’s vi(1) editor into a library.
>>      That system ran only on the VAX architecture; curses saw a port to
>>      2.9BSD (1983) for the PDP‐11.
>>
>>      System V Release 2 (SVr2, 1984) significantly revised curses and
>>      replaced the termcap portion thereof with a different API for
>>      terminal handling, terminfo.  System V added form and menu
>>      libraries in SVr3 (1987) and enhanced curses with color support in
>>      SVr3.2 later the same year.  SVr4 (1989) brought the panel library.
>>
>> pcurses by distinction was, by the accounts I have, a later effort by
>> Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its
>> termcap bits with a reimplementation terminfo.  This was apparently done
>> for licensing reasons, as BSD code was free ("as in freedom") and System
>> V certainly was not.
>>
>> The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms,
>> which starts as follows.  Note the 2nd and 3rd paragraphs.
>>
>> .po +.5i
>> .TL
>> The Curses Reference Manual
>> .AU
>> Pavel Curtis
>> .NH
>> Introduction
>> .LP
>> Terminfo is a database describing many capabilities of over 150
>> different terminals.  Curses is a subroutine package which
>> presents a high level screen model to the programmer, while
>> dealing with issues such as terminal differences and optimization of
>> output to change one screenfull of text into another.
>> .LP
>> Terminfo is based on Berkeley's termcap database, but contains a
>> number of improvements and extensions.  Parameterized strings are
>> introduced, making it possible to describe such capabilities as
>> video attributes, and to handle far more unusual terminals than
>> possible with termcap.
>> .LP
>> Curses is also based on Berkeley's curses package, with many
>> improvements.  The package makes use of the insert and delete
>> line and character features of terminals so equipped, and
>> determines how to optimally use these features with no help from the
>> programmer.  It allows arbitrary combinations of video attributes
>> to be displayed, even on terminals that leave ``magic cookies''
>> on the screen to mark changes in attributes.
>>
>> > That said, I believe late volumes have nervous updates.
>>
>> I'm gathering data for another paragraph of that "History" section now.
>> The long and short of it seems to be that:
>>
>> BSD curses, besides getting ported to many platforms, begat pcurses.
>>
>> pcurses begat PCCurses, PDCurses, and ncurses.
>>
>> PCCurses died.
>>
>> PDCurses went dormant, begat PDCursesMod, and roused from its slumber.
>>
>> ncurses, after a long period of erratic early administration that seemed
>> more concerned with seizing celebrity status for its developers (one of
>> whom was more single-minded and successful at this goal than the other)
>> than with software development, has been maintained with a steady hand
>> over 25 years.
>>
>> There also exists NetBSD curses, which wasn't developed ex nihilo but
>> I'm not sure yet what origin it forked from.
>>
>> Regards,
>> Branden
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/02e6a62a/attachment.htm>

From clemc at ccc.com  Sun May 26 02:21:17 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 25 May 2024 12:21:17 -0400
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <20240525161320.3jvozzlgvr6tfyxl@illithid>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <20240525161320.3jvozzlgvr6tfyxl@illithid>
Message-ID: <CAC20D2M2jLFZNhNJQsJpSzKasSftTRafSscG_4SeT0pY0jbANg@mail.gmail.com>

On Sat, May 25, 2024 at 12:13 PM G. Branden Robinson <
g.branden.robinson at gmail.com> wrote:

> That does complicate my simplistic story.  Ing70 was, then, as you noted
> in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix,
> but rather something with various bits of BSD (also in active
> development, I reckon).
>
Mumble -- the kernel and 90% of the userspace on Ing70 was V7 -- it was
very similar to Teklabs which I ran.
It had all of 2BSD on it, but the kernel work that we think of as 'BSD" was
3.0BSD and later 4.0BSD and that was 100% on the Vax.

The point is it was a 16 bits system, the Johnson C compiler with some
fixes from the greater USENIX community including UCB.
There was >>no port<< needed.

This was its native tongue.

It was >>included<< in later BSD released which is how people came to know
it because 4.XBSD was became much more widely used than V7+2BSD.

The 2.9 work of Keith at al, started because the UCB Math Dept could not
afford a VAX.   DEC had released the  v7m code to support overlays, so
slowly
changed from the VAX made it back into the V7 based kernel - which took a
new life.

Clem


ᐧ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/d67de2e9/attachment.htm>

From g.branden.robinson at gmail.com  Sun May 26 02:25:49 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 25 May 2024 11:25:49 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>
Message-ID: <20240525162549.yg2qndtloodv3upq@illithid>

Hi Clem,

At 2024-05-25T12:14:10-0400, Clem Cole wrote:
> Ouch -- there was no licensing issue with [BSD] curses or termcap.

Right.  I wasn't trying to imply otherwise.  That's why Pavel Curtis
could use BSD curses as a basis for his pcurses.

It is only System V curses that was encumbered.  And now it too is
available for inspection, if in a somewhat gray area for anyone with
commercial ambitions.

> termcap and curses were written at UCB.

Agreed.  I've seen no claim anywhere to the contrary.

> When MaryAnn went to Columbus - there was  desire to rewrite to be
> "compiled".  That work was terminfo.    AT&T >>restricted<< terminfo.

Yes.  This too is my understanding.  terminfo is a better API (and
source format) than termcap, but I also surmise that better support for
deployment environments with large "fleets" of video terminals was also
seen by AT&T management as an enticing prospect for vendor lock-in.

> Pavel (with coaching from a few of us, including me], wrote a  new
> implementation of terminfo.  When he was added it, he combined a
> rewrite of curses.

Thank you for the confirmation.  And for supplying some coaching all
those years ago--we're still enjoying the benefits today!

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/97634744/attachment.sig>

From g.branden.robinson at gmail.com  Sun May 26 02:38:10 2024
From: g.branden.robinson at gmail.com (G. Branden Robinson)
Date: Sat, 25 May 2024 11:38:10 -0500
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2M2jLFZNhNJQsJpSzKasSftTRafSscG_4SeT0pY0jbANg@mail.gmail.com>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <20240525161320.3jvozzlgvr6tfyxl@illithid>
 <CAC20D2M2jLFZNhNJQsJpSzKasSftTRafSscG_4SeT0pY0jbANg@mail.gmail.com>
Message-ID: <20240525163810.flvazgbj6tq3l5rw@illithid>

Hi Clem,

At 2024-05-25T12:21:17-0400, Clem Cole wrote:
> On Sat, May 25, 2024 at 12:13 PM G. Branden Robinson <
> g.branden.robinson at gmail.com> wrote:
> 
> > That does complicate my simplistic story.  Ing70 was, then, as you noted
> > in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix,
> > but rather something with various bits of BSD (also in active
> > development, I reckon).
> >
> Mumble -- the kernel and 90% of the userspace on Ing70 was V7 -- it was
> very similar to Teklabs which I ran.

Yes, sorry, I was hasty and sloppy.  I should have qualified that
"Version 7 Unix" with "pure".  Though I wonder if anyone ran "pure"
distributions of anything by today's standards, with our flatpaks and VM
images and containers and distributions and Linux kernel "taint" flags.

And, blessed be, our reproducible builds.  So there is such a thing as
progress.

> The point is it was a 16 bits system, the Johnson C compiler with some
> fixes from the greater USENIX community including UCB.
> There was >>no port<< needed.
> 
> This was its native tongue.

Okay.  My crystal ball shows wordsmithing in my future.

> It was >>included<< in later BSD released which is how people came to
> know it because 4.XBSD was became much more widely used than V7+2BSD.

Acknowledged.

> The 2.9 work of Keith at al, started because the UCB Math Dept could
> not afford a VAX.   DEC had released the  v7m code to support
> overlays, so slowly changed from the VAX made it back into the V7
> based kernel - which took a new life.

Ah, I'd never heard the actual origin story of later 2BSD's reason for
parallel development.  Thanks!

Back when I was first learning Unix, a mere 30 years ago, I asked a
local guru why the kernel image was called "vmunix" instead of just
plain "unix".  I got a correct answer, but then asked why you'd keep
calling it "vmunix" when no non-VM Unix was even available for the
platform.  Historical inertia and the long shadow of the work that
became 4BSD.  (Linus's decision to name his kernel's image "vmlinux" [or
"vmlinuz" for those remember having those lulz] when in its case no
non-VM version had ever existed anywhere, nor even been desired or
conceived, struck me as an excess of continuity.)

Unix geeks are conservative about the weirdest things.

Regards,
Branden
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/f3ab7ccf/attachment.sig>

From imp at bsdimp.com  Sun May 26 03:02:00 2024
From: imp at bsdimp.com (Warner Losh)
Date: Sat, 25 May 2024 11:02:00 -0600
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <20240525163810.flvazgbj6tq3l5rw@illithid>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <20240525161320.3jvozzlgvr6tfyxl@illithid>
 <CAC20D2M2jLFZNhNJQsJpSzKasSftTRafSscG_4SeT0pY0jbANg@mail.gmail.com>
 <20240525163810.flvazgbj6tq3l5rw@illithid>
Message-ID: <CANCZdfpJ-CByNutzhjzaYBzqarX1SHOkaFb5ZkVztHsBmwwSfA@mail.gmail.com>

On Sat, May 25, 2024, 10:38 AM G. Branden Robinson <
g.branden.robinson at gmail.com> wrote:

> Hi Clem,
>
> At 2024-05-25T12:21:17-0400, Clem Cole wrote:
> > On Sat, May 25, 2024 at 12:13 PM G. Branden Robinson <
> > g.branden.robinson at gmail.com> wrote:
> >
> > > That does complicate my simplistic story.  Ing70 was, then, as you
> noted
> > > in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix,
> > > but rather something with various bits of BSD (also in active
> > > development, I reckon).
> > >
> > Mumble -- the kernel and 90% of the userspace on Ing70 was V7 -- it was
> > very similar to Teklabs which I ran.
>
> Yes, sorry, I was hasty and sloppy.  I should have qualified that
> "Version 7 Unix" with "pure".  Though I wonder if anyone ran "pure"
> distributions of anything by today's standards, with our flatpaks and VM
> images and containers and distributions and Linux kernel "taint" flags.
>
> And, blessed be, our reproducible builds.  So there is such a thing as
> progress.
>
> > The point is it was a 16 bits system, the Johnson C compiler with some
> > fixes from the greater USENIX community including UCB.
> > There was >>no port<< needed.
> >
> > This was its native tongue.
>
> Okay.  My crystal ball shows wordsmithing in my future.
>
> > It was >>included<< in later BSD released which is how people came to
> > know it because 4.XBSD was became much more widely used than V7+2BSD.
>
> Acknowledged.
>
> > The 2.9 work of Keith at al, started because the UCB Math Dept could
> > not afford a VAX.   DEC had released the  v7m code to support
> > overlays, so slowly changed from the VAX made it back into the V7
> > based kernel - which took a new life.
>
> Ah, I'd never heard the actual origin story of later 2BSD's reason for
> parallel development.  Thanks!
>

The 2.8 kernel from the 2.83 archive is a V7 with a bunch of hacks /
features #ifdef'd into the tree with a primitive config thing to cons up
the #defines. This is still largely present in 2.9, but with less rigid
adherence for bug fixes. It's very clear that for the kernel this was
followed. I've not studied userland to comment on that but i think not.

It also explains why the release notes kept saying it was the last release
starting iirc with 2.8...

Warner

Back when I was first learning Unix, a mere 30 years ago, I asked a
> local guru why the kernel image was called "vmunix" instead of just
> plain "unix".  I got a correct answer, but then asked why you'd keep
> calling it "vmunix" when no non-VM Unix was even available for the
> platform.  Historical inertia and the long shadow of the work that
> became 4BSD.  (Linus's decision to name his kernel's image "vmlinux" [or
> "vmlinuz" for those remember having those lulz] when in its case no
> non-VM version had ever existed anywhere, nor even been desired or
> conceived, struck me as an excess of continuity.)
>
> Unix geeks are conservative about the weirdest things.
>
> Regards,
> Branden
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/39b91f51/attachment-0001.htm>

From paul.winalski at gmail.com  Sun May 26 03:18:17 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Sat, 25 May 2024 13:18:17 -0400
Subject: [TUHS] A fuzzy awk
In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
Message-ID: <CABH=_VSEMhT_GJZS0rWPUytdJ_EwFCYq6y8wWV5SJT8tAYupPQ@mail.gmail.com>

On Fri, May 24, 2024 at 8:18 PM Bakul Shah via TUHS <tuhs at tuhs.org> wrote:

At one point I had suggested turning Go's Interface type to something like
> Guttag style abstract data types in that relevant axioms are specified
> right in the interface definition. The idea was that any concrete type that
> implements that interface must satisfy its axioms. Even if the compiler
> ignored these axioms, one can write a support program that can generate a
> set of comprehensive tests based on these axioms. [Right now a type
> "implementing" an interface only needs to have a set of methods that
> exactly match the interface methods but nothing more] The underlying idea
> is that each type is in essence a constraint on what values an instance of
> that type can take. So adding such axioms simply tightens (& documents)
> these constraints. Just the process of coming up with such axioms can
> improve the design (sor of like test driven design but better!).
>

At one point I worked with a programming language called Gypsy that
implemented this concept.  Each routine had a prefix that specified axioms
on the routine's parameters and outputs.  The rest of Gypsy was a
conventional procedural language but the semantics were carefully chosen to
allow for automated proof of correctness.  I wrote a formal specification
for the DECnet session layer protocol (DECnet's equivalent of TCP) in
Gypsy.  I turned up a subtle bug in the prose version of the protocol
specification in the process.

-Paul W.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/245cf076/attachment.htm>

From steve at quintile.net  Sun May 26 03:24:07 2024
From: steve at quintile.net (Steve Simon)
Date: Sat, 25 May 2024 18:24:07 +0100
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
Message-ID: <A2F1953A-D418-425E-9089-47BF16AE9AF3@quintile.net>


with my pedantic head on…

The “7th Edition” was the name of the Perkin Elmer port (nee Interdata), derived from Richard Miller’s work.

This was Unix Version 7 from the labs, with a v6 C compiler, with vi, csh, and curses from 2.4BSD (though we where never 100% sure about this version).

You never forget your first Unix :-)

-Steve


From tom.perrine+tuhs at gmail.com  Sun May 26 03:36:53 2024
From: tom.perrine+tuhs at gmail.com (Tom Perrine)
Date: Sat, 25 May 2024 10:36:53 -0700
Subject: [TUHS] A fuzzy awk
In-Reply-To: <CABH=_VSEMhT_GJZS0rWPUytdJ_EwFCYq6y8wWV5SJT8tAYupPQ@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
 <CABH=_VSEMhT_GJZS0rWPUytdJ_EwFCYq6y8wWV5SJT8tAYupPQ@mail.gmail.com>
Message-ID: <CAJq=PCWKE1Pdkm7527rasqrcPTKUUY0wCWuMX3nDgP=b+K07SQ@mail.gmail.com>

Another Gypsy user here...

For KSOS-11 the kernel was described in SPECIAL - as a set of axioms and
theorems. There was no actual connection between the formal specification
in SPECIAL and the Modula code.

Some of the critical user-space code for a trusted downgrade program, to
bridge data from higher levels of classification to lower, was written in
Gypsy. I visited UT Austin and Dr Good(?)'s team to learn it, IIRC. Gypsy
was considered better in that the specification was tied to the executable
through the pre/post conditions - and the better support for semi-automated
theorem proving.


On Sat, May 25, 2024 at 10:18 AM Paul Winalski <paul.winalski at gmail.com>
wrote:

> On Fri, May 24, 2024 at 8:18 PM Bakul Shah via TUHS <tuhs at tuhs.org> wrote:
>
> At one point I had suggested turning Go's Interface type to something like
>> Guttag style abstract data types in that relevant axioms are specified
>> right in the interface definition. The idea was that any concrete type that
>> implements that interface must satisfy its axioms. Even if the compiler
>> ignored these axioms, one can write a support program that can generate a
>> set of comprehensive tests based on these axioms. [Right now a type
>> "implementing" an interface only needs to have a set of methods that
>> exactly match the interface methods but nothing more] The underlying idea
>> is that each type is in essence a constraint on what values an instance of
>> that type can take. So adding such axioms simply tightens (& documents)
>> these constraints. Just the process of coming up with such axioms can
>> improve the design (sor of like test driven design but better!).
>>
>
> At one point I worked with a programming language called Gypsy that
> implemented this concept.  Each routine had a prefix that specified axioms
> on the routine's parameters and outputs.  The rest of Gypsy was a
> conventional procedural language but the semantics were carefully chosen to
> allow for automated proof of correctness.  I wrote a formal specification
> for the DECnet session layer protocol (DECnet's equivalent of TCP) in
> Gypsy.  I turned up a subtle bug in the prose version of the protocol
> specification in the process.
>
> -Paul W.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/f02f7246/attachment.htm>

From sauer at technologists.com  Sun May 26 03:53:01 2024
From: sauer at technologists.com (Charles H Sauer (he/him))
Date: Sat, 25 May 2024 12:53:01 -0500
Subject: [TUHS] Prof Don Good [was Re: A fuzzy awk
In-Reply-To: <CAJq=PCWKE1Pdkm7527rasqrcPTKUUY0wCWuMX3nDgP=b+K07SQ@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <CAKzdPgyi5pt6XyJtBAqzmh6KdsFssmvXE75_Ukd_0-eHiw8PCQ@mail.gmail.com>
 <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org>
 <CABH=_VSEMhT_GJZS0rWPUytdJ_EwFCYq6y8wWV5SJT8tAYupPQ@mail.gmail.com>
 <CAJq=PCWKE1Pdkm7527rasqrcPTKUUY0wCWuMX3nDgP=b+K07SQ@mail.gmail.com>
Message-ID: <958b0893-6829-41b9-a096-bf732e338ea1@technologists.com>

On 5/25/2024 12:36 PM, Tom Perrine wrote:
> Another Gypsy user here...
> 
> For KSOS-11 the kernel was described in SPECIAL - as a set of axioms and 
> theorems. There was no actual connection between the formal 
> specification in SPECIAL and the Modula code.
> 
> Some of the critical user-space code for a trusted downgrade program, to 
> bridge data from higher levels of classification to lower, was written 
> in Gypsy. I visited UT Austin and Dr Good(?)'s team to learn it, IIRC. 
> Gypsy was considered better in that the specification was tied to the 
> executable through the pre/post conditions - and the better support for 
> semi-automated theorem proving.

When I was transitioning from being a rock n' roller to computer science 
student, I took my first undergraduate languages course from Don.

https://www.dignitymemorial.com/obituaries/austin-tx/donald-good-8209907

Charlie

-- 
voice: +1.512.784.7526       e-mail: sauer at technologists.com
fax: +1.512.346.5240         Web: https://technologists.com/sauer/
Facebook/Google/LinkedIn/Twitter: CharlesHSauer

From ats at offog.org  Sun May 26 04:07:17 2024
From: ats at offog.org (Adam Sampson)
Date: Sat, 25 May 2024 19:07:17 +0100
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>
 (Clem Cole's message of "Sat, 25 May 2024 12:14:10 -0400")
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>
Message-ID: <y2ay17x9356.fsf@offog.org>

Clem Cole <clemc at ccc.com> writes:

> Pavel (with coaching from a few of us, including me], wrote a new
> implementation of terminfo.  When he was added it, he combined a
> rewrite of curses.

>From the utzoo Usenet archive...

--start--

From: utzoo!decvax!harpo!floyd!vax135!cornell!pavel
Newsgroups: net.general
Title: New Curses/Terminfo Package
Article-I.D.: cornell.3348
Posted: Sat Jul 10 15:10:14 1982
Received: Sun Jul 11 03:55:13 1982

At this past week's USENIX meeting, Mark Horton announced the completion
of a replacement database/interface for the Berkeley 'termcap' setup.  The
new version is called 'terminfo' and has several advantages over termcap:
	- The database is compiled and therefore start-up time for
	  programs using the package is considerably reduced, even 
	  faster than reading a single-entry termcap database.
	- The database is more human-readable and flexible.
	- Many more terminals can be supported due to the addition
	  of several new capabilities, generalised parameter
	  mechanisms (enabling the full use of, for example, the ANSI
	  cursor-forward capability by allowing you to say 'move forward
	  35 spaces' as opposed to 'move forward' 35 times), a fully
	  general yet efficient arithmetic mechanism which should allow
	  the use of \any/ bizarre cursor-addressing scheme which can
	  be computed, etc.
	- A \far/ better set of routines for accessing the database,
	  requiring, for example, only a single call to read in an
	  entire entry, making all of the terminal's capabilities fully
	  available to the calling program.  No more need for 'tgetent',
	  'tgetstr', etc.
Conversion of existing programs from termcap to terminfo is very easy and
usually consists mostly of throwing out all of the garbage needed to read
and store a termcap entry.

As a companion to the change to terminfo, Mark has also completed work on
a re-vamped version of the Curses screen-handling library package.  The new
version has many, many advantages over the previous version, some of which
are listed below:
	- New curses can use insert/delete line/character capabilities
	  in terminals which have them, considerably speeding up many
	  applications
	- It is possible to use the new curses on more than one type of
	  terminal at once
	- All of the video attributes of a terminal (e.g. reverse video,
	  boldface, blinking, etc.) can be used, in tandem if possible
	- New curses handles terminals like the Televideos with the
	  so-called 'magic cookie' glitch which leaves markers on the
	  screen for each change of video attributes
	- The arrow and function keys of terminals can be input just as
	  though they were single characters, even on terminals which use
	  multi-character sequences for these functions.  The new curses
	  does all necessary interpretation, passing back to the program
	  only a defined constant telling which key was pressed.
	- There is a user-accessable scrolling region
	- The use of shell escapes and the csh ^Z job control feature is
	  supported more fully
	- On systems which can support the notion, updates of the screen
	  will abort if a character is typed at the keyboard, thus allowing
	  the application to possibly avoid useless output
	- It should now be possible for most programs to be written very
	  portably to run on most versions of UNIX, including System III,
	  Berkeley UNIX, V7, Bell Labs internal UNIX, etc.  This portability
	  extends to the use of most terminal modes, such as raw mode,
	  echoing, etc.

Now for the bad news.  Mark, being an employee of Bell Labs, cannot release
any of his code.  Estimates currently run as high as 18 months for a Bell
release.  Even then, nothing could be guaranteed as to its price.  As a result,
I have decided to do a public-domain implementation of both terminfo and the
new curses.  They will be compatible with Mark's versions.  I have arranged
for the library/database to be distributed with the next Berkeley Software
Distribution, 4.2BSD, in December of this year.  It will also be made available
for free to any requestor.  I agree with Mark when he says that terminfo is
clearly superior to termcap and deserves to be made a new and lasting standard.

I expect to be able to begin recruiting test sites for both curses and terminfo
by the end of September.

If you have any questions, comments or suggestions, please send them to me, not
the network. 

	Pavel Curtis
	{decvax,allegra,vax135,harpo,...}!cornell!pavel
	Pavel.Cornell at Udel-Relay

--end--

-- 
Adam Sampson <ats at offog.org>                         <http://offog.org/>

From alanglasser at gmail.com  Sun May 26 08:28:27 2024
From: alanglasser at gmail.com (Alan Glasser)
Date: Sat, 25 May 2024 18:28:27 -0400
Subject: [TUHS] Did UNIX Ever Touch SPC-SWAP, EPL,
 or EPLX (1A Languages)?
In-Reply-To: <xv5Yh_-qaISR1NMcDZxT-u0eS7LWSr7Im5AlapgURum52oDIRq07Hu4S77V7PlsWb6URG1Mzxz4OLl4ILZq6J48F3pfCDcR4FRE1GwBKy8s=@protonmail.com>
References: <xv5Yh_-qaISR1NMcDZxT-u0eS7LWSr7Im5AlapgURum52oDIRq07Hu4S77V7PlsWb6URG1Mzxz4OLl4ILZq6J48F3pfCDcR4FRE1GwBKy8s=@protonmail.com>
Message-ID: <CALpTLGp6mZTypVMt+LTTumeaLg7aQxcjt--xLdizB9x7z3vYYw@mail.gmail.com>

Matt,

First, sorry for the delayed response.

In around 1994 through late 1996 I worked on the FlashPort project in Bell
Labs.
A significant project that we completed was FlashPort'ing the 4ESS SWAP
assembler from TSS/360 to Solaris.
My memory is that the 4E team wanted to get off of TSS and onto Unix.

Alan

https://techmonitor.ai/technology/emulator_house_echo_logic_folded_back_into_att


On Fri, Apr 5, 2024 at 12:59 AM segaloco via TUHS <tuhs at tuhs.org> wrote:

> So I've been doing a bit of reading on 1A and 4ESS technologies lately,
> getting
> a feel for the state of things just prior to 3B and 5ESS popping onto the
> scene,
> and came across some BSTJ references to the programming environments
> involved
> in the 4ESS and TSPS No. 1 systems.
>
> The general assembly system targeting the 1A machine language was known as
> SPC-SWAP (SWitching Assembly Program)[1](p. 206) and ran under OS/360/370,
> with
> editing typically performed in QED.  This then gave way to the EPL (ESS
> Programming Language) and ultimately EPLX (EPL eXtra)[2](p. 1)[3](p. 8)
> languages which, among other things, were used for later 4ESS work with
> cross-
> compilers for at least TSS/360 by the sounds of it.
>
> Are there any recollections of attempts by the Bell System to rebase any of
> these 1A-targeting environments into UNIX, or by the time UNIX was being
> considered more broadly for Bell System projects, was 3B/5ESS technology
> well on
> the way, rendering attempting to move entrenched IBM-based environments
> for the
> older switching computation systems moot?
>
> For the record, in addition to the evolution of ESS to the 5ESS
> generation, a
> revision of TSPS, 1B, was also introduced which was rebased on the 3B20D
> processor and utilized the same 3B cross-compilation SGS under UNIX as
> other 3B-
> targeted applications[4].  Interestingly, the paper on software development
> in [4](p. 109) still makes reference to Programmer's Workbench as of 1982,
> implying that nomenclature may have still been the norm at some Bell Labs
> sites
> such as Naperville, Illinois, although I can't tell if they're referring to
> PWB as in the branch of UNIX or the environment of make, sccs, etc.
>
> Additionally, is anyone aware of surviving accessible specimens of SWAP
> assembly, EPL, or EPLX code or literature beyond the BSTJ references and
> paper
> referenced in the IEEE library below?  Thanks for any insights!
>
> - Matt G.
>
> [1] -
> https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V58N06_197907_Part_1.pdf
> [2] - https://ieeexplore.ieee.org/document/810323
> [3] -
> https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V60N06_198107_Part_2.pdf
> [4] -
> https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V62N03_198303_Part_3.pdf
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/65906b98/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 060804.PDF
Type: application/pdf
Size: 28592 bytes
Desc: not available
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240525/65906b98/attachment.pdf>

From robpike at gmail.com  Sun May 26 09:06:25 2024
From: robpike at gmail.com (Rob Pike)
Date: Sun, 26 May 2024 09:06:25 +1000
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <CAC20D2NGJHs4RDiZjXWSN4JG0TixDB2DTcGygmn-CYxEUofq+Q@mail.gmail.com>
References: <20240525000348.hq5zvwm6x4evl44h@illithid>
 <ZlHCFj67W74P3v_p@largo.jsg.id.au>
 <CAC20D2NGJHs4RDiZjXWSN4JG0TixDB2DTcGygmn-CYxEUofq+Q@mail.gmail.com>
Message-ID: <CAKzdPgznQAOV=CFpBY6Fp5VJeO+-NZH5aGdnstrRtaqbaa6cAQ@mail.gmail.com>

Reminds me of my typesetting story (search the list's archives for versatec
and vegents, that should find it.)

-rob


On Sat, May 25, 2024 at 10:17 PM Clem Cole <clemc at ccc.com> wrote:

> Oh how I hate history rewrites.  Job control was developed by Kulp on V7
> in Europe and MIT.  Joy saw it and added it what would become 4BSD.
>
> The others were all developed on V7 (PDP11)at UCB.  They were not back
> ported either. The vax work inherited them from V7.
>
> It is true, The public tended to see these as 4BSD features as that was
> the vehicle that got larger distribution.
>
> Sent from a handheld expect more typos than usual
>
>
> On Sat, May 25, 2024 at 6:49 AM Jonathan Gray <jsg at jsg.id.au> wrote:
>
>> On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote:
>> > Hi folks,
>> >
>> > I'm finding it difficult to find any direct sources on the question in
>> > the subject line.
>> >
>> > Does anyone here have any source material they can point me to
>> > documenting the existence of a port of BSD curses to Unix Version 7?
>>
>> "In particular, the C shell, curses, termcap, vi and job control were
>> ported back to Version 7 (and later System III) so that it was not
>> unusual to find these features on otherwise pure Bell releases."
>> from Documentation/Books/Life_with_Unix_v2.pdf
>>
>> in some v7ish distributions: unisoft, xenix, nu machine, venix?
>>
>> https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437
>>
>> https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up
>>
>>
>> https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf
>> pg 580
>>
>> https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up
>>
>> https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412
>>
>> https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240526/ef7dd2e5/attachment.htm>

From joe at celo.io  Sun May 26 21:10:32 2024
From: joe at celo.io (Joe)
Date: Sun, 26 May 2024 13:10:32 +0200
Subject: [TUHS] Test, test
In-Reply-To: <ZV/usgsC9XuNDbZM@minnie.tuhs.org>
References: <ZV/usgsC9XuNDbZM@minnie.tuhs.org>
Message-ID: <d0ccde2d-6669-42be-aecd-526b2568461b@celo.io>

On 11/24/23 01:30, Warren Toomey via TUHS wrote:
> Just checking that the TUHS mailing list is still working.
> It's been awfully quiet!
> Cheers, Warren

I was just wondering the same now, last received mail was in November.

Testing ...

From ralph at inputplus.co.uk  Sun May 26 21:31:58 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sun, 26 May 2024 12:31:58 +0100
Subject: [TUHS] Yes, the list is working.  (Was: Test, test)
In-Reply-To: <d0ccde2d-6669-42be-aecd-526b2568461b@celo.io>
References: <ZV/usgsC9XuNDbZM@minnie.tuhs.org>
 <d0ccde2d-6669-42be-aecd-526b2568461b@celo.io>
Message-ID: <20240526113158.6B1601FB21@orac.inputplus.co.uk>

Hi Joe,

> > Just checking that the TUHS mailing list is still working.  It's
> > been awfully quiet!
>
> I was just wondering the same now, last received mail was in November.

Yes, the list is working fine.  If you look at an email from the list,
its header will have list-* fields with useful content which includes

    List-Archive: <https://www.tuhs.org/mailman3/hyperkitty/list/tuhs at tuhs.org/>

That will show if there are emails reaching the list's software which
you aren't receiving.

-- 
Cheers, Ralph.

From ralph at inputplus.co.uk  Mon May 27 19:39:09 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Mon, 27 May 2024 10:39:09 +0100
Subject: [TUHS] Testing an RE recogniser exhaustively.  (Was: A fuzzy awk)
In-Reply-To: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
Message-ID: <20240527093909.91CAD21F18@orac.inputplus.co.uk>

Hi,

Doug wrote:
> The trick: From recursive equations (easily derived from the grammar
> of REs), I counted how many REs exist up to various limits on token
> counts,  Then I generated all strings that satisfied those limits,
> turned the recognizer loose on them and counted how many it accepted.

Which reminded me of Doug's paper.

    Enumerating the strings of regular languages,
    J. Functional Programming 14 (2004) 503-518

    Haskell code is developed for two ways to list the strings of the
    language defined by a regular expression: directly by set operations
    and indirectly by converting to and simulating an equivalent
    automaton.  The exercise illustrates techniques for dealing with
    infinite ordered domains and leads to an effective standard form for
    nondeterministic finite automata.

    PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf

It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions.  Doug writes:

    The even-a language (ab*a|b)* is defined by automaton h, with three
    start states.

	h0 = State 0 ’~’ []
	h1 = State 1 ’b’ [h4,h1,h0]
	h2 = State 2 ’a’ [h4,h1,h0]
	h3 = State 3 ’b’ [h2,h3]
	h4 = State 4 ’a’ [h2,h3]
	h = [h4,h1,h0]

The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.

-- 
Cheers, Ralph.

From hellwig.geisse at mni.thm.de  Mon May 27 23:03:09 2024
From: hellwig.geisse at mni.thm.de (Hellwig Geisse)
Date: Mon, 27 May 2024 15:03:09 +0200
Subject: [TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk)
In-Reply-To: <20240527093909.91CAD21F18@orac.inputplus.co.uk>
References: <CAKH6PiU8t3HoTXkeY3MFV02Sy6p4M1k2KChcbbga37_SX5pYHg@mail.gmail.com>
 <20240527093909.91CAD21F18@orac.inputplus.co.uk>
Message-ID: <2fa56390518c73b7f8e4563b5bae0fc48e374b03.camel@mni.thm.de>

Hi,

On Mon, 2024-05-27 at 10:39 +0100, Ralph Corderoy wrote:
> 
> Which reminded me of Doug's paper.
> 
>     Enumerating the strings of regular languages,
>     J. Functional Programming 14 (2004) 503-518
> 

Thanks for the pointer. That's a nice paper,
turned into an equally nice testing method.

Hellwig

From tuhs at tuhs.org  Tue May 28 03:37:40 2024
From: tuhs at tuhs.org (segaloco via TUHS)
Date: Mon, 27 May 2024 17:37:40 +0000
Subject: [TUHS] Did UNIX Ever Touch SPC-SWAP, EPL,
 or EPLX (1A Languages)?
In-Reply-To: <CALpTLGp6mZTypVMt+LTTumeaLg7aQxcjt--xLdizB9x7z3vYYw@mail.gmail.com>
References: <xv5Yh_-qaISR1NMcDZxT-u0eS7LWSr7Im5AlapgURum52oDIRq07Hu4S77V7PlsWb6URG1Mzxz4OLl4ILZq6J48F3pfCDcR4FRE1GwBKy8s=@protonmail.com>
 <CALpTLGp6mZTypVMt+LTTumeaLg7aQxcjt--xLdizB9x7z3vYYw@mail.gmail.com>
Message-ID: <bFoYmPgli_NWhQtmA9jQls09p51XTwVFxxpypRrr2-uN-23_JMlaFZl_unDwNU1tyr44SbMdlTttkWh99AOcvZcusiLDime1_Ho3ZkCw78I=@protonmail.com>

On Saturday, May 25th, 2024 at 3:28 PM, Alan Glasser <alanglasser at gmail.com> wrote:

> Matt,
> First, sorry for the delayed response.
> 
> In around 1994 through late 1996 I worked on the FlashPort project in Bell Labs.
> A significant project that we completed was FlashPort'ing the 4ESS SWAP assembler from TSS/360 to Solaris.
> My memory is that the 4E team wanted to get off of TSS and onto Unix.
> 
> Alan
> 
> https://techmonitor.ai/technology/emulator_house_echo_logic_folded_back_into_att
> 
> 
> On Fri, Apr 5, 2024 at 12:59 AM segaloco via TUHS <tuhs at tuhs.org> wrote:
> 
> > So I've been doing a bit of reading on 1A and 4ESS technologies lately, getting
> > a feel for the state of things just prior to 3B and 5ESS popping onto the scene,
> > and came across some BSTJ references to the programming environments involved
> > in the 4ESS and TSPS No. 1 systems.
> > 
> > The general assembly system targeting the 1A machine language was known as
> > SPC-SWAP (SWitching Assembly Program)[1](p. 206) and ran under OS/360/370, with
> > editing typically performed in QED. This then gave way to the EPL (ESS
> > Programming Language) and ultimately EPLX (EPL eXtra)[2](p. 1)[3](p. 8)
> > languages which, among other things, were used for later 4ESS work with cross-
> > compilers for at least TSS/360 by the sounds of it.
> > 
> > Are there any recollections of attempts by the Bell System to rebase any of
> > these 1A-targeting environments into UNIX, or by the time UNIX was being
> > considered more broadly for Bell System projects, was 3B/5ESS technology well on
> > the way, rendering attempting to move entrenched IBM-based environments for the
> > older switching computation systems moot?
> > 
> > For the record, in addition to the evolution of ESS to the 5ESS generation, a
> > revision of TSPS, 1B, was also introduced which was rebased on the 3B20D
> > processor and utilized the same 3B cross-compilation SGS under UNIX as other 3B-
> > targeted applications[4]. Interestingly, the paper on software development
> > in [4](p. 109) still makes reference to Programmer's Workbench as of 1982,
> > implying that nomenclature may have still been the norm at some Bell Labs sites
> > such as Naperville, Illinois, although I can't tell if they're referring to
> > PWB as in the branch of UNIX or the environment of make, sccs, etc.
> > 
> > Additionally, is anyone aware of surviving accessible specimens of SWAP
> > assembly, EPL, or EPLX code or literature beyond the BSTJ references and paper
> > referenced in the IEEE library below? Thanks for any insights!
> > 
> > - Matt G.
> > 
> > [1] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V58N06_197907_Part_1.pdf
> > [2] - https://ieeexplore.ieee.org/document/810323
> > [3] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V60N06_198107_Part_2.pdf
> > [4] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V62N03_198303_Part_3.pdf

Wow, FlashPort sounds like quite the endeavor!  It's funny, I've been considering something along those lines for attempting to port older console video games to computer, somewhere between emulation and a true port, essentially emulation where most of the actual translation of CPU operations has been done before-hand (AOT) rather than the common interpreter or dynacomp approaches (JIT).  Glad to see a sizeable example of that sort of thing being used.

Now if only Nokia would take a walk through the archives and see if any of this stuff still exists...

- Matt G.

From mah at mhorton.net  Tue May 28 04:31:17 2024
From: mah at mhorton.net (Mary Ann Horton)
Date: Mon, 27 May 2024 11:31:17 -0700
Subject: [TUHS] Was curses ported to Seventh Edition Unix?
In-Reply-To: <y2ay17x9356.fsf@offog.org>
References: <CAKH6PiVnhwjuRZWxX-Gc0AU4fSaraNXUSPAnXX=taeSfVy6OyA@mail.gmail.com>
 <CAFH29trJ7V3UZRUKmsEqC6_9VgmxKShmsC5oNit_tvw5EJt8uw@mail.gmail.com>
 <CAC20D2Mf_Q+w_XpszoQrAcv_QmZQFFexNGDAGfOBPVMJuFOu4A@mail.gmail.com>
 <20240525155737.bwmngdyf4qnj4avv@illithid>
 <CAC20D2N0xZCr_Fwe8Z77gELa3D59ROqg9LFUuZWOkhvxrjPJYQ@mail.gmail.com>
 <CAC20D2Nmqeb1_jfWQVko-vKVpXQ2yDDz00qDiKEt-=qxaoK+FQ@mail.gmail.com>
 <y2ay17x9356.fsf@offog.org>
Message-ID: <78be4696-e743-4231-9c6a-32b6edd92f09@mhorton.net>

Adam, thank you for finding this and setting the record straight.

AT&T management had nothing to do with it. I self-censored because 
AT&T's policy was that anything I wrote belonged to my employer.

Pavel graciously offered to clone my work, and I slipped him the spec 
and the algorithm for the new improved curses. His version was FOSS and 
became the de facto standard everywhere except AT&T, where it wound up 
in System V Release 4 / Solaris.

Thanks,

/Mary Ann Horton/ (she/her/ma'am)
       Award Winning Author
maryannhorton.com <https://maryannhorton.com>


On 5/25/24 11:07, Adam Sampson wrote:
> Clem Cole<clemc at ccc.com>  writes:
>
>> Pavel (with coaching from a few of us, including me], wrote a new
>> implementation of terminfo.  When he was added it, he combined a
>> rewrite of curses.
>  From the utzoo Usenet archive...
>
> --start--
>
> From: utzoo!decvax!harpo!floyd!vax135!cornell!pavel
> Newsgroups: net.general
> Title: New Curses/Terminfo Package
> Article-I.D.: cornell.3348
> Posted: Sat Jul 10 15:10:14 1982
> Received: Sun Jul 11 03:55:13 1982
>
> At this past week's USENIX meeting, Mark Horton announced the completion
> of a replacement database/interface for the Berkeley 'termcap' setup.  The
> new version is called 'terminfo' and has several advantages over termcap:
> 	- The database is compiled and therefore start-up time for
> 	  programs using the package is considerably reduced, even
> 	  faster than reading a single-entry termcap database.
> 	- The database is more human-readable and flexible.
> 	- Many more terminals can be supported due to the addition
> 	  of several new capabilities, generalised parameter
> 	  mechanisms (enabling the full use of, for example, the ANSI
> 	  cursor-forward capability by allowing you to say 'move forward
> 	  35 spaces' as opposed to 'move forward' 35 times), a fully
> 	  general yet efficient arithmetic mechanism which should allow
> 	  the use of \any/ bizarre cursor-addressing scheme which can
> 	  be computed, etc.
> 	- A \far/ better set of routines for accessing the database,
> 	  requiring, for example, only a single call to read in an
> 	  entire entry, making all of the terminal's capabilities fully
> 	  available to the calling program.  No more need for 'tgetent',
> 	  'tgetstr', etc.
> Conversion of existing programs from termcap to terminfo is very easy and
> usually consists mostly of throwing out all of the garbage needed to read
> and store a termcap entry.
>
> As a companion to the change to terminfo, Mark has also completed work on
> a re-vamped version of the Curses screen-handling library package.  The new
> version has many, many advantages over the previous version, some of which
> are listed below:
> 	- New curses can use insert/delete line/character capabilities
> 	  in terminals which have them, considerably speeding up many
> 	  applications
> 	- It is possible to use the new curses on more than one type of
> 	  terminal at once
> 	- All of the video attributes of a terminal (e.g. reverse video,
> 	  boldface, blinking, etc.) can be used, in tandem if possible
> 	- New curses handles terminals like the Televideos with the
> 	  so-called 'magic cookie' glitch which leaves markers on the
> 	  screen for each change of video attributes
> 	- The arrow and function keys of terminals can be input just as
> 	  though they were single characters, even on terminals which use
> 	  multi-character sequences for these functions.  The new curses
> 	  does all necessary interpretation, passing back to the program
> 	  only a defined constant telling which key was pressed.
> 	- There is a user-accessable scrolling region
> 	- The use of shell escapes and the csh ^Z job control feature is
> 	  supported more fully
> 	- On systems which can support the notion, updates of the screen
> 	  will abort if a character is typed at the keyboard, thus allowing
> 	  the application to possibly avoid useless output
> 	- It should now be possible for most programs to be written very
> 	  portably to run on most versions of UNIX, including System III,
> 	  Berkeley UNIX, V7, Bell Labs internal UNIX, etc.  This portability
> 	  extends to the use of most terminal modes, such as raw mode,
> 	  echoing, etc.
>
> Now for the bad news.  Mark, being an employee of Bell Labs, cannot release
> any of his code.  Estimates currently run as high as 18 months for a Bell
> release.  Even then, nothing could be guaranteed as to its price.  As a result,
> I have decided to do a public-domain implementation of both terminfo and the
> new curses.  They will be compatible with Mark's versions.  I have arranged
> for the library/database to be distributed with the next Berkeley Software
> Distribution, 4.2BSD, in December of this year.  It will also be made available
> for free to any requestor.  I agree with Mark when he says that terminfo is
> clearly superior to termcap and deserves to be made a new and lasting standard.
>
> I expect to be able to begin recruiting test sites for both curses and terminfo
> by the end of September.
>
> If you have any questions, comments or suggestions, please send them to me, not
> the network.
>
> 	Pavel Curtis
> 	{decvax,allegra,vax135,harpo,...}!cornell!pavel
> 	Pavel.Cornell at Udel-Relay
>
> --end--
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240527/ccd95e30/attachment.htm>

From web at loomcom.com  Wed May 29 06:37:37 2024
From: web at loomcom.com (Seth Morabito)
Date: Tue, 28 May 2024 13:37:37 -0700
Subject: [TUHS] IN/ix
Message-ID: <023b2172-d8f8-456a-91ce-071d95f6b921@app.fastmail.com>

A few years ago, someone -- and I've forgotten who, forgive me -- kindly gave me a copy of the source code for a UNIX for the AT&T PC6300 called IN/ix, developed by INTERACTIVE Systems. I have found precious little about this system online. Apparently the PC/ix UNIX for the IBM PC XT is fairly well preserved, but I can't find much about IN/ix.

For what it's worth, the login herald in the source code reads:

"IN/ix Office System (c) Copyright INTERACTIVE Systems Corp. 1983, 1988"

Presumably this was PC/ix, but targeting the AT&T 6300? Does anyone have any more knowledge of IN/ix?

If you're interested in digging into it yourself, I've dropped the source here:

https://archives.loomcom.com/pc6300/

(N.B.: All the files inside the zip are compressed, that's just how I got it)

-Seth
-- 
  Seth Morabito * Poulsbo, WA * https://loomcom.com/

From e5655f30a07f at ewoof.net  Wed May 29 21:57:50 2024
From: e5655f30a07f at ewoof.net (Michael =?utf-8?B?S2rDtnJsaW5n?=)
Date: Wed, 29 May 2024 11:57:50 +0000
Subject: [TUHS] OS and vendor identification
Message-ID: <dc8a9c4d-f01f-47c6-b4df-905b941af755@home.arpa>

I spotted this elsewhere and thought that maybe someone here might be
able to contribute.

https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

-- 
Michael Kjörling                     🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”


From lars at nocrew.org  Thu May 30 03:22:56 2024
From: lars at nocrew.org (Lars Brinkhoff)
Date: Wed, 29 May 2024 17:22:56 +0000
Subject: [TUHS] OS and vendor identification
In-Reply-To: <dc8a9c4d-f01f-47c6-b4df-905b941af755@home.arpa> ("Michael
 \=\?utf-8\?Q\?Kj\=C3\=B6rling\=22's\?\= message of "Wed, 29 May 2024 11:57:50
 +0000")
References: <dc8a9c4d-f01f-47c6-b4df-905b941af755@home.arpa>
Message-ID: <7wed9kjzwv.fsf@junk.nocrew.org>

Michael Kjörling wrote:
> I spotted this elsewhere and thought that maybe someone here might be
> able to contribute.
> https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

Chances are you will find something on Bitsavers:
https://google.com/search?q=%22triton%22+%22unix%22+site%3Abitsavers.org

From crossd at gmail.com  Thu May 30 03:31:10 2024
From: crossd at gmail.com (Dan Cross)
Date: Wed, 29 May 2024 13:31:10 -0400
Subject: [TUHS] OS and vendor identification
In-Reply-To: <dc8a9c4d-f01f-47c6-b4df-905b941af755@home.arpa>
References: <dc8a9c4d-f01f-47c6-b4df-905b941af755@home.arpa>
Message-ID: <CAEoi9W4uJStSLpnt9sggNJM5D8kpZHwzq0r-tW4-6A28itSmHw@mail.gmail.com>

On Wed, May 29, 2024 at 8:07 AM Michael Kjörling <e5655f30a07f at ewoof.net> wrote:
> I spotted this elsewhere and thought that maybe someone here might be
> able to contribute.
>
> https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

ACIS/AOS in that listing almost surely refers to the "ACademic
Information System" / "Academic Operating System", which was IBM's
port of 4.3BSD-Tahoe+NFS to the RT (e.g., 6150/6151/6152) using the
ROMP processor sold to universities. These were used in Project
Athena, I believe; Ted can say more about that than I can.

I'd send this to Zach directly, but I don't have his email address.

          - Dan C.

From pnr at planet.nl  Fri May 31 22:00:55 2024
From: pnr at planet.nl (Paul Ruizendaal)
Date: Fri, 31 May 2024 14:00:55 +0200
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
Message-ID: <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>

I’m further looking into BCPL / B / C family compilers on 16-bit mini-computers prior to 1979.

Lot’s of interesting stuff. BCPL was extended with structures at least twice and plenty struggle with (un)scaled pointers. It seems that the Nova was a much easier target than the PDP-11, with a simpler code generator sufficing to generate quality code. I’ll report more fully when I’m further along with my review.

> On May 8, 2024, at 5:51 PM, Clem Cole <clemc at ccc.com> wrote:
> 
> IIRC, Mike Malcom and the team built a true B compiler so they could develop Thoth.   As the 11/40 was one of the original Thoth target systems,  I would have expected that to exist, but I have never used it.

Yes, they did. I’m working through the various papers on Thoth and the Eh / Zed compilers (essentially B with tweaks). I’ve requested pdf’s of two theses that are only on micro-fiche from the Uni of Waterloo library, hopefully this is possible. The original target machines were Honeywell 6060, DG Nova, Microdata 1600/30 and TI-990. The latter is close enough to a PDP-11. This compiler is from 1976.

I’ve browsed around for surviving Thoth source code, but it would seem to be lost. Does anyone know of surviving Thoth bits?


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.tuhs.org/pipermail/tuhs/attachments/20240531/21655091/attachment.htm>

From peter.martin.yardley at gmail.com  Fri May 31 22:21:03 2024
From: peter.martin.yardley at gmail.com (Peter Yardley)
Date: Fri, 31 May 2024 22:21:03 +1000
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <CAKzdPgx855c5a-AirEmCvi4+ryrV8=R2z-gOyUQRAGWziACBQg@mail.gmail.com>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <CAC20D2N6PqeBHw3vS=K-5O16do8qe7cZ6p6NE8=tq67k45sBrg@mail.gmail.com>
 <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>
Message-ID: <601CEA28-C64A-4BF3-AC7A-245ED4E653EA@gmail.com>

I believe the Nova became a Mil Std instruction set (proven without hazard). Its architecture was pretty simple.

We sold ours to the Navy.

> On 31 May 2024, at 10:00 PM, Paul Ruizendaal <pnr at planet.nl> wrote:
> 
> I’m further looking into BCPL / B / C family compilers on 16-bit mini-computers prior to 1979.
> 
> Lot’s of interesting stuff. BCPL was extended with structures at least twice and plenty struggle with (un)scaled pointers. It seems that the Nova was a much easier target than the PDP-11, with a simpler code generator sufficing to generate quality code. I’ll report more fully when I’m further along with my review.
> 
>> On May 8, 2024, at 5:51 PM, Clem Cole <clemc at ccc.com> wrote:
>> 
>> IIRC, Mike Malcom and the team built a true B compiler so they could develop Thoth.   As the 11/40 was one of the original Thoth target systems,  I would have expected that to exist, but I have never used it.
> 
> Yes, they did. I’m working through the various papers on Thoth and the Eh / Zed compilers (essentially B with tweaks). I’ve requested pdf’s of two theses that are only on micro-fiche from the Uni of Waterloo library, hopefully this is possible. The original target machines were Honeywell 6060, DG Nova, Microdata 1600/30 and TI-990. The latter is close enough to a PDP-11. This compiler is from 1976.
> 
> I’ve browsed around for surviving Thoth source code, but it would seem to be lost. Does anyone know of surviving Thoth bits?
> 
> 

Peter Yardley
peter.martin.yardley at gmail.com