From pnr at planet.nl Wed May 8 06:59:17 2024 From: pnr at planet.nl (Paul Ruizendaal) Date: Tue, 7 May 2024 21:59:17 +0100 (GMT+01:00) Subject: [TUHS] On the uniqueness of DMR's C compiler Message-ID: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> In the last few months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it: https://gitlab.com/marinchip After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s. This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects: 1. The C language itself 2. The ability to run natively on small hardware (even an LSI-11 system) 3. Generating code with modest overhead versus handwritten assembler (say 30%) As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture). There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era: https://www.bell-labs.com/usr/dmr/www/primevalC.html https://www.bell-labs.com/usr/dmr/www/chist.html https://www.bell-labs.com/usr/dmr/www/hopl.html It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine. As far as I can tell, the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less the same techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine, or poor quality native code that evaluated expressions using stack operations rather than registers. I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.
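To make that contrast a bit more concrete, here is a toy sketch in modern C of the general idea -- emphatically not Ritchie's code, and the node kinds, instruction templates and register policy are all invented for illustration -- showing how a small, table-driven matcher can walk an expression tree and emit register-targeted code directly:

/*
 * Illustrative sketch only -- not Ritchie's c1.  A toy, table-driven
 * expression code generator: match the shape of a tree node, emit a
 * canned instruction template, allocate registers while recursing.
 * Node kinds, templates and register policy are all invented here.
 */
#include <stdio.h>

enum kind { CONST, NAME, ADD, SUB };

struct node {
    enum kind kind;
    int val;              /* CONST: value        */
    const char *sym;      /* NAME: symbol name   */
    struct node *l, *r;   /* ADD/SUB: operands   */
};

/* "code table": one mnemonic per binary operator */
static const char *opname[] = { [ADD] = "add", [SUB] = "sub" };

/* format a leaf as a directly addressable operand, or return NULL */
static const char *leaf(const struct node *e, char *buf)
{
    if (e->kind == CONST) { sprintf(buf, "$%d", e->val); return buf; }
    if (e->kind == NAME)  return e->sym;
    return NULL;          /* not a leaf: needs its own register */
}

/* generate code for e, leaving the result in register r */
static void gen(const struct node *e, int r)
{
    char buf[32];
    const char *src;

    if ((src = leaf(e, buf)) != NULL) {        /* pattern: leaf       */
        printf("\tmov\t%s,r%d\n", src, r);
        return;
    }
    gen(e->l, r);                              /* left operand into r */
    if ((src = leaf(e->r, buf)) != NULL)       /* pattern: op src,reg */
        printf("\t%s\t%s,r%d\n", opname[e->kind], src, r);
    else {                                     /* general case: r+1   */
        gen(e->r, r + 1);
        printf("\t%s\tr%d,r%d\n", opname[e->kind], r + 1, r);
    }
}

int main(void)
{
    /* a + (b + 4) */
    struct node four = { CONST, 4 };
    struct node a = { NAME, 0, "a" };
    struct node b = { NAME, 0, "b" };
    struct node b4 = { ADD, 0, NULL, &b, &four };
    struct node e = { ADD, 0, NULL, &a, &b4 };

    gen(&e, 0);
    return 0;
}

For a + (b + 4) this emits four instructions into two registers, where a stack-oriented or threaded-code generator would typically push and pop every operand through memory. The real c1 of course did far more, and squeezed it into a small address space -- which is exactly the part that impresses me.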
Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s? -------------- next part -------------- An HTML attachment was scrubbed... URL: From robpike at gmail.com Wed May 8 08:07:44 2024 From: robpike at gmail.com (Rob Pike) Date: Wed, 8 May 2024 08:07:44 +1000 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> Message-ID: I'm not sure I accept your starting position. There were several compilers for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code, but I don't believe they all were. And of course there was BCPL, which was - and is - tiny; was it on the 11? And there were other small machines from other manufacturers, all of which had some form of Fortran and other bespoke things, such as RPG on the small IBMs. I think the uniqueness was in the set of conditions more than in the Unix C compiler itself. But you may be right. -rob On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal wrote: > In the last months, I've spent a little time on curating John Walker's > Unix clone and software stack, including an emulator to run it: > https://gitlab.com/marinchip > > After creating a basic tool chain (edit, asm, link and a simple > executive), John set out to find a compiler. Among the first programs were > a port of the META 3 compiler-generator (similar to TMG on early Unix) and > a port of Birch-Hansen’s Pascal compiler. META was used to create a > compiler that generated threaded code. He found neither compiler good > enough for his goals and settled on writing his Unix-like OS in assembler. > As the 9900 architecture withered after 1980, this sealed the fate of this > OS early on -- had he found a good compiler, the code might have competed > alongside Coherent, Idris, and Minix during the 80’s. > > This made me realise once more how unique the Ritchie C compiler was. In > my view its uniqueness combines three aspects: > 1. The C language itself > 2. The ability to run natively on small hardware (even an LSI-11 system) > 3. Generating code with modest overhead versus handwritten assembler (say > 30%) > > As has been observed before, working at a higher abstraction level makes > it easier to work on algorithms and on refactoring, often earning back the > efficiency loss. John Walkers work may be case in point: I estimate that > his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as > compiled for the 9900 architecture). > > There are three papers on DMR’s website about the history of the compiler > and a compare-and-contrast with other compilers of the era: > https://www.bell-labs.com/usr/dmr/www/primevalC.html > https://www.bell-labs.com/usr/dmr/www/chist.html > https://www.bell-labs.com/usr/dmr/www/hopl.html > > It seems to me that these papers rather understate the importance of > generating good quality code. As far as I can tell, BCPL and BLISS came > close, but were too large to run on a PDP-11 and only existed as > cross-compilers. PL/M was a cross-compiler and generated poorer code. > Pascal on small machines compiled to a virtual machine. As far as I can > tell, during most of the 70s there was no other compiler that generated > good quality code and ran natively on a small (i.e. PDP-11 class) machine. > > As far as I can tell the uniqueness was mostly in the “c1” phase of the > compiler. 
The front-end code of the “c0” phase seems to use more or less > similar techniques as many contemporary compilers. The “c1” phase seems to > have been unique in that it managed to do register allocation and > instruction selection with a pattern matcher and associated code tables > squeezed into a small address space. On a small machine, other native > compilers of the era typically proceeded to generate threaded code, code > for a virtual machine or poor quality native code that evaluated > expressions using stack operations rather than registers. > > I am not sure why DMR's approach was not more widely used in the 1970’s. > The algorithms he used do not seem to be new and appear to have their roots > in other (larger) compilers of the 1960’s. The basic design seems to have > been in place from the very first iterations of his compiler in 1972 (see > V2 tree on TUHS) and he does not mention these algorithms as being special > or innovative in his later papers. > > Any observations / opinions on why DMR’s approach was not more widely used > in the 1970’s? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pnr at planet.nl Wed May 8 19:35:21 2024 From: pnr at planet.nl (Paul Ruizendaal) Date: Wed, 8 May 2024 10:35:21 +0100 (GMT+01:00) Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> Message-ID: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Thanks for pointing that out. Here's an interesting paper on the DEC PDP11 Fortran compilers: http://forum.6502.org/download/file.php?id=1724&sid=f6a721f3e05774cff076da72f5a731a6 Before 1975 they used direct threading, thereafter there was a native compiler for the higher-end models. I think this one may have required split i/d, but that is not entirely clear from the text. I think the same holds for BCPL on the PDP11: compiling to "ocode" or "intcode" in the early 70s, native thereafter -- still have to find source for the latter. Still, I should have first asked: Does anyone have pointers to small machine native compilers from the 1970's that produced efficient assembler output? I am already aware of the 1978 Whitesmith C compiler. 7 May 2024 23:07:58 Rob Pike : > I'm not sure I accept your starting position. There were several compilers for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code, but I don't believe they all were. And of course there was BCPL, which was - and is - tiny; was it on the 11? > > And there were other small machines from other manufacturers, all of which had some form of Fortran and other bespoke things, such as RPG on the small IBMs. I think the uniqueness was in the set of conditions more than in the Unix C compiler itself. > > But you may be right. > > -rob > > > > > On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal wrote: >> In the last months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it: >> https://gitlab.com/marinchip >> >> After creating a basic tool chain (edit, asm, link and a simple executive), John  set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Birch-Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. 
As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s. >> >> This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects: >> 1. The C language itself >> 2. The ability to run natively on small hardware (even an LSI-11 system) >> 3. Generating code with modest overhead versus handwritten assembler (say 30%) >> >> As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walkers work may be case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture). >> >> There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era: >> https://www.bell-labs.com/usr/dmr/www/primevalC.html >> https://www.bell-labs.com/usr/dmr/www/chist.html >> https://www.bell-labs.com/usr/dmr/www/hopl.html >> >> It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine. >> >> As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less similar techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor quality native code that evaluated expressions using stack operations rather than registers. >> >> I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers. >> >> Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s? -------------- next part -------------- An HTML attachment was scrubbed... URL: From e5655f30a07f at ewoof.net Wed May 8 21:09:29 2024 From: e5655f30a07f at ewoof.net (Michael =?utf-8?B?S2rDtnJsaW5n?=) Date: Wed, 8 May 2024 11:09:29 +0000 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> Message-ID: <33ca504d-5167-4796-a277-b9d2865b7fb1@home.arpa> On 7 May 2024 21:59 +0100, from pnr at planet.nl (Paul Ruizendaal): > It seems to me that these papers rather understate the importance of > generating good quality code. 
As far as I can tell, BCPL and BLISS > came close, but were too large to run on a PDP-11 and only existed > as cross-compilers. https://www.softwarepreservation.org/projects/BCPL/index.html#York appears to indicate that by 1974 there existed a native PDP-11 (/40 or /45) BCPL compiler which ran under RSX-11. -- Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?” From robpike at gmail.com Wed May 8 23:12:28 2024 From: robpike at gmail.com (Rob Pike) Date: Wed, 8 May 2024 23:12:28 +1000 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: I believe Ken Thompson might have been a referee for that paper. At least, he once mentioned to me that he had reviewed a paper about the threading in the DEC Fortran compiler. -rob On Wed, May 8, 2024 at 7:35 PM Paul Ruizendaal wrote: > Thanks for pointing that out. Here's an interesting paper on the DEC PDP11 > Fortran compilers: > > http://forum.6502.org/download/file.php?id=1724&sid=f6a721f3e05774cff076da72f5a731a6 > > Before 1975 they used direct threading, thereafter there was a native > compiler for the higher-end models. I think this one may have required > split i/d, but that is not entirely clear from the text. > > I think the same holds for BCPL on the PDP11: compiling to "ocode" or > "intcode" in the early 70s, native thereafter -- still have to find source > for the latter. > > Still, I should have first asked: Does anyone have pointers to small > machine native compilers from the 1970's that produced efficient assembler > output? > > I am already aware of the 1978 Whitesmith C compiler. > > 7 May 2024 23:07:58 Rob Pike : > > I'm not sure I accept your starting position. There were several compilers > for RT-11 and RSX/11-M. RSX (and perhaps RT) Fortran were threaded code, > but I don't believe they all were. And of course there was BCPL, which was > - and is - tiny; was it on the 11? > > And there were other small machines from other manufacturers, all of which > had some form of Fortran and other bespoke things, such as RPG on the small > IBMs. I think the uniqueness was in the set of conditions more than in the > Unix C compiler itself. > > But you may be right. > > -rob > > > > > On Wed, May 8, 2024 at 6:59 AM Paul Ruizendaal wrote: > >> In the last months, I've spent a little time on curating John Walker's >> Unix clone and software stack, including an emulator to run it: >> https://gitlab.com/marinchip >> >> After creating a basic tool chain (edit, asm, link and a simple >> executive), John set out to find a compiler. Among the first programs were >> a port of the META 3 compiler-generator (similar to TMG on early Unix) and >> a port of Birch-Hansen’s Pascal compiler. META was used to create a >> compiler that generated threaded code. He found neither compiler good >> enough for his goals and settled on writing his Unix-like OS in assembler. >> As the 9900 architecture withered after 1980, this sealed the fate of this >> OS early on -- had he found a good compiler, the code might have competed >> alongside Coherent, Idris, and Minix during the 80’s. >> >> This made me realise once more how unique the Ritchie C compiler was. In >> my view its uniqueness combines three aspects: >> 1. The C language itself >> 2. The ability to run natively on small hardware (even an LSI-11 system) >> 3. 
Generating code with modest overhead versus handwritten assembler (say >> 30%) >> >> As has been observed before, working at a higher abstraction level makes >> it easier to work on algorithms and on refactoring, often earning back the >> efficiency loss. John Walkers work may be case in point: I estimate that >> his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as >> compiled for the 9900 architecture). >> >> There are three papers on DMR’s website about the history of the compiler >> and a compare-and-contrast with other compilers of the era: >> https://www.bell-labs.com/usr/dmr/www/primevalC.html >> https://www.bell-labs.com/usr/dmr/www/chist.html >> https://www.bell-labs.com/usr/dmr/www/hopl.html >> >> It seems to me that these papers rather understate the importance of >> generating good quality code. As far as I can tell, BCPL and BLISS came >> close, but were too large to run on a PDP-11 and only existed as >> cross-compilers. PL/M was a cross-compiler and generated poorer code. >> Pascal on small machines compiled to a virtual machine. As far as I can >> tell, during most of the 70s there was no other compiler that generated >> good quality code and ran natively on a small (i.e. PDP-11 class) machine. >> >> As far as I can tell the uniqueness was mostly in the “c1” phase of the >> compiler. The front-end code of the “c0” phase seems to use more or less >> similar techniques as many contemporary compilers. The “c1” phase seems to >> have been unique in that it managed to do register allocation and >> instruction selection with a pattern matcher and associated code tables >> squeezed into a small address space. On a small machine, other native >> compilers of the era typically proceeded to generate threaded code, code >> for a virtual machine or poor quality native code that evaluated >> expressions using stack operations rather than registers. >> >> I am not sure why DMR's approach was not more widely used in the 1970’s. >> The algorithms he used do not seem to be new and appear to have their roots >> in other (larger) compilers of the 1960’s. The basic design seems to have >> been in place from the very first iterations of his compiler in 1972 (see >> V2 tree on TUHS) and he does not mention these algorithms as being special >> or innovative in his later papers. >> >> Any observations / opinions on why DMR’s approach was not more widely >> used in the 1970’s? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Thu May 9 01:51:11 2024 From: clemc at ccc.com (Clem Cole) Date: Wed, 8 May 2024 11:51:11 -0400 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: I agree with Rob. I fear the OP might have more limited experience with what was available at the time and how it was developed. The following is undoubtedly incomplete. It is what I could remember quickly to answer the question of real compilers for the PDP-11. As others have pointed out, the original DEC PDP-11 FTN, like the original PDP-6 and PDP-8, was based on threaded DEC F4 technology. After introducing the PDP-10, the 36-bit compiler team at DEC started a project to rewrite FORTRAN (in BLISS) as a true compiler. As was reminded at lunch last week (I still eat weekly with many of the DEC TLG folks), DEC had two groups -- a development team and a support team. 
I think some of the external confusion comes from both teams releasing products to the world, and the outside world did not always understand the differences. So, when I say the "compiler" group, I generally refer to the former - although many people started in the latter and eventually became part of the former. They key point here is that F4 (which was from the support folks), lived for a while in parallel with stuff coming from what eventually would become TLG [Technical Languages (and tools) Group]. The primary DEC-supported technical languages were all written in BLISS-11 and cross-compiled from the PDP-10 (originally). However, they could run in 11/40 class (shared I/D) space machines. Remember, DEC operating systems could do overlays - although there were probably some differences with what could be generated [I'd need to pull the old RT11 manuals for each]. Yes, FORTRAN was the primary technical language, but DEC's TLG supported other languages for the PDP-11 from COBOL to BASIC, and 3rd parties filled out the available suite. Probably the #1 3rd party, PDP-11 compiler, is (was) the OMSI Pascal compiler (which generated direct PDP-11 code) for all classes of PDP-11s [the OP referred to the Pascal that generated P4 code and ran interpreter for same. The UCSD Pascal worked this way, but I never saw anything other than students use it for teaching, while the OMSI compiler was a force for PDP-11 programmers, and you saw it in many PDP-11 shops - including some I worked]. I'm pretty sure the RT11 and RSX11 versions of this can be easily found in the wild, but I have not looked for the UNIX version (note that there was one). Note - from a SW marketplace for PDP-11s, the money was on the DEC operating systems, not UNIX. So, there was little incentive to move those tools, which I think is why the OP may not have experienced them. Another important political thing to consider is that TLG did their development on PDP-10s and later Vaxen inside DEC. Since everything was written in BLISS and DEC marketing 100% missed/sunk that boat, the concept of self-hosting the compiler was not taken seriously (ISTR: there was a project to make it self-host on RSX, but it was abandoned since customers were not beating DEC's door down for BLISS on many PDP-11 systems). Besides DMR's compiler for the PDP-11. Steve Johnson developed PCC and later PCC2. Both ran on all flavors of PDP-11s, although I believe since the lack of support for overlays in the research UNIX editions limited the compilers and ISTR, there were both 11/40 and 11/45 class binaries with different-sized tables. On our Unix boxes, we also had a PDP-11 Pascal compiler from Free University in Europe (VU) - I don't remember much about it nor can I find it in a quick search of my currently online stuff. ISTR That one may have been 11/45 class - we had it on the TekLabs 11/70 and I don't remember having in on any of our 40-class systems. The Whitesmith's C has been mentioned. That compiler ran on all the PDP-11 UNIXs of the day, plus its native Idris, as well as the DEC OSs. It did not use an interpreter per se, but instead compiled to something Plauger called 'ANAT" - a natural assembler. He then ran an optimizer over this output and his final pass converted from ANAT code to the PDP-11 (or Z80 as it turns out). I argue that ANAT was what we now think of in modern compilers as the IL, but others might argue differently. We ran it on our RT-11 systems, although ISTR came with the UNIX version, so we had it on the 11/70, too. 
That may have been because we used it to cross-compile for the Z80. Tanenbaum and the team have the Amsterdam compiler toolkit. This had front ends for C and Pascal and could generate code for PDP-11s and several other microprocessors. I do not know how widely it was used for the PDP11s. Per Brinch Hansen also implemented Parallel Pascal and his own OS for the 40-class PDP-11s. He talks about this in his book Pascal on Small Systems. Holt and team wrote Concurrent Euclid and TUNIS for the 40-class machines. Wirth released a Modula for the 11, although we mostly ran it on the 68000s and a Lilith system. IIRC, Mike Malcolm and the team built a true B compiler so they could develop Thoth. As the 11/40 was one of the original Thoth target systems, I would have expected that to exist, but I have never used it. As was mentioned before, there was BCPL for the PDP-11. I believe that a BCPL compiler can even be found on one of the USENIX tapes in the TUHS archives, but I have not looked. Finally, ISTR, in the mid-late 1970s one of the Universities in Europe (??Edinburgh, maybe??), developed and released an Algol flavor for the PDP-11, but I never used it. Again, you might want to check the TUHS archives. In my own case, while I had used Algol on the PDP-8s and 10s, plus the IBM systems, and by then Pascal had become the hot alternative language and was close enough I never had a desire/need for it. Plus since there were a number of Pascal implementations available for 11s and no one in Teklabs was asking for it, I never chased it down. To quote Tom Lehrer .. "*These are the only ones that the news has come to Huvrd. There may be many others ..*." Clem ᐧ ᐧ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nobozo at gmail.com Thu May 9 02:07:47 2024 From: nobozo at gmail.com (Jon Forrest) Date: Wed, 8 May 2024 09:07:47 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: There was also a Modula2 compiler for the PDP-11 from a university in the UK, probably York. It was used to some degree at Ford Aerospace for the KSOS secure Unix project. I think it required separate I&D. Jon From ats at offog.org Thu May 9 03:05:51 2024 From: ats at offog.org (Adam Sampson) Date: Wed, 08 May 2024 18:05:51 +0100 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: (Clem Cole's message of "Wed, 8 May 2024 11:51:11 -0400") References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: Clem Cole writes: > Finally, ISTR, in the mid-late 1970s one of the Universities in Europe > (??Edinburgh, maybe??), developed and released an Algol flavor for the > PDP-11, but I never used it. That sounds like Edinburgh's IMP, which eventually had backends for a very wide variety of platforms. Several versions are available here: https://history.dcs.ed.ac.uk/archive/languages/ -- Adam Sampson From aek at bitsavers.org Thu May 9 03:45:47 2024 From: aek at bitsavers.org (Al Kossow) Date: Wed, 8 May 2024 10:45:47 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org> On 5/8/24 8:51 AM, Clem Cole wrote: > IIRC, Mike Malcolm and the team built a true B compiler so they could develop Thoth. 
As the 11/40 was one of the original Thoth target > systems,  I would have expected that to exist, but I have never used it. > Thoth has been a white whale for me for decades. AFAIK nothing has survived from it. "Decus" (Conroy's) C (transliteration of the assembler Unix C) should also be mentioned. From tom.perrine+tuhs at gmail.com Thu May 9 03:49:02 2024 From: tom.perrine+tuhs at gmail.com (Tom Perrine) Date: Wed, 8 May 2024 10:49:02 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> Message-ID: Hi Jon (and others), I was part of the KSOS (later KSOS-11 and KSOS-32) team at LOGICON, which picked up a follow-on contract to use KSOS-11 in a true multi-level-secure production environment. Our target was SYSTEM_LOW as TOP SECRET. Yes, we used that compiler for all the KSOS kernel and all the trusted user-space code. KSOS-11 only ran on PDP-11/70, and it did use split I&D. I have access to the KSOS-11 source code, and have been trying to rebuild that OS, BUT I haven't been able to find that Modula compiler. KSOS-11 was a very small kernel, but there was a set of libraries that presented a UNIX system call interface, so it could run some PWB userspace tools, if they were re-compiled. I'm using the term KSOS-11, as there was a follow-on project (KSOS-32) that ported the original PDP KSOS to 11/780. I wrote a completely new (simpler) scheduler, the bootstrap and memory management layer for that one. And, for "reasons", the entire KSOS project at Logicon was shut down just a week or so after the first user login to KSOS-32. KSOS-11 itself and some multi-level applications did ship to DoD customers, and it ran MLS applications for the Navy and USAFE. --tep ps. Jon was kind enough to remind me that we had corresponded about this in the past -and- to remind me to send to the list, and not just him :-) On Wed, May 8, 2024 at 9:08 AM Jon Forrest wrote: > There was also a Modula2 compiler for the PDP-11 from a university in the > UK, > propably York. It was used to some degree at Ford Aerospace for the > KSOS secure Unix project. I think it required separate I&D. > > Jon > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Thu May 9 04:12:15 2024 From: clemc at ccc.com (Clem Cole) Date: Wed, 8 May 2024 14:12:15 -0400 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org> Message-ID: On Wed, May 8, 2024 at 1:46 PM Al Kossow wrote: > Thoth has been a white whale for me for decades. Ditto. Although, I believe the late John Beety had his 'Thoth Thucks" tee shirt for years. I believe Kelly Booth still does. > AFAIK nothing has survived from it. > You can argue that V-Kernel and QNX are children of Thoth - but they were both in a flavor of Waterloo C that did not think ever targeted the PDP-11 [that might be a misunderstanding WRT Waterloo C]. > > "Decus" (Conroy's) C (transliteration of the assembler Unix C) should also > be mentioned. > Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an end-around for the AT&T lawyers by taking the *.s files, and converting them to MACRO11, and then redoing the assembler code to use originally RT11 I/O and later RSX11. 
That said, it had its own life and ran on the DEC OSses, not UNIX, so it probably counts. That said, I thought Paul was asking about different core compiler implementations, and I would argue the DECUS/Conroy compiler is the DMR compiler, while the list I offered was all different core implementations. I'm curious about Jon and Tom's MOD2 compiler. Other than Wirth's, which targeted the 68000, Lilith, and VAX, I did not know of another for the PDP-11. Any idea of its origin story? I would have expected it to have derived from Wirth's Modula subsystem. FWIW: The DEC Mod-II and Mod-III were new implementations from DEC WRL or SRC (I forget). They targeted Alpha and I, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff Mogul who might know/remember, but I thought that the font end to the DEC MOD2 compiler might have been partly based on Wirths but rewritten and by the time of the MOD3 FE was a new one originally written using the previous MOD2 compiler -- but I don't remember that detail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Thu May 9 04:12:55 2024 From: clemc at ccc.com (Clem Cole) Date: Wed, 8 May 2024 14:12:55 -0400 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org> Message-ID: s/Beety/Beatty/ -- sorry ᐧ On Wed, May 8, 2024 at 2:12 PM Clem Cole wrote: > > > On Wed, May 8, 2024 at 1:46 PM Al Kossow wrote: > >> Thoth has been a white whale for me for decades. > > Ditto. Although, I believe the late John Beety had his 'Thoth Thucks" tee > shirt for years. I believe Kelly Booth still does. > > > >> AFAIK nothing has survived from it. >> > You can argue that V-Kernel and QNX are children of Thoth - but they were > both in a flavor of Waterloo C that did not think ever targeted the PDP-11 > [that might be a misunderstanding WRT Waterloo C]. > >> >> "Decus" (Conroy's) C (transliteration of the assembler Unix C) should >> also be mentioned. >> > Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an > end-around for the AT&T lawyers by taking the *.s files, and converting > them to MACRO11, and then > redoing the assembler code to use originally RT11 I/O and later RSX11. > That said, it had its own life and ran on the DEC OSses, not UNIX, so it > probably counts. > That said, I thought Paul was asking about different core compiler > implementations, and I would argue the DECUS/Conroy compiler is the DMR > compiler, while the list I offered was all different core implementations. > > I'm curious about Jon and Tom's MOD2 compiler. Other than Wirth's, which > targeted the 68000, Lilith, and VAX, I did not know of another for the > PDP-11. Any idea of its origin story? I would have expected it to have > derived from Wirth's Modula subsystem. FWIW: The DEC Mod-II and Mod-III > were new implementations from DEC WRL or SRC (I forget). They targeted > Alpha and I, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff > Mogul who might know/remember, but I thought that the font end to the DEC > MOD2 compiler might have been partly based on Wirths but rewritten and by > the time of the MOD3 FE was a new one originally written using the previous > MOD2 compiler -- but I don't remember that detail. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From douglas.mcilroy at dartmouth.edu Thu May 9 04:29:19 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Wed, 8 May 2024 14:29:19 -0400 Subject: [TUHS] On the uniqueness of DMR's C compiler Message-ID: There was nothing unique about the size or the object code of Dennis's C compiler. In the 1960s, Digitek had a thriving business of making Fortran compilers for all manner of machines. To optimize space usage, the compilers' internal memory model comprised variable-size movable tables, called "rolls". To exploit this non-native architecture, the compilers themselves were interpreted, although they generated native code. Bob McClure tells me he used one on an SDS910 that had 8K 16-bit words. Dennis was one-up on Digitek in having a self-maintaining compiler. Thus, when he implemented an optimization, the source would grow, but the compiler binary might even shrink thanks to self-application. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From stewart at serissa.com Thu May 9 11:27:41 2024 From: stewart at serissa.com (Lawrence Stewart) Date: Wed, 8 May 2024 21:27:41 -0400 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl> <517e03bf-09d2-9e5e-fe21-df17318d4080@bitsavers.org> Message-ID: <24A2FC48-8720-49B3-BF85-E53C9B09B32A@serissa.com> Regarding the Dec Research Modula-X compilers, I am fairly sure that Modula-2 for the VAX was a WRL (Digital Western Research Lab) thing, because it was used for the WRL CAD tool suite used to design the WRL Titan and the SRC (Systems Research Center) Firefly machines. SRC did the Modula-2-Plus compiler for the VAX, which added garbage collection. The Firefly OS was Modula, but included an Ultrix system call set so it could run Ultrix binaries. I may be wrong about this, but I think Wirth then did Modula-3 and then Oberon. WRL and SRC never had any PDP-11’s as far as I know. -L > On May 8, 2024, at 2:12 PM, Clem Cole wrote: > > > > On Wed, May 8, 2024 at 1:46 PM Al Kossow > wrote: >> Thoth has been a white whale for me for decades. > Ditto. Although, I believe the late John Beety had his 'Thoth Thucks" tee shirt for years. I believe Kelly Booth still does. > > >> AFAIK nothing has survived from it. > You can argue that V-Kernel and QNX are children of Thoth - but they were both in a flavor of Waterloo C that did not think ever targeted the PDP-11 [that might be a misunderstanding WRT Waterloo C]. >> >> "Decus" (Conroy's) C (transliteration of the assembler Unix C) should also be mentioned. > Hmmmm, it's a flavor of Dennis' compiler in disguise and was sort of an end-around for the AT&T lawyers by taking the *.s files, and converting them to MACRO11, and then > redoing the assembler code to use originally RT11 I/O and later RSX11. That said, it had its own life and ran on the DEC OSses, not UNIX, so it probably counts. > That said, I thought Paul was asking about different core compiler implementations, and I would argue the DECUS/Conroy compiler is the DMR compiler, while the list I offered was all different core implementations. > > I'm curious about Jon and Tom's MOD2 compiler. Other than Wirth's, which targeted the 68000, Lilith, and VAX, I did not know of another for the PDP-11. Any idea of its origin story? I would have expected it to have derived from Wirth's Modula subsystem. FWIW: The DEC Mod-II and Mod-III were new implementations from DEC WRL or SRC (I forget). 
They targeted Alpha and I, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff Mogul who might know/remember, but I thought that the font end to the DEC MOD2 compiler might have been partly based on Wirths but rewritten and by the time of the MOD3 FE was a new one originally written using the previous MOD2 compiler -- but I don't remember that detail. -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at mcjones.org Thu May 9 13:39:32 2024 From: paul at mcjones.org (Paul McJones) Date: Wed, 8 May 2024 20:39:32 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <171519201646.4052234.694570138790187562@minnie.tuhs.org> References: <171519201646.4052234.694570138790187562@minnie.tuhs.org> Message-ID: <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org> > On Wed, 8 May 2024 14:12:15 -0400,Clem Cole > wrote: > > FWIW: The DEC Mod-II and Mod-III > were new implementations from DEC WRL or SRC (I forget). They targeted > Alpha and I, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff > Mogul who might know/remember, but I thought that the font end to the DEC > MOD2 compiler might have been partly based on Wirths but rewritten and by > the time of the MOD3 FE was a new one originally written using the previous > MOD2 compiler -- but I don't remember that detail. Michael Powell at DEC WRL wrote a Modula 2 compiler that generated VAX code. Here’s an extract from announcement.d accompanying a 1992 release of the compiler from gatekeeper.dec.com : The compiler was designed and built by Michael L. Powell, and originally released in 1984. Joel McCormack sped the compiler up, fixed lots of bugs, and swiped/wrote a User's Manual. Len Lattanzi ported the compiler to the MIPS. Later, Paul Rovner and others at DEC SRC designed Modula-2+ (a language extension with exceptions, threads, garbage collection, and runtime type dispatch). The Modula-2+ compiler was originally based on Powell’s compiler. Modula-2+ ran on the VAX. Here’s a DEC SRC research report on Modula-2+: http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-3.pdf Modula-3 was designed at DEC SRC and Olivetti Labs. It had a portable implementation (using the GCC back end) and ran on a number of machines including Alpha. Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From imp at bsdimp.com Thu May 9 13:46:20 2024 From: imp at bsdimp.com (Warner Losh) Date: Wed, 8 May 2024 21:46:20 -0600 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org> References: <171519201646.4052234.694570138790187562@minnie.tuhs.org> <6CFD774F-F714-4AD0-A37E-E40013B8A281@mcjones.org> Message-ID: On Wed, May 8, 2024, 9:39 PM Paul McJones wrote: > On Wed, 8 May 2024 14:12:15 -0400,Clem Cole wrote: > > > FWIW: The DEC Mod-II and Mod-III > were new implementations from DEC WRL or SRC (I forget). They targeted > Alpha and I, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff > Mogul who might know/remember, but I thought that the font end to the DEC > MOD2 compiler might have been partly based on Wirths but rewritten and by > the time of the MOD3 FE was a new one originally written using the previous > MOD2 compiler -- but I don't remember that detail. > > > Michael Powell at DEC WRL wrote a Modula 2 compiler that generated VAX > code. 
Here’s an extract from announcement.d accompanying a 1992 release of > the compiler from gatekeeper.dec.com: > > The compiler was designed and built by Michael L. Powell, and originally > released in 1984. Joel McCormack sped the compiler up, fixed lots of > bugs, and > swiped/wrote a User's Manual. Len Lattanzi ported the compiler to the > MIPS. > > > Later, Paul Rovner and others at DEC SRC designed Modula-2+ (a language > extension with exceptions, threads, garbage collection, and runtime type > dispatch). The Modula-2+ compiler was originally based on Powell’s > compiler. Modula-2+ ran on the VAX. > > Here’s a DEC SRC research report on Modula-2+: > http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-3.pdf > > Modula-3 was designed at DEC SRC and Olivetti Labs. It had a portable > implementation (using the GCC back end) and ran on a number of machines > including Alpha. > FreeBSD's cvsup was written using it. The threading made it possible to make maximum use of the 56k modems of the time and speed downloads of the source changes. The port for modula-3 changed a number of times from gcc to egcs and back to gcc before running out of steam... Warner -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Fri May 10 06:40:28 2024 From: tuhs at tuhs.org (Paul Ruizendaal via TUHS) Date: Thu, 9 May 2024 22:40:28 +0200 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> Message-ID: Thanks everybody for the feedback and pointers, much appreciated! The main point is clear: the premise that the DMR C compiler had unique (native, small machine) code generation during most of the 70’s does not hold up. Clem Cole is correct in observing that (certainly for the 70’s) I’m skewed to stuff from academia with a blind spot for the commercial compilers of that era. Doug McIlroy’s remarks on Digitek were most helpful and I’ll expand a bit on that below. I was aware of the Digitek / Ryan-Macfarland compilers before, but in my mind they compiled to a virtual machine (misunderstanding a description of “programmed operators” and because their compilers for microcomputers did so in the 80’s). Digging into this more led me to a 1970 report “Programming Languages and their Compilers, Preliminary Notes” by John Cocke and J.T. Schwartz: https://www.softwarepreservation.org/projects/FORTRAN/paper/Bright-FORTRANComesToWestinghouseBettis-1971.pdf It is a nearly 800-page review of then-current languages and compilers and it includes some discussion of the Digitek compilers as the state of the art for small machines and has some further description of how they worked (pp. 233-237, 749). It also mentions their PL/1 for Multics fiasco (for background https://www.multicians.org/pl1.html). - The Digitek compilers were indeed small enough to run on PDP-11 class machines and even smaller, and they produced quite reasonable native code. In this sense, they were in the same spot as the DMR C compiler, which was hence not unique in this regard -- as Doug points out. - They consisted of two parts: a front end coded in “Programmed Operators” (POPS) generating an intermediate language, and a custom coded back-end that converted the IL to native code. - POPS were in effect a VM for compiler construction (although expressed as assembler operations). To move a compiler to a new machine only the POPS VM had to be recoded, which was a very manageable job.
From the description in the above book it sounds very similar to the META 3 compiler generator setup, but expressed in a different form. - Unfortunately, I have not been able to find a description of the POPS IL. - The smaller Digitek compilers had a limited level of optimisations, carried out at the code generation phase. The optimisations described sound quite similar to what the DMR C compiler did in its c1 phase (special casing +1 and -1, combining constants, mul/div to shift, etc.) - Code generation seems to have been through code snippets for each IL operation, selecting from one of 3 addressing modes: register, memory and indexed; the text isn’t quite clear. It sounds reasonable for small machines in the 60’s. - The later Ryan-MacFarland microcomputer compilers seem to have used the same POPS based front-end technology, but using an interpreter to execute the IL directly. Interestingly, the above book has a final chapter about “the self-compiling compiler”. To quote: “The scheme to be described is one which has often been considered, and in some cases even implemented. It involves the use of a compiler written in its own language, and capable therefore of compiling itself over to a new machine.” It proceeds to describe such a compiler in quite some detail, including using a table driven code generator. Seen through this lens, the DMR C compiler could be viewed as a re-imagining of the Digitek small system compilers using a self-compiling lexer/parser instead of POPS (or TMG or META) and a (also self-compiling) code generator evolved to handle the richer PDP-11 addressing modes. The concept seems to have been in the air at that time. Now I am left wondering why the IL-to-native back-ends were not more used in academic small machine compilers in the 70’s -- but this too may be the result of a skewed view on my part. From aek at bitsavers.org Fri May 10 06:57:18 2024 From: aek at bitsavers.org (Al Kossow) Date: Thu, 9 May 2024 13:57:18 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl> Message-ID: On 5/9/24 1:40 PM, Paul Ruizendaal via TUHS wrote: > - Unfortunately, I have not been able to find a description of the POPS IL. I went down that rabbit hole researching why Don Knuth had a high opinion of it. If you dig around, the IL is described in the SDS Fortran documentation http://bitsavers.org/pdf/sds/9xx/lang/900883A_9300_FORTRAN_IV_Tech_Aug65.pdf and http://bitsavers.org/pdf/digitek/Data_Structures_in_Digiteks_FORTRAN_IV_Compiler_for_the_SDS_900_Series.pdf and a compiler listing https://archive.computerhistory.org/resources/text/Knuth_Don_X4100/PDF_index/k-1-pdf/k-1-C1051-Digitek-FORTRAN.pdf From davida at pobox.com Fri May 10 16:15:52 2024 From: davida at pobox.com (David Arnold) Date: Fri, 10 May 2024 16:15:52 +1000 Subject: [TUHS] nl section delimiters Message-ID: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to delimit header, body, and trailer sections within its input. I wondered if anyone was able to shed light on the reason those were adopted as the defaults? I would have expected perhaps something compatible with *roff (like, .\” something). FreeBSD claims nl first appeared in System III (although it previously claimed SVR2), but I haven’t dug into the implementation any further. 
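In case it helps frame the question, here is a rough sketch in C of what those default delimiters do (a toy of my own, not the actual nl source; the padding and the many numbering options are simplified away): a "\:\:\:" line starts a new logical page's header and resets the counter, "\:\:" switches to the body, "\:" to the trailer, the delimiter lines themselves come out empty, and by default only non-empty body lines get numbers.

/*
 * Toy sketch of nl(1)'s default section handling -- not the real
 * implementation.  Width, separator and per-section numbering styles
 * are hard-wired to the documented defaults; everything else omitted.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    enum { HEADER, BODY, TRAILER } sect = BODY;
    int num = 1;

    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (strcmp(line, "\\:\\:\\:") == 0) {       /* new page: header */
            sect = HEADER; num = 1; puts("");
        } else if (strcmp(line, "\\:\\:") == 0) {   /* body section     */
            sect = BODY; puts("");
        } else if (strcmp(line, "\\:") == 0) {      /* trailer section  */
            sect = TRAILER; puts("");
        } else if (sect == BODY && line[0] != '\0') /* numbered         */
            printf("%6d\t%s\n", num++, line);
        else                                        /* unnumbered       */
            printf("       %s\n", line);
    }
    return 0;
}

Why the defaults are those particular backslash-colon runs, rather than something roff-ish, is exactly what I'm curious about.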
Thanks in advance, d From robpike at gmail.com Fri May 10 20:08:33 2024 From: robpike at gmail.com (Rob Pike) Date: Fri, 10 May 2024 20:08:33 +1000 Subject: [TUHS] nl section delimiters In-Reply-To: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> Message-ID: Didn't recognize the command, looked it up. Sigh. pr -tn seems sufficient for me, but then that raises the question of your question. I've been developing a theory about how the existence of something leads to things being added to it that you didn't need at all and only thought of when the original thing was created. Bloat by example, if you will. I suspect it will not be a popular theory, however accurately it may describe the technological world. -rob On Fri, May 10, 2024 at 4:16 PM David Arnold wrote: > nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to > delimit header, body, and trailer sections within its input. > > I wondered if anyone was able to shed light on the reason those were > adopted as the defaults? > > I would have expected perhaps something compatible with *roff (like, .\” > something). > > FreeBSD claims nl first appeared in System III (although it previously > claimed SVR2), but I haven’t dug into the implementation any further. > > Thanks in advance, > > > > d > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Sat May 11 02:05:15 2024 From: tuhs at tuhs.org (segaloco via TUHS) Date: Fri, 10 May 2024 16:05:15 +0000 Subject: [TUHS] nl section delimiters In-Reply-To: References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> Message-ID: On Friday, May 10th, 2024 at 3:08 AM, Rob Pike wrote: > Didn't recognize the command, looked it up. Sigh. > > pr -tn > > seems sufficient for me, but then that raises the question of your question. > > I've been developing a theory about how the existence of something leads to things being added to it that you didn't need at all and only thought of when the original thing was created. Bloat by example, if you will. I suspect it will not be a popular theory, however accurately it may describe the technological world. > > -rob > > > On Fri, May 10, 2024 at 4:16 PM David Arnold wrote: > > > nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to delimit header, body, and trailer sections within its input. > > > > I wondered if anyone was able to shed light on the reason those were adopted as the defaults? > > > > I would have expected perhaps something compatible with *roff (like, .\” something). > > > > FreeBSD claims nl first appeared in System III (although it previously claimed SVR2), but I haven’t dug into the implementation any further. > > > > Thanks in advance, > > > > > > > > d https://www.tuhs.org/pipermail/tuhs/2022-July/026197.html Here's an earlier thread on nl that doesn't answer your specific question on the sequences but may provide some background on nl(1). - Matt G. From clemc at ccc.com Sat May 11 02:36:43 2024 From: clemc at ccc.com (Clem Cole) Date: Fri, 10 May 2024 12:36:43 -0400 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools Message-ID: While the idea of small tools that do one job well is the core tenant of what I think of as the UNIX philosophy, this goes a bit beyond UNIX, so I have moved this discussion to COFF and BCCing TUHS for now. The key is that not all "bloat" is the same (really)—or maybe one person's bloat is another person's preference. 
That said, NIH leads to pure bloat with little to recommend it, while multiple offerings are a choice. Maybe the difference between the two may be one person's view over another. On Fri, May 10, 2024 at 6:08 AM Rob Pike wrote: > Didn't recognize the command, looked it up. Sigh. > Like Rob -- this was a new one for me, too. I looked, and it is on the SYS3 tape; see: https://www.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/man/man1/nl.1 > pr -tn > > seems sufficient for me, but then that raises the question of your > question. > Agreed, that has been burned into the ROMs in my fingers since the mid-1970s 😀 BTW: SYS3 has pr(1) with both switches too (more in a minute) > I've been developing a theory about how the existence of something leads > to things being added to it that you didn't need at all and only thought of > when the original thing was created. > That is a good point, and I generally agree with you. > Bloat by example, if you will. I suspect it will not be a popular theory, > however accurately it may describe the technological world. > Of course, sometimes the new features >>are<< easier (more natural *for some people*). And herein lies the core problem. The bloat is often repetitive, and I suggest that it is often implemented in the wrong place - and usually for the wrong reasons. Bloat comes about because somebody thinks they need some feature and probably doesn't understand that it is already there or how they can use it. But they do know about it, their tool must be set up to exploit it - so they do not need to reinvent it. GUI-based tools are notorious for this failure. Everyone seems to have a built-in (unique) editor, or a private way to set up configuration options et al. But ... that walled garden is comfortable for many users and >>can be<< useful sometimes. Long ago, UNIX programmers learned that looking for $EDITOR in the environment was way better than creating one. Configuration was as ASCII text, stored in /etc for system-wide and dot files in the home for users. But it also means the >>output<< of each tool needs to be usable by each other [*i.e.*, docx or xlx files are a no-no). For example, for many things on my Mac, I do use the GUI-based tools -- there is no doubt they are better integrated with the core Mac system >>for some tasks.<< But only if I obey a set of rules Apple decrees. For instance, this email read is easier much of the time than MH (or the HM front end, for that matter), which I used for probably 25-30 years. But on my Mac, I always have 4 or 5 iterm2(1) open running zsh(1) these days. And, much of my typing (and everything I do as a programmer) is done in the shell (including a simple text editor, not an 'IDE'). People who love IDEs swear by them -- I'm just not impressed - there is nothing they do for me that makes it easier, and I have learned yet another scheme. That said, sadly, Apple is forcing me to learn yet another debugger since none of the traditional UNIX-based ones still work on the M1-based systems. But at least LLDB is in the same key as sdb/dbx/gdb *et al*., so it is a PITA but not a huge thing as, in the end, LLDB is still based on the UNIX idea of a single well-designed and specific to the task tool, to do each job and can work with each other. FWIW: I was recently a tad gob-smacked by the core idea of UNIX and its tools, which I have taken for a fact since the 1970s. It turns out that I've been helping with the PiDP-10 users (all of the PiDPs are cool, BTW). Before I saw UNIX, I was paid to program a PDP-10. 
In fact, my first UNIX job was helping move programs from the 10 to the UNIX. Thus ... I had been thinking that doing a little PDP-10 hacking shouldn't be too hard to dust off some of that old knowledge. While some of it has, of course, come back. But daily, I am discovering small things that are so natural with a few simple tools can be hard on those systems. I am realizing (rediscovering) that the "build it into my tool" was the norm in those days. So instead of a pr(1) command, there was a tool that created output to the lineprinter. You give it a file, and it is its job to figure out what to do with it, so it has its set of features (switches) - so "bloat" is that each tool (like many current GUI tools) has private ways of doing things. If the maker of tool X decided to support some idea, they would do it like tool Y. The problem, of course, was that tools X and Y had to 'know about' each type of file (in IBM terms, use its "access method"). Yes, the engineers at DEC, in their wisdom, tried to "standardize" those access methods/switches/features >>if you implemented them<< -- but they are not all there. This leads me back to the question Rob raises. Years ago, I got into an argument with Dave Cutler RE: UNIX *vs.* VMS. Dave's #1 complaint about UNIX in those days was that it was not "standardized." Every program was different, and more to Dave's point, there was no attempt to make switches or errors the same [getopt(3) had been introduced but was not being used by most applications). He hated that tar/tp used "keys" and tools like cpio used switches. Dave hated that I/O was so simple - in his world all user programs should use his RMS access method of course [1]. VMS, TOPS, *etc.*, tried to maintain a system-wide error scheme, and users could look things like errors up in a system DB by error number, *etc*. Simply put, VMS is very "top-down." My point with Dave was that by being "bottom-up," the best ideas in UNIX were able to rise. And yes, it did mean some rough edges and repeated implementations of the same idea. But UNIX offered a choice, and while Rob and I like and find: pr -tn perfectly acceptable thank you, clearly someone else desired the features that nl provides. The folks that put together System 3 offer both solutions and let the user choose. This, of course, comes as bloat, but maybe that is a type of bloat so bad? My own thinking is this - get things down to the basics and simplest privatives and then build back up. It's okay to offer choices, as long as the foundation is simple and clean. To me, bloat becomes an issue when you do the same thing over and over again, particularly because you can not utilize what is there already, the worst example is NIH - which happens way more than it should. I think the kind of bloat that GUI tools and TOPS et al. created forces recreation, not reuse. But offering choice and the expense of multiple tools that do the same things strikes me as reasonable/probably a good thing. 1.] BTW: One of my favorite DEC stories WRT to VMS engineering has to do with the RMS I/O system. Supporting C using VMS was a bit of PITA. Eventually, the VMS engineers added Stream I/O - which simplified the C runtime, but it was also made available for all technical languages. Fairly soon after it was released, the DEC Marketing folks discovered almost all new programs, regardless of language, had started to use Stream I/O and many older programs were being rewritten by customers to use it. 
In fact, inside of DEC itself, the languages group eventually rewrote things like the FTN runtime to use streams, making it much smaller/easier to maintain. My line in the old days: "It's not so bad that every I/O call has to offer 1000 options, it's that Dave has to check each one on every I/O." It's a classic example of how you can easily build RMS I/O out of stream-based I/O, but the other way around is much harder. My point here is to *use the right primitives*. RMS may have made it easier to build RDB, but it impeded everything else.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From jpl.jpl at gmail.com Sat May 11 02:50:22 2024
From: jpl.jpl at gmail.com (John P. Linderman)
Date: Fri, 10 May 2024 12:50:22 -0400
Subject: [TUHS] nl section delimiters
In-Reply-To:
References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com>
Message-ID:

I'll accept Rob's theory. Instead of taking the time to go through the alphabet soup of options to nl and pr and ls, learning a tool like awk or perl or python makes implementing most of what these commands do (or what you wish they could do) a one-finger exercise. -- jpl

On Fri, May 10, 2024 at 6:09 AM Rob Pike wrote: > Didn't recognize the command, looked it up. Sigh. > > pr -tn > > seems sufficient for me, but then that raises the question of your > question. > > I've been developing a theory about how the existence of something leads > to things being added to it that you didn't need at all and only thought of > when the original thing was created. Bloat by example, if you will. I > suspect it will not be a popular theory, however accurately it may describe > the technological world. > > -rob > > > On Fri, May 10, 2024 at 4:16 PM David Arnold wrote: > >> nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to >> delimit header, body, and trailer sections within its input. >> >> I wondered if anyone was able to shed light on the reason those were >> adopted as the defaults? >> >> I would have expected perhaps something compatible with *roff (like, .\” >> something). >> >> FreeBSD claims nl first appeared in System III (although it previously >> claimed SVR2), but I haven’t dug into the implementation any further. >> >> Thanks in advance, >> >> >> >> d >> >

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From paul.winalski at gmail.com Sat May 11 03:28:40 2024
From: paul.winalski at gmail.com (Paul Winalski)
Date: Fri, 10 May 2024 13:28:40 -0400
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To:
References:
Message-ID:

On Wed, May 8, 2024 at 2:29 PM Douglas McIlroy < douglas.mcilroy at dartmouth.edu> wrote: > > Dennis was one-up on Digitek in having a self-maintaining compiler. Thus, > when he implemented an optimization, the source would grow, but the > compiler binary might even shrink thanks to self-application. >

Another somewhat non-intuitive aspect of optimizing compilers is that simply adding optimizations can cause an increase in compilation speed by reducing the amount of IL in the program being compiled. Less IL due to optimization means less time spent in later phases of the compilation process.

Regarding native compilers for small machines, IBM had compilers for Fortran, COBOL, and PL/I that ran in 32K on System/360 and produced tolerably good code (yes, one could do better with handwritten assembler). And they generated real code, no threaded code cop-out. And we're talking full PL/I here, not the subset that ANSI later standardized.
The compilers were table-driven as much as possible, heavily overlaid, and used three scratch files on disk (split-cylinder allocated to minimize seek time). -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at mcjones.org Sat May 11 04:55:54 2024 From: paul at mcjones.org (Paul McJones) Date: Fri, 10 May 2024 11:55:54 -0700 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: <171535904627.4052234.5321502833323676423@minnie.tuhs.org> References: <171535904627.4052234.5321502833323676423@minnie.tuhs.org> Message-ID: <5F50D05C-C898-4BFF-B57F-028494ED9EBD@mcjones.org> > On Thu, 9 May 2024 22:40:28 +0200, Paul Ruizendaal > wrote: > > .... Digging into this more led me to a 1970 report "Programming Languages and their Compilers, Preliminary Notes” by John Cocke and J.T. Schwartz: > https://www.softwarepreservation.org/projects/FORTRAN/paper/Bright-FORTRANComesToWestinghouseBettis-1971.pdf Actually, the link is here: https://www.softwarepreservation.org/projects/FORTRAN/CockeSchwartz_ProgLangCompilers.pdf And more about Jack Schwartz is here: https://www.softwarepreservation.org/projects/SETL/index.html#Precursors Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From steffen at sdaoden.eu Sat May 11 09:05:55 2024 From: steffen at sdaoden.eu (Steffen Nurpmeso) Date: Sat, 11 May 2024 01:05:55 +0200 Subject: [TUHS] nl section delimiters In-Reply-To: References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> Message-ID: <20240510230555.j3xsYPX4@steffen%sdaoden.eu> John P. Linderman wrote in : |I'll accept Rob's theory. Instead of taking the time to go through the |alphabet soup of options to nl and pr and ls, learning a tool like awk or |perl or python makes implementing most of what these commands do (or what |you wish they could do) a one-finger exercise. -- jpl But it misses the coolness of the empty true(1), and the last possibly requires more CPU cycles for startup than you had on a work day (said into the blue, completely unmathematically). --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) From douglas.mcilroy at dartmouth.edu Sat May 11 12:19:45 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Fri, 10 May 2024 22:19:45 -0400 Subject: [TUHS] nl section delimiters Message-ID: > But it misses the coolness of the empty true(1). Too cool. With an empty true(1), execl("true", "true", 0) is out in the cold. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Sat May 11 19:07:41 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sat, 11 May 2024 10:07:41 +0100 Subject: [TUHS] nl section delimiters In-Reply-To: References: <1FECD6DE-3384-406F-8897-8D7C2DAAF636@pobox.com> Message-ID: <20240511090741.906E3215AA@orac.inputplus.co.uk> Hi jpl, > Instead of taking the time to go through the alphabet soup of options > to nl and pr and ls, learning a tool like awk or perl or python pr(1) was in V5, where one of its stderr messages was ‘Very funny.’. awk arrived in V7. -- Cheers, Ralph. From ralph at inputplus.co.uk Sat May 11 19:16:13 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sat, 11 May 2024 10:16:13 +0100 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: Message-ID: <20240511091613.2399F215AA@orac.inputplus.co.uk> Hi, Paul W. 
wrote: > The compilers were table-driven as much as possible, heavily overlaid, > and used three scratch files on disk (split-cylinder allocated to > minimize seek time). I didn't know the term ‘split cylinder’. A cylinder has multiple tracks, each with its own read/write head. Allocate different tracks to different files and switching files needs no physical movement or settling time; just electronically switch R/W head. -- Cheers, Ralph. From g.branden.robinson at gmail.com Sat May 11 23:42:21 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Sat, 11 May 2024 08:42:21 -0500 Subject: [TUHS] On the uniqueness of DMR's C compiler In-Reply-To: References: Message-ID: <20240511134221.w7v35qdey7z7j6wf@illithid> At 2024-05-10T13:28:40-0400, Paul Winalski wrote: > On Wed, May 8, 2024 at 2:29 PM Douglas McIlroy < > douglas.mcilroy at dartmouth.edu> wrote: > > Dennis was one-up on Digitek in having a self-maintaining compiler. > > Thus, when he implemented an optimization, the source would grow, > > but the compiler binary might even shrink thanks to > > self-application. > > Another somewhat non-intuitive aspect of optimizing compilers is that > simply adding optimizations can cause an increase in compilation speed > by reducing the amount of IL in the program being compiled. Less IL > due to optimization means less time spent in later phases of the > compilation process. This fact was rediscovered later when people found that some code compiled with "-Os" (optimize for space) was faster than some code optimized for speed ("-O1", "-O2", and so on). The reason turned out to be that the reduced code size meant fewer cache evictions, so you gained performance by skipping instances of instruction fetches all the way from the slow main memory bus. Think of all those poor unrolled loops... Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From steffen at sdaoden.eu Sun May 12 06:48:16 2024 From: steffen at sdaoden.eu (Steffen Nurpmeso) Date: Sat, 11 May 2024 22:48:16 +0200 Subject: [TUHS] nl section delimiters In-Reply-To: References: Message-ID: <20240511204816.UwAcweCX@steffen%sdaoden.eu> Douglas McIlroy wrote in : |> But it misses the coolness of the empty true(1). | |Too cool. With an empty true(1), execl("true", "true", 0) is out in the |cold. There i stand singing "ein Männlein steht im Walde" (a "little man" stands in the forest). ..ok, but then i do note here and now the certain lists where the question on whether an additional entry in the search path does make any sense at all for certain constructs comes up regulary, (even) i have lived this multiple times already, it is about The [.] command search [.] allows for a standard utility to be implemented as a regular built-in as long as it is found in the appropriate place in a PATH search. [.]command -v true might yield /bin/true or some similar pathname. Other [non-standard] utilities [.] might exist only as built-ins and have no pathname associated with them. These produce output identified as (regular) built-ins. Applications encountering these are not able to count on execing them, using them with nohup, overriding them with a different PATH, and so on. The next POSIX standard will have around 4058 pages (3950 without index) and 137171 lines (not counting index). 
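(To make Doug's execl() remark above concrete -- a minimal sketch, assuming a modern POSIX-ish system; the ./empty-true file name is invented for the demo. A zero-length file has no #! line and no recognizable binary header, so a direct execl() fails with ENOEXEC, while execlp()/execvp() and the shell fall back to running it as a shell script, which is why the empty true(1) works from a shell yet leaves a bare execl() out in the cold.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int fd = open("./empty-true", O_CREAT | O_WRONLY | O_TRUNC, 0755);
	if (fd < 0) { perror("open"); return 1; }
	close(fd);			/* a zero-length, executable file */

	pid_t pid = fork();
	if (pid == 0) {
		execl("./empty-true", "empty-true", (char *)0);
		/* expect ENOEXEC: nothing here for the kernel to recognize */
		fprintf(stderr, "execl: %s\n", strerror(errno));
		_exit(1);
	}
	waitpid(pid, NULL, 0);

	/* the shell's ENOEXEC fallback runs the empty "script" and exits 0 */
	int rc = system("./empty-true");
	printf("system() -> %d\n", WEXITSTATUS(rc));
	return 0;
}

(On a typical Linux box the first call reports "Exec format error" and the second prints 0.)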
And i was surely laughing when this list it surely was came along this somewhen in the past, and isn't that just "a muscle car": #?0|kent:unix-hist$ git show Research-V7:bin/true | wc -c 0 Many greetings and best wishes!! --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) From athornton at gmail.com Mon May 13 05:34:20 2024 From: athornton at gmail.com (Adam Thornton) Date: Sun, 12 May 2024 12:34:20 -0700 Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: <20240511213532.GB8330@mit.edu> References: <20240511213532.GB8330@mit.edu> Message-ID: On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o wrote: > > I bet most of the young'uns would not be trying to do this as a shell > script, but using the Cloud SDK with perl or python or Go, which is > *way* more bloaty than using /bin/sh. > > So while some of us old farts might be bemoaning the death of the Unix > philosophy, perhaps part of the reality is that the Unix philosophy > were ideal for a simpler time, but might not be as good of a fit > today I'm finding myself in agreement. I might well do this with jq, but as you point out, you're using the jq DSL pretty extensively to pull out the fields. On the other hand, I don't think that's very different than piping stuff through awk, and I don't think anyone feels like _that_ would be cheating. And jq -L is pretty much equivalent to awk -F, which is how I would do this in practice, rather than trying to inline the whole jq bit. But it does come down to the same argument as https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf And it is true that while fork() is a great model for single-threaded pipeline-looking tasks, it's not really what you want for an interactive multithreaded application on your phone's GUI. Oddly, I'd have a slightly different reason for reaching for Python (which is probably how I'd do this anyway), and that's the batteries-included bit. If I write in Python, I've got the gcloud api available as a Python module, and I've got a JSON parser also available as a Python module (but I bet all the JSON unmarshalling is already handled in the gcloud library), and I don't have to context-switch to the same degree that I would if I were stringing it together in the shell. Instead of "make an HTTP request to get JSON text back, then parse that with repeated calls to jq", I'd just get an object back from the instance fetch request, pick out the fields I wanted, and I'd be done. I'm afraid only old farts write anything in Perl anymore. The kids just mutter "OK, Boomer" when you try to tell them how much better CPAN was than PyPi. And it sure feels like all the cool kids have abandoned Go for Rust, although Go would be a perfectly reasonable choice for this task as well (and would look a lot like Python: get an object back, pick off the useful fields). Adam -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lm at mcvoy.com Mon May 13 05:47:07 2024 From: lm at mcvoy.com (Larry McVoy) Date: Sun, 12 May 2024 12:47:07 -0700 Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240511213532.GB8330@mit.edu> Message-ID: <20240512194707.GL9216@mcvoy.com> On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote: > But it does come down to the same argument as > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf > > And it is true that while fork() is a great model for single-threaded > pipeline-looking tasks, it's not really what you want for an interactive > multithreaded application on your phone's GUI. Perhaps a meaningless aside, but I agree on fork(). In the last major project I did, which was cross platform {windows,macos, all the major Unices, Linux}, we adopted spawn() rather than fork/exec. There is no way (that I know of) to fake fork() on Windows but it's easy to fake spawn(). --lm From johnl at taugh.com Mon May 13 06:13:48 2024 From: johnl at taugh.com (John Levine) Date: 12 May 2024 16:13:48 -0400 Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: <20240512194707.GL9216@mcvoy.com> Message-ID: <20240512201349.0DB6A8A9D055@ary.qy> It appears that Larry McVoy said: >Perhaps a meaningless aside, but I agree on fork(). In the last major >project I did, which was cross platform {windows,macos, all the major >Unices, Linux}, we adopted spawn() rather than fork/exec. There is no way >(that I know of) to fake fork() on Windows but it's easy to fake spawn(). The whole point of fork() is that it lets you get the effect of spawn with a lot less internal mechanism. Spawn is equivalent to: fork() ... do stuff to files and environment ... exec() By separating the fork and the exec, they didn't have to put all of the stuff in the 12 paragraphs in the spawn() man page into the the tiny PDP-11 kernel. These days now that programs include multi-megabyte shared libraries just for fun, I agree that the argument is less persuasive. On the third hard, we now understand virtual memory and paging systems a lot better so we don't need kludges like vfork(). R's, John From dave at horsfall.org Mon May 13 06:43:35 2024 From: dave at horsfall.org (Dave Horsfall) Date: Mon, 13 May 2024 06:43:35 +1000 (EST) Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240511213532.GB8330@mit.edu> Message-ID: On Sun, 12 May 2024, Adam Thornton wrote: > I'm afraid only old farts write anything in Perl anymore.  The kids just > mutter "OK, Boomer" when you try to tell them how much better CPAN was than > PyPi.  And it sure feels like all the cool kids have abandoned Go for Rust, > although Go would be a perfectly reasonable choice for this task as well > (and would look a lot like Python: get an object back, pick off the useful > fields). I must be an old fart then; the last language I used where white space was part of the syntax was FORTRAN... 
-- Dave From crossd at gmail.com Mon May 13 08:56:35 2024 From: crossd at gmail.com (Dan Cross) Date: Sun, 12 May 2024 18:56:35 -0400 Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: <20240512201349.0DB6A8A9D055@ary.qy> References: <20240512194707.GL9216@mcvoy.com> <20240512201349.0DB6A8A9D055@ary.qy> Message-ID: On Sun, May 12, 2024 at 4:14 PM John Levine wrote: > It appears that Larry McVoy said: > >Perhaps a meaningless aside, but I agree on fork(). In the last major > >project I did, which was cross platform {windows,macos, all the major > >Unices, Linux}, we adopted spawn() rather than fork/exec. There is no way > >(that I know of) to fake fork() on Windows but it's easy to fake spawn(). > > The whole point of fork() is that it lets you get the effect of spawn with > a lot less internal mechanism. Spawn is equivalent to: > > fork() > ... do stuff to files and environment ... > exec() > > By separating the fork and the exec, they didn't have to put all of > the stuff in the 12 paragraphs in the spawn() man page into the the > tiny PDP-11 kernel. Perhaps, but as I've written here before, `fork`/`exec` vs `spawn` is a false dichotomy. Another alternative is a `proccreate`/`procrun` pair, the former of which creates an unrunnable process, the latter of which marks it runnable. Coupled with a set of primitives to manipulate the state of an extant, but unrunnable, process and you have the advantages of fork/exec without the downsides (which are well-known; https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf). Similarly, this gives you the functionality of spawn, without the downside of a singularly complicated interface. Could you have implemented that in something as small as the PDP-7? Perhaps not, but it does not follow that `fork` now remains a good primitive. My spelunking in the original GENIE documentation leads me to believe that its `fork` provided functionality similar to what I described. - Dan C. From lm at mcvoy.com Mon May 13 09:34:54 2024 From: lm at mcvoy.com (Larry McVoy) Date: Sun, 12 May 2024 16:34:54 -0700 Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240512194707.GL9216@mcvoy.com> <20240512201349.0DB6A8A9D055@ary.qy> Message-ID: <20240512233454.GM9216@mcvoy.com> On Sun, May 12, 2024 at 06:56:35PM -0400, Dan Cross wrote: > Similarly, this gives you the functionality of spawn, without the > downside of a singularly complicated interface. Could you have > implemented that in something as small as the PDP-7? Perhaps not, but > it does not follow that `fork` now remains a good primitive. Our spawnvp() implmentation is 40 lines of code. Worked fine everywhere. I can post it if you like. From dave at horsfall.org Mon May 13 11:34:38 2024 From: dave at horsfall.org (Dave Horsfall) Date: Mon, 13 May 2024 11:34:38 +1000 (EST) Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: <20240512233454.GM9216@mcvoy.com> References: <20240512194707.GL9216@mcvoy.com> <20240512201349.0DB6A8A9D055@ary.qy> <20240512233454.GM9216@mcvoy.com> Message-ID: On Sun, 12 May 2024, Larry McVoy wrote: > Our spawnvp() implmentation is 40 lines of code. Worked fine everywhere. > I can post it if you like. Pretty please... 
-- Dave From flexibeast at gmail.com Mon May 13 12:33:55 2024 From: flexibeast at gmail.com (Alexis) Date: Mon, 13 May 2024 12:33:55 +1000 Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: (Adam Thornton's message of "Sun, 12 May 2024 12:34:20 -0700") References: <20240511213532.GB8330@mit.edu> Message-ID: <87y18ebfu4.fsf@gmail.com> > On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o > wrote: > > So while some of us old farts might be bemoaning the death of > the > Unix > philosophy, perhaps part of the reality is that the Unix > philosophy > were ideal for a simpler time, but might not be as good of a fit > today Hm .... i guess it might depend on the specific use-case(s) involved? At one point i realised that a primary reason i enjoy using *n*x systems is that they're fundamentally _text-oriented_. (Unsurprisingly, of course, given the context in which Unix was developed.) i spend a lot of my time interacting and working with text, and *n*x systems provide me with many useful tools for this. Quoting the old "UNIX As Literature" piece, https://theody.net/elements.html: "[T]he most recurrent complaint was that [Unix] was too text-oriented. People really hated the command line, with all the utilities, obscure flags, and arguments they had to memorize. They hated all the typing. One mislaid character and you had to start over. Interestingly, this complaint came most often from users of the GUI-laden Macintosh or Windows platforms. ... "[A] suspiciously high proportion of my UNIX colleagues had already developed, in some prior career, a comfort and fluency with text and printed words. ... "With UNIX, text — on the command line, STDIN, STDOUT, STDERR — is the primary interface mechanism: UNIX system utilities are a sort of Lego construction set for word-smiths. Pipes and filters connect one utility to the next, text flows invisibly between. Working with a shell, awk/lex derivatives, or the utility set is literally a word dance." Perl, with its pervasive regex-based functionality and extensive Unicode support, fits neatly into this. i find regexes an _incredibly_ powerful tool for working with text, whether via Perl, sed, awk, or whatever. But my experience is that many people treat regexes as an anathema, with Zawinski's "Now you have two problems" regularly trotted out as a thought-terminating cliché. Sure, regexes can, and do, get used where they shouldn't be[a]; that doesn't mean the baby should be thrown out with the bathwater. But if one is only working with text under sufferance, trying to avoid it via substantially more graphically-oriented environments, the text-based "Unix philosophy" and the tools associated with it might feel (and actually be) much less appropriate and useful. Fair enough. The Unix construction set will still be there for those of us who find them very appropriate and tremendously useful. Alexis. [a] It seems unlikely that anyone on this list hasn't already seen this, but just in case: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 i'm looking forward to that comment sending OpenAI over the Mountains of Madness. 
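(The "Lego construction set" point survives even in C. Below is a sketch of a minimal grep-like filter using the POSIX regcomp()/regexec() interface -- purely illustrative, with error handling pared to the bone: it reads stdin, writes matching lines to stdout, and therefore composes with pipes like any other filter.)

#include <regex.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
	regex_t re;
	char buf[BUFSIZ];

	if (argc != 2) {
		fprintf(stderr, "usage: %s pattern\n", argv[0]);
		return 2;
	}
	if (regcomp(&re, argv[1], REG_EXTENDED | REG_NOSUB) != 0) {
		fprintf(stderr, "%s: bad pattern\n", argv[0]);
		return 2;
	}
	while (fgets(buf, sizeof buf, stdin) != NULL)
		if (regexec(&re, buf, 0, NULL, 0) == 0)	/* 0 == match */
			fputs(buf, stdout);

	regfree(&re);
	return 0;
}

(Something like ls | ./a.out '\.c$' then behaves as a two-minute grep and slots into a pipeline exactly as the quoted piece describes.)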
From imp at bsdimp.com Mon May 13 12:57:05 2024
From: imp at bsdimp.com (Warner Losh)
Date: Sun, 12 May 2024 20:57:05 -0600
Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <87y18ebfu4.fsf@gmail.com>
References: <20240511213532.GB8330@mit.edu> <87y18ebfu4.fsf@gmail.com>
Message-ID:

On Sun, May 12, 2024, 8:34 PM Alexis wrote: > > > On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o > > wrote: > > > > So while some of us old farts might be bemoaning the death of > > the > > Unix > > philosophy, perhaps part of the reality is that the Unix > > philosophy > > were ideal for a simpler time, but might not be as good of a fit > > today > > Hm .... i guess it might depend on the specific use-case(s) > involved? >

I created, years ago, a set of time legos. They were connected as a network of producer / consumer interfaces. Each lego would do one thing and pass the results to the next thing in the chain. A driver would read timing data from the hardware and convert it to an MI interface. Different other legos would take time differences, compute phase or frequency differences and these would feed into more sophisticated algorithms or output etc. All locking was on the pipe's queues so all these algorithms were lock free apart from the queueing or dequeueing of data. Conceptually, this is just a bunch of pipes, with many-to-1 or 1-to-many added. Each lego did one thing and passed the results along to the next thing in the chain... much like 'cmd | grep | awk | more'. Plus MI data representations for almost everything so only the driver reader thread cared about the hw. See also tty abstraction or ifnet abstraction in unix.... So actually not a set of FDs passing data between processes, but threads doing the same sort of thing. The whole data filtering paradigm works in lots of different ways. And it still works really well by analogy.

Warner

ObComplaint: fork sucks for address spaces with 100s of threads. First thing, we created a child process that we used to broker different threads needing to run popen or system... having a create process / munge process / start process API is kinda what we did behind the scenes, though with "send this data" and "receive that data". We iterated to this after the first dozen attempts to closely broker the fork/exec dance proved... unreliable.

At one point i realised that a primary reason i enjoy using *n*x > systems is that they're fundamentally > _text-oriented_. (Unsurprisingly, of course, given the context in > which Unix was developed.) i spend a lot of my time interacting > and working with text, and *n*x systems provide me with many > useful tools for this. Quoting the old "UNIX As Literature" piece, > https://theody.net/elements.html: > > "[T]he most recurrent complaint was that [Unix] was too > text-oriented. People really hated the command line, with all the > utilities, obscure flags, and arguments they had to memorize. They > hated all the typing. One mislaid character and you had to start > over. Interestingly, this complaint came most often from users of > the GUI-laden Macintosh or Windows platforms. ... > > "[A] suspiciously high proportion of my UNIX colleagues had > already developed, in some prior career, a comfort and fluency > with text and printed words. ... > > "With UNIX, text — on the command line, STDIN, STDOUT, STDERR — is > the primary interface mechanism: UNIX system utilities are a sort > of Lego construction set for word-smiths. Pipes and filters > connect one utility to the next, text flows invisibly > between.
Working with a shell, awk/lex derivatives, or the utility > set is literally a word dance." > > Perl, with its pervasive regex-based functionality and extensive > Unicode support, fits neatly into this. i find regexes an > _incredibly_ powerful tool for working with text, whether via > Perl, sed, awk, or whatever. But my experience is that many people > treat regexes as an anathema, with Zawinski's "Now you have two > problems" regularly trotted out as a thought-terminating > cliché. Sure, regexes can, and do, get used where they shouldn't > be[a]; that doesn't mean the baby should be thrown out with the > bathwater. > > But if one is only working with text under sufferance, trying to > avoid it via substantially more graphically-oriented environments, > the text-based "Unix philosophy" and the tools associated with it > might feel (and actually be) much less appropriate and > useful. Fair enough. The Unix construction set will still be there > for those of us who find them very appropriate and tremendously > useful. > > > Alexis. > > [a] It seems unlikely that anyone on this list hasn't already seen > this, but just in case: > > > https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 > > i'm looking forward to that comment sending OpenAI over the > Mountains of Madness. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreww591 at gmail.com Mon May 13 13:29:01 2024 From: andreww591 at gmail.com (Andrew Warkentin) Date: Sun, 12 May 2024 21:29:01 -0600 Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240512194707.GL9216@mcvoy.com> <20240512201349.0DB6A8A9D055@ary.qy> Message-ID: On Sun, May 12, 2024 at 4:57 PM Dan Cross wrote: >l. > > Perhaps, but as I've written here before, `fork`/`exec` vs `spawn` is > a false dichotomy. Another alternative is a `proccreate`/`procrun` > pair, the former of which creates an unrunnable process, the latter of > which marks it runnable. Coupled with a set of primitives to > manipulate the state of an extant, but unrunnable, process and you > have the advantages of fork/exec without the downsides (which are > well-known; https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf). > Similarly, this gives you the functionality of spawn, without the > downside of a singularly complicated interface. Could you have > implemented that in something as small as the PDP-7? Perhaps not, but > it does not follow that `fork` now remains a good primitive. > IMO something like that is the best model (although it probably would have been a bit complicated for a PDP-7/PDP-11). That's basically what I'm doing in the OS that I'm writing . Processes will basically just be containers for hierarchical groups of threads, and will have pretty much no other state besides the command line. All of the context normally associated with a process (file descriptor space, permissions/UID/GID, filesystem namespace, virtual address space) will instead be in separate objects that are explicitly bound to threads. Separate APIs for creating an empty process, creating threads within it, manipulating context objects and binding threads to them, and starting the process will be provided (all of these APIs will use a file-based transport underneath; this will be the first OS I know of where literally everything is a file). 
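(None of the interfaces sketched in this sub-thread exist under these names; what follows is a toy, user-space illustration -- every identifier invented for the purpose -- of how a proccreate()/procrun()-style sequence reads. Here the embryonic "process" is only a struct and procrun() falls back to fork()+execv(); in a real system the kernel would hold the half-built process and expose far more than the three descriptors shown.)

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

typedef struct {			/* a stand-in for a kernel-held embryo */
	char *path;
	char **argv;
	int fds[3];			/* what the child's fds 0..2 should be */
} proc_t;

proc_t *proccreate(void)		/* exists, but cannot run yet */
{
	proc_t *p = calloc(1, sizeof *p);
	for (int i = 0; i < 3; i++)
		p->fds[i] = i;		/* default: inherit stdin/stdout/stderr */
	return p;
}

void procsetimage(proc_t *p, char *path, char **argv) { p->path = path; p->argv = argv; }
void procsetfd(proc_t *p, int childfd, int parentfd) { p->fds[childfd] = parentfd; }

pid_t procrun(proc_t *p)		/* only here does anything start running */
{
	pid_t pid = fork();
	if (pid == 0) {
		for (int i = 0; i < 3; i++)
			if (p->fds[i] != i)
				dup2(p->fds[i], i);
		execv(p->path, p->argv);
		_exit(127);
	}
	return pid;
}

int main(void)
{
	char *argv[] = { "echo", "hello from an embryonic process", NULL };
	proc_t *p = proccreate();		/* create, unrunnable        */
	procsetimage(p, "/bin/echo", argv);	/* poke at its state...      */
	procrun(p);				/* ...then mark it runnable  */
	wait(NULL);
	free(p);
	return 0;
}

(The point of the shape, as against both fork()/exec() and a monolithic spawn(), is that each primitive stays small and the "do stuff to files and environment" step becomes ordinary calls against a not-yet-running process.)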
The base process APIs will be general enough to allow an efficient copy-on-write fork() to be implemented on top of them for backwards compatibility and the remaining use cases where forking still makes sense (since even all process memory will be implemented with files, this will be implemented with a special in-memory "shadow filesystem" that creates alternate mappings of other memory filesystems). Really I'd say there are actually several design decisions in conventional Unix that made sense on a PDP-7 or PDP-11, but no longer make sense in the modern world. For instance, the rather inflexible security model with its fixed set of root-only system calls rather than some form of role-based access control, or the use of on-disk device nodes bound by numbers rather than something like separate special filesystems for each driver that get union mounted together, or the lack of integrated support for userspace filesystem servers (yes, there's FUSE, but it's kind of a poorly integrated hack that is rarely used for anything important). From meillo at marmaro.de Mon May 13 15:23:47 2024 From: meillo at marmaro.de (markus schnalke) Date: Mon, 13 May 2024 07:23:47 +0200 Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240511213532.GB8330@mit.edu> Message-ID: <1s6OA7-1nI-00@marmaro.de> Hoi. > On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o wrote: > > I bet most of the young'uns would not be trying to do this as a shell > script, but using the Cloud SDK with perl or python or Go, which is > *way* more bloaty than using /bin/sh. > > So while some of us old farts might be bemoaning the death of the Unix > philosophy, perhaps part of the reality is that the Unix philosophy > were ideal for a simpler time, but might not be as good of a fit > today It depends on what the Unix philosophy is seen to be. If it is solving problems by reading text from standard in and printing to standard out, then that might not be suitable anymore for many of today's problems. But if it is prefering plain text to binary, perfering simple solutions to complex ones, increasing the number of operations one can perform by combining small generic parts, ... all because of good reasons ... Focussing on simplicity, clarity, generality ... Omitting needless words! ... All this still holds true, no matter if applied as shell scripts or within the design of a new programming language or a programming interface. It's not so much about the tools we use -- these should be suited for the times you live in and the problems you have to solve -- but it's more about how you look at them and how you look at the problems and what ideas for solutions you can imagine in your mind. Here, Unix provides a continuing inspiration. Only, like with every old book: when we read it today, we have to read it within the background of the times back then and transfer its message to today's times. The older the book, the more transfer work has to be done, the more knowledgable the then younger and more distant readers have to be, to really understand it. Thus, in my oppinion, the Unix philosophy remains a good and very relevant fit today, although not all of its applications from back then still are. 
meillo From andreww591 at gmail.com Mon May 13 16:18:00 2024 From: andreww591 at gmail.com (Andrew Warkentin) Date: Mon, 13 May 2024 00:18:00 -0600 Subject: [TUHS] [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: <1s6OA7-1nI-00@marmaro.de> References: <20240511213532.GB8330@mit.edu> <1s6OA7-1nI-00@marmaro.de> Message-ID: On Sun, May 12, 2024 at 11:23 PM markus schnalke wrote: > > > It depends on what the Unix philosophy is seen to be. If it is > solving problems by reading text from standard in and printing to > standard out, then that might not be suitable anymore for many of > today's problems. But if it is prefering plain text to binary, > perfering simple solutions to complex ones, increasing the number > of operations one can perform by combining small generic parts, > ... all because of good reasons ... Focussing on simplicity, > clarity, generality ... Omitting needless words! ... All this still > holds true, no matter if applied as shell scripts or within the > design of a new programming language or a programming interface. > > It's not so much about the tools we use -- these should be suited > for the times you live in and the problems you have to solve -- > but it's more about how you look at them and how you look at the > problems and what ideas for solutions you can imagine in your > mind. Here, Unix provides a continuing inspiration. > > Only, like with every old book: when we read it today, we have to > read it within the background of the times back then and transfer > its message to today's times. The older the book, the more transfer > work has to be done, the more knowledgable the then younger and > more distant readers have to be, to really understand it. > > Thus, in my oppinion, the Unix philosophy remains a good and very > relevant fit today, although not all of its applications from back > then still are. > I agree, but it seems that most Unix developers haven't really cared since the side branches and clones effectively took over from Research Unix in the early 80s. They've added system calls and ad-hoc socket RPC interfaces with abandon instead of using generic filesystem-based extensibility APIs, added options to various commands that should just have been separate programs, and written desktop environments/applications that have poor composability, extensibility and modularity (I guess KDE's KParts kind of counts as a mechanism for composing applications, but it's limited by being based on plugins rather than an open IPC-based API). The only Unix desktop I can think of that really tries to follow the Unix philosophy somewhat is the now-abandoned Étoilé . There's also the desktops of the rather obscure BTRON family , although those OSes are only vaguely Unix-like. Both have an object-centric rather than application-centric model with support for embedding applications within each other and controlling them with RPC APIs. IMO, the best practical realization of the Unix philosophy for the modern era would be a QNX/Plan 9-like OS with an Étoilé/BTRON-like desktop, hence why I'm working on one. Some of the specifics of the original Unix philosophy may not be relevant to large parts of modern computing, but I'd say the general ideas still are. 
From chet.ramey at case.edu Mon May 13 23:12:05 2024 From: chet.ramey at case.edu (Chet Ramey) Date: Mon, 13 May 2024 09:12:05 -0400 Subject: [TUHS] nl section delimiters In-Reply-To: <20240511204816.UwAcweCX@steffen%sdaoden.eu> References: <20240511204816.UwAcweCX@steffen%sdaoden.eu> Message-ID: On 5/11/24 4:48 PM, Steffen Nurpmeso wrote: > The [.] command search [.] allows for a standard utility to be > implemented as a regular built-in as long as it is found in the > appropriate place in a PATH search. > [.]command -v true might yield /bin/true or some similar pathname. To be fair, no one really implements this. ksh93 is the shell that comes closest. The next edition of the standard acknowledges the status quo with the `intrinsics' concept. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/ -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 203 bytes Desc: OpenPGP digital signature URL: From lm at mcvoy.com Mon May 13 23:21:54 2024 From: lm at mcvoy.com (Larry McVoy) Date: Mon, 13 May 2024 06:21:54 -0700 Subject: [TUHS] forking, Re: [COFF] Re: On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240512194707.GL9216@mcvoy.com> <20240512201349.0DB6A8A9D055@ary.qy> <20240512233454.GM9216@mcvoy.com> Message-ID: <20240513132154.GN9216@mcvoy.com> On Mon, May 13, 2024 at 11:34:38AM +1000, Dave Horsfall wrote: > On Sun, 12 May 2024, Larry McVoy wrote: > > > Our spawnvp() implmentation is 40 lines of code. Worked fine everywhere. > > I can post it if you like. > > Pretty please... > > -- Dave /* * Copyright 1999-2002,2004-2006,2015-2016 BitMover, Inc * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ #include "system.h" void (*spawn_preHook)(int flags, char *av[]) = 0; #ifndef WIN32 pid_t bk_spawnvp(int flags, char *cmdname, char *av[]) { int fd, status; pid_t pid; char *exec; /* Tell the calling process right away if there is no such program */ unless (exec = which((char*)cmdname)) return (-1); if (spawn_preHook) spawn_preHook(flags, av); if (pid = fork()) { /* parent */ free(exec); if (pid == -1) return (pid); unless (flags & (_P_DETACH|_P_NOWAIT)) { if (waitpid(pid, &status, 0) != pid) status = -1; return (status); } return (pid); } else { /* child */ /* * See win32/uwtlib/wapi_intf.c:spawnvp_ex() * We leave nothing open on a detach, but leave * in/out/err open on a normal fork/exec. */ if (flags & _P_DETACH) { unless (getenv("_NO_SETSID")) setsid(); /* close everything to match winblows */ for (fd = 0; fd < 100; fd++) (close)(fd); } else { /* * Emulate having everything except in/out/err * as being marked as close on exec to match winblows. 
*/ for (fd = 3; fd < 100; fd++) (close)(fd); } execv(exec, av); perror(exec); _exit(19); } } #else /* ======== WIN32 ======== */ pid_t bk_spawnvp(int flags, char *cmdname, char *av[]) { pid_t pid; char *exec; /* Tell the calling process right away if there is no such program */ unless (exec = which((char*)cmdname)) return (-1); if (spawn_preHook) spawn_preHook(flags, av); /* * We use our own version of spawn in uwtlib * because the NT spawn() does not work well with tcl */ pid = _spawnvp_ex(flags, exec, av, 1); free(exec); return (pid); } #endif /* WIN32 */ From douglas.mcilroy at dartmouth.edu Mon May 13 23:34:36 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Mon, 13 May 2024 09:34:36 -0400 Subject: [TUHS] If forking is bad, how about buffering? Message-ID: So fork() is a significant nuisance. How about the far more ubiquitous problem of IO buffering? On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote: > But it does come down to the same argument as > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf The Microsoft manifesto says that fork() is an evil hack. One of the cited evils is that one must remember to flush output buffers before forking, for fear it will be emitted twice. But buffering is the culprit, not the victim. Output buffers must be flushed for many other reasons: to avoid deadlock; to force prompt delivery of urgent output; to keep output from being lost in case of a subsequent failure. Input buffers can also steal data by reading ahead into stuff that should go to another consumer. In all these cases buffering can break compositionality. Yet the manifesto blames an instance of the hazard on fork()! To assure compositionality, one must flush output buffers at every possible point where an unknown downstream consumer might correctly act on the received data with observable results. And input buffering must never ingest data that the program will not eventually use. These are tough criteria to meet in general without sacrificing buffering. The advent of pipes vividly exposed the non-compositionality of output buffering. Interactive pipelines froze when users could not provide input that would force stuff to be flushed until the input was informed by that very stuff. This phenomenon motivated cat -u, and stdio's convention of line buffering for stdout. The premier example of input buffering eating other programs' data was mitigated by "here documents" in the Bourne shell. These precautions are mere fig leaves that conceal important special cases. The underlying evil of buffered IO still lurks. The justification is that it's necessary to match the characteristics of IO devices and to minimize system-call overhead. The former necessity requires the attention of hardware designers, but the latter is in the hands of programmers. What can be done to mitigate the pain of border-crossing into the kernel? L4 and its ilk have taken a whack. An even more radical approach might flow from the "whitepaper" at www.codevalley.com. In any even the abolition of buffering is a grand challenge. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreww591 at gmail.com Tue May 14 08:01:24 2024 From: andreww591 at gmail.com (Andrew Warkentin) Date: Mon, 13 May 2024 16:01:24 -0600 Subject: [TUHS] If forking is bad, how about buffering? 
In-Reply-To: References: Message-ID: On Mon, May 13, 2024 at 7:42 AM Douglas McIlroy wrote: > > > These precautions are mere fig leaves that conceal important special cases. The underlying evil of buffered IO still lurks. The justification is that it's necessary to match the characteristics of IO devices and to minimize system-call overhead. The former necessity requires the attention of hardware designers, but the latter is in the hands of programmers. What can be done to mitigate the pain of border-crossing into the kernel? L4 and its ilk have taken a whack. An even more radical approach might flow from the "whitepaper" at www.codevalley.com. > QNX copies messages directly between address spaces without any intermediary buffering, similarly to L4-like kernels. However, some of its libraries and servers do still use intermediary buffers. From robpike at gmail.com Tue May 14 17:10:38 2024 From: robpike at gmail.com (Rob Pike) Date: Tue, 14 May 2024 17:10:38 +1000 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: Message-ID: I agree with your (as usual) perceptive analysis. Only stopping by to point out that I took the buffering out of cat. I didn't have your perspicacity on why it should happen, just a desire to remove all the damn flags. When I was done, cat.c was 35 lines long. Do a read, do a write, continue until EOF. Guess what? That's all you need if you want to cat files. Sad to say Bell Labs's cat door was hard to open and most of the world still has a cat with flags. And buffers. -rob On Mon, May 13, 2024 at 11:35 PM Douglas McIlroy < douglas.mcilroy at dartmouth.edu> wrote: > So fork() is a significant nuisance. How about the far more ubiquitous > problem of IO buffering? > > On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote: > > But it does come down to the same argument as > > > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf > > The Microsoft manifesto says that fork() is an evil hack. One of the cited > evils is that one must remember to flush output buffers before forking, for > fear it will be emitted twice. But buffering is the culprit, not the > victim. Output buffers must be flushed for many other reasons: to avoid > deadlock; to force prompt delivery of urgent output; to keep output from > being lost in case of a subsequent failure. Input buffers can also steal > data by reading ahead into stuff that should go to another consumer. In all > these cases buffering can break compositionality. Yet the manifesto blames > an instance of the hazard on fork()! > > To assure compositionality, one must flush output buffers at every > possible point where an unknown downstream consumer might correctly act on > the received data with observable results. And input buffering must never > ingest data that the program will not eventually use. These are tough > criteria to meet in general without sacrificing buffering. > > The advent of pipes vividly exposed the non-compositionality of output > buffering. Interactive pipelines froze when users could not provide input > that would force stuff to be flushed until the input was informed by that > very stuff. This phenomenon motivated cat -u, and stdio's convention of > line buffering for stdout. The premier example of input buffering eating > other programs' data was mitigated by "here documents" in the Bourne shell. > > These precautions are mere fig leaves that conceal important special > cases. The underlying evil of buffered IO still lurks. 
The justification is > that it's necessary to match the characteristics of IO devices and to > minimize system-call overhead. The former necessity requires the attention > of hardware designers, but the latter is in the hands of programmers. What > can be done to mitigate the pain of border-crossing into the kernel? L4 and > its ilk have taken a whack. An even more radical approach might flow from > the "whitepaper" at www.codevalley.com. > > In any even the abolition of buffering is a grand challenge. > > Doug > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Tue May 14 21:10:32 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Tue, 14 May 2024 06:10:32 -0500 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: Message-ID: <20240514111032.2kotrrjjv772h5f4@illithid> I've wondered about the cat flag war myself, and have a theory. Might as well air it here since the real McCoy (and McIlroy) are available to shoot it down. :) I'm sure the following attempt at knot-slashing is not novel, but people relentlessly return to this issue as if the presence of _flags_ is the problem. (Plan 9 fans recite this point ritually, like a mantra.) I say it isn't. At 2024-05-14T17:10:38+1000, Rob Pike wrote: > I agree with your (as usual) perceptive analysis. Only stopping by to > point out that I took the buffering out of cat. I didn't have your > perspicacity on why it should happen, just a desire to remove all the > damn flags. When I was done, cat.c was 35 lines long. Do a read, do a > write, continue until EOF. Guess what? That's all you need if you want > to cat files. > > Sad to say Bell Labs's cat door was hard to open and most of the world > still has a cat with flags. And buffers. I think this dispute is a proxy fight between two communities, or more precisely two views of what cat(1), and other elementary Unix commands, primarily exist to achieve. In my opinion both perspectives are valid, and it's better to consider what each perspective wants than mandate that either is superior. Viewpoint 1: Perspective from Pike's Peak Elementary Unix commands should be elementary. Unix is a kernel. Programs that do simple things with system calls should remain simple. This practices makes the system (the kernel interface) easier to learn, and to motivate and justify to others. Programs therefore test the simplicity and utility of, and can reveal flaws in, the set of primitives that the kernel exposes. This is valuable stuff for a research organization. "Research" was right there in the CSRC's name. Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1] cat(1)'s man page did not advertise the traits in the foregoing viewpoint as objectives, and never did.[2] Its avowed purpose was to copy, without interruption or separation, 1..n files from storage to and output channel or stream (which might be redirected). I don't need to tell convince that this is a worthwhile application. But when we think about the many possible ways--and destinations--a person might have in mind for that I/O channel, we have to face the necessity of buffering or performance goes through the floor. It is 1978. Some VMS or, ugh, CP/M advocate from those piddly little toy machines will come along. "Ha ha," they will say, "our OS is way faster than the storied Unix even at the simple task of dumping files". 
Nowhere[citation needed] outside of C tutorials is cat implemented as int c; while((c = getchar()) != EOF) putchar(c); or its read()/write() system call equivalent. The output channel might be across a network in a distributed computing environment. Nobody wants to work with one byte at a time in that situation. Ethernet's minimum packet size is 64 bytes. No one wants that kind of overhead. While composing this mail, I had a look at an early, pre-C version of cat, spelling error in the only comment line and all. https://minnie.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/cat.s putc: movb r0,(r2)+ cmp r2,$obuf+512. blo 1f mov $1,r0 sys write; obuf; 512. mov $obuf,r2 Well, look at that. Buffering. The author of this tool of course knew the kernel well, including the size of its internal disk buffers (on the assumption that I/O would mainly be happening to and from disks). But that's a "leaky abstraction", or a "layering violation". (That'll be two tickets to the eternal fires of Brogrammer Hell, thanks.) Once you sweep away the break room buzzwords we understand that cat is presuming things that it should not (the size of the kernel's buffers, and the nature of devices serving as source and sink). And this, as we all know, is one of the reasons the standard I/O library came into existence. Mike Lesk, I surmise, understood that the "applications programmer" having knowledge of kernel internals was in general neither necessary nor desirable. What _should_ have happened, IMAO, is that as stdio.h came into existence and the commercialization and USG/PWB-ification of Unix became truly inevitable, is that Viewpoint 1 should have been salvaged for the benefit of continuing operating systems research and kernel development. But! We should have kept cat(1), and let it grow as many flags as practical use demanded--_except_ for `-u`--and at the _same time_ developed a new kcat(1) command that really was just a thin wrapper around system calls. Then you'd be a lot closer to measuring what the kernel was really doing, what you were paying for it, and you could still boast of your elegance in OS textbooks. I concede that the name "kcat" would have been twice the length a certain prominent user of the Unix kernel would have tolerated. Maybe "kc" would have been better. The remaining 61 alphanumeric sigils that might follow the 'k' would have been reserved for other exercises of the kernel interface. If your kernel is sufficiently lean,[3] 62 cases exercising it ought to be enough for anybody. Regards, Branden [1] https://news.ycombinator.com/item?id=29082014 [2] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1 [3] https://dl.acm.org/doi/10.1145/224056.224075 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From ggm at algebras.org Wed May 15 08:08:49 2024 From: ggm at algebras.org (George Michaelson) Date: Wed, 15 May 2024 08:08:49 +1000 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: Message-ID: Maybe dd is the right place to decide how to buffer? It appears to understand thats part of it's role. I use mbuffer, and I have absolutely no idea if its proffered buffer, scatter/gather, SETSOCKOPT behaviour does or does not improve things but I use it, even though netcat exists... 
G From tuhs at tuhs.org Wed May 15 08:34:37 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Tue, 14 May 2024 15:34:37 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: Message-ID: Buffering is used all over the place. Even serial devices use a 16 byte of buffer -- all to reduce the cost of per unit (character, disk block or packet etc.) processing or to smooth data flow or to utilize the available bandwidth. But in such applications the receiver/sender usually has a way of getting an alert when the FIFO has data/is empty. As long as you provide that you can compose more complex network of components. Imagine components connected via FIFOs that provide empty, almost empty, almost full, full signals. And may be more in case of lossy connections. [Though at a lower level you'd model these fifo as components too so at that level there'd be *no* buffering! Sort of like Carl Hewitt's Actor model!] Your complaint seems more about how buffers are currently used and where the "network" of components are dynamically formed. > On May 13, 2024, at 6:34 AM, Douglas McIlroy wrote: > > So fork() is a significant nuisance. How about the far more ubiquitous problem of IO buffering? > > On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote: > > But it does come down to the same argument as > > https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf > > The Microsoft manifesto says that fork() is an evil hack. One of the cited evils is that one must remember to flush output buffers before forking, for fear it will be emitted twice. But buffering is the culprit, not the victim. Output buffers must be flushed for many other reasons: to avoid deadlock; to force prompt delivery of urgent output; to keep output from being lost in case of a subsequent failure. Input buffers can also steal data by reading ahead into stuff that should go to another consumer. In all these cases buffering can break compositionality. Yet the manifesto blames an instance of the hazard on fork()! > > To assure compositionality, one must flush output buffers at every possible point where an unknown downstream consumer might correctly act on the received data with observable results. And input buffering must never ingest data that the program will not eventually use. These are tough criteria to meet in general without sacrificing buffering. > > The advent of pipes vividly exposed the non-compositionality of output buffering. Interactive pipelines froze when users could not provide input that would force stuff to be flushed until the input was informed by that very stuff. This phenomenon motivated cat -u, and stdio's convention of line buffering for stdout. The premier example of input buffering eating other programs' data was mitigated by "here documents" in the Bourne shell. > > These precautions are mere fig leaves that conceal important special cases. The underlying evil of buffered IO still lurks. The justification is that it's necessary to match the characteristics of IO devices and to minimize system-call overhead. The former necessity requires the attention of hardware designers, but the latter is in the hands of programmers. What can be done to mitigate the pain of border-crossing into the kernel? L4 and its ilk have taken a whack. An even more radical approach might flow from the "whitepaper" at www.codevalley.com . > > In any even the abolition of buffering is a grand challenge. 
> > Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From crossd at gmail.com Thu May 16 00:42:33 2024 From: crossd at gmail.com (Dan Cross) Date: Wed, 15 May 2024 10:42:33 -0400 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240514111032.2kotrrjjv772h5f4@illithid> References: <20240514111032.2kotrrjjv772h5f4@illithid> Message-ID: On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson wrote: > [snip] > Viewpoint 1: Perspective from Pike's Peak Clever. > Elementary Unix commands should be elementary. Unix is a kernel. > Programs that do simple things with system calls should remain simple. > This practices makes the system (the kernel interface) easier to learn, > and to motivate and justify to others. Programs therefore test the > simplicity and utility of, and can reveal flaws in, the set of > primitives that the kernel exposes. This is valuable stuff for a > research organization. "Research" was right there in the CSRC's name. I believe this is at once making a more complex argument than was proffered, and at the same misses the contextual essence that Unix was created in. > Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1] > > cat(1)'s man page did not advertise the traits in the foregoing > viewpoint as objectives, and never did.[2] Its avowed purpose was to > copy, without interruption or separation, 1..n files from storage to and > output channel or stream (which might be redirected). > > I don't need to tell convince that this is a worthwhile application. > But when we think about the many possible ways--and destinations--a > person might have in mind for that I/O channel, we have to face the > necessity of buffering or performance goes through the floor. > > It is 1978. Some VMS I don't know about that; VMS IO is notably slower than Unix IO by default. Unlike VMS, Unix uses the buffer cache to serialize access to the underlying storage device(s). Ironically, caching here is a major win, not just for speed, but to make it relatively easy to reason about the state of a block, since that state is removed from the minutiae of the underlying storage device and instead handled in the bio layer. Treating the block cache as a fixed-size pool yields a relatively simple state machine for synchronizing between the in-memory and on-disk representations of data. >[snip] > And this, as we all know, is one of the reasons the standard I/O library > came into existence. Mike Lesk, I surmise, understood that the > "applications programmer" having knowledge of kernel internals was in > general neither necessary nor desirable. I'm not sure about that. I suspect that the justification _may_ have been more along the lines of noting that many programs implemented their own, largely similar buffering strategies, and that it was preferable to centralize those into a single library, and also noting that building some kinds of programs was inconvenient using raw system calls. For instance, something like `gets` is handy, but is _annoying_ to write using just read(2). It can obviously be done, but if I don't have to, I'd prefer not to. > [snip] > We should have kept cat(1), and let it grow as many flags as practical > use demanded--_except_ for `-u`--and at the _same time_ developed a new > kcat(1) command that really was just a thin wrapper around system calls. > Then you'd be a lot closer to measuring what the kernel was really > doing, what you were paying for it, and you could still boast of your > elegance in OS textbooks. 
> [snip] Here's where I think this misses the mark: this focuses too much on the idea that simple programs exist as to be tests for, and exemplars of, the kernel system call interface, but what evidence do you have for that? A simpler explanation is that simple programs are easier to write, easier to read, easier to reason about, test, and examine for correctness. Unix amplified this with Doug's "garden hoses of data" idea and the advent of pipes; here, it was found that small, simple programs could be combined in often surprisingly unanticipated ways. Unix built up a philosophy about _how_ to write programs that was rooted in the problems that were interesting when Unix was first created. Something we often forget is that research systems are built to address problems that are interesting _to the researchers who build them_. This context can shape a system, and we see that with Unix: a highly synchronous system call interface, because overly elaborate async interfaces were hard to program; a simple file abstraction that was easy to use (open/creat/read/write/close/seek/stat) because files on other contemporary systems were baroque things that were difficult to use; a simple primitive for the creation of processes because, again, on other systems processes were very heavy, complicated things that were difficult to use. Unix took problems related to IO and processes and made them easy. By the 80s, these were pretty well understood, so focus shifted to other things (languages, networking, etc). Unix is one of those rare beasts that escaped the lab and made it out there in the wild. It became the workhorse that beget a whole two or three generations of commercial work; it's unsurprising that when the web explosion happened, Unix became the basis for it: it was there, it was familiar, and by then it wasn't a research project anymore, but a basis for serious commercial work. That it has retained the original system call interface is almost incidental; perhaps that fits with your brocolli-man analogy. - Dan C. From g.branden.robinson at gmail.com Thu May 16 02:42:12 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Wed, 15 May 2024 11:42:12 -0500 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: <20240514111032.2kotrrjjv772h5f4@illithid> Message-ID: <20240515164212.beswgy4h2nwvbdck@illithid> Hi Dan, Thanks for the considered response. I was beginning to fear that my musing was of moronically minimal merit. At 2024-05-15T10:42:33-0400, Dan Cross wrote: > On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson > wrote: > > [snip] > > Viewpoint 1: Perspective from Pike's Peak > > Clever. If Rob's never heard _that_ one before, I am deeply disappointed. > > Elementary Unix commands should be elementary. Unix is a kernel. > > Programs that do simple things with system calls should remain > > simple. This practices makes the system (the kernel interface) > > easier to learn, and to motivate and justify to others. Programs > > therefore test the simplicity and utility of, and can reveal flaws > > in, the set of primitives that the kernel exposes. This is valuable > > stuff for a research organization. "Research" was right there in > > the CSRC's name. > > I believe this is at once making a more complex argument than was > proffered, and at the same misses the contextual essence that Unix was > created in. My understanding of that context is, "a pleasant environment for software development" (McIlroy)[0]. 
My notion of software development entails (when not under managerial pressure to bang something together for the exploitation of "market advantage") analysis and reanalysis of software components to make them more efficient and more composable. As a response to the perceived bloat of Multics, the development of the Unix kernel absolutely involved much critical reappraisal of what _needed_ to be in a kernel, and of which services were so essential that they must be offered. As a microkernel Kool-Aid drinker, I tend to view Unix's origin in that light, which was reinforced by the severe limitations of the PDP-7 where it was born. Possibly many of the decisions about where to draw the kernel service/userspace service line we made by instinct or seasoned judgment, but the CSRC being a research organization, I'd be surprised if matters of empirical measurement were far from top of mind. It's a shame we don't have more insight into Thompson's development process, especially in those early days. I think we have a tendency to conceive of Unix as having sprung from his fingers already crystallized, like a mineral Athena from the forehead of Zeus. I would wager (and welcome correction if he has the patience) that he made and reversed decisions based on the experience of using the system. Some episodes in McIlroy's "A Research Unix Reader" illustrate that this was a recurring feature of its _later_ development, so why not in the incubation period? That, too, is empirical measurement, even if informal. Many revisions are made in software because we find in testing that something is "too damn slow", or runs the system out of memory too often. So to summarize, I want to push back on your counter here. Making little things to measure system features is a salutary practice in OS development. Stevens's _Advanced Programming in the Unix Environment_ is, shall we say, tricked out with exhibits along these lines. The author's dedication to _measurement_ as opposed to partisan opinion is, I think, a major factor in its status as a landmark work and as nigh-essential reading for the serious Unix developer to this day. Put differently, why would anyone _care_ about making cat(1) simple if one didn't have these objectives in mind? > > Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1] > > > > cat(1)'s man page did not advertise the traits in the foregoing > > viewpoint as objectives, and never did.[2] Its avowed purpose was > > to copy, without interruption or separation, 1..n files from storage > > to and output channel or stream (which might be redirected). > > > > I don't need to tell convince that this is a worthwhile application. > > But when we think about the many possible ways--and destinations--a > > person might have in mind for that I/O channel, we have to face the > > necessity of buffering or performance goes through the floor. > > > > It is 1978. Some VMS > > I don't know about that; VMS IO is notably slower than Unix IO by > default. Unlike VMS, Unix uses the buffer cache to serialize access to > the underlying storage device(s). I must confess I have little experience with VMS (and none more recent than 30 years ago) and offered it as an example mainly because it was actually around in 1978 (if still fresh from the foundry). My personal backstory is much more along the lines of my other example, CP/M on toy computers (8-bit data bus pffffffft, right?). 
> Ironically, caching here is a major win, not just for speed, but to > make it relatively easy to reason about the state of a block, since > that state is removed from the minutiae of the underlying storage > device and instead handled in the bio layer. Treating the block cache > as a fixed-size pool yields a relatively simple state machine for > synchronizing between the in-memory and on-disk representations of > data. I entirely agree with this. I contemplated following up Bakul Shah's post with a mention of Jim Gettys's work on bufferbloat.[1] So let me do that here, and venture the opinion that a "buffer" as popularly conceived and implemented (more or less just a hunk of memory to house data) is too damn dumb a data structure for many of the uses to which it is put. If/when people address these problems, they do what the Unix buffer cache did; they elaborate it with state. This is a repeated design pattern: see SIGURG for example. Off the top of my head I perceive three circumstances that buffers often need to manage. 1. Avoidance of underrun. Such were the joys of CD-R burning. But also important in streaming or other real-time applications to avoid interruption. Essentially you want to be able to say, "I'm running out of data at the current rate, please supply more ASAP". 2. Avoidance of overrun. The problems of modem-like flow control are familiar to most. An important insight here, reinforced if not pioneered by Gettys, is that "just making the buffer bigger", the brogrammer solution, is not always the wise choice. 3. Cancellation. Familiar to all as SIGPIPE. Sometimes all of the data in the buffer is invalidated. The sender needs to stop transmitting ASAP, and the receiver can discard whatever it has. I apologize for the armchair approach. I have no doubt that much literature exists that has covered this stuff far more rigorously. And yet much of that knowledge has not made its way down the mountain into practice. That, I think, was at least part of Doug's point. Academics may have considered the topic adequately, but practitioners are too often solving problems as if it's 1972. > >[snip] > > And this, as we all know, is one of the reasons the standard I/O > > library came into existence. Mike Lesk, I surmise, understood that > > the "applications programmer" having knowledge of kernel internals > > was in general neither necessary nor desirable. > > I'm not sure about that. I suspect that the justification _may_ have > been more along the lines of noting that many programs implemented > their own, largely similar buffering strategies, and that it was > preferable to centralize those into a single library, and also noting > that building some kinds of programs was inconvenient using raw system > calls. For instance, something like `gets` is handy, An interesting choice given its notoriety as a nuclear landmine of insecurity. ;-) > but is _annoying_ to write using just read(2). It can obviously be > done, but if I don't have to, I'd prefer not to. I think you are justifying why stdio was written _as a library_, as your points seem to be pretty typical examples of why we move code thither from applications. My emphasis is a little different: why was buffered I/O in particular (when it could so easily have been string handling) the nucleus of what would be become a large standard library with its toes in many waters, so huge that projects like uclibc and musl arose for the purpose of (in part) chopping back out the stuff they felt they didn't need? 
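To make Dan's point about `gets` concrete, here is a sketch of mine (not Lesk's code) of a gets()-like line reader built on nothing but read(2). Reading one byte per system call is the easy-but-slow version; doing it fast means every program grows its own buffer, offset, and refill logic, which is exactly the bookkeeping stdio centralized.

/* A sketch, assumptions mine: a line reader over raw read(2).
 * Reads up to max-1 bytes into buf, stopping after a newline;
 * NUL-terminates.  Returns bytes stored, or -1 on read error. */
#include <unistd.h>

static int readline(int fd, char *buf, int max)
{
    int n = 0;
    char c;
    while (n < max - 1) {
        ssize_t r = read(fd, &c, 1);   /* one system call per character */
        if (r < 0)
            return -1;
        if (r == 0)
            break;                     /* end of file */
        buf[n++] = c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}

One read(2) per character is the kind of cost this thread started by complaining about; the buffered alternative is not hard to write, but once several programs carry their own copy of it, a shared library earns its keep.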
My _claim_ is that stdio.h was the first piece of the library to walk upright because the need for it was most intense. More so than with strings; in fact we've learned that Nelson's original C string library was tricky to use well, was often elaborated by others in unfortunate ways.[7] But there was no I/O at all without going through the kernel, and while there were many ways to get that job done, the best leveraged knowledge of what the kernel had to work with. And yet, the kernel might get redesigned. Could stdio itself have been done better? Korn and Vo tried.[8] > Here's where I think this misses the mark: this focuses too much on > the idea that simple programs exist as to be tests for, and exemplars > of, the kernel system call interface, but what evidence do you have > for that? A little bit of experience, long after the 1970s, of working with automated tests for the seL4 microkernel. > A simpler explanation is that simple programs are easier to > write, easier to read, easier to reason about, test, and examine for > correctness. All certainly true. But these things are just as true of programs that don't directly make system calls at all. cat(1), as ideally envisioned by Pike (if I understand the Platonic ideal of his position correctly), not only makes system calls, but dirties its hands with the standard library as little as possible (if you recognize no options, you need neither call nor reimplement getopt(3)) and certainly not for the central task. Again I think we are not so much disagreeing as much as I'm finding out that I didn't adequately emphasize the distinctions I was making. > Unix amplified this with Doug's "garden hoses of data" idea and the > advent of pipes; here, it was found that small, simple programs could > be combined in often surprisingly unanticipated ways. Agreed; but given that pipes-as-a-service are supplied by the _kernel_, we are once again talking about system calls. One of the projects I never got off the ground with seL4 was a reconsideration from first principles of what sorts of more or less POSIXish buffering and piping mechanisms should be offered (in userland of course). For those who are scandalized that a microkernel doesn't offer pipes itself, see this Heiser piece on "IPC" in that system.[2] > Unix built up a philosophy about _how_ to write programs that was > rooted in the problems that were interesting when Unix was first > created. Something we often forget is that research systems are built > to address problems that are interesting _to the researchers who build > them_. I agree. > This context can shape a system, and we see that with Unix: a > highly synchronous system call interface, because overly elaborate > async interfaces were hard to program; And still are, apparently even without the qualifier "overly elaborate". ...though Go (and JavaScript?) fans may disagree. > a simple file abstraction that was easy to use > (open/creat/read/write/close/seek/stat) because files on other > contemporary systems were baroque things that were difficult to use; Absolutely. It's a truism in the Unix community that it's possible to simulated record-oriented storage and retrieval on top of a byte stream, but hard to do the converse. Though, being a truism, it might be worthwhile to critically reconsider it and more rigorously establish how we know what we think we know. That's another reason I endorse the microkernel mission. Let's lower the cost of experimentation on parts of the system that of themselves don't demand privilege. 
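Returning for a moment to the byte-stream truism a few paragraphs up, here is a minimal sketch (assumptions and names mine; RECLEN and the file layout are arbitrary, not any real record format) of fixed-length records layered on an ordinary Unix byte stream with lseek() and read():

/* Fixed-length "records" simulated on a byte stream -- a sketch only. */
#include <unistd.h>
#include <sys/types.h>

#define RECLEN 80    /* every record is exactly 80 bytes */

/* Read record number recno into buf; returns bytes read or -1 on error. */
static ssize_t get_record(int fd, off_t recno, char buf[RECLEN])
{
    if (lseek(fd, recno * RECLEN, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, RECLEN);
}

Going the other way -- presenting an arbitrary byte stream on top of a record-oriented store -- is where the pain lives, which is the point of the truism.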
It's a highly concurrent, NUMA world out there. > a simple primitive for the creation of processes because, again, on > other systems processes were very heavy, complicated things that were > difficult to use. It is with some dismay that I look at what they are, _on Unix_, today. https://github.com/torvalds/linux/blob/1b294a1f35616977caddaddf3e9d28e576a1adbc/include/linux/sched.h#L748 https://github.com/openbsd/src/blob/master/sys/sys/proc.h#L138 Contrast: https://github.com/jeffallen/xv6/blob/master/proc.h#L65 > Unix took problems related to IO and processes and made them easy. By > the 80s, these were pretty well understood, so focus shifted to other > things (languages, networking, etc). True, but beside my point. Pike's point about cat and its flags was, I think, a call to reconsider more fundamental things. To question what we thought we knew--about how best to design core components of the system, for example. Do we really need the efflorescence of options that perfuses not simply the GNU versions of such components (a popular sink for abuse), but Busybox and *BSD implementations as well? Every developer of such a component should consider the cost/benefit ratio of flags, and then RE-consider them at intervals. Even at the cost of backward compatibility. (Deprecation cycles and mitigation/migration plans are good.) > Unix is one of those rare beasts that escaped the lab and made it out > there in the wild. It became the workhorse that beget a whole two or > three generations of commercial work; it's unsurprising that when the > web explosion happened, Unix became the basis for it: it was there, it > was familiar, and by then it wasn't a research project anymore, but a > basis for serious commercial work. Yes, and in a sense this success has cost all of us.[3][4][5] > That it has retained the original system call interface is almost > incidental; In _structure_, sure; in detail, I'm not sure this claim withstands scrutiny. Just _count_ the system calls we have today vs. V6 or V7. > perhaps that fits with your brocolli-man analogy. I'm unfamiliar with this metaphor. It makes me wonder how to place it in company with the requirements documents that led to the Ada language: Strawman, Woodenman, Ironman, and Steelman. At least it's likely better eating than any of those. ;-) Since no one else ever says it on this list, let me point out what a terrific and unfairly maligned language Ada is. In reading the minutes of the latest WG14 meeting[6] I marvel anew at how C has over time slowly, slowly accreted type- and memory-safety features that Ada had in 1983 (or even in 1980, before its formal standardization). Regards, Branden [0] https://www.gnu.org/software/groff/manual/groff.html.node/Background.html [1] https://gettys.wordpress.com/category/bufferbloat/ [2] https://microkerneldude.org/2019/03/07/how-to-and-how-not-to-use-sel4-ipc/ [3] https://tianyin.github.io/misc/irrelevant.pdf (guess who) [4] https://www.youtube.com/watch?v=36myc8wQhLo (Timothy Roscoe) [5] https://queue.acm.org/detail.cfm?id=3212479 (David Chisnall) [6] https://www.open-std.org/JTC1/sc22/wg14/www/docs/n3227.htm Skip down to section 5. Note particularly `_Optional`. [7] https://www.symas.com/post/the-sad-state-of-c-strings [8] https://www.semanticscholar.org/paper/SFIO%3A-Safe-Fast-String-File-IO-Korn-Vo/8014266693afda38a0a177a9b434fedce98eb7de -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From dave at horsfall.org Thu May 16 16:34:54 2024 From: dave at horsfall.org (Dave Horsfall) Date: Thu, 16 May 2024 16:34:54 +1000 (EST) Subject: [TUHS] Be there a "remote diff" utility? Message-ID: Every so often I want to compare files on remote machines, but all I can do is to fetch them first (usually into /tmp); I'd like to do something like: rdiff host1:file1 host2:file2 Breathes there such a beast? I see that Penguin/OS has already taken "rdiff" which doesn't seem to do what I want. Think of it as an extension to the Unix philosophy of "Everything looks like a file"... -- Dave From arnold at skeeve.com Thu May 16 16:51:43 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Thu, 16 May 2024 00:51:43 -0600 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: Message-ID: <202405160651.44G6pi8G018059@freefriends.org> Maybe diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) ? That could be put into a shell script that does the approriate text manipulations on the original $1 and $2. HTH, Arnold Dave Horsfall wrote: > Every so often I want to compare files on remote machines, but all I can > do is to fetch them first (usually into /tmp); I'd like to do something > like: > > rdiff host1:file1 host2:file2 > > Breathes there such a beast? I see that Penguin/OS has already taken > "rdiff" which doesn't seem to do what I want. > > Think of it as an extension to the Unix philosophy of "Everything looks > like a file"... > > -- Dave From ralph at inputplus.co.uk Thu May 16 17:33:51 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Thu, 16 May 2024 08:33:51 +0100 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: <202405160651.44G6pi8G018059@freefriends.org> References: <202405160651.44G6pi8G018059@freefriends.org> Message-ID: <20240516073351.267351FAE3@orac.inputplus.co.uk> Hi, I've set ‘mail-followup-to: coff at tuhs.org’. > > Every so often I want to compare files on remote machines, but all > > I can do is to fetch them first (usually into /tmp); I'd like to do > > something like: > > > > rdiff host1:file1 host2:file2 > > > > Breathes there such a beast? No, nor should there. It would be slain less it beget rcmp, rcomm, rpaste, ... > > Think of it as an extension to the Unix philosophy of "Everything > > looks like a file"... Then make remote files look local as far as their access is concerned. Ideally at the system-call level. Less ideal, at libc.a. > Maybe > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) This is annoyingly noisy if the remote SSH server has sshd_config(5)'s ‘Banner’ set which spews the contents of a file before authentication, e.g. the pointless This computer system is the property of ... Disconnect NOW if you have not been expressly authorised to use this system. Unauthorised use is a criminal offence under the Computer Misuse Act 1990. Communications on or through ...uk's computer systems may be monitored or recorded to secure effective system operation and for other lawful purposes. It appears on stderr so doesn't upset the diff but does clutter. And discarding stderr is too sloppy. -- Cheers, Ralph. From ggm at algebras.org Thu May 16 18:59:42 2024 From: ggm at algebras.org (George Michaelson) Date: Thu, 16 May 2024 18:59:42 +1000 Subject: [TUHS] Be there a "remote diff" utility? 
In-Reply-To: <20240516073351.267351FAE3@orac.inputplus.co.uk> References: <202405160651.44G6pi8G018059@freefriends.org> <20240516073351.267351FAE3@orac.inputplus.co.uk> Message-ID: Sshfs G On Thu, 16 May 2024, 5:34 pm Ralph Corderoy, wrote: > Hi, > > I've set ‘mail-followup-to: coff at tuhs.org’. > > > > Every so often I want to compare files on remote machines, but all > > > I can do is to fetch them first (usually into /tmp); I'd like to do > > > something like: > > > > > > rdiff host1:file1 host2:file2 > > > > > > Breathes there such a beast? > > No, nor should there. It would be slain less it beget rcmp, rcomm, > rpaste, ... > > > > Think of it as an extension to the Unix philosophy of "Everything > > > looks like a file"... > > Then make remote files look local as far as their access is concerned. > Ideally at the system-call level. Less ideal, at libc.a. > > > Maybe > > > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) > > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s > ‘Banner’ set which spews the contents of a file before authentication, > e.g. the pointless > > This computer system is the property of ... > > Disconnect NOW if you have not been expressly authorised to use this > system. Unauthorised use is a criminal offence under the Computer > Misuse Act 1990. > > Communications on or through ...uk's computer systems may be > monitored or recorded to secure effective system operation and for > other lawful purposes. > > It appears on stderr so doesn't upset the diff but does clutter. > And discarding stderr is too sloppy. > > -- > Cheers, Ralph. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnold at skeeve.com Thu May 16 19:01:11 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Thu, 16 May 2024 03:01:11 -0600 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: <20240516073351.267351FAE3@orac.inputplus.co.uk> References: <202405160651.44G6pi8G018059@freefriends.org> <20240516073351.267351FAE3@orac.inputplus.co.uk> Message-ID: <202405160901.44G91CN0007274@freefriends.org> Ralph Corderoy wrote: > > Maybe > > > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) > > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s > ‘Banner’ set which spews the contents of a file before authentication, > e.g. the pointless > > [....] > > It appears on stderr so doesn't upset the diff but does clutter. All true, I didn't think about that. > And discarding stderr is too sloppy. But the author of a personal script knows his/her remote machines and can decide if diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2 2>/dev/null) is appropriate or not. My main point was that the problem is easily solved with a few lines of shell, so no need for a utility, especially one written in C or some other compiled language. Thanks, Arnold From douglas.mcilroy at dartmouth.edu Thu May 16 22:31:27 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Thu, 16 May 2024 08:31:27 -0400 Subject: [TUHS] Be there a "remote diff" utility? Message-ID: With the disclaimer that I have never used it, I note that FUSE/sshfs allows one to mount remote file systems. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From rminnich at gmail.com Fri May 17 03:08:40 2024 From: rminnich at gmail.com (ron minnich) Date: Thu, 16 May 2024 10:08:40 -0700 Subject: [TUHS] Be there a "remote diff" utility? 
In-Reply-To: <202405160901.44G91CN0007274@freefriends.org> References: <202405160651.44G6pi8G018059@freefriends.org> <20240516073351.267351FAE3@orac.inputplus.co.uk> <202405160901.44G91CN0007274@freefriends.org> Message-ID: " The 9import tool allows an arbitrary file on a remote system, with the capability of running the Plan 9 exportfs(4) service, to be imported into the local name space. Usually file is a directory, so the complete file tree under the directory is made available." https://9fans.github.io/plan9port/man/man4/9import.html 9import host1 / /tmp/host1 9import host2 /tmp/host2 diff /tmp/host1/a/b/c /tmp/host2/a/b/c (or whatever command you want that works with files. No need for stuff like 'rdiff' etc.) stuff you take for granted on some systems ... I have the plan 9 cpu command working (written in Go) and I think it's time I get import working more widely, it's just too useful. On Thu, May 16, 2024 at 2:01 AM wrote: > Ralph Corderoy wrote: > > > > Maybe > > > > > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) > > > > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s > > ‘Banner’ set which spews the contents of a file before authentication, > > e.g. the pointless > > > > [....] > > > > It appears on stderr so doesn't upset the diff but does clutter. > > All true, I didn't think about that. > > > And discarding stderr is too sloppy. > > But the author of a personal script knows his/her remote machines > and can decide if > > diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2 > 2>/dev/null) > > is appropriate or not. > > My main point was that the problem is easily solved with a > few lines of shell, so no need for a utility, especially one > written in C or some other compiled language. > > Thanks, > > Arnold > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Fri May 17 03:12:27 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Thu, 16 May 2024 10:12:27 -0700 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: Message-ID: An interesting question is whether there exists a diff algorithm which *minimizes* data movement across the network. Assuming similar lengths, you can halve it by running the diff at one of the hosts but can one do better if the two files are fairly similar? Is this even a theoretical possibility? I don't see links to any such algorithm on wikipedia's diff page but I figured there might be someone on TUHS who may have speculated or know about this! Bakul -------------- next part -------------- An HTML attachment was scrubbed... URL: From rich.salz at gmail.com Fri May 17 04:12:12 2024 From: rich.salz at gmail.com (Rich Salz) Date: Thu, 16 May 2024 14:12:12 -0400 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: Message-ID: The rsync protocol might be appropriate. See https://www.samba.org/~tridge/phd_thesis.pdf and https://rsync.samba.org/tech_report/node2.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Fri May 17 04:38:37 2024 From: tuhs at tuhs.org (Ben Greenfield via TUHS) Date: Thu, 16 May 2024 14:38:37 -0400 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: Message-ID: I use rsync for that with the -n dry run flag > On May 16, 2024, at 2:12 PM, Rich Salz wrote: > > The rsync protocol might be appropriate. 
See https://www.samba.org/~tridge/phd_thesis.pdf and https://rsync.samba.org/tech_report/node2.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fariborz.t at gmail.com Fri May 17 04:51:11 2024 From: fariborz.t at gmail.com (Skip Tavakkolian) Date: Thu, 16 May 2024 11:51:11 -0700 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: <202405160651.44G6pi8G018059@freefriends.org> <20240516073351.267351FAE3@orac.inputplus.co.uk> <202405160901.44G91CN0007274@freefriends.org> Message-ID: To add to Ron's post, Plan 9's cpu exports the origination's namespace to the destination; by convention it is mounted on /mnt/term at destination. host1% cpu -h host2 host2% diff file2 /mnt/term/usr/me/file1 On Thu, May 16, 2024 at 10:09 AM ron minnich wrote: > " The 9import tool allows an arbitrary file on a remote system, with the > capability of running the Plan 9 exportfs(4) service, to be imported into > the local name space. Usually file is a directory, so the complete file > tree under the directory is made available." > https://9fans.github.io/plan9port/man/man4/9import.html > > 9import host1 / /tmp/host1 > 9import host2 /tmp/host2 > diff /tmp/host1/a/b/c /tmp/host2/a/b/c > (or whatever command you want that works with files. No need for stuff > like 'rdiff' etc.) > > stuff you take for granted on some systems ... > > I have the plan 9 cpu command working (written in Go) and I think it's > time I get import working more widely, it's just too useful. > > On Thu, May 16, 2024 at 2:01 AM wrote: > >> Ralph Corderoy wrote: >> >> > > Maybe >> > > >> > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) >> > >> > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s >> > ‘Banner’ set which spews the contents of a file before authentication, >> > e.g. the pointless >> > >> > [....] >> > >> > It appears on stderr so doesn't upset the diff but does clutter. >> >> All true, I didn't think about that. >> >> > And discarding stderr is too sloppy. >> >> But the author of a personal script knows his/her remote machines >> and can decide if >> >> diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2 >> 2>/dev/null) >> >> is appropriate or not. >> >> My main point was that the problem is easily solved with a >> few lines of shell, so no need for a utility, especially one >> written in C or some other compiled language. >> >> Thanks, >> >> Arnold >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.donner at gmail.com Fri May 17 05:51:45 2024 From: marc.donner at gmail.com (Marc Donner) Date: Thu, 16 May 2024 15:51:45 -0400 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: <202405160651.44G6pi8G018059@freefriends.org> <20240516073351.267351FAE3@orac.inputplus.co.uk> <202405160901.44G91CN0007274@freefriends.org> Message-ID: If I recall correctly, there is a combination of flags to rsync that will generate a report on a file, a set of files, or a set of directories to tell if they are different. I seem to recall DPK or RCT doing something clever with rsync and cksum to get this sort of result without having to stream a lot of data across the long-haul network back in the day. Best, Marc ===== nygeek.net mindthegapdialogs.com/home On Thu, May 16, 2024 at 2:51 PM Skip Tavakkolian wrote: > To add to Ron's post, Plan 9's cpu exports the origination's namespace to > the destination; by convention it is mounted on /mnt/term at destination. 
> > host1% cpu -h host2 > host2% diff file2 /mnt/term/usr/me/file1 > > > On Thu, May 16, 2024 at 10:09 AM ron minnich wrote: > >> " The 9import tool allows an arbitrary file on a remote system, with the >> capability of running the Plan 9 exportfs(4) service, to be imported into >> the local name space. Usually file is a directory, so the complete file >> tree under the directory is made available." >> https://9fans.github.io/plan9port/man/man4/9import.html >> >> 9import host1 / /tmp/host1 >> 9import host2 /tmp/host2 >> diff /tmp/host1/a/b/c /tmp/host2/a/b/c >> (or whatever command you want that works with files. No need for stuff >> like 'rdiff' etc.) >> >> stuff you take for granted on some systems ... >> >> I have the plan 9 cpu command working (written in Go) and I think it's >> time I get import working more widely, it's just too useful. >> >> On Thu, May 16, 2024 at 2:01 AM wrote: >> >>> Ralph Corderoy wrote: >>> >>> > > Maybe >>> > > >>> > > diff -u <(ssh host1 cat file1) <(ssh host2 cat file2) >>> > >>> > This is annoyingly noisy if the remote SSH server has sshd_config(5)'s >>> > ‘Banner’ set which spews the contents of a file before authentication, >>> > e.g. the pointless >>> > >>> > [....] >>> > >>> > It appears on stderr so doesn't upset the diff but does clutter. >>> >>> All true, I didn't think about that. >>> >>> > And discarding stderr is too sloppy. >>> >>> But the author of a personal script knows his/her remote machines >>> and can decide if >>> >>> diff -u <(ssh host1 cat file1 2>/dev/null) <(ssh host2 cat file2 >>> 2>/dev/null) >>> >>> is appropriate or not. >>> >>> My main point was that the problem is easily solved with a >>> few lines of shell, so no need for a utility, especially one >>> written in C or some other compiled language. >>> >>> Thanks, >>> >>> Arnold >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tytso at mit.edu Fri May 17 05:53:09 2024 From: tytso at mit.edu (Theodore Ts'o) Date: Thu, 16 May 2024 13:53:09 -0600 Subject: [TUHS] Be there a "remote diff" utility? In-Reply-To: References: Message-ID: <20240516195309.GB287325@mit.edu> On Thu, May 16, 2024 at 04:34:54PM +1000, Dave Horsfall wrote: > Every so often I want to compare files on remote machines, but all I can > do is to fetch them first (usually into /tmp); I'd like to do something > like: > > rdiff host1:file1 host2:file2 > > Breathes there such a beast? I see that Penguin/OS has already taken > "rdiff" which doesn't seem to do what I want. rdiff is something which someone on the internet had created, as part of the librsync package[1]. Thia isn't considered part of the core package (for example, Debian consideres it as an "optional" package) but rather something which various distributions have packaged for the convenience for their users. [1] https://librsync.github.io/ So if this is considered part of Penguin/OS, would we also consider "nethack" or X11 part of BSD 4.3, since it was available and often would be commonly installed on BSD 4.3 systems? Or are all packages which are in FreeBSD's ports "part of FreeBSD"? Or all packages in MacPorts part of MacOS? 
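As an aside on how the rsync family answers Bakul's question about minimizing data movement: the trick is to exchange block signatures rather than blocks. Below is a simplified sketch of mine of the "weak" rolling checksum in the spirit of the tech report Rich linked; it is not the librsync API, and the real protocol pairs it with a stronger hash.

/* Sketch of an rsync-style weak rolling checksum (simplified). */
#include <stddef.h>
#include <stdint.h>

struct rolling { uint32_t a, b; size_t len; };

/* Checksum of an initial window of len bytes. */
static void roll_init(struct rolling *r, const unsigned char *p, size_t len)
{
    r->a = r->b = 0;
    r->len = len;
    for (size_t i = 0; i < len; i++) {
        r->a = (r->a + p[i]) & 0xffff;
        r->b = (r->b + (uint32_t)(len - i) * p[i]) & 0xffff;
    }
}

/* Slide the window one byte: drop 'out', take in 'in'; O(1) per step. */
static void roll_step(struct rolling *r, unsigned char out, unsigned char in)
{
    r->a = (r->a - out + in) & 0xffff;
    r->b = (r->b - (uint32_t)r->len * out + r->a) & 0xffff;
}

static uint32_t roll_digest(const struct rolling *r)
{
    return r->a | (r->b << 16);
}

Each side computes these 32-bit signatures over block-sized windows and ships only the signatures; literal bytes cross the network only for the regions whose weak checksum (and then the strong hash) fails to match.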
In any case, the way I'd suggest doing this, as an extension to the Unix
philosophy of "Everything looks like a file", is to use FUSE:

	sshfs host1:/ ~/mnt/host1
	sshfs host2:/ ~/mnt/host2

	diff ~/mnt/host1/file1 ~/mnt/host2/file2

Cheers,

	- Ted

From douglas.mcilroy at dartmouth.edu  Sun May 19 04:07:38 2024
From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy)
Date: Sat, 18 May 2024 14:07:38 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
Message-ID:

I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is more".

	% less --help | wc
	    298

Last time I looked, the line count was about 220. Bloat is self-catalyzing.

What prompted me to look was another disheartening discovery. The "small special tool" Gnu diff has a 95-page manual! And it doesn't cover the option I was looking up (-h). To be fair, the manual includes related programs like diff3(1), sdiff(1) and patch(1), but the original manual for each fit on one page.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From brantley at coraid.com  Sun May 19 04:13:36 2024
From: brantley at coraid.com (Brantley Coile)
Date: Sat, 18 May 2024 14:13:36 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To:
References:
Message-ID: <1F0ECDE3-F653-48FB-AC95-FCE84C9B14A5@coraid.com>

I'm so grateful that we are able to work using Plan 9.

	aztec% wc -l /sys/src/cmd/p.c
	     90 /sys/src/cmd/p.c
	aztec%

So the size of Plan 9's paginator's source code is 208 lines smaller than the help for that paginator. And it has no options.

Just say'n.

bwc

> On May 18, 2024, at 2:07 PM, Douglas McIlroy wrote:
>
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is more".
> % less --help | wc
> 298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.
>
> What prompted me to look was another disheartening discovery. The "small special tool" Gnu diff has a 95-page manual! And it doesn't cover the option I was looking up (-h). To be fair, the manual includes related programs like diff3(1), sdiff(1) and patch(1), but the original manual for each fit on one page.
>
> Doug

From lm at mcvoy.com  Sun May 19 04:18:25 2024
From: lm at mcvoy.com (Larry McVoy)
Date: Sat, 18 May 2024 11:18:25 -0700
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To:
References:
Message-ID: <20240518181825.GT9216@mcvoy.com>

On Sat, May 18, 2024 at 02:07:38PM -0400, Douglas McIlroy wrote:
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is
> more".
> % less --help | wc
> 298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.
>
> What prompted me to look was another disheartening discovery. The "small
> special tool" Gnu diff has a 95-page manual! And it doesn't cover the
> option I was looking up (-h). To be fair, the manual includes related
> programs like diff3(1), sdiff(1) and patch(1), but the original manual for
> each fit on one page.

Normally I agree with Doug, but on documentation the "less is more" approach leaves me cold. It's fine when it is V7 cat that had maybe an option or two. GNU diff is a complex beast and it needs lots of docs.

Personally, I like it when man pages have a few usage examples; the BitKeeper docs are like that. But I'm ok with a terse man page with a SEE ALSO that points to a user guide.

Docs should be helpful.
--
---
Larry McVoy        Retired to fishing        http://www.mcvoy.com/lm/boat

From ralph at inputplus.co.uk  Sun May 19 04:22:18 2024
From: ralph at inputplus.co.uk (Ralph Corderoy)
Date: Sat, 18 May 2024 19:22:18 +0100
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To:
References:
Message-ID: <20240518182218.44ED921309@orac.inputplus.co.uk>

Hi Doug,

> % less --help | wc -l
> 298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.

Adding a --help option is a sign the man page lacks succinctness. It's the easier solution. Another point against adding --help: there's a second attempt to describe the source.

--
Cheers, Ralph.

From tuhs at tuhs.org  Sun May 19 04:31:41 2024
From: tuhs at tuhs.org (Peter Weinberger (温博格) via TUHS)
Date: Sat, 18 May 2024 14:31:41 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To:
References:
Message-ID:

There is a common problem in our field. When something (a command, a language, a library, etc) has a flaw, we say to ourselves, "This is not good. If we remove this flaw things will be better." as if it's an obvious truth. Sometimes it is true, but it's frequently questionable, and all too often it's just wrong.

We have no commonly accepted way of balancing complexity and function; usually complexity wins. When AI takes my job it will be because it's better at dealing with the mindless complexity of enormous APIs (and command-line flags).

On Sat, May 18, 2024 at 2:08 PM Douglas McIlroy wrote:
>
> I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is more".
> % less --help | wc
> 298
> Last time I looked, the line count was about 220. Bloat is self-catalyzing.
>
> What prompted me to look was another disheartening discovery. The "small special tool" Gnu diff has a 95-page manual! And it doesn't cover the option I was looking up (-h). To be fair, the manual includes related programs like diff3(1), sdiff(1) and patch(1), but the original manual for each fit on one page.
>
> Doug

From clemc at ccc.com  Sun May 19 04:52:05 2024
From: clemc at ccc.com (Clem Cole)
Date: Sat, 18 May 2024 14:52:05 -0400
Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools
In-Reply-To: <20240518181825.GT9216@mcvoy.com>
References: <20240518181825.GT9216@mcvoy.com>
Message-ID:

On Sat, May 18, 2024 at 2:18 PM Larry McVoy wrote:
> But I'm ok with a terse man page with a SEE ALSO that points to a user
> guide.

Only if the SEE ALSO has more complete and relevant information - otherwise, it degrades to VMS's famous "see figure 1" SPR.

> Docs should be helpful.

And easy to extract information from.

The issue to me comes back to the type of information each document is designed to give. I believe there are at least three types of docs:

1. Full manuals explain how something is built and how it is used. It helps to have the theory/principles of operation behind it and enough detail that, when done, you can understand why and how to use it.
2. Tutorials are excellent for someone trying to learn a new tool. Less theory - and more -- examples, showing off the features and how to do something.
3. Reference pages - need to be quick look-ups to remind someone how to use something - particularly for tools you don't use every day/generally don't memorize.

There are at least two more: an academic paper, which might be looked at as a start of #1, and full books, which take #1 to even more details.
Some academic papers indeed are fine manuals, and I can also argue the "manual" for some tools like awk/sed or, for that matter, yacc(1) are full books. But the idea is the >>complete<< review here. Tutorials and reference pages are supposed to easy helpful things -- but often miss the mark for the audience. To me, the problem is the wrong type of information is put in each one and, more importantly, people's expectations from the document. I love properly built manual pages - I detest things like the VMS/TOPS help command or gnu info pages. What I really hate is when there is no manual, but they tell you see the HELP command -- but which command or "subtopic" -- Yikes. The traditional man system is simple quick reminders, basic reference and I can move on. For instance, I needed to remember which C library has the definition these days for some set of functions and what are its error return codes -- man 3 functions, I'm done. Tutorials are funny. For some people, what they want to learn the ideas behind a tool. Typically, I don't need that as much as how this toll does some function. For instance, Apple is forcing me the learn lldb because the traditional debuggers derived from UCB's DBX are not there. It's similar to different. The man page is useful only for the command lines switches. It turns out the commands are all really long, but they have abbreviations and can be aliases. I found references to this in an lldb tutorial - but the tutorial is written to teach people more how to use a debugger to debug there code, and less how this debugger maps into the traditional functions. Hey I would like to find an cheat sheet or a set of aliases that map DBX/GDB into it -- but so far I've found nothing. So Larry -- I agree with you ... "*Docs should be helpful*," but I fear saying like that is a bit like the Faber College Motto/Founder's Quote: "*Knowledge is good*." ᐧ -------------- next part -------------- An HTML attachment was scrubbed... URL: From luther.johnson at makerlisp.com Sun May 19 05:19:32 2024 From: luther.johnson at makerlisp.com (Luther Johnson) Date: Sat, 18 May 2024 12:19:32 -0700 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240518181825.GT9216@mcvoy.com> Message-ID: <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com> Complexity is entropy. It occurs naturally in all human endeavor. It takes work to keep things small, orderly, and rational. But there is also a point where although a tool may be perfect in its conception and execution, from its own perspective, it is not as useful as a slightly more disorderly version that does what people want it to do. "Well they shouldn't want that !" is a common response. Then people write scripts to do for themselves what the tool doesn't do. Which might be right, but it might lead to a whole bunch of similar scripts to do the same thing, just a little differently And that's when we discover that it would have been better to have it in the one tool in the first place. So it's a back and forth, trial and error process. Eventually new balances get struck, and people of like minds and tastes find a new center, like Plan 9, or other things. 
Myself, I do tend to like tools that are smaller and more single-minded in their function (and that makes it possible to have documentation that is clearer and more concise), but as an example, sometimes I want the "-u" switch on diff, to make a patch, sometimes I don't, the default display is better for a quick review (but I think or expect that the essential diff engine is being shared). It's all a matter of judgment, but you can't apply good judgment until you have the experience gained from trying several alternatives. So things will get bloated up, and then they will need to be pruned and re-engineered, but hopefully we don't throw out the most helpful exceptions to the rule just because they don't fit with some sort of consistency aesthetic. On 05/18/2024 11:52 AM, Clem Cole wrote: > > > On Sat, May 18, 2024 at 2:18 PM Larry McVoy > wrote: > > But I'm ok with a terse man page with a SEE ALSO thatpoints to a > user guide. > > Only if the SEE ALSO has more complete and relevant information - > otherwise, it degrades to VMS's famous "see figure 1" SPR. > > > Docs should be helpful. > > And easy to extract information. > > The issue to be comes back to the type of information each document is > designed to give. I believe there at least three types of docs: > > 1. Full manuals explain how something is built and it it used. It > helps to have theory/principles of operations behind it and enough > detail when done, you can understand why and howto use it. > 2. Tutorials are excellent for someone trying to learn a new tool. > Less theory - and more -- examples, showing off the features and > how to do something. > 3. References pages - need to be quick look-ups to remind someone how > to use something - particularly for tools you don't use every > day/generally don't memorize. > > > There are at least two more: an academic paper which might be looked > at as a start of #1 and full books which take #1 to even more > details. Some academic papers indeed are fine manuals, and I can also > argue the "manual" for some tools like awk/sed or, for that matter, > yacc(1) are full books. But the idea is the >>complete<< review here. > > Tutorials and reference pages are supposed to easy helpful things -- > but often miss the mark for the audience. To me, the problem is the > wrong type of information is put in each one and, more importantly, > people's expectations from the document. I love properly builtmanual > pages - I detest things like the VMS/TOPS help command or gnu info > pages. What I really hate is when there is no manual, but they tell > you see the HELP command -- but which command or "subtopic" -- Yikes. > The traditional man system is simple quick reminders, > basicreferenceand I can move on. For instance, I needed to remember > which C library has the definition these days for some set of > functions and what are its error return codes -- man 3 functions, I'm > done. > > Tutorials are funny. For some people, what they want to learn the > ideas behind a tool. Typically, I don't need that as much as how this > toll does some function. For instance, Apple is forcing me the learn > lldb because the traditional debuggers derived from UCB's DBX are not > there. It's similar to different. The man page is useful only for > the command lines switches. It turns out the commands are all really > long, but they have abbreviations and can be aliases. 
I found > references to this in an lldb tutorial - but the tutorial is written > to teach people more how to use a debugger to debug there code, and > less how this debugger maps into the traditional functions. Hey I > would like to find an cheat sheet or a set of aliases that map DBX/GDB > into it -- but so far I've found nothing. > > So Larry -- I agree with you ... "/Docs should be helpful/," but I > fear saying like that is a bit like the Faber College > Motto/Founder's Quote: "/Knowledge is good/." > > > > ᐧ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuff at riddermarkfarm.ca Sun May 19 05:32:37 2024 From: stuff at riddermarkfarm.ca (Stuff Received) Date: Sat, 18 May 2024 15:32:37 -0400 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240518181825.GT9216@mcvoy.com> Message-ID: On 2024-05-18 14:52, Clem Cole wrote (in part): > Hey I would like to find > an cheat sheet or a set of aliases that map DBX/GDB into it -- but so > far I've found nothing. Does this help? https://lldb.llvm.org/use/map.html (I confess that learning lldb has been quite the chore.) S. From tuhs at tuhs.org Sun May 19 06:12:38 2024 From: tuhs at tuhs.org (segaloco via TUHS) Date: Sat, 18 May 2024 20:12:38 +0000 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com> References: <20240518181825.GT9216@mcvoy.com> <0309bfd8-3f85-e687-1500-c1e447599e83@makerlisp.com> Message-ID: On Saturday, May 18th, 2024 at 12:19 PM, Luther Johnson wrote: > Complexity is entropy. It occurs naturally in all human endeavor. It takes work to keep things small, orderly, and rational. But there is also a point where although a tool may be perfect in its conception and execution, from its own perspective, it is not as useful as a slightly more disorderly version that does what people want it to do. "Well they shouldn't want that !" is a common response. Then people write scripts to do for themselves what the tool doesn't do. Which might be right, but it might lead to a whole bunch of similar scripts to do the same thing, just a little differently And that's when we discover that it would have been better to have it in the one tool in the first place. > > So it's a back and forth, trial and error process. Eventually new balances get struck, and people of like minds and tastes find a new center, like Plan 9, or other things. > > Myself, I do tend to like tools that are smaller and more single-minded in their function (and that makes it possible to have documentation that is clearer and more concise), but as an example, sometimes I want the "-u" switch on diff, to make a patch, sometimes I don't, the default display is better for a quick review (but I think or expect that the essential diff engine is being shared). It's all a matter of judgment, but you can't apply good judgment until you have the experience gained from trying several alternatives. So things will get bloated up, and then they will need to be pruned and re-engineered, but hopefully we don't throw out the most helpful exceptions to the rule just because they don't fit with some sort of consistency aesthetic. > > On 05/18/2024 11:52 AM, Clem Cole wrote: > > > > > > > On Sat, May 18, 2024 at 2:18 PM Larry McVoy wrote: > > > > > But I'm ok with a terse man page with a SEE ALSO that points to a user guide. 
> > > > Only if the SEE ALSO has more complete and relevant information - otherwise, it degrades to VMS's famous "see figure 1" SPR. > > > > > > > > Docs should be helpful. > > > > And easy to extract information. > > > > > > The issue to be comes back to the type of information each document is designed to give. I believe there at least three types of docs: > > > > 1. Full manuals explain how something is built and it it used. It helps to have theory/principles of operations behind it and enough detail when done, you can understand why and how to use it. > > 2. Tutorials are excellent for someone trying to learn a new tool. Less theory - and more -- examples, showing off the features and how to do something. > > 3. References pages - need to be quick look-ups to remind someone how to use something - particularly for tools you don't use every day/generally don't memorize. > > > > > > > > There are at least two more: an academic paper which might be looked at as a start of #1 and full books which take #1 to even more details. Some academic papers indeed are fine manuals, and I can also argue the "manual" for some tools like awk/sed or, for that matter, yacc(1) are full books. But the idea is the >>complete<< review here. > > > > > > Tutorials and reference pages are supposed to easy helpful things -- but often miss the mark for the audience. To me, the problem is the wrong type of information is put in each one and, more importantly, people's expectations from the document. I love properly built manual pages - I detest things like the VMS/TOPS help command or gnu info pages. What I really hate is when there is no manual, but they tell you see the HELP command -- but which command or "subtopic" -- Yikes. The traditional man system is simple quick reminders, basic reference and I can move on. For instance, I needed to remember which C library has the definition these days for some set of functions and what are its error return codes -- man 3 functions, I'm done. > > > > > > Tutorials are funny. For some people, what they want to learn the ideas behind a tool. Typically, I don't need that as much as how this toll does some function. For instance, Apple is forcing me the learn lldb because the traditional debuggers derived from UCB's DBX are not there. It's similar to different. The man page is useful only for the command lines switches. It turns out the commands are all really long, but they have abbreviations and can be aliases. I found references to this in an lldb tutorial - but the tutorial is written to teach people more how to use a debugger to debug there code, and less how this debugger maps into the traditional functions. Hey I would like to find an cheat sheet or a set of aliases that map DBX/GDB into it -- but so far I've found nothing. > > > > > > So Larry -- I agree with you ... "Docs should be helpful," but I fear saying like that is a bit like the Faber College Motto/Founder's Quote: "Knowledge is good." > > > > > > > > > > ᐧ Facing ever-growing complexity, I often find myself turning strictly to the POSIX/SUS manpages for anything that has one, not only due to an interest in keeping things as portable as possible, but also admittedly out of some trepidation that the cool shiny specific feature of the week for a specific implementation doesn't have quite the same stabilizing standard behind it, and as such has the unlikely but real potential to change right out from under you in a new major version. 
Issuing 'man 1p' or 'man 3p' before most studying has become habit, turning to a vendors' docs only when necessary. Granted, no standardization of debuggers, assemblers, linkers, etc. makes this much trickier when working with embedded stuff or intensive diagnostics, so in that regard I've thus far been aligned with the GNU family of these components. For the sake of embedded devs, it would be nice if the as/ld/db set of utilities had some sort of guiding light driving disparate implementations. A particular example of divergent behavior wearing a familiar mask is the cc65 suite. The assembler and linker smell of predictable UNIX fare, but differ in a number of little, quite annoying ways, among them "export" instead of "globl", cheap labels based strictly on counts forward and backward rather than recyclable numeric labels, just little things. While a standard isn't the end all be all solution to everything, it certainly decreases at least some of the cognitive load, giving you a subset of behaviors to learn once, only turning to specifics when you've exhausted your options (or patience) with the intersections of various implementations. I see a domino effect in this sort of thing too, one basal tool diverges a bit from other versions, then folks who only ever use that implementation head down a new fork in the road, those in their "camp" follow, before long whatever that difference is happens to be entrenched in a number of folks' vocabularies. Like linguistic divergence, eventually the dialectical bits of how they work are no longer mutually intelligible. The Tower of Babel grows ever higher, only time will tell whether the varied, sometimes contradictory styles of architecture are a strength or weakness. Like evolution, oft times the most successful, sensible approaches prevail, but nature has a funny way of lapsing on our narrow understanding of "fitness" too. After all, most of what many of us use on the regular guarantees no "fitness for a particular purpose". Does this make stability an organic consequence or a happy accident? I know I couldn't say. - Matt G. From steffen at sdaoden.eu Sun May 19 06:33:19 2024 From: steffen at sdaoden.eu (Steffen Nurpmeso) Date: Sat, 18 May 2024 22:33:19 +0200 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: Message-ID: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Douglas McIlroy wrote in : |I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less is |more". | % less --help | wc | 298 |Last time I looked, the line count was about 220. Bloat is self-catalyzing. I do not buy that. You are working on Windows and in the meantime have switched to one of those graphical browser monsters (i think) where each instance has more code active than the entire Unix history altogether. less(1) can now Unicode, and that is not as easy with ISO/POSIX as it was on Plan9 for example which simply goes UTF-8 and has some (smart) lookup tables (now in go, more or less, last i looked), but that is not the whole picture of it. It can those ANSI / ISO 6429 color sequences that everybody wants, as you have them everywhere, even GNU's yacc, bison. The OpenBSD people took a port done by an OpenSolaris (i think, that scene, anyhow) guy, and together they stripped it down massively. But i do not use it, because after almost exactly a decade i got upstreamed to Nudelman's less(1) the necessary patches to have active hyperlinks on the terminal, in a normal Unix (roff mdoc) manual. 
(These work via OSC-8 escape sequences; it was a "15 files changed, 601 insertions(+), 9 deletions(-)" patch, which included careful quoting of file paths etc. for man(1) openings (ie, such code gets lengthy), but he did it differently a bit, and left off some things i wanted, included others (good), but if you use --mouse with his one then you have a real browser feeling. I have problems with --mouse, unfortunately, because when used you can no longer copy+paste -- he would need to add clipboard control in addition i'd say.., adding even more code.) You know, it may be viable for some tools, but for others, .. not. You say it yourself in your "A Research UNIX Reader": "Electronic mail was there from the start. Never satisfied with its exact behavior, everybody touched it at one time or another". In the meantime the IETF went grazy and produced masses of standards, and unfortunately each one adds a little bit that needs to be addressed differently, and all that needs documentation. Now mail is an extreme example. And almost a quarter of a century ago i wrote a small pager that even had a clock, and it required less CPU on a day with some scrolling than less/ncurses for a one time scroll through the document. But that pager is history, and less is still there, running everywhere, and being used by me dozens to hundreds time a day. Also with colours, with searching, and now also with ^O^N ^On * Search forward for (N-th) OSC8 hyperlink. ^O^P ^Op * Search backward for (N-th) OSC8 hyperlink. ^O^L ^Ol Jump to the currently selected OSC8 hyperlink. And prepared mdoc manuals can now display on a normal Unix terminal in a normal (actively OSC-8 supporting $PAGER) a TOC (at will, with links), and have external (man:, but also http: etc; man is built into less(1) -- yay!) links, too. For example here ∞ is an external, and † are internal links: The OpenSSL program ciphers(1)∞ should be referred to when creating a custom cipher list. Variables of interest for TLS in general are tls-ca-dir†, tls-ca-file†, tls-ca-flags†, tls-ca-no-defaults†, tls-config-file†, tls-config-module†, tls-config-pairs So ^O^L on that ciphers(1) opens a new man(1)ual instance. For all this functionality a program with 221K bytes is small: 221360 May 18 22:13 ...less* Also it starts up into interactive mode with --help. So you could have "full interactivity" and colours and mouse, and configurability to a large extend, which somehow has to be documented, in just 221 K bytes. I give in in that i try to have --help/-h and --long-help/-H, but sometimes that -h is only minimal, because a screenful of data is simply not enough to allow users to have a notion. So less could split the manual into a less.1 and a less-book.7. The same is true for bash, for sure. (And for my little mailer.) But things tend to divert, and it is hard enough to keep one manual in sync with the codebase, especially if you develop focused and expert-idiotized in a one man show. |What prompted me to look was another disheartening discovery. The "small |special tool" Gnu diff has a 95-page manual! And it doesn't cover the |option I was looking up (-h). To be fair, the manual includes related |programs like diff3(1), sdiff(1) and patch(1), but the original manual for |each fit on one page. 
--End of --steffen | |Der Kragenbaer, The moon bear, |der holt sich munter he cheerfully and one by one |einen nach dem anderen runter wa.ks himself off |(By Robert Gernhardt) From tuhs at tuhs.org Sun May 19 11:04:23 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Sat, 18 May 2024 18:04:23 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240515164212.beswgy4h2nwvbdck@illithid> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> Message-ID: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> On May 15, 2024, at 9:42 AM, G. Branden Robinson wrote: > > I contemplated following up Bakul Shah's > post with a mention of Jim Gettys's work on bufferbloat.[1] So let me > do that here, and venture the opinion that a "buffer" as popularly > conceived and implemented (more or less just a hunk of memory to house > data) is too damn dumb a data structure for many of the uses to which it > is put. Note that even if you remove every RAM buffer between the two endpoints of a TCP connection, you still have a "buffer". Example: If you have a 1Gbps pipe between SF & NYC, the pipe itself can store something like 3.5MB to 4MB in each direction! As the pipe can be lossy, you have to buffer up N (=bandwidth*latency) bytes at the sending end (until you see an ack for the previous Nth byte), if you want to utilize the full bandwidth. Now what happens if the sender program exits right after sending the last byte? Something on behalf of the sender has to buffer up and stick around to complete the TCP dance. Even if the sender is cat -u, the kernel or a network daemon process atop a microkernel has to buffer this data[1]. Unfortunately you can't abolish latency! But where to put buffers is certainly an engineering choice that can impact compositionality or other problems such as bufferbloat. [1] This brings up a separate point: in a microkernel even a simple thing like "foo | bar" would require a third process - a "pipe service", to buffer up the output of foo! You may have reduced the overhead of individual syscalls but you will have more of cross-domain calls! From lm at mcvoy.com Sun May 19 11:21:14 2024 From: lm at mcvoy.com (Larry McVoy) Date: Sat, 18 May 2024 18:21:14 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> Message-ID: <20240519012114.GU9216@mcvoy.com> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: > [1] This brings up a separate point: in a microkernel even a simple > thing like "foo | bar" would require a third process - a "pipe > service", to buffer up the output of foo! You may have reduced > the overhead of individual syscalls but you will have more of > cross-domain calls! Do any micro kernels do address space to address space bcopy()? -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From stewart at serissa.com Sun May 19 11:26:31 2024 From: stewart at serissa.com (Serissa) Date: Sat, 18 May 2024 21:26:31 -0400 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240519012114.GU9216@mcvoy.com> References: <20240519012114.GU9216@mcvoy.com> Message-ID: MIT's FOS (Factored Operating System) research OS did cross address space copies as part of its messaging machinery. 
HPC networking does this by using shared memory (Cross Memory Attach and XPMEM) in a traditional kernel. -L > On May 18, 2024, at 9:21 PM, Larry McVoy wrote: > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: >> [1] This brings up a separate point: in a microkernel even a simple >> thing like "foo | bar" would require a third process - a "pipe >> service", to buffer up the output of foo! You may have reduced >> the overhead of individual syscalls but you will have more of >> cross-domain calls! > > Do any micro kernels do address space to address space bcopy()? > -- > --- > Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From tuhs at tuhs.org Sun May 19 11:40:42 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Sat, 18 May 2024 18:40:42 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240519012114.GU9216@mcvoy.com> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> Message-ID: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> On May 18, 2024, at 6:21 PM, Larry McVoy wrote: > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: >> [1] This brings up a separate point: in a microkernel even a simple >> thing like "foo | bar" would require a third process - a "pipe >> service", to buffer up the output of foo! You may have reduced >> the overhead of individual syscalls but you will have more of >> cross-domain calls! > > Do any micro kernels do address space to address space bcopy()? mmapping the same page in two processes won't be hard but now you have complicated cat (or some iolib)! From tuhs at tuhs.org Sun May 19 11:50:57 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Sat, 18 May 2024 18:50:57 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> Message-ID: <80302C1F-99D7-4E5F-8656-2E7E67C40422@iitbombay.org> On May 18, 2024, at 6:40 PM, Bakul Shah wrote: > > On May 18, 2024, at 6:21 PM, Larry McVoy wrote: >> >> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: >>> [1] This brings up a separate point: in a microkernel even a simple >>> thing like "foo | bar" would require a third process - a "pipe >>> service", to buffer up the output of foo! You may have reduced >>> the overhead of individual syscalls but you will have more of >>> cross-domain calls! >> >> Do any micro kernels do address space to address space bcopy()? > > mmapping the same page in two processes won't be hard but now > you have complicated cat (or some iolib)! And there are other issues. As Doug said in his original message in this thread: "And input buffering must never ingest data that the program will not eventually use." Consider something like this: (echo 1; echo 2)|(read; cat) This will print 2. Emulating this with mmaped buffers and copying will not be easy.... From lm at mcvoy.com Sun May 19 12:02:56 2024 From: lm at mcvoy.com (Larry McVoy) Date: Sat, 18 May 2024 19:02:56 -0700 Subject: [TUHS] If forking is bad, how about buffering? 
In-Reply-To: <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> Message-ID: <20240519020256.GV9216@mcvoy.com> On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote: > On May 18, 2024, at 6:21???PM, Larry McVoy wrote: > > > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: > >> [1] This brings up a separate point: in a microkernel even a simple > >> thing like "foo | bar" would require a third process - a "pipe > >> service", to buffer up the output of foo! You may have reduced > >> the overhead of individual syscalls but you will have more of > >> cross-domain calls! > > > > Do any micro kernels do address space to address space bcopy()? > > mmapping the same page in two processes won't be hard but now > you have complicated cat (or some iolib)! I recall asking Linus if that could be done to save TLB entries, as in multiple processes map a portion of their address space (at the same virtual location) and then they all use the same TLB entries for that part of their address space. He said it couldn't be done because the process ID concept was hard wired into the TLB. I don't know if TLB tech has evolved such that a single process could have multiple "process" IDs associated with it in the TLB. I wanted it because if you could share part of your address space with another process, using the same TLB entries, then motivation for threads could go away (I've never been a threads fan but I acknowledge why you might need them). I was channeling Rob's "If you think you need threads, your processes are too fat". The idea of using processes instead of threads falls down when you consider TLB usage. And TLB usage, when you care about performance, is an issue. I could craft you some realistic benchmarks, mirroring real world work loads, that would kill the idea of replacing threads with processes unless they shared TLB entries. Think of a N-way threaded application, lots of address space used, that application uses all of the TLB. Now do that with N processes and your TLB is N times less effective. This was a conversation decades ago so maybe TLB tech now has solved this. I doubt it, if this was a solved problem I think every OS would say screw threads, just use processes and mmap(). The nice part of that model is you can choose what parts of your address space you want to share. That cuts out a HUGE swath of potential problems where another thread can go poke in a part of your address space that you don't want poked. -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From andreww591 at gmail.com Sun May 19 12:26:54 2024 From: andreww591 at gmail.com (Andrew Warkentin) Date: Sat, 18 May 2024 20:26:54 -0600 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240519012114.GU9216@mcvoy.com> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> Message-ID: On Sat, May 18, 2024 at 7:27 PM Larry McVoy wrote: > > Do any micro kernels do address space to address space bcopy()? > QNX and some L4-like kernels copy directly between address spaces. QNX copies between readv()/writev()-style vectors of arbitrary length. 
L4-like kernels have different forms of direct copy; Pistachio supports copying between a collection of "strings" that are limited to 4M each. seL4 on the other hand is limited to a single page-sized fixed buffer for each thread (I've been working on an as-yet unnamed fork of it that supports QNX-like vectors for the OS I'm working on; I gave up on my previous plan to use async queues and intermediary buffers to support arbitrary-length messages in user space, since that was turning out to be rather ugly and would have had a high risk of priority inversion). From tuhs at tuhs.org Sun May 19 12:28:03 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Sat, 18 May 2024 19:28:03 -0700 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240519020256.GV9216@mcvoy.com> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> <20240519020256.GV9216@mcvoy.com> Message-ID: <5216605C-37DD-4B39-9363-4DF9327FEEAB@iitbombay.org> On May 18, 2024, at 7:02 PM, Larry McVoy wrote: > > On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote: >> On May 18, 2024, at 6:21???PM, Larry McVoy wrote: >>> >>> On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: >>>> [1] This brings up a separate point: in a microkernel even a simple >>>> thing like "foo | bar" would require a third process - a "pipe >>>> service", to buffer up the output of foo! You may have reduced >>>> the overhead of individual syscalls but you will have more of >>>> cross-domain calls! >>> >>> Do any micro kernels do address space to address space bcopy()? >> >> mmapping the same page in two processes won't be hard but now >> you have complicated cat (or some iolib)! > > I recall asking Linus if that could be done to save TLB entries, as in > multiple processes map a portion of their address space (at the same > virtual location) and then they all use the same TLB entries for that > part of their address space. He said it couldn't be done because the > process ID concept was hard wired into the TLB. I don't know if TLB > tech has evolved such that a single process could have multiple "process" > IDs associated with it in the TLB. Two TLB entries can point to the same physical page. Is that not good enough? One process can give its address space a..b and the kernel (or the memory daemon) maps a..b to other process'es a'..b'. a..b may be associated with a file so any IO would have to be seen by both. > I wanted it because if you could share part of your address space with > another process, using the same TLB entries, then motivation for threads > could go away (I've never been a threads fan but I acknowledge why > you might need them). I was channeling Rob's "If you think you need > threads, your processes are too fat". > The idea of using processes instead of threads falls down when you > consider TLB usage. And TLB usage, when you care about performance, is > an issue. I could craft you some realistic benchmarks, mirroring real > world work loads, that would kill the idea of replacing threads with > processes unless they shared TLB entries. Think of a N-way threaded > application, lots of address space used, that application uses all of the > TLB. Now do that with N processes and your TLB is N times less effective. > > This was a conversation decades ago so maybe TLB tech now has solved this. 
> I doubt it, if this was a solved problem I think every OS would say screw > threads, just use processes and mmap(). The nice part of that model > is you can choose what parts of your address space you want to share. > That cuts out a HUGE swath of potential problems where another thread > can go poke in a part of your address space that you don't want poked. You can sort of evolve plan9's rfork to do a partial address share. The issue with process vs thread is the context switch time. Sharing pages doesn't change that. From andreww591 at gmail.com Sun May 19 12:53:39 2024 From: andreww591 at gmail.com (Andrew Warkentin) Date: Sat, 18 May 2024 20:53:39 -0600 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <20240519020256.GV9216@mcvoy.com> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> <20240519020256.GV9216@mcvoy.com> Message-ID: On Sat, May 18, 2024 at 8:03 PM Larry McVoy wrote: > > On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote: > > On May 18, 2024, at 6:21???PM, Larry McVoy wrote: > > > > > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: > > >> [1] This brings up a separate point: in a microkernel even a simple > > >> thing like "foo | bar" would require a third process - a "pipe > > >> service", to buffer up the output of foo! You may have reduced > > >> the overhead of individual syscalls but you will have more of > > >> cross-domain calls! > > > > > > Do any micro kernels do address space to address space bcopy()? > > > > mmapping the same page in two processes won't be hard but now > > you have complicated cat (or some iolib)! > > I recall asking Linus if that could be done to save TLB entries, as in > multiple processes map a portion of their address space (at the same > virtual location) and then they all use the same TLB entries for that > part of their address space. He said it couldn't be done because the > process ID concept was hard wired into the TLB. I don't know if TLB > tech has evolved such that a single process could have multiple "process" > IDs associated with it in the TLB. > > I wanted it because if you could share part of your address space with > another process, using the same TLB entries, then motivation for threads > could go away (I've never been a threads fan but I acknowledge why > you might need them). I was channeling Rob's "If you think you need > threads, your processes are too fat". > > The idea of using processes instead of threads falls down when you > consider TLB usage. And TLB usage, when you care about performance, is > an issue. I could craft you some realistic benchmarks, mirroring real > world work loads, that would kill the idea of replacing threads with > processes unless they shared TLB entries. Think of a N-way threaded > application, lots of address space used, that application uses all of the > TLB. Now do that with N processes and your TLB is N times less effective. > > This was a conversation decades ago so maybe TLB tech now has solved this. > I doubt it, if this was a solved problem I think every OS would say screw > threads, just use processes and mmap(). The nice part of that model > is you can choose what parts of your address space you want to share. 
> That cuts out a HUGE swath of potential problems where another thread > can go poke in a part of your address space that you don't want poked. > I've never been a fan of the rfork()/clone() model. With the OS I'm working on, rather than using processes that share state as threads, a process will more or less just be a collection of threads that share a command line and get replaced on exec(). All of the state usually associated with a process (e.g. file descriptor space, filesystem namespace, virtual address space, memory allocations) will instead be stored in separate container objects that can be shared between threads. It will be possible to share any of these containers between processes, or use different combinations between threads within a process. This would allow more control over what gets shared between threads/processes than rfork()/clone() because the state containers will appear in the filesystem and be explicitly bound to threads rather than being anonymous and only transferred on rfork()/clone(). Emulating rfork()/clone on top of this will be easy enough though. From mrochkind at gmail.com Sun May 19 18:30:32 2024 From: mrochkind at gmail.com (Marc Rochkind) Date: Sun, 19 May 2024 11:30:32 +0300 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> <20240519012114.GU9216@mcvoy.com> <767E78C5-E6E7-4CB5-889D-B4E0E5FBA085@iitbombay.org> <20240519020256.GV9216@mcvoy.com> Message-ID: Yes, many classic commands -- cat, cp, and others -- were sleekly and succinctly written. In part because they were devoid of error checking. I recall how annoying it was one time in the early 70s to cp a bunch of files to a file system that was out of space. As I grew older, my concept of what constituted elegant programming changed. UNIX was a *research* project, not a production system! At one of the first UNIX meetings, somebody from an OSS (operations support system) was talking about the limitations of UNIX when Doug asked, "Why are you using UNIX?" Marc On Sun, May 19, 2024, 5:54 AM Andrew Warkentin wrote: > On Sat, May 18, 2024 at 8:03 PM Larry McVoy wrote: > > > > On Sat, May 18, 2024 at 06:40:42PM -0700, Bakul Shah wrote: > > > On May 18, 2024, at 6:21???PM, Larry McVoy wrote: > > > > > > > > On Sat, May 18, 2024 at 06:04:23PM -0700, Bakul Shah via TUHS wrote: > > > >> [1] This brings up a separate point: in a microkernel even a simple > > > >> thing like "foo | bar" would require a third process - a "pipe > > > >> service", to buffer up the output of foo! You may have reduced > > > >> the overhead of individual syscalls but you will have more of > > > >> cross-domain calls! > > > > > > > > Do any micro kernels do address space to address space bcopy()? > > > > > > mmapping the same page in two processes won't be hard but now > > > you have complicated cat (or some iolib)! > > > > I recall asking Linus if that could be done to save TLB entries, as in > > multiple processes map a portion of their address space (at the same > > virtual location) and then they all use the same TLB entries for that > > part of their address space. He said it couldn't be done because the > > process ID concept was hard wired into the TLB. I don't know if TLB > > tech has evolved such that a single process could have multiple "process" > > IDs associated with it in the TLB. 
> > > > I wanted it because if you could share part of your address space with > > another process, using the same TLB entries, then motivation for threads > > could go away (I've never been a threads fan but I acknowledge why > > you might need them). I was channeling Rob's "If you think you need > > threads, your processes are too fat". > > > > The idea of using processes instead of threads falls down when you > > consider TLB usage. And TLB usage, when you care about performance, is > > an issue. I could craft you some realistic benchmarks, mirroring real > > world work loads, that would kill the idea of replacing threads with > > processes unless they shared TLB entries. Think of a N-way threaded > > application, lots of address space used, that application uses all of the > > TLB. Now do that with N processes and your TLB is N times less > effective. > > > > This was a conversation decades ago so maybe TLB tech now has solved > this. > > I doubt it, if this was a solved problem I think every OS would say screw > > threads, just use processes and mmap(). The nice part of that model > > is you can choose what parts of your address space you want to share. > > That cuts out a HUGE swath of potential problems where another thread > > can go poke in a part of your address space that you don't want poked. > > > > I've never been a fan of the rfork()/clone() model. With the OS I'm > working on, rather than using processes that share state as threads, a > process will more or less just be a collection of threads that share a > command line and get replaced on exec(). All of the state usually > associated with a process (e.g. file descriptor space, filesystem > namespace, virtual address space, memory allocations) will instead be > stored in separate container objects that can be shared between > threads. It will be possible to share any of these containers between > processes, or use different combinations between threads within a > process. This would allow more control over what gets shared between > threads/processes than rfork()/clone() because the state containers > will appear in the filesystem and be explicitly bound to threads > rather than being anonymous and only transferred on rfork()/clone(). > Emulating rfork()/clone on top of this will be easy enough though. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrochkind at gmail.com Sun May 19 18:39:32 2024 From: mrochkind at gmail.com (Marc Rochkind) Date: Sun, 19 May 2024 11:39:32 +0300 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Message-ID: UNIX man pages were almost universally accurate, complete, and succinct. That third admirable attribute gave me the opportunity to write *Advanced UNIX Programming*. So I wasn't complaining. Marc On Sat, May 18, 2024, 11:33 PM Steffen Nurpmeso wrote: > Douglas McIlroy wrote in > : > |I just revisited this ironic echo of Mies van der Rohe's aphorism, "Less > is > |more". > | % less --help | wc > | 298 > |Last time I looked, the line count was about 220. Bloat is > self-catalyzing. > > I do not buy that. > You are working on Windows and in the meantime have switched to > one of those graphical browser monsters (i think) where each > instance has more code active than the entire Unix history > altogether. 
> > less(1) can now Unicode, and that is not as easy with ISO/POSIX as > it was on Plan9 for example which simply goes UTF-8 and has some > (smart) lookup tables (now in go, more or less, last i looked), > but that is not the whole picture of it. > > It can those ANSI / ISO 6429 color sequences that everybody wants, > as you have them everywhere, even GNU's yacc, bison. > > The OpenBSD people took a port done by an OpenSolaris (i think, > that scene, anyhow) guy, and together they stripped it down > massively. > > But i do not use it, because after almost exactly a decade i got > upstreamed to Nudelman's less(1) the necessary patches to have > active hyperlinks on the terminal, in a normal Unix (roff mdoc) > manual. (These work via OSC-8 escape sequences; it was a "15 > files changed, 601 insertions(+), 9 deletions(-)" patch, which > included careful quoting of file paths etc. for man(1) openings > (ie, such code gets lengthy), but he did it differently a bit, and > left off some things i wanted, included others (good), but if you > use --mouse with his one then you have a real browser feeling. > I have problems with --mouse, unfortunately, because when used you > can no longer copy+paste -- he would need to add clipboard control > in addition i'd say.., adding even more code.) > > You know, it may be viable for some tools, but for others, .. not. > You say it yourself in your "A Research UNIX Reader": "Electronic > mail was there from the start. Never satisfied with its exact > behavior, everybody touched it at one time or another". > In the meantime the IETF went grazy and produced masses of > standards, and unfortunately each one adds a little bit that needs > to be addressed differently, and all that needs documentation. > Now mail is an extreme example. > > And almost a quarter of a century ago i wrote a small pager that > even had a clock, and it required less CPU on a day with some > scrolling than less/ncurses for a one time scroll through the > document. But that pager is history, and less is still there, > running everywhere, and being used by me dozens to hundreds time > a day. Also with colours, with searching, and now also with > > ^O^N ^On * Search forward for (N-th) OSC8 hyperlink. > ^O^P ^Op * Search backward for (N-th) OSC8 hyperlink. > ^O^L ^Ol Jump to the currently selected OSC8 hyperlink. > > And prepared mdoc manuals can now display on a normal Unix > terminal in a normal (actively OSC-8 supporting $PAGER) a TOC (at > will, with links), and have external (man:, but also http: etc; > man is built into less(1) -- yay!) links, too. > For example here ∞ is an external, and † are internal links: > > The OpenSSL program ciphers(1)∞ should be referred to when creating a > custom cipher list. Variables of interest for TLS in general are > tls-ca-dir†, tls-ca-file†, tls-ca-flags†, tls-ca-no-defaults†, > tls-config-file†, tls-config-module†, tls-config-pairs > > So ^O^L on that ciphers(1) opens a new man(1)ual instance. > For all this functionality a program with 221K bytes is small: > > 221360 May 18 22:13 ...less* > > Also it starts up into interactive mode with --help. > So you could have "full interactivity" and colours and mouse, and > configurability to a large extend, which somehow has to be > documented, in just 221 K bytes. > > I give in in that i try to have --help/-h and --long-help/-H, but > sometimes that -h is only minimal, because a screenful of data is > simply not enough to allow users to have a notion. 
> > So less could split the manual into a less.1 and a less-book.7. > The same is true for bash, for sure. (And for my little mailer.) > But things tend to divert, and it is hard enough to keep one > manual in sync with the codebase, especially if you develop > focused and expert-idiotized in a one man show. > > |What prompted me to look was another disheartening discovery. The "small > |special tool" Gnu diff has a 95-page manual! And it doesn't cover the > |option I was looking up (-h). To be fair, the manual includes related > |programs like diff3(1), sdiff(1) and patch(1), but the original manual > for > |each fit on one page. > --End of .com> > > --steffen > | > |Der Kragenbaer, The moon bear, > |der holt sich munter he cheerfully and one by one > |einen nach dem anderen runter wa.ks himself off > |(By Robert Gernhardt) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Sun May 19 18:58:15 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sun, 19 May 2024 09:58:15 +0100 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: <20240518182218.44ED921309@orac.inputplus.co.uk> References: <20240518182218.44ED921309@orac.inputplus.co.uk> Message-ID: <20240519085815.5B37A20146@orac.inputplus.co.uk> Hi, I wrote: > Another point against adding --help: there's a second attempt to > describe the source. It occurred to me --help's the third attempt as there's already ‘usage: argv[0] ...’. Back when running man took time and paper, I can see a one-line summary to aid memory was useful. I wondered when it first appeared. I've found V2, https://www.tuhs.org/cgi-bin/utree.pl?file=V2/cmd, has cmp.s with cmp (sp)+,$3 beq 1f jsr r5,mesg; ; .even sys exit And cp.c has if(argc != 3) { write(1,"Usage: cp oldfile newfile\n",26); exit(); } Given the lack of options, the need for a usage message surprises me. But then ‘cp a-src a-dest b-src b-dest ...’ used to copy files in pairs. Perhaps when this was dropped, one too many losses?, the usage was needed to remind users of the change. Any earlier Unix examples known by the list? And was ‘usage: ...’ adopted from an earlier system? -- Cheers, Ralph. From ralph at inputplus.co.uk Sun May 19 20:41:27 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sun, 19 May 2024 11:41:27 +0100 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: References: Message-ID: <20240519104127.64670208CA@orac.inputplus.co.uk> Hi, Doug wrote: > The underlying evil of buffered IO still lurks. The justification is > that it's necessary to match the characteristics of IO devices and to > minimize system-call overhead. The former necessity requires the > attention of hardware designers, but the latter is in the hands of > programmers. What can be done to mitigate the pain of border-crossing > into the kernel? Has there been any system-on-chip experimentation with hardware ‘pipes’? They have LIFOs for UARTs. What about LIFO hardware tracking the content of shared memory? Registers can be written to give the base address and buffer size. Various water marks set: every byte as it arrives versus ‘It's not worth getting out of bed for less than 64 KiB’. Read-only registers would allow polling when the buffer is full or empty, or a ‘device’ could be configured to interrupt. Trying to read/write a byte which wasn't ‘yours’ would trap. It would be two cores synchronising without the kernel thanks to hardware. -- Cheers, Ralph. 
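Such a hardware 'pipe' would be putting the familiar FIFO ring-buffer discipline into silicon. Below is a minimal software sketch of that discipline, with the base, size and water-mark values that would otherwise sit in device registers held in an ordinary struct; the names (pipe_regs, watermark and so on) are invented for illustration and describe no real device.

    /* Single-producer, single-consumer ring buffer: a software stand-in
     * for the hypothetical hardware 'pipe' described above.  The buffer
     * size must be a power of two so index arithmetic can use a mask. */
    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    struct pipe_regs {
        uint8_t         *base;      /* would be the base-address register   */
        uint32_t         size;      /* would be the buffer-size register    */
        uint32_t         watermark; /* "not worth getting out of bed" level */
        _Atomic uint32_t head;      /* advanced only by the producer        */
        _Atomic uint32_t tail;      /* advanced only by the consumer        */
    };

    /* Producer: copy in as much as fits and publish the new head. */
    static uint32_t pipe_write(struct pipe_regs *p, const uint8_t *src, uint32_t n)
    {
        uint32_t head = atomic_load_explicit(&p->head, memory_order_relaxed);
        uint32_t tail = atomic_load_explicit(&p->tail, memory_order_acquire);
        uint32_t space = p->size - (head - tail);
        if (n > space)
            n = space;
        for (uint32_t i = 0; i < n; i++)
            p->base[(head + i) & (p->size - 1)] = src[i];
        atomic_store_explicit(&p->head, head + n, memory_order_release);
        return n;
    }

    /* Consumer: drain whatever is available, up to n bytes. */
    static uint32_t pipe_read(struct pipe_regs *p, uint8_t *dst, uint32_t n)
    {
        uint32_t tail = atomic_load_explicit(&p->tail, memory_order_relaxed);
        uint32_t head = atomic_load_explicit(&p->head, memory_order_acquire);
        uint32_t avail = head - tail;
        if (n > avail)
            n = avail;
        for (uint32_t i = 0; i < n; i++)
            dst[i] = p->base[(tail + i) & (p->size - 1)];
        atomic_store_explicit(&p->tail, tail + n, memory_order_release);
        return n;
    }

    /* Would map onto a read-only status register: is enough queued to
     * bother waking (or interrupting) the reader? */
    static int pipe_ready(struct pipe_regs *p)
    {
        return atomic_load(&p->head) - atomic_load(&p->tail) >= p->watermark;
    }

    int main(void)
    {
        static uint8_t mem[64];
        struct pipe_regs p = { mem, sizeof mem, 16, 0, 0 };
        uint8_t out[64];

        pipe_write(&p, (const uint8_t *)"hello, pipe\n", 12);
        printf("%s\n", pipe_ready(&p) ? "above water mark" : "below water mark");
        uint32_t got = pipe_read(&p, out, sizeof out);
        fwrite(out, 1, got, stdout);
        return 0;
    }

The two indices only ever move forward, so each side can run on its own core without taking a lock; the kernel would be needed only when one side has to sleep, which is the border-crossing the hardware proposal tries to avoid.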
From douglas.mcilroy at dartmouth.edu Mon May 20 00:03:12 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Sun, 19 May 2024 10:03:12 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) Message-ID: > was ‘usage: ...’ adopted from an earlier system? "Usage" was one of those lovely ideas, one exposure to which flips its status from unknown to eternal truth. I am sure my first exposure was on Unix, but I don't remember when. Perhaps because it radically departs from Ken's "?" in qed/ed, I have subconsciously attributed it to Dennis. The genius of "usage" and "?" is that they don't attempt to tell one what's wrong. Most diagnostics cite a rule or hidden limit that's been violated or describe the mistake (e.g. "missing semicolon") , sometimes raising more questions than they answer. Another non-descriptive style of error message that I admired was that of Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed, it reported where, and automatically provided a sample token that would allow the parsing to progress. I found this uniform convention to be at least as informative as distinct hand-crafted messages, which almost by definition can't foresee every contingency. Alas, this elegant scheme seems not to have inspired imitators. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.winalski at gmail.com Mon May 20 02:04:53 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Sun, 19 May 2024 12:04:53 -0400 Subject: [TUHS] If forking is bad, how about buffering? In-Reply-To: <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> References: <20240514111032.2kotrrjjv772h5f4@illithid> <20240515164212.beswgy4h2nwvbdck@illithid> <8D556958-0C7F-43F3-8694-D7391E9D89DA@iitbombay.org> Message-ID: On Sat, May 18, 2024 at 9:04 PM Bakul Shah via TUHS wrote: > > Note that even if you remove every RAM buffer between the two > endpoints of a TCP connection, you still have a "buffer". True, and it's unavoidable. The full name of the virtual circuit communication protocol is TCP/IP (Transmission Control Protocol over Internet Protocol). The underlying IP is the protocol used to actually transfer the data from machine to machine. It provides datagram service, meaning that messages may be duplicated, lost, delivered out of order, or delivered with errors. The job of TCP is to provide virtual circuit service, meaning that messages are delivered once, in order, without errors, and reliably. To cope with the underlying datagam service, TCP has to put error checksums on each message, assign sequence numbers to each message, and has to send an acknowledgement to the sender when a message is received. It also has to be prepared to resend messages if there's no acknowledgement or if the ack says the message was received with errors. You can't do all that without buffering messages. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.winalski at gmail.com Mon May 20 02:18:07 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Sun, 19 May 2024 12:18:07 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: On Sun, May 19, 2024 at 10:03 AM Douglas McIlroy < douglas.mcilroy at dartmouth.edu> wrote: > > Another non-descriptive style of error message that I admired was that of > Berkeley Pascal's syntax diagnostics. 
When the LR parser could not proceed, > it reported where, and automatically provided a sample token that would > allow the parsing to progress. I found this uniform convention to be at > least as informative as distinct hand-crafted messages, which almost by > definition can't foresee every contingency. Alas, this elegant scheme seems > not to have inspired imitators. > > The hazard with this approach is that the suggested syntactic correction might simply lead the user farther into the weeds. It depends on how far the parse has gone off the rails before a grammatical error is found. Pascal and BASIC (at least the original Dartmouth BASIC) have simple, well-behaved grammars and the suggested syntactic correction is likely to be correct. It doesn't work as well for more syntactically complicated languages such as C (consider an error resulting from use of == instead of =) or PL/I. And it's nigh on impossible for languages with ill-behaved grammars such as Fortran and COBOL (among other grammatical evils, Fortran has context-sensitive lexiing). Commercial compiler writers avoid this techniq -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.winalski at gmail.com Mon May 20 02:21:43 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Sun, 19 May 2024 12:21:43 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: Ack! I finger-fumbled and accidentally sent this message incomplete. Here's the complete version. Sorry about that, Chief! On Sun, May 19, 2024 at 12:18 PM Paul Winalski wrote: > On Sun, May 19, 2024 at 10:03 AM Douglas McIlroy < > douglas.mcilroy at dartmouth.edu> wrote: > >> >> Another non-descriptive style of error message that I admired was that of >> Berkeley Pascal's syntax diagnostics. When the LR parser could not proceed, >> it reported where, and automatically provided a sample token that would >> allow the parsing to progress. I found this uniform convention to be at >> least as informative as distinct hand-crafted messages, which almost by >> definition can't foresee every contingency. Alas, this elegant scheme seems >> not to have inspired imitators. >> >> The hazard with this approach is that the suggested syntactic correction > might simply lead the user farther into the weeds. It depends on how far > the parse has gone off the rails before a grammatical error is found. > Pascal and BASIC (at least the original Dartmouth BASIC) have simple, > well-behaved grammars and the suggested syntactic correction is likely to > be correct. It doesn't work as well for more syntactically complicated > languages such as C (consider an error resulting from use of == instead of > =) or PL/I. And it's nigh on impossible for languages with ill-behaved > grammars such as Fortran and COBOL (among other grammatical evils, Fortran > has context-sensitive lexiing). > > Commercial compiler writers avoid this techniq > Commercial compiler writers avoid this technique because it turns into an error report generator. The potential user benefit is outweighed by all of the "your compiler suggested X as a correction when the problem was Y" that need to be answered. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Mon May 20 03:22:16 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sun, 19 May 2024 18:22:16 +0100 Subject: [TUHS] The 'usage: ...' message. 
In-Reply-To: References: Message-ID: <20240519172216.E554E2130A@orac.inputplus.co.uk> Hi Doug, > Perhaps because it radically departs from Ken's "?" in qed/ed That spread elsewhere. When PDP-7 Unix's cp.s is given an odd number of arguments, leaving the last unpaired, it prints the argument followed by ‘ ?’. https://www.tuhs.org/cgi-bin/utree.pl?file=PDP7-Unix/cmd/cp.s mes: 040000;077012 unbal: lac name2 tad d4 dac 1f lac d1 sys write; 1: 0; 4 lac d1 sys write; mes; 2 sys exit -- Cheers, Ralph. From dave at horsfall.org Mon May 20 06:42:09 2024 From: dave at horsfall.org (Dave Horsfall) Date: Mon, 20 May 2024 06:42:09 +1000 (EST) Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: On Sun, 19 May 2024, Douglas McIlroy wrote: > Another non-descriptive style of error message that I admired was that > of Berkeley Pascal's syntax diagnostics. When the LR parser could not > proceed, it reported where, and automatically provided a sample token > that would allow the parsing to progress. I found this uniform > convention to be at least as informative as distinct hand-crafted > messages, which almost by definition can't foresee every contingency. > Alas, this elegant scheme seems not to have inspired imitators. I did something like that for our compiler-writing assignment. An ALGOL-like language (I think I used ALGOLW) it would detect when a semicolon was missing, and insert it (with a warning). As a test case, it successfully compiled a program with no semicolons at all... -- Dave From douglas.mcilroy at dartmouth.edu Mon May 20 09:08:12 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Sun, 19 May 2024 19:08:12 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) Message-ID: >> Another non-descriptive style of error message that I admired was that >> of Berkeley Pascal's syntax diagnostics. When the LR parser could not >> proceed, it reported where, and automatically provided a sample token >> that would allow the parsing to progress. I found this uniform >> convention to be at least as informative as distinct hand-crafted >> messages, which almost by definition can't foresee every contingency. >> Alas, this elegant scheme seems not to have inspired imitators. > The hazard with this approach is that the suggested syntactic correction > might simply lead the user farther into the weeds I don't think there's enough experience to justify this claim. Before I experienced the Berkeley compiler, I would have thought such bad outcomes were inevitable in any language. Although the compilers' suggestions often bore little or no relationship to the real correction, I always found them informative. In particular, the utterly consistent style assured there was never an issue of ambiguity or of technical jargon. The compiler taught me Pascal in an evening. I had scanned the Pascal Report a couple of years before but had never written a Pascal program. With no manual at hand, I looked at one program to find out what mumbo-jumbo had to come first and how to print integers, then wrote the rest by trial and error. Within a couple of hours I had a working program good enough to pass muster in an ACM journal. An example arose that one might think would lead "into the weeds". The parser balked before 'or' in a compound Boolean expression like 'a=b and c=d or x=y'. It couldn't suggest a right paren because no left paren had been seen. 
Whatever suggestion it did make (perhaps 'then') was enough to lead me to insert a remote left paren and teach me that parens are required around Boolean-valued subexpressions. (I will agree that this lesson might be less clear to a programming novice, but so might be many conventional diagnostics, e.g. "no effect".) Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From robpike at gmail.com Mon May 20 10:58:54 2024 From: robpike at gmail.com (Rob Pike) Date: Mon, 20 May 2024 10:58:54 +1000 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch input. It tried automatically to keep things running after a parsing error by inserting some token - semicolon, parenthesis, whatever seemed best - and continuing to parse, in order to maximize the amount of input that could be parsed before giving up. At least, that's what I took the motivation to be. It rarely succeeded in fixing the actual problem, despite PL/I being plastered with semicolons, but it did tend to ferret out more errors per run. I found the tactic helpful. -rob -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnold at skeeve.com Mon May 20 13:19:02 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Sun, 19 May 2024 21:19:02 -0600 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: <202405200319.44K3J2Jq117819@freefriends.org> Rob Pike wrote: > The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch > input. It tried automatically to keep things running after a parsing error > by inserting some token - semicolon, parenthesis, whatever seemed best - > and continuing to parse, in order to maximize the amount of input that > could be parsed before giving up. At least, that's what I took the > motivation to be. It rarely succeeded in fixing the actual problem, despite > PL/I being plastered with semicolons, but it did tend to ferret out more > errors per run. I found the tactic helpful. > > -rob Gawk used to do this, until people started fuzzing it, causing cascading errors and eventually core dumps. Now the first syntax error is fatal. It got to the point where I added this text to the manual: In recent years, people have been running "fuzzers" to generate invalid awk programs in order to find and report (so-called) bugs in gawk. In general, such reports are not of much practical use. The programs they create are not realistic and the bugs found are generally from some kind of memory corruption that is fatal anyway. So, if you want to run a fuzzer against gawk and report the results, you may do so, but be aware that such reports don’t carry the same weight as reports of real bugs do. (Yeah, I've just changed the subject, feel free to stay on topic. :-) Arnold From imp at bsdimp.com Mon May 20 13:43:11 2024 From: imp at bsdimp.com (Warner Losh) Date: Sun, 19 May 2024 21:43:11 -0600 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: <202405200319.44K3J2Jq117819@freefriends.org> References: <202405200319.44K3J2Jq117819@freefriends.org> Message-ID: On Sun, May 19, 2024, 9:19 PM wrote: > Rob Pike wrote: > > > The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used > batch > > input. 
It tried automatically to keep things running after a parsing > error > > by inserting some token - semicolon, parenthesis, whatever seemed best - > > and continuing to parse, in order to maximize the amount of input that > > could be parsed before giving up. At least, that's what I took the > > motivation to be. It rarely succeeded in fixing the actual problem, > despite > > PL/I being plastered with semicolons, but it did tend to ferret out more > > errors per run. I found the tactic helpful. > > > > -rob > > Gawk used to do this, until people started fuzzing it, causing cascading > errors and eventually core dumps. Now the first syntax error is fatal. > It got to the point where I added this text to the manual: > > In recent years, people have been running "fuzzers" to generate > invalid awk programs in order to find and report (so-called) > bugs in gawk. > > In general, such reports are not of much practical use. The > programs they create are not realistic and the bugs found are > generally from some kind of memory corruption that is fatal > anyway. > > So, if you want to run a fuzzer against gawk and report the > results, you may do so, but be aware that such reports don’t > carry the same weight as reports of real bugs do. > > (Yeah, I've just changed the subject, feel free to stay on topic. :-) > Awk bailing out near line 1. Warner > Arnold > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Mon May 20 13:54:59 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Sun, 19 May 2024 20:54:59 -0700 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: I remember helping newbie students at USC who were very confused that even though they made the changes "PL/C USES", their program didn't work! > On May 19, 2024, at 5:58 PM, Rob Pike wrote: > > The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used batch input. It tried automatically to keep things running after a parsing error by inserting some token - semicolon, parenthesis, whatever seemed best - and continuing to parse, in order to maximize the amount of input that could be parsed before giving up. At least, that's what I took the motivation to be. It rarely succeeded in fixing the actual problem, despite PL/I being plastered with semicolons, but it did tend to ferret out more errors per run. I found the tactic helpful. > > -rob > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnold at skeeve.com Mon May 20 14:46:53 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Sun, 19 May 2024 22:46:53 -0600 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: <202405200319.44K3J2Jq117819@freefriends.org> Message-ID: <202405200446.44K4kr1Q124396@freefriends.org> Warner Losh wrote: > > (Yeah, I've just changed the subject, feel free to stay on topic. :-) > > Awk bailing out near line 1. $ gawk --nostalgia awk: bailing out near line 1 Aborted (core dumped) A very long time Easter Egg... :-) Arnold From athornton at gmail.com Mon May 20 16:07:37 2024 From: athornton at gmail.com (Adam Thornton) Date: Sun, 19 May 2024 23:07:37 -0700 Subject: [TUHS] On Bloat and the Idea of Small Specialized Tools In-Reply-To: References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Message-ID: I can't tell you--although some of you will know--what a delight it is to be working on a project with an actual documentation engineer. 
That person (Jonathan Sick, if any of you want to hire him) has engineered things such that it is easy to write good documentation for the projects we write, and not very onerous. He's put in an enormous amount of effort to ensure that if we write reasonably clean code, we can also auto-generate accurate and complete API documentation for it. And to the degree we want to write explanatory docs, that's catered for too. It has been an amazing experience compared to my entire prior history. Adam -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Mon May 20 19:20:13 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Mon, 20 May 2024 10:20:13 +0100 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: <202405200319.44K3J2Jq117819@freefriends.org> References: <202405200319.44K3J2Jq117819@freefriends.org> Message-ID: <20240520092013.21BD01FB2F@orac.inputplus.co.uk> Hi Arnold, > > in order to maximize the amount of input that could be parsed before > > giving up. > > Gawk used to do this, until people started fuzzing it, causing > cascading errors and eventually core dumps. Now the first syntax > error is fatal. This is the first time I've heard of making life difficult for fuzzers so I'm curious... I'm assuming you agree the eventual core dump was a bug somewhere to be fixed, and probably was. Stopping on the first error lessens the ‘attack surface’ for the fuzzer. Do you think there remains a bug which would bite a user which the fuzzer might have found more easily before the shrunken surface? -- Cheers, Ralph. From arnold at skeeve.com Mon May 20 21:58:51 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Mon, 20 May 2024 05:58:51 -0600 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: <20240520092013.21BD01FB2F@orac.inputplus.co.uk> References: <202405200319.44K3J2Jq117819@freefriends.org> <20240520092013.21BD01FB2F@orac.inputplus.co.uk> Message-ID: <202405201158.44KBwpi6166059@freefriends.org> Ralph Corderoy wrote: > This is the first time I've heard of making life difficult for fuzzers > so I'm curious... I was making life easier for me. :-) > I'm assuming you agree the eventual core dump was a bug somewhere to be > fixed, and probably was. Not really. Hugely syntactically invalid programs can end up causing memory corruption as necessary data structures don't get built correctly (or at all); since they're invalid, subsequent bits of gawk that expect valid data structures end up not working. These are "bugs" that can't happen when using the tool correctly. > Stopping on the first error lessens the ‘attack surface’ for the > fuzzer. Do you think there remains a bug which would bite a user which > the fuzzer might have found more easily before the shrunken surface? No. I don't have any examples handy, but you can look back through the bug-gawk archives for some examples of these reports. The number of true bugs that fuzzers have caught (if any!) could be counted on one hand. Sometimes they like to claim that the "bugs" they find could cause denial of service attacks. That's also specious, gawk isn't used for long-running server kinds of programs. The joys of being a Free Software Maintainer. Arnold P.S. I don't claim that gawk is bug-free. But I do think that there are qualitatively different kinds of bugs, and bug reports. 
From douglas.mcilroy at dartmouth.edu Mon May 20 23:06:30 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Mon, 20 May 2024 09:06:30 -0400 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) Message-ID: I'm surprised by nonchalance about bad inputs evoking bad program behavior. That attitude may have been excusable 50 years ago. By now, though, we have seen so much malicious exploitation of open avenues of "undefined behavior" that we can no longer ignore bugs that "can't happen when using the tool correctly". Mature software should not brook incorrect usage. "Bailing out near line 1" is a sign of defensive precautions. Crashes and unjustified output betray their absence. I commend attention to the LangSec movement, which advocates for rigorously enforced separation between legal and illegal inputs. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From chet.ramey at case.edu Mon May 20 23:10:03 2024 From: chet.ramey at case.edu (Chet Ramey) Date: Mon, 20 May 2024 09:10:03 -0400 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: <20240520092013.21BD01FB2F@orac.inputplus.co.uk> References: <202405200319.44K3J2Jq117819@freefriends.org> <20240520092013.21BD01FB2F@orac.inputplus.co.uk> Message-ID: <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu> On 5/20/24 5:20 AM, Ralph Corderoy wrote: > Hi Arnold, > >>> in order to maximize the amount of input that could be parsed before >>> giving up. >> >> Gawk used to do this, until people started fuzzing it, causing >> cascading errors and eventually core dumps. Now the first syntax >> error is fatal. > > This is the first time I've heard of making life difficult for fuzzers > so I'm curious... It's not making life difficult for them -- they can still fuzz all they want. Chances are better they'll find a genuine bug if you stop right away. > I'm assuming you agree the eventual core dump was a bug somewhere to be > fixed, and probably was. > Stopping on the first error lessens the > ‘attack surface’ for the fuzzer. Do you think there remains a bug which > would bite a user which the fuzzer might have found more easily before > the shrunken surface? Chances are small. (People fuzz bash all the time, and that is my experience.) Look at it this way. Free Software maintainers have limited resources. Is it better to spend time on bugs that will affect a larger percentage of the user population, instead of those that require artificial circumstances that won't be encountered by normal usage? Those get pushed down on the priority list. Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/ -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 203 bytes Desc: OpenPGP digital signature URL: From arnold at skeeve.com Mon May 20 23:14:07 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Mon, 20 May 2024 07:14:07 -0600 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: References: Message-ID: <202405201314.44KDE7rq170661@freefriends.org> Perhaps I should not respond to this immediately. But: Douglas McIlroy wrote: > I'm surprised by nonchalance about bad inputs evoking bad program behavior. > That attitude may have been excusable 50 years ago. 
By now, though, we have > seen so much malicious exploitation of open avenues of "undefined behavior" > that we can no longer ignore bugs that "can't happen when using the tool > correctly". Mature software should not brook incorrect usage. It's not nonchalance, not at all! The current behavior is to die on the first syntax error, instead of trying to be "helpful" by continuing to try to parse the program in the hope of reporting other errors. > "Bailing out near line 1" is a sign of defensive precautions. Crashes and > unjustified output betray their absence. The crashes came because errors cascaded. I don't see a reason to spend valuable, *personal* time on adding defenses *where they aren't needed*. A steel door on your bedroom closet does no good if your front door is made of balsa wood. My change was to stop the badness at the front door. > I commend attention to the LangSec movement, which advocates for rigorously > enforced separation between legal and illegal inputs. Illegal input, in gawk, as far as I know, should always cause a syntax error report and an immediate exit. If it doesn't, that is a bug, and I'll be happy to try to fix it. I hope that clarifies things. Arnold From chet.ramey at case.edu Mon May 20 23:25:11 2024 From: chet.ramey at case.edu (Chet Ramey) Date: Mon, 20 May 2024 09:25:11 -0400 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: References: Message-ID: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu> On 5/20/24 9:06 AM, Douglas McIlroy wrote: > I'm surprised by nonchalance about bad inputs evoking bad program behavior. I think the claim is that it's better to stop immediately with an error on invalid input rather than guess at the user's intent and try to go on. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/ -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 203 bytes Desc: OpenPGP digital signature URL: From ralph at inputplus.co.uk Mon May 20 23:30:17 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Mon, 20 May 2024 14:30:17 +0100 Subject: [TUHS] A fuzzy awk. In-Reply-To: <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu> References: <202405200319.44K3J2Jq117819@freefriends.org> <20240520092013.21BD01FB2F@orac.inputplus.co.uk> <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu> Message-ID: <20240520133017.BFA761FB2F@orac.inputplus.co.uk> Hi Chet, > Is it better to spend time on bugs that will affect a larger > percentage of the user population, instead of those that require > artificial circumstances that won't be encountered by normal usage? > Those get pushed down on the priority list. You're talking about pushing unlikely, fuzzed bugs down the prioritised list, but we're discussing those bugs not getting onto the list for consideration. Lack of resources also applies to triaging bugs and I agree a fuzzed bug which hands over a 42 KiB of dense, gibberish awk will probably not get volunteer attention. But then fuzzers can seek a smaller test case, similar to Andreas Zeller's delta debugging. I'm in no way criticising Arnold who, like you, has spent many years voluntarily enhancing a program many of us use every day. But it's interesting to shine some light on this corner to better understand what's happening. -- Cheers, Ralph. 
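[Aside for illustration: since delta debugging came up, the reduction loop is small enough to sketch in C. still_fails() is a made-up oracle standing in for re-running the target (say, gawk) on each candidate input and checking for the crash; this shows only the "keep any smaller input that still fails" idea, not Zeller's actual ddmin nor anyone's real reducer.]

#include <stdio.h>
#include <string.h>

#define MAXLEN 4096

/* Hypothetical oracle: returns 1 if the target still crashes on this input. */
static int still_fails(const char *input)
{
    return strstr(input, "((((") != NULL;   /* toy stand-in for the crash */
}

static void reduce(char *input)
{
    size_t chunk = strlen(input) / 2;

    while (chunk >= 1) {
        int shrunk = 0;
        for (size_t i = 0; i + chunk <= strlen(input); i++) {
            char trial[MAXLEN];
            size_t len = strlen(input);

            /* Build a candidate with input[i .. i+chunk) removed. */
            memcpy(trial, input, i);
            memcpy(trial + i, input + i + chunk, len - i - chunk + 1);

            if (still_fails(trial)) {
                strcpy(input, trial);   /* keep the smaller failing input */
                shrunk = 1;
            }
        }
        if (!shrunk)
            chunk /= 2;                 /* try finer-grained removals */
    }
}

int main(void)
{
    char input[MAXLEN] = "BEGIN { x = ((((1)))) ; print x }";
    reduce(input);
    printf("reduced failing input: %s\n", input);
    return 0;
}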
From ralph at inputplus.co.uk Mon May 20 23:41:55 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Mon, 20 May 2024 14:41:55 +0100 Subject: [TUHS] A fuzzy awk. In-Reply-To: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu> References: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu> Message-ID: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk> Hi Chet, > Doug wrote: > > I'm surprised by nonchalance about bad inputs evoking bad program > > behavior. > > I think the claim is that it's better to stop immediately with an > error on invalid input rather than guess at the user's intent and try > to go on. That aside, having made the decision to patch up the input so more punched cards are consumed, the patch should be bug free. Say it's inserting a semicolon token for pretence. It should have initialised source-file locations just as if it were real. Not an uninitialised pointer to a source filename so a later dereference failed. I can see an avalanche of errors in an earlier gawk caused problems, but each time there would have been a first patch of the input which made a mistake causing the pebble to start rolling. My understanding is that there was potentially a lot of these and rather than fix them it was more productive of the limited time to stop patching the input. Then the code which patched could be deleted, getting rid of the buggy bits along the way? -- Cheers, Ralph. From chet.ramey at case.edu Mon May 20 23:48:12 2024 From: chet.ramey at case.edu (Chet Ramey) Date: Mon, 20 May 2024 09:48:12 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: <20240520133017.BFA761FB2F@orac.inputplus.co.uk> References: <202405200319.44K3J2Jq117819@freefriends.org> <20240520092013.21BD01FB2F@orac.inputplus.co.uk> <7e23b0d6-a8be-4e51-ba5a-21432b2fa055@case.edu> <20240520133017.BFA761FB2F@orac.inputplus.co.uk> Message-ID: <2bfd3b3e-5e0a-4685-9dda-63fc6546e46a@case.edu> On 5/20/24 9:30 AM, Ralph Corderoy wrote: > Hi Chet, > >> Is it better to spend time on bugs that will affect a larger >> percentage of the user population, instead of those that require >> artificial circumstances that won't be encountered by normal usage? >> Those get pushed down on the priority list. > > You're talking about pushing unlikely, fuzzed bugs down the prioritised > list, but we're discussing those bugs not getting onto the list for > consideration. I think the question is whether they were bugs in gawk at all, or the result of gawk trying to be helpful by guessing at the script's intent and trying to go on. Arnold's reaction to that, which had these negative effects most often as the result of fuzzing attempts, was to exit on the first syntax error. Would those `bugs' have manifested themselves if gawk hadn't tried to do this? Are they bugs at all? Guessing at intent is bound to be wrong some of the time, and cause errors of its own. I'm saying that fuzzing does occasionally find obscure bugs -- bugs that would never be encountered in normal usage -- and those should be fixed. Eventually. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/ -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 203 bytes Desc: OpenPGP digital signature URL: From ralph at inputplus.co.uk Mon May 20 23:54:04 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Mon, 20 May 2024 14:54:04 +0100 Subject: [TUHS] A fuzzy awk. 
In-Reply-To: References: Message-ID: <20240520135404.1B4181FB2F@orac.inputplus.co.uk> Hi, Doug wrote: > I commend attention to the LangSec movement, which advocates for > rigorously enforced separation between legal and illegal inputs. https://langsec.org ‘The Language-theoretic approach (LangSec) regards the Internet insecurity epidemic as a consequence of ‘ad hoc’ programming of input handling at all layers of network stacks, and in other kinds of software stacks. LangSec posits that the only path to trustworthy software that takes untrusted inputs is treating all valid or expected inputs as a formal language, and the respective input-handling routines as a ‘recognizer’ for that language. The recognition must be feasible, and the recognizer must match the language in required computation power. ‘When input handling is done in ad hoc way, the ‘de facto’ recognizer, i.e. the input recognition and validation code ends up scattered throughout the program, does not match the programmers' assumptions about safety and validity of data, and thus provides ample opportunities for exploitation. Moreover, for complex input languages the problem of full recognition of valid or expected inputs may be *undecidable*, in which case no amount of input-checking code or testing will suffice to secure the program. Many popular protocols and formats fell into this trap, the empirical fact with which security practitioners are all too familiar. ‘LangSec helps draw the boundary between protocols and API designs that can and cannot be secured and implemented securely, and charts a way to building truly trustworthy protocols and systems. A longer summary of LangSec in this USENIX Security BoF hand-out, and in the talks, articles, and papers below.’ That does look interesting; I'd not heard of it. -- Cheers, Ralph. From g.branden.robinson at gmail.com Tue May 21 00:00:47 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Mon, 20 May 2024 09:00:47 -0500 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: <202405201314.44KDE7rq170661@freefriends.org> References: <202405201314.44KDE7rq170661@freefriends.org> Message-ID: <20240520140047.4x4lwzs6wmo34uge@illithid> Hi folks, At 2024-05-20T07:14:07-0600, arnold at skeeve.com wrote: > Douglas McIlroy wrote: > > I'm surprised by nonchalance about bad inputs evoking bad program > > behavior. That attitude may have been excusable 50 years ago. By > > now, though, we have seen so much malicious exploitation of open > > avenues of "undefined behavior" that we can no longer ignore bugs > > that "can't happen when using the tool correctly". Mature software > > should not brook incorrect usage. > > It's not nonchalance, not at all! > > The current behavior is to die on the first syntax error, instead of > trying to be "helpful" by continuing to try to parse the program in > the hope of reporting other errors. [...] > The crashes came because errors cascaded. I don't see a reason to > spend valuable, *personal* time on adding defenses *where they aren't > needed*. > > A steel door on your bedroom closet does no good if your front door is > made of balsa wood. My change was to stop the badness at the front > door. > > > I commend attention to the LangSec movement, which advocates for > > rigorously enforced separation between legal and illegal inputs. > > Illegal input, in gawk, as far as I know, should always cause a syntax > error report and an immediate exit. > > If it doesn't, that is a bug, and I'll be happy to try to fix it. 
> > I hope that clarifies things. For grins, and for a data point from elsewhere in GNU-land, GNU troff is pretty robust to this sort of thing. Much as I might like to boast of having improved it in this area, it appears to have already come with iron long johns courtesy of James Clark and/or Werner Lemberg. I threw troff its own ELF executable as a crude fuzz test some years ago, and I don't recall needing to fix anything except unhelpfully vague diagnostic messages (a phenomenon I am predisposed to observe anyway). I did notice today that in one case we were spewing back out unprintable characters (newlines, character codes > 127) _in_ one (but only one) of the diagnostic messages, and while that's ugly, it's not an obvious exploitation vector to me. Nevertheless I decided to fix it and it will be in my next push. So here's the mess you get when feeding GNU troff to itself. No GNU troff since before 1.22.3 core dumps on this sort of unprepossessing input. $ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' | sort | uniq -c 17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape sequence parameter 10 troff:/usr/bin/troff: error: a space character is not allowed in an escape sequence parameter 1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter 1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier 1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting delimiter 1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression escape sequence 1 troff:/usr/bin/troff: error: invalid positional argument number ',' 3 troff:/usr/bin/troff: error: invalid positional argument number '<' 3 troff:/usr/bin/troff: error: invalid positional argument number 'D' 1 troff:/usr/bin/troff: error: invalid positional argument number 'E' 10 troff:/usr/bin/troff: error: invalid positional argument number 'H' 1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi' 1 troff:/usr/bin/troff: error: invalid positional argument number 'I' 1 troff:/usr/bin/troff: error: invalid positional argument number 'I9' 1 troff:/usr/bin/troff: error: invalid positional argument number 'L' 1 troff:/usr/bin/troff: error: invalid positional argument number 'LD' 2 troff:/usr/bin/troff: error: invalid positional argument number 'LL' 5 troff:/usr/bin/troff: error: invalid positional argument number 'LT' 1 troff:/usr/bin/troff: error: invalid positional argument number 'M' 4 troff:/usr/bin/troff: error: invalid positional argument number 'P' 5 troff:/usr/bin/troff: error: invalid positional argument number 'X' 1 troff:/usr/bin/troff: error: invalid positional argument number 'dH' 1 troff:/usr/bin/troff: error: invalid positional argument number 'h' 1 troff:/usr/bin/troff: error: invalid positional argument number 'l' 1 troff:/usr/bin/troff: error: invalid positional argument number 'p' 1 troff:/usr/bin/troff: error: invalid positional argument number 'x' 3 troff:/usr/bin/troff: error: invalid positional argument number '|' 35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable) 3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence The second to last (and most frequent) message in 
the list above is the "new" one. Here's the diff. diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp index 8d828a01e..596ecf6f9 100644 --- a/src/roff/troff/input.cpp +++ b/src/roff/troff/input.cpp @@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm) } else { const char *p; - for (p = s; *p && csdigit(*p); p++) - ; - if (*p) - copy_mode_error("invalid positional argument number '%1'", s); + bool is_valid = true; + bool is_printable = true; + for (p = s; *p != 0 /* nullptr */; p++) { + if (!csdigit(*p)) + is_valid = false; + if (!csprint(*p)) + is_printable = false; + } + if (!is_valid) { + const char msg[] = "invalid positional argument number"; + if (is_printable) + copy_mode_error("%1 '%2'", msg, s); + else + copy_mode_error("%1 (unprintable)", msg); + } else input_stack::push(input_stack::get_arg(atoi(s))); } GNU troff may have started out with an easier task in this area than an AWK or a shell had; its syntax is not block-structured in the same way, so parser state recovery is easier, and it's _inherently_ a filter. The only fruitful fuzz attack on groff I can recall was upon indexed bibliographic database files, which are a binary format. This went unresolved for several years[1] but I fixed it for groff 1.23.0. https://bugs.debian.org/716109 Regards, Branden [1] I think I understand the low triage priority. Few groff users use the refer(1) preprocessor, and of those who do, even fewer find modern systems so poorly performant at text scanning that they desire the services of indxbib(1) to speed lookup of bibliographic entries. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From stewart at serissa.com Tue May 21 00:09:11 2024 From: stewart at serissa.com (Serissa) Date: Mon, 20 May 2024 10:09:11 -0400 Subject: [TUHS] A fuzzy awk. Message-ID: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Well this is obviously a hot button topic. AFAIK I was nearby when fuzz-testing for software was invented. I was the main advocate for hiring Andy Payne into the Digital Cambridge Research Lab. One of his little projects was a thing that generated random but correct C programs and fed them to different compilers or compilers with different switches to see if they crashed or generated incorrect results. Overnight, his tester filed 300 or so bug reports against the Digital C compiler. This was met with substantial pushback, but it was a mostly an issue that many of the reports traced to the same underlying bugs. Bill McKeemon expanded the technique and published "Differential Testing of Software" https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf Andy had encountered the underlying idea while working as an intern on the Alpha processor development team. Among many other testers, they used an architectural tester called REX to generate more or less random sequences of instructions, which were then run through different simulation chains (functional, RTL, cycle-accurate) to see if they did the same thing. Finding user-accessible bugs in hardware seems like a good thing. The point of generating correct programs (mentioned under the term LangSec here) goes a long way to avoid irritating the maintainers. Making the test cases short is also maintainer-friendly. The test generator is also in a position to annotate the source with exactly what it is supposed to do, which is also helpful. 
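[Aside for illustration: the harness side of differential testing is small. A rough sketch in C, with hypothetical compiler names (cc1, cc2) and a stand-in generated program gen.c -- not anyone's actual test rig.]

#include <stdio.h>
#include <stdlib.h>

static int run(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");
        exit(2);
    }
    return status;
}

int main(void)
{
    /* "gen.c" stands in for a randomly generated, known-valid program. */
    if (run("cc1 -o prog.one gen.c") != 0 ||    /* hypothetical compiler #1 */
        run("cc2 -o prog.two gen.c") != 0) {    /* hypothetical compiler #2 */
        fprintf(stderr, "a compiler rejected a valid program\n");
        return 1;
    }

    run("./prog.one > out.one 2>&1");
    run("./prog.two > out.two 2>&1");

    /* Disagreement means at least one compiler mishandled gen.c. */
    if (run("cmp -s out.one out.two") != 0) {
        fprintf(stderr, "MISMATCH on gen.c -- candidate bug report\n");
        return 1;
    }
    puts("compilers agree on this test case");
    return 0;
}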
-L -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Tue May 21 00:23:56 2024 From: clemc at ccc.com (Clem Cole) Date: Mon, 20 May 2024 10:23:56 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: I was going to keep silent on this one until I realized I disagree with both Doug and Rob here (always a little dangerous). But because of personal experience, I have a pretty strong opinion is not really a win. Note that I cribbed this email response from an answer I wrote on Quora to the question: *When you are programming and commit a minor error, such as forgetting a semicolon, the compiler throws an error and makes you fix it for yourself. Why doesn’t it just fix it by itself and notify you of the fix instead?* FWIW: The first version of the idea that I now about was DWIM - *Do What I Mean* feature from BBN’s LISP (that eventually made it into InterLISP). As the Wikipedia page describes DWIM became known as "Damn Warren's Infernal Machine" [more details in the DWIM section of the jargon file]. As Doug points out, the original Pascal implementation for Unix, pix(1), also supported this idea of fixing your code for you, and as Rob points out, UCB’s pix(1) took the idea of trying to keep going and make the compile work from the earlier Cornell PL/C compiler for the IBM 360[1], which to quote Wikipedia: “The PL/C compiler had the unusual capability of never failing to compile > any program, through the use of extensive automatic correction of many > syntax errors and by converting any remaining syntax errors to output > statements.” The problem is that people can be lazy, and instead of using " DWIM" as a tool to speed up their development and fix their own errors, they just ignore the errors. In fact, when we were teaching the “Intro to CS” course at UCB in the early 1980s; we actually had students turn in programs that had syntax errors in them because pix(1) had corrected their code -- instead of the student fixing his/her code before handing the program into the TA (and then they would complain when they got “marked down” on the assignment — sigh). IMO: All in all, the experiment failed because many (??most??) people really don’t work that way. Putting a feature like this in an IDE or even an editor like emacs might be reasonable since the sources would be modified, but it means you need to like using an IDE. I also ask --> what happens when the computer’s (IDE) guess is different from the programmer's real intent, and since it was ‘fixed’ behind the curtain, you don’t notice it? Some other people have suggested that DWIM isn’t a little like spelling ‘auto-correct’ or tools like ‘Grammarly.’ The truth is, I have a love/hate relationship with auto-correct, particularly on my mobile devices. I'm dyslexic, so tools like this can be helpful to me sometimes, but I spend a great deal of my time fighting these types of tools because they are so often wrong, particularly with a small screen/keyboard, that it is just “not fun.” This brings me back to my experience. IMO, auto-correct for programming is like DWIM all over again, and the cure causes more problems than it solves. Clem [1] I should add that after Cornell’s PL/C compiler was introduced, IBM eventually added a similar feature to its own PL/1, although it was not nearly as extensive as the Cornell solution. I’m sure you can find people who liked it, but in both cases, I personally never found it that useful. 
> ᐧ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chet.ramey at case.edu Tue May 21 00:26:09 2024 From: chet.ramey at case.edu (Chet Ramey) Date: Mon, 20 May 2024 10:26:09 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk> References: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu> <20240520134155.7A06E1FB2F@orac.inputplus.co.uk> Message-ID: <4eb98dcf-a241-44e9-8f73-30a97ac1a353@case.edu> On 5/20/24 9:41 AM, Ralph Corderoy wrote: > Hi Chet, > >> Doug wrote: >>> I'm surprised by nonchalance about bad inputs evoking bad program >>> behavior. >> >> I think the claim is that it's better to stop immediately with an >> error on invalid input rather than guess at the user's intent and try >> to go on. > > That aside, having made the decision to patch up the input so more > punched cards are consumed, the patch should be bug free. > > Say it's inserting a semicolon token for pretence. It should have > initialised source-file locations just as if it were real. Not an > uninitialised pointer to a source filename so a later dereference > failed. > > I can see an avalanche of errors in an earlier gawk caused problems, but > each time there would have been a first patch of the input which made > a mistake causing the pebble to start rolling. My understanding is that > there was potentially a lot of these and rather than fix them it was > more productive of the limited time to stop patching the input. Then > the code which patched could be deleted, getting rid of the buggy bits > along the way? Maybe we're talking about the same thing. My impression is that at each point there was more than one potential token to insert and go on, and gawk chose one (probably the most common one), in the hopes that it would be able to report as many errors as possible. There's always the chance you'll be wrong there. (I have no insight into the actual nature of these issues, or the actual corruption that caused the crashes, so take the next with skepticism.) And then rather than go back and modify other state after inserting this token -- which gawk did not do -- for the sole purpose of making this guessing more crash-resistant, Arnold chose a different approach: exit on invalid input. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU chet at case.edu http://tiswww.cwru.edu/~chet/ -------------- next part -------------- A non-text attachment was scrubbed... Name: OpenPGP_signature.asc Type: application/pgp-signature Size: 203 bytes Desc: OpenPGP digital signature URL: From ake.nordin at netia.se Tue May 21 01:39:02 2024 From: ake.nordin at netia.se (=?UTF-8?Q?=C3=85ke_Nordin?=) Date: Mon, 20 May 2024 17:39:02 +0200 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: <20240520135404.1B4181FB2F@orac.inputplus.co.uk> References: <20240520135404.1B4181FB2F@orac.inputplus.co.uk> Message-ID: On 2024-05-20 15:54, Ralph Corderoy wrote: > Doug wrote: >> I commend attention to the LangSec movement, which advocates for >> rigorously enforced separation between legal and illegal inputs. > https://langsec.org > > ‘The Language-theoretic approach (LangSec) regards the Internet > insecurity epidemic as a consequence of ‘ad hoc’ programming of > input handling at all layers of network stacks, and in other kinds > of software stacks. 
LangSec posits that the only path to > trustworthy software that takes untrusted inputs is treating all > valid or expected inputs as a formal language, and the respective > input-handling routines as a ‘recognizer’ for that language. . . . > ‘LangSec helps draw the boundary between protocols and API designs > that can and cannot be secured and implemented securely, and charts > a way to building truly trustworthy protocols and systems. A longer > summary of LangSec in this USENIX Security BoF hand-out, and in the > talks, articles, and papers below.’ Yes, it's an interesting concept. Those *n?x tools that have lex/yacc frontends are probably closer to this than the average hack. It may become hard to reconcile this with the robustness principle (Be conservative in what you send, be liberal in what you accept) that Jon Postel popularized. Maybe it becomes necessary, though. -- Åke Nordin , resident Net/Lunix/telecom geek. Netia Data AB, Stockholm SWEDEN *46#7O466OI99# From paul.winalski at gmail.com Tue May 21 01:43:49 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Mon, 20 May 2024 11:43:49 -0400 Subject: [TUHS] Documentation (was On Bloat and the Idea of Small Specialized Tools) In-Reply-To: References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Message-ID: On Mon, May 20, 2024 at 2:08 AM Adam Thornton wrote: > I can't tell you--although some of you will know--what a delight it is to > be working on a project with an actual documentation engineer. > > That person (Jonathan Sick, if any of you want to hire him) has engineered > things such that it is easy to write good documentation for the projects we > write, and not very onerous. > > Design for documentability, testability, and ease of maintenance are what distinguishes good software engineering from hackery. Back when I worked in DEC's software development tools group, we had professional technical writers who write the manuals and online help text. There was an unexpected (at least by me) benefit during a project's design phase, too. Documentation was written in parallel with the code, so once the user interface specification was arrived at, first order of business was to sit down with the tech writer and explain it to them. Sometimes in the process of doing that youd stop and think, "wait a minute--we don't really want it doing that". Or you'd find that you bhad difficulty articulating exactly how a particular feature behaves. That's a red flag that you've designed the feature to be too obscure and complex, or that there's something flat-out wrong with it. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.winalski at gmail.com Tue May 21 02:06:49 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Mon, 20 May 2024 12:06:49 -0400 Subject: [TUHS] A fuzzy awk. (Was: The 'usage: ...' message.) In-Reply-To: References: Message-ID: On Mon, May 20, 2024 at 9:17 AM Douglas McIlroy < douglas.mcilroy at dartmouth.edu> wrote: > I'm surprised by nonchalance about bad inputs evoking bad program > behavior. That attitude may have been excusable 50 years ago. By now, > though, we have seen so much malicious exploitation of open avenues of > "undefined behavior" that we can no longer ignore bugs that "can't happen > when using the tool correctly". Mature software should not brook incorrect > usage. > > Accepting bad inputs can also lead to security issues. The data breaches from SQL-based attacks are a modern case in point. 
IMO, as a programmer you owe it to your users to do your best to detect bad input and to handle it in a graceful fashion. Nothing is more frustrating to a user than to have a program blow up in their face with a seg fault, or even worse, simply exit silently. As the DEC compiler team's expert on object files, I was called on to add object file support to a compiler back end originally targeted to VMS only. I inherited support of the object file generator for Unix COFF and later wrote the support for Microsoft PECOFF and ELF. When our group was bought by Intel I did the object file support for Apple OS X MACH-O in the Intel compiler back end. I found that the folks who write linkers are particularly lazy about error checking and error handling. They assume that the compiler always generates clean object files. That's OK I suppose if the compiler and linker people are in the same organization. If the linker falls over you can just go down the hall and have the linker developer debug the issue and tell you where you went wrong. But that doesn't work when they work for different companies and the compiler person doesn't have access to the linker sources. I ran into a lot of cases where my buggy object file caused the linker to seg fault or, even worse, simply exit without an error message. I ended up writing a very thorough formatted dumper for each object file format that did very thorough checking for proper syntax and as many semantic errors (e.g., symbol table index number out of range) as I could. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjamin.p.kallus.gr at dartmouth.edu Tue May 21 02:09:54 2024 From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus) Date: Mon, 20 May 2024 12:09:54 -0400 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: References: <20240520135404.1B4181FB2F@orac.inputplus.co.uk> Message-ID: > It may become hard to reconcile this with the robustness principle > (Be conservative in what you send, be liberal in what you accept) > that Jon Postel popularized. Maybe it becomes necessary, though. Yes; the LangSec people essentially reject the robustness principle. See https://langsec.org/papers/postel-patch.pdf -Ben From andrew at humeweb.com Tue May 21 02:37:55 2024 From: andrew at humeweb.com (Andrew Hume) Date: Mon, 20 May 2024 09:37:55 -0700 Subject: [TUHS] Documentation (was On Bloat and the Idea of Small Specialized Tools) In-Reply-To: References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Message-ID: <10EC571B-A75C-47EE-BECC-1B1800B9843C@humeweb.com> > On May 20, 2024, at 8:43 AM, Paul Winalski wrote: > Sometimes in the process of doing that youd stop and think, "wait a minute--we don't really want it doing that". Or you'd find that you bhad difficulty articulating exactly how a particular feature behaves. That's a red flag that you've designed the feature to be too obscure and complex, or that there's something flat-out wrong with it. that’s what i used doug mcilroy for! i especially remember that for mk(1). From woods at robohack.ca Tue May 21 03:30:54 2024 From: woods at robohack.ca (Greg A. Woods) Date: Mon, 20 May 2024 10:30:54 -0700 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: At Mon, 20 May 2024 10:23:56 -0400, Clem Cole wrote: Subject: [TUHS] Re: The 'usage: ...' message. (Was: On Bloat...) > > This brings me back to my experience. 
IMO, auto-correct for programming is > like DWIM all over again, and the cure causes more problems than it solves. We're deep down that rabbit hole this time with LLM/GPT systems generating large swathes of code that I believe all too often gets into production without any human programmer fully vetting its fitness for purpose, or perhaps even understanding it. -- Greg A. Woods Kelowna, BC +1 250 762-7675 RoboHack Planix, Inc. Avoncote Farms -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: OpenPGP Digital Signature URL: From stuff at riddermarkfarm.ca Tue May 21 03:40:52 2024 From: stuff at riddermarkfarm.ca (Stuff Received) Date: Mon, 20 May 2024 13:40:52 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: References: Message-ID: On 2024-05-19 20:58, Rob Pike wrote: > The Cornell PL/I compiler, PL/C, ran on the IBM 360 so of course used > batch input. It tried automatically to keep things running after a > parsing error by inserting some token - semicolon, parenthesis, whatever > seemed best - and continuing to parse, in order to maximize the amount > of input that could be parsed before giving up. At least, that's what I > took the motivation to be. It rarely succeeded in fixing the actual > problem, despite PL/I being plastered with semicolons, but it did tend > to ferret out more errors per run. I found the tactic helpful. > > -rob > Possibly way off topic but Toronto allowed anyone to run PL/C decks for free, which I often did. One day, they decided to allow all of the card to be read as text and my card numbers generated all sorts of errors. (At least easily fixed by a visit the card punch.) S. From ylee at columbia.edu Tue May 21 04:38:06 2024 From: ylee at columbia.edu (Yeechang Lee) Date: Mon, 20 May 2024 11:38:06 -0700 Subject: [TUHS] Documentation (was On Bloat and the Idea of Small Specialized Tools) In-Reply-To: References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> Message-ID: <26187.39054.137077.761468@dobie-old.ylee.org> Paul Winalski says: > Sometimes in the process of doing that youd stop and think, "wait a > minute--we don't really want it doing that".  Or you'd find that you > bhad difficulty articulating exactly how a particular feature > behaves.  That's a red flag that you've designed the feature to be > too obscure and complex, or that there's something flat-out wrong > with it. My understanding is that an unexpected result of the requirement to draft all federal laws in Canada in both English and French is something similar: The discussion process ensuring that a bill's meaning is identical in both languages helps rid the text of ambiguities and errors regardless of language. From phil at ultimate.com Tue May 21 05:27:24 2024 From: phil at ultimate.com (Phil Budne) Date: Mon, 20 May 2024 15:27:24 -0400 Subject: [TUHS] Documentation (was On Bloat and the Idea of Small Specialized Tools) In-Reply-To: <26187.39054.137077.761468@dobie-old.ylee.org> References: <20240518203319.3oAKtOSk@steffen%sdaoden.eu> <26187.39054.137077.761468@dobie-old.ylee.org> Message-ID: <202405201927.44KJROeX064950@ultimate.com> Yeechang Lee: > My understanding is that an unexpected result of the requirement to > draft all federal laws in Canada in both English and French is > something similar: The discussion process ensuring that a bill's > meaning is identical in both languages helps rid the text of > ambiguities and errors regardless of language. 
It always seemed to me that ISO standards were written to be equally incomprehensible in all languages, substituting terms like Protocol Data Unit (PDU) for familiar ones like Packet. In the early Internet, where there wasn't ANY money to be made in antisocial conduct, it was easier to justify sentiments like "Rough consensus and working code" and "be liberal in what you accept". Lest ye forget, "industry standards" were once limited to things like magnetic patterns on half-inch tape and the serial transmission of bits, and at the LOWEST of levels. Reading a tape written on another vendor's system wasn't easy when I got started in the early 80's; In addition to ASCII and EBCDIC, there were still systems with vendor-specific 6-bit character sets, never mind punched cards. I remember going on a campus tour in the late 70's where there was an ASCII terminal hooked up to some system that had BASIC (the standard at the time was ANSI "Minimal BASIC"; a full(er) standard took long enough that it was dead on arrival), but instead of "RETURN" required typing CTRL/C (defined in ASCII as End Of Text) to enter a line! In that context, getting ANYTHING working across vendors was a victory, and having one system refuse to speak to another because of some small detail in what one of them considered reasonable (or not) was asking for trouble. The times and stakes today are distinctly different. From johnl at taugh.com Tue May 21 06:02:26 2024 From: johnl at taugh.com (John Levine) Date: 20 May 2024 16:02:26 -0400 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: Message-ID: <20240520200226.80F428B9493A@ary.qy> It appears that Ben Kallus said: >> It may become hard to reconcile this with the robustness principle >> (Be conservative in what you send, be liberal in what you accept) >> that Jon Postel popularized. Maybe it becomes necessary, though. > >Yes; the LangSec people essentially reject the robustness principle. > >See https://langsec.org/papers/postel-patch.pdf On the contrary, they actually understand it. Postel was widely misunderstood to say that you should try to accept arbitrary garbage. People who knew him tell me that he meant to be liberal when the spec is ambiguous, not to allow stuff that is just wrong. As their quote from RFC 1122 points out, he also said you should be prepared for arbitrary garbage so you can reject it. R's, John From johnl at taugh.com Tue May 21 06:10:58 2024 From: johnl at taugh.com (John Levine) Date: 20 May 2024 16:10:58 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: Message-ID: <20240520201100.50BE18B94A62@ary.qy> It appears that Clem Cole said: >“The PL/C compiler had the unusual capability of never failing to compile >> any program, through the use of extensive automatic correction of many >> syntax errors and by converting any remaining syntax errors to output >> statements.” > >The problem is that people can be lazy, and instead of using " DWIM" as a >tool to speed up their development and fix their own errors, they just >ignore the errors. ... PL/C was a long time ago in the early 1970s. People used it on batch systems whre you handed in your cards at the window, waited a while, and later got your printout back. Or at advanced places, you could run the cards through the reader yourself, then wait until the batch ran. In that environment, the benefit from possibly guessing an error correction right meant fewer trips to the card reader. 
In my youth I did a fair amount of programming that way in WATFOR/WATFIV and Algol W where we really tried to get the programs right since we wanted to finish up and go home. When I was using interactive systems where you could fix one bug and try again, over and over, it seemed like cheating. R's, John From lm at mcvoy.com Tue May 21 06:11:22 2024 From: lm at mcvoy.com (Larry McVoy) Date: Mon, 20 May 2024 13:11:22 -0700 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: <20240520200226.80F428B9493A@ary.qy> References: <20240520200226.80F428B9493A@ary.qy> Message-ID: <20240520201122.GC27662@mcvoy.com> On Mon, May 20, 2024 at 04:02:26PM -0400, John Levine wrote: > It appears that Ben Kallus said: > >> It may become hard to reconcile this with the robustness principle > >> (Be conservative in what you send, be liberal in what you accept) > >> that Jon Postel popularized. Maybe it becomes necessary, though. > > > >Yes; the LangSec people essentially reject the robustness principle. > > > >See https://langsec.org/papers/postel-patch.pdf > > On the contrary, they actually understand it. > > Postel was widely misunderstood to say that you should try to accept > arbitrary garbage. People who knew him tell me that he meant to be > liberal when the spec is ambiguous, not to allow stuff that is just > wrong. As their quote from RFC 1122 points out, he also said you > should be prepared for arbitrary garbage so you can reject it. Yeah, I read the pdf and I took away the same thing as John. -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From benjamin.p.kallus.gr at dartmouth.edu Tue May 21 07:00:40 2024 From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus) Date: Mon, 20 May 2024 17:00:40 -0400 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: <20240520201122.GC27662@mcvoy.com> References: <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com> Message-ID: What I meant was that the LangSec people reject the robustness principle as it is commonly understood (i.e., make a "reasonable" guess when receiving garbage), not necessarily that their view is incompatible with Postel's original vision. This interpretation of the principle is pretty widespread; take a look at the Nginx mailing list if you have any doubt. I attribute this to the same phenomenon that inverted the meaning of REST. -Ben From johnl at taugh.com Tue May 21 07:03:23 2024 From: johnl at taugh.com (John R Levine) Date: 20 May 2024 17:03:23 -0400 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: References: <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com> Message-ID: > What I meant was that the LangSec people reject the robustness > principle as it is commonly understood (i.e., make a "reasonable" > guess when receiving garbage), not necessarily that their view is > incompatible with Postel's original vision. This interpretation of the > principle is pretty widespread; take a look at the Nginx mailing list > if you have any doubt. I attribute this to the same phenomenon that > inverted the meaning of REST. Oh, OK, no disagreement there. I'm as tired as you are of people invoking Postel to excuse slovenly code. Regards, John Levine, johnl at taugh.com, Taughannock Networks, Trumansburg NY Please consider the environment before reading this e-mail. https://jl.ly From lm at mcvoy.com Tue May 21 07:14:38 2024 From: lm at mcvoy.com (Larry McVoy) Date: Mon, 20 May 2024 14:14:38 -0700 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) 
In-Reply-To: References: <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com> Message-ID: <20240520211438.GF27662@mcvoy.com> On Mon, May 20, 2024 at 05:00:40PM -0400, Ben Kallus wrote: > What I meant was that the LangSec people reject the robustness > principle as it is commonly understood (i.e., make a "reasonable" > guess when receiving garbage) That most certainly is not what I took from what Postel said. And I say that as someone who designed a distributed system that had client and server sides and had to make that work across versions from last week to 10-20 years ago. I took it more as "Be more and more careful what you say, get that more correct with each release, but tolerate the less correct stuff you might get from earlier versions". In no way did I think he meant ``make a "reasonable" guess when receiving garbage''. Garbage is garbage, you error on that. -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From benjamin.p.kallus.gr at dartmouth.edu Tue May 21 07:46:48 2024 From: benjamin.p.kallus.gr at dartmouth.edu (Ben Kallus) Date: Mon, 20 May 2024 17:46:48 -0400 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: <20240520211438.GF27662@mcvoy.com> References: <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com> <20240520211438.GF27662@mcvoy.com> Message-ID: My point was that, regardless of Postel's original intent, many people have interpreted his principle to mean that accepting garbage is good. *This* interpretation is incompatible with LangSec. See RFC 9413 for an exploration of the many interpretations of Postel's principle. -Ben From lm at mcvoy.com Tue May 21 07:57:40 2024 From: lm at mcvoy.com (Larry McVoy) Date: Mon, 20 May 2024 14:57:40 -0700 Subject: [TUHS] OT: LangSec (Re: A fuzzy awk.) In-Reply-To: References: <20240520200226.80F428B9493A@ary.qy> <20240520201122.GC27662@mcvoy.com> <20240520211438.GF27662@mcvoy.com> Message-ID: <20240520215740.GG27662@mcvoy.com> Those would be the stupid people and you can't fix stupid. Seriously, people can twist anything into anything. Just because dumb people didn't understand his principle doesn't mean it was a bad principle. On Mon, May 20, 2024 at 05:46:48PM -0400, Ben Kallus wrote: > My point was that, regardless of Postel's original intent, many people > have interpreted his principle to mean that accepting garbage is good. > *This* interpretation is incompatible with LangSec. > > See RFC 9413 for an exploration of the many interpretations of > Postel's principle. > > -Ben -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From cowan at ccil.org Tue May 21 11:14:55 2024 From: cowan at ccil.org (John Cowan) Date: Mon, 20 May 2024 21:14:55 -0400 Subject: [TUHS] The 'usage: ...' message. (Was: On Bloat...) In-Reply-To: <20240520201100.50BE18B94A62@ary.qy> References: <20240520201100.50BE18B94A62@ary.qy> Message-ID: On Mon, May 20, 2024 at 4:11 PM John Levine wrote: It appears that Clem Cole said: > >“The PL/C compiler had the unusual capability of never failing to compile > >> any program, through the use of extensive automatic correction of many > >> syntax errors and by converting any remaining syntax errors to output > >> statements.” > PL/C was a long time ago in the early 1970s. People used it on batch > systems whre you handed in your cards at the window, waited a while, > and later got your printout back. Or at advanced places, you could > run the cards through the reader yourself, then wait until the batch > ran. 
PL/C was a 3rd-generation autocorrection programming language. CORC was the 1962 version and CUPL was the 1966 version (same date as DWIM), neither of them based on PL/I. There is an implementation of both at < http://www.catb.org/~esr/cupl/>. The Wikipedia DWIM article also points to Magit, the Emacs git client. > > In that environment, the benefit from possibly guessing an error > correction right meant fewer trips to the card reader. In my youth I > did a fair amount of programming that way in WATFOR/WATFIV and Algol W > where we really tried to get the programs right since we wanted to > finish up and go home. > > When I was using interactive systems where you could fix one bug and > try again, over and over, it seemed like cheating. > > R's, > John > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robpike at gmail.com Tue May 21 11:56:30 2024 From: robpike at gmail.com (Rob Pike) Date: Tue, 21 May 2024 11:56:30 +1000 Subject: [TUHS] A fuzzy awk. In-Reply-To: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before 1998. And I believe Doug McIlroy was generating random regular expressions to compare different implementations. It's probably impossible to decide who invented fuzzing, so the credit will surely go to the person who named it. -rob On Tue, May 21, 2024 at 12:09 AM Serissa wrote: > Well this is obviously a hot button topic. AFAIK I was nearby when > fuzz-testing for software was invented. I was the main advocate for hiring > Andy Payne into the Digital Cambridge Research Lab. One of his little > projects was a thing that generated random but correct C programs and fed > them to different compilers or compilers with different switches to see if > they crashed or generated incorrect results. Overnight, his tester filed > 300 or so bug reports against the Digital C compiler. This was met with > substantial pushback, but it was a mostly an issue that many of the reports > traced to the same underlying bugs. > > Bill McKeemon expanded the technique and published "Differential Testing > of Software" > https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf > > Andy had encountered the underlying idea while working as an intern on the > Alpha processor development team. Among many other testers, they used an > architectural tester called REX to generate more or less random sequences > of instructions, which were then run through different simulation chains > (functional, RTL, cycle-accurate) to see if they did the same thing. > Finding user-accessible bugs in hardware seems like a good thing. > > The point of generating correct programs (mentioned under the term LangSec > here) goes a long way to avoid irritating the maintainers. Making the test > cases short is also maintainer-friendly. The test generator is also in a > position to annotate the source with exactly what it is supposed to do, > which is also helpful. > > -L > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lm at mcvoy.com Tue May 21 12:47:43 2024 From: lm at mcvoy.com (Larry McVoy) Date: Mon, 20 May 2024 19:47:43 -0700 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: <20240521024743.GE25728@mcvoy.com> I think the title might go to my OS prof, Bart Miller. 
He did a paper https://www.paradyn.org/papers/fuzz.pdf that named it that in 1990. On Tue, May 21, 2024 at 11:56:30AM +1000, Rob Pike wrote: > Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before > 1998. And I believe Doug McIlroy was generating random regular expressions > to compare different implementations. It's probably impossible to decide > who invented fuzzing, so the credit will surely go to the person who named > it. > > -rob > > > On Tue, May 21, 2024 at 12:09???AM Serissa wrote: > > > Well this is obviously a hot button topic. AFAIK I was nearby when > > fuzz-testing for software was invented. I was the main advocate for hiring > > Andy Payne into the Digital Cambridge Research Lab. One of his little > > projects was a thing that generated random but correct C programs and fed > > them to different compilers or compilers with different switches to see if > > they crashed or generated incorrect results. Overnight, his tester filed > > 300 or so bug reports against the Digital C compiler. This was met with > > substantial pushback, but it was a mostly an issue that many of the reports > > traced to the same underlying bugs. > > > > Bill McKeemon expanded the technique and published "Differential Testing > > of Software" > > https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf > > > > Andy had encountered the underlying idea while working as an intern on the > > Alpha processor development team. Among many other testers, they used an > > architectural tester called REX to generate more or less random sequences > > of instructions, which were then run through different simulation chains > > (functional, RTL, cycle-accurate) to see if they did the same thing. > > Finding user-accessible bugs in hardware seems like a good thing. > > > > The point of generating correct programs (mentioned under the term LangSec > > here) goes a long way to avoid irritating the maintainers. Making the test > > cases short is also maintainer-friendly. The test generator is also in a > > position to annotate the source with exactly what it is supposed to do, > > which is also helpful. > > > > -L > > > > > > -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From stewart at serissa.com Tue May 21 12:54:36 2024 From: stewart at serissa.com (Lawrence Stewart) Date: Mon, 20 May 2024 22:54:36 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: <20240521024743.GE25728@mcvoy.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <20240521024743.GE25728@mcvoy.com> Message-ID: Good to learn more of the history! I wonder when the technique got started on the hardware side? I wouldn’t be surprised if IBM were doing some of this for the S/360 since it was a nearly compatible set of systems. -L > On May 20, 2024, at 10:47 PM, Larry McVoy wrote: > > I think the title might go to my OS prof, Bart Miller. He did a paper > > https://www.paradyn.org/papers/fuzz.pdf > > that named it that in 1990. > > On Tue, May 21, 2024 at 11:56:30AM +1000, Rob Pike wrote: >> Ron Hardin was doing this to Dennis's C compiler in the 1980s, well before >> 1998. And I believe Doug McIlroy was generating random regular expressions >> to compare different implementations. It's probably impossible to decide >> who invented fuzzing, so the credit will surely go to the person who named >> it. >> >> -rob >> >> >> On Tue, May 21, 2024 at 12:09???AM Serissa wrote: >> >>> Well this is obviously a hot button topic. 
AFAIK I was nearby when >>> fuzz-testing for software was invented. I was the main advocate for hiring >>> Andy Payne into the Digital Cambridge Research Lab. One of his little >>> projects was a thing that generated random but correct C programs and fed >>> them to different compilers or compilers with different switches to see if >>> they crashed or generated incorrect results. Overnight, his tester filed >>> 300 or so bug reports against the Digital C compiler. This was met with >>> substantial pushback, but it was a mostly an issue that many of the reports >>> traced to the same underlying bugs. >>> >>> Bill McKeemon expanded the technique and published "Differential Testing >>> of Software" >>> https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf >>> >>> Andy had encountered the underlying idea while working as an intern on the >>> Alpha processor development team. Among many other testers, they used an >>> architectural tester called REX to generate more or less random sequences >>> of instructions, which were then run through different simulation chains >>> (functional, RTL, cycle-accurate) to see if they did the same thing. >>> Finding user-accessible bugs in hardware seems like a good thing. >>> >>> The point of generating correct programs (mentioned under the term LangSec >>> here) goes a long way to avoid irritating the maintainers. Making the test >>> cases short is also maintainer-friendly. The test generator is also in a >>> position to annotate the source with exactly what it is supposed to do, >>> which is also helpful. >>> >>> -L >>> >>> >>> > > -- > --- > Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From robpike at gmail.com Tue May 21 13:36:13 2024 From: robpike at gmail.com (Rob Pike) Date: Tue, 21 May 2024 13:36:13 +1000 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <20240521024743.GE25728@mcvoy.com> Message-ID: Eventually Dennis told Ron to stop as he wasn't interested in protecting against insane things like "unsigned register union". Now that computing has become more adversarial, he might feel differently. -rob -------------- next part -------------- An HTML attachment was scrubbed... URL: From ggm at algebras.org Tue May 21 13:53:35 2024 From: ggm at algebras.org (George Michaelson) Date: Tue, 21 May 2024 13:53:35 +1000 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: On Tue, May 21, 2024 at 11:56 AM Rob Pike wrote: >It's probably impossible to decide who invented fuzzing, so the credit will surely go to the person who named it. That theory probably applies to the Earl of Sandwich, And the Earl of Cardigan. Hoare Belisha also did ok for giant orange balls at zebra crossings. I'm less sure the Earl of Zebra feels recognised, or that Eugène-René Poubelle feels happy with his namesake (he should do, dust bins are huge) From tuhs at tuhs.org Tue May 21 21:59:50 2024 From: tuhs at tuhs.org (=?utf-8?b?UGV0ZXIgV2VpbmJlcmdlciAo5rip5Y2a5qC8KSB2aWEgVFVIUw==?=) Date: Tue, 21 May 2024 07:59:50 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <20240521024743.GE25728@mcvoy.com> Message-ID: On a lesser note, one day I got tired of C compiler crashes (probably on the Vax, possibly originating in my code generator) and converted them into 'fatal internal error' messages. 
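[Aside for illustration: the general trick of turning a would-be core dump into a clean diagnostic is easy to sketch in C -- catch the signals a wild dereference raises and exit with a message. A generic example only, not the compiler change described above.]

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void internal_error(int sig)
{
    /* fprintf is not strictly async-signal-safe; fine for a sketch. */
    fprintf(stderr, "cc: fatal internal error (signal %d)\n", sig);
    exit(1);
}

int main(void)
{
    signal(SIGSEGV, internal_error);
    signal(SIGBUS, internal_error);

    /* ... parse, optimize, generate code ... */
    volatile char *broken = NULL;   /* stand-in for a code-generator bug */
    return broken[0];               /* caught and reported, not dumped core */
}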
On Mon, May 20, 2024 at 11:36 PM Rob Pike wrote: > > Eventually Dennis told Ron to stop as he wasn't interested in protecting against insane things like "unsigned register union". Now that computing has become more adversarial, he might feel differently. > > -rob > From paul.winalski at gmail.com Wed May 22 02:59:38 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Tue, 21 May 2024 12:59:38 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: On Tue, May 21, 2024 at 12:09 AM Serissa wrote: > Well this is obviously a hot button topic. AFAIK I was nearby when >> fuzz-testing for software was invented. I was the main advocate for hiring >> Andy Payne into the Digital Cambridge Research Lab. One of his little >> projects was a thing that generated random but correct C programs and fed >> them to different compilers or compilers with different switches to see if >> they crashed or generated incorrect results. Overnight, his tester filed >> 300 or so bug reports against the Digital C compiler. This was met with >> substantial pushback, but it was a mostly an issue that many of the reports >> traced to the same underlying bugs. >> >> Bill McKeemon expanded the technique and published "Differential Testing >> of Software" >> https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf >> > In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product teams to introduce fuzz testing into our testing process. As with the C compiler work at DEC Cambridge, fuzz testing for other compilers (Fortran, PL/I) also found large numbers of bugs. The pushback from the compiler folks was mainly a matter of priorities. Fuzz testing is very adept at finding edge conditions, but most failing fuzz tests have syntax that no human programmer would ever write. As a compiler engineer you have limited time to devote to bug testing. Do you spend that time addressing real customer issues that have been reported or do you spend it fixing problems with code that no human being would ever write? To take an example that really happened, a fuzz test consisting of 100 nested parentheses caused an overflow in a parser table (it could only handle 50 nested parens). Is that worth fixing? As you pointed out, fuzz test failures tend to occur in clusters and many of the failures eventually are traced to the same underlying bug. Which leads to the counter-argument to the pushback. The fuzz tests are finding real underlying bugs. Why not fix them before a customer runs into them? That very thing did happen several times. A customer-reported bug was fixed and suddenly several of the fuzz test problems that had been reported went away. Another consideration is that, even back in the 1980s, humans weren't the only ones writing programs. There were programs writing programs and they sometimes produced bizarre (but syntactically correct) code. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuhs at tuhs.org Wed May 22 03:56:03 2024 From: tuhs at tuhs.org (segaloco via TUHS) Date: Tue, 21 May 2024 17:56:03 +0000 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: On Tuesday, May 21st, 2024 at 9:59 AM, Paul Winalski wrote: > On Tue, May 21, 2024 at 12:09 AM Serissa wrote: > > > > Well this is obviously a hot button topic. AFAIK I was nearby when fuzz-testing for software was invented. 
I was the main advocate for hiring Andy Payne into the Digital Cambridge Research Lab. One of his little projects was a thing that generated random but correct C programs and fed them to different compilers or compilers with different switches to see if they crashed or generated incorrect results. Overnight, his tester filed 300 or so bug reports against the Digital C compiler. This was met with substantial pushback, but it was a mostly an issue that many of the reports traced to the same underlying bugs. > > > > > > Bill McKeemon expanded the technique and published "Differential Testing of Software" https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf > > In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product teams to introduce fuzz testing into our testing process. As with the C compiler work at DEC Cambridge, fuzz testing for other compilers (Fortran, PL/I) also found large numbers of bugs. > > The pushback from the compiler folks was mainly a matter of priorities. Fuzz testing is very adept at finding edge conditions, but most failing fuzz tests have syntax that no human programmer would ever write. As a compiler engineer you have limited time to devote to bug testing. Do you spend that time addressing real customer issues that have been reported or do you spend it fixing problems with code that no human being would ever write? To take an example that really happened, a fuzz test consisting of 100 nested parentheses caused an overflow in a parser table (it could only handle 50 nested parens). Is that worth fixing? > > As you pointed out, fuzz test failures tend to occur in clusters and many of the failures eventually are traced to the same underlying bug. Which leads to the counter-argument to the pushback. The fuzz tests are finding real underlying bugs. Why not fix them before a customer runs into them? That very thing did happen several times. A customer-reported bug was fixed and suddenly several of the fuzz test problems that had been reported went away. Another consideration is that, even back in the 1980s, humans weren't the only ones writing programs. There were programs writing programs and they sometimes produced bizarre (but syntactically correct) code. > > -Paul W. A happy medium could be including far-out fuzzing to characterize issues, but not necessarily then immediately sink the resources into resolving bizarre discoveries from the fuzzing. Better to know then not but also have the wisdom to determine "is someone actually going to trip this" vs. "this is something that is possible and good to document". In my own work we have several of the latter where something is almost guaranteed to never happen with a human interaction, but is also something we want documented somewhere so if unlikely problem ever does happen, the discovery is already done and we just start plotting out a solution. That's also some nice low hanging fruit to pluck when there isn't much else going on, but avoids the phenomenon where we sink critical time into bugfixes with a microscopic ROI. - Matt G. From luther.johnson at makerlisp.com Wed May 22 04:12:29 2024 From: luther.johnson at makerlisp.com (Luther Johnson) Date: Tue, 21 May 2024 11:12:29 -0700 Subject: [TUHS] A fuzzy awk. 
In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: I like this anecdote because it points out the difference between being able to handle and process bizarre conditions, as if they were something that should work, which is maybe not that helpful, vs. detecting them and doing something reasonable, like failiing with a "limit exceeded" message. A silent, insidious failure down the line because a limit was exceeded is never good. If "fuzz testing" helps exercise limits and identifies places where software hasn't realized it has exceeded its limits, has run off the end of a table, etc., that seems like a good thing to me. On 05/21/2024 09:59 AM, Paul Winalski wrote: > On Tue, May 21, 2024 at 12:09 AM Serissa > wrote: > > Well this is obviously a hot button topic. AFAIK I was nearby > when fuzz-testing for software was invented. I was the main > advocate for hiring Andy Payne into the Digital Cambridge > Research Lab. One of his little projects was a thing that > generated random but correct C programs and fed them to > different compilers or compilers with different switches to > see if they crashed or generated incorrect results. > Overnight, his tester filed 300 or so bug reports against the > Digital C compiler. This was met with substantial pushback, > but it was a mostly an issue that many of the reports traced > to the same underlying bugs. > > Bill McKeemon expanded the technique and published > "Differential Testing of Software" > https://www.cs.swarthmore.edu/~bylvisa1/cs97/f13/Papers/DifferentialTestingForSoftware.pdf > > > In the mid-late 1980s Bill Mckeeman worked with DEC's compiler product > teams to introduce fuzz testing into our testing process. As with the > C compiler work at DEC Cambridge, fuzz testing for other compilers > (Fortran, PL/I) also found large numbers of bugs. > > The pushback from the compiler folks was mainly a matter of > priorities. Fuzz testing is very adept at finding edge conditions, > but most failing fuzz tests have syntax that no human programmer would > ever write. As a compiler engineer you have limited time to devote to > bug testing. Do you spend that time addressing real customer issues > that have been reported or do you spend it fixing problems with code > that no human being would ever write? To take an example that really > happened, a fuzz test consisting of 100 nested parentheses caused an > overflow in a parser table (it could only handle 50 nested parens). > Is that worth fixing? > > As you pointed out, fuzz test failures tend to occur in clusters and > many of the failures eventually are traced to the same underlying > bug. Which leads to the counter-argument to the pushback. The fuzz > tests are finding real underlying bugs. Why not fix them before a > customer runs into them? That very thing did happen several times. A > customer-reported bug was fixed and suddenly several of the fuzz test > problems that had been reported went away. Another consideration is > that, even back in the 1980s, humans weren't the only ones writing > programs. There were programs writing programs and they sometimes > produced bizarre (but syntactically correct) code. > > -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave at horsfall.org Wed May 22 13:26:36 2024 From: dave at horsfall.org (Dave Horsfall) Date: Wed, 22 May 2024 13:26:36 +1000 (EST) Subject: [TUHS] A fuzzy awk. 
In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: On Tue, 21 May 2024, Paul Winalski wrote: > To take an example that really happened, a fuzz test consisting of 100 > nested parentheses caused an overflow in a parser table (it could only > handle 50 nested parens).  Is that worth fixing? Well, they could be a rabid LISP programmer... -- Dave From flexibeast at gmail.com Wed May 22 15:08:29 2024 From: flexibeast at gmail.com (Alexis) Date: Wed, 22 May 2024 15:08:29 +1000 Subject: [TUHS] A fuzzy awk. In-Reply-To: (Dave Horsfall's message of "Wed, 22 May 2024 13:26:36 +1000 (EST)") References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: <875xv6bfhu.fsf@gmail.com> Dave Horsfall writes: > On Tue, 21 May 2024, Paul Winalski wrote: > >> To take an example that really happened, a fuzz test consisting >> of 100 >> nested parentheses caused an overflow in a parser table (it >> could only >> handle 50 nested parens).  Is that worth fixing? > > Well, they could be a rabid LISP programmer... Just did a quick check of some of the ELisp packages on my system: * For my own packages, the maximum was 10 closing parentheses. * For the packages in my elpa/ directory, the maximum was 26 in ducpel-glyphs.el, where they were part of a glyph, rather than delimiting code. The next highest value was 16, in org.el and magit-sequence.el. i would suggest that any Lisp with more than a couple of dozen closing parentheses is in dire need of refactoring. Although of course someone who's rabid is probably not in the appropriate mental state for that. :-) Alexis. From jpl.jpl at gmail.com Wed May 22 22:20:38 2024 From: jpl.jpl at gmail.com (John P. Linderman) Date: Wed, 22 May 2024 08:20:38 -0400 Subject: [TUHS] Gordon Bell has died Message-ID: https://www.nytimes.com/2024/05/21/technology/c-gordon-bell-dead.html?unlocked_article_code=1.t00.arl-.blsWtHq8G62d&smid=url-share -------------- next part -------------- An HTML attachment was scrubbed... URL: From imp at bsdimp.com Wed May 22 23:12:36 2024 From: imp at bsdimp.com (Warner Losh) Date: Wed, 22 May 2024 07:12:36 -0600 Subject: [TUHS] A fuzzy awk. In-Reply-To: <875xv6bfhu.fsf@gmail.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <875xv6bfhu.fsf@gmail.com> Message-ID: On Tue, May 21, 2024, 11:08 PM Alexis wrote: > Dave Horsfall writes: > > > On Tue, 21 May 2024, Paul Winalski wrote: > > > >> To take an example that really happened, a fuzz test consisting > >> of 100 > >> nested parentheses caused an overflow in a parser table (it > >> could only > >> handle 50 nested parens). Is that worth fixing? > > > > Well, they could be a rabid LISP programmer... > > Just did a quick check of some of the ELisp packages on my system: > > * For my own packages, the maximum was 10 closing parentheses. > * For the packages in my elpa/ directory, the maximum was 26 in > ducpel-glyphs.el, where they were part of a glyph, rather than > delimiting code. The next highest value was 16, in org.el and > magit-sequence.el. > > i would suggest that any Lisp with more than a couple of dozen > closing parentheses is in dire need of refactoring. Although of > course someone who's rabid is probably not in the appropriate > mental state for that. :-) > That's what ']' is for. Warner > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arnold at skeeve.com Wed May 22 23:44:14 2024 From: arnold at skeeve.com (arnold at skeeve.com) Date: Wed, 22 May 2024 07:44:14 -0600 Subject: [TUHS] A fuzzy awk. In-Reply-To: <20240520134155.7A06E1FB2F@orac.inputplus.co.uk> References: <502a5f3c-6bd3-4fe8-993c-5351c07e33cd@case.edu> <20240520134155.7A06E1FB2F@orac.inputplus.co.uk> Message-ID: <202405221344.44MDiEGJ326164@freefriends.org> I've been travelling, so I haven't been able to answer these mails until now. Ralph Corderoy wrote: > I can see an avalanche of errors in an earlier gawk caused problems, but > each time there would have been a first patch of the input which made > a mistake causing the pebble to start rolling. My understanding is that > there was potentially a lot of these and rather than fix them it was > more productive of the limited time to stop patching the input. Then > the code which patched could be deleted, getting rid of the buggy bits > along the way? That's not the case. Gawk didn't try to patch the input. It simply set a flag saying "don't try to run" but kept on parsing anyway, in the hope of finding more errors. That was a bad idea, because the representation of the program being built was then not in the correct state to have more stuff parsed and converted into byte code. Very early on, the first parse error caused an exit. I changed it to keep going to try to be helpful. But when that became a source for essentially specious bug reports and a time sink for me, it became time to go back to exiting on the first problem. HTH, Arnold From paul.winalski at gmail.com Thu May 23 01:37:39 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Wed, 22 May 2024 11:37:39 -0400 Subject: [TUHS] A fuzzy awk. In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: On Tue, May 21, 2024 at 2:12 PM Luther Johnson wrote: > I like this anecdote because it points out the difference between being > able to handle and process bizarre conditions, as if they were something > that should work, which is maybe not that helpful, vs. detecting them and > doing something reasonable, like failiing with a "limit exceeded" message > That is in fact precisely how the DEC compiler handled the 100 nested parentheses condition. > . A silent, insidious failure down the line because a limit was exceeded > is never good. > Amen! One should always do bounds checking when dealing with fixed-size aggregate data structures. One compiler that I worked on got a bug report of bad code being generated. The problem was an illegal optimization that never should have triggered but did due to a corrupted data table. Finding the culprit of the corruption took hours. It finally turned out to be due to overflow of an adjacent data table in use elsewhere in the compiler. The routine to add another entry to that table didn't check for table overflow. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lm at mcvoy.com Thu May 23 04:49:04 2024 From: lm at mcvoy.com (Larry McVoy) Date: Wed, 22 May 2024 11:49:04 -0700 Subject: [TUHS] A fuzzy awk. 
In-Reply-To: References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> Message-ID: <20240522184904.GK25728@mcvoy.com> On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote: > On Tue, May 21, 2024 at 2:12???PM Luther Johnson > wrote: > > > I like this anecdote because it points out the difference between being > > able to handle and process bizarre conditions, as if they were something > > that should work, which is maybe not that helpful, vs. detecting them and > > doing something reasonable, like failiing with a "limit exceeded" message > > > That is in fact precisely how the DEC compiler handled the 100 nested > parentheses condition. > > > . A silent, insidious failure down the line because a limit was exceeded > > is never good. > > > Amen! One should always do bounds checking when dealing with fixed-size > aggregate data structures. One compiler that I worked on got a bug report > of bad code being generated. The problem was an illegal optimization that > never should have triggered but did due to a corrupted data table. Finding > the culprit of the corruption took hours. It finally turned out to be due > to overflow of an adjacent data table in use elsewhere in the compiler. > The routine to add another entry to that table didn't check for table > overflow. We invented a data structure that gets around this problem nicely. It's an array of pointers that starts at [1] instead of [0]. The [0] entry encodes 2 things: In the upper bits, the log(2) the size of the array. So all arrays have at least [0] and [1]. So 2 pointers is the smallest array and that was important to us, we wanted it to scale up and scale down. In the lower bits, we record the number of used entries in the array. We assumed 32 bit pointers and with those we got ~134 million entries as our maximum number of entries. Usage is like char **space = allocLines(4); // start with space for 4 entries space = addLine(space, "I am [1]"); space = addLine(space, "I am [2]"); space = addLine(space, "I am [3]"); space = addLine(space, "I am [4]"); // realloc's to 8 entries freelines(space, 0); // second arg is typically 0 or free() It works GREAT. We used it all over BitKeeper, for stuff as small as commit comments to arrays of data structures. It scales down, scales up. Helper functions: /* * liblines - interfaces for autoexpanding data structures * * s= allocLines(n) * pre allocate space for slightly less than N entries. * s = addLine(s, line) * add line to s, allocating as needed. * line must be a pointer to preallocated space. * freeLines(s, freep) * free the lines array; if freep is set, call that on each entry. * if freep is 0, do not free each entry. * buf = popLine(s) * return the most recently added line (not an alloced copy of it) * reverseLines(s) * reverse the order of the lines in the array * sortLines(space, compar) * sort the lines using the compar function if set, else string_sort() * removeLine(s, which, freep) * look for all lines which match "which" and remove them from the array * returns number of matches found * removeLineN(s, i, freep) * remove the 'i'th line. * lines = splitLine(buf, delim, lines) * split buf on any/all chars in delim and put the tokens in lines. * buf = joinLines(":", s) * return one string which is all the strings glued together with ":" * does not free s, caller must free s. * buf = findLine(lines, needle); * Return the index the line in lines that matches needle */ It's all open source, apache licensed, but you'd have to tease it out of the bitkeeper source tree. 
Wouldn't be that hard and it would be useful. From lm at mcvoy.com Thu May 23 06:17:53 2024 From: lm at mcvoy.com (Larry McVoy) Date: Wed, 22 May 2024 13:17:53 -0700 Subject: [TUHS] A fuzzy awk. In-Reply-To: <20240522184904.GK25728@mcvoy.com> References: <51CC9A0D-122C-4A3D-8BAF-C249489FB817@serissa.com> <20240522184904.GK25728@mcvoy.com> Message-ID: <20240522201753.GL25728@mcvoy.com> Wayne teased this into a stand alone library here: https://github.com/wscott/bksupport On Wed, May 22, 2024 at 11:49:04AM -0700, Larry McVoy wrote: > On Wed, May 22, 2024 at 11:37:39AM -0400, Paul Winalski wrote: > > On Tue, May 21, 2024 at 2:12???PM Luther Johnson > > wrote: > > > > > I like this anecdote because it points out the difference between being > > > able to handle and process bizarre conditions, as if they were something > > > that should work, which is maybe not that helpful, vs. detecting them and > > > doing something reasonable, like failiing with a "limit exceeded" message > > > > > That is in fact precisely how the DEC compiler handled the 100 nested > > parentheses condition. > > > > > . A silent, insidious failure down the line because a limit was exceeded > > > is never good. > > > > > Amen! One should always do bounds checking when dealing with fixed-size > > aggregate data structures. One compiler that I worked on got a bug report > > of bad code being generated. The problem was an illegal optimization that > > never should have triggered but did due to a corrupted data table. Finding > > the culprit of the corruption took hours. It finally turned out to be due > > to overflow of an adjacent data table in use elsewhere in the compiler. > > The routine to add another entry to that table didn't check for table > > overflow. > > We invented a data structure that gets around this problem nicely. It's > an array of pointers that starts at [1] instead of [0]. The [0] > entry encodes 2 things: > > In the upper bits, the log(2) the size of the array. So all arrays > have at least [0] and [1]. So 2 pointers is the smallest array and > that was important to us, we wanted it to scale up and scale down. > > In the lower bits, we record the number of used entries in the array. > We assumed 32 bit pointers and with those we got ~134 million entries > as our maximum number of entries. > > Usage is like > > char **space = allocLines(4); // start with space for 4 entries > > space = addLine(space, "I am [1]"); > space = addLine(space, "I am [2]"); > space = addLine(space, "I am [3]"); > space = addLine(space, "I am [4]"); // realloc's to 8 entries > > freelines(space, 0); // second arg is typically 0 or free() > > It works GREAT. We used it all over BitKeeper, for stuff as small as > commit comments to arrays of data structures. It scales down, scales > up. Helper functions: > > /* > * liblines - interfaces for autoexpanding data structures > * > * s= allocLines(n) > * pre allocate space for slightly less than N entries. > * s = addLine(s, line) > * add line to s, allocating as needed. > * line must be a pointer to preallocated space. > * freeLines(s, freep) > * free the lines array; if freep is set, call that on each entry. > * if freep is 0, do not free each entry. 
> * buf = popLine(s) > * return the most recently added line (not an alloced copy of it) > * reverseLines(s) > * reverse the order of the lines in the array > * sortLines(space, compar) > * sort the lines using the compar function if set, else string_sort() > * removeLine(s, which, freep) > * look for all lines which match "which" and remove them from the array > * returns number of matches found > * removeLineN(s, i, freep) > * remove the 'i'th line. > * lines = splitLine(buf, delim, lines) > * split buf on any/all chars in delim and put the tokens in lines. > * buf = joinLines(":", s) > * return one string which is all the strings glued together with ":" > * does not free s, caller must free s. > * buf = findLine(lines, needle); > * Return the index the line in lines that matches needle > */ > > It's all open source, apache licensed, but you'd have to tease it out of > the bitkeeper source tree. Wouldn't be that hard and it would be useful. -- --- Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat From douglas.mcilroy at dartmouth.edu Thu May 23 23:49:18 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Thu, 23 May 2024 09:49:18 -0400 Subject: [TUHS] A fuzzy awk Message-ID: > Doug McIlroy was generating random regular expressions Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand. The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good. It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.) Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From jefftwopointzero at gmail.com Fri May 24 04:55:16 2024 From: jefftwopointzero at gmail.com (Jeffrey Joshua Rollin) Date: Thu, 23 May 2024 19:55:16 +0100 Subject: [TUHS] Gordon Bell has died In-Reply-To: References: Message-ID: <5738D239-4D89-4742-A30F-A0CCB1288780@gmail.com> > On 22 May 2024, at 13:20, John P. Linderman wrote: > > https://www.nytimes.com/2024/05/21/technology/c-gordon-bell-dead.html?unlocked_article_code=1.t00.arl-.blsWtHq8G62d&smid=url-share Very sad news. Jeff. -------------- next part -------------- An HTML attachment was scrubbed... URL: From will.senn at gmail.com Fri May 24 04:58:22 2024 From: will.senn at gmail.com (Will Senn) Date: Thu, 23 May 2024 13:58:22 -0500 Subject: [TUHS] Running v7 in Open-SIMH - update for 2024 Message-ID: All, I can't believe it's been 9 years since I wrote up my original notes on getting Research Unix v7 running in SIMH. Crazy how time flies. Well, this past week Clem found a bug in my scripts that create tape images. It seem like they were missing a tape mark at the end. Not a showstopper by any means, but we like to keep a clean house. 
So, I applied his fixes and updated the scripts along with the resultant tape image and Warren has updated them in the archive: https://www.tuhs.org/Archive/Distributions/Research/Keith_Bostic_v7/ I've also updated the note to address the fixes, to use the latest version of Open-SIMH on Linux Mint 21.3 "Virginia" (my host of choice these days), and to bring the transcripts up to date: https://decuser.github.io/unix/research-unix/v7/2024/05/23/research-unix-v7-3.2.html Later, Will -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Fri May 24 05:01:55 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Thu, 23 May 2024 14:01:55 -0500 Subject: [TUHS] Running v7 in Open-SIMH - update for 2024 In-Reply-To: References: Message-ID: <20240523190155.rexm26o5rqegvc7u@illithid> Hi Will, At 2024-05-23T13:58:22-0500, Will Senn wrote: > I can't believe it's been 9 years since I wrote up my original notes > on getting Research Unix v7 running in SIMH. Crazy how time flies. > Well, this past week Clem found a bug in my scripts that create tape > images. It seem like they were missing a tape mark at the end. Not a > showstopper by any means, but we like to keep a clean house. So, I > applied his fixes and updated the scripts along with the resultant > tape image [...] I'd like to join the many people who have previously thanked you for this work. Your resource made V7 Unix troff and nroff accessible to me, and that access has been invaluable to me in my efforts on groff. Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From will.senn at gmail.com Fri May 24 06:00:18 2024 From: will.senn at gmail.com (Will Senn) Date: Thu, 23 May 2024 15:00:18 -0500 Subject: [TUHS] Running v7 in Open-SIMH - update for 2024 In-Reply-To: <20240523190155.rexm26o5rqegvc7u@illithid> References: <20240523190155.rexm26o5rqegvc7u@illithid> Message-ID: <8ee116d2-c8b0-467b-a8c1-33ea0aa7a081@gmail.com> Hi Branden, On 5/23/24 2:01 PM, G. Branden Robinson wrote: > Hi Will, > > At 2024-05-23T13:58:22-0500, Will Senn wrote: >> I can't believe it's been 9 years > [...] > > I'd like to join the many people who have previously thanked you for > this work. Your resource made V7 Unix troff and nroff accessible to me, > and that access has been invaluable to me in my efforts on groff. > > Regards, > Branden Aw, you definitely made my day. As a regular user of groff, I am thrilled to have helped, even in this small way. Will From clemc at ccc.com Fri May 24 06:48:53 2024 From: clemc at ccc.com (Clem Cole) Date: Thu, 23 May 2024 16:48:53 -0400 Subject: [TUHS] Running v7 in Open-SIMH - update for 2024 In-Reply-To: References: Message-ID: FYI - POR is to push some new tools I have been creating into OpenSIMH shortly. In fairness to Will, this is in the class of a "2-minute minor," not a "4-minute major." I back into this issue as I was working on Oscar's new PiDP-10 and moving a very old (v6 syntax) UNIXC program that manipulates PDP-10 backup and TOPS-20 Dumper images. PDP-10s do things in 36 bits, which does not map cleanly to the 8 data bits of a 9-track tape (you don't want to know what the 10 does unless you have to deal with it). So, I wrote some tools to better examine and flexibly manipulate TAP files [the debug code for tapes in SIMH is a bit of a mess]. 
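By way of illustration only -- this is not tap_decode(1), and the tool name below is made up -- a minimal scanner for that container format might look like the following sketch. It assumes the common SIMH TAP layout: each record is stored as a 32-bit little-endian byte count, the data padded to an even length, and the count repeated; a zero count is a tape mark and 0xFFFFFFFF marks end of medium.

/*
 * tapscan: sketch of a SIMH .tap walker (assumed layout as above).
 */
#include <stdio.h>

static int getword(FILE *f, unsigned long *w)
{
    int i, c;

    *w = 0;
    for (i = 0; i < 4; i++) {
        if ((c = getc(f)) == EOF)
            return 0;
        *w |= (unsigned long)c << (8 * i);   /* little-endian */
    }
    return 1;
}

int main(int argc, char **argv)
{
    FILE *f;
    unsigned long n, trailer;
    long recs = 0, marks = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: tapscan file.tap\n");
        return 1;
    }
    if ((f = fopen(argv[1], "rb")) == NULL) {
        perror(argv[1]);
        return 1;
    }
    while (getword(f, &n)) {
        if (n == 0) {                        /* tape mark */
            printf("tape mark\n");
            marks++;
        } else if (n == 0xFFFFFFFFUL) {      /* end of medium */
            printf("end of medium\n");
            break;
        } else {                             /* data record: skip padded data, check trailer */
            if (fseek(f, (long)((n + 1) & ~1UL), SEEK_CUR) != 0 ||
                !getword(f, &trailer) || trailer != n) {
                fprintf(stderr, "damaged record after %ld good records\n", recs);
                return 1;
            }
            recs++;
        }
    }
    printf("%ld data records, %ld tape marks\n", recs, marks);
    fclose(f);
    return 0;
}

Run against an image that is missing its closing tape marks, a walker like this makes the omission obvious from the summary line alone.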
Anyway, as I was testing something, I thought I had made an error in my new tap_decode(1) tool when I was looking at the v7.tap.gz file that Warren has in the TUHS archives (that Will supplied/created with his mktape scripts). When I looked more carefully, it was missing a record. It turns out SIMH will silently "attach" a TAP image without a proper 9-track logical end-of-tape (it should give a warning). It also turns out Will's directions never looked for the actual 9-track EOT records - so nobody ever saw this. I mentioned it to him quietly - cudo's for coming clean. FWIW: I always recommend Will's documents for V6 and V7 (in fact, we point to them in the OpenSIMH archives at my suggestion). The truth is, I wish we had had access to a few more that are as good as Will's for some of the other OSses. Clem ᐧ On Thu, May 23, 2024 at 2:58 PM Will Senn wrote: > All, > > I can't believe it's been 9 years since I wrote up my original notes on > getting Research Unix v7 running in SIMH. Crazy how time flies. Well, this > past week Clem found a bug in my scripts that create tape images. It seem > like they were missing a tape mark at the end. Not a showstopper by any > means, but we like to keep a clean house. So, I applied his fixes and > updated the scripts along with the resultant tape image and Warren has > updated them in the archive: > > https://www.tuhs.org/Archive/Distributions/Research/Keith_Bostic_v7/ > > I've also updated the note to address the fixes, to use the latest version > of Open-SIMH on Linux Mint 21.3 "Virginia" (my host of choice these days), > and to bring the transcripts up to date: > > > https://decuser.github.io/unix/research-unix/v7/2024/05/23/research-unix-v7-3.2.html > > Later, > > Will > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robpike at gmail.com Fri May 24 06:52:35 2024 From: robpike at gmail.com (Rob Pike) Date: Fri, 24 May 2024 06:52:35 +1000 Subject: [TUHS] A fuzzy awk In-Reply-To: References: Message-ID: The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something. -rob On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy < douglas.mcilroy at dartmouth.edu> wrote: > > Doug McIlroy was generating random regular expressions > > Actually not. I exhaustively (within limits) tested an RE recognizer > without knowingly generating any RE either mechanically or by hand. > > The trick: From recursive equations (easily derived from the grammar of > REs), I counted how many REs exist up to various limits on token counts, > Then I generated all strings that satisfied those limits, turned the > recognizer loose on them and counted how many it accepted. Any disagreement > of counts revealed the existence (but not any symptom) of bugs. > > Unlike most diagnostic techniques, this scheme produces a certificate of > (very high odds on) correctness over a representative subdomain. The > scheme also agnostically checks behavior on bad inputs as well as good. It > does not, however, provide a stress test of a recognizer's capacity limits. And > its exponential nature limits its applicability to rather small domains. > (REs have only 5 distinct kinds of token.) 
> > Doug > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew at humeweb.com Fri May 24 15:41:55 2024 From: andrew at humeweb.com (andrew at humeweb.com) Date: Thu, 23 May 2024 22:41:55 -0700 Subject: [TUHS] A fuzzy awk In-Reply-To: References: Message-ID: i did some of the later testing of bitblt. it was a lovely thing, slowly constructing a trustable synthetic bitblt of ever great size and range that you could compare the bitblt to be tested against. and we did find a couple of bugs, much to reiser’s chagrin. > On May 23, 2024, at 1:52 PM, Rob Pike wrote: > > The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something. > > -rob > > > On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy > wrote: >> > Doug McIlroy was generating random regular expressions >> >> Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand. >> >> The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. >> >> Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good. It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.) >> >> Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Fri May 24 17:17:47 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Fri, 24 May 2024 08:17:47 +0100 Subject: [TUHS] A fuzzy awk In-Reply-To: References: Message-ID: <20240524071747.77477213B8@orac.inputplus.co.uk> Hi, Rob wrote: > "Fuzzing" as it is now called (for no reason I can intuit) Barton Miller describes coining the term. ‘That night, I was logged on to the Unix system in my office via a dial-up phone line over a 1200 baud modem. ... I wanted a name that would evoke the feeling of random, unstructured data. After trying out several ideas, I settled on the term “fuzz”.’ — https://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html Line noise inspired him, as he describes. -- Cheers, Ralph. From robpike at gmail.com Fri May 24 17:41:36 2024 From: robpike at gmail.com (Rob Pike) Date: Fri, 24 May 2024 17:41:36 +1000 Subject: [TUHS] A fuzzy awk In-Reply-To: <20240524071747.77477213B8@orac.inputplus.co.uk> References: <20240524071747.77477213B8@orac.inputplus.co.uk> Message-ID: I'm sure that's the etymology but fuzzing isn't exactly random. That's kinda the point of it. -rob On Fri, May 24, 2024 at 5:18 PM Ralph Corderoy wrote: > Hi, > > Rob wrote: > > "Fuzzing" as it is now called (for no reason I can intuit) > > Barton Miller describes coining the term. 
> > ‘That night, I was logged on to the Unix system in my office via > a dial-up phone line over a 1200 baud modem. ... > I wanted a name that would evoke the feeling of random, unstructured > data. After trying out several ideas, I settled on the term “fuzz”.’ > > — https://pages.cs.wisc.edu/~bart/fuzz/Foreword1.html > > Line noise inspired him, as he describes. > > -- > Cheers, Ralph. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralph at inputplus.co.uk Fri May 24 20:00:56 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Fri, 24 May 2024 11:00:56 +0100 Subject: [TUHS] Is fuzz testing random? (Was: A fuzzy awk) In-Reply-To: References: <20240524071747.77477213B8@orac.inputplus.co.uk> Message-ID: <20240524100056.2B01220210@orac.inputplus.co.uk> Hi Rob, > I'm sure that's the etymology but fuzzing isn't exactly random. > That's kinda the point of it. I was just curious about the etymology, but thinking about it... The path crept along isn't random but guided by observation, say new output or increased coverage. But rather than exhaustively generate all possible inputs, a random subset is chosen to allow deeper progress to be made more quickly. -- Cheers, Ralph. From halbert at halwitz.org Fri May 24 21:56:50 2024 From: halbert at halwitz.org (Dan Halbert) Date: Fri, 24 May 2024 07:56:50 -0400 Subject: [TUHS] A fuzzy awk In-Reply-To: <20240524071747.77477213B8@orac.inputplus.co.uk> References: <20240524071747.77477213B8@orac.inputplus.co.uk> Message-ID: <422de511-c7d8-4a0d-a548-7bacd98d38ec@halwitz.org> On 5/24/24 03:17, Ralph Corderoy wrote: > Rob wrote: >> "Fuzzing" as it is now called (for no reason I can intuit) > Barton Miller describes coining the term. > As to where the inspiration of choice of word came from, I'll speculate : Bart Miller was a CS grad student contemporary of mine at Berkeley. Prof. Lotfi Zadeh was working on fuzzy logic, fuzzy sets, and "possibility theory". (Prof. William Kahan hated this work, and called it "wrong, and pernicious": cf. https://www.sciencedirect.com/science/article/abs/pii/S0020025508000716.) So the term "fuzzy" was almost infamous in the department. Prof. Richard Lipton was also at Berkeley at that time, and was working on program mutation testing, which fuzzes the program to determine the adequacy of test coverage, rather than fuzzing the test data. Dan H. -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sat May 25 10:03:48 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Fri, 24 May 2024 19:03:48 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? Message-ID: <20240525000348.hq5zvwm6x4evl44h@illithid> Hi folks, I'm finding it difficult to find any direct sources on the question in the subject line. Does anyone here have any source material they can point me to documenting the existence of a port of BSD curses to Unix Version 7? I know that curses made it into 2.9BSD for the PDP-11, but that's not quite the same thing. There are comments in System V Release 2's curses.h file[1][2] (very different from 4BSD's[3]) that suggest some effort to accommodate Version 7's terminal driver. So I would _presume_ that curses got ported to Version 7. But that's System V, right when it started diverging from BSD curses, and moreover, presumption is not evidence. Even personal accounts/anecdotes would be helpful. Maybe some of you _wrote_ curses applications for Version 7 machines. 
Regards, Branden [1] System III apparently did not have curses at all. Both it and 4BSD were released in 1980. System V Release 1 doesn't seem to, either. [2] https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/include/curses.h [3] https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/include/curses.h -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From tuhs at tuhs.org Sat May 25 10:17:53 2024 From: tuhs at tuhs.org (Bakul Shah via TUHS) Date: Fri, 24 May 2024 17:17:53 -0700 Subject: [TUHS] A fuzzy awk In-Reply-To: References: Message-ID: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> What would be nice if programming languages provided some support for such exhaustive testing[1]. At one point I had suggested turning Go's Interface type to something like Guttag style abstract data types in that relevant axioms are specified right in the interface definition. The idea was that any concrete type that implements that interface must satisfy its axioms. Even if the compiler ignored these axioms, one can write a support program that can generate a set of comprehensive tests based on these axioms. [Right now a type "implementing" an interface only needs to have a set of methods that exactly match the interface methods but nothing more] The underlying idea is that each type is in essence a constraint on what values an instance of that type can take. So adding such axioms simply tightens (& documents) these constraints. Just the process of coming up with such axioms can improve the design (sor of like test driven design but better!). Now it may be that applying this to anything more complex than stacks won't work well & it won't be perfect but I thought this was worth experimenting with. This would be like functional testing of all the nuts and bolts and components that go in an airplane. The airplane may still fall apart but that would be a "composition" error! [1] There are "proof assisant" or formal spec languages such as TLA+, Coq, Isabelle etc. but they don't get used much by the average programmer. I want something more retail! > On May 23, 2024, at 1:52 PM, Rob Pike wrote: > > The semantic distinction is important but the end result is very similar. "Fuzzing" as it is now called (for no reason I can intuit) tries to get to the troublesome cases faster by a sort of depth-first search, but exhaustive will always beat it for value. Our exhaustive tester for bitblt, first done by John Reiser if I remember right, set the stage for my own thinking about how you properly test something. > > -rob > > > On Thu, May 23, 2024 at 11:49 PM Douglas McIlroy > wrote: >> > Doug McIlroy was generating random regular expressions >> >> Actually not. I exhaustively (within limits) tested an RE recognizer without knowingly generating any RE either mechanically or by hand. >> >> The trick: From recursive equations (easily derived from the grammar of REs), I counted how many REs exist up to various limits on token counts, Then I generated all strings that satisfied those limits, turned the recognizer loose on them and counted how many it accepted. Any disagreement of counts revealed the existence (but not any symptom) of bugs. >> >> Unlike most diagnostic techniques, this scheme produces a certificate of (very high odds on) correctness over a representative subdomain. The scheme also agnostically checks behavior on bad inputs as well as good. 
It does not, however, provide a stress test of a recognizer's capacity limits. And its exponential nature limits its applicability to rather small domains. (REs have only 5 distinct kinds of token.) >> >> Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Sat May 25 10:46:19 2024 From: clemc at ccc.com (Clem Cole) Date: Fri, 24 May 2024 20:46:19 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: <20240525000348.hq5zvwm6x4evl44h@illithid> References: <20240525000348.hq5zvwm6x4evl44h@illithid> Message-ID: I’m traveling this weekend so I’m doing this by memory. ISTR The original curses was developed on Ing70 as part of Rogue and that It missed the 2BSD tape by about a year. See if you can find an early Rogue distribution and I think you’ll find it there. If not look in the early net news source distributions. Sent from a handheld expect more typos than usual On Fri, May 24, 2024 at 8:04 PM G. Branden Robinson < g.branden.robinson at gmail.com> wrote: > Hi folks, > > I'm finding it difficult to find any direct sources on the question in > the subject line. > > Does anyone here have any source material they can point me to > documenting the existence of a port of BSD curses to Unix Version 7? > > I know that curses made it into 2.9BSD for the PDP-11, but that's not > quite the same thing. > > There are comments in System V Release 2's curses.h file[1][2] (very > different from 4BSD's[3]) that suggest some effort to accommodate > Version 7's terminal driver. So I would _presume_ that curses got > ported to Version 7. But that's System V, right when it started > diverging from BSD curses, and moreover, presumption is not evidence. > > Even personal accounts/anecdotes would be helpful. Maybe some of you > _wrote_ curses applications for Version 7 machines. > > Regards, > Branden > > [1] System III apparently did not have curses at all. Both it and 4BSD > were released in 1980. System V Release 1 doesn't seem to, either. > [2] > https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/include/curses.h > [3] > https://minnie.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/include/curses.h > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sat May 25 10:57:01 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Fri, 24 May 2024 19:57:01 -0500 Subject: [TUHS] A fuzzy awk In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> Message-ID: <20240525005701.efxidwmww56qmiwa@illithid> [restricting to list; strong opinions here] At 2024-05-24T17:17:53-0700, Bakul Shah via TUHS wrote: > What would be nice if programming languages provided some support for > such exhaustive testing[1]. [rearranging] > At one point I had suggested turning Go's Interface type to something > like Guttag style abstract data types in that relevant axioms are > specified right in the interface definition. It's an excellent idea. > The underlying idea is that each type is in essence a constraint on > what values an instance of that type can take. In the simple form of a data type plus a range constraint, that's the Ada definition of a subtype since day one--Ada '80 or Ada 83 if you insist on the standardized form of the language. 
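A rough C-flavored analogue of that idea, with invented names and the constraint reduced to a runtime assertion rather than anything the compiler enforces, would be something like:

/*
 * Sketch: "type plus range constraint" approximated in C.
 * Nothing here is checked at compile time; the constraint is just assert().
 */
#include <assert.h>
#include <stdio.h>

typedef int day_of_month;           /* intended range: 1..31 */

static day_of_month make_day(int n)
{
    assert(n >= 1 && n <= 31);      /* compiled out when NDEBUG is defined */
    return n;
}

int main(void)
{
    day_of_month d = make_day(17);

    printf("day %d\n", d);
    /* make_day(42) would abort here -- unless built with -DNDEBUG */
    return 0;
}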
40 years later we have Linus Torvalds tearing up his achievement certificate in "kinder, gentler email interactions" just to trash the notion of range checks on data types.[1][2][3] Naturally, the brogrammers are quick to take Torvalds's side.[4] Pascal had range checks too, and Kernighan famously punked on Wirth for that. I'm not certain, but I get the feeling the latter got somewhat over-interpreted. (To be fair to Kernighan, Pascal _as specced in the Revised Report of 1973_[5] was in my opinion too weak a language to leave the lab, for many of the reasons he noted. The inflexible array typing was fatal, in my view.) > The idea was that any concrete type that implements that interface > must satisfy its axioms. Yes. There is of course much more to the universe of potential constraints than range checks. Ada 2022 has these in great generality with "subtype predicates". http://www.ada-auth.org/standards/22aarm/html/aa-3-2-4.html > Even if the compiler ignored these axioms, I don't understand why this idea wasn't seized upon with more force at the CSRC. The notion of a compiler flag that turned "extra" (in the Ritchie compiler circa 1980, this is perhaps expressed better as "any") correctness checks could not have been a novelty. NDEBUG and assert() are similarly extremely old even in Unix. > one can write a support program that can generate a set of > comprehensive tests based on these axioms. Yes. As I understand it, this is how Spark/Ada got started. Specially annotated comments expressing predicates communicated with such a support program, running much like the sort of automated theorem prover you characterize below as not "retail". In the last two revision cycles of the Ada standard (2013, 2022), Spark/Ada's enhancements have made it into the language--though I am not certain, and would not claim, that they compose with _every_ language feature. Spark/Ada started life as a subset of the language for a reason. But C has its own subset, MISRA C, so this is hardly a reason to scoff. > [Right now a type "implementing" an interface only needs to > have a set of methods that exactly match the interface methods but > nothing more] The underlying idea is that each type is in essence a > constraint on what values an instance of that type can take. So adding > such axioms simply tightens (& documents) these constraints. Just the > process of coming up with such axioms can improve the design (sor of > like test driven design but better!). Absolutely. Generally, software engineers like to operationalize things consistently enough that they can then be scripted/automated. Evidently software testing is so mind-numblingly tedious that the will to undertake it, even with automation, evaporates. > Now it may be that applying this to anything more complex than stacks > won't work well & it won't be perfect but I thought this was worth > experimenting with. This would be like functional testing of all the > nuts and bolts and components that go in an airplane. The airplane may > still fall apart but that would be a "composition" error! Yes. And even if you can prove 100% of the theorems in your system, you may learn to your dismay that your specification was defective. Automated provers are as yet no aid to system architects. > [1] There are "proof assisant" or formal spec languages such as TLA+, > Coq, Isabelle etc. but they don't get used much by the average > programmer. I want something more retail! I've had a little exposure to these. They are indeed esoteric, but also extremely resource-hungry. 
My _impression_, based on no hard data, is that increasing the abilities of static analyzers and the expressiveness with which they are directed with predicates is much cheaper. But a lot of programmers will not budge at any cost, and will moreover be celebrated by their peers for their obstinacy. See footnotes. There is much work still to be done. Regards, Branden [1] https://lore.kernel.org/all/202404291502.612E0A10 at keescook/ https://lore.kernel.org/all/CAHk-=wi5YPwWA8f5RAf_Hi8iL0NhGJeL6MN6UFWwRMY8L6UDvQ at mail.gmail.com/ [2] https://lore.kernel.org/lkml/CAHk-=whkGHOmpM_1kNgzX1UDAs10+UuALcpeEWN29EE0m-my=w at mail.gmail.com/ [3] https://www.businessinsider.com/linus-torvalds-linux-time-away-empathy-2018-9 [4] https://lwn.net/Articles/973108/ [5] https://archive.org/details/1973-the-programming-language-pascal-revised-report-wirth -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From g.branden.robinson at gmail.com Sat May 25 10:57:52 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Fri, 24 May 2024 19:57:52 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525000348.hq5zvwm6x4evl44h@illithid> Message-ID: <20240525005752.bbcyvkd4k2rhcxek@illithid> At 2024-05-24T20:46:19-0400, Clem Cole wrote: > I’m traveling this weekend so I’m doing this by memory. ISTR The original > curses was developed on Ing70 as part of Rogue and that It missed the 2BSD > tape by about a year. See if you can find an early Rogue distribution and > I think you’ll find it there. If not look in the early net news source > distributions. Thanks, Clem! Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From jsg at jsg.id.au Sat May 25 20:48:54 2024 From: jsg at jsg.id.au (Jonathan Gray) Date: Sat, 25 May 2024 20:48:54 +1000 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: <20240525000348.hq5zvwm6x4evl44h@illithid> References: <20240525000348.hq5zvwm6x4evl44h@illithid> Message-ID: On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote: > Hi folks, > > I'm finding it difficult to find any direct sources on the question in > the subject line. > > Does anyone here have any source material they can point me to > documenting the existence of a port of BSD curses to Unix Version 7? "In particular, the C shell, curses, termcap, vi and job control were ported back to Version 7 (and later System III) so that it was not unusual to find these features on otherwise pure Bell releases." from Documentation/Books/Life_with_Unix_v2.pdf in some v7ish distributions: unisoft, xenix, nu machine, venix? https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437 https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf pg 580 https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412 https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up From tuhs at tuhs.org Sat May 25 21:08:55 2024 From: tuhs at tuhs.org (Arrigo Triulzi via TUHS) Date: Sat, 25 May 2024 13:08:55 +0200 Subject: [TUHS] Was curses ported to Seventh Edition Unix? 
In-Reply-To: References: Message-ID: On 25 May 2024, at 12:49, Jonathan Gray wrote: > in some v7ish distributions: unisoft, xenix, nu machine, venix? In Xenix 286 I have “fond” memories of some characters being inverted in curses so you had your windows (if you drew them) looking weird. I had an #ifdef in my code to flip the characters… Arrigo From clemc at ccc.com Sat May 25 22:16:42 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 08:16:42 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525000348.hq5zvwm6x4evl44h@illithid> Message-ID: Oh how I hate history rewrites. Job control was developed by Kulp on V7 in Europe and MIT. Joy saw it and added it what would become 4BSD. The others were all developed on V7 (PDP11)at UCB. They were not back ported either. The vax work inherited them from V7. It is true, The public tended to see these as 4BSD features as that was the vehicle that got larger distribution. Sent from a handheld expect more typos than usual On Sat, May 25, 2024 at 6:49 AM Jonathan Gray wrote: > On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote: > > Hi folks, > > > > I'm finding it difficult to find any direct sources on the question in > > the subject line. > > > > Does anyone here have any source material they can point me to > > documenting the existence of a port of BSD curses to Unix Version 7? > > "In particular, the C shell, curses, termcap, vi and job control were > ported back to Version 7 (and later System III) so that it was not > unusual to find these features on otherwise pure Bell releases." > from Documentation/Books/Life_with_Unix_v2.pdf > > in some v7ish distributions: unisoft, xenix, nu machine, venix? > > https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437 > > https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up > > > https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf > pg 580 > > https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up > > https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412 > > https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davida at pobox.com Sat May 25 23:56:21 2024 From: davida at pobox.com (David Arnold) Date: Sat, 25 May 2024 23:56:21 +1000 Subject: [TUHS] A fuzzy awk In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> Message-ID: <52098DD5-4FE0-4892-9288-12FE70793484@pobox.com> > On 25 May 2024, at 10:18, Bakul Shah via TUHS wrote: > >  > What would be nice if programming languages provided some support for such exhaustive testing[1]. > > At one point I had suggested turning Go's Interface type to something like Guttag style abstract data types in that relevant axioms are specified right in the interface definition. The idea was that any concrete type that implements that interface must satisfy its axioms. Even if the compiler ignored these axioms, one can write a support program that can generate a set of comprehensive tests based on these axioms. Sounds like Eiffel, whose compiler had support for checking pre and post conditions (and maybe invariants?) at runtime, or disabling the checks for “performance” mode. 
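A loose C sketch of the same idea, with invented names, might hang the pre- and postconditions on assert() so that, as with Eiffel's "performance" mode, defining NDEBUG makes the checks vanish; this only approximates what Eiffel does natively:

/*
 * Sketch: Eiffel-style contracts approximated with assert().
 * The pop(push(s, v)) == v check echoes one of the usual stack axioms.
 */
#include <assert.h>
#include <stdio.h>

#define MAXDEPTH 16

typedef struct {
    int item[MAXDEPTH];
    int depth;
} Stack;

static void push(Stack *s, int v)
{
    assert(s->depth < MAXDEPTH);            /* precondition: not full */
    s->item[s->depth++] = v;
    assert(s->item[s->depth - 1] == v);     /* postcondition: v is now on top */
}

static int pop(Stack *s)
{
    assert(s->depth > 0);                   /* precondition: not empty */
    return s->item[--s->depth];
}

int main(void)
{
    Stack s = { {0}, 0 };

    push(&s, 7);
    printf("%d\n", pop(&s));                /* prints 7 */
    return 0;
}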
d From douglas.mcilroy at dartmouth.edu Sun May 26 01:06:24 2024 From: douglas.mcilroy at dartmouth.edu (Douglas McIlroy) Date: Sat, 25 May 2024 11:06:24 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? Message-ID: > Does anyone here have any source material they can point me to > documenting the existence of a port of BSD curses to Unix Version 7? Curses appears in the v8 manual but not v7. Of course a conclusion that it was not ported to v7 turns on dates. Does v7 refer to a point in time or an interval that extended until we undertook to prepare the v8 manual? Obviously curses was ported during or before that interval. If curses was available when the v7 manual was prepared, I (who edited both editions) evidently was unaware of any dependence on it then. Doug -------------- next part -------------- An HTML attachment was scrubbed... URL: From rich.salz at gmail.com Sun May 26 01:11:39 2024 From: rich.salz at gmail.com (Rich Salz) Date: Sat, 25 May 2024 11:11:39 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: Message-ID: I thought that Rob Pike was involved in the port /R$, troll -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sun May 26 01:28:32 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Sat, 25 May 2024 10:28:32 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: Message-ID: <20240525152832.zzjipv2wjcuedyld@illithid> Hi Jonathan & Doug, At 2024-05-25T20:48:54+1000, Jonathan Gray wrote: > On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote: > > Does anyone here have any source material they can point me to > > documenting the existence of a port of BSD curses to Unix Version 7? > > "In particular, the C shell, curses, termcap, vi and [ snip per Clem Cole ;-) ] > were ported back to Version 7 (and later System III) so that it was > not unusual to find these features on otherwise pure Bell releases." > from Documentation/Books/Life_with_Unix_v2.pdf Thanks! This is exactly the sort of source citation I was looking for. At 2024-05-25T11:06:24-0400, Douglas McIlroy wrote: > Curses appears in the v8 manual but not v7. Of course a > conclusion that it was not ported to v7 turns on dates. I was confident that curses was not "part" of v7 because of these factors. (1) It wasn't in the manual; (2) archives of v7 in which we now traffic as historical artifacts show no trace of it; and (3) the story of its origin and development, even when distorted, doesn't place it at the CSRC as far back as 1977/8. But, if someone placed to know had claimed that it was, that would have been a claim worth investigating. > Does v7 refer to a point in time or an interval that extended until we > undertook to prepare the v8 manual? Obviously curses was ported during > or before that interval. Perhaps one reason my question can be read two ways is that I'm interested in both aspects of the issue. I'm trying to write a "History" section for the primary ncurses man page and clean up other problems its documentation has, like a boilerplate reference to "Version 7 curses" in many of its other man pages, which repeatedly implies such a thing as a separate line of development from "BSD curses" and "System V curses". I've been dubious of that language since first encountering it, but I want a good documentary record to support my proposal to chop it out. 
> If curses was available when the v7 manual was prepared, I (who edited > both editions) evidently was unaware of any dependence on it then. I see no evidence that you missed it. :) Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From clemc at ccc.com Sun May 26 01:40:13 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 11:40:13 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: Message-ID: It was never needed to be ported -- it was developed on V7. It was released in comp.sources.unix volume1 as pcurses That said, I believe late volumes have nervous updates. Clem ᐧ On Sat, May 25, 2024 at 11:11 AM Rich Salz wrote: > I thought that Rob Pike was involved in the port > > /R$, troll > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Sun May 26 01:43:54 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 11:43:54 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: Message-ID: l hate autocorrect ... s/nervous/numerous/ ᐧ On Sat, May 25, 2024 at 11:40 AM Clem Cole wrote: > It was never needed to be ported -- it was developed on V7. > It was released in comp.sources.unix volume1 as pcurses > > That said, I believe late volumes have nervous updates. > > Clem > ᐧ > > On Sat, May 25, 2024 at 11:11 AM Rich Salz wrote: > >> I thought that Rob Pike was involved in the port >> >> /R$, troll >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Sun May 26 01:51:12 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 11:51:12 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: Message-ID: On Sat, May 25, 2024 at 11:40 AM Clem Cole wrote: > It was never needed to be ported -- it was developed on V7. > It was released in comp.sources.unix volume1 as pcurses > > That said, I believe late volumes have nervous updates. > > Clem > ᐧ > >> >> As Rich points out, the comp.source.unix version may be a later Cornell version, but I am fairly sure that the original was developed in Cory Hall, I believe on Ing70, although it may have been the Cory Hall 11/70. I remember finding bugs in it when we ran it on the Teklabs 11/70, which was definitely a heavily hacked V7-based system with much of 2BSD and other UCB tools added to it. The point is while Vaxen had been released, we did not have one at Tektronix at the time, and I got a lot of V7-based tools from folks in Cory Hall. ᐧ ᐧ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sun May 26 01:57:37 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Sat, 25 May 2024 10:57:37 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: Message-ID: <20240525155737.bwmngdyf4qnj4avv@illithid> Hi Clem, At 2024-05-25T11:40:13-0400, Clem Cole wrote: > It was never needed to be ported -- it was developed on V7. > It was released in comp.sources.unix volume1 as pcurses This bit conflicts with other accounts. Here's what I have in draft. HISTORY 4BSD (1980) introduced curses, implemented largely by Kenneth C. R. C. Arnold, who organized the terminal abstraction and screen management features of Bill Joy’s vi(1) editor into a library. 
That system ran only on the VAX architecture; curses saw a port to 2.9BSD (1983) for the PDP‐11. System V Release 2 (SVr2, 1984) significantly revised curses and replaced the termcap portion thereof with a different API for terminal handling, terminfo. System V added form and menu libraries in SVr3 (1987) and enhanced curses with color support in SVr3.2 later the same year. SVr4 (1989) brought the panel library. pcurses by distinction was, by the accounts I have, a later effort by Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its termcap bits with a reimplementation terminfo. This was apparently done for licensing reasons, as BSD code was free ("as in freedom") and System V certainly was not. The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms, which starts as follows. Note the 2nd and 3rd paragraphs. .po +.5i .TL The Curses Reference Manual .AU Pavel Curtis .NH Introduction .LP Terminfo is a database describing many capabilities of over 150 different terminals. Curses is a subroutine package which presents a high level screen model to the programmer, while dealing with issues such as terminal differences and optimization of output to change one screenfull of text into another. .LP Terminfo is based on Berkeley's termcap database, but contains a number of improvements and extensions. Parameterized strings are introduced, making it possible to describe such capabilities as video attributes, and to handle far more unusual terminals than possible with termcap. .LP Curses is also based on Berkeley's curses package, with many improvements. The package makes use of the insert and delete line and character features of terminals so equipped, and determines how to optimally use these features with no help from the programmer. It allows arbitrary combinations of video attributes to be displayed, even on terminals that leave ``magic cookies'' on the screen to mark changes in attributes. > That said, I believe late volumes have nervous updates. I'm gathering data for another paragraph of that "History" section now. The long and short of it seems to be that: BSD curses, besides getting ported to many platforms, begat pcurses. pcurses begat PCCurses, PDCurses, and ncurses. PCCurses died. PDCurses went dormant, begat PDCursesMod, and roused from its slumber. ncurses, after a long period of erratic early administration that seemed more concerned with seizing celebrity status for its developers (one of whom was more single-minded and successful at this goal than the other) than with software development, has been maintained with a steady hand over 25 years. There also exists NetBSD curses, which wasn't developed ex nihilo but I'm not sure yet what origin it forked from. Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From clemc at ccc.com Sun May 26 02:06:27 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 12:06:27 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: <20240525155737.bwmngdyf4qnj4avv@illithid> References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: Ken was working in Ing70 [he was part of the Ingres group] - IngVax did not yet exist, ᐧ ᐧ On Sat, May 25, 2024 at 11:57 AM G. Branden Robinson < g.branden.robinson at gmail.com> wrote: > Hi Clem, > > At 2024-05-25T11:40:13-0400, Clem Cole wrote: > > It was never needed to be ported -- it was developed on V7. 
> > It was released in comp.sources.unix volume1 as pcurses > > This bit conflicts with other accounts. Here's what I have in draft. > > HISTORY > 4BSD (1980) introduced curses, implemented largely by Kenneth > C. R. C. Arnold, who organized the terminal abstraction and screen > management features of Bill Joy’s vi(1) editor into a library. > That system ran only on the VAX architecture; curses saw a port to > 2.9BSD (1983) for the PDP‐11. > > System V Release 2 (SVr2, 1984) significantly revised curses and > replaced the termcap portion thereof with a different API for > terminal handling, terminfo. System V added form and menu > libraries in SVr3 (1987) and enhanced curses with color support in > SVr3.2 later the same year. SVr4 (1989) brought the panel library. > > pcurses by distinction was, by the accounts I have, a later effort by > Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its > termcap bits with a reimplementation terminfo. This was apparently done > for licensing reasons, as BSD code was free ("as in freedom") and System > V certainly was not. > > The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms, > which starts as follows. Note the 2nd and 3rd paragraphs. > > .po +.5i > .TL > The Curses Reference Manual > .AU > Pavel Curtis > .NH > Introduction > .LP > Terminfo is a database describing many capabilities of over 150 > different terminals. Curses is a subroutine package which > presents a high level screen model to the programmer, while > dealing with issues such as terminal differences and optimization of > output to change one screenfull of text into another. > .LP > Terminfo is based on Berkeley's termcap database, but contains a > number of improvements and extensions. Parameterized strings are > introduced, making it possible to describe such capabilities as > video attributes, and to handle far more unusual terminals than > possible with termcap. > .LP > Curses is also based on Berkeley's curses package, with many > improvements. The package makes use of the insert and delete > line and character features of terminals so equipped, and > determines how to optimally use these features with no help from the > programmer. It allows arbitrary combinations of video attributes > to be displayed, even on terminals that leave ``magic cookies'' > on the screen to mark changes in attributes. > > > That said, I believe late volumes have nervous updates. > > I'm gathering data for another paragraph of that "History" section now. > The long and short of it seems to be that: > > BSD curses, besides getting ported to many platforms, begat pcurses. > > pcurses begat PCCurses, PDCurses, and ncurses. > > PCCurses died. > > PDCurses went dormant, begat PDCursesMod, and roused from its slumber. > > ncurses, after a long period of erratic early administration that seemed > more concerned with seizing celebrity status for its developers (one of > whom was more single-minded and successful at this goal than the other) > than with software development, has been maintained with a steady hand > over 25 years. > > There also exists NetBSD curses, which wasn't developed ex nihilo but > I'm not sure yet what origin it forked from. > > Regards, > Branden > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sun May 26 02:13:20 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Sat, 25 May 2024 11:13:20 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? 
In-Reply-To: References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: <20240525161320.3jvozzlgvr6tfyxl@illithid> Hi Clem, At 2024-05-25T12:06:27-0400, Clem Cole wrote: > Ken [Arnold] was working in Ing70 [he was part of the Ingres group] - > IngVax did not yet exist, That does complicate my simplistic story. Ing70 was, then, as you noted in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix, but rather something with various bits of BSD (also in active development, I reckon). Nevertheless, I venture, the first officially distributed curses was in 4BSD, a VAX-only release. But, it stands to reason that BSD curses never got far from its -11-portable roots; it must have been obvious that the library would be desired on such hosts and the CSRG came to officially support it thus in 2.9BSD 3 years later. Hmm. I'll have to chew on how to recast that economically. Thanks for all the light you're throwing on this! Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From clemc at ccc.com Sun May 26 02:14:10 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 12:14:10 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: Ouch -- there was no licensing issue with curses or termcap. termcap and curses were written at UCB. When MaryAnn went to Columbus - there was desire to rewrite to be "compiled". That work was terminfo. AT&T >>restricted<< terminfo. Pavel (with coaching from a few of us, including me], wrote a new implementation of terminfo. When he was added it, he combined a rewrite of curses. Clem ᐧ On Sat, May 25, 2024 at 12:06 PM Clem Cole wrote: > Ken was working in Ing70 [he was part of the Ingres group] - IngVax did > not yet exist, > ᐧ > ᐧ > > On Sat, May 25, 2024 at 11:57 AM G. Branden Robinson < > g.branden.robinson at gmail.com> wrote: > >> Hi Clem, >> >> At 2024-05-25T11:40:13-0400, Clem Cole wrote: >> > It was never needed to be ported -- it was developed on V7. >> > It was released in comp.sources.unix volume1 as pcurses >> >> This bit conflicts with other accounts. Here's what I have in draft. >> >> HISTORY >> 4BSD (1980) introduced curses, implemented largely by Kenneth >> C. R. C. Arnold, who organized the terminal abstraction and screen >> management features of Bill Joy’s vi(1) editor into a library. >> That system ran only on the VAX architecture; curses saw a port to >> 2.9BSD (1983) for the PDP‐11. >> >> System V Release 2 (SVr2, 1984) significantly revised curses and >> replaced the termcap portion thereof with a different API for >> terminal handling, terminfo. System V added form and menu >> libraries in SVr3 (1987) and enhanced curses with color support in >> SVr3.2 later the same year. SVr4 (1989) brought the panel library. >> >> pcurses by distinction was, by the accounts I have, a later effort by >> Pavel Curtis to clone SVr2 curses by taking BSD curses and replacing its >> termcap bits with a reimplementation terminfo. This was apparently done >> for licensing reasons, as BSD code was free ("as in freedom") and System >> V certainly was not. >> >> The pcurses 0.7 tarball I have contains a document, doc/manual.tbl.ms, >> which starts as follows. Note the 2nd and 3rd paragraphs. 
>> >> .po +.5i >> .TL >> The Curses Reference Manual >> .AU >> Pavel Curtis >> .NH >> Introduction >> .LP >> Terminfo is a database describing many capabilities of over 150 >> different terminals. Curses is a subroutine package which >> presents a high level screen model to the programmer, while >> dealing with issues such as terminal differences and optimization of >> output to change one screenfull of text into another. >> .LP >> Terminfo is based on Berkeley's termcap database, but contains a >> number of improvements and extensions. Parameterized strings are >> introduced, making it possible to describe such capabilities as >> video attributes, and to handle far more unusual terminals than >> possible with termcap. >> .LP >> Curses is also based on Berkeley's curses package, with many >> improvements. The package makes use of the insert and delete >> line and character features of terminals so equipped, and >> determines how to optimally use these features with no help from the >> programmer. It allows arbitrary combinations of video attributes >> to be displayed, even on terminals that leave ``magic cookies'' >> on the screen to mark changes in attributes. >> >> > That said, I believe late volumes have nervous updates. >> >> I'm gathering data for another paragraph of that "History" section now. >> The long and short of it seems to be that: >> >> BSD curses, besides getting ported to many platforms, begat pcurses. >> >> pcurses begat PCCurses, PDCurses, and ncurses. >> >> PCCurses died. >> >> PDCurses went dormant, begat PDCursesMod, and roused from its slumber. >> >> ncurses, after a long period of erratic early administration that seemed >> more concerned with seizing celebrity status for its developers (one of >> whom was more single-minded and successful at this goal than the other) >> than with software development, has been maintained with a steady hand >> over 25 years. >> >> There also exists NetBSD curses, which wasn't developed ex nihilo but >> I'm not sure yet what origin it forked from. >> >> Regards, >> Branden >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clemc at ccc.com Sun May 26 02:21:17 2024 From: clemc at ccc.com (Clem Cole) Date: Sat, 25 May 2024 12:21:17 -0400 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: <20240525161320.3jvozzlgvr6tfyxl@illithid> References: <20240525155737.bwmngdyf4qnj4avv@illithid> <20240525161320.3jvozzlgvr6tfyxl@illithid> Message-ID: On Sat, May 25, 2024 at 12:13 PM G. Branden Robinson < g.branden.robinson at gmail.com> wrote: > That does complicate my simplistic story. Ing70 was, then, as you noted > in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix, > but rather something with various bits of BSD (also in active > development, I reckon). > Mumble -- the kernel and 90% of the userspace on Ing70 was V7 -- it was very similar to Teklabs which I ran. It had all of 2BSD on it, but the kernel work that we think of as 'BSD" was 3.0BSD and later 4.0BSD and that was 100% on the Vax. The point is it was a 16 bits system, the Johnson C compiler with some fixes from the greater USENIX community including UCB. There was >>no port<< needed. This was its native tongue. It was >>included<< in later BSD released which is how people came to know it because 4.XBSD was became much more widely used than V7+2BSD. The 2.9 work of Keith at al, started because the UCB Math Dept could not afford a VAX. 
DEC had released the v7m code to support overlays, so slowly changes from the VAX made their way back into the V7-based kernel - which took on a new life. Clem -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.branden.robinson at gmail.com Sun May 26 02:25:49 2024 From: g.branden.robinson at gmail.com (G. Branden Robinson) Date: Sat, 25 May 2024 11:25:49 -0500 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: <20240525162549.yg2qndtloodv3upq@illithid> Hi Clem, At 2024-05-25T12:14:10-0400, Clem Cole wrote: > Ouch -- there was no licensing issue with [BSD] curses or termcap. Right. I wasn't trying to imply otherwise. That's why Pavel Curtis could use BSD curses as a basis for his pcurses. It is only System V curses that was encumbered. And now it too is available for inspection, if in a somewhat gray area for anyone with commercial ambitions. > termcap and curses were written at UCB. Agreed. I've seen no claim anywhere to the contrary. > When MaryAnn went to Columbus - there was desire to rewrite to be > "compiled". That work was terminfo. AT&T >>restricted<< terminfo. Yes. This too is my understanding. terminfo is a better API (and source format) than termcap, but I also surmise that better support for deployment environments with large "fleets" of video terminals was also seen by AT&T management as an enticing prospect for vendor lock-in.
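As a concrete illustration of that API difference, here is a minimal, hedged C sketch using the low-level terminfo calls as they survive in ncurses today (illustrative only, not Mark Horton's or Pavel Curtis's code): the compiled entry named by $TERM is loaded once with setupterm(), and a parameterized capability such as cup is instantiated with tparm(), instead of the caller hand-expanding a termcap cm string with tgoto().

    /* Hedged sketch of the low-level terminfo interface as found in
     * ncurses; link with -lncurses.  Moves the cursor to row 10,
     * column 20 on whatever terminal $TERM names. */
    #include <stdio.h>
    #include <curses.h>
    #include <term.h>

    int main(void)
    {
        int err;
        char *cup;

        if (setupterm(NULL, fileno(stdout), &err) != OK) {  /* read compiled entry */
            fprintf(stderr, "no terminfo entry (err=%d)\n", err);
            return 1;
        }

        /* "cup" is the parameterized cursor-address string, e.g.
         * \E[%i%p1%d;%p2%dH for ANSI-style terminals; termcap's cm
         * carried the same idea with a more limited % language. */
        cup = tigetstr("cup");
        if (cup == NULL || cup == (char *)-1) {
            fprintf(stderr, "terminal cannot address the cursor\n");
            return 1;
        }

        /* Fill in the parameters and emit the result with any padding
         * the entry asks for. */
        tputs(tparm(cup, 10L, 20L), 1, putchar);
        return 0;
    }

The equivalent termcap workflow needs tgetent(), tgetstr(), a caller-supplied scratch buffer, and tgoto(), which is much of what the 1982 announcement quoted later in this thread is getting at.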
> It was >>included<< in later BSD released which is how people came to > know it because 4.XBSD was became much more widely used than V7+2BSD. Acknowledged. > The 2.9 work of Keith at al, started because the UCB Math Dept could > not afford a VAX. DEC had released the v7m code to support > overlays, so slowly changed from the VAX made it back into the V7 > based kernel - which took a new life. Ah, I'd never heard the actual origin story of later 2BSD's reason for parallel development. Thanks! Back when I was first learning Unix, a mere 30 years ago, I asked a local guru why the kernel image was called "vmunix" instead of just plain "unix". I got a correct answer, but then asked why you'd keep calling it "vmunix" when no non-VM Unix was even available for the platform. Historical inertia and the long shadow of the work that became 4BSD. (Linus's decision to name his kernel's image "vmlinux" [or "vmlinuz" for those remember having those lulz] when in its case no non-VM version had ever existed anywhere, nor even been desired or conceived, struck me as an excess of continuity.) Unix geeks are conservative about the weirdest things. Regards, Branden -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From imp at bsdimp.com Sun May 26 03:02:00 2024 From: imp at bsdimp.com (Warner Losh) Date: Sat, 25 May 2024 11:02:00 -0600 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: <20240525163810.flvazgbj6tq3l5rw@illithid> References: <20240525155737.bwmngdyf4qnj4avv@illithid> <20240525161320.3jvozzlgvr6tfyxl@illithid> <20240525163810.flvazgbj6tq3l5rw@illithid> Message-ID: On Sat, May 25, 2024, 10:38 AM G. Branden Robinson < g.branden.robinson at gmail.com> wrote: > Hi Clem, > > At 2024-05-25T12:21:17-0400, Clem Cole wrote: > > On Sat, May 25, 2024 at 12:13 PM G. Branden Robinson < > > g.branden.robinson at gmail.com> wrote: > > > > > That does complicate my simplistic story. Ing70 was, then, as you > noted > > > in a previous mail, an 11/70, but it _wasn't_ running Version 7 Unix, > > > but rather something with various bits of BSD (also in active > > > development, I reckon). > > > > > Mumble -- the kernel and 90% of the userspace on Ing70 was V7 -- it was > > very similar to Teklabs which I ran. > > Yes, sorry, I was hasty and sloppy. I should have qualified that > "Version 7 Unix" with "pure". Though I wonder if anyone ran "pure" > distributions of anything by today's standards, with our flatpaks and VM > images and containers and distributions and Linux kernel "taint" flags. > > And, blessed be, our reproducible builds. So there is such a thing as > progress. > > > The point is it was a 16 bits system, the Johnson C compiler with some > > fixes from the greater USENIX community including UCB. > > There was >>no port<< needed. > > > > This was its native tongue. > > Okay. My crystal ball shows wordsmithing in my future. > > > It was >>included<< in later BSD released which is how people came to > > know it because 4.XBSD was became much more widely used than V7+2BSD. > > Acknowledged. > > > The 2.9 work of Keith at al, started because the UCB Math Dept could > > not afford a VAX. DEC had released the v7m code to support > > overlays, so slowly changed from the VAX made it back into the V7 > > based kernel - which took a new life. > > Ah, I'd never heard the actual origin story of later 2BSD's reason for > parallel development. Thanks! 
> The 2.8 kernel from the 2.83 archive is a V7 with a bunch of hacks / features #ifdef'd into the tree with a primitive config thing to cons up the #defines. This is still largely present in 2.9, but with less rigid adherence for bug fixes. It's very clear that for the kernel this was followed. I've not studied userland to comment on that but i think not. It also explains why the release notes kept saying it was the last release starting iirc with 2.8... Warner Back when I was first learning Unix, a mere 30 years ago, I asked a > local guru why the kernel image was called "vmunix" instead of just > plain "unix". I got a correct answer, but then asked why you'd keep > calling it "vmunix" when no non-VM Unix was even available for the > platform. Historical inertia and the long shadow of the work that > became 4BSD. (Linus's decision to name his kernel's image "vmlinux" [or > "vmlinuz" for those remember having those lulz] when in its case no > non-VM version had ever existed anywhere, nor even been desired or > conceived, struck me as an excess of continuity.) > > Unix geeks are conservative about the weirdest things. > > Regards, > Branden > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.winalski at gmail.com Sun May 26 03:18:17 2024 From: paul.winalski at gmail.com (Paul Winalski) Date: Sat, 25 May 2024 13:18:17 -0400 Subject: [TUHS] A fuzzy awk In-Reply-To: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> Message-ID: On Fri, May 24, 2024 at 8:18 PM Bakul Shah via TUHS wrote: At one point I had suggested turning Go's Interface type to something like > Guttag style abstract data types in that relevant axioms are specified > right in the interface definition. The idea was that any concrete type that > implements that interface must satisfy its axioms. Even if the compiler > ignored these axioms, one can write a support program that can generate a > set of comprehensive tests based on these axioms. [Right now a type > "implementing" an interface only needs to have a set of methods that > exactly match the interface methods but nothing more] The underlying idea > is that each type is in essence a constraint on what values an instance of > that type can take. So adding such axioms simply tightens (& documents) > these constraints. Just the process of coming up with such axioms can > improve the design (sor of like test driven design but better!). > At one point I worked with a programming language called Gypsy that implemented this concept. Each routine had a prefix that specified axioms on the routine's parameters and outputs. The rest of Gypsy was a conventional procedural language but the semantics were carefully chosen to allow for automated proof of correctness. I wrote a formal specification for the DECnet session layer protocol (DECnet's equivalent of TCP) in Gypsy. I turned up a subtle bug in the prose version of the protocol specification in the process. -Paul W. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at quintile.net Sun May 26 03:24:07 2024 From: steve at quintile.net (Steve Simon) Date: Sat, 25 May 2024 18:24:07 +0100 Subject: [TUHS] Was curses ported to Seventh Edition Unix? Message-ID: with my pedantic head on… The “7th Edition” was the name of the Perkin Elmer port (nee Interdata), derived from Richard Miller’s work. 
This was Unix Version 7 from the labs, with a v6 C compiler, with vi, csh, and curses from 2.4BSD (though we where never 100% sure about this version). You never forget your first Unix :-) -Steve From tom.perrine+tuhs at gmail.com Sun May 26 03:36:53 2024 From: tom.perrine+tuhs at gmail.com (Tom Perrine) Date: Sat, 25 May 2024 10:36:53 -0700 Subject: [TUHS] A fuzzy awk In-Reply-To: References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> Message-ID: Another Gypsy user here... For KSOS-11 the kernel was described in SPECIAL - as a set of axioms and theorems. There was no actual connection between the formal specification in SPECIAL and the Modula code. Some of the critical user-space code for a trusted downgrade program, to bridge data from higher levels of classification to lower, was written in Gypsy. I visited UT Austin and Dr Good(?)'s team to learn it, IIRC. Gypsy was considered better in that the specification was tied to the executable through the pre/post conditions - and the better support for semi-automated theorem proving. On Sat, May 25, 2024 at 10:18 AM Paul Winalski wrote: > On Fri, May 24, 2024 at 8:18 PM Bakul Shah via TUHS wrote: > > At one point I had suggested turning Go's Interface type to something like >> Guttag style abstract data types in that relevant axioms are specified >> right in the interface definition. The idea was that any concrete type that >> implements that interface must satisfy its axioms. Even if the compiler >> ignored these axioms, one can write a support program that can generate a >> set of comprehensive tests based on these axioms. [Right now a type >> "implementing" an interface only needs to have a set of methods that >> exactly match the interface methods but nothing more] The underlying idea >> is that each type is in essence a constraint on what values an instance of >> that type can take. So adding such axioms simply tightens (& documents) >> these constraints. Just the process of coming up with such axioms can >> improve the design (sor of like test driven design but better!). >> > > At one point I worked with a programming language called Gypsy that > implemented this concept. Each routine had a prefix that specified axioms > on the routine's parameters and outputs. The rest of Gypsy was a > conventional procedural language but the semantics were carefully chosen to > allow for automated proof of correctness. I wrote a formal specification > for the DECnet session layer protocol (DECnet's equivalent of TCP) in > Gypsy. I turned up a subtle bug in the prose version of the protocol > specification in the process. > > -Paul W. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sauer at technologists.com Sun May 26 03:53:01 2024 From: sauer at technologists.com (Charles H Sauer (he/him)) Date: Sat, 25 May 2024 12:53:01 -0500 Subject: [TUHS] Prof Don Good [was Re: A fuzzy awk In-Reply-To: References: <1E156D9B-C841-4693-BEA4-34C4BF42BCD5@iitbombay.org> Message-ID: <958b0893-6829-41b9-a096-bf732e338ea1@technologists.com> On 5/25/2024 12:36 PM, Tom Perrine wrote: > Another Gypsy user here... > > For KSOS-11 the kernel was described in SPECIAL - as a set of axioms and > theorems. There was no actual connection between the formal > specification in SPECIAL and the Modula code. > > Some of the critical user-space code for a trusted downgrade program, to > bridge data from higher levels of classification to lower, was written > in Gypsy. I visited UT Austin and Dr Good(?)'s team to learn it, IIRC. 
> Gypsy was considered better in that the specification was tied to the > executable through the pre/post conditions - and the better support for > semi-automated theorem proving. When I was transitioning from being a rock n' roller to computer science student, I took my first undergraduate languages course from Don. https://www.dignitymemorial.com/obituaries/austin-tx/donald-good-8209907 Charlie -- voice: +1.512.784.7526 e-mail: sauer at technologists.com fax: +1.512.346.5240 Web: https://technologists.com/sauer/ Facebook/Google/LinkedIn/Twitter: CharlesHSauer From ats at offog.org Sun May 26 04:07:17 2024 From: ats at offog.org (Adam Sampson) Date: Sat, 25 May 2024 19:07:17 +0100 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: (Clem Cole's message of "Sat, 25 May 2024 12:14:10 -0400") References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: Clem Cole writes: > Pavel (with coaching from a few of us, including me], wrote a new > implementation of terminfo. When he was added it, he combined a > rewrite of curses. >From the utzoo Usenet archive... --start-- From: utzoo!decvax!harpo!floyd!vax135!cornell!pavel Newsgroups: net.general Title: New Curses/Terminfo Package Article-I.D.: cornell.3348 Posted: Sat Jul 10 15:10:14 1982 Received: Sun Jul 11 03:55:13 1982 At this past week's USENIX meeting, Mark Horton announced the completion of a replacement database/interface for the Berkeley 'termcap' setup. The new version is called 'terminfo' and has several advantages over termcap: - The database is compiled and therefore start-up time for programs using the package is considerably reduced, even faster than reading a single-entry termcap database. - The database is more human-readable and flexible. - Many more terminals can be supported due to the addition of several new capabilities, generalised parameter mechanisms (enabling the full use of, for example, the ANSI cursor-forward capability by allowing you to say 'move forward 35 spaces' as opposed to 'move forward' 35 times), a fully general yet efficient arithmetic mechanism which should allow the use of \any/ bizarre cursor-addressing scheme which can be computed, etc. - A \far/ better set of routines for accessing the database, requiring, for example, only a single call to read in an entire entry, making all of the terminal's capabilities fully available to the calling program. No more need for 'tgetent', 'tgetstr', etc. Conversion of existing programs from termcap to terminfo is very easy and usually consists mostly of throwing out all of the garbage needed to read and store a termcap entry. As a companion to the change to terminfo, Mark has also completed work on a re-vamped version of the Curses screen-handling library package. The new version has many, many advantages over the previous version, some of which are listed below: - New curses can use insert/delete line/character capabilities in terminals which have them, considerably speeding up many applications - It is possible to use the new curses on more than one type of terminal at once - All of the video attributes of a terminal (e.g. reverse video, boldface, blinking, etc.) can be used, in tandem if possible - New curses handles terminals like the Televideos with the so-called 'magic cookie' glitch which leaves markers on the screen for each change of video attributes - The arrow and function keys of terminals can be input just as though they were single characters, even on terminals which use multi-character sequences for these functions. 
The new curses does all necessary interpretation, passing back to the program only a defined constant telling which key was pressed. - There is a user-accessable scrolling region - The use of shell escapes and the csh ^Z job control feature is supported more fully - On systems which can support the notion, updates of the screen will abort if a character is typed at the keyboard, thus allowing the application to possibly avoid useless output - It should now be possible for most programs to be written very portably to run on most versions of UNIX, including System III, Berkeley UNIX, V7, Bell Labs internal UNIX, etc. This portability extends to the use of most terminal modes, such as raw mode, echoing, etc. Now for the bad news. Mark, being an employee of Bell Labs, cannot release any of his code. Estimates currently run as high as 18 months for a Bell release. Even then, nothing could be guaranteed as to its price. As a result, I have decided to do a public-domain implementation of both terminfo and the new curses. They will be compatible with Mark's versions. I have arranged for the library/database to be distributed with the next Berkeley Software Distribution, 4.2BSD, in December of this year. It will also be made available for free to any requestor. I agree with Mark when he says that terminfo is clearly superior to termcap and deserves to be made a new and lasting standard. I expect to be able to begin recruiting test sites for both curses and terminfo by the end of September. If you have any questions, comments or suggestions, please send them to me, not the network. Pavel Curtis {decvax,allegra,vax135,harpo,...}!cornell!pavel Pavel.Cornell at Udel-Relay --end-- -- Adam Sampson From alanglasser at gmail.com Sun May 26 08:28:27 2024 From: alanglasser at gmail.com (Alan Glasser) Date: Sat, 25 May 2024 18:28:27 -0400 Subject: [TUHS] Did UNIX Ever Touch SPC-SWAP, EPL, or EPLX (1A Languages)? In-Reply-To: References: Message-ID: Matt, First, sorry for the delayed response. In around 1994 through late 1996 I worked on the FlashPort project in Bell Labs. A significant project that we completed was FlashPort'ing the 4ESS SWAP assembler from TSS/360 to Solaris. My memory is that the 4E team wanted to get off of TSS and onto Unix. Alan https://techmonitor.ai/technology/emulator_house_echo_logic_folded_back_into_att On Fri, Apr 5, 2024 at 12:59 AM segaloco via TUHS wrote: > So I've been doing a bit of reading on 1A and 4ESS technologies lately, > getting > a feel for the state of things just prior to 3B and 5ESS popping onto the > scene, > and came across some BSTJ references to the programming environments > involved > in the 4ESS and TSPS No. 1 systems. > > The general assembly system targeting the 1A machine language was known as > SPC-SWAP (SWitching Assembly Program)[1](p. 206) and ran under OS/360/370, > with > editing typically performed in QED. This then gave way to the EPL (ESS > Programming Language) and ultimately EPLX (EPL eXtra)[2](p. 1)[3](p. 8) > languages which, among other things, were used for later 4ESS work with > cross- > compilers for at least TSS/360 by the sounds of it. > > Are there any recollections of attempts by the Bell System to rebase any of > these 1A-targeting environments into UNIX, or by the time UNIX was being > considered more broadly for Bell System projects, was 3B/5ESS technology > well on > the way, rendering attempting to move entrenched IBM-based environments > for the > older switching computation systems moot? 
> > For the record, in addition to the evolution of ESS to the 5ESS > generation, a > revision of TSPS, 1B, was also introduced which was rebased on the 3B20D > processor and utilized the same 3B cross-compilation SGS under UNIX as > other 3B- > targeted applications[4]. Interestingly, the paper on software development > in [4](p. 109) still makes reference to Programmer's Workbench as of 1982, > implying that nomenclature may have still been the norm at some Bell Labs > sites > such as Naperville, Illinois, although I can't tell if they're referring to > PWB as in the branch of UNIX or the environment of make, sccs, etc. > > Additionally, is anyone aware of surviving accessible specimens of SWAP > assembly, EPL, or EPLX code or literature beyond the BSTJ references and > paper > referenced in the IEEE library below? Thanks for any insights! > > - Matt G. > > [1] - > https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V58N06_197907_Part_1.pdf > [2] - https://ieeexplore.ieee.org/document/810323 > [3] - > https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V60N06_198107_Part_2.pdf > [4] - > https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V62N03_198303_Part_3.pdf > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 060804.PDF Type: application/pdf Size: 28592 bytes Desc: not available URL: From robpike at gmail.com Sun May 26 09:06:25 2024 From: robpike at gmail.com (Rob Pike) Date: Sun, 26 May 2024 09:06:25 +1000 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525000348.hq5zvwm6x4evl44h@illithid> Message-ID: Reminds me of my typesetting story (search the list's archives for versatec and vegents, that should find it.) -rob On Sat, May 25, 2024 at 10:17 PM Clem Cole wrote: > Oh how I hate history rewrites. Job control was developed by Kulp on V7 > in Europe and MIT. Joy saw it and added it what would become 4BSD. > > The others were all developed on V7 (PDP11)at UCB. They were not back > ported either. The vax work inherited them from V7. > > It is true, The public tended to see these as 4BSD features as that was > the vehicle that got larger distribution. > > Sent from a handheld expect more typos than usual > > > On Sat, May 25, 2024 at 6:49 AM Jonathan Gray wrote: > >> On Fri, May 24, 2024 at 07:03:48PM -0500, G. Branden Robinson wrote: >> > Hi folks, >> > >> > I'm finding it difficult to find any direct sources on the question in >> > the subject line. >> > >> > Does anyone here have any source material they can point me to >> > documenting the existence of a port of BSD curses to Unix Version 7? >> >> "In particular, the C shell, curses, termcap, vi and job control were >> ported back to Version 7 (and later System III) so that it was not >> unusual to find these features on otherwise pure Bell releases." >> from Documentation/Books/Life_with_Unix_v2.pdf >> >> in some v7ish distributions: unisoft, xenix, nu machine, venix? 
>> >> https://bitsavers.org/pdf/codata/Unisoft_UNIX_Vol_1_Aug82.pdf pg 437 >> >> https://archive.org/details/bitsavers_codataUnis_28082791/page/n435/mode/2up >> >> >> https://bitsavers.org/pdf/forwardTechnology/xenix/Xenix_System_Volume_2_Software_Development_1982.pdf >> pg 580 >> >> https://archive.org/details/bitsavers_forwardTecstemVolume2SoftwareDevelopment1982_27714599/page/n579/mode/2up >> >> https://bitsavers.org/pdf/lmi/LMI_Docs/UNIX_1.pdf pg 412 >> >> https://archive.org/details/bitsavers_lmiLMIDocs_20873181/page/n411/mode/2up >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe at celo.io Sun May 26 21:10:32 2024 From: joe at celo.io (Joe) Date: Sun, 26 May 2024 13:10:32 +0200 Subject: [TUHS] Test, test In-Reply-To: References: Message-ID: On 11/24/23 01:30, Warren Toomey via TUHS wrote: > Just checking that the TUHS mailing list is still working. > It's been awfully quiet! > Cheers, Warren I was just wondering the same now, last received mail was in November. Testing ... From ralph at inputplus.co.uk Sun May 26 21:31:58 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Sun, 26 May 2024 12:31:58 +0100 Subject: [TUHS] Yes, the list is working. (Was: Test, test) In-Reply-To: References: Message-ID: <20240526113158.6B1601FB21@orac.inputplus.co.uk> Hi Joe, > > Just checking that the TUHS mailing list is still working. It's > > been awfully quiet! > > I was just wondering the same now, last received mail was in November. Yes, the list is working fine. If you look at an email from the list, its header will have list-* fields with useful content which includes List-Archive: That will show if there are emails reaching the list's software which you aren't receiving. -- Cheers, Ralph. From ralph at inputplus.co.uk Mon May 27 19:39:09 2024 From: ralph at inputplus.co.uk (Ralph Corderoy) Date: Mon, 27 May 2024 10:39:09 +0100 Subject: [TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk) In-Reply-To: References: Message-ID: <20240527093909.91CAD21F18@orac.inputplus.co.uk> Hi, Doug wrote: > The trick: From recursive equations (easily derived from the grammar > of REs), I counted how many REs exist up to various limits on token > counts, Then I generated all strings that satisfied those limits, > turned the recognizer loose on them and counted how many it accepted. Which reminded me of Doug's paper. Enumerating the strings of regular languages, J. Functional Programming 14 (2004) 503-518 Haskell code is developed for two ways to list the strings of the language defined by a regular expression: directly by set operations and indirectly by converting to and simulating an equivalent automaton. The exercise illustrates techniques for dealing with infinite ordered domains and leads to an effective standard form for nondeterministic finite automata. PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf It's also nice for the NFA construction with one state per symbol plus one final state, and no epsilon transitions. Doug writes: The even-a language (ab*a|b)* is defined by automaton h, with three start states. h0 = State 0 ’~’ [] h1 = State 1 ’b’ [h4,h1,h0] h2 = State 2 ’a’ [h4,h1,h0] h3 = State 3 ’b’ [h2,h3] h4 = State 4 ’a’ [h2,h3] h = [h4,h1,h0] The symbols replaced by their state numbers gives (43*2|1)*; state 0 is the sole final state. -- Cheers, Ralph. 
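Doug's automaton is small enough to test exhaustively in any language, so here is a rough C rendering of it -- the table encoding and the helper names below are this sketch's own, not the paper's -- together with the brute-force check Doug describes: enumerate every string over {a,b} up to some length and compare the recogniser's verdict with an independent oracle, which for the even-a language is just "the count of a's is even".

    /* Rough C rendering of the even-a automaton above (state 0 is the
     * final state; the start set is {4,1,0}), plus an exhaustive test:
     * every string over {a,b} up to MAXLEN is fed to the NFA and the
     * answer compared with a simple parity oracle.  Illustrative only. */
    #include <stdio.h>
    #include <string.h>

    #define NSTATES 5
    #define MAXLEN  12

    static const char sym[NSTATES] = { '~', 'b', 'a', 'b', 'a' };
    static const int  nsucc[NSTATES] = { 0, 3, 3, 2, 2 };
    static const int  succ[NSTATES][3] = {
        { 0, 0, 0 },      /* 0: final, no successors */
        { 4, 1, 0 },      /* 1: 'b' -> {4,1,0} */
        { 4, 1, 0 },      /* 2: 'a' -> {4,1,0} */
        { 2, 3, 0 },      /* 3: 'b' -> {2,3}   */
        { 2, 3, 0 },      /* 4: 'a' -> {2,3}   */
    };

    /* Simulate the NFA: a live state is consumed by reading its own
     * symbol, and the string is accepted if state 0 is live at the end. */
    static int accepts(const char *s)
    {
        int cur[NSTATES] = { 1, 1, 0, 0, 1 };   /* start set h = {4,1,0} */
        int nxt[NSTATES], i, j;

        for (; *s; s++) {
            memset(nxt, 0, sizeof nxt);
            for (i = 0; i < NSTATES; i++)
                if (cur[i] && sym[i] == *s)
                    for (j = 0; j < nsucc[i]; j++)
                        nxt[succ[i][j]] = 1;
            memcpy(cur, nxt, sizeof cur);
        }
        return cur[0];
    }

    /* Oracle: (ab*a|b)* is exactly the strings with an even number of a's. */
    static int oracle(const char *s)
    {
        int odd = 0;
        for (; *s; s++)
            odd ^= (*s == 'a');
        return !odd;
    }

    int main(void)
    {
        char buf[MAXLEN + 1];
        long mismatches = 0, n;
        int len, i;

        for (len = 0; len <= MAXLEN; len++)
            for (n = 0; n < (1L << len); n++) {
                for (i = 0; i < len; i++)
                    buf[i] = (n >> i) & 1 ? 'a' : 'b';
                buf[len] = '\0';
                if (accepts(buf) != oracle(buf))
                    mismatches++;
            }
        printf("%ld mismatches\n", mismatches);
        return mismatches != 0;
    }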
From hellwig.geisse at mni.thm.de Mon May 27 23:03:09 2024 From: hellwig.geisse at mni.thm.de (Hellwig Geisse) Date: Mon, 27 May 2024 15:03:09 +0200 Subject: [TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk) In-Reply-To: <20240527093909.91CAD21F18@orac.inputplus.co.uk> References: <20240527093909.91CAD21F18@orac.inputplus.co.uk> Message-ID: <2fa56390518c73b7f8e4563b5bae0fc48e374b03.camel@mni.thm.de> Hi, On Mon, 2024-05-27 at 10:39 +0100, Ralph Corderoy wrote: > > Which reminded me of Doug's paper. > >     Enumerating the strings of regular languages, >     J. Functional Programming 14 (2004) 503-518 > Thanks for the pointer. That's a nice paper, turned into an equally nice testing method. Hellwig From tuhs at tuhs.org Tue May 28 03:37:40 2024 From: tuhs at tuhs.org (segaloco via TUHS) Date: Mon, 27 May 2024 17:37:40 +0000 Subject: [TUHS] Did UNIX Ever Touch SPC-SWAP, EPL, or EPLX (1A Languages)? In-Reply-To: References: Message-ID: On Saturday, May 25th, 2024 at 3:28 PM, Alan Glasser wrote: > Matt, > First, sorry for the delayed response. > > In around 1994 through late 1996 I worked on the FlashPort project in Bell Labs. > A significant project that we completed was FlashPort'ing the 4ESS SWAP assembler from TSS/360 to Solaris. > My memory is that the 4E team wanted to get off of TSS and onto Unix. > > Alan > > https://techmonitor.ai/technology/emulator_house_echo_logic_folded_back_into_att > > > On Fri, Apr 5, 2024 at 12:59 AM segaloco via TUHS wrote: > > > So I've been doing a bit of reading on 1A and 4ESS technologies lately, getting > > a feel for the state of things just prior to 3B and 5ESS popping onto the scene, > > and came across some BSTJ references to the programming environments involved > > in the 4ESS and TSPS No. 1 systems. > > > > The general assembly system targeting the 1A machine language was known as > > SPC-SWAP (SWitching Assembly Program)[1](p. 206) and ran under OS/360/370, with > > editing typically performed in QED. This then gave way to the EPL (ESS > > Programming Language) and ultimately EPLX (EPL eXtra)[2](p. 1)[3](p. 8) > > languages which, among other things, were used for later 4ESS work with cross- > > compilers for at least TSS/360 by the sounds of it. > > > > Are there any recollections of attempts by the Bell System to rebase any of > > these 1A-targeting environments into UNIX, or by the time UNIX was being > > considered more broadly for Bell System projects, was 3B/5ESS technology well on > > the way, rendering attempting to move entrenched IBM-based environments for the > > older switching computation systems moot? > > > > For the record, in addition to the evolution of ESS to the 5ESS generation, a > > revision of TSPS, 1B, was also introduced which was rebased on the 3B20D > > processor and utilized the same 3B cross-compilation SGS under UNIX as other 3B- > > targeted applications[4]. Interestingly, the paper on software development > > in [4](p. 109) still makes reference to Programmer's Workbench as of 1982, > > implying that nomenclature may have still been the norm at some Bell Labs sites > > such as Naperville, Illinois, although I can't tell if they're referring to > > PWB as in the branch of UNIX or the environment of make, sccs, etc. > > > > Additionally, is anyone aware of surviving accessible specimens of SWAP > > assembly, EPL, or EPLX code or literature beyond the BSTJ references and paper > > referenced in the IEEE library below? Thanks for any insights! > > > > - Matt G. 
> > > > [1] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V58N06_197907_Part_1.pdf > > [2] - https://ieeexplore.ieee.org/document/810323 > > [3] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V60N06_198107_Part_2.pdf > > [4] - https://bitsavers.org/magazines/Bell_System_Technical_Journal/BSTJ_V62N03_198303_Part_3.pdf Wow, FlashPort sounds like quite the endeavor! It's funny, I've been considering something along those lines for attempting to port older console video games to computer, somewhere between emulation and a true port, essentially emulation where most of the actual translation of CPU operations has been done before-hand (AOT) rather than the common interpreter or dynacomp approaches (JIT). Glad to see a sizeable example of that sort of thing being used. Now if only Nokia would take a walk through the archives and see if any of this stuff still exists... - Matt G. From mah at mhorton.net Tue May 28 04:31:17 2024 From: mah at mhorton.net (Mary Ann Horton) Date: Mon, 27 May 2024 11:31:17 -0700 Subject: [TUHS] Was curses ported to Seventh Edition Unix? In-Reply-To: References: <20240525155737.bwmngdyf4qnj4avv@illithid> Message-ID: <78be4696-e743-4231-9c6a-32b6edd92f09@mhorton.net> Adam, thank you for finding this and setting the record straight. AT&T management had nothing to do with it. I self-censored because AT&T's policy was that anything I wrote belonged to my employer. Pavel graciously offered to clone my work, and I slipped him the spec and the algorithm for the new improved curses. His version was FOSS and became the de facto standard everywhere except AT&T, where it wound up in System V Release 4 / Solaris. Thanks, /Mary Ann Horton/ (she/her/ma'am)       Award Winning Author maryannhorton.com On 5/25/24 11:07, Adam Sampson wrote: > Clem Cole writes: > >> Pavel (with coaching from a few of us, including me], wrote a new >> implementation of terminfo. When he was added it, he combined a >> rewrite of curses. > From the utzoo Usenet archive... > > --start-- > > From: utzoo!decvax!harpo!floyd!vax135!cornell!pavel > Newsgroups: net.general > Title: New Curses/Terminfo Package > Article-I.D.: cornell.3348 > Posted: Sat Jul 10 15:10:14 1982 > Received: Sun Jul 11 03:55:13 1982 > > At this past week's USENIX meeting, Mark Horton announced the completion > of a replacement database/interface for the Berkeley 'termcap' setup. The > new version is called 'terminfo' and has several advantages over termcap: > - The database is compiled and therefore start-up time for > programs using the package is considerably reduced, even > faster than reading a single-entry termcap database. > - The database is more human-readable and flexible. > - Many more terminals can be supported due to the addition > of several new capabilities, generalised parameter > mechanisms (enabling the full use of, for example, the ANSI > cursor-forward capability by allowing you to say 'move forward > 35 spaces' as opposed to 'move forward' 35 times), a fully > general yet efficient arithmetic mechanism which should allow > the use of \any/ bizarre cursor-addressing scheme which can > be computed, etc. > - A \far/ better set of routines for accessing the database, > requiring, for example, only a single call to read in an > entire entry, making all of the terminal's capabilities fully > available to the calling program. No more need for 'tgetent', > 'tgetstr', etc. 
> Conversion of existing programs from termcap to terminfo is very easy and > usually consists mostly of throwing out all of the garbage needed to read > and store a termcap entry. > > As a companion to the change to terminfo, Mark has also completed work on > a re-vamped version of the Curses screen-handling library package. The new > version has many, many advantages over the previous version, some of which > are listed below: > - New curses can use insert/delete line/character capabilities > in terminals which have them, considerably speeding up many > applications > - It is possible to use the new curses on more than one type of > terminal at once > - All of the video attributes of a terminal (e.g. reverse video, > boldface, blinking, etc.) can be used, in tandem if possible > - New curses handles terminals like the Televideos with the > so-called 'magic cookie' glitch which leaves markers on the > screen for each change of video attributes > - The arrow and function keys of terminals can be input just as > though they were single characters, even on terminals which use > multi-character sequences for these functions. The new curses > does all necessary interpretation, passing back to the program > only a defined constant telling which key was pressed. > - There is a user-accessable scrolling region > - The use of shell escapes and the csh ^Z job control feature is > supported more fully > - On systems which can support the notion, updates of the screen > will abort if a character is typed at the keyboard, thus allowing > the application to possibly avoid useless output > - It should now be possible for most programs to be written very > portably to run on most versions of UNIX, including System III, > Berkeley UNIX, V7, Bell Labs internal UNIX, etc. This portability > extends to the use of most terminal modes, such as raw mode, > echoing, etc. > > Now for the bad news. Mark, being an employee of Bell Labs, cannot release > any of his code. Estimates currently run as high as 18 months for a Bell > release. Even then, nothing could be guaranteed as to its price. As a result, > I have decided to do a public-domain implementation of both terminfo and the > new curses. They will be compatible with Mark's versions. I have arranged > for the library/database to be distributed with the next Berkeley Software > Distribution, 4.2BSD, in December of this year. It will also be made available > for free to any requestor. I agree with Mark when he says that terminfo is > clearly superior to termcap and deserves to be made a new and lasting standard. > > I expect to be able to begin recruiting test sites for both curses and terminfo > by the end of September. > > If you have any questions, comments or suggestions, please send them to me, not > the network. > > Pavel Curtis > {decvax,allegra,vax135,harpo,...}!cornell!pavel > Pavel.Cornell at Udel-Relay > > --end-- > -------------- next part -------------- An HTML attachment was scrubbed... URL: From web at loomcom.com Wed May 29 06:37:37 2024 From: web at loomcom.com (Seth Morabito) Date: Tue, 28 May 2024 13:37:37 -0700 Subject: [TUHS] IN/ix Message-ID: <023b2172-d8f8-456a-91ce-071d95f6b921@app.fastmail.com> A few years ago, someone -- and I've forgotten who, forgive me -- kindly gave me a copy of the source code for a UNIX for the AT&T PC6300 called IN/ix, developed by INTERACTIVE Systems. I have found precious little about this system online. Apparently the PC/ix UNIX for the IBM PC XT is fairly well preserved, but I can't find much about IN/ix. 
From web at loomcom.com Wed May 29 06:37:37 2024
From: web at loomcom.com (Seth Morabito)
Date: Tue, 28 May 2024 13:37:37 -0700
Subject: [TUHS] IN/ix
Message-ID: <023b2172-d8f8-456a-91ce-071d95f6b921@app.fastmail.com>

A few years ago, someone -- and I've forgotten who, forgive me -- kindly
gave me a copy of the source code for a UNIX for the AT&T PC6300 called
IN/ix, developed by INTERACTIVE Systems. I have found precious little about
this system online. Apparently the PC/ix UNIX for the IBM PC XT is fairly
well preserved, but I can't find much about IN/ix.

For what it's worth, the login herald in the source code reads:

    "IN/ix Office System (c) Copyright INTERACTIVE Systems Corp. 1983, 1988"

Presumably this was PC/ix, but targeting the AT&T 6300? Does anyone have
any more knowledge of IN/ix?

If you're interested in digging into it yourself, I've dropped the source
here: https://archives.loomcom.com/pc6300/

(N.B.: All the files inside the zip are compressed, that's just how I got it)

-Seth

--
Seth Morabito * Poulsbo, WA * https://loomcom.com/

From e5655f30a07f at ewoof.net Wed May 29 21:57:50 2024
From: e5655f30a07f at ewoof.net (Michael Kjörling)
Date: Wed, 29 May 2024 11:57:50 +0000
Subject: [TUHS] OS and vendor identification
Message-ID:

I spotted this elsewhere and thought that maybe someone here might be able
to contribute.

https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

From lars at nocrew.org Thu May 30 03:22:56 2024
From: lars at nocrew.org (Lars Brinkhoff)
Date: Wed, 29 May 2024 17:22:56 +0000
Subject: [TUHS] OS and vendor identification
In-Reply-To: (Michael Kjörling's message of "Wed, 29 May 2024 11:57:50 +0000")
References:
Message-ID: <7wed9kjzwv.fsf@junk.nocrew.org>

Michael Kjörling wrote:
> I spotted this elsewhere and thought that maybe someone here might be
> able to contribute.
> https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

Chances are you will find something on Bitsavers:
https://google.com/search?q=%22triton%22+%22unix%22+site%3Abitsavers.org

From crossd at gmail.com Thu May 30 03:31:10 2024
From: crossd at gmail.com (Dan Cross)
Date: Wed, 29 May 2024 13:31:10 -0400
Subject: [TUHS] OS and vendor identification
In-Reply-To:
References:
Message-ID:

On Wed, May 29, 2024 at 8:07 AM Michael Kjörling wrote:
> I spotted this elsewhere and thought that maybe someone here might be
> able to contribute.
>
> https://lists.gnu.org/archive/html/config-patches/2024-05/msg00022.html

ACIS/AOS in that listing almost surely refers to the "ACademic Information
System" / "Academic Operating System", which was IBM's port of
4.3BSD-Tahoe+NFS to the ROMP-based RT (e.g., the 6150/6151/6152), sold to
universities. These were used in Project Athena, I believe; Ted can say
more about that than I can.

I'd send this to Zach directly, but I don't have his email address.

- Dan C.

From pnr at planet.nl Fri May 31 22:00:55 2024
From: pnr at planet.nl (Paul Ruizendaal)
Date: Fri, 31 May 2024 14:00:55 +0200
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To:
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
Message-ID: <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>

I’m further looking into BCPL / B / C family compilers on 16-bit
mini-computers prior to 1979.

Lots of interesting stuff. BCPL was extended with structures at least
twice, and there was plenty of struggle with (un)scaled pointers. It seems
that the Nova was a much easier target than the PDP-11, with a simpler
code generator sufficing to generate quality code. I’ll report more fully
when I’m further along with my review.

> On May 8, 2024, at 5:51 PM, Clem Cole wrote:
>
> IIRC, Mike Malcolm and the team built a true B compiler so they could
> develop Thoth. As the 11/40 was one of the original Thoth target systems,
> I would have expected that to exist, but I have never used it.

Yes, they did. I’m working through the various papers on Thoth and the
Eh / Zed compilers (essentially B with tweaks). I’ve requested pdf’s of two
theses that are only on micro-fiche from the Uni of Waterloo library,
hopefully this is possible. The original target machines were Honeywell
6060, DG Nova, Microdata 1600/30 and TI-990. The latter is close enough to
a PDP-11. This compiler is from 1976.

I’ve browsed around for surviving Thoth source code, but it would seem to
be lost. Does anyone know of surviving Thoth bits?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
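The "(un)scaled pointer" struggle comes from BCPL's storage model: a BCPL
pointer is simply a word number, so on a byte-addressed machine such as the
PDP-11 the compiler has to insert the scaling that C later folded into typed
pointer arithmetic. A minimal sketch of the two views, in plain C with
invented names and a simulated word memory (not code from any of the
compilers under discussion):

    #include <stdio.h>

    /* A toy word-addressed "memory" standing in for BCPL's model, where a
     * pointer is just a word number; the layout and names are invented. */
    static int word_memory[8] = { 0, 0, 10, 20, 30, 40, 0, 0 };

    int main(void)
    {
        /* BCPL view: v holds a word number; "v!2" means "add 2 to the word
         * number and fetch" -- no scaling appears in the source language. */
        int v = 2;                                   /* vector starts at word 2 */
        printf("BCPL-style v!2 : %d\n", word_memory[v + 2]);

        /* C view: the pointer type carries the element size, so p + 2 is
         * scaled by sizeof(int) when lowered to a byte address. On a
         * byte-addressed machine a BCPL compiler must emit that scaling
         * itself, around many dereferences, which is one place the ports
         * mentioned above had to work hard. */
        int *p = &word_memory[2];
        printf("C-style   p[2] : %d\n", *(p + 2));

        return 0;
    }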
From peter.martin.yardley at gmail.com Fri May 31 22:21:03 2024
From: peter.martin.yardley at gmail.com (Peter Yardley)
Date: Fri, 31 May 2024 22:21:03 +1000
Subject: [TUHS] On the uniqueness of DMR's C compiler
In-Reply-To: <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>
References: <18efd14f-4da6-4771-ad6a-901c6cb6105d@planet.nl>
 <57a37626-728c-4f34-b08b-a4f521f1db03@planet.nl>
 <4CB7B6B4-DF24-43EE-91F2-0C1CCBEB91E3@planet.nl>
Message-ID: <601CEA28-C64A-4BF3-AC7A-245ED4E653EA@gmail.com>

I believe the Nova became a Mil Std instruction set (proven to be free of
hazards). Its architecture was pretty simple. We sold ours to the Navy.

> On 31 May 2024, at 10:00 PM, Paul Ruizendaal wrote:
>
> I’m further looking into BCPL / B / C family compilers on 16-bit
> mini-computers prior to 1979.
>
> Lots of interesting stuff. BCPL was extended with structures at least
> twice, and there was plenty of struggle with (un)scaled pointers. It seems
> that the Nova was a much easier target than the PDP-11, with a simpler
> code generator sufficing to generate quality code. I’ll report more fully
> when I’m further along with my review.
>
>> On May 8, 2024, at 5:51 PM, Clem Cole wrote:
>>
>> IIRC, Mike Malcolm and the team built a true B compiler so they could
>> develop Thoth. As the 11/40 was one of the original Thoth target systems,
>> I would have expected that to exist, but I have never used it.
>
> Yes, they did. I’m working through the various papers on Thoth and the
> Eh / Zed compilers (essentially B with tweaks). I’ve requested pdf’s of
> two theses that are only on micro-fiche from the Uni of Waterloo library,
> hopefully this is possible. The original target machines were Honeywell
> 6060, DG Nova, Microdata 1600/30 and TI-990. The latter is close enough
> to a PDP-11. This compiler is from 1976.
>
> I’ve browsed around for surviving Thoth source code, but it would seem
> to be lost. Does anyone know of surviving Thoth bits?

Peter Yardley
peter.martin.yardley at gmail.com