Maybe Everything is OSS Now (but Not F)

I think I see a logical argument that (virtually) all software has now become open source and to some extent free.

(Virtually) all software is now developed in the context of LLM tools.
LLM tools hoover up all the data they see, importing all of the source code they generate or work with into internal databases owned by their makers.
The contents of these databases are used as training data for future iterations of the models.
LLMs generate source code that is in their training data or similar to code in their training data.
Therefore, to use LLM tools in software development is both to (1) donate source code to and (2) receive source code from (virtually) all other organizations and developers.
To donate and receive source code arbitrarily between all developers without regard to organizational barriers is tantamount to “open source” software, at least as opposed to “proprietary” software.

Therefore, all software is now open source and in some sense “freely” distributed.

There are a few obvious protests to this conclusion.

The argument applies only to “virtually” all software, whatever “virtually” means. Clearly there are organizations and individuals not using LLMs, or using them in a secured fashion (e.g., locally). The logic doesn’t apply to them. That said, the likelihood that some folks at Google are playing with LLMs whose inputs are visible to folks from Microsoft, and vice versa, to name two examples, seems high. To the extent that individuals at these two companies (or others like them) are using competitors’ LLMs in software development, their software is no longer closed source. (I suspect that Google and Microsoft are among the more disciplined organizations in constraining employees from showing proprietary code to external LLMs. Nonetheless, somewhere, in some companies, these employees surely exist and the exchanges are surely happening.)
“Open source” usually implies that you can obtain the entire source code for a given project. LLMs are not providing whole projects’ worth of source code, certainly not in one query and not verbatim. But perhaps this is moot. Imagine your competitor developed a whiz-bang new game engine with the help of an LLM. Later, the next iteration of that LLM appears. Now you begin asking the LLM for help building a whiz-bang new game engine. It certainly seems possible, if not probable, that you could get the LLM to provide you with significant chunks of your competitor’s source code (crucial functions, algorithms, structural decisions) without even mentioning the competitor. The transmission will not be complete or verbatim, but it is likely to be sufficient to gain competitive knowledge. Often a product’s entire competitiveness lies in one or two algorithms, potentially just a few functions. An LLM that sees these will hold them and could express them.
Clearly this is not FOSS as we like to think of it. The means of transmission is radically closed. The original source cannot be directly read or studied.

I suppose the point is that if you are generally in favor of source code being freely traded, LLMs are a powerful (but opaque) new vector for that liberation. If you are generally in favor of proprietary source code being withheld from distribution outside of the organization that owns it, then this is an era in which a whole lot of leakage has happened and won’t be undone.
