
Java, distributed OO revision control




in some earlier essays posted here I have been exploring some of the
ramifications of Java and the distributed computing model it gives
rise to, suggesting that many new standards are on their way to
deal with the programming complexities unique to that model.
here are some more thoughts along this line.

Java clearly was designed to allow the integration of 
objects located anywhere in cyberspace, although this is not yet
realized in widespread practice. even as part of the basic 
standard it proposes a naming hierarchy (i.e. object namespace)
that includes internet domain names.

the problem of distributed objects is somewhat interesting and I
believe will lead to many new advancements but also require many
sophisticated new practices on the internet. however, these are
the "same problems" that have been repeatedly encountered in the
past, just rearing their heads again in a way that begs for systematic
treatment.

consider the problem of software that uses a lot of different 
components built by other people. I create a Widget X that uses
Gadgets A, B, and C, all of these being different pieces of code maintained
by other people somewhere on the net. each of these pieces of code
may go through revisions that make earlier conventions obsolete,
or worse yet, introduce unexpected bugs. this is a very basic
problem of software development whether you are within a company
or within cyberspace, but it is going to become far more prevalent
once distributed objects are in place. how can we deal with this
complexity?

==

one idea that occurs to me that would be very powerful in tackling
these problems would be a "distributed object oriented revision control 
system", DOORCS. many here are familiar with revision control systems that
work on program files. what I imagine is an RCS that allows individual
objects to be checked in and checked out, and keeps track of earlier
versions of objects. 

let's say then that I write my Widget X. I could "freeze" the versions
of the objects A, B, and C that I want to use if each of these designers
were using the DOORCS-- they commit to keeping earlier versions of their
code in place so that my own code is stable. this is *not* the same
as me copying their code into my own directories, which is highly
undesirable from the point of view of development, because it forks
off the lines of genealogy.

hence when my code runs, it names the versions of the objects that it
is using over the network. so when people create new versions of objects,
my code keeps running against the versions it named and stays stable.
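
a minimal sketch of this "freezing" idea, in Python for brevity. everything here is hypothetical: the `Registry` class stands in for a DOORCS server, and the names `publish`, `resolve`, `latest`, and `WIDGET_X_PINS` are invented for illustration only. the one property it tries to show is that a published version is never overwritten, so code that names its versions is unaffected by later revisions.

```python
class Registry:
    """Hypothetical stand-in for a DOORCS server that never discards a version."""

    def __init__(self):
        self._objects = {}  # (name, version) -> implementation

    def publish(self, name, version, impl):
        key = (name, version)
        if key in self._objects:
            # A published version is frozen; revisions must use a new number.
            raise ValueError(f"{name} {version} is already published")
        self._objects[key] = impl

    def resolve(self, name, version):
        # Look up exactly the version the caller named.
        return self._objects[(name, version)]

    def latest(self, name):
        return max(v for (n, v) in self._objects if n == name)


registry = Registry()
registry.publish("GadgetA", (1, 0), lambda x: x + 1)

# Widget X names the exact versions it was built against.
WIDGET_X_PINS = {"GadgetA": (1, 0)}


def widget_x(value):
    gadget_a = registry.resolve("GadgetA", WIDGET_X_PINS["GadgetA"])
    return gadget_a(value)


before = widget_x(41)
# The maintainer ships a revision that behaves differently.
registry.publish("GadgetA", (2, 0), lambda x: x + 100)
after = widget_x(41)  # still resolved against the pinned (1, 0)
```

here `before` and `after` are the same value, even though `registry.latest("GadgetA")` has moved on to the new revision.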

all kinds of interesting embellishments on this system can be put in
place that might allow automation of software jobs and chores that take
a very long time in our current system, some of which I will describe.

imagine the problem of some code being revised, and the designer must
spend time integrating the new changes into his system. what I propose
would be that when people create objects, they also include an 
"intention" field that indicates things like:

1. how long this version is likely to stick around, if new versions 
are in the pipeline
2. how long this version will be kept around after new versions of
the object are created, i.e. "expiration date"
3. whether new versions are going to be backward compatible

this kind of information could in fact be applied on a method-by-method
basis. now imagine that I run a program associated with my Widget X
called "update". this program goes out into the object hierarchy and
notifies me of new versions of my objects that are in existence. it
might automatically switch to the new versions of the objects if
they are declared "backward compatible". it could tell me things
like "so-and-so object that you are using is going to be replaced in
[x] days", or "so-and-so version was replaced with a newer version".
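
the "intention" field and the "update" program might be sketched roughly as follows. this is only an illustration under assumptions of my own: the field names (`backward_compatible_successor`, `expires`, `next_version_expected`), the integer version numbers, and the in-memory `catalog` are all hypothetical, and a real system would fetch this information over the network.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Intention:
    backward_compatible_successor: bool            # item 3 in the list above
    expires: Optional[date] = None                 # item 2: "expiration date"
    next_version_expected: Optional[date] = None   # item 1 (unused in this sketch)


@dataclass(frozen=True)
class Release:
    name: str
    version: int
    intention: Intention


def update(pins, catalog, today):
    """Return (new_pins, notices) for a dict of name -> pinned version."""
    new_pins, notices = {}, []
    for name, pinned in pins.items():
        releases = {r.version: r for r in catalog[name]}
        newer = sorted(v for v in releases if v > pinned)
        chosen = pinned
        # Step forward only while each version promises a compatible successor.
        for v in newer:
            if not releases[chosen].intention.backward_compatible_successor:
                break
            chosen = v
        if newer:
            notices.append(
                f"{name} v{pinned} was replaced with a newer version (v{newer[-1]})")
        expires = releases[chosen].intention.expires
        if expires is not None:
            days = (expires - today).days
            notices.append(f"{name} v{chosen} is going to be replaced in {days} days")
        new_pins[name] = chosen
    return new_pins, notices


catalog = {
    "GadgetA": [Release("GadgetA", 1, Intention(True)),
                Release("GadgetA", 2, Intention(True))],
    "GadgetB": [Release("GadgetB", 1, Intention(False, expires=date(1996, 1, 31))),
                Release("GadgetB", 2, Intention(True))],
}
new_pins, notices = update({"GadgetA": 1, "GadgetB": 1}, catalog, date(1996, 1, 1))
```

in this made-up catalog, GadgetA is moved forward automatically because its maintainer promised compatibility, while GadgetB stays pinned and produces a warning that the pinned version will expire.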

with this kind of information, combined in ingenious ways, I can actually
measure the overall "stability" of my program based on the "stability"
of all the parts. I can actually make design decisions about using
different objects "out there" that are likely to be more stable, if
that is my preference, or more "state-of-the-art" but buggy (the basic
tradeoff going on here).
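
one simple way such a "stability" measure could be combined, offered purely as an assumption: treat each part's stability as the probability that it will not break my code over some period, assume the parts fail independently, and multiply. the numbers and the function name `overall_stability` below are invented for illustration.

```python
import math


def overall_stability(part_stability):
    """Product of per-part stabilities (assumes the parts fail independently)."""
    return math.prod(part_stability.values())


# Two hypothetical designs for Widget X that differ only in which GadgetB they use.
conservative = {"GadgetA": 0.99, "GadgetB": 0.98, "GadgetC": 0.97}
cutting_edge = {"GadgetA": 0.99, "GadgetB": 0.80, "GadgetC": 0.97}

prefer_conservative = overall_stability(conservative) > overall_stability(cutting_edge)
```

under this assumption one unstable part drags down the whole, which is the tradeoff described above.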

now, here is where the fun can really begin. when all of these systems
are formalized and standardized, you can write software that automates
some of the very difficult tasks that many programmers face. I would
wager the majority of time spent in large programming tasks is 
dedicated to some basic problems:

1. regression testing. adding new components and making sure the 
"whole" still works when you add new parts (objects).

2. locating bad modules when a regression test fails.

imagine that these time-consuming processes that take days of the
lives of programmers could be *automated*! that is precisely what I 
am proposing would be possible with a very good DOORCS. here's how it 
would work:

a person that creates an object also creates "assertions" or
regression tests built into the object. these tests are run to make
sure they pass against specific versions of all the objects that this
object is composed of. these assertions should be code that can be run
with an exit status of "code passed" or "code failed".

now, when new versions of the other objects are created, an automated
"packager" could test the new versions of code automatically, and also
isolate bad versions of the new objects that aren't backward compatible
or introduce new bugs (i.e. "regress"). this automated "tester" would
work much like a binary search algorithm: it would start by adding
all new modules, and then running the regression tests. if it passes,
the new modules are considered trustworthy. if it fails, it can 
switch back and forth between previous and new versions of the modules,
rerunning the regression tests, and automatically find the bad modules
possibly very rapidly!! 
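
the search described above could look something like this sketch. the function `find_bad_modules` and the callback `passes` are hypothetical names; `passes(upgraded)` is assumed to run the regression tests with the named modules at their new versions and everything else at the previous versions. it also assumes a bad module fails on its own; a failure that only appears when two individually good upgrades are combined would not be found this way.

```python
def find_bad_modules(candidates, passes):
    """Return the upgraded modules that make the regression tests fail."""
    candidates = list(candidates)
    # If the whole group passes when upgraded together, trust all of it.
    if not candidates or passes(frozenset(candidates)):
        return []
    # A failing group of one is a bad module.
    if len(candidates) == 1:
        return candidates
    # Otherwise split the group in half and test each half separately.
    mid = len(candidates) // 2
    return (find_bad_modules(candidates[:mid], passes)
            + find_bad_modules(candidates[mid:], passes))


# Made-up example: eight upgraded modules, two of which regress.
modules = ["A", "B", "C", "D", "E", "F", "G", "H"]
truly_bad = {"C", "G"}
runs = []


def passes(upgraded):
    runs.append(upgraded)                # record each regression run
    return not (upgraded & truly_bad)    # fails if any bad module is included


bad = find_bad_modules(modules, passes)
```

when only a few modules are bad, the number of regression runs grows with the logarithm of the number of modules per bad module, rather than requiring one run per module.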

I claim that this is exactly what programmers
often spend many of the hours in their day doing, and an automated
means of doing this could possibly be quite revolutionary. furthermore,
add the "distributed" aspects associated with cyberspace, and
you have a sort of "holographic programming environment" in which
everyone on the Net effectively becomes a cooperative programmer
in the same company!!

==

now, consider some other interesting problems. often people have
different ideas about where they want code to "go" in the future.
a DOORCS might actually track the genealogy of a piece of
code, and allow anyone, not merely the original creator, to create
a new "branch" of development of any object on the net. viewed in
this way, we have a sort of "object commune" in which everyone
contributes what they want to the development of software, and it
simply moves in the directions that are decided by mass consensus.
you might have "breakings" and "mergings" as people diversify and
unify different algorithms. anyone can decide to use any version
of the object in the existing tree, or modify it accordingly.
in fact this creates a sort of "software breeding ground" in which
different objects are crossed, intermixed, and combined by programmers,
the trees or genealogy of which are tracked by the DOORCS.
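
a rough sketch of how such a genealogy might be recorded: each version lists its parent versions, so a "breaking" is a version whose parent also has another child, and a "merging" is a version with two parents. the `Genealogy` class, the version names, and the authors are all hypothetical.

```python
class Genealogy:
    """Hypothetical record of which versions each version was derived from."""

    def __init__(self):
        self.parents = {}  # version id -> tuple of parent version ids
        self.authors = {}  # version id -> who created it

    def create(self, version, author, parents=()):
        if version in self.parents:
            raise ValueError(f"{version} already exists")
        for p in parents:
            if p not in self.parents:
                raise KeyError(f"unknown parent version {p}")
        self.parents[version] = tuple(parents)
        self.authors[version] = author

    def ancestors(self, version):
        """All versions this one descends from, directly or indirectly."""
        seen, stack = set(), list(self.parents[version])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.parents[v])
        return seen


g = Genealogy()
g.create("sort-1", "alice")
g.create("sort-2", "alice", parents=["sort-1"])      # the original line continues
g.create("sort-1-bob", "bob", parents=["sort-1"])    # someone else branches
g.create("sort-3", "carol", parents=["sort-2", "sort-1-bob"])  # a merging
lineage = g.ancestors("sort-3")
```

note that anyone, not only the original author, can add a version here, which is the point of the "object commune".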

one concept to bring out in all this: what I am proposing is also
a hierarchical method of revision control in which the granularity
of control is very fine, i.e. that of individual objects. it is
this granularity or resolution that allows all the neat tricks and
very streamlined version management. (today, most companies do 
RCS on the level of entire programs or files in those programs, which
would not fully support all of these capabilities I've delineated.)

in the view I am proposing, every piece of software is an "object"
composed of other "objects". these objects all have their own 
versions, and some fixed combination of these versions, plus additional
modifications, can be named as a unique version of the encompassing object.
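
one possible way to name such a combination, given here only as an assumption (the idea of deriving the name from a hash is mine, not part of the proposal above): compute the version of the encompassing object from its own modifications plus the versions of its parts. the function `version_id` and the sample data are hypothetical.

```python
import hashlib


def version_id(own_changes, parts):
    """Derive a version name from local modifications plus part versions.

    own_changes: text of the encompassing object's own modifications
    parts: dict mapping part name -> that part's version id
    """
    h = hashlib.sha256()
    h.update(own_changes.encode("utf-8"))
    for name in sorted(parts):  # sorted so dict ordering does not matter
        h.update(b"\0" + name.encode("utf-8") + b"=" + parts[name].encode("utf-8"))
    return h.hexdigest()[:12]


gadget_a_v1 = version_id("gadget A, first cut", {})
gadget_a_v2 = version_id("gadget A, revised", {})

widget_x_old = version_id("widget glue code", {"GadgetA": gadget_a_v1})
widget_x_same = version_id("widget glue code", {"GadgetA": gadget_a_v1})
widget_x_new = version_id("widget glue code", {"GadgetA": gadget_a_v2})
```

with this scheme the same combination always gets the same name, and changing any part, however deeply nested, changes the name of everything built on it.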

also, I like the idea of every object having a "maintainer", or an
email address to send bug reports to. it seems that I am
eternally finding bugs in other people's software when I try to 
write my own software, and in some ways this is an impossible fate
to avoid (users invariably become bug finders). at least this way
I would have somewhere to complain to. an object might actually
store all of the bug reports or enhancements that have been sent to
it from the net, and when the maintainer goes to modify that object,
he can automatically call up all the associated comments! the maintainer
may even find that various enhancements have already been added to
his objects by others "out there" and he might take the task of
"authorizing" (i.e. integrating into his "official" version)
all the ones he finds most relevant and useful to his software.

note that some of the things I am proposing can be handled by
the inheritance properties of a language, and there is some similarity,
but I don't consider current concepts of inheritance in general 
the proper mechanism for dealing with revision control, although
it may be that new concepts of "inheritance" that combine it with
the above revision control ideas find their way into languages.

note also that I am very explicitly abandoning the idea that some
programmers have, "if someone's new software doesn't work, then they
should fix it, and not distribute buggy software". the whole concept
and premise here is that BUGGY SOFTWARE EXISTS and cannot necessarily
be detected by the PRODUCER of that software, and that a system ought
to be devised in which the CONSUMER can have total freedom over what
versions of the software he uses based on his own perceptions of its
value or bugginess. the more that bugs and program development are
seen as an inherent part of the process, the more beneficial the overall
system will be, in my view.

==

(there will be many people who object to all this as fantasy 
based on MONEY. "who will pay for it??" ask the unimaginative. I don't
want to get into the economics of all this in this paper, only to
say that I did explore this in an earlier paper, where I said that
microcurrency combined with per-use-charges on objects could lead
to a very interesting "vending machine software environment".)

while all of these ideas may sound "pretty but unnecessary" at this
moment, I think they will be seen as increasingly critical when
distributed objects begin to catch hold on the Internet with the
Java paradigm. many tools such as make and its Makefiles were invented for
the sole purpose of dealing with the associated complexity of
programming, not the programming itself, and I think this trend
will continue.

increasingly, programming environments are not merely going to
be programming languages, which are a very minor part of program
development, but entire systems for the development of code.
many programmers tend to oppose these new systems, insisting that
"I could do all that by hand in the old days". but done properly they
really will save tremendous labor, and will not create new limitations
and burdens but instead give new freedoms and options to the programmer.