[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Your faith is PGP is charming and quaint, but wrong
>Style, for example, cannot be easily copied.
Au contrair.
Cypherpunks share code...
#! /bin/sh
# This is a shell archive. Remove anything before this line, then feed it
# into a shell via "sh file" or similar. To overwrite existing files,
# type "sh file -c".
# The tool that generated this appeared in the comp.sources.unix newsgroup;
# send mail to [email protected] if you want that tool.
# Contents: Makefile PATCHLEVEL README markov3.6 markov3.l
# Wrapped by rsalz@sulphur on Thu Nov 30 00:04:29 1995
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
echo If this archive is complete, you will see the following message:
echo ' "shar: End of archive."'
if test -f 'Makefile' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'Makefile'\"
else
echo shar: Extracting \"'Makefile'\" \(53 characters\)
sed "s/^X//" >'Makefile' <<'END_OF_FILE'
X
Xmarkov3: markov3.o
X $(CC) -o $@ $(CFLAGS) markov3.o
END_OF_FILE
if test 53 -ne `wc -c <'Makefile'`; then
echo shar: \"'Makefile'\" unpacked with wrong size!
fi
# end of 'Makefile'
fi
if test -f 'PATCHLEVEL' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'PATCHLEVEL'\"
else
echo shar: Extracting \"'PATCHLEVEL'\" \(2 characters\)
sed "s/^X//" >'PATCHLEVEL' <<'END_OF_FILE'
X1
END_OF_FILE
if test 2 -ne `wc -c <'PATCHLEVEL'`; then
echo shar: \"'PATCHLEVEL'\" unpacked with wrong size!
fi
# end of 'PATCHLEVEL'
fi
if test -f 'README' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'README'\"
else
echo shar: Extracting \"'README'\" \(2400 characters\)
sed "s/^X//" >'README' <<'END_OF_FILE'
XThis is a cleaned-up reposting of the markov3 program. The following
Xchanges have been made:
X
XThe null pointer dereferencing bugs have been fixed (I hope).
X
XThe code that uses "rand" should now be portable (the patches posted
Xto the net to fix this problem were wrong, they break the code on
Xsome machines in order to fix it on others. I stole some code from
X"hack" to do things right. If hack works for you, this should).
X
Xmarkov3 now understands "notes" cruft (thanks to Rich Salz).
X
XBecause of the 50% rule in news 2.11, people often use some other
Xcharacter than ">" for inclusions. markov3 assumes that lines
Xbeginning with any of
X
X > < ) | # } ]
X
Xare inclusions (without this rule, funny-looking output results if
Xanyone uses non-standard "quoting").
X
XThe random number generator is initialized using the time, if neither
Xthe -s flag nor the new -x flag is given.
X
XThis will be the last complete posting; a "patchlevel" file is included
Xand I will send out patches if there are further bugs or improvements.
X
XHere's the original README.
X---------------------------------------------------------------------------
XI created a bit of a stir with this program in December 1986 when I
Xused an earlier version of it to simulate a certain well-known net
Xpersonality (Hi Laura!). It digests Usenet articles and spits out
Xother articles with similar characteristics. You need lex to run it,
Xbut otherwise it should run on any Unix I know of.
X
XI had several requests for the program but didn't consider it
X"ready". It's as ready as it will ever be now.
X
XThe program uses getopt(3). There are several public-domain versions
Xavailable for Berkeley systems from the mod.sources archives. Since
Xit's small, I've included Henry Spencer's version, but you'll have
Xto change the Makefile to use it.
X
XFor best results, feed it at least ten articles by the same person
Xor on the same subject. If there are fewer articles the output
Xresembles the original too much; if there is too much variety in
Xthe articles the output is more incoherent than it otherwise is.
X
XThe program requires lots of memory if it is given lots of input;
Xthe small-model people will have problems.
X
XPlease don't post the output to the net (though I'd be happy to
Xsee some of the more interesting results).
X
XSend comments, suggestions for improvement, fan mail, and flames
Xto me: {sun,hplabs,ames,ihnp4}!oliveb!epimass!jbuck.
END_OF_FILE
if test 2400 -ne `wc -c <'README'`; then
echo shar: \"'README'\" unpacked with wrong size!
fi
# end of 'README'
fi
if test -f 'markov3.6' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'markov3.6'\"
else
echo shar: Extracting \"'markov3.6'\" \(2503 characters\)
sed "s/^X//" >'markov3.6' <<'END_OF_FILE'
X.\" markov3
X.\" @(#)markov3.6 1.1 3/6/87 epimass!jbuck
X.TH MARKOV3 6 "3/6/87"
X.UC 4
X.SH NAME
Xmarkov3 \- Digest and spit out quasi-random Usenet articles
X.SH SYNOPSIS
X.B markov3
X[
X.B \-pv
X] [
X.B \-n
X.I n_articles
X] [
X.B \-d
X.I dumpfile
X] [
X.B \-s
X.I seed
X] [
X.B \-x
X]
Xfiles
X.SH DESCRIPTION
X.PP
X.I Markov3
Xdigests Usenet articles and builds an internal data structure that
Xmodels the articles as if they came from a random process, where
Xeach word is determined by the previous two. It then emits a series
Xof articles on the standard output that have the same distribution
Xof words, word pairs, and word triplets as do the input files.
XThe name
X.I markov3
Xcomes from the fact that this structure is called a Markov chain,
Xand that the statistics for word triplets are modeled.
XHere, a "word" is a sequence of printable characters surrounded by
Xwhitespace. Paragraph breaks (blank lines) are also treated as a
X"word". Paragraphs of included text are treated as single "words"
Xand printed as "> ...".
X.PP
XBy default, the program expects to be fed Usenet articles; it strips
Xoff headers, included text, and signatures (or at least it tries).
XThe
X.B \-p
X(plain) option disables the header-stripping feature (otherwise
Xeverything is skipped until a blank line is encountered).
X.PP
XBy default, 10 articles, separated by form feeds, are written on the
Xstandard output. The
X.B \-n
Xoption lets you specify a different number.
X.PP
XThe
X.B \-x
Xoption does not seed the random number generator; this is useful
Xfor simulating people who repeat themselves.
X.PP
XThe
X.B \-d
X(dump) option dumps a representation of the internal data structure
Xbuilt by
X.I markov3
Xon the named file.
X.PP
XFinally, the
X.B \-v
X(verbose)
Xoption prints some statistics on the standard error.
X.SH "CAVEATS"
XThis program allocates lots of memory if given large amounts of input.
XOn virtual memory systems, the paging behavior is atrocious because
Xpointers tend to point every which way, and many pointers are dereferenced
Xfor every word processed. This could be improved, I'm sure.
X.PP
XPosting articles generated by
X.I markov3
Xto the net may be hazardous to your health.
X.PP
XNot as smart as Mark V. Shaney.
X.SH "PORTABILITY"
XAn effort has been made to make this program as portable as possible;
Xan earlier version was much less portable because of problems with
Xnull pointers and rand(3). Please let me know if you have further problems.
X.PP
XIf you don't have lex, you'll need to rewrite the lexical analyzer
Xbut most of the program is in C.
END_OF_FILE
if test 2503 -ne `wc -c <'markov3.6'`; then
echo shar: \"'markov3.6'\" unpacked with wrong size!
fi
# end of 'markov3.6'
fi
if test -f 'markov3.l' -a "${1}" != "-c" ; then
echo shar: Will not clobber existing file \"'markov3.l'\"
else
echo shar: Extracting \"'markov3.l'\" \(11822 characters\)
sed "s/^X//" >'markov3.l' <<'END_OF_FILE'
X%{
X/*
X * Copyright (c) 1986, 1987 by Joe Buck
X *
X * Permission is granted to use, redistribute, or modify this program,
X * as long as you don't pretend you wrote it. Send improvements or
X * bug reports to {ihnp4,hplabs,ames,sun}!oliveb!epimass!jbuck.
X *
X * The program generates simulated Usenet articles, given Usenet articles
X * as input.
X *
X * This program constructs a table of frequencies for a token appearing,
X * given the two preceding tokens. A "token" is a sequence of non-blank
X * characters. An entirely blank line is also treated as a token, as is
X * the beginning and end of an article.
X *
X * The program is designed to process news articles, rejecting text from
X * the header, signature, and included text, together with cruft inserted
X * by rn and notes. A paragraph of included text is treated like a token.
X *
X * After the table is built (and it can be big), articles are generated
X * on the standard output.
X */
X#ifndef lint
Xstatic char *sccs_id = "@(#)markov3.l 1.1 3/6/87 epimass!jbuck";
X#endif
X#include <sys/types.h> /* for time_t */
Xint in_included_text = 0;
X#ifdef yywrap
X#undef yywrap
X#endif
X%}
X%Start HDR BODY SIG
X%%
X<HDR>^[^ \t]+:.*\n ; /* Header line, e.g. "From: [email protected]" */
X<HDR>^[ \t]+[^ \t].*\n ; /* Continuation of header line */
X<HDR>^[ \t]*$ BEGIN BODY;
X<BODY>^"-- "$ BEGIN SIG;
X<BODY>^[><)|#}].*\n { /* 50% rule gets people to change ">"
X to other characters; this gets most of them */
X if (!in_included_text) {
X in_included_text = 1;
X process_token ("\n> ...\n\n");
X }
X }
X<BODY>"]".*\n { /* should have been included in the above. My
X lex generates bad C code if I say [[><...]
X even though ed(1) says that's a valid regular
X expression. */
X if (!in_included_text) {
X in_included_text = 1;
X process_token ("\n> ...\n\n");
X }
X }
X<BODY>^"In article".*\n ; /* Reject rn crud */
X<BODY>^"/* Written".*"*/"\n ; /* Also NOTES crud */
X<BODY>^"/* End of text from".*"*/"\n ; /* NOTES */
X<BODY>^"/* ----------".*"----------*/"\n ; /* NOTES */
X<BODY>[ \t]+ ; /* Skip white space */
X<BODY>\n[ \t\n]*\n { process_token ("\n"); /* Paragraph break */}
X<BODY>^\..* ; /* Ignore format directives. */
X<BODY>[^ \t\n]+ { in_included_text = 0; process_token (yytext); }
X<HDR>. ; /* Eat anything that escaped */
X<HDR>\n ;
X<BODY>\n ;
X<SIG>. ;
X<SIG>\n ;
X%%
Xextern int optind;
Xextern char *optarg;
X
X/*
X * hashtab is a hash table storing all the tokens we encounter.
X */
Xstruct htentry {
X char *htext;
X struct htentry *hlink;
X};
X
X#define HSIZE 3557 /* Should be prime */
X#define Fprintf (void)fprintf
X#define Printf (void)printf
X
Xstruct htentry hashtab[HSIZE];
X
X/*
X * node and succnode are portions of the big structure we're going to build.
X * node represents something like ("was", "a") in a binary tree.
X * a linked list of succnodes contain tokens that may follow ("was", "a")
X */
Xstruct node {
X char *text;
X char *text2;
X int ocount;
X struct node *lc, *rc;
X struct succnode *succ;
X};
X
Xstruct succnode {
X struct node *scnod;
X int count;
X struct succnode *link;
X};
X
X
Xstruct node *prev_code = NULL;
Xchar *prev_token = NULL, **Argv;
Xint init_state = HDR;
Xint verbose = 0;
Xstruct node *root = NULL, *tknptr;
Xstruct succnode *start = NULL;
Xint n_pairs = 0, n_tokens = 0, n_files = 0, n_total = 0;
X
Xstruct node *insert_token();
Xchar *savetoken();
X
Xprocess_token (txt)
Xchar *txt;
X{
X struct node *code;
X char *token = savetoken (txt);
X/* We have a new token. Say the previous two tokens were "one" "way"
X * and the current token is "to". Then prev_code points to a node
X * for ("one", "way") and token is "to". This function adds ("way", "to") as a
X * successor to ("one","way") and makes prev_code point to ("way","to").
X */
X code = insert_token (prev_token, token);
X insert_pair (prev_code, code);
X prev_code = code;
X prev_token = token;
X return;
X}
X
X/*
X * here it is, the main function.
X */
Xmain (argc, argv)
Xint argc;
Xchar **argv;
X{
X int i, c, n_articles = 10, sflag = 0;
X char *dumpfile = NULL;
X extern int optind;
X extern char *optarg;
X
X while ((c = getopt (argc, argv, "pxvn:d:s:")) != EOF) {
X switch (c) {
X case 'v':
X verbose = 1;
X break;
X case 'p': /* Input is plain text, not Usenet stuff */
X init_state = BODY;
X break;
X case 'n': /* # articles to generate */
X n_articles = atoi (optarg);
X break;
X case 'd': /* where to dump the data structure */
X dumpfile = optarg;
X break;
X case 's': /* Set the seed for rand; fall through */
X srand (atoi (optarg));
X case 'x': /* set flag to prevent srand */
X sflag++;
X break;
X default:
X Fprintf (stderr,
X "Usage: markov3 [-pvx] [-s seed] [-n n_art] [-d dump] files\n");
X exit (1);
X }
X }
X BEGIN init_state; /* initial state of lexical analyzer */
X if (!sflag) /* set random number generator */
X srand ((int)time ((time_t *)0));
X/* Note: if optind == argc, there are no file arguments. yyin is left
X * initialized to stdin.
X */
X if (optind < argc) {
X/* yyin is lex input stream. Point to first file. */
X if ((yyin = fopen (argv[optind], "r")) == NULL) {
X perror (argv[optind]);
X exit (1);
X }
X optind++; /* skip to next file */
X }
X Argv = argv; /* make it global so yywrap can access it */
X n_files = 1;
X/* yylex puts all the input files through the lexical analyzer and builds
X * the database.
X */
X (void) yylex ();
X if (dumpfile)
X dump_database (dumpfile);
X if (verbose)
X Fprintf (stderr,
X "Total of %d tokens (%d different), %d different pairs, %d files\n",
X n_total, n_tokens, n_pairs, n_files);
X/* Generate the articles, separated by form feeds */
X for (i = 0; i < n_articles; i++) {
X if (i > 0) output_word ("\n\f\n");
X generate_article ();
X }
X return 0;
X}
X
X/*
X * Lex calls this when EOF is reached. It opens the next file if there
X * is one. Lex interprets a return value of 1 to mean "all done" and 0
X * to mean "keep going".
X */
Xyywrap () {
X (void) fclose (yyin);
X insert_pair (prev_code, (struct node *)0);
X prev_code = NULL;
X if (Argv[optind] == NULL) return 1;
X else if ((yyin = fopen (Argv[optind], "r")) == NULL) {
X perror (Argv[optind]);
X exit (1);
X }
X optind++;
X in_included_text = 0;
X if (verbose && n_files % 10 == 0)
X Fprintf (stderr, "%d files\n", n_files);
X n_files++;
X BEGIN init_state;
X return 0;
X}
X
X/*
X * This function saves a token in the hash table (if it isn't there
X * already) and returns a pointer to the stored copy.
X */
Xchar *
Xsavetoken (txt)
Xchar *txt;
X{
X int h;
X char *p;
X struct htentry *hp;
X
X n_total++;
X for (p = txt, h = 0; *p; h += *p++);
X hp = hashtab + (h % HSIZE);
X while (hp->hlink) {
X if (strcmp (hp->htext, txt) == 0) {
X return hp->htext;
X }
X hp = hp->hlink;
X }
X/* OK, it's a new token. Make hp->hlink point to a new,
X * null block and make hp->htext point to the text.
X */
X hp->hlink = (struct htentry *) malloc (sizeof *hp);
X hp->htext = malloc ((unsigned)(strlen (txt) + 1));
X (void) strcpy (hp->htext, txt);
X hp->hlink->hlink = NULL;
X hp->hlink->htext = NULL;
X n_tokens++;
X return hp->htext;
X}
X
X/*
X * This recursive function inserts a token pair into the tree.
X */
Xstruct node *
Xinsert_in_tree (p, txt, txt2)
Xstruct node *p;
Xchar *txt, *txt2;
X{
X int cmp;
X if (p == NULL) {
X/* Create a new node. */
X p = (struct node *) malloc (sizeof *p);
X p->text = txt;
X p->text2 = txt2;
X p->lc = p->rc = NULL;
X p->succ = NULL;
X p->ocount = 1;
X tknptr = p;
X n_pairs++;
X if (verbose && n_pairs % 1000 == 0)
X Fprintf (stderr, "%d pairs\n", n_pairs);
X return p;
X }
X cmp = my_strcmp (p->text, txt);
X if (cmp == 0) cmp = my_strcmp (p->text2, txt2);
X if (cmp == 0) {
X/* It's a match. Increment the count. */
X tknptr = p;
X p->ocount += 1;
X }
X/* Look in the subtrees. */
X else if (cmp < 0) p->lc = insert_in_tree (p->lc, txt, txt2);
X else p->rc = insert_in_tree (p->rc, txt, txt2);
X return p;
X}
X
X/*
X * This just calls insert_in_tree starting at the root
X */
Xstruct node *
Xinsert_token (txt, txt2)
Xchar *txt,*txt2;
X{
X root = insert_in_tree (root, txt, txt2);
X return tknptr;
X}
X
X/*
X * This function adds a successor.
X */
Xstruct succnode *
Xinsert_in_succ_chain (sp, np)
Xstruct succnode *sp;
Xstruct node *np;
X{
X if (sp == NULL) {
X sp = (struct succnode *) malloc (sizeof *sp);
X sp->scnod = np;
X sp->count = 1;
X sp->link = NULL;
X }
X else if (sp->scnod == np)
X sp->count += 1;
X else sp->link = insert_in_succ_chain (sp->link, np);
X return sp;
X}
X
X/*
X * This calls insert_in_succ_chain starting at the right place.
X */
Xinsert_pair (p1, p2)
Xstruct node *p1, *p2;
X{
X if (p1) p1->succ = insert_in_succ_chain (p1->succ, p2);
X else start = insert_in_succ_chain (start, p2);
X}
X
X/*
X * This function dumps the stored data structure onto a file.
X * Now if only I had a function to read it back in.
X */
Xchar *
Xpr_token (txt)
Xchar *txt;
X{
X if (txt[0] != '\n')
X return txt;
X return txt[1] ? "<INCL>" : "<LF>";
X}
X
Xtreedump (tree, fp)
Xstruct node *tree;
XFILE *fp;
X{
X if (tree) {
X treedump (tree->rc, fp);
X Fprintf (fp, "( %s %s ) %d", pr_token (tree->text),
X pr_token (tree->text2), tree->ocount);
X chaindump (tree->succ, fp);
X treedump (tree->lc, fp);
X }
X}
X
X/*
X * Subroutine of treedump; it does one row.
X */
Xchaindump (p, fp)
Xstruct succnode *p;
XFILE *fp;
X{
X char *text;
X while (p) {
X if (p->scnod == NULL)
X text = "<EOF>";
X else text = pr_token (p->scnod->text2);
X Fprintf (fp, " %s %d", text, p->count);
X p = p->link;
X }
X putc ('\n', fp);
X}
X
X/*
X * This routine generates the dump file (-d option)
X */
Xdump_database (file)
Xchar *file;
X{
X FILE *fp = fopen (file, "w");
X if (fp == NULL) {
X Fprintf (stderr, "markov: can't open ");
X perror (file);
X exit (1);
X }
X Fprintf (fp, "START:");
X chaindump (start, fp);
X treedump (root, fp);
X}
X
X/* roll (n) generates a uniformly distributed rv between 0 and n-1.
X * This code is stolen from "hack" and should be portable. If you
X * change this, remember that different systems have rand functions
X * with different ranges, and the bottom bits are often no good.
X */
X#define roll(n) ((rand() >> 3) % n)
X
X/*
X * This function generates an article by traversing the
X * structure we've built.
X */
Xgenerate_article () {
X struct succnode *p = start;
X int ncounts = n_files;
X int n, accum;
X char *tp;
X
X while (1) {
X/* Roll the dice to find out the next token. The code below selects the
X * next token, and the new state, with a probability corresponding to the
X * frequency in the input.
X */
X n = roll (ncounts);
X accum = p->count;
X while (accum <= n && p->link) {
X p = p->link;
X accum += p->count;
X }
X if (p->scnod == NULL)
X break;
X tp = p->scnod->text2;
X/* Check for "end of story" */
X if (tp == NULL)
X break;
X output_word (tp);
X ncounts = p->scnod->ocount;
X p = p->scnod->succ;
X }
X output_word ("\n"); /* This will flush the buffer as well. */
X return;
X}
X
X/*
X * This version handles null strings *
X */
Xmy_strcmp (a, b)
Xregister char *a, *b;
X{
X if (a == NULL) return b ? -1 : 0;
X if (b == NULL) return 1;
X return strcmp (a, b);
X}
X
X#define LEN 75
Xoutput_word (word)
Xchar *word;
X{
X static char line[LEN+1];
X static int room = LEN;
X int l;
X
X if (word == NULL) return;
X l = strlen (word);
X/* If word won't fit, or starts with \n, dump the current line */
X if ((l >= room || word[0] == '\n') && line[0]) {
X Printf ("%s\n", line);
X line[0] = 0;
X room = LEN;
X }
X/* If word won't fit in the buffer or starts with \n, print it now */
X if (l >= LEN)
X Printf ("%s\n", word);
X else if (word[0] == '\n')
X Printf ("%s", word);
X/* Otherwise fill it in */
X else {
X (void)strcat (line, word);
X (void)strcat (line, " ");
X room -= (l + 1);
X }
X return;
X}
END_OF_FILE
if test 11822 -ne `wc -c <'markov3.l'`; then
echo shar: \"'markov3.l'\" unpacked with wrong size!
fi
# end of 'markov3.l'
fi
echo shar: End of archive.
exit 0