The current UNIX® text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the 'shape' of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell pipelines.
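The composition described above can be sketched in Python. This is a minimal illustration, assuming each operation maps a stream of text pieces to a new stream of pieces. The one-letter names x, y, g and v follow the corresponding commands of the sam editor, but these functions are hypothetical and are not sam's implementation.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical sketch: a command turns a stream of text pieces into a new
# stream, so commands compose like the stages of a shell pipeline.
Command = Callable[[Iterable[str]], Iterator[str]]


def x(pattern: str) -> Command:
    """Replace each piece by every match of `pattern` inside it."""
    rx = re.compile(pattern)

    def run(pieces: Iterable[str]) -> Iterator[str]:
        for piece in pieces:
            for m in rx.finditer(piece):
                yield m.group(0)
    return run


def y(pattern: str) -> Command:
    """Replace each piece by the text between matches of `pattern`."""
    rx = re.compile(pattern)

    def run(pieces: Iterable[str]) -> Iterator[str]:
        for piece in pieces:
            start = 0
            for m in rx.finditer(piece):
                yield piece[start:m.start()]
                start = m.end()
            yield piece[start:]
    return run


def g(pattern: str) -> Command:
    """Keep only pieces that contain a match of `pattern`."""
    rx = re.compile(pattern)
    return lambda pieces: (p for p in pieces if rx.search(p))


def v(pattern: str) -> Command:
    """Keep only pieces that do NOT contain a match of `pattern`."""
    rx = re.compile(pattern)
    return lambda pieces: (p for p in pieces if not rx.search(p))


def pipeline(*commands: Command) -> Command:
    """Chain commands left to right, like `a | b | c` in the shell."""
    def run(pieces: Iterable[str]) -> Iterator[str]:
        for cmd in commands:
            pieces = cmd(pieces)
        return iter(pieces)
    return run


# Records span several lines and are separated by blank lines.
text = "name: ada\nlang: en\n\nname: bob\nlang: fr\n\nname: cy\nlang: en\n"
# Split into records, keep the English ones, extract the names.
names = pipeline(y(r"\n\n"), g(r"lang: en"), x(r"(?<=name: )\w+"))
print(list(names([text])))  # ['ada', 'cy']
```

The record boundary here is a blank line rather than a newline, which is exactly the kind of structure a line-at-a-time view does not express directly.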
Rudel is a collaborative editing environment for GNU Emacs. It supports multiple backends to enable communication with other collaborative editors using different protocols (most notably Gobby).
Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads structured records in a variety of input formats (e.g. email, XML, MARC) and allows access to them through exact boolean search expressions and relevance-ranked free-text queries.
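The difference between the two query styles can be illustrated with a toy inverted index. This is a minimal Python sketch, not Zebra's implementation or API: the class and method names are hypothetical, and the ranking is a naive term-frequency count rather than whatever weighting Zebra actually applies.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Toy index mapping each term to the set of record ids containing it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[int]] = defaultdict(set)
        self.records: dict[int, list[str]] = {}

    def add(self, record_id: int, text: str) -> None:
        tokens = tokenize(text)
        self.records[record_id] = tokens
        for term in tokens:
            self.postings[term].add(record_id)

    def _ids(self, term: str) -> set[int]:
        # .get avoids inserting empty entries into the defaultdict
        return self.postings.get(term.lower(), set())

    # Exact boolean search: the answer is an unordered set of matches.
    def all_of(self, *terms: str) -> set[int]:
        sets = [self._ids(t) for t in terms]
        return set.intersection(*sets) if sets else set()

    def any_of(self, *terms: str) -> set[int]:
        return set().union(*(self._ids(t) for t in terms))

    def without(self, ids: set[int], term: str) -> set[int]:
        return ids - self._ids(term)

    # Free-text query: any record sharing a term is a hit, best first
    # (naive score: total occurrences of the query terms; ties by id).
    def ranked(self, query: str) -> list[int]:
        terms = tokenize(query)
        scores = {rid: sum(toks.count(t) for t in terms)
                  for rid, toks in self.records.items()}
        hits = [rid for rid, score in scores.items() if score > 0]
        return sorted(hits, key=lambda rid: (-scores[rid], rid))


idx = InvertedIndex()
idx.add(1, "email about XML records")
idx.add(2, "MARC records and more records")
idx.add(3, "plain email")
print(idx.all_of("email", "xml"))  # {1}
print(idx.ranked("records"))       # [2, 1]
```

A boolean query either matches a record or it does not, while the ranked query orders every partial match, so record 2 comes first simply because it mentions "records" twice.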
Pyndexter (pronounced 'poindexter') is an abstraction layer for full-text indexing engines. It presents a uniform query syntax to the user, includes a basic but functional pure-Python indexer, and has adapters for Hype, Hyperestraier, Lucene, Lupy, Pyndex, Swish-e and Xapian.
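The adapter arrangement can be sketched as one query front end over interchangeable backends. Everything here is hypothetical and is not Pyndexter's actual API: the `Backend` interface, the `Indexer` facade and the query syntax (bare words are required, `-word` excludes) are assumptions. The toy `MemoryBackend` stands in for a pure-Python indexer; an adapter for an engine such as Xapian or Lucene would be another `Backend` subclass.

```python
import re
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical adapter interface; one subclass per indexing engine."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None: ...

    @abstractmethod
    def search(self, required: list[str], excluded: list[str]) -> set[str]: ...


class MemoryBackend(Backend):
    """Toy pure-Python backend: a set of word tokens per document."""

    def __init__(self) -> None:
        self.docs: dict[str, set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = set(re.findall(r"\w+", text.lower()))

    def search(self, required: list[str], excluded: list[str]) -> set[str]:
        return {d for d, toks in self.docs.items()
                if all(t in toks for t in required)
                and not any(t in toks for t in excluded)}


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Hypothetical uniform syntax: bare words required, '-word' excluded."""
    required: list[str] = []
    excluded: list[str] = []
    for word in query.lower().split():
        if word.startswith("-") and len(word) > 1:
            excluded.append(word[1:])
        else:
            required.append(word)
    return required, excluded


class Indexer:
    """Facade: callers use one query syntax whatever the backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def add(self, doc_id: str, text: str) -> None:
        self.backend.add(doc_id, text)

    def search(self, query: str) -> set[str]:
        return self.backend.search(*parse_query(query))


ix = Indexer(MemoryBackend())
ix.add("a", "Xapian and Lucene adapters")
ix.add("b", "Lucene only")
print(ix.search("lucene -xapian"))  # {'b'}
```

Because the query is parsed once in the facade, switching engines only means passing a different `Backend` to `Indexer`; the calling code and its queries stay the same.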
Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.
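To show what a stemmer does, here is a deliberately crude suffix-stripping sketch in Python. It is not written in Snowball and is far simpler than the Snowball stemmers; for instance, it ignores the R1/R2 region conditions that the Snowball English stemmer applies. The suffix list and the `min_stem` parameter are assumptions chosen for illustration.

```python
# Toy suffix list (an assumption), tried longest first so that
# "ions" wins over "s" on a word like "connections".
SUFFIXES = sorted(
    ["ings", "ing", "ions", "ion", "edly", "ed", "ness", "ly", "es", "s"],
    key=len,
    reverse=True,
)


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix, keeping at least `min_stem` letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


forms = ["connect", "connected", "connecting", "connection", "connections"]
print({stem(f) for f in forms})  # {'connect'}
```

Conflating word forms this way lets a query for "connection" also match a document containing "connecting", which is the Information Retrieval use the stemmers are designed for.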