tree.c

tree

This is a memory-efficient dictionary tree with support for wildcards. Source is 11K, header is 2.5K (mostly documentation), which compiles down to about 4.5K of object code. Included in the tarball is a testing program (check the source to see what it does). This version supports substrings and removes the implicit wildcard at the end of each string.

In terms of memory usage, it is very easy to have it actually use less memory than just a flat file:
1409379ASCII flat file
1032633s[1] without http:// with bytecount
1032652s[1] with http:// with bytecount
866443s[0] with http:// with bytecount
960120s[0] with http:// with TNSIZE()
The TNSIZE() reading is much more accurate than the others, but still does not take into account malloc() overhead, so I don't know the true figure. Such a figure would also vary greatly (including possibly lower) if the program using the tree also used malloc().

Speed appears to be quite good. I tested it by having it match its original input data file, once per line, and it took 4.32 usec per match on average. There is a read-write lock on each tree (currently just emulated with an int) so that many readers can traverse the tree, but allow for a writer to change it while new readers wait. The #defines in tree.c should be replaced with the appropriate calls to your thread library's read-write lock functions.

There is now support for storing an arbitrary non-structure data type along with each string, which is returned by tree_match (0 is returned on no match, so you are not allowed to add anything with a value of 0).

ftree

In addition to all the stuff above, there is a version of tree which uses an mmap'd file instead of malloc(), which allows for persistent storage and multi-process update. Synchronization is handled via flock(). Differences of ftree from tree:

The dumpftree utility will dump a node-by-node listing of an ftree, so that you can get an idea as to how the file is layed out, and how segmented it is. I may make an optimizeftree utility at some point, if there's enough demand for it.

To Do


Questions? Concerns? Bugfixes? Please submit feedback to mike@pedantic.org.