Manuel Odendahl



I came across the NIST Dictionary of Algorithms and Data Structures today, and decided to create a Dash docset for it. The steps for creating a docset are very simple, and I hope this blog post makes it easier for people to bundle standard webpages.

The first step in creating a docset is to create the directory layout, add a property list file (Info.plist), and create a sqlite3 database.

$ mkdir -p nist.docset/Contents/Resources/Documents/

$ cat > nist.docset/Contents/Info.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "">
<plist version="1.0">
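For reference, a complete Info.plist can be written with a heredoc. The sketch below follows the keys listed in Dash's docset generation guide; the identifier, name, and platform family values (nist, NIST) are my assumptions for illustration, not necessarily the ones used for this docset.

```shell
# Create the docset layout, then write a minimal Info.plist.
mkdir -p nist.docset/Contents/Resources/Documents/

# NOTE: the values "nist" and "NIST" below are assumed for illustration.
cat > nist.docset/Contents/Info.plist <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleIdentifier</key>
    <string>nist</string>
    <key>CFBundleName</key>
    <string>NIST</string>
    <key>DocSetPlatformFamily</key>
    <string>nist</string>
    <key>isDashDocset</key>
    <true/>
</dict>
</plist>
EOF
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the XML.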

$ sqlite3 nist.docset/Contents/Resources/docSet.dsidx 'CREATE TABLE searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT);'
$ sqlite3 nist.docset/Contents/Resources/docSet.dsidx 'CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path);'
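To see what the unique index buys you, here is a small sketch that builds the same schema in a throwaway database and inserts one entry twice. The entry name and path (heap, HTML/heap.html) are made up for the example; the point is that INSERT OR IGNORE together with the unique index makes re-running the indexing step harmless.

```shell
# Throwaway database so the real docset index is left untouched.
DB="$(mktemp -d)/docSet.dsidx"

sqlite3 "$DB" 'CREATE TABLE searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT);'
sqlite3 "$DB" 'CREATE UNIQUE INDEX anchor ON searchIndex (name, type, path);'

# Insert the same (hypothetical) entry twice; the second insert is ignored
# because it would violate the unique index.
for n in 1 2; do
    sqlite3 "$DB" "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES ('heap', 'Entry', 'HTML/heap.html');"
done

ROWS=$(sqlite3 "$DB" 'SELECT COUNT(*) FROM searchIndex;')
echo "rows: $ROWS"
```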


The next step is to crawl the NIST webpage. I couldn’t find a zip file of the HTML, so I mirror the site with wget. The crawl is fast for this site, but your mileage may vary depending on the site you want to scrape.

Note the -k flag, which tells wget to rewrite the links in the downloaded HTML so that they work locally.

$ wget -r -k -c -np
$ # move the HTML files into the docset
$ mv* nist.docset/Contents/Resources/Documents/
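If the short flags are hard to remember, the same wget invocation can be spelled with long options. This is only a sketch: crawl_site is a helper name I made up, and SITE_URL is a placeholder that you would replace with the site you want to mirror.

```shell
# Placeholder URL, assumed for illustration; substitute the real site.
SITE_URL="${SITE_URL:-https://example.org/docs/}"

# Long-option equivalent of: wget -r -k -c -np URL
crawl_site () {
    # --recursive      follow links and download the linked pages (-r)
    # --convert-links  rewrite links so the local copy is browsable (-k)
    # --continue       resume partially downloaded files (-c)
    # --no-parent      never ascend above the starting directory (-np)
    wget --recursive --convert-links --continue --no-parent "$1"
}

# Uncomment to actually run the crawl:
# crawl_site "$SITE_URL"
```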

Finally, we need to extract the title of each downloaded HTML page, use it as the name of the entry, and add it to the sqlite3 index. I use pup to parse the HTML. It is a very nice tool that gives you a jQuery-style command-line interface.

This is the script I use, which needs to be run from inside nist.docset/Contents/Resources/Documents.


#!/bin/sh

# Index one HTML file, using its <title> as the entry name.
index () {
    FILE="$1"
    # Double single quotes so the title is a valid SQL string literal.
    NAME=$(pup 'title text{}' < "$FILE" | sed -e "s/'/''/g")

    sqlite3 ../docSet.dsidx "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES ('$NAME', 'Entry', '$FILE');" ||
        echo sqlite3 ../docSet.dsidx "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES ('$NAME', 'Entry', '$FILE');"
}

index "$1"
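The sed step in the script exists because page titles can contain apostrophes, which would otherwise end the SQL string early. Here is the escaping on its own, as a sketch with a made-up helper name (escape_sql) and a sample title. I write the pattern without backslashes, since GNU sed treats \' as an end-of-input anchor rather than a literal quote.

```shell
# Double every single quote, which is how SQL escapes a quote
# inside a single-quoted string literal.
escape_sql () {
    sed -e "s/'/''/g"
}

# Sample title, chosen for illustration.
ESCAPED=$(printf '%s\n' "Dijkstra's algorithm" | escape_sql)
echo "$ESCAPED"
# prints: Dijkstra''s algorithm
```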

Here is the loop I use to construct all the entries:

$ for i in HTML/*; do ../ "$i"; done

Once the database has been created, we need to add a nice icon to differentiate the docset from other docsets. I use a small Mathematica script to create a multi-resolution TIFF icon.

#!/usr/local/bin/MathematicaScript -script

filename = First@Rest[$ScriptCommandLine];

blackRect =
 Rasterize[Rectangle[], Background -> Black, ImageSize -> {64, 64}];

nist = Import[

smallNist = ImageResize[nist, 64];
smallN = ImageResize[ImageTake[nist, {70, 200}, {15, 190}], 64];

logo = ImageCompose[blackRect, smallN];

Export[filename, {ImageResize[logo, 16, Padding -> Black],
  ImageResize[logo, 32, Padding -> Black]}];

Just run make-icon.m icon.tiff from inside nist.docset.

There you go: a nice docset for the NIST Dictionary of Algorithms and Data Structures, ready to be imported into Dash.