The first step in creating a docset is to create the directory layout, add a property list information file (Info.plist), and create a sqlite3 database.
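As a rough illustration of that first step, here is a minimal sketch that follows the layout Dash documents for docsets. The bundle identifier, display name, and platform family below are placeholder assumptions, not values taken from the original setup.

```shell
#!/bin/sh
# Sketch: skeleton of a Dash docset.
# The identifier/name strings below are assumptions; pick your own.
DOCSET=nist.docset

# Directory layout: the HTML pages will live under Documents.
mkdir -p "$DOCSET/Contents/Resources/Documents"

# Property list describing the docset to Dash.
cat > "$DOCSET/Contents/Info.plist" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleIdentifier</key>
    <string>nist</string>
    <key>CFBundleName</key>
    <string>NIST DADS</string>
    <key>DocSetPlatformFamily</key>
    <string>nist</string>
    <key>isDashDocset</key>
    <true/>
</dict>
</plist>
EOF

# Empty search index with the table and unique index that Dash expects.
sqlite3 "$DOCSET/Contents/Resources/docSet.dsidx" <<'EOF'
CREATE TABLE IF NOT EXISTS searchIndex(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT);
CREATE UNIQUE INDEX IF NOT EXISTS anchor ON searchIndex (name, type, path);
EOF
```

The unique index means the same (name, type, path) triple cannot be inserted twice, so re-running an indexing script does not create duplicate entries.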
The next step is to crawl the NIST webpage. This is fast for this site, but your mileage may vary depending on the sites you want to scrape. I couldn’t find a zip file of the HTML to download instead.
Note that I use the -k flag (--convert-links) for wget to rewrite the HTML links so that they point at the local copies.
Finally, we need to extract the title of each downloaded HTML page to get the name of the entry, and add it to the sqlite3 index. I use pup to parse the HTML. It is a very nice tool that gives you a jQuery-style command-line interface.
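For reference, a crawl along these lines might look like the sketch below. Only the -k flag comes from the text above; the other flags, the function name `crawl_site`, and the URL in `DADS_URL` are assumptions added for illustration, so check the address before running it.

```shell
#!/bin/sh
# Sketch: mirror a site into the current directory for offline browsing.
# DADS_URL is an assumed address, not taken from the original post.
DADS_URL="https://xlinux.nist.gov/dads/"

crawl_site() {
    # -r   recurse through linked pages
    # -np  never ascend to the parent directory
    # -p   also fetch page requisites (images, stylesheets)
    # -k   rewrite links in the downloaded HTML to point at the local copies
    # -nH  do not create a top-level directory named after the host
    wget -r -np -p -k -nH "$1"
}

# Usage, from inside nist.docset/Contents/Resources/Documents:
#   crawl_site "$DADS_URL"
```

The link rewriting done by -k happens after the download finishes, which is why interrupting the crawl can leave pages with links that still point at the live site.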
This is the index.sh script I use, which needs to be run from inside nist.docset/Contents/Resources/Documents.
Here is the loop I use to construct all the entries:
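A minimal sketch of such a loop could look like the following. It is an illustration rather than the original index.sh: the helper names `sql_quote` and `index_pages`, the use of `find`, and the `Entry` type are assumptions, and it expects the `searchIndex` table to exist already in `../docSet.dsidx`.

```shell
#!/bin/sh
# Sketch of an indexing loop; run from nist.docset/Contents/Resources/Documents.
DB=../docSet.dsidx   # index location relative to Documents

# Double any single quotes so a value can sit inside an SQL string literal.
sql_quote() {
    printf '%s' "$1" | sed "s/'/''/g"
}

index_pages() {
    find . -name '*.html' | while IFS= read -r file; do
        # Paths in the index are relative to the Documents directory.
        path=${file#./}
        # Text of the <title> element; keep the first non-blank line, trimmed.
        title=$(pup 'title text{}' < "$file" |
            sed -n '/[^[:space:]]/{s/^[[:space:]]*//;s/[[:space:]]*$//;p;q;}')
        # Skip pages without a usable title.
        [ -n "$title" ] || continue
        # "Entry" as the entry type is an assumption.
        sqlite3 "$DB" "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES ('$(sql_quote "$title")', 'Entry', '$(sql_quote "$path")');"
    done
}

# Usage:
#   index_pages
```

Quoting matters here because titles such as "Dijkstra's algorithm" contain an apostrophe, which would otherwise end the SQL string early.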
Once the database has been created, we need to add a nice icon to differentiate the docset from other docsets. I use a small Mathematica script to create a multi-resolution TIFF icon.
Just run make-icon.m icon.tiff from inside nist.docset.
There you go: a nice docset for the NIST dictionary of data structures, ready to be imported into Dash.