Tools
I primarily work in Haskell, a statically typed, pure, functional programming language, with the occasional foray into imperative programming in Rust when computational speed is needed.
The tools provided here are of general use for research in the social sciences. Please contact me should you think of any features that would be helpful in your work.
yenta
A fast multi-core fuzzy name matcher for CSV files.
yenta lets users dynamically combine numerous text-matching algorithms and output multiple candidate matches per record. Written in Rust,
yenta emphasizes throughput and processes roughly one million fuzzy matches per minute on a 2019 8-core desktop. Documentation is available on the
GitHub wiki.
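To illustrate the general idea behind combining text-matching algorithms, here is a minimal Rust sketch that blends a normalized Levenshtein similarity with token-set Jaccard similarity into one weighted score and ranks candidates. The measures, weights, and names below are hypothetical illustrations of the technique; yenta's actual scoring and interface are documented on its GitHub wiki.

```rust
use std::collections::HashSet;

// Classic dynamic-programming Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

// Edit distance rescaled to a similarity in [0, 1].
fn lev_sim(a: &str, b: &str) -> f64 {
    let m = a.chars().count().max(b.chars().count());
    if m == 0 { 1.0 } else { 1.0 - levenshtein(a, b) as f64 / m as f64 }
}

// Jaccard similarity over whitespace-separated tokens,
// which forgives word reordering ("smith john" vs "john smith").
fn token_jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    let union = sa.union(&sb).count();
    if union == 0 { 1.0 } else { sa.intersection(&sb).count() as f64 / union as f64 }
}

// Weighted combination of the two measures (weights are arbitrary here).
fn combined(a: &str, b: &str) -> f64 {
    0.6 * lev_sim(a, b) + 0.4 * token_jaccard(a, b)
}

fn main() {
    let query = "jon smith";
    let candidates = ["john smith", "jane smyth", "smith john"];
    let mut scored: Vec<(&str, f64)> =
        candidates.iter().map(|c| (*c, combined(query, c))).collect();
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    for (name, s) in &scored {
        println!("{name}\t{s:.3}");
    }
}
```

Outputting several ranked candidates rather than one best match, as yenta does, lets the researcher audit borderline cases by hand.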
edgar
A command-line utility to locally index and download filings
from the SEC EDGAR database.
edgar is multithreaded, allowing
non-blocking simultaneous downloads. Form downloading is query-based, letting
researchers request specific CIKs, company names, form types, and date ranges (or combinations thereof).
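As a rough sketch of what non-blocking simultaneous downloads look like, the Rust snippet below runs a pool of worker threads over a shared work queue, holding the lock only long enough to pop an item. The `fetch` function is a stand-in for a real HTTP request; edgar's actual query interface and the SEC endpoints it hits are not shown here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for an HTTP download of one filing (hypothetical).
fn fetch(filing_id: &str) -> String {
    format!("contents of {filing_id}")
}

fn download_all(ids: Vec<String>, workers: usize) -> Vec<(String, String)> {
    let queue = Arc::new(Mutex::new(ids));
    let results = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || loop {
            // Pop one work item; the lock is released before "downloading",
            // so the other workers proceed concurrently.
            let id = match queue.lock().unwrap().pop() {
                Some(id) => id,
                None => break,
            };
            let body = fetch(&id);
            results.lock().unwrap().push((id, body));
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // All worker clones are dropped after join, so unwrapping is safe.
    Arc::try_unwrap(results).unwrap().into_inner().unwrap()
}

fn main() {
    let ids: Vec<String> = (1..=8).map(|i| format!("filing-{i}")).collect();
    let out = download_all(ids, 4);
    println!("downloaded {} filings", out.len());
}
```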
vandelay
Export empirical specification results to LaTeX. Designed to be
quick, easy, and powerful. Never spend days formatting tables again
(in principle). Full documentation is forthcoming.
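To give a flavor of the chore vandelay automates, here is a hypothetical Rust sketch that renders coefficient estimates and standard errors into a LaTeX tabular. The row layout and function names are illustrative only, not vandelay's interface.

```rust
// Render (variable, estimate, std. error) rows as a two-column LaTeX table,
// with standard errors in parentheses beneath each estimate.
fn latex_table(rows: &[(&str, f64, f64)]) -> String {
    let mut out = String::from("\\begin{tabular}{lc}\n\\hline\n");
    out.push_str("Variable & Estimate \\\\\n\\hline\n");
    for (name, beta, se) in rows {
        out.push_str(&format!("{name} & {beta:.3} \\\\\n"));
        out.push_str(&format!(" & ({se:.3}) \\\\\n"));
    }
    out.push_str("\\hline\n\\end{tabular}\n");
    out
}

fn main() {
    // Hypothetical regression output.
    let rows = [("log(size)", 0.124, 0.031), ("leverage", -0.552, 0.210)];
    print!("{}", latex_table(&rows));
}
```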
gramme
An NLP tool for grammatical extraction of data from text. GrammE
implements an embedded domain-specific language that lets researchers
quickly and flexibly implement exhaustive sub-graph searches
over NLP dependency data. Please email me for details and access.
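The general idea of a sub-graph search over dependency parses can be sketched as follows: represent the parse as labeled head-dependent edges and exhaustively enumerate edge combinations matching a pattern, here subject-verb-object triples. This is only an illustration of the underlying technique; GrammE itself exposes an embedded DSL rather than this hand-rolled query.

```rust
// One labeled edge of a dependency parse.
#[derive(Clone, Debug, PartialEq)]
struct Edge<'a> {
    head: &'a str,  // governing token
    label: &'a str, // dependency relation
    dep: &'a str,   // dependent token
}

// Find every (subject, verb, object) triple: a pair of "nsubj" and
// "dobj" edges sharing the same head token.
fn svo_triples<'a>(edges: &[Edge<'a>]) -> Vec<(&'a str, &'a str, &'a str)> {
    let mut out = Vec::new();
    for s in edges.iter().filter(|e| e.label == "nsubj") {
        for o in edges.iter().filter(|e| e.label == "dobj") {
            if s.head == o.head {
                out.push((s.dep, s.head, o.dep));
            }
        }
    }
    out
}

fn main() {
    // "The board approved the merger" (simplified parse).
    let edges = vec![
        Edge { head: "approved", label: "nsubj", dep: "board" },
        Edge { head: "approved", label: "dobj", dep: "merger" },
        Edge { head: "board", label: "det", dep: "The" },
    ];
    for (s, v, o) in svo_triples(&edges) {
        println!("{s} -{v}-> {o}");
    }
}
```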