Summary (AI generated)

This article explores porting Rob Pike’s small, elegant regular expression (regex) matcher from C to Go, inspired by Brian Kernighan’s analysis of Pike’s roughly 30-line implementation. The original C code is recursive and handles only a small core of regex syntax (literal characters, ., ^, $, and *). It is efficient for basic regex tasks but lacks Unicode support and the optimizations found in tools like GNU Grep.

The author translates this into a ~50-line Go version while preserving the recursive structure, achieving comparable performance to the C counterpart (2.18s vs 2.24s on a benchmark). Surprisingly, Go’s standard regexp package—though more feature-rich—is slower (1.17s) due to its non-recursive approach and handling of Unicode/edge cases. GNU Grep remains faster (~0.67s), leveraging optimized techniques like bitmaps rather than backtracking.
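The recursive structure described above can be sketched in Go as follows. This is a minimal sketch that follows the shape of Pike's published C routines (match, matchhere, matchstar); the Go names Match, matchHere, and matchStar are assumptions made for illustration and may not match the article's own code. It works on bytes, not Unicode code points.

```go
package main

// Match reports whether regexp matches anywhere in text.
// Supported syntax: literal bytes, '.', '^', '$', and '*'.
func Match(regexp, text string) bool {
	if regexp != "" && regexp[0] == '^' {
		return matchHere(regexp[1:], text)
	}
	for {
		// Must try even when text is empty (e.g. an empty regexp).
		if matchHere(regexp, text) {
			return true
		}
		if text == "" {
			return false
		}
		text = text[1:]
	}
}

// matchHere reports whether regexp matches at the beginning of text.
func matchHere(regexp, text string) bool {
	switch {
	case regexp == "":
		return true
	case regexp == "$":
		return text == ""
	case len(regexp) >= 2 && regexp[1] == '*':
		return matchStar(regexp[0], regexp[2:], text)
	case text != "" && (regexp[0] == '.' || regexp[0] == text[0]):
		return matchHere(regexp[1:], text[1:])
	}
	return false
}

// matchStar reports whether c*regexp matches at the beginning of text.
// It tries zero occurrences of c first, then one more each iteration.
func matchStar(c byte, regexp, text string) bool {
	for {
		if matchHere(regexp, text) {
			return true
		}
		if text == "" || (c != '.' && text[0] != c) {
			return false
		}
		text = text[1:]
	}
}
```

For example, Match("Ben.*H", "Ben Hoyt") and Match("^a*b$", "aaab") both return true, while Match("^a*b$", "aaabc") returns false. Because the sketch compares single bytes, "." consumes one byte, so Match("^.$", "\u00e9") is false: the two-byte UTF-8 encoding of é is not treated as one character.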

A bonus section introduces a 28-line Go glob matcher for the * and ? wildcards, built from a similar mix of iteration and recursion. The article presents Pike’s code as instructive for its simplicity and efficiency but notes its limitations (e.g., poor handling of Unicode). It also touches on alternatives such as Russ Cox’s “Thompson NFA” method, which avoids the worst-case slowdowns that backtracking matchers can hit on pathological patterns.
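A glob matcher for * and ? can be written in the same spirit. The version below is an illustrative sketch, not the article's 28-line implementation: the name Glob is hypothetical, matching is byte-based, and the whole name must match the pattern. It loops over ordinary pattern bytes and recurses only when it meets a *.

```go
package main

// Glob reports whether the whole of name matches pattern, where '*'
// matches any run of bytes (including none) and '?' matches one byte.
func Glob(pattern, name string) bool {
	for pattern != "" {
		switch pattern[0] {
		case '*':
			// Try the rest of the pattern against every suffix of name.
			rest := pattern[1:]
			for {
				if Glob(rest, name) {
					return true
				}
				if name == "" {
					return false
				}
				name = name[1:]
			}
		case '?':
			// '?' needs exactly one byte to consume.
			if name == "" {
				return false
			}
		default:
			// A literal byte must match exactly.
			if name == "" || name[0] != pattern[0] {
				return false
			}
		}
		pattern, name = pattern[1:], name[1:]
	}
	return name == ""
}
```

For instance, Glob("*.go", "main.go") and Glob("a?c", "abc") return true, while Glob("a?c", "ac") returns false. Like the regex sketch, this backtracks, so patterns with many * wildcards can be slow on long inputs that almost match.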

Key takeaways:

  • Simple regex engines can be efficient with clever algorithms.

  • Trade-offs between brevity, performance, and feature coverage exist.

  • The port underscores Go’s ability to mirror C-like efficiency in certain scenarios.

The piece serves as both a technical deep dive and an homage to concise, effective programming—ideal for learning regex fundamentals while acknowledging real-world constraints.