Monday, April 4, 2011

Automatically Generate Regular Expressions (regex)

I had a thought about two weeks ago that I really wanted a web app that would generate regular expressions for me. I was lazy and had spent some time hacking expressions together for a script I was working on.

I wrote some notes and posted them to a gaggle of programmers and engineers I email regularly. Something like:

A programmer normally Googles for examples of their problem or similar problems online, copies the code, and adapts it to their needs. There are websites dedicated to discussing programming problems, and websites dedicated to listing solutions to common problems. The objective of this system is to write a new program for the user from scratch that exactly addresses their needs.
A user defines some sample input and some sample output. The system examines the data and generates a program for the user to perform the transform. This program can be downloaded, executed in place, or used as a service. The more information the user provides, the better the robot programmer can do its job. The robot programmer may not get it right the first time, which gives the user an opportunity to work with the robot to better define the problem and to tinker with candidate programs that may be used as the basis for the robot's second and subsequent tries.
I abstracted the problem and proposed a generic "robot programmer" or "robot coder" that would compute arbitrary transformations based on sample input and output. I pitched both the general case and the specific regex case and got a variety of responses and some good technical discussions.
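
To make the regex case concrete, here is a minimal sketch of one way to generalise a few sample strings into a pattern. It is only an illustration, not the solver behind the prototype: the function names (`char_class`, `signature`, `infer_regex`) are made up for this example, and it assumes ASCII input and that every sample has the same sequence of character classes (digits, letters, whitespace, literal punctuation).

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a regex fragment (assumes ASCII input)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # punctuation is kept as a literal


def signature(text):
    """Collapse a string into runs of (character class, run length)."""
    return [(cls, len(list(run))) for cls, run in groupby(text, key=char_class)]


def infer_regex(examples):
    """Return a pattern matching all examples, or None if their shapes differ."""
    if not examples:
        return None
    sigs = [signature(e) for e in examples]
    shape = [cls for cls, _ in sigs[0]]
    if any([cls for cls, _ in s] != shape for s in sigs):
        return None  # this naive approach gives up on mixed shapes
    parts = []
    for i, cls in enumerate(shape):
        lengths = {s[i][1] for s in sigs}
        if lengths == {1}:
            parts.append(cls)
        elif len(lengths) == 1:
            parts.append(f"{cls}{{{lengths.pop()}}}")  # fixed-width run
        else:
            parts.append(cls + "+")  # run length varies across samples
    return "^" + "".join(parts) + "$"


pattern = infer_regex(["2011-04-04", "1999-12-31"])
print(pattern)
print(bool(re.fullmatch(pattern, "2024-01-15")))
```

Given only positive samples, a generaliser like this tends to over- or under-fit, which is where the iterate-with-the-user loop described above would come in.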

Using Heroku, I hacked together a prototype of the interface and sent it around for comment. The following is a snapshot of the regex mock:
The following is a snapshot of the general case mock. Here the example requires a program that prints the input as well as the sum of all the input.
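
For the general case, a minimal sketch of the idea is a brute-force search over short compositions of known operations until one reproduces every sample output. Everything here is hypothetical and chosen for illustration (the `PRIMITIVES` table, `run`, `synthesize`, and the choice of lists of integers as the data type); it is not the ensemble of solvers mentioned later in this post.

```python
from itertools import product

# Hypothetical building blocks; each maps a list of ints to a list of ints.
PRIMITIVES = {
    "identity": lambda xs: list(xs),
    "sum": lambda xs: [sum(xs)],
    "sorted": lambda xs: sorted(xs),
    "reversed": lambda xs: list(reversed(xs)),
    "append_sum": lambda xs: list(xs) + [sum(xs)],
}


def run(program, xs):
    """Apply a sequence of primitive names to the input, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def synthesize(examples, max_depth=2):
    """Return the first (shortest) program consistent with all examples."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None  # nothing within max_depth fits the samples


# Sample I/O in the spirit of the mock: echo the input plus its sum.
examples = [([1, 2, 3], [1, 2, 3, 6]), ([4, 5], [4, 5, 9])]
program = synthesize(examples)
print(program)
print(run(program, [10, 20]))
```

With these samples the search returns `("append_sum",)`. The number of candidates grows exponentially with depth, so anything beyond toy problems would need a smarter search than this.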
Ideally, the service would be free for the simple case and cost a dollar or two for a more advanced case. The free case would allow a large corpus of examples to be collected and the system refined, and the advanced case would provide useful pay-only features such as saving, private projects, downloading in a preferred language, and pair-programming with the system. This last feature would be the truly useful incarnation of the system, where one could modify the expressions or programs and the I/O and iterate the solver until a desired solution was achieved.

I still think it is a good idea, but I just couldn't think of a good way to monetize it. It is an interesting technical challenge and the service would make an interesting novelty, but it is not an earner and I'd rather hack on other things for now.

I have some designs and small technical prototypes for the ensemble of inductive automatic programming solvers and heuristic solvers that would be needed, but I'm keeping the details to myself for now, just in case the idea gets some traction or I can think of a better way to make a few bucks from it (Excel plug-in?).

Sure, the general problem is intractable, but the specific case is results-driven. It only has to be good enough for the user's needs - a monkey patch for whatever data conversion or scripting they happen to be doing at the time. The subset of actual problems that people have will be much smaller than all of possibility space, and the web is fertile ground for seed and test instances.

Anyway, if anyone has a burning desire for either or both of these services to exist or has some further ideas, drop me a comment or an email.

1 comment:

Jason said...

Some related approaches:

* Rubular
* txt2re