I have been thinking a lot about the previously mentioned "Data Analysis Project Management: SaaS".
Such a system could focus on the project management side, say a Basecamp for Data Analysis projects. But I think it could be something different, something more. I think there are pain points in the data analysis workflow that could be 1) systematized and 2) automated. I also suspect that large data platforms from IBM and Oracle may offer solutions, but I question whether any such solutions exist for smaller-scale, lower-cost projects. I think there is an opportunity for a Hosted Data Analysis Project Workflow system.
I've been trying to think hard about possible pain points, and here's what I've got:
- Revision control - keeping track of changes to project files.
- Research Journal - keeping track of what questions have been asked and what findings have been made.
- Reproducibility - ensuring there is a recipe to recreate a past result.
- Collaboration - working with others on all aspects of the analysis workflow.
- Interpreting Data - looking at tables and graphs and generating hypotheses to go and test.
- Executing Models - configuring, running, tuning models.
- Verifying Models - assessing models on test and verification datasets.
- Blending Models - comparing and combining model outputs.
There are existing services that already touch on some of these areas:
- Many Eyes - Collaborative interpretation of visualizations.
- Google Predict - Hosted models and data
- Github (and similar) - Hosted project files, forking, collaboration
- Kaggle, TunedIT - Data competitions, competitive community around problems, Data Spec Work / R&D Outsourcing.
- Cross Validated - Q/A community around statistics and machine learning
- Google data explorer - Visualization and interpretation of public datasets


