Give your Jupyter notebook superpowers

data chimp is a programmable data analysis assistant that automatically shows contextual data visualizations, tables, and messages as you work in your notebook. It helps you spot unexpected features in your data, get oriented quickly in a new data set, and enforce best practices on your team.

Want an email when the beta is ready?

Demo video: 7 minutes

Why?

While data notebooks were an incredible step forward for data work, they still aren't good enough. Their limitations are the reason we spend over half our time on basic "data janitor" work, the reason it's so easy to make mistakes in our analyses, and the reason we have to jump between different pieces of the so-called "modern data stack" to get our work done. In the next few paragraphs, I'll summarize the demo video above and try to convince you that we should expect more from our data tools.

We should be able to see more with less code.

Data analysis is intensely visual, and we often need to see multiple things at once. So why does every data notebook let me see only one result at a time? With data chimp, you can loop a previous result and see it alongside the cell you're currently working in:

You can also see results of executing code that runs based on configurable rules (more on that in a second). Here we are looking at a looped result, the result of executing a single cell, and automated code that shows scatter plots for all combinations of numeric columns I've selected in the main cell:
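For readers who want a feel for what such a rule does, here is a minimal sketch in plain pandas and matplotlib of "show scatter plots for all combinations of numeric columns". It is not data chimp's API: the function names `numeric_column_pairs` and `plot_pairwise_scatter` and the sample data frame are assumptions made up for illustration.

```python
from itertools import combinations

import pandas as pd


def numeric_column_pairs(df):
    """Return every unordered pair of numeric column names in `df`."""
    numeric = df.select_dtypes(include="number").columns
    return list(combinations(numeric, 2))


def plot_pairwise_scatter(df):
    """Draw one scatter plot per pair of numeric columns."""
    # Imported here so the pairing helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    for x, y in numeric_column_pairs(df):
        fig, ax = plt.subplots()
        ax.scatter(df[x], df[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    plt.show()


# Hypothetical data: three numeric columns and one text column.
df = pd.DataFrame({
    "height": [1.6, 1.7, 1.8],
    "weight": [55, 68, 80],
    "age": [30, 41, 25],
    "name": ["a", "b", "c"],
})
pairs = numeric_column_pairs(df)  # the text column "name" is skipped
```

Calling `plot_pairwise_scatter(df)` in a notebook cell would draw one figure per pair. The point of a rule is that you would not have to write or call this yourself each time.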

Visualizations should be context- and code-aware.

Speaking of scatterplots, why are we still making dumb, static visualizations? data chimp gives your ordinary matplotlib visualizations knowledge of the code that created them and an understanding of their relationship to other data results. You can send the code that generated a visualization into a cell and tweak it, or hover over a scatterplot point to see the corresponding row in a table:

Data scientists need automated tests too.

The scatterplots shown above are based on a simple rule, but you can create more complicated rules like "When I execute a data frame containing columns that have greater than 3% missing values, warn me." These rules are an overdue application of an idea from software engineering: instead of relying on the data scientist to manually check every corner of the data, we should have automated checks that do this for us while we are working.
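To make the 3% rule concrete, here is a rough sketch of the check itself in plain pandas. This is an illustration only, not how data chimp defines rules: the function names, the message wording, and the sample data are all assumptions.

```python
import pandas as pd


def columns_over_missing_threshold(df, threshold=0.03):
    """Return the fraction of missing values for each column above `threshold`."""
    missing_fraction = df.isna().mean()
    return missing_fraction[missing_fraction > threshold]


def warn_on_missing(df, threshold=0.03):
    """Build one warning message per offending column."""
    flagged = columns_over_missing_threshold(df, threshold)
    return [
        f"Column '{name}' has {fraction:.1%} missing values"
        for name, fraction in flagged.items()
    ]


# Hypothetical data: one of four "age" values is missing.
df = pd.DataFrame({
    "age": [34, None, 29, 41],
    "city": ["Oslo", "Lima", "Pune", "Kyiv"],
})
messages = warn_on_missing(df)
```

The check is trivial to write once. The value of an automated rule is that it runs every time a data frame is executed, whether or not you remember to ask.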

Our "notebook" should be trivially customizable/hackable.

These rules can be shared and combined into applets. Applets can also execute code on a schedule, access code within cells, use shared functions, and persist their own data frames. This makes it possible to hack on data chimp and build your own data solutions without being a full-stack developer. In this example, we're building a data quality checker that regularly checks data for typos and shows an error message within data chimp if it detects them:
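As a sketch of the kind of logic a typo checker might contain, the snippet below flags rare values that closely resemble a frequent value, using only Python's standard library. It says nothing about how applets are written or scheduled in data chimp; the function name `find_likely_typos`, both thresholds, and the sample values are assumptions chosen for illustration.

```python
from collections import Counter
from difflib import get_close_matches


def find_likely_typos(values, min_common=3, cutoff=0.8):
    """Map each rare value to the frequent value it closely resembles.

    A value counts as frequent if it appears at least `min_common` times.
    Both thresholds are illustrative assumptions, not tuned defaults.
    """
    counts = Counter(values)
    common = [value for value, count in counts.items() if count >= min_common]
    suspects = {}
    for value, count in counts.items():
        if count >= min_common:
            continue
        match = get_close_matches(value, common, n=1, cutoff=cutoff)
        if match:
            suspects[value] = match[0]
    return suspects


# Hypothetical column of city names with two misspellings.
cities = ["Boston"] * 5 + ["Chicago"] * 4 + ["Bostn", "Chicgo", "Denver"]
typos = find_likely_typos(cities)
```

Here "Denver" is rare but resembles nothing frequent, so it is left alone. A real checker would need thresholds suited to the data at hand.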

We need a better way to share code.

These applets are shared with all data chimp users. You can star them, discuss them, and clone them into your workspace. GitHub wasn't designed for data scientists. It enables code sharing via software packages, which is the wrong unit of sharing for data scientists.

This is just the beginning of how data chimp is re-imagining the data notebook. We're doing this because fragmented, error-prone, tedious data analysis stands in the way of realizing the promise of data, the promise that we can make decisions and build machine learning models that transform our lives and businesses.

If that sounds exciting, then sign up for our beta above.