For data notebooks, functions aren't enough (and what to do about it)
Thu Jan 12 2023
While working in a data notebook a few months back, I was frustrated: no matter what I did, I kept typing the same code over and over and worrying about mistakes in my data wrangling and model training code. If you're building an application or library, the solution to both problems is simple: write a function that does the repetitive thing and write more tests. That doesn't work for programmatic data work. Below I explain why and show how a solution built on Python's ast module and IPython's event callbacks and profiles does the job better than ordinary functions and tests. Read more
VSCode for data science
Thu Dec 08 2022
There are several well-known challenges with using Jupyter notebooks for data science on a team, but many people don't know that VSCode addresses many of those shortcomings via its free Jupyter and Live Share extensions, its new devcontainer functionality, and its paid Codespaces offering. Read more
The code that ChatGPT can't write
Wed Dec 07 2022
ChatGPT is game-changing, and, more generally, language models may be the most important dev tool we see in our generation. (It takes some humility to admit this, as we're working on a dev tool for data scientists.) But neither ChatGPT nor some larger descendant will ever be able to write all of our code from natural language descriptions of desired functionality. The rest of this post argues for this claim, drawing on observations from Fred Brooks' "No Silver Bullet" and Eric Evans' _Domain-Driven Design_. Read more
Intro to Metaprogramming for R
Thu Aug 18 2022
Metaprogramming is a generally useful tool for solving certain kinds of programming and data analysis problems. This talk covers some important applications of metaprogramming to dplyr and highlights how R's metaprogramming makes working with tibbles and dplyr more ergonomic than working with pandas data frames. We'll also get a high-level overview of the important APIs used for R metaprogramming and work through an example metaprogramming function together. Read more
Window and Pane Management Tricks for RStudio and your OS
Mon Aug 08 2022
Learning the hotkeys for window management within my OS and pane management within RStudio has been particularly helpful, and in this post, I'd like to share those hotkeys. Learn them! Then you can feel the joy of flying through previously slow, mouse-based tasks and getting back to what we all love: analyzing data. Read more
European Flight Data Analysis w/ data chimp (for Tidy Tuesday 2022-07-12)
Tue Jul 12 2022
I used data chimp to help me analyze European flight data as part of the Tidy Tuesday challenge. This screencast is just 20 minutes of me wrangling data and coding up some visualizations with data chimp's help. Check it out if you want to see data chimp in action. Read more