The Big Takeover Band : 12+ Years of Data

I’ve been playing around a lot lately with the Spotify API, but I was wondering what other kinds of musical data are out there. Was there any live show data I could play around with? So I reached out to my former bandmates in The Big Takeover and asked if they had anything. It turns out the Excel spreadsheet I had started 12+ years ago was still going strong, full of years’ worth of shows, pay, attendance, and more. And boy was it a mess (almost as messy as the band van)! For this project I wanted to see what kind of value I could create for them, if I could clean it up well enough first!


“Tourcation” – The Big Takeover cleaning up the van… somewhere in the Pacific Northwest circa Summer 2018

Step 1 Data Cleaning

Like I said, the raw spreadsheet was a mess. The dates were in different formats, city names were misspelled or appeared in several versions, and some weren’t actual cities at all (nicknames). There were missing commas, extraneous commas, spaces here, spaces there, spaces just about everywhere. The numerical columns contained info text even though there was a separate info column, which was then left null. Speaking of nulls: yup, a whole bunch of those too… I really could go on here. I had some serious work to do.

I tried to find a balance between using Python and pandas to speed things up and just manually changing things in the original file. For the most part I found that relying on Python and pandas was the better bet, since my eye often missed errors at first glance.

Formatting the dates and monetary columns was straightforward, using the .replace() method to shape things up. I also ended up using this method on the city and venue names a few times, with a little help from the fuzzywuzzy package. It let me score how closely two strings match and return only the pairs above my threshold. I found that if two strings had a 90% match or higher they were probably the same place, just written differently.

There were only two nulls in the City column, and these were easy to work out from the gigs immediately before and after. For null values in the pay and attendance columns I decided to just impute 0. For pay this often meant that they hadn’t gotten paid just yet, and for attendance it most likely meant that they hadn’t gotten around to entering the values yet. A quick phone call could shed some light on that column.
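Here is a minimal sketch of that kind of cleanup, not the actual notebook code. The rows, column names, and venue names are made up for illustration, and the standard library's difflib stands in for fuzzywuzzy so the example has no extra dependency (its scores may differ slightly from fuzzywuzzy's).

```python
import difflib

import pandas as pd

# Made-up rows standing in for the real spreadsheet (hypothetical column names)
raw = pd.DataFrame({
    "venue": ["The Blue Room", "the blue room ", "Blue Room, The",
              "Main Street Theater", "Main Street Theatre"],
    "pay": ["$500", "1,200.00", None, "$350 ", "400"],
    "attendance": [120, None, 80, None, 200],
})

# Strip dollar signs, thousands separators, and whitespace, then convert to numbers
raw["pay"] = pd.to_numeric(
    raw["pay"].str.replace(r"[$,\s]", "", regex=True), errors="coerce"
)

# Impute missing pay and attendance with 0, as described above
raw[["pay", "attendance"]] = raw[["pay", "attendance"]].fillna(0)


def similarity(a, b):
    """Score two names from 0 to 100, ignoring case and outer whitespace."""
    a, b = a.strip().lower(), b.strip().lower()
    return 100 * difflib.SequenceMatcher(None, a, b).ratio()


def likely_duplicates(names, threshold=90):
    """Return pairs of distinct spellings that score at or above the threshold."""
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            score = similarity(first, second)
            if score >= threshold:
                pairs.append((first, second, round(score)))
    return pairs


pairs = likely_duplicates(raw["venue"])
for first, second, score in pairs:
    print(f"{score:>3}  {first!r} <-> {second!r}")
```

Note that a reordered name like "Blue Room, The" falls below the threshold with this plain character-level score, so the flagged pairs are a list to review by eye rather than something to merge automatically.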

I made sure to save the dataframe and send it off to them to save us all some time in the future. Hopefully we can update this list every year or so.

Step 2 EDA & Visualizations

I figured some basic bar charts could be insightful to see which venues they play at the most and to see their most fruitful gigs in terms of revenue. One problem I ran across here was that they got paid a lot at the same venues, but those venues were only showing up as one tick mark on the axis. To get around this, I graphed using their index, rather than their name. This would provide me with independent instances of the band playing at the same venue and allow me to graph the same venue name multiple times.
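The index trick can be sketched roughly like this. The data and column names are invented, and matplotlib is an assumption here, since the post does not say which library drew the bar charts.

```python
import pandas as pd

# Made-up gigs; the real column names in the spreadsheet may differ
gigs = pd.DataFrame({
    "venue": ["Venue A", "Venue B", "Venue A", "Venue C", "Venue A"],
    "pay": [900, 400, 1200, 650, 1000],
})

# Keep the four highest-paying gigs and give each row a fresh 0..n-1 index
top = gigs.nlargest(4, "pay").reset_index(drop=True)


def plot_top_gigs(top):
    """Draw one bar per gig, even when the same venue name repeats."""
    # Imported here so the data prep above runs without matplotlib installed
    import matplotlib.pyplot as plt

    positions = list(range(len(top)))
    fig, ax = plt.subplots()
    ax.bar(positions, top["pay"])          # x positions come from the row index
    ax.set_xticks(positions)
    ax.set_xticklabels(top["venue"], rotation=45, ha="right")  # names are labels only
    ax.set_ylabel("Pay ($)")
    fig.tight_layout()
    return fig
```

Calling plot_top_gigs(top) would draw three separate "Venue A" bars instead of collapsing them into a single tick.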


Summing up their total pay and attendance was a breeze.

Next I thought it would be nice to see how pay and attendance have changed for them over the years at some of the venues they frequent the most. I made one large chart for this, but made sure to keep the x-axes independent from one another, since they have been playing some venues for longer than others. It was easy to see here not only the trends but also when and where they had their record release parties, which always draw the biggest crowds. Benefit shows were also easy to spot.

Over the Years

Now it was time to break everything down by year. I thought that bar charts were the simplest way to sum everything up, so get ready for a lot of bar charts. First the totals per year, then the averages per year. We can definitely see some trends emerging here. I see that they have been playing a lot of shows steadily through the years, and their total pay and average pay are on a steady increase.

Month to Month

Next up I broke it down by month for the same features: shows, pay, and attendance. First the totals, then the averages. Again some trends emerge: more shows, more total pay, higher average pay, and more attendance over the summer months. Not too surprising there (winters in the Northeast are brutal!). Note the spike in attendance for festival season.
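Both the yearly and the monthly breakdowns come down to the same groupby. A minimal sketch, assuming a parsed date column plus numeric pay and attendance columns (the names and numbers below are made up):

```python
import pandas as pd

# Made-up shows; the column names are assumptions
shows = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-06-10", "2017-07-04", "2018-01-20", "2018-07-14", "2018-07-28"]
    ),
    "pay": [500, 800, 300, 1000, 1200],
    "attendance": [100, 250, 40, 300, 350],
})


def summarize(df, period):
    """Totals and averages per calendar 'year' or 'month'."""
    key = getattr(df["date"].dt, period).rename(period)
    return df.groupby(key).agg(
        shows=("pay", "size"),                 # number of gigs in the period
        total_pay=("pay", "sum"),
        avg_pay=("pay", "mean"),
        total_attendance=("attendance", "sum"),
        avg_attendance=("attendance", "mean"),
    )


by_year = summarize(shows, "year")
by_month = summarize(shows, "month")   # months pooled across all years
print(by_year)
print(by_month)
```

Each column of the result maps directly onto one bar chart, for example by_year["total_pay"] for total pay per year.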


Maps Maps and More Maps!

Understanding how important and hard it is to break into new markets for a band, I wanted next to visualize their data state by state. And what better way to do that than a map! I used plotly since they have a lot of formatting options, and also some extras that you can sign up for (more on that later). The dataset already had the city and state, so it was just a matter of putting them in separate columns. Here’s the state by state choropleth map for pay and number of shows.

Explore the interactive map below here: https://sam-brady.github.io/bigtakeover-shows/state_map_pay.html
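A rough sketch of the prep behind a state map like this one, assuming the location is stored as a single "City, ST" string (the column names and rows are made up). The plotting function uses plotly express, which is one way to draw a US choropleth and not necessarily the exact call used for the map above.

```python
import pandas as pd

# Made-up rows; the real sheet's column names and formatting may differ
gigs = pd.DataFrame({
    "location": ["New Paltz, NY", "Albany, NY", "Burlington, VT", "Portland , OR"],
    "pay": [500, 700, 400, 600],
})

# Split "City, ST" at the last comma and trim stray spaces
parts = gigs["location"].str.rsplit(",", n=1, expand=True)
gigs["city"] = parts[0].str.strip()
gigs["state"] = parts[1].str.strip().str.upper()

# One row per state: number of shows and total pay
by_state = (
    gigs.groupby("state")
    .agg(shows=("pay", "size"), pay=("pay", "sum"))
    .reset_index()
)


def state_map(by_state, column="pay"):
    """Choropleth of one column by two-letter state code."""
    # Imported here so the data prep above runs without plotly installed
    import plotly.express as px

    return px.choropleth(
        by_state,
        locations="state",
        locationmode="USA-states",
        color=column,
        scope="usa",
    )
```

Calling state_map(by_state, "shows") would color the same map by number of shows instead of pay.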



Next I wanted to see everything broken down by county. This was a bit more challenging, since plotly relies on a FIPS code for county maps. To get the code with only the city and state available, I had to create a workaround. Two packages that helped me were the uszipcode package and the addfips package (follow the links at the bottom for more).

Explore the interactive map below here: https://sam-brady.github.io/bigtakeover-shows/county_map_pay.html
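The shape of that workaround is a two-step lookup: city and state to county, then county to FIPS. The sketch below uses two small hand-written dictionaries as hypothetical stand-ins for the uszipcode and addfips lookups, so it shows the chaining only and not either package's API.

```python
import pandas as pd

# Hypothetical stand-in tables. In the post these lookups came from the
# uszipcode (city -> county) and addfips (county -> FIPS) packages.
CITY_TO_COUNTY = {
    ("Albany", "NY"): "Albany County",
    ("Kingston", "NY"): "Ulster County",
}
COUNTY_TO_FIPS = {
    ("Albany County", "NY"): "36001",
    ("Ulster County", "NY"): "36111",
}


def county_fips(city, state):
    """Return the 5-digit county FIPS code as a string, or None if unknown."""
    county = CITY_TO_COUNTY.get((city, state))
    return COUNTY_TO_FIPS.get((county, state))


# Made-up gigs; "Nowhere" shows what happens when a lookup fails
gigs = pd.DataFrame({
    "city": ["Albany", "Kingston", "Nowhere"],
    "state": ["NY", "NY", "NY"],
    "pay": [700, 500, 100],
})
gigs["fips"] = [county_fips(c, s) for c, s in zip(gigs["city"], gigs["state"])]

# Rows without a FIPS code can't be mapped, so set them aside before summing
by_county = gigs.dropna(subset=["fips"]).groupby("fips", as_index=False)["pay"].sum()
print(by_county)
```

Keeping the codes as strings matters because FIPS codes in some states start with a zero, which would be lost if the column were stored as integers.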



While grabbing the FIPS code I also took the opportunity to grab the latitude and longitude. I thought it would be interesting to try to make a pin-style map. Plotly has a Mapbox extension that allows you to import maps with cool styles; you just need to set up a free account. Here is the pin-style map of everywhere The Big Takeover has played.

Explore the interactive map below here:  https://sam-brady.github.io/bigtakeover-shows/pin_map.html


Get in touch at:       mr.sam.tritto@gmail.com