How I Created Lemonade from Lemons

BackGround

For my final project this semester in data organization I was tasked with retrieving data and creating information from the data. To better understand my data and information, you need to understand the scenario that was created for the data. Imagine you are a CEO of a tech firm that wants to better understand users and your applications. So you gather data about people like, who they are, gender, country, the app they purchased, how much they have spent in the app and how frequently they use the app. (All of the data was created with Mockaroo.com) Once all the data has been collected you ask someone (me!) in the company to make sense of the data to show at a conference about our company.

What I Did

The very first thing I tried was doing a query about gender. It seemed like the easiest thing for me to do because I somewhat understood queries and I didn’t think I could mess it up. I was wrong, I had to format my query about five different ways until I got what I wanted and no longer had an error message.

=QUERY(Data!C2:D,“select C,count(D) where C <> ” group by C label C ‘Gender’, count(D) ‘Total’ “ )

Above is my query and once I established this query I created similar queries below.

=QUERY(Data!E2:H,“select E, sum(H) where E<>” group by E label E ‘Country’, sum(H) ‘Total Amount of Purchases'”)

=QUERY(Data!D2:E,“select D, count(E) where (D<>”)group by D label D ‘Application’, count(E) ‘Total Frequent Users'”)

=QUERY(Data!D2:I,“select I, count(D) where (I<>”)group by I label I ‘Frequency’, count(D) ‘Count’ “)

=QUERY(Data!A2:I,“select C,sum(H) where C <> ” group by C label C ‘Gender’, sum(H) ‘Total Purchased’ “ )

These queries worked fine until I wanted queries to return data related to certain countries or genders. Understanding how to tell the query, I only wanted to see apps and the number of people using them in certain countries took me forever. I looked on google’s website on how to create a query and saw that in the “where” clause of my query I could put “matches (insert country),” this didn’t work either. I was not sure how to get my query to work so, I started doing it piece by piece until the information I wanted back came back. I figured out that my where clause just needed an “=” when listing the specific country or gender.

=QUERY(Data!C2:I,“select D, count(C) where C=’Female’ group by D label D ‘App’, count(C) ‘Female Users’ “) 

The query above was also used to query about males.

=QUERY(Data!D2:G,“select D, count(E) where (E=’Spain’)group by D label D ‘Credit Card Type’, count(E) ‘Spanish Users'”)

The query above was used for each country that was in the data, which included the United States, United Kingdom, Spain, Germany, Mexico, Italy, Canada, France.

=QUERY(Data!D2:I,“select I, count(D) where (D=’Weather’)group by I label I ‘Weather’, count(D) ‘Users'”)

Like the other queries above this query was tweaked for each individual application, which included: weather, calendar, email, Facebook, Instagram, Messages, Music, Phone, photos, wallet, and Twitter.

Once I created my queries, I was able to create a lot of information about the data. When the data had been queried, and the information was retrieved, it wasn’t easy to read all of the information across the spreadsheets. Charts made the information easy to understand and simple to read. I made one big chart for which app brought in the most money and a dashboard of charts about other information I queried. Each country had a page with charts on it illustrating the data from those countries. The charts were not difficult to create, but I think for next time I would make a better plan of what I wanted to make into charts. When I started the project, I had an idea of what kind of charts and queries I wanted to do, but as I became more involved in the project I wanted to illustrate more and more from the data that I had not initially planned to do.

All of the queries listed above I chose because I felt they created the best possible picture about the data. I have queries about the individual applications, individual countries, purchases, usage, and a demographic breakdown. These queries helped to illustrate what people would want to know from this information and more. I chose these queries because they created the picture of the data, not because they were easy or simple, but because they illustrated the data the best and truly created information.

Next time

The next time I have to do something similar I need to create a more detailed plan. I had a plan in place, and I followed it but added more information and charts than I originally planned. Another thing that I will do next time is to build a template of queries, instead of trying to re-do a similar query over and over and over. If I create a template, then I can create more information faster and look to break down that information to create more information. Once I did figure out that replicating the queries worked and that having a template helped me create information faster. I will do it again in the future so I can better understand other data. Finally, the last thing I learned and will do next time is to be patient when trying to fix an error. It would frustrate me almost every time I did my query wrong, and an error would pop up. Once I worked through my query patiently I was able to figure out what was wrong with it, I usually was able to fix the problem.