June 13, 2023
Ryan Buick

Succeeding as the first data hire with David Shore, Principal at Snowpack Data

Welcome to the Canvas Podcast, where we bring data and business leaders together to discuss making data easier for everyone. Today I'm super excited to have David Shore, my friend from Flexport and now a founding partner at a full-stack analytics consultancy called Snowpack Data.

David, want to tell us about yourself?

So I have been in the analytics space for over eight years now. I started working for some small startups when I was back in school and have been in the startup world ever since.

I was essentially the first data hire at a couple of companies that were sub-hundred-person, and then I joined Flexport and was there for over four years as they scaled. From those learnings across all those companies, I realized there was a big need to implement analytics infrastructure at smaller companies.

There's a lot of need for that kind of refactoring, especially in the economic environment we're in. There's an opportunity to help businesses without them hiring full-time people, so I founded Snowpack Data with other former Flexport data folks.

Tell us about the companies that you joined as the first data hire. How did you approach onboarding?

When I graduated and moved to San Francisco, I joined a company called Truebill. They've since been acquired, and you might know them as Rocket Money, but they're a budgeting and personal-finance app similar to Mint and others.

After that, I moved on to a company called Realm. They were known for an open-source mobile database, so it's more of a dev tools company. And I joined as they were transitioning into monetizing that. So it was an interesting time to join a company, especially when you are the only data resource.

It was a lot of trial and error, and I had to learn what worked best for me and these businesses. But the first 90 days were a lot of just gaining the context I needed to be effective.

So first and foremost, you just have to learn the business. I didn't have much experience with what product analytics looked like at Truebill, so I had to learn how people used the app and what their goals were.

I spent a lot of time reading documents and talking to business leaders to identify their business units' key metrics and objectives. Truebill was six people at the time, so there weren’t many things complicating their objectives or competing goals, which made identifying the internal KPIs straightforward.

Once those are clearly defined, you ideally go and build the metrics so you can start tracking and measuring business processes against them.

But from there, you have to analyze and assess the existing data infrastructure. At Truebill, it was just a Postgres database the engineers had put together. It was pretty straightforward, with some clear events and architecture. They were integrated with Plaid, which had its own data structures that we were just ingesting.

It was pretty clean. And at the time, they were not a very complex app, so it was nice to be able to assess this so early in my career. But then, at Realm, which had a much more complex data model, it took a bit more just to grok what was happening and figure out where their data was being ingested.

They were using all kinds of third-party tools and things like that just because it cut down on the requisite engineering time to gather the needed data.

We had to get all this data in the right place and model it appropriately to do anything with it. So I started with that. The primary goal at both these companies was first to get the data modeled in a way that makes sense for the metrics we prioritized.

And then, as all of this is happening, you have to make sure you're documenting what you're doing and communicating actively with stakeholders. To be as successful as possible, you must make progress without making people feel like they're being deprioritized.

Luckily, these were small companies, so there was less of that, but yeah, it's a lot of prioritization and communicating your successes as you go along.

What’s the best way to scope requirements with your stakeholders?

Yeah, I had a standard set of questions. At a company that size, the primary stakeholders for those scoping meetings were department heads, directors, and C-levels.

But doing as much research as possible going into it, like I mentioned, getting that business context, is huge. At a company of their size and relative complexity, they had at least a fair amount of documentation, the start of an organized knowledge base, that I could sift through to get as much context as possible going into those meetings. That makes the meetings that much more valuable.

But then, yeah, I had a question framework that, as I said, I developed over time. It didn't just come naturally. It started with understanding what challenges and opportunities they saw in their domain. I can make assumptions, but getting their perspective on what they're struggling with, or what they see as the best use of my time, is huge. Sure, I'll have disagreements, but getting their view matters. From there, I want to understand what their process for making decisions looks like.

And then I typically go into what metrics they're currently tracking and what insights they're able to draw. A lot of times the answer ends up being, oh, we don't track anything. It's all intuition.

So being able to contextualize their work with the answers to that helps you understand, okay, maybe there's an opportunity in just streamlining customer feedback data. There's a lot there that doesn't immediately shift how they're working but helps move them into a more data-driven decision-making track.

And then from there, it's understanding, okay, what hypotheses, assumptions, and expectations do you have? They know their business unit, and they know the company, better than I do when I'm first scoping these things. Getting an understanding of how they think feeds back into the framing of challenges and opportunities.

So that kind of frames everything as, okay, let's start guiding them to being a bit more data-driven, and then, as things come up, talk about what data availability looks like. What are the limitations here? How quickly does data need to be turned into insights? For small companies, analytics is usually perfectly fine batched at a daily level, but it's about understanding what goals I should set as I build out the data engineering tooling and pipelines.

Let's talk about the data engineering work that you did and how you approached that based on the problems or the opportunities that you saw.

Yeah, so I think a lot of this boils down to prioritization, and then from there, planning the work. This was back in 2016, 2017, and the main players in the space today didn't exist.

It was a more intensive process to build data pipelines, build the appropriate data warehousing, and everything like that. So I had to prioritize. We were working with data from GitHub, internal production data, and a bunch of marketing demand generation data, and there were a lot of conflicting priorities across the business.

And so you have to understand that, realistically, pipelines get done one at a time, but you have to start with the infrastructure. Back then it was pretty straightforward: I was just building on AWS, relational databases, RDS stuff.

So it was pretty straightforward, which helped me learn the status quo of the analytics industry. But it did make my life just a lot more manual than it is today. I was thinking, okay, I'll do a lot of ad-hoc analytics, some dashboarding, and if I'm lucky, get into forecasting and some modeling.

But data engineering is what they needed to hire for, and luckily, I was able to learn at least the basics to do that. But yeah, it was a matter of just prioritizing how we can get this data where it needs to be for downstream analytics and then how we prioritize what needs to be done first.

What’s the best data engineering framework you use today?

I think it depends on the client, but realistically, there are so many good tools out there, whether you're willing to pay for something like Fivetran or go with more custom Airflow pipelines. Airbyte is out there too; it's open source, enables a lot, and streamlines ETL in a lot of ways.

I'm a big proponent of dbt for any of the data modeling bits, and that basically is all you need if you're an analytics engineer. In today's world, I'm pretty indifferent on the data warehousing side. Obviously, Snowflake and BigQuery are more or less my go-tos if my clients don't have preferences.

But that's mainly because I have experience there and they're pretty successful at what they do.

How did you approach the reporting layer?

There's always going to be pressure for the downstream analytics and reporting. So realistically, that's where the prioritization came in. Okay, let's understand what the primary KPIs are for the business, the ones C-levels look at on a daily basis. Let's get the bare minimum data in place from the data engineering perspective, and then figure out the best way to report on that.

We approached it very differently at the two companies. Truebill was much more of a let's-have-an-internal-dashboard approach. Let's put it on a TV so that everyone at the six-person company sees it on a daily basis, when they walk in or when they're at lunch.

So realistically, that's where you start: reporting at a high level on the P0 company KPIs and metrics. Let's get those built and get them in front of as many people as you can. At Realm, we took a very interesting route: we created a Slackbot that would push to the #general channel every day at 8:00 AM and, I think, every evening at 4:00 PM as well.

It showed the top five company KPIs. We'd throw in emojis to say, yes, this is going up, and we'd show the week-over-week change. And it helped: people saw it every day, because in any business people look at #general whenever something pops up.

It was effective, but it's not something I would recommend for a lot of orgs, especially at their size, around a hundred people. Still, it was pushed into people's faces, and it was effective in that regard.
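To make the idea concrete, here is a minimal sketch of how a daily KPI digest like the one described above could be formatted. This is a hypothetical illustration, not Realm's actual bot: the function name, the KPI dictionary shape, and the example metric values are all assumptions.

```python
def format_kpi_message(kpis):
    """Build a Slack-style digest for a list of KPIs.

    Each KPI is a dict with 'name', 'current', and 'last_week' values
    (a hypothetical shape chosen for this sketch). An emoji signals the
    direction of the move, and the week-over-week change is shown as a
    signed percentage.
    """
    lines = ["Daily KPI digest:"]
    for kpi in kpis:
        change = (kpi["current"] - kpi["last_week"]) / kpi["last_week"]
        emoji = (":chart_with_upwards_trend:" if change >= 0
                 else ":chart_with_downwards_trend:")
        lines.append(f"{emoji} {kpi['name']}: {kpi['current']:,} "
                     f"({change:+.1%} WoW)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example metric values are made up for illustration.
    kpis = [
        {"name": "Daily active users", "current": 12_400, "last_week": 12_000},
        {"name": "New signups", "current": 310, "last_week": 340},
    ]
    print(format_kpi_message(kpis))
```

In a real deployment, the resulting string would be posted to #general on a schedule (say, a cron job at 8:00 AM) via Slack's `chat.postMessage` API, which is exactly the kind of API surface that, as noted below, has to be maintained as Slack evolves.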

What are the tradeoffs of exposing something like a Slackbot to the org?

Yeah, as you could guess, I was the sole resource, so anytime anyone had questions about it, they were pinging me. It was another thing I had to monitor. On the engineering side, I had built the Slackbot myself, so I had to keep up with what Slack was doing with their API, and there were a number of times when it broke.

This was earlier in Slack's days, so they were a bit more agile than they are today. But yeah, I think it was the constant feedback from the business as a whole that created the biggest problems, which is a good thing: they were more data-driven. But without any self-service tool, I worked a lot slower, because I'd be talking to the CTO about why our daily active users went down that day and then looking into it. It puts more pressure on you to scale things that maybe aren't ready to be scaled, or shouldn't be scaled at that point.

How did you think about making a business case for tools that you needed to buy to your CTO or CEO?

Yeah, you have to make a very compelling case. I think it's hard if you're new to being in the analytics space. Being the sole data person, you don't necessarily know how to estimate your time, or how to scope projects effectively.

So I was doing a lot of finger-in-the-wind estimates: if you ask me to build this new pipeline or a new Slackbot, it's going to take me X number of hours, so let's backtrack and do the math on whether that's valuable, because it means I'm going to deprioritize X, Y, and Z.

But you have to speak to whoever your decision maker is and understand what they prioritize. Our CTO at Realm was more focused on making the open-source database users happy, because they were going to be our primary leads for monetization. So let's focus on their well-being, and if they're creating tickets in GitHub, let's tackle that and create an analytics flow for those.

Analytics is interesting because, at any early-stage company, it's frequently thought of as a service org, when realistically it's a consulting org. You're constantly going to be in places where you are the main data resource, your job is SQL, and that's how business leaders see you.

It's not worth their time to write the SQL, build a chart or a dashboard, or anything like that. And the consultation piece is prioritization. You span the entire business, which isn't true of a lot of other orgs.

Engineering is frequently focused on what they're building. Sales is focused on selling, so they don't necessarily have that cross-company view. I just had to constantly push back based on my intuition. That pushing back is vital to actually producing the most value, instead of serving whoever is loudest in the room.

You have to get the right people in the room to make sure the priorities are right, and be the first one to push back on requests that might not align with top-level business goals. That's super valuable, because it's a lot like being a product manager at that point.

Your responsibility is to be the glue between these different teams and to have a vantage point backed by objective evidence. You can take that vantage point and strategically elevate yourself rather than just being a SQL machine.

And luckily you are in a position where you know the data better than everyone else, so you are able to use the metrics to make your point.

It's hard to argue against the numbers. Any business leader can say, sure, my intuition is that this customer had X, Y, and Z, but if you're seeing something very different in aggregate, you have the tools to push back with better evidence.

It's an interesting time given what the economy looks like. In the last year, we've seen so many tech layoffs and reprioritization in that sense. But it doesn't change how hungry companies are for analytics. They just are a bit more budget conscious about it now.

So it's weighing a full-time hire focused on something that will help your business scale against a third-party tool or a consultant, something that won't hold your hand for the rest of time but gets you 90% of the way there and that you can run with as you're scaling and building up.

Consulting has been very good for seeing that side of the world: thinking about the larger industry, what analytics looks like, and what the hunger for analytics is. But we'll see how it progresses. You asked about predictions, and these days you can't answer that without thinking about AI. I can go to ChatGPT and just ask it to model event stream data for me, and it gets me maybe 65% of the way there, which is way better than I could have asked for six months to a year ago. I'm not particularly worried that AI will automate my job soon, but it will definitely help.

I think augmented analytics is being thrown around as a massive benefit to my world, and it has streamlined my work, with things like automating the process of removing duplicates. That's not something I have to worry about these days.

I can throw a data model at various tools, and they'll spit back basically what I need: what that table is doing and what I should expect for downstream analytics. I think that will just get better. But it also brings in questions of data security, privacy, and everything like that.

Any AI discussion brings in those competing points, and it'll be interesting to see how it plays out. I think there will be a vast world of SaaS products from companies doing a lot of successful work to bridge that gap, because that's the direction the industry is headed.

Where can people get in touch with you?

Snowpack-data.com is our site. The three of us span the entire analytics stack: data engineering, product analytics, and ML/data science work.

We're happy to chat and help any company wherever they are. Otherwise, it's full steam ahead. I think my experience as the first data hire has helped frame the value of a consultant: I can pop in and either just advise or actually build the analytics stack from scratch.

© 2024 Infinite Canvas Inc.