Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

In this post, we'll summarize our co-founder Ryan Buick's recent appearance on the Data Engineering Podcast, hosted by Tobias Macey.

Ryan, can you start by introducing yourself?

Thanks for having me on. I’m Ryan, one of the founders of Canvas. We’ve been around for about a year now. Our mission is to bring the modern data stack to operators and business teams so that they can make better, faster decisions and data teams can continue to focus on the work that really matters to them.

Do you remember how you first got started working in the area of data?

I started a couple of years ago as one of the first product managers at a company called Flexport. Flexport is basically a tech-enabled freight forwarder, trying to disrupt the millennia-old industry of shipping goods from point A to point B and all of the complexities around that. I spent my first couple of months at Flexport, during nights and weekends, learning how to write advanced SQL and how to think about cleaning data. That was super helpful for me in terms of getting exposed to data, and that’s how I ended up in this space.

Can you describe a bit about what Canvas is, and some of the story behind how you decided that that was where you wanted to spend your time and energy to build a product, versus just continuing on your career path of being a product manager?

That story also started at Flexport. I met my co-founders there. They were both engineers, and on the same team. We had seen just how difficult it was for the data teams to keep up with the business teams' data requests. I think we did a good job of having strategic dashboards for the business, but there were just so many ad-hoc, everyday questions that needed to be answered. Unfortunately, there wasn’t really a great interface for those questions to be answered. We saw a data team that was spending millions of dollars on the modern data stack, but analysts couldn't keep up with each individual team's demands. I started really thinking about it during COVID, and decided to jump into it in late 2020. We started Canvas as a data exploration and data visualization tool, primarily for business teams. We integrate with the modern data stack, including Snowflake, BigQuery, Redshift, and all of the popular warehouses. We also integrate with dbt, which is exploding in popularity.

You mentioned that Canvas is intended to bring the promise of the modern data stack to people who don’t want to put in all the engineering effort to manage it themselves. What shortcomings do you see with the modern data stack?

The modern data stack has brought tons of valuable tools and benefits to tons of people, but the issue is there’s still a pretty high barrier to entry. I think a lot of the interfaces haven’t really caught up with the way that data is now being thought about. There’s been a lot of talk about the death of the dashboard. A lot of the strategic dashboards have been solved, but a lot of the work that’s being done now by operators is more complex. I think that’s really the heart of what we’re trying to do at Canvas, which is to give operators a flexible interface on top of the trusted, certified data that data teams are working on, and a way to ask whatever question they have at the moment, rather than going into the breadline. Thankfully, there’s a ton of exciting things happening to bridge that gap.

In terms of what you’re building at Canvas, you mentioned the primary interface paradigm is this spreadsheet that everybody has become familiar with over the years. Why do you think the spreadsheet has become so popular?

I think it’s a few things. First of all, it’s been around forever. One of the first things you learn in business school is how to use Excel, so I think the institutional powers are propagating and holding up Excel as a way to do this. I also think it’s a great way to visually program. It’s the first programming language that most people learn. It’s very similar to SQL in a lot of ways, and we see them as really two sides of the same coin. It’s familiar, it’s fast, and it’s relatively iterative. I think that’s why it’s sticking.

In terms of the existing systems we have for spreadsheets, what are the biggest issues that users encounter as they try to scale the usage of those spreadsheets?

The first obvious one is analytical performance. You can’t take event data and put it into Google Sheets; it’s going to fall over. I think that’s one of the first use cases that we see and that we can really help out with. The second is sharing. How many times have we seen spreadsheets with “do not edit” tabs? People are afraid to let go of their Legos, either because of privacy concerns or because they don’t want someone to break the massive model that they spent weeks building. I think those are the two biggest things that we see, and the real reason why people are trying to move away from having PII data in spreadsheets.

In your own tool, what are some of the extensions or modifications to that paradigm of the spreadsheet?

This has been an interesting adventure with dbt. A big part of spreadsheets is the way that information is shared; you often want to see and show the work behind your analysis. I think about leveraging a lot of what’s being done with the DAG, and being able to show visually what a spreadsheet is actually composed of. That’s something that we’re trying to invest in and show people: just because this is a spreadsheet doesn’t mean that it’s not a powerful analytical tool, and it doesn’t mean that it’s not extensible. It doesn’t mean that you can’t double click and see the lineage behind it. This helps users understand what has happened to get to the current spreadsheet.

Can you give your perspective on the role that Canvas plays in the overall data ecosystem for an organization?

The primary persona is someone working with data teams that are looking for a way to scale. We also have business teams that come to us wanting an interface that they can control over their warehouse or over their database. We’re really just trying to help them move faster and do more with less. A lot of that work has been automated, and so they’re moving more and more into strategic roles. We offer a tool for people in these strategic roles to understand their data better.

What do you see as the juxtaposition of what you’re doing at Canvas, with some of the other ways that people have tried to approach the same problem of providing self service data exploration in a way that is approachable to the business users?

There have been a ton of tools that help with learning SQL or making SQL more collaborative. I think the thing that we’re really trying to lean in on is that there are very analytically minded people throughout the org who need to do things beyond just a couple of basic set filters and slicing and dicing the data. They actually have models that are sitting in Google Sheets that are completely manual and updated with live data. That’s the work that we’re really trying to go after, and we want to show people there is a better and more efficient way to handle data.

What are some of the ways that the design and goals of the product have changed and evolved since you started working on it (Canvas)?

The first thing, and probably the thing that is most exciting to a client when I demo, is that you can open up any of the spreadsheet analyses and it will generate SQL with really nice CTEs for data teams to go in and inspect and edit. Instead of trying to reverse engineer formulas and references on some sheet that you’ve never seen before, you can just open up the hood, see the SQL, and make changes really quickly. So that’s our primary escape hatch right now.
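To make the idea of spreadsheet steps compiling to SQL with CTEs concrete, here is a minimal sketch in Python. It is not how Canvas does it; the Step class, the compile_to_sql function, and the shipments table are all hypothetical names invented for illustration. Each spreadsheet-style step becomes one named CTE that reads from the previous one, which is what makes the generated query easy to inspect and edit by hand.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class Step:
    """One spreadsheet-style operation (hypothetical structure)."""
    name: str           # becomes the CTE name
    select: str         # columns or expressions to keep
    where: str = ""     # optional row filter
    group_by: str = ""  # optional grouping columns


def compile_to_sql(source_table: str, steps: list[Step]) -> str:
    """Chain the steps into CTEs, each reading from the previous one."""
    if not steps:
        return f"SELECT * FROM {source_table}"
    ctes = []
    previous = source_table
    for step in steps:
        body = f"SELECT {step.select} FROM {previous}"
        if step.where:
            body += f" WHERE {step.where}"
        if step.group_by:
            body += f" GROUP BY {step.group_by}"
        ctes.append(f"{step.name} AS (\n    {body}\n)")
        previous = step.name
    return "WITH " + ",\n".join(ctes) + f"\nSELECT * FROM {previous}"


# A filter followed by a pivot-style aggregation, as a user might build in a sheet.
steps = [
    Step("delivered", "*", where="status = 'delivered'"),
    Step("by_region", "region, SUM(amount) AS total_amount", group_by="region"),
]
sql = compile_to_sql("shipments", steps)
print(sql)

# Run the generated query against an in-memory SQLite table to check it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (region TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [("EU", "delivered", 10.0), ("EU", "pending", 5.0),
     ("US", "delivered", 7.5), ("EU", "delivered", 2.5)],
)
rows = sorted(conn.execute(sql).fetchall())
```

Because every step has its own name in the WITH clause, a data team member can read the query top to bottom and change a single step without untangling nested subqueries.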

As far as being able to manage that translation, what are some of the spots where you’ve seen the impedance mismatch?

One of our goals, when we sit down and prioritize functionality for the spreadsheet or functions for the SQL, is to make sure that there is a close one-to-one representation between the two. One of the big decisions that we had to make was, do we actually want to write back to the warehouse? Do we want to write back to the data sources? That’s something that raises a lot of concerns from the CISO. I think a decent number of these mismatches can come up if you’re not careful about prioritizing the right things, which I think we’ve done a fairly good job of to date.

What is the actual workflow for being able to figure out?

It all starts with essentially a data library. We use social proof to show things that are popular amongst finance teams or marketing teams, etc. So it’s part discovery and part showing them “here’s what you’re looking for” and allowing them to search for it easily.

In terms of the iterative process and managing the versions that have been the plague of everybody who’s ever dealt with Excel, how have you approached that problem?

One of the nice architectural decisions that we’ve made, because it’s a multiplayer application, is that you can see multiple cursors moving around in real time, working on the same data as if you were in Figma. Every action is recorded as part of a ledger. So we can control each and every stage of the analysis and allow you to revert back to one of those stages, and eventually let you freeze to a particular stage so that you can just jam those numbers.
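As a rough illustration of the ledger idea, the sketch below records every edit as an action and derives the sheet state by replaying actions up to a chosen version. This is an assumption about how such a ledger could work, not a description of Canvas's implementation; the Action and Ledger classes and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class Action:
    """A single edit made by one collaborator (hypothetical structure)."""
    user: str
    cell: str
    value: object


@dataclass
class Ledger:
    actions: list = field(default_factory=list)

    def record(self, user: str, cell: str, value: object) -> int:
        """Append an action and return the version number it produced."""
        self.actions.append(Action(user, cell, value))
        return len(self.actions)

    def state_at(self, version=None) -> dict:
        """Rebuild the sheet by replaying actions up to the given version."""
        upto = len(self.actions) if version is None else version
        if not 0 <= upto <= len(self.actions):
            raise ValueError(f"unknown version: {version}")
        cells = {}
        for action in self.actions[:upto]:
            cells[action.cell] = action.value
        return cells

    def freeze(self, version: int):
        """Return a read-only snapshot of the sheet at a given version."""
        return MappingProxyType(self.state_at(version))

    def revert_to(self, version: int) -> None:
        """Drop every action recorded after the given version."""
        self.state_at(version)  # validates the version number
        del self.actions[version:]


ledger = Ledger()
v1 = ledger.record("ana", "A1", 100)
v2 = ledger.record("ben", "A2", 250)
v3 = ledger.record("ana", "A1", 999)   # a later edit we want to undo

frozen = ledger.freeze(v2)             # numbers pinned at version 2
ledger.revert_to(v2)                   # live sheet goes back to version 2
current = ledger.state_at()
```

Storing actions rather than snapshots keeps the history cheap to append to, at the cost of replaying it to read a past state. A production system would likely add periodic snapshots and keep reverted actions for auditing rather than deleting them.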

How have you seen the analysis being done in Canvas get fed back into things like dbt, or the other analytical systems that an organization might be using?

We had to ask ourselves: “How do you let someone take this insight and this model, and allow it to graduate back into your systems?” For right now, this is a fairly simple interaction between the data and business teams. The data teams are able to see what the business teams are doing, and they can collaborate and have a conversation around that right there. They can also see which models are being used most often. I think one of the coolest moments that we’re having is watching this conversation happen between data and business teams that before was never really happening.

For people who are looking for a way to make their data self-serve and easier to access for business users, what are the cases where Canvas is the wrong choice?

If you’re looking for hardcore data analysis tools that are going to be very heavy for the data team, we’re not going to be a great fit. I think we complement a lot of tools in that case. We do have a couple of customers that have multiple front ends for different personas. I honestly think that’s how the arc of the modern data stack is bending. It’s bending towards best-of-breed tools for whatever use case. For certain data exploration scenarios, we are the best option, and for some, there are better tailored tools. It's all situational.

As you continue to build and iterate on the product, what are some of the things you have planned for the near to medium future?

I alluded to this earlier a little bit with templates. We have been taking in a lot of the requests that we’re seeing in terms of what business users want to do across these different companies, and really working on programmatically implementing these models so that the data team doesn’t have to build them. It will save time and headache for these data teams, as they won’t have to design a template from scratch.

As a final question, I’d like to get your perspective on what you see as being the biggest gap in the tooling or technology that’s available for data management today?

I think it’s opening up that conversation and that collaboration between the creators of the data and the consumers of that data. For all the advancements that we’ve made on the tooling side, collaborating across that divide is really the biggest gap that I still see. That's something that we’re most excited about with Canvas.

Thanks so much, guys. Have a great one.
