June 20, 2023
·
Ryan Buick

Fixing the last mile of analytics with David Jayatillake, CEO at Delphi Labs

Welcome to the Canvas Podcast, where we bring data and business leaders together to talk about how to make data easier for everyone. Today I'm super excited to have David Jayatillake on the show, who is the CEO and Co-founder of Delphi Labs.

We got in touch because you recently posted about needing to talk about Excel from a data perspective, and I wanted to have you on to talk more about the problems you're seeing and the future of spreadsheets and data. But before we do, do you want to tell us about yourself?

Sure, yeah. I'm David. I've been in data for about 13 years. More recently, I've been in some of the modern data stack communities you may be familiar with, including the dbt community and Locally Optimistic. I have my own Substack, which I write every week and I'm the co-founder of Delphi Labs, which is an AI-powered, natural language interface for data using semantic layers.

How did you get your start in data?

So I initially started in Big Four accounting, actually. I started at EY, which used to be called Ernst & Young a few years ago.

And I didn't really enjoy doing audits - I don't think anyone really does, not even the auditors. But the thing that I really enjoyed about my internship there was actually the analytical work. So being young and naive, I just thought, let's try and get a job as an analyst.

I was 22 or 23 and didn't have much to lose. I ended up at a company called Ocado, which turned out to be a very cool company and is famous in the UK. Some people in the US know it, but essentially Ocado is like the first robotic warehouse for groceries.

They've got this amazing high-tech warehouse full of robots that pick stuff to put in people's online grocery orders. They’ve white-labeled their system to US grocery companies like Kroger, so even though you may not have heard of them, you may well be being served by them.

So that's where I picked up SQL. I took this role as an analyst without really knowing much about databases beyond what I learned in high school. And so suddenly I'm learning SQL, Excel, and VBA because this is like pre-Tableau. That was the first place I ever saw Tableau, where we started trialing it towards the end of my time there.

From there I knew this was what I really wanted to do. I moved to a few other companies, spent a great deal of time in payments with a company called Worldpay, which is now FIS and is New York Stock Exchange listed. I had lots of different roles there, everything from analytics engineering, data engineering, data analytics, and building commercial analytics systems.

So I had a really good time there. That's where I really saw that I could push the needle forward with data in a company. From there I moved back into mainstream data, leading data teams in e-commerce, FinTech, credit, and a few other industries. And then finally I moved into startup land as of last year, and now this year with Delphi.

Tell us about Delphi.

Yeah, so Delphi uses the large language models of the day to answer questions with data. You may have seen that there are a number of companies out there doing what I call text-to-SQL. Essentially you write the question in, it generates SQL and runs it for you. We don't do that. We have a contrarian view on that. We fundamentally don't believe that method will work for the enterprise.

So what we do is work with the semantic layers of the day: dbt, Looker, Lightdash, Metabase, Cube, and very soon AtScale, and integrate with them via REST API. So what we are generating is not a SQL query, it's a REST API query to that semantic layer.

Fundamentally this provides a better response because, when something generates SQL from a question, it doesn't know what's in that world, right? It just has a guess, and that's why it comes out with wrong answers really often.

Whereas when you use a semantic layer, you have this whole world. It's like a drawer you can open, full of objects. Like oh, this is a customer, this is an order, this is revenue. And if someone asks a question and there's nothing particularly similar to it in the semantic layer, you can say: we can't answer that question. Sorry, this is what we can answer. And these are some previous questions that are similar-ish to what you've asked.

And so it just provides a lot more power in the workflow, and that's why we can offer a much more comprehensive and safe workflow for business users than what you can do with text-to-SQL.
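As a rough illustration of the idea described above, here is a minimal Python sketch of answering only from what a semantic layer exposes and declining otherwise. Everything in it is an assumption made for illustration: the metric and dimension names, the keyword matching (which stands in for whatever a language model would do), and the shape of the request body. It is not Delphi's implementation or any vendor's actual API.

```python
from dataclasses import dataclass

# Hypothetical catalogue of what the semantic layer exposes (the "drawer").
METRICS = {"revenue", "orders"}
DIMENSIONS = {"month", "customer", "country"}


@dataclass
class SemanticQuery:
    metrics: list
    dimensions: list


def plan_query(question: str):
    """Map a question onto known metrics and dimensions, or return None."""
    words = {w.strip("?,.").lower() for w in question.split()}
    metrics = sorted(m for m in METRICS if m in words)
    dimensions = sorted(d for d in DIMENSIONS if d in words)
    if not metrics:
        # Nothing in the catalogue matches, so decline instead of guessing.
        return None
    return SemanticQuery(metrics=metrics, dimensions=dimensions)


def to_request_body(query: SemanticQuery) -> dict:
    """Build a JSON-style body for a hypothetical semantic-layer endpoint."""
    return {"metrics": query.metrics, "group_by": query.dimensions}


answerable = plan_query("What was revenue by month?")
unanswerable = plan_query("How many unicorns do we have?")
```

Here `answerable` can be turned into a request with `to_request_body`, while `unanswerable` is `None`, which is the point at which a tool could reply with what it can answer instead.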

Let's start with your post on Excel. So what led you to start that?

Yeah, I'd been thinking about writing that post for a while. The reason was that I was doing a bit of consulting a few months ago, and I went to visit my client and sat in on one of their board meetings. Afterwards, we were talking about the data project we were running, and one of the executive directors came up to me and said, oh yeah, I've really enjoyed using the dashboard you've built in Lightdash.

That was something the team I'd put in there had done. But he's like, oh, but look what I'm doing. I'm downloading the data into Excel and exploring it and doing some other stuff that way.

There was just that moment, and I knew this from years working in data, where fundamentally there are some things that people will always want to do in an exploratory way. Regardless of how good your data model is or how easy to use your data stack is, some people will always want to do their own exploration with data, and it has to be really tangible.

They want to see what happens to each number as they process it. It can't be wrapped away behind some kind of black box, and even a SQL query is enough of a black box for some of these people.

And so that made me think about how I've spent a lot of my career trying to stop people using Excel. And I've been one of the worst offenders, building crazy forecast models with millions of array formulas that could only run on a workstation. I've done all this stuff.

On that basis, I feel qualified to say that it's a bad idea to use it. But the truth is, when someone like me is using it for that, I'm not doing exploratory work in it. I'm building a production system, and that definitely should be stamped out. But the exploratory use case, I don't think that's ever going away.

For me, it's like having your math workbook at school: you're solving a problem and you're exploring the numbers. That's the experience people have when they're just exploring data in Excel, and I don't think that's going away, ever, really.

It's so easy to use and it's so available. So many people just have it sitting there on their local machine. And it just works. You just open it, just like you open Outlook or some other Microsoft app and you can start playing with it.

It's often even connected to some of your other business data sources, right? Through Microsoft's integrated stack, you can access Dynamics, SQL Server, a data warehouse database. You can access loads of stuff without doing any ETL.

When you are doing that exploration and you build abstraction on top of abstraction, you can build something really complicated, but you can still get a lot of clarity in what you've built because you can trace its roots cell by cell and really understand how it works.

Its learning curve is really good. You can start in Excel knowing pretty much nothing. Oh, I'm just gonna type a number in, I'm going to add this number to another. It's like learning math, right? You can start from nothing and just get some basic skills.

And for some people, all they need is to sum some numbers up, or multiply some numbers, and that's good enough. It's almost like a calculator that can handle multiple numbers at the same time, and that's all they want. And that's fine. And I think that's partly why the learning curve is so shallow to start with.

Whereas with SQL, even to get a single basic query running, you need to understand what it's doing, and SELECT, FROM, and WHERE, in order to get just the basic result. And that in itself, for most people in most businesses, is quite a high level of skill.
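To make that contrast concrete, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table and its values are invented for illustration. Even this simplest query requires knowing the table name, the column names, and how `SELECT`, `FROM`, and `WHERE` fit together, whereas the spreadsheet equivalent is typing two numbers and adding them.

```python
import sqlite3

# In-memory database with a made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2023-01", 120.0), (2, "2023-01", 80.0), (3, "2023-02", 50.0)],
)

# The basic query: which columns, from which table, filtered how.
rows = conn.execute(
    "SELECT id, amount FROM orders WHERE month = '2023-01' ORDER BY id"
).fetchall()

# The "calculator" step a spreadsheet user would do by hand.
january_total = sum(amount for _, amount in rows)
conn.close()
```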

What are some of the key reasons that you think data teams are trying to get business stakeholders out of Excel still?

When it's exploratory, when it's a scratchpad, they're never gonna get them out of there. And I think we should stop trying to get them out of there for that reason. And maybe offering them tools that look like Excel, but work in a more scalable, version-controlled way, is a good starting point. But when it comes to production use cases, like you said, where people at Flexport were joining data together in Excel because they didn't know SQL and they had access, we definitely need to be getting them out of there.

Semantic layers are a great starting point for that because they abstract away all of the complexity in the data, like what this UUID column means. Stakeholders just need to know what the business terms are. And those stakeholders know business terms, they know business.

That's what they're good at. So semantic layers are definitely important. I'm not precious about using LLMs to help generate semantic layers. I think that obviously makes sense over time. At the same time, I think humans need to govern the meaning of their data. For a machine to hold the meaning of your data, I think that's a step too far at this point.

And the other reason, I think, why people sometimes use Excel is that they want an answer quickly. What was my revenue by month last year? And so they pull the data from some place that's not necessarily safe, and it's probably a more complicated dataset than they realize. Then they use it in a way that may not be the best, and they come up with an answer that isn't consistent with anything else.

This is the use case that Michael and I are trying to solve at Delphi. Let's give them a way to find safe answers quickly that they can then take to Excel if they want.

So we do export to CSV right now, and we'll do export to Excel directly in the future. They need to be able to get safe answers quickly, and that's another reason why data people don't want them in Excel: they've seen so many unsafe answers come out of them using Excel in the past.

And all the versions are such a problem. You send out an Excel file in an email and then suddenly you've got 10 baby Excel files, or children Excel files, as a result. They're all named the same with different stuff in them, and it's just a nightmare.

What are some guardrails that you've seen that work at a startup? How do you ensure business teams have the flexibility to do what they want, but data teams can sleep at night knowing that these business teams are working off of the right data?

When you say startup, I guess it depends on the size. So if you are very early, I think it's very hard to get there. In your best case, you're gonna have one data person, they're probably a data engineer. And that could be good, right? Because data engineers are happy to build pipelines and maybe build a semantic model and then they're not going to do any analysis.

They'll just tell you how to use it. And that could work quite nicely. The problem there is the interface. With a small startup, you can probably say to those 12 people, or however many of them there are: oh, you need to go to Looker or Metabase, which are quite common for those startups, and this is how you ask a question or pull the data. And they'll pretty much manage. Then I think as you scale up a bit, there are too many stakeholders to educate in that way.

This is like what we're trying to solve with Delphi. All of those stakeholders are not going to want to learn that BI tool. Not all of them are gonna have the time to use it and not all of them will even be capable, so that's why we want to give them a way to interact with that semantic layer that gives them safe answers without having to know how to use a tool.

It’s not just another app that they need to learn how to use. So that's partly why we're doing what we're doing, because actually, I think that's broken. I don’t think that the last mile is solved.

Any advice to data teams on how to effectively work with their business stakeholders on this problem?

Yeah, I think it's also worth trying to find the balance between what you're going to semantically model and what's moving too fast for you to bother with.

When you're running experiments, that data changes on a daily basis because some software engineer has added a field or removed a field or completely changed the structure, because they're experimenting with what works. And if that happens, that's not the time to try and build something on top of it. There's no stability there.

So if you are rewriting the semantic layer as often as you are querying it, it's obviously the wrong idea. You'd rather just write some Python or write some SQL to do that manually. And it's an experiment, right? It doesn't live for that long. It doesn't need to persist.

Once that becomes like, oh, this is a feature we put into production, we need to monitor it over a longer time. There are more eyes on it, we need reliable numbers on it, then that's when you should have that rigor, and have the modeling done well, the semantics done well, and the documentation done well, right?

You want to have as little as possible in your semantic layer because it's expensive and difficult to maintain, and if you have too much in there, people get confused and they don't know how to use it because it's too confusing to know which thing to use.

Do you have any thoughts on business teams contributing to or collaborating with data teams on data modeling?

When I started that consulting project I mentioned, the way I started was to go to those business users and ask what are the things that you like knowing? Like roughly what someone in Marketing in an e-commerce company, someone in Finance, or someone in Product does in an e-commerce company.

I went to them and said, what are the things that you measure and what do you need to measure them by? And what are your SLAs, like how often do you need that data refreshed? And you get very different answers from team to team.

Product might want very real-time data. Marketing may want a longer history than Product. Product may want a lot more detail. Finance probably doesn’t want real-time. They'd often be happy with daily refreshes, or sometimes even less than that. And then you find out what the metrics are that matter to them. And you'll find a lot of overlap.

And I think one thing there is finding the single definition of something. Even if they have multiple things that they want to filter or adopt, you want to start getting them to agree on the same metric and the same meaning for a metric. Otherwise, you create this problem where Product comes out with one number, Finance has another number, and Marketing has a third. And all that happens at any meeting is people argue about their data and who's right. And they don't get to actually do any business.

We've been there, and oftentimes those metrics where they're having different definitions are directionally exactly the same, so they could just cope with having the one that the others use. And trying to get them to make those compromises is really important to save yourself a lot of pain and also save your company a lot of time.
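One way to picture a single shared definition is a metric that lives in one place and accepts filters, so each team slices the same number instead of computing its own. The sketch below is only an illustration: the order records, the `channel` field, and the rule that refunded orders are excluded are all hypothetical.

```python
# Made-up order records for illustration.
ORDERS = [
    {"channel": "paid", "amount": 100.0, "refunded": False},
    {"channel": "organic", "amount": 40.0, "refunded": True},
    {"channel": "organic", "amount": 60.0, "refunded": False},
]


def revenue(orders, **filters):
    """The one agreed definition: sum of non-refunded order amounts.

    Teams may pass different filters, but the meaning of "revenue"
    never changes between them.
    """
    selected = [
        o for o in orders
        if all(o.get(key) == value for key, value in filters.items())
    ]
    return sum(o["amount"] for o in selected if not o["refunded"])


finance_total = revenue(ORDERS)
marketing_paid = revenue(ORDERS, channel="paid")
marketing_organic = revenue(ORDERS, channel="organic")
```

Because every slice comes from the same function, the per-channel numbers add up to the total, which is what stops the meeting from turning into an argument about whose number is right.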

What are the biggest problems left in the modern data stack and any predictions for what's next?

Yeah, it's a really interesting juncture. I believe that the last mile of questions to answers is a problem that I'm trying to solve at Delphi. But there's more than that out there, obviously.

I was previously at Metaplane and data quality is still a problem. I think we now have better tools than we've ever had before. And the discussion around data contracts, while some people are a bit annoyed with the term, it's still a good progression because the progression is, someone needs to own the data that's produced and it can't be the data team that's a downstream consumer that has no say over how it's produced, right?

There has to be some kind of interface between them, whether that's an interface at a technical level or an interface at a human-like process level to ensure that what data is collected is understood, and is an agreed-upon specification with SLAs and understood meaning.
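At the technical end of that spectrum, such an interface could be as simple as a written specification that records are checked against. The following is a minimal sketch under stated assumptions: the `DataContract` and `FieldSpec` classes, the field names, the owner, and the staleness figure are all invented for illustration and do not correspond to any particular data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    meaning: str  # the agreed, human-governed definition


@dataclass(frozen=True)
class DataContract:
    owner: str                 # the producing team that owns the data
    max_staleness_hours: int   # the agreed SLA for freshness
    fields: tuple

    def violations(self, record: dict) -> list:
        """Return a list of ways a record breaks the agreed specification."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                problems.append(f"missing field: {spec.name}")
            elif not isinstance(record[spec.name], spec.type_):
                problems.append(f"wrong type for {spec.name}")
        return problems


# Hypothetical contract for an order event.
ORDER_CONTRACT = DataContract(
    owner="checkout-team",
    max_staleness_hours=24,
    fields=(
        FieldSpec("order_id", str, "Unique identifier of the order"),
        FieldSpec("amount", float, "Order value in account currency"),
    ),
)

ok = ORDER_CONTRACT.violations({"order_id": "a1", "amount": 9.5})
broken = ORDER_CONTRACT.violations({"order_id": "a1"})
```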

And that is a good step forward, which we haven't had, in the whole time I've worked in data. So that is like the single biggest step forward, I think, in quality that is happening. I think streaming is also becoming easy.

I've built stacks that kind of could do streaming if I wanted to, but just for the sake of saving credits, I didn't make them that real-time. Because at some time, real-time becomes a bit batchy when you have to start looking backward.

I think streaming will happen. The thing that I think is really interesting is obviously we are using AI and LLMs in our product, but I think that you have to reimagine the whole thing and what happens if there was an LLM introduced here. What could it do, how could it improve either the developer experience or the actual product itself?

And you can start thinking about it even at the ELT level: could you have an AI method of generating connectors? Almost certainly is the answer, right? They're very generic in some ways. And you start thinking about every little piece, every little action that happens in the data pipelines and the developer workflow today, and what could be automated, what could be enhanced, what could be improved.

Anything you'd like to plug? Where can people get in touch with you?

I write every week on davidsj.substack.com. Also, you can check out delphihq.com. I'm on LinkedIn and Twitter, and I have a Mastodon server that a fair few data people are on as well, called data-folks.masto.host. I'm on the Locally Optimistic and dbt Slacks as well, so I'm pretty easy to get ahold of.

© 2024 Infinite Canvas Inc.