How much does a data warehouse cost?
Why do you need a data warehouse?
You’re getting traction, building your team, and collecting data across your tools. So far, you’ve been able to make do on metrics by exporting data out of your SaaS tools into Google Sheets and shoulder-tapping engineers for SQL favors.
Soon though, you start to notice some troubling signs. When you ask a few folks on your team for metrics, they come up with different numbers. Worse, they come back with conflicting narratives when you ask them why the metrics are up or down. You might also hear that your engineers are getting frustrated from hours spent maintaining custom scripts or from having to help business teams with data exports.
These are typically signs that you should invest in data experts and tools to help you get all of your data in one place, define important business logic, and let your team visualize and explore data to understand how you’re performing and make informed decisions.
In this post, we’ll break down the options you have to get data-driven and the cost of those options.
What is a data stack?
Let’s start with what a “modern data stack” actually is. It typically consists of the following parts:
- A data warehouse to store your data (Snowflake, BigQuery, or Redshift). This differs from your database, which typically contains the product data you collect in your application.
- An ELT (Extract, Load, Transform) tool to send data from your SaaS tools and your database to the warehouse (Fivetran, Stitch, or Airbyte)
- A frontend to visualize and explore the data to make everyday decisions (Tableau, Looker, or Mode)
Next, we'll explain how much each part of the stack should cost.
Data warehouse costs
Snowflake
Storage costs $25-$40 a month per terabyte, which is negligible for most early-stage data volumes.
For compute, you pay by the machine hour: one credit buys one hour of running a compute instance, and one credit costs $2. If you keep one instance of the smallest size running constantly, that’s $48 a day, or roughly $1,450 per month. Auto-suspending the instance brings this down (to about $24 a day if you only use it for half the day), at the cost of an occasional cold start.
Let’s call this $1,500 for entry-level.
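The credit arithmetic above can be sketched in a few lines of Python. This is a rough estimate only: the function name and the 30-day month are assumptions made for illustration, and the $2 credit price and one credit per hour are the figures quoted in this post.

```python
def monthly_compute_cost(credit_price, credits_per_hour, hours_per_day, days=30):
    """Estimate monthly compute spend for one instance billed per credit-hour.

    Assumes a 30-day month by default (an illustrative simplification).
    """
    return credit_price * credits_per_hour * hours_per_day * days


# Smallest instance, running around the clock
always_on = monthly_compute_cost(credit_price=2.0, credits_per_hour=1, hours_per_day=24)

# Same instance, auto-suspended for half of each day
half_day = monthly_compute_cost(credit_price=2.0, credits_per_hour=1, hours_per_day=12)

print(always_on)  # 1440.0
print(half_day)   # 720.0
```

Swapping in your own expected hours of usage per day gives a quick sense of how much auto-suspend could save.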
BigQuery
Again, storage is negligible ($0.02 per GB per month).
Compute is charged on demand: instead of paying per instance hour, you pay $5 per terabyte of data processed. This means that if you only query the warehouse occasionally, BigQuery will be the cheapest option.
Using this lightly, $100 per month could be reasonable.
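To make the per-terabyte model concrete, here is a minimal sketch that uses the $5 per terabyte rate quoted above. The function name is hypothetical, and the sketch ignores storage and any free tier.

```python
def on_demand_query_cost(tb_processed, price_per_tb=5.0):
    """Monthly query cost when billed per terabyte of data processed."""
    return tb_processed * price_per_tb


# Scanning 20 TB in a month lands at the "light usage" figure
print(on_demand_query_cost(20))  # 100.0
```

In other words, a $100 monthly bill corresponds to roughly 20 TB of data scanned by queries at this rate.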
Redshift
Storage is, again, negligible ($0.024 per GB per month).
You can provision the cheapest reserved instance for $0.25 per hour, or $360 per month.
Redshift now also offers on-demand pricing (similar to BigQuery) at $0.36 per RPU hour, so $100 per month could be reasonable for light use.
To summarize:
With Snowflake, you’re paying ~$1,500 per month for entry-level, but this level will scale with you for a while (until your query volume overwhelms a single compute node).
With BigQuery, you pay per terabyte of data processed in queries. This makes it the cheapest option for light usage, potentially as low as $100 a month, but costs grow linearly with your usage.
With Redshift, $360 per month gets you a single instance running non-stop (comparable to Snowflake). It also offers on-demand usage comparable to BigQuery.
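One way to compare the flat and per-terabyte models is to ask how much data you would have to scan each month before per-terabyte billing costs as much as an always-on instance. The sketch below is a back-of-the-envelope calculation using the monthly figures quoted in this post. The function name is hypothetical, and it assumes the always-on instance could handle the same workload.

```python
def break_even_tb(flat_monthly_cost, price_per_tb=5.0):
    """Terabytes scanned per month at which per-TB billing equals a flat bill."""
    return flat_monthly_cost / price_per_tb


print(break_even_tb(1500))  # 300.0 (vs. the ~$1,500 Snowflake entry level)
print(break_even_tb(360))   # 72.0 (vs. the $360 Redshift instance)
```

Below those volumes, pay-per-query pricing is the cheaper choice on paper. Above them, a fixed instance starts to look better.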
ELT tool costs
With Fivetran, you’re paying for the number of monthly active rows (MAR) you send to the warehouse. These are rows that are inserted, updated, or deleted in any of your connectors. This includes initial syncs and resyncs, and each row is only counted once per month. Every connector you set up comes with a free historical sync and 14 days of trial usage. They also offer free re-syncs.
The benefit here is that your cost per row declines automatically as you sync more unique data. They also just released a free plan, so you can get started with a few users and connectors.
Check out their pricing page and play around with their pricing calculator to get a better sense of what your costs will be as your MAR scales.
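The "counted once per month" rule is easier to see with a small example. The sketch below only illustrates the counting idea described above and is not Fivetran's actual implementation. The event format (month, table, primary key) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def monthly_active_rows(change_events):
    """Count distinct changed rows per month.

    change_events: iterable of (month, table, primary_key) tuples, one per
    insert, update, or delete. A row changed many times in the same month
    is counted only once for that month.
    """
    rows_by_month = defaultdict(set)
    for month, table, primary_key in change_events:
        rows_by_month[month].add((table, primary_key))
    return {month: len(rows) for month, rows in rows_by_month.items()}


# Hypothetical change log: user 1 is updated three times in January
events = [
    ("2023-01", "users", 1),
    ("2023-01", "users", 1),
    ("2023-01", "users", 1),
    ("2023-01", "orders", 7),
    ("2023-02", "users", 1),
]

print(monthly_active_rows(events))  # {'2023-01': 2, '2023-02': 1}
```

Five change events produce only three billable rows here, which is why frequently updated tables cost less than their raw change volume suggests.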
Like Fivetran, Stitch charges per million rows per month and discounts your unit cost as you update more rows. They don’t have a free plan (only a free trial), but their standard plan includes up to 10 sources, which should be good enough for most early-stage startups. They also have a pricing calculator you can experiment with on their pricing page.
Airbyte is the big open-source player in this space, so you can deploy their free version if you’re up for more work.
Extract and load tools typically charge by the number of records/row changes made monthly. Early-stage B2B SaaS startups with smaller amounts of customer data should expect to pay $500-2,000 per month, while eCommerce startups with tons of customer data can expect to pay $1,000-3,000 per month.
BI tool costs
BI tools typically charge per user. Viewer licenses usually cost a few dollars per month, while licenses with more advanced permissions cost 5-10x that.
For example, Tableau charges $70 per month for Creators who can connect data, build charts, and publish dashboards, $42 per month for Explorers who can edit charts, and $15 per month for Viewers who can access existing dashboards.
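As a rough illustration of per-seat pricing, the sketch below totals a monthly bill from the Tableau prices quoted above. The team composition is hypothetical, and real quotes may differ (for example, with annual billing or minimum seat counts).

```python
# Per-seat monthly prices as quoted in this post
SEAT_PRICES = {"creator": 70, "explorer": 42, "viewer": 15}


def monthly_bi_cost(seats):
    """Total monthly cost for a dict mapping role name to number of seats."""
    return sum(SEAT_PRICES[role] * count for role, count in seats.items())


# Hypothetical 15-person team
team = {"creator": 2, "explorer": 3, "viewer": 10}
print(monthly_bi_cost(team))  # 416
```

Most of the bill usually comes from how many people need editing rights, so it is worth deciding early who really needs more than a viewer seat.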
The hidden cost
Tools can be expensive, but the real cost impact here is that you’ll need someone technical to set up these tools, manage your pipelines, and optimize storage and spending. They must also help you understand the data, translate human-language definitions into SQL, and answer ad-hoc queries.
You want to keep your engineers focused on building your product rather than managing your data stack. So if you don’t have a good owner for these systems and tasks, you should consider bringing in data consultants or hiring a team to help.
Hiring data consultants
Consultants are a good option if you haven’t built a data stack before. They can diagnose your needs and recommend options based on the complexity of your situation. They’re also helpful with tasks like modeling your business logic. If you haven’t done the work to define your metrics and business logic, you’ll face an uphill battle.
Data consultants typically charge a few hundred dollars per hour. Short-term projects, like a one-time data analysis or a small data migration, can cost a few thousand dollars. Longer-term projects, such as a complete data strategy overhaul or a large-scale data infrastructure implementation, can cost tens of thousands or more.
The obvious downside is that these consultants charge hourly, so your costs will quickly increase. You’ll also have to do a significant amount of knowledge sharing with your consultants, which will take time and result in a few iterations per model or project.
If you want a long-term approach, consider making your first data hire instead.
Hiring a data team
Data teams consist of a few different roles, and you’ll need to think about what short-term and long-term problems you want to solve before hitting the recruiting trail.
Responsible for cleaning and organizing data and performing analysis and reporting, data analysts can cost you $80,000 to $150,000 per year, depending on their experience level and location.
Responsible for building and maintaining the infrastructure and pipelines that support data analysis, data engineers often think and act more like software engineers with solid technical backgrounds. Their salaries can range from $100,000 to $200,000 per year.
Of course, hiring takes time, and hiring a bad fit can set your team and your goals back for months. But if you need a full-time, long-term owner, hiring your first data generalist may be a good idea.
Introducing Canvas, the full-stack analytics tool for startups
As the founder of Canvas, I spend my days talking to teams struggling to get their data house in order and looking for a better way to make fast, accurate decisions.
At Canvas, we believe building a few dashboards shouldn’t require a small army of experts and tools. We help startups connect their SaaS tools and databases in minutes without code, explore live data with their spreadsheet skills, and share beautiful, interactive dashboards with their teams and customers.
Instead of needing to buy, build, and manage your data stack, we provision and orchestrate world-class tools like Snowflake, Fivetran, and dbt for you, alongside an interface that lets your teams use their SQL or spreadsheet skills to answer questions.
Canvas connects to >150 databases and SaaS tools like Postgres, Segment, Stripe, Quickbooks, Netsuite, Salesforce, Hubspot, and Zendesk. We also provide model templates for common startup use cases (e.g., ARR cohorts, magic number, LTV, P&L, ad spend, user retention) so you can understand your startup without spending cycles on data modeling.
If you have complex business logic, our team of experts can also model your custom business logic.
If you’re interested in getting a data strategy in days, not months, for a fraction of the cost of hiring consultants or a data team, email me at email@example.com or check out our free trial here.