analytics engineering

5 engineering tools every analytics and data engineer needs to know

Are you considering venturing into the world of analytics engineering? Analytics engineers are the newest addition to data teams and sit somewhere between data engineers and data analysts. They are technical, business-savvy, and love to learn. A huge part of an analytics engineer’s role is learning new modern data tools to implement within data stacks.

Data stacks consist of many different tools: one for ingestion, another for transformation, another for orchestration, and another for data visualization.

It can be overwhelming to know which tools to tackle first, especially when just starting out. In most cases, once you learn one type of tool, you can easily adapt and learn the others. The most important thing is to start learning something! However, there are a few key players in each of the data stack layers that are a great place to start. You will find that most digital native companies use these tools, making it easier to apply for an analytics engineer role. 

First, you need to determine the cloud data platform solution you would like to learn. I highly recommend Snowflake because of its ease of use and high growth in the space. And, they offer a great free trial! 

Second, you need to determine how you will ingest data into your cloud data platform. Airbyte is an open-source tool for ingesting data from various platforms to your data platform. 

Third, you have to consider how you will transform your data once it’s in your data platform. dbt takes the cake on this one, making it easy to organize your data models and run them efficiently. 

Next, you need to learn how to deploy these data models to production. Prefect is an easy-to-use orchestration tool that is written in Python. 

Lastly, it is important to visualize all of your findings so that business teams can easily draw insights from your data. ThoughtSpot leverages search and AI to create interactive dashboards and visuals. 

Let’s discuss each of these tools and why they are so important to learn if you want to become an analytics engineer.

1. Snowflake

It is essential to learn how to use a cloud data platform because of how important it is in the modern data stack. It sits at the center of ALL your tools. Every tool is connected to your data platform in some way. It is where data is ingested, transformed, and shared. Not only this, but it is your single source of truth: the one place teams can rely on for accurate data. 

Considering I recently attended the Snowflake Summit in Las Vegas, you could say I’m a huge fan of this tool. I’ve used other cloud data platforms such as Azure and Redshift, but nothing comes close to Snowflake’s ease of use. There is a reason the company has had 101% year-over-year growth! 

Better yet, Snowflake offers a 30-day trial and $400 in free credits for you to experiment and learn how the data platform works. I recommend getting familiar with the commands to create different resources such as databases, schemas, tables, views, and warehouses. Besides running queries, as an analytics engineer, you are expected to create the architecture of your data platform. You can also study how users and roles work and assign different permissions to each role. 
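As a sketch of what that practice might look like, the statements below walk through that resource hierarchy. All of the names (`analytics_wh`, `raw_db`, and so on) are made up for illustration; in practice you would run these in a Snowflake worksheet rather than from Python.

```python
# Hypothetical Snowflake setup statements to practice in a trial account.
# Every object name here is invented for illustration.
setup_statements = [
    # A warehouse is the compute that runs queries; auto-suspend saves credits.
    "CREATE WAREHOUSE IF NOT EXISTS analytics_wh "
    "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60",
    # Databases and schemas organize your objects.
    "CREATE DATABASE IF NOT EXISTS raw_db",
    "CREATE SCHEMA IF NOT EXISTS raw_db.marketing",
    # Tables hold data; views are saved queries on top of them.
    "CREATE TABLE IF NOT EXISTS raw_db.marketing.ad_spend "
    "(spend_date DATE, channel STRING, spend NUMBER(10, 2))",
    "CREATE VIEW IF NOT EXISTS raw_db.marketing.recent_spend AS "
    "SELECT * FROM raw_db.marketing.ad_spend "
    "WHERE spend_date >= DATEADD(day, -30, CURRENT_DATE())",
    # Roles and grants control who can see and do what.
    "CREATE ROLE IF NOT EXISTS analyst",
    "GRANT USAGE ON WAREHOUSE analytics_wh TO ROLE analyst",
]

for stmt in setup_statements:
    print(stmt)
```

Working through one statement of each kind is usually enough to make the warehouse/database/schema/role hierarchy click.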

Besides Snowflake, Databricks and Google BigQuery are other cloud data platforms that offer helpful features and high performance. They, too, have free trials for you to hone your skills.

2. Airbyte

Airbyte is the leading open-source data connector. It allows you to connect to external APIs in order to ingest data into your data platform. Because it is open-source, it requires a bit more manual effort to properly set up your ingestion streams. However, it really teaches you to read through an external API’s documentation in order to properly configure your account settings. As an engineer, this is a great skill to have. You never know when you will be required to work with external APIs. 

Airbyte offers a wide range of data connectors, including Facebook, Google Ads, PostgreSQL, Shopify, and SurveyMonkey. Learning to connect to these major platforms will be key in any analytics engineering role. Because Airbyte is open-source, you can configure connections to these platforms for an affordable price. Perfect for someone who is just beginning to gain engineering skills!

If you work for a company that has an enterprise budget, I also highly recommend checking out Fivetran, Matillion, and Stitch for data ingestion.

3. dbt

Nobody does data transformation like dbt. This open-source tool allows analytics engineers to write, test, and document their SQL data models. Fun fact: they actually coined the term “analytics engineering”! dbt Cloud is such a powerful tool because it decreases the amount of repetitive code you write, allowing for faster and cleaner data models. It is almost impossible to meet an analytics engineer who doesn’t use dbt every day.

dbt allows you to organize your data models into different SQL files that all reference one another. Within these SQL files you can utilize different features such as macros, packages, and tests to make your code cleaner and easier to read. Macros act as functions that can be applied anywhere within your SQL code, automating code that may get redundant throughout your models. Tests allow you to apply data quality checks to columns and data models, testing for certain conditions while they run. 
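To get a feel for what a macro does, here is a conceptual sketch in plain Python. Real dbt macros are written in Jinja, and the macro name below is invented for illustration; the point is that a macro is a function that expands into SQL, so the logic lives in one place.

```python
# Conceptual sketch of a dbt-style macro: a function that expands into SQL.
# In dbt this would be a Jinja macro, called as {{ cents_to_dollars('amount_cents') }}.
def cents_to_dollars(column: str, decimals: int = 2) -> str:
    """Expand into the SQL that converts a cents column to dollars."""
    return f"round({column} / 100.0, {decimals})"

# Use the "macro" in two models without repeating the conversion logic.
orders_model = (
    f"select order_id, {cents_to_dollars('amount_cents')} as amount "
    "from raw_orders"
)
refunds_model = (
    f"select refund_id, {cents_to_dollars('refund_cents')} as refund "
    "from raw_refunds"
)

print(orders_model)
```

If the rounding rule ever changes, you edit one function and every model that calls it picks up the fix, which is exactly the redundancy-killing benefit macros bring to dbt projects.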

Because dbt is built on SQL, itself a core analytics engineering skill, mastering the two together is a great step toward becoming an analytics engineer. If you can master various SQL functions such as aggregates and window functions, in conjunction with dbt, you will be good to go. dbt also uses a templating language called Jinja. As you familiarize yourself with macros, tests, and the overall syntax of dbt, you will slowly start to understand Jinja. I highly recommend checking out their courses, where you can learn all the ins and outs of creating a dbt project.
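Those SQL skills are easy to practice locally before you ever touch a warehouse. Here is a small, self-contained example using Python’s built-in sqlite3 module and made-up data that combines an aggregate with a window function, the two function families mentioned above:

```python
import sqlite3

# In-memory database with a tiny, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ana", 30.0), ("ben", 20.0)],
)

# An aggregate (SUM ... GROUP BY) combined with a window function
# (RANK() OVER ...), the kind of SQL that fills dbt models.
rows = conn.execute(
    """
    SELECT customer,
           SUM(amount) AS total,
           RANK() OVER (ORDER BY SUM(amount) DESC) AS spend_rank
    FROM orders
    GROUP BY customer
    ORDER BY spend_rank
    """
).fetchall()

print(rows)  # [('ana', 40.0, 1), ('ben', 20.0, 2)]
```

The same query shape (aggregate per group, then rank the groups) transfers directly to Snowflake, where only the data volumes change.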

4. Prefect

Prefect is rising in popularity among data pipeline tools. If you aren’t familiar with it, it is an open-source Python framework that uses “tasks” to deploy and automate your data pipelines. Because it is written in Python, it is easier to use than most tools. While you don’t necessarily need to know Python to become an analytics engineer, it comes in handy when dealing with orchestration. Python is also far more readable and easier to learn than other languages like JavaScript. 

Prefect is the best pipeline solution for analytics engineers because of how easy it is to maintain. If you work on a small team and have multiple responsibilities, you don’t want to spend all your time building the infrastructure to support your data. You want a solution that does the heavy lifting for you. Prefect is easy to set up compared to other tools on the market, as long as you have some basic Python knowledge. It also integrates with all of the tools I’ve already mentioned, such as Snowflake, Airbyte, and dbt. 
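To show the task-and-flow idea in the plainest possible terms, here is a stdlib-only sketch. Prefect’s real `@task` and `@flow` decorators add retries, logging, scheduling, and a UI; this toy version only mirrors the shape, and every function name in it is made up.

```python
# A stdlib-only sketch of the task/flow idea behind orchestrators like
# Prefect. This is NOT the Prefect API, just the concept: decorated
# functions become tracked tasks, and a flow wires them together.
import functools

run_log = []

def task(fn):
    """Record each task run, the way an orchestrator tracks task state."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        run_log.append((fn.__name__, "completed"))
        return result
    return wrapper

@task
def extract():
    return [1, 2, 3]               # pretend an ingestion tool landed this data

@task
def transform(rows):
    return [r * 10 for r in rows]  # pretend this step is a dbt run

@task
def load(rows):
    return f"loaded {len(rows)} rows"  # pretend this writes to the warehouse

def pipeline():
    """The 'flow': plain Python that wires the tasks together."""
    return load(transform(extract()))

print(pipeline())  # loaded 3 rows
print(run_log)
```

Because a flow is just Python calling Python, an analytics engineer can read, test, and debug it with ordinary tooling, which is a big part of why Python-native orchestrators are easy to maintain.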

With Prefect, YOU still own the code, making it secure for all types of data teams. However, they maintain the infrastructure used to run your data pipelines. The best of both worlds! Prefect is a great orchestration tool to begin with because of the relatively small learning curve. Then, if you feel the need, you can work your way up to learning other popular tools like Airflow and Dagster.

5. ThoughtSpot and/or Looker

Lastly, we have our visualization tool, the key player when it comes to translating data into insights. As an analytics engineer who transitioned from data engineering, I personally don’t have too much experience with business intelligence tools. In my current role, I own the data pipeline from ingestion to orchestration, but the data analyst on our team owns visualizing the data. After reaching out to my peers and other analytics experts, I learned two of the most popular visualization tools in the modern data stack are ThoughtSpot and Looker, each with a mixed bag of strengths and weaknesses.

ThoughtSpot integrates perfectly with the cloud ecosystem and allows you to create interactive Liveboards and visuals. It enables self-service analytics, which is key for creating synchronicity between data and business teams. Self-service platforms like ThoughtSpot help analytics engineers and analysts eliminate ad-hoc requests so they can focus on more complex initiatives. 

Like Prefect, ThoughtSpot is great because of how easily it integrates into the rest of your data stack. It connects directly to your dbt models and Snowflake’s data platform. The ThoughtSpot Modeling Language (TML) allows analytics engineers to package up data models, queries, and Liveboards into open source code to create reusable apps (see Mimoune Djouallah’s first look). 

Looker was an early visualization tool for cloud data platforms, and one that analytics engineers like for its modeling language, LookML. However, since Google’s acquisition of Looker in 2019, some customers feel innovation has slowed. It also lacks integration with dbt, which is a major drawback for analytics engineers. While Tableau is one of the most popular visualization tools, it too lacks dbt integration, has desktop dependencies, and often requires replicating data into its own in-memory engine for more sophisticated analysis.

Only ThoughtSpot combines intuitive user experience, natural language search, automated insights, and performance all on live data stored in your cloud data platform. 

That said, I think ThoughtSpot is important to learn because of the best practices baked into the tool. As an analytics engineer, it is important to ensure your models are always using the highest quality data as well as following software engineering best practices. ThoughtSpot helps to auto-analyze your data for quality issues as well as keep your code portable, reusable, and version controlled. When you focus on these best practices from the beginning of your career, they become second nature as you move into more complex problems. 

I plan to learn ThoughtSpot more in the upcoming months so that I can take advantage of the features it offers. To me, it is one step more complex and powerful than your typical business intelligence tool. It will not only help me improve my visualization skills and engineering best practices, but it will help introduce me to automation and machine learning in a business context. 

Build your skills across the modern data stack

As an analytics engineer, it is important to familiarize yourself with at least one tool from each of the main layers of a modern data stack. At the center of your data stack, Snowflake brings one source of truth to your business and is a good introduction to cloud data platforms. At the data ingestion layer, Airbyte teaches you how to use different external platforms and read key documentation. At the data transformation layer, dbt allows you to write complex SQL transformations, reducing redundancy in your code. As you orchestrate your data to the next layer, Prefect creates a time and resource-saving pipeline that analytics engineers can easily manage. And lastly, at the experience layer, ThoughtSpot enables self-service analytics through data quality checks and interactive data visualizations.

Together, these tools will teach you a lot of the skills necessary to become an analytics engineer. They will also give you experience that you can easily apply to learning any other tool in the modern data stack space. Happy learning!