
Data pipeline QA | Pluralsight

Published: September 20, 2022

Durga Halaharvi (DH): As the world moves toward more real-time use cases, like Netflix's on-demand streaming, I see more organizations using stream processing. This is also influenced by the rise of data science and machine learning (DS/ML), which depend on a consistent data feed to power real-time recommendations.

In general, I also see an increase in open-source technologies like Apache Flink and Kafka. Specifically, I’ve seen a lot more organizations adopt dbt (data build tool) and repurpose it for their own needs. Traditionally, dbt has been an analytics engineering tool because it focuses on transforming data for display in Tableau or another dashboard. But even at Pluralsight, our data engineering teams have started to adopt the tool more.
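For readers unfamiliar with dbt: a dbt "model" is essentially a named SELECT statement that dbt materializes as a table or view, while handling dependencies, testing, and scheduling around it. The sketch below is a hypothetical illustration of the kind of transformation such a model encapsulates, run here against an in-memory SQLite database rather than through dbt itself (the table and column names are invented for the example):

```python
import sqlite3

# Stand-in for a raw source table that dbt would reference as a "source".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (user_id INTEGER, event TEXT);
    INSERT INTO raw_events VALUES (1, 'view'), (1, 'click'), (2, 'view');
""")

# The body of a dbt model is a SELECT like this one; dbt would
# materialize it as a table or view for downstream dashboards.
model_sql = """
    SELECT user_id, COUNT(*) AS event_count
    FROM raw_events
    GROUP BY user_id
    ORDER BY user_id
"""
rows = conn.execute(model_sql).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

In a real dbt project the same SELECT would live in a `.sql` file, with dbt resolving references to upstream models and running built-in data tests against the result.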

Andrew Brust (AB): One of the bigger trends is the fact that so much data in so many platforms is moving to the cloud. Even the pre-cloud era vendors have moved to the cloud. Informatica, for example, has been around since the ETL days. They began with an on-premises product, but they don’t actively sell it anymore. Most of the action has moved to the cloud. That’s where customers want to be. That’s where Informatica wants to be. And this goes not just for Informatica, but companies like Oracle and Microsoft, too.

In the industry, some people say that old platforms are obsolete. But it’s more complicated than that. The older companies excel at running data pipelines without fail for mission-critical workloads. They’ve taken that experience and applied it to new technology.

On the other hand, startups leverage innovative technology, but they may lack the experience or customer knowledge needed to deliver a quality solution. If they haven’t operated under harsh conditions, they may not have bulletproofed their platforms. You can run into the same risk if you take open-source code and run it yourself rather than on a managed platform.