What are the best Data Engineering tools of 2023? Every year, BARC publishes a report that features various Data Engineering technologies and evaluates them on a range of aspects. In half an hour, we’ll break down the most interesting parts of this report and bring you up to speed on the latest trends.
Within Data Engineering, there are various technologies that you can use within your organization. But which technologies keep up with developments, and what are the latest trends in the Data Engineering space?
Every year, BARC writes a data management summary that reviews various technologies on different aspects. We reviewed this report for you and will tell you all the interesting details. Fill out the form below to discover the most interesting trends and technologies in the field of Data Engineering.
Looking for a low-code way to develop applications? The Gartner Magic Quadrant for Low-Code platforms is a whopping 125-page report that lists all the platforms. Having a hard time digesting it all? Don’t worry, our experts delved into every little detail!
In the report “Magic Quadrant for Enterprise Low-Code Application Platform,” Gartner explains the differences between various Low Code platforms. But how do you get the important information from this 125-page report? Luckily, you don’t have to do this yourself. In half an hour, our Low Code specialist will update you on the most important developments and trends for the upcoming year.
Picture this: your business is booming, and you need to make informed decisions quickly to stay ahead of the competition. But forecasting can be a tedious and time-consuming process, leaving you with less time to focus on what’s important. That's where Azure Databricks comes in! We used this powerful technology to automate our internal forecasting process and save precious time. In this blog post, we'll show you the steps we took to streamline our workflow and make better decisions with confidence. So make yourself a coffee and enjoy the read!
Before we dive into the nitty-gritty of coding with Databricks, there are a few important setup steps to take.
First, we created a repository on Azure DevOps, where we could easily track and assign tasks to team members, make comments on specific items, and link them to our Git commits. This helped us stay organized and focused on our project goals.
Next, we set up a new resource group on Azure with three resources: an Azure Databricks Service, a blob storage account, and a key vault. Although we already have clean data in our Rockfeather Database (thanks to our meticulous data engineers), we wanted to keep our intermediate files separately in this resource group to keep them versioned and maintain a clean workflow. Within our blob storage, we created containers to store our formatted historical actuals, exogenous features, and predictions.
Finally, we sketched out a high-level project architecture to get a bird’s-eye view of the project. This helped us align on our deliverables and encouraged discussion within the team about what a realistic outcome would look like. By taking these initial setup steps, we were able to hit the ground running with Databricks and tackle our forecasting process with confidence.
Now we’re ready to dive into Databricks and its sleek interface. But before we start coding, there are a few more setup steps to take.
First, we want to link the Azure DevOps repo we set up earlier to Databricks. Kudos to Azure Databricks: the integration between these two tools is seamless! To link the repo, we simply go to User Settings > Git Integration and drop the repo link there. For more information, see the Azure Databricks documentation on Git integration.
To keep the keys and passwords for our database and blob storage secure, we store them as secrets in the key vault, which we link to Databricks through a Key Vault-backed secret scope. You can read more about that in the Azure Databricks documentation on secret scopes.
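As a rough illustration of how the key vault and the blob storage fit together, here is a minimal sketch of the Spark configuration a notebook could set before reading from the storage containers. It assumes the ABFS driver with account-key authentication, and the storage account, secret scope, secret key and container names are all hypothetical. The secret lookup is passed in as a function so the sketch runs outside Databricks; inside a notebook, you would pass a wrapper around `dbutils.secrets.get`.

```python
from typing import Callable, Dict

# Hypothetical names: replace them with your own storage account and secret scope.
STORAGE_ACCOUNT = "forecaststorage"
SECRET_SCOPE = "keyvault-scope"
SECRET_KEY = "blob-account-key"

# A secret getter takes (scope, key) and returns the secret value.
SecretGetter = Callable[[str, str], str]


def storage_auth_conf(account: str, get_secret: SecretGetter,
                      scope: str = SECRET_SCOPE, key: str = SECRET_KEY) -> Dict[str, str]:
    """Spark config entry that lets the ABFS driver authenticate with an account key."""
    return {f"fs.azure.account.key.{account}.dfs.core.windows.net": get_secret(scope, key)}


def container_path(container: str, account: str, path: str = "") -> str:
    """abfss:// URI of a file or folder inside a storage container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


# Inside a Databricks notebook (dbutils and spark only exist there):
# conf = storage_auth_conf(STORAGE_ACCOUNT,
#                          lambda scope, key: dbutils.secrets.get(scope=scope, key=key))
# for name, value in conf.items():
#     spark.conf.set(name, value)
# actuals = spark.read.csv(container_path("historical-actuals", STORAGE_ACCOUNT), header=True)
```

Because the secret is fetched at runtime from the scope, the account key never has to appear in the notebook or in the Git history.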
Lastly, we need to create a compute resource that will run our code. Unlike Azure Machine Learning, Databricks doesn’t have compute instances, only clusters. While this means it takes a minute to spin up the cluster, we don’t have to worry about forgetting to terminate it since it automatically does so after a pre-defined time period of inactivity! It’s also super easy to install libraries on our compute: just head over to the Libraries tab and click on “⛓️ Install new”!
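For reference, the auto-termination behaviour described above is a single field in a cluster definition. The sketch below builds a cluster spec in the shape accepted by the Databricks Clusters API (`POST /api/2.0/clusters/create`); the cluster name, runtime version and VM size are example values, not the settings we used.

```python
import json

# Example values only: choose a runtime version and VM size available in your workspace.
cluster_spec = {
    "cluster_name": "forecasting-cluster",   # hypothetical name
    "spark_version": "13.3.x-scala2.12",     # Databricks Runtime version key
    "node_type_id": "Standard_DS3_v2",       # Azure VM size for driver and workers
    "num_workers": 1,
    "autotermination_minutes": 30,           # shut down after 30 idle minutes
}

# Body for a POST request to /api/2.0/clusters/create.
payload = json.dumps(cluster_spec, indent=2)
print(payload)
```

The same setting appears in the cluster creation form in the UI as the inactivity timeout, so the JSON route is only needed if you want cluster definitions under version control.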
Now that our setup is complete, we move on to using Databricks notebooks for data loading, data engineering, and forecasting. We break this down into three sub-sections:
• Data loading
• Data engineering
• Forecasting
By breaking down the process into these three sub-sections, we can work more efficiently and focus on specific tasks without getting overwhelmed. As shown in the Project Architecture above, each step is one notebook. Let’s have a closer look at each notebook.
The code we’ll be writing in Databricks is pretty much in standard notebook format, which is familiar territory to all data scientists. Our data engineers in the audience will also appreciate that we can write, for example, SQL code in a notebook. All we have to do is include the %sql magic command (link: https://docs.databricks.com/notebooks/notebooks-code.html) at the beginning of the cell. We use this approach for our first notebook, where we query the data we need: in our case, past billable hours and the available hours of our lovely consultants. Once we have the data we need, aggregated to the right level, we save it in our blob storage for the next step.
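To give an idea of the aggregation step, here is a minimal sketch that rolls daily hours up to one row per month. It uses Python’s built-in sqlite3 as a stand-in for the Spark SQL that runs in a %sql cell, and the `hours` table, its columns and its values are hypothetical. In Spark SQL you would typically group on `date_trunc('MONTH', work_date)` rather than on a string prefix.

```python
import sqlite3

# Hypothetical schema and made-up rows standing in for the real database tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (consultant TEXT, work_date TEXT, billable REAL, available REAL)")
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?, ?)",
    [
        ("ann", "2023-01-09", 30, 40),
        ("ann", "2023-01-16", 32, 40),
        ("bob", "2023-01-09", 20, 32),
        ("bob", "2023-02-06", 25, 32),
    ],
)

# Aggregate to one row per month: total billable and total available hours.
monthly = conn.execute(
    """
    SELECT substr(work_date, 1, 7) AS month,
           SUM(billable)           AS billable_hours,
           SUM(available)          AS available_hours
    FROM hours
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly)
```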
In Notebook 2, we use pandas and numpy to preprocess our data and do our feature engineering. We need to make sure we have all the data on the features we’ve identified for the whole forecast period before we start training any models. For example, if we’re forecasting billable hours and using total available hours of our consultants as an exogenous feature, we need to have that data available for the entire forecast period. While this may seem obvious in this case, it’s an important step to take before we jump into any modelling! Once we’ve got our data formatted the way we want it, we save it back to blob storage and move on to the next notebook.
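The coverage check described above is easy to automate. The sketch below, with hypothetical names and made-up values, lists the forecast months for which the exogenous feature (here, available hours) has no value yet; an empty result means the feature covers the whole forecast period.

```python
from datetime import date
from typing import Dict, List


def month_range(start: date, periods: int) -> List[date]:
    """First day of each month for `periods` months, beginning with the month of `start`."""
    months = []
    year, month = start.year, start.month
    for _ in range(periods):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def missing_feature_months(feature: Dict[date, float], forecast_start: date,
                           horizon: int) -> List[date]:
    """Forecast months for which the exogenous feature has no value."""
    return [m for m in month_range(forecast_start, horizon) if m not in feature]


# Made-up available hours (the exogenous feature), known up to May 2023.
available_hours = {date(2023, m, 1): 1500.0 for m in range(1, 6)}
gaps = missing_feature_months(available_hours, date(2023, 4, 1), horizon=3)
print(gaps)  # June 2023 is not covered yet
```

Running a check like this at the end of the feature-engineering notebook makes a missing feature fail loudly before any model is trained.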
Here’s where the magic happens: we finally get to do some forecasting! We use the darts package to train multiple models and compare them against our baseline Exponential Smoothing model. We love this package because it’s super easy to use and makes backtesting a breeze. To evaluate model accuracy, we use the mean absolute percentage error (MAPE), a simple metric that’s easy to understand.
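For reference, MAPE averages the absolute error of each forecast relative to the actual value: MAPE = (100 / n) · Σ |(actual_t − forecast_t) / actual_t|. A minimal stdlib sketch follows; darts ships its own implementation in `darts.metrics`, and this one is only meant to show the arithmetic. MAPE is undefined when an actual value is zero, which the sketch reports as an error.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Two forecasts that are each 10% off and one that is exact.
print(mape([100, 200, 400], [110, 180, 400]))
```

Because every error is scaled by its own actual value, a MAPE of 12% can be read directly as “on average, the forecast is 12% off”, which is what makes it easy to explain to non-technical stakeholders.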
We try out different models, such as linear regression, random forest, and XGBoost, and compare their performance against our baseline. Our baseline model had a MAPE of 44%, which isn’t great, but we’re not deterred. By adding our single exogenous feature and leveraging our three ML models, we were able to decrease our MAPE to 12%, a huge improvement! And of course, we save our results back to blob storage for future reference.
With our forecasting pipeline up and running in Databricks, we can sit back and watch the predictions roll in. It’s amazing what you can do with a little data and a lot of creativity!
💡 Forecast backtesting is a method used to check how accurate a forecast is by comparing its predictions with what actually happened. This helps identify any errors or biases in the forecasting model, which can be used to improve future predictions. It’s a useful tool in many industries, such as finance or weather forecasting, and helps decision-makers make better-informed decisions.
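To make the backtesting idea concrete, here is a minimal sketch of an expanding-window backtest: each point from a chosen start index onward is forecast using only the data before it, and the one-step forecasts are scored with MAPE. It is a hand-rolled illustration, not the darts implementation (darts models provide `historical_forecasts` and `backtest` methods for this). The baseline here is simple exponential smoothing with a fixed smoothing factor, a simplification of the `ExponentialSmoothing` model in darts, and the series is made up.

```python
from typing import Callable, List, Sequence

# A forecaster maps the history seen so far to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def simple_exp_smoothing(alpha: float) -> Forecaster:
    """One-step-ahead simple exponential smoothing with a fixed smoothing factor."""
    def forecast(history: Sequence[float]) -> float:
        level = history[0]
        for y in history[1:]:
            level = alpha * y + (1 - alpha) * level
        return level
    return forecast


def naive(history: Sequence[float]) -> float:
    """Baseline that repeats the last observed value."""
    return history[-1]


def backtest(series: Sequence[float], model: Forecaster, start: int) -> float:
    """Expanding-window backtest returning the MAPE (in percent) of the
    one-step forecasts for every point from index `start` onward."""
    errors: List[float] = []
    for t in range(start, len(series)):
        prediction = model(series[:t])  # only data before t is visible
        errors.append(abs((series[t] - prediction) / series[t]))
    return 100 * sum(errors) / len(errors)


# Made-up monthly series with a steady upward trend.
series = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
print("naive:", backtest(series, naive, start=3))
print("SES(0.5):", backtest(series, simple_exp_smoothing(0.5), start=3))
```

On this steadily rising toy series, the naive last-value forecast beats exponential smoothing, because smoothing lags behind a trend; catching that kind of surprise is exactly what backtesting against a baseline is for.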
We have to give props to Databricks: it’s a tool that makes our lives easier. It’s like having a Swiss army knife in your pocket: slim, versatile, and it gets the job done. We love the collaboration feature; coding with our team feels like a real-time jam session. Plus, setting up and scheduling pipelines is as smooth as butter. The best part is the seamless integration with MLflow and PySpark, like having your favorite sauce on your favorite dish. Let’s just say that Databricks has been a game-changer for us, and we’re excited to see what new features they’ll cook up in the future!
Although we’ve got a pipeline set up, our forecasting journey still has an exciting ride ahead. That’s always how it is with data science projects. The next step is generating maximum value from our forecasts.
We are discussing with our Data Viz team how best to integrate this forecast into our dashboards. Also, we’re scheduling meetings with our CFO to see how exactly we can make his job easier by, for example, introducing more exogenous features or reporting historical accuracy.
As a data-thinking organization, we’re committed to becoming more anti-fragile, and we treat our customers the same way. This means building resilience into our forecasting models to ensure they can withstand unexpected events and continue providing reliable forecasts. If you found this post inspiring and would like to know more, don’t hesitate to reach out!
Where should you start when looking for a Data Science & Machine Learning solution? What information is important? And especially which sources are the most reliable? Our expert goes over these questions in just half an hour. Sign up now!
Choosing a Data Science solution for your business is a difficult task. Our Data Science experts have looked into all the options and will be talking about the best tools and the newest trends in the Data Science & Machine Learning market, all in a short and sweet 30-minute webinar. They’ll also tell you everything you need to think about when picking a tool and what the pros and cons of the most commonly used tools are.
An effective dashboard is a tool for getting clear insights into your data. But what if a dashboard is less effective than imagined? Or, even worse, what if your dashboards mislead users? In this blog, we discuss two new features of Power BI: one can mislead your users, while the other makes your dashboards more effective.
The first update we want to highlight allows the shared axis of small multiples to be turned off. A common complaint among Power BI users was: “Due to the shared axis, I can’t see smaller values properly anymore.” This update makes that possible, but how useful is turning off the shared axis, really?
Let’s see what happens when we turn off the shared axis.
As you can see, it is not very easy to tell which product group is the largest if the y-axes are not equal. In fact, at first glance, it all looks the same. Only when you look longer do you see that the axes are not equal.
We believe that you should always keep the axes equal, even when the values are small. After all, you are comparing different segments. But then how can you visualize small multiples while also keeping them easy to read? It’s simpler than it sounds; take a look:
In this case, we recommend using Zebra BI, a plug-in for BI tools such as Power BI. With this plug-in, the small multiples are placed in boxes of different sizes depending on their values. This allows users to compare the segments properly and draw the correct conclusions, as they are not misled by different axes. Sounds useful, right?
A feature from the same update that does make Power BI dashboards more effective is the addition of Dynamic Slicers. With Dynamic Slicers, you can use field parameters to dynamically change the measures or dimensions analyzed within a report. But why is this so useful?
With Dynamic Slicers, you can help readers of a BI report explore and customize the report, so they can use the information that is useful for their analysis. As shown in the GIF above, a user can filter by Customers, Product Family, Product Group, or other slicers that you’ve set up in the report.
In addition, you can parameterize slicers further to support dynamic filtering scenarios. In the GIF below, you can see how it works. You can see that the value comes from the dynamic slicer and changes dynamically. This gives your users even more opportunities to interact with the dashboard.
Of course, Microsoft continuously tries to update and improve Power BI, but some updates can have unwanted effects. We have been using IBCS standards for years as our Power BI report guidelines, and we believe that applying these standards properly prevents such unintended consequences. To test whether our reports comply with these standards, we use Zebra BI’s IBCS-proof plug-in while building dashboards in Power BI. Want to learn more about how IBCS standards can help your organization?
Fill out the form below and learn the basics of IBCS reporting in a quick 20-minute webinar!
As the low-code platform that integrates with Microsoft 365, Azure, and Dynamics 365, the Microsoft Power Platform is perfect for professionals who want to develop apps but lack the necessary programming knowledge. In addition, because it integrates with other Microsoft platforms, the Power Platform is a natural fit for companies that already use Microsoft tools. But what are the different apps? Why is using the Power Platform so useful, and what’s so convenient about low-code? In this blog, we try to answer all these questions.
To put it in Microsoft’s words: “Power Apps is a suite of apps, services, and connectors, as well as a data platform, that provides a rapid development environment to build custom apps for your business needs.”
Power Apps enables you (an accountant, engineer, chef, CEO) to build applications that solve your problems regardless of their size.
Low-code is a way to build applications quickly in a “drag and drop” environment. It enables anyone who wants to build apps to create something quickly, without having to understand programming languages like Java or C++.
On top of that, low-code also enables you to connect to a variety of back-ends or third-party services without having to call an API manually or set up other complex connections.
Power Apps is the bread and butter of the low-code Power Platform. It enables you to create applications that display your data and let you interact with it directly. Power Apps lets you build applications for both mobile and web using a drag-and-drop interface.
Power Automate is how you create automated processes that run in the background. If you want to send emails after new records are created while also notifying users, Power Automate does this with just a few clicks.
Power Virtual Agents, one of Microsoft’s newest offerings, allows you to create chatbots using a low-code approach. Chatbots can be deployed internally on Teams or on your own website. They enable users or customers to find answers to their problems without the need for a human-run support centre.
With Power BI, you get even more out of Business Intelligence. Create stunning visuals that provide deep insights into your data and financials. Use forecasting to better predict changes in upcoming time periods and adjust your business planning accordingly.
Imagine you’re an accountant who has to modify data in Excel every day, based on parameters that someone emails to you. Based on these emails, you want an overview of all your data to show to management during your monthly meetings and, on top of that, you want to be able to make last-minute changes to the data if needed. The Power Platform and its low-code tools can help you automate a large part of this workflow. Here is how:
Here at Rockfeather, we know that not everyone has the budget to turn their business ideas into reality. That’s why we offer both low-code training and app-building services. We empower everyone to become a citizen developer, so you can build things in your own time, with your own tools. If you get stuck or need help, you can be sure that Rockfeather will be there to lend a hand.
For the more advanced projects, we use all of the tools above and more to create fully fledged desktop and mobile applications that can be exported and deployed to your environment seamlessly.
Forecasting with Data Science can help your organization take the next step in data maturity. In this webinar, we’ll show you how to get even more out of your forecasts with AI.
Want to know what’s available in dashboarding or data integration solutions? Want to compare data science solutions? Or would you like to see Low Coding platforms in action? This and more was discussed at the Data & Analytics Line Up 2022!
Power Automate or Logic Apps? Two similar automation tools, both on the Microsoft platform and with a common look. With so many features in common, the differentiation lies in the details.
Most notably, Logic Apps is aimed at more technically proficient users. Power Automate, by contrast, is aimed at citizen developers, with fewer in-depth features and more user-friendly options.
For most companies, Power Automate with the standard connectors is included in the Microsoft Office license. However, customers interested in premium connectors will need a premium license. Logic Apps, on the other hand, is a pay-as-you-go service: you pay for what you use, which in the Consumption plan means paying per executed trigger and action.
Below are 3 key differences:
To further expand on the differences in licensing between Power Automate and Logic Apps, we summarized some key takeaways below:
Power Automate is part of the Microsoft 365 environment and the Power Platform, and its main aim is to automate tasks and work within the Power Platform. By comparison, Logic Apps is one of the solutions within Microsoft Azure Integration Services and is therefore more commonly used for ETL processes and data integration. As a result, Logic Apps integrates well with Azure but lacks that integration with the Power Platform. What’s best for your business simply depends on the capabilities you need.
For instance, consider the following examples:
Interested in the strengths and weaknesses of both platforms? We have articles that dive deeper into Logic Apps and Power Automate and all their pros and cons.
In today’s data-centric business environment, mastering the art of transforming data into actionable insights can dramatically differentiate leaders from followers. Have you ever been in a situation where you doubted the reliability of your data, leading to decision paralysis? Then read this blog!
Whether you’re a seasoned Power BI user or a BI manager looking to elevate your team’s capabilities, this session will provide you with the insights and tools necessary to achieve a clean and effective Power BI environment.
You have formulated a solid business case for your Data Science project. Congratulations! But what’s next? In this webinar, we will give you an overview of the steps to take in your Data Science project. We will also show you which technologies you can best use for your project.
• Notebooks
• Auto ML
• A drag & drop tool called Designer
Machine learning is a form of artificial intelligence in which a computer uses software and algorithms to learn from historical data and make predictions about the future. These algorithms are used to build models: a machine learning model finds patterns or relationships in the data, which can then be used to predict values for new data.
Azure Machine Learning is Microsoft’s answer to the increasing demand for Machine Learning as a Service solutions. With Azure Machine Learning, Microsoft provides a cloud-based machine learning environment within the Azure platform through which data science projects can be run and managed. The machine learning services integrate seamlessly with the other cloud computing services Microsoft offers through the Azure platform. Azure Machine Learning lets you develop models based on open-source machine learning tools such as PyTorch, TensorFlow, scikit-learn, and many others.
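As a minimal illustration of “finding a pattern in historical data and applying it to new data”, the sketch below fits a straight line with ordinary least squares and uses it to predict an unseen value. The data is made up, and real machine learning models are usually far more complex than a single line.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x
    (requires at least two distinct x values)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up historical data: advertising spend (x) and units sold (y).
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(history_x, history_y)  # learned pattern: y = 1 + 2x
print(intercept + slope * 5.0)  # prediction for a new input, x = 5
```

The “learning” is the fit on historical pairs; the “prediction” is applying the learned intercept and slope to an input the model has not seen.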
A major development in data science in recent years has been the emergence of Auto ML, which automates parts of the data science development process that used to be repetitive and time-consuming. In fact, the most modern Auto ML solutions can perform almost the entire development process when the data is well structured. In most cases, however, data is not that structured, and preparatory work is still needed: raw data has to be cleaned and prepared before it can be used in a model.
Where Auto ML really adds value is in training and testing different models. Previously, after cleaning and structuring the data, the data scientist had to decide which models to train and test. Training and testing different models is time-consuming and often requires a lot of computing power, which is one of the reasons data scientists often train and test only a limited selection of algorithms. Auto ML lets the data scientist train and test a much larger selection of algorithms on scalable cloud computers, so they no longer have to train and test every algorithm themselves. Instead, the data scientist specifies which algorithms should be trained and tested, and by which error metric and/or statistics they should be compared.
As an analogy, think of cycling. Previously, a data scientist rode a city bike without gears; with Auto ML, the data scientist gets an electric bike with pedal assistance. You still have to ride the bike, but technology helps get you to your destination faster.
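The core loop that Auto ML automates can be sketched in a few lines: train several candidate models, score each on held-out data with a chosen error metric, and rank them. The three candidate forecasters and the toy series below are hypothetical and deliberately simple; Azure’s Auto ML evaluates real algorithms at a much larger scale on cloud compute.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A candidate model takes the training history and a horizon, and returns forecasts.
Model = Callable[[Sequence[float], int], List[float]]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_model(history: Sequence[float], horizon: int) -> List[float]:
    """Forecast the historical average."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def naive_model(history: Sequence[float], horizon: int) -> List[float]:
    """Forecast the last observed value."""
    return [history[-1]] * horizon


def drift_model(history: Sequence[float], horizon: int) -> List[float]:
    """Extend the average change between the first and last observation."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rank_models(series: Sequence[float], candidates: Dict[str, Model],
                metric: Metric, holdout: int) -> List[Tuple[str, float]]:
    """Train every candidate on all but the last `holdout` points, score it on
    the held-out points with `metric`, and return (name, score) pairs, best first."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: metric(test, model(train, holdout)) for name, model in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


candidates = {"mean": mean_model, "naive": naive_model, "drift": drift_model}
ranking = rank_models([10.0, 12.0, 14.0, 16.0, 18.0, 20.0], candidates, mae, holdout=2)
print(ranking)
```

Passing the metric in as a function mirrors telling Auto ML which error metric to compare the models by; swapping `mae` for a MAPE function changes the ranking criterion without touching the candidates.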
At Rockfeather, we are also aware of the advantages that Azure Machine Learning offers over conventional data science tools. Using Azure Machine Learning allows us to deliver better and faster data science results to our customers. One of the projects where we have used Azure Machine Learning effectively is predicting sales numbers for one of our clients. Making these predictions, known as forecasting, is a branch of data science that uses historical data to predict the future. Good forecasting is important to this client because it reduces costs and waste. The challenge in this project was the large assortment of products, each of which needed its own sales forecast. Manually training and testing models for such a large assortment takes a lot of time, which is why we applied Auto ML. Thanks to the scalability of Azure’s cloud-based machine learning, we can train and test a selection of algorithms per product to find the most effective algorithm for each product. These models are all stored in Azure Machine Learning and, once trained, can be used to make actual predictions of future sales numbers.
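A minimal sketch of the per-product approach: run the same model selection separately for every product and keep the best model for each. The product names, sales figures and the two candidate models are hypothetical. The sketch fans out over products with a thread pool purely for illustration; Python threads do not speed up CPU-bound work, and in a project like this the scaling comes from Azure’s cloud compute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]

# Hypothetical candidate algorithms; a real Auto ML run would try many more.
CANDIDATES: Dict[str, Forecaster] = {
    "naive": lambda history, h: [history[-1]] * h,
    "mean": lambda history, h: [sum(history) / len(history)] * h,
}


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def best_model(series: Sequence[float], holdout: int) -> Tuple[str, float]:
    """Fit each candidate on all but the last `holdout` points; keep the lowest MAE."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mae(test, model(train, holdout)) for name, model in CANDIDATES.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


def select_per_product(sales: Dict[str, Sequence[float]],
                       holdout: int = 2) -> Dict[str, Tuple[str, float]]:
    """Run the model selection for every product and map product -> (model, MAE)."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda s: best_model(s, holdout), sales.values())
        return dict(zip(sales.keys(), results))


# Made-up sales: one trending product and one that swings around a stable mean.
sales = {"product_a": [5, 6, 7, 8, 9], "product_b": [8, 4, 8, 4, 6, 6]}
print(select_per_product(sales))
```

Different products end up with different winners, which is the point of selecting a model per product instead of forcing one algorithm onto the whole assortment.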