Data Science in Microsoft Fabric

Microsoft Fabric is a platform that offers Data Science solutions to empower users to complete end-to-end data science workflows for data enrichment and business insights. The platform supports a wide range of activities across the entire data science process, from data exploration, preparation, and cleansing to experimentation, modeling, model scoring, and serving predictive insights.

The typical data science process in Microsoft Fabric involves the following steps:

  • Problem formulation and ideation: Data Science users in Microsoft Fabric work on the same platform as business users and analysts, which makes data sharing and collaboration between different roles seamless.
  • Data Discovery and pre-processing: Users can interact with data in OneLake using the Lakehouse item. There are also tools available for data ingestion and data orchestration pipelines.
  • Experimentation and ML modeling: Microsoft Fabric offers capabilities to train machine learning models using popular libraries like PySpark (Python), sparklyr (R), and scikit-learn. It also provides a built-in MLflow experience for tracking experiments and models.
  • Enrich and operationalize: Notebooks can batch-score machine learning models either with open-source prediction libraries or with Microsoft Fabric's scalable, universal Spark PREDICT function.
  • Gain insights: Predicted values can easily be written to OneLake, and seamlessly consumed from Power BI reports.

Business case:

The financial department of your organization already uses Power BI within Fabric to visualize their data. Now, they would like to use machine learning to generate a cashflow forecast. The data is already in a Lakehouse in OneLake and therefore does not have to be moved or copied. It can be preprocessed and explored directly in Synapse Data Science using notebooks. Then, model experimentation can start within the notebooks, tracking important metrics using the built-in MLflow capabilities. After landing on a model that performs well, a cashflow forecast can be generated and written back to the Lakehouse, ready to be visualized within Power BI. The notebooks can then be scheduled to automatically generate a monthly cashflow forecast.
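
To make the experimentation step more concrete, here is a minimal sketch in plain Python of comparing two candidate forecasting models on held-out months and using the better one for next month's forecast. Everything in it is an assumption for illustration: the cashflow figures are made up, the model names and helper functions are hypothetical, and the `metrics` dictionary is only a stand-in for the values you would log with the built-in MLflow experience in a Fabric notebook.

```python
from statistics import mean

# Hypothetical monthly net cashflow (in thousands). In Fabric this would be
# read from a Lakehouse table; it is hard-coded here and deliberately simple.
history = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0]


def naive_forecast(series):
    """Predict that next month equals the most recent month."""
    return series[-1]


def trend_forecast(series):
    """Fit y = a + b*t by ordinary least squares and extrapolate one step."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


def backtest(model, series, holdout=3):
    """Mean absolute error of one-step-ahead forecasts over the last months."""
    errors = []
    for i in range(len(series) - holdout, len(series)):
        prediction = model(series[:i])  # train only on months before month i
        errors.append(abs(prediction - series[i]))
    return mean(errors)


candidates = {"naive": naive_forecast, "linear_trend": trend_forecast}

# Stand-in for experiment tracking: one metric per candidate model.
metrics = {name: backtest(model, history) for name, model in candidates.items()}

# Pick the model with the lowest error and forecast the next month.
best_name = min(metrics, key=metrics.get)
next_month = candidates[best_name](history)

print(metrics)
print(best_name, next_month)
```

In a real notebook the selected forecast would be written back to the Lakehouse instead of printed, and the notebook would be scheduled to rerun each month.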

Synapse Real-Time Analytics in Fabric

Microsoft Fabric can also serve as a tool for real-time data analytics, with a platform optimized for streaming and time-series data analysis. It is designed to streamline data integration and give rapid access to data insights through automatic data streaming, indexing, and data partitioning, which apply to various data sources and formats. The platform is particularly well suited to organizations that want to scale up their analytics solutions while making data accessible to a broad range of users, from citizen data scientists to advanced data engineers, promoting a democratized approach to data use.

Key features of Real-Time Analytics include:

  • Capture, transform, and route real-time events to various destinations, including custom apps.
  • Ingest or load data from any source, in any data format.
  • Run analytical queries directly on raw data without the need to build complex data models or create scripting to transform the data.
  • Import data with streaming enabled by default, which provides high-performance, low-latency analysis of fresh data.
  • Work with versatile data structures: query structured data, semi-structured data, or free text.
  • Scale to an unlimited amount of data, from gigabytes to petabytes, with unlimited scale on concurrent queries and concurrent users.
  • Integrate seamlessly with other experiences and items in Microsoft Fabric.
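
As an illustration of the kind of time-series analysis these features target, the sketch below groups raw events into fixed one-minute windows and averages each window. It is plain Python rather than anything Fabric-specific: the function name, the event layout (timestamp, value), and the sample readings are all assumptions made for this example, and in Fabric the same logic would normally be expressed as a query against the ingested data.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def tumbling_window_average(events, window=timedelta(minutes=1)):
    """Average (timestamp, value) events per fixed, non-overlapping window."""
    buckets = defaultdict(list)
    width = window.total_seconds()
    for timestamp, value in events:
        epoch = timestamp.timestamp()
        # Round the event time down to the start of its window.
        window_start = epoch - (epoch % width)
        buckets[window_start].append(value)
    # Return windows in chronological order, keyed by window start (UTC).
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


def at(hour, minute, second):
    """Helper for building hypothetical UTC timestamps on one sample day."""
    return datetime(2024, 1, 1, hour, minute, second, tzinfo=timezone.utc)


# Hypothetical sensor readings; the last two arrive out of order.
events = [
    (at(12, 0, 5), 20.0),
    (at(12, 0, 40), 22.0),
    (at(12, 1, 10), 30.0),
    (at(12, 2, 59), 10.0),
    (at(12, 2, 1), 14.0),
]

result = tumbling_window_average(events)
for window_start, average in result.items():
    print(window_start.isoformat(), average)
```

Because events are bucketed by their own timestamps, late or out-of-order arrivals still land in the correct window.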

Want to know more about Microsoft Fabric?

Want to know more about Microsoft Fabric as a service? On October 26 we’re organizing a Fabric Masterclass that dives deep into different Fabric use cases for Data Science, Data Engineering, and Data Visualisation.
