DP-900 Study Guide
Table of Contents
- DP-900 Microsoft Azure Data Fundamentals
- Describe core data concepts (25–30%)
- Identify considerations for relational data on Azure (20–25%)
- Describe considerations for working with non-relational data on Azure (15–20%)
- Describe an analytics workload (25–30%)
DP-900 Microsoft Azure Data Fundamentals
Describe core data concepts (25–30%)
Describe ways to represent data
Describe features of structured data
- Structured data is organized into a defined schema, which describes how data is stored and arranged—for example, tables with columns in databases. The schema ensures consistency by enforcing data types and rules for each element, such as specifying that a column can only contain numbers or dates.
- Each element in structured data, such as a column in a table, has a clearly defined type (like string, integer, date) and restrictions (such as maximum length or allowed values). This allows users and systems to validate, query, and manage data efficiently.
- Structured data is easy to access and analyze using tools like SQL because its organization follows predictable patterns. In cloud platforms like Azure Data Explorer, structured data can be efficiently queried for insights, reporting, and analytics since the data structure is known in advance.
Example: A customer database in Azure SQL contains tables where each row is a customer and columns include ‘CustomerID’ (integer), ‘Name’ (string), and ‘JoinDate’ (date). The schema makes sure that, for instance, every CustomerID is unique and that JoinDate follows a valid date format.
Use Case: An IT team in a company new to Azure Data wants to set up employee data in Azure Data Explorer. By defining structured tables with columns for employee ID, name, department, and date of hire, they can reliably store and query employee information to generate reports, track onboarding trends, and ensure data accuracy.
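Schema enforcement like this is expressed in SQL's data definition language. A minimal T-SQL sketch of the employee table from the use case above (all table and column names are illustrative):

```sql
-- Illustrative schema: each column has a declared type and rules.
CREATE TABLE Employees (
    EmployeeID int           NOT NULL PRIMARY KEY,  -- unique integer required
    Name       nvarchar(100) NOT NULL,              -- string, max 100 characters
    Department nvarchar(50)  NOT NULL,
    HireDate   date          NOT NULL               -- must be a valid date
);
```

Because the structure is declared up front, the database rejects rows that violate it, such as a duplicate EmployeeID or a non-date value in HireDate.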
For more information see these links:
- Type definitions and how to create custom types
- What is Azure Data Explorer?
- Entity Data Model
- Schema (MFC Data Access)
- Data sources that connect to Data Map
Describe features of semi-structured data
- Semi-structured data includes information that does not fit neatly into tables but still has some organizational properties, such as tags or keys. Common formats are JSON and XML, which use key-value pairs or nodes for flexible data description.
- Unlike structured data, semi-structured formats can readily accommodate changing or unpredictable data attributes. For example, a JSON object can have new fields added as needed, making it suitable for data sources with varying schemas.
- Azure SQL and related services provide built-in functions to read, write, and transform semi-structured data. JSON data, for example, can be stored directly in database columns, queried using Transact-SQL enhancements (such as OPENJSON and JSON_VALUE), and converted between relational and semi-structured formats for integration and analytics.
Example: Suppose you are storing information about products for an e-commerce site. Not all products share the same attributes—some have color, some have size, others have warranty. You can store these varying attributes as JSON objects in a single database column, so each product entry can include only the fields that apply.
Use Case: As a beginner working with Azure Data, you might need to ingest and analyze real-time device logs from IoT sensors. The log format may change as devices are updated, so storing each log entry as JSON in Azure SQL Database enables you to flexibly ingest, store, and query this semi-structured data without redesigning your tables.
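A hedged T-SQL sketch of the product-attributes example above, using the JSON functions mentioned (table, column, and attribute names are hypothetical):

```sql
-- Each product stores only the attributes that apply, as JSON text.
CREATE TABLE Products (
    ProductId  int NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Attributes nvarchar(max) CHECK (ISJSON(Attributes) = 1)  -- must be valid JSON
);

INSERT INTO Products (ProductId, Name, Attributes)
VALUES (1, N'T-Shirt', N'{"color": "blue", "size": "M"}'),
       (2, N'Laptop',  N'{"warranty": "2 years"}');

-- JSON_VALUE extracts one field; products without it simply return NULL.
SELECT ProductId, Name, JSON_VALUE(Attributes, '$.color') AS Color
FROM Products;

-- OPENJSON shreds each JSON object into key/value rows for analysis.
SELECT p.ProductId, attr.[key] AS AttributeName, attr.[value] AS AttributeValue
FROM Products AS p
CROSS APPLY OPENJSON(p.Attributes) AS attr;
```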
For more information see these links:
- JSON data in SQL Server
- Multi-model capabilities
- Model semi-structured data
Describe features of unstructured data
- Unstructured data does not follow a predefined format or schema, meaning it does not easily fit into rows and columns like a traditional database. Examples include text files, images, audio, and videos.
- It often contains valuable information, but extracting insights requires specialized tools and techniques to process, analyze, and convert it into a usable form.
- Unstructured data can be stored and managed in a data lake, which is designed to handle large volumes of raw, varied data for future processing and analytics.
- Automatic data type detection tools, such as Power Query, can help identify and organize data from unstructured sources by inspecting its content and assigning appropriate types for analysis.
- Storing unstructured data in cost-effective cloud solutions enables organizations to keep vast amounts of information for machine learning, historical analysis, or real-time processing.
Example: A company stores customer feedback as free-form text responses in a cloud-based data lake. Since these responses don’t follow any set format, they are considered unstructured data. The business uses text analysis tools to extract common themes and sentiments from the feedback, helping them understand customer needs.
Use Case: An Azure Data beginner imports social media posts (unstructured text) into Azure Data Lake Storage. Using Power Query, they automatically detect and assign data types, then transform the raw text into organized insights about brand sentiment, which can be visualized in a dashboard for business decision-making.
For more information see these links:
- Big data architectures
- Data types in Power Query
- Big data architecture style
Identify options for data storage
Describe common formats for data files
- Text-based formats like .txt, .csv, .tsv, and .json are widely used for simple data storage and easy exchange between systems. They store data in a human-readable way and are easy to create and edit manually or programmatically.
- Document formats such as .docx (Word), .pptx (PowerPoint), and .pdf (Adobe PDF) are commonly used to store reports, presentations, and printable content. These formats are suitable for sharing structured data and documents within teams.
- Specialized data formats like .parquet and .orc are designed for analytical workloads in cloud environments, offering efficient storage, compression, and faster data processing for big data.
- Compressed file formats like .zip and .gz are used to reduce storage space and improve transfer speed. Compression can be applied to a variety of underlying data formats to optimize storage usage.
- When preparing files for Azure services, ensure data is in a supported format and properly structured. Tools and scripts are often available to help clean and chunk large or complex documents, improving processing accuracy.
Example: A company exports customer data from a web application in CSV format (.csv). The file contains rows of information such as names, email addresses, and purchase history, which can be easily imported into Azure for analysis or into spreadsheets for reporting.
Use Case: A new Azure Data user needs to upload past project documentation to Azure OpenAI On Your Data for knowledge extraction. They convert .pdf and .docx files to .txt format using a provided script to ensure the AI model can access and process the content accurately for question answering and summarization.
For more information see these links:
- Azure OpenAI On Your Data
- Learn about Endpoint data loss prevention
- Data formats supported by Real-Time Intelligence
Describe types of databases
- Relational (SQL) Databases: These databases organize data into tables with predefined schemas and use Structured Query Language (SQL) for creating, reading, updating, and deleting data. They excel at maintaining data integrity and handling complex queries involving relationships between different data entities. Good for scenarios requiring consistency and structured data, such as financial records.
- NoSQL Databases: NoSQL databases provide flexible ways to store unstructured or semi-structured data. They include different models: document stores (e.g., MongoDB), key-value stores (e.g., Redis), wide-column stores (e.g., Cassandra), and graph stores (e.g., Neo4j, Azure Cosmos DB for Apache Gremlin). They are optimized for scalability and performance, especially in large-scale or real-time applications.
- Analytical Databases: These databases are designed for analyzing large datasets efficiently. They use technologies like columnar storage to enable fast querying and reporting (OLAP workloads), making them suitable for data warehouses, business intelligence, or big data analytics. Examples in Azure include Azure Synapse Analytics and Azure Data Explorer.
- Transactional Databases: Optimized for handling frequent read and write operations (OLTP workloads), these databases ensure that transactions are processed quickly, reliably, and securely. Relational and NoSQL databases can be used for transactional workloads, depending on the requirements for speed, scalability, and data structure.
- Graph Databases: Store data as nodes and relationships (edges), making them ideal for handling complex, interconnected data such as social networks or organizational hierarchies. They efficiently manage and analyze relationships between entities.
Example: A retail company uses a relational database (such as Azure SQL Database) for storing customer profiles and sales transactions, a NoSQL document database (such as Azure Cosmos DB) for capturing flexible product catalog data, and an analytical database (such as Azure Synapse Analytics) to generate sales reports and trends for business decisions.
Use Case: An IT professional new to Azure needs to choose the right data storage option for a cloud-based web application. For user accounts and purchases, they use a relational database to ensure integrity and perform complex queries. For storing session data or activity logs with different formats, a NoSQL database is chosen due to its flexibility and scalability. To analyze user behavior and optimize marketing campaigns, analytical databases are used to aggregate and report on large volumes of data efficiently.
For more information see these links:
- SQL vs. NoSQL data
- Data store classification
- Understand data store models
- Working with Data in ASP.NET Core Apps
- Get started with databases on Windows Subsystem for Linux
Describe common data workloads
Describe features of transactional workloads
- Transactional workloads are characterized by operations that must be completed together as a single, indivisible unit. If any part of a transaction fails, all changes are rolled back to maintain data integrity.
- Transactions follow the ACID properties: Atomicity (all-or-nothing operations), Consistency (data stays valid before and after), Isolation (transactions do not interfere with one another), and Durability (committed changes persist even after crashes).
- Transactional workloads usually involve frequent, small read and write operations—such as updating, inserting, or deleting records—often in systems like banking, ecommerce, or inventory management.
- Support for transactions is built into many Azure data services (like Azure SQL Database and Azure Cosmos DB). Developers often use features like stored procedures or transaction batches to ensure proper management and reliability.
- Keeping transactions short and efficient is important to maximize system performance and prevent bottlenecks, especially when multiple resources (databases, tables) are involved.
Example: Imagine a bank application where a customer transfers money from their savings to checking account. The system subtracts the amount from savings and adds it to checking. Both operations must succeed together; if either fails, neither should go through. This ensures the customer’s total funds remain consistent.
Use Case: A retail company using Azure SQL Database to process customer orders ensures that inventory reduction, payment processing, and order creation all happen as a single transaction. If the payment processing fails, inventory is not reduced, and the order isn’t created—preventing data inconsistencies and order fulfillment problems.
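A minimal T-SQL sketch of the bank-transfer example above, showing the all-or-nothing behavior (account identifiers and the amount are illustrative):

```sql
BEGIN TRY
    BEGIN TRANSACTION;  -- both updates commit together or not at all

    UPDATE Accounts SET Balance = Balance - 100
    WHERE AccountId = 'SAV-001';   -- subtract from savings

    UPDATE Accounts SET Balance = Balance + 100
    WHERE AccountId = 'CHK-001';   -- add to checking

    COMMIT TRANSACTION;            -- durable once committed
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;      -- undo partial work on any failure
    THROW;
END CATCH;
```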
For more information see these links:
- Transactions and concurrency
- Transaction Fundamentals
- Stored procedures, triggers, and user-defined functions
- Transactions and Concurrency
- Transactions (MFC Data Access)
Describe features of analytical workloads
- Analytical workloads focus on processing and analyzing large volumes of data to gain insights and support business decisions. They typically involve reading, aggregating, and summarizing data from various sources, rather than frequent updates or transactions.
- These workloads use column-based storage and distributed, parallel processing, which make it efficient to scan and compute over big datasets. This design is optimized for queries that search, filter, or aggregate across many records, such as business intelligence, reporting, or data science activities.
- To avoid impacting critical transactional operations, analytical workloads are often separated from transactional systems. Technologies like Azure Synapse Analytics and Azure Cosmos DB Analytical Store enable organizations to run analysis on near real-time operational data without slowing down the systems that support daily business functions.
Example: A retail company collects sales transactions from thousands of stores. Managers use analytical workloads to analyze weekly sales trends, identify popular products, and forecast demand, helping make inventory and marketing decisions—all without affecting the systems processing customer purchases in real-time.
Use Case: On Azure, a beginner can enable Azure Synapse Link with the Azure Cosmos DB analytical store to run analytical queries directly on operational data (such as sales or customer activity) using Synapse Studio. This approach lets users create interactive dashboards or generate reports for business intelligence, while keeping the performance of day-to-day applications high.
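Analytical queries read and summarize many rows at once rather than updating a few. A hedged T-SQL sketch of the weekly sales-trend analysis described above (table and column names are illustrative):

```sql
-- Aggregate sales by week and product across all stores.
SELECT DATEPART(year, SaleDate) AS SalesYear,
       DATEPART(week, SaleDate) AS SalesWeek,
       ProductId,
       SUM(Quantity)  AS UnitsSold,
       SUM(LineTotal) AS Revenue
FROM Sales
GROUP BY DATEPART(year, SaleDate), DATEPART(week, SaleDate), ProductId
ORDER BY SalesYear, SalesWeek, Revenue DESC;
```

Columnar storage makes scans like this fast because only the referenced columns need to be read.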
For more information see these links:
- Data platform considerations for mission-critical workloads on Azure
- Data science and analytical applications workload in Visual Studio
- What is Azure Cosmos DB analytical store?
Identify roles and responsibilities for data workloads
Describe responsibilities for database administrators
- Perform regular database backups and restores to protect data from loss or corruption. Database administrators create, manage, and test backup plans to ensure data can be restored in case of disasters or accidental deletion.
- Monitor and tune database performance, ensuring that queries run efficiently and resources are optimally used. This includes rebuilding indexes, updating statistics, and analyzing resource usage to prevent slow response times.
- Maintain database security by managing user permissions, roles, and access controls. Database administrators ensure that only authorized users can access or modify sensitive data, and they implement best practices to comply with security standards.
- Plan and implement maintenance tasks, such as database consistency checks, data compression, and updating indexes. These tasks help keep databases healthy and performing well.
- Use specialized tools (such as SQL Server Management Studio, sqlcmd, or maintenance plans) to automate, schedule, and report on ongoing database management tasks, minimizing manual errors and improving efficiency.
Example: A database administrator at a retail company schedules nightly backups using SQL Server Management Studio. If a hardware failure occurs, they can use the latest backup to restore the sales database, ensuring business continues with minimal disruption.
Use Case: In an Azure Data environment for a small startup, a new database administrator sets up automated maintenance plans to run weekly index rebuilds and nightly full backups using free tools like SQL Server Management Studio. This helps the startup prevent data loss and maintain fast database performance as their customer base grows.
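Many of these maintenance tasks can be scripted in T-SQL on SQL Server; a minimal sketch (the database, backup path, and table names are illustrative):

```sql
-- Nightly full backup with integrity checking.
BACKUP DATABASE SalesDb
TO DISK = N'D:\Backups\SalesDb_Full.bak'
WITH CHECKSUM, INIT;

-- Weekly maintenance: verify consistency, then rebuild indexes.
DBCC CHECKDB (SalesDb);
ALTER INDEX ALL ON dbo.Orders REBUILD;
```

Note that Azure SQL Database takes backups automatically, so scripted backups like this apply mainly to SQL Server on-premises or on virtual machines.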
For more information see these links:
- SQL tools overview
- BizTalk Server Roles and Responsibilities
- Maintenance plans
- Options in the Back Up Database Task for Maintenance Plan
- Back Up and Restore of SQL Server Databases
Describe responsibilities for data engineers
- Design and build data pipelines: Data engineers create and manage processes that move and transform data from various sources (such as databases, files, and streaming data) into formats suitable for analysis and reporting. This often involves developing Extract, Transform, Load (ETL) pipelines using tools like Azure Data Factory.
- Maintain and optimize data storage: Data engineers set up, monitor, and tune data storage solutions, including relational databases, data lakes, and data warehouses such as Azure Data Lake Storage and Azure Synapse Analytics. They ensure data is stored securely, efficiently, and can be accessed quickly by those who need it.
- Ensure data quality, security, and compliance: Data engineers are responsible for cleaning, validating, and securing data throughout its lifecycle. This includes applying data privacy standards, checking for errors or inconsistencies in data, and monitoring pipelines to ensure data delivery aligns with organizational and legal requirements.
- Collaborate with other roles: Data engineers work closely with data analysts, data scientists, and business stakeholders to deliver data solutions that meet business needs. They provide the foundation that allows others to generate insights and make data-driven decisions.
- Monitor and troubleshoot data systems: Data engineers regularly monitor the performance and reliability of data pipelines and storage, resolving technical issues and optimizing processes to support business operations and analytics.
Example: A data engineer in an IT company sets up an Azure Data Factory pipeline to automatically collect customer activity logs from several application servers, cleans the data by removing duplicates, transforms it by standardizing formats, and loads it into an Azure Synapse Analytics data warehouse so business analysts can generate usage reports.
Use Case: An Azure data engineer is tasked with integrating multiple transactional databases from different departments into a centralized analytics workspace using Azure Synapse Analytics. This involves building secure data pipelines, transforming data formats, monitoring for data quality issues, and ensuring only authorized users can access sensitive company information.
For more information see these links:
- Synapse implementation success methodology: Evaluate team skill sets
- Discover Microsoft Azure learning, training, certifications, and career path opportunities
- Get started with data engineering on Azure - Training
- Microsoft Certified: Azure Data Engineer Associate - Certifications
- Course DP-203T00-A: Data Engineering on Microsoft Azure - Training
Describe responsibilities for data analysts
- Data analysts are responsible for cleaning and preparing raw data, ensuring data is accurate and consistent before analysis. This involves removing errors, handling missing values, and standardizing data formats.
- They design and build data models that organize information in a way that makes it easier to analyze and report on. This includes creating tables, charts, graphs, dashboards, and other visualizations to help stakeholders understand trends and patterns in the data.
- Data analysts translate business requirements into actionable reports and insights. They work closely with business teams to identify the most important questions and use data to find answers, supporting decision-making processes.
- They manage data access and usage by configuring permissions and monitoring who uses data, helping ensure data privacy and compliance with organizational policies.
Example: Imagine a company wants to improve the efficiency of its online sales. A data analyst collects all sales data from the past six months, cleans it to remove errors, and then creates dashboards showing daily sales trends, top-selling products, and peak sales times. Their reports help the marketing and sales teams decide when to launch promotions and which products to feature.
Use Case: A new Azure Data analyst in an IT organization uses Azure Synapse Analytics to connect to multiple databases, cleans incoming customer support ticket data, builds visual dashboards that show the most common issues, and shares these insights with the IT support team so they can prioritize fixes and improve response times.
For more information see these links:
- Roles and Responsibilities
- Synapse implementation success methodology: Evaluate team skill sets
Identify considerations for relational data on Azure (20–25%)
Describe relational concepts
Identify features of relational data
- Structured data in tables: Relational data is organized into tables, where each table consists of rows and columns. Each row represents a record and each column represents a data attribute, such as ‘Name’ or ‘Email’. This structure makes it easy to manage and query data.
- Relationships enforced with keys: Relationships between data in different tables are managed by keys. A primary key uniquely identifies each record in a table, and foreign keys create links between tables. For instance, an ‘Order’ table may have a foreign key pointing to the ‘Customer’ table.
- Data integrity through constraints: Relational databases use constraints like unique, not null, and foreign key constraints. These rules ensure data accuracy, prevent duplicate or invalid entries, and enforce connections between related data.
- Schema-on-write and normalization: The data structure (schema) is defined ahead of time, and all data must follow this structure. Normalization splits data into related tables to avoid redundancy, making updates efficient and minimizing storage usage.
- ACID transactions for consistency: Relational databases guarantee data consistency with ACID properties (Atomicity, Consistency, Isolation, Durability). This ensures that updates and queries are reliable and data remains consistent even during errors or high-traffic situations.
Example: In an online store, information about customers is stored in a ‘Customers’ table, while their orders are kept in an ‘Orders’ table. Each order references the customer’s unique ID using a foreign key, allowing the database to link and retrieve all orders for a specific customer easily.
Use Case: A beginner Azure Data Engineer wants to create a secure, reliable sales reporting system using Azure SQL Database. By defining separate tables for products, customers, and orders, including relationships via primary and foreign keys, they can safely track sales transactions, produce accurate reports, and ensure data integrity.
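A hedged T-SQL sketch of the Customers/Orders relationship described above, showing keys and constraints (all names are illustrative):

```sql
CREATE TABLE Customers (
    CustomerID int NOT NULL PRIMARY KEY,       -- uniquely identifies each customer
    Email      nvarchar(255) NOT NULL UNIQUE   -- constraint prevents duplicates
);

CREATE TABLE Orders (
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        REFERENCES Customers (CustomerID),     -- foreign key links the tables
    OrderDate  date NOT NULL
);

-- Retrieve all orders for one customer by joining on the key.
SELECT o.OrderID, o.OrderDate
FROM Orders AS o
JOIN Customers AS c ON c.CustomerID = o.CustomerID
WHERE c.Email = 'user@example.com';
```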
For more information see these links:
- Understand data store models
- Describe concepts of relational data - Training
- Data modeling in Azure Cosmos DB
- Modeling relationships
Describe normalization and why it is used
- Normalization is the process of organizing data in a relational database to minimize redundancy (duplicate data) and improve data integrity.
- It involves breaking down large tables into smaller, related tables and establishing relationships using keys, following formal rules known as normal forms.
- By using normalization, you ensure that data updates, deletions, and insertions are easier to manage and less likely to cause inconsistencies or data errors.
Example: Suppose you are storing customer orders in Azure SQL Database. Without normalization, each order record might repeat the same customer address information. With normalization, you separate customer information into a Customers table and orders into an Orders table, connecting them by a customer ID. This way, if a customer updates their address, you only need to change it in one place.
Use Case: A company using Azure Database for MySQL to store sales data splits customer details and order details into separate tables. This setup reduces data duplication and makes it easier for their data analysts to query accurate customer statistics and correctly join records for Power BI reports.
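Continuing the address example: once customer details live in their own table, an address change is a single-row update rather than a change to every order. A minimal sketch (names are illustrative):

```sql
-- Normalized design: the address is stored once, in Customers.
UPDATE Customers
SET Address = N'42 New Street'
WHERE CustomerID = 123;

-- Every order 'sees' the new address automatically through the join.
SELECT o.OrderID, c.Address
FROM Orders AS o
JOIN Customers AS c ON c.CustomerID = o.CustomerID
WHERE c.CustomerID = 123;
```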
For more information see these links:
- Description of the database normalization basics
- Glossary
- Data unification best practices
- Understand data store models
- Understand star schema and the importance for Power BI
Identify common structured query language (SQL) statements
- The SELECT statement is used to retrieve data from a database. It allows you to specify which columns and rows you want to see, using optional clauses like WHERE to filter results, and ORDER BY to sort them.
- The INSERT statement enables you to add new records to a table. You specify which table you want to insert into and provide the corresponding values for the columns.
- The UPDATE statement is used to modify existing data in a table. You can select which records to update using the WHERE clause and set new values for one or more columns.
- The DELETE statement removes data from a table. With an optional WHERE clause, you control which records are deleted, helping prevent accidental data loss.
- The CREATE statement is for creating new database objects, such as tables or databases. It defines the structure for storing data, including column names and data types.
Example: Imagine you have an ‘Employees’ table in your Azure SQL Database. To list the names and job titles of all employees, you can use: SELECT Name, JobTitle FROM Employees;
Use Case: A new Azure Data professional needs to update an employee’s contact information in a company database. They would use the UPDATE statement to change the employee’s phone number in the ‘Employees’ table based on their EmployeeID.
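Short T-SQL forms of each statement described above (table and column names are illustrative):

```sql
-- SELECT: retrieve filtered, sorted rows.
SELECT Name, JobTitle FROM Employees
WHERE Department = 'IT'
ORDER BY Name;

-- INSERT: add a new record.
INSERT INTO Employees (EmployeeID, Name, JobTitle, Phone)
VALUES (1042, N'Avery Smith', N'Engineer', '555-0100');

-- UPDATE: change one employee's phone number (the use case above).
UPDATE Employees SET Phone = '555-0199'
WHERE EmployeeID = 1042;

-- DELETE: remove only the rows matched by the WHERE clause.
DELETE FROM Employees WHERE EmployeeID = 1042;

-- CREATE: define a new table's structure.
CREATE TABLE Departments (
    DepartmentID int NOT NULL PRIMARY KEY,
    Name         nvarchar(50) NOT NULL
);
```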
For more information see these links:
- SQL
- More Structured Query Language
- Querying the Index with Windows Search SQL Syntax
- UPDATE, DELETE, and INSERT Statements
- SQL language reference
Identify common database objects
- Tables: Tables are the core objects in a relational database where data is stored in rows and columns. They help organize and manage large amounts of information efficiently. Each table usually has a unique name and is made up of fields (columns) describing different types of data.
- Views: A view is a saved query or virtual table based on one or more tables. Views make it easier to display or access specific subsets of data without duplicating data, and can help present data in a simplified way to users.
- Stored Procedures and Functions: These objects contain reusable sets of SQL statements (logic) that automate routine data processing tasks. Stored procedures can perform operations such as adding, updating, or deleting data, while functions typically return a value and are often used for calculations or transformations across data.
- Triggers: Triggers are automatic actions that are executed in response to specific changes in a table (like inserting, updating, or deleting rows). They help enforce business rules or maintain data consistency without manual intervention.
- Indexes: Indexes are objects that improve the speed of data retrieval from tables. They are similar to an index in a book—helping you quickly find the information you need by creating pointers to the data.
Example: In a company’s Azure database, the ‘Employees’ table may store information like employee names and departments. A view called ‘ActiveEmployees’ could show only those currently employed, while a stored procedure called ‘AddNewEmployee’ automates adding a new record. A trigger could ensure every new hire receives a default department. An index on the ‘LastName’ column would help quickly look up employees by their last name.
Use Case: When migrating a MySQL database to Azure Database for MySQL, you need to identify and migrate all relevant database objects—like tables, views, stored procedures, triggers, and indexes—to ensure your applications continue working as expected on the Azure platform. Azure’s migration tools allow you to select these objects individually for migration, providing flexibility and accuracy in the migration process.
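Hedged T-SQL sketches of the objects from the example above (all names are illustrative, and EmployeeID is assumed to be an IDENTITY column):

```sql
-- View: a saved query that shows only current staff.
CREATE VIEW ActiveEmployees AS
SELECT EmployeeID, Name, Department
FROM Employees
WHERE IsActive = 1;
GO

-- Stored procedure: reusable logic for adding a record.
CREATE PROCEDURE AddNewEmployee
    @Name nvarchar(100), @Department nvarchar(50)
AS
INSERT INTO Employees (Name, Department, IsActive)
VALUES (@Name, @Department, 1);
GO

-- Trigger: give new hires without a department a default one.
CREATE TRIGGER trgDefaultDepartment ON Employees
AFTER INSERT AS
UPDATE e SET Department = N'Unassigned'
FROM Employees AS e
JOIN inserted AS i ON i.EmployeeID = e.EmployeeID
WHERE e.Department IS NULL;
GO

-- Index: speed up lookups by last name.
CREATE INDEX IX_Employees_LastName ON Employees (LastName);
```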
For more information see these links:
- MySQL to Azure Database for MySQL Data Migration - MySQL Schema Migration
- Publish Data and Database Objects
- Database objects in Azure Databricks
- Convert Oracle schemas (OracleToSQL)
- Convert MySQL Databases (MySQLToSQL)
Describe relational Azure data services
Describe the Azure SQL family of products including Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machines
- The Azure SQL family includes three main options for running SQL Server databases in the cloud: Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machines (VMs). Each service offers different levels of management and compatibility.
- Azure SQL Database is a fully managed database service designed for modern cloud applications. It automatically handles backups, updates, and scaling, making it easy for beginners to get started without worrying about database maintenance.
- Azure SQL Managed Instance provides a managed environment with almost full compatibility with traditional SQL Server. It’s ideal for organizations moving existing SQL Server workloads to the cloud with minimal changes, while enjoying built-in security, patching, and high availability.
- SQL Server on Azure Virtual Machines offers complete control of the SQL Server engine and the underlying OS. This option is best for applications that require custom configurations or need access to features not available in managed services. However, you are responsible for maintenance tasks like patching and backups.
- Choosing between these options depends on your needs: if you want minimal admin and cost efficiency, Azure SQL Database is suitable; for easy migration with high compatibility, choose Azure SQL Managed Instance; for full control or specialized setups, use SQL Server on Azure VM.
Example: An IT company wants to build a new cloud-based business application that stores user data. They choose Azure SQL Database for its ease of use and managed features, so they can focus on developing their application rather than managing the database infrastructure.
Use Case: A small business currently uses SQL Server on-premises for inventory management. To reduce hardware costs and improve scalability, they migrate their database to Azure SQL Managed Instance, gaining cloud benefits with minimal changes to their existing system and processes.
For more information see these links:
- What is Azure SQL?
- What is Azure SQL Database?
Identify Azure database services for open-source database systems
- Azure offers fully managed database services for popular open-source relational databases, removing the need for users to manage, patch, or back up underlying infrastructure themselves.
- Key Azure open-source relational database services include Azure Database for MySQL, Azure Database for PostgreSQL, and Azure Database for MariaDB. Each service is based on the respective community edition.
- These managed services provide high availability, automatic scalability, security features (including integration with Microsoft Defender for Cloud), and seamless integration with Azure development tools and platforms.
- Migrating existing open-source databases to Azure is straightforward using tools like Azure Database Migration Service, and you can continue to use familiar client libraries and protocols.
- Choosing managed open-source databases in Azure helps organizations avoid vendor lock-in and leverage the strong community support and extension ecosystem for PostgreSQL, MySQL, and MariaDB.
Example: A small IT company wants to launch a web application that relies on the popular LAMP stack (Linux, Apache, MySQL, PHP). Instead of installing and maintaining their own MySQL server, they use Azure Database for MySQL, which lets them focus on building features while Azure handles backups, updates, and security.
Use Case: A new Azure data engineer is tasked with modernizing their organization’s reporting platform, which currently runs on an on-premises PostgreSQL database. By migrating the system to Azure Database for PostgreSQL, they enable automatic scaling during busy reporting periods, reduce downtime through built-in high availability, and ensure better security by enabling Microsoft Defender for Cloud.
For more information see these links:
- SQL vs. NoSQL data
- Key Azure services for developers
- Enable Defender for open-source relational databases on Azure
- Respond to Defender open-source database alerts
Describe considerations for working with non-relational data on Azure (15–20%)
Describe capabilities of Azure storage
Describe Azure Blob storage
- Azure Blob Storage is a cloud-based service for storing large amounts of unstructured data, such as documents, images, videos, and backups. It is highly scalable and accessible over HTTP or HTTPS, making it ideal for applications that need to store and retrieve data from anywhere.
- There are three types of blobs in Azure Blob Storage: Block Blobs (optimized for efficient uploading and storing of large files like documents and media), Append Blobs (optimized for append operations, perfect for logging), and Page Blobs (designed for frequent, random read/write operations, commonly used for virtual machine disks and databases).
- Azure Blob Storage supports different access tiers for managing costs: Hot (frequent access), Cool (infrequent access), and Archive (rare access). Access tiers apply to block blobs, and you choose the tier based on how often the data will be read; append and page blobs do not support access tiers.
- Data stored in Azure Blob Storage is durable and highly available due to built-in data redundancy options. Azure provides REST API, SDKs, and CLI tools for developers to interact with blobs, and supports features like snapshots, metadata, and lifecycle management.
- You can convert between blob types using tools like AzCopy, PowerShell, or Azure CLI. For example, you might convert an Append or Page Blob to a Block Blob if you need access to cost-efficient tiers or features only available for Block Blobs.
Example: A small IT consulting firm uses Azure Blob Storage to save customer invoices and project documents. They upload PDFs and Word files to Block Blob storage, ensuring they are safely stored and accessible to employees regardless of their location.
Use Case: An IT team new to Azure wants to set up cloud-based backup for virtual machines. They use Azure Page Blobs to store virtual machine disks, enabling reliable, fast, and scalable backup storage that can be quickly restored or copied to new VMs as needed.
For more information see these links:
- Overview of Azure page blobs
- Convert append blobs and page blobs into block blobs
- Access tiers for blob data
Describe Azure File storage
- Azure File storage is a cloud-based file share service that allows you to create shared folders accessible over the network, similar to traditional file servers, but hosted in Azure.
- It supports two industry-standard protocols: Server Message Block (SMB) and Network File System (NFS). This flexibility allows both Windows (via SMB) and Linux (via NFS) clients to access files, making it suitable for a wide range of applications.
- Azure File shares are scalable, enabling thousands of users or client machines to access the same file share concurrently. This is ideal for teams needing to collaborate on files or for applications that require shared storage.
- All file transfers can be secured with encryption in transit, helping to protect your data from unauthorized access over the network.
- Azure File storage is easy to set up and manage, allowing you to mount a cloud file share on your computers or virtual machines almost as easily as connecting to a traditional network share.
Example: Suppose an IT department wants to provide a shared folder for their team to store project documents. Instead of setting up and maintaining a physical file server, they create an Azure File share. Team members can access this cloud folder from their Windows and Linux computers using standard methods, just as if it were hosted locally.
Use Case: A beginner Azure admin sets up an Azure File share to allow multiple developers working on a project to store and access source code and documentation from both cloud-based and on-premises machines. This simplifies collaboration and centralizes version control for files.
For more information see these links:
- NFS Azure file shares
- Azure Files networking considerations
- SMB Azure file shares
- Mount an Azure file share on macOS
- Plan to deploy Azure Files
Describe Azure Table storage
- Azure Table storage is a NoSQL data service that stores large amounts of structured, non-relational data, making it suitable for scenarios where you need to quickly store and retrieve information without complex relationships or joins.
- Table storage organizes data as tables made up of entities (rows), where each entity contains properties (columns). The structure is flexible—entities in the same table are not required to have the same set of properties, making it easy to adapt data as needs change.
- Table storage is designed for scalability and cost-effectiveness. It can efficiently handle huge datasets and is usually less expensive than traditional relational databases for similar data volumes.
- You can securely access Table storage data from both within and outside Azure, using REST APIs, OData protocol, or .NET libraries, enabling a range of application scenarios from web apps to IoT.
- Reliability in Azure Table storage is ensured by integrated redundancy options and geo-replication, which help protect your data and maintain high availability even in the case of failures.
Example: Imagine a company building a web application to store user profile data. Each user’s profile includes a username, email, registration date, and other optional details like a profile picture or loyalty status. Azure Table storage lets the company save each user’s information as an entity in a table, with flexibility to easily add new properties over time without restructuring the database.
Use Case: An IT team wants to collect and track device information throughout their network—such as device IDs, locations, status flags, and last check-in times. With Azure Table storage, they can log this data directly from devices, scale up as the number of devices grows, and quickly query for devices that need attention using simple search filters.
For more information see these links:
- What is Azure Table storage?
- Browse and manage storage resources by using Server Explorer
- Reliability in Azure Table Storage
- Introduction to Azure Storage
Describe capabilities and features of Azure Cosmos DB
Identify use cases for Azure Cosmos DB
- Azure Cosmos DB enables highly responsive, globally distributed applications by automatically replicating data across Azure regions, ensuring data is close to users for low-latency access.
- Cosmos DB is ideal for scenarios requiring elastic scalability, supporting sudden changes in demand by scaling throughput and storage independently, which is useful for apps that experience variable or high traffic like web, mobile, and IoT platforms.
- It supports multiple data models (document, key-value, graph, columnar), making it suitable for a range of applications—such as storing JSON documents for web data, user profiles, or product catalogs, without strict relational schema constraints.
Example: A global e-commerce website uses Azure Cosmos DB to store product catalog information. By replicating data across multiple Azure regions, shoppers from any country can access product details quickly and the system handles traffic spikes during sales events easily.
Use Case: An IT team new to Azure Data builds a mobile app that tracks device health and usage for clients worldwide. Using Cosmos DB, they store device telemetry data centrally, ensure fast access regardless of user location, and easily scale as client numbers grow.
For more information see these links:
- Common Azure Cosmos DB use cases
- Why use Azure Cosmos DB for NoSQL for your AI applications?
- Distribute your data globally with Azure Cosmos DB
- Azure Cosmos DB as a key value store – cost overview
- Use NoSQL databases as a persistence infrastructure
Describe Azure Cosmos DB APIs
- Azure Cosmos DB supports multiple APIs, including NoSQL, MongoDB, Cassandra, Gremlin, Table, and PostgreSQL. Each API lets your application interact with Azure Cosmos DB using familiar protocols and query languages from popular databases.
- These APIs allow you to use different data models, such as documents (NoSQL, MongoDB), key-value pairs (Table), graphs (Gremlin), and column-family structures (Cassandra). You can choose the API that best fits your application’s data needs and your team’s existing database skills.
- Most APIs in Azure Cosmos DB are wire-compatible with open-source databases, making it easier to migrate existing applications without major changes. This means you can use existing tools, drivers, and skills while benefiting from Cosmos DB features like global distribution and scalable performance.
- Regardless of the API chosen, all operations in Cosmos DB are measured by Request Units (RUs), which abstract the compute, memory, and I/O needed for queries and transactions. This makes cost and performance management simple and consistent across APIs.
Example: A retail company is using MongoDB for its online shopping application. They want to move to Azure for better scalability but keep their existing application logic. By selecting the MongoDB API in Azure Cosmos DB, they can migrate their data without rewriting code, and continue using their MongoDB drivers and queries.
Use Case: An IT team at a new Azure Data client organization wants to build a new social networking feature that models user relationships and connections. They choose the Gremlin API in Azure Cosmos DB to work with graph data, letting them easily create, query, and visualize user connections as nodes and edges, all while leveraging Azure’s cloud scalability and management.
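For instance, with the API for NoSQL you query JSON items using a SQL-like query language. A minimal sketch (the container alias c and property names are hypothetical):

```sql
-- Find products in one category; properties are JSON fields, not table columns.
SELECT c.id, c.name, c.price
FROM c
WHERE c.category = 'bikes'
ORDER BY c.price DESC
```

The same request consumes a measurable number of Request Units regardless of which API issued it.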
For more information see these links:
- Choose an API in Azure Cosmos DB
- Try Azure Cosmos DB free
- Find the request unit charge for operations executed in Azure Cosmos DB for Apache Cassandra
- Find the request unit charge for operations executed in Azure Cosmos DB for Gremlin
Describe an analytics workload (25–30%)
Describe common elements of large-scale analytics
Describe considerations for data ingestion and processing
- Choose the right ingestion method: Decide between batch and streaming ingestion based on your needs. Batch ingestion is best for processing large amounts of data at scheduled intervals, while streaming ingestion is ideal for real-time or near real-time data requirements.
- Plan for data validation and transformation: As data is ingested, it should be validated for quality and integrity. After ingestion, use ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) processes to clean, format, and structure the data so it’s usable for analytics.
- Optimize scalability and reliability: Use parallel processing, data partitioning, and schema-on-read techniques to efficiently process large and diverse datasets. Ensure your pipelines can handle increases in data volume and recover from failures without losing data.
Example: An IT company collects website activity logs from multiple servers. Each day, logs are ingested in batches into Azure Data Lake Storage. Before storage, logs are checked for missing fields (validation), and then transformed to a standard structure, making them easy to query later for user behavior analytics.
Use Case: A new Azure Data engineer sets up a workflow to ingest sales data from various departmental databases into Azure Data Lake using Azure Data Factory (batch) and sets up streaming ingestion from an IoT device using Azure Event Hubs. The engineer applies automated validations to prevent corrupted data and partitions files by day to improve performance, following best practices for both batch and streaming data scenarios.
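As one illustration of the validation-and-transformation step, here is a hedged ELT sketch in T-SQL, assuming hypothetical staging and warehouse schemas:

```sql
-- 1. Extract/Load: raw rows land in a permissive staging table first.
INSERT INTO staging.Sales (StoreId, SaleDate, Amount)
SELECT StoreId, SaleDate, Amount
FROM src.DailySalesFeed;

-- 2. Validate: quarantine rows with missing or impossible values.
INSERT INTO staging.RejectedSales (StoreId, SaleDate, Amount)
SELECT StoreId, SaleDate, Amount
FROM staging.Sales
WHERE StoreId IS NULL OR Amount < 0;

-- 3. Transform/Load: move only clean, standardized rows into the warehouse.
INSERT INTO dw.FactSales (StoreId, SaleDate, Amount)
SELECT StoreId, CAST(SaleDate AS date), Amount
FROM staging.Sales
WHERE StoreId IS NOT NULL AND Amount >= 0;
```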
For more information see these links:
- Batch vs. streaming data processing in Azure Databricks
- Data platform for AI workloads on Azure
- Big data architecture style
- Azure Data Explorer data ingestion overview
- Implementing the Medallion architecture using a data lake
Describe options for analytical data stores
- Analytical data stores are databases or services designed for storing and analyzing large volumes of data, typically optimized for reading across many records rather than frequent writing. In Azure, popular options include Azure Synapse Analytics, Azure Data Explorer, and Azure Databricks.
- Data lakes, such as Azure Data Lake Storage, are flexible storage locations for massive amounts of structured and unstructured data. They act as central repositories where data from various sources is stored, often in its raw form, allowing batch and real-time processing for analytics.
- Columnar storage engines, such as the dedicated SQL pools in Azure Synapse Analytics (and columnstore indexes in Azure SQL Database), provide fast query performance for analytical workloads by organizing data in columns rather than rows. This makes tasks like reporting, dashboarding, and aggregating data much quicker and more efficient than traditional row-oriented databases.
- Big data platforms, such as Azure Databricks or HDInsight (with tools like Spark and Hadoop), enable advanced analytics, machine learning, and interactive exploration of massive datasets. These platforms are ideal for scenarios where data size and complexity exceed the capabilities of traditional databases.
Example: A company collects website logs from millions of users. Instead of storing all this raw data in a regular SQL database, they use Azure Data Lake Storage, which can handle massive and varied data formats. Later, analysts use Azure Synapse Analytics to quickly run queries to uncover user trends and patterns.
Use Case: Suppose an IT team in a large retail company wants to analyze sales transactions from multiple stores every day. They use Azure Synapse Analytics to import all transactional data and then create dashboards that visualize top-selling products, busy store hours, and regional performance, helping them make data-driven decisions.
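As an illustration of querying a data lake in place, Azure Synapse serverless SQL pools can read Parquet files directly with OPENROWSET; a minimal sketch (the storage URL and column names are hypothetical):

```sql
-- Query raw Parquet log files in the data lake without loading them first.
SELECT UserId, COUNT(*) AS PageViews
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/logs/web/*.parquet',
    FORMAT = 'PARQUET'
) AS weblogs
GROUP BY UserId;
```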
For more information see these links:
- Data store classification
- Big data architecture style
- Big data architectures
- Choose an analytical data store in Azure
- Relational database technologies on Azure and AWS
Describe Microsoft cloud services for large-scale analytics, including Azure Databricks and Microsoft Fabric
- Microsoft offers cloud-based analytics services designed for handling and analyzing large volumes of data. These services help organizations scale their analytics needs without worrying about managing physical infrastructure.
- Azure Databricks is a managed analytics platform based on Apache Spark. It enables data engineers and scientists to process, transform, and collaborate on big data projects using familiar languages like Python, SQL, and R. It provides optimized Spark performance, versioning, and easy integration with other Azure services.
- Microsoft Fabric is an all-in-one analytics platform that unifies data ingestion, data engineering, storage, transformation, and reporting. Fabric simplifies the management of end-to-end analytics workflows and makes it easier for teams to collaborate, securely share, and analyze data at scale. Fabric supports advanced security, integrated AI tools, and works smoothly with Microsoft 365 applications.
Example: A retail company collects sales and inventory data from hundreds of stores every day. With Microsoft Fabric, the company brings together this diverse data into one platform, uses Azure Databricks to process and clean the data, and then generates dashboards in Power BI to help store managers track inventory in real time. The whole process runs automatically in the cloud without manual intervention.
Use Case: A business analyst who is new to Azure wants to analyze customer buying patterns to improve product recommendations. They import sales data into Microsoft Fabric, preprocess the data using Azure Databricks (which automatically scales to handle their large dataset), and visualize trends in Power BI. This helps identify which products are often bought together, guiding marketing and stock decisions.
For more information see these links:
- Choose a batch processing technology in Azure
- Microsoft Fabric
- Data and AI
- Relational database technologies on Azure and AWS
Describe considerations for real-time data analytics
Describe the difference between batch and streaming data
- Batch data processing collects and processes large amounts of data at once, typically at scheduled intervals (e.g., hourly, daily). It is well-suited for scenarios where up-to-date information is not required immediately.
- Streaming data processing handles data in real time as it arrives. It processes events continuously, enabling near-instant insights and actions based on the most current data.
- Batch processing is simpler to manage and can handle delayed or missing data by reprocessing large data segments, but it cannot provide immediate results. Streaming processing is ideal for quick decision-making and real-time monitoring but can be more complex to set up, especially when handling late-arriving or out-of-order data.
- In Azure, components like Azure Data Lake and Azure Databricks are commonly used for batch processing, whereas tools like Azure Stream Analytics and Azure Event Hubs are used for streaming data.
- Choosing between batch and streaming depends on your project requirements: use batch for historical reporting and analytics, and streaming for real-time dashboards, monitoring, or alerting.
Example: Imagine an online retail store. At the end of each day, the company runs a batch process to analyze the day’s total sales and generate reports. In contrast, streaming processing updates a real-time dashboard every few seconds to show live sales as customers make purchases.
Use Case: As a new Azure Data professional, you might set up a streaming pipeline in Azure Stream Analytics to monitor website click events in real time for immediate detection of suspicious behavior, while using batch processing in Azure Data Lake to create summary reports of weekly website traffic trends.
For more information see these links:
- Big data architectures
- Batch vs. streaming data processing in Azure Databricks
- Big data architecture style
- Streaming dataflows (preview)
Identify Microsoft cloud services for real-time analytics
- Azure Stream Analytics enables real-time data processing from sources like IoT devices, applications, and websites. It can analyze data as it arrives and push results to other services for more detailed analytics or visualization.
- Azure Synapse Analytics provides a platform to store, process, and analyze data at scale. You can output real-time data from Azure Stream Analytics jobs directly into dedicated SQL pool tables in Synapse Analytics for enterprise-level reporting and dashboarding.
- Azure Data Explorer is designed for low-latency, high-throughput analytics of large volumes of streaming data, such as logs and telemetry. It connects easily to Azure Stream Analytics, allowing for interactive queries and exploration of hot-path streaming data.
- Configuring Azure cloud services for real-time analytics is straightforward: set up your input (such as Event Hubs or IoT Hub), create and configure Azure Stream Analytics jobs for desired transformations, and select output destinations such as Synapse Analytics or Data Explorer.
- These services are scalable and managed by Microsoft, so beginners don’t have to worry about managing infrastructure or complex setup processes. You just configure jobs and outputs using simple steps in the Azure Portal.
Example: A retail company tracks sales transactions from its stores in real time using IoT sensors and sends these events to Azure Event Hubs. Azure Stream Analytics processes this data instantly and pushes summary data, such as top selling items and hourly sales totals, into Azure Synapse Analytics for live dashboards and business reports.
Use Case: As an IT professional new to Azure Data, you can set up Azure Stream Analytics jobs to monitor traffic on a company website. These jobs detect sudden surges in traffic and possible anomalies, then send the analyzed streaming output to Azure Data Explorer for interactive investigation or to Synapse Analytics for automatic reporting within minutes of events occurring.
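Stream Analytics jobs themselves are written in a SQL-based query language. A minimal sketch of the hourly sales aggregation described above (the input, output, and field names are hypothetical):

```sql
-- Read events from Event Hubs, aggregate per hour, write to Synapse.
SELECT
    ItemId,
    COUNT(*)           AS SalesCount,
    SUM(Amount)        AS HourlyTotal,
    System.Timestamp() AS WindowEnd
INTO SynapseOutput
FROM EventHubInput TIMESTAMP BY SaleTime
GROUP BY ItemId, TumblingWindow(hour, 1)
```

TumblingWindow splits the stream into fixed, non-overlapping one-hour windows, so each event is counted exactly once.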
For more information see these links:
- Azure Synapse Analytics output from Azure Stream Analytics
- Azure Data Explorer output from Azure Stream Analytics
- Choose a real-time analytics and streaming processing technology on Azure
- Ingest data from Azure Stream Analytics into Azure Data Explorer
Describe data visualization in Microsoft Power BI
Identify capabilities of Power BI
- Connect to Multiple Data Sources: Power BI lets you easily connect to a wide range of data sources, such as databases, cloud services, online data, and Excel files. This makes it simple to bring all your relevant data together in one place for analysis.
- Create Interactive Visualizations: With Power BI, you can turn your raw data into interactive reports and dashboards using built-in and custom visuals, such as charts, maps, and graphs. These visuals are dynamic, meaning you can click and filter data right in the reports to explore it from different angles.
- Collaborate and Share Insights: Power BI allows you to work together with team members by sharing reports and dashboards in the cloud. You can set up automatic data refresh, schedule email updates, and let others view or interact with your visualizations securely, all through a web browser or mobile app.
- Data Shaping and Modeling: Power BI includes tools like Power Query Editor to clean, transform, and combine data before creating reports. It also lets you build relationships between data tables and create simple calculations using DAX (Data Analysis Expressions).
- Real-Time Data and Alerts: Power BI can display live data and trigger alerts if certain conditions are met (for example, when a value goes above a set limit). This helps you keep up with changes as they happen, supporting faster decision-making.
Example: An IT team in a company uses Power BI to connect to both their cloud-based ticketing system (like Azure DevOps) and an Excel file with local helpdesk statistics. They create a dashboard showing support ticket trends, average response times, and current workload, updating automatically each day.
Use Case: A new Azure Data professional sets up a Power BI report that pulls live data from Azure SQL Database to monitor system performance metrics, such as CPU usage and network latency. They share this report with their team, and set up an alert to notify them instantly if CPU usage spikes above a safe threshold, enabling proactive issue resolution.
For more information see these links:
- What is Power BI
- Features available through the Power BI integration
- The Power BI service for business users
- Previous monthly updates to Power BI Desktop and the Power BI service
Describe features of data models in Power BI
- Power BI data models help organize and structure data from different sources, allowing users to combine, relate, and analyze information easily in a single report. These models transform raw data into a format that supports clear and interactive data visualizations.
- Key features of Power BI data models include defining relationships between tables, creating calculated columns and measures using DAX (Data Analysis Expressions), and enabling filtering and data slicing in reports. Relationships help link different sets of data, while calculated columns and measures allow users to customize their analysis.
- Optimizing data models improves performance and report efficiency. This includes removing unnecessary columns, setting appropriate data types, and managing relationships properly. Efficient models load faster, consume less memory, and provide a smoother experience for report consumers.
Example: An IT support team uses Power BI to analyze service desk tickets. They connect data from ticket logs (issues), employee lists, and service categories. By defining relationships in the data model, they can easily report which departments submit the most tickets, track ticket resolution times, and visualize trends over time.
Use Case: A beginner IT analyst new to Azure Data uses Power BI’s data modeling features to link helpdesk ticket data (stored in spreadsheets) with staff records from a cloud database. This enables the creation of a unified dashboard that helps identify which teams need more training based on the types and frequency of support requests.
For more information see these links:
- What is Power BI
- Model data with Power BI - Training
- Optimization guide for Power BI
- Connect to SSAS multidimensional models in Power BI Desktop
- Edit data models in the Power BI service (preview)
Identify appropriate visualizations for data
- Understand what you want to show: Different questions about your data require different visuals. For example, use column or bar charts to compare categories, line or area charts to show trends over time, and pie charts to display parts of a whole (avoid if you have many categories).
- Know your data’s characteristics: Use maps for geographic data, scatter plots for relationships between two numeric values, and histograms to show the distribution of a single variable. This helps the audience quickly grasp the main message.
- Keep it simple and clear: Avoid cluttered visuals or unnecessary 3D effects. Choose straightforward chart types that make it easy for viewers to understand key points. Use color and labels wisely to highlight important information.
- Leverage Power BI visualization features: Power BI offers many built-in visuals (bar, line, pie, area, map, KPI, ribbon, etc.) and custom visuals from Microsoft AppSource to meet your needs. Try the Q&A visual if you want to let users ask questions in plain language and instantly get relevant charts.
Example: Suppose an IT manager wants to analyze monthly user logins to company applications. They use a line chart in Power BI to visualize the trend of logins over the last 12 months. This quickly shows peaks and dips, making it easy to spot seasonal patterns or issues.
Use Case: A cloud administrator in Azure Data wants to compare resource costs across departments. They create a Power BI dashboard using a bar chart to compare total monthly spending for each department. This makes it easy to identify which departments have higher cloud expenses and where cost optimization is needed.
For more information see these links:
- Visualization types in Power BI
- Tutorial: Analyze webpage data by using Power BI Desktop
- Visualization types
- Tips for designing a great Power BI dashboard