Describe core data concepts (25–30%)

Describe ways to represent data

Features of Structured Data

Structured data refers to information with a high level of organization, such as that which is found in a relational database. It is characterized by the following features:

  1. Predefined Schema: Structured data is organized according to a predefined schema that dictates the structure of the data, including the data types and the relationships between data elements.

  2. Easily Searchable: Because of its organized format, structured data can be searched and queried efficiently using standard query languages like SQL, allowing data to be retrieved based on specific criteria.

  3. Data Integrity: Structured data maintains data integrity through the use of constraints and rules that ensure the accuracy and consistency of the data. This includes primary keys, foreign keys, and other relational integrity constraints https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

  4. Optimized for Performance: The storage and retrieval of structured data are optimized for performance. Indexing is commonly used to speed up data retrieval, making it ideal for applications that require quick access to large volumes of data https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

  5. Standardized Format: The data is stored in tables with rows and columns, where each row represents a record and each column represents a data field. This standardized format makes it easier to share data across different systems and applications https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-implementation-planning-usage-scenario-self-service-data-preparation .

  6. Support for Complex Operations: Structured data supports complex operations such as joins, which combine data from multiple tables. This is essential for applications that require relational data analysis (see the sketch after this list) https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .
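A minimal T-SQL sketch of these features, using hypothetical table and column names: the primary key, foreign key, and check constraint enforce integrity against a predefined schema, and the join combines rows from two tables by criteria.

    -- Hypothetical schema: constraints enforce the predefined structure.
    CREATE TABLE Category (
        CategoryID   INT PRIMARY KEY,
        CategoryName NVARCHAR(50) NOT NULL
    );

    CREATE TABLE Product (
        ProductID  INT PRIMARY KEY,                              -- uniquely identifies each row
        CategoryID INT NOT NULL REFERENCES Category(CategoryID), -- foreign key to Category
        Name       NVARCHAR(100) NOT NULL,
        Price      DECIMAL(10, 2) CHECK (Price >= 0)             -- guards data integrity
    );

    -- Criteria-based retrieval and a join across the two tables.
    SELECT p.Name, c.CategoryName, p.Price
    FROM Product AS p
    JOIN Category AS c ON c.CategoryID = p.CategoryID
    WHERE p.Price > 10.00;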

For more information on structured data and its features, you can refer to the following resources:

  - Real-Time Analytics in Fabric: https://learn.microsoft.com/en-us/fabric/real-time-analytics/tutorial-introduction
  - Common Data Model (CDM) structure: https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-implementation-planning-usage-scenario-self-service-data-preparation
  - Azure Table storage: https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview


Describe core data concepts (25–30%)

Describe ways to represent data

Features of Semi-Structured Data

Semi-structured data is a type of data that does not conform to a rigid schema like traditional relational databases but still contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields. It is a form that lies between structured and unstructured data and is often represented in formats like JSON, XML, and YAML. Its key features are a flexible schema, a self-describing structure (the tags or markers carry the field names), and support for nested, hierarchical records.
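These features are easiest to see in a concrete document. A minimal T-SQL sketch, using a hypothetical sensor reading: the JSON carries its own field names, and JSON_VALUE extracts individual values by path without any table schema being defined up front.

    -- Hypothetical JSON document: field names travel with the data itself.
    DECLARE @json NVARCHAR(MAX) = N'{
        "deviceId": "sensor-01",
        "readings": [ { "type": "temperature", "value": 21.5 } ]
    }';

    -- Extract values by path; nesting expresses the hierarchy of fields.
    SELECT
        JSON_VALUE(@json, '$.deviceId')          AS DeviceId,
        JSON_VALUE(@json, '$.readings[0].value') AS FirstReadingValue;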

For additional information on semi-structured data and its features, you can refer to the following resources:

  - Real-Time Analytics in Microsoft Fabric Overview: https://learn.microsoft.com/en-us/fabric/real-time-analytics/tutorial-introduction
  - Azure Synapse Analytics: https://learn.microsoft.com/en-us/industry/well-architected/financial-services/analytics-data-estate
  - Introduction to Lakehouse Architecture in Fabric: https://learn.microsoft.com/en-us/industry/well-architected/financial-services/analytics-data-estate


Features of Unstructured Data

Unstructured data refers to information that does not have a predefined data model or is not organized in a predefined manner. It is typically text-heavy, but may also contain data such as dates, numbers, and facts. Because it lacks a predefined model, unstructured data (for example, documents, images, audio, and video) is typically stored as files or blobs and requires specialized tools to search and analyze.

Additional information on unstructured data and its management within Microsoft technologies is available on Microsoft Learn.

Please note that while the FILESTREAM feature is commonly used to store unstructured data in the NTFS file system, it is not supported in Azure SQL Database or Azure SQL Managed Instance https://learn.microsoft.com/en-us/azure/azure-sql/migration-guides/database/sql-server-to-sql-database-assessment-rules https://learn.microsoft.com/en-us/azure/azure-sql/migration-guides/managed-instance/sql-server-to-sql-managed-instance-assessment-rules .

Describe core data concepts (25–30%)

Identify options for data storage

Common Formats for Data Files

Data files can come in various formats, each with its own structure and use cases. Below are some of the common data file formats:

Avro

Apache Avro is a row-based format that is widely used within the Hadoop ecosystem. It is a compact, fast, binary data format that provides rich data structures with a robust schema. When configuring Avro in a data pipeline, you select Avro from the File format drop-down list in your connection settings and then configure specific settings for the format https://learn.microsoft.com/en-us/fabric/data-factory/format-avro .

Parquet

Parquet is a columnar storage file format optimized for use with big data processing frameworks. It is highly efficient for both storage and performance, especially when dealing with complex nested data structures. To configure Parquet format, you choose Parquet from the File format drop-down list in your data pipeline connection and adjust the settings accordingly https://learn.microsoft.com/en-us/fabric/data-factory/format-parquet .

Excel

Microsoft Excel format is commonly used for spreadsheet documents and supports data in tabular form. Excel files can be used as a source in data pipeline copy activities, where you can select Excel as the file format and configure additional settings as needed https://learn.microsoft.com/en-us/fabric/data-factory/format-excel .

JSON

JavaScript Object Notation (JSON) is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. JSON is often used for serializing and transmitting structured data over a network connection. In a data pipeline, JSON format can be selected from the File format drop-down list for further configuration https://learn.microsoft.com/en-us/fabric/data-factory/format-json .

Additional Configuration

For each of these formats, additional settings can be configured to tailor the data handling to specific requirements. This may include specifying the bucket or folder path for the data, as well as advanced settings for performance tuning and error handling https://learn.microsoft.com/en-us/fabric/data-factory/connector-google-cloud-storage-copy-activity .
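Beyond pipeline copy activities, files in these formats can also be queried in place. A hedged sketch using the OPENROWSET function of an Azure Synapse serverless SQL pool, with a hypothetical storage account and path: because Parquet is columnar, only the columns a query references need to be read.

    -- Query Parquet files directly from storage without loading them first.
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/data/sales/*.parquet',
        FORMAT = 'PARQUET'
    ) AS sales;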

For more detailed information on configuring these file formats, you can refer to the following resources:

  - Configure Avro format settings: https://learn.microsoft.com/en-us/fabric/data-factory/format-avro
  - Configure Parquet format settings: https://learn.microsoft.com/en-us/fabric/data-factory/format-parquet
  - Configure Excel format settings: https://learn.microsoft.com/en-us/fabric/data-factory/format-excel
  - Configure JSON format settings: https://learn.microsoft.com/en-us/fabric/data-factory/format-json

Understanding these common data file formats and how to configure them is essential for managing and processing data effectively in various data pipeline scenarios.

Describe core data concepts (25–30%)

Identify options for data storage

Types of Databases in Azure

When discussing databases within the Azure ecosystem, it’s important to understand the various types of databases available and their respective use cases. Here are the primary types of databases you can work with in Azure:

Azure SQL Database

Azure SQL Database is a fully managed relational database with built-in intelligence supporting self-driving features such as performance tuning and threat alerts. It is based on the latest stable version of the Microsoft SQL Server database engine and is highly compatible with most SQL Server features. This service is a good choice for a wide range of applications, particularly when working with structured data and requiring complex queries and transactional support https://learn.microsoft.com/en-us/azure/azure-sql/database/file-space-manage .

Azure SQL Managed Instance

Azure SQL Managed Instance is an expansion of Azure SQL Database, designed to provide near 100% compatibility with the latest SQL Server (on-premises) database engine. It offers a fully managed instance that includes features like SQL Agent, Service Broker, and cross-database queries. This service is typically used when migrating an existing SQL Server environment to Azure with minimal changes https://learn.microsoft.com/en-us/azure/azure-sql/database/file-space-manage .

Azure Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service. It offers turnkey global distribution, supports document, key-value, graph, and column-family data models, and provides comprehensive service level agreements (SLAs). It’s a great option for applications that require a high level of scalability and responsiveness, and for those that deal with varied and evolving data schemas https://learn.microsoft.com/en-AU/azure/cosmos-db/nosql/couchbase-cosmos-migration .
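Because Cosmos DB items are schemaless JSON documents, queries address fields by path. A hedged sketch in the Cosmos DB for NoSQL query language, with hypothetical item fields:

    -- "c" is the conventional alias for items in the queried container.
    SELECT c.id, c.customerName, c.address.city
    FROM c
    WHERE c.address.city = 'Seattle'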

Azure Synapse Analytics

Azure Synapse Analytics (formerly SQL Data Warehouse) is an analytics service that brings together big data and data warehousing. It allows you to query large volumes of data using either on-demand or provisioned resources. Synapse is not a database per se but integrates with databases and data lakes, providing a powerful platform for big data analytics https://learn.microsoft.com/en-us/azure/azure-sql/database/logins-create-manage .

Azure Database for MySQL, PostgreSQL, and MariaDB

These are fully managed database services for app developers, offering community versions of MySQL, PostgreSQL, and MariaDB with built-in high availability, security, and scaling on the fly with minimal downtime. These services are ideal for applications that are built on these open-source database engines and require a managed service with easy scaling, high availability, and security.

Azure Table Storage

Azure Table Storage is a service that stores large amounts of structured, non-relational data. It’s a NoSQL data store for semi-structured data. Azure Table Storage is a good choice for flexible datasets like user data for web applications, address books, device information, and other metadata.

For more detailed information on each of these database types, you can refer to the following URLs:

  - Azure SQL Database: Learn More
  - Azure SQL Managed Instance: Learn More
  - Azure Cosmos DB SDKs: Learn More
  - Azure Synapse Analytics: Learn More
  - Azure SQL Database Hyperscale: Learn More

Each of these database services offers unique features and capabilities that cater to different needs and scenarios. When choosing a database service for your application, consider factors such as the data model, scalability requirements, operational maintenance, and the specific features that your application needs.

Describe core data concepts (25–30%)

Describe common data workloads

Features of Transactional Workloads

Transactional workloads are characterized by their need for low latency to handle quick and frequent data transactions. These workloads typically involve operations that require immediate processing, such as online transaction processing (OLTP) systems. Their key features include atomic transactions (each unit of work completes fully or not at all), high concurrency, and large numbers of small reads and writes, as the sketch below illustrates.
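A minimal T-SQL sketch of an OLTP-style transaction, with hypothetical account rows: both updates commit together or, on failure, neither does, which keeps the data consistent under frequent concurrent writes.

    -- Transfer funds atomically between two hypothetical accounts.
    BEGIN TRANSACTION;

        UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
        UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

    COMMIT TRANSACTION;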

Additional information on transactional workloads and their management within Azure services is available on Microsoft Learn.


Describe core data concepts (25–30%)

Describe common data workloads

Features of Analytical Workloads

Analytical workloads are designed to handle complex queries that often involve large volumes of data and require significant computational power to execute. These workloads are typically used for business intelligence (BI), reporting, and data analysis purposes. Here are some key features of analytical workloads:

Scalability and Elasticity

Analytical workloads must be able to scale to accommodate large datasets and varying levels of demand. This means that the underlying infrastructure should be able to handle an increase in data volume and query complexity without a degradation in performance https://learn.microsoft.com/en-AU/azure/cosmos-db/synapse-link .

Performance Optimization

The systems that support analytical workloads are optimized for high performance. This includes specialized storage technology that is self-managed and designed to optimize analytics workloads, ensuring that queries are executed efficiently https://learn.microsoft.com/en-AU/azure/cosmos-db/synapse-link .

Integration with Analytical Tools

Analytical workloads often require integration with analytical tools and platforms. Azure Cosmos DB, for example, has built-in support for Azure Synapse Analytics, which simplifies access to the storage layer and enhances performance for analytical tasks https://learn.microsoft.com/en-AU/azure/cosmos-db/synapse-link .

Active-Active Workload Management

Modern databases like Azure Cosmos DB allow for the running of globally distributed analytical workloads alongside transactional workloads in an active-active configuration. This means that analytical queries can be run effectively against the nearest regional copy of the data, reducing latency and improving query performance https://learn.microsoft.com/en-AU/azure/cosmos-db/synapse-link .

Read Replica Utilization

For read-intensive analytical workloads, the use of read replicas can improve performance and scalability. By isolating read workloads to replicas and directing write workloads to the primary source, systems can handle higher volumes of read operations without impacting the write capacity. This is particularly useful for BI and reporting purposes, where the data source is primarily read-only https://learn.microsoft.com/en-AU/azure/mysql/single-server/concepts-read-replicas https://learn.microsoft.com/en-AU/azure/mariadb/concepts-read-replicas .

It’s important to note that while read replicas enhance read performance, they do not directly reduce write-capacity burdens on the primary source and are not suitable for write-intensive workloads. Additionally, there is an inherent delay in data replication, so read replicas are best used for workloads that can accommodate this latency https://learn.microsoft.com/en-AU/azure/mysql/single-server/concepts-read-replicas https://learn.microsoft.com/en-AU/azure/mariadb/concepts-read-replicas .
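A representative analytical query, with hypothetical sales tables: it scans many rows but returns a small aggregated summary, which is the read-heavy pattern that read replicas and analytical stores are designed to serve.

    -- Summarize sales by region and year for a BI report.
    SELECT
        c.Region,
        YEAR(o.OrderDate) AS OrderYear,
        SUM(o.Amount)     AS TotalSales
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    GROUP BY c.Region, YEAR(o.OrderDate)
    ORDER BY TotalSales DESC;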

For more information on Azure Cosmos DB and its capabilities for analytical workloads, you can visit the following URL: Azure Cosmos DB.


Describe core data concepts (25–30%)

Identify roles and responsibilities for data workloads

Responsibilities for Database Administrators

Database Administrators (DBAs) play a critical role in managing and maintaining the integrity and performance of databases. Their responsibilities include, but are not limited to, the following areas:

  1. Security Management: DBAs are responsible for ensuring the security of the database. This includes managing access controls, implementing Always Encrypted technology to protect sensitive data, and overseeing the roles of users who manage encryption keys, such as Security Administrators https://learn.microsoft.com/en-us/azure/azure-sql/database/always-encrypted-landing .

  2. Performance Monitoring: Monitoring the performance of the database to ensure it runs efficiently and effectively is a key responsibility. This involves optimizing queries, indexing, and ensuring that the database is tuned for the workload it supports.

  3. Backup and Recovery: DBAs must regularly back up the database and ensure that recovery procedures are in place and tested. This is crucial for protecting data against loss or corruption.

  4. User and Permission Management: Managing user access to the database is essential. This includes creating and maintaining user accounts, assigning appropriate permissions, and ensuring that users have the necessary access to perform their roles (see the sketch after this list).

  5. Automation of Tasks: DBAs may use service principals or managed identities to automate tasks such as Microsoft Entra object creation in SQL Database, which allows for full automation of database user creation without additional SQL privileges https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-azure-ad-landing .

  6. Governance and Compliance: Ensuring that the database complies with relevant laws and regulations is a critical responsibility. This includes data privacy laws and industry-specific regulations.

  7. Maintenance and Updates: Regular maintenance of the database is required, including applying patches and updates to the database software to ensure it remains secure and efficient.

  8. Troubleshooting and Support: DBAs are the first point of contact for any database-related issues. They must be able to troubleshoot problems and provide support to resolve issues quickly.
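A minimal T-SQL sketch of the permission-management work in item 4, with hypothetical login, user, and schema names:

    -- Map an existing server login to a database user.
    CREATE USER analyst_user FOR LOGIN analyst_login;

    -- Grant read access via a built-in database role, plus execute rights
    -- on one schema only.
    ALTER ROLE db_datareader ADD MEMBER analyst_user;
    GRANT EXECUTE ON SCHEMA::reporting TO analyst_user;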

Additional information on the roles and responsibilities of database administrators is available on Microsoft Learn.


Describe core data concepts (25–30%)

Identify roles and responsibilities for data workloads

Responsibilities for Data Engineers

Data Engineers play a crucial role in managing and optimizing data workflows and infrastructure within an organization. Their responsibilities include:

  1. Data Architecture and Deployment Planning: Data Engineers are responsible for planning the deployment and architecture of data solutions. This involves integrating various services and data platforms to ensure seamless data flow and accessibility https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

  2. Data Pipeline Development: They build and maintain the data pipelines that are essential for transporting data from various sources to storage solutions or for processing. This includes ensuring that data is collected, stored, and retrieved efficiently https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

  3. Data Asset Publication: Publishing data assets that are used across the organization is another key responsibility. These assets can include lakehouses, data warehouses, data pipelines, dataflows, or semantic models, which form the backbone of the organization’s data infrastructure https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

  4. Data Management: Data Engineers create and manage data assets to support self-service content creators. This includes tasks such as data modeling and ensuring the quality and consistency of shared data resources https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

  5. Support and Troubleshooting: They provide user support by assisting with the resolution of data discrepancies and escalated help desk support issues. This ensures that any problems with data access or quality are quickly addressed https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

  6. Compute Utilization and Reporting: In the context of Microsoft Fabric, Data Engineers are involved in compute utilization and reporting for Spark, which powers Synapse Data Engineering and Science workloads. They manage operations like table preview, load to delta, notebook runs, scheduled runs, and spark job definition runs https://learn.microsoft.com/en-us/fabric/data-engineering/billing-capacity-management-for-spark .

  7. Cost Management: Data Engineers also need to be aware of the capacity charges associated with their workspaces and how these are reflected in the Azure portal under the Microsoft Cost Management subscription. Understanding Fabric billing is important for managing the financial aspects of data operations https://learn.microsoft.com/en-us/fabric/data-engineering/billing-capacity-management-for-spark .

For additional information on the roles and responsibilities of Data Engineers, you can refer to the following resources:

  - Data Engineering and Data Science in Microsoft Fabric: https://learn.microsoft.com/en-us/fabric/data-engineering/billing-capacity-management-for-spark
  - Roles within a Center of Excellence (COE): https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence

Please note that the scope of responsibilities for Data Engineers can vary significantly between organizations, and the roles may overlap with other positions within a Center of Excellence (COE) https://learn.microsoft.com/power-bi/guidance/fabric-adoption-roadmap-center-of-excellence .

Describe core data concepts (25–30%)

Identify roles and responsibilities for data workloads

Responsibilities for Data Analysts

Data analysts play a crucial role in organizations by turning data into actionable insights. Their responsibilities often include the following:

  1. Utilizing Semantic Models: Data analysts frequently use semantic models to interpret and analyze data. They typically have Read and Build permissions on these models, which allows them to create reports while ensuring that Row-Level Security (RLS) is enforced for both report creators and consumers https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-implementation-planning-security-content-creator-planning .

  2. Collaboration and Data Sharing: Analysts work closely with other roles, such as business users and data scientists, sharing Power BI reports and datasets to facilitate seamless collaboration. This inter-role cooperation is particularly beneficial during the problem formulation phase of projects https://learn.microsoft.com/en-us/fabric/data-science/data-science-overview .

  3. Intra-team Support: Data analysts often engage in peer learning and support within their teams. They may voluntarily take on informal support roles, especially if they are recognized as subject matter experts or champions within their domain. This peer-to-peer learning is a significant aspect of workplace education and is particularly valuable for analysts working on domain-specific analytics solutions https://learn.microsoft.com/en-us/power-bi/guidance/fabric-adoption-roadmap-user-support .

  4. Automated Machine Learning (AutoML): In environments that support Power BI Premium or Embedded capacities, data analysts may leverage AutoML for dataflows. This enables them to train, validate, and invoke machine learning models directly within Power BI, simplifying the process of creating new ML models using their dataflows https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-machine-learning-integration .

  5. Technical Enablement and Support: While the central BI team typically handles the technical aspects of Power BI as a platform, data analysts within distributed analytics groups focus on the content. They are subject matter experts who understand the business needs and work on analytics solutions pertinent to their specific business unit or shared services function https://learn.microsoft.com/en-us/power-bi/guidance/powerbi-migration-learn-from-customers .

Additional information on the roles and responsibilities of data analysts is available on Microsoft Learn.


Identify considerations for relational data on Azure (20–25%)

Describe relational concepts

Features of Relational Data

Relational data refers to structured data that is organized into tables, which are known as relations. Each table contains rows and columns, where rows represent records and columns represent attributes of the data. Here are some key features of relational data:

Structured Query Language (SQL)

SQL is the standard language used to manage and manipulate relational databases. It allows users to perform various operations such as creating tables, inserting data, updating records, and querying data https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-azure-ad-landing .

Data Integrity

Relational databases enforce data integrity through constraints such as primary keys, foreign keys, unique constraints, and check constraints. These ensure the accuracy and consistency of the data within the database https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-azure-ad-landing .

Normalization

Normalization is a process used to organize data to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, more manageable pieces while maintaining relationships between them.

Security Features

Relational databases provide several security features to protect data https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-azure-ad-landing , including:

  - SQL Managed Instance auditing: Tracks database activities for compliance and security analysis.
  - Always Encrypted: Helps protect sensitive data by encrypting it at the application layer, so the data is encrypted in transit, at rest, and during query processing.
  - Threat detection: Identifies unusual activities that could indicate a security threat to the database.
  - Dynamic data masking: Hides sensitive data in query results without changing the actual data stored in the database.
  - Row-level security: Controls access to rows in a database table based on the characteristics of the user executing a query.
  - Transparent data encryption (TDE): Encrypts the storage of an entire database by using a symmetric key called the database encryption key.
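A hedged sketch of one of these features, dynamic data masking, applied to a hypothetical customer table: the stored value is unchanged, but users without the UNMASK permission see a masked result.

    -- Mask email addresses in query results for unprivileged users.
    ALTER TABLE dbo.Customers
        ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');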

Transaction Management

Relational databases support transactions, which are sequences of operations performed as a single logical unit of work. Transactions ensure that all operations within the unit are completed successfully before committing data to the database, maintaining data integrity https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-azure-ad-landing .

Indexing

Indexes are used to speed up the retrieval of rows from a database table. An index is created on columns that are used frequently in query predicates and join conditions.
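A minimal sketch with a hypothetical orders table: the index targets a column used in predicates and joins, and the INCLUDE clause covers columns the query reads so the table itself need not be touched.

    -- Speed up lookups and joins on CustomerID.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, Amount);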

Spatial Data Support

Relational databases like Azure SQL support spatial data types, which allow the storage and query of data related to the physical location and shape of objects. Azure SQL supports geometry and geography data types for Euclidean and round-earth coordinate systems, respectively https://learn.microsoft.com/en-us/azure/azure-sql/multi-model-features .
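A short sketch of the round-earth geography type, with hypothetical store data: geography::Point takes latitude, longitude, and a spatial reference ID, and STDistance returns metres for SRID 4326.

    -- Find stores within 5 km of a point in Seattle (SRID 4326 = WGS 84).
    DECLARE @here geography = geography::Point(47.6062, -122.3321, 4326);

    SELECT StoreID, Location.STDistance(@here) AS DistanceInMetres
    FROM dbo.Stores
    WHERE Location.STDistance(@here) < 5000;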

Multi-Model Capabilities

Modern relational databases support multi-model data processing, enabling the storage and management of different data types such as graph, JSON, spatial, and XML within the same database https://learn.microsoft.com/en-us/azure/azure-sql/multi-model-features .

Change Tracking and Data Capture

Change tracking and change data capture are features that allow applications to determine the changes that have occurred in the database so that they can respond accordingly. These features enable efficient data synchronization and auditing https://learn.microsoft.com/en-AU/azure/azure-sql-edge/track-data-changes .
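A hedged sketch of enabling change tracking, with hypothetical database and table names; change data capture is configured separately through its own stored procedures.

    -- Track which rows changed, with a two-day retention window.
    ALTER DATABASE SalesDb
        SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.Orders
        ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);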

For additional information on these features, you can refer to the following resources:

  - SQL Managed Instance Security Features
  - Spatial Data in Azure SQL
  - Graph Processing in Azure SQL Database
  - JSON Data in SQL Server
  - Change Tracking in SQL Server
  - Change Data Capture in SQL Server

Identify considerations for relational data on Azure (20–25%)

Describe relational concepts

Normalization is a database design technique that organizes tables in a manner that reduces redundancy and dependency of data. It involves dividing a database into two or more tables and defining relationships between the tables. The main goal of normalization is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

There are several reasons why normalization is used:

  1. To minimize data redundancy: Normalization helps to eliminate duplicate data, which not only reduces the storage space required but also ensures that data is consistent across the database.

  2. To improve data integrity: By reducing redundancy, normalization helps maintain data integrity, as there is a single point of truth for each piece of information. This makes the database more accurate and reliable.

  3. To enhance query performance: Well-normalized tables can lead to more efficient database queries. By organizing data in a way that eliminates unnecessary duplication, queries can run faster because they have to process less data.

  4. To facilitate easier database maintenance: Normalization simplifies the database structure, which makes it easier to maintain over time. Changes in one part of the database are less likely to have unwanted consequences in other parts, which reduces the complexity of maintenance tasks.

  5. To avoid update anomalies: When data is duplicated and not properly managed, it can lead to inconsistencies known as update anomalies. Normalization helps to avoid these anomalies by ensuring that each piece of data is stored only once (see the sketch after this list).
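A minimal sketch of the single-point-of-truth idea, with hypothetical customer and order tables: because customer details are stored once and referenced by key, one UPDATE corrects the email everywhere it is used.

    -- Customer details live in exactly one row.
    UPDATE dbo.Customers
    SET Email = 'new.address@example.com'
    WHERE CustomerID = 42;

    -- Orders reference the customer by key and see the change via a join.
    SELECT o.OrderID, c.Email
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
    WHERE c.CustomerID = 42;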

Additional information on normalization and its importance in database design is available on Microsoft Learn.

Please note that while normalization is beneficial for transactional databases, denormalization may sometimes be preferred for analytical databases to optimize query performance. It’s important to find the right balance for your specific use case https://learn.microsoft.com/en-AU/azure/cosmos-db/nosql/modeling-data .

Identify considerations for relational data on Azure (20–25%)

Describe relational concepts

Common Structured Query Language (SQL) Statements

Structured Query Language (SQL) is a standardized programming language used for managing and manipulating relational databases. Here are some common SQL statements that are essential for interacting with databases:

Data Definition Language (DDL)

DDL statements are used to define and modify the database structure or schema.

  • CREATE: This statement is used to create a new table, a view of a table, or other object in the database.

    CREATE TABLE table_name (
        column1 datatype,
        column2 datatype,
        ...
    );
  • ALTER: Alters the structure of an existing database object, such as adding a column to an existing table.

    ALTER TABLE table_name
    ADD column_name datatype;
  • DROP: Deletes an entire table, a view of a table, or other objects in the database.

    DROP TABLE table_name;

Data Manipulation Language (DML)

DML statements are used for managing data within schema objects.

  • SELECT: Retrieves data from one or more tables.

    SELECT column1, column2, ...
    FROM table_name;
  • INSERT INTO: Inserts new data into a table.

    INSERT INTO table_name (column1, column2, column3, ...)
    VALUES (value1, value2, value3, ...);
  • UPDATE: Modifies existing data within a table.

    UPDATE table_name
    SET column1 = value1, column2 = value2, ...
    WHERE condition;
  • DELETE: Removes data from a table.

    DELETE FROM table_name WHERE condition;

Data Control Language (DCL)

DCL includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls of the database system.

  • GRANT: Gives a user access privileges to the database.

    GRANT privilege_name
    ON object_name
    TO {user_name | PUBLIC | role_name}
    [WITH GRANT OPTION];
  • REVOKE: Withdraws access privileges that were given with the GRANT command.

    REVOKE privilege_name
    ON object_name
    FROM {user_name | PUBLIC | role_name}

Transaction Control Language (TCL)

TCL statements are used to manage the changes made by DML statements.

  • COMMIT: Saves all the transactions to the database.

    COMMIT;
  • ROLLBACK: Restores the database to the last committed state.

    ROLLBACK;
  • SAVEPOINT: Sets a savepoint within a transaction.

    SAVEPOINT savepoint_name;
  • SET TRANSACTION: Places a name on a transaction.

    SET TRANSACTION NAME transaction_name;

For more detailed information on SQL statements and their usage, you can refer to the following resources:

  - SQL Server Documentation
  - Azure SQL Database Documentation

Please note that the specific syntax for SQL statements can vary slightly between different database systems, but the fundamental concepts remain consistent.

Identify considerations for relational data on Azure (20–25%)

Describe relational concepts

Identify Common Database Objects

When working with databases, it is essential to understand the common objects that you will encounter. These objects are the building blocks of a database and are used to store and manage data. Below is an explanation of some of the most common database objects:

Databases

A database is a structured set of data held in a computer, especially one that is accessible in various ways. In Azure Cosmos DB, a single account can contain one or more independent databases https://learn.microsoft.com/en-AU/azure/cosmos-db/mongodb/how-to-python-get-started .

Collections

In MongoDB, which is supported by Azure Cosmos DB’s API for MongoDB, a collection is a group of documents. It is analogous to a table in a relational database system. Collections do not enforce a schema, meaning that documents within a single collection can have different fields https://learn.microsoft.com/en-AU/azure/cosmos-db/mongodb/how-to-python-get-started .

Documents

A document is a set of key-value pairs and represents a single entry in a collection. Documents in MongoDB are similar to JSON objects but are stored in a format known as BSON. Documents have dynamic schemas, which means that documents in the same collection do not need to have the same set of fields, and common fields can hold different types of data https://learn.microsoft.com/en-AU/azure/cosmos-db/mongodb/how-to-python-get-started .

Tables

Tables are a type of database object that is used to store data in rows and columns. Each row in a table represents a unique record, and each column represents a field within the record. In Azure Cosmos DB, you can restore specific tables if needed.

Indexes

Indexes are used to speed up the retrieval of data from a database. They are created on columns in a table to allow for faster searches. In Azure SQL, you can script out existing indexes to view their definitions https://learn.microsoft.com/en-us/azure/azure-sql/identify-query-performance-issues .

For more information on working with these database objects in Azure Cosmos DB, you can refer to the following resources:

  - Azure Cosmos DB resource model, for an overview of the hierarchy of resources: https://learn.microsoft.com/en-AU/azure/cosmos-db/mongodb/how-to-python-get-started
  - Working with collections and documents in MongoDB, for details on managing collections and documents: https://learn.microsoft.com/en-AU/azure/cosmos-db/mongodb/how-to-javascript-manage-databases
  - Azure Storage redundancy, for information on the durability and redundancy of Azure Storage, which is relevant for understanding the storage of database objects in Azure Cosmos DB: https://learn.microsoft.com/en-AU/azure/cosmos-db/analytical-store-introduction

Understanding these common database objects is crucial for anyone working with databases, as they form the foundation upon which data storage and retrieval are built.

Identify considerations for relational data on Azure (20–25%)

Describe relational Azure data services

Azure SQL Family of Products

The Azure SQL family of products offers a range of options for running SQL Server databases in the cloud, each tailored to specific needs and scenarios. Below is a detailed explanation of the three primary offerings: Azure SQL Database, Azure SQL Managed Instance, and SQL Server on Azure Virtual Machines.

Azure SQL Database

Azure SQL Database is a fully managed Platform-as-a-Service (PaaS) database engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement. Azure SQL Database is always running on the latest stable version of SQL Server Database Engine and patched OS with 99.99% availability. It is best suited for modern cloud applications that require a scalable, managed database service.

  • Built-in Intelligence: Azure SQL Database includes features like automatic performance tuning and threat detection.
  • Scalability: Offers single databases, each with its own set of resources managed via a SQL Database server, or elastic pools, which are collections of databases that share a set of resources.
  • High Availability: Built-in high availability with a 99.99% availability SLA.

For more information on Azure SQL Database, visit Azure SQL Database Documentation.

Azure SQL Managed Instance

Azure SQL Managed Instance is an expansion of Azure SQL Database, providing a fully managed instance of the Microsoft SQL Server database engine. It provides near 100% compatibility with the latest SQL Server on-premises (Enterprise Edition) Database Engine, providing a native virtual network (VNet) implementation that addresses common security concerns, and a business model favorable for on-premises SQL Server customers.

For additional details on Azure SQL Managed Instance, you can refer to Azure SQL Managed Instance Documentation.

SQL Server on Azure Virtual Machines

SQL Server on Azure Virtual Machines falls into the category of Infrastructure-as-a-Service (IaaS) and is best for applications that require complete control over the SQL Server environment. This option provides the ability to use full versions of SQL Server in the cloud without having to manage any on-premises hardware.

For more information on SQL Server on Azure Virtual Machines, visit SQL Server on Azure VMs Documentation.

Each of these products is designed to provide a tailored SQL Server experience in the cloud, with varying levels of control and management. Depending on the specific requirements of the workload, users can select the most appropriate Azure SQL service.

Identify considerations for relational data on Azure (20–25%)

Describe relational Azure data services

Identify Azure Database Services for Open-Source Database Systems

Azure offers a variety of database services that support open-source database systems, providing scalability, high availability, and performance. Below are some of the key Azure services tailored for open-source databases:

  • Azure Managed Instance for Apache Cassandra
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Azure Cosmos DB for NoSQL

These Azure services provide robust solutions for deploying and managing open-source databases in the cloud, ensuring that users can leverage the benefits of Azure’s infrastructure while maintaining the flexibility and control of open-source database systems.

Describe considerations for working with non-relational data on Azure (15–20%)

Describe capabilities of Azure storage

Describe Azure Blob Storage

Azure Blob storage is a service provided by Microsoft Azure that allows for the storage of large amounts of unstructured data, such as text or binary data. It is designed to serve different needs, such as serving images or documents directly to a browser, storing files for distributed access, streaming video and audio, writing to log files, or storing data for backup and restore, disaster recovery, and archiving.

Components of Azure Blob Storage

Blob storage is organized around three resources: the storage account, the containers in the account, and the blobs (the stored objects) in a container.

Types of Blobs

Azure Blob storage offers different types of blobs to cater to various use cases:

  • Block Blobs: Ideal for storing text or binary files, and for uploading large files efficiently.
  • Append Blobs: Optimized for append operations, making them suitable for scenarios such as logging data from virtual machines.
  • Page Blobs: Designed for frequent read/write operations and are used primarily as the backing storage for Azure Virtual Machine disks.

Accessing and Managing Azure Blob Storage

Azure Blob storage can be accessed and managed using Azure’s web-based portal, Azure CLI, Azure PowerShell, or Azure Storage SDKs for different programming languages. For Java, for example, Microsoft recommends using Version 12 of the storage client library, which supports Blob, Queue, File, and Data Lake services https://learn.microsoft.com/java/api/overview/azure/storage .

Security and Access Control

Azure provides various methods to authorize access to Blob storage, such as using an account key or a connection string. More detailed guidance on authorizing access from different programming languages is available in the Azure Storage documentation.

Pricing and Account Types

Azure Storage offers several types of storage accounts, each supporting different features and having its own pricing model. For most scenarios, Microsoft recommends using the Standard general-purpose v2 account type, which supports Blob Storage, Queue Storage, Table Storage, and Azure Files with various redundancy options https://learn.microsoft.com/en-AU/azure/storage/common/storage-account-overview .

More information on Azure Blob storage is available in the Azure Storage documentation on Microsoft Learn.

Azure Blob storage is a versatile and scalable solution for storing large amounts of unstructured data in the cloud, providing a range of services and features to meet the needs of various applications and scenarios.

Describe considerations for working with non-relational data on Azure (15–20%)

Describe capabilities of Azure storage

Azure File Storage Overview

Azure File storage is a service that offers shared storage for applications using the standard SMB (Server Message Block) protocol. It is designed to enable cloud or on-premises deployments to share file data across application components via mounted shares, and it supports both Windows and Linux operating systems.

Key Features of Azure File Storage:

  1. Shared Access: Azure Files allows multiple VMs to share the same files with read and write access. This can be useful for application configurations, diagnostic data, or job results that need to be accessed from various locations.

  2. Compatibility: It supports industry-standard SMB protocol, meaning that most existing applications or services that use standard file system APIs can use Azure File storage without any modifications.

  3. Fully Managed: As a fully managed service, Azure Files eliminates the need to manage hardware or an OS for file sharing. This simplifies maintenance and management tasks.

  4. Scripting and Tool Support: Azure Files can be accessed and managed using PowerShell and Azure CLI, and it is supported by Azure Storage Client Libraries and Azure Storage REST API.

  5. Resiliency: Azure Files offers different levels of redundancy to ensure data is highly available and durable. Options include locally redundant storage (LRS), geo-redundant storage (GRS), and zone-redundant storage (ZRS).

  6. Integration with Azure Services: Azure Files can be integrated with other Azure services such as Azure Backup and Azure File Sync, enhancing its capabilities for disaster recovery and hybrid storage scenarios.

Types of Azure File Storage Accounts:

Azure file shares can be hosted in standard general-purpose v2 storage accounts (HDD-based) or in premium FileStorage accounts (SSD-based), which differ in performance characteristics and pricing.

Access and Security:

Access to file shares can be controlled with storage account keys, shared access signatures (SAS), and identity-based authentication, and data is encrypted at rest by the service.

Performance Considerations:

For optimal performance, use Azure Files when high throughput and low latency are required. Because Azure Files exposes a standard file system, applications can access file shares directly https://learn.microsoft.com/en-us/azure/azure-sql/database/database-import-export-azure-services-off .

Cost Considerations:

Azure Files offers a flexible pricing model based on the chosen performance tier (standard or premium) and the redundancy option. To manage costs effectively, it is important to select the appropriate tier and redundancy level based on the application’s performance and availability requirements https://learn.microsoft.com/en-AU/azure/storage/common/storage-account-overview .

Additional resources are available in the Azure Files documentation on Microsoft Learn.

Azure File storage is a versatile and secure solution for managing file shares in the cloud, offering seamless integration with existing systems and a range of options to meet various performance and security needs.

Describe considerations for working with non-relational data on Azure (15–20%)

Describe capabilities of Azure storage

Describe Azure Table Storage

Azure Table Storage is a service that offers structured NoSQL data storage in the cloud, allowing for the storage of large amounts of non-relational data. This service is designed to store structured data at a massive scale in a schemaless fashion, which means that the data can be adapted easily as the needs of your application change over time https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

The core unit of data in Azure Table Storage is the “table,” which can be used to store datasets such as user data for web applications, address books, device information, or any other metadata required by a service. Each table stores data as entities, which are a set of properties, similar to rows in a relational database. A storage account can contain any number of tables, subject to the capacity limits of the account https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

Azure Table Storage is known for its fast and cost-effective access to data, making it suitable for many types of applications. It is often more economical than traditional SQL storage for similar volumes of data due to its schemaless design and the nature of its storage https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

The Table Storage API is a REST API that allows for working with tables and the data within them. This API is applicable to both Azure Table Storage and the Table API of Azure Cosmos DB, which offers additional features such as higher performance, global distribution, and automatic secondary indexes https://learn.microsoft.com/rest/api/storageservices/table-service-rest-api .

For developers, Azure provides a unified Azure Tables SDK that supports both Azure Table storage and Azure Cosmos DB for Table. This SDK simplifies the development process by providing a consistent set of tools for interacting with table data across both services https://learn.microsoft.com/en-AU/azure/storage/tables/table-storage-overview .

For monitoring and managing the performance and availability of Azure Table Storage, Azure Monitor can be used. It generates monitoring data for Azure resources, which can be analyzed and alerted upon to ensure the smooth operation of applications and business processes that rely on Azure Table Storage https://learn.microsoft.com/en-AU/azure/storage/tables/monitor-table-storage .

When querying data, it is possible to specify a range within a partition to retrieve a subset of entities. This can be done by combining a partition key filter with a row key filter, allowing for efficient data retrieval that matches specific criteria https://learn.microsoft.com/en-AU/azure/cosmos-db/table/how-to-use-c-plus .

For additional information on Azure Table Storage, you can refer to the following resources:

  - Azure Table Storage Guide
  - Azure Cosmos DB for Table
  - Azure Tables SDK Announcement
  - Serverless Mode in Azure Cosmos DB
  - Feature Differences Between Table API and Azure Table Storage


Describe considerations for working with non-relational data on Azure (15–20%)

Describe capabilities and features of Azure Cosmos DB

Identify Use Cases for Azure Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service that offers a wide range of use cases due to its flexibility, scalability, and performance. Below are some of the key scenarios where Azure Cosmos DB is particularly well-suited:

  1. Serverless Applications: Azure Cosmos DB’s serverless mode is ideal for development, testing, or any application with sporadic traffic patterns. It operates on a consumption-based model, ensuring you only pay for the resources you use, with no minimum fees. This makes it cost-effective for workloads that do not require continuous throughput capacity https://learn.microsoft.com/en-AU/azure/cosmos-db/optimize-costs .

  2. User-Generated Content (UGC): For applications dealing with UGC such as chat sessions, tweets, blog posts, ratings, and comments, Azure Cosmos DB provides a flexible schema that can handle the semi-structured nature of this data. It allows for rapid development and iteration, as new data properties can be added without the need for complex transformations https://learn.microsoft.com/en-AU/azure/cosmos-db/use-cases .

  3. Social Media Applications: Azure Cosmos DB is well-suited for social media applications that need to adapt to changing data schemas and require immediate data availability for querying. Its automatic indexing ensures that data is always ready for retrieval, providing the necessary flexibility for these dynamic environments https://learn.microsoft.com/en-AU/azure/cosmos-db/use-cases .

  4. Global Distribution: Applications that require global reach can benefit from Azure Cosmos DB’s ability to distribute data across multiple regions. This ensures low-latency access to data regardless of where the end-users are located, enabling a consistent experience worldwide https://learn.microsoft.com/en-AU/azure/cosmos-db/use-cases .

  5. Scalability: For applications with unpredictable usage patterns, Azure Cosmos DB offers the ability to scale out by adding data partitions or creating additional accounts across multiple regions. This flexibility is crucial for applications that experience sudden spikes in traffic or need to grow with user demand https://learn.microsoft.com/en-AU/azure/cosmos-db/use-cases .

For more detailed information on Azure Cosmos DB use cases, you can refer to the following resources:

  - Common Azure Cosmos DB use cases: https://learn.microsoft.com/en-AU/azure/cosmos-db/social-media-apps , https://learn.microsoft.com/en-AU/azure/cosmos-db/whitepapers
  - Azure Cosmos DB serverless: https://learn.microsoft.com/en-AU/azure/cosmos-db/optimize-costs
  - Azure Regions for Cosmos DB service availability: https://learn.microsoft.com/en-AU/azure/cosmos-db/use-cases

These resources provide additional insights into how Azure Cosmos DB can be leveraged for various application scenarios, helping to understand its capabilities and how it can fit into different architectural designs.

Describe an analytics workload on Azure (25–30%)

Describe common elements of large-scale analytics

Considerations for Data Ingestion and Processing

When designing a system for data ingestion and processing, several key considerations must be taken into account to ensure efficient, secure, and reliable handling of data. Below are some of the primary considerations:

  1. Data Import Sequence and Transformation
  2. Data Export
  3. Data Security
  4. Data Auditing and Traceability
  5. Custom Dimensions
  6. User Interface for Data Ingestion
  7. Decoupling of Data Processes
  8. Data Volume and Ingestion Rate
  9. Number of Materialized Views in a Cluster
  10. Materialized View Definition
  11. Data Ingestion Options into a Warehouse

For additional information on data ingestion and processing, the following resources can be consulted:

  - Data Import Considerations: https://learn.microsoft.com/en-us/industry/well-architected/sustainability/sustainability-manager-data-management-overview
  - Data Export Design: https://learn.microsoft.com/en-us/industry/well-architected/sustainability/sustainability-manager-data-management-overview
  - Data Security Design: https://learn.microsoft.com/en-us/industry/well-architected/sustainability/sustainability-manager-data-management-overview
  - Data Auditing and Traceability: https://learn.microsoft.com/en-us/industry/well-architected/sustainability/sustainability-manager-data-management-overview
  - Custom Dimensions Design: https://learn.microsoft.com/en-us/industry/well-architected/sustainability/sustainability-manager-data-management-overview
  - Materialized Views Capacity Policy: https://learn.microsoft.com/azure/data-explorer/kusto/management/materialized-views/materialized-view-overview
  - Data Ingestion Options: https://learn.microsoft.com/en-us/fabric/data-warehouse/guidelines-warehouse-performance

By considering these aspects, organizations can design a robust data ingestion and processing system that meets their operational needs and supports their strategic objectives.

Describe an analytics workload on Azure (25–30%)

Describe common elements of large-scale analytics

Options for Analytical Data Stores

Analytical data stores are designed to facilitate efficient data analysis and are optimized for read-heavy workloads. They differ from transactional data stores, which are optimized for write operations and transactional integrity. Below are some options and considerations for analytical data stores:

Key topics include analytical time-to-live (ATTL), which controls how long data is retained in the analytical store independently of the transactional store; pricing, since analytical storage and operations are billed separately from transactional throughput; and day-to-day data management.

For additional information on configuring and managing analytical data stores, you can refer to the following resources:

  - Configure Analytical TTL on a Container: https://learn.microsoft.com/en-AU/azure/cosmos-db/analytical-store-introduction
  - Analytical Store Overview: https://learn.microsoft.com/en-AU/azure/cosmos-db/nosql/modeling-data
  - Azure Cosmos DB Pricing Details: https://learn.microsoft.com/en-AU/azure/cosmos-db/analytical-store-introduction
  - Azure Synapse Analytics Pricing Details: https://learn.microsoft.com/en-AU/azure/cosmos-db/analytical-store-introduction

These resources provide in-depth guidance on the implementation, pricing, and management of analytical data stores within Azure Cosmos DB and how to access them using Azure Synapse Link.

Describe an analytics workload on Azure (25–30%)

Describe common elements of large-scale analytics

Azure Services for Data Warehousing

Data warehousing is a critical component of business intelligence that involves the storage and analysis of large volumes of data. Azure offers a suite of services that cater to various aspects of data warehousing, each with its unique features and capabilities.

Azure Synapse Analytics

Azure Synapse Analytics is an analytics service that brings together big data and data warehousing. It allows users to query data using either serverless on-demand or provisioned resources. Synapse Analytics provides a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. More information can be found at Azure Synapse Analytics Overview https://learn.microsoft.com/azure/data-factory/connector-azure-sql-database .

Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. It provides a collaborative environment in which data scientists, data engineers, and business analysts work together, with one-click setup, streamlined workflows, and an interactive workspace, and it integrates with other Azure services. To create a Spark cluster in Azure Databricks, you can follow the instructions on Deploying Azure Databricks in your Azure Virtual Network https://learn.microsoft.com/en-AU/azure/managed-instance-apache-cassandra/deploy-cluster-databricks .

Microsoft Fabric

Microsoft Fabric is an all-in-one analytics solution that covers a wide range of data-related tasks, including data movement, data science, real-time analytics, business intelligence, and reporting. It provides a comprehensive set of tools for enterprises to manage their data warehousing needs. You can start a new trial for Microsoft Fabric at Microsoft Fabric Trial https://learn.microsoft.com/azure/data-factory/connector-azure-sql-database .

Azure HDInsight

Azure HDInsight is a cloud service that makes it easy, fast, and cost-effective to process massive amounts of data. It supports a broad range of scenarios such as ETL, data warehousing, machine learning, and IoT. Note, however, that Data Factory in Microsoft Fabric does not currently support Azure HDInsight on AKS Trino in data pipelines https://learn.microsoft.com/en-us/fabric/data-factory/connector-azure-hdinsight-on-aks-trino-overview .

Azure Data Factory

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. It enables users to create, schedule, and manage data pipelines that can ingest data from disparate data stores. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management. You can learn more about how to use Copy Activity in Azure Data Factory to copy data from and to Azure SQL Database at Azure Data Factory Introduction https://learn.microsoft.com/azure/data-factory/connector-azure-sql-database .
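
Pipelines are usually authored in the code-free UI, but they can also be managed programmatically. The following is a hedged sketch using the azure-mgmt-datafactory management SDK (subscription, resource group, factory, and pipeline names are placeholders) that lists the pipelines in a factory and triggers an on-demand run.

```python
# Hedged sketch: list pipelines in a Data Factory and trigger a run.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Enumerate pipelines defined in the factory.
for pipeline in adf.pipelines.list_by_factory("my-rg", "my-factory"):
    print(pipeline.name)

# Kick off an on-demand run of a pipeline (e.g. one containing a Copy activity).
run = adf.pipelines.create_run("my-rg", "my-factory", "CopySqlToLake")
print(run.run_id)
```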

Each of these services plays a vital role in the data warehousing ecosystem within Azure, providing tools and capabilities to handle large-scale data processing and analytics. They can be used individually or in combination to build a comprehensive data warehousing solution that meets the specific needs of an organization.

Describe an analytics workload on Azure (25–30%)

Describe considerations for real-time data analytics

Batch vs. Streaming Data

Batch data processing is a method where data is collected over a period of time and then processed in large, discrete chunks (batches) at a scheduled time or when the batch reaches a certain size. This approach is often used when it is acceptable to have some latency between data collection and data availability for analysis. Batch processing is suitable for scenarios where the data volume is large, and the processing can be deferred until the complete set of data is available. It is also useful for complex computations that do not require immediate results.

On the other hand, streaming data processing involves continuous ingestion and processing of data records individually or in small batches as soon as they are generated. This method is ideal for scenarios where it is necessary to respond quickly to data as it arrives, such as in real-time analytics, monitoring, and event-driven applications. Streaming data often has a time component that affects data preparation tasks, and it is characterized by low-latency processing, which allows for immediate insights and actions based on the incoming data.

Key differences include:

  • Latency: Batch processing can have high latency due to the time taken to collect and process the entire batch of data. Streaming processing has low latency, as data is processed in near real-time.
  • Data Volume: Batch processing is designed to handle large volumes of data efficiently, while streaming processing deals with continuous flows of data in smaller quantities.
  • Complexity: Batch jobs can be more complex and compute-intensive, as they are not constrained by the need for immediate processing. Streaming processing requires a more streamlined approach to handle the continuous flow of data.
  • Timeliness: Streaming data processing provides more timely insights and enables immediate action, which is crucial for time-sensitive applications.
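
The distinction can be illustrated with a toy example in plain Python (no Azure services involved): the batch function can only produce its result once the full dataset has been collected, while the streaming generator emits an updated result for every event as it arrives.

```python
# Toy illustration of batch vs. streaming processing.
from typing import Iterable, Iterator


def batch_total(readings: list[float]) -> float:
    """Process the complete batch only after all data has been collected."""
    return sum(readings)


def streaming_totals(readings: Iterable[float]) -> Iterator[float]:
    """Emit a running total immediately as each reading arrives."""
    total = 0.0
    for value in readings:
        total += value
        yield total  # downstream consumers can act on this right away


if __name__ == "__main__":
    data = [3.0, 1.5, 2.5]
    print(batch_total(data))             # 7.0, available only at the end
    print(list(streaming_totals(data)))  # [3.0, 4.5, 7.0], one per event
```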

Understanding when each model applies, deferred high-volume batch work versus low-latency continuous streaming work, is the foundation for choosing among the real-time analytics services described in the next section.

Describe an analytics workload on Azure (25–30%)

Describe considerations for real-time data analytics

Microsoft Cloud Services for Real-Time Analytics

Real-time analytics is the process of analyzing data as soon as it becomes available, allowing businesses to make immediate decisions based on the most current information. Microsoft offers several cloud services that enable real-time analytics, providing insights into data with minimal latency. Below are the key services that facilitate real-time analytics in the Microsoft cloud ecosystem:

Azure Event Hubs

Azure Event Hubs is a highly scalable data streaming platform and event ingestion service. It can receive and process millions of events per second, which makes it an ideal entry point for an event pipeline aiming to perform real-time analytics on the ingested data. Event Hubs can be used in conjunction with other services to enable real-time analytics https://learn.microsoft.com/en-us/azure/azure-sql/identify-query-performance-issues https://learn.microsoft.com/en-us/azure/azure-sql/database/metrics-diagnostic-telemetry-logging-streaming-export-configure .
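
A minimal sketch of publishing events with the azure-eventhub Python package follows; the connection string and hub name are placeholders.

```python
# Minimal sketch: send a telemetry event to Azure Event Hubs.
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",
    eventhub_name="telemetry",
)

with producer:
    batch = producer.create_batch()
    batch.add(EventData('{"deviceId": "sensor-1", "temperature": 21.7}'))
    producer.send_batch(batch)
```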

Azure Stream Analytics

Azure Stream Analytics is a real-time analytics and complex event-processing engine that is designed to analyze and process high volumes of fast streaming data from multiple sources simultaneously. Patterns can be identified, and insights can be derived from the data as it flows, which can then be visualized in reports and dashboards. Stream Analytics can be connected to Azure Event Hubs for data input and Power BI for output to create real-time analytics dashboards https://learn.microsoft.com/en-us/azure/azure-sql/identify-query-performance-issues https://learn.microsoft.com/en-us/azure/azure-sql/database/metrics-diagnostic-telemetry-logging-streaming-export-configure https://learn.microsoft.com/en-us/power-bi/connect-data/refresh-data .
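
Stream Analytics jobs are authored in a SQL-like query language rather than in Python; purely as a conceptual sketch, the plain-Python function below mimics the kind of tumbling-window aggregation such a job performs over a stream of timestamped events.

```python
# Conceptual sketch only: count events per fixed (tumbling) 60-second window,
# mimicking what a Stream Analytics windowed aggregation produces.
from collections import defaultdict

WINDOW_SECONDS = 60


def tumbling_window_counts(events: list[tuple[float, str]]) -> dict[int, int]:
    """events are (unix_timestamp, payload) pairs; returns counts per window."""
    counts: dict[int, int] = defaultdict(int)
    for timestamp, _payload in events:
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)
```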

Power BI

Power BI is a business analytics service that provides interactive visualizations and business intelligence capabilities, with an interface simple enough for end users to create their own reports and dashboards. It can be used to visualize the data being processed by Azure Stream Analytics in real time, making it a powerful tool for building real-time analytics dashboards https://learn.microsoft.com/en-us/azure/azure-sql/identify-query-performance-issues https://learn.microsoft.com/en-us/azure/azure-sql/database/metrics-diagnostic-telemetry-logging-streaming-export-configure https://learn.microsoft.com/en-us/power-bi/connect-data/refresh-data .

Azure Data Explorer

Azure Data Explorer is a fast and highly scalable data exploration service for log and telemetry data. It offers real-time analysis on large volumes of data streaming from applications, websites, IoT devices, and more. Azure Data Explorer uses the Kusto Query Language (KQL) for querying massive amounts of data in near real-time https://learn.microsoft.com/en-us/fabric/real-time-analytics/realtime-analytics-compare .
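
A hedged sketch of running a KQL query from Python with the azure-kusto-data package follows; the cluster URL, database, and table are placeholders, and Azure CLI authentication is assumed.

```python
# Hedged sketch: run a KQL query against Azure Data Explorer.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://<cluster>.<region>.kusto.windows.net"
)
client = KustoClient(kcsb)

# Count telemetry events per hour over the last day.
query = """
Telemetry
| where Timestamp > ago(1d)
| summarize Count = count() by bin(Timestamp, 1h)
"""
response = client.execute("TelemetryDb", query)
for row in response.primary_results[0]:
    print(row["Timestamp"], row["Count"])
```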

Real-Time Analytics in Microsoft Fabric

Real-Time Analytics is a data analytics Software as a Service (SaaS) experience within the Microsoft Fabric offering. It shares the same core engine as Azure Data Explorer, providing identical core capabilities but with different management behavior. Real-Time Analytics also offers capabilities such as Eventstreams, which are managed at the workspace level https://learn.microsoft.com/en-us/fabric/real-time-analytics/realtime-analytics-compare .

For more detailed information on setting up and using these services for real-time analytics, you can refer to the following resources: - Stream Analytics and Power BI: A real-time analytics dashboard for streaming data https://learn.microsoft.com/en-us/azure/azure-sql/identify-query-performance-issues https://learn.microsoft.com/en-us/azure/azure-sql/database/metrics-diagnostic-telemetry-logging-streaming-export-configure https://learn.microsoft.com/en-us/power-bi/connect-data/refresh-data . - What is Real-Time Analytics in Fabric? https://learn.microsoft.com/en-us/fabric/real-time-analytics/realtime-analytics-compare .

These services, when used together, provide a comprehensive solution for processing and analyzing data in real-time, enabling businesses to gain timely insights and make informed decisions quickly.

Describe an analytics workload on Azure (25–30%)

Describe data visualization in Microsoft Power BI

Identify Capabilities of Power BI

Power BI is a suite of business analytics tools that deliver insights throughout your organization. Below are some of the key capabilities of Power BI:

  • Power BI Service and Power BI in Teams: the cloud service for publishing, sharing, and collaborating on reports and dashboards, including directly within Microsoft Teams.
  • Power BI Desktop: the free desktop application for connecting to data, building data models, and designing reports.
  • Power BI Visuals: a library of built-in and custom visuals for presenting data in reports and dashboards.
  • Power BI Embedded Analytics: capabilities for embedding Power BI reports and dashboards into custom applications.
  • Power BI and Log Analytics: integration with Azure Log Analytics for monitoring and analyzing Power BI activity and performance.

These capability areas are covered in depth in the official Microsoft documentation.

Describe an analytics workload on Azure (25–30%)

Describe data visualization in Microsoft Power BI

Features of Data Models in Power BI

Power BI data models are essential for transforming raw data into meaningful insights. Here are some of the key features that make Power BI data models powerful tools for data analysis and visualization:

  1. Automated Machine Learning: Power BI allows users to create and apply binary prediction models using automated machine learning. This feature enables users to predict outcomes based on data, such as the purchase intent of online shoppers, by training a machine learning model directly within Power BI https://learn.microsoft.com/en-us/power-bi/create-reports/../connect-data/service-tutorial-build-machine-learning-model .

  2. Dataflows: Dataflows in Power BI enable users to create and manage data preparation pipelines. These dataflows can be used to train machine learning models, and they can also be applied to entities within Power BI for scoring and generating predictions https://learn.microsoft.com/en-us/power-bi/create-reports/../connect-data/service-tutorial-build-machine-learning-model .

  3. Featured Tables: Featured tables in Power BI facilitate the integration of enterprise data into Microsoft Excel. They allow users to easily add data from Power BI semantic models into Excel sheets, enhancing the data analysis capabilities of Excel with the power of Power BI https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-excel-featured-tables .

  4. Advanced AI Integration: Power BI integrates advanced AI capabilities, allowing users to apply artificial intelligence to their dataflows. This can include features like text analytics, image recognition, and automated machine learning models https://learn.microsoft.com/en-us/power-bi/developer/embedded/../../enterprise/service-premium-features .

  5. DirectQuery with Dataflows: This feature allows users to connect directly to their dataflow without having to import the data. It enables real-time data analysis and reporting, which is crucial for timely decision-making https://learn.microsoft.com/en-us/power-bi/developer/embedded/../../enterprise/service-premium-features .

  6. Hybrid Tables: Hybrid tables are a preview feature that combines incremental refresh with real-time data. This allows for a more dynamic and up-to-date data model that can reflect the latest changes in the data source https://learn.microsoft.com/en-us/power-bi/developer/embedded/../../enterprise/service-premium-features .

  7. Model Size Limit: Power BI sets limits on the memory footprint of a single Power BI semantic model, which varies depending on the type of service (Premium, Premium Per User, or Embedded). This ensures that models are optimized for performance https://learn.microsoft.com/en-us/power-bi/developer/embedded/../../enterprise/service-premium-features .

  8. Endorsement: Power BI allows users to promote or certify semantic models, which is known as endorsement. Endorsed models are prioritized in Excel’s Data Types Gallery, with certified semantic models listed first https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-create-excel-featured-tables .
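
Several of these features revolve around semantic models. As a hedged illustration (acquiring the Azure AD access token is assumed to have happened already), semantic models can be enumerated programmatically through the Power BI REST API.

```python
# Hedged sketch: list the semantic models (datasets) visible to the caller
# via the Power BI REST API. The access token is a placeholder.
import requests

ACCESS_TOKEN = "<azure-ad-access-token>"

resp = requests.get(
    "https://api.powerbi.com/v1.0/myorg/datasets",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for dataset in resp.json()["value"]:
    print(dataset["id"], dataset["name"])
```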

These features collectively enhance the data modeling capabilities in Power BI, enabling users to create sophisticated and insightful reports and dashboards.

Describe an analytics workload on Azure (25–30%)

Describe data visualization in Microsoft Power BI

Identifying Appropriate Visualizations for Data

When selecting visualizations for data representation, it is essential to consider the nature of the data and the message you want to convey. Here are some guidelines to help you choose the most suitable visualizations:

  1. Understand Data Types: Different types of data are best represented by different types of visualizations. For instance, geographic data may not be effectively displayed in a funnel chart or line chart https://learn.microsoft.com/en-us/power-bi/create-reports/../visuals/power-bi-report-add-visualizations-ii .

  2. Consider the Visualization’s Purpose:

    • To compare values, bar charts and column charts are effective.
    • For showing trends over time, line charts or area charts are appropriate.
    • Pie charts or donut charts can be used to display proportions of a whole.
  3. Interactivity and Drill-Downs: Some visualizations offer interactive features such as drill-downs, which allow viewers to click on a region of a treemap, for example, to see more detailed data for that area.

  4. Card Visualizations: These are used to display single numbers prominently, such as total sales or average order value https://learn.microsoft.com/en-us/power-bi/create-reports/sample-revenue-opportunities .

  5. Choosing the Right Chart for Sequences: For visualizations that render sequences of values, such as time series data, line charts, time charts, and area charts are suitable. The x-axis typically represents time, and the y-axis represents the measured values https://learn.microsoft.com/azure/data-explorer/kusto/query/render-operator .

  6. Customization: Power BI allows customization of visualizations, including tooltips, to enhance the understanding of the data presented https://learn.microsoft.com/en-us/power-bi/create-reports/sample-revenue-opportunities .

  7. Q&A Feature: Power BI’s Q&A feature can assist in choosing the right visualization by analyzing structured data and selecting an appropriate visualization based on the query https://learn.microsoft.com/en-us/power-bi/create-reports/service-prepare-data-for-q-and-a .
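
To make the guidance concrete, here is a toy rule of thumb in plain Python; the mapping is an illustrative assumption of ours, not a Power BI API.

```python
# Toy decision helper encoding the purpose-to-visual pairings above.
def suggest_visual(goal: str) -> str:
    suggestions = {
        "compare values": "bar or column chart",
        "trend over time": "line or area chart",
        "proportions of a whole": "pie or donut chart",
        "single key metric": "card",
        "hierarchical drill-down": "treemap",
    }
    return suggestions.get(goal, "table (then refine once the goal is clear)")


print(suggest_visual("trend over time"))  # line or area chart
```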

For additional information on creating and customizing visualizations in Power BI, you can refer to the following resources: - Customize tooltips in Power BI https://learn.microsoft.com/en-us/power-bi/create-reports/sample-revenue-opportunities . - Add a query visualization in the web UI https://learn.microsoft.com/azure/data-explorer/kusto/query/render-operator . - Customize dashboard visuals https://learn.microsoft.com/azure/data-explorer/kusto/query/render-operator .

Remember, the key to effective data visualization is to ensure that the chosen visual communicates the data clearly and accurately, allowing the viewer to grasp the insights the data provides.
