Wednesday, November 18, 2020

Introduction to Unified Data Analytics with Databricks



This course is for individuals who want a high-level overview of how Databricks can help organizations adopt a Unified Data Analytics approach.

By the end of the course you will be able to:  

1: Summarize the benefits of adopting a Unified Data Analytics approach to business. 

2: Explain, at a high level, how Databricks enables organizations to adopt a Unified Data Analytics approach to business.

3: Explain how the individual components of the Unified Data Analytics Platform help organizations increase efficiency.

4: Give examples of how real-world customers have used Databricks to streamline big data workflows.

Lesson 1: Why Unified Data Analytics?

In the early days of data analytics, simple relational databases, historical data, and spreadsheet expertise were used to drive business decisions. Today, with the emergence of big data, these methods are no longer sufficient. Businesses spend a significant amount of resources trying to piece together solutions for extracting insights from big data. You might be asking yourself, “Why is extracting insights from big data so complicated?” The video for this lesson reviews the challenges that many organizations face as they try to work with their big data. These concepts are important because they set the stage for the emergence of Unified Data Analytics (UDA).
Big data is complex to work with because of the three Vs: Volume, Velocity, and Variety.

Lesson 2: What is Unified Data Analytics?

As we explored in our last lesson, working with big data is complicated. UDA arose to help data practitioners spend more time analyzing and extracting insights from data rather than figuring out how to store and manage it. In this lesson, we’ll review the basic elements behind a UDA approach to business.
The concept behind a UDA approach sounds pretty straightforward, right? At a high level, it is. In summary, a UDA approach simply means that an organization is able to collect and process big data, store that data over a long period of time, and have that data at its disposal whenever it is needed for a variety of business purposes. In our next lesson, we’ll apply this idea to a real-world scenario so that you can see what a UDA approach might look like in real life.

Lesson 3: Applying a Unified Data Analytics approach

In our previous lesson, we introduced the conceptual idea behind a UDA approach to business. In this lesson, we’ll take this conceptual idea and apply it to help illustrate what a UDA approach might look like in a real-world organization.
Did you come up with any examples we didn't include in the video? If you did, nice work! A little later in the course, you'll hear from Databricks customers about how they've applied a UDA approach in their businesses - businesses you are probably familiar with. Next, we'll explore how Databricks helps organizations adopt this type of approach to work with their big data. Before we continue to our next lesson, you'll have the chance to take a quick quiz about what you've learned so far.

Lesson 4: Test

Lesson 5: A brief introduction to Databricks

In the previous lesson, you learned all about UDA - why it is gaining popularity, what it entails, and what it might look like, at a high level, in a real-world organization. At this point in our course, we’ll pivot to talk about Databricks, which was created to help organizations set up their big data infrastructure using a UDA approach.

Collaborative Data Science Workspace

In this lesson, we’ll review the high-level functionality behind the Collaborative Data Science Workspace (Workspace).

Workspace functionality

The Workspace is the single place where everyone on your data science team works together, from data ingest to production. Data practitioners use different functionality within the Workspace depending on their role, but they are all in the same Workspace and can collaborate with each other. Each Workspace is connected to an organization’s data store, which can (currently) be in either AWS or Azure. This data store serves as the single source of truth, meaning that individuals working in the Workspace can all access and work with the same data.

The Data Science Workspace has three major components: collaborative notebooks, Managed MLflow, and the Runtime for Machine Learning. We'll review each of these now.

Lesson 6: Unified Data Service

In this lesson, we'll review the components of the Unified Data Service including the Databricks Runtime, Delta Lake, and Databricks Ingest.
Databricks Runtime:
1. Optimized version of Spark
2. Runs on auto-scaling infrastructure

Delta Lake:
1. Adds intelligence to a data lake
2. Benefits include:
- Data reliability
- Easier data management
- Connections to visualization tools

Databricks Ingest
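To make the "data reliability" benefit listed under Delta Lake a little more concrete, here is a minimal sketch in plain Python. It is not the Delta Lake API and does not use Spark. It assumes that one part of reliability is rejecting records that do not match a table's declared schema, and rejecting the whole batch so that no partial write is left behind. The names ToyTable, SchemaMismatchError, and demo are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


# Hypothetical names throughout: a toy in-memory table, not the Delta Lake API.
class SchemaMismatchError(ValueError):
    """Raised when a record does not match the table's declared schema."""


@dataclass
class ToyTable:
    schema: dict  # column name -> expected Python type
    rows: list = field(default_factory=list)

    def append(self, records):
        """Validate every record first, then append all of them or none."""
        for record in records:
            if set(record) != set(self.schema):
                raise SchemaMismatchError(f"unexpected columns: {sorted(record)}")
            for column, expected_type in self.schema.items():
                if not isinstance(record[column], expected_type):
                    raise SchemaMismatchError(
                        f"column {column!r} expected {expected_type.__name__}"
                    )
        # Only reached when the whole batch is valid, so a bad batch
        # never leaves a partial write behind.
        self.rows.extend(records)


def demo():
    table = ToyTable(schema={"user_id": int, "event": str})
    table.append([{"user_id": 1, "event": "login"}])
    try:
        table.append([
            {"user_id": 2, "event": "click"},
            {"user_id": "three", "event": "click"},  # wrong type: batch rejected
        ])
    except SchemaMismatchError:
        pass  # the table is unchanged by the rejected batch
    return table.rows
```

Calling demo() returns only the first, valid record, because the second batch is rejected as a whole. A real data lake layer does far more than this, but the idea of validating before committing is the part this sketch tries to show.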
How does the Unified Data Service tie back into a Unified Data Analytics approach? First, as already mentioned, the Unified Data Service powers all of the data workflows that data practitioners run, from ingest to production. In addition, the Unified Data Service includes features like Delta Lake and Auto Ingest, which enable data practitioners to easily store and manage data as business requirements change. The data processed and managed by the Unified Data Service is what feeds periodic reporting, real-time dashboards, and artificial intelligence workflows. In our next lesson, we’ll review the final component of the Unified Data Analytics Platform: the Enterprise Cloud Service.

Lesson 7: Enterprise Cloud Service

Characteristics of the Enterprise Cloud Service:

Security Features:
- Retain control over your data
- Private, isolated, compliant workspaces
- Use corporate directories (such as Okta) to help establish data access
- Single sign-on
- Meet compliance standards such as GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act, a US regulation)

Simple Administration:
- Automatically onboard and offboard users
- Audit and analyze user activity
- Enforce policy configurations
- Set alerts

Production-ready Infrastructure:
- Ready-to-use environments
- APIs to automate version control
- On-demand auto-scaling infrastructure

How does the Enterprise Cloud Service tie back into a Unified Data Analytics approach? It protects all of the work being done with your data, from ingestion to storage to performing analytics and generating real-time dashboards and periodic reports.

Lesson 8: Knowledge Check
