Course Outline
MODULES OF DATA SCIENCE DEEP-DIVE
- MODULE 1: SQL
- MODULE 2: Python
- MODULE 3: Git
- MODULE 4: Math & Statistics
- MODULE 5: Machine Learning
- MODULE 6: Visual Programming
- MODULE 7: Data Visualization
- MODULE 8: Subject Matter Expert
SOFTWARE / TOOLS / LIBRARIES USED IN THIS COURSE
- Anaconda/Jupyter Notebooks
- SQL
- SQLite
- Python
- Python Libraries
- NumPy
- pandas
- Matplotlib
- Seaborn
- Bokeh
- Plotly
- statsmodels
- scikit-learn
- Git
- RapidMiner/KNIME
- Tableau
Course 1 (Become SQL Ninja)
LESSON 1: BASIC SQL
- Write basic SQL queries using the SELECT, FROM, and WHERE clauses.
- Use operators such as LIKE, AND, and OR to filter results, as in the sketch below.
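A minimal sketch of these clauses using Python's built-in sqlite3 module (the same SQLite engine used in the course project); the accounts table and its rows are made up for illustration:

```python
import sqlite3

# Build a tiny in-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("Acme", "West", 500.0), ("Apex", "East", 120.0), ("Zenith", "West", 80.0)],
)

# SELECT, FROM, and WHERE, combined with the LIKE, AND, and OR operators.
rows = conn.execute("""
    SELECT name, sales
    FROM accounts
    WHERE (name LIKE 'A%' AND sales > 100) OR region = 'East'
""").fetchall()
print(rows)  # [('Acme', 500.0), ('Apex', 120.0)]
```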
LESSON 2: SQL JOINS
- Write JOINs in SQL to combine data from multiple sources and answer more complex business questions.
- Understand different types of JOINs and when to use each type.
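For instance, an INNER JOIN keeps only rows that match across tables, while a LEFT JOIN also keeps unmatched rows. A small sketch with invented customers and orders tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# INNER JOIN keeps only customers with matching orders; swapping in
# LEFT JOIN would also keep Grace, with NULLs in the order columns.
rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()
print(rows)  # [('Ada', 25.0), ('Ada', 40.0)]
```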
LESSON 3: SQL AGGREGATIONS
- Write common aggregations in SQL including COUNT, SUM, MIN, and MAX.
- Write CASE and DATE functions, as well as work with NULLs.
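A short sketch of these ideas on a made-up orders table; note that aggregates skip NULLs, which COALESCE makes explicit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, total REAL);
    INSERT INTO orders VALUES ('West', 50), ('West', NULL), ('East', 200);
""")

# COUNT/SUM/MIN/MAX ignore NULLs; COALESCE and CASE handle them explicitly.
rows = conn.execute("""
    SELECT region,
           COUNT(total)            AS n_orders,
           SUM(COALESCE(total, 0)) AS revenue,
           CASE WHEN MAX(total) >= 100 THEN 'big' ELSE 'small' END AS tier
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('East', 1, 200.0, 'big'), ('West', 1, 50.0, 'small')]
```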
LESSON 4: ADVANCED SQL QUERIES
- Use subqueries and common table expressions (CTEs) in a number of different situations.
- Use window functions, including RANK, NTILE, LAG, and LEAD, along with partitions to complete complex tasks, as sketched below.
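A compact sketch combining a CTE with window functions (window functions need SQLite 3.25+, which recent Python builds bundle); the sales data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('Ada', 'West', 300), ('Bo', 'West', 120), ('Cy', 'East', 90);
""")

# The WITH clause names an intermediate result (a CTE); RANK and LAG
# operate within each PARTITION in descending order of amount.
rows = conn.execute("""
    WITH regional AS (SELECT rep, region, amount FROM sales)
    SELECT rep, region,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           LAG(amount) OVER (PARTITION BY region ORDER BY amount DESC) AS prev
    FROM regional
""").fetchall()
for row in rows:
    print(row)  # e.g. ('Ada', 'West', 1, None)
```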
COURSE PROJECT 1: INVESTIGATE A DATABASE
- In this project, learners will work with a relational database using SQLite. They’ll complete the entire data analysis process, starting by posing a question, running appropriate SQL queries to answer it, and finishing by sharing their findings.
Course 2 (Become Python Hunter)
LESSON 1: WHY PYTHON PROGRAMMING
- Gain an overview of what you’ll be learning and doing in the course.
- Understand why you should learn programming with Python.
LESSON 2: DATA TYPES & OPERATORS
- Represent data using Python’s data types: integers, floats, booleans, strings, lists, tuples, sets, dictionaries, and compound data structures.
- Perform computations and create logical statements using Python’s operators.
- Declare, assign, and reassign values using Python variables.
- Modify values using built-in functions and methods.
- Practice whitespace and style guidelines.
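A quick illustration of the core types and operators; all values here are arbitrary examples:

```python
count = 3                          # int
price = 9.5                        # float
in_stock = True                    # bool
name = "widget"                    # str
tags = ["new", "sale"]             # list: ordered, mutable
point = (2, 5)                     # tuple: ordered, immutable
colors = {"red", "blue"}           # set: unique members
item = {"sku": 101, "qty": count}  # dict: key -> value pairs

total = count * price                  # arithmetic operator
affordable = total < 50 and in_stock   # comparison and logical operators
name = name.upper()                    # reassignment via a string method
print(total, affordable, name)         # 28.5 True WIDGET
```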
LESSON 3: CONTROL FLOW
- Write conditional expressions using if statements and boolean expressions to add decision-making to your Python programs.
- Use for and while loops along with useful built-in functions to iterate over and manipulate lists, sets, and dictionaries.
- Skip iterations in loops using break and continue.
- Condense for loops to create lists efficiently with list comprehensions.
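The same ideas in a few lines; the scores list is made up:

```python
scores = [72, 95, 88, 41, 60]

for s in scores:
    if s < 50:
        continue              # skip failing scores
    if s > 90:
        print("top score:", s)
        break                 # stop at the first top score

total = 0
while scores:
    total += scores.pop()     # consume the list from the end

passed = [s for s in [72, 95, 88, 41, 60] if s >= 60]  # list comprehension
print(total, passed)          # 356 [72, 95, 88, 60]
```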
LESSON 4: FUNCTIONS
- Define your own custom functions.
- Create and reference variables using the appropriate scope.
- Add documentation to functions using docstrings.
- Define lambda expressions to quickly create anonymous functions.
- Use iterators and generators to create streams of data.
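A small sketch of each construct:

```python
def mean(values, precision=2):
    """Return the arithmetic mean, rounded to `precision` digits."""
    return round(sum(values) / len(values), precision)

square = lambda x: x * x      # lambda: an anonymous one-expression function

def squares(limit):
    """Generator: yields squares lazily instead of building a full list."""
    for n in range(limit):
        yield n * n

print(mean([1, 2, 3, 4]))     # 2.5
print(square(6))              # 36
print(list(squares(4)))       # [0, 1, 4, 9]
```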
LESSON 5: NUMPY
- Create, access, modify, and sort multidimensional NumPy arrays (ndarrays).
- Load and save ndarrays.
- Use slicing, boolean indexing, and set operations to select or change subsets of an ndarray.
- Understand the difference between a view and a copy of ndarray.
- Perform element-wise operations on ndarrays.
- Use broadcasting to perform operations on ndarrays of different sizes.
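A short sketch of these operations:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3x4 ndarray holding 0..11

col = a[:, 1]                 # slicing: second column (a view, not a copy)
big = a[a > 6]                # boolean indexing: elements greater than 6
b = a.copy()                  # explicit copy: edits to b never touch a

b += 100                      # element-wise operation on every entry
shifted = a + np.array([0, 10, 20, 30])   # broadcasting a length-4 row

print(col)          # [1 5 9]
print(big)          # [ 7  8  9 10 11]
print(shifted[0])   # [ 0 11 22 33]
```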
LESSON 6: PANDAS
- Create, access, and modify the main objects in pandas: Series and DataFrames.
- Perform arithmetic operations on Series and DataFrames.
- Load data into a DataFrame.
- Deal with Not a Number (NaN) values.
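A minimal pandas sketch with an invented table and one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"product": ["pen", "ink", "pad"],
                   "price": [1.5, np.nan, 3.0]})

prices = df["price"]                      # a single column is a Series
df["price_taxed"] = df["price"] * 1.1     # element-wise arithmetic

print(df["price"].isna().sum())           # 1 -> one NaN to deal with
df["price"] = df["price"].fillna(df["price"].mean())  # impute the mean
print(df)
```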
NUMPY & PANDAS: CASE STUDY 1
- Perform the entire data analysis process on a dataset.
- Learn more about NumPy and pandas to wrangle, explore, analyze, and visualize data.
COURSE PROJECT 1: INVESTIGATE A DATASET
- In this project, learners will choose one of Kaggle’s curated datasets and investigate it using NumPy and pandas.
- They’ll complete the entire data analysis process—starting by posing a question and finishing by sharing their findings.
Course 3 (Become Git Hero)
LESSON 1: PURPOSE & TERMINOLOGY
- Learn why developers use version control and discover ways you use version control in your daily life.
- Get an overview of essential Git vocabulary.
- Configure Git using the command line.
LESSON 2: CREATE A GIT REPO
- Create your first Git repository with git init.
- Copy an existing Git repository with git clone.
- Review the current state of a repository with the powerful git status.
LESSON 3: REVIEW A REPO’S HISTORY
- Review a repo’s commit history with git log.
- Customize git log’s output using command line flags in order to reveal more (or less) information about each commit.
- Use the git show command to display just one commit.
LESSON 4: ADD COMMITS TO A REPO
- Master the Git workflow and make commits to an example project.
- Use git diff to identify what parts of a file have been changed in a commit.
- Learn how to tell Git to ignore certain files using .gitignore.
LESSON 5: TAGGING, BRANCHING & MERGING
- Explore tagging, branching, and merging.
- Organize your commits with tags and branches.
- Jump to particular tags and branches using git checkout.
- Learn how to merge together changes on different branches and crush those pesky merge conflicts.
LESSON 6: UNDOING CHANGES
- Learn how and when to edit or delete an existing commit.
- Use git commit’s --amend flag to alter the last commit.
- Use git reset and git revert to undo and erase commits.
COURSE PROJECT 1: POST YOUR WORK ON GITHUB
- Learn the important tools that all programmers use. First, get an introduction to working in the terminal.
- Next, learn to use Git and GitHub to manage versions of a program and collaborate with others on programming projects.
- Learners will post two different versions of a Jupyter Notebook capturing learnings from the course, and add commits to their project Git repository.
Course 4 (Become Math & Statistics Guru)
LESSON 1: SIMPSON’S PARADOX
- Examine a case study to learn about Simpson’s Paradox.
LESSON 2: STATISTICS & PROBABILITY
- Learn the fundamental rules of Statistics.
- Learn the fundamental rules of Probability.
LESSON 3: BINOMIAL DISTRIBUTION
- Learn about the binomial distribution, where each observation represents one of two outcomes.
- Derive the probability of a binomial distribution.
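For reference, the probability of observing exactly k successes in n independent trials, each succeeding with probability p, is:

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1-p)^{\,n-k}, \qquad k = 0, 1, \ldots, n
```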
LESSON 4: CONDITIONAL PROBABILITY
- Learn about conditional probability, i.e., when events are not independent.
LESSON 5: BAYES’ RULE
- Build on conditional probability principles to understand Bayes’ rule.
- Derive Bayes’ theorem.
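The derivation follows directly from the definition of conditional probability, with the law of total probability expanding the denominator:

```latex
P(A \mid B) \;=\; \frac{P(A \cap B)}{P(B)}
            \;=\; \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
```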
LESSON 6: STANDARDIZING
- Convert distributions into the standard normal distribution using the Z-score.
- Compute proportions using standardized distributions.
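The Z-score measures how many standard deviations an observation x lies from the mean, where mu is the population mean and sigma the population standard deviation:

```latex
z = \frac{x - \mu}{\sigma}
```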
LESSON 7: SAMPLING DISTRIBUTIONS & CENTRAL LIMIT THEOREM
- Use normal distributions to compute probabilities.
- Use the Z-table to look up the proportions of observations above, below, or in between values.
LESSON 8: HYPOTHESIS TESTING
- Use critical values to make decisions on whether or not a treatment has changed the value of a population parameter.
LESSON 9: T-TESTS & A/B TESTS
- Test the effect of a treatment or compare the difference in means for two groups when we have small sample sizes.
LESSON 10: REGRESSION
- Build a linear regression model to understand the relationship between independent and dependent variables.
- Use linear regression results to make a prediction.
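A minimal sketch using statsmodels (one of the course libraries); the hours/score data is invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: hours studied vs. exam score.
hours = np.array([1, 2, 3, 4, 5, 6], dtype=float)
score = np.array([52, 57, 61, 68, 71, 78], dtype=float)

X = sm.add_constant(hours)        # prepend an intercept column
model = sm.OLS(score, X).fit()    # fit ordinary least squares

print(model.params)                 # [intercept, slope]
print(model.predict([[1.0, 7.0]]))  # predicted score for 7 hours of study
```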
LESSON 11: MULTIPLE LINEAR REGRESSION
- Use multiple linear regression results to interpret coefficients for several predictors.
LESSON 12: LOGISTIC REGRESSION
- Use logistic regression results to interpret the relationship between a categorical dependent variable and predictors, and to make predictions.
COURSE PROJECT 1: ANALYZE A/B TEST RESULTS
- In this project, learners will be provided with a dataset collected from an experiment. They’ll use statistical techniques to answer questions about the data and present their conclusions and recommendations in a report.
Course 5 (Become Machine Learning Engineer)
LESSON 1: SUPERVISED LEARNING
- Understand the advantages of using machine learning pipelines to streamline the data preparation and modeling process.
- Chain data transformations and an estimator with scikit-learn’s Pipeline.
LESSON 2: UNSUPERVISED LEARNING
- Use feature unions to perform steps in parallel and create more complex workflows.
- Grid search over the pipeline to optimize parameters for the entire workflow.
- Complete a case study to build a full machine learning pipeline that prepares data and creates a model for a dataset.
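A minimal scikit-learn sketch of a pipeline tuned with a grid search; the dataset and parameter grid here are just placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Chain a transformer and an estimator into a single estimator object.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Grid search over the whole workflow; "step__param" names the knobs.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```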
COURSE PROJECT 1: FIND DONORS FOR CHARITYML WITH KAGGLE
- CharityML is a fictitious charity organization that provides financial support for people learning machine learning.
- In an effort to improve donor outreach effectiveness, you’ll build an algorithm that best identifies potential donors.
- Your goal will be to evaluate and optimize several different supervised learners to determine which algorithm will provide the highest donation yield.
- You can also submit this project in a Udacity competition on Kaggle to see how you rank vs. your fellow students.
COURSE PROJECT 2: CREATING CUSTOMER SEGMENTS
- The data and design for this project were provided by Arvato Financial Services.
- You will apply unsupervised learning techniques on demographic and spending data for a sample of German households.
- You will preprocess the data, apply dimensionality reduction techniques, and implement clustering algorithms to segment customers with the goal of optimizing customer outreach for a mail order company.
Course 6 (Become Visual Programming Hercules)
LESSON 1: NO/LOW CODE ML USING RAPIDMINER
- The RapidMiner data science platform allows users to build complex analytics workflows with little to no code.
- Through this course, learn how to get started with RapidMiner.
- Discover what support RapidMiner offers for the analytics and artificial intelligence workflow, as well as the various tools included with RapidMiner.
- Next, explore the basics of machine learning and compare supervised and unsupervised learning models.
- Finally, work with RapidMiner Studio, and learn about the tool’s different panels.
- Upon completion, you’ll be set up to build predictive models in RapidMiner.
COURSE PROJECT 1: PERFORMING REGRESSION ANALYSIS
- Use RapidMiner to retrieve data and use it for modeling.
- Build a workflow to train regression models by using operators for data cleaning, imputing missing values, one-hot encoding, and partitioning your data.
- Finally, train multiple models for regression analysis, compare their performance, and perform hyperparameter tuning to get the best model design for your use case.
LESSON 2: NO/LOW CODE ML USING KNIME
- The KNIME Analytics Platform makes machine learning and data analytics more accessible by allowing you to build complex workflows with little to no code.
- Through this course, learn how the KNIME platform works.
- Examine the role of the KNIME Analytics Platform and the KNIME Community Hub.
- Next, explore machine learning basics and how supervised and unsupervised learning techniques work.
- Finally, discover how to set up the KNIME Analytics Platform and get familiar with the KNIME user interface.
- Upon completion, you’ll be able to handle building machine learning workflows using KNIME.
COURSE PROJECT 2: BUILDING & USING CLASSIFICATION MODELS
- The KNIME Analytics Platform allows you to load, explore, pre-process, and use your data to train classification models with little to no code.
- Explore classification models and the metrics used to evaluate their performance.
- Next, construct a KNIME workflow to load and view the data for a classification model.
- You will clean data, impute missing values, and cap and floor outlier values in a range.
- Then you will identify and filter correlated variables, convert categorical data to numeric values, and scale numeric variables.
- Finally, train several different classification models on the training data, evaluate them using the test data, and select the best model using hyperparameter tuning.
- Upon completing this course, you will have the skills and knowledge to clean and process your data, use it to train classification models, and perform hyperparameter tuning.
Course 7 (Become Data Visualization Specialist)
LESSON 1: TABLEAU DESKTOP: REAL TIME DASHBOARDS
- Begin with an introduction to real-time dashboards and the differences between real-time and streaming data.
- Learn how to build a dashboard in Tableau and update it in real time.
- Discover how to organize your dashboard by adding objects and adjusting the layout. Then customize and format different aspects of dashboards in Tableau and add interactivity using actions like filtering.
- Look at creating a dashboard starter, a prebuilt dashboard that can be used with Tableau Online to connect to cloud data sources. Add extensions to your dashboard using the Tableau Extensions API (application programming interface).
- Explore how to put together a simple dashboard story, which consists of sheets—each sheet in sequence is called a story point—and how to share a dashboard in Tableau.
COURSE PROJECT 1: CREATING A REAL-TIME DASHBOARD
- Tableau is software used for data visualization and business intelligence; it can help you extract insights from the data generated in your company.
- Tableau is easy to install and learn, and provides its users with a simple interface.
- It offers integrated tools for visualizing your data.
- It integrates different views that can be used to present data.
- It allows users to apply filters, formatting, and drill-downs; create sets and groups; generate trend lines; and perform forecasting.
COURSE PROJECT 2: DATA VISUALIZATION WITH PYTHON USING MATPLOTLIB, SEABORN, BOKEH, AND PLOTLY
- Python provides various libraries for visualizing data.
- Each comes with different features and supports various types of graphs.
- In this project, you will work with four such libraries (a short example follows the list):
- Matplotlib
- Seaborn
- Bokeh
- Plotly
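As a taste of the first two, here is a minimal sketch plotting seaborn's bundled tips dataset (fetching it needs an internet connection on first use); Bokeh and Plotly follow a similar pattern but render interactive HTML:

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")   # small sample dataset shipped via seaborn

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(tips["total_bill"], tips["tip"])      # plain Matplotlib
ax1.set(xlabel="total bill", ylabel="tip", title="Matplotlib scatter")

sns.histplot(tips["total_bill"], ax=ax2)          # Seaborn on the same figure
ax2.set_title("Seaborn histogram")

plt.tight_layout()
plt.show()
```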
Course 8 (Become Subject Matter Expert)
LESSON 1: PREDICTIVE ANALYTICS
- While data is one of the most valuable assets of an organization, predictive analytics has become the most important means of examining this data, using business knowledge to extract valuable insights and interesting patterns.
- Organizations use predictive analytics for various advantages including understanding their customer base, improving their operations, increasing revenues, outperforming their competitors, and better positioning themselves in the marketplace.
LESSON 2: CASE STUDIES IN DIFFERENT ORGANIZATIONS
- Predictive Analytics: Identifying Machine Failures
- Predictive Analytics: Predicting Sales & Customer Lifetime Value
- Predictive Analytics: Predicting Responses to Marketing Campaigns
- Predictive Analytics: Applying Clustering to Soil Features & Conditions
- Predictive Analytics: Healthcare