PSTAT 5A: Lecture 00

Introduction to Data Science

Annie Adams

2025-08-04

Welcome!

Course Staff

  • Instructor:
    • Annie (she/her)
    • aradams@ucsb.edu
    • Office Hours: Monday 11 am - 12 pm, Wednesday 11 am - 12 pm

Teaching Assistants:

  • Summer Le
    • sle@ucsb.edu
    • OH: TBD
  • Mallory Wang
    • mallorywang@ucsb.edu
    • OH: TBD

Course Resources

  • Canvas: for grades
  • Gradescope: for homework, quizzes, and labs
  • Course Website: https://annieradams.github.io/pstat5a.github.io/
    • All relevant course material will be posted to the website!
    • One exception: quizzes, which will be administered through Gradescope
  • Please read the syllabus fully and carefully!

Any Questions about the syllabus?

What is Data Science?

  • “Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI) and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning.” - IBM

  • “a field of study that uses scientific methods, processes, and systems to extract knowledge and insights from data” - The US Census Bureau

  • “Data science is the field that uses statistical methods, programming, and domain knowledge to extract insights and make decisions from data. It combines data analysis, machine learning, and data visualization to solve real-world problems across various industries.” - ChatGPT

What is Data Science?

  • All valid definitions! Data science means different things to different people / companies.

  • There isn’t a single agreed-upon definition of what data science is.

  • Most people agree that data science is cross-disciplinary, drawing experience and expertise from a wide variety of fields.

    • As we will soon see, the datasets being analyzed these days are huge; certainly too large to work with using pen and paper.

The Path Forward

  • So, how does this course factor into the discourse surrounding Data Science?

  • From the course description:

Introduction to data science. Concepts of statistical thinking. Topics include random variables, sampling distributions, hypothesis testing, correlation and regression. Visualizing, analyzing and interpreting real world data using Python. Computing labs required.

  • Indeed, this course will serve as a sort of “table of contents” of Data Science, touching on many (but still not all) of the subfields and subtopics that comprise the field.
  • We will start with Descriptive Statistics, a branch of statistics concerned with describing and summarizing data.
  • We will then devote some time to talking about Probability, which is in many ways the theory behind randomness and uncertainty.

  • Next, we will use Inferential Statistics to discuss how we can use data to draw conclusions (i.e. inferences) about the world around us.

    • This will include both Confidence Intervals as well as Hypothesis Testing.
  • Then, we will discuss a topic known as Regression, which will be our first (and only, for this class) foray into statistical modeling.

  • We will then take a closer look at how data is collected, and the various strategies that can be utilized when trying to collect data of our own.
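
To make the outline above a little more concrete, here is a minimal sketch of what two of these topics (Descriptive Statistics and Regression) can look like in Python. The data are made up purely for illustration, and the variable names `hours` and `scores` are hypothetical; this is not taken from the course labs, which may use different tools.

```python
import statistics

# Hypothetical data, invented for illustration only:
# hours studied and quiz scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]

# Descriptive Statistics: summarize the scores with a few numbers.
mean_score = statistics.mean(scores)      # average score
median_score = statistics.median(scores)  # middle score
sd_score = statistics.stdev(scores)       # sample standard deviation

# Regression: fit the least-squares line  score = intercept + slope * hours.
mean_hours = statistics.mean(hours)
sxy = sum((x - mean_hours) * (y - mean_score) for x, y in zip(hours, scores))
sxx = sum((x - mean_hours) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_score - slope * mean_hours

print(f"mean = {mean_score}, median = {median_score}, sd = {sd_score:.2f}")
print(f"fitted line: score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope is read as the estimated change in score per additional hour studied, for this toy dataset only.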

Why Should I Care?

  • I suspect not all of you are necessarily pursuing a degree in Statistics or Data Science. However, in this day and age, data is truly everywhere, and strong mathematical thinking will give you a leg up in any role you want.
  • And wherever there is data, there is the need for a Data Scientist (or, at least, some of the principles from Data Science).

    • So, even if you are working in (what you might think is) a field that is far removed from Statistics, the minute you start dealing with data is the minute you start needing to know Data Science!

Artwork by Allison Horst

So, without further ado… Let’s Get Started!