Module 1: Introduction to Data Science and Systems Thinking

“Understanding the Data Revolution”

Research Document for DATA 201 Course Development


Table of Contents

  1. Introduction: The Data Journey Framework
  2. Part I: The Ancient Origins of Data Collection
  3. Part II: The Birth of Modern Statistics
  4. Part III: Data Visualization Pioneers
  5. Part IV: The Computing Revolution
  6. Part V: From Data to Discovery - Landmark Stories
  7. Part VI: The Information Age
  8. Recommended Resources
  9. References

Introduction: The Data Journey Framework

Every great data story follows a journey:

  1. Collection - How was the data gathered? What motivated someone to count, measure, or record?
  2. Understanding/Modeling - How did they make sense of patterns? What tools and mental frameworks emerged?
  3. Prediction/Inference - What actions resulted? How did data change decisions, save lives, or transform society?

This module explores the human stories behind data science—not just dates and discoveries, but the lives of people who saw patterns where others saw chaos, who counted when others assumed, and who visualized truths that words alone could not convey.


Part I: The Ancient Origins of Data Collection

The Babylonian Census (4000 BCE)

The First Data Collection

Census-taking is older than the Egyptian, Greek, and Roman civilizations. Around 4000 BCE, the Babylonians conducted what may be humanity’s first systematic data collection: a census to determine how much food would be needed for each member of the population.

The Data Journey:

Evidence of these records exists today in the British Museum: clay tablets that represent humanity’s first attempt to transform social reality into manageable numbers.

Sources


Ancient Egypt: Building Pyramids with Data (2500 BCE)

The Egyptians used censuses not for democratic representation, but for monumental engineering. From around 2,500 BCE, they counted their population to:

This represents one of the earliest examples of data-driven project management—using population statistics to coordinate massive construction projects.

The Pharaoh Amasis Census (570 BCE)

One of the earliest documented censuses, described by the Greek historian Herodotus, dates to the reign of Pharaoh Amasis around 570 BCE. It was ordered for two practical purposes:

  1. Taxation - knowing who could be taxed
  2. Military planning - knowing who could be called upon in times of war

Key Insight: From the very beginning, data collection was tied to power—the power to tax, the power to mobilize armies, and the power to plan large-scale projects.


The Roman Census: Where the Word Comes From

The word “census” originates from the Latin censere, meaning “to estimate.” The Roman census was arguably the most sophisticated data collection system of the ancient world.

Key Facts:

The Data Journey:

The Roman census wasn’t just counting—it was classification. Your census record determined your social class, your rights, and your obligations. Data became identity.

Sources


The Han Dynasty Census (2 CE): The World’s Most Accurate Ancient Count

The world’s oldest extant (surviving) census data comes from China during the Han Dynasty. Conducted in the fall of 2 CE, scholars consider it remarkably accurate:

This was the largest population in the world at the time. The census focused on taxable families, revealing its fiscal purpose.

A Demographic Mystery: A later Han census in 140 CE recorded only 48 million people—an apparent decline of 11.6 million. Mass migrations to southern China are believed to explain this demographic shift, demonstrating how census data can reveal hidden historical movements.

Sources


The Domesday Book (1086): Medieval Big Data

After the collapse of Rome, systematic census-taking disappeared from Western Europe—until William the Conqueror.

In 1086, just 20 years after conquering England, William ordered the Great Survey—what would become known as the Domesday Book.

The Scale:

The Method:

Why “Domesday”? The English held this book in awe. “Doom” was the Old English term for judgment—like the Last Judgment, this record was definitive and unchallengeable. Once recorded in Domesday Book, a property holding was legally established.

Legacy:

Sources


Part II: The Birth of Modern Statistics

John Graunt and the Bills of Mortality (1662)

The Founding Father of Demography

John Graunt was a London draper—a cloth merchant with no formal scientific training. Yet his 1662 book Natural and Political Observations Made Upon the Bills of Mortality founded three fields simultaneously:

The Bills of Mortality

Since 1593, London had published weekly Bills of Mortality—documents recording births, deaths, and causes of death by parish. These were printed on Thursdays and distributed throughout the city. They were originally created during plague outbreaks so citizens could track the disease’s spread.

What Graunt Did Differently:

Where others saw mere lists of the dead, Graunt saw patterns waiting to be discovered. He:

  1. Created the first life table - Predicting what percentage of people would survive to each age
  2. Discovered sex ratio patterns - More males are born, but males die at higher rates, equalizing the adult population
  3. Identified the “urban penalty” - City dwellers died younger than rural populations
  4. Documented “excess deaths” during epidemics - The statistical fingerprint of disease outbreaks
  5. Developed sampling methods - Using ratios to estimate total populations from partial data

The Revolutionary Insight:

“The originality in his Observations was phenomenal. The new deep perception that Graunt presented was the value of population-level analysis. Healers had always thought about causes of illness and death in individuals, but no one before him had studied community-wide patterns.”

Graunt’s work earned him election to the Royal Society in 1662, endorsed by King Charles II himself. A cloth merchant, through systematic analysis of data, joined the most elite scientific body in England.

Sources


Adolphe Quetelet: The Average Man (1796-1874)

Adolphe Quetelet, a Belgian astronomer and mathematician, asked a radical question: Could statistical methods used for astronomy be applied to human society?

Social Physics

Quetelet founded what he called “social physics”—the application of mathematical analysis to social phenomena. His central concept was l’homme moyen (“the average man”):

“Quetelet postulated that for any population, there exists a typical or ‘average man,’ characterized by the mean values of measured variables that follow a normal distribution.”

Key Discoveries:

The Crime Statistics Paradox:

Quetelet studied French crime statistics and discovered a disturbing regularity:

“Thus we pass from one year to another with the sad perspective of seeing the same crimes reproduced in the same order.”

If crime rates are statistically predictable, what does that say about free will? Are criminals making individual choices, or are they products of social forces?

The BMI Origin Story:

In trying to characterize the “average man,” Quetelet developed what we now call the Body Mass Index (BMI), originally the “Quetelet Index”; the name “Body Mass Index” was coined by Ancel Keys in 1972. Quetelet was searching for an ideal human form and believed the average body represented optimal health and beauty.
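The Quetelet index is simple arithmetic: body weight divided by the square of height. A minimal Python sketch, assuming weight in kilograms and height in metres and using a handful of made-up people rather than any historical data, computes the index and then the sample mean that Quetelet would have read as the “average man.” The function name `quetelet_index` is illustrative.

```python
from statistics import mean


def quetelet_index(weight_kg: float, height_m: float) -> float:
    """Quetelet index (today's BMI): weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical (weight kg, height m) pairs, not Quetelet's measurements
people = [(70, 1.75), (82, 1.80), (58, 1.62), (95, 1.85)]
indices = [quetelet_index(w, h) for w, h in people]

print(round(quetelet_index(70, 1.75), 1))  # 22.9
print(round(mean(indices), 1))             # 24.5, the "average man" of this tiny sample
```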

Controversy:

Quetelet’s work sparked fierce debate:

Sources


Francis Galton: Genius and Darkness (1822-1911)

Francis Galton was Charles Darwin’s half-cousin, a polymath who made fundamental contributions to statistics while pursuing deeply troubling goals.

Statistical Contributions

Galton developed or discovered:

  1. Regression to the mean - Observing that extreme values tend to be followed by more moderate ones (originally studying sweet pea seeds)
  2. Correlation - Measuring how two variables move together (first calculated correlation coefficients by comparing arm length to height)
  3. Standard deviation - Building on earlier work on measures of spread (the term itself was coined later by his protégé Karl Pearson)
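Galton’s idea of correlation was later given its standard product-moment formula by Karl Pearson: the covariance of two variables divided by the product of their spreads, which always lies between -1 and 1. The sketch below implements that formula directly; the forearm and height values are invented for illustration and are not Galton’s measurements.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson product-moment correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    # Co-variation of x and y around their means, and each variable's own variation
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical forearm lengths and heights in cm (illustrative only)
forearm = [44, 46, 47, 49, 50, 52]
height = [165, 170, 169, 176, 178, 183]
print(round(correlation(forearm, height), 3))
```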

The Sweet Pea Experiment:

Galton noticed something odd when breeding sweet peas. If he selected very large seeds and planted them, the offspring seeds were large—but not as large as the parents. They “regressed” toward the average.

This wasn’t a flaw in his experiment. It was a fundamental statistical principle that applies everywhere: extremely tall parents tend to have tall (but not quite as tall) children; exceptional performance tends to be followed by merely good performance.
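Regression to the mean needs no biology to appear; it follows whenever two measurements are imperfectly correlated. A small simulation, assuming standardized (mean 0, SD 1) parent and child scores with an arbitrary correlation of 0.5, picks out parents more than 2 SD above average and compares them with their children.

```python
import random


def simulate_regression_to_mean(n=100_000, r=0.5, seed=1):
    """Simulate standardized parent/child scores with correlation r and return the mean
    parent score and mean child score among parents more than 2 SD above average."""
    rng = random.Random(seed)
    parents_top, children_top = [], []
    for _ in range(n):
        parent = rng.gauss(0, 1)
        # The child inherits a fraction r of the parent's deviation plus independent noise,
        # scaled so the child's score also has variance 1.
        child = r * parent + (1 - r ** 2) ** 0.5 * rng.gauss(0, 1)
        if parent > 2:
            parents_top.append(parent)
            children_top.append(child)
    return sum(parents_top) / len(parents_top), sum(children_top) / len(children_top)


parent_mean, child_mean = simulate_regression_to_mean()
print(f"extreme parents: {parent_mean:.2f} SD above average; their children: {child_mean:.2f} SD")
```

With correlation r, the expected child score is r times the parent’s deviation, so in this setup the children of very tall parents should average roughly half as far above the mean: still tall, but not as tall.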

Fingerprint Identification

Galton collected fingerprints in his anthropometric laboratory and:

  1. Demonstrated that fingerprints remain constant throughout life
  2. Demonstrated that fingerprints can serve as unique identifiers
  3. Developed classification characteristics

The Galton-Henry system of fingerprint classification was published in 1900 and adopted by Scotland Yard in 1901. It spread worldwide and remains the basis for forensic fingerprint identification.

The Dark Legacy: Eugenics

In 1883, Galton coined the term “eugenics,” from the Greek for “well-born.” After reading his half-cousin Darwin’s Origin of Species, Galton became convinced that humanity could be improved through selective breeding.

“He had in mind a purposeful breeding program, similar to agricultural animal husbandry.”

Galton’s ideas led to:

The Statistical-Eugenics Connection:

This dark history is important for data science students to understand. Statistics and data analysis are not neutral tools: they can be weaponized to justify prejudice and oppression. The same person who gave us correlation and regression also helped lay the groundwork for scientific racism.

Other prominent statisticians who supported eugenics:

Sources


Karl Pearson: The Institutionalization of Statistics (1857-1936)

Karl Pearson established statistics as an academic discipline, founding the world’s first university statistics department at University College London in 1911.

Key Contributions

  1. Chi-squared test (1900) - A method to test whether observed data differs significantly from expected values
  2. Pearson correlation coefficient - The standard measure of linear correlation
  3. Standard deviation - Pearson coined the term in an 1893 lecture
  4. Standardized methods for estimator errors
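Pearson’s chi-squared statistic sums (observed - expected)² / expected over all categories; large values suggest the data depart from what the null hypothesis predicts. The sketch below uses hypothetical counts from 120 rolls of a die, where a fair die predicts 20 per face. Turning the statistic into a p-value requires the chi-squared distribution, which is left out here; the critical value in the comment is the standard table value.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 120 rolls of a die; a fair die predicts 20 per face
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6
stat = chi_squared_statistic(observed, expected)
print(round(stat, 2))  # 5.0

# With 6 - 1 = 5 degrees of freedom, the 5% critical value is about 11.07,
# so a statistic of 5.0 gives no reason to reject the "fair die" null hypothesis.
```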

The Gresham College Lectures

From 1891 to 1894, Pearson was Professor of Geometry at Gresham College, delivering public lectures that attracted over 300 attendees. His lectures on:

These lectures transformed statistics from a scattered set of techniques into a coherent mathematical discipline.

The Biometrika Journal

In 1901, with W.F.R. Weldon and Francis Galton, Pearson founded Biometrika—the first journal dedicated to mathematical statistics. He edited it until his death. This institutionalized statistics as a field with its own publication venue, peer review, and professional community.

Sources


R.A. Fisher and “The Lady Tasting Tea” (1935)

Ronald A. Fisher is considered one of the greatest statisticians of the 20th century. His 1935 book The Design of Experiments introduced the concept of the null hypothesis through a charming story.

The Story

At an afternoon tea at the Rothamsted Experimental Station in Harpenden (sometime in the 1920s), a colleague named Muriel Bristol claimed she could tell whether milk or tea was added to the cup first. The scientists were skeptical: surely this was impossible!

Her future husband, William Roach, suggested Fisher design an experiment. Fisher proposed:

The Statistical Framework

Fisher asked: What is the probability she could identify all eight correctly by pure chance?

Answer: 1/70, or about 1.4%
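The 1/70 figure is a counting argument. Assuming Fisher’s usual design of eight cups, four with milk poured first and four with tea poured first, and a subject who knows she must pick exactly four, a guesser with no ability is choosing one of C(8, 4) = 70 equally likely sets of four cups, and only one of those sets is entirely correct. A short enumeration confirms it (the cup labels are arbitrary):

```python
from itertools import combinations
from math import comb

cups = range(8)
milk_first = {0, 1, 2, 3}  # which four cups truly had milk first (arbitrary labels)

# Every way a subject with no ability could pick four cups as "milk first"
guesses = list(combinations(cups, 4))
all_correct = sum(1 for g in guesses if set(g) == milk_first)

print(len(guesses), comb(8, 4))    # 70 70
print(all_correct / len(guesses))  # 1/70, about 0.014
```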

This simple experiment established:

  1. The null hypothesis - “The subject has no ability to distinguish the teas”
  2. Randomization in experimental design
  3. Statistical significance - We reject the null only if the probability of the result occurring by chance is sufficiently low

Did It Work?

According to Fisher’s colleague H. Fairfield Smith, Bristol correctly identified all eight cups.

Fisher’s Exact Test

The mathematical method Fisher developed for this problem became known as Fisher’s Exact Test, still used today when sample sizes are small and the chi-squared approximation is unreliable.
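Fisher’s exact test generalizes that count: under the null hypothesis, the number of milk-first cups identified correctly follows a hypergeometric distribution, and the one-sided p-value is the probability of doing at least as well by chance. A minimal sketch for the tea-tasting layout, again assuming eight cups split four and four (`tea_tasting_p_value` is a hypothetical helper name):

```python
from math import comb


def tea_tasting_p_value(correct_milk_first, cups=8, milk_first=4):
    """One-sided Fisher exact p-value: probability of identifying at least
    `correct_milk_first` of the milk-first cups by pure chance (hypergeometric tail)."""
    total = comb(cups, milk_first)  # equally likely ways to choose the "milk first" cups
    tail = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(correct_milk_first, milk_first + 1)
    )
    return tail / total


print(tea_tasting_p_value(4))  # 1/70, about 0.014
print(tea_tasting_p_value(3))  # 17/70, about 0.243
```

Getting three of the four right would happen by chance about 24% of the time, so in this design only a perfect score clears the conventional 5% bar. For general 2 × 2 tables, SciPy provides `scipy.stats.fisher_exact`.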

Sources


Part III: Data Visualization Pioneers

William Playfair: The Scottish Scoundrel (1759-1823)

William Playfair invented several of the statistical graphics we use most today, including the line graph, the bar chart, and the pie chart, and he led one of the most colorful lives imaginable.

Career Trajectory

Playfair was, in turn:

He was present at the storming of the Bastille in 1789.

Graphical Inventions

1786: The Commercial and Political Atlas

“Much to Playfair’s frustration, when he tried to plot trade data for Scotland, he found that there were a lot of records missing, meaning he couldn’t plot a time series as usual. And so the bar chart was born.”

Playfair himself considered bar charts “inferior in utility” to line charts!

1801: Statistical Breviary

Why Playfair Succeeded

Playfair’s training with James Watt as an engineering draftsman gave him skills in technical drawing. But more importantly, Playfair had an intuitive understanding of human perception:

“William Playfair… had an instinctive understanding of our psychological capabilities and, moreover, understood how to exploit them. He anticipated many ideas that are the focus of work in experimental psychology to this day.”

Legacy Rediscovered

Playfair’s work was often neglected after his death, corresponding to periods when statistical graphics fell out of fashion. With the rise of computer-based data visualization, interest has surged. In 2010, a copy of his Commercial and Political Atlas sold at Christie’s for $43,750.

Sources


Florence Nightingale: The Lady with the Lamp and the Data (1820-1910)

Florence Nightingale is remembered as the founder of modern nursing. Less known is her role as a pioneer of data visualization and statistical advocacy.

The Crimean War Crisis

In 1854, Nightingale led a team of nurses to care for British soldiers in the Crimean War. What she found horrified her:

Her Response: She immediately began counting things. “She recognized the counting system was in complete shambles. She was very much in favor of fact-based statistics.”

The Rose Diagram (Coxcomb Chart)

In 1858, Nightingale created her famous “polar area diagram”—often called the “coxcomb” or “rose diagram”:

The Shocking Truth:

The diagram revealed that most soldiers who died during the Crimean War died of sickness rather than of wounds. After sanitary improvements were made (March 1855), death rates plummeted.

Collaboration with William Farr

Nightingale worked closely with William Farr, a founder of medical statistics. Together, they compiled rigorous data from battlefield hospitals.

“She is famous for using graphical displays of her data to give the statistics context, realizing early on that officials would likely ignore numbers without a picture to get their attention.”

Impact

Within months of publication:

The Data Journey:

Sources


John Snow: The Map That Stopped an Epidemic (1813-1858)

In 1854, cholera struck London’s Soho neighborhood with devastating speed. Dr. John Snow’s investigation became a founding moment of epidemiology—and a landmark in data visualization.

The Setting

London in 1854 was the world’s largest city (2.5 million people) with:

The prevailing theory blamed “miasma”—bad air—for cholera transmission.

Snow’s Method

Snow did something unprecedented: he mapped the deaths.

The Pattern:

Deaths clustered densely around one pump—the Broad Street pump. Areas served by other pumps had far fewer deaths.

The Pump Handle

Snow presented his findings to local officials. On September 8, 1854, they removed the handle from the Broad Street pump. The epidemic slowed.

Important Caveat: Snow acknowledged that people fleeing the area may also have reduced deaths. The epidemic was already declining when the pump was disabled. But his analysis provided the first compelling evidence for waterborne transmission of cholera.

Legacy

Sources


Charles Minard: The Best Statistical Graphic Ever Drawn (1781-1870)

Charles Minard was a French civil engineer who, after retirement at age 70, devoted himself to creating “graphic tables and figurative maps.”

The Napoleon Graphic (1869)

At age 88, Minard created what information designer Edward Tufte said “may well be the best statistical graphic ever drawn”: a visualization of Napoleon’s 1812 Russian campaign.

Six Variables in Two Dimensions:

  1. Army size - The thickness of the line (1 mm = 10,000 men)
  2. Geographic location - Latitude and longitude
  3. Direction of travel - Tan line advancing, black line retreating
  4. Date - Connected to the temperature scale
  5. Temperature - Scale at bottom showing the brutal Russian winter
  6. Terrain - Rivers crossed, cities passed

The Story in Numbers:

The graphic shows the army shrinking as it advances and dying in the retreat through the Russian winter. The temperature scale at the bottom, drawn in degrees Réaumur, shows temperatures dropping to -30°R (about -38°C).

“Brutal Eloquence”

French physiologist Étienne-Jules Marey praised the graphic’s “brutal eloquence, which seems to defy the pen of the historian.”

Why It Works:

The power comes from combining statistical reality (the numbers) with human geography (the actual route) and environmental context (the temperature). You don’t just see that many soldiers died—you see where they died and why.

Sources


Part IV: The Computing Revolution

Ada Lovelace: The First Programmer (1815-1852)

Ada Lovelace, daughter of the poet Lord Byron, is credited with writing the first computer program—a century before electronic computers existed.

Meeting Babbage

In June 1833, at age 17, Ada met Charles Babbage at a party. Babbage showed her his prototype Difference Engine, a mechanical calculator. He was so impressed by her intellect that he later called her “The Enchantress of Number.”

The Analytical Engine

Babbage later designed a more ambitious machine: the Analytical Engine—a general-purpose mechanical computer that was never built but anticipated modern computer architecture.

In 1843, Ada translated an article about the Analytical Engine, written in French by the Italian engineer Luigi Menabrea, adding her own notes that were about three times longer than the original article.

Note G: The First Algorithm

Ada’s “Note G” described a method for the Analytical Engine to calculate Bernoulli numbers. It is widely regarded as the first published computer algorithm.

“Bernoulli numbers can be calculated in many ways, but Lovelace deliberately chose an elaborate method in order to demonstrate the power of the engine.”

Visionary Insight

Ada saw something that even Babbage missed:

“She developed a vision of the capability of computers to go beyond mere calculating or number-crunching… Lovelace was the first to point out the possibility of encoding information besides mere arithmetical figures, such as music, and manipulating it with such a machine.”

She understood that computers could manipulate symbols, not just numbers—the fundamental distinction between calculation and computation.

Legacy

Sources


Herman Hollerith: The Punch Card Revolution (1860-1929)

The 1880 US Census took eight years to tabulate. Projections warned the 1890 census might not be finished before the 1900 census began!

The Problem

The US Constitution requires a census every decade. With fewer than 4 million Americans in 1790, this was manageable. With 63 million in 1890, it was becoming impossible.

Hollerith’s Solution

Herman Hollerith, frustrated by the tedious manual process while working at the Census Office, invented an electromechanical tabulating machine using punched cards.

Key Insight: A datum could be recorded by the presence or absence of a hole at a specific location on a card—essentially binary encoding.
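The presence-or-absence idea maps directly onto bits. The sketch below is a toy model, not Hollerith’s actual card layout: each position on a hypothetical card stands for one yes/no fact, a punched hole is 1, and the “tabulator” advances one counter per hole, loosely mirroring how a pin passing through a real card’s hole completed an electrical circuit and advanced a dial. The field names are invented for illustration.

```python
# Hypothetical card layout: each position records one yes/no fact about a person.
FIELDS = ["male", "female", "married", "foreign_born", "age_under_21", "age_21_plus"]


def punch_card(person):
    """Encode a person's attributes as holes (1) or no holes (0), one per field."""
    return [1 if person.get(field, False) else 0 for field in FIELDS]


def tabulate(cards):
    """Toy tabulator: every hole in a position advances that position's counter by one."""
    counts = {field: 0 for field in FIELDS}
    for card in cards:
        for field, hole in zip(FIELDS, card):
            counts[field] += hole
    return counts


cards = [
    punch_card({"male": True, "married": True, "age_21_plus": True}),
    punch_card({"female": True, "age_under_21": True}),
    punch_card({"female": True, "married": True, "foreign_born": True, "age_21_plus": True}),
]
print(tabulate(cards))
```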

The 1888 Competition

The Census Office held a competition. Three systems were tested:

For tabulation, Hollerith’s system took 5.5 hours versus 44.5 and 55.5 hours for its competitors.

The 1890 Census Success

Results:

The Road to IBM

1896: Hollerith founded the Tabulating Machine Company
1911: Merged into the Computing-Tabulating-Recording Company (CTR)
1924: CTR renamed International Business Machines Corporation (IBM)

Legacy:

Hollerith’s punched card system dominated data processing for nearly a century. It introduced:

Sources


The Women of ENIAC: Hidden Figures of Computing (1940s)

Before “computer” meant a machine, it meant a job description.

Human Computers

During World War II, women—often with mathematics degrees—were hired to perform ballistic calculations by hand. They could be paid much less than men with comparable training.

At the University of Pennsylvania’s Moore School, 200 female computers calculated artillery-firing tables for the US Army. Even so, one table took about a month to complete.

The ENIAC Project

The ENIAC (Electronic Numerical Integrator and Computer) was the first general-purpose, programmable, all-electronic computer—a secret US Army project with 18,000 vacuum tubes.

Out of approximately 100 human computers, six women were chosen to program ENIAC:

  1. Jean (Jennings) Bartik
  2. Betty (Snyder) Holberton
  3. Frances (Bilas) Spence
  4. Kay (McNulty) Mauchly
  5. Marlyn (Wescoff) Meltzer
  6. Ruth (Lichterman) Teitelbaum

Programming Without Manuals

“There were no manuals available and ‘programming’, as we know it today, didn’t yet exist—it was much more physical. Not only did the ‘ENIAC six’ have to correctly wire each cable they had to fully understand the machine’s underlying blueprints and electronic circuits.”

They taught themselves, learning by trial and error, sometimes crawling inside the machine to fix broken wires.

Hidden from History

When ENIAC was presented to the press in 1946, the six women programmers were not mentioned. Programming was seen as “subprofessional” women’s work—the hardware was considered important, not the software.

When the women later appeared in archival ENIAC photos, they were dismissed as “refrigerator ladies”: models posed beside the machine to make it look better.

Rediscovery

In the 1980s, a young programmer named Kathy Kleiman found the photo and refused to accept the “models” explanation. Her investigation revealed the truth.

Jean Bartik’s contributions went unrecognized for 40 years. She and Betty Holberton later worked on UNIVAC with Grace Hopper.

Sources


Alan Turing: Breaking Enigma with Data (1912-1954)

Alan Turing’s work at Bletchley Park during World War II represents one of the greatest data analysis feats in history—though it remained secret for decades.

The Enigma Challenge

Germany’s Enigma machine could produce messages with 158 quintillion possible settings (later increased further). The Germans believed it unbreakable.

Polish Groundwork

Polish cryptanalysts, recognizing that Enigma required mathematical rather than linguistic analysis, achieved the first breaks in the 1930s. In July 1939, weeks before Germany invaded Poland, they shared their work with Britain and France.

The Bombe

Within weeks of arriving at Bletchley Park in September 1939, Turing designed the “bombe”—an electromechanical device to search for Enigma settings.

The Method:

Turing’s approach relied on “cribs”—likely fragments of plaintext. If you could guess part of a message (like the standard greeting “Heil Hitler”), the bombe could test which Enigma settings would produce that result.
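The logic of a crib can be shown with a much simpler cipher. This sketch uses a Caesar shift (26 possible keys) purely as a stand-in for Enigma, whose key space was astronomically larger and which the bombe searched electromechanically rather than by listing keys: given an intercepted message and a guessed plaintext fragment, keep only the keys that would turn the fragment into the intercepted text. The message and key are made up.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(text, key):
    """Toy cipher (Caesar shift) standing in for Enigma's far larger key space."""
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] for c in text)


def keys_consistent_with_crib(ciphertext, crib):
    """Test every key; keep those that turn the crib into the ciphertext's opening letters."""
    return [k for k in range(26) if shift_encrypt(crib, k) == ciphertext[: len(crib)]]


secret_key = 7
intercept = shift_encrypt("HEILHITLERWETTERBERICHT", secret_key)
print(keys_consistent_with_crib(intercept, "HEILHITLER"))  # [7]
```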

Breaking Naval Enigma

The German Navy used a more complex Enigma system. Turing and his team cracked it, allowing the Allies to track U-boat movements during the Battle of the Atlantic (1941-1943).

Scale and Secrecy

The Bletchley Park operation grew from hundreds of workers to 10,000 at peak in 1944.

The operation remained classified until 1974—nearly 30 years after the war ended. Only then did the world learn what had been achieved.

Impact

General Dwight D. Eisenhower said the ULTRA intelligence (derived from Enigma decrypts) “saved thousands of British and American lives and, in no small way, contributed to the speed with which the enemy was routed.”

Some historians, notably Harry Hinsley, have estimated that Bletchley Park’s work shortened the war by two to four years, saving a great many lives.

Sources


Part V: From Data to Discovery - Landmark Stories

Galileo and the Pendulum (c. 1602)

The Birth of Quantitative Science

According to his student Viviani, young Galileo sat in the Pisa cathedral watching a lamp swing back and forth. Using his pulse to measure time, he noticed something remarkable: for small swings, the period was very nearly independent of how far the lamp swung.

This observation—isochronism—would revolutionize timekeeping and science itself.
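Isochronism holds only approximately: for a simple pendulum the small-swing period is T = 2π√(L/g), and the true period grows slowly as the amplitude increases. A numerical sketch, assuming an idealized frictionless pendulum 1 m long with g = 9.81 m/s², integrates the full nonlinear equation and compares several release angles with the small-angle formula:

```python
import math


def pendulum_period(amplitude_deg, length_m=1.0, g=9.81, dt=1e-4):
    """Measure the period of a frictionless simple pendulum released from rest,
    integrating theta'' = -(g / L) * sin(theta) with 4th-order Runge-Kutta."""

    def accel(theta):
        return -(g / length_m) * math.sin(theta)

    theta, omega, t = math.radians(amplitude_deg), 0.0, 0.0
    while True:
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        new_omega = omega + dt / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        # Released at +amplitude, the bob swings through the bottom with omega < 0 and
        # momentarily stops at -amplitude (omega back to 0) after half a period.
        if omega < 0 <= new_omega:
            fraction = -omega / (new_omega - omega)  # interpolate the zero crossing
            return 2 * (t + fraction * dt)
        theta, omega, t = new_theta, new_omega, t + dt


small_angle_period = 2 * math.pi * math.sqrt(1.0 / 9.81)  # T = 2*pi*sqrt(L/g) for L = 1 m
for amplitude in (5, 10, 20):
    print(amplitude, round(pendulum_period(amplitude), 4), round(small_angle_period, 4))
```

Swings of 5° and 10° differ in period by only a fraction of a percent, which helps explain why timing by pulse would not reveal any difference.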

A New Way of Thinking

“Galileo quickly began questioning the Aristotelian approach. Where Aristotle had taken a qualitative and verbal approach, Galileo developed a quantitative and mathematical approach.”

Galileo’s key innovations:

Stephen Hawking wrote: “Galileo, perhaps more than any other single person, was responsible for the birth of modern science.”

The Pendulum Clock

Galileo never built a pendulum clock, but his principle enabled Christiaan Huygens to build the first one in 1657. Pendulum clocks remained the world’s most accurate timekeepers for nearly 300 years, until the 1930s.

The Data Journey:

Sources


Tycho Brahe and Kepler: The Partnership That Unlocked the Solar System

Tycho Brahe’s Obsession (1546-1601)

Tycho Brahe, a Danish nobleman, was dissatisfied with the accuracy of existing astronomical tables. He dedicated his life—and considerable wealth—to fixing this.

Resources: The King of Denmark gave Tycho:

Achievement: Twenty years of continuous observations of planetary positions, accurate to one arc-minute—a tremendous feat before the telescope.

Enter Johannes Kepler

In 1600, the young mathematician Johannes Kepler became Tycho’s assistant in Prague. Tycho had the data; Kepler had the mathematical skills to analyze it.

But Tycho mistrusted Kepler, fearing the young man might eclipse him. He revealed only partial data, assigning Kepler the particularly troublesome observations of Mars.

The Ironic Twist

Mars has one of the most eccentric orbits of the naked-eye planets; only Mercury, which is hard to observe so close to the Sun, is more eccentric. In trying to fit circular orbits to Mars’s motion, as everyone assumed planets must move, Kepler repeatedly failed.

This failure forced him to a revolutionary conclusion: planetary orbits are ellipses, not circles.

“In a twist of irony, Brahe unwittingly gave Kepler the very part of his data that would enable Kepler to formulate the correct theory of the solar system, banishing Brahe’s own geocentric theory.”

Kepler’s Three Laws

From Tycho’s data, Kepler derived:

  1. Planets move in elliptical orbits with the Sun at one focus
  2. Planets sweep out equal areas in equal times
  3. The orbital period squared is proportional to the semi-major axis cubed
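The third law can be checked with modern rounded orbital values: semi-major axis a in astronomical units and period T in years. In these units T²/a³ should come out close to 1 for every planet, because Earth defines both units. The figures below are approximate textbook values, not Tycho’s observations.

```python
# Approximate modern values: semi-major axis a (AU), orbital period T (years)
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

# Kepler's third law: T^2 / a^3 is the same constant for every planet
ratios = {name: T ** 2 / a ** 3 for name, (a, T) in PLANETS.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} T^2/a^3 = {ratio:.3f}")
```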

These laws enabled Newton to formulate universal gravitation.

The Data Journey:

Sources


Gauss and the Lost Planet Ceres (1801)

The Birth of Least Squares

On January 1, 1801, Italian astronomer Giuseppe Piazzi discovered a new celestial body between Mars and Jupiter—filling a gap predicted by the Titius-Bode Law. He named it Ceres.

The Problem

Astronomers could only observe Ceres for 41 days before it disappeared behind the Sun. When it emerged months later, they couldn’t find it. They had data on less than 1% of its orbit—how could they predict where it would reappear?

The Challenge: Solve Kepler’s complex non-linear equations for elliptical orbits with minimal data.

Gauss’s Solution

Carl Friedrich Gauss, then just 24 years old, applied new mathematical techniques to the problem. His prediction pointed to an entirely different region of the sky than other astronomers suggested.

On December 7, 1801, astronomer Franz Xaver von Zach found Ceres—within half a degree of where Gauss predicted.

The Least Squares Method

In his 1809 book Theoria Motus Corporum Coelestium, Gauss described the method of least squares—minimizing the sum of squared errors when fitting a model to data.

“Gauss went beyond Legendre and succeeded in connecting the method of least squares with the principles of probability and to the normal distribution.”

Legacy: Least squares remains one of the most fundamental techniques in statistics and machine learning. Ordinary linear regression is a direct application of Gauss’s insight.
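For the simplest case, a straight line y = a + b·x, minimizing the sum of squared errors has a closed-form solution: the slope is the co-variation of x and y divided by the variation of x, and the line passes through the point of means. The sketch below fits made-up measurements; it illustrates the principle only and is far simpler than Gauss’s orbit computation for Ceres.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimizing the sum of squared residuals (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (mean x, mean y)
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """The quantity least squares minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical noisy measurements
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))  # 1.04 1.99
```

Nudging either coefficient away from the fitted values can only increase `sum_squared_errors`; that is what “least squares” means.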

The Data Journey:

Sources


Semmelweis: The Doctor Who Could Have Saved Millions (1818-1865)

Ignaz Semmelweis made one of the most important medical discoveries in history—and was destroyed for it.

The Vienna Maternity Clinics

At Vienna General Hospital in the 1840s, two maternity clinics operated side by side:

Women begged not to be sent to Clinic 1. Some gave birth in the street rather than enter.

The Breakthrough (1847)

Semmelweis’s friend Jakob Kolletschka died after being accidentally cut during an autopsy. His autopsy revealed pathology identical to women dying of childbed fever.

The Connection: Doctors and students in Clinic 1 went directly from autopsies to delivering babies—carrying “cadaverous particles” on their hands. Midwives in Clinic 2 never touched corpses.

The Intervention

In May 1847, Semmelweis required all doctors and students to wash their hands with chlorinated lime solution before examining patients.

Results:

The Rejection

Despite clear evidence, the medical establishment rejected Semmelweis’s findings:

His contract was not renewed. He returned to Hungary, grew increasingly unstable, and died in a mental institution in 1865—possibly beaten by guards.

Twenty years later, Louis Pasteur and Joseph Lister validated germ theory. Semmelweis became “the savior of mothers”—posthumously.

The Data Journey:

Sources


Edward Lorenz and the Butterfly Effect (1963)

The Discovery of Chaos

On a winter day in 1961, Edward Lorenz, a meteorology professor at MIT, ran a computer simulation of weather patterns. He decided to repeat a run—but rounded one variable from .506127 to .506.

The result completely diverged from the original.

Deterministic Chaos

Lorenz had discovered something profound: in certain systems, tiny differences in initial conditions produce vastly different outcomes—even though the underlying equations are completely deterministic.

His 1963 paper “Deterministic Nonperiodic Flow” founded chaos theory.
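Sensitivity to initial conditions is easy to reproduce. Lorenz’s 1961 weather model had more variables than the three-equation system in his 1963 paper, so this sketch is an illustration rather than a recreation: it integrates the 1963 equations with the standard parameters (σ = 10, ρ = 28, β = 8/3), starting two runs that differ only in the first coordinate, 0.506127 versus 0.506. The other starting values (1.0, 1.0) and the step size are arbitrary choices.

```python
def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the 1963 Lorenz equations by one 4th-order Runge-Kutta step."""

    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def nudge(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(nudge(state, k1, dt / 2))
    k3 = deriv(nudge(state, k2, dt / 2))
    k4 = deriv(nudge(state, k3, dt))
    return tuple(
        s + dt / 6 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(x0_a, x0_b, steps=4000, dt=0.01):
    """Run two copies that differ only in starting x; record the distance between them."""
    a, b = (x0_a, 1.0, 1.0), (x0_b, 1.0, 1.0)
    distances = []
    for _ in range(steps):
        a, b = lorenz_step(a, dt), lorenz_step(b, dt)
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return distances


# Full-precision start versus the same value rounded to three decimals
gaps = separation_over_time(0.506127, 0.506)
print(f"gap after t = 1: {gaps[99]:.1e}; largest gap up to t = 40: {max(gaps):.1f}")
```

The two runs begin less than a thousandth apart, but the gap grows roughly exponentially until it is as large as the attractor itself.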

The Butterfly Effect

The famous metaphor came later. In 1972, Lorenz couldn’t think of a title for a talk. His colleague Philip Merilees suggested:

“Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?”

The name stuck.

Implications for Weather Prediction

“In meteorology, it led to the conclusion that it may be fundamentally impossible to predict weather beyond two or three weeks with a reasonable degree of accuracy.”

This isn’t a limitation of our instruments or computers—it’s a fundamental property of the atmosphere.

The Lorenz Attractor

Lorenz’s simplified model of atmospheric convection produces a beautiful mathematical object—the Lorenz attractor—whose shape famously resembles a butterfly.

Impact

Some scientists argue the 20th century will be remembered for three scientific revolutions:

  1. Relativity
  2. Quantum mechanics
  3. Chaos

MIT colleague Kerry Emanuel: “By showing that certain deterministic systems have formal predictability limits, Ed put the last nail in the coffin of the Cartesian universe.”

The Data Journey:

Sources


The Hudson’s Bay Company: 200 Years of Predator-Prey Data

For over two hundred years, trappers working for the Hudson’s Bay Company recorded pelts traded—creating one of the longest ecological time series in existence.

The Data

Starting in the 1840s, the company’s pelt records serve as a proxy for populations of:

The data shows a striking pattern: populations oscillate with a period of about 10 years, with the lynx population lagging the hare population.

Lotka-Volterra Model

In the 1920s, Alfred Lotka and Vito Volterra independently developed differential equations describing predator-prey dynamics:

The Hudson’s Bay data became a classic empirical illustration of these theoretical models, although the match is imperfect.
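In the standard form of the Lotka-Volterra model, prey x and predators y obey dx/dt = αx - βxy and dy/dt = δxy - γy: prey grow on their own and are eaten, while predators grow by eating and otherwise die off. The sketch below integrates these equations with arbitrary illustrative parameters and starting populations (not fitted to the Hudson’s Bay records) and reports when each population first peaks, so the predator lag can be seen directly.

```python
DT = 0.001  # integration time step (arbitrary time units)


def lotka_volterra(prey0, predators0, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5, steps=20_000):
    """Integrate dx/dt = alpha*x - beta*x*y, dy/dt = delta*x*y - gamma*y with RK4."""

    def deriv(x, y):
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    x, y = prey0, predators0
    xs, ys = [x], [y]
    for _ in range(steps):
        k1 = deriv(x, y)
        k2 = deriv(x + DT / 2 * k1[0], y + DT / 2 * k1[1])
        k3 = deriv(x + DT / 2 * k2[0], y + DT / 2 * k2[1])
        k4 = deriv(x + DT * k3[0], y + DT * k3[1])
        x += DT / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += DT / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs.append(x)
        ys.append(y)
    return xs, ys


def first_peak(series):
    """Index of the first local maximum, or None if there is none."""
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] >= series[i + 1]:
            return i
    return None


prey, predators = lotka_volterra(prey0=10.0, predators0=5.0)
print(f"prey peaks at t = {first_peak(prey) * DT:.2f}, predators at t = {first_peak(predators) * DT:.2f}")
```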

What the Data Shows

“Notice how the predator population lags the prey population: an increase in prey numbers results in a delayed increase in predator numbers as the predators eat more prey.”

This phase lag—about one quarter of a cycle—is a fundamental signature of predator-prey dynamics.

Modern Analysis

The data continues to be studied:

The Data Journey:

Sources


Part VI: The Information Age

Claude Shannon: The Magna Carta of the Digital Era (1916-2001)

In 1948, Claude Shannon, a mathematician at Bell Labs, published a paper that created the foundation for all digital communication.

A Mathematical Theory of Communication

Shannon’s paper appeared in the Bell System Technical Journal. Historian James Gleick rated it more important than the transistor: “even more profound and more fundamental.”

Scientific American called it the “Magna Carta of the Information Age.”

Key Concepts

Information Entropy: Shannon defined a measure of information content analogous to entropy in thermodynamics: essentially, the average number of binary digits (bits) per symbol needed to encode messages from a source. He credited John Tukey with coining the term “bit.”

Channel Capacity: Every communication channel has a maximum rate at which information can be reliably transmitted—the Shannon limit. You can approach it but never exceed it.

Error Correction: Shannon proved you can transmit information with arbitrarily small error rates below channel capacity—a result that surprised engineers who believed reducing errors required reducing speed.
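Entropy and channel capacity can both be computed in a few lines. The sketch below evaluates H = -sum of p·log2(p) for a few distributions and for the letter frequencies of a sample string, then applies it to one specific channel chosen for illustration: a binary symmetric channel that flips each bit with probability p, whose capacity is 1 - H(p) bits per use. The function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def empirical_entropy(message):
    """Entropy of the symbol frequencies observed in a message."""
    counts = Counter(message)
    n = len(message)
    return entropy_bits(c / n for c in counts.values())


def bsc_capacity(flip_probability):
    """Capacity of a binary symmetric channel: C = 1 - H(p) bits per channel use."""
    return 1 - entropy_bits([flip_probability, 1 - flip_probability])


print(entropy_bits([0.5, 0.5]))     # 1.0: a fair coin carries one bit per toss
print(entropy_bits([0.25] * 4))     # 2.0: four equally likely symbols
print(round(empirical_entropy("ABRACADABRA"), 3))
print(round(bsc_capacity(0.11), 3))  # roughly 0.5 bits per channel use
```

Below that capacity, Shannon’s theorem says suitably designed codes can make the error rate as small as desired; a channel that flips bits half the time has capacity 0 and cannot carry information at all.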

Impact

“His theory was motivated by practical engineering problems. And while it was esoteric to the engineers of his day, Shannon’s theory has now become the standard framework underlying all modern-day communication systems: optical, underwater, even interplanetary.”

Roboticist Rodney Brooks: Shannon was “the 20th century engineer who contributed the most to 21st century technologies.”

The Reluctant Publisher

Remarkably, Shannon initially wasn’t planning to publish the paper. He only did so at colleagues’ urging.

Sources


Recommended Resources

Books

History of Statistics and Data Science

Data Visualization

Computing History

Online Courses

Websites

Videos and Documentaries

Interactive Visualizations


References

Primary Sources and Academic Papers

Ancient and Medieval Data Collection

Early Statistics and Demography

Quetelet and Social Statistics

Galton, Pearson, and Fisher

Data Visualization

Computing History

Scientific Discoveries

Information Theory


Document compiled for SCDS DATA 201: Introduction to Data Science I
Module 1: Introduction to Data Science and Systems Thinking
“Understanding the Data Revolution”