Foundations of measurement and data types for building data skills

Sep 5, 2023

A non-technical starting point for students in the applied social sciences

Note: This is a work in progress as part of my Data Visualization course at the University of Michigan’s School of Social Work. Feel free to subscribe if you are interested in articles on developing foundational data skills.

Measurement is the foundation for working with data because it provides the rules and structure for assigning numbers to represent real-world attributes. Measurement establishes the data’s fundamental mathematical properties, which determine what statistical procedures can validly be applied. A data analysis can be meaningful and accurate only when the attributes behind it are measured adequately and on the correct scale.

Measurement allows attributes like income, motivation, depression, etc., to be quantified consistently according to explicit rules. This numerical representation of attributes enables data analysis — numbers can be statistically summarized, compared, and modeled in ways that raw qualitative descriptions cannot. The assigned numbers reflect the actual properties of the attribute.

Choosing the wrong scale or violating scale assumptions in data analysis invalidates the results. For example, averaging an ordinal motivation scale would wrongly treat ranked scores as equidistant. Interval scales with arbitrary zero points cannot yield meaningful ratios. Using the wrong statistics for a given scale yields nonsense conclusions.

Valid measurement provides a firm foundation for meaningful data analysis and statistics. The scale properties determine appropriate analytical techniques. Data science builds directly on the rigorous quantification of attributes provided by measurement. Data analytics and statistics succeed only when phenomena are measured correctly on appropriate scales.

Careful attention to measurement rules and scale assumptions allows the incredible power of statistical analysis to be brought to bear on understanding complex social work problems. However, the full potential of data requires first establishing a solid measurement of attributes on a valid scale. Doing the hard work to quantify subjective qualities objectively enables rigorous data analysis to reveal new insights.

Scales of Measurement

We establish scales by assigning numbers to phenomena or events based on specific rules. The nature of these scales, shaped by the operations employed in measurement and the mathematical properties of the assigned numbers, is vital for accurate data interpretation and decision-making in social work and public health. Four primary measurement scales exist; grasping their nuances is essential for professionals in both fields.

1. Nominal Scales: These scales are categorical, where numbers function as mere labels without any quantitative essence. Examples in the realm of social work and public health include:

Types of interventions: Counseling (1), Rehabilitation (2), Support groups (3)
Disease classifications: Communicable (1), Non-communicable (2)
Types of public health campaigns: Vaccination (1), Mental health awareness (2), Nutrition (3)

Only basic statistics, like frequency counts and mode, are applicable. Using measures like mean on nominal data can result in misleading conclusions.
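To make this concrete, here is a minimal Python sketch using the intervention codes listed above. The eight client records are invented purely for illustration. It shows that frequency counts and the mode answer a sensible question, while the mean of the codes can be computed but says nothing about the interventions.

```python
from collections import Counter
from statistics import mean, mode

# Coding taken from the example above; the numbers are only labels.
LABELS = {1: "Counseling", 2: "Rehabilitation", 3: "Support groups"}

# Hypothetical data: the intervention code recorded for eight clients.
codes = [1, 3, 3, 2, 3, 1, 3, 2]

# Valid for nominal data: how often each category occurs, and which is most common.
frequencies = Counter(LABELS[c] for c in codes)
most_common = LABELS[mode(codes)]

# Computable, but not meaningful: the value depends entirely on the arbitrary codes.
misleading_mean = mean(codes)

print(frequencies)
print("Most common intervention:", most_common)
print("Mean of the codes (do not interpret):", misleading_mean)
```

If the same categories had been coded 10, 20, and 30 instead, the counts and the mode would be unchanged, but the "mean" would be completely different. That is a quick way to see that it carries no information.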

2. Ordinal Scales: These scales rank items, providing an order, but the exact magnitude of differences between ranks is ambiguous. Relevant examples are:

Severity of a condition: Mild, Moderate, Severe
Client satisfaction with social services: Dissatisfied, Neutral, Satisfied
Risk levels in community health assessments: Low, Medium, High

While medians and percentiles are discernible, using means or standard deviations might not be suitable due to the imprecise interval sizes.
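As a rough sketch of how a median can be found for ordered categories, the Python example below uses the severity levels above with seven made-up ratings. The ranks are used only to put the ratings in order; no arithmetic is done on them.

```python
from statistics import median_low

# Ordered categories from the severity example above, lowest to highest.
ORDER = ["Mild", "Moderate", "Severe"]
RANK = {label: position for position, label in enumerate(ORDER)}

# Hypothetical severity ratings for seven clients.
ratings = ["Mild", "Severe", "Moderate", "Mild", "Moderate", "Severe", "Moderate"]

# Ranks are used only for ordering. median_low always returns a value that
# was actually observed, so it can be mapped back to a category label.
ranks = [RANK[r] for r in ratings]
median_label = ORDER[median_low(ranks)]

print("Median severity:", median_label)
```

Using `median_low` rather than `median` is a deliberate choice here: with an even number of ratings, the ordinary median would average the two middle ranks, which is exactly the kind of arithmetic an ordinal scale does not support.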

3. Interval Scales: These scales have uniform intervals between measurements but lack an inherent zero point. Examples in public health and social work include:

Standardized mental health assessment scores
Quality of life indices
Stress or resilience scales

Means and standard deviations are appropriate for interval scales. Yet, statements like “twice as stressed” should be cautiously approached since there isn’t a true zero.
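A short Python sketch can show why ratios are unsafe on an interval scale. Temperature in Celsius and Fahrenheit is used here as a stand-in example (it is not one of the assessment scales listed above) because both are interval scales with arbitrary zero points and a well-known conversion.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three illustrative temperatures, equally spaced in Celsius.
cool_c, warm_c, hot_c = 10.0, 20.0, 30.0
cool_f, warm_f, hot_f = (celsius_to_fahrenheit(t) for t in (cool_c, warm_c, hot_c))

# Ratios depend on where the arbitrary zero sits, so they disagree.
ratio_c = warm_c / cool_c
ratio_f = warm_f / cool_f

# Equal intervals stay equal in both units, which is what "interval scale" means.
equal_steps_c = (warm_c - cool_c) == (hot_c - warm_c)
equal_steps_f = (warm_f - cool_f) == (hot_f - warm_f)

print("Ratio in Celsius:", ratio_c)
print("Ratio in Fahrenheit:", ratio_f)
print("Equal steps preserved:", equal_steps_c and equal_steps_f)
```

The same two temperatures look "twice as warm" in one unit and not in the other, so the ratio is an artifact of the unit. The differences, by contrast, tell a consistent story in both.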

4. Ratio Scales: The most informative of the four scales, ratio scales provide consistent intervals and an unambiguous zero point. Examples in our context are:

Number of counseling sessions attended
Age of individuals receiving services
Rate of disease incidence in a community per 1,000 people

All statistical techniques are valid with ratio scales, enabling statements like “twice as many sessions” or “half the incidence rate.”

Misinterpretation or misapplication of these scales in social work and public health can result in flawed conclusions, potentially influencing decisions that impact individual lives and communities. For professionals in these sectors, recognizing the principles guiding number assignments and discerning the scale type is paramount. This ensures that our representations of health and societal phenomena through numbers are accurate and meaningful.

In addition to having these basic scales of measurement, I also want to draw attention to two types of values that you will routinely encounter: geographic- and time-based values.

5. Geographic values:

A geographic variable in data analysis is a variable that represents spatial or location-based information that can be mapped onto a geographical space. These variables inherently possess geographic properties that enable them to be represented on a map, often in relation to other geographic features (e.g., latitude/longitude, city, state, postal code).

Here are the essential features of geographic data:

Spatial Reference: Geographic variables have a spatial reference, which means they can be plotted on a map. This could be in the form of latitude and longitude, UTM coordinates, or any other coordinate system.
Relationship with Neighboring Data Points: Data with geographic properties can be analyzed based on spatial relationships. For instance, you can determine if one location is adjacent to, within, or at a certain distance from another.
Mapping Capability: Geographic variables are best visualized on maps, where spatial patterns, relationships, and phenomena can be observed and analyzed. This is different from simple data visualization like bar graphs or pie charts.

Let’s differentiate geographic values from non-geographic values. While a location like “my dresser” might represent a specific place in someone’s room, it lacks the spatial reference and mapping capability that a geographic variable requires. Without an associated coordinate system (like latitude and longitude) or relative position to other geographic features, “my dresser” cannot be mapped in a broader geographical context. On the other hand, if “my dresser” were described with precise geographic coordinates, it could potentially be treated as a geographic variable.

While many things can have a ‘location’ generally, not all locations qualify as geographic variables. Geographic variables are specifically tied to geographical spaces and can be visualized, analyzed, and understood within that context.
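To illustrate what a spatial reference makes possible, the Python sketch below computes the straight-line ("great-circle") distance between two latitude/longitude points using the haversine formula. The coordinates for Ann Arbor and Detroit are approximate and used only as an example, and the Earth is treated as a perfect sphere, which is a simplifying assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points (degrees)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates (latitude, longitude), for illustration only.
ann_arbor = (42.2808, -83.7430)
detroit = (42.3314, -83.0458)

distance = haversine_km(*ann_arbor, *detroit)
print(f"Approximate distance: {distance:.1f} km")
```

Nothing like this can be computed for “my dresser” until it has coordinates, which is the practical difference between a location and a geographic variable.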

6. Time-based values:

Time-based values, often known as temporal data, denote points or periods in time. They play a vital role in data analysis as they enable the monitoring of changes, trends, and patterns over varying durations, from milliseconds to millennia. Such data is foundational in sectors like finance, meteorology, healthcare, and any area that requires time series analysis. Time-based values are important for many reasons:

Trend Analysis: Using time-based data, one can observe trends as they develop. Examples include the progression of stock prices over the years or shifts in global temperatures across decades.
Forecasting: Past time-based data can be used to predict future events or trends. This is seen in weather forecasting based on past patterns or in business for sales predictions.
Sequential Analysis: Time-based data is essential in fields like medicine or engineering where understanding the sequence of events can be crucial. For example, a doctor might track the progression of a disease based on symptoms over time.
Causality and Correlation: Observing events over time can help determine if one event caused another or if they occurred together.

Complications of Formatting:

Time Zones: Different regions worldwide use different time zones, which can complicate the harmonization of temporal data from diverse sources.
Daylight Saving Time: Some regions adjust their clocks for daylight saving time, which can introduce discrepancies in time-based data.
Date Formats: Different cultures or systems might represent dates in various formats (e.g., MM/DD/YYYY vs. DD/MM/YYYY).
Leap Years: Adding an extra day in February roughly every four years (century years such as 1900 are skipped unless divisible by 400) can introduce complications, especially in time series analysis.
Granularity: Time data can be represented in various granularities, from milliseconds to centuries, and converting between them can be challenging.
Historical Calendars: Historical data might use different calendar systems, like the Julian or Gregorian calendar, which can complicate comparisons with modern data.
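A few of these complications can be seen in a short Python sketch using only the standard library. The date string and the fixed UTC-4 offset are invented for illustration; real projects would usually rely on named time zones rather than a fixed offset.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

# Date formats: the same text means two different days depending on convention.
raw = "03/04/2023"
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # read as MM/DD/YYYY
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # read as DD/MM/YYYY

# Writing dates as YYYY-MM-DD (ISO 8601) removes the ambiguity.
print(as_month_first.isoformat(), "vs", as_day_first.isoformat())

# Time zones: one instant, two different clock readings.
utc_noon = datetime(2023, 9, 5, 12, 0, tzinfo=timezone.utc)
utc_minus_4 = timezone(timedelta(hours=-4))  # hypothetical fixed offset
local = utc_noon.astimezone(utc_minus_4)
print(utc_noon.isoformat(), "is the same moment as", local.isoformat())

# Leap years: February is not always 28 days long.
feb_days = (date(2024, 3, 1) - date(2024, 2, 1)).days
print("Days in February 2024:", feb_days)
print("Was 1900 a leap year?", calendar.isleap(1900))
```

A reasonable habit that follows from this is to store dates in one unambiguous format and one reference time zone, and convert only when displaying them.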

Discrete and continuous variables

Understanding measurement scales is essential in various fields, including social work and public health. Equally important is understanding the nature of the variables measured on these scales. In this context, variables are broadly classified into discrete and continuous categories. This section explains the differences between these two types of variables, with social work examples to make them easier to grasp.

Discrete Variables

A discrete variable is a quantitative variable that can only take specific values and not any values in between. These variables often represent counts or classifications and align with nominal and ratio measurement scales.

For instance, consider the number of individuals attending a support group session in a community center. This number is a discrete variable because it can only take whole numbers as values. You could have 3, 10, or 15 individuals, but never 4.3 or 7.5. This discrete variable aligns with the ratio scale of measurement as it has a definite zero point and uniform intervals.

Another example of a discrete variable in social work is the number of counseling sessions a client has received over a month. Similarly, in public health, the total number of confirmed cases of a specific disease in a community also represents a discrete variable.

Continuous Variables

In contrast to discrete variables, continuous variables are numeric variables with infinitely many possible values between any two values. A continuous variable can be divided into smaller and smaller units, and it can take any of these values. Continuous variables typically align with interval and ratio scales of measurement.

An example of a continuous variable in social work might be the age of clients receiving services. Age can be measured in years, months, days, hours, minutes, or even seconds, so it’s a continuous variable. A person could be 30 years, 11 months, 15 days, 3 hours, and 4 minutes old.

Body temperature is a typical example of a continuous variable in public health. Body temperature can be measured to a high level of precision, such as 36.7 degrees Celsius or 100.2 degrees Fahrenheit.

The Importance of Distinction

Understanding the difference between discrete and continuous variables is crucial for social work and public health professionals. Each variable type requires different statistical techniques for analysis, and the conclusions that can be drawn from them differ.

For example, calculating the mean or median of a discrete variable, such as the number of counseling sessions attended, is meaningful. Still, it might only provide part of the picture. Instead, it may be more helpful to use frequency distribution or mode to understand the most common number of sessions attended.

On the other hand, continuous variables like age or body temperature allow for a range of descriptive and inferential statistical analyses. Measures such as means, medians, ranges, and standard deviations can provide valuable insights.
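The contrast between the two kinds of summaries can be sketched in Python as follows. Both lists are made-up values for illustration: session counts for eight clients (discrete) and ages in years for five clients (continuous).

```python
from collections import Counter
from statistics import mean, mode, stdev

# Hypothetical discrete data: counseling sessions attended by eight clients.
sessions = [2, 4, 4, 1, 4, 3, 2, 4]

# Hypothetical continuous data: client ages in years.
ages = [23.5, 31.2, 45.8, 38.0, 29.4]

# Discrete: a frequency distribution and the mode show the typical count.
session_distribution = sorted(Counter(sessions).items())
most_common_sessions = mode(sessions)

# Continuous: the mean, range, and standard deviation describe center and spread.
mean_age = mean(ages)
age_range = max(ages) - min(ages)
sd_age = stdev(ages)

print("Sessions -> number of clients:", session_distribution)
print("Most common number of sessions:", most_common_sessions)
print(f"Mean age: {mean_age:.2f}, range: {age_range:.1f}, SD: {sd_age:.2f}")
```

The frequency table answers "how many clients attended exactly four sessions?", a question that only makes sense for discrete values. For ages, where few clients share exactly the same value, summaries of center and spread are more informative.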

Recognizing the differences between discrete and continuous variables can significantly influence the understanding and interpretation of data in social work and public health. By making the right distinctions, professionals can ensure they employ the most suitable statistical techniques and make accurate, meaningful decisions that positively impact individuals and communities.

Dimensions and measures

In data analysis, two fundamental concepts are dimensions and measures. These terms are often used interchangeably but denote different aspects of data representation. This section aims to equip social work students with a clear understanding of these concepts and their relevance to the field.

Understanding Dimensions

A dimension is a qualitative variable that helps categorize, filter, or segment data. Dimensions are often descriptive and non-numerical, providing a way to navigate data. They are closely linked to the nominal measurement scale, where numbers or symbols function as mere labels without any quantitative essence.

Consider a social work study examining the efficacy of different types of interventions for substance abuse. The ‘type of intervention’ (such as counseling, rehabilitation, group therapy, etc.) would be a dimension. It’s a categorical variable that allows us to segment the data based on the intervention type.

Similarly, in a public health context, ‘disease classification’ (like communicable, non-communicable) would serve as a dimension. It helps organize data based on disease categories.

Unpacking Measures

On the other hand, measures are quantitative variables that can be calculated, aggregated, or summarized. Typically, measures are numeric and often continuous, offering specific quantities or amounts. Measures correspond to the interval and ratio scales of measurement, where numbers carry quantitative meaning and the intervals between measurements are uniform.

In the aforementioned substance abuse study, suppose we’re also tracking the number of counseling sessions each client attends. This count is a measure because it provides a quantifiable amount that can be summarized or averaged.

In public health, the ‘rate of disease incidence’ in a community is a measure. It is a numeric variable that can be aggregated or compared across different communities or time frames.

Dimensions Vs. Measures: The Crucial Distinction

The distinction between dimensions and measures is vital for social work students, as it influences how data is visualized, analyzed, and interpreted.

Dimensions, being qualitative, are typically used to break down data. They offer a way to view data from different perspectives or levels. For instance, in a substance abuse intervention efficacy dataset, ‘type of intervention’ and ‘demographics of clients’ could offer various angles to analyze the data.

Measures, being quantitative, are generally used to perform calculations on the data. They provide the numeric facts that we want to analyze. For instance, ‘number of counseling sessions attended’ or ‘rate of improvement in clients’ could offer quantifiable insights into intervention efficacy.

In visual data representations like charts, dimensions often determine the chart’s structure (e.g., the categories along the axis of a bar chart), while measures provide the values that populate the chart (e.g., the height of each bar).
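As a rough illustration of how the two work together, the Python sketch below groups a handful of invented client records by a dimension (type of intervention) and then summarizes a measure (sessions attended) within each group. The record layout and values are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical client records: one dimension and one measure per client.
records = [
    {"intervention": "Counseling", "sessions": 6},
    {"intervention": "Rehabilitation", "sessions": 10},
    {"intervention": "Counseling", "sessions": 4},
    {"intervention": "Group therapy", "sessions": 8},
    {"intervention": "Rehabilitation", "sessions": 12},
]

# The dimension decides how the data is split into groups.
sessions_by_intervention = defaultdict(list)
for record in records:
    sessions_by_intervention[record["intervention"]].append(record["sessions"])

# The measure is what gets calculated within each group.
mean_sessions = {
    intervention: mean(values)
    for intervention, values in sessions_by_intervention.items()
}

for intervention, average in mean_sessions.items():
    print(f"{intervention}: {average} sessions on average")
```

In a bar chart of this result, each intervention type would become one bar (the dimension) and the average number of sessions would set the bar's height (the measure).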

Understanding the differences between dimensions and measures is pivotal for correctly interpreting data and making informed decisions in social work. By distinguishing between dimensions as categorical variables and measures as numeric variables, social work students can ensure they appropriately analyze data, draw accurate conclusions, and make data-driven decisions that positively impact their practice and the communities they serve.