Social Sci LibreTexts

8.3: Selecting a Standardized Assessment Tool


    Choosing an appropriate assessment tool (authentic or standardized) requires careful evaluation of its properties and purpose. Screening assessment tools are used for either developmental or universal screening. When selecting screening tools for use in early childhood education classrooms, it is essential to consider their specificity and sensitivity. Additionally, when selecting any assessment tool, it is important to consider its reliability and validity (National Research Council 2008).

    Screening – Specificity and Sensitivity

    Screening tools in public health are used to quickly and systematically identify individuals who may be at risk for certain diseases, conditions, or developmental concerns, even before symptoms appear. These tools—such as questionnaires, physical measurements, laboratory tests, or brief developmental checklists—are typically administered to large populations by medical professionals to detect early warning signs and guide further evaluation or intervention. The primary goal is prevention: by identifying potential health issues early, public health professionals can connect individuals to timely diagnostic assessments, treatment, or support services, ultimately reducing the burden of disease, improving outcomes, and promoting community well-being. Screening tools are not diagnostic on their own but serve as a first step in a larger process of health monitoring and care coordination. For example, the Denver Developmental Screening Test – II (Frankenburg et al. 1992) is commonly used to assess developmental progress in young children, while the Patient Health Questionnaire-9 is often used to screen for depression (Kroenke et al. 2002).

    All effective screening tools must balance specificity (the ability to correctly identify those without a condition) and sensitivity (the ability to correctly identify those with a condition) (Squires and Bricker 2009). A screening tool with high sensitivity minimizes false negatives, ensuring that individuals needing intervention are identified, while a tool with high specificity reduces false positives, avoiding unnecessary follow-ups (Glascoe 2013). The video “Sensitivity and Specificity Simplified” by Let’s Learn Public Health explains why these two measures matter when choosing a screening tool for public health purposes.
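
    The two measures can be written as simple proportions of screening outcomes. The symbols below and the numbers in the worked example are illustrative assumptions, not figures from any of the cited sources. Here \(TP\) is the number of children with the condition whom the tool flags, \(FN\) the number with the condition whom it misses, \(TN\) the number without the condition whom it correctly clears, and \(FP\) the number without the condition whom it flags anyway.

\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}
\]

    Suppose, hypothetically, that 200 children are screened and a later full evaluation confirms a delay in 20 of them. If the tool flagged 16 of those 20 and cleared 162 of the remaining 180, then

\[
\text{sensitivity} = \frac{16}{16 + 4} = 0.80, \qquad \text{specificity} = \frac{162}{162 + 18} = 0.90
\]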

    Similarly, in early childhood education, screening tools must balance specificity and sensitivity. For example, suppose a program plans to use a developmental screening tool to help identify children who may be at risk for developmental delays or disabilities. If the tool is highly sensitive but not specific enough, it may flag many children as needing further evaluation even though they are developing typically, leading to unnecessary worry for families and strain on early intervention resources. On the other hand, if the tool is highly specific but not sensitive enough, it may miss children with subtle or emerging developmental issues, delaying access to needed support services. A well-balanced tool aims to detect true developmental concerns (high sensitivity) while minimizing the number of typically developing children incorrectly identified as at risk (high specificity), ensuring both efficient use of resources and timely support for children who need it.
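
    The trade-off described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores, the outcomes, the cutoff values, and the function name `sensitivity_specificity` are all invented for the example, and the sketch assumes a tool on which lower scores indicate more concern.

```python
# Hypothetical screening scores (lower = more concern), each paired with
# whether a later full evaluation confirmed a delay. All values are invented.
children = [
    (12, True), (15, True), (18, True), (22, True), (27, True),
    (20, False), (25, False), (28, False), (30, False), (33, False),
    (35, False), (38, False), (40, False), (42, False), (45, False),
]


def sensitivity_specificity(records, cutoff):
    """Flag every child scoring at or below `cutoff`, then compare with outcomes."""
    tp = sum(1 for score, delay in records if delay and score <= cutoff)
    fn = sum(1 for score, delay in records if delay and score > cutoff)
    tn = sum(1 for score, delay in records if not delay and score > cutoff)
    fp = sum(1 for score, delay in records if not delay and score <= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# A stricter (lower) cutoff flags fewer children; a looser one flags more.
for cutoff in (15, 22, 30):
    sens, spec = sensitivity_specificity(children, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

    With these invented data, raising the cutoff catches more of the children with confirmed delays but also flags more typically developing children, which is the balance a program has to weigh when choosing a tool or a referral threshold.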

    Validity – Ensuring Accuracy

    When choosing an assessment tool to better understand a child’s social-emotional skills, you would expect its items to address emotion identification, emotional regulation, social skills, and similar abilities, not counting, alphabet recognition, or other unrelated skills and knowledge. The term validity in assessment refers to the extent to which an assessment measures what it is intended to measure. For example, if an assessment is designed to evaluate a preschooler’s early literacy skills, it should genuinely capture those specific abilities (such as letter recognition or phonological awareness) rather than unrelated skills like memory or fine motor coordination. Validity ensures that the conclusions drawn from the assessment results are appropriate and meaningful for the child’s development and educational planning. In early childhood settings, this includes considering cultural and linguistic relevance, developmental appropriateness, and the context in which the assessment is administered. Valid assessments help educators make informed decisions that support each child’s growth and learning.

    There are multiple types of validity:

    • Content validity ensures that the assessment covers the full range of developmental skills it is intended to measure. For example, a preschool readiness assessment should include items that assess a variety of skills, such as language, motor, social, and cognitive development, to provide a comprehensive overview.
    • Construct validity evaluates whether the assessment accurately measures the theoretical construct it is designed to assess. For example, an assessment designed to measure emotional regulation in young children should include items that assess behaviors related to managing emotions, such as self-soothing and frustration tolerance.
    • Criterion-related validity measures how well the assessment correlates with real-world outcomes or existing validated measures, and is often described as either predictive validity or concurrent validity. Predictive validity assesses how well the test predicts future performance, for instance, the extent to which a kindergarten readiness assessment predicts later academic success in first grade. Concurrent validity compares the assessment with an established measure to see if they produce similar results. For example, a new literacy screening tool might be compared with an existing, validated literacy assessment.
    • Face validity refers to whether the assessment appears to measure what it is supposed to measure based on its surface appearance. One way to gauge face validity is to ask whether parents and teachers feel that an assessment of a child’s social skills seems relevant and appropriate for evaluating peer interactions; if they do, the assessment has high face validity.
    • Ecological validity refers to how well the assessment reflects a child’s performance in a natural or everyday setting. For example, instead of administering a standardized language test in a quiet, unfamiliar room, a teacher might listen to the child interacting with peers (asking questions, naming objects, and telling stories while pretending to cook in a play kitchen). This assessment method would have high ecological validity because it captures the child’s natural language use in a familiar, meaningful context, reflecting how they actually communicate in daily life.
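
    Of the types listed above, criterion-related validity is the one most directly expressed as a number, typically a correlation between scores on the tool and scores on the criterion. The sketch below computes a Pearson correlation for a concurrent validity check. The function name `pearson_r` and both sets of scores are invented for illustration; a real validation study would use far more children and would report how the sample was drawn.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented scores for the same eight children on a new literacy screener
# and on an established, already validated literacy assessment.
new_tool = [10, 12, 15, 18, 20, 23, 25, 29]
established = [40, 44, 50, 53, 61, 64, 70, 78]

r = pearson_r(new_tool, established)
print(f"concurrent validity coefficient r = {r:.2f}")
```

    The same calculation would serve for predictive validity if the second list held a later outcome, such as first-grade results, instead of a measure taken at the same time.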

    Reliability – Ensuring Consistency

    Another important characteristic of an assessment is its reliability. Reliability in assessment refers to the consistency and stability of a tool in measuring a child’s skills or development over time, across different observers, and in various settings. A reliable assessment produces similar results under consistent conditions—for example, if two teachers observe the same child using the same observation checklist, they should record similar outcomes (inter-rater reliability). Likewise, if the same child is assessed at two points close in time with no significant changes in development, the results should remain consistent (test-retest reliability). In early childhood education, high reliability is essential to ensure that assessment results are trustworthy and can be used to make informed decisions about instruction, intervention, and developmental support.
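
    Inter-rater reliability is often summarized either as simple percent agreement or as Cohen's kappa, which corrects for the agreement two raters would reach by chance. The following is a minimal sketch, not a full analysis: the rating categories, the two teachers' ratings, and the function names are invented for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of children on whom both raters recorded the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.

    Undefined (division by zero) if both raters use one identical category
    for every child.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's category proportions, summed.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented checklist ratings of the same ten children by two teachers.
teacher_1 = ["met", "met", "emerging", "met", "not yet",
             "emerging", "met", "met", "emerging", "not yet"]
teacher_2 = ["met", "met", "emerging", "emerging", "not yet",
             "emerging", "met", "met", "met", "not yet"]

agreement = percent_agreement(teacher_1, teacher_2)
kappa = cohens_kappa(teacher_1, teacher_2)
print(f"percent agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```

    Kappa comes out lower than raw agreement here because some of the matches would be expected even if the two teachers rated at random with the same category frequencies.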

    Types of reliability include:

    • Test-retest reliability measures the stability of assessment results over time by administering the same assessment to the same group of children at two different points in time. For example, a developmental screening tool given to preschoolers in the fall and again in the spring may be considered reliable if the two sets of scores are similar, provided the children’s developmental progress over that period is consistent. Because young children change quickly, the interval between administrations is usually kept short so that differences in scores reflect the tool rather than real development.
    • Inter-rater reliability assesses the consistency of scores given by different assessors evaluating the same child’s performance or behavior. In this case, an assessment would be considered reliable if two teachers observing a child’s play during a structured activity provided similar ratings of the child’s social skills when using the same rubric.
    • Parallel-forms reliability describes the consistency of results when two different but equivalent forms of a developmental assessment are administered. For example, a literacy assessment might be considered reliable if a preschool-aged child completed one form of it and then an alternate form, and the two forms yielded similar results.
    • Internal consistency reliability measures how well items within a single assessment measure the same developmental skill or domain. For example, an assessment of young children’s fine motor skills with high internal consistency would have a variety of items that consistently reflect aspects of fine motor development, such as grasping, manipulation, and coordination.
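
    Internal consistency is commonly reported as Cronbach's alpha, which compares the variance of individual items with the variance of children's total scores. The sketch below is one straightforward way to compute it. The function name `cronbach_alpha`, the four-item fine motor checklist, and the scores are all assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of item scores per child."""
    k = len(item_scores[0])                       # number of items
    item_columns = list(zip(*item_scores))        # regroup scores by item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Invented scores (0 to 3) for six children on four fine motor items,
# for example grasping, stacking, cutting, and drawing.
scores = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 0, 1],
    [3, 2, 3, 3],
    [0, 1, 0, 0],
    [2, 3, 2, 2],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

    When children who score high on one item also tend to score high on the others, as in these invented data, the total scores vary much more than any single item does and alpha approaches 1.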

    For a more concise summary of the differences between validity and reliability, watch Professor Rachelle Tannenbaum’s video.


    This page titled 8.3: Selecting a Standardized Assessment Tool is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
