Smarter Balanced Assessment Consortium
The Smarter Balanced Assessment Consortium, a member-led public organization headquartered in California, offers assessment tools to educators in K–12 and higher education. Founded in 2010, the organization works with state education agencies to develop innovative, standards-aligned assessment systems. To help educators identify learning opportunities and improve student learning, Smarter Balanced provides lessons, tools, and resources, including formative, interim, and summative assessments.
In the constantly evolving field of education, Smarter Balanced is committed to progress and innovation. Its objective, in partnership with IBM Consulting, is to explore a systematic methodology for applying artificial intelligence (AI) to educational assessments. As of early 2024, the partnership is ongoing.
Defining the challenge
Standardized exams and structured quizzes, the most common K–12 skill evaluations, are criticized on a number of equity-related grounds. Used responsibly, AI has transformative potential to improve assessment fairness across student populations, including marginalized groups, by providing individualized learning and assessment experiences. The central challenge, then, is defining what responsible AI adoption and governance look like in a school setting.
Smarter Balanced and IBM Consulting first established a multidisciplinary advisory panel of educators, AI practitioners, experts in AI ethics and policy, and specialists in educational measurement. The panel's objective is to create guiding principles for building fairness and accuracy into the application of AI to learning materials and educational measurement. Some of the panel's considerations are summarized below.
Designing for human needs
Design thinking frameworks help organizations take a human-centered approach to implementing technology. Design thinking is driven by three human-centered principles: a focus on user outcomes, restless reinvention, and diverse, empowered teams. This approach strengthens stakeholders' strategic alignment and responsiveness to both functional and nonfunctional organizational governance requirements. By applying design thinking, developers and other stakeholders can generate creative solutions, prototype iteratively, and build a thorough understanding of user needs.
This methodology is critical for identifying and assessing risks early in the development process, and for enabling the development of reliable and effective AI models. By consistently engaging diverse communities of domain experts and other stakeholders and incorporating their input, design thinking helps produce AI solutions that are mathematically sound, socially conscious, and human-centered.
Incorporating diversity
The combined teams assembled a diverse group of subject-matter experts and thought leaders to form a think tank for the Smarter Balanced initiative. The group included experts in law and educational measurement, as well as students, neurodivergent individuals, and people with accessibility needs.
The think tank aims to integrate its members' experiences, opinions, and expertise into the governance framework iteratively rather than as a one-time exercise. The strategy reflects a fundamental tenet of AI ethics at IBM: artificial intelligence should augment human intelligence, not replace it. Continuous feedback, assessment, and examination by a range of stakeholders builds trust and supports fair outcomes, ultimately producing a more inclusive and productive educational environment.
In grade school settings, these approaches are essential for developing equitable and successful educational assessments. Building AI models that reflect all students requires the many perspectives, experiences, and cultural insights that diverse teams bring to the table. This inclusivity makes AI systems less likely to unintentionally reinforce existing disparities or to overlook the particular needs of different demographic groups. It also highlights another important tenet of AI ethics at IBM: diversity in AI matters, and it is a matter of math, not opinion.
Examining student-centered values
One of the first joint projects between IBM Consulting and Smarter Balanced was determining which human values should be represented in AI models. Because this is not a novel ethical question, the teams arrived at a set of principles and criteria that correspond to IBM's AI pillars, the essential properties of trustworthy AI:
- Explainability: The ability to explain results and behavior in terms that don't require technical expertise
- Fairness: Treating individuals equitably
- Robustness: Security, reliability, and resilience against adversarial attacks
- Transparency: Disclosing how AI is used, how it functions, and what data it uses
- Privacy: Disclosing and safeguarding users' rights to privacy and to their data
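The pillars lend themselves to a simple governance checklist that a review team could track per model. The sketch below is purely illustrative; the class and method names are hypothetical and not part of any IBM tooling.

```python
from dataclasses import dataclass, field

# IBM's five pillars of trustworthy AI, as listed above
PILLARS = ["explainability", "fairness", "robustness", "transparency", "privacy"]

@dataclass
class PillarReview:
    pillar: str
    satisfied: bool = False
    evidence: str = ""  # for example, a link to an audit report

@dataclass
class GovernanceChecklist:
    model_name: str
    reviews: dict = field(
        default_factory=lambda: {p: PillarReview(p) for p in PILLARS}
    )

    def sign_off(self, pillar: str, evidence: str) -> None:
        """Mark a pillar as reviewed, with supporting evidence."""
        review = self.reviews[pillar]
        review.satisfied = True
        review.evidence = evidence

    def outstanding(self) -> list:
        """Pillars that still lack a sign-off."""
        return [p for p, r in self.reviews.items() if not r.satisfied]

checklist = GovernanceChecklist("assessment-scoring-model")
checklist.sign_off("fairness", "disparate-impact audit completed")
print(checklist.outstanding())
```

A checklist like this makes the gap between stated principles and reviewed evidence explicit for each model under governance.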
Putting these principles into practice is difficult in any organization, and the bar is even higher for an organization that assesses students' skills. Nonetheless, the potential benefits of AI make the work worthwhile. The second phase, now in progress, involves exploring and defining the values that will guide the application of AI to the assessment of young learners.
The teams are debating the following questions:
- What ethically grounded guidelines are required to develop these capabilities responsibly?
- Who should be responsible for operationalizing and governing them?
- What guidance should practitioners who build and deliver these models follow?
- What are the essential functional and nonfunctional requirements, and how stringent must they be?
Exploring disparate impact and layers of effect
For this exercise, IBM used the Layers of Effect design thinking framework, one of several frameworks that IBM Design for AI has contributed to the open source community Design Ethically. The Layers of Effect framework asks stakeholders to consider the primary, secondary, and tertiary effects of their products or experiences.
- Primary effects are the intended and known impacts of the product (in this case, an AI model). For example, a primary function of a social media platform might be to connect people with shared interests.
- Secondary effects are less intentional but can quickly become important to stakeholders. Continuing the social media example, the platform's value to advertisers might be a secondary effect.
- Tertiary effects are unintended or unexpected consequences that emerge over time. An example is a social media platform's tendency to give more views to posts that are offensive or misleading.
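As a minimal sketch, a working session with this framework amounts to recording candidate effects under each of the three layers. The structure below is illustrative only; the names are not taken from the framework's published materials.

```python
from dataclasses import dataclass, field
from enum import Enum

class Layer(Enum):
    PRIMARY = "intended and known"
    SECONDARY = "less intentional, but significant to stakeholders"
    TERTIARY = "unintended, emerging over time"

@dataclass
class LayersOfEffect:
    product: str
    # One list of recorded effects per layer
    effects: dict = field(
        default_factory=lambda: {layer: [] for layer in Layer}
    )

    def record(self, layer: Layer, effect: str) -> None:
        self.effects[layer].append(effect)

# Walk through the social media example from the text
review = LayersOfEffect("social media platform")
review.record(Layer.PRIMARY, "connect people with shared interests")
review.record(Layer.SECONDARY, "value to advertisers")
review.record(Layer.TERTIARY, "amplifies offensive or misleading posts")
```

Keeping the tertiary list explicitly visible, even when it starts empty, prompts stakeholders to keep asking what unintended harms might surface later.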
For this use case, the primary (desired) effect of the AI-enhanced assessment system is a more effective, representative, and equitable tool that improves learning outcomes across the educational system.
Possible secondary benefits include increased efficiency and the collection of relevant data that supports allocating resources where they are needed most.
Tertiary effects are unintentional and possibly unrecognized. At this stage, stakeholders need to explore what unintended harm might look like.
The teams identified five potential types of serious harm:
- Harmful bias that fails to consider or support students from marginalized groups, who might require additional resources and perspectives to meet their varied needs.
- Personally identifiable information (PII) and cybersecurity issues in educational systems that lack sufficient protocols for their networks and devices.
- Insufficient governance and regulation to ensure that AI models maintain their intended behaviors.
- Inadequate communication with parents, students, teachers, and administrative staff about the planned use of AI systems in schools. These communications should explain agency, such as how to opt out, and safeguards against misuse.
- Limited off-campus connectivity, particularly in rural areas, that could restrict access to technology and, in turn, to AI.
Disparate impact assessments, first used in legal cases, help organizations identify potential biases. These assessments examine how seemingly neutral policies and practices can disproportionately affect people from protected classes, those vulnerable to discrimination on the basis of gender, ethnicity, religion, or other characteristics. Such assessments have proven useful in shaping employment, lending, and healthcare policies. For this education use case, IBM sought to account for cohorts of students who might, because of their circumstances, receive inequitable results from assessments.
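The text doesn't describe how the teams quantified disparate impact, but a common starting point, borrowed from US employment-selection guidelines, is the "four-fifths rule": a practice is flagged when a protected group's rate of favorable outcomes falls below 80% of the reference group's rate. A minimal sketch, with purely illustrative numbers:

```python
def disparate_impact_ratio(protected_rate: float, reference_rate: float) -> float:
    """Ratio of favorable-outcome rates: protected group vs. reference group.

    Under the traditional four-fifths rule, a ratio below 0.8 flags
    potential disparate impact and warrants closer review.
    """
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return protected_rate / reference_rate

# Illustrative numbers only (not from the Smarter Balanced project):
# suppose 60% of English-language learners vs. 80% of native speakers
# reach a proficiency cutoff on an assessment.
ratio = disparate_impact_ratio(0.60, 0.80)
print(f"ratio = {ratio:.2f}; flagged = {ratio < 0.8}")  # ratio = 0.75; flagged = True
```

A ratio below the threshold doesn't prove bias in the assessment itself; it signals that the flagged cohort's results deserve a closer qualitative review of the kind the advisory panel describes.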
The following categories were found to be most vulnerable to potential harm:
- People who experience mental health issues
- People from a wide range of socioeconomic backgrounds, including those experiencing homelessness
- People whose first language is not English
- People affected by other, non-linguistic cultural factors
- People with accessibility needs or who are neurodivergent
IBM group’s next series of exercises is to investigate ways to lessen these harms by utilizing additional design thinking frameworks, like ethical hacking. IBM will also go over the minimal specifications that companies looking to integrate AI into student assessments must meet.