WCAG 1.2.1: Audio-only and Video-only (Prerecorded)

WCAG 1.2.1 requires that prerecorded audio-only and video-only content have a text-based or media alternative so users who cannot hear or see the media can still access the information. This is a Level A requirement, meaning it is the minimum baseline for web accessibility compliance.

What This Rule Means

WCAG 1.2.1 addresses two distinct types of time-based media: audio-only content (such as a podcast episode, a recorded telephone announcement, or a music track that conveys meaningful information) and video-only content (such as a silent instructional animation or a voiceless product demonstration clip). The criterion requires that each of these media types is accompanied by an equivalent alternative that makes the same information available to people who cannot perceive the original format.

For prerecorded audio-only content, the required alternative is a text transcript. The transcript must capture all spoken words, identify speakers where relevant, and describe any meaningful non-speech audio (such as applause, alarms, or music that carries informational value). Simply providing a title or a brief description is not sufficient; the transcript must be a full textual equivalent of everything a listener would hear.

For prerecorded video-only content (video with no audio track, or with an audio track that contains no meaningful information), the required alternative is either a text alternative (a transcript-style description of what is shown) or an audio track that presents equivalent information. Such an audio track narrates the visual content, describing on-screen actions, scene changes, text that appears on screen, and other visual details, so that a blind or visually impaired user can understand the content through audio alone.

A pass requires that the alternative is clearly associated with the media, easy to find, and fully equivalent in informational content. The alternative may be provided inline on the page, as a linked document, or as a supplementary audio track, as long as it is readily accessible from the same page or player interface.

A fail occurs when: no alternative is provided at all; the alternative is incomplete or omits meaningful information; the alternative is present but so difficult to locate that it effectively requires the user to know it exists; or the alternative describes the media without reproducing its actual content (for example, writing "the presenter explains how to reset the device" instead of providing the actual step-by-step instructions).

WCAG 1.2.1 includes one official exception: if the audio-only or video-only content is itself serving as a media alternative for text that is already on the page, and it is clearly labeled as such, it does not require an additional alternative. For example, a short video that visually demonstrates exactly what a nearby written tutorial already describes in full may be exempt, provided the label makes the relationship clear to all users.
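
A minimal sketch of how the exception might look in markup, assuming a written tutorial that already covers every step and a silent clip that only repeats it. The file name reset-steps.mp4, the ids, the step text, and the label wording are all hypothetical. The label is visible text that is also tied to the video with aria-labelledby, so sighted users and screen reader users both get it.

```html
<!-- Hypothetical example: the written steps are the primary content -->
<h2 id='reset-heading'>How to reset the device</h2>
<ol>
  <li>Hold the power button for ten seconds until the light blinks.</li>
  <li>Release the button and wait for the light to turn solid green.</li>
</ol>

<!-- The silent clip repeats the steps above and is labeled, in visible
     text, as an alternative to that text -->
<p id='reset-video-label'>
  Video alternative to the written reset steps above (silent, adds no new information)
</p>
<video controls width='640' height='360' aria-labelledby='reset-video-label'>
  <source src='reset-steps.mp4' type='video/mp4'>
</video>
```

If the clip showed anything the written steps omit, the exception would no longer apply and the video would need its own alternative.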

It is also important to note that this criterion covers only prerecorded content. Live audio-only content is addressed separately by WCAG 1.2.9 (Audio-only (Live), a Level AAA criterion), and live video-only content is likewise out of scope here. Content that contains both audio and video (synchronized media) falls under WCAG 1.2.2 (Captions) and 1.2.3 (Audio Description or Media Alternative), not 1.2.1.

Why It Matters

Audio-only and video-only content creates barriers for multiple distinct user groups, and understanding each group's experience is essential for appreciating why this criterion exists at a foundational Level A.

Deaf and hard-of-hearing users cannot access the information in prerecorded audio-only content without a text transcript. For a person who has been profoundly deaf from birth, a podcast interview, a recorded customer-service explanation, or an audio-only FAQ response is simply inaccessible — as if the content did not exist. According to the World Health Organization, over 1.5 billion people worldwide experience some degree of hearing loss, with approximately 430 million requiring rehabilitation. In Turkey alone, surveys indicate millions of citizens live with significant hearing impairment, many of whom rely on Turkish Sign Language or written text as their primary mode of communication.

Blind and visually impaired users are the primary audience for video-only alternatives. A silent product assembly video, a data visualization animation, or a visual-only tutorial is meaningless when conveyed only through a screen reader announcing the presence of a video element. Without an audio description or text transcript, these users receive no information whatsoever from the content.

Users with cognitive and learning disabilities often benefit from having information available in multiple formats. A person with dyslexia may find it easier to listen to an audio description than to read a long visual sequence, while another user may prefer a step-by-step written transcript they can re-read at their own pace. Providing alternatives supports a wider range of processing styles.

Situational and environmental limitations also create a broad usability case that extends far beyond users with permanent disabilities. Someone in a quiet library or open-plan office cannot play audio content and benefits enormously from a transcript. A user on a slow mobile connection who cannot buffer a video can read the text alternative immediately. A user who is a non-native speaker of the language used in the audio may find it much easier to read a transcript than to follow spoken content at speed.

Consider a concrete real-world scenario: a Turkish bank's website publishes a prerecorded audio guide explaining how to activate a new debit card. A customer who is deaf receives this guide as part of their welcome email. Without a transcript, they have no way to complete the activation without calling a support line — a process that may itself present accessibility barriers. Providing a well-structured text transcript eliminates this dependency entirely and serves the customer equally.

From an SEO perspective, text transcripts are fully indexable by search engines. Audio and video content without transcripts represents a missed opportunity for organic search visibility. A transcript published alongside a podcast episode or an instructional video adds crawlable text to the page and can improve keyword relevance for search queries related to the media's subject matter.

WCAG 1.2.1 requires manual testing because automated tools cannot evaluate the content or completeness of a media alternative. An automated scanner can detect the presence of a <video> or <audio> element, but it cannot determine whether a linked transcript accurately represents everything in the audio track, or whether an audio description covers all meaningful visual events. Below are the considerations relevant to axe-core's approach to this criterion.

No dedicated automated axe-core rule exists for WCAG 1.2.1. Axe-core and the Deque axe DevTools engine flag this criterion as requiring manual review. This is a deliberate and correct design choice: the rule would generate an unacceptable rate of false positives or false negatives if automated. A scanner cannot "read" an audio file or "watch" a video to verify that a transcript is complete and accurate. As a result, any audit tool that claims to automatically pass or fail WCAG 1.2.1 without human review should be treated with skepticism.
What automated tools can flag as supporting signals: Some tools, including axe in best-practice mode, will flag <audio> and <video> elements that lack any associated text content in the immediate DOM context. This is a useful prompt for manual review, but a positive flag does not mean the transcript is adequate, and the absence of a flag does not mean the transcript is present — a linked transcript on another page would not be visible to the scanner at the element level.
Manual testing is required because: Evaluating this criterion demands a human reviewer who can consume the audio or video content in full, then compare it line by line against the provided alternative to confirm equivalence. The reviewer must also assess whether the alternative is easy to locate from the media element, which requires navigating the page as a user would — something no current automated tool can replicate with reliability.

How to Test

Run an automated scan as a starting point. Use axe DevTools, Lighthouse, or the Accsible audit panel to scan the page. Look for any flagged <audio> or <video> elements in the results. Note that a clean automated result does not confirm compliance with 1.2.1 — it only means no obvious structural issues were detected. Use the scan to build an inventory of all media elements on the page that need manual review.
Identify all prerecorded audio-only and video-only content. Manually review the page source and rendered output. Look for <audio> elements, <video> elements where the video track carries no meaningful audio, embedded media players (such as SoundCloud or Spotify widgets), and any <iframe> elements that load audio or video content from a third-party source.
For each audio-only element, locate the associated transcript. The transcript may be inline on the page, in a collapsible section, or linked via an anchor tag near the player. Navigate to the transcript and read it in full while simultaneously listening to the audio. Confirm that every spoken word is captured, all speakers are identified where relevant, and all meaningful non-speech audio events are described.
For each video-only element, locate the associated alternative. Determine whether a text transcript or an audio description track is provided. If an audio description track is used, activate it in the media player and watch the video while listening to the description. Confirm that all meaningful visual events — actions, scene changes, on-screen text, graphical information — are described in sufficient detail for a blind user to understand the content without seeing the video.
Test with a screen reader to verify discoverability. Using NVDA with Firefox, VoiceOver with Safari on macOS/iOS, or JAWS with Chrome, navigate to the media element using the keyboard alone (Tab, arrow keys). Without using a mouse, verify that you can locate the transcript or audio description link from the media player using only keyboard navigation and screen reader announcements. If the alternative cannot be reached without a mouse, it is effectively unavailable to the users who need it, so treat the result as a failure even if the content of the alternative is otherwise adequate (the page will typically also fail WCAG 2.1.1 Keyboard).
Check for the labeling exception. If a transcript or alternative is absent, verify whether the media element is explicitly labeled as a media alternative for adjacent text content on the same page. If so, confirm that the surrounding text is a complete equivalent of the media content and that the label is perceivable to all users.

How to Fix

Audio-only podcast or recorded narration — Incorrect

<!-- No transcript provided; the audio content is completely inaccessible
     to deaf and hard-of-hearing users -->
<audio controls src='welcome-guide.mp3'>
  Your browser does not support the audio element.
</audio>

Audio-only podcast or recorded narration — Correct

<!-- A full text transcript is provided immediately after the player,
     making it discoverable by keyboard and screen reader users
     without requiring any additional navigation -->
<figure>
  <figcaption>Welcome Guide Audio — Card Activation Instructions</figcaption>
  <audio controls src='welcome-guide.mp3'>
    Your browser does not support the audio element.
  </audio>
</figure>
<details>
  <summary>Read the full transcript of this audio guide</summary>
  <div>
    <p><strong>Narrator:</strong> Welcome to your new debit card activation guide.
    To begin, locate the 16-digit card number on the front of your card.</p>
    <p><strong>Narrator:</strong> Enter this number in the field provided on
    the activation screen, then press Confirm. [Confirmation chime sounds.]</p>
    <p><strong>Narrator:</strong> Your card is now active and ready for use.</p>
  </div>
</details>
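
The same pattern can apply to audio loaded through a third-party embedded player. The sketch below is hypothetical: the player URL, episode title, ids, and transcript wording are placeholders, not a real service. It assumes the embed arrives as an iframe, gives that iframe a descriptive title, and keeps the transcript on the same page with a link directly after the player.

```html
<!-- Hypothetical third-party embed; the src URL is a placeholder -->
<figure>
  <figcaption>Episode 12: Understanding Your Monthly Statement</figcaption>
  <iframe
    src='https://player.example.com/embed/episode-12'
    title='Audio player: Episode 12, Understanding Your Monthly Statement'
    width='100%'
    height='160'
  ></iframe>
</figure>

<!-- The link sits right after the player so keyboard users reach it next -->
<p>
  <a href='#episode-12-transcript'>Go to the transcript of Episode 12</a>
</p>

<section id='episode-12-transcript' aria-labelledby='episode-12-transcript-heading'>
  <h2 id='episode-12-transcript-heading'>Transcript: Episode 12</h2>
  <p>[Short intro jingle plays.]</p>
  <p><strong>Host:</strong> Today we walk through each section of your
  monthly statement, starting with the summary box at the top.</p>
</section>
```

Keeping the transcript in the host page, rather than relying on whatever the embedded player offers, means it stays available even if the third-party widget changes or fails to load.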

Silent instructional video (video-only) — Incorrect

<!-- Silent animation with no audio description or text transcript.
     A blind user navigating with a screen reader will only hear
     "video" announced — no information about the content is conveyed. -->
<video controls width='640' height='360'>
  <source src='assembly-instructions.mp4' type='video/mp4'>
</video>

Silent instructional video (video-only) with text transcript — Correct

<!-- A text transcript describing all meaningful visual actions is
     linked immediately below the video player. The link text clearly
     communicates the purpose of the destination. -->
<video controls width='640' height='360' aria-labelledby='video-title'>
  <source src='assembly-instructions.mp4' type='video/mp4'>
</video>
<p id='video-title'>Product Assembly: Attaching the Base Unit</p>
<p>
  <a href='assembly-transcript.html'>
    View the full text description of this assembly video
  </a>
</p>

Silent video with inline audio description track — Correct

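<!-- Timed text descriptions are offered as <track> elements with
     kind='descriptions'. These are WebVTT text files, not recorded
     audio, and browser and player support for voicing them is limited.
     The text transcript link is therefore retained as the dependable
     alternative, and it also serves deaf-blind users and those using
     text-only browsing. -->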
<video controls width='640' height='360'>
  <source src='product-demo-silent.mp4' type='video/mp4'>
  <track
    kind='descriptions'
    src='product-demo-descriptions.vtt'
    srclang='en'
    label='Audio Description (English)'
  >
  <track
    kind='descriptions'
    src='product-demo-descriptions-tr.vtt'
    srclang='tr'
    label='Sesli Betimleme (Türkçe)'
  >
</video>
<p>
  <a href='product-demo-transcript.html'>
    Read the full text description of this product demonstration
  </a>
</p>
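
Because a track with kind='descriptions' carries timed text rather than recorded audio, a team that wants a listenable alternative might instead publish a narrated audio file beside the silent video. This is a minimal sketch under that assumption. The file product-demo-narration.mp3, the id, and the caption wording are hypothetical.

```html
<!-- Silent video, titled so that assistive technology announces its subject -->
<video controls width='640' height='360' aria-labelledby='demo-title'>
  <source src='product-demo-silent.mp4' type='video/mp4'>
</video>
<p id='demo-title'>Product Demonstration (silent video)</p>

<!-- Hypothetical narrated alternative: a recorded audio file that
     describes everything the video shows, in order -->
<figure>
  <figcaption>Audio version: narrated description of the product demonstration</figcaption>
  <audio controls src='product-demo-narration.mp3'>
    Your browser does not support the audio element.
  </audio>
</figure>

<!-- The text description remains available for users who cannot hear
     the narration -->
<p>
  <a href='product-demo-transcript.html'>
    Read the full text description of this product demonstration
  </a>
</p>
```

The narration uses the standard audio element, so it plays wherever audio plays and does not depend on player support for description tracks.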

Common Mistakes

Providing a summary instead of a full transcript. Writing a brief paragraph like "This audio explains our refund policy" is not an equivalent alternative. The transcript must reproduce the actual content — every sentence, every instruction, every meaningful detail — so that a user who cannot hear the audio loses nothing by reading the transcript instead.
Omitting non-speech audio events from transcripts. If a recording includes a warning tone, a crowd cheering, a doorbell, or background music that signals a transition, these must be noted in the transcript using bracketed descriptions such as [alarm sounds] or [applause]. Omitting these leaves the transcript informationally incomplete.
Placing the transcript on a completely separate page without a visible, keyboard-accessible link. If a user has to know in advance that a transcript exists and navigate away from the media page to find it, discoverability has failed. The link to the alternative must be immediately adjacent to the media element and reachable by keyboard.
Assuming a <video> element with a silent track is covered by captions. Captions (WCAG 1.2.2) address spoken audio in synchronized media. A truly silent video — one with no meaningful audio at all — is video-only content and requires its own text description or audio description under 1.2.1. Captions of silence provide no information.
Using auto-generated transcripts from speech-to-text tools without review. Machine-generated transcripts from services like YouTube auto-captions or AI transcription APIs frequently contain errors in proper nouns, technical terms, and non-standard language. Publishing an unreviewed auto-transcript that contains significant errors does not satisfy the criterion, because an inaccurate transcript is not an equivalent alternative.
Failing to identify speakers in multi-person audio recordings. A transcript that reads as a single undifferentiated block of text, without indicating which speaker is talking, is confusing and may be ambiguous in meaning. Speaker labels should be used consistently throughout any recording that features more than one voice.
Treating the alt attribute on a poster image as a substitute for a video transcript. Alternative text on a thumbnail or poster image describes the static image, not the video content itself (the poster attribute of <video> accepts only an image URL and carries no alt text of its own). It does not fulfill the requirement for a media alternative under 1.2.1.
Providing an audio description that only describes the setting and ignores on-screen text. If a silent video displays important text — step numbers, labels, measurements, error messages — the audio description or transcript must read out that text explicitly. Describing the visual scene without transcribing the on-screen text leaves critical information inaccessible.
Marking content as exempt without confirming the full equivalence condition is met. The exception for media alternatives to text applies only when the text on the page is a complete equivalent of the media. If the page text covers only part of what the video demonstrates, the exception does not apply and an alternative is still required for the portions not covered by the text.
Neglecting to provide Turkish-language alternatives for Turkish-language media. When audio-only or video-only content is in Turkish, the alternative should also be in Turkish (or at least in the primary language of the target audience). Providing only an English transcript for Turkish audio content does not constitute an equivalent alternative for Turkish-speaking users.
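
Several of the mistakes above (missing speaker labels, omitted non-speech sounds, summaries in place of content) can be avoided with a consistent transcript structure. The sketch below is illustrative only: the speakers, dialogue, and ids are invented, and the lang value assumes English-language audio. For Turkish audio the transcript text would be in Turkish with lang='tr'.

```html
<!-- Hypothetical multi-speaker transcript; all dialogue is invented -->
<section lang='en' aria-labelledby='refund-transcript-heading'>
  <h2 id='refund-transcript-heading'>Transcript: Refund Policy Q&amp;A</h2>

  <!-- Non-speech audio that carries meaning is described in brackets -->
  <p>[Intro music plays for five seconds, then fades out.]</p>

  <!-- Every change of speaker gets a label, used consistently -->
  <p><strong>Host:</strong> How long do customers have to request a refund?</p>
  <p><strong>Support lead:</strong> Thirty days from the delivery date shown
  on the order page.</p>

  <p>[Phone notification tone.]</p>

  <p><strong>Host:</strong> And where do they start the request?</p>
  <p><strong>Support lead:</strong> From the order page, select the item and
  choose Request a refund.</p>
</section>
```

The transcript reproduces what is said rather than summarizing it, so a reader gets the same instructions a listener would.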

Relation to Turkey's Accessibility Regulations

Turkey's Presidential Circular 2025/10, published in the Official Gazette numbered 32933 on June 21, 2025, establishes a mandatory legal framework for digital accessibility aligned with WCAG 2.2. WCAG 1.2.1 is a Level A criterion, placing it in the most essential tier of requirements under this circular. Level A conformance represents the absolute minimum acceptable standard — failures at this level are considered fundamental barriers that entirely prevent access for affected users.

The circular applies broadly across both public and private sectors. Public institutions — including all ministries, government agencies, municipalities, and state-owned enterprises — are required to achieve full Level A conformance within one year of the circular's publication date. Private sector entities covered by the circular are granted a two-year transition period.

The private sector entities explicitly covered by Presidential Circular 2025/10 include: e-commerce platforms operating in Turkey regardless of registration location; banks and financial institutions regulated under Turkish banking law; hospitals and private healthcare providers; telecommunications companies with 200,000 or more subscribers; travel agencies operating under Turkish tourism licensing requirements; private passenger transport companies; and private educational institutions authorized by the Ministry of National Education (MoNE).

For these entities, WCAG 1.2.1 carries direct and practical implications. A bank that publishes audio-only guides for its mobile banking features without transcripts, a hospital that provides silent video-only tutorials for patient intake procedures, or a telecom provider that uses audio-only recorded announcements on its support portal without text alternatives would each be in direct violation of this requirement from the moment their respective compliance deadline passes.

Non-compliance with the circular can result in administrative sanctions and reputational consequences, as well as exposure to complaints filed through Turkey's Information Technologies and Communication Authority (BTK) and the Presidency's Digital Transformation Office. Given that 1.2.1 is among the most straightforwardly remediated criteria — requiring the creation of a text transcript or audio description rather than any complex technical change — organizations should prioritize an audit of all audio-only and video-only assets on their digital properties as an early and high-impact step in their accessibility compliance programs.

Content teams, not just developers, play a central role in achieving compliance with 1.2.1. Transcripts must be authored, reviewed for accuracy, and maintained as media content is updated. Organizations should establish editorial workflows that treat transcript creation as a mandatory step in the content production and publication process, equivalent in importance to SEO metadata or content review — and should ensure those workflows account for Turkish-language media alongside any other languages used on the platform.

Sources & references

WCAG 1.1.1: Non-text Content (Level A)
WCAG 1.2.2: Captions (Prerecorded) (Level A)
WCAG 1.2.3: Audio Description or Media Alternative (Prerecorded) (Level A)