How to Read a Spectrogram: A Musician's Visual Guide

A spectrogram is a visual representation of sound — one that shows time, frequency, and amplitude simultaneously on a single display. Where a standard audio waveform shows you only how loud a sound is over time, a spectrogram shows you which frequencies are present, how loud each one is, and how they change as the sound evolves.

For musicians and singers, spectrograms reveal things that are hard to pin down by ear alone: the harmonic structure of your voice, the presence of breathiness or vocal fry, resonance patterns, how your overtones shift between vowel sounds, and whether your vibrato is centred or biased. The spectrogram viewer on this site generates a live spectrogram from your microphone in real time, and this guide explains what you’re actually looking at when you open it.

The Three Axes — Time, Frequency, and Amplitude

Every spectrogram, regardless of the software or display, encodes exactly three dimensions of information. Once you understand these, everything else follows from pattern recognition.

X-axis (horizontal) — Time

Read left to right, exactly like text. The left edge is the beginning of the sound; the right edge is where it is now (in a live spectrogram) or the end of the file (in a recorded analysis). Each vertical slice of the spectrogram represents a single moment in time — a snapshot of the frequency content right then.

Y-axis (vertical) — Frequency (Pitch)

Low frequencies sit at the bottom of the display; high frequencies at the top. The scale is often logarithmic rather than linear, although some tools default to a linear axis. On a logarithmic axis octaves are equally spaced, which means the space between 100 Hz and 200 Hz (one octave) is the same visual height as the space between 400 Hz and 800 Hz (also one octave). This matches how we perceive pitch: each octave feels like an equal step regardless of the actual Hz difference.
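As a rough illustration of why octaves line up evenly on a logarithmic axis, here is a minimal Python sketch. The function names and the 50 Hz to 16 kHz display range are assumptions chosen for the example, not settings taken from any particular spectrogram tool.

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Distance between two frequencies measured in octaves."""
    return math.log2(f_high_hz / f_low_hz)

def log_axis_position(freq_hz, f_min_hz=50.0, f_max_hz=16000.0):
    """Fractional height (0 = bottom, 1 = top) of a frequency on a log axis.

    The 50 Hz and 16 kHz limits are illustrative display bounds only.
    """
    return math.log2(freq_hz / f_min_hz) / math.log2(f_max_hz / f_min_hz)

# Both pairs are exactly one octave apart ...
print(octaves_between(100, 200), octaves_between(400, 800))

# ... so they take up the same share of the display height.
low_step = log_axis_position(200) - log_axis_position(100)
high_step = log_axis_position(800) - log_axis_position(400)
print(round(low_step, 4), round(high_step, 4))
```

On a linear axis the second pair would take four times the height of the first, which is why low notes look cramped in linear displays.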

For singers and most acoustic musicians, the frequency range of interest is roughly:

80–300 Hz: fundamental frequencies of bass voices, low instruments, chest resonance
300 Hz–1 kHz: fundamental frequencies of tenor, alto, and soprano voices; first and second harmonics
1 kHz–4 kHz: upper harmonics, the “singer’s formant” region, consonant clarity
4 kHz–8 kHz: sibilance (s, sh sounds), brilliance, high harmonics
Above 8 kHz: air, sparkle, extreme high frequencies

Colour/Brightness — Amplitude (Loudness)

The intensity of the colour or brightness at each point encodes how loud that specific frequency is at that specific moment. Common colour schemes:

Dark background, bright colours — black = silence; yellow/white = loud; dark red/blue = quiet but present. This is the most common scheme in music production tools.
Grayscale — white = loud; black = silence. Used in acoustic research and speech analysis.

The key insight: a bright horizontal band means a sustained frequency at consistent amplitude. A faint horizontal band means a frequency that’s present but quiet. No colour at all means that frequency is absent or below the noise floor.
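To make the colour mapping concrete, the sketch below converts a linear amplitude to decibels and then to a brightness value between 0 and 1. The -80 dB floor and the straight-line mapping are assumptions for illustration; real tools choose their own ranges and colour palettes.

```python
import math

def amplitude_to_db(amplitude, floor_db=-80.0):
    """Convert a linear amplitude (1.0 = full scale) to dB, clamped at floor_db."""
    if amplitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(amplitude))

def db_to_brightness(level_db, floor_db=-80.0):
    """Map floor_db..0 dB onto 0.0 (black) .. 1.0 (brightest)."""
    clamped = min(0.0, max(floor_db, level_db))
    return (clamped - floor_db) / -floor_db

for amplitude in (1.0, 0.1, 0.001, 0.0):
    level = amplitude_to_db(amplitude)
    print(f"amplitude {amplitude:<6} -> {level:6.1f} dB -> brightness {db_to_brightness(level):.2f}")
```

Working in decibels is what lets a quiet harmonic stay visible next to a loud one: a tenfold drop in amplitude costs only 20 dB of the display range.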

The Harmonic Series — The Ladder Pattern

The single most important pattern for musicians to recognise in a spectrogram is the harmonic series — the stack of parallel horizontal lines that appear when any pitched instrument or voice produces a sustained note.

When you sing a note at, say, A3 (220 Hz), your voice doesn’t produce only 220 Hz. It produces 220 Hz simultaneously with 440 Hz, 660 Hz, 880 Hz, 1100 Hz, and so on — the fundamental frequency plus all its integer multiples. These are called harmonics or overtones.

On a spectrogram, this appears as a ladder of horizontal lines:

Lowest line = fundamental frequency (perceived pitch)
Second line = double the fundamental (one octave up)
Third line = triple the fundamental (an octave and a fifth up)
Fourth line = four times the fundamental (two octaves up)
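And so on, each line one fundamental (here 220 Hz) above the last and typically dimmer than the one below

The spacing between harmonics reveals the fundamental: the lines are equally spaced in Hz, and that spacing equals the fundamental frequency. On a logarithmic frequency scale the lines therefore crowd closer together towards the top of the display. The lowest line in the stack is the fundamental frequency of the note, and it is often, though not always, the brightest. Everything above it is harmonic content.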
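The ladder for the A3 example can be worked out in a few lines. This is a small sketch with hypothetical helper names; it simply multiplies the fundamental by whole numbers and reports each harmonic's musical distance from the fundamental.

```python
import math

def harmonic_series(fundamental_hz, count=6):
    """Frequencies of the first `count` harmonics (integer multiples)."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def semitones_above(fundamental_hz, freq_hz):
    """Interval between the fundamental and freq_hz in equal-tempered semitones."""
    return 12 * math.log2(freq_hz / fundamental_hz)

for n, freq in enumerate(harmonic_series(220.0), start=1):
    interval = semitones_above(220.0, freq)
    print(f"harmonic {n}: {freq:7.1f} Hz, {interval:5.2f} semitones above the fundamental")
```

The third harmonic lands about 19 semitones up (an octave plus a fifth), and the gap in Hz between neighbouring lines is always 220, the fundamental itself.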

Why this matters for singers: The relative brightness of different harmonics — which harmonics are loudest — determines the timbre (tonal quality) of your voice. A voice with strong harmonics in the 2–4 kHz range sounds bright and projecting; one with most energy concentrated near the fundamental sounds darker and warmer. You can see this directly on the spectrogram.

To check what note corresponds to any specific Hz value on the spectrogram, the frequency to note converter maps any Hz value to its exact note name instantly.
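If you would like to do the same conversion by hand, the sketch below shows one common way to map a frequency to the nearest note. It assumes twelve-tone equal temperament with A4 = 440 Hz and sharps-only note names; it is not the code behind the converter mentioned above.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_note(freq_hz, a4_hz=440.0):
    """Return (note name, cents offset) for the nearest equal-tempered note."""
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)  # MIDI note 69 is A4
    midi = round(midi_float)
    cents = round((midi_float - midi) * 100)
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI note 60 is C4
    return f"{name}{octave}", cents

for freq in (220.0, 440.0, 660.0):
    print(freq, hz_to_note(freq))
```

Here 660 Hz, the third harmonic of A3, comes out as E5 about 2 cents sharp, because a pure harmonic fifth is slightly wider than an equal-tempered one.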

What Different Patterns Mean — A Visual Guide

Horizontal lines (sustained tones)

A straight, consistent horizontal band means a pitch is being held at steady frequency and amplitude. The width of the band (how thick it is) indicates spectral spread: a pure sine wave produces a very thin line whose thickness is set by the analysis window, while a voice or instrument produces slightly thicker bands because of the natural frequency variation inherent in real sounds (including vibrato).

A gradually brightening horizontal band means a note is getting louder (crescendo). A gradually dimming band means it’s getting softer (diminuendo). This is readable directly from the spectrogram.

Diagonal lines (pitch changes)

A line that slants upward from left to right means pitch is rising over time. A downward slope means pitch is falling. The steeper the slope, the faster the pitch change.

A slow, smooth upward slope on a vocal line means the singer is approaching the note from below (scooping). A straight horizontal line that suddenly jumps to a higher position means a clean interval jump with no slide. The difference is immediately visible.

Vibrato appears as a regular oscillation: the line wiggles up and down at the vibrato rate (roughly 5–6 Hz for normal vibrato), producing a gentle wave pattern rather than a straight line.

Vertical lines (sharp transients)

A bright vertical slash that spans multiple frequencies simultaneously means a transient — a very fast impulse like a consonant attack, a percussive click, or the initial attack of a plucked string. These are brief, broadband sounds that appear in every frequency band at once.

In vocal spectrograms, consonants that involve stops (p, b, t, d, k, g) appear as a brief gap followed by a transient burst. Fricatives (s, sh, f) appear as a horizontal band of noise specifically in the high-frequency region (4–8 kHz).

Broadband noise (the random scatter)

Noise — electrical noise, breath noise, room noise, fricative consonants — appears as a diffuse scatter of colour rather than organised lines. Instead of neat horizontal or diagonal bands, noise fills a range of frequencies with a grainy, unorganised texture.

Breathy vocal tone appears on the spectrogram as diffuse low-level energy between the harmonic lines rather than the clean silence you’d see between harmonics in a more focused vocal tone. This noise between the harmonics is acoustic evidence of incomplete glottal closure, with air escaping between the vocal folds during phonation.

Room noise and microphone noise appear as a thin, consistent layer of grainy colour across all frequencies — always present at low level, and most visible when the signal drops during silence.

Reading the Singer’s Voice on a Spectrogram

The human voice has several specific features that are immediately readable in a spectrogram once you know what to look for.

Formants — The Resonance Peaks

The vocal tract — throat, mouth, and nose — acts as a resonance filter that amplifies certain frequency regions and dampens others. These resonance peaks are called formants, and they’re what distinguishes different vowel sounds from each other.

On a spectrogram of speech or singing, formants appear as horizontal bands of enhanced brightness — the harmonics that happen to fall in the formant region are brighter than their neighbours. Different vowel sounds produce different formant patterns, which is why spectrograms are used in linguistics and speech science to distinguish vowels objectively.

The singer’s formant: Trained classical singers develop a resonance cluster in the 2,500–3,200 Hz region — called the singer’s formant — that allows the voice to project over an orchestra. On a spectrogram, this appears as a consistently bright band in the 2–3 kHz region that maintains its brightness even as other harmonics fluctuate. The presence or absence of the singer’s formant is directly visible in the spectrogram display.

Vibrato Appearance

Vibrato appears as a regular wave oscillation in every harmonic line simultaneously — all the horizontal bands gently undulate together at the same rate. If only one harmonic wiggles while others don’t, it’s an artefact or noise rather than true vibrato. True vibrato affects the fundamental frequency, and therefore all harmonics oscillate together.

From the spectrogram, vibrato rate is readable by counting how many oscillation cycles occur per second (for 5 Hz vibrato, you’ll see 5 complete up-down cycles in each second of the display). Vibrato extent — how far the pitch oscillates — is visible as how much vertical displacement the lines show.
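The same counting can be done numerically if you have a pitch track, meaning a list of fundamental frequency readings taken at a fixed rate. The sketch below is an assumption-laden illustration: the function names are hypothetical, the pitch track is synthetic, and the cycle counting is deliberately simple rather than a robust vibrato detector.

```python
import math

def cents(freq_hz, ref_hz):
    """Pitch distance from ref_hz in cents (100 cents = 1 semitone)."""
    return 1200 * math.log2(freq_hz / ref_hz)

def vibrato_stats(pitch_track_hz, frame_rate_hz):
    """Estimate vibrato rate (Hz) and extent (cents either side of centre)."""
    # Centre pitch: average in the log domain so up and down swings balance.
    log_mean = sum(math.log2(f) for f in pitch_track_hz) / len(pitch_track_hz)
    centre_hz = 2 ** log_mean
    deviation = [cents(f, centre_hz) for f in pitch_track_hz]
    # One upward crossing of the centre line per vibrato cycle.
    cycles = sum(1 for a, b in zip(deviation, deviation[1:]) if a < 0 <= b)
    duration_s = len(pitch_track_hz) / frame_rate_hz
    extent_cents = (max(deviation) - min(deviation)) / 2
    return cycles / duration_s, extent_cents

# Synthetic two-second track: A3 with 5 Hz vibrato, 50 cents either side.
frame_rate = 100
track = [220.0 * 2 ** (50 / 1200 * math.cos(2 * math.pi * 5 * i / frame_rate))
         for i in range(2 * frame_rate)]
rate, extent = vibrato_stats(track, frame_rate)
print(f"{rate:.1f} Hz, {extent:.0f} cents")
```

Real pitch tracks are noisier than this synthetic one, so in practice the readings would usually be smoothed before counting cycles.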

Breathiness and Vocal Fry

Breathy tone: The harmonic lines are present but surrounded by diffuse noise energy. The area between harmonic lines, which should be dark in a focused tone, has a grainy texture — acoustic evidence of air leaking through incompletely closed vocal folds.

Vocal fry (lowest register): Appears as extremely low, irregularly spaced pulses — each glottal pulse is visible as a separate event rather than blending into a continuous tone. The fundamental is very low (often 30–70 Hz) and individual vibration cycles are perceptible as discrete events.

Register Transitions

The passaggio — the transition between chest voice and head voice — often appears as a noticeable change in the harmonic pattern. Chest voice typically shows strong lower harmonics; head voice shows a different formant configuration with different harmonics emphasised. The transition may be smooth (trained technique) or show a jump and temporary instability (an audible register break) that’s clearly visible as a brief moment of noise or discontinuity in the harmonic stack.

How the Spectrogram Is Generated — The FFT Window

Understanding one technical aspect of spectrogram generation helps explain a practical trade-off you’ll notice in different spectrogram displays.

The spectrogram is built from a series of Fast Fourier Transforms (FFTs) — mathematical operations that decompose a short window of audio into its constituent frequencies. Each FFT produces one vertical slice of the spectrogram. The FFT windows are applied sequentially across the entire audio signal, overlapping to some degree, producing the flowing, movie-like display.
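The windowing process described above can be sketched in plain Python. To keep the example free of dependencies it uses a slow, naive discrete Fourier transform in place of a true FFT (the output is the same, only the speed differs). The window size, hop size, Hann window and function names are all assumptions for illustration, not the settings of any specific viewer.

```python
import cmath
import math

def hann(n):
    """Hann window: tapers each frame's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (an FFT gives the same result faster)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, window_size=64, hop=32):
    """Slide an overlapping window along the signal; one spectrum per step."""
    window = hann(window_size)
    columns = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + window_size], window)]
        columns.append(magnitude_spectrum(frame))
    return columns

# A 1000 Hz test tone sampled at 8000 Hz; each bin is 8000 / 64 = 125 Hz wide.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
columns = spectrogram(tone)
peak_bins = [max(range(len(col)), key=col.__getitem__) for col in columns]
print(len(columns), "slices; loudest bin in each:", peak_bins)
```

Each entry in `columns` is one vertical slice of the display. For this steady tone every slice peaks in the same bin (bin 8, which is 1000 Hz), and that repeated peak is what draws a horizontal line.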

The window size trade-off:

A long FFT window analyses more audio per frame → better frequency resolution (you can distinguish harmonics that are close together in frequency) → worse time resolution (fast events are smeared, transients look blurry). Long windows are used when you want to see the harmonic structure of a sustained tone clearly.

A short FFT window analyses less audio per frame → better time resolution (transients are sharp and precise) → worse frequency resolution (close harmonics blur together). Short windows are used when you need to see the precise timing of attacks and transients.
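The trade-off comes down to simple arithmetic: the spacing between frequency bins is the sample rate divided by the window size, and the stretch of audio in each frame is the window size divided by the sample rate. The sketch below assumes a 44.1 kHz sample rate and a few typical power-of-two window sizes, chosen purely as examples.

```python
def window_resolution(window_size, sample_rate_hz=44100):
    """Return (frequency bin spacing in Hz, audio covered per frame in ms)."""
    freq_resolution_hz = sample_rate_hz / window_size
    time_span_ms = 1000 * window_size / sample_rate_hz
    return freq_resolution_hz, time_span_ms

for size in (512, 2048, 8192):
    freq_res, time_span = window_resolution(size)
    print(f"window {size:5d}: bins {freq_res:6.2f} Hz apart, each frame covers {time_span:6.1f} ms")
```

Multiplying the two numbers always gives the same constant, so any gain in one kind of resolution is paid for in the other. For a bass voice whose harmonics sit only about 80 Hz apart, the shortest window here (bins roughly 86 Hz apart) cannot separate neighbouring harmonics at all.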

Most music production spectrograms use a medium window size as a compromise. The spectrogram viewer uses a setting optimised for vocal analysis. For the full technical explanation of FFT in pitch detection contexts, the how FFT works in pitch detection guide covers the mathematics in accessible detail.

Spectrogram vs Waveform — When to Use Each

Both representations show audio over time, but they reveal fundamentally different information.

Feature	Waveform	Spectrogram
What it shows	Amplitude over time	Frequency content over time
Best for	Overall level, dynamics, transient shape	Pitch, harmonics, tone quality, noise
Reveals	Clipping, silence, volume envelope	Vowels, vibrato, breathiness, formants
Reading difficulty	Intuitive for most musicians	Requires learning but very powerful
Pitch information	Not directly readable	Full harmonic content visible

Use the waveform when you need to check levels, find silence, or examine the overall dynamic shape of a recording. Use the spectrogram when you need to understand pitch content, harmonic quality, identify noise or tonal problems, or analyse vocal characteristics. They’re complementary — professional audio tools typically show both simultaneously.

Practical Spectrogram Analysis for Singers

Here’s how to use the spectrogram viewer specifically for vocal analysis:

Checking vowel consistency: Sing the same note on different vowel sounds (ah, eh, ee, oh, oo) and compare the spectrogram. The harmonic ladder stays at the same vertical positions (same pitch), but the brightness pattern shifts, because different harmonics are emphasised on each vowel. Consistent vowel shaping produces a stable, repeatable formant pattern for each vowel. Inconsistent technique shows irregular or collapsing formant patterns on specific vowels.

Identifying breathiness: Sing a sustained note and look at the areas between harmonic lines. Clean, focused tone has dark gaps between harmonics. Breathy tone has diffuse noise filling those gaps. You can literally see breath leakage in the spectrogram. Compare how the texture changes as you adjust your vocal technique: firmer support and better vocal fold closure clean up the between-harmonic areas.

Analysing vibrato: As described above — all harmonic lines should oscillate together at a consistent rate. Irregular vibrato shows as irregular oscillation. Sharp or flat vibrato centre shows the entire harmonic stack sitting above or below where it should be.

Spotting register transitions: Sing through your range and watch the harmonic pattern as you approach and cross your passaggio. The shift in formant pattern is visible as a change in which harmonics are brightest. A smooth transition shows as gradual change; a register break shows as a brief moment of disruption in the harmonic lines.

Checking the singer’s formant: Sing sustained notes at different volume levels. In a well-developed classical voice, the 2–3 kHz region should remain bright even at softer dynamics — the ring is present regardless of volume. In a voice without developed resonance, the 2–3 kHz brightness drops at softer dynamics.

Frequently Asked Questions

What is a spectrogram? A spectrogram is a visual representation of sound that shows three dimensions simultaneously: time (horizontal axis), frequency (vertical axis), and amplitude (colour or brightness). Unlike a waveform which shows only loudness over time, a spectrogram shows which frequencies are present and how loud each one is at every moment. Open the spectrogram viewer to see a live spectrogram from your microphone.

How do you read a spectrogram? The horizontal axis is time — read left to right. The vertical axis is frequency — bass at the bottom, treble at the top. Bright colour or high brightness means a loud frequency; dark or absent means quiet or silent. Horizontal lines are sustained tones. Diagonal lines are pitch changes. Vertical lines are sharp transients. A ladder of parallel horizontal lines is the harmonic series of a pitched sound.

What do harmonics look like on a spectrogram? Harmonics appear as a set of parallel horizontal lines at regular frequency intervals — the fundamental frequency at the bottom, then lines at double, triple, quadruple that frequency and so on. The lowest, brightest line is the fundamental (the note you hear as the pitch). Higher harmonics are progressively fainter. The specific pattern of which harmonics are brightest determines the tonal quality (timbre) of the sound.

Can you see vocal quality in a spectrogram? Yes. Breathiness appears as diffuse noise between harmonic lines. Vocal fry appears as widely spaced, irregular low-frequency pulses. The singer’s formant (the resonant ring of a trained classical voice) appears as consistent brightness in the 2–3 kHz region. Register transitions appear as shifts in the harmonic brightness pattern. Vibrato appears as the harmonic lines oscillating together at a consistent rate.

What does noise look like on a spectrogram? Noise appears as diffuse, grainy texture spread across a range of frequencies — unorganised rather than structured into lines. White noise spans all frequencies equally. Electrical hum appears as a sharp horizontal line at 50 or 60 Hz. Room reverberation appears as extended decay after sounds end. Background conversation appears as speech-like patterns at low amplitude underlying the main signal.

What is the difference between a spectrogram and an FFT? An FFT (Fast Fourier Transform) is the algorithm that computes the frequency content of one short window of audio. Its output is a single snapshot: a vertical slice showing which frequencies are present at that moment. A spectrogram is a sequence of these snapshots displayed in time order, like a movie of frequency content across the full duration of a sound. In other words, a spectrogram is many FFT results placed side by side to create the time-frequency display.

Why does my spectrogram look grainy or unclear? Grain is usually noise from background sound, microphone self-noise, or room reverb. Move to a quieter environment, get closer to the microphone, and reduce ambient sound sources. Low-quality microphones also tend to have a higher noise floor, which shows up in the spectrogram. For guidance on improving signal quality, see the noise and background interference guide.

Ornella

Ornella is a music technology writer and vocal tools specialist at Pitch Detector. She creates practical content around pitch detection, note recognition, vocal analysis, and singing education tools for beginners, singers, and audio creators.
