Content-Type Sniffing Test Suite

Test Index

Test | Files | Specified Type | Description | Should be treated as
001 | 001.txt 001.exe 001.gif | text/plain | US-ASCII text ("PASS") | text/plain
002 | 002.txt 002.exe 002.gif | text/plain;charset=us-ascii | US-ASCII text ("PASS") | text/plain
003 | 003.txt 003.exe 003.gif | text/plain | ISO-8859-1 text ("PŧS") | text/plain
004 | 004.txt 004.exe 004.gif | text/plain;charset=iso-8859-1 | ISO-8859-1 text ("PŧS") | text/plain
005 | 005.txt 005.exe 005.gif | text/plain;charset=utf-8 | UTF-8 text ("ÞΑSS") | text/plain
006 | 006.txt 006.exe 006.gif | text/plain;charset=shift_jis | Shift_JIS text ("PASS") | text/plain
007 | 007.txt 007.exe 007.gif | text/plain;charset=utf-16 | UTF-16LE text (BOM, "ⓅⒶⓈⓈ") | text/plain
008 | 008.txt 008.exe 008.gif | text/plain | Binary data (includes invalid text/plain characters) | application/octet-stream [1]
009 | 009.txt 009.exe 009.gif | text/plain; charset=iso-8859-1 | Binary data (includes invalid text/plain characters) | application/octet-stream [1]
010 | 010.txt 010.exe 010.gif | text/plain; charset=ISO-8859-1 | Binary data (includes invalid text/plain characters) | application/octet-stream [1]
011 | 011.txt 011.exe 011.gif | text/plain; charset=IsO-8859-1 | Binary data (includes invalid text/plain characters) | text/plain
012 | 012.txt 012.exe 012.gif | text/plain;charset=utf-16 | UTF-16 text that looks like binary data | text/plain
013 | 013.txt 013.exe 013.gif | text/plain | US-ASCII text that looks like HTML | text/plain
014 | 014.txt 014.exe 014.gif | text/plain | US-ASCII text that looks like XML | text/plain
015 | 015.txt 015.exe 015.gif | TEXT/PLAIN | Binary data (includes invalid text/plain characters) | text/plain
016 | 016.txt 016.exe 016.gif | text/plain;charset=iso-8859-1 | Binary data (includes invalid text/plain characters) | text/plain
017 | 017.txt 017.exe 017.gif | text/plain;charset=ISO-8859-1 | Binary data (includes invalid text/plain characters) | text/plain
018 | 018.txt 018.exe 018.gif | text/plain | US-ASCII text that vaguely looks like HTML | text/plain

Suggested specification

UAs MUST always honour the Content-Type, except when its value is exactly (by case-sensitive literal comparison) text/plain, text/plain; charset=iso-8859-1, or text/plain; charset=ISO-8859-1, and the stream also contains one or more bytes in the ranges 0x00-0x08, 0x0B, 0x0E-0x1F, 0x7F, or 0x80-0x9F. In those specific cases, UAs MUST ignore the specified Content-Type header and treat the resource as if it had no specified Content-Type.
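The rule above reduces to a literal string check on the header plus a scan for a fixed set of byte values. A minimal Python sketch follows, assuming the header value is already available as a string and the whole body as bytes; the names SNIFFABLE_CONTENT_TYPES, BINARY_BYTES and ignore_content_type are hypothetical and not part of any UA or specification. It only decides whether the header should be discarded; the sniffing that follows is out of scope.

```python
# Header values the suggested rule matches literally: the comparison is case-
# and whitespace-sensitive, so "TEXT/PLAIN" or "text/plain;charset=iso-8859-1"
# (no space) do NOT match and are honoured as sent.
SNIFFABLE_CONTENT_TYPES = frozenset({
    "text/plain",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
})

# Bytes that mark the body as binary under the rule:
# 0x00-0x08, 0x0B, 0x0E-0x1F, 0x7F and 0x80-0x9F.
# TAB (0x09), LF (0x0A), FF (0x0C) and CR (0x0D) are not in the set.
BINARY_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x20), 0x7F, *range(0x80, 0xA0)]
)


def ignore_content_type(content_type: str, body: bytes) -> bool:
    """Return True if the UA should act as if no Content-Type was sent.

    Hypothetical helper: it only decides whether to discard the header.
    """
    if content_type not in SNIFFABLE_CONTENT_TYPES:
        return False  # any other value, even a near-miss, is honoured
    # Iterating over bytes yields ints, which are looked up in BINARY_BYTES.
    return any(byte in BINARY_BYTES for byte in body)


if __name__ == "__main__":
    # Stand-in binary payload; NOT the suite's actual test files.
    binary_body = b"P\x00A\x01SS"
    # (specified type, expected result) for the binary-data rows of the table.
    cases = [
        ("text/plain", True),                       # 008
        ("text/plain; charset=iso-8859-1", True),   # 009
        ("text/plain; charset=ISO-8859-1", True),   # 010
        ("text/plain; charset=IsO-8859-1", False),  # 011
        ("TEXT/PLAIN", False),                      # 015
        ("text/plain;charset=iso-8859-1", False),   # 016
        ("text/plain;charset=ISO-8859-1", False),   # 017
    ]
    for header, expected in cases:
        assert ignore_content_type(header, binary_body) is expected, header
    # Plain ASCII text with CR/LF is never reinterpreted (as in test 001).
    assert not ignore_content_type("text/plain", b"PASS\r\n")
    print("all cases agree with the table")
```

Rows 008-010 come out True (the header is ignored, so the content can then be sniffed as application/octet-stream), while 011 and 015-017 come out False because their header strings differ from the listed values by case or by a missing space.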

Footnotes

1: Technically, this should be interpreted as text/plain, but the given content is not valid text/plain. This Content-Type is one of Apache's defaults for unlabelled file types.