Tuesday, April 7, 2015

IBM Character Fonts

IBM used several fonts during the life of the IBM PC line and soon thereafter.  Eventually the font support would be finalized into the standard VGA font, but there was quite an evolution to get there.

The first font set is found in the IBM PC BIOS, starting at address FFA6E.  In PC BIOSes, whether from IBM or from another publisher like Phoenix, Award or AMI, you will always find a font beginning at this address.  The table at this address holds the dot patterns, or glyphs, for the first 128 characters, the basic ASCII set.  The font is always an 8x8 pattern and essentially acts as a fallback for programs using graphics modes.  The glyphs for the second 128 characters, the extended set, could only be found on the display adapters themselves.



Note the extra pixel in the diamond character in the first row.  This is unique to the first IBM PC Model 5150 BIOS revision.  That pixel will be gone in the PC BIOS dated 10-19-81 and every other BIOS thereafter.  Note that the second 128 characters do not exist in the PC BIOS's ROM.
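As a rough illustration of how the table at FFA6E can be pulled out of a BIOS ROM dump, here is a minimal Python sketch.  It assumes the dump's last byte corresponds to physical address FFFFF, so the offset can be computed from the file length, and that each glyph is stored as 8 bytes, one per pixel row, with the most significant bit as the leftmost pixel.  The function names are made up for this example.

```python
FONT_ADDR = 0xFFA6E        # physical address of the 8x8 font (from the text above)
TOP_OF_MEMORY = 0x100000   # assumption: the dump's last byte maps to FFFFF
GLYPHS = 128               # only the first 128 characters live in the BIOS
ROWS = 8                   # one byte per pixel row


def font_offset(rom_size):
    """Offset of the font table inside a dump that is rom_size bytes long."""
    offset = rom_size - (TOP_OF_MEMORY - FONT_ADDR)
    if offset < 0:
        raise ValueError("dump is too small to contain the font table")
    return offset


def read_glyph(rom, code):
    """Return the 8 row bytes for character code 0..127."""
    if not 0 <= code < GLYPHS:
        raise ValueError("the BIOS table only covers codes 0-127")
    start = font_offset(len(rom)) + code * ROWS
    return rom[start:start + ROWS]


def render(glyph):
    """Draw a glyph as text, assuming bit 7 is the leftmost pixel."""
    return "\n".join(
        "".join("#" if row & (0x80 >> x) else "." for x in range(8))
        for row in glyph
    )
```

With an 8KB dump the table would start at offset 1A6E, and with a 64KB dump at FA6E.  Calling `render(read_glyph(rom, ord("A")))` on the bytes of a dump should print the letter A as an 8x8 grid.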

MDA & Hercules

MDA and Hercules-brand graphics cards share the same glyph patterns.  Their text mode uses a 9x14 text cell and you were strictly limited to the 256 characters contained in the Character Generator ROM on the display adapter.  If you wanted to use non-IBM characters with a basic Hercules Graphics Card, you would use the graphics mode.  A Hercules Graphics Card Plus or Hercules InColor Card can redefine the characters in text mode.  Here is what the demo screen looks like on an MDA or Hercules :


Interestingly, the MDA's Character Generator is an 8KB ROM chip, even though its font only takes 4KB.  The other 4KB contain the two CGA fonts described in the next section.  Apparently it was easier to use one ROM for both cards.  The IBM Part number on the chip is 6359300 or 5788005 and the ROM is a 9264 type, so it cannot be dumped or replaced by an EPROM without a pin adapter.  The Character Generator ROMs cannot be read by the system, so the glyph patterns are obtained via a ROM dump.
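The sizes work out neatly: 256 characters at 14 rows each is 3,584 bytes, which fits in 4KB, and two 8x8 fonts of 256 characters take 2KB each.  The sketch below splits an 8KB dump into the three fonts.  The exact offsets are an assumption on my part and are not stated above: it assumes the first 2KB hold rows 0-7 of each MDA glyph, the next 2KB hold rows 8-13 (padded to 8 bytes per character), and the two CGA fonts follow at 1000 (thin) and 1800 (thick).  Check any real dump before relying on it.

```python
CHARS = 256
SLOT = 8  # every character occupies an 8-byte slot in each 2KB bank


def split_character_rom(rom):
    """Split an 8KB MDA/CGA character ROM dump into three glyph lists.

    The bank offsets used here are assumptions, not documented facts.
    """
    if len(rom) != 8192:
        raise ValueError("expected an 8KB dump")

    def bank(base):
        return [bytes(rom[base + c * SLOT: base + (c + 1) * SLOT])
                for c in range(CHARS)]

    top = bank(0x0000)      # assumed: MDA rows 0-7
    bottom = bank(0x0800)   # assumed: MDA rows 8-13 plus 2 unused bytes
    mda_9x14 = [top[c] + bottom[c][:6] for c in range(CHARS)]
    cga_thin = bank(0x1000)   # assumed: single-dot 8x8 font
    cga_thick = bank(0x1800)  # assumed: double-dot 8x8 font
    return mda_9x14, cga_thin, cga_thick
```

Each MDA glyph comes back as 14 bytes, one per row of the 8 stored pixel columns; the ninth column of the 9x14 cell is not stored in the ROM.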

CGA & PCjr.

CGA text modes always use an 8x8 text cell and typically use a thick, double-dot font.  A true IBM CGA card also has a thin, single-dot font.  This can be selected by bridging two solder pads just below the MC6845, but IBM did not provide a pair of pins to make this easy for end users.  The thick font is suitable in 40-column mode for TVs, but the thin font shows a lot of color fringing, a.k.a. artifacts.  IBM probably thought that the thin font was not important enough a feature to be made accessible to end users.  Otherwise, many users would probably have complained that the text was too difficult to read on their TVs.

This is what the standard thick font looks like :


Note that there are four characters with minor differences between the Character Generator ROM font and the BIOS font.  They are listed as "8x8 different between card and BIOS" in the screenshot.

Here is the thin font, which may have been IBM's first attempt at ISO compliance :


The PCjr. fonts should be identical to the CGA fonts, but the thin font is not available.

Tandy 1000

The Tandy 1000 contains a Character Generator ROM that is mostly similar to IBM's CGA double-dot font, but there are some differences :


In the original 1000, the Character ROM is embedded in the Video Gate Array chip.  After the original 1000s, Tandy integrated the Video Gate Array and MC6845 into a large VLSI chip.  This applies to the EX, SX, HX and TX.  Internal to these chips is a 2K Character Generator ROM.  In the above screenshot, the first 128 characters are correct because they are duplicated in the Tandy BIOS at address FFA6E.  It is very difficult to extract the patterns for the second 128 characters because they are not in an accessible or dumpable ROM.  Here is what the characters truly look like :



By the time of the TL and SL, Tandy was using the Tandy Video II chip and an external 16KB Character ROM with the 8x8 font and a 9x14 font that may or may not be identical to the IBM and Hercules font.  The Video Controller in the TL and SL and their successors could emulate MDA and Hercules text and graphics.

The Tandy default text mode uses an 8x9 text cell, but usually an 8x8 text cell can be used.  For most characters, the extra row is blank, but for some the ninth pixel row is a repeat of the eighth pixel row.
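The ninth-row rule can be sketched in a few lines of Python.  Which characters repeat their eighth row is not listed here, so the set of codes is left as a parameter the caller has to supply; the function names are invented for this example.

```python
def extend_to_9_rows(glyph, repeat_eighth_row):
    """Build a 9-row glyph from 8 row bytes."""
    if len(glyph) != 8:
        raise ValueError("expected 8 row bytes")
    # The extra row is either a copy of the eighth row or blank.
    ninth = glyph[7] if repeat_eighth_row else 0x00
    return bytes(glyph) + bytes([ninth])


def build_8x9_font(font_8x8, repeating_codes):
    """font_8x8 is a list of 8-byte glyphs indexed by character code.

    repeating_codes is the (caller-supplied) set of codes whose
    eighth row is repeated instead of being followed by a blank row.
    """
    return [extend_to_9_rows(glyph, code in repeating_codes)
            for code, glyph in enumerate(font_8x8)]
```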

EGA

With the EGA, MCGA and VGA adapters, the Character Generator ROM would no longer be found on a separate ROM chip accessible only to the CRT Controller.  Instead, multiple character sets would be contained in the BIOS Extension ROM (for EGA and VGA) or within the BIOS (for MCGA).  As these adapters supported redefinable character sets in text mode, DOS could upload its own character set for display.

The EGA BIOS supports an 8x8 text font when displayed on 200 line monitors, an 8x14 text font when displayed on 350 line color monitors and a 9x14 text font when displayed on a monochrome 350 line monitor.  The 9x14 characters are identical to the MDA characters, but many are shifted by a pixel in one direction or the other to produce a more pleasing spacing (kerning) than MDA.  The 8x14 characters are mostly identical to the 9x14 characters, but there are differences.  The first 128 8x8 characters are identical to the PC BIOS and the second 128 8x8 characters are identical to the CGA thick text font.  All these fonts are stored, uncompressed, in the EGA 16KB BIOS extension.

This is the EGA and VGA and (for the first 128 characters) the standard PC BIOS 8x8 text font :


Here is the EGA and VGA 8x14 text font :


And the EGA and VGA 9x14 text font :


MCGA

MCGA includes an 8x8 and an 8x16 text font.  Actually, the MCGA, in addition to the standard 8x16 font, also contains four more 8x16 fonts, none of which ever became popular.  These may have been IBM's attempt to be ISO compliant.

This is the standard 8x16 font for MCGA and VGA :


In addition, PS/2 Model 30s with a revision 0 BIOS contained an earlier version of the 8x16 font.  In this font, the zero character has a slash instead of a dot.

VGA

VGA supports the EGA 8x8, 8x14 and 9x14 fonts and adds 8x16 and 9x16 fonts.  These are all found in the VGA BIOS ROM extension, which can be 24KB-32KB.  The 8x14 and 9x14 glyphs are mostly the same, as are the 8x16 and 9x16 glyphs, so on both EGA and VGA only one set of each height is stored in the ROM, along with substitutions for the particular glyphs that differ.  The BIOS adjusts for the ninth pixel column: for most characters the column will be blank; for others, the ninth column will repeat whatever is in the eighth.
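To make the two ideas in this paragraph concrete, here is a small Python sketch: one function overlays a handful of substitute glyphs on the shared 8-wide set, and another widens a row of pixels to nine columns.  The range of codes whose ninth column repeats the eighth (C0 to DF, the line-drawing block) is an assumption that is not stated above, and the dictionary-based font representation is only for illustration, not the layout used in any ROM.

```python
# Assumption: codes C0h-DFh repeat the eighth column into the ninth.
REPEAT_NINTH_COLUMN = range(0xC0, 0xE0)


def build_9_wide_font(base_font, substitutions):
    """Return the glyph set used for the 9-wide cell.

    base_font and substitutions map character codes to row bytes;
    a substitution replaces the shared glyph for that code only.
    """
    font = dict(base_font)
    font.update(substitutions)
    return font


def row_to_9_pixels(code, row_byte):
    """Expand one 8-pixel row byte into a list of 9 pixels (0 or 1)."""
    pixels = [(row_byte >> (7 - x)) & 1 for x in range(8)]
    ninth = pixels[7] if code in REPEAT_NINTH_COLUMN else 0
    return pixels + [ninth]
```

Repeating the eighth column is what lets horizontal line-drawing characters join up across adjacent cells, while ordinary letters keep a blank column between them.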

Here is the final, standard 9x16 VGA font :


DOS Code Pages

The PC was originally designed in and intended for English-speaking countries.  Support for other languages was a cumbersome exercise in the early days of MDA and CGA.  Eventually, DOS 3.3 introduced Code Pages, which, when combined with an EGA or VGA card, allowed the user to set his PC to his country's symbols.  English language users would generally be content with the default DOS code page, 437, or the alternate English code page, 850.  Code Page 850 is more friendly to Western European languages than 437 but loses some of the drawing characters.  DOS's .CPI files would contain character sets for several code pages, each of which had character sets for 8x8, 8x14 and 8x16.  EGA.CPI contains 437, 850, 852, 860, 863, 865.  Here are 437 and 850 :



While the Tandy Video II chip found in the TL and SL does not support software redefinable fonts, it has support for 512 characters instead of just 256.  (EGA can also support 512 characters).  The first 256 are the characters in Code Page 437, the second 256 characters are those of Code Page 850.  However, as Tandy 1000s after the original can be upgraded to EGA or VGA, Tandy MS-DOS 3.3 supports Code Pages in 8x8, 8x9 and 8x14 text cell sizes.
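For a quick way to see which drawing characters Code Page 850 gives up relative to 437, Python's built-in `cp437` and `cp850` codecs can be compared byte by byte.  This sketch only looks at the Unicode box-drawing block (U+2500 to U+257F); it says nothing about the glyph shapes in any particular .CPI file.

```python
def box_drawing_codes(codepage):
    """Map byte values to the box-drawing characters they decode to."""
    result = {}
    for value in range(256):
        ch = bytes([value]).decode(codepage, errors="replace")
        if "\u2500" <= ch <= "\u257f":
            result[value] = ch
    return result


cp437 = box_drawing_codes("cp437")
cp850 = box_drawing_codes("cp850")

# Codes that draw a line in 437 but hold something else in 850.
lost = {
    value: (cp437[value], bytes([value]).decode("cp850", errors="replace"))
    for value in cp437
    if value not in cp850
}

for value, (old, new) in sorted(lost.items()):
    print(f"{value:02X}: {old} -> {new}")
```

The characters that disappear are the ones mixing single and double lines, which 850 replaces with accented letters and other symbols.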

ISO.CPI contains English-language character sets suitable for ISO-compliant fonts :


Special thanks to NewRisingSun for all his help with this blog entry.

6 comments:

  1. My character woes started in the old days of DOS. I live in Portugal and the 860 code page is for Portugal. My Epson LX-800 9-pin dot matrix printer (every home user had that printer) was set for that code page. That wasn't in the DIP switches' country list. It was some sort of workaround by Epson.
    Portuguese DOS was in code page 850 "Multilingual (Latin-1)" (Western European languages) and that gave me problems and I didn't use Microsoft Works that came bundled with my IBM PS/1. I had to use the also bundled Windows 3.1 for my word processing.
    DOS had the nlsfunc TSR to change code pages in the computer and printer at once. My printer didn't support nlsfunc and memory was precious. But we can change the computer code page without using nlsfunc with the mode command.
    I only figured out all of this late, after having many frustrations because of code pages.
    Today, using Windows 7, I sometimes have problems with characters and I still haven't figured out how today's character encoding works!

  2. >This applies to the EX, SX, HX and TX.
    >In the above screenshot, the first 128 characters are correct because they are
    >duplicated in the Tandy BIOS at address FFA6E.
    >It is very difficult to extract the patterns for the second 128 characters
    >because they are not in an accessible or dumpable ROM.

    On the Tandy 1000 (original BIOS 01.00.00), the second 128 characters are located at
    FC070 ~ FC46F.
    On the 1000 (second BIOS 01.01.00) or 1000A, the second 128 characters are located at FC075 ~ FC474.

    They are dumpable.

  3. Does anyone know where these are located in the Tandy 1100? I do notice it has a character ROM onboard.

  4. Thank you for the different fonts! I included a link to your article in my reverse engineering tutorial for "You Have To Win The Game" map format:
    https://github.com/Michaelangel007/you_have_to_win_the_game_world_map

  5. I know this is an old article, but for future reference: It's "The quick brown fox jump*s* over the lazy dog." That's important because it's the only "s" in the sentence. With "jumped", the sentence has every letter in the alphabet except for "s".
