
Subject: Medical Image Format FAQ, Part 1/8

This article was archived around: Sun, 21 Dec 2003 14:16:25 GMT

All FAQs in Directory: medical-image-faq
All FAQs posted in: alt.image.medical, comp.protocols.dicom, sci.data.formats
Source: Usenet Version

Archive-name: medical-image-faq/part1
Posting-Frequency: monthly
Last-modified: Sun Dec 21 09:16:25 EST 2003
Version: 4.26

This message is automatically posted once a month to help readers looking for information about medical image formats. If you don't want to see this posting every month, please add the subject line to your kill file.

Contents:

    part1 - contains index, general information & standard formats
    part2 - contains standard formats (continued)
    part3 - contains information about proprietary CT formats
    part4 - contains information about proprietary MR formats
    part5 - contains information about proprietary other formats
    part6 - contains information about hosts & compression
    part7 - contains general information sources
    part8 - contains DICOM information sources

Tools that describe and convert many of the formats described in this document are available in the dicom3tools package from "http://www.dclunie.com/dicom3tools.html".

A web browsable version of this FAQ is available at:

    "http://www.dclunie.com/medical-image-faq/html/"

or at the mirror sites:

    "http://www.focus-fr.com/links/faq/medical/"
    "http://www.focus-fr.com/links/"

Html and text forms of the FAQ are available at (postscript and pdf no longer provided):

    "http://www.dclunie.com/medical-image-faq/".

Many FAQs, including this Listing, are available on the archive sites:

    "ftp://rtfm.mit.edu/pub/usenet/news.answers/medical-image-faq/"
    "http://www.faqs.org/faqs/medical-image-faq/part1/"
    "http://www.cs.uu.nl/wais/html/na-dir/medical-image-faq/part1.html"
    "ftp://ftp.univ-lille1.fr/pub/faq/medical-image-faq/part1"
    "http://www.pasteur.fr/infosci/FAQ/medical-image-faq/part1"
    "http://www.panther.net/FAQ/medical-image-faq/part1"
    "http://faqs.jmas.co.jp/FAQs/medical-image-faq/part1"
    "http://www.han.de/usenet/medical-image-faq/part1.gz"
    "http://www.landfield.com/faqs/medical-image-faq/part1/"

The name under which a FAQ is archived appears in the Archive-name line at the top of the article.

There's a mail server on the FAQ archives. You send an e-mail message to mail-server@rtfm.mit.edu containing the keyword "help" (without quotes!)
in the message body. To fetch this particular FAQ send a message with the following body:

    send usenet/news.answers/medical-image-faq/part1
    ...
    send usenet/news.answers/medical-image-faq/part8

Please direct comments or questions and especially contributions to "mailto:dclunie@dclunie.com" or reply to this article. All unknown formats and test images gratefully accepted.

Changes this issue

    Add Chinese and Korean Visible Human sites
    Add IntuitiveImaging conformance statement site
    Add PACSGear document scanning conformance statement site
    Remove email addresses to minimize spam
    Update Tiani conformance statement link
    Add UniPACS site
    Extensively revise conformance statement links
    Remove xray.psu dead links
    Reorganize toolkit summary, and remove lots of dead toolkit links
    Add Trevor Morgan's dicomlib toolkit
    Add JDCM Java DICOM Toolkit
    Add PixelMed Java DICOM Toolkit
    Tidy up Medigration info
    Tidy up OFFIS urls
    Add CDMEDICSPACSWEB site
    Add Conquest DICOM site
    Add JiveX site
    Update UCDMC sites to http from ftp
    Update DICOM image sample sites
    Update XMedCon PET format convertor site
    Update ITU T.81 text site
    Add transfer syntax determination explanation and code
    Add Madena viewer site
    Add idoimaging index site
    Add iRad Mac viewer site
    Add IBM conformance statement site
    Comment that R Hindel and book no longer contactable
    Add free PACS web site
    Add Apteryx Java Image I/O Plugin site
    Update Escape QT site
    Add YourDICOM site
    Add MRIConvert site
    Add explanation of DICOM image orientation
    Update JPEG 2000 resources
    Clean up a lot of conformance statement links
    Update UID registration, and elaborate on description
    Revise structured reporting resources, and add new web site
    Add dicomworks site
    Add Almacom's J2K codec site
    Update Mark Nelson's data compression library site
    Update TIFF spec site at Adobe
    Update ALI conformance site
    Add Sanders ViewPlus site
    Update MacAngioView site
    Add display performance section and AAPM TG18 reference
    Tidy up part 7 indentation of compression/jpeg links
    Update VMS tools site
    Update Kodak conformance statement site
    Add CardioVista viewer
    Add sites about reading DICOM in MATLAB
    Fix IHE MESA tools site reference
    Add Analog Devices J2K chip

Changes last issue

    Add Tiani Java Image Archive Application with source code
    Tidy up MultiTech site
    Add Sante Viewer and Anonymizer site
    Update MedX site
    Update bicubic spline interpolation site
    Update Analyze format sites
    Add SPM site
    Update Tiani conformance statement site
    Add J2K book reference
    Add Kakadu J2K codec site
    Update MultiTech Solutions web site
    Cull out dead links to image sites
    Update ANSI UID registration address
    Add InviWeb DicomWorks site
    Update mirror sites list
    Add Xinapse software and consulting site
    Update unuid pathology image site
    Update Interfile resources
    Add IHE and MESA tools sites
    Add DICOM SR sample sites
    Add reViewMD PocketPC DICOM viewer site
    Add RMIT Digital Radiography page
    Update TrueVIEW URL
    Add NIH etdips 3D software site
    Add Kitware vtk site
    Add Konica images site
    Add ECRI DICOM Compatability Analysis Form site
    Add Marianne's DicomEdit site
    Add MedicView site

The next part is the table of contents.

Subject: Contents

1. Introduction
    1.1 Objective
    1.2 Types of Formats
    1.3 In Desperation - Quick & Dirty Tricks
2. Standard Formats
    2.1 ACR/NEMA 1.0 and 2.0
    2.2 DICOM 3.0
        2.2.1 Localizer lines on DICOM images
    2.3 Papyrus
    2.4 Interfile V3.3
    2.5 Qsh
    2.6 DEFF
3. Proprietary Formats
    3.1 Proprietary Formats - General Information
        3.1.1 SPI (Standard Product Interconnect)
        3.1.2 Siemens - Features common to multiple families
            Siemens Vax/VMS
            Siemens Sparc SunOS
            Starting up
            Getting a console
            Native images
            Exporting images
            Physical connection
            Archive devices
            Becoming root
            Reset
    3.2 CT - Proprietary Formats
        3.2.1 General Electric CT
            GE CT 9800
            GE CT 9800 Image data
            GE CT 9800 Tape format
            GE CT 9800 Raw data MR
            GE CT Advantage - Genesis
            GE CT Advantage Image data
            GE CT Advantage Archive format
            GE CT Advantage Raw data
            GE CT Pace
            GE CT Sytec
            GE CTI
        3.2.2 Siemens CT
            Siemens Somatom DR
            Siemens Somatom Plus
            Siemens Somatom AR
        3.2.3 Philips CT
        3.2.4 Picker CT
        3.2.5 Toshiba CT
        3.2.6 Hitachi CT
        3.2.7 Shimadzu CT
        3.2.8 Elscint CT
        3.2.8 Imatron CT
    3.3 MR - Proprietary Formats
        3.3.1 General Electric MR
            GE MR Signa 3.x,4.x
            GE MR Signa 3.x,4.x Image data
            GE MR Signa 3.x,4.x Tape format
            GE MR Signa 3.x,4.x Raw data
            GE MR Signa 5.x - Genesis
            GE MR Signa 5.x Image data
            GE MR Signa 5.x Archive format
            GE MR Signa 5.x Raw data
            GE MR Max
            GE MR Vectra
        3.3.2 Siemens MR
            Siemens Magnetom GBS/GBS II
            Siemens Magnetom GBS/GBS II Native Format
            Siemens Magnetom GBS/GBS II SPI Format
            Siemens Magnetom SP
            Siemens Magnetom SP Native Format
            Siemens Magnetom SP SPI Format
            Siemens Magnetom Impact
            Siemens Magnetom Impact Native Format
            Siemens Magnetom Impact SPI Format
            Siemens Magnetom Vision
            Siemens Magnetom Vision Native Format
            Siemens Magnetom Vision SPI Format
        3.3.3 Philips MR
            Philips Gyroscan S5
            Philips Gyroscan ACS
            Philips Gyroscan T5
            Philips Gyroscan NT5 & NT15
        3.3.4 Picker MR
        3.3.5 Toshiba MR
        3.3.6 Hitachi MR
        3.3.7 Shimadzu MR
        3.3.8 Elscint MR
    3.4 Proprietary Workstations
        3.4.1 ISG Workstations
            Gyroview
        3.4.2 GE Workstations
            GE Advantage Windows
    3.5 Other Proprietary Formats
        3.5.1 Analyze From Mayo
4. Host Machines
    4.1 Data General
        4.1.1 Data General Data
            Data General Integers
            Data General Floating Point
        4.1.2 Data General Operating System
            Data General RDOS
            Data General AOS/VS
        4.1.3 Data General Network
    4.2 Vax
        4.2.1 Vax Data
            Vax Integers
            Vax Floating Point
            Vax Strings
        4.2.2 Vax Operating System
            Vax VMS
            ULTRIX
            OSF
    4.3 Sun - Sun3 68000 and Sun4 Sparc
        4.3.1 Sun Data
            Sun Integers
            Sun Floating Point
            Sun Strings
        4.3.2 Sun Operating System
5. Compression Schemes
    5.1 Reversible Compression
    5.2 Irreversible Compression
        5.2.1 Perimeter Encoding
    5.3 DICOM Compression
6. Getting Connected
    6.1 Tapes
    6.2 Ethernet
    6.3 Serial Ports
7. Sources of Information
    7.1 Contacts and Sites
    7.2 Relevant FAQ's
    7.3 Mailservers
    7.4 References
    7.5 Organizations and Societies
    7.6 Usenet Newsgroups
    7.7 DICOM Information Sources
8. Acknowledgements

The next part is part1 - general information & standard formats.

1. Introduction

1.1 Objective

The goal of this FAQ is to facilitate access to medical images stored on digital imaging modalities such as CT and MR scanners, and their accompanying descriptive information. The document is designed for those who do not have access to the necessary proprietary tools or descriptions, particularly in those moments when inspiration strikes and one just can't wait for the local sales person to track down the necessary authority and go through the cycle of correspondence necessary to get a non-disclosure agreement in place, by which time interest in the project has usually faded, and another great research opportunity has passed! It may also be helpful for those keen to experiment with home-grown PACS-like systems using their existing equipment, and for those whose equipment is still useful but so old that even the host computer vendor doesn't support it any more!

There is of course no substitute for the genuine tools or descriptions from the equipment vendors themselves, and pointers to helpful individuals in various organizations, as well as names and catalog numbers of various useful documents, are included here where known. In addition, several small, well-known companies with good reputations specialize in such connectivity problems. Contact information is provided for them, though I personally have no experience with their products and am not endorsing them.

Finally, great care has been taken not to include any information that has been released under non-disclosure agreements. What is included here is the result of either information freely released by vendors, handy hints from others working in the field, or in many cases close scrutiny of hex dumps and experimentation with scanner parameters and study of the effects on the image files. The intent is to spread hard-earned knowledge gained over many years amongst those new to the field or a particular piece of equipment, not to threaten anyone's proprietary interests, or to substitute for the technical support available from vendors that ranges from free to extortionate, and excellent to abysmal, depending on who you are dealing with and where in the world you are located! Please use this information in the spirit in which it is intended, and where possible contribute whatever you know in order to expand the information to cover more vendors and equipment.

1.2 Types of Formats

Later sections will deal with the problems of getting the image files from the modality to the workstation, but for the moment assume the files are there and need to be deciphered. Three types of information are generally present in these files:

    - image data, which may be unmodified or compressed,
    - patient identification and demographics,
    - technique information about the exam, series, and slice/image.

Extracting the image information alone is usually straightforward and is described in 1.3. Dealing with the descriptive information, for example to make use of the data for dissemination in a PACS environment, or to extract geometry details in order to combine images into 3D datasets, is more difficult and requires deeper understanding of how the files are constructed.

There are three basic families of formats that are in popular use:

    - fixed format, where layout is identical in each file,
    - block format, where the header contains pointers to information,
    - tag based format, where each item contains its own length.

The block format is one of the most popular. In most cases, though, the early part of the header contains only a limited number of pointers to large blocks, and the blocks are almost always in the same place and of constant length (for standard rather than reformatted images at least), so if one doesn't know the specifics of the layout one can get by assuming a fixed format. I presume the use of pointers reflects the intent of the designers to handle future expansion and revision of the format.

The example par excellence of the tag based format is the ACR/NEMA style of data stream, which, though never intended as a file format per se, has proven useful as a model. See for example the sections dealing with the ACR/NEMA standards as well as DICOM (whose creators are about to vote on a media interchange format after all this time) and Papyrus.

ACR/NEMA style tags are described in more detail elsewhere, but each is self-contained and self-describing (at least if you have the appropriate data dictionary) and contains its own length, so if you can't interpret it you can skip it! Very convenient.

Most file formats based on this scheme are just concatenated series of tags and are dead easy to handle, apart from having to guess the byte order, which is not specified (unlike TIFF, which is a similar deal for those in the "real" imaging world), and sometimes having to skip a short, fixed length header. To identify such a file just do a "strings <file | grep 'ACR-NEMA'". If it is such a file, look through the start of the hex dump until you start to see the characteristic sequentially ordered pairs of 16 bit words that identify ACR/NEMA attributes, decide the byte order, et voila, you can pipe it into any general ACR/NEMA dumping program to see what it contains. If you see even group tags, they will be described in the standard. If you see odd group tags then they are vendor specific and you will have to ask the vendor or correlate them with identification information printed on the film until you figure out the ones that are important to you.

1.3 In Desperation - Quick & Dirty Tricks

Because radiologists, radiographers, technologists, physicists and imaging programmers are dedicated long suffering creatures who work long hours under adverse conditions for little reward, the vendors in their generosity have seen fit to make life a little easier, by almost universally putting the image data at the end of the file. Rarely you will see files that are padded out to fixed record size boundaries (eg. Vax VMS 512 byte records), and sometimes overlay plane data may be stored after the image data. Furthermore there is almost always an option at archive time to allow for storage in an uncompressed and totally unadulterated form. Even in ACR/NEMA the tag for image pixel data is numerically the highest and hence the last to appear in the sequence, which is guaranteed to be sorted.
They could have screwed us up totally by gratuitously adding variable length blocks of other stuff at the end, but the only time I have encountered this was on a Siemens Impact with the ACR/NEMA based SPI format padded out to 512 bytes.

In other words, if an image is 256 by 256, uncompressed, and 12-16 bits deep (and hence usually, though not always, stored as two bytes per pixel), then we all know that the file is going to contain 256*256*2=131072 bytes of pixel data at the end of the file. If the file is say 145408 bytes long, as all GE Signa 3X/4X files are for example, then you need to skip 145408-131072=14336 bytes of header before you get to the data. Presume row by row raster order starting at the top left hand corner, try both alternatives for byte order, deal with the 16 to 8 bit windowing problem, and very soon you have your image on the screen of your workstation.

This technique is so useful that even NIH Image for the Macintosh (an excellent must-have free program BTW) provides a raw import tool to do this, and describes it in the manual using the 14336 byte offset! Such a tool is sadly lacking in most commercial image handling programs for non-medical applications, which can't import images with more than 8 bits per channel.

Of course you have to live without the identification, demographic and technique information (other than what can be derived from the file name in some cases), but for many research and presentation purposes this is quite adequate.

Occasionally one runs into clever files where four 12 bit words are packed into three 16 bit words and one goes crazy trying to figure out the logic of how they are packed. The back of the old ACR/NEMA standard describes somewhere one way in which this is done. One should still be able to calculate the length easily enough. I haven't yet encountered a format that did nasty things like have strips of rows separated by padding ...

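The header-skipping arithmetic described above can be sketched in Python as follows. This is only an illustration under the assumptions already stated in the text (uncompressed data, two bytes per pixel, pixel data at the very end of the file). The function names read_trailing_pixels and window_to_8bit are made up for this example, and the linear window shown is just one simple way of handling the 16 to 8 bit problem.

```python
import struct


def read_trailing_pixels(data, rows, cols, big_endian):
    """Return (header_length, pixels) for a file whose last
    rows*cols*2 bytes are assumed to be 16 bit pixel data."""
    count = rows * cols
    pixel_bytes = count * 2
    if len(data) < pixel_bytes:
        raise ValueError("file is smaller than the expected pixel data")
    # Everything before the pixel data is treated as header and skipped.
    header_length = len(data) - pixel_bytes
    fmt = (">" if big_endian else "<") + str(count) + "H"
    pixels = struct.unpack(fmt, data[header_length:])
    return header_length, pixels


def window_to_8bit(pixels, center, width):
    """Map pixel values to 0..255 with a simple linear window
    (an assumption of this sketch, not something the formats specify)."""
    low = center - width / 2
    out = bytearray()
    for p in pixels:
        scaled = int((p - low) / width * 255)
        out.append(max(0, min(255, scaled)))
    return bytes(out)
```

For a 145408 byte file with rows = cols = 256 the computed header length is 14336. If the byte order guess is wrong the image usually looks like noise, so simply try the other value of big_endian.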
I guess we are lucky that most images are nice powers of two or even multiples thereof (256,320,512).

Of course the GE CT 9800 uses perimeter encoding even when DPCM compression is not selected, so this technique won't work.

2. Standard Formats

2.1 ACR/NEMA 1.0 and 2.0

    ACR/NEMA Standards Publication No. 300-1985   <- ACR/NEMA 1.0
    ACR/NEMA Standards Publication No. 300-1988   <- ACR/NEMA 2.0
    ACR/NEMA Standards Publication PS2-1989       <- data compression

The American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) recognized some time ago the need for standards to facilitate multi-vendor connectivity to promote the development of PACS and what is now referred to as Wide Area Networking. The first such standard, version 1.0, was released in 1985 as ACR/NEMA Standards Publication No. 300-1985. After several revisions it was released as version 2.0 in 1988, described in ACR/NEMA Standards Publication No. 300-1988. There it remained until a radically revised and reorganized approach, preserving backward compatibility, was released during 1992-1993 as ACR/NEMA Standards Publication PS3, also referred to as DICOM 3.

In the interim, to facilitate the transfer of compressed images, another standard was released, ACR/NEMA Standards Publication PS2-1989, which described various means of extending standard 300-1985 to handle compression utilizing a broad range of reversible and irreversible schemes. Though this part of the standard was apparently never implemented by anyone, and has been quietly bypassed by those working on DICOM 3 compression, it makes very interesting reading and is a nice summary of applicable techniques.

What does one need to know about ACR/NEMA 1.0 and 2.0?

The standards define a mechanism along the lines of the layered ISO-OSI (Open Systems Interconnect) model, with physical, transport/network, session, and presentation and application layers.
Unless one actually wants to physically connect to a device that supports the unique 50 pin point-to-point electrical interface, then one really only needs to be aware of how ACR/NEMA implements the presentation and application layers, which are described in terms of a "message format".

This message format is important to many people, not because anyone seriously wants to connect devices in the limited fashion envisaged by these early standards, but because many proprietary formats and other de facto standards have adopted the ACR/NEMA message format and its corresponding data dictionary and extension mechanisms.

The message format is described in sections 4, 5 and 10 of ACR/NEMA SP 300-1988 which are summarized briefly here. Section 6 describes command structure which is not really relevant other than that commands are also structured in the same way as data and consume part of the data dictionary. You will not encounter command tags in data streams ("messages") encapsulated in file formats though.

A message consists of a series of "data elements" each of which contains a piece of information. Each element is described by an "element name" consisting of a pair of 16 bit unsigned integers ("group number", "data element number"). The data stream is ordered by ascending group number, and within each group by ascending data element number. Each element may occur only once in a message.

Even numbered groups describe elements defined by the standard. Odd numbered groups are available for use by vendors or users, but must conform to the same structure as standard elements.

Following the (group number, data element number) pair is a length field that is a 32 bit unsigned even integer that describes the number of bytes from the end of the length field to the beginning of the next data element.

The last part of a data element is its value, which is defined by the data dictionary to be an ascii (numeric AN or text AT) or binary value (BI 16 bit or BD 32 bit).
The values may be single or multiple. Multiple ascii values are delimited by the backslash (05CH) character. Odd length ascii values are padded with a space (020H).

For example (shown as little endian 16 bit words):

    0008 0010 000C 0000 4341 2D52 454E 414D 3220 302E

is data element "Recognition Code" because that is what the dictionary defines group 0008 element 0010 to be. The dictionary says it is of type AT (ascii text), has a value multiplicity of single and only enumerated values are allowed, in this case the ascii string "ACR-NEMA 2.0". It is of length 0000000C hex or 12 bytes long.

The electrical interface is a 16 bit one, and hence even though 32 bit binary values are defined to be transmitted least significant word first (though the order for the 32 bit length is not actually specified), there is no mention in the standard as to how to encapsulate the message in an 8 bit world, hence different users and vendors have chosen little or big endian schemes. The new DICOM standard assumes a default little endian representation which seems to be the most appropriate considering the old definition for 32 bit words, which specified that the least significant 16 bit word be transmitted first.

Hence there are three likely possible byte orders that a vendor interpreting the ACR/NEMA standard in a byte oriented world may have used:

    - little endian 16 and 32 bit words, as in DICOM 3,
    - big endian 16 and 32 bit words, as in DICOM 3,
    - big endian 16 bit words, but the least significant half of a 32 bit word is sent first (as per ACR/NEMA 2.0).

The choice seems to be made usually on the basis of the native byte order of integers on the host processor. Most of the formats I have encountered are one of the first two, but I did encounter one from Philips that used the last scheme and it drove me crazy for a while, until I appreciated the subtlety of it!
I call it "Big Bad Endian" format in my implementation that recognizes it, but that may be a value judgement on my part :)

Notice particularly how this design allows one to parse the message even if the data dictionary is not complete. Consider an element that has an unrecognized element name. One cannot interpret the content of the element and so has to ignore it. One doesn't even know whether it contains binary or ascii information (this is what DICOM later refers to as "implicit representation"). Despite this, the length value allows one to skip to the next element and proceed.

Over the years there has been much discussion amongst those who favour such implicit dictionary driven schemes, and those who prefer explicit representations, including explicit description of the element type (binary or ascii, etc.) and even the element description itself! Some would prefer the message to contain something like "RecognitionCode='ACR-NEMA 2.0';" for example. The nuclear medicine groups have adopted a de facto standard called Interfile that makes use of ACR/NEMA data elements, but uses such a descriptive representation. Their argument is that the data stream is much more readable, which is true enough, and more readily extensible.

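To make the skip-by-length idea concrete, here is a minimal Python sketch that walks a stream of data elements without consulting any data dictionary. It is an illustration only: the function name iter_elements and the order labels "little", "big" and "big-bad" are invented for this example, the stream is assumed to start exactly at a data element (any fixed header already skipped), and the value bytes are returned untouched.

```python
import struct


def _u16(data, pos, big_endian):
    """Read one 16 bit unsigned word in the given byte order."""
    return struct.unpack_from(">H" if big_endian else "<H", data, pos)[0]


def _u32(data, pos, order):
    """Read the 32 bit length field in one of the three likely orders."""
    if order == "little":
        return struct.unpack_from("<I", data, pos)[0]
    if order == "big":
        return struct.unpack_from(">I", data, pos)[0]
    # "big-bad": big endian 16 bit words, least significant word first
    low, high = struct.unpack_from(">HH", data, pos)
    return (high << 16) | low


def iter_elements(data, order="little", offset=0):
    """Yield (group, element, value_bytes) for each data element.

    Nothing is interpreted: an unknown element is stepped over
    purely by means of its length field.
    """
    if order not in ("little", "big", "big-bad"):
        raise ValueError("unknown byte order: " + order)
    big16 = order != "little"
    pos = offset
    while pos + 8 <= len(data):
        group = _u16(data, pos, big16)
        element = _u16(data, pos + 2, big16)
        length = _u32(data, pos + 4, order)
        start = pos + 8
        end = start + length
        if end > len(data):
            raise ValueError("length runs past end of data; wrong byte order?")
        yield group, element, data[start:end]
        pos = end
```

A caller might keep only the elements it recognizes, for instance those in even numbered groups, and ignore the rest. An implausible length that runs past the end of the data is a useful hint that the byte order guess was wrong.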
The groups are organized as follows:

    0000              Command
    0008              Identifying
    0010              Patient
    0018              Acquisition
    0020              Relationship
    0028              Image Presentation
    4000              Text
    6000-601E (even)  Overlay
    7FE0              Pixel Data

Some of the more interesting elements are:

    (nnnn,0000) BD S Group Length          # of bytes in group nnnn
    (nnnn,4000) AT M Comments

    (0008,0010) AT S Recognition Code      # ACR-NEMA 1.0 or 2.0
    (0008,0020) AT S Study Date            # yyyy.mm.dd
    (0008,0021) AT S Series Date           # yyyy.mm.dd
    (0008,0022) AT S Acquisition Date      # yyyy.mm.dd
    (0008,0023) AT S Image Date            # yyyy.mm.dd
    (0008,0030) AT S Study Time            # hh.mm.ss.frac
    (0008,0031) AT S Series Time           # hh.mm.ss.frac
    (0008,0032) AT S Acquisition Time      # hh.mm.ss.frac
    (0008,0033) AT S Image Time            # hh.mm.ss.frac
    (0008,0060) AT S Modality              # CT,NM,MR,DS,DR,US,OT

    (0010,0010) AT S Patient Name
    (0010,0020) AT S Patient ID
    (0010,0030) AT S Patient Birthdate     # yyyy.mm.dd
    (0010,0040) AT S Patient Sex           # M, F, O for other
    (0010,1010) AT S Patient Age           # xxxD or W or M or Y

    (0018,0010) AT M Contrast/Bolus Agent  # or NONE
    (0018,0030) AT M Radionuclide
    (0018,0050) AN S Slice Thickness       # mm
    (0018,0060) AN M KVP
    (0018,0080) AN S Repetition Time       # ms
    (0018,0081) AN S Echo Time             # ms
    (0018,0082) AN S Inversion Time        # ms
    (0018,1120) AN S Gantry Tilt           # degrees

    (0020,1040) AT S Position Reference    # eg. iliac crest
    (0020,1041) AN S Slice Location        # in mm (signed)

    (0028,0010) BI S Rows
    (0028,0011) BI S Columns
    (0028,0030) AN M Pixel Size            # row\col in mm
    (0028,0100) BI S Bits Allocated        # eg. 16 bit
    (0028,0101) BI S Bits Stored           # eg. 12 bit for CT
    (0028,0102) BI S High Bit              # eg. 11
    (0028,0103) BI S Pixel Representation  # 1 signed, 0 unsigned

    (7FE0,0010) BI M Pixel Data            # as described by grp 0028

The way in which the pixel data is stored can vary tremendously, though thankfully most users and vendors use the simple unimaginative scheme that is shown above, ie. one 12 bit pixel stored in the low order part of a 16 bit word with no attempt at packing more compactly.

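For the simple case of one pixel per allocated word, the group 0028 elements above are enough to pull the stored value out of each word. The following Python sketch shows the shift and mask involved. The function name extract_pixel is invented for this example, and treating signed values (Pixel Representation = 1) as two's complement is an assumption of the sketch rather than something stated in this FAQ.

```python
def extract_pixel(word, bits_stored, high_bit, signed=False):
    """Extract one pixel value from an allocated word.

    word        -- the unsigned integer read from the file
                   (already corrected for byte order)
    bits_stored -- value of (0028,0101)
    high_bit    -- value of (0028,0102)
    signed      -- True if (0028,0103) Pixel Representation is 1
    """
    low_bit = high_bit - bits_stored + 1
    if low_bit < 0:
        raise ValueError("High Bit is too small for Bits Stored")
    # Shift the stored bits down to bit 0 and mask off everything else.
    value = (word >> low_bit) & ((1 << bits_stored) - 1)
    # Assumed two's complement: a set top bit means a negative value.
    if signed and value & (1 << (bits_stored - 1)):
        value -= 1 << bits_stored
    return value
```

With Bits Stored = 12 and High Bit = 11 the top four bits of each 16 bit word are discarded. With High Bit = 15 the bottom four bits are discarded instead.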
Following are some examples shown in Appendix E of the standard. Note that when one adds the little/big endian question the permutations mount!

    Bits Allocated = 16
    Bits Stored    = 12
    High Bit       = 11

                   |<------------------ pixel ----------------->|
     ______________ ______________ ______________ ______________
    |XXXXXXXXXXXXXX|              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0

    ---------------------------

    Bits Allocated = 16
    Bits Stored    = 12
    High Bit       = 15

    |<------------------ pixel ----------------->|
     ______________ ______________ ______________ ______________
    |              |              |              |XXXXXXXXXXXXXX|
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0

    ---------------------------

    Bits Allocated = 12
    Bits Stored    = 12
    High Bit       = 11

    ------ 2 ----->|<------------------ pixel 1 --------------->|
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0

    -------------- 3 ------------>|<------------ 2 --------------
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0

    |<------------------ pixel 4 --------------->|<----- 3 ------
     ______________ ______________ ______________ ______________
    |              |              |              |              |
    |______________|______________|______________|______________|
     15          12 11           8 7            4 3            0

    ---------------------------

And so on ... refer to the standard itself for more detail.

The next part is part2 - standard formats (continued).