Support for many SCSI devices in Linux

This page is about supporting more than 128 SCSI disks with Linux 2.4.xx.

Status of Linux 2.4.19/2.5.32

As of 2.4.19 (and 2.5.32), Linux supports at most 128 SCSI disks. The limit arises because 8 block device major numbers are allocated to sd and each SCSI disk takes 16 minor numbers (allowing for 15 partitions): 8 majors x 256 minors / 16 minors per disk = 128 disks. I wrote patches to get past that limitation.
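The arithmetic behind the limit can be written down as a small sketch. It assumes 8-bit minor numbers (256 minors per major), as in 2.4 kernels; the macro and function names are made up for illustration and do not come from the patches.

```c
/* Minors available under one major, assuming 8-bit minor numbers (2.4). */
#define SD_MINORS_PER_MAJOR 256
/* One whole-disk node plus 15 partitions. */
#define SD_MINORS_PER_DISK  16

/* Hypothetical helper: number of sd disks that fit into `majors` majors. */
unsigned int sd_max_disks(unsigned int majors)
{
    return majors * (SD_MINORS_PER_MAJOR / SD_MINORS_PER_DISK);
}
```

With 8 majors this gives 128 disks, with all 16 officially assigned sd majors 256 disks.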

Design considerations

The patches should fulfill several criteria: The sd driver needs to be changed to allocate major numbers dynamically, when needed, beyond the first 8 (8, 65--71) or 16 (adding 128--135). These numbers need to be either reserved (by officially registering them) or reported by the kernel to userspace, so a tool can create device nodes as needed. The static allocation of per-disk and per-major structures needs to be replaced by dynamic allocation.

Patches

Patches for many SCSI devices (against 2.4.19-SuSE1/2.4.21)
Each entry gives: patch files, status, needed patches, description.

1. scsi-rep-hldevs-2421.diff / scsi-rep-hldevs-2419S1.diff
Status: stable. Needs: -.
Extends /proc/scsi/scsi to list, for each device, the attached high-level drivers and their device nodes, so that this information is reported back to userspace:
Host: scsi3 Channel: 00 Id: 10 Lun: 00
  Vendor: IBM      Model: DDYS-T18350N     Rev: S84D
  Type:   Direct-Access                    ANSI SCSI revision: 03
  Attached drivers: sde(b:08:40) sg5(c:15:05)
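A userspace tool could pick the entries of the "Attached drivers:" line apart roughly as follows. This is only a sketch: the struct and function names are invented here, and the assumption that major and minor are printed in hexadecimal is inferred from the sample above (sde is minor 64 = 0x40 on major 8, and sg uses character major 21 = 0x15).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "name(type:major:minor)" entry. */
struct attached_dev {
    char name[16];       /* e.g. "sde" or "sg5" */
    char type;           /* 'b' = block device, 'c' = character device */
    unsigned int major;
    unsigned int minor;
};

/* Parse one token such as "sde(b:08:40)".
 * Returns 0 on success, -1 if the token does not match the format. */
int parse_attached_dev(const char *token, struct attached_dev *out)
{
    int consumed = 0;

    memset(out, 0, sizeof *out);
    /* %n is only stored if everything before it, including ')', matched. */
    if (sscanf(token, "%15[^(](%c:%x:%x)%n",
               out->name, &out->type, &out->major, &out->minor,
               &consumed) != 4)
        return -1;
    if (consumed == 0 || token[consumed] != '\0')
        return -1;
    if (out->type != 'b' && out->type != 'c')
        return -1;
    return 0;
}
```

Splitting the line into tokens at whitespace and feeding each token to this function is enough to create the matching device nodes.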

2. scsi-switch-repdev-2421.diff / scsi-switch-repdev-2419S1.diff
Status: stable. Needs: 1.
Makes the extra "Attached drivers:" line in /proc/scsi/scsi configurable: echo "scsi report-devs X" to /proc/scsi/scsi, where X is 0 to switch it off or 1 to switch it on. It defaults to off, so we can't break anything that parses the file.
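From a program, the same switch can be flipped by writing the command string to the proc file. The following is a minimal sketch; the function names are hypothetical, and the trailing newline simply mimics what echo sends.

```c
#include <stdio.h>

/* Build the command string. Returns its length, or -1 if buf is too small. */
int format_report_devs(char *buf, size_t len, int enable)
{
    int n = snprintf(buf, len, "scsi report-devs %d\n", enable ? 1 : 0);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write the command to `path` (normally "/proc/scsi/scsi").
 * Returns 0 on success, -1 on any error. */
int set_report_devs(const char *path, int enable)
{
    char cmd[32];
    FILE *f;
    int ok;

    if (format_report_devs(cmd, sizeof cmd, enable) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(cmd, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_report_devs("/proc/scsi/scsi", 1) needs root privileges and a kernel with patches 1 and 2 applied.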

3. scsi-boot-scsi-2421.diff / scsi-boot-scsi-2419S1.diff
Status: stable. Needs: 1.
Uses the infrastructure of patch 1 to allow the root file system to be specified by host controller number, controller bus (channel), target (SCSI ID), unit (SCSI LUN) and partition number, as root=/dev/scsi/sdcHbCtIuLpP, where H, C, I, L and P are to be replaced by numbers. Together with the scsihosts boot parameter, the root fs can be addressed in a relatively persistent way.
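To make the syntax concrete, here is a sketch of how such a root specification can be split into its five numbers. It is not taken from the patch; the struct and function names are made up, and the numbers are assumed to be decimal.

```c
#include <stdio.h>

/* Hypothetical holder for the address encoded in /dev/scsi/sdcHbCtIuLpP. */
struct scsi_root_addr {
    unsigned int host;     /* H: host controller number */
    unsigned int channel;  /* C: controller bus */
    unsigned int id;       /* I: target (SCSI ID) */
    unsigned int lun;      /* L: unit (SCSI LUN) */
    unsigned int part;     /* P: partition number */
};

/* Parse the value given after "root=".
 * Returns 0 on success, -1 if the string does not match the pattern. */
int parse_scsi_root(const char *spec, struct scsi_root_addr *a)
{
    int consumed = 0;

    /* In the format, each %u is followed by one literal letter. */
    if (sscanf(spec, "/dev/scsi/sdc%ub%ut%uu%up%u%n",
               &a->host, &a->channel, &a->id, &a->lun, &a->part,
               &consumed) != 5)
        return -1;
    return spec[consumed] == '\0' ? 0 : -1;
}
```

For the disk shown in the /proc/scsi/scsi sample above (scsi3, channel 0, ID 10, LUN 0), the first partition would be written as root=/dev/scsi/sdc3b0t10u0p1.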

4. nopart-stat-2419S1.diff (merged into 2.4.20)
Status: stable. Needs: -.
As of 2.4.19rc1, 68 bytes (on 32-bit archs) are allocated per partition for the hd_struct entries in gendisk.part, in other words 17408 bytes of kmalloc()ed kernel memory per major. This is a lot and becomes a problem if many disks are attached. One int is unused and thrown out by this patch, which brings us back to 16k per major. 13 more ints are only used for statistics, which are not interesting for most end users (and overflow after some time), so this patch makes them depend on a config option CONFIG_BLK_STAT, reducing the size to 3k per major without it. Note that this statistics stuff was only introduced in the 2.4.19-pre series.
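The memory figures follow from simple multiplication, sketched below. The sketch assumes 256 hd_struct entries per major (one per minor) and 4-byte ints on 32-bit archs; the names are illustrative only.

```c
/* One hd_struct per minor number, assuming 8-bit minors. */
#define HD_STRUCTS_PER_MAJOR 256

/* Hypothetical helper: bytes needed for gendisk.part of one major. */
unsigned int part_bytes_per_major(unsigned int bytes_per_hd_struct)
{
    return bytes_per_hd_struct * HD_STRUCTS_PER_MAJOR;
}

/* Size of hd_struct after dropping `ints_removed` 4-byte ints. */
unsigned int hd_struct_size_after(unsigned int size, unsigned int ints_removed)
{
    return size - 4 * ints_removed;
}
```

Starting from 68 bytes, dropping the one unused int leaves 64 bytes (16384 bytes per major), and dropping the 13 statistics ints as well leaves 12 bytes (3072 bytes per major).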

5. sd_many-8-2421.diff / sd_many-7-2419S1.diff
Status: stable. Needs: 1, (4).
Makes major allocation in the sd driver dynamic. Initially, only major 8 is registered. As more disks become attached, more majors are allocated. The next majors are 65--71, then 128--135. After that, we either ask devfs for free block majors or probe for free block majors ourselves, starting with 144. The ugly CONFIG_SD_EXTRA_DEVS is gone. The code should be clean enough to allow changing the number of partitions per disk. Naming after disk 702 (sdzz) continues with sdaaa. The number of majors supported after applying the patch is configurable. It defaults to 16 (thus being compliant with the number of officially assigned majors), but we can get up to 244 majors (3904 disks). Of course, this only works if no other devices want those majors.
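The naming scheme and the order of the officially assigned majors can be illustrated with a short sketch. This is not code from the patch: the function names and the table are made up here, and the mapping of 16 disks per major in attachment order is an assumption that merely agrees with the sde(b:08:40) sample above.

```c
#include <stddef.h>
#include <string.h>

/* Officially assigned sd majors in the order described above. */
static const unsigned char sd_official_majors[16] = {
    8, 65, 66, 67, 68, 69, 70, 71,
    128, 129, 130, 131, 132, 133, 134, 135
};

/* Name of the disk with zero-based index: 0 -> sda, 25 -> sdz,
 * 26 -> sdaa, 701 -> sdzz, 702 -> sdaaa.
 * Returns 0 on success, -1 if buf is too small. */
int sd_disk_name(unsigned int index, char *buf, size_t len)
{
    char letters[8];   /* least significant letter first */
    size_t n = 0, i;

    for (;;) {
        letters[n++] = (char)('a' + index % 26);
        if (index < 26)
            break;
        index = index / 26 - 1;   /* there is no "zero" letter */
    }
    if (len < n + 3)              /* "sd" + letters + NUL */
        return -1;
    memcpy(buf, "sd", 2);
    for (i = 0; i < n; i++)
        buf[2 + i] = letters[n - 1 - i];
    buf[2 + n] = '\0';
    return 0;
}

/* Major for a disk index while the official majors last; -1 once a
 * dynamically obtained major would be needed. */
int sd_official_major(unsigned int index)
{
    unsigned int slot = index / 16;   /* 16 disks per major */

    return slot < 16 ? sd_official_majors[slot] : -1;
}

/* First minor of a disk: 16 minors per disk within its major. */
unsigned int sd_first_minor(unsigned int index)
{
    return (index % 16) * 16;
}
```

The letters form a bijective base-26 number, which is why sdzz (the 702nd disk, 26 + 26 x 26) is followed by sdaaa.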

6. fix-regblkchrdev-sdm7-2421.diff / fix-regblkchrdev-sdm7-2419S1.diff
Status: stable. Needs: (5).
Fixes register_blkdev() and register_chrdev() so that they do not hand out majors that are reserved for a driver in devices.txt. This is then used to get rid of the get_blkfops() hack in patch 5. Patch contributed by Carsten Otte (IBM Böblingen); the script that fetches recent device numbers from LANANA and converts them into a reservation list in C is from Susanne Oberhauser (SuSE Nürnberg).
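The idea of the fix can be modelled in userspace as a search for a free major that skips reserved ones. This is a simplified sketch and not the kernel code: the two flag arrays stand in for the kernel's device table and the generated reservation list, the names are hypothetical, and the downward scan direction is an assumption about how dynamic majors are handed out.

```c
/* Majors 0..254, as with MAX_BLKDEV = 255 in 2.4. */
#define MODEL_MAX_MAJOR 255

/* Return the highest major that is neither in use nor reserved in
 * devices.txt, or -1 if none is left. Major 0 is never returned. */
int find_free_major(const unsigned char in_use[MODEL_MAX_MAJOR],
                    const unsigned char reserved[MODEL_MAX_MAJOR])
{
    int major;

    for (major = MODEL_MAX_MAJOR - 1; major > 0; major--) {
        if (!in_use[major] && !reserved[major])
            return major;
    }
    return -1;
}
```

Without the reserved[] check, a dynamic request could grab a major that a driver loaded later expects to own.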

7. sg_many-6-2421.diff / sg_many-5-2419S1.diff
Status: stable. Needs: 1, (6).
This is the complement to patch 5 (sd_many) for generic SCSI devices: it supports more than 256 sg devices by dynamically registering majors for further sg devices.
The patch was written by Holger Smolinski. Note: The diff is unfortunately very large, because I renamed sg.c to sg_base.c so that the module keeps the same name as in stock kernels: sg.
Note: This is not the case for the 2.4.21 patch, where a Makefile trick removes the need to rename sg.c. For 2.4.21, the sg_many patch should be applied before the sd_many one.

Possible issues with patch 5 (sd_many-7)

DISCLAIMER: When preparing the sd_many-3/4/5/6/7 patches, I was careful and did some testing. Still, I'd like testers to take care: I cannot exclude the possibility that a patch crashes your computer or, in the worst case, causes data loss. So please be prepared! Back up valuable data before you test it.

TO DO

A forward port to 2.5.3x for inclusion into the mainstream kernel. Volunteers wanted!

Change log

Changes 2003-07-25:
- Fix sd_many patch: During the port to 2.4.21, a change in device size reporting had been overlooked, so SCSI disk sizes were misreported (cosmetic).
Changes 2003-06-29:
- Rediff against 2.4.21
Changes 2002-08-13:
- Update sg_many-4 to sg_many-5 (sg_many-4-5.diff), which fixes a bug in the devfs_dealloc_major() call. (Only relevant if you use devfs.)
Changes 2002-08-13:
- sd_many-7 fixes a bug with deallocation of majors and fixes the use of scsi_malloc(), which only returns memory in chunks of 512 bytes.
- Diff between sd_many-6 and 7 is here: sd_many-6-7.diff.
- Rediffed against 2.4.19-SuSE1.
- Added patch 6: Fix register_blk/chrdev() and use it.
- Added patch 7: Support for many sgs, contributed by Holger Smolinski.
- Added More SCSI patches section.
Changes 2002-08-02:
- Rediffed against 2.4.19-rc5aa1. (Untested)
Changes 2002-07-30:
- Rediffed against clean 2.4.19-rc3. I had overlooked some aa patches when generating the patches before. Those diffs have been renamed into 2.4.19rc1aa now.
Changes from sd_many-5 to 6:
- Make buffer length passed to scsi_wait_req() in sd_init_onedisk() consistent with allocation length in CDB and set it to 252. (Inspired by Matthew Dharm.)
- Diff from 5 to 6 is here: sd_many-5-6.diff.
Changes from sd_many-4 to 5:
- Added the SCSI majors officially registered with LANANA to the ones being used first. Thanks to Pete Zaitcev for the hint.
- Made maximum number of SCSI majors configurable, defaulting to 16, so we're not taking any non-assigned majors by default.
- Try block majors 144--254, 72--127, 136--143, 12--63 afterwards, meaning we could theoretically get up to 244 majors (3904 disks), if all these majors are free.
- Added all "sd" devices to the list of preferred devices to sync on in sysrq. (Also inspired by Pete.)
- Diff between sd_many-4 and 5 is here: sd_many-4-5.diff.
Changes from sd_many-3 to 4:
- With v3, on devfs systems, upon module unload we do not call devfs_dealloc_major() for the dynamic majors that we got by devfs_alloc_major(). Fixed.
- With v3, in lots of functions, we first convert the major number to an index into our own tables. This was done by searching an array and should be sped up by using a reverse array. Fixed.
- /proc/partitions output has been corrected. I introduced a new function (*devname)(kdev_t, char*) into the gendisk structure to do the kdev_t to name conversion in the driver, not in disk_name() in fs/partitions/check.c. For IDE, I got an oops, as it did not properly memset() the kmalloc()ed gendisk structure. Fixed as well. I checked the other gendisk-providing drivers, but they all looked safe, fortunately. Hint: disk_name() could be considerably cleaned up if more drivers provided this callback.
- Here's the diff from sd_many-3 to 4: sd_many-3-4.diff
Changes from sd_many-2 to 3:
- In scsi_lib, at one place min_/max_major was used to determine which high-level device driver supports a device. Removed that crap and replaced it by a (*drives_dev)(kdev_t) entry in the high-level device_template structure. Implemented for sd.
- devfs_alloc_major() does not work on non-devfs systems. Implemented a replacement using get_blkfops().
- Here's the patch from sd_many-2 to sd_many-3: sd_many-2-3.diff

More SCSI patches

Here are three more SCSI patches for inclusion into 2.4.19 (diffed against 2.4.19-SuSE1, but they should apply cleanly to other trees):

scsidev

I added support in scsidev for the extended /proc/scsi/scsi as implemented in patches 1 and 2, so scsidev can provide reliable operation.

Related stuff

When hacking sd, I came across three little bugs, which are of course fixed by the above patch 5. But if you don't apply it, you may still be interested in those three. And most probably patch 4. All three patches have been merged by Alan Cox in his ac kernels as of 2.4.19rc2-ac2.
Written by Kurt Garloff <garloff@suse.de>, 2002-07-17. Feedback is welcome! Even more welcome is help, e.g. with testing, implementing the support for many devices in sg or having good ideas about the reuse of sd slots issue.
