James Bottomley [Sat, 26 Jan 2008 02:05:55 +0000 (20:05 -0600)]
[SCSI] bsg: copy the cmd_type field to the subordinate request for bidi
This fixes a problem in SCSI where we use the (previously
uninitialised) cmd_type via blk_pc_request() to set up the transfer in
scsi_init_sgtable().
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Fri, 25 Jan 2008 14:25:14 +0000 (23:25 +0900)]
[SCSI] handle scsi_init_queue failure properly
scsi_init_queue is expected to clean up allocated things when it
fails.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Fri, 25 Jan 2008 14:25:13 +0000 (23:25 +0900)]
[SCSI] destroy scsi_bidi_sdb_cache in scsi_exit_queue
Needs to call kmem_cache_destroy for scsi_bidi_sdb_cache in
scsi_exit_queue.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:32:01 +0000 (01:32 +0900)]
[SCSI] scsi_debug: add XDWRITEREAD_10 support
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:32:00 +0000 (01:32 +0900)]
[SCSI] scsi_debug: add bidi data transfer support
This enables fill_from_dev_buffer and fetch_to_dev_buffer to handle
bidi commands.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Tue, 22 Jan 2008 16:31:59 +0000 (01:31 +0900)]
[SCSI] scsi_debug: add get_data_transfer_info helper function
This adds get_data_transfer_info helper function that get lha and
sectors for READ_* and WRITE_* commands (and XDWRITEREAD_10 later).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Douglas Gilbert <dougg@torque.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Bottomley [Tue, 15 Jan 2008 17:11:46 +0000 (11:11 -0600)]
[SCSI] remove use_sg_chaining
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Kiyoshi Ueda [Fri, 18 Jan 2008 17:02:15 +0000 (12:02 -0500)]
[SCSI] bidirectional: fix up for the new blk_end_request code
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Thu, 13 Dec 2007 11:50:53 +0000 (13:50 +0200)]
[SCSI] bidirectional command support
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Thu, 13 Dec 2007 11:47:40 +0000 (13:47 +0200)]
[SCSI] implement scsi_data_buffer
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Boaz Harrosh [Fri, 14 Dec 2007 00:14:27 +0000 (16:14 -0800)]
[SCSI] tgt: use scsi_init_io instead of scsi_alloc_sgtable
If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.
[jejb: rebase on to sg_table and fix up rejections]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 26 Jan 2008 15:08:19 +0000 (00:08 +0900)]
[SCSI] aic7xxx: fix warnings with CONFIG_PM disabled
CC [M] drivers/scsi/aic7xxx/aic7xxx_osm_pci.o
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:148: warning: 'ahc_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic7xxx_osm_pci.c:166: warning: 'ahc_linux_pci_dev_resume' defined but not used
This moves aic7xxx_pci_driver struct, removes some forward declarations,
and adds some ifdef CONFIG_PM.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sat, 26 Jan 2008 15:08:18 +0000 (00:08 +0900)]
[SCSI] aic79xx: fix warnings with CONFIG_PM disabled
CC [M] drivers/scsi/aic7xxx/aic79xx_osm_pci.o
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:101: warning: 'ahd_linux_pci_dev_suspend' defined but not used
drivers/scsi/aic7xxx/aic79xx_osm_pci.c:121: warning: 'ahd_linux_pci_dev_resume' defined but not used
This moves aic79xx_pci_driver struct, removes some forward
declarations, and adds some ifdef CONFIG_PM.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
David Milburn [Fri, 25 Jan 2008 18:16:18 +0000 (12:16 -0600)]
[SCSI] aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions
The driver only needs to check the SCB_ACTIVE flag if the SCB is not
in the untagged queue.
If the driver is in error recovery, you may end panic'ing on a TUR
that is in the untagged queue.
Attempting to queue an ABORT message
CDB: 0x0 0x0 0x0 0x0 0x0 0x0
SCB 3 done'd twice
This patch is included in Adaptec's 6.3.11 driver on their website.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Thomas Bogendoerfer [Sat, 26 Jan 2008 23:25:53 +0000 (00:25 +0100)]
[SCSI] sgiwd93: use cached memory access to make driver work on IP28
SGI IP28 machines would need special treatment (enable adding addtional
wait states) when accessing memory uncached. To avoid this pain I
changed the driver to use only cached access to memory.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:50 +0000 (12:41 +0900)]
[SCSI] zfcp: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:51 +0000 (12:41 +0900)]
[SCSI] ncr53c8xx: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 03:41:09 +0000 (12:41 +0900)]
[SCSI] aic79xx: fix sense_buffer access bug
The commit
de25deb18016f66dcdede165d07654559bb332bc changed
scsi_cmnd.sense_buffer from a static array to a dynamically allocated
buffer. We can't access to sense_buffer in '&cmd->sense_buffer' way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
FUJITA Tomonori [Sun, 27 Jan 2008 01:22:26 +0000 (10:22 +0900)]
[SCSI] hptiop: fix sense_buffer access bug
&cmnd->sense_buffer now zeroes the wrong thing.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Nathan Lynch [Sat, 26 Jan 2008 22:07:30 +0000 (16:07 -0600)]
[SCSI] sym53c8xx: fix bad memset argument in sym_set_cam_result_error
On a big powerpc box I got the following oops with 2.6.24-git2:
sym0: <1010-66> rev 0x1 at pci 0000:d0:01.0 irq 215
sym0: No NVRAM, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi0 : sym-2.2.3
target0:0:8: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 31)
scsi 0:0:8:0: Direct-Access IBM ST318305LC C509 PQ: 0
ANSI: 3
target0:0:8: tagged command queuing enabled, command queue depth 16.
target0:0:8: Beginning Domain Validation
target0:0:8: asynchronous
target0:0:8: wide asynchronous
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
target0:0:8: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000038460
cpu 0x25: Vector: 300 (Data Access) at [
c00000000f567840]
pc:
c000000000038460: .memcpy+0x60/0x280
lr:
d000000000050280: .sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
sp:
c00000000f567ac0
msr:
8000000000009032
dar: 0
dsisr:
42000000
current = 0xc000006d1e0af0a0
paca = 0xc0000000004afc00
pid = 0, comm = swapper
enter ? for help
[link register ]
d000000000050280
.sym_set_cam_result_error+0xfc/0x1e0 [sym53c8xx]
[
c00000000f567ac0]
c00000000f567b80 (unreliable)
[
c00000000f567b80]
d0000000000552b8 .sym_complete_error+0x12c/0x1bc [sym53c8xx]
[
c00000000f567c20]
d0000000000561a4 .sym_int_sir+0xaa4/0x1718 [sym53c8xx]
[
c00000000f567d00]
d000000000057e8c .sym_interrupt+0x4e4/0x6ec [sym53c8xx]
[
c00000000f567dc0]
d00000000004fdf4 .sym53c8xx_intr+0x6c/0xdc [sym53c8xx]
[
c00000000f567e50]
c0000000000a83e0 .handle_IRQ_event+0x7c/0xec
[
c00000000f567ef0]
c0000000000aa344 .handle_fasteoi_irq+0x130/0x1f0
[
c00000000f567f90]
c00000000002a538 .call_handle_irq+0x1c/0x2c
[
c000004d5e0b3a90]
c00000000000c320 .do_IRQ+0x108/0x1d0
[
c000004d5e0b3b20]
c000000000004790 hardware_interrupt_entry+0x18/0x1c
The memset() in sym_set_cam_result_error() would appear to be trashing
the scsi_cmnd struct instead of clearing sense_buffer.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Linus Torvalds [Wed, 30 Jan 2008 13:40:09 +0000 (00:40 +1100)]
Merge git://git./linux/kernel/git/x86/linux-2.6-x86
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (890 commits)
x86: fix nodemap_size according to nodeid bits
x86: fix overlap between pagetable with bss section
x86: add PCI IDs to k8topology_64.c
x86: fix early_ioremap pagetable ops
x86: use the same pgd_list for PAE and 64-bit
x86: defer cr3 reload when doing pud_clear()
x86: early boot debugging via FireWire (ohci1394_dma=early)
x86: don't special-case pmd allocations as much
x86: shrink some ifdefs in fault.c
x86: ignore spurious faults
x86: remove nx_enabled from fault.c
x86: unify fault_32|64.c
x86: unify fault_32|64.c with ifdefs
x86: unify fault_32|64.c by ifdef'd function bodies
x86: arch/x86/mm/init_32.c printk fixes
x86: arch/x86/mm/init_32.c cleanup
x86: arch/x86/mm/init_64.c printk fixes
x86: unify ioremap
x86: fixes some bugs about EFI memory map handling
x86: use reboot_type on EFI 32
...
Linus Torvalds [Wed, 30 Jan 2008 13:30:15 +0000 (00:30 +1100)]
[net] Gracefully handle shared e1000/1000e driver PCI ID's
Both the old e1000 driver and the new e1000e driver can drive some
PCI-Express e1000 cards, and we should avoid ambiguity about which
driver will pick up the support for those cards when both drivers are
enabled.
This solves the problem by having the old driver support those cards if
the new driver isn't configured, but otherwise ceding support for PCI
Express versions of the e1000 chipset to the newer driver. Thus
allowing both legacy configurations where only the old driver is active
(and handles all chips it knows about) and the new configuration with
the new driver handling the more modern PCIE variants.
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 30 Jan 2008 13:26:10 +0000 (00:26 +1100)]
Make !NETFILTER_ADVANCED enable IP6_NF_MATCH_IPV6HEADER
We want IPV6HEADER matching for the non-advanced default netfilter
configuration, since it's part of the standard netfilter setup of at
least some distributions (eg Fedora).
Otherwise NETFILTER_ADVANCED loses much of its point, since even
non-advanced users would have to enable all the advanced options just to
get a working IPv6 netfilter setup.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 30 Jan 2008 12:34:12 +0000 (13:34 +0100)]
x86: fix nodemap_size according to nodeid bits
memnode.map is s16 array because of nodeid is 16 bit now.
so need to increase the nodemap_size according to that bits.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Yinghai Lu [Wed, 30 Jan 2008 12:34:12 +0000 (13:34 +0100)]
x86: fix overlap between pagetable with bss section
one early crash on one 8 node 256g machine:
Command line: console=uart8250,io,0x3f8,115200n8 initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug apic=debug acpi.debug_level=0x0000000f pci=routeirq ip=dhcp load_ramdisk=1 ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
BIOS-e820:
0000000000000000 -
000000000009bc00 (usable)
BIOS-e820:
000000000009bc00 -
00000000000a0000 (reserved)
BIOS-e820:
00000000000e6000 -
0000000000100000 (reserved)
BIOS-e820:
0000000000100000 -
00000000dffe0000 (usable)
BIOS-e820:
00000000dffe0000 -
00000000dffee000 (ACPI data)
BIOS-e820:
00000000dffee000 -
00000000dffff050 (ACPI NVS)
BIOS-e820:
00000000dffff050 -
00000000e0000000 (reserved)
BIOS-e820:
00000000fec00000 -
00000000fec01000 (reserved)
BIOS-e820:
00000000fee00000 -
00000000fee01000 (reserved)
BIOS-e820:
00000000ff700000 -
0000000100000000 (reserved)
BIOS-e820:
0000000100000000 -
0000004020000000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map =
67239936
Kernel panic - not syncing: Duplicated early reservation d40000-e42000
Pid: 0, comm: swapper Not tainted
2.6.24-smp-g5a514e21-dirty #3
Call Trace:
[<
ffffffff80221545>] lapic_get_maxlvt+0x0/0x10
[<
ffffffff80221657>] clear_local_APIC+0x5/0xcf
[<
ffffffff80221726>] disable_local_APIC+0x5/0x17
[<
ffffffff8021fe16>] smp_send_stop+0x46/0x4c
[<
ffffffff80235293>] panic+0x94/0x13e
[<
ffffffff80bc3b03>] sctp_eps_proc_init+0x12/0x34
[<
ffffffff80b9f1c5>] reserve_early+0x30/0x6c
[<
ffffffff80803925>] init_memory_mapping+0x2cd/0x2dc
[<
ffffffff80b9dc01>] setup_arch+0x21f/0x44e
[<
ffffffff80b978be>] start_kernel+0x6f/0x2c7
[<
ffffffff80b971cc>] _sinittext+0x1cc/0x1d3
it turns out there is overlap between pgtable and bss...
in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end
need to round up table_start to PAGE_SIZE.
also make the panic more informative.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Joachim Deguara [Wed, 30 Jan 2008 12:34:12 +0000 (13:34 +0100)]
x86: add PCI IDs to k8topology_64.c
This just adds the PCI IDs of AMD's family 10h and 11h CPU's northbridges to
k8topology discovery.
Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: fix early_ioremap pagetable ops
Put appropriate pagetable update hooks in so that paravirt knows
what's going on in there.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: use the same pgd_list for PAE and 64-bit
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: defer cr3 reload when doing pud_clear()
PAE mode requires that we reload cr3 in order to guarantee that
changes to the pgd will be noticed by the processor. This means that
in principle pud_clear needs to reload cr3 every time. However,
because reloading cr3 implies a tlb flush, we want to avoid it where
possible.
pud_clear() is only used in a couple of places:
- in free_pmd_range(), when pulling down a range of process address space, and
- huge_pmd_unshare()
In both cases, the calling code will do a a tlb flush anyway, so
there's no need to do it within pud_clear().
In free_pmd_range(), the pud_clear is immediately followed by
pmd_free_tlb(); we can hook that to make the mmu_gather do an
unconditional full flush to make sure cr3 gets reloaded.
In huge_pmd_unshare, it is followed by flush_tlb_range, which always
results in a full cr3-reload tlb flush.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bernhard Kaindl [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: early boot debugging via FireWire (ohci1394_dma=early)
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: don't special-case pmd allocations as much
In x86 PAE mode, stop treating pmds as a special case. Previously
they were always allocated and freed with the pgd. The modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.
This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.
There is a complicating wart, however. When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3. Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid whereever possible.
This patch simply avoids reloading cr3 unless the update is to the
current pagetable. Later patches will optimise this further.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: shrink some ifdefs in fault.c
The change from current to tsk in do_page_fault is safe as
this is set at the very beginning of the function.
Removes a likely() annotation from the 64-bit version, this
could have instead been added to 32-bit.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeremy Fitzhardinge [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: ignore spurious faults
When changing a kernel page from RO->RW, it's OK to leave stale TLB
entries around, since doing a global flush is expensive and they pose
no security problem. They can, however, generate a spurious fault,
which we should catch and simply return from (which will have the
side-effect of reloading the TLB to the current PTE).
This can occur when running under Xen, because it frequently changes
kernel pages from RW->RO->RW to implement Xen's pagetable semantics.
It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids
doing a global TLB flush after changing page permissions.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: remove nx_enabled from fault.c
On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always
return early. The test is sufficient on PAE as __supported_pte_mask
is updated in the same places as nx_enabled in init_32.c which also
takes disable_nx into account.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:11 +0000 (13:34 +0100)]
x86: unify fault_32|64.c
Unify includes in moved fault.c.
Modify Makefiles to pick up unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: unify fault_32|64.c with ifdefs
Elimination of these ifdefs can be done in a unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: unify fault_32|64.c by ifdef'd function bodies
It's about time to get on with unifying these files, elimination
of the ugly ifdefs can occur in the unified file.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: arch/x86/mm/init_32.c printk fixes
printk fixes. NOP in terms of functionality, but strings got
a bit larger due to the KERN_ markers that were added.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: arch/x86/mm/init_32.c cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: arch/x86/mm/init_64.c printk fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: unify ioremap
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Huang, Ying [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: fixes some bugs about EFI memory map handling
This patch fixes some bugs of EFI memory handing code.
- On x86_64, it is possible that EFI memory map can not be mapped via
identity map, so efi_map_memmap is removed, just use early_ioremap.
- On i386, the EFI memory map mapping take effect cross paging_init,
so it is not necessary to use efi_map_memmap.
- EFI memory map is unmapped in efi_enter_virtual_mode to avoid
early_ioremap leak.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Huang, Ying [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: use reboot_type on EFI 32
This patch makes reboot_type of BOOT_EFI is used on i386 too. Because
correpsonding reboot code of i386 and x86_64 is merged.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: unify page fault oops printing
This changes the oops dumping format for page faults to
be similar between X86_32 and 64.
This is the first user of printk_address on X86_32.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:10 +0000 (13:34 +0100)]
x86: introduce show_fault_oops helper to fault_32|64.c
This will help when unifying the oops dumping code on 32/64
bit. No functional changes.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: add is_errata100 helper to fault_32|64.c
Further towards unifying these files, add another helper
in same spirit as is_errata93.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: add is_f00f_bug helper to fault_32|64.c
Further towards unifying these files, add another helper
in same spirit as is_errata93.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: make ioremap() UC by default
Yes! A mere 120 c_p_a() fixing and rewriting patches later,
we are now confident that we can enable UC by default for
ioremap(), on x86 too.
Every other architectures was doing this already. Doing so
makes Linux more robust against MTRR mixups (which might go
unnoticed if BIOS writers test other OSs only - where PAT
might override bad MTRRs defaults).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: cpa cleanup the 64-bit alias math
Cleanup the address calculations, which are necessary to identify the
high/low alias mappings of the kernel on 64 bit machines. Instead of
calling __pa/__va back and forth, calculate the physical address once
and base the other calculations on it. Add understandable constants so
we can use the already available within() helper. Also add comments,
which help mere mortals to understand what this code does.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: cpa: fix the self-test
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: rodata config hookup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: enable CONFIG_DEBUG_PAGEALLOC more widely
make CONFIG_DEBUG_PAGEALLOC universally available.
CONFIG_HIBERNATION and CONFIG_HUGETLBFS was disabling it, for no
particular reason.
If there are any unfixed bugs here we'll fix it, but do not disable
vital debugging facilities like that ..
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: init memory debugging
debug incorrect/late access to init memory, by permanently unmapping
the init memory ranges. Depends on CONFIG_DEBUG_PAGEALLOC=y.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: move misplaced rodata check call
It looks like a mismerge put the rodata self-check in the wrong spot; move
it to the right place after marking the .rodata section read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:09 +0000 (13:34 +0100)]
x86: fix clflush_page_range logic
only present ptes must be flushed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: optimize clflush
clflush is sufficient to be issued on one CPU. The invalidation is
broadcast throughout the coherence domain.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: clflush_page_range needs mfence
clflush is an unordered operation with respect to other memory
traffic, including other CLFLUSH instructions. This needs proper
fencing with mfence.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa: rename global_flush_tlb() to cpa_flush_all()
The function name global_flush_tlb() suggests something different from
what the function really does. Rename it to cpa_flush_all(), which is an
understandable counterpart to cpa_flush_range().
no global visibility of the old API anymore.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa: implement clflush optimization
Use clflush on CPUs which support this.
clflush is only used when the page attribute operation has been
successful. On CPUs which do not support clflush and in the case of
error the old fashioned global_flush_tlb() is called.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa use the new set_clr function
Convert cpa_set and cpa_clear to call the new set_clr function.
Seperate out the debug helpers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa create set_and_clr function
Create a set_and_clr function to avoid the duplicate loops. Allows
also to do combined operations for optimization.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa move the flush into set and clear functions
To avoid the modification of the flush code for the clflush
implementation, move the flush into the set and clear functions and
provide helper functions for the debugging code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: add testcases for RODATA and NX protections/attributes
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which lead to 1 page less being marked read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: ioremap KERN_INFO
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:08 +0000 (13:34 +0100)]
x86: cpa: clean up change_page_attr_set/clear()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: cpa: fix loop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: cpa: fix split thinko
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: make sure initmem is writable
When we free initmem, various rodata and CPA checks may have left
memory read only.. this patch ensures that the memory is writable
before we free it.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: fix pageattr-selftest
In Ingo's testing, he found a bug in the CPA selftest code. What would
happen is that the test would call change_page_attr_addr on a range of
memory, part of which was read only, part of which was writable. The
only thing the test wanted to change was the global bit...
What actually happened was that the selftest would take the permissions
of the first page, and then the change_page_attr_addr call would then
set the permissions of the entire range to this first page. In the
rodata section case, this resulted in pages after the .rodata becoming
read only... which made the kernel rather unhappy in many interesting
ways.
This is just another example of how dangerous the cpa API is (was); this
patch changes the test to use the incremental clear/set APIs
instead, and it changes the clear/set implementation to work on a 1 page
at a time basis.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: remove flush_agp_mappings()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: cpa: move flush to cpa
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: make various pageattr.c functions static
change_page_attr_add is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:07 +0000 (13:34 +0100)]
x86: cpa: set_memory_notpresent()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: cpa: convert ioremap to new API
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: fix ioremap API
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: fix ioremap RAM check
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: fix the missing BIOS area check in page_is_ram
page_is_ram has a FIXME since ages, which reminds to sanity check the
BIOS area between 640k and 1M, which is sometimes falsely reported as
RAM in the e820 tables.
Implement the sanity check. Move the BIOS range defines from
pageattr.c into e820.h to avoid duplicate defines.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: move page_is_ram() function
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: deprecate change_page_attr() for drivers
With the introduction of the new API, no driver or non-archcore code needs
to use c-p-a anymore, so this patch also deprecates the EXPORT_SYMBOL of CPA
(it's a horrible API after all).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: convert CPA users to the new set_page_ API
This patch converts various users of change_page_attr() to the new,
more intent driven set_page_*/set_memory_* API set.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: a new API for drivers/etc to control cache and other page attributes
Right now, if drivers or other code want to change, say, a cache attribute of a
page, the only API they have is change_page_attr(). c-p-a is a really bad API
for this, because it forces the caller to know *ALL* the attributes he wants
for the page, not just the 1 thing he wants to change. So code that wants to
set a page uncachable, needs to be aware of the NX status as well etc etc etc.
This patch introduces a set of new APIs for this, set_pages_<attr> and
set_memory_<attr>, that offer a logical change to the user, and leave all
attributes not implied by the requested logical change alone.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 30 Jan 2008 12:34:06 +0000 (13:34 +0100)]
x86: cpa: move clflush_cache_range()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: unify ioremap_32 and _64
Unify the now identical ioremap_32.c and ioremap_64.c into the
same ioremap.c file. No code changed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: unify ioremap
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: use remove_vm_are in ioremap_32 error path
When ioremap_page_range fails, then we can use remove_vm_area instead
of vunmap safely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: __iomem annotations
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: switch to change_page_attr_addr in ioremap_32.c
Use change_page_attr_addr() instead of change_page_attr(), which
simplifies the code significantly and matches the 64bit
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: make c_p_a unconditional in ioremap
Make c_p_a unconditional for ioremap and iounmap. This ensures
complete consistency of the flags which are handed to
ioremap_page_range and the real flags in the mappings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: introduce max_pfn_mapped
64bit uses end_pfn_map and 32bit uses max_low_pfn. There are several
files which have #ifdef'ed defines which map either to end_pfn_map or
max_low_pfn. Replace this by a universal define and clean up all the
other instances.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: cleanup ioremap includes
Get rid of the douplicate define of ISA_START/END_ADDRESS and use the
same headers in 32 and 64 bit code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: style cleanup of ioremap code
Fix the coding style before going further.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:05 +0000 (13:34 +0100)]
x86: fix ioremap pgprot inconsistency
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: fix ioremap pgprot inconsistency
The pgprot flags which are handed into ioremap_page_range() are
different to those which are set in change_page_attr(). The
ioremap_page_range flags are executable, while the c_p_a flags are
not. Also make the mappings global (which is a NOP currently on 32bit,
although CPUs from PPRO+ onwards support it, but that's a separate
fix.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Arjan van de Ven [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: turn the check_exec function into function that
What the check_exec() function really is trying to do is enforce certain
bits in the pgprot that are required by the x86 architecture, but that
callers might not be aware of (such as NX bit exclusion of the BIOS
area for BIOS based PCI access; it's not uncommon to ioremap the BIOS
region for various purposes and normally ioremap() memory has the NX bit
set).
This patch turns the check_exec() function into static_protections()
which also is now used to make sure the kernel text area remains non-NX
and that the .rodata section remains read-only. If the architecture
ends up requiring more such mandatory prot settings for specific areas,
this is now a reasonable place to add these.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: cpa: make self-test depend on DEBUG_KERNEL
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Huang, Ying [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: ioremap_nocache fix
This patch fixes a bug of ioremap_nocache. ioremap_nocache() will call
__ioremap() with flags != 0 to do the real work, which will call
change_page_attr_addr() if phys_addr + size - 1 < (end_pfn_map << PAGE_SHIFT).
But some pages between 0 ~ end_pfn_map << PAGE_SHIFT are not mapped by
identity map, this will make change_page_attr_addr failed.
This patch is based on latest x86 git and has been tested on x86_64 platform.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: add PAGE_KERNEL_EXEC_NOCACHE
add PAGE_KERNEL_EXEC_NOCACHE.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Huang, Ying [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: fix NX bit handling in change_page_attr()
This patch fixes a bug of change_page_attr/change_page_attr_addr on
Intel i386/x86_64 CPUs. After changing page attribute to be
executable with these functions, the page remains un-executable on
Intel i386/x86_64 CPU. Because on Intel i386/x86_64 CPU, only if the
"NX" bits of all three level page tables are cleared (PAE is enabled),
the corresponding page is executable (refer to section 4.13.2 of Intel
64 and IA-32 Architectures Software Developer's Manual). So, the bug
is fixed through clearing the "NX" bit of PMD when splitting the huge
PMD.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: change cpa to pfn based
change CPA to pfn based.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Wed, 30 Jan 2008 12:34:04 +0000 (13:34 +0100)]
x86: keep the BIOS area executable
keep the BIOS area executable.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>