commit 7c017f897e601aced95b71521bb0eb58af9002d5 Author: Greg Kroah-Hartman Date: Sat Mar 3 10:17:11 2018 +0100 Linux 3.18.98 commit 62a101cde990623dbe9b9f8370ac5294f1da3229 Author: Yangbo Lu Date: Tue Jan 9 11:02:33 2018 +0800 net: gianfar_ptp: move set_fipers() to spinlock protecting area [ Upstream commit 11d827a993a969c3c6ec56758ff63a44ba19b466 ] set_fipers() calling should be protected by spinlock in case that any interrupt breaks related registers setting and the function we expect. This patch is to move set_fipers() to spinlock protecting area in ptp_gianfar_adjtime(). Signed-off-by: Yangbo Lu Acked-by: Richard Cochran Reviewed-by: Fabio Estevam Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit f895445fad01fb69df79aae897cb31ff7967a5d8 Author: Marcelo Ricardo Leitner Date: Mon Jan 8 19:02:29 2018 -0200 sctp: make use of pre-calculated len [ Upstream commit c76f97c99ae6d26d14c7f0e50e074382bfbc9f98 ] Some sockopt handling functions were calculating the length of the buffer to be written to userspace and then calculating it again when actually writing the buffer, which could lead to some write not using an up-to-date length. This patch updates such places to just make use of the len variable. Also, replace some sizeof(type) to sizeof(var). Signed-off-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 99a318ec9d85150ca6f4834c1ec40bf816cec2fb Author: Ross Lagerwall Date: Tue Jan 9 12:10:22 2018 +0000 xen/gntdev: Fix partial gntdev_mmap() cleanup [ Upstream commit cf2acf66ad43abb39735568f55e1f85f9844e990 ] When cleaning up after a partially successful gntdev_mmap(), unmap the successfully mapped grant pages otherwise Xen will kill the domain if in debug mode (Attempt to implicitly unmap a granted PTE) or Linux will kill the process and emit "BUG: Bad page map in process" if Xen is in release mode. This is only needed when use_ptemod is true because gntdev_put_map() will unmap grant pages itself when use_ptemod is false. Signed-off-by: Ross Lagerwall Reviewed-by: Boris Ostrovsky Signed-off-by: Boris Ostrovsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8e43167ca9876f2e1ef65ac40b533440c9ab2e27 Author: Ross Lagerwall Date: Tue Jan 9 12:10:21 2018 +0000 xen/gntdev: Fix off-by-one error when unmapping with holes [ Upstream commit 951a010233625b77cde3430b4b8785a9a22968d1 ] If the requested range has a hole, the calculation of the number of pages to unmap is off by one. Fix it. Signed-off-by: Ross Lagerwall Reviewed-by: Boris Ostrovsky Signed-off-by: Boris Ostrovsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 56cf3902c1795e0180d8ce221562784fb5c1b85b Author: Sergei Shtylyov Date: Sat Jan 6 21:53:26 2018 +0300 SolutionEngine771x: fix Ether platform data [ Upstream commit 195e2addbce09e5afbc766efc1e6567c9ce840d3 ] The 'sh_eth' driver's probe() method would fail on the SolutionEngine7710 board and crash on SolutionEngine7712 board as the platform code is hopelessly behind the driver's platform data -- it passes the PHY address instead of 'struct sh_eth_plat_data *'; pass the latter to the driver in order to fix the bug... Fixes: 71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") Signed-off-by: Sergei Shtylyov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit eb8936ae670aef8d6efa447a82e4acc48f6a304b Author: Christophe JAILLET Date: Sat Jan 6 09:00:09 2018 +0100 mdio-sun4i: Fix a memory leak [ Upstream commit 56c0290202ab94a2f2780c449395d4ae8495fab4 ] If the probing of the regulator is deferred, the memory allocated by 'mdiobus_alloc_size()' will be leaking. It should be freed before the next call to 'sun4i_mdio_probe()' which will reallocate it. Fixes: 4bdcb1dd9feb ("net: Add MDIO bus driver for the Allwinner EMAC") Signed-off-by: Christophe JAILLET Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 8094b1a3fa737e7f3515ae8b5286c65964d1a51f Author: Eduardo Otubo Date: Fri Jan 5 09:42:16 2018 +0100 xen-netfront: enable device after manual module load [ Upstream commit b707fda2df4070785d0fa8a278aa13944c5f51f8 ] When loading the module after unloading it, the network interface would not be enabled and thus wouldn't have a backend counterpart and unable to be used by the guest. The guest would face errors like: [root@guest ~]# ethtool -i eth0 Cannot get driver information: No such device [root@guest ~]# ifconfig eth0 eth0: error fetching interface information: Device not found This patch initializes the state of the netfront device whenever it is loaded manually, this state would communicate the netback to create its device and establish the connection between them. Signed-off-by: Eduardo Otubo Reviewed-by: Boris Ostrovsky Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 4108c7767f28ed381d7c70a5a89884b97d0c34cc Author: Xiongwei Song Date: Tue Jan 2 21:24:55 2018 +0800 drm/ttm: check the return value of kzalloc [ Upstream commit 19d859a7205bc59ffc38303eb25ae394f61d21dc ] In the function ttm_page_alloc_init, kzalloc call is made for variable _manager, we need to check its return value, it may return NULL. Signed-off-by: Xiongwei Song Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 154770b1dc1eeaf3d4db433b35621c2a4844e80d Author: Tushar Dave Date: Wed Dec 6 02:26:29 2017 +0530 e1000: fix disabling already-disabled warning [ Upstream commit 0b76aae741abb9d16d2c0e67f8b1e766576f897d ] This patch adds check so that driver does not disable already disabled device. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 0000:00:03.0: disabling already-disabled device [ 45.013447] ------------[ cut here ]------------ [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: ffff88011bee9e40 task.stack: ffffc90000860000 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: 0000:ffffc90000863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0000000000000035 RBX: ffff88013a230008 RCX: 0000000000000000 [ 45.020182] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000203 [ 45.021084] RBP: ffff88013a3f31e8 R08: 0000000000000001 R09: 0000000000000000 [ 45.021986] R10: ffffffff827ec29c R11: 0000000000000002 R12: 0000000000000001 [ 45.022946] R13: ffff88013a230008 R14: ffff880117802b20 R15: ffffc90000863e8f [ 45.023842] FS: 0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 [ 45.024863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.025583] CR2: ffffc900006d4000 CR3: 000000000220f000 CR4: 00000000000006a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Signed-off-by: Tushar Dave Tested-by: Fengguang Wu Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9d3948996ee2b774435967809b7c2e37d2b536ff Author: Aliaksei Karaliou Date: Thu Dec 21 13:18:26 2017 -0800 xfs: quota: check result of register_shrinker() [ Upstream commit 3a3882ff26fbdbaf5f7e13f6a0bccfbf7121041d ] xfs_qm_init_quotainfo() does not check result of register_shrinker() which was tagged as __must_check recently, reported by sparse. Signed-off-by: Aliaksei Karaliou [darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos] Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ac086d5fe75a5c56a78f2500bd6dcbf9137fa2f4 Author: Aliaksei Karaliou Date: Thu Dec 21 13:18:26 2017 -0800 xfs: quota: fix missed destroy of qi_tree_lock [ Upstream commit 2196881566225f3c3428d1a5f847a992944daa5b ] xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock while destroys quotainfo->qi_quotaofflock. Signed-off-by: Aliaksei Karaliou Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 96dd200101791418675e7250b88828d71a46e80f Author: Stefan Haberland Date: Wed Dec 6 10:30:39 2017 +0100 s390/dasd: fix wrongly assigned configuration data [ Upstream commit 8a9bd4f8ebc6800bfc0596e28631ff6809a2f615 ] We store per path and per device configuration data to identify the path or device correctly. The per path configuration data might get mixed up if the original request gets into error recovery and is started with a random path mask. This would lead to a wrong identification of a path in case of a CUIR event for example. Fix by copying the path mask from the original request to the error recovery request in case it is a path verification request. Signed-off-by: Stefan Haberland Reviewed-by: Jan Hoeppner Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 20f6d9c2af33da892a0e03ffd6249c7ab81edfb7 Author: Matthieu CASTET Date: Tue Dec 12 11:10:44 2017 +0100 led: core: Fix brightness setting when setting delay_off=0 [ Upstream commit 2b83ff96f51d0b039c4561b9f95c824d7bddb85c ] With the current code, the following sequence won't work : echo timer > trigger echo 0 > delay_off * at this point we call ** led_delay_off_store ** led_blink_set commit c85a6e7bdea9a9532c1e68c1c472082bf88bc210 Author: Guilherme G. Piccoli Date: Fri Dec 22 13:01:39 2017 -0200 bnx2x: Improve reliability in case of nested PCI errors [ Upstream commit f7084059a9cb9e56a186e1677b1dcffd76c2cd24 ] While in recovery process of PCI error (called EEH on PowerPC arch), another PCI transaction could be corrupted causing a situation of nested PCI errors. Also, this scenario could be reproduced with error injection mechanisms (for debug purposes). We observe that in case of nested PCI errors, bnx2x might attempt to initialize its shmem and cause a kernel crash due to bad addresses read from MCP. Multiple different stack traces were observed depending on the point the second PCI error happens. This patch avoids the crashes by: * failing PCI recovery in case of nested errors (since multiple PCI errors in a row are not expected to lead to a functional adapter anyway), and by, * preventing access to adapter FW when MCP is failed (we mark it as failed when shmem cannot get initialized properly). Reported-by: Abdul Haleem Signed-off-by: Guilherme G. Piccoli Acked-by: Shahed Shaikh Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit e93041fc495ba9b76b842b4edff57655e54e4fd0 Author: Siva Reddy Kallam Date: Fri Dec 22 16:05:29 2017 +0530 tg3: Enable PHY reset in MTU change path for 5720 [ Upstream commit e60ee41aaf898584205a6af5c996860d0fe6a836 ] A customer noticed RX path hang when MTU is changed on the fly while running heavy traffic with NCSI enabled for 5717 and 5719. Since 5720 belongs to same ASIC family, we observed same issue and same fix could solve this problem for 5720. Signed-off-by: Siva Reddy Kallam Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7862b4d6f8acde0386d17ee162e0ddb26faea751 Author: Siva Reddy Kallam Date: Fri Dec 22 16:05:28 2017 +0530 tg3: Add workaround to restrict 5762 MRRS to 2048 [ Upstream commit 4419bb1cedcda0272e1dc410345c5a1d1da0e367 ] One of AMD based server with 5762 hangs with jumbo frame traffic. This AMD platform has southbridge limitation which is restricting MRRS to 4000. As a work around, driver to restricts the MRRS to 2048 for this particular 5762 NX1 card. Signed-off-by: Siva Reddy Kallam Signed-off-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 05caa7ec533d19058d7978df5ce6d01507205d03 Author: Cathy Avery Date: Tue Dec 19 13:32:48 2017 -0500 scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error [ Upstream commit d1b8b2391c24751e44f618fcf86fb55d9a9247fd ] When an I/O is returned with an srb_status of SRB_STATUS_INVALID_LUN which has zero good_bytes it must be assigned an error. Otherwise the I/O will be continuously requeued and will cause a deadlock in the case where disks are being hot added and removed. sd_probe_async will wait forever for its I/O to complete while holding scsi_sd_probe_domain. Also returning the default error of DID_TARGET_FAILURE causes multipath to not retry the I/O resulting in applications receiving I/O errors before a failover can occur. Signed-off-by: Cathy Avery Signed-off-by: Long Li Reviewed-by: Stephen Hemminger Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 69712e5d5b312ad926191bf54bb69e945ffcad30 Author: Alexander Kochetkov Date: Fri Dec 15 20:20:06 2017 +0300 net: arc_emac: fix arc_emac_rx() error paths [ Upstream commit e688822d035b494071ecbadcccbd6f3325fb0f59 ] arc_emac_rx() has some issues found by code review. In case netdev_alloc_skb_ip_align() or dma_map_single() failure rx fifo entry will not be returned to EMAC. In case dma_map_single() failure previously allocated skb became lost to driver. At the same time address of newly allocated skb will not be provided to EMAC. Signed-off-by: Alexander Kochetkov Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit bfde9f19290e7dce114d447b8ab23340e8bd049d Author: Radu Pirea Date: Fri Dec 15 17:40:17 2017 +0200 spi: atmel: fixed spin_lock usage inside atmel_spi_remove [ Upstream commit 66e900a3d225575c8b48b59ae1fe74bb6e5a65cc ] The only part of atmel_spi_remove which needs to be atomic is hardware reset. atmel_spi_stop_dma calls dma_terminate_all and this needs interrupts enabled. atmel_spi_release_dma calls dma_release_channel and dma_release_channel locks a mutex inside of spin_lock. So the call of these functions can't be inside a spin_lock. Reported-by: Jia-Ju Bai Signed-off-by: Radu Pirea Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7c7e39f79bad5cb368714ab746c0d578de79e493 Author: Al Viro Date: Mon Dec 18 15:05:07 2017 -0500 sget(): handle failures of register_shrinker() [ Upstream commit 9ee332d99e4d5a97548943b81c54668450ce641b ] Signed-off-by: Al Viro Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit a97c0dc1a45fbe574eeb713f063b2e3dac73e152 Author: Brendan McGrath Date: Wed Dec 13 22:14:57 2017 +1100 ipv6: icmp6: Allow icmp messages to be looped back [ Upstream commit 588753f1eb18978512b1c9b85fddb457d46f9033 ] One example of when an ICMPv6 packet is required to be looped back is when a host acts as both a Multicast Listener and a Multicast Router. A Multicast Router will listen on address ff02::16 for MLDv2 messages. Currently, MLDv2 messages originating from a Multicast Listener running on the same host as the Multicast Router are not being delivered to the Multicast Router. This is due to dst.input being assigned the default value of dst_discard. This results in the packet being looped back but discarded before being delivered to the Multicast Router. This patch sets dst.input to ip6_input to ensure a looped back packet is delivered to the Multicast Router. Signed-off-by: Brendan McGrath Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9d293792bfdd25f1c1a7aecc0152e2b2bc596e45 Author: Sascha Hauer Date: Tue Dec 5 11:51:40 2017 +0100 mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM [ Upstream commit fdf2e821052958a114618a95ab18a300d0b080cb ] When erased subpages are read then the BCH decoder returns STATUS_ERASED if they are all empty, or STATUS_UNCORRECTABLE if there are bitflips. When there are bitflips, we have to set these bits again to show the upper layers a completely erased page. When a bitflip happens in the exact byte where the bad block marker is, then this byte is swapped with another byte in block_mark_swapping(). The correction code then detects a bitflip in another subpage and no longer corrects the bitflip where it really happens. Correct this behaviour by calling block_mark_swapping() after the bitflips have been corrected. In our case UBIFS failed with this bug because it expects erased pages to be really empty: UBIFS error (pid 187): ubifs_scan: corrupt empty space at LEB 36:118735 UBIFS error (pid 187): ubifs_scanned_corruption: corruption at LEB 36:118735 UBIFS error (pid 187): ubifs_scanned_corruption: first 8192 bytes from LEB 36:118735 UBIFS error (pid 187): ubifs_scan: LEB 36 scanning failed UBIFS error (pid 187): do_commit: commit failed, error -117 Signed-off-by: Sascha Hauer Reviewed-by: Richard Weinberger Acked-by: Boris Brezillon Signed-off-by: Richard Weinberger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 5ae0e9de0ac6728f8a29612ff216240945a25dc8 Author: Anna-Maria Gleixner Date: Thu Dec 21 11:41:35 2017 +0100 hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers) commit 48d0c9becc7f3c66874c100c126459a9da0fdced upstream. The POSIX specification defines that relative CLOCK_REALTIME timers are not affected by clock modifications. Those timers have to use CLOCK_MONOTONIC to ensure POSIX compliance. The introduction of the additional HRTIMER_MODE_PINNED mode broke this requirement for pinned timers. There is no user space visible impact because user space timers are not using pinned mode, but for consistency reasons this needs to be fixed. Check whether the mode has the HRTIMER_MODE_REL bit set instead of comparing with HRTIMER_MODE_ABS. Signed-off-by: Anna-Maria Gleixner Cc: Christoph Hellwig Cc: John Stultz Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: keescook@chromium.org Fixes: 597d0275736d ("timers: Framework for identifying pinned timers") Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de Signed-off-by: Ingo Molnar Cc: Mike Galbraith Signed-off-by: Greg Kroah-Hartman commit 5900ecf33c2ff2329dc34167fca8405250743c71 Author: Jakub Sitnicki Date: Wed Jun 8 15:13:34 2016 +0200 ipv6: Skip XFRM lookup if dst_entry in socket cache is valid commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be upstream. At present we perform an xfrm_lookup() for each UDPv6 message we send. The lookup involves querying the flow cache (flow_cache_lookup) and, in case of a cache miss, creating an XFRM bundle. If we miss the flow cache, we can end up creating a new bundle and deriving the path MTU (xfrm_init_pmtu) from on an already transformed dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down to xfrm_lookup(). This can happen only if we're caching the dst_entry in the socket, that is when we're using a connected UDP socket. To put it another way, the path MTU shrinks each time we miss the flow cache, which later on leads to incorrectly fragmented payload. It can be observed with ESPv6 in transport mode: 1) Set up a transformation and lower the MTU to trigger fragmentation # ip xfrm policy add dir out src ::1 dst ::1 \ tmpl src ::1 dst ::1 proto esp spi 1 # ip xfrm state add src ::1 dst ::1 \ proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b # ip link set dev lo mtu 1500 2) Monitor the packet flow and set up an UDP sink # tcpdump -ni lo -ttt & # socat udp6-listen:12345,fork /dev/null & 3) Send a datagram that needs fragmentation with a connected socket # perl -e 'print "@" x 1470 | socat - udp6:[::1]:12345 2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error 00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448 00:00:00.000014 IP6 ::1 > ::1: frag (1448|32) 00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272 (^ ICMPv6 Parameter Problem) 00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136 4) Compare it to a non-connected socket # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345 00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448 00:00:00.000010 IP6 ::1 > ::1: frag (1448|64) What happens in step (3) is: 1) when connecting the socket in __ip6_datagram_connect(), we perform an XFRM lookup, miss the flow cache, create an XFRM bundle, and cache the destination, 2) afterwards, when sending the datagram, we perform an XFRM lookup, again, miss the flow cache (due to mismatch of flowi6_iif and flowi6_oif, which is an issue of its own), and recreate an XFRM bundle based on the cached (and already transformed) destination. To prevent the recreation of an XFRM bundle, avoid an XFRM lookup altogether whenever we already have a destination entry cached in the socket. This prevents the path MTU shrinkage and brings us on par with UDPv4. The fix also benefits connected PINGv6 sockets, another user of ip6_sk_dst_lookup_flow(), who also suffer messages being transformed twice. Joint work with Hannes Frederic Sowa. Reported-by: Jan Tluka Signed-off-by: Jakub Sitnicki Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Benedict Wong Signed-off-by: Greg Kroah-Hartman