commit 39773be9e3dfce1faf3a330bfda8f6d7a5824f9b Author: Greg Kroah-Hartman Date: Mon Nov 4 04:35:38 2013 -0800 Linux 3.11.7 commit 6aa9338a468dc3df23dd4a9310b90a5afec67c5e Author: Enrico Mioso Date: Tue Oct 15 15:06:47 2013 +0200 usb: serial: option: blacklist Olivetti Olicard200 commit fd8573f5828873343903215f203f14dc82de397c upstream. Interface 6 of this device speaks QMI as per tests done by us. Credits go to Antonella for providing the hardware. Signed-off-by: Enrico Mioso Signed-off-by: Antonella Pellizzari Tested-by: Dan Williams Signed-off-by: Greg Kroah-Hartman commit 8fe6e9028f16f47b136b74db964b7a9d4bdd739d Author: Greg Kroah-Hartman Date: Sat Oct 5 18:14:18 2013 -0700 USB: serial: option: add support for Inovia SEW858 device commit f4c19b8e165cff1a6607c21f8809441d61cab7ec upstream. This patch adds the device id for the Inovia SEW858 device to the option driver. Reported-by: Pavel Parkhomenko Tested-by: Pavel Parkhomenko Signed-off-by: Greg Kroah-Hartman commit c158ec4a6052b94840cb5c75cedbc2ecd1f450b1 Author: Diego Elio Pettenò Date: Tue Oct 8 20:03:37 2013 +0100 USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well. commit c9d09dc7ad106492c17c587b6eeb99fe3f43e522 upstream. Without this change, the USB cable for Freestyle Option and compatible glucometers will not be detected by the driver. Signed-off-by: Diego Elio Pettenò Signed-off-by: Greg Kroah-Hartman commit 7c0cc784b789f951372fd8859b40969fec27810e Author: Roel Kluin Date: Mon Oct 14 23:21:15 2013 +0200 serial: vt8500: add missing braces commit d969de8d83401683420638c8107dcfedb2146f37 upstream. Due to missing braces on an if statement, in presence of a device_node a port was always assigned -1, regardless of any alias entries in the device tree. Conversely, if device_node was NULL, an unitialized port ended up being used. This patch adds the missing braces, fixing the issues. Signed-off-by: Roel Kluin Acked-by: Tony Prisk Signed-off-by: Greg Kroah-Hartman commit 60fd01d9c7c5114e2bafaf5c57d2300aa9559c02 Author: Solomon Peachy Date: Wed Oct 9 12:15:11 2013 -0400 wireless: cw1200: acquire hwbus lock around cw1200_irq_handler() call. commit 4978705d26149a629b9f50ff221caed6f1ae3048 upstream. This fixes "lost interrupt" problems that occurred on SPI-based systems. cw1200_irq_handler() expects the hwbus to be locked, but on the SPI-path, that lock wasn't taken (unlike in the SDIO-path, where the generic SDIO-code takes care of acquiring the lock). Signed-off-by: David Mosberger Signed-off-by: Solomon Peachy Signed-off-by: John W. Linville Signed-off-by: Greg Kroah-Hartman commit 4e30af352149c4d57d1857c2d734908917b08dac Author: Johannes Berg Date: Fri Oct 11 14:47:05 2013 +0200 wireless: radiotap: fix parsing buffer overrun commit f5563318ff1bde15b10e736e97ffce13be08bc1a upstream. When parsing an invalid radiotap header, the parser can overrun the buffer that is passed in because it doesn't correctly check 1) the minimum radiotap header size 2) the space for extended bitmaps The first issue doesn't affect any in-kernel user as they all check the minimum size before calling the radiotap function. The second issue could potentially affect the kernel if an skb is passed in that consists only of the radiotap header with a lot of extended bitmaps that extend past the SKB. In that case a read-only buffer overrun by at most 4 bytes is possible. Fix this by adding the appropriate checks to the parser. Reported-by: Evan Huus Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 832307cb856d0e02740f255d90c29e31fba47606 Author: Hans-Frieder Vogt Date: Sun Oct 6 21:13:40 2013 +0200 w1 - call request_module with w1 master mutex unlocked commit bc04d76d6942068f75c10790072280b847ec6f1f upstream. request_module for w1 slave modules needs to be called with the w1 master mutex unlocked. Because w1_attach_slave_device gets always(?) called with mutex locked, we need to temporarily unlock the w1 master mutex for the loading of the w1 slave module. Signed-off by: Hans-Frieder Vogt Acked-by: Evgeniy Polyakov Signed-off-by: Greg Kroah-Hartman commit 7c0c220dc74fbc153c5c5c8383169c0915d22286 Author: Fengguang Wu Date: Wed Oct 16 13:47:03 2013 -0700 writeback: fix negative bdi max pause commit e3b6c655b91e01a1dade056cfa358581b47a5351 upstream. Toralf runs trinity on UML/i386. After some time it hangs and the last message line is BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521] It's found that pages_dirtied becomes very large. More than 1000000000 pages in this case: period = HZ * pages_dirtied / task_ratelimit; BUG_ON(pages_dirtied > 2000000000); BUG_ON(pages_dirtied > 1000000000); <--------- UML debug printf shows that we got negative pause here: ick: pause : -984 ick: pages_dirtied : 0 ick: task_ratelimit: 0 pause: + if (pause < 0) { + extern int printf(char *, ...); + printf("ick : pause : %li\n", pause); + printf("ick: pages_dirtied : %lu\n", pages_dirtied); + printf("ick: task_ratelimit: %lu\n", task_ratelimit); + BUG_ON(1); + } trace_balance_dirty_pages(bdi, Since pause is bounded by [min_pause, max_pause] where min_pause is also bounded by max_pause. It's suspected and demonstrated that the max_pause calculation goes wrong: ick: pause : -717 ick: min_pause : -177 ick: max_pause : -717 ick: pages_dirtied : 14 ick: task_ratelimit: 0 The problem lies in the two "long = unsigned long" assignments in bdi_max_pause() which might go negative if the highest bit is 1, and the min_t(long, ...) check failed to protect it falling under 0. Fix all of them by using "unsigned long" throughout the function. Signed-off-by: Fengguang Wu Reported-by: Toralf Förster Tested-by: Toralf Förster Reviewed-by: Jan Kara Cc: Richard Weinberger Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 40e806494937a96e21b01b3a72c212c4a8e45c01 Author: David Henningsson Date: Mon Oct 14 10:16:22 2013 +0200 ALSA: hda - Fix inverted internal mic not indicated on some machines commit ccb041571b73888785ef7828a276e380125891a4 upstream. The create_bind_cap_vol_ctl does not create any control indicating that an inverted dmic is present. Therefore, create multiple capture volumes in this scenario, so we always have some indication that the internal mic is inverted. This happens on the Lenovo Ideapad U310 as well as the Lenovo Yoga 13 (both are based on the CX20590 codec), but the fix is generic and could be needed for other codecs/machines too. Thanks to Szymon Acedański for the pointer and a draft patch. BugLink: https://bugs.launchpad.net/bugs/1239392 BugLink: https://bugs.launchpad.net/bugs/1227491 Reported-by: Szymon Acedański Signed-off-by: David Henningsson Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit f4d891400a11576d76a7dfd0ff62c17fd371800c Author: Takashi Iwai Date: Mon Oct 14 16:02:15 2013 +0200 ALSA: us122l: Fix pcm_usb_stream mmapping regression commit ac536a848a1643e4b87e8fbd376a63091afc2ccc upstream. The pcm_usb_stream plugin requires the mremap explicitly for the read buffer, as it expands itself once after reading the required size. But the commit [314e51b9: mm: kill vma flag VM_RESERVED and mm->reserved_vm counter] converted blindly to a combination of VM_DONTEXPAND | VM_DONTDUMP like other normal drivers, and this resulted in the failure of mremap(). For fixing this regression, we need to remove VM_DONTEXPAND for the read-buffer mmap. Reported-and-tested-by: James Miller Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 08899aecf62e61328acbadd35b8c98dbda7a5ec4 Author: Hugh Dickins Date: Wed Oct 16 13:47:08 2013 -0700 mm: fix BUG in __split_huge_page_pmd commit 750e8165f5e87b6a142be953640eabb13a9d350a upstream. Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED). It's invalid: we don't always have down_write of mmap_sem there: a racing do_huge_pmd_wp_page() might have copied-on-write to another huge page before our split_huge_page() got the anon_vma lock. Forget the BUG_ON, just go back and try again if this happens. Signed-off-by: Hugh Dickins Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Naoya Horiguchi Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4e31a0e20aa1788668b0322396c0e16e06b302da Author: Weijie Yang Date: Wed Oct 16 13:46:54 2013 -0700 mm/zswap: bugfix: memory leak when re-swapon commit aa9bca05a467c61dcea4142b2877d5392de5bdce upstream. zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon, so a memory leak occurs. Free the memory of zswap_tree in zswap_frontswap_invalidate_area(). Signed-off-by: Weijie Yang Reviewed-by: Bob Liu Cc: Minchan Kim Reviewed-by: Minchan Kim From: Weijie Yang Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently Signed-off-by: Greg Kroah-Hartman Consider the following scenario: thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page) thread 1: call zswap_frontswap_invalidate_page to invalidate entry x. finished, entry x and its zbud is not freed as its refcount != 0 now, the swap_map[x] = 0 thread 0: now call zswap_get_swap_cache_page swapcache_prepare return -ENOENT because entry x is not used any more zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM zswap_writeback_entry do nothing except put refcount Now, the memory of zswap_entry x and its zpage leak. Modify: - check the refcount in fail path, free memory if it is not referenced. - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path can be not only caused by nomem but also by invalidate. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang Reviewed-by: Bob Liu Cc: Minchan Kim Cc: Acked-by: Seth Jennings Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds commit 72d2ddf5f2218861f41b7d1a9da4c130d6d81911 Author: Cyrill Gorcunov Date: Wed Oct 16 13:46:51 2013 -0700 mm: migration: do not lose soft dirty bit if page is in migration state commit c3d16e16522fe3fe8759735850a0676da18f4b1d upstream. If page migration is turned on in config and the page is migrating, we may lose the soft dirty bit. If fork and mprotect are called on migrating pages (once migration is complete) pages do not obtain the soft dirty bit in the correspond pte entries. Fix it adding an appropriate test on swap entries. Signed-off-by: Cyrill Gorcunov Cc: Pavel Emelyanov Cc: Andy Lutomirski Cc: Matt Mackall Cc: Xiao Guangrong Cc: Marcelo Tosatti Cc: KOSAKI Motohiro Cc: Stephen Rothwell Cc: Peter Zijlstra Cc: "Aneesh Kumar K.V" Cc: Naoya Horiguchi Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit c0329006f4b3c7ab4dde75aa003de50982a44b91 Author: James Ralston Date: Tue Sep 24 16:47:55 2013 -0700 i2c: ismt: initialize DMA buffer commit bf4169100c909667ede6af67668b3ecce6928343 upstream. This patch adds code to initialize the DMA buffer to compensate for possible hardware data corruption. Signed-off-by: James Ralston [wsa: changed to use 'sizeof'] Signed-off-by: Wolfram Sang Cc: Jean Delvare Signed-off-by: Greg Kroah-Hartman commit b04b881391b7779fe7af1110361d4a6487391000 Author: Mikulas Patocka Date: Wed Oct 16 03:17:47 2013 +0100 dm snapshot: fix data corruption commit e9c6a182649f4259db704ae15a91ac820e63b0ca upstream. This patch fixes a particular type of data corruption that has been encountered when loading a snapshot's metadata from disk. When we allocate a new chunk in persistent_prepare, we increment ps->next_free and we make sure that it doesn't point to a metadata area by further incrementing it if necessary. When we load metadata from disk on device activation, ps->next_free is positioned after the last used data chunk. However, if this last used data chunk is followed by a metadata area, ps->next_free is positioned erroneously to the metadata area. A newly-allocated chunk is placed at the same location as the metadata area, resulting in data or metadata corruption. This patch changes the code so that ps->next_free skips the metadata area when metadata are loaded in function read_exceptions. The patch also moves a piece of code from persistent_prepare_exception to a separate function skip_metadata to avoid code duplication. CVE-2013-4299 Signed-off-by: Mikulas Patocka Cc: Mike Snitzer Signed-off-by: Alasdair G Kergon Signed-off-by: Greg Kroah-Hartman commit a4e22a835e37fd78fb3bd91efb29b26d0281f264 Author: Mika Westerberg Date: Tue Oct 1 17:35:43 2013 +0300 gpio/lynxpoint: check if the interrupt is enabled in IRQ handler commit 03d152d5582abc8a1c19cb107164c3724bbd4be4 upstream. Checking LP_INT_STAT is not enough in the interrupt handler because its contents get updated regardless of whether the pin has interrupt enabled or not. This causes the driver to loop forever for GPIOs that are pulled up. Fix this by checking the interrupt enable bit for the pin as well. Signed-off-by: Mika Westerberg Acked-by: Mathias Nyman Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit 9f7cc4574847fc72e194bda5803e1541e65b796c Author: Miklos Szeredi Date: Thu Oct 10 16:48:19 2013 +0200 ext[34]: fix double put in tmpfile commit 43ae9e3fc70ca0057ae0a24ef5eedff05e3fae06 upstream. d_tmpfile() already swallowed the inode ref. Signed-off-by: Miklos Szeredi Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit b0e0b97c59f6116b12cd33c59aa803a8aac1f6e0 Author: Linus Walleij Date: Mon Oct 7 15:19:53 2013 +0200 ARM: integrator: deactivate timer0 on the Integrator/CP commit 29114fd7db2fc82a34da8340d29b8fa413e03dca upstream. This fixes a long-standing Integrator/CP regression from commit 870e2928cf3368ca9b06bc925d0027b0a56bcd8e "ARM: integrator-cp: convert use CLKSRC_OF for timer init" When this code was introduced, the both aliases pointing the system to use timer1 as primary (clocksource) and timer2 as secondary (clockevent) was ignored, and the system would simply use the first two timers found as clocksource and clockevent. However this made the system timeline accelerate by a factor x25, as it turns out that the way the clocking actually works (totally undocumented and found after some trial-and-error) is that timer0 runs @ 25MHz and timer1 and timer2 runs @ 1MHz. Presumably this divider setting is a boot-on default and configurable albeit the way to configure it is not documented. So as a quick fix to the problem, let's mark timer0 as disabled, so the code will chose timer1 and timer2 as it used to. This also deletes the two aliases for the primary and secondary timer as they have been superceded by the auto-selection Cc: Rob Herring Cc: Russell King Signed-off-by: Linus Walleij Signed-off-by: Olof Johansson Signed-off-by: Greg Kroah-Hartman commit 3ec722ae43d74f5ab13e89999245c1ce862fe3ea Author: AKASHI Takahiro Date: Wed Oct 9 15:58:29 2013 +0100 ARM: 7851/1: check for number of arguments in syscall_get/set_arguments() commit 3c1532df5c1b54b5f6246cdef94eeb73a39fe43a upstream. In ftrace_syscall_enter(), syscall_get_arguments(..., 0, n, ...) if (i == 0) { ...; n--;} memcpy(..., n * sizeof(args[0])); If 'number of arguments(n)' is zero and 'argument index(i)' is also zero in syscall_get_arguments(), none of arguments should be copied by memcpy(). Otherwise 'n--' can be a big positive number and unexpected amount of data will be copied. Tracing system calls which take no argument, say sync(void), may hit this case and eventually make the system corrupted. This patch fixes the issue both in syscall_get_arguments() and syscall_set_arguments(). Acked-by: Will Deacon Signed-off-by: AKASHI Takahiro Signed-off-by: Will Deacon Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 9b013b05b356b9ede635bb580ab9e21a7d2c6382 Author: Mariusz Ceier Date: Mon Oct 21 19:45:04 2013 +0200 davinci_emac.c: Fix IFF_ALLMULTI setup [ Upstream commit d69e0f7ea95fef8059251325a79c004bac01f018 ] When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't, emac_dev_mcast_set should only enable RX of multicasts and reset MACHASH registers. It does this, but afterwards it either sets up multicast MACs filtering or disables RX of multicasts and resets MACHASH registers again, rendering IFF_ALLMULTI flag useless. This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set. Tested with kernel 2.6.37. Signed-off-by: Mariusz Ceier Acked-by: Mugunthan V N Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 35bb207b3d98b20fb2eb2e8d74f2c584b43ded4b Author: Hannes Frederic Sowa Date: Mon Oct 21 06:17:15 2013 +0200 ipv6: probe routes asynchronous in rt6_probe [ Upstream commit c2f17e827b419918c856131f592df9521e1a38e3 ] Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1881ed07b5e53e4d850fd1e58b715e1772592421 Author: Julian Anastasov Date: Sun Oct 20 15:43:05 2013 +0300 netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper [ Upstream commit 56e42441ed54b092d6c7411138ce60d049e7c731 ] Now when rt6_nexthop() can return nexthop address we can use it for proper nexthop comparison of directly connected destinations. For more information refer to commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper"). Signed-off-by: Julian Anastasov Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 3125f8ba70a008cd14251eafbd22b5d23d90edf2 Author: Julian Anastasov Date: Sun Oct 20 15:43:04 2013 +0300 ipv6: fill rt6i_gateway with nexthop address [ Upstream commit 550bab42f83308c9d6ab04a980cc4333cef1c8fa ] Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 0c80c90ad1632d30f310d5b6f5397ef3eba245d6 Author: Julian Anastasov Date: Sun Oct 20 15:43:03 2013 +0300 ipv6: always prefer rt6i_gateway if present [ Upstream commit 96dc809514fb2328605198a0602b67554d8cce7b ] In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in ip6_finish_output2()." changed the behaviour of ip6_finish_output2() such that the recently introduced rt6_nexthop() is used instead of an assigned neighbor. As rt6_nexthop() prefers rt6i_gateway only for gatewayed routes this causes a problem for users like IPVS, xt_TEE and RAW(hdrincl) if they want to use different address for routing compared to the destination address. Another case is when redirect can create RTF_DYNAMIC route without RTF_GATEWAY flag, we ignore the rt6i_gateway in rt6_nexthop(). Fix the above problems by considering the rt6i_gateway if present, so that traffic routed to address on local subnet is not wrongly diverted to the destination address. Thanks to Simon Horman and Phil Oester for spotting the problematic commit. Thanks to Hannes Frederic Sowa for his review and help in testing. Reported-by: Phil Oester Reported-by: Mark Brooks Signed-off-by: Julian Anastasov Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9d65a3cc82f3824bfa145f869b1076b3941463d4 Author: Hannes Frederic Sowa Date: Tue Oct 22 00:07:47 2013 +0200 inet: fix possible memory corruption with UDP_CORK and UFO [ This is a simplified -stable version of a set of upstream commits. ] This is a replacement patch only for stable which does fix the problems handled by the following two commits in -net: "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9) "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b) Three frames are written on a corked udp socket for which the output netdevice has UFO enabled. If the first and third frame are smaller than the mtu and the second one is bigger, we enqueue the second frame with skb_append_datato_frags without initializing the gso fields. This leads to the third frame appended regulary and thus constructing an invalid skb. This fixes the problem by always using skb_append_datato_frags as soon as the first frag got enqueued to the skb without marking the packet as SKB_GSO_UDP. The problem with only two frames for ipv6 was fixed by "ipv6: udp packets following an UFO enqueued packet need also be handled by UFO" (2811ebac2521ceac84f2bdae402455baa6a7fb47). Cc: Jiri Pirko Cc: Eric Dumazet Cc: David Miller Signed-off-by: Hannes Frederic Sowa Signed-off-by: Greg Kroah-Hartman commit ed6b1f4d5446e510d42577f0c55c748ef9cd36c8 Author: Seif Mazareeb Date: Thu Oct 17 20:33:21 2013 -0700 net: fix cipso packet validation when !NETLABEL [ Upstream commit f2e5ddcc0d12f9c4c7b254358ad245c9dddce13b ] When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel crash in an SMP system, since the CPU executing this function will stall /not respond to IPIs. This problem can be reproduced by running the IP Stack Integrity Checker (http://isic.sourceforge.net) using the following command on a Linux machine connected to DUT: "icmpsic -s rand -d -r 123456" wait (1-2 min) Signed-off-by: Seif Mazareeb Acked-by: Paul Moore Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a9d080970b19eaebb8a4f03c6912cc16f238d166 Author: Daniel Borkmann Date: Thu Oct 17 22:51:31 2013 +0200 net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race [ Upstream commit 90c6bd34f884cd9cee21f1d152baf6c18bcac949 ] In the case of credentials passing in unix stream sockets (dgram sockets seem not affected), we get a rather sparse race after commit 16e5726 ("af_unix: dont send SCM_CREDENTIALS by default"). We have a stream server on receiver side that requests credential passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED on each spawned/accepted socket on server side to 1 first (as it's not inherited), it can happen that in the time between accept() and setsockopt() we get interrupted, the sender is being scheduled and continues with passing data to our receiver. At that time SO_PASSCRED is neither set on sender nor receiver side, hence in cmsg's SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534 (== overflow{u,g}id) instead of what we actually would like to see. On the sender side, here nc -U, the tests in maybe_add_creds() invoked through unix_stream_sendmsg() would fail, as at that exact time, as mentioned, the sender has neither SO_PASSCRED on his side nor sees it on the server side, and we have a valid 'other' socket in place. Thus, sender believes it would just look like a normal connection, not needing/requesting SO_PASSCRED at that time. As reverting 16e5726 would not be an option due to the significant performance regression reported when having creds always passed, one way/trade-off to prevent that would be to set SO_PASSCRED on the listener socket and allow inheriting these flags to the spawned socket on server side in accept(). It seems also logical to do so if we'd tell the listener socket to pass those flags onwards, and would fix the race. Before, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}}, msg_flags=0}, 0) = 5 After, strace: recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}}, msg_flags=0}, 0) = 5 Signed-off-by: Daniel Borkmann Cc: Eric Dumazet Cc: Eric W. Biederman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 668c036a5b31b514f4738f6239a44276178fb9bd Author: Vasundhara Volam Date: Thu Oct 17 11:47:14 2013 +0530 be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd [ Upstream commit 0fb88d61bc60779dde88b0fc268da17eb81d0412 ] It is a required field for all TX_CREATE cmd versions > 0. This fixes a driver initialization failure, caused by recent SH-R Firmwares (versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is not passed. Signed-off-by: Sathya Perla Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 00ab8cd727c68039a1f6b820abd7adb0ea64cf8a Author: Salva Peiró Date: Wed Oct 16 12:46:50 2013 +0200 wanxl: fix info leak in ioctl [ Upstream commit 2b13d06c9584b4eb773f1e80bbaedab9a1c344e1 ] The wanxl_ioctl() code fails to initialize the two padding bytes of struct sync_serial_settings after the ->loopback member. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Salva Peiró Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ac835dd77807529d243811066812169c56f49133 Author: Vlad Yasevich Date: Tue Oct 15 22:01:31 2013 -0400 sctp: Perform software checksum if packet has to be fragmented. [ Upstream commit d2dbbba77e95dff4b4f901fee236fef6d9552072 ] IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum. This causes problems if SCTP packets has to be fragmented and ipsummed has been set to PARTIAL due to checksum offload support. This condition can happen when retransmitting after MTU discover, or when INIT or other control chunks are larger then MTU. Check for the rare fragmentation condition in SCTP and use software checksum calculation in this case. CC: Fan Du Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 4043074ae37743627b5bbdecf4d41e2a27bcb784 Author: Fan Du Date: Tue Oct 15 22:01:30 2013 -0400 sctp: Use software crc32 checksum when xfrm transform will happen. [ Upstream commit 27127a82561a2a3ed955ce207048e1b066a80a2a ] igb/ixgbe have hardware sctp checksum support, when this feature is enabled and also IPsec is armed to protect sctp traffic, ugly things happened as xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing up and pack the 16bits result in the checksum field). The result is fail establishment of sctp communication. Signed-off-by: Fan Du Cc: Neil Horman Cc: Steffen Klassert Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fd105ec514aa39232e839d5c34f6b9198ed77076 Author: Vlad Yasevich Date: Tue Oct 15 22:01:29 2013 -0400 net: dst: provide accessor function to dst->xfrm [ Upstream commit e87b3998d795123b4139bc3f25490dd236f68212 ] dst->xfrm is conditionally defined. Provide accessor funtion that is always available. Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit bbd7578868d2e51934ba592578d2d106a73ef648 Author: Vlad Yasevich Date: Tue Oct 15 14:57:45 2013 -0400 bridge: Correctly clamp MAX forward_delay when enabling STP [ Upstream commit 4b6c7879d84ad06a2ac5b964808ed599187a188d ] Commit be4f154d5ef0ca147ab6bcd38857a774133f5450 bridge: Clamp forward_delay when enabling STP had a typo when attempting to clamp maximum forward delay. It is possible to set bridge_forward_delay to be higher then permitted maximum when STP is off. When turning STP on, the higher then allowed delay has to be clamed down to max value. Signed-off-by: Vlad Yasevich CC: Herbert Xu CC: Stephen Hemminger Reviewed-by: Veaceslav Falico Acked-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 68ab7445f56cea1c3ad4de6777d31395323a74e2 Author: Jason Wang Date: Tue Oct 15 11:18:59 2013 +0800 virtio-net: refill only when device is up during setting queues [ Upstream commit 35ed159bfd96a7547ec277ed8b550c7cbd9841b6 ] We used to schedule the refill work unconditionally after changing the number of queues. This may lead an issue if the device is not up. Since we only try to cancel the work in ndo_stop(), this may cause the refill work still work after removing the device. Fix this by only schedule the work when device is up. The bug were introduce by commit 9b9cd8024a2882e896c65222aa421d461354e3f2. (virtio-net: fix the race between channels setting and refill) Signed-off-by: Jason Wang Cc: Rusty Russell Cc: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cfc85a8e6612dbf742b518a6cd7ec3c637822d63 Author: Jason Wang Date: Tue Oct 15 11:18:58 2013 +0800 virtio-net: don't respond to cpu hotplug notifier if we're not ready [ Upstream commit 3ab098df35f8b98b6553edc2e40234af512ba877 ] We're trying to re-configure the affinity unconditionally in cpu hotplug callback. This may lead the issue during resuming from s3/s4 since - virt queues haven't been allocated at that time. - it's unnecessary since thaw method will re-configure the affinity. Fix this issue by checking the config_enable and do nothing is we're not ready. The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17 (virtio-net: reset virtqueue affinity when doing cpu hotplug). Acked-by: Michael S. Tsirkin Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Wanlong Gao Reviewed-by: Wanlong Gao Signed-off-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 433cc3e581bb79b9f4a2829141049a9d11d3a9d0 Author: Eric Dumazet Date: Sat Oct 12 14:08:34 2013 -0700 bnx2x: record rx queue for LRO packets [ Upstream commit 60e66fee56b2256dcb1dc2ea1b2ddcb6e273857d ] RPS support is kind of broken on bnx2x, because only non LRO packets get proper rx queue information. This triggers reorders, as it seems bnx2x like to generate a non LRO packet for segment including TCP PUSH flag : (this might be pure coincidence, but all the reorders I've seen involve segments with a PUSH) 11:13:34.335847 IP A > B: . 415808:447136(31328) ack 1 win 457 11:13:34.335992 IP A > B: . 447136:448560(1424) ack 1 win 457 11:13:34.336391 IP A > B: . 448560:479888(31328) ack 1 win 457 11:13:34.336425 IP A > B: P 511216:512640(1424) ack 1 win 457 11:13:34.336423 IP A > B: . 479888:511216(31328) ack 1 win 457 11:13:34.336924 IP A > B: . 512640:543968(31328) ack 1 win 457 11:13:34.336963 IP A > B: . 543968:575296(31328) ack 1 win 457 We must call skb_record_rx_queue() to properly give to RPS (and more generally for TX queue selection on forward path) the receive queue information. Similar fix is needed for skb_mark_napi_id(), but will be handled in a separate patch to ease stable backports. Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Cc: Eilon Greenstein Acked-by: Dmitry Kravkov Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5d6463753fb915bbf3930bd8a3c5b2a864f7adeb Author: Mathias Krause Date: Mon Sep 30 22:03:07 2013 +0200 connector: use nlmsg_len() to check message length [ Upstream commit 162b2bedc084d2d908a04c93383ba02348b648b0 ] The current code tests the length of the whole netlink message to be at least as long to fit a cn_msg. This is wrong as nlmsg_len includes the length of the netlink message header. Use nlmsg_len() instead to fix this "off-by-NLMSG_HDRLEN" size check. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 214351333b67444e0a431ee349534f2cee1f8cef Author: Mathias Krause Date: Mon Sep 30 22:05:40 2013 +0200 unix_diag: fix info leak [ Upstream commit 6865d1e834be84ddd5808d93d5035b492346c64a ] When filling the netlink message we miss to wipe the pad field, therefore leak one byte of heap memory to userland. Fix this by setting pad to 0. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6b9588a38ee0dd352182d4f2f98cca7c5aa55680 Author: Salva Peiró Date: Fri Oct 11 12:50:03 2013 +0300 farsync: fix info leak in ioctl [ Upstream commit 96b340406724d87e4621284ebac5e059d67b2194 ] The fst_get_iface() code fails to initialize the two padding bytes of struct sync_serial_settings after the ->loopback member. Add an explicit memset(0) before filling the structure to avoid the info leak. Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 49cf14bacf02f6d9047dc0bf642e3fa2a2e99206 Author: stephen hemminger Date: Sun Oct 6 15:16:49 2013 -0700 netem: free skb's in tree on reset [ Upstream commit ff704050f2fc0f3382b5a70bba56a51a3feca79d ] Netem can leak memory because packets get stored in red-black tree and it is not cleared on reset. Reported by: Сергеев Сергей Signed-off-by: Stephen Hemminger Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 598b990bb1f1f6026dc35e7d6ca69e4cf30e52f4 Author: stephen hemminger Date: Sun Oct 6 15:15:33 2013 -0700 netem: update backlog after drop [ Upstream commit 638a52b801e40ed276ceb69b73579ad99365361a ] When packet is dropped from rb-tree netem the backlog statistic should also be updated. Reported-by: Сергеев Сергей Signed-off-by: Stephen Hemminger Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f1d1f701ef4efb986ccb9466aabea8e42bc3d384 Author: Eric Dumazet Date: Thu Oct 10 06:30:09 2013 -0700 l2tp: must disable bh before calling l2tp_xmit_skb() [ Upstream commit 455cc32bf128e114455d11ad919321ab89a2c312 ] François Cachereul made a very nice bug report and suspected the bh_lock_sock() / bh_unlok_sock() pair used in l2tp_xmit_skb() from process context was not good. This problem was added by commit 6af88da14ee284aaad6e4326da09a89191ab6165 ("l2tp: Fix locking in l2tp_core.c"). l2tp_eth_dev_xmit() runs from BH context, so we must disable BH from other l2tp_xmit_skb() users. [ 452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662] [ 452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan] [ 452.064012] CPU 1 [ 452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643] [ 452.080015] CPU 2 [ 452.080015] [ 452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs [ 452.080015] RIP: 0010:[] [] do_raw_spin_lock+0x17/0x1f [ 452.080015] RSP: 0018:ffff88007125fc18 EFLAGS: 00000293 [ 452.080015] RAX: 000000000000aba9 RBX: ffffffff811d0703 RCX: 0000000000000000 [ 452.080015] RDX: 00000000000000ab RSI: ffff8800711f6896 RDI: ffff8800745c8110 [ 452.080015] RBP: ffff88007125fc18 R08: 0000000000000020 R09: 0000000000000000 [ 452.080015] R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000286 [ 452.080015] R13: 0000000000000020 R14: 0000000000000240 R15: 0000000000000000 [ 452.080015] FS: 00007fdc0cc24700(0000) GS:ffff8800b6f00000(0000) knlGS:0000000000000000 [ 452.080015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 452.080015] CR2: 00007fdb054899b8 CR3: 0000000074404000 CR4: 00000000000006a0 [ 452.080015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 452.080015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 452.080015] Process accel-pppd (pid: 6643, threadinfo ffff88007125e000, task ffff8800b27e6dd0) [ 452.080015] Stack: [ 452.080015] ffff88007125fc28 ffffffff81256559 ffff88007125fc98 ffffffffa01b2bd1 [ 452.080015] ffff88007125fc58 000000000000000c 00000000029490d0 0000009c71dbe25e [ 452.080015] 000000000000005c 000000080000000e 0000000000000000 ffff880071170600 [ 452.080015] Call Trace: [ 452.080015] [] _raw_spin_lock+0xe/0x10 [ 452.080015] [] l2tp_xmit_skb+0x189/0x4ac [l2tp_core] [ 452.080015] [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.080015] [] __sock_sendmsg_nosec+0x22/0x24 [ 452.080015] [] sock_sendmsg+0xa1/0xb6 [ 452.080015] [] ? __schedule+0x5c1/0x616 [ 452.080015] [] ? __dequeue_signal+0xb7/0x10c [ 452.080015] [] ? fget_light+0x75/0x89 [ 452.080015] [] ? sockfd_lookup_light+0x20/0x56 [ 452.080015] [] sys_sendto+0x10c/0x13b [ 452.080015] [] system_call_fastpath+0x16/0x1b [ 452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 [ 452.080015] Call Trace: [ 452.080015] [] _raw_spin_lock+0xe/0x10 [ 452.080015] [] l2tp_xmit_skb+0x189/0x4ac [l2tp_core] [ 452.080015] [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.080015] [] __sock_sendmsg_nosec+0x22/0x24 [ 452.080015] [] sock_sendmsg+0xa1/0xb6 [ 452.080015] [] ? __schedule+0x5c1/0x616 [ 452.080015] [] ? __dequeue_signal+0xb7/0x10c [ 452.080015] [] ? fget_light+0x75/0x89 [ 452.080015] [] ? sockfd_lookup_light+0x20/0x56 [ 452.080015] [] sys_sendto+0x10c/0x13b [ 452.080015] [] system_call_fastpath+0x16/0x1b [ 452.064012] [ 452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs [ 452.064012] RIP: 0010:[] [] do_raw_spin_lock+0x19/0x1f [ 452.064012] RSP: 0018:ffff8800b6e83ba0 EFLAGS: 00000297 [ 452.064012] RAX: 000000000000aaa9 RBX: ffff8800b6e83b40 RCX: 0000000000000002 [ 452.064012] RDX: 00000000000000aa RSI: 000000000000000a RDI: ffff8800745c8110 [ 452.064012] RBP: ffff8800b6e83ba0 R08: 000000000000c802 R09: 000000000000001c [ 452.064012] R10: ffff880071096c4e R11: 0000000000000006 R12: ffff8800b6e83b18 [ 452.064012] R13: ffffffff8125d51e R14: ffff8800b6e83ba0 R15: ffff880072a589c0 [ 452.064012] FS: 00007fdc0b81e700(0000) GS:ffff8800b6e80000(0000) knlGS:0000000000000000 [ 452.064012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 452.064012] CR2: 0000000000625208 CR3: 0000000074404000 CR4: 00000000000006a0 [ 452.064012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 452.064012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 452.064012] Process accel-pppd (pid: 6662, threadinfo ffff88007129a000, task ffff8800744f7410) [ 452.064012] Stack: [ 452.064012] ffff8800b6e83bb0 ffffffff81256559 ffff8800b6e83bc0 ffffffff8121c64a [ 452.064012] ffff8800b6e83bf0 ffffffff8121ec7a ffff880072a589c0 ffff880071096c62 [ 452.064012] 0000000000000011 ffffffff81430024 ffff8800b6e83c80 ffffffff8121f276 [ 452.064012] Call Trace: [ 452.064012] [ 452.064012] [] _raw_spin_lock+0xe/0x10 [ 452.064012] [] spin_lock+0x9/0xb [ 452.064012] [] udp_queue_rcv_skb+0x186/0x269 [ 452.064012] [] __udp4_lib_rcv+0x297/0x4ae [ 452.064012] [] ? raw_rcv+0xe9/0xf0 [ 452.064012] [] udp_rcv+0x1a/0x1c [ 452.064012] [] ip_local_deliver_finish+0x12b/0x1a5 [ 452.064012] [] ip_local_deliver+0x53/0x84 [ 452.064012] [] ip_rcv_finish+0x2bc/0x2f3 [ 452.064012] [] ip_rcv+0x210/0x269 [ 452.064012] [] ? kvm_clock_get_cycles+0x9/0xb [ 452.064012] [] __netif_receive_skb+0x3a5/0x3f7 [ 452.064012] [] netif_receive_skb+0x57/0x5e [ 452.064012] [] ? __netdev_alloc_skb+0x1f/0x3b [ 452.064012] [] virtnet_poll+0x4ba/0x5a4 [virtio_net] [ 452.064012] [] net_rx_action+0x73/0x184 [ 452.064012] [] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [] __do_softirq+0xc3/0x1a8 [ 452.064012] [] ? ack_APIC_irq+0x10/0x12 [ 452.064012] [] ? _raw_spin_lock+0xe/0x10 [ 452.064012] [] call_softirq+0x1c/0x26 [ 452.064012] [] do_softirq+0x45/0x82 [ 452.064012] [] irq_exit+0x42/0x9c [ 452.064012] [] do_IRQ+0x8e/0xa5 [ 452.064012] [] common_interrupt+0x6e/0x6e [ 452.064012] [ 452.064012] [] ? kfree+0x8a/0xa3 [ 452.064012] [] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core] [ 452.064012] [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.064012] [] __sock_sendmsg_nosec+0x22/0x24 [ 452.064012] [] sock_sendmsg+0xa1/0xb6 [ 452.064012] [] ? __schedule+0x5c1/0x616 [ 452.064012] [] ? __dequeue_signal+0xb7/0x10c [ 452.064012] [] ? fget_light+0x75/0x89 [ 452.064012] [] ? sockfd_lookup_light+0x20/0x56 [ 452.064012] [] sys_sendto+0x10c/0x13b [ 452.064012] [] system_call_fastpath+0x16/0x1b [ 452.064012] Code: 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 8a 07 f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 55 48 [ 452.064012] Call Trace: [ 452.064012] [] _raw_spin_lock+0xe/0x10 [ 452.064012] [] spin_lock+0x9/0xb [ 452.064012] [] udp_queue_rcv_skb+0x186/0x269 [ 452.064012] [] __udp4_lib_rcv+0x297/0x4ae [ 452.064012] [] ? raw_rcv+0xe9/0xf0 [ 452.064012] [] udp_rcv+0x1a/0x1c [ 452.064012] [] ip_local_deliver_finish+0x12b/0x1a5 [ 452.064012] [] ip_local_deliver+0x53/0x84 [ 452.064012] [] ip_rcv_finish+0x2bc/0x2f3 [ 452.064012] [] ip_rcv+0x210/0x269 [ 452.064012] [] ? kvm_clock_get_cycles+0x9/0xb [ 452.064012] [] __netif_receive_skb+0x3a5/0x3f7 [ 452.064012] [] netif_receive_skb+0x57/0x5e [ 452.064012] [] ? __netdev_alloc_skb+0x1f/0x3b [ 452.064012] [] virtnet_poll+0x4ba/0x5a4 [virtio_net] [ 452.064012] [] net_rx_action+0x73/0x184 [ 452.064012] [] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [] __do_softirq+0xc3/0x1a8 [ 452.064012] [] ? ack_APIC_irq+0x10/0x12 [ 452.064012] [] ? _raw_spin_lock+0xe/0x10 [ 452.064012] [] call_softirq+0x1c/0x26 [ 452.064012] [] do_softirq+0x45/0x82 [ 452.064012] [] irq_exit+0x42/0x9c [ 452.064012] [] do_IRQ+0x8e/0xa5 [ 452.064012] [] common_interrupt+0x6e/0x6e [ 452.064012] [] ? kfree+0x8a/0xa3 [ 452.064012] [] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core] [ 452.064012] [] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core] [ 452.064012] [] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp] [ 452.064012] [] __sock_sendmsg_nosec+0x22/0x24 [ 452.064012] [] sock_sendmsg+0xa1/0xb6 [ 452.064012] [] ? __schedule+0x5c1/0x616 [ 452.064012] [] ? __dequeue_signal+0xb7/0x10c [ 452.064012] [] ? fget_light+0x75/0x89 [ 452.064012] [] ? sockfd_lookup_light+0x20/0x56 [ 452.064012] [] sys_sendto+0x10c/0x13b [ 452.064012] [] system_call_fastpath+0x16/0x1b Reported-by: François Cachereul Tested-by: François Cachereul Signed-off-by: Eric Dumazet Cc: James Chapman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 52f7719832d949815e989b18587bfcc67f6af64a Author: Christophe Gouault Date: Tue Oct 8 17:21:22 2013 +0200 vti: get rid of nf mark rule in prerouting [ Upstream commit 7263a5187f9e9de45fcb51349cf0e031142c19a1 ] This patch fixes and improves the use of vti interfaces (while lightly changing the way of configuring them). Currently: - it is necessary to identify and mark inbound IPsec packets destined to each vti interface, via netfilter rules in the mangle table at prerouting hook. - the vti module cannot retrieve the right tunnel in input since commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup is done with flag TUNNEL_NO_KEY, so there no chance to retrieve them. - the i_key is used by the outbound processing as a mark to lookup for the right SP and SA bundle. This patch uses the o_key to store the vti mark (instead of i_key) and enables: - to avoid the need for previously marking the inbound skbuffs via a netfilter rule. - to properly retrieve the right tunnel in input, only based on the IPsec packet outer addresses. - to properly perform an inbound policy check (using the tunnel o_key as a mark). - to properly perform an outbound SPD and SAD lookup (using the tunnel o_key as a mark). - to keep the current mark of the skbuff. The skbuff mark is neither used nor changed by the vti interface. Only the vti interface o_key is used. SAs have a wildcard mark. SPs have a mark equal to the vti interface o_key. The vti interface must be created as follows (i_key = 0, o_key = mark): ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1 The SPs attached to vti1 must be created as follows (mark = vti1 o_key): ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \ proto esp mode tunnel ip xfrm policy add dir in mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \ proto esp mode tunnel The SAs are created with the default wildcard mark. There is no distinction between global vs. vti SAs. Just their addresses will possibly link them to a vti interface: ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \ enc "cbc(aes)" "azertyuiopqsdfgh" ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \ enc "cbc(aes)" "sqbdhgqsdjqjsdfh" To avoid matching "global" (not vti) SPs in vti interfaces, global SPs should no use the default wildcard mark, but explicitly match mark 0. To avoid a double SPD lookup in input and output (in global and vti SPDs), the NOPOLICY and NOXFRM options should be set on the vti interfaces: echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm The outgoing traffic is steered to vti1 by a route via the vti interface: ip route add 192.168.0.0/16 dev vti1 The incoming IPsec traffic is steered to vti1 because its outer addresses match the vti1 tunnel configuration. Signed-off-by: Christophe Gouault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 07cc588696a658581bd4fb79d238267dfb4d3aef Author: Linus Lüssing Date: Sun Oct 20 00:58:57 2013 +0200 Revert "bridge: only expire the mdb entry when query is received" [ Upstream commit 454594f3b93a49ef568cd190c5af31376b105a7b ] While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang Signed-off-by: Linus Lüssing Reviewed-by: Vlad Yasevich Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9e393b6d30a4f56850fd506c9802da8fc633ac11 Author: Vlad Yasevich Date: Thu Oct 10 15:57:59 2013 -0400 bridge: update mdb expiration timer upon reports. [ Upstream commit f144febd93d5ee534fdf23505ab091b2b9088edc ] commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b bridge: only expire the mdb entry when query is received changed the mdb expiration timer to be armed only when QUERY is received. Howerver, this causes issues in an environment where the multicast server socket comes and goes very fast while a client is trying to send traffic to it. The root cause is a race where a sequence of LEAVE followed by REPORT messages can race against QUERY messages generated in response to LEAVE. The QUERY ends up starting the expiration timer, and that timer can potentially expire after the new REPORT message has been received signaling the new join operation. This leads to a significant drop in multicast traffic and possible complete stall. The solution is to have REPORT messages update the expiration timer on entries that already exist. Signed-off-by: Vlad Yasevich CC: Cong Wang CC: Herbert Xu CC: Stephen Hemminger Acked-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 073f9716b6ba92e3dabe87905601d39f0e52b8c5 Author: Marc Kleine-Budde Date: Mon Oct 7 23:19:58 2013 +0200 net: vlan: fix nlmsg size calculation in vlan_get_size() [ Upstream commit c33a39c575068c2ea9bffb22fd6de2df19c74b89 ] This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Cc: Patrick McHardy Signed-off-by: Marc Kleine-Budde Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 7c2506b622ec83eaf66d7ddf46e2188714d2ceb6 Author: Amir Vadai Date: Mon Oct 7 13:38:13 2013 +0200 net/mlx4_en: Fix pages never dma unmapped on rx [ Upstream commit 021f1107ffdae7a82af6c53f4c52654062e365c6 ] This patch fixes a bug introduced by commit 51151a16 (mlx4: allow order-0 memory allocations in RX path). dma_unmap_page never reached because condition to detect last fragment in page is wrong. offset+frag_stride can't be greater than size, need to make sure no additional frag will fit in page => compare offset + frag_stride + next_frag_size instead. next_frag_size is the same as the current one, since page is shared only with frags of the same size. CC: Eric Dumazet Signed-off-by: Amir Vadai Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 242ff63bf498150fbddaefc75cf52250c6c15fb4 Author: Amir Vadai Date: Mon Oct 7 13:38:12 2013 +0200 net/mlx4_en: Rename name of mlx4_en_rx_alloc members [ Upstream commit 70fbe0794393829d9acd686428d87c27b6f6984b ] Add page prefix to page related members: @size and @offset into @page_size and @page_offset CC: Eric Dumazet Signed-off-by: Amir Vadai Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a4626bf64790a725902a02f3979916f9342c0c34 Author: Paul Durrant Date: Tue Oct 8 14:56:44 2013 +0100 xen-netback: Don't destroy the netdev until the vif is shut down [ upstream commit id: 279f438e36c0a70b23b86d2090aeec50155034a9 ] Without this patch, if a frontend cycles through states Closing and Closed (which Windows frontends need to do) then the netdev will be destroyed and requires re-invocation of hotplug scripts to restore state before the frontend can move to Connected. Thus when udev is not in use the backend gets stuck in InitWait. With this patch, the netdev is left alone whilst the backend is still online and is only de-registered and freed just prior to destroying the vif (which is also nicely symmetrical with the netdev allocation and registration being done during probe) so no re-invocation of hotplug scripts is required. Signed-off-by: Paul Durrant Cc: David Vrabel Cc: Wei Liu Cc: Ian Campbell Signed-off-by: Greg Kroah-Hartman commit c298b2e28ec54a4be740fa4e0811a2bdd70bf545 Author: Fabio Estevam Date: Sat Oct 5 17:56:59 2013 -0300 net: secure_seq: Fix warning when CONFIG_IPV6 and CONFIG_INET are not selected [ Upstream commit cb03db9d0e964568407fb08ea46cc2b6b7f67587 ] net_secret() is only used when CONFIG_IPV6 or CONFIG_INET are selected. Building a defconfig with both of these symbols unselected (Using the ARM at91sam9rl_defconfig, for example) leads to the following build warning: $ make at91sam9rl_defconfig # # configuration written to .config # $ make net/core/secure_seq.o scripts/kconfig/conf --silentoldconfig Kconfig CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h make[1]: `include/generated/mach-types.h' is up to date. CALL scripts/checksyscalls.sh CC net/core/secure_seq.o net/core/secure_seq.c:17:13: warning: 'net_secret_init' defined but not used [-Wunused-function] Fix this warning by protecting the definition of net_secret() with these symbols. Reported-by: Olof Johansson Signed-off-by: Fabio Estevam Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit c5c06751157187520aca835756df95506118a4e7 Author: Marc Kleine-Budde Date: Sat Oct 5 21:25:17 2013 +0200 can: dev: fix nlmsg size calculation in can_get_size() [ Upstream commit fe119a05f8ca481623a8d02efcc984332e612528 ] This patch fixes the calculation of the nlmsg size, by adding the missing nla_total_size(). Signed-off-by: Marc Kleine-Budde Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d02f1e02342ddebc6fb2e583e3a7e6f9af65a515 Author: Jiri Benc Date: Fri Oct 4 17:04:48 2013 +0200 ipv4: fix ineffective source address selection [ Upstream commit 0a7e22609067ff524fc7bbd45c6951dd08561667 ] When sending out multicast messages, the source address in inet->mc_addr is ignored and rewritten by an autoselected one. This is caused by a typo in commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output route lookups"). Signed-off-by: Jiri Benc Acked-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 067cb429b87e651f83329ef4a86b997c0688cb81 Author: Mathias Krause Date: Mon Sep 30 22:03:06 2013 +0200 proc connector: fix info leaks [ Upstream commit e727ca82e0e9616ab4844301e6bae60ca7327682 ] Initialize event_data for all possible message types to prevent leaking kernel stack contents to userland (up to 20 bytes). Also set the flags member of the connector message to 0 to prevent leaking two more stack bytes this way. Signed-off-by: Mathias Krause Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit cef1e66ef92e7821e58e7dc32a2b764541a74ff4 Author: Willem de Bruijn Date: Tue Oct 22 10:59:18 2013 -0400 sit: amend "allow to use rtnl ops on fb tunnel" Amend backport to 3.11.y of [ Upstream commit 205983c43700ac3a81e7625273a3fa83cd2759b5 ] The discussion thread in the upstream commit mentions that in backports to stable-* branches, the line - unregister_netdevice_queue(sitn->fb_tunnel_dev, &list); must be omitted if that branch does not have commit 5e6700b3bf98 ("sit: add support of x-netns"). This line has correctly been omitted in the backport to 3.10, which indeed does not have that commit. It was also removed in the backport to 3.11.y, which does have that commit. This causes the following steps to hit a BUG at net/core/dev.c:5039: `modprobe sit; rmmod sit` The bug demonstrates that it causes a device to be unregistered twice. The simple fix is to apply the one line in the upstream commit that was dropped in the backport to 3.11 (3783100374653e2e7fbdf68c710f5). This brings the logic in line with upstream linux, net and net-next branches. Signed-off-by: Willem de Bruijn Acked-by: Nicolas Dichtel Reviewed-by: Veaceslav Falico Signed-off-by: Greg Kroah-Hartman commit 3365bb990a76d4d7dd4b36202aa7ade95bde7c70 Author: Dan Carpenter Date: Thu Oct 3 00:27:20 2013 +0300 net: heap overflow in __audit_sockaddr() [ Upstream commit 1661bf364ae9c506bc8795fef70d1532931be1e8 ] We need to cap ->msg_namelen or it leads to a buffer overflow when we to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to exploit this bug. The call tree is: ___sys_recvmsg() move_addr_to_user() audit_sockaddr() __audit_sockaddr() Reported-by: Jüri Aedla Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e795ffb6551b0c3b52233e585cef96879942638c Author: Sebastian Hesselbarth Date: Wed Oct 2 12:57:21 2013 +0200 net: mv643xx_eth: fix orphaned statistics timer crash [ Upstream commit f564412c935111c583b787bcc18157377b208e2e ] The periodic statistics timer gets started at port _probe() time, but is stopped on _stop() only. In a modular environment, this can cause the timer to access already deallocated memory, if the module is unloaded without starting the eth device. To fix this, we add the timer right before the port is started, instead of at _probe() time. Signed-off-by: Sebastian Hesselbarth Acked-by: Jason Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit d3ba0446d7c89ff8f1e66a3be9f5e9e01a1eaa50 Author: Sebastian Hesselbarth Date: Wed Oct 2 12:57:20 2013 +0200 net: mv643xx_eth: update statistics timer from timer context only [ Upstream commit 041b4ddb84989f06ff1df0ca869b950f1ee3cb1c ] Each port driver installs a periodic timer to update port statistics by calling mib_counters_update. As mib_counters_update is also called from non-timer context, we should not reschedule the timer there but rather move it to timer-only context. Signed-off-by: Sebastian Hesselbarth Acked-by: Jason Cooper Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit f94b591b19d4c7c5d3aea008d1df8ec3d0abdd84 Author: David S. Miller Date: Tue Oct 8 15:44:26 2013 -0400 l2tp: Fix build warning with ipv6 disabled. [ Upstream commit 8d8a51e26a6d415e1470759f2cf5f3ee3ee86196 ] net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’: net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable] Create a helper "l2tp_tunnel()" to facilitate this, and as a side effect get rid of a bunch of unnecessary void pointer casts. Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 84ee5cbb9a39a879806d79e0b4812c332418e52a Author: François CACHEREUL Date: Wed Oct 2 10:16:02 2013 +0200 l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses [ Upstream commit e18503f41f9b12132c95d7c31ca6ee5155e44e5c ] IPv4 mapped addresses cause kernel panic. The patch juste check whether the IPv6 address is an IPv4 mapped address. If so, use IPv4 API instead of IPv6. [ 940.026915] general protection fault: 0000 [#1] [ 940.026915] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppox ppp_generic slhc loop psmouse [ 940.026915] CPU: 0 PID: 3184 Comm: memcheck-amd64- Not tainted 3.11.0+ #1 [ 940.026915] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 940.026915] task: ffff880007130e20 ti: ffff88000737e000 task.ti: ffff88000737e000 [ 940.026915] RIP: 0010:[] [] ip6_xmit+0x276/0x326 [ 940.026915] RSP: 0018:ffff88000737fd28 EFLAGS: 00010286 [ 940.026915] RAX: c748521a75ceff48 RBX: ffff880000c30800 RCX: 0000000000000000 [ 940.026915] RDX: ffff88000075cc4e RSI: 0000000000000028 RDI: ffff8800060e5a40 [ 940.026915] RBP: ffff8800060e5a40 R08: 0000000000000000 R09: ffff88000075cc90 [ 940.026915] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000737fda0 [ 940.026915] R13: 0000000000000000 R14: 0000000000002000 R15: ffff880005d3b580 [ 940.026915] FS: 00007f163dc5e800(0000) GS:ffffffff81623000(0000) knlGS:0000000000000000 [ 940.026915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 940.026915] CR2: 00000004032dc940 CR3: 0000000005c25000 CR4: 00000000000006f0 [ 940.026915] Stack: [ 940.026915] ffff88000075cc4e ffffffff81694e90 ffff880000c30b38 0000000000000020 [ 940.026915] 11000000523c4bac ffff88000737fdb4 0000000000000000 ffff880000c30800 [ 940.026915] ffff880005d3b580 ffff880000c30b38 ffff8800060e5a40 0000000000000020 [ 940.026915] Call Trace: [ 940.026915] [] ? inet6_csk_xmit+0xa4/0xc4 [ 940.026915] [] ? l2tp_xmit_skb+0x503/0x55a [l2tp_core] [ 940.026915] [] ? pskb_expand_head+0x161/0x214 [ 940.026915] [] ? pppol2tp_xmit+0xf2/0x143 [l2tp_ppp] [ 940.026915] [] ? ppp_channel_push+0x36/0x8b [ppp_generic] [ 940.026915] [] ? ppp_write+0xaf/0xc5 [ppp_generic] [ 940.026915] [] ? vfs_write+0xa2/0x106 [ 940.026915] [] ? SyS_write+0x56/0x8a [ 940.026915] [] ? system_call_fastpath+0x16/0x1b [ 940.026915] Code: 00 49 8b 8f d8 00 00 00 66 83 7c 11 02 00 74 60 49 8b 47 58 48 83 e0 fe 48 8b 80 18 01 00 00 48 85 c0 74 13 48 8b 80 78 02 00 00 <48> ff 40 28 41 8b 57 68 48 01 50 30 48 8b 54 24 08 49 c7 c1 51 [ 940.026915] RIP [] ip6_xmit+0x276/0x326 [ 940.026915] RSP [ 940.057945] ---[ end trace be8aba9a61c8b7f3 ]--- [ 940.058583] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: François CACHEREUL Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 9ef3f9ace58ea785564189b5eb6967ef608abde7 Author: Matthias Schiffer Date: Fri Sep 27 18:03:39 2013 +0200 batman-adv: set up network coding packet handlers during module init [ Upstream commit 6c519bad7b19a2c14a075b400edabaa630330123 ] batman-adv saves its table of packet handlers as a global state, so handlers must be set up only once (and setting them up a second time will fail). The recently-added network coding support tries to set up its handler each time a new softif is registered, which obviously fails when more that one softif is used (and in consequence, the softif creation fails). Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called only once) and batadv_nc_mesh_init (which is called for each softif); in addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming consistent. Signed-off-by: Matthias Schiffer Signed-off-by: Marek Lindner Signed-off-by: Antonio Quartulli Signed-off-by: Greg Kroah-Hartman commit 8c89fd5551834088ebb4621921318b95594d4b31 Author: Eric Dumazet Date: Tue Oct 1 21:04:11 2013 -0700 net: do not call sock_put() on TIMEWAIT sockets [ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ] commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets. We should instead use inet_twsk_put() Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 39fc1ed53e50b5a8b6f8a9da7867d214760be19d Author: Yuchung Cheng Date: Sat Oct 12 10:16:27 2013 -0700 tcp: fix incorrect ca_state in tail loss probe [ Upstream commit 031afe4990a7c9dbff41a3a742c44d3e740ea0a1 ] On receiving an ACK that covers the loss probe sequence, TLP immediately sets the congestion state to Open, even though some packets are not recovered and retransmisssion are on the way. The later ACks may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g., https://bugzilla.redhat.com/show_bug.cgi?id=989251 The fix is to follow the similar procedure in recovery by calling tcp_try_keep_open(). The sender switches to Open state if no packets are retransmissted. Otherwise it goes to Disorder and let subsequent ACKs move the state to Recovery or Open. Reported-By: Michael Sterrett Tested-By: Dormando Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dc0791aee672ca2fb8bd61e32a66f77d01bcafc8 Author: Eric Dumazet Date: Fri Oct 4 10:31:41 2013 -0700 tcp: do not forget FIN in tcp_shifted_skb() [ Upstream commit 5e8a402f831dbe7ee831340a91439e46f0d38acd ] Yuchung found following problem : There are bugs in the SACK processing code, merging part in tcp_shift_skb_data(), that incorrectly resets or ignores the sacked skbs FIN flag. When a receiver first SACK the FIN sequence, and later throw away ofo queue (e.g., sack-reneging), the sender will stop retransmitting the FIN flag, and hangs forever. Following packetdrill test can be used to reproduce the bug. $ cat sack-merge-bug.pkt `sysctl -q net.ipv4.tcp_fack=0` // Establish a connection and send 10 MSS. 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +.000 bind(3, ..., ...) = 0 +.000 listen(3, 1) = 0 +.050 < S 0:0(0) win 32792 +.000 > S. 0:0(0) ack 1 +.001 < . 1:1(0) ack 1 win 1024 +.000 accept(3, ..., ...) = 4 +.100 write(4, ..., 12000) = 12000 +.000 shutdown(4, SHUT_WR) = 0 +.000 > . 1:10001(10000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 +.000 > FP. 10001:12001(2000) ack 1 +.050 < . 1:1(0) ack 2001 win 257 +.050 < . 1:1(0) ack 2001 win 257 // SACK reneg +.050 < . 1:1(0) ack 12001 win 257 +0 %{ print "unacked: ",tcpi_unacked }% +5 %{ print "" }% First, a typo inverted left/right of one OR operation, then code forgot to advance end_seq if the merged skb carried FIN. Bug was added in 2.6.29 by commit 832d11c5cd076ab ("tcp: Try to restore large SKBs while SACK processing") Signed-off-by: Eric Dumazet Signed-off-by: Yuchung Cheng Acked-by: Neal Cardwell Cc: Ilpo Järvinen Acked-by: Ilpo Järvinen Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 18ddf5127c9f3ba842bb9e7228fafee6abbb6d6e Author: Eric Dumazet Date: Tue Oct 15 11:54:30 2013 -0700 tcp: must unclone packets before mangling them [ Upstream commit c52e2421f7368fd36cbe330d2cf41b10452e39a9 ] TCP stack should make sure it owns skbs before mangling them. We had various crashes using bnx2x, and it turned out gso_size was cleared right before bnx2x driver was populating TC descriptor of the _previous_ packet send. TCP stack can sometime retransmit packets that are still in Qdisc. Of course we could make bnx2x driver more robust (using ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack. We have identified two points where skb_unclone() was needed. This patch adds a WARN_ON_ONCE() to warn us if we missed another fix of this kind. Kudos to Neal for finding the root cause of this bug. Its visible using small MSS. Signed-off-by: Eric Dumazet Signed-off-by: Neal Cardwell Cc: Yuchung Cheng Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 80bd5d8968d8d2813d05ff485127f45961745cc9 Author: Eric Dumazet Date: Fri Sep 27 03:28:54 2013 -0700 tcp: TSQ can use a dynamic limit [ Upstream commit c9eeec26e32e087359160406f96e0949b3cc6f10 ] When TCP Small Queues was added, we used a sysctl to limit amount of packets queues on Qdisc/device queues for a given TCP flow. Problem is this limit is either too big for low rates, or too small for high rates. Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO auto sizing, it can better control number of packets in Qdisc/device queues. New limit is two packets or at least 1 to 2 ms worth of packets. Low rates flows benefit from this patch by having even smaller number of packets in queues, allowing for faster recovery, better RTT estimations. High rates flows benefit from this patch by allowing more than 2 packets in flight as we had reports this was a limiting factor to reach line rate. [ In particular if TX completion is delayed because of coalescing parameters ] Example for a single flow on 10Gbp link controlled by FQ/pacing 14 packets in flight instead of 2 $ tc -s -d qd qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 1024 quantum 3028 initial_quantum 15140 Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0 requeues 6822476) rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476 2047 flow, 2046 inactive, 1 throttled, delay 15673 ns 2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit Note that sk_pacing_rate is currently set to twice the actual rate, but this might be refined in the future when a flow is in congestion avoidance. Additional change : skb->destructor should be set to tcp_wfree(). A future patch (for linux 3.13+) might remove tcp_limit_output_bytes Signed-off-by: Eric Dumazet Cc: Wei Liu Cc: Cong Wang Cc: Yuchung Cheng Cc: Neal Cardwell Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit dbeb18b221975144a8bd9258c7dba4b0d7aa3a31 Author: Eric Dumazet Date: Tue Aug 27 05:46:32 2013 -0700 tcp: TSO packets automatic sizing [ Upstream commits 6d36824e730f247b602c90e8715a792003e3c5a7, 02cf4ebd82ff0ac7254b88e466820a290ed8289a, and parts of 7eec4174ff29cd42f2acfae8112f51c228545d40 ] After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet Cc: Neal Cardwell Cc: Yuchung Cheng Cc: Van Jacobson Cc: Tom Herbert Acked-by: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman