linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-22 17:29:21 +07:00

Author	SHA1	Message	Date
Eric W. Biederman	2ce9550350	sctp: Make the ctl_sock per network namespace - Kill sctp_get_ctl_sock, it is useless now. - Pass struct net where needed so net->sctp.ctl_sock is accessible. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:17:26 -07:00
Eric W. Biederman	4db67e8086	sctp: Make the address lists per network namespace - Move the address lists into struct net - Add per network namespace initialization and cleanup - Pass around struct net so it is everywhere I need it. - Rename all of the global variable references into references to the variables moved into struct net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 23:12:17 -07:00
Eric W. Biederman	4110cc255d	sctp: Make the association hashtable handle multiple network namespaces - Use struct net in the hash calculation - Use sock_net(association.base.sk) in the association lookups. - On receive calculate the network namespace from skb->dev. - Pass struct net from receive down to the functions that actually do the association lookup. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	4cdadcbcb6	sctp: Make the endpoint hashtable handle multiple network namespaces - Use struct net in the hash calculation - Use sock_net(endpoint.base.sk) in the endpoint lookups. - On receive calculate the network namespace from skb->dev. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	f1f4376307	sctp: Make the port hash table use struct net in it's key. - Add struct net into the port hash table hash calculation - Add struct net inot the struct sctp_bind_bucket so there is a memory of which network namespace a port is allocated in. No need for a ref count because sctp_bind_bucket only exists when there are sockets in the hash table and sockets can not change their network namspace, and sockets already ref count their network namespace. - Add struct net into the key comparison when we are testing to see if we have found the port hash table entry we are looking for. With these changes lookups in the port hash table becomes safe to use in multiple network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 22:44:12 -07:00
Eric W. Biederman	26711a791e	userns: xt_owner: Add basic user namespace support. - Only allow adding matches from the initial user namespace - Add the appropriate conversion functions to handle matches against sockets in other user namespaces. Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Patrick McHardy <kaber@trash.net> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:30 -07:00
Eric W. Biederman	da7428080a	userns xt_recent: Specify the owner/group of ip_list_perms in the initial user namespace xt_recent creates a bunch of proc files and initializes their uid and gids to the values of ip_list_uid and ip_list_gid. When initialize those proc files convert those values to kuids so they can continue to reside on the /proc inode. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jan Engelhardt <jengelh@medozas.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:29 -07:00
Eric W. Biederman	8c6e2a941a	userns: Convert xt_LOG to print socket kuids and kgids as uids and gids xt_LOG always writes messages via sb_add via printk. Therefore when xt_LOG logs the uid and gid of a socket a packet came from the values should be converted to be in the initial user namespace. Thus making xt_LOG as user namespace safe as possible. Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:29 -07:00
Eric W. Biederman	a6c6796c71	userns: Convert cls_flow to work with user namespaces enabled The flow classifier can use uids and gids of the sockets that are transmitting packets and do insert those uids and gids into the packet classification calcuation. I don't fully understand the details but it appears that we can depend on specific uids and gids when making traffic classification decisions. To work with user namespaces enabled map from kuids and kgids into uids and gids in the initial user namespace giving raw integer values the code can play with and depend on. To avoid issues of userspace depending on uids and gids in packet classifiers installed from other user namespaces and getting confused deny all packet classifiers that use uids or gids that are not comming from a netlink socket in the initial user namespace. Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Changli Gao <xiaosuo@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:28 -07:00
Eric W. Biederman	af4c6641f5	net sched: Pass the skb into change so it can access NETLINK_CB cls_flow.c plays with uids and gids. Unless I misread that code it is possible for classifiers to depend on the specific uid and gid values. Therefore I need to know the user namespace of the netlink socket that is installing the packet classifiers. Pass in the rtnetlink skb so I can access the NETLINK_CB of the passed packet. In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk). Pass in not the user namespace but the incomming rtnetlink skb into the the classifier change routines as that is generally the more useful parameter. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:28 -07:00
Eric W. Biederman	9eea9515cb	userns: nfnetlink_log: Report socket uids in the log sockets user namespace At logging instance creation capture the peer netlink socket's user namespace. Use the captured peer user namespace when reporting socket uids to the peer. The peer socket's user namespace is guaranateed to be valid until the user closes the netlink socket. nfnetlink_log removes instances during the final close of a socket. __build_packet_message does not get called after an instance is destroyed. Therefore it is safe to let the peer netlink socket take care of the user namespace reference counting for us. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:27 -07:00
Eric W. Biederman	d06ca95643	userns: Teach inet_diag to work with user namespaces Compute the user namespace of the socket that we are replying to and translate the kuids of reported sockets into that user namespace. Cc: Andrew Vagin <avagin@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:55:20 -07:00
Eric W. Biederman	3fbc290540	netlink: Make the sending netlink socket availabe in NETLINK_CB The sending socket of an skb is already available by it's port id in the NETLINK_CB. If you want to know more like to examine the credentials on the sending socket you have to look up the sending socket by it's port id and all of the needed functions and data structures are static inside of af_netlink.c. So do the simple thing and pass the sending socket to the receivers in the NETLINK_CB. I intend to use this to get the user namespace of the sending socket in inet_diag so that I can report uids in the context of the process who opened the socket, the same way I report uids in the contect of the process who opens files. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:49:49 -07:00
Eric W. Biederman	d13fda8564	userns: Convert net/ax25 to use kuid_t where appropriate Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:49:42 -07:00
Eric W. Biederman	4f82f45730	net ip6 flowlabel: Make owner a union of struct pid * and kuid_t Correct a long standing omission and use struct pid in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS. This guarantees we don't have issues when pid wraparound occurs. Use a kuid_t in the owner field of struct ip6_flowlabel when the share type is IPV6_FL_S_USER to add user namespace support. In /proc/net/ip6_flowlabel capture the current pid namespace when opening the file and release the pid namespace when the file is closed ensuring we print the pid owner value that is meaning to the reader of the file. Similarly use from_kuid_munged to print uid values that are meaningful to the reader of the file. This requires exporting pid_nr_ns so that ipv6 can continue to built as a module. Yoiks what silliness Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:49:25 -07:00
Eric W. Biederman	7064d16e16	userns: Use kgids for sysctl_ping_group_range - Store sysctl_ping_group_range as a paire of kgid_t values instead of a pair of gid_t values. - Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range - For invalid cases reset to the default disabled state. With the kgid_t conversion made part of the original value sanitation from userspace understand how the code will react becomes clearer and it becomes possible to set the sysctl ping group range from something other than the initial user namespace. Cc: Vasiliy Kulikov <segoon@openwall.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:49:10 -07:00
Eric W. Biederman	a7cb5a49bf	userns: Print out socket uids in a user namespace aware fashion. Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:48:06 -07:00
Eric W. Biederman	976d020150	userns: Convert sock_i_uid to return a kuid_t Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:47:34 -07:00
Eric W. Biederman	d04a48b06d	userns: Convert __dev_set_promiscuity to use kuids in audit logs Cc: Klaus Heinrich Kiwi <klausk@br.ibm.com> Cc: Eric Paris <eparis@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2012-08-14 21:42:40 -07:00
Eric W. Biederman	b2e4f544fd	userns: Convert net/core/scm.c to use kuids and kgids With the existence of kuid_t and kgid_t we can take this further and remove the usage of struct cred altogether, ensuring we don't get cache line misses from reference counts. For now however start simply and do a straight forward conversion I can be certain is correct. In cred_to_ucred use from_kuid_munged and from_kgid_munged as these values are going directly to userspace and we want to use the userspace safe values not -1 when reporting a value that does not map. The earlier conversion that used from_kuid was buggy in that respect. Oops. Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-08-14 21:41:58 -07:00
David S. Miller	7bab3ae760	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Alexey Khoroshilov provides a potential memory leak in rndis_wlan. Bob Copeland gives us an ath5k fix for a lockdep problem. Dan Carpenter fixes a signedness mismatch in at76c50x. Felix Fietkau corrects a regression caused by an earlier commit that can lead to an IRQ storm. Lorenzo Bianconi offers a fix for a bad variable initialization in ath9k that can cause it to improperly mark decrypted frames. Rajkumar Manoharan fixes ath9k to prevent the btcoex time from running when the hardware is asleep. The remainder are Bluetooth fixes, about which Gustavo says: "Here goes some fixes for 3.6-rc1, there are a few fix to thte inquiry code by Ram Malovany, support for 2 new devices, and few others fixes for NULL dereference, possible deadlock and a memory leak." ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 17:03:22 -07:00
Ben Hutchings	4acd4945cd	ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock Cong Wang reports that lockdep detected suspicious RCU usage while enabling IPV6 forwarding: [ 1123.310275] =============================== [ 1123.442202] [ INFO: suspicious RCU usage. ] [ 1123.558207] 3.6.0-rc1+ #109 Not tainted [ 1123.665204] ------------------------------- [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section! [ 1123.992320] [ 1123.992320] other info that might help us debug this: [ 1123.992320] [ 1124.307382] [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0 [ 1124.522220] 2 locks held by sysctl/5710: [ 1124.648364] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17 [ 1124.882211] #1: (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29 [ 1125.085209] [ 1125.085209] stack backtrace: [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109 [ 1125.441291] Call Trace: [ 1125.545281] [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112 [ 1125.667212] [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47 [ 1125.781838] [<ffffffff8107c260>] __might_sleep+0x1e/0x19b [...] [ 1127.445223] [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f [...] [ 1127.772188] [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b [ 1127.885174] [<ffffffff81872d26>] dev_forward_change+0x30/0xcb [ 1128.013214] [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5 [...] addrconf_forward_change() uses RCU iteration over the netdev list, which is unnecessary since it already holds the RTNL lock. We also cannot reasonably require netdevice notifier functions not to sleep. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 17:02:12 -07:00
Pavel Emelyanov	eea68e2f1a	packet: Report socket mclist info via diag module The info is reported as an array of packet_diag_mclist structures. Each includes not only the directly configured values (index, type, etc), but also the "count". Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:56:33 -07:00
Pavel Emelyanov	8a360be0c5	packet: Report more packet sk info via diag module This reports in one rtattr message all the other scalar values, that can be set on a packet socket with setsockopt. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:56:33 -07:00
Pavel Emelyanov	96ec632714	packet: Diag core and basic socket info dumping The diag module can be built independently from the af_packet.ko one, just like it's done in unix sockets. The core dumping message carries the info available at socket creation time, i.e. family, type and protocol (in the same byte order as shown in the proc file). The socket inode number and cookie is reserved for future per-socket info retrieving. The per-protocol filtering is also reserved for future by requiring the sdiag_protocol to be zero. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:56:33 -07:00
Pavel Emelyanov	2787b04b6c	packet: Introduce net/packet/internal.h header The diag module will need to access some private packet_sock data, so move it to a header in advance. This file will be shared between the af_packet.c and the diag.c Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:56:33 -07:00
Ben Hutchings	aadf31de16	llc: Fix races between llc2 handler use and (un)registration When registering the handlers, any state they rely on must be completely initialised first. When unregistering, we must wait until they are definitely no longer running. llc_rcv() must also avoid reading the handler pointers again after checking for NULL. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:52:02 -07:00
Ben Hutchings	f4f8720feb	llc2: Call llc_station_exit() on llc2_init() failure path Otherwise the station packet handler will remain registered even though the module is unloaded. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:51:40 -07:00
Ben Hutchings	6024935f5f	llc2: Fix silent failure of llc_station_init() llc_station_init() creates and processes an event skb with no effect other than to change the state from DOWN to UP. Allocation failure is reported, but then ignored by its caller, llc2_init(). Remove this possibility by simply initialising the state as UP. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 16:51:18 -07:00
Igor Maravic	ad5b310228	net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf We've already found leaf, don't search for it again. Same is for fib leaf info. Signed-off-by: Igor Maravic <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 15:02:20 -07:00
Priyanka Jain	418a99ac6a	Replace rwlock on xfrm_policy_afinfo with rcu xfrm_policy_afinfo is read mosly data structure. Write on xfrm_policy_afinfo is done only at the time of configuration. So rwlocks can be safely replaced with RCU. RCUs usage optimizes the performance. Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:57:37 -07:00
Igor Maravic	4855d6f311	net: ipv6: proc: Fix error handling Fix error handling in case making of dir dev_snmp6 failes Signed-off-by: Igor Maravic <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:45:07 -07:00
Yan, Zheng	7bd86cc282	ipv4: Cache local output routes Commit `caacf05e5a` causes big drop of UDP loop back performance. The cause of the regression is that we do not cache the local output routes. Each time we send a datagram from unconnected UDP socket, the kernel allocates a dst_entry and adds it to the rt_uncached_list. It creates lock contention on the rt_uncached_lock. Reported-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:45:07 -07:00
Amerigo Wang	6bdb7fe310	netpoll: re-enable irq in poll_napi() napi->poll() needs IRQ enabled, so we have to re-enable IRQ before calling it. Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:33 -07:00
Amerigo Wang	689971b446	netpoll: handle vlan tags in netpoll tx and rx path Without this patch, I can't get netconsole logs remotely over vlan. The reason is probably we don't handle vlan tags in either netpoll tx or rx path. I am not sure if I use these vlan functions correctly, at least this patch works. Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	6eacf8ad8d	vlan: clean up vlan_dev_hard_start_xmit() Clean up vlan_dev_hard_start_xmit() function. Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	f3da38932b	vlan: clean up some variable names To be consistent, s/info/vlan/. Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Patrick McHardy <kaber@trash.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	e15c3c2294	netpoll: check netpoll tx status on the right device Although this doesn't matter actually, because netpoll_tx_running() doesn't use the parameter, the code will be more readable. For team_dev_queue_xmit() we have to move it down to avoid compile errors. Cc: David Miller <davem@davemloft.net> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:32 -07:00
Amerigo Wang	4e3828c4bf	bridge: use list_for_each_entry() in netpoll functions We don't delete 'p' from the list in the loop, so we can just use list_for_each_entry(). Cc: David Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:31 -07:00
Amerigo Wang	d30362c071	bridge: add some comments for NETDEV_RELEASE Add comments on why we don't notify NETDEV_RELEASE. Cc: David Miller <davem@davemloft.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:31 -07:00
Amerigo Wang	2899656b49	netpoll: take rcu_read_lock_bh() in netpoll_send_skb_on_dev() This patch fixes several problems in the call path of netpoll_send_skb_on_dev(): 1. Disable IRQ's before calling netpoll_send_skb_on_dev(). 2. All the callees of netpoll_send_skb_on_dev() should use rcu_dereference_bh() to dereference ->npinfo. 3. Rename arp_reply() to netpoll_arp_reply(), the former is too generic. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:31 -07:00
Amerigo Wang	57c5d46191	netpoll: take rcu_read_lock_bh() in netpoll_rx() In __netpoll_rx(), it dereferences ->npinfo without rcu_dereference_bh(), this patch fixes it by using the 'npinfo' passed from netpoll_rx() where it is already dereferenced with rcu_dereference_bh(). Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:31 -07:00
Amerigo Wang	38e6bc185d	netpoll: make __netpoll_cleanup non-block Like the previous patch, slave_disable_netpoll() and __netpoll_cleanup() may be called with read_lock() held too, so we should make them non-block, by moving the cleanup and kfree() to call_rcu_bh() callbacks. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
Amerigo Wang	47be03a28c	netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup() slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so should use GFP_ATOMIC to allocate memory. Eric suggested to pass gfp flags to __netpoll_setup(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:33:30 -07:00
xeb@mail.ru	c12b395a46	gre: Support GRE over IPv6 GRE over IPv6 implementation. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:28:32 -07:00
Amerigo Wang	b7bc2a5b5b	net: remove netdev_bonding_change() I don't see any benifits to use netdev_bonding_change() than using call_netdevice_notifiers() directly. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:28:24 -07:00
Amerigo Wang	ee89bab14e	net: move and rename netif_notify_peers() I believe net/core/dev.c is a better place for netif_notify_peers(), because other net event notify functions also stay in this file. And rename it to netdev_notify_peers(). Cc: David S. Miller <davem@davemloft.net> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-14 14:28:23 -07:00
John W. Linville	1e55217e17	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2012-08-14 14:42:54 -04:00
Pablo Neira Ayuso	68e035c950	netfilter: ctnetlink: fix missing locking while changing conntrack from nfqueue Since `9cb017665` netfilter: add glue code to integrate nfnetlink_queue and ctnetlink, we can modify the conntrack entry via nfnl_queue. However, the change of the conntrack entry via nfnetlink_queue requires appropriate locking to avoid concurrent updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-14 12:54:45 +02:00
Wu Fengguang	19e303d67d	netfilter: PTR_RET can be used This quiets the coccinelle warnings: net/bridge/netfilter/ebtable_filter.c:107:1-3: WARNING: PTR_RET can be used net/bridge/netfilter/ebtable_nat.c:107:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_filter.c:65:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_mangle.c💯1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_raw.c:44:1-3: WARNING: PTR_RET can be used net/ipv6/netfilter/ip6table_security.c:62:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_filter.c:72:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_mangle.c:107:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_raw.c:51:1-3: WARNING: PTR_RET can be used net/ipv4/netfilter/iptable_security.c:70:1-3: WARNING: PTR_RET can be used Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-14 02:31:47 +02:00
Marco Porsch	df32381896	mac80211: fix unnecessary beacon update after peering status change ieee80211_bss_info_change_notify is called everytime a peer link is established or closed, because the accepting_plinks flag in the meshconf IE might have changed. With this patch the corresponding functions return the BSS_CHANGED_BEACON flag when a beacon update is necessary. Also it makes mesh_accept_plinks_update the common place to update the accepting_plinks flag. mesh_accept_plinks_update is called upon plink change and also periodically from ieee80211_mesh_housekeeping. Thus, it also picks up changes of local->num_sta. Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de> Acked-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-08-13 15:28:34 -04:00
danborkmann@iogearbox.net	7f5c3e3a80	af_packet: remove BUG statement in tpacket_destruct_skb Here's a quote of the comment about the BUG macro from asm-generic/bug.h: Don't use BUG() or BUG_ON() unless there's really no way out; one example might be detecting data structure corruption in the middle of an operation that can't be backed out of. If the (sub)system can somehow continue operating, perhaps with reduced functionality, it's probably not BUG-worthy. If you're tempted to BUG(), think again: is completely giving up really the only solution? There are usually better options, where users don't need to reboot ASAP and can mostly shut down cleanly. In our case, the status flag of a ring buffer slot is managed from both sides, the kernel space and the user space. This means that even though the kernel side might work as expected, the user space screws up and changes this flag right between the send(2) is triggered when the flag is changed to TP_STATUS_SENDING and a given skb is destructed after some time. Then, this will hit the BUG macro. As David suggested, the best solution is to simply remove this statement since it cannot be used for kernel side internal consistency checks. I've tested it and the system still behaves /stable/ in this case, so in accordance with the above comment, we should rather remove it. Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-12 13:42:17 -07:00
David S. Miller	69f1de1f7c	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== Here is a handful of fixes intended for 3.6. Daniel Drake offers a cfg80211 fix to consume pending events before taking a wireless device down. This prevents a resource leak. Stanislaw Gruszka gives us a fix for a NULL pointer dereference in rt61pci. Johannes Berg provides an iwlwifi patch to disable "greenfield" mode. Use of that mode was causing a rate scaling problem in for iwlwifi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-10 16:26:41 -07:00
Eric Dumazet	b5ec8eeac4	ipv4: fix ip_send_skb() ip_send_skb() can send orphaned skb, so we must pass the net pointer to avoid possible NULL dereference in error path. Bug added by commit `3a7c384ffd` (ipv4: tcp: unicast_sock should not land outside of TCP stack) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-10 14:08:57 -07:00
John W. Linville	57f784fed3	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next	2012-08-10 15:13:12 -04:00
John W. Linville	bbf2e65258	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth	2012-08-10 14:41:38 -04:00
John W. Linville	039aafba1b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem	2012-08-10 14:05:38 -04:00
Patrick McHardy	f22eb25cf5	netfilter: nf_nat_sip: fix via header translation with multiple parameters Via-headers are parsed beginning at the first character after the Via-address. When the address is translated first and its length decreases, the offset to start parsing at is incorrect and header parameters might be missed. Update the offset after translating the Via-address to fix this. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-10 11:53:18 +02:00
Patrick McHardy	02b69cbdc2	netfilter: nf_ct_sip: fix IPv6 address parsing Within SIP messages IPv6 addresses are enclosed in square brackets in most cases, with the exception of the "received=" header parameter. Currently the helper fails to parse enclosed addresses. This patch: - changes the SIP address parsing function to enforce square brackets when required, and accept them when not required but present, as recommended by RFC 5118. - adds a new SDP address parsing function that never accepts square brackets since SDP doesn't use them. With these changes, the SIP helper correctly parses all test messages from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages for Internet Protocol Version 6 (IPv6)). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-10 11:53:11 +02:00
Patrick McHardy	e9324b2ce6	netfilter: nf_ct_sip: fix helper name Commit `3a8fc53a` (netfilter: nf_ct_helper: allocate 16 bytes for the helper and policy names) introduced a bug in the SIP helper, the helper name is sprinted to the sip_names array instead of instead of into the helper structure. This breaks the helper match and the /proc/net/nf_conntrack_expect output. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2012-08-10 11:53:03 +02:00
Eric Dumazet	63d02d157e	net: tcp: ipv6_mapped needs sk_rx_dst_set method commit `5d299f3d3c` (net: ipv6: fix TCP early demux) added a regression for ipv6_mapped case. [ 67.422369] SELinux: initialized (dev autofs, type autofs), uses genfs_contexts [ 67.449678] SELinux: initialized (dev autofs, type autofs), uses genfs_contexts [ 92.631060] BUG: unable to handle kernel NULL pointer dereference at (null) [ 92.631435] IP: [< (null)>] (null) [ 92.631645] PGD 0 [ 92.631846] Oops: 0010 [#1] SMP [ 92.632095] Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath dm_mod video sbs sbshc battery ac lp parport sg snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device pcspkr snd_pcm_oss snd_mixer_oss snd_pcm snd_timer serio_raw button floppy snd i2c_i801 i2c_core soundcore snd_page_alloc shpchp ide_cd_mod cdrom microcode ehci_hcd ohci_hcd uhci_hcd [ 92.634294] CPU 0 [ 92.634294] Pid: 4469, comm: sendmail Not tainted 3.6.0-rc1 #3 [ 92.634294] RIP: 0010:[<0000000000000000>] [< (null)>] (null) [ 92.634294] RSP: 0018:ffff880245fc7cb0 EFLAGS: 00010282 [ 92.634294] RAX: ffffffffa01985f0 RBX: ffff88024827ad00 RCX: 0000000000000000 [ 92.634294] RDX: 0000000000000218 RSI: ffff880254735380 RDI: ffff88024827ad00 [ 92.634294] RBP: ffff880245fc7cc8 R08: 0000000000000001 R09: 0000000000000000 [ 92.634294] R10: 0000000000000000 R11: ffff880245fc7bf8 R12: ffff880254735380 [ 92.634294] R13: ffff880254735380 R14: 0000000000000000 R15: 7fffffffffff0218 [ 92.634294] FS: 00007f4516ccd6f0(0000) GS:ffff880256600000(0000) knlGS:0000000000000000 [ 92.634294] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 92.634294] CR2: 0000000000000000 CR3: 0000000245ed1000 CR4: 00000000000007f0 [ 92.634294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 92.634294] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 92.634294] Process sendmail (pid: 4469, threadinfo ffff880245fc6000, task ffff880254b8cac0) [ 92.634294] Stack: [ 92.634294] ffffffff813837a7 ffff88024827ad00 ffff880254b6b0e8 ffff880245fc7d68 [ 92.634294] ffffffff81385083 00000000001d2680 ffff8802547353a8 ffff880245fc7d18 [ 92.634294] ffffffff8105903a ffff88024827ad60 0000000000000002 00000000000000ff [ 92.634294] Call Trace: [ 92.634294] [<ffffffff813837a7>] ? tcp_finish_connect+0x2c/0xfa [ 92.634294] [<ffffffff81385083>] tcp_rcv_state_process+0x2b6/0x9c6 [ 92.634294] [<ffffffff8105903a>] ? sched_clock_cpu+0xc3/0xd1 [ 92.634294] [<ffffffff81059073>] ? local_clock+0x2b/0x3c [ 92.634294] [<ffffffff8138caf3>] tcp_v4_do_rcv+0x63a/0x670 [ 92.634294] [<ffffffff8133278e>] release_sock+0x128/0x1bd [ 92.634294] [<ffffffff8139f060>] __inet_stream_connect+0x1b1/0x352 [ 92.634294] [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f [ 92.634294] [<ffffffff8104b333>] ? wake_up_bit+0x25/0x25 [ 92.634294] [<ffffffff813325f5>] ? lock_sock_nested+0x74/0x7f [ 92.634294] [<ffffffff8139f223>] ? inet_stream_connect+0x22/0x4b [ 92.634294] [<ffffffff8139f234>] inet_stream_connect+0x33/0x4b [ 92.634294] [<ffffffff8132e8cf>] sys_connect+0x78/0x9e [ 92.634294] [<ffffffff813fd407>] ? sysret_check+0x1b/0x56 [ 92.634294] [<ffffffff81088503>] ? __audit_syscall_entry+0x195/0x1c8 [ 92.634294] [<ffffffff811cc26e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 92.634294] [<ffffffff813fd3e2>] system_call_fastpath+0x16/0x1b [ 92.634294] Code: Bad RIP value. [ 92.634294] RIP [< (null)>] (null) [ 92.634294] RSP <ffff880245fc7cb0> [ 92.634294] CR2: 0000000000000000 [ 92.648982] ---[ end trace 24e2bed94314c8d9 ]--- [ 92.649146] Kernel panic - not syncing: Fatal exception in interrupt Fix this using inet_sk_rx_dst_set(), and export this function in case IPv6 is modular. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 20:56:09 -07:00
Eric Dumazet	3a7c384ffd	ipv4: tcp: unicast_sock should not land outside of TCP stack commit `be9f4a44e7` (ipv4: tcp: remove per net tcp_sock) added a selinux regression, reported and bisected by John Stultz selinux_ip_postroute_compat() expect to find a valid sk->sk_security pointer, but this field is NULL for unicast_sock It turns out that unicast_sock are really temporary stuff to be able to reuse part of IP stack (ip_append_data()/ip_push_pending_frames()) Fact is that frames sent by ip_send_unicast_reply() should be orphaned to not fool LSM. Note IPv6 never had this problem, as tcp_v6_send_response() doesnt use a fake socket at all. I'll probably implement tcp_v4_send_response() to remove these unicast_sock in linux-3.7 Reported-by: John Stultz <johnstul@us.ibm.com> Bisected-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@parisplace.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 20:56:08 -07:00
Julian Anastasov	3654e61137	ipvs: add pmtu_disc option to disable IP DF for TUN packets Disabling PMTU discovery can increase the output packet rate but some users have enough resources and prefer to fragment than to drop traffic. By default, we copy the DF bit but if pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-08-10 10:35:07 +09:00
Julian Anastasov	f2edb9f770	ipvs: implement passive PMTUD for IPIP packets IPVS is missing the logic to update PMTU in routing for its IPIP packets. We monitor the dst_mtu and can return FRAG_NEEDED messages but if the tunneled packets get ICMP error we can not rely on other traffic to save the lowest MTU. The following patch adds ICMP handling for IPIP packets in incoming direction, from some remote host to our local IP used as saddr in the outer header. By this way we can forward any related ICMP traffic if it is for IPVS TUN connection. For the special case of PMTUD we update the routing and if client requested DF we can forward the error. To properly update the routing we have to bind the cached route (dest->dst_cache) to the selected saddr because ipv4_update_pmtu uses saddr for dst lookup. Add IP_VS_RT_MODE_CONNECT flag to force such binding with second route. Update ip_vs_tunnel_xmit to provide IP_VS_RT_MODE_CONNECT and change the code to copy DF. For now we prefer not to force PMTU discovery (outer DF=1) because we don't have configuration option to enable or disable PMTUD. As we do not keep any packets to resend, we prefer not to play games with packets without DF bit because the sender is not informed when they are rejected. Also, change ops->update_pmtu to be called only for local clients because there is no point to update MTU for input routes, in our case skb->dst->dev is lo. It seems the code is copied from ipip.c where the skb dst points to tunnel device. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-08-10 10:35:03 +09:00
Claudiu Ghioc	2b2d280817	ipvs: fixed sparse warning Removed the following sparse warnings, wether CONFIG_SYSCTL is defined or not: * warning: symbol 'ip_vs_control_net_init_sysctl' was not declared. Should it be static? * warning: symbol 'ip_vs_control_net_cleanup_sysctl' was not declared. Should it be static? Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-08-10 10:34:51 +09:00
Julian Anastasov	be97fdb5fb	ipvs: generalize app registration in netns Get rid of the ftp_app pointer and allow applications to be registered without adding fields in the netns_ipvs structure. v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-08-10 10:34:51 +09:00
Julian Anastasov	aaea4ed74d	ipvs: ip_vs_ftp depends on nf_conntrack_ftp helper The FTP application indirectly depends on the nf_conntrack_ftp helper for proper NAT support. If the module is not loaded, IPVS can resize the packets for the command connection, eg. PASV response but the SEQ adjustment logic in ipv4_confirm is not called without helper. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2012-08-10 10:34:51 +09:00
Pavel Emelyanov	1fb9489bf1	net: Loopback ifindex is constant now As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:18:07 -07:00
Pavel Emelyanov	aa79e66eee	net: Make ifindex generation per-net namespace Strictly speaking this is only _really_ required for checkpoint-restore to make loopback device always have the same index. This change appears to be safe wrt "ifindex should be unique per-system" concept, as all the ifindex usage is either already made per net namespace of is explicitly limited with init_net only. There are two cool side effects of this. The first one -- ifindices of devices in container are always small, regardless of how many containers we've started (and re-started) so far. The second one is -- we can speed up the loopback ifidex access as shown in the next patch. v2: Place ifindex right after dev_base_seq : avoid two holes and use the same cache line, dirtied in list_netdevice()/unlist_netdevice() Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:18:07 -07:00
Pavel Emelyanov	9c7dafbfab	net: Allow to create links with given ifindex Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index is not zero. I propose to allow requesting ifindices on link creation. This is required by the checkpoint-restore to correctly restore a net namespace (i.e. -- a container). Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:18:06 -07:00
Eric Dumazet	a399a80531	time: jiffies_delta_to_clock_t() helper to the rescue Various /proc/net files sometimes report crazy timer values, expressed in clock_t units. This happens when an expired timer delta (expires - jiffies) is passed to jiffies_to_clock_t(). This function has an overflow in : return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ); commit `cbbc719fcc` (time: Change jiffies_to_clock_t() argument type to unsigned long) only got around the problem. As we cant output negative values in /proc/net/tcp without breaking various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper that caps the negative delta to a 0 value. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: hank <pyu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 16:17:03 -07:00
Eric Dumazet	36471012e2	tcp: must free metrics at net dismantle We currently leak all tcp metrics at struct net dismantle time. tcp_net_metrics_exit() frees the hash table, we must first iterate it to free all metrics. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-09 02:31:37 -07:00
Alexey Khoroshilov	7364e445f6	net/core: Fix potential memory leak in dev_set_alias() Do not leak memory by updating pointer with potentially NULL realloc return value. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 16:06:23 -07:00
Jesper Juhl	155e4e12b9	batman-adv: Fix mem leak in the batadv_tt_local_event() function Memory is allocated for 'tt_change_node' with kmalloc(). 'tt_change_node' may go out of scope really being used for anything (except have a few members initialized) if we hit the 'del:' label. This patch makes sure we free the memory in that case. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 16:04:04 -07:00
Paolo Valente	be72f63b4c	sched: add missing group change to qfq_change_class [Resending again, as the text was corrupted by the email client] To speed up operations, QFQ internally divides classes into groups. Which group a class belongs to depends on the ratio between the maximum packet length and the weight of the class. Unfortunately the function qfq_change_class lacks the steps for changing the group of a class when the ratio max_pkt_len/weight of the class changes. For example, when the last of the following three commands is executed, the group of class 1:1 is not correctly changed: tc disc add dev XXX root handle 1: qfq tc class add dev XXX parent 1: qfq classid 1:1 weight 1 tc class change dev XXX parent 1: classid 1:1 qfq weight 4 Not changing the group of a class does not affect the long-term bandwidth guaranteed to the class, as the latter is independent of the maximum packet length, and correctly changes (only) if the weight of the class changes. In contrast, if the group of the class is not updated, the class is still guaranteed the short-term bandwidth and packet delay related to its old group, instead of the guarantees that it should receive according to its new weight and/or maximum packet length. This may also break service guarantees for other classes. This patch adds the missing operations. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 16:02:05 -07:00
Eric Dumazet	a37e6e3449	net: force dst_default_metrics to const section While investigating on network performance problems, I found this little gem : $ nm -v vmlinux \| grep -1 dst_default_metrics ffffffff82736540 b busy.46605 ffffffff82736560 B dst_default_metrics ffffffff82736598 b dst_busy_list Apparently, declaring a const array without initializer put it in (writeable) bss section, in middle of possibly often dirtied cache lines. Since we really want dst_default_metrics be const to avoid any possible false sharing and catch any buggy writes, I force a null initializer. ffffffff818a4c20 R dst_default_metrics Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 16:00:28 -07:00
Eric Dumazet	0c03eca3d9	net: fib: fix incorrect call_rcu_bh() After IP route cache removal, I believe rcu_bh() has very little use and we should remove this RCU variant, since it adds some cycles in fast path. Anyway, the call_rcu_bh() use in fib_true is obviously wrong, since some users only assert rcu_read_lock(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 15:57:46 -07:00
Ying Xue	99aa3473e6	af_packet: Quiet sparse noise about using plain integer as NULL pointer Quiets the sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-08 15:43:22 -07:00
Linus Torvalds	f4ba394c1b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Missed rcu_assign_pointer() in mac80211 scanning, from Johannes Berg. 2) Allow devices to limit the number of segments that an individual TCP TSO packet can use at a time, to deal with device and/or driver specific limitations. From Ben Hutchings. 3) Fix unexpected hard IPSEC expiration after setting the date. From Fan Du. 4) Memory leak fix in bxn2x driver, from Jesper Juhl. 5) Fix two memory leaks in libertas driver, from Daniel Drake. 6) Fix deref of out-of-range array index in packet scheduler generic actions layer. From Hiroaki SHIMODA. 7) Fix TX flow control errors in mlx4 driver, from Yevgeny Petrilin. 8) Fix CRIS eth_v10.c driver build, from Randy Dunlap. 9) Fix wrong SKB freeing in LLC protocol layer, from Sorin Dumitru. 10) The IP output path checks neigh lookup errors incorrectly, it needs to use IS_ERR(). From Vasiliy Kulikov. 11) An estimator leak leads to deref of freed memory in timer handler, fix from Hiroaki SHIMODA. 12) TCP early demux in ipv6 needs to use DST cookies in order to validate the RX route properly. Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits) net: ipv6: fix TCP early demux net: Use PTR_RET rather than if(IS_ERR(.. [1] net_sched: act: Delete estimator in error path. ip: fix error handling in ip_finish_output2() llc: free the right skb ixp4xx_eth: fix ptp_ixp46x build failure drivers/atm/iphase.c: fix error return code tcp_output: fix sparse warning for tcp_wfree drivers/net/phy/mdio-mux-gpio.c: drop devm_kfree of devm_kzalloc'd data batman-adv: select an internet gateway if none was chosen mISDN: Bugfix for layer2 fixed TEI mode igb: don't break user visible strings over multiple lines in igb_ethtool.c igb: correct hardware type (i210/i211) check in igb_loopback_test() igb: Fix for failure to init on some 82576 devices. cris: fix eth_v10.c build error cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN isdnloop: fix and simplify isdnloop_init() hyperv: Move wait completion msg code into rndis_filter_halt_device() net/mlx4_core: Remove port type restrictions net/mlx4_en: Fixing TX queue stop/wake flow ...	2012-08-08 20:06:43 +03:00
Eric Dumazet	79cda75a10	fib: use __fls() on non null argument __fls(x) is a bit faster than fls(x), granted we know x is non null. As Ben Hutchings pointed out, fls(x) = __fls(x) + 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-07 16:24:55 -07:00
Eric Dumazet	aae06bf5f9	tcp: ecn: dont delay ACKS after CE While playing with CoDel and ECN marking, I discovered a non optimal behavior of receiver of CE (Congestion Encountered) segments. In pathological cases, sender has reduced its cwnd to low values, and receiver delays its ACK (by 40 ms). While RFC 3168 6.1.3 (The TCP Receiver) doesn't explicitly recommend to send immediate ACKS, we believe its better to not delay ACKS, because a CE segment should give same signal than a dropped segment, and its quite important to reduce RTT to give ECE/CWR signals as fast as possible. Note we already call tcp_enter_quickack_mode() from TCP_ECN_check_ce() if we receive a retransmit, for the same reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 14:14:34 -07:00
Eric Dumazet	a9e050f4e7	net: tcp: GRO should be ECN friendly While doing TCP ECN tests, I discovered GRO was reordering packets if it receives one packet with CE set, while previous packets in same NAPI run have ECT(0) for the same flow : 09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags [DF], proto TCP (6), length 4396) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 4344 09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF], proto TCP (6), length 1500) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 1448 09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF], proto TCP (6), length 64) 172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f (incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val 1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0 We have two problems here : 1) GRO reorders packets If NIC gave packet1, then packet2, which happen to be from "different flows" GRO feeds stack with packet2, then packet1. I have yet to understand how to solve this problem. 2) GRO is not ECN friendly Delivering packets out of order makes TCP stack not as fast as it could be. In this patch I suggest we make the tos test not part of the 'same_flow' determination, but part of the 'should flush' logic Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:40:47 -07:00
Eric Dumazet	5d299f3d3c	net: ipv6: fix TCP early demux IPv6 needs a cookie in dst_check() call. We need to add rx_dst_cookie and provide a family independent sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:33:21 -07:00
Hiroaki SHIMODA	47fd92f5a7	net_sched: act: Delete estimator in error path. Some action modules free struct tcf_common in their error path while estimator is still active. This results in est_timer() dereference freed memory. Add gen_kill_estimator() in ipt, pedit and simple action. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:30:01 -07:00
Vasiliy Kulikov	9871f1ad67	ip: fix error handling in ip_finish_output2() __neigh_create() returns either a pointer to struct neighbour or PTR_ERR(). But the caller expects it to return either a pointer or NULL. Replace the NULL check with IS_ERR() check. The bug was introduced in `a263b30936` ("ipv4: Make neigh lookups directly in output packet path."). Signed-off-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:30:01 -07:00
Sorin Dumitru	91d27a8650	llc: free the right skb We are freeing skb instead of nskb, resulting in a double free on skb and a leak from nskb. Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:30:01 -07:00
Silviu-Mihai Popescu	8e7dfbc8d1	tcp_output: fix sparse warning for tcp_wfree Fix sparse warning: * symbol 'tcp_wfree' was not declared. Should it be static? Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:29:56 -07:00
Marek Lindner	caa0bf648c	batman-adv: select an internet gateway if none was chosen This is a regression introduced by: `2265c14108` ("batman-adv: gateway election code refactoring") Reported-by: Nicolás Echániz <nicoechaniz@codigosur.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-06 13:23:46 -07:00
Daniel Drake	1f6fc43e62	cfg80211: process pending events when unregistering net device libertas currently calls cfg80211_disconnected() when it is being brought down. This causes an event to be allocated, but since the wdev is already removed from the rdev by the time that the event processing work executes, the event is never processed or freed. http://article.gmane.org/gmane.linux.kernel.wireless.general/95666 Fix this leak, and other possible situations, by processing the event queue when a device is being unregistered. Thanks to Johannes Berg for the suggestion. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: stable@vger.kernel.org Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-08-06 14:29:58 -04:00
Jaganath Kanakkassery	49dfbb9129	Bluetooth: Fix socket not getting freed if l2cap channel create fails If l2cap_chan_create() fails then it will return from l2cap_sock_kill since zapped flag of sk is reset. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:37 -03:00
Andrei Emeltchenko	d08fd0e712	Bluetooth: smp: Fix possible NULL dereference smp_chan_create might return NULL so we need to check before dereferencing smp. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:37 -03:00
Ram Malovany	c3e7c0d90b	Bluetooth: Set name_state to unknown when entry name is empty When the name of the given entry is empty , the state needs to be updated accordingly. Cc: stable@vger.kernel.org Signed-off-by: Ram Malovany <ramm@ti.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:36 -03:00
Ram Malovany	7cc8380eb1	Bluetooth: Fix using a NULL inquiry cache entry If the device was not found in a list of found devices names of which are pending.This may happen in a case when HCI Remote Name Request was sent as a part of incoming connection establishment procedure. Hence there is no need to continue resolving a next name as it will be done upon receiving another Remote Name Request Complete Event. This will fix a kernel crash when trying to use this entry to resolve the next name. Cc: stable@vger.kernel.org Signed-off-by: Ram Malovany <ramm@ti.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:36 -03:00
Ram Malovany	c810089c27	Bluetooth: Fix using NULL inquiry entry If entry wasn't found in the hci_inquiry_cache_lookup_resolve do not resolve the name.This will fix a kernel crash when trying to use NULL pointer. Cc: stable@vger.kernel.org Signed-off-by: Ram Malovany <ramm@ti.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:36 -03:00
Szymon Janc	a9ea3ed9b7	Bluetooth: Fix legacy pairing with some devices Some devices e.g. some Android based phones don't do SDP search before pairing and cancel legacy pairing when ACL is disconnected. PIN Code Request event which changes ACL timeout to HCI_PAIRING_TIMEOUT is only received after remote user entered PIN. In that case no L2CAP is connected so default HCI_DISCONN_TIMEOUT (2 seconds) is being used to timeout ACL connection. This results in problems with legacy pairing as remote user has only few seconds to enter PIN before ACL is disconnected. Increase disconnect timeout for incomming connection to HCI_PAIRING_TIMEOUT if SSP is disabled and no linkey exists. To avoid keeping ACL alive for too long after SDP search set ACL timeout back to HCI_DISCONN_TIMEOUT when L2CAP is connected. 2012-07-19 13:24:43.413521 < HCI Command: Create Connection (0x01\|0x0005) plen 13 bdaddr 00:02:72:D6:6A:3F ptype 0xcc18 rswitch 0x01 clkoffset 0x0000 Packet type: DM1 DM3 DM5 DH1 DH3 DH5 2012-07-19 13:24:43.425224 > HCI Event: Command Status (0x0f) plen 4 Create Connection (0x01\|0x0005) status 0x00 ncmd 1 2012-07-19 13:24:43.885222 > HCI Event: Role Change (0x12) plen 8 status 0x00 bdaddr 00:02:72:D6:6A:3F role 0x01 Role: Slave 2012-07-19 13:24:44.054221 > HCI Event: Connect Complete (0x03) plen 11 status 0x00 handle 42 bdaddr 00:02:72:D6:6A:3F type ACL encrypt 0x00 2012-07-19 13:24:44.054313 < HCI Command: Read Remote Supported Features (0x01\|0x001b) plen 2 handle 42 2012-07-19 13:24:44.055176 > HCI Event: Page Scan Repetition Mode Change (0x20) plen 7 bdaddr 00:02:72:D6:6A:3F mode 0 2012-07-19 13:24:44.056217 > HCI Event: Max Slots Change (0x1b) plen 3 handle 42 slots 5 2012-07-19 13:24:44.059218 > HCI Event: Command Status (0x0f) plen 4 Read Remote Supported Features (0x01\|0x001b) status 0x00 ncmd 0 2012-07-19 13:24:44.062192 > HCI Event: Command Status (0x0f) plen 4 Unknown (0x00\|0x0000) status 0x00 ncmd 1 2012-07-19 13:24:44.067219 > HCI Event: Read Remote Supported Features (0x0b) plen 11 status 0x00 handle 42 Features: 0xbf 0xfe 0xcf 0xfe 0xdb 0xff 0x7b 0x87 2012-07-19 13:24:44.067248 < HCI Command: Read Remote Extended Features (0x01\|0x001c) plen 3 handle 42 page 1 2012-07-19 13:24:44.071217 > HCI Event: Command Status (0x0f) plen 4 Read Remote Extended Features (0x01\|0x001c) status 0x00 ncmd 1 2012-07-19 13:24:44.076218 > HCI Event: Read Remote Extended Features (0x23) plen 13 status 0x00 handle 42 page 1 max 1 Features: 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00 2012-07-19 13:24:44.076249 < HCI Command: Remote Name Request (0x01\|0x0019) plen 10 bdaddr 00:02:72:D6:6A:3F mode 2 clkoffset 0x0000 2012-07-19 13:24:44.081218 > HCI Event: Command Status (0x0f) plen 4 Remote Name Request (0x01\|0x0019) status 0x00 ncmd 1 2012-07-19 13:24:44.105214 > HCI Event: Remote Name Req Complete (0x07) plen 255 status 0x00 bdaddr 00:02:72:D6:6A:3F name 'uw000951-0' 2012-07-19 13:24:44.105284 < HCI Command: Authentication Requested (0x01\|0x0011) plen 2 handle 42 2012-07-19 13:24:44.111207 > HCI Event: Command Status (0x0f) plen 4 Authentication Requested (0x01\|0x0011) status 0x00 ncmd 1 2012-07-19 13:24:44.112220 > HCI Event: Link Key Request (0x17) plen 6 bdaddr 00:02:72:D6:6A:3F 2012-07-19 13:24:44.112249 < HCI Command: Link Key Request Negative Reply (0x01\|0x000c) plen 6 bdaddr 00:02:72:D6:6A:3F 2012-07-19 13:24:44.115215 > HCI Event: Command Complete (0x0e) plen 10 Link Key Request Negative Reply (0x01\|0x000c) ncmd 1 status 0x00 bdaddr 00:02:72:D6:6A:3F 2012-07-19 13:24:44.116215 > HCI Event: PIN Code Request (0x16) plen 6 bdaddr 00:02:72:D6:6A:3F 2012-07-19 13:24:48.099184 > HCI Event: Auth Complete (0x06) plen 3 status 0x13 handle 42 Error: Remote User Terminated Connection 2012-07-19 13:24:48.179182 > HCI Event: Disconn Complete (0x05) plen 4 status 0x00 handle 42 reason 0x13 Reason: Remote User Terminated Connection Cc: stable@vger.kernel.org Signed-off-by: Szymon Janc <szymon.janc@tieto.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:36 -03:00
Gustavo Padovan	269c4845d5	Bluetooth: Fix possible deadlock in SCO code sco_chan_del() only has conn != NULL when called from sco_conn_del() so just move the code from it that deal with conn to sco_conn_del(). [ 120.765529] [ 120.765529] ====================================================== [ 120.766529] [ INFO: possible circular locking dependency detected ] [ 120.766529] 3.5.0-rc1-10292-g3701f94-dirty #70 Tainted: G W [ 120.766529] ------------------------------------------------------- [ 120.766529] kworker/u:3/1497 is trying to acquire lock: [ 120.766529] (&(&conn->lock)->rlock#2){+.+...}, at: [<ffffffffa00b7ecc>] sco_chan_del+0x4c/0x170 [bluetooth] [ 120.766529] [ 120.766529] but task is already holding lock: [ 120.766529] (slock-AF_BLUETOOTH-BTPROTO_SCO){+.+...}, at: [<ffffffffa00b8401>] sco_conn_del+0x61/0xe0 [bluetooth] [ 120.766529] [ 120.766529] which lock already depends on the new lock. [ 120.766529] [ 120.766529] [ 120.766529] the existing dependency chain (in reverse order) is: [ 120.766529] [ 120.766529] -> #1 (slock-AF_BLUETOOTH-BTPROTO_SCO){+.+...}: [ 120.766529] [<ffffffff8107980e>] lock_acquire+0x8e/0xb0 [ 120.766529] [<ffffffff813c19e0>] _raw_spin_lock+0x40/0x80 [ 120.766529] [<ffffffffa00b85e9>] sco_connect_cfm+0x79/0x300 [bluetooth] [ 120.766529] [<ffffffffa0094b13>] hci_sync_conn_complete_evt.isra.90+0x343/0x400 [bluetooth] [ 120.766529] [<ffffffffa009d447>] hci_event_packet+0x317/0xfb0 [bluetooth] [ 120.766529] [<ffffffffa008aa68>] hci_rx_work+0x2c8/0x890 [bluetooth] [ 120.766529] [<ffffffff81047db7>] process_one_work+0x197/0x460 [ 120.766529] [<ffffffff810489d6>] worker_thread+0x126/0x2d0 [ 120.766529] [<ffffffff8104ee4d>] kthread+0x9d/0xb0 [ 120.766529] [<ffffffff813c4294>] kernel_thread_helper+0x4/0x10 [ 120.766529] [ 120.766529] -> #0 (&(&conn->lock)->rlock#2){+.+...}: [ 120.766529] [<ffffffff81078a8a>] __lock_acquire+0x154a/0x1d30 [ 120.766529] [<ffffffff8107980e>] lock_acquire+0x8e/0xb0 [ 120.766529] [<ffffffff813c19e0>] _raw_spin_lock+0x40/0x80 [ 120.766529] [<ffffffffa00b7ecc>] sco_chan_del+0x4c/0x170 [bluetooth] [ 120.766529] [<ffffffffa00b8414>] sco_conn_del+0x74/0xe0 [bluetooth] [ 120.766529] [<ffffffffa00b88a2>] sco_disconn_cfm+0x32/0x60 [bluetooth] [ 120.766529] [<ffffffffa0093a82>] hci_disconn_complete_evt.isra.53+0x242/0x390 [bluetooth] [ 120.766529] [<ffffffffa009d747>] hci_event_packet+0x617/0xfb0 [bluetooth] [ 120.766529] [<ffffffffa008aa68>] hci_rx_work+0x2c8/0x890 [bluetooth] [ 120.766529] [<ffffffff81047db7>] process_one_work+0x197/0x460 [ 120.766529] [<ffffffff810489d6>] worker_thread+0x126/0x2d0 [ 120.766529] [<ffffffff8104ee4d>] kthread+0x9d/0xb0 [ 120.766529] [<ffffffff813c4294>] kernel_thread_helper+0x4/0x10 [ 120.766529] [ 120.766529] other info that might help us debug this: [ 120.766529] [ 120.766529] Possible unsafe locking scenario: [ 120.766529] [ 120.766529] CPU0 CPU1 [ 120.766529] ---- ---- [ 120.766529] lock(slock-AF_BLUETOOTH-BTPROTO_SCO); [ 120.766529] lock(&(&conn->lock)->rlock#2); [ 120.766529] lock(slock-AF_BLUETOOTH-BTPROTO_SCO); [ 120.766529] lock(&(&conn->lock)->rlock#2); [ 120.766529] [ 120.766529] * DEADLOCK * Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:19:36 -03:00
Andre Guedes	cd17decbd9	Bluetooth: Refactor in hci_le_conn_complete_evt This patch moves the hci_conn check to begining of hci_le_conn_ complete_evt in order to improve code's readability and better error handling. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:08:41 -03:00
Andre Guedes	b47a09b33a	Bluetooth: Lookup hci_conn in hci_le_conn_complete_evt This patch does a trivial code refactoring in hci_conn lookup in hci_le_conn_complete_evt. It performs the hci_conn lookup at the begining of the function since it is used by both flows (error and success). Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:08:10 -03:00
Andre Guedes	0c95ab78be	Bluetooth: Find hci_conn by BT_CONNECT state This patch changes hci_cs_le_create_conn to perform hci_conn lookup by state instead of bdaddr. Since we can have only one LE connection in BT_CONNECT state, we can perform LE hci_conn lookup by state. This way, we don't rely on hci_sent_cmd_data helper to find the hci_conn object. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:07:49 -03:00
Andre Guedes	f00a06ac14	Bluetooth: Refactor hci_cs_le_create_conn This patch does some code refactoring in hci_cs_le_create_conn function. The hci_conn object is only needed in case of failure, therefore hdev locking and hci_conn lookup were moved to if-statement scope. Also, the conn->state check was removed since we should always close the connection if it fails. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:07:05 -03:00
Andre Guedes	847012c5e0	Bluetooth: Remove unneeded code This patch removes some unneeded code from hci_cs_le_create_conn. If the hci_conn is not found, it means this LE connection attempt was triggered by a thrid-party tool (e.g. hcitool). We should not create this new hci_conn in LE Create Connection command status event since it is already properly handled in LE Connection Complete event. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:05:51 -03:00
Andre Guedes	b9b343d254	Bluetooth: Fix hci_le_conn_complete_evt We need to check the 'Role' parameter from the LE Connection Complete Event in order to properly set 'out' and 'link_mode' fields from hci_conn structure. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:05:10 -03:00
Andre Guedes	230fd16a23	Bluetooth: Trivial refactoring This patch replaces the unlock-and-return statements by the goto statement. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:04:39 -03:00
Masatake YAMATO	de9b9212c7	Bluetooth: Added /proc/net/sco via bt_procfs_init() Added /proc/net/sco via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:03:00 -03:00
Masatake YAMATO	c6f5df16a2	Bluetooth: Added /proc/net/rfcomm via bt_procfs_init() Added /proc/net/rfcomm via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:03:00 -03:00
Masatake YAMATO	5b28d95c13	Bluetooth: Added /proc/net/l2cap via bt_procfs_init() Added /proc/net/l2cap via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:59 -03:00
Masatake YAMATO	5c6ad8eee0	Bluetooth: Added /proc/net/hidp via bt_procfs_init() Added /proc/net/hidp via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:59 -03:00
Masatake YAMATO	f7c8663789	Bluetooth: Added /proc/net/hci via bt_procfs_init() Added /proc/net/hci via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:59 -03:00
Masatake YAMATO	8c8de589ce	Bluetooth: Added /proc/net/cmtp via bt_procfs_init() Added /proc/net/cmtp via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:59 -03:00
Masatake YAMATO	77cf5585a3	Bluetooth: Added /proc/net/bnep via bt_procfs_init() Added /proc/net/bnep via bt_procfs_init(). Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:59 -03:00
Masatake YAMATO	256a06c8a8	Bluetooth: /proc/net/ entries for bluetooth protocols lsof command can tell the type of socket processes are using. Internal lsof uses inode numbers on socket fs to resolve the type of sockets. Files under /proc/net/, such as tcp, udp, unix, etc provides such inode information. Unfortunately bluetooth related protocols don't provide such inode information. This patch series introduces /proc/net files for the protocols. This patch against af_bluetooth.c provides facility to the implementation of protocols. This patch extends bt_sock_list and introduces two exported function bt_procfs_init, bt_procfs_cleanup. The type bt_sock_list is already used in some of implementation of protocols. bt_procfs_init prepare seq_operations which converts protocol own bt_sock_list data to protocol own proc entry when the entry is accessed. What I, lsof user, need is just inode number of bluetooth socket. However, people may want more information. The bt_procfs_init takes a function pointer for customizing the show handler of seq_operations. In v4 patch, __acquires and __releases attributes are added to suppress sparse warning. Suggested by Andrei Emeltchenko. In v5 patch, linux/proc_fs.h is included to use PDE. Build error is reported by Fengguang Wu. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery	4af66c691f	Bluetooth: Free the l2cap channel list only when refcount is zero Move the l2cap channel list chan->global_l under the refcnt protection and free it based on the refcnt. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:58 -03:00
Jaganath Kanakkassery	3064837289	Bluetooth: Move l2cap_chan_hold/put to l2cap_core.c Refactor the code in order to use the l2cap_chan_destroy() from l2cap_chan_put() under the refcnt protection. Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com> Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Reviewed-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:58 -03:00
Andre Guedes	ee72d150ad	Bluetooth: Remove locking in hci_user_passkey_request_evt This patch removes hdev locking in hci_user_passkey_request_evt since it is not needed. mgmt_user_passkey_request simply calls mgmt_event which does not require hdev locking at all. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:58 -03:00
Andrei Emeltchenko	9e66463127	Bluetooth: Make connect / disconnect cfm functions return void Return values are never used because callers hci_proto_connect_cfm and hci_proto_disconn_cfm return void. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:58 -03:00
Andre Guedes	c58e810eb0	Bluetooth: Use lmp_no_flush_capable where applicable This patch replaces all LMP_NO_FLUSH bit checking by the helper macro lmp_no_flush_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:57 -03:00
Andre Guedes	999dcd10a8	Bluetooth: Use lmp_sniffsubr_capable where applicable This patch replaces all LMP_SNIFF_SUBR bit checking by the helper macro lmp_sniffsubr_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:57 -03:00
Andre Guedes	6eded1004a	Bluetooth: Use lmp_sniff_capable where applicable This patch replaces all LMP_SNIFF bit checking by the helper macro lmp_sniff_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:57 -03:00
Andre Guedes	9f92ebf6c7	Bluetooth: Use lmp_rswitch_capable where applicable This patch replaces all LMP_RSWITCH bit checking by the helper macro lmp_rswitch_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:57 -03:00
Andre Guedes	45db810fb7	Bluetooth: Use lmp_esco_capable where applicable This patch replaces all LMP_ESCO bit checking by the helper macro lmp_esco_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:57 -03:00
Andre Guedes	9a1a1996d5	Bluetooth: Use lmp_ssp_capable where applicable This patch replaces all LMP_SIMPLE_PAIR bit checking by the helper macro lmp_ssp_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:56 -03:00
Andre Guedes	c383ddc481	Bluetooth: Use lmp_le_capable where applicable This patch replaces all LMP_LE bit checking by the helper macro lmp_le_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:56 -03:00
Andre Guedes	ed3fa31f35	Bluetooth: Use lmp_bredr_capable where applicable This patch replaces all LMP_NO_BREDR bit checking by the helper macro lmp_bredr_capable. Signed-off-by: Andre Guedes <andre.guedes@openbossa.org> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:56 -03:00
Andrei Emeltchenko	78eb2f985c	Bluetooth: Fix processing A2MP chan in security_cfm Do not process A2MP channel in l2cap_security_cfm Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:56 -03:00
Andrei Emeltchenko	d9fc1d54f6	Bluetooth: Do not shadow hdr variable Fix compile warnings below: ... net/bluetooth/a2mp.c:505:33: warning: symbol 'hdr' shadows an earlier one net/bluetooth/a2mp.c:498:25: originally declared here ... Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:55 -03:00
Andrei Emeltchenko	8e8c7e36fb	Bluetooth: debug: Fix printing A2MP cmd code format Print A2MP code format according to Bluetooth style. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:54 -03:00
Andrei Emeltchenko	bb4b2a9ae3	Bluetooth: mgmt: Managing only BR/EDR HCI controllers Add check that HCI controller is BR/EDR. AMP controller shall not be managed by mgmt interface and consequently user space. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>	2012-08-06 15:02:54 -03:00
Eric Dumazet	9eb43e7653	ipv4: Introduce IN_DEV_NET_ROUTE_LOCALNET performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET() call done in ip_route_input_slow(), because of multiple dereferences, even if cache lines are clean and available in cpu caches. Since we already have the 'net' pointer, introduce IN_DEV_NET_ROUTE_LOCALNET() macro avoiding two dereferences (dev_net(in_dev->dev)) Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr or/and daddr are loopback addresse. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-04 01:27:57 -07:00
Eric Dumazet	40384999d1	ipv4: change inet_addr_hash() Use net_hash_mix(net) instead of hash_ptr(net, 8), and use hash_32() instead of using a serie of XOR Define IN4_ADDR_HSIZE_SHIFT for clarity __ip_dev_find() can perform the net_eq() call only if ifa_local matches the key, to avoid unneeded dereferences. remove inline attributes # size net/ipv4/devinet.o.before net/ipv4/devinet.o text data bss dec hex filename 17471 2545 2048 22064 5630 net/ipv4/devinet.o.before 17335 2545 2048 21928 55a8 net/ipv4/devinet.o Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-04 01:27:57 -07:00
Hiroaki SHIMODA	696ecdc106	net_sched: gact: Fix potential panic in tcf_gact(). gact_rand array is accessed by gact->tcfg_ptype whose value is assumed to less than MAX_RAND, but any range checks are not performed. So add a check in tcf_gact_init(). And in tcf_gact(), we can reduce a branch. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-03 16:47:24 -07:00
Thomas Pedersen	f609a43dca	mac80211: skb leak in mesh_plink_frame_tx() Although adding an IE is almost guaranteed to succeed since we already accounted for its length while allocating the skb, we should still free the skb in case of failure. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-03 21:34:29 +02:00
Thomas Pedersen	e7570dfb63	mac80211: don't request ack for peering close It doesn't make a lot of sense to wait for an ack in response to a peering close frame since either peer in this exchange could be going down. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-03 21:34:25 +02:00
Thomas Pedersen	3e17f2be31	mac80211: remove ieee80211_clean_sdata() This function was only used by mesh, and not really needed since any interface-specific cleanup already happens in the netdev handlers. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-03 21:34:22 +02:00
Thomas Pedersen	0d466b9c67	mac80211: improve cleanup when leaving mesh It is not necessary to stop the mesh beacon in the mac80211 ndo_stop handler, since cfg80211 has already left the mesh on NETDEV_GOING_DOWN notification. Also some improvements to ieee80211_stop_mesh(): - flush mpath entries. - flush sta entries per-sdata so we don't remove entries belonging to other vifs on the same hw. Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-03 21:34:17 +02:00
John W. Linville	1a26904eb6	Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211	2012-08-02 13:49:38 -04:00
Sylvain Munaut	f0666b1ac8	libceph: fix crypto key null deref, memory leak Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-08-02 09:19:20 -07:00
Paul Stewart	899852af60	cfg80211: Clear "beacon_found" on regulatory restore Restore the default state to the "beacon_found" flag when the channel flags are restored. Otherwise, we can end up with a channel that we can no longer transmit on even when we can see beacons on that channel. Signed-off-by: Paul Stewart <pstew@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-02 15:34:22 +02:00
Seth Forshee	03f6b0843a	cfg80211: add channel flag to prohibit OFDM operation Currently the only way for wireless drivers to tell whether or not OFDM is allowed on the current channel is to check the regulatory information. However, this requires hodling cfg80211_mutex, which is not visible to the drivers. Other regulatory restrictions are provided as flags in the channel definition, so let's do similarly with OFDM. This patch adds a new flag, IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not allowed. This flag is set on any channels for which regulatory indicates that OFDM is prohibited. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Tested-by: Arend van Spriel <arend@broadcom.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-02 15:30:49 +02:00
Eric Dumazet	e33cdac014	ipv4: route.c cleanup Remove unused includes after IP cache removal Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-02 02:54:43 -07:00
Fan Du	e3c0d04750	Fix unexpected SA hard expiration after changing date After SA is setup, one timer is armed to detect soft/hard expiration, however the timer handler uses xtime to do the math. This makes hard expiration occurs first before soft expiration after setting new date with big interval. As a result new child SA is deleted before rekeying the new one. Signed-off-by: Fan Du <fdu@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-02 00:19:17 -07:00
Ben Hutchings	1485348d24	tcp: Apply device TSO segment limit earlier Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to limit the size of TSO skbs. This avoids the need to fall back to software GSO for local TCP senders. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-02 00:19:17 -07:00
Ben Hutchings	30b678d844	net: Allow driver to limit number of GSO segments per skb A peer (or local user) may cause TCP to use a nominal MSS of as little as 88 (actual MSS of 76 with timestamps). Given that we have a sufficiently prodigious local sender and the peer ACKs quickly enough, it is nevertheless possible to grow the window for such a connection to the point that we will try to send just under 64K at once. This results in a single skb that expands to 861 segments. In some drivers with TSO support, such an skb will require hundreds of DMA descriptors; a substantial fraction of a TX ring or even more than a full ring. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This particularly affects sfc, for which the issue is designated as CVE-2012-3412. Therefore: 1. Add the field net_device::gso_max_segs holding the device-specific limit. 2. In netif_skb_features(), if the number of segments is too high then mask out GSO features to force fall back to software GSO. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-02 00:19:17 -07:00
Johannes Berg	dd4c9260e7	mac80211: cancel mesh path timer The mesh path timer needs to be canceled when leaving the mesh as otherwise it could fire after the interface has been removed already. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-01 21:03:21 +02:00
Johannes Berg	2d9957cce6	mac80211: clear timer bits when disconnecting There's a corner case that can happen when we suspend with a timer running, then resume and disconnect. If we connect again, suspend and resume we might start timers that shouldn't be running. Reset the timer flags to avoid this. This affects both mesh and managed modes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-01 20:58:28 +02:00
Johannes Berg	19c3b8303d	mac80211: reset station MLME flags upon new association When associating anew, the old station MLME flags should be cleared. The only exception is the 40 MHz disable flag as it might have been set while the channel was set in a previous authentication attempt so it needs to be kept intact. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-08-01 20:13:36 +02:00
Linus Torvalds	a0e881b7c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is not dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...	2012-08-01 10:26:23 -07:00
Linus Torvalds	ac694dbdbc	Merge branch 'akpm' (Andrew's patch-bomb) Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...	2012-07-31 19:25:39 -07:00
Linus Torvalds	3e9a97082f	This patch series contains a major revamp of how we collect entropy from interrupts for /dev/random and /dev/urandom. The goal is to addresses weaknesses discussed in the paper "Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices", by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will be published in the Proceedings of the 21st Usenix Security Symposium, August 2012. (See https://factorable.net for more information and an extended version of the paper.) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8 lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+ LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T 6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw 03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi 3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67 AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0 NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0 TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC bi5FC1OolqhvzVIdsqgt =u7KM -----END PGP SIGNATURE----- Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random subsystem patches from Ted Ts'o: "This patch series contains a major revamp of how we collect entropy from interrupts for /dev/random and /dev/urandom. The goal is to addresses weaknesses discussed in the paper "Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices", by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will be published in the Proceedings of the 21st Usenix Security Symposium, August 2012. (See https://factorable.net for more information and an extended version of the paper.)" Fix up trivial conflicts due to nearby changes in drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c} * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits) random: mix in architectural randomness in extract_buf() dmi: Feed DMI table to /dev/random driver random: Add comment to random_initialize() random: final removal of IRQF_SAMPLE_RANDOM um: remove IRQF_SAMPLE_RANDOM which is now a no-op sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op ...	2012-07-31 19:07:42 -07:00
Linus Torvalds	6dbb35b0a7	NFS client updates for Linux 3.6 Features include: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM cV9Ex8zh2Y4aiUXCy3uA =KSix -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO	2012-07-31 18:45:44 -07:00
Linus Torvalds	fd37ce34bd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking update from David S. Miller: "I think Eric Dumazet and I have dealt with all of the known routing cache removal fallout. Some other minor fixes all around. 1) Fix RCU of cached routes, particular of output routes which require liberation via call_rcu() instead of call_rcu_bh(). From Eric Dumazet. 2) Make sure we purge net device references in cached routes properly. 3) TG3 driver bug fixes from Michael Chan. 4) Fix reported 'expires' value in ipv6 routes, from Li Wei. 5) TUN driver ioctl leaks kernel bytes to userspace, from Mathias Krause." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits) ipv4: Properly purge netdev references on uncached routes. ipv4: Cache routes in nexthop exception entries. ipv4: percpu nh_rth_output cache ipv4: Restore old dst_free() behavior. bridge: make port attributes const ipv4: remove rt_cache_rebuild_count net: ipv4: fix RCU races on dst refcounts net: TCP early demux cleanup tun: Fix formatting. net/tun: fix ioctl() based info leaks tg3: Update version to 3.124 tg3: Fix race condition in tg3_get_stats64() tg3: Add New 5719 Read DMA workaround tg3: Fix Read DMA workaround for 5719 A0. tg3: Request APE_LOCK_PHY before PHY access ipv6: fix incorrect route 'expires' value passed to userspace mISDN: Bugfix only few bytes are transfered on a connection seeq: use PTR_RET at init_module of driver bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_path ipv4: clean up put_child ...	2012-07-31 18:43:13 -07:00
Mel Gorman	a564b8f039	nfs: enable swap on NFS Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:48 -07:00
Mel Gorman	c76562b670	netvm: prevent a stream-specific deadlock This patch series is based on top of "Swap-over-NBD without deadlocking v15" as it depends on the same reservation of PF_MEMALLOC reserves logic. When a user or administrator requires swap for their application, they create a swap partition and file, format it with mkswap and activate it with swapon. In diskless systems this is not an option so if swap if required then swapping over the network is considered. The two likely scenarios are when blade servers are used as part of a cluster where the form factor or maintenance costs do not allow the use of disks and thin clients. The Linux Terminal Server Project recommends the use of the Network Block Device (NBD) for swap but this is not always an option. There is no guarantee that the network attached storage (NAS) device is running Linux or supports NBD. However, it is likely that it supports NFS so there are users that want support for swapping over NFS despite any performance concern. Some distributions currently carry patches that support swapping over NFS but it would be preferable to support it in the mainline kernel. Patch 1 avoids a stream-specific deadlock that potentially affects TCP. Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC reserves. Patch 3 adds three helpers for filesystems to handle swap cache pages. For example, page_file_mapping() returns page->mapping for file-backed pages and the address_space of the underlying swap file for swap cache pages. Patch 4 adds two address_space_operations to allow a filesystem to pin all metadata relevant to a swapfile in memory. Upon successful activation, the swapfile is marked SWP_FILE and the address space operation ->direct_IO is used for writing and ->readpage for reading in swap pages. Patch 5 notes that patch 3 is bolting filesystem-specific-swapfile-support onto the side and that the default handlers have different information to what is available to the filesystem. This patch refactors the code so that there are generic handlers for each of the new address_space operations. Patch 6 adds an API to allow a vector of kernel addresses to be translated to struct pages and pinned for IO. Patch 7 adds support for using highmem pages for swap by kmapping the pages before calling the direct_IO handler. Patch 8 updates NFS to use the helpers from patch 3 where necessary. Patch 9 avoids setting PF_private on PG_swapcache pages within NFS. Patch 10 implements the new swapfile-related address_space operations for NFS and teaches the direct IO handler how to manage kernel addresses. Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO where appropriate. Patch 12 fixes a NULL pointer dereference that occurs when using swap-over-NFS. With the patches applied, it is possible to mount a swapfile that is on an NFS filesystem. Swap performance is not great with a swap stress test taking roughly twice as long to complete than if the swap device was backed by NBD. This patch: netvm: prevent a stream-specific deadlock It could happen that all !SOCK_MEMALLOC sockets have buffered so much data that we're over the global rmem limit. This will prevent SOCK_MEMALLOC buffers from receiving data, which will prevent userspace from running, which is needed to reduce the buffered data. Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit. Once this change it applied, it is important that sockets that set SOCK_MEMALLOC do not clear the flag until the socket is being torn down. If this happens, a warning is generated and the tokens reclaimed to avoid accounting errors until the bug is fixed. [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Rik van Riel <riel@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Neil Brown <neilb@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:47 -07:00
Mel Gorman	b4b9e35585	netvm: set PF_MEMALLOC as appropriate during SKB processing In order to make sure pfmemalloc packets receive all memory needed to proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC. This is limited to a subset of protocols that are expected to be used for writing to swap. Taps are not allowed to use PF_MEMALLOC as these are expected to communicate with userspace processes which could be paged out. [a.p.zijlstra@chello.nl: Ideas taken from various patches] [jslaby@suse.cz: Lock imbalance fix] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Mel Gorman	c93bdd0e03	netvm: allow skb allocation to use PFMEMALLOC reserves Change the skb allocation API to indicate RX usage and use this to fall back to the PFMEMALLOC reserve when needed. SKBs allocated from the reserve are tagged in skb->pfmemalloc. If an SKB is allocated from the reserve and the socket is later found to be unrelated to page reclaim, the packet is dropped so that the memory remains available for page reclaim. Network protocols are expected to recover from this packet loss. [a.p.zijlstra@chello.nl: Ideas taken from various patches] [davem@davemloft.net: Use static branches, coding style corrections] [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Mel Gorman	7cb0240492	netvm: allow the use of __GFP_MEMALLOC by specific sockets Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC for their allocations. These sockets will be able to go below watermarks and allocate from the emergency reserve. Such sockets are to be used to service the VM (iow. to swap over). They must be handled kernel side, exposing such a socket to user-space is a bug. There is a risk that the reserves be depleted so for now, the administrator is responsible for increasing min_free_kbytes as necessary to prevent deadlock for their workloads. [a.p.zijlstra@chello.nl: Original patches] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Mel Gorman	99a1dec70d	net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket Introduce sk_gfp_atomic(), this function allows to inject sock specific flags to each sock related allocation. It is only used on allocation paths that may be required for writing pages back to network storage. [davem@davemloft.net: Use sk_gfp_atomic only when necessary] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David S. Miller <davem@davemloft.net> Cc: Neil Brown <neilb@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Mel Gorman <mgorman@suse.de> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:46 -07:00
Andrew Morton	c255a45805	memcg: rename config variables Sanity: CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM [mhocko@suse.cz: fix missed bits] Cc: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 18:42:43 -07:00
David S. Miller	caacf05e5a	ipv4: Properly purge netdev references on uncached routes. When a device is unregistered, we have to purge all of the references to it that may exist in the entire system. If a route is uncached, we currently have no way of accomplishing this. So create a global list that is scanned when a network device goes down. This mirrors the logic in net/core/dst.c's dst_ifdown(). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-31 15:06:50 -07:00
David S. Miller	c5038a8327	ipv4: Cache routes in nexthop exception entries. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-31 15:02:02 -07:00
Linus Torvalds	08843b79fb	Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...	2012-07-31 14:42:28 -07:00
Eric Dumazet	d26b3a7c4b	ipv4: percpu nh_rth_output cache Input path is mostly run under RCU and doesnt touch dst refcnt But output path on forwarding or UDP workloads hits badly dst refcount, and we have lot of false sharing, for example in ipv4_mtu() when reading rt->rt_pmtu Using a percpu cache for nh_rth_output gives a nice performance increase at a small cost. 24 udpflood test on my 24 cpu machine (dummy0 output device) (each process sends 1.000.000 udp frames, 24 processes are started) before : 5.24 s after : 2.06 s For reference, time on linux-3.5 : 6.60 s Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-31 14:41:39 -07:00
Eric Dumazet	54764bb647	ipv4: Restore old dst_free() behavior. commit `404e0a8b6a` (net: ipv4: fix RCU races on dst refcounts) tried to solve a race but added a problem at device/fib dismantle time : We really want to call dst_free() as soon as possible, even if sockets still have dst in their cache. dst_release() calls in free_fib_info_rcu() are not welcomed. Root of the problem was that now we also cache output routes (in nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in rt_free(), because output route lookups are done in process context. Based on feedback and initial patch from David Miller (adding another call_rcu_bh() call in fib, but it appears it was not the right fix) I left the inet_sk_rx_dst_set() helper and added __rcu attributes to nh_rth_output and nh_rth_input to better document what is going on in this code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-31 14:41:38 -07:00
Linus Torvalds	cc8362b1f6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "Lots of stuff this time around: - lots of cleanup and refactoring in the libceph messenger code, and many hard to hit races and bugs closed as a result. - lots of cleanup and refactoring in the rbd code from Alex Elder, mostly in preparation for the layering functionality that will be coming in 3.7. - some misc rbd cleanups from Josh Durgin that are finally going upstream - support for CRUSH tunables (used by newer clusters to improve the data placement) - some cleanup in our use of d_parent that Al brought up a while back - a random collection of fixes across the tree There is another patch coming that fixes up our ->atomic_open() behavior, but I'm going to hammer on it a bit more before sending it." Fix up conflicts due to commits that were already committed earlier in drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits) rbd: create rbd_refresh_helper() rbd: return obj version in __rbd_refresh_header() rbd: fixes in rbd_header_from_disk() rbd: always pass ops array to rbd_req_sync_op() rbd: pass null version pointer in add_snap() rbd: make rbd_create_rw_ops() return a pointer rbd: have __rbd_add_snap_dev() return a pointer libceph: recheck con state after allocating incoming message libceph: change ceph_con_in_msg_alloc convention to be less weird libceph: avoid dropping con mutex before fault libceph: verify state after retaking con lock after dispatch libceph: revoke mon_client messages on session restart libceph: fix handling of immediate socket connect failure ceph: update MAINTAINERS file libceph: be less chatty about stray replies libceph: clear all flags on con_close libceph: clean up con flags libceph: replace connection state bits with states libceph: drop unnecessary CLOSED check in socket state change callback libceph: close socket directly from ceph_con_close() ...	2012-07-31 14:35:28 -07:00
Johannes Berg	e83e6541ce	mac80211: use eth_broadcast_addr Instead of memset(). Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:19:51 +02:00
Johannes Berg	1411af1565	mac80211: enable WDS carrier only after adding station Enable the carrier on WDS type interfaces only after having added the station entry for the WDS peer so outgoing frames will find it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:19:01 +02:00
Johannes Berg	c405c6298e	mac80211: manage carrier state in mesh Instead of assuming the carrier is on all the time in mesh manage it with joining and leaving the mesh. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:59 +02:00
Johannes Berg	2d56577bc6	mac80211: use correct channel in TX Since we only need the band, remove the channel pointer from struct ieee80211_tx_data and also assign it properly, depending on context, to the correct operating or current channel. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:58 +02:00
Johannes Berg	6b77863b71	mac80211: fix current vs. operating channel in preq/beacon When sending probe requests, e.g. during software scanning, these will go out on the current channel, so their IEs need to be built from the current channel. At other times, e.g. for beacons or probe request templates, the IEs will be used on the operating channel and using the current channel instead might result in errors. Add the appropriate parameters to respect the difference. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:56 +02:00
Johannes Berg	679ef4eadd	mac80211: use oper_channel in utils and config Using hw.conf.channel is wrong as it could be the temporary channel if any function like the beacon get function is called while scanning or during other temporary out-of-channel activities. Use oper_channel instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:55 +02:00
Johannes Berg	568d6e2897	mac80211: use oper_channel in managed mlme Using hw.conf.channel is wrong as it could be the temporary channel if any function like the beacon get function is called while scanning or during other temporary out-of-channel activities. Use oper_channel instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:53 +02:00
Johannes Berg	273686d664	mac80211: use oper_channel in ibss Using hw.conf.channel is wrong as it could be the temporary channel if any function like the beacon get function is called while scanning or during other temporary out-of-channel activities. Use oper_channel instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:52 +02:00
Johannes Berg	6962d60205	mac80211: use oper_channel in mesh Using hw.conf.channel is wrong as it could be the temporary channel if any function like the beacon get function is called while scanning or during other temporary out-of-channel activities. Use oper_channel instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:51 +02:00
Johannes Berg	b17166a707	mac80211: set channel only once during auth/assoc There's no need to set up the channel during auth and again during assoc, just do it once. Currently this doesn't result in any changes since calling hw_config() with an unchanged channel will return early, but with the channel context work this has an impact on channel context assignment. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:50 +02:00
Johannes Berg	13e0c8e355	mac80211: rename sta to new_sta In ieee80211_prep_connection(), the station (if not NULL) is the new station (representing the AP) that needs to be added. Rename the variable to "new_sta" to clarify this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:48 +02:00
Johannes Berg	1b49de2656	mac80211: supress HT/VHT disable if not supported If HT/VHT isn't supported by us we shouldn't print a message that we disabled it, do that only if the AP didn't support WMM and we therefore disable it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:47 +02:00
Thomas Huehn	36323f817a	mac80211: move TX station pointer and restructure TX Remove the control.sta pointer from ieee80211_tx_info to free up sufficient space in the TX skb control buffer for the upcoming Transmit Power Control (TPC). Instead, the pointer is now on the stack in a new control struct that is passed as a function parameter to the drivers' tx method. Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de> Signed-off-by: Alina Friedrichsen <x-alina@gmx.net> Signed-off-by: Felix Fietkau <nbd@openwrt.org> [reworded commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:18:39 +02:00
Eliad Peller	ab09587740	mac80211: add PS flag to bss_conf Currently, ps mode is indicated per device (rather than per interface), which doesn't make a lot of sense. Moreover, there are subtle bugs caused by the inability to indicate ps change along with other changes (e.g. when the AP deauth us, we'd like to indicate CHANGED_PS \| CHANGED_ASSOC, as changing PS before notifying about disassociation will result in null-packets being sent (if IEEE80211_HW_SUPPORTS_DYNAMIC_PS) while the sta is already disconnected.) Keep the current per-device notifications, and add parallel per-vif notifications. In order to keep it simple, the per-device ps and the per-vif ps are orthogonal - the per-vif ps configuration is determined only by the user configuration (enable/disable) and the connection state, and is not affected by other vifs state and (temporary) dynamic_ps/offchannel operations (unlike per-device ps). Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:11:04 +02:00
Vladimir Kondratiev	e21768928d	cfg80211: unify IE search Remove ah-hoc IE search code found in the ieee80211_bss_get_ie() and use cfg80211_find_ie() instead. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:11:01 +02:00
Emmanuel Grumbach	8c7d857c4a	mac80211: don't call mgd_prepare_tx when associated This doesn't make any sense since we are expected to be on the medium or at least to Tx only when we are on the right channel and the AP/GO can hear us. Move the call to mgd_prepare_tx() for deauth to be only done in case we're sending a deauth while not associated. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:10:59 +02:00
Johannes Berg	7eeff74c29	mac80211: don't react to beacon loss if HW monitoring If the HW is monitoring connection loss (as advertised by IEEE80211_HW_CONNECTION_MONITOR) but not filtering beacons (IEEE80211_VIF_BEACON_FILTER) then mac80211 will still start the beacon loss timer and if a few beacons are lost, e.g. due to scanning, drop the connection. If the hardware doesn't advertise connection monitoring, then it won't drop the connection right away but probe the AP, which is intended, but due to the logic in the timer when connection monitoring is done it assumes the connection was actually lost. Fix this problem by not starting the timer when the HW does connection monitoring. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:10:58 +02:00
Mahesh Palivela	d545daba53	mac80211: VHT (11ac) association Insert VHT IEs into association frames to allow mac80211 to connect as a VHT client. Signed-off-by: Mahesh Palivela <maheshp@posedge.com> [clarify commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:10:57 +02:00
Chun-Yeow Yeoh	bae35d92b6	mac80211: don't re-init rate control when receiving mesh beacon Rate control is re-initialized whenever a beacon from a mesh peer received, breaking the algorithms and resulting in low performance. Return early from mesh_peer_init if we already established a link with this peer to avoid this. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> [clarify commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-31 16:10:55 +02:00
Linus Torvalds	1fad1e9a74	NFS client updates for Linux 3.6 Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg 0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4 QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0 NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC 4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL YNVVGhvNxccMU5EU9YR4 =Lc59 -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...	2012-07-30 19:16:57 -07:00
Sage Weil	6139919133	libceph: recheck con state after allocating incoming message We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:19:45 -07:00
Sage Weil	4740a623d2	libceph: change ceph_con_in_msg_alloc convention to be less weird This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:19:30 -07:00
Sage Weil	8636ea672f	libceph: avoid dropping con mutex before fault The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:17:13 -07:00
Sage Weil	7b862e07b1	libceph: verify state after retaking con lock after dispatch We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:16:56 -07:00
Sage Weil	4f471e4a9c	libceph: revoke mon_client messages on session restart Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:16:40 -07:00
Sage Weil	8007b8d626	libceph: fix handling of immediate socket connect failure If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:16:16 -07:00
Sage Weil	756a16a5d5	libceph: be less chatty about stray replies There are many (normal) conditions that can lead to us getting unexpected replies, include cluster topology changes, osd failures, and timeouts. There's no need to spam the console about it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:16:03 -07:00
Sage Weil	43c7427d10	libceph: clear all flags on con_close Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:16:02 -07:00
Sage Weil	4a86169208	libceph: clean up con flags Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:16:01 -07:00
Sage Weil	8dacc7da69	libceph: replace connection state bits with states Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:16:00 -07:00
Sage Weil	d7353dd5aa	libceph: drop unnecessary CLOSED check in socket state change callback If we are CLOSED, the socket is closed and we won't get these. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:59 -07:00
Sage Weil	ee76e0736d	libceph: close socket directly from ceph_con_close() It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:58 -07:00
Sage Weil	2e8cb10063	libceph: drop gratuitous socket close calls in con_work If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:58 -07:00
Sage Weil	a59b55a602	libceph: move ceph_con_send() closed check under the con mutex Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:57 -07:00
Sage Weil	00650931e5	libceph: move msgr clear_standby under con mutex protection Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:56 -07:00
Sage Weil	3b5ede07b5	libceph: fix fault locking; close socket on lossy fault If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 18:15:55 -07:00
Jiaju Zhang	048a9d2d06	libceph: trivial fix for the incorrect debug output This is a trivial fix for the debug output, as it is inconsistent with the function name so may confuse people when debugging. [elder@inktank.com: switched to use __func__] Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:36 -07:00
Sage Weil	85effe183d	libceph: reset connection retry on successfully negotiation We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/ connection with the remote node. Fixes ceph issue #2802. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:34 -07:00
Sage Weil	5469155f2b	libceph: protect ceph_con_open() with mutex Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2012-07-30 18:15:33 -07:00
Sage Weil	a410702697	libceph: (re)initialize bio_iter on start of message receive Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - bio_iter now non-NULL, not reinitialized - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:31 -07:00
Sage Weil	6194ea895e	libceph: resubmit linger ops when pg mapping changes The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:31 -07:00
Sage Weil	8c50c81756	libceph: fix mutex coverage for ceph_con_close Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:30 -07:00
Sage Weil	3a140a0d5c	libceph: report socket read/write error message We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:29 -07:00
Sage Weil	546f04ef71	libceph: support crush tunables The server side recently added support for tuning some magic crush variables. Decode these variables if they are present, or use the default values if they are not present. Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5. Signed-off-by: caleb miles <caleb.miles@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 18:15:23 -07:00
Stanislav Kinsbursky	caea33da89	SUNRPC: return negative value in case rpcbind client creation error Without this patch kernel will panic on LockD start, because lockd_up() checks lockd_up_net() result for negative value. From my pow it's better to return negative value from rpcbind routines instead of replacing all such checks like in lockd_up(). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.0]	2012-07-30 20:39:05 -04:00
Sage Weil	1fe60e51a3	libceph: move feature bits to separate header This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com> Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>	2012-07-30 16:23:22 -07:00
Jeff Layton	5cf02d09b5	nfs: skip commit in releasepage if we're freeing memory for fs-related reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org	2012-07-30 18:55:59 -04:00
Jeff Layton	506026c3ec	sunrpc: clarify comments on rpc_make_runnable rpc_make_runnable is not generally called with the queue lock held, unless it's waking up a task that has been sitting on a waitqueue. This is safe when the task has not entered the FSM yet, but the comments don't really spell this out. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2012-07-30 18:55:08 -04:00
Joe Perches	cac5d07e3c	sunrpc: clnt: Add missing braces Add a missing set of braces that commit `4e0038b6b2` ("SUNRPC: Move clnt->cl_server into struct rpc_xprt") forgot. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.4]	2012-07-30 17:58:08 -04:00
stephen hemminger	5a0d513b62	bridge: make port attributes const Simple table that can be marked const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:22 -07:00
Eric Dumazet	0c7462a235	ipv4: remove rt_cache_rebuild_count After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:22 -07:00
Eric Dumazet	404e0a8b6a	net: ipv4: fix RCU races on dst refcounts commit `c6cffba4ff` (ipv4: Fix input route performance regression.) added various fatal races with dst refcounts. crashes happen on tcp workloads if routes are added/deleted at the same time. The dst_free() calls from free_fib_info_rcu() are clearly racy. We need instead regular dst refcounting (dst_release()) and make sure dst_release() is aware of RCU grace periods : Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period before dst destruction for cached dst Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero() to make sure we dont increase a zero refcount (On a dst currently waiting an rcu grace period before destruction) rt_cache_route() must take a reference on the new cached route, and release it if was not able to install it. With this patch, my machines survive various benchmarks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:22 -07:00
Eric Dumazet	cca32e4bf9	net: TCP early demux cleanup early_demux() handlers should be called in RCU context, and as we use skb_dst_set_noref(skb, dst), caller must not exit from RCU context before dst use (skb_dst(skb)) or release (skb_drop(dst)) Therefore, rcu_read_lock()/rcu_read_unlock() pairs around ->early_demux() are confusing and not needed : Protocol handlers are already in an RCU read lock section. (__netif_receive_skb() does the rcu_read_lock() ) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-30 14:53:21 -07:00
Guanjun He	a2a3258417	libceph: prevent the race of incoming work during teardown Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return. Signed-off-by: Guanjun He <gjhe@suse.com> Reviewed-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:53 -07:00
Sage Weil	a16cb1f707	libceph: fix messenger retry In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:52 -07:00
Sage Weil	cd43045c2d	libceph: initialize rb, list nodes in ceph_osd_request These don't strictly need to be initialized based on how they are used, but it is good practice to do so. Reported-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:51 -07:00
Sage Weil	d50b409fb8	libceph: initialize msgpool message types Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-07-30 09:29:50 -07:00
Eliad Peller	ba846a502c	mac80211: don't clear sched_scan_sdata on sched scan stop request ieee80211_request_sched_scan_stop() cleared local->sched_scan_sdata. However, sched_scan_sdata should be cleared only after the driver calls ieee80211_sched_scan_stopped() (like with normal hw scan). Clearing sched_scan_sdata too early caused ieee80211_sched_scan_stopped_work to exit prematurely without properly cleaning all the sched scan resources and without calling cfg80211_sched_scan_stopped (so userspace wasn't notified about sched scan completion). Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-30 09:14:36 +02:00
Johannes Berg	fcb06702f0	Merge remote-tracking branch 'wireless/master' into mac80211	2012-07-30 09:13:03 +02:00
Li Wei	8253947e2c	ipv6: fix incorrect route 'expires' value passed to userspace When userspace use RTM_GETROUTE to dump route table, with an already expired route entry, we always got an 'expires' value(2147157) calculated base on INT_MAX. The reason of this problem is in the following satement: rt->dst.expires - jiffies < INT_MAX gcc promoted the type of both sides of '<' to unsigned long, thus a small negative value would be considered greater than INT_MAX. With the help of Eric Dumazet, do the out of bound checks in rtnl_put_cacheinfo(), _after_ conversion to clock_t. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:31 -07:00
Lin Ming	61648d91fc	ipv4: clean up put_child The first parameter struct trie *t is not used anymore. Remove it. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:30 -07:00
Lin Ming	4ea4bf7ebc	ipv4: fix debug info in tnode_new It should print size of struct rt_trie_node * allocated instead of size of struct rt_trie_node. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-29 23:18:30 -07:00
Al Viro	faf0201029	clean unix_bind() up a bit Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:15 +04:00
Al Viro	a8104a9fcd	pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. One side effect - attempt to create a cross-device link on a read-only fs fails with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:15 +04:00
Al Viro	921a1650de	new helper: done_path_create() releases what needs to be released after {kern,user}_path_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-07-29 21:24:13 +04:00
Jiri Kosina	59ea33a68a	tcp: perform DMA to userspace only if there is a task waiting for it Back in 2006, commit `1a2449a87b` ("[I/OAT]: TCP recv offload to I/OAT") added support for receive offloading to IOAT dma engine if available. The code in tcp_rcv_established() tries to perform early DMA copy if applicable. It however does so without checking whether the userspace task is actually expecting the data in the buffer. This is not a problem under normal circumstances, but there is a corner case where this doesn't work -- and that's when MSG_TRUNC flag to recvmsg() is used. If the IOAT dma engine is not used, the code properly checks whether there is a valid ucopy.task and the socket is owned by userspace, but misses the check in the dmaengine case. This problem can be observed in real trivially -- for example 'tbench' is a good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they have been already early-copied in tcp_rcv_established() using dma engine. This patch introduces the same check we are performing in the simple iovec copy case to the IOAT case as well. It fixes the indefinite recvmsg(MSG_TRUNC) hangs. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-27 13:45:51 -07:00
Jesse Gross	6081030769	Revert "openvswitch: potential NULL deref in sample()" This reverts commit `5b3e7e6cb5`. The problem that the original commit was attempting to fix can never happen in practice because validation is done one a per-flow basis rather than a per-packet basis. Adding additional checks at runtime is unnecessary and inconsistent with the rest of the code. CC: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-27 13:45:51 -07:00
Eric Dumazet	505fbcf035	ipv4: fix TCP early demux commit `92101b3b2e` (ipv4: Prepare for change of rt->rt_iif encoding.) invalidated TCP early demux, because rx_dst_ifindex is not properly initialized and checked. Also remove the use of inet_iif(skb) in favor or skb->skb_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-27 13:45:51 -07:00
Jiri Benc	b1beb681cb	net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI flags are handled specially. Function dev_change_flags sets IFF_PROMISC and IFF_ALLMULTI bits in dev->gflags according to the passed value but do_setlink passes a result of rtnl_dev_combine_flags which takes those bits from dev->flags. This can be easily trigerred by doing: tcpdump -i eth0 & ip l s up eth0 ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set IFF_PROMISC in gflags. Reported-by: Max Matveev <makc@redhat.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-27 13:45:50 -07:00
Hangbin Liu	4249357010	tcp: Add TCP_USER_TIMEOUT negative value check TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But patch "tcp: Add TCP_USER_TIMEOUT socket option"(`dca43c75`) didn't check the negative values. If a user assign -1 to it, the socket will set successfully and wait for 4294967295 miliseconds. This patch add a negative value check to avoid this issue. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-27 13:45:50 -07:00
Linus Torvalds	aa0b3b2bee	Merge branch 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds Pull LED subsystem update from Bryan Wu. * 'for-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds: (50 commits) leds-lp8788: forgotten unlock at lp8788_led_work LEDS: propagate error codes in blinkm_detect() LEDS: memory leak in blinkm_led_common_set() leds: add new lp8788 led driver LEDS: add BlinkM RGB LED driver, documentation and update MAINTAINERS leds: max8997: Simplify max8997_led_set_mode implementation leds/leds-s3c24xx: use devm_gpio_request leds: convert Network Space v2 LED driver to devm_kzalloc() and cleanup error exit path leds: convert DAC124S085 LED driver to devm_kzalloc() leds: convert LM3530 LED driver to devm_kzalloc() and cleanup error exit path leds: convert TCA6507 LED driver to devm_kzalloc() leds: convert Freescale MC13783 LED driver to devm_kzalloc() and cleanup error exit path leds: convert ADP5520 LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA955x LED driver to devm_kzalloc() and cleanup error exit path leds: convert Sun Fire LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA9532 LED driver to devm_kzalloc() leds: convert LT3593 LED driver to devm_kzalloc() leds: convert Renesas TPU LED driver to devm_kzalloc() and cleanup error exit path leds: convert LP5523 LED driver to devm_kzalloc() and cleanup error exit path leds: convert PCA9633 LED driver to devm_kzalloc() ...	2012-07-26 20:26:27 -07:00
Linus Torvalds	1e30c1b386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking updates and fixes from David Miller: 1) Reinstate the no-ref optimization for input route lookups in ipv4 to fix some routing cache removal perf regressions. 2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet. 3) Get RX hash value from correct place in be2net driver, from Sarveshwar Bandi. 4) Validation of FIB cached routes missing critical check, from Eric Dumazet. 5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits) ipv6: Early TCP socket demux ipv4: Fix input route performance regression. pch_gbe: vlan skb len fix pch_gbe: add extra clean tx pch_gbe: fix transmit watchdog timeout ixgbe: fix panic while dumping packets on Tx hang with IOMMU be2net: Fix to parse RSS hash from Receive completions correctly. net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER hyperv: Add error handling to rndis_filter_device_add() hyperv: Add a check for ring_size value ipv4: rt_cache_valid must check expired routes net/pch_gpe: Cannot disable ethernet autonegation qeth: repair crash in qeth_l3_vlan_rx_kill_vid() netiucv: cleanup attribute usage net: wiznet add missing HAS_IOMEM dependency be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC mlx4: Add support for EEH error recovery cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN wanmain: comparing array with NULL caif: fix NULL pointer check ...	2012-07-26 18:09:01 -07:00
Eric Dumazet	c7109986db	ipv6: Early TCP socket demux This is the IPv6 missing bits for infrastructure added in commit `41063e9dd1` (ipv4: Early TCP socket demux.) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 15:50:39 -07:00
David S. Miller	c6cffba4ff	ipv4: Fix input route performance regression. With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 15:50:39 -07:00
Eric Dumazet	4331debc51	ipv4: rt_cache_valid must check expired routes commit `d2d68ba9fe` (ipv4: Cache input routes in fib_info nexthops.) introduced rt_cache_valid() helper. It unfortunately doesn't check if route is expired before caching it. I noticed sk_setup_caps() was constantly called on a tcp workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-25 15:24:14 -07:00
Stanislaw Gruszka	5e31fc0815	wireless: reg: restore previous behaviour of chan->max_power calculations commit `eccc068e8e` Author: Hong Wu <Hong.Wu@dspg.com> Date: Wed Jan 11 20:33:39 2012 +0200 wireless: Save original maximum regulatory transmission power for the calucation of the local maximum transmit pow changed the way we calculate chan->max_power as min(chan->max_power, chan->max_reg_power). That broke rt2x00 (and perhaps some other drivers) that do not set chan->max_power. It is not so easy to fix this problem correctly in rt2x00. According to commit `eccc068e8` changelog, change claim only to save maximum regulatory power - changing setting of chan->max_power was side effect. This patch restore previous calculations of chan->max_power and do not touch chan->max_reg_power. Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-25 16:11:12 +02:00
Alan Cox	8b72ff6484	wanmain: comparing array with NULL gcc really should warn about these ! Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-24 13:55:21 -07:00
Eric Dumazet	9cb429d692	tcp: early_demux fixes 1) Remove a non needed pskb_may_pull() in tcp_v4_early_demux() and fix a potential bug if skb->head was reallocated (iph & th pointers were not reloaded) TCP stack will pull/check headers anyway. 2) must reload iph in ip_rcv_finish() after early_demux() call since skb->head might have changed. 3) skb->dev->ifindex can be now replaced by skb->skb_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-24 13:54:15 -07:00
Linus Torvalds	d14b7a419a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Trivial updates all over the place as usual." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits) Fix typo in include/linux/clk.h . pci: hotplug: Fix typo in pci iommu: Fix typo in iommu video: Fix typo in drivers/video Documentation: Add newline at end-of-file to files lacking one arm,unicore32: Remove obsolete "select MISC_DEVICES" module.c: spelling s/postition/position/g cpufreq: Fix typo in cpufreq driver trivial: typo in comment in mksysmap mach-omap2: Fix typo in debug message and comment scsi: aha152x: Fix sparse warning and make printing pointer address more portable. Change email address for Steve Glendinning Btrfs: fix typo in convert_extent_bit via: Remove bogus if check netprio_cgroup.c: fix comment typo backlight: fix memory leak on obscure error path Documentation: asus-laptop.txt references an obsolete Kconfig item Documentation: ManagementStyle: fixed typo mm/vmscan: cleanup comment error in balance_pgdat mm: cleanup on the comments of zone_reclaim_stat ...	2012-07-24 13:34:56 -07:00
Linus Torvalds	3c4cfadef6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David S Miller: 1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought the be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues. 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input. 3) TCP SYN/ACK performance tweaks from Eric Dumazet. 4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng. 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz. 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet. 7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso. 8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko. 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos. 10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet. 11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck. 12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet. 13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data. 14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet. 15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits) genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP r8169: revert "add byte queue limit support". ipv4: Change rt->rt_iif encoding. net: Make skb->skb_iif always track skb->dev ipv4: Prepare for change of rt->rt_iif encoding. ipv4: Remove all RTCF_DIRECTSRC handliing. ipv4: Really ignore ICMP address requests/replies. decnet: Don't set RTCF_DIRECTSRC. net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. ipv4: Remove redundant assignment rds: set correct msg_namelen openvswitch: potential NULL deref in sample() tcp: dont drop MTU reduction indications bnx2x: Add new 57840 device IDs tcp: avoid oops in tcp_metrics and reset tcpm_stamp niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value niu: Fix to check for dma mapping errors. net: Fix references to out-of-scope variables in put_cmsg_compat() net: ethernet: davinci_emac: add pm_runtime support net: ethernet: davinci_emac: Remove unnecessary #include ...	2012-07-24 10:01:50 -07:00
Johannes Berg	3aa569c3fe	mac80211: fix scan_sdata assignment We need to use RCU to assign scan_sdata. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2012-07-24 16:54:11 +02:00
WANG Cong	320f5ea0ce	genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP lockdep_is_held() is defined when CONFIG_LOCKDEP, not CONFIG_PROVE_LOCKING. Cc: "David S. Miller" <davem@davemloft.net> Cc: Jesse Gross <jesse@nicira.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-24 00:01:30 -07:00
Shuah Khan	19cd67e2d5	leds: Rename led_brightness_set() to led_set_brightness() Rename leds external interface led_brightness_set() to led_set_brightness(). This is the second phase of the change to reduce confusion between the leds internal and external interfaces that set brightness. With this change, now the external interface is led_set_brightness(). The first phase renamed the internal interface led_set_brightness() to __led_set_brightness(). There are no changes to the interface implementations. Signed-off-by: Shuah Khan <shuahkhan@gmail.com> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>	2012-07-24 07:52:34 +08:00
David S. Miller	13378cad02	ipv4: Change rt->rt_iif encoding. On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation have to adjust. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 16:36:27 -07:00
David S. Miller	b68581778c	net: Make skb->skb_iif always track skb->dev Make it follow device decapsulation, from things such as VLAN and bonding. The stuff that actually cares about pre-demuxed device pointers, is handled by the "orig_dev" variable in __netif_receive_skb(). And the only consumer of that is the po->origdev feature of AF_PACKET sockets. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 16:36:27 -07:00
David S. Miller	92101b3b2e	ipv4: Prepare for change of rt->rt_iif encoding. Use inet_iif() consistently, and for TCP record the input interface of cached RX dst in inet sock. rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index" the rt->rt_iif will be set to zero. This forces us to move the TCP RX dst cache installation into the ipv4 specific code, and as well it should since doing the route caching for ipv6 is pointless at the moment since it is not inspected in the ipv6 input paths yet. Also, remove the unlikely on dst->obsolete, all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 16:36:26 -07:00
David S. Miller	fe3edf4579	ipv4: Remove all RTCF_DIRECTSRC handliing. The last and final kernel user, ICMP address replies, has been removed. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-23 13:22:20 -07:00

... 3 4 5 6 7 ...

24816 Commits