linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Jarek Poplawski	7f3ff4f63f	pkt_sched: Annotate uninitialized var in sfq_enqueue() Some gcc versions warn that ret may be used uninitialized in sfq_enqueue(). It's a false positive, so let's annotate this. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-21 20:14:48 -08:00
David S. Miller	eb14f01959	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c	2008-12-15 20:03:50 -08:00
Jesper Dangaard Brouer	eb9b851b98	SCHED: netem: Correct documentation comment in code. The netem simulator is no longer limited by Linux timer resolution HZ. Not since Patrick McHardy changed the QoS system to use hrtimer. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-15 00:39:17 -08:00
Jarek Poplawski	512bb43eb5	pkt_sched: sch_htb: Optimize WARN_ONs in htb_dequeue_tree() etc. We can skip WARN_ON() in htb_dequeue_tree() because there should be always a similar warning from htb_lookup_leaf() earlier. The first WARN_ON() in in htb_lookup_leaf() is changed to BUG_ON() because most likly this should end with oops anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-09 22:35:02 -08:00
Jarek Poplawski	1b5c0077e1	pkt_sched: sch_htb: Optimize htb_find_next_upper() htb_id_find_next_upper() is usually called to find a class with next id after some previously removed class, so let's move a check for equality to the end: it's the least likely here. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-09 22:34:40 -08:00
James Morris	ec98ce480a	Merge branch 'master' into next Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>	2008-12-04 17:16:36 +11:00
Jarek Poplawski	59e4220a11	pkt_sched: sch_htb: Replace HTB_ACCNT() macro with inlines Replace HTB_ACCNT() macro with inlines to make it more readable. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:17:27 -08:00
Jarek Poplawski	23cb913d25	pkt_sched: sch_htb: Remove L2T() L2T() is currently used only in one place (and has one spurious parameter, btw), so let's: 'get rid of L2T completely, and just use "qdisc_l2t(rate, size)" directly.' - quote & feedback from David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:16:58 -08:00
Jarek Poplawski	c19f7a34f7	pkt_sched: sch_htb: Clean htb_class prio and quantum fields While implementing htb_parent_to_leaf() there where added backup prio and quantum struct htb_class fields to preserve these values for inner classes in case of their return to leaf. This patch cleans this a bit by removing union leaf duplicates. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:09:45 -08:00
Jarek Poplawski	633fe66ed8	pkt_sched: sch_htb: Remove htb_sched nwc_hit field Remove practically unused struct htb_sched nwc_hit field. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:09:10 -08:00
Jarek Poplawski	4164d661b8	pkt_sched: sch_htb: Remove htb_class aprio field Remove practically unused struct htb_class aprio field. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:08:44 -08:00
Hannes Eder	6113b748fb	pkt_sched: fix sparse warning Impact: make global function static Fix the following sparse warning: net/sched/sch_api.c:192:14: warning: symbol 'qdisc_match_from_root' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-28 03:06:46 -08:00
Jarek Poplawski	244e6c2d07	pkt_sched: gen_estimator: Optimize gen_estimator_active() Since all other gen_estimator functions use bstats and rate_est params together, and searching for them is optimized now, let's use this also in gen_estimator_active(). The return type of gen_estimator_active() is changed to bool, and gen_find_node() parameters to const, btw. In tcf_act_police_locate() a check for ACT_P_CREATED is added before calling gen_estimator_active(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-26 15:24:32 -08:00
Stephen Hemminger	c1b56878fb	tc: policing requires a rate estimator Found that while trying average rate policing, it was possible to request average rate policing without a rate estimator. This results in no policing which is harmless but incorrect. Since policing could be setup in two steps, need to check in the kernel. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:14:06 -08:00
Stephen Hemminger	71bcb09a57	tc: check for errors in gen_rate_estimator creation The functions gen_new_estimator and gen_replace_estimator can return errors, but they were being ignored. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:13:31 -08:00
Stephen Hemminger	0e991ec6a0	tc: propogate errors from tcf_hash_create Allow tcf_hash_create to return different errors on estimator failure. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:13:25 -08:00
Ingo Molnar	dc0a0011cf	pkt_sched: fix warning in net/sched/sch_hfsc.c this warning: net/sched/sch_hfsc.c: In function ‘hfsc_enqueue’: net/sched/sch_hfsc.c:1577: warning: ‘err’ may be used uninitialized in this function triggers because GCC does not recognize the (correct) error flow between hfsc_classify(), 'cl' and 'err'. Annotate it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 16:50:02 -08:00
Jarek Poplawski	f6486d40b3	pkt_sched: sch_api: Remove qdisc_list_lock After implementing qdisc->ops->peek() there is no more calling qdisc_tree_decrease_qlen() without rtnl_lock(), so qdisc_list_lock added by commit: `f6e0b239a2` "pkt_sched: Fix qdisc list locking" can be removed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 13:56:06 -08:00
Patrick McHardy	3f0947c3ff	pkt_sched: sch_drr: fix drr_dequeue loop() Jarek Poplawski points out: If all child qdiscs of sch_drr are non-work-conserving (e.g. sch_tbf) drr_dequeue() will busy-loop waiting for skbs instead of leaving the job for a watchdog. Checking for list_empty() in each loop isn't necessary either, because this can never be true except the first time. Using non-work-conserving qdiscs as children of DRR makes no sense, simply bail out in that case. Reported-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-24 15:46:08 -08:00
Jarek Poplawski	98aa9c80f1	pkt_sched: sch_drr: Fix qlen in drr_drop() Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-21 04:37:27 -08:00
David S. Miller	6ab33d5171	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ixgbe/ixgbe_main.c include/net/mac80211.h net/phonet/af_phonet.c	2008-11-20 16:44:00 -08:00
Patrick McHardy	47a1a1d4be	pkt_sched: remove unnecessary xchg() in packet classifiers The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. In the case of classifiers they mostly weren't even necessary before that since they're mainly used to assign a NULL pointer to the filter root in the ->destroy path; the root is destroyed immediately after that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:14:28 -08:00
Patrick McHardy	b94c8afcba	pkt_sched: remove unnecessary xchg() in packet schedulers The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:11:36 -08:00
Patrick McHardy	13d2a1d2b0	pkt_sched: add DRR scheduler Add classful DRR scheduler as a more flexible replacement for SFQ. The main difference to the algorithm described in "Efficient Fair Queueing using Deficit Round Robin" is that this implementation doesn't drop packets from the longest queue on overrun because its classful and limits are handled by each individual child qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:10:00 -08:00
Patrick McHardy	3aa4614da7	pkt_sched: fix missing check for packet overrun in qdisc_dump_stab() nla_nest_start() might return NULL, causing a NULL pointer dereference. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:07:14 -08:00
Stephen Hemminger	d314774cf2	netdev: network device operations infrastructure This patch changes the network device internal API to move adminstrative operations out of the network device structure and into a separate structure. This patch involves some hackery to maintain compatablity between the new and old model, so all 300+ drivers don't have to be changed at once. For drivers that aren't converted yet, the netdevice_ops virt function list still resides in the net_device structure. For old protocols, the new net_device_ops are copied out to the old net_device pointers. After the transistion is completed the nag message can be changed to an WARN_ON, and the compatiablity code can be made configurable. Some function pointers aren't moved: * destructor can't be in net_device_ops because it may need to be referenced after the module is unloaded. * neighbor setup is manipulated in a couple of places that need special consideration * hard_start_xmit is in the fast path for transmit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 21:32:24 -08:00
David S. Miller	b47300168e	net: Do not fire linkwatch events until the device is registered. Several device drivers try to do things like netif_carrier_off() before register_netdev() is invoked. This is bogus, but too many drivers do this to fix them all up in one go. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 15:33:54 -08:00
Alexey Dobriyan	4d24b52ac5	ematch: simpler tcf_em_unregister() Simply delete ops from list and let list debugging do the job. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-16 23:01:49 -08:00
Jarek Poplawski	f30ab418a1	pkt_sched: Remove qdisc->ops->requeue() etc. After implementing qdisc->ops->peek() and changing sch_netem into classless qdisc there are no more qdisc->ops->requeue() users. This patch removes this method with its wrappers (qdisc_requeue()), and also unused qdisc->requeue structure. There are a few minor fixes of warnings (htb_enqueue()) and comments btw. The idea to kill ->requeue() and a similar patch were first developed by David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-13 22:56:30 -08:00
David Howells	d76b0d9b2d	CRED: Use creds in file structs Attach creds to file structs and discard f_uid/f_gid. file_operations::open() methods (such as hppfs_open()) should use file->f_cred rather than current_cred(). At the moment file->f_cred will be current_cred() at this point. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:25 +11:00
Thomas Graf	f400923735	pkt_sched: Control group classifier The classifier should cover the most common use case and will work without any special configuration. The principle of the classifier is to directly access the task_struct via get_current(). In order for this to work, classification requests from softirqs must be ignored. This is not a problem because the vast majority of packets in softirq context are not assigned to a task anyway. For this to work, a mechanism is needed to trace softirq context. This repost goes back to the method of relying on the number of nested bh disable calls for the sake of not adding too much complexity and the option to come up with something more reliable if actually needed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-07 22:56:00 -08:00
Stephen Hemminger	265eb67fb4	netem: eliminate unneeded return values All these individual parsing functions never return an error, so they can be void. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-03 21:13:26 -08:00
Jarek Poplawski	67305ebc99	pkt_sched: sch_generic: Kfree gso_skb in qdisc_reset() Since gso_skb is re-used for qdisc_peek_dequeued(), and this skb is counted in the qdisc->q.qlen, it has to be kfreed during qdisc_reset() when qlen is zeroed. With help from David S. Miller <davem@davemloft.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-03 02:52:50 -08:00
Jarek Poplawski	8ba25dad0a	sch_netem: Replace ->requeue() method with open code After removing netem classful functionality we are sure its inner qdisc is tfifo, so we can replace qdisc->ops->requeue() method with open code. After this patch there are no more ops->requeue() users. The idea of this patch is by Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-02 00:36:03 -07:00
Jarek Poplawski	0220146411	sch_netem: Remove classful functionality Patrick McHardy noticed that: "a lot of the functionality of netem requires the inner tfifo anyways and rate-limiting is usually done on top of netem. So I would suggest so either hard-wire the tfifo qdisc or at least make the assumption that inner qdiscs are work-conserving.", and later: "- a lot of other qdiscs still don't work as inner qdiscs of netem [...]". So, according to his suggestion, this patch removes classful options of netem. The main reason of this change is to remove ops->requeue() method, which is currently used only by netem. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-02 00:35:24 -07:00
Jarek Poplawski	77be155cba	pkt_sched: Add peek emulation for non-work-conserving qdiscs. This patch adds qdisc_peek_dequeued() wrapper to emulate peek method with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until dequeuing. This is mainly for compatibility reasons not to break some strange configs because peeking is expected for non-work-conserving parent qdiscs to query work-conserving child qdiscs. This implementation requires using qdisc_dequeue_peeked() wrapper instead of directly calling qdisc->dequeue() for all qdiscs ever querried with qdisc->ops->peek() or qdisc_peek_dequeued(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:47:01 -07:00
Jarek Poplawski	03c05f0d4b	pkt_sched: Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair. After this patch the only remaining user of qdisc->ops->requeue() is netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:46:19 -07:00
Jarek Poplawski	8e3af97899	pkt_sched: Add qdisc->ops->peek() implementation. Add qdisc->ops->peek() implementation for work-conserving qdiscs. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:45:55 -07:00
Jarek Poplawski	99c0db2679	pkt_sched: sch_generic: Add generic qdisc->ops->peek() implementation. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:45:27 -07:00
Patrick McHardy	48a8f519e0	pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs. From: Patrick McHardy <kaber@trash.net> Just as a demonstration how easy adding a peek operation to the work-conserving qdiscs actually is. It doesn't need to keep or change any internal state in many cases thanks to the guarantee that the packet will either be dequeued or, if another packet arrives, the upper qdisc will immediately ->peek again to reevaluate the state. (This is only slightly modified Patrick's patch.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:44:18 -07:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Jarek Poplawski	9f3ffae0db	pkt_sched: sch_generic: Fix oops in sch_teql After these commands: # modprobe sch_teql # tc qdisc add dev eth0 root teql0 # tc qdisc del dev eth0 root we get an oops in teql_destroy() when spin_lock is taken from a null qdisc_sleeping pointer. It's because at the moment teql0 dev haven't been activated yet, and a qdisc_root_sleeping() is pointing to noop qdisc's netdev_queue with qdisc_sleeping uninitialized. This patch fixes this both for noop and noqueue netdev_queues to avoid similar problems in the future. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-19 23:37:47 -07:00
Arjan van de Ven	651dab4264	Merge commit 'linus/master' into merge-linus Conflicts: arch/x86/kvm/i8254.c	2008-10-17 09:20:26 -07:00
Johannes Berg	95a5afca4a	net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-16 15:24:51 -07:00
Jarek Poplawski	53e9150349	pkt_sched: Update qdisc requeue stats in dev_requeue_skb() After the last change of requeuing there is no info about such incidents in tc stats. This patch updates the counter, but we should consider this should differ from previous stats because of additional checks preventing to repeat this. On the other hand, previous stats didn't include requeuing of gso_segmented skbs. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-08 11:36:22 -07:00
Jan Engelhardt	916a917dfe	netfilter: xtables: provide invoked family value to extensions By passing in the family through which extensions were invoked, a bit of data space can be reclaimed. The "family" member will be added to the parameter structures and the check functions be adjusted. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:20 +02:00
Jan Engelhardt	a2df1648ba	netfilter: xtables: move extension arguments into compound structure (6/6) This patch does this for target extensions' destroy functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	af5d6dc200	netfilter: xtables: move extension arguments into compound structure (5/6) This patch does this for target extensions' checkentry functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	7eb3558655	netfilter: xtables: move extension arguments into compound structure (4/6) This patch does this for target extensions' target functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	367c679007	netfilter: xtables: do centralized checkentry call (1/2) It used to be that {ip,ip6,etc}_tables called extension->checkentry themselves, but this can be moved into the xtables core. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:17 +02:00
Jarek Poplawski	6252352d16	pkt_sched: Simplify dev_requeue_skb and dequeue_skb qdisc->requeue was planned to universally replace all requeuing code, but at the top level we never requeue more than one skb, so qdisc-> gso_skb is enough for this. qdisc->requeue would be used on the lower levels only for one level deep requeuing (like in sch_hfsc) after finishing all the changes. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-06 10:41:50 -07:00
Jarek Poplawski	554794de79	pkt_sched: Fix handling of gso skbs on requeuing Jay Cliburn noticed and diagnosed a bug triggered in dev_gso_skb_destructor() after last change from qdisc->gso_skb to qdisc->requeue list. Since gso_segmented skbs can't be queued to another list this patch brings back qdisc->gso_skb for them. Reported-by: Jay Cliburn <jcliburn@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-06 09:54:39 -07:00
Jarek Poplawski	ebf059821e	pkt_sched: Check the state of tx_queue in dequeue_skb() Check in dequeue_skb() the state of tx_queue for requeued skb to save on locking and re-requeuing, and possibly remove the current check in qdisc_run(). Based on the idea of Alexander Duyck. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:16:23 -07:00
David S. Miller	f0876520b0	pkt_sched: Always use q->requeue in dev_requeue_skb(). There is no reason to call into the complicated qdiscs just to remember the last SKB where we found the device blocked. The SKB is outside of the qdiscs realm at this point. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:15:58 -07:00
David S. Miller	242f8bfefe	pkt_sched: Make qdisc->gso_skb a list. The idea is that we can use this to get rid of ->requeue(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:15:30 -07:00
Harvey Harrison	d48abfecea	net: em_cmp.c use unaligned access helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 19:20:51 -07:00
Arnaldo Carvalho de Melo	6067804047	net: Use hton[sl]() instead of __constant_hton[sl]() where applicable Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 22:20:49 -07:00
Alexander Duyck	a574420ff4	multiq: requeue should rewind the current_band Currently dequeueing a packet and requeueing the same packet will cause a different packet to be pulled on the next dequeue. This change forces requeue to rewind the current_band. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 22:07:34 -07:00
Alexander Duyck	f07d150129	multiq: Further multiqueue cleanup This patch resolves a few issues found with multiq including wording suggestions and a problem seen in the allocation of queues. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 17:57:23 -07:00
Alexander Duyck	ca9b0e27e0	pkt_action: add new action skbedit This new action will have the ability to change the priority and/or queue_mapping fields on an sk_buff. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:30:20 -07:00
Alexander Duyck	92651940ab	pkt_sched: Add multiqueue scheduler support This patch is intended to add a qdisc to support the new tx multiqueue architecture by providing a band for each hardware queue. By doing this it is possible to support a different qdisc per physical hardware queue. This qdisc uses the skb->queue_mapping to select which band to place the traffic onto. It then uses a round robin w/ a check to see if the subqueue is stopped to determine which band to dequeue the packet from. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:29:34 -07:00
Arjan van de Ven	5337407c67	warn: Turn the netdev timeout WARN_ON() into a WARN() this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(), so that the device and driver names are inside the warning message. This helps automated tools like kerneloops.org to collect the data and do statistics, as well as making it more likely that humans cut-n-paste the important message as part of a bugreport. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-08 16:17:42 -07:00
Arjan van de Ven	23dd7bb09b	hrtimer: convert net::sched_cbq to the new hrtimer apis In order to be able to do range hrtimers we need to use accessor functions to the "expire" member of the hrtimer struct. This patch converts sched_cbq to these accessors. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:11 -07:00
Thomas Graf	2c10b32bf5	netlink: Remove compat API for nested attributes Removes all _nested_compat() functions from the API. The prio qdisc no longer requires them and netem has its own format anyway. Their existance is only confusing. Resend: Also remove the wrapper macro. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:30:27 -07:00
Jarek Poplawski	102396ae65	pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where appropriate. The only difference is while dev is deactivated, when currently we can use a sleeping qdisc with the lock of noop_qdisc. This shouldn't be dangerous since after deactivation root lock could be used only by gen_estimator code, but looks wrong anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:52 -07:00
Jarek Poplawski	f6f9b93f16	pkt_sched: Fix gen_estimator locks While passing a qdisc root lock to gen_new_estimator() and gen_replace_estimator() dev could be deactivated or even before grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so using qdisc_root_lock() is not enough. This patch adds qdisc_root_sleeping_lock() for this, plus additional checks, where necessary. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:25:17 -07:00
Jarek Poplawski	f7a54c13c7	pkt_sched: Use rcu_assign_pointer() to change dev_queue->qdisc These pointers are RCU protected, so proper primitives should be used. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:22:07 -07:00
Jarek Poplawski	666d9bbedf	pkt_sched: Fix dev_graft_qdisc() locking During dev_graft_qdisc() dev is deactivated, so qdisc_root_lock() returns wrong lock of noop_qdisc instead of qdisc_sleeping. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:15:20 -07:00
Jarek Poplawski	f6e0b239a2	pkt_sched: Fix qdisc list locking Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup()) without rtnl_lock(), adding and deleting from a qdisc list needs additional locking. This patch adds global spinlock qdisc_list_lock and wrapper functions for modifying the list. It is considered as a temporary solution until hfsc_dequeue(), netem_dequeue() and tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone. With feedback from Herbert Xu and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-22 03:31:39 -07:00
Jarek Poplawski	2540e0511e	pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog() or other timer calling netif_schedule() after dev_queue_deactivate(). We prevent this checking aliveness before scheduling the timer. Since during deactivation the root qdisc is available only as qdisc_sleeping additional accessor qdisc_root_sleeping() is created. With feedback from Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-21 05:11:14 -07:00
David S. Miller	f3b9605d74	Revert "pkt_sched: Add BH protection for qdisc_stab_lock." This reverts commit `1cfa26661a`. qdisc_destroy() runs fully under RTNL again and not from softint any longer, so this change is no longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:33:05 -07:00
Ilpo Järvinen	e5befbd952	pkt_sched: remove bogus block (cleanup) ...Last block local var got just deleted. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:30:01 -07:00
David S. Miller	4d8863a29c	pkt_sched: Don't hold qdisc lock over qdisc_destroy(). Based upon reports by Denys Fedoryshchenko, and feedback and help from Jarek Poplawski and Herbert Xu. We always either: 1) Never made an external reference to this qdisc. or 2) Did a dev_deactivate() which purged all asynchronous references. So do not lock the qdisc when we call qdisc_destroy(), it's illegal anyways as when we drop the lock this is free'd memory. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:19 -07:00
Jarek Poplawski	25bfcd5a78	pkt_sched: Add lockdep annotation for qdisc locks Qdisc locks are initialized in the same function, qdisc_alloc(), so lockdep can't distinguish tx qdisc lock from rx and reports "possible recursive locking detected" when both these locks are taken eg. while using act_mirred with ifb. This looks like a false positive. Anyway, after this patch these locks will be reported more exactly. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:09 -07:00
David S. Miller	8608db031b	pkt_sched: Never schedule non-root qdiscs. Based upon initial discovery and patch by Jarek Poplawski. The qdisc watchdogs can be attached to any qdisc, not just the root, so make sure we schedule the correct one. CBQ has a similar bug. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:05:56 -07:00
David S. Miller	69747650c8	pkt_sched: Fix return value corruption in HTB and TBF. Based upon a bug report by Josip Rodin. Packet schedulers should only return NET_XMIT_DROP iff the packet really was dropped. If the packet does reach the device after we return NET_XMIT_DROP then TCP can crash because it depends upon the enqueue path return values being accurate. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 00:39:41 -07:00
David S. Miller	4cf7cb280e	sch_prio: Use NET_XMIT_SUCCESS instead of "0" constant. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:45:17 -07:00
Jussi Kivilinna	0d40b6e564	sch_prio: Use return value from inner qdisc requeue Use return value from inner qdisc requeue when value returned isn't NET_XMIT_SUCCESS, instead of always returning NET_XMIT_DROP. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:43:56 -07:00
David S. Miller	1e0d5a5747	pkt_sched: No longer destroy qdiscs from RCU. We can now kill them synchronously with all of the previous dev_deactivate() cures. This makes netdev destruction and shutdown saner as the qdiscs hold references to the device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:31:26 -07:00
Jarek Poplawski	3a76e3716b	pkt_sched: Grab correct lock in notify_and_destroy(). From: Jarek Poplawski <jarkao2@gmail.com> When we are destroying non-root qdiscs, we need to lock the root of the qdisc tree not the the qdisc itself. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:02:11 -07:00
David S. Miller	4335cd2da1	pkt_sched: Simplify dev_deactivate() polling loop. The condition under which the previous qdisc has no more references after we've attached &noop_qdisc is that both RUNNING and SCHED are both seen clear while holding the root lock. So just make specifically that check in the polling loop, instead of this overly complex "check without then check with lock held" sequence. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:58:07 -07:00
David S. Miller	a9312ae893	pkt_sched: Add 'deactivated' state. This new state lets dev_deactivate() mark a qdisc as having been deactivated. dev_queue_xmit() and ing_filter() check for this bit and do not try to process the qdisc if the bit is set. dev_deactivate() polls the qdisc after setting the bit, waiting for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear. This isn't perfect yet, but subsequent changesets will make it so. This part is just one piece of the puzzle. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:51:03 -07:00
Jarek Poplawski	323c048836	pkt_sched: Fix unlocking in tc_ctl_tfilter() Fix a bug with spin_lock_bh() inserted instead of spin_unlock_bh() by some recent patch. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-14 17:01:10 -07:00
David S. Miller	b9a3b1102b	pkt_sched: Fix queue quiescence testing in dev_deactivate(). Based upon discussions with Jarek P. and Herbert Xu. First, we're testing the wrong qdisc. We just reset the device queue qdiscs to &noop_qdisc and checking it's state is completely pointless here. We want to wait until the previous qdisc that was sitting at the ->qdisc pointer is not busy any more. And that would be ->qdisc_sleeping. Because of how we propagate the samples qdisc pointer down into qdisc_run and friends via per-cpu ->output_queue and netif_schedule, we have to wait also for the __QDISC_STATE_SCHED bit to clear as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:18:38 -07:00
Jarek Poplawski	26b284de54	pkt_sched: Fix oops in htb_delete. Recent changes introduced a bug in htb_delete(): cl->parent->children counter update misses checking cl->parent for NULL, which is used for root classes, so deleting them causes an oops. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:16:43 -07:00
Jamal Hadi Salim	36723873b6	net-sched: fix Action flushing return code Flushing must consistently return ENOMEM on failure of any allocation Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:45 -07:00
Jamal Hadi Salim	f97017cdef	net-sched: Fix actions flushing Flushing of actions has been broken since we changed the semantics of netlink parsed tb[X] to mean X is an attribute type. This makes the flushing work. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:22 -07:00
Jarek Poplawski	1cfa26661a	pkt_sched: Add BH protection for qdisc_stab_lock. Since qdisc_stab_lock is used in qdisc_put_stab(), which is called in BH context from __qdisc_destroy() RCU callback, softirq safe locking is needed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-11 18:11:06 -07:00
David S. Miller	8123b421e8	pkt_sched: Fix ingress deletion and filter attachment. Based upon bug reports by Stephen Hemminger. We still had some cases using ->qdisc instead of ->qdisc_sleeping. Also, qdisc_lookup() should return ingress qdiscs. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-08 23:23:39 -07:00
Jamal Hadi Salim	76aab2c1ea	pkt_sched: Fix actions referencing When an action is added several times with the same exact index it gets deleted on every even-numbered attempt. This fixes that issue. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:37:22 -07:00
David S. Miller	827ebd6410	pkt_sched: Fix qdisc config when link is down. Bug reported by Stephen Hemminger. We need to fetch the root from ->qdisc_sleeping not ->qdisc. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:26:40 -07:00
David S. Miller	ee7af8264d	pkt_sched: Fix "parent is root" test in qdisc_create(). As noticed by Stephen Hemminger, the root qdisc is denoted by TC_H_ROOT, not zero. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:35:59 -07:00
Jarek Poplawski	c27f339af9	net_sched: Add qdisc __NET_XMIT_BYPASS flag Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:39:11 -07:00
Jarek Poplawski	378a2f090f	net_sched: Add qdisc __NET_XMIT_STOLEN flag Patrick McHardy <kaber@trash.net> noticed: "The other problem that affects all qdiscs supporting actions is TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS even though the packet is not queued, corrupting upper qdiscs' qlen counters." and later explained: "The reason why it translates it at all seems to be to not increase the drops counter. Within a single qdisc this could be avoided by other means easily, upper qdiscs would still increase the counter when we return anything besides NET_XMIT_SUCCESS though. This means we need a new NET_XMIT return value to indicate this to the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN, return that to upper qdiscs and translate it to NET_XMIT_SUCCESS in dev_queue_xmit, similar to NET_XMIT_BYPASS." David Miller <davem@davemloft.net> noticed: "Maybe these NET_XMIT_* values being passed around should be a set of bits. They could be composed of base meanings, combined with specific attributes. So you could say "NET_XMIT_DROP \| __NET_XMIT_NO_DROP_COUNT" The attributes get masked out by the top-level ->enqueue() caller, such that the base meanings are the only thing that make their way up into the stack. If it's only about communication within the qdisc tree, let's simply code it that way." This patch is trying to realize these ideas. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:31:03 -07:00
David S. Miller	5fb662297b	pkt_sched: Use qdisc_lock() on already sampled root qdisc. Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 20:02:43 -07:00
David S. Miller	c3f26a269c	netdev: Fix lockdep warnings in multiqueue configurations. When support for multiple TX queues were added, the netif_tx_lock() routines we converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 16:58:50 -07:00
David S. Miller	8d50b53d66	pkt_sched: Fix OOPS on ingress qdisc add. Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 02:44:25 -07:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
David S. Miller	cdec7e50a4	Revert "pkt_sched: sch_sfq: dump a real number of flows" This reverts commit `f867e6af94`. Based upon discussions between Jarek and Patrick McHardy this is field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:28:09 -07:00
Ilpo Järvinen	547b792cac	net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:43:18 -07:00
David S. Miller	cffe1c5d7a	pkt_sched: Fix locking in shutdown_scheduler_queue() Qdisc locks need to be held with BH disabled. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 01:25:04 -07:00
Jarek Poplawski	f867e6af94	pkt_sched: sch_sfq: dump a real number of flows Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 21:34:27 -07:00
Adrian Bunk	a94f779f9d	pkt_sched: make qdisc_class_hash_alloc() static This patch makes the needlessly global qdisc_class_hash_alloc() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:11 -07:00
Arjan van de Ven	6579e57b31	net: Print the module name as part of the watchdog message As suggested by Dave: This patch adds a function to get the driver name from a struct net_device, and consequently uses this in the watchdog timeout handler to print as part of the message. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:31:48 -07:00
David S. Miller	d3678b463d	Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe." This reverts commit `a0c80b80e0`. After discussions with Jamal and Herbert on netdev, we should provide at least minimal prioritization at the qdisc level even in multiqueue situations. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:50 -07:00
Daniel Lezcano	c3ee84163e	pkt_sched: Remove unused variable skb in dev_deactivate_queue function. Removed unused variable 'skb' in the dev_deactivate_queue function Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 09:18:07 -07:00
David S. Miller	3a682fbd73	pkt_sched: Fix build with NET_SCHED disabled. The stab bits can't be referenced uniless the full packet scheduler layer is enabled. Reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 18:13:01 -07:00
Jussi Kivilinna	175f9c1bba	net_sched: Add size table for qdiscs Add size table functions for qdiscs and calculate packet size in qdisc_enqueue(). Based on patch by Patrick McHardy http://marc.info/?l=linux-netdev&m=115201979221729&w=2 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:47 -07:00
Jussi Kivilinna	0abf77e55a	net_sched: Add accessor function for packet length for qdiscs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:27 -07:00
Jussi Kivilinna	5f86173bdf	net_sched: Add qdisc_enqueue wrapper Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:04 -07:00
David S. Miller	30ee42be00	pkt_sched: Fix noqueue_qdisc initialization. Like noop_qdisc, it needs a dummy backpointer and explicit qdisc->q.lock initialization. Based upon a report by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:00:11 -07:00
David S. Miller	3072367300	pkt_sched: Manage qdisc list inside of root qdisc. Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse it's list. The only special case is to ignore builting qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 22:50:15 -07:00
David S. Miller	72b25a913e	pkt_sched: Get rid of u32_list. The u32_list is just an indirect way of maintaining a reference to a U32 node on a per-qdisc basis. Just add an explicit node pointer for u32 to struct Qdisc an do away with this global list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 20:54:17 -07:00
David S. Miller	a0c80b80e0	pkt_sched: Make default qdisc nonshared-multiqueue safe. Instead of 'pfifo_fast' we have just plain 'fifo_fast'. No priority queues, just a straight FIFO. This is necessary in order to legally have a seperate qdisc per queue in multi-TX-queue setups, and thus get full parallelization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:33 -07:00
David S. Miller	99194cff39	pkt_sched: Add multiqueue handling to qdisc_graft(). Move the destruction of the old queue into qdisc_graft(). When operating on a root qdisc (ie. "parent == NULL"), apply the operation to all queues. The caller has grabbed a single implicit reference for this graft, therefore when we apply the change to more than one queue we must grab additional qdisc references. Otherwise, we are operating on a class of a specific parent qdisc, and therefore no multiqueue handling is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	8387400092	pkt_sched: Kill netdev_queue lock. We can simply use the qdisc->q.lock for all of the qdisc tree synchronization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	c7e4f3bbb4	pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree. No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:29 -07:00
David S. Miller	53049978df	pkt_sched: Make qdisc grafting locking more specific. Lock the root of the qdisc being operated upon. All explicit references to qdisc_tree_lock() are now gone. The only remaining uses are via the sch_tree_{lock,unlock}() and tcf_tree_{lock,unlock}() macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:27 -07:00
David S. Miller	ead81cc5fc	netdevice: Move qdisc_list back into net_device proper. And give it it's own lock. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:26 -07:00
David S. Miller	15b458fa65	pkt_sched: Kill qdisc_lock_tree usage in cls_route.c It just wants the qdisc tree to be synchronized, so grabbing qdisc_root_lock() is sufficient. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:25 -07:00
David S. Miller	55dbc640c3	pkt_sched: Remove qdisc_lock_tree usage in cls_api.c It just wants the qdisc tree for the filter to be synchronized. So just BH lock qdisc_root_lock(q) instead. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:24 -07:00
David S. Miller	17715e62a5	pkt_sched: Use per-queue locking in shutdown_scheduler_queue. This eliminates another qdisc_lock_tree user. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:23 -07:00
David S. Miller	8a34c5dc3a	pkt_sched: Perform bulk of qdisc destruction in RCU. This allows less strict control of access to the qdisc attached to a netdev_queue. It is even allowed to enqueue into a qdisc which is in the process of being destroyed. The RCU handler will toss out those packets. We will need this to handle sharing of a qdisc amongst multiple TX queues. In such a setup the lock has to be shared, so will be inside of the qdisc itself. At which point the netdev_queue lock cannot be used to hard synchronize access to the ->qdisc pointer. One operation we have to keep inside of qdisc_destroy() is the list deletion. It is the only piece of state visible after the RCU quiesce period, so we have to undo it early and under the appropriate locking. The operations in the RCU handler do not need any looking because the qdisc tree is no longer visible to anything at that point. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:22 -07:00
David S. Miller	16361127eb	pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. We are registering the device, there is no way anyone can get at this object's qdiscs yet in any meaningful way. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:21 -07:00
David S. Miller	37437bb2e1	pkt_sched: Schedule qdiscs instead of netdev_queue. When we have shared qdiscs, packets come out of the qdiscs for multiple transmit queues. Therefore it doesn't make any sense to schedule the transmit queue when logically we cannot know ahead of time the TX queue of the SKB that the qdisc->dequeue() will give us. Just for sanity I added a BUG check to make sure we never get into a state where the noop_qdisc is scheduled. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:20 -07:00
David S. Miller	7698b4fcab	pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). When code wants to lock the qdisc tree state, the logic operation it's doing is locking the top-level qdisc that sits of the root of the netdev_queue. Add qdisc_root_lock() to represent this and convert the easiest cases. In order for this to work out in all cases, we have to hook up the noop_qdisc to a dummy netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:19 -07:00
David S. Miller	e2627c8c22	pkt_sched: Make QDISC_RUNNING a qdisc state. Currently it is associated with a netdev_queue, but when we have qdisc sharing that no longer makes any sense. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	d3b753db7c	pkt_sched: Move gso_skb into Qdisc. We liberate any dangling gso_skb during qdisc destruction. It really only matters for the root qdisc. But when qdiscs can be shared by multiple netdev_queue objects, we can't have the gso_skb in the netdev_queue any more. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	fd2ea0a79f	net: Use queue aware tests throughout. This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:07 -07:00
David S. Miller	1d8ae3fdeb	pkt_sched: Remove RR scheduler. This actually fixes a bug added by the RR scheduler changes. The ->bands and ->prio2band parameters were being set outside of the sch_tree_lock() and thus could result in strange behavior and inconsistencies. It might be possible, in the new design (where there will be one qdisc per device TX queue) to allow similar functionality via a TX hash algorithm for RR but I really see no reason to export this aspect of how these multiqueue cards actually implement the scheduling of the the individual DMA TX rings and the single physical MAC/PHY port. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:04 -07:00
David S. Miller	e8a0464cc9	netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:00 -07:00
Patrick McHardy	72d9794f44	net-sched: cls_flow: add perturbation support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:36:32 -07:00
David S. Miller	79d16385c7	netdev: Move atomic queue state bits into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:14:46 -07:00
David S. Miller	c773e847ea	netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:13:53 -07:00
David S. Miller	eb6aafe3f8	pkt_sched: Make qdisc_run take a netdev_queue. This allows us to use this calling convention all the way down into qdisc_restart(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:12:38 -07:00
David S. Miller	86d804e10a	netdev: Make netif_schedule() routines work with netdev_queue objects. Only plain netif_schedule() remains taking a net_device, mostly as a compatability item while we transition the rest of these interfaces. Everything else calls netif_schedule_queue() or __netif_schedule(), both of which take a netdev_queue pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:11:25 -07:00
David S. Miller	970565bbad	netdev: Move gso_skb into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:10:33 -07:00
David S. Miller	74d58a0c1d	pkt_sched: Make netem queue agnostic. It just wants the root qdisc given an arbitrary qdisc, and that is simply qdisc->dev_queue->qdisc Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com>	2008-07-08 22:57:51 -07:00
David S. Miller	68dfb42798	pkt_sched: Kill stats_lock member of struct Qdisc. It is always equal to qdisc->dev_queue->lock Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:57:31 -07:00
David S. Miller	816f3258e7	netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead. Now that our qdisc management is bi-directional, per-queue, and fully orthogonal, there is no reason to have a special ingress qdisc pointer in struct net_device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:49:00 -07:00
David S. Miller	b0e1e6462d	netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:42:10 -07:00
David S. Miller	555353cfa1	netdev: The ingress_lock member is no longer needed. Every qdisc is assosciated with a queue, and in the case of ingress qdiscs that will now be netdev->rx_queue so using that queue's lock is the thing to do. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:33:13 -07:00
David S. Miller	dc2b48475a	netdev: Move queue_lock into struct netdev_queue. The lock is now an attribute of the device queue. One thing to notice is that "suspicious" places emerge which will need specific training about multiple queue handling. They are so marked with explicit "netdev->rx_queue" and "netdev->tx_queue" references. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:18:23 -07:00
David S. Miller	5ce2d488fe	pkt_sched: Remove 'dev' member of struct Qdisc. It can be obtained via the netdev_queue. So create a helper routine, qdisc_dev(), to make the transformations nicer looking. Now, qdisc_alloc() now no longer needs a net_device pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:06:30 -07:00
David S. Miller	bb949fbd18	netdev: Create netdev_queue abstraction. A netdev_queue is an entity managed by a qdisc. Currently there is one RX and one TX queue, and a netdev_queue merely contains a backpointer to the net_device. The Qdisc struct is augmented with a netdev_queue pointer as well. Eventually the 'dev' Qdisc member will go away and we will have the resulting hierarchy: net_device --> netdev_queue --> Qdisc Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:55:56 -07:00
David S. Miller	e65d22e180	pkt_sched: Remove comment reference to old style TX locking. We haven't had netdev->tbusy in many years :) Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:46:01 -07:00
Patrick McHardy	fb0305ce1b	net-sched: consolidate default fifo qdisc setup Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:40:21 -07:00
Patrick McHardy	aee18a8cf2	net-sched: sch_htb: remove write-only qdisc filter_cnt The filter_cnt is supposed to count filter references to a class. Since the qdisc can't be the target of a filter, it doesn't need a filter_cnt. In fact the counter is never decreased since cls_api considers a return value of zero a failure and doesn't unbind again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:23:27 -07:00
Patrick McHardy	4207759939	net-sched: sch_htb: remove child and sibling lists Now that the qdisc isn't destroyed in hierarchical order anymore, the only user of the child lists left is htb_parent_last_child(). This can be easily changed to use a counter of children to save a few bytes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:53 -07:00
Patrick McHardy	f4c1f3e0c5	net-sched: sch_htb: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:35 -07:00
Patrick McHardy	fbd8f1379a	net-sched: sch_htb: move hash and sibling list removal to htb_delete Hash list removal currently happens twice (once in htb_delete, once in htb_destroy_class), which makes it harder to use the dynamically sized class hash without adding special cases for HTB. The reason is that qdisc destruction destroys classes in hierarchical order, which is not necessary if filters are destroyed in a separate iteration during qdisc destruction. Adjust qdisc destruction to follow the same scheme as other hierarchical qdiscs by first performing a filter destruction pass, then destroying all classes in hash order. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:19 -07:00
Patrick McHardy	d77fea2eb9	net-sched: sch_cbq: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:05 -07:00
Patrick McHardy	be0d39d52c	net-sched: sch_hfsc: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:21:47 -07:00
Patrick McHardy	6fe1c7a555	net-sched: add dynamically sized qdisc class hash helpers Currently all qdiscs which allow to create classes uses a fixed sized hash table with size 16 to hash the classes. This causes a large bottleneck when using thousands of classes and unbound filters. Add helpers for dynamically sized class hashes to fix this. The following patches will convert the qdiscs to use them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:21:31 -07:00
David S. Miller	ea2aca084b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c	2008-07-05 23:08:07 -07:00
Patrick McHardy	a4aebb83cf	net-sched: fix filter destruction in atm/hfsc qdisc destruction Filters need to be destroyed before beginning to destroy classes since the destination class needs to still be alive to unbind the filter. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:53:09 -07:00
Patrick McHardy	ff31ab56c0	net-sched: change tcf_destroy_chain() to clear start of filter list Pass double tcf_proto pointers to tcf_destroy_chain() to make it clear the start of the filter list for more consistency. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:52:38 -07:00
David S. Miller	1b63ba8a86	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c	2008-06-28 01:19:40 -07:00
Adrian Bunk	ede16af4cd	pkt_sched: Remove CONFIG_NET_SCH_RR Commit `d62733c8e4` ([SCHED]: Qdisc changes and sch_rr added for multiqueue) added a NET_SCH_RR option that was unused since the code went unconditionally into sch_prio. Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:54:05 -07:00
WANG Cong	01e123d79a	pkt_sched: ERR_PTR() ususally encodes an negative errno, not positive. Note, in the following patch, 'err' is initialized as: int err = -ENOBUFS; Signed-off-by: WANG Cong <wcong@critical-links.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:51:35 -07:00
David S. Miller	caea902f72	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/Kconfig drivers/net/wireless/rt2x00/rt2x00usb.c net/sctp/protocol.c	2008-06-16 18:25:48 -07:00
Jesper Dangaard Brouer	47083fc073	pkt_sched: Change HTB_HYSTERESIS to a runtime parameter htb_hysteresis. Add a htb_hysteresis parameter to htb_sch.ko and by sysfs magic make it runtime adjustable via /sys/module/sch_htb/parameters/htb_hysteresis mode 640. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 16:39:32 -07:00
Jesper Dangaard Brouer	f9ffcedddb	pkt_sched: HTB scheduler, change default hysteresis mode to off. The HTB hysteresis mode reduce the CPU load, but at the cost of scheduling accuracy. On ADSL links (512 kbit/s upstream), this inaccuracy introduce significant jitter, enought to disturbe VoIP. For details see my masters thesis (http://www.adsl-optimizer.dk/thesis/), chapter 7, section 7.3.1, pp 69-70. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 16:38:33 -07:00
Adrian Bunk	0b04082995	net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-11 21:00:38 -07:00
Thomas Graf	bc3ed28caa	netlink: Improve returned error codes Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:54 -07:00
Patrick McHardy	f2df824948	net_sched: cls_api: fix return value for non-existant classifiers cls_api should return ENOENT when the requested classifier doesn't exist. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-20 14:34:46 -07:00
Jamal Hadi Salim	9d1045ad68	net_cls_act: act_simple dont ignore realloc code reallocation of the policy data was being ignored. It could fail. Simplify so that there is no need for reallocating. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-06 00:10:24 -07:00
Jamal Hadi Salim	fa1b1cff3d	net_cls_act: Make act_simple use of netlink policy. Convert to netlink helpers by using netlink policy validation. As a side effect fixes a leak. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-05 00:22:35 -07:00
Jarek Poplawski	3ba08b00e0	sch_htb: remove from event queue in htb_parent_to_leaf() There is lack of removing a class from the event queue while changing from parent to leaf which can cause corruption of this rb tree. This patch fixes a bug introduced by my patch: "sch_htb: turn intermediate classes into leaves" commit: `160d5e10f8`. Many thanks to Jan 'yanek' Bortl for finding a way to reproduce this rare bug and narrowing the test case, which made possible proper diagnosing. This patch is recommended for all kernels starting from 2.6.20. Reported-and-tested-by: Jan 'yanek' Bortl <yanek@ya.bofh.cz> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-03 20:46:29 -07:00
Arjan van de Ven	b4192bbd85	net: Add a WARN_ON_ONCE() to the transmit timeout function WARN_ON_ONCE() gives a stack trace including the full module list. Having this in the kernel dump for the timeout case in the generic netdev watchdog will help us see quicker which driver is involved. It also allows us to collect statistics and patterns in terms of which drivers have this event occuring. Suggested by Andrew Morton Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:21:07 -07:00
Jarek Poplawski	980c478ddb	sch_sfq: use del_timer_sync() in sfq_destroy() Let's delete timer reliably in sfq_destroy(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:29:03 -07:00
David S. Miller	1e42198609	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-04-17 23:56:30 -07:00
Patrick McHardy	f5ba2d3217	[PKT_SCHED]: Fix datalen check in tcf_simp_init(). datalen is unsigned so it can never be less than zero, but that's ok because the attribute passed to nla_len() has been validated and therefore a negative return value is impossible. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-17 23:19:55 -07:00
Jarek Poplawski	066a3b5b23	[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop TC_H_MAJ(parentid) for root classes is the same as for ingress, and if ingress qdisc is created qdisc_lookup() returns its pointer (without ingress NULL is returned). After this all qdisc_lookups give the same, and we get endless loop. (I don't know how this could hide for so long - it should trigger with every leaf class deleted if it's qdisc isn't empty.) After this fix qdisc_lookup() is omitted both for ingress and root parents, but looking for root is only wasting a little time here... Many thanks to Enrico Demarin for finding a test for catching this bug, which probably bothered quite a lot of admins. Reported-by: Enrico Demarin <enrico@superclick.com>, Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 15:10:42 -07:00
David S. Miller	df39e8ba56	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ehea/ehea_main.c drivers/net/wireless/iwlwifi/Kconfig drivers/net/wireless/rt2x00/rt61pci.c net/ipv4/inet_timewait_sock.c net/ipv6/raw.c net/mac80211/ieee80211_sta.c	2008-04-14 02:30:23 -07:00
Jarek Poplawski	e56cfad132	[NET_SCHED] cls_u32: refcounting fix for u32_delete() Deleting of nonroot hnodes mostly doesn't work in u32_delete(): refcnt == 1 is expected, but such hnodes' refcnts are initialized with 0 and charged only with "link" nodes. Now they'll start with 1 like usual. Thanks to Patrick McHardy for an improving suggestion. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-12 18:37:13 -07:00
David S. Miller	e1ec1b8ccd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/s2io.c	2008-04-02 22:35:23 -07:00
Herbert Xu	2ba2506ca7	[NET]: Add preemption point in qdisc_run The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:25:26 -07:00
YOSHIFUJI Hideaki	3b1e0a655f	[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:55 +09:00
David S. Miller	06802a819a	Merge branch 'master' of ../net-2.6/ Conflicts: net/ipv6/ndisc.c	2008-03-23 22:54:03 -07:00
Martin Devera	8f3ea33a50	sch_htb: fix "too many events" situation HTB is event driven algorithm and part of its work is to apply scheduled events at proper times. It tried to defend itself from livelock by processing only limited number of events per dequeue. Because of faster computers some users already hit this hardcoded limit. This patch limits processing up to 2 jiffies (why not 1 jiffie ? because it might stop prematurely when only fraction of jiffie remains). Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:00:38 -07:00
David S. Miller	577f99c1d0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/rt2x00dev.c net/8021q/vlan_dev.c	2008-03-18 00:37:55 -07:00
Al Viro	0382b9c354	[PKT_SCHED]: annotate cls_u32 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:46:46 -07:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast it the (struct rtable )skb->dst one. Defining an union like : union { struct dst_entry dst; struct rtable *rtable; }; permits to use skb->rtable in place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
David S. Miller	30ddb159ff	[PKT_SCHED] ematch: Fix build warning. Commit `954415e33e` ("[PKT_SCHED] ematch: tcf_em_destroy robustness") removed a cast on em->data when passing it to kfree(), but em->data is an integer type that can hold pointers as well as other values so the cast is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-10 03:48:15 -08:00
Jarek Poplawski	21347456ab	[NET_SCHED] sch_htb: htb_requeue fix htb_requeue() enqueues skbs for which htb_classify() returns NULL. This is wrong because such skbs could be handled by NET_CLS_ACT code, and the decision could be different than earlier in htb_enqueue(). So htb_requeue() is changed to work and look more like htb_enqueue(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:44:00 -08:00
Stephen Hemminger	954415e33e	[PKT_SCHED] ematch: tcf_em_destroy robustness Make the code in tcf_em_tree_destroy more robust and cleaner: * Don't need to cast pointer to kfree() or avoid passing NULL. * After freeing the tree, clear the pointer to avoid possible problems from repeated free. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:26:53 -08:00
Stephen Hemminger	ed7af3b350	[PKT_SCHED]: deinline functions in meta match A couple of functions in meta match don't need to be inline. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:26:17 -08:00
Stephen Hemminger	268bcca1e7	[PKT_SCHED] ematch: oops from uninitialized variable (resend) Setting up a meta match causes a kernel OOPS because of uninitialized elements in tree. [ 37.322381] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 37.322381] IP: [<ffffffff883fc717>] :em_meta:em_meta_destroy+0x17/0x80 [ 37.322381] Call Trace: [ 37.322381] [<ffffffff803ec83d>] tcf_em_tree_destroy+0x2d/0xa0 [ 37.322381] [<ffffffff803ecc8c>] tcf_em_tree_validate+0x2dc/0x4a0 [ 37.322381] [<ffffffff803f06d2>] nla_parse+0x92/0xe0 [ 37.322381] [<ffffffff883f9672>] :cls_basic:basic_change+0x202/0x3c0 [ 37.322381] [<ffffffff802a3917>] kmem_cache_alloc+0x67/0xa0 [ 37.322381] [<ffffffff803ea221>] tc_ctl_tfilter+0x3b1/0x580 [ 37.322381] [<ffffffff803dffd0>] rtnetlink_rcv_msg+0x0/0x260 [ 37.322381] [<ffffffff803ee944>] netlink_rcv_skb+0x74/0xa0 [ 37.322381] [<ffffffff803dffc8>] rtnetlink_rcv+0x18/0x20 [ 37.322381] [<ffffffff803ee6c3>] netlink_unicast+0x263/0x290 [ 37.322381] [<ffffffff803cf276>] __alloc_skb+0x96/0x160 [ 37.322381] [<ffffffff803ef014>] netlink_sendmsg+0x274/0x340 [ 37.322381] [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140 [ 37.322381] [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30 [ 37.322381] [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30 [ 37.322381] [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140 [ 37.322381] [<ffffffff80288611>] zone_statistics+0xb1/0xc0 [ 37.322381] [<ffffffff803c7e5e>] sys_sendmsg+0x20e/0x360 [ 37.322381] [<ffffffff803c7411>] sockfd_lookup_light+0x41/0x80 [ 37.322381] [<ffffffff8028d04b>] handle_mm_fault+0x3eb/0x7f0 [ 37.322381] [<ffffffff8020c2fb>] system_call_after_swapgs+0x7b/0x80 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 03:47:19 -08:00
Stephen Hemminger	04f217aca4	[TC]: oops in em_meta If userspace passes a unknown match index into em_meta, then em_meta_change will return an error and the data for the match will not be set. This then causes an null pointer dereference when the cleanup is done in the error path via tcf_em_tree_destroy. Since the tree structure comes kzalloc, it is initialized to NULL. Discovered when testing a new version of tc command against an accidental older kernel. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-07 18:13:00 -08:00
Patrick McHardy	9ec138101f	[NET_SCHED]: cls_flow: support classification based on VLAN tag Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:21:04 -08:00
Patrick McHardy	4f25049106	[NET_SCHED]: cls_flow: fix key mask validity check Since we're using fls(), we need to check whether the value is non-zero first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:19:59 -08:00
Patrick McHardy	0ea9d70df8	[NET_SCHED]: em_meta: fix compile warning net/sched/em_meta.c: In function 'meta_int_vlan_tag': net/sched/em_meta.c:179: warning: 'tag' may be used uninitialized in this function Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:19:33 -08:00
Stephen Hemminger	3113e88c3c	[PKT_SCHED]: vlan tag match Provide a way to use tc filters on vlan tag even if tag is buried in skb due to hardware acceleration. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:20:13 -08:00
Rami Rosen	0aead54347	[NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build) The 2.6 latest git build was broken when using the following configuration options: CONFIG_NET_EMATCH=n CONFIG_NET_CLS_FLOW=y with the following error: net/sched/cls_flow.c: In function 'flow_dump': net/sched/cls_flow.c:598: error: 'struct tcf_ematch_tree' has no member named 'hdr' make[2]: * [net/sched/cls_flow.o] Error 1 make[1]: * [net/sched] Error 2 make: *** [net] Error 2 see the recent post by Li Zefan: http://www.spinics.net/lists/netdev/msg54434.html The reason for this crash is that struct tcf_ematch_tree (net/pkt_cls.h) is empty when CONFIG_NET_EMATCH is not defined. When CONFIG_NET_EMATCH is defined, the tcf_ematch_tree structure indeed holds a struct tcf_ematch_tree_hdr (hdr) as flow_dump() expects. This patch adds #ifdef CONFIG_NET_EMATCH in flow_dump to avoid this. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 02:56:48 -08:00
Patrick McHardy	e5dfb81518	[NET_SCHED]: Add flow classifier Add new "flow" classifier, which is meant to extend the SFQ hashing capabilities without hard-coding new hash functions and also allows deterministic mappings of keys to classes, replacing some out of tree iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps IPs to marks, with fw filters to classes), ... Some examples: - Classic SFQ hash: tc filter add ... flow hash \ keys src,dst,proto,proto-src,proto-dst divisor 1024 - Classic SFQ hash, but using information from conntrack to work properly in combination with NAT: tc filter add ... flow hash \ keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024 - Map destination IPs of 192.168.0.0/24 to classids 1-257: tc filter add ... flow map \ key dst addend -192.168.0.0 divisor 256 - alternatively: tc filter add ... flow map \ key dst and 0xff - similar, but reverse ordered: tc filter add ... flow map \ key dst and 0xff xor 0xff Perturbation is currently not supported because we can't reliable kill the timer on destruction. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:36 -08:00
Patrick McHardy	94de78d195	[NET_SCHED]: sch_sfq: make internal queues visible as classes Add support for dumping statistics and make internal queues visible as classes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:35 -08:00
Patrick McHardy	7d2681a6ff	[NET_SCHED]: sch_sfq: add support for external classifiers Add support for external classifiers to allow using different flow hash functions similar to ESFQ. When no classifier is attached the built-in hash is used as before. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:34 -08:00
Patrick McHardy	5239008b0d	[NET_SCHED]: Constify struct tcf_ext_map Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:34 -08:00
Roel Kluin	cc8fd14dca	[PKT_SCHED] sch_teql.c: Duplicate IFF_BROADCAST in FMASK, remove 2nd. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:29 -08:00
Patrick McHardy	72eb7bd269	[NET_SCHED]: sch_ingress: remove netfilter support Since the old policer code is gone, TC actions are needed for policing. The ingress qdisc can get packets directly from netif_receive_skb() in case TC actions are enabled or through netfilter otherwise, but since without TC actions there is no policer the only thing it actually does is count packets. Remove the netfilter support and always require TC actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:25 -08:00
Patrick McHardy	7a9c1bd409	[NET_SCHED]: Use nla_policy for attribute validation in ematches Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:24 -08:00
Patrick McHardy	53b2bf3f8a	[NET_SCHED]: Use nla_policy for attribute validation in actions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:23 -08:00
Patrick McHardy	6fa8c0144b	[NET_SCHED]: Use nla_policy for attribute validation in classifiers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:23 -08:00
Patrick McHardy	27a3421e48	[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:22 -08:00
Patrick McHardy	5feb5e1aaa	[NET_SCHED]: sch_api: introduce constant for rate table size Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:21 -08:00
Patrick McHardy	1587bac49f	[NET_SCHED]: Use typeful attribute parsing helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:21 -08:00
Patrick McHardy	24beeab539	[NET_SCHED]: Use typeful attribute construction helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:20 -08:00
Patrick McHardy	57e1c487a4	[NET_SCHED]: Use NLA_PUT_STRING for string dumping Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:19 -08:00
Patrick McHardy	4b3550ef53	[NET_SCHED]: Use nla_nest_start/nla_nest_end Use nla_nest_start/nla_nest_end for dumping nested attributes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:18 -08:00
Patrick McHardy	cee63723b3	[NET_SCHED]: Propagate nla_parse return value nla_parse() returns more detailed errno codes, propagate them back on error. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:18 -08:00
Patrick McHardy	ab27cfb85c	[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:17 -08:00
Patrick McHardy	c96c9471dd	[NET_SCHED]: act_api: use nlmsg_parse Convert open-coded nlmsg_parse to use the real function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:16 -08:00
Patrick McHardy	6d834e04e5	[NET_SCHED]: act_api: fix netlink API conversion bug Fix two invalid attribute accesses, indices start at 1 with the new netlink API. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:15 -08:00
Patrick McHardy	b03f467200	[NET_SCHED]: sch_netem: use nla_parse_nested_compat Replace open coded equivalent of nla_parse_nested_compat(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:15 -08:00
Patrick McHardy	f5e5cb7553	[NET_SCHED]: sch_atm: fix format string warning Fix format string warning introduces by the netlink API conversion: net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:14 -08:00
Patrick McHardy	7ba699c604	[NET_SCHED]: Convert actions from rtnetlink to new netlink API Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:11 -08:00
Patrick McHardy	add93b610a	[NET_SCHED]: Convert classifiers from rtnetlink to new netlink API Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:11 -08:00
Patrick McHardy	1e90474c37	[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API Convert packet schedulers to use the netlink API. Unfortunately a gradual conversion is not possible without breaking compilation in the middle or adding lots of casts, so this patch converts them all in one step. The patch has been mostly generated automatically with some minor edits to at least allow seperate conversion of classifiers and actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:10 -08:00
Patrick McHardy	2eb9d75c72	[NET_SCHED]: mark classifier ops __read_mostly Additionally remove unnecessary NULL initilizations of the next pointer. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:08 -08:00
Patrick McHardy	62e3ba1b55	[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:07 -08:00
Stephen Hemminger	aa767bfea4	[PKT_SCHED] net classifier: style cleanup's Classifier code cleanup. Get rid of printk wrapper, and fix whitespace and other style stuff reported by checkpatch Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:42 -08:00
Stephen Hemminger	786a90366f	[PKT_SCHED] sch_atm: style cleanup ATM scheduler clean house: * get rid of printk and qdisc_priv() wrapper * split some assignment in if() statements * whitespace and line breaks. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:41 -08:00
Stephen Hemminger	9d127fbdd2	[PKT_SCHED] dsmark: checkpatch warning cleanup Get rid of all style things checkpatch warns about, indentation and whitespace. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:40 -08:00
Stephen Hemminger	4c30719f4f	[PKT_SCHED] dsmark: handle cloned and non-linear skb's Make dsmark work properly with non-linear and cloned skb's Before modifying the header, it needs to check that skb header is writeable. Note: this makes the assumption, that if it queues a good skb then a good skb will come out of the embedded qdisc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:40 -08:00
David S. Miller	5b0ac72bc5	[PKT_SCHED] dsmark: Use hweight32() instead of convoluted loop. Based upon a patch by Stephen Hemminger and suggestions from Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:39 -08:00
Stephen Hemminger	81da99ed71	[PKT_SCHED] dsmark: get rid of wrappers Remove extraneous macro wrappers for printk and qdisc_priv. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:38 -08:00
Patrick McHardy	13a0a096e5	[NET_SCHED]: kill obsolete NET_CLS_POLICE option The code is already gone for about half a year, the config option has been kept around to select the replacement options for easier upgrades. This seems long enough, people upgrading from older kernels will have to reconfigure a lot anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:37 -08:00
Patrick McHardy	891687649a	[NET_SCHED]: sch_ingress: remove useless printk The printk about ingress qdisc registration error can't be triggered under normal circumstances. Since register_qdisc only fails for two identical registrations, the only way to trigger it is by loading the sch_ingress modules multiple times under different names, in which case we already return -EEXIST to userspace. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:22 -08:00
Patrick McHardy	1389356735	[NET_SCHED]: sch_ingress: avoid a few #ifdefs Move the repeating "ifndef CONFIG_NET_CLS_ACT/ifdef CONFIG_NETFILTER" ifdefs into a single condition. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:22 -08:00
Patrick McHardy	645a1e39e4	[NET_SCHED]: sch_ingress: move dependencies to Kconfig Instead of complaining at scheduler initialization time, check the dependencies in Kconfig. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:21 -08:00
Patrick McHardy	c6ee877f2e	[NET_SCHED]: sch_ingress: remove unnecessary ops - ->reset is optional - sch_api provides identical defaults for ->dequeue/->requeue - ->drop can't happen since ingress never has a parent qdisc Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:20 -08:00
Patrick McHardy	e037834758	[NET_SCHED]: sch_ingress: return proper error code in ingress_graft() Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:20 -08:00
Patrick McHardy	c21d4d5dd2	[NET_SCHED]: sch_ingress: remove unused inner qdisc Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:19 -08:00
Patrick McHardy	cb53c04891	[NET_SCHED]: sch_ingress: remove qdisc_priv() wrapper Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:18 -08:00
Patrick McHardy	a47812211b	[NET_SCHED]: sch_ingress: remove excessive debugging Remove excessive debugging statements and some "future use" stuff. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:18 -08:00
Patrick McHardy	58f4df423e	[NET_SCHED]: sch_ingress: formatting fixes Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:17 -08:00
Stephen Hemminger	6f9e98f7a9	[PKT_SCHED] SFQ: whitespace cleanup Add whitespace around operators, and add a few blank lines to improve readability. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:16 -08:00
Stephen Hemminger	d46f8dd87d	[PKT_SCHED] SFQ: use net_random SFQ doesn't need true random numbers, it is only using them to salt a hash. Therefore it is better to use net_random() and avoid any possible problems with depleting the entropy pool. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:15 -08:00
Stephen Hemminger	d3e994830d	[PKT_SCHED] SFQ: timer is deferrable The perturbation timer used for re-keying can be deferred, it doesn't need to be deterministic. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:15 -08:00
Ilpo Järvinen	d88c305a03	[PKT_SCHED] HTB: htb_classid is dead static inline Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:59 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Patrick McHardy	1999414a4e	[NETFILTER]: Mark hooks __read_mostly Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:07 -08:00
Patrick McHardy	41c5b31703	[NETFILTER]: Use nf_register_hooks for multiple registrations Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:06 -08:00
Patrick McHardy	be0ea7d5da	[NETFILTER]: Convert old checksum helper names Kill the defines again, convert to the new checksum helper names and remove the dependency of NET_ACT_NAT on NETFILTER. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:15 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnletlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
Eric Dumazet	20fea08b5f	[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Joe Perches	9a94b35184	[PKT_SCHED]: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:02:40 -08:00
Peter P Waskiewicz Jr	5f1a485d59	[PKT_SCHED]: Check subqueue status before calling hard_start_xmit The only qdiscs that check subqueue state before dequeue'ing are PRIO and RR. The other qdiscs, including the default pfifo_fast qdisc, will allow traffic bound for subqueue 0 through to hard_start_xmit. The check for netif_queue_stopped() is done above in pkt_sched.h, so it is unnecessary for qdisc_restart(). However, if the underlying driver is multiqueue capable, and only sets queue states on subqueues, this will allow packets to enter the driver when it's currently unable to process packets, resulting in expensive requeues and driver entries. This patch re-adds the check for the subqueue status before calling hard_start_xmit, so we can try and avoid the driver entry when the queues are stopped. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 20:40:55 -08:00
Radu Rendec	b226801676	[PKT_SCHED] CLS_U32: Use ffs() instead of C code on hash mask to get first set bit. Computing the rank of the first set bit in the hash mask (for using later in u32_hash_fold()) was done with plain C code. Using ffs() instead makes the code more readable and improves performance (since ffs() is better optimized in assembler). Using the conditional operator on hash mask before applying ntohl() also saves one ntohl() call if mask is 0. Signed-off-by: Radu Rendec <radu.rendec@ines.ro> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:54:50 -08:00
Radu Rendec	543821c6f5	[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. While trying to implement u32 hashes in my shaping machine I ran into a possible bug in the u32 hash/bucket computing algorithm (net/sched/cls_u32.c). The problem occurs only with hash masks that extend over the octet boundary, on little endian machines (where htonl() actually does something). Let's say that I would like to use 0x3fc0 as the hash mask. This means 8 contiguous "1" bits starting at b6. With such a mask, the expected (and logical) behavior is to hash any address in, for instance, 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in bucket 1, then 192.168.0.128/26 in bucket 2 and so on. This is exactly what would happen on a big endian machine, but on little endian machines, what would actually happen with current implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace tool and then applied to 192.168.x.x in the u32 classifier. When shifting right by 16 bits (rank of first "1" bit in the reversed mask) and applying the divisor mask (0xff for divisor 256), what would actually remain is 0x3f applied on the "168" octet of the address. One could say is this can be easily worked around by taking endianness into account in userspace and supplying an appropriate mask (0xfc03) that would be turned into contiguous "1" bits when reversed (0x03fc0000). But the actual problem is the network address (inside the packet) not being converted to host order, but used as a host-order value when computing the bucket. Let's say the network address is written as n31 n30 ... n0, with n0 being the least significant bit. When used directly (without any conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8 would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be consecutive. The fix is to apply ntohl() on the hmask before computing fshift, and in u32_hash_fold() convert the packet data to host order before shifting down by fshift. With helpful feedback from Jamal Hadi Salim and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:45 -08:00
Evgeniy Polyakov	4f9f8311a0	[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline tecl_reset() is called from deactivate and qdisc is set to noop already, but subsequent teql_xmit does not know about it and dereference private data as teql qdisc and thus oopses. not catch it first :) Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:17 -08:00
Jamal Hadi Salim	a057ae3c10	[NET_CLS_ACT]: Use skb_act_clone clean skb_clone of any signs of CONFIG_NET_CLS_ACT and have mirred us skb_act_clone() Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 02:47:54 -07:00
Pavel Emelyanov	0034622693	[PKT_SCHED]: Fix sch_prio.c build with CONFIG_NETDEVICES_MULTIQUEUE Fix one more user of netiff_subqueue_stopped. To check for the queue id one must use the __netiff_subqueue_stoped call. This run out of my sight when I made the: `668f895a85` [NET]: Hide the queue_mapping field inside netif_subqueue_stopped commit :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:53 -07:00
Pavel Emelyanov	668f895a85	[NET]: Hide the queue_mapping field inside netif_subqueue_stopped Many places get the queue_mapping field from skb to pass it to the netif_subqueue_stopped() which will be 0 in any case. Make the helper that works with sk_buff Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Pavel Emelyanov	4e3ab47a54	[NET]: Make and use skb_get_queue_mapping Make the helper for getting the field, symmetrical to the "set" one. Return 0 if CONFIG_NETDEVICES_MULTIQUEUE=n Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Robert P. J. Day	3a4fa0a25d	Fix misspellings of "system", "controller", "interrupt" and "necessary". Fix the various misspellings of "system", controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:10:43 +02:00
Herbert Xu	ce0e32e65f	[NET]: Fix possible dev_deactivate race condition The function dev_deactivate is supposed to only return when all outstanding transmissions have completed. Unfortunately it is possible for store operations in the driver's transmit function to only become visible after dev_deactivate returns. This patch fixes this by taking the queue lock after we see the end of the queue run. This ensures that all effects of any previous transmit calls are visible. If however we detect that there is another queue run occuring, then we'll warn about it because this should never happen as we have pointed dev->qdisc to noop_qdisc within the same queue lock earlier in the functino. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 22:37:58 -07:00
Randy Dunlap	85ef3e5cad	[NET]: QoS/Sched as menuconfig Convert "QoS and/or fair queueing" to menuconfig. This makes it easy for someone to disable all sub-options with one config symbol. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 21:56:38 -07:00
Jeff Garzik	bfaae0f04c	[NET]: fix carrier-on bug? While looking at a net driver with the following construct, if (!netif_carrier_ok(dev)) netif_carrier_on(dev); it stuck me that the netif_carrier_ok() check was redundant, since netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway. This is the same reason why netif_queue_stopped() need not be called prior to netif_wake_queue(). This is true, but there is however an unwanted side effect from assuming that netif_carrier_on() can be called multiple times: it touches the watchdog, regardless of pre-existing carrier state. The fix: move watchdog-up inside the bit-cleared code path. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 23:26:43 -07:00
Herbert Xu	3db05fea51	[NETFILTER]: Replace sk_buff ** with sk_buff * With all the users of the double pointers removed, this patch mops up by finally replacing all occurances of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:29 -07:00
Patrick McHardy	3c0cfc1358	[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched The fourth parameter of /proc/net/psched is supposed to show the timer resultion and is used by HTB userspace to calculate the necessary burst rate. Currently we show the clock resolution, which results in a too low burst rate when the two differ. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:59 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Herbert Xu	b421995235	[PKT_SCHED]: Add stateless NAT Stateless NAT is useful in controlled environments where restrictions are placed on through traffic such that we don't need connection tracking to correctly NAT protocol-specific data. In particular, this is of interest when the number of flows or the number of addresses being NATed is large, or if connection tracking information has to be replicated and where it is not practical to do so. Previously we had stateless NAT functionality which was integrated into the IPv4 routing subsystem. This was a great solution as long as the NAT worked on a subnet to subnet basis such that the number of NAT rules was relatively small. The reason is that for SNAT the routing based system had to perform a linear scan through the rules. If the number of rules is large then major renovations would have take place in the routing subsystem to make this practical. For the time being, the least intrusive way of achieving this is to use the u32 classifier written by Alexey Kuznetsov along with the actions infrastructure implemented by Jamal Hadi Salim. The following patch is an attempt at this problem by creating a new nat action that can be invoked from u32 hash tables which would allow large number of stateless NAT rules that can be used/updated in constant time. The actual NAT code is mostly based on the previous stateless NAT code written by Alexey. In future we might be able to utilise the protocol NAT code from netfilter to improve support for other protocols. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:11 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Stephen Hemminger	0c4e85813d	[NET]: Wrap netdevice hardware header creation. Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:50 -07:00
Jamal Hadi Salim	8236632fb3	[NET_SCHED]: explict hold dev tx lock For N cpus, with full throttle traffic on all N CPUs, funneling traffic to the same ethernet device, the devices queue lock is contended by all N CPUs constantly. The TX lock is only contended by a max of 2 CPUS. In the current mode of operation, after all the work of entering the dequeue region, we may endup aborting the path if we are unable to get the tx lock and go back to contend for the queue lock. As N goes up, this gets worse. The changes in this patch result in a small increase in performance with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3 showed similar behavior; Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:15 -07:00
Ralf Baechle	10d024c1b2	[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:13 -07:00
Jesper Dangaard Brouer	e9bef55d3d	[NET_SCHED]: Cleanup L2T macros and handle oversized packets Change L2T (length to time) macros, in all rate based schedulers, to call a common function qdisc_l2t() that does the rate table lookup. This function handles if the packet size lookup is larger than the rate table, which often occurs with TSO enabled. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:20 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device dev, int budget) to int foo_poll(struct napi_struct napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Stephen Hemminger	bf1b803b01	[PKT_SCHED] cls_u32: error code isn't been propogated properly Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-07 23:57:45 -07:00
Alexey Kuznetsov	32740ddc10	[SFQ]: Remove artificial limitation for queue limit. This is followup to Patrick's patch. A little optimization to enqueue routine allows to remove artificial limitation on queue length. Plus, testing showed that hash function used by SFQ is too bad or even worse. It does not even sweep the whole range of hash values. Switched to Jenkins' hash. Signed-off-by: Alexey Kuznetsov <kaber@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-01 21:01:23 -07:00
Alexey Kuznetsov	5588b40d7c	[PKT_SCHED]: Fix 'SFQ qdisc crashes with limit of 2 packets' Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-20 12:14:08 -07:00
Satyam Sharma	ddeee3ce7f	[PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning net/sched/sch_cbq.c: In function 'cbq_enqueue': net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this function has been verified to be a bogus case. So let's shut it up. Signed-off-by: Satyam Sharma <satyam@infradead.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-16 14:54:05 -07:00
Jamal Hadi Salim	e1e992e52f	[NET_SCHED] protect action config/dump from irqs (with no apologies to C Heston) On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote: On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote: > > > > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep > > was quite noisy when I tried to shape my external (wireless) interface: > > > > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock: > > [ 6400.534713] (&dev->ingress_lock){-+..}, at: [<c038d595>] > > netif_receive_skb+0x2d5/0x3c0 > > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the > > past: > > [ 6400.535145] (police_lock){-.--} > > This is a genuine dead-lock. The police lock can be taken > for reading with softirqs on. If a second CPU tries to take > the police lock for writing, while holding the ingress lock, > then a softirq on the first CPU can dead-lock when it tries > to get the ingress lock. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-14 16:43:05 -07:00
Lucas Nussbaum	dbaaa07a60	[NET_SCHED] sch_prio.c: remove duplicate call of tc_classify() When CONFIG_NET_CLS_ACT is enabled, tc_classify() is called twice in prio_classify(). This causes "interesting" behaviour: with the setup below, packets are duplicated, sent twice to ifb0, and then loop in and out of ifb0. The patch uses the previously calculated return value in the switch, which is probably what Patrick had in mind in commit `bdba91ec70` -- maybe Patrick can double-check this? -- example setup -- ifconfig ifb0 up tc qdisc add dev ifb0 root netem delay 2s tc qdisc add dev $ETH root handle 1: prio tc filter add dev $ETH parent 1: protocol ip prio 10 u32 \ match ip dst 172.24.110.6/32 flowid 1:1 \ action mirred egress redirect dev ifb0 ping -c1 172.24.110.6 Signed-off-by: Lucas Nussbaum <lucas.nussbaum@imag.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-30 22:35:46 -07:00
Jesper Juhl	0a26f4cdc2	[PKT_SCHED]: Clean up duplicate includes in net/sched/ This patch cleans up duplicate includes in net/sched/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-13 22:52:04 -07:00
Peter P Waskiewicz Jr	0773192b0f	[NET]: Fix prio_tune() handling of root qdisc. Fix the check in prio_tune() to see if sch->parent is TC_H_ROOT instead of sch->handle to load or reject the qdisc for multiqueue devices. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:20 -07:00
Patrick McHardy	ffc8fefaf2	[NET]: Fix sch_api to properly set sch->parent on the root. Fix sch_api to correctly set sch->parent for both ingress and egress qdiscs in qdisc_create(). Signed-off-by: Patrick McHardy <trash@kaber.net> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:19 -07:00
Patrick McHardy	bdba91ec70	[NET_SCHED]: Fix prio/ingress classification logic error Fix handling of empty or completely non-matching filter chains. In that case -1 is returned and tcf_result is uninitialized, the qdisc should fall back to default classification in that case. Noticed by PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:18 -07:00
Gabriel Craciunescu	99acaeb92f	[PKT_SCHED]: Some typo fixes in net/sched/Kconfig Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 02:00:04 -07:00
vignesh babu	782f795689	[ATM]: Replacing kmalloc/memset combination with kzalloc. Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: chas williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:46:51 -07:00
Patrick McHardy	c3bc7cff8f	[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE, remove the old code. The config option will be kept around to select the equivalent NET_CLS_ACT options for a short time to allow easier upgrades. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:03:05 -07:00
Patrick McHardy	73ca4918fb	[NET_SCHED]: act_api: qdisc internal reclassify support The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return it to the qdisc, which could handle it internally or ignore it. With NET_CLS_ACT however, tc_classify starts over at the first classifier and never returns it to the qdisc. This makes it impossible to support qdisc-internal reclassification, which in turn makes it impossible to remove the old NET_CLS_POLICE code without breaking compatibility since we have two qdiscs (CBQ and ATM) that support this. This patch adds a tc_classify_compat function that handles reclassification the old way and changes CBQ and ATM to use it. This again is of course not fully backwards compatible with the previous NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain compatibility and support qdisc internal reclassification with NET_CLS_ACT, but this seems like the better choice over keeping the two incompatible options around forever. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:02:31 -07:00
Patrick McHardy	f6853e2df3	[NET_SCHED]: sch_dsmark: act_api support Handle act_api classification results. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:02:10 -07:00
Patrick McHardy	9210080445	[NET_SCHED]: sch_atm: act_api support Handle act_api classification results. The ATM scheduler behaves slightly different than other schedulers in that it only handles policer results for successful classifications, this behaviour is retained for the act_api case. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:01:49 -07:00
Patrick McHardy	b0188d4dbe	[NET_SCHED]: sch_atm: Lindent Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-15 00:01:25 -07:00
Patrick McHardy	0621ed2e4e	[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run from the timer handler can result in deadlock: > CPU#0 > > qdisc_watchdog() fires and gets dev->queue_lock > qdisc_run()...qdisc_restart()... > -> releases dev->queue_lock and enters dev_hard_start_xmit() > > CPU#1 > > tc del qdisc dev ... > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... > -> grabs dev->queue_lock ... > > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still > holding dev->queue_lock > > CPU#0 > > dev_hard_start_xmit() returns ... > -> wants to get dev->queue_lock(!) > > DEADLOCK! The entire optimization is a bit questionable IMO, it moves potentially large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ, which kind of defeats the separation of them. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Ranko Zivojnovic <ranko@spidernet.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-14 20:49:26 -07:00
Patrick McHardy	db3d99c090	[NET_SCHED]: ematch: module autoloading Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-11 19:46:26 -07:00
Ranjit Manomohan	c9726d6890	[NET_SCHED]: Make HTB scheduler work with TSO. Currently the HTB scheduler does not correctly account for TSO packets which causes large inaccuracies in the bandwidth control when using TSO. This patch allows the HTB scheduler to work with TSO enabled devices. Signed-off-by: Ranjit Manomohan <ranjitm@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:43:16 -07:00
Patrick McHardy	0ba4805383	[NET_SCHED]: Remove unnecessary includes Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:41 -07:00
Patrick McHardy	ee39e10c27	[NET_SCHED]: sch_htb: use generic estimator Use the generic estimator instead of reimplementing (parts of) it. For compatibility always create a default estimator for new classes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:39 -07:00
Patrick McHardy	4bdf39911e	[NET_SCHED]: Remove unnecessary stats_lock pointers Remove stats_lock pointers from qdisc-internal structures, in all cases it points to dev->queue_lock. The only case where it is necessary is for top-level qdiscs, where it might also point to dev->ingress_lock in case of the ingress qdisc. Also remove it from actions completely, it always points to the actions internal lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:38 -07:00
Patrick McHardy	876d48aabf	[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option The generic estimator is always built in anways and all the config options does is prevent including a minimal amount of code for setting it up. Additionally the option is already automatically selected for most cases. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:37 -07:00
Peter P Waskiewicz Jr	d62733c8e4	[SCHED]: Qdisc changes and sch_rr added for multiqueue Add the new sch_rr qdisc for multiqueue network device support. Allow sch_prio and sch_rr to be compiled with or without multiqueue hardware support. sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS. This was done since sch_prio and sch_rr only differ in their dequeue routine. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:22 -07:00
Peter P Waskiewicz Jr	f25f4e4480	[CORE] Stack changes to add multiqueue hardware support API Add the multiqueue hardware device support API to the core network stack. Allow drivers to allocate multiple queues and manage them at the netdev level if they choose to do so. Added a new field to sk_buff, namely queue_mapping, for drivers to know which tx_ring to select based on OS classification of the flow. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:21 -07:00
Krishna Kumar	e50c41b53d	[NET]: qdisc_restart - couple of optimizations. Changes : - netif_queue_stopped need not be called inside qdisc_restart as it has been called already in qdisc_run() before the first skb is sent, and in __qdisc_run() after each intermediate skb is sent (note : we are the only sender, so the queue cannot get stopped while the tx lock was got in the ~LLTX case). - BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1 meant more packets are available, and __qdisc_run used to loop when qdisc_restart() returned -1. During those days, it was necessary to make sure that qlen is never less than zero, since __qdisc_run would get into an infinite loop if no packets are on the queue and this bug in qdisc was there (and worse - no more skbs could ever get queue'd as we hold the queue lock too). With Herbert's recent change to return values, this check is not required. Hopefully Herbert can validate this change. If at all this is required, it should be added to skb_dequeue (in failure case), and not to qdisc_qlen. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:36 -07:00
Krishna Kumar	6c1361a6f2	[NET]: qdisc_restart - readability changes plus one bug fix. New changes : - Incorporated Peter Waskiewicz's comments. - Re-added back one warning message (on driver returning wrong value). Previous changes : - Converted to use switch/case code which looks neater. - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless check should be removed, since driver will return NETDEV_TX_LOCKED only if lockless is true and driver has to do the locking. In the original code as well as the latest code, this code can result in a bug where if LLTX is not set for a driver (lockless == 0) but the driver is written wrongly to do a trylock (despite LLTX being set), the driver returns LOCKED. But since lockless is zero, the packet is requeue'd instead of calling collision code which will issue warning and free up the skb. Instead this skb will be retried with this driver next time, and the same result will ensue. Removing this check will catch these driver bugs instead of hiding the problem. I am keeping this change to readability section since : a. it is confusing to check two things as it is; and b. it is difficult to keep this check in the changed 'switch' code. - Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is the work being done and easier to understand) and do_dev_requeue to dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to dev_handle_collision (handle_dev_cpu_collision is a small routine with only one caller, so there is no need to have two separate routines which also results in getting rid of two macros, etc. - Removed an XXX comment as it should never fail (I suspect this was related to batch skb WIP, Jamal ?). Converted some functions to original coding style of having the return values and the function name on same line, eg prio2list. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:35 -07:00
Jamal Hadi Salim	c716a81ab9	[NET_SCHED]: Cleanup readability of qdisc restart Over the years this code has gotten hairier. Resulting in many long discussions over long summer days and patches that get it wrong. This patch helps tame that code so normal people will understand it. Thanks to Thomas Graf, Peter J. waskiewicz Jr, and Patrick McHardy for their valuable reviews. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:06:16 -07:00
Patrick McHardy	b00b4bf94e	[NET_SCHED]: Fix filter double free cbq and atm destroy their filters twice when destroying inner classes during qdisc destruction. Reported-and-tested-by: Strobl Anton <a.strobl@aws-it.at> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:41:05 -07:00
Bill Nottingham	75202e7689	[NET]: Fix comparisons of unsigned < 0. Recent gcc versions emit warnings when unsigned variables are compared < 0 or >= 0. Signed-off-by: Bill Nottingham <notting@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:47 -07:00
Venkatesh Pallipadi	60468d5b5b	[NET]: Make net watchdog timers 1 sec jiffy aligned. round_jiffies for net dev watchdog timer. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:46 -07:00
Patrick McHardy	2e4b3b0e87	[NET_SCHED]: sch_htb: fix event cache time calculation The event cache time must be an absolute value, when no event exists it is incorrectly set to 1s instead of 1s in the future. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:56 -07:00
Herbert Xu	36247f5421	[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 16:36:43 -07:00
Jamal Hadi Salim	3e5c2d3bdb	[NET_SCHED]: prio qdisc boundary condition This fixes an out-of-boundary condition when the classified band equals q->bands. Caught by Alexey Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-14 02:57:19 -07:00
Herbert Xu	41a23b0788	[NET_SCHED]: Avoid requeue warning on dev_deactivate When we relinquish queue_lock in qdisc_restart and then retake it for requeueing, we might race against dev_deactivate and end up requeueing onto noop_qdisc. This causes a warning to be printed. This patch fixes this by checking this before we requeue. As an added bonus, we can remove the same check in __qdisc_run which was added to prevent dev->gso_skb from being requeued when we're shutting down. Even though we've had to add a new conditional in its place, it's better because it only happens on requeues rather than every single time that qdisc_run is called. For this to work we also need to move the clearing of gso_skb up in dev_deactivate as now qdisc_restart can occur even after we wait for __LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long as the queue and gso_skb is already clear). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:42 -07:00
Herbert Xu	cce1fa36a8	[NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OK Now that we return the queue length after NETDEV_TX_OK we better make sure that we have the right queue. Otherwise we can cause a stall after a really quick dev_deactive/dev_activate. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:41 -07:00
Herbert Xu	d90df3ad07	[NET_SCHED]: Rationalise return value of qdisc_restart The current return value scheme and associated comment was invented back in the 20th century when we still had that tbusy flag. Things have changed quite a bit since then (even Tony Blair is moving on now, not to mention the new French president). All we need to indicate now is whether the caller should continue processing the queue. Therefore it's sufficient if we return 0 if we want to stop and non-zero otherwise. This is based on a patch by Krishna Kumar. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:40 -07:00
Thomas Graf	5830725f8a	[NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED case When transmit fails with NETDEV_TX_LOCKED the skb is requeued to dev->qdisc again. The dev->qdisc pointer is protected by the queue lock which needs to be dropped when attempting to transmit and acquired again before requeing. The problem is that qdisc_restart() fetches the dev->qdisc pointer once and stores it in the `q' variable which is invalidated when dropping the queue_lock, therefore the variable needs to be refreshed before requeueing. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:39 -07:00
Krishna Kumar	4cd8c9e87b	[NET_SCHED]: teql_enqueue can check limits before skb enqueue Optimize teql_enqueue so that it first checks limits before enqueing. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:45:10 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Stephen Hemminger	3ff50b7997	[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:24 -07:00
Patrick McHardy	fd44de7cc1	[NET_SCHED]: ingress: switch back to using ingress_lock Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress needs to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:08 -07:00
Patrick McHardy	0463d4ae25	[NET_SCHED]: Eliminate qdisc_tree_lock Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:07 -07:00
Patrick McHardy	c95e939508	[NET_SCHED]: qdisc: remove unnecessary memory barriers We're holding dev->queue_lock in qdisc_watchdog_schedule and qdisc_watchdog_cancel, no need for the barriers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:58 -07:00
Patrick McHardy	a48b5a6144	[NET_SCHED]: Unline tcf_destroy Uninline tcf_destroy and add a helper function to destroy an entire filter chain. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:56 -07:00
Patrick McHardy	3bebcda280	[NET_SCHED]: turn PSCHED_GET_TIME into inline function Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:55 -07:00
Patrick McHardy	03cc45c0a5	[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function Also rename to psched_tdiff_bounded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:54 -07:00
Patrick McHardy	8edc0c31d6	[NET_SCHED]: kill PSCHED_TDIFF Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:53 -07:00
Patrick McHardy	a084980dcb	[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT Use direct assignment and comparison instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:51 -07:00
Patrick McHardy	104e087898	[NET_SCHED]: kill PSCHED_TLESS Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:50 -07:00
Patrick McHardy	7c59e25f31	[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:49 -07:00
Patrick McHardy	26e252df1e	[NET_SCHED]: kill PSCHED_AUDIT_TDIFF Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:48 -07:00
Patrick McHardy	76d643cd3b	[NET_SCHED]: sch_netem: fix off-by-one in send time comparison netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is allowed to send a packet, which is equivalent to cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle cb->time_to_send == now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:47 -07:00
Stephen Hemminger	bb2f8cc0ec	[NETEM]: spelling errors Get rid of some of my creative spelling. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:33 -07:00
Stephen Hemminger	1936502d00	[NET_SCHED] qdisc: avoid transmit softirq on watchdog wakeup If possible, avoid having to do a transmit softirq when a qdisc watchdog decides to re-enable. The watchdog routine runs off a timer, so it is already in the same effective context as the softirq. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:23 -07:00
Stephen Hemminger	11274e5a43	[NETEM]: avoid excessive requeues The netem code would call getnstimeofday() and dequeue/requeue after every packet, even if it was waiting. Avoid this overhead by using the throttled flag. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:22 -07:00
Stephen Hemminger	075aa573b7	[NETEM]: Optimize tfifo In most cases, the next packet will be sent after the last one. So optimize that case. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:21 -07:00
Stephen Hemminger	b407621c35	[NETEM]: use better types for time values The random number generator always generates 32 bit values. The time values are limited by psched_tdiff_t Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:20 -07:00
Stephen Hemminger	a362e0a789	[NETEM]: report reorder percent correctly. If you setup netem to just delay packets; "tc qdisc ls" will report the reordering as 100%. Well it's a lie, reorder isn't used unless gap is set, so just set value to 0 so the output of utility is correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:20 -07:00
Thomas Graf	708914cc5e	[PKT_SCHED] act: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:11 -07:00
Thomas Graf	82623c0d73	[PKT_SCHED] cls: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:10 -07:00
Thomas Graf	be577ddc2b	[PKT_SCHED] qdisc: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:09 -07:00
Arnaldo Carvalho de Melo	dc5fc579b9	[NETLINK]: Use nlmsg_trim() where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Patrick McHardy	514bca322c	[NET_SCHED]: Fix warning net/sched/sch_api.c: In function 'psched_show': net/sched/sch_api.c:1219: warning: format '%08x' expects type 'unsigned int', but argument 6 has type 's64' Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:17 -07:00
Patrick McHardy	bb239acf56	[NET_SCHED]: sch_cbq: fix watchdog scheduled too late q->now is increased during dequeue and doesn't contain the current time afterwards, resulting in a too large timeout value for the qdisc watchdog. Use "now" instead, which still contains the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:16 -07:00
Patrick McHardy	4361cb17f0	[NET_SCHED]: Export real timer resolution in /proc/net/psched The timer resolution exported in /proc/net/psched is used by userspace to calculate HTB's burst values. Currently it is set to HZ, since we're now using hrtimers, use KTIME_MONOTONIC_RES, which makes HTB use smaller burst values. This patch also affects libnl, which incorrectly uses this value for the SFQ perturbation parameter, which is always in seconds, and some routing cache values, which are in USER_HZ, so both cases are broken anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:15 -07:00
Patrick McHardy	00c04af9df	[NET_SCHED]: kill jiffie conversion macros Now that all packet schedulers have been converted to hrtimers most users of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use it to convert external time units to packet scheduler clock ticks, so use PSCHED_TICKS_PER_SEC instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:14 -07:00
Patrick McHardy	fb983d4578	[NET_SCHED]: sch_htb: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:13 -07:00
Patrick McHardy	1a13cb63d6	[NET_SCHED]: sch_cbq: use hrtimer for delay_timer Switch delay_timer to hrtimer. The class penalty parameter is changed to use psched ticks as units. Since iproute never supported using this and the only existing user (libnl) incorrectly assumes psched ticks as units anyway, this shouldn't break anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:12 -07:00
Patrick McHardy	e9054a339e	[NET_SCHED]: sch_cbq: fix cbq_undelay_prio for non-active priorites cbq_undelay_prio is supposed to return a time delta, but returns the current time for non-active priorities, causing cbq_undelay to mark the priority as active and schedule a timer for twice the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:11 -07:00
Patrick McHardy	88a993540a	[NET_SCHED]: sch_cbq: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:09 -07:00
Patrick McHardy	59cb5c6734	[NET_SCHED]: sch_netem: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:08 -07:00
Patrick McHardy	f7f593e383	[NET_SCHED]: sch_tbf: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:07 -07:00
Patrick McHardy	ed2b229a97	[NET_SCHED]: sch_hfsc: use hrtimer based watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:06 -07:00
Patrick McHardy	4179477f63	[NET_SCHED]: Add hrtimer based qdisc watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:05 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	bbe735e424	[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:58 -07:00
YOSHIFUJI Hideaki	b6d9bcb069	[NET] SCHED: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:00 -07:00
Patrick McHardy	bb8a954f27	[NET_SCHED]: cls_tcindex: fix compatibility breakage Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed to expect and use a u16 value in 2.6.11, which broke compatibility on big endian machines. Change back to use int. Reported by Ole Reinartz <ole.reinartz@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-09 13:31:13 -07:00
Patrick McHardy	31ba548f96	[NET_SCHED]: cls_basic: fix memory leak in basic_destroy tp->root is not freed on destruction. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-02 13:30:52 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00
Patrick McHardy	c38c83cb70	[NET_SCHED]: sch_htb/sch_hfsc: fix oops in qlen_notify During both HTB and HFSC class deletion the class is removed from the class hash before calling qdisc_tree_decrease_qlen. This makes the ->get operation in qdisc_tree_decrease_qlen fail, so it passes a NULL pointer to ->qlen_notify, causing an oops. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-27 14:04:24 -07:00
Robert P. J. Day	9b2f7bcf0e	[NET]: Remove dead net/sched/Makefile entry for sch_hpfq.o. Remove the worthless net/sched/Makefile entry for the non-existent source file sch_hpfq.c. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-26 16:20:34 -07:00
Patrick McHardy	d3fa76ee6b	[NET_SCHED]: cls_basic: fix NULL pointer dereference cls_basic doesn't allocate tp->root before it is linked into the active classifier list, resulting in a NULL pointer dereference when packets hit the classifier before its ->change function is called. Reported by Chris Madden <chris@reflexsecurity.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:11 -07:00
Dave Jones	b6f99a2119	[NET]: fix up misplaced inlines. Turning up the warnings on gcc makes it emit warnings about the placement of 'inline' in function declarations. Here's everything that was under net/ Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-22 12:27:49 -07:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Patrick McHardy	3d50f23108	[NET_SCHED]: sch_hfsc: replace ASSERT macro by WARN_ON Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:36:57 -08:00
Arjan van de Ven	da7071d7e3	[PATCH] mark struct file_operations const 8 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
YOSHIFUJI Hideaki	10297b9931	[NET] SCHED: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:08 -08:00
Jan Engelhardt	e60a13e030	[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:20 -08:00
Patrick McHardy	a8d0f9526f	[NET]: Add UDPLITE support in a few missing spots Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:14 -08:00
Arjan van de Ven	f5a6e01c09	[NET]: user of the jiffies rounding code: Networking This patch introduces users of the round_jiffies() function in the networking code. These timers all were of the "about once a second" or "about once every X seconds" variety and several showed up in the "what wakes the cpu up" profiles that the tickless patches provide. Some timers are highly dynamic based on network load; but even on low activity systems they still show up so the rounding is done only in cases of low activity, allowing higher frequency timers in the high activity case. The various hardware watchdogs are an obvious case; they run every 2 seconds but aren't otherwise specific of exactly when they need to run. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:52 -08:00
Jarek Poplawski	2cf6c36cb4	[NET_SCHED] sch_prio: class statistics printing enabled This patch adds a dump_stats callback to enable printing of basic statistics of prio classes. (With help of Patrick McHardy). Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:40 -08:00
Patrick McHardy	239a87c876	[NET_SCHED]: act_ipt: fix regression in ipt action The x_tables patch broke target module autoloading in the ipt action by replacing the ipt_find_target call (which does autoloading) by xt_find_target (which doesn't do autoloading). Additionally xt_find_target may return ERR_PTR values in case of an error, which are not handled. Use xt_request_find_target, which does both autoloading and ERR_PTR handling properly. Also don't forget to drop the target module reference again when xt_check_target fails. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-02 00:40:36 -08:00
Jarek Poplawski	160d5e10f8	[NET_SCHED] sch_htb: turn intermediate classes into leaves - turn intermediate classes into leaves again when their last child is deleted (struct htb_class changed) Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-08 17:19:32 -08:00
Jarek Poplawski	a37ef2e325	[NET_SCHED] sch_cbq: deactivating when grafting, purging etc. - deactivating of active classes when q.qlen drops to zero (cbq_drop) - a redundant instruction removed from cbq_deactivate_class PS: probably htb_deactivate in htb_delete and cbq_deactivate_class in cbq_delete are also redundant now. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-08 17:19:31 -08:00
Patrick McHardy	5c804bfdcc	[NET_SCHED]: cls_fw: fix NULL pointer dereference When the first fw classifier is initialized, there is a small window between the ->init() and ->change() calls, during which the classifier is active but not entirely set up and tp->root is still NULL (->init() does nothing). When a packet is queued during this window a NULL pointer dereference occurs in fw_classify() when trying to dereference head->mask; Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:07 -08:00
Kim Nordlund	a163148c1b	[PKT_SCHED] act_gact: division by zero Not returning -EINVAL, because someone might want to use the value zero in some future gact_prob algorithm? Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:11 -08:00
Patrick McHardy	1e9b3d5339	[NET_SCHED]: policer: restore compatibility with old iproute binaries The tc actions increased the size of struct tc_police, which broke compatibility with old iproute binaries since both the act_police and the old NET_CLS_POLICE code check for an exact size match. Since the new members are not even used, the simple fix is to also accept the size of the old structure. Dumping is not affected since old userspace will receive a bigger structure, which is handled fine. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:07 -08:00
Adrian Bunk	5f68e4c07c	[PKT_SCHED]: Remove unused exports. This patch removes the following unused EXPORT_SYMBOL's: - sch_api.c: qdisc_lookup - sch_generic.c: __netdev_watchdog_up - sch_generic.c: noop_qdisc_ops - sch_generic.c: qdisc_alloc Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:32:06 -08:00
Patrick McHardy	e488eafcc5	[NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures When peeking at the next packet in a child qdisc by calling dequeue/requeue, the upper qdisc qlen counter may get out of sync in case the requeue fails. The qdisc and the child qdisc both have their counter decremented, but since no packet is given to the upper qdisc it won't decrement its counter itself. requeue should not fail, so this is mostly for "correctness". Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:46 -08:00
Patrick McHardy	256d61b87b	[NET_SCHED]: Fix endless loops (part 4): HTB Convert HTB to use qdisc_tree_decrease_len() and add a callback for deactivating a class when its child queue becomes empty. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:45 -08:00
Patrick McHardy	f973b913e1	[NET_SCHED]: Fix endless loops (part 3): HFSC Convert HFSC to use qdisc_tree_decrease_len() and add a callback for deactivating a class when its child queue becomes empty. All queue purging goes through hfsc_purge_queue(), which is used in three cases: grafting, class creation (when a leaf class is turned into an intermediate class by attaching a new class) and class deletion. In all cases qdisc_tree_decrease_len() is needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:44 -08:00
Patrick McHardy	5e50da01d0	[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where necessary: - all graft operations - destruction of old child qdiscs in prio, red and tbf change operation - purging of queue in sfq change operation Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:43 -08:00
Patrick McHardy	43effa1e57	[NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1) There are multiple problems related to qlen adjustment that can lead to an upper qdisc getting out of sync with the real number of packets queued, leading to endless dequeueing attempts by the upper layer code. All qdiscs must maintain an accurate q.qlen counter. There are basically two groups of operations affecting the qlen: operations that propagate down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the root qdisc and operations only affecting a subtree or single qdisc (change, graft, delete class). Since qlen changes during operations from the second group don't propagate to ancestor qdiscs, their qlen values become desynchronized. This patch adds a function to propagate qlen changes up the qdisc tree, optionally calling a callback function to perform qdisc-internal maintenance when the child qdisc becomes empty. The follow-up patches will convert all qdiscs to use this function where necessary. Noticed by Timo Steinbach <tsteinbach@astaro.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:42 -08:00
Patrick McHardy	9f9afec482	[NET_SCHED]: Set parent classid in default qdiscs Set parent classids in default qdiscs to allow walking up the tree from outside the qdiscs. This is needed by the next patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:41 -08:00
Patrick McHardy	814a175e7b	[NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete qlen adjustment should happen immediately in ->delete and not in the class destroy function because the reference count will not hit zero in ->delete (sch_api holds a reference) but in ->put. Since the qdisc lock is released between deletion of the class and final destruction this creates an externally visible error in the qlen counter. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:40 -08:00
Arnaldo Carvalho de Melo	c7b1b24978	[SCHED]: Use kmemdup & kzalloc where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:18 -08:00
Al Viro	66c6f529c3	[NET]: net/sched annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:19 -08:00
David Kimdon	3c62f75aac	[PKT_SCHED]: Make sch_fifo.o available when CONFIG_NET_SCHED is not set. Based on patch by Patrick McHardy. Add a new option, NET_SCH_FIFO, which provides a simple fifo qdisc without requiring CONFIG_NET_SCHED. The d80211 stack needs a generic fifo qdisc for WME. At present it uses net/d80211/fifo_qdisc.c which is functionally equivalent to sch_fifo.c. This patch will allow the d80211 stack to remove net/d80211/fifo_qdisc.c and use sch_fifo.c instead. Signed-off-by: David Kimdon <david.kimdon@devicescape.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:43 -08:00
Thomas Graf	82e91ffef6	[NET]: Turn nfmark into generic mark nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:38 -08:00
Stephen Hemminger	da33e3eb48	[PKT_SCHED] sch_htb: Use hlist_del_init(). Otherwise we can hit paths that (legally) do multiple deletes on the same node and OOPS with the HLIST poison values there instead of NULL. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-07 15:10:12 -08:00
Stephen Hemminger	798b6b19d7	[PATCH] skge, sky2, et all. gplv2 only I don't want my code to downgraded to GPLv3 because of cut-n-pasted the comments. These files which I hold copyright on were started before it was clear what GPLv3 was going to be. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2006-10-31 20:22:06 -05:00
David S. Miller	4e8a520150	[PKT_SCHED] netem: Orphan SKB when adding to queue. The networking emulator can queue SKBs for a very long time, so if you're using netem on the sender side for large bandwidth/delay product testing, the SKB socket send queue sizes become artificially larger. Correct this by calling skb_orphan() in netem_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-22 21:00:33 -07:00
Akinbou Mita	30bdbe397b	[PKT_SCHED] sch_htb: use rb_first() cleanup Use rb_first() to get first entry in rb tree. Signed-off-by: Akinbou Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-12 01:52:05 -07:00
Patrick McHardy	2473ffe3ca	[NET_SCHED]: Remove old estimator implementation Remove unused file, estimators live in net/core/gen_estimator.c now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-04 00:31:07 -07:00
Ismail Donmez	81771b3b20	[NET_SCHED]: Revert "HTB: fix incorrect use of RB_EMPTY_NODE" With commit `10fd48f237` [1] , RB_EMPTY_NODE changed behaviour so it returns true when the node is empty as expected. Hence Patrick McHardy's fix for sched_htb.c should be reverted. Signed-off-by: Ismail Donmez <ismail@pardus.org.tr> ACKed-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-04 00:30:58 -07:00
Patrick McHardy	85670cc1fa	[NET_SCHED]: Fix fallout from dev->qdisc RCU change The move of qdisc destruction to a rcu callback broke locking in the entire qdisc layer by invalidating previously valid assumptions about the context in which changes to the qdisc tree occur. The two assumptions were: - since changes only happen in process context, read_lock doesn't need bottem half protection. Now invalid since destruction of inner qdiscs, classifiers, actions and estimators happens in the RCU callback unless they're manually deleted, resulting in dead-locks when read_lock in process context is interrupted by write_lock_bh in bottem half context. - since changes only happen under the RTNL, no additional locking is necessary for data not used during packet processing (f.e. u32_list). Again, since destruction now happens in the RCU callback, this assumption is not valid anymore, causing races while using this data, which can result in corruption or use-after-free. Instead of "fixing" this by disabling bottem halfs everywhere and adding new locks/refcounting, this patch makes these assumptions valid again by moving destruction back to process context. Since only the dev->qdisc pointer is protected by RCU, but ->enqueue and the qdisc tree are still protected by dev->qdisc_lock, destruction of the tree can be performed immediately and only the final free needs to happen in the rcu callback to make sure dev_queue_xmit doesn't access already freed memory. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:50 -07:00
Patrick McHardy	787e0617e5	[NET_SCHED]: HTB: fix incorrect use of RB_EMPTY_NODE Fix incorrect use of RB_EMPTY_NODE in htb_safe_rb_erase, which makes it skip nodes within the rbtree instead of nodes not in the tree, resulting in crashes later on. The root cause for this seems to be the very counter-intuitive behaviour of the RB_EMPTY_NODE macro, which returns _false_ when the node is empty. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-28 18:01:49 -07:00
Kim Nordlund	658270a0a4	[PKT_SCHED] cls_basic: Use unsigned int when generating handle Prevents filters from being added if the first generated handle already exists. Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com> Signed-off-by: Thomas Graf <tgraf@suug.ch>	2006-09-28 18:01:45 -07:00
Adrian Bunk	28a7b327b8	[PKT_SCHED] act_simple.c: make struct simp_hash_info static This patch makes the needlessly global struct simp_hash_info static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:40 -07:00
Patrick McHardy	b4e9b520ca	[NET_SCHED]: Add mask support to fwmark classifier Support masking the nfmark value before the search. The mask value is global for all filters contained in one instance. It can only be set when a new instance is created, all filters must specify the same mask. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 15:18:12 -07:00
Patrick McHardy	efa741656e	[NETFILTER]: x_tables: remove unused size argument to check/destroy functions The size is verified by x_tables and isn't needed by the modules anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:34 -07:00
Patrick McHardy	fe1cb10873	[NETFILTER]: x_tables: remove unused argument to target functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:33 -07:00
David S. Miller	e9ce1cd3cf	[PKT_SCHED]: Kill pkt_act.h inlining. This was simply making templates of functions and mostly causing a lot of code duplication in the classifier action modules. We solve this more cleanly by having a common "struct tcf_common" that hash worker functions contained once in act_api.c can work with. Callers work with real action objects that have the common struct plus their module specific struct members. You go from a common object to the higher level one using a "to_foo()" macro which makes use of container_of() to do the dirty work. This also kills off act_generic.h which was only used by act_simple.c and keeping it around was more work than the it's value. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:55:10 -07:00
Thomas Graf	2942e90050	[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:48 -07:00
Stephen Hemminger	3696f625e2	[HTB]: rbtree cleanup Add code to initialize rb tree nodes, and check for double deletion. This is not a real fix, but I can make it trap sometimes and may be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:34 -07:00
Stephen Hemminger	0cef296da9	[HTB]: Use hlist for hash lists. Use hlist instead of list for the hash list. This saves space, and we can check for double delete better. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:34 -07:00
Stephen Hemminger	87990467d3	[HTB]: Lindent Code was a mess in terms of indentation. Run through Lindent script, and cleanup the damage. Also, don't use, vim magic comment, and substitute inline for __inline__. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:33 -07:00
Stephen Hemminger	18a63e868b	[HTB]: HTB_HYSTERESIS cleanup Change the conditional compilation around HTB_HYSTERSIS since code was splitting mid expression. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:32 -07:00
Stephen Hemminger	9ac961ee05	[HTB]: Remove lock macro. Get rid of the macro's being used to obscure the locking. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:31 -07:00
Stephen Hemminger	3bf72957d2	[HTB]: Remove broken debug code. The HTB network scheduler had debug code that wouldn't compile and confused and obfuscated the code, remove it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:54:30 -07:00
Patrick McHardy	84fa7933a3	[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-22 14:53:53 -07:00
Herbert Xu	d7811e623d	[NET]: Drop tx lock in dev_watchdog_up Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM. On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote: > > ======================================================= > [ INFO: possible circular locking dependency detected ] > ------------------------------------------------------- > swapper/0 is trying to acquire lock: > (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290 > > but task is already holding lock: > (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290 > > which lock already depends on the new lock. This turns out to be a genuine bug. The queue lock and xmit lock are intentionally taken out of order. Two things are supposed to prevent dead-locks from occuring: 1) When we hold the queue_lock we're supposed to only do try_lock on the tx_lock. 2) We always drop the queue_lock after taking the tx_lock and before doing anything else. > > the existing dependency chain (in reverse order) is: > > -> #1 (&dev->_xmit_lock){-+..}: > [<c012e7b6>] lock_acquire+0x76/0xa0 > [<c0336241>] _spin_lock_bh+0x31/0x40 > [<c02d25a9>] dev_activate+0x69/0x120 This path obviously breaks assumption 1) and therefore can lead to ABBA dead-locks. I've looked at the history and there seems to be no reason for the lock to be held at all in dev_watchdog_up. The lock appeared in day one and even there it was unnecessary. In fact, people added __dev_watchdog_up precisely in order to get around the tx lock there. The function dev_watchdog_up is already serialised by rtnl_lock since its only caller dev_activate is always called under it. So here is a simple patch to remove the tx lock from dev_watchdog_up. In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and replace it with dev_watchdog_up. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-09-18 00:22:30 -07:00
Ralf Hildebrandt	c0956bd251	[PKT_SCHED] cls_u32: Fix typo. Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-17 16:29:54 -07:00
Jamal Hadi Salim	b9e2cc0f0e	[PKT_SCHED]: Return ENOENT if qdisc module is unavailable Return ENOENT if qdisc module is unavailable Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-08-04 22:59:49 -07:00
Panagiotis Issaris	0da974f4f3	[NET]: Conversions from kmalloc+memset to k(z\|c)alloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:51:30 -07:00
Guillaume Chazarain	89e1df74f8	[PKT_SCHED] netem: Fix slab corruption with netem (2nd try) CONFIG_DEBUG_SLAB found the following bug: netem_enqueue() in sch_netem.c gets a pointer inside a slab object: struct netem_skb_cb cb = (struct netem_skb_cb )skb->cb; But then, the slab object may be freed: skb = skb_unshare(skb, GFP_ATOMIC) cb is still pointing inside the freed skb, so here is a patch to initialize cb later, and make it clear that initializing it sooner is a bad idea. [From Stephen Hemminger: leave cb unitialized in order to let gcc complain in case of use before initialization] Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-21 14:45:25 -07:00
Dave Jones	2724a1a55f	[PATCH] sch_htb compile fix. net/sched/sch_htb.c: In function 'htb_change_class': net/sched/sch_htb.c:1605: error: expected ';' before 'do_gettimeofday' Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-15 11:40:20 -07:00
Stephen Hemminger	b3a6251915	[PKT_SCHED] HTB: initialize upper bound properly The upper bound for HTB time diff needs to be scaled to PSCHED units rather than just assuming usecs. The field mbuffer is used in TDIFF_SAFE(), as an upper bound. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-14 16:32:27 -07:00
Stephen Hemminger	781b456a98	[MAINTAINERS]: Add proper entry for TC classifier Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-12 13:58:48 -07:00
Thomas Graf	ebbaeab18b	[PKT_SCHED]: act_api: Fix module leak while flushing actions Module reference needs to be given back if message header construction fails. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-09 11:36:23 -07:00
Thomas Graf	4fe683f50d	[PKT_SCHED]: Fix error handling while dumping actions "return -err" and blindly inheriting the error code in the netlink failure exception handler causes errors codes to be returned as positive value therefore making them being ignored by the caller. May lead to sending out incomplete netlink messages. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-05 20:47:28 -07:00
Thomas Graf	d152b4e1e9	[PKT_SCHED]: Return ENOENT if action module is unavailable Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-05 20:45:57 -07:00
Thomas Graf	26dab8930b	[PKT_SCHED]: Fix illegal memory dereferences when dumping actions The TCA_ACT_KIND attribute is used without checking its availability when dumping actions therefore leading to a value of 0x4 being dereferenced. The use of strcmp() in tc_lookup_action_n() isn't safe when fed with string from an attribute without enforcing proper NUL termination. Both bugs can be triggered with malformed netlink message and don't require any privileges. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-07-05 20:45:06 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Matt LaPlante	3539c272f1	Kconfig: Typos in net/sched/Kconfig Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 18:53:46 +02:00
Herbert Xu	f6a78bfcb1	[NET]: Add generic segmentation offload This patch adds the infrastructure for generic segmentation offload. The idea is to tap into the potential savings of TSO without hardware support by postponing the allocation of segmented skb's until just before the entry point into the NIC driver. The same structure can be used to support software IPv6 TSO, as well as UFO and segmentation offload for other relevant protocols, e.g., DCCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 02:07:31 -07:00
Herbert Xu	d4828d85d1	[NET]: Prevent transmission after dev_deactivate The dev_deactivate function has bit-rotted since the introduction of lockless drivers. In particular, the spin_unlock_wait call at the end has no effect on the xmit routine of lockless drivers. With a little bit of work, we can make it much more useful by providing the guarantee that when it returns, no more calls to the xmit routine of the underlying driver will be made. The idea is simple. There are two entry points in to the xmit routine. The first comes from dev_queue_xmit. That one is easily stopped by using synchronize_rcu. This works because we set the qdisc to noop_qdisc before the synchronize_rcu call. That in turn causes all subsequent packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all outstanding calls leave their critical section. The other entry point is from qdisc_run. Since we now have a bit that indicates whether it's running, all we have to do is to wait until the bit is off. I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is useless because netif_wake_queue can cause it to be set again. It is also harmless because we've disarmed qdisc_run. I've also removed the spin_unlock_wait on xmit_lock because its only purpose of making sure that all outstanding xmit_lock holders have exited is also given by dev_watchdog_down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-23 02:07:26 -07:00
Herbert Xu	48d83325b6	[NET]: Prevent multiple qdisc runs Having two or more qdisc_run's contend against each other is bad because it can induce packet reordering if the packets have to be requeued. It appears that this is an unintended consequence of relinquinshing the queue lock while transmitting. That in turn is needed for devices that spend a lot of time in their transmit routine. There are no advantages to be had as devices with queues are inherently single-threaded (the loopback device is not but then it doesn't have a queue). Even if you were to add a queue to a parallel virtual device (e.g., bolt a tbf filter in front of an ipip tunnel device), you would still want to process the queue in sequence to ensure that the packets are ordered correctly. The solution here is to steal a bit from net_device to prevent this. BTW, as qdisc_restart is no longer used by anyone as a module inside the kernel (IIRC it used to with netif_wake_queue), I have not exported the new __qdisc_run function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-19 23:57:59 -07:00
Herbert Xu	932ff279a4	[NET]: Add netif_tx_lock Various drivers use xmit_lock internally to synchronise with their transmission routines. They do so without setting xmit_lock_owner. This is fine as long as netpoll is not in use. With netpoll it is possible for deadlocks to occur if xmit_lock_owner isn't set. This is because if a printk occurs while xmit_lock is held and xmit_lock_owner is not set can cause netpoll to attempt to take xmit_lock recursively. While it is possible to resolve this by getting netpoll to use trylock, it is suboptimal because netpoll's sole objective is to maximise the chance of getting the printk out on the wire. So delaying or dropping the message is to be avoided as much as possible. So the only alternative is to always set xmit_lock_owner. The following patch does this by introducing the netif_tx_lock family of functions that take care of setting/unsetting xmit_lock_owner. I renamed xmit_lock to _xmit_lock to indicate that it should not be used directly. I didn't provide irq versions of the netif_tx_lock functions since xmit_lock is meant to be a BH-disabling lock. This is pretty much a straight text substitution except for a small bug fix in winbond. It currently uses netif_stop_queue/spin_unlock_wait to stop transmission. This is unsafe as an IRQ can potentially wake up the queue. So it is safer to use netif_tx_disable. The hamradio bits used spin_lock_irq but it is unnecessary as xmit_lock must never be taken in an IRQ handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-06-17 21:30:14 -07:00
Stephen Hemminger	338f7566e5	[PKT_SCHED]: Potential jiffy wrap bug in dev_watchdog(). There is a potential jiffy wraparound bug in the transmit watchdog that is easily avoided by using time_after(). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-16 15:02:12 -07:00
Patrick McHardy	210525d65d	[NET_SCHED]: HFSC: fix thinko in hfsc_adjust_levels() When deleting the last child the level of a class should drop to zero. Noticed by Andreas Mueller <andreas@stapelspeicher.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-05-11 12:22:03 -07:00
Stephen Hemminger	89bbb0a361	[PKT_SCHED] netem: fix loss The following one line fix is needed to make loss function of netem work right when doing loss on the local host. Otherwise, higher layers just recover. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-29 18:33:12 -07:00
Patrick McHardy	18118cdbfd	[NETFILTER]: ipt action: use xt_check_target for basic verification The targets don't do the basic verification themselves anymore so the ipt action needs to take care of it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-24 17:27:34 -07:00
Jamal Hadi Salim	83b950c89c	[PKT_SCHED] act_police: Rename methods. Rename policer specific _generic_ methods to be specific to _act_police_ Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-04-09 22:25:46 -07:00
Patrick McHardy	1ae39a430b	[NET_SCHED]: cls_u32: remove unnecessary NULL-ptr check In both cases n can't be NULL without crashing anyway. Coverity #78 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-23 01:16:48 -08:00
Adrian Bunk	afb5bb5744	[PKT_SCHED]: Let NET_CLS_ACT no longer depend on EXPERIMENTAL This option should IMHO no longer depend on EXPERIMENTAL. Signed-off-by: Adrian Bunk <bunk@stusta.de> ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:44:24 -08:00
Stephen Hemminger	1533306186	[NET]: dev_put/dev_hold cleanup Get rid of the old __dev_put macro that is just a hold over from pre 2.6 kernel. And turn dev_hold into an inline instead of a macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 22:32:28 -08:00
Patrick McHardy	f38c39d6ce	[PKT_SCHED]: Convert sch_red to a classful qdisc Convert sch_red to a classful qdisc. All qdiscs that maintain accurate backlog counters are eligible as child qdiscs. When a queue limit larger than zero is given, a bfifo qdisc is used for backwards compatibility. Current versions of tc enforce a limit larger than zero, other users can avoid creating the default qdisc by using zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:20:44 -08:00
Patrick McHardy	f5539eb8ca	[PKT_SCHED]: Keep backlog counter in sch_sfq Keep backlog counter in SFQ qdisc to make it usable as child qdisc with RED. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:01:38 -08:00
Patrick McHardy	053cfed75d	[PKT_SCHED]: Restore TBF change semantic When TBF was converted to a classful qdisc, the semantic of the limit parameter was broken. On initilization an inner bfifo qdisc is created for backwards compatibility, when changing parameters however the new limit is ignored and the current child qdisc remains in place. Always replace the child qdisc by the default bfifo when limit is above zero, otherwise don't touch the inner qdisc. Current tc version enforce a limit above zero, other users can avoid creating the inner qdisc by using zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:01:21 -08:00
Patrick McHardy	cdc7f8e362	[PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark} A qdisc should set tcm_info to the child qdisc handle in its class dump function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:01:06 -08:00
Patrick McHardy	6d037a26f0	[PKT_SCHED]: Qdisc drop operation is optional The drop operation is optional and qdiscs must check if childs support it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 19:00:49 -08:00
Patrick McHardy	1c524830d0	[NETFILTER]: x_tables: pass registered match/target data to match/target functions This allows to make decisions based on the revision (and address family with a follow-up patch) at runtime. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-20 18:02:15 -08:00
Patrick McHardy	f6e57464df	[NET_SCHED]: act_api: fix skb leak in error path The skb is allocated by the function, so it needs to be freed instead of trimmed on overrun. Coverity #614 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-03-12 20:39:36 -08:00
Patrick McHardy	ae82af54d7	[PKT_SCHED]: Handle SCTP/DCCP in sfq_hash Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 13:01:06 -08:00
Amnon Aaronsohn	dd914b4082	[PKT_SCHED] sch_prio: fix qdisc bands init Currently when PRIO is configured to use N bands, it lets the packets be directed to any of the bands 0..N-1. However, PRIO attaches a fifo qdisc only to the bands that appear in the priomap; the rest of the N bands remain with a noop qdisc attached. This patch changes PRIO's behavior so that it attaches a fifo qdisc to all of the N bands. Signed-off-by: Amnon Aaronsohn <bla@cs.huji.ac.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-17 02:24:26 -08:00
Patrick McHardy	dca80b962a	[PKT_SCHED]: Change default clock source to gettimeofday The default of using jiffies is very bad and results in underutilization except with very low bandwidth. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-01-13 14:36:55 -08:00

... 7 8 9 10 11 ...

953 Commits