linux_dsm_epyc7002/drivers/net/ppp/ppp_generic.c

3334 lines
78 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Generic PPP layer for Linux.
*
* Copyright 1999-2002 Paul Mackerras.
*
* The generic PPP layer handles the PPP network interfaces, the
* /dev/ppp device, packet and VJ compression, and multilink.
* It talks to PPP `channels' via the interface defined in
* include/linux/ppp_channel.h. Channels provide the basic means for
* sending and receiving PPP frames on some kind of communications
* channel.
*
* Part of the code in this driver was inspired by the old async-only
* PPP driver, written by Michael Callahan and Al Longyear, and
* subsequently hacked by Paul Mackerras.
*
* ==FILEVERSION 20041108==
*/
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched/signal.h>
#include <linux/kmod.h>
#include <linux/init.h>
#include <linux/list.h>
#include <linux/idr.h>
#include <linux/netdevice.h>
#include <linux/poll.h>
#include <linux/ppp_defs.h>
#include <linux/filter.h>
#include <linux/ppp-ioctl.h>
#include <linux/ppp_channel.h>
#include <linux/ppp-comp.h>
#include <linux/skbuff.h>
#include <linux/rtnetlink.h>
#include <linux/if_arp.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/spinlock.h>
#include <linux/rwsem.h>
#include <linux/stddef.h>
#include <linux/device.h>
#include <linux/mutex.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
#include <linux/slab.h>
#include <linux/file.h>
#include <asm/unaligned.h>
#include <net/slhc_vj.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
#include <linux/nsproxy.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
#define PPP_VERSION "2.4.2"
/*
* Network protocols we support.
*/
#define NP_IP 0 /* Internet Protocol V4 */
#define NP_IPV6 1 /* Internet Protocol V6 */
#define NP_IPX 2 /* IPX protocol */
#define NP_AT 3 /* Appletalk protocol */
#define NP_MPLS_UC 4 /* MPLS unicast */
#define NP_MPLS_MC 5 /* MPLS multicast */
#define NUM_NP 6 /* Number of NPs. */
#define MPHDRLEN 6 /* multilink protocol header length */
#define MPHDRLEN_SSN 4 /* ditto with short sequence numbers */
/*
* An instance of /dev/ppp can be associated with either a ppp
* interface unit or a ppp channel. In both cases, file->private_data
* points to one of these.
*/
struct ppp_file {
enum {
INTERFACE=1, CHANNEL
} kind;
struct sk_buff_head xq; /* pppd transmit queue */
struct sk_buff_head rq; /* receive queue for pppd */
wait_queue_head_t rwait; /* for poll on reading /dev/ppp */
refcount_t refcnt; /* # refs (incl /dev/ppp attached) */
int hdrlen; /* space to leave for headers */
int index; /* interface unit / channel number */
int dead; /* unit/channel has been shut down */
};
#define PF_TO_X(pf, X) container_of(pf, X, file)
#define PF_TO_PPP(pf) PF_TO_X(pf, struct ppp)
#define PF_TO_CHANNEL(pf) PF_TO_X(pf, struct channel)
/*
* Data structure to hold primary network stats for which
* we want to use 64 bit storage. Other network stats
* are stored in dev->stats of the ppp strucute.
*/
struct ppp_link_stats {
u64 rx_packets;
u64 tx_packets;
u64 rx_bytes;
u64 tx_bytes;
};
/*
* Data structure describing one ppp unit.
* A ppp unit corresponds to a ppp network interface device
* and represents a multilink bundle.
* It can have 0 or more ppp channels connected to it.
*/
struct ppp {
struct ppp_file file; /* stuff for read/write/poll 0 */
struct file *owner; /* file that owns this unit 48 */
struct list_head channels; /* list of attached channels 4c */
int n_channels; /* how many channels are attached 54 */
spinlock_t rlock; /* lock for receive side 58 */
spinlock_t wlock; /* lock for transmit side 5c */
int __percpu *xmit_recursion; /* xmit recursion detect */
int mru; /* max receive unit 60 */
unsigned int flags; /* control bits 64 */
unsigned int xstate; /* transmit state bits 68 */
unsigned int rstate; /* receive state bits 6c */
int debug; /* debug flags 70 */
struct slcompress *vj; /* state for VJ header compression */
enum NPmode npmode[NUM_NP]; /* what to do with each net proto 78 */
struct sk_buff *xmit_pending; /* a packet ready to go out 88 */
struct compressor *xcomp; /* transmit packet compressor 8c */
void *xc_state; /* its internal state 90 */
struct compressor *rcomp; /* receive decompressor 94 */
void *rc_state; /* its internal state 98 */
unsigned long last_xmit; /* jiffies when last pkt sent 9c */
unsigned long last_recv; /* jiffies when last pkt rcvd a0 */
struct net_device *dev; /* network interface device a4 */
int closing; /* is device closing down? a8 */
#ifdef CONFIG_PPP_MULTILINK
int nxchan; /* next channel to send something on */
u32 nxseq; /* next sequence number to send */
int mrru; /* MP: max reconst. receive unit */
u32 nextseq; /* MP: seq no of next packet */
u32 minseq; /* MP: min of most recent seqnos */
struct sk_buff_head mrq; /* MP: receive reconstruction queue */
#endif /* CONFIG_PPP_MULTILINK */
#ifdef CONFIG_PPP_FILTER
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
struct bpf_prog *pass_filter; /* filter for packets to pass */
struct bpf_prog *active_filter; /* filter for pkts to reset idle */
#endif /* CONFIG_PPP_FILTER */
struct net *ppp_net; /* the net we belong to */
struct ppp_link_stats stats64; /* 64 bit network stats */
};
/*
* Bits in flags: SC_NO_TCP_CCID, SC_CCP_OPEN, SC_CCP_UP, SC_LOOP_TRAFFIC,
* SC_MULTILINK, SC_MP_SHORTSEQ, SC_MP_XSHORTSEQ, SC_COMP_TCP, SC_REJ_COMP_TCP,
* SC_MUST_COMP
* Bits in rstate: SC_DECOMP_RUN, SC_DC_ERROR, SC_DC_FERROR.
* Bits in xstate: SC_COMP_RUN
*/
#define SC_FLAG_BITS (SC_NO_TCP_CCID|SC_CCP_OPEN|SC_CCP_UP|SC_LOOP_TRAFFIC \
|SC_MULTILINK|SC_MP_SHORTSEQ|SC_MP_XSHORTSEQ \
|SC_COMP_TCP|SC_REJ_COMP_TCP|SC_MUST_COMP)
/*
* Private data structure for each channel.
* This includes the data structure used for multilink.
*/
struct channel {
struct ppp_file file; /* stuff for read/write/poll */
struct list_head list; /* link in all/new_channels list */
struct ppp_channel *chan; /* public channel data structure */
struct rw_semaphore chan_sem; /* protects `chan' during chan ioctl */
spinlock_t downl; /* protects `chan', file.xq dequeue */
struct ppp *ppp; /* ppp unit we're connected to */
struct net *chan_net; /* the net channel belongs to */
struct list_head clist; /* link in list of channels per unit */
rwlock_t upl; /* protects `ppp' */
#ifdef CONFIG_PPP_MULTILINK
u8 avail; /* flag used in multilink stuff */
u8 had_frag; /* >= 1 fragments have been sent */
u32 lastseq; /* MP: last sequence # received */
int speed; /* speed of the corresponding ppp channel*/
#endif /* CONFIG_PPP_MULTILINK */
};
struct ppp_config {
struct file *file;
s32 unit;
bool ifname_is_set;
};
/*
* SMP locking issues:
* Both the ppp.rlock and ppp.wlock locks protect the ppp.channels
* list and the ppp.n_channels field, you need to take both locks
* before you modify them.
* The lock ordering is: channel.upl -> ppp.wlock -> ppp.rlock ->
* channel.downl.
*/
static DEFINE_MUTEX(ppp_mutex);
static atomic_t ppp_unit_count = ATOMIC_INIT(0);
static atomic_t channel_count = ATOMIC_INIT(0);
/* per-net private data for this module */
netns: make struct pernet_operations::id unsigned int Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-17 08:58:21 +07:00
static unsigned int ppp_net_id __read_mostly;
struct ppp_net {
/* units to ppp mapping */
struct idr units_idr;
/*
* all_ppp_mutex protects the units_idr mapping.
* It also ensures that finding a ppp unit in the units_idr
* map and updating its file.refcnt field is atomic.
*/
struct mutex all_ppp_mutex;
/* channels */
struct list_head all_channels;
struct list_head new_channels;
int last_channel_index;
/*
* all_channels_lock protects all_channels and
* last_channel_index, and the atomicity of find
* a channel and updating its file.refcnt field.
*/
spinlock_t all_channels_lock;
};
/* Get the PPP protocol number from a skb */
#define PPP_PROTO(skb) get_unaligned_be16((skb)->data)
/* We limit the length of ppp->file.rq to this (arbitrary) value */
#define PPP_MAX_RQLEN 32
/*
* Maximum number of multilink fragments queued up.
* This has to be large enough to cope with the maximum latency of
* the slowest channel relative to the others. Strictly it should
* depend on the number of channels and their characteristics.
*/
#define PPP_MP_MAX_QLEN 128
/* Multilink header bits. */
#define B 0x80 /* this fragment begins a packet */
#define E 0x40 /* this fragment ends a packet */
/* Compare multilink sequence numbers (assumed to be 32 bits wide) */
#define seq_before(a, b) ((s32)((a) - (b)) < 0)
#define seq_after(a, b) ((s32)((a) - (b)) > 0)
/* Prototypes. */
static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
struct file *file, unsigned int cmd, unsigned long arg);
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb);
static void ppp_send_frame(struct ppp *ppp, struct sk_buff *skb);
static void ppp_push(struct ppp *ppp);
static void ppp_channel_push(struct channel *pch);
static void ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb,
struct channel *pch);
static void ppp_receive_error(struct ppp *ppp);
static void ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb);
static struct sk_buff *ppp_decompress_frame(struct ppp *ppp,
struct sk_buff *skb);
#ifdef CONFIG_PPP_MULTILINK
static void ppp_receive_mp_frame(struct ppp *ppp, struct sk_buff *skb,
struct channel *pch);
static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
#endif /* CONFIG_PPP_MULTILINK */
static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
static void ppp_ccp_closed(struct ppp *ppp);
static struct compressor *find_compressor(int type);
static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
static int ppp_create_interface(struct net *net, struct file *file, int *unit);
static void init_ppp_file(struct ppp_file *pf, int kind);
static void ppp_destroy_interface(struct ppp *ppp);
static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
static struct channel *ppp_find_channel(struct ppp_net *pn, int unit);
static int ppp_connect_channel(struct channel *pch, int unit);
static int ppp_disconnect_channel(struct channel *pch);
static void ppp_destroy_channel(struct channel *pch);
static int unit_get(struct idr *p, void *ptr);
static int unit_set(struct idr *p, void *ptr, int n);
static void unit_put(struct idr *p, int n);
static void *unit_find(struct idr *p, int n);
static void ppp_setup(struct net_device *dev);
static const struct net_device_ops ppp_netdev_ops;
static struct class *ppp_class;
/* per net-namespace data */
static inline struct ppp_net *ppp_pernet(struct net *net)
{
BUG_ON(!net);
return net_generic(net, ppp_net_id);
}
/* Translates a PPP protocol number to a NP index (NP == network protocol) */
static inline int proto_to_npindex(int proto)
{
switch (proto) {
case PPP_IP:
return NP_IP;
case PPP_IPV6:
return NP_IPV6;
case PPP_IPX:
return NP_IPX;
case PPP_AT:
return NP_AT;
case PPP_MPLS_UC:
return NP_MPLS_UC;
case PPP_MPLS_MC:
return NP_MPLS_MC;
}
return -EINVAL;
}
/* Translates an NP index into a PPP protocol number */
static const int npindex_to_proto[NUM_NP] = {
PPP_IP,
PPP_IPV6,
PPP_IPX,
PPP_AT,
PPP_MPLS_UC,
PPP_MPLS_MC,
};
/* Translates an ethertype into an NP index */
static inline int ethertype_to_npindex(int ethertype)
{
switch (ethertype) {
case ETH_P_IP:
return NP_IP;
case ETH_P_IPV6:
return NP_IPV6;
case ETH_P_IPX:
return NP_IPX;
case ETH_P_PPPTALK:
case ETH_P_ATALK:
return NP_AT;
case ETH_P_MPLS_UC:
return NP_MPLS_UC;
case ETH_P_MPLS_MC:
return NP_MPLS_MC;
}
return -1;
}
/* Translates an NP index into an ethertype */
static const int npindex_to_ethertype[NUM_NP] = {
ETH_P_IP,
ETH_P_IPV6,
ETH_P_IPX,
ETH_P_PPPTALK,
ETH_P_MPLS_UC,
ETH_P_MPLS_MC,
};
/*
* Locking shorthand.
*/
#define ppp_xmit_lock(ppp) spin_lock_bh(&(ppp)->wlock)
#define ppp_xmit_unlock(ppp) spin_unlock_bh(&(ppp)->wlock)
#define ppp_recv_lock(ppp) spin_lock_bh(&(ppp)->rlock)
#define ppp_recv_unlock(ppp) spin_unlock_bh(&(ppp)->rlock)
#define ppp_lock(ppp) do { ppp_xmit_lock(ppp); \
ppp_recv_lock(ppp); } while (0)
#define ppp_unlock(ppp) do { ppp_recv_unlock(ppp); \
ppp_xmit_unlock(ppp); } while (0)
/*
* /dev/ppp device routines.
* The /dev/ppp device is used by pppd to control the ppp unit.
* It supports the read, write, ioctl and poll functions.
* Open instances of /dev/ppp can be in one of three states:
* unattached, attached to a ppp unit, or attached to a ppp channel.
*/
static int ppp_open(struct inode *inode, struct file *file)
{
/*
* This could (should?) be enforced by the permissions on /dev/ppp.
*/
if (!ns_capable(file->f_cred->user_ns, CAP_NET_ADMIN))
return -EPERM;
return 0;
}
static int ppp_release(struct inode *unused, struct file *file)
{
struct ppp_file *pf = file->private_data;
struct ppp *ppp;
if (pf) {
file->private_data = NULL;
if (pf->kind == INTERFACE) {
ppp = PF_TO_PPP(pf);
rtnl_lock();
if (file == ppp->owner)
unregister_netdevice(ppp->dev);
rtnl_unlock();
}
if (refcount_dec_and_test(&pf->refcnt)) {
switch (pf->kind) {
case INTERFACE:
ppp_destroy_interface(PF_TO_PPP(pf));
break;
case CHANNEL:
ppp_destroy_channel(PF_TO_CHANNEL(pf));
break;
}
}
}
return 0;
}
static ssize_t ppp_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
struct ppp_file *pf = file->private_data;
DECLARE_WAITQUEUE(wait, current);
ssize_t ret;
struct sk_buff *skb = NULL;
struct iovec iov;
struct iov_iter to;
ret = count;
if (!pf)
return -ENXIO;
add_wait_queue(&pf->rwait, &wait);
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
skb = skb_dequeue(&pf->rq);
if (skb)
break;
ret = 0;
if (pf->dead)
break;
if (pf->kind == INTERFACE) {
/*
* Return 0 (EOF) on an interface that has no
* channels connected, unless it is looping
* network traffic (demand mode).
*/
struct ppp *ppp = PF_TO_PPP(pf);
ppp_recv_lock(ppp);
if (ppp->n_channels == 0 &&
(ppp->flags & SC_LOOP_TRAFFIC) == 0) {
ppp_recv_unlock(ppp);
break;
}
ppp_recv_unlock(ppp);
}
ret = -EAGAIN;
if (file->f_flags & O_NONBLOCK)
break;
ret = -ERESTARTSYS;
if (signal_pending(current))
break;
schedule();
}
set_current_state(TASK_RUNNING);
remove_wait_queue(&pf->rwait, &wait);
if (!skb)
goto out;
ret = -EOVERFLOW;
if (skb->len > count)
goto outf;
ret = -EFAULT;
iov.iov_base = buf;
iov.iov_len = count;
iov_iter_init(&to, READ, &iov, 1, count);
if (skb_copy_datagram_iter(skb, 0, &to, skb->len))
goto outf;
ret = skb->len;
outf:
kfree_skb(skb);
out:
return ret;
}
static ssize_t ppp_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
struct ppp_file *pf = file->private_data;
struct sk_buff *skb;
ssize_t ret;
if (!pf)
return -ENXIO;
ret = -ENOMEM;
skb = alloc_skb(count + pf->hdrlen, GFP_KERNEL);
if (!skb)
goto out;
skb_reserve(skb, pf->hdrlen);
ret = -EFAULT;
if (copy_from_user(skb_put(skb, count), buf, count)) {
kfree_skb(skb);
goto out;
}
switch (pf->kind) {
case INTERFACE:
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
ppp_xmit_process(PF_TO_PPP(pf), skb);
break;
case CHANNEL:
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
skb_queue_tail(&pf->xq, skb);
ppp_channel_push(PF_TO_CHANNEL(pf));
break;
}
ret = count;
out:
return ret;
}
/* No kernel lock - fine */
static __poll_t ppp_poll(struct file *file, poll_table *wait)
{
struct ppp_file *pf = file->private_data;
__poll_t mask;
if (!pf)
return 0;
poll_wait(file, &pf->rwait, wait);
mask = EPOLLOUT | EPOLLWRNORM;
if (skb_peek(&pf->rq))
mask |= EPOLLIN | EPOLLRDNORM;
if (pf->dead)
mask |= EPOLLHUP;
else if (pf->kind == INTERFACE) {
/* see comment in ppp_read */
struct ppp *ppp = PF_TO_PPP(pf);
ppp_recv_lock(ppp);
if (ppp->n_channels == 0 &&
(ppp->flags & SC_LOOP_TRAFFIC) == 0)
mask |= EPOLLIN | EPOLLRDNORM;
ppp_recv_unlock(ppp);
}
return mask;
}
#ifdef CONFIG_PPP_FILTER
static int get_filter(void __user *arg, struct sock_filter **p)
{
struct sock_fprog uprog;
struct sock_filter *code = NULL;
net: ppp: don't call sk_chk_filter twice Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") causes sk_chk_filter() to be called twice when setting a PPP pass or active filter. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from within get_filter(). The second one is through the call chain ppp_ioctl() or isdn_ppp_ioctl() --> sk_unattached_filter_create() --> __sk_prepare_filter() --> sk_chk_filter() The first call from within get_filter() should be deleted as get_filter() is called just before calling sk_unattached_filter_create() later on, which eventually calls sk_chk_filter() anyway. For 3.15.x, this proposed change is a bugfix rather than a pure optimization as in that branch, sk_chk_filter() may replace filter codes by other codes which are not recognized when executing sk_chk_filter() a second time. So with 3.15.x, if sk_chk_filter() is called twice, the second invocation may yield EINVAL (this depends on the filter codes found in the filter to be set, but because the replacement is done for frequently used codes, this is almost always the case). The net effect is that setting pass and/or active PPP filters does not work anymore, since sk_unattached_filter_create() always returns EINVAL due to the second call to sk_chk_filter(), regardless whether the filter was originally sane or not. Signed-off-by: Christoph Schulz <develop@kristov.de> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 13:01:10 +07:00
int len;
if (copy_from_user(&uprog, arg, sizeof(uprog)))
return -EFAULT;
if (!uprog.len) {
*p = NULL;
return 0;
}
len = uprog.len * sizeof(struct sock_filter);
code = memdup_user(uprog.filter, len);
if (IS_ERR(code))
return PTR_ERR(code);
*p = code;
return uprog.len;
}
#endif /* CONFIG_PPP_FILTER */
static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct ppp_file *pf;
struct ppp *ppp;
int err = -EFAULT, val, val2, i;
struct ppp_idle idle;
struct npioctl npi;
int unit, cflags;
struct slcompress *vj;
void __user *argp = (void __user *)arg;
int __user *p = argp;
mutex_lock(&ppp_mutex);
pf = file->private_data;
if (!pf) {
err = ppp_unattached_ioctl(current->nsproxy->net_ns,
pf, file, cmd, arg);
goto out;
}
if (cmd == PPPIOCDETACH) {
/*
ppp: remove the PPPIOCDETACH ioctl The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file before f_count has reached 0, which is fundamentally a bad idea. It does check 'f_count < 2', which excludes concurrent operations on the file since they would only be possible with a shared fd table, in which case each fdget() would take a file reference. However, it fails to account for the fact that even with 'f_count == 1' the file can still be linked into epoll instances. As reported by syzbot, this can trivially be used to cause a use-after-free. Yet, the only known user of PPPIOCDETACH is pppd versions older than ppp-2.4.2, which was released almost 15 years ago (November 2003). Also, PPPIOCDETACH apparently stopped working reliably at around the same time, when the f_count check was added to the kernel, e.g. see https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2' check makes PPPIOCDETACH only work in single-threaded applications; it always fails if called from a multithreaded application. All pppd versions released in the last 15 years just close() the file descriptor instead. Therefore, instead of hacking around this bug by exporting epoll internals to modules, and probably missing other related bugs, just remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave a stub in place that prints a one-time warning and returns EINVAL. Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 04:37:38 +07:00
* PPPIOCDETACH is no longer supported as it was heavily broken,
* and is only known to have been used by pppd older than
* ppp-2.4.2 (released November 2003).
*/
ppp: remove the PPPIOCDETACH ioctl The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file before f_count has reached 0, which is fundamentally a bad idea. It does check 'f_count < 2', which excludes concurrent operations on the file since they would only be possible with a shared fd table, in which case each fdget() would take a file reference. However, it fails to account for the fact that even with 'f_count == 1' the file can still be linked into epoll instances. As reported by syzbot, this can trivially be used to cause a use-after-free. Yet, the only known user of PPPIOCDETACH is pppd versions older than ppp-2.4.2, which was released almost 15 years ago (November 2003). Also, PPPIOCDETACH apparently stopped working reliably at around the same time, when the f_count check was added to the kernel, e.g. see https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2' check makes PPPIOCDETACH only work in single-threaded applications; it always fails if called from a multithreaded application. All pppd versions released in the last 15 years just close() the file descriptor instead. Therefore, instead of hacking around this bug by exporting epoll internals to modules, and probably missing other related bugs, just remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave a stub in place that prints a one-time warning and returns EINVAL. Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-24 04:37:38 +07:00
pr_warn_once("%s (%d) used obsolete PPPIOCDETACH ioctl\n",
current->comm, current->pid);
err = -EINVAL;
goto out;
}
if (pf->kind == CHANNEL) {
struct channel *pch;
struct ppp_channel *chan;
pch = PF_TO_CHANNEL(pf);
switch (cmd) {
case PPPIOCCONNECT:
if (get_user(unit, p))
break;
err = ppp_connect_channel(pch, unit);
break;
case PPPIOCDISCONN:
err = ppp_disconnect_channel(pch);
break;
default:
down_read(&pch->chan_sem);
chan = pch->chan;
err = -ENOTTY;
if (chan && chan->ops->ioctl)
err = chan->ops->ioctl(chan, cmd, arg);
up_read(&pch->chan_sem);
}
goto out;
}
if (pf->kind != INTERFACE) {
/* can't happen */
pr_err("PPP: not interface or channel??\n");
err = -EINVAL;
goto out;
}
ppp = PF_TO_PPP(pf);
switch (cmd) {
case PPPIOCSMRU:
if (get_user(val, p))
break;
ppp->mru = val;
err = 0;
break;
case PPPIOCSFLAGS:
if (get_user(val, p))
break;
ppp_lock(ppp);
cflags = ppp->flags & ~val;
#ifdef CONFIG_PPP_MULTILINK
if (!(ppp->flags & SC_MULTILINK) && (val & SC_MULTILINK))
ppp->nextseq = 0;
#endif
ppp->flags = val & SC_FLAG_BITS;
ppp_unlock(ppp);
if (cflags & SC_CCP_OPEN)
ppp_ccp_closed(ppp);
err = 0;
break;
case PPPIOCGFLAGS:
val = ppp->flags | ppp->xstate | ppp->rstate;
if (put_user(val, p))
break;
err = 0;
break;
case PPPIOCSCOMPRESS:
err = ppp_set_compress(ppp, arg);
break;
case PPPIOCGUNIT:
if (put_user(ppp->file.index, p))
break;
err = 0;
break;
case PPPIOCSDEBUG:
if (get_user(val, p))
break;
ppp->debug = val;
err = 0;
break;
case PPPIOCGDEBUG:
if (put_user(ppp->debug, p))
break;
err = 0;
break;
case PPPIOCGIDLE:
idle.xmit_idle = (jiffies - ppp->last_xmit) / HZ;
idle.recv_idle = (jiffies - ppp->last_recv) / HZ;
if (copy_to_user(argp, &idle, sizeof(idle)))
break;
err = 0;
break;
case PPPIOCSMAXCID:
if (get_user(val, p))
break;
val2 = 15;
if ((val >> 16) != 0) {
val2 = val >> 16;
val &= 0xffff;
}
vj = slhc_init(val2+1, val+1);
if (IS_ERR(vj)) {
err = PTR_ERR(vj);
break;
}
ppp_lock(ppp);
if (ppp->vj)
slhc_free(ppp->vj);
ppp->vj = vj;
ppp_unlock(ppp);
err = 0;
break;
case PPPIOCGNPMODE:
case PPPIOCSNPMODE:
if (copy_from_user(&npi, argp, sizeof(npi)))
break;
err = proto_to_npindex(npi.protocol);
if (err < 0)
break;
i = err;
if (cmd == PPPIOCGNPMODE) {
err = -EFAULT;
npi.mode = ppp->npmode[i];
if (copy_to_user(argp, &npi, sizeof(npi)))
break;
} else {
ppp->npmode[i] = npi.mode;
/* we may be able to transmit more packets now (??) */
netif_wake_queue(ppp->dev);
}
err = 0;
break;
#ifdef CONFIG_PPP_FILTER
case PPPIOCSPASS:
{
struct sock_filter *code;
err = get_filter(argp, &code);
if (err >= 0) {
struct bpf_prog *pass_filter = NULL;
struct sock_fprog_kern fprog = {
.len = err,
.filter = code,
};
err = 0;
if (fprog.filter)
err = bpf_prog_create(&pass_filter, &fprog);
if (!err) {
ppp_lock(ppp);
if (ppp->pass_filter)
bpf_prog_destroy(ppp->pass_filter);
ppp->pass_filter = pass_filter;
ppp_unlock(ppp);
net: ppp: fix creating PPP pass and active filters Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") inadvertently changed the logic when setting PPP pass and active filters. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl() (or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to remove a pass/active filter previously set by using a filter of length zero. However, with the new code this is not possible anymore as this case is not explicitly checked for, which leads to passing NULL as a filter to sk_unattached_filter_create(). This results in returning EINVAL to the caller. Additionally, the variables ppp->pass_filter and ppp->active_filter (or is->pass_filter and is->active_filter, resp.) are not reset to NULL, although the filters they point to may have been destroyed by sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are left behind (provided the pointers were previously non-NULL). This patch corrects both problems by checking whether the filter passed is empty or non-empty, and prevents sk_unattached_filter_create() from being called in the first case. Moreover, the pointers are always reset to NULL as soon as sk_unattached_filter_destroy() returns. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17 03:10:29 +07:00
}
kfree(code);
}
break;
}
case PPPIOCSACTIVE:
{
struct sock_filter *code;
err = get_filter(argp, &code);
if (err >= 0) {
struct bpf_prog *active_filter = NULL;
struct sock_fprog_kern fprog = {
.len = err,
.filter = code,
};
err = 0;
if (fprog.filter)
err = bpf_prog_create(&active_filter, &fprog);
if (!err) {
ppp_lock(ppp);
if (ppp->active_filter)
bpf_prog_destroy(ppp->active_filter);
ppp->active_filter = active_filter;
ppp_unlock(ppp);
net: ppp: fix creating PPP pass and active filters Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") inadvertently changed the logic when setting PPP pass and active filters. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl() (or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to remove a pass/active filter previously set by using a filter of length zero. However, with the new code this is not possible anymore as this case is not explicitly checked for, which leads to passing NULL as a filter to sk_unattached_filter_create(). This results in returning EINVAL to the caller. Additionally, the variables ppp->pass_filter and ppp->active_filter (or is->pass_filter and is->active_filter, resp.) are not reset to NULL, although the filters they point to may have been destroyed by sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are left behind (provided the pointers were previously non-NULL). This patch corrects both problems by checking whether the filter passed is empty or non-empty, and prevents sk_unattached_filter_create() from being called in the first case. Moreover, the pointers are always reset to NULL as soon as sk_unattached_filter_destroy() returns. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17 03:10:29 +07:00
}
kfree(code);
}
break;
}
#endif /* CONFIG_PPP_FILTER */
#ifdef CONFIG_PPP_MULTILINK
case PPPIOCSMRRU:
if (get_user(val, p))
break;
ppp_recv_lock(ppp);
ppp->mrru = val;
ppp_recv_unlock(ppp);
err = 0;
break;
#endif /* CONFIG_PPP_MULTILINK */
default:
err = -ENOTTY;
}
out:
mutex_unlock(&ppp_mutex);
return err;
}
static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
struct file *file, unsigned int cmd, unsigned long arg)
{
int unit, err = -EFAULT;
struct ppp *ppp;
struct channel *chan;
struct ppp_net *pn;
int __user *p = (int __user *)arg;
switch (cmd) {
case PPPIOCNEWUNIT:
/* Create a new ppp unit */
if (get_user(unit, p))
break;
err = ppp_create_interface(net, file, &unit);
if (err < 0)
break;
err = -EFAULT;
if (put_user(unit, p))
break;
err = 0;
break;
case PPPIOCATTACH:
/* Attach to an existing ppp unit */
if (get_user(unit, p))
break;
err = -ENXIO;
pn = ppp_pernet(net);
mutex_lock(&pn->all_ppp_mutex);
ppp = ppp_find_unit(pn, unit);
if (ppp) {
refcount_inc(&ppp->file.refcnt);
file->private_data = &ppp->file;
err = 0;
}
mutex_unlock(&pn->all_ppp_mutex);
break;
case PPPIOCATTCHAN:
if (get_user(unit, p))
break;
err = -ENXIO;
pn = ppp_pernet(net);
spin_lock_bh(&pn->all_channels_lock);
chan = ppp_find_channel(pn, unit);
if (chan) {
refcount_inc(&chan->file.refcnt);
file->private_data = &chan->file;
err = 0;
}
spin_unlock_bh(&pn->all_channels_lock);
break;
default:
err = -ENOTTY;
}
return err;
}
static const struct file_operations ppp_device_fops = {
.owner = THIS_MODULE,
.read = ppp_read,
.write = ppp_write,
.poll = ppp_poll,
.unlocked_ioctl = ppp_ioctl,
.open = ppp_open,
llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
2010-08-15 23:52:59 +07:00
.release = ppp_release,
.llseek = noop_llseek,
};
static __net_init int ppp_init_net(struct net *net)
{
struct ppp_net *pn = net_generic(net, ppp_net_id);
idr_init(&pn->units_idr);
mutex_init(&pn->all_ppp_mutex);
INIT_LIST_HEAD(&pn->all_channels);
INIT_LIST_HEAD(&pn->new_channels);
spin_lock_init(&pn->all_channels_lock);
return 0;
}
static __net_exit void ppp_exit_net(struct net *net)
{
struct ppp_net *pn = net_generic(net, ppp_net_id);
struct net_device *dev;
struct net_device *aux;
struct ppp *ppp;
LIST_HEAD(list);
int id;
rtnl_lock();
for_each_netdev_safe(net, dev, aux) {
if (dev->netdev_ops == &ppp_netdev_ops)
unregister_netdevice_queue(dev, &list);
}
idr_for_each_entry(&pn->units_idr, ppp, id)
/* Skip devices already unregistered by previous loop */
if (!net_eq(dev_net(ppp->dev), net))
unregister_netdevice_queue(ppp->dev, &list);
unregister_netdevice_many(&list);
rtnl_unlock();
mutex_destroy(&pn->all_ppp_mutex);
idr_destroy(&pn->units_idr);
WARN_ON_ONCE(!list_empty(&pn->all_channels));
WARN_ON_ONCE(!list_empty(&pn->new_channels));
}
static struct pernet_operations ppp_net_ops = {
.init = ppp_init_net,
.exit = ppp_exit_net,
.id = &ppp_net_id,
.size = sizeof(struct ppp_net),
};
static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
{
struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
int ret;
mutex_lock(&pn->all_ppp_mutex);
if (unit < 0) {
ret = unit_get(&pn->units_idr, ppp);
if (ret < 0)
goto err;
} else {
/* Caller asked for a specific unit number. Fail with -EEXIST
* if unavailable. For backward compatibility, return -EEXIST
* too if idr allocation fails; this makes pppd retry without
* requesting a specific unit number.
*/
if (unit_find(&pn->units_idr, unit)) {
ret = -EEXIST;
goto err;
}
ret = unit_set(&pn->units_idr, ppp, unit);
if (ret < 0) {
/* Rewrite error for backward compatibility */
ret = -EEXIST;
goto err;
}
}
ppp->file.index = ret;
if (!ifname_is_set)
snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
mutex_unlock(&pn->all_ppp_mutex);
ret = register_netdevice(ppp->dev);
if (ret < 0)
goto err_unit;
atomic_inc(&ppp_unit_count);
return 0;
err_unit:
mutex_lock(&pn->all_ppp_mutex);
unit_put(&pn->units_idr, ppp->file.index);
err:
mutex_unlock(&pn->all_ppp_mutex);
return ret;
}
static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
const struct ppp_config *conf)
{
struct ppp *ppp = netdev_priv(dev);
int indx;
int err;
int cpu;
ppp->dev = dev;
ppp->ppp_net = src_net;
ppp->mru = PPP_MRU;
ppp->owner = conf->file;
init_ppp_file(&ppp->file, INTERFACE);
ppp->file.hdrlen = PPP_HDRLEN - 2; /* don't count proto bytes */
for (indx = 0; indx < NUM_NP; ++indx)
ppp->npmode[indx] = NPMODE_PASS;
INIT_LIST_HEAD(&ppp->channels);
spin_lock_init(&ppp->rlock);
spin_lock_init(&ppp->wlock);
ppp->xmit_recursion = alloc_percpu(int);
if (!ppp->xmit_recursion) {
err = -ENOMEM;
goto err1;
}
for_each_possible_cpu(cpu)
(*per_cpu_ptr(ppp->xmit_recursion, cpu)) = 0;
#ifdef CONFIG_PPP_MULTILINK
ppp->minseq = -1;
skb_queue_head_init(&ppp->mrq);
#endif /* CONFIG_PPP_MULTILINK */
#ifdef CONFIG_PPP_FILTER
ppp->pass_filter = NULL;
ppp->active_filter = NULL;
#endif /* CONFIG_PPP_FILTER */
err = ppp_unit_register(ppp, conf->unit, conf->ifname_is_set);
if (err < 0)
goto err2;
conf->file->private_data = &ppp->file;
return 0;
err2:
free_percpu(ppp->xmit_recursion);
err1:
return err;
}
static const struct nla_policy ppp_nl_policy[IFLA_PPP_MAX + 1] = {
[IFLA_PPP_DEV_FD] = { .type = NLA_S32 },
};
static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
if (!data)
return -EINVAL;
if (!data[IFLA_PPP_DEV_FD])
return -EINVAL;
if (nla_get_s32(data[IFLA_PPP_DEV_FD]) < 0)
return -EBADF;
return 0;
}
static int ppp_nl_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[],
struct netlink_ext_ack *extack)
{
struct ppp_config conf = {
.unit = -1,
.ifname_is_set = true,
};
struct file *file;
int err;
file = fget(nla_get_s32(data[IFLA_PPP_DEV_FD]));
if (!file)
return -EBADF;
/* rtnl_lock is already held here, but ppp_create_interface() locks
* ppp_mutex before holding rtnl_lock. Using mutex_trylock() avoids
* possible deadlock due to lock order inversion, at the cost of
* pushing the problem back to userspace.
*/
if (!mutex_trylock(&ppp_mutex)) {
err = -EBUSY;
goto out;
}
if (file->f_op != &ppp_device_fops || file->private_data) {
err = -EBADF;
goto out_unlock;
}
conf.file = file;
/* Don't use device name generated by the rtnetlink layer when ifname
* isn't specified. Let ppp_dev_configure() set the device name using
* the PPP unit identifer as suffix (i.e. ppp<unit_id>). This allows
* userspace to infer the device name using to the PPPIOCGUNIT ioctl.
*/
if (!tb[IFLA_IFNAME])
conf.ifname_is_set = false;
err = ppp_dev_configure(src_net, dev, &conf);
out_unlock:
mutex_unlock(&ppp_mutex);
out:
fput(file);
return err;
}
static void ppp_nl_dellink(struct net_device *dev, struct list_head *head)
{
unregister_netdevice_queue(dev, head);
}
static size_t ppp_nl_get_size(const struct net_device *dev)
{
return 0;
}
static int ppp_nl_fill_info(struct sk_buff *skb, const struct net_device *dev)
{
return 0;
}
static struct net *ppp_nl_get_link_net(const struct net_device *dev)
{
struct ppp *ppp = netdev_priv(dev);
return ppp->ppp_net;
}
static struct rtnl_link_ops ppp_link_ops __read_mostly = {
.kind = "ppp",
.maxtype = IFLA_PPP_MAX,
.policy = ppp_nl_policy,
.priv_size = sizeof(struct ppp),
.setup = ppp_setup,
.validate = ppp_nl_validate,
.newlink = ppp_nl_newlink,
.dellink = ppp_nl_dellink,
.get_size = ppp_nl_get_size,
.fill_info = ppp_nl_fill_info,
.get_link_net = ppp_nl_get_link_net,
};
#define PPP_MAJOR 108
/* Called at boot time if ppp is compiled into the kernel,
or at module load time (from init_module) if compiled as a module. */
static int __init ppp_init(void)
{
int err;
pr_info("PPP generic driver version " PPP_VERSION "\n");
err = register_pernet_device(&ppp_net_ops);
if (err) {
pr_err("failed to register PPP pernet device (%d)\n", err);
goto out;
}
err = register_chrdev(PPP_MAJOR, "ppp", &ppp_device_fops);
if (err) {
pr_err("failed to register PPP device (%d)\n", err);
goto out_net;
}
ppp_class = class_create(THIS_MODULE, "ppp");
if (IS_ERR(ppp_class)) {
err = PTR_ERR(ppp_class);
goto out_chrdev;
}
err = rtnl_link_register(&ppp_link_ops);
if (err) {
pr_err("failed to register rtnetlink PPP handler\n");
goto out_class;
}
/* not a big deal if we fail here :-) */
device_create(ppp_class, NULL, MKDEV(PPP_MAJOR, 0), NULL, "ppp");
return 0;
out_class:
class_destroy(ppp_class);
out_chrdev:
unregister_chrdev(PPP_MAJOR, "ppp");
out_net:
unregister_pernet_device(&ppp_net_ops);
out:
return err;
}
/*
* Network interface unit routines.
*/
static netdev_tx_t
ppp_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ppp *ppp = netdev_priv(dev);
int npi, proto;
unsigned char *pp;
npi = ethertype_to_npindex(ntohs(skb->protocol));
if (npi < 0)
goto outf;
/* Drop, accept or reject the packet */
switch (ppp->npmode[npi]) {
case NPMODE_PASS:
break;
case NPMODE_QUEUE:
/* it would be nice to have a way to tell the network
system to queue this one up for later. */
goto outf;
case NPMODE_DROP:
case NPMODE_ERROR:
goto outf;
}
/* Put the 2-byte PPP protocol number on the front,
making sure there is room for the address and control fields. */
if (skb_cow_head(skb, PPP_HDRLEN))
goto outf;
pp = skb_push(skb, 2);
proto = npindex_to_proto[npi];
put_unaligned_be16(proto, pp);
skb_scrub_packet(skb, !net_eq(ppp->ppp_net, dev_net(dev)));
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
ppp_xmit_process(ppp, skb);
return NETDEV_TX_OK;
outf:
kfree_skb(skb);
++dev->stats.tx_dropped;
return NETDEV_TX_OK;
}
static int
ppp_net_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
struct ppp *ppp = netdev_priv(dev);
int err = -EFAULT;
void __user *addr = (void __user *) ifr->ifr_ifru.ifru_data;
struct ppp_stats stats;
struct ppp_comp_stats cstats;
char *vers;
switch (cmd) {
case SIOCGPPPSTATS:
ppp_get_stats(ppp, &stats);
if (copy_to_user(addr, &stats, sizeof(stats)))
break;
err = 0;
break;
case SIOCGPPPCSTATS:
memset(&cstats, 0, sizeof(cstats));
if (ppp->xc_state)
ppp->xcomp->comp_stat(ppp->xc_state, &cstats.c);
if (ppp->rc_state)
ppp->rcomp->decomp_stat(ppp->rc_state, &cstats.d);
if (copy_to_user(addr, &cstats, sizeof(cstats)))
break;
err = 0;
break;
case SIOCGPPPVER:
vers = PPP_VERSION;
if (copy_to_user(addr, vers, strlen(vers) + 1))
break;
err = 0;
break;
default:
err = -EINVAL;
}
return err;
}
static void
ppp_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats64)
{
struct ppp *ppp = netdev_priv(dev);
ppp_recv_lock(ppp);
stats64->rx_packets = ppp->stats64.rx_packets;
stats64->rx_bytes = ppp->stats64.rx_bytes;
ppp_recv_unlock(ppp);
ppp_xmit_lock(ppp);
stats64->tx_packets = ppp->stats64.tx_packets;
stats64->tx_bytes = ppp->stats64.tx_bytes;
ppp_xmit_unlock(ppp);
stats64->rx_errors = dev->stats.rx_errors;
stats64->tx_errors = dev->stats.tx_errors;
stats64->rx_dropped = dev->stats.rx_dropped;
stats64->tx_dropped = dev->stats.tx_dropped;
stats64->rx_length_errors = dev->stats.rx_length_errors;
}
static int ppp_dev_init(struct net_device *dev)
{
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
struct ppp *ppp;
netdev_lockdep_set_classes(dev);
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
ppp = netdev_priv(dev);
/* Let the netdevice take a reference on the ppp file. This ensures
* that ppp_destroy_interface() won't run before the device gets
* unregistered.
*/
refcount_inc(&ppp->file.refcnt);
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
return 0;
}
static void ppp_dev_uninit(struct net_device *dev)
{
struct ppp *ppp = netdev_priv(dev);
struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
ppp_lock(ppp);
ppp->closing = 1;
ppp_unlock(ppp);
mutex_lock(&pn->all_ppp_mutex);
unit_put(&pn->units_idr, ppp->file.index);
mutex_unlock(&pn->all_ppp_mutex);
ppp->owner = NULL;
ppp->file.dead = 1;
wake_up_interruptible(&ppp->file.rwait);
}
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
static void ppp_dev_priv_destructor(struct net_device *dev)
{
struct ppp *ppp;
ppp = netdev_priv(dev);
if (refcount_dec_and_test(&ppp->file.refcnt))
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
ppp_destroy_interface(ppp);
}
static const struct net_device_ops ppp_netdev_ops = {
.ndo_init = ppp_dev_init,
.ndo_uninit = ppp_dev_uninit,
.ndo_start_xmit = ppp_start_xmit,
.ndo_do_ioctl = ppp_net_ioctl,
.ndo_get_stats64 = ppp_get_stats64,
};
static struct device_type ppp_type = {
.name = "ppp",
};
static void ppp_setup(struct net_device *dev)
{
dev->netdev_ops = &ppp_netdev_ops;
SET_NETDEV_DEVTYPE(dev, &ppp_type);
ppp: declare PPP devices as LLTX ppp_xmit_process() already locks the xmit path. If HARD_TX_LOCK() tries to hold the _xmit_lock we can get lock inversion. [ 973.726130] ====================================================== [ 973.727311] [ INFO: possible circular locking dependency detected ] [ 973.728546] 4.8.0-rc2 #1 Tainted: G O [ 973.728986] ------------------------------------------------------- [ 973.728986] accel-pppd/1806 is trying to acquire lock: [ 973.728986] (&qdisc_xmit_lock_key){+.-...}, at: [<ffffffff8146f6fe>] sch_direct_xmit+0x8d/0x221 [ 973.728986] [ 973.728986] but task is already holding lock: [ 973.728986] (l2tp_sock){+.-...}, at: [<ffffffffa0202c4a>] l2tp_xmit_skb+0x1e8/0x5d7 [l2tp_core] [ 973.728986] [ 973.728986] which lock already depends on the new lock. [ 973.728986] [ 973.728986] [ 973.728986] the existing dependency chain (in reverse order) is: [ 973.728986] -> #3 (l2tp_sock){+.-...}: [ 973.728986] [<ffffffff810b3130>] lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff815752f4>] _raw_spin_lock+0x2d/0x3c [ 973.728986] [<ffffffffa0202c4a>] l2tp_xmit_skb+0x1e8/0x5d7 [l2tp_core] [ 973.728986] [<ffffffffa01b2466>] pppol2tp_xmit+0x1f2/0x25e [l2tp_ppp] [ 973.728986] [<ffffffffa0184f59>] ppp_channel_push+0xb5/0x14a [ppp_generic] [ 973.728986] [<ffffffffa01853ed>] ppp_write+0x104/0x11c [ppp_generic] [ 973.728986] [<ffffffff811b2ec6>] __vfs_write+0x56/0x120 [ 973.728986] [<ffffffff811b3f4c>] vfs_write+0xbd/0x11b [ 973.728986] [<ffffffff811b4cb2>] SyS_write+0x5e/0x96 [ 973.728986] [<ffffffff81575ba5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 973.728986] -> #2 (&(&pch->downl)->rlock){+.-...}: [ 973.728986] [<ffffffff810b3130>] lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff81575334>] _raw_spin_lock_bh+0x31/0x40 [ 973.728986] [<ffffffffa01808e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 973.728986] [<ffffffffa0184675>] __ppp_xmit_process+0x48/0x877 [ppp_generic] [ 973.728986] [<ffffffffa018505b>] ppp_xmit_process+0x4b/0xaf [ppp_generic] [ 973.728986] [<ffffffffa01853f7>] ppp_write+0x10e/0x11c [ppp_generic] [ 973.728986] [<ffffffff811b2ec6>] __vfs_write+0x56/0x120 [ 973.728986] [<ffffffff811b3f4c>] vfs_write+0xbd/0x11b [ 973.728986] [<ffffffff811b4cb2>] SyS_write+0x5e/0x96 [ 973.728986] [<ffffffff81575ba5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 973.728986] -> #1 (&(&ppp->wlock)->rlock){+.-...}: [ 973.728986] [<ffffffff810b3130>] lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff81575334>] _raw_spin_lock_bh+0x31/0x40 [ 973.728986] [<ffffffffa0184654>] __ppp_xmit_process+0x27/0x877 [ppp_generic] [ 973.728986] [<ffffffffa018505b>] ppp_xmit_process+0x4b/0xaf [ppp_generic] [ 973.728986] [<ffffffffa01852da>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 973.728986] [<ffffffff8143f767>] dev_hard_start_xmit+0x1a9/0x43d [ 973.728986] [<ffffffff8146f747>] sch_direct_xmit+0xd6/0x221 [ 973.728986] [<ffffffff814401e4>] __dev_queue_xmit+0x62a/0x912 [ 973.728986] [<ffffffff814404d7>] dev_queue_xmit+0xb/0xd [ 973.728986] [<ffffffff81449978>] neigh_direct_output+0xc/0xe [ 973.728986] [<ffffffff8150e62b>] ip6_finish_output2+0x5a9/0x623 [ 973.728986] [<ffffffff81512128>] ip6_output+0x15e/0x16a [ 973.728986] [<ffffffff8153ef86>] dst_output+0x76/0x7f [ 973.728986] [<ffffffff8153f737>] mld_sendpack+0x335/0x404 [ 973.728986] [<ffffffff81541c61>] mld_send_initial_cr.part.21+0x99/0xa2 [ 973.728986] [<ffffffff8154441d>] ipv6_mc_dad_complete+0x42/0x71 [ 973.728986] [<ffffffff8151c4bd>] addrconf_dad_completed+0x1cf/0x2ea [ 973.728986] [<ffffffff8151e4fa>] addrconf_dad_work+0x453/0x520 [ 973.728986] [<ffffffff8107a393>] process_one_work+0x365/0x6f0 [ 973.728986] [<ffffffff8107aecd>] worker_thread+0x2de/0x421 [ 973.728986] [<ffffffff810816fb>] kthread+0x121/0x130 [ 973.728986] [<ffffffff81575dbf>] ret_from_fork+0x1f/0x40 [ 973.728986] -> #0 (&qdisc_xmit_lock_key){+.-...}: [ 973.728986] [<ffffffff810b28d6>] __lock_acquire+0x1118/0x1483 [ 973.728986] [<ffffffff810b3130>] lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff815752f4>] _raw_spin_lock+0x2d/0x3c [ 973.728986] [<ffffffff8146f6fe>] sch_direct_xmit+0x8d/0x221 [ 973.728986] [<ffffffff814401e4>] __dev_queue_xmit+0x62a/0x912 [ 973.728986] [<ffffffff814404d7>] dev_queue_xmit+0xb/0xd [ 973.728986] [<ffffffff81449978>] neigh_direct_output+0xc/0xe [ 973.728986] [<ffffffff81487811>] ip_finish_output2+0x5db/0x609 [ 973.728986] [<ffffffff81489590>] ip_finish_output+0x152/0x15e [ 973.728986] [<ffffffff8148a0d4>] ip_output+0x8c/0x96 [ 973.728986] [<ffffffff81489652>] ip_local_out+0x41/0x4a [ 973.728986] [<ffffffff81489e7d>] ip_queue_xmit+0x5a5/0x609 [ 973.728986] [<ffffffffa0202fe4>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 973.728986] [<ffffffffa01b2466>] pppol2tp_xmit+0x1f2/0x25e [l2tp_ppp] [ 973.728986] [<ffffffffa0184f59>] ppp_channel_push+0xb5/0x14a [ppp_generic] [ 973.728986] [<ffffffffa01853ed>] ppp_write+0x104/0x11c [ppp_generic] [ 973.728986] [<ffffffff811b2ec6>] __vfs_write+0x56/0x120 [ 973.728986] [<ffffffff811b3f4c>] vfs_write+0xbd/0x11b [ 973.728986] [<ffffffff811b4cb2>] SyS_write+0x5e/0x96 [ 973.728986] [<ffffffff81575ba5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 973.728986] [ 973.728986] other info that might help us debug this: [ 973.728986] [ 973.728986] Chain exists of: &qdisc_xmit_lock_key --> &(&pch->downl)->rlock --> l2tp_sock [ 973.728986] Possible unsafe locking scenario: [ 973.728986] [ 973.728986] CPU0 CPU1 [ 973.728986] ---- ---- [ 973.728986] lock(l2tp_sock); [ 973.728986] lock(&(&pch->downl)->rlock); [ 973.728986] lock(l2tp_sock); [ 973.728986] lock(&qdisc_xmit_lock_key); [ 973.728986] [ 973.728986] *** DEADLOCK *** [ 973.728986] [ 973.728986] 6 locks held by accel-pppd/1806: [ 973.728986] #0: (&(&pch->downl)->rlock){+.-...}, at: [<ffffffffa0184efa>] ppp_channel_push+0x56/0x14a [ppp_generic] [ 973.728986] #1: (l2tp_sock){+.-...}, at: [<ffffffffa0202c4a>] l2tp_xmit_skb+0x1e8/0x5d7 [l2tp_core] [ 973.728986] #2: (rcu_read_lock){......}, at: [<ffffffff81486981>] rcu_lock_acquire+0x0/0x20 [ 973.728986] #3: (rcu_read_lock_bh){......}, at: [<ffffffff81486981>] rcu_lock_acquire+0x0/0x20 [ 973.728986] #4: (rcu_read_lock_bh){......}, at: [<ffffffff814340e3>] rcu_lock_acquire+0x0/0x20 [ 973.728986] #5: (dev->qdisc_running_key ?: &qdisc_running_key#2){+.....}, at: [<ffffffff8144011e>] __dev_queue_xmit+0x564/0x912 [ 973.728986] [ 973.728986] stack backtrace: [ 973.728986] CPU: 2 PID: 1806 Comm: accel-pppd Tainted: G O 4.8.0-rc2 #1 [ 973.728986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 973.728986] ffff7fffffffffff ffff88003436f850 ffffffff812a20f4 ffffffff82156e30 [ 973.728986] ffffffff82156920 ffff88003436f890 ffffffff8115c759 ffff88003344ae00 [ 973.728986] ffff88003344b5c0 0000000000000002 0000000000000006 ffff88003344b5e8 [ 973.728986] Call Trace: [ 973.728986] [<ffffffff812a20f4>] dump_stack+0x67/0x90 [ 973.728986] [<ffffffff8115c759>] print_circular_bug+0x22e/0x23c [ 973.728986] [<ffffffff810b28d6>] __lock_acquire+0x1118/0x1483 [ 973.728986] [<ffffffff810b3130>] lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff810b3130>] ? lock_acquire+0x150/0x217 [ 973.728986] [<ffffffff8146f6fe>] ? sch_direct_xmit+0x8d/0x221 [ 973.728986] [<ffffffff815752f4>] _raw_spin_lock+0x2d/0x3c [ 973.728986] [<ffffffff8146f6fe>] ? sch_direct_xmit+0x8d/0x221 [ 973.728986] [<ffffffff8146f6fe>] sch_direct_xmit+0x8d/0x221 [ 973.728986] [<ffffffff814401e4>] __dev_queue_xmit+0x62a/0x912 [ 973.728986] [<ffffffff814404d7>] dev_queue_xmit+0xb/0xd [ 973.728986] [<ffffffff81449978>] neigh_direct_output+0xc/0xe [ 973.728986] [<ffffffff81487811>] ip_finish_output2+0x5db/0x609 [ 973.728986] [<ffffffff81486853>] ? dst_mtu+0x29/0x2e [ 973.728986] [<ffffffff81489590>] ip_finish_output+0x152/0x15e [ 973.728986] [<ffffffff8148a0bc>] ? ip_output+0x74/0x96 [ 973.728986] [<ffffffff8148a0d4>] ip_output+0x8c/0x96 [ 973.728986] [<ffffffff81489652>] ip_local_out+0x41/0x4a [ 973.728986] [<ffffffff81489e7d>] ip_queue_xmit+0x5a5/0x609 [ 973.728986] [<ffffffff814c559e>] ? udp_set_csum+0x207/0x21e [ 973.728986] [<ffffffffa0202fe4>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 973.728986] [<ffffffffa01b2466>] pppol2tp_xmit+0x1f2/0x25e [l2tp_ppp] [ 973.728986] [<ffffffffa0184f59>] ppp_channel_push+0xb5/0x14a [ppp_generic] [ 973.728986] [<ffffffffa01853ed>] ppp_write+0x104/0x11c [ppp_generic] [ 973.728986] [<ffffffff811b2ec6>] __vfs_write+0x56/0x120 [ 973.728986] [<ffffffff8124c11d>] ? fsnotify_perm+0x27/0x95 [ 973.728986] [<ffffffff8124d41d>] ? security_file_permission+0x4d/0x54 [ 973.728986] [<ffffffff811b3f4c>] vfs_write+0xbd/0x11b [ 973.728986] [<ffffffff811b4cb2>] SyS_write+0x5e/0x96 [ 973.728986] [<ffffffff81575ba5>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 973.728986] [<ffffffff810ae0fa>] ? trace_hardirqs_off_caller+0x121/0x12f Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:51 +07:00
dev->features |= NETIF_F_LLTX;
dev->hard_header_len = PPP_HDRLEN;
dev->mtu = PPP_MRU;
dev->addr_len = 0;
dev->tx_queue_len = 3;
dev->type = ARPHRD_PPP;
dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
ppp: fix race in ppp device destruction ppp_release() tries to ensure that netdevices are unregistered before decrementing the unit refcount and running ppp_destroy_interface(). This is all fine as long as the the device is unregistered by ppp_release(): the unregister_netdevice() call, followed by rtnl_unlock(), guarantee that the unregistration process completes before rtnl_unlock() returns. However, the device may be unregistered by other means (like ppp_nl_dellink()). If this happens right before ppp_release() calling rtnl_lock(), then ppp_release() has to wait for the concurrent unregistration code to release the lock. But rtnl_unlock() releases the lock before completing the device unregistration process. This allows ppp_release() to proceed and eventually call ppp_destroy_interface() before the unregistration process completes. Calling free_netdev() on this partially unregistered device will BUG(): ------------[ cut here ]------------ kernel BUG at net/core/dev.c:8141! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 Call Trace: ppp_destroy_interface+0xd8/0xe0 [ppp_generic] ppp_disconnect_channel+0xda/0x110 [ppp_generic] ppp_unregister_channel+0x5e/0x110 [ppp_generic] pppox_unbind_sock+0x23/0x30 [pppox] pppoe_connect+0x130/0x440 [pppoe] SYSC_connect+0x98/0x110 ? do_fcntl+0x2c0/0x5d0 SyS_connect+0xe/0x10 entry_SYSCALL_64_fastpath+0x1a/0xa5 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88 ---[ end trace ed294ff0cc40eeff ]--- We could set the ->needs_free_netdev flag on PPP devices and move the ppp_destroy_interface() logic in the ->priv_destructor() callback. But that'd be quite intrusive as we'd first need to unlink from the other channels and units that depend on the device (the ones that used the PPPIOCCONNECT and PPPIOCATTACH ioctls). Instead, we can just let the netdevice hold a reference on its ppp_file. This reference is dropped in ->priv_destructor(), at the very end of the unregistration process, so that neither ppp_release() nor ppp_disconnect_channel() can call ppp_destroy_interface() in the interim. Reported-by: Beniamino Galvani <bgalvani@redhat.com> Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 22:05:49 +07:00
dev->priv_destructor = ppp_dev_priv_destructor;
netif_keep_dst(dev);
}
/*
* Transmit-side routines.
*/
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
/* Called to do any work queued up on the transmit side that can now be done */
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
static void __ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
{
ppp_xmit_lock(ppp);
if (!ppp->closing) {
ppp_push(ppp);
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
if (skb)
skb_queue_tail(&ppp->file.xq, skb);
while (!ppp->xmit_pending &&
(skb = skb_dequeue(&ppp->file.xq)))
ppp_send_frame(ppp, skb);
/* If there's no work left to do, tell the core net
code that we can accept some more. */
if (!ppp->xmit_pending && !skb_peek(&ppp->file.xq))
netif_wake_queue(ppp->dev);
else
netif_stop_queue(ppp->dev);
}
ppp_xmit_unlock(ppp);
}
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
static void ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
{
local_bh_disable();
if (unlikely(*this_cpu_ptr(ppp->xmit_recursion)))
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
goto err;
(*this_cpu_ptr(ppp->xmit_recursion))++;
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
__ppp_xmit_process(ppp, skb);
(*this_cpu_ptr(ppp->xmit_recursion))--;
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
local_bh_enable();
return;
err:
local_bh_enable();
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
kfree_skb(skb);
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
if (net_ratelimit())
netdev_err(ppp->dev, "recursion detected\n");
}
static inline struct sk_buff *
pad_compress_skb(struct ppp *ppp, struct sk_buff *skb)
{
struct sk_buff *new_skb;
int len;
int new_skb_size = ppp->dev->mtu +
ppp->xcomp->comp_extra + ppp->dev->hard_header_len;
int compressor_skb_size = ppp->dev->mtu +
ppp->xcomp->comp_extra + PPP_HDRLEN;
new_skb = alloc_skb(new_skb_size, GFP_ATOMIC);
if (!new_skb) {
if (net_ratelimit())
netdev_err(ppp->dev, "PPP: no memory (comp pkt)\n");
return NULL;
}
if (ppp->dev->hard_header_len > PPP_HDRLEN)
skb_reserve(new_skb,
ppp->dev->hard_header_len - PPP_HDRLEN);
/* compressor still expects A/C bytes in hdr */
len = ppp->xcomp->compress(ppp->xc_state, skb->data - 2,
new_skb->data, skb->len + 2,
compressor_skb_size);
if (len > 0 && (ppp->flags & SC_CCP_UP)) {
consume_skb(skb);
skb = new_skb;
skb_put(skb, len);
skb_pull(skb, 2); /* pull off A/C bytes */
} else if (len == 0) {
/* didn't compress, or CCP not up yet */
consume_skb(new_skb);
new_skb = skb;
} else {
/*
* (len < 0)
* MPPE requires that we do not send unencrypted
* frames. The compressor will return -1 if we
* should drop the frame. We cannot simply test
* the compress_proto because MPPE and MPPC share
* the same number.
*/
if (net_ratelimit())
netdev_err(ppp->dev, "ppp: compressor dropped pkt\n");
kfree_skb(skb);
consume_skb(new_skb);
new_skb = NULL;
}
return new_skb;
}
/*
* Compress and send a frame.
* The caller should have locked the xmit path,
* and xmit_pending should be 0.
*/
static void
ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
{
int proto = PPP_PROTO(skb);
struct sk_buff *new_skb;
int len;
unsigned char *cp;
if (proto < 0x8000) {
#ifdef CONFIG_PPP_FILTER
/* check if we should pass this packet */
/* the filter instructions are constructed assuming
a four-byte PPP header on each packet */
*(u8 *)skb_push(skb, 2) = 1;
if (ppp->pass_filter &&
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
BPF_PROG_RUN(ppp->pass_filter, skb) == 0) {
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: outbound frame "
"not passed\n");
kfree_skb(skb);
return;
}
/* if this packet passes the active filter, record the time */
if (!(ppp->active_filter &&
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
BPF_PROG_RUN(ppp->active_filter, skb) == 0))
ppp->last_xmit = jiffies;
skb_pull(skb, 2);
#else
/* for data packets, record the time */
ppp->last_xmit = jiffies;
#endif /* CONFIG_PPP_FILTER */
}
++ppp->stats64.tx_packets;
ppp->stats64.tx_bytes += skb->len - 2;
switch (proto) {
case PPP_IP:
if (!ppp->vj || (ppp->flags & SC_COMP_TCP) == 0)
break;
/* try to do VJ TCP header compression */
new_skb = alloc_skb(skb->len + ppp->dev->hard_header_len - 2,
GFP_ATOMIC);
if (!new_skb) {
netdev_err(ppp->dev, "PPP: no memory (VJ comp pkt)\n");
goto drop;
}
skb_reserve(new_skb, ppp->dev->hard_header_len - 2);
cp = skb->data + 2;
len = slhc_compress(ppp->vj, cp, skb->len - 2,
new_skb->data + 2, &cp,
!(ppp->flags & SC_NO_TCP_CCID));
if (cp == skb->data + 2) {
/* didn't compress */
consume_skb(new_skb);
} else {
if (cp[0] & SL_TYPE_COMPRESSED_TCP) {
proto = PPP_VJC_COMP;
cp[0] &= ~SL_TYPE_COMPRESSED_TCP;
} else {
proto = PPP_VJC_UNCOMP;
cp[0] = skb->data[2];
}
consume_skb(skb);
skb = new_skb;
cp = skb_put(skb, len + 2);
cp[0] = 0;
cp[1] = proto;
}
break;
case PPP_CCP:
/* peek at outbound CCP frames */
ppp_ccp_peek(ppp, skb, 0);
break;
}
/* try to do packet compression */
if ((ppp->xstate & SC_COMP_RUN) && ppp->xc_state &&
proto != PPP_LCP && proto != PPP_CCP) {
if (!(ppp->flags & SC_CCP_UP) && (ppp->flags & SC_MUST_COMP)) {
if (net_ratelimit())
netdev_err(ppp->dev,
"ppp: compression required but "
"down - pkt dropped.\n");
goto drop;
}
skb = pad_compress_skb(ppp, skb);
if (!skb)
goto drop;
}
/*
* If we are waiting for traffic (demand dialling),
* queue it up for pppd to receive.
*/
if (ppp->flags & SC_LOOP_TRAFFIC) {
if (ppp->file.rq.qlen > PPP_MAX_RQLEN)
goto drop;
skb_queue_tail(&ppp->file.rq, skb);
wake_up_interruptible(&ppp->file.rwait);
return;
}
ppp->xmit_pending = skb;
ppp_push(ppp);
return;
drop:
kfree_skb(skb);
++ppp->dev->stats.tx_errors;
}
/*
* Try to send the frame in xmit_pending.
* The caller should have the xmit path locked.
*/
static void
ppp_push(struct ppp *ppp)
{
struct list_head *list;
struct channel *pch;
struct sk_buff *skb = ppp->xmit_pending;
if (!skb)
return;
list = &ppp->channels;
if (list_empty(list)) {
/* nowhere to send the packet, just drop it */
ppp->xmit_pending = NULL;
kfree_skb(skb);
return;
}
if ((ppp->flags & SC_MULTILINK) == 0) {
/* not doing multilink: send it down the first channel */
list = list->next;
pch = list_entry(list, struct channel, clist);
spin_lock(&pch->downl);
if (pch->chan) {
if (pch->chan->ops->start_xmit(pch->chan, skb))
ppp->xmit_pending = NULL;
} else {
/* channel got unregistered */
kfree_skb(skb);
ppp->xmit_pending = NULL;
}
spin_unlock(&pch->downl);
return;
}
#ifdef CONFIG_PPP_MULTILINK
/* Multilink: fragment the packet over as many links
as can take the packet at the moment. */
if (!ppp_mp_explode(ppp, skb))
return;
#endif /* CONFIG_PPP_MULTILINK */
ppp->xmit_pending = NULL;
kfree_skb(skb);
}
#ifdef CONFIG_PPP_MULTILINK
static bool mp_protocol_compress __read_mostly = true;
module_param(mp_protocol_compress, bool, 0644);
MODULE_PARM_DESC(mp_protocol_compress,
"compress protocol id in multilink fragments");
/*
* Divide a packet to be transmitted into fragments and
* send them out the individual links.
*/
static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb)
{
int len, totlen;
int i, bits, hdrlen, mtu;
int flen;
int navail, nfree, nzero;
int nbigger;
int totspeed;
int totfree;
unsigned char *p, *q;
struct list_head *list;
struct channel *pch;
struct sk_buff *frag;
struct ppp_channel *chan;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
totspeed = 0; /*total bitrate of the bundle*/
nfree = 0; /* # channels which have no packet already queued */
navail = 0; /* total # of usable channels (not deregistered) */
nzero = 0; /* number of channels with zero speed associated*/
totfree = 0; /*total # of channels available and
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
*having no queued packets before
*starting the fragmentation*/
hdrlen = (ppp->flags & SC_MP_XSHORTSEQ)? MPHDRLEN_SSN: MPHDRLEN;
i = 0;
list_for_each_entry(pch, &ppp->channels, clist) {
if (pch->chan) {
pch->avail = 1;
navail++;
pch->speed = pch->chan->speed;
} else {
pch->avail = 0;
}
if (pch->avail) {
if (skb_queue_empty(&pch->file.xq) ||
!pch->had_frag) {
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
if (pch->speed == 0)
nzero++;
else
totspeed += pch->speed;
pch->avail = 2;
++nfree;
++totfree;
}
if (!pch->had_frag && i < ppp->nxchan)
ppp->nxchan = i;
}
++i;
}
/*
* Don't start sending this packet unless at least half of
* the channels are free. This gives much better TCP
* performance if we have a lot of channels.
*/
if (nfree == 0 || nfree < navail / 2)
return 0; /* can't take now, leave it in xmit_pending */
/* Do protocol field compression */
p = skb->data;
len = skb->len;
if (*p == 0 && mp_protocol_compress) {
++p;
--len;
}
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
totlen = len;
nbigger = len % nfree;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
/* skip to the channel after the one we last used
and start at that one */
list = &ppp->channels;
for (i = 0; i < ppp->nxchan; ++i) {
list = list->next;
if (list == &ppp->channels) {
i = 0;
break;
}
}
/* create a fragment for each channel */
bits = B;
while (len > 0) {
list = list->next;
if (list == &ppp->channels) {
i = 0;
continue;
}
pch = list_entry(list, struct channel, clist);
++i;
if (!pch->avail)
continue;
/*
* Skip this channel if it has a fragment pending already and
* we haven't given a fragment to all of the free channels.
*/
if (pch->avail == 1) {
if (nfree > 0)
continue;
} else {
pch->avail = 1;
}
/* check the channel's mtu and whether it is still attached. */
spin_lock(&pch->downl);
if (pch->chan == NULL) {
/* can't use this channel, it's being deregistered */
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
if (pch->speed == 0)
nzero--;
else
totspeed -= pch->speed;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
spin_unlock(&pch->downl);
pch->avail = 0;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
totlen = len;
totfree--;
nfree--;
if (--navail == 0)
break;
continue;
}
/*
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
*if the channel speed is not set divide
*the packet evenly among the free channels;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
*otherwise divide it according to the speed
*of the channel we are going to transmit on
*/
flen = len;
if (nfree > 0) {
if (pch->speed == 0) {
flen = len/nfree;
if (nbigger > 0) {
flen++;
nbigger--;
}
} else {
flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
((totspeed*totfree)/pch->speed)) - hdrlen;
if (nbigger > 0) {
flen += ((totfree - nzero)*pch->speed)/totspeed;
nbigger -= ((totfree - nzero)*pch->speed)/
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
totspeed;
}
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
}
nfree--;
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
}
/*
*check if we are on the last channel or
*we exceded the length of the data to
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
*fragment
*/
if ((nfree <= 0) || (flen > len))
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
flen = len;
/*
*it is not worth to tx on slow channels:
*in that case from the resulting flen according to the
*above formula will be equal or less than zero.
*Skip the channel in this case
*/
if (flen <= 0) {
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
pch->avail = 2;
spin_unlock(&pch->downl);
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
continue;
}
/*
* hdrlen includes the 2-byte PPP protocol field, but the
* MTU counts only the payload excluding the protocol field.
* (RFC1661 Section 2)
*/
mtu = pch->chan->mtu - (hdrlen - 2);
if (mtu < 4)
mtu = 4;
if (flen > mtu)
flen = mtu;
if (flen == len)
bits |= E;
frag = alloc_skb(flen + hdrlen + (flen == 0), GFP_ATOMIC);
if (!frag)
goto noskb;
q = skb_put(frag, flen + hdrlen);
/* make the MP header */
put_unaligned_be16(PPP_MP, q);
if (ppp->flags & SC_MP_XSHORTSEQ) {
q[2] = bits + ((ppp->nxseq >> 8) & 0xf);
q[3] = ppp->nxseq;
} else {
q[2] = bits;
q[3] = ppp->nxseq >> 16;
q[4] = ppp->nxseq >> 8;
q[5] = ppp->nxseq;
}
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
memcpy(q + hdrlen, p, flen);
/* try to send it down the channel */
chan = pch->chan;
if (!skb_queue_empty(&pch->file.xq) ||
ppp: ppp_mp_explode() redesign I found the PPP subsystem to not work properly when connecting channels with different speeds to the same bundle. Problem Description: As the "ppp_mp_explode" function fragments the sk_buff buffer evenly among the PPP channels that are connected to a certain PPP unit to make up a bundle, if we are transmitting using an upper layer protocol that requires an Ack before sending the next packet (like TCP/IP for example), we will have a bandwidth bottleneck on the slowest channel of the bundle. Let's clarify by an example. Let's consider a scenario where we have two PPP links making up a bundle: a slow link (10KB/sec) and a fast link (1000KB/sec) working at the best (full bandwidth). On the top we have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the PPP subsystem. The "ppp_mp_explode" function will divide the buffer in two fragments of 500B each (we are neglecting all the headers, crc, flags etc?.). Before the TCP/IP stack sends out the next buffer, it will have to wait for the ACK response from the remote peer, so it will have to wait for both fragments to have been sent over the two PPP links, received by the remote peer and reconstructed. The resulting behaviour is that, rather than having a bundle working @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle working @20KB/sec (the double of the slowest channels bandwidth). Problem Solution: The problem has been solved by redesigning the "ppp_mp_explode" function in such a way to make it split the sk_buff buffer according to the speeds of the underlying PPP channels (the speeds of the serial interfaces respectively attached to the PPP channels). Referring to the above example, the redesigned "ppp_mp_explode" function will now divide the 1000 Bytes buffer into two fragments whose sizes are set according to the speeds of the channels where they are going to be sent on (e.g . 10 Byets on 10KB/sec channel and 990 Bytes on 1000KB/sec channel). The reworked function grants the same performances of the original one in optimal working conditions (i.e. a bundle made up of PPP links all working at the same speed), while greatly improving performances on the bundles made up of channels working at different speeds. Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-14 06:09:12 +07:00
!chan->ops->start_xmit(chan, frag))
skb_queue_tail(&pch->file.xq, frag);
pch->had_frag = 1;
p += flen;
len -= flen;
++ppp->nxseq;
bits = 0;
spin_unlock(&pch->downl);
}
ppp->nxchan = i;
return 1;
noskb:
spin_unlock(&pch->downl);
if (ppp->debug & 1)
netdev_err(ppp->dev, "PPP: no memory (fragment)\n");
++ppp->dev->stats.tx_errors;
++ppp->nxseq;
return 1; /* abandon the frame */
}
#endif /* CONFIG_PPP_MULTILINK */
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
/* Try to send data out on a channel */
static void __ppp_channel_push(struct channel *pch)
{
struct sk_buff *skb;
struct ppp *ppp;
spin_lock(&pch->downl);
if (pch->chan) {
while (!skb_queue_empty(&pch->file.xq)) {
skb = skb_dequeue(&pch->file.xq);
if (!pch->chan->ops->start_xmit(pch->chan, skb)) {
/* put the packet back and try again later */
skb_queue_head(&pch->file.xq, skb);
break;
}
}
} else {
/* channel got deregistered */
skb_queue_purge(&pch->file.xq);
}
spin_unlock(&pch->downl);
/* see if there is anything from the attached unit to be sent */
if (skb_queue_empty(&pch->file.xq)) {
ppp = pch->ppp;
if (ppp)
ppp: avoid loop in xmit recursion detection code We already detect situations where a PPP channel sends packets back to its upper PPP device. While this is enough to avoid deadlocking on xmit locks, this doesn't prevent packets from looping between the channel and the unit. The problem is that ppp_start_xmit() enqueues packets in ppp->file.xq before checking for xmit recursion. Therefore, __ppp_xmit_process() might dequeue a packet from ppp->file.xq and send it on the channel which, in turn, loops it back on the unit. Then ppp_start_xmit() queues the packet back to ppp->file.xq and __ppp_xmit_process() picks it up and sends it again through the channel. Therefore, the packet will loop between __ppp_xmit_process() and ppp_start_xmit() until some other part of the xmit path drops it. For L2TP, we rapidly fill the skb's headroom and pppol2tp_xmit() drops the packet after a few iterations. But PPTP reallocates the headroom if necessary, letting the loop run and exhaust the machine resources (as reported in https://bugzilla.kernel.org/show_bug.cgi?id=199109). Fix this by letting __ppp_xmit_process() enqueue the skb to ppp->file.xq, so that we can check for recursion before adding it to the queue. Now ppp_xmit_process() can drop the packet when recursion is detected. __ppp_channel_push() is a bit special. It calls __ppp_xmit_process() without having any actual packet to send. This is used by ppp_output_wakeup() to re-enable transmission on the parent unit (for implementations like ppp_async.c, where the .start_xmit() function might not consume the skb, leaving it in ppp->xmit_pending and disabling transmission). Therefore, __ppp_xmit_process() needs to handle the case where skb is NULL, dequeuing as many packets as possible from ppp->file.xq. Reported-by: xu heng <xuheng333@zoho.com> Fixes: 55454a565836 ("ppp: avoid dealock on recursive xmit") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-20 22:49:26 +07:00
__ppp_xmit_process(ppp, NULL);
}
}
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
static void ppp_channel_push(struct channel *pch)
{
ppp: fix xmit recursion detection on ppp channels Commit e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices") dropped the xmit_recursion counter incrementation in ppp_channel_push() and relied on ppp_xmit_process() for this task. But __ppp_channel_push() can also send packets directly (using the .start_xmit() channel callback), in which case the xmit_recursion counter isn't incremented anymore. If such packets get routed back to the parent ppp unit, ppp_xmit_process() won't notice the recursion and will call ppp_channel_push() on the same channel, effectively creating the deadlock situation that the xmit_recursion mechanism was supposed to prevent. This patch re-introduces the xmit_recursion counter incrementation in ppp_channel_push(). Since the xmit_recursion variable is now part of the parent ppp unit, incrementation is skipped if the channel doesn't have any. This is fine because only packets routed through the parent unit may enter the channel recursively. Finally, we have to ensure that pch->ppp is not going to be modified while executing ppp_channel_push(). Instead of taking this lock only while calling ppp_xmit_process(), we now have to hold it for the full ppp_channel_push() execution. This respects the ppp locks ordering which requires locking ->upl before ->downl. Fixes: e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp devices") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 16:43:24 +07:00
read_lock_bh(&pch->upl);
if (pch->ppp) {
(*this_cpu_ptr(pch->ppp->xmit_recursion))++;
__ppp_channel_push(pch);
(*this_cpu_ptr(pch->ppp->xmit_recursion))--;
} else {
__ppp_channel_push(pch);
}
read_unlock_bh(&pch->upl);
ppp: avoid dealock on recursive xmit In case of misconfiguration, a virtual PPP channel might send packets back to their parent PPP interface. This typically happens in misconfigured L2TP setups, where PPP's peer IP address is set with the IP of the L2TP peer. When that happens the system hangs due to PPP trying to recursively lock its xmit path. [ 243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926 [ 243.333272] lock: 0xffff880033d90f18, .magic: dead4ead, .owner: accel-pppd/926, .owner_cpu: 1 [ 243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1 [ 243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 243.336018] ffff7fffffffffff ffff8800319a77a0 ffffffff8128de85 ffff880033d90f18 [ 243.336018] ffff880033ad8000 ffff8800319a77d8 ffffffff810ad7c0 ffffffff0000039e [ 243.336018] ffff880033d90f18 ffff880033d90f60 ffff880033d90f18 ffff880033d90f28 [ 243.336018] Call Trace: [ 243.336018] [<ffffffff8128de85>] dump_stack+0x4f/0x65 [ 243.336018] [<ffffffff810ad7c0>] spin_dump+0xe1/0xeb [ 243.336018] [<ffffffff810ad7f0>] spin_bug+0x26/0x28 [ 243.336018] [<ffffffff810ad8b9>] do_raw_spin_lock+0x5c/0x160 [ 243.336018] [<ffffffff815522aa>] _raw_spin_lock_bh+0x35/0x3c [ 243.336018] [<ffffffffa01a88e2>] ? ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffffa01a88e2>] ppp_push+0xa7/0x82d [ppp_generic] [ 243.336018] [<ffffffff810adada>] ? do_raw_spin_unlock+0xc2/0xcc [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81552438>] ? _raw_spin_unlock_irqrestore+0x34/0x49 [ 243.336018] [<ffffffffa01ac657>] ppp_xmit_process+0x48/0x877 [ppp_generic] [ 243.336018] [<ffffffff81084962>] ? preempt_count_sub+0x13/0xc7 [ 243.336018] [<ffffffff81408cd3>] ? skb_queue_tail+0x71/0x7c [ 243.336018] [<ffffffffa01ad1c5>] ppp_start_xmit+0x21b/0x22a [ppp_generic] [ 243.336018] [<ffffffff81426af1>] dev_hard_start_xmit+0x15e/0x32c [ 243.336018] [<ffffffff81454ed7>] sch_direct_xmit+0xd6/0x221 [ 243.336018] [<ffffffff814273a8>] __dev_queue_xmit+0x52a/0x820 [ 243.336018] [<ffffffff814276a9>] dev_queue_xmit+0xb/0xd [ 243.336018] [<ffffffff81430a3c>] neigh_direct_output+0xc/0xe [ 243.336018] [<ffffffff8146b5d7>] ip_finish_output2+0x4d2/0x548 [ 243.336018] [<ffffffff8146a8e6>] ? dst_mtu+0x29/0x2e [ 243.336018] [<ffffffff8146d49c>] ip_finish_output+0x152/0x15e [ 243.336018] [<ffffffff8146df84>] ? ip_output+0x74/0x96 [ 243.336018] [<ffffffff8146df9c>] ip_output+0x8c/0x96 [ 243.336018] [<ffffffff8146d55e>] ip_local_out+0x41/0x4a [ 243.336018] [<ffffffff8146dd15>] ip_queue_xmit+0x531/0x5c5 [ 243.336018] [<ffffffff814a82cd>] ? udp_set_csum+0x207/0x21e [ 243.336018] [<ffffffffa01f2f04>] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core] [ 243.336018] [<ffffffffa01ea458>] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp] [ 243.336018] [<ffffffffa01acf17>] ppp_channel_push+0x91/0x102 [ppp_generic] [ 243.336018] [<ffffffffa01ad2d8>] ppp_write+0x104/0x11c [ppp_generic] [ 243.336018] [<ffffffff811a3c1e>] __vfs_write+0x56/0x120 [ 243.336018] [<ffffffff81239801>] ? fsnotify_perm+0x27/0x95 [ 243.336018] [<ffffffff8123ab01>] ? security_file_permission+0x4d/0x54 [ 243.336018] [<ffffffff811a4ca4>] vfs_write+0xbd/0x11b [ 243.336018] [<ffffffff811a5a0a>] SyS_write+0x5e/0x96 [ 243.336018] [<ffffffff81552a1b>] entry_SYSCALL_64_fastpath+0x13/0x94 The main entry points for sending packets over a PPP unit are the .write() and .ndo_start_xmit() callbacks (simplified view): .write(unit fd) or .ndo_start_xmit() \ CALL ppp_xmit_process() \ LOCK unit's xmit path (ppp->wlock) | CALL ppp_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might recursively call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_push() | UNLOCK unit's xmit path / RETURN from ppp_xmit_process() Packets can also be directly sent on channels (e.g. LCP packets): .write(channel fd) or ppp_output_wakeup() \ CALL ppp_channel_push() \ LOCK channel's xmit path (chan->downl) | CALL lower layer's .start_xmit() callback \ ... might call .ndo_start_xmit() ... / RETURN from .start_xmit() | UNLOCK channel's xmit path / RETURN from ppp_channel_push() Key points about the lower layer's .start_xmit() callback: * It can be called directly by a channel fd .write() or by ppp_output_wakeup() or indirectly by a unit fd .write() or by .ndo_start_xmit(). * In any case, it's always called with chan->downl held. * It might route the packet back to its parent unit using .ndo_start_xmit() as entry point. This patch detects and breaks recursion in ppp_xmit_process(). This function is a good candidate for the task because it's called early enough after .ndo_start_xmit(), it's always part of the recursion loop and it's on the path of whatever entry point is used to send a packet on a PPP unit. Recursion detection is done using the per-cpu ppp_xmit_recursion variable. Since ppp_channel_push() too locks the channel's xmit path and calls the lower layer's .start_xmit() callback, we need to also increment ppp_xmit_recursion there. However there's no need to check for recursion, as it's out of the recursion loop. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 03:22:32 +07:00
}
/*
* Receive-side routines.
*/
struct ppp_mp_skb_parm {
u32 sequence;
u8 BEbits;
};
#define PPP_MP_CB(skb) ((struct ppp_mp_skb_parm *)((skb)->cb))
static inline void
ppp_do_recv(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
{
ppp_recv_lock(ppp);
if (!ppp->closing)
ppp_receive_frame(ppp, skb, pch);
else
kfree_skb(skb);
ppp_recv_unlock(ppp);
}
/**
* __ppp_decompress_proto - Decompress protocol field, slim version.
* @skb: Socket buffer where protocol field should be decompressed. It must have
* at least 1 byte of head room and 1 byte of linear data. First byte of
* data must be a protocol field byte.
*
* Decompress protocol field in PPP header if it's compressed, e.g. when
* Protocol-Field-Compression (PFC) was negotiated. No checks w.r.t. skb data
* length are done in this function.
*/
static void __ppp_decompress_proto(struct sk_buff *skb)
{
if (skb->data[0] & 0x01)
*(u8 *)skb_push(skb, 1) = 0x00;
}
/**
* ppp_decompress_proto - Check skb data room and decompress protocol field.
* @skb: Socket buffer where protocol field should be decompressed. First byte
* of data must be a protocol field byte.
*
* Decompress protocol field in PPP header if it's compressed, e.g. when
* Protocol-Field-Compression (PFC) was negotiated. This function also makes
* sure that skb data room is sufficient for Protocol field, before and after
* decompression.
*
* Return: true - decompressed successfully, false - not enough room in skb.
*/
static bool ppp_decompress_proto(struct sk_buff *skb)
{
/* At least one byte should be present (if protocol is compressed) */
if (!pskb_may_pull(skb, 1))
return false;
__ppp_decompress_proto(skb);
/* Protocol field should occupy 2 bytes when not compressed */
return pskb_may_pull(skb, 2);
}
void
ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
{
struct channel *pch = chan->ppp;
int proto;
if (!pch) {
kfree_skb(skb);
return;
}
read_lock_bh(&pch->upl);
if (!ppp_decompress_proto(skb)) {
kfree_skb(skb);
if (pch->ppp) {
++pch->ppp->dev->stats.rx_length_errors;
ppp_receive_error(pch->ppp);
}
goto done;
}
proto = PPP_PROTO(skb);
if (!pch->ppp || proto >= 0xc000 || proto == PPP_CCPFRAG) {
/* put it on the channel queue */
skb_queue_tail(&pch->file.rq, skb);
/* drop old frames if queue too long */
while (pch->file.rq.qlen > PPP_MAX_RQLEN &&
(skb = skb_dequeue(&pch->file.rq)))
kfree_skb(skb);
wake_up_interruptible(&pch->file.rwait);
} else {
ppp_do_recv(pch->ppp, skb, pch);
}
done:
read_unlock_bh(&pch->upl);
}
/* Put a 0-length skb in the receive queue as an error indication */
void
ppp_input_error(struct ppp_channel *chan, int code)
{
struct channel *pch = chan->ppp;
struct sk_buff *skb;
if (!pch)
return;
read_lock_bh(&pch->upl);
if (pch->ppp) {
skb = alloc_skb(0, GFP_ATOMIC);
if (skb) {
skb->len = 0; /* probably unnecessary */
skb->cb[0] = code;
ppp_do_recv(pch->ppp, skb, pch);
}
}
read_unlock_bh(&pch->upl);
}
/*
* We come in here to process a received frame.
* The receive side of the ppp unit is locked.
*/
static void
ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
{
/* note: a 0-length skb is used as an error indication */
if (skb->len > 0) {
skb_checksum_complete_unset(skb);
#ifdef CONFIG_PPP_MULTILINK
/* XXX do channel-level decompression here */
if (PPP_PROTO(skb) == PPP_MP)
ppp_receive_mp_frame(ppp, skb, pch);
else
#endif /* CONFIG_PPP_MULTILINK */
ppp_receive_nonmp_frame(ppp, skb);
} else {
kfree_skb(skb);
ppp_receive_error(ppp);
}
}
static void
ppp_receive_error(struct ppp *ppp)
{
++ppp->dev->stats.rx_errors;
if (ppp->vj)
slhc_toss(ppp->vj);
}
static void
ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
{
struct sk_buff *ns;
int proto, len, npi;
/*
* Decompress the frame, if compressed.
* Note that some decompressors need to see uncompressed frames
* that come in as well as compressed frames.
*/
if (ppp->rc_state && (ppp->rstate & SC_DECOMP_RUN) &&
(ppp->rstate & (SC_DC_FERROR | SC_DC_ERROR)) == 0)
skb = ppp_decompress_frame(ppp, skb);
if (ppp->flags & SC_MUST_COMP && ppp->rstate & SC_DC_FERROR)
goto err;
/* At this point the "Protocol" field MUST be decompressed, either in
* ppp_input(), ppp_decompress_frame() or in ppp_receive_mp_frame().
*/
proto = PPP_PROTO(skb);
switch (proto) {
case PPP_VJC_COMP:
/* decompress VJ compressed packets */
if (!ppp->vj || (ppp->flags & SC_REJ_COMP_TCP))
goto err;
if (skb_tailroom(skb) < 124 || skb_cloned(skb)) {
/* copy to a new sk_buff with more tailroom */
ns = dev_alloc_skb(skb->len + 128);
if (!ns) {
netdev_err(ppp->dev, "PPP: no memory "
"(VJ decomp)\n");
goto err;
}
skb_reserve(ns, 2);
skb_copy_bits(skb, 0, skb_put(ns, skb->len), skb->len);
consume_skb(skb);
skb = ns;
}
else
skb->ip_summed = CHECKSUM_NONE;
len = slhc_uncompress(ppp->vj, skb->data + 2, skb->len - 2);
if (len <= 0) {
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: VJ decompression error\n");
goto err;
}
len += 2;
if (len > skb->len)
skb_put(skb, len - skb->len);
else if (len < skb->len)
skb_trim(skb, len);
proto = PPP_IP;
break;
case PPP_VJC_UNCOMP:
if (!ppp->vj || (ppp->flags & SC_REJ_COMP_TCP))
goto err;
/* Until we fix the decompressor need to make sure
* data portion is linear.
*/
if (!pskb_may_pull(skb, skb->len))
goto err;
if (slhc_remember(ppp->vj, skb->data + 2, skb->len - 2) <= 0) {
netdev_err(ppp->dev, "PPP: VJ uncompressed error\n");
goto err;
}
proto = PPP_IP;
break;
case PPP_CCP:
ppp_ccp_peek(ppp, skb, 1);
break;
}
++ppp->stats64.rx_packets;
ppp->stats64.rx_bytes += skb->len - 2;
npi = proto_to_npindex(proto);
if (npi < 0) {
/* control or unknown frame - pass it to pppd */
skb_queue_tail(&ppp->file.rq, skb);
/* limit queue length by dropping old frames */
while (ppp->file.rq.qlen > PPP_MAX_RQLEN &&
(skb = skb_dequeue(&ppp->file.rq)))
kfree_skb(skb);
/* wake up any process polling or blocking on read */
wake_up_interruptible(&ppp->file.rwait);
} else {
/* network protocol frame - give it to the kernel */
#ifdef CONFIG_PPP_FILTER
/* check if the packet passes the pass and active filters */
/* the filter instructions are constructed assuming
a four-byte PPP header on each packet */
if (ppp->pass_filter || ppp->active_filter) {
if (skb_unclone(skb, GFP_ATOMIC))
goto err;
*(u8 *)skb_push(skb, 2) = 0;
if (ppp->pass_filter &&
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
BPF_PROG_RUN(ppp->pass_filter, skb) == 0) {
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: inbound frame "
"not passed\n");
kfree_skb(skb);
return;
}
if (!(ppp->active_filter &&
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
BPF_PROG_RUN(ppp->active_filter, skb) == 0))
ppp->last_recv = jiffies;
__skb_pull(skb, 2);
} else
#endif /* CONFIG_PPP_FILTER */
ppp->last_recv = jiffies;
if ((ppp->dev->flags & IFF_UP) == 0 ||
ppp->npmode[npi] != NPMODE_PASS) {
kfree_skb(skb);
} else {
/* chop off protocol */
skb_pull_rcsum(skb, 2);
skb->dev = ppp->dev;
skb->protocol = htons(npindex_to_ethertype[npi]);
skb_reset_mac_header(skb);
skb_scrub_packet(skb, !net_eq(ppp->ppp_net,
dev_net(ppp->dev)));
netif_rx(skb);
}
}
return;
err:
kfree_skb(skb);
ppp_receive_error(ppp);
}
static struct sk_buff *
ppp_decompress_frame(struct ppp *ppp, struct sk_buff *skb)
{
int proto = PPP_PROTO(skb);
struct sk_buff *ns;
int len;
/* Until we fix all the decompressor's need to make sure
* data portion is linear.
*/
if (!pskb_may_pull(skb, skb->len))
goto err;
if (proto == PPP_COMP) {
int obuff_size;
switch(ppp->rcomp->compress_proto) {
case CI_MPPE:
obuff_size = ppp->mru + PPP_HDRLEN + 1;
break;
default:
obuff_size = ppp->mru + PPP_HDRLEN;
break;
}
ns = dev_alloc_skb(obuff_size);
if (!ns) {
netdev_err(ppp->dev, "ppp_decompress_frame: "
"no memory\n");
goto err;
}
/* the decompressor still expects the A/C bytes in the hdr */
len = ppp->rcomp->decompress(ppp->rc_state, skb->data - 2,
skb->len + 2, ns->data, obuff_size);
if (len < 0) {
/* Pass the compressed frame to pppd as an
error indication. */
if (len == DECOMP_FATALERROR)
ppp->rstate |= SC_DC_FERROR;
kfree_skb(ns);
goto err;
}
consume_skb(skb);
skb = ns;
skb_put(skb, len);
skb_pull(skb, 2); /* pull off the A/C bytes */
/* Don't call __ppp_decompress_proto() here, but instead rely on
* corresponding algo (mppe/bsd/deflate) to decompress it.
*/
} else {
/* Uncompressed frame - pass to decompressor so it
can update its dictionary if necessary. */
if (ppp->rcomp->incomp)
ppp->rcomp->incomp(ppp->rc_state, skb->data - 2,
skb->len + 2);
}
return skb;
err:
ppp->rstate |= SC_DC_ERROR;
ppp_receive_error(ppp);
return skb;
}
#ifdef CONFIG_PPP_MULTILINK
/*
* Receive a multilink frame.
* We put it on the reconstruction queue and then pull off
* as many completed frames as we can.
*/
static void
ppp_receive_mp_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
{
u32 mask, seq;
struct channel *ch;
int mphdrlen = (ppp->flags & SC_MP_SHORTSEQ)? MPHDRLEN_SSN: MPHDRLEN;
if (!pskb_may_pull(skb, mphdrlen + 1) || ppp->mrru == 0)
goto err; /* no good, throw it away */
/* Decode sequence number and begin/end bits */
if (ppp->flags & SC_MP_SHORTSEQ) {
seq = ((skb->data[2] & 0x0f) << 8) | skb->data[3];
mask = 0xfff;
} else {
seq = (skb->data[3] << 16) | (skb->data[4] << 8)| skb->data[5];
mask = 0xffffff;
}
PPP_MP_CB(skb)->BEbits = skb->data[2];
skb_pull(skb, mphdrlen); /* pull off PPP and MP headers */
/*
* Do protocol ID decompression on the first fragment of each packet.
* We have to do that here, because ppp_receive_nonmp_frame() expects
* decompressed protocol field.
*/
if (PPP_MP_CB(skb)->BEbits & B)
__ppp_decompress_proto(skb);
/*
* Expand sequence number to 32 bits, making it as close
* as possible to ppp->minseq.
*/
seq |= ppp->minseq & ~mask;
if ((int)(ppp->minseq - seq) > (int)(mask >> 1))
seq += mask + 1;
else if ((int)(seq - ppp->minseq) > (int)(mask >> 1))
seq -= mask + 1; /* should never happen */
PPP_MP_CB(skb)->sequence = seq;
pch->lastseq = seq;
/*
* If this packet comes before the next one we were expecting,
* drop it.
*/
if (seq_before(seq, ppp->nextseq)) {
kfree_skb(skb);
++ppp->dev->stats.rx_dropped;
ppp_receive_error(ppp);
return;
}
/*
* Reevaluate minseq, the minimum over all channels of the
* last sequence number received on each channel. Because of
* the increasing sequence number rule, we know that any fragment
* before `minseq' which hasn't arrived is never going to arrive.
* The list of channels can't change because we have the receive
* side of the ppp unit locked.
*/
list_for_each_entry(ch, &ppp->channels, clist) {
if (seq_before(ch->lastseq, seq))
seq = ch->lastseq;
}
if (seq_before(ppp->minseq, seq))
ppp->minseq = seq;
/* Put the fragment on the reconstruction queue */
ppp_mp_insert(ppp, skb);
/* If the queue is getting long, don't wait any longer for packets
before the start of the queue. */
if (skb_queue_len(&ppp->mrq) >= PPP_MP_MAX_QLEN) {
struct sk_buff *mskb = skb_peek(&ppp->mrq);
if (seq_before(ppp->minseq, PPP_MP_CB(mskb)->sequence))
ppp->minseq = PPP_MP_CB(mskb)->sequence;
}
/* Pull completed packets off the queue and receive them. */
while ((skb = ppp_mp_reconstruct(ppp))) {
if (pskb_may_pull(skb, 2))
ppp_receive_nonmp_frame(ppp, skb);
else {
++ppp->dev->stats.rx_length_errors;
kfree_skb(skb);
ppp_receive_error(ppp);
}
}
return;
err:
kfree_skb(skb);
ppp_receive_error(ppp);
}
/*
* Insert a fragment on the MP reconstruction queue.
* The queue is ordered by increasing sequence number.
*/
static void
ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb)
{
struct sk_buff *p;
struct sk_buff_head *list = &ppp->mrq;
u32 seq = PPP_MP_CB(skb)->sequence;
/* N.B. we don't need to lock the list lock because we have the
ppp unit receive-side lock. */
skb_queue_walk(list, p) {
if (seq_before(seq, PPP_MP_CB(p)->sequence))
break;
}
__skb_queue_before(list, p, skb);
}
/*
* Reconstruct a packet from the MP fragment queue.
* We go through increasing sequence numbers until we find a
* complete packet, or we get to the sequence number for a fragment
* which hasn't arrived but might still do so.
*/
static struct sk_buff *
ppp_mp_reconstruct(struct ppp *ppp)
{
u32 seq = ppp->nextseq;
u32 minseq = ppp->minseq;
struct sk_buff_head *list = &ppp->mrq;
struct sk_buff *p, *tmp;
struct sk_buff *head, *tail;
struct sk_buff *skb = NULL;
int lost = 0, len = 0;
if (ppp->mrru == 0) /* do nothing until mrru is set */
return NULL;
head = __skb_peek(list);
tail = NULL;
skb_queue_walk_safe(list, p, tmp) {
again:
if (seq_before(PPP_MP_CB(p)->sequence, seq)) {
/* this can't happen, anyway ignore the skb */
netdev_err(ppp->dev, "ppp_mp_reconstruct bad "
"seq %u < %u\n",
PPP_MP_CB(p)->sequence, seq);
__skb_unlink(p, list);
kfree_skb(p);
continue;
}
if (PPP_MP_CB(p)->sequence != seq) {
u32 oldseq;
/* Fragment `seq' is missing. If it is after
minseq, it might arrive later, so stop here. */
if (seq_after(seq, minseq))
break;
/* Fragment `seq' is lost, keep going. */
lost = 1;
oldseq = seq;
seq = seq_before(minseq, PPP_MP_CB(p)->sequence)?
minseq + 1: PPP_MP_CB(p)->sequence;
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"lost frag %u..%u\n",
oldseq, seq-1);
goto again;
}
/*
* At this point we know that all the fragments from
* ppp->nextseq to seq are either present or lost.
* Also, there are no complete packets in the queue
* that have no missing fragments and end before this
* fragment.
*/
/* B bit set indicates this fragment starts a packet */
if (PPP_MP_CB(p)->BEbits & B) {
head = p;
lost = 0;
len = 0;
}
len += p->len;
/* Got a complete packet yet? */
if (lost == 0 && (PPP_MP_CB(p)->BEbits & E) &&
(PPP_MP_CB(head)->BEbits & B)) {
if (len > ppp->mrru + 2) {
++ppp->dev->stats.rx_length_errors;
netdev_printk(KERN_DEBUG, ppp->dev,
"PPP: reconstructed packet"
" is too long (%d)\n", len);
} else {
tail = p;
break;
}
ppp->nextseq = seq + 1;
}
/*
* If this is the ending fragment of a packet,
* and we haven't found a complete valid packet yet,
* we can discard up to and including this fragment.
*/
if (PPP_MP_CB(p)->BEbits & E) {
struct sk_buff *tmp2;
skb_queue_reverse_walk_from_safe(list, p, tmp2) {
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"discarding frag %u\n",
PPP_MP_CB(p)->sequence);
__skb_unlink(p, list);
kfree_skb(p);
}
head = skb_peek(list);
if (!head)
break;
}
++seq;
}
/* If we have a complete packet, copy it all into one skb. */
if (tail != NULL) {
/* If we have discarded any fragments,
signal a receive error. */
if (PPP_MP_CB(head)->sequence != ppp->nextseq) {
skb_queue_walk_safe(list, p, tmp) {
if (p == head)
break;
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
"discarding frag %u\n",
PPP_MP_CB(p)->sequence);
__skb_unlink(p, list);
kfree_skb(p);
}
if (ppp->debug & 1)
netdev_printk(KERN_DEBUG, ppp->dev,
" missed pkts %u..%u\n",
ppp->nextseq,
PPP_MP_CB(head)->sequence-1);
++ppp->dev->stats.rx_dropped;
ppp_receive_error(ppp);
}
skb = head;
if (head != tail) {
struct sk_buff **fragpp = &skb_shinfo(skb)->frag_list;
p = skb_queue_next(list, head);
__skb_unlink(skb, list);
skb_queue_walk_from_safe(list, p, tmp) {
__skb_unlink(p, list);
*fragpp = p;
p->next = NULL;
fragpp = &p->next;
skb->len += p->len;
skb->data_len += p->len;
skb->truesize += p->truesize;
if (p == tail)
break;
}
} else {
__skb_unlink(skb, list);
}
ppp->nextseq = PPP_MP_CB(tail)->sequence + 1;
}
return skb;
}
#endif /* CONFIG_PPP_MULTILINK */
/*
* Channel interface.
*/
/* Create a new, unattached ppp channel. */
int ppp_register_channel(struct ppp_channel *chan)
{
return ppp_register_net_channel(current->nsproxy->net_ns, chan);
}
/* Create a new, unattached ppp channel for specified net. */
int ppp_register_net_channel(struct net *net, struct ppp_channel *chan)
{
struct channel *pch;
struct ppp_net *pn;
pch = kzalloc(sizeof(struct channel), GFP_KERNEL);
if (!pch)
return -ENOMEM;
pn = ppp_pernet(net);
pch->ppp = NULL;
pch->chan = chan;
ppp: take reference on channels netns Let channels hold a reference on their network namespace. Some channel types, like ppp_async and ppp_synctty, can have their userspace controller running in a different namespace. Therefore they can't rely on them to preclude their netns from being removed from under them. ================================================================== BUG: KASAN: use-after-free in ppp_unregister_channel+0x372/0x3a0 at addr ffff880064e217e0 Read of size 8 by task syz-executor/11581 ============================================================================= BUG net_namespace (Not tainted): kasan: bad access detected ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in copy_net_ns+0x6b/0x1a0 age=92569 cpu=3 pid=6906 [< none >] ___slab_alloc+0x4c7/0x500 kernel/mm/slub.c:2440 [< none >] __slab_alloc+0x4c/0x90 kernel/mm/slub.c:2469 [< inline >] slab_alloc_node kernel/mm/slub.c:2532 [< inline >] slab_alloc kernel/mm/slub.c:2574 [< none >] kmem_cache_alloc+0x23a/0x2b0 kernel/mm/slub.c:2579 [< inline >] kmem_cache_zalloc kernel/include/linux/slab.h:597 [< inline >] net_alloc kernel/net/core/net_namespace.c:325 [< none >] copy_net_ns+0x6b/0x1a0 kernel/net/core/net_namespace.c:360 [< none >] create_new_namespaces+0x2f6/0x610 kernel/kernel/nsproxy.c:95 [< none >] copy_namespaces+0x297/0x320 kernel/kernel/nsproxy.c:150 [< none >] copy_process.part.35+0x1bf4/0x5760 kernel/kernel/fork.c:1451 [< inline >] copy_process kernel/kernel/fork.c:1274 [< none >] _do_fork+0x1bc/0xcb0 kernel/kernel/fork.c:1723 [< inline >] SYSC_clone kernel/kernel/fork.c:1832 [< none >] SyS_clone+0x37/0x50 kernel/kernel/fork.c:1826 [< none >] entry_SYSCALL_64_fastpath+0x16/0x7a kernel/arch/x86/entry/entry_64.S:185 INFO: Freed in net_drop_ns+0x67/0x80 age=575 cpu=2 pid=2631 [< none >] __slab_free+0x1fc/0x320 kernel/mm/slub.c:2650 [< inline >] slab_free kernel/mm/slub.c:2805 [< none >] kmem_cache_free+0x2a0/0x330 kernel/mm/slub.c:2814 [< inline >] net_free kernel/net/core/net_namespace.c:341 [< none >] net_drop_ns+0x67/0x80 kernel/net/core/net_namespace.c:348 [< none >] cleanup_net+0x4e5/0x600 kernel/net/core/net_namespace.c:448 [< none >] process_one_work+0x794/0x1440 kernel/kernel/workqueue.c:2036 [< none >] worker_thread+0xdb/0xfc0 kernel/kernel/workqueue.c:2170 [< none >] kthread+0x23f/0x2d0 kernel/drivers/block/aoe/aoecmd.c:1303 [< none >] ret_from_fork+0x3f/0x70 kernel/arch/x86/entry/entry_64.S:468 INFO: Slab 0xffffea0001938800 objects=3 used=0 fp=0xffff880064e20000 flags=0x5fffc0000004080 INFO: Object 0xffff880064e20000 @offset=0 fp=0xffff880064e24200 CPU: 1 PID: 11581 Comm: syz-executor Tainted: G B 4.4.0+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014 00000000ffffffff ffff8800662c7790 ffffffff8292049d ffff88003e36a300 ffff880064e20000 ffff880064e20000 ffff8800662c77c0 ffffffff816f2054 ffff88003e36a300 ffffea0001938800 ffff880064e20000 0000000000000000 Call Trace: [< inline >] __dump_stack kernel/lib/dump_stack.c:15 [<ffffffff8292049d>] dump_stack+0x6f/0xa2 kernel/lib/dump_stack.c:50 [<ffffffff816f2054>] print_trailer+0xf4/0x150 kernel/mm/slub.c:654 [<ffffffff816f875f>] object_err+0x2f/0x40 kernel/mm/slub.c:661 [< inline >] print_address_description kernel/mm/kasan/report.c:138 [<ffffffff816fb0c5>] kasan_report_error+0x215/0x530 kernel/mm/kasan/report.c:236 [< inline >] kasan_report kernel/mm/kasan/report.c:259 [<ffffffff816fb4de>] __asan_report_load8_noabort+0x3e/0x40 kernel/mm/kasan/report.c:280 [< inline >] ? ppp_pernet kernel/include/linux/compiler.h:218 [<ffffffff83ad71b2>] ? ppp_unregister_channel+0x372/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392 [< inline >] ppp_pernet kernel/include/linux/compiler.h:218 [<ffffffff83ad71b2>] ppp_unregister_channel+0x372/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392 [< inline >] ? ppp_pernet kernel/drivers/net/ppp/ppp_generic.c:293 [<ffffffff83ad6f26>] ? ppp_unregister_channel+0xe6/0x3a0 kernel/drivers/net/ppp/ppp_generic.c:2392 [<ffffffff83ae18f3>] ppp_asynctty_close+0xa3/0x130 kernel/drivers/net/ppp/ppp_async.c:241 [<ffffffff83ae1850>] ? async_lcp_peek+0x5b0/0x5b0 kernel/drivers/net/ppp/ppp_async.c:1000 [<ffffffff82c33239>] tty_ldisc_close.isra.1+0x99/0xe0 kernel/drivers/tty/tty_ldisc.c:478 [<ffffffff82c332c0>] tty_ldisc_kill+0x40/0x170 kernel/drivers/tty/tty_ldisc.c:744 [<ffffffff82c34943>] tty_ldisc_release+0x1b3/0x260 kernel/drivers/tty/tty_ldisc.c:772 [<ffffffff82c1ef21>] tty_release+0xac1/0x13e0 kernel/drivers/tty/tty_io.c:1901 [<ffffffff82c1e460>] ? release_tty+0x320/0x320 kernel/drivers/tty/tty_io.c:1688 [<ffffffff8174de36>] __fput+0x236/0x780 kernel/fs/file_table.c:208 [<ffffffff8174e405>] ____fput+0x15/0x20 kernel/fs/file_table.c:244 [<ffffffff813595ab>] task_work_run+0x16b/0x200 kernel/kernel/task_work.c:115 [< inline >] exit_task_work kernel/include/linux/task_work.h:21 [<ffffffff81307105>] do_exit+0x8b5/0x2c60 kernel/kernel/exit.c:750 [<ffffffff813fdd20>] ? debug_check_no_locks_freed+0x290/0x290 kernel/kernel/locking/lockdep.c:4123 [<ffffffff81306850>] ? mm_update_next_owner+0x6f0/0x6f0 kernel/kernel/exit.c:357 [<ffffffff813215e6>] ? __dequeue_signal+0x136/0x470 kernel/kernel/signal.c:550 [<ffffffff8132067b>] ? recalc_sigpending_tsk+0x13b/0x180 kernel/kernel/signal.c:145 [<ffffffff81309628>] do_group_exit+0x108/0x330 kernel/kernel/exit.c:880 [<ffffffff8132b9d4>] get_signal+0x5e4/0x14f0 kernel/kernel/signal.c:2307 [< inline >] ? kretprobe_table_lock kernel/kernel/kprobes.c:1113 [<ffffffff8151d355>] ? kprobe_flush_task+0xb5/0x450 kernel/kernel/kprobes.c:1158 [<ffffffff8115f7d3>] do_signal+0x83/0x1c90 kernel/arch/x86/kernel/signal.c:712 [<ffffffff8151d2a0>] ? recycle_rp_inst+0x310/0x310 kernel/include/linux/list.h:655 [<ffffffff8115f750>] ? setup_sigcontext+0x780/0x780 kernel/arch/x86/kernel/signal.c:165 [<ffffffff81380864>] ? finish_task_switch+0x424/0x5f0 kernel/kernel/sched/core.c:2692 [< inline >] ? finish_lock_switch kernel/kernel/sched/sched.h:1099 [<ffffffff81380560>] ? finish_task_switch+0x120/0x5f0 kernel/kernel/sched/core.c:2678 [< inline >] ? context_switch kernel/kernel/sched/core.c:2807 [<ffffffff85d794e9>] ? __schedule+0x919/0x1bd0 kernel/kernel/sched/core.c:3283 [<ffffffff81003901>] exit_to_usermode_loop+0xf1/0x1a0 kernel/arch/x86/entry/common.c:247 [< inline >] prepare_exit_to_usermode kernel/arch/x86/entry/common.c:282 [<ffffffff810062ef>] syscall_return_slowpath+0x19f/0x210 kernel/arch/x86/entry/common.c:344 [<ffffffff85d88022>] int_ret_from_sys_call+0x25/0x9f kernel/arch/x86/entry/entry_64.S:281 Memory state around the buggy address: ffff880064e21680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880064e21700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff880064e21780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff880064e21800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880064e21880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 273ec51dd7ce ("net: ppp_generic - introduce net-namespace functionality v2") Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-23 22:38:55 +07:00
pch->chan_net = get_net(net);
chan->ppp = pch;
init_ppp_file(&pch->file, CHANNEL);
pch->file.hdrlen = chan->hdrlen;
#ifdef CONFIG_PPP_MULTILINK
pch->lastseq = -1;
#endif /* CONFIG_PPP_MULTILINK */
init_rwsem(&pch->chan_sem);
spin_lock_init(&pch->downl);
rwlock_init(&pch->upl);
spin_lock_bh(&pn->all_channels_lock);
pch->file.index = ++pn->last_channel_index;
list_add(&pch->list, &pn->new_channels);
atomic_inc(&channel_count);
spin_unlock_bh(&pn->all_channels_lock);
return 0;
}
/*
* Return the index of a channel.
*/
int ppp_channel_index(struct ppp_channel *chan)
{
struct channel *pch = chan->ppp;
if (pch)
return pch->file.index;
return -1;
}
/*
* Return the PPP unit number to which a channel is connected.
*/
int ppp_unit_number(struct ppp_channel *chan)
{
struct channel *pch = chan->ppp;
int unit = -1;
if (pch) {
read_lock_bh(&pch->upl);
if (pch->ppp)
unit = pch->ppp->file.index;
read_unlock_bh(&pch->upl);
}
return unit;
}
/*
* Return the PPP device interface name of a channel.
*/
char *ppp_dev_name(struct ppp_channel *chan)
{
struct channel *pch = chan->ppp;
char *name = NULL;
if (pch) {
read_lock_bh(&pch->upl);
if (pch->ppp && pch->ppp->dev)
name = pch->ppp->dev->name;
read_unlock_bh(&pch->upl);
}
return name;
}
/*
* Disconnect a channel from the generic layer.
* This must be called in process context.
*/
void
ppp_unregister_channel(struct ppp_channel *chan)
{
struct channel *pch = chan->ppp;
struct ppp_net *pn;
if (!pch)
return; /* should never happen */
chan->ppp = NULL;
/*
* This ensures that we have returned from any calls into the
* the channel's start_xmit or ioctl routine before we proceed.
*/
down_write(&pch->chan_sem);
spin_lock_bh(&pch->downl);
pch->chan = NULL;
spin_unlock_bh(&pch->downl);
up_write(&pch->chan_sem);
ppp_disconnect_channel(pch);
pn = ppp_pernet(pch->chan_net);
spin_lock_bh(&pn->all_channels_lock);
list_del(&pch->list);
spin_unlock_bh(&pn->all_channels_lock);
pch->file.dead = 1;
wake_up_interruptible(&pch->file.rwait);
if (refcount_dec_and_test(&pch->file.refcnt))
ppp_destroy_channel(pch);
}
/*
* Callback from a channel when it can accept more to transmit.
* This should be called at BH/softirq level, not interrupt level.
*/
void
ppp_output_wakeup(struct ppp_channel *chan)
{
struct channel *pch = chan->ppp;
if (!pch)
return;
ppp_channel_push(pch);
}
/*
* Compression control.
*/
/* Process the PPPIOCSCOMPRESS ioctl. */
static int
ppp_set_compress(struct ppp *ppp, unsigned long arg)
{
int err;
struct compressor *cp, *ocomp;
struct ppp_option_data data;
void *state, *ostate;
unsigned char ccp_option[CCP_MAX_OPTION_LENGTH];
err = -EFAULT;
if (copy_from_user(&data, (void __user *) arg, sizeof(data)))
goto out;
if (data.length > CCP_MAX_OPTION_LENGTH)
goto out;
if (copy_from_user(ccp_option, (void __user *) data.ptr, data.length))
goto out;
err = -EINVAL;
if (data.length < 2 || ccp_option[1] < 2 || ccp_option[1] > data.length)
goto out;
cp = try_then_request_module(
find_compressor(ccp_option[0]),
"ppp-compress-%d", ccp_option[0]);
if (!cp)
goto out;
err = -ENOBUFS;
if (data.transmit) {
state = cp->comp_alloc(ccp_option, data.length);
if (state) {
ppp_xmit_lock(ppp);
ppp->xstate &= ~SC_COMP_RUN;
ocomp = ppp->xcomp;
ostate = ppp->xc_state;
ppp->xcomp = cp;
ppp->xc_state = state;
ppp_xmit_unlock(ppp);
if (ostate) {
ocomp->comp_free(ostate);
module_put(ocomp->owner);
}
err = 0;
} else
module_put(cp->owner);
} else {
state = cp->decomp_alloc(ccp_option, data.length);
if (state) {
ppp_recv_lock(ppp);
ppp->rstate &= ~SC_DECOMP_RUN;
ocomp = ppp->rcomp;
ostate = ppp->rc_state;
ppp->rcomp = cp;
ppp->rc_state = state;
ppp_recv_unlock(ppp);
if (ostate) {
ocomp->decomp_free(ostate);
module_put(ocomp->owner);
}
err = 0;
} else
module_put(cp->owner);
}
out:
return err;
}
/*
* Look at a CCP packet and update our state accordingly.
* We assume the caller has the xmit or recv path locked.
*/
static void
ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound)
{
unsigned char *dp;
int len;
if (!pskb_may_pull(skb, CCP_HDRLEN + 2))
return; /* no header */
dp = skb->data + 2;
switch (CCP_CODE(dp)) {
case CCP_CONFREQ:
/* A ConfReq starts negotiation of compression
* in one direction of transmission,
* and hence brings it down...but which way?
*
* Remember:
* A ConfReq indicates what the sender would like to receive
*/
if(inbound)
/* He is proposing what I should send */
ppp->xstate &= ~SC_COMP_RUN;
else
/* I am proposing to what he should send */
ppp->rstate &= ~SC_DECOMP_RUN;
break;
case CCP_TERMREQ:
case CCP_TERMACK:
/*
* CCP is going down, both directions of transmission
*/
ppp->rstate &= ~SC_DECOMP_RUN;
ppp->xstate &= ~SC_COMP_RUN;
break;
case CCP_CONFACK:
if ((ppp->flags & (SC_CCP_OPEN | SC_CCP_UP)) != SC_CCP_OPEN)
break;
len = CCP_LENGTH(dp);
if (!pskb_may_pull(skb, len + 2))
return; /* too short */
dp += CCP_HDRLEN;
len -= CCP_HDRLEN;
if (len < CCP_OPT_MINLEN || len < CCP_OPT_LENGTH(dp))
break;
if (inbound) {
/* we will start receiving compressed packets */
if (!ppp->rc_state)
break;
if (ppp->rcomp->decomp_init(ppp->rc_state, dp, len,
ppp->file.index, 0, ppp->mru, ppp->debug)) {
ppp->rstate |= SC_DECOMP_RUN;
ppp->rstate &= ~(SC_DC_ERROR | SC_DC_FERROR);
}
} else {
/* we will soon start sending compressed packets */
if (!ppp->xc_state)
break;
if (ppp->xcomp->comp_init(ppp->xc_state, dp, len,
ppp->file.index, 0, ppp->debug))
ppp->xstate |= SC_COMP_RUN;
}
break;
case CCP_RESETACK:
/* reset the [de]compressor */
if ((ppp->flags & SC_CCP_UP) == 0)
break;
if (inbound) {
if (ppp->rc_state && (ppp->rstate & SC_DECOMP_RUN)) {
ppp->rcomp->decomp_reset(ppp->rc_state);
ppp->rstate &= ~SC_DC_ERROR;
}
} else {
if (ppp->xc_state && (ppp->xstate & SC_COMP_RUN))
ppp->xcomp->comp_reset(ppp->xc_state);
}
break;
}
}
/* Free up compression resources. */
static void
ppp_ccp_closed(struct ppp *ppp)
{
void *xstate, *rstate;
struct compressor *xcomp, *rcomp;
ppp_lock(ppp);
ppp->flags &= ~(SC_CCP_OPEN | SC_CCP_UP);
ppp->xstate = 0;
xcomp = ppp->xcomp;
xstate = ppp->xc_state;
ppp->xc_state = NULL;
ppp->rstate = 0;
rcomp = ppp->rcomp;
rstate = ppp->rc_state;
ppp->rc_state = NULL;
ppp_unlock(ppp);
if (xstate) {
xcomp->comp_free(xstate);
module_put(xcomp->owner);
}
if (rstate) {
rcomp->decomp_free(rstate);
module_put(rcomp->owner);
}
}
/* List of compressors. */
static LIST_HEAD(compressor_list);
static DEFINE_SPINLOCK(compressor_list_lock);
struct compressor_entry {
struct list_head list;
struct compressor *comp;
};
static struct compressor_entry *
find_comp_entry(int proto)
{
struct compressor_entry *ce;
list_for_each_entry(ce, &compressor_list, list) {
if (ce->comp->compress_proto == proto)
return ce;
}
return NULL;
}
/* Register a compressor */
int
ppp_register_compressor(struct compressor *cp)
{
struct compressor_entry *ce;
int ret;
spin_lock(&compressor_list_lock);
ret = -EEXIST;
if (find_comp_entry(cp->compress_proto))
goto out;
ret = -ENOMEM;
ce = kmalloc(sizeof(struct compressor_entry), GFP_ATOMIC);
if (!ce)
goto out;
ret = 0;
ce->comp = cp;
list_add(&ce->list, &compressor_list);
out:
spin_unlock(&compressor_list_lock);
return ret;
}
/* Unregister a compressor */
void
ppp_unregister_compressor(struct compressor *cp)
{
struct compressor_entry *ce;
spin_lock(&compressor_list_lock);
ce = find_comp_entry(cp->compress_proto);
if (ce && ce->comp == cp) {
list_del(&ce->list);
kfree(ce);
}
spin_unlock(&compressor_list_lock);
}
/* Find a compressor. */
static struct compressor *
find_compressor(int type)
{
struct compressor_entry *ce;
struct compressor *cp = NULL;
spin_lock(&compressor_list_lock);
ce = find_comp_entry(type);
if (ce) {
cp = ce->comp;
if (!try_module_get(cp->owner))
cp = NULL;
}
spin_unlock(&compressor_list_lock);
return cp;
}
/*
* Miscelleneous stuff.
*/
static void
ppp_get_stats(struct ppp *ppp, struct ppp_stats *st)
{
struct slcompress *vj = ppp->vj;
memset(st, 0, sizeof(*st));
st->p.ppp_ipackets = ppp->stats64.rx_packets;
st->p.ppp_ierrors = ppp->dev->stats.rx_errors;
st->p.ppp_ibytes = ppp->stats64.rx_bytes;
st->p.ppp_opackets = ppp->stats64.tx_packets;
st->p.ppp_oerrors = ppp->dev->stats.tx_errors;
st->p.ppp_obytes = ppp->stats64.tx_bytes;
if (!vj)
return;
st->vj.vjs_packets = vj->sls_o_compressed + vj->sls_o_uncompressed;
st->vj.vjs_compressed = vj->sls_o_compressed;
st->vj.vjs_searches = vj->sls_o_searches;
st->vj.vjs_misses = vj->sls_o_misses;
st->vj.vjs_errorin = vj->sls_i_error;
st->vj.vjs_tossed = vj->sls_i_tossed;
st->vj.vjs_uncompressedin = vj->sls_i_uncompressed;
st->vj.vjs_compressedin = vj->sls_i_compressed;
}
/*
* Stuff for handling the lists of ppp units and channels
* and for initialization.
*/
/*
* Create a new ppp interface unit. Fails if it can't allocate memory
* or if there is already a unit with the requested number.
* unit == -1 means allocate a new number.
*/
static int ppp_create_interface(struct net *net, struct file *file, int *unit)
{
struct ppp_config conf = {
.file = file,
.unit = *unit,
.ifname_is_set = false,
};
struct net_device *dev;
struct ppp *ppp;
int err;
dev = alloc_netdev(sizeof(struct ppp), "", NET_NAME_ENUM, ppp_setup);
if (!dev) {
err = -ENOMEM;
goto err;
}
dev_net_set(dev, net);
dev->rtnl_link_ops = &ppp_link_ops;
ppp: fix lockdep splat in ppp_dev_uninit() ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection. ppp_create_interface() must then lock these mutexes in that same order to avoid possible deadlock. [ 120.880011] ====================================================== [ 120.880011] [ INFO: possible circular locking dependency detected ] [ 120.880011] 4.2.0 #1 Not tainted [ 120.880011] ------------------------------------------------------- [ 120.880011] ppp-apitest/15827 is trying to acquire lock: [ 120.880011] (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic] [ 120.880011] [ 120.880011] but task is already holding lock: [ 120.880011] (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14 [ 120.880011] [ 120.880011] which lock already depends on the new lock. [ 120.880011] [ 120.880011] [ 120.880011] the existing dependency chain (in reverse order) is: [ 120.880011] [ 120.880011] -> #1 (rtnl_mutex){+.+.+.}: [ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e [ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341 [ 120.880011] [<ffffffff812e4255>] rtnl_lock+0x12/0x14 [ 120.880011] [<ffffffff812d9d94>] register_netdev+0x11/0x27 [ 120.880011] [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic] [ 120.880011] [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532 [ 120.880011] [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d [ 120.880011] [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f [ 120.880011] [ 120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}: [ 120.880011] [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76 [ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e [ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341 [ 120.880011] [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic] [ 120.880011] [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252 [ 120.880011] [<ffffffff812d5381>] rollback_registered+0x29/0x38 [ 120.880011] [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77 [ 120.880011] [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic] [ 120.880011] [<ffffffff8112d9f6>] __fput+0xec/0x192 [ 120.880011] [<ffffffff8112dacc>] ____fput+0x9/0xb [ 120.880011] [<ffffffff8105447a>] task_work_run+0x66/0x80 [ 120.880011] [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7 [ 120.880011] [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104 [ 120.880011] [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f [ 120.880011] [ 120.880011] other info that might help us debug this: [ 120.880011] [ 120.880011] Possible unsafe locking scenario: [ 120.880011] [ 120.880011] CPU0 CPU1 [ 120.880011] ---- ---- [ 120.880011] lock(rtnl_mutex); [ 120.880011] lock(&pn->all_ppp_mutex); [ 120.880011] lock(rtnl_mutex); [ 120.880011] lock(&pn->all_ppp_mutex); [ 120.880011] [ 120.880011] *** DEADLOCK *** Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 17:54:01 +07:00
rtnl_lock();
err = ppp_dev_configure(net, dev, &conf);
if (err < 0)
goto err_dev;
ppp = netdev_priv(dev);
*unit = ppp->file.index;
ppp: fix lockdep splat in ppp_dev_uninit() ppp_dev_uninit() locks all_ppp_mutex while under rtnl mutex protection. ppp_create_interface() must then lock these mutexes in that same order to avoid possible deadlock. [ 120.880011] ====================================================== [ 120.880011] [ INFO: possible circular locking dependency detected ] [ 120.880011] 4.2.0 #1 Not tainted [ 120.880011] ------------------------------------------------------- [ 120.880011] ppp-apitest/15827 is trying to acquire lock: [ 120.880011] (&pn->all_ppp_mutex){+.+.+.}, at: [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic] [ 120.880011] [ 120.880011] but task is already holding lock: [ 120.880011] (rtnl_mutex){+.+.+.}, at: [<ffffffff812e4255>] rtnl_lock+0x12/0x14 [ 120.880011] [ 120.880011] which lock already depends on the new lock. [ 120.880011] [ 120.880011] [ 120.880011] the existing dependency chain (in reverse order) is: [ 120.880011] [ 120.880011] -> #1 (rtnl_mutex){+.+.+.}: [ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e [ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341 [ 120.880011] [<ffffffff812e4255>] rtnl_lock+0x12/0x14 [ 120.880011] [<ffffffff812d9d94>] register_netdev+0x11/0x27 [ 120.880011] [<ffffffffa0147b17>] ppp_ioctl+0x289/0xc98 [ppp_generic] [ 120.880011] [<ffffffff8113b367>] do_vfs_ioctl+0x4ea/0x532 [ 120.880011] [<ffffffff8113b3fd>] SyS_ioctl+0x4e/0x7d [ 120.880011] [<ffffffff813ad7d7>] entry_SYSCALL_64_fastpath+0x12/0x6f [ 120.880011] [ 120.880011] -> #0 (&pn->all_ppp_mutex){+.+.+.}: [ 120.880011] [<ffffffff8107334e>] __lock_acquire+0xb07/0xe76 [ 120.880011] [<ffffffff81073a6f>] lock_acquire+0xcf/0x10e [ 120.880011] [<ffffffff813ab18a>] mutex_lock_nested+0x56/0x341 [ 120.880011] [<ffffffffa0145f56>] ppp_dev_uninit+0x64/0xb0 [ppp_generic] [ 120.880011] [<ffffffff812d5263>] rollback_registered_many+0x19e/0x252 [ 120.880011] [<ffffffff812d5381>] rollback_registered+0x29/0x38 [ 120.880011] [<ffffffff812d53fa>] unregister_netdevice_queue+0x6a/0x77 [ 120.880011] [<ffffffffa0146a94>] ppp_release+0x42/0x79 [ppp_generic] [ 120.880011] [<ffffffff8112d9f6>] __fput+0xec/0x192 [ 120.880011] [<ffffffff8112dacc>] ____fput+0x9/0xb [ 120.880011] [<ffffffff8105447a>] task_work_run+0x66/0x80 [ 120.880011] [<ffffffff81001801>] prepare_exit_to_usermode+0x8c/0xa7 [ 120.880011] [<ffffffff81001900>] syscall_return_slowpath+0xe4/0x104 [ 120.880011] [<ffffffff813ad931>] int_ret_from_sys_call+0x25/0x9f [ 120.880011] [ 120.880011] other info that might help us debug this: [ 120.880011] [ 120.880011] Possible unsafe locking scenario: [ 120.880011] [ 120.880011] CPU0 CPU1 [ 120.880011] ---- ---- [ 120.880011] lock(rtnl_mutex); [ 120.880011] lock(&pn->all_ppp_mutex); [ 120.880011] lock(rtnl_mutex); [ 120.880011] lock(&pn->all_ppp_mutex); [ 120.880011] [ 120.880011] *** DEADLOCK *** Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 17:54:01 +07:00
rtnl_unlock();
return 0;
err_dev:
rtnl_unlock();
free_netdev(dev);
err:
return err;
}
/*
* Initialize a ppp_file structure.
*/
static void
init_ppp_file(struct ppp_file *pf, int kind)
{
pf->kind = kind;
skb_queue_head_init(&pf->xq);
skb_queue_head_init(&pf->rq);
refcount_set(&pf->refcnt, 1);
init_waitqueue_head(&pf->rwait);
}
/*
* Free the memory used by a ppp unit. This is only called once
* there are no channels connected to the unit and no file structs
* that reference the unit.
*/
static void ppp_destroy_interface(struct ppp *ppp)
{
atomic_dec(&ppp_unit_count);
if (!ppp->file.dead || ppp->n_channels) {
/* "can't happen" */
netdev_err(ppp->dev, "ppp: destroying ppp struct %p "
"but dead=%d n_channels=%d !\n",
ppp, ppp->file.dead, ppp->n_channels);
return;
}
ppp_ccp_closed(ppp);
if (ppp->vj) {
slhc_free(ppp->vj);
ppp->vj = NULL;
}
skb_queue_purge(&ppp->file.xq);
skb_queue_purge(&ppp->file.rq);
#ifdef CONFIG_PPP_MULTILINK
skb_queue_purge(&ppp->mrq);
#endif /* CONFIG_PPP_MULTILINK */
#ifdef CONFIG_PPP_FILTER
if (ppp->pass_filter) {
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
bpf_prog_destroy(ppp->pass_filter);
ppp->pass_filter = NULL;
}
if (ppp->active_filter) {
net: filter: split 'struct sk_filter' into socket and bpf parts clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 10:34:16 +07:00
bpf_prog_destroy(ppp->active_filter);
ppp->active_filter = NULL;
}
#endif /* CONFIG_PPP_FILTER */
kfree_skb(ppp->xmit_pending);
free_percpu(ppp->xmit_recursion);
free_netdev(ppp->dev);
}
/*
* Locate an existing ppp unit.
* The caller should have locked the all_ppp_mutex.
*/
static struct ppp *
ppp_find_unit(struct ppp_net *pn, int unit)
{
return unit_find(&pn->units_idr, unit);
}
/*
* Locate an existing ppp channel.
* The caller should have locked the all_channels_lock.
* First we look in the new_channels list, then in the
* all_channels list. If found in the new_channels list,
* we move it to the all_channels list. This is for speed
* when we have a lot of channels in use.
*/
static struct channel *
ppp_find_channel(struct ppp_net *pn, int unit)
{
struct channel *pch;
list_for_each_entry(pch, &pn->new_channels, list) {
if (pch->file.index == unit) {
list_move(&pch->list, &pn->all_channels);
return pch;
}
}
list_for_each_entry(pch, &pn->all_channels, list) {
if (pch->file.index == unit)
return pch;
}
return NULL;
}
/*
* Connect a PPP channel to a PPP interface unit.
*/
static int
ppp_connect_channel(struct channel *pch, int unit)
{
struct ppp *ppp;
struct ppp_net *pn;
int ret = -ENXIO;
int hdrlen;
pn = ppp_pernet(pch->chan_net);
mutex_lock(&pn->all_ppp_mutex);
ppp = ppp_find_unit(pn, unit);
if (!ppp)
goto out;
write_lock_bh(&pch->upl);
ret = -EINVAL;
if (pch->ppp)
goto outl;
ppp_lock(ppp);
spin_lock_bh(&pch->downl);
if (!pch->chan) {
/* Don't connect unregistered channels */
spin_unlock_bh(&pch->downl);
ppp_unlock(ppp);
ret = -ENOTCONN;
goto outl;
}
spin_unlock_bh(&pch->downl);
if (pch->file.hdrlen > ppp->file.hdrlen)
ppp->file.hdrlen = pch->file.hdrlen;
hdrlen = pch->file.hdrlen + 2; /* for protocol bytes */
if (hdrlen > ppp->dev->hard_header_len)
ppp->dev->hard_header_len = hdrlen;
list_add_tail(&pch->clist, &ppp->channels);
++ppp->n_channels;
pch->ppp = ppp;
refcount_inc(&ppp->file.refcnt);
ppp_unlock(ppp);
ret = 0;
outl:
write_unlock_bh(&pch->upl);
out:
mutex_unlock(&pn->all_ppp_mutex);
return ret;
}
/*
* Disconnect a channel from its ppp unit.
*/
static int
ppp_disconnect_channel(struct channel *pch)
{
struct ppp *ppp;
int err = -EINVAL;
write_lock_bh(&pch->upl);
ppp = pch->ppp;
pch->ppp = NULL;
write_unlock_bh(&pch->upl);
if (ppp) {
/* remove it from the ppp unit's list */
ppp_lock(ppp);
list_del(&pch->clist);
if (--ppp->n_channels == 0)
wake_up_interruptible(&ppp->file.rwait);
ppp_unlock(ppp);
if (refcount_dec_and_test(&ppp->file.refcnt))
ppp_destroy_interface(ppp);
err = 0;
}
return err;
}
/*
* Free up the resources used by a ppp channel.
*/
static void ppp_destroy_channel(struct channel *pch)
{
put_net(pch->chan_net);
pch->chan_net = NULL;
atomic_dec(&channel_count);
if (!pch->file.dead) {
/* "can't happen" */
pr_err("ppp: destroying undead channel %p !\n", pch);
return;
}
skb_queue_purge(&pch->file.xq);
skb_queue_purge(&pch->file.rq);
kfree(pch);
}
static void __exit ppp_cleanup(void)
{
/* should never happen */
if (atomic_read(&ppp_unit_count) || atomic_read(&channel_count))
pr_err("PPP: removing module but units remain!\n");
rtnl_link_unregister(&ppp_link_ops);
unregister_chrdev(PPP_MAJOR, "ppp");
device_destroy(ppp_class, MKDEV(PPP_MAJOR, 0));
class_destroy(ppp_class);
unregister_pernet_device(&ppp_net_ops);
}
/*
* Units handling. Caller must protect concurrent access
* by holding all_ppp_mutex
*/
/* associate pointer with specified number */
static int unit_set(struct idr *p, void *ptr, int n)
{
int unit;
unit = idr_alloc(p, ptr, n, n + 1, GFP_KERNEL);
if (unit == -ENOSPC)
unit = -EINVAL;
return unit;
}
/* get new free unit number and associate pointer with it */
static int unit_get(struct idr *p, void *ptr)
{
return idr_alloc(p, ptr, 0, 0, GFP_KERNEL);
}
/* put unit number back to a pool */
static void unit_put(struct idr *p, int n)
{
idr_remove(p, n);
}
/* get pointer associated with the number */
static void *unit_find(struct idr *p, int n)
{
return idr_find(p, n);
}
/* Module/initialization stuff */
module_init(ppp_init);
module_exit(ppp_cleanup);
EXPORT_SYMBOL(ppp_register_net_channel);
EXPORT_SYMBOL(ppp_register_channel);
EXPORT_SYMBOL(ppp_unregister_channel);
EXPORT_SYMBOL(ppp_channel_index);
EXPORT_SYMBOL(ppp_unit_number);
EXPORT_SYMBOL(ppp_dev_name);
EXPORT_SYMBOL(ppp_input);
EXPORT_SYMBOL(ppp_input_error);
EXPORT_SYMBOL(ppp_output_wakeup);
EXPORT_SYMBOL(ppp_register_compressor);
EXPORT_SYMBOL(ppp_unregister_compressor);
MODULE_LICENSE("GPL");
driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20 23:07:20 +07:00
MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
MODULE_ALIAS_RTNL_LINK("ppp");
driver core: add devname module aliases to allow module on-demand auto-loading This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-20 23:07:20 +07:00
MODULE_ALIAS("devname:ppp");