linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Anjali Singhai	232f47060a	i40e: Ioremap changes For future device support we do not want to map the whole CSR space since some of it is mapped by other drivers with different mapping methods. Note: As a side effect, the flash region (if exposed through the memory map) gets unmapped too since it follows the future use region. Change-ID: Ic729a2eacd692984220b1a415ff4fa0f98ea419a Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 07:15:28 -08:00
Jesse Brandeburg	5bbc330100	i40e/i40evf: Clean up some formatting and other things Fix some double blank lines and un-split a function declaration that all fits on one line. Also make i40e_get_priv_flags static. Change-ID: I11b5d25d1153a06b286d0d2f5d916d7727c58e4a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 06:39:42 -08:00
Catherine Sullivan	180204c79f	i40e: Add AOC PHY types to case statements Add the 10G and 40G AOC PHY types to the case statement in get_media_type and ethtool get_settings so that the correct information gets reported back to the user. Change-ID: I1b4849d22199a9acf7c8807166d0317c1faad375 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 06:02:11 -08:00
Greg Rose	5b86c5cf75	i40e: Fix ethtool offline test If the system administrator is requesting an offline diagnostic test using 'ethtool -t' then we should, you know, actually take the device offline before doing the testing. Change-ID: I6afa1cbfcc821c9ab6e6f47ed4d8dc2d8dd20e82 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 05:43:52 -08:00
Catherine Sullivan	088c4ee370	i40e: Reassign incorrect PHY type to fix a FW bug Some FW versions are incorrectly reporting a breakout cable as PHY type 0x3 when it should be 0x16 (I40E_PHY_TYPE_10GBASE_SFPP_CU). If we get this value back from FW and the version is < 4.40, reassign it to I40E_PHY_TYPE_10GBASE_SFPP_CU. Change-ID: Ibb41a0e3cd2c0753744e8553959240df6ed13ae8 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 05:25:18 -08:00
Jesse Brandeburg	9a660eeae2	i40e: fix XPS mask when resetting During resets (possibly caused by a Tx hang) the driver would accidentally clear the XPS mask for all queues back to 0. This caused higher CPU utilization and had some other performance impacts for transmit tests. Change-ID: I95f112432c9e643a153eaa31cd28cdcbfdd01831 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 04:57:33 -08:00
Jesse Brandeburg	ce7ca75176	i40e: use more portable sign extension Use automatic sign extension by replacing 0xffff... constants with ~(u64)0 or ~(u32)0. Change-ID: I73cab4cd2611795bb12e00f0f24fafaaee07457c Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 04:00:06 -08:00
Shannon Nelson	4f651a5b0a	i40e/i40evf: grab NVM devstarter version not image version 0x2A is the NVM version, so it has useful data, but it is a per-image version: every image can have a different one. 0x18 is the dev starter version, which is the same for all the images in a release. Of the two, 0x18 is more useful and is what should be displayed. Change-ID: Idf493da13a42ab211e2de0bef287f5de51033cca Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 03:04:33 -08:00
Neerav Parikh	7589f65b32	i40e: Don't check operational or sync bit for App TLV In CEE mode the firmware does not set the operational status bit of the application TLV status as returned from the "Get CEE DCBX Oper Cfg" AQ command. This occurs whenever a DCBX configuration is changed. This is a workaround to remove the check for the operational and sync bits of the application TLV status till a firmware fix is provided. Change-ID: I1a31ff2fcadcb06feb5b55776a33593afc6ea176 Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 01:53:22 -08:00
Matt Jared	b84d5cd819	i40e: during LED interaction ignore activity LED src modes Modify our get and set LED functions so they ignore activity LEDs, as we are required to blink the link LEDs only. Change-ID: I647ea67a6fc95cbbab6e3cd01d81ec9ae096a9ad Signed-off-by: Matt Jared <matthew.a.jared@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 01:31:15 -08:00
Greg Rose	c668a12c7b	i40e: Fix NPAR Tx Scheduler init Recent changes to the driver initialization have caused the BW configurations to not take effect. We use a BW configuration read and write back to "kick" the Tx scheduler into action. Change-ID: I94ab377c58d3a3986e3de62b6c199be3fd2ee5e6 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-03-05 01:05:55 -08:00
Petri Gynther	66d06757d9	net: bcmgenet: simplify __bcmgenet_tx_reclaim() 1. Use c_index and ring->c_index to determine how many TxCBs/TxBDs are ready for cleanup - c_index = the current value of TDMA_CONS_INDEX - TDMA_CONS_INDEX is HW-incremented and auto-wraparound (0x0-0xFFFF) - ring->c_index = __bcmgenet_tx_reclaim() cleaned up to this point on the previous invocation 2. Add bcmgenet_tx_ring->clean_ptr - index of the next TxCB to be cleaned - incremented as TxCBs/TxBDs are processed - value always in range [ring->cb_ptr, ring->end_ptr] 3. Fix incrementing of dev->stats.tx_packets - should be incremented only when tx_cb_ptr->skb != NULL These changes simplify __bcmgenet_tx_reclaim(). Furthermore, Tx ring size can now be any value. With the old code, Tx ring size had to be a power-of-2: num_tx_bds = ring->size; c_index &= (num_tx_bds - 1); last_c_index &= (num_tx_bds - 1); Signed-off-by: Petri Gynther <pgynther@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:54:54 -05:00
David S. Miller	f93eb4ba0f	Merge branch 'fib_trie-next' Alexander Duyck says: ==================== ipv4/fib_trie: Cleanups to prepare for introduction of key vector This patch series is meant to mostly just clean up the fib_trie to prepare it for the introduction of the key_vector. As such there are a number of minor clean-ups such as reformatting the tnode to match the format once the key vector is introduced, some optimizations to drop the need for a leaf parent pointer, and some changes to remove duplication of effort such as the 2 look-ups that were essentially being done per node insertion. v2: Added code to cleanup idx >> n->bits and explain unsigned long logic Added code to prevent allocation when tnode size is larger than size_t ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:24 -05:00
Alexander Duyck	1de3d87bcd	fib_trie: Prevent allocating tnode if bits is too big for size_t This patch adds code to prevent us from attempting to allocate a tnode with a size larger than what can be represented by size_t. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	71e8b67d0f	fib_trie: Update last spot w/ idx >> n->bits code and explanation This change updates the fib_table_lookup function so that it is in sync with the fib_find_node function in terms of the explanation for the index check based on the bits value. I have also updated it from doing a mask to just doing a compare as I have found that seems to provide more options to the compiler as I have seen it turn this into a shift of the value and test under some circumstances. In addition I addressed one minor issue in which we kept computing the key ^ n->key when checking the fib aliases. I pulled the xor out of the loop in order to reduce the number of memory reads in the lookup. As a result we should save a couple cycles since the xor is only done once much earlier in the lookup. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	a7e5353123	fib_trie: Make fib_table rcu safe The fib_table was wrapped in several places with an rcu_read_lock/rcu_read_unlock however after looking over the code I found several spots where the tables were being accessed as just standard pointers without any protections. This change fixes that so that all of the proper protections are in place when accessing the table to take RCU replacement or removal of the table into account. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	41b489fd6c	fib_trie: move leaf and tnode to occupy the same spot in the key vector If we are going to compact the leaf and tnode we first need to make sure the fields are all in the same place. In that regard I am moving the leaf pointer which represents the fib_alias hash list to occupy what is currently the first key_vector pointer. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	d5d6487cb8	fib_trie: Update insert and delete to make use of tp from find_node This change makes it so that the insert and delete functions make use of the tnode pointer returned in the fib_find_node call. By doing this we will not have to rely on the parent pointer in the leaf which will be going away soon. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:18 -05:00
Alexander Duyck	d4a975e83f	fib_trie: Fib find node should return parent This change makes it so that the parent pointer is returned by reference in fib_find_node. By doing this I can use it to find the parent node when I am performing an insertion and I don't have to look for it again in fib_insert_node. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
Alexander Duyck	8be33e955c	fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf This change makes it so that leaf_walk_rcu takes a tnode and a key instead of the trie and a leaf. The main idea behind this is to avoid using the leaf parent pointer as that can have additional overhead in the future as I am trying to reduce the size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
Alexander Duyck	7289e6ddb6	fib_trie: Only resize tnodes once instead of on each leaf removal in fib_table_flush This change makes it so that we only call resize on the tnodes, instead of from each of the leaves. By doing this we can significantly reduce the amount of time spent resizing as we can update all of the leaves in the tnode first before we make any determinations about resizing. As a result we can simply free the tnode in the case that all of the leaves from a given tnode are flushed instead of resizing with each leaf removed. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 23:35:17 -05:00
David S. Miller	3a65f63ff6	linux-can-next-for-4.1-20150304 Merge tag 'linux-can-next-for-4.1-20150304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-03-04 this is a pull request of 3 patches for net-next/master. Aaron Wu contributes three patches for the blackfin can driver, which cleans up the driver and makes use of more platform independent code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 16:40:59 -05:00
Wu Fengguang	f0126539c7	mpls: rtm_mpls_policy[] can be static Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 16:39:45 -05:00
David S. Miller	c473463ca7	Merge branch 'be2net-next' Sathya Perla says: ==================== be2net: patch set Hi Dave, the following patch set includes three feature additions relating to SR-IOV to be2net. Patch 1 avoid creating a non-RSS default RXQ when FW allows it. This prevents wasting one RXQ for each VF. Patch 2 adds support for evenly distributing all queue & filter resources across VFs. The FW informs the driver as to which resources are distributable. Patch 3 implements the sriov_configure PCI method to allow runtime enablement of VFs via sysfs. Pls consider applying this patch-set to the net-next tree. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:58:48 -05:00
Vasundhara Volam	ace40aff3c	be2net: implement .sriov_configure() PCI callback This patch implements the .sriov_configure() PCI method to allow for runtime enabling/disabling of VFs. The module param "num_vfs" is now deprecated. At the time of driver load the PF-pool resources are allocated to the PF. When the user enables VFs, the resources are then re-distributed across PFs and VFs based on the number of VFs enabled. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:58:37 -05:00
Vasundhara Volam	f285873841	be2net: re-distribute SRIOV resources allowed by FW When SR-IOV is enabled in the adapter, the FW distributes resources evenly across the PF and its VFs. This is currently done only for some resources. This patch adds support for a new cmd that queries the FW for the list of resources for which the distribution is allowed and distributes them accordingly. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:58:37 -05:00
Vasundhara Volam	71bb8bd08c	be2net: avoid creating the non-RSS default RXQ if FW allows to On BE2, BE3 and Skhawk-R chips one non-RSS (called "default") RXQ was needed to receive non-IP traffic. Some FW versions now export a capability called IFACE_FLAGS_DEFQ_RSS where this requirement doesn't hold. On such FWs the driver now does not create the non-RSS default queue. This prevents wasting one RXQ per VF. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:58:37 -05:00
Michal Simek	28811a8c00	net: cadence: Remove Kconfig dependency on ARCH Remove Kconfig dependency and enable driver for all ARCHs. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Sören Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:48:26 -05:00
David S. Miller	4f075a58c1	Merge branch 'sh_eth-next' Ben Hutchings says: ==================== sh_eth changes for net-next Some minor new features and fixes. These depend in part on the series I sent earlier for net, specifically "sh_eth: WARN on access to a register not implemented in a particular chip" depends on "sh_eth: Fix RX recovery on R-Car in case of RX ring underrun". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:59 -05:00
Ben Hutchings	4398f9c817	sh_eth: Mitigate lost statistics updates The statistics registers have write-clear behaviour, which means we will lose any increment between the read and write. Mitigate this by only clearing when we read a non-zero value, so we will never falsely report a total of zero. This also saves time as we only handle error statistics here and they won't often be incremented. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:54 -05:00
Ben Hutchings	e5fd13f476	sh_eth: Optionally log RX and TX status for each completed descriptor Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:54 -05:00
Ben Hutchings	6b4b4fead3	sh_eth: Implement ethtool register dump operations There are many different sets of registers implemented by the different versions of this controller, and we can only expect this to get more complicated in future. Limit how much ethtool needs to know by including an explicit bitmap of which registers are included in the dump, allowing room for future growth in the number of possible registers. As I don't have datasheets for all of these, I've only included registers that are: - defined in all 5 register type arrays, or - used by the driver, or - documented in the datasheet I have Add one new capability flag so we can tell whether the RTRATE register is implemented. Delete the TSU_ADRL0 and TSU_ADR{H,L}31 definitions, as they weren't used and the address table is already assumed to be contiguous. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:54 -05:00
Ben Hutchings	3365711df0	sh_eth: WARN on access to a register not implemented in a particular chip Currently we may silently read/write a register at offset 0. Change this to WARN and then ignore the write or read-back all-ones. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:54 -05:00
Ben Hutchings	25b77ad774	sh_eth: Implement multicast statistic based on the RFS8 status bit At least on the R8A7790, RFS8 reflects the RINT8 (multicast) MAC status flag. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 15:40:54 -05:00
Aaron Wu	400aff5da5	bfin_can: Merge header file from arch dependent location The header file was in the arch-dependent location arch/blackfin/include/asm/bfin_can.h. Move and merge the useful contents of the header file into the driver code. Note that the original header file is reserved for the full register set access test by other code, so it survives. Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-03-04 09:52:49 +01:00
Aaron Wu	dead83894c	bfin_can: introduce ioremap to comply to archs with MMU Blackfin was built without an MMU, so the old driver code accesses the IO space by physical address. Introduce the ioremap approach to be compatible with the common style supporting MMU-enabled archs. Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-03-04 09:52:49 +01:00
Aaron Wu	e4936e01d0	bfin_can: rewrite the blackfin style of read/write to common ones Replace the blackfin arch dependent style of bfin_read/bfin_write with common readw/writew Signed-off-by: Aaron Wu <Aaron.wu@analog.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-03-04 09:52:49 +01:00
David S. Miller	27db730c4f	Merge branch 'basic-mpls-support' Eric W. Biederman says: ==================== Basic MPLS support take 2 On top of my two pending neighbour table prep patches here is the mpls support refactored to use them, and edited to not drop routes when an interface goes down. Additionally the addition of RTA_LLGATEWAY has been replaced with the addition of RTA_VIA, RTA_VIA being an attribute that includes the address family as well as the address of the next hop. MPLS is at its heart simple and I have endeavoured to maintain that simplicity in my implementation. This is an implementation of a RFC3032 forwarding engine, and basic MPLS egress logic, which should make linux sufficient to be a mpls forwarding node or to be a LSR (Label Switched Router) as it says in all of the MPLS documents. The ingress support will follow but it deserves its own discussion so I am pushing it separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:13 -05:00
Eric W. Biederman	8de147dc8e	mpls: Multicast route table change notifications Unlike IPv4 this code notifies on all cases where mpls routes are added or removed and it never automatically removes routes, avoiding both the userspace confusion that is caused by omitting route updates and the possibility of a flood of netlink traffic when an interface goes down. For now reserved labels are handled automatically and userspace is not notified. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	03c0566542	mpls: Netlink commands to add, remove, and dump routes This change adds two new netlink routing attributes: RTA_VIA and RTA_NEWDST. RTA_VIA specifies the next machine to send a packet to, like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it includes the address family of the address of the next machine to send a packet to. Currently the MPLS code supports addresses in AF_INET, AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac address is acquired from the neighbour table. For AF_PACKET the destination mac address is specified in the netlink configuration. I think raw destination mac address support with the family AF_PACKET will prove useful. There is MPLS-TP which is defined to operate on machines that do not support internet packets of any flavor. Further, there seem to be corner cases where it can be useful. At this point I don't care much either way. RTA_NEWDST specifies the destination address to forward the packet with. MPLS typically changes its destination address at every hop. For a swap operation RTA_NEWDST is specified with a length of one label. For a push operation RTA_NEWDST is specified with two or more labels. For a pop operation RTA_NEWDST is not specified or equivalently an empty RTA_NEWDST is specified. Those new netlink attributes are used to implement handling of rt-netlink RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the MPLS label table. rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message, verifies no unhandled attributes or unhandled values are present and sets up the data structures for mpls_route_add and mpls_route_del. I did my best to match up with the existing conventions with the caveats that MPLS addresses are all destination-specific-addresses, and so don't properly have a scope. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	966bae3349	mpls: Functions for reading and writing mpls labels over netlink Reading and writing addresses in network byte order in netlink is traditional and I see no reason to change that. MPLS is interesting as effectively it has variable length addresses (the MPLS label stack). To represent these variable length addresses in netlink I use a valid MPLS label stack (complete with stop bit). This achieves two things: a well defined existing format is used, and the data can be interpreted without looking at its length. Not needing to look at the length to decode the variable length network representation allows existing userspace functions such as inet_ntop to be used without needing to change their prototype. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	a2519929ab	mpls: Basic support for adding and removing routes mpls_route_add and mpls_route_del implement the basic logic for adding and removing Next Hop Label Forwarding Entries from the MPLS input label map. The addition and subtraction is done in a way that is consistent with how the existing routing tables in Linux are maintained. Thus all of the work to deal with NLM_F_APPEND, NLM_F_EXCL, NLM_F_REPLACE, and NLM_F_CREATE. Cases that are not clearly defined, such as changing the interpretation of the mpls reserved labels, are not allowed. Because it seems like the right thing to do, adding an MPLS route without specifying an input label and allowing the kernel to pick a free label table entry is supported. The implementation is currently less than optimal but that can be changed. As I don't have anything else to test with, ethernet and the loopback device are the only two device types currently supported for forwarding MPLS over. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	7720c01f3f	mpls: Add a sysctl to control the size of the mpls label table This sysctl gives two benefits. By defaulting the table size to 0, mpls, even when compiled in and enabled, defaults to not forwarding any packets. This prevents unpleasant surprises for users. The other benefit is that as mpls labels are allocated locally, a small dense label table may be used, which saves memory and is extremely simple and efficient to implement. This sysctl allows userspace to choose the restrictions on the label table size userspace applications need to cope with. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	0189197f44	mpls: Basic routing support This change adds a new Kconfig option MPLS_ROUTING. The core of this change is the code to look at an mpls packet received from another machine, look that packet up in a routing table, and forward the packet on. Support of MPLS over ATM is not considered or attempted here. This implementation follows RFC3032 and implements the MPLS shim header that can pass over essentially any network. What RFC3031 refers to as the Incoming Label Map (ILM) I call net->mpls.platform_label[]. What RFC3031 refers to as the Next Hop Label Forwarding Entry (NHLFE) I call mpls_route. Though calling it the label forwarding information base (lfib) might also be valid. Further the implementation forwards packets as described in RFC3032. There is no need and, given the original motivation for MPLS, a strong disincentive to have a flexible label forwarding path. In essence the logic is: the topmost label is read, looked up, removed, and replaced by 0 or more new labels, and the packet is then sent out the specified interface to its next hop. Quite a few optional features are not implemented here. Among them are generation of ICMP errors when the TTL is exceeded or the packet is larger than the next hop MTU (those conditions are detected and the packets are dropped instead of generating an icmp error). The traffic class field is always set to 0. The implementation focuses on IP over MPLS and does not handle egress of other kinds of protocols. Instead of implementing coordination with the neighbour table and sorting out how to input next hops in a different address family (for which there is value), I was lazy and implemented a next hop mac address instead. The code is simpler and there are flavors of MPLS such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is appropriate, so a next hop by mac address would need to be implemented at some point. Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
Decoding the mpls header must be done by first byteswapping a 32bit big endian word into the local cpu endian and then bit shifting to extract the pieces. There is no C bit-field that can represent a wire format mpls header on a little endian machine as the low bits of the 20bit label wind up in the wrong half of the third byte. Therefore internally everything is dealt with in cpu native byte order except when writing to and reading from a packet. For management simplicity, if a label is configured to forward out an interface that is down the packet is dropped early. Similarly if a network interface is removed rt_dev is updated to NULL (so no reference is preserved) and any packets for that label are dropped. Keeping the label entries in the kernel allows the kernel label table to function as the definitive source of which labels are allocated and which are not. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
Eric W. Biederman	cec9166ca4	mpls: Refactor how the mpls module is built This refactoring is needed to allow more than just mpls gso support to be built into the mpls module. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:26:06 -05:00
David S. Miller	ee23393b40	Merge branch 'neigh-mpls-prep' Eric W. Biederman says: ==================== Neighbour table prep for MPLS In preparation for using the IPv4 and IPv6 neighbour tables in my mpls code this patchset factors out ___neigh_lookup_noref from __ipv4_neigh_lookup_noref, __ipv6_lookup_noref and neigh_lookup, allowing the lookup logic to be shared between the different implementations at what appears to be no cost. (Aka the same assembly is generated for ip6_finish_output2 and ip_finish_output2). After that I add a simple function that takes an address family and an address, consults the neighbour table and sends the packet to the appropriate location. The address family argument decouples callers of neigh_xmit from the address families the packets are sent over. (Aka the ipv6 module can be loaded after mpls and a previously configured ipv6 next hop will start working). The refactoring in ___neigh_lookup_noref may be a bit overkill but it feels like the right thing to do, especially since the same code is generated. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:23:37 -05:00
Eric W. Biederman	4fd3d7d9e8	neigh: Add helper function neigh_xmit For MPLS I am building the code so that either the neighbour mac address can be specified or we can have a next hop in ipv4 or ipv6. The kind of next hop we have is indicated by the neighbour table pointer. A neighbour table pointer of NULL is a link layer address. A non-NULL neighbour table pointer indicates which neighbour table and thus which address family the next hop address is in that we need to look up. The code either sends a packet directly or looks up the appropriate neighbour table entry and sends the packet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:23:23 -05:00
Eric W. Biederman	60395a20ff	neigh: Factor out ___neigh_lookup_noref While looking at the mpls code I found myself writing yet another version of neigh_lookup_noref. We currently have __ipv4_lookup_noref and __ipv6_lookup_noref. So to make my work a little easier and to make it a smidge easier to verify/maintain the mpls code in the future I stopped and wrote ___neigh_lookup_noref. Then I rewrote __ipv4_lookup_noref and __ipv6_lookup_noref in terms of this new function. I tested my new version by verifying that the same code is generated in ip_finish_output2 and ip6_finish_output2 where these functions are inlined. To get to ___neigh_lookup_noref I added a new neighbour cache table function key_eq, so that the static size of the key would be available. I also added __neigh_lookup_noref for people who want to look up a neighbour table entry quickly but don't know which neighbour table they are going to look up. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:23:23 -05:00
Johannes Berg	2f56f6be47	bridge: fix bridge netlink RCU usage When the STP timer fires, it can call br_ifinfo_notify(), which in turn ends up in the new br_get_link_af_size(). This function is annotated to be using RTNL locking, which clearly isn't the case here, and thus lockdep warns: =============================== [ INFO: suspicious RCU usage. ] 3.19.0+ #569 Not tainted ------------------------------- net/bridge/br_private.h:204 suspicious rcu_dereference_protected() usage! Fix this by doing RCU locking here. Fixes: `b7853d73e3` ("bridge: add vlan info to bridge setlink and dellink notification messages") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-04 00:20:22 -05:00
David S. Miller	71a83a6db6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-03-03 21:16:48 -05:00
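
Note on commit 0189197f44 (mpls: Basic routing support): the message describes decoding the MPLS header by byteswapping a 32-bit big-endian word and then bit shifting. The following is a minimal userspace sketch of that idea, not the kernel's code. The field layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, 8-bit TTL) is taken from RFC 3032; the names lse_fields, lse_decode and lse_encode are hypothetical and chosen only for this illustration.

```c
#include <arpa/inet.h>  /* ntohl, htonl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit positions inside one 32-bit label stack entry, per RFC 3032. */
#define LSE_LABEL_SHIFT 12
#define LSE_TC_SHIFT     9
#define LSE_BOS_SHIFT    8

/* Hypothetical host-order view of a label stack entry. */
struct lse_fields {
    uint32_t label; /* 20 bits */
    uint8_t  tc;    /* 3 bits, traffic class */
    uint8_t  bos;   /* 1 bit, bottom of stack */
    uint8_t  ttl;   /* 8 bits */
};

/* Decode: copy the wire bytes, swap to CPU order, then shift and mask. */
static struct lse_fields lse_decode(const uint8_t wire[4])
{
    uint32_t be, w;
    struct lse_fields f;

    memcpy(&be, wire, sizeof(be)); /* avoids unaligned access */
    w = ntohl(be);                 /* big endian -> CPU endian */

    f.label = w >> LSE_LABEL_SHIFT;
    f.tc    = (w >> LSE_TC_SHIFT) & 0x7;
    f.bos   = (w >> LSE_BOS_SHIFT) & 0x1;
    f.ttl   = w & 0xff;
    return f;
}

/* Encode: build the word in CPU order, swap once when writing out. */
static void lse_encode(struct lse_fields f, uint8_t wire[4])
{
    uint32_t w, be;

    assert(f.label < (1u << 20) && f.tc < 8 && f.bos < 2);
    w = (f.label << LSE_LABEL_SHIFT) |
        ((uint32_t)f.tc << LSE_TC_SHIFT) |
        ((uint32_t)f.bos << LSE_BOS_SHIFT) |
        f.ttl;
    be = htonl(w);
    memcpy(wire, &be, sizeof(be));
}
```

Working on the whole word in CPU order sidesteps the bit-field problem the commit message mentions: the 20-bit label straddles a byte boundary, so only the single swap at the packet edge depends on endianness.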


506213 Commits