Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
 "Here we go, another merge window full of networking and #ebpf changes:

   1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
      range without dealing with floods of ARP traffic, from Linus
      Lüssing.

   2) Throttle buffered multicast packet transmission in mt76, from
      Felix Fietkau.

   3) Support adaptive interrupt moderation in ice, from Brett Creeley.

   4) A lot of struct_size conversions, from Gustavo A. R. Silva.

   5) Add peek/push/pop commands to bpftool, as well as bash completion,
      from Stanislav Fomichev.

   6) Optimize sk_msg_clone(), from Vakul Garg.

   7) Add SO_BINDTOIFINDEX, from David Herrmann.

   8) Be more conservative with local resends due to local congestion,
      from Yuchung Cheng.

   9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.

  10) Add health buffer support to devlink, from Eran Ben Elisha.

  11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.

  12) Add statistics to basic packet scheduler filter, from Cong Wang.

  13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.

  14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.

  15) Add 3ad stats to bonding, from Nikolay Aleksandrov.

  16) Lots of probing improvements for bpftool, from Quentin Monnet.

  17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski.

  18) Allow #ebpf programs to access gso_segs from skb shared info, from
      Eric Dumazet.

  19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.

  20) Support 22260 iwlwifi devices, from Luca Coelho.

  21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.

  22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.

  23) Add spinlock support to #ebpf, from Alexei Starovoitov.

  24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.

  25) Add device infomation API to devlink, from Jakub Kicinski.

  26) Add new timestamping socket options which are y2038 safe, from
      Deepa Dinamani.

  27) Add RX checksum offloading for various sh_eth chips, from Sergei
      Shtylyov.

  28) Flow offload infrastructure, from Pablo Neira Ayuso.

  29) Numerous cleanups, improvements, and bug fixes to the PHY layer
      and many drivers from Heiner Kallweit.

  30) Lots of changes to try and make packet scheduler classifiers run
      lockless as much as possible, from Vlad Buslov.

  31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.

  32) Add concurrency tests to tc-tests infrastructure, from Vlad
      Buslov.

  33) Add hwmon support to aquantia, from Heiner Kallweit.

  34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.

  And I would be remiss if I didn't thank the various major networking
  subsystem maintainers for integrating much of this work before I even
  saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
  Johannes Berg, Kalle Valo, and many others. Thank you!"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
  net/sched: avoid unused-label warning
  net: ignore sysctl_devconf_inherit_init_net without SYSCTL
  phy: mdio-mux: fix Kconfig dependencies
  net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
  net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
  selftest/net: Remove duplicate header
  sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
  net/mlx5e: Update tx reporter status in case channels were successfully opened
  devlink: Add support for direct reporter health state update
  devlink: Update reporter state to error even if recover aborted
  sctp: call iov_iter_revert() after sending ABORT
  team: Free BPF filter when unregistering netdev
  ip6mr: Do not call __IP6_INC_STATS() from preemptible context
  isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
  net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
  cxgb4/chtls: Prefix adapter flags with CXGB4
  net-sysfs: Switch to bitmap_zalloc()
  mellanox: Switch to bitmap_zalloc()
  bpf: add test cases for non-pointer sanitiation logic
  mlxsw: i2c: Extend initialization by querying resources data
  ...
This commit is contained in:
Linus Torvalds 2019-03-05 08:26:13 -08:00
commit 6456300356
2150 changed files with 112827 additions and 58173 deletions

View File

@ -461,6 +461,11 @@
possible to determine what the correct size should be. possible to determine what the correct size should be.
This option provides an override for these situations. This option provides an override for these situations.
carrier_timeout=
[NET] Specifies amount of time (in seconds) that
the kernel should wait for a network carrier. By default
it waits 120 seconds.
ca_keys= [KEYS] This parameter identifies a specific key(s) on ca_keys= [KEYS] This parameter identifies a specific key(s) on
the system trusted keyring to be used for certificate the system trusted keyring to be used for certificate
trust validation. trust validation.

View File

@ -36,27 +36,27 @@ consideration important quirks of other architectures) and
defines calling convention that is compatible with C calling defines calling convention that is compatible with C calling
convention of the linux kernel on those architectures. convention of the linux kernel on those architectures.
Q: can multiple return values be supported in the future? Q: Can multiple return values be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF allows only register R0 to be used as return value. A: NO. BPF allows only register R0 to be used as return value.
Q: can more than 5 function arguments be supported in the future? Q: Can more than 5 function arguments be supported in the future?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: NO. BPF calling convention only allows registers R1-R5 to be used A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set. as arguments. BPF is not a standalone instruction set.
(unlike x64 ISA that allows msft, cdecl and other conventions) (unlike x64 ISA that allows msft, cdecl and other conventions)
Q: can BPF programs access instruction pointer or return address? Q: Can BPF programs access instruction pointer or return address?
----------------------------------------------------------------- -----------------------------------------------------------------
A: NO. A: NO.
Q: can BPF programs access stack pointer ? Q: Can BPF programs access stack pointer ?
------------------------------------------ ------------------------------------------
A: NO. A: NO.
Only frame pointer (register R10) is accessible. Only frame pointer (register R10) is accessible.
From compiler point of view it's necessary to have stack pointer. From compiler point of view it's necessary to have stack pointer.
For example LLVM defines register R11 as stack pointer in its For example, LLVM defines register R11 as stack pointer in its
BPF backend, but it makes sure that generated code never uses it. BPF backend, but it makes sure that generated code never uses it.
Q: Does C-calling convention diminishes possible use cases? Q: Does C-calling convention diminishes possible use cases?
@ -66,8 +66,8 @@ A: YES.
BPF design forces addition of major functionality in the form BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets kernel call into seamless interoperability between them. It lets kernel call into
BPF programs and programs call kernel helpers with zero overhead. BPF programs and programs call kernel helpers with zero overhead,
As all of them were native C code. That is particularly the case as all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from for JITed BPF programs that are indistinguishable from
native kernel C code. native kernel C code.
@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
------------------------------------------------------------------------ ------------------------------------------------------------------------
A: Soft yes. A: Soft yes.
At least for now until BPF core has support for At least for now, until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables, bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read only sections and all other normal constructs jump tables, read-only sections, and all other normal constructs
that C code can produce. that C code can produce.
Q: Can loops be supported in a safe way? Q: Can loops be supported in a safe way?
@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like?
A: This was necessary to avoid introducing flags into ISA which are A: This was necessary to avoid introducing flags into ISA which are
impossible to make generic and efficient across CPU architectures. impossible to make generic and efficient across CPU architectures.
Q: why BPF_DIV instruction doesn't map to x64 div? Q: Why BPF_DIV instruction doesn't map to x64 div?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because if we picked one-to-one relationship to x64 it would have made A: Because if we picked one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also it it more complicated to support on arm64 and other archs. Also it
needs div-by-zero runtime check. needs div-by-zero runtime check.
Q: why there is no BPF_SDIV for signed divide operation? Q: Why there is no BPF_SDIV for signed divide operation?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A: Because it would be rarely used. llvm errors in such case and A: Because it would be rarely used. llvm errors in such case and
prints a suggestion to use unsigned divide instead prints a suggestion to use unsigned divide instead.
Q: Why BPF has implicit prologue and epilogue? Q: Why BPF has implicit prologue and epilogue?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

848
Documentation/bpf/btf.rst Normal file
View File

@ -0,0 +1,848 @@
=====================
BPF Type Format (BTF)
=====================
1. Introduction
***************
BTF (BPF Type Format) is the metadata format which encodes the debug info
related to BPF program/map. The name BTF was used initially to describe data
types. The BTF was later extended to include function info for defined
subroutines, and line info for source/line information.
The debug info is used for map pretty print, function signature, etc. The
function signature enables better bpf program/function kernel symbol. The line
info helps generate source annotated translated byte code, jited code and
verifier log.
The BTF specification contains two parts,
* BTF kernel API
* BTF ELF file format
The kernel API is the contract between user space and kernel. The kernel
verifies the BTF info before using it. The ELF file format is a user space
contract between ELF file and libbpf loader.
The type and string sections are part of the BTF kernel API, describing the
debug info (mostly types related) referenced by the bpf program. These two
sections are discussed in details in :ref:`BTF_Type_String`.
.. _BTF_Type_String:
2. BTF Type and String Encoding
*******************************
The file ``include/uapi/linux/btf.h`` provides high-level definition of how
types/strings are encoded.
The beginning of data blob must be::
struct btf_header {
__u16 magic;
__u8 version;
__u8 flags;
__u32 hdr_len;
/* All offsets are in bytes relative to the end of this header */
__u32 type_off; /* offset of type section */
__u32 type_len; /* length of type section */
__u32 str_off; /* offset of string section */
__u32 str_len; /* length of string section */
};
The magic is ``0xeB9F``, which has different encoding for big and little
endian systems, and can be used to test whether BTF is generated for big- or
little-endian target. The ``btf_header`` is designed to be extensible with
``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is
generated.
2.1 String Encoding
===================
The first string in the string section must be a null string. The rest of
string table is a concatenation of other null-terminated strings.
2.2 Type Encoding
=================
The type id ``0`` is reserved for ``void`` type. The type section is parsed
sequentially and type id is assigned to each recognized type starting from id
``1``. Currently, the following types are supported::
#define BTF_KIND_INT 1 /* Integer */
#define BTF_KIND_PTR 2 /* Pointer */
#define BTF_KIND_ARRAY 3 /* Array */
#define BTF_KIND_STRUCT 4 /* Struct */
#define BTF_KIND_UNION 5 /* Union */
#define BTF_KIND_ENUM 6 /* Enumeration */
#define BTF_KIND_FWD 7 /* Forward */
#define BTF_KIND_TYPEDEF 8 /* Typedef */
#define BTF_KIND_VOLATILE 9 /* Volatile */
#define BTF_KIND_CONST 10 /* Const */
#define BTF_KIND_RESTRICT 11 /* Restrict */
#define BTF_KIND_FUNC 12 /* Function */
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */
Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
Each type contains the following common data::
struct btf_type {
__u32 name_off;
/* "info" bits arrangement
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-27: kind (e.g. int, ptr, array...etc)
* bits 28-30: unused
* bit 31: kind_flag, currently used by
* struct, union and fwd
*/
__u32 info;
/* "size" is used by INT, ENUM, STRUCT and UNION.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC and FUNC_PROTO.
* "type" is a type_id referring to another type.
*/
union {
__u32 size;
__u32 type;
};
};
For certain kinds, the common data are followed by kind-specific data. The
``name_off`` in ``struct btf_type`` specifies the offset in the string table.
The following sections detail encoding of each kind.
2.2.1 BTF_KIND_INT
~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: any valid offset
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_INT
* ``info.vlen``: 0
* ``size``: the size of the int type in bytes.
``btf_type`` is followed by a ``u32`` with the following bits arrangement::
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16)
#define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff)
The ``BTF_INT_ENCODING`` has the following attributes::
#define BTF_INT_SIGNED (1 << 0)
#define BTF_INT_CHAR (1 << 1)
#define BTF_INT_BOOL (1 << 2)
The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or
bool, for the int type. The char and bool encoding are mostly useful for
pretty print. At most one encoding can be specified for the int type.
The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int
type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equals to 4.
The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
for the type. The maximum value of ``BTF_INT_BITS()`` is 128.
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
for this int. For example, a bitfield struct member has: * btf member bit
offset 100 from the start of the structure, * btf member pointing to an int
type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``
Then in the struct memory layout, this member will occupy ``4`` bits starting
from bits ``100 + 2 = 102``.
Alternatively, the bitfield struct member can be the following to access the
same bits as the above:
* btf member bit offset 102,
* btf member pointing to an int type,
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``
The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of
bitfield encoding. Currently, both llvm and pahole generate
``BTF_INT_OFFSET() = 0`` for all int types.
2.2.2 BTF_KIND_PTR
~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_PTR
* ``info.vlen``: 0
* ``type``: the pointee type of the pointer
No additional type data follow ``btf_type``.
2.2.3 BTF_KIND_ARRAY
~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_ARRAY
* ``info.vlen``: 0
* ``size/type``: 0, not used
``btf_type`` is followed by one ``struct btf_array``::
struct btf_array {
__u32 type;
__u32 index_type;
__u32 nelems;
};
The ``struct btf_array`` encoding:
* ``type``: the element type
* ``index_type``: the index type
* ``nelems``: the number of elements for this array (``0`` is also allowed).
The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``,
``u64``, ``unsigned __int128``). The original design of including
``index_type`` follows DWARF, which has an ``index_type`` for its array type.
Currently in BTF, beyond type verification, the ``index_type`` is not used.
The ``struct btf_array`` allows chaining through element type to represent
multidimensional arrays. For example, for ``int a[5][6]``, the following type
information illustrates the chaining:
* [1]: int
* [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
* [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``
Currently, both pahole and llvm collapse multidimensional array into
one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is
equal to ``30``. This is because the original use case is map pretty print
where the whole array is dumped out so one-dimensional array is enough. As
more BTF usage is explored, pahole and llvm can be changed to generate proper
chained representation for multidimensional arrays.
2.2.4 BTF_KIND_STRUCT
~~~~~~~~~~~~~~~~~~~~~
2.2.5 BTF_KIND_UNION
~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0 or offset to a valid C identifier
* ``info.kind_flag``: 0 or 1
* ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION
* ``info.vlen``: the number of struct/union members
* ``info.size``: the size of the struct/union in bytes
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.::
struct btf_member {
__u32 name_off;
__u32 type;
__u32 offset;
};
``struct btf_member`` encoding:
* ``name_off``: offset to a valid C identifier
* ``type``: the member type
* ``offset``: <see below>
If the type info ``kind_flag`` is not set, the offset contains only bit offset
of the member. Note that the base type of the bitfield can only be int or enum
type. If the bitfield size is 32, the base type can be either int or enum
type. If the bitfield size is not 32, the base type must be int, and int type
``BTF_INT_BITS()`` encodes the bitfield size.
If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member
bitfield size and bit offset. The bitfield size and bit offset are calculated
as below.::
#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff)
In this case, if the base type is an int type, it must be a regular int type:
* ``BTF_INT_OFFSET()`` must be 0.
* ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.
The following kernel patch introduced ``kind_flag`` and explained why both
modes exist:
https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3
2.2.6 BTF_KIND_ENUM
~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0 or offset to a valid C identifier
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_ENUM
* ``info.vlen``: number of enum values
* ``size``: 4
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.::
struct btf_enum {
__u32 name_off;
__s32 val;
};
The ``btf_enum`` encoding:
* ``name_off``: offset to a valid C identifier
* ``val``: any value
2.2.7 BTF_KIND_FWD
~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a valid C identifier
* ``info.kind_flag``: 0 for struct, 1 for union
* ``info.kind``: BTF_KIND_FWD
* ``info.vlen``: 0
* ``type``: 0
No additional type data follow ``btf_type``.
2.2.8 BTF_KIND_TYPEDEF
~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a valid C identifier
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_TYPEDEF
* ``info.vlen``: 0
* ``type``: the type which can be referred by name at ``name_off``
No additional type data follow ``btf_type``.
2.2.9 BTF_KIND_VOLATILE
~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_VOLATILE
* ``info.vlen``: 0
* ``type``: the type with ``volatile`` qualifier
No additional type data follow ``btf_type``.
2.2.10 BTF_KIND_CONST
~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_CONST
* ``info.vlen``: 0
* ``type``: the type with ``const`` qualifier
No additional type data follow ``btf_type``.
2.2.11 BTF_KIND_RESTRICT
~~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_RESTRICT
* ``info.vlen``: 0
* ``type``: the type with ``restrict`` qualifier
No additional type data follow ``btf_type``.
2.2.12 BTF_KIND_FUNC
~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: offset to a valid C identifier
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_FUNC
* ``info.vlen``: 0
* ``type``: a BTF_KIND_FUNC_PROTO type
No additional type data follow ``btf_type``.
A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose
signature is defined by ``type``. The subprogram is thus an instance of that
type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
(ABI).
2.2.13 BTF_KIND_FUNC_PROTO
~~~~~~~~~~~~~~~~~~~~~~~~~~
``struct btf_type`` encoding requirement:
* ``name_off``: 0
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_FUNC_PROTO
* ``info.vlen``: # of parameters
* ``type``: the return type
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.::
struct btf_param {
__u32 name_off;
__u32 type;
};
If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type, then
``btf_param.name_off`` must point to a valid C identifier except for the
possible last argument representing the variable argument. The btf_param.type
refers to parameter type.
If the function has variable arguments, the last parameter is encoded with
``name_off = 0`` and ``type = 0``.
3. BTF Kernel API
*****************
The following bpf syscall command involves BTF:
* BPF_BTF_LOAD: load a blob of BTF data into kernel
* BPF_MAP_CREATE: map creation with btf key and value type info.
* BPF_PROG_LOAD: prog load with btf function and line info.
* BPF_BTF_GET_FD_BY_ID: get a btf fd
* BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info
and other btf related info are returned.
The workflow typically looks like:
::
Application:
BPF_BTF_LOAD
|
v
BPF_MAP_CREATE and BPF_PROG_LOAD
|
V
......
Introspection tool:
......
BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's)
|
V
BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd)
|
V
BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id)
| |
V |
BPF_BTF_GET_FD_BY_ID (get btf_fd) |
| |
V |
BPF_OBJ_GET_INFO_BY_FD (get btf) |
| |
V V
pretty print types, dump func signatures and line info, etc.
3.1 BPF_BTF_LOAD
================
Load a blob of BTF data into kernel. A blob of data, described in
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
is returned to a userspace.
3.2 BPF_MAP_CREATE
==================
A map can be created with ``btf_fd`` and specified key/value type id.::
__u32 btf_fd; /* fd pointing to a BTF type data */
__u32 btf_key_type_id; /* BTF type_id of the key */
__u32 btf_value_type_id; /* BTF type_id of the value */
In libbpf, the map can be defined with extra annotation like below:
::
struct bpf_map_def SEC("maps") btf_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(struct ipv_counts),
.max_entries = 4,
};
BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);
Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and
value types for the map. During ELF parsing, libbpf is able to extract
key/value type_id's and assign them to BPF_MAP_CREATE attributes
automatically.
.. _BPF_Prog_Load:
3.3 BPF_PROG_LOAD
=================
During prog_load, func_info and line_info can be passed to kernel with proper
values for the following attributes:
::
__u32 insn_cnt;
__aligned_u64 insns;
......
__u32 prog_btf_fd; /* fd pointing to BTF type data */
__u32 func_info_rec_size; /* userspace bpf_func_info size */
__aligned_u64 func_info; /* func info */
__u32 func_info_cnt; /* number of bpf_func_info records */
__u32 line_info_rec_size; /* userspace bpf_line_info size */
__aligned_u64 line_info; /* line info */
__u32 line_info_cnt; /* number of bpf_line_info records */
The func_info and line_info are an array of below, respectively.::
struct bpf_func_info {
__u32 insn_off; /* [0, insn_cnt - 1] */
__u32 type_id; /* pointing to a BTF_KIND_FUNC type */
};
struct bpf_line_info {
__u32 insn_off; /* [0, insn_cnt - 1] */
__u32 file_name_off; /* offset to string table for the filename */
__u32 line_off; /* offset to string table for the source line */
__u32 line_col; /* line number and column number */
};
func_info_rec_size is the size of each func_info record, and
line_info_rec_size is the size of each line_info record. Passing the record
size to kernel make it possible to extend the record itself in the future.
Below are requirements for func_info:
* func_info[0].insn_off must be 0.
* the func_info insn_off is in strictly increasing order and matches
bpf func boundaries.
Below are requirements for line_info:
* the first insn in each func must have a line_info record pointing to it.
* the line_info insn_off is in strictly increasing order.
For line_info, the line number and column number are defined as below:
::
#define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10)
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
In kernel, every loaded program, map or btf has a unique id. The id won't
change during the lifetime of a program, map, or btf.
The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for
each command, to user space, for bpf program or maps, respectively, so an
inspection tool can inspect all programs and maps.
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
An introspection tool cannot use id to get details about program or maps.
A file descriptor needs to be obtained first for reference-counting purpose.
3.6 BPF_OBJ_GET_INFO_BY_FD
==========================
Once a program/map fd is acquired, an introspection tool can get the detailed
information from kernel about this fd, some of which are BTF-related. For
example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated
bpf byte codes, and jited_line_info.
3.7 BPF_BTF_GET_FD_BY_ID
========================
With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf
syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with
command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the
kernel with BPF_BTF_LOAD, can be retrieved.
With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection
tool has full btf knowledge and is able to pretty print map key/values, dump
func signatures and line info, along with byte/jit codes.
4. ELF File Format Interface
****************************
4.1 .BTF section
================
The .BTF section contains type and string data. The format of this section is
same as the one describe in :ref:`BTF_Type_String`.
.. _BTF_Ext_Section:
4.2 .BTF.ext section
====================
The .BTF.ext section encodes func_info and line_info which needs loader
manipulation before loading into the kernel.
The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h``
and ``tools/lib/bpf/btf.c``.
The current header of .BTF.ext section::
struct btf_ext_header {
__u16 magic;
__u8 version;
__u8 flags;
__u32 hdr_len;
/* All offsets are in bytes relative to the end of this header */
__u32 func_info_off;
__u32 func_info_len;
__u32 line_info_off;
__u32 line_info_len;
};
It is very similar to .BTF section. Instead of type/string section, it
contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details
about func_info and line_info record format.
The func_info is organized as below.::
func_info_rec_size
btf_ext_info_sec for section #1 /* func_info for section #1 */
btf_ext_info_sec for section #2 /* func_info for section #2 */
...
``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when
.BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of
func_info for each specific ELF section.::
struct btf_ext_info_sec {
__u32 sec_name_off; /* offset to section name */
__u32 num_info;
/* Followed by num_info * record_size number of bytes */
__u8 data[0];
};
Here, num_info must be greater than 0.
The line_info is organized as below.::
line_info_rec_size
btf_ext_info_sec for section #1 /* line_info for section #1 */
btf_ext_info_sec for section #2 /* line_info for section #2 */
...
``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when
.BTF.ext is generated.
The interpretation of ``bpf_func_info->insn_off`` and
``bpf_line_info->insn_off`` is different between kernel API and ELF API. For
kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct
bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
beginning of section (``btf_ext_info_sec->sec_name_off``).
5. Using BTF
************
5.1 bpftool map pretty print
============================
With BTF, the map key/value can be printed based on fields rather than simply
raw bytes. This is especially valuable for large structure or if your data
structure has bitfields. For example, for the following map,::
enum A { A1, A2, A3, A4, A5 };
typedef enum A ___A;
struct tmp_t {
char a1:4;
int a2:4;
int :4;
__u32 a3:4;
int b;
___A b1:4;
enum A b2:4;
};
struct bpf_map_def SEC("maps") tmpmap = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(struct tmp_t),
.max_entries = 1,
};
BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);
bpftool is able to pretty print like below:
::
[{
"key": 0,
"value": {
"a1": 0x2,
"a2": 0x4,
"a3": 0x6,
"b": 7,
"b1": 0x8,
"b2": 0xa
}
}
]
5.2 bpftool prog dump
=====================
The following is an example showing how func_info and line_info can help prog
dump with better kernel symbol names, function prototypes and line
information.::
$ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
[...]
int test_long_fname_2(struct dummy_tracepoint_args * arg):
bpf_prog_44a040bf25481309_test_long_fname_2:
; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
0: push %rbp
1: mov %rsp,%rbp
4: sub $0x30,%rsp
b: sub $0x28,%rbp
f: mov %rbx,0x0(%rbp)
13: mov %r13,0x8(%rbp)
17: mov %r14,0x10(%rbp)
1b: mov %r15,0x18(%rbp)
1f: xor %eax,%eax
21: mov %rax,0x20(%rbp)
25: xor %esi,%esi
; int key = 0;
27: mov %esi,-0x4(%rbp)
; if (!arg->sock)
2a: mov 0x8(%rdi),%rdi
; if (!arg->sock)
2e: cmp $0x0,%rdi
32: je 0x0000000000000070
34: mov %rbp,%rsi
; counts = bpf_map_lookup_elem(&btf_map, &key);
[...]
5.3 Verifier Log
================
The following is an example of how line_info can help debugging verification
failure.::
/* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
* is modified as below.
*/
data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end;
/*
if (data + 4 > data_end)
return XDP_DROP;
*/
*(u32 *)data = dst->dst;
$ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
; data = (void *)(long)xdp->data;
224: (79) r2 = *(u64 *)(r10 -112)
225: (61) r2 = *(u32 *)(r2 +0)
; *(u32 *)data = dst->dst;
226: (63) *(u32 *)(r2 +0) = r1
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
R2 offset is outside of the packet
6. BTF Generation
*****************
You need latest pahole
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't
support .BTF.ext and btf BTF_KIND_FUNC type yet. For example,::
-bash-4.4$ cat t.c
struct t {
int a:2;
int b:3;
int c:2;
} g;
-bash-4.4$ gcc -c -O2 -g t.c
-bash-4.4$ pahole -JV t.o
File t.o:
[1] STRUCT t kind_flag=1 size=4 vlen=3
a type_id=2 bitfield_size=2 bits_offset=0
b type_id=2 bitfield_size=3 bits_offset=2
c type_id=2 bitfield_size=2 bits_offset=5
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target
only. The assembly code (-S) is able to show the BTF encoding in assembly
format.::
-bash-4.4$ cat t2.c
typedef int __int32;
struct t2 {
int a2;
int (*f2)(char q1, __int32 q2, ...);
int (*f3)();
} g2;
int main() { return 0; }
int test() { return 0; }
-bash-4.4$ clang -c -g -O2 -target bpf t2.c
-bash-4.4$ readelf -S t2.o
......
[ 8] .BTF PROGBITS 0000000000000000 00000247
000000000000016e 0000000000000000 0 0 1
[ 9] .BTF.ext PROGBITS 0000000000000000 000003b5
0000000000000060 0000000000000000 0 0 1
[10] .rel.BTF.ext REL 0000000000000000 000007e0
0000000000000040 0000000000000010 16 9 8
......
-bash-4.4$ clang -S -g -O2 -target bpf t2.c
-bash-4.4$ cat t2.s
......
.section .BTF,"",@progbits
.short 60319 # 0xeb9f
.byte 1
.byte 0
.long 24
.long 0
.long 220
.long 220
.long 122
.long 0 # BTF_KIND_FUNC_PROTO(id = 1)
.long 218103808 # 0xd000000
.long 2
.long 83 # BTF_KIND_INT(id = 2)
.long 16777216 # 0x1000000
.long 4
.long 16777248 # 0x1000020
......
.byte 0 # string offset=0
.ascii ".text" # string offset=1
.byte 0
.ascii "/home/yhs/tmp-pahole/t2.c" # string offset=7
.byte 0
.ascii "int main() { return 0; }" # string offset=33
.byte 0
.ascii "int test() { return 0; }" # string offset=58
.byte 0
.ascii "int" # string offset=83
......
.section .BTF.ext,"",@progbits
.short 60319 # 0xeb9f
.byte 1
.byte 0
.long 24
.long 0
.long 28
.long 28
.long 44
.long 8 # FuncInfo
.long 1 # FuncInfo section string offset=1
.long 2
.long .Lfunc_begin0
.long 3
.long .Lfunc_begin1
.long 5
.long 16 # LineInfo
.long 1 # LineInfo section string offset=1
.long 2
.long .Ltmp0
.long 7
.long 33
.long 7182 # Line 7 Col 14
.long .Ltmp3
.long 7
.long 58
.long 8206 # Line 8 Col 14
7. Testing
**********
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.

View File

@ -15,6 +15,13 @@ that goes into great technical depth about the BPF Architecture.
The primary info for the bpf syscall is available in the `man-pages`_ The primary info for the bpf syscall is available in the `man-pages`_
for `bpf(2)`_. for `bpf(2)`_.
BPF Type Format (BTF)
=====================
.. toctree::
:maxdepth: 1
btf
Frequently asked questions (FAQ) Frequently asked questions (FAQ)

View File

@ -9,6 +9,9 @@ Required properties:
(more may be added later) are: (more may be added later) are:
"usb1286,204e" (Marvell 8997) "usb1286,204e" (Marvell 8997)
"usbcf3,e300" (Qualcomm QCA6174A)
"usb4ca,301a" (Qualcomm QCA6174A (Lite-On))
Also, vendors that use btusb may have device additional properties, e.g: Also, vendors that use btusb may have device additional properties, e.g:
Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt Documentation/devicetree/bindings/net/marvell-bt-8xxx.txt

View File

@ -7,6 +7,11 @@ Required properties:
of the following: of the following:
- "microchip,ksz9477" - "microchip,ksz9477"
- "microchip,ksz9897" - "microchip,ksz9897"
- "microchip,ksz9896"
- "microchip,ksz9567"
- "microchip,ksz8565"
- "microchip,ksz9893"
- "microchip,ksz9563"
Optional properties: Optional properties:
@ -19,58 +24,96 @@ Examples:
Ethernet switch connected via SPI to the host, CPU port wired to eth0: Ethernet switch connected via SPI to the host, CPU port wired to eth0:
eth0: ethernet@10001000 { eth0: ethernet@10001000 {
fixed-link { fixed-link {
speed = <1000>; speed = <1000>;
full-duplex; full-duplex;
}; };
}; };
spi1: spi@f8008000 { spi1: spi@f8008000 {
pinctrl-0 = <&pinctrl_spi_ksz>; pinctrl-0 = <&pinctrl_spi_ksz>;
cs-gpios = <&pioC 25 0>; cs-gpios = <&pioC 25 0>;
id = <1>; id = <1>;
ksz9477: ksz9477@0 { ksz9477: ksz9477@0 {
compatible = "microchip,ksz9477"; compatible = "microchip,ksz9477";
reg = <0>; reg = <0>;
spi-max-frequency = <44000000>; spi-max-frequency = <44000000>;
spi-cpha; spi-cpha;
spi-cpol; spi-cpol;
ports { ports {
#address-cells = <1>; #address-cells = <1>;
#size-cells = <0>; #size-cells = <0>;
port@0 { port@0 {
reg = <0>; reg = <0>;
label = "lan1"; label = "lan1";
}; };
port@1 { port@1 {
reg = <1>; reg = <1>;
label = "lan2"; label = "lan2";
}; };
port@2 { port@2 {
reg = <2>; reg = <2>;
label = "lan3"; label = "lan3";
}; };
port@3 { port@3 {
reg = <3>; reg = <3>;
label = "lan4"; label = "lan4";
}; };
port@4 { port@4 {
reg = <4>; reg = <4>;
label = "lan5"; label = "lan5";
}; };
port@5 { port@5 {
reg = <5>; reg = <5>;
label = "cpu"; label = "cpu";
ethernet = <&eth0>; ethernet = <&eth0>;
fixed-link { fixed-link {
speed = <1000>; speed = <1000>;
full-duplex; full-duplex;
}; };
}; };
}; };
}; };
}; ksz8565: ksz8565@0 {
compatible = "microchip,ksz8565";
reg = <0>;
spi-max-frequency = <44000000>;
spi-cpha;
spi-cpol;
ports {
#address-cells = <1>;
#size-cells = <0>;
port@0 {
reg = <0>;
label = "lan1";
};
port@1 {
reg = <1>;
label = "lan2";
};
port@2 {
reg = <2>;
label = "lan3";
};
port@3 {
reg = <3>;
label = "lan4";
};
port@6 {
reg = <6>;
label = "cpu";
ethernet = <&eth0>;
fixed-link {
speed = <1000>;
full-duplex;
};
};
};
};
};

View File

@ -3,12 +3,16 @@ Mediatek MT7530 Ethernet switch
Required properties: Required properties:
- compatible: Must be compatible = "mediatek,mt7530"; - compatible: may be compatible = "mediatek,mt7530"
or compatible = "mediatek,mt7621"
- #address-cells: Must be 1. - #address-cells: Must be 1.
- #size-cells: Must be 0. - #size-cells: Must be 0.
- mediatek,mcm: Boolean; if defined, indicates that either MT7530 is the part - mediatek,mcm: Boolean; if defined, indicates that either MT7530 is the part
on multi-chip module belong to MT7623A has or the remotely standalone on multi-chip module belong to MT7623A has or the remotely standalone
chip as the function MT7623N reference board provided for. chip as the function MT7623N reference board provided for.
If compatible mediatek,mt7530 is set then the following properties are required
- core-supply: Phandle to the regulator node necessary for the core power. - core-supply: Phandle to the regulator node necessary for the core power.
- io-supply: Phandle to the regulator node necessary for the I/O power. - io-supply: Phandle to the regulator node necessary for the I/O power.
See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt See Documentation/devicetree/bindings/regulator/mt6323-regulator.txt

View File

@ -0,0 +1,69 @@
* ENETC ethernet device tree bindings
Depending on board design and ENETC port type (internal or
external) there are two supported link modes specified by
below device tree bindings.
Required properties:
- reg : Specifies PCIe Device Number and Function
Number of the ENETC endpoint device, according
to parent node bindings.
- compatible : Should be "fsl,enetc".
1) The ENETC external port is connected to a MDIO configurable phy:
In this case, the ENETC node should include a "mdio" sub-node
that in turn should contain the "ethernet-phy" node describing the
external phy. Below properties are required, their bindings
already defined in ethernet.txt or phy.txt, under
Documentation/devicetree/bindings/net/*.
Required:
- phy-handle : Phandle to a PHY on the MDIO bus.
Defined in ethernet.txt.
- phy-connection-type : Defined in ethernet.txt.
- mdio : "mdio" node, defined in mdio.txt.
- ethernet-phy : "ethernet-phy" node, defined in phy.txt.
Example:
ethernet@0,0 {
compatible = "fsl,enetc";
reg = <0x000000 0 0 0 0>;
phy-handle = <&sgmii_phy0>;
phy-connection-type = "sgmii";
mdio {
#address-cells = <1>;
#size-cells = <0>;
sgmii_phy0: ethernet-phy@2 {
reg = <0x2>;
};
};
};
2) The ENETC port is an internal port or has a fixed-link external
connection:
In this case, the ENETC port node defines a fixed link connection,
as specified by "fixed-link.txt", under
Documentation/devicetree/bindings/net/*.
Required:
- fixed-link : "fixed-link" node, defined in "fixed-link.txt".
Example:
ethernet@0,2 {
compatible = "fsl,enetc";
reg = <0x000200 0 0 0 0>;
fixed-link {
speed = <1000>;
full-duplex;
};
};

View File

@ -3,8 +3,8 @@
Required properties: Required properties:
- compatible: Should be "cdns,[<chip>-]{macb|gem}" - compatible: Should be "cdns,[<chip>-]{macb|gem}"
Use "cdns,at91rm9200-emac" Atmel at91rm9200 SoC. Use "cdns,at91rm9200-emac" Atmel at91rm9200 SoC.
Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs or the 10/100Mbit IP Use "cdns,at91sam9260-macb" for Atmel at91sam9 SoCs.
available on sama5d3 SoCs. Use "cdns,sam9x60-macb" for Microchip sam9x60 SoC.
Use "cdns,np4-macb" for NP4 SoC devices. Use "cdns,np4-macb" for NP4 SoC devices.
Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb". Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on

View File

@ -19,7 +19,7 @@ Optional properties:
"marvell,armada-370-neta" and 9800B for others. "marvell,armada-370-neta" and 9800B for others.
- clock-names: List of names corresponding to clocks property; shall be - clock-names: List of names corresponding to clocks property; shall be
"core" for core clock and "bus" for the optional bus clock. "core" for core clock and "bus" for the optional bus clock.
- phys: comphy for the ethernet port, see ../phy/phy-bindings.txt
Optional properties (valid only for Armada XP/38x): Optional properties (valid only for Armada XP/38x):

View File

@ -0,0 +1,82 @@
Properties for an MDIO bus multiplexer consumer device
This is a special case of MDIO mux when MDIO mux is defined as a consumer
of a mux producer device. The mux producer can be of any type like mmio mux
producer, gpio mux producer or generic register based mux producer.
Required properties in addition to the MDIO Bus multiplexer properties:
- compatible : should be "mmio-mux-multiplexer"
- mux-controls : mux controller node to use for operating the mux
- mdio-parent-bus : phandle to the parent MDIO bus.
each child node of mdio bus multiplexer consumer device represent a mdio
bus.
for more information please refer
Documentation/devicetree/bindings/mux/mux-controller.txt
and Documentation/devicetree/bindings/net/mdio-mux.txt
Example:
In below example the Mux producer and consumer are separate nodes.
&i2c0 {
fpga@66 { // fpga connected to i2c
compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c",
"simple-mfd";
reg = <0x66>;
mux: mux-controller { // Mux Producer
compatible = "reg-mux";
#mux-control-cells = <1>;
mux-reg-masks = <0x54 0xf8>, /* 0: reg 0x54, bits 7:3 */
<0x54 0x07>; /* 1: reg 0x54, bits 2:0 */
};
};
};
mdio-mux-1 { // Mux consumer
compatible = "mdio-mux-multiplexer";
mux-controls = <&mux 0>;
mdio-parent-bus = <&emdio1>;
#address-cells = <1>;
#size-cells = <0>;
mdio@0 {
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
};
mdio@8 {
reg = <0x8>;
#address-cells = <1>;
#size-cells = <0>;
};
..
..
};
mdio-mux-2 { // Mux consumer
compatible = "mdio-mux-multiplexer";
mux-controls = <&mux 1>;
mdio-parent-bus = <&emdio2>;
#address-cells = <1>;
#size-cells = <0>;
mdio@0 {
reg = <0x0>;
#address-cells = <1>;
#size-cells = <0>;
};
mdio@1 {
reg = <0x1>;
#address-cells = <1>;
#size-cells = <0>;
};
..
..
};

View File

@ -33,3 +33,67 @@ Example:
clock-names = "ref"; clock-names = "ref";
}; };
}; };
MediaTek UART based Bluetooth Devices
==================================
This device is a serial attached device to UART device and thus it must be a
child node of the serial node with UART.
Please refer to the following documents for generic properties:
Documentation/devicetree/bindings/serial/slave-device.txt
Required properties:
- compatible: Must be
"mediatek,mt7663u-bluetooth": for MT7663U device
"mediatek,mt7668u-bluetooth": for MT7668U device
- vcc-supply: Main voltage regulator
- pinctrl-names: Should be "default", "runtime"
- pinctrl-0: Should contain UART RXD low when the device is powered up to
enter proper bootstrap mode.
- pinctrl-1: Should contain UART mode pin ctrl
Optional properties:
- reset-gpios: GPIO used to reset the device whose initial state keeps low,
if the GPIO is missing, then board-level design should be
guaranteed.
- current-speed: Current baud rate of the device whose defaults to 921600
Example:
uart1_pins_boot: uart1-default {
pins-dat {
pinmux = <MT7623_PIN_81_URXD1_FUNC_GPIO81>;
output-low;
};
};
uart1_pins_runtime: uart1-runtime {
pins-dat {
pinmux = <MT7623_PIN_81_URXD1_FUNC_URXD1>,
<MT7623_PIN_82_UTXD1_FUNC_UTXD1>;
};
};
uart1: serial@11003000 {
compatible = "mediatek,mt7623-uart",
"mediatek,mt6577-uart";
reg = <0 0x11003000 0 0x400>;
interrupts = <GIC_SPI 52 IRQ_TYPE_LEVEL_LOW>;
clocks = <&pericfg CLK_PERI_UART1_SEL>,
<&pericfg CLK_PERI_UART1>;
clock-names = "baud", "bus";
bluetooth {
compatible = "mediatek,mt7663u-bluetooth";
vcc-supply = <&reg_5v>;
reset-gpios = <&pio 24 GPIO_ACTIVE_LOW>;
pinctrl-names = "default", "runtime";
pinctrl-0 = <&uart1_pins_boot>;
pinctrl-1 = <&uart1_pins_runtime>;
current-speed = <921600>;
};
};

View File

@ -1,16 +1,52 @@
* NI XGE Ethernet controller * NI XGE Ethernet controller
Required properties: Required properties:
- compatible: Should be "ni,xge-enet-2.00" - compatible: Should be "ni,xge-enet-3.00", but can be "ni,xge-enet-2.00" for
- reg: Address and length of the register set for the device older device trees with DMA engines co-located in the address map,
with the one reg entry to describe the whole device.
- reg: Address and length of the register set for the device. It contains the
information of registers in the same order as described by reg-names.
- reg-names: Should contain the reg names
"dma": DMA engine control and status region
"ctrl": MDIO and PHY control and status region
- interrupts: Should contain tx and rx interrupt - interrupts: Should contain tx and rx interrupt
- interrupt-names: Should be "rx" and "tx" - interrupt-names: Should be "rx" and "tx"
- phy-mode: See ethernet.txt file in the same directory. - phy-mode: See ethernet.txt file in the same directory.
- phy-handle: See ethernet.txt file in the same directory.
- nvmem-cells: Phandle of nvmem cell containing the MAC address - nvmem-cells: Phandle of nvmem cell containing the MAC address
- nvmem-cell-names: Should be "address" - nvmem-cell-names: Should be "address"
Optional properties:
- mdio subnode to indicate presence of MDIO controller
- fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
Use instead of phy-handle.
- phy-handle: See ethernet.txt file in the same directory.
Examples (10G generic PHY): Examples (10G generic PHY):
nixge0: ethernet@40000000 {
compatible = "ni,xge-enet-3.00";
reg = <0x40000000 0x4000
0x41002000 0x2000>;
reg-names = "dma", "ctrl";
nvmem-cells = <&eth1_addr>;
nvmem-cell-names = "address";
interrupts = <0 29 IRQ_TYPE_LEVEL_HIGH>, <0 30 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "rx", "tx";
interrupt-parent = <&intc>;
phy-mode = "xgmii";
phy-handle = <&ethernet_phy1>;
mdio {
ethernet_phy1: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <4>;
};
};
};
Examples (10G generic PHY, no MDIO):
nixge0: ethernet@40000000 { nixge0: ethernet@40000000 {
compatible = "ni,xge-enet-2.00"; compatible = "ni,xge-enet-2.00";
reg = <0x40000000 0x6000>; reg = <0x40000000 0x6000>;
@ -24,9 +60,33 @@ Examples (10G generic PHY):
phy-mode = "xgmii"; phy-mode = "xgmii";
phy-handle = <&ethernet_phy1>; phy-handle = <&ethernet_phy1>;
};
ethernet_phy1: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c45"; Examples (1G generic fixed-link + MDIO):
reg = <4>; nixge0: ethernet@40000000 {
}; compatible = "ni,xge-enet-2.00";
reg = <0x40000000 0x6000>;
nvmem-cells = <&eth1_addr>;
nvmem-cell-names = "address";
interrupts = <0 29 IRQ_TYPE_LEVEL_HIGH>, <0 30 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "rx", "tx";
interrupt-parent = <&intc>;
phy-mode = "xgmii";
fixed-link {
speed = <1000>;
pause;
link-gpios = <&gpio0 63 GPIO_ACTIVE_HIGH>;
};
mdio {
ethernet_phy1: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c22";
reg = <4>;
};
};
}; };

View File

@ -0,0 +1,64 @@
Qualcomm Ethernet ETHQOS device
This documents dwmmac based ethernet device which supports Gigabit
ethernet for version v2.3.0 onwards.
This device has following properties:
Required properties:
- compatible: Should be qcom,qcs404-ethqos"
- reg: Address and length of the register set for the device
- reg-names: Should contain register names "stmmaceth", "rgmii"
- clocks: Should contain phandle to clocks
- clock-names: Should contain clock names "stmmaceth", "pclk",
"ptp_ref", "rgmii"
- interrupts: Should contain phandle to interrupts
- interrupt-names: Should contain interrupt names "macirq", "eth_lpi"
Rest of the properties are defined in stmmac.txt file in same directory
Example:
ethernet: ethernet@7a80000 {
compatible = "qcom,qcs404-ethqos";
reg = <0x07a80000 0x10000>,
<0x07a96000 0x100>;
reg-names = "stmmaceth", "rgmii";
clock-names = "stmmaceth", "pclk", "ptp_ref", "rgmii";
clocks = <&gcc GCC_ETH_AXI_CLK>,
<&gcc GCC_ETH_SLAVE_AHB_CLK>,
<&gcc GCC_ETH_PTP_CLK>,
<&gcc GCC_ETH_RGMII_CLK>;
interrupts = <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "macirq", "eth_lpi";
snps,reset-gpio = <&tlmm 60 GPIO_ACTIVE_LOW>;
snps,reset-active-low;
snps,txpbl = <8>;
snps,rxpbl = <2>;
snps,aal;
snps,tso;
phy-handle = <&phy1>;
phy-mode = "rgmii";
mdio {
#address-cells = <0x1>;
#size-cells = <0x0>;
compatible = "snps,dwmac-mdio";
phy1: phy@4 {
device_type = "ethernet-phy";
reg = <0x4>;
};
};
};

View File

@ -4,6 +4,13 @@ This node provides properties for configuring the MediaTek mt76xx wireless
device. The node is expected to be specified as a child node of the PCI device. The node is expected to be specified as a child node of the PCI
controller to which the wireless chip is connected. controller to which the wireless chip is connected.
Alternatively, it can specify the wireless part of the MT7628/MT7688 SoC.
For SoC, use the compatible string "mediatek,mt7628-wmac" and the following
properties:
- reg: Address and length of the register set for the device.
- interrupts: Main device interrupt
Optional properties: Optional properties:
- mac-address: See ethernet.txt in the parent directory - mac-address: See ethernet.txt in the parent directory
@ -30,3 +37,15 @@ Optional nodes:
}; };
}; };
}; };
MT7628 example:
wmac: wmac@10300000 {
compatible = "mediatek,mt7628-wmac";
reg = <0x10300000 0x100000>;
interrupt-parent = <&cpuintc>;
interrupts = <6>;
mediatek,mtd-eeprom = <&factory 0x0000>;
};

View File

@ -0,0 +1,40 @@
mvebu armada 38x comphy driver
------------------------------
This comphy controller can be found on Marvell Armada 38x. It provides a
number of shared PHYs used by various interfaces (network, sata, usb,
PCIe...).
Required properties:
- compatible: should be "marvell,armada-380-comphy"
- reg: should contain the comphy register location and length.
- #address-cells: should be 1.
- #size-cells: should be 0.
A sub-node is required for each comphy lane provided by the comphy.
Required properties (child nodes):
- reg: comphy lane number.
- #phy-cells : from the generic phy bindings, must be 1. Defines the
input port to use for a given comphy lane.
Example:
comphy: phy@18300 {
compatible = "marvell,armada-380-comphy";
reg = <0x18300 0x100>;
#address-cells = <1>;
#size-cells = <0>;
cpm_comphy0: phy@0 {
reg = <0>;
#phy-cells = <1>;
};
cpm_comphy1: phy@1 {
reg = <1>;
#phy-cells = <1>;
};
};

View File

@ -17,6 +17,11 @@ Clock Properties:
- fsl,tmr-fiper1 Fixed interval period pulse generator. - fsl,tmr-fiper1 Fixed interval period pulse generator.
- fsl,tmr-fiper2 Fixed interval period pulse generator. - fsl,tmr-fiper2 Fixed interval period pulse generator.
- fsl,max-adj Maximum frequency adjustment in parts per billion. - fsl,max-adj Maximum frequency adjustment in parts per billion.
- fsl,extts-fifo The presence of this property indicates hardware
support for the external trigger stamp FIFO.
- little-endian The presence of this property indicates the 1588 timer
IP block is little-endian mode. The default endian mode
is big-endian.
These properties set the operational parameters for the PTP These properties set the operational parameters for the PTP
clock. You must choose these carefully for the clock to work right. clock. You must choose these carefully for the clock to work right.

View File

@ -125,6 +125,9 @@ functions/definitions
.. kernel-doc:: include/net/mac80211.h .. kernel-doc:: include/net/mac80211.h
:functions: ieee80211_rx_status :functions: ieee80211_rx_status
.. kernel-doc:: include/net/mac80211.h
:functions: mac80211_rx_encoding_flags
.. kernel-doc:: include/net/mac80211.h .. kernel-doc:: include/net/mac80211.h
:functions: mac80211_rx_flags :functions: mac80211_rx_flags

View File

@ -295,6 +295,41 @@ using::
For XDP_SKB mode, use the switch "-S" instead of "-N" and all options For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
can be displayed with "-h", as usual. can be displayed with "-h", as usual.
FAQ
=======
Q: I am not seeing any traffic on the socket. What am I doing wrong?
A: When a netdev of a physical NIC is initialized, Linux usually
allocates one Rx and Tx queue pair per core. So on a 8 core system,
queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
bind call or the xsk_socket__create libbpf function call, you
specify a specific queue id to bind to and it is only the traffic
towards that queue you are going to get on you socket. So in the
example above, if you bind to queue 0, you are NOT going to get any
traffic that is distributed to queues 1 through 7. If you are
lucky, you will see the traffic, but usually it will end up on one
of the queues you have not bound to.
There are a number of ways to solve the problem of getting the
traffic you want to the queue id you bound to. If you want to see
all the traffic, you can force the netdev to only have 1 queue, queue
id 0, and then bind to queue 0. You can use ethtool to do this::
sudo ethtool -L <interface> combined 1
If you want to only see part of the traffic, you can program the
NIC through ethtool to filter out your traffic to a single queue id
that you can bind your XDP socket to. Here is one example in which
UDP traffic to and from port 4242 are sent to queue 2::
sudo ethtool -N <interface> rx-flow-hash udp4 fn
sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
4242 action 2
A number of other ways are possible all up to the capabilitites of
the NIC you have.
Credits Credits
======= =======
@ -309,4 +344,3 @@ Credits
- Michael S. Tsirkin - Michael S. Tsirkin
- Qi Z Zhang - Qi Z Zhang
- Willem de Bruijn - Willem de Bruijn

View File

@ -27,11 +27,12 @@ Driver Overview
The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and The DPIO driver is bound to DPIO objects discovered on the fsl-mc bus and
provides services that: provides services that:
A) allow other drivers, such as the Ethernet driver, to enqueue and dequeue
A. allow other drivers, such as the Ethernet driver, to enqueue and dequeue
frames for their respective objects frames for their respective objects
B) allow drivers to register callbacks for data availability notifications B. allow drivers to register callbacks for data availability notifications
when data becomes available on a queue or channel when data becomes available on a queue or channel
C) allow drivers to manage hardware buffer pools C. allow drivers to manage hardware buffer pools
The Linux DPIO driver consists of 3 primary components-- The Linux DPIO driver consists of 3 primary components--
DPIO object driver-- fsl-mc driver that manages the DPIO object DPIO object driver-- fsl-mc driver that manages the DPIO object
@ -140,11 +141,10 @@ QBman portal interface (qbman-portal.c)
The qbman-portal component provides APIs to do the low level hardware The qbman-portal component provides APIs to do the low level hardware
bit twiddling for operations such as: bit twiddling for operations such as:
-initializing Qman software portals
-building and sending portal commands - initializing Qman software portals
- building and sending portal commands
-portal interrupt configuration and processing - portal interrupt configuration and processing
The qbman-portal APIs are not public to other drivers, and are The qbman-portal APIs are not public to other drivers, and are
only used by dpio-service. only used by dpio-service.

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
==============================================================
Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
============================================================== ==============================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
===========================================================
Linux* Base Driver for Intel(R) Ethernet Network Connection Linux* Base Driver for Intel(R) Ethernet Network Connection
=========================================================== ===========================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
======================================================
Linux* Driver for Intel(R) Ethernet Network Connection Linux* Driver for Intel(R) Ethernet Network Connection
====================================================== ======================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
==============================================================
Linux* Base Driver for Intel(R) Ethernet Multi-host Controller Linux* Base Driver for Intel(R) Ethernet Multi-host Controller
============================================================== ==============================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
==================================================================
Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series Linux* Base Driver for the Intel(R) Ethernet Controller 700 Series
================================================================== ==================================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
==================================================================
Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function Linux* Base Driver for Intel(R) Ethernet Adaptive Virtual Function
================================================================== ==================================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
===================================================================
Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series Linux* Base Driver for the Intel(R) Ethernet Connection E800 Series
=================================================================== ===================================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
===========================================================
Linux* Base Driver for Intel(R) Ethernet Network Connection Linux* Base Driver for Intel(R) Ethernet Network Connection
=========================================================== ===========================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
============================================================
Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet Linux* Base Virtual Function Driver for Intel(R) 1G Ethernet
============================================================ ============================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
=====================================================================
Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
===================================================================== =====================================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
=============================================================================
Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Adapters
============================================================================= =============================================================================

View File

@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0+ .. SPDX-License-Identifier: GPL-2.0+
=============================================================
Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet Linux* Base Virtual Function Driver for Intel(R) 10G Ethernet
============================================================= =============================================================

View File

@ -267,7 +267,7 @@ static struct fixed_phy_status stmmac0_fixed_phy_status = {
During the board's device_init we can configure the first During the board's device_init we can configure the first
MAC for fixed_link by calling: MAC for fixed_link by calling:
fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status, -1); fixed_phy_add(PHY_POLL, 1, &stmmac0_fixed_phy_status);
and the second one, with a real PHY device attached to the bus, and the second one, with a real PHY device attached to the bus,
by using the stmmac_mdio_bus_data structure (to provide the id, the by using the stmmac_mdio_bus_data structure (to provide the id, the
reset procedure etc). reset procedure etc).

View File

@ -0,0 +1,86 @@
The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
information.
The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.
The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
- Recovery procedures
- Diagnostics and object dump procedures
- OOB initial parameters
Different parts of the driver can register different types of health reporters
with different handlers.
Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and saved at the reporter instance (as long as
there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
- Auto-recovery configuration
- Grace period vs. time passed since last recover
The user interface:
User can access/change each reporter's parameters and driver specific callbacks
via devlink, e.g per error type (per health reporter)
- Configure reporter's generic parameters (like: disable/enable auto recovery)
- Invoke recovery procedure
- Run diagnostics
- Object dump
The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
Retrieves the last stored dump. Devlink health
saves a single dump. If an dump is not already stored by the devlink
for this reporter, devlink generates a new dump.
dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
Clears the last saved dump file for the specified reporter.
netlink
+--------------------------+
| |
| + |
| | |
+--------------------------+
|request for ops
|(diagnose,
mlx5_core devlink |recover,
|dump)
+--------+ +--------------------------+
| | | reporter| |
| | | +---------v----------+ |
| | ops execution | | | |
| <----------------------------------+ | |
| | | | | |
| | | + ^------------------+ |
| | | | request for ops |
| | | | (recover, dump) |
| | | | |
| | | +-+------------------+ |
| | health report | | health handler | |
| +-------------------------------> | |
| | | +--------------------+ |
| | health reporter create | |
| +----------------------------> |
+--------+ +--------------------------+

View File

@ -0,0 +1,43 @@
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
=====================
Devlink info versions
=====================
board.id
========
Unique identifier of the board design.
board.rev
=========
Board design revision.
board.manufacture
=================
An identifier of the company or the facility which produced the part.
fw.mgmt
=======
Control unit firmware version. This firmware is responsible for house
keeping tasks, PHY control etc. but not the packet-by-packet data path
operation.
fw.app
======
Data path microcode controlling high-speed packet processing.
fw.undi
=======
UNDI software, may include the UEFI driver, firmware or both.
fw.ncsi
=======
Version of the software responsible for supporting/handling the
Network Controller Sideband Interface.

View File

@ -0,0 +1,10 @@
fw_load_policy [DEVICE, GENERIC]
Configuration mode: driverinit
acl_region_rehash_interval [DEVICE, DRIVER-SPECIFIC]
Sets an interval for periodic ACL region rehashes.
The value is in milliseconds, minimal value is "3000".
Value "0" disables the periodic work.
The first rehash will be run right after value is set.
Type: u32
Configuration mode: runtime

View File

@ -236,19 +236,6 @@ description.
Design limitations Design limitations
================== ==================
DSA is a platform device driver
-------------------------------
DSA is implemented as a DSA platform device driver which is convenient because
it will register the entire DSA switch tree attached to a master network device
in one-shot, facilitating the device creation and simplifying the device driver
model a bit, this comes however with a number of limitations:
- building DSA and its switch drivers as modules is currently not working
- the device driver parenting does not necessarily reflect the original
bus/device the switch can be created from
- supporting non-MDIO and non-MMIO (platform) switches is not possible
Limits on the number of devices and ports Limits on the number of devices and ports
----------------------------------------- -----------------------------------------

View File

@ -464,10 +464,11 @@ breakpoints: 0 1
JIT compiler JIT compiler
------------ ------------
The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC, The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC,
ARM, ARM64, MIPS and s390 and can be enabled through CONFIG_BPF_JIT. The JIT PowerPC, ARM, ARM64, MIPS, RISC-V and s390 and can be enabled through
compiler is transparently invoked for each attached filter from user space CONFIG_BPF_JIT. The JIT compiler is transparently invoked for each
or for internal kernel users if it has been previously enabled by root: attached filter from user space or for internal kernel users if it has
been previously enabled by root:
echo 1 > /proc/sys/net/core/bpf_jit_enable echo 1 > /proc/sys/net/core/bpf_jit_enable
@ -603,9 +604,10 @@ got from bpf_prog_create(), and 'ctx' the given context (e.g.
skb pointer). All constraints and restrictions from bpf_check_classic() apply skb pointer). All constraints and restrictions from bpf_check_classic() apply
before a conversion to the new layout is being done behind the scenes! before a conversion to the new layout is being done behind the scenes!
Currently, the classic BPF format is being used for JITing on most 32-bit Currently, the classic BPF format is being used for JITing on most
architectures, whereas x86-64, aarch64, s390x, powerpc64, sparc64, arm32 perform 32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
JIT compilation from eBPF instruction set. sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF
instruction set.
Some core changes of the new internal format: Some core changes of the new internal format:
@ -827,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9
is not used by socket filters either, but more complex filters may be running is not used by socket filters either, but more complex filters may be running
out of registers and would have to resort to spill/fill to stack. out of registers and would have to resort to spill/fill to stack.
Internal BPF can used as generic assembler for last step performance Internal BPF can be used as a generic assembler for last step performance
optimizations, socket filters and seccomp are using it as assembler. Tracing optimizations, socket filters and seccomp are using it as assembler. Tracing
filters may use it as assembler to generate code from kernel. In kernel usage filters may use it as assembler to generate code from kernel. In kernel usage
may not be bounded by security considerations, since generated internal BPF code may not be bounded by security considerations, since generated internal BPF code
@ -865,7 +867,7 @@ Three LSB bits store instruction class which is one of:
BPF_STX 0x03 BPF_STX 0x03 BPF_STX 0x03 BPF_STX 0x03
BPF_ALU 0x04 BPF_ALU 0x04 BPF_ALU 0x04 BPF_ALU 0x04
BPF_JMP 0x05 BPF_JMP 0x05 BPF_JMP 0x05 BPF_JMP 0x05
BPF_RET 0x06 [ class 6 unused, for future if needed ] BPF_RET 0x06 BPF_JMP32 0x06
BPF_MISC 0x07 BPF_ALU64 0x07 BPF_MISC 0x07 BPF_ALU64 0x07
When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ... When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...
@ -902,9 +904,9 @@ If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */ BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
BPF_END 0xd0 /* eBPF only: endianness conversion */ BPF_END 0xd0 /* eBPF only: endianness conversion */
If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of: If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of:
BPF_JA 0x00 BPF_JA 0x00 /* BPF_JMP only */
BPF_JEQ 0x10 BPF_JEQ 0x10
BPF_JGT 0x20 BPF_JGT 0x20
BPF_JGE 0x30 BPF_JGE 0x30
@ -912,8 +914,8 @@ If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
BPF_JNE 0x50 /* eBPF only: jump != */ BPF_JNE 0x50 /* eBPF only: jump != */
BPF_JSGT 0x60 /* eBPF only: signed '>' */ BPF_JSGT 0x60 /* eBPF only: signed '>' */
BPF_JSGE 0x70 /* eBPF only: signed '>=' */ BPF_JSGE 0x70 /* eBPF only: signed '>=' */
BPF_CALL 0x80 /* eBPF only: function call */ BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */
BPF_EXIT 0x90 /* eBPF only: function return */ BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */
BPF_JLT 0xa0 /* eBPF only: unsigned '<' */ BPF_JLT 0xa0 /* eBPF only: unsigned '<' */
BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */ BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */
BPF_JSLT 0xc0 /* eBPF only: signed '<' */ BPF_JSLT 0xc0 /* eBPF only: signed '<' */
@ -936,8 +938,9 @@ Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
operation. Classic BPF_RET | BPF_K means copy imm32 into return register operation. Classic BPF_RET | BPF_K means copy imm32 into return register
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
in eBPF means function exit only. The eBPF program needs to store return in eBPF means function exit only. The eBPF program needs to store return
value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
unused and reserved for future use. BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
operands for the comparisons instead.
For load and store instructions the 8-bit 'code' field is divided as: For load and store instructions the 8-bit 'code' field is divided as:

View File

@ -1,154 +1,38 @@
===============================
Linux IEEE 802.15.4 implementation IEEE 802.15.4 Developer's Guide
===============================
Introduction Introduction
============ ============
The IEEE 802.15.4 working group focuses on standardization of the bottom The IEEE 802.15.4 working group focuses on standardization of the bottom
two layers: Medium Access Control (MAC) and Physical access (PHY). And there two layers: Medium Access Control (MAC) and Physical access (PHY). And there
are mainly two options available for upper layers: are mainly two options available for upper layers:
- ZigBee - proprietary protocol from the ZigBee Alliance
- 6LoWPAN - IPv6 networking over low rate personal area networks - ZigBee - proprietary protocol from the ZigBee Alliance
- 6LoWPAN - IPv6 networking over low rate personal area networks
The goal of the Linux-wpan is to provide a complete implementation The goal of the Linux-wpan is to provide a complete implementation
of the IEEE 802.15.4 and 6LoWPAN protocols. IEEE 802.15.4 is a stack of the IEEE 802.15.4 and 6LoWPAN protocols. IEEE 802.15.4 is a stack
of protocols for organizing Low-Rate Wireless Personal Area Networks. of protocols for organizing Low-Rate Wireless Personal Area Networks.
The stack is composed of three main parts: The stack is composed of three main parts:
- IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
the generic Linux networking stack to transfer IEEE 802.15.4 data
messages and a special protocol over netlink for configuration/management
- MAC - provides access to shared channel and reliable data delivery
- PHY - represents device drivers
- IEEE 802.15.4 layer; We have chosen to use plain Berkeley socket API,
the generic Linux networking stack to transfer IEEE 802.15.4 data
messages and a special protocol over netlink for configuration/management
- MAC - provides access to shared channel and reliable data delivery
- PHY - represents device drivers
Socket API Socket API
========== ==========
int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0); .. c:function:: int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
.....
The address family, socket addresses etc. are defined in the The address family, socket addresses etc. are defined in the
include/net/af_ieee802154.h header or in the special header include/net/af_ieee802154.h header or in the special header
in the userspace package (see either http://wpan.cakelab.org/ or the in the userspace package (see either http://wpan.cakelab.org/ or the
git tree at https://github.com/linux-wpan/wpan-tools). git tree at https://github.com/linux-wpan/wpan-tools).
Kernel side
=============
Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
exports a management (e.g. MLME) and data API.
2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
possibly with some kinds of acceleration like automatic CRC computation and
comparation, automagic ACK handling, address matching, etc.
Those types of devices require different approach to be hooked into Linux kernel.
HardMAC
=======
See the header include/net/ieee802154_netdev.h. You have to implement Linux
net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
code via plain sk_buffs. On skb reception skb->cb must contain additional
info as described in the struct ieee802154_mac_cb. During packet transmission
the skb->cb is used to provide additional data to device's header_ops->create
function. Be aware that this data can be overridden later (when socket code
submits skb to qdisc), so if you need something from that cb later, you should
store info in the skb->data on your own.
To hook the MLME interface you have to populate the ml_priv field of your
net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
All other fields are required.
SoftMAC
=======
The MAC is the middle layer in the IEEE 802.15.4 Linux stack. This moment it
provides interface for drivers registration and management of slave interfaces.
NOTE: Currently the only monitor device type is supported - it's IEEE 802.15.4
stack interface for network sniffers (e.g. WireShark).
This layer is going to be extended soon.
See header include/net/mac802154.h and several drivers in
drivers/net/ieee802154/.
Device drivers API
==================
The include/net/mac802154.h defines following functions:
- struct ieee802154_hw *
ieee802154_alloc_hw(size_t priv_data_len, const struct ieee802154_ops *ops):
allocation of IEEE 802.15.4 compatible hardware device
- void ieee802154_free_hw(struct ieee802154_hw *hw):
freeing allocated hardware device
- int ieee802154_register_hw(struct ieee802154_hw *hw):
register PHY which is the allocated hardware device, in the system
- void ieee802154_unregister_hw(struct ieee802154_hw *hw):
freeing registered PHY
- void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb,
u8 lqi):
telling 802.15.4 module there is a new received frame in the skb with
the RF Link Quality Indicator (LQI) from the hardware device
- void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb,
bool ifs_handling):
telling 802.15.4 module the frame in the skb is or going to be
transmitted through the hardware device
The device driver must implement the following callbacks in the IEEE 802.15.4
operations structure at least:
struct ieee802154_ops {
...
int (*start)(struct ieee802154_hw *hw);
void (*stop)(struct ieee802154_hw *hw);
...
int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
int (*ed)(struct ieee802154_hw *hw, u8 *level);
int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
...
};
- int start(struct ieee802154_hw *hw):
handler that 802.15.4 module calls for the hardware device initialization.
- void stop(struct ieee802154_hw *hw):
handler that 802.15.4 module calls for the hardware device cleanup.
- int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb):
handler that 802.15.4 module calls for each frame in the skb going to be
transmitted through the hardware device.
- int ed(struct ieee802154_hw *hw, u8 *level):
handler that 802.15.4 module calls for Energy Detection from the hardware
device.
- int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel):
set radio for listening on specific channel of the hardware device.
Moreover IEEE 802.15.4 device operations structure should be filled.
Fake drivers
============
In addition there is a driver available which simulates a real device with
SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
provides a possibility to test and debug the stack without usage of real hardware.
See sources in drivers/net/ieee802154 folder for more details.
6LoWPAN Linux implementation 6LoWPAN Linux implementation
============================ ============================
@ -173,5 +57,124 @@ and net/ieee802154/6lowpan/*
To setup a 6LoWPAN interface you need: To setup a 6LoWPAN interface you need:
1. Add IEEE802.15.4 interface and set channel and PAN ID; 1. Add IEEE802.15.4 interface and set channel and PAN ID;
2. Add 6lowpan interface by command like: 2. Add 6lowpan interface by command like:
# ip link add link wpan0 name lowpan0 type lowpan # ip link add link wpan0 name lowpan0 type lowpan
3. Bring up 'lowpan0' interface 3. Bring up 'lowpan0' interface
Drivers
=======
Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
exports a management (e.g. MLME) and data API.
2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
possibly with some kinds of acceleration like automatic CRC computation and
comparation, automagic ACK handling, address matching, etc.
Those types of devices require different approach to be hooked into Linux kernel.
HardMAC
-------
See the header include/net/ieee802154_netdev.h. You have to implement Linux
net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
code via plain sk_buffs. On skb reception skb->cb must contain additional
info as described in the struct ieee802154_mac_cb. During packet transmission
the skb->cb is used to provide additional data to device's header_ops->create
function. Be aware that this data can be overridden later (when socket code
submits skb to qdisc), so if you need something from that cb later, you should
store info in the skb->data on your own.
To hook the MLME interface you have to populate the ml_priv field of your
net_device with a pointer to struct ieee802154_mlme_ops instance. The fields
assoc_req, assoc_resp, disassoc_req, start_req, and scan_req are optional.
All other fields are required.
SoftMAC
-------
The MAC is the middle layer in the IEEE 802.15.4 Linux stack. This moment it
provides interface for drivers registration and management of slave interfaces.
NOTE: Currently the only monitor device type is supported - it's IEEE 802.15.4
stack interface for network sniffers (e.g. WireShark).
This layer is going to be extended soon.
See header include/net/mac802154.h and several drivers in
drivers/net/ieee802154/.
Fake drivers
------------
In addition there is a driver available which simulates a real device with
SoftMAC (fakelb - IEEE 802.15.4 loopback driver) interface. This option
provides a possibility to test and debug the stack without usage of real hardware.
Device drivers API
==================
The include/net/mac802154.h defines following functions:
.. c:function:: struct ieee802154_dev *ieee802154_alloc_device (size_t priv_size, struct ieee802154_ops *ops)
Allocation of IEEE 802.15.4 compatible device.
.. c:function:: void ieee802154_free_device(struct ieee802154_dev *dev)
Freeing allocated device.
.. c:function:: int ieee802154_register_device(struct ieee802154_dev *dev)
Register PHY in the system.
.. c:function:: void ieee802154_unregister_device(struct ieee802154_dev *dev)
Freeing registered PHY.
.. c:function:: void ieee802154_rx_irqsafe(struct ieee802154_hw *hw, struct sk_buff *skb, u8 lqi):
Telling 802.15.4 module there is a new received frame in the skb with
the RF Link Quality Indicator (LQI) from the hardware device.
.. c:function:: void ieee802154_xmit_complete(struct ieee802154_hw *hw, struct sk_buff *skb, bool ifs_handling):
Telling 802.15.4 module the frame in the skb is or going to be
transmitted through the hardware device
The device driver must implement the following callbacks in the IEEE 802.15.4
operations structure at least::
struct ieee802154_ops {
...
int (*start)(struct ieee802154_hw *hw);
void (*stop)(struct ieee802154_hw *hw);
...
int (*xmit_async)(struct ieee802154_hw *hw, struct sk_buff *skb);
int (*ed)(struct ieee802154_hw *hw, u8 *level);
int (*set_channel)(struct ieee802154_hw *hw, u8 page, u8 channel);
...
};
.. c:function:: int start(struct ieee802154_hw *hw):
Handler that 802.15.4 module calls for the hardware device initialization.
.. c:function:: void stop(struct ieee802154_hw *hw):
Handler that 802.15.4 module calls for the hardware device cleanup.
.. c:function:: int xmit_async(struct ieee802154_hw *hw, struct sk_buff *skb):
Handler that 802.15.4 module calls for each frame in the skb going to be
transmitted through the hardware device.
.. c:function:: int ed(struct ieee802154_hw *hw, u8 *level):
Handler that 802.15.4 module calls for Energy Detection from the hardware
device.
.. c:function:: int set_channel(struct ieee802154_hw *hw, u8 page, u8 channel):
Set radio for listening on specific channel of the hardware device.
Moreover IEEE 802.15.4 device operations structure should be filled.

View File

@ -24,11 +24,15 @@ Contents:
device_drivers/intel/i40e device_drivers/intel/i40e
device_drivers/intel/iavf device_drivers/intel/iavf
device_drivers/intel/ice device_drivers/intel/ice
devlink-info-versions
ieee802154
kapi kapi
z8530book z8530book
msg_zerocopy msg_zerocopy
failover failover
net_failover net_failover
phy
sfp-phylink
alias alias
bridge bridge
snmp_counter snmp_counter

View File

@ -0,0 +1,447 @@
=====================
PHY Abstraction Layer
=====================
Purpose
=======
Most network devices consist of set of registers which provide an interface
to a MAC layer, which communicates with the physical connection through a
PHY. The PHY concerns itself with negotiating link parameters with the link
partner on the other side of the network connection (typically, an ethernet
cable), and provides a register interface to allow drivers to determine what
settings were chosen, and to configure what settings are allowed.
While these devices are distinct from the network devices, and conform to a
standard layout for the registers, it has been common practice to integrate
the PHY management code with the network driver. This has resulted in large
amounts of redundant code. Also, on embedded systems with multiple (and
sometimes quite different) ethernet controllers connected to the same
management bus, it is difficult to ensure safe use of the bus.
Since the PHYs are devices, and the management busses through which they are
accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
In doing so, it has these goals:
#. Increase code-reuse
#. Increase overall code-maintainability
#. Speed development time for new network drivers, and for new systems
Basically, this layer is meant to provide an interface to PHY devices which
allows network driver writers to write as little code as possible, while
still providing a full feature set.
The MDIO bus
============
Most network devices are connected to a PHY by means of a management bus.
Different devices use different busses (though some share common interfaces).
In order to take advantage of the PAL, each bus interface needs to be
registered as a distinct device.
#. read and write functions must be implemented. Their prototypes are::
int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
int read(struct mii_bus *bus, int mii_id, int regnum);
mii_id is the address on the bus for the PHY, and regnum is the register
number. These functions are guaranteed not to be called from interrupt
time, so it is safe for them to block, waiting for an interrupt to signal
the operation is complete
#. A reset function is optional. This is used to return the bus to an
initialized state.
#. A probe function is needed. This function should set up anything the bus
driver needs, setup the mii_bus structure, and register with the PAL using
mdiobus_register. Similarly, there's a remove function to undo all of
that (use mdiobus_unregister).
#. Like any driver, the device_driver structure must be configured, and init
exit functions are used to register the driver.
#. The bus must also be declared somewhere as a device, and registered.
As an example for how one driver implemented an mdio bus driver, see
drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
(RG)MII/electrical interface considerations
===========================================
The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
electrical signal interface using a synchronous 125Mhz clock signal and several
data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
between the clock line (RXC or TXC) and the data lines to let the PHY (clock
sink) have enough setup and hold times to sample the data lines correctly. The
PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
the PHY driver and optionally the MAC driver, implement the required delay. The
values of phy_interface_t must be understood from the perspective of the PHY
device itself, leading to the following:
* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
internal delay by itself, it assumes that either the Ethernet MAC (if capable
or the PCB traces) insert the correct 1.5-2ns delay
* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
for the transmit data lines (TXD[3:0]) processed by the PHY device
* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
for the receive data lines (RXD[3:0]) processed by the PHY device
* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
both transmit AND receive data lines from/to the PHY device
Whenever possible, use the PHY side RGMII delay for these reasons:
* PHY devices may offer sub-nanosecond granularity in how they allow a
receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
precision may be required to account for differences in PCB trace lengths
* PHY devices are typically qualified for a large range of applications
(industrial, medical, automotive...), and they provide a constant and
reliable delay across temperature/pressure/voltage ranges
* PHY device drivers in PHYLIB being reusable by nature, being able to
configure correctly a specified delay enables more designs with similar delay
requirements to be operate correctly
For cases where the PHY is not capable of providing this delay, but the
Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
configured correctly in order to provide the required transmit and/or receive
side delay from the perspective of the PHY device. Conversely, if the Ethernet
MAC driver looks at the phy_interface_t value, for any other mode but
PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
disabled.
In case neither the Ethernet MAC, nor the PHY are capable of providing the
required delays, as defined per the RGMII standard, several options may be
available:
* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
set of pins'strength, delays, and voltage; and it may be a suitable
option to insert the expected 2ns RGMII delay.
* Modifying the PCB design to include a fixed delay (e.g: using a specifically
designed serpentine), which may not require software configuration at all.
Common problems with RGMII delay mismatch
-----------------------------------------
When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
will most likely result in the clock and data line signals to be unstable when
the PHY or MAC take a snapshot of these signals to translate them into logical
1 or 0 states and reconstruct the data being transmitted/received. Typical
symptoms include:
* Transmission/reception partially works, and there is frequent or occasional
packet loss observed
* Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
or just discard them all
* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
(since there is enough setup/hold time in that case)
Connecting to a PHY
===================
Sometime during startup, the network driver needs to establish a connection
between the PHY device, and the network device. At this time, the PHY's bus
and drivers need to all have been loaded, so it is ready for the connection.
At this point, there are several ways to connect to the PHY:
#. The PAL handles everything, and only calls the network driver when
the link state changes, so it can react.
#. The PAL handles everything except interrupts (usually because the
controller has the interrupt registers).
#. The PAL handles everything, but checks in with the driver every second,
allowing the network driver to react first to any changes before the PAL
does.
#. The PAL serves only as a library of functions, with the network device
manually calling functions to update status, and configure the PHY
Letting the PHY Abstraction Layer do Everything
===============================================
If you choose option 1 (The hope is that every driver can, but to still be
useful to drivers that can't), connecting to the PHY is simple:
First, you need a function to react to changes in the link state. This
function follows this protocol::
static void adjust_link(struct net_device *dev);
Next, you need to know the device name of the PHY connected to this device.
The name will look something like, "0:00", where the first number is the
bus id, and the second is the PHY's address on that bus. Typically,
the bus is responsible for making its ID unique.
Now, to connect, just call this function::
phydev = phy_connect(dev, phy_name, &adjust_link, interface);
*phydev* is a pointer to the phy_device structure which represents the PHY.
If phy_connect is successful, it will return the pointer. dev, here, is the
pointer to your net_device. Once done, this function will have started the
PHY's software state machine, and registered for the PHY's interrupt, if it
has one. The phydev structure will be populated with information about the
current state, though the PHY will not yet be truly operational at this
point.
PHY-specific flags should be set in phydev->dev_flags prior to the call
to phy_connect() such that the underlying PHY driver can check for flags
and perform specific operations based on them.
This is useful if the system has put hardware restrictions on
the PHY/controller, of which the PHY needs to be aware.
*interface* is a u32 which specifies the connection type used
between the controller and the PHY. Examples are GMII, MII,
RGMII, and SGMII. For a full list, see include/linux/phy.h
Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
for these bitfields. Note that you should not SET any bits, except the
SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
put into an unsupported state.
Lastly, once the controller is ready to handle network traffic, you call
phy_start(phydev). This tells the PAL that you are ready, and configures the
PHY to connect to the network. If the MAC interrupt of your network driver
also handles PHY status changes, just set phydev->irq to PHY_IGNORE_INTERRUPT
before you call phy_start and use phy_mac_interrupt() from the network
driver. If you don't want to use interrupts, set phydev->irq to PHY_POLL.
phy_start() enables the PHY interrupts (if applicable) and starts the
phylib state machine.
When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev). This function also stops the phylib state machine and
disables PHY interrupts.
Pause frames / flow control
===========================
The PHY does not participate directly in flow control/pause frames except by
making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
controller supports such a thing. Since flow control/pause frames generation
involves the Ethernet MAC driver, it is recommended that this driver takes care
of properly indicating advertisement and support for such features by setting
the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
either before or after phy_connect() and/or as a result of implementing the
ethtool::set_pauseparam feature.
Keeping Close Tabs on the PAL
=============================
It is possible that the PAL's built-in state machine needs a little help to
keep your network device and the PHY properly in sync. If so, you can
register a helper function when connecting to the PHY, which will be called
every second before the state machine reacts to any changes. To do this, you
need to manually call phy_attach() and phy_prepare_link(), and then call
phy_start_machine() with the second argument set to point to your special
handler.
Currently there are no examples of how to use this functionality, and testing
on it has been limited because the author does not have any drivers which use
it (they all use option 1). So Caveat Emptor.
Doing it all yourself
=====================
There's a remote chance that the PAL's built-in state machine cannot track
the complex interactions between the PHY and your network device. If this is
so, you can simply call phy_attach(), and not call phy_start_machine or
phy_prepare_link(). This will mean that phydev->state is entirely yours to
handle (phy_start and phy_stop toggle between some of the states, so you
might need to avoid them).
An effort has been made to make sure that useful functionality can be
accessed without the state-machine running, and most of these functions are
descended from functions which did not interact with a complex state-machine.
However, again, no effort has been made so far to test running without the
state machine, so tryer beware.
Here is a brief rundown of the functions::
int phy_read(struct phy_device *phydev, u16 regnum);
int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
Simple read/write primitives. They invoke the bus's read/write function
pointers.
::
void phy_print_status(struct phy_device *phydev);
A convenience function to print out the PHY status neatly.
::
void phy_request_interrupt(struct phy_device *phydev);
Requests the IRQ for the PHY interrupts.
::
struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
phy_interface_t interface);
Attaches a network device to a particular PHY, binding the PHY to a generic
driver if none was found during bus initialization.
::
int phy_start_aneg(struct phy_device *phydev);
Using variables inside the phydev structure, either configures advertising
and resets autonegotiation, or disables autonegotiation, and configures
forced settings.
::
static inline int phy_read_status(struct phy_device *phydev);
Fills the phydev structure with up-to-date information about the current
settings in the PHY.
::
int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
Ethtool convenience functions.
::
int phy_mii_ioctl(struct phy_device *phydev,
struct mii_ioctl_data *mii_data, int cmd);
The MII ioctl. Note that this function will completely screw up the state
machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
use this only to write registers which are not standard, and don't set off
a renegotiation.
PHY Device Drivers
==================
With the PHY Abstraction Layer, adding support for new PHYs is
quite easy. In some cases, no work is required at all! However,
many PHYs require a little hand-holding to get up-and-running.
Generic PHY driver
------------------
If the desired PHY doesn't have any errata, quirks, or special
features you want to support, then it may be best to not add
support, and let the PHY Abstraction Layer's Generic PHY Driver
do all of the work.
Writing a PHY driver
--------------------
If you do need to write a PHY driver, the first thing to do is
make sure it can be matched with an appropriate PHY device.
This is done during bus initialization by reading the device's
UID (stored in registers 2 and 3), then comparing it to each
driver's phy_id field by ANDing it with each driver's
phy_id_mask field. Also, it needs a name. Here's an example::
static struct phy_driver dm9161_driver = {
.phy_id = 0x0181b880,
.name = "Davicom DM9161E",
.phy_id_mask = 0x0ffffff0,
...
}
Next, you need to specify what features (speed, duplex, autoneg,
etc) your PHY device and driver support. Most PHYs support
PHY_BASIC_FEATURES, but you can look in include/mii.h for other
features.
Each driver consists of a number of function pointers, documented
in include/linux/phy.h under the phy_driver structure.
Of these, only config_aneg and read_status are required to be
assigned by the driver code. The rest are optional. Also, it is
preferred to use the generic phy driver's versions of these two
functions if at all possible: genphy_read_status and
genphy_config_aneg. If this is not possible, it is likely that
you only need to perform some actions before and after invoking
these functions, and so your functions will wrap the generic
ones.
Feel free to look at the Marvell, Cicada, and Davicom drivers in
drivers/net/phy/ for examples (the lxt and qsemi drivers have
not been tested as of this writing).
The PHY's MMD register accesses are handled by the PAL framework
by default, but can be overridden by a specific PHY driver if
required. This could be the case if a PHY was released for
manufacturing before the MMD PHY register definitions were
standardized by the IEEE. Most modern PHYs will be able to use
the generic PAL framework for accessing the PHY's MMD registers.
An example of such usage is for Energy Efficient Ethernet support,
implemented in the PAL. This support uses the PAL to access MMD
registers for EEE query and configuration if the PHY supports
the IEEE standard access mechanisms, or can use the PHY's specific
access interfaces if overridden by the specific PHY driver. See
the Micrel driver in drivers/net/phy/ for an example of how this
can be implemented.
Board Fixups
============
Sometimes the specific interaction between the platform and the PHY requires
special handling. For instance, to change where the PHY's clock input is,
or to add a delay to account for latency issues in the data path. In order
to support such contingencies, the PHY Layer allows platform code to register
fixups to be run when the PHY is brought up (or subsequently reset).
When the PHY Layer brings up a PHY it checks to see if there are any fixups
registered for it, matching based on UID (contained in the PHY device's phy_id
field) and the bus identifier (contained in phydev->dev.bus_id). Both must
match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
wildcards for the bus ID and UID, respectively.
When a match is found, the PHY layer will invoke the run function associated
with the fixup. This function is passed a pointer to the phy_device of
interest. It should therefore only operate on that PHY.
The platform code can either register the fixup using phy_register_fixup()::
int phy_register_fixup(const char *phy_id,
u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
Or using one of the two stubs, phy_register_fixup_for_uid() and
phy_register_fixup_for_id()::
int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
int phy_register_fixup_for_id(const char *phy_id,
int (*run)(struct phy_device *));
The stubs set one of the two matching criteria, and set the other one to
match anything.
When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module,
unregister fixup and free allocate memory are required.
Call one of following function before unloading module::
int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
int phy_register_fixup_for_id(const char *phy_id);
Standards
=========
IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
RGMII v1.3:
http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
RGMII v2.0:
http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf

View File

@ -1,427 +0,0 @@
-------
PHY Abstraction Layer
(Updated 2008-04-08)
Purpose
Most network devices consist of set of registers which provide an interface
to a MAC layer, which communicates with the physical connection through a
PHY. The PHY concerns itself with negotiating link parameters with the link
partner on the other side of the network connection (typically, an ethernet
cable), and provides a register interface to allow drivers to determine what
settings were chosen, and to configure what settings are allowed.
While these devices are distinct from the network devices, and conform to a
standard layout for the registers, it has been common practice to integrate
the PHY management code with the network driver. This has resulted in large
amounts of redundant code. Also, on embedded systems with multiple (and
sometimes quite different) ethernet controllers connected to the same
management bus, it is difficult to ensure safe use of the bus.
Since the PHYs are devices, and the management busses through which they are
accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
In doing so, it has these goals:
1) Increase code-reuse
2) Increase overall code-maintainability
3) Speed development time for new network drivers, and for new systems
Basically, this layer is meant to provide an interface to PHY devices which
allows network driver writers to write as little code as possible, while
still providing a full feature set.
The MDIO bus
Most network devices are connected to a PHY by means of a management bus.
Different devices use different busses (though some share common interfaces).
In order to take advantage of the PAL, each bus interface needs to be
registered as a distinct device.
1) read and write functions must be implemented. Their prototypes are:
int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
int read(struct mii_bus *bus, int mii_id, int regnum);
mii_id is the address on the bus for the PHY, and regnum is the register
number. These functions are guaranteed not to be called from interrupt
time, so it is safe for them to block, waiting for an interrupt to signal
the operation is complete
2) A reset function is optional. This is used to return the bus to an
initialized state.
3) A probe function is needed. This function should set up anything the bus
driver needs, setup the mii_bus structure, and register with the PAL using
mdiobus_register. Similarly, there's a remove function to undo all of
that (use mdiobus_unregister).
4) Like any driver, the device_driver structure must be configured, and init
exit functions are used to register the driver.
5) The bus must also be declared somewhere as a device, and registered.
As an example for how one driver implemented an mdio bus driver, see
drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
(RG)MII/electrical interface considerations
The Reduced Gigabit Medium Independent Interface (RGMII) is a 12-pin
electrical signal interface using a synchronous 125Mhz clock signal and several
data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
between the clock line (RXC or TXC) and the data lines to let the PHY (clock
sink) have enough setup and hold times to sample the data lines correctly. The
PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
the PHY driver and optionally the MAC driver, implement the required delay. The
values of phy_interface_t must be understood from the perspective of the PHY
device itself, leading to the following:
* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
internal delay by itself, it assumes that either the Ethernet MAC (if capable
or the PCB traces) insert the correct 1.5-2ns delay
* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
for the transmit data lines (TXD[3:0]) processed by the PHY device
* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
for the receive data lines (RXD[3:0]) processed by the PHY device
* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
both transmit AND receive data lines from/to the PHY device
Whenever possible, use the PHY side RGMII delay for these reasons:
* PHY devices may offer sub-nanosecond granularity in how they allow a
receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
precision may be required to account for differences in PCB trace lengths
* PHY devices are typically qualified for a large range of applications
(industrial, medical, automotive...), and they provide a constant and
reliable delay across temperature/pressure/voltage ranges
* PHY device drivers in PHYLIB being reusable by nature, being able to
configure correctly a specified delay enables more designs with similar delay
requirements to be operate correctly
For cases where the PHY is not capable of providing this delay, but the
Ethernet MAC driver is capable of doing so, the correct phy_interface_t value
should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
configured correctly in order to provide the required transmit and/or receive
side delay from the perspective of the PHY device. Conversely, if the Ethernet
MAC driver looks at the phy_interface_t value, for any other mode but
PHY_INTERFACE_MODE_RGMII, it should make sure that the MAC-level delays are
disabled.
In case neither the Ethernet MAC, nor the PHY are capable of providing the
required delays, as defined per the RGMII standard, several options may be
available:
* Some SoCs may offer a pin pad/mux/controller capable of configuring a given
set of pins'strength, delays, and voltage; and it may be a suitable
option to insert the expected 2ns RGMII delay.
* Modifying the PCB design to include a fixed delay (e.g: using a specifically
designed serpentine), which may not require software configuration at all.
Common problems with RGMII delay mismatch
When there is a RGMII delay mismatch between the Ethernet MAC and the PHY, this
will most likely result in the clock and data line signals to be unstable when
the PHY or MAC take a snapshot of these signals to translate them into logical
1 or 0 states and reconstruct the data being transmitted/received. Typical
symptoms include:
* Transmission/reception partially works, and there is frequent or occasional
packet loss observed
* Ethernet MAC may report some or all packets ingressing with a FCS/CRC error,
or just discard them all
* Switching to lower speeds such as 10/100Mbits/sec makes the problem go away
(since there is enough setup/hold time in that case)
Connecting to a PHY
Sometime during startup, the network driver needs to establish a connection
between the PHY device, and the network device. At this time, the PHY's bus
and drivers need to all have been loaded, so it is ready for the connection.
At this point, there are several ways to connect to the PHY:
1) The PAL handles everything, and only calls the network driver when
the link state changes, so it can react.
2) The PAL handles everything except interrupts (usually because the
controller has the interrupt registers).
3) The PAL handles everything, but checks in with the driver every second,
allowing the network driver to react first to any changes before the PAL
does.
4) The PAL serves only as a library of functions, with the network device
manually calling functions to update status, and configure the PHY
Letting the PHY Abstraction Layer do Everything
If you choose option 1 (The hope is that every driver can, but to still be
useful to drivers that can't), connecting to the PHY is simple:
First, you need a function to react to changes in the link state. This
function follows this protocol:
static void adjust_link(struct net_device *dev);
Next, you need to know the device name of the PHY connected to this device.
The name will look something like, "0:00", where the first number is the
bus id, and the second is the PHY's address on that bus. Typically,
the bus is responsible for making its ID unique.
Now, to connect, just call this function:
phydev = phy_connect(dev, phy_name, &adjust_link, interface);
phydev is a pointer to the phy_device structure which represents the PHY. If
phy_connect is successful, it will return the pointer. dev, here, is the
pointer to your net_device. Once done, this function will have started the
PHY's software state machine, and registered for the PHY's interrupt, if it
has one. The phydev structure will be populated with information about the
current state, though the PHY will not yet be truly operational at this
point.
PHY-specific flags should be set in phydev->dev_flags prior to the call
to phy_connect() such that the underlying PHY driver can check for flags
and perform specific operations based on them.
This is useful if the system has put hardware restrictions on
the PHY/controller, of which the PHY needs to be aware.
interface is a u32 which specifies the connection type used
between the controller and the PHY. Examples are GMII, MII,
RGMII, and SGMII. For a full list, see include/linux/phy.h
Now just make sure that phydev->supported and phydev->advertising have any
values pruned from them which don't make sense for your controller (a 10/100
controller may be connected to a gigabit capable PHY, so you would need to
mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
for these bitfields. Note that you should not SET any bits, except the
SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
put into an unsupported state.
Lastly, once the controller is ready to handle network traffic, you call
phy_start(phydev). This tells the PAL that you are ready, and configures the
PHY to connect to the network. If you want to handle your own interrupts,
just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start.
Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL.
When you want to disconnect from the network (even if just briefly), you call
phy_stop(phydev).
Pause frames / flow control
The PHY does not participate directly in flow control/pause frames except by
making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
controller supports such a thing. Since flow control/pause frames generation
involves the Ethernet MAC driver, it is recommended that this driver takes care
of properly indicating advertisement and support for such features by setting
the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
either before or after phy_connect() and/or as a result of implementing the
ethtool::set_pauseparam feature.
Keeping Close Tabs on the PAL
It is possible that the PAL's built-in state machine needs a little help to
keep your network device and the PHY properly in sync. If so, you can
register a helper function when connecting to the PHY, which will be called
every second before the state machine reacts to any changes. To do this, you
need to manually call phy_attach() and phy_prepare_link(), and then call
phy_start_machine() with the second argument set to point to your special
handler.
Currently there are no examples of how to use this functionality, and testing
on it has been limited because the author does not have any drivers which use
it (they all use option 1). So Caveat Emptor.
Doing it all yourself
There's a remote chance that the PAL's built-in state machine cannot track
the complex interactions between the PHY and your network device. If this is
so, you can simply call phy_attach(), and not call phy_start_machine or
phy_prepare_link(). This will mean that phydev->state is entirely yours to
handle (phy_start and phy_stop toggle between some of the states, so you
might need to avoid them).
An effort has been made to make sure that useful functionality can be
accessed without the state-machine running, and most of these functions are
descended from functions which did not interact with a complex state-machine.
However, again, no effort has been made so far to test running without the
state machine, so tryer beware.
Here is a brief rundown of the functions:
int phy_read(struct phy_device *phydev, u16 regnum);
int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
Simple read/write primitives. They invoke the bus's read/write function
pointers.
void phy_print_status(struct phy_device *phydev);
A convenience function to print out the PHY status neatly.
int phy_start_interrupts(struct phy_device *phydev);
int phy_stop_interrupts(struct phy_device *phydev);
Requests the IRQ for the PHY interrupts, then enables them for
start, or disables then frees them for stop.
struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
phy_interface_t interface);
Attaches a network device to a particular PHY, binding the PHY to a generic
driver if none was found during bus initialization.
int phy_start_aneg(struct phy_device *phydev);
Using variables inside the phydev structure, either configures advertising
and resets autonegotiation, or disables autonegotiation, and configures
forced settings.
static inline int phy_read_status(struct phy_device *phydev);
Fills the phydev structure with up-to-date information about the current
settings in the PHY.
int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
Ethtool convenience functions.
int phy_mii_ioctl(struct phy_device *phydev,
struct mii_ioctl_data *mii_data, int cmd);
The MII ioctl. Note that this function will completely screw up the state
machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
use this only to write registers which are not standard, and don't set off
a renegotiation.
PHY Device Drivers
With the PHY Abstraction Layer, adding support for new PHYs is
quite easy. In some cases, no work is required at all! However,
many PHYs require a little hand-holding to get up-and-running.
Generic PHY driver
If the desired PHY doesn't have any errata, quirks, or special
features you want to support, then it may be best to not add
support, and let the PHY Abstraction Layer's Generic PHY Driver
do all of the work.
Writing a PHY driver
If you do need to write a PHY driver, the first thing to do is
make sure it can be matched with an appropriate PHY device.
This is done during bus initialization by reading the device's
UID (stored in registers 2 and 3), then comparing it to each
driver's phy_id field by ANDing it with each driver's
phy_id_mask field. Also, it needs a name. Here's an example:
static struct phy_driver dm9161_driver = {
.phy_id = 0x0181b880,
.name = "Davicom DM9161E",
.phy_id_mask = 0x0ffffff0,
...
}
Next, you need to specify what features (speed, duplex, autoneg,
etc) your PHY device and driver support. Most PHYs support
PHY_BASIC_FEATURES, but you can look in include/mii.h for other
features.
Each driver consists of a number of function pointers, documented
in include/linux/phy.h under the phy_driver structure.
Of these, only config_aneg and read_status are required to be
assigned by the driver code. The rest are optional. Also, it is
preferred to use the generic phy driver's versions of these two
functions if at all possible: genphy_read_status and
genphy_config_aneg. If this is not possible, it is likely that
you only need to perform some actions before and after invoking
these functions, and so your functions will wrap the generic
ones.
Feel free to look at the Marvell, Cicada, and Davicom drivers in
drivers/net/phy/ for examples (the lxt and qsemi drivers have
not been tested as of this writing).
The PHY's MMD register accesses are handled by the PAL framework
by default, but can be overridden by a specific PHY driver if
required. This could be the case if a PHY was released for
manufacturing before the MMD PHY register definitions were
standardized by the IEEE. Most modern PHYs will be able to use
the generic PAL framework for accessing the PHY's MMD registers.
An example of such usage is for Energy Efficient Ethernet support,
implemented in the PAL. This support uses the PAL to access MMD
registers for EEE query and configuration if the PHY supports
the IEEE standard access mechanisms, or can use the PHY's specific
access interfaces if overridden by the specific PHY driver. See
the Micrel driver in drivers/net/phy/ for an example of how this
can be implemented.
Board Fixups
Sometimes the specific interaction between the platform and the PHY requires
special handling. For instance, to change where the PHY's clock input is,
or to add a delay to account for latency issues in the data path. In order
to support such contingencies, the PHY Layer allows platform code to register
fixups to be run when the PHY is brought up (or subsequently reset).
When the PHY Layer brings up a PHY it checks to see if there are any fixups
registered for it, matching based on UID (contained in the PHY device's phy_id
field) and the bus identifier (contained in phydev->dev.bus_id). Both must
match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
wildcards for the bus ID and UID, respectively.
When a match is found, the PHY layer will invoke the run function associated
with the fixup. This function is passed a pointer to the phy_device of
interest. It should therefore only operate on that PHY.
The platform code can either register the fixup using phy_register_fixup():
int phy_register_fixup(const char *phy_id,
u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
Or using one of the two stubs, phy_register_fixup_for_uid() and
phy_register_fixup_for_id():
int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
int (*run)(struct phy_device *));
int phy_register_fixup_for_id(const char *phy_id,
int (*run)(struct phy_device *));
The stubs set one of the two matching criteria, and set the other one to
match anything.
When phy_register_fixup() or *_for_uid()/*_for_id() is called at module,
unregister fixup and free allocate memory are required.
Call one of following function before unloading module.
int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask);
int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask);
int phy_register_fixup_for_id(const char *phy_id);
Standards
IEEE Standard 802.3: CSMA/CD Access Method and Physical Layer Specifications, Section Two:
http://standards.ieee.org/getieee802/download/802.3-2008_section2.pdf
RGMII v1.3:
http://web.archive.org/web/20160303212629/http://www.hp.com/rnd/pdfs/RGMIIv1_3.pdf
RGMII v2.0:
http://web.archive.org/web/20160303171328/http://www.hp.com/rnd/pdfs/RGMIIv2_0_final_hp.pdf

View File

@ -0,0 +1,268 @@
.. SPDX-License-Identifier: GPL-2.0
=======
phylink
=======
Overview
========
phylink is a mechanism to support hot-pluggable networking modules
without needing to re-initialise the adapter on hot-plug events.
phylink supports conventional phylib-based setups, fixed link setups
and SFP (Small Formfactor Pluggable) modules at present.
Modes of operation
==================
phylink has several modes of operation, which depend on the firmware
settings.
1. PHY mode
In PHY mode, we use phylib to read the current link settings from
the PHY, and pass them to the MAC driver. We expect the MAC driver
to configure exactly the modes that are specified without any
negotiation being enabled on the link.
2. Fixed mode
Fixed mode is the same as PHY mode as far as the MAC driver is
concerned.
3. In-band mode
In-band mode is used with 802.3z, SGMII and similar interface modes,
and we are expecting to use and honor the in-band negotiation or
control word sent across the serdes channel.
By example, what this means is that:
.. code-block:: none
&eth {
phy = <&phy>;
phy-mode = "sgmii";
};
does not use in-band SGMII signalling. The PHY is expected to follow
exactly the settings given to it in its :c:func:`mac_config` function.
The link should be forced up or down appropriately in the
:c:func:`mac_link_up` and :c:func:`mac_link_down` functions.
.. code-block:: none
&eth {
managed = "in-band-status";
phy = <&phy>;
phy-mode = "sgmii";
};
uses in-band mode, where results from the PHY's negotiation are passed
to the MAC through the SGMII control word, and the MAC is expected to
acknowledge the control word. The :c:func:`mac_link_up` and
:c:func:`mac_link_down` functions must not force the MAC side link
up and down.
Rough guide to converting a network driver to sfp/phylink
=========================================================
This guide briefly describes how to convert a network driver from
phylib to the sfp/phylink support. Please send patches to improve
this documentation.
1. Optionally split the network driver's phylib update function into
three parts dealing with link-down, link-up and reconfiguring the
MAC settings. This can be done as a separate preparation commit.
An example of this preparation can be found in git commit fc548b991fb0.
2. Replace::
select FIXED_PHY
select PHYLIB
with::
select PHYLINK
in the driver's Kconfig stanza.
3. Add::
#include <linux/phylink.h>
to the driver's list of header files.
4. Add::
struct phylink *phylink;
to the driver's private data structure. We shall refer to the
driver's private data pointer as ``priv`` below, and the driver's
private data structure as ``struct foo_priv``.
5. Replace the following functions:
.. flat-table::
:header-rows: 1
:widths: 1 1
:stub-columns: 0
* - Original function
- Replacement function
* - phy_start(phydev)
- phylink_start(priv->phylink)
* - phy_stop(phydev)
- phylink_stop(priv->phylink)
* - phy_mii_ioctl(phydev, ifr, cmd)
- phylink_mii_ioctl(priv->phylink, ifr, cmd)
* - phy_ethtool_get_wol(phydev, wol)
- phylink_ethtool_get_wol(priv->phylink, wol)
* - phy_ethtool_set_wol(phydev, wol)
- phylink_ethtool_set_wol(priv->phylink, wol)
* - phy_disconnect(phydev)
- phylink_disconnect_phy(priv->phylink)
Please note that some of these functions must be called under the
rtnl lock, and will warn if not. This will normally be the case,
except if these are called from the driver suspend/resume paths.
6. Add/replace ksettings get/set methods with:
.. code-block:: c
static int foo_ethtool_set_link_ksettings(struct net_device *dev,
const struct ethtool_link_ksettings *cmd)
{
struct foo_priv *priv = netdev_priv(dev);
return phylink_ethtool_ksettings_set(priv->phylink, cmd);
}
static int foo_ethtool_get_link_ksettings(struct net_device *dev,
struct ethtool_link_ksettings *cmd)
{
struct foo_priv *priv = netdev_priv(dev);
return phylink_ethtool_ksettings_get(priv->phylink, cmd);
}
7. Replace the call to:
phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface);
and associated code with a call to:
err = phylink_of_phy_connect(priv->phylink, node, flags);
For the most part, ``flags`` can be zero; these flags are passed to
the of_phy_attach() inside this function call if a PHY is specified
in the DT node ``node``.
``node`` should be the DT node which contains the network phy property,
fixed link properties, and will also contain the sfp property.
The setup of fixed links should also be removed; these are handled
internally by phylink.
of_phy_connect() was also passed a function pointer for link updates.
This function is replaced by a different form of MAC updates
described below in (8).
Manipulation of the PHY's supported/advertised happens within phylink
based on the validate callback, see below in (8).
Note that the driver no longer needs to store the ``phy_interface``,
and also note that ``phy_interface`` becomes a dynamic property,
just like the speed, duplex etc. settings.
Finally, note that the MAC driver has no direct access to the PHY
anymore; that is because in the phylink model, the PHY can be
dynamic.
8. Add a :c:type:`struct phylink_mac_ops <phylink_mac_ops>` instance to
the driver, which is a table of function pointers, and implement
these functions. The old link update function for
:c:func:`of_phy_connect` becomes three methods: :c:func:`mac_link_up`,
:c:func:`mac_link_down`, and :c:func:`mac_config`. If step 1 was
performed, then the functionality will have been split there.
It is important that if in-band negotiation is used,
:c:func:`mac_link_up` and :c:func:`mac_link_down` do not prevent the
in-band negotiation from completing, since these functions are called
when the in-band link state changes - otherwise the link will never
come up.
The :c:func:`validate` method should mask the supplied supported mask,
and ``state->advertising`` with the supported ethtool link modes.
These are the new ethtool link modes, so bitmask operations must be
used. For an example, see drivers/net/ethernet/marvell/mvneta.c.
The :c:func:`mac_link_state` method is used to read the link state
from the MAC, and report back the settings that the MAC is currently
using. This is particularly important for in-band negotiation
methods such as 1000base-X and SGMII.
The :c:func:`mac_config` method is used to update the MAC with the
requested state, and must avoid unnecessarily taking the link down
when making changes to the MAC configuration. This means the
function should modify the state and only take the link down when
absolutely necessary to change the MAC configuration. An example
of how to do this can be found in :c:func:`mvneta_mac_config` in
drivers/net/ethernet/marvell/mvneta.c.
For further information on these methods, please see the inline
documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`.
9. Remove calls to of_parse_phandle() for the PHY,
of_phy_register_fixed_link() for fixed links etc. from the probe
function, and replace with:
.. code-block:: c
struct phylink *phylink;
phylink = phylink_create(dev, node, phy_mode, &phylink_ops);
if (IS_ERR(phylink)) {
err = PTR_ERR(phylink);
fail probe;
}
priv->phylink = phylink;
and arrange to destroy the phylink in the probe failure path as
appropriate and the removal path too by calling:
.. code-block:: c
phylink_destroy(priv->phylink);
10. Arrange for MAC link state interrupts to be forwarded into
phylink, via:
.. code-block:: c
phylink_mac_change(priv->phylink, link_is_up);
where ``link_is_up`` is true if the link is currently up or false
otherwise.
11. Verify that the driver does not call::
netif_carrier_on()
netif_carrier_off()
as these will interfere with phylink's tracking of the link state,
and cause phylink to omit calls via the :c:func:`mac_link_up` and
:c:func:`mac_link_down` methods.
Network drivers should call phylink_stop() and phylink_start() via their
suspend/resume paths, which ensures that the appropriate
:c:type:`struct phylink_mac_ops <phylink_mac_ops>` methods are called
as necessary.
For information describing the SFP cage in DT, please see the binding
documentation in the kernel source tree
``Documentation/devicetree/bindings/net/sff,sfp.txt``

View File

@ -1,16 +1,17 @@
=========== ============
SNMP counter SNMP counter
=========== ============
This document explains the meaning of SNMP counters. This document explains the meaning of SNMP counters.
General IPv4 counters General IPv4 counters
==================== =====================
All layer 4 packets and ICMP packets will change these counters, but All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or these counters won't be changed by layer 2 packets (such as STP) or
ARP packets. ARP packets.
* IpInReceives * IpInReceives
Defined in `RFC1213 ipInReceives`_ Defined in `RFC1213 ipInReceives`_
.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26 .. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26
@ -23,6 +24,7 @@ and so on). It indicates the number of aggregated segments after
GRO/LRO. GRO/LRO.
* IpInDelivers * IpInDelivers
Defined in `RFC1213 ipInDelivers`_ Defined in `RFC1213 ipInDelivers`_
.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28 .. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28
@ -33,6 +35,7 @@ supported protocols will be delivered, if someone listens on the raw
socket, all valid IP packets will be delivered. socket, all valid IP packets will be delivered.
* IpOutRequests * IpOutRequests
Defined in `RFC1213 ipOutRequests`_ Defined in `RFC1213 ipOutRequests`_
.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28 .. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28
@ -42,6 +45,7 @@ multicast packets, and would always be updated together with
IpExtOutOctets. IpExtOutOctets.
* IpExtInOctets and IpExtOutOctets * IpExtInOctets and IpExtOutOctets
They are Linux kernel extensions, no RFC definitions. Please note, They are Linux kernel extensions, no RFC definitions. Please note,
RFC1213 indeed defines ifInOctets and ifOutOctets, but they RFC1213 indeed defines ifInOctets and ifOutOctets, but they
are different things. The ifInOctets and ifOutOctets include the MAC are different things. The ifInOctets and ifOutOctets include the MAC
@ -49,6 +53,7 @@ layer header size but IpExtInOctets and IpExtOutOctets don't, they
only include the IP layer header and the IP layer data. only include the IP layer header and the IP layer data.
* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts * IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts
They indicate the number of four kinds of ECN IP packets, please refer They indicate the number of four kinds of ECN IP packets, please refer
`Explicit Congestion Notification`_ for more details. `Explicit Congestion Notification`_ for more details.
@ -60,6 +65,7 @@ for the same packet, you might find that IpInReceives count 1, but
IpExtInNoECTPkts counts 2 or more. IpExtInNoECTPkts counts 2 or more.
* IpInHdrErrors * IpInHdrErrors
Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
dropped due to the IP header error. It might happen in both IP input dropped due to the IP header error. It might happen in both IP input
and IP forward paths. and IP forward paths.
@ -67,6 +73,7 @@ and IP forward paths.
.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27 .. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpInAddrErrors * IpInAddrErrors
Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
scenarios: (1) The IP address is invalid. (2) The destination IP scenarios: (1) The IP address is invalid. (2) The destination IP
address is not a local address and IP forwarding is not enabled address is not a local address and IP forwarding is not enabled
@ -74,6 +81,7 @@ address is not a local address and IP forwarding is not enabled
.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27 .. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInNoRoutes * IpExtInNoRoutes
This counter means the packet is dropped when the IP stack receives a This counter means the packet is dropped when the IP stack receives a
packet and can't find a route for it from the route table. It might packet and can't find a route for it from the route table. It might
happen when IP forwarding is enabled and the destination IP address is happen when IP forwarding is enabled and the destination IP address is
@ -81,6 +89,7 @@ not a local address and there is no route for the destination IP
address. address.
* IpInUnknownProtos * IpInUnknownProtos
Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
layer 4 protocol is unsupported by kernel. If an application is using layer 4 protocol is unsupported by kernel. If an application is using
raw socket, kernel will always deliver the packet to the raw socket raw socket, kernel will always deliver the packet to the raw socket
@ -89,10 +98,12 @@ and this counter won't be increased.
.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27 .. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInTruncatedPkts * IpExtInTruncatedPkts
For IPv4 packet, it means the actual data size is smaller than the For IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header. "Total Length" field in the IPv4 header.
* IpInDiscards * IpInDiscards
Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped
in the IP receiving path and due to kernel internal reasons (e.g. no in the IP receiving path and due to kernel internal reasons (e.g. no
enough memory). enough memory).
@ -100,20 +111,23 @@ enough memory).
.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28 .. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutDiscards * IpOutDiscards
Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
dropped in the IP sending path and due to kernel internal reasons. dropped in the IP sending path and due to kernel internal reasons.
.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28 .. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutNoRoutes * IpOutNoRoutes
Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
dropped in the IP sending path and no route is found for it. dropped in the IP sending path and no route is found for it.
.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29 .. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29
ICMP counters ICMP counters
============ =============
* IcmpInMsgs and IcmpOutMsgs * IcmpInMsgs and IcmpOutMsgs
Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_ Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_
.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41 .. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
@ -126,6 +140,7 @@ IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program. a userspace program.
* ICMP named types * ICMP named types
| These counters include most of common ICMP types, they are: | These counters include most of common ICMP types, they are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_ | IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_ | IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
@ -180,6 +195,7 @@ straightforward. The 'In' counter means kernel receives such a packet
and the 'Out' counter means kernel sends such a packet. and the 'Out' counter means kernel sends such a packet.
* ICMP numeric types * ICMP numeric types
They are IcmpMsgInType[N] and IcmpMsgOutType[N], the [N] indicates the They are IcmpMsgInType[N] and IcmpMsgOutType[N], the [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definition could be found in the `ICMP parameters`_ ICMP type number definition could be found in the `ICMP parameters`_
@ -192,12 +208,14 @@ IcmpMsgOutType8 would increase 1. And if kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 would increase 1. packet, IcmpMsgInType0 would increase 1.
* IcmpInCsumErrors * IcmpInCsumErrors
This counter indicates the checksum of the ICMP packet is This counter indicates the checksum of the ICMP packet is
wrong. Kernel verifies the checksum after updating the IcmpInMsgs and wrong. Kernel verifies the checksum after updating the IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has bad checksum, the before updating IcmpMsgInType[N]. If a packet has bad checksum, the
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated. IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.
* IcmpInErrors and IcmpOutErrors * IcmpInErrors and IcmpOutErrors
Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_ Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_
.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41 .. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
@ -209,7 +227,7 @@ and the sending packet path use IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too. is increased, IcmpInErrors would always be increased too.
relationship of the ICMP counters relationship of the ICMP counters
------------------------------- ---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal or larger than IcmpInMsgs. When kernel IcmpInErrors should be equal or larger than IcmpInMsgs. When kernel
@ -229,8 +247,9 @@ IcmpInMsgs should be less than the sum of IcmpMsgOutType[N] plus
IcmpInErrors. IcmpInErrors.
General TCP counters General TCP counters
================== ====================
* TcpInSegs * TcpInSegs
Defined in `RFC1213 tcpInSegs`_ Defined in `RFC1213 tcpInSegs`_
.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48 .. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48
@ -247,6 +266,7 @@ isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase 1. counter would only increase 1.
* TcpOutSegs * TcpOutSegs
Defined in `RFC1213 tcpOutSegs`_ Defined in `RFC1213 tcpOutSegs`_
.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48 .. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48
@ -258,6 +278,7 @@ GSO, so if a packet would be split to 2 by GSO, TcpOutSegs will
increase 2. increase 2.
* TcpActiveOpens * TcpActiveOpens
Defined in `RFC1213 tcpActiveOpens`_ Defined in `RFC1213 tcpActiveOpens`_
.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47 .. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@ -267,6 +288,7 @@ state. Every time TcpActiveOpens increases 1, TcpOutSegs should always
increase 1. increase 1.
* TcpPassiveOpens * TcpPassiveOpens
Defined in `RFC1213 tcpPassiveOpens`_ Defined in `RFC1213 tcpPassiveOpens`_
.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47 .. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@ -275,6 +297,7 @@ It means the TCP layer receives a SYN, replies a SYN+ACK, come into
the SYN-RCVD state. the SYN-RCVD state.
* TcpExtTCPRcvCoalesce * TcpExtTCPRcvCoalesce
When packets are received by the TCP layer and are not be read by the When packets are received by the TCP layer and are not be read by the
application, the TCP layer will try to merge them. This counter application, the TCP layer will try to merge them. This counter
indicate how many packets are merged in such situation. If GRO is indicate how many packets are merged in such situation. If GRO is
@ -282,12 +305,14 @@ enabled, lots of packets would be merged by GRO, these packets
wouldn't be counted to TcpExtTCPRcvCoalesce. wouldn't be counted to TcpExtTCPRcvCoalesce.
* TcpExtTCPAutoCorking * TcpExtTCPAutoCorking
When sending packets, the TCP layer will try to merge small packets to When sending packets, the TCP layer will try to merge small packets to
a bigger one. This counter increase 1 for every packet merged in such a bigger one. This counter increase 1 for every packet merged in such
situation. Please refer to the LWN article for more details: situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/ https://lwn.net/Articles/576263/
* TcpExtTCPOrigDataSent * TcpExtTCPOrigDataSent
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below:: explaination below::
@ -297,6 +322,7 @@ explaination below::
more useful to track the TCP retransmission rate. more useful to track the TCP retransmission rate.
* TCPSynRetrans * TCPSynRetrans
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below:: explaination below::
@ -304,6 +330,7 @@ explaination below::
retransmissions into SYN, fast-retransmits, timeout retransmits, etc. retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
* TCPFastOpenActiveFail * TCPFastOpenActiveFail
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below:: explaination below::
@ -313,6 +340,7 @@ explaination below::
.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd .. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
* TcpExtListenOverflows and TcpExtListenDrops * TcpExtListenOverflows and TcpExtListenDrops
When kernel receives a SYN from a client, and if the TCP accept queue When kernel receives a SYN from a client, and if the TCP accept queue
is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows. is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows.
At the same time kernel will also add 1 to TcpExtListenDrops. When a At the same time kernel will also add 1 to TcpExtListenDrops. When a
@ -336,17 +364,22 @@ time client replies ACK, this socket will get another chance to move
to the accept queue. to the accept queue.
TCP Fast Open
=============
* TcpEstabResets * TcpEstabResets
Defined in `RFC1213 tcpEstabResets`_. Defined in `RFC1213 tcpEstabResets`_.
.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48 .. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
* TcpAttemptFails * TcpAttemptFails
Defined in `RFC1213 tcpAttemptFails`_. Defined in `RFC1213 tcpAttemptFails`_.
.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48 .. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
* TcpOutRsts * TcpOutRsts
Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
the 'segments sent containing the RST flag', but in linux kernel, this the 'segments sent containing the RST flag', but in linux kernel, this
couner indicates the segments kerenl tried to send. The sending couner indicates the segments kerenl tried to send. The sending
@ -354,6 +387,30 @@ process might be failed due to some errors (e.g. memory alloc failed).
.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52 .. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
* TcpExtTCPSpuriousRtxHostQueues
When the TCP stack wants to retransmit a packet, and finds that packet
is not lost in the network, but the packet is not sent yet, the TCP
stack would give up the retransmission and update this counter. It
might happen if a packet stays too long time in a qdisc or driver
queue.
* TcpEstabResets
The socket receives a RST packet in Establish or CloseWait state.
* TcpExtTCPKeepAlive
This counter indicates many keepalive packets were sent. The keepalive
won't be enabled by default. A userspace program could enable it by
setting the SO_KEEPALIVE socket option.
* TcpExtTCPSpuriousRTOs
The spurious retransmission timeout detected by the `F-RTO`_
algorithm.
.. _F-RTO: https://tools.ietf.org/html/rfc5682
TCP Fast Path TCP Fast Path
============ ============
@ -389,20 +446,23 @@ will disable the fast path at first, and try to enable it after kernel
receives packets. receives packets.
* TcpExtTCPPureAcks and TcpExtTCPHPAcks * TcpExtTCPPureAcks and TcpExtTCPHPAcks
If a packet set ACK flag and has no data, it is a pure ACK packet, if If a packet set ACK flag and has no data, it is a pure ACK packet, if
kernel handles it in the fast path, TcpExtTCPHPAcks will increase 1, kernel handles it in the fast path, TcpExtTCPHPAcks will increase 1,
if kernel handles it in the slow path, TcpExtTCPPureAcks will if kernel handles it in the slow path, TcpExtTCPPureAcks will
increase 1. increase 1.
* TcpExtTCPHPHits * TcpExtTCPHPHits
If a TCP packet has data (which means it is not a pure ACK packet), If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will and this packet is handled in the fast path, TcpExtTCPHPHits will
increase 1. increase 1.
TCP abort TCP abort
======== =========
* TcpExtTCPAbortOnData * TcpExtTCPAbortOnData
It means TCP layer has data in flight, but need to close the It means TCP layer has data in flight, but need to close the
connection. So TCP layer sends a RST to the other side, indicate the connection. So TCP layer sends a RST to the other side, indicate the
connection is not closed very graceful. An easy way to increase this connection is not closed very graceful. An easy way to increase this
@ -421,11 +481,13 @@ when the application closes a connection, kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter. immediately and increase the TcpExtTCPAbortOnData counter.
* TcpExtTCPAbortOnClose * TcpExtTCPAbortOnClose
This counter means the application has unread data in the TCP layer when This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation, the application wants to close the TCP connection. In such a situation,
kernel will send a RST to the other side of the TCP connection. kernel will send a RST to the other side of the TCP connection.
* TcpExtTCPAbortOnMemory * TcpExtTCPAbortOnMemory
When an application closes a TCP connection, kernel still need to track When an application closes a TCP connection, kernel still need to track
the connection, let it complete the TCP disconnect process. E.g. an the connection, let it complete the TCP disconnect process. E.g. an
app calls the close method of a socket, kernel sends fin to the other app calls the close method of a socket, kernel sends fin to the other
@ -447,10 +509,12 @@ the tcp_mem. Please refer the tcp_mem section in the `TCP man page`_:
* TcpExtTCPAbortOnTimeout * TcpExtTCPAbortOnTimeout
This counter will increase when any of the TCP timers expire. In such This counter will increase when any of the TCP timers expire. In such
situation, kernel won't send RST, just give up the connection. situation, kernel won't send RST, just give up the connection.
* TcpExtTCPAbortOnLinger * TcpExtTCPAbortOnLinger
When a TCP connection comes into FIN_WAIT_2 state, instead of waiting When a TCP connection comes into FIN_WAIT_2 state, instead of waiting
for the fin packet from the other side, kernel could send a RST and for the fin packet from the other side, kernel could send a RST and
delete the socket immediately. This is not the default behavior of delete the socket immediately. This is not the default behavior of
@ -458,6 +522,7 @@ Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let kernel follow this behavior. you could let kernel follow this behavior.
* TcpExtTCPAbortFailed * TcpExtTCPAbortFailed
The kernel TCP layer will send RST if the `RFC2525 2.17 section`_ is The kernel TCP layer will send RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process, satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased. TcpExtTCPAbortFailed will be increased.
@ -465,7 +530,7 @@ TcpExtTCPAbortFailed will be increased.
.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50 .. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50
TCP Hybrid Slow Start TCP Hybrid Slow Start
==================== =====================
The Hybrid Slow Start algorithm is an enhancement of the traditional The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is information to detect whether the max bandwidth of the TCP path is
@ -481,23 +546,27 @@ relate with the Hybrid Slow Start algorithm.
.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf .. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf
* TcpExtTCPHystartTrainDetect * TcpExtTCPHystartTrainDetect
How many times the ACK train length threshold is detected How many times the ACK train length threshold is detected
* TcpExtTCPHystartTrainCwnd * TcpExtTCPHystartTrainCwnd
The sum of CWND detected by ACK train length. Dividing this value by The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect is the average CWND which detected by the TcpExtTCPHystartTrainDetect is the average CWND which detected by the
ACK train length. ACK train length.
* TcpExtTCPHystartDelayDetect * TcpExtTCPHystartDelayDetect
How many times the packet delay threshold is detected. How many times the packet delay threshold is detected.
* TcpExtTCPHystartDelayCwnd * TcpExtTCPHystartDelayCwnd
The sum of CWND detected by packet delay. Dividing this value by The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect is the average CWND which detected by the TcpExtTCPHystartDelayDetect is the average CWND which detected by the
packet delay. packet delay.
TCP retransmission and congestion control TCP retransmission and congestion control
====================================== =========================================
The TCP protocol has two retransmission mechanisms: SACK and fast The TCP protocol has two retransmission mechanisms: SACK and fast
recovery. They are exclusive with each other. When SACK is enabled, recovery. They are exclusive with each other. When SACK is enabled,
the kernel TCP stack would use SACK, or kernel would use fast the kernel TCP stack would use SACK, or kernel would use fast
@ -516,12 +585,14 @@ https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf
.. _RFC6582: https://tools.ietf.org/html/rfc6582 .. _RFC6582: https://tools.ietf.org/html/rfc6582
* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery * TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery
When the congestion control comes into Recovery state, if sack is When the congestion control comes into Recovery state, if sack is
used, TcpExtTCPSackRecovery increases 1, if sack is not used, used, TcpExtTCPSackRecovery increases 1, if sack is not used,
TcpExtTCPRenoRecovery increases 1. These two counters mean the TCP TcpExtTCPRenoRecovery increases 1. These two counters mean the TCP
stack begins to retransmit the lost packets. stack begins to retransmit the lost packets.
* TcpExtTCPSACKReneging * TcpExtTCPSACKReneging
A packet was acknowledged by SACK, but the receiver has dropped this A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit this packet. In this packet, so the sender needs to retransmit this packet. In this
situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
@ -532,6 +603,7 @@ the RTO expires for this packet, then the sender assumes this packet
has been dropped by the receiver. has been dropped by the receiver.
* TcpExtTCPRenoReorder * TcpExtTCPRenoReorder
The reorder packet is detected by fast recovery. It would only be used The reorder packet is detected by fast recovery. It would only be used
if SACK is disabled. The fast recovery algorithm detects recorder by if SACK is disabled. The fast recovery algorithm detects recorder by
the duplicate ACK number. E.g., if retransmission is triggered, and the duplicate ACK number. E.g., if retransmission is triggered, and
@ -542,6 +614,7 @@ order packet. Thus the sender would find more ACks than its
expectation, and the sender knows out of order occurs. expectation, and the sender knows out of order occurs.
* TcpExtTCPTSReorder * TcpExtTCPTSReorder
The reorder packet is detected when a hole is filled. E.g., assume the The reorder packet is detected when a hole is filled. E.g., assume the
sender sends packet 1,2,3,4,5, and the receiving order is sender sends packet 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which will 1,2,4,5,3. When the sender receives the ACK of packet 3 (which will
@ -551,6 +624,7 @@ fill the hole), two conditions will let TcpExtTCPTSReorder increase
than the retransmission timestamp. than the retransmission timestamp.
* TcpExtTCPSACKReorder * TcpExtTCPSACKReorder
The reorder packet detected by SACK. The SACK has two methods to The reorder packet detected by SACK. The SACK has two methods to
detect reorder: (1) DSACK is received by the sender. It means the detect reorder: (1) DSACK is received by the sender. It means the
sender sends the same packet more than one times. And the only reason sender sends the same packet more than one times. And the only reason
@ -562,6 +636,29 @@ packet yet, the sender would know packet 4 is out of order. The TCP
stack of kernel will increase TcpExtTCPSACKReorder for both of the stack of kernel will increase TcpExtTCPSACKReorder for both of the
above scenarios. above scenarios.
* TcpExtTCPSlowStartRetrans
The TCP stack wants to retransmit a packet and the congestion control
state is 'Loss'.
* TcpExtTCPFastRetrans
The TCP stack wants to retransmit a packet and the congestion control
state is not 'Loss'.
* TcpExtTCPLostRetransmit
A SACK points out that a retransmission packet is lost again.
* TcpExtTCPRetransFail
The TCP stack tries to deliver a retransmission packet to lower layers
but the lower layers return an error.
* TcpExtTCPSynRetrans
The TCP stack retransmits a SYN packet.
DSACK DSACK
===== =====
The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report The DSACK is defined in `RFC2883`_. The receiver uses DSACK to report
@ -574,10 +671,12 @@ sender side.
.. _RFC2883 : https://tools.ietf.org/html/rfc2883 .. _RFC2883 : https://tools.ietf.org/html/rfc2883
* TcpExtTCPDSACKOldSent * TcpExtTCPDSACKOldSent
The TCP stack receives a duplicate packet which has been acked, so it The TCP stack receives a duplicate packet which has been acked, so it
sends a DSACK to the sender. sends a DSACK to the sender.
* TcpExtTCPDSACKOfoSent * TcpExtTCPDSACKOfoSent
The TCP stack receives an out of order duplicate packet, so it sends a The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender. DSACK to the sender.
@ -586,6 +685,7 @@ The TCP stack receives a DSACK, which indicates an acknowledged
duplicate packet is received. duplicate packet is received.
* TcpExtTCPDSACKOfoRecv * TcpExtTCPDSACKOfoRecv
The TCP stack receives a DSACK, which indicate an out of order The TCP stack receives a DSACK, which indicate an out of order
duplicate packet is received. duplicate packet is received.
@ -640,23 +740,26 @@ A skb should be shifted or merged, but the TCP stack doesn't do it for
some reasons. some reasons.
TCP out of order TCP out of order
=============== ================
* TcpExtTCPOFOQueue * TcpExtTCPOFOQueue
The TCP layer receives an out of order packet and has enough memory The TCP layer receives an out of order packet and has enough memory
to queue it. to queue it.
* TcpExtTCPOFODrop * TcpExtTCPOFODrop
The TCP layer receives an out of order packet but doesn't have enough The TCP layer receives an out of order packet but doesn't have enough
memory, so drops it. Such packets won't be counted into memory, so drops it. Such packets won't be counted into
TcpExtTCPOFOQueue. TcpExtTCPOFOQueue.
* TcpExtTCPOFOMerge * TcpExtTCPOFOMerge
The received out of order packet has an overlay with the previous The received out of order packet has an overlay with the previous
packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge
packets will also be counted into TcpExtTCPOFOQueue. packets will also be counted into TcpExtTCPOFOQueue.
TCP PAWS TCP PAWS
======= ========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP which is used to drop old packets. It depends on the TCP
timestamps. For detail information, please refer the `timestamp wiki`_ timestamps. For detail information, please refer the `timestamp wiki`_
@ -666,13 +769,15 @@ and the `RFC of PAWS`_.
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps .. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
* TcpExtPAWSActive * TcpExtPAWSActive
Packets are dropped by PAWS in Syn-Sent status. Packets are dropped by PAWS in Syn-Sent status.
* TcpExtPAWSEstab * TcpExtPAWSEstab
Packets are dropped by PAWS in any status other than Syn-Sent. Packets are dropped by PAWS in any status other than Syn-Sent.
TCP ACK skip TCP ACK skip
=========== ============
In some scenarios, kernel would avoid sending duplicate ACKs too In some scenarios, kernel would avoid sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When kernel decides to skip an ACK section of the `sysctl document`_. When kernel decides to skip an ACK
@ -684,6 +789,7 @@ it has no data.
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt .. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
* TcpExtTCPACKSkippedSynRecv * TcpExtTCPACKSkippedSynRecv
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is
waiting for an ACK. Generally, the TCP stack doesn't need to send ACK waiting for an ACK. Generally, the TCP stack doesn't need to send ACK
@ -697,6 +803,7 @@ increase TcpExtTCPACKSkippedSynRecv.
* TcpExtTCPACKSkippedPAWS * TcpExtTCPACKSkippedPAWS
The ACK is skipped due to PAWS (Protect Against Wrapped Sequence The ACK is skipped due to PAWS (Protect Against Wrapped Sequence
numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2 numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
or Time-Wait statuses, the skipped ACK would be counted to or Time-Wait statuses, the skipped ACK would be counted to
@ -705,18 +812,22 @@ TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
would be counted to TcpExtTCPACKSkippedPAWS. would be counted to TcpExtTCPACKSkippedPAWS.
* TcpExtTCPACKSkippedSeq * TcpExtTCPACKSkippedSeq
The sequence number is out of window and the timestamp passes the PAWS The sequence number is out of window and the timestamp passes the PAWS
check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait. check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait.
* TcpExtTCPACKSkippedFinWait2 * TcpExtTCPACKSkippedFinWait2
The ACK is skipped in Fin-Wait-2 status, the reason would be either The ACK is skipped in Fin-Wait-2 status, the reason would be either
PAWS check fails or the received sequence number is out of window. PAWS check fails or the received sequence number is out of window.
* TcpExtTCPACKSkippedTimeWait * TcpExtTCPACKSkippedTimeWait
Tha ACK is skipped in Time-Wait status, the reason would be either Tha ACK is skipped in Time-Wait status, the reason would be either
PAWS check failed or the received sequence number is out of window. PAWS check failed or the received sequence number is out of window.
* TcpExtTCPACKSkippedChallenge * TcpExtTCPACKSkippedChallenge
The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines
3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_, 3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these `RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
@ -729,8 +840,9 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11 .. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
TCP receive window TCP receive window
================= ==================
* TcpExtTCPWantZeroWindowAdv * TcpExtTCPWantZeroWindowAdv
Depending on current memory usage, the TCP stack tries to set receive Depending on current memory usage, the TCP stack tries to set receive
window to zero. But the receive window might still be a no-zero window to zero. But the receive window might still be a no-zero
value. For example, if the previous window size is 10, and the TCP value. For example, if the previous window size is 10, and the TCP
@ -738,14 +850,16 @@ stack receives 3 bytes, the current window size would be 7 even if the
window size calculated by the memory usage is zero. window size calculated by the memory usage is zero.
* TcpExtTCPToZeroWindowAdv * TcpExtTCPToZeroWindowAdv
The TCP receive window is set to zero from a no-zero value. The TCP receive window is set to zero from a no-zero value.
* TcpExtTCPFromZeroWindowAdv * TcpExtTCPFromZeroWindowAdv
The TCP receive window is set to no-zero value from zero. The TCP receive window is set to no-zero value from zero.
Delayed ACK Delayed ACK
========== ===========
The TCP Delayed ACK is a technique which is used for reducing the The TCP Delayed ACK is a technique which is used for reducing the
packet count in the network. For more details, please refer the packet count in the network. For more details, please refer the
`Delayed ACK wiki`_ `Delayed ACK wiki`_
@ -753,10 +867,12 @@ packet count in the network. For more details, please refer the
.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment .. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
* TcpExtDelayedACKs * TcpExtDelayedACKs
A delayed ACK timer expires. The TCP stack will send a pure ACK packet A delayed ACK timer expires. The TCP stack will send a pure ACK packet
and exit the delayed ACK mode. and exit the delayed ACK mode.
* TcpExtDelayedACKLocked * TcpExtDelayedACKLocked
A delayed ACK timer expires, but the TCP stack can't send an ACK A delayed ACK timer expires, but the TCP stack can't send an ACK
immediately due to the socket is locked by a userspace program. The immediately due to the socket is locked by a userspace program. The
TCP stack will send a pure ACK later (after the userspace program TCP stack will send a pure ACK later (after the userspace program
@ -765,29 +881,152 @@ TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
mode. mode.
* TcpExtDelayedACKLost * TcpExtDelayedACKLost
It will be updated when the TCP stack receives a packet which has been It will be updated when the TCP stack receives a packet which has been
ACKed. A Delayed ACK loss might cause this issue, but it would also be ACKed. A Delayed ACK loss might cause this issue, but it would also be
triggered by other reasons, such as a packet is duplicated in the triggered by other reasons, such as a packet is duplicated in the
network. network.
Tail Loss Probe (TLP) Tail Loss Probe (TLP)
=================== =====================
TLP is an algorithm which is used to detect TCP packet loss. For more TLP is an algorithm which is used to detect TCP packet loss. For more
details, please refer the `TLP paper`_. details, please refer the `TLP paper`_.
.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 .. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
* TcpExtTCPLossProbes * TcpExtTCPLossProbes
A TLP probe packet is sent. A TLP probe packet is sent.
* TcpExtTCPLossProbeRecovery * TcpExtTCPLossProbeRecovery
A packet loss is detected and recovered by TLP. A packet loss is detected and recovered by TLP.
TCP Fast Open
=============
TCP Fast Open is a technology which allows data transfer before the
3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
general description.
.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
* TcpExtTCPFastOpenActive
When the TCP stack receives an ACK packet in the SYN-SENT status, and
the ACK packet acknowledges the data in the SYN packet, the TCP stack
understand the TFO cookie is accepted by the other side, then it
updates this counter.
* TcpExtTCPFastOpenActiveFail
This counter indicates that the TCP stack initiated a TCP Fast Open,
but it failed. This counter would be updated in three scenarios: (1)
the other side doesn't acknowledge the data in the SYN packet. (2) The
SYN packet which has the TFO cookie is timeout at least once. (3)
after the 3-way handshake, the retransmission timeout happens
net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
fast open after the handshake.
* TcpExtTCPFastOpenPassive
This counter indicates how many times the TCP stack accepts the fast
open request.
* TcpExtTCPFastOpenPassiveFail
This counter indicates how many times the TCP stack rejects the fast
open request. It is caused by either the TFO cookie is invalid or the
TCP stack finds an error during the socket creating process.
* TcpExtTCPFastOpenListenOverflow
When the pending fast open request number is larger than
fastopenq->max_qlen, the TCP stack will reject the fast open request
and update this counter. When this counter is updated, the TCP stack
won't update TcpExtTCPFastOpenPassive or
TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
TCP_FASTOPEN socket operation and it could not be larger than
net.core.somaxconn. For example:
setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
* TcpExtTCPFastOpenCookieReqd
This counter indicates how many times a client wants to request a TFO
cookie.
SYN cookies
===========
SYN cookies are used to mitigate SYN flood, for details, please refer
the `SYN cookies wiki`_.
.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
* TcpExtSyncookiesSent
It indicates how many SYN cookies are sent.
* TcpExtSyncookiesRecv
How many reply packets of the SYN cookies the TCP stack receives.
* TcpExtSyncookiesFailed
The MSS decoded from the SYN cookie is invalid. When this counter is
updated, the received packet won't be treated as a SYN cookie and the
TcpExtSyncookiesRecv counter wont be updated.
Challenge ACK
=============
For details of challenge ACK, please refer the explaination of
TcpExtTCPACKSkippedChallenge.
* TcpExtTCPChallengeACK
The number of challenge acks sent.
* TcpExtTCPSYNChallenge
The number of challenge acks sent in response to SYN packets. After
updates this counter, the TCP stack might send a challenge ACK and
update the TcpExtTCPChallengeACK counter, or it might also skip to
send the challenge and update the TcpExtTCPACKSkippedChallenge.
prune
=====
When a socket is under memory pressure, the TCP stack will try to
reclaim memory from the receiving queue and out of order queue. One of
the reclaiming method is 'collapse', which means allocate a big sbk,
copy the contiguous skbs to the single big skb, and free these
contiguous skbs.
* TcpExtPruneCalled
The TCP stack tries to reclaim memory for a socket. After updates this
counter, the TCP stack will try to collapse the out of order queue and
the receiving queue. If the memory is still not enough, the TCP stack
will try to discard packets from the out of order queue (and update the
TcpExtOfoPruned counter)
* TcpExtOfoPruned
The TCP stack tries to discard packet on the out of order queue.
* TcpExtRcvPruned
After 'collapse' and discard packets from the out of order queue, if
the actually used memory is still larger than the max allowed memory,
this counter will be updated. It means the 'prune' fails.
* TcpExtTCPRcvCollapsed
This counter indicates how many skbs are freed during 'collapse'.
examples examples
======= ========
ping test ping test
-------- ---------
Run the ping command against the public dns server 8.8.8.8:: Run the ping command against the public dns server 8.8.8.8::
nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1 nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
@ -831,7 +1070,7 @@ and its corresponding Echo Reply packet are constructed by:
So the IpExtInOctets and IpExtOutOctets are 20+16+48=84. So the IpExtInOctets and IpExtOutOctets are 20+16+48=84.
tcp 3-way handshake tcp 3-way handshake
------------------ -------------------
On server side, we run:: On server side, we run::
nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000 nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
@ -873,7 +1112,7 @@ ACK, so client sent 2 packets, received 1 packet, TcpInSegs increased
1, TcpOutSegs increased 2. 1, TcpOutSegs increased 2.
TCP normal traffic TCP normal traffic
----------------- ------------------
Run nc on server:: Run nc on server::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000 nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@ -996,7 +1235,7 @@ and the packet received from client qualified for fast path, so it
was counted into 'TcpExtTCPHPHits'. was counted into 'TcpExtTCPHPHits'.
TcpExtTCPAbortOnClose TcpExtTCPAbortOnClose
-------------------- ---------------------
On the server side, we run below python script:: On the server side, we run below python script::
import socket import socket
@ -1030,7 +1269,7 @@ If we run tcpdump on the server side, we could find the server sent a
RST after we type Ctrl-C. RST after we type Ctrl-C.
TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
----------------------------------------------- ---------------------------------------------------
Below is an example which let the orphan socket count be higher than Below is an example which let the orphan socket count be higher than
net.ipv4.tcp_max_orphans. net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on client:: Change tcp_max_orphans to a smaller value on client::
@ -1152,7 +1391,7 @@ FIN_WAIT_1 state finally. So we wait for a few minutes, we could find
TcpExtTCPAbortOnTimeout 10 0.0 TcpExtTCPAbortOnTimeout 10 0.0
TcpExtTCPAbortOnLinger TcpExtTCPAbortOnLinger
--------------------- ----------------------
The server side code:: The server side code::
nstatuser@nstat-b:~$ cat server_linger.py nstatuser@nstat-b:~$ cat server_linger.py
@ -1197,7 +1436,7 @@ After run client_linger.py, check the output of nstat::
TcpExtTCPAbortOnLinger 1 0.0 TcpExtTCPAbortOnLinger 1 0.0
TcpExtTCPRcvCoalesce TcpExtTCPRcvCoalesce
------------------- --------------------
On the server, we run a program which listen on TCP port 9000, but On the server, we run a program which listen on TCP port 9000, but
doesn't read any data:: doesn't read any data::
@ -1257,7 +1496,7 @@ the receiving queue. So the TCP layer merged the two packets, and we
could find the TcpExtTCPRcvCoalesce increased 1. could find the TcpExtTCPRcvCoalesce increased 1.
TcpExtListenOverflows and TcpExtListenDrops TcpExtListenOverflows and TcpExtListenDrops
---------------------------------------- -------------------------------------------
On server, run the nc command, listen on port 9000:: On server, run the nc command, listen on port 9000::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000 nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@ -1305,7 +1544,7 @@ TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped, the client was retrying. the SYN of the 4th nc was dropped, the client was retrying.
IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
---------------------------------------------- -------------------------------------------------
server A IP address: 192.168.122.250 server A IP address: 192.168.122.250
server B IP address: 192.168.122.251 server B IP address: 192.168.122.251
Prepare on server A, add a route to server B:: Prepare on server A, add a route to server B::
@ -1400,7 +1639,7 @@ a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes. IpOutNoRoutes.
TcpExtTCPACKSkippedSynRecv TcpExtTCPACKSkippedSynRecv
------------------------ --------------------------
In this test, we send 3 same SYN packets from client to server. The In this test, we send 3 same SYN packets from client to server. The
first SYN will let server create a socket, set it to Syn-Recv status, first SYN will let server create a socket, set it to Syn-Recv status,
and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK
@ -1448,7 +1687,7 @@ Check snmp cunter on nstat-b::
As we expected, TcpExtTCPACKSkippedSynRecv is 1. As we expected, TcpExtTCPACKSkippedSynRecv is 1.
TcpExtTCPACKSkippedPAWS TcpExtTCPACKSkippedPAWS
---------------------- -----------------------
To trigger PAWS, we could send an old SYN. To trigger PAWS, we could send an old SYN.
On nstat-b, let nc listen on port 9000:: On nstat-b, let nc listen on port 9000::
@ -1485,7 +1724,7 @@ failed, the nstat-b replied an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS. for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
TcpExtTCPACKSkippedSeq TcpExtTCPACKSkippedSeq
-------------------- ----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
timestamp (to pass PAWS check) but the sequence number is out of timestamp (to pass PAWS check) but the sequence number is out of
window. The linux TCP stack would avoid to skip if the packet has window. The linux TCP stack would avoid to skip if the packet has

View File

@ -196,7 +196,7 @@ The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the mac/vlan/port tuples. The switch driver, and notify the switch driver of the mac/vlan/port tuples. The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call: in turn, will notify the bridge driver using the switchdev notifier call:
err = call_switchdev_notifiers(val, dev, info); err = call_switchdev_notifiers(val, dev, info, extack);
Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On forgetting, and info points to a struct switchdev_notifier_fdb_info. On
@ -232,10 +232,8 @@ Learning_sync attribute enables syncing of the learned/forgotten FDB entry to
the bridge's FDB. It's possible, but not optimal, to enable learning on the the bridge's FDB. It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync. device port and on the bridge port, and disable learning_sync.
To support learning and learning_sync port attributes, the driver implements To support learning, the driver implements switchdev op
switchdev op switchdev_port_attr_get/set for switchdev_port_attr_set for SWITCHDEV_ATTR_PORT_ID_{PRE}_BRIDGE_FLAGS.
SWITCHDEV_ATTR_PORT_ID_BRIDGE_FLAGS. The driver should initialize the attributes
to the hardware defaults.
FDB Ageing FDB Ageing
^^^^^^^^^^ ^^^^^^^^^^
@ -373,22 +371,3 @@ The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops
for the routes as arp_tbl updates. The driver implements ndo_neigh_destroy for the routes as arp_tbl updates. The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port. to know when arp_tbl neighbor entries are purged from the port.
Transaction item queue
^^^^^^^^^^^^^^^^^^^^^^
For switchdev ops attr_set and obj_add, there is a 2 phase transaction model
used. First phase is to "prepare" anything needed, including various checks,
memory allocation, etc. The goal is to handle the stuff that is not unlikely
to fail here. The second phase is to "commit" the actual changes.
Switchdev provides an infrastructure for sharing items (for example memory
allocations) between the two phases.
The object created by a driver in "prepare" phase and it is queued up by:
switchdev_trans_item_enqueue()
During the "commit" phase, the driver gets the object by:
switchdev_trans_item_dequeue()
If a transaction is aborted during "prepare" phase, switchdev code will handle
cleanup of the queued-up objects.

View File

@ -6,11 +6,21 @@ The interfaces for receiving network packages timestamps are:
* SO_TIMESTAMP * SO_TIMESTAMP
Generates a timestamp for each incoming packet in (not necessarily Generates a timestamp for each incoming packet in (not necessarily
monotonic) system time. Reports the timestamp via recvmsg() in a monotonic) system time. Reports the timestamp via recvmsg() in a
control message as struct timeval (usec resolution). control message in usec resolution.
SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD
based on the architecture type and time_t representation of libc.
Control message format is in struct __kernel_old_timeval for
SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for
SO_TIMESTAMP_NEW options respectively.
* SO_TIMESTAMPNS * SO_TIMESTAMPNS
Same timestamping mechanism as SO_TIMESTAMP, but reports the Same timestamping mechanism as SO_TIMESTAMP, but reports the
timestamp as struct timespec (nsec resolution). timestamp as struct timespec in nsec resolution.
SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD
based on the architecture type and time_t representation of libc.
Control message format is in struct timespec for SO_TIMESTAMPNS_OLD
and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options
respectively.
* IP_MULTICAST_LOOP + SO_TIMESTAMP[NS] * IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
Only for multicast:approximate transmit timestamp obtained by Only for multicast:approximate transmit timestamp obtained by
@ -22,7 +32,7 @@ The interfaces for receiving network packages timestamps are:
timestamps for stream sockets. timestamps for stream sockets.
1.1 SO_TIMESTAMP: 1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW):
This socket option enables timestamping of datagrams on the reception This socket option enables timestamping of datagrams on the reception
path. Because the destination socket, if any, is not known early in path. Because the destination socket, if any, is not known early in
@ -31,15 +41,25 @@ same is true for all early receive timestamp options.
For interface details, see `man 7 socket`. For interface details, see `man 7 socket`.
Always use SO_TIMESTAMP_NEW timestamp to always get timestamp in
struct __kernel_sock_timeval format.
1.2 SO_TIMESTAMPNS: SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.
1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW):
This option is identical to SO_TIMESTAMP except for the returned data type. This option is identical to SO_TIMESTAMP except for the returned data type.
Its struct timespec allows for higher resolution (ns) timestamps than the Its struct timespec allows for higher resolution (ns) timestamps than the
timeval of SO_TIMESTAMP (ms). timeval of SO_TIMESTAMP (ms).
Always use SO_TIMESTAMPNS_NEW timestamp to always get timestamp in
struct __kernel_timespec format.
1.3 SO_TIMESTAMPING: SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.
1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW):
Supports multiple types of timestamp requests. As a result, this Supports multiple types of timestamp requests. As a result, this
socket option takes a bitmap of flags, not a boolean. In socket option takes a bitmap of flags, not a boolean. In
@ -323,10 +343,23 @@ SO_TIMESTAMP and SO_TIMESTAMPNS records can be retrieved.
These timestamps are returned in a control message with cmsg_level These timestamps are returned in a control message with cmsg_level
SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type
For SO_TIMESTAMPING_OLD:
struct scm_timestamping { struct scm_timestamping {
struct timespec ts[3]; struct timespec ts[3];
}; };
For SO_TIMESTAMPING_NEW:
struct scm_timestamping64 {
struct __kernel_timespec ts[3];
Always use SO_TIMESTAMPING_NEW timestamp to always get timestamp in
struct scm_timestamping64 format.
SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038
on 32 bit machines.
The structure can return up to three timestamps. This is a legacy The structure can return up to three timestamps. This is a legacy
feature. At least one field is non-zero at any time. Most timestamps feature. At least one field is non-zero at any time. Most timestamps
are passed in ts[0]. Hardware timestamps are passed in ts[2]. are passed in ts[0]. Hardware timestamps are passed in ts[2].

View File

@ -52,6 +52,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
- sparc64 - sparc64
- mips64 - mips64
- s390x - s390x
- riscv
And the older cBPF JIT supported on the following archs: And the older cBPF JIT supported on the following archs:
- mips - mips
@ -291,6 +292,20 @@ user space is responsible for creating them if needed.
Default : 0 (for compatibility reasons) Default : 0 (for compatibility reasons)
devconf_inherit_init_net
----------------------------
Controls if a new network namespace should inherit all current
settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
default, we keep the current behavior: for IPv4 we inherit all current
settings from init_net and for IPv6 we reset all settings to default.
If set to 1, both IPv4 and IPv6 settings are forced to inherit from
current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
forced to reset to their default values.
Default : 0 (for compatibility reasons)
2. /proc/sys/net/unix - Parameters for Unix domain sockets 2. /proc/sys/net/unix - Parameters for Unix domain sockets
------------------------------------------------------- -------------------------------------------------------

View File

@ -2916,6 +2916,12 @@ L: bpf@vger.kernel.org
S: Maintained S: Maintained
F: arch/powerpc/net/ F: arch/powerpc/net/
BPF JIT for RISC-V (RV64G)
M: Björn Töpel <bjorn.topel@gmail.com>
L: netdev@vger.kernel.org
S: Maintained
F: arch/riscv/net/
BPF JIT for S390 BPF JIT for S390
M: Martin Schwidefsky <schwidefsky@de.ibm.com> M: Martin Schwidefsky <schwidefsky@de.ibm.com>
M: Heiko Carstens <heiko.carstens@de.ibm.com> M: Heiko Carstens <heiko.carstens@de.ibm.com>
@ -4135,7 +4141,7 @@ S: Maintained
F: drivers/media/dvb-frontends/cxd2820r* F: drivers/media/dvb-frontends/cxd2820r*
CXGB3 ETHERNET DRIVER (CXGB3) CXGB3 ETHERNET DRIVER (CXGB3)
M: Arjun Vynipadath <arjun@chelsio.com> M: Vishal Kulkarni <vishal@chelsio.com>
L: netdev@vger.kernel.org L: netdev@vger.kernel.org
W: http://www.chelsio.com W: http://www.chelsio.com
S: Supported S: Supported
@ -4164,7 +4170,7 @@ S: Supported
F: drivers/crypto/chelsio F: drivers/crypto/chelsio
CXGB4 ETHERNET DRIVER (CXGB4) CXGB4 ETHERNET DRIVER (CXGB4)
M: Arjun Vynipadath <arjun@chelsio.com> M: Vishal Kulkarni <vishal@chelsio.com>
L: netdev@vger.kernel.org L: netdev@vger.kernel.org
W: http://www.chelsio.com W: http://www.chelsio.com
S: Supported S: Supported
@ -6036,6 +6042,12 @@ L: linuxppc-dev@lists.ozlabs.org
S: Maintained S: Maintained
F: drivers/dma/fsldma.* F: drivers/dma/fsldma.*
FREESCALE ENETC ETHERNET DRIVERS
M: Claudiu Manoil <claudiu.manoil@nxp.com>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/freescale/enetc/
FREESCALE eTSEC ETHERNET DRIVER (GIANFAR) FREESCALE eTSEC ETHERNET DRIVER (GIANFAR)
M: Claudiu Manoil <claudiu.manoil@nxp.com> M: Claudiu Manoil <claudiu.manoil@nxp.com>
L: netdev@vger.kernel.org L: netdev@vger.kernel.org
@ -6099,7 +6111,9 @@ FREESCALE QORIQ PTP CLOCK DRIVER
M: Yangbo Lu <yangbo.lu@nxp.com> M: Yangbo Lu <yangbo.lu@nxp.com>
L: netdev@vger.kernel.org L: netdev@vger.kernel.org
S: Maintained S: Maintained
F: drivers/net/ethernet/freescale/enetc/enetc_ptp.c
F: drivers/ptp/ptp_qoriq.c F: drivers/ptp/ptp_qoriq.c
F: drivers/ptp/ptp_qoriq_debugfs.c
F: include/linux/fsl/ptp_qoriq.h F: include/linux/fsl/ptp_qoriq.h
F: Documentation/devicetree/bindings/ptp/ptp-qoriq.txt F: Documentation/devicetree/bindings/ptp/ptp-qoriq.txt
@ -10598,6 +10612,7 @@ F: Documentation/devicetree/bindings/net/dsa/
F: net/dsa/ F: net/dsa/
F: include/net/dsa.h F: include/net/dsa.h
F: include/linux/dsa/ F: include/linux/dsa/
F: include/linux/platform_data/dsa.h
F: drivers/net/dsa/ F: drivers/net/dsa/
NETWORKING [GENERAL] NETWORKING [GENERAL]
@ -12640,6 +12655,14 @@ L: netdev@vger.kernel.org
S: Maintained S: Maintained
F: drivers/net/ethernet/qualcomm/emac/ F: drivers/net/ethernet/qualcomm/emac/
QUALCOMM ETHQOS ETHERNET DRIVER
M: Vinod Koul <vkoul@kernel.org>
M: Niklas Cassel <niklas.cassel@linaro.org>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c
F: Documentation/devicetree/bindings/net/qcom,dwmac.txt
QUALCOMM GENERIC INTERFACE I2C DRIVER QUALCOMM GENERIC INTERFACE I2C DRIVER
M: Alok Chauhan <alokc@codeaurora.org> M: Alok Chauhan <alokc@codeaurora.org>
M: Karthikeyan Ramasubramanian <kramasub@codeaurora.org> M: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
@ -13771,6 +13794,7 @@ F: drivers/misc/sgi-xp/
SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS SHARED MEMORY COMMUNICATIONS (SMC) SOCKETS
M: Ursula Braun <ubraun@linux.ibm.com> M: Ursula Braun <ubraun@linux.ibm.com>
M: Karsten Graul <kgraul@linux.ibm.com>
L: linux-s390@vger.kernel.org L: linux-s390@vger.kernel.org
W: http://www.ibm.com/developerworks/linux/linux390/ W: http://www.ibm.com/developerworks/linux/linux390/
S: Supported S: Supported

View File

@ -3,6 +3,7 @@
#define _UAPI_ASM_SOCKET_H #define _UAPI_ASM_SOCKET_H
#include <asm/sockios.h> #include <asm/sockios.h>
#include <asm/bitsperlong.h>
/* For setsockopt(2) */ /* For setsockopt(2) */
/* /*
@ -30,8 +31,8 @@
#define SO_RCVBUFFORCE 0x100b #define SO_RCVBUFFORCE 0x100b
#define SO_RCVLOWAT 0x1010 #define SO_RCVLOWAT 0x1010
#define SO_SNDLOWAT 0x1011 #define SO_SNDLOWAT 0x1011
#define SO_RCVTIMEO 0x1012 #define SO_RCVTIMEO_OLD 0x1012
#define SO_SNDTIMEO 0x1013 #define SO_SNDTIMEO_OLD 0x1013
#define SO_ACCEPTCONN 0x1014 #define SO_ACCEPTCONN 0x1014
#define SO_PROTOCOL 0x1028 #define SO_PROTOCOL 0x1028
#define SO_DOMAIN 0x1029 #define SO_DOMAIN 0x1029
@ -51,13 +52,9 @@
#define SO_GET_FILTER SO_ATTACH_FILTER #define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 28 #define SO_PEERNAME 28
#define SO_TIMESTAMP 29
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_PEERSEC 30 #define SO_PEERSEC 30
#define SO_PASSSEC 34 #define SO_PASSSEC 34
#define SO_TIMESTAMPNS 35
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
/* Security levels - as per NRL IPv6 - don't actually do anything */ /* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 19 #define SO_SECURITY_AUTHENTICATION 19
@ -66,9 +63,6 @@
#define SO_MARK 36 #define SO_MARK 36
#define SO_TIMESTAMPING 37
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_RXQ_OVFL 40 #define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41 #define SO_WIFI_STATUS 41
@ -115,4 +109,41 @@
#define SO_TXTIME 61 #define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME #define SCM_TXTIME SO_TXTIME
#define SO_BINDTOIFINDEX 62
#define SO_TIMESTAMP_OLD 29
#define SO_TIMESTAMPNS_OLD 35
#define SO_TIMESTAMPING_OLD 37
#define SO_TIMESTAMP_NEW 63
#define SO_TIMESTAMPNS_NEW 64
#define SO_TIMESTAMPING_NEW 65
#define SO_RCVTIMEO_NEW 66
#define SO_SNDTIMEO_NEW 67
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
#define SO_TIMESTAMP SO_TIMESTAMP_OLD
#define SO_TIMESTAMPNS SO_TIMESTAMPNS_OLD
#define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
#define SO_RCVTIMEO SO_RCVTIMEO_OLD
#define SO_SNDTIMEO SO_SNDTIMEO_OLD
#else
#define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
#define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
#define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
#endif
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif
#endif /* _UAPI_ASM_SOCKET_H */ #endif /* _UAPI_ASM_SOCKET_H */

View File

@ -93,6 +93,7 @@ &eth1 {
bm,pool-long = <2>; bm,pool-long = <2>;
bm,pool-short = <1>; bm,pool-short = <1>;
buffer-manager = <&bm>; buffer-manager = <&bm>;
phys = <&comphy1 1>;
phy-mode = "sgmii"; phy-mode = "sgmii";
status = "okay"; status = "okay";
}; };
@ -103,6 +104,7 @@ &eth2 {
bm,pool-short = <1>; bm,pool-short = <1>;
buffer-manager = <&bm>; buffer-manager = <&bm>;
managed = "in-band-status"; managed = "in-band-status";
phys = <&comphy5 2>;
phy-mode = "sgmii"; phy-mode = "sgmii";
sfp = <&sfp>; sfp = <&sfp>;
status = "okay"; status = "okay";

View File

@ -335,6 +335,43 @@ gateclk: clock-gating-control@18220 {
#clock-cells = <1>; #clock-cells = <1>;
}; };
comphy: phy@18300 {
compatible = "marvell,armada-380-comphy";
reg = <0x18300 0x100>;
#address-cells = <1>;
#size-cells = <0>;
comphy0: phy@0 {
reg = <0>;
#phy-cells = <1>;
};
comphy1: phy@1 {
reg = <1>;
#phy-cells = <1>;
};
comphy2: phy@2 {
reg = <2>;
#phy-cells = <1>;
};
comphy3: phy@3 {
reg = <3>;
#phy-cells = <1>;
};
comphy4: phy@4 {
reg = <4>;
#phy-cells = <1>;
};
comphy5: phy@5 {
reg = <5>;
#phy-cells = <1>;
};
};
coreclk: mvebu-sar@18600 { coreclk: mvebu-sar@18600 {
compatible = "marvell,armada-380-core-clock"; compatible = "marvell,armada-380-core-clock";
reg = <0x18600 0x04>; reg = <0x18600 0x04>;

View File

@ -706,6 +706,7 @@ ptp_clock@2d10e00 {
fsl,tmr-fiper1 = <999999995>; fsl,tmr-fiper1 = <999999995>;
fsl,tmr-fiper2 = <99990>; fsl,tmr-fiper2 = <99990>;
fsl,max-adj = <499999999>; fsl,max-adj = <499999999>;
fsl,extts-fifo;
}; };
enet0: ethernet@2d10000 { enet0: ethernet@2d10000 {

View File

@ -20,7 +20,7 @@
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/clk-provider.h> #include <linux/clk-provider.h>
#include <linux/cpu.h> #include <linux/cpu.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/page.h> #include <asm/page.h>
#include <asm/setup.h> #include <asm/setup.h>
#include <asm/system_misc.h> #include <asm/system_misc.h>

View File

@ -16,7 +16,7 @@
#include <linux/mtd/physmap.h> #include <linux/mtd/physmap.h>
#include <linux/mv643xx_eth.h> #include <linux/mv643xx_eth.h>
#include <linux/ethtool.h> #include <linux/ethtool.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/mach-types.h> #include <asm/mach-types.h>
#include <asm/mach/arch.h> #include <asm/mach/arch.h>
#include <asm/mach/pci.h> #include <asm/mach/pci.h>

View File

@ -17,7 +17,7 @@
#include <linux/mv643xx_eth.h> #include <linux/mv643xx_eth.h>
#include <linux/ethtool.h> #include <linux/ethtool.h>
#include <linux/i2c.h> #include <linux/i2c.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/mach-types.h> #include <asm/mach-types.h>
#include <asm/mach/arch.h> #include <asm/mach/arch.h>
#include <asm/mach/pci.h> #include <asm/mach/pci.h>

View File

@ -18,7 +18,7 @@
#include <linux/spi/spi.h> #include <linux/spi/spi.h>
#include <linux/spi/flash.h> #include <linux/spi/flash.h>
#include <linux/ethtool.h> #include <linux/ethtool.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/mach-types.h> #include <asm/mach-types.h>
#include <asm/mach/arch.h> #include <asm/mach/arch.h>
#include <asm/mach/pci.h> #include <asm/mach/pci.h>

View File

@ -15,7 +15,7 @@
#include <linux/mtd/physmap.h> #include <linux/mtd/physmap.h>
#include <linux/mv643xx_eth.h> #include <linux/mv643xx_eth.h>
#include <linux/ethtool.h> #include <linux/ethtool.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/mach-types.h> #include <asm/mach-types.h>
#include <asm/mach/arch.h> #include <asm/mach/arch.h>
#include <asm/mach/pci.h> #include <asm/mach/pci.h>

View File

@ -18,7 +18,7 @@
#include <linux/leds.h> #include <linux/leds.h>
#include <linux/gpio_keys.h> #include <linux/gpio_keys.h>
#include <linux/input.h> #include <linux/input.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <asm/mach-types.h> #include <asm/mach-types.h>
#include <asm/mach/arch.h> #include <asm/mach/arch.h>
#include <asm/mach/pci.h> #include <asm/mach/pci.h>

View File

@ -1083,12 +1083,17 @@ static inline void emit_ldx_r(const s8 dst[], const s8 src,
/* Arithmatic Operation */ /* Arithmatic Operation */
static inline void emit_ar_r(const u8 rd, const u8 rt, const u8 rm, static inline void emit_ar_r(const u8 rd, const u8 rt, const u8 rm,
const u8 rn, struct jit_ctx *ctx, u8 op) { const u8 rn, struct jit_ctx *ctx, u8 op,
bool is_jmp64) {
switch (op) { switch (op) {
case BPF_JSET: case BPF_JSET:
emit(ARM_AND_R(ARM_IP, rt, rn), ctx); if (is_jmp64) {
emit(ARM_AND_R(ARM_LR, rd, rm), ctx); emit(ARM_AND_R(ARM_IP, rt, rn), ctx);
emit(ARM_ORRS_R(ARM_IP, ARM_LR, ARM_IP), ctx); emit(ARM_AND_R(ARM_LR, rd, rm), ctx);
emit(ARM_ORRS_R(ARM_IP, ARM_LR, ARM_IP), ctx);
} else {
emit(ARM_ANDS_R(ARM_IP, rt, rn), ctx);
}
break; break;
case BPF_JEQ: case BPF_JEQ:
case BPF_JNE: case BPF_JNE:
@ -1096,18 +1101,25 @@ static inline void emit_ar_r(const u8 rd, const u8 rt, const u8 rm,
case BPF_JGE: case BPF_JGE:
case BPF_JLE: case BPF_JLE:
case BPF_JLT: case BPF_JLT:
emit(ARM_CMP_R(rd, rm), ctx); if (is_jmp64) {
_emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx); emit(ARM_CMP_R(rd, rm), ctx);
/* Only compare low halve if high halve are equal. */
_emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx);
} else {
emit(ARM_CMP_R(rt, rn), ctx);
}
break; break;
case BPF_JSLE: case BPF_JSLE:
case BPF_JSGT: case BPF_JSGT:
emit(ARM_CMP_R(rn, rt), ctx); emit(ARM_CMP_R(rn, rt), ctx);
emit(ARM_SBCS_R(ARM_IP, rm, rd), ctx); if (is_jmp64)
emit(ARM_SBCS_R(ARM_IP, rm, rd), ctx);
break; break;
case BPF_JSLT: case BPF_JSLT:
case BPF_JSGE: case BPF_JSGE:
emit(ARM_CMP_R(rt, rn), ctx); emit(ARM_CMP_R(rt, rn), ctx);
emit(ARM_SBCS_R(ARM_IP, rd, rm), ctx); if (is_jmp64)
emit(ARM_SBCS_R(ARM_IP, rd, rm), ctx);
break; break;
} }
} }
@ -1615,6 +1627,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_JMP | BPF_JLT | BPF_X: case BPF_JMP | BPF_JLT | BPF_X:
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
/* Setup source registers */ /* Setup source registers */
rm = arm_bpf_get_reg32(src_hi, tmp2[0], ctx); rm = arm_bpf_get_reg32(src_hi, tmp2[0], ctx);
rn = arm_bpf_get_reg32(src_lo, tmp2[1], ctx); rn = arm_bpf_get_reg32(src_lo, tmp2[1], ctx);
@ -1641,6 +1664,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
case BPF_JMP | BPF_JLE | BPF_K: case BPF_JMP | BPF_JLE | BPF_K:
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
if (off == 0) if (off == 0)
break; break;
rm = tmp2[0]; rm = tmp2[0];
@ -1652,7 +1686,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
rd = arm_bpf_get_reg64(dst, tmp, ctx); rd = arm_bpf_get_reg64(dst, tmp, ctx);
/* Check for the condition */ /* Check for the condition */
emit_ar_r(rd[0], rd[1], rm, rn, ctx, BPF_OP(code)); emit_ar_r(rd[0], rd[1], rm, rn, ctx, BPF_OP(code),
BPF_CLASS(code) == BPF_JMP);
/* Setup JUMP instruction */ /* Setup JUMP instruction */
jmp_offset = bpf2a32_offset(i+off, i, ctx); jmp_offset = bpf2a32_offset(i+off, i, ctx);

View File

@ -62,6 +62,7 @@
#define ARM_INST_ADDS_I 0x02900000 #define ARM_INST_ADDS_I 0x02900000
#define ARM_INST_AND_R 0x00000000 #define ARM_INST_AND_R 0x00000000
#define ARM_INST_ANDS_R 0x00100000
#define ARM_INST_AND_I 0x02000000 #define ARM_INST_AND_I 0x02000000
#define ARM_INST_BIC_R 0x01c00000 #define ARM_INST_BIC_R 0x01c00000
@ -172,6 +173,7 @@
#define ARM_ADC_I(rd, rn, imm) _AL3_I(ARM_INST_ADC, rd, rn, imm) #define ARM_ADC_I(rd, rn, imm) _AL3_I(ARM_INST_ADC, rd, rn, imm)
#define ARM_AND_R(rd, rn, rm) _AL3_R(ARM_INST_AND, rd, rn, rm) #define ARM_AND_R(rd, rn, rm) _AL3_R(ARM_INST_AND, rd, rn, rm)
#define ARM_ANDS_R(rd, rn, rm) _AL3_R(ARM_INST_ANDS, rd, rn, rm)
#define ARM_AND_I(rd, rn, imm) _AL3_I(ARM_INST_AND, rd, rn, imm) #define ARM_AND_I(rd, rn, imm) _AL3_I(ARM_INST_AND, rd, rn, imm)
#define ARM_BIC_R(rd, rn, rm) _AL3_R(ARM_INST_BIC, rd, rn, rm) #define ARM_BIC_R(rd, rn, rm) _AL3_R(ARM_INST_BIC, rd, rn, rm)

View File

@ -18,7 +18,7 @@
#include <linux/clkdev.h> #include <linux/clkdev.h>
#include <linux/mv643xx_eth.h> #include <linux/mv643xx_eth.h>
#include <linux/mv643xx_i2c.h> #include <linux/mv643xx_i2c.h>
#include <net/dsa.h> #include <linux/platform_data/dsa.h>
#include <linux/platform_data/dma-mv_xor.h> #include <linux/platform_data/dma-mv_xor.h>
#include <linux/platform_data/usb-ehci-orion.h> #include <linux/platform_data/usb-ehci-orion.h>
#include <plat/common.h> #include <plat/common.h>

View File

@ -71,3 +71,20 @@ &duart0 {
&duart1 { &duart1 {
status = "okay"; status = "okay";
}; };
&enetc_port0 {
phy-handle = <&sgmii_phy0>;
phy-connection-type = "sgmii";
mdio {
#address-cells = <1>;
#size-cells = <0>;
sgmii_phy0: ethernet-phy@2 {
reg = <0x2>;
};
};
};
&enetc_port1 {
status = "disabled";
};

View File

@ -335,5 +335,40 @@ smmu: iommu@5000000 {
<GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 207 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 206 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 207 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>; <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
}; };
pcie@1f0000000 { /* Integrated Endpoint Root Complex */
compatible = "pci-host-ecam-generic";
reg = <0x01 0xf0000000 0x0 0x100000>;
#address-cells = <3>;
#size-cells = <2>;
#interrupt-cells = <1>;
msi-parent = <&its>;
device_type = "pci";
bus-range = <0x0 0x0>;
dma-coherent;
msi-map = <0 &its 0x17 0xe>;
iommu-map = <0 &smmu 0x17 0xe>;
/* PF0-6 BAR0 - non-prefetchable memory */
ranges = <0x82000000 0x0 0x00000000 0x1 0xf8000000 0x0 0x160000
/* PF0-6 BAR2 - prefetchable memory */
0xc2000000 0x0 0x00000000 0x1 0xf8160000 0x0 0x070000
/* PF0: VF0-1 BAR0 - non-prefetchable memory */
0x82000000 0x0 0x00000000 0x1 0xf81d0000 0x0 0x020000
/* PF0: VF0-1 BAR2 - prefetchable memory */
0xc2000000 0x0 0x00000000 0x1 0xf81f0000 0x0 0x020000
/* PF1: VF0-1 BAR0 - non-prefetchable memory */
0x82000000 0x0 0x00000000 0x1 0xf8210000 0x0 0x020000
/* PF1: VF0-1 BAR2 - prefetchable memory */
0xc2000000 0x0 0x00000000 0x1 0xf8230000 0x0 0x020000>;
enetc_port0: ethernet@0,0 {
compatible = "fsl,enetc";
reg = <0x000000 0 0 0 0>;
};
enetc_port1: ethernet@0,1 {
compatible = "fsl,enetc";
reg = <0x000100 0 0 0 0>;
};
};
}; };
}; };

View File

@ -200,6 +200,19 @@ backlight: backlight {
pinctrl-0 = <&bl_en>; pinctrl-0 = <&bl_en>;
pwm-delay-us = <10000>; pwm-delay-us = <10000>;
}; };
gpio_keys: gpio-keys {
compatible = "gpio-keys";
pinctrl-names = "default";
pinctrl-0 = <&bt_host_wake_l>;
wake_on_bt: wake-on-bt {
label = "Wake-on-Bluetooth";
gpios = <&gpio0 3 GPIO_ACTIVE_LOW>;
linux,code = <KEY_WAKEUP>;
wakeup-source;
};
};
}; };
&ppvar_bigcpu { &ppvar_bigcpu {

View File

@ -175,6 +175,21 @@ dmic: dmic {
pinctrl-0 = <&dmic_en>; pinctrl-0 = <&dmic_en>;
wakeup-delay-ms = <250>; wakeup-delay-ms = <250>;
}; };
gpio_keys: gpio-keys {
compatible = "gpio-keys";
pinctrl-names = "default";
pinctrl-0 = <&pen_eject_odl>;
pen-insert {
label = "Pen Insert";
/* Insert = low, eject = high */
gpios = <&gpio1 1 GPIO_ACTIVE_LOW>;
linux,code = <SW_PEN_INSERTED>;
linux,input-type = <EV_SW>;
wakeup-source;
};
};
}; };
/* pp900_s0 aliases */ /* pp900_s0 aliases */
@ -328,20 +343,6 @@ &cru {
<400000000>; <400000000>;
}; };
&gpio_keys {
pinctrl-names = "default";
pinctrl-0 = <&bt_host_wake_l>, <&pen_eject_odl>;
pen-insert {
label = "Pen Insert";
/* Insert = low, eject = high */
gpios = <&gpio1 1 GPIO_ACTIVE_LOW>;
linux,code = <SW_PEN_INSERTED>;
linux,input-type = <EV_SW>;
wakeup-source;
};
};
&i2c_tunnel { &i2c_tunnel {
google,remote-bus = <0>; google,remote-bus = <0>;
}; };
@ -437,8 +438,19 @@ &spi2 {
status = "okay"; status = "okay";
}; };
&wake_on_bt { &usb_host0_ohci {
gpios = <&gpio1 2 GPIO_ACTIVE_LOW>; #address-cells = <1>;
#size-cells = <0>;
qca_bt: bluetooth@1 {
compatible = "usbcf3,e300", "usb4ca,301a";
reg = <1>;
pinctrl-names = "default";
pinctrl-0 = <&bt_host_wake_l>;
interrupt-parent = <&gpio1>;
interrupts = <2 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "wakeup";
};
}; };
/* PINCTRL OVERRIDES */ /* PINCTRL OVERRIDES */
@ -455,7 +467,7 @@ &bl_en {
}; };
&bt_host_wake_l { &bt_host_wake_l {
rockchip,pins = <1 2 RK_FUNC_GPIO &pcfg_pull_up>; rockchip,pins = <1 2 RK_FUNC_GPIO &pcfg_pull_none>;
}; };
&ec_ap_int_l { &ec_ap_int_l {

View File

@ -269,19 +269,6 @@ ap_rtc_clk: ap-rtc-clk {
#clock-cells = <0>; #clock-cells = <0>;
}; };
gpio_keys: gpio-keys {
compatible = "gpio-keys";
pinctrl-names = "default";
pinctrl-0 = <&bt_host_wake_l>;
wake_on_bt: wake-on-bt {
label = "Wake-on-Bluetooth";
gpios = <&gpio0 3 GPIO_ACTIVE_LOW>;
linux,code = <KEY_WAKEUP>;
wakeup-source;
};
};
max98357a: max98357a { max98357a: max98357a {
compatible = "maxim,max98357a"; compatible = "maxim,max98357a";
pinctrl-names = "default"; pinctrl-names = "default";

View File

@ -362,7 +362,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
const s16 off = insn->off; const s16 off = insn->off;
const s32 imm = insn->imm; const s32 imm = insn->imm;
const int i = insn - ctx->prog->insnsi; const int i = insn - ctx->prog->insnsi;
const bool is64 = BPF_CLASS(code) == BPF_ALU64; const bool is64 = BPF_CLASS(code) == BPF_ALU64 ||
BPF_CLASS(code) == BPF_JMP;
const bool isdw = BPF_SIZE(code) == BPF_DW; const bool isdw = BPF_SIZE(code) == BPF_DW;
u8 jmp_cond; u8 jmp_cond;
s32 jmp_offset; s32 jmp_offset;
@ -559,7 +560,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_X: case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
emit(A64_CMP(1, dst, src), ctx); case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
emit(A64_CMP(is64, dst, src), ctx);
emit_cond_jmp: emit_cond_jmp:
jmp_offset = bpf2a64_offset(i + off, i, ctx); jmp_offset = bpf2a64_offset(i + off, i, ctx);
check_imm19(jmp_offset); check_imm19(jmp_offset);
@ -601,7 +612,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
emit(A64_B_(jmp_cond, jmp_offset), ctx); emit(A64_B_(jmp_cond, jmp_offset), ctx);
break; break;
case BPF_JMP | BPF_JSET | BPF_X: case BPF_JMP | BPF_JSET | BPF_X:
emit(A64_TST(1, dst, src), ctx); case BPF_JMP32 | BPF_JSET | BPF_X:
emit(A64_TST(is64, dst, src), ctx);
goto emit_cond_jmp; goto emit_cond_jmp;
/* IF (dst COND imm) JUMP off */ /* IF (dst COND imm) JUMP off */
case BPF_JMP | BPF_JEQ | BPF_K: case BPF_JMP | BPF_JEQ | BPF_K:
@ -614,12 +626,23 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_K: case BPF_JMP | BPF_JSGE | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
emit_a64_mov_i(1, tmp, imm, ctx); case BPF_JMP32 | BPF_JEQ | BPF_K:
emit(A64_CMP(1, dst, tmp), ctx); case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_CMP(is64, dst, tmp), ctx);
goto emit_cond_jmp; goto emit_cond_jmp;
case BPF_JMP | BPF_JSET | BPF_K: case BPF_JMP | BPF_JSET | BPF_K:
emit_a64_mov_i(1, tmp, imm, ctx); case BPF_JMP32 | BPF_JSET | BPF_K:
emit(A64_TST(1, dst, tmp), ctx); emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_TST(is64, dst, tmp), ctx);
goto emit_cond_jmp; goto emit_cond_jmp;
/* function call */ /* function call */
case BPF_JMP | BPF_CALL: case BPF_JMP | BPF_CALL:

View File

@ -2,3 +2,4 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_64.h generated-y += unistd_64.h
generic-y += kvm_para.h generic-y += kvm_para.h
generic-y += socket.h

View File

@ -1,120 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _ASM_IA64_SOCKET_H
#define _ASM_IA64_SOCKET_H
/*
* Socket related defines.
*
* Based on <asm-i386/socket.h>.
*
* Modified 1998-2000
* David Mosberger-Tang <davidm@hpl.hp.com>, Hewlett-Packard Co
*/
#include <asm/sockios.h>
/* For setsockopt(2) */
#define SOL_SOCKET 1
#define SO_DEBUG 1
#define SO_REUSEADDR 2
#define SO_TYPE 3
#define SO_ERROR 4
#define SO_DONTROUTE 5
#define SO_BROADCAST 6
#define SO_SNDBUF 7
#define SO_RCVBUF 8
#define SO_SNDBUFFORCE 32
#define SO_RCVBUFFORCE 33
#define SO_KEEPALIVE 9
#define SO_OOBINLINE 10
#define SO_NO_CHECK 11
#define SO_PRIORITY 12
#define SO_LINGER 13
#define SO_BSDCOMPAT 14
#define SO_REUSEPORT 15
#define SO_PASSCRED 16
#define SO_PEERCRED 17
#define SO_RCVLOWAT 18
#define SO_SNDLOWAT 19
#define SO_RCVTIMEO 20
#define SO_SNDTIMEO 21
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 22
#define SO_SECURITY_ENCRYPTION_TRANSPORT 23
#define SO_SECURITY_ENCRYPTION_NETWORK 24
#define SO_BINDTODEVICE 25
/* Socket filtering */
#define SO_ATTACH_FILTER 26
#define SO_DETACH_FILTER 27
#define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 28
#define SO_TIMESTAMP 29
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_ACCEPTCONN 30
#define SO_PEERSEC 31
#define SO_PASSSEC 34
#define SO_TIMESTAMPNS 35
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SO_MARK 36
#define SO_TIMESTAMPING 37
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_PROTOCOL 38
#define SO_DOMAIN 39
#define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41
#define SCM_WIFI_STATUS SO_WIFI_STATUS
#define SO_PEEK_OFF 42
/* Instruct lower device to use last 4-bytes of skb data as FCS */
#define SO_NOFCS 43
#define SO_LOCK_FILTER 44
#define SO_SELECT_ERR_QUEUE 45
#define SO_BUSY_POLL 46
#define SO_MAX_PACING_RATE 47
#define SO_BPF_EXTENSIONS 48
#define SO_INCOMING_CPU 49
#define SO_ATTACH_BPF 50
#define SO_DETACH_BPF SO_DETACH_FILTER
#define SO_ATTACH_REUSEPORT_CBPF 51
#define SO_ATTACH_REUSEPORT_EBPF 52
#define SO_CNX_ADVICE 53
#define SCM_TIMESTAMPING_OPT_STATS 54
#define SO_MEMINFO 55
#define SO_INCOMING_NAPI_ID 56
#define SO_COOKIE 57
#define SCM_TIMESTAMPING_PKTINFO 58
#define SO_PEERGROUPS 59
#define SO_ZEROCOPY 60
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif /* _ASM_IA64_SOCKET_H */

View File

@ -127,7 +127,7 @@ static struct fixed_phy_status nettel_fixed_phy_status __initdata = {
static int __init init_BSP(void) static int __init init_BSP(void)
{ {
m5272_uarts_init(); m5272_uarts_init();
fixed_phy_add(PHY_POLL, 0, &nettel_fixed_phy_status, -1); fixed_phy_add(PHY_POLL, 0, &nettel_fixed_phy_status);
return 0; return 0;
} }

View File

@ -683,7 +683,7 @@ static int __init ar7_register_devices(void)
if (ar7_has_high_cpmac()) { if (ar7_has_high_cpmac()) {
res = fixed_phy_add(PHY_POLL, cpmac_high.id, res = fixed_phy_add(PHY_POLL, cpmac_high.id,
&fixed_phy_status, -1); &fixed_phy_status);
if (!res) { if (!res) {
cpmac_get_mac(1, cpmac_high_data.dev_addr); cpmac_get_mac(1, cpmac_high_data.dev_addr);
@ -696,7 +696,7 @@ static int __init ar7_register_devices(void)
} else } else
cpmac_low_data.phy_mask = 0xffffffff; cpmac_low_data.phy_mask = 0xffffffff;
res = fixed_phy_add(PHY_POLL, cpmac_low.id, &fixed_phy_status, -1); res = fixed_phy_add(PHY_POLL, cpmac_low.id, &fixed_phy_status);
if (!res) { if (!res) {
cpmac_get_mac(0, cpmac_low_data.dev_addr); cpmac_get_mac(0, cpmac_low_data.dev_addr);
res = platform_device_register(&cpmac_low); res = platform_device_register(&cpmac_low);

View File

@ -274,7 +274,7 @@ static int __init bcm47xx_register_bus_complete(void)
bcm47xx_leds_register(); bcm47xx_leds_register();
bcm47xx_workarounds(); bcm47xx_workarounds();
fixed_phy_add(PHY_POLL, 0, &bcm47xx_fixed_phy_status, -1); fixed_phy_add(PHY_POLL, 0, &bcm47xx_fixed_phy_status);
return 0; return 0;
} }
device_initcall(bcm47xx_register_bus_complete); device_initcall(bcm47xx_register_bus_complete);

View File

@ -11,6 +11,7 @@
#define _UAPI_ASM_SOCKET_H #define _UAPI_ASM_SOCKET_H
#include <asm/sockios.h> #include <asm/sockios.h>
#include <asm/bitsperlong.h>
/* /*
* For setsockopt(2) * For setsockopt(2)
@ -38,8 +39,8 @@
#define SO_RCVBUF 0x1002 /* Receive buffer. */ #define SO_RCVBUF 0x1002 /* Receive buffer. */
#define SO_SNDLOWAT 0x1003 /* send low-water mark */ #define SO_SNDLOWAT 0x1003 /* send low-water mark */
#define SO_RCVLOWAT 0x1004 /* receive low-water mark */ #define SO_RCVLOWAT 0x1004 /* receive low-water mark */
#define SO_SNDTIMEO 0x1005 /* send timeout */ #define SO_SNDTIMEO_OLD 0x1005 /* send timeout */
#define SO_RCVTIMEO 0x1006 /* receive timeout */ #define SO_RCVTIMEO_OLD 0x1006 /* receive timeout */
#define SO_ACCEPTCONN 0x1009 #define SO_ACCEPTCONN 0x1009
#define SO_PROTOCOL 0x1028 /* protocol type */ #define SO_PROTOCOL 0x1028 /* protocol type */
#define SO_DOMAIN 0x1029 /* domain/socket family */ #define SO_DOMAIN 0x1029 /* domain/socket family */
@ -65,21 +66,14 @@
#define SO_GET_FILTER SO_ATTACH_FILTER #define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 28 #define SO_PEERNAME 28
#define SO_TIMESTAMP 29
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_PEERSEC 30 #define SO_PEERSEC 30
#define SO_SNDBUFFORCE 31 #define SO_SNDBUFFORCE 31
#define SO_RCVBUFFORCE 33 #define SO_RCVBUFFORCE 33
#define SO_PASSSEC 34 #define SO_PASSSEC 34
#define SO_TIMESTAMPNS 35
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SO_MARK 36 #define SO_MARK 36
#define SO_TIMESTAMPING 37
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_RXQ_OVFL 40 #define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41 #define SO_WIFI_STATUS 41
@ -126,4 +120,41 @@
#define SO_TXTIME 61 #define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME #define SCM_TXTIME SO_TXTIME
#define SO_BINDTOIFINDEX 62
#define SO_TIMESTAMP_OLD 29
#define SO_TIMESTAMPNS_OLD 35
#define SO_TIMESTAMPING_OLD 37
#define SO_TIMESTAMP_NEW 63
#define SO_TIMESTAMPNS_NEW 64
#define SO_TIMESTAMPING_NEW 65
#define SO_RCVTIMEO_NEW 66
#define SO_SNDTIMEO_NEW 67
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
#define SO_TIMESTAMP SO_TIMESTAMP_OLD
#define SO_TIMESTAMPNS SO_TIMESTAMPNS_OLD
#define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
#define SO_RCVTIMEO SO_RCVTIMEO_OLD
#define SO_SNDTIMEO SO_SNDTIMEO_OLD
#else
#define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
#define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
#define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
#endif
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif
#endif /* _UAPI_ASM_SOCKET_H */ #endif /* _UAPI_ASM_SOCKET_H */

View File

@ -3,6 +3,7 @@
#define _UAPI_ASM_SOCKET_H #define _UAPI_ASM_SOCKET_H
#include <asm/sockios.h> #include <asm/sockios.h>
#include <asm/bitsperlong.h>
/* For setsockopt(2) */ /* For setsockopt(2) */
#define SOL_SOCKET 0xffff #define SOL_SOCKET 0xffff
@ -21,8 +22,8 @@
#define SO_RCVBUFFORCE 0x100b #define SO_RCVBUFFORCE 0x100b
#define SO_SNDLOWAT 0x1003 #define SO_SNDLOWAT 0x1003
#define SO_RCVLOWAT 0x1004 #define SO_RCVLOWAT 0x1004
#define SO_SNDTIMEO 0x1005 #define SO_SNDTIMEO_OLD 0x1005
#define SO_RCVTIMEO 0x1006 #define SO_RCVTIMEO_OLD 0x1006
#define SO_ERROR 0x1007 #define SO_ERROR 0x1007
#define SO_TYPE 0x1008 #define SO_TYPE 0x1008
#define SO_PROTOCOL 0x1028 #define SO_PROTOCOL 0x1028
@ -34,10 +35,6 @@
#define SO_BSDCOMPAT 0x400e #define SO_BSDCOMPAT 0x400e
#define SO_PASSCRED 0x4010 #define SO_PASSCRED 0x4010
#define SO_PEERCRED 0x4011 #define SO_PEERCRED 0x4011
#define SO_TIMESTAMP 0x4012
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_TIMESTAMPNS 0x4013
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
/* Security levels - as per NRL IPv6 - don't actually do anything */ /* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x4016 #define SO_SECURITY_AUTHENTICATION 0x4016
@ -58,9 +55,6 @@
#define SO_MARK 0x401f #define SO_MARK 0x401f
#define SO_TIMESTAMPING 0x4020
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_RXQ_OVFL 0x4021 #define SO_RXQ_OVFL 0x4021
#define SO_WIFI_STATUS 0x4022 #define SO_WIFI_STATUS 0x4022
@ -107,4 +101,40 @@
#define SO_TXTIME 0x4036 #define SO_TXTIME 0x4036
#define SCM_TXTIME SO_TXTIME #define SCM_TXTIME SO_TXTIME
#define SO_BINDTOIFINDEX 0x4037
#define SO_TIMESTAMP_OLD 0x4012
#define SO_TIMESTAMPNS_OLD 0x4013
#define SO_TIMESTAMPING_OLD 0x4020
#define SO_TIMESTAMP_NEW 0x4038
#define SO_TIMESTAMPNS_NEW 0x4039
#define SO_TIMESTAMPING_NEW 0x403A
#define SO_RCVTIMEO_NEW 0x4040
#define SO_SNDTIMEO_NEW 0x4041
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
#define SO_TIMESTAMP SO_TIMESTAMP_OLD
#define SO_TIMESTAMPNS SO_TIMESTAMPNS_OLD
#define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
#define SO_RCVTIMEO SO_RCVTIMEO_OLD
#define SO_SNDTIMEO SO_SNDTIMEO_OLD
#else
#define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
#define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
#define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
#endif
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif
#endif /* _UAPI_ASM_SOCKET_H */ #endif /* _UAPI_ASM_SOCKET_H */

View File

@ -337,6 +337,7 @@
#define PPC_INST_DIVWU 0x7c000396 #define PPC_INST_DIVWU 0x7c000396
#define PPC_INST_DIVD 0x7c0003d2 #define PPC_INST_DIVD 0x7c0003d2
#define PPC_INST_RLWINM 0x54000000 #define PPC_INST_RLWINM 0x54000000
#define PPC_INST_RLWINM_DOT 0x54000001
#define PPC_INST_RLWIMI 0x50000000 #define PPC_INST_RLWIMI 0x50000000
#define PPC_INST_RLDICL 0x78000000 #define PPC_INST_RLDICL 0x78000000
#define PPC_INST_RLDICR 0x78000004 #define PPC_INST_RLDICR 0x78000004

View File

@ -11,8 +11,8 @@
#define SO_RCVLOWAT 16 #define SO_RCVLOWAT 16
#define SO_SNDLOWAT 17 #define SO_SNDLOWAT 17
#define SO_RCVTIMEO 18 #define SO_RCVTIMEO_OLD 18
#define SO_SNDTIMEO 19 #define SO_SNDTIMEO_OLD 19
#define SO_PASSCRED 20 #define SO_PASSCRED 20
#define SO_PEERCRED 21 #define SO_PEERCRED 21

View File

@ -165,6 +165,10 @@
#define PPC_RLWINM(d, a, i, mb, me) EMIT(PPC_INST_RLWINM | ___PPC_RA(d) | \ #define PPC_RLWINM(d, a, i, mb, me) EMIT(PPC_INST_RLWINM | ___PPC_RA(d) | \
___PPC_RS(a) | __PPC_SH(i) | \ ___PPC_RS(a) | __PPC_SH(i) | \
__PPC_MB(mb) | __PPC_ME(me)) __PPC_MB(mb) | __PPC_ME(me))
#define PPC_RLWINM_DOT(d, a, i, mb, me) EMIT(PPC_INST_RLWINM_DOT | \
___PPC_RA(d) | ___PPC_RS(a) | \
__PPC_SH(i) | __PPC_MB(mb) | \
__PPC_ME(me))
#define PPC_RLWIMI(d, a, i, mb, me) EMIT(PPC_INST_RLWIMI | ___PPC_RA(d) | \ #define PPC_RLWIMI(d, a, i, mb, me) EMIT(PPC_INST_RLWIMI | ___PPC_RA(d) | \
___PPC_RS(a) | __PPC_SH(i) | \ ___PPC_RS(a) | __PPC_SH(i) | \
__PPC_MB(mb) | __PPC_ME(me)) __PPC_MB(mb) | __PPC_ME(me))

View File

@ -768,36 +768,58 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
case BPF_JMP | BPF_JGT | BPF_X: case BPF_JMP | BPF_JGT | BPF_X:
case BPF_JMP | BPF_JSGT | BPF_K: case BPF_JMP | BPF_JSGT | BPF_K:
case BPF_JMP | BPF_JSGT | BPF_X: case BPF_JMP | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_X:
true_cond = COND_GT; true_cond = COND_GT;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JLT | BPF_K: case BPF_JMP | BPF_JLT | BPF_K:
case BPF_JMP | BPF_JLT | BPF_X: case BPF_JMP | BPF_JLT | BPF_X:
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_X:
true_cond = COND_LT; true_cond = COND_LT;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JGE | BPF_K: case BPF_JMP | BPF_JGE | BPF_K:
case BPF_JMP | BPF_JGE | BPF_X: case BPF_JMP | BPF_JGE | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_K: case BPF_JMP | BPF_JSGE | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_X: case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_X:
true_cond = COND_GE; true_cond = COND_GE;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JLE | BPF_K: case BPF_JMP | BPF_JLE | BPF_K:
case BPF_JMP | BPF_JLE | BPF_X: case BPF_JMP | BPF_JLE | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_X:
true_cond = COND_LE; true_cond = COND_LE;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JEQ | BPF_K: case BPF_JMP | BPF_JEQ | BPF_K:
case BPF_JMP | BPF_JEQ | BPF_X: case BPF_JMP | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JEQ | BPF_X:
true_cond = COND_EQ; true_cond = COND_EQ;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JNE | BPF_K: case BPF_JMP | BPF_JNE | BPF_K:
case BPF_JMP | BPF_JNE | BPF_X: case BPF_JMP | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_X:
true_cond = COND_NE; true_cond = COND_NE;
goto cond_branch; goto cond_branch;
case BPF_JMP | BPF_JSET | BPF_K: case BPF_JMP | BPF_JSET | BPF_K:
case BPF_JMP | BPF_JSET | BPF_X: case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_X:
true_cond = COND_NE; true_cond = COND_NE;
/* Fall through */ /* Fall through */
@ -809,18 +831,44 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
case BPF_JMP | BPF_JLE | BPF_X: case BPF_JMP | BPF_JLE | BPF_X:
case BPF_JMP | BPF_JEQ | BPF_X: case BPF_JMP | BPF_JEQ | BPF_X:
case BPF_JMP | BPF_JNE | BPF_X: case BPF_JMP | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
/* unsigned comparison */ /* unsigned comparison */
PPC_CMPLD(dst_reg, src_reg); if (BPF_CLASS(code) == BPF_JMP32)
PPC_CMPLW(dst_reg, src_reg);
else
PPC_CMPLD(dst_reg, src_reg);
break; break;
case BPF_JMP | BPF_JSGT | BPF_X: case BPF_JMP | BPF_JSGT | BPF_X:
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_X: case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
/* signed comparison */ /* signed comparison */
PPC_CMPD(dst_reg, src_reg); if (BPF_CLASS(code) == BPF_JMP32)
PPC_CMPW(dst_reg, src_reg);
else
PPC_CMPD(dst_reg, src_reg);
break; break;
case BPF_JMP | BPF_JSET | BPF_X: case BPF_JMP | BPF_JSET | BPF_X:
PPC_AND_DOT(b2p[TMP_REG_1], dst_reg, src_reg); case BPF_JMP32 | BPF_JSET | BPF_X:
if (BPF_CLASS(code) == BPF_JMP) {
PPC_AND_DOT(b2p[TMP_REG_1], dst_reg,
src_reg);
} else {
int tmp_reg = b2p[TMP_REG_1];
PPC_AND(tmp_reg, dst_reg, src_reg);
PPC_RLWINM_DOT(tmp_reg, tmp_reg, 0, 0,
31);
}
break; break;
case BPF_JMP | BPF_JNE | BPF_K: case BPF_JMP | BPF_JNE | BPF_K:
case BPF_JMP | BPF_JEQ | BPF_K: case BPF_JMP | BPF_JEQ | BPF_K:
@ -828,43 +876,87 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
case BPF_JMP | BPF_JLT | BPF_K: case BPF_JMP | BPF_JLT | BPF_K:
case BPF_JMP | BPF_JGE | BPF_K: case BPF_JMP | BPF_JGE | BPF_K:
case BPF_JMP | BPF_JLE | BPF_K: case BPF_JMP | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
{
bool is_jmp32 = BPF_CLASS(code) == BPF_JMP32;
/* /*
* Need sign-extended load, so only positive * Need sign-extended load, so only positive
* values can be used as imm in cmpldi * values can be used as imm in cmpldi
*/ */
if (imm >= 0 && imm < 32768) if (imm >= 0 && imm < 32768) {
PPC_CMPLDI(dst_reg, imm); if (is_jmp32)
else { PPC_CMPLWI(dst_reg, imm);
else
PPC_CMPLDI(dst_reg, imm);
} else {
/* sign-extending load */ /* sign-extending load */
PPC_LI32(b2p[TMP_REG_1], imm); PPC_LI32(b2p[TMP_REG_1], imm);
/* ... but unsigned comparison */ /* ... but unsigned comparison */
PPC_CMPLD(dst_reg, b2p[TMP_REG_1]); if (is_jmp32)
PPC_CMPLW(dst_reg,
b2p[TMP_REG_1]);
else
PPC_CMPLD(dst_reg,
b2p[TMP_REG_1]);
} }
break; break;
}
case BPF_JMP | BPF_JSGT | BPF_K: case BPF_JMP | BPF_JSGT | BPF_K:
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_K: case BPF_JMP | BPF_JSGE | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
{
bool is_jmp32 = BPF_CLASS(code) == BPF_JMP32;
/* /*
* signed comparison, so any 16-bit value * signed comparison, so any 16-bit value
* can be used in cmpdi * can be used in cmpdi
*/ */
if (imm >= -32768 && imm < 32768) if (imm >= -32768 && imm < 32768) {
PPC_CMPDI(dst_reg, imm); if (is_jmp32)
else { PPC_CMPWI(dst_reg, imm);
else
PPC_CMPDI(dst_reg, imm);
} else {
PPC_LI32(b2p[TMP_REG_1], imm); PPC_LI32(b2p[TMP_REG_1], imm);
PPC_CMPD(dst_reg, b2p[TMP_REG_1]); if (is_jmp32)
PPC_CMPW(dst_reg,
b2p[TMP_REG_1]);
else
PPC_CMPD(dst_reg,
b2p[TMP_REG_1]);
} }
break; break;
}
case BPF_JMP | BPF_JSET | BPF_K: case BPF_JMP | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_K:
/* andi does not sign-extend the immediate */ /* andi does not sign-extend the immediate */
if (imm >= 0 && imm < 32768) if (imm >= 0 && imm < 32768)
/* PPC_ANDI is _only/always_ dot-form */ /* PPC_ANDI is _only/always_ dot-form */
PPC_ANDI(b2p[TMP_REG_1], dst_reg, imm); PPC_ANDI(b2p[TMP_REG_1], dst_reg, imm);
else { else {
PPC_LI32(b2p[TMP_REG_1], imm); int tmp_reg = b2p[TMP_REG_1];
PPC_AND_DOT(b2p[TMP_REG_1], dst_reg,
b2p[TMP_REG_1]); PPC_LI32(tmp_reg, imm);
if (BPF_CLASS(code) == BPF_JMP) {
PPC_AND_DOT(tmp_reg, dst_reg,
tmp_reg);
} else {
PPC_AND(tmp_reg, dst_reg,
tmp_reg);
PPC_RLWINM_DOT(tmp_reg, tmp_reg,
0, 0, 31);
}
} }
break; break;
} }
@ -1093,6 +1185,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE)); bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
if (!fp->is_func || extra_pass) { if (!fp->is_func || extra_pass) {
bpf_prog_fill_jited_linfo(fp, addrs);
out_addrs: out_addrs:
kfree(addrs); kfree(addrs);
kfree(jit_data); kfree(jit_data);

View File

@ -49,6 +49,7 @@ config RISCV
select RISCV_TIMER select RISCV_TIMER
select GENERIC_IRQ_MULTI_HANDLER select GENERIC_IRQ_MULTI_HANDLER
select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_PTE_SPECIAL
select HAVE_EBPF_JIT if 64BIT
config MMU config MMU
def_bool y def_bool y

View File

@ -77,7 +77,7 @@ KBUILD_IMAGE := $(boot)/Image.gz
head-y := arch/riscv/kernel/head.o head-y := arch/riscv/kernel/head.o
core-y += arch/riscv/kernel/ arch/riscv/mm/ core-y += arch/riscv/kernel/ arch/riscv/mm/ arch/riscv/net/
libs-y += arch/riscv/lib/ libs-y += arch/riscv/lib/

1
arch/riscv/net/Makefile Normal file
View File

@ -0,0 +1 @@
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o

File diff suppressed because it is too large Load Diff

View File

@ -11,13 +11,5 @@
#include <linux/device.h> #include <linux/device.h>
#include <linux/types.h> #include <linux/types.h>
#define PNETIDS_LEN 64 /* Total utility string length in bytes
* to cover up to 4 PNETIDs of 16 bytes
* for up to 4 device ports
*/
#define MAX_PNETID_LEN 16 /* Max.length of a single port PNETID */
#define MAX_PNETID_PORTS (PNETIDS_LEN / MAX_PNETID_LEN)
/* Max. # of ports with a PNETID */
int pnet_id_by_dev_port(struct device *dev, unsigned short port, u8 *pnetid); int pnet_id_by_dev_port(struct device *dev, unsigned short port, u8 *pnetid);
#endif /* _ASM_S390_PNET_H */ #endif /* _ASM_S390_PNET_H */

View File

@ -3,3 +3,4 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_32.h generated-y += unistd_32.h
generated-y += unistd_64.h generated-y += unistd_64.h
generic-y += socket.h

View File

@ -1,117 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* S390 version
*
* Derived from "include/asm-i386/socket.h"
*/
#ifndef _ASM_SOCKET_H
#define _ASM_SOCKET_H
#include <asm/sockios.h>
/* For setsockopt(2) */
#define SOL_SOCKET 1
#define SO_DEBUG 1
#define SO_REUSEADDR 2
#define SO_TYPE 3
#define SO_ERROR 4
#define SO_DONTROUTE 5
#define SO_BROADCAST 6
#define SO_SNDBUF 7
#define SO_RCVBUF 8
#define SO_SNDBUFFORCE 32
#define SO_RCVBUFFORCE 33
#define SO_KEEPALIVE 9
#define SO_OOBINLINE 10
#define SO_NO_CHECK 11
#define SO_PRIORITY 12
#define SO_LINGER 13
#define SO_BSDCOMPAT 14
#define SO_REUSEPORT 15
#define SO_PASSCRED 16
#define SO_PEERCRED 17
#define SO_RCVLOWAT 18
#define SO_SNDLOWAT 19
#define SO_RCVTIMEO 20
#define SO_SNDTIMEO 21
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 22
#define SO_SECURITY_ENCRYPTION_TRANSPORT 23
#define SO_SECURITY_ENCRYPTION_NETWORK 24
#define SO_BINDTODEVICE 25
/* Socket filtering */
#define SO_ATTACH_FILTER 26
#define SO_DETACH_FILTER 27
#define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 28
#define SO_TIMESTAMP 29
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_ACCEPTCONN 30
#define SO_PEERSEC 31
#define SO_PASSSEC 34
#define SO_TIMESTAMPNS 35
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SO_MARK 36
#define SO_TIMESTAMPING 37
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_PROTOCOL 38
#define SO_DOMAIN 39
#define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41
#define SCM_WIFI_STATUS SO_WIFI_STATUS
#define SO_PEEK_OFF 42
/* Instruct lower device to use last 4-bytes of skb data as FCS */
#define SO_NOFCS 43
#define SO_LOCK_FILTER 44
#define SO_SELECT_ERR_QUEUE 45
#define SO_BUSY_POLL 46
#define SO_MAX_PACING_RATE 47
#define SO_BPF_EXTENSIONS 48
#define SO_INCOMING_CPU 49
#define SO_ATTACH_BPF 50
#define SO_DETACH_BPF SO_DETACH_FILTER
#define SO_ATTACH_REUSEPORT_CBPF 51
#define SO_ATTACH_REUSEPORT_EBPF 52
#define SO_CNX_ADVICE 53
#define SCM_TIMESTAMPING_OPT_STATS 54
#define SO_MEMINFO 55
#define SO_INCOMING_NAPI_ID 56
#define SO_COOKIE 57
#define SCM_TIMESTAMPING_PKTINFO 58
#define SO_PEERGROUPS 59
#define SO_ZEROCOPY 60
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif /* _ASM_SOCKET_H */

View File

@ -1110,103 +1110,145 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
mask = 0xf000; /* j */ mask = 0xf000; /* j */
goto branch_oc; goto branch_oc;
case BPF_JMP | BPF_JSGT | BPF_K: /* ((s64) dst > (s64) imm) */ case BPF_JMP | BPF_JSGT | BPF_K: /* ((s64) dst > (s64) imm) */
case BPF_JMP32 | BPF_JSGT | BPF_K: /* ((s32) dst > (s32) imm) */
mask = 0x2000; /* jh */ mask = 0x2000; /* jh */
goto branch_ks; goto branch_ks;
case BPF_JMP | BPF_JSLT | BPF_K: /* ((s64) dst < (s64) imm) */ case BPF_JMP | BPF_JSLT | BPF_K: /* ((s64) dst < (s64) imm) */
case BPF_JMP32 | BPF_JSLT | BPF_K: /* ((s32) dst < (s32) imm) */
mask = 0x4000; /* jl */ mask = 0x4000; /* jl */
goto branch_ks; goto branch_ks;
case BPF_JMP | BPF_JSGE | BPF_K: /* ((s64) dst >= (s64) imm) */ case BPF_JMP | BPF_JSGE | BPF_K: /* ((s64) dst >= (s64) imm) */
case BPF_JMP32 | BPF_JSGE | BPF_K: /* ((s32) dst >= (s32) imm) */
mask = 0xa000; /* jhe */ mask = 0xa000; /* jhe */
goto branch_ks; goto branch_ks;
case BPF_JMP | BPF_JSLE | BPF_K: /* ((s64) dst <= (s64) imm) */ case BPF_JMP | BPF_JSLE | BPF_K: /* ((s64) dst <= (s64) imm) */
case BPF_JMP32 | BPF_JSLE | BPF_K: /* ((s32) dst <= (s32) imm) */
mask = 0xc000; /* jle */ mask = 0xc000; /* jle */
goto branch_ks; goto branch_ks;
case BPF_JMP | BPF_JGT | BPF_K: /* (dst_reg > imm) */ case BPF_JMP | BPF_JGT | BPF_K: /* (dst_reg > imm) */
case BPF_JMP32 | BPF_JGT | BPF_K: /* ((u32) dst_reg > (u32) imm) */
mask = 0x2000; /* jh */ mask = 0x2000; /* jh */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JLT | BPF_K: /* (dst_reg < imm) */ case BPF_JMP | BPF_JLT | BPF_K: /* (dst_reg < imm) */
case BPF_JMP32 | BPF_JLT | BPF_K: /* ((u32) dst_reg < (u32) imm) */
mask = 0x4000; /* jl */ mask = 0x4000; /* jl */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JGE | BPF_K: /* (dst_reg >= imm) */ case BPF_JMP | BPF_JGE | BPF_K: /* (dst_reg >= imm) */
case BPF_JMP32 | BPF_JGE | BPF_K: /* ((u32) dst_reg >= (u32) imm) */
mask = 0xa000; /* jhe */ mask = 0xa000; /* jhe */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JLE | BPF_K: /* (dst_reg <= imm) */ case BPF_JMP | BPF_JLE | BPF_K: /* (dst_reg <= imm) */
case BPF_JMP32 | BPF_JLE | BPF_K: /* ((u32) dst_reg <= (u32) imm) */
mask = 0xc000; /* jle */ mask = 0xc000; /* jle */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JNE | BPF_K: /* (dst_reg != imm) */ case BPF_JMP | BPF_JNE | BPF_K: /* (dst_reg != imm) */
case BPF_JMP32 | BPF_JNE | BPF_K: /* ((u32) dst_reg != (u32) imm) */
mask = 0x7000; /* jne */ mask = 0x7000; /* jne */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JEQ | BPF_K: /* (dst_reg == imm) */ case BPF_JMP | BPF_JEQ | BPF_K: /* (dst_reg == imm) */
case BPF_JMP32 | BPF_JEQ | BPF_K: /* ((u32) dst_reg == (u32) imm) */
mask = 0x8000; /* je */ mask = 0x8000; /* je */
goto branch_ku; goto branch_ku;
case BPF_JMP | BPF_JSET | BPF_K: /* (dst_reg & imm) */ case BPF_JMP | BPF_JSET | BPF_K: /* (dst_reg & imm) */
case BPF_JMP32 | BPF_JSET | BPF_K: /* ((u32) dst_reg & (u32) imm) */
mask = 0x7000; /* jnz */ mask = 0x7000; /* jnz */
/* lgfi %w1,imm (load sign extend imm) */ if (BPF_CLASS(insn->code) == BPF_JMP32) {
EMIT6_IMM(0xc0010000, REG_W1, imm); /* llilf %w1,imm (load zero extend imm) */
/* ngr %w1,%dst */ EMIT6_IMM(0xc00f0000, REG_W1, imm);
EMIT4(0xb9800000, REG_W1, dst_reg); /* nr %w1,%dst */
EMIT2(0x1400, REG_W1, dst_reg);
} else {
/* lgfi %w1,imm (load sign extend imm) */
EMIT6_IMM(0xc0010000, REG_W1, imm);
/* ngr %w1,%dst */
EMIT4(0xb9800000, REG_W1, dst_reg);
}
goto branch_oc; goto branch_oc;
case BPF_JMP | BPF_JSGT | BPF_X: /* ((s64) dst > (s64) src) */ case BPF_JMP | BPF_JSGT | BPF_X: /* ((s64) dst > (s64) src) */
case BPF_JMP32 | BPF_JSGT | BPF_X: /* ((s32) dst > (s32) src) */
mask = 0x2000; /* jh */ mask = 0x2000; /* jh */
goto branch_xs; goto branch_xs;
case BPF_JMP | BPF_JSLT | BPF_X: /* ((s64) dst < (s64) src) */ case BPF_JMP | BPF_JSLT | BPF_X: /* ((s64) dst < (s64) src) */
case BPF_JMP32 | BPF_JSLT | BPF_X: /* ((s32) dst < (s32) src) */
mask = 0x4000; /* jl */ mask = 0x4000; /* jl */
goto branch_xs; goto branch_xs;
case BPF_JMP | BPF_JSGE | BPF_X: /* ((s64) dst >= (s64) src) */ case BPF_JMP | BPF_JSGE | BPF_X: /* ((s64) dst >= (s64) src) */
case BPF_JMP32 | BPF_JSGE | BPF_X: /* ((s32) dst >= (s32) src) */
mask = 0xa000; /* jhe */ mask = 0xa000; /* jhe */
goto branch_xs; goto branch_xs;
case BPF_JMP | BPF_JSLE | BPF_X: /* ((s64) dst <= (s64) src) */ case BPF_JMP | BPF_JSLE | BPF_X: /* ((s64) dst <= (s64) src) */
case BPF_JMP32 | BPF_JSLE | BPF_X: /* ((s32) dst <= (s32) src) */
mask = 0xc000; /* jle */ mask = 0xc000; /* jle */
goto branch_xs; goto branch_xs;
case BPF_JMP | BPF_JGT | BPF_X: /* (dst > src) */ case BPF_JMP | BPF_JGT | BPF_X: /* (dst > src) */
case BPF_JMP32 | BPF_JGT | BPF_X: /* ((u32) dst > (u32) src) */
mask = 0x2000; /* jh */ mask = 0x2000; /* jh */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JLT | BPF_X: /* (dst < src) */ case BPF_JMP | BPF_JLT | BPF_X: /* (dst < src) */
case BPF_JMP32 | BPF_JLT | BPF_X: /* ((u32) dst < (u32) src) */
mask = 0x4000; /* jl */ mask = 0x4000; /* jl */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JGE | BPF_X: /* (dst >= src) */ case BPF_JMP | BPF_JGE | BPF_X: /* (dst >= src) */
case BPF_JMP32 | BPF_JGE | BPF_X: /* ((u32) dst >= (u32) src) */
mask = 0xa000; /* jhe */ mask = 0xa000; /* jhe */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JLE | BPF_X: /* (dst <= src) */ case BPF_JMP | BPF_JLE | BPF_X: /* (dst <= src) */
case BPF_JMP32 | BPF_JLE | BPF_X: /* ((u32) dst <= (u32) src) */
mask = 0xc000; /* jle */ mask = 0xc000; /* jle */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JNE | BPF_X: /* (dst != src) */ case BPF_JMP | BPF_JNE | BPF_X: /* (dst != src) */
case BPF_JMP32 | BPF_JNE | BPF_X: /* ((u32) dst != (u32) src) */
mask = 0x7000; /* jne */ mask = 0x7000; /* jne */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JEQ | BPF_X: /* (dst == src) */ case BPF_JMP | BPF_JEQ | BPF_X: /* (dst == src) */
case BPF_JMP32 | BPF_JEQ | BPF_X: /* ((u32) dst == (u32) src) */
mask = 0x8000; /* je */ mask = 0x8000; /* je */
goto branch_xu; goto branch_xu;
case BPF_JMP | BPF_JSET | BPF_X: /* (dst & src) */ case BPF_JMP | BPF_JSET | BPF_X: /* (dst & src) */
case BPF_JMP32 | BPF_JSET | BPF_X: /* ((u32) dst & (u32) src) */
{
bool is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
mask = 0x7000; /* jnz */ mask = 0x7000; /* jnz */
/* ngrk %w1,%dst,%src */ /* nrk or ngrk %w1,%dst,%src */
EMIT4_RRF(0xb9e40000, REG_W1, dst_reg, src_reg); EMIT4_RRF((is_jmp32 ? 0xb9f40000 : 0xb9e40000),
REG_W1, dst_reg, src_reg);
goto branch_oc; goto branch_oc;
branch_ks: branch_ks:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* lgfi %w1,imm (load sign extend imm) */ /* lgfi %w1,imm (load sign extend imm) */
EMIT6_IMM(0xc0010000, REG_W1, imm); EMIT6_IMM(0xc0010000, REG_W1, imm);
/* cgrj %dst,%w1,mask,off */ /* crj or cgrj %dst,%w1,mask,off */
EMIT6_PCREL(0xec000000, 0x0064, dst_reg, REG_W1, i, off, mask); EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0076 : 0x0064),
dst_reg, REG_W1, i, off, mask);
break; break;
branch_ku: branch_ku:
is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
/* lgfi %w1,imm (load sign extend imm) */ /* lgfi %w1,imm (load sign extend imm) */
EMIT6_IMM(0xc0010000, REG_W1, imm); EMIT6_IMM(0xc0010000, REG_W1, imm);
/* clgrj %dst,%w1,mask,off */ /* clrj or clgrj %dst,%w1,mask,off */
EMIT6_PCREL(0xec000000, 0x0065, dst_reg, REG_W1, i, off, mask); EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0077 : 0x0065),
dst_reg, REG_W1, i, off, mask);
break; break;
branch_xs: branch_xs:
/* cgrj %dst,%src,mask,off */ is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
EMIT6_PCREL(0xec000000, 0x0064, dst_reg, src_reg, i, off, mask); /* crj or cgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0076 : 0x0064),
dst_reg, src_reg, i, off, mask);
break; break;
branch_xu: branch_xu:
/* clgrj %dst,%src,mask,off */ is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
EMIT6_PCREL(0xec000000, 0x0065, dst_reg, src_reg, i, off, mask); /* clrj or clgrj %dst,%src,mask,off */
EMIT6_PCREL(0xec000000, (is_jmp32 ? 0x0077 : 0x0065),
dst_reg, src_reg, i, off, mask);
break; break;
branch_oc: branch_oc:
/* brc mask,jmp_off (branch instruction needs 4 bytes) */ /* brc mask,jmp_off (branch instruction needs 4 bytes) */
jmp_off = addrs[i + off + 1] - (addrs[i + 1] - 4); jmp_off = addrs[i + off + 1] - (addrs[i + 1] - 4);
EMIT4_PCREL(0xa7040000 | mask << 8, jmp_off); EMIT4_PCREL(0xa7040000 | mask << 8, jmp_off);
break; break;
}
default: /* too complex, give up */ default: /* too complex, give up */
pr_err("Unknown opcode %02x\n", insn->code); pr_err("Unknown opcode %02x\n", insn->code);
return -1; return -1;

View File

@ -12,6 +12,15 @@
#include <asm/ccwgroup.h> #include <asm/ccwgroup.h>
#include <asm/ccwdev.h> #include <asm/ccwdev.h>
#include <asm/pnet.h> #include <asm/pnet.h>
#include <asm/ebcdic.h>
#define PNETIDS_LEN 64 /* Total utility string length in bytes
* to cover up to 4 PNETIDs of 16 bytes
* for up to 4 device ports
*/
#define MAX_PNETID_LEN 16 /* Max.length of a single port PNETID */
#define MAX_PNETID_PORTS (PNETIDS_LEN / MAX_PNETID_LEN)
/* Max. # of ports with a PNETID */
/* /*
* Get the PNETIDs from a device. * Get the PNETIDs from a device.
@ -40,6 +49,7 @@ static int pnet_ids_by_device(struct device *dev, u8 *pnetids)
if (!util_str) if (!util_str)
return -ENOMEM; return -ENOMEM;
memcpy(pnetids, util_str, PNETIDS_LEN); memcpy(pnetids, util_str, PNETIDS_LEN);
EBCASC(pnetids, PNETIDS_LEN);
kfree(util_str); kfree(util_str);
return 0; return 0;
} }
@ -47,6 +57,7 @@ static int pnet_ids_by_device(struct device *dev, u8 *pnetids)
struct zpci_dev *zdev = to_zpci(to_pci_dev(dev)); struct zpci_dev *zdev = to_zpci(to_pci_dev(dev));
memcpy(pnetids, zdev->util_str, sizeof(zdev->util_str)); memcpy(pnetids, zdev->util_str, sizeof(zdev->util_str));
EBCASC(pnetids, sizeof(zdev->util_str));
return 0; return 0;
} }
return -EOPNOTSUPP; return -EOPNOTSUPP;

View File

@ -19,6 +19,16 @@ typedef unsigned short __kernel_old_gid_t;
typedef int __kernel_suseconds_t; typedef int __kernel_suseconds_t;
#define __kernel_suseconds_t __kernel_suseconds_t #define __kernel_suseconds_t __kernel_suseconds_t
typedef long __kernel_long_t;
typedef unsigned long __kernel_ulong_t;
#define __kernel_long_t __kernel_long_t
struct __kernel_old_timeval {
__kernel_long_t tv_sec;
__kernel_suseconds_t tv_usec;
};
#define __kernel_old_timeval __kernel_old_timeval
#else #else
/* sparc 32 bit */ /* sparc 32 bit */

View File

@ -3,6 +3,7 @@
#define _ASM_SOCKET_H #define _ASM_SOCKET_H
#include <asm/sockios.h> #include <asm/sockios.h>
#include <asm/bitsperlong.h>
/* For setsockopt(2) */ /* For setsockopt(2) */
#define SOL_SOCKET 0xffff #define SOL_SOCKET 0xffff
@ -20,8 +21,8 @@
#define SO_BSDCOMPAT 0x0400 #define SO_BSDCOMPAT 0x0400
#define SO_RCVLOWAT 0x0800 #define SO_RCVLOWAT 0x0800
#define SO_SNDLOWAT 0x1000 #define SO_SNDLOWAT 0x1000
#define SO_RCVTIMEO 0x2000 #define SO_RCVTIMEO_OLD 0x2000
#define SO_SNDTIMEO 0x4000 #define SO_SNDTIMEO_OLD 0x4000
#define SO_ACCEPTCONN 0x8000 #define SO_ACCEPTCONN 0x8000
#define SO_SNDBUF 0x1001 #define SO_SNDBUF 0x1001
@ -33,7 +34,6 @@
#define SO_PROTOCOL 0x1028 #define SO_PROTOCOL 0x1028
#define SO_DOMAIN 0x1029 #define SO_DOMAIN 0x1029
/* Linux specific, keep the same. */ /* Linux specific, keep the same. */
#define SO_NO_CHECK 0x000b #define SO_NO_CHECK 0x000b
#define SO_PRIORITY 0x000c #define SO_PRIORITY 0x000c
@ -45,19 +45,12 @@
#define SO_GET_FILTER SO_ATTACH_FILTER #define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 0x001c #define SO_PEERNAME 0x001c
#define SO_TIMESTAMP 0x001d
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_PEERSEC 0x001e #define SO_PEERSEC 0x001e
#define SO_PASSSEC 0x001f #define SO_PASSSEC 0x001f
#define SO_TIMESTAMPNS 0x0021
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SO_MARK 0x0022 #define SO_MARK 0x0022
#define SO_TIMESTAMPING 0x0023
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_RXQ_OVFL 0x0024 #define SO_RXQ_OVFL 0x0024
#define SO_WIFI_STATUS 0x0025 #define SO_WIFI_STATUS 0x0025
@ -104,9 +97,47 @@
#define SO_TXTIME 0x003f #define SO_TXTIME 0x003f
#define SCM_TXTIME SO_TXTIME #define SCM_TXTIME SO_TXTIME
#define SO_BINDTOIFINDEX 0x0041
/* Security levels - as per NRL IPv6 - don't actually do anything */ /* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 0x5001 #define SO_SECURITY_AUTHENTICATION 0x5001
#define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002 #define SO_SECURITY_ENCRYPTION_TRANSPORT 0x5002
#define SO_SECURITY_ENCRYPTION_NETWORK 0x5004 #define SO_SECURITY_ENCRYPTION_NETWORK 0x5004
#define SO_TIMESTAMP_OLD 0x001d
#define SO_TIMESTAMPNS_OLD 0x0021
#define SO_TIMESTAMPING_OLD 0x0023
#define SO_TIMESTAMP_NEW 0x0046
#define SO_TIMESTAMPNS_NEW 0x0042
#define SO_TIMESTAMPING_NEW 0x0043
#define SO_RCVTIMEO_NEW 0x0044
#define SO_SNDTIMEO_NEW 0x0045
#if !defined(__KERNEL__)
#if __BITS_PER_LONG == 64
#define SO_TIMESTAMP SO_TIMESTAMP_OLD
#define SO_TIMESTAMPNS SO_TIMESTAMPNS_OLD
#define SO_TIMESTAMPING SO_TIMESTAMPING_OLD
#define SO_RCVTIMEO SO_RCVTIMEO_OLD
#define SO_SNDTIMEO SO_SNDTIMEO_OLD
#else
#define SO_TIMESTAMP (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMP_OLD : SO_TIMESTAMP_NEW)
#define SO_TIMESTAMPNS (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPNS_OLD : SO_TIMESTAMPNS_NEW)
#define SO_TIMESTAMPING (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_TIMESTAMPING_OLD : SO_TIMESTAMPING_NEW)
#define SO_RCVTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_RCVTIMEO_OLD : SO_RCVTIMEO_NEW)
#define SO_SNDTIMEO (sizeof(time_t) == sizeof(__kernel_long_t) ? SO_SNDTIMEO_OLD : SO_SNDTIMEO_NEW)
#endif
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#endif
#endif /* _ASM_SOCKET_H */ #endif /* _ASM_SOCKET_H */

View File

@ -3,3 +3,4 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_32.h generated-y += unistd_32.h
generated-y += unistd_64.h generated-y += unistd_64.h
generated-y += unistd_x32.h generated-y += unistd_x32.h
generic-y += socket.h

View File

@ -1 +0,0 @@
#include <asm-generic/socket.h>

View File

@ -881,20 +881,41 @@ xadd: if (is_imm8(insn->off))
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_X: case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
/* cmp dst_reg, src_reg */ /* cmp dst_reg, src_reg */
EMIT3(add_2mod(0x48, dst_reg, src_reg), 0x39, if (BPF_CLASS(insn->code) == BPF_JMP)
add_2reg(0xC0, dst_reg, src_reg)); EMIT1(add_2mod(0x48, dst_reg, src_reg));
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
EMIT2(0x39, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp; goto emit_cond_jmp;
case BPF_JMP | BPF_JSET | BPF_X: case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
/* test dst_reg, src_reg */ /* test dst_reg, src_reg */
EMIT3(add_2mod(0x48, dst_reg, src_reg), 0x85, if (BPF_CLASS(insn->code) == BPF_JMP)
add_2reg(0xC0, dst_reg, src_reg)); EMIT1(add_2mod(0x48, dst_reg, src_reg));
else if (is_ereg(dst_reg) || is_ereg(src_reg))
EMIT1(add_2mod(0x40, dst_reg, src_reg));
EMIT2(0x85, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp; goto emit_cond_jmp;
case BPF_JMP | BPF_JSET | BPF_K: case BPF_JMP | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_K:
/* test dst_reg, imm32 */ /* test dst_reg, imm32 */
EMIT1(add_1mod(0x48, dst_reg)); if (BPF_CLASS(insn->code) == BPF_JMP)
EMIT1(add_1mod(0x48, dst_reg));
else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg));
EMIT2_off32(0xF7, add_1reg(0xC0, dst_reg), imm32); EMIT2_off32(0xF7, add_1reg(0xC0, dst_reg), imm32);
goto emit_cond_jmp; goto emit_cond_jmp;
@ -908,8 +929,21 @@ xadd: if (is_imm8(insn->off))
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_K: case BPF_JMP | BPF_JSGE | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
/* cmp dst_reg, imm8/32 */ /* cmp dst_reg, imm8/32 */
EMIT1(add_1mod(0x48, dst_reg)); if (BPF_CLASS(insn->code) == BPF_JMP)
EMIT1(add_1mod(0x48, dst_reg));
else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg));
if (is_imm8(imm32)) if (is_imm8(imm32))
EMIT3(0x83, add_1reg(0xF8, dst_reg), imm32); EMIT3(0x83, add_1reg(0xF8, dst_reg), imm32);

View File

@ -2072,7 +2072,18 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
case BPF_JMP | BPF_JSGT | BPF_X: case BPF_JMP | BPF_JSGT | BPF_X:
case BPF_JMP | BPF_JSLE | BPF_X: case BPF_JMP | BPF_JSLE | BPF_X:
case BPF_JMP | BPF_JSLT | BPF_X: case BPF_JMP | BPF_JSLT | BPF_X:
case BPF_JMP | BPF_JSGE | BPF_X: { case BPF_JMP | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JEQ | BPF_X:
case BPF_JMP32 | BPF_JNE | BPF_X:
case BPF_JMP32 | BPF_JGT | BPF_X:
case BPF_JMP32 | BPF_JLT | BPF_X:
case BPF_JMP32 | BPF_JGE | BPF_X:
case BPF_JMP32 | BPF_JLE | BPF_X:
case BPF_JMP32 | BPF_JSGT | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
case BPF_JMP32 | BPF_JSLT | BPF_X:
case BPF_JMP32 | BPF_JSGE | BPF_X: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
u8 dreg_lo = dstk ? IA32_EAX : dst_lo; u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
u8 dreg_hi = dstk ? IA32_EDX : dst_hi; u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
u8 sreg_lo = sstk ? IA32_ECX : src_lo; u8 sreg_lo = sstk ? IA32_ECX : src_lo;
@ -2081,25 +2092,35 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
if (dstk) { if (dstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
STACK_VAR(dst_lo)); STACK_VAR(dst_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), if (is_jmp64)
STACK_VAR(dst_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
} }
if (sstk) { if (sstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
STACK_VAR(src_lo)); STACK_VAR(src_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), if (is_jmp64)
STACK_VAR(src_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EBX),
STACK_VAR(src_hi));
} }
/* cmp dreg_hi,sreg_hi */ if (is_jmp64) {
EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi)); /* cmp dreg_hi,sreg_hi */
EMIT2(IA32_JNE, 2); EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
EMIT2(IA32_JNE, 2);
}
/* cmp dreg_lo,sreg_lo */ /* cmp dreg_lo,sreg_lo */
EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo)); EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));
goto emit_cond_jmp; goto emit_cond_jmp;
} }
case BPF_JMP | BPF_JSET | BPF_X: { case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
u8 dreg_lo = dstk ? IA32_EAX : dst_lo; u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
u8 dreg_hi = dstk ? IA32_EDX : dst_hi; u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
u8 sreg_lo = sstk ? IA32_ECX : src_lo; u8 sreg_lo = sstk ? IA32_ECX : src_lo;
@ -2108,15 +2129,21 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
if (dstk) { if (dstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
STACK_VAR(dst_lo)); STACK_VAR(dst_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), if (is_jmp64)
STACK_VAR(dst_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
} }
if (sstk) { if (sstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
STACK_VAR(src_lo)); STACK_VAR(src_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), if (is_jmp64)
STACK_VAR(src_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EBX),
STACK_VAR(src_hi));
} }
/* and dreg_lo,sreg_lo */ /* and dreg_lo,sreg_lo */
EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo)); EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
@ -2126,32 +2153,39 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi)); EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
goto emit_cond_jmp; goto emit_cond_jmp;
} }
case BPF_JMP | BPF_JSET | BPF_K: { case BPF_JMP | BPF_JSET | BPF_K:
u32 hi; case BPF_JMP32 | BPF_JSET | BPF_K: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
u8 dreg_lo = dstk ? IA32_EAX : dst_lo; u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
u8 dreg_hi = dstk ? IA32_EDX : dst_hi; u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
u8 sreg_lo = IA32_ECX; u8 sreg_lo = IA32_ECX;
u8 sreg_hi = IA32_EBX; u8 sreg_hi = IA32_EBX;
u32 hi;
if (dstk) { if (dstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
STACK_VAR(dst_lo)); STACK_VAR(dst_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), if (is_jmp64)
STACK_VAR(dst_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
} }
hi = imm32 & (1<<31) ? (u32)~0 : 0;
/* mov ecx,imm32 */ /* mov ecx,imm32 */
EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32); EMIT2_off32(0xC7, add_1reg(0xC0, sreg_lo), imm32);
/* mov ebx,imm32 */
EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
/* and dreg_lo,sreg_lo */ /* and dreg_lo,sreg_lo */
EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo)); EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
/* and dreg_hi,sreg_hi */ if (is_jmp64) {
EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi)); hi = imm32 & (1 << 31) ? (u32)~0 : 0;
/* or dreg_lo,dreg_hi */ /* mov ebx,imm32 */
EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi)); EMIT2_off32(0xC7, add_1reg(0xC0, sreg_hi), hi);
/* and dreg_hi,sreg_hi */
EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
/* or dreg_lo,dreg_hi */
EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
}
goto emit_cond_jmp; goto emit_cond_jmp;
} }
case BPF_JMP | BPF_JEQ | BPF_K: case BPF_JMP | BPF_JEQ | BPF_K:
@ -2163,29 +2197,44 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
case BPF_JMP | BPF_JSGT | BPF_K: case BPF_JMP | BPF_JSGT | BPF_K:
case BPF_JMP | BPF_JSLE | BPF_K: case BPF_JMP | BPF_JSLE | BPF_K:
case BPF_JMP | BPF_JSLT | BPF_K: case BPF_JMP | BPF_JSLT | BPF_K:
case BPF_JMP | BPF_JSGE | BPF_K: { case BPF_JMP | BPF_JSGE | BPF_K:
u32 hi; case BPF_JMP32 | BPF_JEQ | BPF_K:
case BPF_JMP32 | BPF_JNE | BPF_K:
case BPF_JMP32 | BPF_JGT | BPF_K:
case BPF_JMP32 | BPF_JLT | BPF_K:
case BPF_JMP32 | BPF_JGE | BPF_K:
case BPF_JMP32 | BPF_JLE | BPF_K:
case BPF_JMP32 | BPF_JSGT | BPF_K:
case BPF_JMP32 | BPF_JSLE | BPF_K:
case BPF_JMP32 | BPF_JSLT | BPF_K:
case BPF_JMP32 | BPF_JSGE | BPF_K: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
u8 dreg_lo = dstk ? IA32_EAX : dst_lo; u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
u8 dreg_hi = dstk ? IA32_EDX : dst_hi; u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
u8 sreg_lo = IA32_ECX; u8 sreg_lo = IA32_ECX;
u8 sreg_hi = IA32_EBX; u8 sreg_hi = IA32_EBX;
u32 hi;
if (dstk) { if (dstk) {
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
STACK_VAR(dst_lo)); STACK_VAR(dst_lo));
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), if (is_jmp64)
STACK_VAR(dst_hi)); EMIT3(0x8B,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
} }
hi = imm32 & (1<<31) ? (u32)~0 : 0;
/* mov ecx,imm32 */ /* mov ecx,imm32 */
EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32); EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
/* mov ebx,imm32 */ if (is_jmp64) {
EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi); hi = imm32 & (1 << 31) ? (u32)~0 : 0;
/* mov ebx,imm32 */
/* cmp dreg_hi,sreg_hi */ EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi)); /* cmp dreg_hi,sreg_hi */
EMIT2(IA32_JNE, 2); EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
EMIT2(IA32_JNE, 2);
}
/* cmp dreg_lo,sreg_lo */ /* cmp dreg_lo,sreg_lo */
EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo)); EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));

View File

@ -25,6 +25,7 @@ generic-y += percpu.h
generic-y += preempt.h generic-y += preempt.h
generic-y += rwsem.h generic-y += rwsem.h
generic-y += sections.h generic-y += sections.h
generic-y += socket.h
generic-y += topology.h generic-y += topology.h
generic-y += trace_clock.h generic-y += trace_clock.h
generic-y += vga.h generic-y += vga.h

View File

@ -2,3 +2,4 @@ include include/uapi/asm-generic/Kbuild.asm
generated-y += unistd_32.h generated-y += unistd_32.h
generic-y += kvm_para.h generic-y += kvm_para.h
generic-y += socket.h

View File

@ -1,122 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
/*
* include/asm-xtensa/socket.h
*
* Copied from i386.
*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*/
#ifndef _XTENSA_SOCKET_H
#define _XTENSA_SOCKET_H
#include <asm/sockios.h>
/* For setsockoptions(2) */
#define SOL_SOCKET 1
#define SO_DEBUG 1
#define SO_REUSEADDR 2
#define SO_TYPE 3
#define SO_ERROR 4
#define SO_DONTROUTE 5
#define SO_BROADCAST 6
#define SO_SNDBUF 7
#define SO_RCVBUF 8
#define SO_SNDBUFFORCE 32
#define SO_RCVBUFFORCE 33
#define SO_KEEPALIVE 9
#define SO_OOBINLINE 10
#define SO_NO_CHECK 11
#define SO_PRIORITY 12
#define SO_LINGER 13
#define SO_BSDCOMPAT 14
#define SO_REUSEPORT 15
#define SO_PASSCRED 16
#define SO_PEERCRED 17
#define SO_RCVLOWAT 18
#define SO_SNDLOWAT 19
#define SO_RCVTIMEO 20
#define SO_SNDTIMEO 21
/* Security levels - as per NRL IPv6 - don't actually do anything */
#define SO_SECURITY_AUTHENTICATION 22
#define SO_SECURITY_ENCRYPTION_TRANSPORT 23
#define SO_SECURITY_ENCRYPTION_NETWORK 24
#define SO_BINDTODEVICE 25
/* Socket filtering */
#define SO_ATTACH_FILTER 26
#define SO_DETACH_FILTER 27
#define SO_GET_FILTER SO_ATTACH_FILTER
#define SO_PEERNAME 28
#define SO_TIMESTAMP 29
#define SCM_TIMESTAMP SO_TIMESTAMP
#define SO_ACCEPTCONN 30
#define SO_PEERSEC 31
#define SO_PASSSEC 34
#define SO_TIMESTAMPNS 35
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS
#define SO_MARK 36
#define SO_TIMESTAMPING 37
#define SCM_TIMESTAMPING SO_TIMESTAMPING
#define SO_PROTOCOL 38
#define SO_DOMAIN 39
#define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41
#define SCM_WIFI_STATUS SO_WIFI_STATUS
#define SO_PEEK_OFF 42
/* Instruct lower device to use last 4-bytes of skb data as FCS */
#define SO_NOFCS 43
#define SO_LOCK_FILTER 44
#define SO_SELECT_ERR_QUEUE 45
#define SO_BUSY_POLL 46
#define SO_MAX_PACING_RATE 47
#define SO_BPF_EXTENSIONS 48
#define SO_INCOMING_CPU 49
#define SO_ATTACH_BPF 50
#define SO_DETACH_BPF SO_DETACH_FILTER
#define SO_ATTACH_REUSEPORT_CBPF 51
#define SO_ATTACH_REUSEPORT_EBPF 52
#define SO_CNX_ADVICE 53
#define SCM_TIMESTAMPING_OPT_STATS 54
#define SO_MEMINFO 55
#define SO_INCOMING_NAPI_ID 56
#define SO_COOKIE 57
#define SCM_TIMESTAMPING_PKTINFO 58
#define SO_PEERGROUPS 59
#define SO_ZEROCOPY 60
#define SO_TXTIME 61
#define SCM_TXTIME SO_TXTIME
#endif /* _XTENSA_SOCKET_H */

View File

@ -10,13 +10,13 @@
#include <linux/delay.h> #include <linux/delay.h>
#define bcma_err(bus, fmt, ...) \ #define bcma_err(bus, fmt, ...) \
pr_err("bus%d: " fmt, (bus)->num, ##__VA_ARGS__) dev_err((bus)->dev, "bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_warn(bus, fmt, ...) \ #define bcma_warn(bus, fmt, ...) \
pr_warn("bus%d: " fmt, (bus)->num, ##__VA_ARGS__) dev_warn((bus)->dev, "bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_info(bus, fmt, ...) \ #define bcma_info(bus, fmt, ...) \
pr_info("bus%d: " fmt, (bus)->num, ##__VA_ARGS__) dev_info((bus)->dev, "bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
#define bcma_debug(bus, fmt, ...) \ #define bcma_debug(bus, fmt, ...) \
pr_debug("bus%d: " fmt, (bus)->num, ##__VA_ARGS__) dev_dbg((bus)->dev, "bus%d: " fmt, (bus)->num, ##__VA_ARGS__)
struct bcma_bus; struct bcma_bus;
@ -33,7 +33,6 @@ int __init bcma_bus_early_register(struct bcma_bus *bus);
int bcma_bus_suspend(struct bcma_bus *bus); int bcma_bus_suspend(struct bcma_bus *bus);
int bcma_bus_resume(struct bcma_bus *bus); int bcma_bus_resume(struct bcma_bus *bus);
#endif #endif
struct device *bcma_bus_get_host_dev(struct bcma_bus *bus);
/* scan.c */ /* scan.c */
void bcma_detect_chip(struct bcma_bus *bus); void bcma_detect_chip(struct bcma_bus *bus);

View File

@ -183,7 +183,7 @@ int bcma_gpio_init(struct bcma_drv_cc *cc)
chip->direction_input = bcma_gpio_direction_input; chip->direction_input = bcma_gpio_direction_input;
chip->direction_output = bcma_gpio_direction_output; chip->direction_output = bcma_gpio_direction_output;
chip->owner = THIS_MODULE; chip->owner = THIS_MODULE;
chip->parent = bcma_bus_get_host_dev(bus); chip->parent = bus->dev;
#if IS_BUILTIN(CONFIG_OF) #if IS_BUILTIN(CONFIG_OF)
chip->of_node = cc->core->dev.of_node; chip->of_node = cc->core->dev.of_node;
#endif #endif

View File

@ -196,6 +196,8 @@ static int bcma_host_pci_probe(struct pci_dev *dev,
goto err_pci_release_regions; goto err_pci_release_regions;
} }
bus->dev = &dev->dev;
/* Map MMIO */ /* Map MMIO */
err = -ENOMEM; err = -ENOMEM;
bus->mmio = pci_iomap(dev, 0, ~0UL); bus->mmio = pci_iomap(dev, 0, ~0UL);

View File

@ -179,7 +179,6 @@ int __init bcma_host_soc_register(struct bcma_soc *soc)
/* Host specific */ /* Host specific */
bus->hosttype = BCMA_HOSTTYPE_SOC; bus->hosttype = BCMA_HOSTTYPE_SOC;
bus->ops = &bcma_host_soc_ops; bus->ops = &bcma_host_soc_ops;
bus->host_pdev = NULL;
/* Initialize struct, detect chip */ /* Initialize struct, detect chip */
bcma_init_bus(bus); bcma_init_bus(bus);
@ -213,6 +212,8 @@ static int bcma_host_soc_probe(struct platform_device *pdev)
if (!bus) if (!bus)
return -ENOMEM; return -ENOMEM;
bus->dev = dev;
/* Map MMIO */ /* Map MMIO */
bus->mmio = of_iomap(np, 0); bus->mmio = of_iomap(np, 0);
if (!bus->mmio) if (!bus->mmio)
@ -221,7 +222,6 @@ static int bcma_host_soc_probe(struct platform_device *pdev)
/* Host specific */ /* Host specific */
bus->hosttype = BCMA_HOSTTYPE_SOC; bus->hosttype = BCMA_HOSTTYPE_SOC;
bus->ops = &bcma_host_soc_ops; bus->ops = &bcma_host_soc_ops;
bus->host_pdev = pdev;
/* Initialize struct, detect chip */ /* Initialize struct, detect chip */
bcma_init_bus(bus); bcma_init_bus(bus);

Some files were not shown because too many files have changed in this diff Show More