Try to access TP indirect registers via firmware first. If this fails,
fallback and access them directly. This ensures that driver and
firmware do not conflict each other while accessing the TP indirect
registers.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Collect firmware mbox and device logs before collecting the rest of
the hardware dumps to snap the firmware state before the mailbox logs
are updated by other hardware dumps.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Collect EDC0 and EDC1 memory dump.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add base to collect dump entities. Collect register dump and
update template header accordingly.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement operations to set/get dump data via ethtool. Also add
template header that precedes dump data, which helps in decoding
and extracting the dump data.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today's -next build encountered an error due to a missing definition of
WARN_ON(), caused by some header reorganization removing an implicit
inclusion of linux/bug.h. Fix this with an explicit inclusion.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: remove .set_addr
An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.
If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch.
This won't make sense anymore in a multi-CPU ports system, because there
won't be a unique master device assigned to a switch tree.
Moreover this operation is confusing because it makes the user think
that it could be used to program the switch with the MAC address of the
CPU/management port such that MAC address learning can be disabled on
said port, but in fact, that's not how it is currently used.
To fix this, assign a random MAC address at setup time in the mv88e6060
and mv88e6xxx drivers before removing .set_addr completely from DSA.
Changes in v3:
- include fix for mv88e6060 switch MAC address setter.
Changes in v2:
- remove .set_addr implementation from drivers and use a random MAC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that there is no user for the .set_addr function, remove it from
DSA. If a switch supports this feature (like mv88e6xxx), the
implementation can be done in the driver setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The .set_addr function does nothing, remove the dsa_loop implementation
before getting rid of it completely in DSA.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for mv88e6xxx, setup the switch from within the mv88e6060 driver with
a random MAC address, and remove the .set_addr implementation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E6060 Ethernet switch always transmits the multicast bit of the
switch MAC address as a zero. It re-uses the corresponding bit 8 of the
register "Switch MAC Address Register Bytes 0 & 1" for "DiffAddr".
If the "DiffAddr" bit is 0, then all ports transmit the same source
address. If it is set to 1, then bit 2:0 are used for the port number.
The mv88e6060 driver is currently wrongly shifting the MAC address byte
0 by 9. To fix this, shift it by 8 as usual and clear its bit 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An Ethernet switch may support having a MAC address, which can be used
as the switch's source address in transmitted full-duplex Pause frames.
If a DSA switch supports the related .set_addr operation, the DSA core
sets the master's MAC address on the switch. This won't make sense
anymore in a multi-CPU ports system, because there won't be a unique
master device assigned to a switch tree.
Instead, setup the switch from within the Marvell driver with a random
MAC address, and remove the .set_addr implementation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: bpf: support direct packet access
The core of this series is direct packet access support. With a
small change to the verifier, the offloaded code can now make
use of DPA. We need to be careful to use kernel (after initial
translation) offsets in our JIT. Direct packet access also brings
us to the problem of eBPF endianness. After considering the
changes necessary we decided to not support translation on both
BE and LE hosts, for now.
This series contains two fixes - one for compare instructions and
one for ineffective jne optimization. I chose to include fixes
in this set because the code in -net works only with unreleased
PoC FW (ABI version 1) and therefore nobody outside of Netronome
can exercise it anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for direct packet access in TC, note that because
writing the packet will cause the verifier to generate a csum
fixup prologue we won't be able to offload packet writes from
TC, just yet, only the reads will work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ability to write packet contents using pre-validated
packet pointers (direct packet access).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In direct packet access bound checks are already done, we can
simply dereference the packet pointer.
Verifier/parser logic needs to record pointer type. Note that
although verifier does protect us from CTX vs other pointer
changes we will also want to differentiate between PACKET vs
MAP_VALUE or STACK, so we can add the check already.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move data load into a separate function and separate it from
packet length checks of legacy I/O. This makes the code more
readable and easier to reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend
on target architecture. Take that into account and use struct xdp_buff,
not struct xdp_md.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eBPF is host-endian specific. Translating both BE and LE eBPF
to the NFP is feasible, but would require quite a bit of indirection.
The fact that I don't have access to any BE hosts that would fit
a 25G/40G/100G NIC is also limiting my ability to test big endian.
For now restrict the offload to little endian hosts only.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement byte swaps with rotations, shifts and byte loads.
Remember to clear upper parts of the 64 bit registers.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register move operation is encoded as alu no op. This means
that one has to specify number of unused/none parameters to
the emit_alu(). Add a helper.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have BPF assemebler support in LLVM 6 we can easily
test all compare instructions (LLVM 4 didn't generate most of them
from C). Fix the compare to immediates and refactor the order
of compare to regs to make sure they both follow the same pattern.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We optimize comparisons to immediate 0 as if (reg.lo | reg.hi).
The early return statement was missing, however, which means we
would generate two comparisons - optimized one followed by a
normal 2x 32 bit compare.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ld_field instruction has the following format in NFP assembler:
ld_field[dst, 1000, src, <<24]
reoder parameters to emit_ld_field_any() to make it closer to
the familiar assembler order.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a simplified is_valid_access() callback when verifier
is used for program analysis by non-host JITs. This allows
us to teach the verifier about packet start and packet end
offsets for direct packet access.
We can extend the callback as needed but for most packet
processing needs there isn't much more the offloads may
require.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu says:
====================
net: stmmac: Improvements for multi-queuing and for AVB
Two improvements for stmmac: First one corrects the available fifo
size per queue, second one corrects enabling of AVB queues. More info
in commit log.
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Changes from v1:
- Fix typo in second patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Flow control must be disabled for AVB enabled queues and TX
AVB queues must be enabled by setting BIT(2) of TXQEN.
Correct this by passing the queue mode to DMA callbacks
and by checking in these functions wether we are in AVB
performing the necessary adjustments.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we are using all the available fifo size in RQS and
TQS fields. This will not work correctly in multi-queues IP's
because total fifo size must be splitted to the enabled queues.
Correct this by computing the available fifo size per queue and
setting the right value in TQS and RQS fields.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).
However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets
which are now easily triggered as UFO has been removed, because, in turn,
sending large buffers triggers IP fragmentation.
The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:
# start netserver in the test namespace
ip netns add test
ip netns exec test netserver
# create a VETH pair
ip link add name veth0 type veth peer name veth0 netns test
ip link set veth0 up
ip -n test link set veth0 up
for i in $(seq 20 29); do
# assign addresses to both ends
ip addr add dev veth0 192.168.$i.1/24
ip -n test addr add dev veth0 192.168.$i.2/24
# start the traffic
netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
done
# wait
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host
We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".
While at it, fix a typo in a comment, and convert the if statement
into a switch to mate it more readable.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: Introduce Communcation Group feature
With this commit series we introduce a 'Group Communication' feature in
order to resolve the datagram and multicast flow control problem. This
new feature makes it possible for a user to instantiate multiple private
virtual brokerless message buses by just creating and joining member
sockets.
The main features are as follows:
---------------------------------
- Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
If it is the first socket of the group this implies creation of the
group. This call takes four parameters: 'type' serves as group
identifier, 'instance' serves as member identifier, and 'scope'
indicates the visibility of the group (node/cluster/zone). Finally,
'flags' indicates different options for the socket joining the group.
For the time being, there are only two such flags: 1) 'LOOPBACK'
indicates if the creator of the socket wants to receive a copy of
broadcast or multicast messages it sends to the group, 2) EVENTS
indicates if it wants to receive membership (JOINED/LEFT) events for
the other members of the group.
- Groups are closed, i.e., sockets which have not joined a group will
not be able to send messages to or receive messages from members of
the group, and vice versa. A socket can only be member of one group
at a time.
- There are four transmission modes.
1: Unicast. The sender transmits a message using the port identity
(node:port tuple) of the receiving socket.
2: Anycast. The sender transmits a message using a port name (type:
instance:scope) of one of the receiving sockets. If more than
one member socket matches the given address a destination is
selected according to a round-robin algorithm, but also considering
the destination load (advertised window size) as an additional
criteria.
3: Multicast. The sender transmits a message using a port name
(type:instance:scope) of one or more of the receiving sockets.
All sockets in the group matching the given address will receive
a copy of the message.
4: Broadcast. The sender transmits a message using the primtive
send(). All members of the group, irrespective of their member
identity (instance) number receive a copy of the message.
- TIPC broadcast is used for carrying messages in mode 3 or 4 when
this is deemed more efficient, i.e., depending on number of actual
destinations.
- All transmission modes are flow controlled, so that messages never
are dropped or rejected, just like we are used to from connection
oriented communication. A special algorithm guarantees that this is
true even for multipoint-to-point communication, i.e., at occasions
where many source sockets may decide to send simultaneously towards
the same destination socket.
- Sequence order is always guaranteed, even between the different
transmission modes.
- Member join/leave events are received in all other member sockets
in guaranteed order. I.e., a 'JOINED' (an empty message with the OOB
bit set) will always be received before the first data message from
a new member, and a 'LEAVE' (like 'JOINED', but with EOR bit set) will
always arrive after the last data message from a leaving member.
-----
v2: Reordered variable declarations in descending length order, as per
feedback from David Miller. This was done as far as permitted by the
the initialization order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We already have point-to-multipoint flow control within a group. But
we even need the opposite; -a scheme which can handle that potentially
hundreds of sources may try to send messages to the same destination
simultaneously without causing buffer overflow at the recipient. This
commit adds such a mechanism.
The algorithm works as follows:
- When a member detects a new, joining member, it initially set its
state to JOINED and advertises a minimum window to the new member.
This window is chosen so that the new member can send exactly one
maximum sized message, or several smaller ones, to the recipient
before it must stop and wait for an additional advertisement. This
minimum window ADV_IDLE is set to 65 1kB blocks.
- When a member receives the first data message from a JOINED member,
it changes the state of the latter to ACTIVE, and advertises a larger
window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can
continue sending with minimal disturbances to the data flow.
- The active members are kept in a dedicated linked list. Each time a
message is received from an active member, it will be moved to the
tail of that list. This way, we keep a record of which members have
been most (tail) and least (head) recently active.
- There is a maximum number (16) of permitted simultaneous active
senders per receiver. When this limit is reached, the receiver will
not advertise anything immediately to a new sender, but instead put
it in a PENDING state, and add it to a corresponding queue. At the
same time, it will pick the least recently active member, send it an
advertisement RECLAIM message, and set this member to state
RECLAIMING.
- The reclaimee member has to respond with a REMIT message, meaning that
it goes back to a send window of ADV_IDLE, and returns its unused
advertised blocks beyond that value to the reclaiming member.
- When the reclaiming member receives the REMIT message, it unlinks
the reclaimee from its active list, resets its state to JOINED, and
notes that it is now back at ADV_IDLE advertised blocks to that
member. If there are still unread data messages sent out by
reclaimee before the REMIT, the member goes into an intermediate
state REMITTED, where it stays until the said messages have been
consumed.
- The returned advertised blocks can now be re-advertised to the
pending member, which is now set to state ACTIVE and added to
the active member list.
- To be proactive, i.e., to minimize the risk that any member will
end up in the pending queue, we start reclaiming resources already
when the number of active members exceeds 3/4 of the permitted
maximum.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following scenario is possible:
- A user sends a broadcast message, and thereafter immediately leaves
the group.
- The LEAVE message, following a different path than the broadcast,
arrives ahead of the broadcast, and the sending member is removed
from the receiver's list.
- The broadcast message arrives, but is dropped because the sender
now is unknown to the receipient.
We fix this by sequence numbering membership events, just like ordinary
unicast messages. Currently, when a JOIN is sent to a peer, it contains
a synchronization point, - the sequence number of the next sent
broadcast, in order to give the receiver a start synchronization point.
We now let even LEAVE messages contain such an "end synchronization"
point, so that the recipient can delay the removal of the sending member
until it knows that all messages have been received.
The received synchronization points are added as sequence numbers to the
generated membership events, making it possible to handle them almost
the same way as regular unicasts in the receiving filter function. In
particular, a DOWN event with a too high sequence number will be kept
in the reordering queue until the missing broadcast(s) arrive and have
been delivered.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following scenario is possible:
- A user joins a group, and immediately sends out a broadcast message
to its members.
- The broadcast message, following a different data path than the
initial JOIN message sent out during the joining procedure, arrives
to a receiver before the latter..
- The receiver drops the message, since it is not ready to accept any
messages until the JOIN has arrived.
We avoid this by treating group protocol JOIN messages like unicast
messages.
- We let them pass through the recipient's multicast input queue, just
like ordinary unicasts.
- We force the first following broadacst to be sent as replicated
unicast and being acknowledged by the recipient before accepting
any more broadcast transmissions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a mechanism guaranteeing that group unicasts sent out from a
socket are not bypassed by later sent broadcasts from the same socket.
We do this as follows:
- Each time a unicast is sent, we set a the broadcast method for the
socket to "replicast" and "mandatory". This forces the first
subsequent broadcast message to follow the same network and data path
as the preceding unicast to a destination, hence preventing it from
overtaking the latter.
- In order to make the 'same data path' statement above true, we let
group unicasts pass through the multicast link input queue, instead
of as previously through the unicast link input queue.
- In the first broadcast following a unicast, we set a new header flag,
requiring all recipients to immediately acknowledge its reception.
- During the period before all the expected acknowledges are received,
the socket refuses to accept any more broadcast attempts, i.e., by
blocking or returning EAGAIN. This period should typically not be
longer than a few microseconds.
- When all acknowledges have been received, the sending socket will
open up for subsequent broadcasts, this time giving the link layer
freedom to itself select the best transmission method.
- The forced and/or abrupt transmission method changes described above
may lead to broadcasts arriving out of order to the recipients. We
remedy this by introducing code that checks and if necessary
re-orders such messages at the receiving end.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Group unicast messages don't follow the same path as broadcast messages,
and there is a high risk that unicasts sent from a socket might bypass
previously sent broadcasts from the same socket.
We fix this by letting all unicast messages carry the sequence number of
the next sent broadcast from the same node, but without updating this
number at the receiver. This way, a receiver can check and if necessary
re-order such messages before they are added to the socket receive buffer.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previously introduced message transport to all group members is
based on the tipc multicast service, but is logically a broadcast
service within the group, and that is what we call it.
We now add functionality for sending messages to all group members
having a certain identity. Correspondingly, we call this feature 'group
multicast'. The service is using unicast when only one destination is
found, otherwise it will use the bearer broadcast service to transfer
the messages. In the latter case, the receiving members filter arriving
messages by looking at the intended destination instance. If there is
no match, the message will be dropped, while still being considered
received and read as seen by the flow control mechanism.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this commit, we make it possible to send connectionless unicast
messages to any member corresponding to the given member identity,
when there is more than one such member. The sender must use a
TIPC_ADDR_NAME address to achieve this effect.
We also perform load balancing between the destinations, i.e., we
primarily select one which has advertised sufficient send window
to not cause a block/EAGAIN delay, if any. This mechanism is
overlayed on the always present round-robin selection.
Anycast messages are subject to the same start synchronization
and flow control mechanism as group broadcast messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now make it possible to send connectionless unicast messages
within a communication group. To send a message, the sender can use
either a direct port address, aka port identity, or an indirect port
name to be looked up.
This type of messages are subject to the same start synchronization
and flow control mechanism as group broadcast messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We introduce an end-to-end flow control mechanism for group broadcast
messages. This ensures that no messages are ever lost because of
destination receive buffer overflow, with minimal impact on performance.
For now, the algorithm is based on the assumption that there is only one
active transmitter at any moment in time.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like with any other service, group members' availability can be
subscribed for by connecting to be topology server. However, because
the events arrive via a different socket than the member socket, there
is a real risk that membership events my arrive out of synch with the
actual JOIN/LEAVE action. I.e., it is possible to receive the first
messages from a new member before the corresponding JOIN event arrives,
just as it is possible to receive the last messages from a leaving
member after the LEAVE event has already been received.
Since each member socket is internally also subscribing for membership
events, we now fix this problem by passing those events on to the user
via the member socket. We leverage the already present member synch-
ronization protocol to guarantee correct message/event order. An event
is delivered to the user as an empty message where the two source
addresses identify the new/lost member. Furthermore, we set the MSG_OOB
bit in the message flags to mark it as an event. If the event is an
indication about a member loss we also set the MSG_EOR bit, so it can
be distinguished from a member addition event.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With group communication, it becomes important for a message receiver to
identify not only from which socket (identfied by a node:port tuple) the
message was sent, but also the logical identity (type:instance) of the
sending member.
We fix this by adding a second instance of struct sockaddr_tipc to the
source address area when a message is read. The extra address struct
is filled in with data found in the received message header (type,) and
in the local member representation struct (instance.)
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation for introducing flow control for multicast and datagram
messaging we need a more strictly defined framework than we have now. A
socket must be able keep track of exactly how many and which other
sockets it is allowed to communicate with at any moment, and keep the
necessary state for those.
We therefore introduce a new concept we have named Communication Group.
Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
The call takes four parameters: 'type' serves as group identifier,
'instance' serves as an logical member identifier, and 'scope' indicates
the visibility of the group (node/cluster/zone). Finally, 'flags' makes
it possible to set certain properties for the member. For now, there is
only one flag, indicating if the creator of the socket wants to receive
a copy of broadcast or multicast messages it is sending via the socket,
and if wants to be eligible as destination for its own anycasts.
A group is closed, i.e., sockets which have not joined a group will
not be able to send messages to or receive messages from members of
the group, and vice versa.
Any member of a group can send multicast ('group broadcast') messages
to all group members, optionally including itself, using the primitive
send(). The messages are received via the recvmsg() primitive. A socket
can only be member of one group at a time.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We often see a need for a linked list of destination identities,
sometimes containing a port number, sometimes a node identity, and
sometimes both. The currently defined struct u32_list is not generic
enough to cover all cases, so we extend it to contain two u32 integers
and rename it to struct tipc_dest_list.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We see an increasing need to send multiple single-buffer messages
of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes.
Instead of looping over the send queue and sending each buffer
individually, as we do now, we add a new help function
tipc_node_distr_xmit() to do this.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the following commits we will need to handle multiple incoming and
rejected/returned buffers in the function socket.c::filter_rcv().
As a preparation for this, we generalize the function by handling
buffer queues instead of individual buffers. We also introduce a
help function tipc_skb_reject(), and rename filter_rcv() to
tipc_sk_filter_rcv() in line with other functions in socket.c.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the coming commits, functions at the socket level will need the
ability to read the availability status of a given node. We therefore
introduce a new function for this purpose, while renaming the existing
static function currently having the wanted name.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address given to tipc_connect() is not completely sanity checked,
under the assumption that this will be done later in the function
__tipc_sendmsg() when the address is used there.
However, the latter functon will in the next commits serve as caller
to several other send functions, so we want to move the corresponding
sanity check there to the beginning of that function, before we possibly
need to grab the address stored by tipc_connect(). We must therefore
be able to trust that this address already has been thoroughly checked.
We do this in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As preparation for introducing communication groups, we add the ability
to issue topology subscriptions and receive topology events from kernel
space. This will make it possible for group member sockets to keep track
of other group members.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>