[Avila] fatal ethernet error on gw2348-4
devel
devel at oberonwireless.com
Thu Oct 19 10:06:19 EDT 2006
ARGH!!! It's back again. I'm not seeing the fatal error message on the
console, but from time to time the eth0 port still stops receiving from
some computers. I may increase the buffer number from 64 to something else,
128? I'm not sure what that could possibly do to the NPE driver. Any
thoughts?
Travis
----- Original Message -----
From: "devel" <devel at oberonwireless.com>
To: "Avila" <avila at lists.unixstudios.net>
Sent: Wednesday, October 18, 2006 5:16 PM
Subject: Re: [Avila] fatal ethernet error on gw2348-4
> Disregard my previous post. It does appear to be working, as I have had no
> problems for a few days now. Thanks.
>
> Travis
>
> ----- Original Message -----
> From: "devel" <devel at oberonwireless.com>
> To: "Avila" <avila at lists.unixstudios.net>
> Sent: Friday, October 13, 2006 11:03 AM
> Subject: Re: [Avila] fatal ethernet error on gw2348-4
>
>
>> I have tried this patch, and I can say I haven't seen the error
>> message in about 2 days. However, I am still having a problem with some
>> clients not getting the response back because it still seems at times
>> that the ethernet port is still not receiving.
>>
>> Before David Acker had mentioned any of this and found a possible
>> solution I didn't even think that my problem was caused by the ethernet
>> port/driver. My problem is I have an ethernet bridge with the wired
>> ethernet port and a wifi card (madwifi driver) and in Access Point mode
>> (Avila GW2348-4). And if I recall correctly, this problem started when we
>> went to the BSP's 2.6 Linux kernel. I have been after madwifi for months,
>> off and on, trying to resolve it thinking the madwifi driver was the
>> culprit.
>>
>> Now with David's findings, it certainly makes sense to me that my
>> problem is similar to his in the sense that a wifi client can associate,
>> ping the AP, ping associated clients, but nothing on the other side of
>> the AP. I have ethereal running on a server I am trying to ping from a
>> wifi client. The server sees the ping request and sends its reply. Looks
>> good there. But the client never sees the reply. I have put tcpdump on
>> the AP and it never sees the server's reply either. So my guess is the
>> ethernet port periodically stops receiving packets.
>>
>> So at the moment I'm not sure what to try. Maybe play with the source
>> code and look a little closer at what David was working with. If anyone
>> has any thoughts, please let me know. Thanks!
>>
>>
>> Travis
>>
>>
>>
>> ----- Original Message -----
>> From: "David Acker" <dacker at roinet.com>
>> To: "Avila" <avila at lists.unixstudios.net>
>> Sent: Wednesday, October 11, 2006 9:23 AM
>> Subject: Re: [Avila] fatal ethernet error on gw2348-4
>>
>>
>>> I found the issue. See the attached patch.
>>>
>>> The failure is due to
>>> ixp400_xscale_sw/src/ethAcc/IxEthAccDataPlane.c::ixEthAccPortRxFreeReplenish(...)
>>> thinking that the length is too small. See the following test:
>>> if (IX_OSAL_MBUF_MLEN(buffer) < IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN)
>>>
>>> The length reported by IX_OSAL_MBUF_MLEN(buffer) is 0 and
>>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN = 64.
>>>
>>> ixEthRxFrameProcess calls ixEthAccPortRxFreeReplenish and before the
>>> call it tries to reset the length with the following line of code:
>>> IX_OSAL_MBUF_MLEN(mbufPtr) = IX_OSAL_MBUF_PKT_LEN(mbufPtr) =
>>> IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr);
>>>
>>> Before this call to adjust the lengths I have seen the lengths reported
>>> by IX_OSAL_MBUF_MLEN(mbufPtr) and IX_OSAL_MBUF_PKT_LEN(mbufPtr) be
>>> various sizes although sometimes they are below 64 (I have seen as low
>>> as 60). IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr) is reporting 0 which
>>> leads to IX_OSAL_MBUF_MLEN(mbufPtr) becoming 0 and the length test
>>> failing.
>>>
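To make the failing length test concrete, here is a minimal C sketch of the sequence described above. It is not the Intel access-library code: the `sketch_mbuf` struct, the `sketch_*` functions and the `SKETCH_RXFREE_LEN_MIN` macro are hypothetical stand-ins for the IX_OSAL_MBUF macros and IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN, and only the value 64 and the order of operations are taken from this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's mbuf; not the real IX_OSAL_MBUF layout. */
struct sketch_mbuf {
    size_t m_len;         /* models IX_OSAL_MBUF_MLEN */
    size_t pkt_len;       /* models IX_OSAL_MBUF_PKT_LEN */
    size_t allocated_len; /* models IX_OSAL_MBUF_ALLOCATED_BUFF_LEN */
};

/* Value reported in the thread for IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN. */
#define SKETCH_RXFREE_LEN_MIN 64u

/* Models the reset done before replenishing: both lengths are
 * overwritten with the allocated buffer length. */
static void sketch_reset_lengths(struct sketch_mbuf *m)
{
    m->m_len = m->pkt_len = m->allocated_len;
}

/* Models the length test in the replenish path.
 * Returns 0 if the buffer would be accepted, -1 if it is rejected. */
static int sketch_replenish(const struct sketch_mbuf *m)
{
    if (m->m_len < SKETCH_RXFREE_LEN_MIN)
        return -1;
    return 0;
}
```

With an allocated length of 0, the reset forces the mbuf length to 0 regardless of what it held before (60, say), so the test rejects the buffer. With an allocated length of 64 the same buffer is accepted.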
>>> The packets that trigger this have the following flags set:
>>> 0x8090
>>>   IX_ETHACC_NE_LINKMASK=0x01 - IEEE802.3 - Ethernet (Rx) / IEEE802.3 - Ethernet (Tx)
>>>   IX_ETHACC_NE_FILTERMASK is set
>>>   IX_ETHACC_NE_NEWSRCMASK is set
>>> or
>>> 0x8080
>>>   IX_ETHACC_NE_LINKMASK=0x00 - IEEE802.3 - 8802 (Rx) / IEEE802.3 - 8802 (Tx)
>>>   IX_ETHACC_NE_FILTERMASK is set
>>>   IX_ETHACC_NE_NEWSRCMASK is set
>>> It is the IX_ETHACC_NE_FILTERMASK that makes the driver try to replenish
>>> the packet. According to the header IX_ETHACC_NE_FILTERMASK means:
>>> * @brief This bit indicates whether a frame has been filtered by the Rx service.
>>> *
>>> * This mask applies to @a IX_ETHACC_NE_FLAGS.
>>> * Certain frames, which should normally be fully filtered by the NPE due to
>>> * the destination MAC address being on the same segment as the Rx port, are
>>> * still forwarded to the XScale (although the payload is invalid) in order
>>> * to learn the MAC address of the transmitting station, if this is unknown.
>>> * Normally EthAcc will filter and recycle these frames internally and no
>>> * frames with the FILTER bit set will be received by the client.
>>>
>>> I suspect this is occurring because we are in promiscuous mode. The
>>> reason it doesn't kill your ethernet immediately is because each message
>>> is a leak of one buffer and there are RX_MBUF_POOL_SIZE (80) of them
>>> available. Once you run out, you cannot receive but you can still send.
>>>
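The slow failure described above can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: the `sketch_pool` struct and the `sketch_*` functions are hypothetical, every frame either recycles its buffer immediately or leaks it, and the pool size of 80 is the RX_MBUF_POOL_SIZE value mentioned in this thread.

```c
/* RX_MBUF_POOL_SIZE as reported in the thread. */
#define SKETCH_RX_POOL_SIZE 80

/* Hypothetical pool that only tracks how many receive buffers are free. */
struct sketch_pool {
    int free_buffers;
};

static void sketch_pool_init(struct sketch_pool *p)
{
    p->free_buffers = SKETCH_RX_POOL_SIZE;
}

/* Receive one frame. Returns 1 if a buffer was available, 0 if the
 * pool is exhausted. A filtered frame whose replenish fails never
 * gives its buffer back, which models the leak. */
static int sketch_receive(struct sketch_pool *p, int filtered, int replenish_ok)
{
    if (p->free_buffers == 0)
        return 0;
    p->free_buffers--;              /* buffer consumed by the incoming frame */
    if (!filtered || replenish_ok)
        p->free_buffers++;          /* buffer recycled back into the pool */
    return 1;
}

/* Count how many filtered frames are received before receive stalls,
 * giving up after max_frames. */
static int sketch_frames_until_stall(int replenish_ok, int max_frames)
{
    struct sketch_pool p;
    int n = 0;

    sketch_pool_init(&p);
    while (n < max_frames && sketch_receive(&p, 1, replenish_ok))
        n++;
    return n;
}
```

In this model, `sketch_frames_until_stall(0, 1000)` stops after exactly one pool's worth of filtered frames, while `sketch_frames_until_stall(1, 1000)` runs to the limit. That matches the observation that the port works for a while and then stops receiving.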
>>> The bug occurs because the ethernet driver is creating the receive
>>> buffer pool with a size of 0. Sadly the IX_OSAL_MBUF_POOL_INIT function
>>> does not check the size passed in even though the replenish code has a
>>> minimum size restriction. I changed the size passed in to
>>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN or 64 and now this situation
>>> seems to be handled properly.
>>> -Ack
>>>
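The missing size check mentioned above could be sketched in C as follows. This is an illustration only, not a proposed change to IX_OSAL_MBUF_POOL_INIT: the `sketch_pool_cfg` struct, `sketch_pool_cfg_checked` and `SKETCH_MIN_RX_BUF_LEN` are hypothetical names, and the minimum of 64 is the IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN value quoted in this thread.

```c
#include <stddef.h>

/* Value reported in the thread for IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN. */
#define SKETCH_MIN_RX_BUF_LEN 64u

/* Hypothetical description of a receive pool: buffer count and per-buffer length. */
struct sketch_pool_cfg {
    unsigned count;
    size_t buf_len;
};

/* Validate pool parameters before any pool is created.
 * Returns 0 and fills *cfg on success, or -1 if the per-buffer length
 * is below the minimum the replenish path will later insist on. */
static int sketch_pool_cfg_checked(struct sketch_pool_cfg *cfg,
                                   unsigned count, size_t buf_len)
{
    if (cfg == NULL || count == 0 || buf_len < SKETCH_MIN_RX_BUF_LEN)
        return -1;
    cfg->count = count;
    cfg->buf_len = buf_len;
    return 0;
}
```

A check like this would reject the original call (80 buffers of size 0) at probe time, rather than letting the mismatch show up later as leaked buffers.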
>>>
>>> David Acker wrote:
>>>> The port is not part of a bridge. It is in promiscuous mode and is
>>>> receiving and sending a relatively large amount of traffic. I have a
>>>> userspace program that has a raw socket bound to each port. I don't
>>>> have any netfilter rules in place.
>>>>
>>>> I have done some more debugging and found some information. The
>>>> failure
>>>> is due to
>>>> ixp400_xscale_sw/src/ethAcc/IxEthAccDataPlane.c::ixEthAccPortRxFreeReplenish(...)
>>>> thinking that the length is too small. See the following test:
>>>> if (IX_OSAL_MBUF_MLEN(buffer) < IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN)
>>>>
>>>> The length reported by IX_OSAL_MBUF_MLEN(buffer) is 0 and
>>>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN = 64.
>>>>
>>>> ixEthRxFrameProcess calls ixEthAccPortRxFreeReplenish and before the
>>>> call it tries to reset the length with the following line of code:
>>>> IX_OSAL_MBUF_MLEN(mbufPtr) = IX_OSAL_MBUF_PKT_LEN(mbufPtr) =
>>>> IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr);
>>>>
>>>> Before this call to adjust the lengths I have seen the lengths reported
>>>> by IX_OSAL_MBUF_MLEN(mbufPtr) and IX_OSAL_MBUF_PKT_LEN(mbufPtr) be
>>>> various sizes although sometimes they are below 64 (I have seen as low
>>>> as 60). IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr) is reporting 0 which
>>>> leads to IX_OSAL_MBUF_MLEN(mbufPtr) becoming 0 and the length test
>>>> failing.
>>>>
>>>> I am still looking into when the allocated buffer length got corrupted.
>>>> It could be a locking issue and/or a race condition that only happens
>>>> at high speeds. With any luck I will know more by the end of the day.
>>>> -Ack
>>>>
>>>> Dave G wrote:
>>>>> Ack,
>>>>>
>>>>>
>>>>> Is the affected port part of a bridge group? Do you have any netfilter
>>>>> rules in place either using bridge nf or not?
>>>>>
>>>>>
>>>>> -Dave
>>>>>
>>>>>
>>>>>> Hello folks,
>>>>>> We are running the .06 development kit on our gw2348-4 board. If we
>>>>>> run it long enough with even small amounts of ethernet traffic we
>>>>>> eventually get the following error:
>>>>>>
>>>>>> [fatal] ixEthRxFrameProcess: Failed to replenish with filtered frame
>>>>>> on port 0
>>>>>>
>>>>>> After this error the port usually does not receive at all but can
>>>>>> send. For example, when I ping an ethernet client that has ethereal
>>>>>> on, I can see the ARP requests, and I can see the client send the ARP
>>>>>> response, but the board's arp tables do not show ever getting the
>>>>>> response. The port stays in this state through unplug/replug of the
>>>>>> cable. Sometimes it stays in this state through a reboot command.
>>>>>> It always clears up on a full power cycle. The other ethernet port
>>>>>> will work when one port is in this state.
>>>>>>
>>>>>> Does anyone know what causes this error and how to fix it?
>>>>>>
>>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: avila-unsubscribe at lists.unixstudios.net
>>>> For additional commands, e-mail: avila-help at lists.unixstudios.net
>>>>
>>>
>>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>
>>> --- snapgear/modules/ixp425/net-2.0/ixp400_eth.c.orig 2006-10-10 16:54:17.000000000 -0400
>>> +++ snapgear/modules/ixp425/net-2.0/ixp400_eth.c 2006-10-10 14:07:54.000000000 -0400
>>> @@ -3198,7 +3198,7 @@ static int __devinit dev_eth_probe(struc
>>> TRACE;
>>>
>>> /* initialize RX pool */
>>> - priv->rx_pool = IX_OSAL_MBUF_POOL_INIT(RX_MBUF_POOL_SIZE, 0,
>>> + priv->rx_pool = IX_OSAL_MBUF_POOL_INIT(RX_MBUF_POOL_SIZE, IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN,
>>> "IXP400 Ethernet driver Rx Pool");
>>> if(priv->rx_pool == NULL)
>>> {
>>>
>>>
>>
>>
>> --------------------------------------------------------------------------------
>>
>>