[Avila] fatal ethernet error on gw2348-4

devel devel at oberonwireless.com
Wed Oct 18 17:16:42 EDT 2006


Disregard my previous post. It does appear to be working, as I have had no 
problems for a few days now. Thanks.

Travis

----- Original Message ----- 
From: "devel" <devel at oberonwireless.com>
To: "Avila" <avila at lists.unixstudios.net>
Sent: Friday, October 13, 2006 11:03 AM
Subject: Re: [Avila] fatal ethernet error on gw2348-4


>    I have tried this patch, and I can say I haven't seen the error message 
> in about 2 days. However, I am still having a problem with some clients 
> not getting the response back because it still seems at times that the 
> ethernet port is still not receiving.
>
>    Before David Acker had mentioned any of this and found a possible 
> solution I didn't even think that my problem was caused by the ethernet 
> port/driver. My setup is an ethernet bridge between the wired 
> ethernet port and a wifi card (madwifi driver) in Access Point mode 
> (Avila GW2348-4). If I recall correctly, this problem started when we 
> went to the BSP's 2.6 Linux kernel. I have been after madwifi for months, 
> off and on, trying to resolve it thinking the madwifi driver was the 
> culprit.
>
>    Now with David's findings, it certainly makes sense to me that my 
> problem is similar to his in the sense that a wifi client can associate, 
> ping the AP, ping associated clients, but nothing on the other side of the 
> AP. I have ethereal running on a server I am trying to ping from a wifi 
> client. The server sees the ping request and sends its reply. Looks good 
> there. But the client never sees the reply. I have put tcpdump on the AP 
> and it never sees the server's reply either. So my guess is the ethernet 
> port periodically stops receiving packets.
>
>    So at the moment I'm not sure what to try. Maybe play with the source 
> code and look a little closer at what David was working with. If anyone 
> has any thoughts, please let me know. Thanks!
>
>
> Travis
>
>
>
> ----- Original Message ----- 
> From: "David Acker" <dacker at roinet.com>
> To: "Avila" <avila at lists.unixstudios.net>
> Sent: Wednesday, October 11, 2006 9:23 AM
> Subject: Re: [Avila] fatal ethernet error on gw2348-4
>
>
>>I found the issue.  See the attached patch.
>>
>> The failure is due to
>> ixp400_xscale_sw/src/ethAcc/IxEthAccDataPlane.c::ixEthAccPortRxFreeReplenish(...)
>> thinking that the length is too small.  See the following test:
>> if (IX_OSAL_MBUF_MLEN(buffer) < IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN)
>>
>> The length reported by IX_OSAL_MBUF_MLEN(buffer) is 0 and
>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN = 64.
>>
>> ixEthRxFrameProcess calls ixEthAccPortRxFreeReplenish and before the
>> call it tries to reset the length with the following line of code:
>> IX_OSAL_MBUF_MLEN(mbufPtr) = IX_OSAL_MBUF_PKT_LEN(mbufPtr) =
>> IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr);
>>
>> Before this call to adjust the lengths I have seen the lengths reported
>> by IX_OSAL_MBUF_MLEN(mbufPtr) and IX_OSAL_MBUF_PKT_LEN(mbufPtr) be
>> various sizes although sometimes they are below 64 (I have seen as low
>> as 60).  IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr) is reporting 0 which
>> leads to IX_OSAL_MBUF_MLEN(mbufPtr) becoming 0 and the length test 
>> failing.
>>
>> The packets that trigger this have the following flags set:
>> 0x8090
>> IX_ETHACC_NE_LINKMASK=0x01 - IEEE802.3 - Ethernet (Rx) / IEEE802.3 -
>> Ethernet (Tx)
>> IX_ETHACC_NE_FILTERMASK is set
>> IX_ETHACC_NE_NEWSRCMASK is set
>> or
>> 0x8080
>> IX_ETHACC_NE_LINKMASK=0x00 - IEEE802.3 - 8802 (Rx) / IEEE802.3 - 8802 
>> (Tx)
>> IX_ETHACC_NE_FILTERMASK is set
>> IX_ETHACC_NE_NEWSRCMASK is set
>> It is the IX_ETHACC_NE_FILTERMASK that makes the driver try to replenish
>> the packet.  According to the header IX_ETHACC_NE_FILTERMASK means:
>>  * @brief This bit indicates whether a frame has been filtered by the
>>  * Rx service.
>>  *
>>  * This mask applies to @a IX_ETHACC_NE_FLAGS.
>>  * Certain frames, which should normally be fully filtered by the NPE
>>  * due to the destination MAC address being on the same segment as the
>>  * Rx port, are still forwarded to the XScale (although the payload is
>>  * invalid) in order to learn the MAC address of the transmitting
>>  * station, if this is unknown.
>>  * Normally EthAcc will filter and recycle these frames internally and
>>  * no frames with the FILTER bit set will be received by the client.
>>
>> I suspect this is occurring because we are in promiscuous mode.  The
>> reason it doesn't kill your ethernet immediately is because each message
>> is a leak of one buffer and there are RX_MBUF_POOL_SIZE or 80 of them
>> available.  Once you run out, you cannot receive but you can still send.
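
To make the leak described above concrete, here is a minimal, self-contained C model. It is only a sketch and not the Intel access library code: `struct model_mbuf`, `model_replenish` and `frames_until_rx_stalls` are hypothetical names invented for illustration. Only the two constants (a pool of 80 buffers, a minimum length of 64) and the "reset the lengths, then test them" order are taken from the description in this thread.

```c
#include <assert.h>

#define RX_MBUF_POOL_SIZE 80         /* pool size quoted in the post */
#define RXFREE_BUFFER_LENGTH_MIN 64  /* minimum length quoted in the post */

/* Hypothetical stand-in for an mbuf: only the three lengths matter here. */
struct model_mbuf {
    unsigned allocated_len;  /* size the pool was created with */
    unsigned mlen;           /* current buffer length */
    unsigned pkt_len;        /* current packet length */
};

/* Models the length test in the replenish path: 0 = accepted, -1 = rejected. */
static int model_replenish(const struct model_mbuf *buf)
{
    return (buf->mlen < RXFREE_BUFFER_LENGTH_MIN) ? -1 : 0;
}

/*
 * Push up to max_frames filtered frames through the model. Returns the
 * number of the frame on which the last free buffer is lost, or -1 if
 * the pool never runs dry.
 */
static int frames_until_rx_stalls(unsigned pool_buf_size, int max_frames)
{
    int free_buffers = RX_MBUF_POOL_SIZE;

    assert(max_frames >= 0);
    for (int frame = 1; frame <= max_frames; frame++) {
        /* A filtered frame arrives with some short length, e.g. 60. */
        struct model_mbuf buf = { pool_buf_size, 60, 60 };

        /* The driver resets both lengths to the allocated length ... */
        buf.mlen = buf.pkt_len = buf.allocated_len;

        /* ... and a rejected replenish means this buffer is leaked. */
        if (model_replenish(&buf) != 0 && --free_buffers == 0)
            return frame;
    }
    return -1;
}
```

In this model, `frames_until_rx_stalls(0, 1000)` returns 80 (one buffer lost per filtered frame until none are left), while `frames_until_rx_stalls(64, 1000)` returns -1 because every replenish is accepted.
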
>>
>> The bug occurs because the ethernet driver is creating the receive
>> buffer pool with a size of 0.  Sadly the IX_OSAL_MBUF_POOL_INIT function
>> does not check the size passed in even though the replenish code has a
>> minimum size restriction.  I changed the size passed in to
>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN or 64 and now this situation
>> seems to be handled properly.
>> -Ack
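
The missing size check mentioned above could also be approximated on the caller's side. The sketch below is only an assumption about how one might guard the size argument before creating the pool. `rx_pool_buf_size` is a hypothetical helper, not part of the OSAL or the driver, and the constant simply mirrors the value of 64 quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define RXFREE_BUFFER_LENGTH_MIN 64  /* minimum length quoted in the post */

/*
 * Hypothetical helper: round a requested per-buffer size up to the
 * minimum that the replenish path is described as accepting.
 */
static size_t rx_pool_buf_size(size_t requested)
{
    size_t size = (requested < RXFREE_BUFFER_LENGTH_MIN)
                      ? (size_t)RXFREE_BUFFER_LENGTH_MIN
                      : requested;

    /* Postcondition: the result can never fail the minimum-length test. */
    assert(size >= RXFREE_BUFFER_LENGTH_MIN);
    return size;
}
```

Whether to clamp silently like this or to reject a too-small size with an error is a design choice. Clamping keeps the sketch short, but an explicit error would make a misconfigured caller easier to spot.
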
>>
>>
>> David Acker wrote:
>>> The port is not part of a bridge.  It is in promiscuous mode and is
>>> receiving and sending a relatively large amount of traffic.  I have a
>>> userspace program that has a raw socket bound to each port.  I don't
>>> have any netfilter rules in place.
>>>
>>> I have done some more debugging and found some information.  The failure
>>>  is due to
>>> ixp400_xscale_sw/src/ethAcc/IxEthAccDataPlane.c::ixEthAccPortRxFreeReplenish(...)
>>> thinking that the length is too small.  See the following test:
>>> if (IX_OSAL_MBUF_MLEN(buffer) < IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN)
>>>
>>> The length reported by IX_OSAL_MBUF_MLEN(buffer) is 0 and
>>> IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN = 64.
>>>
>>> ixEthRxFrameProcess calls ixEthAccPortRxFreeReplenish and before the
>>> call it tries to reset the length with the following line of code:
>>> IX_OSAL_MBUF_MLEN(mbufPtr) = IX_OSAL_MBUF_PKT_LEN(mbufPtr) =
>>> IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr);
>>>
>>> Before this call to adjust the lengths I have seen the lengths reported
>>> by IX_OSAL_MBUF_MLEN(mbufPtr) and IX_OSAL_MBUF_PKT_LEN(mbufPtr) be
>>> various sizes although sometimes they are below 64 (I have seen as low
>>> as 60).  IX_OSAL_MBUF_ALLOCATED_BUFF_LEN(mbufPtr) is reporting 0 which
>>> leads to IX_OSAL_MBUF_MLEN(mbufPtr) becoming 0 and the length test 
>>> failing.
>>>
>>> I am still looking into when the allocated buffer length got corrupted.
>>>  It could be a locking issue and/or a race condition that only happens
>>> at high speeds.  With any luck I will know more by the end of the day.
>>> -Ack
>>>
>>> Dave G wrote:
>>>> Ack,
>>>>
>>>>
>>>> Is the affected port part of a bridge group? Do you have any netfilter
>>>> rules in place either using bridge nf or not?
>>>>
>>>>
>>>> -Dave
>>>>
>>>>
>>>>> Hello folks,
>>>>> We are running the .06 development kit on our gw2348-4 board.  If we
>>>>> run it long enough with even small amounts of ethernet traffic we
>>>>> eventually get the following error:
>>>>>
>>>>> [fatal] ixEthRxFrameProcess: Failed to replenish with filtered frame
>>>>> on port 0
>>>>>
>>>>> After this error the port usually does not receive at all but can
>>>>> send. For example, when I ping an ethernet client that has ethereal
>>>>> on, I can see the ARP requests, and I can see the client send the ARP
>>>>> response, but the board's arp tables do not show ever getting the
>>>>> response.  The port stays in this state through unplug/replug of the
>>>>> cable.  Sometimes it stays in this state through a reboot command.
>>>>> It always clears up on a full power cycle.  The other ethernet port
>>>>> will work when one port is in this state.
>>>>>
>>>>> Does anyone know what causes this error and how to fix it?
>>>>>
>>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: avila-unsubscribe at lists.unixstudios.net
>>> For additional commands, e-mail: avila-help at lists.unixstudios.net
>>>
>>
>>
>
>
> --------------------------------------------------------------------------------
>
>
>> --- snapgear/modules/ixp425/net-2.0/ixp400_eth.c.orig 2006-10-10 16:54:17.000000000 -0400
>> +++ snapgear/modules/ixp425/net-2.0/ixp400_eth.c 2006-10-10 14:07:54.000000000 -0400
>> @@ -3198,7 +3198,7 @@ static int __devinit dev_eth_probe(struc
>>     TRACE;
>>
>>     /* initialize RX pool */
>> -    priv->rx_pool = IX_OSAL_MBUF_POOL_INIT(RX_MBUF_POOL_SIZE, 0,
>> +    priv->rx_pool = IX_OSAL_MBUF_POOL_INIT(RX_MBUF_POOL_SIZE, IX_ETHNPE_ACC_RXFREE_BUFFER_LENGTH_MIN,
>>             "IXP400 Ethernet driver Rx Pool");
>>     if(priv->rx_pool == NULL)
>>     {
>>
>>
>
>
> --------------------------------------------------------------------------------
>
>