Discussion:
Why "No buffer space available"?
Medialy
2009-12-29 02:49:56 UTC
Hi,
I have written a program to log NAT behavior. The program works
well when traffic is low, but when the traffic reaches 1Gb the program
always fails.
Following the previous discussions about this problem, I even set
the recv buffer size to 50MB, and the error still occurs.
For every callback, the program formats the data and then puts it into
the queue directly. Formatting the data takes less than 1 second for 0.65
million records.
Errors always occur when there are fewer than 10 log records.
Can anyone help?
Thanks.

Setting:
    Redhat Enterprise Linux 5
    libnetfilter_conntrack-0.0.100
    libnfnetlink-1.0.0
    recv buffer size: 50MB
    nfct_open(CONNTRACK, NF_NETLINK_CONNTRACK_NEW|NF_NETLINK_CONNTRACK_DESTROY)
    1Gb nat traffic, 0.65 million records per minute
    circular queue size: 1 million

Error:
    nfct_catch error: No buffer space available

Program Structure:
    Callback:
        lock
        if log number > MAX_LOG_NUM:
            discard
        else:
            put log in circular queue
            log number += 1
        unlock

    Thread 2:
        lock
        if log number > 0:
            get log number
        unlock
        process log data in circular queue
        lock
        log number = log number - log number processed
        unlock
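
The structure above can be sketched in C roughly as follows. This is only an illustration under assumptions, not the poster's actual program: `struct log_rec`, `queue_put()`, `queue_take()` and the small `MAX_LOG_NUM` are hypothetical names and values. One deliberate difference from the pseudocode: the consumer copies records out while holding the lock, so slow work such as formatting or disk writes runs with the lock released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOG_NUM 1024            /* hypothetical; the post uses 1 million */

struct log_rec { char data[64]; };  /* hypothetical fixed-size record */

struct log_queue {
    pthread_mutex_t lock;
    size_t head;                    /* index of the oldest record */
    size_t count;                   /* "log number" in the pseudocode */
    unsigned long dropped;          /* records discarded because the queue was full */
    struct log_rec slots[MAX_LOG_NUM];
};

static void queue_init(struct log_queue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
}

/* Callback side: store one record, or discard it when the queue is full.
 * Returns 1 if the record was stored, 0 if it was discarded. */
static int queue_put(struct log_queue *q, const struct log_rec *r)
{
    int stored = 0;

    assert(q != NULL && r != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < MAX_LOG_NUM) {
        q->slots[(q->head + q->count) % MAX_LOG_NUM] = *r;
        q->count++;
        stored = 1;
    } else {
        q->dropped++;
    }
    pthread_mutex_unlock(&q->lock);
    return stored;
}

/* Consumer side: copy out up to max records and release their slots.
 * Returns the number of records copied into out. */
static size_t queue_take(struct log_queue *q, struct log_rec *out, size_t max)
{
    size_t n, i;

    pthread_mutex_lock(&q->lock);
    n = q->count < max ? q->count : max;
    for (i = 0; i < n; i++)
        out[i] = q->slots[(q->head + i) % MAX_LOG_NUM];
    q->head = (q->head + n) % MAX_LOG_NUM;
    q->count -= n;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

The callback holds the lock only for one struct copy, which keeps the time spent per Netlink message short.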
--
To unsubscribe from this list: send the line "unsubscribe netfilter" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso
2009-12-30 12:10:24 UTC
Post by Medialy
> Hi,
> I have written a program to log NAT behavior. The program works
> well when traffic is low, but when the traffic reaches 1Gb the program
> always fails.
> Following the previous discussions about this problem, I even set
> the recv buffer size to 50MB, and the error still occurs.
Increasing the buffer size will not solve the problem; it only
delays the ENOBUFS error. There are several reasons why you may hit ENOBUFS:

a) your program is too slow to handle the Netlink messages that it
receives from the kernel at a given rate. This is easier to trigger if
the handling that you perform on every message takes too long.
b) the queue size is too small, but this does not seem to be your case.

ENOBUFS basically means that the kernel had to drop Netlink messages
because your user-space program did not drain the socket receive buffer
fast enough.
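
As a rough sketch of what this means for a receive loop: ENOBUFS tells the program that events were lost, not that the socket is broken, so a listener can count the loss and keep reading. The code below uses plain `recv()` instead of `nfct_catch()` to stay self-contained; `classify_recv_errno()`, `event_loop()` and the `handle` callback are hypothetical names, and a blocking socket is assumed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

enum recv_action { RECV_RETRY, RECV_EVENTS_LOST, RECV_FATAL };

/* Decide what to do with an errno value returned by a failed receive. */
static enum recv_action classify_recv_errno(int err)
{
    switch (err) {
    case EINTR:
        return RECV_RETRY;          /* interrupted by a signal */
    case ENOBUFS:
        return RECV_EVENTS_LOST;    /* kernel dropped messages for this socket */
    default:
        return RECV_FATAL;
    }
}

/* Read from fd until EOF or a fatal error. Each ENOBUFS increments *lost
 * and the loop continues. Returns 0 on EOF, -1 on a fatal error. */
static int event_loop(int fd, void (*handle)(const char *, size_t),
                      unsigned long *lost)
{
    char buf[8192];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);

        if (n > 0) {
            handle(buf, (size_t)n); /* keep this short: queue, do not format */
            continue;
        }
        if (n == 0)
            return 0;
        switch (classify_recv_errno(errno)) {
        case RECV_RETRY:
            break;
        case RECV_EVENTS_LOST:
            (*lost)++;
            break;
        case RECV_FATAL:
            return -1;
        }
    }
}
```

With libnetfilter_conntrack the same check would presumably be applied to `errno` after `nfct_catch()` returns -1, which is where the "No buffer space available" message in the report comes from.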
Post by Medialy
> For every callback, the program formats the data and then puts it into
> the queue directly. Formatting the data takes less than 1 second for 0.65
> million records.
> Errors always occur when there are fewer than 10 log records.
I don't understand what you mean here.

BTW, if you use a recent Linux kernel (>= 2.6.30), you can set these two
socket options to avoid getting the ENOBUFS error and to try to improve
ctnetlink reliability.

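int on = 1;

/* Report broadcast delivery failures to the sender. Mainline
 * <linux/netlink.h> names this option NETLINK_BROADCAST_ERROR; the
 * original post called it NETLINK_BROADCAST_SEND_ERROR. */
setsockopt(nfct_fd(h), SOL_NETLINK,
           NETLINK_BROADCAST_ERROR, &on, sizeof(int));

/* Do not deliver ENOBUFS errors to this listener. */
setsockopt(nfct_fd(h), SOL_NETLINK,
           NETLINK_NO_ENOBUFS, &on, sizeof(int));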
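
A slightly fuller sketch of the same idea, with error checking, might look like the following. `enable_reliable_events()` is a hypothetical helper name. The fallback definition of `SOL_NETLINK` is an assumption for older C libraries that do not export it. Mainline `<linux/netlink.h>` exposes the first option as `NETLINK_BROADCAST_ERROR`, so that name is used here.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* value from the kernel headers; some libcs omit it */
#endif

/* Enable both Netlink socket options on fd.
 * Returns 0 on success, -1 with errno set by setsockopt() on failure. */
static int enable_reliable_events(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_NETLINK, NETLINK_BROADCAST_ERROR,
                   &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS,
                   &on, sizeof(on)) < 0)
        return -1;
    return 0;
}
```

In the poster's program the call would be `enable_reliable_events(nfct_fd(h))` right after `nfct_open()`. On kernels older than 2.6.30 the calls are expected to fail, so the return value is worth checking.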
Pablo Neira Ayuso
2009-12-31 11:31:09 UTC
Post by Medialy
> Problem solved. Thanks.
> BTW, sometimes the program stops in nfct_close() and never returns!
I don't have an explanation for that, but it should not happen.
Post by Medialy
> Reason: the system was overloaded because of the storage writes. The
> program (2 threads) was set to use the last CPU. When the traffic was
> heavy, most of the computing power of that CPU was consumed by the
> thread that wrote Netlink messages to storage.
Good analysis. It is a good idea to put the thread that digests the
Netlink messages on a spare CPU. That reduces the chances of hitting ENOBUFS.

I forgot to say it, but reducing the nice() value also helps to avoid ENOBUFS.
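
Both suggestions can be sketched with standard Linux calls. This is an illustration only: `pin_to_cpu()` and `set_nice()` are hypothetical helper names, and which CPU counts as "spare" depends on the machine. `sched_setaffinity()` with pid 0 applies to the calling thread, so the Netlink reader and the storage writer can each call it with a different CPU number.

```c
#define _GNU_SOURCE          /* needed for CPU_SET and sched_setaffinity */
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Set the nice value of the calling process. Lower values mean higher
 * priority; negative values normally require privileges (CAP_SYS_NICE).
 * Returns 0 on success, -1 with errno set on failure. */
static int set_nice(int value)
{
    return setpriority(PRIO_PROCESS, 0, value);
}
```

For example, the thread that reads Netlink messages could call `pin_to_cpu(1)` while the thread that writes to storage calls `pin_to_cpu(0)`, so heavy disk activity no longer starves the reader.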