I am using the SPI interface of a SUB-20 board as a demo controller for an evaluation board for our company's product. I developed the application (a GUI) in C using GTK and libusb-1.0.6. The same GUI works under Linux and Windows. When I upgraded my Linux system from Hardy (8.04) to Jaunty (9.04), it stopped working reliably. The application comes up, but with repeated SPI transfers I get a segmentation violation associated with the SPI transfer call.
I re-downloaded the libusb-1.0.6 and recompiled it and installed it on the upgraded machine. I believe I have isolated the problem to this library. My system can dual boot between the old and new version of Linux, and on the new version, whenever I execute a function that does multiple SPI transfers, after a short time, I get a segmentation violation and the program crashes.
Is there anyone out there who has successfully used the sub-20 on a Linux Jaunty 9.04 system? Also, is there a different version of the libusb-1.0.6 or libusb-1.0.7 that I should try? (I have tried both).
Hi,
To try to help you we need additional information:
1. So does it work on Jaunty with libusb-1.0.6?
2. Do you have a specific API call it fails on? Can you send us a piece of code around it?
3. You can turn on tracing with export SUB_DEBUG=10
Please send us trace before segfault
4. And finally you can run it under gdb to trace back failure.
It does NOT work with libusb1.0.6 or 1.0.5, or 1.0.7.
Do you have a specific API call it fails on? Can you send us a piece of code around it?
First of all, it happens when I am doing a tight loop of repetition. I have seen it occur on 3 API calls:
1:
if( sub_spi_config( dev,
SPI_ENABLE | SPI_CPOL_RISE | SPI_SMPL_SETUP |
SPI_MSB_FIRST | spi_clk_fq, 0 ) != 0 )
2:
ret = sub_gpio_write( dev, VSYNC, &tmp, VSYNC ) ;
ret = sub_gpio_write( dev, 0, &tmp, VSYNC ) ;
3:
int SUB20_SPI_TRANSFER( /* sub_device dev, */ char *bf_o, char *bf_i, int len, int ss )
{
    int ret ;
    if( len < 50 )
    {
        ret = sub_spi_transfer( dev, bf_o, bf_i, len, SS_CONF(ss, SS_LO) ) ;
    }
    else
    {
        ret = sub_spi_transfer( dev, bf_o, bf_i, len - 40, SS_CONF(ss, SS_L) ) ;
        ret = sub_spi_transfer( dev, bf_o + 40, bf_i + 40, len - 40, SS_CONF(ss, SS_LO) ) ;
    }
    return ret ;
}
In 3 above, it can get caught on any one of the three sub_spi_transfer calls.
3. You can turn on tracing with
export SUB_DEBUG=10
Please send us trace before segfault
I am sorry, I do not know how to do the export SUB_DEBUG=10 call. However, I have done a strace and ltrace on when the failure occurs.
By the way, the ltrace is quite long, so I have only included the last 50 lines or so.
Using GDB, I have run it and looked at the stack frame from where my code calls are made, and there do not appear to be any undefined or out of range settings.
Unfortunately, the code is quite large and complicated. However, it is not secret, so if you would like, I can zip it up and send it to you. It is a pretty large GUI that we use to demo an IC to customers. It works flawlessly with Windows and Ubuntu 8.04.
I hope this information is helpful. I look forward to hearing from you.
Thanks and very best regards,
John
clock_gettime(CLOCK_MONOTONIC, {23939, 911313219}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
It looks a little different each time as it can fail on any of my BULK_OUT or IN functions; but they all seem to have completed. I never see one where the data in the transfer doesn't make sense.
Hi,
I think the best we can do is to try to reduce your code to something minimal that can still generate a segfault. Please try to extract only the problematic portion.
In this case we could try to run it on our side and check.
Another option is to try our sub_app in a loop. It has a repeat option, -r. Please take a look at it and try to run it. Tell us if it fails, and if so we will do the same on our side. The problem can be related to libusb, the kernel, or even the specific PC USB host controller implementation. So it is good to get the problem localized in your environment and after that check it more deeply on other systems.
Hello and thank you for your comments,
I believe you are right on track. I spent several hours debugging with gdb and it appears that there is some interaction with the ITIMER_REAL and the USB libusb functions. I can run the operation with ITIMER_VIRTUAL just fine with only a slight cost in timing accuracy, but when ITIMER_REAL is used, the segmentation violation occurs.
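A possible reading of this observation, offered as an assumption rather than something confirmed in this thread: ITIMER_VIRTUAL only counts down while the process is executing in user mode, so its signal cannot arrive while the process is blocked inside a system call, whereas ITIMER_REAL runs on wall-clock time and can interrupt a blocking call. The sketch below uses nanosleep() as a stand-in for a blocking USB call (it does not touch libusb or libsub); the helper name ticks_while_blocked is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t ticks = 0;

/* Signal handler: only counts how many timer signals were delivered. */
static void on_tick(int sig) { (void)sig; ticks++; }

/* Arms timer `which` (ITIMER_REAL or ITIMER_VIRTUAL) with a 5 ms period,
 * blocks for about 100 ms in nanosleep(), then disarms the timer.
 * Returns the number of signals delivered while blocked; *interrupted
 * receives how many times nanosleep() failed with EINTR. */
static int ticks_while_blocked(int which, int signum, int *interrupted)
{
    struct sigaction sa, old_sa;
    struct itimerval tv, off;
    struct timespec req = { 0, 100 * 1000 * 1000 }, rem;

    assert(interrupted != NULL);

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_tick;          /* sa_flags = 0, as in the test program */
    sigaction(signum, &sa, &old_sa);

    ticks = 0;
    *interrupted = 0;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = 5000;
    tv.it_value = tv.it_interval;
    setitimer(which, &tv, NULL);

    /* Resume the sleep with the remaining time after each interruption. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR) {
        (*interrupted)++;
        req = rem;
    }

    memset(&off, 0, sizeof off);
    setitimer(which, &off, NULL);     /* disarm before restoring the handler */
    sigaction(signum, &old_sa, NULL);
    return ticks;
}
```

Calling ticks_while_blocked(ITIMER_REAL, SIGALRM, &n) should report a nonzero tick count and a nonzero n, while the ITIMER_VIRTUAL / SIGVTALRM variant should report zero for both, because the process consumes almost no CPU time while it sleeps. If that is what happens around the USB calls, the virtual timer may simply never interrupt them rather than being inherently compatible.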
I had originally thought it might be an interaction between gtk, glade, and libusb; but now that I understand that it is only between the timer and libusb, I will see if I can put together a simple application that duplicates the problem.
I had also thought that it might be kernel related; however I tested the code out on several different machines running Jaunty (9.04) and the failure was consistent with all of them. It could in fact still be something related to the kernel; but if I compile a special kernel that works, it would only mask the problem since I don't want every machine that runs the app to have to have a special kernel. So for now I am going with the virtual timer.
In the meantime, I will put together a small app that duplicates the problem without all the baggage of my app. When I have it set up and able to duplicate the failure, I will send it to you.
Also, running in repeat mode does not cause the failure unless it is being operated with the itimer using the setitimer function.
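Another workaround that might be worth trying if ITIMER_REAL accuracy is needed (an untested assumption, not something suggested in this thread): block SIGALRM for the duration of each USB call so the handler only runs between transfers. The sketch below shows the idea with sigprocmask(); fake_transfer is a hypothetical stand-in for a call such as sub_spi_transfer, and call_with_signal_blocked and demo_deferred_alarm are illustrative names. A multithreaded program would use pthread_sigmask() instead.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t alarms = 0;

/* Signal handler: counts delivered SIGALRMs. */
static void on_alarm(int sig) { (void)sig; alarms++; }

typedef int (*io_fn)(void *ctx);

/* Runs fn(ctx) with `signum` blocked, then restores the previous mask.
 * A signal raised meanwhile stays pending and is delivered on unblock. */
static int call_with_signal_blocked(int signum, io_fn fn, void *ctx)
{
    sigset_t block, old;
    int ret;

    assert(fn != NULL);
    sigemptyset(&block);
    sigaddset(&block, signum);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    ret = fn(ctx);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ret;
}

/* Hypothetical stand-in for a USB transfer: a SIGALRM "arrives" in the
 * middle of it, and it records how many alarms the handler has seen. */
static int fake_transfer(void *ctx)
{
    raise(SIGALRM);
    *(int *)ctx = alarms;
    return 0;
}

/* Returns the alarm count after the protected call; *seen_inside gets
 * the count observed while the signal was still blocked. */
static int demo_deferred_alarm(int *seen_inside)
{
    struct sigaction sa, old_sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, &old_sa);

    alarms = 0;
    call_with_signal_blocked(SIGALRM, fake_transfer, seen_inside);

    sigaction(SIGALRM, &old_sa, NULL);
    return alarms;
}
```

In the demo the handler does not run inside the protected call and runs once right after the mask is restored, so no timer tick is lost, only delayed until the transfer returns.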
I have placed a short program below that will duplicate the problem. I have placed a define statement that can be used to easily switch the program between use of the real time and virtual timers. When using the real time timer, the segmentation violation occurs. When using the virtual one, it works fine.
You can observe the timing by monitoring the GPIO 13 line.
There are two files. They are SPI_IO.h (very short) and rt_op.c (about 100 lines).
In normal operation, this will write successively to the GPIO and SPI bus for 20,000 cycles.
Each cycle is 5000 usec, so it goes for about 100 seconds. This is usually long enough to catch a segmentation violation.
Sometimes it occurs right away, and other times it takes 10 to 20 seconds.
Again, operation is reliable with the virtual timer (comment out the "define RT" line).
It seems from this that something in libusb (or perhaps libsub?) is incompatible with something to do with ITIMER_REAL.
Anyway, I hope this test case helps. As I mentioned before, I have a solution (use the virtual timer), but I would still be interested to know why this happens.
By the way, I have also verified that the real time timer works fine when not making the sub calls.
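// NOTE: the #include lines did not survive in the post. The headers below
// are an assumption about what this listing needs.
#include <stdio.h>
#include <signal.h>
#include <sys/time.h>
#include <libsub.h>
#include "SPI_IO.h"

#define RT // If this is defined, you can duplicate the segmentation violation...
#ifdef RT
#define ITIMER_NUM ITIMER_REAL
#define SIG_NUM SIGALRM
#else
#define ITIMER_NUM ITIMER_VIRTUAL
#define SIG_NUM SIGVTALRM
#endif

sub_device dev ;
volatile sig_atomic_t soft_vsync = 0 ; // Written from the signal handler
int soft_vsync_on = 1 ;
char bf_o[1024], bf_i[1024] ;
struct itimerval value ;
int which = ITIMER_NUM ;
struct sigaction sact ;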
int open_SPI()
{
    dev = sub_open( 0 ) ;
    if( !dev )
        return 0 ;
    return 1 ;
}

int pulse_vsync( /* sub_device dev */ )
{
    int ret ;
    int tmp ;
    ret = sub_gpio_write( dev, VSYNC, &tmp, VSYNC ) ;
    ret = sub_gpio_write( dev, 0, &tmp, VSYNC ) ;
    return( ret ) ;
}

// NOTE: the handler was not included in the post. This body is an
// assumption based on the "Wait for timer to set it" comment below.
void gen_soft_vsync( int sig )
{
    (void) sig ;
    soft_vsync = 1 ;
}

void wait_soft_vsync()
{
    // Wait for soft vsync true
    //while( soft_vsync == 0 ) ; // Wait for timer to set it
    soft_vsync = 0 ; // Reset it and then return
}
int main()
{
    int len, i ;
    int count ;
    int ret ;

    // Before the main while loop, if Vsync is internally generated
    // then start the timer
    sigemptyset( &sact.sa_mask ) ;
    sact.sa_flags = 0 ;
    sact.sa_handler = gen_soft_vsync ;
    sigaction( SIG_NUM, &sact, NULL );
    value.it_interval.tv_sec = 0 ;
    value.it_interval.tv_usec = 5000 ;
    value.it_value.tv_sec = 0 ;
    value.it_value.tv_usec = 5000 ;
    setitimer( which, &value, NULL ) ;

    if( (ret = open_SPI()) != 1 )
    {
        printf( "SPI not connected ret = %d\n", ret ) ;
        return( 1 ) ; // Do not continue with an unopened device
    }
    if( sub_spi_config( dev,
            SPI_ENABLE | SPI_CPOL_RISE | SPI_SMPL_SETUP |
            SPI_MSB_FIRST | SPI_CLK_FQ | SPI_CLK_8MHZ, 0 ) != 0 )
        printf( "sub_spi_config: %s\n", sub_strerror(sub_errno) ) ;

    // Fill the output buffer with a simple test pattern
    len = 40 ;
    for( i = 0 ; i < len ; i++ )
    {
        bf_o[i] = i ;
        bf_i[0] = 0 ;
    }

    for( count = 0 ; count < 20000 ; count++ )
    {
        // Do real time stuff
        wait_soft_vsync() ; // Wait
        pulse_vsync() ;
        // Write it to the SPI.
        sub_spi_transfer( dev, bf_o, bf_i, len, SS_CONF(1, SS_LO) ) ;
    }
    soft_vsync_on = 0 ;
    return(0);
}
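One detail of the listing that may be worth experimenting with: the handler is installed with sact.sa_flags = 0, so any blocking system call interrupted by the timer signal fails with EINTR instead of being restarted. Whether libusb or libsub mishandles that is an assumption this thread does not confirm. The sketch below only demonstrates the difference SA_RESTART makes for a blocking read() on a pipe; blocked_read_result and on_alarm_wake are hypothetical names, and no libusb or libsub call is involved.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;

/* Handler: writes one byte into the pipe so that a restarted read()
 * has data to return (write() is async-signal-safe). */
static void on_alarm_wake(int sig)
{
    char b = 'x';
    (void)sig;
    if (write(wake_fd, &b, 1) < 0) { /* nothing safe to do here */ }
}

/* Blocks in read() on an empty pipe while a one-shot ITIMER_REAL fires
 * after 100 ms. Returns read()'s result; *err receives errno on failure. */
static long blocked_read_result(int sa_flags, int *err)
{
    int fds[2];
    struct sigaction sa, old_sa;
    struct itimerval once;
    char buf;
    ssize_t n;

    assert(err != NULL);
    if (pipe(fds) != 0)
        return -2;
    wake_fd = fds[1];

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_alarm_wake;
    sa.sa_flags = sa_flags;           /* 0 or SA_RESTART */
    sigaction(SIGALRM, &sa, &old_sa);

    memset(&once, 0, sizeof once);
    once.it_value.tv_usec = 100000;   /* one-shot, no interval */
    setitimer(ITIMER_REAL, &once, NULL);

    errno = 0;
    n = read(fds[0], &buf, 1);
    *err = (n < 0) ? errno : 0;

    sigaction(SIGALRM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);
    return (long)n;
}
```

With sa_flags = 0 the read fails with EINTR; with SA_RESTART it is restarted after the handler runs and returns the byte the handler wrote. Note that on Linux some calls, poll() among them, are never restarted regardless of SA_RESTART, so setting the flag in the test program may or may not change its behavior.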
Hi,
I'm working with libusb development team on this issue.
They asked us to check it with the new Ubuntu 10.04 and libusb 1.0.8. Can you do that, please?
(You can also try libusb 1.0.8 on Ubuntu 9.04)