Data missing from log
|Reported by:||mvglasow (2)||Owned by:||KaZeR|
|Severity:||Keywords:||logging, gpx, buffer|
During my work on #631 I have come across some odd behavior of the log functionality in navit:
I have created a new type of log (gpxwpt) which records trackpoints whenever a menu command is invoked. The command goes through the usual processing in navit and ends up calling a vehicle method, which in turn prepares the log string and then does a log_write to append it to the buffer of the log.
When analyzing the resulting log file, I frequently notice that waypoints are missing. In the most extreme cases, the log consists of only header and trailer, with no waypoints in between. In other cases some waypoints are there and others are missing; in other cases everything looks fine.
I have inserted some logging at the critical points to investigate into this problem. However, the difficulty is that the problem is sporadic and not easily reproducible.
I can say with certainty that my method gets called every time the command is invoked. I am also fairly (~90%) sure that log_write is called. (Unfortunately, after I introduced more detailed logging, the problem has not reappeared yet.)
It is always the data of an entire log_write operation that is missing. Hence it appears that the call to log_write writes either everything or nothing. Due to the nature of the data, the length of the data block to be added is always the same.
My suspect is the flush operation: It is triggered when either the buffer exceeds a certain size (after write, which looks harmless) or when a certain time has elapsed after the last flush operation (timer with callback, suspicious).
log_write essentially does a memcpy of the new data to the end of the buffer, then increases the length counter of the buffer. (The rest is some extra verification around it.)
Theory: If we're in the middle of log_write, appending data to the buffer, and the timer interrupts us just after the memcpy but before updating the buffer length counter, the above behavior occurs.
The flush operation will write the contents of the buffer, as indicated by the length counter (thus everything but the new data) to the file and reset the length to zero.
Then log_write regains control, adding the length of the new data to the (now zero) buffer length. However, the first length bytes of the buffer hold something completely different. (In my situation I would have ended up logging the same waypoint twice and omitting another.)
In short: it's a classical race condition. I don't see any synchronization in the code. Unless I'm overlooking something, we need to introduce some synchronization to defer the timer-triggered flush operations if any other operation on the buffer is in progress.
All tests were done with a modified r3520 on Windows CE.
Is there anyone who knows that part of the code better than I do and can tell me if my suspicion is likely to be the cause?