Opened 4 years ago

Closed 4 years ago

#1334 closed defect/bug (fixed)

Certain OSD enable_expressions fail to evaluate on OSD initialization, succeed on later attr changes

Reported by: mvglasow (2) Owned by: mvglasow (2)
Priority: major Milestone:
Component: osd/core Version: git master
Severity: normal Keywords: osd, enable_expression, command
Cc:

Description

While working on a new OSD (navigation_status), I noticed some odd behavior from enable_expression: Certain expressions will fail to evaluate when the OSD is first created. The error code is not_ready, indicating that the expression was never evaluated. (The structure member that stores the last error is initialized with that value, so this is the error that would result if one queried an expression before its forst evaulation.) Subsequent evaluations, which take place whenever one of the attributes used in the expression changes, complete succesfully and the expected value is returned.

The weird thing is that this behavior is selective, affecting certain expressions but not others.

Expressions that work correctly:

  • navigation.nav_status!=0

Expressions for which the first evaluation fails:

  • navigation.nav_status==-1 || navigation.nav_status==1 || navigation.nav_status==2
  • navigation.nav_status!=-2 && navigation.nav_status!=0 && navigation.nav_status!=3

(ampersand characters properly escaped in XML, as with all other special characters).

I did not yet conduct any further tests with other OSDs and expressions, but since my OSD does no special handling regarding its enable_expression but relies on the generic OSD functions shared with all other OSDs, I suspect this is a generic bug. I will conduct some further tests to see how it affects other OSDs and attributes, and then report back.

Change History (9)

comment:1 Changed 4 years ago by mvglasow (2)

Did some further tests...

  • At least nav_next_status and text suffer from the same issue.
  • Boolean operators (||, &&) seem to contribute. After eliminating the Boolean operator (changing e.g. attr==1 || attr=2 to attr==1) the expression would evaluate. I had similar effects with a sum operation.
  • The attributes used in the expression make a difference. 1 || 1 will evaluate correctly, as will route.route_status==1 || route.route_status==0. But 1 || route.route_status==1 will fail. Also, all Boolean operations involving navigation.nav_status or follow that I tried failed.

This will require close examination of the code to find out. In my experience, errors like these which don't follow a logical pattern are frequently tiny little typos having big effects, such as a forgotten ! in a Boolean expression.

comment:2 follow-up: Changed 4 years ago by mvglasow (2)

After digging through the code, it seems the culprit is somewhere around command_register_callbacks(): I get the following debug output:

error:navit:command_register_callbacks:enter: cs=0x10daed0, cs->async=0, cs->command=navigation.nav_status==-1 || navigation.nav_status==1 || navigation.nav_status==2
error:navit:eval_value:illegal character 0x2e
error:navit:command_register_callbacks:could not resolve an object: cs=0x10daed0, cs->ctx.error=0, cs->ctx.expr===-1 || navigation.nav_status==1 || navigation.nav_status==2
<snip/>
error:navit:command_register_callbacks:enter: cs=0x10daed0, cs->async=0, cs->command=navigation.nav_status==-1 || navigation.nav_status==1 || navigation.nav_status==2
error:navit:eval_value:illegal character 0x2e
error:navit:eval_value:illegal character 0x3d
error:navit:eval_value:illegal character 0x3d
error:navit:eval_value:illegal character 0x7c
error:navit:eval_value:illegal character 0x7c
error:navit:eval_value:illegal character 0x7c
error:navit:command_register_callbacks:could not resolve an object: cs=0x10daed0, cs->ctx.error=0, cs->ctx.expr=.nav_status==1 || navigation.nav_status==2

cs->ctx.expr gets shorter on each call. This indicates that command_register_callbacks() parses the expression only partially on each call. In the process, however, it apparently registers command_saved_callbacks_changed() as a callback, then returns. When the callback is triggered, it again calls command_register_callbacks() and parsing continues.

I have observed two cycles like these on startup. This is enough for short expressions but not for more complex ones – which explains the behavior we're seeing but I'm not sure this behavior was intended.

The log output also shows that whenever command_register_callbacks() terminates, cs->ctx.error is zero. That is, the abort was caused by the (cs->res.attr.type == attr_none) condition (command.c, around line 1592). Maybe we should continue evaluation upon encountering this condition?

Using blame, I see that this piece of code was written by tinloaf more than 6 years ago and has hardly been touched since. For all I know tinloaf is no longer active.

Also, I see a lot of errors from eval_value() complaining about everyday characters (the hex codes stand for .=| respectively). That doesn't look quite right either...

Last edited 4 years ago by mvglasow (2) (previous) (diff)

comment:3 in reply to: ↑ 2 Changed 4 years ago by tryagain

Replying to http://wiki.navit-project.org/index.php/user:mvglasow (2):

After digging through the code, it seems the culprit is somewhere around command_register_callbacks(): I get the following debug output:

It looks like most, if not all, of errors quoted are actually not final errors. For example, the 'illegal character' most probably is fired on each operation sign and later recovered in get_next_object, line 630.

Similary, 'navit:command_register_callbacks:could not resolve an object' is (to be certain, should be) a temporary condition during navit startup, when a requested subsystem is not yet ready.

cs->ctx.expr gets shorter on each call. This indicates that command_register_callbacks() parses the expression only partially on each call. In the process, however, it apparently registers command_saved_callbacks_changed() as a callback, then returns. When the callback is triggered, it again calls command_register_callbacks() and parsing continues.

Hm, i guess it should not continue to parse the command, i would expect rather it should attempt to parse it from the scratch on each iteration of command_saved_attr_new, and stop when all expression objects are correctly resolved.

I have observed two cycles like these on startup. This is enough for short expressions but not for more complex ones – which explains the behavior we're seeing but I'm not sure this behavior was intended.

The log output also shows that whenever command_register_callbacks() terminates, cs->ctx.error is zero. That is, the abort was caused by the (cs->res.attr.type == attr_none) condition (command.c, around line 1592). Maybe we should continue evaluation upon encountering this condition?

Current logic, as i understand it, looks like this:

  • osd with enable_expression calls command_saved_attr_new
  • command_saved_attr_new retries itself with idle callback till command_register_callbacks() succeeds
  • then evaluation fires on each attribute change, and command_register_callbacks() is retried without checking its return value.

There's a possible flaw: if object resolve succeeds a few times, but then fails, callbacks are not properly registered. But imho it shouldn't make sense in your situation, because, i think, the navigation object is not disposable once it's created.

comment:4 Changed 4 years ago by mvglasow (2)

Thanks for your insight.

It looks like most, if not all, of errors quoted are actually not final errors. For example, the 'illegal character' most probably is fired on each operation sign and later recovered in get_next_object, line 630.

That's what it looks like to me, but if it is a regular condition, we shouldn't be logging it as an error. Haven't looked into that portion of the code yet, though. If it's just cosmetic (i.e. misleading log output), that should be easy to fix.

Hm, i guess it should not continue to parse the command, i would expect rather it should attempt to parse it from the scratch on each iteration of command_saved_attr_new, and stop when all expression objects are correctly resolved.

If the object in question is indeed not yet ready, then I agree. Here, however, we're talking about navigation.nav_status. If we have a valid navigation object, it will also have a valid nav_status attribute. Thus we should never encounter a situation in which we can resolve one but not the other, but the logs seem to indicate otherwise. I suspect that under certain conditions command_register_callbacks() gives up too early. I'll need to run a few tests to find out.

command_saved_attr_new retries itself with idle callback till command_register_callbacks() succeeds

Yep, I suspected there's an idle callback involved somewhere. This would explain why certain attributes behave differently from others – it would be a timing issue. Some attributes take less computing to retrieve, allowing for more runs of the idle callback, others take longer and evaluation does not finish in time.

Then again, if the whole thing runs in an idle loop, I'd expect the expression to eventually get resolved and the OSD to show if I wait a few seconds. However, that is not the case – the OSD does not show until the first attribute callback fires. I'll add some debug output here to see when command_saved_attr_new() fires, then I'll hopefully know more...

comment:5 Changed 4 years ago by mvglasow (2)

I found something else...

command_saved_attr_new() calls command_register_callbacks() and examines its return value. If it is 0, indicating that not all callbacks could be registered (as happens in my test case), it registers command_saved_callbacks_changed() as an idle event.

When command_saved_callbacks_changed() runs, it makes a simple call to command_register_callbacks() without examining its return value.

As a result, expressions for which command_register_callbacks() aborts more than once will not get fully evaluated until one of their attribute callbacks (the ones that did get registered) fires.

A possible solution could be to have command_saved_callbacks_changed() to examine the return value of command_register_callbacks() in a similar way to command_saved_attr_new(). To avoid evaluating a faulty expression over and over again, we might want to cancel evaluation if an error is encountered.

comment:6 Changed 4 years ago by mvglasow (2)

Nope. This causes command_register_callbacks() to hang at the same point over and over again. Need to get more detailed debug output to find out which object exactly won't resolve.

comment:7 Changed 4 years ago by mvglasow (2)

Added some debug code to display the name of the object which could not be resolved and ran it with multiple OSDs having the following enable_expressions:

navigation.nav_status&gt;=3
navigation.nav_status==-1 || navigation.nav_status==1 || navigation.nav_status==2

On the first run it is indeed 'nav_status' which cannot be resolved – for both expressions. On the second run the unresolvable object is ' navigation' – note the leading space here. If I rewrite the expression without spaces, both runs will complain about 'nav_status' being unresolvable.

Apparently we got both lice and fleas and the itch won't go away until we get rid of both. The leading space is probably just a parsing issue.

Oddly, however, in the long command the instance of nav_status that cannot be resolved advances between the two calls. (With my previous solution attempt, it would however no longer change after the second call.)

comment:8 Changed 4 years ago by mvglasow (2)

  • Owner changed from Singesang to mvglasow (2)
  • Status changed from new to assigned

Got it. Turns out cs->ctx.attr in command_register_callbacks() is initialized correctly before entering the loop, but not updated with each step.

Since the default parent in this case is navit, everything works well for unqualified object references. However, when we have a qualified object reference such as navigation.nav_status, the function ends up looking for navit.nav_status. Hardly surprising this doesn't work.

For some qualified object names this error was not noticeable, presumably because they change frequently and thus the respective expressions would get evaluated because an attribute callback fired. That part of the code apparently works with correct object references, therefore things would work correctly on attribute changes. I didn't investigate that in detail, though...

Merge to follow soon...

comment:9 Changed 4 years ago by mvglasow (2)

  • Resolution set to fixed
  • Status changed from assigned to closed

Merged with pull request https://github.com/navit-gps/navit/pull/37. Thanks @tryagain for your input!

Note: See TracTickets for help on using tickets.