[sbml-discuss] What to do about this unit checking case

Michael Hucka mhucka at caltech.edu
Thu Feb 22 13:39:12 PST 2007


I wanted to get back to the issue of syntax for indicating
units in SBML.  Stefan's awesome message was wonderfully
detailed and hit all the right points.  (Thanks!)  Still,
one item that I dispute slightly is that assuming default
units for numbers makes solutions infeasible.  We can define
a default, and tell people what it is, and how to define
other units when the default is not suitable.

Here is a proposal for the simplest way forward for L2v3:

1) Define "raw" numbers appearing in MathML expressions as
   having units "dimensionless" (which is not the same as
   having undefined units).

2) Tell people that if they want to introduce a number
   having other units into their expressions, they should
   define a parameter for that number and then use the
   parameter's id in the expression instead of the number.

Features of this approach:

(a) No new syntax needs to be introduced.

(b) For some cases, people's existing math expressions will
    be correct.  In particular, expressions where a "raw"
    number is multiplying or dividing another number will
    be correct under mechanistic unit checking.  However,
    when numbers are being added or subtracted, in most
    cases they would fail unit checking.  In Stefan's
    original example, the model would still be incorrect
    because there is a mix of these two.  The way to solve
    it for that example is to define a parameter for at
    least the "1", and the "2" could be left as-is.  By
    contrast, assuming that raw numbers have undefined units
    would cause all cases to fail unit validation.  I think
    on balance defaulting to "dimensionless" in this way
    will cause people less grief.
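
A minimal sketch of the mechanistic check described in (b), written
in Python and not taken from libSBML.  It assumes units are stored as
maps from base unit to exponent, expressions are nested tuples, and
k, V, X and ONE_ITEM carry the hypothetical units listed in ENV; none
of these names or values come from the SBML specification.  Raw
numbers are treated as dimensionless.

```python
def combine(a, b, sign=1):
    """Multiply (sign=1) or divide (sign=-1) two unit maps."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + sign * exp
        if out[unit] == 0:
            del out[unit]
    return out


def units_of(expr, env):
    """Return the unit map of expr, or raise ValueError on a mismatch."""
    if isinstance(expr, (int, float)):
        return {}  # raw number: dimensionless, not "undefined"
    if isinstance(expr, str):
        return env[expr]  # identifier: look up its declared units
    op, *args = expr
    arg_units = [units_of(arg, env) for arg in args]
    if op in ("plus", "minus"):
        # Addition and subtraction need identical units on every operand.
        for other in arg_units[1:]:
            if other != arg_units[0]:
                raise ValueError(f"unit mismatch in {op}: {arg_units}")
        return arg_units[0]
    if op == "times":
        result = {}
        for u in arg_units:
            result = combine(result, u)
        return result
    if op == "divide":
        return combine(arg_units[0], arg_units[1], sign=-1)
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical unit assignments, chosen only for illustration.
ENV = {
    "X": {"item": 1},
    "V": {"litre": 1},
    "k": {"item": -1, "litre": -1, "second": -1},
    "ONE_ITEM": {"item": 1},
}

# k * V * X(X-1)/2 with a raw "1", and with a parameter in its place.
RAW_LAW = ("divide", ("times", "k", "V", "X", ("minus", "X", 1)), 2)
FIXED_LAW = ("divide", ("times", "k", "V", "X", ("minus", "X", "ONE_ITEM")), 2)
```

Under these assumptions, units_of(RAW_LAW, ENV) raises ValueError at
the subtraction, while units_of(FIXED_LAW, ENV) succeeds even though
the "2" is still a raw number.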

My proposal for L3 (it could optionally be done for L2v3 but
I think it would have to be voted on, which would take more
time, and I think no one wants to delay L2v3 further):

1) Get rid of the UnitKind enumeration, and instead, define
   UnitKind's current values as being reserved words in
   UnitSId.

2) Change the data type of the Unit "kind" field to be
   UnitSId.  (Combination of #1 and #2 thereby puts all
   units into the same identifier space, for #3.)

3) Define the ability to put unit attributes on MathML
   elements in this form:

      <cn sbml:units="mole"> 1 </cn>

   Bonus: if the sbml:units attribute is permitted anywhere
   in the MathML, then units of more than just <cn> could be
   defined uniformly.

4) Define "raw" numbers appearing in MathML expressions as
   having units "dimensionless" (which is not the same as
   having undefined units).

5) Tell people that if they want to introduce a number
   having other units into their expressions, they can
   either define a parameter for that number and then use
   the parameter's id in the expression instead of the
   number, or else use the sbml:units attribute.
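
A rough sketch of how a tool might read such annotations, using
Python's standard xml.etree.ElementTree.  The namespace URI bound to
the sbml: prefix here is a placeholder (the proposal does not name
one), and cn_units is a hypothetical helper, not a libSBML function.
A <cn> without the attribute falls back to "dimensionless", as in #4.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Placeholder URI: an assumption made only so the example parses.
SBML_NS = "http://example.org/sbml-units-placeholder"


def cn_units(mathml_text):
    """List (value, units) for every <cn> element, in document order."""
    root = ET.fromstring(mathml_text)
    found = []
    for cn in root.iter(f"{{{MATHML_NS}}}cn"):
        value = float(cn.text.strip())
        # Missing attribute means a raw number, i.e. dimensionless.
        units = cn.get(f"{{{SBML_NS}}}units", "dimensionless")
        found.append((value, units))
    return found


# Stefan's rate law k * V * X(X-1)/2 with the "1" annotated.
EXAMPLE = f"""
<math xmlns="{MATHML_NS}" xmlns:sbml="{SBML_NS}">
  <apply>
    <divide/>
    <apply>
      <times/>
      <ci> k </ci> <ci> V </ci> <ci> X </ci>
      <apply>
        <minus/> <ci> X </ci> <cn sbml:units="item"> 1 </cn>
      </apply>
    </apply>
    <cn> 2 </cn>
  </apply>
</math>
"""
```

Here cn_units(EXAMPLE) reports the "1" with units "item" and the "2"
as "dimensionless".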


I would personally rather do this 2nd alternative, because
it's more powerful and (I wager) less cumbersome, but it
would delay the issuing of L3.

Does anyone have any objections or see any problems with the
first approach?

MH

>>>>> On 12 Feb 2007, Stefan Hoops <shoops at vbi.vt.edu> wrote:
  shoops> Dear Darren:
  shoops>
  shoops> I would like to pick up on your example since it is a
  shoops> real one frequently encountered (even in SBML). I
  shoops> modified it a little to conform with the
  shoops> definition of the kineticLaw of SBML, i.e., it
  shoops> looks like
  shoops>
  shoops> k * V * X(X-1)/2
  shoops>
  shoops> Here k is the kinetic constant in the appropriate
  shoops> units and V the volume of the reaction
  shoops> compartment. The MathML for this rate law may be
  shoops> found below
  shoops>
  shoops> <math xmlns="http://www.w3.org/1998/Math/MathML">
  shoops>   <apply>
  shoops>     <divide/>
  shoops>     <apply>
  shoops>       <times/>
  shoops>       <ci> k </ci>
  shoops>       <ci> V </ci>
  shoops>       <ci> X </ci>
  shoops>       <apply>
  shoops>         <minus/> <ci> X </ci> <cn> 1 </cn>
  shoops>       </apply>
  shoops>     </apply>
  shoops>     <cn> 2 </cn>
  shoops>   </apply>
  shoops> </math>
  shoops>
  shoops> This MathML includes two cn elements of which one
  shoops> <cn> 2 </cn> is dimensionless and the other <cn> 1
  shoops> </cn> must have the same units as X.  It is easy
  shoops> for a human to detect the needed units for the
  shoops> above example.
  shoops>
  shoops> As Mike stated in his initial post to the mailing
  shoops> list, defining the units of numbers in MathML
  shoops> implicitly has at least one major problem.  Let us
  shoops> assume that the units of k are incorrectly given
  shoops> in the SBML document. In this case the implicit
  shoops> unit definition of <cn> 2 </cn> would correct for
  shoops> this obvious error and thus prevent the unit
  shoops> checking which SBML facilitates. This is clearly
  shoops> not the intent of implicit unit definition for
  shoops> numbers.
  shoops>
  shoops> Please note that the above example can be written
  shoops> in SBML without any problems. We just need to
  shoops> replace <cn> 1 </cn> with <ci> ONE_ITEM </ci> and
  shoops> define ONE_ITEM as a global parameter with the
  shoops> appropriate units.  This approach is clearly
  shoops> cumbersome but is the only correct way in SBML as it
  shoops> stands now. Implicit unit definitions were
  shoops> brought up to make writing correct SBML
  shoops> easier. However, the above problem (at least in my
  shoops> opinion) makes them unfeasible.
  shoops>
  shoops> What other options do we have?
  shoops>
  shoops> 1) We could add an attribute to the cn element,
  shoops>    similar to CellML:
  shoops>
  shoops>      <cn sbml:units="UnitSId"> 1.0 </cn>
  shoops>
  shoops>    In this case the default value should be
  shoops>    sbml:units="dimensionless", which currently
  shoops>    exists only as a unit kind and not as a UnitSId.
  shoops>    This, however, is only a minor problem.
  shoops>
  shoops> 2) We can adopt the proposal for MathML which can
  shoops>    be found at:
  shoops>
  shoops>      http://www.w3.org/TR/mathml-units/
  shoops>
  shoops>    This basically introduces text into the content
  shoops>    markup, i.e., it is by itself not structured
  shoops>    enough for SBML needs.
  shoops>
  shoops> 3) We could allow UnitSId to be valid within a ci
  shoops>    element. This creates the problem that UnitSIds
  shoops>    and SIds are in separate name spaces and thus we
  shoops>    may have ambiguity. However, we could remove
  shoops>    this problem by having just one name space for
  shoops>    all SIds.
  shoops>
  shoops> 4) We define a third csymbol:
  shoops>
  shoops>      http://www.sbml.org/sbml/symbols/units
  shoops>
  shoops>    and allow only UnitSIds as values for this
  shoops>    csymbol. In case we need a number with units in
  shoops>    MathML, this expands to:
  shoops>
  shoops>      <apply>
  shoops>        <times/>
  shoops>        <cn> 1 </cn>
  shoops>        <csymbol encoding="text"
  shoops>                 definitionURL="http://www.sbml.org/sbml/symbols/units">
  shoops>          SUBSTANCE_UNITS
  shoops>        </csymbol>
  shoops>      </apply>
  shoops>
  shoops> I would like to hear your thoughts on the
  shoops> suggested solutions or, even better, other simpler
  shoops> ones :)
  shoops>
  shoops> Thanks, Stefan
  shoops>
  shoops>
  shoops> On Sun, 11 Feb 2007 21:59:40 +0000 (GMT) Darren
  shoops> Wilkinson <darrenjwilkinson at btinternet.com> wrote:
  shoops>
  >> Yes, units really are tricky! I don't have a strong
  >> view on what to do about them (yet!), but I always
  >> think it helps clarify things to think about my
  >> favourite rate law:
  >>
  >> X(X-1)/2
  >>
  >> Here it is assumed that X has units of "item". Then the
  >> "1" also has units of "item", but the "2" is
  >> dimensionless. This difference really matters if you
  >> decide to convert units to (say) moles, as then the "1"
  >> needs converting (to become the reciprocal of Avogadro's
  >> constant), but the 2 remains unchanged.  It would
  >> obviously be highly desirable for any unit
  >> checking/conversion capability in SBML/libSBML to be
  >> able to cope with such issues... However, it would be a
  >> bit messy to have to define a "one" and a "two"
  >> externally with appropriate units just to be able to
  >> construct this rate law...
  >>
  >> Regards,
  >>
  >> Darren
  >>
  >>
  >>
  >> --- Michael Hucka <mhucka at caltech.edu> wrote:
  >>
  >> > The SBML editors had an off-line discussion which we
  >> > realized needs to be put on sbml-discuss.  It
  >> > concerns an issue that Sarah Keating ran into while
  >> > implementing unit validation in libSBML.
  >> >
  >> > The issue is how the units of pure numbers in MathML
  >> > should be treated.  Here's an example, in the case of
  >> > the 'delay' field on Events:
  >> >
  >> >     <delay>
  >> >      <math
  >> >      xmlns="http://www.w3.org/1998/Math/MathML">
  >> >       <cn> 1 </cn>
  >> >      </math>
  >> >     </delay>
  >> >
  >> > Note the literal "1".  What units, if any, should be
  >> > assumed for the number?
  >> >
  >> > One option is to treat literal numbers as always
  >> > being without units.  This is a valid viewpoint, but
  >> > it leads to more model validation failures.  Although
  >> > modelers should define parameters (which can have
  >> > defined units) for a lot of the cases where they
  >> > often write numbers directly, the practical reality
  >> > is that many people won't think of doing that.
  >> >
  >> > A second option is to take the units to be whatever
  >> > units are defined as the default for the field in
  >> > question, or if there is an associated field
  >> > defining the units, then whatever that field says
  >> > the units are.  (For instance, there is a 'timeUnits'
  >> > field on Events that could apply in this example.)
  >> > The logic behind this idea is that even though the
  >> > user didn't specify the units explicitly, a
  >> > reasonable interpretation of their intention is that
  >> > the units are meant to be whatever the defaults are.
  >> > In other words, the modeler didn't mean "1" without
  >> > units, they meant "1 in the units of time" for this
  >> > example.
  >> >
  >> > Sounds reasonable?  But it gets more complicated.
  >> > Imagine now something like
  >> >
  >> >   <math xmlns="http://www.w3.org/1998/Math/MathML">
  >> >     <apply>
  >> >       <plus/> <ci> S </ci> <cn> 2 </cn>
  >> >     </apply>
  >> >   </math>
  >> >
  >> > For the purpose of this second example, the "S" can
  >> > be anything (e.g., a species identifier) and the
  >> > context can be anything, not just the Events example
  >> > above.
  >> >
  >> > The question is, what should be assumed for the units
  >> > of "2"?
  >> >
  >> > If we follow the same guideline, i.e., that the
  >> > literal number has whatever units are appropriate for
  >> > the situation, then it creates validation problems.
  >> > First, it implies a unit validator must deduce the
  >> > expected units somehow, and this may sometimes be
  >> > impossible due to ambiguities in a given situation.
  >> > Second, Stefan Hoops pointed out it opens a hole in
  >> > validation: it effectively means that numbers in
  >> > MathML are assumed to always have the appropriate
  >> > units, so unit checking becomes worthless.
  >> >
  >> > To cope with this, we are tentatively thinking of
  >> > using the following set of guidelines:
  >> >
  >> > 1) If the number is the only thing contained inside a <math>
  >> >    (such as the original example involving a delay on
  >> >    Events), assume that the units of the number are
  >> >    the units defined for that field (so, timeUnits
  >> >    for delays).
  >> >
  >> > 2) In all other cases, assume pure numbers have no
  >> >    units.
  >> >
  >> > This handles the first case, and also addresses the
  >> > second case.  (In the second case, there would be a
  >> > unit validation failure because the "2" would not
  >> > have units and therefore the overall expression could
  >> > not have consistent units.)
  >> >
  >> > The downside is that this "rule" is defined with an
  >> > exception.
  >> >
  >> > What do the rest of you SBMLers think?
  >> >
  >> > Whatever is decided, the next SBML specification will
  >> > have to spell out the guidelines precisely.
  >> >
  >> > MH
  >> >
  >> > ____________________________________________________________
  >> > To manage your sbml-discuss list subscription, visit
  >> > https://utils.its.caltech.edu/mailman/listinfo/sbml-discuss
  >> >
  >> > For a web interface to the sbml-discuss mailing list,
  >> > visit http://sbml.org/forums/
  >> >
  >> > For questions or feedback about the sbml-discuss
  >> > list, contact sbml-team at caltech.edu.
  >> >
  >>
  >>
  >> --
  >> Darren Wilkinson
  >> email:    darrenjwilkinson at btinternet.com
  >> home www: http://www.darrenjwilkinson.btinternet.co.uk/
  >> work www: http://www.staff.ncl.ac.uk/d.j.wilkinson/
  shoops>
  shoops>
  shoops> --
  shoops> Stefan Hoops, Ph.D.
  shoops> Senior Project Associate
  shoops> Virginia Bioinformatics Institute - 0477
  shoops> Virginia Tech
  shoops> Bioinformatics Facility I
  shoops> Blacksburg, Va 24061, USA
  shoops>
  shoops> Phone: (540) 231-1799
  shoops> Fax: (540) 231-2606
  shoops> Email: shoops at vbi.vt.edu


