• timEd message base locking (or lack thereof) in UNIX

    From andrew clarke@3:633/267 to All on Thu Oct 25 04:38:04 2012
    Can't open message (594)!
    MSGAPI: MERR_NOENT: File/message does not exist

    This is very old news, but I don't think I ever documented it. It's what happens in the UNIX version of timEd when you start writing a message destined for a Squish message base and, in the background, another program (e.g. HPT) imports a message into the same echomail area.

    I don't think you actually lose any messages but the index seems to get corrupted somehow. The way I fix it is to run sqconver (from MSGAPI/SMAPI/XMSGAPI) over the broken msgbase, then replace the original msgbase with the newly created one. Obviously you wouldn't want to do this while HPT is trying to toss messages into it, though.

    I don't think this happens in the OS/2 version because of file locking, meaning HPT (or Squish) can't write to the .SQD file while timEd has it open. Then again, back when I used OS/2 I don't recall ever getting locking violations, so I'm not sure. Maybe Squish just waited until the msgbase became writable.

    In UNIX, file locking works quite differently from DOS, Windows & OS/2. From Wikipedia:

    "Although some types of locks can be configured to be mandatory, file locks under Unix are by default advisory."

    In simplistic terms this means HPT & timEd need to cooperate with each other so they don't clobber each other's files, but they're clearly not doing that.
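    To make "advisory" concrete, here's a small Python sketch using fcntl.flock(). That call is my choice for illustration; I'm not claiming it's the locking mechanism timEd, HPT or the MSGAPI actually use. One file handle takes an exclusive lock; a writer that never asks for the lock gets through anyway, while a writer that asks politely (non-blocking) is refused. It's UNIX-only, and demo_advisory() and the file name are made up.

```python
import fcntl
import os
import tempfile


def demo_advisory(path):
    """Show that a flock() lock only stops programs that also ask for it."""
    with open(path, 'w') as holder:
        # The "editor" takes an exclusive advisory lock on the file.
        fcntl.flock(holder, fcntl.LOCK_EX)

        # A "tosser" that never calls flock() is not stopped at all.
        with open(path, 'a') as rogue:
            rogue.write('written despite the lock\n')

        # A cooperating program asks for the lock without blocking and is
        # refused: flock() treats separate open() calls independently,
        # even within the same process.
        with open(path, 'a') as polite:
            try:
                fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
                refused = False
            except OSError:
                refused = True
    # Closing "holder" released the lock; report what ended up in the file.
    with open(path) as fp:
        return refused, fp.read()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        print(demo_advisory(os.path.join(d, 'test.sqd')))
```

    In other words, the lock is only a convention: it protects the file from HPT only if HPT checks for it too.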

    The timEd code is broken though. It shouldn't be doing this (pseudocode):

    ma = MsgOpenArea("artware")
    mh = MsgOpenMsg(ma)
    txt = GetTextFromUser()
    MsgWriteMsg(mh, txt)
    MsgCloseMsg(mh)
    MsgCloseArea(ma)

    Instead it should do this:

    txt = GetTextFromUser()
    ma = MsgOpenArea("artware")
    mh = MsgOpenMsg(ma)
    MsgWriteMsg(mh, txt)
    MsgCloseMsg(mh)
    MsgCloseArea(ma)
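    The same ordering in runnable form: a minimal Python sketch that collects the text first, then opens the base, holds an advisory flock() only for the short write, and closes it again. post_message() and the plain append-only file standing in for a Squish base are hypothetical, not MSGAPI calls, and the lock only helps if the tosser takes the same lock.

```python
import fcntl


def post_message(base_path, get_text):
    """Hypothetical helper: collect text first, lock only while writing."""
    # 1. The slow, interactive part happens with nothing open or locked.
    text = get_text()

    # 2. Open the (stand-in) message base, lock it, write, and let go.
    with open(base_path, 'a') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)  # waits for other cooperating writers
        try:
            fp.write(text)
            fp.flush()
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)
```

    Here get_text stands for the whole editing session, which can last minutes; the time the base is actually open shrinks to a single write.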

    I have no idea if the same bug (or a different bug...) occurs when writing to *.MSG format bases.

    The main reason I hadn't bothered trying to fix it is that it's (from what I can tell) a fundamental design problem with timEd. Also, it only affects me very occasionally, since I only post to very low-volume echomail areas.

    It's also a flaw in Scott Dudley's design of the MSGAPI. A well-written API shouldn't allow corruption of the very data it gives you access to. Of course, that's easy to say in hindsight all these years later.

    I think from now on I'll use GoldED or Msged just to avoid having to run sqconver again. I'm pretty sure both of them write to Squish message bases the correct way (see the pseudocode above).

    In an ideal world I'd write my own very basic message editor in Python and bypass the MSGAPI altogether but doing that is pretty low on my list of Things To Do(tm). :-)

    Regards
    Andrew

    --- timEd/FreeBSD 1.11.b8
    * Origin: Blizzard of Ozz, Melbourne, Victoria, Australia (3:633/267)
  • From andrew clarke@3:633/267 to All on Thu Oct 25 04:45:50 2012
    Hello everybody.

    25 Oct 12 04:38, I wrote to all:

    I don't think you actually lose any messages but the index seems to get corrupted somehow. The way I fix it is to run sqconver (from MSGAPI/SMAPI/XMSGAPI) over the broken msgbase, then replace the original msgbase with the newly created one. Obviously you wouldn't want to do this while HPT is trying to toss messages into it, though.

    Actually it turns out I'm still getting "MSGAPI: MERR_NOENT: File/message does not exist" errors even after running sqconver, which is pretty weird when you think about how sqconver is supposed to work.

    I might try getting sqconver to convert the base to *.MSG format, then back to Squish format.

    Or I could just not use Squish format, I suppose. On modern PCs with new filesystems, having a few thousand *.MSG files in a single directory is not a big deal.

    --- GoldED+/BSD 1.1.5-b20110223-b20110223
    * Origin: Blizzard of Ozz, Melbourne, Victoria, Australia (3:633/267)
  • From andrew clarke@3:633/267 to All on Thu Oct 25 05:24:16 2012
    25 Oct 12 04:45, I wrote to all:

    I might try getting sqconver to convert the base to *.MSG format, then back to Squish format.

    More fun: I've discovered that

    sqconver oldbase squish newbase '*.msg' 0

    occasionally generates *.msg files with extraneous NULs (and leftover fragments of the PATH line) appended to the end:

    00000390 35 2f 30 0d 01 50 41 54 48 3a 20 32 34 39 2f 33 |5/0..PATH: 249/3|
    000003a0 30 33 20 32 32 39 2f 32 30 30 30 20 31 32 33 2f |03 229/2000 123/|
    000003b0 35 30 30 20 32 36 31 2f 33 38 20 36 33 33 2f 32 |500 261/38 633/2|
    000003c0 36 30 0d 0d 00 30 30 20 32 36 31 2f 33 38 20 36 |60...00 261/38 6|
    000003d0 33 33 2f 32 36 30 0d 0d 00 39 2f 33 30 33 20 32 |33/260...9/303 2|
    000003e0 32 39 2f 32 30 30 30 20 31 32 33 2f 35 30 30 20 |29/2000 123/500 |
    000003f0 32 36 31 2f 33 38 20 36 33 33 2f 32 36 30 0d 0d |261/38 633/260..|
    00000400 00 36 31 2f 33 38 20 36 33 33 2f 32 36 30 0d 00 |.61/38 633/260..|
    00000410

    ... which then cause sqconver to generate a corrupted .SQD file when the *.MSG files are converted back to Squish format:

    sqconver newbase '*.msg' newerbase squish 0

    Possibly a case of GIGO, where the original Squish base was already partly corrupt, but FTS-1 requires a stored *.MSG file to have just a single NUL terminator at the end, so these files aren't to spec. Arguably another MSGAPI bug.
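    A rough Python checker for this, assuming the FTS-0001 stored-message layout as I understand it: a fixed 190-byte header (whose string fields are NUL-padded, hence skipping it) followed by the message text, which should end at its first and only NUL. check_msg() and trim_msg() are made-up names; trimming simply discards everything after the first NUL in the text, which would also drop the stray PATH fragments visible in the dump.

```python
import sys

HEADER_SIZE = 190  # FTS-0001 stored message header size (assumed, see above)


def check_msg(data):
    """True if the text after the header has exactly one NUL, at the very end."""
    body = data[HEADER_SIZE:]
    return len(body) > 0 and body.find(b'\0') == len(body) - 1


def trim_msg(data):
    """Return the message cut off just after the first NUL in the text."""
    nul = data.find(b'\0', HEADER_SIZE)
    if nul == -1:
        return data + b'\0'  # no terminator at all: add one
    return data[:nul + 1]


if __name__ == '__main__':
    # Usage: check_msg.py *.msg   (lists files that aren't to spec)
    for fn in sys.argv[1:]:
        with open(fn, 'rb') as fp:
            if not check_msg(fp.read()):
                print('%s: not a single trailing NUL' % fn)
```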

    Also, perhaps more of a cosmetic bug, but sqconver adds @INTL kludges to each message even though it's an echomail area (which is why I specified zero for the fifth parameter).

    Hmph.

    One day I may write a sqconver-workalike in Python. Minus all the bugs...

    --- GoldED+/BSD 1.1.5-b20110223-b20110223
    * Origin: Blizzard of Ozz, Melbourne, Victoria, Australia (3:633/267)
  • From andrew clarke@3:633/267 to All on Sun Oct 28 07:02:32 2012
    25 Oct 12 04:45, I wrote to all:

    Or I could just not use Squish format, I suppose. On modern PCs with new filesystems, having a few thousand *.MSG files in a single directory is not a big deal.

    Actually, in hindsight, with *.MSG there's still a window where a tosser can overwrite a message editor's newly created .MSG file. The problem is that both programs first need to find the highest-numbered .MSG file before creating the new message, and the time this takes, even on a modern PC, could be quite high: on the order of several seconds if the CPU is loaded or the filesystem is on a remote machine. During this window the other program can create a new message, so the "highest message" counter can get out of sync between the two programs. The more messages there are, the more likely this is to occur.
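    One way to close the overwrite part of that window is to create the new file with O_CREAT | O_EXCL, so the kernel refuses to create a file that already exists, and simply try the next number if that happens. This is a general UNIX technique, not something I know timEd, GoldED or HPT to actually do, and create_next_msg() is a hypothetical name. Note that O_EXCL isn't reliable over old NFS versions (before NFSv3).

```python
import os


def highest_msg(directory):
    """Return the highest N among files named N.msg in directory (0 if none)."""
    highest = 0
    for fn in os.listdir(directory):
        name, ext = os.path.splitext(fn)
        if ext == '.msg':
            try:
                highest = max(highest, int(name))
            except ValueError:
                pass  # not a numbered .msg file
    return highest


def create_next_msg(directory, data):
    """Write data to the next free N.msg without ever overwriting a file."""
    n = highest_msg(directory) + 1
    while True:
        path = os.path.join(directory, '%d.msg' % n)
        try:
            # O_EXCL makes creation fail if another program got there first.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            n += 1  # lost the race for this number; try the next one
            continue
        with os.fdopen(fd, 'wb') as fp:
            fp.write(data)
        return n
```

    The scan can still be stale, but the worst case becomes "my message gets number N+2 instead of N+1" rather than one program silently clobbering the other's file.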

    Just out of curiosity I hacked together two short Python programs: one to create the .msg files initially, and another to look for the highest-numbered .msg file. On my system both of these programs take less than a second to run, but that's on a fast CPU with a local filesystem.

    #!/usr/bin/env python

    # Create 5000 empty files named 1.msg to 5000.msg

    for i in range(5000):
        fn = '%d.msg' % (i + 1, )
        with open(fn, 'w'):
            pass


    #!/usr/bin/env python

    # Find the highest numbered .msg file in the current directory

    import os, os.path

    highest = 0

    dp = os.listdir('.')

    for fn in dp:
        filename, ext = os.path.splitext(fn)
        if ext == '.msg':
            n = 0
            try:
                n = int(filename)
            except ValueError:
                pass  # not a numbered .msg file; ignore it
            if n > highest:
                highest = n

    print(highest)
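    To put a number on the window on a particular machine, something like this times the same scan against a scratch directory of 5000 empty .msg files. The scratch directory and the time_scan() name are my own additions; results will obviously vary with CPU load and filesystem (local vs. NFS/SMB).

```python
import os
import tempfile
import time


def time_scan(directory):
    """Return (highest N.msg found, seconds the scan took)."""
    start = time.perf_counter()
    highest = 0
    for fn in os.listdir(directory):
        name, ext = os.path.splitext(fn)
        if ext == '.msg':
            try:
                highest = max(highest, int(name))
            except ValueError:
                pass  # not a numbered .msg file
    return highest, time.perf_counter() - start


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as d:
        for i in range(1, 5001):
            open(os.path.join(d, '%d.msg' % i), 'w').close()
        highest, secs = time_scan(d)
        print('highest=%d, scan took %.4f s' % (highest, secs))
```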

    --- GoldED+/BSD 1.1.5-b20110223-b20110223
    * Origin: Blizzard of Ozz, Melbourne, Victoria, Australia (3:633/267)
  • From andrew clarke@3:633/267 to All on Wed Nov 14 05:05:44 2012
    25 Oct 12 04:38, I wrote to all:

    @MSGID: 3:633/267.0 508826fe5
    --- timEd/FreeBSD 1.11.b8

    I just happened to notice that timEd was generating malformed MSGID serial numbers on systems where time_t is 64-bit.
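    For reference, FTS-0009 defines the MSGID serial as an 8-digit hexadecimal number (the quoted 508826fe5 has nine). A minimal Python sketch of one way to keep it to eight digits whatever the width of time_t: mask the clock to 32 bits before formatting, and bump the value if two messages are written within the same second. MsgidSerial is a hypothetical helper; I'm not claiming this is what the SVN fix does.

```python
import time


class MsgidSerial:
    """Hypothetical helper producing 8-hex-digit MSGID serials from the clock."""

    def __init__(self):
        self.last = None

    def next_serial(self, now=None):
        # Keep only the low 32 bits, so a 64-bit time_t can't add a 9th digit.
        t = int(time.time() if now is None else now) & 0xFFFFFFFF
        # Two messages in the same second must not share a serial.
        if self.last is not None and t <= self.last:
            t = (self.last + 1) & 0xFFFFFFFF
        self.last = t
        return '%08x' % t
```

    A real editor would also want to persist the last serial between runs, since serials need to stay unique for a given address over time; this sketch only guarantees that within one run.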

    Fixed in SVN.

    --- GoldED+/BSD 1.1.5-b20110223-b20110223
    * Origin: Blizzard of Ozz, Melbourne, Victoria, Australia (3:633/267)