[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [computer-go] alpha-beta + graph history question

To: evand@xxxxxxxxx,computer-go@xxxxxxxxxxxxxxx
Subject: Re: [computer-go] alpha-beta + graph history question
From: Don Dailey <drd@xxxxxxx>
Date: Thu, 17 Mar 2005 20:58:03 -0500
Cc: computer-go@xxxxxxxxxxxxxxx
Delivered-to: computer-go@xxxxxxxxxxxxxxxxx
In-reply-to: <4f9383510503162232566f1dd8@xxxxxxxxxxxxxxxxx> (message from EvanDaniel on Thu, 17 Mar 2005 01:32:32 -0500)
List-archive: <http://computer-go.org/pipermail/computer-go>
List-help: <mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=help>
List-id: computer-go <computer-go.computer-go.org>
List-post: <mailto:computer-go@xxxxxxxxxxxxxxxxx>
List-subscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=subscribe>
List-unsubscribe: <http://hosting.midvalleyhosting.com/mailman/listinfo/computer-go>,<mailto:computer-go-request@xxxxxxxxxxxxxxxxx?subject=unsubscribe>
References: <4f9383510503162232566f1dd8@xxxxxxxxxxxxxxxxx>
Reply-to: drd@xxxxxxx,computer-go <computer-go@xxxxxxxxxxxxxxx>
Sender: computer-go-bounces@xxxxxxxxxxxxxxx

Evan,

There is another issue here too.   In your code you have
the call:  check_transposition_table();

But there is no clue about what happens as a result of the call.
Presumably, you may be able to stop the search at this node, but your
code sample doesn't indicate this.

But the check_transposition_table() function must also have some
tricky logic to it based on what kind of entry is stored in the
table.   For instance if there is a potentially usable score in the table
whether you can actually use it or not depends on what alpha and beta
are when the call is made and the type of record stored in the entry.
If the entry is stored as an exact score you can  use it as is and 
you are finished at this node.    If the score stored in the table is
a lower bound,  and is greater than alpha,  you can treat this as a
beta cutoff.  

But you don't show any of this logic,  so it's not easy to figure out
exactly what you are doing.

In my program, I do all the work inside the get_transposition and
put_transposition call and I do not worry about this inside the tree
search code itself.  I simply pass the current alpha and beta values
to put and get.  This makes the tree search stuff simpler to
understand and isolates all the crufty logic to the transposition
table routines.

Here is some actual code from my checkers program.  Instead of
cleaning it up, I'll show it to you in all its glory (or ugliness.)  
A couple of notes first to make it a bit easier to follow:

  1.  There are actually 2 tables, one is always overwrite, the other
      is depth precendent.  Even though this means you can use only
      have as many entries for a given amount of memory, it is clearly
      superior than a single table and makes you program search
      better.

  2.  This code is from a palm OS program, so unfortunately you must
      wade through some memory management issues like MemHandleLock.
      You can pretty much ignore that at pick it up at the point *ht,
      the actual entry, has been established.

  3.  The code that looks similar to this:  if ( sc >  (WIN - 1000) ) t->hr[i].sc += ply;
      deals with scoring wins correctly.  Since in checkers you can
      see a won position at various depths due to transpositions,
      funky things happen when you store scores that represent wins at
      various depths.  This is likely to be a non issue in a Go
      program, so you can basically ignore or delete that code.
  
  4.  The program ages records, so that it is not neccessary to clear the table
      before each search.  That's a big benefit because you can use old entries.
      After each search, a global age variable is incremented and an entry that
      doesn't have the current age can always be overwritten, in  either table.
      I use 5 bits for age, but even 2 or 3 works well if you only have this many
      of extra bits laying around!

  5.  The program stores and returns a "best move" which is very handy for improving
      the search efficiency,  even if the entry cannot be used.

  6.  get_hash() returns 1 if the entry can be used directly and puts the score in the
      *sc pointer.

  7.  'ctx' is a big global state structure where I keep global variables.  I probably
      wouldn't use it in a non-palm program but it makes it convenient for keeping
      the state of the application which must be saved and restored whenever someone
      enters and leaves an application.

  8.  Even with all of this,  the routines do not require a whole lot of code.  


I apologize for not simplifying the code, but if I did, I would
probably make errors.  Besides, the way it is now is a complete
example and demonstrates aging, double entry tables, correct depth
based win scoring, best move storage (and simple palm memory
management if anyone cares.)



typedef struct
{
   UInt32        key;
   Int16         sc;
   int           depth:8;
   int           bound:3;
   UInt8         keep_forever;
   UInt16        age:5;          
   move_t        bm;

} sub_rec;


typedef struct
{
   sub_rec  hr[2];

} transposition;



void   put_hash( position *p, int sc, int alpha, int beta, int depth, move_t bm, int ply )
{
  int             i;
  transposition  *t;
  unsigned long   cix = (p->akey >> 16) & (HT_CHUNKS-1);
  unsigned long   add = p->akey & (TT_SIZE-1);
  transposition   *ht;

  ht = MemHandleLock( htmh[cix] );
  add |= 1;                       // set bit 0 for black to move positions
  if (p->ctm & 1) add ^= 1;
  t = &(ht[add]);

  for (i=0; i<2; i++)
  {
    if (ply >= 0)
    {
      if ( i==0 && t->hr[i].age == ctx->age && depth < t->hr[i].depth) continue;
      if (t->hr[i].keep_forever)
      {
        MemHandleUnlock( htmh[cix] );
        return;
      }
    }

    if (ply == -1)
      t->hr[i].keep_forever = 1;
    else
      t->hr[i].keep_forever = 0;

    t->hr[i].age = ctx->age;
    t->hr[i].bm = bm;
    t->hr[i].key = p->pkey;
    t->hr[i].sc = sc;
    t->hr[i].depth = depth;

    if (sc >= beta)
    {
      t->hr[i].bound = LB;
      if ( sc >  (WIN - 1000) ) t->hr[i].sc += ply;
      if ( sc < -(WIN - 1000) ) t->hr[i].sc -= ply;
    }
    else if (sc <= alpha)
    {
      t->hr[i].bound = UB;
      if ( sc >  (WIN - 1000) ) t->hr[i].sc += ply;
      if ( sc < -(WIN - 1000) ) t->hr[i].sc -= ply;
    }
    else
    {
      if ( sc >  (WIN - 1000) ) t->hr[i].sc += ply;
      if ( sc < -(WIN - 1000) ) t->hr[i].sc -= ply;
      t->hr[i].bound = EX;
    }

    MemHandleUnlock( htmh[cix] );
    return;
  }

}




int  get_hash( position *p, int *sc, int alpha, int beta, int depth, move_t *bm, int ply )
{
  int             i;
  transposition   *t;
  unsigned long   cix = (p->akey >> 16) & (HT_CHUNKS-1);
  unsigned long   add = p->akey & (TT_SIZE-1);
  int             rval = 0;
  transposition   *ht;

  ht = MemHandleLock( htmh[cix] );

  add |= 1;                    // set bit 0 for black to move positions
  if (p->ctm & 1) add ^= 1;
  t = &(ht[add]);

  for (i=0; i<2; i++)
  {
    if ( t->hr[i].key != p->pkey ) continue;
    *bm = t->hr[i].bm;
    if ( t->hr[i].depth < depth ) continue;

    *sc = t->hr[i].sc;

    if ( *sc >  (WIN - 1000) ) *sc -= ply;      
    if ( *sc < -(WIN - 1000) ) *sc += ply;

    switch ( t->hr[i].bound )
    {
      case LB :
        if (*sc >= beta) rval = 1;
        break;

      case UB :
        if (*sc <= alpha) rval = 1;
        break;

      case EX :
        rval = 1;
        break;
    }

    if (rval)
    {
      MemHandleUnlock( htmh[cix] );
      return(rval);
    }
  }

  MemHandleUnlock( htmh[cix] );
  return(0);
}






- Don





   In my go program (GoFigure), I use a global alpha-beta search with
   transpostion table.  The transposition table is rotationally
   invariant, and maintained across invocations of the program, which
   results in walking of the game tree in ways that seem good at
   demonstrating subtle problems with my code and my understanding of
   alpha-beta search.

   My alpha-beta function (simplified) looks something like this:

   alphabeta(int alpha, int beta, int depth) {
     check_transposition_table();
     get_moves();
     score_type = ALPHA_CUTOFF;
     for (int i = 0; i < children.length; i++) {
       children[i].alphabeta(beta, alpha, depth + 1);
       if (child.score > score) {
	 score = child.score;
	 score_type = EXACT_SCORE;
       }

       if (score > beta) {
	 store_transposition_table();
	 score_type = BETA_CUTOFF;
	 return score;
       }

       if (score > alpha) {
	 assert child.score_type == EXACT_SCORE;
       }
     }

     store_transposition_table();
     return score;


   (It's a bit more complicated, but I think that's the relevant bit.)

   Now, the question is:  is the assert at the bottom of the loop a valid
   assertion to make?  On very rare occasion, it triggers, but I haven't
   been able to track down why or under what conditions exactly.  So does
   that represent a bug buried in my code somewhere, or are there real
   cases where it isn't true in properly coded alpha-beta searches?

   Thanks in advance,

   Evan
   _______________________________________________
   computer-go mailing list
   computer-go@xxxxxxxxxxxxxxxxx
   http://www.computer-go.org/mailman/listinfo/computer-go/

_______________________________________________
computer-go mailing list
computer-go@xxxxxxxxxxxxxxxxx
http://www.computer-go.org/mailman/listinfo/computer-go/

References:
- [computer-go] alpha-beta + graph history question
  - From: Evan Daniel

Prev by Date: Re: [computer-go] alpha-beta + graph history question
Next by Date: [computer-go] Fwd: IEEE CIG 2005 Call for Participation
Previous by thread: Re: [computer-go] alpha-beta + graph history question
Next by thread: RE: [computer-go] alpha-beta + graph history question
Index(es):
- Date
- Thread