Finding S3 objects associated with the snapshot

Hi,

I am trying to find a way to discover which blobs on S3 are needed to restore a snapshot.

Examples where this would be needed:

  • placing an S3 Object Lock legal hold on those objects
  • extending the Object Lock retention period for those objects
  • requesting objects from Glacier
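
Once a list of blob IDs is available, each of the use cases above maps to one AWS CLI call per object. The following is only a sketch under stated assumptions: the bucket name, the optional key prefix, and the idea that a blob ID is used directly as the S3 key are placeholders to adjust for the actual repository. The function prints the commands instead of running them, so the output can be reviewed first.

```shell
#!/bin/sh
# Sketch: print (not run) one AWS CLI command per blob ID read from stdin.
# BUCKET, PREFIX and MODE are placeholders; blob IDs are assumed to be S3 keys.
BUCKET=${BUCKET:-my-kopia-bucket}   # hypothetical bucket name
PREFIX=${PREFIX:-}                  # optional key prefix of the repository
MODE=${MODE:-GOVERNANCE}            # Object Lock mode used for retention

# emit_s3_commands ACTION [ARG]
#   ACTION: legal-hold | retention | restore
#   ARG:    retain-until timestamp (retention) or number of days (restore)
emit_s3_commands() {
  action=$1
  arg=${2:-}
  while IFS= read -r blob; do
    [ -n "$blob" ] || continue          # ignore empty lines
    key="${PREFIX}${blob}"
    case $action in
      legal-hold)
        printf 'aws s3api put-object-legal-hold --bucket %s --key %s --legal-hold Status=ON\n' \
          "$BUCKET" "$key" ;;
      retention)
        printf "aws s3api put-object-retention --bucket %s --key %s --retention 'Mode=%s,RetainUntilDate=%s'\n" \
          "$BUCKET" "$key" "$MODE" "$arg" ;;
      restore)
        printf "aws s3api restore-object --bucket %s --key %s --restore-request 'Days=%s,GlacierJobParameters={Tier=Standard}'\n" \
          "$BUCKET" "$key" "$arg" ;;
      *)
        echo "unknown action: $action" >&2
        return 1 ;;
    esac
  done
}
```

For example, `emit_s3_commands restore 7 < blobs.txt` (where `blobs.txt` is a hypothetical file with one blob ID per line) prints one `restore-object` command per blob; piping that output to `sh` would execute them.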

I found this post that asked the same question for p* files.

Is traversing all files in the snapshot the only way, or is there a more efficient method using the Kopia API?

If there is a more efficient way, please point me to Kopia code I can use as an example to implement this.

Thank you,
David

Yeah, that’s a much-needed addition to Kopia and pretty essential for coldline/Glacier storage. It’s actually quite easy to implement; it simply hasn’t been done yet. It does require walking the snapshot tree and translating contents into blobs, but all the primitives are in place, so that should be straightforward.

My bandwidth is quite limited these days, but I’ll see if this can be added in the upcoming version.
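
Until that exists in Kopia itself, the walk described above can be approximated from the command line. This is a rough sketch, not Kopia's own implementation: it relies on the same assumptions as the interactive script later in this thread, namely that column 6 of `kopia ls -l` is the object ID, that directory object IDs start with `k`, and that column 10 of `kopia index inspect` is the pack blob ID. Objects whose IDs start with `I` are skipped with a warning because this lookup does not resolve them.

```shell
#!/bin/sh
# Sketch: list the pack blobs referenced by every entry under a snapshot root.
# Column positions are assumptions taken from the interactive script in this thread.
KOPIA=${KOPIA:-kopia}

# walk OBJECT_ID: print the object ID of every entry below a directory object.
walk() {
  $KOPIA ls -l "$1" </dev/null | awk '{print $6}' | while IFS= read -r id; do
    [ -n "$id" ] || continue
    printf '%s\n' "$id"
    case $id in
      k*) walk "$id" ;;      # directory object: recurse into it
    esac
  done
}

# blobs_for ROOT_OBJECT_ID: print the unique blob IDs that hold those objects.
blobs_for() {
  { printf '%s\n' "$1"; walk "$1"; } | while IFS= read -r id; do
    case $id in
      I*) echo "skipping indirect object $id" >&2; continue ;;
    esac
    $KOPIA index inspect --all --content-id="$id" </dev/null |
      awk 'NF >= 10 {print $10}'
  done | sort -u
}
```

Running `blobs_for <root-object-id>` with the root object ID shown by `kopia snap list` prints one blob ID per line. The walk calls `kopia` once per entry, so it will be slow on large snapshots.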

That would be good to have, thank you.

I wrote a simple script to “walk” through snapshots and find the p-blobs for a particular file.

The question is: why does

kopia index inspect --all --content-id=ObjectID

find p-blobs for files smaller than 20 MB, but return nothing for bigger files (whose object IDs start with Ix)?

The script is very simple: choose a snapshot from the list, then “walk” through it to find the p-blob for a file. Press Enter on a directory to go into it, and press Escape to go back to the previous directory.

The script needs only three programs: kopia, gawk, and fzf.

#!/bin/sh

# Check dependencies: kopia, fzf, gawk
KOPIA=$(command -v kopia); [ ! -x "${KOPIA}" ] && { echo "Can't find 'kopia'. Exit..."; exit 1; }
FZF=$(  command -v fzf);   [ ! -x "${FZF}" ]   && { echo "Can't find 'fzf'. Exit...";   exit 1; }
GAWK=$( command -v gawk);  [ ! -x "${GAWK}" ]  && { echo "Can't find 'gawk'. Exit...";  exit 1; }



### Security sensitive values below ##################################
### token obtained on repository creation via:
### kopia repo status -t -s | grep '\$ kopia repository connect from-config' | awk '{print $7}'
reconnect_token=''


[ -z "${reconnect_token}" ] && {
  echo;
  echo "Error#1: no repo token"; echo;
  echo "Run: kopia repo status -t -s | grep '\$ kopia repository connect from-config' | awk '{print \$7}'"
  echo "to get token and fill variable 'reconnect_token' with it";echo
  exit 1
}

KOPIA_CHECK_FOR_UPDATES=false
export KOPIA_CHECK_FOR_UPDATES

$KOPIA repository connect from-config --token "${reconnect_token}"

dl='================================================================================'
sl='--------------------------------------------------------------------------------'

getObj(){
  local list cur_OBJ new_OBJ nPath nName
  cur_OBJ=$1
  nName="$2"

  list="$($KOPIA ls -l "${cur_OBJ}")"

  new_OBJ=$(
    echo "${list}" |
    $FZF --border --info inline --prompt="${nName} > " --header="${dl}
ObjectID: ${cur_OBJ}
${sl}
Search: '-exact, ^-start-with, \$-end-with, |-or, !-negate(plays with ^ and $)
${sl}" |
    $GAWK '{printf "%s",$6}'
  )

  nPath=$(echo "${list}" | $GAWK -v aobj="${new_OBJ}" '$6 == aobj {printf "%s", $7}')
  [ -n "${nPath}" ] && nName="${nName}${nPath}"  # ToDo remove ^.*obj\s+ and leave on right part of $0 to make sure whitespaces included

  [ "${cur_OBJ}" != "${new_OBJ}" ] && printf '%s' "${new_OBJ}∕${nName}"  # return new object; '∕' is not / (slash) but U+2215; printf avoids non-portable 'echo -n'
}


obj_inspect(){
  local bOBJ rc
  bOBJ=$1

  echo "
${dl}
 Walk through directories to pick an object.
 Now looking up the p-blobs of object: ${bOBJ}
 Please wait...
${sl}"

  rc=$($KOPIA index inspect --all --content-id="${bOBJ}")
  p_blob=$(echo "${rc}" | $GAWK '{print $10}')

  echo "${rc}"
  echo "${sl}"
  echo "p-blob: ${p_blob}"
  echo "${dl}"

  printf 'Press Enter to continue...'; read -r ans   # 'read -p' is not available in every /bin/sh

}




while true; do   ############# Main LOOP through snapshots #####################
  fName='/'
  # Obtain snapshot ID/hash
  snapID=$(
    $KOPIA snap list 2>&1 |
    $FZF --tac --header-lines=1 --border --info inline \
        --prompt='Search snapshot > ' --header="${dl}
Search: '-exact, ^-start-with, \$-end-with, |-or, !-negate(plays with ^ and $)
${sl}" |
    $GAWK '{print $4}'
  )

  obj=$snapID
  while true; do ############# LOOP through directories #####################
    if [ -n "${obj}" ]; then
      oType=$(echo "${obj}" | $GAWK '{printf "%s", substr($0,1,1)}')     # first character; awk strings are 1-indexed
      case $oType in
        [Kk]) ;;    # Allow to dive in subdirectories only
           *)
              obj_inspect "${obj}"

              # restore where we stopped
              obj=$( echo "${obj_stack}" | $GAWK '{printf "%s", $NF}');
              fName="${last_fName}"
              continue;;
      esac

      last_fName="${fName}"
      last_OBJ=$( echo "${obj_stack}" | $GAWK '{printf "%s", $NF}')
      [ "${last_OBJ}" != "${obj}" ] && obj_stack="${obj_stack} ${obj}"  # PUSH new on stack
      rc=$(getObj "${obj}" "${fName}")
      obj=$(  echo "${rc}" | $GAWK -F'∕' '{printf "%s", $1}')    # '∕' - is not /(slash) but UTF8 0x2215
      fName=$(echo "${rc}" | $GAWK -F'∕' '{printf "%s", $2}')    # '∕' - is not /(slash) but UTF8 0x2215
    fi

    if [ -z "${obj}" ]; then
      obj_stack=$(echo "${obj_stack}" | $GAWK 'NF{NF--};1')      # POP. remove last element from stack
      obj=$(      echo "${obj_stack}" | $GAWK '{printf "%s", $NF}')
      fName="${fName%/*}"; fName="${fName%/*}/"
      [ -z "${obj}" ] && break
    fi
  done

  if [ "${snapID}" = "${obj}" ]; then
    ans=''; printf 'Quit (y/n) :=> '; read -r ans   # 'read -p' is not available in every /bin/sh
    case $ans in
      [Yy]) exit 0;;
    esac
  fi
done


exit
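
A possible explanation for the empty result on large files, offered as an assumption rather than something confirmed in this thread: `--content-id` expects a content ID, while the script passes an object ID. For small files the two strings happen to coincide. An object ID starting with `I` appears to mark an indirect object, meaning the file was split into several chunks and the ID points to an index content that lists them, so the string `Ix...` itself is never present in the index. The helpers below are hypothetical names; they only strip the leading `I` (indirect) and `Z` (compressed) markers to recover the underlying content ID, under the assumption that this is how Kopia renders object IDs.

```shell
#!/bin/sh
# Sketch: derive a content ID from an object ID.
# Assumption: an object ID is optional "I" markers (indirect), an optional
# "Z" marker (compressed), followed by the content ID.

# content_id_of OBJECT_ID: print the object ID without leading I/Z markers.
content_id_of() {
  oid=$1
  while :; do
    case $oid in
      I*|Z*) oid=${oid#?} ;;   # drop one marker character and check again
      *) break ;;
    esac
  done
  printf '%s\n' "$oid"
}

# is_indirect OBJECT_ID: succeed if the object is split into several chunks.
is_indirect() {
  case $1 in
    I*) return 0 ;;
    *)  return 1 ;;
  esac
}
```

With this, `kopia index inspect --all --content-id="$(content_id_of "$obj")"` should at least locate the blob that holds the chunk list of a large file. The chunks named in that list would each need their own lookup to find every blob the file depends on, which is not shown here.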