Debugging sopi


sopi works great except when it doesn't, in which case it just vomits an error message like this:

laussy@covid:~/Dropbox/Fabrice/Pictures/2023$ ./sopi .
Hi there! Sopi working with 6364 files in /home/laussy/Dropbox/Fabrice/Pictures/2023
Started 2024-01-05T01:26:57.484
ERROR: LoadError: BoundsError: attempt to access 0-element Array{Int64,1} at index [1]
Stacktrace:
 [1] getindex(::Array{Int64,1}, ::Int64) at ./array.jl:809
 [2] datemy(::String) at /home/laussy/Dropbox/Fabrice/Pictures/2023/mysopi:46
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect_to!(::Array{Array{SubString{String},1},1}, ::Base.Generator{Array{String,1},typeof(datemy)}, ::Int64, ::Int64) at ./array.jl:732
 [5] collect_to_with_first!(::Array{Array{SubString{String},1},1}, ::Array{SubString{String},1}, ::Base.Generator{Array{String,1},typeof(datemy)}, ::Int64) at ./array.jl:710
 [6] collect(::Base.Generator{Array{String,1},typeof(datemy)}) at ./array.jl:691
 [7] top-level scope at /home/laussy/Dropbox/Fabrice/Pictures/2023/mysopi:51
 [8] include(::Function, ::Module, ::String) at ./Base.jl:380
 [9] include(::Module, ::String) at ./Base.jl:368
 [10] exec_options(::Base.JLOptions) at ./client.jl:296
 [11] _start() at ./client.jl:506
in expression starting at /home/laussy/Dropbox/Fabrice/Pictures/2023/mysopi:51

Something is raising an exception: some call returns an unexpectedly empty list... I don't know what it is, and of course I have forgotten everything about the code. So I have to debug it as if it had been written by someone else!

The code works by extracting information with exiftool. Some file is not returning valid output.

The first thing to do is to reduce the bug to its offending material, presumably a particular file. I am dealing with thousands of files, which takes several minutes, so the first step is to find, by trial and error, a subset that breaks the code in a few seconds. I do this by running on subsets until one breaks the code. I arrive at this one:

Screenshot 20240105 122429.png
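
The hunt by subsets could also be scripted. Below is a minimal sketch in Julia, assuming a single offending file. The names find_culprit and fails are hypothetical and not part of sopi; in practice, fails would have to run sopi on the subset and report whether it crashed, whereas here it is faked with a known culprit:

```julia
# Hypothetical helper: narrows a failing list down to one offending item.
# `fails(subset)` must return true when processing `subset` triggers the bug.
# Assumes a non-empty list with exactly one culprit, so that whenever the
# first half does not fail, the culprit must be in the second half.
function find_culprit(items::Vector{String}, fails::Function)
    candidates = items
    while length(candidates) > 1
        mid = length(candidates) ÷ 2
        first_half = candidates[1:mid]
        # Keep whichever half still reproduces the failure
        candidates = fails(first_half) ? first_half : candidates[mid+1:end]
    end
    return candidates[1]
end

# Toy usage: the "bug" is faked by checking for a known file name
files = ["a.jpg", "b.jpg", "broken.jpg", "c.jpg", "d.jpg"]
fails(subset) = "broken.jpg" in subset
culprit = find_culprit(files, fails)   # "broken.jpg"
```

Each pass halves the list, so even thousands of files need only about a dozen runs, each one shorter than the last.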

Even without digging into the code, it is clear that one file in that subset differs from the others:

Screenshot 20240105 122926.png

Sure enough, 2023-06-18 12.12.04.jpg is not a photograph but an image downloaded by Elena, which contaminated the collection, although it got there with a timestamped filename:

2023-06-18 12.12.04.jpg

The problem is that such images do not have the "Date/Time Original" tag:

laussy@covid:~/Dropbox/Fabrice/Pictures/2023/subset/subset/subset/subset4/subset$ exiftool working.jpg | grep "Date/Time Original"
Date/Time Original              : 2023:06:17 15:40:57
Date/Time Original              : 2023:06:17 15:40:57.200+01:00
laussy@covid:~/Dropbox/Fabrice/Pictures/2023/subset/subset/subset/subset4/subset$ exiftool broken.jpg | grep "Date/Time Original"
laussy@covid:~/Dropbox/Fabrice/Pictures/2023/subset/subset/subset/subset4/subset$ 

So no information is extracted, which yields a 0-element array, and accessing its first element crashes the code. The bug is found.
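
The failure can be reproduced without exiftool, as a small sketch. The "Date/Time Original" line is copied from the output above; the other sample lines and the name taglines are made up for illustration, and the filtering only mimics what sopi does:

```julia
# Stand-ins for exiftool output: only the "Date/Time Original" line is real,
# the other lines are invented for the example.
working = "File Name                       : working.jpg\nDate/Time Original              : 2023:06:17 15:40:57"
broken  = "File Name                       : broken.jpg\nImage Width                     : 800"

# Indices of the lines that carry the tag (hypothetical helper name)
taglines(text) = findall(x -> occursin("Date/Time Original", x), split(text, "\n"))

hits_working = taglines(working)   # one index
hits_broken  = taglines(broken)    # empty Int vector

# Taking the first element of an empty vector is what raises the BoundsError
crashed = try
    hits_broken[1]
    false
catch err
    err isa BoundsError
end
```

Here crashed ends up true, which is the same BoundsError on a 0-element Array{Int64,1} as in the stack trace.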

To solve it, I tag the offending case with a ["0","0"] entry instead of date and time strings, and then skip over this case. This makes v°0.3:

#!/bin/sh
#  ____              _ 
# / ___|  ___  _ __ (_)
# \___ \ / _ \| '_ \| |
#  ___) | (_) | |_) | |
# |____/ \___/| .__/|_|
# v°0.3       |_|      
# Fri  5 Jan 2024
# F.P. Laussy - fabrice.laussy@gmail.com
# 
# Sopi sorts jpg files in a directory tree named after the dates
# at which the pictures have been taken (after their exif data)
# (`sopi' stands for Sort Pictures)
#
#=
exec julia -O3 "$0" -- "$@"
=#

using Dates;

# Goes to the given directory if one is given as argument
# If no argument is given, exit (for safety)
if length(ARGS)!=0
    cd(ARGS[1])
else
    println("The working directory must be given as argument")
    println("To process files in the current directory, use:")
    println(" sopi .")
    exit()
end
       
# This lists the filenames of JPG files to process (in current path)
# (matches on the .jpg extension, in any case, rather than "jpg" anywhere in the name)
lfn=filter(x->endswith(lowercase(x),".jpg"), readdir());

# Starting
println("Hi there! Sopi working with "*string(length(lfn))*" files in "*pwd())

print("Started ");
print(now());
print("\n")

# This returns a vector with date and time from the exif data
function datemy(fn)
    mdata = read(`exiftool $fn`,String)
    sdata=filter(x -> occursin("Date/Time Original", x), split(mdata,"\n"))
    if isempty(sdata)
        # if there is no "Date/Time Original" in exif, we tag for exclusion
        ["0","0"]
    else
        # otherwise we extract date and time
        [split(sdata[1]," ")[end-1],split(sdata[1]," ")[end]]
    end
end

# This collects all the dates and times to process and transforms them into a matrix
alltimes=permutedims(reduce(hcat,[datemy(i) for i in lfn]))

# Initialize the files to exclude to none
excludefile = Int[];

# Find the entries tagged for exclusion from exif (date column is "0")
for i=1:length(lfn)
    if alltimes[i,1] == "0"
        push!(excludefile, i)
    end
end

# Keep unique days
uniquetimes=(x->replace(x, ":"=>"/")).(unique(alltimes[:,1]))

# This creates the directory tree
lmonths=["/01/" "/01-January/"; "/02/" "/02-February/"; "/03/" "/03-March/"; "/04/" "/04-April/"; "/05/" "/05-May/"; "/06/" "/06-June/"; "/07/" "/07-July/"; "/08/" "/08-August/"; "/09/" "/09-September/"; "/10/" "/10-October/"; "/11/" "/11-November/"; "/12/" "/12-December/"]

for i=1:12
 global uniquetimes=(x->replace(x,lmonths[i,1]=>lmonths[i,2])).(uniquetimes[:,1])
end

# Remove the case of exif exclusion (would create a "0" year)
filter!(e->e!="0",uniquetimes)

print("Working with "*string(length(uniquetimes))*" directories...\n")
mkpath.(uniquetimes)

print("Moving files!\n")
# This puts the files in place
for i=1:length(lfn)
    # skip file if no exif
    (i in excludefile) && continue
    dest=split(alltimes[i,1],":")
    # to rename after date
#    mv(lfn[i],dest[1]*lmonths[:,2][parse(Int64,dest[2])]*dest[3]*"/"*replace(replace(lfn[i],".JPG"=>"-"),".jpg"=>"-")*alltimes[i,2]*".jpg")
    # to NOT rename after date
    mv(lfn[i],dest[1]*lmonths[:,2][parse(Int64,dest[2])]*dest[3]*"/"*lfn[i])
end

print("Finished ")
print(now())
print("\n")
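
As a side note, the directory name that sopi builds through string replacements can be cross-checked with the Dates standard library. This is only a sketch: destdir is a hypothetical helper that is not part of sopi, and it assumes dates of the form "2023:06:17", English month names (the Dates default) and the "0" tag used above for files without exif date:

```julia
using Dates

# Hypothetical helper, not part of sopi: builds the destination directory
# from an exif date string such as "2023:06:17".
# Returns `nothing` for the "0" tag that marks files to skip.
function destdir(exifdate::AbstractString)
    exifdate == "0" && return nothing
    d = Date(exifdate, dateformat"yyyy:mm:dd")
    # Two-digit month followed by its name, e.g. "06-June"
    monthdir = lpad(Dates.month(d), 2, '0') * "-" * Dates.monthname(d)
    return joinpath(string(Dates.year(d)), monthdir, lpad(Dates.day(d), 2, '0'))
end

target  = destdir("2023:06:17")   # "2023/06-June/17" on Linux
skipped = destdir("0")            # nothing
```

Parsing the date also rejects malformed values by throwing, which may or may not be what one wants for a whole batch of pictures.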