subset problem -- for ship loop (related to Issue #5)
Hello, this post is related to issue #5 (closed). It highlights other lines where the subset function is consuming time and resources.
This is, however, only a tiny sample, run with the RStudio profiler on just this section of the code:
    for ( ship in names(sp) ) {
        icount<-icount+1
        sub<-sp[[ship]]
        if ( print.time) setTxtProgressBar(pb, icount)
        if(nrow(sub) <= 5 ) next
        sub<-sub[base::order(sub$date),]
        # check if intervals are all the same
        if(length(table(diff(sub$date))) == 1 ) next
        idck<-paste(names(table(sub$dck)),collapse="/")
        igroup<-names(sort(table(sub$group),decreasing=T)[1])
        #if ( igroup != 0 ) next
        igroup<-paste0("g_",igroup)
        if(print.comm)cat(ship,idck,"\n")
        if ( !(igroup %in% names(sp.fill) ) ) next
        #app.fills<-rbind(app.fills,find_gap_func(sub,sp.fill[[igroup]]))
        tofill<-subset(sp.fill[[igroup]],!(sp.fill[[igroup]]$date %in% sub$date))[,c("date","lat","lon","new.id","uid","date","group","dck")]
        if(nrow(tofill) == 0 ) next
        #app.fills[[icount]]<-find_gap_func(sub,sp.fill[[igroup]])
        app.fills[[icount]]<-find_gap_func(sub,tofill,sub$new.id[1])
        #print(app.fills)
    }
    if(print.time) cat("\n")
    app.fills<-do.call("rbind", app.fills)
Tips for reading the profvis output: on the right-hand side of the page linked below there is an Options tab. Click "Hide lines of code with zero time" and deselect "Split horizontally" for an easier view. You can ignore the tree presentation graph (it is too complicated), and you can also click on Data for a summary.
You can see the results of the profiler in this link: https://rpubs.com/bearecinos/621348
The sample is very small (it is a shame that such output can't be produced when running outside RStudio), but I think it makes it easier to see which lines are taking the longest, and we can probably extrapolate slightly to the full data set.
This loop runs through ship names over one month and is nested within another loop through years. For example, for 1970/3 the data frame it loops over (sp) has 2997 rows.
But since the sample is very small, I still need to complement this with the output from Rprof() run on JASMIN from the command line, which I will add in the comments below.
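For reference, a command-line Rprof() run might look roughly like the sketch below. This is not the actual JASMIN run: `slow_section()` is a made-up stand-in for the ship loop, and `prof_file` is an assumed output location. It only uses base R (`Rprof()` and `summaryRprof()` from utils).

```r
# Hypothetical stand-in for the ship loop; replace with the real code.
slow_section <- function(n = 2e5, reps = 40) {
  df <- data.frame(date = seq_len(n), lat = runif(n), lon = runif(n))
  kept <- integer(reps)
  for (i in seq_len(reps)) {
    drop <- sample(n, n / 2)                      # dates to exclude
    kept[i] <- nrow(subset(df, !(df$date %in% drop)))
  }
  kept
}

prof_file <- tempfile(fileext = ".out")   # assumed output location
Rprof(prof_file, line.profiling = TRUE)   # start the sampling profiler
res <- slow_section()
Rprof(NULL)                               # stop profiling

# Summarise time per function. Per-line totals need lines = "show" and
# code that was sourced with options(keep.source = TRUE).
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

The `by.self` table lists the functions where time is actually spent, so a costly subset() should show up there through `subset.data.frame` or `[.data.frame`.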
The summary is:
- Line #L528 of new_merge_ids_year.R: again, the subset there is a problem.
- Line #50 of find_gap_func.R: this is again the subset already pointed out by @eck in issue #5 (closed).
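One possible way to lighten the `tofill` line is to drop subset() and index rows and columns in a single `[` call, so that only the needed columns are copied. The sketch below is a suggestion, not a tested fix: `fill` and `sub` are made-up stand-ins for `sp.fill[[igroup]]` and `sp[[ship]]`, and the column list names "date" once (the original line lists it twice, which I assume is unintended). Whether it actually helps would need to be confirmed with the profiler.

```r
# Made-up stand-ins for sp.fill[[igroup]] and sp[[ship]] (assumptions).
fill <- data.frame(
  date   = as.Date("1970-03-01") + 0:9,
  lat    = seq(10, 19),
  lon    = seq(-30, -21),
  new.id = "A",
  uid    = sprintf("u%02d", 1:10),
  group  = 1L,
  dck    = 700L,
  extra  = 0          # a column that tofill does not need
)
sub <- fill[c(2, 5, 7), ]

cols <- c("date", "lat", "lon", "new.id", "uid", "group", "dck")

# Original pattern: subset() over all columns, then a second column selection.
tofill_old <- subset(fill, !(fill$date %in% sub$date))[, cols]

# Sketch: build the logical mask once, then index rows and columns together.
keep <- !(fill$date %in% sub$date)
tofill_new <- fill[keep, cols]

# Check that both approaches select the same rows and columns.
print(all.equal(tofill_old, tofill_new))
```

Unlike subset(), plain logical indexing does not drop rows where the mask is NA, so this assumes the date column has no missing values.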