ICOADS R HOSTACE, issue #6
Created May 29, 2020 by brivas (@brecinosrivas), Maintainer

subset problem -- for ship loop (related to Issue #5)

Hello, this post is related to issue #5 (closed). It highlights other lines where the subset function is consuming time and resources.

Note, however, that this is only a tiny sample: the RStudio profiler was run on just this section of the code:

for ( ship in names(sp) ) {
   icount<-icount+1
   sub<-sp[[ship]]
   if ( print.time) setTxtProgressBar(pb, icount)
   if(nrow(sub) <= 5 ) next
   sub<-sub[base::order(sub$date),]
   # check if intervals are all the same 
   if(length(table(diff(sub$date))) == 1 ) next
   idck<-paste(names(table(sub$dck)),collapse="/")
   igroup<-names(sort(table(sub$group),decreasing=T)[1])
   #if ( igroup != 0 ) next
   igroup<-paste0("g_",igroup)
   if(print.comm)cat(ship,idck,"\n")
   if ( !(igroup %in% names(sp.fill) ) ) next
   #app.fills<-rbind(app.fills,find_gap_func(sub,sp.fill[[igroup]]))
   tofill<-subset(sp.fill[[igroup]],!(sp.fill[[igroup]]$date %in% sub$date))[,c("date","lat","lon","new.id","uid","date","group","dck")]
   if(nrow(tofill) == 0 ) next
   #app.fills[[icount]]<-find_gap_func(sub,sp.fill[[igroup]])

   app.fills[[icount]]<-find_gap_func(sub,tofill,sub$new.id[1])
    
   #print(app.fills)
  }
  if(print.time) cat("\n")
  app.fills<-do.call("rbind", app.fills)

Tips for reading the profvis output: on the right-hand side of the page linked below there is an Options tab. There, select "Hide lines of code with zero time" and deselect "Split horizontally" for an easier view. You can ignore the tree presentation graph (too complicated) and click on Data for a summary instead.

You can see the results of the profiler in this link: https://rpubs.com/bearecinos/621348

The sample is very small (it is a shame that such output can't be produced when the code is run outside RStudio), but I think it makes it easier to see which lines are taking the longest, and we can probably extrapolate a little to the full data.

This loop runs over ship names for one month and is nested inside another loop over years. For example, for 1970/3 the data frame that the loop iterates over (sp) has 2997 rows.

Because the sample is so small, I still need to complement this with the output of Rprof() run on JASMIN from the command line, which I will add in the comments below.
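For reference, here is a minimal sketch of how Rprof() can be used from a plain R session or Rscript, with no RStudio needed. The workload is a made-up stand-in for the ship loop: `df`, `seen` and `work()` are hypothetical names, not objects from new_merge_ids_year.R.

```r
# Hypothetical workload standing in for the ship loop; the data are made up.
set.seed(1)
df <- data.frame(date = sample(1:1000, 2e5, replace = TRUE),
                 lat  = runif(2e5))
seen <- 1:500

work <- function() {
  # Same pattern as in the loop: keep rows whose date is not already present.
  subset(df, !(df$date %in% seen))
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                       # start the sampling profiler
t0 <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - t0 < 1) {
  invisible(work())                    # repeat for ~1 s so samples are collected
}
Rprof(NULL)                            # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                     # functions ranked by self time
```

The `by.self` table lists the functions in which the sampler found R most often, which gives a similar kind of summary to the Data tab of profvis, but per function rather than per line.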

The summary is:

  • Line #L528 of new_merge_ids_year.R: again, the subset call there is a problem.

  • Line #50 of find_gap_func.R: this is again the subset call already pointed out by @eck in issue #5 (closed).
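One possible alternative to the subset call in the loop is plain logical indexing, which selects rows and columns in a single step. The sketch below uses made-up toy data: `fill` stands in for sp.fill[[igroup]], and the values are invented. It also assumes that "date" only needs to appear once in the column list (the original line names it twice). Whether this is actually faster on the real data would still need to be checked with the profiler.

```r
# Toy stand-ins for sp.fill[[igroup]] and sub; all values are made up.
fill <- data.frame(
  date   = as.Date("1970-03-01") + 0:9,
  lat    = seq(10, 19),
  lon    = seq(-30, -21),
  new.id = "A",
  uid    = sprintf("u%02d", 1:10),
  group  = 1L,
  dck    = 700L,
  stringsAsFactors = FALSE
)
sub <- fill[c(2, 4, 6), ]

# Assumption: "date" listed once here, unlike the original column vector.
cols <- c("date", "lat", "lon", "new.id", "uid", "group", "dck")

# Original style: subset() followed by a separate column selection.
tofill_subset <- subset(fill, !(fill$date %in% sub$date))[, cols]

# Alternative: build the logical index once, then select rows and columns together.
keep <- !(fill$date %in% sub$date)
tofill_index <- fill[keep, cols, drop = FALSE]

# Both approaches should give the same rows and columns.
stopifnot(isTRUE(all.equal(tofill_subset, tofill_index)))
```

If `keep` has no TRUE values, `tofill_index` has zero rows, so the existing `if(nrow(tofill) == 0 ) next` check would still work unchanged.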

Edited May 29, 2020 by brivas