Improve the general performance of file processing

Registered by cmm2 on 2015-03-01

It came to our attention in bug #1404588 that the time Pantheon Files takes to perform move/copy/delete operations could be reduced substantially by an optimization pass.

An earlier examination of the libcore code suggested that the slow performance is likely caused by:

   * an excessive number of object copies and allocations
   * poor choice of containers and algorithms (e.g., using a hash table for key iteration, when an array or unordered set would be more appropriate)
   * loose usage of the GLib/GIO libraries (e.g., allocating hundreds of thousands of small objects, wreaking havoc on the memory allocator)
   * maintainability issues probably resulting from close coupling with the UI layer

A callgrind log of r1754 doing a copy of 250K files can be downloaded here:
        http://paste.ubuntu.com/10480707/plain/
Or annotated:
        http://paste.ubuntu.com/10480709/

I suggest a larger-scale plan to solve these and other issues by:

1. Decoupling the undo/redo system from the file layer

      Undo should be a simple response to the prior user action. It should not need to keep a record of every file-level detail (e.g., which files were copied, where they were copied to, and what their permissions were).

      Instead, it should issue the opposite of the action the user originally commanded (e.g., delete Folder A and its contents, *not* delete Folder A and its 1,000,000 individually recorded contents).

      Windows Explorer and Nautilus work this way, and I consider it only reasonable given the performance concerns.
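
To make the idea in item 1 concrete, here is a minimal, language-neutral sketch in Python. It is not the Pantheon Files code: `UndoStack`, `UndoEntry`, and `copy_folder` are hypothetical names invented for illustration. The point it shows is that a folder copy pushes exactly one undo entry (remove the destination folder), regardless of how many files the folder contains.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class UndoEntry:
    """One user-level action and the single command that reverses it."""
    description: str
    undo: Callable[[], None]


class UndoStack:
    """Hypothetical undo stack: one entry per user action, not per file."""

    def __init__(self) -> None:
        self._entries: List[UndoEntry] = []

    def push(self, entry: UndoEntry) -> None:
        self._entries.append(entry)

    def undo_last(self) -> str:
        entry = self._entries.pop()
        entry.undo()
        return entry.description

    def __len__(self) -> int:
        return len(self._entries)


def copy_folder(src: Path, dest: Path, undo_stack: UndoStack) -> None:
    """Copy a folder and record only the top-level inverse: remove dest."""
    shutil.copytree(src, dest)
    undo_stack.push(UndoEntry(
        description=f"copy {src.name}",
        undo=lambda: shutil.rmtree(dest),
    ))


def demo() -> dict:
    """Copy a small tree, undo the copy, and report what was observed."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "Folder A"
        (src / "sub").mkdir(parents=True)
        for i in range(5):
            (src / "sub" / f"file{i}.txt").write_text("x")

        stack = UndoStack()
        dest = root / "Folder A (copy)"
        copy_folder(src, dest, stack)
        result = {
            "entries_after_copy": len(stack),
            "dest_exists_after_copy": dest.is_dir(),
        }
        stack.undo_last()
        result["dest_exists_after_undo"] = dest.exists()
        result["src_still_exists"] = src.is_dir()
        return result


if __name__ == "__main__":
    print(demo())
```

The undo stack grows with the number of user actions rather than with the number of files touched, which is the property the proposal is after.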

2. Dropping the bulk return of affected files from recursive file operations

      Currently, up to millions of GFile objects are created and returned when copying a large directory. This is wasteful, and there are alternatives, such as:

   * using signals or callbacks to report single files back to the UI thread asynchronously
   * using inotify to monitor changes to the destination directory
   * (later edit: or removing this behavior entirely, as code analysis suggests the returned data isn't even used)

      In general, the second approach (inotify/fanotify/GFileMonitor) should be preferred, because some move/create/rename file operations may fail in unreported ways.
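
As an illustration of the first alternative (per-file callbacks), here is a minimal sketch in Python rather than the project's C. `copy_tree` and `on_file` are hypothetical names, not existing libcore functions. The copy routine reports each file as it is written and returns only a count, so nothing proportional to the size of the tree is held in memory. The sketch is synchronous; a real implementation would presumably hand each callback to the UI main loop (in GLib, something like `g_idle_add` could serve), which is an assumption and not something the current code is known to do.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def copy_tree(src: Path, dest: Path,
              on_file: Optional[Callable[[Path], None]] = None) -> int:
    """Copy src into dest, reporting each copied file through on_file.

    Returns only the number of files copied, never a list of them.
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        # Mirror the directory structure under dest (empty directories too).
        target_dir = dest / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            target = target_dir / name
            shutil.copy2(Path(dirpath) / name, target)
            copied += 1
            if on_file is not None:
                on_file(target)  # one notification per file
    return copied


def demo() -> Tuple[int, List[str]]:
    """Copy a three-file tree and collect the names reported by the callback."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "src"
        (src / "nested" / "deeper").mkdir(parents=True)
        (src / "a.txt").write_text("a")
        (src / "nested" / "b.txt").write_text("b")
        (src / "nested" / "deeper" / "c.txt").write_text("c")

        seen: List[str] = []  # collected here only so the demo can be checked
        count = copy_tree(src, root / "dest", on_file=lambda p: seen.append(p.name))
        return count, sorted(seen)


if __name__ == "__main__":
    print(demo())
```

A caller that does not care about individual files can simply omit `on_file`, which matches the later observation that the returned data may not be used at all.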

3. Implementing the undo/redo code in Vala

      Following item #1, the logical continuation would be to implement undo in Vala, as this is primarily a UI component.

4. Refactoring or reimplementing the file handling code

      Many improvements can be made in this area, beginning with reducing the code surface, using newer GFile APIs, providing async methods, introducing unit tests, and improving general performance.

      A longer-term goal could be to write this subsystem in Vala as well. However, initial work on simplifying the current C code would make a later Vala port easier.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: cmm2
Direction: Needs approval
Assignee: None
Definition: Approved
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

There is a lot of work here. It needs to be broken into work items prioritised according to their cost (ease of implementation) and benefit (performance gain).
