Investigate and improve scalability of RhodeCode http access

Registered by Данило Шеган

For the purposes of a scalable HTTP git backend, we need to investigate how memory-hungry and CPU-intensive the current HTTP git access in RhodeCode is. If needed, we may have to modify some of the serving methods to serve only bare files (à la non-smart HTTP access).

Blueprint information

Status:
Complete
Approver:
Данило Шеган
Priority:
Essential
Drafter:
Milo Casagrande
Direction:
Needs approval
Assignee:
Georgy Redkozubov
Definition:
Approved
Series goal:
Accepted for trunk
Implementation:
Implemented
Milestone target:
2013.03
Started by
Milo Casagrande
Completed by
Georgy Redkozubov

Related branches

Sprints

Whiteboard

Meta:
Headline: Improve scalability of RhodeCode http access.
Acceptance: git clones over HTTP from the staging git server can scale to tens of concurrent processes without overloading the machine's memory or CPU.
Roadmap id: CARD-148
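
The acceptance criterion above could be exercised with a small load script. What follows is a minimal sketch under stated assumptions, not the CI job that was actually used: the helper names `build_clone_commands` and `run_concurrently` are hypothetical, and the repository URL is supplied by the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_clone_commands(repo_url, count, workdir):
    """Return `count` git clone commands, each cloning into its own directory."""
    return [
        ["git", "clone", "--quiet", repo_url, f"{workdir}/clone-{i}"]
        for i in range(count)
    ]


def run_concurrently(commands, max_workers=10):
    """Run every command in parallel and return the exit codes in input order."""
    def run(cmd):
        # Output is discarded; only the exit status matters for this check.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Calling `run_concurrently(build_clone_commands(url, 20, tmpdir), max_workers=20)` would start twenty clones at once; a non-zero entry in the returned list marks a clone that failed. Server-side memory and CPU would still have to be observed separately while the clones run.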

[danilo, 2013-01-16] Note, this is lower priority for 2013.01.
[milo, 2013-02-01] Blocked since we are waiting to complete previous BP.
[milo, 2013-02-05] Work started: the previous BP is almost complete, and the few things left do not block work here.
[milo, 2013-02-05] Played around with the website while monitoring it: some thread errors appeared in the logs (a known Python problem, also fixed for 2.7 but not in Ubuntu), and I also got an OSError [Errno 24] (too many open files) while browsing the changelog of a git repository.
[milo, 2013-02-06] Link to the Python issue: http://bugs.python.org/issue14308
[milo, 2013-02-06] Running local tests with another backend for Beaker sessions, which looks like the cause of the too many open files.
[milo, 2013-02-06] Wrote email to Philip, and spoke with him on IRC: Beaker session handling is now done via PostgreSQL.
[milo, 2013-02-06] CI job already configured: https://ci.linaro.org/jenkins/job/milo-staging-git-test/
[milo, 2013-02-06] Problem cloning from the staging instance, due to how Apache has been configured for HTTPS: right now all connections are redirected.
[milo, 2013-02-07] Discussed with gesha: the clone problem is also due to the fact that we need a 'refs' file in the repositories in order to serve files via HTTP.
[milo, 2013-02-07] Pushed 'update-server-info' script to the rhodecode-config branch, under the 'scripts/' directory.
[milo, 2013-02-07] Created new CI job, to test external fragments retrieval via staging instance:
https://ci.linaro.org/jenkins/job/milo-staging-git-external-fragments-test
[milo, 2013-02-07] linaro-ci branch with updated references: lp:~milo/linaro-ci/milo-git-staging-tests
[milo, 2013-02-08] Reported bug against Python 2.7: bug 1119195
[milo, 2013-02-08] Still seeing OSError 24 when performing web-based actions: browsing changelogs, branches, and files.
[milo, 2013-02-08] Spoke with Ubuntu devs: the Python bug has been targeted for 12.04.3, due in August this year; 12.04.2 is due next week.
[milo, 2013-02-08] Sent email to Philip about too many open files on the system.
[milo, 2013-02-08] Added simple RabbitMQ configuration file in rhodecode-config repository, to set a lower value of memory used.
[gesha, 2013-02-11] Another job pulling from the staging instance: https://ci.linaro.org/jenkins/job/gesha-rhodecode-public-test_beagle-omap2plus/
[milo, 2013-02-11] Added HACKING file to rhodecode-config branch with memory readings from staging server: http://git.linaro.org/gitweb?p=infrastructure/rhodecode-config.git;a=blob_plain;f=HACKING
[milo, 2013-02-12] Removed WI, not strictly needed, gesha talked with RhodeCode devs.
[milo, 2013-02-14] Re-packaged Python to tentatively fix OSError 24. Spoke with the RhodeCode developer, who is also taking a look at the problem.
[milo, 2013-02-18] Trying to reproduce the problem locally: need at least a VM with 3GB of RAM, still not possible to fully reproduce.
[milo, 2013-02-18] Re-packaged Python with another possible solution for OSError 24, upstream bug is: http://bugs.python.org/issue16327
[milo, 2013-02-18] Created single CI job for triggering multiple git cloning jobs: https://ci.linaro.org/jenkins/job/infra-staging-git-tests/
[milo, 2013-02-20] Tried to get debug information via Heapy/Dozer: Heapy works only with Python 2.6, and Dozer has not produced any output so far.
[milo, 2013-02-21] Spoke with RhodeCode developer, he looked into the problem too, this is his answer: "so for me it looks like there's no leak, dulwich opens a lot of files to read content, but going to next page, or somewhere else recycles thoses open files infact if bigger repo than bigger amount of open files, i think you should just increase ulimit"
[milo, 2013-02-22] Tried to run RhodeCode without the reverse-proxy: it needs to be run as a privileged user, and there is no SSL support in waitress. It might be possible to use paste for SSL support.
[milo, 2013-02-22] Error when cloning via HTTPS from staging, same error on local instance: error: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none while accessing
[milo, 2013-02-22] Using the default suggested reverse-proxy approach to serve git operations via HTTP results in git processes running on the server, consuming memory and taking a long time. This can result in connection errors. Serving via Apache seems to be better.
[milo, 2013-02-25] Connection errors on clone operations look like they were due to wrong EC2 node usage: jobs were running on the master.
[milo, 2013-02-25] Updated HACKING file in rhodecode-config repository with more memory readings from the staging server.
[milo, 2013-02-25] Still some errors in the log, but they are hard to reproduce (in particular Python error -3 with the dulwich library).
[milo, 2013-02-28] Found a new error in the logs: https://pastebin.linaro.org/1884/ It looks like it is triggered while browsing this: staging.git.linaro.org/boot/u-boot-linaro-next.git/summary
[milo, 2013-02-28] Memory usage of git processes while cloning big repositories spikes over 1GB; average memory usage for a medium-sized repository settles around 900MB. Investigating means to reduce this.
[gesha, 2013-03-19] It is now possible to clone/pull repos using the dumb HTTP protocol. Some confusing error messages still appear, but they do not affect the result.
[gesha, 2013-04-02] Filed a bug about a UI option to control dumb HTTP support from the settings menu: https://bugs.launchpad.net/linaro-infrastructure-misc/+bug/1163317
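
The "too many open files" (Errno 24) entries above end with the RhodeCode developer's suggestion to increase the ulimit. As a hedged illustration only, the sketch below inspects and raises the soft open-file limit from inside a Python process using the standard `resource` module. The function names are hypothetical, `open_fd_count` relies on a Linux `/proc` filesystem, and nothing here is taken from the staging server's actual configuration.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE towards `target`, capped by the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft, hard  # already high enough; never lower the limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

An unprivileged process can only raise its soft limit up to the hard limit; raising the hard limit itself has to happen outside the process, for example in the service's startup environment.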


Work Items

Work items:
[milo] Set up a CI job using git staging instance for cloning code: DONE
[milo] Write script to run git command to create refs file, to make clone work via HTTP: DONE
[milo] Clone repositories from git staging instance using local, slower, connection: DONE
[milo] Monitor git staging instance memory consumption: DONE
[milo] Investigate and implement ways to reduce memory consumption during pull operations (clone/pull): DONE
[gesha] Implement git dumb HTTP support in RhodeCode: DONE
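
For the 'refs' file work item above: `git update-server-info` is the git command that writes `info/refs` (and `objects/info/packs`), which dumb HTTP clients fetch first. The sketch below shows one way such a script might walk a directory tree and refresh every repository. It is an assumption-laden illustration, not the script stored under 'scripts/' in rhodecode-config; the function names and the repository detection heuristic are hypothetical.

```python
import os
import subprocess


def find_bare_repos(root):
    """Yield directories under `root` that look like bare git repositories."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "HEAD" in filenames and "objects" in dirnames and "refs" in dirnames:
            dirnames[:] = []  # do not descend into the repository itself
            yield dirpath


def update_server_info(repo_path):
    """Regenerate info/refs and objects/info/packs for one repository."""
    subprocess.run(
        ["git", f"--git-dir={repo_path}", "update-server-info"], check=True
    )


def update_all(root):
    """Refresh every repository found under `root`; return the paths handled."""
    repos = list(find_bare_repos(root))
    for repo in repos:
        update_server_info(repo)
    return repos
```

The generated files go stale after each push, so a one-off run is not enough on its own. Git's sample `post-update` hook runs `git update-server-info` for this reason.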

