At work, we needed to retrieve the full list of jobs a given Jenkins instance was hosting.
Our first solution was to use the jenkinsapi Python package:
import os
import xml.etree.ElementTree as XmlElementTree

from jenkinsapi.jenkins import Jenkins

def get_all_jenkins_jobs(server_url):
    jenkins = Jenkins(server_url, lazy=True, timeout=30,
                      username=os.environ['JENKINS_USERNAME'], password=os.environ['JENKINS_PASSWORD'])
    all_jobs = []
    for job in jenkins.jobs.values():
        config_xml_content = job.get_config()  # performs an HTTP request to retrieve config.xml
        job_data = {'full_name': job.get_full_name(), 'url': job.url}
        job_data.update(extra_job_data_from_xml(config_xml_content))
        all_jobs.append(job_data)
    return all_jobs

def extra_job_data_from_xml(config_xml_content):
    xml_config = XmlElementTree.fromstring(config_xml_content)
    git_repo = xml_config.find('./definition/scm/userRemoteConfigs/hudson.plugins.git.UserRemoteConfig/url')
    if git_repo is None:
        return {}
    return {
        'scm': git_repo.text,
        'scm_http': 'https://' + git_repo.text.replace('git@', '').replace(':', '/').replace('.git', ''),
        'branch': xml_config.find('./definition/scm/branches/hudson.plugins.git.BranchSpec/name').text,
        'jenkinsfile': xml_config.find('./definition/scriptPath').text,
    }
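To make the parsing logic more concrete, here is a quick sanity check of extra_job_data_from_xml() against a minimal, illustrative config.xml fragment. The fragment is only a sketch inferred from the XPath expressions above: real config.xml files contain many more elements, and the repository URL used here is purely hypothetical.

SAMPLE_CONFIG_XML = '''<flow-definition>
  <definition>
    <scm>
      <userRemoteConfigs>
        <hudson.plugins.git.UserRemoteConfig>
          <url>git@github.com:acme/some-repo.git</url>
        </hudson.plugins.git.UserRemoteConfig>
      </userRemoteConfigs>
      <branches>
        <hudson.plugins.git.BranchSpec>
          <name>*/master</name>
        </hudson.plugins.git.BranchSpec>
      </branches>
    </scm>
    <scriptPath>Jenkinsfile</scriptPath>
  </definition>
</flow-definition>'''

print(extra_job_data_from_xml(SAMPLE_CONFIG_XML))
# -> {'scm': 'git@github.com:acme/some-repo.git',
#     'scm_http': 'https://github.com/acme/some-repo',
#     'branch': '*/master', 'jenkinsfile': 'Jenkinsfile'}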
This solution is nice and short, but it has one major drawback: it is very slow, because it performs every HTTP request sequentially, one job after another.
Hence I decided to bypass the jenkinsapi package and make direct HTTP calls to the Jenkins API, this time using requests-futures to perform the requests asynchronously:
import os
from concurrent.futures import as_completed

import requests
from requests.adapters import HTTPAdapter
from requests_futures.sessions import FuturesSession

FOLDER_CLASSES = ('com.cloudbees.hudson.plugins.folder.Folder',
                  'org.jenkinsci.plugins.workflow.multibranch.WorkflowMultiBranchProject')

def get_all_jenkins_jobs_async(server_url):
    with FuturesSession() as session:
        session.auth = (os.environ['JENKINS_USERNAME'], os.environ['JENKINS_PASSWORD'])
        session.mount('https://', HTTPAdapter(max_retries=3))
        all_jobs_paths = []
        folder_paths_to_crawl = ['/']
        i = 0
        while folder_paths_to_crawl:
            print('Async breadth-first pass %s - Processing %s folders' % (i, len(folder_paths_to_crawl)))
            next_folder_paths_to_crawl = []
            # The 'tree' parameter restricts the fields returned by the Jenkins API
            futures = [session.get(server_url + folder_path + '/api/python',
                                   params={'tree': 'jobs[name,color,url]'})
                       for folder_path in folder_paths_to_crawl]
            for future in as_completed(futures):
                resp = future.result()
                # Strip the URL-encoded API suffix to recover the folder path
                path_prefix = resp.request.path_url[:-len('/api/python?tree=jobs%5Bname%2Ccolor%2Curl%5D')]
                for job in resp.json()['jobs']:
                    job_path = path_prefix + '/job/' + job['name']
                    if job['_class'] in FOLDER_CLASSES:
                        next_folder_paths_to_crawl.append(job_path)
                    else:
                        all_jobs_paths.append(job_path)
            folder_paths_to_crawl = next_folder_paths_to_crawl
            i += 1
        print('Now retrieving & parsing all config.xml files (%s)' % len(all_jobs_paths))
        def response_hook(resp, *_, **__):
            'This hook is executed in a dedicated thread'
            job_path = resp.request.path_url[:-len('config.xml')]
            resp.data = {
                'full_name': '/'.join(frag for frag in job_path.split('/') if frag not in ('', 'job')),
                'url': server_url + job_path,
            }
            resp.data.update(extra_job_data_from_xml(resp.text))
        session.hooks['response'] = response_hook
        all_jobs = []
        futures = [session.get(server_url + job_path + '/config.xml') for job_path in all_jobs_paths]
        for future in as_completed(futures):
            all_jobs.append(future.result().data)
        return all_jobs
The resulting code is definitely more verbose, but at least 4 times faster in my tests!
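For reference, here is a minimal sketch of how such a comparison can be made, assuming https://jenkins.example.com is a hypothetical Jenkins URL and that the JENKINS_USERNAME / JENKINS_PASSWORD environment variables are set:

import time

SERVER_URL = 'https://jenkins.example.com'  # hypothetical Jenkins instance

start = time.perf_counter()
jobs = get_all_jenkins_jobs(SERVER_URL)
print('Sequential: %s jobs in %.1fs' % (len(jobs), time.perf_counter() - start))

start = time.perf_counter()
jobs = get_all_jenkins_jobs_async(SERVER_URL)
print('Async:      %s jobs in %.1fs' % (len(jobs), time.perf_counter() - start))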
Now, a word on the algorithmic approach taken here: we need to crawl a tree starting from its root. For every node, if it is a leaf (an actual WorkflowJob) we collect it; otherwise we need to keep traversing its children.
My initial idea was to launch new concurrent Futures for every child from inside its parent Future's callback. However, this adds a lot of programmatic complexity ("cannot schedule new futures after shutdown" errors, the extra difficulty of waiting for the very last Future to complete...).
In the end, I realized that because our Jenkins job tree is very shallow, a breadth-first tree traversal was a very good fit, with a known number of Future requests being triggered for each depth level of the tree. This is the solution implemented above.
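For the sake of illustration, here is a stripped-down sketch of that breadth-first pattern, detached from the Jenkins specifics. list_children() is a hypothetical placeholder: given the session and a folder path, it returns a Future whose result is a (subfolders, jobs) pair (for instance built with a response hook, as above).

from concurrent.futures import as_completed

def crawl_breadth_first(session, list_children, root='/'):
    all_leaves = []
    current_level = [root]
    while current_level:
        # One batch of concurrent requests per depth level of the tree
        futures = [list_children(session, path) for path in current_level]
        current_level = []
        for future in as_completed(futures):
            subfolders, jobs = future.result()
            current_level.extend(subfolders)  # folders to crawl during the next pass
            all_leaves.extend(jobs)           # leaves are collected right away
    return all_leaves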
I'd be happy to hear whether you have already used requests-futures in this kind of tree-crawling scenario (and whether you compared it with other solutions), and to answer any other feedback or questions you may have.
PS: Many thanks to Vincent Lae for the initial idea! 😉