<h1>Sahara Specs</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs">https://specs.openstack.org/openstack/sahara-specs</a> (2023, OpenStack Sahara Team)</p>
<h1>Improved secret storage utilizing castellan</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/improved-secret-storage.html">https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/improved-secret-storage.html</a></p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/improved-secret-storage">https://blueprints.launchpad.net/sahara/+spec/improved-secret-storage</a></p>
<p>There are several secrets (for example, passwords) that sahara uses with
respect to deployed frameworks and that are currently stored in its database.
This blueprint proposes using the castellan package's key manager
interface to offload secret storage to the OpenStack Key management
service.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are several situations under which sahara stores usernames and
passwords to its database. The storage of these credentials represents
a security risk for any installation that exposes routes to the
controller’s database. To reduce the risk of a breach in user
credentials, sahara should move towards using an external key manager
to control the storage of user passwords.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This specification proposes the integration of the castellan package into
sahara. Castellan is a package that provides a single point of entry to
the OpenStack Key management service. It also provides a pluggable key
manager interface allowing for differing implementations of that service,
including implementations that use hardware security modules (HSMs) and
devices which support the key management interoperability protocol (KMIP).</p>
<p>Using the pluggable interface, a sahara-specific key manager will be
implemented that will continue to allow storage of secrets in the
database. This plugin will be the default key manager to maintain
backward compatibility; furthermore, it will not require any database
modification or migration.</p>
<p>For users wishing to take advantage of an external key manager,
documentation will be provided on how to enable the barbican key
manager plugin for castellan. Enabling the barbican plugin requires
a few modifications to the sahara configuration file. In this manner,
users will be able to customize the usage of the external key manager
to their deployments.</p>
<p>Example default configuration:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="p">[</span><span class="n">key_manager</span><span class="p">]</span>
<span class="n">api_class</span> <span class="o">=</span> <span class="n">sahara</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">key_manager</span><span class="o">.</span><span class="n">sahara_key_manager</span><span class="o">.</span><span class="n">SaharaKeyManager</span>
</pre></div>
</div>
<p>Example barbican configuration:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="p">[</span><span class="n">key_manager</span><span class="p">]</span>
<span class="n">api_class</span> <span class="o">=</span> <span class="n">castellan</span><span class="o">.</span><span class="n">key_manager</span><span class="o">.</span><span class="n">barbican_key_manager</span><span class="o">.</span><span class="n">BarbicanKeyManager</span>
</pre></div>
</div>
<p>To accommodate the specific needs of sahara, a new class will be created
for interacting with castellan: <code class="docutils literal notranslate"><span class="pre">SaharaKeyManager</span></code>. This class will
be based on the abstract base class <code class="docutils literal notranslate"><span class="pre">KeyManager</span></code> defined in the
castellan package.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">SaharaKeyManager</span></code> class will implement a thin layer around the storage
of secrets without an external key manager. This class will allow sahara
to continue operation as it exists for the Kilo release and thus maintain
backward compatibility. This class will be the default plugin implementation
for castellan.</p>
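As a rough sketch of what the default plugin could look like: the <code class="docutils literal notranslate"><span class="pre">KeyManager</span></code> base class and <code class="docutils literal notranslate"><span class="pre">Passphrase</span></code> object below are simplified stand-ins for castellan's interfaces, and the in-memory dict stands in for sahara's existing database storage; none of this is the real castellan package.

```python
import abc
import uuid


class KeyManager(abc.ABC):
    """Simplified stand-in for castellan's abstract KeyManager interface."""

    @abc.abstractmethod
    def store_key(self, context, managed_object):
        raise NotImplementedError

    @abc.abstractmethod
    def get_key(self, context, managed_object_id):
        raise NotImplementedError

    @abc.abstractmethod
    def delete_key(self, context, managed_object_id):
        raise NotImplementedError


class Passphrase:
    """Simplified stand-in for castellan's Passphrase object."""

    def __init__(self, passphrase):
        self._passphrase = passphrase

    def get_encoded(self):
        return self._passphrase


class SaharaKeyManager(KeyManager):
    """Thin default plugin: keeps secrets where sahara already stores
    them (modeled here as a dict), so no database migration is needed."""

    def __init__(self):
        self._storage = {}

    def store_key(self, context, managed_object):
        key_id = str(uuid.uuid4())
        self._storage[key_id] = managed_object
        return key_id

    def get_key(self, context, managed_object_id):
        return self._storage[managed_object_id]

    def delete_key(self, context, managed_object_id):
        del self._storage[managed_object_id]
```

The point of the thin layer is that callers always go through the key manager interface, whether the backing store is the sahara database (default) or barbican.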
<p>Example usage:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="kn">from</span> <span class="nn">castellan</span> <span class="kn">import</span> <span class="n">key_manager</span> <span class="k">as</span> <span class="n">km</span>
<span class="kn">from</span> <span class="nn">castellan.key_manager.objects</span> <span class="kn">import</span> <span class="n">passphrase</span>
<span class="n">keymanager</span> <span class="o">=</span> <span class="n">km</span><span class="o">.</span><span class="n">API</span><span class="p">()</span>
<span class="c1"># create secret</span>
<span class="n">new_secret</span> <span class="o">=</span> <span class="n">passphrase</span><span class="o">.</span><span class="n">Passphrase</span><span class="p">(</span><span class="s1">'password_text_here'</span><span class="p">)</span>
<span class="c1"># store secret</span>
<span class="n">new_secret_id</span> <span class="o">=</span> <span class="n">keymanager</span><span class="o">.</span><span class="n">store_key</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="n">new_secret</span><span class="p">)</span>
<span class="c1"># retrieve secret</span>
<span class="n">retrieved_secret</span> <span class="o">=</span> <span class="n">keymanager</span><span class="o">.</span><span class="n">get_key</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="n">new_secret_id</span><span class="p">)</span>
<span class="n">secret_cleartext</span> <span class="o">=</span> <span class="n">retrieved_secret</span><span class="o">.</span><span class="n">get_encoded</span><span class="p">()</span>
<span class="c1"># revoke secret</span>
<span class="n">keymanager</span><span class="o">.</span><span class="n">delete_key</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="n">new_secret_id</span><span class="p">)</span>
</pre></div>
</div>
<p>This solution will provide the capability, through the barbican plugin, to
offload the secrets in such a manner that an attacker would need to
penetrate the database and learn the sahara admin credentials to gain
access to the stored passwords. In essence, we are adding one more
obstacle in the path of a would-be attacker.</p>
<p>This specification focuses on passwords that are currently stored in the
sahara database. The following is a list of the passwords that will be moved
to the key manager for this specification:</p>
<ul class="simple">
<li><p>Swift passwords entered from UI for data sources</p></li>
<li><p>Swift passwords entered from UI for job binaries</p></li>
<li><p>Proxy user passwords for data sources</p></li>
<li><p>Proxy user passwords for job binaries</p></li>
<li><p>Hive MySQL passwords for vanilla 1.2.1 plugin</p></li>
<li><p>Hive database passwords for CDH 5 plugin</p></li>
<li><p>Hive database passwords for CDH 5.3.0 plugin</p></li>
<li><p>Hive database passwords for CDH 5.4.0 plugin</p></li>
<li><p>Sentry database passwords for CDH 5.3.0 plugin</p></li>
<li><p>Sentry database passwords for CDH 5.4.0 plugin</p></li>
<li><p>Cloudera Manager passwords for CDH 5 plugin</p></li>
<li><p>Cloudera Manager passwords for CDH 5.3.0 plugin</p></li>
<li><p>Cloudera Manager passwords for CDH 5.4.0 plugin</p></li>
</ul>
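The migration pattern for the credentials above can be sketched as follows. This is a hedged illustration: the record layout, the helper names, and the in-memory key manager are all hypothetical; in sahara the key manager instance would come from castellan's <code class="docutils literal notranslate"><span class="pre">key_manager.API()</span></code> and the records from the database.

```python
import uuid


class InMemoryKeyManager:
    """Hypothetical stand-in for the configured key manager plugin."""

    def __init__(self):
        self._store = {}

    def store_key(self, context, secret):
        key_id = str(uuid.uuid4())
        self._store[key_id] = secret
        return key_id

    def get_key(self, context, key_id):
        return self._store[key_id]


def migrate_password(keymanager, context, record):
    """Swap a plaintext password field for a key-manager reference."""
    record["password_key_id"] = keymanager.store_key(
        context, record.pop("password"))


def resolve_password(keymanager, context, record):
    """Look the secret back up through the key manager when needed."""
    return keymanager.get_key(context, record["password_key_id"])
```

After migration the record holds only an opaque identifier, so a database breach alone no longer reveals the password.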
<section id="alternatives">
<h3>Alternatives</h3>
<p>One possible alternative to using an external key manager would be
for sahara to encrypt passwords and store them in swift. This would
satisfy the goal of removing passwords from the sahara database
while providing a level of security from credential theft.</p>
<p>The downside of this approach is that it places sahara in the position
of arbitrating security transactions, namely the use of cryptography in
the creation and retrieval of the stored password data.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>A new configuration option will be provided by the castellan package to
set the key manager implementation. This will be the SaharaKeyManager by
default. Deployers wishing to use barbican might need to set a few more
options depending on their installation. These options will be discussed
in the documentation.</p>
<p>Use of an external key manager will depend on having barbican installed
in the stack where it will be used.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers adding new stored passwords to sahara should always use
the key manager interface.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>mimccune (Michael McCune)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>create SaharaKeyManager class</p></li>
<li><p>add tests for new class</p></li>
<li><p>add tests for secret storage</p></li>
<li><p>create documentation for external key manager usage</p></li>
<li><p>migrate passwords to key manager</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This work depends on the castellan package, available from PyPI. The current
version (0.1.0) does not yet include a barbican implementation, but one is
under review [1].</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be created to exercise the SaharaKeyManager class. There
will also be unit tests for the integrated implementation.</p>
<p>Ideally, functional integration tests will be created to ensure the
proper storage and retrieval of secrets. The addition of these tests
represents a larger change to the testing infrastructure as barbican will
need to be added. Depending on the impact of changing the testing
deployment these might best be addressed in a separate change.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>A new section in the advanced configuration guide will be created to
describe the usage of this new feature.</p>
<p>Additionally this feature should be described in the OpenStack
Security Guide. This will require a separate change request to the
documentation project.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1]: <a class="reference external" href="https://review.openstack.org/#/c/171918">https://review.openstack.org/#/c/171918</a></p>
<p>castellan repository <a class="reference external" href="https://github.com/openstack/castellan">https://github.com/openstack/castellan</a></p>
<p><em>note, the castellan documentation is still a work in progress</em></p>
<p>barbican documentation <a class="reference external" href="https://docs.openstack.org/barbican/latest/">https://docs.openstack.org/barbican/latest/</a></p>
<p>barbican wiki <a class="reference external" href="https://github.com/cloudkeep/barbican/wiki">https://github.com/cloudkeep/barbican/wiki</a></p>
</section>
<p>Published Thu, 08 Jun 2023.</p>
<h1>Code refactoring for CDH plugin</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/cdh-plugin-refactoring.html">https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/cdh-plugin-refactoring.html</a></p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-plugin-refactoring">https://blueprints.launchpad.net/sahara/+spec/cdh-plugin-refactoring</a></p>
<p>This spec proposes refactoring the CDH plugin code to make it easier to
support new versions in the future.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The CDH plugin contains a great deal of duplicated code. The current
implementation extracts some general, base behavior of the plugin, and each
version has its own implementation of whatever is not included in the base
classes and modules. But there is considerable overlap between versions
because of backward compatibility. For example,
sahara.plugins.cdh.v5.config_helper extends
sahara.plugins.cdh.db_helper, but functions such as get_plugin_configs are
written again in sahara.plugins.cdh.v5_3_0.config_helper.</p>
<p>Moreover, the currently low test coverage of the CDH plugin makes it hard to
guarantee the quality of the new code after refactoring, so some new unit test
cases need to be added, and some old test cases may need to be altered as part
of the refactoring.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>For code duplicated across the versioned plugin modules, move it to the base
class. In the validation module, the function validate_cluster_creating is too
long to read easily; separate it into several small, clearly named functions.
We can also encapsulate the functions in each module into a class for better
extensibility. The ClouderaUtils and deploy modules are not going to be
changed until CDH v5 is completely removed, because the v5 code in these
modules is quite different from the other versions.</p>
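The base-class extraction can be illustrated with a small sketch; the class and attribute names below are invented for the example and do not match the plugin's actual modules.

```python
class BaseConfigHelper:
    """Behavior shared by every CDH version, written once in the base."""

    CDH_VERSION = None  # each version overrides only what differs

    def __init__(self, plugin_configs):
        self._plugin_configs = plugin_configs

    def get_plugin_configs(self):
        # Previously re-implemented in v5, v5_3_0 and v5_4_0;
        # after the refactoring it lives only here.
        return list(self._plugin_configs)


class ConfigHelperV530(BaseConfigHelper):
    CDH_VERSION = "5.3.0"


class ConfigHelperV540(BaseConfigHelper):
    CDH_VERSION = "5.4.0"
```

Each version module then shrinks to the genuinely version-specific pieces, which is what makes adding a new version cheap.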
<section id="alternatives">
<h3>Alternatives</h3>
<p>Another option is to have each new version extend the previous one instead
of having all versions extend the base. However, this may cause problems when
deprecating an old version.</p>
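The two inheritance layouts can be contrasted in a tiny sketch (class names are illustrative, not the plugin's real classes):

```python
# Proposed layout: every version extends the base directly, so any one
# version can be removed without touching the others.
class Base:
    pass


class V5(Base):
    pass


class V530(Base):
    pass


# Alternative layout: each version extends the previous one.  Removing a
# deprecated version (say V5Chain) would break every later version that
# sits below it in the chain.
class V5Chain(Base):
    pass


class V530Chain(V5Chain):
    pass


class V540Chain(V530Chain):
    pass
```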
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>jxwang</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>This will require the following changes:</p>
<ul class="simple">
<li><p>Move the duplicated code to the base class.
Files to be modified:
sahara/plugin/cdh/version/edp_engine.py: EdpOozieEngine, EdpSparkEngine
sahara/plugin/cdh/version/plugin_utils.py: PluginUtils
sahara/plugin/cdh/version/versionhandler.py: VersionHandler
sahara/plugin/cdh/version/config_helper.py
sahara/plugin/cdh/version/validation.py</p></li>
<li><p>Separate the Validation.validate_cluster_creating function into smaller functions</p></li>
<li><p>Add unit test cases for modules with low coverage.</p></li>
<li><p>Remove test cases made obsolete by the refactoring.
Affected files:
sahara/tests/unit/plugins/cdh/v5/test_versionhandler.py
sahara/tests/unit/plugins/cdh/v5_3_0/test_versionhandler.py
sahara/tests/unit/plugins/cdh/v5_4_0/test_versionhandler.py</p></li>
</ul>
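The split of validate_cluster_creating could follow this hypothetical shape; the individual checks below are invented examples, not the plugin's real validations.

```python
def _check_node_counts(cluster):
    # Hypothetical check: a cluster must have at least one node.
    if cluster.get("node_count", 0) < 1:
        raise ValueError("cluster needs at least one node")


def _check_required_services(cluster):
    # Hypothetical check: certain services must be present.
    missing = {"namenode", "resourcemanager"} - set(cluster.get("services", []))
    if missing:
        raise ValueError("missing required services: %s" % sorted(missing))


def validate_cluster_creating(cluster):
    # One short entry point delegating to small, single-purpose checks,
    # instead of one long monolithic function.
    _check_node_counts(cluster)
    _check_required_services(cluster)
```

Each helper can then be unit-tested in isolation, which directly supports the coverage goal stated above.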
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Before starting the refactoring, keep the current scenario tests and provide
new unit tests to ensure that CDH works as well as before. For each version,
unit tests are also added individually. The existing test cases were written
for the code before refactoring, so small changes may be needed for the new
code.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
<p>Published Sun, 16 Aug 2020.</p>
<h1>Backlog</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/backlog_idx.html">https://specs.openstack.org/openstack/sahara-specs/specs/backlog_idx.html</a></p>
<p>This directory is for specifications that have been approved but not yet
implemented. If you are interested in implementing one of these specifications,
submit a review request moving it to the appropriate release.</p>
<p>The updated review should reflect the new primary assignee. Please maintain
all previous assignees in the list of Other Contributors.</p>
<p>If a specification has been partially implemented, the document in the backlog
will contain information of what has been completed.</p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="backlog/boot-from-volume.html">Support for Boot from Volume</a></li>
<li class="toctree-l1"><a class="reference internal" href="backlog/chronic-edp-revival.html">Revival of “Chronic EDP” work</a></li>
<li class="toctree-l1"><a class="reference internal" href="backlog/heat-two-step-scaling.html">Two step scaling with Heat engine</a></li>
</ul>
</div>
<p>Published Fri, 20 Sep 2019.</p>
<h1>Juno specs</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/juno_idx.html">https://specs.openstack.org/openstack/sahara-specs/specs/juno_idx.html</a></p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="juno/anti-affinity-via-server-groups.html">Make anti affinity working via server groups</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/append-to-remote-file.html">Append to a remote existing file</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/cdh-plugin.html">Plugin for CDH with Cloudera Manager</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/ceilometer-integration.html">Specification for integration Sahara with Ceilometer</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/cluster-persist-sahara-configuration.html">Store Sahara configuration in cluster properties</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/cluster-secgroups.html">Security groups management in Sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/edp-move-examples.html">Move the EDP examples from the sahara-extra repo to sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/edp-refactor-job-manager.html">[EDP] Refactor job manager to support multiple implementations</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/edp-spark-job-type.html">[EDP] Add a Spark job type (instead of overloading Java)</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/edp-spark-standalone.html">[EDP] Add an engine for a Spark standalone deployment</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/edp-swift-trust-authentication.html">[EDP] Using trust delegation for Swift authentication</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/error-handling-in-provisioning.html">Improve error handling for provisioning operations</a></li>
<li class="toctree-l1"><a class="reference internal" href="juno/move-rest-samples-to-docs.html">Move Sahara REST API samples from /etc to docs</a></li>
</ul>
</div>
<p>Published Fri, 20 Sep 2019.</p>
<h1>Kilo specs</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/kilo_idx.html">https://specs.openstack.org/openstack/sahara-specs/specs/kilo_idx.html</a></p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="kilo/add-lib-subset-cm-api.html">Add CM API Library into Sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/add-more-cdh-services.html">Add More Services into CDH plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/add-service-test-in-integration.html">Add check service test in integration test</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/add-timeouts-for-polling.html">Add timeouts for infinite polling for smth</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/auth-policy.html">Authorization Policy Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/cdh-hbase-support.html">CDH HBase Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/cdh-version-management.html">Better Version Management in Cloudera Plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/cdh-zookeeper-support.html">CDH Zookeeper Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/default-templates.html">Default templates for each plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/drop-hadoop-2.3-support.html">Remove support of Hadoop 2.3.0 in Vanilla plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-add-hbase-sharelib.html">Add a common HBase lib in hdfs on cluster start</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-add-oozie-shell-action.html">[EDP] Add Oozie Shell Action job type</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-api-json-samples.html">JSON sample files for the EDP API</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-data-sources-in-job-configs.html">[EDP] Add options supporting DataSource identifiers in job_configs</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-hive-vanilla-swift.html">Enable Swift resident Hive tables for EDP with the vanilla plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-improve-compatibility.html">[EDP] Improve Java type compatibility</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-job-types-endpoint.html">[EDP] Add a new job-types endpoint</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/edp-spark-swift-integration.html">Enable Spark jobs to access Swift URL</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/event-log.html">Storage of recently logged events for clusters</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/exceptions-improvement.html">Exceptions improvement</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/first-run-api-usage.html">Use first_run to One-step Start Cluster</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/hdp-plugin-enable-hdfs-ha.html">Enable HDFS NameNode High Availability with HDP 2.0.6 plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/indirect-vm-access.html">Indirect access to VMs</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/mapr-plugin.html">Plugin for Sahara with MapR</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/mapr-refactor.html">Refactor MapR plugin code</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/periodic-cleanup.html">Clean up clusters that are in non-final state for a long time</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/sahara-api-workers.html">Support multi-worker Sahara API deployment</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/sahara-log-guidelines.html">New style logging</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/sahara-support-https.html">HTTPS support for sahara-api</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/scenario-integration-tests.html">Scenario integration tests for Sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/security-guidelines-doc.html">Creation of Security Guidelines Documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/spark-cleanup.html">Spark Temporary Job Data Retention and Cleanup</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/spec-repo-backlog-refactor.html">Specification Repository Backlog Refactor</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/storm-integration.html">Storm Integration</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/support-cinder-api-v2.html">Support Cinder API version 2</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/support-cinder-availability-zones.html">Support Cinder availability zones</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/support-nova-availability-zones.html">Support Nova availability zones</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/support-query-filtering.html">Spec - Add support for filtering results</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/support-template-editing.html">Spec - Add support for editing templates</a></li>
<li class="toctree-l1"><a class="reference internal" href="kilo/volume-instance-locality.html">Cinder volume instance locality functionality</a></li>
</ul>
</div>
<p>Published Fri, 20 Sep 2019.</p>
<h1>Liberty specs</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/liberty_idx.html">https://specs.openstack.org/openstack/sahara-specs/specs/liberty_idx.html</a></p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="liberty/adding-custom-scenario-tests.html">Adding custom scenario to scenario tests</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/adding-ids-to-logs.html">Adding cluster/instance/job_execution ids to log messages</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/allow-creation-of-multiple-cluster-simultaneously.html">Allow the creation of multiple clusters simultaneously</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/api-for-objects-update.html">Objects update support in Sahara API</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/clients-calls-retry.html">Retry of all OpenStack clients calls</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/cluster-creation-with-trust.html">Use trusts for cluster creation and scaling</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/deprecate-direct-engine.html">Deprecation of Direct Engine</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/drop-hadoop-1-support.html">Drop Hadoop v1 support in provisioning plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/edp-add-spark-shell-action.html">[EDP] Add Spark Shell Action job type</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/edp-datasource-placeholders.html">Allow placeholders in datasource URLs</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/edp-edit-data-sources.html">[EDP] Allow editing datasource objects</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/edp-edit-job-binaries.html">[EDP] Allow editing job binaries</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/enable-cdh-hdfs-ha.html">CDH HDFS HA Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/enable-cdh-rm-ha.html">CDH YARN ResourceManager HA Support</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/hdp-22-support.html">Add support HDP 2.2 plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/heat-hot.html">Migrate to HEAT HOT language</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/heat-template-decomposition.html">Decompose cluster template for Heat</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/keystone-sessions.html">Updating authentication to use keystone sessions</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/manila-as-a-data-source.html">Manila as a runtime data source</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/manila-as-binary-store.html">Addition of Manila as a Binary Store</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/mount-share-api.html">API to Mount and Unmount Manila Shares to Sahara Clusters</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/recommend-configuration.html">Provide ability to configure most important configs automatically</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/sahara-heat-wait-conditions.html">Heat WaitConditions support</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/scenario-test-config-template.html">Use Templates for Scenario Tests Configuration</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/shared-protected-resources.html">Support of shared and protected resources</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/spark-jobs-for-cdh-5-3-0.html">Running Spark Jobs on Cloudera Clusters 5.3.0</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/storm-edp.html">Storm EDP</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/storm-scaling.html">Storm Scaling</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/support-ntp.html">Support NTP service for cluster instances</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/unified-job-interface-map.html">Unified Map to Define Job Interface</a></li>
<li class="toctree-l1"><a class="reference internal" href="liberty/upgrade-oozie-engine-client.html">upgrade oozie Web Service API version of sahara edp oozie engine</a></li>
</ul>
</div>
<p>Published Fri, 20 Sep 2019.</p>
<h1>Mitaka specs</h1>
<p><a class="reference external" href="https://specs.openstack.org/openstack/sahara-specs/specs/mitaka_idx.html">https://specs.openstack.org/openstack/sahara-specs/specs/mitaka_idx.html</a></p>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="mitaka/add-suspend-resume-ability-for-edp-jobs.html">Add ability of suspending and resuming EDP jobs for sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/allow-public-on-protected.html">Allow ‘is_public’ to be set on protected resources</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/cdh-5-5-support.html">add cdh 5.5 support into sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/cdh-plugin-refactoring.html">Code refactoring for CDH plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/cluster-verification.html">Implement Sahara cluster verification checks</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/deprecate-old-mapr-versions.html">Remove unsupported versions of MapR plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/distributed-periodics.html">Support of distributed periodic tasks</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/improved-secret-storage.html">Improved secret storage utilizing castellan</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/move-scenario-tests-to-separate-repository.html">Move scenario tests to a separate repository</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/recurrence-edp-job.html">Add recurrence EDP jobs for sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/reduce-number-of-dashboard-panels.html">Reduce Number Of Dashboard panels</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/remove-direct-engine.html">Removal of Direct Engine</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/remove-plugin-vanilla-v2.6.0.html">Remove plugin Vanilla V2.6.0</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/replace_is_default_with_is_protected.html">Modify ‘is_default’ behavior relative to ‘is_protected’ for templates</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/scheduling-edp-jobs.html">Add ability of scheduling EDP jobs for sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="mitaka/validate-image-spi.html">SPI Method to Validate Image</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Newton specs | https://specs.openstack.org/openstack/sahara-specs/specs/newton_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="newton/designate-integration.html">Designate Integration</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/image-generation-cli.html">CLI for Plugin-Declared Image Generation</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/improving-anti-affinity.html">Improve anti-affinity behavior for cluster creation</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/initial-kerberos-integration.html">Initial Kerberos Integration</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/pagination.html">Adding pagination and sorting ability to Sahara</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/plugin-management-api.html">Admin API for Managing Plugins</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/python-storm-jobs.html">Allow creation of python topologies for Storm</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/refactor-sahara.service.api.html">Refactor the sahara.service.api module</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/refactor-use-floating-ip.html">Refactor the logic around use of floating ips in node groups and clusters</a></li>
<li class="toctree-l1"><a class="reference internal" href="newton/spark-jobs-for-vanilla-hadoop.html">Run Spark jobs on vanilla Hadoop 2.x</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Ocata specs | https://specs.openstack.org/openstack/sahara-specs/specs/ocata_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="ocata/data-source-plugin.html">Data Source and Job Binary Pluggability</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Pike specs | https://specs.openstack.org/openstack/sahara-specs/specs/pike_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="pike/deprecate-centos6-images.html">Deprecation of CentOS 6 images</a></li>
<li class="toctree-l1"><a class="reference internal" href="pike/node-group-template-and-cluster-template-portability.html">Node Group Template And Cluster Template Portability</a></li>
<li class="toctree-l1"><a class="reference internal" href="pike/support-for-s3-compatible-object-stores.html">Support for S3-compatible object stores</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Queens specs | https://specs.openstack.org/openstack/sahara-specs/specs/queens_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="queens/add-hbase-on-vanilla-cluster.html">Add HBase on Vanilla cluster</a></li>
<li class="toctree-l1"><a class="reference internal" href="queens/api-v2-experimental-impl.html">There and back again, a roadmap to API v2</a></li>
<li class="toctree-l1"><a class="reference internal" href="queens/decommission-of-specific-node.html">Decommission of specific instance</a></li>
<li class="toctree-l1"><a class="reference internal" href="queens/force-delete-clusters.html">Force Delete Clusters</a></li>
<li class="toctree-l1"><a class="reference internal" href="queens/remove-job-binary-internal.html">Remove Job Binary Internal</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Rocky specs | https://specs.openstack.org/openstack/sahara-specs/specs/rocky_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="rocky/placeholder.html">placeholder</a></li>
<li class="toctree-l1"><a class="reference internal" href="rocky/plugins-outside-sahara-core.html">Plugins outside Sahara core</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Sahara Image Elements | https://specs.openstack.org/openstack/sahara-specs/specs/sahara-image-elements_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="sahara-image-elements/sahara-bare-metal-images.html">Add an option to sahara-image-create to generate bare metal images</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Sahara tests | https://specs.openstack.org/openstack/sahara-specs/specs/sahara-tests_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="sahara-tests/api-for-sahara-scenario.html">Add an API to sahara-scenario for integration to another frameworks</a></li>
<li class="toctree-l1"><a class="reference internal" href="sahara-tests/sahara-scenario-feature-sets.html">Feature sets for scenario tests</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Sahara client | https://specs.openstack.org/openstack/sahara-specs/specs/saharaclient_idx.html
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="saharaclient/cli-as-openstackclient-plugin.html">SaharaClient CLI as an OpenstackClient plugin</a></li>
<li class="toctree-l1"><a class="reference internal" href="saharaclient/cli-delete-by-multiple-names-or-ids.html">CLI: delete by multiple names or ids</a></li>
</ul>
</div>
Fri, 20 Sep 2019 00:00:00 | Feature sets for scenario tests | https://specs.openstack.org/openstack/sahara-specs/specs/sahara-tests/sahara-scenario-feature-sets.html
<p>The runner and the templates for scenario tests do not provide an easy
way to specify the dependency between a certain item (e.g. an EDP job for
a certain cluster) and its configuration (e.g. credentials, the definition
of the job). This proposal tries to address the problem.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>A key feature of sahara-scenario is its support for test templates.
Templates support parameters which allow the tester to describe different
test scenarios without writing multiple copies of test templates
which differ only by a few arguments.
This flexibility is somewhat limited when testing features like S3 support.
Testing S3 integration requires a few additional details
to be specified when configuring the tests:</p>
<ul class="simple">
<li><p>the credentials required to access the S3 API;</p></li>
<li><p>the definitions of the new EDP jobs which use S3;</p></li>
<li><p>for each cluster where the jobs need to be executed,
the name of the EDP job.</p></li>
</ul>
<p>The first two items can easily be added to new files
that can be specified as arguments for the <cite>sahara-scenario</cite>
command (for example <cite>credentials_s3.yaml.mako</cite> and
<cite>edp_s3.yaml.mako</cite>). The keys that they contain
(<cite>credentials</cite> and <cite>edp_jobs_flow</cite>) are merged together
by the runner.
Their content may also be added directly to the existing
YAML files (<cite>credentials.yaml.mako</cite> and <cite>edp.yaml.mako</cite>),
but that would mean adding a long list of default values
for all the arguments in the template.</p>
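<p>The merge performed by the runner can be sketched as follows. This is an illustrative Python snippet, not the actual sahara-scenario code; the function name and the exact override semantics are assumptions:</p>

```python
# Sketch: merge the top-level "credentials" and "edp_jobs_flow" keys
# from several parsed YAML templates, with later files extending (and
# possibly overriding) earlier ones. Illustrative only.
def merge_templates(templates):
    merged = {"credentials": {}, "edp_jobs_flow": {}}
    for template in templates:
        merged["credentials"].update(template.get("credentials", {}))
        merged["edp_jobs_flow"].update(template.get("edp_jobs_flow", {}))
    return merged

# A base template plus an extra file such as credentials_s3.yaml.mako:
base = {"credentials": {"os_username": "demo"},
        "edp_jobs_flow": {"pig_job": [{"type": "Pig"}]}}
s3_extra = {"credentials": {"s3_accesskey": "KEY"},
            "edp_jobs_flow": {"spark_s3": [{"type": "Spark"}]}}

merged = merge_templates([base, s3_extra])
```

<p>Both input files contribute their keys, so the runner sees a single set of credentials and a single map of EDP job definitions.</p>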
<p>The third item is more complicated to model, because
there is no easy way to override an item in a <cite>cluster</cite>
element, which is a list, not a dictionary.</p>
<p>While it would be possible to introduce a command line parameter
to override the items in a specific cluster, that would
still leave it up to the user to remember to specify all
the required bits to enable S3 testing.</p>
<p>A more general solution to the problem is the definition
of feature sets for testing.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The user of sahara-scenario would only need to pass an argument like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span><span class="o">-</span><span class="n">scenario</span> <span class="o">...</span> <span class="o">--</span><span class="n">feature</span> <span class="n">s3</span> <span class="o">...</span>
</pre></div>
</div>
<p>If the <cite>-p</cite> and <cite>-v</cite> arguments are specified, for each <cite>feature</cite> argument
<cite>sahara-scenario</cite> will include the files <cite>credentials_&lt;feature&gt;.yaml.mako</cite>
and <cite>edp_&lt;feature&gt;.yaml.mako</cite>, if they exist.</p>
<p>In addition, from the list of EDP jobs specified for all enabled clusters,
all the items marked with <cite>feature: s3</cite> will be selected.</p>
<p>This means that items without the <cite>feature</cite> tag will always be executed,
while items with <cite>feature</cite> will be executed only when the associated
feature(s) are selected.</p>
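<p>The selection rule above can be sketched in Python; the <cite>feature</cite> field name follows this spec, while the helper function itself is a hypothetical illustration:</p>

```python
# Sketch: untagged EDP job items always run; tagged items run only
# when one of their features is enabled (hypothetical helper).
def select_jobs(jobs, enabled_features):
    selected = []
    for job in jobs:
        features = job.get("feature")
        if features is None:
            selected.append(job)  # no tag: always executed
            continue
        if isinstance(features, str):
            features = [features]
        if any(f in enabled_features for f in features):
            selected.append(job)  # tag matches an enabled feature
    return selected

jobs = [
    {"name": "pig_job"},                    # always runs
    {"name": "spark_s3", "feature": "s3"},  # runs only with --feature s3
]
default_run = select_jobs(jobs, set())
s3_run = select_jobs(jobs, {"s3"})
```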
<p>The initial implementation will focus on EDP jobs, but other items
may benefit from the tagging.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Use conditional statements in the YAML file. However, when the initial
<cite>sahara-scenario</cite> spec was discussed, the author of this spec
repeatedly requested that the YAML files be kept free of business logic
and purely declarative.</p>
<p>Another possible solution is the duplication of the YAML templates,
but that goes against maintainability and ease of use.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>The new test runner will be backward compatible with the old templates.
New templates will require the new runner, but this should not be
a problem.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="image-packing-impact">
<h3>Image Packing impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ltoscano</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>extend the runner to accept the <cite>feature</cite> argument</p></li>
<li><p>add new EDP jobs and add the feature marker to the EDP jobs
which need it, extending the existing early attempt
at S3 testing (see <a class="reference internal" href="#spec-references"><span class="std std-ref">References</span></a>).</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The new argument and the merging of values will be covered by unit tests.
The regression testing will be covered by the <cite>sahara-tests-scenario</cite>
jobs in the gate.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The new argument and its usage will be documented from both the user’s and
the test writer’s point of view.</p>
</section>
<section id="references">
<span id="spec-references"/><h2>References</h2>
<p>Initial work to support S3 testing:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://review.openstack.org/610920">https://review.openstack.org/610920</a></p></li>
<li><p><a class="reference external" href="https://review.openstack.org/590055">https://review.openstack.org/590055</a></p></li>
</ul>
</section>
Tue, 04 Dec 2018 00:00:00 | placeholder | https://specs.openstack.org/openstack/sahara-specs/specs/rocky/placeholder.html
<p>placeholder</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>placeholder</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>placeholder</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>placeholder</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>None</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>None</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 15 Mar 2018 00:00:00 | Plugins outside Sahara core | https://specs.openstack.org/openstack/sahara-specs/specs/rocky/plugins-outside-sahara-core.html
<p>Plugins are a very important part of Sahara, they allow the creation of
clusters for different data processing tools and the execution of jobs on those
clusters.
This proposal is to remove the plugins code from Sahara core and create a new
project to host the plugins.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>With the plugins code inside Sahara core, we can only upgrade plugin
versions with the cycle milestone. It also forces users to upgrade their
OpenStack version whenever they need to upgrade the Sahara plugins.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We are going to move the plugins into their own project, releasing new
versions whenever we update the plugins, thus allowing users to upgrade to
newer versions without the hassle of upgrading the whole cloud.</p>
<p>In order to keep the projects as loosely coupled as possible, we are
implementing a mediator under sahara/plugins that will be used as an API
between the projects. This API also aims to ease the maintainability of both
projects.</p>
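<p>A mediator of this kind can be sketched as a thin forwarding layer. The snippet below only illustrates the idea; the function names are hypothetical, not the actual sahara/plugins API:</p>

```python
# Sketch of a mediator under sahara/plugins: out-of-tree plugins call
# stable wrappers instead of importing sahara internals directly.
# All names here are hypothetical.

def _internal_get_config_value(cluster, key):
    # Stand-in for a sahara-internal helper that may be renamed or
    # moved without notice.
    return cluster.get(key)

def get_config_value(cluster, key):
    """Stable entry point exposed to sahara-plugins.

    Only this signature is part of the contract between the two
    projects; the internal helper behind it can change freely.
    """
    return _internal_get_config_value(cluster, key)

cluster = {"name": "demo-cluster", "plugin_name": "vanilla"}
plugin = get_config_value(cluster, "plugin_name")
```

<p>Significant changes to such wrappers would then require a version bump, as noted under Developer impact.</p>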
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep the plugins code as it is. Not changing will not break anything or make
things more difficult to the users.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Users will be able to upgrade plugin versions much faster once this is done.
Deployers will have to keep an eye on the compatibility between sahara and
sahara-plugins if there are significant changes to the API.</p>
<p>There is also an impact for packagers and translators, since we will need to
do one-time work to set up and copy translations in the new repository.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>With a new project, developers will have to get used to the fact that plugins
don’t live with the core anymore.
There is a new API (mediator) implemented on the sahara side that will be the
bridge between the two projects. Developers must respect that mediator and
significant changes to that will require version bumping or branching.</p>
</section>
<section id="image-packing-impact">
<h3>Image Packing impact</h3>
<p>Image packing using the new image generation and validation system will
also require sahara-plugins to be installed.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tellesnobrega</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Split the plugins code from Sahara core</p></li>
<li><p>Bring plugins unit test to the plugins repo</p></li>
<li><p>Make sure imports in sahara-plugins from sahara are well structured so as
not to break with later sahara changes</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Move plugins tests from sahara core to sahara-plugins</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We need to update the documentation to reflect the change and make sure users
and developers are well aware of this new structure.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 15 Mar 2018 00:00:00 | Force Delete Clusters | https://specs.openstack.org/openstack/sahara-specs/specs/queens/force-delete-clusters.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-force-delete">https://blueprints.launchpad.net/sahara/+spec/sahara-force-delete</a></p>
<p>Let’s resolve a long-standing complaint.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>An oft-reported issue is that a Sahara cluster hangs on deletion. This can be
a major user headache, and also looks really bad. Usually the problem is not
the fault of Sahara, but rather of some undeletable resource.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We will make use of Heat’s stack abandoning feature. Essentially, this means
the stack will get deleted but leave its resources in place. It’s a last-ditch
effort to be made when undeletable (or slow-to-delete) resources are present
in the stack.</p>
<p>This will be done through a new addition to the Sahara API which wraps Heat
stack abandon. It’s probably best to release it as part of APIv2 rather than
continue to break the rules that we have broken so often for APIv1.1.</p>
<p>Note that even though Heat stack abandon does not clean up any resources, we
should not have Sahara try to clean them up manually.</p>
<p>The above point is justified by two things:</p>
<ul class="simple">
<li><p>This gerrit comment [0]</p></li>
<li><p>If abandoning is needed, it’s really hard to delete the resource anyway</p></li>
</ul>
<p>It’s best to create an API which wraps stack abandon, rather than just telling
users to use abandon directly, because there’s other cleanup to do during
cluster delete. See [1] for more info.</p>
<p>The change will enable the following workflow: force delete gets called and
cleans up the cluster as best it can, then the user handles any orphaned
resources themselves. Thanks to explicit abandon through the Sahara API, users
are always encouraged to make sure the stack is in the deleting state first,
so that the amount of orphaned resources is minimized.</p>
<p>With regards to the above point, it is absolutely crucial that we do not
simply enhance the regular delete call to include an abandon call. There are
two key reasons to avoid that:</p>
<ul class="simple">
<li><p>In normal operation, Heat stack delete does retry</p></li>
<li><p>In normal use, the user probably wants the cluster to stay in the
deleting state until the resources are actually gone: force delete is just
for emergencies</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Just tell users/operators to use Heat’s stack abandon manually. That’s not a
great choice for the reasons discussed above.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>We’ll add the following endpoint:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">DELETE /v2/clusters/{cluster_id}/force</span>
</pre></div>
</div>
<ul class="simple">
<li><p>It’ll give 204 on success just like regular DELETE.</p></li>
<li><p>It’ll give the usual 4xx errors in the usual ways.</p></li>
<li><p>It’ll give 503 (on purpose) when Heat stack abandon is unavailable.</p></li>
<li><p>Request body, headers, query, etc are the same as regular DELETE.</p></li>
</ul>
<p>Again, best practice says make this a v2 exclusive, rather than further dirty
the existing v1.1 API.</p>
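<p>As a rough illustration, a client-side helper for this endpoint could look like the sketch below; the base URL, helper names and the status mapping are assumptions, not saharaclient code:</p>

```python
# Sketch of a client helper for the proposed force-delete endpoint
# (hypothetical names; only the route and status codes come from
# the spec text above).
def force_delete_request(base_url, cluster_id):
    """Return the method and URL for DELETE /v2/clusters/{id}/force."""
    url = "%s/v2/clusters/%s/force" % (base_url.rstrip("/"), cluster_id)
    return "DELETE", url

def interpret_status(code):
    # Mirrors the response behaviors listed above.
    if code == 204:
        return "force delete accepted"
    if code == 503:
        return "heat stack abandon unavailable"
    if 400 <= code < 500:
        return "client error"
    return "unexpected status %d" % code

method, url = force_delete_request("https://sahara.example.com", "abc123")
```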
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Need to extend Saharaclient API bindings and OSC in order to support the new
API methods.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>They need to set enable_stack_abandon=True in heat.conf and make sure that
their cloud policy allows users to perform such an operation.</p>
<p>We could try to do something fancy with RBAC and trusts so that Sahara service
user may abandon Heat stacks on behalf of the user, when the operator wishes
to restrict stack abandon. But it might not be worth messing with that…</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="image-packing-impact">
<h3>Image Packing impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Yes, expose the functionality in the UI. We could put some warnings about
force-delete’s implications as well.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Jeremy Freudberg</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Remove the last bit of direct engine code
(without this change, even an abandoned stack may still be stuck, as the
Sahara engine tries manual cleanup; it also means that after this change is
made, the bug of a fully deleted stack with the cluster still in the
deleting state is essentially resolved…)</p></li>
<li><p>Create the new API bits and corresponding operation</p></li>
<li><p>Add functionality to client and dashboard</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Probably scenario tests are not strictly needed for this feature.</p>
<p>Beyond the obvious unit tests, there will also be updates to the API tests in
the tempest plugin.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Nothing out of the ordinary, but it is important to keep in mind both the
operator and the developer perspective.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[0] <a class="reference external" href="https://review.openstack.org/#/c/466778/20/sahara/service/engine.py@276">https://review.openstack.org/#/c/466778/20/sahara/service/engine.py@276</a></p>
<p>[1] <a class="reference external" href="https://github.com/openstack/sahara/blob/master/sahara/service/ops.py#L355">https://github.com/openstack/sahara/blob/master/sahara/service/ops.py#L355</a></p>
</section>
Thu, 19 Oct 2017 00:00:00 | Add HBase on Vanilla cluster | https://specs.openstack.org/openstack/sahara-specs/specs/queens/add-hbase-on-vanilla-cluster.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/hbase-on-vanila">https://blueprints.launchpad.net/sahara/+spec/hbase-on-vanila</a></p>
<p>Apache HBase provides large-scale tabular storage for Hadoop using
the Hadoop Distributed File System(HDFS). This document serves as
a description to add the support of HBase and ZooKeeper services on
Vanilla cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The Sahara vanilla plugin allows users to quickly provision a cluster
with many core services, but it doesn’t support HBase and ZooKeeper.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>In keeping with the Vanilla cluster’s distributed architecture, we only
support fully-distributed HBase deployment. In a distributed configuration,
the cluster contains multiple nodes, each of which runs one or more HBase
daemons. These include an HBase Master instance, multiple ZooKeeper nodes and
multiple RegionServer nodes.</p>
<p>A distributed HBase installation depends on a running ZooKeeper cluster.
By default HBase manages a ZooKeeper “cluster” for you, but you can also
manage the ZooKeeper ensemble independently of HBase. The variable
“HBASE_MANAGES_ZK” in “conf/hbase-env.sh”, which defaults to true, tells
HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase.</p>
<p>We should expose this variable in “cluster_configs” to let the user
determine which component manages the ZooKeeper service.</p>
<p>In production, it is recommended to run a ZooKeeper ensemble of 3, 5 or 7
machines; the more members an ensemble has, the more tolerant the ensemble
is of host failures. Also, run an odd number of machines: an even number
of peers is supported, but it is normally not used, because an even-sized
ensemble requires, proportionally, more peers to form a quorum than an
odd-sized ensemble does.</p>
<ul class="simple">
<li><p>If we set “HBASE_MANAGES_ZK” to false, Sahara will validate the number
of ZooKeeper services in the node groups to keep the ZK instance count odd.</p></li>
<li><p>If we set “HBASE_MANAGES_ZK” to true, Sahara will automatically
determine the instances on which to start ZooKeeper. The cluster will
contain between 1 and 5 ZK nodes. If we want more ZK nodes, setting
HBASE_MANAGES_ZK to false is a good choice.</p></li>
</ul>
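<p>The two validation rules above can be summarized in a short sketch; this is illustrative Python, not the Vanilla plugin’s actual validation code, and the 1-to-5 range mirrors the wording of this spec:</p>

```python
# Sketch of the ZooKeeper node-count rules described above
# (hypothetical helper, not sahara code).
def zk_count_is_valid(count, hbase_manages_zk):
    if hbase_manages_zk:
        # HBase manages ZK: Sahara picks between 1 and 5 nodes.
        return 1 <= count <= 5
    # Independently managed ensemble: require an odd number of members.
    return count >= 1 and count % 2 == 1
```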
<p>If we scale the cluster up or down, the ZooKeeper and HBase services
will be restarted. After scaling up or down, the remaining ZooKeeper nodes
should also be kept at an odd number. If there is only one ZooKeeper node,
the status of the ZooKeeper service will be “standalone”.</p>
<p>One thing that should be specified is the default values used in the
configuration:</p>
<p>ZooKeeper Configuration in “/opt/zookeeper/conf/zoo.cfg”:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">dataDir=/var/data/zookeeper</span>
<span class="go">clientPort=2181</span>
<span class="go">server.1=zk-0:2888:3888</span>
<span class="go">server.2=zk-1:2888:3888</span>
</pre></div>
</div>
<p>HBase Configuration in “/opt/hbase/conf/hbase-site.xml”:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">hbase.tmp.dir=/var/data/hbase</span>
<span class="go">hbase.rootdir=hdfs://master:9000/hbase</span>
<span class="go">hbase.cluster.distributed=true</span>
<span class="go">hbase.master.port=16000</span>
<span class="go">hbase.master.info.port=16010</span>
<span class="go">hbase.regionserver.port=16020</span>
</pre></div>
</div>
<p>After this change, the security group will open ports (2181, 2888, 3888,
16000, 16010, 16020) if the default configuration is not changed.</p>
<section id="alternatives">
<h3>Alternatives</h3>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<ul class="simple">
<li><p>Build a new Vanilla image that includes the ZK and HBase packages</p></li>
</ul>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<ul class="simple">
<li><p>An option should be added to the Node Group create and update forms.</p></li>
</ul>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Shu Yingya</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Build a new image with sahara-image-elements</p></li>
<li><p>Add ZooKeeper support to the Vanilla plugin in sahara</p></li>
<li><p>Add HBase support to the Vanilla plugin in sahara</p></li>
<li><p>Update sahara-dashboard to allow choosing the ZooKeeper creator</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit test coverage in sahara</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<ul class="simple">
<li><p>Vanilla plugin description should be updated</p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 18 Oct 2017 00:00:00 | Decommission of specific instance | https://specs.openstack.org/openstack/sahara-specs/specs/queens/decommission-of-specific-node.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/decommission-specific-instance">https://blueprints.launchpad.net/sahara/+spec/decommission-specific-instance</a></p>
<p>When facing issues with a cluster it can be useful to remove a specific
instance. The way Sahara is constructed today allows the user to scale a
cluster down, but it will choose a random instance from the selected node
group to be removed. We want to give users the opportunity to choose which
instance(s) they would like to remove from the cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Users may need to remove a specific node to make the cluster healthier.
This is not possible today.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We will add the possibility for the user to choose the specific instance to
remove from the cluster.</p>
<p>After selecting the node group from which the instance will be removed,
the user will be allowed to choose the instance or instances to be
removed.</p>
<p>We will also allow wildcard removal: if the user wants a randomly selected
instance, they can simply leave the field blank. Likewise, if more than one
instance is being deleted, the user can name each instance to be deleted, or
just a subset, and Sahara will choose the rest.</p>
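<p>The selection behavior can be sketched as follows; this is a hypothetical illustration of the rule, not the actual sahara scaling code:</p>

```python
import random

# Sketch: explicitly requested instances are removed first, and the
# remainder of the requested count is filled at random (illustrative).
def choose_instances_to_remove(all_instances, requested, count):
    if count > len(all_instances):
        raise ValueError("cannot remove more instances than exist")
    chosen = [i for i in requested if i in all_instances]
    remaining = [i for i in all_instances if i not in chosen]
    # Wildcard part: Sahara picks the rest randomly.
    chosen += random.sample(remaining, count - len(chosen))
    return chosen

instances = ["worker-1", "worker-2", "worker-3", "worker-4"]
removed = choose_instances_to_remove(instances, ["worker-2"], 2)
```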
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep randomly selecting instance to be scaled down.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>We will change the request body for scaling a cluster.</p>
<p>Currently the body should be like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>{
    "add_node_groups": [
        {
            "count": 1,
            "name": "b-worker",
            "node_group_template_id": "bc270ffe-a086-4eeb-9baa-2f5a73504622"
        }
    ],
    "resize_node_groups": [
        {
            "count": 4,
            "name": "worker"
        }
    ]
}
</pre></div>
</div>
<p>We will change the second part to support an extra parameter:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>{
    "add_node_groups": [
        {
            "count": 1,
            "name": "b-worker",
            "node_group_template_id": "bc270ffe-a086-4eeb-9baa-2f5a73504622"
        }
    ],
    "resize_node_groups": [
        {
            "count": 4,
            "name": "worker",
            "instances": ["instance_id1", "instance_id2"]
        }
    ]
}
</pre></div>
</div>
<p>If the user does not specify instances to be removed, the parameter will
not be passed and removal will follow the current approach.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>CLI command will have a new option to select instances.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>We will need to add a control for the user to select the instances to be
removed. It can appear after the node group selection: alongside the count
field, a selector for the instances to be removed, with the option to leave
it blank.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Telles Nobrega</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add the possibility to select an instance when scaling down</p></li>
<li><p>Add CLI option to select instances to be removed</p></li>
<li><p>Add UI option to select instances to be removed</p></li>
<li><p>Unit tests</p></li>
<li><p>Documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be needed.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Nothing out of the ordinary, but it is important to cover both the user
and developer perspectives.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 05 Oct 2017 00:00:00 Initial Kerberos Integration: https://specs.openstack.org/openstack/sahara-specs/specs/newton/initial-kerberos-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/initial-kerberos-integration">https://blueprints.launchpad.net/sahara/+spec/initial-kerberos-integration</a></p>
<p>There is no way to enable Kerberos integration for clusters deployed
by Sahara. However, management services like Cloudera Manager
and the Ambari management console already support configuring Kerberos
for deployed clusters. We should provide initial integration of this feature
in sahara for the Cloudera and Ambari plugins.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Users want to use Kerberos to make clusters created by Sahara more
secure.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The proposed changes will be the following:</p>
<blockquote>
<div><ul class="simple">
<li><p>initial integration of an MIT KDC on cluster nodes. An
admin principal will also be created for sahara's use (and
sahara will store its credentials using improved secret storage);</p></li>
<li><p>the ability to configure Kerberos on clusters created via the
Cloudera Manager API (see references [0], [1], [2] below) and
Ambari (see reference [3] below) will be added;</p></li>
<li><p>remote operations that require authentication should be wrapped in a
ticket-granting method so that sahara will be able to perform
operations against HDFS and similar services;</p></li>
<li><p>the Oozie client should be re-implemented to reflect these changes.
By default, when a cluster is not deployed with Kerberos
(as with vanilla), we will continue using the requests Python library
to bind the Oozie API (without auth). A new Oozie client
will be implemented for the Kerberos case. This
client will use the standard remote implementation and curl in order
to authenticate to Oozie with ticket granting. Alternatively, we could use
requests-kerberos for that goal, but it is not a good solution since that
library is not Python 3 compatible.</p></li>
</ul>
</div></blockquote>
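<p>The ticket-granting wrapper for remote operations could look roughly like the following sketch. This is not the actual sahara implementation: the context-manager name is made up, and the principal and keytab path shown in the usage comment are deployment-specific placeholders; only the <cite>kinit</cite>/<cite>kdestroy</cite> commands are the standard MIT Kerberos tools.</p>

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def kerberos_ticket(principal, keytab):
    """Obtain a ticket-granting ticket before a remote HDFS/Oozie
    operation and destroy it afterwards (hypothetical sketch)."""
    # kinit -kt <keytab> <principal> acquires a TGT non-interactively.
    subprocess.check_call(["kinit", "-kt", keytab, principal])
    try:
        yield
    finally:
        # Always drop the credential cache, even if the operation failed.
        subprocess.call(["kdestroy"])

# usage (assuming a configured KDC and a keytab present on the node):
# with kerberos_ticket("sahara/admin@EXAMPLE.COM", "/etc/sahara.keytab"):
#     run_hdfs_command(...)
```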
<p>New config options will be added in general section of cluster template (or
cluster):</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">Enable</span> <span class="pre">Kerberos</span></code>: will enable the kerberos security for cluster
(Ambari or CDH);</p></li>
</ul>
</div></blockquote>
<p>As an additional enhancement, support for using an existing KDC server can
be added. In that case, the following additional options are required:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">Use</span> <span class="pre">existing</span> <span class="pre">KDC</span></code>: will enable using an existing KDC server; additional
data must then be provided;</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Admin</span> <span class="pre">principal</span></code>: an admin principal with the ability to create new
principals;</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Admin</span> <span class="pre">password</span></code>: will be hidden from API outputs and stored in
improved secret storage;</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">KDC</span> <span class="pre">server</span> <span class="pre">hostname</span></code>: the hostname of the KDC server.</p></li>
</ul>
</div></blockquote>
<p>If anything additional is needed to identify the KDC server, it will also
be added to the general section of configs.</p>
<p>Other improvements can follow once the steps above are implemented.
However, the initial implementation will include only the steps above.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None. If needed, extra field will be used for that goal.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>All packages required for the KDC infrastructure should be included on our
images. If something is missing, it will be installed additionally.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Nothing additional is required on the Sahara dashboard side, since all
needed options are in the general section and will be rendered as usual.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev (Vitaly Gridnev) and msionkin (Michael Ionkin)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>All work items are described in Proposed change. Additionally,
we can test the initial Kerberos integration on Sahara CI, but only
once all steps are completed. Unit tests will certainly be added.</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements. The initial idea is that nothing
additional should be added to the current sahara requirements; all needed
packages will be included only on sahara images.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Sahara CI will cover this change, and unit tests will certainly be added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>New sections can be added with description of Kerberos integration
in Ambari and Cloudera.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[0] <a class="reference external" href="https://github.com/cloudera/cm_api/blob/f4431606a690d95208457a64d1cc2610d9cfa2bf/python/src/cm_api/endpoints/cms.py#L134">https://github.com/cloudera/cm_api/blob/f4431606a690d95208457a64d1cc2610d9cfa2bf/python/src/cm_api/endpoints/cms.py#L134</a>
[1] <a class="reference external" href="https://github.com/cloudera/cm_api/blob/f4431606a690d95208457a64d1cc2610d9cfa2bf/python/src/cm_api/endpoints/clusters.py#L585">https://github.com/cloudera/cm_api/blob/f4431606a690d95208457a64d1cc2610d9cfa2bf/python/src/cm_api/endpoints/clusters.py#L585</a>
[2] <a class="reference external" href="http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_sg_using_cm_sec_config.html">http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_sg_using_cm_sec_config.html</a>
[3] <a class="reference external" href="https://cwiki.apache.org/confluence/display/AMBARI/Automated+Kerberizaton#AutomatedKerberizaton-TheRESTAPI">https://cwiki.apache.org/confluence/display/AMBARI/Automated+Kerberizaton#AutomatedKerberizaton-TheRESTAPI</a></p>
</section>
Wed, 20 Sep 2017 00:00:00 [EDP] Add a new job-types endpoint: https://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-job-types-endpoint.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-job-types-endpoint">https://blueprints.launchpad.net/sahara/+spec/edp-job-types-endpoint</a></p>
<p>Add a new job-types endpoint that can report all supported job types for
a Sahara instance and which plugins support them.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are currently two problems around job types in Sahara that impact
user experience:</p>
<ul class="simple">
<li><p>The current <em>/jobs/config-hints/&lt;job_type&gt;</em> endpoint is not adequate for
providing configuration hints because it does not take into account plugin
type or the framework version. This endpoint was meant to give users a list
of likely configuration values for a job that they may want to modify
(at one point the UI incorporated the hints as well). Although some config is
common across multiple plugins, hints need to be specific to the plugin and
the framework version in order to be useful. The endpoint does not take
a cluster or plugin argument and so hints must be general.</p></li>
<li><p>A user currently has no indicator of the job types that are actually
available from a Sahara instance (the UI lists them all). The set of
valid job types is based on the plugins loaded for the current instance.
Furthermore, not all job types will be available to run on all
clusters launched by the user because they are plugin dependent.</p></li>
</ul>
<p>These problems should be solved without breaking backward compatibility in
the REST API.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add a new endpoint that will indicate for the running Sahara instance
which job types are supported by which versions of which plugins.
Optionally, plugin-and-version-specific config hints will be included
for each supported job type.</p>
<p>Because config hints can be very long, they will not be included in a
response by default. A query string parameter will be used to indicate
that they should be included.</p>
<p>The endpoint will support the following optional query strings for filtering.
Each may be used more than once to query over a list of values, for example
<cite>type=Pig&amp;type=Java</cite>:</p>
<ul class="simple">
<li><p><strong>type</strong>
A job type to consider. Default is all job types.</p></li>
<li><p><strong>plugin</strong>
A plugin to consider. Default is all plugins.</p></li>
<li><p><strong>version</strong>
A plugin version to consider. Default is all versions.</p></li>
</ul>
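<p>For illustration, repeated query parameters of this form can be produced with the Python standard library; the parameter values here are arbitrary and the base URL matches the examples in this spec:</p>

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated keys, producing the
# type=Pig&type=Java form described above.
params = {"type": ["Pig", "Java"], "plugin": ["hdp"], "version": ["2.0.6"]}
query = urlencode(params, doseq=True)
url = "http://sahara/v1.1/775181/job-types?" + query
print(query)  # type=Pig&type=Java&plugin=hdp&version=2.0.6
```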
<p>The REST API method is specified in detail below under <em>REST API impact</em>.</p>
<p>We will need two new optional methods in the <cite>Plugin SPI</cite>. This information
ultimately comes from the EDP engine(s) used by a plugin but we do
not want to actually allocate an EDP engine object for this so the
existing <strong>get_edp_engine()</strong> will not suffice (and besides, it requires
a cluster object):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
<span class="k">def</span> <span class="nf">get_edp_job_types</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">versions</span><span class="o">=</span><span class="p">[]):</span>
<span class="k">return</span> <span class="p">[]</span>
<span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
<span class="k">def</span> <span class="nf">get_edp_config_hints</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">job_type</span><span class="p">,</span> <span class="n">version</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{}</span>
</pre></div>
</div>
<p>These specific methods are mentioned here because they represent a
change to the public <cite>Plugin SPI</cite>.</p>
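<p>As a rough illustration of how a plugin might implement these two optional methods: the class name, job-type table, and version below are made up for the sketch; real plugins would derive this information from their EDP engines.</p>

```python
# Hypothetical table mapping plugin versions to supported EDP job types.
EDP_JOB_TYPES = {
    "1.2.1": ["Hive", "Java", "MapReduce", "MapReduce.Streaming", "Pig"],
}

class FakeVanillaPlugin(object):
    """Toy stand-in for a sahara plugin implementing the new SPI methods."""

    def get_edp_job_types(self, versions=None):
        # Collect job types across the requested versions (all by default).
        versions = versions or list(EDP_JOB_TYPES)
        types = set()
        for v in versions:
            types.update(EDP_JOB_TYPES.get(v, []))
        return sorted(types)

    def get_edp_config_hints(self, job_type, version):
        # Hints are plugin-and-version-specific; empty for unsupported types.
        if job_type not in EDP_JOB_TYPES.get(version, []):
            return {}
        return {"job_config": {"configs": [], "args": []}}

plugin = FakeVanillaPlugin()
print(plugin.get_edp_job_types())
```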
<section id="alternatives">
<h3>Alternatives</h3>
<p>Fix the existing <em>/jobs/config-hints</em> endpoint to take a cluster id or a
plugin-version pair and return appropriate config hints. However, this
would break backward compatibility.</p>
<p>Alternatively, add a new endpoint that reports only the supported job
types for the Sahara instance, keeping it separate from config hints.</p>
<p>However, it makes more sense to deprecate the current config-hints
interface and add a single new endpoint that serves both purposes.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Backward compatibility will be maintained since this is a new endpoint.</p>
<p><strong>GET /v1.1/{tenant_id}/job-types</strong></p>
<p>Normal Response Code: 200 (OK)</p>
<p>Errors: none</p>
<p>Indicate which job types are supported by which versions
of which plugins in the current instance.</p>
<dl>
<dt><strong>Example</strong></dt><dd><p><strong>request</strong></p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span/>GET http://sahara/v1.1/775181/job-types
</pre></div>
</div>
<p><strong>response</strong></p>
<div class="highlight-http notranslate"><div class="highlight"><pre><span/><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/json</span>
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"job_types"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hive"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Apache Vanilla plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Vanilla Apache Hadoop"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.2.1"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Java"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Apache Vanilla plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Vanilla Apache Hadoop"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.2.1"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"MapReduce"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Apache Vanilla plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Vanilla Apache Hadoop"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.2.1"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"MapReduce.Streaming"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Apache Vanilla plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Vanilla Apache Hadoop"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.2.1"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Pig"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Apache Vanilla plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Vanilla Apache Hadoop"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.2.1"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</pre></div>
</div>
</dd>
</dl>
<p>The job-types endpoint returns a list. Each item in the list is
a dictionary describing a job type that is supported by the
running Sahara. Notice for example that the <em>Spark</em> job type is missing.</p>
<p>Each job type dictionary contains the name of the job type and
a list of plugins that support it.</p>
<p>For each plugin, we include the basic identifying information and then
a <cite>versions</cite> dictionary. Each entry in the versions dictionary has
the name of the version as the key and the corresponding config hints
as the value. Since this example did not request config hints, the
dictionaries are empty.</p>
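<p>A client consuming this response might invert it into a per-plugin capability map; this is an illustrative sketch (the helper name is made up), using a trimmed-down version of the response shown above:</p>

```python
def plugins_by_job_type(response):
    """Invert a job-types response into {plugin_name: set(job_type_names)}
    for quick 'can plugin X run job type Y?' checks."""
    support = {}
    for jt in response["job_types"]:
        for plugin in jt["plugins"]:
            support.setdefault(plugin["name"], set()).add(jt["name"])
    return support

# Trimmed sample response, keeping only the fields this sketch needs.
sample = {"job_types": [
    {"name": "Hive", "plugins": [{"name": "vanilla"}, {"name": "hdp"}]},
    {"name": "Pig", "plugins": [{"name": "hdp"}]},
]}
print(plugins_by_job_type(sample))
```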
<p>Here is an example of a request that uses the plugin and version filters:</p>
<dl>
<dt><strong>Example</strong></dt><dd><p><strong>request</strong></p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span/>GET http://sahara/v1.1/775181/job-types?plugin=hdp&version=2.0.6
</pre></div>
</div>
<p><strong>response</strong></p>
<div class="highlight-http notranslate"><div class="highlight"><pre><span/><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/json</span>
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"job_types"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hive"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Java"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"MapReduce"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"MapReduce.Streaming"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Pig"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.0.6"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</pre></div>
</div>
</dd>
</dl>
<p>Here is another example that enables config hints and also filters by plugin,
version, and job type.</p>
<dl>
<dt><strong>Example</strong></dt><dd><p><strong>request</strong></p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span/>GET http://sahara/v1.1/775181/job-types?hints=true&plugin=hdp&version=1.3.2&type=Hive
</pre></div>
</div>
<p><strong>response</strong></p>
<div class="highlight-http notranslate"><div class="highlight"><pre><span/><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/json</span>
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"job_types"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hive"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"The Hortonworks Sahara plugin."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"hdp"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Hortonworks Data Platform"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"versions"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"1.3.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"job_config"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"args"</span><span class="p">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nt">"configs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Reduce tasks."</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"mapred.reduce.tasks"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"-1"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">"params"</span><span class="p">:</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">]</span>
<span class="p">}</span>
</pre></div>
</div>
</dd>
</dl>
<p>This is an abbreviated example that shows imaginary config hints.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The python-saharaclient should be extended to support this as well:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>$ sahara job-types-list [--type] [--plugin [--plugin-version]]
</pre></div>
</div>
<p>Output should look like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="o">+---------------------+-----------------------------------+</span>
<span class="o">|</span> <span class="n">name</span> <span class="o">|</span> <span class="n">plugin</span><span class="p">(</span><span class="n">versions</span><span class="p">)</span> <span class="o">|</span>
<span class="o">+---------------------+-----------------------------------+</span>
<span class="o">|</span> <span class="n">Hive</span> <span class="o">|</span> <span class="n">vanilla</span><span class="p">(</span><span class="mf">1.2.1</span><span class="p">),</span> <span class="n">hdp</span><span class="p">(</span><span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">2.0.6</span><span class="p">)</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">Java</span> <span class="o">|</span> <span class="n">vanilla</span><span class="p">(</span><span class="mf">1.2.1</span><span class="p">),</span> <span class="n">hdp</span><span class="p">(</span><span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">2.0.6</span><span class="p">)</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">MapReduce</span> <span class="o">|</span> <span class="n">vanilla</span><span class="p">(</span><span class="mf">1.2.1</span><span class="p">),</span> <span class="n">hdp</span><span class="p">(</span><span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">2.0.6</span><span class="p">)</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">MapReduce</span><span class="o">.</span><span class="n">Streaming</span> <span class="o">|</span> <span class="n">vanilla</span><span class="p">(</span><span class="mf">1.2.1</span><span class="p">),</span> <span class="n">hdp</span><span class="p">(</span><span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">2.0.6</span><span class="p">)</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">Pig</span> <span class="o">|</span> <span class="n">vanilla</span><span class="p">(</span><span class="mf">1.2.1</span><span class="p">),</span> <span class="n">hdp</span><span class="p">(</span><span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">2.0.6</span><span class="p">)</span> <span class="o">|</span>
<span class="o">+---------------------+-----------------------------------+</span>
</pre></div>
</div>
<p>Since config hints can return a large amount of information, and
description fields in particular can contain a lot of text, how to support
config hints through the python-saharaclient is TBD.</p>
<p>As noted above, the <cite>Plugin SPI</cite> will be extended with optional
methods. Existing plugins that support EDP will be modified as
part of this change.</p>
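<p>As a rough illustration only, the optional Plugin SPI additions might take a
shape like the sketch below; the method names, signatures, and return shapes
here are assumptions, not the final interface:</p>

```python
# Hypothetical sketch of optional Plugin SPI methods for job types;
# names and signatures are illustrative assumptions, not the final API.
class ProvisioningPluginBase(object):
    def get_edp_job_types(self, versions=None):
        """Map plugin version -> list of supported EDP job types.

        Optional: plugins without EDP support can keep this default.
        """
        return {}

    def get_edp_config_hints(self, job_type, version):
        """Return config hints for a job type and plugin version."""
        return {"job_config": {"configs": [], "args": {}, "params": {}}}
```

A plugin that supports EDP would override both methods; one that does not can
simply inherit the empty defaults, which keeps the methods optional.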
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The UI will be able to take advantage of this information
and filter the job types available to the user on the forms.
It will also be able to make use of config hints.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tmckay</p>
</dd>
<dt>Other contributors:</dt><dd><p>none</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add basic endpoint support with optional methods in the plugin SPI</p></li>
<li><dl class="simple">
<dt>Implement the methods for each plugin that supports EDP</dt><dd><p>This can be done as a series of separate small CRs</p>
</dd>
</dl>
</li>
<li><p>Add support to python-saharaclient</p></li>
<li><p>Update documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit tests</p></li>
<li><p>Tempest tests for API</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The new endpoint should be added to the REST API documentation.</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Fri, 18 Aug 2017 00:00:00 [EDP] Allow editing datasource objectshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/edp-edit-data-sources.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-edit-data-sources">https://blueprints.launchpad.net/sahara/+spec/edp-edit-data-sources</a></p>
<p>Currently there is no way to edit a datasource object. If a path needs
to be changed, for example, the datasource must be deleted and a new
one created. The most common use case is a situation where a user
creates a datasource, runs a job, and receives an error from the job
because the path does not exist. Although it is not strictly necessary,
editable datasource objects would be a convenience when a user needs to
correct a path or credentials.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are no API methods for updating a datasource object in the
REST API or at the conductor level.</p>
<p>The only way to correct a datasource is to delete an existing one and
create a new one with corrected information. Although it is possible to
use the same name, the object id will be different.</p>
<p>If editing is allowed, a user only needs to do a single operation to
make corrections. Additionally, the id is preserved so that objects which
reference it will reference the corrected path.</p>
<p>In the general case, editing a datasource should not be a problem for
a job execution which references it. Once a job execution enters the “RUNNING”
state, any information in datasource objects it references has been extracted
and passed to the process running the job. Consequently, editing a datasource
referenced by running or completed jobs will cause no errors. On relaunch,
a job execution will extract the current information from the datasource.</p>
<p>There is only a small window where perhaps editing should not be allowed.
This is when a datasource object is referenced by a job execution in the
“PENDING” state. At this point, information has not yet been extracted
from the datasource object, and a change during this window would
cause the job to run with paths other than the ones that existed at submission
time.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add an update operation to the REST API for datasource objects. Do not
allow updates for datasource objects that are referenced by job executions
in the “PENDING” state (this can be checked during validation).</p>
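<p>The validation step described above could be sketched as follows; the
function name and the data shapes used here are assumptions for illustration,
not the actual sahara internals:</p>

```python
# Illustrative validation for datasource updates: reject the update if
# any job execution that references the datasource is still PENDING.
# The field names ("data_source_ids", "info") are assumptions.
def check_data_source_update(data_source_id, job_executions):
    for je in job_executions:
        refs = je.get("data_source_ids", [])
        status = je.get("info", {}).get("status")
        if data_source_id in refs and status == "PENDING":
            raise ValueError(
                "Cannot update data source %s: it is referenced by a "
                "PENDING job execution" % data_source_id)
```

Executions in any other state (RUNNING, SUCCEEDED, and so on) pass the check,
matching the reasoning above that their URLs have already been extracted.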
<p>Datasource objects referenced by job executions that are not in the PENDING
state may be changed. In an existing blueprint and related CR (listed in the
reference section) the URLs used by a job execution will be recorded in
the job execution when the job enters the RUNNING state. This means that for
any running or completed job execution, the list of exact datasource URLs
used in the execution will be available from the job execution itself even
if the referenced datasource has been edited.</p>
<p>Allow any fields in a datasource object to be updated except for id.
The object id should be preserved.</p>
<p>Add the corresponding update operation to the python-saharaclient.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Backward compatibility will be maintained since this is a new endpoint.</p>
<p><strong>PUT /v1.1/{tenant_id}/data-sources/{data_source_id}</strong></p>
<p>Normal Response Code: 202 (ACCEPTED)</p>
<p>Errors: 400 (BAD REQUEST), 404 (NOT FOUND)</p>
<p>Update the indicated datasource object</p>
<dl>
<dt><strong>Example</strong></dt><dd><p><strong>request</strong></p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span/>PUT http://sahara/v1.1/{tenant_id}/data-sources/{data_source_id}
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"some description"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my_input"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift://container/correct_path"</span>
<span class="p">}</span>
</pre></div>
</div>
<p><strong>response</strong></p>
<div class="highlight-http notranslate"><div class="highlight"><pre><span/><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">202</span> <span class="ne">ACCEPTED</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/json</span>
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"created_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2015-04-08 20:27:13"</span><span class="p">,</span>
<span class="w">  </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"some description"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"7b25fc64-5913-4bc3-aaf4-f82ad03ea2bc"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my_input"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"tenant_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"33724d3bf3114ae9b8ab1c170e22926f"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"updated_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2015-04-09 10:27:13"</span><span class="p">,</span>
<span class="w">  </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift://container/correct_path"</span>
<span class="p">}</span>
</pre></div>
</div>
</dd>
</dl>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>This operation should be added to the python-saharaclient API as well:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>$ sahara data-source-update [--name NAME] [--id ID] [--json]
</pre></div>
</div>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>To take advantage of this from the Horizon UI, we would need a selectable
“Edit” action for each datasource on the datasources page</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Trevor McKay</p>
</dd>
<dt>Other contributors:</dt><dd><p>Chad Roberts</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add REST and support methods to Sahara</p></li>
<li><p>Add operation to python-saharaclient</p></li>
<li><p>Add operation to datasource screens in Horizon</p></li>
<li><p>Add to WADL in api-ref</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests in Sahara and python-saharaclient</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Potentially any user documentation that talks about relaunch, or
editing of other objects like templates</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-datasource-placeholders">https://blueprints.launchpad.net/sahara/+spec/edp-datasource-placeholders</a>
<a class="reference external" href="https://review.openstack.org/#/c/158909/">https://review.openstack.org/#/c/158909/</a></p>
</section>
Fri, 18 Aug 2017 00:00:00 [EDP] Allow editing job binarieshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/edp-edit-job-binaries.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-edit-job-binaries">https://blueprints.launchpad.net/sahara/+spec/edp-edit-job-binaries</a></p>
<p>Currently there is no way to edit a job binary. If a path needs
to be changed, for example, the job binary must be deleted and a new
one created. The most common use case is a situation where a user
creates a job binary, runs a job, and receives an error from the job
because the path does not exist. Although it is not strictly necessary,
editable job binary objects would be a convenience when a user needs to
correct a path or credentials.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are no API methods for updating a job binary object in the
REST API or at the conductor level.</p>
<p>The only way to correct a job binary is to delete an existing one and
create a new one with corrected information. Although it is possible to
use the same name, the object id will be different.</p>
<p>If editing is allowed, a user only needs to do a single operation to
make corrections. Additionally, the id is preserved so that objects which
reference it will reference the corrected path.</p>
<p>In the general case, editing a job binary should not be a problem for
a job object that references it. Once a job execution enters the “RUNNING”
state, any job binary objects it references indirectly through the job object
have been uploaded to the cluster for execution. Consequently, editing a
job binary object will cause no errors.</p>
<p>There is only a small window where editing should not be allowed.
This is when a job binary object is referenced by a job execution in the
“PENDING” state. At this point, binaries have not yet been uploaded to the
cluster and a change during this window would cause the job to run with
paths other than the ones that existed at submission time.</p>
<p>Note, the paths of binaries used by a job execution should be recorded in
the job execution. This will remove a restriction on editing of paths in
a job binary that is referenced by an existing job execution. This will be
done in a separate blueprint listed in the references section (similar
recording of data source paths used during an execution is supported in
another blueprint).</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add an update operation to the REST API for job binary objects. Do not
allow updates for job binaries that are referenced by job executions
in the “PENDING” state (this can be checked during validation).</p>
<p>Allow the following fields in the job binary to be edited:</p>
<ul class="simple">
<li><p>name</p></li>
<li><p>description</p></li>
<li><p>url if the value is not an “internal-db://” path</p></li>
</ul>
<p>For binaries stored in the Sahara database, the URL is generated by
Sahara and should not be editable.</p>
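<p>The field restrictions above can be sketched roughly as follows; the
function and constant names are illustrative assumptions, not the actual
validation code:</p>

```python
# Sketch of the update restrictions: only name, description, and url may
# change, and url only when the binary is not stored in the sahara
# database (an "internal-db://" URL). Names here are assumptions.
EDITABLE_FIELDS = {"name", "description", "url"}

def validate_job_binary_update(current_url, update):
    extra = set(update) - EDITABLE_FIELDS
    if extra:
        raise ValueError("Cannot update fields: %s" % sorted(extra))
    if "url" in update and current_url.startswith("internal-db://"):
        raise ValueError("URL of an internal-db:// binary is not editable")
```

The check rejects unknown fields outright rather than silently ignoring them,
which surfaces client mistakes early.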
<p>Add the corresponding update operation to the python-saharaclient.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Backward compatibility will be maintained since this is a new endpoint.</p>
<p><strong>PUT /v1.1/{tenant_id}/job-binaries/{job_binary_id}</strong></p>
<p>Normal Response Code: 202 (ACCEPTED)</p>
<p>Errors: 400 (BAD REQUEST), 404 (NOT FOUND)</p>
<p>Update the indicated job-binary object</p>
<dl>
<dt><strong>Example</strong></dt><dd><p><strong>request</strong></p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span/>PUT http://sahara/v1.1/{tenant_id}/job-binaries/{job_binary_id}
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"some description"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my.jar"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift://container/correct_path"</span>
<span class="p">}</span>
</pre></div>
</div>
<p><strong>response</strong></p>
<div class="highlight-http notranslate"><div class="highlight"><pre><span/><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">202</span> <span class="ne">ACCEPTED</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/json</span>
</pre></div>
</div>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"created_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2015-04-08 20:48:18"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"640ca841-d4d9-48a1-a838-6aa86b12520f"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my.jar"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"tenant_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"33724d3bf3114ae9b8ab1c170e22926f"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"updated_at"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2015-04-09 10:48:18"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift://container/correct_path"</span>
<span class="p">}</span>
</pre></div>
</div>
</dd>
</dl>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>This operation should be added to the python-saharaclient API as well:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>$ sahara job-binary-update [--name NAME] [--id ID] [--json]
</pre></div>
</div>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>To take advantage of this from the Horizon UI, we would need a selectable
“Edit” action for each job binary on the job binaries page</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Trevor McKay</p>
</dd>
<dt>Other contributors:</dt><dd><p>Chad Roberts</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add REST and support methods to Sahara</p></li>
<li><p>Add operation to python-saharaclient</p></li>
<li><p>Add operation to job binary screens in Horizon</p></li>
<li><p>Add to WADL in api-ref</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This is a blueprint to store the job binary paths in the job execution object.
Implementing it first will allow editing of job binaries as long as the job
executions that reference them are not in the PENDING state. Otherwise, editing
will have to be disallowed for job binaries referenced by any existing job
execution.</p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-store-binary-paths-in-job-executions">https://blueprints.launchpad.net/sahara/+spec/edp-store-binary-paths-in-job-executions</a></p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests in Sahara and python-saharaclient</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Potentially any user documentation that talks about
editing of other objects like templates</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 18 Aug 2017 00:00:00 Admin API for Managing Pluginshttps://specs.openstack.org/openstack/sahara-specs/specs/newton/plugin-management-api.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/plugin-management-api">https://blueprints.launchpad.net/sahara/+spec/plugin-management-api</a></p>
<p>This is a proposal to implement a new admin API for managing plugins
across different projects. Smarter deprecation can also be implemented
with the new plugin management API.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Right now we have the following problems:</p>
<blockquote>
<div><ul class="simple">
<li><p>the plugin state is totally unclear to the user. From the user’s
point of view, there are only default plugins, which are enabled for
all projects;</p></li>
<li><p>deprecation of a particular plugin version is not handled well right
now: a validation error with a deprecation note is raised during
creation, which is a poor experience for the user;</p></li>
<li><p>all plugins and versions are enabled for all projects, which is not
ideal. One user may want to use only the Ambari plugin, while another
wants to use only the CDH plugin (or a particular version)</p></li>
</ul>
</div></blockquote>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It’s proposed to implement several new API calls for plugin management.
The plan is to support this correctly in API v1.1 so that it can be carried
over to API v2 without additional changes.</p>
<p>First of all, all plugins should be enabled by default. There is no reason
to continue separating default and non-default plugins; there is agreement
that the default/non-default distinction is redundant, because it prevents
delivering potentially non-stable plugins even for testing in selected
projects.</p>
<p>Additionally, as part of this blueprint, a migration of <code class="docutils literal notranslate"><span class="pre">hadoop_version</span></code>
to <code class="docutils literal notranslate"><span class="pre">plugin_version</span></code> should be done to keep the API consistent with the new
management API.</p>
<p>A new database table should be created to store all required metadata about
the current state of each plugin. This metadata is a combination of labels for
the plugin itself and for each version of the plugin. A plugin label is an
indicator of the state of the plugin (or of a version) that will help the user
understand the current state of the plugin (such as stability or deprecation
status) or of some of its features. If no metadata is stored in the DB, a
plugin SPI method will return default metadata for the plugin. This will also
help avoid possible issues when upgrading from older releases of sahara.</p>
<p>This metadata should describe each plugin and its versions. The example
of return value of this plugin SPI method is the following:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"version_labels"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"2.3"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"stable"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"enabled"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"2.2"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"deprecated"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"enabled"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"plugin_labels"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"enabled"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"status"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>When all plugins are requested, the PluginManager will collect all data
stored in the DB and merge it with the default data for plugins that have no
DB entry. For each label entry, the manager will additionally include the
label’s description and whether the label can be changed by admins. The
collected data will be exposed to the user; see the example return value in
the REST API section below.</p>
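<p>A minimal sketch of this merge, using illustrative names and structures
rather than the actual PluginManager code:</p>

```python
# Sketch: merge SPI default labels with DB overrides and attach each label's
# description and mutability before exposing the data to the user.
LABEL_DETAILS = {
    "enabled": {"description": "Indicates that plugin is switched on",
                "mutable": True},
    "stable": {"description": "Plugin stability", "mutable": False},
}

def merge_labels(spi_defaults, db_entry=None):
    db_entry = db_entry or {}
    merged = {}
    for name, default in spi_defaults.items():
        # A DB record, when present, overrides the SPI-provided status.
        status = db_entry.get(name, default)["status"]
        merged[name] = dict(LABEL_DETAILS[name], status=status)
    return merged

merged = merge_labels({"enabled": {"status": True}, "stable": {"status": True}},
                      {"enabled": {"status": False}})
```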
<p>The initial set of labels is the following:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">enabled</span></code> if a plugin is enabled, each user will be able to use the
plugin for cluster creation and to perform all CRUD operations on
the cluster. If a plugin doesn’t have this label, only deletion can be
performed on its clusters.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">hidden</span></code> if a plugin is hidden, it is still available for performing
actions via saharaclient, but it will be hidden from the UI and the CLI. This
is a special tag for the fake plugin.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">stable</span></code> the plugin is stable enough to be used. Sahara CI should be
enabled for the plugin to prove its stability in terms of cluster creation
and running EDP jobs. This label can’t be removed; it will not
be stored in the DB and will be handled by the plugin SPI method only.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">deprecated</span></code> the plugin is deprecated and not intended to be used.
Warnings about deprecation will be shown for this plugin. Sahara CI (nightly)
will continue testing this plugin to prove it still works well. The label
can’t be removed. Recommendations should be provided about which operations
are available for such clusters.</p></li>
</ul>
</div></blockquote>
<p>An admin user will be able to perform <code class="docutils literal notranslate"><span class="pre">PATCH</span></code> actions on labels via the API,
provided the label is mutable. <code class="docutils literal notranslate"><span class="pre">oslo_policy</span></code> will be used to verify
that the user has the admin role. Only the status of each label can be
changed; mutability and description can’t be changed.</p>
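<p>A sketch of the label-update validation (the admin-role check itself would
be delegated to <code class="docutils literal notranslate"><span class="pre">oslo_policy</span></code>; all names below are illustrative, not the
real sahara code):</p>

```python
# Sketch: reject updates that touch immutable labels or any field other than
# "status".  The real endpoint would also enforce the admin role via
# oslo_policy before reaching this point.
MUTABLE = {"enabled": True, "hidden": True, "stable": False, "deprecated": False}

def validate_labels_patch(labels):
    errors = []
    for name, update in labels.items():
        if not MUTABLE.get(name, False):
            errors.append("label '%s' cannot be changed" % name)
        if set(update) - {"status"}:
            errors.append("only 'status' may be updated for '%s'" % name)
    return errors

ok = validate_labels_patch({"enabled": {"status": False}})
bad = validate_labels_patch({"stable": {"status": True},
                             "enabled": {"mutable": False}})
```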
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>A new table is needed for this feature to store data about plugin labels.</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">+----------------+--------------+</span>
<span class="go">| plugins | column type |</span>
<span class="go">+----------------+--------------+</span>
<span class="go">| tenant_id | String |</span>
<span class="go">+----------------+--------------+</span>
<span class="go">| plugin_labels | JsonDictType |</span>
<span class="go">+----------------+--------------+</span>
<span class="go">| version_labels | JsonDictType |</span>
<span class="go">+----------------+--------------+</span>
<span class="go">| id (Unique) | String |</span>
<span class="go">+----------------+--------------+</span>
<span class="go">| name | String |</span>
<span class="go">+----------------+--------------+</span>
</pre></div>
</div>
<p>A simple example of stored data:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go">    "name": "fake",</span>
<span class="go">    "plugin_labels": {</span>
<span class="go">        "enabled": {</span>
<span class="go">            "status": true</span>
<span class="go">        }</span>
<span class="go">    },</span>
<span class="go">    "tenant_id": "uuid",</span>
<span class="go">    "id": "uuid just to be unique",</span>
<span class="go">    "version_labels": {</span>
<span class="go">        "0.1": {</span>
<span class="go">            "enabled": {</span>
<span class="go">                "status": true</span>
<span class="go">            }</span>
<span class="go">        }</span>
<span class="go">    }</span>
<span class="go">}</span>
</pre></div>
</div>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>A number of changes to the REST API are going to be made.</p>
<p>Endpoint changes:</p>
<ol class="arabic simple">
<li><p>for <code class="docutils literal notranslate"><span class="pre">GET</span></code> <code class="docutils literal notranslate"><span class="pre">/plugins</span></code>, the following output will be expected after
implementation. All labels will additionally be serialized with their
description and mutability.</p></li>
</ol>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go">    "plugins": [</span>
<span class="go">        {</span>
<span class="go">            "description": "HDP plugin with Ambari",</span>
<span class="go">            "versions": [</span>
<span class="go">                "2.3",</span>
<span class="go">                "2.4"</span>
<span class="go">            ],</span>
<span class="go">            "name": "ambari",</span>
<span class="go">            "plugin_labels": {</span>
<span class="go">                "enabled": {</span>
<span class="go">                    "description": "Indicates that plugin is switched on",</span>
<span class="go">                    "mutable": true,</span>
<span class="go">                    "status": true</span>
<span class="go">                }</span>
<span class="go">            },</span>
<span class="go">            "version_labels": {</span>
<span class="go">                "2.3": {</span>
<span class="go">                    "enabled": {</span>
<span class="go">                        "description": "Indicates that version is switched on",</span>
<span class="go">                        "mutable": true,</span>
<span class="go">                        "status": false</span>
<span class="go">                    },</span>
<span class="go">                    "deprecated": {</span>
<span class="go">                        "description": "Plugin is deprecated, but can be used",</span>
<span class="go">                        "mutable": false,</span>
<span class="go">                        "status": true</span>
<span class="go">                    },</span>
<span class="go">                    "stable": {</span>
<span class="go">                        "description": "Plugin stability",</span>
<span class="go">                        "mutable": false,</span>
<span class="go">                        "status": false</span>
<span class="go">                    }</span>
<span class="go">                },</span>
<span class="go">                "2.4": {</span>
<span class="go">                    "enabled": {</span>
<span class="go">                        ..</span>
<span class="go">                    },</span>
<span class="go">                    "stable": {</span>
<span class="go">                        ..</span>
<span class="go">                    }</span>
<span class="go">                }</span>
<span class="go">            },</span>
<span class="go">            "title": "HDP Plugin"</span>
<span class="go">        }</span>
<span class="go">    ]</span>
<span class="go">}</span>
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>a new <code class="docutils literal notranslate"><span class="pre">PATCH</span> <span class="pre">/plugins/<name></span></code> endpoint, intended for updating labels of a
plugin and/or its versions. The update will succeed only if all modified
labels are mutable. Validation will ensure that only the status of each
label is updated. To update a label, send a request containing only that
label in the body; mutability and description are fields that can’t be
changed.</p></li>
</ol>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go">    "plugin_labels": {</span>
<span class="go">        "enabled": {</span>
<span class="go">            "status": false</span>
<span class="go">        }</span>
<span class="go">    },</span>
<span class="go">    "version_labels": {</span>
<span class="go">        "2.3": {</span>
<span class="go">            "enabled": {</span>
<span class="go">                "status": true</span>
<span class="go">            }</span>
<span class="go">        },</span>
<span class="go">        "2.4": {</span>
<span class="go">            "enabled": {</span>
<span class="go">                "status": false</span>
<span class="go">            }</span>
<span class="go">        }</span>
<span class="go">    }</span>
<span class="go">}</span>
</pre></div>
</div>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The CLI will be extended with plugin update commands. Warnings about the
deprecation label will be added too.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Nothing additional is required from deployers; however, they should be
notified about the new default value of the <code class="docutils literal notranslate"><span class="pre">plugins</span></code> option.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Things to do:</p>
<ol class="arabic simple">
<li><p>A new tab for plugin management should be implemented. All labels will
be shown in this tab, and each label will have a checkbox that adds the label
to the plugin. Only admins will be able to make changes.</p></li>
<li><p>A warning regarding the deprecation label will be added to the
template/cluster creation tabs. If only one plugin is enabled, the dropdown
for plugin choice will be omitted, and the same applies to the version. If
only one plugin and version are enabled, the plugin choice step will be
skipped entirely.</p></li>
</ol>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev (Vitaly Gridnev)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The following items should be covered:</p>
<blockquote>
<div><ul class="simple">
<li><p>enable all plugins by default;</p></li>
<li><p>implement database side;</p></li>
<li><p>new API methods should be added;</p></li>
<li><p>plugin SPI method for default metadata;</p></li>
<li><p>document new api features in API docs;</p></li>
<li><p>python-saharaclient implementation;</p></li>
<li><p>sahara-dashboard changes</p></li>
</ul>
</div></blockquote>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The feature will be covered by unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>All plugin labels should be documented properly.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 18 Aug 2017 00:00:00 Deprecation of CentOS 6 imageshttps://specs.openstack.org/openstack/sahara-specs/specs/pike/deprecate-centos6-images.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/deprecate-centos6-images">https://blueprints.launchpad.net/sahara/+spec/deprecate-centos6-images</a></p>
<p>Starting from the Newton release, the images based on CentOS 6 are also
available with CentOS 7, sometimes even with more choices of CentOS 7
versions (CDH).
This spec proposes to deprecate and remove the support for CentOS 6
images.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Keeping the support for CentOS 6 is increasingly difficult.</p>
<p>The cloud image provided by the CentOS team cannot be used as it is
(lack of resize support), so a special image must be prepared. This is
<a class="reference external" href="http://git.openstack.org/cgit/openstack/sahara-image-elements/tree/diskimage-create/Create_CentOS_cloud_image.rst?h=6.0.1">documented</a>,
but the default image, which must be manually regenerated, is hosted on
sahara-files.mirantis.com, which will be discontinued.</p>
<p>Also, diskimage-builder’s support for CentOS 6 is not as effective
as it should be, as most of the focus is (rightfully) on CentOS 7.</p>
<p>Examples of issues which require a workaround:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://bugs.launchpad.net/diskimage-builder/+bug/1534387">https://bugs.launchpad.net/diskimage-builder/+bug/1534387</a></p></li>
<li><p><a class="reference external" href="https://bugs.launchpad.net/diskimage-builder/+bug/1477179">https://bugs.launchpad.net/diskimage-builder/+bug/1477179</a></p></li>
</ul>
<p>A blocker bug right now is:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://bugs.launchpad.net/diskimage-builder/+bug/1698551">https://bugs.launchpad.net/diskimage-builder/+bug/1698551</a></p></li>
</ul>
<p>The (non-blocking, even though they should be blocking) gate jobs for
sahara-image-elements fail due to the latter.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The support for CentOS 6 images should be deprecated starting from Pike and
removed as soon as compliance with the follows-standard-deprecation
tag allows us to do so.</p>
<p>The change mainly affects sahara-image-elements. CentOS 6 images would no
longer be built by default when building all the images for a certain plugin,
and a warning message would be printed if one of them is explicitly selected.</p>
<p>The code paths which check for CentOS 6 in Sahara services should be kept
as they are and not changed as long as the feature is available, even if
deprecated; after the removal the code can be restructured, if needed,
to no longer consider the CentOS 6 use case.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep CentOS 6 support until it is retired officially (November 30, 2020)
or until diskimage-builder removes the support, but make sure that the
current issues are fixed. A change is needed anyway in the
sahara-image-elements jobs, as the building fails right now.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users won’t be able to use CentOS 6 as base.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Most of the described changes are in sahara-image-elements (see above).</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Minor: remove the reference to CentOS 6 and the default cloud image
from the image registration panel when the feature is removed.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ltoscano</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>do not build CentOS 6 image by default for a certain plugin</p></li>
<li><p>add a warning message if one of them is requested</p></li>
<li><p>inform the operators (openstack-operators@) about the change to evaluate
the time for the removal</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>If the change is implemented, the existing jobs for sahara-image-elements
will only test the supported images and won’t fail.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Add or change the list of supported base images.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 12 Jul 2017 00:00:00 Node Group Template And Cluster Template Portabilityhttps://specs.openstack.org/openstack/sahara-specs/specs/pike/node-group-template-and-cluster-template-portability.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/portable-node-group-and-cluster-templates">https://blueprints.launchpad.net/sahara/+spec/portable-node-group-and-cluster-templates</a></p>
<p>Sahara allows creation of node group templates and cluster templates. However,
when there is a need to create a template with the same parameters on another
OpenStack deployment, one must create a new template, re-entering the
parameters by hand. This change proposes functions to export these templates
to JSON files and import them later.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara provides the ability to create templates for node groups and clusters.
In the case where the user has access to multiple clouds or is constantly
redeploying a development environment, it is very time consuming to recreate
all the templates. We aim to give the user the option to download a
template and to upload an existing template to Sahara for a quicker setup.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change proposes to allow import and export of cluster templates and node
group templates. It uses node_group_template_get(node_group_template_id) and
node_group_template_create(parameters) to connect to the database.
Templates will be sanitized before export so that IDs and other sensitive
information are not exported. Some additional information will be requested
when importing a template.</p>
<p>REST API changes:</p>
<ul class="simple">
<li><p>node_group_template_export(node_group_template_id) - exports a node group
template to a JSON file</p></li>
<li><p>cluster_template_export(cluster_template_id) - exports a cluster template
to a JSON file</p></li>
</ul>
<p>UI changes:</p>
<ul class="simple">
<li><p>field for exporting a node group template</p></li>
<li><p>field for importing a node group template - uses node group template create</p></li>
<li><p>field for exporting a cluster template</p></li>
<li><p>field for importing a cluster template - uses cluster template create</p></li>
</ul>
<p>CLI changes:</p>
<ul class="simple">
<li><p>dataprocessing node group template export</p></li>
<li><p>dataprocessing node group template import</p></li>
<li><p>dataprocessing cluster template export</p></li>
<li><p>dataprocessing cluster template import</p></li>
</ul>
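<p>A minimal sketch of the export sanitization, assuming illustrative field
names (the exact set of fields to strip would be decided during
implementation):</p>

```python
import json

# Fields that are deployment-specific and must not travel between clouds.
SENSITIVE_FIELDS = ("id", "tenant_id", "created_at", "updated_at")

def export_template(template):
    """Serialize a template to JSON, dropping sensitive fields."""
    portable = {k: v for k, v in template.items() if k not in SENSITIVE_FIELDS}
    return json.dumps(portable, indent=2, sort_keys=True)

exported = export_template({"id": "uuid1", "tenant_id": "uuid2",
                            "name": "worker", "flavor_id": "2",
                            "plugin_name": "fake"})
```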
<section id="alternatives">
<h3>Alternatives</h3>
<p>A clear alternative is to let things be the way they are, but that makes
Sahara tedious to configure and reconfigure when it has already been
configured elsewhere.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<ul class="simple">
<li><p>node-group-templates/{NODE_GROUP_TEMPLATE_ID}/export</p></li>
<li><p>cluster-templates/{CLUSTER_TEMPLATE_ID}/export</p></li>
</ul>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Export and import of node group templates and cluster templates available in
both CLI and UI.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Simplified deployment; the deployer can download pre-existing templates if
needed.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>An option to export and another to import a template will be added. An option
to import a template will have needed fields to complete the template.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<ul class="simple">
<li><p>Iwona</p></li>
<li><p>tellesmvn</p></li>
</ul>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add export of a node group template</p></li>
<li><p>Add import of a node group template</p></li>
<li><p>Add export of a cluster template</p></li>
<li><p>Add import of a cluster template</p></li>
<li><p>Testing</p></li>
<li><p>Documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation about new features will be added.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 26 Jun 2017 00:00:00 Support for Boot from Volumehttps://specs.openstack.org/openstack/sahara-specs/specs/backlog/boot-from-volume.html
<p>This specification proposes to add boot from volume capability to Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The Sahara engine provisions VMs using a Glance image directly. In most
installations this means that the image is copied as a local file to the
Nova-compute host and used as the root disk for the VM.</p>
<p>Having the root disk as a plain file introduces some limitations:</p>
<ul class="simple">
<li><p>Nova VM live migration (or Host evacuation) is not possible.</p></li>
<li><p>Root disk performance may significantly suffer if QCOW2 format is used.</p></li>
</ul>
<p>Replacing the root disk with a bootable volume backed by a distributed
backend or local disk storage will remove the limitations listed above.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The ability to boot from volume still requires an image to serve as the
source. This means that volume-based provisioning will require two steps.</p>
<ul class="simple">
<li><p>Create a bootable volume from a registered Sahara image.</p></li>
<li><p>Boot a VM with the block_device_mapping parameter pointing to the created
volume.</p></li>
</ul>
<p>Volume-based provisioning requires the volume size to be set explicitly, so
Sahara should take care of this parameter. The proposed way to set the
bootable volume size is to take the root disk size of the flavor being used
for the VM.</p>
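<p>A sketch of deriving the volume size from the flavor, with a stand-in
flavor object (the real code would read the root disk size from the Nova
flavor; names here are illustrative):</p>

```python
class Flavor:
    """Stand-in for a Nova flavor; only the root disk size matters here."""
    def __init__(self, disk):
        self.disk = disk  # root disk size in GB

def bootable_volume_size(flavor):
    # Proposed behavior: the bootable volume simply takes the root disk
    # size of the flavor used for the VM.
    return flavor.disk

size = bootable_volume_size(Flavor(40))
```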
<p>If the user wants the Node Group to be provisioned from a volume, they
should set the boot_from_volume flag to True.</p>
<p>Volume-based provisioning is different from image-based provisioning and
implies the following changes.</p>
<p>The image parameter should be removed from the instance template.</p>
<p>The instance Heat template should have a new section with the block device
mapping.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span/><span class="nt">block_device_mapping</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[{</span>
<span class="nt"> device_name</span><span class="p">:</span><span class="w"> </span><span class="s">"vda"</span><span class="p p-Indicator">,</span>
<span class="nt"> volume_id </span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{</span>
<span class="nt"> get_resource </span><span class="p">:</span><span class="w"> </span><span class="nv">bootable_volume</span><span class="w"> </span><span class="p p-Indicator">},</span>
<span class="nt"> delete_on_termination </span><span class="p">:</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="p p-Indicator">}</span>
<span class="p p-Indicator">]</span>
</pre></div>
</div>
<p>The resource group definition should have a volume added by the following
template:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span/><span class="nt">bootable_volume</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">OS::Cinder::Volume</span>
<span class="w"> </span><span class="nt">properties</span><span class="p">:</span>
<span class="w"> </span><span class="nt">size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain"><size derived from the flavor></span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain"><regular Sahara image></span>
</pre></div>
</div>
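<p>The two Heat snippets above could be produced by the template builder
roughly as follows (a simplified sketch; the real sahara Heat-template code
differs, and the function names are assumptions):</p>

```python
def bootable_volume_resource(image, size):
    """Cinder volume resource created from a registered Sahara image."""
    return {"bootable_volume": {
        "type": "OS::Cinder::Volume",
        "properties": {"size": size, "image": image},
    }}

def block_device_mapping():
    """Mapping attached to the instance in place of the image parameter."""
    return [{"device_name": "vda",
             "volume_id": {"get_resource": "bootable_volume"},
             "delete_on_termination": True}]

volume = bootable_volume_resource("fedora-sahara", 20)
bdm = block_device_mapping()
```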
<section id="alternatives">
<h3>Alternatives</h3>
<p>Alternatively, the user may be allowed to choose an existing volume to boot
from. This, however, cannot guarantee that the provided volume is suitable for
the cluster installation. Sahara also requires username metadata to be able to
log into the VM, and this metadata is only stored in images right now.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Node Group Template, Node Group and Templates relation objects should now
have a boolean boot_from_volume field.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The boot_from_volume flag should be added to all endpoints responsible
for Node Group manipulation.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>An option should be added to the Node Group create and update forms.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Nikita Konovalov <a class="reference external" href="mailto:nkonovalov%40mirantis.com">nkonovalov<span>@</span>mirantis<span>.</span>com</a></p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement boot from volume support at the backend with the db migration and
Heat templates update.</p></li>
<li><p>Add boot_from_volume flag support in python-saharaclient</p></li>
<li><p>Add boot_from_volume flag support in sahara-dashboard</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit test coverage in sahara and python-saharaclient repositories.</p></li>
<li><p>Integration test coverage in sahara-tests framework.</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<ul class="simple">
<li><p>REST API documents should be updated.</p></li>
<li><p>General user documentation should describe the behavior introduced by the
boot_from_volume flag.</p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 10 May 2017 00:00:00 Revival of “Chronic EDP” workhttps://specs.openstack.org/openstack/sahara-specs/specs/backlog/chronic-edp-revival.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs">https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs</a>
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/recurrence-edp-job">https://blueprints.launchpad.net/sahara/+spec/recurrence-edp-job</a>
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-suspend-resume-ability-for-edp-jobs">https://blueprints.launchpad.net/sahara/+spec/add-suspend-resume-ability-for-edp-jobs</a></p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Three specs about time-related EDP job submission are partially completed.
The work could be revived.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>With reference to existing specs/patches, revive any or all of the 3 specs.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep the specs in their current state, either abandoned or partially complete.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="image-packing-impact">
<h3>Image Packing impact</h3>
<p>Refer to existing specs.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Refer to existing specs.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>None</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Enumerate and verify what has already been done</p></li>
<li><p>Decide what work should be revived</p></li>
<li><p>Clean up documentation of existing work</p></li>
<li><p>Follow steps outlined in existing specs for new work</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Refer to existing specs.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Refer to existing specs.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 10 May 2017 00:00:00 Two step scaling with Heat enginehttps://specs.openstack.org/openstack/sahara-specs/specs/backlog/heat-two-step-scaling.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/heat-two-step-scaling">https://blueprints.launchpad.net/sahara/+spec/heat-two-step-scaling</a></p>
<p>In case of a failure during cluster scaling, Heat will roll back the
deletion of all resources. After that, Sahara will ask Heat to delete them
anyway. The “roll back the deletion and then delete again” step looks
unnecessary.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The following scenario happens when Sahara with the Heat engine fails to
scale a cluster:</p>
<ol class="arabic simple">
<li><p>User requested cluster scaling</p></li>
<li><p>Sahara runs hadoop nodes decommissioning</p></li>
<li><p>Sahara runs heat stack update with both added and removed nodes and
rollback_on_failure=True</p></li>
<li><p>If step 3 failed, Heat restores all the deleted nodes</p></li>
<li><p>Sahara runs heat stack update with removed nodes only</p></li>
<li><p>Heat removes nodes one more time</p></li>
</ol>
<p>So, at step 4 Heat restores nodes that will be deleted later anyway.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The described problem can be avoided by scaling in two steps, so the resulting
flow will look like this:</p>
<ol class="arabic simple">
<li><p>User requested cluster scaling</p></li>
<li><p>Sahara runs hadoop nodes decommissioning</p></li>
<li><p>Sahara runs heat stack update with removed resources only and
rollback_on_failure=False</p></li>
<li><p>Sahara runs heat stack update with new resources and
rollback_on_failure=True</p></li>
</ol>
<p>In this case, if step 4 fails, Heat will not try to restore the deleted
resources; it will roll back to the state in which the resources are already
deleted.</p>
<p>If step 3 fails, there is nothing Sahara can do; the cluster will be moved to
the ‘Error’ state.</p>
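<p>The two-step flow can be sketched with a toy model. Here, the
<code>update_stack</code> helper and its <code>fail</code> switch are illustrative
stand-ins, not Heat’s actual API:</p>

```python
class StackUpdateError(Exception):
    pass


def update_stack(stack, add=(), remove=(), rollback_on_failure=True, fail=False):
    """Toy model of a Heat stack update over a set of node names."""
    snapshot = set(stack)
    stack.difference_update(remove)
    stack.update(add)
    if fail:
        if rollback_on_failure:
            # Heat's rollback restores the pre-update state.
            stack.clear()
            stack.update(snapshot)
        raise StackUpdateError("stack update failed")


def scale_two_step(stack, to_remove, to_add, fail_add=False):
    # Step 3: drop decommissioned nodes without rollback; if this fails,
    # the cluster goes to 'Error' state anyway.
    update_stack(stack, remove=to_remove, rollback_on_failure=False)
    # Step 4: add new nodes with rollback; a failure here rolls back only
    # the additions, so the removed nodes stay removed.
    try:
        update_stack(stack, add=to_add, rollback_on_failure=True, fail=fail_add)
    except StackUpdateError:
        pass
    return stack
```

<p>With a failure in the second update, the final stack contains neither the
removed nor the added nodes, which is exactly the rollback target described
above.</p>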
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing, since the issue appears only in the rare scenario of a failed
scale-down.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Perform changes</p></li>
<li><p>Test that rollback on cluster scale failure works as expected</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 10 May 2017 00:00:00
Storm Scaling
https://specs.openstack.org/openstack/sahara-specs/specs/liberty/storm-scaling.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/storm-scaling">https://blueprints.launchpad.net/sahara/+spec/storm-scaling</a></p>
<p>This blueprint aims to implement Scaling for Storm.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The Storm plugin in sahara does not implement scaling yet. This feature is
one of the major attractions of building clusters using sahara.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The scaling feature will be implemented following the implementation in the
Spark plugin.</p>
<p>The implementation will allow users to:</p>
<ul class="simple">
<li><p>Scale up a cluster</p></li>
<li><p>Scale down a cluster</p></li>
</ul>
<p>Storm is a fairly easy tool to scale. Since it uses Zookeeper as a
configuration manager and central point of communication, a new node just
needs to configure itself to communicate with the Zookeeper machine and the
master node will find the new node. One important point that needs to be taken
into consideration is the Storm rebalance action. Once a new node is added, a
running topology will not be rescheduled to use the new instance. We will
call this rebalance action automatically so the user won’t have to worry
about it.</p>
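<p>A sketch of the automatic rebalance call, assuming a sahara-style
remote-execution object (the helper name and the <code>remote</code> interface
are hypothetical, not sahara’s actual utilities):</p>

```python
def rebalance_topologies(remote, topology_names, wait_secs=30):
    """Ask Storm to reschedule each running topology onto the new nodes."""
    for name in topology_names:
        # `storm rebalance <topology> -w <secs>` is Storm's documented CLI
        # command for redistributing workers after the cluster changes size.
        remote.execute_command("storm rebalance %s -w %d" % (name, wait_secs))
```

<p>In practice, sahara would invoke this after scaling completes, so the user
never has to issue the rebalance command manually.</p>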
<section id="alternatives">
<h3>Alternatives</h3>
<p>None.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tellesmvn</p>
</dd>
</dl>
<p>Other contributors:</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement Storm Scaling feature</p></li>
<li><p>Implement topology rebalance</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Follow examples on scaling tests from other plugins to implement unit tests
for Storm scaling.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 10 May 2017 00:00:00
Removal of Direct Engine
https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/remove-direct-engine.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/remove-direct-engine">https://blueprints.launchpad.net/sahara/+spec/remove-direct-engine</a></p>
<p>The Direct Infrastructure Engine was deprecated in Liberty, and it’s time to
remove it in the Mitaka cycle.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>We are ready to remove the direct engine. Sahara has the ability to delete
clusters created using the direct engine, and the heat engine works well.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It is proposed to first migrate all direct-engine gate jobs to the heat
engine. After that migration, the direct engine can be completely removed
from the codebase.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers should switch to use Heat Engine instead of Direct Engine finally.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>This will require following changes:</p>
<ul class="simple">
<li><p>Remove direct engine from the codebase (with unit tests).</p></li>
<li><p>Remove two gate jobs for direct engine.</p></li>
<li><p>Document that Direct Engine was removed finally.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to document that Direct Engine was removed.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 10 May 2017 00:00:00
Run Spark jobs on vanilla Hadoop 2.x
https://specs.openstack.org/openstack/sahara-specs/specs/newton/spark-jobs-for-vanilla-hadoop.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-vanilla-hadoop">https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-vanilla-hadoop</a></p>
<p>This specification proposes to add the ability to run Spark jobs on a cluster
running the vanilla version of Hadoop 2.x (YARN).</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Support exists for running Spark jobs in stand-alone mode and on CDH,
but not on the vanilla version of Hadoop.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add a new edp_engine class in the vanilla v2.x plugin that extends
the SparkJobEngine. Leverage design and code from blueprint:
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0">https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0</a></p>
<p>Configure Spark to run on YARN by setting Spark’s configuration
file (spark-env.sh) to point to Hadoop’s configuration and deploying
that configuration file upon cluster creation.</p>
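<p>The relevant spark-env.sh lines might look like the following sketch; the
paths shown are illustrative assumptions, not the locations sahara actually
uses:</p>

```shell
# Point Spark at Hadoop's configuration so spark-submit can locate the
# YARN ResourceManager; the paths below are examples only.
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_HOME=/opt/spark
```

<p>Setting HADOOP_CONF_DIR is what lets Spark discover the cluster’s YARN and
HDFS settings at job submission time.</p>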
<p>Extend sahara-image-elements to support creating a vanilla image
with Spark binaries (vanilla+spark).</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Without these changes, the only way to run Spark alongside Hadoop MapReduce
is to run on a CDH cluster.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Requires changes to sahara-image-elements to support building a vanilla 2.x
image with Spark binaries. New image type can be vanilla+spark.
Spark version can be fixed at Spark 1.3.1.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>New EDP engine class for the vanilla 2.x plugin</p></li>
<li><p>sahara-image-elements vanilla+spark extension</p></li>
<li><p>Unit tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Leveraging blueprint:
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0">https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0</a></p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests to cover vanilla engine working with Spark.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 10 May 2017 00:00:00
Support for S3-compatible object stores
https://specs.openstack.org/openstack/sahara-specs/specs/pike/support-for-s3-compatible-object-stores.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-support-s3">https://blueprints.launchpad.net/sahara/+spec/sahara-support-s3</a></p>
<p>Following the efforts done to make data sources and job binaries “pluggable”,
it should be feasible to introduce support for S3-compatible object stores.
This will be an additional alternative to the existing HDFS, Swift, MapR-FS,
and Manila storage options.</p>
<p>A previous spec regarding this topic existed around the time of Icehouse
release, but the work has been stagnant since then:
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-data-source-from-s3">https://blueprints.launchpad.net/sahara/+spec/edp-data-source-from-s3</a></p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Hadoop already offers filesystem libraries with support for s3a:// URIs,
so supporting S3-compatible object stores in Sahara is a reasonable feature
to add.</p>
<p>Within the world of OpenStack, many cloud operators choose Ceph RadosGW
instead of Swift. The RadosGW object store supports access through either
the Swift or S3 APIs, and with some extra configuration a “native”
install of Swift can support the S3 API as well. For some users we may expect
the Hadoop S3 library to be preferable to the Hadoop Swift library, as it
has recently received several enhancements, including support for larger
objects and other performance improvements.</p>
<p>Additionally, some cloud users may wish to use other S3-compatible object
stores, including:</p>
<ul class="simple">
<li><p>Amazon S3 (including AWS Public Datasets)</p></li>
<li><p>LeoFS</p></li>
<li><p>Riak Cloud Storage</p></li>
<li><p>Cloudian HyperStore</p></li>
<li><p>Minio</p></li>
<li><p>SwiftStack</p></li>
<li><p>Eucalyptus</p></li>
</ul>
<p>It is clear that adding support for S3 datasources will open up a new world of
Sahara use cases.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>An “s3” data source type will be added, via new code in
<em>sahara.service.edp.data_sources</em>. We will need utilities to validate S3
URIs, as well as to handle job configs (access key, secret key, endpoint,
bucket URI).</p>
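<p>A minimal sketch of such a validator follows; the scheme check and the
config key names are illustrative assumptions, not sahara’s final choices:</p>

```python
from urllib.parse import urlparse

# Config keys an "s3" data source might require; the names here are
# placeholders for illustration only.
REQUIRED_JOB_CONFIGS = ("accesskey", "secretkey", "endpoint")


def validate_s3_data_source(url, job_configs):
    """Reject malformed S3 URIs and incomplete credential configs."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("S3 data source URL must use the s3:// scheme")
    if not parsed.netloc:
        raise ValueError("S3 data source URL must name a bucket")
    missing = [key for key in REQUIRED_JOB_CONFIGS if key not in job_configs]
    if missing:
        raise ValueError("missing S3 configs: %s" % ", ".join(missing))
```

<p>Validation like this would run at data source creation time, so a job never
launches against a bucket it cannot reach.</p>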
<p>Regarding EDP, there should not be much work to do outside of defining the new
data source type, since the Hadoop S3 library allows jobs to be run against S3
seamlessly.</p>
<p>Similar work will be done to enable an “s3” job binary type, including the
writing of “job binary retriever” code.</p>
<p>While the implementation of the abstraction itself is simple, much of the
work comes from the dashboard, saharaclient, documentation, and testing.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do not add support for S3 as a data source for EDP. Since the Hadoop S3
libraries are already included on the image regardless of this change,
users can run data processing jobs against S3 manually. We still may wish
to add the relevant JARs to the classpath as a courtesy to users.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None (only “s3” as a valid data source type and job binary type in the schema)</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>On most images, hadoop-aws.jar needs to be added to the classpath. Generally,
images with Hadoop (or a related component) installed already have the JAR. This
work will probably take place during the transition from SIE to image packing,
so it will likely need to be done in both places.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Data Source and Job Binary forms should support the s3 type, with fields for
access key, secret key, S3 URI, and S3 endpoint. Note that this is a lot
of fields, more than we have for Swift, so there will probably be some
saharaclient impact as well.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Jeremy Freudberg</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>S3 as a data source</p></li>
<li><p>S3 as a job binary</p></li>
<li><p>Ensure presence of AWS JAR on images</p></li>
<li><p>Dashboard and saharaclient work</p></li>
<li><p>Scenario tests</p></li>
<li><p>Documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will probably want scenario tests (although we don’t have them for Manila).</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Nothing out of the ordinary, but important to keep in mind both user and
developer perspective.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 10 May 2017 00:00:00
There and back again, a roadmap to API v2
https://specs.openstack.org/openstack/sahara-specs/specs/queens/api-v2-experimental-impl.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/v2-api-experimental-impl">https://blueprints.launchpad.net/sahara/+spec/v2-api-experimental-impl</a></p>
<p>As sahara’s API has evolved there have been several features introduced
in the form of routes and methods that could be crafted in a more
consistent and predictable manner. Additionally, there are several new
considerations and methodologies that can only be addressed by updating the
major version of the API. This document serves as a roadmap to implement an
experimental v2 API which will form the basis of the eventual stable version.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>This is an umbrella specification covering many changes; there will be
followup specifications to cover some of the more intricate details.</p>
</div>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current version of sahara’s REST API, 1.1, contains several methodologies
and patterns that have created inconsistencies within the API and with
respect to the API Working Group’s evolving guidelines[1]. Many of these
are due to the iterative nature of the design work, and some have been created
at a time before stable guidelines existed.</p>
<p>Examples of inconsistencies within the current API:</p>
<ul class="simple">
<li><p>Inaccurate names for method endpoints, for example “jobs” instead of
“job-templates”.</p></li>
<li><p>Technology specific parameters in JSON resources, for example “oozie_job_id”
instead of “engine_job_id”.</p></li>
<li><p>Improper HTTP method usage for some operations, for example using PUT for
partial resource updates instead of PATCH.</p></li>
</ul>
<p>In addition to resolving the inconsistencies in the API, a new version will
provide an opportunity to implement features which will improve the experience
for consumers of the sahara API.</p>
<p>Examples of features to implement in the new API:</p>
<ul class="simple">
<li><p>Micro-version support to aid in feature discovery by client applications.</p></li>
<li><p>Creation of tasks endpoint and infrastructure to improve usage of
asynchronous operations.</p></li>
<li><p>HREF embedding in responses to improve resource location discovery.</p></li>
</ul>
<p>These are just a few examples of issues which can be addressed in a new
major version API implementation.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>To address the creation of a new major version API, an experimental <code class="docutils literal notranslate"><span class="pre">/v2</span></code>
endpoint should be created. This new endpoint will be clearly marked as
experimental and no contract of stability will be enforced with regards to
the content of its sub-endpoints. Changes to the <code class="docutils literal notranslate"><span class="pre">/v2</span></code> endpoint will be
tracked through features described in this specification, and through further
specifications which will be created to better describe the details of larger
changes.</p>
<p>When all the changes to the <code class="docutils literal notranslate"><span class="pre">/v2</span></code> endpoint have been made such that it
has a 1:1 feature compliance with the current API, and the Python sahara
client has been updated to use these new endpoints, the experimental status
of the API should be assessed with the goal of marking it as stable and
ready for public consumption.</p>
<p>The changes will be broken down into individual tasks, which will
allow the sahara team members to more easily research and implement the
changes. These efforts will be coordinated through a page on the sahara
wiki site[2].</p>
<section id="initial-v2-commit">
<h3>Initial v2 commit</h3>
<p>The initial changes to create the <code class="docutils literal notranslate"><span class="pre">/v2</span></code> endpoint should also include moving
the Identity project identifier to a header named <code class="docutils literal notranslate"><span class="pre">OpenStack-Project-ID</span></code>.
In all other respects, the endpoints currently in place for the <code class="docutils literal notranslate"><span class="pre">/v1.1</span></code> API
will be carried forward into the new endpoint namespace. This will create a
solid base point from which to make further changes as the new API evolves
and moves towards completion of all features described in the experimental
specifications.</p>
<p>Removing the project identifier from the URI will help to create more
consistent, reusable, routes for client applications and embedded HREFs.
This move will also help decouple the notion of URI scoped resources being
tied to a single project identifier.</p>
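<p>In WSGI terms, reading the project scope from the header could look like
this sketch (the helper name is hypothetical, not sahara code):</p>

```python
def project_id_from_environ(environ):
    """Extract the project scope from the OpenStack-Project-ID header.

    WSGI surfaces the header under the HTTP_OPENSTACK_PROJECT_ID key.
    """
    project_id = environ.get("HTTP_OPENSTACK_PROJECT_ID")
    if not project_id:
        raise ValueError("OpenStack-Project-ID header is required")
    return project_id
```

<p>Because the project identifier no longer appears in the URI, the same route
string can be reused across projects and embedded safely in HREFs.</p>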
</section>
<section id="roadmap-of-changes">
<h3>Roadmap of changes</h3>
<p>The following list is an overview of all the changes that should be
incorporated into the experimental API before it can be considered for
migration to stable. These changes are not in order of precedence, and can
be carried out in parallel. Some of these changes can be addressed with
simple bugs, which should be marked with <code class="docutils literal notranslate"><span class="pre">[APIv2]</span></code> in their names. The more
complex changes should be preceded by specifications marked with the same
<code class="docutils literal notranslate"><span class="pre">[APIv2]</span></code> moniker in their names. For both types of changes, the
commits should contain <code class="docutils literal notranslate"><span class="pre">Partial-Implements:</span> <span class="pre">bp</span> <span class="pre">v2-api-experimental-impl</span></code>
to aid in tracking the API conversion process.</p>
<p>Overview of changes:</p>
<ul class="simple">
<li><p>Endpoint changes</p>
<ul>
<li><p>/images/{image_id}/tag and /images/{image_id}/untag should be changed
to follow the guidelines on tags[3].</p></li>
<li><p>/jobs should be renamed to /job-templates.</p></li>
<li><p>/job-executions should be renamed to /jobs.</p></li>
<li><p>executing a job template through the /jobs/{job_id}/execute endpoint
should be changed to a POST operation on the new /jobs endpoint.</p></li>
<li><p>cancelling a job execution through the
/job-executions/{job_execution_id}/cancel endpoint should be removed in
favor of requesting a cancelled state on a PATCH to the new
/jobs/{job_id} endpoint.</p></li>
<li><p>/job-binary-internals should be removed in favor of /job-binaries, as
the latter accepts internal database referenced items; an endpoint
under /job-binaries can be created for uploading files (if required).</p></li>
<li><p>/job-executions/{job_execution_id}/refresh-status should be removed in
favor of using a GET on the new /jobs/{job_id} endpoint for running
job executions.</p></li>
<li><p>all update operations should synchronize around using PATCH instead of
PUT for partial resource updates.</p></li>
</ul>
</li>
<li><p>JSON payload changes</p>
<ul>
<li><p>hadoop_version should be changed to plugin_version.</p></li>
<li><p>oozie_job_id should be changed to engine_job_id.</p></li>
<li><p>all returned payloads should be wrapped in their type; this is currently
true for the API and should remain so for consistency.</p></li>
<li><p>HREFs should be embedded in responses that contain references to other
objects.</p></li>
</ul>
</li>
<li><p>New features</p>
<ul>
<li><p>Identity project identifier moved to headers. This will be part of the
initial version 2 commit but is worth noting as a major feature change.</p></li>
<li><p>Micro-version support to be added, this should be documented fully in
a separate specification but should be based on the work done by the
ironic[4] and nova[5] projects. Although implemented during the
experimental phase, these microversions will not implement the backward
compatibility features until the API has been declared stable. Once
the API has moved into the stable phase, the microversions will only
implement backward compatibility for version 2, and only for features
added after the stable release.</p></li>
<li><p>Version discovery shall be improved by adding support for a “home
document” which will be returned from the version 2 root URL. This
document will follow the json-home[6] draft specification.
Additionally, support for microversion discovery will be added
using prior implementations and the API working group guidelines as
guides.</p></li>
<li><p>Creation of an actions endpoint for clusters to provide a single
entrypoint for operations on those clusters. This endpoint should
initially allow operations such as scaling but will be used for
further improvements in the future. The actions endpoint will be
the subject of a separate specification, as it will describe the
removal of several verb-oriented endpoints that currently exist
and the creation of a new mechanism for synchronous and asynchronous
operations.</p></li>
</ul>
</li>
</ul>
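<p>The rename portion of the roadmap can be summarized as a lookup table; a
compatibility shim like the one below is only an illustration, not part of
the proposed implementation:</p>

```python
# Endpoint and payload renames drawn from the roadmap above.
ENDPOINT_RENAMES = {
    "/jobs": "/job-templates",
    "/job-executions": "/jobs",
}

PAYLOAD_RENAMES = {
    "hadoop_version": "plugin_version",
    "oozie_job_id": "engine_job_id",
}


def translate_payload(v1_payload):
    """Map v1.1 field names to their v2 equivalents, leaving others as-is."""
    return {PAYLOAD_RENAMES.get(key, key): value
            for key, value in v1_payload.items()}
```

<p>A table like this also makes it easy to audit which renames remain before
the v2 API reaches 1:1 feature compliance with v1.1.</p>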
<p>This list is not meant to contain all possible future changes, but the
minimum set of changes that must be made before the new API can
be declared stable.</p>
<p>The move to stable for this API should not occur before the Python sahara
client has been updated to use the new functionality.</p>
</section>
<section id="alternatives">
<h3>Alternatives</h3>
<p>An alternative might be to make changes to the current version API, but this
is inadvisable as it breaks the API version contract for end users.</p>
<p>Although the current version API can be changed, there is no way to safely
make the proposed changes without breaking backward compatibility. As the
proposed changes are quite large in nature it is not advisable to create a
“1.2” version of the API.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Most of these changes will not require modifications to the data model. The
two main exceptions are the payload name changes for <code class="docutils literal notranslate"><span class="pre">hadoop_version</span></code> and
<code class="docutils literal notranslate"><span class="pre">oozie_job_id</span></code>. As the data model will continue to be used for the v1.1
API until it is deprecated, it is not advisable to rename these fields at
this time. When the v2 API has been made stable, and the v1.1 API has been
deprecated, these fields should be revisited and changed in the data model.</p>
<p>During the experimental phase of the API, these translations will occur in
the code that handles requests and responses. After the API has transitioned
to production mode, migrations should be created to align the data models
with the API representations and translations should be created for the
older versions only as necessary. As the older version API will eventually
be deprecated, these changes should be scheduled to coincide with that
transition.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>As this specification is addressing a high level change of the API, the
following changes are enumerated in brief. Full details should be created
for changes that will require more than just renaming an endpoint.</p>
<ul class="simple">
<li><p>creation of /v2 root endpoint</p></li>
<li><p>removal of {tenant_id} from URI, to be replaced by
<code class="docutils literal notranslate"><span class="pre">OpenStack-Project-ID</span></code> header on all requests.</p></li>
<li><p>removal of POST to /images/{image_id}/tag</p></li>
<li><p>removal of POST to /images/{image_id}/untag</p></li>
<li><p>creation of GET/PUT/DELETE to /images/{image_id}/tags, this should be
followed with a specification describing the new tagging methodology.</p></li>
<li><p>creation of GET/PUT/DELETE to /images/{image_id}/tags/{tag_id}, this
should also be in the previously mentioned specification on tagging.</p></li>
<li><p>move operations on /jobs to /job-templates</p></li>
<li><p>move operations /job-executions to /jobs</p></li>
<li><p>removal of POST to /jobs/{job_id}/execute</p></li>
<li><p>creation of POST to /jobs, this should be defined in a specification
about restructuring the job execution endpoint.</p></li>
<li><p>creation of jobs via the /jobs endpoint should be transitioned away
from single input and output fields to use the newer job configuration
interface[7].</p></li>
<li><p>removal of GET to /job-executions/{job_execution_id}/cancel</p></li>
<li><p>creation of PATCH to /jobs/{job_id}, this should be defined in the
specification about restructuring the job execution endpoint.</p></li>
<li><p>removal of GET to /job-executions/{job_execution_id}/refresh-status</p></li>
<li><p>removal of all /job-binary-internals endpoints with their functionality
being provided by /job-binaries; this may require creating a separate
sub-endpoint for uploading.</p></li>
<li><p>refactor of PUT to /node-group-templates/{node_group_template_id} into
PATCH on same endpoint.</p></li>
<li><p>refactor of PUT to /cluster-templates/{cluster_template_id} into PATCH on
same endpoint.</p></li>
<li><p>refactor of PUT to /job-binaries/{job_binary_id} into PATCH on same
endpoint.</p></li>
<li><p>refactor of PUT to /data-sources/{data_source_id} into PATCH on same
endpoint.</p></li>
</ul>
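<p>The PUT-to-PATCH refactors listed above follow the usual REST semantics,
contrasted here in toy form (not sahara code):</p>

```python
def put_update(resource, body):
    """PUT replaces the entire resource representation with the body."""
    return dict(body)


def patch_update(resource, body):
    """PATCH changes only the fields the client actually sent."""
    updated = dict(resource)
    updated.update(body)
    return updated
```

<p>With PATCH, a client updating one field of a node group template no longer
has to resend, and risk clobbering, the rest of the resource.</p>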
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>In the experimental phase, this change should have no noticeable effect on
the end user. Once the API has been declared stable, users will need to
switch python-saharaclient versions as well as upgrade their horizon
installations to make full use of renamed features.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>During the experimental phase, this change will have no effect on deployers.</p>
<p>When the API reaches the stable phase, deployers will be responsible for
upgrading their installations to ensure that sahara and python-saharaclient
are upgraded as well as changing the service catalog to represent the
base endpoint.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>As this change is targeted for experimental work, developers should know
that the details of the v2 API will be constantly changing. There is no
guarantee of stability.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>This change should not require changes to horizon as many of the primitives
that are changing already display the proper names, for example
“Job Templates”. When this change moves to the stable phase, horizon should
be re-evaluated.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Telles Nobrega</p>
</dd>
<dt>Other contributors:</dt><dd><p>mimccune (Michael McCune)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The main work item for this specification is the initial v2 commit.</p>
<ul class="simple">
<li><p>create v2 endpoint</p></li>
<li><p>create code to handle project id in headers</p></li>
<li><p>create mappings to current endpoints</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This change should not require new dependencies.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be created to exercise the new endpoints. Additionally, the
gabbi[8] testing framework should be investigated as a functional testing
platform for the REST API.</p>
<p>To improve security testing, tools such as Syntribos[9] and RestFuzz[10]
should be investigated for use in directed testing efforts and as possible
gate tests.</p>
<p>These investigations should result in further specifications if the results
are sufficient to warrant their creation, as they will deal with
new testing modes for the sahara API server.</p>
<p>As the v2 API reaches stable status, and the python-saharaclient has been
ported to use the new API, the current functional tests should provide the
necessary framework to ensure successful end-to-end testing.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>During the experimental phase, this work will not produce documentation. As
the move to stable status approaches, a new version of the WADL files for the
api-ref[11] site will need to be produced, if necessary. There is the
possibility that this site will change its format, in which case these new
API documents will need to be generated.</p>
<p>Further, the v2 API should follow keystone’s model[12] of publishing
the API reference documents in restructured text format to the specs
repository. This would make the API much easier to document and update as
new specification changes could also propose their API changes to the same
repo. Also, the WADL format is very verbose and the future of this format is
under question within the OpenStack documentation community[13]. The effort
to make accurate documentation for sahara’s API should also include the
possibility of creating Swagger[14] output as the v2 API approaches stable
status; this should be addressed in a separate specification as that
time approaches.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1]: <a class="reference external" href="http://specs.openstack.org/openstack/api-wg/#guidelines">http://specs.openstack.org/openstack/api-wg/#guidelines</a></p>
<p>[2]: <a class="reference external" href="https://wiki.openstack.org/wiki/Sahara/api-v2">https://wiki.openstack.org/wiki/Sahara/api-v2</a></p>
<p>[3]: <a class="reference external" href="http://specs.openstack.org/openstack/api-wg/guidelines/tags.html">http://specs.openstack.org/openstack/api-wg/guidelines/tags.html</a></p>
<p>[4]: <a class="reference external" href="http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/api-microversions.html">http://specs.openstack.org/openstack/ironic-specs/specs/kilo-implemented/api-microversions.html</a></p>
<p>[5]: <a class="reference external" href="http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/api-microversions.html">http://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/api-microversions.html</a></p>
<p>[6]: <a class="reference external" href="https://tools.ietf.org/html/draft-nottingham-json-home-03">https://tools.ietf.org/html/draft-nottingham-json-home-03</a></p>
<p>[7]: <a class="reference external" href="http://specs.openstack.org/openstack/sahara-specs/specs/liberty/unified-job-interface-map.html">http://specs.openstack.org/openstack/sahara-specs/specs/liberty/unified-job-interface-map.html</a></p>
<p>[8]: <a class="reference external" href="https://github.com/cdent/gabbi">https://github.com/cdent/gabbi</a></p>
<p>[9]: <a class="reference external" href="https://github.com/openstack/syntribos">https://github.com/openstack/syntribos</a></p>
<p>[10]: <a class="reference external" href="https://github.com/redhat-cip/restfuzz">https://github.com/redhat-cip/restfuzz</a></p>
<p>[11]: <a class="reference external" href="http://developer.openstack.org/api-ref.html">http://developer.openstack.org/api-ref.html</a></p>
<p>[12]: <a class="reference external" href="https://github.com/openstack/keystone-specs/tree/master/api">https://github.com/openstack/keystone-specs/tree/master/api</a></p>
<p>[13]: <a class="reference external" href="http://specs.openstack.org/openstack/docs-specs/specs/liberty/api-site.html">http://specs.openstack.org/openstack/docs-specs/specs/liberty/api-site.html</a></p>
<p>[14]: <a class="reference external" href="https://github.com/swagger-api/swagger-spec">https://github.com/swagger-api/swagger-spec</a></p>
<p>Liberty summit etherpad,
<a class="reference external" href="https://etherpad.openstack.org/p/sahara-liberty-api-v2">https://etherpad.openstack.org/p/sahara-liberty-api-v2</a></p>
<p>Mitaka summit etherpad,
<a class="reference external" href="https://etherpad.openstack.org/p/sahara-mitaka-apiv2">https://etherpad.openstack.org/p/sahara-mitaka-apiv2</a></p>
</section>
Wed, 10 May 2017 00:00:00 Remove Job Binary Internal: https://specs.openstack.org/openstack/sahara-specs/specs/queens/remove-job-binary-internal.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/remove-job-binary-internal">https://blueprints.launchpad.net/sahara/+spec/remove-job-binary-internal</a></p>
<p>Job binary internal is no longer needed: swift and manila
are available (and possibly other storage options in the future),
and these are more suitable options for storage.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Job binary internal is a job binary storage option kept in
Sahara’s internal database. This option should no longer be available,
since there are better storage options (swift, manila, …).</p>
<p>Besides that, Sahara’s internal database should be used only for
Sahara’s own data. Allowing job binaries to be stored in
Sahara’s database is not only unnecessary but also a potential source
of problems, since it increases the size of the database. It also opens
a loophole for free storage. It’s definitely the wrong tool for the job.</p>
<p>Also, it’s important to note that this change is related to
APIv2 and should not break APIv1.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change proposes to remove job binary internal in
favour of other storage options. The plan is to deprecate
it when APIv2 is stable, in tandem with the deprecation of APIv1. Both APIv1
and job binary internal will be fully removed together after APIv2 has been
stable for long enough.</p>
<p>This change can be divided into the patches below.</p>
<ul class="simple">
<li><p>remove job binary internal from APIv2</p></li>
<li><p>remove internal code that deals with job binary internal
(possibly a few patches)</p></li>
<li><p>remove job binary internal from database</p></li>
<li><p>remove job binary internal from saharaclient</p></li>
<li><p>remove job binary internal option from Horizon
(sahara-dashboard)</p></li>
<li><p>update documentation involving job binary internal</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>The only alternative would be to keep job binary internal as it is,
which could cause problems for Sahara in the future.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Job binary internal will be removed from the data model.
This will require a database migration.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Job binary internal related requests must be removed
only in APIv2.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Job binary internal option should not be available
through Horizon.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Job binary internal option should not be available
through Horizon.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee: mariannelinharesm</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>remove job binary internal from APIv2</p></li>
<li><p>remove internal code that deals with job binary internal
(possibly a few patches)</p></li>
<li><p>remove job binary internal from database</p></li>
<li><p>remove job binary internal from saharaclient</p></li>
<li><p>remove job binary internal option from Horizon
(sahara-dashboard)</p></li>
<li><p>update documentation involving job binary internal</p></li>
<li><p>(all but first and last steps are done in tandem with APIv1 removal)</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Update documentation related to job binary internal:</p>
<ul class="simple">
<li><p>add a warning about job binary internal’s deprecation
once APIv2 becomes stable and the default.</p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<p>[0] <a class="reference external" href="http://eavesdrop.openstack.org/meetings/sahara/2017/sahara.2017-04-06-14.00.log.txt">http://eavesdrop.openstack.org/meetings/sahara/2017/sahara.2017-04-06-14.00.log.txt</a>
[1] <a class="reference external" href="http://eavesdrop.openstack.org/meetings/sahara/2017/sahara.2017-03-30-18.00.log.txt">http://eavesdrop.openstack.org/meetings/sahara/2017/sahara.2017-03-30-18.00.log.txt</a></p>
</section>
Sun, 09 Apr 2017 00:00:00 Add an API to sahara-scenario for integration to another frameworks: https://specs.openstack.org/openstack/sahara-specs/specs/sahara-tests/api-for-sahara-scenario.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara-tests/+spec/api-for-scenario-tests">https://blueprints.launchpad.net/sahara-tests/+spec/api-for-scenario-tests</a></p>
<p>Sahara scenario is the main tool for testing Sahara, and we can provide an
API for using this framework in other tools, frameworks, and tests.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>When testing Sahara from another framework, you can’t reuse the existing
scenarios and need to recreate similar code, with new scenarios for Sahara,
in that framework. I think we can implement a simple Python API for
running Sahara scenarios from the default templates. It would be useful, for
example, in frameworks with destructive tests.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Refactor the current <cite>runner.py</cite> module, pulling code out
into <cite>sahara_tests/scenario/utils.py</cite> so it can be reused in the API methods.</p>
<p>I propose to move the code that handles:</p>
<ul class="simple">
<li><p>creation of tmp files,</p></li>
<li><p>working with .mako,</p></li>
<li><p>running tests,</p></li>
<li><p>merging/generation of scenario files.</p></li>
</ul>
<p>In <cite>runner.py</cite> we should leave only calls to these methods, for more
convenient usage. The next step is adding the directory <cite>api</cite> in <cite>sahara_tests/scenario</cite>
with files implementing the API.</p>
<p>I think we need to add a file <cite>api.py</cite> with methods for external usage:</p>
<ul class="simple">
<li><p>def run_scenario(plugin, version, release=None)</p></li>
</ul>
<p>and <cite>base.py</cite>, which prepares files for a run and will hold auxiliary
methods in the future.</p>
<p>In the future we can separate one scenario into steps:</p>
<ul class="simple">
<li><p>create node group template;</p></li>
<li><p>create cluster template;</p></li>
<li><p>create cluster;</p></li>
<li><p>create data sources;</p></li>
<li><p>create job binaries;</p></li>
<li><p>perform EDP;</p></li>
</ul>
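<p>A sketch of how the proposed <cite>api.py</cite> entry point could look; the template names and the <code>DEFAULT_TEMPLATES</code> mapping are hypothetical placeholders for the real template lookup that would be factored out of <cite>runner.py</cite>:</p>

```python
# Hypothetical mapping of (plugin, version) to default scenario
# templates; the real data would come from sahara-scenario's bundled
# .mako templates.
DEFAULT_TEMPLATES = {
    ("vanilla", "2.7.1"): ["vanilla-2.7.1.yaml.mako"],
    ("spark", "1.6.0"): ["spark-1.6.0.yaml.mako"],
}


def run_scenario(plugin, version, release=None):
    """Run the default scenario for a plugin/version pair.

    This sketch only resolves which scenario template files would be
    rendered and executed; a real implementation would render the
    .mako files, merge them, and invoke the test runner.
    """
    key = (plugin, version)
    if key not in DEFAULT_TEMPLATES:
        raise ValueError("no default scenario for %s %s" % key)
    templates = list(DEFAULT_TEMPLATES[key])
    if release:
        # Release-specific templates could be layered on top.
        templates.append("%s-%s-%s.yaml.mako" % (plugin, version, release))
    return templates
```

<p>An external framework would then only need <code>run_scenario("vanilla", "2.7.1")</code> rather than duplicating runner logic.</p>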
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>esikachev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>move methods from <cite>runner.py</cite></p></li>
<li><p>add the ability to run tests via API</p></li>
<li><p>add the job for testing API</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We can create a separate job on Sahara CI with a custom script that checks
the API calls and verifies that actions are performed correctly.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The API implementation should be mentioned in Sahara-tests docs.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 20 Mar 2017 00:00:00 Data Source and Job Binary Pluggability: https://specs.openstack.org/openstack/sahara-specs/specs/ocata/data-source-plugin.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/data-source-plugin">https://blueprints.launchpad.net/sahara/+spec/data-source-plugin</a></p>
<p>Sahara allows multiple types of data source and job binary. However, there’s
no clean abstraction around them, and the code to deal with them is often very
difficult to read and modify. This change proposes to create clean
abstractions that each data source type and job binary type can implement
differently depending on its own needs.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently, the data source and job binary code are spread over different
folders and files in Sahara, making this code hard to change and to extend.
Right now, a developer who wants to create a new data source needs to look all
over the code and modify things in a lot of places (and it’s almost impossible
to know all of them without deep experience with the code). Once this change
is complete, developers will be able to create code in a single directory and
will be able to write their data source by implementing an abstract class.
This will allow users to enable data sources that they write themselves
(and hopefully contribute upstream) much more easily, and it will allow
operators to disable data sources that their own stack does not support
as well.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change proposes to implement the data source and job binary abstractions
as plugins, so that their code can be loaded dynamically behind a well defined
interface. The existing types of data sources and job binaries will be
refactored accordingly.</p>
<p>The interfaces that will be implemented are described below:</p>
<p><em>Data Source Interface</em></p>
<ul class="simple">
<li><p>prepare_cluster(cluster, kwargs) - Makes a cluster ready to use this
data source. Each data source type implements this differently: for Manila
it will mount the share, for Swift it will verify credentials, etc.</p></li>
<li><p>construct_url(url, job_exec_id) - Resolves placeholders in data source URL.</p></li>
<li><p>get_urls(cluster, url, job_exec_id) - Returns the data source url and the
runtime url the data source must be referenced as. Returns: a tuple of the
form (url, runtime_url).</p></li>
<li><p>get_runtime_url(url, cluster) - If needed, constructs a runtime
url for the data source; by default, if a runtime url is not needed,
it returns the native url.</p></li>
<li><p>validate(data) - Checks whether or not the data passed through the API
to create or update a data source is valid.</p></li>
<li><p>_validate_url(url) - This method is optional and can be used by the
validate method in order to check whether or not the data source url
is valid.</p></li>
</ul>
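<p>A sketch of the data source interface above as a Python abstract base class; the method bodies and the <code>%JOB_EXEC_ID%</code> placeholder token are illustrative assumptions only:</p>

```python
import abc


class DataSourceType(abc.ABC):
    """Sketch of the proposed data source abstraction; method names
    follow the list above, bodies are illustrative defaults."""

    @abc.abstractmethod
    def prepare_cluster(self, cluster, **kwargs):
        """Make the cluster ready to use this data source."""

    def construct_url(self, url, job_exec_id):
        # Resolve placeholders in the data source url; the
        # %JOB_EXEC_ID% token is an assumed placeholder syntax.
        return url.replace("%JOB_EXEC_ID%", job_exec_id)

    def get_runtime_url(self, url, cluster):
        # By default the native url doubles as the runtime url.
        return url

    def get_urls(self, cluster, url, job_exec_id):
        resolved = self.construct_url(url, job_exec_id)
        return resolved, self.get_runtime_url(resolved, cluster)

    def validate(self, data):
        self._validate_url(data.get("url", ""))

    def _validate_url(self, url):
        # Optional hook; concrete types may override.
        pass
```

<p>A concrete type such as a Swift implementation would only override the methods whose default behaviour does not fit.</p>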
<p><em>Job Binary Interface</em></p>
<ul class="simple">
<li><p>prepare_cluster(cluster, kwargs) - Makes a cluster ready to use this
job binary. Each job binary type implements this differently: for Manila it
will mount the share, for Swift it will verify credentials, etc.</p></li>
<li><p>copy_binary_to_cluster(job_binary, cluster) - If necessary, pull
binary data from the binary store and copy that data to a useful path
on the cluster.</p></li>
<li><p>get_local_path(cluster, job_binary) - Returns the path on the local
cluster for the binary.</p></li>
<li><p>validate(data) - Checks whether or not the data passed through the API
to create or update a job binary is valid.</p></li>
<li><p>_validate_url(url) - This method is optional and can be used by the
validate method in order to check whether or not the job binary url
is valid.</p></li>
<li><p>validate_job_location_format(entry) - Pre checks whether or not the API
entry is valid.</p></li>
<li><p>get_raw_data(job_binary, kwargs) - Used only by the API, it returns
the raw binary. If the type doesn’t support this operation it should
raise NotImplementedException.</p></li>
</ul>
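<p>The job binary interface can be sketched the same way; the default behaviours below (the path layout, refusing raw data) are assumptions for illustration, and the NotImplementedException named in the spec is rendered here as Python’s built-in NotImplementedError:</p>

```python
import abc


class JobBinaryType(abc.ABC):
    """Sketch of the proposed job binary abstraction."""

    @abc.abstractmethod
    def copy_binary_to_cluster(self, job_binary, cluster):
        """Pull the binary data and place it at a useful path."""

    def get_local_path(self, cluster, job_binary):
        # Assumed layout (one directory per cluster), for illustration.
        return "/tmp/%s/%s" % (cluster, job_binary)

    def validate_job_location_format(self, entry):
        # Pre-check of the API entry; accept everything by default.
        return True

    def get_raw_data(self, job_binary, **kwargs):
        # Types that cannot serve raw bytes refuse the API operation
        # (the spec names this NotImplementedException).
        raise NotImplementedError("type does not support raw data")
```
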
<p>These interfaces will be organized in the following folder structure:</p>
<ul class="simple">
<li><p>services/edp/data_sources - Will contain the data source interface and
the data source types implementations.</p></li>
<li><p>services/edp/job_binaries - Will contain the job binary interface and
the job binary types implementations.</p></li>
<li><p>services/edp/utils - Will contain utility code that can be shared
by data source implementations and job binary implementations.
For example, the Manila data source implementation will probably share some
code with the job binary implementation.</p></li>
</ul>
<p>Some details of the interface (parameters, method names, parameter
names) may still change as the work is implemented, but the main
structure and idea should stay the same.</p>
<p>A plugin manager will also be needed to deal directly with the
different types of data sources and job binaries and to provide methods for
operators to disable/enable data sources and job binaries dynamically.
This plugin manager is not detailed here because it will be similar to the
plugin manager that already exists for the cluster plugins.</p>
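<p>A minimal sketch of such a manager, assuming a simple in-memory registry; the class and method names are invented for illustration, and the real manager would mirror the existing cluster plugin manager:</p>

```python
# Sketch: registry of data source / job binary types with dynamic
# enable/disable, in the spirit of the cluster-plugin manager.
class TypeManager:
    def __init__(self):
        self._types = {}       # name -> implementation
        self._disabled = set()

    def register(self, name, impl):
        self._types[name] = impl

    def set_enabled(self, name, enabled):
        if enabled:
            self._disabled.discard(name)
        else:
            self._disabled.add(name)

    def list_types(self):
        # Only types that are registered and not disabled are visible.
        return sorted(n for n in self._types if n not in self._disabled)

    def get(self, name):
        if name in self._disabled or name not in self._types:
            raise KeyError("type %r is unavailable" % name)
        return self._types[name]
```

<p>Keeping separate managers for data sources and job binaries is what allows, say, manila to be disabled for one and enabled for the other.</p>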
<section id="alternatives">
<h3>Alternatives</h3>
<p>A clear alternative is to leave things the way they are, but Sahara would
remain difficult to extend and understand. An alternative to the abstractions
defined in the Proposed Change section would be to have only one abstraction
instead of two interfaces for data sources and job binaries, since these
interfaces have a lot in common. Implementing this alternative would remove
the services/edp/utils folder, leaving the code more unified and compact,
but job binary and data source code would then form a single plugin,
which could hinder the pluggability goal of this change (for example,
a provider would not be able to disable manila for data sources while
enabling it for job binaries). Because of this it was not considered the
best approach; instead, we keep job binaries and data sources apart, and in
exchange we need the utils folder to avoid code duplication.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Probably some new methods to manage supported types of data sources and
job binaries will be needed (similar to the methods already offered by
plugins).</p>
<ul class="simple">
<li><p>data_sources_types_list() ; job_binaries_types_list()</p></li>
<li><p>data_sources_types_get(type_name) ; job_binaries_types_get(type_name)</p></li>
<li><p>data_sources_types_update(type_name, data) ;
job_binaries_types_update(type_name, data)</p></li>
</ul>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>After this change is implemented developers will be able to add and enable
new data sources and job binaries easily, by just implementing the abstraction.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee: mariannelinharesm</p>
<p>Other contributors: egafford</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Creation of a Plugin for Data Sources containing the Data Source Abstraction</p></li>
<li><p>Creation of a Plugin for Job Binaries containing the Job Binary Abstraction</p></li>
<li><p>Internal HDFS Plugin</p></li>
<li><p>External HDFS Plugin</p></li>
<li><p>Swift Plugin</p></li>
<li><p>Manila Plugin</p></li>
<li><p>Allow job engines to declare which data sources/job binaries they are
capable of using (this may or may not be needed, depending on whether any
job type does not support a particular data source or job binary type)</p></li>
<li><p>Changes in the API</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>This change will only require updates to existing unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>It will be necessary to add a devref doc about the new abstractions.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 28 Dec 2016 00:00:00 CLI for Plugin-Declared Image Generation: https://specs.openstack.org/openstack/sahara-specs/specs/newton/image-generation-cli.html
<p>This specification describes the next stage of the image generation epic,
and proposes a CLI to generate images from the plugin-defined recipes
described in the <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/validate-image-spi">validate-image-spi</a> blueprint.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/image-generation-cli">https://blueprints.launchpad.net/sahara/+spec/image-generation-cli</a></p>
<p>Sahara has historically used a single monolithic repository of DIB
elements for generation of images. This poses several problems:</p>
<ol class="arabic simple">
<li><p>Unlike most use cases for DIB, Sahara end-users are not necessarily
OpenStack experts, and are using OpenStack as a means to manage
their data processing clusters, but desire to customize their nodes
at the image level. As these users are concerned with data processing
over cloud computing, ease of image modification is critical in the
early stages of technology evaluation. The Sahara team has found that
while DIB is a very powerful tool, it is not overly friendly to new
adopters.</p></li>
<li><p>In order to mature to a stage at which new Sahara users can add
features to the codebase cleanly, Sahara must draw firmer lines
of encapsulation around its plugins. Storing all image generation
recipes (whatever the implementation) in a single repository that
cuts across all plugins is counter to the end goal of allowing
the whole functionality of a new plugin to be exposed at a single
path.</p></li>
<li><p>Sahara end users will very seldom need to modify base OS images in
a particularly deep way, and will seldom need to create new images.
The changes Sahara must make to images in order to prepare them for
use are actually quite basic: we install packages and modify
certain configuration files. Sacrificing some power and speed for
some ease of use is very reasonable in our use case.</p></li>
</ol>
<p>For these reasons, a means to encapsulate image generation logic within
a plugin, a command line utility to exercise these recipes, and a
backing toolchain that emphasizes reliability over flexibility and
speed are indicated.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/validate-image-spi">https://blueprints.launchpad.net/sahara/+spec/validate-image-spi</a>
blueprint describes the first steps of this epic, enabling us to
define recipes for image validation and generation of clusters from
“clean” images.</p>
<section id="cli-input">
<h3>CLI input</h3>
<p>The current specification proposes that a CLI script be added to
sahara at sahara.cli.sahara_pack_image.py. This script will not
start a new service as others do; rather, it will use the pre-existent
sahara.services.images module to run the image generation recipe on a
base image through a new remote implementation. This remote
implementation will use libguestfs’s Python API to alter the image to the
plugin’s specifications.</p>
<p>The --help text for this script follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>Usage: sahara-image-create
    --image IMAGE_PATH
    [--root-filesystem]
    [--test-only]
    PLUGIN PLUGIN_VERSION
    [More arguments per plugin and version]
* --image: The path to an image to modify. This image will be modified
  in-place: be sure to target a copy if you wish to maintain a
  clean master image.
* --root-filesystem: The filesystem to mount as the root volume on the
  image. No value is required if only one filesystem is detected.
* --test-only: If this flag is set, no changes will be made to the
  image; instead, the script will fail if discrepancies are found
  between the image and the intended state.
[* variable per plugin and version: Other arguments as specified in
  the generation script.
* ...]
* -h, --help: See this message.
</pre></div>
</div>
<p>Both PLUGIN and PLUGIN_VERSION will be implemented as required subcommands.</p>
<p>If --test-only is set, the image will be packed with the <code class="docutils literal notranslate"><span class="pre">reconcile</span></code>
option set to False (meaning that the image will only be tested, not changed).</p>
<p>Each image generation .yaml (as originally described in the
validate-images-spi spec) may now register a set of ‘arguments’ as well
as a set of ‘validators’. This argument specification should precede the
validator declaration, and will take the following form:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">arguments</span><span class="p">:</span>
<span class="o">-</span> <span class="n">java</span><span class="o">-</span><span class="n">version</span><span class="p">:</span>
<span class="n">description</span><span class="p">:</span> <span class="n">The</span> <span class="n">java</span> <span class="n">distribution</span><span class="o">.</span>
<span class="n">target_variable</span><span class="p">:</span> <span class="n">JAVA_VERSION</span>
<span class="n">default</span><span class="p">:</span> <span class="n">openjdk</span>
<span class="n">required</span><span class="p">:</span> <span class="n">false</span>
<span class="n">choices</span><span class="p">:</span>
<span class="o">-</span> <span class="n">oracle</span><span class="o">-</span><span class="n">java</span>
<span class="o">-</span> <span class="n">openjdk</span>
</pre></div>
</div>
<p>A set of arguments for any one yaml must obey the following rules:</p>
<ol class="arabic simple">
<li><p>All argnames must be unique, and no argnames may collide with global script
argument tokens.</p></li>
<li><p>If required is false, a default value must be provided.</p></li>
<li><p>The target_variable field is required, for clarity.</p></li>
</ol>
<p>An ImageArgument class will be added to the sahara.service.images module in
order that all plugins can use the same object format. It will follow the
object model above.</p>
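<p>A minimal sketch of such a class is shown below. The attribute names follow
the object model above and the validation encodes the three rules listed
earlier; this is an illustration, not sahara’s actual implementation.</p>

```python
# Illustrative ImageArgument sketch; names and behavior are assumptions.
class ImageArgument(object):
    def __init__(self, name, target_variable, description=None,
                 default=None, required=False, choices=None):
        if not target_variable:
            raise ValueError("target_variable is required, for clarity")
        if not required and default is None:
            raise ValueError("arguments with required: false need a default")
        self.name = name
        self.target_variable = target_variable
        self.description = description
        self.default = default
        self.required = required
        self.choices = choices or []

    def validate(self, value):
        # Apply the default, then check required and choices constraints.
        if value is None:
            if self.required:
                raise ValueError("%s is required" % self.name)
            return self.default
        if self.choices and value not in self.choices:
            raise ValueError("%s must be one of %s" % (self.name, self.choices))
        return value
```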
</section>
<section id="new-spi-methods">
<h3>New SPI methods</h3>
<p>In order to facilitate the retrieval of image generation arguments, a
get_image_arguments SPI method will be added to the plugin SPI. The arguments
returned from this method will be used to build help text specific to a plugin
and version, and will also be used to validate input to the CLI. It will
return a list of ImageArguments.</p>
<p>A pack_image method will also be added to the plugin SPI. This method will
have the signature:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="k">def</span> <span class="nf">pack_image</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">hadoop_version</span><span class="p">,</span> <span class="n">remote</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">image_arguments</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span>
</pre></div>
</div>
<p>This method will take an ImageRemote (see below). In most plugins, this
will:</p>
<ol class="arabic simple">
<li><p>Validate the incoming argument map (to ensure that all required arguments
have been provided, that values exist in any given enum, etc.)</p></li>
<li><p>Place the incoming arguments into an env_map per their target_variables.</p></li>
<li><p>Generate a set of ImageValidators (just as is done for image validation).</p></li>
<li><p>Call the validate method of the validator set using the remote and
argument_map.</p></li>
</ol>
<p>However, the implementation is intentionally vague, to allow plugins to
introduce their own image packing tools if desired (as per the
validate-images-spi spec.)</p>
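<p>As a concrete illustration, a default implementation of the four steps above
might look roughly like this. Plain dicts stand in for the argument objects and
the validators are simple callables; none of these names are sahara’s actual
code.</p>

```python
# Sketch of the four pack_image steps; all names here are illustrative.
def pack_image(arguments, argument_map, remote, validators, reconcile=True):
    env_map = {}
    for arg in arguments:
        # 1. Validate the incoming argument map.
        value = argument_map.get(arg["name"], arg.get("default"))
        if value is None and arg.get("required"):
            raise ValueError("missing required argument: %s" % arg["name"])
        choices = arg.get("choices")
        if choices and value not in choices:
            raise ValueError("%s must be one of %s" % (arg["name"], choices))
        # 2. Place the argument into an env map per its target_variable.
        env_map[arg["target_variable"]] = value
    # 3. and 4. Run the generated validator set against the remote.
    for validator in validators:
        validator(remote, env_map, reconcile)
    return env_map
```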
</section>
<section id="argumentcasevalidator">
<h3>ArgumentCaseValidator</h3>
<p>Now that these image definitions need to be able to take arguments, a new
ArgumentCaseValidator will be added to the set of concrete
SaharaImageValidators to assist image packing developers in writing clean,
readable recipes. This validator’s yaml definition will take the form:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">argument_case</span><span class="p">:</span>
<span class="n">argument_name</span><span class="p">:</span> <span class="n">JAVA_VERSION</span>
<span class="n">cases</span><span class="p">:</span>
<span class="n">openjdk</span><span class="p">:</span>
<span class="o">-</span> <span class="p">[</span><span class="n">action</span><span class="p">]</span>
<span class="n">oracle</span><span class="o">-</span><span class="n">java</span><span class="p">:</span>
<span class="o">-</span> <span class="p">[</span><span class="n">action</span><span class="p">]</span>
</pre></div>
</div>
<p>The first case key that matches the value of the named variable will
execute its nested actions. All subsequent cases will be skipped.</p>
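<p>The intended semantics can be sketched as follows; the class shape and
method names here are illustrative only.</p>

```python
# Illustrative ArgumentCaseValidator: the first case whose key matches the
# variable's value runs its actions; subsequent cases are skipped.
class ArgumentCaseValidator(object):
    def __init__(self, argument_name, cases):
        self.argument_name = argument_name
        self.cases = cases  # list of (case_value, actions), declaration order

    def validate(self, env_map):
        value = env_map.get(self.argument_name)
        for case_value, actions in self.cases:
            if case_value == value:
                # Run only the first matching case's actions.
                return [action() for action in actions]
        return []  # no case matched; nothing to run
```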
</section>
<section id="imageremote">
<h3>ImageRemote</h3>
<p>A new ImageRemote class will be added to a new module, at
sahara.utils.image_remote. This class will be encapsulated in its own module
to allow distribution packagers the option of externalizing the dependency on
libguestfs into a subpackage of sahara, rather than requiring libguestfs as a
dependency of the main sahara python library package in all cases.</p>
<p>This class will represent an implementation of the sahara.utils.remote.Remote
abstraction. Rather than executing ssh commands from the provided arguments,
however, this class will execute scripts on a target image file using
libguestfs’ python API.</p>
<p>The CLI will generate a remote targeting the image at the path specified
by the ‘image’ argument, and use it to run the scripts (which would normally
be run over ssh on clean image generation) within the specified image file.</p>
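<p>A rough sketch of such a class is below. The libguestfs import is deferred
into the method that needs it, which keeps the main package importable without
libguestfs installed, matching the packaging concern above; the method names
and guestfs calls are illustrative assumptions, not the actual design.</p>

```python
# Rough sketch of an ImageRemote; method names and usage are assumptions.
class ImageRemote(object):
    def __init__(self, image_path):
        self.image_path = image_path
        self._guest = None

    def __enter__(self):
        # Deferred import so the main sahara package need not require
        # libguestfs; packagers can ship this module in a subpackage.
        import guestfs
        g = guestfs.GuestFS(python_return_dict=True)
        g.add_drive_opts(self.image_path)
        g.launch()
        roots = g.inspect_os()
        g.mount(roots[0], "/")
        self._guest = g
        return self

    def execute_command(self, cmd):
        # Run the script inside the image file instead of over ssh.
        return self._guest.sh(cmd)

    def __exit__(self, *exc_info):
        if self._guest is not None:
            self._guest.shutdown()
            self._guest.close()
            self._guest = None
```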
</section>
<section id="alternatives">
<h3>Alternatives</h3>
<p>We have discussed the option of bringing DIB elements into the Sahara plugins.
However, this was rejected due to the issues above concerning the tradeoffs
among speed, power, and usability, and because of certain testing issues
(discussed in an abandoned spec in the Mitaka cycle).</p>
<p>It is also possible that we could maintain our current CLI in
sahara-image-elements indefinitely. However, as more plugins are developed,
a single monolithic repository will become unwieldy.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The new CLI will be (at present) the only means of interacting with this
feature.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Packaging this script as a separate Python (and Debian/RPM) package is an
option worth discussing, but at this time the tool is intended to be packaged
with the Sahara core.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>This feature will reuse definitions specified in the <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/validate-image-spi">validate-image-spi</a> spec,
and thus should have minimal developer impact from that spec.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Once it reaches full maturity and testability, this feature will hopefully
supplant sahara-image-elements, and it will provide the baseline for building
an image packing facility into the sahara API itself.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>No dashboard representation is intended at this time.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Create and unit test images.py changes.</p></li>
<li><p>Test plugin-specific image packing for any plugins to implement this
feature. Intended first round of plugins to implement:</p></li>
<li><p>Implement CI tests for this feature.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This feature will introduce a dependency on libguestfs, though it will only
use libguestfs features present in all major distributions.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>As this feature does not touch the API, it will not introduce new Tempest
tests. However, testing the image generation process itself will require
attention.</p>
<p>It is proposed that tests be added for image generation as each plugin
implements this generation strategy, and that nightly tests be created to
generate images and run these images through cluster creation and EDP testing.</p>
<p>These should be implemented as separate tests in order to quickly
differentiate image packing failure and cluster failure.</p>
<p>As these recipes stabilize for any given plugin, we should begin to run
these tests when any change to the sahara repository touches image
generation resources for a specific plugin (which should be well-encapsulated
in a single directory under each version for each plugin). Toward the end of
this epic (as we near the stage of authoring the API to pack images), we may
consider removing integration tests for SIE to save lab time. Still, these
tests will be very resource-light to run compared to sahara service tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature should be documented in both devref (for building image
generation recipes) and in userdoc (for script usage).</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Mon, 18 Jul 2016 00:00:00 Designate Integration: https://specs.openstack.org/openstack/sahara-specs/specs/newton/designate-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/designate-integration">https://blueprints.launchpad.net/sahara/+spec/designate-integration</a></p>
<p>Designate provides DNS as a service so we can use hostnames instead of
IP addresses. This spec is proposal to implement this feature in Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Users want to use meaningful hostnames instead of plain IP addresses. Currently
Sahara only changes <code class="docutils literal notranslate"><span class="pre">/etc/hosts</span></code> files when deploying a cluster, which
allows VMs to be resolved by hostname only from the console and only between
those VMs. With Designate integration, the hostnames of VMs can be used in
the dashboard and we no longer need to change <code class="docutils literal notranslate"><span class="pre">/etc/hosts</span></code>.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>With this change the workflow is as follows: the user has pre-installed
Designate on the controller machine and configured it with the network (see
[0]), and has added the DNS address to the <code class="docutils literal notranslate"><span class="pre">/etc/resolv.conf</span></code> file on the
client machine. The user creates a cluster template in which they choose
domain names for internal and external resolution. Then the user launches the
cluster and all of its instances can be resolved by their hostnames:</p>
<blockquote>
<div><ol class="arabic simple">
<li><p>from user client machine (e.g., links in Sahara-dashboard)</p></li>
<li><p>between instances</p></li>
</ol>
</div></blockquote>
<p>Designate integration will be enabled only if requested in the sahara
configuration. Two new config options should be added in the <code class="docutils literal notranslate"><span class="pre">default</span></code> section:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">use_designate</span></code>: boolean option indicating whether Sahara should use
Designate (False by default).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">nameservers</span></code>: list of IP addresses of servers with Designate
pre-installed. This is required if ‘use_designate’ is True.</p></li>
</ul>
</div></blockquote>
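<p>For illustration, the two options could be read from a sahara.conf-style
file as sketched below. Sahara itself would register these options through
oslo.config, so this stdlib-only sketch merely demonstrates the proposed
semantics, including the requirement that nameservers be set when
use_designate is enabled.</p>

```python
# Stdlib-only sketch of reading the proposed options; sahara would use
# oslo.config, so this only illustrates the semantics described above.
import configparser

def read_dns_options(conf_text):
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    use_designate = parser.getboolean("DEFAULT", "use_designate",
                                      fallback=False)
    raw = parser.get("DEFAULT", "nameservers", fallback="")
    nameservers = [ip.strip() for ip in raw.split(",") if ip.strip()]
    if use_designate and not nameservers:
        raise ValueError("'nameservers' is required when 'use_designate' "
                         "is True")
    return use_designate, nameservers
```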
<p>Domain records will have the format <code class="docutils literal notranslate"><span class="pre">instance_name.domain_name.</span></code>. They
will be created by Heat during the create-heat-stack step of cluster creation.
Hostname collisions are not expected: 1. Designate allows only unique domain
names, so a user can’t create a domain with the same name in different tenants;
furthermore, domain names created in one tenant aren’t available in another.
2. Nova allows launching instances with the same name, but Sahara appends
indexes (0, 1, 2, …) to each instance name. The only collision case is
launching two clusters with the same node group names and the same cluster
names; since Designate doesn’t allow duplicate records to be created, the user
can simply change the cluster name.</p>
<p>We should maintain backward compatibility. Backward compatibility cases:</p>
<blockquote>
<div><ul class="simple">
<li><p>An old version of OpenStack: the user can switch Designate off with
<code class="docutils literal notranslate"><span class="pre">use_designate</span> <span class="pre">=</span> <span class="pre">false</span></code> (the default value).</p></li>
<li><p>The cluster already exists: the user can’t use the Designate feature.</p></li>
</ul>
</div></blockquote>
<p>Addresses of the domain servers should be written to the <code class="docutils literal notranslate"><span class="pre">/etc/resolv.conf</span></code>
file on each of the VMs in order to successfully resolve the created domain
records across those VMs. This can be done with cloud-init capabilities.</p>
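<p>For illustration, the <code class="docutils literal notranslate"><span class="pre">resolv.conf</span></code> content that cloud-init would write
to each VM might be generated by a helper like the following; this is a
hypothetical function, not part of the proposal’s actual code.</p>

```python
# Hypothetical helper showing the resolv.conf content that cloud-init
# would write to each VM; not part of the actual proposal's code.
def resolv_conf_content(nameservers, search_domains=()):
    lines = []
    if search_domains:
        lines.append("search %s" % " ".join(search_domains))
    lines.extend("nameserver %s" % ip for ip in nameservers)
    return "\n".join(lines) + "\n"
```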
<section id="alternatives">
<h3>Alternatives</h3>
<p>None. We could leave everything as is: today we change <code class="docutils literal notranslate"><span class="pre">/etc/hosts</span></code> files
to resolve hostnames between VMs.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Cluster, Cluster Template and Instance entities should have two new columns:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">internal_domain_name</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">external_domain_name</span></code></p></li>
</ul>
</div></blockquote>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users need to pre-install and set up Designate themselves (see [0]). They
should also change the <code class="docutils literal notranslate"><span class="pre">resolv.conf</span></code> files on the appropriate machines in
order to resolve names against the Designate server.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Cluster template creation page will contain additional ‘DNS’ tab with two
dropdown fields: internal domain and external domain.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<p>None</p>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>msionkin (Michael Ionkin)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>implement designate heat template</p></li>
<li><p>implement writing dns server address to <code class="docutils literal notranslate"><span class="pre">resolv.conf</span></code> files</p></li>
<li><p>provide backward compatibility</p></li>
<li><p>add new db fields</p></li>
<li><p>add tab and fields for cluster template page in Sahara dashboard</p></li>
<li><p>add unit tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>python-designateclient for Sahara-dashboard</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests should be added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature should be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[0] <a class="reference external" href="http://docs.openstack.org/mitaka/networking-guide/adv-config-dns.html">http://docs.openstack.org/mitaka/networking-guide/adv-config-dns.html</a></p>
</section>
Fri, 08 Jul 2016 00:00:00 Adding pagination and sorting ability to Sahara: https://specs.openstack.org/openstack/sahara-specs/specs/newton/pagination.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/pagination">https://blueprints.launchpad.net/sahara/+spec/pagination</a></p>
<p>This specification describes pagination support for working with objects in
Sahara through the API, CLI, and Dashboard.
It also describes adding sorting support to the Sahara API.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>We are adding pagination support for working with objects. Showing too many
objects at once is not user friendly, so we want to implement pagination so
that the dashboard can split the list into pages, improving the UI. Users can
already sort objects by column in the dashboard, but once pagination is added
we will also need sorting support in the API, because ordering on the UI side
only would be inconvenient: the service would continue returning pages ordered
by the default column.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Three optional parameters will be added to the API GET requests that return
lists of objects.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Currently, list elements are sorted by date of creation.</p>
</div>
<p><code class="docutils literal notranslate"><span class="pre">marker</span></code> - the id of the last element of the previous page; this element itself won’t be in the response.</p>
<p><code class="docutils literal notranslate"><span class="pre">limit</span></code> - the maximum number of elements in the response. This argument
must be a positive integer. If this parameter isn’t passed, the response will
contain all the elements that follow the element whose id is given in the
<code class="docutils literal notranslate"><span class="pre">marker</span></code> parameter. If the <code class="docutils literal notranslate"><span class="pre">marker</span></code> parameter isn’t passed, the response
will contain the first <code class="docutils literal notranslate"><span class="pre">limit</span></code> objects of the list. If both parameters are
omitted, the API will work as usual.</p>
<p><code class="docutils literal notranslate"><span class="pre">sort_by</span></code> - the name of the object field to sort by.
If this parameter is not passed, objects will be sorted by date of creation;
otherwise they will be sorted by this field.</p>
<p>The field can take one of the following values for each object type:</p>
<p>For Node Group Template:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">plugin</span></code>, <code class="docutils literal notranslate"><span class="pre">version</span></code>, <code class="docutils literal notranslate"><span class="pre">created_at</span></code>, <code class="docutils literal notranslate"><span class="pre">updated_at</span></code></p>
<p>For Cluster Templates:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">plugin</span></code>, <code class="docutils literal notranslate"><span class="pre">version</span></code>, <code class="docutils literal notranslate"><span class="pre">created_at</span></code>, <code class="docutils literal notranslate"><span class="pre">updated_at</span></code></p>
<p>For Clusters:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">plugin</span></code>, <code class="docutils literal notranslate"><span class="pre">version</span></code>, <code class="docutils literal notranslate"><span class="pre">status</span></code>, <code class="docutils literal notranslate"><span class="pre">instance_count</span></code></p>
<p>For Job Binaries and Job Binaries Internal:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">create</span></code>, <code class="docutils literal notranslate"><span class="pre">update</span></code></p>
<p>For Data Sources:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">type</span></code>, <code class="docutils literal notranslate"><span class="pre">create</span></code>, <code class="docutils literal notranslate"><span class="pre">update</span></code></p>
<p>For Job Templates:
<code class="docutils literal notranslate"><span class="pre">name</span></code>, <code class="docutils literal notranslate"><span class="pre">type</span></code>, <code class="docutils literal notranslate"><span class="pre">create</span></code>, <code class="docutils literal notranslate"><span class="pre">update</span></code></p>
<p>For Jobs:
<code class="docutils literal notranslate"><span class="pre">id</span></code>, <code class="docutils literal notranslate"><span class="pre">job_template</span></code>, <code class="docutils literal notranslate"><span class="pre">cluster</span></code>, <code class="docutils literal notranslate"><span class="pre">status</span></code>, <code class="docutils literal notranslate"><span class="pre">duration</span></code></p>
<p>By default, the Sahara API will return lists in ascending order. If the user
wants a descending-order list, they can use the <code class="docutils literal notranslate"><span class="pre">-</span></code> prefix on the
<code class="docutils literal notranslate"><span class="pre">sort_by</span></code> argument.</p>
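<p>The semantics of the three parameters can be sketched over an in-memory list
as follows; sahara would apply them at the database layer, so this is purely
illustrative of the behavior described above.</p>

```python
# Sketch of marker/limit/sort_by semantics over an in-memory list; sahara
# itself would do this in the database layer.
def paginate(objects, marker=None, limit=None, sort_by=None):
    key = sort_by or "created_at"          # default ordering
    reverse = key.startswith("-")          # "-name" means descending
    key = key.lstrip("-")
    page = sorted(objects, key=lambda o: o[key], reverse=reverse)
    if marker is not None:
        ids = [o["id"] for o in page]
        page = page[ids.index(marker) + 1:]  # elements after the marker
    if limit is not None:
        page = page[:limit]                  # at most `limit` elements
    return page
```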
<p>Examples:</p>
<p>Get list of jobs in ascending order sorted by name.</p>
<p><strong>request</strong>
<code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">http://sahara/v1.1/775181/jobs?sort_by=name</span></code></p>
<p>Get list of jobs in descending order sorted by name.</p>
<p><strong>request</strong>
<code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">http://sahara/v1.1/775181/jobs?sort_by=-name</span></code></p>
<p>For convenience, collections contain atom-style “next” and “previous”
markers. The first page of the list doesn’t contain a previous marker, and
the last page doesn’t contain a next marker. The following examples
illustrate pages in a collection of cluster templates.</p>
<p>Example:</p>
<p>Get one cluster template after template with <code class="docutils literal notranslate"><span class="pre">id=3</span></code>.</p>
<p><strong>request</strong></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">http://sahara/v1.0/775181/cluster-templates?limit=1&marker=3</span></code></p>
<p><strong>response</strong></p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"cluster_templates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"cluster-template"</span><span class="p">,</span>
<span class="nt">"plugin_name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vanilla"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"4"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node_groups"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"master"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"worker"</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">"markers"</span><span class="p">:</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"next"</span><span class="p">:</span><span class="w"> </span><span class="s2">"32"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"previous"</span><span class="p">:</span><span class="w"> </span><span class="s2">"22"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Example:
Suppose the cluster template with id = 5 is the last in the collection.
The response will contain only the “previous” marker.</p>
<p><strong>request</strong></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">http://sahara/v1.0/775181/cluster-templates?limit=1&marker=4</span></code></p>
<p><strong>response</strong></p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"cluster_templates"</span><span class="p">:[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node_groups"</span><span class="p">:[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"master"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"worker"</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"cluster-template-2"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"5"</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">],</span>
<span class="w"> </span><span class="nt">"markers"</span><span class="p">:</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"previous"</span><span class="p">:</span><span class="w"> </span><span class="s2">"3"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Add the ability to pass the <code class="docutils literal notranslate"><span class="pre">marker</span></code>, <code class="docutils literal notranslate"><span class="pre">limit</span></code>, and <code class="docutils literal notranslate"><span class="pre">sort_by</span></code> parameters
in the following requests:</p>
<section id="sahara-api-v1-0">
<h4>Sahara API v1.0</h4>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.0/{tenant_id}/images</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.0/{tenant_id}/node-group-templates</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.0/{tenant_id}/cluster-templates</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.0/{tenant_id}/clusters</span></code></p>
</section>
<section id="sahara-api-v1-1">
<h4>Sahara API v1.1</h4>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.1/{tenant_id}/data-sources</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.1/{tenant_id}/job-binary-internals</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.1/{tenant_id}/job-binaries</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.1/{tenant_id}/jobs</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v1.1/{tenant_id}/job-executions</span></code></p>
</section>
<section id="sahara-api-v2">
<h4>Sahara API v2</h4>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/cluster-templates</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/clusters</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/data_sources</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/images</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/job-binaries</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/jobs</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/job-templates</span></code></p>
<p><code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre">/v2/node-group-templates</span></code></p>
</section>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Pagination will be added to Sahara-Dashboard via Horizon capabilities. We
currently use the <code class="docutils literal notranslate"><span class="pre">DataTable</span></code> class to represent lists of data objects,
and this class supports pagination.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>mlelyakin (<a class="reference external" href="mailto:mlelyakin%40mirantis.com">mlelyakin<span>@</span>mirantis<span>.</span>com</a>)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add the ability to take the <code class="docutils literal notranslate"><span class="pre">marker</span></code> and <code class="docutils literal notranslate"><span class="pre">limit</span></code> arguments in the Sahara API</p></li>
<li><p>Add unit tests for the new features</p></li>
<li><p>Add the ability to take the <code class="docutils literal notranslate"><span class="pre">sort_by</span></code> argument in the Sahara API</p></li>
<li><p>Add these abilities to the Sahara CLI client</p></li>
<li><p>Add these abilities to the Dashboard</p></li>
<li><p>Document the pagination and sorting features</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Will be covered with unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The Sahara API documentation will be updated accordingly.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Sun, 03 Jul 2016 00:00:00 Refactor the logic around use of floating ips in node groups and clusters: https://specs.openstack.org/openstack/sahara-specs/specs/newton/refactor-use-floating-ip.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/refactor-use-floating-ip">https://blueprints.launchpad.net/sahara/+spec/refactor-use-floating-ip</a></p>
<p>Currently, there is a boolean in the configuration file called
<em>use_floating_ips</em> which conflates the logic around the existence of
floating ips for instances and the use of floating ips for management by
sahara. This logic should be refactored.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The <em>use_floating_ips</em> boolean when set True has the following implications:</p>
<ul class="simple">
<li><p>All instances must have a floating ip, except for the case of using a
proxy node.</p></li>
<li><p>Every node group therefore must supply a floating ip pool value when using
neutron, and nova must be configured to auto-assign floating ips when using
nova networking.</p></li>
<li><p>Sahara must use the floating ip for management.</p></li>
<li><p>These requirements are in force across the application for every user,
every node group, and every cluster.</p></li>
</ul>
<p>When <em>use_floating_ips</em> is False and neutron is used:</p>
<ul class="simple">
<li><p>Any node group template that has a floating ip pool value set is not
usable and will fail to launch. Again, there is an exception for this when
using proxy nodes.</p></li>
<li><p>Therefore simply modifying the value of <em>use_floating_ips</em> in the sahara
configuration file will invalidate whole groups of existing, functioning
templates.</p></li>
</ul>
<p>As we move toward a world where virtual and baremetal clusters co-exist, we
need to modify the logic around floating ips to provide more flexibility. For
instance, a baremetal cluster that uses a flat physical network with no
floating ips should be able to co-exist in sahara with a VM cluster that
uses a virtual network with floating ips (currently this is not possible).</p>
<p>As nova, neutron, and ironic make further advances toward hybrid
networking and instance scenarios, sahara needs the flexibility to adapt to new
features. The current conflated logic is a roadblock to this flexibility.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Logically the change is very simple, but implementation will require careful
analysis and testing since knowledge of floating ips and the use of nova vs
neutron networking is spread throughout the code base. This becomes
particularly complex in the logic around ssh connections and proxy gateways
and in the quota logic.</p>
<p>The proposed logical change goes like this:</p>
<ul class="simple">
<li><p>The semantics of <em>use_floating_ips</em> will be changed to mean “if an instance
has a floating ip assigned to it, sahara will use that floating ip for
management, otherwise it will use the internal ip”. This flag will impose no
other requirements, and no other expectations should be associated with it.</p></li>
<li><p>In the case of neutron networking, if a node group specifies a floating ip
pool then the instances in that node group will have floating ips. If the
node group does not specify a floating ip pool this will not be an error but
the instances will not have floating ips.</p></li>
<li><p>In the case of nova networking, if nova is configured to auto-assign
floating ips to instances then instances will have floating ips. If nova
does not assign a floating ip, the instances will not have floating ips.</p></li>
<li><p>In the case of neutron networking, <em>use_namespaces</em> set to True will
continue to mean “use ip netns to connect to machines that do not have
floating ips” but each instance will have to be checked individually to
determine if it has a floating ip assigned. It will no longer be valid
to test <em>CONF.use_namespaces and not CONF.use_floating_ips</em>.</p></li>
</ul>
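<p>The proposed semantics reduce to a single selection rule, sketched below. The function and attribute names here are illustrative assumptions, not sahara's actual instance model:</p>

```python
def get_management_ip(instance, use_floating_ips=True):
    """Pick the address sahara should use to manage an instance.

    Sketch of the proposed semantics: use the floating ip only when one
    is actually assigned *and* use_floating_ips is True; otherwise fall
    back to the internal (fixed) ip.
    """
    if use_floating_ips and instance.get('floating_ip'):
        return instance['floating_ip']
    return instance['internal_ip']
```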
<section id="alternatives">
<h3>Alternatives</h3>
<p>None. This logic has been in sahara since the beginning when nova networking
was exclusively used and things were simpler; it’s time to refactor.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None, although changes are needed in the validation code. The API itself
does not change, but what is semantically allowable does change (i.e., whether
or not a node group <em>must</em> have a floating ip pool value).</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users will need to be educated on the shifting implications of setting
<em>use_floating_ips</em> and the choice they have when configuring node groups.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>This change <em>should be</em> transparent to existing instances of sahara. Templates
that are functioning should continue to function, and clusters that are
running should not be affected. What will change is the ability to control
use of floating ips in new clusters.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Uncertain. In the past there were configuration settings for the sahara
dashboard that touched on floating ip use. The current configuration
parameters should be reviewed to see if this holds any implication for horizon.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Trevor McKay has produced a patch for this which should be fairly complete,
but he is unable to finish it. He was in the testing phase when work on this
stopped, with fair confidence that the solution works in neutron (but more
testing needs to be done in a nova networking environment).</p>
<p>Primary assignee: tellesnobrega</p>
<p>Other contributors: tmckay</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Refactor floating-ip use</p></li>
<li><p>Implement tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests are sufficient to cover changes to template validation routines
and logical flow in sahara (is sahara in a particular case trying to use a
floating ip or not?)</p>
<p>Scenario tests should be constructed for both values of <em>use_floating_ips</em>,
for both neutron and nova networking configurations, and for node groups
with and without floating ip pool values.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The new implications of <em>use_floating_ips</em> should be covered in the
documentation on configuration values and set up of nova in the
nova networking case.</p>
<p>It should also be noted in discussion of node group templates that
floating ip pool values are no longer required or disallowed based
on the value of <em>use_floating_ips</em>.</p>
<p>As mentioned above, it’s unclear whether anything needs to change in
sahara dashboard configuration values. If something does change, then
horizon docs should be changed accordingly.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 01 Jul 2016 00:00:00 Allow creation of python topologies for Stormhttps://specs.openstack.org/openstack/sahara-specs/specs/newton/python-storm-jobs.html
<p>In order to allow users to develop Storm topologies (which we can also call
Storm jobs) in pure Python, we are adding support for Pyleus in Sahara.
Pyleus is a framework that allows the creation of Storm topologies in
Python and uses a YAML file to wire up how the flow will work.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/python-storm-jobs">https://blueprints.launchpad.net/sahara/+spec/python-storm-jobs</a></p>
<p>Storm is the plugin in Sahara responsible for real-time processing. Storm is
natively written in Java, which is the common language for writing topologies.
Storm allows topologies to be written in different languages, including
Python, but the default way to implement this still requires a Java shell
combining the Python components together, which is not very pythonic.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We are OpenStack and we love our Python, so in order to allow OpenStack users
to create and submit Python-written topologies we propose to integrate the
Pyleus framework into the Storm plugin.
Pyleus allows the user to create Storm topology components in Python and
provides an abstraction to help with that construction, making the
implementation easier. It also uses a YAML file to compile
the topology. The final product of the compilation is a jar file containing
the Python-written topology.
The change we need to make is to integrate the pyleus command line into the
Storm plugin so that topologies are submitted using its CLI instead of
Storm’s. The overall UX will remain the same, since the user will upload a
jar file and start a topology. We will create a new job type for Storm called
pyleus so the plugin can handle the new way of submitting the job to the
cluster.</p>
<p>The command line will look like this:</p>
<ul class="simple">
<li><p>pyleus submit [-n NIMBUS_HOST] /path/to/topology.jar</p></li>
<li><p>pyleus kill [-n NIMBUS_HOST] TOPOLOGY_NAME</p></li>
<li><p>pyleus list [-n NIMBUS_HOST]</p></li>
</ul>
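<p>Inside the plugin, the invocations above could be assembled along these lines. This is a sketch only: the helper name is an assumption, and the real plugin would run the command on the cluster through sahara's remote-execution utilities rather than build it locally:</p>

```python
def build_pyleus_command(action, nimbus_host=None, topology_jar=None,
                         topology_name=None):
    """Assemble a pyleus CLI invocation as an argument list."""
    cmd = ['pyleus', action]
    if nimbus_host:
        cmd += ['-n', nimbus_host]
    if action == 'submit':
        # submit takes the path to the compiled topology jar
        cmd.append(topology_jar)
    elif action == 'kill':
        # kill takes the running topology's name
        cmd.append(topology_name)
    return cmd
```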
<section id="alternatives">
<h3>Alternatives</h3>
<p>The alternative is to leave Storm as it is, accepting only default Java or
Python jobs.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>There will be a minor REST API impact since we are introducing a new Job Type.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>We will need to install pyleus on the Storm images.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Minor changes will be made to add new Job Type.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tellesmvn</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Create new Job type.</p></li>
<li><p>Change the Storm plugin to deal with the new job type.</p></li>
<li><p>Implement tests for this feature.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>At this point only unit tests will be implemented.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature should be documented in the user doc.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 22 Jun 2016 00:00:00 Refactor the sahara.service.api modulehttps://specs.openstack.org/openstack/sahara-specs/specs/newton/refactor-sahara.service.api.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/v2-api-experimental-impl">https://blueprints.launchpad.net/sahara/+spec/v2-api-experimental-impl</a></p>
<p>The HTTP API calls that sahara receives are processed by the
<code class="docutils literal notranslate"><span class="pre">sahara.api</span></code> package, the functions of this module then call upon the
<code class="docutils literal notranslate"><span class="pre">sahara.service.api</span></code> and <code class="docutils literal notranslate"><span class="pre">sahara.service.edp.api</span></code> modules to perform
processing before being passed to the conductor. To help accommodate future
API changes, these modules should be refactored to create a more unified
package for future implementors.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current state of the API service level modules can be confusing when
comparing it to the API route level modules. This confusion can lead to
misunderstandings in the way the service code interacts with the routing
and conductor modules. To improve readability and ease of expansion,
this spec defines a new layout for these modules.</p>
<p>Beyond the confusion, as the API is being reworked for the new version 2
features there will need to be additions in the service layer to support
the endpoint and JSON notational changes. Refactoring these modules will
create a clearer pathway for adding to the sahara API.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change will create a new package named <code class="docutils literal notranslate"><span class="pre">sahara.service.api</span></code> which will
contain all the service level API modules. This new package will create a
unified location for service level API changes, and provide a clear path for
those wishing to make said changes. The new package will also contain the
base service level files for the v2 API.</p>
<p>The new package layout will be as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span><span class="o">/</span><span class="n">service</span><span class="o">/</span><span class="n">api</span><span class="o">/</span><span class="fm">__init__</span><span class="o">.</span><span class="n">py</span>
<span class="n">v10</span><span class="o">.</span><span class="n">py</span>
<span class="n">v11</span><span class="o">.</span><span class="n">py</span>
<span class="n">v2</span><span class="o">/</span><span class="fm">__init__</span><span class="o">.</span><span class="n">py</span>
<span class="n">clusters</span><span class="o">.</span><span class="n">py</span>
<span class="n">cluster_templates</span><span class="o">.</span><span class="n">py</span>
<span class="n">data_sources</span><span class="o">.</span><span class="n">py</span>
<span class="n">images</span><span class="o">.</span><span class="n">py</span>
<span class="n">job_binaries</span><span class="o">.</span><span class="n">py</span>
<span class="n">job_executions</span><span class="o">.</span><span class="n">py</span>
<span class="n">jobs</span><span class="o">.</span><span class="n">py</span>
<span class="n">job_types</span><span class="o">.</span><span class="n">py</span>
<span class="n">node_group_templates</span><span class="o">.</span><span class="n">py</span>
<span class="n">plugins</span><span class="o">.</span><span class="n">py</span>
</pre></div>
</div>
<p>This new layout will provide a clean, singular, location for all service
level API code. The files created for the <code class="docutils literal notranslate"><span class="pre">sahara.service.api.v2</span></code> package
will be simple copies of the current functionality.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>One alternative is to do nothing and leave the <code class="docutils literal notranslate"><span class="pre">sahara.service.api</span></code> and
<code class="docutils literal notranslate"><span class="pre">sahara.service.edp.api</span></code> modules as they are and create new code for v2
either in those locations or a new file. A downside to this approach is that
it will be less clear where the boundaries between the different versions
will exist. This will also leave a larger question as to where the new v2
code will live.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>This change should improve the developer experience with regard to creating
and maintaining the API code as it will be more clear which modules control
each version.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Michael McCune (elmiko)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>create new package</p></li>
<li><p>move current implementations into new package</p></li>
<li><p>add v2 related service files</p></li>
<li><p>fix outstanding references</p></li>
<li><p>add documentation to v2 developer docs</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>As these changes will leave the interfaces largely intact, they will be
tested through our current unit and tempest tests. They will not require
new tests specifically for this change.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>A new section in the API v2 developer docs will be added to help inform about
the purpose of these files and how they will be used in the work to
implement features like JSON payload changes.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 28 Mar 2016 00:00:00 Support of distributed periodic taskshttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/distributed-periodics.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/distributed-periodics">https://blueprints.launchpad.net/sahara/+spec/distributed-periodics</a></p>
<p>This specification proposes to add distributed periodic tasks.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently periodic tasks are executed on each engine simultaneously, so the
amount of time between those tasks can be very small and the load between
engines is not balanced. Distributing periodic tasks between engines can
address both problems.</p>
<p>Also, we currently have 2 periodic tasks that perform cluster cleanup. Cluster
terminations in these tasks are performed in a loop with direct termination
calls (bypassing OPS). This approach halts periodic tasks on a particular
engine during cluster termination (and it hurts even more if problems
occur during termination).</p>
<p>Moreover, distribution of periodic tasks is important for periodic health
checks to prevent extra load on engines.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change consists of two things:</p>
<ul class="simple">
<li><p>The ability to terminate clusters in periodic tasks via OPS will be added;</p></li>
<li><p>Support for distributed periodic tasks will be added.</p></li>
</ul>
<p>Distributed periodic tasks will be based on a HashRing implementation and the
Tooz library, which provides group membership support for a set of backends
[1]. The backend will be configured with the
<code class="docutils literal notranslate"><span class="pre">periodic_coordination_backend</span></code> option.</p>
<p>There will be only one group called <code class="docutils literal notranslate"><span class="pre">sahara-periodic-tasks</span></code>. Once a group is
created, any coordinator can join the group and become a member of it.</p>
<p>As an engine joins the group and builds the HashRing, a hash of its ID is
computed, and that hash determines the data it is going to be responsible for.
Everything between this number and the next one in the ring that belongs
to a different engine now belongs to this engine.</p>
<p>The only remaining problem is that some engines will have a disproportionately
large space before them, which will result in a greater load on them. This can
be ameliorated by adding each server to the ring a number of times in different
places. This is achieved by having a replica count (40 by default).</p>
<p>The HashRing will be rebuilt before the execution of each periodic task to
reflect the actual state of the coordination group.</p>
<p>A <code class="docutils literal notranslate"><span class="pre">sahara.service.coordinator</span></code> module with two classes will be added:</p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">Coordinator</span></code> class will contain basic coordination and grouping methods:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="k">class</span> <span class="nc">Coordinator</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">backend_url</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">heartbeat</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">join_group</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">group_id</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">get_members</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">group_id</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">HashRing</span></code> class will contain methods for ring building and subset
extraction:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="k">class</span> <span class="nc">HashRing</span><span class="p">(</span><span class="n">Coordinator</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">backend_url</span><span class="p">,</span> <span class="n">group_id</span><span class="p">,</span> <span class="n">replicas</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">_build_ring</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="o">...</span>
<span class="k">def</span> <span class="nf">get_subset</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">objects</span><span class="p">):</span>
<span class="o">...</span>
</pre></div>
</div>
</li>
</ul>
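<p>The consistent-hashing scheme described above can be illustrated with a minimal, self-contained ring. The class name, the md5-based hashing and the method shapes here are simplified assumptions for illustration, not the proposed sahara code:</p>

```python
import bisect
import hashlib


class MiniHashRing(object):
    """Minimal consistent-hash ring with member replicas.

    Each member is inserted ``replicas`` times at different points to
    even out the load between members, as described above.
    """

    def __init__(self, members, replicas=40):
        self.ring = {}
        for member in members:
            for i in range(replicas):
                point = self._hash('%s-%d' % (member, i))
                self.ring[point] = member
        self.sorted_points = sorted(self.ring)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def owner(self, obj_id):
        """Return the member responsible for obj_id: the first ring
        point after the object's hash, wrapping around the ring."""
        h = self._hash(obj_id)
        idx = bisect.bisect(self.sorted_points, h) % len(self.sorted_points)
        return self.ring[self.sorted_points[idx]]

    def get_subset(self, me, objects):
        """Return the objects this member is responsible for."""
        return [o for o in objects if self.owner(o) == me]
```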
<p>We currently have 4 periodic tasks. For each of them, what exactly will be
distributed is listed below:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">update_job_statuses</span></code>
Each engine will get a list of all job executions but will update statuses
for only a part of them, according to the hash values of their IDs.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">terminate_unneeded_transient_clusters</span></code>
Each engine will get a list of all clusters, check a part of them and request
termination via OPS if needed.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">terminate_incomplete_clusters</span></code>
Same as for <code class="docutils literal notranslate"><span class="pre">terminate_unneeded_transient_clusters</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">check_for_zombie_proxy_users</span></code>
Each engine will get a list of users but will check whether a user is a
zombie only for a part of them.</p></li>
</ul>
<p>We will also have a periodic task for health checks soon. The health check
task will be executed on clusters, and these clusters will be split among
engines in the same way as job executions for <code class="docutils literal notranslate"><span class="pre">update_job_statuses</span></code>.</p>
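<p>The "each engine handles only its part of the objects" idea can be sketched with a deliberately simplified stand-in for the proposed HashRing (plain modulo hashing, hypothetical function name), which still shows the key property that every object is handled by exactly one engine:</p>

```python
import hashlib


def my_share(object_ids, my_index, engine_count):
    """Simplified stand-in for HashRing-based distribution: an engine
    processes only the ids that hash into its bucket, so each object
    is handled by exactly one engine."""
    def bucket(obj_id):
        digest = hashlib.md5(obj_id.encode('utf-8')).hexdigest()
        return int(digest, 16) % engine_count
    return [i for i in object_ids if bucket(i) == my_index]
```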
<p>If a coordination backend is not provided during configuration, periodic
tasks will be launched in the old-fashioned way and the HashRing will not be
built.</p>
<p>If a coordination backend is provided but configured incorrectly or not
accessible, the engine will not start, and a corresponding error will be
reported.</p>
<p>If the connection to the coordinator is lost, periodic tasks will be
stopped. Once the connection is established again, periodic tasks will be
executed in distributed mode.</p>
<p>In order to keep the connection to the coordination server active, the
<code class="docutils literal notranslate"><span class="pre">heartbeat</span></code> method will be called regularly (every <code class="docutils literal notranslate"><span class="pre">heartbeat_timeout</span></code>
seconds) in a separate thread.</p>
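<p>The heartbeat loop might look like the following threading-based sketch. The function name and stop-event mechanism are assumptions; tooz's own heartbeat API details, error handling and reconnection are not modelled here:</p>

```python
import threading


def start_heartbeat(coordinator, heartbeat_timeout, stop_event):
    """Call coordinator.heartbeat() every ``heartbeat_timeout`` seconds
    in a daemon thread until ``stop_event`` is set."""
    def _loop():
        # Event.wait returns False when the timeout expires, i.e. when
        # it is time for the next heartbeat.
        while not stop_event.wait(heartbeat_timeout):
            coordinator.heartbeat()
    thread = threading.Thread(target=_loop)
    thread.daemon = True
    thread.start()
    return thread
```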
<p>A configurable number of threads (each thread will be a separate member of
the group) performing periodic tasks will be launched.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Coordination backend should be configured.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding ability to terminate clusters in periodic tasks via OPS;</p></li>
<li><p>Implementing HashRing for distribution of periodic tasks;</p></li>
<li><p>Documenting changes in periodic tasks configuration;</p></li>
<li><p>Adding support of distributed periodics to devstack with ZooKeeper as a
backend.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Tooz package [2]</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests, enabling distributed periodics in integration tests with one of
the supported backends (for example, ZooKeeper) and manual testing for all
available backends [1] supported by Tooz library.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Sahara REST API documentation in api-ref will be updated.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1]: <a class="reference external" href="http://docs.openstack.org/developer/tooz/compatibility.html#driver-support">http://docs.openstack.org/developer/tooz/compatibility.html#driver-support</a></p>
<p>[2]: <a class="reference external" href="https://pypi.python.org/pypi/tooz">https://pypi.python.org/pypi/tooz</a></p>
</section>
Tue, 26 Jan 2016 00:00:00 Implement Sahara cluster verification checkshttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/cluster-verification.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cluster-verification">https://blueprints.launchpad.net/sahara/+spec/cluster-verification</a></p>
<p>Currently we don’t have any way to check the status of cluster processes
through the Sahara interface. Our plan is to implement cluster verifications
and the ability to re-trigger these verifications for a particular cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara doesn’t have any check for monitoring the health of cluster processes.
A cluster can be broken or unavailable, but Sahara will still consider it to
be in ACTIVE status. This may result in losses for end users.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>First of all, let’s make several important definitions here.
A cluster health check performs a limited functionality check
of a cluster: for example, instance accessibility,
writing some data to <code class="docutils literal notranslate"><span class="pre">HDFS</span></code>, and so on.
Each health check will be implemented as a class,
implementing several important methods from an abstract base class (abc):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="k">class</span> <span class="nc">ExampleHealthCheck</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cluster</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">cluster</span> <span class="o">=</span> <span class="n">cluster</span>
<span class="c1"># other stuff</span>
<span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
<span class="k">def</span> <span class="nf">available</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># will verify whether this check is applicable to the cluster</span>
<span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
<span class="k">def</span> <span class="nf">check</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># the actual health check will be performed here</span>
<span class="k">def</span> <span class="nf">execute</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># based on the availability of the check and the results</span>
<span class="c1"># of the check, will write the correct data into the database</span>
</pre></div>
</div>
<p>The expected behaviour of the <code class="docutils literal notranslate"><span class="pre">check</span></code> method
of a health check is to return some important data when
everything is ok and, in case of errors, to raise an exception
with detailed information about the reasons for the failures. Let’s describe
the important statuses of the health checks.</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">GREEN</span></code> status: the cluster is healthy and the health check
passed correctly. In the description in such a case
we can have the following info: <code class="docutils literal notranslate"><span class="pre">HDFS</span> <span class="pre">available</span> <span class="pre">for</span> <span class="pre">writing</span>
<span class="pre">data</span></code> or <code class="docutils literal notranslate"><span class="pre">All</span> <span class="pre">datanodes</span> <span class="pre">are</span> <span class="pre">active</span> <span class="pre">and</span> <span class="pre">available</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">YELLOW</span></code> status: <code class="docutils literal notranslate"><span class="pre">YellowHealthError</span></code> will be raised
as the result of the operation; it means that something is
probably wrong with the cluster. However, the cluster is still
operable and can be used for running jobs. For example,
in the exception message we will see the following information:
<code class="docutils literal notranslate"><span class="pre">2</span> <span class="pre">out</span> <span class="pre">of</span> <span class="pre">10</span> <span class="pre">datanodes</span> <span class="pre">are</span> <span class="pre">not</span> <span class="pre">working</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">RED</span></code> status: <code class="docutils literal notranslate"><span class="pre">RedHealthError</span></code> will be raised in such
a case; it means that something is definitely wrong with the cluster and
we can’t guarantee that the cluster is still able to perform operations.
For example, <code class="docutils literal notranslate"><span class="pre">Amount</span> <span class="pre">of</span> <span class="pre">active</span> <span class="pre">datanodes</span> <span class="pre">is</span> <span class="pre">less</span> <span class="pre">than</span>
<span class="pre">replication</span></code> is a possible message in such a case.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">CHECKING</span></code> status: the health check is still running or
was just created.</p></li>
</ul>
</div></blockquote>
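<p>The contract above can be sketched as follows; this is an illustrative
sketch only: the exception names come from this spec, while the base class
and helper function are hypothetical, not sahara's actual API.</p>

```python
class YellowHealthError(Exception):
    """Cluster is probably unhealthy, but still operable."""


class RedHealthError(Exception):
    """Cluster is definitely unhealthy; operation is not guaranteed."""


class BasicHealthCheck(object):
    """A health check: ``check`` returns useful data on success and
    raises one of the errors above with failure details otherwise."""

    def check(self):
        raise NotImplementedError


def execute_check(check):
    """Run a check and map the outcome onto the GREEN/YELLOW/RED statuses."""
    try:
        return 'GREEN', check.check()
    except YellowHealthError as e:
        return 'YELLOW', str(e)
    except RedHealthError as e:
        return 'RED', str(e)
```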
<p>A cluster verification is a combination of several cluster
health checks. The verification indicates the current status
of the cluster: <code class="docutils literal notranslate"><span class="pre">GREEN</span></code>, <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code> or <code class="docutils literal notranslate"><span class="pre">RED</span></code>. We will
store the latest verification for the cluster in the database. We will
also send verification results to Ceilometer, so that the progress of
cluster health can be viewed.</p>
<p>There is also an idea of running several jobs as part
of some health checks, but that would put too much load on the
cluster and will probably be done later.</p>
<p>We can also introduce a periodic task for refreshing the health status
of the cluster.</p>
<p>Several additional options should therefore be added to <code class="docutils literal notranslate"><span class="pre">sahara.conf</span></code>
in a new section <code class="docutils literal notranslate"><span class="pre">cluster_health_verification</span></code>:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">enable_health_verification</span></code>: True by default;
allows disabling periodic cluster verifications.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">verification_period</span></code>: defines the period
between two consecutive health verifications in periodic tasks.
The suggested default is to run a verification once every 10 minutes.</p></li>
</ul>
</div></blockquote>
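<p>Under these assumptions, the new section might look like the following
fragment; the option names come from this spec, while the time unit
(seconds) for the period is an assumption.</p>

```ini
# Hypothetical sahara.conf fragment; the time unit is assumed to be seconds.
[cluster_health_verification]
enable_health_verification = true
verification_period = 600
```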
<p><strong>Proposed checks</strong></p>
<p>This section describes the basic checks of cluster functionality.
Several checks apply to almost all plugins; a few checks are specific
to a single plugin.</p>
<p><strong>Basic checks</strong>:</p>
<p>There are several basic checks that apply to all clusters.</p>
<blockquote>
<div><ul class="simple">
<li><p>A check that verifies access to all instances. If some instances
are not accessible, the state is <code class="docutils literal notranslate"><span class="pre">RED</span></code>.</p></li>
<li><p>A check that all volumes are mounted. If some volume is not
mounted, the state is <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code>.</p></li>
</ul>
</div></blockquote>
<p><strong>HDFS checks</strong>:</p>
<blockquote>
<div><ul class="simple">
<li><p>A check that verifies the namenode is working;
<code class="docutils literal notranslate"><span class="pre">RED</span></code> state if it is not. This affects
only vanilla plugin clusters, and CDH and Ambari plugin clusters
deployed without HA mode.</p></li>
<li><p>A check that verifies the number of live datanodes. The status is
<code class="docutils literal notranslate"><span class="pre">GREEN</span></code> only when all datanodes are active,
<code class="docutils literal notranslate"><span class="pre">RED</span></code> when the number of live datanodes is less
than <code class="docutils literal notranslate"><span class="pre">dfs.replication</span></code>, and <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code> in all other cases.</p></li>
<li><p>A check that verifies the cluster’s ability to write data to
<code class="docutils literal notranslate"><span class="pre">HDFS</span></code>; <code class="docutils literal notranslate"><span class="pre">RED</span></code> status if the write fails.</p></li>
<li><p>A check of the amount of free space in HDFS. This value will be
compared with the space reserved in HDFS, and if the amount of free
space is less than the provided value, the check is considered failed.
If the check does not pass, the state is <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code>
and we will advise scaling the cluster up with extra datanodes (or
cleaning up some data). No additional configuration options are needed
here, because this check will never report a <code class="docutils literal notranslate"><span class="pre">RED</span></code> state.</p></li>
</ul>
</div></blockquote>
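<p>The free-space check reduces to a simple comparison; the function name
and inputs below are illustrative assumptions, not sahara code.</p>

```python
def hdfs_free_space_status(free_bytes, reserved_bytes):
    """Return YELLOW with advice when HDFS free space drops below the
    reserved value; by design this check never reports RED."""
    if free_bytes < reserved_bytes:
        return ('YELLOW',
                "Not enough free space in HDFS: consider scaling the "
                "cluster up with extra datanodes or cleaning up data")
    return ('GREEN', "Sufficient free space in HDFS")
```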
<p><strong>HA checks</strong>:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">YELLOW</span></code> state when at least one stand-by service
is working, and <code class="docutils literal notranslate"><span class="pre">RED</span></code> otherwise. Affects both YARN and HDFS.</p></li>
</ul>
</div></blockquote>
<p><strong>YARN checks</strong>:</p>
<blockquote>
<div><ul class="simple">
<li><p>The resourcemanager is active; <code class="docutils literal notranslate"><span class="pre">RED</span></code> state if it
is not.</p></li>
<li><p>The number of active nodemanagers: <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code> state if some
are not available, and <code class="docutils literal notranslate"><span class="pre">RED</span></code> if the number of live nodemanagers
is less than <code class="docutils literal notranslate"><span class="pre">50%</span></code>.</p></li>
</ul>
</div></blockquote>
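<p>The nodemanager rule above can be sketched as follows; the function name
is illustrative, while the 50% threshold comes from this spec.</p>

```python
def nodemanagers_status(live, total):
    """RED below 50% live nodemanagers, YELLOW if any are down,
    GREEN when all are active."""
    if total and live / total < 0.5:
        return 'RED'
    if live < total:
        return 'YELLOW'
    return 'GREEN'
```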
<p><strong>Kafka check</strong>:</p>
<blockquote>
<div><ul class="simple">
<li><p>A check that Kafka is operable: create an example topic, put
several messages into the topic, and consume the messages. <code class="docutils literal notranslate"><span class="pre">RED</span></code>
state if something goes wrong.</p></li>
</ul>
</div></blockquote>
<p><strong>CDH plugin check</strong>:</p>
<p>This section describes checks specific to the CDH plugin.
For these checks we will need to extend sahara’s current implementation of
the <code class="docutils literal notranslate"><span class="pre">cm_api</span></code> tool. There is an API method to get the current health
of the cluster. Below are a few examples of responses for the YARN service.</p>
<p>Here is a bad case example:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">"yarn01": {</span>
<span class="go"> "checks": [</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_JOBHISTORY_HEALTH",</span>
<span class="go"> "summary": "GOOD"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_NODE_MANAGERS_HEALTHY",</span>
<span class="go"> "summary": "CONCERNING"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_RESOURCEMANAGERS_HEALTH",</span>
<span class="go"> "summary": "BAD"</span>
<span class="go"> }</span>
<span class="go"> ],</span>
<span class="go"> "summary": "BAD"</span>
<span class="go">}</span>
</pre></div>
</div>
<p>and a good case example:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">"yarn01": {</span>
<span class="go"> "checks": [</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_JOBHISTORY_HEALTH",</span>
<span class="go"> "summary": "GOOD"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_NODE_MANAGERS_HEALTHY",</span>
<span class="go"> "summary": "GOOD"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "name": "YARN_RESOURCEMANAGERS_HEALTH",</span>
<span class="go"> "summary": "GOOD"</span>
<span class="go"> }</span>
<span class="go"> ],</span>
<span class="go"> "summary": "GOOD"</span>
<span class="go">}</span>
</pre></div>
</div>
<p>Based on the responses above we will calculate the health of
the cluster. Other possible states that Cloudera can return through the API
are <code class="docutils literal notranslate"><span class="pre">DISABLED</span></code>, when the service was stopped, and <code class="docutils literal notranslate"><span class="pre">CONCERNING</span></code>, if something
is about to go bad. In this health check, sahara’s statuses
will be calculated based on the following table:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">+--------------+--------------------------------+</span>
<span class="go">| Sahara state | Cloudera state |</span>
<span class="go">+--------------+--------------------------------+</span>
<span class="go">| GREEN | All services GOOD |</span>
<span class="go">+--------------+--------------------------------+</span>
<span class="go">| YELLOW | At least 1 service CONCERNING |</span>
<span class="go">+--------------+--------------------------------+</span>
<span class="go">| RED | At least 1 service BAD/DISABLED|</span>
<span class="go">+--------------+--------------------------------+</span>
</pre></div>
</div>
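<p>The table translates directly into code; the sketch below is illustrative
and not part of sahara's actual cm_api integration.</p>

```python
def cdh_cluster_status(service_summaries):
    """Map Cloudera per-service summaries to a sahara health state,
    following the table above."""
    if any(s in ('BAD', 'DISABLED') for s in service_summaries):
        return 'RED'
    if any(s == 'CONCERNING' for s in service_summaries):
        return 'YELLOW'
    return 'GREEN'  # all services GOOD
```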
<p>Additional information about Cloudera health checks can be
found here: [0]</p>
<p><strong>Ambari plugin</strong>:</p>
<p>The current <code class="docutils literal notranslate"><span class="pre">HDP</span> <span class="pre">2.0.6</span></code> plugin will support only basic verifications. The main
focus here is to implement additional checks for the Ambari plugin.
There are several ideas for checks in the Ambari plugin:</p>
<blockquote>
<div><ul class="simple">
<li><p>Ambari alerts verification. The Ambari plugin has several alerts that fire
when something is wrong with the current state of the cluster. We can get the
alerts through the Ambari API. If there is at least one alert, it is proposed to use the <code class="docutils literal notranslate"><span class="pre">YELLOW</span></code>
status for the verification; otherwise the <code class="docutils literal notranslate"><span class="pre">GREEN</span></code> status will be
used.</p></li>
<li><p>Ambari service checks verification. The Ambari plugin has a bunch of service
checks, which can be re-triggered by the user through the Ambari API.
These checks are well described in [1]. If at least one
check fails, the <code class="docutils literal notranslate"><span class="pre">RED</span></code> status will be used for that situation;
otherwise <code class="docutils literal notranslate"><span class="pre">GREEN</span></code>.</p></li>
</ul>
</div></blockquote>
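<p>The two Ambari verifications can be sketched as follows; the function
names and input shapes are assumptions, not the Ambari API.</p>

```python
def ambari_alerts_status(alerts):
    """At least one active alert -> YELLOW, otherwise GREEN."""
    return 'YELLOW' if alerts else 'GREEN'


def ambari_service_checks_status(check_results):
    """check_results is a list of booleans (True = passed);
    at least one failed service check -> RED, otherwise GREEN."""
    return 'RED' if any(not passed for passed in check_results) else 'GREEN'
```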
<section id="alternatives">
<h3>Alternatives</h3>
<p>All health checks can be disabled by a configuration option.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Graphical description of data model impact:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">+----------------------------+ +-------------------------------+</span>
<span class="go">| verifications | | health_checks |</span>
<span class="go">+----------------------------+ +-----------------+-------------+</span>
<span class="go">| id | Primary Key | | id | Primary Key |</span>
<span class="go">+------------+---------------+ +-----------------+-------------+</span>
<span class="go">| cluster_id | Foreign Key | +-| verification_id | Foreign Key |</span>
<span class="go">+----------------------------+ | +-----------------+-------------+</span>
<span class="go">| created_at | | | | created_at | |</span>
<span class="go">+------------+---------------+ | +-----------------+-------------+</span>
<span class="go">| updated_at | | | | updated_at | |</span>
<span class="go">+------------+---------------+ | +-----------------+-------------+</span>
<span class="go">| checks | | <+ | status | |</span>
<span class="go">+------------+---------------+ +-----------------+-------------+</span>
<span class="go">| status | | | description | |</span>
<span class="go">+------------+---------------+ +-----------------+-------------+</span>
<span class="go"> | name | |</span>
<span class="go"> +-----------------+-------------+</span>
</pre></div>
</div>
<p>We will have two additional tables where we will store verifications
and health checks.</p>
<p>The first table, for verifications, will have the following columns:
id, cluster_id (foreign key), status, created_at and updated_at.</p>
<p>A new table will also be added to store health check results. This table
will have the following columns: id, verification_id (foreign key), name,
description, status, created_at and updated_at.</p>
<p>We will have a cascading relationship (checks) between cluster verifications
and cluster health checks, allowing navigation from a health check to its
cluster verification and vice versa. The same kind of relationship will exist
between a cluster and its verification, for the same purpose.</p>
<p>Also, to aggregate the results of the latest verification and to support
disabling/enabling verifications for a particular cluster, a new column will
be added to the cluster model: <code class="docutils literal notranslate"><span class="pre">verifications_status</span></code>. We will not reuse <code class="docutils literal notranslate"><span class="pre">status</span></code>
for that purpose, in order to keep these two variables separate
(<code class="docutils literal notranslate"><span class="pre">status</span></code> is already used in many places in sahara).</p>
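<p>A possible aggregation of individual check statuses into the overall
verification status, consistent with the examples below, might look like
this; the precedence of CHECKING over RED is an assumption, since the spec
only shows a GREEN-plus-CHECKING case.</p>

```python
def aggregate_verification_status(check_statuses):
    """Fold the statuses of a verification's health checks into one
    overall status (assumed precedence: CHECKING > RED > YELLOW > GREEN)."""
    if 'CHECKING' in check_statuses:
        return 'CHECKING'
    if 'RED' in check_statuses:
        return 'RED'
    if 'YELLOW' in check_statuses:
        return 'YELLOW'
    return 'GREEN'
```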
<p>Examples of verifications:</p>
<ol class="arabic simple">
<li><p>One health check is still running:</p></li>
</ol>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">"cluster_verification": {</span>
<span class="go"> "id": "1",</span>
<span class="go"> "cluster_id": "1111",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "status": "CHECKING",</span>
<span class="go"> "checks": [</span>
<span class="go"> {</span>
<span class="go"> "id": "123",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "status": "GREEN",</span>
<span class="go"> "description": "some description",</span>
<span class="go"> "name": "same_name"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "id": "221",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "status": "CHECKING",</span>
<span class="go"> "description": "some description",</span>
<span class="go"> "name": "same_name"</span>
<span class="go"> },</span>
<span class="go"> ]</span>
<span class="go">}</span>
</pre></div>
</div>
<ol class="arabic simple" start="2">
<li><p>All health checks are completed but one failed:</p></li>
</ol>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">"cluster_verification": {</span>
<span class="go"> "id": "2",</span>
<span class="go"> "cluster_id": "1112",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:30.295701",</span>
<span class="go"> "STATUS": "RED",</span>
<span class="go"> "checks": [</span>
<span class="go"> {</span>
<span class="go"> ..</span>
<span class="go"> "status": "RED",</span>
<span class="go"> "description": "Resourcemanager is down",</span>
<span class="go"> ..</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> ..</span>
<span class="go"> "status": "GREEN",</span>
<span class="go"> "description": "HDFS is healthy",</span>
<span class="go"> }</span>
<span class="go"> ]</span>
<span class="go">}</span>
</pre></div>
</div>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The mechanism for receiving cluster verification results will be quite
simple: we will use the usual <code class="docutils literal notranslate"><span class="pre">GET</span></code> method for clusters.</p>
<p>The main API method will be the following:
<code class="docutils literal notranslate"><span class="pre">GET</span> <span class="pre"><tenant_id>/clusters/<cluster_id></span></code>.
It will return detailed info about the cluster, including verifications.</p>
<p>Example of response:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go"> "status": "Active",</span>
<span class="go"> "id": "1111",</span>
<span class="go"> "cluster_template_id": "5a9a09a3-9349-43bd-9058-16c401fad2d5",</span>
<span class="go"> "name": "sample",</span>
<span class="go"> "verifications_status": "RUNNING",</span>
<span class="go"> ..</span>
<span class="go"> "verification": {</span>
<span class="go"> "id": "1",</span>
<span class="go"> "cluster_id": "1111",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "checks": [</span>
<span class="go"> {</span>
<span class="go"> "id": "123",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "status": "GREEN",</span>
<span class="go"> "description": "some description",</span>
<span class="go"> "name": "same_name"</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "id": "221",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "status": "CHECKING",</span>
<span class="go"> "description": "some description",</span>
<span class="go"> "name": "same_name"</span>
<span class="go"> },</span>
<span class="go"> ]</span>
<span class="go"> }</span>
<span class="go">}</span>
</pre></div>
</div>
<p>To re-trigger a cluster verification, some additional
behaviour should be added to the following API method:</p>
<p><code class="docutils literal notranslate"><span class="pre">PATCH</span> <span class="pre"><tenant_id>/clusters/<cluster_id></span></code></p>
<p>If the following data is provided to this API method,
the verification will be re-triggered:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go"> 'verification': {</span>
<span class="go"> 'status': 'START'</span>
<span class="go"> }</span>
<span class="go">}</span>
</pre></div>
</div>
<p>The start request will be rejected when verifications are disabled for the
cluster or when a verification is already running on the cluster.</p>
<p>We can also disable verification for a particular cluster,
to avoid unneeded noisy verifications until health issues are
fixed, with the following request data:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go"> 'verification': {</span>
<span class="go"> 'status': 'DISABLE'</span>
<span class="go"> }</span>
<span class="go">}</span>
</pre></div>
</div>
<p>And we can enable verifications again when needed. If a
user disables verification, only future verifications
will be disabled; health checks that are already running will continue.</p>
<p>If anything additional is added to this data, the request will be marked
as invalid. We will also implement new validation methods
to deny verifications on a cluster that already has a verification running.</p>
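<p>The request-data validation described above might be sketched as follows;
the accepted status values follow this spec (ENABLE is assumed as the
counterpart of DISABLE), and the function shape is hypothetical.</p>

```python
VALID_STATUSES = {'START', 'DISABLE', 'ENABLE'}


def validate_verification_patch(data):
    """Accept only a body of the exact shape
    {'verification': {'status': <one of VALID_STATUSES>}}."""
    if not isinstance(data, dict) or set(data) != {'verification'}:
        return False  # anything additional makes the request invalid
    verification = data['verification']
    if not isinstance(verification, dict) or set(verification) != {'status'}:
        return False
    return verification['status'] in VALID_STATUSES
```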
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Support for running checks and retrieving their results needs to be
implemented in python-saharaclient.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>A new tab with verification results needs to be added to the
cluster details page.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
<dt>Other contributors:</dt><dd><p>apavlov-n, esikachev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Implement a basic skeleton for verifications (with base checks)</p></li>
<li><p>Add python-saharaclient support</p></li>
<li><p>Implement CLI support</p></li>
<li><p>Implement a tab with verification results in Horizon</p></li>
<li><p>Add new WADL docs for the new API method</p></li>
<li><p>Implement all other checks</p></li>
<li><p>Add support to the scenario framework to allow
re-triggering</p></li>
<li><p>Implement sending history to Ceilometer.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The feature will be covered by unit tests and tested manually.
A new test commit (not for merging) will be added to show
that all verifications pass (since we are in the middle
of moving the scenario framework).</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation should be updated with additional information
on <code class="docutils literal notranslate"><span class="pre">How</span> <span class="pre">to</span></code> repair issues described in the health check results.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[0] <a class="reference external" href="http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_ht.html">http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_ht.html</a>
[1] <a class="reference external" href="https://cwiki.apache.org/confluence/display/AMBARI/Running+Service+Checks">https://cwiki.apache.org/confluence/display/AMBARI/Running+Service+Checks</a></p>
</section>
Wed, 20 Jan 2016 00:00:00 Add an option to sahara-image-create to generate bare metal imageshttps://specs.openstack.org/openstack/sahara-specs/specs/sahara-image-elements/sahara-bare-metal-images.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-bare-metal-images">https://blueprints.launchpad.net/sahara/+spec/sahara-bare-metal-images</a></p>
<p>Bare metal image generation is supported by disk-image-create but there
is no option exposed in sahara-image-create that allows the generation
of sahara images for bare metal. If sahara is to support bare metal
deployments, we must have sahara bare metal images.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Users should have a simple option to generate a sahara bare metal image.
This option should be applicable to all platforms and all plugins. The
default behavior of sahara-image-create should remain unchanged – it
should generate VM images if the option is not enabled.</p>
<p>To generate a sahara bare metal image, the “vm” element needs to be left
out of the element list and the “grub2”, “baremetal”, and
“dhcp-all-interfaces” elements should be added.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add a “-b” command line option that sets a boolean flag indicating bare
metal image generation. If this option is set, add
“grub2 baremetal dhcp-all-interfaces” to the list of elements
passed to disk-image-create and prevent the “vm” element from being passed.</p>
<p>Do not bother making the list of baremetal elements modifiable from
the shell. It’s unlikely that capability will be needed.</p>
<p>If the “-b” command line option is not set, no bare metal elements
will be added to the element list and the “vm” element will not be
removed.</p>
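<p>The element-list adjustment can be sketched as follows; the real change is
a patch to the diskimage-create.sh shell script, so this Python sketch is
purely illustrative.</p>

```python
def adjust_elements(elements, bare_metal):
    """Return the element list to pass to disk-image-create: with the
    bare-metal flag set, drop "vm" and add the bare-metal elements."""
    if not bare_metal:
        return list(elements)  # default behaviour: keep the "vm" element
    adjusted = [e for e in elements if e != 'vm']
    adjusted += ['grub2', 'baremetal', 'dhcp-all-interfaces']
    return adjusted
```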
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>No impact beyond the change to sahara-image-create itself.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tmckay</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>A single patch to diskimage-create.sh</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually test the generation of bare metal images for all OSs and plugins.
Image generation should succeed and generate .vmlinuz, .initrd, and .qcow2
files. However, successful generation doesn’t guarantee that the images will
actually work (for instance, in Kilo, Fedora and CentOS 6 images will not
boot correctly in ironic).</p>
<p>For this change, it is enough that the image generation completes.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The “-b” option should be mentioned where we discuss image generation.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 19 Jan 2016 00:00:00 Improve anti-affinity behavior for cluster creationhttps://specs.openstack.org/openstack/sahara-specs/specs/newton/improving-anti-affinity.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/improving-anti-affinity">https://blueprints.launchpad.net/sahara/+spec/improving-anti-affinity</a></p>
<p>Enable sahara to distribute node creation in a more equitable manner with
respect to compute hardware affinity.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Current anti-affinity in sahara allows only as many nodes in the
anti-affinity group as there are hypervisors
(<a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1426398">https://bugs.launchpad.net/sahara/+bug/1426398</a>).</p>
<p>If the number of nodes in the anti-affinity group is more than the number of
hypervisors, sahara throws an error.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The user will be able to define a ratio, i.e. the number of nodes per
hypervisor, when requesting anti-affinity for a process.</p>
<p>The ratio would be a field in cluster creation when the user selects
anti-affinity.</p>
<p>Based on the ratio given by the user and number of nodes, more server
groups will be created.</p>
<p>Number of server groups would be equal to the number of nodes per hypervisor.</p>
<p>In terms of heat templates, the server groups would be created while
serializing the resources if anti-affinity is enabled for that cluster.</p>
<p>Instances would be allocated to those server groups while serializing the
instance using “group” property of “scheduler_hints” which will be set to
different server group for each instance in round robin fashion.</p>
<p>For allocation of server groups, following changes would be required:</p>
<ul class="simple">
<li><p>Create a parameter named SERVER_GROUP_NAMES of type list in
the OS::Heat::ResourceGroup resource</p></li>
<li><p>Store the server group name for each instance in the node group in
this parameter. So the size of the parameter list would be equal to
the number of instances in the node group</p></li>
<li><p>Now the instance with index i would belong to the server group name
stored at SERVER_GROUP_NAMES[i]</p></li>
<li><p>This parameter will then be accessed from the scheduler hints</p></li>
</ul>
<p>So in the node group template, scheduler hints will look like this,</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="s2">"scheduler_hints"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"group"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"get_param"</span><span class="p">:</span> <span class="p">[</span><span class="n">SERVER_GROUP_NAMES</span><span class="p">,</span> <span class="p">{</span><span class="s2">"get_param"</span><span class="p">:</span> <span class="s2">"instance_index"</span><span class="p">}]</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>E.g.:</p>
<p>A = Number of hypervisors = 5</p>
<p>B = Total number of nodes in the a-a group = 10</p>
<p>C = Number of nodes per hypervisor = nodes:hypervisor = 2</p>
<p>Number of server groups = C = 2</p>
<p>Nodes would be distributed in each of the created server groups in round-robin
fashion.</p>
<p>The placement of any node in any of the server groups does not matter,
though, because all the nodes are anti-affine.</p>
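<p>The round-robin assignment above can be sketched as follows; the
group-naming scheme and function names are illustrative assumptions.</p>

```python
def server_group_names(cluster_name, ratio):
    """Create one server group name per node-per-hypervisor ratio unit
    (number of server groups == C in the example above)."""
    return ['%s-aa-group-%d' % (cluster_name, i) for i in range(ratio)]


def assign_server_groups(instance_count, groups):
    """Build SERVER_GROUP_NAMES: instance i goes to groups[i % len(groups)],
    i.e. the groups are filled in round-robin fashion."""
    return [groups[i % len(groups)] for i in range(instance_count)]
```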
<p>If the ratio given by the user in the above example is 1, the user will
still get an error, thrown by nova.</p>
<p>We won’t allow old clusters to scale with a new ratio.</p>
<p>When a user requests to scale a cluster after the ratio has changed or requests
a new ratio on an existing cluster, an error would be thrown saying “This
cluster was created with X ratio, but now the ratio is Y. You will need to
recreate”.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Ratio would be a field in the cluster object</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The change would provision the instances without error even when there are
more nodes in the anti-affinity group than hypervisors, provided the user
defines the ratio correctly.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Yes a field has to be added in the sahara dashboard for collecting the ratio.
The field will be displayed only when anti-affinity is selected.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>akanksha-aha</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a ratio field in sahara-dashboard</p></li>
<li><p>Add the same field in sahara wherever required (Data Access Layer)</p></li>
<li><p>Add a new API which creates more server groups when required</p></li>
<li><p>Write Unit tests and run those tests</p></li>
<li><p>Write documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Will need to write unit tests</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation about the improved anti-affinity behavior needs to be added.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 18 Jan 2016 00:00:00 Reduce Number Of Dashboard panelshttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/reduce-number-of-dashboard-panels.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/reduce-number-of-panels">https://blueprints.launchpad.net/sahara/+spec/reduce-number-of-panels</a></p>
<p>The sahara UI currently consists of 10 panels. That is more than any other
OpenStack service. Additionally, the way that the panels are meant to
interact with each other is not intuitively obvious and can lead to confusion
for users. The purpose of this spec is to propose reorganizing the sahara
UI into a more logical layout that will give both a cleaner look and a more
intuitive set of tools to the users.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The sahara dashboard has too many panels without enough context provided
to convey how each panel relates to the others. They are currently grouped
vertically so that each appears close to the related panels in the list,
but there is no other visual cue given. Given that, learning how to use the
dashboard can often be a frustrating exercise of trial and error.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>In the 10 panels that are currently in the dashboard, 4 of them
(Clusters, Cluster Templates, Node Group Templates, Image Registry) are
focused around cluster creation. Given that, it makes sense to fold them
into a single panel which will be called “Clusters”. This will make it very
obvious where to go for all things cluster-related. Each of the current
cluster-centric panels will have their own tab within the newly created panel.
There are 5 panels (Jobs, Job Templates, Job Binaries, Data Sources and
Plugins) that are focused around job creation and execution.</p>
<p>In addition to the conversion from panels to tabs, I’m proposing that the
code also be reorganized to reflect the new layout (ie: moving code to be a
subdir of the panel that it will reside in). There will be a top level
“clusters” panel that will contain subdirectories for each of the tabs in that
panel (clusters, cluster templates, node group templates, image registry).
That reorganization will result in changing the URL definitions and will also
change how we reference template names: instead of
“project/data_processing.nodegroup-templates/whatever.html”, the “project/”
part can be dropped, resulting in
“data_processing.nodegroup-templates/whatever.html”, which is a slight
improvement in brevity. Additionally, we can also elect to drop the
“data_processing.” portion by renaming the template folders; the oddity that
forced us to use that as a workaround has since been solved in the horizon
“enabled” mechanism.</p>
<p>Here is a quick ascii diagram showing the proposed UI layout:
+----------+------------------------------------------------------+
|          |                                                      |
| Clusters | Clusters|Cluster Templates|Node Group Templates      |
| Jobs     |   +--------------------------------------------+     |
|          |   |                                            |     |
|          |   |      Usual table for chosen object         |     |
|          |   |                                            |     |
|          |   |                                            |     |
|          |   |                                            |     |
|          |   |                                            |     |
|          |   |                                            |     |
|          |   |                                            |     |
|          |   +--------------------------------------------+     |
|          |                                                      |
+----------+------------------------------------------------------+</p>
<p>Additionally, the current “Guide” panel will be split to fit more readily
into the job and cluster spaces. It currently occupies a separate panel and is
linked to from buttons in the other panels. The buttons will remain the same,
but the code will be moved into subdirectories: under clusters for the cluster
guide and under jobs for the job guide.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>We could choose to leave the current panels alone and rely on stronger,
more verbose documentation and possibly provide links to that documentation
from within the dashboard itself. After multiple discussions with the sahara
team, it is clear that this approach is not as desirable as reorganizing
the layout.</p>
<p>We could possibly avoid some of the code reorganization, but in my opinion,
it is important to have the code accurately reflect the layout. It makes it
easier to design future changes and easier to debug.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>No data model impact</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>No REST API impact</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>End users will see a different layout in the sahara dashboard. While the
difference in moving to tabs is significant, once an experienced user gets
into the tabs, the functionality there will be unchanged from how it
currently works. Some of the URLs used will also change a bit to remove
any ambiguity (for instance the “details” URLs for each object type will be
renamed to become something like “ds-details” for the data source details
URLs).</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>No deployer impact</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>No developer impact</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>No sahara-image-elements impact</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>There will be changes to the “enabled” files in the sahara-dashboard project,
but the process for installation and running will remain the same. If you
are upgrading an existing installation, there will be some “enabled” files
that are no longer required (in addition to updated files). We will need to be
sure to document (and script wherever possible) which files should be removed.</p>
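For context, a horizon “enabled” file is a small Python module of configuration constants. A hypothetical file for the consolidated panel might look like the sketch below; the filename, module path, and values are illustrative assumptions following horizon’s “enabled” mechanism, not taken from this spec:

```python
# Hypothetical enabled file, e.g. _3010_data_processing_clusters_panel.py,
# dropped into openstack_dashboard/local/enabled/.  All values here are
# illustrative of horizon's "enabled" mechanism, not prescribed by this spec.

# The slug of the panel to be added.
PANEL = 'clusters'
# The slug of the dashboard the PANEL is associated with.
PANEL_DASHBOARD = 'project'
# The slug of the panel group the PANEL belongs to.
PANEL_GROUP = 'data_processing'
# Python module path from which the Panel class is loaded.
ADD_PANEL = ('sahara_dashboard.content.data_processing.'
             'clusters.panel.ClustersPanel')
```

Removing the obsolete per-panel enabled files would then be part of the upgrade steps documented above.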
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>croberts</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Reorganize sahara UI code to convert multiple panels into 2 panels that
contain multiple tabs. This will likely be spread into multiple patches
in order to try to minimize the review effort of each chunk. Each chunk
is still going to be somewhat large though since it will involve moving
entire directories of code around.</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Each of the existing unit tests will still apply. They will all need to be
passing in order to accept this change. It is likely that the (few) existing
integration tests will need to be rewritten or adapted to expect the proposed
set of changes.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The sahara-dashboard documentation is written and maintained by the sahara
team. There will be several documentation updates required as a result of
the changes described in this spec.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 12 Jan 2016 00:00:00 Add ability of suspending and resuming EDP jobs for sahara https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/add-suspend-resume-ability-for-edp-jobs.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-suspend-resume-ability-for-edp-jobs">https://blueprints.launchpad.net/sahara/+spec/add-suspend-resume-ability-for-edp-jobs</a></p>
<p>This spec is to allow suspending and resuming EDP jobs in sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently sahara does not allow suspending and resuming EDP jobs.
But in some use cases, for example an EDP job containing many steps, the
user may want to suspend the job after one step finishes, check the
output, and then resume the job. By adding suspend and resume abilities
to the sahara EDP engine, each engine (oozie, spark, storm, etc.) can
provide its own implementation.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add one API interface in the sahara v1.1 API.</p>
<p>Define suspend_job() and resume_job() interfaces in the sahara base EDP
engine, then implement them in the oozie engine. (Spark and storm engines
will be drafted in a later spec.)</p>
<p>Add “SUSPENDED” and “PREPSUSPENDED” job statuses to sahara. A job can be
suspended only when its status is “RUNNING” or “PREP”, and its status will
then be shown as “SUSPENDED” or “PREPSUSPENDED” respectively.</p>
<p>Add a validation dict named suspend_resume_supported_job_type = {} to check
which job types are allowed to suspend and resume when a request comes in.</p>
<p>If the job’s status is not RUNNING or PREP, for example because the job has
already finished, the validation check fails and no suspend or resume action
is taken.</p>
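A minimal sketch of that validation: the dict name suspend_resume_supported_job_type comes from this spec, but its contents and the helper function below are illustrative assumptions.

```python
SUSPENDABLE_STATUSES = ("RUNNING", "PREP")

# From the spec: which job types support suspend/resume, keyed by engine.
# The contents below are illustrative placeholders.
suspend_resume_supported_job_type = {
    "oozie": ["Pig", "Hive", "MapReduce", "Java"],
}

def validate_suspend(engine, job_type, status):
    """Hypothetical helper: raise ValueError if the job cannot be suspended."""
    if job_type not in suspend_resume_supported_job_type.get(engine, []):
        raise ValueError("suspend/resume not supported for %s jobs" % job_type)
    if status not in SUSPENDABLE_STATUSES:
        raise ValueError("cannot suspend a job in status %s" % status)
```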
<p>Example of suspending or resuming an EDP job</p>
<p>PATCH /v1.1/{tenant_id}/job-executions/<job_execution_id></p>
<p>response:</p>
<p>HTTP/1.1 202 Accepted
Content-Type: application/json</p>
<p>Add one API interface in the python-saharaclient.</p>
<p>For the oozie implementation, we just call the oozie client to invoke its
suspend and resume APIs.</p>
<p>There is no spark or storm implementation for now; they will be added
later.</p>
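Sketched in code, the engine split above might look like the following; only the suspend_job/resume_job method names come from this spec, while the class names and the oozie client interface are stand-ins.

```python
import abc

class BaseJobEngine(abc.ABC):
    """Sketch of the base EDP engine interface extended by this spec."""

    @abc.abstractmethod
    def suspend_job(self, job_execution):
        """Suspend a RUNNING/PREP job; return the new status."""

    @abc.abstractmethod
    def resume_job(self, job_execution):
        """Resume a SUSPENDED/PREPSUSPENDED job; return the new status."""

class OozieJobEngine(BaseJobEngine):
    """Oozie implementation: delegate to an oozie client (stubbed here)."""

    def __init__(self, oozie_client):
        self.client = oozie_client

    def suspend_job(self, job_execution):
        # The real implementation would invoke oozie's suspend action.
        self.client.suspend(job_execution.engine_job_id)
        return "SUSPENDED"

    def resume_job(self, job_execution):
        self.client.resume(job_execution.engine_job_id)
        return "RUNNING"
```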
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Add one API interface.
PATCH /v1.1/{tenant_id}/job-executions/<job_execution_id></p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Add two combo box items, “suspend job” and “resume job”, at the right side
of the job list table.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>luhuichun(lu huichun)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add one API in v1.1</p></li>
<li><p>Add suspend_job and resume_job in the base engine and oozie engine</p></li>
<li><p>Add two new job statuses, “SUSPENDED” and “PREPSUSPENDED”</p></li>
<li><p>Add two API interfaces in python-saharaclient</p></li>
<li><p>Modify sahara api reference docs</p></li>
<li><p>Add task to update the WADL at api-site</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added to the EDP engine, along with a scenario
integration test.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>oozie suspend and resume jobs implementation
<a class="reference external" href="https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html">https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html</a></p>
</section>
Thu, 24 Dec 2015 00:00:00 Add recurrence EDP jobs for sahara https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/recurrence-edp-job.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/recurrence-edp-job">https://blueprints.launchpad.net/sahara/+spec/recurrence-edp-job</a></p>
<p>This spec is to allow running recurring EDP jobs in sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently sahara only supports one-time, single-click EDP jobs.
But sometimes we need recurring EDP jobs: for example, running a job
every 5 minutes from 8:00 AM to 8:00 PM. By adding recurrence EDP jobs to
the sahara EDP engine, each engine (oozie, spark, storm, etc.) can provide
its own implementation.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Define a run_recurrence_job() interface in the sahara base EDP engine, then
implement this interface in the oozie engine. (The spark engine will be
drafted in a later spec.)</p>
<p>Add one job execution type named “recurrence”. The sahara base job engine
will run different types of jobs according to the job execution type.</p>
<p>Add one sahara periodic task named update_recurrence_job_statuses() to
update the running status of the recurrence job’s sub-jobs.</p>
<p>Add validation for invalid data input and check whether the output URL
already exists.</p>
<p>Example of recurrence edp job request</p>
<p>POST /v1.1/{tenant_id}/jobs/<job_id>/execute</p>
<p>To support this feature, add a new table named child_job_execution
which stores the sub-jobs of the parent recurrence job. When a recurrence
EDP job is created, M (configurable) sub-jobs are created at the same time
so that the status of those M sub-jobs can be shown. The periodic task
checks and updates the status of the M child jobs, and whenever a child job
finishes, a new future-run child job is created to fill the vacancy. In this
way we maintain an M-row window in the DB, which avoids endless child-job
creation. If the recurrence job is deleted, its child jobs in the
child_job_execution table are deleted as well, including both finished and
future-run jobs.</p>
<p>To create the child job execution table, we just add one more column,
named “father_job_id”, to the job execution schema, pointing to the child’s
parent recurrence job.</p>
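The M-row window can be sketched in isolation as follows; this is a pure-Python stand-in operating on dicts, whereas the real periodic task would query and insert rows in the child_job_execution table.

```python
TERMINAL_STATUSES = ("SUCCEEDED", "KILLED", "FAILED")

def refresh_child_window(child_jobs, window_size, make_future_job):
    """Sketch of the periodic-task logic: whenever a child job reaches a
    terminal status, append a new future-run child so that `window_size`
    pending children always exist (hypothetical helper)."""
    pending = [j for j in child_jobs if j["status"] not in TERMINAL_STATUSES]
    new_jobs = [make_future_job() for _ in range(window_size - len(pending))]
    return child_jobs + new_jobs
```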
<p>For Horizon changes, since there may be many child jobs, only the latest
M sub-jobs are shown when the user clicks the recurrence EDP job. Two
datetime pickers and a search button are added so the user can search the
history of finished child jobs.</p>
<p>For the oozie engine implementation of recurrence EDP jobs, the
changes are as follows:</p>
<p>Implement run_recurrence_job() in the base EDP engine and call the oozie
client to submit the recurrence EDP job. Add the periodic task
update_recurrence_job_statuses() to update the child jobs’ status in the
child_job_execution table.</p>
<p>Example of coordinator.xml</p>
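The spec leaves this example empty. A minimal Oozie coordinator for the 8:00 AM to 8:00 PM, every-5-minutes case might look like the sketch below; all names, times, and the workflow path are illustrative, and the authoritative format is the Oozie coordinator specification linked in the References section.

```xml
<coordinator-app name="recurrence-edp-job" frequency="${coord:minutes(5)}"
                 start="2016-01-01T08:00Z" end="2016-01-01T20:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path of the EDP job's workflow application (illustrative) -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```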
<p>For spark, there is no implementation here; it will be added later.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Run edp job manually by login into the VM and running oozie command.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Add a new table named child_job_execution. It adds just one more column,
father_job_id, to the current job execution table; all other columns are
identical to the job execution table.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>There is no change here; we can use the current API,
POST /v1.1/{tenant_id}/jobs/<job_id>/execute.
We can pass job_execution_type, start time, end time, and period_minutes in
job_configs.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">CREATE</span> <span class="n">TABLE</span> <span class="n">child_job_execution</span> <span class="p">(</span>
<span class="nb">id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span>
<span class="n">job_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">),</span>
<span class="n">father_job_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">),</span>
<span class="n">tenant_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">80</span><span class="p">),</span>
<span class="n">input_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">),</span>
<span class="n">output_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">),</span>
<span class="n">start_time</span> <span class="n">DATETIME</span><span class="p">,</span>
<span class="n">end_time</span> <span class="n">DATETIME</span><span class="p">,</span>
<span class="n">info</span> <span class="n">TEXT</span><span class="p">,</span>
<span class="n">cluster_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">),</span>
<span class="n">oozie_job_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">),</span>
<span class="n">return_code</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">80</span><span class="p">),</span>
<span class="n">job_configs</span> <span class="n">TEXT</span><span class="p">,</span>
<span class="n">extra</span> <span class="n">TEXT</span><span class="p">,</span>
<span class="n">data_source_urls</span> <span class="n">TEXT</span><span class="p">,</span>
<span class="n">PRIMARY</span> <span class="n">KEY</span> <span class="p">(</span><span class="nb">id</span><span class="p">),</span>
<span class="n">FOREIGN</span> <span class="n">KEY</span> <span class="p">(</span><span class="n">job_id</span><span class="p">)</span>
<span class="n">REFERENCES</span> <span class="n">jobs</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
<span class="n">ON</span> <span class="n">DELETE</span> <span class="n">CASCADE</span>
<span class="p">);</span>
</pre></div>
</div>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>In the job launch page, add a select box for the user to choose the job
execution type. It has three values (basic, scheduled, recurrence); the
default value is basic, which is the one-click running job. If the user
chooses recurrence, two datetime pickers named start and end time and a
textbox for the period_minutes will be shown.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>luhuichun(lu huichun)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Define the recurrence job type</p></li>
<li><p>Add one periodic task named update_recurrence_job_statuses</p></li>
<li><p>Create coordinator.xml before running the job in the EDP engine</p></li>
<li><p>Upload the coordinator.xml to the job’s HDFS folder</p></li>
<li><p>Add run_recurrence_job in the sahara base engine</p></li>
<li><p>Modify sahara api reference docs</p></li>
<li><p>Add task to update the WADL at api-site</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added to the EDP engine, along with a scenario
integration test.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>oozie scheduled and recurring job implementation
<a class="reference external" href="https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html">https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html</a></p>
</section>
Thu, 24 Dec 2015 00:00:00 Add ability of scheduling EDP jobs for sahara https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/scheduling-edp-jobs.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs">https://blueprints.launchpad.net/sahara/+spec/enable-scheduled-edp-jobs</a></p>
<p>This spec is to allow running scheduled EDP jobs in sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently sahara only supports one-time, single-click EDP jobs.
But in many use cases we need scheduled EDP jobs. By adding scheduling
ability to the sahara EDP engine, each engine (oozie, spark, storm, etc.)
can provide its own implementation.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Define a run_scheduled_job() interface in the sahara base EDP engine, then
implement this interface in the oozie engine. (Spark and storm engines
will be drafted in a later spec.)</p>
<p>Define two job execution types which indicate how the job is executed.
The API need not change; instead we can add parameters to the job
configs. Sahara will run different types of jobs according to the job
execution type. In the API request, the user should pass the
job_execution_type in job_configs.</p>
<p>The two job execution types are:
(1) basic: runs simple one-time EDP jobs (the current sahara implementation);
(2) scheduled: runs scheduled EDP jobs.</p>
<p>Example of a scheduled edp job request</p>
<p>POST /v1.1/{tenant_id}/jobs/<job_id>/execute</p>
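The request body for this call is not shown in the spec. A plausible shape, with the new scheduling parameters carried inside job_configs as described above, might be the following; every field name other than job_execution_type is an assumption.

```python
# Hypothetical body for POST /v1.1/{tenant_id}/jobs/<job_id>/execute.
# Only "job_execution_type" is named by this spec; the rest is illustrative.
scheduled_job_request = {
    "cluster_id": "<cluster-uuid>",
    "input_id": "<data-source-uuid>",
    "output_id": "<data-source-uuid>",
    "job_configs": {
        "configs": {},
        "args": [],
        "params": {},
        "job_execution_type": "scheduled",
        "start": "2016-01-20T09:00:00",   # scheduled start time (assumed key)
    },
}
```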
<p>For the oozie engine implementation of scheduled EDP jobs, we have
changes as below:</p>
<p>Before running the job, sahara will create a coordinator.xml describing
the job and upload it to the HDFS EDP job lib folder. With this file,
sahara calls the oozie client to submit the job; the job will run at the
scheduled time, and its status will be shown as “PREP” in the Horizon
page. Of course, the user can delete the job while it is in the preparing
status as well as while it is running.</p>
<p>Example of coordinator.xml</p>
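Again the spec leaves the example empty. For a one-time scheduled run, a coordinator whose start/end window admits a single materialization might look like this sketch; the names, times, and path are illustrative, and the authoritative format is the Oozie coordinator specification linked in the References section.

```xml
<coordinator-app name="scheduled-edp-job" frequency="${coord:days(1)}"
                 start="2016-01-20T09:00Z" end="2016-01-20T09:01Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path of the EDP job's workflow application (illustrative) -->
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```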
<p>For spark and storm, there is no implementation for now; they will be
added later.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>(1) Run the EDP job manually by logging into the VM and running oozie
commands.
(2) Users can create cron jobs.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>There is no change here; we can use the current API,
POST /v1.1/{tenant_id}/jobs/<job_id>/execute.
We can pass the job_execution_type and start time into job_configs.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>In the job launch page, add a textbox for the user to input the job start
time; the default value is now, for compatibility with the current
implementation.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>luhuichun(lu huichun)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Define the scheduled job type</p></li>
<li><p>Create coordinator.xml before running the job in the EDP engine</p></li>
<li><p>Upload the coordinator.xml to the job’s HDFS folder</p></li>
<li><p>Add run_scheduled_job in the oozie engine</p></li>
<li><p>Modify sahara api reference docs</p></li>
<li><p>Add task to update the WADL at api-site</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added to the EDP engine, along with a scenario
integration test.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>oozie scheduled and recurring job implementation
<a class="reference external" href="https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html">https://oozie.apache.org/docs/4.0.0/CoordinatorFunctionalSpec.html</a></p>
</section>
Thu, 24 Dec 2015 00:00:00 Remove plugin Vanilla V2.6.0 https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/remove-plugin-vanilla-v2.6.0.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/deprecate-plugin-vanilla2.6.0">https://blueprints.launchpad.net/sahara/+spec/deprecate-plugin-vanilla2.6.0</a></p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Since the Vanilla v2.6.0 plugin is deprecated, we should remove all of its
code completely.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ol class="arabic simple">
<li><p>Remove all plugin v2_6_0 code under plugins</p></li>
<li><p>Remove all tests including unit and integration tests.</p></li>
<li><p>Update Vanilla plugin documentation.</p></li>
</ol>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>luhuichun</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Remove Vanilla v2.6.0</p></li>
<li><p>Remove all tests for Vanilla v2.6.0</p></li>
<li><p>Update documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on change Iff2faf759ac21d5bd15372bae97a858a3d036ccb</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation needs to be updated:
* doc/source/userdoc/vanilla_plugin.rst</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 21 Dec 2015 00:00:00 Remove unsupported versions of MapR plugin https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/deprecate-old-mapr-versions.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/deprecate-old-mapr-versions">https://blueprints.launchpad.net/sahara/+spec/deprecate-old-mapr-versions</a></p>
<p>Remove MapR plugin versions 3.1.1, 4.0.1, 4.0.2, 5.0.0.mrv1 and mapr-spark.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Some of the supported MapR versions are old and unused, and the benefits
of the others are contained in newer versions.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We will stop supporting the following MapR versions and remove
them in the Mitaka release.</p>
<p>The following MapR versions are removed due to end of support:
* MapR 5.0.0 mrv1
* MapR 4.0.2 mrv1 mrv2
* MapR 4.0.1 mrv1 mrv2
* MapR 3.1.1</p>
<p>The mapr-spark version will also be removed because Spark will ship
with the latest MapR as a service on YARN.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<ul class="simple">
<li><p>Remove MapR v3.1.1 support</p></li>
<li><p>Remove MapR v4.0.1 support</p></li>
<li><p>Remove MapR v4.0.2 support</p></li>
<li><p>Remove MapR v5.0.0 support</p></li>
<li><p>Remove MapR mapr-spark</p></li>
</ul>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>groghkov</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Remove MapR v3.1.1</p></li>
<li><p>Remove all tests for v3.1.1</p></li>
<li><p>Remove MapR v4.0.1</p></li>
<li><p>Remove all tests for v4.0.1</p></li>
<li><p>Remove MapR v4.0.2</p></li>
<li><p>Remove all tests for v4.0.2</p></li>
<li><p>Remove MapR v5.0.0.mrv1</p></li>
<li><p>Remove all tests for v5.0.0.mrv1</p></li>
<li><p>Remove MapR Spark</p></li>
<li><p>Remove all tests for MapR Spark</p></li>
<li><p>Remove default templates for MapR 3.1.1 4.0.1 4.0.2 5.0.0.mrv2
in sahara/plugins/default_templates/mapr</p></li>
<li><p>Update documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation needs to be updated:
* doc/source/userdoc/mapr_plugin.rst</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 16 Dec 2015 00:00:00 SPI Method to Validate Image https://specs.openstack.org/openstack/sahara-specs/specs/mitaka/validate-image-spi.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/validate-image-spi">https://blueprints.launchpad.net/sahara/+spec/validate-image-spi</a></p>
<p>This specification details the addition of a method to the Sahara plugin SPI
to validate that a chosen image is up to the specification that the plugin
requires. While it is not expected that plugin writers will be able to test
the image deeply enough to ensure that a given arbitrary image will succeed
in cluster generation and be functional in all contexts, it is hoped that by
implementing this method well, plugin authors can provide a well-defined,
machine-actionable contract which will be versioned with the plugin itself.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>At present, Sahara’s image generation and cluster provisioning features are
almost entirely decoupled: sahara-image-elements generates an image, and this
image is taken by the server and assumed to be valid. This introduces the
possibility of version incompatibility between sahara-image-elements and
sahara itself, and failure (complete or partial, immediate or silent) in the
case of the addition or modification of features on either side.</p>
<section id="larger-context">
<h3>Larger context</h3>
<p>This issue is only one part of a larger problem, which will not be wholly
addressed in this spec, but for which this spec is an incremental step toward
a solution.</p>
<p>At present, the processes involved in image generation and use are:</p>
<ol class="arabic simple">
<li><p>Image packing (pre-cluster spawn)</p></li>
<li><p>Clean image provisioning (building a cluster from an OS-only base image)</p></li>
<li><p>Image validation (ensuring that a previously packed image really is ok to
use for the plugin, at least for the rules we can easily check)</p></li>
</ol>
<p>The first, image-packing, is currently only possible via a command line
script. The ideal user experience would allow generation of images either
outside of OpenStack, via a command-line script, or with OpenStack, via a
sahara API method. At present, this is not possible.</p>
<p>The second, in our present architecture, requires essentially rewriting the
logic required to generate an image via the command line process in the plugin
code, leading to duplicate logic and multiple maintenance points wherever
cluster provisioning from clean images is allowed. However, it should be noted
that in the clean image generation case, this logic is in its right place
from an encapsulation perspective (it is maintained and versioned with the
plugin code, allowing for easy separation, rather than maintained in a
monolithic cross-cutting library which serves all plugins.)</p>
<p>The third is not formally undertaken as a separate step at all; it will be
implemented by the feature this specification describes.</p>
<p>Within the context of this larger problem, this feature can be seen as the
first incremental step toward a unified solution for image validation,
unification of clean and packed image generation logic, and facilitation of
image packing via an API. Once this SPI method is stable, functional, and
expresses a complete set of tests for all maintained plugins, the validation
specification can then be reused as a series of idempotent state descriptions
for image packing, which can then be exposed via an API for any plugins which
support it.</p>
</section>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<section id="spi-method-contract">
<h3>SPI method contract</h3>
<p>A new method will be added to the plugin SPI in Sahara:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">validate_images</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cluster</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
<p>This method will be called after cluster provisioning (as this will be
necessary for machine access) and before cluster configuration. This method
will receive the cluster definition as an argument, as well as a boolean flag
describing whether or not the plugin should attempt to fix problems if it
finds them.</p>
<p>If this method is not implemented by a plugin, provisioning will proceed as
normal; as this is purely a safety feature, full backward compatibility with
previous plugin versions is acceptable.</p>
<p>The contract of this method is that on being called, the plugin will take
any steps it sees fit to validate that any utilized images are fit for their
purposes. It is expected that all tests that are run will be necessary for
the cluster to succeed, but not that the whole set of tests will be
absolutely sufficient for the cluster to succeed (as this would essentially
require proving a universal negative, and would demand such in-depth testing
as to become ludicrous.)</p>
<p>If the reconcile flag is set to False, this instructs the plugin that it
should only test the image, but change nothing, and report error if its tests
fail. If reconcile is True (this will be set by default,) then the plugin will
also take any steps it is prepared to take to bring the instances of the
cluster into line with its expectations. Plugins are not required to provide
this functionality, just as they are not required to implement validate_image;
if they wish to fail immediately in the case of an imperfect image, that is
their choice. However, if a plugin does not support reconciliation, and
reconcile is set to True, it must raise an error; likewise, if a plugin
receives reconcile=False but it is not able to avoid reconciliation (if, for
instance, its implementation uses Puppet and will by definition make changes
if needed,) it must raise as well.</p>
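<p>The contract above can be sketched in Python. Only the method name and the
reconcile flag come from this spec; the class, the exception, and the cluster
shape used here are purely illustrative:</p>

```python
# Hypothetical sketch of a plugin honoring the validate_images contract.
# Only validate_images and the reconcile flag come from the spec; the
# class, exception, and cluster shape are illustrative.

class ImageValidationError(Exception):
    """Raised when an image fails validation or reconcile support mismatches."""


class ExamplePlugin(object):
    # This plugin can only test images, not fix them.
    supports_reconciliation = False

    def validate_images(self, cluster, reconcile=True):
        if reconcile and not self.supports_reconciliation:
            # Contract: a plugin that cannot reconcile must raise if asked to.
            raise ImageValidationError("plugin cannot reconcile images")
        for instance in cluster["instances"]:
            if not instance.get("has_required_packages"):
                raise ImageValidationError(
                    "instance %s failed image validation" % instance["id"])


cluster = {"instances": [{"id": "node-1", "has_required_packages": True}]}
ExamplePlugin().validate_images(cluster, reconcile=False)  # passes silently
```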
</section>
<section id="sahara-plugins-images">
<h3>sahara.plugins.images</h3>
<p>The sahara base service will provide a set of utilities to help plugin authors
to validate their images. These will be found in sahara.plugins.images. Usage
of these utilities is wholly optional; plugin authors may implement validation
using whatever framework they see fit. It is noted that this module could be
immediately written to allow a great deal of deep functionality in terms of
matching image validations to services, allowing custom images to be used for
specific nodegroups and service sets. However, as no plugins are currently
implementing such a feature set, a more basic first iteration is reasonable,
and the methods described below will allow a plugin author to perform such
specific validations if it is desired.</p>
<p>The images module will provide several public members: the definitions of
the most notable (if not all) are given below:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="k">def</span> <span class="nf">validate_instance</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">validators</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Runs all validators against the specified instance."""</span>
<span class="k">class</span> <span class="nc">ImageValidator</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="sd">"""Validates the image spawned to an instance via a set of rules."""</span>
<span class="n">__metaclass__</span> <span class="o">=</span> <span class="n">abc</span><span class="o">.</span><span class="n">ABCMeta</span>
<span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
<span class="k">def</span> <span class="nf">validate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">remote</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">pass</span>
<span class="k">class</span> <span class="nc">SaharaImageValidatorBase</span><span class="p">(</span><span class="n">ImageValidator</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Still-abstract base class for Sahara's native image validation,</span>
<span class="sd"> which provides instantiation of subclasses from a yaml file."""</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">from_yaml</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">yaml_path</span><span class="p">,</span> <span class="n">validator_map</span><span class="p">,</span> <span class="n">resource_roots</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Constructs and returns a validator from the provided yaml file.</span>
<span class="sd"> :param yaml_path: The path to a yaml file.</span>
<span class="sd"> :param validator_map: A map of validator name to class. Each class</span>
<span class="sd"> is expected to descend from SaharaImageValidator. This method</span>
<span class="sd"> will use the static map of validator name to class provided in</span>
<span class="sd"> the sahara.plugins.images module, updated with this map, to</span>
<span class="sd"> parse he appropriate classes to be used.</span>
<span class="sd"> :param resource_root: The roots from which relative paths to</span>
<span class="sd"> resources (scripts and such) will be referenced. Any resource</span>
<span class="sd"> will be pulled from the first path in the list at which a file</span>
<span class="sd"> exists."""</span>
<span class="k">class</span> <span class="nc">SaharaImageValidator</span><span class="p">(</span><span class="n">SaharaImageValidatorBase</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""The root of any tree of SaharaImageValidators."""</span>
<span class="k">def</span> <span class="nf">validate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">remote</span><span class="p">,</span> <span class="n">reconcile</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">env_map</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Validates the image spawned to an instance."""</span>
<span class="p">:</span><span class="n">param</span> <span class="n">env_map</span><span class="p">:</span> <span class="n">A</span> <span class="nb">map</span> <span class="n">of</span> <span class="n">environment</span> <span class="n">variables</span> <span class="n">to</span> <span class="n">be</span> <span class="n">passed</span> <span class="n">to</span>
<span class="n">scripts</span> <span class="ow">in</span> <span class="n">this</span> <span class="n">validation</span><span class="o">.</span><span class="s2">"""</span>
</pre></div>
</div>
<p>Additionally, two classes of error will be added to sahara.plugins.exceptions:</p>
<ul class="simple">
<li><p>ImageValidationError: Exception indicating that an image has failed
validation.</p></li>
<li><p>ImageValidationSpecificationError: Exception indicating that an image
validation spec (yaml) is in error.</p></li>
</ul>
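<p>The intended call pattern for these utilities can be sketched as follows.
The validate_instance signature and the validate(remote, reconcile, **kwargs)
interface mirror the definitions above; RecordingValidator is a toy stand-in
used only to make the flow visible, not a proposed class:</p>

```python
# Sketch of the validate_instance flow defined above. RecordingValidator is
# a toy stand-in, not part of the proposed sahara.plugins.images module.

def validate_instance(instance, validators, reconcile=True, **kwargs):
    """Runs all validators against the specified instance."""
    for validator in validators:
        validator.validate(instance, reconcile=reconcile, **kwargs)


class RecordingValidator(object):
    """Toy validator that records the reconcile flag it was called with."""

    def __init__(self):
        self.calls = []

    def validate(self, remote, reconcile=True, **kwargs):
        self.calls.append(reconcile)


first, second = RecordingValidator(), RecordingValidator()
validate_instance("instance-handle", [first, second], reconcile=False)
```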
</section>
<section id="saharaimagevalidator">
<h3>SaharaImageValidator</h3>
<p>It is entirely possible for a plugin author, in this framework, to use
idempotent state enforcement toolsets, such as Ansible, Puppet, Chef, and the
like, to validate and reconcile images. However, in order that Sahara need
not absolutely depend on these tools, we will provide the
SaharaImageValidator class.</p>
<p>This validator will provide a classmethod which allows it to build its
validations from a .yaml file. The first iteration of this validator will be
very limited, and as such will provide only a few abstract validation types.
This yaml will be interpreted in declaration order; as dicts are unordered
in yaml, this scheme makes extensive use of lists of single-item dicts.</p>
<p>An example .yaml file showing the revision-one validator set follows. Note
that these are not intended to be realistic, sahara-ready definitions, merely
examples taken from our experience:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">validators</span><span class="p">:</span>
<span class="o">-</span> <span class="n">os_case</span><span class="p">:</span>
<span class="o">-</span> <span class="n">redhat</span><span class="p">:</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span> <span class="n">nfs</span><span class="o">-</span><span class="n">utils</span>
<span class="o">-</span> <span class="n">debian</span><span class="p">:</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span> <span class="n">nfs</span><span class="o">-</span><span class="n">common</span>
<span class="o">-</span> <span class="nb">any</span><span class="p">:</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span> <span class="n">java</span><span class="o">-</span><span class="mf">1.8.0</span><span class="o">-</span><span class="n">openjdk</span><span class="o">-</span><span class="n">devel</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span> <span class="n">java</span><span class="o">-</span><span class="mf">1.7.0</span><span class="o">-</span><span class="n">openjdk</span><span class="o">-</span><span class="n">devel</span>
<span class="o">-</span> <span class="n">script</span><span class="p">:</span> <span class="n">java</span><span class="o">/</span><span class="n">setup</span><span class="o">-</span><span class="n">java</span><span class="o">-</span><span class="n">home</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span>
<span class="o">-</span> <span class="n">hadoop</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">libhdfs</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">native</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">pipes</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">sbin</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">lzo</span>
<span class="o">-</span> <span class="n">lzo</span>
<span class="o">-</span> <span class="n">lzo</span><span class="o">-</span><span class="n">devel</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">lzo</span><span class="o">-</span><span class="n">native</span>
</pre></div>
</div>
<p>These resource declarations will be used to instantiate the following basic
validator types:</p>
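<p>The reason for the lists of single-item dicts can be seen in how the loaded
spec is walked: a yaml list preserves declaration order, while a mapping may
not. This sketch operates on the Python structure a yaml loader would produce
for the example spec above (PyYAML itself is not required here), and the
iter_validators helper is illustrative:</p>

```python
# Sketch: extracting (validator_name, config) pairs in declaration order
# from a parsed spec built of lists of single-item dicts.

parsed_spec = [
    {"os_case": [{"redhat": [{"package": "nfs-utils"}]},
                 {"debian": [{"package": "nfs-common"}]}]},
    {"any": [{"package": "java-1.8.0-openjdk-devel"},
             {"package": "java-1.7.0-openjdk-devel"}]},
    {"script": "java/setup-java-home"},
]


def iter_validators(spec):
    """Yield (validator_name, config) pairs in declaration order."""
    for item in spec:
        (name, config), = item.items()  # each dict holds exactly one key
        yield name, config


names = [name for name, _ in iter_validators(parsed_spec)]
```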
</section>
<section id="validator-types">
<h3>Validator types</h3>
<section id="saharapackagevalidator-key-package">
<h4>SaharaPackageValidator (key: package)</h4>
<p>Verifies that the package or packages are installed. In the reconcile=True
case, ensures that local package managers are queried before resorting to
networked tools, along the lines of:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>`dpkg -s $package || apt-get -y install $package` # debian
`rpm -q $package || yum install -y $package` # redhat
</pre></div>
</div>
<p>The input to this validator may be a single package definition or a list of
package definitions. If the packages are grouped in a list, any attempt to
install the packages will be made simultaneously. A package definition may be
a single string or a nested structure, which may support a version attribute
as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="o">-</span> <span class="n">package</span><span class="p">:</span> <span class="n">hadoop</span>
<span class="o">-</span> <span class="n">package</span><span class="p">:</span>
<span class="o">-</span> <span class="n">hadoop</span><span class="o">-</span><span class="n">libhdfs</span>
<span class="o">-</span> <span class="n">lzo</span><span class="p">:</span>
<span class="n">version</span><span class="p">:</span> <span class="n">xxx</span><span class="o">.</span><span class="n">xxx</span>
</pre></div>
</div>
<p>Because reliable version comparison will often require reference to epochs,
and because the tool must succeed in an offline context, the initial, Sahara
core-provided package validator will allow only exact version pinning. Where
exact pinning is not adequate, the yaml is editable, and the validator can be
extended by plugin developers if needed and appropriate.</p>
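<p>An illustrative sketch (not the proposed implementation) of how a package
validator might assemble the &#8220;check locally, then install&#8221; commands shown
above, honoring the reconcile flag; package_command is a hypothetical name:</p>

```python
# Hypothetical helper building the check/install command for one package.

def package_command(distro_family, package, reconcile=True):
    if distro_family == "debian":
        check = "dpkg -s %s" % package
        install = "apt-get -y install %s" % package
    else:  # redhat family
        check = "rpm -q %s" % package
        install = "yum install -y %s" % package
    # reconcile=False means test only; otherwise fall back to installation.
    return "%s || %s" % (check, install) if reconcile else check


cmd = package_command("debian", "nfs-common")
```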
</section>
<section id="saharascriptvalidator-key-script">
<h4>SaharaScriptValidator (key: script)</h4>
<p>Runs an arbitrary script from source, as specified by a relative path from the
resource root.</p>
<p>The input to this validator must be a single script definition. A script
definition may be a single string or a nested structure, which may support
attributes as follows (the example is purely explanatory of the structure):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="o">-</span> <span class="n">script</span><span class="p">:</span> <span class="n">simple_script</span><span class="o">.</span><span class="n">sh</span>
<span class="o">-</span> <span class="n">script</span><span class="p">:</span>
<span class="n">java</span><span class="o">/</span><span class="n">find_jre_home</span><span class="p">:</span>
<span class="n">output</span><span class="p">:</span> <span class="n">JRE_HOME</span> <span class="c1"># Places the stdout of this script into the env map</span>
<span class="c1"># for future scripts</span>
<span class="o">-</span> <span class="n">script</span><span class="p">:</span>
<span class="n">java</span><span class="o">/</span><span class="n">setup_java_home</span><span class="p">:</span>
<span class="n">env_vars</span><span class="p">:</span> <span class="c1"># Sets only the named env vars from the env map</span>
<span class="o">-</span> <span class="n">JDK_HOME</span>
<span class="o">-</span> <span class="n">JRE_HOME</span>
</pre></div>
</div>
<p>Scripts are always provided the env var $SIV_DISTRO, which specifies the linux
distribution per our current SIE distro conventions, and the env var
$SIV_RECONCILE, which is set to 0 if only validation should occur and 1 if
corrective action should be taken.</p>
<p>Additional variables are referenced from the env_map argument passed
originally to SaharaImageValidator.from_yaml (and are presumably parsed from
cluster configuration information). The output attribute of the script
resource can be used to modify this map in flight, placing the output of a
script into the (single) named variable. More complex interactions require
extension.</p>
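<p>The env-map flow described above can be sketched as follows: a script&#8217;s
output attribute routes its stdout back into the map for later scripts, and
env_vars selects which variables a script receives. run_script and its faked
stdout are illustrative stand-ins for remote execution:</p>

```python
# Sketch of env-map plumbing between script validators (illustrative only).

def run_script(name, env_map, output=None, env_vars=None):
    exported = {k: env_map[k] for k in (env_vars or [])}
    stdout = "/usr/lib/jvm/default"  # pretend the script printed this path
    if output:
        env_map[output] = stdout     # feed the result to later scripts
    return exported


env_map = {"JDK_HOME": "/usr/lib/jvm/jdk"}
run_script("java/find_jre_home", env_map, output="JRE_HOME")
exported = run_script("java/setup_java_home", env_map,
                      env_vars=["JDK_HOME", "JRE_HOME"])
```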
<p>This validator is intentionally lightweight. These image validations and
manipulations should not be overwhelmingly complex; if deep configuration is
needed, then the more freeform configuration engine should run those steps, or
the plugin author should utilize a more fully-featured state enforcement
engine, with all the dependencies that entails (or write a custom validator).</p>
<p>NOTE THAT ALL SCRIPTS REVIEWED BY THE SAHARA TEAM MUST BE WRITTEN TO BE
IDEMPOTENT. If they are to take non-reproducible action, they must test to see
if that action has already been taken. This is critical to the success of this
feature in the long term.</p>
</section>
<section id="saharaanyvalidator-key-any">
<h4>SaharaAnyValidator (key: any)</h4>
<p>Verifies that at least one of the validators it contains succeeds. If
reconcile is true, runs all validators in reconcile=False mode before
attempting to enforce any. If all fail in reconcile=False mode, it then
attempts to enforce each in turn until one succeeds.</p>
<p>Note that significant damage can be done to an image in failed branches if
any is used with reconcile=True. However, guarding against this sort of
failure would impose a great deal of limitation on the use of this validator.
As such, warnings will be documented, but responsible use is left to the
author of the validation spec.</p>
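<p>The two-pass control flow described above can be sketched as follows.
ValidationError and the callable branches are illustrative; the real
validator would operate over a remote handle:</p>

```python
# Sketch of the "any" flow: test every branch first, enforce only if needed.

class ValidationError(Exception):
    pass


def any_validator(branches, remote, reconcile=True):
    for branch in branches:            # pass 1: test only, change nothing
        try:
            branch(remote, reconcile=False)
            return
        except ValidationError:
            continue
    if not reconcile:
        raise ValidationError("no branch passed validation")
    for branch in branches:            # pass 2: try to enforce each in turn
        try:
            branch(remote, reconcile=True)
            return
        except ValidationError:
            continue
    raise ValidationError("no branch could be reconciled")


state = {"jdk_installed": False}


def jdk_branch(remote, reconcile=True):
    if state["jdk_installed"]:
        return
    if not reconcile:
        raise ValidationError("jdk missing")
    state["jdk_installed"] = True      # "install" the package


any_validator([jdk_branch], "remote-handle", reconcile=True)
```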
</section>
<section id="saharaallvalidator-key-all">
<h4>SaharaAllValidator (key: all)</h4>
<p>Verifies that all of the validators it contains succeed. This class will be
instantiated by the yaml factory method noted above, and will contain all
sub-validations.</p>
</section>
<section id="saharaoscasevalidator-key-os-case">
<h4>SaharaOSCaseValidator (key: os_case)</h4>
<p>Switches on the distro of the instance being validated. Recognizes the OS
family names redhat and debian, as per DIB. Runs all validators under the
first case that matches.</p>
</section>
</section>
<section id="notes-on-validators">
<h3>Notes on validators</h3>
<p>Plugin authors may write their own validator types by extending the
SaharaImageValidator type, implementing the interface, and passing the key and
class into the validator_map argument of SaharaImageValidator.from_yaml.</p>
<p>It should be noted that current “clean” image generation scripts should be
moved into this layer as part of the initial effort to implement this method
for any given plugin, even if they are represented as a monolithic script
resource. Otherwise clean images will very likely fail validation.</p>
<p>Note also that the validator types listed above are certain to be needed,
but as implementation proceeds it may become useful to create additional
validators (file, directory, and user spring to mind as possible candidates.)
As such, the list above is not necessarily complete; I hesitate, however, to
enumerate every validator type I can conceive of for fear of driving
over-engineering from the spec, and believe that review of the design of
further minor validator types can wait for code review, so long as this
overall structure is agreeable.</p>
</section>
<section id="alternatives">
<h3>Alternatives</h3>
<p>We have many alternatives here.</p>
<p>First, to the problem of merging our validation, packing, and clean image
provisioning logic, we could opt to merge our current image generation code
with our service layer. However, this poses real difficulties in testing, as
our image generation layer, while functional, lacks the stability of our
service layer, and merging it as-is could slow forward progress on the project
as we wrestle with CI.</p>
<p>Assuming that we do not wish to merge our current image generation layer,
we could begin immediately to implement a new image generation layer in the
service side. However, this sort of truly revolutionary step frequently ends
in apathy, conflict, or both. Providing an image validation layer, with the
possibility of growing into a clean image generation API and, later, an image
packing API, is an incremental step which can provide real value in the short
term, and which is needed regardless.</p>
<p>Assuming that we are, in fact, building an image validation API, we could
wholly separate it from any image preparation logic (including clean image
provisioning.) There is a certain purist argument for separation of duties
here, but the practical argument that resource testing and enforcement are
frequently the same steps suggests that we should merge the two for
efficiency.</p>
<p>Assuming that we are allowing reconciliation of the image with the validation
layer, we could, instead of building our own lightweight validation layer,
demand that plugin authors immediately adopt one of Ansible, Puppet, Chef,
Salt, etc. However, three factors lead me not to embrace this option. First,
normal usage of these tools expects network access by default; in our
context, we do not want to use the external network unless absolutely
necessary, as our instances may not be network-enabled. While it is possible
to use them offline, it requires some care to do so, which might be offputting
for newcomers to Sahara who are versed in the chosen tool. Second, Sahara
should not be that opinionated about toolchains, either within our team or to
our userbase. Facilitating the usage of devops toolchains by providing a
clear, well-encapsulated API point is a good goal, but it is not Sahara’s job
to pick a winner in that market. Third, such a framework is a significant
dependency for the sahara core, and such massive dependencies are always to be
regarded with suspicion. As such, providing a very lightweight framework for
validations is worthwhile, so that we do not need to depend absolutely on any
such framework, even in the short term before plugins are abstracted out of
the service repo.</p>
<p>Assuming that we do not wish to immediately adopt such a framework, we could
instead decide to immediately build a full-featured idempotent resource
description language, building many more validators with many more options.
While I may well have missed required, basic options, and welcome feedback, I
strongly suggest that we start with a minimal framework and build upon it,
instead of trying to build the moon from the outset. I have aimed in this spec
for extensibility over completeness (and as such have left some explicit
wiggle room in the set of validators to be implemented in the first pass.)</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None; this change is SPI only.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>For plugins using SaharaImageValidators, end-users will be able to modify the
.yaml files to add packages or run validation or modification scripts against
their images on spawn.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>This SPI method is optional; plugins may, if they’re feeling a bit cowboy
about things today, continue to spawn from any provided image without testing
it. As such, there is no strictly required developer impact with this spec.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None. Sahara-image-elements can keep doing its thing if this is adopted.
Future dependent specs may drive changes in how we expect images to be packed
(hopefully via an OpenStack API,) but this is not that spec, and can be
approved wholly independently.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Other contributors:</dt><dd><p>ptoscano</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add SPI method and call in provisioning flow; wrap to ensure that if absent,
no error is raised.</p></li>
<li><p>Build sahara.plugins.images as specified above, and all listed validators.</p></li>
<li><p>Write .yaml files for CDH and Ambari plugins using this mechanism (other
plugins may adopt over time, as the SPI method is optional.)</p></li>
<li><p>Add unit tests.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>No new dependencies (though this does provide an extension point for which
plugins may choose to adopt new dependencies.)</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit testing is assumed, as in all cases. The image validation mechanism
itself does not need extensive new integration testing; the positive case will
be covered by existing tests. Idempotence testing requires whitebox access to
the server, and is not possible in the scenario framework; if this system ever
is adopted for image generation, at that point we will have the blackbox hooks
to test idempotence by rerunning against a pre-packed image (which should
result in no change and a still-valid image.)</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We will need to document the SPI method, the SaharaImageValidator classes,
and the .yaml structure that describes them.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Tue, 08 Dec 2015 00:00:00 Move scenario tests to a separate repositoryhttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/move-scenario-tests-to-separate-repository.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/move-scenario-tests-to-separate-repo">https://blueprints.launchpad.net/sahara/+spec/move-scenario-tests-to-separate-repo</a></p>
<p>Scenario tests represent an independent framework for testing Sahara. Moving
the scenario tests to a separate repository will make testing Sahara stable
releases more convenient. The migration should be done so that there is no
need to pre-configure the CI or any other external tool: just install the
scenario framework and start the tests.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are 2 issues to be solved:</p>
<ol class="arabic simple">
<li><p>It is difficult to package the tests independently from Sahara.</p></li>
<li><p>It is impossible to release the tests independently.</p></li>
</ol>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>First, it is proposed to implement a mechanism for installing the tests
correctly. Next, a new repository should be created and all Sahara scenario
tests should be moved to that repository. After that, all sahara-ci jobs
should be migrated to the new mechanism of running scenario tests.</p>
<p>The scenario framework itself should be improved to be able to correctly skip
tests for the features that are not available in earlier Sahara releases.
Testing stable/kilo and stable/liberty branches may require some updates to
appropriate scenarios. The CI should initially start running scenario tests for
stable/kilo and stable/liberty and keep testing 2 latest stable releases.</p>
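<p>The version-aware skipping the framework needs might look like the
following sketch: a scenario step tagged with the release that introduced a
feature is skipped when testing an older branch. The function name and the
release list are illustrative, not the framework&#8217;s actual API:</p>

```python
# Hypothetical release-ordering check for skipping too-new features.

RELEASE_ORDER = ["kilo", "liberty", "mitaka"]


def should_skip(feature_release, target_release):
    """True if the feature postdates the release under test."""
    return (RELEASE_ORDER.index(target_release)
            < RELEASE_ORDER.index(feature_release))
```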
<p>Having a separate repository allows using these tests as a plugin for Sahara
(like client tests for tempest, without releases). The scenario tests may be
installed in a virtualenv.</p>
<p>The scenario framework and the tests themselves may be packaged and released.
All requirements would then be installed into the operating system, which
implies that the scenario framework dependencies must be packaged and
installed as well. Many of these requirements are packaged in neither MOS
nor RDO.</p>
<p>There are 2 ways to solve this:</p>
<ol class="arabic simple">
<li><p>Manually update the requirements to keep them compatible with the 2
previous stable releases.</p></li>
<li><p>Move all frequently-changing dependencies from requirements to
test-requirements.</p></li>
</ol>
<p>Yaml-files defining the scenarios are a part of the test framework and they
should be moved to the new repository with scenario tests.</p>
<p>The next step is to add support for default templates for every plugin in
each stable release, with correct flavors and node processes. For example,
the list of plugins for the 2 previous stable releases:</p>
<ul class="simple">
<li><p>kilo: vanilla 2.6.0, cdh 5.3.0, mapr 4.0.2, spark 1.0.0, ambari 2.2/2.3</p></li>
<li><p>liberty: vanilla 2.7.1, cdh 5.4.0, mapr 5.0.0, spark 1.3.1, ambari 2.2/2.3</p></li>
</ul>
<p>Each default template should be implemented as a yaml file which will be
run on CI.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>An alternative would be to package the tests for easier installation and
usage. For example, the command for running tests against the vanilla 2.7.1
plugin would be:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="gp">$ </span>sahara_tests<span class="w"> </span>-p<span class="w"> </span>vanilla<span class="w"> </span>-v<span class="w"> </span><span class="m">2</span>.7.1
</pre></div>
</div>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers should switch CI to use the new repository with scenario tests
instead of current Sahara repository.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>esikachev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>This will require the following changes:</p>
<ul class="simple">
<li><p>Implement installation mechanism for tests.</p></li>
<li><p>Move tests to separate repository (with scenario_unit tests).</p></li>
<li><p>Update sahara-ci jobs for new scenario tests repository.</p></li>
<li><p>Add default templates for each plugin and release.</p></li>
<li><p>Update scenario framework for correct testing of previous releases.</p></li>
<li><p>Add new jobs on sahara-ci for testing Sahara on previous releases.</p></li>
<li><p>Add ability for tests packaging.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to add documentation with description “How to run Sahara scenario tests”.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 26 Nov 2015 00:00:00 Allow ‘is_public’ to be set on protected resourceshttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/allow-public-on-protected.html
<p>Since <em>is_public</em> is meta-information on all objects rather
than content, sahara should allow it to be changed even if <em>is_protected</em>
is True. This will make it simpler for users to share protected
objects.</p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/allow-public-on-protected">https://blueprints.launchpad.net/sahara/+spec/allow-public-on-protected</a></p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently checks on <em>is_protected</em> prevent any field in an object
from being modified. This guarantees that the content of the object
will not be changed accidentally.</p>
<p>However, <em>is_public</em> is an access control flag and does not really
pertain to object content. In order to share a protected object,
a user must currently set <em>is_protected</em> to False while making
the change to <em>is_public</em>, and then perform another operation
to set the <em>is_protected</em> flag back to True.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>As a convenience, allow <em>is_public</em> to be modified even if <em>is_protected</em>
is True. The <em>is_public</em> field will be the only exception to the normal checks.</p>
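<p>A minimal sketch of the relaxed check described above, using hypothetical
helper and field-access names rather than sahara’s actual validation code: an
update to a protected object is rejected unless the only field being changed
is <em>is_public</em>.</p>

```python
# Hypothetical sketch (not sahara's actual validation code): reject
# updates to a protected object unless the only changed field is
# "is_public", which is treated as access-control metadata.
def check_protected_update(obj, updates):
    if not obj.get("is_protected"):
        return  # unprotected objects may be updated freely
    # Collect the fields whose values would actually change.
    changed = {k for k, v in updates.items() if obj.get(k) != v}
    if changed - {"is_public"}:
        raise ValueError("object is protected; only is_public may change")

template = {"name": "tmpl", "is_protected": True, "is_public": False}
check_protected_update(template, {"is_public": True})  # allowed
```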
<section id="alternatives">
<h3>Alternatives</h3>
<p>Leave it unchanged</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>This may require a Horizon change (croberts please comment)</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tmckay</p>
</dd>
<dt>Other contributors:</dt><dd><p>croberts</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Modify <em>is_protected</em> checks in the sahara engine.</p></li>
<li><p>Modify unit tests.</p></li>
<li><p>Make the corresponding Horizon changes.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None, unless there is a current section discussing protected/public</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 16 Nov 2015 00:00:00 add cdh 5.5 support into saharahttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/cdh-5-5-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-5-5-support">https://blueprints.launchpad.net/sahara/+spec/cdh-5-5-support</a></p>
<p>This specification proposes to add CDH 5.5 plugin with Cloudera Distribution
of Hadoop and Cloudera Manager in Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Plugins for the CDH 5.3.0 and 5.4.0 versions are already supported in
Liberty. With the release of CDH 5.5.0 by Cloudera, we can add support for the
new version to Sahara.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Since we already support 5.4.0, we can follow the current implementation to
avoid too many changes. We must guarantee that all the services supported in
the previous version work well in 5.5.0. Support for new services in CDH 5.5.0
will be discussed later.</p>
<p>Cloudera started to support Ubuntu 14.04 in CDH 5.5.0, so we will provide
Ubuntu 14.04 support as other plugins do. Building an Ubuntu 14.04 image with
CDH 5.5 should also be supported in the sahara-image-elements project. CentOS
6.5 will still be supported.</p>
<p>Due to the refactoring of previous CDH plugin versions, we should not merge
patches related to 5.5.0 until all the refactoring patches are merged.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Sahara-image-elements support for CDH 5.5.0 needs to be added.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>jxwang92 (Jaxon Wang)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will be:</p>
<ul class="simple">
<li><p>Add Python code in sahara/sahara/plugins/cdh/v5_5_0.</p></li>
<li><p>Add service resource files in sahara/sahara/plugins/cdh/v5_5_0/resources.</p></li>
<li><p>Add test cases including unit and scenario.</p></li>
<li><p>Test and evaluate the change.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Follow the existing test cases of the previous version.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The Cloudera plugin documentation needs minor changes.
<a class="reference external" href="http://docs.openstack.org/developer/sahara/userdoc/cdh_plugin.html">http://docs.openstack.org/developer/sahara/userdoc/cdh_plugin.html</a></p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 06 Nov 2015 00:00:00 Modify ‘is_default’ behavior relative to ‘is_protected’ for templateshttps://specs.openstack.org/openstack/sahara-specs/specs/mitaka/replace_is_default_with_is_protected.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/replace-is-default">https://blueprints.launchpad.net/sahara/+spec/replace-is-default</a></p>
<p>With the addition of the general <em>is_protected</em> field on all objects,
the <em>is_default</em> field on cluster and node group templates is semi-redundant.
Its semantics should be changed; specifically, gating of update and
delete operations should be handled exclusively by <em>is_protected</em>. The
<em>is_default</em> field should still be used to identify default templates for bulk
update and delete operations by the sahara-templates tool.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The <em>is_protected</em> and <em>is_default</em> fields both control gating of
update and delete operations. This overlap is not clean and should
be resolved to avoid confusion for end users and developers alike
and to decrease complexity in code maintenance.</p>
<p>Gating of update or delete operations should no longer be associated
with the <em>is_default</em> field. Instead, it should only mark the default
templates created with the <em>sahara-templates</em> tool so that bulk update
and delete operations continue to function as is.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Remove the checks in the sahara db layer that raise exceptions based
on the <em>is_default</em> field for template update and delete operations. This
will leave the gating under the control of <em>is_protected</em>. Note that
there will no longer be an error message that informs the user of
failure to update or delete a default template – it will be phrased in
terms of the <em>is_protected</em> field instead.</p>
<p>Modify all of the default template sets bundled with sahara to include
the <em>is_protected</em> field set to <em>True</em>. This will ensure that templates
used as-is from the sahara distribution will be created with update and
delete operations hindered.</p>
<p>Provide a database migration that sets <em>is_protected</em> to <strong>True</strong> for
all templates with <em>is_default</em> set to <strong>True</strong>. This will guarantee
the hindrance on update and delete operations continues for existing default
templates.</p>
<p>Currently, sahara will allow update and delete of default templates if the
<em>ignore_defaults</em> flag is set on the conductor methods. Change the name
and handling of this flag slightly so that if it is set, sahara will ignore
the <em>is_protected</em> flag instead. This will allow the <em>sahara-templates</em>
tool to easily update and delete default templates without having to first
clear the <em>is_protected</em> flag.</p>
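<p>The renamed conductor flag could behave as sketched below; the flag and
helper names here are illustrative, not sahara’s actual conductor signatures.
When the flag is set, the <em>is_protected</em> gate is skipped for default
templates so the <em>sahara-templates</em> tool can manage them directly.</p>

```python
# Hypothetical sketch of the renamed conductor-level flag: when it is
# set, the is_protected check is bypassed for default templates so the
# sahara-templates tool can update or delete them without first
# clearing the is_protected flag.
def check_template_protected(template, ignore_prot_on_def=False):
    if template.get("is_protected"):
        if ignore_prot_on_def and template.get("is_default"):
            return  # tool-managed default template: allow the operation
        raise RuntimeError("template is protected")

default_tmpl = {"is_protected": True, "is_default": True}
check_template_protected(default_tmpl, ignore_prot_on_def=True)  # allowed
```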
<p>Write a documentation section on default templates best practices. Given the
addition of <em>is_protected</em> and <em>is_public</em> and the changed semantics for
<em>is_default</em>, provide some suggestions for best practices as follows: operators
will be encouraged to load the default template sets in the <em>admin</em> or
<em>service</em> tenant and then make them available to users with <em>is_public</em>. If
modifications will be made to the default templates, they can be loaded, copied
and modified and the resultant templates made available to users with
<em>is_public</em>. This will ensure that the original default templates as bundled
with sahara will always be available as a reference with <em>is_protected</em> set to
<em>True</em>. Alternatively, an admin could copy the default templates directory from
the sahara distribution to another location and modify them on disk before
loading them.</p>
<p>Implications of the change: it will now be possible for tenants with default
templates to set <em>is_protected</em> to <em>True</em> and then edit those templates. If
a subsequent update of those templates is done from the <em>sahara-templates</em>
tool, those edits will be overwritten. This should be made clear in the best
practices documentation described above. Additionally, <em>is_default</em> will remain
a field that is only settable from the <em>sahara-templates</em> tool so that the
default sets can be managed as a group.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None. The goals of the original default templates implementation were:</p>
<ol class="arabic simple">
<li><p>Allow offline update and delete of default templates.</p></li>
<li><p>Treat a group of node group and cluster templates as a coherent set.</p></li>
<li><p>Allow operations filtered by plugin and version.</p></li>
</ol>
<p>The first requirement precludes use of the sahara client to manage default
templates. If we did not have the first requirement, we could conceivably
use the sahara client to manage them but requirements two and three
mean that we would need logic around the operations anyway. Lastly,
removing the <em>is_default</em> field completely leaves us with a need for
some other mechanism to identify default templates. It seems best to
remove the overlapping functionality of <em>is_default</em> and <em>is_protected</em> but
leave the rest of the current implementation unchanged.</p>
<p>Also, this could be deferred as part of the API v2 initiative, but since it
has so little impact on the API (see below) it seems unnecessary to wait.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None, but provide an appropriate migration to set <em>is_protected</em> to
<strong>True</strong> for all existing templates with <em>is_default</em> set to <strong>True</strong>.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None, <em>is_default</em> is only accessible directly through the conductor layer.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers should be familiar with the best practices documentation we will
add concerning default templates.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tmckay</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Remove code in the sahara db layer that rejects update or delete for
defaults.</p></li>
<li><p>Update the default templates bundled with sahara to include
<em>is_protected</em>.</p></li>
<li><p>Add an alembic migration for setting <em>is_protected</em> on existing
defaults.</p></li>
<li><p>Review documentation and add a best practices section.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be sufficient. Existing tests will be modified
to prove that default templates may be updated and edited after
this change, assuming <em>is_protected</em> is False. The existing
tests on <em>is_protected</em> cover normal update/delete control.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation on the sahara-templates tool may change slightly where it
describes how to update or delete a default template, and we will add a
best practices section as noted.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 04 Nov 2015 00:00:00 Add CM API Library into Saharahttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/add-lib-subset-cm-api.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-lib-subset-cm-api">https://blueprints.launchpad.net/sahara/+spec/add-lib-subset-cm-api</a></p>
<p>This specification proposes to add CM API library into Sahara so that the
Cloudera plugin will not have to depend on a third party library support.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Now the Cloudera plugin is depending on a third party library cm_api (by
Cloudera). This library is not python3 compatible, and Cloudera has no
resource to fix it. Therefore, it is hard to put this library into
requirements.txt file. To fix this issue, we plan to implement a subset of
CM APIs and put them into Sahara project, so that the Cloudera plugin will not
depend on a third-party library, and we can enable this plugin by default.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Now the CM APIs used in Cloudera plugin include (maybe not all included):</p>
<ul class="simple">
<li><p>ApiResource.create_cluster</p></li>
<li><p>ApiResource.get_cluster</p></li>
<li><p>ApiResource.get_all_hosts</p></li>
<li><p>ApiResource.delete_host</p></li>
<li><p>ApiResource.get_cloudera_manager</p></li>
<li><p>ClouderaManager.create_mgmt_service</p></li>
<li><p>ClouderaManager.hosts_start_roles</p></li>
<li><p>ClouderaManager.get_service</p></li>
<li><p>ApiCluster.start</p></li>
<li><p>ApiCluster.remove_host</p></li>
<li><p>ApiCluster.create_service</p></li>
<li><p>ApiCluster.get_service</p></li>
<li><p>ApiCluster.deploy_client_config</p></li>
<li><p>ApiCluster.first_run</p></li>
<li><p>ApiService.create_role</p></li>
<li><p>ApiService.delete_role</p></li>
<li><p>ApiService.refresh</p></li>
<li><p>ApiService.deploy_client_config</p></li>
<li><p>ApiService.start</p></li>
<li><p>ApiService.restart</p></li>
<li><p>ApiService.start_roles</p></li>
<li><p>ApiService.format_hdfs</p></li>
<li><p>ApiService.create_hdfs_tmp</p></li>
<li><p>ApiService.create_yarn_job_history_dir</p></li>
<li><p>ApiService.create_oozie_db</p></li>
<li><p>ApiService.install_oozie_sharelib</p></li>
<li><p>ApiService.create_hive_metastore_tables</p></li>
<li><p>ApiService.create_hive_userdir</p></li>
<li><p>ApiService.create_hive_warehouse</p></li>
<li><p>ApiService.create_hbase_root</p></li>
<li><p>ApiService.update_config</p></li>
<li><p>ApiServiceSetupInfo.add_role_info</p></li>
<li><p>ApiRole.update_config</p></li>
</ul>
<p>Those APIs are what we need to implement in our CM API library. We can
create a client directory in the cdh plugin directory and put the library
files in that directory.</p>
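<p>An illustrative sketch of how a thin in-tree client might be structured.
The class names echo the cm_api library, but the host, port, and URL layout
below are assumptions for illustration, not the exact Cloudera Manager REST
contract.</p>

```python
# Illustrative sketch of a thin in-tree CM API client. The endpoint
# layout here is an assumption, not the exact Cloudera Manager REST API.
class ApiResource(object):
    def __init__(self, host, port=7180, version="v8"):
        # Base URL of the Cloudera Manager REST API.
        self.base_url = "http://%s:%s/api/%s" % (host, port, version)

    def url_for(self, *segments):
        # Build an endpoint URL under the API base.
        return "/".join((self.base_url,) + segments)

class ApiCluster(object):
    def __init__(self, resource, name):
        self.resource = resource
        self.name = name

    def service_url(self, service_name):
        # URL for a named service within this cluster.
        return self.resource.url_for(
            "clusters", self.name, "services", service_name)

api = ApiResource("cm-master.example.org")
cluster = ApiCluster(api, "cluster-1")
```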
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers will no longer need to install the cm_api package.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>When a new version of CDH is released, if it is incompatible with the
currently used client or uses new APIs, developers may need to update the
client when adding support for the new CDH release.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ken chen</p>
</dd>
<dt>Other contributors:</dt><dd><p>ken chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will include:</p>
<ul class="simple">
<li><p>Add a client directory in the cdh plugin directory, and put the library
files under this directory.</p></li>
<li><p>Change all current cm_api imports to use the new client.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The Sahara integration test for the CDH plugin is sufficient.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation about CDH plugin prerequisites and enablement should be
updated, since cm_api is no longer required.</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="https://pypi.python.org/pypi/cm-api/8.0.0">https://pypi.python.org/pypi/cm-api/8.0.0</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 Add More Services into CDH pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/add-more-cdh-services.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-cdh-more-services">https://blueprints.launchpad.net/sahara/+spec/add-cdh-more-services</a></p>
<p>This specification proposes to add the following services to the CDH
plugin: Flume, Sentry, Sqoop, SOLR, Key-Value Store Indexer, and Impala.</p>
<p>Those services can be added to the CDH plugin by using the CM API first_run
to start them, which saves the effort of preparing and starting those services
one by one.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The set of services supported in the Sahara CDH plugin is still limited. We
want to add more services as soon as possible: Flume, Sentry, Sqoop, SOLR,
Key-Value Store Indexer, and Impala.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Since we plan to use first_run to prepare and start those services, we will
not need to call other CM APIs for those services in the start_cluster()
method.</p>
<p>The implementation will require the following code changes for each
service:</p>
<ul class="simple">
<li><p>Add process names of the service in some places.</p></li>
<li><p>Add service or process configuration, and network ports to open.</p></li>
<li><p>Add service validation.</p></li>
<li><p>Modify some utils methods, like get_service, to accommodate more services.</p></li>
<li><p>Some other changes for a few specific services if needed.</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ken chen</p>
</dd>
<dt>Other contributors:</dt><dd><p>ken chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will be:</p>
<ul class="simple">
<li><p>Change Python code in sahara/sahara/plugins/cdh.</p></li>
<li><p>Add more service resource files in sahara/sahara/plugins/cdh/resources.</p></li>
<li><p>Test and evaluate the change.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Use an integration test to create a cluster.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 29 Oct 2015 00:00:00 Add check service test in integration testhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/add-service-test-in-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-service-test-in-integration">https://blueprints.launchpad.net/sahara/+spec/add-service-test-in-integration</a></p>
<p>This specification proposes to add service check tests to the integration
tests for the CDH plugin. We have already added the ZooKeeper, HBase, Flume,
Sentry, Sqoop, SOLR, Key-Value Store Indexer, and Impala services.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>We have enabled many new services in the CDH plugin, and we want to
increase the coverage of the test cases. We therefore plan to add test cases
to the integration test that check the availability of those services using
simple scripts, as we did in map_reduce_testing.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We plan to write test cases in the same way as map_reduce_testing: first
copy the shell script to the node, then run it; the script exercises basic
usage of the services.</p>
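<p>The copy-and-run flow can be sketched locally as below. In the real tests
the script is pushed to a cluster node over SSH; the helper here is a
stand-in that executes the script on the local machine, with a zero exit code
meaning the service check passed.</p>

```python
import os
import subprocess
import tempfile

# Illustrative local stand-in for the check flow: write the check
# script to a file, execute it with sh, and return the exit code
# (0 means the service check passed).
def run_check_script(script_body):
    with tempfile.NamedTemporaryFile(
            "w", suffix=".sh", delete=False) as f:
        f.write(script_body)
        path = f.name
    try:
        return subprocess.call(["sh", path])
    finally:
        os.remove(path)  # clean up the temporary script

assert run_check_script("exit 0") == 0  # a trivial passing check
```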
<p>The implementation will require the following code changes for each service:</p>
<ul class="simple">
<li><p>Add a new cluster template (including all service processes) in
test_gating_cdh.py.</p></li>
<li><p>Add check_services.py to check the basic usage of all services.</p></li>
<li><p>Add shell scripts that check all services.</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>huichun lu</p>
</dd>
<dt>Other contributors:</dt><dd><p>huichun lu</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will be:</p>
<ul class="simple">
<li><p>Add Python code in <code class="docutils literal notranslate"><span class="pre">sahara/sahara/tests/integration/tests/gating/</span>
<span class="pre">test_cdh_gating.py</span></code>.</p></li>
<li><p>Add service check scripts in <code class="docutils literal notranslate"><span class="pre">sahara/sahara/tests/integrations/</span>
<span class="pre">tests/resources</span></code>.</p></li>
<li><p>Add check_services.py in <code class="docutils literal notranslate"><span class="pre">sahara/sahara/tests/integration/tests/</span></code>.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Use an integration test to create a cluster.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 29 Oct 2015 00:00:00 Add timeouts for infinite polling for smthhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/add-timeouts-for-polling.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-timeouts-for-polling">https://blueprints.launchpad.net/sahara/+spec/add-timeouts-for-polling</a></p>
<p>There are several infinite polling processes in the sahara code. It would
be useful to add timeouts for their execution.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>When creating a cluster, the cluster’s status may become stuck in Waiting,
endlessly awaiting the network, if the network configuration is wrong. We can
add configurable timeouts for all possible polling processes.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Here is a list of all polling processes in the Sahara code:</p>
<p>For service module:</p>
<ul class="simple">
<li><p>volumes._await_attach_volumes</p></li>
<li><p>volumes._create_attach_volume</p></li>
<li><p>volumes._detach_volume</p></li>
<li><p>engine._wait_until_accessible</p></li>
<li><p>engine._await_networks</p></li>
<li><p>direct_engine._await_active</p></li>
<li><p>direct_engine._await_deleted</p></li>
<li><p>edp.job_manager.cancel_job</p></li>
</ul>
<p>For utils module:</p>
<ul class="simple">
<li><p>openstack.heat.wait_stack_completion</p></li>
</ul>
<p>For cdh plugin:</p>
<ul class="simple">
<li><p>cloudera_utils.await_agents</p></li>
<li><p>plugin_utils.start_cloudera_manager</p></li>
</ul>
<p>For hdp plugin:</p>
<ul class="simple">
<li><p>_wait_for_async_request for both plugin versions</p></li>
<li><p>wait_for_host_registrations for both plugin versions</p></li>
<li><p>decommission_cluster_instances for 2.0.6 versionhandler</p></li>
</ul>
<p>For spark plugin:</p>
<ul class="simple">
<li><p>scaling.decommission_dn</p></li>
</ul>
<p>For vanilla plugin:</p>
<ul class="simple">
<li><p>hadoop2.scaling._check_decommission</p></li>
<li><p>hadoop2.await_datanodes</p></li>
<li><p>v1_2_1.scaling.decommission_dn</p></li>
<li><p>v1_2_1.versionhandler._await_datanodes</p></li>
</ul>
<p>The proposed change consists of the following steps:</p>
<ul class="simple">
<li><p>Add a new module, polling_utils, in sahara/utils that registers new
timeout options for the polling processes from the service and utils modules.
It would also contain a general polling utility; as an example, it can be
moved from
<a class="reference external" href="https://github.com/openstack/sahara/blob/master/sahara/utils/general.py#L175">https://github.com/openstack/sahara/blob/master/sahara/utils/general.py#L175</a>.</p></li>
<li><p>Add a new section in sahara.conf.sample containing all the
timeouts.</p></li>
<li><p>All plugin-specific options would apply only to their plugin and would
also be configurable. Users would be able to configure all plugin-specific
options during cluster template creation.</p></li>
</ul>
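<p>The general polling utility described in the steps above might look like
the sketch below. The function name and signature are illustrative rather than
Sahara’s actual API: the caller supplies a status check, and the utility polls
until the check succeeds or a configurable timeout expires.</p>

```python
import time

# Minimal sketch of the proposed general polling utility: call
# get_status() repeatedly until it returns True, raising if the
# configurable timeout expires first.
def poll(get_status, timeout=300, sleep=1, timeout_name="operation"):
    start = time.time()
    while not get_status():
        if time.time() - start > timeout:
            raise TimeoutError("timeout for %s expired after %s seconds"
                               % (timeout_name, timeout))
        time.sleep(sleep)

# Usage: succeed once a (fake) condition becomes true on the third poll.
state = {"calls": 0}
def ready():
    state["calls"] += 1
    return state["calls"] >= 3

poll(ready, timeout=5, sleep=0)
```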
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change doesn’t require any data models modifications.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>This change doesn’t require any REST API modifications.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users would be able to configure all timeouts separately. Some options
would be configurable via sahara.conf.sample; others would be configurable
from the plugin during cluster template creation.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>During cluster template creation, users would be able to configure
plugin-specific options from the UI.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add general polling util to sahara/utils/polling_utils</p></li>
<li><p>Apply these changes to all plugins and sahara engines.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on the current OpenStack Data Processing requirements.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>This change would require adding unit tests. It would also be tested
manually.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature needs to be documented in sahara/userdoc/configuration.guide.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1] <a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1402903">https://bugs.launchpad.net/sahara/+bug/1402903</a>
[2] <a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1319079">https://bugs.launchpad.net/sahara/+bug/1319079</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 Default templates for each pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/default-templates.html
<p>Blueprint: <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/default-templates">https://blueprints.launchpad.net/sahara/+spec/default-templates</a></p>
<p>In order to create a basic cluster for any plugin, a user currently has to
go through several steps to create an appropriate set of node group and
cluster templates. We believe that set of actions should be captured in a
default set of templates (per plugin) and loaded into the database ahead of
time so that users do not need to repeat some of the most basic steps to
provision a simple cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Creating even a basic cluster currently requires several steps before the
user is able to launch their cluster. In an effort to reduce the amount of
effort and time to get a simple cluster running, we propose a set of default
templates for each plugin that will be pre-loaded and available for use.</p>
<p>Other potential issues/answers:
1) How do we make the set of templates available for all users/tenants?
Today, any given template is only read/writable by one tenant.</p>
<p>A) Implement ACL support for templates (SLukjanov plans to write the spec
for this). Having proper ACL support in place will allow us to manage
access for read/write across all tenants.</p>
<p>ACL support is not available yet. For the time being, templates will be
added per-tenant.</p>
<p>2) Do we allow editing of default templates?
I don’t think we should allow editing of the default templates since they
are shared among all tenants. The flow to “edit” one would be to make a
copy of the template and work from there. I propose that each template will
be stored with a flag in the database that identifies each default template
as being a default template so that we can enforce that users cannot change
the default templates.</p>
<p>3) How do we avoid uploading the same default templates each time at startup
while still allowing for them to be updated as necessary?
We could use a numbering system in the template file names to indicate the
version number and store that in the database (perhaps instead of a boolean
flag indicating that the template is a default template,
we could store an int that is the version number). At startup time,
we would go through all of the template files for each plugin and compare
the version numbers to the version numbers that are stored in the database.</p>
<p>SQLAlchemy will detect when fields have changed and set the “updated at” field
accordingly. Therefore, we can simply attempt to update existing templates
whenever the script is run. If the template matches what is in the database,
there will be no update.</p>
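<p>The update-at-startup idea above can be sketched as follows. This is a minimal illustration only: the function name and the dict-backed store are hypothetical, standing in for the real sahara database API, and the “unchanged” branch mirrors SQLAlchemy skipping the UPDATE when no field differs.</p>

```python
# Sketch of "attempt the update every time; a no-op update leaves the row
# (and its updated_at timestamp) untouched". The db argument is a plain
# dict standing in for the template table.

def upsert_default_template(db, name, new_values):
    """Create or update a default template; return what happened."""
    existing = db.get(name)
    if existing is None:
        db[name] = dict(new_values, is_default=True)
        return "created"
    if all(existing.get(k) == v for k, v in new_values.items()):
        return "unchanged"   # SQLAlchemy would likewise emit no UPDATE
    existing.update(new_values)
    return "updated"
```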
<p>4) How do we make a json default cluster template reference a json default node
group template since we won’t know the node group template IDs?</p>
<p>A) The CLI util will process them one by one, starting with node group
templates and then cluster templates. In addition, we could create a few
example jobs that could be used as a health check.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ol class="arabic">
<li><p>Create sets of default template json files for each plugin.</p>
<p>A set of default templates will consist of node group templates and/or
cluster templates. Whether or not a template file defines a node group
template or a cluster template will be determined based on the presence of
required fields specified in the Sahara JSON validation code. Node group
templates require <cite>flavor_id</cite> and <cite>node_processes</cite>, so presence of these
fields implicitly identify a template as a node group template. If it is
not a node group template, it is a cluster template. This identification
avoids a naming scheme or some other kind of extra labeling to identify a
template type.</p>
<p>Default templates distributed with Sahara will be located in
<cite>sahara/plugins/default_templates/</cite>. The code that processes the default
templates should use this path as a default starting point but should allow
a different starting point to be specified. It should make no assumptions
about the directory structure beyond the following:</p>
<ul class="simple">
<li><p>All of the template files in a particular directory should be treated as
a set, and cluster templates may reference node group templates in the
same set by name.</p></li>
<li><p>Directories may be nested for logical organization, so that a plugin may
define default template sets for each version. Therefore, the code should
recurse through subdirectories by default but it should be possible to
specify no recursion.</p></li>
</ul>
<p>This design will allow code to process default templates decoupled from
explicit knowledge of Sahara plugins, plugin versions, or the structure of
plugin directories. The ability to specify a different starting point
will allow a user to process particular template sets if the entire set
is not desired (only some plugins enabled, for instance), or if an alternate
set at a different location is desired. In short, it keeps the processing
general and simple.</p>
<p>In practice, the directories under <cite>sahara/plugins/default_templates</cite> will
be named for plugins, and subdirectories will be created for different
versions as appropriate.</p>
</li>
<li><p>Add a CLI util that can be executed by cron with admin credentials to
create/update existing default templates. This utility needs to be able to
take some placeholders like “flavor” or “network” and make the appropriate
substitutions (either from configs or via command line args) at runtime.
The cron job can be optional if we want to force any updates to be
triggered explicitly.</p>
<p>The CLI will take an option to specify the starting directory (default
will be <cite>sahara/plugins/default_templates</cite>).</p>
<p>The CLI will take an option to disable recursion through subdirectories
(default will be to recurse).</p>
<p>At a minimum, the CLI will provide a command to create or update default
templates with processing beginning at the designated starting directory.</p>
<p>The CLI should use the “plugins” configuration parameter from the [database]
section to filter the templates that are processed. If the “plugin-name”
field of a template matches a plugin name in the “plugins” list, it will
be processed. If the “plugins” list is empty, all templates will be
processed. It should be possible to override the “plugins” configuration
from the command line with a “--plugin-name” option.</p>
<p>If there is an error during updating or creating templates in a particular
set, the CLI should attempt to undo any modifications or creations that
were done as part of that set.</p>
<p>The CLI should also provide a mechanism for deleting default templates,
since the <cite>is_default</cite> field will prevent that, should an admin for
some reason desire to remove default templates. This can be a simple
operation that will remove a single default template by ID. It is not
likely to be used very often.</p>
</li>
</ol>
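<p>The implicit template-type detection and directory walk described in item 1 can be sketched as below. The function names are hypothetical and the real CLI would also perform placeholder substitution and database writes; this only shows the classification rule (presence of <cite>flavor_id</cite> and <cite>node_processes</cite>) and the recurse-by-default traversal.</p>

```python
import json
import os

# A JSON file is a node group template iff it contains both required node
# group fields; everything else is treated as a cluster template.
NODE_GROUP_KEYS = {"flavor_id", "node_processes"}

def template_type(template_dict):
    if NODE_GROUP_KEYS <= set(template_dict):
        return "node_group"
    return "cluster"

def iter_template_sets(start_dir, recurse=True):
    """Yield (directory, [template dicts]); each directory is one set."""
    for dirpath, _dirnames, filenames in os.walk(start_dir):
        templates = [json.load(open(os.path.join(dirpath, f)))
                     for f in sorted(filenames) if f.endswith(".json")]
        if templates:
            yield dirpath, templates
        if not recurse:
            break   # only the starting directory is processed
```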
<section id="alternatives">
<h3>Alternatives</h3>
<p>1) The loading process could be done via the REST API if we wanted to have
some sort of external process that manages the default templates. That might
require changing the API a bit to add endpoints for managing the default
templates and seems like a fair bit of unnecessary work since the management of
default templates should be something done only within Sahara.</p>
<p>2) Add a hook, possibly in plugins/base:PluginManager for
“load_default_templates”. This method would be responsible for triggering
the loading of the defaults at startup time.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>N/A</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>N/A</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>End users will see the default templates show up just like any other
template that they may have created.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>N/A</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>N/A</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>N/A</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The default templates will show up in the UI and look like regular templates.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>croberts</p>
</dd>
<dt>Secondary assignee:</dt><dd><p>tmckay</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Come up with a set of default templates for each plugin. These will
probably be json formatted files.</p></li>
<li><p>Come up with some sort of mechanism to load the templates, or ensure that
they are already loaded when Sahara starts up.</p></li>
<li><p>Update the Sahara documentation.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>1) Implementation of the ACL for templates (spec still TBD). This will let
us give all users read access to the default templates while still possibly
allowing admins to edit the templates.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Ideally, tests will be added to ensure that a functioning cluster can be
started based on each of the default template sets. If that is determined
to be too time-consuming per-run, then tests to ensure the validity of each set
of templates may be sufficient.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The Sahara documentation should be updated to note that the default
templates are available for use. Additionally, any future plugins will be
expected to provide their own set of default templates.</p>
</section>
<section id="references">
<h2>References</h2>
<p>N/A</p>
</section>
Thu, 29 Oct 2015 00:00:00 [EDP] Add Oozie Shell Action job typehttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-add-oozie-shell-action.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-edp-shell-action">https://blueprints.launchpad.net/sahara/+spec/add-edp-shell-action</a></p>
<p>Oozie shell actions provide a great deal of flexibility and will
empower users to easily customize and extend the features of
Sahara EDP as needed. For example, a shell action could be used
to manage hdfs on the cluster, do pre or post processing for
another job launched from Sahara, or run a data processing job
from a specialized launcher that does extra configuration not
otherwise available from Sahara (i.e., setting a special classpath
for a Java job).</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Today if a user finds a limitation in Sahara EDP that prevents them
from performing desired processing they have a few choices:</p>
<ul class="simple">
<li><p>Log into the cluster nodes with ssh using the specified keypair
and perform processing by hand. This may not be available to all
users.</p></li>
<li><p>Write a Java application to do the custom processing and run it
as a Java job in EDP. This can work, however we as developers know
that Java is not often used as a scripting language; it’s a little
too heavyweight and not everyone knows it.</p></li>
<li><p>Modify the Sahara source. A savvy user might extend Sahara EDP
themselves to get the desired functionality. However, not everyone
is a developer or has the time to understand Sahara enough to do this.</p></li>
<li><p>Submit a bug or a blueprint and wait for the Sahara team to address it.</p></li>
</ul>
<p>However, the existence of shell actions would empower users to easily
solve their own problems. With a shell action, a user can bundle files
with a script written in bash, Python, etc and execute it on the cluster.</p>
<p>Here is a real-world example of a case that could be easily solved
with a shell action:</p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-add-hbase-lib">https://blueprints.launchpad.net/sahara/+spec/edp-add-hbase-lib</a></p>
<p>In the above blueprint we are calling for additional features in Sahara
as a convenience for users, but with a shell action a user could solve this
problem on their own. A simple bash script can be written to launch a Java
application like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>#!/bin/bash
java -cp HBaseTest.jar:`hbase classpath` HBaseTest
</pre></div>
</div>
<p>In this case a user would associate the script and the Java application
with the shell job as job binaries and Sahara would execute the script.</p>
<p>In a similar case, Sahara EDP itself uses a Python wrapper around
spark-submit to run Spark jobs. A shell action makes these kinds of
launchers easily available to end users.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add a <cite>Shell</cite> job type that is implemented by the Oozie EDP engine.
(It is possible that other EDP engines, such as the Spark or Storm
engines, could share a basic shell command implementation for such
jobs but that would be another CR).</p>
<p>Shell jobs can use the existing <cite>mains</cite> and <cite>libs</cite> fields in a job
execution. The script identified in <cite>mains</cite> will be the script that
Sahara runs and the files in <cite>libs</cite> will be supporting files bundled
in the working directory by Oozie.</p>
<p>As with other job types, shell actions will support <cite>configs</cite> and
<cite>args</cite> passed in the existing <cite>job_configs</cite> field of a job execution.
Values in <cite>configs</cite> are specified in the Oozie workflow with
<strong><configuration></strong> tags and will be available in a file created by Oozie.
The <cite>args</cite> are specified with the <strong><argument></strong> tag and will be passed
to the shell script in order of specification.</p>
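<p>A hypothetical <cite>job_configs</cite> payload for a Shell job execution, illustrating how the existing <cite>configs</cite>, <cite>params</cite>, and <cite>args</cite> fields would map onto the Oozie tags described here (the specific keys and values are illustrative):</p>

```python
# job_configs for a Shell job: "configs" become <configuration> properties,
# "params" become <env-var> entries, and "args" become <argument> tags
# passed to the script in order.
job_configs = {
    "configs": {"mapred.job.queue.name": "default"},
    "params": {"VERSION": "3"},   # rendered as <env-var>VERSION=3</env-var>
    "args": ["now"],              # rendered as <argument>now</argument>
}
```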
<p>In the reference section below there is a simple example of a shell
action workflow. There are three tags in the workflow that for Sahara’s
purposes are unique to the <cite>Shell</cite> action and should be handled by
Sahara:</p>
<ul>
<li><p><strong><exec>script</exec></strong>
This identifies the command that should be executed by the shell action.
The value specified here will be the name of the script identified in
<cite>mains</cite>.
Technically, this can be any command on the path but it is probably
simpler if we require it to be a script. Based on some experimentation,
there are subtleties of path evaluation that can be avoided if a script
is run from the working directory.</p></li>
<li><p><strong><file>support.jar</file></strong>
This identifies a supporting file that will be included in the working
directory. There will be a <file> tag for every file named in <cite>libs</cite>
as well as the script identified in <cite>mains</cite>.</p>
<p>(Note that the <file> tag can be used in Oozie in any workflow, but
currently Sahara does not implement it at all. It is necessary for the
shell action, which is why it’s mentioned here. Whether or not to add
general support for <file> tags in Sahara is a different discussion)</p>
</li>
<li><p><strong><env-var>NAME=VALUE</env-var></strong>
The env-var tag sets a variable in the shell’s environment. Most likely
we can use the existing <cite>params</cite> dictionary field in <cite>job_configs</cite> to
pass env-var values even if we want to label them as “environment
variables” in the UI.</p></li>
</ul>
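<p>How the three shell-specific tags might be emitted can be sketched with the standard library; this is illustrative only and is not how Sahara’s actual <cite>workflow_creator</cite> classes build the workflow document:</p>

```python
import xml.etree.ElementTree as ET

def shell_action(main_script, libs, env_vars, args):
    """Build the <shell> fragment: <exec>, <argument>, <env-var>, <file>."""
    shell = ET.Element("shell", xmlns="uri:oozie:shell-action:0.1")
    ET.SubElement(shell, "exec").text = main_script
    for a in args:
        ET.SubElement(shell, "argument").text = a
    for name, value in env_vars.items():
        ET.SubElement(shell, "env-var").text = "%s=%s" % (name, value)
    # the main script and every lib get a <file> tag so Oozie stages them
    # in the working directory
    for f in [main_script] + list(libs):
        ET.SubElement(shell, "file").text = f
    return ET.tostring(shell, encoding="unicode")
```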
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change adds a new job type, but since job types are stored as strings
it should not have any data model impact.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Only a change in validation code for job type</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>There may be security considerations related to <cite>Shell Action Caveats</cite>,
bullet number 2, in the URL listed in the reference section.</p>
<p>It is unclear whether, in Sahara EDP, the user who started the TaskTracker
is different from the user who submits a workflow. This needs investigation:
how is Sahara authenticating to the Oozie client? What user is the Oozie
server using to deploy jobs?</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>We would need a new form for a Shell job type submission. The form should allow
specification of a main script, supporting libs, configuration values,
arguments, and environment variables (which are 100% analogous to params from
the perspective of the UI)</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Investigate user issue mentioned above (who is the user that runs
shell actions in Sahara and what are the implications?)</p></li>
<li><p>Add a Shell job type and an implementation in the Oozie EDP engine
components under the <cite>workflow_creator</cite> directory</p></li>
<li><p>Update job validation routines to handle the Shell job type</p></li>
<li><p>Add an integration test for Shell jobs</p></li>
<li><p>Update the EDP documentation to describe the Shell job type</p></li>
<li><p>Add a UI form for Shell job submission</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit tests to cover creation of the Shell job</p></li>
<li><p>Integration tests to cover running of a simple shell job</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The EDP sections of the documentation need updating</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="http://blog.cloudera.com/blog/2013/03/how-to-use-oozie-shell-and-java-actions/">http://blog.cloudera.com/blog/2013/03/how-to-use-oozie-shell-and-java-actions/</a></p>
<p>A simple Shell action workflow looks like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>doit.sh</exec>
<argument>now</argument>
<env-var>VERSION=3</env-var>
<file>HBaseTest.jar</file>
<file>doit.sh</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>oops!</message>
</kill>
<end name='end' />
</workflow-app>
</pre></div>
</div>
</section>
Thu, 29 Oct 2015 00:00:00 [EDP] Add options supporting DataSource identifiers in job_configshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-data-sources-in-job-configs.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-data-sources-in-job-configs">https://blueprints.launchpad.net/sahara/+spec/edp-data-sources-in-job-configs</a></p>
<p>In some cases it would be convenient if users had a way to pass a DataSource
object as a job configuration value to take advantage of the data codified in
the object instead of copying and specifying that information manually. This
specification describes options that allow users to reference DataSource
objects in configuration values, parameters, and arguments in a JobExecution
record.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>With the exception of Java and Spark all job types in Sahara take a single
input DataSource object and a single output DataSource object. The DataSource
object ids are passed as part of a job execution request and Sahara configures
the job based on the information in the objects. Sahara assumes that jobs at
runtime will consume the path information using a fixed set of parameters and
configuration values. This works well in the general case for the constrained
job types supported by Oozie/hadoop (Hive, Pig, MapReduce) but there are cases
where DataSource objects currently may not be used:</p>
<ul class="simple">
<li><p>Java and Spark job types do not require DataSources since they have
no fixed arg list. Currently input and output paths must be specified as URLs
in the <code class="docutils literal notranslate"><span class="pre">args</span></code> list inside of <code class="docutils literal notranslate"><span class="pre">job_configs</span></code> and authentication configs
must be manually specified.</p></li>
<li><p>Hive, Pig, and MapReduce jobs which use multiple input or output paths or
consume paths through custom parameters require manual configuration.
Additional paths or special configuration parameters (i.e. anything outside
of Sahara’s assumptions) require manual specification in the <code class="docutils literal notranslate"><span class="pre">args</span></code>,
<code class="docutils literal notranslate"><span class="pre">params</span></code>, or <code class="docutils literal notranslate"><span class="pre">configs</span></code> elements inside of <code class="docutils literal notranslate"><span class="pre">job_configs</span></code>.</p></li>
</ul>
<p>Allowing DataSources to be referenced in <code class="docutils literal notranslate"><span class="pre">job_configs</span></code> is an incremental
improvement that gives users the option of easily using DataSource objects in
the above cases to specify IO.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add optional boolean config values on a JobExecution that cause Sahara to
to treat values in <code class="docutils literal notranslate"><span class="pre">job_configs</span></code> as potential names or uuids of DataSource
objects. This applies to configuration values, parameters, and arguments
for all job types for maximum flexibility – Hive and Pig jobs use parameters
to pass values, MapReduce uses configuration values, and Java and Spark use
arguments.</p>
<p>If Sahara resolves a value to the name or uuid of a DataSource it will
substitute the path information from the DataSource for the value and update
the job configuration as necessary to support authentication. If a value does
not resolve to a DataSource name or uuid value, the original value will be
used.</p>
<p>Note that the substitution will occur during submission of the job to the
cluster but will <em>not</em> alter the original JobExecution. This means that if
a user relaunches a JobExecution or examines it, the original values will be
present.</p>
<p>The following non mutually exclusive configuration values will control this
feature:</p>
<ul>
<li><p>edp.substitute_data_source_for_name (bool, default False)</p>
<p>Any value in the <code class="docutils literal notranslate"><span class="pre">args</span></code> list, <code class="docutils literal notranslate"><span class="pre">configs</span></code> dict, or <code class="docutils literal notranslate"><span class="pre">params</span></code> dict in
<code class="docutils literal notranslate"><span class="pre">job_configs</span></code> may be the name of a DataSource object.</p>
</li>
<li><p>edp.substitute_data_source_for_uuid (bool, default False)</p>
<p>Any value in the <code class="docutils literal notranslate"><span class="pre">args</span></code> list, <code class="docutils literal notranslate"><span class="pre">configs</span></code> dict or <code class="docutils literal notranslate"><span class="pre">params</span></code> dict in
<code class="docutils literal notranslate"><span class="pre">job_configs</span></code> may be the uuid of a DataSource object.</p>
</li>
</ul>
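<p>The substitution pass can be sketched as follows. This is a minimal illustration assuming a lookup table of DataSource records keyed by name and uuid; the real implementation would query the Sahara database, and the function name is hypothetical. Note that unresolved values pass through unchanged, as specified above.</p>

```python
def substitute_data_sources(job_configs, data_sources,
                            by_name=False, by_uuid=False):
    """Return a copy of job_configs with DataSource refs replaced by URLs."""
    lookup = {}
    if by_name:
        lookup.update({ds["name"]: ds["url"] for ds in data_sources})
    if by_uuid:
        lookup.update({ds["id"]: ds["url"] for ds in data_sources})

    def sub(value):
        # values that resolve to no DataSource are used as-is
        return lookup.get(value, value)

    return {
        "args": [sub(v) for v in job_configs.get("args", [])],
        "configs": {k: sub(v) for k, v in job_configs.get("configs", {}).items()},
        "params": {k: sub(v) for k, v in job_configs.get("params", {}).items()},
    }
```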
<p>This change will be usable from all interfaces: client, CLI, and UI. The UI may
choose to wrap the feature in some way but it is not required. A user could
simply specify the config options and the DataSource identifiers on the job
execution configuration panel.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>A slightly different approach could be taken in which DataSource names or uuids
are prepended with a prefix to identify them. This would eliminate the need for
config values to turn the feature on and would allow individual values to be
looked up rather than all values. It would be unambiguous but may hurt
readability or be unclear to new users.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None required. However, I can imagine that the UI might benefit from some
simple tooling around this feature, like checkboxes to enable the feature
on a Spark or Java job submission panel.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p><tmckay></p>
</dd>
<dt>Other contributors:</dt><dd><p><croberts></p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Support in Sahara</p></li>
<li><p>Document</p></li>
<li><p>Support in the UI (optional, it will work without additional work)</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We will need to document this in the sections covering submission of jobs
to Sahara</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Thu, 29 Oct 2015 00:00:00 Enable Spark jobs to access Swift URLhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-spark-swift-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-spark-swift-integration">https://blueprints.launchpad.net/sahara/+spec/edp-spark-swift-integration</a></p>
<p>Spark uses Hadoop filesystem libraries to dereference input and output URLs.
Consequently, Spark can access Swift filesystem URLs if the Hadoop Swift JAR
is included in the image and a Spark job’s Hadoop configuration includes the
necessary Swift credentials. This specification describes a method of
transparently adding Swift credentials to the Hadoop configuration of a Spark
job so that the job source code does not need to be altered and recompiled to
access Swift URLs.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Spark jobs may access Swift URLs assuming the cluster image includes the
Hadoop Swift JAR. To do this, the job’s Hadoop configuration must include the
necessary Swift credentials (<cite>fs.swift.service.sahara.username</cite> and
<cite>fs.swift.service.sahara.password</cite>, for example).</p>
<p>As with Oozie Java actions, job source code may be modified and recompiled to
add the necessary configuration values to the job’s Hadoop configuration.
However, this means that a Spark job which runs successfully with HDFS input
and output sources cannot be used “as is” with Swift input and output sources.</p>
<p>Sahara should allow users to run Spark jobs with Swift input and output
sources without altering job source code.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change follows the approach developed by Kazuki Oikawa in
<a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility">https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility</a> for
Java compatibility.</p>
<p>A new configuration value will be added to Sahara, <cite>edp.spark.adapt_for_swift</cite>.
If this configuration value is set to True on a Spark job, Sahara will run a
wrapper class (SparkWrapper) instead of the original class indicated by the
job. The default for this configuration value will be False.</p>
<p>Sahara will generate a <cite>spark.xml</cite> file containing the necessary Swift
credentials as Hadoop configuration values. This XML file will be uploaded to
the master node with the JAR containing the SparkWrapper class, along with the
other files normally needed to execute a Spark job.</p>
<p>Sahara’s Spark launcher script will run the SparkWrapper class instead of the
job’s designated main class. The launcher will pass the name of the XML
configuration file to SparkWrapper at runtime, followed by the name of the
original class and any job arguments. SparkWrapper will add this XML file to
the default Hadoop resource list in the job’s configuration before invoking the
original class with any arguments.</p>
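<p>The argument reordering the launcher performs can be sketched as below. The class name, jar names, and helper are all illustrative, not the actual Sahara implementation; the point is the ordering: when the wrapper is used, the XML file comes first, then the original main class, then the job arguments.</p>

```python
def build_spark_command(main_class, app_jar, job_args, adapt_for_swift):
    """Sketch the spark-submit invocation with and without the wrapper."""
    cmd = ["spark-submit"]
    if adapt_for_swift:
        # run the wrapper class; the user's jar rides along via --jars,
        # and the wrapper receives: <xml file> <original class> <args...>
        cmd += ["--class", "org.openstack.sahara.edp.SparkWrapper",
                "--jars", app_jar, "edp-spark-wrapper.jar",
                "spark.xml", main_class] + list(job_args)
    else:
        cmd += ["--class", main_class, app_jar] + list(job_args)
    return cmd
```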
<p>When the job’s main class is run, its default Hadoop configuration will
contain the specified Swift credentials.</p>
<p>The permissions of the job dir on the master node will be set to 0700. This
will prevent users other than the job dir owner from reading the Swift values
from the configuration file.</p>
<p>The sources for the SparkWrapper class will be located in the
sahara-extra repository under the <cite>edp-adapt-for-spark</cite> directory.
This directory will contain a <cite>pom.xml</cite> so that the JAR may be built
with maven. Maintenance should be light since the SparkWrapper class is
so simple; it is not expected to change unless the Hadoop Configuration class
itself changes.</p>
<p>Currently, the plan is to build the JAR as needed and release it with
Sahara in <cite>service/edp/resources/edp-spark-wrapper.jar</cite>. Alternatively, the
JAR could be hosted publicly and added to Sahara images as an element.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>There is no known way to supply Swift credentials to the Hadoop filesystem
libraries currently other than through configuration values. Additionally,
there is no way to add values to the Hadoop configuration transparently other
than through configuration files.</p>
<p>This does present some security risk, but it is no greater than the risk
already presented by Oozie jobs that include Swift credentials. In fact, this
is probably safer since a user must have direct access to the job directory to
read the credentials written by Sahara.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>The DIB image for Spark will need to change to include the Hadoop Swift JAR.
There is an existing element for this, so the change is trivial.</p>
<p>Additionally, we still need a solution to a compatibility issue with the
<cite>jackson-mapper-asl.jar.</cite></p>
<p>This can be patched by including an additional JAR on the cluster nodes, so we
can conceivably bundle the extra jar with the Spark image as a patch.
Alternatively, the issue might be resolved by upgrading the CDH version (or
Spark assembly version) used in the image.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The Spark job submission form should have an input (checkbox?) which allows a
user to set <cite>edp.spark.adapt_for_swift</cite> to True. Default should be False.</p>
<p>We don’t want to assume that just because Swift paths are passed as arguments
to a Spark job that Sahara should run the wrapper. The job itself may handle
the Swift paths in its own way.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee: tmckay</p>
<p>Other contributors: croberts</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a simple SparkWrapper class.
This is different from the wrapper class developed for Oozie Java actions.</p></li>
<li><p>Update Spark image to include the Hadoop Openstack JAR</p></li>
<li><p>Find a solution to the jackson issue</p></li>
<li><p>Update the UI</p></li>
<li><p>Implement handling of the <cite>edp.spark.adapt_for_swift</cite> option in Sahara.
This includes generation and upload of the extra XML file, upload of the
additional utility jar, and alteration of the command generated to invoke
spark-submit</p></li>
<li><p>Update node configuration.
All nodes in the cluster should include the Hadoop <cite>core-site.xml</cite> with
general Swift filesystem configuration. Additionally, spark-env.sh should
point to the Hadoop <cite>core-site.xml</cite> so that Spark picks up the Swift configs
and <cite>spark-defaults.conf</cite> needs to set up the executor classpath. These
changes will allow a user to run Spark jobs with Swift paths manually using
<cite>spark-submit</cite> from any node in the cluster should they so choose.</p></li>
</ul>
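<p>The <cite>edp.spark.adapt_for_swift</cite> work item above can be sketched as follows.
This is a minimal illustration, not Sahara’s actual implementation: the
wrapper class name, the utility jar name, and the XML file name are
assumptions used only for the example.</p>

```python
# Illustrative sketch only: how enabling edp.spark.adapt_for_swift might
# alter the generated spark-submit command. "SparkWrapper", "spark-utils.jar"
# and "spark.xml" are placeholder names, not Sahara's real artifacts.

def build_spark_submit(main_class, app_jar, args, adapt_for_swift=False):
    cmd = ["spark-submit"]
    if adapt_for_swift:
        # Run the wrapper main class instead of the user's, passing it the
        # generated XML (holding the Swift configs) and the real main class.
        cmd += ["--class", "SparkWrapper", "--jars", "spark-utils.jar",
                app_jar, "spark.xml", main_class] + list(args)
    else:
        cmd += ["--class", main_class, app_jar] + list(args)
    return cmd
```

With the option off, the user's class is invoked directly; with it on, the
wrapper sees the extra configuration file before delegating to the real job.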
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility">https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility</a></p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests and integration tests for Spark jobs will identify any regressions
introduced by this change.</p>
<p>Once we have Spark images with all necessary elements included, we can
add an integration test for Spark with Swift URLs.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Any user docs describing job submission should be updated to cover the new
option for Spark jobs.</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Thu, 29 Oct 2015 00:00:00 Use first_run to One-step Start Clusterhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/first-run-api-usage.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/first-run-api-usage">https://blueprints.launchpad.net/sahara/+spec/first-run-api-usage</a></p>
<p>This specification proposes using the cm_api method first_run to start
clusters in the CDH plugin in Sahara, instead of the current batch of methods.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Now, in the CDH plugin’s start_cluster method, many helper methods defined
in cloudera_utils.py are used to configure and prepare services before they
are started. Those helpers call cm_api methods in their bodies and check the
returned status. E.g., cu.format_namenode calls the cm_api method
ApiService.format_hdfs.</p>
<p>However, this approach is not the one Cloudera prefers. A much easier way is
to use the single method first_run to do most of that work. In fact, Cloudera
Manager itself also uses first_run to perform the final step of deploying a
cluster.</p>
<p>Changing the current start_cluster code to use the first_run method will:</p>
<ul class="simple">
<li><p>Leave the work to Cloudera Manager itself, instead of doing it manually.</p></li>
<li><p>Simplify the work of adding more service support.</p></li>
<li><p>Avoid possible errors generated by future CM changes.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation will change start_cluster to call first_run, and remove
from the method body the work that first_run can do itself.</p>
<p>In detail, it will look like the following:</p>
<p>In deploy.py, possible start_cluster method may be like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="k">def</span> <span class="nf">start_cluster</span><span class="p">(</span><span class="n">cluster</span><span class="p">):</span>
    <span class="n">cm_cluster</span> <span class="o">=</span> <span class="n">cu</span><span class="o">.</span><span class="n">get_cloudera_cluster</span><span class="p">(</span><span class="n">cluster</span><span class="p">)</span>
    <span class="sd">""" some pre codes """</span>
    <span class="n">cu</span><span class="o">.</span><span class="n">first_run</span><span class="p">(</span><span class="n">cluster</span><span class="p">)</span>
    <span class="sd">""" some post codes """</span>
</pre></div>
</div>
<p>Current methods used to configure CDH components, like _configure_spark and
_install_extjs, and most of the start_cluster body, can be removed.</p>
<p>In cloudera_utils.py, first_run can be defined as (just for example):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="nd">@cloudera_cmd</span>
<span class="k">def</span> <span class="nf">first_run</span><span class="p">(</span><span class="n">cluster</span><span class="p">):</span>
    <span class="n">cm_cluster</span> <span class="o">=</span> <span class="n">get_cloudera_cluster</span><span class="p">(</span><span class="n">cluster</span><span class="p">)</span>
    <span class="k">yield</span> <span class="n">cm_cluster</span><span class="o">.</span><span class="n">first_run</span><span class="p">()</span>
</pre></div>
</div>
<p>Methods for configuring CDH components, like create_yarn_job_history_dir,
create_oozie_db, install_oozie_sharelib, create_hive_metastore_db,
create_hive_dirs can be removed.</p>
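<p>The @cloudera_cmd decorator consumes the command objects the wrapped
generator yields and waits for each to finish. A self-contained sketch of
that pattern, not Sahara’s actual code, with a fake command standing in for
a real cm_api ApiCommand:</p>

```python
# Sketch of the generator/decorator pattern behind @cloudera_cmd:
# the wrapped function yields Cloudera Manager command objects, and the
# decorator blocks on each one and checks its result.
# FakeCommand/FakeResult are stand-ins for real cm_api objects.

def cloudera_cmd(f):
    def wrapper(*args, **kwargs):
        for cmd in f(*args, **kwargs):
            result = cmd.wait()  # block until CM finishes this step
            if not result.success:
                raise RuntimeError("Command failed: %s" % result.name)
    return wrapper

class FakeResult(object):
    def __init__(self, name, success):
        self.name = name
        self.success = success

class FakeCommand(object):
    def __init__(self, name, success=True):
        self._result = FakeResult(name, success)

    def wait(self):
        return self._result

@cloudera_cmd
def first_run(cluster):
    # In the real plugin this would yield cm_cluster.first_run().
    yield FakeCommand("first_run")
```

Any helper that yields a failing command raises, so start_cluster sees the
error without checking statuses by hand.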
<section id="alternatives">
<h3>Alternatives</h3>
<p>The current approach works at this stage, but it increases the coding effort
needed to add support for more services to the CDH plugin. Also, when CM is
upgraded in the future, the correctness of the current code cannot be assured.
Finally, using the first_run method to start services is the approach
recommended by Cloudera.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>It will be easier for developers to add more CDH services support.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ken chen</p>
</dd>
<dt>Other contributors:</dt><dd><p>ken chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will be:</p>
<ul class="simple">
<li><p>Change current deploy.py and cloudera_utils.py in the way above.</p></li>
<li><p>Test and evaluate the change.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Run an integration test that creates a cluster.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 29 Oct 2015 00:00:00 New style logginghttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/sahara-log-guidelines.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/new-style-logging">https://blueprints.launchpad.net/sahara/+spec/new-style-logging</a></p>
<p>Rewrite Sahara logging in unified OpenStack style proposed by
<a class="reference external" href="https://blueprints.launchpad.net/nova/+spec/log-guidelines">https://blueprints.launchpad.net/nova/+spec/log-guidelines</a></p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently, log levels and messages in Sahara are inconsistent and do not
match the OpenStack logging guidelines.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>A good way to unify our logging system is to follow the common guidelines.
Here is a brief description of the log levels:</p>
<ul class="simple">
<li><p>Debug: Shows everything and is likely not suitable for normal production
operation due to the sheer size of logs generated (e.g. scripts executions,
process execution, etc.).</p></li>
<li><p>Info: Usually indicates successful service start/stop, versions and such
non-error related data. This should include largely positive units of work
that are accomplished (e.g. service setup, cluster start, successful job
execution).</p></li>
<li><p>Warning: Indicates that there might be a systemic issue;
potential predictive failure notice (e.g. job execution failed).</p></li>
<li><p>Error: An error has occurred and an administrator should research the event
(e.g. cluster failed to start, plugin violations of operation).</p></li>
<li><p>Critical: An error has occurred and the system might be unstable, anything
that eliminates part of Sahara’s intended functionality; immediately get
administrator assistance (e.g. failed to access keystone/database, plugin
load failed).</p></li>
</ul>
<p>Here are examples of LOG levels depending on cluster execution:</p>
<ul class="simple">
<li><p>Script execution:</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">LOG</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">"running configure.sh script"</span><span class="p">)</span>
</pre></div>
</div>
<ul class="simple">
<li><p>Cluster startup:</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">LOG</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">_LI</span><span class="p">(</span><span class="s2">"Hadoop stack installed successfully."</span><span class="p">))</span>
</pre></div>
</div>
<ul class="simple">
<li><p>Job failed to execute:</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">LOG</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="n">_LW</span><span class="p">(</span><span class="s2">"Can't run job execution </span><span class="si">{job}</span><span class="s2"> (reason: </span><span class="si">{reason}</span><span class="s2">)"</span><span class="p">)</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
    <span class="n">job</span><span class="o">=</span><span class="n">job_execution_id</span><span class="p">,</span> <span class="n">reason</span><span class="o">=</span><span class="n">ex</span><span class="p">))</span>
</pre></div>
</div>
<ul class="simple">
<li><p>HDFS can’t be configured:</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">LOG</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">_LE</span><span class="p">(</span><span class="s1">'Configuring HDFS HA failed. </span><span class="si">{reason}</span><span class="s1">'</span><span class="p">)</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
    <span class="n">reason</span><span class="o">=</span><span class="n">result</span><span class="o">.</span><span class="n">text</span><span class="p">))</span>
</pre></div>
</div>
<ul class="simple">
<li><p>Cluster failed to start:</p></li>
</ul>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">LOG</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">_LE</span><span class="p">(</span><span class="s1">'Install command failed. </span><span class="si">{reason}</span><span class="s1">'</span><span class="p">)</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
    <span class="n">reason</span><span class="o">=</span><span class="n">result</span><span class="o">.</span><span class="n">text</span><span class="p">))</span>
</pre></div>
</div>
<p>An additional step for our logging system is to adopt PEP 3101 string
formatting as the unified format for all our logging messages. To keep the
code readable, please use named placeholders such as {reason} instead of
positional ones such as {0} in log messages.</p>
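<p>For example, both of the following produce the same string, but the named
PEP 3101 fields make the message self-documenting:</p>

```python
job_id = "4f2e-91ab"

# Positional indices force the reader to count arguments:
msg_pos = "Can't run job execution {0} (reason: {1})".format(job_id, "quota")

# Named fields (preferred) state what each value is:
msg_named = "Can't run job execution {job} (reason: {reason})".format(
    job=job_id, reason="quota")
```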
<section id="alternatives">
<h3>Alternatives</h3>
<p>We need to follow OpenStack guidelines, but if needed we can move plugin logs
to DEBUG level instead of INFO. It should be discussed separately in each case.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>starodubcevna</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Unify existing logging system</p></li>
<li><p>Unify logging messages</p></li>
<li><p>Add additional logs if needed</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/nova/+spec/log-guidelines">https://blueprints.launchpad.net/nova/+spec/log-guidelines</a>
<a class="reference external" href="https://www.python.org/dev/peps/pep-3101/">https://www.python.org/dev/peps/pep-3101/</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 Creation of Security Guidelines Documentationhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/security-guidelines-doc.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/security-guidelines-doc">https://blueprints.launchpad.net/sahara/+spec/security-guidelines-doc</a></p>
<p>As Sahara increases in functionality and complexity, the need for clear,
concise documentation grows. This is especially true for security related
topics as they are currently under-represented. This specification
proposes the creation of a document that will provide a source for security
related topics, guides, and instructions as they pertain to Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There is currently a distinct lack of centralized, security related,
documentation for Sahara. Several features have been implemented to address
security shortfalls and they could use broadened discussions of their
application and proper usage. Additionally there is no current documentation
which discusses the specific procedures for securing the individual plugin
technologies at use within Sahara.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This specification proposes the creation of Sahara specific documentation
addressing the security concerns of users and operators. This documentation
will cover current features, best practices, and guidelines for installing and
operating Sahara.</p>
<p>The documentation generated by this effort will be included in the OpenStack
Security Guide[1]. Bug patches will be generated against the current OpenStack
manuals, and the OpenStack Security Group will be engaged with respect to
finalizing and including the documentation.</p>
<p>The process will be broken down into sub-chapter sections that will make up a
Sahara chapter for the OpenStack Security Guide. Initially these sub-chapters
will include: Sahara controller installs, current feature discussions, and
plugin specific topics.</p>
<p>The information provided is intended to be updated as new methodologies,
plugins, and features are implemented. It will also be open to patching
through the standard OpenStack workflows by the community at large.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Creation of a separate document managed by the Sahara team outside the purview
of the OpenStack manuals, and distributed through the Sahara documentation.
This solution would be non-ideal as it creates an alternate path for OpenStack
manuals that is outside the expected locations for end users.</p>
<p>Creation of a separate document as above with the exception that it will be
maintained with the other OpenStack manuals. This option might be more
plausible than the previous, but it still suffers from the problem of creating
an alternate location for security related guidelines that is separate from
the official manual. It also bears the burden of creating a new project
within the manuals infrastructure.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>mimccune (Michael McCune)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Create a bug against the OpenStack manuals for Sahara chapter inclusion</p></li>
<li><p>Create the documentation and change requests for the following sub-chapters:</p>
<ul>
<li><p>Sahara controller install guides, with security related focus</p></li>
<li><p>Feature discussions and examples</p></li>
<li><p>plugin specific topics</p>
<ul>
<li><p>Hadoop</p></li>
<li><p>Spark</p></li>
<li><p>Storm</p></li>
</ul>
</li>
</ul>
</li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Format testing as provided by the security guide project.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This specification proposes the creation of hypertext and PDF documentation.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1]: <a class="reference external" href="http://docs.openstack.org/security-guide/content/ch_preface.html">http://docs.openstack.org/security-guide/content/ch_preface.html</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 Specification Repository Backlog Refactorhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/spec-repo-backlog-refactor.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/add-backlog-to-specs">https://blueprints.launchpad.net/sahara/+spec/add-backlog-to-specs</a></p>
<p>This specification proposes the refactoring of the <cite>sahara-specs</cite> repository
to inform a new workflow for backlogged features. This new workflow will
increase the visibility of specifications that have been approved but which
require implementation.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the <cite>sahara-specs</cite> repository suggests a layout that creates
“implemented” and “approved” subdirectories for each release. This pattern
could create duplicates and/or empty directories as plans that have been
approved for previous releases but have not been completed are moved and
copied around as necessary.</p>
<p>Additionally, this pattern can raise the barrier for new contributors looking
for features to work on and for the future direction of the project. Having a
single location for all backlogged specifications will increase the visibility
of items which require attention, and the release directories can be devoted
entirely to specifications implemented during those cycles.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change will add a backlog directory to the specs directory in the
repository, and refactor the juno directory to contain only the specs
implemented in juno.</p>
<p>It will also update the documentation to clearly indicate the usage of the
backlog directory and the status of the release directories. This
documentation will be part of the root readme file, a new document in the
backlog directory, and a reference in the main documentation.</p>
<p>This change is also proposing a workflow that goes along with the
repository changes. Namely that any work within a release that is either
predicted to not be staffed, or that is not started at the end of a
release should be moved to the backlog directory. This process should be
directed by the specification drafters as they will most likely be the
primary assignees for new work. In situations where the drafter of a
specification feels that there will be insufficient resources to create
an implementation then they should move an approved specification to the
backlog directory. This process should also be revisited at the end of a
release cycle to move all specifications that have not been assigned to the
backlog.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep the directory structure the way it is currently. This does not improve
the status of the repository but is an option to consider. If the
directory structure is continued as currently configured then the release
directories will need to create additional structures for each cycle’s
incomplete work.</p>
<p>Refactor each release directory to contain a backlog. This is very similar to
leaving things in their current state with the exception that it changes
the names in the directories. This change might add a small amount of
clarification but it is unlikely as the current names for “implemented” and
“approved” are relatively self explanatory.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>The main impact will be that drafters of specifications must be more aware of
the resources required to implement the proposed changes, and may need to
move their specifications based on those requirements.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>mimccune (Michael McCune)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Create the backlog directory and documentation.</p></li>
<li><p>Clean up the juno directory.</p></li>
<li><p>Add references to backlog in the contributing documentation.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The “How to Participate” document should be updated to make reference to
the backlog directory as a place to look for new work.</p>
</section>
<section id="references">
<h2>References</h2>
<p>Keystone is an example of another project using this format for backlogged
work. Examples of the documentation for the backlog directory[1] and the
root documentation[2] will be used as reference in the creation of Sahara
specific versions.</p>
<p>[1]: <a class="reference external" href="https://github.com/openstack/keystone-specs/tree/master/specs/backlog">https://github.com/openstack/keystone-specs/tree/master/specs/backlog</a>
[2]: <a class="reference external" href="https://github.com/openstack/keystone-specs">https://github.com/openstack/keystone-specs</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 Storm Integrationhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/storm-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/storm-integration">https://blueprints.launchpad.net/sahara/+spec/storm-integration</a></p>
<p>This blueprint aims to implement Storm as a processing option in Sahara.
Storm is a real-time processing framework that is widely used in the big
data processing community.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara today is only able to process big data in batch. Storm provides
stream processing with an easy setup. Integrating Storm gives Sahara a new
capability: users will be able to process not only batch workloads but also
real-time data.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ul>
<li><p>The implementation is divided in three steps:</p>
<ul class="simple">
<li><p>First we need to implement Storm Plugin.</p>
<ul>
<li><p>Identify Storm configuration files and manage their creation via sahara;</p></li>
<li><p>Create the plugin itself to manage Storm deploy;</p></li>
</ul>
</li>
<li><p>Second, we need to create a new Job Manager for storm, following Trevor
McKay’s refactoring</p></li>
<li><p>Last we will create a new image that comes with storm installed.</p></li>
</ul>
</li>
<li><p>Node Groups:</p>
<ul>
<li><p>Storm has two basic components, <strong>Nimbus</strong> and <strong>Supervisor</strong>. Nimbus is
the master node; it hosts the UI and is the main node of the topology.
Supervisors are the worker nodes. Other than that, Storm only needs
Zookeeper to run, so we need to provide that as well.
The basic Node Groups that we will have are:</p>
<blockquote>
<div><ul class="simple">
<li><p>Nimbus;</p></li>
<li><p>Supervisor;</p></li>
<li><p>Nimbus and Supervisor;</p></li>
<li><p>Zookeeper;</p></li>
<li><p>Nimbus and Zookeeper;</p></li>
<li><p>Supervisor and Zookeeper;</p></li>
<li><p>Nimbus, Supervisor and Zookeeper</p></li>
</ul>
</div></blockquote>
</li>
</ul>
</li>
</ul>
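<p>The seven node groups above are exactly the non-empty combinations of the
three processes. A small sketch of that enumeration (the process names here
are illustrative, not the plugin’s actual constants):</p>

```python
# Enumerate every non-empty subset of the three Storm processes; each
# subset corresponds to one of the node groups listed above.
from itertools import chain, combinations

PROCESSES = ["nimbus", "supervisor", "zookeeper"]

def valid_node_groups():
    subsets = chain.from_iterable(
        combinations(PROCESSES, n) for n in range(1, len(PROCESSES) + 1))
    return [set(s) for s in subsets]
```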
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Sahara-image-elements may be the place to create the Storm image. This needs
deeper investigation, but it looks like the best place to create and publish
Storm images.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The first change needed here is to show the configuration options of Storm
when this type of job is selected.
It will also be necessary to have a deploy job where the user can submit
the topology jar and the command line to run it.
For now, I don’t see other changes needed in the UI; the jobs are very simple
and need no special configuration.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee:</p>
<blockquote>
<div><ul class="simple">
<li><p>tellesmvn</p></li>
</ul>
</div></blockquote>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The implementation is divided in three steps:</p>
<ul class="simple">
<li><p>First we need to implement Storm Plugin.</p>
<ul>
<li><p>The Storm plugin is similar to the Spark plugin. The implementation will be
based on Spark’s, with the necessary changes.</p></li>
<li><p>Storm doesn’t rely on many configuration files; only one is needed, and it
is shared by all nodes. This configuration file is written in YAML and must
be generated dynamically by the plugin, since it needs the name or IP of the
master node and of the Zookeeper node(s). We will need PyYAML to serialize
this configuration to YAML. PyYAML is already a global requirement of
OpenStack and will be added to Sahara’s requirements as well.</p></li>
<li><p>The plugin will run the following processes:</p>
<ul>
<li><p>Storm Nimbus;</p></li>
<li><p>Storm Supervisor;</p></li>
<li><p>Zookeeper.</p></li>
</ul>
</li>
</ul>
</li>
<li><p>Second, we need to create a new Job Manager for storm, following Trevor
McKay’s refactoring</p>
<ul>
<li><p>This implementation is under review and the details can be seen here:
<a class="reference external" href="https://review.openstack.org/#/c/100678/">https://review.openstack.org/#/c/100678/</a></p></li>
</ul>
</li>
<li><p>Last we will create a new image that comes with storm installed.</p>
<ul>
<li><p>This part is not yet a final decision, but it seems better to have a
prepared image with Storm than to install it every time a new cluster is
set up.</p></li>
</ul>
</li>
</ul>
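<p>A minimal sketch of the dynamic generation described above. The real plugin
would use PyYAML; here the YAML is rendered by hand so the sketch stays
self-contained. The key names follow Storm’s documented configuration, and
the hosts are placeholders for values the plugin reads from the cluster:</p>

```python
# Sketch only: render the single storm.yaml shared by all nodes from the
# Nimbus host and the Zookeeper hosts known at cluster-creation time.

def make_storm_yaml(nimbus_host, zookeeper_hosts):
    lines = ["nimbus.host: \"%s\"" % nimbus_host,
             "storm.zookeeper.servers:"]
    # YAML list of Zookeeper servers, one entry per host.
    lines += ["    - \"%s\"" % host for host in zookeeper_hosts]
    return "\n".join(lines) + "\n"
```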
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://review.openstack.org/#/c/100622/">https://review.openstack.org/#/c/100622/</a></p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>I will write unit tests, mainly to test writing the configuration file;
more tests can be added as needed.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We will need to add Storm as a new plugin in the documentation and describe
how to use it.
An example in sahara-extra showing how to run a Storm job will also be
provided.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://wiki.openstack.org/wiki/HierarchicalMultitenancy">Wiki</a></p></li>
<li><p><a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-sahara-edp">Etherpad</a></p></li>
<li><p><a class="reference external" href="http://storm.incubator.apache.org/">Storm Documentation</a></p></li>
</ul>
</section>
Thu, 29 Oct 2015 00:00:00 Spec - Add support for filtering resultshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/support-query-filtering.html
<p>Blueprints:</p>
<p>Sahara service: <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/enable-result-filtering">https://blueprints.launchpad.net/sahara/+spec/enable-result-filtering</a></p>
<p>Client library: <a class="reference external" href="https://blueprints.launchpad.net/python-saharaclient/+spec/enable-result-filtering">https://blueprints.launchpad.net/python-saharaclient/+spec/enable-result-filtering</a></p>
<p>Horizon UI: <a class="reference external" href="https://blueprints.launchpad.net/horizon/+spec/data-processing-table-filtering">https://blueprints.launchpad.net/horizon/+spec/data-processing-table-filtering</a></p>
<p>As result sets can be very large when dealing with nontrivial OpenStack
environments, it is desirable to be able to filter those result sets by
some set of field constraints. This spec lays out how Sahara should support
such queries. Changes will be required in the API, client library,
and in the Horizon UI.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The original use case for this change came up when we wanted to provide a
way to filter some of the tables in the UI. In order to make that filtering
possible, either the UI has to perform the filtering (inefficient since
large result sets would still be returned by the api) or the api/client
library need to provide support for such filtering. Upon review of other
api/client libraries, it looks like most, if not all,
provide filtering capabilities. Sahara should also make filtering possible.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>In order to make filtering possible, each of the “list()” methods in the
client library should be extended to take an optional (default=None)
parameter, “search_opts”. “search_opts” should be a dict that will include
one or more {field:query} entries. Pagination can also be handled through
search_opts by allowing for the setting of page/page_size/max results.</p>
<p>In addition, the REST api needs to be extended for the list() operations to
support query syntax. This will require changes to how Sahara currently
does its database queries to include query options. Currently,
they are set to just return all entries for each call.</p>
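<p>The client-side shape of this change can be sketched as follows. The
<cite>ClusterManager</cite> name and the exact query-string mapping are illustrative
assumptions for this spec, not the final saharaclient code.</p>

```python
# Hypothetical sketch of a client "list()" method extended with an
# optional search_opts dict; names and the query mapping are assumptions.
from urllib.parse import urlencode


class ClusterManager:
    def __init__(self, http_client):
        self.http_client = http_client

    def list(self, search_opts=None):
        """List clusters, optionally filtered by {field: query} pairs.

        Pagination can travel in the same dict, e.g.
        {"name": "demo", "page": 2, "page_size": 20}.
        """
        query = ""
        if search_opts:
            # Sort for a deterministic query string.
            query = "?%s" % urlencode(sorted(search_opts.items()))
        return self.http_client.get("/clusters%s" % query)
```

With no <cite>search_opts</cite> the call degenerates to today's "return everything"
behavior, which keeps the parameter backward compatible.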
<section id="alternatives">
<h3>Alternatives</h3>
<p>Filtering could be done entirely in the UI, but that is a wasteful way to do
it since each library/api call will still return the full result set.
Also, any users of the REST api or the client library would not benefit
from this functionality.</p>
<p>It would also be possible to just implement filtering at the client library
level, but for the same reasons that doing it in the UI is undesirable,
this approach is also suboptimal.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>No changes to the model itself should be required. Only changes to how we
query the database are needed.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The rest api methods for the list operations will need to be changed to
support query-style parameters. The underlying database access methods
will need to be changed from the basic “get all” functionality to support a
<cite>**kwargs</cite> parameter (this is already done in cluster_get_all(),
but needs to be done for the other methods).</p>
<p>There are two special cases to be considered for filtering on the Job
Executions table. The “job” and “cluster” columns actually contain
information that is not part of the job execution object. For filters on
those fields, a field value of “job.name” or “cluster.name” should be passed
down. That will trigger the database query to be joined to a filter on the
name property of either the job or cluster table.</p>
<p>Additionally, there is special handling required to search on the Job
Executions status field. This is because status is not a proper field in
the Job Execution itself, but rather it is part of the “info” field. To
handle this, a bit of manual filtering will be done against the info.status
field.</p>
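<p>That manual filtering step could look roughly like the following sketch;
modeling job executions as dicts with an <cite>info</cite> field is an assumption for
illustration, not Sahara's actual object model.</p>

```python
# Illustrative sketch of the manual filtering on info.status described
# above: status lives inside the "info" field rather than being a proper
# column, so it is filtered in Python after the database query.
def filter_job_executions_by_status(job_executions, status):
    """Keep only executions whose info["status"] matches `status`."""
    return [je for je in job_executions
            if je.get("info", {}).get("status") == status]
```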
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users of the REST API, client library and Horizon UI will be able to filter
their result sets via simple queries.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>These changes should not affect deployment of the Sahara service.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>No developer impact.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>No sahara-image-elements impact.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Once the REST API and client library changes are in place,
the Horizon UI will be modified to allow for filtering on the following
tables: Clusters, Cluster Templates, Node Group Templates, Job Executions,
Jobs. It would also be possible to filter any of our tables in the future
without any other API/client library changes.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>croberts (Horizon UI)</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay (REST API/sahara.db.api)
croberts (Client Library)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>These may be able to be worked on in parallel, but they must be released in
the order they are given below.</p>
<ol class="arabic simple">
<li><p>REST API implementation</p></li>
<li><p>Client library implementation</p></li>
<li><p>Horizon UI implementation</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>There should be tests added against the REST interface,
client library and Horizon UI to test the filtering capabilities being added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Docs for the REST API will need to be updated to reflect the querying
functionality.</p>
<p>Docs for the client library will need to be updated to reflect the querying
functionality.</p>
<p>The dashboard user guide should be updated to note the table filtering
functionality that will be added.</p>
</section>
<section id="references">
<h2>References</h2>
<p>Sahara service: <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/enable-result-filtering">https://blueprints.launchpad.net/sahara/+spec/enable-result-filtering</a></p>
<p>Client library: <a class="reference external" href="https://blueprints.launchpad.net/python-saharaclient/+spec/enable-result-filtering">https://blueprints.launchpad.net/python-saharaclient/+spec/enable-result-filtering</a></p>
<p>Horizon UI: <a class="reference external" href="https://blueprints.launchpad.net/horizon/+spec/data-processing-table-filtering">https://blueprints.launchpad.net/horizon/+spec/data-processing-table-filtering</a></p>
</section>
Thu, 29 Oct 2015 00:00:00 CLI: delete by multiple names or idshttps://specs.openstack.org/openstack/sahara-specs/specs/saharaclient/cli-delete-by-multiple-names-or-ids.html
<p><a class="reference external" href="https://blueprints.launchpad.net/python-saharaclient/+spec/cli-delete-by-multiple-names-or-ids">https://blueprints.launchpad.net/python-saharaclient/+spec/cli-delete-by-multiple-names-or-ids</a></p>
<p>In sahara-cli you can only delete an object by a single name or id. Several
OpenStack clients, such as nova and heat, let you delete objects by
providing a list of ids or names. This blueprint is about adding this
feature to sahara-cli.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The CLI does not include deletion of objects by list of names or ids.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ul class="simple">
<li><p>Long term solution</p></li>
</ul>
<p>Our long term goal is to have the sahara-cli more consistent with
other os clients.</p>
<p>Current CLI usage for cluster-delete:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span> <span class="n">cluster</span><span class="o">-</span><span class="n">delete</span> <span class="p">[</span><span class="o">--</span><span class="n">name</span> <span class="n">NAME</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="nb">id</span> <span class="o"><</span><span class="n">cluster_id</span><span class="o">></span><span class="p">]</span>
</pre></div>
</div>
<p>Nova CLI usage for delete:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">nova</span> <span class="n">delete</span> <span class="o"><</span><span class="n">server</span><span class="o">></span> <span class="p">[</span><span class="o"><</span><span class="n">server</span><span class="o">></span> <span class="o">...</span><span class="p">]</span>
</pre></div>
</div>
<p>In nova-cli, and other OpenStack clients, you pass the id(s) or
name(s) of the items you want to delete directly. We can refactor sahara-cli
to remove the --name and --id arguments. So in the long term the usage
of sahara-cli will be:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>sahara cluster-delete <cluster> [<cluster> ...]
Positional arguments:
<cluster> Name or ID of cluster(s).``
</pre></div>
</div>
<ul class="simple">
<li><p>Short term solution</p></li>
</ul>
<p>Note that the CLI refactoring will take substantial time, so as a
short-term solution we can temporarily use --names and --ids for all
delete verbs of the CLI. Once the CLI is refactored,
we will remove all --name(s) and --id(s) arguments.</p>
<p>So the proposed change adds --names and --ids arguments,
each a comma-separated list of names or ids:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span> <span class="n">cluster</span><span class="o">-</span><span class="n">delete</span> <span class="p">[</span><span class="o">--</span><span class="n">name</span> <span class="n">NAME</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="nb">id</span> <span class="n">cluster_id</span><span class="p">]</span>
<span class="p">[</span><span class="o">--</span><span class="n">names</span> <span class="n">NAMES</span><span class="p">]</span> <span class="p">[</span><span class="o">--</span><span class="n">ids</span> <span class="n">IDS</span><span class="p">]</span>
</pre></div>
</div>
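<p>A minimal sketch of that short-term handling follows: the comma-separated
argument is split and each item deleted in turn. <cite>delete_one</cite> is a
placeholder for the real per-object API call, not an existing saharaclient
function.</p>

```python
# Sketch of the proposed --ids/--names handling: split the
# comma-separated argument and delete each named object in turn.
def delete_many(comma_separated, delete_one):
    """Delete every name/id in `comma_separated`, skipping empty items."""
    items = [s.strip() for s in comma_separated.split(",") if s.strip()]
    for item in items:
        delete_one(item)
    return items
```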
<section id="alternatives">
<h3>Alternatives</h3>
<p>No short term solution and just depend on the CLI refactoring
to provide this feature.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<p>Update CLI methods *_delete in saharaclient/api/shell.py</p>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:
Pierre Padrixe (stannie)</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add delete by list of names or ids in the CLI</p></li>
<li><p>Once the CLI is refactored, remove the --name(s) and --id(s) arguments</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>For long term solution: we depend on the refactoring of the CLI</p></li>
<li><p>For short term solution: none</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Update the tests to delete list of names and ids</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation of the CLI needs to be updated</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 29 Oct 2015 00:00:00 Heat WaitConditions supporthttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/sahara-heat-wait-conditions.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-heat-wait-conditions">https://blueprints.launchpad.net/sahara/+spec/sahara-heat-wait-conditions</a></p>
<p>Before the Heat engine was introduced in Sahara, Nova was continuously
polled for fixed and floating IP assignment and for active SSH connections
to VMs. To get rid of this polling mechanism, this spec suggests using the
Heat WaitConditions feature.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently Sahara checks instance availability via SSH. The Wait Condition
resource supports reporting signals to Heat, so each instance should signal
Heat once it has finished booting.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add WaitCondition resource to Sahara Heat template.</p>
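<p>The shape of the HOT resources this change would add can be sketched as a
Python dict for brevity. The resource names and the timeout value are
illustrative assumptions, not Sahara's actual template.</p>

```python
# Sketch of a WaitCondition/WaitConditionHandle pair as it might appear
# in a generated Heat template; names and values are illustrative.
wait_resources = {
    "master-wc-handle": {
        "type": "OS::Heat::WaitConditionHandle",
    },
    "master-wc": {
        "type": "OS::Heat::WaitCondition",
        "properties": {
            "handle": {"get_resource": "master-wc-handle"},
            # Seconds to wait for the boot signal before failing the stack.
            "timeout": 3600,
        },
    },
}
```

On boot, cloud-init on the instance would notify the handle's signal URL,
letting Heat mark the resource complete instead of Sahara polling over SSH.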
<section id="alternatives">
<h3>Alternatives</h3>
<p>Using SSH for polling instance accessible.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>sreshetniak</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Add Wait Condition support to Sahara</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>WaitCondition requires pre-installed cloud-init.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Need to add unit tests for this feature.
Integration tests will cover this feature.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::WaitCondition">http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Heat::WaitCondition</a></p>
</section>
Wed, 19 Aug 2015 00:00:00 CDH YARN ResourceManager HA Supporthttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/enable-cdh-rm-ha.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-ha-support">https://blueprints.launchpad.net/sahara/+spec/cdh-ha-support</a></p>
<p>This blueprint aims to implement YARN ResourceManager (RM) High-Availability
(HA) for the Cloudera plugin.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the Cloudera plugin does not support HA for the YARN
ResourceManager. Therefore we plan to implement RM HA.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation of RM HA will be done via the CM API enable_rm_ha(). This
API enables ResourceManager HA when given several arguments.</p>
<p>To achieve RM HA, we need to add one Standby ResourceManager node to the
cluster template (the Cloudera plugin only supports one Standby
ResourceManager, while other implementations might allow more than one).
When RM HA is enabled, we will start both the Primary and Standby
ResourceManagers, let the Primary ResourceManager be active, and leave the
Standby ResourceManager in standby.</p>
<p>CM API enable_rm_ha accepts a new ResourceManager Host new_rm_host_id as
parameter. A Standby ResourceManager will be started on new_rm_host_id node.</p>
<p>ResourceManager HA requires Primary ResourceManager and Standby ResourceManager
roles deployed on different physical hosts. Zookeeper service is required too.</p>
<p>When ResourceManager HA is enabled, Oozie and other applications that depend
on the RM should check all available RMs and determine the active one
themselves. There is no way to switch between RMs automatically and
smoothly.</p>
<p>Overall, we will implement RM HA as below:</p>
<ul class="simple">
<li><p>Add a role YARN_STANDBYRM.</p></li>
<li><p>If YARN_STANDBYRM is selected by the user (cluster admin), YARN RM HA will
be enabled.</p></li>
<li><p>If RM HA is enabled, we will check Anti-affinity to make sure YARN_STANDBYRM
and YARN_RESOURCEMANAGER will not be on the same physical host.</p></li>
<li><p>If RM HA is enabled, Zookeeper service is required in the cluster.</p></li>
<li><p>If RM HA is enabled, we will create cluster with ResourceManager on node
where YARN_RESOURCEMANAGER role is assigned.</p></li>
<li><p>If RM HA is enabled, after the cluster is started we will call enable_rm_ha
to enable RM HA, using the YARN_STANDBYRM node as the parameter.</p></li>
</ul>
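<p>The anti-affinity and Zookeeper checks in the bullets above can be sketched
as follows. The node-group dict layout is an illustrative assumption, not
Sahara's actual validation code.</p>

```python
# Illustrative validation sketch: if RM HA is requested (a YARN_STANDBYRM
# role exists), require a Zookeeper server and forbid the standby and
# active RM roles from sharing a physical host.
def validate_rm_ha(node_groups):
    hosts = {}  # process name -> set of host ids running it
    for ng in node_groups:
        for proc in ng["processes"]:
            hosts.setdefault(proc, set()).update(ng["host_ids"])
    standby = hosts.get("YARN_STANDBYRM", set())
    if not standby:
        return True  # HA not requested; nothing to check
    if "ZOOKEEPER_SERVER" not in hosts:
        raise ValueError("RM HA requires a Zookeeper service in the cluster")
    if standby & hosts.get("YARN_RESOURCEMANAGER", set()):
        raise ValueError("standby and active RM must be on different hosts")
    return True
```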
<p>It should be noted that, if HA is enabled, we need to detect the active
ResourceManager in the method get_resource_manager_uri and pass it to Oozie
in the workflow xml file each time we use it. This part of the code is
planned for later patches.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Ken Chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Changes will be only in sahara/plugins/cdh directory. We will only do this
based on CDH 5.4.0 at this stage. CDH 5.0.0 and CDH 5.3.0 plugins will not be
supported. Changes were described in the Proposed change section.</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will only do primitive checks: create a Cloudera cluster with RM HA, and
see whether it is active.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The documentation needs to be updated with information about enabling CDH YARN
ResourceManager HA.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><cite>Configuring High Availability for ResourceManager (MRv2/YARN) <http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_rm_ha_config.html/></cite></p></li>
</ul>
</section>
Thu, 13 Aug 2015 00:00:00 Manila as a runtime data sourcehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/manila-as-a-data-source.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source">https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source</a></p>
<p>Using manila nfs shares and the mounting features developed for use with
job binaries, it should be feasible to use nfs shares to host input
and output data. This will allow data to be referenced through local
filesystem paths as another simple alternative to hdfs or swift storage.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The work has already been done to support mounting of manila nfs shares at
cluster provisioning time or automounting shares for EDP job binaries with a
url of the form <em>manila://<share-id>/path</em>. Additionally, the Hadoop filesystem
APIs already support <em>file:///path</em> urls for referencing the local filesystem.</p>
<p>Sahara can build on these existing features by allowing
<em>manila://<share-id>/path</em> urls for data sources, automounting shares
referenced by data sources when necessary, and generating the correct
local filesystem urls for EDP jobs at runtime.</p>
<p>Some of the benefits of this approach are:</p>
<ul class="simple">
<li><p>parity for job binaries and data sources in regards to manila shares</p></li>
<li><p>the ability to store job binaries and data sources in the same share</p></li>
<li><p>the flexibility to add a new data share to a cluster at any time</p></li>
<li><p>the ability to operate on data from a cluster node using native OS tools</p></li>
<li><p>works on any node where <em>mount -t nfs …</em> is supported</p></li>
<li><p>lays the groundwork for other manila share types in the future</p></li>
</ul>
<p>The problem can be divided into three high-level items:</p>
<ul class="simple">
<li><p>Add a <em>manila</em> data source type with validation of <em>manila://</em> urls</p></li>
<li><p>Translate <em>manila://</em> urls to <em>file://</em> urls for use at job runtime</p></li>
<li><p>Call the existing automounting methods when a data source references
a new share</p></li>
</ul>
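<p>The url-translation item above can be sketched like this. The local mount
prefix (<cite>/mnt/&lt;share-id&gt;</cite>) is an assumption for illustration, not
Sahara's actual mount layout.</p>

```python
# Sketch of translating a manila://<share-id>/path data source url into
# the file:// url an EDP job would use at runtime, assuming the share is
# mounted under a per-share directory below mount_root.
from urllib.parse import urlparse


def manila_url_to_local(url, mount_root="/mnt"):
    parsed = urlparse(url)
    if parsed.scheme != "manila":
        raise ValueError("not a manila:// url: %s" % url)
    share_id, path = parsed.netloc, parsed.path
    return "file://%s/%s%s" % (mount_root, share_id, path)
```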
<p>Note, automounting and url translation will only work for manila shares
referenced by data source objects. A <em>manila://</em> url embedded as a literal
in a job config, param, or arg will be ignored. It will not be translated
to a <em>file://</em> url by Sahara and it will not cause automounting. However,
there is a precedent for this – Sahara currently has other features that
are only supported on data source objects, not on literal urls. (It may
be possible to remove these limitations in the future through greater
use of the unified job mapping interface recently introduced).</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>A <em>manila</em> data source type will be added to the JSON schema for data sources,
with appropriate validation of <em>manila://</em> urls.</p>
<p>The existing code in <em>sahara/service/edp/binary_retrievers/manila_share.py</em>
that supports path name generation and automounting of manila nfs shares for
job binaries will be refactored and broken up between
<em>sahara/service/edp/job_utils.py</em> and <em>sahara/service/shares.py</em>. The essential
implementation is complete, but this logic needs to be callable from multiple
places and in different combinations to support data sources.</p>
<p>Currently, all data source urls are returned to the EDP engines from
<em>get_data_sources()</em> and <em>resolve_data_source_references()</em> in <em>job_utils.py</em>.
The returned urls are recorded in the job_execution object and used by the
EDP engine to generate the job on the cluster. These two routines will be
extended to handle manila data sources in the following ways:</p>
<ul class="simple">
<li><p>Mount a referenced nfs share on the cluster when necessary. Since an
EDP job runs on multiple nodes, the share must be mounted to the whole
cluster instead of to an individual instance</p></li>
<li><p>Translate the <em>manila://</em> url to a <em>file://</em> url and return both urls
Since the submission time url and the runtime url for these data
sources will be different, both must be returned. Sahara will record
the submission time url in the job_execution but use the runtime url
for job generation</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do not support <em>manila://</em> urls for data sources but support data hosted on nfs
as described in</p>
<p><a class="reference external" href="https://review.openstack.org/#/c/210839/">https://review.openstack.org/#/c/210839/</a></p>
<p>However, these features are complementary, not mutually exclusive, and most of
the apparatus necessary to make this proposal work already exists.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None (only “manila” as a valid data source type in the JSON schema)</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Obviously, if this feature is desired then the manila service should be
running.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None (nfs-utils element is already underway)</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Sahara needs a manila data source type on the data source creation form</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tmckay</p>
</dd>
<dt>Other contributors:</dt><dd><p>croberts, egafford</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add manila data source type to the JSON schema</p></li>
<li><p>Allow submission time and runtime urls to differ
(<a class="reference external" href="https://review.openstack.org/#/c/209634/3">https://review.openstack.org/#/c/209634/3</a>)</p></li>
<li><p>Refactor binary_retrievers/manila_share.py</p></li>
<li><p>Extend job_utils get_data_sources() and resolve_data_source_references()
to handle manila:// urls</p></li>
<li><p>Add manila data source creation to Horizon</p></li>
<li><p>Modify/extend unit tests</p></li>
<li><p>Documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/manila-as-binary-store">https://blueprints.launchpad.net/sahara/+spec/manila-as-binary-store</a></p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests.</p>
<p>Eventually, as with job binaries, this can be tested with integration
tests if/when we have manila support in the gate</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Discussion of the manila data source type should be added to any
sections we currently have that discuss data hosted in swift or hdfs.</p>
<p>Additionally, we should consider adding information to the Sahara section
of the security guide on the implications of using manila data shares.</p>
<p>If the security guide or the manila documentation contains a section on
security, this probably can be a short discussion from a Sahara perspective
with a link to the security info. If there isn’t such a section currently, then
probably there should be a separate CR against the security guide to create a
section for Manila.</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Thu, 13 Aug 2015 00:00:00 Support of shared and protected resourceshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/shared-protected-resources.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/shared-protected-resources">https://blueprints.launchpad.net/sahara/+spec/shared-protected-resources</a></p>
<p>This specification proposes to add the ability to create and modify objects
that are shared across tenants and protected from updates.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently all objects created by Sahara are visible only from the tenant in
which they were created and are not protected from accidental modification or
deletion.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This specification proposes to add <code class="docutils literal notranslate"><span class="pre">is_public</span></code> and <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> boolean
fields to all Sahara objects that can be accessed through REST API. They will
be added to clusters, cluster templates, node group templates, data sources,
job executions, jobs, job binaries and job binary internals.</p>
<p>All these objects can be created with the <code class="docutils literal notranslate"><span class="pre">is_public</span></code> and
<code class="docutils literal notranslate"><span class="pre">is_protected</span></code> parameters enabled, and both can be updated after creation
with the corresponding API call. Both will be False by default.</p>
<p>If some object has <code class="docutils literal notranslate"><span class="pre">is_public</span></code> field set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, it means that it’s
visible not only from the tenant in which it was created, but from any other
tenants too.</p>
<p>If some object has the <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> field set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, it cannot
be modified (updated, scaled, canceled or deleted) unless this field is set
to <code class="docutils literal notranslate"><span class="pre">False</span></code>. In other words, if <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> is <code class="docutils literal notranslate"><span class="pre">True</span></code>, the
object can be modified only if <code class="docutils literal notranslate"><span class="pre">is_protected=False</span></code> is supplied in the
update request.</p>
<p>Public objects created in one tenant can be used by other tenants (for example,
a cluster can be created from a public cluster template that was created in
another tenant), but to prevent management of resources across tenants,
operations like update, delete, cancel and scale will be possible only from
the tenant in which the object was created.</p>
<p>To enforce these restrictions, several methods will be implemented in
<code class="docutils literal notranslate"><span class="pre">sahara.service.validation.acl</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="k">def</span> <span class="nf">check_tenant_for_delete</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="nb">object</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">check_tenant_for_update</span><span class="p">(</span><span class="n">context</span><span class="p">,</span> <span class="nb">object</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">check_protected_from_delete</span><span class="p">(</span><span class="nb">object</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">check_protected_from_update</span><span class="p">(</span><span class="nb">object</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">check_tenant_for_*</span></code> will compare the tenant_id in the context with the
object’s tenant_id and raise an error if they differ. This check should be
skipped for periodic tasks, as there is no tenant_id in the context in that
case.</p>
<p><code class="docutils literal notranslate"><span class="pre">check_protected_from_delete</span></code> will check the <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> field and
raise an error if it is set to True.
<code class="docutils literal notranslate"><span class="pre">check_protected_from_update</span></code> will additionally check that the
<code class="docutils literal notranslate"><span class="pre">is_protected</span></code> field wasn’t changed to <code class="docutils literal notranslate"><span class="pre">False</span></code> with the update data.</p>
<p>These methods will be called mostly in <code class="docutils literal notranslate"><span class="pre">sahara.db.sqlalchemy.api</span></code> inside the
update and delete methods that make only database changes. But for the cluster_create,
cluster_scale, job_execute, job_execution_cancel and job_execution_delete
operations they will be called during validation, before the API calls.</p>
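<p>As an illustration, here is a minimal Python sketch of the tenant and protection checks (the helper bodies, context shape and error type are assumptions, not the actual sahara code):</p>

```python
# Hypothetical sketch of the proposed ACL checks; the real
# implementation in sahara.service.validation.acl may differ.

class BadRequest(Exception):
    pass


def check_tenant_for_update(context, obj):
    # Periodic tasks run without a tenant in the context: skip the check.
    if getattr(context, 'tenant_id', None) is None:
        return
    if context.tenant_id != obj['tenant_id']:
        raise BadRequest("Object may only be updated from its own tenant")


def check_protected_from_update(obj, data):
    # A protected object may still be updated if the same request
    # explicitly flips is_protected to False.
    if obj.get('is_protected') and data.get('is_protected') is not False:
        raise BadRequest("Object is protected from updates")
```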
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Two extra fields <code class="docutils literal notranslate"><span class="pre">is_public</span></code> and <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> will be added to
objects listed above.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>New API calls will not be added, but existing ones will be updated to support
new fields.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Saharaclient API will be updated to support new fields.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p><code class="docutils literal notranslate"><span class="pre">is_public</span></code> and <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> checkboxes will be added to <code class="docutils literal notranslate"><span class="pre">Update</span></code> and
<code class="docutils literal notranslate"><span class="pre">Create</span></code> panels of each object.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding new fields <code class="docutils literal notranslate"><span class="pre">is_public</span></code> and <code class="docutils literal notranslate"><span class="pre">is_protected</span></code> to objects listed above;</p></li>
<li><p>Implementation of validations, described above;</p></li>
<li><p>Updating saharaclient with corresponding changes;</p></li>
<li><p>Documentation about new features will be added.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added, along with extensive manual testing.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>All changes will be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 04 Aug 2015 00:00:00 Objects update support in Sahara APIhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/api-for-objects-update.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/api-for-objects-update">https://blueprints.launchpad.net/sahara/+spec/api-for-objects-update</a></p>
<p>This specification proposes to add API calls that allow updating objects that
currently cannot be updated this way.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current Sahara API doesn’t support updating some objects, which may be
required by other features (for example, it’s needed for the shared and
protected resources implementation that will be proposed later in the ACL spec).</p>
<p>Updates are already implemented for node group templates, cluster templates,
job binaries, data sources and should be done for clusters,
jobs, job executions and job binary internals.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Update operation will be added for cluster, job, job execution and job binary
internal objects.</p>
<p>For clusters and jobs, only name and description updates will be allowed for now.
For job binary internals, only the name will be updatable.
Nothing will be updatable for job executions; only the corresponding methods
will be added.</p>
<p>Support for the PATCH HTTP method will also be added for modifying existing
resources. It will be implemented the same way as the current PUT method.</p>
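<p>The per-object restrictions above could be enforced with a small validation table, as in this sketch (the names are illustrative, not sahara's API):</p>

```python
# Sketch of validating a PATCH body against the fields the spec
# allows updating for each object type.

UPDATABLE_FIELDS = {
    'cluster': {'name', 'description'},
    'job': {'name', 'description'},
    'job_binary_internal': {'name'},
    'job_execution': set(),  # method exists, but nothing is updatable yet
}


def check_update_fields(object_type, data):
    """Reject a PATCH body that touches fields that cannot be updated."""
    extra = set(data) - UPDATABLE_FIELDS[object_type]
    if extra:
        raise ValueError("Cannot update fields %s of %s"
                         % (sorted(extra), object_type))
```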
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Following API calls will be added:</p>
<p><strong>PATCH /v1.1/{tenant_id}/clusters/{cluster_id}</strong></p>
<p><strong>PATCH /v1.1/{tenant_id}/jobs/{job_id}</strong></p>
<p><strong>PATCH /v1.1/{tenant_id}/job-executions/{job_execution_id}</strong></p>
<p><strong>PATCH /v1.1/{tenant_id}/job-binary-internals/{job_binary_internal_id}</strong></p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>These update methods will be added to the saharaclient API.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>These update methods will not be added to Horizon yet, but will be added later
as part of the ACL spec.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding PATCH method;</p></li>
<li><p>Adding new API calls;</p></li>
<li><p>Adding operations to saharaclient;</p></li>
<li><p>Documentation update in api-ref.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests and API tests in tempest will be added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Sahara REST API documentation in api-ref will be updated.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 03 Aug 2015 00:00:00 CDH HDFS HA Supporthttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/enable-cdh-hdfs-ha.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-ha-support">https://blueprints.launchpad.net/sahara/+spec/cdh-ha-support</a></p>
<p>This blueprint aims to implement HDFS High-Availability (HA) for the Cloudera
plugin.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the Cloudera plugin does not support HA for services. We plan to
implement HDFS HA as the first step. HA for YARN and other services will follow
in later steps.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation of HDFS HA will be done via the CM API enable_nn_ha(). This
API enables HDFS HA when given the required information as arguments.</p>
<p>CDH 5 supports Quorum-based Storage as the only HA implementation, so that is
what we will implement. To achieve this, we need to add a Standby NameNode
and several JournalNodes. The JournalNode count must be odd and at least 3.
When HDFS HA is enabled, the SecondaryNameNode is not used, so we can reuse
the SecondaryNameNode's node for the Standby NameNode.</p>
<p>HDFS HA has several hardware constraints (see the reference links). However,
since all resources are virtual in OpenStack, we will only require that the
NameNode and the Standby NameNode be on different physical hosts.</p>
<p>Overall, we will implement HDFS HA as follows:</p>
<ul class="simple">
<li><p>Add a role JournalNode.</p></li>
<li><p>If the JournalNode role was selected by the user (cluster admin), then HA will be enabled.</p></li>
<li><p>If HA is enabled, we will validate whether the JournalNode count meets the
requirements.</p></li>
<li><p>JournalNode roles will not actually be created during cluster creation;
instead they will be used as parameters of the CM API enable_nn_ha.</p></li>
<li><p>If HA is enabled, we will use SecondaryNameNode as the StandbyNameNode.</p></li>
<li><p>If HA is enabled, we will set Anti-affinity to make sure NameNode and
SecondaryNameNode will not be on the same physical host.</p></li>
<li><p>If HA is enabled, Zookeeper service is required in the cluster.</p></li>
<li><p>After the cluster is started, we will call enable_nn_ha to enable HDFS HA.</p></li>
<li><p>If HA is enabled, in the Oozie workflow xml file we will give the nameservice
name instead of the NameNode name in the method get_name_node_uri, so that the
cluster can determine by itself which NameNode is active.</p></li>
</ul>
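<p>The JournalNode constraint above (an odd count, at least 3) could be validated as in this sketch (the function name is an assumption):</p>

```python
# Quorum-based Storage needs a majority of JournalNodes alive, hence
# an odd number of them, and no fewer than 3.

def validate_journalnode_count(count):
    if count < 3 or count % 2 == 0:
        raise ValueError("HDFS HA requires an odd number of JournalNodes, "
                         "at least 3; got %d" % count)
```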
<section id="alternatives">
<h3>Alternatives</h3>
<p>None.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Ken Chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Changes will be made only in the sahara/plugins/cdh directory. At this stage we
will implement this only for CDH 5.4.0; the CDH 5.0.0 and CDH 5.3.0 plugins will
not be supported. The changes are described in the Proposed change section.</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will only do primitive checks: create a Cloudera cluster with HDFS HA, and
see whether it is active.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The documentation needs to be updated with information about enabling CDH HDFS
HA.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="http://www.edureka.co/blog/namenode-high-availability-with-quorum-journal-manager-qjm/">NameNode HA with QJM</a></p></li>
<li><p><a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_hdfs_ha_intro.html">Introduction to HDFS HA</a></p></li>
<li><p><a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_hdfs_ha_enabling.html#cmug_topic_5_12_unique_1">Enable HDFS HA Using Cloudera Manager</a></p></li>
<li><p><a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_hdfs_ha_hardware_config.html">Configuring Hardware for HDFS HA</a></p></li>
</ul>
</section>
Wed, 29 Jul 2015 00:00:00 SaharaClient CLI as an OpenstackClient pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/saharaclient/cli-as-openstackclient-plugin.html
<p><a class="reference external" href="https://blueprints.launchpad.net/python-saharaclient/+spec/cli-as-openstackclient-plugin">https://blueprints.launchpad.net/python-saharaclient/+spec/cli-as-openstackclient-plugin</a></p>
<p>This specification proposes to create the SaharaClient CLI as an OpenStackClient
plugin.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the SaharaClient CLI has a number of problems and is not as usable
as it should be. It should be refactored or rewritten from scratch.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The new SaharaClient CLI will be based on OpenStackClient, which brings the
command sets of different project APIs together in a single shell with a uniform
command structure.</p>
<p>OpenStackClient provides first-class support for several services. Other
services (including the Data Processing service) may provide an OpenStackClient
plugin.</p>
<p>Proposed Objects:</p>
<blockquote>
<div><ul class="simple">
<li><p>plugin</p></li>
<li><p>image</p></li>
<li><p>data source</p></li>
<li><p>job template</p></li>
<li><p>job</p></li>
<li><p>job binary</p></li>
<li><p>node group template</p></li>
<li><p>cluster template</p></li>
<li><p>cluster</p></li>
</ul>
</div></blockquote>
<p>Proposed Commands:</p>
<p>All commands will have a prefix <code class="docutils literal notranslate"><span class="pre">dataprocessing</span></code>.
Arguments in <code class="docutils literal notranslate"><span class="pre">[]</span></code> are optional, in <code class="docutils literal notranslate"><span class="pre"><></span></code> are positional.</p>
<p>For Plugins:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>plugin<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
plugin<span class="w"> </span>show<span class="w"> </span><plugin>
plugin<span class="w"> </span>configs<span class="w"> </span>get<span class="w"> </span><plugin><span class="w"> </span><version><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="c1"># default name of</span>
<span class="c1"># the file is <plugin></span>
</pre></div>
</div>
<p>The detailed description of a plugin contains too much information to display
on the screen, so it will be saved to a file instead. If the provided file
exists, its data will not be overwritten.</p>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">plugin</span> <span class="pre">list</span></code>: name, versions.
Columns for <code class="docutils literal notranslate"><span class="pre">plugin</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, versions, title, description.
Rows for <code class="docutils literal notranslate"><span class="pre">plugin</span> <span class="pre">show</span></code>: name, versions, title, description.</p>
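<p>For illustration, the column selection for plugin list with and without --long could look like this sketch (pure Python; the actual cliff-based command will differ):</p>

```python
# Sketch of how a list command could pick its columns and rows based
# on the --long flag; column names follow the spec, the rest is assumed.

def plugin_list_columns(long_listing=False):
    columns = ['name', 'versions']
    if long_listing:
        columns += ['title', 'description']
    return columns


def format_rows(plugins, long_listing=False):
    cols = plugin_list_columns(long_listing)
    return [tuple(p.get(c, '') for c in cols) for p in plugins]
```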
<p>For Images:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>image<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--tags<span class="w"> </span><tag<span class="o">(</span>s<span class="o">)</span>><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
image<span class="w"> </span>show<span class="w"> </span><image>
image<span class="w"> </span>register<span class="w"> </span><image><span class="w"> </span><username><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span>
image<span class="w"> </span>unregister<span class="w"> </span><image<span class="o">(</span>s<span class="o">)</span>>
image<span class="w"> </span>tags<span class="w"> </span><span class="nb">set</span><span class="w"> </span><image><span class="w"> </span><tag<span class="o">(</span>s<span class="o">)</span>>
image<span class="w"> </span>tags<span class="w"> </span>add<span class="w"> </span><image><span class="w"> </span><tag<span class="o">(</span>s<span class="o">)</span>>
image<span class="w"> </span>tags<span class="w"> </span>remove<span class="w"> </span><image><span class="w"> </span><tag<span class="o">(</span>s<span class="o">)</span>>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">tags</span> <span class="pre">set</span></code> will replace current image tags with provided ones.
<code class="docutils literal notranslate"><span class="pre">tags</span> <span class="pre">remove</span></code> will support passing <code class="docutils literal notranslate"><span class="pre">all</span></code> to <code class="docutils literal notranslate"><span class="pre">tags</span></code> argument to remove all
tags.</p>
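<p>The three tag subcommands have set-like semantics, sketched below (the helper name is illustrative, not the saharaclient API):</p>

```python
# set replaces the tags, add unions them, remove subtracts them;
# passing the literal "all" to remove clears every tag.

def apply_tag_command(current, command, tags):
    current = set(current)
    if command == 'set':
        return set(tags)
    if command == 'add':
        return current | set(tags)
    if command == 'remove':
        if list(tags) == ['all']:
            return set()
        return current - set(tags)
    raise ValueError("unknown tag command: %s" % command)
```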
<p>Columns for <code class="docutils literal notranslate"><span class="pre">image</span> <span class="pre">list</span></code>: name, id, username, tags.
Columns for <code class="docutils literal notranslate"><span class="pre">image</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, username, tags, status,
description.
Rows for <code class="docutils literal notranslate"><span class="pre">image</span> <span class="pre">show</span></code>: name, id, username, tags, status, description.</p>
<p>For Data Sources:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>data<span class="w"> </span><span class="nb">source</span><span class="w"> </span>create<span class="w"> </span><name><span class="w"> </span><type><span class="w"> </span><url><span class="w"> </span><span class="o">[</span>--password<span class="w"> </span><password><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--username<span class="w"> </span><user><span class="o">]</span><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span>
data<span class="w"> </span><span class="nb">source</span><span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--type<span class="w"> </span><type><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
data<span class="w"> </span><span class="nb">source</span><span class="w"> </span>show<span class="w"> </span><datasource>
data<span class="w"> </span><span class="nb">source</span><span class="w"> </span>delete<span class="w"> </span><datasource<span class="o">(</span>s<span class="o">)</span>>
data<span class="w"> </span><span class="nb">source</span><span class="w"> </span>update<span class="w"> </span><datasource><span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name><span class="o">]</span><span class="w"> </span><span class="o">[</span>--type<span class="w"> </span><type><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--url<span class="w"> </span><url><span class="o">]</span><span class="w"> </span><span class="o">[</span>--password<span class="w"> </span><password><span class="o">]</span><span class="w"> </span><span class="o">[</span>--username<span class="w"> </span><user><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span>
</pre></div>
</div>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">data</span> <span class="pre">source</span> <span class="pre">list</span></code>: name, id, type.
Columns for <code class="docutils literal notranslate"><span class="pre">data</span> <span class="pre">source</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, type, url, description.
Rows for <code class="docutils literal notranslate"><span class="pre">data</span> <span class="pre">source</span> <span class="pre">show</span></code>: name, id, type, url, description.</p>
<p>The new CLI behavior for creating Node Group Templates, Cluster Templates and
Clusters will be much the same as in Horizon, but it will additionally allow
creating them from JSON.
This approach does not allow marking some arguments as required for successful
creation, but that can be indicated in the help strings.</p>
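<p>Creation from JSON with CLI overrides could work along these lines (a sketch only; the function name and merge rule are assumptions):</p>

```python
import json

# Load the object definition from a JSON file and let any explicitly
# provided CLI arguments override the values from the file.

def load_template_args(json_path, cli_args):
    with open(json_path) as f:
        template = json.load(f)
    template.update({k: v for k, v in cli_args.items() if v is not None})
    return template
```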
<p>For Job Binaries
(Job Binaries and Job Binary Internals will be combined):</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>job<span class="w"> </span>binary<span class="w"> </span>create<span class="w"> </span><name><span class="w"> </span><span class="o">[</span>--data<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--url<span class="w"> </span><url><span class="o">]</span><span class="w"> </span><span class="o">[</span>--username<span class="w"> </span><username><span class="o">]</span><span class="w"> </span><span class="o">[</span>--password<span class="w"> </span><password><span class="o">]</span>
job<span class="w"> </span>binary<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name-regex><span class="o">]</span>
job<span class="w"> </span>binary<span class="w"> </span>show<span class="w"> </span><job-binary>
job<span class="w"> </span>binary<span class="w"> </span>update<span class="w"> </span><job-binary><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span><span class="w"> </span><span class="o">[</span>--url<span class="w"> </span><url><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--username<span class="w"> </span><username><span class="o">]</span><span class="w"> </span><span class="o">[</span>--password<span class="w"> </span><password><span class="o">]</span>
job<span class="w"> </span>binary<span class="w"> </span>delete<span class="w"> </span><job-binary<span class="o">(</span>ies<span class="o">)</span>>
job<span class="w"> </span>binary<span class="w"> </span>download<span class="w"> </span><job-binary><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><filepath><span class="o">]</span>
</pre></div>
</div>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">binary</span> <span class="pre">list</span></code>: name, id.
Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">binary</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, url, description.
Rows for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">binary</span> <span class="pre">show</span></code>: name, id, url, description.</p>
<p>For Node Group Templates:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>create<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name><span class="o">]</span><span class="w"> </span><span class="o">[</span>--plugin<span class="w"> </span><plugin><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--version<span class="w"> </span><version><span class="o">]</span><span class="w"> </span><span class="o">[</span>--flavor<span class="w"> </span><flavor><span class="o">]</span><span class="w"> </span><span class="o">[</span>--autoconfigs<span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--node-processes<span class="w"> </span><node-processes><span class="o">]</span><span class="w"> </span><span class="o">[</span>--floating-ip-pool<span class="w"> </span><pool><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--proxy-gateway<span class="o">]</span><span class="w"> </span><span class="o">[</span>--configs<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="c1"># and other arguments except of "image-id"</span>
node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--plugin<span class="w"> </span><plugin><span class="o">]</span><span class="w"> </span><span class="o">[</span>--version<span class="w"> </span><version><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name-regex><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>show<span class="w"> </span><node-group-template>
node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>configs<span class="w"> </span>get<span class="w"> </span><node-group-template><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="c1"># default name of the file is <node-group-template></span>
node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>update<span class="w"> </span><node-group-template><span class="w"> </span>...<span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="c1"># and other arguments the same as in create command</span>
node<span class="w"> </span>group<span class="w"> </span>template<span class="w"> </span>delete<span class="w"> </span><node-group-template<span class="o">(</span>s<span class="o">)</span>>
</pre></div>
</div>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">node</span> <span class="pre">group</span> <span class="pre">template</span> <span class="pre">list</span></code>: name, id, plugin, version.
Columns for <code class="docutils literal notranslate"><span class="pre">node</span> <span class="pre">group</span> <span class="pre">template</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, plugin, version,
node-processes, description.
Rows for <code class="docutils literal notranslate"><span class="pre">node</span> <span class="pre">group</span> <span class="pre">template</span> <span class="pre">show</span></code>: name, id, plugin, version,
node-processes, availability zone, flavor, is default, is proxy gateway,
security groups or auto security group, if node group template contains
volumes following rows will appear: volumes per node,
volumes local to instance, volumes mount prefix, volumes type,
volumes availability zone, volumes size, description.</p>
<p>For Cluster Templates:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>cluster<span class="w"> </span>template<span class="w"> </span>create<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name><span class="o">]</span><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--node-groups<span class="w"> </span><ng1:1,ng2:2><span class="o">]</span><span class="w"> </span><span class="o">[</span>--anti-affinity<span class="w"> </span><node-processes><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--autoconfigs<span class="o">]</span><span class="w"> </span><span class="o">[</span>--configs<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
cluster<span class="w"> </span>template<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--plugin<span class="w"> </span><plugin><span class="o">]</span><span class="w"> </span><span class="o">[</span>--version<span class="w"> </span><version><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name-regex><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
cluster<span class="w"> </span>template<span class="w"> </span>configs<span class="w"> </span>get<span class="w"> </span><cluster-template><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="c1"># default name of the file is <cluster-template></span>
cluster<span class="w"> </span>template<span class="w"> </span>show<span class="w"> </span><cluster-template>
cluster<span class="w"> </span>template<span class="w"> </span>update<span class="w"> </span><cluster-template><span class="w"> </span>...<span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="c1"># and other arguments the same as in create command</span>
cluster<span class="w"> </span>template<span class="w"> </span>delete<span class="w"> </span><cluster-template<span class="o">(</span>s<span class="o">)</span>>
</pre></div>
</div>
<p>Plugin and its version will be taken from the node group templates.</p>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">template</span> <span class="pre">list</span></code>: name, id, plugin, version.
Columns for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">template</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, plugin, version,
node groups (in format name:count), description.
Rows for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">template</span> <span class="pre">show</span></code>: name, id, plugin, version,
node groups, anti affinity, description.</p>
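<p>Deriving the plugin and version from the node group templates could be sketched as follows (it assumes all node groups must agree; names are illustrative):</p>

```python
# All node group templates in a cluster template must share one
# plugin/version pair; otherwise the cluster template is invalid.

def derive_plugin_version(node_group_templates):
    pairs = {(ng['plugin'], ng['version']) for ng in node_group_templates}
    if len(pairs) != 1:
        raise ValueError("Node group templates use different plugins/versions")
    return pairs.pop()
```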
<p>For Clusters:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>cluster<span class="w"> </span>create<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name><span class="o">]</span><span class="w"> </span><span class="o">[</span>--cluster-template<span class="w"> </span><cluster-template><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><description><span class="o">][</span>--user-keypair<span class="w"> </span><keypair><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--image<span class="w"> </span><image><span class="o">]</span><span class="w"> </span><span class="o">[</span>--management-network<span class="w"> </span><network><span class="o">]</span><span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--wait<span class="o">]</span>
cluster<span class="w"> </span>scale<span class="w"> </span><span class="o">[]</span><span class="w"> </span><span class="o">[</span>--wait<span class="o">]</span>
cluster<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--plugin<span class="w"> </span><plugin><span class="o">]</span><span class="w"> </span><span class="o">[</span>--version<span class="w"> </span><version><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name-regex><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
cluster<span class="w"> </span>show<span class="w"> </span><cluster>
cluster<span class="w"> </span>delete<span class="w"> </span><cluster<span class="o">(</span>s<span class="o">)</span>><span class="w"> </span><span class="o">[</span>--wait<span class="o">]</span>
</pre></div>
</div>
<p>If the <code class="docutils literal notranslate"><span class="pre">--wait</span></code> flag is set, the CLI will wait for the command to complete.
Plugin and its version will be taken from the cluster template.</p>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">list</span></code>: name, id, status.
Columns for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, url, description.
Rows for <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">show</span></code>: name, id, anti affinity, image id, plugin, version,
is transient, status, status_description, user keypair id, description.</p>
<p>For Job Templates (Jobs):</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>job<span class="w"> </span>template<span class="w"> </span>create<span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name><span class="o">]</span><span class="w"> </span><span class="o">[</span>--type<span class="w"> </span><type><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--main-binary<span class="o">(</span>ies<span class="o">)</span><span class="w"> </span><mains><span class="o">]</span><span class="w"> </span><span class="o">[</span>--libs<span class="w"> </span><libs><span class="o">]</span><span class="w"> </span><span class="o">[</span>--description<span class="w"> </span><descr><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--interface<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span>
job<span class="w"> </span>template<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--type<span class="w"> </span><type><span class="o">]</span><span class="w"> </span><span class="o">[</span>--name<span class="w"> </span><name-regex><span class="o">]</span><span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
job<span class="w"> </span>template<span class="w"> </span>show<span class="w"> </span><job-template>
job<span class="w"> </span>template<span class="w"> </span>delete<span class="w"> </span><job-template>
job<span class="w"> </span>template<span class="w"> </span>configs<span class="w"> </span>get<span class="w"> </span><type><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><file><span class="o">]</span><span class="w"> </span><span class="c1"># default file name <type></span>
job<span class="w"> </span>types<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--plugin<span class="w"> </span><plugin><span class="o">]</span><span class="w"> </span><span class="o">[</span>--version<span class="w"> </span><version><span class="o">]</span><span class="w"> </span><span class="o">[</span>--type<span class="w"> </span><type><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--hints<span class="o">]</span><span class="w"> </span><span class="o">[</span>--file<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="c1"># default file name depends on provided</span>
<span class="w"> </span><span class="c1"># args</span>
</pre></div>
</div>
<p>The outputs of <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">types</span> <span class="pre">list</span></code> and <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">template</span> <span class="pre">configs</span> <span class="pre">get</span></code> will be saved to a
file, just like <code class="docutils literal notranslate"><span class="pre">plugin</span> <span class="pre">configs</span> <span class="pre">get</span></code>.</p>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">template</span> <span class="pre">list</span></code>: name, id, type.
Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">template</span> <span class="pre">list</span> <span class="pre">--long</span></code>: name, id, type, libs(ids),
mains(ids), description.
Rows for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">template</span> <span class="pre">show</span></code>: name, id, type, libs(ids),
mains(ids), description.</p>
<p>For Jobs (Job Executions):</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/>job<span class="w"> </span>execute<span class="w"> </span><span class="o">[</span>--job-template<span class="w"> </span><job-template><span class="o">]</span><span class="w"> </span><span class="o">[</span>--cluster<span class="w"> </span><cluster><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--input<span class="w"> </span><data-source><span class="o">]</span><span class="w"> </span><span class="o">[</span>--output<span class="w"> </span><data-source><span class="o">]</span><span class="w"> </span><span class="o">[</span>--args<span class="w"> </span><arg<span class="o">(</span>s<span class="o">)</span>><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--params<span class="w"> </span><name1:value1,name2:value2><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--configs<span class="w"> </span><name1:value1,name2:value2><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--interface<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--json<span class="w"> </span><filepath><span class="o">]</span><span class="w"> </span><span class="o">[</span>--wait<span class="o">]</span>
job<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
job<span class="w"> </span>show<span class="w"> </span><job>
job<span class="w"> </span>delete<span class="w"> </span><job<span class="o">(</span>s<span class="o">)</span>><span class="w"> </span><span class="o">[</span>--wait<span class="o">]</span>
</pre></div>
</div>
<p>Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">list</span></code>: id, cluster id, job id, status.
Columns for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">list</span> <span class="pre">--long</span></code>: id, cluster id, job id, status, start time,
end time.
Rows for <code class="docutils literal notranslate"><span class="pre">job</span> <span class="pre">show</span></code>: id, cluster id, job id, status, start time,
end time, input id, output id.</p>
<p>If the <code class="docutils literal notranslate"><span class="pre">[--wait]</span></code> flag is set, the CLI will wait for the command to complete.</p>
<p>Besides this, OpenstackClient provides a number of arguments that
depend on the chosen command plugin.
For example, here is the help output for the <code class="docutils literal notranslate"><span class="pre">plugin</span> <span class="pre">list</span></code> command:</p>
<div class="highlight-shell notranslate"><div class="highlight"><pre><span/><span class="o">(</span>openstack<span class="o">)</span><span class="w"> </span><span class="nb">help</span><span class="w"> </span>dataprocessing<span class="w"> </span>plugin<span class="w"> </span>list
usage:<span class="w"> </span>dataprocessing<span class="w"> </span>plugin<span class="w"> </span>list<span class="w"> </span><span class="o">[</span>-h<span class="o">]</span><span class="w"> </span><span class="o">[</span>-f<span class="w"> </span><span class="o">{</span>csv,html,json,table,value,
<span class="w"> </span>yaml<span class="o">}]</span>
<span class="w"> </span><span class="o">[</span>-c<span class="w"> </span>COLUMN<span class="o">]</span><span class="w"> </span><span class="o">[</span>--max-width<span class="w"> </span><integer><span class="o">]</span>
<span class="w"> </span><span class="o">[</span>--quote<span class="w"> </span><span class="o">{</span>all,minimal,none,nonnumeric<span class="o">}]</span>
<span class="w"> </span><span class="o">[</span>--long<span class="o">]</span>
Lists<span class="w"> </span>plugins
optional<span class="w"> </span>arguments:
-h,<span class="w"> </span>--help<span class="w"> </span>show<span class="w"> </span>this<span class="w"> </span><span class="nb">help</span><span class="w"> </span>message<span class="w"> </span>and<span class="w"> </span><span class="nb">exit</span>
--long<span class="w"> </span>List<span class="w"> </span>additional<span class="w"> </span>fields<span class="w"> </span><span class="k">in</span><span class="w"> </span>output
output<span class="w"> </span>formatters:
output<span class="w"> </span>formatter<span class="w"> </span>options
<span class="w"> </span>-f<span class="w"> </span><span class="o">{</span>csv,html,json,table,value,yaml<span class="o">}</span>,<span class="w"> </span>--format<span class="w"> </span><span class="o">{</span>csv,html,json,table,value,
<span class="w"> </span>yaml<span class="o">}</span>
<span class="w"> </span>the<span class="w"> </span>output<span class="w"> </span>format,<span class="w"> </span>defaults<span class="w"> </span>to<span class="w"> </span>table
<span class="w"> </span>-c<span class="w"> </span>COLUMN,<span class="w"> </span>--column<span class="w"> </span>COLUMN
<span class="w"> </span>specify<span class="w"> </span>the<span class="w"> </span>column<span class="o">(</span>s<span class="o">)</span><span class="w"> </span>to<span class="w"> </span>include,<span class="w"> </span>can<span class="w"> </span>be<span class="w"> </span>repeated
table<span class="w"> </span>formatter:
<span class="w"> </span>--max-width<span class="w"> </span><integer>
<span class="w"> </span>Maximum<span class="w"> </span>display<span class="w"> </span>width,<span class="w"> </span><span class="m">0</span><span class="w"> </span>to<span class="w"> </span>disable
CSV<span class="w"> </span>Formatter:
<span class="w"> </span>--quote<span class="w"> </span><span class="o">{</span>all,minimal,none,nonnumeric<span class="o">}</span>
<span class="w"> </span>when<span class="w"> </span>to<span class="w"> </span>include<span class="w"> </span>quotes,<span class="w"> </span>defaults<span class="w"> </span>to<span class="w"> </span>nonnumeric
</pre></div>
</div>
<section id="alternatives">
<h3>Alternatives</h3>
<p>The current CLI code could be refactored instead.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Creating OpenstackClient plugin for SaharaClient</p></li>
<li><p>Implementation of commands for each object described in the “Proposed change”
section</p></li>
<li><p>Updating documentation with corresponding changes</p></li>
<li><p>Deprecation of the old CLI, with removal after a transition period</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Every command will be provided with unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation about new CLI usage will be written.</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="http://docs.openstack.org/developer/python-openstackclient/plugins.html">OpenstackClient documentation about using Plugins</a>
<a class="reference external" href="http://docs.openstack.org/developer/python-openstackclient/commands.html">OpenstackClient documentation about Objects and Actions naming</a></p>
</section>
Mon, 27 Jul 2015 00:00:00 Add support HDP 2.2 pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/hdp-22-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/hdp-22-support">https://blueprints.launchpad.net/sahara/+spec/hdp-22-support</a></p>
<p>This specification proposes to add a new HDP plugin based on Ambari
Blueprints [1] with the Ambari Management Console.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently we support the old HDP plugin, which provisions an old HDP
distribution. The old HDP plugin also appears to have gone unsupported by the
Hortonworks team for years [2]. Many customers want a new version of HDP, so a
new HDP plugin will be based on Ambari Blueprints. Ambari Blueprints are a
declarative definition of a cluster. With a Blueprint, you specify a Stack,
the Component layout, and the Configurations to materialize a Hadoop cluster
instance via REST API.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>New HDP plugin will support provisioning HDP stack via Ambari Blueprints.</p>
<p>Plugin will support key Sahara features:</p>
<ul class="simple">
<li><p>Cinder integration</p></li>
<li><p>Swift integration</p></li>
<li><p>EDP</p></li>
<li><p>Scaling</p></li>
<li><p>Event logs</p></li>
</ul>
<p>The new HDP plugin will support the following operating systems: Ubuntu 12.04
and CentOS 6. The new plugin will also support mirrors with HDP packages.</p>
<p>The new HDP plugin will support all services that Ambari supports. The new
plugin will also support HA for the NameNode and ResourceManager. When a
service’s process is selected, its client will be installed on all nodes.
For example, if Oozie is selected, the Oozie client will be installed on
all nodes.</p>
<p>The plugin will support the following services:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Service</p></th>
<th class="head"><p>Process</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>Ambari</p></td>
<td><p>Ambari</p></td>
</tr>
<tr class="row-odd"><td><p>Falcon</p></td>
<td><p>Falcon Server</p></td>
</tr>
<tr class="row-even"><td><p>Flume</p></td>
<td><p>Flume</p></td>
</tr>
<tr class="row-odd"><td rowspan="2"><p>HBase</p></td>
<td><p>HBase Master</p></td>
</tr>
<tr class="row-even"><td><p>HBase RegionServer</p></td>
</tr>
<tr class="row-odd"><td rowspan="4"><p>HDFS</p></td>
<td><p>NameNode</p></td>
</tr>
<tr class="row-even"><td><p>DataNode</p></td>
</tr>
<tr class="row-odd"><td><p>SecondaryNameNode</p></td>
</tr>
<tr class="row-even"><td><p>JournalNode</p></td>
</tr>
<tr class="row-odd"><td rowspan="2"><p>Hive</p></td>
<td><p>Hive Metastore</p></td>
</tr>
<tr class="row-even"><td><p>HiveServer</p></td>
</tr>
<tr class="row-odd"><td><p>Kafka</p></td>
<td><p>Kafka Broker</p></td>
</tr>
<tr class="row-even"><td><p>Knox</p></td>
<td><p>Knox Gateway</p></td>
</tr>
<tr class="row-odd"><td><p>Oozie</p></td>
<td><p>Oozie</p></td>
</tr>
<tr class="row-even"><td rowspan="2"><p>Ranger</p></td>
<td><p>Ranger Admin</p></td>
</tr>
<tr class="row-odd"><td><p>Ranger Usersync</p></td>
</tr>
<tr class="row-even"><td><p>Slider</p></td>
<td><p>Slider</p></td>
</tr>
<tr class="row-odd"><td><p>Spark</p></td>
<td><p>Spark History Server</p></td>
</tr>
<tr class="row-even"><td><p>Sqoop</p></td>
<td><p>Sqoop</p></td>
</tr>
<tr class="row-odd"><td rowspan="4"><p>Storm</p></td>
<td><p>DRPC Server</p></td>
</tr>
<tr class="row-even"><td><p>Nimbus</p></td>
</tr>
<tr class="row-odd"><td><p>Storm UI Server</p></td>
</tr>
<tr class="row-even"><td><p>Supervisor</p></td>
</tr>
<tr class="row-odd"><td rowspan="4"><p>YARN</p></td>
<td><p>YARN Timeline Server</p></td>
</tr>
<tr class="row-even"><td><p>MapReduce History Server</p></td>
</tr>
<tr class="row-odd"><td><p>NodeManager</p></td>
</tr>
<tr class="row-even"><td><p>ResourceManager</p></td>
</tr>
<tr class="row-odd"><td><p>ZooKeeper</p></td>
<td><p>ZooKeeper</p></td>
</tr>
</tbody>
</table>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Add support for HDP 2.2 to the old plugin; however, this is very difficult
to do without Ambari Blueprints.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Need to add elements for building images with pre-installed Ambari packages.
For installing the HDP stack, the plugin should use a mirror with HDP
packages. Elements for building a local HDP mirror should also be added.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>sreshetniak</p>
</dd>
<dt>Other contributors:</dt><dd><p>nkonovalov</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add base implementation of plugin [3] [4]</p></li>
<li><p>Add elements for building image with Ambari [5]</p></li>
<li><p>Add EDP support [6]</p></li>
<li><p>Add additional services support [7]</p></li>
<li><p>Add scaling support [8]</p></li>
<li><p>Add HA support [9]</p></li>
<li><p>Add elements for building HDP mirror [10]</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Add unit tests for plugin</p></li>
<li><p>Add scenario tests and job on sahara-ci</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>New plugin documentation should be added to Sahara docs.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1] <a class="reference external" href="https://cwiki.apache.org/confluence/display/AMBARI/Blueprints">https://cwiki.apache.org/confluence/display/AMBARI/Blueprints</a></p>
<p>[2] <a class="reference external" href="http://stackalytics.com/?module=sahara-group&release=all&company=hortonworks&metric=commits">http://stackalytics.com/?module=sahara-group&release=all&company=hortonworks&metric=commits</a></p>
<p>[3] <a class="reference external" href="https://review.openstack.org/#/c/184292/">https://review.openstack.org/#/c/184292/</a></p>
<p>[4] <a class="reference external" href="https://review.openstack.org/#/c/185100/">https://review.openstack.org/#/c/185100/</a></p>
<p>[5] <a class="reference external" href="https://review.openstack.org/#/c/181732/">https://review.openstack.org/#/c/181732/</a></p>
<p>[6] <a class="reference external" href="https://review.openstack.org/#/c/194580/">https://review.openstack.org/#/c/194580/</a></p>
<p>[7] <a class="reference external" href="https://review.openstack.org/#/c/195726/">https://review.openstack.org/#/c/195726/</a></p>
<p>[8] <a class="reference external" href="https://review.openstack.org/#/c/193081/">https://review.openstack.org/#/c/193081/</a></p>
<p>[9] <a class="reference external" href="https://review.openstack.org/#/c/197551/">https://review.openstack.org/#/c/197551/</a></p>
<p>[10] <a class="reference external" href="https://review.openstack.org/#/c/200570/">https://review.openstack.org/#/c/200570/</a></p>
</section>
Wed, 22 Jul 2015 00:00:00 Addition of Manila as a Binary Storehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/manila-as-binary-store.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/manila-as-binary-store">https://blueprints.launchpad.net/sahara/+spec/manila-as-binary-store</a></p>
<p>Network and distributed filesystems are a useful means of sharing files among
a distributed architecture. This core use case makes them an excellent
candidate for storage of job binaries.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>While internal database storage and Swift storage are sensible options for
binary storage, the addition of Manila integration allows it as a third option
for job binary storage and retrieval. This specification details how this
will be implemented.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The scheme <code class="docutils literal notranslate"><span class="pre">manila</span></code> will be added as an option in creation of job binaries.
URLs will take the form:</p>
<p><code class="docutils literal notranslate"><span class="pre">manila://{share_id}/absolute_path_to_file</span></code></p>
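<p>A URL of this form can be split into its share ID and file path with a small
parser; the helper below is an illustrative sketch, not sahara’s actual
implementation, and the share ID shown is a made-up example.</p>

```python
from urllib.parse import urlparse


def parse_manila_url(url):
    # Split manila://{share_id}/absolute_path_to_file into its parts.
    parsed = urlparse(url)
    if parsed.scheme != "manila":
        raise ValueError("not a manila URL: %r" % url)
    if not parsed.netloc or not parsed.path:
        raise ValueError("expected manila://{share_id}/absolute_path_to_file")
    return parsed.netloc, parsed.path
```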
<p>URLs stored as binary locations can be passed through the following
validations:</p>
<ol class="arabic simple">
<li><p>The manila share id exists.</p></li>
<li><p>The share is of a type Sahara recognizes as a valid binary store.</p></li>
</ol>
<p>Because manila shares are not mounted to the control plane and should not be,
we will not be able to assess the existence or nonexistence of files intended
to be job binaries.</p>
<p>We can, however, assess the share type through Manila. For the initial
reference implementation, only NFS shares will be permitted for this
use; other share types may be verified and added in later changes.</p>
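<p>The share-type check could look like the following sketch. The
<code class="docutils literal notranslate"><span class="pre">share_proto</span></code> field name follows the manila API; the helper itself is
hypothetical, not sahara’s actual validation code.</p>

```python
# Only NFS shares are accepted as binary stores initially; other
# protocols may be added to this tuple in later changes.
SUPPORTED_SHARE_PROTOCOLS = ("NFS",)


def check_binary_store_share(share):
    proto = share.get("share_proto", "").upper()
    if proto not in SUPPORTED_SHARE_PROTOCOLS:
        raise ValueError(
            "share protocol %r is not a supported job binary store" % proto)
    return share
```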
<p>For this binary type, Sahara will not retrieve binary files and copy them into
the relevant nodes; they are expected to be reachable through the nodes’ own
filesystems. Instead, Sahara will:</p>
<ol class="arabic simple">
<li><p>Ensure that the share is mounted to the appropriate cluster nodes (in the
Oozie case, the node group containing Oozie server; in the Spark case, the
node group containing Spark Master, etc.) If the share is not already
mounted, Sahara will mount the share to the appropriate node groups using
the mechanism described by blueprint mount-share-api (at the default path)
and update the cluster’s DB definition to note the filesystem mount.</p></li>
<li><p>Replace <code class="docutils literal notranslate"><span class="pre">manila://{share_id}</span></code> with the local filesystem mount point, and
use these local filesystem paths to build the workflow document or job
execution command, as indicated by engine. Job execution can then take
place normally.</p></li>
</ol>
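<p>Step 2 above amounts to a path rewrite once the mount points are known. A
minimal sketch, assuming mount points default to <code class="docutils literal notranslate"><span class="pre">/mnt/{share_id}</span></code> as in the
mount-share-api blueprint (names are illustrative):</p>

```python
def manila_url_to_local_path(url, mounted_shares):
    # mounted_shares maps share_id -> local mount point; unmounted shares
    # are assumed to land at the default /mnt/{share_id} path.
    prefix = "manila://"
    if not url.startswith(prefix):
        raise ValueError("not a manila URL: %r" % url)
    share_id, _, rel_path = url[len(prefix):].partition("/")
    mount_point = mounted_shares.get(share_id, "/mnt/%s" % share_id)
    return "%s/%s" % (mount_point.rstrip("/"), rel_path)
```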
<p>It is notable that this specification does not cover support of Manila
security providers. Such support can be added in future changes, and should
not affect this mechanism.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>While verification of paths on binary creation would be ideal, mounting tenant
filesystems (in the abstract, without a cluster necessarily available) is a
prohibitive security concern that outweighs this convenience feature (even
if we assume that networking is not an issue.)</p>
<p>We could also create a more general <code class="docutils literal notranslate"><span class="pre">file://</span></code> job binary scheme, either in
addition to <code class="docutils literal notranslate"><span class="pre">manila://</span></code> or as a replacement for it. However, this would not
particularly facilitate reuse among clusters (without a number of manual steps
on the user’s part) or allow auto-mounting when necessary.</p>
<p>We could also opt to simply raise an exception if the share has not already
been mounted to the cluster by the user. However, as the path to automatic
mounting is clear and will be reasonably simple once the mount-share-api
feature is complete, automatic mounting seems sensible for the initial
implementation.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Manila will be added as a visible job binary storage option; no other changes.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Secondary assignee/reviewer:</dt><dd><p>croberts</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Validation changes (URL scheme).</p></li>
<li><p>Integration with mounting feature.</p></li>
<li><p>Creation of “job binary retriever” strategy for Manila (which will mostly
no-op, given the strategy above).</p></li>
<li><p>Modification of workflow and execution command code to facilitate this flow.</p></li>
<li><p>Horizon changes (in separate spec).</p></li>
<li><p>Documentation.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit testing is assumed; beyond this, full integration testing will depend on
the feasibility of adding a manila endpoint to our CI environment. If this is
feasible, then our testing path becomes clear; if it is not, then gated
integration testing will not be possible.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature will require documentation in edp.rst.</p>
</section>
<section id="references">
<h2>References</h2>
<p>See <a class="reference external" href="https://wiki.openstack.org/wiki/Manila/API">https://wiki.openstack.org/wiki/Manila/API</a> if unfamiliar with manila
operations.</p>
</section>
Wed, 15 Jul 2015 00:00:00 upgrade oozie Web Service API version of sahara edp oozie enginehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/upgrade-oozie-engine-client.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/update-oozie-client-version">https://blueprints.launchpad.net/sahara/+spec/update-oozie-client-version</a></p>
<p>This spec proposes upgrading the Oozie Web Service API version used by the
sahara EDP Oozie engine.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the sahara Oozie server version is 4.1.0, but the Oozie engine
still uses the v1 Oozie Web Service API, which dates from Oozie 3.x. By
upgrading the Oozie Web Service API from v1 to v2, we can add more Oozie
features to sahara.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Change the sahara OozieClient job_url and jobs_url from /v1/job/%s and
/v1/jobs to /v2/job/%s and /v2/jobs.</p>
<p>The /v2/jobs endpoint behaves the same as /v1/jobs.</p>
<p>For /v2/job/, there is a difference in the JSON format of the job
information API, particularly for the map-reduce action; there are no
changes for other actions. In v1, externalId and consoleUrl point to the
spawned child job ID, and externalChildIDs is null for the map-reduce
action. In v2, externalId and consoleUrl point to the launcher job ID,
and externalChildIDs contains the spawned child job ID. This
externalChildIDs field can be used as the child job ID of a recurring
EDP job.</p>
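<p>The v1/v2 difference above can be sketched as a small helper that extracts
the spawned child job IDs from the job information document. This is an
illustrative sketch, not sahara’s actual code; the field names follow the
Oozie job information JSON described above.</p>

```python
def mapreduce_child_job_ids(job_info, api_version=2):
    """Return spawned child job IDs for a map-reduce action.

    In v1 the child job ID is in externalId; in v2 externalId points at
    the launcher job and the child IDs move to externalChildIDs
    (comma-separated). Sketch only; not sahara's implementation.
    """
    for action in job_info.get("actions", []):
        if action.get("type") != "map-reduce":
            continue
        if api_version >= 2:
            ids = action.get("externalChildIDs") or ""
            return [job_id for job_id in ids.split(",") if job_id]
        # v1: externalId holds the spawned child job ID directly.
        return [action["externalId"]]
    return []
```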
<p>Here are the new Oozie features that can be added to sahara:</p>
<ol class="arabic simple">
<li><p>PUT /oozie/v2/job/oozie-job-id?action=update, which lets us update a
job’s definition and properties.</p></li>
<li><p>GET /oozie/v2/job/oozie-job-id?show=errorlog, which lets us retrieve
the Oozie error log when a job fails, so we can show the user detailed
error information. Currently the sahara EDP engine tells the user nothing
when a job fails.</p></li>
</ol>
<p>With these endpoints, update_job() and show_error_log() can be added to the
Oozie client. Details about these two features will be drafted in another
spec.</p>
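<p>As a sketch of how the two proposed client calls could build their request
URLs from the v2 endpoints above: the helper name and parameters here are
assumptions for illustration, since the actual methods will be drafted in the
other spec.</p>

```python
from urllib.parse import urlencode


def oozie_job_url(base_url, job_id, **params):
    # Build a v2 job URL such as .../v2/job/<id>?action=update.
    url = "%s/v2/job/%s" % (base_url.rstrip("/"), job_id)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


# update_job() would PUT to this URL:
update_url = oozie_job_url("http://oozie:11000/oozie", "0000001-W",
                           action="update")
# show_error_log() would GET this URL:
errorlog_url = oozie_job_url("http://oozie:11000/oozie", "0000001-W",
                             show="errorlog")
```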
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>luhuichun(lu huichun)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>update oozie client in oozie engine</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added in the EDP engine, along with scenario
integration tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>Oozie Web Service API:
<a class="reference external" href="http://oozie.apache.org/docs/4.2.0/WebServicesAPI.html">http://oozie.apache.org/docs/4.2.0/WebServicesAPI.html</a></p>
</section>
Mon, 13 Jul 2015 00:00:00 API to Mount and Unmount Manila Shares to Sahara Clustershttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/mount-share-api.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/mount-share-api">https://blueprints.launchpad.net/sahara/+spec/mount-share-api</a></p>
<p>As OpenStack’s shared file provisioning service, manila offers great
integration potential with sahara, both for shared binary storage and as a
data source. While it seems unnecessary to wrap manila’s share provisioning
APIs in sahara, allowing users to easily mount shares to all nodes of a Sahara
cluster in a predictable way will be a critical convenience feature for this
integration.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Manually mounting shares to every node in a large cluster would be a tedious
and error-prone process. Auto-mounting shares that are requested for use in
either the data source or binary storage case might be feasible for some use
cases. However, outside of our (optional) EDP interface this functionality
would never be usable. As such, it is best to provide the user an API for
mounting of shares onto Sahara clusters.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This change proposes to expand the node group template, node group, cluster
template, and cluster API resources and database objects to contain a “shares”
field. As per other node group fields, the cluster template and cluster APIs
will allow overwrite of this field (which is particularly critical for
composition, given that manila shares represent concrete data rather than
abstract resource pools.) At the resource level (for both resource types),
this field will be defined by the following jsonschema:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="s2">"shares"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"array"</span><span class="p">,</span>
<span class="s2">"items"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"object"</span><span class="p">,</span>
<span class="s2">"properties"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"format"</span><span class="p">:</span> <span class="s2">"uuid"</span>
<span class="p">},</span>
<span class="s2">"path"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"string"</span><span class="p">,</span> <span class="s2">"null"</span><span class="p">]</span>
<span class="p">},</span>
<span class="s2">"access_level"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"string"</span><span class="p">,</span> <span class="s2">"null"</span><span class="p">],</span>
<span class="s2">"enum"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"rw"</span><span class="p">,</span> <span class="s2">"ro"</span><span class="p">],</span>
<span class="s2">"default"</span><span class="p">:</span> <span class="s2">"rw"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s2">"additionalProperties"</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s2">"required"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"id"</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>“id” above refers to the UUID of the manila share to be mounted. It is
required.</p>
<p>“path” refers to the local path on each cluster node on which to mount this
share, which should be universal across all nodes of the cluster for
simplicity. It will default to <code class="docutils literal notranslate"><span class="pre">/mnt/{share_id}</span></code>.</p>
<p>“access_level” governs the permissions set in manila for the cluster IPs.
This defaults to ‘rw’.</p>
<p>Because no part of this field requires indexing, it is proposed that the
above structure be directly serialized to the database as a TEXT field in
JSON format.</p>
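<p>As a minimal sketch of the defaulting and serialization described above (the helper names are illustrative, not taken from the sahara codebase):</p>

```python
import json

def apply_share_defaults(share):
    # Fill in the spec'd defaults: path -> /mnt/{share_id}, access_level -> "rw".
    filled = dict(share)
    if not filled.get("path"):
        filled["path"] = "/mnt/%s" % filled["id"]
    if not filled.get("access_level"):
        filled["access_level"] = "rw"
    return filled

def shares_to_db(shares):
    # Serialize the 'shares' list to JSON for storage in a TEXT column.
    return json.dumps([apply_share_defaults(s) for s in shares])

def shares_from_db(text):
    # Deserialize the TEXT column back into a list of share dicts.
    return json.loads(text) if text else []
```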
<p>Any share specified at the node group level will be mounted to all instances
of that node group. Any share specified at the cluster level will be mounted
to all nodes of that cluster. At cluster creation, in the case that a specific
share id is specified at both the node group and cluster level, the cluster’s
share configuration (path and access level) will entirely replace the node
group’s configuration. Any merge of share configurations is seen as needlessly
complex and error-prone, and it is our longstanding pattern that cluster-level
configurations trump node group-level configurations.</p>
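<p>The wholesale-replacement rule above can be sketched as follows (the function name is hypothetical):</p>

```python
def effective_shares(node_group_shares, cluster_shares):
    # Cluster-level config for a given share id entirely replaces the node
    # group's config for that id; no per-field merging is attempted.
    by_id = {s["id"]: s for s in node_group_shares}
    for share in cluster_shares:
        by_id[share["id"]] = share
    return list(by_id.values())
```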
<p>Error cases in this API include:</p>
<ol class="arabic simple">
<li><p>The provided id is not a valid manila share id, as assessed via
manilaclient with the user’s credentials.</p></li>
<li><p>The provided path is not a valid, absolute Linux path.</p></li>
<li><p>Path is not unique (within the set of shares specified for any one node
or the set of shares specified for any cluster.)</p></li>
<li><p>The provided id maps to a manila share type which sahara is not currently
equipped to mount.</p></li>
<li><p>No manila service endpoint exists within the user’s service catalog.</p></li>
</ol>
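<p>Error cases 2 and 3 can be sketched with a small validation helper (names are illustrative):</p>

```python
import posixpath

def validate_share_paths(shares):
    # Error cases 2 and 3: each mount path must be a valid absolute Linux
    # path, and paths must be unique within the share set.
    seen = set()
    for share in shares:
        path = share.get("path") or "/mnt/%s" % share["id"]
        if not path.startswith("/") or path != posixpath.normpath(path):
            raise ValueError("not a valid absolute path: %r" % path)
        if path in seen:
            raise ValueError("duplicate mount path: %r" % path)
        seen.add(path)
```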
<p>On cluster creation (or update, if update becomes an available endpoint,)
just after the cluster becomes available and before delegating to the plugin
(such that any shares intended for HDFS integration will be in place for
the plugin configuration to act upon,) Sahara will execute a share mounting
step. For each share, Sahara will take the following steps:</p>
<ol class="arabic simple">
<li><p>Query manila for share information, including share type and defaults.</p></li>
<li><p>Query the cluster object to find internal ip addresses for all cluster
nodes of any node group for which the share should be mounted.</p></li>
<li><p>For each such node, call to manila to allow access for each ip according
to the permissions set on the share.</p></li>
<li><p>Make a remote call to each qualifying node and mount the share via its
mount address as returned from manila.</p></li>
</ol>
<p>Steps 1-3 above will be handled via common code in an abstract ShareHandler
class. The last step will be delegated to a concrete instance of this class,
based on share type as reported by manila, which will execute appropriate
command-line operations over a remote socket to mount the share.</p>
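<p>A rough sketch of this abstraction, assuming an NFS mounter that shells out to the standard mount command (the exact command line and method names are illustrative):</p>

```python
import abc

class ShareHandler(abc.ABC):
    # Steps 1-3 (manila queries, access grants) would live in common code
    # here; subclasses supply the share-type-specific mount command.

    @abc.abstractmethod
    def mount_command(self, export_location, local_path, access_level):
        """Return the shell command to run on each qualifying node."""

class NFSShareHandler(ShareHandler):
    def mount_command(self, export_location, local_path, access_level):
        opts = "ro" if access_level == "ro" else "rw"
        return ("mkdir -p %(path)s && mount -o %(opts)s -t nfs "
                "%(export)s %(path)s"
                % {"path": local_path, "opts": opts,
                   "export": export_location})
```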
<p>The reference and test implementation for the first revision of this feature
will only provide an NFS mounter. An HDFS mounter is the next logical step,
but this feature set is already being worked on in parallel to this change and
falls outside of the scope of this specification.</p>
<p>Unmounting is a natural extension of this class, but is not covered in this
specification.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>A more seamless approach to manila share storage and data sourcing could be
attempted, in which no API is exposed to the user, and shares are automatically
mounted and unmounted when resources on the share in question are needed (as
referenced in a data source URL or binary storage path). However, giving the
user the ability to mount and unmount shares at will may allow use cases which
we do not anticipate, and particularly in the context of usage of a sahara-
provisioned cluster without use of the EDP API, the new API is critical.</p>
<p>It would also be possible to attempt to wrap manila share creation (or even
share network creation or network router configuration) in sahara. It seems
reasonable, however, to assert that this would be an overstep of our charter,
and that asking users to create shares directly through manila will allow them
much fuller and up-to-date access to manila’s feature set.</p>
<p>On the sahara implementation side, it would be possible to create a new
‘share’ resource and table, for ease of update and compositional modelling.
However, shares will likely never be a top-level noun in sahara; it seems that
a field is a better fit for the degree of share management we intend to
undertake than an entire resource.</p>
<p>It should be noted that this specification does not attempt to deal with the
question of filesystem driver installation across n distributions of Linux and
m filesystem types; such an effort is better suited to many specifications and
change sets than one. For the first stage of this effort, NFS will be used as
the test reference filesystem type.</p>
<p>Note that both binary storage and data source integration are intentionally
not handled here. A binary storage specification will build on this spec, but
this spec is being posted independently such that the engineers working on data
source integration can propose revisions to only the changes relevant to their
needs.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>A new ‘shares’ TEXT field will be added to both node groups and node group
templates.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>A new ‘shares’ field will be added to the resource for both node groups and
node group templates. This field will only allow create functionality in the
initial change, as cluster update is currently a sticking point in our API.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Python-saharaclient will need to be made aware of the new shares field on all
supported resources.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None for the initial change; addition of specialized fs drivers in the
future may require image changes.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The share mounting feature in Horizon will likely require a separate tab on
all affected resources, and is left for a separate spec.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Secondary assignee/reviewer:</dt><dd><p>croberts</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>API Resource modification and call validation.</p></li>
<li><p>DB model modification and testing.</p></li>
<li><p>Manila client integration with Sahara.</p></li>
<li><p>Logical glue code on cluster provisioning.</p></li>
<li><p>ShareMounter abstraction and NFS impl.</p></li>
<li><p>Unit testing.</p></li>
<li><p>Integration testing as feasible (will require manila in CI env for full CI.)</p></li>
<li><p>Update of API WADL site.</p></li>
<li><p>Horizon changes (in separate spec).</p></li>
<li><p>Documentation.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This feature introduces a new dependency on python-manilaclient.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit testing is assumed; beyond this, full integration testing will depend on
the feasibility of adding a manila endpoint to our CI environment. If this is
feasible, then our testing path becomes clear; if it is not, then gated
integration testing will not be possible.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This feature will require documentation in features.rst, and will drive changes
to the api documentation.</p>
</section>
<section id="references">
<h2>References</h2>
<p>See <a class="reference external" href="https://wiki.openstack.org/wiki/Manila/API">https://wiki.openstack.org/wiki/Manila/API</a> if unfamiliar with manila
operations.</p>
</section>
Fri, 10 Jul 2015 00:00:00 Use trusts for cluster creation and scalinghttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/cluster-creation-with-trust.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cluster-creation-with-trust">https://blueprints.launchpad.net/sahara/+spec/cluster-creation-with-trust</a></p>
<p>Creation of a cluster can be a fairly long operation. Currently we use
sessions that usually expire in less than one hour, so it is currently
impossible to create a cluster that takes more than one hour to spawn.</p>
<p>Sahara could get trust from user and use it whenever it is needed for cluster
creation or scaling.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara communicates with OpenStack services using a session provided by the
user. Sessions are created by keystone with a comparatively short lifetime.
Creation of a large cluster can take longer than the session lifetime, in which
case Sahara will not be able to operate on the cluster at the end of the
process.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Sahara could perform all cluster operations using a trust obtained from the
user. Currently, trusts are used only for termination of transient clusters;
the same logic could be used for all operations during cluster creation and
scaling.</p>
<p>Since we still support keystone v2, the option should be configurable. I
suggest making it enabled by default.</p>
<p>The proposed workflow:</p>
<ol class="arabic simple">
<li><p>User requests cluster creation or scaling.</p></li>
<li><p>Sahara creates trust to be used for OpenStack operation. This trust is
stored in the DB in the cluster’s trust_id field.</p></li>
<li><p>Sahara finishes cluster provisioning or the periodic cluster cleanup task
recognizes that cluster activation has timed out and uses the trust to
delete the cluster.</p></li>
<li><p>Sahara deletes the trust.</p></li>
</ol>
<p>For safety reasons, created trusts should be time-limited, but their
lifetime should be sufficient for cluster creation. A config file parameter
with a one-day default should work well.</p>
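<p>The lifetime bound can be sketched as follows; the option name is hypothetical, and the resulting timestamp would be passed as the trust's expiry when it is created via keystone:</p>

```python
import datetime

TRUST_LIFETIME_HOURS = 24  # hypothetical config option; one-day default

def trust_expiry(now=None):
    # Compute the expiry timestamp used when creating the trust,
    # bounding its lifetime for safety.
    now = now or datetime.datetime.utcnow()
    return now + datetime.timedelta(hours=TRUST_LIFETIME_HOURS)
```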
<section id="alternatives">
<h3>Alternatives</h3>
<p>The trust id could be stored in memory rather than in the database. However,
this implementation would not allow the periodic cluster cleanup task (which
is not run in the context of the tenant cluster provisioning request) to
successfully delete stale clusters.</p>
<p>It is notable that storing the trust_id in the database will also be of use
to us in improving the HA capabilities of cluster creation if we move to a
multi-stage, DAG-based provisioning flow.</p>
<p>While potential concerns about storing trust ids in the DB exist, these ids
require a valid auth token for either the admin or tenant user to utilize,
adding some degree of security in depth in case of a control plane database
breach. This mechanism may be further secured by storing all trust ids (for
both transient and long-running clusters) via a secret storage module in the
future. This change, however, falls outside the scope of this specification.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add option to Sahara config</p></li>
<li><p>Implement new authentication strategy</p></li>
<li><p>Document new behavior</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually. CI will cover the feature.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Because trusts are now being used to clean up clusters, we will need to
document that the periodic cluster cleanup task should be run on a schedule
that fits within the expiration period of a trust.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Thu, 09 Jul 2015 00:00:00 Support NTP service for cluster instanceshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/support-ntp.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/support-ntp">https://blueprints.launchpad.net/sahara/+spec/support-ntp</a></p>
<p>Sahara can support NTP (Network Time Protocol) for cluster instances.
NTP is intended to synchronize time on all cluster instances. This can
be useful for several services on Cloudera and HDP clusters (see [1] for
reference).</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara should support configuring NTP on cluster instances, since it is
required by HDP and Cloudera clusters.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>First, it’s proposed to preinstall the ntp daemon on all images via
<code class="docutils literal notranslate"><span class="pre">sahara-image-elements</span></code>. Clean images will not have ntp
configured at all, and installing ntp on instances of long-living clusters
can be prevented as well.</p>
<p>As a second step, we should add a new common string config option to
sahara in the <code class="docutils literal notranslate"><span class="pre">general</span></code> section of <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">configs</span></code> that will allow
users to specify their own NTP server for the current cluster. This config
option can be supported in all plugins and will allow installing NTP on all
cluster instances with the specified NTP server. Optionally, we can allow
disabling NTP, at least for the fake plugin. So, we will have the following
plugin options:</p>
<ol class="arabic simple">
<li><p><code class="docutils literal notranslate"><span class="pre">NTP_ENABLED</span></code>: default value is True, this option is required to allow
disabling ntp on cluster instances</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">NTP_URL</span></code>: default value is an empty string. If the user input for
this option is empty, the default ntp server from
<code class="docutils literal notranslate"><span class="pre">sahara.conf</span></code> will be used; otherwise, the user input will be used.</p></li>
</ol>
<p>As the third step, we should provide a new config option in <code class="docutils literal notranslate"><span class="pre">sahara.conf</span></code>
that specifies the default NTP server for the current sahara installation.
This can be useful because the default NTP server can differ between
regions. It would also allow using an NTP server installed specifically
for the current lab and sahara installation.</p>
<p>The second step requires common options for all plugins to avoid
code duplication. All options can be added to
<code class="docutils literal notranslate"><span class="pre">plugins/provisioning.py</span></code>.</p>
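<p>The fallback behavior of the two proposed options can be sketched as follows (the default value and helper name are illustrative):</p>

```python
DEFAULT_NTP_URL = "pool.ntp.org"  # stand-in for the sahara.conf default

def resolve_ntp(cluster_configs):
    # Read the two proposed options from the 'general' section of the
    # cluster configs, falling back to the sahara.conf default server.
    general = cluster_configs.get("general", {})
    if not general.get("NTP_ENABLED", True):
        return None  # NTP explicitly disabled for this cluster
    return general.get("NTP_URL") or DEFAULT_NTP_URL
```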
<section id="alternatives">
<h3>Alternatives</h3>
<p>We could store <code class="docutils literal notranslate"><span class="pre">ntp_url</span></code> and <code class="docutils literal notranslate"><span class="pre">ntp_enabled</span></code> as cluster columns, but that
looks like a long road to being merged: sahara-side code -> python-saharaclient ->
(a much longer story) horizon support.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>We don’t need extra migrations; we will store information about the NTP
server in the <code class="docutils literal notranslate"><span class="pre">general</span></code> section of <code class="docutils literal notranslate"><span class="pre">cluster</span> <span class="pre">configs</span></code>.</p>
<p>Storing the NTP server in a separate database column is not really useful,
since we can store this information in cluster configs.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users will be able to specify their own NTP server for the current cluster
and current sahara installation.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>It’s required to preinstall NTP on all images.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Since Sahara already has the ability to expose all general config options
during cluster-template creation, we don’t need extra modifications on the
Horizon side.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
<dt>Other contributors:</dt><dd><p>sreshetniak</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Proposed change will contain following steps:</p>
<ol class="arabic simple">
<li><p>Install NTP on images in <code class="docutils literal notranslate"><span class="pre">sahara-image-elements</span></code>.</p></li>
<li><p>Add ability to install NTP on cluster instances and add required
config options.</p></li>
<li><p>Add documentation for feature.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The feature will be covered with integration tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The feature and all config options used for NTP configuration need to be
documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1] <a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/install_cdh_enable_ntp.html">http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/install_cdh_enable_ntp.html</a></p>
</section>
Thu, 09 Jul 2015 00:00:00 Use Templates for Scenario Tests Configurationhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/scenario-test-config-template.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/scenario-test-config-template">https://blueprints.launchpad.net/sahara/+spec/scenario-test-config-template</a></p>
<p>Users that want to run the scenario tests available in the Sahara repository
need to modify the provided YAML files, which are really template files even
if not marked as such. The usability of this process could be improved by
extending the test runner to read the templates and perform the substitution
of the required variables instead.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The scenario tests provided in the Sahara repository are template files
even if they are simply marked as YAML files; this is not clear from the
README.rst file.
Users that want to run them need to manually replace the required variables
(the variables are needed because they depend on the environment: host ip,
credentials, network type, flavors used, etc).
This step needs to be done both by developers and testers, and by wrapper
scripts used to run the script on a CI. The repository of the main Sahara CI,
sahara-ci-config, contains code which replaces the variables:</p>
<p><a class="reference external" href="https://github.com/stackforge/sahara-ci-config/blob/master/slave-scripts/functions-common.sh#L148">https://github.com/stackforge/sahara-ci-config/blob/master/slave-scripts/functions-common.sh#L148</a></p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The current template files (mostly under etc/sahara-ci right now) need to be
properly identified as templates. The chosen format is Mako, because it is
already a dependency for the scenario test runner (runner.py).
The files will be marked using a special suffix (file.yaml.mako)
and the variables used will be converted to Mako format.
The usage of templating would be limited to variable replacement, which means
no logic in the templates.</p>
<p>The test runner will continue to handle normal YAML files as usual, in
addition to template files.</p>
<p>runner.py will also take a simple INI-style file with the values for
the variables used in the template. It will be used by the runner
to generate the real YAML files used for the input (in addition to
the normal YAML files, if specified).</p>
<p>A missing value for some variable (a key not available in the INI file)
will raise an exception and terminate the runner.
This differs from the current behavior of the sahara-ci-config code, where a
missing value just prints a warning; but given that a missing value would
likely lead to a failure, enforcing early termination limits resource
consumption.</p>
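<p>The strict handling of missing variables can be sketched with stdlib stand-ins (the real runner would use Mako with equivalently strict behavior; the names here are illustrative):</p>

```python
import configparser
import string

def render(template, ini_text):
    # Substitute template variables from an INI file. string.Template's
    # substitute() raises KeyError on a missing key, mirroring the
    # proposed early termination of the runner.
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep variable names case-sensitive
    cfg.read_string(ini_text)
    return string.Template(template).substitute(dict(cfg["DEFAULT"]))
```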
<p>The current sahara-ci-config code allows specifying more details for the
replacement variables, such as specific end keys at which to stop the match
for a certain key, but these are likely not needed with a proper choice of
names for the variables.</p>
<p>Finally, sahara/tests/scenario/README.rst should be changed to document the
currently used variables and to provide instructions on how to supply the
key/value configuration file.
The code in sahara-ci-config should be changed as well to create such a
configuration file and to use the new names of the template files for the
respective tests.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>CI systems which run tox -e scenario should be updated to use the new
filenames.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers/QE running tests need to use the new template names.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ltoscano</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The change will be implemented as follows:</p>
<ol class="arabic simple">
<li><p>allow runner.py to use mako template files as input;</p></li>
<li><p>copy the existing files to the new name and use non-ambiguous variable
names for templates;</p></li>
<li><p>change sahara-ci scripts to set the new variables and use the
renamed template files;</p></li>
<li><p>remove the old yaml files;</p></li>
<li><p>(optional) clean up unneeded code (insert_scenario_value, etc.)</p></li>
</ol>
<p>Repositories affected:</p>
<ul class="simple">
<li><p>sahara: 1, 2, 4</p></li>
<li><p>sahara-ci-config: 3, 5</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Successful CI run will ensure that the new code did not regress the existing
scenarios.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>sahara/tests/scenario/README.rst needs to be updated.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 03 Jul 2015 00:00:00 Updating authentication to use keystone sessionshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/keystone-sessions.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/keystone-sessions">https://blueprints.launchpad.net/sahara/+spec/keystone-sessions</a></p>
<p>Sahara currently uses per-access authentication when creating OpenStack
clients. This style of authentication requires keystone connections on every
client creation. The keystone project has created a mechanism to streamline
and improve this process in the form of Session objects. These objects
encapsulate mechanisms for updating authentication tokens, caching of
connections, and a single point for security improvements. Sahara should
migrate its OpenStack client objects to use session objects for all clients.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>For all OpenStack client instances, sahara uses authentication on a
per-client creation basis. For each client object that is requested, a set of
credentials are acquired from the context, or the configuration file in the
case of admin accounts, which are used to initialize the client object.
During this initialization a request is made to the Identity service to
determine the user’s privileges with respect to the new client.</p>
<p>Sahara must be aware of any changes to the authentication methods for each
client as well as any potential security vulnerabilities resulting from the
usage of those methods.</p>
<p>This method of authentication does not allow sahara to share any information
between clients, aside from the raw credentials. In turn, this introduces
brittleness to the sahara/client interface as each authentication
relationship must be maintained separately or, worse yet, with a partial
shared model.</p>
<p>Having separate interfaces for each client also makes applying security
updates more difficult, as each client instance must be visited, researched,
and ultimately fixed according to the specific details for that client.</p>
<p>Although this methodology has served sahara well thus far, the keystone
project has introduced new layers of abstraction to aid in sharing common
authentication between clients. This shared methodology, the keystoneclient
Session object, provides a unified point of authentication for all clients.
It serves as a single point to contain security updates, on-demand
authentication token updating, common authentication methods, and
standardized service discovery.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Sahara should standardize its client authentication code by utilizing
keystoneclient Session objects. This change will entail creating a new
module, modifying the OpenStack client utility functions, and adding
an authentication plugin object to the context.</p>
<p>A new module, <code class="docutils literal notranslate"><span class="pre">sahara.service.sessions</span></code>, will be created to contain utility
functions and classes to aid in the creation and storage of session objects.
This module will also contain a global singleton for the sessions cache.</p>
<p><code class="docutils literal notranslate"><span class="pre">sahara.service.sessions</span></code> will provide a class named <code class="docutils literal notranslate"><span class="pre">SessionCache</span></code> as
well as a function to gain the global singleton instance of that class. The
<code class="docutils literal notranslate"><span class="pre">SessionCache</span></code> will contain cached session objects that can be reused in
the creation of individual OpenStack clients. It will also contain functions
for generating the session objects required by specific clients. Some clients
may require unique versions to be cached, for example if a client requires a
specific certificate file then it may have a unique session. For all other
clients that do not require a unique session, a common session will be used.</p>
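<p>A rough sketch of the proposed <code class="docutils literal notranslate"><span class="pre">SessionCache</span></code> follows; the injected factory stands in for the real keystoneclient session constructor, and the accessor name is illustrative:</p>

```python
class SessionCache(object):
    # Cache of session objects keyed by client kind. Clients needing
    # unique settings (e.g. a specific certificate file) get their own
    # entry; all others share the "common" session.

    def __init__(self, session_factory):
        self._factory = session_factory
        self._sessions = {}

    def get_session(self, kind="common", **kwargs):
        if kind not in self._sessions:
            self._sessions[kind] = self._factory(**kwargs)
        return self._sessions[kind]

_CACHE = None

def sessions(factory=dict):
    # Global singleton accessor, as proposed for sahara.service.sessions.
    global _CACHE
    if _CACHE is None:
        _CACHE = SessionCache(factory)
    return _CACHE
```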
<p>Authentication for session objects will be provided by one of a few methods
depending on the type of session needed. For user based sessions,
authentication will be obtained from the keystonemiddleware authentication
plugin that is generated with each request. For admin based sessions, the
credentials found in the sahara configuration file will be used to generate
the authentication plugin. Trust based authentication will be handled by
generating an authentication plugin based on the information available in
each case, either a token for a user or a password for an admin or proxy
user.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">sahara.context.Context</span></code> object will be changed to incorporate an
authentication plugin object. When created through REST calls the
authentication plugin will be obtained from the keystonemiddleware. When
copying a context, the authentication plugin will be copied as well. For
other cases the authentication plugin may be set programmatically, for
example if an admin authentication plugin is required it can be generated
from values in the configuration file, or if a trust based authentication
is required it can be generated.</p>
<p>The individual OpenStack client utility modules will be changed to use
session and authentication plugin objects for their creation. The
sessions will be obtained from the global singleton and the authentication
plugin objects can be obtained from the context, or created in circumstances
that require more specific authentication, for example when using the
admin user or trust based authentication.</p>
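<p>A client helper following this pattern might look roughly like the sketch below; the helper name and the fake client class are illustrative, not sahara's real <code class="docutils literal notranslate"><span class="pre">sahara.utils.openstack</span></code> code.</p>

```python
def make_client(client_class, session, auth_plugin):
    """Build an OpenStack client from a shared session plus the
    authentication plugin appropriate to the current call (user
    context, admin credentials, or a trust)."""
    return client_class(session=session, auth=auth_plugin)


class FakeNovaClient(object):
    # Stand-in for a real client that accepts session= and auth=
    # keyword arguments, as session-enabled clients do.
    def __init__(self, session=None, auth=None):
        self.session = session
        self.auth = auth
```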
<p>The clients for heat and swift do not yet enable session based
authentication. These clients should be monitored for addition of this
feature and migrated when available.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>An alternative to this approach would be to create our own methodology for
storing common authentication credentials, but this would be an exercise in
futility as we would merely be replicating the work of keystoneclient.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Michael McCune (elmiko)</p>
</dd>
<dt>Other contributors:</dt><dd><p>Andrey Pavlov (apavlov)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>create sahara.service.sessions</p></li>
<li><p>modify context to accept authentication plugin</p></li>
<li><p>modify sahara.utils.openstack clients to use sessions</p>
<ul>
<li><p>cinder</p></li>
<li><p>keystone</p></li>
<li><p>neutron</p></li>
<li><p>nova</p></li>
</ul>
</li>
<li><p>modify admin authentications to use plugin objects</p></li>
<li><p>modify trust authentications to use plugin objects</p></li>
<li><p>create tests for session cache</p></li>
<li><p>create developer documentation for client usage</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The tests created for this feature will be unit based, to exercise the code
paths and logic points. Functional testing should not be necessary as these
authentication methods will be exercised in the course of the standard
functional testing.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This change will only create documentation within the sahara project.
Currently there exists no documentation about client usage within the sahara
codebase. This change will add a small section describing how to instantiate
clients using the <code class="docutils literal notranslate"><span class="pre">sahara.utils.openstack</span></code> package, with a note about
common session authentication.</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="http://docs.openstack.org/developer/python-keystoneclient/using-sessions.html">Keystoneclient documentation about using Sessions</a></p>
<p><a class="reference external" href="http://www.jamielennox.net/blog/2014/09/15/how-to-use-keystoneclient-sessions/">How to Use Keystoneclient Sessions (article by Jamie Lennox)</a></p>
</section>
Wed, 01 Jul 2015 00:00:00 Running Spark Jobs on Cloudera Clusters 5.3.0https://specs.openstack.org/openstack/sahara-specs/specs/liberty/spark-jobs-for-cdh-5-3-0.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0">https://blueprints.launchpad.net/sahara/+spec/spark-jobs-for-cdh-5-3-0</a></p>
<p>This specification proposes to add the ability to run Spark jobs on clusters
with CDH (Cloudera Distribution Including Apache Hadoop).</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara is able to provision CDH clusters with Spark services running. However,
there was previously no way to run Spark jobs on clusters of this type.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The work involves adding a class for running Spark jobs via the Cloudera
plugin. The existing Spark engine was modified so that Spark jobs can be run
with both the Spark and Cloudera plugins.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Required processes:</p>
<ul class="simple">
<li><p>Master: SPARK_YARN_HISTORY_SERVER</p></li>
<li><p>Workers: YARN_NODEMANAGER</p></li>
</ul>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Alexander Aleksiyants</p>
</dd>
<dt>Other contributors:</dt><dd><p>Oleg Borisenko</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p><a class="reference external" href="https://review.openstack.org/#/c/190128/">https://review.openstack.org/#/c/190128/</a></p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit tests to cover CDH engine for working with Spark jobs.</p></li>
<li><p>Unit tests verifying that the EDP Spark engine is now used by both the Spark engine and the CDH EDP engine.</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Fri, 19 Jun 2015 00:00:00 Drop Hadoop v1 support in provisioning pluginshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/drop-hadoop-1-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/drop-hadoop-1">https://blueprints.launchpad.net/sahara/+spec/drop-hadoop-1</a></p>
<p>This specification proposes to drop support for old Hadoop versions.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Support for Hadoop 1 in the provisioning plugins has become an unused feature.
Development of Hadoop v1 is essentially frozen and Hadoop vendors are also
dropping support for it.</p>
<p>As the split of sahara-plugins from the main sahara repository is going to
happen, it will be much easier to move fewer plugins to the new repository.
The number of unit and scenario tests will also be reduced.</p>
<p>The list of versions to be dropped is:</p>
<ul class="simple">
<li><p>Vanilla 1.2.1</p></li>
<li><p>HDP 1.3.2</p></li>
</ul>
<p>This spec does not propose dropping the deprecated plugin versions; that
should happen as a regular part of the release cycle.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Drop the plugins code for Hadoop v1 along with:</p>
<ul class="simple">
<li><p>xml/json resources</p></li>
<li><p>unit and scenario tests</p></li>
<li><p>sample files</p></li>
<li><p>image elements</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>TBD</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Elements for building Hadoop v1 images should be dropped as well</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>nkonovalov</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Disable Hadoop v1 tests in sahara-ci</p></li>
<li><p>Drop related code and resource from sahara main repo</p></li>
<li><p>Drop image elements</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Sahara-ci should not test Hadoop v1 anymore</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Add a note that Hadoop v1 is not supported in new releases.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 17 Jun 2015 00:00:00 Adding custom scenario to scenario testshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/adding-custom-scenario-tests.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/custom-checks">https://blueprints.launchpad.net/sahara/+spec/custom-checks</a></p>
<p>This specification proposes to add custom tests to scenario tests for more
exhaustive testing of Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently, the scenario tests cover only basic functionality, and users
cannot add their own custom tests to check other Sahara functionality.</p>
<p>Extra tests should be added:</p>
<ul class="simple">
<li><p>checks that cinder volumes are mounted and available;</p></li>
<li><p>checks that the expected services are started on the cluster;</p></li>
<li><p>checks for other processes that are not currently tested</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Custom tests need to be added under sahara/tests/scenario/custom_checks, and
support for running these scenarios needs to be implemented in the scenario
test framework.</p>
<p>To implement this spec, the parameters of the “scenario” field in the scenario
tests need to change. The field currently has type “enum”; it should become
“string” so that custom tests can be specified.</p>
<p>Additionally, the sahara/tests/scenario/testcase.py.mako template should be
rewritten. Custom tests will be called from a module named in the format
<cite>check_{name of check}</cite>, which exposes a <cite>check()</cite> method.</p>
<p>All auxiliary methods for a given custom check will be written in the same
module as that check. Methods intended for global use across several custom
scenarios can be implemented in sahara/tests/scenario/base.py.</p>
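<p>Under this convention, a custom check for cinder volumes might look like the sketch below. Only the <cite>check_{name of check}</cite> module naming and the <cite>check()</cite> entry point come from the spec; the helper name, the <code class="docutils literal notranslate"><span class="pre">cluster</span></code> argument shape, and the volume inspection are illustrative assumptions.</p>

```python
# Hypothetical sahara/tests/scenario/custom_checks/check_cinder.py

def _mounted_volume_count(instance):
    # Auxiliary helper specific to this check; a real check would
    # inspect the instance (e.g. over SSH) rather than read a dict.
    return len(instance.get("volumes", []))


def check(cluster=None):
    """Entry point the generated test case calls for the
    'cinder' custom scenario: every instance must have at least
    one cinder volume mounted."""
    for instance in cluster["instances"]:
        assert _mounted_volume_count(instance) > 0, (
            "no cinder volumes mounted on %s" % instance["name"])
```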
<section id="alternatives">
<h3>Alternatives</h3>
<p>Tests can be added manually to the scenario tests’ Base class.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>esikachev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding ability to run custom scenario tests;</p></li>
<li><p>Move scripts from old integration tests to scenario tests as custom checks;</p></li>
<li><p>Adding new custom checks.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 05 Jun 2015 00:00:00 Allow the creation of multiple clusters simultaneouslyhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/allow-creation-of-multiple-cluster-simultaneously.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/simultaneously-creating-multiple-clusters">https://blueprints.launchpad.net/sahara/+spec/simultaneously-creating-multiple-clusters</a></p>
<p>We want to improve user experience when creating new clusters by allowing the
user to create multiple clusters at the same time.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>When creating new clusters, the user can currently only start a single
cluster at a time. In some cases, for example in research environments, the
user needs more than one cluster based on a given template. To avoid the
repetitive work of creating a cluster, going back to create a second one, and
so on, we want to add the option of creating multiple clusters simultaneously
by letting the user specify a number of clusters.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We want to introduce an option for the user to select how many clusters will be
spawned.
When creating multiple clusters we will add a sequential number to the given
cluster name (hadoop-cluster1, hadoop-cluster2, …).</p>
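<p>The naming rule above can be sketched as follows; this is a simple illustration, and the helper name is an assumption rather than an actual sahara function.</p>

```python
def cluster_names(base_name, count):
    """Generate sequential cluster names as described in the spec,
    e.g. hadoop-cluster1, hadoop-cluster2, ...

    A single-cluster request keeps the name unchanged."""
    if count == 1:
        return [base_name]
    return ["%s%d" % (base_name, i) for i in range(1, count + 1)]
```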
<p>The creation workflow would go as follows:</p>
<ol class="arabic simple">
<li><p>The user requests the creation of multiple clusters via
POST v1.1/<project_id>/clusters/multiple using a body as described below.</p></li>
<li><p>The call returns a list of cluster ids.</p></li>
<li><p>The user can then track the cluster states using these ids.</p></li>
</ol>
<section id="alternatives">
<h3>Alternatives</h3>
<p>The user keeps creating a cluster at a time.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>We need to create a new API call (create_multiple_clusters()) to allow the
creation of multiple clusters by passing a new parameter specifying the number
of clusters that will be created.</p>
<p>POST v1.1/<project_id>/clusters/multiple</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>We will also need to change the python-saharaclient to allow the creation of
multiple clusters.</p>
<ul class="simple">
<li><p>Request</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre>{
    "plugin_name": "vanilla",
    "hadoop_version": "2.4.1",
    "cluster_template_id": "1beae95b-fd20-47c0-a745-5125dccbd560",
    "default_image_id": "be23ce84-68cb-490a-b50e-e4f3e340d5d7",
    "user_keypair_id": "doc-keypair",
    "name": "doc-cluster",
    "count": 2,
    "cluster_configs": {}
}
</pre></div>
</div>
<ul class="simple">
<li><p>Response</p></li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre>{
    "clusters": ["c8c3fee5-075a-4969-875b-9a00bb9c7c6c",
                 "d393kjj2-973b-3811-846c-9g33qq4c9a9f"]
}
</pre></div>
</div>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>We need to add a box to allow the user to insert the number of clusters that
will be created (default set to 1).</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tellesmvn</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement the backend for creating multiple clusters</p></li>
<li><p>Implement the API change</p></li>
<li><p>Implement Unit tests</p></li>
<li><p>Implement changes to the python-saharaclient</p></li>
<li><p>Implement changes to the UI</p></li>
<li><p>Update WADL file in the api-site repo</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will implement unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The documentation needs to be updated, detailing the new option.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 03 Jun 2015 00:00:00 Unified Map to Define Job Interfacehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/unified-job-interface-map.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/unified-job-interface-map">https://blueprints.launchpad.net/sahara/+spec/unified-job-interface-map</a></p>
<p>This specification proposes the addition of an “interface” map to the API for
job creation, such that the operator registering a job can define a unified,
human-readable way to pass in all arguments, parameters, and configurations
that the execution of that job may require or accept. This will allow
platform-agnostic wizarding at the job execution phase and allows users to
document use of their own jobs once in a persistent, standardized format.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>At present, each of our data processing engines require or may optionally take
any of arguments, parameters, configuration values, and data sources (which
may be any of 0, 1, or many inputs to 0, 1, or many outputs). This forces
users to bear the burden of documenting their own jobs outside of Sahara,
potentially for job operators who may not be particularly technical.</p>
<p>A single, human readable way to define the interface of a job, that can be
registered at the time of job registration (rather than job execution,) would
allow several benefits:</p>
<ul class="simple">
<li><p>A more unified UI flow across plugins</p></li>
<li><p>A clean separation of responsibility between the creator of the job (likely
a technical user) and the executor of the job</p></li>
<li><p>A means of correcting our current assumptions regarding data sources (where
for several plugins we are inappropriately assuming 1 input and 1 output
source)</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>When creating a job, an optional “interface” list may be added to the job
json (though we are fundamentally creating a map, presenting a list structure
will allow more intuitive ordering and fewer unnecessary error cases.) Each
member of this list describes an argument to the job (whether it is passed
as a configuration value, a named argument, or a positional argument.)</p>
<p>The interface is described by the following jsonschema object field:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="s2">"interface"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"array"</span><span class="p">,</span>
<span class="s2">"uniqueItems"</span><span class="p">:</span> <span class="kc">True</span><span class="p">,</span>
<span class="s2">"items"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"object"</span><span class="p">,</span>
<span class="s2">"properties"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"minLength"</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">},</span>
<span class="s2">"description"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span>
<span class="p">},</span>
<span class="s2">"mapping"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"object"</span><span class="p">,</span>
<span class="s2">"properties"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"enum"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"args"</span><span class="p">,</span> <span class="s2">"configs"</span><span class="p">,</span> <span class="s2">"params"</span><span class="p">]</span>
<span class="p">},</span>
<span class="s2">"location"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"minLength"</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s2">"additionalProperties"</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s2">"required"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"type"</span><span class="p">,</span>
<span class="s2">"location"</span>
<span class="p">]</span>
<span class="p">},</span>
<span class="s2">"value_type"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"enum"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"string"</span><span class="p">,</span>
<span class="s2">"number"</span><span class="p">,</span>
<span class="s2">"data_source"</span><span class="p">,</span>
<span class="s2">"input_data_source"</span><span class="p">,</span>
<span class="s2">"output_data_source"</span><span class="p">],</span>
<span class="s2">"default"</span><span class="p">:</span> <span class="s2">"string"</span>
<span class="p">},</span>
<span class="s2">"required"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"boolean"</span>
<span class="p">},</span>
<span class="s2">"default"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"string"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s2">"additionalProperties"</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s2">"required"</span><span class="p">:</span> <span class="p">[</span>
<span class="s2">"name"</span><span class="p">,</span>
<span class="s2">"mapping"</span><span class="p">,</span>
<span class="s2">"required"</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Post-schema validations include:</p>
<ol class="arabic simple">
<li><p>Names must be unique.</p></li>
<li><p>Mappings must be unique.</p></li>
<li><p>The set of all positional arguments’ locations must be an unbroken integer
sequence with an inclusive minimum of 0.</p></li>
<li><p>Positional arguments may be marked not required, but in that case they
must be given default values.</p></li>
</ol>
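<p>The post-schema rules above could be checked roughly as in the sketch below. This is an illustrative validator written against the interface shape defined by the jsonschema, not sahara's actual validation code.</p>

```python
def validate_interface(interface):
    """Apply the post-schema rules: unique names, unique mappings,
    contiguous positional locations starting at 0, and defaults for
    any optional positional argument."""
    names = [arg["name"] for arg in interface]
    if len(set(names)) != len(names):
        raise ValueError("interface argument names must be unique")

    mappings = [(a["mapping"]["type"], a["mapping"]["location"])
                for a in interface]
    if len(set(mappings)) != len(mappings):
        raise ValueError("interface mappings must be unique")

    # Positional (args) locations must form 0, 1, ..., n with no gaps.
    positions = sorted(int(a["mapping"]["location"])
                       for a in interface
                       if a["mapping"]["type"] == "args")
    if positions != list(range(len(positions))):
        raise ValueError("positional locations must be an unbroken "
                         "sequence starting at 0")

    for a in interface:
        if (a["mapping"]["type"] == "args" and not a["required"]
                and "default" not in a):
            raise ValueError("optional positional arguments need defaults")
```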
<p>The job execution will also have a simpler interface field definition,
described by:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="s2">"interface"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"simple_config"</span>
<span class="p">}</span>
</pre></div>
</div>
<p>New error cases at execution time include:</p>
<ol class="arabic simple">
<li><p>One configuration value or parameter is given two definitions (one through
the interface map and one via configs, params, or data sources.)</p></li>
<li><p>An interface value does not pass validation for the type specified for the
field in question.</p></li>
<li><p>A key in the execution interface map does not equal any key in the job
definition’s interface map.</p></li>
<li><p>The specified mapping type is not accepted by the job type being created
(for instance, specifying the params type for a Spark job.)</p></li>
<li><p>An input data source does not contain data.</p></li>
<li><p>An output data source contains data.</p></li>
</ol>
<p>In the case of additional positional values, the positional arguments given
in the args list will be appended to the list of interface positional
arguments (whether provided or default values.) This will allow for an
<code class="docutils literal notranslate"><span class="pre">*args</span></code> pattern, should a plugin permit it.</p>
<p>Params and configs passed via the current mechanism that do not overlap with
any key in the execution interface map will be merged and passed to the job as
normal. This also applies to $INPUT and $OUTPUT params passed via the input
source and output source fields.</p>
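<p>The merge behavior described in the two paragraphs above might look like the following; the function name and the dict-based layout of configs/params/args are assumptions for illustration, not sahara's implementation.</p>

```python
def merge_job_configs(interface_values, classic):
    """Merge execution values supplied through the interface map with
    the classic configs/params/args dict: overlapping keys are an
    error, and extra classic positional args are appended after the
    interface's positional arguments."""
    merged = {"configs": dict(classic.get("configs", {})),
              "params": dict(classic.get("params", {})),
              "args": list(interface_values.get("args", []))}
    for section in ("configs", "params"):
        for key, value in interface_values.get(section, {}).items():
            if key in merged[section]:
                # A value defined both via the interface map and via
                # the classic mechanism is a 400-class error.
                raise ValueError(
                    "%s.%s given two definitions" % (section, key))
            merged[section][key] = value
    # Additional classic positional args follow the interface ones,
    # allowing an *args pattern where a plugin permits it.
    merged["args"].extend(classic.get("args", []))
    return merged
```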
<section id="alternatives">
<h3>Alternatives</h3>
<p>In truth, after discussion, it seems that there is not a good alternative to
the broad strokes of this plan (save doing nothing). Leaving all configuration
of jobs to the execution phase is a real difficulty given that our supported
data processing engines simply lack a unified interface. If we wish to create
a unified flow, we need to create one; if we want to create one, the job
definition phase produces the least user pain, and a simple, flat map is the
most legible and flexible thing that can do the job.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>A new table will need to be created for storage of interface fields, described
by the following DDL (rendered in MySQL syntax for friendliness):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">CREATE</span> <span class="n">TABLE</span> <span class="n">job_interface_arguments</span> <span class="p">(</span>
<span class="nb">id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span>
<span class="n">job_id</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">36</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span>
<span class="n">name</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">80</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 'Main Class'</span>
<span class="n">description</span> <span class="n">TEXT</span><span class="p">,</span> <span class="c1"># ex: 'The main Java class for this job.'</span>
<span class="n">mapping_type</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">80</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 'configs'</span>
<span class="n">location</span> <span class="n">TEXT</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 'edp.java.main_class'</span>
<span class="n">value_type</span> <span class="n">VARCHAR</span><span class="p">(</span><span class="mi">80</span><span class="p">)</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 'string'</span>
<span class="n">required</span> <span class="n">BOOL</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 0</span>
<span class="n">order</span> <span class="n">TINYINT</span> <span class="n">NOT</span> <span class="n">NULL</span><span class="p">,</span> <span class="c1"># ex: 1</span>
<span class="n">default_value</span> <span class="n">TEXT</span><span class="p">,</span> <span class="c1"># ex: 'org.openstack.sahara.examples.WordCount'</span>
<span class="n">created_at</span> <span class="n">DATETIME</span><span class="p">,</span>
<span class="n">PRIMARY</span> <span class="n">KEY</span> <span class="p">(</span><span class="nb">id</span><span class="p">),</span>
<span class="n">FOREIGN</span> <span class="n">KEY</span> <span class="p">(</span><span class="n">job_id</span><span class="p">)</span>
<span class="n">REFERENCES</span> <span class="n">jobs</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
<span class="n">ON</span> <span class="n">DELETE</span> <span class="n">CASCADE</span>
<span class="p">);</span>
</pre></div>
</div>
<p>This table will have uniqueness constraints on (job_id, name) and (job_id,
mapping_type, location).</p>
<p>Note: While the TEXT type fields above (save Description) could validly be
given an upper length limit and stored as VARCHARs, TEXT is safer in the case
that a job actually requires an overly long argument, or is configured with
a reasonably massive key. This implementation detail is certainly up for
debate re: efficiency vs. usability.</p>
<p>Happily, this change will not require a migration for extant data; the
interface fields table has a (0, 1, or many)-to-one relationship to the jobs
table, and the existing configs/params/args method of propagating job
execution data can continue to function.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The Create Job schema will have a new “interface” field, described above. Each
listed exceptional case above will generate a 400: Bad Request error.</p>
<p>This field will also be represented in all GET methods of the Job resource.</p>
<p>The Create Job Execution schema will have a new “interface” field, described
above. Each listed exceptional case above will generate a 400: Bad Request
error. This field will not be returned on a GET of a job execution object;
instead, the final, merged configuration will be returned.</p>
<p>No other impact is foreseen.</p>
<p>Note: I am profoundly open to better options for terminology throughout this
document. As “args”, “params”, and “configs” are already taken, naming of a
new option has become difficult. “Interface” and “Interface arguments” seem
to me to be the best option remaining in all cases. If you can do one better,
please do.</p>
<p>Note: As interface fields will be represented in the data layer as individual
records, it would be possible to create an entirely new set of CRUD methods
for this object. I believe that course of action to be unnecessarily heavy,
however: should the job binary change, the job must be recreated regardless,
and a sensible interface need not change for the life of any concrete binary.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>python-saharaclient will require changes precisely parallel to the interface
changes described above.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>The immediate change does not require a Horizon change. Any UI that utilizes
this feature should be represented as a separate blueprint and spec, and will
doubtless touch wizarding decisions which are wholly orthogonal to this
feature.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>API updates as specified.</p></li>
<li><p>DB layer updates as specified.</p></li>
<li><p>Job execution argument validation and propagation to clusters.</p></li>
<li><p>Testing.</p></li>
<li><p>Python-saharaclient updates and testing.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None at present.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>A tempest test will cover injection of each mapping type (args, configs,
params) into jobs. This will be tested via a Pig job, as that type may take all
of the above. This test will include arguments mapping to both a Swift
datasource and an HDFS datasource, to ensure that both URL types are preserved
through the flow.</p>
<p>Thorough unit testing is assumed.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None that have not already been mentioned.</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="http://eavesdrop.openstack.org/irclogs/%23openstack-sahara/%23openstack-sahara.2014-12-05.log">Chat</a> (2014/12/05; begins at 2014-12-05T16:07:55)</p>
</section>
Wed, 03 Jun 2015 00:00:00 Provide ability to configure most important configs automaticallyhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/recommend-configuration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/recommend-configuration">https://blueprints.launchpad.net/sahara/+spec/recommend-configuration</a></p>
<p>Currently users must manually configure the most important Hadoop settings.
It would be friendlier to offer users advice about cluster configuration.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently users must configure the most important Hadoop settings by hand,
which requires advanced Hadoop knowledge. Most of these settings are
complicated and not all users understand them. We can provide advice about
cluster configuration and automatically set a few basic options, which will
improve the user experience. The resulting mechanism can be extended in the
future with new settings and advice.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It’s proposed to add a calculator which will automatically set the most
important configuration values based on the cluster specification:
available disk space, RAM, CPU, and so on. Such a calculator is already
implemented in Ambari (see [1] and [2]), and we can reuse its approach. It
must be possible to switch autoconfiguration off, and if the user has
manually set any of the affected Hadoop options, autoconfiguration will not
be applied.</p>
<p>The following options will be set, using the formulas from [1] and
[2]:</p>
<ul class="simple">
<li><p>yarn.nodemanager.resource.memory-mb</p></li>
<li><p>yarn.scheduler.minimum-allocation-mb</p></li>
<li><p>yarn.scheduler.maximum-allocation-mb</p></li>
<li><p>yarn.app.mapreduce.am.resource.mb</p></li>
<li><p>yarn.app.mapreduce.am.command-opts</p></li>
<li><p>mapreduce.map.memory.mb</p></li>
<li><p>mapreduce.reduce.memory.mb</p></li>
<li><p>mapreduce.map.java.opts</p></li>
<li><p>mapreduce.reduce.java.opts</p></li>
<li><p>mapreduce.task.io.sort.mb</p></li>
</ul>
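<p>The sizing logic referenced above can be sketched roughly as follows. This
is an illustrative approximation of the Ambari-style heuristic in [1] and [2];
the function name, constants, and rounding rules here are assumptions for the
sketch, not the exact formulas Sahara will ship:</p>

```python
# Hypothetical sketch of an Ambari-style sizing heuristic: derive basic
# YARN/MapReduce memory settings from a node's RAM, cores, and disk count.

def recommend_yarn_mapreduce_configs(ram_mb, cores, disks):
    # Reserve some memory for the OS and Hadoop daemons.
    reserved_mb = min(max(ram_mb // 8, 2048), 8192)
    usable_mb = ram_mb - reserved_mb

    # Container count roughly follows min(2*cores, 1.8*disks, usable/minimum).
    min_container_mb = 1024 if ram_mb >= 8192 else 512
    containers = int(min(2 * cores, 1.8 * disks,
                         usable_mb / min_container_mb))
    containers = max(containers, 3)
    container_mb = max(usable_mb // containers, min_container_mb)

    return {
        'yarn.nodemanager.resource.memory-mb': containers * container_mb,
        'yarn.scheduler.minimum-allocation-mb': container_mb,
        'yarn.scheduler.maximum-allocation-mb': containers * container_mb,
        'yarn.app.mapreduce.am.resource.mb': 2 * container_mb,
        'yarn.app.mapreduce.am.command-opts':
            '-Xmx%dm' % int(0.8 * 2 * container_mb),
        'mapreduce.map.memory.mb': container_mb,
        'mapreduce.reduce.memory.mb': 2 * container_mb,
        'mapreduce.map.java.opts': '-Xmx%dm' % int(0.8 * container_mb),
        'mapreduce.reduce.java.opts': '-Xmx%dm' % int(0.8 * 2 * container_mb),
        'mapreduce.task.io.sort.mb': min(int(0.4 * container_mb), 1024),
    }
```

<p>For a hypothetical 16 GB, 8-core, 4-disk node this yields 7 containers of
2048 MB each, with JVM heaps set to 80% of the container size.</p>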
<p>Also, as a simple example, we can set <code class="docutils literal notranslate"><span class="pre">dfs.replication</span></code> before cluster
validation if the number of <code class="docutils literal notranslate"><span class="pre">datanodes</span></code> is less than its default value.</p>
<p>A new plugin SPI method <code class="docutils literal notranslate"><span class="pre">recommend_configs</span></code> is also required, which
will autoconfigure the cluster configs.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>It’s required to add a new column <code class="docutils literal notranslate"><span class="pre">use_autoconfig</span></code> to the cluster,
cluster_template, node_group, node_group_template, and templates_relations
objects in the DB. By default <code class="docutils literal notranslate"><span class="pre">use_autoconfig</span></code> will be <code class="docutils literal notranslate"><span class="pre">True</span></code>. If
<code class="docutils literal notranslate"><span class="pre">use_autoconfig</span></code> is <code class="docutils literal notranslate"><span class="pre">False</span></code>, autoconfiguration will not be used
during cluster creation. If none of the configs from the list above are
configured manually and <code class="docutils literal notranslate"><span class="pre">use_autoconfig</span></code> is <code class="docutils literal notranslate"><span class="pre">True</span></code>, the configs
from the list above will be set automatically. The same behaviour applies to
node_group config autoconfiguration.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The API needs to support switching autoconfiguration off.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>python-saharaclient needs to support switching autoconfiguration off.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>A new checkbox is needed to allow switching autoconfiguration off from
Horizon during cluster creation/scaling. If the plugin doesn’t support
autoconfiguration, this checkbox will not be displayed. The <code class="docutils literal notranslate"><span class="pre">_info</span></code> field at [3]
can be used for this.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
<dt>Other contributors:</dt><dd><p>sreshetniak</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The proposed change consists of the following steps:</p>
<ul class="simple">
<li><p>Implement the new plugin SPI method that will provide configuration advice;</p></li>
<li><p>Add support for this method in the following plugins: CDH, Vanilla 2.6.0,
Spark (<code class="docutils literal notranslate"><span class="pre">dfs.replication</span></code> only);</p></li>
<li><p>Provide the ability to switch autoconfiguration off via the UI;</p></li>
<li><p>Provide the ability to switch autoconfiguration off via saharaclient;</p></li>
<li><p>Update the WADL docs with the new fields.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be implemented for this feature. Sahara CI can also start
using autoconfiguration.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to document feature and all rules, which will be used for
autoconfiguration.</p>
</section>
<section id="references">
<h2>References</h2>
<p>[1] <a class="reference external" href="https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py">https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py</a>
[2] <a class="reference external" href="https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.1/services/stack_advisor.py">https://apache.googlesource.com/ambari/+/a940986517cbfeb2ef889f0d8a45579b27adad1c/ambari-server/src/main/resources/stacks/HDP/2.1/services/stack_advisor.py</a>
[3] <a class="reference external" href="https://github.com/openstack/sahara/blob/master/sahara/service/api.py#L188">https://github.com/openstack/sahara/blob/master/sahara/service/api.py#L188</a></p>
</section>
Wed, 22 Apr 2015 00:00:00 Retry of all OpenStack clients callshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/clients-calls-retry.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/clients-calls-retry">https://blueprints.launchpad.net/sahara/+spec/clients-calls-retry</a></p>
<p>This specification proposes adding the ability to retry OpenStack client
calls when transient errors occur.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara uses a number of OpenStack clients to communicate with other OpenStack
services. These client calls can occasionally fail with transient errors,
which then surface as Sahara errors. If you make a lot of calls, it may
not be surprising if one of them doesn’t respond as it should, especially for
a service under heavy load.</p>
<p>You make a valid call and it returns a 4xx or 5xx error; you make the same
call again a moment later, and it succeeds. To prevent this kind of failure,
all client calls should be retried. However, retries should be done only for
certain error codes, because not every error can be avoided simply by
repeating the call.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The Swift client provides retry support on its own, so only the number of
retries and the retry_on_ratelimit flag need to be set during client
initialisation.</p>
<p>The Neutron client also provides retry support, but it repeats a call only if a
<code class="docutils literal notranslate"><span class="pre">ConnectionError</span></code> occurs.</p>
<p>The Nova, Cinder, Heat, and Keystone clients don’t offer such functionality at all.</p>
<p>To retry calls, an <code class="docutils literal notranslate"><span class="pre">execute_with_retries(method,</span> <span class="pre">*args,</span> <span class="pre">**kwargs)</span></code> method will be
implemented. If executing the given method (passed as the first parameter)
raises an error, the error’s <code class="docutils literal notranslate"><span class="pre">http_status</span></code> will be compared against the list
of retryable HTTP statuses; based on that, the client call will either get
another chance or fail immediately.</p>
<p>The following errors can be retried:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">REQUEST_TIMEOUT</span> <span class="pre">(408)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">OVERLIMIT</span> <span class="pre">(413)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">RATELIMIT</span> <span class="pre">(429)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">INTERNAL_SERVER_ERROR</span> <span class="pre">(500)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">BAD_GATEWAY</span> <span class="pre">(502)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">SERVICE_UNAVAILABLE</span> <span class="pre">(503)</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">GATEWAY_TIMEOUT</span> <span class="pre">(504)</span></code></p></li>
</ul>
<p>The number of times to retry a client request before failing will be taken
from the <code class="docutils literal notranslate"><span class="pre">retries_number</span></code> config value (5 by default).</p>
<p>The time between retries will be configurable (the <code class="docutils literal notranslate"><span class="pre">retry_after</span></code> option in the
config) and equal to 10 seconds by default. Additionally, the Nova client provides a
<code class="docutils literal notranslate"><span class="pre">retry_after</span></code> field in its <code class="docutils literal notranslate"><span class="pre">OverLimit</span></code> and <code class="docutils literal notranslate"><span class="pre">RateLimit</span></code> error classes, which
can be used instead of the config value in those cases.</p>
<p>These two config options will be under <code class="docutils literal notranslate"><span class="pre">timeouts</span></code> config group.</p>
<p>All client calls will be wrapped with <code class="docutils literal notranslate"><span class="pre">execute_with_retries</span></code>.
For example, instead of the following method call</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">nova</span><span class="o">.</span><span class="n">client</span><span class="p">()</span><span class="o">.</span><span class="n">images</span><span class="o">.</span><span class="n">get_registered_image</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
</pre></div>
</div>
<p>it will be</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">execute_with_retries</span><span class="p">(</span><span class="n">nova</span><span class="o">.</span><span class="n">client</span><span class="p">()</span><span class="o">.</span><span class="n">images</span><span class="o">.</span><span class="n">get_registered_image</span><span class="p">,</span> <span class="nb">id</span><span class="p">)</span>
</pre></div>
</div>
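<p>The wrapper described above can be sketched as follows. The helper name and
the retryable status list come from this spec, but reading the retry counts
from plain keyword arguments (rather than oslo.config options) and catching a
generic exception are simplifying assumptions made to keep the example
self-contained:</p>

```python
# Minimal sketch of the proposed execute_with_retries wrapper. It retries a
# call only when the raised error carries a retryable http_status, honoring a
# server-supplied retry_after hint (e.g. novaclient's OverLimit) when present.
import time

RETRYABLE_STATUSES = frozenset([408, 413, 429, 500, 502, 503, 504])


def execute_with_retries(method, *args, **kwargs):
    retries_number = kwargs.pop('retries_number', 5)
    retry_after = kwargs.pop('retry_after', 10)

    attempts = retries_number + 1  # the initial call plus the retries
    while True:
        try:
            return method(*args, **kwargs)
        except Exception as e:
            attempts -= 1
            status = getattr(e, 'http_status', None)
            if status not in RETRYABLE_STATUSES or attempts == 0:
                raise
            # Prefer the error's own retry_after hint over the config value.
            delay = getattr(e, 'retry_after', 0) or retry_after
            time.sleep(delay)
```

<p>A call such as <code class="docutils literal notranslate"><span class="pre">execute_with_retries(nova.client().images.get_registered_image,</span> <span class="pre">id)</span></code>
then transparently survives up to five transient failures.</p>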
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding new options to Sahara config;</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">execute_with_retries</span></code> method implementation;</p></li>
<li><p>Replacing OpenStack clients call with <code class="docutils literal notranslate"><span class="pre">execute_with_retries</span></code> method.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added. They will check that only the specified errors
are retried.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Fri, 17 Apr 2015 00:00:00 Deprecation of Direct Enginehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/deprecate-direct-engine.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/deprecate-direct-engine">https://blueprints.launchpad.net/sahara/+spec/deprecate-direct-engine</a></p>
<p>Currently Sahara has two infrastructure engines: the Direct
Infrastructure Engine and the Heat Infrastructure Engine. This spec
proposes deprecating the Direct Engine.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Each time Sahara starts supporting a new feature, it must be supported in
both engines. This makes maintaining both engines increasingly hard, because
the Direct Engine requires duplicating work that is already done in Heat.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It’s proposed to deprecate the Direct Engine in the Liberty release, while
keeping it available for use. After this spec merges, the Direct Engine should be
<code class="docutils literal notranslate"><span class="pre">frozen</span></code> for new features added in Liberty, remaining open only for
fixes of <code class="docutils literal notranslate"><span class="pre">High</span></code> and <code class="docutils literal notranslate"><span class="pre">Critical</span></code> bugs. We should make the Heat Engine the
default in Sahara.</p>
<p>This change will allow switching most testing jobs in Sahara CI
to the Heat Engine instead of the Direct Engine.</p>
<p>In the M release we should remove all operations from the Direct Engine.
After that, the only operation possible on a direct-engine-created cluster
will be deletion. We should rewrite the cluster deletion behavior so that
direct-engine-created clusters can be deleted via the Heat Engine. Currently
the Heat Engine removes the cluster from the database but doesn’t remove all
cluster elements (for example, instances).</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Sahara could continue to support both engines.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers should switch to the Heat Engine instead of the Direct Engine.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>New features that impact the infrastructure part of Sahara should be supported
only in the Heat Engine.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>This change will require the following steps:</p>
<ul class="simple">
<li><p>Add deprecation warnings at sahara startup.</p></li>
<li><p>Mark Heat Engine as default in Sahara.</p></li>
<li><p>Document deprecation of Direct Engine.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>This change requires manual testing of deleting a direct-engine-created
cluster after the switch to the Heat Engine.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to document that the Direct Engine is deprecated.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 14 Apr 2015 00:00:00 Adding cluster/instance/job_execution ids to log messageshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/adding-ids-to-logs.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/logs-improvement">https://blueprints.launchpad.net/sahara/+spec/logs-improvement</a></p>
<p>This specification proposes to add more information to Sahara logs.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Looking at some Sahara logs, it is difficult to determine which
cluster, instance, or job execution they refer to.</p>
<p>Extra information should be added:</p>
<ul class="simple">
<li><p>logs associated with cluster creation/scaling/deletion should contain
cluster id;</p></li>
<li><p>logs associated with job execution/canceling/deletion should contain
job execution id;</p></li>
<li><p>logs associated with operations executed on specific instance should
contain id of this instance.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The information can be added to the context’s resource_uuid field and then
used by ContextAdapter in openstack.common.log for a group of logs.</p>
<p>This change requires additionally saving the context in
openstack.common.local.store so that it can be accessed from
openstack.common.log.</p>
<p>We need to set the cluster id and job execution id only once, so this can be
done with two methods added to sahara.context:
set_current_cluster_id(cluster_id) and set_current_job_execution_id(je_id).</p>
<p>Additionally, instances and their ids change during a thread’s lifetime, so
the instance id should be set only while an operation executes on that
instance. This will be provided by the class SetCurrentInstanceId, used
through the wrapper function set_current_instance_id(instance_id) this way:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="k">with</span> <span class="n">set_current_instance_id</span><span class="p">(</span><span class="n">instance_id</span><span class="p">):</span>
</pre></div>
</div>
<p>Code inside the “with” statement will execute with the new context (which
includes the instance id in the resource_uuid field), but outside of it the
context stays the same.</p>
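<p>The mechanics can be sketched as below. The class and function names match
this spec, but representing the context as a plain dict in a thread-local
store (standing in for openstack.common.local.store) is an illustrative
assumption:</p>

```python
# Sketch of the context-swapping helpers: SetCurrentInstanceId swaps in a
# copy of the current context carrying the instance id for the duration of a
# "with" block, then restores the previous context on exit.
import threading

_store = threading.local()  # stand-in for openstack.common.local.store


def current():
    if not hasattr(_store, 'context'):
        _store.context = {'resource_uuid': None}
    return _store.context


def set_current_cluster_id(cluster_id):
    current()['resource_uuid'] = 'cluster: %s' % cluster_id


class SetCurrentInstanceId(object):
    def __init__(self, instance_id):
        self.instance_id = instance_id

    def __enter__(self):
        self.old = current()
        _store.context = dict(self.old,
                              resource_uuid='instance: %s' % self.instance_id)

    def __exit__(self, *exc_info):
        # Restore the previous context so code outside the block is unaffected.
        _store.context = self.old


def set_current_instance_id(instance_id):
    return SetCurrentInstanceId(instance_id)
```
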
<p>If both instance and cluster are specified, the log message will look like:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">2014-12-22 13:54:19.574 23128 ERROR sahara.service.volumes [-] [instance:</span>
<span class="go">3bd63e83-ed73-4c7f-a72f-ce52f823b080, cluster: 546c15a4-ab12-4b22-9987-4e</span>
<span class="go">38dc1724bd] message</span>
</pre></div>
</div>
<p>If only the cluster is specified:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">2014-12-22 13:54:19.574 23128 ERROR sahara.service.volumes [-] [instance:</span>
<span class="go">none, cluster: 546c15a4-ab12-4b22-9987-4e38dc1724bd] message</span>
</pre></div>
</div>
<p>If a job execution is specified:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">2014-12-22 13:54:19.574 23128 ERROR sahara.service.edp.api [-] [instance:</span>
<span class="go">none, job_execution: 9de0de12-ec56-46f9-80ed-96356567a196] message</span>
</pre></div>
</div>
<p>The “instance:” field is present in every message (even when it’s not
needed) because of the default value instance_format=’[instance: %(uuid)s] ‘,
which cannot be changed without a config change.</p>
<p>After these changes are implemented, Sahara log messages should be checked
and fixed to avoid duplicated information.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Information can be added manually to every log message.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding ability to access context from openstack.common.log;</p></li>
<li><p>Adding information about cluster/instance/job execution ids to context;</p></li>
<li><p>Fixing log messages to avoid information duplication.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 19 Mar 2015 00:00:00 Allow placeholders in datasource URLshttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/edp-datasource-placeholders.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-datasource-placeholders">https://blueprints.launchpad.net/sahara/+spec/edp-datasource-placeholders</a></p>
<p>This spec proposes allowing placeholders in EDP data source URLs.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>A common use case: a user wants to run an EDP job twice. Currently, the
only way to do that with the same data sources is to erase the results of the
first run before running the job a second time. Allowing a random part in the
URL makes it possible to write output to a randomly suffixed location.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Introduce special strings that can be used in an EDP data source URL and will
be replaced with the appropriate values.</p>
<p>The proposed syntax for placeholder is %FUNC(ARGS)%.</p>
<p>As a first step I suggest implementing only two functions:</p>
<ul class="simple">
<li><p>%RANDSTR(len)% - will be replaced with a random string of lowercase letters of
length <code class="docutils literal notranslate"><span class="pre">len</span></code>.</p></li>
<li><p>%JOB_EXEC_ID% - will be replaced with the job execution ID.</p></li>
</ul>
<p>Placeholders will not be allowed in the protocol prefix, so there will be no
validation impact.</p>
<p>The list of functions could be extended later (e.g. to add %JOB_ID%, etc.).</p>
<p>URLs with placeholders resolved will be stored in the <code class="docutils literal notranslate"><span class="pre">job_execution.info</span></code>
field during job_execution creation. This will allow using them later to find
objects created by a particular job run.</p>
<p>Example of create request for data source with placeholder:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"demo-pig-output"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"A data source for Pig output, stored in Swift"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"swift://edp-examples.sahara/pig-job/data/output.%JOB_EXEC_ID%"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"credentials"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"user"</span><span class="p">:</span><span class="w"> </span><span class="s2">"demo"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"password"</span><span class="p">:</span><span class="w"> </span><span class="s2">"password"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
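<p>The substitution itself can be sketched with a single regular expression
covering the two proposed functions. The placeholder syntax comes from this
spec; the function and pattern names below are illustrative assumptions:</p>

```python
# Sketch of %FUNC(ARGS)% placeholder resolution for EDP data source URLs:
# %JOB_EXEC_ID% becomes the job execution id, %RANDSTR(len)% becomes a
# random lowercase string of the given length.
import random
import re
import string

_PLACEHOLDER_RE = re.compile(r'%RANDSTR\((\d+)\)%|%JOB_EXEC_ID%')


def resolve_data_source_url(url, job_exec_id):
    def _sub(match):
        if match.group(0) == '%JOB_EXEC_ID%':
            return job_exec_id
        length = int(match.group(1))
        return ''.join(random.choice(string.ascii_lowercase)
                       for _ in range(length))
    return _PLACEHOLDER_RE.sub(_sub, url)
```

<p>Applied to the example above, the URL
<code class="docutils literal notranslate"><span class="pre">swift://edp-examples.sahara/pig-job/data/output.%JOB_EXEC_ID%</span></code>
would resolve to a per-execution output path.</p>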
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do not allow placeholders.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p><code class="docutils literal notranslate"><span class="pre">job_execution.info</span></code> field (json dict) will also store constructed URLs.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Horizon needs to be updated to display the actual URLs for a job execution.
The Input Data Source and Output Data Source sections of the job execution
details page will be extended to include information about the URLs used.</p>
<p>The REST API will not change since the new information is stored in the
existing ‘info’ field.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement feature</p></li>
<li><p>Document feature</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Thu, 19 Mar 2015 00:00:00 Migrate to HEAT HOT languagehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/heat-hot.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/heat-hot">https://blueprints.launchpad.net/sahara/+spec/heat-hot</a></p>
<p>This blueprint suggests to rewrite cluster template for Heat from JSON to HOT
language.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Heat supports two different template languages: YAML-based
<a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/hot_guide.html">HOT</a> templates
and JSON-based CFN templates (which have no documentation, only a number of
examples).</p>
<p>HOT is the de-facto main markup language for Heat. The <a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/index.html">Template Guide</a>
recommends using HOT and contains examples for HOT only.</p>
<p>CFN templates are supported mostly for compatibility with AWS CloudFormation.</p>
<p>Sahara historically uses CFN templates. Given that Sahara is an integrated
OpenStack project, it would be nice to switch to HOT.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>There is no urgent need to switch to HOT, but it would be good to be
consistent with current community practice.</p>
<p>This spec proposes using the HOT template language for Sahara heat templates.</p>
<p>This will require changes mostly in the .heat resources. Code that generates
template parts on the fly should be changed too.</p>
<p>Having templates written in HOT will simplify the implementation of new
heat-related features like <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/heat-template-decomposition">template decomposition</a>.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do not change anything.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Change all .heat files used by Sahara</p></li>
<li><p>Update code that generates parts of template</p></li>
<li><p>Update unit tests</p></li>
<li><p>Make sure that sahara with heat engine still works in all supported
configurations</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Testing will be mostly manual. CI should also cover the heat changes.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/hot_guide.html">http://docs.openstack.org/developer/heat/template_guide/hot_guide.html</a></p></li>
</ul>
</section>
Thu, 19 Mar 2015 00:00:00 [EDP] Add Spark Shell Action job typehttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/edp-add-spark-shell-action.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-add-spark-shell-action">https://blueprints.launchpad.net/sahara/+spec/edp-add-spark-shell-action</a></p>
<p>The EDP Shell action job type allows users to run arbitrary
shell scripts on their cluster, providing a great deal of flexibility
to extend EDP functionality without engine changes or direct interaction
with the cluster. This specification proposes the addition of this job type
for the Spark engine.</p>
<p>A fuller explication of the benefits of this feature can be found in
the <a class="reference external" href="http://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-add-oozie-shell-action.html">edp-add-oozie-shell-action</a> specification, which need not be
repeated here.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>While the Oozie engine now supports Shell actions, Spark users do not
presently have access to this job type. Its addition would allow the
creation of cluster maintenance tools, pre- or post-processing jobs
which might be cumbersome to implement in Spark itself, the retrieval
of data from filesystems not supported as Sahara data sources, or any
other use case possible from a shell command.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>From an interface standpoint, the Spark Shell action implementation will
follow the Oozie Shell action implementation almost precisely:</p>
<ul class="simple">
<li><p>Shell jobs will require a single script binary in <code class="docutils literal notranslate"><span class="pre">mains</span></code>, which will be
pushed to the cluster’s master node and executed.</p></li>
<li><p>Shell jobs will optionally permit any number of file binaries to be
passed as <code class="docutils literal notranslate"><span class="pre">libs</span></code>, which will be placed in the script’s working directory
and may be used by the script as it executes.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">configs</span></code> will be permitted to allow Sahara EDP-internal features
(<code class="docutils literal notranslate"><span class="pre">substitute_data_source_for_uuid</span></code> and <code class="docutils literal notranslate"><span class="pre">substitute_data_source_for_name</span></code>
will be implemented for this job type, as they are for Oozie Shell actions.)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">params</span></code> key-value pairs will be passed to the script as environment
variables (whether these are passed into a remote ssh client or injected
into the script itself is left to the discretion of the implementer.)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">args</span></code> will be passed to the script as positional command-line arguments.</p></li>
<li><p>The Shell engine for Spark will store files as the main Spark engine does,
creating a directory under /tmp/spark-edp/<code class="docutils literal notranslate"><span class="pre">job_name</span></code>/<code class="docutils literal notranslate"><span class="pre">job_execution_id</span></code>,
which will contain all required files and the output of the execution.</p></li>
<li><p>The Spark Shell engine will reuse the <code class="docutils literal notranslate"><span class="pre">launch_command.py</span></code> script (as used
by the main Spark engine at this time), which will record the child pid, stdout,
and stderr from the subprocess for record-keeping purposes.</p></li>
</ul>
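<p>The interface described above can be sketched as follows. This is a
hypothetical illustration, not the actual engine code: the function name and
invocation shape are assumptions, showing only how params could become
environment variables and args positional arguments on the remote command
line.</p>

```python
# Hypothetical sketch (not sahara's implementation) of composing the remote
# command line for a Spark Shell action: ``params`` entries become environment
# variables and ``args`` become positional command-line arguments.
import shlex


def build_shell_command(script, params, args):
    """Build the remote command line for a Shell action."""
    env = " ".join("%s=%s" % (k, shlex.quote(v))
                   for k, v in sorted(params.items()))
    argv = " ".join(shlex.quote(a) for a in args)
    return ("%s sh %s %s" % (env, shlex.quote(script), argv)).strip()


cmd = build_shell_command("cleanup.sh", {"RETENTION_DAYS": "7"}, ["/tmp/logs"])
```

<p>Whether the variables are injected this way or passed through the ssh
client is, as noted above, left to the implementer.</p>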
<p>Spark Shell actions will differ in implementation from Oozie Shell actions
in the following ways:</p>
<ul class="simple">
<li><p>As Spark jobs and Shell actions which happen to be running on a Spark
cluster differ quite entirely, the Spark plugin will be modified to contain
two separate engines (provided via an extensible strategy pattern based on
job type.) Sensible abstraction of these engines is left to the discretion
of the implementer.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">configs</span></code> values which are not EDP-internal will not be passed to the
script by any means (as there is no intermediary engine to act on them.)</p></li>
<li><p>Spark Shell actions will be run as the image’s registered user, as Spark
jobs are themselves. As cluster and VM maintenance tasks are part of the
intended use case of this feature, allowing sudo access to the VMs is
desirable.</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do nothing.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>No additional changes after merge of the Oozie Shell action job type
implementation.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>No additional changes after merge of the Shell action job type UI
implementation.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>sgotliv</p>
</dd>
<dt>Other contributors:</dt><dd><p>egafford</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a Shell engine to the Spark plugin, and refactor this plugin to provide
an appropriate engine branching on job type.</p></li>
<li><p>Add an integration test for Spark Shell jobs (as per previous
plugin-specific Shell job tests).</p></li>
<li><p>Update the EDP documentation to specify that the Spark plugin supports the
Shell job type.</p></li>
<li><p>Verify that the UI changes made for Oozie Shell jobs are sufficient to
support the Shell job type in the Spark case (as is anticipated).</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This change builds on the change <a class="reference external" href="https://review.openstack.org/#/c/159920/">[EDP] Add Oozie Shell Job Type</a>.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit tests to cover the Spark Shell engine and appropriate engine selection
within the plugin.</p></li>
<li><p>One integration test to cover running of a simple shell job through the
Spark plugin.</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The EDP sections of the documentation need to be updated.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Mon, 16 Mar 2015 00:00:00 Decompose cluster template for Heathttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/heat-template-decomposition.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/heat-template-decomposition">https://blueprints.launchpad.net/sahara/+spec/heat-template-decomposition</a></p>
<p>Currently Sahara creates one large template with a lot of copy-paste
resources. Heat features like <a class="reference external" href="http://docs.openstack.org/developer/heat/template_guide/composition.html">template composition</a>
could help to move all composition work from Sahara to Heat. This will also
allow sahara to keep individual cluster parts as separate templates and insert
them as resources (in contrast to the current text manipulation).</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently Sahara serializes cluster resources as text into the heat template. There
are several issues with this approach:</p>
<ol class="arabic simple">
<li><p>Code duplication. If a node group contains 10 instances the template will
contain all instance-dependent resources 10 times.</p></li>
<li><p>No template validation. There is no guarantee that the resulting template
will be syntactically correct, since Sahara treats it as text. A missing comma
in one resource could affect another resource.</p></li>
<li><p>Not Sahara’s work. Sahara micro-manages the process of infrastructure
creation. This is Heat’s job.</p></li>
</ol>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Use <a class="reference external" href="http://docs.openstack.org/hot-reference/content/OS__Heat__ResourceGroup.html">OS::Heat::ResourceGroup</a>
for the resources inside a node group. Each node group will contain only one
resource group, which specifies the number of instances needed. Each individual
member of the resource group will contain all resources needed for the
corresponding sahara instance (nova server, security group, volume,
floating ip, etc.).</p>
<p>This change will also prepare the ground for a node group auto-scaling feature.</p>
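<p>For illustration, the proposed structure might look like the following HOT
fragment. This is a sketch only; the group name and nested template filename
are examples, not sahara’s actual templates.</p>

```yaml
# Illustrative HOT fragment -- one ResourceGroup per node group; each group
# member is a nested stack holding all per-instance resources.
resources:
  worker-group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 3                        # number of instances in the node group
      resource_def:
        type: worker-instance.yaml    # nested template: server, volume, etc.
```

<p>Heat then instantiates <code class="docutils literal notranslate"><span class="pre">count</span></code> copies of the nested template as
nested stacks, instead of Sahara copy-pasting the per-instance resources into
one large template.</p>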
<section id="alternatives">
<h3>Alternatives</h3>
<p>None.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Resulting Sahara stack in Heat will contain nested stacks.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add ability to generate separate ResourceGroup for a single instance of node
group</p></li>
<li><p>Switch template for node groups to ResourceGroup with specified count of
instances</p></li>
<li><p>Update unit tests</p></li>
<li><p>Make sure that Sahara with heat engine still works for all supported
configurations</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Testing will be done manually. CI will also cover the changes in heat.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 11 Mar 2015 00:00:00 Storm EDPhttps://specs.openstack.org/openstack/sahara-specs/specs/liberty/storm-edp.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/storm-edp">https://blueprints.launchpad.net/sahara/+spec/storm-edp</a></p>
<p>This blueprint aims to implement EDP for Storm. This will require a Storm Job
Type.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara needs an EDP implementation to allow the submission of Storm Jobs.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation of the EDP engine will have 3 basic functions:</p>
<ul class="simple">
<li><p>run_job()</p></li>
<li><p>get_job_status()</p></li>
<li><p>cancel_job(kill=False)</p></li>
</ul>
<p>These methods map to the following Storm commands:</p>
<ul class="simple">
<li><p>deploy_topology (i.e. storm jar topology-jar-path class …)</p></li>
<li><p>storm list (i.e. storm list)</p></li>
<li><p>storm deactivate (i.e. storm deactivate topology-name)</p></li>
<li><p>storm kill (i.e. storm kill topology-name)</p></li>
</ul>
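<p>A minimal sketch of this mapping is shown below. It is illustrative only:
the actual engine will execute these commands on the cluster rather than
return them as strings, and the topology and jar names are hypothetical.</p>

```python
# Illustrative mapping of the three EDP engine methods onto Storm CLI
# commands; topology, jar, and class names are hypothetical examples.

def run_job(jar_path, main_class, topology_name):
    # deploy_topology: submit the jar to the Storm cluster
    return "storm jar %s %s %s" % (jar_path, main_class, topology_name)


def get_job_status():
    # list running topologies and their states
    return "storm list"


def cancel_job(topology_name, kill=False):
    # deactivate pauses the topology; kill removes it entirely
    action = "kill" if kill else "deactivate"
    return "storm %s %s" % (action, topology_name)
```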
<p>The second part of this implementation is to adapt the UI to allow Storm Job
submission.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>We may be able to submit Storm jobs as Java Job but it is better for the user
to have a specific Storm Job.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Sahara needs to adapt its UI to allow creation of Storm Jobs. A draft was done
by crobertsrh and can be found in <a class="reference external" href="https://review.openstack.org/#/c/112408/4">https://review.openstack.org/#/c/112408/4</a></p>
<p>The main changes in the UI will be:</p>
<ul class="simple">
<li><p>A box for the user to define the main class to be executed</p></li>
<li><p>A box for the user to give parameters (if applicable)</p></li>
<li><p>Buttons to control job execution (Start, Stop, Kill, View Status)</p></li>
<li><p>Since it is possible to have more than one job executing in the same
topology, control can be done per job or per topology. In the latter case the
user will have to choose among the jobs in the topology to control.</p></li>
</ul>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>tellesmvn</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay (primary review)
crobertsrh</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement Storm Job Type</p></li>
<li><p>Implement EDP engine for Storm</p></li>
<li><p>Implement Unit tests</p></li>
<li><p>Implement integration tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>First we will implement unit tests following the Spark example found at
<a class="reference external" href="https://github.com/openstack/sahara/blob/master/sahara/tests/unit/service/edp/spark/test_spark.py">https://github.com/openstack/sahara/blob/master/sahara/tests/unit/service/edp/spark/test_spark.py</a>.
We will also implement the integration tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The documentation needs to be updated with information about Storm EDP and also
about Storm Job Type.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://etherpad.openstack.org/p/juno-summit-sahara-edp">Etherpad</a></p></li>
<li><p><a class="reference external" href="http://storm.incubator.apache.org/">Storm Documentation</a></p></li>
</ul>
</section>
Wed, 11 Mar 2015 00:00:00 Cinder volume instance locality functionalityhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/volume-instance-locality.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/volume-instance-locality">https://blueprints.launchpad.net/sahara/+spec/volume-instance-locality</a></p>
<p>This specification proposes adding the ability to place an instance and its
attached volumes on the same physical host.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently there is no way to request that volumes be placed on the same
physical host as the instance they are attached to. It would be useful to have
this ability.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This feature can be implemented with the Cinder InstanceLocalityFilter, which
allows requesting the creation of volumes local to an instance. This would
improve the performance of I/O operations.</p>
<p>There will be several changes:</p>
<ul class="simple">
<li><p>Boolean field <code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> will be added to every node group
template, node group and templates relation. This field will be optional and
<code class="docutils literal notranslate"><span class="pre">False</span></code> by default.</p></li>
<li><p>If <code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> is set to <code class="docutils literal notranslate"><span class="pre">True</span></code>, Cinder volumes will be
created on the same host.</p></li>
<li><p>If <code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> is <code class="docutils literal notranslate"><span class="pre">True</span></code>, all instances of the node group
should be created on hosts with free disk space >= <code class="docutils literal notranslate"><span class="pre">volumes_per_node</span></code> *
<code class="docutils literal notranslate"><span class="pre">volumes_size</span></code>. If this cannot be done, an error should be raised.</p></li>
</ul>
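<p>The scheduling precondition above amounts to a simple capacity check,
sketched here. This is illustrative only, not the actual scheduler logic; the
function and parameter names are examples.</p>

```python
# Minimal sketch of the free-disk precondition for node groups with
# volume_local_to_instance=True (illustrative names, not sahara code).

def host_can_fit_local_volumes(free_disk_gb, volumes_per_node, volumes_size_gb):
    """A host qualifies only if it can hold all local volumes of one instance."""
    return free_disk_gb >= volumes_per_node * volumes_size_gb
```

<p>If no host satisfies this check for some instance of the node group, the
error described above should be raised instead of provisioning the cluster.</p>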
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p><code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> field should be added to <code class="docutils literal notranslate"><span class="pre">node_groups</span></code>,
<code class="docutils literal notranslate"><span class="pre">node_group_templates</span></code>, <code class="docutils literal notranslate"><span class="pre">templates_relations</span></code>.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<ul class="simple">
<li><p>API will be extended to support <code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> option.
<code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> is an optional argument defaulting to <code class="docutils literal notranslate"><span class="pre">False</span></code>,
so this change will be backward compatible.</p></li>
<li><p>python client will be updated</p></li>
</ul>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>New field to select <code class="docutils literal notranslate"><span class="pre">volume_local_to_instance</span></code> option during node group
template creation will be added.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding ability to create instance and volumes on the same host;</p></li>
<li><p>Adding ability to create instance on appropriate host;</p></li>
<li><p>Updating documentation;</p></li>
<li><p>Updating UI;</p></li>
<li><p>Updating python client;</p></li>
<li><p>Adding unit tests.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation will be updated to describe when this feature can and cannot be
used, and to note how to enable the filter on the Cinder side.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="http://docs.openstack.org/developer/cinder/api/cinder.scheduler.filters.instance_locality_filter.html">http://docs.openstack.org/developer/cinder/api/cinder.scheduler.filters.instance_locality_filter.html</a></p></li>
</ul>
</section>
Tue, 17 Feb 2015 00:00:00 Add a common HBase lib in hdfs on cluster starthttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-add-hbase-sharelib.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-add-hbase-lib">https://blueprints.launchpad.net/sahara/+spec/edp-add-hbase-lib</a></p>
<p>HBase applications written in Java require JARs on the HBase classpath.
Java actions launched from Oozie may reference JARs stored in hdfs
using the <cite>oozie.libpath</cite> configuration value, but there is no
standard HBase directory location in hdfs that is installed with Oozie.</p>
<p>Users can build their own HBase directory in hdfs manually from a cluster
node but it would be convenient if Sahara provided an option to build
the directory at cluster launch time.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>HBase applications written in Java require the HBase classpath. Typically,
a Java program is run this way, using /usr/bin/hbase to get the classpath:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>java -cp `hbase classpath`:MyHbaseApp.jar MyHBaseApp
</pre></div>
</div>
<p>Java jobs launched from EDP are Oozie actions, and there is no way to set
an extra classpath value. Instead, the Oozie solution to this problem is
to create a directory of JAR files in hdfs and then set the
<cite>oozie.libpath</cite> configuration property on the job to that location.
This causes Oozie to make all of the jars in the directory available to the
job.</p>
<p>Sahara currently supports setting the <cite>oozie.libpath</cite> configuration on a job
but there is no existing collection of HBase JARs to reference. Users can log
into a cluster node and build an HBase directory in hdfs manually using bash or
Python. The steps are relatively simple:</p>
<ul class="simple">
<li><p>Run the <code class="docutils literal notranslate"><span class="pre">hbase</span> <span class="pre">classpath</span></code> command to retrieve the classpath as a string</p></li>
<li><p>Separate the string on the <strong>:</strong> character</p></li>
<li><p>Prune away all paths that do not end in <strong>.jar</strong></p></li>
<li><p>Upload all of the remaining paths to the designated directory in hdfs</p></li>
</ul>
<p>However, it would be relatively simple for Sahara to do this optionally
at cluster creation time for clusters that include HBase services.</p>
<p>Note that the same idea of a shared hdfs directory is used in two
different but related ways in Oozie:</p>
<ul class="simple">
<li><p>the <cite>Oozie sharelib</cite> is a pre-packaged collection of JARs released and
supported as part of Oozie and referenced from a job by
setting the <cite>oozie.use.system.libpath</cite> configuration parameter to <cite>True</cite>.
Sahara already sets this option for all Oozie-based jobs.
The official Oozie sharelib changes over time, and Oozie uses a timestamp
naming convention to support upgrades, multiple versions, etc.</p></li>
<li><p>the ability to create an hdfs directory containing JAR files and reference
it from a job with the <cite>oozie.libpath</cite> configuration parameter is open to
anyone. This is what is being proposed here. This change in no way touches
the official <cite>Oozie sharelib</cite>. If Oozie ever adds HBase JARs to the
system sharelib, we probably will no longer need this feature.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Create a class that can be shared by any provisioning plugins that support
installing the HBase service on a cluster. This class should provide a
method that runs remote commands on a cluster node to:</p>
<ul class="simple">
<li><p>Run the <code class="docutils literal notranslate"><span class="pre">hbase</span> <span class="pre">classpath</span></code> command to retrieve the classpath as a string</p></li>
<li><p>Separate the string on the <strong>:</strong> character</p></li>
<li><p>Prune away all paths that do not end in <strong>.jar</strong></p></li>
<li><p>Upload all of the remaining paths to a pre-determined directory in hdfs</p></li>
</ul>
<p>A code sample in the reference section below shows one method of doing this
in a Python script, which could be uploaded to the node and executed via
remote utilities.</p>
<p>The HBase hdfs directory can be fixed; it does not need to be configurable. For
example, it can be “/user/sahara-hbase-lib” or something similar. It should be
readable by the user that runs Hadoop jobs on the cluster. The EDP engine can
query this class for the location of the directory at runtime.</p>
<p>An option can be added to cluster configs that controls the creation of this
hdfs library. The default for this option should be True. If the config option
is True, and the cluster is provisioned with an HBase service, then the hdfs
HBase library should be created after the hdfs service is up and running and
before the cluster is moved to “Active”.</p>
<p>A job needs to set the <cite>oozie.libpath</cite> value to reference the library.
Setting it directly presents a few problems:</p>
<ul class="simple">
<li><p>the hdfs location needs to be known to the end user</p></li>
<li><p>it exposes more “Oozie-ness” to the end user. A lot of “Oozie-ness” has
leaked into Sahara’s interfaces already but there is no reason to
add to it.</p></li>
</ul>
<p>Instead, we should use an <cite>edp.use_hbase_lib</cite> boolean configuration
parameter to specify whether a job should use the HBase hdfs library. If this
configuration parameter is True, EDP can retrieve the hdfs location
from the utility class described above and set <cite>oozie.libpath</cite> accordingly.
Note that if for some reason an advanced user has already set <cite>oozie.libpath</cite>
to a value, the location of the HBase lib should be added to the value (which
may be a comma-separated list).</p>
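<p>The merge rule described above can be sketched as follows. This is an
assumed helper, not actual sahara code; the directory name follows the example
location suggested earlier.</p>

```python
# Sketch of merging the HBase lib location into oozie.libpath, which may
# already hold a comma-separated list set by an advanced user.
HBASE_LIB_DIR = "/user/sahara-hbase-lib"  # example fixed location from above


def merged_libpath(job_configs):
    current = job_configs.get("oozie.libpath", "")
    paths = [p for p in current.split(",") if p]
    if HBASE_LIB_DIR not in paths:
        paths.append(HBASE_LIB_DIR)
    return ",".join(paths)
```

<p>With this rule, a user-supplied <cite>oozie.libpath</cite> is extended rather
than overwritten when <cite>edp.use_hbase_lib</cite> is True.</p>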
<section id="alternatives">
<h3>Alternatives</h3>
<ul>
<li><p>Do nothing. Let users make their own shared libraries.</p></li>
<li><p>Support Oozie Shell actions in Sahara.</p>
<p>Shell actions are a much more general feature under consideration
for Sahara. Assuming they are supported, a Shell action provides
a way to launch a Java application from a script and the classpath
can be set directly without the need for an HBase hdfs library.</p>
<p>Using a Shell action would allow a user to run a script. In a script
a user would have complete control over how to launch a Java application
and could set the classpath appropriately.</p>
<p>The user experience would be a little different. Instead of just writing
a Java HBase application and launching it with <cite>edp.use_hbase_lib</cite> set to
True, a user would have to write a wrapper script and launch that as a
Shell action instead.</p>
</li>
</ul>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>We may want a simple checkbox option on the UI for Java actions
to set the <cite>edp.use_hbase_lib</cite> config so that users don’t need to
add it by hand.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>huichun</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
</section>
<section id="testing">
<h2>Testing</h2>
<p>An EDP integration test on a cluster with HBase installed would be great test
coverage for this since it involves cluster configuration.</p>
<p>Unit tests can verify that the oozie.libpath is set correctly.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We need to document how to enable creation of the shared lib at cluster
creation time, and how to configure a job to reference it.
</section>
<section id="references">
<h2>References</h2>
<p>Here is a good blog entry on Oozie shared libraries in general.</p>
<blockquote>
<div><p><a class="reference external" href="http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/">http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/</a></p>
</div></blockquote>
<p>Here is a simple script that can be used to create the lib:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>#!/usr/bin/python
import sys
import os
import subprocess


def main():
    subprocess.Popen("hadoop fs -mkdir %s" % sys.argv[1], shell=True).wait()
    cp, stderr = subprocess.Popen("hbase classpath",
                                  shell=True,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE).communicate()
    paths = cp.split(':')
    for p in paths:
        if p.endswith(".jar"):
            print(p)
            subprocess.Popen("hadoop fs -put %s %s" % (os.path.realpath(p),
                                                       sys.argv[1]),
                             shell=True).wait()


if __name__ == "__main__":
    main()
</pre></div>
</div>
</section>
Fri, 13 Feb 2015 00:00:00 Clean up clusters that are in non-final state for a long timehttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/periodic-cleanup.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/periodic-cleanup">https://blueprints.launchpad.net/sahara/+spec/periodic-cleanup</a></p>
<p>This spec introduces a periodic task to clean up old clusters stuck in a
non-final state.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently it is possible for a sahara cluster to become stuck for various
reasons (e.g. if the sahara service was restarted during provisioning or
neutron failed to assign a floating IP). This can leave clusters holding
resources for a long time. Since this can happen in any tenant, it is hard
to check for such conditions manually.</p>
<p>Related bug: <a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1185909">https://bugs.launchpad.net/sahara/+bug/1185909</a></p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add a “cleanup_time_for_nonfinal_clusters” parameter to the “periodic”
section of the configuration.</p>
<p>Based on this configuration, a periodic task will search for clusters that
are in a non-final state and weren’t updated for the given time.</p>
<p>The term “non-final” includes all cluster states except “Active” and
“Error”.</p>
<p>The “cleanup_time_for_nonfinal_clusters” parameter will be in hours. A
non-positive value will indicate that the cleanup option is disabled.</p>
<p>The default value will be 0 to keep backward compatibility (users don’t
expect that after an upgrade all their non-final clusters will be deleted).</p>
<p>The ‘updated_at’ column of the ‘clusters’ table will be used to determine
the last change. This is not 100% accurate, but good enough: this field is
updated each time the cluster status changes.</p>
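<p>The staleness check described above can be sketched as follows. This is a
minimal illustration; the function name <code>is_stale</code> and its signature are
assumptions for this sketch, not the actual sahara code.</p>

```python
from datetime import datetime, timedelta

# Non-final means any state except "Active" and "Error".
FINAL_STATES = {"Active", "Error"}


def is_stale(status, updated_at, cleanup_hours, now=None):
    """Decide whether a non-final cluster should be cleaned up.

    A non-positive cleanup_hours disables the cleanup entirely,
    which matches the backward-compatible default of 0.
    """
    if cleanup_hours <= 0:
        return False
    if status in FINAL_STATES:
        return False
    now = now or datetime.utcnow()
    return now - updated_at > timedelta(hours=cleanup_hours)
```

<p>With the default value of 0 the check never fires, so existing deployments
keep their current behaviour after an upgrade.</p>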
<section id="alternatives">
<h3>Alternatives</h3>
<p>Add such functionality to external service (e.g. Blazar).</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement feature</p></li>
<li><p>Document feature</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The feature needs to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1185909">https://bugs.launchpad.net/sahara/+bug/1185909</a></p></li>
</ul>
</section>
Mon, 26 Jan 2015 00:00:00 Refactor MapR plugin codehttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/mapr-refactor.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/mapr-refactor">https://blueprints.launchpad.net/sahara/+spec/mapr-refactor</a></p>
<p>MapR plugin’s code should be refactored to support easy addition of new
services and releases.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current plugin implementation has several weaknesses:</p>
<ul class="simple">
<li><p>Declarative nature of Service</p></li>
<li><p>Code complexity caused by the usage of plugin-spec.json</p></li>
<li><p>Almost all actions are implemented as sequences of utility function calls</p></li>
<li><p>No clear separation of behaviour into modules and classes</p></li>
<li><p>Some code is redundant</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ul class="simple">
<li><p>Extract Service entity to separate class with customizable behaviour</p></li>
<li><p>Move provisioning logic from utility modules to specialized classes</p></li>
<li><p>Remove redundant code</p></li>
</ul>
<p>The MapR plugin implementation delegates all operations to its counterparts
in the VersionHandler interface. The VersionHandler interface mimics the plugin
SPI with additional methods get_context(cluster) and get_services(). The
ClusterContext object returned by get_context wraps the cluster object passed
as an argument and provides additional information about the cluster as well
as utility methods related to the wrapped cluster.</p>
<p>Service definitions reside in the ‘sahara.plugins.mapr.services’ package
instead of plugin-spec.json, which is completely removed now. Each service
definition represents a particular version of a service.</p>
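<p>As a rough sketch of what extracting the Service entity could look like.
The class and method names here, and the Hive example, are illustrative
assumptions, not the plugin’s actual API.</p>

```python
import abc


class Service(abc.ABC):
    """One version of a provisioned service with customizable behaviour."""

    name = None
    version = None

    def get_node_processes(self):
        # Subclasses override this to declare the processes they provide.
        return []

    @abc.abstractmethod
    def configure(self, cluster_context, instances):
        """Apply service-specific configuration to the given instances."""


class Hive013(Service):
    # Hypothetical service definition for one concrete version.
    name = "hive"
    version = "0.13"

    def get_node_processes(self):
        return ["HiveServer2", "HiveMetastore"]

    def configure(self, cluster_context, instances):
        # Real logic would push configs via the cluster context.
        pass
```

<p>Each supported version of a service becomes its own subclass, which is what
replaces the data previously kept in plugin-spec.json.</p>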
<section id="alternatives">
<h3>Alternatives</h3>
<p>Leave code “as-is”</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers of subsequent versions of the MapR plugin should take these
changes into account.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>aosadchiy</p>
</dd>
<dt>Other contributors:</dt><dd><p>ssvinarchuk</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Extract Service entity to separate class with customizable behaviour</p></li>
<li><p>Move provisioning logic from utility modules to specialized classes</p></li>
<li><p>Remove redundant code</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Existing integration tests are sufficient</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Sun, 18 Jan 2015 00:00:00 Enable HDFS NameNode High Availability with HDP 2.0.6 pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/hdp-plugin-enable-hdfs-ha.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/hdp-plugin-enable-hdfs-ha">https://blueprints.launchpad.net/sahara/+spec/hdp-plugin-enable-hdfs-ha</a></p>
<p>Extend HDP 2.0.6 plugin to include the setup and configuration of the HDFS
NameNode High Availability after creating, configuring and starting the
cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Hadoop clusters are created with a single NameNode which represents a SPOF
(Single Point Of Failure). If the NameNode becomes unavailable, the whole
cluster becomes unavailable and we have to wait for the NameNode to come back
up before using the cluster again. NameNode High Availability was introduced in
Hadoop 2.0.0 and integrated into HDP as of the 2.0.0 release. The High
Availability is achieved by having two NameNodes: an active NameNode and a
standby one. When the active NameNode fails, the standby steps in and acts as the cluster’s
NameNode. NameNode’s High Availability can be configured manually on clusters
deployed with HDP 2.0.6 plugin in Sahara through Ambari, but the process is
long, tedious and error prone.</p>
<p>End users might not have the necessary skills required to setup the High
Availability and deployers might prefer to deploy highly available clusters
without manually configuring each one.</p>
<p>HDFS NameNode High Availability (Using Quorum Journal Manager (QJM)) uses
Journal nodes to share HDFS edits between the active and the standby
namenodes. The journal nodes ensure that the two namenodes have the same set of
HDFS edits, but do not ensure the automatic failover. The automatic failover
requires Zookeeper servers and Zookeeper Failover Controllers (ZKFC). A
typical cluster with HDFS NameNode High Availability uses at least three (or
an odd number greater than three) journal nodes and a zookeeper cluster with
three or five zookeeper servers. The Zookeeper Failover Controllers are
installed on the servers acting as active and standby namenodes. The setup
removes the secondary namenode (which is usually installed on a different
server than the one hosting the namenode) and installs a second namenode
process. The journal nodes and zookeeper servers can be installed on the same
servers running the active and standby (old secondary namenode) namenodes. This
leaves us with two journal nodes and two zookeeper servers. An additional
server with a journal node and a zookeeper server is all that is needed to set
up a minimally viable Hadoop cluster with HDFS NameNode High Availability. (For more info:
<a class="reference external" href="http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html">http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html</a>)</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The ‘Create Node Group Template’ wizard will introduce a new process
‘JOURNALNODE’ to the list of available processes for HDP 2.0.6 plugin. Other
processes necessary for HDFS HA are either already included in the list
(NAMENODE, ZOOKEEPER_SERVER and SECONDARY_NAMENODE) or will be automatically
setup (Zookeeper Failover Controllers).</p>
<p>The ‘Launch Cluster’ wizard for HDP 2.0.6 plugin will include a checkbox
‘Enable HDFS HA’. This option will default to False and will be added to the
cluster object.</p>
<p>The verification code will verify the necessary requirements for a Hadoop
cluster, plus a set of additional requirements when ‘Enable HDFS HA’ is set to
True. The additional requirements include:</p>
<ul class="simple">
<li><p>NAMENODE and SECONDARY_NAMENODE on different servers</p></li>
<li><p>At least three journal nodes on different servers</p></li>
<li><p>At least three zookeeper servers on different servers</p></li>
</ul>
<p>Upon successful validation the cluster will be created, and once it is in
the ‘Active’ state, if ‘Enable HDFS HA’ is True, the service will instruct the
plugin to start configuring the NameNode High Availability. The cluster will
be set to the ‘Configuring HDFS HA’ state and the plugin will start the
configuration procedure. The procedure starts with the plugin stopping all
the services and executing some preparation commands on the server with the
namenode process (through the hadoopserver objects). Then the plugin installs
and starts the journal nodes using the Ambari REST API (POST, PUT, WAIT ASYNC).
Next the configuration is updated using the Ambari REST API (PUT); other
services including Hive, Oozie and Hue might require a configuration update if
they are installed. Finally some more remote commands are executed on the
namenodes, the Zookeeper Failover Controllers are installed and started, and
the SECONDARY_NAMENODE process is deleted. The plugin will return and the
cluster will be set back to the ‘Active’ state.</p>
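<p>The Ambari REST interactions above can be sketched with plain urllib. This
is a hedged illustration: the path shapes follow Ambari’s v1 API, but the
host, cluster and node names are made up, and nothing is actually sent here.</p>

```python
import json
import urllib.request

# Hypothetical endpoint; a real deployment reads these from the cluster.
AMBARI = "http://ambari-host:8080/api/v1/clusters/cluster-1"


def build_ambari_request(method, path, body=None):
    """Build (but do not send) an Ambari REST request."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        AMBARI + path, data=data, method=method,
        headers={"X-Requested-By": "sahara"})


# POST registers the JOURNALNODE component on a host; a PUT with the
# desired state then installs (and later starts) it asynchronously.
register = build_ambari_request(
    "POST", "/hosts/node-1/host_components/JOURNALNODE")
install = build_ambari_request(
    "PUT", "/hosts/node-1/host_components/JOURNALNODE",
    {"HostRoles": {"state": "INSTALLED"}})
```

<p>The asynchronous PUT is why the plugin must wait for the request to complete
before moving to the next step of the procedure.</p>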
<section id="alternatives">
<h3>Alternatives</h3>
<p>Manual setup through Ambari web interface</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers of subsequent versions of the HDP plugin should take this option
and the added functionality into account. The procedure is unlikely to change
in newer versions of HDP, since it uses Ambari’s API, which remains stable
across versions.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>A checkbox ‘Enable HDFS HA’ in the ‘Launch Cluster’ wizard when the user
chooses HDP 2.0.6 plugin.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>abbass-marouni</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a new attribute to cluster-level configs to indicate whether HA is
enabled or not.</p></li>
<li><p>Add new service classes to HDP 2.0.6 for Journal nodes and Zookeeper Failover
Controllers</p></li>
<li><p>Add new remote methods to hdp hadoopserver.py for remote commands</p></li>
<li><p>Add new methods to generate new configurations according to cluster
configuration</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit test service classes</p></li>
<li><p>Unit test new cluster specs</p></li>
<li><p>Integration test cluster creation with HA</p></li>
<li><p>Integration test cluster creation without HA</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Update documentation to reflect new changes and to explain new options.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 14 Jan 2015 00:00:00 Indirect access to VMshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/indirect-vm-access.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/indirect-vm-access">https://blueprints.launchpad.net/sahara/+spec/indirect-vm-access</a></p>
<p>This blueprint proposes one more way for Sahara to manage VMs: management
could be done via a VM that works as a proxy node. In this case access
from the controller needs to be given to one VM only.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently there are several ways to give Sahara access to VMs:</p>
<ol class="arabic simple">
<li><p>flat private network</p>
<ul class="simple">
<li><p>not secure</p></li>
<li><p>doesn’t work with neutron</p></li>
</ul>
</li>
<li><p>floating IPs</p>
<ul class="simple">
<li><p>all nodes need to have floating IPs</p>
<ul>
<li><p>floating IPs are limited resource</p></li>
<li><p>floating IPs are usually for external world, not for access from
controller</p></li>
</ul>
</li>
<li><p>all nodes need to be accessible from controller nodes</p>
<ul>
<li><p>it is more complicated in HA mode</p></li>
</ul>
</li>
<li><p>access to data nodes should be secure</p></li>
</ul>
</li>
<li><p>net_ns</p>
<ul class="simple">
<li><p>hard to configure</p></li>
<li><p>can be inappropriate</p></li>
<li><p>doesn’t work in HA mode</p></li>
</ul>
</li>
</ol>
<ol class="arabic simple" start="4">
<li><p>tenant-specific proxy node (<a class="reference external" href="https://review.openstack.org/#/c/131142/">https://review.openstack.org/#/c/131142/</a>)</p>
<ul class="simple">
<li><p>proxy setting is for the whole system (template based)</p></li>
<li><p>proxy can’t be configured for a specific cluster</p></li>
<li><p>proxy node needs to be spawned and configured manually</p></li>
</ul>
</li>
</ol>
<ol class="arabic simple" start="5">
<li><p>agents</p>
<ul class="simple">
<li><p>not implemented yet</p></li>
<li><p>require external message queue accessible from VMs and controllers</p></li>
<li><p>require maintenance of agents</p></li>
</ul>
</li>
</ol>
<p>So, there can be cases when none of the listed approaches works.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>This blueprint proposes one more way for Sahara to access VMs.
Sahara will use one of the spawned VMs as a proxy node and gain access to all
other nodes through it. Access to the VM that has the proxy node role could be
gained using any of the methods above.</p>
<p>Sahara will determine which node to use as a proxy node by the “is_proxy”
field of the nodegroup. If this nodegroup contains several instances, the
first one will be used as the proxy (this leaves room for load balancing).</p>
<p>So, proposed workflow:</p>
<ol class="arabic simple">
<li><p>The Nodegroup object is extended with an “is_proxy” field; horizon is
changed accordingly.</p></li>
<li><p>The user selects the “is_proxy” checkbox for one of the node groups
(manager, master or a separate one). If a separate node group is used for the
proxy, it can have a really small flavor.</p></li>
<li><p>Sahara spawns all infrastructure</p></li>
<li><p>Sahara communicates with all instances via the node with the proxy role.
Internal IPs are used for communication. This removes the restriction that all
nodes must have a floating IP when a floating network is used for management;
with a proxy node this restriction applies to the proxy node only.</p></li>
</ol>
<p>Pros:</p>
<ol class="arabic simple">
<li><p>we need external access to only one VM.</p></li>
<li><p>data nodes could be isolated</p></li>
</ol>
<p>Cons:</p>
<ol class="arabic simple">
<li><p>Indirect access could be slower.</p></li>
<li><p>Loss of the proxy means loss of access to the entire cluster. Smarter
proxy selection is possible, but not planned for the first implementation.</p></li>
</ol>
<p>The implementation will extend the global proxy implemented at
<a class="reference external" href="https://review.openstack.org/#/c/131142/">https://review.openstack.org/#/c/131142/</a>. For indirect access a Paramiko-based
analogue of the “ssh proxy nc host port” command will be used. The Paramiko
implementation allows using private keys from memory.</p>
<p>Note that a proxy command can still be used to access the proxy instance.</p>
<section id="implementation-details">
<h3>Implementation details</h3>
<p>Sahara uses two ways of access to instances:</p>
<ol class="arabic simple">
<li><p>SSH</p></li>
<li><p>HTTP</p></li>
</ol>
<section id="ssh-access">
<h4>SSH access</h4>
<p>For ssh access one more layer of ssh will be added. Old code:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">_ssh</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">,</span> <span class="n">pkey</span><span class="o">=</span><span class="n">private_key</span><span class="p">,</span> <span class="n">sock</span><span class="o">=</span><span class="n">proxy</span><span class="p">)</span>
</pre></div>
</div>
<p>New code:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="n">_proxy_ssh</span> <span class="o">=</span> <span class="n">paramiko</span><span class="o">.</span><span class="n">SSHClient</span><span class="p">()</span>
<span class="n">_proxy_ssh</span><span class="o">.</span><span class="n">set_missing_host_key_policy</span><span class="p">(</span><span class="n">paramiko</span><span class="o">.</span><span class="n">AutoAddPolicy</span><span class="p">())</span>
<span class="n">_proxy_ssh</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">proxy_host</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">proxy_username</span><span class="p">,</span>
<span class="n">pkey</span><span class="o">=</span><span class="n">proxy_private_key</span><span class="p">,</span> <span class="n">sock</span><span class="o">=</span><span class="n">proxy</span><span class="p">)</span>
<span class="n">chan</span> <span class="o">=</span> <span class="n">_proxy_ssh</span><span class="o">.</span><span class="n">get_transport</span><span class="p">()</span><span class="o">.</span><span class="n">open_session</span><span class="p">()</span>
<span class="n">chan</span><span class="o">.</span><span class="n">exec_command</span><span class="p">(</span><span class="s2">"nc </span><span class="si">{0}</span><span class="s2"> </span><span class="si">{1}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">SSH_PORT</span><span class="p">))</span>
<span class="n">_ssh</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">username</span><span class="o">=</span><span class="n">username</span><span class="p">,</span> <span class="n">pkey</span><span class="o">=</span><span class="n">private_key</span><span class="p">,</span> <span class="n">sock</span><span class="o">=</span><span class="n">chan</span><span class="p">)</span>
</pre></div>
</div>
</section>
<section id="http-access">
<h4>HTTP access</h4>
<p>HTTP access will be implemented in a similar way with a ProxiedHTTPAdapter.
An SshProxySocket class will be implemented that corresponds to a netcat
socket running on the remote host.</p>
<p>Note that if a proxy command is present, it will be passed to paramiko
directly without involving the NetcatSocket class.</p>
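<p>A minimal sketch of such a socket-like wrapper over a paramiko channel
running “nc host port”. The method set shown here is an assumption: a real
adapter would need whichever socket methods the HTTP connection pool actually
calls, and a paramiko channel already provides matching send/recv primitives.</p>

```python
class SshProxySocket(object):
    """Socket-like facade over a channel running ``nc host port``."""

    def __init__(self, channel):
        # channel is expected to behave like a paramiko Channel.
        self._channel = channel

    def sendall(self, data):
        self._channel.sendall(data)

    def recv(self, nbytes):
        return self._channel.recv(nbytes)

    def makefile(self, mode, bufsize=-1):
        return self._channel.makefile(mode, bufsize)

    def close(self):
        self._channel.close()
```

<p>An HTTP adapter can then hand this object to its connection in place of a
regular TCP socket, so requests flow through the proxy node transparently.</p>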
</section>
</section>
<section id="alternatives">
<h3>Alternatives</h3>
<p>This blueprint offers one more way to access VMs. All existing ways will remain
unchanged.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>New boolean field “is_proxy” in nodegroup and nodegroup template objects.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>One more deployment option to consider.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Checkbox in nodegroup template edit form.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:</p>
<p>Andrew Lazarev (alazarev)</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Sahara core changes</p></li>
<li><p>Python client changes</p></li>
<li><p>Horizon changes</p></li>
<li><p>Doc changes</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>Global proxy implementation (<a class="reference external" href="https://review.openstack.org/#/c/131142/">https://review.openstack.org/#/c/131142/</a>)</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manually</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The feature needs to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 14 Jan 2015 00:00:00 Scenario integration tests for Saharahttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/scenario-integration-tests.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/scenario-integration-tests">https://blueprints.launchpad.net/sahara/+spec/scenario-integration-tests</a></p>
<p>For now the Sahara project does not have flexible integration tests.
We need to create new integration tests that will be more flexible.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current integration tests cover only a limited number of test scenarios
and cluster configurations, which are hardcoded in the test code. In many
cases we need to test various configurations of Sahara clusters, but the
current integration tests don’t have this functionality. We also have a lot of
copy-pasted code in the test files, and refactoring it would require
substantial work.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It is proposed to create new integration tests that will be more flexible.
Test scenarios will be defined in YAML files, and this approach is expected to
provide more flexibility in testing. A typical scenario will look like the
following:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span/><span class="nt">credentials</span><span class="p">:</span>
<span class="w"> </span><span class="nt">os_username</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dev-user</span>
<span class="w"> </span><span class="nt">os_password</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">swordfish</span>
<span class="w"> </span><span class="nt">os_tenant</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">devs</span>
<span class="w"> </span><span class="nt">os_auth_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://os_host:5000/v2.0</span>
<span class="w"> </span><span class="nt">sahara_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://sahara_host:8336/v1.1</span><span class="w"> </span><span class="c1"># optional</span>
<span class="nt">network</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">neutron</span><span class="w"> </span><span class="c1"># or nova-network</span>
<span class="w"> </span><span class="nt">private_network</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">private</span><span class="w"> </span><span class="c1"># for neutron</span>
<span class="w"> </span><span class="nt">auto_assignment_floating_ip</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span><span class="w"> </span><span class="c1"># for nova-network</span>
<span class="w"> </span><span class="nt">public_network</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">public</span><span class="w"> </span><span class="c1"># or floating_ip_pool for nova-network</span>
<span class="nt">clusters</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">plugin_name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">vanilla</span>
<span class="w"> </span><span class="nt">plugin_version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2.6.0</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">some_id</span>
<span class="w"> </span><span class="nt">node_group_templates</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span>
<span class="w"> </span><span class="nt">node_processes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">namenode</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">resourcemanager</span>
<span class="w"> </span><span class="nt">flavor_id</span><span class="p">:</span><span class="w"> </span><span class="s">'3'</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">worker</span>
<span class="w"> </span><span class="nt">node_processes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">datanode</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nodemanager</span>
<span class="w"> </span><span class="nt">flavor_id</span><span class="p">:</span><span class="w"> </span><span class="s">'3'</span>
<span class="w"> </span><span class="nt">cluster_template</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">vanilla</span>
<span class="w"> </span><span class="nt">node_group_templates</span><span class="p">:</span>
<span class="w"> </span><span class="nt">master</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">worker</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span>
<span class="w"> </span><span class="nt">scenario</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">run_jobs</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">scale</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">run_jobs</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">plugin_name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">hdp</span>
<span class="w"> </span><span class="nt">plugin_version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2.0.6</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">some_id</span>
</pre></div>
</div>
<p>A minimal scenario will look as follows:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span/><span class="nt">clusters</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">plugin_name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">vanilla</span>
<span class="w"> </span><span class="nt">plugin_version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2.6.0</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">some_id</span>
</pre></div>
</div>
<p>A full scenario will look as follows:</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span/><span class="nt">concurrency</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span>
<span class="nt">credentials</span><span class="p">:</span>
<span class="w"> </span><span class="nt">os_username</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">dev-user</span>
<span class="w"> </span><span class="nt">os_password</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">swordfish</span>
<span class="w"> </span><span class="nt">os_tenant</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">devs</span>
<span class="w"> </span><span class="nt">os_auth_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://os_host:5000/v2.0</span>
<span class="w"> </span><span class="nt">sahara_url</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">http://sahara_host:8336/v1.1</span><span class="w"> </span><span class="c1"># optional</span>
<span class="nt">network</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">neutron</span><span class="w"> </span><span class="c1"># or nova-network</span>
<span class="w"> </span><span class="nt">private_network</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">private</span><span class="w"> </span><span class="c1"># for neutron</span>
<span class="w"> </span><span class="nt">auto_assignment_floating_ip</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span><span class="w"> </span><span class="c1"># for nova-network</span>
<span class="w"> </span><span class="nt">public_network</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">public</span><span class="w"> </span><span class="c1"># or floating_ip_pool for nova-network</span>
<span class="nt">clusters</span><span class="p">:</span>
<span class="w"> </span><span class="c1"># required</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">plugin_name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">vanilla</span>
<span class="w"> </span><span class="c1"># required</span>
<span class="w"> </span><span class="nt">plugin_version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2.6.0</span>
<span class="w"> </span><span class="c1"># required (id or name)</span>
<span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">some_id</span>
<span class="w"> </span><span class="nt">node_group_templates</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span>
<span class="w"> </span><span class="nt">node_processes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">namenode</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">resourcemanager</span>
<span class="w"> </span><span class="nt">flavor_id</span><span class="p">:</span><span class="w"> </span><span class="s">'3'</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">>-</span>
<span class="w"> </span><span class="no">Some description</span>
<span class="w"> </span><span class="nt">volumes_per_node</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2</span>
<span class="w"> </span><span class="nt">volumes_size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2</span>
<span class="w"> </span><span class="nt">node_configs</span><span class="p">:</span>
<span class="w"> </span><span class="nt">HDFS</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dfs.datanode.du.reserved</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10</span>
<span class="w"> </span><span class="nt">security_groups</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">~</span>
<span class="w"> </span><span class="nt">auto_security_group</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
<span class="w"> </span><span class="nt">availability_zone</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nova</span>
<span class="w"> </span><span class="nt">volumes_availability_zone</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nova</span>
<span class="w"> </span><span class="nt">volume_type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">lvm</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">worker</span>
<span class="w"> </span><span class="nt">node_processes</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">datanode</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">nodemanager</span>
<span class="w"> </span><span class="nt">flavor_id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span>
<span class="w"> </span><span class="nt">cluster_template</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">vanilla</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">>-</span>
<span class="w"> </span><span class="no">Some description</span>
<span class="w"> </span><span class="nt">cluster_configs</span><span class="p">:</span>
<span class="w"> </span><span class="nt">HDFS</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dfs.replication</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">node_group_templates</span><span class="p">:</span>
<span class="w"> </span><span class="nt">master</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">worker</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span>
<span class="w"> </span><span class="nt">anti_affinity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
<span class="w"> </span><span class="nt">cluster</span><span class="p">:</span>
<span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">test-cluster</span>
<span class="w"> </span><span class="nt">is_transient</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span>
<span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">>-</span>
<span class="w"> </span><span class="no">Cluster description</span>
<span class="w"> </span><span class="nt">scaling</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">operation</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">resize</span>
<span class="w"> </span><span class="nt">node_group</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">worker</span>
<span class="w"> </span><span class="nt">size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">4</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">operation</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">add</span>
<span class="w"> </span><span class="nt">node_group</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">worker</span>
<span class="w"> </span><span class="nt">size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2</span>
<span class="w"> </span><span class="nt">scenario</span><span class="p">:</span><span class="w"> </span><span class="c1"># optional</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">run_jobs</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">scale</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">run_jobs</span>
<span class="w"> </span><span class="nt">edp_jobs_flow</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">example</span>
<span class="w"> </span><span class="nt">retain_resource</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"> </span><span class="c1"># optional</span>
<span class="nt">edp_jobs_flow</span><span class="p">:</span>
<span class="w"> </span><span class="nt">example</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Pig</span>
<span class="w"> </span><span class="nt">main_lib</span><span class="p">:</span>
<span class="w"> </span><span class="nt">source</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">swift</span>
<span class="w"> </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">path_to_pig_script.pig</span>
<span class="w"> </span><span class="nt">input_datasource</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">swift</span>
<span class="w"> </span><span class="nt">source</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">etc/edp-examples/edp-pig/top-todoers/data/input</span>
<span class="w"> </span><span class="nt">output_datasource</span><span class="p">:</span>
<span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">hdfs</span>
<span class="w"> </span><span class="nt">destination</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/user/hadoop/edp-output</span>
<span class="w"> </span><span class="nt">configs</span><span class="p">:</span>
<span class="w"> </span><span class="nt">dfs.replication</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Java</span>
<span class="w"> </span><span class="nt">additional_libs</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">database</span>
<span class="w"> </span><span class="nt">source</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">etc/edp-examples/.../hadoop-mapreduce-examples-2.4.1.jar</span>
<span class="w"> </span><span class="nt">configs</span><span class="p">:</span>
<span class="w"> </span><span class="nt">edp.java.main_class</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">org.apache.hadoop.examples.QuasiMonteCarlo</span>
<span class="w"> </span><span class="nt">args</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">10</span>
</pre></div>
</div>
<p>After the test scenario has been described in a YAML file, the tests are
run as usual. The Python test code will be generated from these YAML files.</p>
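<p>As a hedged sketch of the generation step (the helper name and the defaults shown here are illustrative assumptions, not Sahara's actual generator code), the parsed YAML scenario could be split into the variables that the code-generation template consumes:</p>

```python
# Illustrative sketch only: prepare_testcases and the defaults below are
# assumptions about the generation step, not Sahara's actual code. It
# splits a parsed scenario dict (as loaded from the YAML above) into
# the variables fed to the code-generation template.

def prepare_testcases(scenario):
    """Split a parsed scenario dict into template variables."""
    credentials = dict(scenario.get('credentials', {}))
    credentials.setdefault('sahara_url', None)   # marked optional in YAML
    network = dict(scenario.get('network', {}))
    testcases = []
    for cluster in scenario.get('clusters', []):
        testcase = dict(cluster)
        testcase.setdefault('retain_resources', False)
        testcases.append(testcase)
    return credentials, network, testcases
```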
<p>We will use the Mako library to generate the Python code. The generated
code will look like this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span/><span class="kn">from</span> <span class="nn">sahara.tests.scenario</span> <span class="kn">import</span> <span class="n">base</span>
<span class="k">class</span> <span class="nc">vanilla2_4_1TestCase</span><span class="p">(</span><span class="n">base</span><span class="o">.</span><span class="n">BaseTestCase</span><span class="p">):</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">setUpClass</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">vanilla2_4_1TestCase</span><span class="p">,</span> <span class="bp">cls</span><span class="p">)</span><span class="o">.</span><span class="n">setUpClass</span><span class="p">()</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">credentials</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'os_username'</span><span class="p">:</span> <span class="s1">'dev-user'</span><span class="p">,</span>
<span class="s1">'os_password'</span><span class="p">:</span> <span class="s1">'swordfish'</span><span class="p">,</span>
<span class="s1">'os_tenant'</span><span class="p">:</span> <span class="s1">'devs'</span><span class="p">,</span>
<span class="s1">'os_auth_url'</span><span class="p">:</span> <span class="s1">'http://172.18.168.5:5000/v2.0'</span><span class="p">,</span>
<span class="s1">'sahara_url'</span><span class="p">:</span> <span class="kc">None</span>
<span class="p">}</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">network</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'neutron'</span><span class="p">,</span>
<span class="s1">'public_network'</span><span class="p">:</span> <span class="s1">'net04_ext'</span><span class="p">,</span>
<span class="s1">'auto_assignment_floating_ip'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">'private_network'</span><span class="p">:</span> <span class="s1">'dev-network'</span>
<span class="p">}</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">testcase</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'image'</span><span class="p">:</span> <span class="s1">'sahara-juno-vanilla-2.4.1-ubuntu-14.04'</span><span class="p">,</span>
<span class="s1">'plugin_name'</span><span class="p">:</span> <span class="s1">'vanilla'</span><span class="p">,</span>
<span class="s1">'retain_resources'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">'class_name'</span><span class="p">:</span> <span class="s1">'vanilla2_4_1'</span><span class="p">,</span>
<span class="s1">'edp_jobs_flow'</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s1">'configs'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'dfs.replication'</span><span class="p">:</span> <span class="mi">1</span>
<span class="p">},</span>
<span class="s1">'output_datasource'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'hdfs'</span><span class="p">,</span>
<span class="s1">'destination'</span><span class="p">:</span> <span class="s1">'/user/hadoop/edp-output'</span>
<span class="p">},</span>
<span class="s1">'input_datasource'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'swift'</span><span class="p">,</span>
<span class="s1">'source'</span><span class="p">:</span>
<span class="s1">'etc/edp-examples/edp-pig/top-todoers/data/input'</span>
<span class="p">},</span>
<span class="s1">'main_lib'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'swift'</span><span class="p">,</span>
<span class="s1">'source'</span><span class="p">:</span>
<span class="s1">'etc/edp-examples/edp-pig/top-todoers/example.pig'</span>
<span class="p">},</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'Pig'</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'Java'</span><span class="p">,</span>
<span class="s1">'args'</span><span class="p">:</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span>
<span class="s1">'additional_libs'</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'database'</span><span class="p">,</span>
<span class="s1">'source'</span><span class="p">:</span>
<span class="s1">'etc/edp-examples/hadoop2/edp-java/'</span>
<span class="s1">'hadoop-mapreduce-examples-2.4.1.jar'</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s1">'configs'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'edp.java.main_class'</span><span class="p">:</span>
<span class="s1">'org.apache.hadoop.examples.QuasiMonteCarlo'</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s1">'scenario'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'run_jobs'</span><span class="p">,</span> <span class="s1">'scale'</span><span class="p">,</span> <span class="s1">'run_jobs'</span><span class="p">],</span>
<span class="s1">'plugin_version'</span><span class="p">:</span> <span class="s1">'2.4.1'</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">test_plugin</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">create_cluster</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_run_jobs</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_scale</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_run_jobs</span><span class="p">()</span>
<span class="k">class</span> <span class="nc">hdp2_0_6TestCase</span><span class="p">(</span><span class="n">base</span><span class="o">.</span><span class="n">BaseTestCase</span><span class="p">):</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">setUpClass</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
<span class="nb">super</span><span class="p">(</span><span class="n">hdp2_0_6TestCase</span><span class="p">,</span> <span class="bp">cls</span><span class="p">)</span><span class="o">.</span><span class="n">setUpClass</span><span class="p">()</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">credentials</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'os_username'</span><span class="p">:</span> <span class="s1">'dev-user'</span><span class="p">,</span>
<span class="s1">'os_password'</span><span class="p">:</span> <span class="s1">'swordfish'</span><span class="p">,</span>
<span class="s1">'os_tenant'</span><span class="p">:</span> <span class="s1">'devs'</span><span class="p">,</span>
<span class="s1">'os_auth_url'</span><span class="p">:</span> <span class="s1">'http://172.18.168.5:5000/v2.0'</span><span class="p">,</span>
<span class="s1">'sahara_url'</span><span class="p">:</span> <span class="kc">None</span>
<span class="p">}</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">network</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'neutron'</span><span class="p">,</span>
<span class="s1">'public_network'</span><span class="p">:</span> <span class="s1">'net04_ext'</span><span class="p">,</span>
<span class="s1">'auto_assignment_floating_ip'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">'private_network'</span><span class="p">:</span> <span class="s1">'dev-network'</span>
<span class="p">}</span>
<span class="bp">cls</span><span class="o">.</span><span class="n">testcase</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'image'</span><span class="p">:</span> <span class="s1">'f3c4a228-9ba4-41f1-b100-a0587689d4dd'</span><span class="p">,</span>
<span class="s1">'plugin_name'</span><span class="p">:</span> <span class="s1">'hdp'</span><span class="p">,</span>
<span class="s1">'retain_resources'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
<span class="s1">'class_name'</span><span class="p">:</span> <span class="s1">'hdp2_0_6'</span><span class="p">,</span>
<span class="s1">'edp_jobs_flow'</span><span class="p">:</span> <span class="kc">None</span><span class="p">,</span>
<span class="s1">'scenario'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'run_jobs'</span><span class="p">,</span> <span class="s1">'scale'</span><span class="p">,</span> <span class="s1">'run_jobs'</span><span class="p">],</span>
<span class="s1">'scaling'</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s1">'operation'</span><span class="p">:</span> <span class="s1">'resize'</span><span class="p">,</span>
<span class="s1">'size'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s1">'node_group'</span><span class="p">:</span> <span class="s1">'worker'</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s1">'plugin_version'</span><span class="p">:</span> <span class="s1">'2.0.6'</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">test_plugin</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">create_cluster</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_run_jobs</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_scale</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">check_run_jobs</span><span class="p">()</span>
</pre></div>
</div>
<p>The Mako template will look as follows:</p>
<div class="highlight-mako notranslate"><div class="highlight"><pre><span/><span class="x">from sahara.tests.scenario import base</span>
<span class="cp">%</span> <span class="k">for</span> <span class="n">testcase</span> <span class="ow">in</span> <span class="n">testcases</span><span class="p">:</span>
<span class="x"> </span><span class="cp">${</span><span class="n">make_testcase</span><span class="p">(</span><span class="n">testcase</span><span class="p">)</span><span class="cp">}</span>
<span class="cp">%</span><span class="k"> endfor</span>
<span class="cp"><%</span><span class="nb">def</span> <span class="na">name=</span><span class="s">"make_testcase(testcase)"</span><span class="cp">></span>
<span class="x">class </span><span class="cp">${</span><span class="n">testcase</span><span class="p">[</span><span class="s1">'class_name'</span><span class="p">]</span><span class="cp">}</span><span class="x">TestCase(base.BaseTestCase):</span>
<span class="x"> @classmethod</span>
<span class="x"> def setUpClass(cls):</span>
<span class="x"> super(</span><span class="cp">${</span><span class="n">testcase</span><span class="p">[</span><span class="s1">'class_name'</span><span class="p">]</span><span class="cp">}</span><span class="x">TestCase, cls).setUpClass()</span>
<span class="x"> cls.credentials = </span><span class="cp">${</span><span class="n">credentials</span><span class="cp">}</span>
<span class="x"> cls.network = </span><span class="cp">${</span><span class="n">network</span><span class="cp">}</span>
<span class="x"> cls.testcase = </span><span class="cp">${</span><span class="n">testcase</span><span class="cp">}</span>
<span class="x"> def test_plugin(self):</span>
<span class="x"> self.create_cluster()</span>
<span class="w"> </span><span class="cp">%</span> <span class="k">for</span> <span class="n">check</span> <span class="ow">in</span> <span class="n">testcase</span><span class="p">[</span><span class="s1">'scenario'</span><span class="p">]:</span>
<span class="x"> self.check_</span><span class="cp">${</span><span class="n">check</span><span class="cp">}</span><span class="x">()</span>
<span class="w"> </span><span class="cp">%</span><span class="k"> endfor</span>
<span class="cp"></%</span><span class="nb">def</span><span class="cp">></span>
</pre></div>
</div>
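<p>The template above interpolates a <code class="docutils literal notranslate"><span class="pre">class_name</span></code> value for each testcase. A minimal sketch of how that value could be derived (the helper name is an assumption; the naming itself matches the generated examples, where <code class="docutils literal notranslate"><span class="pre">vanilla</span></code> plus <code class="docutils literal notranslate"><span class="pre">2.4.1</span></code> yields <code class="docutils literal notranslate"><span class="pre">vanilla2_4_1</span></code>):</p>

```python
# Hedged sketch: derive the 'class_name' template variable from the
# plugin name and version. The function name is illustrative; dots in
# the version are replaced so the result is a valid Python identifier,
# which the Mako template then suffixes with 'TestCase'.

def make_class_name(plugin_name, plugin_version):
    """Build a valid Python identifier for the generated test class."""
    return plugin_name + plugin_version.replace('.', '_')
```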
<p>By default, concurrency will be equal to the number of CPU cores. This
value can be changed in the YAML file.</p>
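<p>This default could be sketched as follows (the helper name is an illustrative assumption, not part of Sahara's code):</p>

```python
# Sketch of the stated default: fall back to the CPU count when the
# YAML file sets no 'concurrency' value. get_concurrency is an
# illustrative name used only for this example.
import multiprocessing

def get_concurrency(scenario):
    """Return the configured concurrency, defaulting to the CPU count."""
    return scenario.get('concurrency') or multiprocessing.cpu_count()
```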
<p>We are going to use the new integration tests for CI as soon as they are
fully implemented.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>We could continue to use the current integration tests, but they do not
provide sufficient coverage of Sahara use cases.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers will be able to test their changes in Sahara much more
effectively.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>sreshetniak</p>
</dd>
<dt>Other contributors:</dt><dd><p>ylobankov
slukjanov</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will be the following:</p>
<ul class="simple">
<li><p>Add python code to sahara/tests</p></li>
<li><p>Add examples of test scenarios</p></li>
<li><p>Add documentation for new integration tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>Mako</p></li>
<li><p>tempest-lib</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>We need to add a note about new tests in Sahara documentation.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 14 Jan 2015 00:00:00 Spec - Add support for editing templateshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/support-template-editing.html
<p>Blueprints:</p>
<p>Sahara Service: <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/support-template-editing">https://blueprints.launchpad.net/sahara/+spec/support-template-editing</a></p>
<p>Horizon UI: <a class="reference external" href="https://blueprints.launchpad.net/horizon/+spec/data-processing-edit-templates">https://blueprints.launchpad.net/horizon/+spec/data-processing-edit-templates</a></p>
<p>Currently, once a node group template or a cluster template has been
created, the only operations available are “copy” or “delete”. That has
unfortunate consequences on a user’s workflow when they want to change a
template. For example, in order to make even a small change to a node group
template that is used by a cluster template, the user must make a copy of
the node group template, change it, save it, and then create a new (or copy)
cluster template that uses the new node group template. Ideally,
a user could just edit the template in place and move on.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara currently lacks implementation for the update operation for both node
group and cluster templates. In order to provide an implementation for
these operations, the following issues must be addressed.</p>
<p>1. What to do with existing clusters that were built using a template that
is being edited? Would we automatically scale a cluster if a cluster template
was changed in a way that added/removed nodes? Would we shut down or start up
processes if a node group template changed to add/remove processes?</p>
<p>I propose that for this iteration, a user may not change any templates that
were used to build a currently running cluster. This is consistent with the
rule currently in place that disallows deletion of templates that are used by
a currently active cluster. A future iteration could remove that restriction
for both delete and edit. A user could still make a copy of a template and
edit the copy if they wanted to start a new cluster with the altered version
of the template.</p>
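<p>The proposed restriction mirrors the existing delete-time check. A minimal
sketch of that validation is shown below; the dictionary shape and the
<code>node_group_template_ids</code> key are hypothetical names used only for
illustration, not Sahara’s actual data model:</p>

```python
def check_template_editable(template_id, clusters):
    """Refuse edits to a template referenced by any active cluster.

    `clusters` is a list of dicts; the "node_group_template_ids" key is
    a hypothetical field name used here only for illustration.
    """
    in_use = [c["name"] for c in clusters
              if template_id in c.get("node_group_template_ids", [])]
    if in_use:
        raise ValueError("Template %s is in use by cluster(s): %s"
                         % (template_id, ", ".join(in_use)))
```

<p>The same helper could later be relaxed (together with the delete rule) if a
future iteration allows editing in-use templates.</p>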
<p>2. Make sure that all cluster templates that are dependent upon an edited
node group template will pick up the changes to the node group template.</p>
<p>This is only a problem if we go the route of allowing templates used by an
active cluster to be edited. In that case, we would need to change the
existing templates to reference the newly created template.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The put (update) method will be implemented for both node group and cluster
templates. Currently, the stubs are there, but they just return “Not
Implemented”.</p>
<p>I think the simplest method of sending an updated template is to send the
entire new template rather than just a diff. Doing it this way could be
slightly inefficient, but it avoids the need to build diff logic into the UI.
The current client library methods for update() seem to anticipate that sort
of implementation. If we switched to sending a diff, we might need to adjust
the client library.</p>
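<p>Since the client sends the entire template, the UI side reduces to merging
the user’s edits into the fetched template and stripping server-managed
fields. A rough sketch; the helper and the read-only field names are
illustrative, not part of the actual client:</p>

```python
def build_update_body(existing, edits):
    """Produce the full template body for an update, rather than a diff."""
    body = dict(existing)
    body.update(edits)
    # server-managed fields should not be echoed back on update
    # (field names here are assumptions for illustration)
    for field in ("id", "created_at", "updated_at"):
        body.pop(field, None)
    return body
```
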
<section id="alternatives">
<h3>Alternatives</h3>
<p>Updates could be achieved by doing a delete/add combination,
but that is not a very good solution to this problem since any given node
group or cluster template could be referenced by another object. Deleting
and adding would require updating each referencing object.</p>
<p>If we need to be able to edit templates that are used by an active cluster,
the edited version of the template will retain the ID and name of the
original template. Prior to overwriting the original template,
it will be saved with a new ID and some form of “dotted” name to
indicate the version (“origname.1”). All running clusters would be changed
to reference the original template with the new ID.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>N/A</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>N/A.
The update operations are already defined in the API, but they are not yet
implemented.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The Horizon UI will be updated to include “Edit” as an operation for each
node group and cluster template. The edit button will appear in the list of
operations in each row of the table.</p>
<p>The python-saharaclient library already defines the update() methods that
will be used for implementing this feature in the Horizon UI. No changes to
python-saharaclient are anticipated to support this feature.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>N/A</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>N/A</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>N/A</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Horizon will be updated to include an “edit” button in each row of both the
node group and cluster templates tables. That edit button will bring up the
edit form (essentially the “create” form, but with all the values
pre-populated from the existing template). Clicking on “Save” from the edit
form will result in a call to the node group/cluster template update()
method in the python-saharaclient library.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>croberts</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Horizon UI: croberts
Sahara service: tmckay</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>N/A</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>There will be unit and integration tests added in Sahara.</p>
<p>Horizon will have a test added to the node group and cluster template panels
to verify that the appropriate forms are generated when edit is chosen.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The Sahara UI user guide should be updated to note the availability of edit
functionality for templates.</p>
</section>
<section id="references">
<h2>References</h2>
<p>N/A</p>
</section>
Wed, 14 Jan 2015 00:00:00 Better Version Management in Cloudera Pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/cdh-version-management.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-version-management">https://blueprints.launchpad.net/sahara/+spec/cdh-version-management</a></p>
<p>This specification proposes a better way to manage different versions of CDH
in the Cloudera plugin.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Current situation of CDH version management in the CDH plugin:</p>
<ul class="simple">
<li><p>We do not have any version control for the CDH plugin.</p></li>
<li><p>Due to the backward-support requirement, all existing code needs to
support CDH versions from 5.0.0 to the latest (currently 5.3.0).</p></li>
</ul>
<p>This causes the following issues:</p>
<ul class="simple">
<li><p>When new packages are released, we do not know whether our plugin still
works. In fact nobody can ensure that.</p></li>
<li><p>Our work has to be based on a non-static environment.</p></li>
<li><p>Backward compatibility is desired, but it is not ensured at all. For
example, we cannot be sure that a newly released CDH package will be
compatible with old plugin code or configuration files.</p></li>
<li><p>CI test results are not stable, so we cannot always make them voting.</p></li>
<li><p>If a newly released package version brings issues, it can block all
developers, even those who do not work on that version.</p></li>
<li><p>When we no longer want to support an obsolete version, we cannot easily
remove it.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<ul>
<li><p>Add package version constraints, and provide a list of key package
versions to support CDH 5.3.0 (or 5.2.0), basing our development only on
this.</p></li>
<li><p>Freeze the package versions to prevent later CDH releases from slipping
in.</p></li>
<li><p>Add sub-directories in plugin/cdh, like 5_3_0 for CDH 5.3.0, to support
different versions of CDH, as we did for the vanilla and hdp plugins.</p></li>
<li><p>We also need to do some modifications in sahara-image-elements project to
support different versions of CDH image build.</p></li>
<li><p>We need to change the sahara-ci-config project to break the current cdh
tests into several tests for different CDH versions. Newly added versions must
not break the tests for older versions. For example, the current
gate-sahara-nova-direct-cdh_ubuntu-aio will be separated into
sahara-nova-direct-cdh_5.0_ubuntu-aio and
sahara-nova-direct-cdh_5.3.0_ubuntu-aio.</p></li>
<li><p>Break sahara/tests/integration/tests/gating/test_cdh_gating.py into
several files, like test_cdh5_0_0_gating.py and test_cdh5_3_0_gating.py to
support different CDH versions.</p></li>
<li><p>We need not ensure backward compatibility. E.g., CDH 5.3.0 code is not
required to work for CDH 5.0.0. We may add some features and code only for a
later version of CDH that were not supported by former CDH versions.</p></li>
<li><p>If we want to add more CDH versions in the future, we need to open a BP
to do this. For example, if we want to support CDH 5.4.0, the following work
is required:</p>
<blockquote>
<div><ul class="simple">
<li><p>Add a directory 5_4_0 including code to support CDH 5.4.0 in
plugin/cdh.</p></li>
<li><p>Modify scripts in sahara-image-elements/elements/hadoop-cloudera to
add functions to install packages for CDH 5.4.0, while still keeping the
functions installing packages for former CDH versions.</p></li>
<li><p>Add a test_cdh5_4_0_gating.py in sahara/tests/integration/tests/gating
directory for integration test.</p></li>
<li><p>Add a new ci-test item in sahara-ci-config, like
gate-sahara-nova-direct-cdh_5.4.0_ubuntu-aio. We can set this item as
non-voting at the beginning, and after it is tuned and verified ok, we
set it back to voting.</p></li>
</ul>
</div></blockquote>
</li>
</ul>
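<p>The per-version sub-directories imply a dispatch layer similar to the one
the vanilla plugin uses: a factory that maps a version string to the handler
implemented in the matching package. A minimal sketch under that assumption
(the class and method names are illustrative, not the plugin’s actual API):</p>

```python
class VersionFactory:
    """Map a CDH version string to its version-specific handler."""

    def __init__(self):
        self._versions = {}

    def register(self, version, handler):
        # e.g. register("5.3.0", handler) for code living in plugin/cdh/5_3_0
        self._versions[version] = handler

    def get_version_handler(self, version):
        try:
            return self._versions[version]
        except KeyError:
            raise ValueError("Unsupported CDH version: %s" % version)
```

<p>Adding a new version then becomes a matter of dropping in a new package and
registering it, without touching the handlers for existing versions.</p>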
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ken chen</p>
</dd>
<dt>Other contributors:</dt><dd><p>ken chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items will include:</p>
<ul class="simple">
<li><p>Change the directory structure in sahara/sahara/plugins/cdh to add
5_0_0 and 5_3_0 for CDH 5.0.0 and CDH 5.3.0 (we will support those two
versions as the first step).</p></li>
<li><p>Retain common code for each version in the original place, and move
version-specific code and files into their own directories.</p></li>
<li><p>Add codes in sahara-image-elements/elements/hadoop-cloudera to support
installing different package groups for different CDH versions.</p></li>
<li><p>Add different test items in the sahara/tests/integration/tests/gating
directory to support different CDH versions.</p></li>
<li><p>Add items in the sahara-ci-config project for different CDH versions.
At first we mark them as non-voting; after they are verified, we can mark them
as voting.</p></li>
<li><p>Test and evaluate the change.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Create clusters of different CDH versions one by one, and do integration tests
for each of them.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 07 Jan 2015 00:00:00 Support multi-worker Sahara API deploymenthttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/sahara-api-workers.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-api-workers">https://blueprints.launchpad.net/sahara/+spec/sahara-api-workers</a></p>
<p>Add support of multi-worker deployment of Sahara API.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently the Sahara API uses one thread with a wsgi application. This means
that the API can service only one request at a time. Some requests (e.g.
cluster scale) require synchronous requests to the Sahara engine (using the
message queue). This means that the Sahara API will not be able to service
other requests until the full round trip has finished.</p>
<p>Also, a multi-threaded solution gives many more options for performance
tuning. It would allow utilizing more of the server&#8217;s CPU power.</p>
<p>Most OpenStack services support several workers for their API.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The ideal solution would be a migration to Pecan and WSME (
<a class="reference external" href="https://etherpad.openstack.org/havana-common-wsgi">https://etherpad.openstack.org/havana-common-wsgi</a>) with multi-threading
support. However, this would require a lot of work and there is not much
pressure to do it.</p>
<p>This spec suggests a simple solution to the specific problem without much
refactoring of existing code.</p>
<p>So, the solution is to:</p>
<ol class="arabic simple">
<li><p>Leave the current wsgi implementation</p></li>
<li><p>Leave the current socket handling</p></li>
<li><p>Run the wsgi server in several threads/processes</p></li>
<li><p>Implement only child-process management, leaving all existing code as
is</p></li>
</ol>
<p>Child-process management will include:</p>
<ol class="arabic simple">
<li><p>Handling of children processes, restart of dead processes</p></li>
<li><p>Proper signals handling (see <a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1276694">https://bugs.launchpad.net/sahara/+bug/1276694</a>)</p></li>
<li><p>Graceful shutdown</p></li>
<li><p>Support of debug mode (with green threads instead of real threads)</p></li>
</ol>
<p>Things that will NOT be included:</p>
<ul class="simple">
<li><p>Config reload / API restart</p></li>
</ul>
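<p>The child-process management loop can be sketched as follows. This is an
illustrative skeleton only (the class and callback names are made up); a real
implementation would wire it to fork(), the SIGCHLD/SIGTERM handlers, and the
existing wsgi server code:</p>

```python
import signal


class WorkerPool:
    """Keep a fixed number of worker processes alive; stop respawning
    once graceful shutdown has been requested."""

    def __init__(self, spawn, count):
        self._spawn = spawn   # callable that forks a worker and returns its pid
        self._want = count    # would come from the new config option
        self._pids = set()
        self._running = True

    def start(self):
        while len(self._pids) < self._want:
            self._pids.add(self._spawn())

    def on_child_exit(self, pid):
        # called from the SIGCHLD / waitpid() loop: restart dead workers
        self._pids.discard(pid)
        if self._running:
            self._pids.add(self._spawn())

    def shutdown(self, kill, sig=signal.SIGTERM):
        # graceful shutdown: stop respawning, then signal each worker
        self._running = False
        for pid in list(self._pids):
            kill(pid, sig)
```

<p>Injecting the spawn and kill callables keeps the restart logic testable
without actually forking processes.</p>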
<section id="alternatives">
<h3>Alternatives</h3>
<p>Migrate to Pecan and WSME first.</p>
</section>
<section id="implementation-details">
<h3>Implementation details</h3>
<p>Most OpenStack services use the deprecated oslo wsgi module. It is tightly
coupled to the oslo services module.</p>
<p>So, there are three options here:</p>
<ol class="arabic simple">
<li><p>Use the deprecated oslo wsgi module. (Bad, since the module is
deprecated.)</p></li>
<li><p>Use the oslo services module, but write all the wsgi machinery ourselves
(or copy it from another project).</p></li>
<li><p>Write the minimum code needed to start the server multi-worker (e.g. see
how it is implemented in Heat).</p></li>
</ol>
<p>I propose going with option 3. There is not much sense in spending resources
on code that will be replaced anyway (we will definitely migrate to Pecan some
day).</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>One more configuration parameter.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement feature</p></li>
<li><p>Document feature</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manual testing. The CI could probably be changed to run different tests in
different modes.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The new feature needs to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Wed, 07 Jan 2015 00:00:00 HTTPS support for sahara-apihttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/sahara-support-https.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/sahara-support-https">https://blueprints.launchpad.net/sahara/+spec/sahara-support-https</a></p>
<p>Most OpenStack services support running a server that accepts HTTPS
connections. Sahara should support this as well.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There are two common ways to enable HTTPS for the OpenStack service:</p>
<ol class="arabic simple">
<li><p>TLS proxy. The proxy communicates with the user via HTTPS and forwards
all requests to the service via unsecured HTTP. Keystone is configured to
point at the HTTPS port. The internal port is usually closed to the outside
using a firewall. No additional work to support HTTPS is required on the
service side.</p></li>
<li><p>Native support. The service can be configured to expect HTTPS
connections. In this case the service handles all security aspects by
itself.</p></li>
</ol>
<p>Most OpenStack services support both ways of enabling SSL. Sahara can
currently be secured only using a TLS proxy.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add ability to Sahara API to listen on HTTPS port.</p>
<p>Currently there is no unified way for OpenStack services to work with HTTPS.
The process of unification started with the sslutils module in oslo-incubator.
Sahara could use this module to stay aligned with other services.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Copy-paste SSL-related code from other OpenStack project.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<ul class="simple">
<li><p>python-saharaclient should support SSL-related options</p></li>
</ul>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>One more option to consider.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Add SSL-related parameters to pass to python client.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement feature in Sahara</p>
<ul>
<li><p>Import sslutils</p></li>
<li><p>Configure the WSGI server for HTTPS</p></li>
</ul>
</li>
<li><p>Add support to python-saharaclient</p></li>
<li><p>Add support to devstack</p></li>
<li><p>Add documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>sslutils module from oslo-incubator</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Devstack doesn’t have HTTPS testing for now. It looks like manual testing is
the only option.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The new option needs to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 06 Jan 2015 00:00:00 Spark Temporary Job Data Retention and Cleanuphttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/spark-cleanup.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-cleanup">https://blueprints.launchpad.net/sahara/+spec/spark-cleanup</a></p>
<p>Creates a configurable cron job at cluster configuration time to clean up data
from Spark jobs, in order to ease maintenance of long-lived clusters.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The current Spark plugin stores data from any job run in the /tmp directory,
without an expiration policy. While this is acceptable for short-lived
clusters, it increases maintenance on long-running clusters, which are likely
to run out of space in time. A mechanism to automatically clear space is
needed.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>On the creation of any new Spark cluster, a script (from
sahara.plugins.spark/resources) will be templated with the following
variables (which will be defined in Spark’s config_helper module and thus
defined per cluster):</p>
<ul class="simple">
<li><p>Minimum Cleanup Seconds</p></li>
<li><p>Maximum Cleanup Seconds</p></li>
<li><p>Minimum Cleanup Megabytes</p></li>
</ul>
<p>That script will then be pushed to /etc/hadoop/tmp_cleanup.sh. In the
following cases, no script will be pushed:</p>
<ol class="arabic simple">
<li><p>Maximum Cleanup Seconds is 0 (or less)</p></li>
<li><p>Minimum Cleanup Seconds and Minimum Cleanup Megabytes are both 0 (or less)</p></li>
</ol>
<p>Also at cluster configuration time, a cron job will be created to run this
script once per hour.</p>
<p>This script will iterate over each extant job directory on the cluster; if it
finds one older than Maximum Cleanup Seconds, it will delete that directory.
It will then check the size of the set of remaining directories. If there is
more data than Minimum Cleanup Megabytes, then it will delete directories
older than Minimum Cleanup Seconds, starting with the oldest, until the
remaining data is smaller than Minimum Cleanup Megabytes or no sufficiently
aged directories remain.</p>
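<p>The retention policy can be expressed compactly. Below is a Python sketch of
the decision logic that the shell script will implement; the function name and
its argument layout are illustrative only:</p>

```python
def dirs_to_delete(job_dirs, now, max_age, min_age, min_mb):
    """job_dirs: iterable of (path, mtime, size_mb) tuples.

    Returns the directories a cleanup run would remove: everything older
    than max_age, then (while total size still exceeds min_mb) the oldest
    directories older than min_age.
    """
    doomed = [p for p, mtime, _ in job_dirs if now - mtime > max_age]
    remaining = [(p, m, s) for p, m, s in job_dirs if p not in doomed]
    total = sum(s for _, _, s in remaining)
    for path, mtime, size in sorted(remaining, key=lambda d: d[1]):
        if total <= min_mb:
            break
        if now - mtime > min_age:
            doomed.append(path)
            total -= size
    return doomed
```
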
<section id="alternatives">
<h3>Alternatives</h3>
<p>Any number of more complex schemes could be developed to address this problem,
including per-job retention information, data priority assignment (to
effectively create a priority queue for deletion), and others. The above plan,
however, while it does allow for individual cluster types to have individual
retention policies, does not demand excessive maintenance or interface with
that policy after cluster creation, which will likely be appropriate for most
users. A complex retention and archival strategy exceeds the intended scope of
this convenience feature, and could easily become an entire project.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None; all new data will be stored as cluster configuration.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None; the operative Spark cluster template configuration parameters will be
documented. The current interface allows this change.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None. (config_helper.py variables will be automatically represented in
Horizon.)</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Creation of periodic job</p></li>
<li><p>Creation of deletion script</p></li>
<li><p>Testing</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Because this feature is entirely Sahara-internal and requires only a remote
shell connection to the Spark cluster (without which many, many other tests
would fail), I believe that Tempest tests of this feature are unnecessary.
Unit tests should be sufficient to cover this feature.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The variables used to set retention policy will need to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/spark-cleanup">https://blueprints.launchpad.net/sahara/+spec/spark-cleanup</a></p>
</section>
Wed, 31 Dec 2014 00:00:00 CDH HBase Supporthttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/cdh-hbase-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-hbase-support">https://blueprints.launchpad.net/sahara/+spec/cdh-hbase-support</a></p>
<p>This specification proposes to add HBase support for CDH Plugin in Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>There is no HBase support in the current cdh plugin, but Cloudera Manager
supports installing this service in a cluster. HBase is a non-relational,
distributed database that provides BigTable-like capabilities on top of
Hadoop. This service should be supported in the Sahara cdh plugin.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation will support CDH 5.0.0.
Supported features:</p>
<ul class="simple">
<li><p>Install HBase processes in a CDH cluster using cm-api</p></li>
<li><p>Zookeeper must be selected and launched in the cluster first</p></li>
<li><p>Support an HMaster and multiple HRegion processes in a cluster</p></li>
<li><p>Support most configuration parameters in a CDH cluster</p></li>
</ul>
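<p>These constraints (Zookeeper present, HMaster and HRegion processes
deployed) translate into a topology validation along the following lines. The
process name strings and the node-group dictionary shape are assumptions for
illustration, not the plugin’s actual identifiers:</p>

```python
def validate_hbase_topology(node_groups):
    """node_groups: list of {"count": int, "processes": [str, ...]}."""
    counts = {}
    for ng in node_groups:
        for proc in ng["processes"]:
            counts[proc] = counts.get(proc, 0) + ng["count"]
    if counts.get("ZOOKEEPER_SERVER", 0) < 1:
        raise ValueError("HBase requires a running Zookeeper service")
    if counts.get("HBASE_MASTER", 0) < 1:
        raise ValueError("At least one HMaster is required")
    if counts.get("HBASE_REGIONSERVER", 0) < 1:
        raise ValueError("At least one HRegion server is required")
```
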
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The end users need to select HMaster and HRegion process in node group
templates.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>It will be required to install the necessary HBase package in the cdh image.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>lu huichun</p>
</dd>
<dt>Other contributors:</dt><dd><p>weiting-chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items can be divided to several parts:</p>
<ul class="simple">
<li><p>Investigate the HBase service in a CDH cluster via Cloudera Manager (CM)</p></li>
<li><p>Leverage the CM-API client to call functions to install HBase via CM</p></li>
<li><p>Test and Evaluate the Concept</p></li>
<li><p>Implement Source Code in Sahara cdh plugin</p></li>
<li><p>Test the Code</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Zookeeper process must be installed first in the cluster. HBase needs to use
Zookeeper service.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Write unit tests to cover the basic configuration. An integration test with
cluster creation is also required.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Add HBase in the list and add more information about the configuration.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/">http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/</a></p></li>
</ul>
</section>
Wed, 24 Dec 2014 00:00:00 CDH Zookeeper Supporthttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/cdh-zookeeper-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-zookeeper-support">https://blueprints.launchpad.net/sahara/+spec/cdh-zookeeper-support</a></p>
<p>This specification proposes to add Zookeeper support for CDH Plugin in Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently, Zookeeper isn’t supported in the cdh plugin. Zookeeper is a
centralized service providing functions like maintaining configuration,
distributed synchronization, and group services. It’s important to have
Zookeeper to prevent data loss and to avoid a single point of failure (SPoF)
in a cluster. It has become a basic service when deploying a hadoop cluster
in a CDH environment.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation will support CDH 5.0.0.
Supported features:</p>
<ul class="simple">
<li><p>Install the Zookeeper service in a CDH cluster</p></li>
<li><p>Support running a standalone Zookeeper in a cluster</p></li>
<li><p>Support running replicated Zookeepers (multiple servers) in a cluster</p></li>
<li><p>An option to select Zookeeper in the node group templates</p></li>
<li><p>Support most configuration parameters in a CDH cluster</p></li>
</ul>
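<p>For the replicated case, Zookeeper quorum rules suggest a count validation:
one server is a valid standalone deployment, while a replicated ensemble
should use an odd number of servers so that a majority quorum survives a node
loss. A sketch of such a check (the function is illustrative, not part of the
plugin today):</p>

```python
def validate_zookeeper_count(count):
    """Allow standalone (1) or a replicated ensemble (odd number >= 3)."""
    if count < 1:
        raise ValueError("At least one Zookeeper server is required")
    if count > 1 and count % 2 == 0:
        raise ValueError("A replicated Zookeeper ensemble should have an "
                         "odd number of servers, got %d" % count)
```
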
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The end users need to select Zookeeper process in their node group templates.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>It will be required to put the necessary packages in the cdh image.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>ken chen</p>
</dd>
<dt>Other contributors:</dt><dd><p>weiting-chen</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The work items can be divided to several parts:</p>
<ul class="simple">
<li><p>Investigate the Zookeeper service in a CDH cluster via Cloudera Manager (CM)</p></li>
<li><p>Leverage the CM-API client to call functions to install Zookeeper via CM</p></li>
<li><p>Test and Evaluate the Concept</p></li>
<li><p>Implement Source Code in Sahara cdh plugin</p></li>
<li><p>Test the Code</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Write unit tests to cover the basic configuration. An integration test with
cluster creation is also required.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Add Zookeeper in the list and add more information about the configuration.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/">http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/</a></p></li>
</ul>
</section>
Wed, 24 Dec 2014 00:00:00 Remove support of Hadoop 2.3.0 in Vanilla pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/drop-hadoop-2.3-support.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/drop-hadoop-2-3-support">https://blueprints.launchpad.net/sahara/+spec/drop-hadoop-2-3-support</a></p>
<p>At the Atlanta design summit it was decided to support previous Hadoop
versions in the Vanilla plugin for one OpenStack release cycle only.
So it is time to remove support for Hadoop 2.3.0.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Current Sahara code contains Vanilla Hadoop 2.3.0 in sources.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The proposed change is to remove all Hadoop 2.3 and any related mentions
from Sahara code and subprojects.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users will no longer be able to deploy Hadoop 2.3.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>Hadoop 2.3 related elements should be removed</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:
Sergey Reshetnyak (sreshetniak)</p>
<p>Other Assignees:
Sergey Lukjanov (slukjanov) - dib cleanup</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Remove Vanilla plugin 2.3.0 from Sahara sources</p></li>
<li><p>Remove DIB stuff related to 2.3.0</p></li>
<li><p>Clear unit/integration tests</p></li>
<li><p>Replace EDP examples with newest version of hadoop-examples</p></li>
<li><p>Documentation changes</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<ul class="simple">
<li><p>Documentation must be updated accordingly</p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 24 Dec 2014 00:00:00 Exceptions improvementhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/exceptions-improvement.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/exceptions-improvement">https://blueprints.launchpad.net/sahara/+spec/exceptions-improvement</a></p>
<p>This specification proposes to add identifiers for every raised Sahara
exception, so that they can be easily found in logs.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently it is hard to find an error in the logs, especially when many
errors of the same type occur because many operations are executed
simultaneously. Such operations can produce a series of similar exceptions,
and the error code does not help in this situation.</p>
<p>It would be useful to be able to find exceptions by unique identifiers.
These identifiers will also be shown in the Horizon events tab that
will be implemented in this spec: <a class="reference external" href="https://review.openstack.org/#/c/119052/">https://review.openstack.org/#/c/119052/</a>.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Support Features:</p>
<ul class="simple">
<li><p>Every error raised during the workflow will carry, in addition to its
error message, a uuid property by which the error can easily be found in the
logs.</p></li>
</ul>
<p>For example, a NotFoundException will appear in the logs as:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">NotFoundException: Error ID: 7a229eda-f630-4153-be03-d71d6467f2f4</span>
<span class="go">Object 'object' is not found</span>
</pre></div>
</div>
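<p>A minimal sketch of the proposed behavior (the class and attribute names here are assumptions, not the actual code in sahara.exceptions):</p>

```python
import uuid


class SaharaException(Exception):
    # Base class: every instance gets a unique identifier prepended to its
    # message, so the exact occurrence can be grepped in the logs.
    message = "An unknown exception occurred"

    def __init__(self, message=None):
        if message is not None:
            self.message = message
        self.uuid = str(uuid.uuid4())
        self.message = "Error ID: %s\n%s" % (self.uuid, self.message)
        super(SaharaException, self).__init__(self.message)


class NotFoundException(SaharaException):
    def __init__(self, value):
        super(NotFoundException, self).__init__(
            "Object '%s' is not found" % value)
```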
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>apavlov-n</p>
</dd>
<dt>Other contributors:</dt><dd><p>sreshetniak</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Adding ability to generate unique identifiers for SaharaException class</p></li>
<li><p>Change messages of Sahara exceptions so that all of them contain
identifier.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Wed, 24 Dec 2014 00:00:00 Make anti affinity working via server groupshttps://specs.openstack.org/openstack/sahara-specs/specs/juno/anti-affinity-via-server-groups.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/anti-affinity-via-server-groups">https://blueprints.launchpad.net/sahara/+spec/anti-affinity-via-server-groups</a></p>
<p>Server groups are the OpenStack way to implement anti-affinity. Anti-affinity
was implemented in Sahara before server groups were introduced in Nova. Now it
is time to replace the custom solution with the common one.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Direct engine uses manual scheduler hints for anti affinity.</p>
<p>Heat engine has limited anti affinity support and also uses scheduler hints
(<a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1268610">https://bugs.launchpad.net/sahara/+bug/1268610</a>).</p>
<p>Nova has a generic mechanism for this purpose.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Proposed solution is to switch both engines to implementation that uses server
groups.</p>
<p>The current server group implementation has a limitation: each server may
belong to only a single server group. We can handle this constraint by having
one server group per cluster. Each instance with affected processes will then
be included in this server group, so no two affected instances will land on
the same host even if they have no processes in common. Such an implementation
is fully compliant with all the documentation we have about anti-affinity.</p>
<p>We need to keep backward compatibility for the direct engine. Users should be
able to scale clusters deployed on the Icehouse release, and Sahara should
update already spawned VMs accordingly. The proposed solution: during scaling
Sahara should check whether the server group exists and update the whole
cluster if it is missing.</p>
<p>We do not need backward compatibility for the heat engine: it was in beta
state for Icehouse and there are other changes that break it.</p>
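<p>Sketched with python-novaclient (an illustration, not the exact engine code): one anti-affinity server group is created per cluster and its id is passed as a scheduler hint when each affected instance is booted. The group name below is hypothetical.</p>

```python
def anti_affinity_hint(server_group_id):
    # Nova expects the server group id under the "group" scheduler hint key.
    return {"group": server_group_id}


# With an authenticated novaclient instance `nova` (assumed):
#
#   group = nova.server_groups.create(name="cluster-x-aa",
#                                     policies=["anti-affinity"])
#   nova.servers.create(name="worker-1", image=image, flavor=flavor,
#                       scheduler_hints=anti_affinity_hint(group.id))
```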
<section id="alternatives">
<h3>Alternatives</h3>
<p>We could implement anti-affinity via server groups in the heat engine only.
Support for the direct engine will be dropped at some point in the future, so
we could freeze the current behavior in the direct engine and leave it
unchanged until it is deprecated and removed.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Anti-affinity behavior changes when several processes are involved. Before
the change it was possible for several instances with affected (but different)
processes to be spawned on the same host. After the change all instances with
affected processes will be scheduled to different hosts.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Implement change</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Will be covered by current integration tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need note in upgrade notes about anti affinity behavior change.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>Team meeting minutes: <a class="reference external" href="http://eavesdrop.openstack.org/meetings/sahara/2014/sahara.2014-08-21-18.02.html">http://eavesdrop.openstack.org/meetings/sahara/2014/sahara.2014-08-21-18.02.html</a></p></li>
</ul>
</section>
Tue, 25 Nov 2014 00:00:00 Append to a remote existing filehttps://specs.openstack.org/openstack/sahara-specs/specs/juno/append-to-remote-file.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/append-to-remote-file">https://blueprints.launchpad.net/sahara/+spec/append-to-remote-file</a></p>
<p>Sahara’s remote utilities can only create a new file and write to it, or
replace a line with a new one; they cannot append to an existing file. This
blueprint aims to implement that feature.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>When managing remote files, Sahara can only create new files and replace
lines in existing ones. The ability to append to an existing file does not
exist, and it is needed.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Implement this feature following the idea of the write_to_file method.
The code is basically the same; the only change is the mode used to open
the file: write uses ‘w’, while append needs ‘a’.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee:</p>
<blockquote>
<div><ul class="simple">
<li><p>tellesmvn</p></li>
</ul>
</div></blockquote>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The implementation is very basic: the idea is similar to write_file, and
the necessary change is to open the remote file in append mode.</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None for now.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 Plugin for CDH with Cloudera Managerhttps://specs.openstack.org/openstack/sahara-specs/specs/juno/cdh-plugin.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cdh-plugin">https://blueprints.launchpad.net/sahara/+spec/cdh-plugin</a></p>
<p>This specification proposes to add a CDH plugin to Sahara with the Cloudera
Distribution of Hadoop and Cloudera Manager.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Cloudera provides an open-source Apache Hadoop distribution, CDH (Cloudera
Distribution Including Apache Hadoop). CDH contains the main, core elements of Hadoop that
provide reliable, scalable distributed data processing of large data sets
(chiefly MapReduce and HDFS), as well as other enterprise-oriented components
that provide security, high availability, and integration with hardware and
other software. Cloudera Manager is the industry’s first end-to-end management
application for Apache Hadoop. Cloudera Manager provides many useful features
for monitoring the health and performance of the components of your cluster
(hosts, service daemons) as well as the performance and resource demands of
the user jobs running on your cluster. [1]</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The CDH plugin implementation will support Cloudera Manager version 5 and CDH
version 5.</p>
<p>The plugin will support key Sahara features:</p>
<ul class="simple">
<li><p>Cinder integration</p></li>
<li><p>Cluster scaling</p></li>
<li><p>EDP</p></li>
<li><p>Cluster topology validation</p></li>
<li><p>Integration with Swift</p></li>
<li><p>Data locality</p></li>
</ul>
<p>The plugin will be able to install the following services:</p>
<ul class="simple">
<li><p>Cloudera Manager</p></li>
<li><p>HDFS</p></li>
<li><p>YARN</p></li>
<li><p>Oozie</p></li>
</ul>
<p>The CDH plugin will support the following operating systems: Ubuntu 12.04 and
CentOS 6.5. The CDH provisioning plugin will support mirrors with CDH and CM packages.</p>
<p>By default CDH does not ship the Hadoop Swift library, so Swift integration
should be added to the CDH plugin. The CDH Maven repository contains the Hadoop
Swift library. [2]</p>
<p>The CDH plugin will support the following processes:</p>
<ul class="simple">
<li><p>MANAGER - Cloudera Manager, master process</p></li>
<li><p>NAMENODE - HDFS NameNode, master process</p></li>
<li><p>SECONDARYNAMENODE - HDFS SecondaryNameNode, master process</p></li>
<li><p>RESOURCEMANAGER - YARN ResourceManager, master process</p></li>
<li><p>JOBHISTORY - YARN JobHistoryServer, master process</p></li>
<li><p>OOZIE - Oozie server, master process</p></li>
<li><p>DATANODE - HDFS DataNode, worker process</p></li>
<li><p>NODEMANAGER - YARN NodeManager, worker process</p></li>
</ul>
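<p>Inside the plugin each of these processes has to be mapped onto a Cloudera Manager service and role type; a hypothetical sketch of such a mapping follows (the exact names used by the implementation may differ):</p>

```python
# Hypothetical process -> (CM service type, CM role type) mapping.
PROCESS_TO_CM_ROLE = {
    "MANAGER": ("MGMT", "MANAGER"),
    "NAMENODE": ("HDFS", "NAMENODE"),
    "SECONDARYNAMENODE": ("HDFS", "SECONDARYNAMENODE"),
    "DATANODE": ("HDFS", "DATANODE"),
    "RESOURCEMANAGER": ("YARN", "RESOURCEMANAGER"),
    "JOBHISTORY": ("YARN", "JOBHISTORY"),
    "NODEMANAGER": ("YARN", "NODEMANAGER"),
    "OOZIE": ("OOZIE", "OOZIE_SERVER"),
}


def cm_role_for(process):
    # Look up the CM service/role pair for a Sahara node process.
    return PROCESS_TO_CM_ROLE[process]
```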
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>The CDH plugin must support both vanilla images and images with Cloudera
packages pre-installed. To build images with Cloudera packages pre-installed,
use the specific CDH elements.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>sreshetniak</p>
</dd>
<dt>Other contributors:</dt><dd><p>iberezovskiy</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add implementation of plugin</p></li>
<li><p>Add jobs in Sahara-ci</p></li>
<li><p>Add integration tests</p></li>
<li><p>Add elements to Sahara-image-elements for building images with pre-installed
Cloudera packages</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements; needs the <cite>cm_api</cite> Python library version
6.0.2, which is not present in the OpenStack requirements. [3] <cite>cm_api</cite> needs
to be added to the OpenStack requirements. [4]</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Add unit tests to Sahara to cover basic functionality of plugin</p></li>
<li><p>Add integration tests to Sahara</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>CDH plugin documentation should be added to the plugin section of Sahara docs.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html">http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html</a></p></li>
<li><p>[2] <a class="reference external" href="https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-openstack/">https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-openstack/</a></p></li>
<li><p>[3] <a class="reference external" href="https://pypi.python.org/pypi/cm-api">https://pypi.python.org/pypi/cm-api</a></p></li>
<li><p>[4] <a class="reference external" href="https://review.openstack.org/#/c/106011/">https://review.openstack.org/#/c/106011/</a></p></li>
</ul>
</section>
Tue, 25 Nov 2014 00:00:00 Specification for integration Sahara with Ceilometerhttps://specs.openstack.org/openstack/sahara-specs/specs/juno/ceilometer-integration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/ceilometer-integration">https://blueprints.launchpad.net/sahara/+spec/ceilometer-integration</a></p>
<p>It is currently impossible to send notifications from Sahara to Ceilometer.
Sahara should be able to send notifications about cluster modifications,
for example about creating, updating, or destroying a cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The new feature will provide the following ability:</p>
<ul class="simple">
<li><p>Sending notifications to Ceilometer about cluster modifications, which
helps users obtain information about their clusters, such as the
number of active clusters at any given moment.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The change will consist of the following modifications:</p>
<ul class="simple">
<li><p>Adding to Sahara the ability to send notifications.</p></li>
<li><p>Sending notifications from every place in Sahara where a cluster is
modified.</p></li>
<li><p>Adding to Ceilometer the ability to pull notifications from the Sahara
exchange and parse them.</p></li>
</ul>
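<p>The sending side could be sketched as follows with oslo.messaging (a hedged illustration; the event type and payload keys are assumptions, not defined by this spec):</p>

```python
def cluster_notification_payload(cluster):
    # Illustrative payload shape for a cluster event.
    return {
        "cluster_id": cluster["id"],
        "cluster_name": cluster["name"],
        "project_id": cluster["tenant_id"],
    }


# With an oslo.messaging transport (assumed to be configured):
#
#   notifier = messaging.Notifier(transport, publisher_id="sahara",
#                                 driver="messaging", topics=["notifications"])
#   notifier.info(ctxt, "sahara.cluster.create",
#                 cluster_notification_payload(cluster))
```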
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee:
vgridnev</p>
<p>Other contributors:
slukjanov</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add notification sender in Sahara</p></li>
<li><p>Add Ceilometer parser</p></li>
<li><p>Add unit tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>There will be:</p>
<blockquote>
<div><ul class="simple">
<li><p>unit tests in Sahara</p></li>
<li><p>unit tests in Ceilometer</p></li>
</ul>
</div></blockquote>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Need modifications in Ceilometer documentation here:</p>
<ul class="simple">
<li><p>[1] <a class="reference external" href="http://docs.openstack.org/developer/ceilometer/measurements.html">http://docs.openstack.org/developer/ceilometer/measurements.html</a></p></li>
<li><p>[2] <a class="reference external" href="http://docs.openstack.org/developer/ceilometer/install/manual.html">http://docs.openstack.org/developer/ceilometer/install/manual.html</a></p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="https://review.openstack.org/#/c/108982/">https://review.openstack.org/#/c/108982/</a></p></li>
</ul>
</section>
Tue, 25 Nov 2014 00:00:00 Store Sahara configuration in cluster propertieshttps://specs.openstack.org/openstack/sahara-specs/specs/juno/cluster-persist-sahara-configuration.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cluster-persist-sahara-configuration">https://blueprints.launchpad.net/sahara/+spec/cluster-persist-sahara-configuration</a></p>
<p>It is important to know the Sahara version and configuration for proper
cluster migration and for preventing dangerous operations.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara currently has no way to know the conditions under which a cluster was
created. Cluster operations after an OpenStack upgrade or a switch of the
infrastructure engine could be dangerous and cause data loss.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Store the main Sahara properties (version, type and version of the
infrastructure engine, etc.) in the cluster DB object. This will make it
possible to prevent dangerous operations and to notify the user gracefully.</p>
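<p>The stored dictionary could look like the following sketch (the key names are assumptions; the spec only fixes the kind of information to keep):</p>

```python
def build_sahara_info(sahara_version, engine_type, engine_version):
    # Illustrative shape of the per-cluster properties dictionary.
    return {
        "sahara_version": sahara_version,
        "infrastructure_engine": "%s %s" % (engine_type, engine_version),
    }
```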
<section id="alternatives">
<h3>Alternatives</h3>
<p>Do not store any information about Sahara settings. Always assume that the
settings did not change and that we have the current or the previous release
of OpenStack.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>A new field in the cluster object. This should probably be a dictionary with
defined keys.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Information about Sahara settings could optionally be exposed to the user.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Some error messages could become more descriptive.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>More options to handle errors.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Implement change</p>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Manual</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 Security groups management in Saharahttps://specs.openstack.org/openstack/sahara-specs/specs/juno/cluster-secgroups.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/cluster-secgroups">https://blueprints.launchpad.net/sahara/+spec/cluster-secgroups</a></p>
<p>It is not acceptable for production use to require a default security group
with all ports open. Sahara needs a more flexible way to work with security groups.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara currently does not manage security groups and uses the default
security group for instance provisioning.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Solution will consist of several parts:</p>
<ol class="arabic simple">
<li><p>Allow the user to specify a list of security groups for each node group.</p></li>
<li><p>Add support for automatic security group creation. Sahara knows everything
needed to create a security group with the required ports open. In the first
iteration this will be a security group with all exposed ports open to all networks.</p></li>
</ol>
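<p>The first-iteration behavior above can be sketched as a rule builder (an illustration assuming Neutron-style rule dictionaries; the actual implementation must also work with nova-network):</p>

```python
def open_ports_rules(ports):
    # One TCP ingress rule per exposed port, open to all networks,
    # matching the first iteration described above.
    return [{"direction": "ingress",
             "protocol": "tcp",
             "port_range_min": port,
             "port_range_max": port,
             "remote_ip_prefix": "0.0.0.0/0"} for port in ports]
```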
<section id="alternatives">
<h3>Alternatives</h3>
<p>Creation of security groups by Sahara could be done in several ways. Ideally
Sahara should support separation between different networks and configuration
of what to allow and what not.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<ol class="arabic simple">
<li><p>The list of security groups needs to be saved in each node group.</p></li>
<li><p>A flag indicating that one of the security groups was created by Sahara.</p></li>
<li><p>The list of ports to be opened. It needs to be stored somewhere to provide
this information to the provisioning engine.</p></li>
</ol>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Requests to create a cluster, node group, cluster template and node group
template will be extended to accept the security groups to use. An option for
automatic security group creation will also be added.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>In some cases there will be no need to configure default security group.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>The plugin SPI will be extended with a method that returns the required ports
for a node group.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>A new field to select security groups in all create screens.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:
Andrew Lazarev (alazarev)</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Allow the user to specify security groups for a node group</p></li>
<li><p>Implement the ability for Sahara to create security groups</p></li>
</ul>
<p>Both items require the following steps:</p>
<ul class="simple">
<li><p>Implement in both engines (heat and direct engine)</p></li>
<li><p>Test for nova network and neutron</p></li>
<li><p>Update documentation</p></li>
<li><p>Update UI</p></li>
<li><p>Create integration test</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The feature needs to be covered by integration tests, both for the engine and the UI.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The feature needs to be documented.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 Move the EDP examples from the sahara-extra repo to saharahttps://specs.openstack.org/openstack/sahara-specs/specs/juno/edp-move-examples.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-move-examples">https://blueprints.launchpad.net/sahara/+spec/edp-move-examples</a></p>
<p>Moving the Sahara EDP examples from the sahara-extra repo to
the sahara repo accomplishes several things:</p>
<ul class="simple">
<li><p>It eliminates code duplication since the examples are actually
used in integration tests</p></li>
<li><p>It removes an element from the sahara-extra repo, thereby moving
us closer to retiring that repo and simplifying our repo structure</p></li>
<li><p>It puts the examples where developers are more likely to find them, and
makes it simpler to potentially bundle the examples with a Sahara
distribution</p></li>
</ul>
<section id="problem-description">
<h2>Problem description</h2>
<p>The goal is to create one unified set of EDP jobs that can
be used to educate users and developers on how to create/run
jobs and can also be used as jobs submitted during integration
testing.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Under the sahara root directory, we should create a new directory:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">examples</span>
</pre></div>
</div>
<p>The directory structure should follow a standard pattern (names
are not important per se, this is just an illustration):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">subdirectory_for_each_example</span><span class="o">/</span>
<span class="n">README</span><span class="o">.</span><span class="n">rst</span> <span class="p">(</span><span class="n">what</span> <span class="n">it</span> <span class="ow">is</span><span class="p">,</span> <span class="n">how</span> <span class="n">to</span> <span class="nb">compile</span><span class="p">,</span> <span class="n">etc</span><span class="p">)</span>
<span class="n">script_and_jar_files</span>
<span class="n">src_for_jars</span><span class="o">/</span>
<span class="n">how_to_run_from_node_command_line</span><span class="o">/</span> <span class="p">(</span><span class="n">optional</span><span class="p">)</span>
<span class="n">expected_input_and_output</span><span class="o">/</span> <span class="p">(</span><span class="n">optional</span><span class="p">)</span>
<span class="n">hadoop_1_specific_examples</span><span class="o">/</span>
<span class="n">subdirectory_for_each_example</span>
<span class="n">hadoop_2_specific_examples</span><span class="o">/</span>
<span class="n">subdirectory_for_each_example</span>
</pre></div>
</div>
<p>The integration tests should be modified to pull job files from
the sahara/edp-examples directory.</p>
<p>Here are some notes on equivalence for the current script and jar
files in <code class="docutils literal notranslate"><span class="pre">sahara-extra/edp-examples</span></code> against
<code class="docutils literal notranslate"><span class="pre">sahara/tests/integration/tests/resources</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">pig</span><span class="o">-</span><span class="n">job</span><span class="o">/</span><span class="n">example</span><span class="o">.</span><span class="n">pig</span> <span class="o">==</span> <span class="n">resources</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">job</span><span class="o">.</span><span class="n">pig</span>
<span class="n">pig</span><span class="o">-</span><span class="n">job</span><span class="o">/</span><span class="n">udf</span><span class="o">.</span><span class="n">jar</span> <span class="o">==</span> <span class="n">resources</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">lib</span><span class="o">.</span><span class="n">jar</span>
<span class="n">wordcount</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">java</span><span class="o">.</span><span class="n">jar</span> <span class="o">==</span> <span class="n">resources</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">java</span><span class="o">/</span><span class="n">edp</span><span class="o">-</span><span class="n">java</span><span class="o">.</span><span class="n">jar</span>
</pre></div>
</div>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Examples won’t be found in the sahara-extra repo any longer.
We should perhaps put a README file there that says “We have
moved” for a release cycle.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>None as yet</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The problem has several components:</p>
<ul class="simple">
<li><p>Move the examples to the sahara repository</p></li>
<li><p>Merge any jobs used by the integration tests into the new
examples directory to create one comprehensive set</p></li>
<li><p>Provide source code and compilation instructions for any
examples that currently lack them</p></li>
<li><p>Make the integration tests reference the new directory structure</p></li>
<li><p>Delineate which, if any, examples work only with specific
Hadoop versions. Most examples work on both Hadoop 1 and Hadoop 2
but some do not. Version-specific examples should be in a subdirectory
named for the version</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Testing will be inherent in the integration tests. The change will be
deemed successful if the integration tests run successfully after the
merging of the EDP examples and the integration test jobs.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>If our current docs reference the EDP examples, those references should
change to the new location. If our current docs do not reference the
EDP examples, a reference should be added in the developer and/or user
guide.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 [EDP] Refactor job manager to support multiple implementationshttps://specs.openstack.org/openstack/sahara-specs/specs/juno/edp-refactor-job-manager.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-refactor-job-manager">https://blueprints.launchpad.net/sahara/+spec/edp-refactor-job-manager</a></p>
<p>The job manager at the core of Sahara EDP (Elastic Data Processing) was
originally written to execute and monitor jobs via an Oozie server. However,
the job manager should allow for alternative EDP implementations which
can support new cluster types or new job execution environments.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>To date, the provisioning plugins released with Sahara all support deployment
of an Oozie server. Oozie was a logical choice for the early releases of
Sahara EDP because it provided commonality across several plugins and allowed
the rapid development of the EDP feature.</p>
<p>However, Oozie is built on top of Hadoop Mapreduce and not every cluster
configuration can or should support it (consider a Spark cluster, for example,
where Hadoop Mapreduce is not a necessary part of the install). As Sahara
supports additional cluster types and gains a wider install base, it’s
important for EDP to have the flexibility to run on new cluster configurations
and new job execution paradigms.</p>
<p>The current implementation of the job_manager has hardcoded dependencies on
Oozie. These dependencies should be removed and the job manager should be
refactored to support the current Oozie implementation as well as new
implementations. The job manager should select the appropriate implementation
for EDP operations based on attributes of the cluster.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Sahara EDP requires three basic operations on a job:</p>
<ul class="simple">
<li><p>Launch the job</p></li>
<li><p>Poll job status</p></li>
<li><p>Terminate the job</p></li>
</ul>
<p>Currently these operations are implemented in job_manager.py with explicit
dependencies on the Oozie server and Oozie-style workflows.</p>
<p>To move these dependencies out of the job manager, we can define an abstract
class that contains the three operations:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="nd">@six</span><span class="o">.</span><span class="n">add_metaclass</span><span class="p">(</span><span class="n">abc</span><span class="o">.</span><span class="n">ABCMeta</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">JobEngine</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">cancel_job</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">job_execution</span><span class="p">):</span>
        <span class="k">pass</span>

    <span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">get_job_status</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">job_execution</span><span class="p">):</span>
        <span class="k">pass</span>

    <span class="nd">@abc</span><span class="o">.</span><span class="n">abstractmethod</span>
    <span class="k">def</span> <span class="nf">run_job</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">job_execution</span><span class="p">):</span>
        <span class="k">pass</span>
</pre></div>
</div>
<p>For each EDP implementation Sahara supports, a class should be derived from
JobEngine that contains the details. Each implementation and its supporting
files should be contained in its own subdirectory.</p>
<p>Logic can be added to the job manager to allocate a JobEngine based on the
cluster and/or job type, and the job manager can call the appropriate method
on the allocated JobEngine.</p>
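<p>As an illustration of this selection step, a minimal sketch follows. The
names <code class="docutils literal notranslate"><span class="pre">get_job_engine</span></code> and <code class="docutils literal notranslate"><span class="pre">has_oozie</span></code> are invented for the example, and the
stub engine only stands in for the real Oozie implementation:</p>

```python
import abc


class JobEngine(abc.ABC):
    """The three EDP operations, mirroring the abstract class above."""

    @abc.abstractmethod
    def cancel_job(self, job_execution):
        pass

    @abc.abstractmethod
    def get_job_status(self, job_execution):
        pass

    @abc.abstractmethod
    def run_job(self, job_execution):
        pass


class OozieJobEngine(JobEngine):
    """Stub standing in for the existing Oozie-based implementation."""

    def cancel_job(self, job_execution):
        pass

    def get_job_status(self, job_execution):
        return "RUNNING"

    def run_job(self, job_execution):
        return "oozie-workflow-id"


def get_job_engine(cluster, job_type):
    """Hypothetical selection logic: inspect cluster traits, pick an engine.

    The real criteria would come from the provisioning plugin and job type;
    ``has_oozie`` is an assumed attribute, not actual Sahara code.
    """
    if getattr(cluster, "has_oozie", False):
        return OozieJobEngine()
    raise RuntimeError("no EDP engine available for this cluster")
```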
<p>As much as possible the JobEngine classes should be concerned only with
implementing the operations. They may read objects from the Sahara database as
needed but modifications to the job execution object should generally be
done in the job_manager.py (in some cases this may be difficult, or may require
optional abstract methods to be added to the JobEngine base class).</p>
<p>For example, the “cancel_job” sequence should look something like this:</p>
<ol class="arabic simple">
<li><p>“cancel_job(id)” is called in job_manager.py</p></li>
<li><p>The job manager retrieves the job execution object and the cluster
object and allocates an appropriate job engine</p></li>
<li><p>The job manager calls engine.cancel_job(job_execution)</p></li>
<li><p>The engine performs necessary steps to cancel the job</p></li>
<li><p>The job manager traps and logs any exceptions</p></li>
<li><p>The job manager calls engine.get_job_status(job_execution)</p></li>
<li><p>The engine returns the status for the job, if possible</p></li>
<li><p>The job manager updates the job execution object in the Sahara database
with the indicated status, if any, and returns</p></li>
</ol>
<p>In this example, the job manager has no knowledge of operations in the engine,
only the high level logic to orchestrate the operation and update the status.</p>
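<p>The eight-step sequence above can be sketched as follows. Here
<code class="docutils literal notranslate"><span class="pre">conductor</span></code> and <code class="docutils literal notranslate"><span class="pre">engine_for</span></code> are hypothetical stand-ins for Sahara’s
database layer and the engine-selection logic; this is an illustration of
the orchestration shape, not the actual job manager code:</p>

```python
def cancel_job(job_execution_id, conductor, engine_for):
    """Orchestrate a cancel: fetch objects, delegate to the engine,
    trap failures, then record whatever status the engine reports."""
    job_execution = conductor.job_execution_get(job_execution_id)
    cluster = conductor.cluster_get(job_execution.cluster_id)
    engine = engine_for(cluster, job_execution)

    try:
        # Engine-specific cancellation steps happen inside the engine
        engine.cancel_job(job_execution)
    except Exception as exc:
        # The job manager traps and logs; it does not let the engine abort
        # the status update below
        print("cancel failed: %s" % exc)

    status = engine.get_job_status(job_execution)  # may be None
    if status is not None:
        conductor.job_execution_update(job_execution, {"status": status})
    return status
```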
<section id="alternatives">
<h3>Alternatives</h3>
<p>It may be possible to support “true” plugins for EDP similar to the
implementation of the provisioning plugins. In this case, the plugins would be
discovered and loaded dynamically at runtime.</p>
<p>However, this requires much more work than a refactoring and introduction of
abstract classes and is probably not realistic for the Juno release. There are
several problems/questions that need to be solved:</p>
<ul class="simple">
<li><p>Should EDP interfaces simply be added as optional methods in the current
provisioning plugin interface, or should EDP plugins be separate entities?</p></li>
<li><p>Do vendors that supply the provisioning plugins also supply the EDP plugins?</p></li>
<li><p>If separate EDP plugins are chosen, a convention is required to associate an
EDP plugin with a provisioning plugin so that the proper EDP implementation
can be chosen at runtime for a particular cluster without any internal glue</p></li>
<li><p>For clusters that are running an Oozie server, should the Oozie EDP
implementation always be the default if another implementation for the
cluster is not specified? Or should there be an explicitly designated
implementation for each cluster type?</p></li>
<li><p>It requires not only a formal interface for the plugin, but a formal
interface for elements in Sahara that the plugin may require. For instance,
an EDP plugin will likely need a formal interface to the conductor so that it
can retrieve EDP objects (job execution objects, data sources, etc).</p></li>
</ul>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change should only cause one minor (optional) change to the current data
model. Currently, a JobExecution object contains a string field named
“oozie_job_id”. While a job id field will almost certainly still be needed for
all implementations and can probably be stored as a string, the name should
change to “job_id”.</p>
<p>JobExecution objects also contain an “extra” field which in the case of the
Oozie implementation is used to store neutron connection info for the Oozie
server. Other implementations may need similar data, however since the “extra”
field is stored as a JsonDictType no change should be necessary.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:
Trevor McKay</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Testing will be done primarily through the current unit and integration tests.
Tests may be added that test the selection of the job engine.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 [EDP] Add a Spark job type (instead of overloading Java)https://specs.openstack.org/openstack/sahara-specs/specs/juno/edp-spark-job-type.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-spark-job-type">https://blueprints.launchpad.net/sahara/+spec/edp-spark-job-type</a></p>
<p>Spark EDP has been implemented initially using the Java job type. However,
the spark semantics are slightly different and Spark jobs will probably
continue to diverge from Java jobs during future development. Additionally,
a specific job type will help users distinguish between Spark apps and Java
mapreduce jobs in the Sahara database.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The work involves adding a Spark job type to the job type enumeration in
Sahara and extending the dashboard to allow the creation and submission
of Spark jobs. The Sahara client must be able to create and submit Spark
jobs as well (there may not be any new work in the client to support this).</p>
<p>Existing unit tests and integration tests must be repaired if the addition
of a new job type causes them to fail. Unit tests analogous to the tests
for current job types should be added for the Spark job type.</p>
<p>Integration tests for Spark clusters/jobs will be added as a separate effort.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Changes in the Sahara-api code:</p>
<ul class="simple">
<li><p>Add the Spark job type to the enumeration</p></li>
<li><p>Add validation methods for job creation and job execution creation</p></li>
<li><p>Add unit tests for the Spark job type</p></li>
<li><p>Add the Spark job type to the job types supported by the Spark plugin</p>
<ul>
<li><p>Leave the Java type supported for Spark until the dashboard is changed</p></li>
</ul>
</li>
<li><p>Add config hints for the Spark type</p>
<ul>
<li><p>These may be empty initially</p></li>
</ul>
</li>
</ul>
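<p>A minimal sketch of the enumeration change, assuming the constant names
follow the existing pattern in <code class="docutils literal notranslate"><span class="pre">sahara.utils.edp</span></code>; only <code class="docutils literal notranslate"><span class="pre">JOB_TYPE_SPARK</span></code>
is the proposed addition, the surrounding entries stand in for existing types:</p>

```python
# Illustrative mirror of sahara.utils.edp; the existing values shown here
# are placeholders, JOB_TYPE_SPARK is the new entry.
JOB_TYPE_HIVE = "Hive"
JOB_TYPE_JAVA = "Java"
JOB_TYPE_MAPREDUCE = "MapReduce"
JOB_TYPE_PIG = "Pig"
JOB_TYPE_SPARK = "Spark"

JOB_TYPES_ALL = [
    JOB_TYPE_HIVE,
    JOB_TYPE_JAVA,
    JOB_TYPE_MAPREDUCE,
    JOB_TYPE_PIG,
    JOB_TYPE_SPARK,
]
```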
<p>Changes in the Sahara dashboard:</p>
<ul class="simple">
<li><p>Add the Spark job type as a selectable type on the job creation form.</p>
<ul>
<li><p>Include the “Choose a main binary” input on the Create Job tab</p></li>
<li><p>Supporting libraries are optional, so the form should include the Libs tab</p></li>
</ul>
</li>
<li><p>Add a “Launch job” form for the Spark job type</p>
<ul>
<li><p>The form should include the “Main class” input.</p></li>
<li><p>No data sources, as with Java jobs</p></li>
<li><p>Spark jobs will share the edp.java.main_class configuration with Java jobs.
Alternatively, Spark jobs could use a edp.spark.main_class config</p></li>
<li><p>There may be additional configuration parameters in the future, but none
are supported at present. The Configuration button may be included or
left out.</p></li>
<li><p>The arguments button should be present</p></li>
</ul>
</li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>Overload the existing Java job type. It is similar enough to work as a
proof of concept, but in the long term this approach is probably neither
clear, desirable, nor maintainable.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None. Job type is stored as a string in the database, so there should be
no impact there.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None. The JSON schema will list “Spark” as a valid job type, but the API
calls themselves should not be affected.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>Described under Proposed Change.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>Trevor McKay (sahara-api)</p>
</dd>
<dt>Other contributors:</dt><dd><p>Chad Roberts (dashboard)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>Additional notes on implementation of items described under Proposed Change:</p>
<ul class="simple">
<li><p>The simple addition of JOB_TYPE_SPARK to sahara.utils.edp.JOB_TYPES_ALL
did not cause existing unit tests to fail in an experiment</p></li>
<li><p>Existing unit tests should be surveyed and analogous tests for the Spark job
type should be added as appropriate</p></li>
<li><p>sahara.service.edp.job_manager.get_job_config_hints(job_type) needs to handle
the Spark job type. Currently all config hints are retrieved from the Oozie
job engine, which will not be correct for Spark jobs.</p>
<ul>
<li><p>Related, the use of JOB_TYPES_ALL should probably be modified in
workflow_creator.workflow_factory.get_possible_job_config() just to be safe</p></li>
</ul>
</li>
<li><p>sahara.service.edp.job_utils.get_data_sources() needs to treat Spark jobs
like Java jobs (there are no data sources, only arguments)</p></li>
<li><p>service.validations.edp.job.check_main_libs() needs to require a main
application jar for Spark jobs and allow supporting libs as optional</p></li>
<li><p>service.validations.edp.job_executor.check_job_executor()</p>
<ul>
<li><p>Spark requires edp.java.main_class (or edp.spark.main_class)</p></li>
<li><p>check_edp_job_support() is called here and should be fine. The default is
an empty body and the Spark plugin does not override this method since the
Spark standalone deployment is part of a Spark image generated from DIB</p></li>
</ul>
</li>
<li><p>Use of EDP job types in sahara.service.edp.oozie.workflow_creator should not
be impacted since Spark jobs shouldn’t be targeted to an Oozie engine by the
job manager (but see note on get_job_config_hints() and JOB_TYPES_ALL)</p></li>
<li><p>The Sahara client does not appear to reference specific job type values so
there is likely no work to do in the client</p></li>
</ul>
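<p>The main-libs validation work item might look like the following sketch.
The job objects with <code class="docutils literal notranslate"><span class="pre">type</span></code> and <code class="docutils literal notranslate"><span class="pre">mains</span></code> attributes and the exception class
are hypothetical; the real check lives in
<code class="docutils literal notranslate"><span class="pre">service.validations.edp.job.check_main_libs()</span></code> and may have a different
signature:</p>

```python
JOB_TYPE_SPARK = "Spark"


class InvalidDataException(Exception):
    """Stands in for Sahara's validation exception type."""


def check_main_libs(job):
    """Spark jobs must carry a main application jar; supporting
    libraries remain optional for every job type."""
    if job.type == JOB_TYPE_SPARK and not job.mains:
        raise InvalidDataException(
            "%s job must specify a main application jar" % job.type)
```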
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This depends on <a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-spark-standalone">https://blueprints.launchpad.net/sahara/+spec/edp-spark-standalone</a></p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>New unit tests will be added for the Spark job type, analogous to existing
tests for other job types. Existing unit and integration tests will ensure that
other job types have not been broken by the addition of a Spark type.</p>
<p>Integration tests for Spark clusters should be added in the following
blueprint, including tests for EDP with Spark job types</p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-spark-integration-tests">https://blueprints.launchpad.net/sahara/+spec/edp-spark-integration-tests</a></p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The User Guide calls out details of the different job types for EDP.
Details of the Spark type will need to be added to this section.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None.</p>
</section>
Tue, 25 Nov 2014 00:00:00 [EDP] Add an engine for a Spark standalone deploymenthttps://specs.openstack.org/openstack/sahara-specs/specs/juno/edp-spark-standalone.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-spark-standalone">https://blueprints.launchpad.net/sahara/+spec/edp-spark-standalone</a></p>
<p>The Spark provisioning plugin allows the creation of Spark standalone
clusters in Sahara. Sahara EDP should support running Spark
applications on those clusters.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The Spark plugin uses the <em>standalone</em> deployment mode for Spark
(as opposed to Spark on YARN or Mesos). An EDP implementation must be created
that supports the basic EDP functions using the facilities provided by the
operating system and the Spark standalone deployment.</p>
<p>The basic EDP functions are:</p>
<ul class="simple">
<li><p><strong>run_job()</strong></p></li>
<li><p><strong>get_job_status()</strong></p></li>
<li><p><strong>cancel_job()</strong></p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The Sahara job manager has recently been refactored to allow provisioning
plugins to select or provide an EDP engine based on the cluster and job_type.
The plugin may return a custom EDP job engine object or choose a default engine
provided by Sahara.</p>
<p>A default job engine for Spark standalone clusters can be added to Sahara
that implements the basic EDP functions.</p>
<p>Note, there are no public APIs in Spark for job status or cancellation
beyond facilities that might be available through a SparkContext
object instantiated in a Scala program. However, it is possible to
provide basic functionality without developing Scala programs.</p>
<ul>
<li><p><strong>Engine selection criteria</strong></p>
<p>The Spark provisioning plugin must determine if the default Spark EDP
engine may be used to run a particular job on a particular cluster. The
following conditions must be true to use the engine</p>
<ul class="simple">
<li><p>The default engine will use <em>spark-submit</em> for running jobs, therefore
a cluster must have at least Spark version 1.0.0 to use the engine.</p></li>
<li><p>The job type must be Java (initially)</p></li>
</ul>
</li>
<li><p><strong>Remote commands via ssh</strong></p>
<p>All operations should be implemented using ssh to run remote
commands, as opposed to writing a custom agent and client. Furthermore,
any long running commands should be run asynchronously.</p>
</li>
<li><p><strong>run_job()</strong></p>
<p>The run_job() function will be implemented using the <em>spark-submit</em>
script provided by Spark.</p>
<ul class="simple">
<li><p><em>spark-submit</em> must be run using client deploy mode</p></li>
<li><p>The <em>spark-submit</em> command will be executed via ssh. Ssh must
return immediately and must return a process id (PID) that can be
used for checking status or cancellation. This implies that the
process is run in the background.</p></li>
<li><p>SIGINT must not be ignored by the process running <em>spark-submit</em>.
Care needs to be taken here, since the default behavior of a
process backgrounded from bash is to ignore SIGINT. (This can be
handled by running <em>spark-submit</em> as a subprocess from a wrapper
which first restores SIGINT, and launching the wrapper from ssh. In this
case the wrapper must be sure to propagate SIGINT to the child).</p></li>
<li><p><em>spark-submit</em> requires that the main application jar be in
local storage on the node where it is run. Supporting jars may
be in local storage or hdfs. For simplicity, all jars will be uploaded
to local storage.</p></li>
<li><p><em>spark-submit</em> should be run from a subdirectory on the
master node in a well-known location. The subdirectory naming
should incorporate the job name and job execution id from
Sahara to make locating the directory easy. Program files,
output, and logs should be written to this directory.</p></li>
<li><p>The exit status returned from <em>spark-submit</em> must be written
to a file in the working directory.</p></li>
<li><p><em>stderr</em> and <em>stdout</em> from <em>spark-submit</em> should be redirected
and saved in the working directory. This will help debugging as well
as preserve results for apps like SparkPi, which write the result to stdout.</p></li>
<li><p>Sahara will allow the user to specify arguments to the Spark
application. Any input and output data sources will be specified
as path arguments to the Spark app.</p></li>
</ul>
</li>
<li><p><strong>get_job_status()</strong></p>
<p>Job status can be determined by monitoring the PID returned
by run_job() via <em>ps</em> and reading the file containing the exit status</p>
<ul class="simple">
<li><p>The initial job status is PENDING (the same for all Sahara jobs)</p></li>
<li><p>If the PID is present, the job status is RUNNING</p></li>
<li><p>If the PID is absent, check the exit status file</p>
<ul>
<li><p>If the exit status is 0, the job status is SUCCEEDED</p></li>
<li><p>If the exit status is -2 or 130, the job status is KILLED (by SIGINT)</p></li>
<li><p>For any other exit status, the job status is DONEWITHERROR</p></li>
</ul>
</li>
<li><p>If the job fails in Sahara (i.e., because of an exception), the job status
will be FAILED</p></li>
</ul>
</li>
<li><p><strong>cancel_job()</strong></p>
<p>A Spark application may be canceled by sending SIGINT to the
process running <em>spark-submit</em>.</p>
<ul class="simple">
<li><p>cancel_job() should send SIGINT to the PID returned by run_job().
If the PID is the process id of a wrapper, the wrapper must
ensure that SIGINT is propagated to the child</p></li>
<li><p>If the command to send SIGINT is successful (i.e., <em>kill</em> returns 0),
cancel_job() should call get_job_status() to update the job status</p></li>
</ul>
</li>
</ul>
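<p>The status rules above reduce to a small mapping. A sketch follows; the
function name is invented, <code class="docutils literal notranslate"><span class="pre">pid_alive</span></code> stands for the result of checking
the PID with <em>ps</em>, and <code class="docutils literal notranslate"><span class="pre">exit_status</span></code> is the integer read from the
exit-status file (None if the file has not been written yet):</p>

```python
def spark_job_status(pid_alive, exit_status):
    """Map PID presence and the recorded exit status to a job state,
    following the bullets in the spec."""
    if pid_alive:
        return "RUNNING"
    if exit_status is None:
        return "PENDING"           # nothing recorded yet (initial state)
    if exit_status == 0:
        return "SUCCEEDED"
    if exit_status in (-2, 130):
        return "KILLED"            # terminated by SIGINT
    return "DONEWITHERROR"
```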
<section id="alternatives">
<h3>Alternatives</h3>
<p>The Ooyala job server is an alternative for implementing Spark EDP, but it’s
a project of its own outside of OpenStack and introduces another dependency. It
would have to be installed by the Spark provisioning plugin, and Sahara
contributors would have to understand it thoroughly.</p>
<p>Other than Ooyala, there does not seem to be any existing client or API for
handling job submission, monitoring, and cancellation.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>There is no data model impact, but a few fields will be reused.</p>
<p>The <em>oozie_job_id</em> will store an id that allows the running application
to be operated on. The name of this field should be generalized at some
point in the future.</p>
<p>The job_execution.extra field may be used to store additional information
necessary to allow operations on the running application</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None. Initially Spark jobs (jars) can be run using the Java job type.
At some point a specific Spark job type will be added (this will be
covered in a separate specification).</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Trevor McKay</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>implement default spark engine selection in spark provisioning plugin</p></li>
<li><p>implement run</p></li>
<li><p>implement get_job_status</p></li>
<li><p>implement cancel</p></li>
<li><p>implement launch wrapper</p></li>
<li><p>implement unit tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Unit tests will be added for the changes in Sahara.</p>
<p>Integration tests for Spark standalone clusters will be added in another
blueprint and specification.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The Elastic Data Processing section of the User Guide should talk about
the ability to run Spark jobs and any restrictions.</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Tue, 25 Nov 2014 00:00:00 [EDP] Using trust delegation for Swift authenticationhttps://specs.openstack.org/openstack/sahara-specs/specs/juno/edp-swift-trust-authentication.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-swift-trust-authentication">https://blueprints.launchpad.net/sahara/+spec/edp-swift-trust-authentication</a></p>
<p>Sahara currently stores and distributes credentials for access to Swift
objects. These credentials are username/password pairs that are stored in
Sahara’s database. This blueprint describes a method for using Keystone’s
trust delegation mechanism in conjunction with temporary proxy users to remove
the storage of user credentials from Sahara’s purview.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Sahara allows access to job binaries and data sources stored in Swift
containers by storing a user’s credentials to those containers. The
credentials are stored in Sahara’s database and distributed to cluster
instances as part of a job’s workflow, or used by Sahara to access job
binaries. The storage of user credentials in Sahara’s database represents a
security risk that can be avoided.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>A solution to using credentials for access to Swift objects is to generate a
Keystone trust between the user with access to those objects and a Sahara
proxy user. The trust would be established based on the user’s membership in
the project that contains the Swift objects. Using this trust the Sahara proxy
user could generate authentication tokens to access the Swift objects. When
access is no longer needed the trust can be revoked, and the proxy user
removed, thus invalidating the tokens.</p>
<p>A proxy user will be created per job execution, and belong to a roleless
domain created by the stack administrator for the express purpose of
containing these new users. This domain will allow Sahara to create users as
needed for the purpose of delegating trust to access Swift objects.</p>
<p>Sahara will generate usernames and passwords for the proxy users when they are
created. These credentials will be used in conjunction with the trust to allow
Swift access from the cluster instances.</p>
<p>General breakdown of the process:</p>
<ol class="arabic simple">
<li><p>On start Sahara confirms the existence of the proxy domain. If no proxy
domain is found and the user has configured Sahara to use one, then Sahara
will log an error and resume in backward compatibility mode.</p></li>
<li><p>When a new job that involves Swift objects is executed a trust is created
between the context user and a newly created proxy user. The proxy user’s
name and password are created by Sahara and stored temporarily.</p></li>
<li><p>When an instance needs access to a Swift object it uses the proxy user’s
credentials and the trust identifier to create the necessary authentication
token.</p></li>
<li><p>When the job has ended, the proxy user and the trust will be removed from
Keystone.</p></li>
</ol>
<p>Detailed breakdown:</p>
<p>Step 1.</p>
<p>On start Sahara will confirm the existence of the proxy domain specified by
the administrator in the configuration file. If no domain can be found and the
user has configured Sahara to use one, then it will log an error and resume
in backward compatibility mode. The domain will be used to store proxy users
that will be created during job execution. It should explicitly have an SQL
identity backend, or another backend that will allow Sahara to create
users (SQL is the Keystone default).</p>
<p>If the user has configured Sahara not to use a proxy domain then it will
fall back to the backward-compatible style of Swift authentication,
requiring usernames and passwords for Swift access.</p>
<p>Step 2.</p>
<p>Whenever a new job execution is issued through Sahara a new proxy user will
be created in the proxy domain. This user will have a name generated based
on the job execution id, and a password generated randomly.</p>
<p>After creating the proxy user, Sahara will delegate a trust from the current
context user to the proxy user. This trust will grant a “Member” role by
default (the granted role will be configurable), and will impersonate the context user. As
Sahara will not know the length of the job execution, the trust will be
generated with no expiration time and unlimited reuse.</p>
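<p>The trust described above can be sketched as a Keystone v3 OS-TRUST request body; the exact wiring here is an assumption based on the OS-TRUST API shape, not Sahara code:</p>

```python
def build_trust_request(trustor_id, trustee_id, project_id, roles=("Member",)):
    # Request body for a Keystone v3 OS-TRUST creation reflecting the spec:
    # impersonation enabled, "Member" role by default, no expiry,
    # unlimited remaining uses.
    return {
        "trust": {
            "trustor_user_id": trustor_id,    # the context user
            "trustee_user_id": trustee_id,    # the proxy user
            "project_id": project_id,
            "impersonation": True,
            "expires_at": None,               # no expiration time
            "remaining_uses": None,           # unlimited reuse
            "roles": [{"name": r} for r in roles],
        }
    }
```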
<p>Step 3.</p>
<p>During job execution, when an instance needs to access a Swift object, it
will use the proxy user’s name and password in conjunction with the trust
identifier to create an authentication token. This token will then be used
to access the Swift object store.</p>
<p>Step 4.</p>
<p>After the job execution has finished, successfully or otherwise, the proxy
user and trust will be removed from Keystone.</p>
<p>A periodic task will check the proxy domain user list to ensure that none have
become stuck or abandoned after a job execution has been completed.</p>
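<p>The periodic cleanup can be sketched as a simple set difference; the function and argument names are hypothetical:</p>

```python
def find_stale_proxy_users(proxy_domain_user_ids, active_job_user_ids):
    # Users remaining in the proxy domain whose job execution is no longer
    # active are considered stuck or abandoned, and should be deleted along
    # with their trusts.
    active = set(active_job_user_ids)
    return [u for u in proxy_domain_user_ids if u not in active]
```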
<section id="alternatives">
<h3>Alternatives</h3>
<p>Three alternatives have been discussed regarding this issue: using Swift’s
TempURL mechanism, encrypting the Swift credentials, and distributing tokens
from Sahara to the cluster instances.</p>
<p>Swift implements a feature named TempURL which allows the generation of
temporary URLs to allow public access to Swift objects. Using this feature
Sahara could create TempURLs for each Swift object that requires access and
then distribute these URLs to the cluster instances in the job workflow.</p>
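<p>For context, a TempURL is produced by signing the request method, expiry timestamp, and object path with a shared key (a minimal sketch; the key and path are illustrative). Note that an expiry is a mandatory part of the signature:</p>

```python
import hmac
import time
from hashlib import sha1

def make_temp_url(key, method, path, ttl_seconds):
    # Swift TempURL signature: HMAC-SHA1 over "<method>\n<expires>\n<path>".
    expires = int(time.time()) + ttl_seconds
    body = "%s\n%d\n%s" % (method, expires, path)
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return "%s?temp_url_sig=%s&temp_url_expires=%d" % (path, sig, expires)

url = make_temp_url("secret-key", "GET", "/v1/AUTH_demo/container/object", 3600)
```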
<p>Although TempURLs would be low impact in terms of the work required to
implement, they have a few major drawbacks for Sahara’s use case. When
creating a TempURL, an expiration date must be associated with the URL. As
job lengths in Sahara cannot be determined in advance, this would mean
creating effectively indefinite expiration dates for the TempURLs. A
workaround would be deleting the Swift object or changing the authentication
identifier associated with the creation of the TempURL, but both of these
options have implications that extend beyond the boundaries of Sahara. In
addition, the TempURLs would need to be passed to the instances as ciphertext
to avoid potential security breaches.</p>
<p>Another methodology discussed involves encrypting the Swift credentials and
allowing the cluster instances to decrypt them when access is required. Using
this method, Sahara would generate a public/private key pair used to encrypt
all credentials. The decryption (private) part of the key would be distributed
to all cluster nodes. Upon job creation, the Swift credentials associated with
the job would be encrypted and stored. The encrypted credentials would be
distributed to the cluster instances in the job workflow. When access to a
Swift object is needed, an instance would decrypt the credentials using the
locally stored key.</p>
<p>Encrypting the credentials requires fewer changes than using Keystone
trusts, but perpetuates the current practice of Sahara storing credentials for
Swift objects. In addition, a new layer of security management becomes
involved: Sahara would need to generate and store keys for use with the
credentials. This adds another layer of management complexity that could
instead be delegated to a more appropriate OpenStack service (i.e. Keystone).</p>
<p>Finally, another possibility for removing credentials from Sahara
would be for the main controller to distribute preauthorized tokens to the
cluster instances. These tokens would be used by the instances to validate
their Swift access. There are a few implementation problems with this approach
that caused it to be abandoned by a previous version of this blueprint.</p>
<p>Keystone tokens have a default lifespan of one hour, though this value can
be adjusted by the stack administrator. This implies that tokens must be
pushed to the instances at least once an hour, possibly more frequently. The
update process introduces hindrances that are difficult to overcome: update
frequency is one, and resource contention on the instances is another. A
further examination of this method shows that the required updates to the
Hadoop Swift filesystem component would create a dissonant design surrounding
the ingestion of configuration data. In sum, this methodology proves to be
more fragile than is acceptable.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>The job execution model currently stores username and password information in
a field that is a dictionary. There will be no changes to the model, but the
trust identifier will need to be stored in addition to the username and
password.</p>
<p>Once the credentials have been passed to the cluster, the only values that
need to be stored are the user id and the trust identifier; these are needed
to destroy the trust and the user after job execution. The proxy user’s
password is only needed during the creation of the job execution and will
be distributed to the instances, but long-term storage is not necessary.</p>
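<p>Illustratively, the stored credentials dict changes roughly as follows; the field names and values are hypothetical, not the real schema:</p>

```python
# Before this change: the job execution dict field holds user credentials.
legacy_credentials = {"user": "swift-user", "password": "swift-pass"}

# After this change: only what is needed for later teardown is kept.
proxy_credentials = {
    "proxy_user_id": "8c1d2e",  # kept so the proxy user can be deleted later
    "trust_id": "4f9a7b",       # kept so the trust can be revoked later
    # the proxy password is distributed to instances, never stored long term
}
```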
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The proxy usernames and trust identifiers should be sanitized from the
job execution output.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users will no longer need to enter credentials when adding Swift data sources
to their jobs.</p>
<p>The user’s OpenStack credentials will need to have sufficient privileges to
access the Swift objects they add.</p>
<p>From the python-saharaclient, developers will no longer need to enter
credential_user or credential_pass when making requests to create data
sources.</p>
<p>Keystone deployments that use LDAP backed domains will need to be configured
as recommended by the Keystone group, using domain based configurations. This
ensures that new domains created will be backed by SQL.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>A deployer will need to be aware of the Keystone configuration with respect
to the default identity backend. They will also need to create the proxy
domain and provide the Sahara service user with enough access to create new
users in that domain.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers will no longer need to pass credentials when creating data sources.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>For backward compatibility the username and password fields should be left in
the Swift data source forms and views, but they should allow the user to
enter blank data.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee:</p>
<ul class="simple">
<li><p>Michael McCune</p></li>
</ul>
<p>Other contributors:</p>
<ul class="simple">
<li><p>Trevor McKay</p></li>
</ul>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Domain detection and configuration option</p></li>
<li><p>Proxy user creation/destruction</p></li>
<li><p>Trust acquisition/revocation</p></li>
<li><p>Workflow update</p></li>
<li><p>Swift file system component update to use trust identifier</p></li>
<li><p>Periodic proxy user removal task</p></li>
<li><p>Documentation</p></li>
<li><p>Tests</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>This feature will require the usage of Keystone v3 with the OS-TRUST mechanism
enabled.</p>
<p>The Horizon forms for Swift data sources will need to allow blank entries
for username and password.</p>
<p>The Hadoop Swift file system component will need to be updated as well.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>The current tests for Swift based objects will need to be modified to remove
the usage of username/password credentials. Otherwise these tests should prove
that the trust method is working properly.</p>
<p>Tests should be implemented for situations where a user’s Keystone access
does not grant permission for the Swift objects they are adding.</p>
<p>The Swift file system component will need its tests modified to use
trust identifiers for scoping of the authentication tokens.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>The documentation for usage of Swift storage will need to have references to
the object credentials removed. Additionally there should be documentation
added about the impact of a user not having access to the Swift sources.</p>
<p>The proxy domain usage should be documented to give stack administrators a
clear understanding of Sahara’s usage and needs. This should also note the
impact of Keystone configurations which do not provide default SQL identity
backends.</p>
</section>
<section id="references">
<h2>References</h2>
<p>Original bug report
<a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1321906">https://bugs.launchpad.net/sahara/+bug/1321906</a></p>
<p>Keystone trust API reference
<a class="reference external" href="https://github.com/openstack/identity-api/blob/master/v3/src/markdown/identity-api-v3-os-trust-ext.md">https://github.com/openstack/identity-api/blob/master/v3/src/markdown/identity-api-v3-os-trust-ext.md</a></p>
</section>
Tue, 25 Nov 2014 00:00:00 Improve error handling for provisioning operationshttps://specs.openstack.org/openstack/sahara-specs/specs/juno/error-handling-in-provisioning.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/error-handling-in-provisioning">https://blueprints.launchpad.net/sahara/+spec/error-handling-in-provisioning</a></p>
<p>Currently, provisioning error handling is scattered across the whole
provisioning code. This spec proposes unifying error handling and localizing
it in one place.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently we have two problems connected with error handling in the
provisioning part:</p>
<ol class="arabic simple">
<li><p>The code incorrectly handles situations when the cluster was deleted by
the user during provisioning. In that case an arbitrary error might be raised
in many places.</p></li>
<li><p>The code performs rollback only in certain places, while it could be done
for any provisioning/scaling phase.</p></li>
</ol>
<p>The following CR:
<a class="reference external" href="https://review.openstack.org/#/c/98556">https://review.openstack.org/#/c/98556</a>
mostly fixes issue #1, but it is full of duplicate code.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The following solution is proposed instead; it requires architectural
changes but reliably fixes both problems:</p>
<ol class="arabic simple">
<li><p>For both cluster creation and scaling, move the error handling logic to
the very top functions inside the ops.py file. Once an exception is caught,
process it properly:</p>
<ol class="loweralpha simple">
<li><p>if the cluster object does not exist in the DB, the user deleted
the cluster during provisioning; handle it and return</p></li>
<li><p>if the cluster object exists, log the error and perform rollback</p></li>
</ol>
</li>
<li><p>Do not check whether the cluster exists outside of ops.py, except in
places where processing might hang indefinitely without the check.</p></li>
</ol>
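<p>The proposed top-level handling can be sketched as follows; the callables are stand-ins for the real ops.py machinery:</p>

```python
def run_provisioning(cluster_id, do_work, cluster_exists, rollback):
    # Sketch of the proposed single error-handling point in ops.py:
    # the work itself performs no existence checks.
    try:
        do_work(cluster_id)
    except Exception:
        if not cluster_exists(cluster_id):
            # the user deleted the cluster during provisioning
            return "deleted"
        rollback(cluster_id)  # log and roll back the surviving cluster
        return "rolled_back"
    return "ok"
```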
<p>We can employ the following rollback strategy:</p>
<p>For cluster creation: if anything goes wrong, kill all VMs and move the
cluster to the Error state.</p>
<p>For cluster scaling the situation is more involved. Cluster scaling has the
following stages:</p>
<ol class="arabic simple">
<li><p>decommission unneeded nodes (by plugin)</p></li>
<li><p>terminate unneeded nodes and create new ones if needed (by engine). Note
that scaling up and down could run simultaneously, but in different
node groups.</p></li>
<li><p>Configure and start nodes (by plugin)</p></li>
</ol>
<p>Suggested handling if an exception occurs in the respective stage:</p>
<ol class="arabic simple">
<li><p>move the cluster to the Error state</p></li>
<li><p>kill the unneeded nodes (finish the scale down); also kill new nodes,
if they were created for the scale up</p></li>
<li><p>move the cluster to the Error state</p></li>
</ol>
<p>In cases #1 and #3 it is dangerous to delete nodes that have not been
decommissioned or configured, as this can lead to data loss.</p>
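<p>The per-stage rollback policy above can be sketched as follows; the data structures are illustrative:</p>

```python
def rollback_scaling(stage, cluster):
    # Stage 2 (engine resize) is the only point where nodes may safely be
    # killed; in stages 1 and 3, deleting un-decommissioned or unconfigured
    # nodes risks data loss, so the cluster is only moved to Error.
    if stage == 2:
        cluster["instances"] = [i for i in cluster["instances"]
                                if not (i["unneeded"] or i["new"])]
    else:
        cluster["status"] = "Error"
    return cluster
```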
<section id="alternatives">
<h3>Alternatives</h3>
<p>Keep supporting the current code. It is not elegant, but it works well enough.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The provisioning logic will change significantly. This could lead to
behavior changes in case of errors at different stages.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Provisioning engine API will be extended with “rollback_cluster” method.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev</p>
</dd>
<dt>Other contributors:</dt><dd><p>dmitrymex</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ol class="arabic simple">
<li><p>Implement change</p></li>
<li><p>Test that cluster provisioning and rollback works on all feature matrix we
have.</p></li>
</ol>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Provisioning can be tested manually and by CI.
It is much harder to test rollback. Even the current code is not well tested (e.g.
<a class="reference external" href="https://bugs.launchpad.net/sahara/+bug/1337006">https://bugs.launchpad.net/sahara/+bug/1337006</a>).</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 Move Sahara REST API samples from /etc to docshttps://specs.openstack.org/openstack/sahara-specs/specs/juno/move-rest-samples-to-docs.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/move-rest-samples-to-docs">https://blueprints.launchpad.net/sahara/+spec/move-rest-samples-to-docs</a></p>
<p>Initially this idea was raised during discussions about
moving/releasing/versioning of Sahara’s subprojects.
The REST samples are slightly outdated, and the common documentation is a
good place to keep them up to date.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Today the REST API samples are outdated and don’t reflect the changes made
to the Sahara API since the Havana release. It is also not obvious where to
find those samples in the Sahara sources. The goal is to update the samples
and move them to <a class="reference external" href="http://docs.openstack.org/">http://docs.openstack.org/</a></p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Create a new page in docs:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="n">sahara</span><span class="o">/</span><span class="n">doc</span><span class="o">/</span><span class="n">source</span><span class="o">/</span><span class="n">restapi</span><span class="o">/</span><span class="n">rest_api_samples</span><span class="o">.</span><span class="n">rst</span>
</pre></div>
</div>
<p>Move all JSON examples from sahara/etc/rest-api-samples to the new page.
Simple description for each example should be added before JSON code blocks.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>The examples will no longer be found in the sahara/etc directory.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary Assignee:
Alexander Ignatov</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The following steps should be done:</p>
<ul class="simple">
<li><p>Move REST samples from sahara/etc to a new page in docs</p></li>
<li><p>Update the samples in the docs to match the current state of the Sahara API</p></li>
<li><p>Remove sahara/etc/rest-api-samples directory in Sahara sources</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>None</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>New page with information about REST samples will appear in the Sahara docs.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 25 Nov 2014 00:00:00 Authorization Policy Supporthttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/auth-policy.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/auth-policy">https://blueprints.launchpad.net/sahara/+spec/auth-policy</a></p>
<p>OpenStack components are supposed to check user privileges for performed
actions. Usually these checks are role-based. See
<a class="reference external" href="http://docs.openstack.org/developer/keystone/architecture.html#approach-to-authorization-policy">http://docs.openstack.org/developer/keystone/architecture.html#approach-to-authorization-policy</a>.
Sahara needs to support policies too.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>OpenStack administrators may want to tune the authorization policy for
Sahara. There should be a way to restrict some users from performing some
Sahara operations.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Add an authorization check for all Sahara API endpoints. This can be done in
the same way as in other OpenStack components. The “policy” module in the Oslo
library can do all of the underlying work.</p>
<p>Proposed content of the policy file:</p>
<div class="highlight-json notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="w"> </span><span class="nt">"context_is_admin"</span><span class="p">:</span><span class="w"> </span><span class="s2">"role:admin"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"admin_or_owner"</span><span class="p">:</span><span class="w"> </span><span class="s2">"is_admin:True or project_id:%(project_id)s"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"default"</span><span class="p">:</span><span class="w"> </span><span class="s2">"rule:admin_or_owner"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"clusters:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"clusters:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"clusters:scale"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"clusters:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"clusters:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"cluster-templates:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"cluster-templates:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"cluster-templates:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"cluster-templates:modify"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"cluster-templates:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node-group-templates:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node-group-templates:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node-group-templates:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node-group-templates:modify"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"node-group-templates:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins:get_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"plugins:convert_config"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:register"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:unregister"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:add_tags"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"images:remove_tags"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-executions:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-executions:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-executions:refresh_status"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-executions:cancel"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-executions:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"data-sources:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"data-sources:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"data-sources:register"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"data-sources:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:get_config_hints"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"jobs:execute"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binaries:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binaries:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binaries:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binaries:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binaries:get_data"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binary-internals:get_all"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binary-internals:create"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binary-internals:get"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binary-internals:delete"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"job-binary-internals:get_data"</span><span class="p">:</span><span class="w"> </span><span class="s2">""</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Separating Sahara users and operators could be the next step.</p>
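<p>A minimal sketch of how such rules could be evaluated; the real Oslo policy module parses a much richer rule language, so this models only the empty and “rule:” forms used in the file above, with hypothetical checks backing the named rules:</p>

```python
# Hypothetical check functions for the named rules; the policy module
# instead parses these from strings like "is_admin:True or project_id:...".
RULES = {
    "context_is_admin": lambda c: "admin" in c.get("roles", []),
    "admin_or_owner": lambda c: bool(c.get("is_admin")) or (
        c.get("project_id") is not None and
        c.get("project_id") == c.get("target_project_id")),
}

def enforce(policy, action, creds):
    rule = policy.get(action, policy.get("default", ""))
    if rule == "":
        return True  # an empty rule allows any authenticated user
    if rule.startswith("rule:"):
        return bool(RULES[rule.split(":", 1)[1]](creds))
    return False

policy = {"default": "rule:admin_or_owner", "clusters:get": ""}
```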
<section id="alternatives">
<h3>Alternatives</h3>
<p>None.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Adding new APIs will require changing the policy rules.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>alazarev (Andrew Lazarev)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add policy.py from oslo</p></li>
<li><p>Add config options to control policy file and settings</p></li>
<li><p>Add policy check to all API calls</p></li>
<li><p>Add unit tests</p></li>
<li><p>Add documentation</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p>Policy module in Oslo.</p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Unit tests</p></li>
<li><p>Manual testing</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<ul class="simple">
<li><p>The feature needs to be documented</p></li>
</ul>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="http://docs.openstack.org/developer/keystone/architecture.html#approach-to-authorization-policy">http://docs.openstack.org/developer/keystone/architecture.html#approach-to-authorization-policy</a></p></li>
<li><p><a class="reference external" href="http://docs.openstack.org/developer/keystone/api/keystone.openstack.common.policy.html">http://docs.openstack.org/developer/keystone/api/keystone.openstack.common.policy.html</a></p></li>
<li><p><a class="reference external" href="http://docs.openstack.org/developer/keystone/configuration.html#keystone-api-protection-with-role-based-access-control-rbac">http://docs.openstack.org/developer/keystone/configuration.html#keystone-api-protection-with-role-based-access-control-rbac</a></p></li>
</ul>
</section>
Tue, 25 Nov 2014 00:00:00 Plugin for Sahara with MapRhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/mapr-plugin.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/mapr-plugin">https://blueprints.launchpad.net/sahara/+spec/mapr-plugin</a></p>
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/mapr-image-elements">https://blueprints.launchpad.net/sahara/+spec/mapr-image-elements</a></p>
<p>This specification proposes to add MapR plugin with MapR Distribution of
Hadoop in Sahara.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>The MapR Distribution for Apache Hadoop provides organizations with an
enterprise-grade distributed data platform to reliably store and process
big data. MapR packages a broad set of Apache open source ecosystem
projects enabling batch, interactive, or real-time applications. The
data platform and the projects are all tied together through an advanced
management console to monitor and manage the entire system.</p>
<p>MapR is one of the largest distributions for Hadoop supporting more than
20 open source projects. MapR also supports multiple versions of the
various individual projects thereby allowing users to migrate to the latest
versions at their own pace. The table below shows all of the projects
actively supported in the current GA version of MapR Distribution for
Hadoop as well as in the next Beta release.[1]</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The MapR plugin implementation will support Hadoop 0.20.2 and Hadoop 2.4.1.
The plugin will support key Sahara features:</p>
<ul class="simple">
<li><p>Cinder integration</p></li>
<li><p>Cluster scaling/decommission</p></li>
<li><p>EDP</p></li>
<li><p>Cluster topology validation</p></li>
<li><p>Integration with Swift</p></li>
</ul>
<p>The plugin will be able to install the following services:</p>
<ul class="simple">
<li><p>MapR-FS</p></li>
<li><p>YARN</p></li>
<li><p>Oozie (two versions are supported)</p></li>
<li><p>HBase</p></li>
<li><p>Hive (two versions are supported)</p></li>
<li><p>Pig</p></li>
<li><p>Mahout</p></li>
<li><p>Webserver</p></li>
</ul>
<p>The MapR plugin will support the following operating systems: Ubuntu 14.04 and CentOS 6.5.</p>
<p>MapR plugin will support the following node types:</p>
<ul class="simple">
<li><p>Nodes Running ZooKeeper and CLDB</p></li>
<li><p>Nodes for Data Storage and Processing</p></li>
<li><p>Edge Nodes</p></li>
</ul>
<p>In a production MapR cluster, some nodes are typically dedicated to cluster
coordination and management, and other nodes are tasked with data storage
and processing duties. An edge node provides user access to the cluster,
concentrating open user privileges on a single host.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>MapR plugin uses specific pre-installed images with
MapR local repository files.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>aosadchiy</p>
</dd>
<dt>Other contributors:</dt><dd><p>ssvinarchuck</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add implementation of plugin for bare images.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Add unit tests to Sahara to cover the basic functionality of the plugin</p></li>
<li><p>Add integration tests to Sahara</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>MapR plugin documentation should be added to the plugin section of Sahara docs.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="https://www.mapr.com/products/apache-hadoop">https://www.mapr.com/products/apache-hadoop</a></p></li>
<li><p>[2] <a class="reference external" href="http://doc.mapr.com/display/MapR/Home">http://doc.mapr.com/display/MapR/Home</a></p></li>
<li><p>[3] <a class="reference external" href="https://www.mapr.com/">https://www.mapr.com/</a></p></li>
<li><p>[4] <a class="reference external" href="https://github.com/mapr">https://github.com/mapr</a></p></li>
</ul>
</section>
Tue, 25 Nov 2014 00:00:00 JSON sample files for the EDP APIhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-api-json-samples.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-api-json-samples">https://blueprints.launchpad.net/sahara/+spec/edp-api-json-samples</a></p>
<p>Provide sample JSON files for all EDP Sahara APIs to facilitate ease of
use by command-line users.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>As an End User who prefers the Sahara CLI to its UI, I want a set of
pre-constructed example JSON payloads for the Sahara EDP API so that I can
easily learn the expected API signatures and modify them for my use.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>Example JSON payloads will be added to the directory
<code class="docutils literal notranslate"><span class="pre">sahara/etc/edp-examples/json-api-examples/v1.1</span></code>, with a subpath for each
relevant manager (data_source, job, job_binary, job_binary_internals, and
job_execution.) It is intended that examples for future API versions will
follow this path structure.</p>
<p>Each file will be named after the pattern: <code class="docutils literal notranslate"><span class="pre">method_name.[variety.]json</span></code>,
where variety is optional and will be used to describe independently useful
variations in payloads for one method (as in varying engines underlying
data sources or processing jobs.)</p>
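<p>As an illustration only, the naming pattern can be checked mechanically. The helper below is a hypothetical sketch (not part of the proposal) that splits an example filename into its method name and optional variety segment:</p>

```python
import re

# Pattern for method_name.[variety.]json, where the variety segment is
# optional (e.g. "create.json" or "create.swift.json").
PAYLOAD_NAME = re.compile(
    r'^(?P<method>[a-z_]+)\.(?:(?P<variety>[a-z0-9_-]+)\.)?json$')


def parse_example_name(filename):
    """Return (method, variety) for a valid example filename, else None."""
    match = PAYLOAD_NAME.match(filename)
    if match is None:
        return None
    return match.group('method'), match.group('variety')
```

<p>Under this sketch, a file such as create.swift.json would yield the method create with the variety swift.</p>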
<section id="alternatives">
<h3>Alternatives</h3>
<p>A tool could conceivably be created to generate template payloads from the
jsonschemata themselves. However, as the core use of this change is to
provide immediately available, semantically valid payloads for ease of
adoption, it is proposed that providing raw examples will better meet the
perceived user need.</p>
<p>It would also be possible to package these examples directly with the
python-saharaclient repository, an option which has much to recommend it.
However, as these examples are globally useful to any non-UI interface,
as they are reliant on the jsonschemata in the core repository for testing,
and as the extant etc/edp-examples path is a natural home for them,
placing them in the sahara repository itself seems indicated.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None, though it is intended that the payloads may be used via the
python-saharaclient.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None.</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None.</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>egafford</p>
</dd>
<dt>Other contributors:</dt><dd><p>tmckay (primary review)</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Payload generation: data_source</p></li>
<li><p>Payload generation: job</p></li>
<li><p>Payload generation: job_binary</p></li>
<li><p>Payload generation: job_binary_internals</p></li>
<li><p>Payload generation: job_execution</p></li>
<li><p>Addition of schema validation unit tests for all the above.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None.</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>After discussion with croberts and tmckay, it is proposed that integration
testing is in this case unnecessary; these examples constitute documentation.
While exhaustive testing is possible in this case, the resultant bloat of
the CI build would be disproportionate to the utility of the change.</p>
<p>Unit testing will validate that these resources pass schema validation for
their intended APIs.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>This task is itself a documentation effort. A README.rst will be provided
in place, in keeping with the pre-existing etc/edp-examples.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://etherpad.openstack.org/p/kilo-summit-sahara-edp">https://etherpad.openstack.org/p/kilo-summit-sahara-edp</a></p></li>
</ul>
</section>
Wed, 12 Nov 2014 00:00:00 Support Cinder availability zoneshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/support-cinder-availability-zones.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/support-cinder-availability-zones">https://blueprints.launchpad.net/sahara/+spec/support-cinder-availability-zones</a></p>
<p>Extend the API to support specifying the Cinder availability zones in which
to create volumes.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>It can be desirable to assign Cinder availability zones to node groups, in
order to have fine-grained cluster volumes topologies.</p>
<p>Use case:</p>
<ul class="simple">
<li><p>As an end user I want namenode volumes to be spawned in the regular AZ and
datanode volumes in a high-performance <code class="docutils literal notranslate"><span class="pre">ssd-high-io</span></code> AZ.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>It adds a new <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> property in the NodeGroup and
NodeGroupTemplate objects. When set, it modifies the direct and Heat engines
to force creation of volumes in the specified AZ.</p>
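<p>A minimal sketch of the engine-side change, assuming dict-like node group data and a stub Cinder client; the actual sahara code differs, but the idea is to pass the optional AZ through to the volume-creation call:</p>

```python
# Hypothetical sketch: forward the node group's optional AZ to Cinder
# when creating volumes. Names and structure are illustrative only.
def create_volumes(cinder_client, node_group, instance_name):
    """Create the node group's volumes, honoring volumes_availability_zone."""
    kwargs = {}
    # The AZ is optional; only pass it along when the node group defines one.
    if node_group.get('volumes_availability_zone'):
        kwargs['availability_zone'] = node_group['volumes_availability_zone']
    volumes = []
    for index in range(node_group['volumes_per_node']):
        name = '%s-volume-%d' % (instance_name, index)
        volumes.append(cinder_client.volumes.create(
            size=node_group['volumes_size'], display_name=name, **kwargs))
    return volumes
```

<p>When volumes_availability_zone is unset, the call is identical to today's behavior, which preserves backward compatibility.</p>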
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change will add <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> columns in sahara database,
next to <code class="docutils literal notranslate"><span class="pre">volumes_per_node</span></code> and <code class="docutils literal notranslate"><span class="pre">volumes_size</span></code>. Impacted tables are
<code class="docutils literal notranslate"><span class="pre">node_groups</span></code>, <code class="docutils literal notranslate"><span class="pre">node_group_templates</span></code> and <code class="docutils literal notranslate"><span class="pre">templates_relations</span></code>.</p>
<p>A database migration will accompany this change.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Each API method which deals with node groups and node groups templates will
have an additional (and optional) <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> parameter,
which will be taken into account if <code class="docutils literal notranslate"><span class="pre">volumes_per_node</span></code> is set and non-zero.</p>
<p>Example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/><span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"cluster1"</span><span class="p">,</span>
<span class="s2">"node_groups"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"master"</span><span class="p">,</span>
<span class="s2">"count"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"worker"</span><span class="p">,</span>
<span class="s2">"count"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s2">"volumes_per_node"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s2">"volumes_size"</span><span class="p">:</span> <span class="mi">100</span><span class="p">,</span>
<span class="s2">"volumes_availability_zone"</span><span class="p">:</span> <span class="s2">"ssd-high-io"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
</pre></div>
</div>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>python-saharaclient should be modified to integrate this new feature.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers need to migrate the DB schema using:</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">sahara-db-manage</span> <span class="pre">--config-file</span> <span class="pre">/etc/sahara/sahara.conf</span> <span class="pre">upgrade</span> <span class="pre">head</span></code></p>
</div></blockquote>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>sahara-dashboard should be modified to integrate this new feature.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>adrien-verge</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> property in NodeGroup and
NodeGroupTemplate objects.</p></li>
<li><p>When a user specifies the (optional) <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code>, check its
existence.</p></li>
<li><p>In the direct engine, include the <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> argument in
the call to cinder.client().volumes.create().</p></li>
<li><p>In the Heat engine, add the <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> property in
sahara/resources/volume.heat.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Test cluster creation with <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> specified.</p></li>
<li><p>Test cluster creation with wrong <code class="docutils literal notranslate"><span class="pre">volumes_availability_zone</span></code> specified.</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation will need to be updated, at sections related to node group
template creation and cluster creation.</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Mon, 22 Sep 2014 00:00:00 Support Cinder API version 2https://specs.openstack.org/openstack/sahara-specs/specs/kilo/support-cinder-api-v2.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/support-cinder-api-v2">https://blueprints.launchpad.net/sahara/+spec/support-cinder-api-v2</a></p>
<p>This specification proposes to add support for the second version of the Cinder
API, which brings useful improvements and will soon replace version one.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Currently Sahara uses only version 1 of the Cinder API to create volumes.
Version 2, however, brings useful features such as scheduler hints, more
consistent responses, caching, filtering, etc.</p>
<p>Also, Cinder is deprecating version 1 in favor of version 2, so supporting
both would make switching easier for users.</p>
<p>Use cases:</p>
<ul class="simple">
<li><p>As a developer I want to be able to pass scheduler hints to Cinder when
creating clusters, in order to choose volumes more precisely and achieve
improved performance.</p></li>
<li><p>As a developer I want to use filtering in my requests to Cinder to make
queries lighter.</p></li>
<li><p>As a deployer I want to be able to choose between legacy Cinder API v1 and
newer v2 API.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The implementation will add a configuration option, cinder_api_version, which
will default to:</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">cinder_api_version=1</span></code></p>
</div></blockquote>
<p>but can be changed to</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">cinder_api_version=2</span></code></p>
</div></blockquote>
<p>by modifying sahara.conf.</p>
<p>The client() method in sahara/utils/openstack/cinder.py will either return
clientv1() or clientv2() depending on the configuration.</p>
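<p>The dispatch can be sketched as follows, with CONF represented by a plain dict and the version-specific client factories passed in; the real module uses oslo.config rather than a dict, so this is illustrative only:</p>

```python
# Illustrative sketch of choosing a Cinder client by configured API version.
# conf stands in for sahara's CONF object; clientv1/clientv2 stand in for the
# real client factory functions.
def make_client(conf, clientv1, clientv2):
    """Return a Cinder client matching the configured API version."""
    version = conf.get('cinder_api_version', 1)
    if version == 1:
        return clientv1()
    if version == 2:
        return clientv2()
    raise ValueError('Unsupported Cinder API version: %r' % version)
```

<p>Keeping the default at version 1 means existing deployments see no change until they opt in.</p>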
<section id="alternatives">
<h3>Alternatives</h3>
<p>Wait for Cinder API v1 to be deprecated and switch abruptly to v2.</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<ul class="simple">
<li><p>If the deployer wants to keep using Cinder API version 1, nothing has to be
done.</p></li>
<li><p>If the deployer wants to upgrade to version 2, the cinder_api_version option
in sahara.conf should be overwritten.</p></li>
</ul>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>Developers can read CONF.cinder_api_version to know what API version is being
used.</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>adrien-verge</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add a configuration option: cinder_api_version.</p></li>
<li><p>Modify sahara/utils/openstack/cinder.py to pick the correct Cinder
client depending on the configuration.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>Same as for v1 API.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://wiki.openstack.org/wiki/CinderAPIv2">https://wiki.openstack.org/wiki/CinderAPIv2</a></p></li>
</ul>
</section>
Thu, 11 Sep 2014 00:00:00 Support Nova availability zoneshttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/support-nova-availability-zones.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/support-nova-availability-zones">https://blueprints.launchpad.net/sahara/+spec/support-nova-availability-zones</a></p>
<p>Extend the API to support specifying the Nova availability zones in which to
spawn instances.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>It can be desirable to assign Nova availability zones to node groups, in order
to have fine-grained cluster topologies.</p>
<p>Use cases:</p>
<ul class="simple">
<li><p>As an end user I want namenode instances to be spawned in the regular <code class="docutils literal notranslate"><span class="pre">nova</span></code>
AZ and datanode instances in my high-performance <code class="docutils literal notranslate"><span class="pre">nova-highperf</span></code> AZ.</p></li>
<li><p>As an end user I want instances from node-group A to be all together in the
<code class="docutils literal notranslate"><span class="pre">nova-1</span></code> AZ, separated from instances from node-group B in <code class="docutils literal notranslate"><span class="pre">nova-2</span></code>.</p></li>
</ul>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The proposed change is already implemented at [1].</p>
<p>It adds an <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> property in the NodeGroup and NodeGroupTemplate
objects. When set, it modifies the direct and Heat engines to force spawning
of instances in the specified AZ.</p>
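<p>On the Heat side, the change amounts to conditionally adding one property to the generated server resource. A hedged sketch with assumed names, using plain dicts rather than sahara's actual template machinery:</p>

```python
# Hypothetical sketch: inject the optional availability_zone property into
# an OS::Nova::Server resource definition for a node group.
def server_resource(node_group, flavor, image):
    """Build a minimal Heat server resource for one node group instance."""
    properties = {'flavor': flavor, 'image': image}
    if node_group.get('availability_zone'):
        properties['availability_zone'] = node_group['availability_zone']
    return {'type': 'OS::Nova::Server', 'properties': properties}
```

<p>Node groups without an availability_zone produce exactly the resource generated today, so existing templates are unaffected.</p>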
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change will add <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> columns in the sahara database
(<code class="docutils literal notranslate"><span class="pre">node_groups</span></code>, <code class="docutils literal notranslate"><span class="pre">node_group_templates</span></code> and <code class="docutils literal notranslate"><span class="pre">templates_relations</span></code> tables).</p>
<p>A database migration will accompany this change.</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>Each API method which deals with node groups and node groups templates will
have an additional (and optional) <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> parameter.</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>python-saharaclient should be modified to integrate this new feature.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>Deployers need to migrate the DB schema using:</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">sahara-db-manage</span> <span class="pre">--config-file</span> <span class="pre">/etc/sahara/sahara.conf</span> <span class="pre">upgrade</span> <span class="pre">head</span></code></p>
</div></blockquote>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>sahara-dashboard should be modified to integrate this new feature.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>adrien-verge</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<p>The proposed change is already implemented at [1].</p>
<ul class="simple">
<li><p>Add an <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> property in NodeGroup and NodeGroupTemplate
objects.</p></li>
<li><p>When a user specifies the (optional) availability zone, check its existence.</p></li>
<li><p>In the direct engine, include the <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> argument in the call
to nova.client().servers.create().</p></li>
<li><p>In the Heat engine, add the <code class="docutils literal notranslate"><span class="pre">availability_zone</span></code> property in
sahara/resources/instance.heat.</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<ul class="simple">
<li><p>Test cluster creation without availability zone specified.</p></li>
<li><p>Test cluster creation with availability zone specified.</p></li>
<li><p>Test cluster creation with wrong availability zone specified.</p></li>
</ul>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>Documentation will need to be updated, at sections related to node group
template creation and cluster creation.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>[1] <a class="reference external" href="https://review.openstack.org/#/c/120096">https://review.openstack.org/#/c/120096</a></p></li>
</ul>
</section>
Thu, 11 Sep 2014 00:00:00 Storage of recently logged events for clustershttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/event-log.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/event-log">https://blueprints.launchpad.net/sahara/+spec/event-log</a></p>
<p>This specification proposes to add event logs assigned to cluster.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>It will be more user friendly to have an event log assigned to each cluster.
In this case users will have the ability to see the steps performed to deploy
a cluster. If there is an issue with their cluster, users will be able to see
the reasons for the issue in the UI and won’t be required to read the Sahara
logs.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>The new feature will provide the following ability:</p>
<ul class="simple">
<li><p>For each cluster there will be an event log assigned to it. The deployer
will have the ability to see it in Horizon, so the user will have the ability
to see all of the steps of cluster deployment.</p></li>
<li><p>The deployer will have the ability to see the current progress of cluster
provisioning.</p></li>
</ul>
<section id="alternatives">
<h3>Alternatives</h3>
<p>None</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>This change will require writing event log messages to the database.
Event messages should be stored in the database in a manner similar to
that used for cluster data, node group data, and so on.</p>
<p>Plugins should provide a list of provisioning steps and be able to report the
status of the current step. All steps should be performed in a linear series,
and we will store events only for the current step. All completed steps should
have their duration stored in the database. There is no reason to store
events for successfully completed steps, so they will be dropped periodically.</p>
<p>If an error occurs while provisioning a cluster, error events will be saved
for the current step. Each error event should also store an error_id, which
will help admins determine the reasons for failures in the sahara logs.</p>
<p>We should have a new database object called ClusterEvent, which will have the
following fields:</p>
<ul class="simple">
<li><p>node_group_id;</p></li>
<li><p>instance_id;</p></li>
<li><p>instance_name;</p></li>
<li><p>event_info;</p></li>
<li><p>successful;</p></li>
<li><p>provision_step_id;</p></li>
<li><p>id;</p></li>
<li><p>created_at;</p></li>
<li><p>updated_at;</p></li>
</ul>
<p>Also we should have a new database object called ClusterProvisionStep,
which will have the following fields:</p>
<ul class="simple">
<li><p>id;</p></li>
<li><p>cluster_id;</p></li>
<li><p>step_name;</p></li>
<li><p>step_type;</p></li>
<li><p>completed;</p></li>
<li><p>total;</p></li>
<li><p>successful;</p></li>
<li><p>started_at;</p></li>
<li><p>completed_at;</p></li>
<li><p>created_at;</p></li>
<li><p>updated_at;</p></li>
</ul>
<p>The fields <code class="docutils literal notranslate"><span class="pre">step_name</span></code> and <code class="docutils literal notranslate"><span class="pre">step_type</span></code> will contain detailed info about the step.
<code class="docutils literal notranslate"><span class="pre">step_name</span></code> will contain a description of the step, for example
<code class="docutils literal notranslate"><span class="pre">Waiting</span> <span class="pre">for</span> <span class="pre">ssh</span></code> or <code class="docutils literal notranslate"><span class="pre">Launching</span> <span class="pre">instances</span></code>. <code class="docutils literal notranslate"><span class="pre">step_type</span></code> will
contain info about the process this step relates to. For example, if we are
creating a new cluster this field will contain <code class="docutils literal notranslate"><span class="pre">creating</span></code>, and for
scaling a cluster this field will contain <code class="docutils literal notranslate"><span class="pre">scaling</span></code>.
So, the possible values of this field will be <code class="docutils literal notranslate"><span class="pre">creating</span></code>, <code class="docutils literal notranslate"><span class="pre">scaling</span></code> and
<code class="docutils literal notranslate"><span class="pre">deleting</span></code>. Also we should add the ability to get the main provisioning steps
from the Plugin SPI for each <code class="docutils literal notranslate"><span class="pre">step_type</span></code> as a dictionary. For example, an
expected return value:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go"> "creating": [</span>
<span class="go"> "Launching instances",</span>
<span class="go"> "Waiting for ssh",</span>
<span class="go"> ....</span>
<span class="go"> ]</span>
<span class="go"> "scaling": [</span>
<span class="go"> ....</span>
<span class="go"> ]</span>
<span class="go"> "deleting": [</span>
<span class="go"> ....</span>
<span class="go"> ]</span>
<span class="go">}</span>
</pre></div>
</div>
<p>Cluster should have a new field:</p>
<ul class="simple">
<li><p>provisioning_progress</p></li>
</ul>
<p>This field will contain a list of provisioning steps, providing the ability
to get info about provisioning steps from the cluster. We should update
this list with new steps every time we start a new process on the cluster
(creating/scaling/deleting). Provisioning steps should be updated both from
the plugin and from infra, because some steps are the same for all clusters.</p>
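<p>A hypothetical helper showing how either infra or a plugin might advance the counters of the current step; the field names follow the ClusterProvisionStep object described earlier, but the function itself is illustrative only:</p>

```python
# Illustrative sketch: record progress on the current provisioning step and
# mark it successful once all units of work are done.
def advance_step(step, count=1):
    """Mark `count` more units of work done on the given step dict."""
    step['completed'] = min(step['completed'] + count, step['total'])
    if step['completed'] == step['total']:
        step['successful'] = True
    return step
```

<p>Until the step finishes, its successful field stays unset, matching the in-progress responses shown below in the REST API section.</p>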
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>The existing GET request for a cluster should be updated with completed step
info, and short info for the current step. For example, we will have
responses such as: “Launching instances completed 1000 out of 1000 in 10
minutes”, “Trying ssh completed: 59 out of 1000”. Also, the response should be
sorted by increasing <code class="docutils literal notranslate"><span class="pre">created_at</span></code> value.</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go"> "cluster": {</span>
<span class="go"> "status": "Waiting",</span>
<span class="go"> ....</span>
<span class="go"> "provisioning_progress": [</span>
<span class="go"> {</span>
<span class="go"> "id": "1",</span>
<span class="go"> "cluster_id": "1111",</span>
<span class="go"> "step_name": "Launching instances",</span>
<span class="go"> "step_type": "creating",</span>
<span class="go"> "completed": 1000,</span>
<span class="go"> "total": 1000,</span>
<span class="go"> "successful": "True",</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "started_at": 36000000,</span>
<span class="go"> "completed_at": 18000000,</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> },</span>
<span class="go"> {</span>
<span class="go"> "id": "2",</span>
<span class="go"> "cluster_id": "1111",</span>
<span class="go"> "step_name": "Waiting for ssh",</span>
<span class="go"> "step_type": "creating",</span>
<span class="go"> "completed": 59,</span>
<span class="go"> "total": 1000,</span>
<span class="go"> "successful": None,</span>
<span class="go"> "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> "started_at": 18000000,</span>
<span class="go"> "completed_at": None,</span>
<span class="go"> "updated_at": "2013-10-09 12:37:19.295701",</span>
<span class="go"> }</span>
<span class="go"> ]</span>
<span class="go"> ....</span>
<span class="go"> }</span>
<span class="go">}</span>
</pre></div>
</div>
<p>In case of errors:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go">  "cluster": {</span>
<span class="go">    "status": "Waiting",</span>
<span class="go">    ....</span>
<span class="go">    "provisioning_progress": [</span>
<span class="go">      {</span>
<span class="go">        "id": "1",</span>
<span class="go">        "cluster_id": "1111",</span>
<span class="go">        "step_name": "Launching instances",</span>
<span class="go">        "step_type": "creating",</span>
<span class="go">        "completed": 1000,</span>
<span class="go">        "total": 1000,</span>
<span class="go">        "successful": true,</span>
<span class="go">        "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go">        "started_at": 36000000,</span>
<span class="go">        "completed_at": 18000000,</span>
<span class="go">        "updated_at": "2013-10-09 12:37:19.295701"</span>
<span class="go">      },</span>
<span class="go">      {</span>
<span class="go">        "id": "2",</span>
<span class="go">        "cluster_id": "1111",</span>
<span class="go">        "step_name": "Waiting for ssh",</span>
<span class="go">        "step_type": "creating",</span>
<span class="go">        "completed": 59,</span>
<span class="go">        "total": 1000,</span>
<span class="go">        "successful": false,</span>
<span class="go">        "created_at": "2013-10-09 12:37:19.295701",</span>
<span class="go">        "started_at": 18000000,</span>
<span class="go">        "completed_at": null,</span>
<span class="go">        "updated_at": "2013-10-09 12:37:19.295701"</span>
<span class="go">      }</span>
<span class="go">    ]</span>
<span class="go">    ....</span>
<span class="go">  }</span>
<span class="go">}</span>
</pre></div>
</div>
<p>In these cases events are also stored in the database, and they can be used
to debug cluster problems.
Because the first steps of cluster provisioning are the same for all plugins,
the infrastructure code should update the
<code class="docutils literal notranslate"><span class="pre">provisioning_progress</span></code> field for those steps,
while each plugin should update the
<code class="docutils literal notranslate"><span class="pre">provisioning_progress</span></code> field for its plugin-specific steps.
The new cluster field is therefore updated both by the infrastructure code and
by the plugin.</p>
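<p>For illustration, a client could summarize the per-step progress shown in the example response above with a helper like the following. This is a hypothetical client-side sketch: the field names come from the example response, but the function itself is not part of sahara or the proposed API.</p>

```python
def summarize_progress(provisioning_progress):
    """Return per-step percent complete and whether the step failed.

    A "successful" value of None/null means the step is still running,
    so only an explicit False counts as a failure.
    """
    summary = []
    for step in provisioning_progress:
        total = step["total"]
        percent = 100.0 * step["completed"] / total if total else 0.0
        summary.append({"step_name": step["step_name"],
                        "percent": percent,
                        "failed": step["successful"] is False})
    return summary

# Sample data mirroring the example response above.
steps = [
    {"step_name": "Launching instances", "completed": 1000, "total": 1000,
     "successful": True},
    {"step_name": "Waiting for ssh", "completed": 59, "total": 1000,
     "successful": None},
]
for s in summarize_progress(steps):
    print(s)
```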
<p>A new endpoint should be added to get details of the current provisioning step:
GET /v1.1/&lt;tenant_id&gt;/clusters/&lt;cluster_id&gt;/progress</p>
<p>The expected response looks like:</p>
<div class="highlight-console notranslate"><div class="highlight"><pre><span/><span class="go">{</span>
<span class="go">  "events": [</span>
<span class="go">    {</span>
<span class="go">      "node_group_id": "ee258cbf-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_id": "ss678cbf-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_name": "cluster-namenode-001",</span>
<span class="go">      "provisioning_step_id": "1",</span>
<span class="go">      "event_info": null,</span>
<span class="go">      "successful": true,</span>
<span class="go">      "id": "ss678cbf-4589-484a-a814-81436c18eeee",</span>
<span class="go">      "created_at": "2014-10-29 12:36:59.329034"</span>
<span class="go">    },</span>
<span class="go">    {</span>
<span class="go">      "cluster_id": "d2498cbf-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "node_group_id": "ee258www-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_id": "ss678www-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_name": "cluster-datanode-001",</span>
<span class="go">      "provisioning_step_id": "1",</span>
<span class="go">      "event_info": null,</span>
<span class="go">      "successful": true,</span>
<span class="go">      "id": "ss678cbf-4589-484a-a814-81436c18eeee",</span>
<span class="go">      "created_at": "2014-10-29 12:36:59.329034"</span>
<span class="go">    },</span>
<span class="go">    {</span>
<span class="go">      "cluster_id": "d2498cbf-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "node_group_id": "ee258www-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_id": "ss678www-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "instance_name": "cluster-datanode-001",</span>
<span class="go">      "provisioning_step_id": "2",</span>
<span class="go">      "event_info": "Trying to access failed: reason in sahara logs by id ss678www-4589-484a-a814-81436c18beb3",</span>
<span class="go">      "successful": false,</span>
<span class="go">      "id": "ss678cbf-4589-484a-a814-81436c18eeee",</span>
<span class="go">      "created_at": "2014-10-29 12:36:59.329034"</span>
<span class="go">    }</span>
<span class="go">  ]</span>
<span class="go">}</span>
</pre></div>
</div>
<p>The event info for a failed step will contain the traceback of the error.</p>
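<p>As a sketch of how a client might consume this response, the events list can be filtered for failures of a given step. The field names follow the example above; the helper function itself is hypothetical and not part of the proposed API.</p>

```python
def failed_events(events, step_id):
    """Return the events of the given provisioning step that failed.

    Only an explicit False counts as failed; None/null means the event's
    step is still in progress.
    """
    return [e for e in events
            if e["provisioning_step_id"] == step_id
            and e["successful"] is False]

# Sample data mirroring the example response above.
events = [
    {"provisioning_step_id": "1", "successful": True, "event_info": None,
     "instance_name": "cluster-namenode-001"},
    {"provisioning_step_id": "2", "successful": False,
     "event_info": "Trying to access failed: reason in sahara logs",
     "instance_name": "cluster-datanode-001"},
]
for e in failed_events(events, "2"):
    print(e["instance_name"], "-", e["event_info"])
```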
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>This change will take effect immediately after it is merged.
It would also be a good idea to provide a configuration option for disabling
the event log.</p>
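<p>A minimal sketch of such a switch follows. The option name is an assumption (the spec only suggests the ability should exist), and a real implementation would register the flag with oslo.config rather than use a plain dict.</p>

```python
# Hypothetical configuration flag; sahara would register this via
# oslo.config rather than storing it in a plain dict.
CONF = {"enable_event_log": True}

def record_event(events, event):
    """Append an event only when event logging is enabled."""
    if CONF["enable_event_log"]:
        events.append(event)
    return events

log = []
record_event(log, {"step": "Launching instances", "successful": True})
CONF["enable_event_log"] = False
record_event(log, {"step": "Waiting for ssh", "successful": None})
print(len(log))  # only the first event was recorded
```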
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>This change will add an event log section in Horizon on the cluster page at
/data_processing/clusters/cluster_id.
On this page it will be possible to see the main provisioning steps and the
current progress of each of them.
It will also be possible to see the events of the current provisioning
step. In case of errors, all events of the current step and the main reasons
for the failures will be visible.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>vgridnev</p>
</dd>
<dt>Other contributors:</dt><dd><p>slukjanov, alazarev, nkonovalov</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<dl class="simple">
<dt>This feature requires the following modifications:</dt><dd><ul class="simple">
<li><p>Add the ability to get information about the main cluster provisioning
steps from a plugin;</p></li>
<li><p>Add the ability to view the progress of the current provisioning step;</p></li>
<li><p>Add the ability to attach events to the current cluster and current step;</p></li>
<li><p>Add a periodic task to erase redundant events from previous steps;</p></li>
<li><p>Add the ability to view events about the current step of cluster
provisioning;</p></li>
<li><p>Update the Sahara docs with some use cases for this feature;</p></li>
<li><p>Modify saharaclient to support the new REST API feature;</p></li>
<li><p>Implement a new cluster tab with events in Horizon;</p></li>
<li><p>Add unit tests for the new event features.</p></li>
</ul>
</dd>
</dl>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>Depends on OpenStack requirements</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>As noted in the Work Items section, this feature will require unit tests.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>As noted in the Work Items section, the documentation will need to be updated
with some use cases for this feature.</p>
</section>
<section id="references">
<h2>References</h2>
</section>
Thu, 04 Sep 2014 00:00:00 Enable Swift resident Hive tables for EDP with the vanilla pluginhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-hive-vanilla-swift.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-hive-vanilla-swift">https://blueprints.launchpad.net/sahara/+spec/edp-hive-vanilla-swift</a></p>
<p>The vanilla1 plugin supports Hive, but Hive cannot access Swift.
The vanilla2 plugin does not support Hive at all.
This proposal aims to let Hive process tables stored in Swift
and to add Hive support to the vanilla2 plugin.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>When Hive processes a table stored in Swift, <cite>hiveserver</cite> has to
access Swift. But <cite>hiveserver</cite> cannot read the SwiftFS configuration
from a Hive query; it reads configuration only from XML files
(core-site.xml/hive-site.xml).
Therefore <cite>hiveserver</cite> does not know the authentication info and cannot
access Swift.</p>
<p>The vanilla2 plugin does not implement Hive support.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p><cite>hiveserver</cite> reads its configuration only at startup time; it does not
read configuration from a Hive query. Therefore sahara has to pass the Swift
authentication info to <cite>hiveserver</cite> through hive-site.xml before
launching <cite>hiveserver</cite>.</p>
<p>When a Hive-enabled cluster is created, Sahara creates a keystone TRUST and a
cluster-scoped Swift proxy user, and writes the proxy user and TRUST to
hive-site.xml before <cite>hiveserver</cite> is started.</p>
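<p>A sketch of what writing the credentials into hive-site.xml could look like. The property names are the SwiftFS ones mentioned in the Alternatives section below; the helper itself is illustrative, not sahara's actual code.</p>

```python
import xml.etree.ElementTree as ET

def build_hive_site_auth(username, password):
    """Build a minimal Hadoop-style configuration document carrying the
    Swift proxy-user credentials for hiveserver to read at startup."""
    root = ET.Element("configuration")
    for name, value in (("fs.swift.service.sahara.username", username),
                        ("fs.swift.service.sahara.password", password)):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

print(build_hive_site_auth("proxy-user", "secret"))
```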
<p>This enables <cite>hiveserver</cite> to read the authentication info at startup
time. When a Hive query arrives, <cite>hiveserver</cite> can then access Swift
with the cluster-scoped TRUST.</p>
<p>When the cluster is terminated, Sahara removes the TRUST and proxy user before
the cluster’s database entry is removed. If removing the TRUST fails, the
cluster goes to an error status and is not removed, so no proxy user is ever
orphaned.</p>
<p>In the vanilla2 plugin, Hive support will be implemented using vanilla1 as a
reference.</p>
<section id="alternatives">
<h3>Alternatives</h3>
<p>1. Add auth config fields (<cite>fs.swift.service.sahara.username</cite>,
<cite>fs.swift.service.sahara.password</cite>) to hive-default.xml,
and have the end user input the username/password in the cluster
configuration. This alternative is very inconvenient.</p>
<ol class="arabic simple" start="2">
<li><p>Fix hive-server to read configuration from the Hive query.</p></li>
</ol>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>Database schema: No change</p>
<dl class="simple">
<dt>Cluster config: changed, but does not affect anything else</dt><dd><p>cluster-config is a dict. An internal key is added that stores the Swift
proxy user info and the TRUST id. This key is not sent to the client.</p>
</dd>
</dl>
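<p>The sanitizing step described above might look roughly like this. The internal key names are placeholders (the spec only says an internal key is added); the function is a sketch, not sahara's actual code.</p>

```python
# Placeholder names for the internal cluster-config key(s); the spec does
# not fix the actual key names.
INTERNAL_KEYS = frozenset({"_swift_proxy_user", "_trust_id"})

def sanitize_cluster_config(cluster_config):
    """Return a copy of the cluster config safe to send to clients,
    with the internal Swift proxy user / TRUST keys removed."""
    return {k: v for k, v in cluster_config.items()
            if k not in INTERNAL_KEYS}

config = {"general": {}, "_swift_proxy_user": "proxy", "_trust_id": "abc"}
print(sanitize_cluster_config(config))
```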
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>None</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>None</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<dl class="simple">
<dt>Primary assignee:</dt><dd><p>k.oikw (Kazuki OIKAWA)</p>
</dd>
<dt>Other contributors:</dt><dd><p>None</p>
</dd>
</dl>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Implement proxy user / TRUST creation feature when cluster creates</p></li>
<li><p>Implementation of sanitizing cluster-config</p></li>
<li><p>Implementation of writing proxy user and TRUST to hive-site.xml</p></li>
<li><p>Implement hive support code in vanilla2</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<ul class="simple">
<li><p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-swift-trust-authentication">https://blueprints.launchpad.net/sahara/+spec/edp-swift-trust-authentication</a></p></li>
</ul>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will add an integration test that checks whether Hive can successfully
process a table stored in Swift.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>None</p>
</section>
<section id="references">
<h2>References</h2>
<p>None</p>
</section>
Tue, 19 Aug 2014 00:00:00 [EDP] Improve Java type compatibilityhttps://specs.openstack.org/openstack/sahara-specs/specs/kilo/edp-improve-compatibility.html
<p><a class="reference external" href="https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility">https://blueprints.launchpad.net/sahara/+spec/edp-improve-compatibility</a></p>
<p>Currently, EDP MapReduce (Java type) examples must be modified
before they can be used from a Java action in an Oozie workflow.</p>
<p>This blueprint aims to let users migrate from other Hadoop clusters to Sahara
without any modifications to their applications.</p>
<section id="problem-description">
<h2>Problem description</h2>
<p>Users need to modify their MapReduce programs as follows:</p>
<ul>
<li><p>Add <cite>conf.addResource</cite> in order to read configuration values from
the <cite><configuration></cite> tag specified in the Oozie workflow:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span/>// This will add properties from the &lt;configuration&gt; tag specified
// in the Oozie workflow. For java actions, Oozie writes the
// configuration values to a file pointed to by oozie.action.conf.xml
conf.addResource(new Path("file:///",
    System.getProperty("oozie.action.conf.xml")));
</pre></div>
</div>
</li>
<li><p>Eliminate <cite>System.exit</cite> calls, to comply with the restrictions of
Oozie’s Java action.
For example, <cite>hadoop-examples.jar</cite> bundled with Apache Hadoop uses
<cite>System.exit</cite>.</p></li>
</ul>
<p>Users typically start by launching jobs using examples and/or
applications already executed on other Hadoop clusters (e.g. Amazon EMR).
We should support these users.</p>
</section>
<section id="proposed-change">
<h2>Proposed change</h2>
<p>We will provide a new job type, called Java EDP Action, which overrides
the Main class specified by <cite>main_class</cite>.
The overriding class adds the configuration property and then calls the
original main method.
The class also catches the exception that is raised by <cite>System.exit</cite>.</p>
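<p>The control flow of the overriding class can be illustrated with a Python analogue. In Java the same effect is typically achieved by installing a SecurityManager that turns <cite>System.exit</cite> into a catchable exception; this sketch only mirrors that idea and is not sahara's actual code.</p>

```python
import sys

def run_wrapped(main, argv):
    """Call an application's main() and convert an exit() call into a
    returned status code instead of terminating the process."""
    try:
        main(argv)
    except SystemExit as exc:
        return exc.code if exc.code is not None else 0
    return 0

def legacy_main(argv):
    # Stands in for hadoop-examples-style code that calls System.exit.
    sys.exit(2)

print(run_wrapped(legacy_main, []))  # the wrapper survives the exit call
```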
<section id="alternatives">
<h3>Alternatives</h3>
<p>According to the Oozie docs, Oozie 4.0 or later provides a way of overriding
an action’s Main class (section 3.2.7.1).
The proposed implementation is simpler than using that Oozie feature.
(We will implement this without any dependency on the Oozie library.)</p>
</section>
<section id="data-model-impact">
<h3>Data model impact</h3>
<p>None</p>
</section>
<section id="rest-api-impact">
<h3>REST API impact</h3>
<p>None</p>
</section>
<section id="other-end-user-impact">
<h3>Other end user impact</h3>
<p>Users will no longer need to modify their applications to use EDP.</p>
</section>
<section id="deployer-impact">
<h3>Deployer impact</h3>
<p>None</p>
</section>
<section id="developer-impact">
<h3>Developer impact</h3>
<p>None</p>
</section>
<section id="sahara-image-elements-impact">
<h3>Sahara-image-elements impact</h3>
<p>None</p>
</section>
<section id="sahara-dashboard-horizon-impact">
<h3>Sahara-dashboard / Horizon impact</h3>
<p>sahara-dashboard / horizon needs to add this new job type.</p>
</section>
</section>
<section id="implementation">
<h2>Implementation</h2>
<section id="assignee-s">
<h3>Assignee(s)</h3>
<p>Primary assignee: Kazuki Oikawa (k.oikw)</p>
<p>Other contributors: Yuji Yamada (yamada-yuji)</p>
</section>
<section id="work-items">
<h3>Work Items</h3>
<ul class="simple">
<li><p>Add the new job type (<cite>Java.EDP</cite>)</p>
<ul>
<li><p><cite>Java.EDP</cite> will be a subtype of <cite>Java</cite></p></li>
<li><p>Implement uploading the jar file of the overriding class to HDFS</p></li>
<li><p>Implement creation of the <cite>workflow.xml</cite></p></li>
</ul>
</li>
<li><p>Implement the overriding class</p></li>
</ul>
</section>
</section>
<section id="dependencies">
<h2>Dependencies</h2>
<p>None</p>
</section>
<section id="testing">
<h2>Testing</h2>
<p>We will add an integration test
that checks whether the WordCount example bundled with Apache Hadoop
executes successfully.</p>
</section>
<section id="documentation-impact">
<h2>Documentation Impact</h2>
<p>If the EDP examples use this feature, the docs will need to be updated.</p>
</section>
<section id="references">
<h2>References</h2>
<ul class="simple">
<li><p>Java action in Oozie <a class="reference external" href="http://oozie.apache.org/docs/4.0.0/WorkflowFunctionalSpec.html#a3.2.7_Java_Action">http://oozie.apache.org/docs/4.0.0/WorkflowFunctionalSpec.html#a3.2.7_Java_Action</a></p></li>
</ul>
</section>
Wed, 30 Jul 2014 00:00:00