Installation: further steps

Introduction

As previously detailed, right now you should have the simplest WebLab-Deusto up and running. It uses SQLite as main database (so only one process can be running) and SQLite as scheduling mechanism (which is very slow). Additionally, you have all the servers in a single process, so you can not spread the system in different machines. Finally, you are not using a real HTTP server, but the one built-in, which is very slow and not designed for being used in production. These settings in general are highly not recommended for a production environment.

In this section we will focus on installing and validating the installation of more components, as well as playing with simple deployments which use these installations. With these components, it is possible to enhance the performance in the next section: Performance.

Installing external systems

At this point, you have installed the basic requirements. However, now you should install three new external components:

  • Apache HTTP server: By default, WebLab-Deusto uses a built-in, simple HTTP server written in Python. This web server is not aimed to be used in production, but only for demonstration purposes. For this reason, the Apache web server (or any other supporting proxies) is recommended.
  • MySQL: By default, WebLab-Deusto uses SQLite. This configuration suits very well low-cost environments (such as Raspberry Pi, where it works). However, in desktop computers or servers, this restricts the number of processes running the Core Server to one, since SQLite can not be accessed concurrently. Even more, it restricts the number of threads to one, so it becomes a bottle neck. For few students, this might be fine, but as the number of students increase, this becomes an important problem. For this reason, it is better to use other database engine. The one used in the University of Deusto for production is MySQL.
  • Redis: There are two main backends for scheduling: one based on SQL (and therefore, it can use MySQL or SQLite), and other based on Redis (a NoSQL solution that keeps information in memory, becoming very fast). Even in low cost devices, the latter is recommended. However, it is only officially supported for UNIX. Therefore, if you are running Mac OS X or Linux, install Redis and use it as scheduling backend to decrease the time required to process users.

GNU/Linux

All these components are open source and very popular, so they are in most of the package repositories of each distribution. For example, in Ubuntu GNU/Linux, you only need to install the following:

sudo apt-get install apache2 mysql-server redis-server

If you are not using PHP, it is highly recommended to install the worker MPM by running:

sudo apt-get install apache2-mpm-worker

Note

For apache on Ubuntu (>16.04) apache2-mpm-worker is included by default.

This makes that Apache uses threads rather than processes when attending a new request. This way, the amount of memory required with a high number of concurrent students is low. However, it is is usually not recommended when also using PHP, so whenever you install PHP this MPM is usually removed. If you need to run both, you can use the prefork MPM, while take into account that it will require more memory. This is explained in detail in the official site.

Regarding redis, take into account that redis performs all the operations in memory but from time to time it stores everything in disk, adding latency. It is recommended to avoid this. In the /etc/redis/redis.conf file, comment the following lines:

save 900 1
save 300 10
save 60 10000

By adding a # before.

Microsoft Windows

In Microsoft Windows, you can install both the Apache HTTP server and MySQL by using XAMPP. Download it and install it. XAMPP comes with a control panel to start and stop each service. In WebLab-Deusto, we are only interested in Apache and MySQL.

Once installed, it is recommended to have the MySQL client in console, so either do this:

set PATH=%PATH%;C:\xampp\mysql\bin

Or go to the Microsoft Windows Control Panel -> System -> Advanced -> Environment variables -> (down) PATH -> edit and append: ;C:\xampp\mysql\bin.

If you have problems with XAMPP, check their FAQ.

Regarding Redis, there is an unofficial version of Redis for Microsoft Windows, with a patch developed by Microsoft. However, while the support is not official or there is an officially supported side project for supporting Microsoft Windows, we are not recommending its use. So if you are running Microsoft Windows, simply skip those sections and use MySQL for scheduling.

Mac OS X

In Mac OS X, Apache is usually installed by default. However, you must install MySQL by using the official page. You can install Redis by downloading it and compiling it directly. If you do not manage to run it, remember that it is an optional requirement and that you can use MySQL as scheduling backend.

Installing native libraries

By default, the installation process installed a set of requirements, which are all pure Python. However, certain native libraries make the system work more efficiently. That said, these libraries require a C compiler to be installed and a set of external C libraries, which might not be available in Microsoft Windows environments. However, in GNU/Linux, they are recommended.

For this reason, in Ubuntu GNU/Linux install the following packages:

# Python
$ sudo apt-get install build-essential python-dev
# MySQL client, for an optimized version of the MySQL plug-in
$ sudo apt-get install libmysqlclient-dev
# LDAP
$ sudo apt-get install libldap2-dev
# SASL, SSL for supporting LDAP
$ sudo apt-get install libsasl2-dev libsasl2-dev libssl-dev
# XML libraries for validating the configuration files
$ sudo apt-get install libxml2-dev libxslt1-dev
# Avoid problems with freetype:
$ sudo ln -s /usr/include/freetype2 /usr/include/freetype

Once installed, it is now possible to install more optimized Python libraries, by running:

$ cd weblab/server/src/
$ pip install -r requirements_suggested.txt

From this moment, libraries that improve the performance will be installed.

Scheduling

There are two main database backends for scheduling:

  • SQL based: using the SQLAlchemy framework. Two database engines are supported:
    • Using SQLite, which is fast but it requires a single process to be executed, so multiple users are managed in a single thread and the latency increases.
    • Using MySQL, which supports multiple students accessing to different servers, distributed in several processes or even machines.
  • Redis: which uses redis, and provides faster results but does only work on UNIX environments at this point.

By default in the introduction section, you have used SQLite. So as to use MySQL as database engine, run the following:

$ weblab-admin create sample --coordination-db-engine=mysql

Additionally, you may pass other arguments to customize the deployment:

$ weblab-admin create sample --coordination-db-engine=mysql \
  --coordination-db-name=WebLabScheduling \
  --coordination-db-user=weblab     --coordination-db-passwd=mypassword \
  --coordination-db-host=localhost  --coordination-db-port=3306

However, if you want to use Redis, run the following:

$ weblab-admin create sample --coordination-engine=redis

Additionally, you may pass the other arguments, such as:

$ weblab-admin create sample --coordination-engine=redis \
  --coordination-redis-db=4  --coordination-redis-passwd=mypassword \
  --coordination-redis-port=6379

So as to change an existing deployment, you may check the variables explained at Configuration variables, which are located at a file called machine_config.py in the core_machine directory.

Database

The WebLab-Deusto database uses SQLAlchemy, which is a ORM for Python which supports several types of database engines. However, in WebLab-Deusto we have only tested two database engines:

  • SQLite: it is fast and comes by default with Python. It suits very well low cost environments (such as Raspberry Pi).
  • MySQL: on desktops and servers, it makes more sense to use MySQL and a higher number of processes to distribute the load of users among them.

So as to test this, run the following:

$ weblab-admin create sample --db-engine=mysql

Additionally, you may customize the deployment with the following arguments:

$ weblab-admin create sample --db-engine=mysql  \
  --db-name=MyWebLab     --db-host=localhost    \
  --db-port=3306         --db-user=weblab       \
  --db-passwd=mypassword

You may also change the related variables explained at Configuration variables, which are located at a file called machine_config.py in the core_machine directory.

Secure the deployment

This section covers few minimum steps to secure your WebLab-Deusto deployment.

Secure the communications

WebLab-Deusto supports HTTPS, and it is designed so that it can easily work with it (e.g., in the managed approach, all the connections go through the core server). We highly recommend you to install SSL certificates to reduce the risk of potential attacks to your WebLab-Deusto deployment, especially if you or your students submit the credentials through WebLab-Deusto (as it happens when using database passwords or LDAP).

Note

A note about SSL

In case you are unfamiliar with HTTPS (HTTP Secure or HTTP over SSL), all the web uses the HTTP protocol (http://). However, this protocol goes unencrypted, so anyone in the middle (people in the same WiFi, ISPs, layers in the middle between the final client and the server...) can read the traffic. For this reason, HTTPS (https://) was developed, which supports HTTP through an SSL connection, which encrypts the communications. Nowadays there is a big effort to make as much of the web use HTTPS (e.g., not only e-commerce sites but also google.com, Wikipedia, Facebook and even this website where you are reading this... all go through HTTPS).

You can generate SSL certificates by yourself (and signed by yourself). However, in general web browsers will not accept them (or they will show a big warning before accessing), because otherwise you could create an SSL certificate for another website that you do not own, and they would not be able to know. This could lead to different types of attacks.

For this reason, web browsers come with a set of CA (Certificate Authorities), and they only trust whatever is signed by them (or signed by whoever they delegate). Additionally, they have other complex mechanisms (such as lists of revoked certificates, etc.).

So, when you install a valid certificate, some CA (or delegated) will verify that you are the valid owner of a server, and it will create and sign a certificate for you. When users access your website using https:// to your host, when starting the connection they will automatically download the public key (which they will use for encrypting) and the signature of this key provided by a CA. They will validate with the installed CA if this key is valid for this particular domain (e.g., weblab.yourinstitution.edu, and if it is, it will proceed to encrypt the connection). Otherwise (e.g., the key is expired, the CA does not recognize the signature, the server name is different -www.weblab.yourinstitution.edu instead of weblab.yourinstitution.edu-, the key is in a revocation list), it will show an error instead.

As a final note, one certificate can server multiple domain names for a particular server. For example, you might have a certificate for *.weblab.yourinstitution.edu and you can use it in different servers (e.g., cams.weblab.yourinstitution.edu, www.weblab.yourinstiution.edu...). Those are called wildcard certificates (and if you choose to request those, take into account that *.weblab.yourinstitution.edu is not valid for weblab.yourinstitution.edu so in addition you’ll need an alternate name). You may also select different names, listed in what is called the Alternate names (manually providing a list, such as weblab.yourinstitution.edu and www.weblab.yourinstitution.edu and cams.yourinstitution.edu, etc.).

So, once you have installed WebLab-Deusto in your final server (i.e., with a proper hostname such as weblab.yourinstitution.edu), you might want to install the SSL certificates. To do so, there are three approaches:

  • Contact your IT services: many institutions (e.g., universities, research centers) already have agreements to create free SSL certificates. You should first contact to your IT services to see if they provide you this service.
  • Buy a SSL certificate: there are many websites where SSL certificates are sold and managed, with different options of security.
  • Get a free SSL certificate by Let's Encrypt: Let’s Encrypt is an open initiative to secure the Internet that provides free SSL certificates in an automatic basis. The certificates only last a couple of months, but you can renew them automatically. All what you need is having your server already configured with the final IP address and hostname (so they automatically verify that weblab.yourinstitution.edu is indeed your server), and running already a proper web server (e.g., Apache or nginx). For more information on how to do it (it literally takes a couple of minutes), go to the Certbot site created by the EFF (Electronic Frontier Foundation). It tells you what software to install and how. Let's Encrypt does not support wildcard certificates, but it supports as many alternate names as you want.

Once you install the certificate in your Apache server (each provider will explain you how), you should go to the core_host_config.py file and change the core_server_url variable to your final URL (e.g., https://weblab.yourinstitution.edu/weblab/).

Additionally, in Apache there is a directive that you might want to use in the VirtualHost using the 80 port such as:

RedirectMatch ^/weblab/(.*)$ https://weblab.yourinstitution.edu/weblab/$1

So that everything that arrives to the 80 port (http://) is forwarded to the 443 port (https://).

Close access to local services

The internet is a quite dangerous place, where there are robots constantly checking random IPs and searching for open services to attack (such as databases, shared directories, cameras, printers...). In your WebLab-Deusto server, you probably don’t want anything open other than the WebLab-Deusto server (and other services that you in purpose want open). There are two ways to do this, and we recommend both:

  • First, install a proper firewall. You might use the one provided by your Operating System (such as the Windows Firewall in Microsoft Windows, or iptables in Linux). Make it possible to access only those services that you need open. WebLab-Deusto itself does not require any port open (only those for the web browser, which are 80 and 443).
  • Second, review your services. In particular, make sure that both Redis and MySQL are bound to 127.0.0.1 (instead of open to the whole Internet). This is usually established in its configuration files (e.g., search for a parameter called bind-address in MySQL or bind in redis. It may be called listen in other services).

After doing it, or in case of doubt, check from outside (e.g., your home) connecting to those ports:

(3306 is the default MySQL port)
$ telnet weblab.myinstitution.edu 3306
Trying 1.2.3.4...
telnet: Unable to connect to remote host: Connection timed out
$

(6379 is the default Redis port)
$ telnet weblab.myinstitution.edu 6379
Trying 1.2.3.4...
telnet: Unable to connect to remote host: Connection timed out
$

If the response is something like:

telnet: Unable to connect to remote host: Connection refused

it’s also fine. However, if it ever says:

$ telnet weblab.myinstitution.edu
Trying 1.2.3.4...
Connected to weblab.myinstitution.edu.
Escape character is '^]'.

It means that those ports are open and can be accessed by attackers. By default, some services (as MySQL) require credentials, but sometimes there is a vulnerability in the software and external attackers can access more than they should. Also, if you are using easy passwords (e.g., the ones in the documentation), the risk of attack increases if the services are open to the Internet.

For those services that you also want to make available but only for you (and not for the general audience), you should also change the default ports. For example, if you use Remote Desktop, VNC or SSH, you can use it in a different port than the default one. For example, SSH is a secure service, but it has had important vulnerability problems in the past. And for those robots that are constantly checking for services open, they might be looking in each IP address for a SSH service running in the 22 port (the default one). If you have it in the 16483 one, it might be more difficult for them to find it and attack it, unless they’re indeed targeting your server. As an additional measure, there are approaches such as port-knocking which let you define a set of random ports (e.g., 5356, 15243 and 9513), and when you knock them (e.g., trying to connect to them) in that order, suddenly the firewall opens access to these services (e.g., SSH). This way, even if someone checks all the ports open in your server, they will only find the public ones (e.g., Apache), and only if they connect to different ports in an order they will see that service available.

Upgrade your software frequently

All software is inherently subject to have vulnerabilities. Once they are discovered and fixed, when you upgrade them, the vulnerabilities are not there anymore. However, if you upgrade once a month, then you might run into troubles for that month.

This does not mean that you need to use the latest version of the software, just those which are maintained. For example in the case of Ubuntu, you do not need to install the latest Ubuntu distribution. If you are using a Ubuntu Server 12.04 LTS, it will be supported until June 2017. You are of course encouraged to use Ubuntu 16.04 LTS (the latest LTS), but it is not really a priority. What is important is to use an Operating System version that is still supported (and for this reason, in the case of Ubuntu, it is better to install LTS versions -that are supported for longer: e.g., 14.04, 16.04- than not LTS versions -e.g., 16.10-) and upgrade it every day (you can install a script for that). If you are using software not managed by your operating system (e.g., Apache on Windows), you should also upgrade it frequently (and you can join for example their mailing lists to be notified of new versions). This is not required in systems as Linux, where most of the software required by WebLab-Deusto is installed from the repositories. However, you still have to make sure that it is upgraded frequently.

It is also important to upgrade the WebLab-Deusto regularly (not so often as every day, but keep it in mind). It’s not only about WebLab-Deusto itself, but about the libraries used by WebLab-Deusto (which are automatically upgraded when you upgrade WebLab-Deusto). Usually in the main screen of WebLab-Deusto you have a link to GitHub (where it says version r<number>). If you click that link and compare it with this one, you can see if there were new versions since you last upgraded it. You may also use the WebLab-Deusto mailing list to receive notifications on potential issues.

Deployment

Note

This section is only for deployments in UNIX environments. In Windows environments you can use services by wrapping WebLab into .bat files.

WebLab-Deusto can be run as a script, but you might want to deploy it as a service. However, given that it is very recommendable not to install it as root (unless you play with virtuaelnvs to avoid corrupting the system with wrong versions of the libraries), it is better to install it in a system such as supervisor. In supervisor you can add any type of program and they will run as services. You also have a tool to control which services are started, or restart them when required (e.g., when upgrading or modifying the .py or .yml files).

This section is focused on how to install this tool in a UNIX (e.g., Linux) environment.

Step 1: installation of supervisor

Depending on your Operating System, you might find it in the OS packages itself. For example, in Ubuntu run:

$ sudo apt-get install supervisor

And you’re done. Otherwise go to supervisor docs on installation for futher information.

Once installed, you’ll see that you can start supervisor and check the status:

$ sudo service supervisor start
$ sudo supervisorctl help

default commands (type help <topic>):
=====================================
add    exit      open  reload  restart   start   tail
avail  fg        pid   remove  shutdown  status  update
clear  maintail  quit  reread  signal    stop    version

$ sudo supervisorctl status
$

It is normal that status returns nothing since we have not installed any service yet.

Step 2: prepare WebLab for being used as a service

Let’s imagine that you have installed WebLab-Deusto using virtualenvwrapper and called it weblab. Then, the virtualenv will typically be located in something like:

/home/tom/.virtualenvs/weblab/

And the activation script will be in:

/home/tom/.virtualenvs/weblab/bin/activate

And let’s imagine that you have created a new WebLab-Deusto instance in your home directory, in a deployments directory and called it example, such as:

$ cd /home/tom/deployments/
$ weblab-admin create example --http-server-port=12345

Then, we will create a wrapper file in any folder (e.g., in the deployments) directory called for example weblab-wrapper.sh which will contain the following three lines:

#!/bin/bash
_term() {
   kill -TERM "$child" 2>/dev/null
}

# When SIGTERM is sent, send it to weblab-admin
trap _term SIGTERM

source /home/tom/.virtualenvs/weblab/bin/activate
weblab-admin $@ &

child=$!
wait "$child"

And then we will grant execution privileges to that file:

$ chmod +x /home/tom/deployments/weblab-wrapper.sh

From this point, calling it from anywhere will use the virtualenv will work:

$ cd /tmp/
$ /home/tom/deployments/weblab-wrapper.sh
Usage: /home/tom/.virtualenvs/weblab/bin/weblab-admin option DIR [option arguments]

    create                  Create a new weblab instance
    start                   Start an existing weblab instance
    stop                    Stop an existing weblab instance
    monitor                 Monitor the current use of a weblab
    instance
    upgrade                 Upgrade the current setting
    locations               Manage the locations
    database
    httpd-config-generate   Generate the HTTPd
    config files (apache, simple, etc.)

$

Step 3: Create the configuration for supervisor

Now what you have to do is to create a file such as example.conf (it is important that it ends by .conf) for running the example instance as a service. To do so, create a file such as the following:

[program:example]
command=/home/tom/deployments/weblab-wrapper.sh start example
directory=/home/tom/deployments/
user=tom
stdout_logfile=/home/tom/deployments/example/logs/stdout.log
stderr_logfile=/home/tom/deployments/example/logs/stderr.log
killasgroup=true

There are plenty more of configuration variables in supervisor (such as not exceeding the stdout/stderr logs in more than a number of MB, moving them until you have more than 10 files, etc.): check the documentation at the supervisor [program:x] section documentation.

Step 4: Add the configuration to supervisor

Then, you have to add this file to supervisor. In Ubuntu Linux this is typically done by copying the file to /etc/supervisor/conf.d/ and then using the supervisorctl to add it:

$ sudo cp example.conf /etc/supervisor/conf.d/
$ sudo supervisorctl update
example: added process group
$

At this point, you might check that your WebLab-Deusto instance is running. By default when you update the supervisorctl, it runs the process. First check in:

$ sudo supervisorctl status
example                          RUNNING   pid 12428, uptime 0:00:04
$

And then go with your web browser to see if it is running (in the example created, you can go to http://localhost:12345/, but you should be using Apache as described above).

Step 5: Try supervisor

Once configured, it becomes easier to start the cycle of the deployment. For example:

$ sudo supervisorctl start example
example: started
$ sudo supervisorctl status example
example                          RUNNING   pid 19320, uptime 0:00:18
$ sudo supervisorctl stop example
example: stopped

If you have more than WebLab-Deusto deployment, you can always do the following to start them all:

$ sudo supervisorctl start all
example1: started
example2: started
$ sudo supervisorctl stop all
example1: stopped
example2: stopped
$

If you have to make any change on the example.conf, remember to run:

$ sudo supervisorctl update

So supervisor checks the settings again.

Note

Make sure that supervisor starts itself when you reboot your computer (so try rebooting). In some systems by default it doesn’t. In Ubuntu 16.04, for example, you have to run the following command:

$ sudo systemctl enable supervisor

You might know that supervisor is active because otherwise any command will fail with a message such as:

$ sudo supervisorctl status
unix:///var/run/supervisor.sock no such file
$

Note

If you want to use this for testing environments, and you don’t need them to start every time (e.g., only when you want them to start), you just have to detail that in the example.conf file by appending:

autostart=false

Summary

With these components installed and validated, now it is possible to enhance the performance in the next section: Performance.